Sanchit Dua

Creation of Batches from CSV resulting in erroneous state

I'm trying to create batches from a CSV file written with opencsv's CSVWriter:
CSVWriter writer = new CSVWriter(new FileWriter(filePath+createFileName), ',', CSVWriter.DEFAULT_QUOTE_CHARACTER);

I read the written file back with a BufferedReader. The CSV file is written correctly and, as far as I can tell, the read also works, so everything was fine until now. But when I write one particular data set to CSV using the same operations, batch creation fails.
The batch fails with the message "Failed to parse CSV. Found unescaped quote. A value with quote should be within a quote", and the application no longer behaves as expected.

Judging by this error, the data seems to contain a stray "" (two double quotes) or " (a single double quote). (My data has the form "asdf","1.0","",,"def".)
I tried a regex to find unescaped double quotes but found none; on inspection, the file doesn't contain any repeated double quotes. The link I followed is: http://stackoverflow.com/questions/3180842/regular-expression-to-find-and-replace-unescaped-non-successive-double-quotes-in

Later in the code I use File tmpFile = File.createTempFile("bulkAPIInsert", ".csv"); to hold the data in a temporary file, which is deleted afterwards.

After replacing the above code with the following line, that exception went away, but it led to another one: "Failed to parse CSV. EOF reached before closing an opened quote".
File tmpFile = new File("bulkAPIInsert.csv");

I don't think this workaround should be kept, as it would cause performance issues in the application.

Going through the CSVReader class, I found a custom exception with exactly the message I got. I think it is thrown when a double quote is found inside a double-quoted cell value of the CSV file. I referred to: https://github.com/mulesoft/salesforce-connector/blob/master/src/main/java/com/sforce/async/CSVReader.java

Can anybody suggest what I'm doing wrong, or a workaround for this problem?

Here is the code snippet.
Method1 is called first, then Method2.

 

Method1:
private List<BatchInfo> createBatchesFromCSVFile(RestConnection connection,
			JobInfo jobInfo, String csvFileName) throws Exception {
		List<BatchInfo> batchInfos = new ArrayList<BatchInfo>();
		BufferedReader rdr = new BufferedReader(new InputStreamReader(
				new FileInputStream(csvFileName)));

		// read the CSV header row
		String hdr = rdr.readLine();
		byte[] headerBytes = (hdr + "\n").getBytes("UTF-8");
		int headerBytesLength = headerBytes.length;
//      I was making use of the following code which I replaced with the next line of code.
//		File tmpFile = File.createTempFile("bulkAPIInsert", ".csv");
		File tmpFile = new File("bulkAPIInsert.csv");
		// Split the CSV file into multiple batches
		try {
			FileOutputStream tmpOut = new FileOutputStream(tmpFile);
			int maxBytesPerBatch = 10000000; // 10 million bytes per batch
			int maxRowsPerBatch = 10000; // 10 thousand rows per batch
			int currentBytes = 0;
			int currentLines = 0;
			String nextLine;

			while ((nextLine = rdr.readLine()) != null) {
				byte[] bytes = (nextLine + "\n").getBytes("UTF-8"); //TODO
				if (currentBytes + bytes.length > maxBytesPerBatch
						|| currentLines > maxRowsPerBatch) {
					createBatch(tmpOut, tmpFile, batchInfos, connection, jobInfo);
					currentBytes = 0;
					currentLines = 0;
				}
				if (currentBytes == 0) {
					tmpOut = new FileOutputStream(tmpFile);
					tmpOut.write(headerBytes);
					currentBytes = headerBytesLength;
					currentLines = 1;
				}
				tmpOut.write(bytes);
				currentBytes += bytes.length;
				currentLines++;
			}

			if (currentLines > 1) {
				createBatch(tmpOut, tmpFile, batchInfos, connection, jobInfo);
			}
		} finally {
			if(!tmpFile.delete())
				tmpFile.deleteOnExit();
			rdr.close();
		}
		return batchInfos;
	}

Method2:
	/**
	 * Wait for a job to complete by polling the Bulk API.
	 */
	private void awaitCompletion(RestConnection connection, JobInfo job,
			List<BatchInfo> batchInfoList) throws AsyncApiException { 
		try{
			/****
			Some code
			**/
				BatchInfo[] statusList = connection.getBatchInfoList(job.getId())
				.getBatchInfo();
				for (BatchInfo b : statusList) {
					if (b.getState() == BatchStateEnum.Completed) {
						if (incomplete.remove(b.getId())) 
							//Do Something
					}
					else if(b.getState() == BatchStateEnum.Failed){ 

						System.out.println("Reason: "+b.getStateMessage()+".\n  " +
								"Number of Records Processed: "+b.getNumberRecordsProcessed());
						throw (new Exception(""));
					}
				}
			}
		}catch(Exception ex){log.debug(" Exception occurred.");}
	}

The getStateMessage() method of BatchInfo returns the error messages discussed above.
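
A note on the loop in Method1: BufferedReader.readLine() returns physical lines, so a quoted cell that contains a line break is spread over several loop iterations, and the row counter and a batch boundary can land in the middle of a record. Whether that is what triggered the errors here is not confirmed in this thread. As a rough sketch only, a reader that returns whole records could look like the following; the class and method names (CsvRecordReader, readRecord, hasOpenQuote) are made up for illustration and are not part of opencsv or the Bulk API client.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

final class CsvRecordReader {

    /**
     * Reads one logical CSV record. Physical lines are joined with '\n'
     * for as long as a quoted cell is still open. Returns null at end of input.
     */
    static String readRecord(BufferedReader reader) throws IOException {
        String line = reader.readLine();
        if (line == null) {
            return null;
        }
        StringBuilder record = new StringBuilder(line);
        while (hasOpenQuote(record)) {
            String next = reader.readLine();
            if (next == null) {
                break; // input ended inside a quoted cell; return what we have
            }
            record.append('\n').append(next);
        }
        return record.toString();
    }

    /** True if the text contains an odd number of double quotes ("" counts as two). */
    static boolean hasOpenQuote(CharSequence text) {
        boolean open = false;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == '"') {
                open = !open;
            }
        }
        return open;
    }

    public static void main(String[] args) throws IOException {
        String csv = "id,desc\n\"1\",\"first\nsecond\"\n\"2\",\"plain\"\n";
        BufferedReader reader = new BufferedReader(new StringReader(csv));
        String record;
        while ((record = readRecord(reader)) != null) {
            System.out.println("[" + record + "]");
        }
    }
}
```

In Method1 this would mean calling readRecord(rdr) in place of rdr.readLine() inside the while loop, so that currentLines counts records and a batch is never cut inside a cell.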

Best Answer chosen by Admin (Salesforce Developers) 
Sanchit Dua

The problem was resolved by removing the line breaks inside each cell.
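
The thread does not show the code used for this fix. A minimal sketch of one way to do it, assuming the cells are available as a String[] before they are written, is below; CsvCellSanitizer, cleanCell and cleanRow are hypothetical names chosen for this example.

```java
import java.util.Arrays;

final class CsvCellSanitizer {

    /**
     * Replaces each line break inside a cell, together with any whitespace
     * around it, by a single space. Null cells are passed through unchanged.
     */
    static String cleanCell(String cell) {
        if (cell == null) {
            return null;
        }
        return cell.replaceAll("\\s*[\\r\\n]+\\s*", " ");
    }

    /** Returns a copy of the row with every cell cleaned. */
    static String[] cleanRow(String[] row) {
        String[] cleaned = new String[row.length];
        for (int i = 0; i < row.length; i++) {
            cleaned[i] = cleanCell(row[i]);
        }
        return cleaned;
    }

    public static void main(String[] args) {
        String[] row = {
            "3745",
            "TEST TEST1 \n12345 TEST1 TEST \nTEST, TEST 43215",
            "DPPI-3745"
        };
        System.out.println(Arrays.toString(cleanRow(row)));
    }
}
```

With opencsv, the cleaned array would then be passed to writer.writeNext(...) in place of the raw row. Replacing the break with a space is a choice made for this sketch; an empty string or another separator may suit the data better.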


All Answers

Sanchit Dua

I tried deleting CSV records from the bottom of the file, one at a time. Eventually I reached a record whose deletion changes the behaviour: with it deleted, a batch gets created with "numberRecordsProcessed=0", it waits saying "waiting results-1", and then nothing happens. The record doesn't look suspicious either. It is something like:
"3745","TEST TEST1 
12345 TEST1 TEST 
TEST, TEST 43215","DPPI-3745" 
If I don't delete this record, the error message is still "Failed to parse CSV. Found unescaped quote. A value with quote should be within a quote".
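
Deleting records by hand is slow on a large file. A quicker way to locate cells like the one above is to scan the file and report every physical line that ends while a quoted cell is still open. The sketch below assumes standard CSV quoting (a literal quote inside a cell is written as ""); MultiLineCellFinder and linesWithOpenQuote are hypothetical names for this example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

final class MultiLineCellFinder {

    /**
     * Returns the 1-based numbers of all physical lines that end inside a
     * quoted cell, i.e. lines followed by a line break embedded in a cell.
     */
    static List<Integer> linesWithOpenQuote(BufferedReader reader) throws IOException {
        List<Integer> result = new ArrayList<>();
        boolean insideQuotes = false;
        int lineNumber = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            lineNumber++;
            for (int i = 0; i < line.length(); i++) {
                // An escaped quote ("") toggles twice, so the state is unchanged.
                if (line.charAt(i) == '"') {
                    insideQuotes = !insideQuotes;
                }
            }
            if (insideQuotes) {
                result.add(lineNumber);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        String csv = "id,desc,key\n"
                + "\"3745\",\"TEST TEST1 \n12345 TEST1 TEST \nTEST, TEST 43215\",\"DPPI-3745\"\n"
                + "\"1\",\"ok\",\"x\"\n";
        BufferedReader reader = new BufferedReader(new StringReader(csv));
        System.out.println(linesWithOpenQuote(reader));
    }
}
```

For a real file, the StringReader would be replaced by a reader over the CSV file. If the file has a genuinely unbalanced quote, every line after it is reported, which is itself a useful hint about where the bad record starts.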

Shreyas Dhond 16

@Sanchit Can you please elaborate on how you solved the issue? We are facing the same issue with a CSV file we have generated, but we are not able to figure out which data is causing the error. Thanks!