JO_Dev

How can I use a PDF Parser in Apex?

Are there any good examples of using a PDF Parser in Apex?
Vasani Parth
If you mean looking at the bytes that make up the file from Apex code, you can't do that directly. You can turn the file into a base64 string using EncodingUtil.base64Encode, but each base64 character carries 6 bits, so the string characters don't align with the byte boundaries. That makes it very hard work to do anything useful, and you are likely to run into CPU and heap governor limits.
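
To make the alignment point concrete, here is a minimal sketch (the class and method names are hypothetical, not part of any Salesforce library). It uses EncodingUtil.convertToHex instead of base64: two hex characters map to exactly one byte, so string offsets line up with byte offsets. The sketch only checks for the "%PDF-" signature at the start of a file. It still copies the whole file into a string, so the heap limits mentioned above apply to anything but small files.

```apex
public class PdfBytes { // hypothetical helper class, for illustration only
    // Hex encoding of the ASCII signature "%PDF-" (0x25 0x50 0x44 0x46 0x2D).
    private static final String PDF_SIGNATURE_HEX = '255044462d';

    // Returns true if the blob starts with the "%PDF-" signature.
    public static Boolean looksLikePdf(Blob data) {
        if (data == null) {
            return false;
        }
        // Two hex characters per byte, so character positions match byte positions.
        String hex = EncodingUtil.convertToHex(data);
        return hex.startsWithIgnoreCase(PDF_SIGNATURE_HEX);
    }
}
```

For example, PdfBytes.looksLikePdf(Blob.valueOf('%PDF-1.7')) returns true. This only identifies a PDF; it does not extract any text from it.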

AFAIK, Salesforce does not include a PDF parsing library, so as of now it is not possible to read through a PDF in Apex.

Please mark this as the best answer if this helps
Moritz Dausinger
Extracting data from PDFs can be tricky, and I don't think Apex offers a way to read PDF documents. PDF parsing becomes especially difficult if you want to extract specific data fields rather than just the whole text. Unlike HTML, PDF files generally do not include structural tags like <h1> or <table>, which makes the data extraction process more difficult.

Our app Docparser (https://docparser.com/blog/pdf-salesforce-integration/) does, however, come with a Salesforce integration. You can, for example, post PDF files from Salesforce to Docparser, extract certain data fields, and then post the data back to Salesforce. Happy to answer your questions!
FlorSF

There is a free API at https://pdf-to-text-converter.p.rapidapi.com/api/pdf-to-text/convert . You just need to register and use your API key in Apex as follows:

public static String parsePdf(Attachment file){

		String boundary = 'A_RANDOM_STRING';

		// header: part headers for the file field, separated by CRLF as multipart/form-data expects.
		// The non-svg=true line is kept from the original post; it is not a standard part header.
		String header = '--' + boundary + '\r\nContent-Disposition: form-data; name="file"; filename="' + file.Name + '"\r\nContent-Type: application/pdf'+'\r\nnon-svg='+True;
		String headerEncoded;
		// Pad the header with spaces until its base64 form has no '=' padding,
		// so the encoded header, body and footer can be concatenated as strings.
		do
		{
			header += ' ';
			headerEncoded = EncodingUtil.base64Encode(Blob.valueOf(header + '\r\n\r\n'));
		}
		while(headerEncoded.endsWith('='));

		// body
		String footer = '--' + boundary + '--';
		String bodyEncoded = EncodingUtil.base64Encode(file.Body);

		if (bodyEncoded.endsWith('==')) 
		{
			// Replacing '==' with '0K' appends the bytes '\r\n' to the decoded body.
			bodyEncoded = bodyEncoded.substring(0, bodyEncoded.length()-2) + '0K';
		}
		else if(bodyEncoded.endsWith('=')) 
		{
			// Replacing '=' with 'N' appends '\r'; the footer supplies the matching '\n'.
			bodyEncoded = bodyEncoded.substring(0, bodyEncoded.length()-1) + 'N';
			footer = '\n' + footer;           
		}
		else
		{
			// No padding to replace, so the footer carries the full CRLF.
			footer = '\r\n' + footer;
		}

		String footerEncoded = EncodingUtil.base64Encode(Blob.valueOf(footer));
		Blob bodyBlob = EncodingUtil.base64Decode(headerEncoded + bodyEncoded + footerEncoded);
		System.debug('bodyBlob.size(): ' + bodyBlob.size());

		// send
		// API_KEY is assumed to be defined elsewhere in the class, e.g. as a static final String.
		HttpRequest req = new HttpRequest();
		req.setHeader('Content-Type', 'multipart/form-data; boundary=' + boundary);
		req.setHeader('X-RapidAPI-Key', API_KEY);
		req.setHeader('X-RapidAPI-Host', 'pdf-to-text-converter.p.rapidapi.com');
		req.setEndpoint('https://pdf-to-text-converter.p.rapidapi.com/api/pdf-to-text/convert');

		req.setMethod('POST');
		req.setBodyAsBlob(bodyBlob);
		req.setHeader('Content-Length', String.valueOf(bodyBlob.size()));
		Http http = new Http();
		try{
			HttpResponse res = http.send(req);
			return res.getBody();
		} catch(Exception e){
			// The original dereferenced a null response here; return null instead.
			System.debug('+-+ error making request: ' + e.getMessage());
			return null;
		}
	}
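
The '0K' and 'N' substitutions in the code above are easier to trust once you see them on a tiny input. The following anonymous Apex is a minimal sketch (the sample strings are made up for illustration). It shows that swapping the base64 padding for those characters appends '\r\n' or '\r' to the decoded bytes, which is what allows the encoded header, body and footer to be joined as strings before one final decode.

```apex
// One byte of input: the base64 form ends in '==' (two padding characters).
String one = EncodingUtil.base64Encode(Blob.valueOf('A'));       // 'QQ=='
String onePatched = one.substring(0, one.length() - 2) + '0K';   // 'QQ0K'
System.assertEquals('A\r\n', EncodingUtil.base64Decode(onePatched).toString());

// Two bytes of input: the base64 form ends in '=' (one padding character).
String two = EncodingUtil.base64Encode(Blob.valueOf('AB'));      // 'QUI='
String twoPatched = two.substring(0, two.length() - 1) + 'N';    // 'QUIN'
System.assertEquals('AB\r', EncodingUtil.base64Decode(twoPatched).toString());

// Three bytes of input: no padding at all, so nothing needs patching.
String three = EncodingUtil.base64Encode(Blob.valueOf('ABC'));   // 'QUJD'
System.assert(!three.endsWith('='));
```

Because parsePdf makes a callout, the endpoint host has to be registered under Remote Site Settings, and in a trigger context the call would need to run asynchronously, for example from a @future(callout=true) method.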