How can I use a PDF Parser in Apex?

Are there any good examples of using a PDF Parser in Apex?

April 17, 2016

Vasani Parth
If you mean you want to inspect the bytes that make up the file using Apex code, you can't do so directly. You can turn the file into a base64 string using EncodingUtil.base64Encode, but because the string characters don't align with byte boundaries, it is very hard work to do anything useful (and you are likely to run into CPU and heap governor limits).
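
To illustrate the byte-boundary point, here is a minimal anonymous Apex sketch. The sample Blob is made up and stands in for a real file body. It also uses EncodingUtil.convertToHex, which gives two characters per byte, as one possible way to peek at the leading bytes; that is an addition for illustration, not something from the answer above, and it is subject to the same heap limits on large files.

```apex
// Anonymous Apex sketch; the sample bytes stand in for a real file body.
Blob body = Blob.valueOf('%PDF-1.4');

// Base64 packs 3 bytes into 4 characters, so a character can straddle two bytes.
String b64 = EncodingUtil.base64Encode(body);
System.debug(b64); // JVBERi0xLjQ=

// Hex uses exactly 2 characters per byte, so character offsets map directly to bytes.
String hex = EncodingUtil.convertToHex(body);
System.debug(hex);

// A PDF normally starts with the bytes for '%PDF' (0x25 0x50 0x44 0x46).
Boolean looksLikePdf = hex.toLowerCase().startsWith('25504446');
System.debug(looksLikePdf);
```

This only confirms that a file looks like a PDF. It does not extract any text, since the page content in a PDF is usually compressed.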

AFAIK, Salesforce does not include a PDF parsing library, so as of now it is not possible to read through a PDF in Apex.

Please mark this as the best answer if this helps

April 18, 2016

Moritz Dausinger
Extracting data from PDFs can be tricky, and I don't think Apex offers any way to read PDF documents. PDF parsing becomes especially difficult if you want to extract specific data fields and not just the whole text. Unlike HTML, the PDF standard does not include structural tags like <h1> or <table>, which makes the data extraction process more difficult.

Our app Docparser (https://docparser.com/blog/pdf-salesforce-integration/), however, comes with a Salesforce integration. You can, for example, post PDF files from Salesforce to Docparser, extract certain data fields, and then post the data back to Salesforce. Happy to answer your questions!

May 9, 2017

FlorSF
There is a free API at https://pdf-to-text-converter.p.rapidapi.com/api/pdf-to-text/convert. You just need to register and use your API key in Apex as follows:

public static String parsePdf(Attachment file) {
    String boundary = 'A_RANDOM_STRING';

    // header
    String header = '--' + boundary
        + '\nContent-Disposition: form-data; name="file"; filename="' + file.Name + '"'
        + '\nContent-Type: multipart/form-data;'
        + '\nnon-svg=' + true;

    // Pad the header with spaces until its base64 form has no '=' padding,
    // so the encoded pieces can be concatenated before decoding.
    String headerEncoded;
    do {
        header += ' ';
        headerEncoded = EncodingUtil.base64Encode(Blob.valueOf(header + '\r\n\r\n'));
    } while (headerEncoded.endsWith('='));

    // body
    String footer = '--' + boundary + '--';
    String bodyEncoded = EncodingUtil.base64Encode(file.Body);
    if (bodyEncoded.endsWith('==')) {
        // Swap the padding for '0K' so the decoded body ends with \r\n.
        bodyEncoded = bodyEncoded.substring(0, bodyEncoded.length() - 2) + '0K';
    } else if (bodyEncoded.endsWith('=')) {
        // Swap the padding for 'N' so the decoded body ends with \r; the footer supplies \n.
        bodyEncoded = bodyEncoded.substring(0, bodyEncoded.length() - 1) + 'N';
        footer = '\n' + footer;
    } else {
        footer = '\r\n' + footer;
    }
    String footerEncoded = EncodingUtil.base64Encode(Blob.valueOf(footer));
    Blob bodyBlob = EncodingUtil.base64Decode(headerEncoded + bodyEncoded + footerEncoded);
    System.debug('bodyBlob.size() ' + bodyBlob.size());

    // send
    HttpRequest req = new HttpRequest();
    req.setHeader('Content-Type', 'multipart/form-data; boundary=' + boundary);
    // API_KEY must be defined elsewhere, e.g. as a class constant holding your RapidAPI key.
    req.setHeader('X-RapidAPI-Key', API_KEY);
    req.setHeader('X-RapidAPI-Host', 'pdf-to-text-converter.p.rapidapi.com');
    req.setEndpoint('https://pdf-to-text-converter.p.rapidapi.com/api/pdf-to-text/convert');
    req.setMethod('POST');
    req.setBodyAsBlob(bodyBlob);
    req.setHeader('Content-Length', String.valueOf(req.getBodyAsBlob().size()));

    Http http = new Http();
    try {
        HttpResponse res = http.send(req);
        return res.getBody();
    } catch (Exception e) {
        System.debug('+-+ error making request: ' + e.getMessage());
        // Return null instead of dereferencing a response that was never received.
        return null;
    }
}
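
For anyone wondering why the code above pads the header with spaces and swaps the trailing '=' characters: as far as I can tell, the idea is that base64 strings can only be joined and decoded as one if every piece except the last has no padding. A minimal anonymous Apex sketch of that property, using made-up sample strings:

```apex
// Anonymous Apex sketch; the sample strings are illustrative only.
// 'abc' and 'def' are each 3 bytes, so their base64 forms carry no '=' padding.
String first = EncodingUtil.base64Encode(Blob.valueOf('abc'));   // YWJj
String second = EncodingUtil.base64Encode(Blob.valueOf('def'));  // ZGVm

// Unpadded pieces concatenate cleanly: decoding the joined string yields the joined bytes.
Blob joined = EncodingUtil.base64Decode(first + second);
System.assertEquals('abcdef', joined.toString());

// A 2-byte piece ends in '=' and cannot safely be followed by more base64 text.
String padded = EncodingUtil.base64Encode(Blob.valueOf('ab'));   // YWI=
System.assert(padded.endsWith('='));
```

The endpoint host also has to be allowed for callouts (for example via a Remote Site Setting) before http.send will work.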

September 13, 2022
