Rahul Sangwan7341

Parse PDF to String issue

Hi, I have written code to integrate with SendGrid.
Sending a text or CSV file works fine, but when I send a PDF, JPEG, or Word file, I get an error at the point where I set the attachment in the URL parameter.
//Code of webservice class to set file as attachment
            for (String att: attachmentMap.keySet()) {
                String content = attachmentMap.get(att).toString();
                body += '&files[' + att + ']=' + EncodingUtil.urlEncode(content, 'UTF-8');
            }

Document doc = [Select Id, Name, body from Document where id='0159000000Bq3oI'];
List<String> toEmail = new List<String>();
List<String> ccEmail = new List<String>();
List<String> bccEmail = new List<String>();
toEmail.add('test@gmail.com , test To Name');

String fromEmail = 'test1@gmail.com';
String subject = 'test subject';
String bodyText = 'test body';
String bodyHtml = '<h1>test body</h1><br/>tested';
String fromName = 'Sangwan';
String replyto  = '';   
String extheaders = null;
Map<String,Blob> attachmentMap = new Map<String,Blob>();
//Blob att = doc.body;
attachmentMap.put(doc.Name, doc.body);
SendGripApiClass.sendEmail(toEmail, ccEmail, bccEmail, fromEmail, subject, bodyText, bodyHtml, fromName,
                          replyto, extheaders, attachmentMap);
Naga (Salesforce Developers)
Hi Rahul,

What you're trying to do is view the bytes of the attachment converted into displayable characters where possible. When you open the file in a text editor, the editor ignores the binary values that don't make sense as display characters. Unfortunately, if you try to do the same in Apex, Salesforce will stop you short, since there's no way to tell it to ignore binary values that aren't character data. Instead of ignoring them, it raises an error when it encounters them. For example, you could try to convert the Attachment's Body field to a String directly using:

String pdf = attachment.Body.toString();

but it's going to give you an error saying the BLOB (binary data) is "not a valid UTF-8 string". This means that some of the byte sequences in the binary data don't map to any character it could put in a String (Salesforce uses UTF-8 string encoding), so it rejects the whole lot. Your text editor, on the other hand, just replaces these with whitespace and lets you view the valid ones.
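To see the difference between text and binary content for yourself, something like the following could be run as anonymous Apex. It is only a sketch: the two Blobs are made-up stand-ins (the hex value is just the first few bytes of a typical JPEG), and it assumes the platform rejects those bytes in the way described above.

```apex
// Two stand-in Blobs: plain text, and bytes that are not valid UTF-8.
Blob textData = Blob.valueOf('name,email');
Blob binaryData = EncodingUtil.convertFromHex('ffd8ffe0'); // start of a typical JPEG

// Works: every byte here is part of a valid UTF-8 sequence.
System.debug(textData.toString());

try {
    System.debug(binaryData.toString());
} catch (Exception e) {
    // Expected to land here with the "not a valid UTF-8 string" message.
    System.debug('toString() failed: ' + e.getMessage());
}
```

This matches what you are seeing: text and CSV files convert cleanly, while PDF, JPEG, and Word files contain bytes that cannot be decoded.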

You can turn binary data into a form that can be held in a String by base64-encoding it, but that's what you have already, and as you've found, turning all the binary data into valid display characters means encoding the whole thing, which makes it unreadable for your purposes.
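For the loop in the question, a minimal sketch of the base64 route might look like this. The map contents are a hypothetical stand-in, and whether the receiving endpoint accepts (and decodes) base64 in that parameter is an assumption you would need to check against its documentation.

```apex
// Hypothetical stand-in for the attachmentMap built in the question.
Map<String, Blob> attachmentMap = new Map<String, Blob>{
    'photo.jpg' => EncodingUtil.convertFromHex('ffd8ffe0')
};

String body = '';
for (String att : attachmentMap.keySet()) {
    // base64Encode accepts any bytes, so binary files do not raise the UTF-8 error.
    String content = EncodingUtil.base64Encode(attachmentMap.get(att));
    // base64 output can contain '+', '/' and '=', so it still needs URL encoding.
    body += '&files[' + EncodingUtil.urlEncode(att, 'UTF-8') + ']='
          + EncodingUtil.urlEncode(content, 'UTF-8');
}
System.debug(body);
```

This gets the bytes into the request without an exception, but the receiver sees base64 text rather than the raw file unless it decodes it.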

It would be great if you could step through the binary data in the Attachment Body one byte at a time, extract the bytes that are normal characters, and ignore the others. But Apex offers no direct way to do this.

The convoluted workaround that some have used involves turning the binary data into base64-encoded format (as you have), then hand-rolling a base64 decoder that picks through the encoded string piece by piece and gives you access to the individual decoded byte values. You would then be able to add your own logic to determine whether each byte is in the normal ASCII character range (0 to 127). (This appears to be how the values are stored in the first part of a PDF.)
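As a rough illustration of the same idea, here is a sketch that swaps the hand-rolled base64 decoder for EncodingUtil.convertToHex, which gives two hex digits per byte and is simpler to walk. It is not the approach from the linked answer. The sample Blob and the variable names are made up for illustration, and the choice of which byte values count as "normal" characters is an assumption.

```apex
// Stand-in data: the ASCII text "%PDF-1.4" followed by two non-ASCII bytes.
Blob data = EncodingUtil.convertFromHex('255044462d312e34ffd8');

String hexDigits = '0123456789abcdef';
// Two hex digits per byte; lower-cased so the lookup below always matches.
String hex = EncodingUtil.convertToHex(data).toLowerCase();

List<Integer> kept = new List<Integer>();
for (Integer i = 0; i < hex.length(); i += 2) {
    // Rebuild the byte value (0 to 255) from its two hex digits.
    Integer value = hexDigits.indexOf(hex.substring(i, i + 1)) * 16
                  + hexDigits.indexOf(hex.substring(i + 1, i + 2));
    // Keep printable ASCII plus tab, line feed, and carriage return.
    if ((value >= 32 && value <= 126) || value == 9 || value == 10 || value == 13) {
        kept.add(value);
    }
}

// Turn the surviving character codes back into a String.
String readable = String.fromCharArray(kept);
System.debug(readable);
```

Keep in mind that a loop like this touches every byte, so for large attachments it may run into CPU time or heap limits.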

Here's a Stack Exchange question where the accepted answer delves into this a bit: http://salesforce.stackexchange.com/questions/860/mimic-mysql-aes-encrypt-in-apex/910#910

Best Regards
Naga Kiran