Get the contents of a .docx attachment


I am trying to get the contents of an attachment using Apex.
For this I have used:

    // Encode the attachment Blob as Base64, then decode it back and read it as UTF-8 text.
    String myEncodedString = EncodingUtil.base64Encode(mailAttachment.Body);
    System.debug('myEncodedString +++++++ ' + myEncodedString);
    String attBody = EncodingUtil.base64Decode(myEncodedString).toString();

With this I am able to get the data from .csv and .txt (UTF-8 encoded) files, but I am not able to read the contents of .docx, .xlsx, .pdf, etc.
The error I get is 'BLOB is not a valid UTF-8 string'.
After a bit of research I found that, although the text inside formats like .docx is UTF-8, the files themselves are archive formats (similar to ZIP).
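A minimal sketch, assuming the attachment body is available as a `Blob` (as `mailAttachment.Body` is above): the Base64 round trip can be dropped, because decoding the encoded string just gives back the original Blob, and the UTF-8 conversion can be guarded so binary formats are detected instead of throwing. The class and method names are hypothetical, and the sketch assumes the error above surfaces as `System.StringException`.

```apex
public class AttachmentText {
    // Hypothetical helper: returns the body as UTF-8 text, or null when the bytes
    // are not valid UTF-8 (typical for .docx, .xlsx and .pdf files).
    public static String tryReadAsText(Blob body) {
        if (body == null) {
            return null;
        }
        try {
            // Blob.toString() decodes the bytes as UTF-8. Encoding to Base64 and
            // decoding again first would return the same Blob, so that step is skipped.
            return body.toString();
        } catch (StringException e) {
            // Assumed to be raised with "BLOB is not a valid UTF-8 string" for binary content.
            return null;
        }
    }
}
```

A caller can then branch on the result, e.g. `String attBody = AttachmentText.tryReadAsText(mailAttachment.Body);` and treat a `null` as "binary attachment, handle separately". This avoids the exception but does not extract any text from the archive formats.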

Can anybody help me get the contents of these formats?

Any help would be appreciated.

You may be able to convert the blob to hex [1] and then do something with that hex string. The problem will then be extracting the ZIP archive to get to the XML contents inside the file. I do not know of a way to do this in Apex, and I would not recommend even trying, since you will be severely limited by the CPU governor limit.

[1] https://developer.salesforce.com/forums/?id=906F0000000AYFTIA4 
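As a rough sketch of the hex route, assuming the body is a `Blob`: `EncodingUtil.convertToHex` turns each byte into two hex digits, which is enough to recognise the container from its leading "magic" bytes. ZIP-based files such as .docx and .xlsx start with `PK\x03\x04` (hex `504b0304`) and PDFs with `%PDF` (hex `25504446`). The entries inside a ZIP are normally DEFLATE-compressed, so the hex (or any string built from it) shows the container, not readable document text; decompressing is not attempted here. The class name is hypothetical.

```apex
public class AttachmentSniffer {
    // Hypothetical helper: returns a rough label for the container format
    // based on the first four bytes of the attachment.
    public static String detectFormat(Blob body) {
        if (body == null || body.size() < 4) {
            return 'unknown';
        }
        // convertToHex produces two characters per byte, so the whole blob roughly
        // doubles on the heap; keep that in mind for large attachments.
        String hex = EncodingUtil.convertToHex(body).toLowerCase();
        if (hex.startsWith('504b0304')) {
            return 'zip'; // "PK\x03\x04": .docx, .xlsx, .pptx are ZIP containers
        }
        if (hex.startsWith('25504446')) {
            return 'pdf'; // "%PDF"
        }
        return 'unknown';
    }
}
```

For example, `AttachmentSniffer.detectFormat(Blob.valueOf('%PDF-1.4'))` returns `'pdf'`. Knowing the format lets the code skip or forward binary attachments instead of failing, but reading the XML inside a .docx would still require decompressing the ZIP entries.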
Thanks, pcon.
I had already tried this solution, but I only get special characters in the output (the same as Firoz Khan in the link you provided).