sp08830

Reading contents of outlook message (.msg)

 

I need to parse the content of an attachment on a client record and look for a keyword. I am trying to do this in an after insert trigger on Attachment, without success so far. The attachment is an Outlook email saved as a .msg file.

 

Here is partial code. 

 

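for (Attachment E : Trigger.new) {
    // Id of the record the attachment is attached to.
    String parentObj = E.ParentId;
    // Decode the attachment body into a Blob.
    Blob attBody = EncodingUtil.base64Decode(E.Body.toString());
    // Convert the decoded bytes to a String so it can be searched for the keyword.
    String aContent = attBody.toString();
}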

 

 

Is it possible to read the contents of a .msg attachment? I was able to read text and CSV attachments.

 

I would really appreciate some sample code or any other help on this.

sfdcfox
I did some brief research for you on this subject, thinking it would be easy to parse, just like a .eml file (such as those produced by Thunderbird or older Outlook programs). I found out this is not the case.

.msg files are in Microsoft Compound File Binary format (MS-CFB). To put it bluntly, it is basically a disk image in a file, where folders are called "storages" and files are called "streams." This means that just to decipher the data, you have to read a binary header that describes the "files" and "folders" within the file, then seek to the relevant binary offset and decode the information.
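
To make the header step concrete, here is a minimal Apex sketch that checks whether an attachment body starts with the compound-file signature and reads the sector size from the header. The names MsgHeaderSketch, readLittleEndian and sectorSize are made up for illustration. The offsets (signature at byte 0, sector shift at bytes 30-31, a 512-byte header) are my reading of the MS-CFB header layout, so please check them against the spec before relying on them. Apex gives no direct byte access to a Blob, so the sketch works on the hex string from EncodingUtil.convertToHex.

```apex
public class MsgHeaderSketch {
    // The 8-byte compound-file signature D0 CF 11 E0 A1 B1 1A E1, as hex text.
    private static final String CFB_SIGNATURE = 'd0cf11e0a1b11ae1';
    private static final String HEX_DIGITS = '0123456789abcdef';

    // Reads byteCount bytes starting at byteOffset as a little-endian number.
    // Only meant for 1 to 3 bytes here, because Integer is signed 32-bit.
    public static Integer readLittleEndian(String hex, Integer byteOffset, Integer byteCount) {
        Integer value = 0;
        for (Integer i = byteCount - 1; i >= 0; i--) {
            Integer pos = (byteOffset + i) * 2;
            Integer high = HEX_DIGITS.indexOf(hex.substring(pos, pos + 1));
            Integer low = HEX_DIGITS.indexOf(hex.substring(pos + 1, pos + 2));
            value = value * 256 + high * 16 + low;
        }
        return value;
    }

    // Returns the sector size in bytes, or null if the body does not look
    // like a compound file.
    public static Integer sectorSize(Blob body) {
        String hex = EncodingUtil.convertToHex(body).toLowerCase();
        // The header is 512 bytes, which is 1024 hex characters.
        if (hex.length() < 1024 || !hex.startsWith(CFB_SIGNATURE)) {
            return null;
        }
        // Bytes 30-31 hold the sector shift; the sector size is 2^shift.
        Integer sectorShift = readLittleEndian(hex, 30, 2);
        return 1 << sectorShift;
    }
}
```

In the trigger, MsgHeaderSketch.sectorSize(E.Body) returning null would suggest the attachment is not a compound file at all. For a real .msg I would expect 512 or 4096. Note that the hex string is twice the size of the attachment, which counts against the heap limit.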

The only fortunate thing about this is the data can be found through indexes, all elements are of fixed or readable sizes, and are byte-aligned. I would anticipate, though, that this exercise is not going to be worth the effort for the end result. It is also possible that complex files might exceed the governor limits.

Outlook files appear to use storages and streams to store all their data, with binary and textual data intermingled. I wish you the best of luck if you'd like to attempt this.

Here's the relevant data:

http://www.fileformat.info/format/outlookmsg/

http://msdn.microsoft.com/en-us/library/dd942138.aspx

sp08830

Thanks for your research and effort. Is there a way to read at least the email header, say the subject of the email?

sfdcfox
Reading any of the data requires potentially parsing the entire file from front to back. There's no magic key that you can look for, no identifier that guarantees a quick seek. Just trying to get a single header takes at minimum dozens of lines of code, and probably around 10 script statements per 3 binary bytes to search through, limiting the usefulness to files of around 60,000 bytes or less.
sp08830

sfdcfox, Thanks for your time and effort. Really appreciate it.

 

One thing I didn't understand: when I print the string using System.debug, more than 90% of the email body is printed as plain text in the debug log. Why am I not able to read the content via Apex?

sfdcfox

Even though all the text is there, in plain sight, it's not as obvious to a computer program without going through the motions of parsing the file. Let me remind you that this file is actually a disk in a file, meaning it has a folder structure with file names, and so on. This is unlike other mail programs that store the raw message headers and information as a "replayable text file."
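
If all that is needed is a yes/no keyword check, a much cruder option is to skip the parse and scan the raw bytes for the keyword. This is a swapped-in heuristic, not a parser. The sketch below assumes the keyword is plain ASCII, and that the text is stored either as UTF-16LE or as single-byte text, which is my understanding of how .msg string properties are stored. It is case-sensitive, it will miss a keyword that straddles two pieces of a split stream, and it can produce false matches on unrelated binary data. MsgKeywordScanSketch and its method names are made up for illustration.

```apex
public class MsgKeywordScanSketch {
    // Hex of the keyword as single-byte text (valid for plain ASCII keywords).
    private static String narrowHex(String keyword) {
        return EncodingUtil.convertToHex(Blob.valueOf(keyword)).toLowerCase();
    }

    // Hex of the keyword as UTF-16LE: each ASCII byte followed by a 00 byte.
    private static String wideHex(String keyword) {
        String narrow = narrowHex(keyword);
        String wide = '';
        for (Integer i = 0; i < narrow.length(); i += 2) {
            wide += narrow.substring(i, i + 2) + '00';
        }
        return wide;
    }

    // True if needle occurs in hex starting on a byte boundary (even index).
    private static Boolean containsAligned(String hex, String needle) {
        Integer idx = hex.indexOf(needle);
        while (idx != -1) {
            if (Math.mod(idx, 2) == 0) {
                return true;
            }
            idx = hex.indexOf(needle, idx + 1);
        }
        return false;
    }

    // Case-sensitive raw scan of the attachment bytes for an ASCII keyword.
    public static Boolean mightContain(Blob body, String keyword) {
        if (String.isEmpty(keyword)) {
            return false;
        }
        String hex = EncodingUtil.convertToHex(body).toLowerCase();
        return containsAligned(hex, wideHex(keyword))
            || containsAligned(hex, narrowHex(keyword));
    }
}
```

A call such as MsgKeywordScanSketch.mightContain(E.Body, 'Invoice') only tells you the byte pattern appears somewhere in the file, not which field it came from, so treat a hit as a hint rather than a parsed result.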

 

If you were trying to parse a Mozilla Thunderbird email, it would be similar to parsing a regular CSV file. Split the file by newlines, stitch together multi-line headers, observe boundaries, etc. It's a simple plain-text format that's easy to parse. This is the technical equivalent of writing a program in C++ and reading a plain text file.
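
For contrast, here is a minimal Apex sketch of the plain-text case: split on newlines, stitch continuation lines onto the previous header, and stop at the blank line that ends the header block. It assumes the whole message is already available as one String. EmlHeaderSketch and parseHeaders are hypothetical names, and the sketch keeps only the last value of a repeated header (such as Received) and does not decode encoded words or MIME boundaries.

```apex
public class EmlHeaderSketch {
    // Returns header values keyed by lower-cased header name.
    public static Map<String, String> parseHeaders(String raw) {
        Map<String, String> headers = new Map<String, String>();
        String currentName = null;
        for (String line : raw.split('\r?\n')) {
            // A blank line separates the headers from the message body.
            if (line.length() == 0) {
                break;
            }
            // A line starting with a space or tab continues the previous header.
            if (line.startsWith(' ') || line.startsWith('\t')) {
                if (currentName != null) {
                    headers.put(currentName, headers.get(currentName) + ' ' + line.trim());
                }
                continue;
            }
            Integer colon = line.indexOf(':');
            if (colon <= 0) {
                continue; // not a "Name: value" line, skip it
            }
            currentName = line.substring(0, colon).trim().toLowerCase();
            headers.put(currentName, line.substring(colon + 1).trim());
        }
        return headers;
    }
}
```

With this, EmlHeaderSketch.parseHeaders(rawMessage).get('subject') is all it takes to get the subject, which is the kind of shortcut a .msg file does not offer.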

 

However, in a .msg file, you have to parse the main FAT (File Allocation Table) entry, which points to other FAT entries, which point to "directory chains" called "storages," which point to "files." Also, each "file" and "directory" may be split up into pieces, and these pieces may be organized any way that the program likes. While it is basically guaranteed that the pieces will be in some predefined order, their position within the file is unpredictable without parsing the entire file.
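
As a rough illustration of the chain-following involved, the Apex sketch below lists the sectors that make up the directory by walking the FAT. It is only the first step; reading the directory entries and then each stream's own chain would come after it. The header offsets (sector shift at bytes 30-31, first directory sector at bytes 48-51, FAT sector list from byte 76) and the marker values are my reading of the MS-CFB spec, and the sketch assumes all FAT sectors are listed in the header, which only holds for smaller files. CfbChainSketch, readLE and directoryChain are made-up names.

```apex
public class CfbChainSketch {
    private static final String HEX_DIGITS = '0123456789abcdef';
    // Sector numbers above this value are markers (end of chain, free, ...).
    private static final Long MAX_REGULAR_SECTOR = 4294967290L; // 0xFFFFFFFA

    // Reads a little-endian unsigned value; Long avoids Integer overflow.
    private static Long readLE(String hex, Integer byteOffset, Integer byteCount) {
        Long value = 0;
        for (Integer i = byteCount - 1; i >= 0; i--) {
            Integer pos = (byteOffset + i) * 2;
            Integer high = HEX_DIGITS.indexOf(hex.substring(pos, pos + 1));
            Integer low = HEX_DIGITS.indexOf(hex.substring(pos + 1, pos + 2));
            value = value * 256 + high * 16 + low;
        }
        return value;
    }

    // Lists the sectors that hold the directory, in chain order.
    public static List<Long> directoryChain(Blob body) {
        List<Long> chain = new List<Long>();
        String hex = EncodingUtil.convertToHex(body).toLowerCase();
        Integer totalBytes = hex.length() / 2;
        if (totalBytes < 512 || !hex.startsWith('d0cf11e0a1b11ae1')) {
            return chain;
        }
        // Header bytes 30-31: sector shift (9 means 512 bytes, 12 means 4096).
        Integer shift = readLE(hex, 30, 2).intValue();
        if (shift != 9 && shift != 12) {
            return chain;
        }
        Integer sectorSize = 1 << shift;
        Long entriesPerFatSector = sectorSize / 4;
        Integer maxSectors = totalBytes / sectorSize; // guards against cycles

        // Header bytes 48-51: first sector of the directory.
        Long sector = readLE(hex, 48, 4);
        while (sector <= MAX_REGULAR_SECTOR && chain.size() < maxSectors) {
            chain.add(sector);
            // Work out which FAT sector holds the entry for this sector.
            Long fatIndex = sector / entriesPerFatSector;
            if (fatIndex >= 109) {
                break; // FAT sectors listed outside the header are not handled
            }
            // Header bytes 76 onward: locations of the first 109 FAT sectors.
            Long fatSector = readLE(hex, 76 + 4 * fatIndex.intValue(), 4);
            // Sector n starts at file offset (n + 1) * sectorSize.
            Long entryOffset = (fatSector + 1) * sectorSize
                + 4 * Math.mod(sector, entriesPerFatSector);
            if (fatSector > MAX_REGULAR_SECTOR || entryOffset + 4 > totalBytes) {
                break; // corrupt or truncated file
            }
            // The entry holds the next sector in the chain, or an end marker.
            sector = readLE(hex, entryOffset.intValue(), 4);
        }
        return chain;
    }
}
```

Every stream in the file needs a similar walk, and small streams go through a second, finer-grained allocation table on top of this one, which is where the statement count adds up.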

 

This is roughly the same difficulty as writing a device driver to read files from a storage device. It's an entire order of magnitude more difficult. Not to mention, the only way to parse it correctly is through a ton of binary manipulation, so you're looking at a high probability of exceeding governor limits for any decent-sized email.

sp08830

Thanks for the insight.

varalaksshmi rajendran
Hi,

Have you found a solution for reading the content of a .msg file? I'm working on similar functionality, which retrieves the FromID, ToID, subject, and body of the mail. Please help if you have a solution for this.