sp08830
Reading the contents of an Outlook message (.msg)
I need to parse the content of an attachment on a client record and look for a keyword. I am trying to do this in an after insert trigger on Attachment, without success so far. The attachment is an Outlook email saved as a .msg file.
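Here is part of the code:
```apex
for (Attachment att : Trigger.new) {
    Id parentId = att.ParentId;
    // Body is already a Blob holding the raw file bytes; no Base64 decoding is needed.
    Blob attBody = att.Body;
    // Fine for text and CSV files. A binary .msg file is not valid UTF-8,
    // so toString() is likely to throw or return garbled text.
    String aContent = attBody.toString();
}
```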
I'm wondering whether it is possible to read the contents of a .msg attachment. I was able to read text and CSV attachments.
I would really appreciate some sample code or any other help with this.
.msg files are in the Microsoft Compound File Binary format (MS-CFB). To put it bluntly, such a file is basically a disk image inside a file, where folders are called "storages" and files are called "streams." This means that just to decipher the data, you have to read a binary header that describes the "files" and "folders" within the file, then seek to the binary offsets it lists and decode the information there.
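As a rough illustration of that first step, here is a minimal sketch. It is written in Java rather than Apex so that it can run outside a Salesforce org, and the class and field names are hypothetical. It checks the 8-byte CFB signature and reads a few fixed-offset, little-endian fields from the 512-byte header, with offsets taken from the MS-CFB header layout. A port to Apex would have to emulate this byte arithmetic, because Apex has no byte-array type. That is part of why the approach gets expensive there.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Sketch: reads selected fields of the fixed 512-byte MS-CFB header. */
final class CfbHeader {
    // Every CFB file (and therefore every .msg file) starts with these 8 bytes.
    static final byte[] SIGNATURE = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    final int majorVersion;         // 3 (512-byte sectors) or 4 (4096-byte sectors)
    final int sectorSize;           // 1 << sector shift
    final int miniSectorSize;       // 1 << mini sector shift, normally 64
    final int fatSectorCount;       // number of sectors that hold the FAT
    final int firstDirectorySector; // sector where the directory chain starts
    final int miniStreamCutoff;     // streams smaller than this live in the mini stream

    private CfbHeader(ByteBuffer buf) {
        majorVersion = buf.getShort(0x1A) & 0xFFFF;
        sectorSize = 1 << (buf.getShort(0x1E) & 0xFFFF);
        miniSectorSize = 1 << (buf.getShort(0x20) & 0xFFFF);
        fatSectorCount = buf.getInt(0x2C);
        firstDirectorySector = buf.getInt(0x30);
        miniStreamCutoff = buf.getInt(0x38);
    }

    static boolean hasSignature(byte[] data) {
        if (data.length < SIGNATURE.length) return false;
        for (int i = 0; i < SIGNATURE.length; i++) {
            if (data[i] != SIGNATURE[i]) return false;
        }
        return true;
    }

    static CfbHeader parse(byte[] data) {
        if (data.length < 512 || !hasSignature(data)) {
            throw new IllegalArgumentException("not a Compound File Binary file");
        }
        // All multi-byte header fields are little-endian.
        return new CfbHeader(ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN));
    }
}
```

For a typical version 3 file, `CfbHeader.parse(bytes).sectorSize` would be 512. A plain text or CSV attachment fails `hasSignature`, which gives a cheap way to tell the two kinds of attachment apart.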
The only fortunate thing is that the data can be found through indexes, all elements are of fixed or readable sizes, and everything is byte-aligned. I would anticipate, though, that this exercise is not going to be worth the effort for the end result. Large or complex files might also exceed the Apex governor limits (heap size and CPU time in particular).
Outlook files appear to use storages and streams to store all of their data, with binary and textual data intermingled. I wish you the best of luck if you decide to attempt this.
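The "storages" and "streams" are described by 128-byte directory entries. These entries are packed into directory sectors, sectorSize / 128 entries per sector, starting at the first directory sector named in the header. The Java sketch below decodes one entry according to the MS-CFB directory entry layout. The class name is hypothetical, and walking the sibling/child tree to list every storage and stream is not shown.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/** Sketch: decodes one 128-byte MS-CFB directory entry (a storage or a stream). */
final class DirEntry {
    static final int SIZE = 128;
    // Object type values stored in the byte at offset 0x42.
    static final int UNALLOCATED = 0, STORAGE = 1, STREAM = 2, ROOT_STORAGE = 5;
    static final int NOSTREAM = 0xFFFFFFFF; // "no sibling/child" marker

    final String name;
    final int type;
    final int leftSibling, rightSibling, child; // entry indexes forming a red-black tree
    final int startSector;                      // first (mini) sector of the stream data
    final long size;                            // stream size in bytes

    /** Decodes the entry that starts at data[offset]. */
    DirEntry(byte[] data, int offset) {
        // slice() makes index 0 refer to data[offset]; it also resets the byte
        // order to big-endian, so little-endian is set afterwards.
        ByteBuffer b = ByteBuffer.wrap(data, offset, SIZE).slice().order(ByteOrder.LITTLE_ENDIAN);
        int nameBytes = b.getShort(0x40) & 0xFFFF; // UTF-16LE length incl. 2-byte terminator
        int len = Math.max(0, Math.min(nameBytes, 64) - 2);
        name = new String(data, offset, len, StandardCharsets.UTF_16LE);
        type = b.get(0x42) & 0xFF;
        leftSibling = b.getInt(0x44);
        rightSibling = b.getInt(0x48);
        child = b.getInt(0x4C);
        startSector = b.getInt(0x74);
        // Only the low 32 bits are read; some version 3 writers leave the high bits uninitialized.
        size = b.getInt(0x78) & 0xFFFFFFFFL;
    }

    boolean isStorage() { return type == STORAGE || type == ROOT_STORAGE; }
    boolean isStream() { return type == STREAM; }
}
```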
Here are the relevant references:
http://www.fileformat.info/format/outlookmsg/
http://msdn.microsoft.com/en-us/library/dd942138.aspx
Thanks for your research and effort. Is there a way to read at least the email header, for example the subject of the email?
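Reading only the subject is easier than reading the full message, but locating the right stream still requires the header, directory, and FAT walking described in this thread. Under the MS-OXMSG format, each string property of the message sits in its own stream. The stream name is `__substg1.0_` followed by the 4-hex-digit MAPI property ID and the 4-hex-digit property type. The subject is PR_SUBJECT (0x0037), stored either as type 0x001F (UTF-16LE) or as 0x001E (8-bit text in the message's code page). The helper below, whose names are hypothetical, covers only that last step: which stream name to look for and how to decode its bytes. The caller is assumed to know the code page for 8-bit strings.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

/** Sketch: naming and decoding of string-property streams inside a .msg file. */
final class MsgProperty {
    static final int PT_STRING8 = 0x001E; // 8-bit string in the message's code page
    static final int PT_UNICODE = 0x001F; // UTF-16LE string
    static final int PR_SUBJECT = 0x0037; // MAPI property ID of the subject

    /** Name of the CFB stream holding a variable-length property, e.g. __substg1.0_0037001F. */
    static String streamName(int propertyId, int propertyType) {
        return String.format(Locale.ROOT, "__substg1.0_%04X%04X", propertyId, propertyType);
    }

    /** Decodes a string property's raw stream bytes, trimming any trailing NULs. */
    static String decodeString(byte[] raw, int propertyType, Charset codePage) {
        Charset cs = propertyType == PT_UNICODE ? StandardCharsets.UTF_16LE : codePage;
        String s = new String(raw, cs);
        int end = s.length();
        while (end > 0 && s.charAt(end - 1) == '\0') end--;
        return s.substring(0, end);
    }
}
```

A reader would look for `streamName(PR_SUBJECT, PT_UNICODE)` first and fall back to the PT_STRING8 name if that stream does not exist.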
sfdcfox, thanks for your time and effort. I really appreciate it.
One thing I don't understand: when I print the string using System.debug, more than 90% of the email body appears as plain text in the debug log. Why am I not able to read the content via Apex?
Even though all the text is there, in plain sight, it's not as obvious to a computer program without going through the motions of parsing the file. Let me remind you that this file is actually a disk in a file, meaning it has a folder structure with file names, and so on. This is unlike other mail programs that store the raw message headers and information as a "replayable text file."
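Because most string properties are stored as unencoded UTF-16LE or single-byte text, a crude keyword search over the raw bytes can sometimes serve the original goal of finding a keyword without parsing the file at all. This is a heuristic, not a .msg parser. It can miss a keyword that straddles two non-adjacent sectors or that is stored only in compressed form (for example a compressed RTF body), and it can match text in unrelated streams. A minimal Java sketch, with a hypothetical class name:

```java
import java.nio.charset.StandardCharsets;

/**
 * Heuristic sketch, NOT a .msg parser: looks for a keyword encoded as UTF-16LE
 * or as single-byte text anywhere in the raw file bytes. Case-sensitive.
 */
final class KeywordScan {
    static boolean containsKeyword(byte[] data, String keyword) {
        return indexOf(data, keyword.getBytes(StandardCharsets.UTF_16LE)) >= 0
            || indexOf(data, keyword.getBytes(StandardCharsets.ISO_8859_1)) >= 0;
    }

    // Naive byte-array search; O(n * m) in the worst case, which is fine for a sketch.
    static int indexOf(byte[] haystack, byte[] needle) {
        if (needle.length == 0) return 0;
        outer:
        for (int i = 0; i + needle.length <= haystack.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (haystack[i + j] != needle[j]) continue outer;
            }
            return i;
        }
        return -1;
    }
}
```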
If you were trying to parse a Mozilla Thunderbird email, it would be similar to parsing a regular CSV file: split the file by newlines, stitch together multi-line headers, observe boundaries, etc. It's a simple plain-text format. Technically, this is equivalent to writing a C++ program that reads a plain text file.
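For comparison, the plain-text approach just described could look like the minimal Java sketch below. It splits the message into lines, stitches folded multi-line headers back together, and stops at the blank line that ends the headers. The sketch assumes an RFC 5322-style message as stored in an mbox file, and it skips the mbox `From ` separator line. It keeps only the last value of repeated headers such as Received and does not handle MIME boundaries in the body. The class name is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

final class PlainHeaders {
    /** Parses the header block of a plain-text message into lower-cased name -> value. */
    static Map<String, String> parse(String message) {
        Map<String, String> headers = new LinkedHashMap<>();
        String lastName = null;
        for (String line : message.split("\r?\n", -1)) {
            if (line.isEmpty()) {
                break; // a blank line separates the headers from the body
            }
            char first = line.charAt(0);
            if ((first == ' ' || first == '\t') && lastName != null) {
                // Folded continuation line: stitch it onto the previous header.
                headers.merge(lastName, " " + line.trim(), String::concat);
            } else if (line.startsWith("From ")) {
                continue; // mbox separator line, not a header
            } else {
                int colon = line.indexOf(':');
                if (colon <= 0) {
                    continue; // not a header line; skip it
                }
                lastName = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
                headers.put(lastName, line.substring(colon + 1).trim()); // repeats overwrite
            }
        }
        return headers;
    }
}
```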
However, in a .msg file, you have to parse the main FAT (File Allocation Table) entries listed in the header, which point to other FAT entries, which point to "directory chains" made up of "storages," which point to "files" (streams). Also, each "file" and "directory" may be split into pieces (sectors), and these pieces may be laid out in the file however the writing program likes. While the pieces are always chained in a defined order, their positions within the file cannot be predicted without parsing the allocation tables.
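The chain-following part can be sketched as follows. A FAT is an array in which entry N holds the number of the sector that follows sector N in the same stream, and ENDOFCHAIN (0xFFFFFFFE) marks the last sector. Sector N begins at byte offset (N + 1) * sectorSize, because the header occupies the slot before sector 0. This Java sketch assumes the FAT has already been assembled into an `int[]` from the sectors listed in the header's DIFAT, a step that is not shown. It also ignores the mini stream, which stores streams smaller than the 4096-byte cutoff using a second, mini FAT with 64-byte sectors but the same chaining logic. Class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class FatChain {
    static final int ENDOFCHAIN = 0xFFFFFFFE; // marks the last sector of a chain
    static final int FREESECT = 0xFFFFFFFF;   // unused sector

    /** Returns the sector IDs of one stream, in order, starting at 'start'. */
    static List<Integer> follow(int[] fat, int start) {
        List<Integer> chain = new ArrayList<>();
        int sector = start;
        while (sector != ENDOFCHAIN) {
            // FREESECT and the other special markers are negative as Java ints.
            if (sector < 0 || sector >= fat.length || chain.size() >= fat.length) {
                throw new IllegalStateException("corrupt or cyclic FAT chain at sector " + sector);
            }
            chain.add(sector);
            sector = fat[sector];
        }
        return chain;
    }

    /** Concatenates a stream's sectors and trims the result to its declared size. */
    static byte[] readStream(byte[] file, int[] fat, int start, long size, int sectorSize) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int sector : follow(fat, start)) {
            long offset = (long) (sector + 1) * sectorSize; // header fills the slot before sector 0
            if (offset + sectorSize > file.length) {
                throw new IllegalStateException("sector " + sector + " lies past the end of the file");
            }
            out.write(file, (int) offset, sectorSize);
        }
        byte[] all = out.toByteArray();
        return Arrays.copyOf(all, (int) Math.min(size, all.length));
    }
}
```

The chain check matters in practice. Without it, a corrupt or hostile attachment could send the loop around forever, which in Apex would simply burn through the CPU limit.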
This is roughly the same difficulty as writing a device driver to read files from a storage device: an entire order of magnitude harder. Not to mention, the only way to parse it correctly is through a ton of binary manipulation, so you're looking at a high probability of exceeding governor limits for any decent-sized email.
Thanks for the insight.
Have you found any solution for reading the content of a .msg file? I'm working on similar functionality that retrieves the FromID, ToID, subject, and body of the mail. Please help if you have a solution for this.
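Once a CFB reader has produced the top-level streams as a map from stream name to bytes, picking out the requested fields is mostly a lookup of MAPI property IDs. The sketch below assumes such a map already exists, since the CFB reading itself is the hard part discussed above. It decodes only the Unicode (type 001F) variants and uses hypothetical names. PR_DISPLAY_TO holds recipient display names rather than addresses. Per MS-OXMSG, individual recipients live in substorages named `__recip_version1.0_#` plus an 8-digit hex index. For Exchange senders, PR_SENDER_EMAIL_ADDRESS may hold an Exchange-style address instead of an SMTP one.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

final class MsgFields {
    // MAPI property IDs for the fields asked about; all are string properties.
    static final Map<String, Integer> PROPS = new LinkedHashMap<>();
    static {
        PROPS.put("subject", 0x0037);     // PR_SUBJECT
        PROPS.put("senderName", 0x0C1A);  // PR_SENDER_NAME
        PROPS.put("senderEmail", 0x0C1F); // PR_SENDER_EMAIL_ADDRESS
        PROPS.put("displayTo", 0x0E04);   // PR_DISPLAY_TO (display names only)
        PROPS.put("body", 0x1000);        // PR_BODY (may be absent if only RTF/HTML is stored)
    }

    /** Picks fields out of already-extracted top-level streams (stream name -> bytes). */
    static Map<String, String> extract(Map<String, byte[]> topLevelStreams) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> p : PROPS.entrySet()) {
            String stream = String.format(Locale.ROOT, "__substg1.0_%04X001F", p.getValue());
            byte[] raw = topLevelStreams.get(stream);
            if (raw != null) {
                // UTF-16LE text; strip any NUL terminator that may be present.
                out.put(p.getKey(), new String(raw, StandardCharsets.UTF_16LE).replace("\0", ""));
            }
        }
        return out;
    }
}
```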