lucky41

Parse incoming dynamic HTML email content and populate it into fields

Hello everyone,

I am having difficulty parsing an incoming email alert. I tried using an email service with an Apex class behind it to parse the content. The incoming email alert looks like this:

 

Data Integrity Reporting

Description:

The following is the user's data.

 

Result:

 

ContactName   ContactPhone   ContractID   Description   CreatedDT    ContractNumber
Testname 1    123-456-7890   132324       test1         05/13/2013   CT-1442
Testname 2    124-421-4124   124242       test2         05/14/2013   CT-1344
Testname 3    421-323-3242   421442       test3         05/15/2013   CT-1332

 

Additional Info:

Alarm execution took 1 seconds

 

 

But I am dealing with a different scenario here. In the HTML email above, the number of rows under "Result" is not constant; it varies from one incoming email to the next. For example, an email received by Salesforce can contain zero rows, one row, three rows, and so on. The columns are constant.

Can anyone suggest how to handle this kind of scenario? I would really appreciate it.

 

Thanks

Best Answer chosen by Admin (Salesforce Developers) 
sfdcfox

I managed to shoehorn your code into a working prototype. I had to add a character entity definition to the XML, and I also had to strip out the hr and br elements, which were problematic. There are still some parsing errors, but I decided to ignore them for now because they are irrelevant to the demonstration.

 

Usage:

 

String[] tableRows = Scanner.scan(sourceText);

 Result (from test data):

 

ContactName;ContactEmail;ContactPhone;ContactID;Description;CreateDT;ContractNumber;
testname1;test1@gmail.com;1323737698;41452;test 1 desc;5/14/2013 8:58:42 PM;CT-285;
testname2;testname2@gmail.com;424-962-7311;14423;test2 desc;5/13/2013 9:01:21 PM;CT-2858;

You could use a multidimensional array, but I was lazy and just made them ordinary delimited strings.
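If a nested list is preferred over delimited strings, the rows can be split after the fact. This is only a sketch: the class name DelimitedRows and the method toTable are made up for illustration, and it assumes no cell value itself contains a semicolon.

```apex
public class DelimitedRows {
    // Splits each 'a;b;c;' row (the shape Scanner.scan returns) into its cells.
    // Assumption: ';' never appears inside a cell value.
    public static List<List<String>> toTable(List<String> rows) {
        List<List<String>> table = new List<List<String>>();
        for (String row : rows) {
            // removeEnd drops the trailing delimiter; the -1 limit keeps
            // empty cells instead of discarding trailing empty strings.
            table.add(row.removeEnd(';').split(';', -1));
        }
        return table;
    }
}
```

With the test data above, DelimitedRows.toTable(tableRows)[1][1] should be the ContactEmail cell of the first data row, since index 0 holds the header row.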

 

 Edit: And the source code...

 

@isTest
class ScannerTest {
        @isTest
        static void test() {
            String[] results = Scanner.scan('<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40"><head><meta http-equiv="Content-Type" content="text/html; charset=us-ascii"><meta name="Generator" content="Microsoft Word 14 (filtered medium)"><style><!--/* Font Definitions */@font-face{font-family:Calibri;panose-1:2 15 5 2 2 2 4 3 2 4;}/* Style Definitions */p.MsoNormal, li.MsoNormal, div.MsoNormal{margin:0in;margin-bottom:.0001pt;font-size:12.0pt;font-family:"Times New Roman","serif";}h3{mso-style-priority:9;mso-style-link:"Heading 3 Char";mso-margin-top-alt:auto;margin-right:0in;mso-margin-bottom-alt:auto;margin-left:0in;font-size:13.5pt;font-family:"Times New Roman","serif";}a:link, span.MsoHyperlink{mso-style-priority:99;color:blue;text-decoration:underline;}a:visited, span.MsoHyperlinkFollowed{mso-style-priority:99;color:purple;text-decoration:underline;}span.EmailStyle17{mso-style-type:personal-compose;font-family:"Calibri","sans-serif";color:windowtext;}span.Heading3Char{mso-style-name:"Heading 3 Char";mso-style-priority:9;mso-style-link:"Heading 3";font-family:"Times New Roman","serif";font-weight:bold;}.MsoChpDefault{mso-style-type:export-only;font-family:"Calibri","sans-serif";}@page WordSection1{size:8.5in 11.0in;margin:1.0in 1.0in 1.0in 1.0in;}div.WordSection1{page:WordSection1;}--></style><!--[if gte mso 9]><xml><o:shapedefaults v:ext="edit" spidmax="1026" /></xml><![endif]--><!--[if gte mso 9]><xml><o:shapelayout v:ext="edit"><o:idmap v:ext="edit" data="1" /></o:shapelayout></xml><![endif]--></head><body lang="EN-US" link="blue" vlink="purple"><div class="WordSection1"><h3 align="center" style="text-align:center"><span style="color:#FF6600">Data Integrity Reporting</span><o:p></o:p></h3><p class="MsoNormal"><b><span 
style="color:#FF6600">Description:</span></b><br>The following is the user\'s data<br><br><b><span style="color:#FF6600">Result:</span></b> <o:p></o:p></p><table class="MsoNormalTable" border="1" cellpadding="0"><tbody><tr><td style="background:orange;padding:.75pt .75pt .75pt .75pt"><p class="MsoNormal" align="center" style="text-align:center"><b>ContactName<o:p></o:p></b></p></td><td style="background:orange;padding:.75pt .75pt .75pt .75pt"><p class="MsoNormal" align="center" style="text-align:center"><b>ContactEmail<o:p></o:p></b></p></td><td style="background:orange;padding:.75pt .75pt .75pt .75pt"><p class="MsoNormal" align="center" style="text-align:center"><b>ContactPhone<o:p></o:p></b></p></td><td style="background:orange;padding:.75pt .75pt .75pt .75pt"><p class="MsoNormal" align="center" style="text-align:center"><b>ContactID<o:p></o:p></b></p></td><td style="background:orange;padding:.75pt .75pt .75pt .75pt"><p class="MsoNormal" align="center" style="text-align:center"><b>Description<o:p></o:p></b></p></td><td style="background:orange;padding:.75pt .75pt .75pt .75pt"><p class="MsoNormal" align="center" style="text-align:center"><b>CreateDT<o:p></o:p></b></p></td><td style="background:orange;padding:.75pt .75pt .75pt .75pt"><p class="MsoNormal" align="center" style="text-align:center"><b>ContractNumber<o:p></o:p></b></p></td></tr><tr><td style="background:lightblue;padding:.75pt .75pt .75pt .75pt"><p class="MsoNormal">testname1<o:p></o:p></p></td><td style="background:lightblue;padding:.75pt .75pt .75pt .75pt"><p class="MsoNormal"><a href="mailto:test1@gmail.com">test1@gmail.com</a><o:p></o:p></p></td><td style="background:lightblue;padding:.75pt .75pt .75pt .75pt"><p class="MsoNormal">1323737698<o:p></o:p></p></td><td style="background:lightblue;padding:.75pt .75pt .75pt .75pt"><p class="MsoNormal">41452<o:p></o:p></p></td><td style="background:lightblue;padding:.75pt .75pt .75pt .75pt"><p class="MsoNormal">test 1 desc<o:p></o:p></p></td><td 
style="background:lightblue;padding:.75pt .75pt .75pt .75pt"><p class="MsoNormal">5/14/2013 8:58:42 PM<o:p></o:p></p></td><td style="background:lightblue;padding:.75pt .75pt .75pt .75pt"><p class="MsoNormal">CT-285<o:p></o:p></p></td></tr><tr><td style="background:lightblue;padding:.75pt .75pt .75pt .75pt"><p class="MsoNormal">testname2<o:p></o:p></p></td><td style="background:lightblue;padding:.75pt .75pt .75pt .75pt"><p class="MsoNormal"><a href="mailto:testname2@gmail.com">testname2@gmail.com</a><o:p></o:p></p></td><td style="background:lightblue;padding:.75pt .75pt .75pt .75pt"><p class="MsoNormal">424-962-7311<o:p></o:p></p></td><td style="background:lightblue;padding:.75pt .75pt .75pt .75pt"><p class="MsoNormal">14423<o:p></o:p></p></td><td style="background:lightblue;padding:.75pt .75pt .75pt .75pt"><p class="MsoNormal">test2 desc<o:p></o:p></p></td><td style="background:lightblue;padding:.75pt .75pt .75pt .75pt"><p class="MsoNormal">5/13/2013 9:01:21 PM<o:p></o:p></p></td><td style="background:lightblue;padding:.75pt .75pt .75pt .75pt"><p class="MsoNormal">CT-2858<o:p></o:p></p></td></tr></tbody></table><p class="MsoNormal"><br><b><span style="color:#FF6600">Additional Info:</span></b><br>Alarm execution took 1 seconds <o:p></o:p></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;"><o:p>&nbsp;</o:p></span></p></div></body></html>');
            System.debug(results);
        }
}

 

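public class Scanner {
    // Returns one string per <tr>, with a ';' appended after each <td>.
    public static String[] scan(String source) {
        // Declare the &nbsp; entity so the XML parser does not reject it.
        String src = '<!DOCTYPE html [<!ENTITY nbsp " ">]>' + source;
        String[] values = new String[0];
        // Drop unclosed <br> and <hr> elements, which are not well-formed XML.
        src = src.replaceAll('<(br|hr)>', '');
        System.debug(System.LoggingLevel.ERROR, src);
        XmlStreamReader r = new XmlStreamReader(src);
        Integer retry = 0;
        Boolean inRow = false, inCol = false;
        // Stop after three consecutive parse errors; a successful read resets the count.
        while (r.hasNext() && retry < 3) {
            try {
                r.next();
                retry = 0;
                if (r.getEventType() == XmlTag.START_ELEMENT && r.getLocalName() == 'tr') {
                    // A new table row starts a new output string.
                    values.add('');
                    inRow = true;
                }
                if (r.getEventType() == XmlTag.END_ELEMENT && r.getLocalName() == 'tr') {
                    inRow = inCol = false;
                }
                if (r.getEventType() == XmlTag.START_ELEMENT && r.getLocalName() == 'td') {
                    inCol = true;
                }
                if (r.getEventType() == XmlTag.END_ELEMENT && r.getLocalName() == 'td') {
                    // Close the cell with a delimiter.
                    inCol = false;
                    if (!values.isEmpty()) {
                        values[values.size() - 1] += ';';
                    }
                }
                if (inRow && inCol && r.getEventType() == XmlTag.CHARACTERS) {
                    // Text anywhere inside the current cell is appended to the current row.
                    values[values.size() - 1] += r.getText();
                }
            } catch (Exception e) {
                // Parse errors are ignored up to the retry limit.
                retry++;
            }
        }
        return values;
    }
}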

I apologize for the "brute force" appearance of this code; I just don't have a lot of time on my hands for code this size. This sample should work well for you, though. See the documentation on XmlStreamReader for more details.
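To tie this back to the email service from the original question, a handler along these lines could feed the HTML body to Scanner.scan. It is a sketch under assumptions: ReportEmailHandler and dataRows are hypothetical names, the first scanned row is assumed to be the header, and the mapping of cells to fields is left as a comment because the target object is not described in this thread.

```apex
global class ReportEmailHandler implements Messaging.InboundEmailHandler {

    // Drops the header row. Returns an empty list when the table has no
    // data rows, so a report with zero results needs no special handling.
    public static List<String> dataRows(List<String> scanned) {
        List<String> rows = new List<String>();
        for (Integer i = 1; i < scanned.size(); i++) {
            rows.add(scanned[i]);
        }
        return rows;
    }

    global Messaging.InboundEmailResult handleInboundEmail(
            Messaging.InboundEmail email, Messaging.InboundEnvelope envelope) {
        Messaging.InboundEmailResult result = new Messaging.InboundEmailResult();
        if (String.isBlank(email.htmlBody)) {
            // A plain-text-only email has no table to scan.
            result.success = false;
            result.message = 'No HTML body found.';
            return result;
        }
        for (String row : dataRows(Scanner.scan(email.htmlBody))) {
            // Cell order follows the header row of the report.
            List<String> cells = row.split(';');
            // Assumption: copy the cells into the fields of your own object here.
            System.debug(cells);
        }
        result.success = true;
        return result;
    }
}
```

Because the loop runs once per data row, the same handler covers emails with zero, one, or many rows.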

All Answers

sfdcfox
I'd want a look at the raw HTML, but basically you should be able to use the XmlStreamReader class in Apex to parse the HTML. Check out the XmlStreamReader class in the docs.
lucky41
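As a rough illustration of that suggestion, the loop below pulls the text of each td out of a small, well-formed fragment. CellReader and cells are illustrative names only, and real email HTML will usually need cleaning before XmlStreamReader accepts it.

```apex
public class CellReader {
    // Returns the text content of every <td> in a well-formed XML fragment.
    public static List<String> cells(String xml) {
        List<String> out = new List<String>();
        XmlStreamReader reader = new XmlStreamReader(xml);
        Boolean inCell = false;
        while (reader.hasNext()) {
            reader.next();
            if (reader.getEventType() == XmlTag.START_ELEMENT
                    && reader.getLocalName() == 'td') {
                // Each cell gets its own entry, even if it turns out to be empty.
                inCell = true;
                out.add('');
            } else if (reader.getEventType() == XmlTag.END_ELEMENT
                    && reader.getLocalName() == 'td') {
                inCell = false;
            } else if (inCell && reader.getEventType() == XmlTag.CHARACTERS) {
                // Text nested in child elements such as <p> is collected too.
                out[out.size() - 1] += reader.getText();
            }
        }
        return out;
    }
}
```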

Hi sfdcfox,

 

Thanks for the reply. The raw HTML looks like this. The values in the <table> vary: it can contain zero rows, one row, two rows, and so on.

 

<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<meta name="Generator" content="Microsoft Word 14 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
margin-bottom:.0001pt;
font-size:12.0pt;
font-family:"Times New Roman","serif";}
h3
{mso-style-priority:9;
mso-style-link:"Heading 3 Char";
mso-margin-top-alt:auto;
margin-right:0in;
mso-margin-bottom-alt:auto;
margin-left:0in;
font-size:13.5pt;
font-family:"Times New Roman","serif";}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
span.EmailStyle17
{mso-style-type:personal-compose;
font-family:"Calibri","sans-serif";
color:windowtext;}
span.Heading3Char
{mso-style-name:"Heading 3 Char";
mso-style-priority:9;
mso-style-link:"Heading 3";
font-family:"Times New Roman","serif";
font-weight:bold;}
.MsoChpDefault
{mso-style-type:export-only;
font-family:"Calibri","sans-serif";}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-US" link="blue" vlink="purple">
<div class="WordSection1">
<h3 align="center" style="text-align:center"><span style="color:#FF6600">Data Integrity Reporting</span><o:p></o:p></h3>
<p class="MsoNormal"><b><span style="color:#FF6600">Description:</span></b><br>
The following is the user's data
<br>
<br>
<b><span style="color:#FF6600">Result:</span></b> <o:p></o:p></p>
<table class="MsoNormalTable" border="1" cellpadding="0">
<tbody>
<tr>
<td style="background:orange;padding:.75pt .75pt .75pt .75pt">
<p class="MsoNormal" align="center" style="text-align:center"><b>ContactName<o:p></o:p></b></p>
</td>
<td style="background:orange;padding:.75pt .75pt .75pt .75pt">
<p class="MsoNormal" align="center" style="text-align:center"><b>ContactEmail<o:p></o:p></b></p>
</td>
<td style="background:orange;padding:.75pt .75pt .75pt .75pt">
<p class="MsoNormal" align="center" style="text-align:center"><b>ContactPhone<o:p></o:p></b></p>
</td>
<td style="background:orange;padding:.75pt .75pt .75pt .75pt">
<p class="MsoNormal" align="center" style="text-align:center"><b>ContactID<o:p></o:p></b></p>
</td>
<td style="background:orange;padding:.75pt .75pt .75pt .75pt">
<p class="MsoNormal" align="center" style="text-align:center"><b>Description<o:p></o:p></b></p>
</td>
<td style="background:orange;padding:.75pt .75pt .75pt .75pt">
<p class="MsoNormal" align="center" style="text-align:center"><b>CreateDT<o:p></o:p></b></p>
</td>
<td style="background:orange;padding:.75pt .75pt .75pt .75pt">
<p class="MsoNormal" align="center" style="text-align:center"><b>ContractNumber<o:p></o:p></b></p>
</td>
</tr>
<tr>
<td style="background:lightblue;padding:.75pt .75pt .75pt .75pt">
<p class="MsoNormal">testname1<o:p></o:p></p>
</td>
<td style="background:lightblue;padding:.75pt .75pt .75pt .75pt">
<p class="MsoNormal"><a href="mailto:test1@gmail.com">test1@gmail.com</a><o:p></o:p></p>
</td>
<td style="background:lightblue;padding:.75pt .75pt .75pt .75pt">
<p class="MsoNormal">1323737698<o:p></o:p></p>
</td>
<td style="background:lightblue;padding:.75pt .75pt .75pt .75pt">
<p class="MsoNormal">41452<o:p></o:p></p>
</td>
<td style="background:lightblue;padding:.75pt .75pt .75pt .75pt">
<p class="MsoNormal">test 1 desc<o:p></o:p></p>
</td>
<td style="background:lightblue;padding:.75pt .75pt .75pt .75pt">
<p class="MsoNormal">5/14/2013 8:58:42 PM<o:p></o:p></p>
</td>
<td style="background:lightblue;padding:.75pt .75pt .75pt .75pt">
<p class="MsoNormal">CT-285<o:p></o:p></p>
</td>
</tr>
<tr>
<td style="background:lightblue;padding:.75pt .75pt .75pt .75pt">
<p class="MsoNormal">testname2<o:p></o:p></p>
</td>
<td style="background:lightblue;padding:.75pt .75pt .75pt .75pt">
<p class="MsoNormal"><a href="mailto:testname2@gmail.com">testname2@gmail.com</a><o:p></o:p></p>
</td>
<td style="background:lightblue;padding:.75pt .75pt .75pt .75pt">
<p class="MsoNormal">424-962-7311<o:p></o:p></p>
</td>
<td style="background:lightblue;padding:.75pt .75pt .75pt .75pt">
<p class="MsoNormal">14423<o:p></o:p></p>
</td>
<td style="background:lightblue;padding:.75pt .75pt .75pt .75pt">
<p class="MsoNormal">test2 desc<o:p></o:p></p>
</td>
<td style="background:lightblue;padding:.75pt .75pt .75pt .75pt">
<p class="MsoNormal">5/13/2013 9:01:21 PM<o:p></o:p></p>
</td>
<td style="background:lightblue;padding:.75pt .75pt .75pt .75pt">
<p class="MsoNormal">CT-2858<o:p></o:p></p>
</td>
</tr>
</tbody>
</table>
<p class="MsoNormal"><br>
<b><span style="color:#FF6600">Additional Info:</span></b><br>
Alarm execution took 1 seconds <o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;"><o:p>&nbsp;</o:p></span></p>
</div>
</body>
</html>

 

 

 

Thanks

sfdcfox

This code is made non-trivial due to the extra fluff that Microsoft adds while composing this email. Regardless, basically, you'd want to note when you first encounter a TR element, loop through each TD, stripping out the text from the node, then adding this to a variable to be stored. To complicate matters, this is not well-formed XML (because of the non-closing meta tags), so additional work is required stripping out invalid parts.
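One possible way to do that stripping is to cut the table out of the message before parsing, so the unclosed meta tags in the head never reach the parser. The sketch below assumes a single table per email. TableExtractor and extractTable are hypothetical names, and the o:p tags are removed because their namespace prefix is declared on the html element, which the cut leaves behind.

```apex
public class TableExtractor {
    // Returns only the <table>...</table> markup, lightly cleaned for an
    // XML parser, or null when the HTML contains no complete table.
    public static String extractTable(String html) {
        String closeTag = '</table>';
        Integer startIdx = html.indexOf('<table');
        if (startIdx < 0) {
            return null;
        }
        Integer endIdx = html.indexOf(closeTag, startIdx);
        if (endIdx < 0) {
            return null;
        }
        String table = html.substring(startIdx, endIdx + closeTag.length());
        // Unclosed <br> and <hr> elements are not well-formed XML.
        table = table.replaceAll('(?i)<(br|hr)\\s*/?>', '');
        // Office paragraph markers rely on a prefix declared outside the table.
        table = table.replaceAll('(?i)</?o:p>', '');
        // &nbsp; is not a predefined XML entity.
        table = table.replace('&nbsp;', ' ');
        return table;
    }
}
```

The returned string can then be handed to XmlStreamReader and walked row by row, TR first and then each TD, as described above.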

 

I don't have time to write up a proper example now, but I'll see if I can't do something this afternoon.

lucky41

Hi sfdcfox,

 

An example on this would be really helpful for me. 

 

Thanks.

lucky41
Hi sfdcfox,

Thanks for the help. I made a few changes to the code you provided, and it now does what I need.

Thanks again for your help.