How to extract the href markup from the htmlbody

Hi Guys,

I have a requirement to show the markup of the href only and to exclude the rest.

For Instance:

<p>This is Paragraph One</p>
<a href="http://www.google.com">This is Google:</a>
<p>This is Paragraph Two</p>
<a href="http://www.yahoo.com">This is Yahoo:</a>
<p>This is Paragraph Three</p>
<a href="http://www.youtube.com">This is YouTube:</a>

Desired output -

<a href="http://www.google.com">This is Google:</a>
<a href="http://www.yahoo.com">This is Yahoo:</a>
<a href="http://www.youtube.com">This is YouTube:</a>

My Code so far -

trigger HtmlBody on EmailMessage (after insert) {

    Set<ID> sid = new Set<ID>();
    List<EmailMessage> LEm = new List<EmailMessage>([Select ParentId, HtmlBody from EmailMessage WHERE ParentId = 'SPECIFIC CASE ID' ]);
    for(EmailMessage em :LEm ){

    List<Case> cs =[select id, hrefinfo__c from Case where id IN :sid];
    for(Case c: cs){
        c.HrefInfo__c = ---------------------------------

I want to capture the end result into a text field hrefinfo__c.

I am not sure which String methods to use here in order to achieve the above.

Appreciate your help!

So, general HTML parsing is not a trivial thing to do well, and I don't know of any Salesforce libraries or calls that will do it for you. You could do something naive, like removing all of the newlines and then adding newlines back based on the closing and opening tags, and then keeping only the lines that start with an anchor tag. If you only care about the anchor tags, though, you could probably grab them directly with a regex using the Pattern [1] and Matcher [2] classes [3].

[1] https://developer.salesforce.com/docs/atlas.en-us.apexcode.meta/apexcode/apex_classes_pattern_and_matcher_pattern_methods.htm
[2] https://developer.salesforce.com/docs/atlas.en-us.apexcode.meta/apexcode/apex_classes_pattern_and_matcher_matcher_methods.htm
[3] https://developer.salesforce.com/docs/atlas.en-us.apexcode.meta/apexcode/apex_classes_pattern_and_matcher_using.htm
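
To make the regex suggestion concrete, here is a minimal sketch of how the Pattern and Matcher approach might look. The class name AnchorExtractor and the method extractAnchors are hypothetical names made up for illustration, and the regex assumes simple, well-formed, non-nested anchor tags like the ones in the question. It is not a general HTML parser.

```apex
// Sketch only: AnchorExtractor and extractAnchors are hypothetical names,
// not part of any Salesforce library.
public class AnchorExtractor {

    // Matches from an opening <a ...> tag to the nearest closing </a>.
    // (?i) makes the match case-insensitive, (?s) lets '.' span newlines,
    // and .*? is lazy so each anchor is matched separately.
    private static final Pattern ANCHOR =
        Pattern.compile('(?is)<a\\b[^>]*>.*?</a>');

    // Returns every anchor tag found in the HTML, one per line,
    // or an empty string if the input is blank or has no anchors.
    public static String extractAnchors(String html) {
        if (String.isBlank(html)) {
            return '';
        }
        List<String> anchors = new List<String>();
        Matcher m = ANCHOR.matcher(html);
        while (m.find()) {
            anchors.add(m.group()); // the full text of the current match
        }
        return String.join(anchors, '\n');
    }
}
```

In the trigger from the question, the unfinished assignment could then become something like c.HrefInfo__c = AnchorExtractor.extractAnchors(em.HtmlBody);. This assumes HrefInfo__c is a text field large enough to hold the combined anchors (for example a Long Text Area), and that the Case Ids are actually collected into sid before the Case query runs. For the sample markup in the question this should yield just the three anchor lines, but malformed or unusually large HTML bodies may need extra handling.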