jlcoverity

"Illegal regex" - HUH?!?!?

Hello all, would someone please explain to me why this:

 

Pattern p = Pattern.compile('^AMAZON(\\.COM)?($|[,-\\s]+');

 

Would produce this error:

 

08:18:26.042 (42021000)|EXECUTION_STARTED
08:18:26.042 (42032000)|CODE_UNIT_STARTED|[EXTERNAL]|execute_anonymous_apex
08:18:26.043 (43805000)|FATAL_ERROR|System.StringException: Invalid regex: Illegal character range near index 22
^AMAZON(\.COM)?($|[,-\s]+

 

What's up with the substitution of the pipe character???

 

This regex works perfectly fine in Javascript.

 

Thanks, -jl

All Answers

Kamatchi Devi Sargunanathan

Hi,

 

Pattern p = Pattern.compile('^AMAZON(\\.COM)?($|[,-\\s]+'); // the highlighted character (the pipe) is what is causing this error; remove it and try the following:

 

Pattern p = Pattern.compile('^AMAZON(\\.COM)?($ [,-\\s]+');

 

Hope this helps!

Please mark this answer as the solution and give kudos by clicking the star icon if you found it helpful.

jlcoverity

Yes, I know that the "pipe" character is the problem...but I want the pipe character as it indicates "OR" in a regular expression. What I am trying to do here is match a string that starts with "Amazon" or "Amazon.com" and then either ends ($) or is followed by a comma, a dash, a whitespace (or any combination of the 3, so that it will also match "Amazon - UK" or "Amazon.com, Inc"). 

 

The pipe character is a valid regular expression entity (see: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions#special-or). In fact, it is even referenced in the Apex documentation:

        // This pattern reduces the email address to 'john@smithco' 
        // from 'john@*.smithco.com' or 'john@smithco.*'
        Pattern emailPattern = Pattern.compile('(?<=@)((?![\\w]+\\.[\\w]+$)[\\w]+\\.)|(\\.[\\w]+$)');

So...why is it now being escaped? The pattern compile method appears to be failing on the semicolon of the HTML entity (&#124;).
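One way to check whether the pipe is really the culprit is to look at which character sits at index 22, the position named in the error. Below is a rough Java sketch (Apex's Pattern class follows Java's regex syntax, so plain Java is used as a stand-in). The class and helper names are made up for illustration, and the HTML-escaped variant is only a guess at how a log viewer might display the pattern.

```java
import java.util.regex.Pattern;

final class PipeIndexDemo {
    // The pattern from the original question, as the regex engine sees it.
    static final String ORIGINAL = "^AMAZON(\\.COM)?($|[,-\\s]+";

    // Character at the index named in the error message (22).
    static char atReportedIndex(String pattern) {
        return pattern.charAt(22);
    }

    // Hypothetical helper: the same pattern with the pipe HTML-escaped,
    // as a log viewer might display it (an assumption, not confirmed).
    static String htmlEscapedPipe(String pattern) {
        return pattern.replace("|", "&#124;");
    }

    public static void main(String[] args) {
        // Alternation by itself compiles without complaint.
        Pattern.compile("^AMAZON(\\.COM)?($|,)");

        // In the real pattern, index 22 is inside the character class.
        System.out.println(atReportedIndex(ORIGINAL));

        // In the escaped display form, index 22 lands on the entity's semicolon.
        System.out.println(atReportedIndex(htmlEscapedPipe(ORIGINAL)));
    }
}
```

In the pattern itself, index 22 is the s of \s inside the brackets, well past the pipe at index 17. If the pipe is shown as the six-character entity, index 22 happens to fall on its semicolon, which may be why the log seems to blame the entity.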

 

sfdcfox

JavaScript isn't Java, so some regexes behave differently. However, I found what you're looking for by experimenting with a Java regex tester (http://www.regexplanet.com/advanced/java/index.html). I believe this is the pattern you're after:

 

^amazon(\\.com)?($|[,\\-\\s])+

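The trick was that the parser was treating the unescaped - as a range operator, so it tried to build a range from comma through \s. That isn't a valid range: \s is shorthand for a whole class of whitespace characters rather than a single character, so it can't serve as a range endpoint. Escaping the hyphen makes it a literal. Note that the pattern above also closes the second group, which the original left open. It successfully matches "amazon", "amazon, inc", "amazon.com, inc", and "amazon.com", but not "amazon.co" or "amazons".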
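To see the difference for yourself, here is a small Java sketch (Apex regex follows Java's Pattern syntax, so this should carry over, but treat that as an assumption and test in Apex too). The class and method names are made up for illustration. It checks which character classes compile and runs the corrected pattern against the sample names using find(), which allows trailing text such as "inc".

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

final class RangeDemo {
    // Returns true if Java's regex compiler accepts the pattern.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    // Uses find(), so text after the matched prefix is allowed.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // Unescaped '-' between ',' and \s is read as a range and rejected.
        System.out.println("[,-\\s] compiles: " + compiles("[,-\\s]"));
        // Escaping the hyphen, or moving it to the end, makes it a literal.
        System.out.println("[,\\-\\s] compiles: " + compiles("[,\\-\\s]"));
        System.out.println("[,\\s-] compiles: " + compiles("[,\\s-]"));

        String fixed = "^amazon(\\.com)?($|[,\\-\\s])+";
        String[] samples = {
            "amazon", "amazon, inc", "amazon.com, inc",
            "amazon.com", "amazon.co", "amazons"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + found(fixed, s));
        }
    }
}
```

Putting the hyphen last in the class ([,\\s-]) is another common way to keep it literal without escaping.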

This was selected as the best answer
jlcoverity

That was it! I totally forgot that there are inconsistencies between Java & JS regex...been having too much fun with node.js lately, I guess. :)

 

Thank you!

 

 

UPDATED: To complete the full matching that I needed (in Apex), I also needed to add a [\\w\\s]* to the end (to match "Amazon.com - Palo Alto", for example): 

^amazon(\\.com)?($|[,\\-\\s]+[\\w\\s]*)
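For anyone who wants to try the final pattern outside Apex, here is a minimal Java sketch. The class and method names are hypothetical. Lowercasing the input before matching is an assumption on my part, since the pattern is written in lowercase while the examples are mixed case. It uses matches(), so the whole string has to fit the pattern.

```java
import java.util.Locale;
import java.util.regex.Pattern;

final class AmazonNameDemo {
    // The final pattern from the update above, written as a Java string literal.
    static final Pattern FINAL_PATTERN =
        Pattern.compile("^amazon(\\.com)?($|[,\\-\\s]+[\\w\\s]*)");

    // Assumption: names are lowercased first because the pattern is lowercase.
    static boolean isAmazonName(String name) {
        return FINAL_PATTERN.matcher(name.toLowerCase(Locale.ROOT)).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "Amazon", "Amazon.com", "Amazon - UK", "Amazon.com, Inc",
            "Amazon.com - Palo Alto", "Amazon.co", "Amazons"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + isAmazonName(s));
        }
    }
}
```

Because [\\w\\s]* only covers word characters and whitespace, a trailing period (as in "Amazon.com, Inc.") would not be a full match with this pattern as written.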