JoshVH

Pattern and Matcher Question

I am using the Pattern and Matcher classes to search text from an email.  Sometimes I get an exception that says "Regex too complicated", and I can't find any information on it.  Does anyone know what can cause this?  I get the premise of the exception but don't know what to do to fix it.  If I put my regular expression and sample text into the tester at http://www.fileformat.info/tool/regex.htm, it works fine and returns what I want.  From what I understand, Salesforce's regex functionality is similar to Java's, which is what that site uses.  Any ideas?  Thanks.
paul-lmi

If you look in the debug log (turn it on if you haven't yet), how long does it take to execute?  You could be hitting a Salesforce governor limit.

It would also be helpful if you posted code.

JoshVH
I turned the debug log on but can't get any useful information because this exception gets thrown.  I put a try/catch in, but the exception doesn't seem to be caught by it.  I have logged a case to get some answers and I will post when I have more information.  The code is too complicated and would take too long to explain at this point.  I am more interested in finding out what causes this exception and what can be done to get around it.  Thanks.
fgwarb
Did you ever sort this out?  I just got the same message and am looking for answers....
JoshVH

I found a workaround, kind of.  If I break the string into smaller overlapping chunks and run the regex on each chunk, it works.  This isn't the ideal solution, but it works.  I logged a case, and after several months it came back from engineering that the regex code was accessing the string over 1 million times.  They couldn't explain why it would access the string that many times.  In the end they closed the case because engineering wouldn't respond anymore.  Please log a case so they know that more users are having the issue.  If enough people open a case, they will take the bug more seriously.
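
A minimal sketch of that overlapping-chunk idea, for anyone who wants a starting point. The class name `ChunkedRegex`, the method `findAll`, and the `chunkSize` and `overlap` parameters are hypothetical and not from the original code. It assumes no single match is longer than `overlap`, and a pattern that depends on text far outside its chunk may still behave differently than it would on the whole string.

```apex
public class ChunkedRegex {
    // Hypothetical helper: runs the regex over overlapping slices of the input
    // so that no single Matcher ever sees the whole string.
    // Assumes 0 < overlap < chunkSize and that no match is longer than overlap.
    public static List<String> findAll(String input, String regex,
                                       Integer chunkSize, Integer overlap) {
        List<String> found = new List<String>();
        Pattern p = Pattern.compile(regex);
        Integer step = chunkSize - overlap;
        Integer reported = 0; // absolute end index of the last match kept
        for (Integer offset = 0; offset < input.length(); offset += step) {
            Integer sliceEnd = Math.min(offset + chunkSize, input.length());
            Boolean isLast = (sliceEnd == input.length());
            Matcher m = p.matcher(input.substring(offset, sliceEnd));
            while (m.find()) {
                // A match that starts inside the overlap is left for the next
                // chunk, where it cannot be cut off by the slice boundary.
                Boolean ownedByThisChunk = isLast || m.start() < step;
                // Skip text already covered by a match kept from an earlier chunk.
                Boolean notSeenYet = (offset + m.start()) >= reported;
                if (ownedByThisChunk && notSeenYet) {
                    found.add(m.group());
                    reported = offset + m.end();
                }
            }
            if (isLast) {
                break;
            }
        }
        return found;
    }
}
```

Picking `chunkSize` is trial and error; the thread does not establish a documented threshold for this exception.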

rklaassen
Got this error too. I simply use string.replace('\n') to get the lines from a text file. It works fine for files of around 500 KB and 500 lines. If the text file gets too big (1 MB), I get this error too. I don't know why that is, but the workaround described above works fine for me too!
ascuccimarra

Exactly the same problem here. Splitting by ('\n') causes that error with big files (4,000+ lines).

I will try the solution of breaking the string into smaller pieces.

Thanks.

davehilary

I have the same "Regex too complicated" problem with large-volume attachment processing using Email Services. I got an answer from my ISV technical support contact last week, which I've posted here in case it's of use to anyone:

 

"I’ve been doing some digging and the regex too complicated message is definitely based on the size of the files.  It looks like Email Services provides an entry point that allows developers to push in data sizes that far exceed the heap limits.  The regex seems to be failing because of the heap supporting the regex. The only alternative is to cut the file sizes down or choose another integration approach."

 

I've posted the original problem (and the same response) here:

 

http://community.salesforce.com/t5/Apex-Code-Development/Regex-too-complicated-error-for-large-volume-of-data-and-a/m-p/178769

 

 

MG Consulting

Being able to catch the "Regex too complicated" exception is key to working around it in my situation, but for some weird reason I cannot catch it. I also cannot reproduce this exception on demand, and that's driving me crazy because I can't look at the exception in detail to figure out why I can't catch it.

Does anyone have some code that reliably reproduces this exception?

Has anyone been able to successfully catch this exception?

 

Thanks a lot!

Erland

I ran into this issue today and have yet to find any way to catch this exception. I'm highly disappointed by this.

 

Although this reply comes two years after the fact, this thread is still relevant, so here's my solution for dealing with arbitrarily large input files. This only applies specifically to splitting lines by \n or \r. More complex regex patterns will need some tweaking.

 

Below, allLines will become the final List containing all lines (go figure). divideString is a recursive method that breaks the input file into manageable chunks, splits them, and appends the results to a global list. I think this will retain the original line order, but I haven't checked yet, so beware.

 

    public List<String> allLines {get; set;}
    private Integer inputCharacterMax { get { return 100000; } set; } // this is a somewhat arbitrary character limit

    public void divideString(String input) {

        // find the first '\r' at or after the midpoint of the input
        Integer pivot = input.indexOf( '\r', Integer.valueOf(Math.floor(input.length() / 2)) );

        if ( pivot == -1 ) {
            // no '\r' in the second half, so there is nowhere to divide: split as-is
            allLines.addAll(input.split('\r'));
            return;
        }

        String left = input.substring(0,pivot);
        String right = input.substring(pivot); // note: right still begins with the '\r' at pivot

        if ( pivot < inputCharacterMax ) {
            // split left and right chunks, add to allLines
            List<String> leftLines = left.split('\r');
            List<String> rightLines = right.split('\r');
            allLines.addAll(leftLines);
            allLines.addAll(rightLines);
        }
        else {
            // divide and conquer! left is fully processed before right
            divideString(left);
            divideString(right);
        }
    }

To initiate the process:

allLines = new List<String>(); // divideString appends to this list, so it must exist first
Integer pivot = fileContents.indexOf( '\r', Integer.valueOf(Math.floor(fileContents.length() / 2)) );
if ( pivot > inputCharacterMax ) {
	divideString(fileContents);
}
else {
	allLines = fileContents.split('\r');
}

 Just managed to process a CSV with over 21,000 rows, and governor/script limits stayed within reasonable values!

If only this exception were catchable, this code would be simplified to begin recursion in a catch, rather than relying on an arbitrary character limit conditional.

 

Hope this helps someone out there!
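
For the plain line-splitting case, one alternative to the recursive approach is to avoid `split` altogether and walk the string with `indexOf`. This is a swapped-in technique, not what the code above does; the class name `LineScanner` and the method `toLines` are hypothetical. It is only a sketch: the loop still counts against CPU limits on very large inputs, but it never hands the whole string to the regex engine, and it keeps the original line order by construction.

```apex
public class LineScanner {
    // Hypothetical helper: collects lines by scanning for the separator with
    // indexOf, so String.split (and its regex engine) is never called.
    public static List<String> toLines(String input, String separator) {
        List<String> lines = new List<String>();
        Integer lineStart = 0;
        Integer sepAt = input.indexOf(separator, lineStart);
        while (sepAt != -1) {
            // everything between the previous separator and this one is a line
            lines.add(input.substring(lineStart, sepAt));
            lineStart = sepAt + separator.length();
            sepAt = input.indexOf(separator, lineStart);
        }
        // whatever follows the last separator (nothing if the input ends with one)
        if (lineStart < input.length()) {
            lines.add(input.substring(lineStart));
        }
        return lines;
    }
}
```

The separator is a literal string here, not a pattern, so pass `'\r'`, `'\n'`, or `'\r\n'` to match the file's line terminator.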

vcharlet
Great, thanks Erland.

I made one small change to avoid a blank line: String right = input.substring(pivot + 1);
Kunal Parmar 9
Hi Erland,
Thank you for providing the workaround. I just have a small question: did your resulting global list preserve the order of the lines from the original? This helps a lot. Thanks!
J Bengel
Hello Erland of 2012: This is 2021 calling, and all these years later, this is still relevant. *sigh*

For my use case I could leave off the get/set methods, because all I was trying to do was graft the contents of a static resource into an array (which you would think Apex would have a "canned" method for, but whatever). Since I wasn't working on a controller or VF page, I had all the access I needed directly.

Also, if your line terminator is CRLF (which turned out to be the case for me), you have to take the left half from 0 through pivot to get everything before the CR, and the right half from pivot + 2 to get everything after the LF. Otherwise you end up with a blank line in your output from the 0a that follows the 0d.
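
A small sketch of that adjustment, assuming the pivot is the index of a '\r' as in divideString. The class name `SplitPoint` and the method `rightStart` are hypothetical. It returns the index where the right half should begin, skipping the '\n' only when the '\r' is part of a CRLF pair.

```apex
public class SplitPoint {
    // Hypothetical helper: given the index of the '\r' chosen as pivot, returns
    // the index at which the right half should start.
    public static Integer rightStart(String input, Integer pivot) {
        Integer afterCr = pivot + 1;
        // CRLF: the character right after the '\r' is '\n', so skip both
        if (afterCr < input.length() && input.substring(afterCr, afterCr + 1) == '\n') {
            return pivot + 2;
        }
        // bare CR: skip only the '\r' itself
        return pivot + 1;
    }
}
```

With this, the two halves would be `input.substring(0, pivot)` and `input.substring(SplitPoint.rightStart(input, pivot))`. For CRLF files the chunks presumably also need to be split on '\r\n' rather than '\r', otherwise every line after the first in a chunk keeps a leading '\n'.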

Apart from that, everything appeared to work beautifully, at least in my initial testing. A very elegant and easy-to-follow solution for the problem I was trying to solve. I guess if you had to parse out the fields from each record you'd have more to do, but for what I needed, this was the perfect answer.

Thanks!