Shortest way to normalize strings - replace accents and non-ASCII chars with 'normal' chars

Hi,

Banks are really old-fashioned in the way they handle text in files, even when those files go through the latest standards-based IT systems. In SEPA, for example, they still restrict text to their own character set, which is not Unicode and is closer to old ASCII or even EBCDIC :(

In the US, it's not a big problem, but in European and Asian countries it is mega-important.

In Java, the Normalizer and Pattern classes are great for this: Unicode text can be normalized down to ASCII in 2 or 3 statements. In Apex, it takes *many* more.

This is the normalization I use today:

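// Replaces accented capitals with their base letter and the listed punctuation with a space.
// Œ, Æ and & expand to two characters, so the output can be longer than the input.
private String clean(String input) {   // 'in' is a reserved word in Apex, so the parameter is renamed
    if (input == null) {
        return null;
    }
    //String minmaj = "ÀÂÄÇÉÈÊËÎÏÛÜÔÖaàâäbcçdeéèêëfghiîïjklmnoôöpqrstuùûüvwxyz";
    //String maj    = "AAACEEEEIIUUOOAAAABCCDEEEEEFGHIIIJKLMNOOOPQRSTUUUUVWXYZ";
    // acc and maj must have the same length: the character at position i in acc becomes the one at i in maj
    String acc = 'ÀÂÄÇÉÈÊËÎÏÌÛÜÙÔÖÒÑ' + '°()§<>%^¨*$€£`#,;./?!+=_@"' + '\'';   // plus Œ, Æ and &, handled below
    String maj = 'AAACEEEEIIIUUUOOON' + ''.rightPad(26) + ' ';                 // one space per punctuation character

    String out = '';
    for (Integer i = 0 ; i < input.length() ; i++) {
        String car = input.substring(i, i+1);    // current character
        Integer idx = acc.indexOf(car);          // case-sensitive lookup
        if (idx != -1) {
            out += maj.substring(idx, idx+1);
        } else if (car == 'Œ') {                 // == on Apex Strings ignores case, so œ matches too
            out += 'OE';
        } else if (car == '&') {
            out += 'ET';
        } else if (car == 'Æ') {
            out += 'AE';
        } else {
            out += car;                          // anything else is copied unchanged
        }
    }

    return out;
}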
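A possible tidy-up of the same idea, as a minimal sketch: the acc/maj tables and the three expansions go into one Map built once per transaction, so the if/else chain becomes a single lookup, and the pieces are joined once at the end instead of building a new String on every `+=`. AsciiNormalizer is a hypothetical class name; the character tables are the ones from the method above. It still loops once per character, so it trims statements per character rather than removing the loop.

```apex
// Sketch only: AsciiNormalizer is a hypothetical class name; the tables are copied
// from the acc/maj strings of the clean() method above.
public class AsciiNormalizer {
    // The character at position i in ACC is replaced by the one at i in MAJ.
    private static final String ACC = 'ÀÂÄÇÉÈÊËÎÏÌÛÜÙÔÖÒÑ';
    private static final String MAJ = 'AAACEEEEIIIUUUOOON';
    // Each of these punctuation characters becomes a single space.
    private static final String SYMBOLS = '°()§<>%^¨*$€£`#,;./?!+=_@"\'';

    // Built lazily; static variables live for the whole transaction.
    private static Map<String, String> replacements;

    private static Map<String, String> getReplacements() {
        if (replacements == null) {
            // Multi-character expansions: these change the output length.
            replacements = new Map<String, String>{ 'Œ' => 'OE', 'Æ' => 'AE', '&' => 'ET' };
            for (Integer i = 0; i < ACC.length(); i++) {
                replacements.put(ACC.substring(i, i + 1), MAJ.substring(i, i + 1));
            }
            for (Integer i = 0; i < SYMBOLS.length(); i++) {
                replacements.put(SYMBOLS.substring(i, i + 1), ' ');
            }
        }
        return replacements;
    }

    public static String clean(String input) {
        if (input == null) {
            return null;
        }
        Map<String, String> byChar = getReplacements();
        List<String> parts = new List<String>();
        for (Integer i = 0; i < input.length(); i++) {
            String car = input.substring(i, i + 1);
            // Map keys of type String are case-sensitive, unlike == on Strings.
            String replacement = byChar.get(car);
            parts.add(replacement != null ? replacement : car);
        }
        // One join at the end instead of a new String on every +=.
        return String.join(parts, '');
    }
}
```

For example, `AsciiNormalizer.clean('CAFÉ & CIE')` should give `'CAFE ET CIE'`.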

Remember, this is to produce fixed-width files where column alignment matters, so I can't just replace Æ with AE without also fixing the padding logic (elsewhere).
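One way to keep columns aligned regardless of expansions like Æ to AE is to pad or cut each field to its width only after it has been normalized. A minimal sketch, assuming the file format wants left-aligned, space-padded fields and that truncating an over-long value is acceptable (some formats would rather reject it); FixedWidthField and toField are hypothetical names.

```apex
// Sketch only: FixedWidthField/toField are hypothetical names. Assumes left-aligned,
// space-padded fields and that truncation is acceptable for the target file format.
public class FixedWidthField {
    // Returns exactly `width` characters: trailing spaces are added, or the value is cut.
    // Call it AFTER normalization, so expansions such as Æ -> AE are already counted.
    public static String toField(String normalized, Integer width) {
        String value = (normalized == null) ? '' : normalized;
        // rightPad adds spaces up to width; left keeps at most width characters.
        return value.rightPad(width).left(width);
    }
}
```

With this, `toField('OEUVRE', 4)` gives `'OEUV'` and `toField('AEBY', 5)` gives `'AEBY '`, so the column widths stay fixed whatever the normalization did.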

This method executes too many statements: has anyone got a better way of doing it, still in Apex?

TIA,

Rup

June 10, 2013

Kiran Kurella
Have you explored the Pattern and Matcher (regular expression) methods in Apex? If not, I would recommend exploring them to reduce the number of statements executed.

http://www.salesforce.com/us/developer/docs/apexcode/index_Left.htm#CSHID=apex_classes_pattern_and_matcher_pattern_methods.htm|StartTopic=Content%2Fapex_classes_pattern_and_matcher_pattern_methods.htm|SkinName=webhelp
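A minimal sketch of the regular-expression route, assuming the same character tables as the original method: one `replaceAll` per target letter, so the statement count depends on the number of rules rather than on the length of the text. RegexNormalizer is a hypothetical class name, and like the original it only handles upper-case accented letters.

```apex
// Sketch only: RegexNormalizer is a hypothetical name; tables follow the original acc/maj.
public class RegexNormalizer {
    public static String clean(String input) {
        if (input == null) {
            return null;
        }
        // Each replaceAll is one statement, however long the input is.
        String out = input
            .replaceAll('[ÀÂÄ]', 'A')
            .replaceAll('Ç', 'C')
            .replaceAll('[ÉÈÊË]', 'E')
            .replaceAll('[ÎÏÌ]', 'I')
            .replaceAll('[ÛÜÙ]', 'U')
            .replaceAll('[ÔÖÒ]', 'O')
            .replaceAll('Ñ', 'N')
            // Two-character expansions: these change the output length.
            .replaceAll('Œ', 'OE')
            .replaceAll('Æ', 'AE')
            .replaceAll('&', 'ET');
        // The punctuation from the original acc string becomes one space each.
        return out.replaceAll('[°()§<>%^¨*$€£`#,;./?!+=_@"\']', ' ');
    }
}
```

For instance, `RegexNormalizer.clean('NOËL, L\'ÉTÉ')` should return `'NOEL  L ETE'` (the comma and the apostrophe each become a space).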

June 10, 2013

@altius_rup
Hi Codeizard,
I know Pattern and Matcher well.
What I really need is the equivalent of Java's Normalizer to go with them: has anyone got an idea?

Rup
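As far as I know, Apex has no counterpart to `java.text.Normalizer`, so accent decomposition is not available directly. What a single regex statement can do is a final whitelist pass that turns every character still outside the target charset into a space, keeping the length unchanged. A minimal sketch, assuming the target is the SEPA basic Latin set (a-z, A-Z, 0-9 and `/ - ? : ( ) . , ' +` plus space); check that against your bank's specification. SepaCharsetFilter and restrict are hypothetical names.

```apex
// Sketch only: SepaCharsetFilter/restrict are hypothetical names. The allowed set is an
// ASSUMPTION (SEPA basic Latin: a-z A-Z 0-9 / - ? : ( ) . , ' + space); verify it.
public class SepaCharsetFilter {
    // Matches any single character NOT in the allowed set.
    private static final String NOT_ALLOWED = '[^a-zA-Z0-9/\\-?:().,\'+ ]';

    public static String restrict(String input) {
        if (input == null) {
            return null;
        }
        // One-for-one replacement, so fixed-width alignment is preserved.
        return input.replaceAll(NOT_ALLOWED, ' ');
    }
}
```

Run after the accent mapping, this acts as a safety net for anything the mapping tables missed: `restrict('A@B#C')` gives `'A B C'`.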

June 10, 2013

David Waugh
@altius_rup, just wanted to say this normalization snippet helped me. I extended it to cover additional characters. Not sure whether my extension is already covered by your alternative strings that are commented out.

String accents = 'ÃÁÀÂÄÇÉÈÊËÎÏÌÍÚÛÜÙÓÔÕÖÒÑÝ' + '°()§<>%^¨*$€£`#;?!+=@©®™"';
String maj     = 'AAAAACEEEEIIIIUUUUOOOOONY' + ''.rightPad(25);   // one space per symbol, so both strings have the same length

Thanks. I would love to see more formal support for string normalization from Salesforce. Can't the Java implementation be lifted??

August 25, 2014