duduav

How do I convert a text file from ISO-8859-1 to UTF-8 encoding in Apex?


sfdcfox

Could you describe the use-case for this?

 

It is hardly ever necessary to change the encoding to UTF-8, because of the various mechanisms in place:

 

1) U+0000 through U+007F are encoded identically in UTF-8 and ISO-8859-1 (code points 0x00 through 0x7F), a benefit of UTF-8's variable-length encoding (see the sketch after this list).

2) Modern text editors, including Windows Notepad, will automatically convert ISO-8859-1 0x80 through 0xFF to their UTF-8 equivalents. Attempting to save as ASCII/ISO-8859-1 will result in these code points being lost or downgraded to code points 0x00 to 0x7F.

3) Modern web browsers submit all their data as UTF-8, so extended code points 0x80 through 0xFF will be converted to the correct UTF-8 code points.

4) The API uses XML, which uses UTF-8, so any integration will automatically convert extended code points to UTF-8 code points.

5) Most modern programming languages use UTF-8 exclusively, or use it by default, so extended code points are handled either by converting them or by ignoring them.
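
As a quick illustration of point 1 (a sketch, not from the original post): content made up only of bytes 0x00 through 0x7F reads back the same whether the file was saved as ISO-8859-1 or UTF-8, because both encodings use the identical single bytes for that range.

    // 'Hello' is the byte sequence 48 65 6c 6c 6f in both ISO-8859-1 and UTF-8,
    // so plain-ASCII content needs no conversion at all.
    Blob ascii = Blob.valueOf('Hello');
    System.debug(EncodingUtil.convertToHex(ascii)); // 48656c6c6f
    System.debug(ascii.toString());                 // Hello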

 

The only use case I found was creating a text file in a command shell with extended ISO-8859-1 characters and uploading it to Salesforce. The file was interpreted as UTF-8, and the conversion failed (resulting in the "invalid UTF-8" replacement symbol when using System.debug). In that case, it appears that by the time the file is in Apex Code context, it may already be too late to fix the mangling that occurs.

 

You could, however, use EncodingUtil.convertToHex on the Blob of such a file and manually translate the code points using a series of if/then statements. I don't think it would take more than half a dozen if/then statements per character, though that would still limit you to converting only small amounts at a time (say, 5,000 bytes or less).
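
A minimal sketch of that idea, assuming the uploaded file's raw bytes are in a Blob variable named isoBody (an illustrative name). Instead of per-character if/then statements, it relies on the fact that every ISO-8859-1 byte value is also the corresponding Unicode code point, so each byte can be mapped directly:

    // Decode an ISO-8859-1 Blob into an Apex String without mangling bytes 0x80-0xFF.
    String hex = EncodingUtil.convertToHex(isoBody);   // two lowercase hex digits per byte
    String digits = '0123456789abcdef';
    List<Integer> codePoints = new List<Integer>();
    for (Integer i = 0; i < hex.length(); i += 2) {
        // A byte's numeric value is also its Unicode code point in ISO-8859-1.
        Integer hi = digits.indexOf(hex.substring(i, i + 1));
        Integer lo = digits.indexOf(hex.substring(i + 1, i + 2));
        codePoints.add(hi * 16 + lo);
    }
    String decoded = String.fromCharArray(codePoints); // a normal Apex String

As noted above, heap and CPU limits would still restrict this to fairly small files.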

duduav

Thanks for the quick response.

 

I created a Visualforce page where the user uploads a .txt file.

My problem is that when I read the file, I see ASCII codes; the file is in ISO-8859-1 format,

and I need to convert it to UTF-8 so I can understand the text that is in the file.

sfdcfox

You must remember that Body or FileBody is a Blob. You should use String.valueOf(Body) to obtain the correct (unencoded) text. I believe this is what you are experiencing.
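
A minimal sketch of what that looks like, assuming a Visualforce controller whose Blob property (here called fileBody, an illustrative name) is bound to an apex:inputFile component:

    public class FileUploadController {
        // Bound to <apex:inputFile value="{!fileBody}"/> on the page.
        public Blob fileBody { get; set; }

        public void readFile() {
            // String.valueOf on the Blob decodes its bytes as text; for ASCII or
            // UTF-8 content this gives the readable string directly.
            String fileText = String.valueOf(fileBody);
            System.debug(fileText);
        }
    }

As the next reply shows, an ISO-8859-1 file with extended characters still comes out garbled at this step.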

duduavduduav

Thanks again. If I use String.valueOf(Body), I am getting, for example: �‰‰…‹‰�-„‘�‹„ ‘Ž, so I need a way to convert this.