It is hardly ever necessary to change the encoding to UTF-8, because of the various mechanisms in place:
1) U+0000 to U+007F encode to the same single bytes (0x00 to 0x7F) in UTF-8 as in ISO-8859-1, a benefit of UTF-8's backward-compatible variable-length design.
2) Modern text editors, including Windows Notepad, will automatically convert ISO-8859-1 bytes 0x80 through 0xFF to their UTF-8 equivalents. Attempting to save as plain ASCII will result in those characters being lost or downgraded to the 0x00 to 0x7F range.
3) Modern web browsers submit all their data as UTF-8, so extended code points 0x80 through 0xFF will be converted to the correct UTF-8 code points.
4) The API uses XML, which uses UTF-8, so any integration will automatically convert extended code points to UTF-8 code points.
5) Most modern programming languages use UTF-8 exclusively, or by default, so extended code points are either converted correctly or ignored.
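Mechanisms (1) and (2) can be sketched at the byte level. This is a small Python illustration (not Apex — the byte values are the same on any platform):

```python
# 'é' is U+00E9: the single byte 0xE9 in ISO-8859-1, two bytes in UTF-8.
text = "café"
latin1 = text.encode("iso-8859-1")  # b'caf\xe9' -- one byte per character
utf8 = text.encode("utf-8")         # b'caf\xc3\xa9' -- 'é' becomes two bytes

# (1) U+0000-U+007F encode to identical single bytes in both encodings:
assert "caf".encode("iso-8859-1") == "caf".encode("utf-8")

# (2) bytes 0x80-0xFF become two-byte UTF-8 sequences when an editor
# re-saves the file as UTF-8:
assert latin1.decode("iso-8859-1").encode("utf-8") == utf8
```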
The only use case I found was creating a text file with extended ISO-8859-1 characters in a command shell and uploading the file to Salesforce. The file was interpreted as UTF-8, and the conversion failed (producing the "invalid UTF-8" replacement symbol when using System.debug). In this particular case, it appears that by the time the file reaches Apex code context, it may already be too late to undo the mangling.
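The failure mode is easy to reproduce outside Apex. Most ISO-8859-1 bytes in the 0x80–0xFF range are not valid UTF-8 on their own, so a strict decoder rejects them and a lenient one substitutes U+FFFD, the replacement character — at which point the original byte is gone. A Python sketch of the same situation:

```python
raw = "résumé".encode("iso-8859-1")  # b'r\xe9sum\xe9'

# A strict UTF-8 decoder refuses these bytes outright:
try:
    raw.decode("utf-8")
except UnicodeDecodeError:
    pass  # 0xE9 is not a valid UTF-8 lead byte

# A lenient decoder substitutes U+FFFD, destroying the original byte --
# which is why the mangling cannot be repaired after the fact:
mangled = raw.decode("utf-8", errors="replace")
print(mangled)  # 'r�sum�'
```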
You could, however, use EncodingUtil.convertToHex on the Blob of such a file and manually translate the code points using a series of if/then statements; I don't think there'd be more than half a dozen per character. That would still limit you to converting only small amounts at a time (say, 5,000 bytes or less).
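The per-byte translation itself is mechanical, because every ISO-8859-1 byte from 0x80 to 0xFF maps to exactly one two-byte UTF-8 sequence. Here is a Python sketch of that logic (the function name and the hex string stand in for the output of EncodingUtil.convertToHex; the same arithmetic ports to if/then statements on hex pairs in Apex):

```python
def latin1_hex_to_utf8(hex_str: str) -> bytes:
    """Translate a hex dump of ISO-8859-1 bytes into UTF-8 bytes."""
    out = bytearray()
    for i in range(0, len(hex_str), 2):
        b = int(hex_str[i:i + 2], 16)
        if b < 0x80:
            out.append(b)                # ASCII range: byte is unchanged
        else:
            out.append(0xC0 | (b >> 6))  # UTF-8 lead byte (0xC2 or 0xC3)
            out.append(0x80 | (b & 0x3F))  # UTF-8 continuation byte
    return bytes(out)

# Stand-in for convertToHex output from an ISO-8859-1 blob:
blob_hex = "résumé".encode("iso-8859-1").hex()
assert latin1_hex_to_utf8(blob_hex) == "résumé".encode("utf-8")
```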
Could you describe the use-case for this?
Thanks for the quick response. I created a Visualforce page where the user uploads a .txt file. My problem is that when I read the file, I see ASCII codes; the file is in ISO-8859-1 format, and I need to convert it to UTF-8 so I can read the text in the file.
You must remember that Body or FileBody is a Blob. You should use String.valueOf(Body) to obtain the correct (unencoded) text. I believe this is what you are experiencing.
Thanks again. If I use String.valueOf(Body), I get, for example: �‰‰…‹‰�-„‘�‹„ ‘Ž, so I need a way to convert this.