Devendra Natani

Error in reading document body - "BLOB is not a valid UTF-8 string"

I am getting the following error in an Apex class while reading a Word document that was uploaded to the Document object.

 

"BLOB is not a valid UTF-8 string".

 

Thanks,

Devendra Natani

Best Answer chosen by Admin (Salesforce Developers) 
Jia Hu
The Body field of Document is of type base64 (binary data).
Use the following:
Document d = [select id, body from document limit 1];
String strBody = EncodingUtil.base64Encode( d.body );
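
For readers following along, here is a minimal sketch of what the snippet above yields. base64Encode returns the Base64 text of the raw bytes, not the readable contents of the Word file, and base64Decode turns that text back into the original Blob. The class name DocumentBase64Example and its method names are hypothetical, chosen only for illustration; the EncodingUtil calls are standard Apex.

```apex
// Hypothetical helper class, named here only for illustration.
public class DocumentBase64Example {
    // Encodes any Blob (text or binary) as a Base64 string.
    // Unlike Blob.toString(), this does not require the bytes to be valid UTF-8.
    public static String toBase64(Blob body) {
        return EncodingUtil.base64Encode(body);
    }

    // Restores the original bytes from a Base64 string.
    public static Blob fromBase64(String encoded) {
        return EncodingUtil.base64Decode(encoded);
    }
}
```

The encoded string can be stored in a text field or sent to an external service that knows how to parse Word documents; it is not the document's text.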

All Answers

joshbirk

Can you post the line causing the error? Are you trying to put the Blob into a String field or variable?

Devendra Natani

Hi,

 

Here is the complete scenario. I uploaded an MS Word document to the native Document object, and I am querying it in an Apex class with the query below.

 

Document d = [SELECT Id, Body FROM Document WHERE Name = 'test document'];

Blob b = d.Body;
String content = b.toString();

 

 

Please let me know what is wrong with it.

 

Thanks,

Dev

 

joshbirk

I think the problem is that an MS Word document isn't stored in a text format, so toString is being asked to turn binary data into a string, which a String variable isn't equipped to handle.
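
A minimal sketch of the point above, assuming you want to detect the failure instead of letting it abort the transaction: Blob.toString() throws a StringException when the bytes are not valid UTF-8, so the call can be wrapped in a try/catch. The class name DocumentBodyReader and the method tryReadAsText are hypothetical names used only for illustration.

```apex
// Hypothetical helper class, named here only for illustration.
public class DocumentBodyReader {
    // Returns the body as text when the bytes are valid UTF-8.
    // Returns null when they are not (for example, a binary Word file).
    public static String tryReadAsText(Blob body) {
        try {
            return body.toString();
        } catch (System.StringException e) {
            // This is where "BLOB is not a valid UTF-8 string" surfaces.
            return null;
        }
    }
}
```

A null result only says the body is not UTF-8 text; it does not extract the words from a Word document, which still needs a parser outside Apex.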

Devendra Natani

Hi,

 

I totally agree with you. I think I will have to create a web service class in C#, because in .NET we can read the content of a Word document.

Is there any other way to solve this issue?

 

Thanks,
Devendra Natani

Fangfang

Hello, I have now run into the same problem and I don't know how to solve it. Could you send me your web service class and tell me how you used it to solve the problem? Thank you so much!

Fangfang

My email address is yangfangmeister@gmail.com and my name is Alex. Thank you!

Rambo

I am facing the same issue. I have a CSV file as my attachment, and I am trying to load the values from the spreadsheet into my fields.

 

Please suggest a suitable solution.

Cazoomi

Is there any resolution to this error on this thread?

 

~Clint

@cazoomi

devisfun

The only resolution I can come up with is forcing the document to be UTF-8, then opening it back up to see what the invalid characters are.

 

My issue was a CSV file with seemingly innocent text-only characters on it throwing that error.  I opened the file up in Excel, saved it as a .txt file, opened that file up in Notepad++, used the Encoding menu function Convert to UTF-8 without BOM (not sure why that choice, but it worked), saved it, and opened it back up in Excel.  There were extraneous characters next to some email addresses, and when I deleted them and tried it again I didn't get the error.

 

Good luck.

Sumit Vakil.ax1203

I ran into the same problem recently. After some debugging, I found out that the CSV file I was trying to upload had one field with a non-breaking space (Hex A0) character at the end of it. The presence of this character was causing me to get the "BLOB is not a valid UTF-8 string" error.

 

This character looks like any other whitespace, so you can't spot it easily. Excel functions such as TRIM and CLEAN don't remove the character either. You have to use the SUBSTITUTE function:

=SUBSTITUTE(A1,CHAR(160),"")

 

Once I cleaned up my file with this function and saved it again, the problem went away.

 

Regards,

 

Sumit
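
If cleaning the file by hand is not an option, the same idea can be sketched on the Apex side. This assumes the CSV was saved in a single-byte encoding such as ISO-8859-1, where the non-breaking space is the lone byte A0 (not valid on its own in UTF-8). The bytes are hex-encoded, rewritten as percent-escapes, and decoded with EncodingUtil.urlDecode using that charset, then the non-breaking spaces are stripped. The class name Latin1CsvReader and the method decodeLatin1 are hypothetical, and the charset is an assumption about your file, not something the thread confirms.

```apex
// Hypothetical helper class, named here only for illustration.
public class Latin1CsvReader {
    // Decodes a Blob assumed to hold ISO-8859-1 text and removes
    // non-breaking spaces (character code 160, hex A0).
    public static String decodeLatin1(Blob body) {
        String hex = EncodingUtil.convertToHex(body);

        // Split the hex string into two-character byte values.
        List<String> byteValues = new List<String>();
        for (Integer i = 0; i < hex.length(); i += 2) {
            byteValues.add(hex.substring(i, i + 2));
        }
        if (byteValues.isEmpty()) {
            return '';
        }

        // Build "%61%a0%62..." and let urlDecode apply the charset.
        String escaped = '%' + String.join(byteValues, '%');
        String text = EncodingUtil.urlDecode(escaped, 'ISO-8859-1');

        // Strip non-breaking spaces, mirroring SUBSTITUTE(A1,CHAR(160),"").
        String nbsp = String.fromCharArray(new List<Integer>{ 160 });
        return text.replace(nbsp, '');
    }
}
```

This works byte by byte, so very large attachments may run into CPU or heap limits; for those, converting the file to UTF-8 before upload is the simpler route.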

sanjayrs23

Thanks for this trick. It was helpful. I just opened the same CSV file in Notepad++, applied "Convert to UTF-8 without BOM", tried uploading again, and it worked well.

samba

My friends, if you have solved this issue, please tell me. Thanks.

 

My email: samba.gao@hotmail.com

csreddy7799

This code does not work with a Word document. How can I convert a Word document body into a string?