sprak

[PHP] The always popular "Invalid byte 1 of 1-byte UTF-8 sequence." problem

*** UPDATE 4 Feb 2005: It appears that one field was not having the utf8_encode function applied to it. Stupid error; you may now all point and laugh.



Greetings all; I recently had to code up a mechanism to insert/select cases into Salesforce via the SOAP API. Built everything up fine and began testing; eventually, some of the test cases included accented characters (umlauts, etc. e.g., ö). In these test cases, the insert call failed with the error message "java.io.UTFDataFormatException: Invalid byte 1 of 1-byte UTF-8 sequence."

Having encountered this problem before, I scoured my archives and reminded myself that the text should be passed through utf8_encode() before making the SOAP call. No problem then; I wrapped each text item with a utf8_encode() call.
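
For anyone landing here later, a rough sketch of why the raw bytes get rejected and what the wrapping changes. It is written in Python purely to show the bytes, and it assumes the form data arrives as ISO-8859-1. The helper names are made up for illustration; latin1_to_utf8 is only a stand-in for what utf8_encode() does.

```python
# Sketch only: a Python stand-in for PHP's utf8_encode().
# Assumption: the incoming text is ISO-8859-1 (Latin-1) bytes.

def latin1_to_utf8(raw: bytes) -> bytes:
    """Treat each byte as a Latin-1 character and re-encode as UTF-8."""
    return raw.decode("latin-1").encode("utf-8")


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the bytes are a well-formed UTF-8 sequence."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


latin1_bytes = "ö".encode("latin-1")          # the single byte 0xF6
converted = latin1_to_utf8(latin1_bytes)      # the two bytes 0xC3 0xB6

print(is_valid_utf8(latin1_bytes))  # the raw Latin-1 byte is not valid UTF-8
print(is_valid_utf8(converted))     # the converted bytes are
```

A lone 0xF6 byte is exactly the kind of input a strict UTF-8 parser refuses, which fits the "Invalid byte 1 of 1-byte UTF-8 sequence" message. Any field that skips the conversion will still trip it.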

However, the problem still persists even after using utf8_encode(); now, the accented characters are being passed as values that look like garbage (e.g., �?¼). The same error message is returned from the SOAP call; scoping the wire shows that the SOAP headers are set to UTF-8 encoding.

Pretty stumped at this point; any help would be appreciated. Server details follow:

* Fedora Core 2
* Apache 2.0.51-2.7
* PHP 4.3.8 (cgi)
* PEAR::SOAP 0.8RC3
* Latest Salesforce client from Sourceforge

Cheers.

Message Edited by sprak on 02-04-2005 10:04 AM

adamg
Glad you resolved the issue - this has come up more than once before.
thechad
I have the exact same problem.  Did you find a fix for this?  The error goes away if you wrap the variable in utf8_encode... but you lose the kanji (Asian double-byte characters), which is replaced with gobbledygook.  I know everything else is encoded as UTF-8, as I can store the data in MySQL, send an email, etc. and the kanji is passed through.  But when I send it to Salesforce via the API or Web-2-Lead, it bombs with the same error.

Any help you could provide would be appreciated.  Thanks!

~Chad
sprak
The issue I was having was that one of my fields was not being wrapped in a utf8_encode() call, something I should have spotted before posting the thread initially.  My problems and yours go away when you wrap the data in the call.  Our system has pushed European characters through successfully, but I do not believe we have tried pushing kanji through to Salesforce.

Where exactly is the kanji ending up as garbage?  Is it when you view the record via the Salesforce web interface and if so what browser?  When you pull it back out of Salesforce via an API call?  What encoding is the text in (shift jis, euc-jp, iso, etc.) before you pass it to Salesforce?

Cheers.
thechad
Thank you, Sprak, for the reply... I thought maybe the thread was too old for a response.

I must have misunderstood your original post; I thought that you resolved the error by using utf8_encode() but were still getting weird characters (e.g., �?¼).

This is the problem I am having.  European characters, accents, etc. have been working fine... But when the double-byte kanji come through, it causes the API to die with the same error.  I saw your suggestion and added utf8_encode() to all the variables in the form.  When I wrap them in utf8_encode... the error goes away, but the leads show up in Salesforce as (e.g., �?¼) instead of the actual characters.

The data is all being sent and passed in UTF-8.  I can store it in MySQL fine and retrieve it in its original state, and I can send UTF-8 email and the characters show up fine.  But when I send it to the API, it dies on me.
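
One possible explanation for the garbage, sketched in Python just to show the bytes: if the text is already UTF-8 (an assumption based on the description above), running it through a Latin-1 to UTF-8 conversion such as utf8_encode() encodes it a second time. The result is still well-formed UTF-8, so nothing errors, but it no longer spells the original characters. The helper name is made up for illustration.

```python
# Sketch only: double encoding of text that is already UTF-8.
# Assumption: latin1_to_utf8 mirrors what PHP's utf8_encode() does.

def latin1_to_utf8(raw: bytes) -> bytes:
    """Treat each byte as a Latin-1 character and re-encode as UTF-8."""
    return raw.decode("latin-1").encode("utf-8")


original = "以下"
already_utf8 = original.encode("utf-8")         # 3 bytes per character
double_encoded = latin1_to_utf8(already_utf8)   # every byte becomes 2 bytes

# The double-encoded bytes decode without error, but to different text.
garbled = double_encoded.decode("utf-8")
print(len(already_utf8), len(double_encoded))
print(garbled == original)

# Reversing the extra step recovers the original text.
recovered = garbled.encode("latin-1").decode("utf-8")
print(recovered == original)
```

If this is what is happening, the kanji should be sent without the extra conversion, and the remaining error would then point at bytes that are being damaged somewhere else before the call.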

I have included some Kanji if you have a moment to test it.  I appreciate your help.

以下のフォームをご記入ください。 <ご記入上の注意事項> *は必須入力項目です。英数字は半角で入力してください。
sprak
After a bit of digging and testing, I'm stumped on this one; I can enter kanji via the standard Salesforce web interface and have it display correctly.  So, it definitely can store kanji/double-byte characters into the database.  However, running it through our API calls produces the same results you are seeing.  Poking around in this forum, the PHP docs, etc. have produced no useful solutions. At this point, I would suggest three things:
  1. If you are using an HTML form to pass the data into the API call process, add <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> in between the head tags if it is not already there and try changing the charset to various charsets (shift-jis, etc.).  Also, try adding and changing the value of the accept-charset attribute for the <form> tag.  Not sure if that will help, but it will rule out a few more variables.
  2. Start up a new thread in this forum titled "Unable to store kanji/double-byte characters via API" or something to that effect.  Someone might recognize the problem you are having from a new, better-titled thread.
  3. Fire off a ticket to Salesforce support; I've found them very responsive.
Sorry I cannot be of more assistance; if you do come across a solution, drop a link to it here if you remember.

Cheers.

Message Edited by sprak on 04-21-2006 02:41 PM

thechad
I found the issue after contacting Salesforce...   So for anyone reading our thread... here is what I have found via Salesforce documentation.

“We currently have four instances of our service: NA0, NA1, EMEA, and JP.

  • NA0 is our original instance created in 1999. No new organizations are added to this instance.
  • NA1 is now where signups from North America are created.
  • EMEA is where signups from Europe, Asia (minus Japan), and South America are created.
  • JP is where signups from Japan are created.

While NA1, EMEA, and JP support the UTF8 character set (aka Unicode), NA0 only supports the ISO-8859-1 character set.

What this all means is that for all customers that signed up from the US web site prior to roughly June 2002, those customers cannot use Asian languages that are based on a double-byte character set. So, those customers would have a tough time putting on divisions or users from Japan, Korea, or China.

Note that these orgs CAN still use the following:

  • Multi-currency
  • All western languages such as German, French, Spanish, Italian, and Swedish”


So that said, we are migrating to a compatible server and we should be good to go.  Thanks for your quick replies... very helpful.

~Chad
thechad
Ok,

Our data was migrated to the servers that support Asian characters.   When leads are created through the API, it still bombs with that darn error:  "Invalid byte 1 of 1-byte UTF-8 sequence."

If I use utf8_encode, it doesn't die, but all the Asian characters are transformed into something strange.

The error appears to occur at the SOAP level, and not on the Salesforce side, but I could be totally wrong...

Any ideas would help.

Thank you.
thechad
I thought it would be beneficial to the community to post my findings... we finally resolved this issue.  And of course it was user error (on my part)!

When dealing with Asian (double-byte) characters, functions like substr, strtolower, and strtoupper will actually distort the original characters (I originally thought they would be ignored).

We do not have the mbstring functions installed, but using mb_substr, mb_strtolower, etc. would resolve this issue, as does removing the formatting from double-byte characters.
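
A rough illustration of the substr half of this, in Python only to show the bytes. It assumes the text is UTF-8 and that the byte slice behaves like a byte-oriented substr() while the string slice behaves like a character-aware mb_substr(). The helper name is made up for illustration.

```python
# Sketch only: byte-wise versus character-wise truncation of UTF-8 text.

def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the bytes are a well-formed UTF-8 sequence."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


text = "以下のフォーム"
utf8_bytes = text.encode("utf-8")    # each of these characters is 3 bytes

# Byte-wise cut: 4 bytes is one whole character plus a stray lead byte.
byte_cut = utf8_bytes[:4]

# Character-wise cut: two whole characters, re-encoded afterwards.
char_cut = text[:2].encode("utf-8")

print(is_valid_utf8(byte_cut))   # the byte-wise cut is no longer valid UTF-8
print(is_valid_utf8(char_cut))   # the character-wise cut is
```

A field truncated in the middle of a character is enough to make the whole request invalid, which would match the error above.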

The migration to the new servers was also critical from the SFDC end.

Thanks everyone for your help.
Synchro
And these issues will mostly magically go away in PHP 6: http://wiki.cc/php/PHP6

At least we hope they will!
Pannar
Hi,
 
Is there a way to create a validation rule or something similar on a particular custom field, Name (Text(255)), under the Opportunity object?
There should be a restriction on that field so that it only allows Western characters. Is that possible? I don't think it is. The character set would already have been defined by the sforce server, so how would it be possible to set a charset on a particular field to restrict non-Western characters? Please help me.
 
please respond.
regards
pannar
Superfell
I don't believe there's any way to do that.
Pannar

Dear Simon,

I've tried this code:-
 
Code:
if( 
REGEX(Name, "[a-zA-ZÀ-ÿ]"),
true, 
false 
)

 but it doesn't work. It's allowing me to enter any characters and saves the record without throwing any error! :-(
 
I have to restrict the user to entering ONLY Western characters in this Name field. Is it possible?
I wonder why REGEX is not working properly, as it always returns false. Is something wrong with my REGEX function parameters?
Do you know what the range of Western characters is?
Pannar

Simon,

Could you please tell me the range of Western characters that can be passed to REGEX? If I get the range, then I am done with my requirement.

thanks a lot

SF7

Hi,

I am having the same problem you faced earlier, and I was wondering if you found a solution for it.

Sorry, it has been many years and you may not remember, but I wanted to take a chance.

Thanks

Akhil