Unicode in JSON refers to wrong code point

Occasional Participant

Hello. I'm a WordPress plugin developer. When I was adding a function using the Constant Contact V3 API, I encountered a weird problem.

 

First, I added an email list through the usual Constant Contact dashboard. I gave the list a Japanese title "テスト" (meaning "Test" in English).

 

Then, I tried to retrieve the lists collection using the API.

 

While lists with English titles were displayed perfectly, every list with a Japanese title such as "テスト" was completely garbled. For example, "テスト" was shown as "テスト".

 

Puzzled, I checked the raw JSON response before it was decoded with PHP's json_decode() function. The following is the part of the response that represents the "テスト" list:

 

{
  "list_id" : "dae75b20-62a1-11e9-9ce0-d4ae52843d28",
  "name" : "\u00E3\u0083\u0086\u00E3\u0082\u00B9\u00E3\u0083\u0088",
  "favorite" : false,
  "created_at" : "2019-04-19T08:51:31-04:00",
  "updated_at" : "2019-04-19T08:51:31-04:00"
}

Something is wrong with the name field value. This string obviously doesn't represent the correct code points.

 

"テスト" should be represented as "\u30c6\u30b9\u30c8" in Unicode code points.

 

https://codepoints.net/U+30C6
https://codepoints.net/U+30b9
https://codepoints.net/U+30c8
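
As a quick sanity check of the expected escapes, here is a minimal Python sketch using only the standard library. The variable names are just for illustration; it only shows what a conforming JSON encoder emits for this title and says nothing about how the API is implemented.

```python
import json

name = "テスト"

# Code points of each character, matching the codepoints.net pages above.
code_points = [f"U+{ord(ch):04X}" for ch in name]
print(code_points)

# json.dumps escapes non-ASCII characters by default (ensure_ascii=True),
# emitting one \uXXXX escape per character in the Basic Multilingual Plane.
encoded = json.dumps(name)
print(encoded)
```

The second print shows three escapes, one per character, which is the form I expected to see in the name field.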

 

"\u00E3\u0083\u0086\u00E3\u0082\u00B9\u00E3\u0083\u0088" seems to be UTF-8 representations of the characters, not Unicode code points.

 

According to RFC 8259, a character written in "\uXXXX" form must use "four hexadecimal digits that encode the character's code point", so putting UTF-8 byte values there is not allowed.
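
The mismatch can be reproduced in a few lines. This is only a sketch of my guess at what is happening (each UTF-8 byte being escaped as if it were a code point); I have no knowledge of the API's actual code, and the names `utf8_bytes` and `wrong` are made up for the example.

```python
name = "テスト"

# The UTF-8 encoding of the title: three bytes per character.
utf8_bytes = name.encode("utf-8")
print(utf8_bytes.hex(" "))

# Treat every byte as if it were a code point and escape it as \u00XX.
wrong = "".join(f"\\u{b:04X}" for b in utf8_bytes)
print(wrong)

# The value found in the API response, copied from the JSON above.
api_value = r"\u00E3\u0083\u0086\u00E3\u0082\u00B9\u00E3\u0083\u0088"
print(wrong == api_value)
```

If the last line prints True, the name field is consistent with UTF-8 bytes having been escaped one by one rather than the characters' code points.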

 

I believe this is not intended behavior and hope the API team will address the problem. Apologies in advance if this is a known issue that someone has already reported.

4 REPLIES
Moderator

Re: Unicode in JSON refers to wrong code point

Hello,

 

Thank you for reaching out to Constant Contact API Developer Support.

 

Most of the Latin-based languages use the standard 8-bit Character Encoding that Constant Contact and virtually all email clients support. Text written in these languages can be used without any hassle or compatibility issues. Examples of 8-bit languages include English, Danish, Dutch, French, German, Italian, Norwegian, Portuguese, Spanish, and Swedish. These are also the languages that Constant Contact supports for the Email Footer/Privacy Policy.

 

Other languages, such as Japanese, Mandarin, Chinese, Korean, and Vietnamese use a 16-bit Character Encoding (sometimes referred to as "Double-Byte" or "Unicode" characters). Constant Contact does not officially support these languages. At this time there are no plans to change the way our API currently interacts with them in this way.

 

I have passed your feedback on to our product team for consideration. Please let me know if you have any other questions!

 

Regards,

 

David B.
Tier II API Support Engineer

Occasional Participant

Re: Unicode in JSON refers to wrong code point

I strongly hope the product team will consider this issue seriously. This is not about the range of supported languages. As I wrote, the JSON output is invalid. Also, it is not only some Asian languages that are affected. The European languages you listed now use Unicode as their character set and, of course, European users may use characters that fall outside the old 8-bit range.

Occasional Participant

Re: Unicode in JSON refers to wrong code point

Here is another example. If you make a list titled "Citroën", it will be garbled as "CitroÃ«n".

 

This is the API response:

 

{
  "list_id" : "7165b5e8-68d7-11e9-94f1-d4ae527b8c41",
  "name" : "Citro\u00C3\u00ABn",
  "favorite" : false,
  "created_at" : "2019-04-27T06:30:14-04:00",
  "updated_at" : "2019-04-27T06:30:14-04:00"
}

"ë" should be represented as "\u00eb", so the name property should have "Citro\u00ebn", not "Citro\u00C3\u00ABn".

 

https://codepoints.net/U+00eb ë

 

As you can see, the problem is that the \uXXXX escapes in the API's JSON output contain UTF-8 byte values rather than the characters' Unicode code points, and this affects every language, not just Asian ones.
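
Until this is fixed, a client-side workaround seems possible. The sketch below is in Python rather than PHP, and `repair_mojibake` is a hypothetical helper of my own, not something the API or any library provides. It assumes the decoded string consists only of characters in U+0000..U+00FF that together form valid UTF-8, and leaves anything else untouched.

```python
import json


def repair_mojibake(text: str) -> str:
    """Hypothetical helper: undo 'UTF-8 bytes escaped as code points'."""
    try:
        # Map each character U+0000..U+00FF back to a single byte,
        # then read the resulting bytes as UTF-8.
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not the double-encoded form (or already correct): keep as-is.
        return text


raw = r'{"name": "Citro\u00C3\u00ABn"}'
decoded = json.loads(raw)["name"]  # contains U+00C3 U+00AB instead of U+00EB
print(repair_mojibake(decoded))
```

A correctly encoded name such as "Citroën" or "テスト" fails one of the two conversions and is returned unchanged. The caveat is that a genuine Latin-1 string that happens to be valid UTF-8 would be altered, so this is a stopgap, not a substitute for a fix on the API side.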

Moderator

Re: Unicode in JSON refers to wrong code point

Hello,

 

Thank you for your continued concern.

 

I'm asking our development team to review this further.


Regards,
David B.
Tier II API Support Engineer