Unicode in JSON refers to wrong code point

user41190 · ‎04-20-2019

Hello. I'm a WordPress plugin developer. When I was adding a function using the Constant Contact V3 API, I encountered a weird problem.

First, I added an email list through the usual Constant Contact dashboard. I gave the list a Japanese title "テスト" (meaning "Test" in English).

Then, I tried to retrieve the lists collection using the API.

While lists with English titles were displayed correctly, every list with a Japanese title such as "テスト" was completely garbled. For example, "テスト" was shown as "ãƒ†ã‚¹ãƒˆ".

To investigate, I checked the raw JSON response before decoding it with PHP's json_decode() function. The following is the part of the response that represents the "テスト" list:

{
  "list_id" : "dae75b20-62a1-11e9-9ce0-d4ae52843d28",
  "name" : "\u00E3\u0083\u0086\u00E3\u0082\u00B9\u00E3\u0083\u0088",
  "favorite" : false,
  "created_at" : "2019-04-19T08:51:31-04:00",
  "updated_at" : "2019-04-19T08:51:31-04:00"
}

Something is wrong with the name field value. This string obviously doesn't represent the correct code points.

"テスト" should be represented as "\u30c6\u30b9\u30c8" in Unicode code points.

https://codepoints.net/U+30C6 テ
https://codepoints.net/U+30b9 ス
https://codepoints.net/U+30c8 ト
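
For comparison, here is a minimal Python sketch of what a conforming JSON encoder emits for this title. Python is used only for illustration (the plugin itself is PHP); the standard `json` module escapes non-ASCII characters by code point when `ensure_ascii` is left at its default.

```python
import json

# Encode the list title with the standard library. Non-ASCII characters
# are written as \uXXXX escapes holding their Unicode code points.
encoded = json.dumps("テスト")
print(encoded)  # "\u30c6\u30b9\u30c8"

# Round-trip: decoding the escapes yields the original title again.
decoded = json.loads(encoded)
print(decoded == "テスト")  # True
```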

"\u00E3\u0083\u0086\u00E3\u0082\u00B9\u00E3\u0083\u0088" appears to be the UTF-8 encoding of the characters (E3 83 86, E3 82 B9, E3 83 88), with each byte written as its own \u escape, rather than the Unicode code points.

According to RFC 8259, when a character is represented in the "\uXXXX" form, the escape must contain "four hexadecimal digits that encode the character's code point", so putting UTF-8 byte values there is not allowed.
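
The following Python sketch reproduces the value the API returns. It rests on an assumption about the server side that I cannot verify: that the UTF-8 bytes of the title are read as if each byte were one Latin-1 character before the string is JSON-escaped. The variable names are my own.

```python
import json

title = "テスト"

# Assumption: the UTF-8 bytes of the title are misread as Latin-1,
# so every byte becomes a separate character in the range U+0000..U+00FF.
utf8_bytes = title.encode("utf-8")      # 9 bytes: e3 83 86 e3 82 b9 e3 83 88
misread = utf8_bytes.decode("latin-1")  # 9 characters, one per byte

# JSON-escaping the misread string gives one \u escape per UTF-8 byte.
double_encoded = json.dumps(misread)
print(double_encoded)
# "\u00e3\u0083\u0086\u00e3\u0082\u00b9\u00e3\u0083\u0088"
```

Apart from the letter case of the hex digits, this matches the "name" value in the API response, which is consistent with the diagnosis but does not prove where the conversion actually happens.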

I believe this is not intended behavior and hope the API team will address the problem. Sorry in advance if this is a known issue that someone has already reported.

David_B. · ‎04-24-2019

Hello,

Thank you for reaching out to Constant Contact API Developer Support.

Most of the Latin-based languages use the standard 8-bit Character Encoding that Constant Contact and virtually all email clients support. Text written in these languages can be used without any hassle or compatibility issues. Examples of 8-bit languages include English, Danish, Dutch, French, German, Italian, Norwegian, Portuguese, Spanish, and Swedish. These are also the languages that Constant Contact supports for the Email Footer/Privacy Policy.

Other languages, such as Japanese, Mandarin, Chinese, Korean, and Vietnamese use a 16-bit Character Encoding (sometimes referred to as "Double-Byte" or "Unicode" characters). Constant Contact does not officially support these languages. At this time there are no plans to change the way our API interacts with them.

I have passed your feedback on to our product team for consideration. Please let me know if you have any other questions!

Regards,

David B.
Tier II API Support Engineer

user41190 · ‎04-25-2019

I strongly hope the product team will consider this issue seriously. This is not about the range of supported languages. As I wrote, the JSON output is invalid. Also, it is not only some Asian languages that are affected. The European languages you listed now use Unicode as their character set, and, of course, European users may use characters that are outside the old 8-bit range.

user41190 · ‎04-27-2019

Here is another example. If you make a list titled "Citroën", it will be garbled as "CitroÃ«n".

This is the API response:

{
  "list_id" : "7165b5e8-68d7-11e9-94f1-d4ae527b8c41",
  "name" : "Citro\u00C3\u00ABn",
  "favorite" : false,
  "created_at" : "2019-04-27T06:30:14-04:00",
  "updated_at" : "2019-04-27T06:30:14-04:00"
}

"ë" should be represented as "\u00eb", so the name property should have "Citro\u00ebn", not "Citro\u00C3\u00ABn".

https://codepoints.net/U+00eb ë

As you can see, the problem in the API's JSON output is that the \uXXXX escapes contain UTF-8 byte values instead of Unicode code points, and this affects every language that uses non-ASCII characters.
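
Until this is fixed, a client can undo the damage after decoding. Below is a minimal Python sketch of such a workaround (Python only for illustration; `repair_double_encoded` is a hypothetical helper name, not part of any API). It assumes every affected character decoded from the response is really one UTF-8 byte, and it leaves the string unchanged when that assumption does not hold.

```python
import json


def repair_double_encoded(text: str) -> str:
    """Reinterpret characters U+0000..U+00FF as UTF-8 bytes.

    Returns the text unchanged if it contains characters above U+00FF
    or if the resulting bytes are not valid UTF-8.
    """
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


# The "name" field as it appears in the raw API response.
raw = r'{"name": "Citro\u00C3\u00ABn"}'
name = json.loads(raw)["name"]

repaired = repair_double_encoded(name)
print(repaired)  # Citroën
```

Correctly encoded names pass through untouched because they either contain characters above U+00FF or do not form valid UTF-8 when read as bytes. A name that legitimately contains a sequence such as "Ã«" would be altered, so this is a stopgap rather than a real fix.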

David_B. · ‎04-29-2019

Hello,

Thank you for your continued concern.

I'm asking our development team to review this further.

Regards,
David B.
Tier II API Support Engineer
