Hello. I'm a WordPress plugin developer. When I was adding a function using the Constant Contact V3 API, I encountered a weird problem.
First, I added an email list through the usual Constant Contact dashboard. I gave the list a Japanese title "テスト" (meaning "Test" in English).
Then, I tried to retrieve the lists collection using the API.
While lists with English titles were displayed perfectly, all of the lists with Japanese titles like "テスト" were completely garbled. For example, "テスト" was shown as "ãƒ†ã‚¹ãƒˆ".
Puzzled, I checked the raw JSON response before it was decoded with PHP's json_decode() function. The following is the part of the response representing the "テスト" list:
{ "list_id" : "dae75b20-62a1-11e9-9ce0-d4ae52843d28", "name" : "\u00E3\u0083\u0086\u00E3\u0082\u00B9\u00E3\u0083\u0088", "favorite" : false, "created_at" : "2019-04-19T08:51:31-04:00", "updated_at" : "2019-04-19T08:51:31-04:00" }
Something is wrong with the name field value. This string obviously doesn't represent the correct code points.
"テスト" should be represented as "\u30c6\u30b9\u30c8" in Unicode code points.
https://codepoints.net/U+30C6 テ
https://codepoints.net/U+30b9 ス
https://codepoints.net/U+30c8 ト
"\u00E3\u0083\u0086\u00E3\u0082\u00B9\u00E3\u0083\u0088" seems to be the UTF-8 byte sequence of the characters (E3 83 86, E3 82 B9, E3 83 88), with each byte escaped as if it were a code point, rather than the actual Unicode code points.
According to RFC 8259, when you represent a character in "\uXXXX" form, the XXXX must be "four hexadecimal digits that encode the character's code point", so using UTF-8 encoded byte values there is not allowed.
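To make the mismatch concrete, here is a minimal sketch that decodes both forms of the name and lists the resulting code points. It is written in Python purely for illustration (the plugin itself is PHP), and the variable names are my own, not anything from the API.

```python
import json

# The "name" value exactly as the API returned it (raw JSON text).
raw_from_api = r'"\u00E3\u0083\u0086\u00E3\u0082\u00B9\u00E3\u0083\u0088"'
# What a conforming encoder would be expected to emit for the same title.
raw_expected = r'"\u30c6\u30b9\u30c8"'

decoded_api = json.loads(raw_from_api)
decoded_expected = json.loads(raw_expected)

# Nine code points: one per UTF-8 byte of the title.
api_code_points = [f"U+{ord(ch):04X}" for ch in decoded_api]
# Three code points: one per katakana character.
expected_code_points = [f"U+{ord(ch):04X}" for ch in decoded_expected]

print(api_code_points)
print(expected_code_points)

# For comparison, a standard encoder escapes the title by code point.
print(json.dumps("テスト"))
```

The API's string decodes to nine characters in the U+0080 to U+00FF range instead of the three katakana characters, which is why the title appears garbled after decoding.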
I believe this is not the intended behavior and hope the API team will address it. Apologies in advance if this is a known issue that someone has already reported.
Hello,
Thank you for reaching out to Constant Contact API Developer Support.
Most of the Latin-based languages use the standard 8-bit Character Encoding that Constant Contact and virtually all email clients support. Text written in these languages can be used without any hassle or compatibility issues. Examples of 8-bit languages include English, Danish, Dutch, French, German, Italian, Norwegian, Portuguese, Spanish, and Swedish. These are also the languages that Constant Contact supports for the Email Footer/Privacy Policy.
Other languages, such as Japanese, Mandarin, Chinese, Korean, and Vietnamese use a 16-bit Character Encoding (sometimes referred to as "Double-Byte" or "Unicode" characters). Constant Contact does not officially support these languages. At this time there are no plans to change the way our API currently interacts with them in this way.
I have passed your feedback on to our product team for consideration. Please let me know if you have any other questions!
Regards,
David B.
Tier II API Support Engineer
I strongly hope the product team will consider this issue seriously. This is not about the range of supported languages. As I wrote, the JSON output is invalid. Nor are only some Asian languages affected: the European languages you listed now use Unicode as their character set, and, of course, European users may use characters that fall outside the old 8-bit range.
Here is another example. If you make a list titled "Citroën", it will be garbled as "CitroÃ«n".
This is the API response:
{ "list_id" : "7165b5e8-68d7-11e9-94f1-d4ae527b8c41", "name" : "Citro\u00C3\u00ABn", "favorite" : false, "created_at" : "2019-04-27T06:30:14-04:00", "updated_at" : "2019-04-27T06:30:14-04:00" }
"ë" should be represented as "\u00eb", so the name property should have "Citro\u00ebn", not "Citro\u00C3\u00ABn".
https://codepoints.net/U+00eb ë
As you can see, the problem in the API's JSON output is that the \uXXXX notation carries UTF-8 byte values instead of valid Unicode code points, and this problem affects every language that uses non-ASCII characters.
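Until this is fixed on the API side, a client could reverse the damage after decoding. The following is only a sketch of one possible workaround, in Python for illustration; `repair_escaped_utf8` is a hypothetical helper name, and it assumes every affected string was produced by escaping UTF-8 bytes one by one, as in the two responses shown in this thread.

```python
import json


def repair_escaped_utf8(text: str) -> str:
    """Undo the byte-per-code-point damage, if the text shows it."""
    try:
        # Every damaged character is below U+0100, so Latin-1 maps each
        # one back to the single byte it originally stood for.
        raw_bytes = text.encode("latin-1")
        # Reinterpret those bytes as the UTF-8 sequence they really are.
        return raw_bytes.decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not damaged in this way (or already correct): leave untouched.
        return text


# Usage with the "name" values from the two API responses above.
citroen = json.loads(r'"Citro\u00C3\u00ABn"')
tesuto = json.loads(r'"\u00E3\u0083\u0086\u00E3\u0082\u00B9\u00E3\u0083\u0088"')
print(repair_escaped_utf8(citroen))
print(repair_escaped_utf8(tesuto))
```

Strings that are plain ASCII or already correct pass through unchanged, because the Latin-1 or UTF-8 step either has no effect or fails and triggers the fallback. A genuine title that happens to consist of Latin-1 characters forming a valid UTF-8 byte sequence would be altered, so this is a stopgap rather than a substitute for a server-side fix.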
Hello,
Thank you for your continued concern.
I'm asking our development team to review this further.
Regards,
David B.
Tier II API Support Engineer
