Archived Forum Post


Question:

Unicode characters with code points greater than 65535

Jun 02 '15 at 14:25

When using various wchar_t Chilkat functions, I've noticed that the library doesn't seem to handle Unicode characters with code points above 65535 (U+FFFF). In one routine where I needed to take a zip entry that contained a UTF-8 string and inflate it into a CStringW, the Unicode characters with code points above 65535 were stripped out / not converted. Example:

strLogFileBuff = CurrentZipEntry->unzipToString(0,L"utf-8"); // This will not convert Unicode characters above 65535.

We got around it by inflating the zip entry into a byte array, casting it to a CStringA, and then using the MultiByteToWideChar function. My understanding is that characters with code points above 65535 require 2 UTF-16 code units (4 bytes, a surrogate pair) on Windows systems, as wchar_t is only 2 bytes there. Can anyone else confirm this behavior and, if so, is this the case throughout the library?
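
To illustrate what I mean by 2 code units, here is a minimal portable sketch of the UTF-8 to UTF-16 conversion, roughly what MultiByteToWideChar with CP_UTF8 does for us in the workaround. The helper names appendUtf16 and utf8ToUtf16 are my own (hypothetical, not Chilkat or Windows APIs), and the decoder assumes well-formed UTF-8 and does no validation:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: append one Unicode code point as UTF-16.
// Code points above 0xFFFF become a surrogate pair (two 16-bit code units).
void appendUtf16(std::u16string& out, char32_t cp) {
    if (cp <= 0xFFFF) {
        out.push_back(static_cast<char16_t>(cp));
    } else {
        cp -= 0x10000;  // leaves a 20-bit value
        out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));    // high surrogate
        out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));  // low surrogate
    }
}

// Hypothetical helper: decode UTF-8 bytes into UTF-16 code units.
// Assumes well-formed input; malformed sequences are not detected.
std::u16string utf8ToUtf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char lead = static_cast<unsigned char>(in[i++]);
        char32_t cp;
        int extra;  // number of continuation bytes that follow the lead byte
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else                            { cp = lead & 0x07; extra = 3; }
        for (int k = 0; k < extra && i < in.size(); ++k, ++i) {
            // each continuation byte contributes its low 6 bits
            cp = (cp << 6) | (static_cast<unsigned char>(in[i]) & 0x3F);
        }
        appendUtf16(out, cp);
    }
    return out;
}
```

For example, the 4-byte UTF-8 sequence F0 9F 98 80 (U+1F600) should come out as the two code units 0xD83D 0xDE00, whereas a conversion that only handles the 16-bit range would drop or mangle it.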

Thanks