When using various wchar_t Chilkat functions, I’ve noticed that the library doesn't seem to handle Unicode characters with code points above 65535 (U+FFFF). In one routine, where I needed to take a zip entry containing a UTF-8 string and inflate it into a CStringW, the characters with code points above 65535 were stripped out rather than converted. Example:

strLogFileBuff = CurrentZipEntry->unzipToString(0, L"utf-8"); // This does not convert Unicode characters above 65535.
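One way to confirm the behavior is to compare how many supplementary characters (code points above U+FFFF) the raw UTF-8 bytes contain with how many surrogate pairs survive in the decoded wide string. The sketch below is a minimal, portable version of that check under stated assumptions: the function names are hypothetical (not Chilkat APIs), std::u16string stands in for a Windows wchar_t/CStringW buffer (both hold 16-bit UTF-16 code units there), and the input bytes are assumed to be valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical diagnostic: count code points above U+FFFF in a UTF-8 buffer.
// In well-formed UTF-8 these are exactly the 4-byte sequences, whose lead
// byte lies in 0xF0..0xF4. Assumes the input is valid UTF-8.
std::size_t CountSupplementaryUtf8(const std::string& utf8)
{
    std::size_t n = 0;
    for (unsigned char b : utf8)
        if (b >= 0xF0 && b <= 0xF4) ++n;
    return n;
}

// Hypothetical diagnostic: count surrogate pairs (a high surrogate followed
// by a low surrogate) in UTF-16 text; each pair encodes one code point
// above U+FFFF.
std::size_t CountSurrogatePairs(const std::u16string& utf16)
{
    std::size_t n = 0;
    for (std::size_t i = 0; i + 1 < utf16.size(); ++i) {
        if (utf16[i] >= 0xD800 && utf16[i] <= 0xDBFF &&
            utf16[i + 1] >= 0xDC00 && utf16[i + 1] <= 0xDFFF) {
            ++n;
            ++i; // skip the low surrogate we just matched
        }
    }
    return n;
}
```

If CountSupplementaryUtf8 on the inflated bytes is nonzero but CountSurrogatePairs on the converted text is zero, the characters were lost during conversion.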

We got around it by inflating the zip entry into a byte array, casting it to a CStringA, and then calling MultiByteToWideChar. My understanding is that on Windows, where wchar_t is only 2 bytes, characters with code points above 65535 require 2 UTF-16 code units (a surrogate pair, 4 bytes in total). Can anyone else confirm this behavior, and if so, is it the case throughout the library?
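For reference, the conversion that MultiByteToWideChar(CP_UTF8, ...) performs can be sketched portably as follows. This is a minimal illustration, not the Windows implementation: the name Utf8ToUtf16 is hypothetical, it targets std::u16string to match Windows's 16-bit wchar_t (on most Unix-like platforms wchar_t is 32 bits), and it rejects malformed input with an exception while skipping some checks such as overlong encodings. A code point above U+FFFF becomes a surrogate pair: subtract 0x10000, then put the top 10 bits in a high surrogate (0xD800 + hi) and the bottom 10 bits in a low surrogate (0xDC00 + lo).

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode UTF-8 bytes into UTF-16 code units.
// Simplified validation: overlong encodings are not rejected.
std::u16string Utf8ToUtf16(const std::string& in)
{
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b = static_cast<unsigned char>(in[i]);
        std::uint32_t cp;
        std::size_t len;
        if (b < 0x80)              { cp = b;        len = 1; } // 0xxxxxxx
        else if ((b >> 5) == 0x6)  { cp = b & 0x1F; len = 2; } // 110xxxxx
        else if ((b >> 4) == 0xE)  { cp = b & 0x0F; len = 3; } // 1110xxxx
        else if ((b >> 3) == 0x1E) { cp = b & 0x07; len = 4; } // 11110xxx
        else throw std::runtime_error("invalid UTF-8 lead byte");

        if (i + len > in.size())
            throw std::runtime_error("truncated UTF-8 sequence");
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c >> 6) != 0x2) // continuation bytes look like 10xxxxxx
                throw std::runtime_error("invalid continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        i += len;

        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::runtime_error("invalid code point");

        if (cp <= 0xFFFF) {
            out.push_back(static_cast<char16_t>(cp)); // BMP: one code unit
        } else {
            cp -= 0x10000; // 20 bits remain
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));   // high surrogate
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF))); // low surrogate
        }
    }
    return out;
}
```

For example, U+1F600 (UTF-8 bytes F0 9F 98 80) decodes to the two code units 0xD83D 0xDE00, which is why a 16-bit wchar_t string needs two elements for one such character.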

Thanks

Archived Forum Post

unicode characters with code points greater than 65535

Jun 02 '15 at 14:25