Archived Forum Post


Question:

Look at the BOM header of a string and determine the encoding method?

Jul 25 '17 at 16:34

Is there a Chilkat API anywhere that can look at the BOM header of a string and determine the encoding method? I can easily write this, but I didn't want to reinvent the wheel.


Accepted Answer

This question only makes sense if you have bytes. For example, if you have a string in C#, the question makes no sense, because a C# string is already decoded text (utf-16 code units) rather than raw bytes. It would have to be a byte[] in C#.

In C++, if you have a "char *", then you have a pointer to bytes. However, if you have a "wchar_t *", then you have a pointer to wide characters that should already be in utf-16 (or perhaps utf-32) encoding, depending on the size of "wchar_t" for your compiler. In that case, you had better not be pointing to utf-8 bytes.

You're probably only interested in the BOMs for utf-16 (LE and BE), utf-8, and utf-32 (LE and BE). Assuming you have a "char *" (or "const char *"), then it's just a matter of looking at the first few bytes.

The utf-8 BOM is 3 bytes: EF BB BF

The other BOMs are listed here: https://en.wikipedia.org/wiki/Byte_order_mark
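
As a rough illustration of "looking at the first few bytes", here is a minimal C++ sketch. It is not a Chilkat API: the names BomEncoding and detectBom are made up for this example. The utf-16 and utf-32 byte values are the standard ones from the Wikipedia page linked above. It assumes you have a "const char *" plus a length.

```cpp
#include <cstddef>

// Encodings that can be recognized from a byte order mark.
enum class BomEncoding { None, Utf8, Utf16LE, Utf16BE, Utf32LE, Utf32BE };

// Hypothetical helper (not part of Chilkat): inspects the first bytes of a
// buffer and reports which BOM, if any, it starts with.
inline BomEncoding detectBom(const char *data, std::size_t len) {
    const unsigned char *p = reinterpret_cast<const unsigned char *>(data);

    // utf-32 is tested before utf-16 because the utf-32 LE BOM (FF FE 00 00)
    // begins with the same two bytes as the utf-16 LE BOM (FF FE).
    if (len >= 4 && p[0] == 0xFF && p[1] == 0xFE && p[2] == 0x00 && p[3] == 0x00)
        return BomEncoding::Utf32LE;
    if (len >= 4 && p[0] == 0x00 && p[1] == 0x00 && p[2] == 0xFE && p[3] == 0xFF)
        return BomEncoding::Utf32BE;

    // utf-8 BOM: EF BB BF
    if (len >= 3 && p[0] == 0xEF && p[1] == 0xBB && p[2] == 0xBF)
        return BomEncoding::Utf8;

    // utf-16 BOMs: FF FE (little endian), FE FF (big endian)
    if (len >= 2 && p[0] == 0xFF && p[1] == 0xFE)
        return BomEncoding::Utf16LE;
    if (len >= 2 && p[0] == 0xFE && p[1] == 0xFF)
        return BomEncoding::Utf16BE;

    // No recognizable BOM; the encoding has to be known some other way.
    return BomEncoding::None;
}
```

One caveat with this ordering: the bytes FF FE 00 00 could also be a utf-16 LE BOM followed by a NUL character, so the sketch simply prefers utf-32 LE in that case. A result of None only means there was no BOM, not that the data isn't utf-8 or utf-16.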