Archived Forum Post

Question:

What are the actual bytes that are encrypted when a string is encrypted?

Jul 20 '16 at 11:18

Can you please clarify?

  1. Using CkCrypt2W
  2. The AES block size (and therefore the IV length) is 16 bytes
  3. Hex encoded block (from sample): 000102030405060708090A0B0C0D0E0F

Is it okay to pass it as a hex-encoded Unicode string? Does it strip out the zeros?

For example, suppose my "block" were 3 bytes, "AAA", which hex-encodes to 414141.

Do you pass L"414141", which is actually the 16-bit (wide-character) "410041004100"? If the IV block size were 2, would it just use 4100 (that is, "A"), or would it internally convert 414141 to "AAA"?

Hope this makes sense. Basically, given that the AES block is 16 bytes, I'm asking:

  1. Do I pass a Unicode-encoded hex string (like 410041) or an ANSI hex string stored as a Unicode string (L"4141")?

  2. Or is it internally converting L"414141" to the utf-8 bytes 414141?

Just wanted to verify.


Answer

Encryption algorithms operate on bytes, so the byte representation of a string matters a great deal. Chilkat addresses this by recognizing that a "string" means different things in different programming languages: it can be an object, it can be implicitly utf-16, or it can be implicitly ANSI or utf-8.

The Chilkat solution is to provide a Crypt2.Charset property that indicates the byte representation to use for encryption. This way, you don't have to worry about the form of the string passed in, whether it comes from C# or is a "wchar_t *" in C++. However, where the type itself is ambiguous, such as a "const char *" in C/C++, you must tell the Chilkat API *what* is being passed (ANSI or utf-8).

So... let's say you have the string "É". This is a single accented character found in Western European languages.

In the iso-8859-1 character encoding, it is represented by a single byte: 0xC9
In the utf-8 character encoding, it is represented by two bytes: 0xC3 0x89
In the ucs-2 (or utf-16) character encoding, it is represented by two bytes: 0x00 0xC9 (big-endian order; in little-endian order, as used on Windows, it is 0xC9 0x00)

To control which byte representation is actually passed to the internal encryptor, set the Charset property to "ansi", "utf-8", "utf-16", or any other supported charset. The full set of accepted charsets is listed here: http://cknotes.com/chilkat-charsets-character-encodings-supported/

Regarding block size: that has nothing to do with the byte representation of characters. The block size is defined by the algorithm. For example, AES encryption in ECB/CBC modes always pads the input to a multiple of the AES block size, which is always 16 bytes regardless of key size (128, 192, or 256 bits). In CBC mode the IV is also exactly one block, 16 bytes.