Archived Forum Post

Question:

gzip CompressStringENC problems

Aug 13 '13 at 11:36

Hi

I'm having some problems with CompressStringENC when sending data to a third party.

If I use CompressStringENC and feed the result to UncompressStringENC I seem to end up with the original, but 'the other end' is having problems. Having used this online tool (the interface is a bit odd; the checkboxes seem to be the opposite of what you would expect) I can see there is something strange.

Initially the data I was dealing with had no characters above 127 and all was well. I then had to cope with a UK pound sterling sign and found a solution, although I was not totally comfortable with it. Now, however, I have data with European characters and things are not working at all.

Here are some test scenarios on which hopefully you can shed some light.

Pound sign first

If I start with this string: ABC£D, which in UTF-8 is 41 42 43 C2 A3 44

If I run it through CompressStringENC with UTF8 and base64 I get this back:

H4sIAAAAAAAAC3N0cn40Zc3hXS4A2N9+tQkAAAA=

If I run this through the base64 decode on the previously mentioned website (i-Tools) I get this:

1F 8B 08 00 00 00 00 00 00 0B 73 74 72 7E 34 65 CD E1 5D 2E 00 D8 DF 7E B5 09 00 00 00

and if I run that through their gzip decompress I get:

41 42 43 E2 94 AC C3 BA 44

which is obviously not where we started

If I repeat the same test but this time use CompressStringENC with ibm437 and base64, then the i-Tools base64 decode gives me this:

1F 8B 08 00 00 00 00 00 00 0B 73 74 72 3E B4 D8 05 00 1D AC E4 B8 06 00 00 00

and the i-Tools gzip decompress then gives me this:

41 42 43 C2 A3 44

which IS where we started

So although I'm giving it UTF8 data it doesn't appear to do the "right thing" ??
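
One reading that reproduces the nine bytes above: the UTF-8 bytes C2 A3 are treated as two single-byte OEM characters on the way into the component, and each of those characters is then encoded as UTF-8 a second time. The Python sketch below is only a model of that conversion, not Chilkat code, and it assumes the OEM code page is 437 (the post does not say which OEM page is in use).

```python
# Model of a double encoding. Assumption: the host treats the bytes as
# OEM code page 437 text before the string is encoded as UTF-8.

def hex_dump(data: bytes) -> str:
    """Format bytes the way they are written in the post."""
    return " ".join(f"{b:02X}" for b in data)

original = bytes([0x41, 0x42, 0x43, 0xC2, 0xA3, 0x44])  # "ABC£D" as UTF-8

# Step 1: every byte is read as one OEM character (C2 and A3 become two chars).
misread_text = original.decode("cp437")

# Step 2: that text is encoded as UTF-8, as requested by the charset argument.
double_encoded = misread_text.encode("utf-8")

print(hex_dump(double_encoded))
```

This prints 41 42 43 E2 94 AC C3 BA 44, the same bytes i-Tools reported for the UTF8 call.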

Now moving on to European characters

Let's take something like o with diaeresis: ABCöD, which is 41 42 43 C3 B6 44

If I pass this to CompressStringENC with ibm437 and base64 I get an empty string back

If I pass this to CompressStringENC with UTF8 and base64 I get:

H4sIAAAAAAAAC3N0cn40Zc7hJhcAgpvKUgkAAAA=

If this is then passed to the i-Tools base64 decode I get:

1F 8B 08 00 00 00 00 00 00 0B 73 74 72 7E 34 65 CE E1 26 17 00 82 9B CA 52 09 00 00 00

and giving that to its gzip decompress I get:

41 42 43 E2 94 9C C3 82 44

which again is not where we started
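
The nine bytes above, and the empty ibm437 result, are both consistent with the UTF-8 bytes being read as single-byte OEM text before the component sees them, provided the OEM code page is 850 rather than 437. That is an inference from the bytes, not something stated in the thread. Under that assumption B6 is read as 'Â', whose UTF-8 form is C3 82, and 'Â' has no equivalent in ibm437. A small Python check of that reasoning:

```python
utf8_bytes = "ABCöD".encode("utf-8")  # 41 42 43 C3 B6 44

# Read the UTF-8 bytes as single-byte OEM text under two candidate code pages.
as_cp850 = utf8_bytes.decode("cp850")
as_cp437 = utf8_bytes.decode("cp437")

# Bytes reported by the gzip decompress step in the post.
observed = bytes.fromhex("414243E2949CC38244")

matches_cp850 = as_cp850.encode("utf-8") == observed
matches_cp437 = as_cp437.encode("utf-8") == observed

# Can the text misread under cp850 be expressed in ibm437 at all?
try:
    as_cp850.encode("cp437")
    fits_ibm437 = True
except UnicodeEncodeError:
    fits_ibm437 = False

print(matches_cp850, matches_cp437, fits_ibm437)
```

This prints True False False: only code page 850 reproduces the observed bytes, and the misread text cannot be converted to ibm437.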

Hopefully the above makes sense and you can advise me what I am doing wrong, because at the moment any data containing European characters that we send to a government web service is getting rejected.

Thanks in advance


Answer

In any Chilkat API involving compression, encryption, etc., where it is important to know the exact byte representation of the string being compressed, encrypted, signed, etc., you'll find that the charset (i.e. character encoding) can be specified somehow, either by a function argument or by a property (typically named "Charset").

The Chilkat APIs are offered in many different programming languages. The goal is to make it possible to pass a string, such as "ABC£D", from any programming language and get the same results. The only way to do this is to tell the Chilkat API the exact character encoding to be used for the byte representation of the string when it comes to the point of compression, hashing, encrypting, signing, etc.

Therefore, if you call CompressStringENC, passing it "ABC£D" and "ibm437", you'll get the same result in any programming language regardless of the character encoding used for strings in that particular language. For example, ActiveX uses Unicode (BSTRs), C# and VB.NET use String objects, and C++ programs can use wchar_t strings or a "char *" that points to a null-terminated byte array whose character encoding your program must know (hence the "utf8" property on all Chilkat C++ classes).
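
To illustrate the point about the charset argument, here is a rough Python sketch. The function compress_string_enc is a hypothetical stand-in written for this example, not the Chilkat implementation; it only shows that the charset decides which bytes are compressed.

```python
import base64
import gzip

def compress_string_enc(text: str, charset: str, encoding: str = "base64") -> str:
    """Hypothetical stand-in for the idea behind CompressStringENC."""
    if encoding != "base64":
        raise ValueError("this sketch only handles base64")
    raw = text.encode(charset)  # the charset decides the exact bytes
    return base64.b64encode(gzip.compress(raw)).decode("ascii")

text = "ABC£D"
utf8_bytes = text.encode("utf-8")  # 41 42 43 C2 A3 44
oem_bytes = text.encode("cp437")   # 41 42 43 9C 44

# The same characters give different compressed payloads per charset.
for charset in ("utf-8", "cp437"):
    packed = compress_string_enc(text, charset)
    unpacked = gzip.decompress(base64.b64decode(packed))
    print(charset, unpacked.hex())
```

The base64 text produced here will not match Chilkat's output character for character, since gzip header fields differ between implementations; only the decompressed bytes are comparable.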


Answer

Thanks for the prompt reply.

I need to work with UTF-8 strings, and these are not native to Visual DataFlex, which uses OEM.

So I guess my next question would be: is there a version of CompressStringENC that does the compression and base64 encoding but does NOT attempt to change the character encoding of the source data?

At the moment it sounds like I am getting an extra translation I don't need but have no way to prevent it?

Would CompressMemory be an option? Except, as you may remember, VDF doesn't like a byte array as a return type, and in the past you have made a string version for us.

Thanks in advance for all your help


Answer

I don't think you understood my explanation. If you are working in Visual DataFlex, it means you are using the Chilkat ActiveX -- and when it comes to ActiveX/COM, regardless of programming language, strings are ALWAYS passed as utf-16 (BSTRs). Therefore, you MUST always tell it what charset you really want. If you want "utf-8", then pass "utf-8" for the argument indicating charset.

Likewise, when you decompress, you'll have to tell Chilkat the charset that needs to be used to correctly interpret the decompressed bytes -- are they utf-8, ibm437, or what? Chilkat needs to know how to interpret the decompressed bytes so that it gets the correct chars, and can then return the utf-16 string (BSTR) to your Visual DataFlex application.
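
The decompression side can be sketched the same way in Python. The helper uncompress_string_enc below is hypothetical and only models the idea: the decompressed bytes mean nothing as text until a charset is chosen.

```python
import base64
import gzip

def uncompress_string_enc(packed: str, charset: str) -> str:
    """Hypothetical stand-in: base64-decode, gunzip, then decode with charset."""
    raw = gzip.decompress(base64.b64decode(packed))
    return raw.decode(charset)

# Build a payload whose compressed bytes are known to be UTF-8.
payload = base64.b64encode(gzip.compress("ABC£D".encode("utf-8"))).decode("ascii")

right = uncompress_string_enc(payload, "utf-8")  # five characters, as intended
wrong = uncompress_string_enc(payload, "cp437")  # six characters: C2 and A3 split

# ascii() keeps the output printable on any console.
print(ascii(right), ascii(wrong))
```

Decoding with the wrong charset does not raise an error here; it quietly yields a different string, which is why the charset has to be stated explicitly.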


Answer

Thanks for the reply

OK I really want UTF-8 so ...

If I start with this string .... ABC£D ... and when I say that I mean specifically these bytes 41 42 43 C2 A3 44

If I run it through CompressStringEnc UTF8 base64 I get this back H4sIAAAAAAAAC3N0cn40Zc3hXS4A2N9+tQkAAAA=

If I then give that to something else and ask it to reverse the base64 and the gzip, I do NOT end up back with these bytes: 41 42 43 C2 A3 44. I get these instead: 41 42 43 E2 94 AC C3 BA 44

So how do I achieve my goal ?

Thanks again


Answer

When you give it to "something else", which I'm assuming you mean some non-Chilkat application, it's probably interpreting the bytes as ANSI, or ibm437, or another one byte per char encoding and therefore "C2" and "A3" are both interpreted as individual chars, instead of the utf-8 representation of '£'. Maybe you don't want utf-8. Maybe you want whatever the "something else" is expecting...


Answer

Hi

OK just to be clear

The real something else is the government gateway that only talks in UTF8

the something else I'm referring to above is the I-tools website that I found on the web and was using to try and work out why things were not working

Ultimately I want to start with a UTF8 string, gzip it, base64 encode it, and have the government gateway do the reverse process and end up with the original string.

Thanks again


Answer

I tested it, and everything looks perfect. If the string "ABC£D" is passed to gzip.CompressStringENC, with the charset arg equal to "utf-8" and the encoding arg equal to "base64", the result is:

H4sIAAAAAAAAC3N0cj602AUAHazkuAYAAAA=

If you then decode this and save the decoded bytes to a .gz file, and then use a tool such as 7-Zip to decompress, the resulting file contains the following bytes: 0x41 0x42 0x43 0xC2 0xA3 0x44, which is exactly as expected.
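
The same check can be done without 7-Zip. The Python sketch below decodes the base64 string above, and for comparison the string the asker reported earlier in the thread, and dumps the decompressed bytes. It uses only the standard library and makes no assumption about how either string was produced.

```python
import base64
import gzip

def unpack(b64_text: str) -> bytes:
    """Reverse the base64 encoding, then the gzip compression."""
    return gzip.decompress(base64.b64decode(b64_text))

chilkat_result = "H4sIAAAAAAAAC3N0cj602AUAHazkuAYAAAA="
asker_result = "H4sIAAAAAAAAC3N0cn40Zc3hXS4A2N9+tQkAAAA="

for label, b64_text in (("Chilkat test", chilkat_result),
                        ("asker's call", asker_result)):
    data = unpack(b64_text)
    print(label, len(data), " ".join(f"{b:02X}" for b in data))
```

The first payload is six bytes and the second is nine, so the two calls did not compress the same input bytes even though both passed "utf-8" as the charset.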


Answer

Hi

Thanks for keeping going with this

What you show is good news! ... except it's not what I'm seeing

If I do this

Move (Character(65)+Character(66)+Character(67)+Character(194)+Character(163)+Character(68)) to sData
Get ComCompressStringENC of hoZip sData "utf-8" "base64" to sZippedB64
I get this H4sIAAAAAAAAC3N0cn40Zc3hXS4A2N9+tQkAAAA=

If I do this

Move (Character(65)+Character(66)+Character(67)+Character(194)+Character(163)+Character(68)) to sData
Get ComCompressStringENC of hoZip sData "ibm437" "base64" to sZippedB64
I get this H4sIAAAAAAAAC3N0cj602AUAHazkuAYAAAA=  -Which is what you are getting with utf-8.

Please note I have no interest with ibm437 but it's the only way I can produce the 'correct' result for this specific test string ... but of course it's not the solution because it does not 'work' for me all of the time and should not cloud this discussion

Here is the code dataflex created for making the CompressStringENC call

Function ComCompressStringENC String llstrIn String llcharset String llencoding Returns String
    Handle hDispatchDriver
    String retVal
    Get phDispatchDriver to hDispatchDriver
    Send PrepareParams to hDispatchDriver 3
    Send DefineParam to hDispatchDriver OLE_VT_BSTR llstrIn
    Send DefineParam to hDispatchDriver OLE_VT_BSTR llcharset
    Send DefineParam to hDispatchDriver OLE_VT_BSTR llencoding
    Get InvokeComMethod of hDispatchDriver 41 OLE_VT_BSTR to retVal
    Function_Return retVal
End_Function

Hopefully this gives you some clues because at the moment we are seeing different results from the same starting point

Thanks again for all your help


Answer

OK, further to my long post above, I've tried something else and it appears to produce the correct result

This

Move ("ABC"+Character(156)+"D") to sData
Get ComCompressStringENC of hoZip sData "utf-8" "base64" to sZippedB64
produces this H4sIAAAAAAAAC3N0cj602AUAHazkuAYAAAA=

and this

Move ("ABC"+Character(148)+"D") to sXML  
Get ComCompressStringENC of hoZip sXML "utf-8" "base64" to sZippedB64
produces H4sIAAAAAAAAC3N0cj68zQUAPiCTjgYAAAA=

So strangely it looks like I should not do the UTF8 conversion BEFORE passing it to your component
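
A possible model of why this works, sketched in Python: the DataFlex string is OEM text, it is converted to Unicode on its way into the ActiveX component (the BSTR conversion described earlier in the thread), and the component then encodes it with the requested charset. The OEM code page used below is an assumption; code pages 850 and 437 both map byte 156 to '£' and byte 148 to 'ö'.

```python
import base64
import gzip

OEM = "cp850"  # assumed OEM code page for the DataFlex side

def bytes_compressed(oem_bytes: bytes, charset: str) -> bytes:
    """Model: OEM string -> Unicode (BSTR) -> bytes in the requested charset."""
    return oem_bytes.decode(OEM).encode(charset)

# Character(156) and Character(148) from the DataFlex snippets above.
pound = bytes_compressed(b"ABC" + bytes([156]) + b"D", "utf-8")
o_uml = bytes_compressed(b"ABC" + bytes([148]) + b"D", "utf-8")

# Payloads recovered from the two base64 strings quoted in this post.
posted_pound = gzip.decompress(base64.b64decode("H4sIAAAAAAAAC3N0cj602AUAHazkuAYAAAA="))
posted_o_uml = gzip.decompress(base64.b64decode("H4sIAAAAAAAAC3N0cj68zQUAPiCTjgYAAAA="))

print(pound == posted_pound, o_uml == posted_o_uml)
```

Under this model, passing the plain OEM character lets the single conversion to UTF-8 happen inside the component, whereas converting to UTF-8 first causes the bytes to be converted a second time.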


Answer

Another update .... I'm still not winning as far as the other end is concerned ... will keep trying