Archived Forum Post

Question:

Unzip a zipped folder whose files have Unicode filenames

Jan 20 '15 at 09:01

Suppose I need to extract the files from a zip, Att.zip. It contains a folder MyAtt, which holds two files: 你觉得简.txt and 兩猿聲啼不住.txt. I am using

    CkZipW Zip;
    Zip.UnlockComponent(...);
    Zip.put_OemCodePage(65001);
    Zip.OpenZip(L"Att.zip");
    Zip.Extract(aPath);
    ...

but the extracted files do not display their names correctly. (Att.zip was created correctly with 7-Zip.)

Any ideas what I should do? Thanks for your help!


Answer

When you say "cannot display their names correctly", how are you viewing their names? What is displaying the names and how?

Also, you can see what happened by turning on verbose logging (zip.put_VerboseLogging(true)) and then examining the contents of the zip's LastErrorText property.


Answer

This is still not useful information. Your code snippet above does not show anything relating to getting the zip entry's filename. For example, to list the files within the .zip, you would do something like this:

    // zip is a CkZipW that has already been unlocked
    bool success = zip.OpenZip(L"a.zip");
    if (!success) {
        wprintf(L"%ls\n", zip.lastErrorText());
        return;
    }

    //  Get the number of files and directories in the .zip
    int n = zip.get_NumEntries();
    wprintf(L"%d\n", n);

    for (int i = 0; i < n; i++) {
        CkZipEntryW *entry = zip.GetEntryByIndex(i);
        if (entry == 0) continue;

        if (!entry->get_IsDirectory()) {
            //  (the filename may include a path)
            wprintf(L"%ls\n", entry->fileName());
        }

        //  Each entry object returned by GetEntryByIndex must be deleted
        delete entry;
    }

You should be able to verify the Unicode characters in the returned entry filenames by checking their Unicode values (for example, printed in hex). If Chilkat returns the correct Unicode character values, then displaying those characters is entirely outside the scope of Chilkat. In other words, first verify that the correct string (the correct Unicode values) is being returned. If it is, but you still cannot display the characters correctly, then your problem is simply with displaying Unicode characters and has nothing to do with Chilkat Zip.
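
One way to check the values rather than how a console or font renders them is to print each character's code unit in hex. A minimal standard C++ sketch, assuming the wchar_t string returned by entry->fileName(); toHexCodeUnits is a hypothetical helper, not a Chilkat API:

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper (not part of Chilkat): format each wchar_t code unit of
// a string as at least 4 uppercase hex digits, separated by spaces.
// On Windows wchar_t holds UTF-16 code units; on most Unix systems it holds
// whole code points. For BMP characters such as CJK ideographs the printed
// values are the same either way.
std::string toHexCodeUnits(const wchar_t* s)
{
    std::string out;
    if (s == nullptr) return out;
    char buf[16];
    for (std::size_t i = 0; s[i] != L'\0'; ++i) {
        std::snprintf(buf, sizeof buf, "%04X", static_cast<unsigned>(s[i]));
        if (!out.empty()) out += ' ';
        out += buf;
    }
    return out;
}
```

Inside the listing loop you could call printf("%s\n", toHexCodeUnits(entry->fileName()).c_str()); and compare against the expected values: the four Chinese characters in 你觉得简.txt should appear as 4F60 89C9 5F97 7B80.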


Answer

@chilkat

I tried your suggestion and wrote LastErrorText to a .txt file. The Japanese and Simplified Chinese filenames are correctly encoded as UTF-8, but the Traditional Chinese one (the third one below) is not:

        filenameQP: tttest/Att_mytest3/=E3=82=BF=E3=83=BC=E3=81=AB=E3=82=88=E3=82=8B=E3=83=81=
        =E3=83=BC=E3=83=A0=E3=83=96=E3=83=AD=E3=82=B0.txt
        parseExtraCentralDirFields:
          ExtraHeaderId: 0xa
          ExtraHeaderLen: 32
        --parseExtraCentralDirFields
      --loadCentralDirInfo
      loadLocalFileHeader:
        localFileHeader:
          versionNeeded: 20
          compressionMethod: 8
          bitFlags: 0x800
          crc: 0xca871443
          compressedSize: 1861
          uncompressedSize: 15086
        --localFileHeader
      --loadLocalFileHeader
    --preLoadEntryInfo
    preLoadEntryInfo:
      index: 4
      loadCentralDirInfo:
        centralDirHeader:
          versionMadeBy: 63
          versionNeeded: 20
          compressionMethod: 8
          bitFlags: 0x800
          crc: 0x5bcbc6e7
          compressedSize: 7210
          uncompressedSize: 30406
        --centralDirHeader
        filenameQP: tttest/Att_mytest3/=E4=BD=A0=E8=A7=89=E5=BE=97=E7=AE=80=E4=BD=93=E5=AD=97=
        =E8=83=BD=E4=B8=8D=E8=83=BD=E8=A1=8C=E5=91=A2.txt
        parseExtraCentralDirFields:
          ExtraHeaderId: 0xa
          ExtraHeaderLen: 32
        --parseExtraCentralDirFields
      --loadCentralDirInfo
      loadLocalFileHeader:
        localFileHeader:
          versionNeeded: 20
          compressionMethod: 8
          bitFlags: 0x800
          crc: 0x5bcbc6e7
          compressedSize: 7210
          uncompressedSize: 30406
        --localFileHeader
      --loadLocalFileHeader
    --preLoadEntryInfo
    preLoadEntryInfo:
      index: 5
      loadCentralDirInfo:
        centralDirHeader:
          versionMadeBy: 63
          versionNeeded: 20
          compressionMethod: 8
          bitFlags: 0x0
          crc: 0x3b9a5245
          compressedSize: 1892
          uncompressedSize: 15134
        --centralDirHeader
        filenameQP: tttest/Att_mytest3/=B4=FA=B8=D5=B3=FC=A4U=C1c=C5=E9=A4=E5=A5=F3=A6W.txt
        parseExtraCentralDirFields:
          ExtraHeaderId: 0xa
          ExtraHeaderLen: 32
        --parseExtraCentralDirFields
      --loadCentralDirInfo
      loadLocalFileHeader:
        localFileHeader:
          versionNeeded: 20
          compressionMethod: 8
          bitFlags: 0x0
          crc: 0x3b9a5245
          compressedSize: 1892
          uncompressedSize: 15134
        --localFileHeader
      --loadLocalFileHeader
    --preLoadEntryInfo
    preLoadEntryInfo:
      index: 6
      loadCentralDirInfo:
        centralDirHeader:
          versionMadeBy: 63
          versionNeeded: 20
          compressionMethod: 8
          bitFlags: 0x0
          crc: 0x22f42080
          compressedSize: 208
          uncompressedSize: 555
        --centralDirHeader
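
The bitFlags lines in this log are the ZIP general-purpose bit flag. In the PKWARE ZIP specification (APPNOTE), bit 11 (0x800) is the language encoding flag, meaning the stored filename is UTF-8. It is set for index 4 and clear (0x0) for index 5. When the flag is clear, the spec's default is IBM code page 437, but in practice many tools store the creating system's local code page, which the archive does not record. A minimal sketch for checking a logged entry yourself: decode the filenameQP text back to raw bytes, then test the flag and whether the bytes are valid UTF-8. All three functions are hypothetical helpers, and the decoder handles only the =XX escapes seen in this log, not full quoted-printable:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Bit 11 (0x800) of the general-purpose bit flag: filename is UTF-8.
bool zipNameIsFlaggedUtf8(std::uint16_t bitFlags)
{
    return (bitFlags & 0x0800u) != 0;
}

// Turn the "=XX" escapes of a filenameQP log value back into raw bytes.
// An '=' not followed by two hex digits (e.g. a soft line break) is dropped.
std::string decodeQpName(const std::string& qp)
{
    auto hexVal = [](char c) -> int {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        return -1;
    };
    std::string out;
    for (std::size_t i = 0; i < qp.size(); ++i) {
        if (qp[i] != '=') {
            out += qp[i];
            continue;
        }
        if (i + 2 < qp.size()) {
            int hi = hexVal(qp[i + 1]);
            int lo = hexVal(qp[i + 2]);
            if (hi >= 0 && lo >= 0) {
                out += static_cast<char>(hi * 16 + lo);
                i += 2;
            }
        }
    }
    return out;
}

// True if the bytes are well-formed UTF-8 (rejects overlong forms,
// surrogates, and code points above U+10FFFF).
bool isWellFormedUtf8(const std::string& s)
{
    std::size_t i = 0;
    const std::size_t n = s.size();
    while (i < n) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t len = 0;
        std::uint32_t cp = 0;
        if (c < 0x80) { ++i; continue; }
        else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
        else return false;  // stray continuation byte or invalid lead byte
        if (i + len > n) return false;
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) return false;
            cp = (cp << 6) | (cc & 0x3F);
        }
        if ((len == 2 && cp < 0x80) || (len == 3 && cp < 0x800) ||
            (len == 4 && cp < 0x10000)) return false;  // overlong
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return false;
        i += len;
    }
    return true;
}
```

For the index 5 name, decodeQpName returns bytes beginning B4 FA, and isWellFormedUtf8 returns false because 0xB4 cannot start a UTF-8 sequence. For the index 4 name (with the two log lines joined), the bytes begin E4 BD A0 (你) and pass the check.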

Answer

The problem is not in Chilkat Zip; it is in your app's use of fopen and fwprintf.

You must open the file in binary mode. For example:

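    // Make sure you open with binary mode here ("wb" or "ab").
    // Escape the backslashes: in "C:\AAWorkarea\ting\..." the "\t" would be a TAB.
    FILE* ff = fopen("C:\\AAWorkarea\\ting\\ZipLog.txt", "wb");
    if (ff == NULL) return;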

...

    int n = zip.get_NumEntries();

    CkString strCrLf;
    strCrLf.append("\r\n");

    for (int i = 0; i < n; i++)
    {
        CkZipEntryW *entry = zip.GetEntryByIndex(i);
        if (entry == 0) continue;

        entry->put_VerboseLogging(true);

        // In binary mode, both fwrite and fwprintf (below) now write the same UTF-16 text.
        // fwrite takes (buffer, size of one element, number of elements, stream).
        fwrite(entry->fileName(), sizeof(wchar_t), wcslen(entry->fileName()), ff);
        fwrite(strCrLf.getUtf16(), strCrLf.getSizeUnicode(), 1, ff);

        fwprintf(ff, L"\n entry->fileName: %ls\n", entry->fileName());

        // Don't forget to delete...
        delete entry;
    }

    fclose(ff);
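
Why text mode breaks this output can be sketched in standard C++. On Windows, a stream opened in text mode writes every 0x0A byte passed to fwrite as 0x0D 0x0A. UTF-16LE text contains 0x0A bytes not only in a newline (0A 00) but also inside ordinary characters such as 上 (U+4E0A, stored as 0A 4E), so each inserted 0x0D shifts every later code unit by one byte. The two functions below are hypothetical helpers that only simulate this on an in-memory buffer; they are not part of Chilkat or the C runtime:

```cpp
#include <string>
#include <vector>

// Hypothetical helper: the bytes a binary-mode fwrite of a UTF-16 wchar_t
// buffer puts on disk on Windows (little-endian, no BOM, nothing translated).
std::vector<unsigned char> toUtf16LeBytes(const std::u16string& s)
{
    std::vector<unsigned char> bytes;
    for (char16_t unit : s) {
        bytes.push_back(static_cast<unsigned char>(unit & 0xFF)); // low byte first
        bytes.push_back(static_cast<unsigned char>(unit >> 8));   // then high byte
    }
    return bytes;
}

// Hypothetical simulation of a text-mode byte write on Windows: each 0x0A
// byte is emitted as 0x0D 0x0A, regardless of where it sits in a UTF-16 unit.
std::vector<unsigned char> simulateTextModeWrite(const std::vector<unsigned char>& in)
{
    std::vector<unsigned char> out;
    for (unsigned char b : in) {
        if (b == 0x0A) out.push_back(0x0D);
        out.push_back(b);
    }
    return out;
}
```

For u"A\n", the binary-mode bytes are 41 00 0A 00 (4 bytes), while the simulated text-mode output is 41 00 0D 0A 00 (5 bytes), an odd length that can no longer be read as UTF-16. A lone 上 becomes 0D 0A 4E, which is why names can be garbled even when they contain no newline.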