Question:
I'm trying to use the Zip component to wrangle a large quantity of small data files into a single archive. I can't seem to find an example of updating a file inside a zip, so I came up with the following:
Dim binDat, success, entry, zip, fac
Set zip = CreateObject("Chilkat.Zip2")
Set fac = CreateObject("Chilkat.FileAccess")
success = zip.UnlockComponent("****")
' obtain new information to use to update an existing entry
binDat = fac.ReadEntireFile("Updated_Info.txt")
' open large zip containing 20,000+ data files
success = zip.OpenZip("Large_Zip.zip")
' this is where we need to supply our update
Set entry = zip.GetEntryByName("Info.txt")
success = entry.ReplaceData(binDat)
success = zip.WriteZipAndClose()
MsgBox "done"
This appears to rewrite the entire zip to a temporary file (about 3 seconds) and replace the original.
I was wondering whether this is the ideal way to perform this task, and,
if done in this manner, whether I can consider the updated zip safely written to disk by the time my script alerts "Done". I was thinking I could read the file back to verify, unless that would be an enormous waste of resources...
The reason it must rewrite the entire zip is the structure of the zip file format (see below). Unfortunately, the WriteZipAndClose method writes the entire .zip, which is the easiest and most compact way to produce the desired zip (but not the fastest). Even if a method existed to update a single entry in place, it would only work if the new data were equal to or smaller (after compression) than the old data. If it were bigger, the quickest way to update would be to "unlink" the existing entry, leaving a hole of unused space within the .zip, and then append a new entry (containing the new data) to the end.
[Image: ZIP file format diagram, from http://en.wikipedia.org/wiki/File:ZIP_File_Format.png]
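For anyone curious about the mechanics, here is a rough sketch of what such a rewrite involves, in Python with the standard zipfile module rather than the Chilkat component (the file and entry names are taken from the question above): copy every entry except the one being replaced into a temporary archive, write the new data in its place, then swap the temporary file over the original.

```python
import os
import zipfile

def replace_entry(zip_path, entry_name, new_data):
    """Replace one entry by rewriting the whole archive to a temp file,
    then swapping it over the original."""
    tmp_path = zip_path + ".tmp"
    with zipfile.ZipFile(zip_path, "r") as src, \
         zipfile.ZipFile(tmp_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            if item.filename == entry_name:
                dst.writestr(item.filename, new_data)        # new contents
            else:
                dst.writestr(item, src.read(item.filename))  # copied as-is
    os.replace(tmp_path, zip_path)  # atomic swap on the same filesystem

# e.g. replace_entry("Large_Zip.zip", "Info.txt", updated_bytes)
```

Writing to a temporary file and renaming it over the original is the usual reason you see a temp file appear during the operation: it means a crash mid-rewrite leaves the old archive intact.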
Thank you for the quick reply, and sorry to get back to you so late.
I expected something like that was the cause of the rewrite. In fact, that's largely why I want to use the zip format to archive the data files: I want to be able to read them quickly if need be, but uncompressed they take up way too much space, since compressed they are mostly under 4 KB!
When I observed the temp file and finally determined that that was what it was doing, I decided I could simply cache updates to the archived files and batch them to reduce the rewrites. There is nothing "unfortunate" about that to me. It all seems like a pretty reasonable way to accomplish what I need.
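That batching scheme can be sketched the same way. Below is a hypothetical Python helper (again using the standard zipfile module, not the Chilkat API) that caches pending updates in memory and applies them all in a single rewrite, so N updates cost one archive rewrite instead of N:

```python
import os
import zipfile

class BatchedZipUpdater:
    """Cache entry updates in memory and flush them in one rewrite."""

    def __init__(self, zip_path):
        self.zip_path = zip_path
        self.pending = {}          # entry name -> new bytes

    def update(self, name, data):
        self.pending[name] = data  # later updates overwrite earlier ones

    def flush(self):
        if not self.pending:
            return
        tmp = self.zip_path + ".tmp"
        with zipfile.ZipFile(self.zip_path, "r") as src, \
             zipfile.ZipFile(tmp, "w", zipfile.ZIP_DEFLATED) as dst:
            for item in src.infolist():
                data = self.pending.pop(item.filename, None)
                if data is not None:
                    dst.writestr(item.filename, data)            # updated
                else:
                    dst.writestr(item, src.read(item.filename))  # unchanged
            for name, data in self.pending.items():              # new entries
                dst.writestr(name, data)
        self.pending.clear()
        os.replace(tmp, self.zip_path)
```

The dictionary also deduplicates for free: if the same entry is updated twice between flushes, only the latest version is written.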
I think you have answered my first question: that is quite likely the best way to do it. But I was wondering if you could address the second question: when the COM object releases control back to my application, can I assume that the new version of the zip has completely finished writing? Or is it still processing, and will only be done when it finishes? I know it's a synchronous call, so my expectation is that it is finished, and I can begin the next batch as soon as the call returns. But if it is still processing, I could read the information back to verify its existence (and simultaneously its integrity). I wouldn't want to do that if it is unnecessary.
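If you do decide to read the archive back, the check need not re-read all 20,000+ entries. A hypothetical Python sketch of the trade-off: opening the archive validates the central directory at the end of the file (cheap, and enough to detect a truncated write), while reading an entry also verifies its CRC (costlier, so spot-check only what you just updated).

```python
import zipfile

def verify_zip(zip_path, spot_check=None):
    """Cheap post-write sanity check.

    Opening the archive parses the central directory, which sits at the
    end of the file, so a truncated write fails immediately. Reading one
    entry additionally verifies that entry's CRC on decompression.
    Returns the number of entries found.
    """
    with zipfile.ZipFile(zip_path) as z:
        names = z.namelist()      # fails if the directory is missing/truncated
        if spot_check is not None:
            z.read(spot_check)    # CRC-checked read of the updated entry
        return len(names)

# e.g. verify_zip("Large_Zip.zip", spot_check="Info.txt")
```

A full-CRC sweep of every entry would be the "enormous waste of resources" you mention; the directory-plus-spot-check version costs almost nothing by comparison.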
(sorry to post this in the answer section and not the comments, but I have grossly surpassed my char limit...)