Hi, I've been using Chilkat SFTP to download files using Python 2.7 for a while now and all has been great. Now, though, it's time to make the switch to Python 3.3. I'm having a problem. I'm using the latest Chilkat SFTP (chilkat-9.4.1-python-3.3-win32.zip). I'm not entirely sure what the version of Chilkat in my currently-running Python 2.7 version is, but it's from approximately Aug 2011.
I'm downloading the file a chunk at a time, to allow a progress display. I'm getting a UnicodeDecodeError when I call the getBytes() (or getData()) method of a CkByteData object. Calling put_Utf8() with 1 or 0 first makes no difference.
    def download_file(self, srcfilename, dstfilename):
        oh = open(dstfilename, 'wb', buffering=0)
        self.curbytes = 0
        chunk = chilkat.CkByteData()
        #chunk.put_Utf8(1)
        handle = self.sftp.openFile(srcfilename, "readOnly", "openExisting")
        chunksize = 16384
        eof = False
        while not eof:
            success = self.sftp.ReadFileBytes(handle, chunksize, chunk)
            if success:
                cnt = chunk.getSize()
                #print("chunk.get_Utf8()", chunk.get_Utf8())
                ## error happens on next line
                #data = chunk.getData()
                data = chunk.getBytes()
                oh.write(data)
                self.curbytes += cnt
            else:
                self.logger.error(self.sftp.lastErrorText())
                break
            eof = self.sftp.Eof(handle)
        self.sftp.CloseHandle(handle)
        oh.close()
You can see my commented-out debugging attempts. This is the error I get:
    Traceback (most recent call last):
      File "D:\temp\inteng\intengthread.py", line 235, in run
        data = next(work)
      File "D:\temp\inteng\src_sftp_p.py", line 279, in get
        self.download_file(srcfilename, dstfilename)
      File "D:\temp\inteng\src_sftp_p.py", line 386, in download_file
        data = chunk.getBytes()
      File "D:\python33\lib\site-packages\chilkat.py", line 322, in getBytes
        def getBytes(self): return _chilkat.CkByteData_getBytes(self)
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe2 in position 12: invalid continuation byte
And the lastErrorText right before the call that generates the error:
    ChilkatLog:
      ReadFileBytes:
        DllDate: Aug 15 2013
        ChilkatVersion: 220.127.116.11
        UnlockPrefix: HILLASSSH
        Username: DEV-LARRYLAPTOP:lkeber
        Architecture: Little Endian; 32-bit
        Language: Windows Python
        VerboseLogging: 0
        SshVersion: SSH-2.0-CerberusFTPServer_6.0
        SftpVersion: 3
        handle: 2E2F646F776E6C6F61645F746573742F486569646953514C5F382E325F506F727461626C652E7A6970
        numBytes: 16384
        nextReadIdx: 0
        downloadLoop:
          socketOptions:
            SO_SNDBUF: 8192
            SO_RCVBUF: 8192
            TCP_NODELAY: 1
          --socketOptions
          startingOffset: 0
          numBytesToDownload: 16384
          bReadUntilEnd: 0
          timeToDownloadFileDataMs: Elapsed time: 1029 millisec
        --downloadLoop
        NumBytesSentToOutput: 16384
        newNextReadIdx: 16384
        numBytesReceived: 16384
        numBytesReturned: 16384
        Success.
      --ReadFileBytes
    --ChilkatLog
The ReadFileBytes method does not try to interpret the received bytes according to any character encoding. The put_Utf8() setting never matters for any method that downloads bytes into a CkByteData. It only applies to methods that return strings, such as ReadFileText. (If ReadFileBytes were to interpret the received bytes according to some character encoding, it would be impossible to download anything other than text: JPGs, ZIPs, GIFs, and any other binary file that does not actually contain text would fail.)
The ReadFileBytes is returning the exact bytes as received from the server.
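To illustrate why raw file bytes cannot safely go through a text decoder, here is a minimal pure-Python sketch. It does not use Chilkat at all: the `chunk` value is a made-up stand-in for binary file data, and `try_decode` is a hypothetical helper. It only shows that forcing arbitrary bytes through a UTF-8 decode produces the same kind of error as in the traceback above.

```python
# Stand-in for a chunk of binary file data (NOT real Chilkat output):
# twelve ASCII-range bytes, then 0xE2 followed by a byte that is not a
# valid UTF-8 continuation byte.
chunk = b"PK\x03\x04" + b"ABCDEFGH" + b"\xe2\x41"


def try_decode(data):
    """Return (text, None) on success or (None, error) on failure."""
    try:
        return data.decode("utf-8"), None
    except UnicodeDecodeError as exc:
        return None, exc


text, err = try_decode(chunk)
print(err)          # the decode error, if any
print(len(chunk))   # the bytes themselves are untouched
```

The bytes are fine as bytes; the failure only appears when something tries to treat them as UTF-8 text.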
Posting this as an Answer, rather than Comment, due to size limitations:
Thanks for your response! It makes sense that ReadFileBytes would make no attempt to do anything with the bytes, but it's still throwing the UnicodeDecodeError when I call chunk.getBytes(). I've even tried running the example code here: http://www.example-code.com/python/sftp_streamingDownload.asp
with the same results (after changing the print statements to print() functions and adding my license key and server information). The example does the same thing: line 65 throws a UnicodeDecodeError.
I suspect something in the C-to-Python interface is doing the data type conversion wrong for Python 3, but I'm new to Python 3 myself, and have never tried doing any C interfaces myself, so I can't speculate on the problem any more usefully than that... Possibly it's returning a unicode() object rather than a bytearray?
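For what it's worth, this is roughly what I'd expect the chunked loop to look like once each chunk really does come back as bytes. It's only a sketch with no Chilkat calls in it: `copy_chunks` and `read_chunk` are hypothetical names of my own, and an in-memory `io.BytesIO` stands in for the remote file. The type check is there because in Python 3 a file opened in 'wb' mode accepts bytes-like data, not str.

```python
import io


def copy_chunks(read_chunk, out, on_progress=None):
    """Write chunks from read_chunk() to out until it returns an empty value.

    read_chunk is a hypothetical callable standing in for whatever
    produces each chunk; it is not part of the Chilkat API.
    """
    total = 0
    while True:
        data = read_chunk()
        if not data:
            break
        # A binary-mode file needs bytes-like data, so fail loudly on str.
        if not isinstance(data, (bytes, bytearray)):
            raise TypeError("expected bytes, got %s" % type(data).__name__)
        out.write(data)
        total += len(data)
        if on_progress is not None:
            on_progress(total)
    return total


# In-memory stand-ins for the remote file and the local output file.
source = io.BytesIO(b"\x00\xe2\x41\xff" * 10000)
dest = io.BytesIO()
seen = []
total = copy_chunks(lambda: source.read(16384), dest, seen.append)
print(total, seen)
```

If the wrapper handed back a str instead of bytes, this version would stop at the type check instead of writing anything to the output file.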