Archived Forum Post



Python 3.3 SFTP download unicode problem

Jan 02 '14 at 11:46

Hi, I've been using Chilkat SFTP to download files with Python 2.7 for a while now, and everything has been great. Now it's time to switch to Python 3.3, and I'm having a problem. I'm using the latest Chilkat SFTP. I'm not sure exactly which version of Chilkat my current Python 2.7 setup uses, but it's from approximately Aug 2011.

I'm downloading the file one chunk at a time so I can display progress. Calling the getBytes() (or getData()) method of a CkByteData object raises a UnicodeDecodeError. Calling put_Utf8() with 1 or 0 beforehand makes no difference.


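import chilkat

def download_file(self, srcfilename, dstfilename):
    self.curbytes = 0
    chunk = chilkat.CkByteData()
    chunksize = 16384

    handle = self.sftp.openFile(srcfilename, "readOnly", "openExisting")
    try:
        with open(dstfilename, 'wb', buffering=0) as oh:
            eof = False
            while not eof:
                success = self.sftp.ReadFileBytes(handle, chunksize, chunk)
                if not success:
                    break  # stop on a read error instead of looping forever

                cnt = chunk.getSize()
                #print("chunk.get_Utf8()", chunk.get_Utf8())
                ## UnicodeDecodeError is raised on the next call (getData() fails the same way)
                #data = chunk.getData()
                data = chunk.getBytes()
                oh.write(data)

                self.curbytes += cnt  # running total for the progress display
                eof = self.sftp.Eof(handle)
    finally:
        self.sftp.CloseHandle(handle)  # release the remote file handle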


You can see my commented-out debugging attempts. This is the error I get:

Traceback (most recent call last):
  File "D:\temp\inteng\", line 235, in run
    data = next(work)
  File "D:\temp\inteng\", line 279, in get
    self.download_file(srcfilename, dstfilename)
  File "D:\temp\inteng\", line 386, in download_file
    data = chunk.getBytes()
  File "D:\python33\lib\site-packages\", line 322, in getBytes
    def getBytes(self): return _chilkat.CkByteData_getBytes(self)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe2 in position 12: invalid continuation byte


And here is the lastErrorText from just before the call that raises the error:

    DllDate: Aug 15 2013
    UnlockPrefix: HILLASSSH
    Username: DEV-LARRYLAPTOP:lkeber
    Architecture: Little Endian; 32-bit
    Language: Windows Python
    VerboseLogging: 0
    SshVersion: SSH-2.0-CerberusFTPServer_6.0
    SftpVersion: 3
    handle: 2E2F646F776E6C6F61645F746573742F486569646953514C5F382E325F506F727461626C652E7A6970
    numBytes: 16384
    nextReadIdx: 0
        SO_SNDBUF: 8192
        SO_RCVBUF: 8192
        TCP_NODELAY: 1
      startingOffset: 0
      numBytesToDownload: 16384
      bReadUntilEnd: 0
      timeToDownloadFileDataMs: Elapsed time: 1029 millisec
    NumBytesSentToOutput: 16384
    newNextReadIdx: 16384
    numBytesReceived: 16384
    numBytesReturned: 16384
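On this server, the handle in the log above happens to be the remote path written out in hex, which is a quick way to confirm what kind of file is being read. Decoding it with the short Python 3 snippet below shows a .zip archive: binary data with no reason to be valid UTF-8.

```python
# The "handle" value from the lastErrorText log above, copied verbatim.
handle_hex = "2E2F646F776E6C6F61645F746573742F486569646953514C5F382E325F506F727461626C652E7A6970"

# The handle is plain hex; decoding it yields the remote path being read.
remote_path = bytes.fromhex(handle_hex).decode("ascii")
print(remote_path)  # ./download_test/HeidiSQL_8.2_Portable.zip
```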


The ReadFileBytes method does not try to interpret the received bytes according to any character encoding. put_Utf8() never matters for a method that downloads bytes (CkByteData); it only applies to methods that return strings, such as ReadFileText. (If ReadFileBytes were to interpret the received bytes according to some character encoding, it would be impossible to download anything other than text files: no JPGs, zips, GIFs, or any other binary file that does not actually contain text.)

ReadFileBytes returns the exact bytes as received from the server.
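To see why a strict UTF-8 decode breaks on binary data in Python 3, this sketch uses three made-up bytes (not taken from the file in question) in which 0xE2 announces a three-byte sequence but the next byte is not a continuation byte. Decoding them produces the same kind of error as the traceback, while writing the same bytes to a file opened in 'wb' mode involves no codec at all.

```python
import os
import tempfile

# Made-up sample: 0xE2 starts a 3-byte UTF-8 sequence, but 0x28 ("(") is not
# a continuation byte (continuation bytes have the bit pattern 10xxxxxx).
chunk = b"\xe2\x28\xa1"

try:
    chunk.decode("utf-8")
    message = None
except UnicodeDecodeError as exc:
    message = str(exc)
print(message)  # 'utf-8' codec can't decode byte 0xe2 in position 0: invalid continuation byte

# Binary-mode files take bytes as-is; nothing is decoded on the way to disk.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "chunk.bin")
    with open(path, "wb") as fh:
        fh.write(chunk)
    with open(path, "rb") as fh:
        round_trip = fh.read()
print(round_trip == chunk)  # True
```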


Posting this as an Answer rather than a Comment due to size limitations:

Thanks for your response! It makes sense that ReadFileBytes would make no attempt to do anything with the bytes, but chunk.getBytes() still throws the UnicodeDecodeError. I've even tried running the example code here:

with the same results (after converting the print statements to print() calls and adding my license key and server information). The example fails the same way: line 65 throws a UnicodeDecodeError.

I suspect something in the C-to-Python interface is doing the data type conversion wrong for Python 3, but I'm new to Python 3 and have never written a C interface myself, so I can't usefully speculate beyond that. Possibly it's returning a unicode (str) object rather than a bytearray?
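The suspicion above can be illustrated without Chilkat at all. In this sketch, get_as_str and get_as_bytes are hypothetical stand-ins for a C-extension getter, not Chilkat's actual wrapper code: one hands the buffer to Python 3 as str through a strict UTF-8 decode, which fails on arbitrary binary data, while the other returns bytes and round-trips every byte value. The last lines show one possible workaround under an assumption: if the library can return the chunk as hex text (check the Chilkat reference for whether such a method exists and what it is called), that ASCII string survives any str conversion, and binascii.unhexlify recovers the exact bytes.

```python
import binascii

def get_as_str(raw):
    """Hypothetical getter mimicking the suspected bug: bytes are decoded as UTF-8."""
    return raw.decode("utf-8")  # strict decoding rejects arbitrary binary data

def get_as_bytes(raw):
    """Hypothetical getter with the expected behaviour: bytes come back untouched."""
    return bytes(raw)

raw = bytes(range(256))  # every possible byte value, standing in for binary file data

try:
    get_as_str(raw)
    str_ok = True
except UnicodeDecodeError:
    str_ok = False

bytes_ok = get_as_bytes(raw) == raw
print(str_ok, bytes_ok)  # False True

# Assumed workaround: hex text is pure ASCII, so it survives a str conversion,
# and unhexlify turns it back into the original bytes.
hex_text = binascii.hexlify(raw).decode("ascii")
recovered = binascii.unhexlify(hex_text)
print(recovered == raw)  # True
```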