Archived Forum Post

Question:

Russian Text problem for PostUrlEncoded

Dec 03 '12 at 00:54

When I send Russian text in a POST parameter with PostUrlEncoded, I get unreadable text on the server. I tried a simple WebRequest for the same purpose and it worked fine. Both examples are given below. How can I get a readable Russian POST parameter using Chilkat?

===========WebRequest example===========

            // Create a request using a URL that can receive a post. 
            WebRequest request = WebRequest.Create("http://synoparser.ru/clientsite/cookdir/");
            // Set the Method property of the request to POST.
            request.Method = "POST";
            // Create POST data and convert it to a byte array.
            string postData = "pass=мы сюда пришли";
            byte[] byteArray = Encoding.UTF8.GetBytes(postData);
            // Set the ContentType property of the WebRequest.
            request.ContentType = "application/x-www-form-urlencoded";
            // Set the ContentLength property of the WebRequest.
            request.ContentLength = byteArray.Length;
            // Get the request stream.
            Stream dataStream = request.GetRequestStream();
            // Write the data to the request stream.
            dataStream.Write(byteArray, 0, byteArray.Length);
            // Close the Stream object.
            dataStream.Close();
            // Get the response.
            WebResponse response = request.GetResponse();
            // Display the status.
            Console.WriteLine(((HttpWebResponse)response).StatusDescription);
            // Get the stream containing content returned by the server.
            dataStream = response.GetResponseStream();
            // Open the stream using a StreamReader for easy access.
            StreamReader reader = new StreamReader(dataStream);
            // Read the content.
            string responseFromServer = reader.ReadToEnd();

            rtbSourceCode.Text = responseFromServer;

            webBrowser1.DocumentText = rtbSourceCode.Text;

=============WebRequest example result POST:Array ( [pass] => мы сюда пришли ) ============

=======Chilkat example========

        Chilkat.Http http = new Chilkat.Http();
        http.UnlockComponent("My code");
        Chilkat.HttpRequest request = new Chilkat.HttpRequest();
        Chilkat.HttpResponse response = null;

        request.AddParam("pass", "мы сюда пришли");

        response = http.PostUrlEncoded("http://synoparser.ru/clientsite/cookdir/", request);

        rtbSourceCode.Text = response.BodyStr;

        webBrowser1.DocumentText = rtbSourceCode.Text;

=============Chilkat example result============= POST:Array ( [pass] => мы Ñюда пришли )

============


Answer

I can see from the WebRequest code snippet that the body of the HTTP request contains the URL-encoded utf-8 bytes of the param. The code snippet achieves this by explicitly getting the utf-8 bytes like this:

byte[] byteArray = Encoding.UTF8.GetBytes(postData);

With Chilkat, you also need to indicate utf-8 explicitly, but all it takes is setting the request.Charset property to "utf-8". If you also want the charset to be identified explicitly within the HTTP request, set the request.SendCharset property to true. (I don't think this is required in your case, though.)

In summary, add one line to set the Charset property, like this:

request.Charset = "utf-8";
request.AddParam("pass", "мы сюда пришли");


Answer

OK, I rewrote the code like this:

============

        Chilkat.Http http = new Chilkat.Http();
        http.UnlockComponent("fraafvHttp_DNNDTxQclLIC");
        Chilkat.HttpRequest request = new Chilkat.HttpRequest();
        Chilkat.HttpResponse response = null;

        request.Charset = "utf-8";
        request.AddParam("pass", "мы сюда пришли");

        response = http.PostUrlEncoded("http://synoparser.ru/clientsite/cookdir/", request);

        rtbSourceCode.Text = response.BodyStr;

        webBrowser1.DocumentText = rtbSourceCode.Text;

=============

Getting the following:

POST:Array ( [pass] => мы Ñюда пришли )


Answer

Set the http.SessionLogFilename property equal to the path of a log file that will be created by the Chilkat.Http object. Re-run your test and then examine the contents of the log file to see the exact HTTP request sent to the server.

You should see something like this:

---- Sending ----
POST /clientsite/cookdir HTTP/1.1
Content-Type: application/x-www-form-urlencoded;
Host: synoparser.ru
Content-Length: 83

pass=%D0%BC%D1%8B%20%D1%81%D1%8E%D0%B4%D0%B0%20%D0%BF%D1%80%D0%B8%D1%88%D0%BB%D0%B8

Also, if using an older version of the Chilkat lib, make sure to test using the latest version.
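
To check the log against an independent source, the expected body above can be reproduced with the .NET standard library alone. This is only a sketch: the names FormEncodingSketch and EncodeParam are made up for illustration, and it uses Uri.EscapeDataString rather than Chilkat to percent-encode the utf-8 bytes of the value.

```csharp
using System;
using System.Text;

static class FormEncodingSketch
{
    // Percent-encode a name/value pair using the UTF-8 bytes of each part.
    // Uri.EscapeDataString writes a space as %20, the same form as in the log excerpt.
    public static string EncodeParam(string name, string value)
    {
        return Uri.EscapeDataString(name) + "=" + Uri.EscapeDataString(value);
    }

    static void Main()
    {
        string body = EncodeParam("pass", "мы сюда пришли");

        // The encoded body is pure ASCII, so its byte count is the Content-Length.
        Console.WriteLine(body);
        Console.WriteLine(Encoding.ASCII.GetByteCount(body));
    }
}
```

If the session log shows the same body and the same Content-Length, the request side is fine and the problem has to be in how the response is decoded.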


Answer

By the way, in the log file I see the correct response:

========
POST:Array ( [pass] => мы сюда пришли )
===========

As far as I understand, I need to convert it to the corresponding charset.


Answer

I found that I need to convert response.BodyStr (which is in cp1251) to UTF-8. How can I do that? I would not ask if it were a stream, but it is already a string.


Answer

It shouldn't actually be necessary -- Chilkat should return the string already converted -- assuming it can detect that the HTTP response uses cp1251.

Please do this: send me the session log file (produced by setting the SessionLogFilename property) in a zipped attachment to support@chilkatsoft.com. Also, turn on verbose logging by setting the http.VerboseLogging property, re-run, and capture the contents of the http.LastErrorText property after the call to http.PostUrlEncoded. Send me the LastErrorText as well.


Answer

The problem is that the server indicates charset cp1251 in the response header:

HTTP/1.1 200 OK
Server: nginx/1.2.0
Date: Fri, 30 Nov 2012 19:14:55 GMT
Content-Type: text/html; charset=cp1251
Content-Length: 386
Connection: keep-alive
X-Powered-By: PHP/5.2.17
Set-Cookie: testcookiename=testcookienamevalue; expires=Fri, 30-Nov-2012 20:14:55 GMT; path=/clientsite/cookdir/

However, the body ACTUALLY contains utf-8. Chilkat is working as it should: The Content-Type header indicates that the type is text/html and that chars are represented using the cp1251 encoding. Therefore, Chilkat interprets the bytes of the body according to cp1251. This causes the corruption because the body actually contains utf-8. The bug is on the server. It should not indicate cp1251 when in fact the body contains utf-8.
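
The mismatch can be reproduced without Chilkat or the server. The sketch below is an illustration only: CharsetMismatchSketch and DecodeAs are hypothetical names, the byte array merely stands in for the response body, and the provider registration assumes .NET Core 3.0 or later (on .NET Framework the legacy code pages are available without it). It uses the .NET charset name "windows-1251" for what the header calls cp1251.

```csharp
using System;
using System.Text;

static class CharsetMismatchSketch
{
    // Decode the bytes with a named charset, the way a client does
    // when it trusts the charset given in the Content-Type header.
    public static string DecodeAs(byte[] body, string charsetName)
    {
        return Encoding.GetEncoding(charsetName).GetString(body);
    }

    static void Main()
    {
        // Assumption: .NET Core 3.0+ / .NET 5+, where legacy code pages
        // such as windows-1251 must be registered before use.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // Stand-in for the response body: the server really sends utf-8 bytes.
        byte[] body = Encoding.UTF8.GetBytes("мы сюда пришли");

        // Trusting the header: every utf-8 byte becomes its own cp1251 character.
        Console.WriteLine(DecodeAs(body, "windows-1251"));

        // Matching the actual bytes: the original text comes back.
        Console.WriteLine(DecodeAs(body, "utf-8"));
    }
}
```

The first line printed is garbled because each two-byte utf-8 sequence is read as two separate cp1251 characters. The bytes themselves are never damaged, only misread.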


Answer

Can I force Chilkat to treat the body as utf-8 (that is, disable the automatic charset detection)?


Answer

Instead of using response.BodyStr (which returns a string), use response.Body, which returns a byte array. You'll get the body bytes uninterpreted according to charset. You can then load the utf-8 bytes into a C# string.
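
One way to load the bytes, sketched here under assumptions: try a strict utf-8 decode first and fall back to the declared charset only if the bytes are not valid utf-8. BodyDecodingSketch and DecodeBody are hypothetical names, the byte arrays stand in for response.Body, and ISO-8859-1 is used as the fallback purely because it is available on every .NET runtime without extra setup.

```csharp
using System;
using System.Text;

static class BodyDecodingSketch
{
    // Strict decoder: throws on invalid input instead of silently
    // substituting replacement characters.
    static readonly Encoding StrictUtf8 = new UTF8Encoding(false, true);

    // Try utf-8 first; if the bytes are not valid utf-8, decode them
    // with the charset the server declared.
    public static string DecodeBody(byte[] body, Encoding declared)
    {
        try
        {
            return StrictUtf8.GetString(body);
        }
        catch (DecoderFallbackException)
        {
            return declared.GetString(body);
        }
    }

    static void Main()
    {
        Encoding declared = Encoding.GetEncoding("iso-8859-1");

        // Stand-in for response.Body when the server mislabels utf-8.
        byte[] utf8Body = Encoding.UTF8.GetBytes("мы сюда пришли");
        Console.WriteLine(DecodeBody(utf8Body, declared));

        // A lone 0xE9 byte is not valid utf-8, so the fallback is used.
        byte[] latin1Body = { 0xE9 };
        Console.WriteLine(DecodeBody(latin1Body, declared));
    }
}
```

This is a heuristic, not a guarantee: a short body in a single-byte charset can happen to be valid utf-8, so an explicit override is safer when the server is known to mislabel its responses.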


Answer

Thanks, it works. Here is my code (maybe it will be useful to somebody):

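            string sourceCode = "";

            if (supposedCharset == "")
            {
                // No override: let Chilkat decode the body using the charset from the response header.
                sourceCode = response.BodyStr;
            }
            else
            {
                // Override: take the raw body bytes and decode them with the charset we believe is really used.
                byte[] arbyte = response.Body;
                sourceCode = System.Text.Encoding.GetEncoding(supposedCharset).GetString(arbyte);
            }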