Archived Forum Post


Question:

How do we read the text content of .pdf links in C#?

Apr 28 '14 at 21:11

I want to be able to read the text content of .pdf links.

I use this to get the HTML of the links:

http://www.example-code.com/csharp/http_get_using_ssl_tls.asp

Chilkat.Http http = new Chilkat.Http();
// Send the HTTP GET and return the content in a string.
string html;
html = http.QuickGetStr("https://site.com/file.pdf");

Chilkat.Mime mime = new Chilkat.Mime();
mime.SetBodyFromPlainText(html);
string strPdfBody = mime.GetBodyDecoded();

With this I'm getting decoded text, but not the text that is in the .pdf. Can anyone tell me what I'm doing wrong?

Thanks


Answer:

AFAIK, no Chilkat library can extract plain text from a PDF file. You will need a library specifically designed to get text out of PDF files for this job. I personally use a library called QuickPDF, but there are others out there.
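
For what it's worth, part of the problem in the question is that a PDF is a binary file, not MIME text, so fetching it as a string and running it through a MIME decoder will not give you the page text. Below is a rough sketch of the download half only, assuming you are fine with .NET's HttpClient in place of Chilkat: it fetches the link as raw bytes and checks for the "%PDF-" signature before you hand those bytes to whichever PDF library you pick. The names PdfDownloadSketch, LooksLikePdf and DownloadAsync are made up for illustration, and the actual text extraction is deliberately left out because it depends on the library.

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

// Hypothetical helper class, not part of Chilkat or any PDF library.
public static class PdfDownloadSketch
{
    // Returns true when the data starts with the "%PDF-" signature.
    // This is a strict check at offset 0; some PDF readers are more lenient.
    public static bool LooksLikePdf(byte[] data)
    {
        byte[] signature = Encoding.ASCII.GetBytes("%PDF-");
        if (data == null || data.Length < signature.Length)
        {
            return false;
        }
        for (int i = 0; i < signature.Length; i++)
        {
            if (data[i] != signature[i])
            {
                return false;
            }
        }
        return true;
    }

    // Downloads the resource as raw bytes so the binary content is not
    // damaged by a text-encoding conversion.
    public static async Task<byte[]> DownloadAsync(string url)
    {
        using (var client = new HttpClient())
        {
            return await client.GetByteArrayAsync(url);
        }
    }

    public static void Main(string[] args)
    {
        if (args.Length == 0)
        {
            Console.WriteLine("usage: PdfDownloadSketch <url>");
            return;
        }

        byte[] data = DownloadAsync(args[0]).GetAwaiter().GetResult();

        if (LooksLikePdf(data))
        {
            // Assumption: this is where the bytes would go to a PDF
            // text-extraction library of your choice.
            Console.WriteLine("Downloaded " + data.Length + " bytes of PDF data.");
        }
        else
        {
            Console.WriteLine("The response does not look like a PDF.");
        }
    }
}
```

If you would rather stay with Chilkat for the download, the same idea applies: fetch the body as bytes instead of a string, then pass those bytes to the PDF library.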