  1. #1
    New Lounger
    Join Date
    Nov 2013
    Posts
    12
    Thanks
    3
    Thanked 0 Times in 0 Posts

    Query for Pro programmers: How to retrieve text string from a PDF file and print?

    Deleted.
    Last edited by kingpain; 2013-12-29 at 10:17. Reason: This post has no meaning and no useful code

  3. #2
    5 Star Lounger
    Join Date
    Jan 2010
    Location
    Los Angeles, CA
    Posts
    793
    Thanks
    3
    Thanked 27 Times in 25 Posts
    You won't be able to pull the text out of a PDF file with a plain stream reader. A PDF has a very specific internal layout, and the text is usually encoded or compressed rather than stored as readable characters. To read one, you normally need a PDF library that decodes the file and gives you access to its parts. Googling will turn up several PDF libraries usable from any .NET language; this stackoverflow post lists a few: http://stackoverflow.com/questions/6...ibrary-for-net
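
To illustrate why a plain read fails, here is a minimal Python sketch (not .NET, and not a real PDF parser). It builds a simplified stream object by hand, assuming the common case where the content stream is compressed with the FlateDecode filter (zlib/deflate). The sample text and the helper names `naive_search` and `decode_stream` are made up for this example.

```python
import zlib

# A tiny PDF-style content stream. "Tj" is the operator that shows a string.
content = b"BT /F1 12 Tf 72 720 Td (Hello from a PDF) Tj ET"

# Many PDF writers store content streams compressed with FlateDecode (zlib).
compressed = zlib.compress(content)

# Simplified version of what sits on disk: a dictionary, then the raw stream.
raw_object = (
    b"<< /Length " + str(len(compressed)).encode("ascii")
    + b" /Filter /FlateDecode >>\n"
    + b"stream\n" + compressed + b"\nendstream"
)


def naive_search(raw: bytes, needle: bytes) -> bool:
    """What a plain stream reader can do: look for the text in the raw bytes."""
    return needle in raw


def decode_stream(raw: bytes) -> bytes:
    """Cut out the bytes between 'stream' and 'endstream' and inflate them."""
    start = raw.index(b"stream\n") + len(b"stream\n")
    end = raw.rindex(b"\nendstream")
    return zlib.decompress(raw[start:end])


if __name__ == "__main__":
    print("found in raw bytes:    ", naive_search(raw_object, b"Hello from a PDF"))
    print("found after decoding:  ", b"Hello from a PDF" in decode_stream(raw_object))
```

A real file adds cross-reference tables, font encodings, and other filters on top of this, which is why a dedicated PDF library is the practical route.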

  4. #3
    New Lounger
    Join Date
    Mar 2010
    Location
    Norfolk, VA
    Posts
    1
    Thanks
    0
    Thanked 0 Times in 0 Posts
    To make matters potentially worse, the PDF may not contain text at all, only a graphic that looks like text. A PDF is basically a container. In many cases it holds formatted text (and graphics), as cafed00d stated, and with the proper tools your program can access that text. In other cases it holds only an image of the text. It's like the difference between an editable Word document and a printout of that document scanned into a JPG: they look the same, but they are two different representations of the same content. The only way to turn the graphic into text is to run OCR (optical character recognition) on it, which is not a perfect process.
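
As a rough way to tell the two cases apart before choosing a tool, here is a Python sketch of a heuristic. It only scans the raw bytes for font and image markers, so it is a guess, not a reliable test: newer PDFs can keep these dictionaries inside compressed object streams, where this scan cannot see them. The function name `guess_pdf_kind` and its return strings are invented for this example.

```python
import re

# "/Font" as a whole name (not /FontDescriptor, /FontFile, ...).
FONT_RE = re.compile(rb"/Font\b")
# Image XObjects are declared with "/Subtype /Image".
IMAGE_RE = re.compile(rb"/Subtype\s*/Image\b")


def guess_pdf_kind(raw: bytes) -> str:
    """Heuristic only: classify a PDF from markers visible in its raw bytes."""
    has_font = FONT_RE.search(raw) is not None
    has_image = IMAGE_RE.search(raw) is not None
    if has_font:
        # Fonts are declared, so some real text is probably present.
        return "probably contains text"
    if has_image:
        # Images but no fonts: likely a scan, so OCR would be needed.
        return "probably scanned, needs OCR"
    return "unknown"


if __name__ == "__main__":
    # Hypothetical fragments standing in for real file contents.
    text_like = b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>"
    scan_like = b"<< /Type /XObject /Subtype /Image /Width 1700 /Height 2200 >>"
    print(guess_pdf_kind(text_like))
    print(guess_pdf_kind(scan_like))
```

To try it on a file, read it in binary mode (`open(path, "rb").read()`) and pass the bytes in. A result of "unknown" means nothing was visible to the scan, not that the file is empty.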
