  1. #1
    New Lounger
    Join Date
    Nov 2012
    Posts
    20
    Thanks
    9
    Thanked 0 Times in 0 Posts

    PDF rendered text and OCR issue

    I know this may not be the correct group to post this in, but I've had plenty of my MS Office issues solved here, so I thought I'd try.

    I have Adobe Acrobat 9 Pro. I'm trying to run OCR on scanned PDFs, but some of them give me this error message:

    This page contains renderable text.

    And it won't continue with the OCR process.

    I've tried printing to the XPS printer, converting the result to PDF, and running OCR, but no luck.
    I've also tried extracting the problem pages, saving them as .eps (Encapsulated PostScript), then converting to PDF and running OCR. The recognized text is inconsistent, so that's not really working either.

    Any help is appreciated.

    Thanks,

    Chris Bowman

  3. #2
    Super Moderator
    Join Date
    May 2002
    Location
    Canberra, Australian Capital Territory, Australia
    Posts
    3,861
    Thanks
    0
    Thanked 179 Times in 165 Posts
    That sounds like someone has already done the OCR work. If the PDF is searchable (e.g. you can highlight and copy blocks of text), that's indeed the case. If so, all you need to do now is copy and paste the text into Word, or save the file as a Word document and then delete the page images.
    Cheers,

    Paul Edstein
    [MS MVP - Word]

  4. #3
    New Lounger
    Join Date
    Nov 2012
    Posts
    20
    Thanks
    9
    Thanked 0 Times in 0 Posts
    Hi Paul,

    The documents are mostly searchable, and I can copy and paste some text out of the PDF. The issue is that not everything in the document appears to have been OCR'd: I come across some text that I can't highlight or copy. That's when I print to XPS and follow the steps above. When that doesn't work, I try the .eps (Encapsulated PostScript) route, and that doesn't give the results I want either. I apologize for not making this clear.

    Thanks again.

  5. #4
    Super Moderator
    Join Date
    May 2002
    Location
    Canberra, Australian Capital Territory, Australia
    Posts
    3,861
    Thanks
    0
    Thanked 179 Times in 165 Posts
    If the unprocessed content is:
    • whole pages, you need to identify the pages that haven't been OCR'd and export them to a new file so you can do the OCR work on them; or
    • partial pages, that suggests there are sufficient problems with the page content (e.g. due to a poor scan and/or background shading) that some of it couldn't be converted. In that case, exporting for an OCR retry is unlikely to resolve the issue - those pages may simply need to be retyped.
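As a rough way to tell the two cases apart, here is a minimal Python sketch. It assumes you have already pulled the text layer of each page into a list of strings with whatever PDF tool you prefer; that extraction step is not shown. The function name `classify_pages` and the 200-character threshold are made up for illustration and would need tuning for real documents.

```python
from typing import Dict, List

# Hypothetical cut-off: pages whose text layer holds fewer non-space
# characters than this are flagged as possibly only partly OCR'd.
PARTIAL_THRESHOLD = 200


def classify_pages(
    page_texts: List[str], partial_threshold: int = PARTIAL_THRESHOLD
) -> Dict[str, List[int]]:
    """Sort 1-based page numbers into 'no_text', 'partial' and 'ok' buckets."""
    buckets: Dict[str, List[int]] = {"no_text": [], "partial": [], "ok": []}
    for number, text in enumerate(page_texts, start=1):
        visible = "".join(text.split())  # drop all whitespace before counting
        if not visible:
            buckets["no_text"].append(number)   # whole page still needs OCR
        elif len(visible) < partial_threshold:
            buckets["partial"].append(number)   # check by eye, may need retyping
        else:
            buckets["ok"].append(number)
    return buckets


if __name__ == "__main__":
    # Stand-in data: an empty page, a sparse page, and a full page.
    sample = ["", "Invoice 42", "lorem ipsum " * 50]
    print(classify_pages(sample))
```

Pages in `no_text` are the ones to export to a new file for OCR. Pages in `partial` need a look by eye, since a short title page would land there too even if it was recognized correctly.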
    Cheers,

    Paul Edstein
    [MS MVP - Word]

  6. #5
    New Lounger
    Join Date
    Feb 2014
    Posts
    5
    Thanks
    0
    Thanked 0 Times in 0 Posts
    There is an interesting thread here on the subject with some solutions: http://forums.adobe.com/thread/1067686

    Another solution might be to use more specialized OCR software. Here's a very capable OCR package: http://www.card-reader.com/cloud_ocr.htm


    Hope this helps,

    Michel
