  1. #1
    3 Star Lounger
    Join Date
    Jan 2001
    Location
    Melbourne, Victoria, Australia
    Posts
    314
    Thanks
    0
    Thanked 0 Times in 0 Posts

    BATCH OCR OF PDFS

    Being the proud owner of the very cheap $50 Canon LIDE 20 flatbed scanner, which comes with software to OCR
    and produce SEARCHABLE PDFS at the touch of a button, I was inspired to buy a MUCH MORE EXPENSIVE Fujitsu 4120
    scanner. This latter beast has DUPLEX scanning and an auto document feeder and is accompanied by Adobe Acrobat Standard 6.0.
    Aha! I said to myself, now I can have my very own PAPERLESS office - thanks to the wonderful text search of the free Adobe Reader.
    Then you can always find that elusive letter from a long lost Aunt by simply using Adobe's search engine and typing in 'Traufazz', or whatever
    appellation she blessed her Persian cat with.
    Not so simple.
    Unlike the CHEAP Canon scanner, the PDFS produced by Adobe's marvellous software are NOT searchable.
    You can make them searchable, but that requires another manual step that you must tediously apply to each and every document
    and... wait, seemingly forever. The results? Only so-so compared to the cheap, cheap Canon scanner.
    I ask you: what is the point of an automatic document feeder if you must then process every page, or series of pages, by selecting and clicking to get the same result?
    The point of the auto document feeder, or ADF to the cognoscenti, is that you shove your boring old letters and bank statements - alas even more boring in my case,
    especially 'post-scanner' - into the machine and it churns out a whole lot of PDFs (or JPGs etc.) without further intervention. It's all so easy.
    Sort of...
    Clicking, clicking, clicking - till the cows come home. The same cows refuse to pitch in and help, I might add.
    So, here is my question (which seems to have no answer despite hours of 'Googling'):
    Is there cost-effective software that will take my PDFs and convert them to SEARCHABLE PDFs, en masse?
    Something that will convert a whole folder of unsearchable PDFs into searchable ones, while I am out and about having some sort of life?
    Thanks!

  2. #2
    Platinum Lounger
    Join Date
    Nov 2001
    Location
    Melbourne, Victoria, Australia
    Posts
    5,016
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: BATCH OCR OF PDFS

    Don't know too much about this stuff, but you might find something here. Maybe Aldo's Text-PDF PRO+ will do it.

    Alan

  3. #3
    Platinum Lounger
    Join Date
    Jan 2001
    Location
    Quedgeley, Gloucester, England
    Posts
    5,333
    Thanks
    0
    Thanked 1 Time in 1 Post

    Re: BATCH OCR OF PDFS

    Have you tried Googling on "make pdf searchable"?

    Or people have suggested AutoIT to automate sequences of keystrokes...

    John

    Ita, esto, quidcumque...

  4. #4
    Platinum Lounger
    Join Date
    Nov 2001
    Location
    Melbourne, Victoria, Australia
    Posts
    5,016
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: BATCH OCR OF PDFS

    Veering off topic here, but ... holy [censored]! Visited the Adobe site and see that Acrobat v6 is a 200MB download. No, that's not a typo, that's 0.2GB for a program that does just one thing. Unless I've missed the part of the fine print that talks about turning base metals into gold, no thanks to that one. I'm sure earlier versions had nowhere near this level of bloat.

    Alan

  5. #5
    Platinum Lounger
    Join Date
    Jan 2001
    Location
    Quedgeley, Gloucester, England
    Posts
    5,333
    Thanks
    0
    Thanked 1 Time in 1 Post

    Re: BATCH OCR OF PDFS

    Alan

    I suppose they argue that you get an AWFUL (and I mean it literally!) lot of code for your money...!

    John

    Ita, esto, quidcumque...

  6. #6
    Super Moderator
    Join Date
    Dec 2000
    Location
    Renton, Washington, USA
    Posts
    12,560
    Thanks
    0
    Thanked 4 Times in 4 Posts

    Re: BATCH OCR OF PDFS

    When I go to http://www.adobe.com/products/acrobat/readstep2.html the size is much smaller. I think you are trying to download the full "Adobe" package and NOT just Adobe Reader.

    Now running HP Pavilion a6528p, with Win7 64 Bit OS.

  7. #7
    Platinum Lounger
    Join Date
    Nov 2001
    Location
    Melbourne, Victoria, Australia
    Posts
    5,016
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: BATCH OCR OF PDFS

    No, I don't mean the reader, but the full program as you suggest. I bought a copy of Acrobat some years back and it was nothing like the cosmic magnitude of that download on the website. For my basic needs, I found a very good shareware one that integrates with Word, and I think there are freeware ones that do a similar job. These are only a few MB... OK, about 15 - just checked. Still... 15 vs 200 - I'd want that base metal -> gold converter for the size & price of Acrobat.

    Alan

    Edited - Now this is rather spooky. Here I am wanting base metals turned to gold and woof! No sooner do I post it than I become bronze... well, it's goldISH. They do say to be careful what you wish for.

  8. #8
    5 Star Lounger
    Join Date
    Mar 2002
    Location
    Buenos Aires, Argentina
    Posts
    877
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: BATCH OCR OF PDFS

    Great link.

    I'd modify something in the steps to copy an image from a PDF file: don't just paste into Word, but have another app (such as PhotoEd) save it to JPG. Then INSERT it from Word instead of just pasting it as a raw bitmap or metafile; that will make the document much lighter (and, generally, less corruption-prone).

    Out of the list of freebies there's one I've personally tested and found really useful: CutePDF. It's been mentioned before here, maybe you want to give it a go.
    Diegol

  9. #9
    2 Star Lounger
    Join Date
    Jan 2001
    Location
    Windy Wellington, Wellington
    Posts
    123
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: BATCH OCR OF PDFS

    "Is there cost-effective software that will take my PDFs and convert them to SEARCHABLE PDFs, en masse?
    Something that will convert a whole folder of unsearchable PDFs into searchable ones, while I am out and about having some sort of life?"

    Omnipage Pro 12 has a utility called something like the Schedule Manager. I've used it to OCR a PDF file. Although I haven't tried it with multiple PDF files, the format of the interface certainly suggests you can.
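
    If you would rather script the "whole folder while I am out" idea than rely on a scheduler, it can be sketched roughly as below. This is only an illustration, not something Omnipage or Acrobat provides: it assumes Python and a command-line OCR tool that takes an input file and an output file (OCRmyPDF is used purely as a stand-in), and the function names and folder layout are made up for the example.

```python
import subprocess
from pathlib import Path


def plan_batch(src_dir, dst_dir):
    """Pair every PDF in src_dir with an output path of the same name in dst_dir."""
    src, dst = Path(src_dir), Path(dst_dir)
    return [(pdf, dst / pdf.name) for pdf in sorted(src.glob("*.pdf"))]


def ocr_command(input_pdf, output_pdf):
    # Assumption: a command-line OCR tool invoked as "tool input output".
    # OCRmyPDF is a stand-in here; swap in whatever tool you actually have.
    return ["ocrmypdf", str(input_pdf), str(output_pdf)]


def run_batch(src_dir, dst_dir, runner=subprocess.run):
    """OCR each PDF in turn, skipping finished ones; return names that failed."""
    Path(dst_dir).mkdir(parents=True, exist_ok=True)
    failed = []
    for input_pdf, output_pdf in plan_batch(src_dir, dst_dir):
        if output_pdf.exists():
            continue  # already converted on an earlier run
        result = runner(ocr_command(input_pdf, output_pdf))
        if result.returncode != 0:
            failed.append(input_pdf.name)
    return failed
```

    Calling something like run_batch("scans", "scans_searchable") (both folder names hypothetical) would leave the originals untouched and write the searchable copies to the second folder, so an interrupted run can simply be started again.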
    Keith Rodgers
