Thread: OCR software

  1. #1
    Gold Lounger
    Join Date
    Dec 2000
    Location
    New Hampshire, USA
    Posts
    3,386
    Thanks
    0
    Thanked 0 Times in 0 Posts

    OCR software

    It's been a while since there has been a thread on OCR software, so I'll ask here to get the latest opinions.

    What is the best software for:

    1. Converting PDF to a Word document.
    2. Scanning in documents that are mostly text and tables.

  2. #2
    5 Star Lounger
    Join Date
    Aug 2001
    Location
    Confoederatio Helvetica
    Posts
    602
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: OCR software

    Although I have no experience with the Windows version, my Mac version of Readiris is the best OCR I've ever used, and it handles multiple languages!

  3. #3
    Gold Lounger
    Join Date
    Dec 2000
    Location
    New Hampshire, USA
    Posts
    3,386
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: OCR software

    I could not resist the after-rebate price of $115.

    http://www.atomicpark.com/productDet...x?prodId=18877

  4. #4
    5 Star Lounger
    Join Date
    Mar 2002
    Location
    Buenos Aires, Argentina
    Posts
    877
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: OCR software

    Hi Howard,

    I don't know if there's a more accessible choice, but recent versions of Acrobat (full package), such as 5 and 6, can convert from PDF to RTF (ver. 5) and to DOC (ver. 6). Of course, the feasibility of the conversion will depend on whether the PDF is protected and on how the PDF was created. For instance, a PDF created out of images containing text will be impossible to convert to Word. Moreover, there are several ways to create PDFs (the technicalities behind them go beyond my knowledge), and I believe, for example, that a Word document can be converted to PDF in such a way that it would be impossible to convert back to Word. But I'm not sure, and if so, these cases are a minority.
    Diegol

  5. #5
    Gold Lounger
    Join Date
    Dec 2000
    Location
    New Hampshire, USA
    Posts
    3,386
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: OCR software

    I've got Acrobat 5. It does not do a good job of converting PDF to RTF.

    I doubt that one can guarantee 100% success in a round trip of Word to PDF to Word, but one can always convert PDF to Word; the result just might not be as good as the document originally created in Word.

    The trigger for all this is that I want to scan in some docs I wrote 17-18 years ago and get them into an editable Word form.
    In some cases, I will edit and re-issue the documents. For others, I'll just make them available.

    I've heard that OmniPage does a pretty good job. I'll find out soon, as the software was shipped to me earlier today.

    Also, I've got PDFs of some standards that were published a while ago, for which I was a/the main contributor.
    I want to compare them to my Word files, and maybe issue corrections.

  6. #6
    5 Star Lounger
    Join Date
    Mar 2002
    Location
    Buenos Aires, Argentina
    Posts
    877
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: OCR software

    It's highly likely that the conversion from PDF to Word won't be 100% exact. Image objects will very probably be lost along the way. When you convert text from Word to PDF, paragraphs turn into disassociated lines. What Word treats as one single paragraph breaks into a number of lines that look exactly as they did in Word (but the "object" is different) after the conversion to PDF. So when you convert back to RTF / DOC, you won't get the original paragraph "object" but a bunch of lines (this becomes more noticeable if you want to justify the text: it will seem to remain aligned to the left). This is analogous (to no more than this extent) to what happens with e-mails, when paragraphs break into several lines.
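
    The line-versus-paragraph problem described above can be partly undone after conversion. What follows is only a minimal sketch in Python, not what Acrobat or any converter actually does. It assumes the converted text has been saved as plain text and that blank lines still separate paragraphs, which a real converter may not guarantee; the function name `rejoin_lines` is made up for illustration.

```python
def rejoin_lines(text: str) -> str:
    """Merge hard-wrapped lines back into paragraphs.

    Assumption: a blank line marks a paragraph boundary. Every other
    line break is treated as a wrap inserted by the conversion.
    """
    paragraphs = []
    current = []  # lines collected for the paragraph being rebuilt
    for line in text.splitlines():
        stripped = line.strip()
        if stripped:
            current.append(stripped)
        elif current:
            # Blank line: close the paragraph collected so far.
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return "\n\n".join(paragraphs)


if __name__ == "__main__":
    sample = "What Word treats as one\nsingle paragraph breaks\ninto lines.\n\nSecond paragraph\nhere."
    print(rejoin_lines(sample))
```

    Once the lines are rejoined this way, Word can justify the text again because each paragraph is a single unit. Where the converter drops the blank lines, this heuristic would merge everything into one paragraph, so the result still needs a manual check.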

    My 2 cents.
    Diegol

  7. #7
    Super Moderator jscher2000's Avatar
    Join Date
    Feb 2001
    Location
    Silicon Valley, USA
    Posts
    23,112
    Thanks
    5
    Thanked 93 Times in 89 Posts

    Re: OCR software

    Do you have Office 2003? I was skimming Ed Bott and Woody Leonhard's Special Edition Using Office 2003, and they mention an OCR applet that comes with it. It reads TIFF files, though, not PDFs, from what I recall.

  8. #8
    Gold Lounger
    Join Date
    Dec 2000
    Location
    New Hampshire, USA
    Posts
    3,386
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: OCR software

    Yes, and that is to be expected.
    A PDF converter has no way of telling how you wish to break lines in Word; all the converter can do is convert lines, unless it wants to start guessing about justification. I'd rather not have it guess.
    Image objects can be included exactly as images, so no image needs to be lost.

  9. #9
    Gold Lounger
    Join Date
    Dec 2000
    Location
    New Hampshire, USA
    Posts
    3,386
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: OCR software

    The Office component is useful for scanning, but not OCR.

    For example, I just asked the critter to open a PDF file.
    It started Acrobat to process the PDF, then created an MDI (MSFT Document Imaging) file.
    When I open the MDI file, I can save as TIFF or MDI, so, though useful for scanning, it does not convert the PDF to a Word document.

    If I were to print the PDF and then scan it using MDI, I could evaluate the OCR.
    I'll do a scan now...
    It just created an image, no way to save as Word/RTF, just TIFF or MDI.

  10. #10
    Silver Lounger
    Join Date
    Jan 2001
    Location
    Swanzey, New Hampshire, USA
    Posts
    1,707
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: OCR software

    Howard remarked:
    "I've heard that OmniPage does a pretty good job. I'll find out soon as the software was shipped to me earlier today."

    I can't give you an honest personal appraisal of OmniPage because I haven't been able to find a place to download it to try it out. But I have used, and continue to use, the "prototype" of OmniPage: Textbridge Millennium Pro (version 9.5). I have tried several other recommended OCR applications, e.g., Abbyy FineReader, ReadIris (which I tried out yesterday and immediately uninstalled), and a couple of others some time back whose names I can't remember. The bottom line for me is that Textbridge was not only as accurate as any of these others, if not more so, but the GUI is MUCH easier to use and it has more USABLE features than any of the others, e.g., dual-page scanning, color magazine pages, multi-column pages, etc. So, my guess is that OmniPage should be a top-notch program in all areas. I'd be interested in hearing back from you after you have used OmniPage for a bit. I'm assuming you purchased OmniPage Pro 14?
    Jeff
    simul iustus et peccator

  11. #11
    Super Moderator
    Join Date
    Dec 2000
    Location
    Renton, Washington, USA
    Posts
    12,560
    Thanks
    0
    Thanked 4 Times in 4 Posts

    Re: OCR software

    I started out using Textbridge when it came with my first scanner. I used it until it was bought out and replaced by OmniPage. I have used almost all versions since the buyout and am currently using version 14. I have found no others that will keep up with OmniPage. It seems that the first version after the buyout contained the best of both.

    Now running HP Pavilion a6528p, with Win7 64 Bit OS.

  12. #12
    Silver Lounger
    Join Date
    Jan 2001
    Location
    Swanzey, New Hampshire, USA
    Posts
    1,707
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: OCR software

    Dave,

    Thanks for your insights. My version of Textbridge is the "Xerox" edition, which was the last one before the buyout. I checked today and I see that ScanSoft is still offering Textbridge Millennium 11. Would I be right in assuming that this is simply the OCR engine less all the frills etc. of OmniPage? If I ever decided to upgrade, it would be nice to know this, since the price differential between the two is significant. [grin]

    Jeff
    simul iustus et peccator

  13. #13
    Gold Lounger
    Join Date
    Dec 2000
    Location
    New Hampshire, USA
    Posts
    3,386
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: OCR software

    I have an old version of Textbridge that came with my scanner.

    1. It had a bug which needed a crude registry hack to overcome.
    2. My recollection, from the few times I tried to OCR with Textbridge, is that it was not that great.

    Heck, having Textbridge qualifies me for the upgrade price to OmniPage Office, so I guess it's worth $115 after rebate.
    It probably won't perform well, as I have a Pentium II and the system requirements state Pentium 3.
    In any case, I'd hold on to the software for use on my next PC.
    I'll need another PC for Visual Studio 2005 and, likely, the next version of Office.

  14. #14
    Super Moderator
    Join Date
    Dec 2000
    Location
    Renton, Washington, USA
    Posts
    12,560
    Thanks
    0
    Thanked 4 Times in 4 Posts

    Re: OCR software

    Since I upgraded to OmniPage, I thought that ScanSoft had stopped updating TextBridge and have paid no attention to it.

    Now running HP Pavilion a6528p, with Win7 64 Bit OS.

  15. #15
    Silver Lounger
    Join Date
    Jan 2001
    Location
    Swanzey, New Hampshire, USA
    Posts
    1,707
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: OCR software

    Howard,

    I really don't want to stretch this thread out with more chatter about "Textbridge", but in its defense, one must distinguish between the "basic" program which was bundled with many scanners and Textbridge Millennium Pro, which was far different from and vastly superior to the "basic" version. I speak from my own experience, of course, and vividly remember using the bundled "basic" version that came with my scanner and being very much disappointed with it. However, the Millennium Pro version which I eventually bought was more than I had hoped for, and I'm still using it today after these many years. The only "bug" that this version had was in regard to saving scanned documents to MS Word. But there was a fix called "Patch1 for the core"; once it was installed, the problem was resolved. It does have some issues which tech support said couldn't be resolved, e.g., the "Add" feature when proofreading/spellchecking doesn't work. But that to my mind is nothing more than a minor annoyance. Needless to say, since you have already purchased OmniPage Pro, I am confident that it will perform well.

    ENJOY! And do let me know what you think of its performance and features after you have had time to use it for a spell. [grin]

    Jeff
    simul iustus et peccator
