OCR recognizes table as two printed columns one after another
I am a member of a small, local, non-profit genealogy society, http://cfgs.org. One of our functions is to publish old birth and death records online where others can find them. Years ago someone typed many pages of tombstone records from local cemeteries. These tables contain the person's name along with a birth and/or death date. They are in an unlined table format with a large space between the name and dates.
We want to put these records online so people can find them. Several of us have tried scanning these records using the built-in OCR with our all-in-one printers. We have also scanned them into PDF format and used Acrobat Pro to recognize the text. In both cases the result is two columns, like newspaper columns, instead of a table. This disassociates the dates from the names, and when the text is placed in a word processor or spreadsheet we get one column with the dates below the names.
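To make the problem concrete, here is a minimal Python sketch of re-pairing that single-column output. It assumes the OCR text holds all the names first and then the same number of date lines in the same order, and that any line containing a four-digit year is a date line. The function name pair_columns, the year pattern, and the sample rows are invented for illustration and are not taken from our records.

```python
import csv
import io
import re

# Assumption: a line is a "date line" if it contains a 4-digit year (1600-2099).
YEAR = re.compile(r"\b(?:1[6-9]|20)\d\d\b")

def pair_columns(ocr_text):
    """Re-join names and dates that OCR emitted as one long column."""
    lines = [ln.strip() for ln in ocr_text.splitlines() if ln.strip()]
    names = [ln for ln in lines if not YEAR.search(ln)]
    dates = [ln for ln in lines if YEAR.search(ln)]
    # If a row had no date, the counts differ and the page needs a manual check.
    if len(names) != len(dates):
        raise ValueError(f"{len(names)} names but {len(dates)} date lines")
    return list(zip(names, dates))

def to_csv(rows):
    """Return the paired rows as CSV text that a spreadsheet can open."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["name", "dates"])
    writer.writerows(rows)
    return buf.getvalue()

# Made-up sample of what the OCR output looks like.
sample = "Smith, John\nDoe, Mary\n1823 - 1890\n1851 - 1902\n"
rows = pair_columns(sample)
print(to_csv(rows))
```

This only works when every name has a date line; a page with blank dates raises an error rather than silently mismatching rows.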
We really would like to avoid having volunteers manually enter these records into Excel or a database like Access. Does anyone have a suggestion on how to get these typed pages into a data table format?
Another, totally different problem is that neither Excel nor Access recognizes a number as a date prior to Jan 1, 1900, and many of our dates are older than this. This makes sorting or searching difficult.
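One possible workaround, offered as a sketch rather than a tested solution, is to store each date as text in year-month-day order (ISO 8601, for example 1872-03-03), which sorts chronologically even when the spreadsheet treats it as plain text. The example below assumes the typed dates look like "Mar 3, 1872" with English month abbreviations; the helper name to_iso and the sample records are hypothetical.

```python
from datetime import datetime

def to_iso(text, fmt="%b %d, %Y"):
    """Convert a date such as 'Mar 3, 1872' to the sortable text '1872-03-03'.

    Assumption: dates use English month abbreviations in this layout.
    Python's datetime accepts years well before 1900.
    """
    return datetime.strptime(text, fmt).date().isoformat()

# Made-up sample records: (name, death date as typed).
records = [
    ("Doe, Mary", "Jan 5, 1902"),
    ("Smith, John", "Mar 3, 1872"),
]

# Sorting on the ISO text puts the 1872 record before the 1902 record.
by_date = sorted(records, key=lambda r: to_iso(r[1]))
for name, typed in by_date:
    print(to_iso(typed), name)
```

The ISO text can sit in its own column next to the date as originally typed, so nothing from the source record is lost.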