  1. #1
    miladytn

    Cleaning scanned text (Word XP)

    I would appreciate some suggestions on "cleaning up" scanned text. The scanner is a 3-4 year old HP (6000 series, I think; I'm not at my office as I write this), and I'm using the generic software that came with it, on Windows and Office XP Pro. The font and clarity of the original document make a huge difference to the scanned result: some scans are fairly successful, others are not. Smaller fonts and legible originals usually scan fairly well without much "cleaning up." My problem is with large fonts such as Courier. My most recent effort was a 28-page document in a Courier font. And yes, there were 2 spaces after each period! After scanning, the document ran to over 60 pages of text, with a paragraph mark at the end of each line and a blank line before the next one. And with all the spacing in the Courier font, the result included some unnecessary tables and sentence fragments in the wrong places. Does anyone know of a quick, easy way to delete all those paragraph marks without the tedium of doing it one at a time?

  2. #2
    Super Moderator

    Re: Cleaning scanned text (Word XP)

    Hi,

    Cleaning up documents like yours is fairly straightforward using Search&Replace.

    Assuming each line has a para mark at the end, and that true paras are separated by an empty line (or a line with just a space) ending in a para mark:
    First: Do a Search&Replace to replace all para marks (^p) with a pair of tildes (~~), or some other character combination that isn't found in the document.
    Second: Do a Search&Replace to replace every occurrence of a pair of tildes followed by a space, followed by a pair of tildes (~~ ~~) with a para mark (^p).
    Third: Do a Search&Replace to replace every occurrence of a pair of tildes followed by another pair of tildes (~~~~) with a para mark (^p).
    Fourth: Do a Search&Replace to replace every remaining pair of tildes (~~) with a single space ( ).
    Fifth: Do a Search&Replace to replace every occurrence of a pair of spaces (  ) with a single space ( ). Repeat until none are found.

    By now, your document should be fairly 'clean'.
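
    For anyone who prefers to run the same clean-up on text saved out of Word as plain text, the five steps above can be sketched in Python. This is only an illustration under some assumptions: "\n" stands in for Word's para mark (^p), the function name clean_scanned_text is made up for this example, and Python's str.replace is used in place of Word's Search&Replace dialog.

```python
import re

PLACEHOLDER = "~~"  # assumed not to occur anywhere in the text


def clean_scanned_text(text: str, placeholder: str = PLACEHOLDER) -> str:
    """Rejoin hard-wrapped lines, keeping blank lines as paragraph breaks.

    Hypothetical helper: "\n" plays the role of Word's ^p.
    """
    if placeholder in text:
        raise ValueError("placeholder already occurs in the text; pick another")
    # First: every para mark becomes the placeholder.
    text = text.replace("\n", placeholder)
    # Second: placeholder, space, placeholder marks a true paragraph break.
    text = text.replace(placeholder + " " + placeholder, "\n")
    # Third: two placeholders in a row also mark a true paragraph break.
    text = text.replace(placeholder * 2, "\n")
    # Fourth: any placeholder still left was an ordinary line end.
    text = text.replace(placeholder, " ")
    # Fifth: collapse runs of spaces (the "repeat until none are found" step).
    text = re.sub(r" {2,}", " ", text)
    return text


if __name__ == "__main__":
    sample = "The quick brown\nfox jumps.  Over\n\nNext para\nhere.\n \nThird."
    print(clean_scanned_text(sample))
```

    The regular expression in the fifth step does in one pass what repeating the double-space replacement does in Word. The order of the second to fourth steps matters: the paragraph-break patterns have to be restored before the leftover tildes are turned into spaces.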

    If your document doesn't have a spare empty line between true paras, put one in before doing the above. If it has a spare empty line within true paras as well, try a Search&Replace to replace all double para marks (^p^p) with a single para mark (^p) before starting on the steps above.
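
    On plain text, that preliminary double-para-mark replacement might look like the sketch below. Again "\n" is assumed to stand in for ^p, and collapse_double_breaks is a made-up name for this example. It makes a single pass, the same as running the replacement once in Word.

```python
def collapse_double_breaks(text: str) -> str:
    """Replace each pair of line breaks with a single one, in one pass."""
    return text.replace("\n\n", "\n")


if __name__ == "__main__":
    scanned = "line one\n\nline two\n\nline three"
    print(collapse_double_breaks(scanned))
```

    After this pass the blank lines inside paras are gone, so the true para breaks still need an empty line put in before running the main steps.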

    HTH, Cheers

    Paul Edstein
    [MS MVP - Word]
