  1. #1
    New Lounger

    MS Word: Join paragraphs without punctuation ONLY

    Hello!

    I've converted a PDF document to Word format using OCR software, and now I need to fix the layout and join some paragraphs.

    So far I've used Find & Replace with ^P to locate the paragraph marks. However, what I need to do is join only the paragraphs/sentences that don't end with a full stop. Is there any chance I can do this easily?

    Many thanks in advance
    Al

  2. #2
    Plutonium Lounger Medico
    Al, welcome to the Lounge.

    Have you tried CutePDF for this chore? I cannot answer your question but I have used this app and been very impressed with the outcome. I am not sure if this is what you need, but it's worth looking at.
    BACKUP...BACKUP...BACKUP
    Have a Great Day! Ted



  3. #3
    Super Moderator RetiredGeek
    Al,

    You can do this with a series of three search-and-replace operations.
    1. Search for .^P and replace with a string that does not appear anywhere else in the text, such as #@#@#@.
    2. Search for ^P and replace with a space.
    3. Search for your string (#@#@#@) and replace with .^P.

    Of course, you'll want to do this on a copy of your document, just in case.
    You may also want to eliminate doubled paragraph marks first by searching for ^p^p and replacing with ^p.
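    For anyone who wants to check the logic before touching a real document, here is a minimal Python sketch (not part of the original reply, and not Word VBA) that applies the same three replacements, plus the optional ^p^p clean-up, to a plain string. It assumes "\r" stands in for Word's paragraph mark (^p / ^13); the function name join_unpunctuated is hypothetical.

```python
# Minimal simulation of the three Find & Replace passes on a plain string.
# Assumption: "\r" plays the role of Word's paragraph mark (^p / ^13).
MARKER = "#@#@#@"  # placeholder that must not already occur in the text


def join_unpunctuated(text: str, marker: str = MARKER) -> str:
    """Hypothetical helper: join lines whose paragraph does not end with a full stop."""
    if marker in text:
        raise ValueError("marker already occurs in the text; pick another one")
    # Optional clean-up: collapse doubled paragraph marks (^p^p -> ^p),
    # repeating until none are left.
    while "\r\r" in text:
        text = text.replace("\r\r", "\r")
    # Step 1: protect real paragraph ends (.^P -> marker).
    text = text.replace(".\r", marker)
    # Step 2: every remaining paragraph mark becomes a space (^P -> " ").
    text = text.replace("\r", " ")
    # Step 3: restore the protected ends (marker -> .^P).
    return text.replace(marker, ".\r")


print(repr(join_unpunctuated("This line was\rsplit by OCR.\rA new paragraph.\r")))
# 'This line was split by OCR.\rA new paragraph.\r'
```

    As in the steps above, only full stops are protected: a paragraph ending in "?", "!", ":" or a closing quote mark would still be joined to the next one.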

    I hope this solves your problem.
    May the Forces of good computing be with you!

    RG

    PowerShell & VBA Rule!


  4. #4
    Silver Lounger Charles Kenyon
    Try first replacing all .[paragraph] with .xyxy[paragraph].

    Then replace all of your [paragraph] marks with a space.

    Then replace .xyxy[space] with .[paragraph].

    The code to put in the find or replace box for [paragraph] is ^p.
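    A minimal Python sketch of these three replacements on a plain string, assuming "\r" stands for ^p and that "xyxy" never occurs in the text; kenyon_join is a hypothetical name, not something from the thread. Because the tag keeps the full stop and the last pass matches the space that the second pass left behind, no stray space ends up at the start of the next paragraph.

```python
# Minimal simulation of the .xyxy approach on a plain string.
# Assumption: "\r" stands for Word's paragraph mark (^p).
TAG = "xyxy"  # assumed not to occur anywhere in the text


def kenyon_join(text: str) -> str:
    """Hypothetical helper applying the three replacements in order."""
    # 1. .[paragraph] -> .xyxy[paragraph]
    text = text.replace(".\r", "." + TAG + "\r")
    # 2. every [paragraph] -> [space]
    text = text.replace("\r", " ")
    # 3. .xyxy[space] -> .[paragraph]
    return text.replace("." + TAG + " ", ".\r")


print(repr(kenyon_join("Line one\rcontinues here.\rNext para.\r")))
# 'Line one continues here.\rNext para.\r'
```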
    Charles Kyle Kenyon
    Madison, Wisconsin

  5. #5
    Super Moderator
    Try the following macro. It'll clean up pretty well any text you throw at it. The only restriction you need to be aware of is that the text to be cleaned up is deemed to have one paragraph break per intra-paragraph line and two (or more) per inter paragraph break.
    Code:
    Sub CleanUpPastedText()
    ' Turn Off Screen Updating
    Application.ScreenUpdating = False
    With ActiveDocument.Content.Find
      .ClearFormatting
      .Replacement.ClearFormatting
      .Forward = True
      .Wrap = wdFindContinue
      .Format = False
      .MatchAllWordForms = False
      .MatchSoundsLike = False
      .MatchWildcards = True
      'Replace single paragraph breaks with a space
      .Text = "([!^13])([^13])([!^13])"
      .Replacement.Text = "\1 \3"
      .Execute Replace:=wdReplaceAll
      'Replace all double spaces with single spaces
      .Text = "[ ]{2,}"
      .Replacement.Text = " "
      .Execute Replace:=wdReplaceAll
      'Delete hyphens in hyphenated text formerly split across lines
      .Text = "([a-z])-[ ]{1,}([a-z])"
      .Replacement.Text = "\1\2"
      .Execute Replace:=wdReplaceAll
      'Limit paragraph breaks to one per 'real' paragraph.
      .Text = "[^13]{1,}"
      .Replacement.Text = "^p"
      .Execute Replace:=wdReplaceAll
    End With
    ' Restore Screen Updating
    Application.ScreenUpdating = True
    End Sub
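    As a rough illustration of what the four passes do (not a replacement for the macro), here is a Python approximation using the re module on a plain string, with "\r" standing for ^13. Python regular expressions are not Word wildcards, so the patterns are translations rather than exact equivalents. In particular, the first pass uses lookarounds, because in Python a consuming pattern such as ([^\r])\r([^\r]) would skip the second break in "a\rb\rc". The function name is hypothetical.

```python
import re


def clean_up_pasted_text(text: str) -> str:
    """Hypothetical Python analogue of the CleanUpPastedText macro.

    "\r" plays the role of Word's paragraph mark (^13).
    """
    # Pass 1: a single break between two non-break characters -> space.
    text = re.sub(r"(?<=[^\r])\r(?=[^\r])", " ", text)
    # Pass 2: runs of two or more spaces -> one space ("[ ]{2,}").
    text = re.sub(r" {2,}", " ", text)
    # Pass 3: rejoin lowercase words hyphenated across former line ends.
    text = re.sub(r"([a-z])- +([a-z])", r"\1\2", text)
    # Pass 4: collapse any run of breaks to a single break ("[^13]{1,}").
    return re.sub(r"\r+", "\r", text)


raw = "The scan\rsplit this line.\r\rA second para-\rgraph here.\r"
print(repr(clean_up_pasted_text(raw)))
# 'The scan split this line.\rA second paragraph here.\r'
```

    The example input follows the macro's assumption: one break at the end of each line inside a paragraph, two between paragraphs.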
    Last edited by macropod; 2012-07-07 at 18:54. Reason: Code fix
    Cheers,

    Paul Edstein
    [MS MVP - Word]

  6. The Following User Says Thank You to macropod For This Useful Post:

    RetiredGeek (2012-07-03)

  7. #6
    New Lounger
    Hello, and thanks to everyone who made suggestions...

    I've tried Find & Replace, but it doesn't work (step 2 isn't possible), so I'm going to try the macro. Just a quick question: how do I insert the code in Word 2010? Sorry, but I'm not familiar with Visual Basic or anything similar...

    Yesterday I also tried converting the text to HTML to see if I could tweak the code a little, but the problem is that I need to remove all the <p> tags and the result is a mess...

    I'll let you know the results tonight!

    Many thanks again
    Al
    Last edited by Djangocor; 2012-07-04 at 07:23.

  8. #7
    Super Moderator
    Cheers,

    Paul Edstein
    [MS MVP - Word]

  9. #8
    New Lounger
    I've tried the macro, but I get this error in Word 2010: "Compile error: Invalid use of property" (the last line of the code is highlighted).
    Please help; I think I forgot something.

  10. #9
    Super Moderator
    Hi Al,

    It's probably to do with your system's regional settings. Change:
    .Text = "[^13]{1,}"
    to:
    .Text = "[^13]{1;}"
    Cheers,

    Paul Edstein
    [MS MVP - Word]

  11. #10
    Super Moderator
    If the last line is the one that won't compile, the problem is that some text is missing from the code. Try changing the last line to:


    Application.ScreenUpdating = True
    Andrew Lockton, Chrysalis Design, Melbourne Australia

  12. #11
    Super Moderator
    Ah, I see what happened: the last line and a half got missed when I copied and pasted the code into my post. Fixed.
    Cheers,

    Paul Edstein
    [MS MVP - Word]

  13. #12
    New Lounger
    I've tried your macro, Paul, and it seems to work great when you need to clean up text; however, I'm losing all the text layout and formatting with it.
    Basically, the documents I converted from PDF to Word are chapters of a book, so I think the only way for me to fix them will be to correct the split paragraphs manually, one at a time, using ^p...

    Thanks anyway for your precious help and support!
    Al

  14. #13
    Super Moderator
    I'd be surprised if the macro has any effect on the formatting, as its main function is to delete unwanted paragraph breaks (^13 is functionally equivalent to ^p). It does very little else and has no formatting-related content.
    Cheers,

    Paul Edstein
    [MS MVP - Word]

  15. #14
    New Lounger
    The macro produces plain continuous text (same fonts, no extra spaces) without any breaks or paragraph divisions.

    Let's say that I had a text like this:

    aaaaa bbbb,
    cccc dddd.

    eeee ffff.

    gggg.

    now it looks like:

    aaaaa bbbb, cccc dddd. eeee ffff. gggg.

    which is great if you are copying text from the Internet, but basically doesn't work for a book.
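    The sketch below illustrates why the macro's restriction matters here. It is a minimal Python illustration, not code from the thread, and it assumes "\r" stands for ^p and that Al's document had only one break per line and one per paragraph (which would explain the fully merged result). The first hypothetical function mimics the macro's first pass (join any lone break); the second applies the rule from the original question, joining a break only when the line does not end in a full stop.

```python
import re

# Assumption: "\r" stands for Word's paragraph mark, one break per line AND per paragraph.
SAMPLE = "aaaaa bbbb,\rcccc dddd.\reeee ffff.\rgggg.\r"


def join_every_single_break(text: str) -> str:
    """Like the macro's first pass: any lone break between two characters becomes a space."""
    return re.sub(r"(?<=[^\r])\r(?=[^\r])", " ", text)


def join_unless_full_stop(text: str) -> str:
    """Hypothetical alternative: join only when the line does not end with a full stop."""
    return re.sub(r"(?<=[^.\r])\r(?=[^\r])", " ", text)


print(repr(join_every_single_break(SAMPLE)))
# 'aaaaa bbbb, cccc dddd. eeee ffff. gggg.\r'
print(repr(join_unless_full_stop(SAMPLE)))
# 'aaaaa bbbb, cccc dddd.\reeee ffff.\rgggg.\r'
```

    The sketch works on characters only; it says nothing about how Word treats paragraph formatting when marks are removed.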

    I'm using Word 2010 and ran only your macro after opening the document. Maybe what I'm asking isn't possible...

    Regards,
    Al

  16. #15
    Super Moderator
    Hi Al,

    That suggests you haven't had sufficient regard for:
    The only restriction you need to be aware of is that the text to be cleaned up is deemed to have one paragraph break per intra-paragraph line and two (or more) per inter paragraph break.
    If you'd rather select the text that is to be processed, so that the macro doesn't simply process everything, change:
    ActiveDocument.Content.Find
    to:
    Selection.Find
    and change:
    .Wrap = wdFindContinue
    to:
    .Wrap = wdFindStop
    Cheers,

    Paul Edstein
    [MS MVP - Word]
