  1. #1
    New Lounger
    Join Date
    Nov 2002
    Posts
    13
    Thanks
    3
    Thanked 0 Times in 0 Posts

    MS Word - Find and Replace involving mixed hypertext and plain text

    Esteemed Loungers,

    This has vexed me for YEARS. How can I search and replace in MS Word where what I want to find and replace contains both hypertext and plain text?

    I'm attaching an example. I save articles from various web sources. Sometimes (for specific reasons, like wanting to preserve hyperlinks to other articles) I do this by copying and pasting into MS Word. But the comment sections (which I also want to save) have extra repetitive text (for example, "Reply", "Like", etc.) that I want to delete. Some of this is hypertext. However, it's interspersed with paragraph marks and (sometimes) bullets that are NOT hypertext.

    For example: (see the attached MS Word document for the actual thing)



    Each line is hypertext, followed by a paragraph mark for the linefeed/return. I want to delete all the Like/Reply combinations. I'd usually use the search string Like^pReply^p and replace it with nothing. But since the words "Like" and "Reply" are hypertext, Word doesn't find any instances of that combination, even though they're clearly visible.

    I've tried expanding out the fields, I've tried using wildcards, even macros ... but the hypertext always messes things up.

    Can anyone help me delete portions of text like this across the whole document without having to do it manually?

    Many thanks ...

  2. #2
    2 Star Lounger
    Join Date
    Dec 2009
    Location
    Canada
    Posts
    122
    Thanks
    3
    Thanked 20 Times in 18 Posts
    This 2-step Find and Replace procedure will get rid of any paragraphs starting with a hyperlink to "www.randomhouse" (you can add more of the URL to be more specific about which links get found).

    First, expose the field codes with Alt-F9 so the Find is able to see the content in the field codes. The first F&R will change the hyperlinks to some unique character:

    Find what: ^p^d HYPERLINK "http://www.randomhouse
    Replace with: ^pþ (any unique character; this þ was entered as Alt-0254)
    Replace All

    Now get rid of the paragraphs that contained the hyperlinks:

    Find what: ^p
    Replace with: empty
    Replace All

    Restore the field code view with Alt-F9.

  3. The Following User Says Thank You to EricFletcher For This Useful Post:

    ebruskin (2012-09-23)

  4. #3
    New Lounger
    Join Date
    Nov 2002
    Posts
    13
    Thanks
    3
    Thanked 0 Times in 0 Posts
    First, THANK YOU for a quick reply that already gets me farther than I've been before.

    I don't know why you started the "Find what" with a ^p. I want to get rid of the ^p after the hypertext.

    I tried this:

    Find what: ^d HYPERLINK "http://www.randomhouse.ca/hazlitt/feature/how-succeed-journalism-when-you-cant-afford-internship"
    Replace with: X

    It removed all the text in all of the hyperlinks. I did want to save the 3rd and 4th ones with the time and the "in reply to" information. What did I do wrong?

    Also, when I tried to enter Alt-0254 for the "Replace with:", I tried holding down alt and pressing the 0 and I got a "clunk" sound and nothing got entered. I also pressed alt and then the 0, and it still didn't work. How do I enter those characters?

    (I'm actually pretty experienced with MS Office, but I never did figure out the Alt characters.)

    Continued thanks,

    Eric (me too)

  5. #4
    2 Star Lounger
    Join Date
    Dec 2009
    Location
    Canada
    Posts
    122
    Thanks
    3
    Thanked 20 Times in 18 Posts
    When you enter Alt codes, you need to use the numeric pad for the digits (i.e. not the digits on the normal keyboard). Just keep the Alt key pressed while typing the digits (and use all 4 digits). Within a F&R, you can use the caret with the digits: ^0254. (Use my printable PDF cheat sheet for a handy reference for entering the set of special characters in various ways, including the Alt sequences.)
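
    As an aside for anyone puzzled by the digits: the four-digit Alt codes are the character's number in the Windows code page. The small Python check below is only an illustration outside of Word, and it assumes a Western European setup (code page 1252); `alt_code_char` is a made-up helper name, not anything Word provides.

```python
# Illustration only: map a four-digit Alt code to its character, assuming
# the Windows-1252 code page (typical for Western European Windows setups).
def alt_code_char(code: int) -> str:
    """Return the character that Alt+0<code> would enter under cp1252."""
    return bytes([code]).decode("cp1252")


tag_char = alt_code_char(254)      # the character entered as Alt-0254
protect_char = alt_code_char(253)  # the character entered as Alt-0253
print(tag_char, protect_char)
```

    Any character that never occurs in the document would do equally well as a tag; these two are just unlikely to appear in English text.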

    I put the ^p before the ^d to ensure that the Find only catches hyperlinks at the start of a paragraph on the assumption that other ones you may want to keep would be included within paragraphs. The replace puts the paragraph mark back in, but with the unique code in place of the hyperlink so you can remove the affected paragraphs in the next step.

    The hyperlinks don't have sufficient information within them to be able to differentiate between them for the Like and Reply text. The hidden stuff must be kept -- and presumably be available -- somewhere but I don't know how to find it. This may be related to the changes Microsoft made to the INCLUDEPICTURE field code: older versions could be expanded as field codes so the path and switches could be edited, but in Word 2010, the pictures are inserted as Inline Shapes, and the path & switches info can no longer be accessed except via a VBA procedure.

    That being said, you can isolate the ones with links by adding the following step at the beginning:

    Find what: ^d HYPERLINK "http://www.randomhouse.ca/hazlitt/feature/how-succeed-journalism-when-you-cant-afford-internship" \l "comment
    Replace with: ý^& (the ý here is Alt-0253)
    Replace All

    Now the hyperlinks with the \l switch will be preceded by the ý character, so they won't be changed in the next steps. You'll need to remove the ý as a 4th step. (BTW, the ^& code is interpreted as "whatever was found" in the Replace).
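
    To make the order of the four steps easier to follow, here is a rough Python simulation of the protect / tag / delete / untag idea. It does not touch Word at all: paragraphs are plain strings, the field code text, the example.com URL, and the sample paragraphs are invented for illustration, and it assumes (as reported earlier in the thread) that replacing a matched field removes the whole hyperlink.

```python
# Simulation of the "tag, change, untag" strategy on plain strings.
# Everything here is hypothetical sample data, not real Word content.
PROTECT = "\u00fd"  # the Alt-0253 character, marks links to keep
TAG = "\u00fe"      # the Alt-0254 character, marks paragraphs to delete
LINK = '{ HYPERLINK "http://example.com/article"'  # made-up field code text


def clean(paragraphs):
    # Step 1: prefix links that carry the \l switch so later steps skip them.
    protected = [
        PROTECT + p if p.startswith(LINK + ' \\l "comment') else p
        for p in paragraphs
    ]
    # Step 2: replace every remaining paragraph-initial link with the tag.
    tagged = [TAG if p.startswith(LINK) else p for p in protected]
    # Step 3: drop the paragraphs that now consist only of the tag.
    remaining = [p for p in tagged if p != TAG]
    # Step 4: strip the protective prefix again.
    return [p[len(PROTECT):] if p.startswith(PROTECT) else p for p in remaining]


doc = [
    "First comment text",
    LINK + " }Like",
    LINK + " }Reply",
    LINK + ' \\l "comment-1" }posted earlier',
    "Second comment text",
]
print(clean(doc))
```

    The order matters: if the protective prefix were added after the tagging step, the links with the \l switch would already have been tagged and deleted.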

  6. The Following User Says Thank You to EricFletcher For This Useful Post:

    ebruskin (2012-09-23)

  7. #5
    New Lounger
    Join Date
    Nov 2002
    Posts
    13
    Thanks
    3
    Thanked 0 Times in 0 Posts
    Thanks again for the information contained in your reply. (By the way, I recognize the strategy of "tagging" certain text with characters, doing some changes, then getting rid of the special characters. I use it frequently.)

    But - I can't differentiate between the hyperlinks based on their visible text? Of the four hyperlinks following the first comment, three look identical when I do Alt+F9, so I can't distinguish between them? Meaning that I'd have to delete either all or none?

    Then I thought - what if I could find only those strings that had two consecutive occurrences of the "plain" hyperlink string without the \l switch? (Each followed by a paragraph mark.) That would trap the first two links of each group, which are the "Like" and "Reply" that I want to get rid of. So I tried searching for this:

    ^d HYPERLINK "http://www.randomhouse.ca/hazlitt/feature/how-succeed-journalism-when-you-cant-afford-internship" ^p^d HYPERLINK "http://www.randomhouse.ca/hazlitt/feature/how-succeed-journalism-when-you-cant-afford-internship" ^p

    but it didn't work. It worked up through the space following the end of the first quoted URL, but if I added a ^p to that, it would fail. So is there no way to search for a hyperlink FOLLOWED BY plain text (or in this case, a paragraph mark)?

  8. #6
    2 Star Lounger
    Join Date
    Dec 2009
    Location
    Canada
    Posts
    122
    Thanks
    3
    Thanked 20 Times in 18 Posts
    I think you'd find that the hyperlinks include more info that is not available from the UI (but may be via VBA). That is certainly the case with the "new" method of dealing with linked images: the Inline Shape object has information available from VBA that is not available via the UI.

    The ^d denotes a field code, and the characters following it further refine which codes to find. The problem is that there is no "end of field code" delimiter, so the 2nd ^d (and even the ^p) is probably being considered as more characters within the hyperlink string.

    If you are handy with recorded macros, you could modify some recorded code to find the ^d HYPERLINK "http://www.randomhouse.ca/hazlitt/feature/how-succeed-journalism-when-you-cant-afford-internship" then delete, find again and delete, then find and skip twice. Put it in a loop to repeat for the full document. That would work if -- and only if -- every set always has 4 instances and you want to always keep just the 3rd & 4th.
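
    The counting logic of such a loop can be sketched as follows. This is Python rather than VBA and only models the document as a list of strings; a real macro would drive Word's own Find instead. The function name, the `is_target` test, and the "link:" sample data are all hypothetical. It relies on the same assumption: every group has exactly 4 matching links.

```python
# Sketch of "delete, delete, skip, skip" over matching paragraphs.
# Assumes matches always come in groups of four; otherwise the count drifts.
def keep_third_and_fourth(paragraphs, is_target):
    """Drop the 1st and 2nd match of every group of four, keep the rest."""
    kept = []
    seen = 0  # number of matching paragraphs encountered so far
    for p in paragraphs:
        if is_target(p):
            position = seen % 4  # 0 and 1 are deleted, 2 and 3 are kept
            seen += 1
            if position < 2:
                continue
        kept.append(p)
    return kept


sample = [
    "comment one",
    "link:Like", "link:Reply", "link:time", "link:in reply to",
    "comment two",
    "link:Like", "link:Reply", "link:time", "link:in reply to",
]
result = keep_third_and_fourth(sample, lambda p: p.startswith("link:"))
print(result)
```

    A single group with 3 or 5 links would shift every later deletion by one, which is why the "if and only if" condition above is worth checking on a copy of the document first.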

  9. The Following User Says Thank You to EricFletcher For This Useful Post:

    ebruskin (2012-09-23)

  10. #7
    New Lounger
    Join Date
    Nov 2002
    Posts
    13
    Thanks
    3
    Thanked 0 Times in 0 Posts
    I've tried VBA in the past for deleting hyperlinks, but found its behavior inconsistent. (Sometimes it would select a hyperlink, sometimes not, which would throw off the whole thing.) However, I haven't tried it when Alt-F9 is in effect. I'll try that and (if necessary) try more things, and if I find anything successful or interesting, I'll report back. Meanwhile, thank you for providing more useful information on this question than I've had from other forums or experts.
