  1. #1
    New Lounger
    Join Date
    Sep 2002
    Posts
    22
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Extracting text from multiple docs (Word 2000)

    We have 100+ Word documents containing specifications for various jobs. We want to extract the title, ID number and synopsis from each of them to create a "master" listing. Outside of opening each document separately, cutting and pasting the pieces we need into a master document, is there an easier way?

    Appreciate any suggestions!

  2. #2
    5 Star Lounger
    Join Date
    Jul 2002
    Location
    Toronto, Ontario, Canada
    Posts
    1,139
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: Extracting text from multiple docs (Word 2000)

    If they are laid out in EXACTLY the same format, then it would be possible with VBA to open the doc, get the required data and put it into a new doc.

    If you could post a sample document, then it would probably be easier for folks to help.
    --
    Bryan Carbonnell - Toronto
    Unfortunately common sense isn't so common!!
    Visit my website for useful Word, Excel and Access code, templates and Add-Ins

  3. #3
    New Lounger
    Join Date
    Sep 2002
    Posts
    22
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: Extracting text from multiple docs (Word 2000)

    They all have the same headers (for lack of what else to call them) but they may be on different lines, depending on the length of the specs. The ID number will always be on the first line, however the other 2 pieces we want (TITLE and SCOPE) can be anywhere.

    I have attached 2 pages of one of our specs.

  4. #4
    5 Star Lounger
    Join Date
    Jul 2002
    Location
    Toronto, Ontario, Canada
    Posts
    1,139
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: Extracting text from multiple docs (Word 2000)

    Well, here is a start.

    To get the ID use the following lines of code:
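    <pre>Dim doc As Document, rng As Range
    Set doc = ActiveDocument
    'Assumes the start of the document will always be "ID No. "
    Set rng = doc.Range(8, 8)
    rng.Expand wdWord 'Grow the collapsed range to the word at that position

    Debug.Print rng.Text</pre>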


    As for the title and scope, I can't figure out which pieces of the text they are.

  5. #5
    New Lounger
    Join Date
    Sep 2002
    Posts
    22
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: Extracting text from multiple docs (Word 2000)

    The title always has the text "TITLE" before it. The scope always has the text "SCOPE" before it. They both appear on page 2 of the sample document. I wish I knew VBA better. Is there a way to do it based on text, such as searching for the word TITLE and capturing it plus any text that appears after it, and the same for SCOPE? I'm not sure how it would determine the end of the text, as the scope may be a couple of lines long.

  6. #6
    New Lounger
    Join Date
    Sep 2002
    Posts
    22
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: Extracting text from multiple docs (Word 2000)

    I just remembered something else that may or may not help. The pieces we want to extract (i.e. ID number, title and scope) all have tags around them. For example, the ID number has a beginning tag of <id> and an ending tag of </id>; the same goes for title (<title>, </title>) and scope (<scope>, </scope>). These are used for something totally different, but I was wondering if there would be a way to search for these (the first tag marking the beginning of the text and the ending tag marking its end) to extract this information into one document.
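
    A minimal sketch of that tag-based idea, assuming the tags are stored as literal text in the body of each document (possibly formatted as hidden text). The function name GetTaggedText is made up for illustration; it returns whatever sits between the first opening tag and the next closing tag, or an empty string if either is missing.

    <pre>'Hypothetical helper: returns the text between <strTag> and </strTag>
    Function GetTaggedText(doc As Document, strTag As String) As String
        Dim rng As Range
        Dim strAll As String, strOpen As String, strClose As String
        Dim lngStart As Long, lngEnd As Long

        strOpen = "<" & strTag & ">"
        strClose = "</" & strTag & ">"

        Set rng = doc.Content
        'Assumption: the tags may be hidden text, so ask for it explicitly
        rng.TextRetrievalMode.IncludeHiddenText = True
        strAll = rng.Text

        'Find the opening tag, ignoring case
        lngStart = InStr(1, strAll, strOpen, vbTextCompare)
        If lngStart = 0 Then Exit Function
        lngStart = lngStart + Len(strOpen)

        'Find the closing tag that follows it
        lngEnd = InStr(lngStart, strAll, strClose, vbTextCompare)
        If lngEnd = 0 Then Exit Function

        GetTaggedText = Trim(Mid(strAll, lngStart, lngEnd - lngStart))
    End Function
    </pre>

    For example, Debug.Print GetTaggedText(ActiveDocument, "scope") would print the scope text in the Immediate window. Because the end is set by the closing tag rather than by a paragraph mark, a scope that runs over several lines should come through whole.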

  7. #7
    Uranium Lounger
    Join Date
    Dec 2000
    Location
    Los Angeles Area, California, USA
    Posts
    7,453
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: Extracting text from multiple docs (Word 2000)

    I hate to come up with an idea that I don't know how to implement (since I'm not a VBA expert), but here's a start of an idea.

    The 3 items that you want from each document can be found using wildcards & searching respectively for:
    ID No.*^013
    SCOPE*^013
    TITLE*^013

    If some of the titles or scopes have multiple paragraphs or if the words (in caps) were used elsewhere in the document, then this method wouldn't work.

    If all these documents are in the same folder (or can be placed there), perhaps someone can write a macro that would:
    1. Open the documents in the folder, one at a time,
    2. Search for the string shown above,
    3. Copy it to the new document.
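
    Those three steps might look roughly like the sketch below. It is only a starting point under several assumptions: the folder C:\Specs\ and the names BuildMasterList and FindFirst are invented for illustration, the files are assumed to have a .doc extension, and the patterns use ^13 for the paragraph mark in place of ^013. Only the first match of each pattern in each document is copied.

    <pre>Sub BuildMasterList()
        Const strFolder As String = "C:\Specs\"   'Hypothetical folder
        Dim docMaster As Document, docSpec As Document
        Dim strFile As String
        Dim varPattern As Variant

        Set docMaster = Documents.Add
        strFile = Dir(strFolder & "*.doc")

        '1. Open the documents in the folder, one at a time
        Do While strFile <> ""
            Set docSpec = Documents.Open(FileName:=strFolder & strFile, _
                ReadOnly:=True, AddToRecentFiles:=False)
            docMaster.Content.InsertAfter strFile & vbCr

            '2. Search for each string, 3. copy it to the new document
            For Each varPattern In Array("ID No.*^13", "TITLE*^13", "SCOPE*^13")
                docMaster.Content.InsertAfter FindFirst(docSpec, CStr(varPattern))
            Next varPattern

            docSpec.Close SaveChanges:=wdDoNotSaveChanges
            strFile = Dir   'Next matching file, or "" when done
        Loop
    End Sub

    'Returns the first wildcard match in doc, or "" if there is none
    Function FindFirst(doc As Document, strPattern As String) As String
        Dim rng As Range
        Set rng = doc.Content
        With rng.Find
            .ClearFormatting
            .Text = strPattern
            .Forward = True
            .Wrap = wdFindStop
            .MatchWildcards = True   'Wildcard searches are case-sensitive
            If .Execute Then FindFirst = rng.Text
        End With
    End Function
    </pre>

    Each match ends with its own paragraph mark, so the copied items land on separate lines under the file name. The same limits apply as noted above: a scope with more than one paragraph would be cut off at the first paragraph mark.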

    For the future, it would be better to create these similar documents from the same specialized template. Certain items (such as ID, Title, Scope) could be bookmarked in the template, so they would exist in each document. Then creating your "master document" would involve using a series of INCLUDETEXT fields, a much easier process.
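
    As a rough illustration of the INCLUDETEXT approach, the macro below inserts one such field at the cursor. The file path and the bookmark name in the usage line are hypothetical, as is the name InsertSpecItem; the bookmark would have to exist in the source document for the field to show anything.

    <pre>'Hypothetical helper: inserts { INCLUDETEXT "file" bookmark } at the cursor
    Sub InsertSpecItem(strFile As String, strBookmark As String)
        Dim strPath As String

        'Paths inside field codes need doubled backslashes
        strPath = Replace(strFile, "\", "\\")

        ActiveDocument.Fields.Add Range:=Selection.Range, _
            Type:=wdFieldIncludeText, _
            Text:="""" & strPath & """ " & strBookmark
    End Sub
    </pre>

    Calling InsertSpecItem "C:\Specs\Job001.doc", "Title" would pull in the text of the bookmark named Title from that file. Since the result is a field, updating the fields in the master document should pick up later changes to the source.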

  8. #8
    Plutonium Lounger
    Join Date
    Nov 2001
    Posts
    10,550
    Thanks
    0
    Thanked 7 Times in 7 Posts

    Re: Extracting text from multiple docs (Word 2000)

    I can't see the <> tags you describe. But assuming that these are the only paragraphs that start with the text Title: and Scope: then this will do the trick.
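    <pre>Sub TestIt()
        MsgBox GetParaFromPrefix("Title:")
        MsgBox GetParaFromPrefix("Scope:")
    End Sub

    'Returns the rest of the first paragraph that starts with strStart,
    'or an empty string if no such paragraph is found
    Function GetParaFromPrefix(strStart As String) As String

        Dim strTemp As String

        With ActiveDocument.Content.Find
            .ClearFormatting
            'Look for a paragraph mark immediately followed by the prefix
            .Text = "^p" & strStart
            .Replacement.Text = ""
            .Forward = True
            .Wrap = wdFindContinue
            .Format = False
            .MatchCase = False
            .MatchWholeWord = False
            .MatchWildcards = False
            .MatchSoundsLike = False
            .MatchAllWordForms = False
            If .Execute Then
                'The match spans the end of one paragraph and the start of
                'the next, so Paragraphs(2) is the one holding the prefix
                strTemp = .Parent.Paragraphs(2).Range.Text
                'Drop the prefix and keep the remainder of the paragraph
                GetParaFromPrefix = Right(strTemp, Len(strTemp) - Len(strStart))
            End If
        End With

    End Function
    </pre>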

    StuartR
