Page 1 of 2
Results 1 to 15 of 19
  1. #1
    New Lounger
    Join Date
    Feb 2002
    Location
    Little Rock, Arkansas USA
    Posts
    7
    Thanks
    0
    Thanked 0 Times in 0 Posts

    need to list all words in a document (Word 2000/2002)

    I know I've seen this on Woody's but cannot find it now.
    I need a way to open Word documents and generate a list of all the words in each document (not the word count!). I have many, many documents and have to create a list of keywords to search for in each one. Does anyone have a link, code, or suggestions on how I can get a list of each word in the document and, if possible, how many times each word appears? Thank you in advance,
    Chalmers

  2. #2
    Super Moderator
    Join Date
    May 2002
    Location
    Canberra, Australian Capital Territory, Australia
    Posts
    5,055
    Thanks
    2
    Thanked 417 Times in 346 Posts

    Re: need to list all words in a document (Word 2000/2002)

    hi Chalmers,

    There's an indexing utility by one of the loungers (Chris Greaves) at: http://www.vif.com/users/cgreaves/Indexer.htm. That may do what you want.

    Cheers,

    Paul Edstein
    [MS MVP - Word]

  3. #3
    New Lounger
    Join Date
    Feb 2002
    Location
    Little Rock, Arkansas USA
    Posts
    7
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: need to list all words in a document (Word 2000/2002)

    Thank you. There is really nothing there but ads to buy his software.
    If anyone else has done this, or has a link to an example of how to do this, it would be greatly appreciated!
    Sincerely
    Chalmers

  4. #4
    Super Moderator jscher2000
    Join Date
    Feb 2001
    Location
    Silicon Valley, USA
    Posts
    23,112
    Thanks
    5
    Thanked 93 Times in 89 Posts

    Re: need to list all words in a document (Word 2000/2002)

    If you are interested in writing some code... there is an object in the Scripting library called a Dictionary; it's similar to a Collection in VB (or an associative array in Perl). You can do something like the following to keep track of the words in a document:

    *** WARNING: UNTESTED CODE FOR ILLUSTRATION ONLY ***

    Dim dict As Object
    Dim rngWord As Range
    Dim strWord As String
    Set dict = CreateObject("Scripting.Dictionary")
    For Each rngWord In ActiveDocument.Words ' Word's own notion of a "word"
        strWord = LCase(Trim(rngWord.Text)) ' drop the trailing space, ignore case
        If Not dict.Exists(strWord) Then dict.Add strWord, 0 ' first time we see this word
        dict.Item(strWord) = dict.Item(strWord) + 1 'increment the word counter
    Next rngWord

    So now we get to the harder part... what you and I consider to be words, and what Word considers to be a word, are different. Word's concept can be understood by using Ctrl+RightArrow in a body of text. The insertion point stops after spaces, before a run of one or more punctuation marks, after a run of one or more punctuation marks, and so forth. Messy.

    On the whole, if you can find a $30 program that accurately catalogs the words, I think you'll save yourself a lot of programming grief!
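
    A minimal sketch of the Dictionary idea that sidesteps Word's own word boundaries by scanning the text character by character. It assumes a "word" is any run of unaccented letters and digits, so "don't" splits into "don" and "t" and "MS-DOS" into "ms" and "dos". IsWordChar, CountWords and ListWordsInActiveDocument are made-up names, and the code has not been run against real documents:

    ```vba
    Option Explicit

    ' Hypothetical helper: True if ch counts as part of a word in this sketch
    ' (unaccented letters and digits only).
    Private Function IsWordChar(ByVal ch As String) As Boolean
        IsWordChar = (ch Like "[A-Za-z0-9]")
    End Function

    ' Returns a Scripting.Dictionary mapping each lower-cased word to its count.
    Public Function CountWords(ByVal strText As String) As Object
        Dim dict As Object
        Dim i As Long
        Dim ch As String
        Dim strWord As String

        Set dict = CreateObject("Scripting.Dictionary")
        strText = LCase$(strText) & " "     ' trailing space flushes the last word

        For i = 1 To Len(strText)
            ch = Mid$(strText, i, 1)
            If IsWordChar(ch) Then
                strWord = strWord & ch      ' still inside a word
            ElseIf Len(strWord) > 0 Then
                ' Reading a missing key creates it as Empty, which adds as 0
                dict.Item(strWord) = dict.Item(strWord) + 1
                strWord = ""
            End If
        Next i

        Set CountWords = dict
    End Function

    ' Prints "word <tab> count" for the active document to the Immediate window.
    Public Sub ListWordsInActiveDocument()
        Dim dict As Object
        Dim key As Variant

        Set dict = CountWords(ActiveDocument.Content.Text)
        For Each key In dict.Keys
            Debug.Print key & vbTab & dict.Item(key)
        Next key
    End Sub
    ```

    The list comes out in order of first appearance, not sorted. Changing IsWordChar is the one place to adjust what counts as a word.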

  5. #5
    Super Moderator
    Join Date
    May 2002
    Location
    Canberra, Australian Capital Territory, Australia
    Posts
    5,055
    Thanks
    2
    Thanked 417 Times in 346 Posts

    Re: need to list all words in a document (Word 2000/2002)

    Hi Chalmers,

    Sorry that the previous post wasn't much good. Perhaps the following will do better. I 'lifted' it from somewhere, but can't remember where:

    <pre>The following macro compiles a sorted table showing the frequency with which any given word
    appears in a document. Words not required for inclusion in the count can be excluded by
    activating two lines of code that have been commented out under
    Cheers,

    Paul Edstein
    [MS MVP - Word]

  6. #6
    Super Moderator jscher2000
    Join Date
    Feb 2001
    Location
    Silicon Valley, USA
    Posts
    23,112
    Thanks
    5
    Thanked 93 Times in 89 Posts

    Re: need to list all words in a document (Word 2000/2002)

    This is some complicated code. I would guess that Chris might have written this, although the absence of calls to other procedures from his big procedures library suggests to the contrary...

    Anyway, there is a bug: I tested this procedure against its own source code, and comments are thrown out because they do not start with a letter from A to Z (or should I say A to z). If I check for this, I come up only one word short of a "hand-count." Close, but... why is this so hard?!

    Here's the change - old code:

    <pre> If SingleWord < "a" Or SingleWord > "z" Then SingleWord = "" 'Out of range?</pre>

    new code:
    <pre> If (Left(SingleWord, 1) = "'") And (Len(SingleWord) > 1) Then
        SingleWord = Mid(SingleWord, 2) 'trash that leading apostrophe
        If Right(SingleWord, 1) = "'" Then 'drop any trailing one, too.
            SingleWord = Left(SingleWord, Len(SingleWord) - 1)
        End If
    Else
        If SingleWord < "a" Or SingleWord > "z" Then SingleWord = "" 'Out of range?
    End If</pre>

    It would be interesting to test on more "real world" documents. I think as long as you do not mind having punctuated words (such as Amazon.com or MS-DOS or TCP/IP) count as two different words, then using Microsoft's notion of a word is "good enough."
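
    As a sketch only, the apostrophe handling can be pulled into a small helper so that the range check also runs on the stripped word. CleanWord is a made-up name, not part of the macro discussed here. Note one deliberate difference: comparing the whole string with "z" also rejects words such as "zoo" (in a string comparison "zoo" > "z"), so this version tests only the first character:

    ```vba
    ' Hypothetical helper: strips one leading and one trailing apostrophe,
    ' then returns "" unless the result starts with a letter a-z.
    Public Function CleanWord(ByVal SingleWord As String) As String
        SingleWord = LCase$(Trim$(SingleWord))

        If Len(SingleWord) > 1 And Left$(SingleWord, 1) = "'" Then
            SingleWord = Mid$(SingleWord, 2)            ' drop the leading apostrophe
        End If
        If Len(SingleWord) > 1 And Right$(SingleWord, 1) = "'" Then
            SingleWord = Left$(SingleWord, Len(SingleWord) - 1) ' and the trailing one
        End If

        ' Check the first character only, so "zoo" is kept
        If Not (Left$(SingleWord, 1) Like "[a-z]") Then SingleWord = ""

        CleanWord = SingleWord
    End Function
    ```

    With this helper, CleanWord("'Comment' ") gives "comment", while CleanWord("'") and CleanWord("123") both give an empty string.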

  7. #7
    Super Moderator jscher2000
    Join Date
    Feb 2001
    Location
    Silicon Valley, USA
    Posts
    23,112
    Thanks
    5
    Thanked 93 Times in 89 Posts

    Re: need to list all words in a document (Word 2000/2002)

    I did it a different way, which results in a different set of words, different compromises on functionality. Take a look at the Excel sheet to see the differences in results for the issue of Woody's Office Watch in the Word doc. The macros are in separate modules in the document. Fun, but tiring stuff.
    Attached Files

  8. #8
    Super Moderator
    Join Date
    Jan 2001
    Location
    Melbourne, Victoria, Australia
    Posts
    3,852
    Thanks
    4
    Thanked 259 Times in 239 Posts

    Re: need to list all words in a document (Word 2000/2002)

    When I look at this puzzle, I think that you may be able to simplify it by breaking it into phases such as:
    1. Cut the content and Paste Special as text only
    2. Make every word a paragraph by replacing white space with a return
    3. Select it all and sort the paragraphs - this puts all like words together and alphabetically sorts them
    4. Now parse the file and set a counter to compare one paragraph with the next and implement some way to record the count

    I have developed my code along these lines without worrying too much about bug catching but it does go pretty close to the task albeit at a leisurely pace.
    <small><pre>Sub temp1()
        ' Long rather than Integer: a long document can exceed 32,767 paragraphs
        Dim iCount As Long, x As Long, aRange As Range, doc As Document

        ' Phase 1: replace the content with its plain-text equivalent
        Selection.WholeStory
        Selection.Cut
        Selection.Range.PasteSpecial DataType:=wdPasteText

        ' Phase 2: put every word in its own paragraph
        With Selection.Find
            .ClearFormatting
            .Replacement.ClearFormatting
            .Text = "^w"
            .Replacement.Text = "^p"
            .Forward = True
            .Wrap = wdFindContinue
            .Format = False
            .MatchWildcards = False
            .Execute Replace:=wdReplaceAll
            .Text = "^p^p" ' collapse empty paragraphs; repeated passes catch longer runs
            .Execute Replace:=wdReplaceAll
            .Execute Replace:=wdReplaceAll
            .Execute Replace:=wdReplaceAll
        End With

        ' Phase 3: lower-case everything and sort so that like words are adjacent
        Selection.WholeStory
        Selection.Range.Case = wdLowerCase
        Selection.Sort ExcludeHeader:=False, FieldNumber:="Paragraphs", _
            SortFieldType:=wdSortFieldAlphanumeric, SortOrder:=wdSortOrderAscending, _
            FieldNumber2:="", SortFieldType2:=wdSortFieldAlphanumeric, SortOrder2:= _
            wdSortOrderAscending, FieldNumber3:="", SortFieldType3:= _
            wdSortFieldAlphanumeric, SortOrder3:=wdSortOrderAscending, Separator:= _
            wdSortSeparateByTabs, SortColumn:=False, CaseSensitive:=False, LanguageID _
            :=wdEnglishAUS, SubFieldNumber:="Paragraphs", SubFieldNumber2:= _
            "Paragraphs", SubFieldNumber3:="Paragraphs"

        ' Phase 4: walk up from the last paragraph, deleting duplicates and
        ' appending a tab and the count to the surviving copy of each word
        Set doc = ActiveDocument
        x = doc.Paragraphs.Count
        iCount = 1

        While x >= 1 ' >= so that the first paragraph also gets its count
            Do While x > 1 ' guard: there is no paragraph 0 to compare against
                If doc.Paragraphs(x).Range.Text <> doc.Paragraphs(x - 1).Range.Text Then Exit Do
                iCount = iCount + 1
                doc.Paragraphs(x).Range.Delete
                x = x - 1
            Loop
            Set aRange = doc.Range(doc.Paragraphs(x).Range.Start, _
                doc.Paragraphs(x).Range.End - 1)
            aRange.InsertAfter vbTab & iCount
            x = x - 1
            iCount = 1
            Debug.Print x ' progress indicator in the Immediate window
        Wend

        Set aRange = Nothing
        Set doc = Nothing

    End Sub</pre>

    </small>
    Andrew Lockton, Chrysalis Design, Melbourne Australia

  9. #9
    New Lounger
    Join Date
    Feb 2002
    Location
    Little Rock, Arkansas USA
    Posts
    7
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: need to list all words in a document (Word 2000/2002)

    Macropod, Thank you! Along with Jscher2000's suggestion, this works pretty good!
    Now ... If I could just find a way to have code or a macro to open each document in the folder, scan it, write out the document name at the top and word list, and loop until all documents are scanned!
    I presented this to our 2 web people, and they don't want to do each of the 4000 documents individually! This is the first time I've ever gotten to VBA code for Word!
    Again, I cannot express fully how much you have helped me!
    Chalmers

  10. #10
    Super Moderator jscher2000
    Join Date
    Feb 2001
    Location
    Silicon Valley, USA
    Posts
    23,112
    Thanks
    5
    Thanked 93 Times in 89 Posts

    Re: need to list all words in a document (Word 2000/2002)

    There have been many code examples on the Word and VB/VBA boards that involve opening all files in a folder, and there are several ways of doing it. I can't put my finger on any code at the moment, but perhaps someone who has benefited from such a post in the past will share theirs.

  11. #11
    New Lounger
    Join Date
    Feb 2002
    Location
    Little Rock, Arkansas USA
    Posts
    7
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: need to list all words in a document (Word 2000/2002)

    Thank you, jscher2000.
    I have been unsuccessful in locating Word boards. Do you have some good recommendations? I have plenty of Access/ASP/VBScript boards, but no VBA or Word!
    I thought that after studying macropod's code, I could figure out how to open a Word document, break out the words, close it and open another, etc. until all 4000 files have been opened and scanned. Hopefully someone will post this code!
    Thank you all in advance,
    Chalmers

  12. #12
    Plutonium Lounger
    Join Date
    Mar 2002
    Posts
    84,353
    Thanks
    0
    Thanked 29 Times in 29 Posts

    Re: need to list all words in a document (Word 2000/2002)

    Here is code to loop through a folder:

    Dim strPath As String
    Dim strFile As String
    Dim oCurDoc As Document
    Dim oDoc As Document

    Set oCurDoc = ActiveDocument ' write results to this document

    strPath = "C:\Word\" ' note the trailing backslash
    strFile = Dir(strPath & "*.doc")

    Do While Not strFile = ""
    ' Open document
    Set oDoc = Documents.Open(FileName:=strPath & strFile, AddToRecentFiles:=False)
    ' Process document here using code provided by others
    ...
    ' Close document without saving it
    oDoc.Close SaveChanges:=wdDoNotSaveChanges
    strFile = Dir
    Loop

    ' Rest of processing here
    ...

    ' Cleaning up
    Set oCurDoc = Nothing
    Set oDoc = Nothing
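
    With thousands of files, one document that refuses to open can stop the whole run. A minimal sketch of the same loop with that case handled, assuming the folder "C:\Word\" and a made-up ProcessDocument hook that stands in for whichever word-listing code is used:

    ```vba
    Option Explicit

    ' Hypothetical hook: replace the body with the word-listing code of your choice.
    Private Sub ProcessDocument(ByVal oDoc As Document)
        Debug.Print oDoc.Name, oDoc.Words.Count
    End Sub

    Public Sub ProcessFolder()
        Const strPath As String = "C:\Word\"    ' assumed folder; keep the trailing backslash
        Dim strFile As String
        Dim oDoc As Document

        strFile = Dir(strPath & "*.doc")
        Do While Len(strFile) > 0
            Set oDoc = Nothing
            On Error Resume Next                ' a file that fails to open must not stop the run
            Set oDoc = Documents.Open(FileName:=strPath & strFile, _
                ReadOnly:=True, AddToRecentFiles:=False)
            On Error GoTo 0                     ' restore normal error handling

            If oDoc Is Nothing Then
                Debug.Print "Skipped: " & strFile
            Else
                ProcessDocument oDoc
                oDoc.Close SaveChanges:=wdDoNotSaveChanges
            End If
            strFile = Dir                       ' next matching file, "" when done
        Loop
    End Sub
    ```

    Skipped file names go to the Immediate window, so they can be checked by hand afterwards.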

  13. #13
    New Lounger
    Join Date
    Feb 2002
    Location
    Little Rock, Arkansas USA
    Posts
    7
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: need to list all words in a document (Word 2000/2002)

    You guys are just too much! I've never been on a board where I get so much help so quickly! I am mostly an Access and ASP programmer, and am just really getting started in Word VBA. I guess I hadn't realized just how powerful this can be! You can bet on it that I will frequent this board often, and if ever the chance comes up that I may be able to help someone, I will!
    Sincerely
    Chalmers

  14. #14
    Uranium Lounger
    Join Date
    Dec 2000
    Location
    Los Angeles Area, California, USA
    Posts
    7,453
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: need to list all words in a document (Word 2000/2002)

    Hi Chalmers:
    I'm afraid I can't help with the code, but if you look at your post, it's much too wide. When posting long lines, you can do one of two things:
    1. post them in an attachment only (as you've done); or,
    2. use pre tags, [pre]code goes here[/pre], AND break up long lines by ending each line with a space and an underscore. That is a continuation mark, and it allows the code to wrap.
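    For illustration, here is one long statement broken with the continuation mark. The file name is made up, and a line cannot be broken in the middle of a quoted string:

    ```vba
    Sub ContinuationExample()
        Dim oDoc As Document
        ' One logical statement spread over three physical lines.
        ' Each broken line ends with a space followed by an underscore.
        Set oDoc = Documents.Open( _
            FileName:="C:\Word\Example.doc", _
            AddToRecentFiles:=False)
        oDoc.Close SaveChanges:=wdDoNotSaveChanges
    End Sub
    ```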
    Cheers,

  15. #15
    New Lounger
    Join Date
    Feb 2002
    Location
    Little Rock, Arkansas USA
    Posts
    7
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: need to list all words in a document (Word 2000/2002)

    This is the final code I have.

    Thanks to HansV and macropod for getting me started
    and furnishing the code. I made some changes:

    The user can specify the folder with the files to be scanned.
    The file extension (.txt, .doc, .asp) can also be specified.
    I also write the file name to the first row of the table.

    I wanted to code this so that each file's table of words would
    be appended at the end of the previous file's list,
    after a page break, but could not figure out how
    to do that.

    It would be so much nicer to have one long file, than to have
    100, 300, or even 1000's of individual files! We have approx.
    4000 files we have to scan!

    Maybe someone else can nudge me in the right direction?

    Anyway, I hope this helps someone else as much as it has
    helped me!
    Thanks to all that responded!
    I've exported the code to WordBreaker.txt that can be imported.
    Just rename WordBreaker.txt to WordBreaker.bas.

    As Phil pointed out, this was a long post so I deleted the code and
    included the text listing.
    Chalmers
    Attached Files
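
    One possible nudge towards a single results file, as an untested sketch and not the code in WordBreaker.txt: create one results document up front, and before writing each file's list (except the first) collapse a range to the end of that document and insert a page break there. The folder, file mask and procedure name are assumptions, and words are taken from Word's own Words collection:

    ```vba
    Option Explicit

    Public Sub ListWordsForFolder()
        Const strPath As String = "C:\Word\"    ' assumed folder; keep the trailing backslash
        Const strMask As String = "*.doc"       ' assumed file mask

        Dim oResults As Document, oDoc As Document
        Dim dict As Object
        Dim rngWord As Range, rngEnd As Range
        Dim strFile As String, strWord As String, strOut As String
        Dim key As Variant
        Dim blnFirst As Boolean

        Set oResults = Documents.Add            ' one results document for all files
        blnFirst = True

        strFile = Dir(strPath & strMask)
        Do While Len(strFile) > 0
            Set oDoc = Documents.Open(FileName:=strPath & strFile, _
                ReadOnly:=True, AddToRecentFiles:=False)

            ' Count the words of this file, ignoring case
            Set dict = CreateObject("Scripting.Dictionary")
            For Each rngWord In oDoc.Words
                strWord = LCase$(Trim$(rngWord.Text))
                If strWord Like "[a-z]*" Then   ' keep only items starting with a letter
                    dict.Item(strWord) = dict.Item(strWord) + 1
                End If
            Next rngWord
            oDoc.Close SaveChanges:=wdDoNotSaveChanges

            ' Start every list after the first on a new page
            If Not blnFirst Then
                Set rngEnd = oResults.Content
                rngEnd.Collapse Direction:=wdCollapseEnd
                rngEnd.InsertBreak Type:=wdPageBreak
            End If
            blnFirst = False

            ' File name first, then one "word <tab> count" line per word
            strOut = strFile & vbCr
            For Each key In dict.Keys
                strOut = strOut & key & vbTab & dict.Item(key) & vbCr
            Next key
            oResults.Content.InsertAfter strOut

            strFile = Dir
        Loop
    End Sub
    ```

    Each list is unsorted tab-separated text, so it can still be converted to a table and sorted in Word afterwards.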
