  1. #1
    3 Star Lounger
    Join Date
    Apr 2004
    Location
    Boston, Massachusetts, USA
    Posts
    389
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Iterating Word objects efficiently (Word VBA)

    Some recent threads have had to do with deleting objects from Word documents. As the posts show, when deleting objects, you're usually confined to using a For...Next loop rather than the much faster For...Each loop.

    But in some cases, there's a third alternative that I haven't seen discussed before (though I admittedly didn't look very hard) that offers nearly the same speed as a For...Each loop, but with the flexibility to delete objects along the way found in a For...Next loop.

    One such case is with Paragraphs, obviously something that you might often need to iterate and occasionally delete.

    Paragraphs, along with a handful of other objects (fields are another), include a "Next" property, which returns the next object in the series. (Strictly speaking, Next is a method on Paragraph and a property on Field, but you use it the same way.) By using the Next property, you can quickly move along a collection, while still being able to delete items along the way if needed.

    The following three examples don't actually delete paragraphs -- I thought I'd keep it simple for illustration purposes -- but they do nicely illustrate the three different techniques for iterating Paragraphs in a Word document. I ran all three on the same 252-page, 4,030-paragraph Word document.

    1. The first uses a For...Next loop, and was quite slow, as you might expect: 2 minutes, 20 seconds.
    2. The second uses a For...Each loop, and was a bit speedier, at 2 seconds (yup!).
    3. The third, which starts at the first paragraph and uses the Next property to move along, also took ... 2 seconds.
    <pre>Sub IterateParasTheSlowWay()
        Dim doc As Document
        Dim para As Paragraph
        Dim k As Long   ' Long rather than Integer, so documents with more than 32,767 paragraphs don't overflow
        Set doc = ActiveDocument

        ' Indexed access, counting down (the order you'd need if you were deleting)
        For k = doc.Paragraphs.Count To 1 Step -1
            Set para = doc.Paragraphs(k)
            If para.Style = doc.Styles(wdStyleHeading1) Then
                para.Range.HighlightColorIndex = wdBrightGreen
            End If
        Next k

    End Sub
    '----------------------------------------------------------
    Sub IterateParasTheFastestWay()
        Dim doc As Document
        Dim para As Paragraph

        Set doc = ActiveDocument

        For Each para In doc.Paragraphs
            If para.Style = doc.Styles(wdStyleHeading1) Then
                para.Range.HighlightColorIndex = wdBrightGreen
            End If
        Next para

    End Sub
    '------------------------------------------------------------
    Sub IterateParasTheFastAndFlexibleWay()
        Dim doc As Document
        Dim para As Paragraph
        Dim paraNext As Paragraph
        Set doc = ActiveDocument

        Set para = doc.Paragraphs.First
        Do While Not para Is Nothing
            ' Get hold of the following paragraph before working on this one
            Set paraNext = para.Next
            If para.Style = doc.Styles(wdStyleHeading1) Then
                para.Range.HighlightColorIndex = wdBrightGreen
            End If
            Set para = paraNext
        Loop

    End Sub
    </pre>


    In the case of this last subroutine, instead of applying highlighting, I could just as easily have deleted those Heading 1 paragraphs, and still been able to move along the collection correctly, since I've already got my hands on the following paragraph, which becomes the current paragraph on the next trip through the loop.
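    To make that concrete, here is a minimal sketch of the deleting version. I haven't timed it, the sub name is made up for the example, and it assumes that deleting a paragraph's Range leaves the already-fetched paraNext reference valid, as described above.

```vba
Sub DeleteHeading1ParasViaNext()
    ' Hypothetical variant of IterateParasTheFastAndFlexibleWay:
    ' deletes Heading 1 paragraphs instead of highlighting them.
    Dim doc As Document
    Dim para As Paragraph
    Dim paraNext As Paragraph

    Set doc = ActiveDocument
    Set para = doc.Paragraphs.First

    Do While Not para Is Nothing
        ' Fetch the following paragraph BEFORE deleting the current one
        Set paraNext = para.Next
        If para.Style = doc.Styles(wdStyleHeading1) Then
            para.Range.Delete
        End If
        Set para = paraNext
    Loop
End Sub
```

    One caveat: Word won't remove the final paragraph mark of a document, so if the very last paragraph is a Heading 1, its text goes but an empty paragraph stays behind.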

    For...Each loops are still my weapon of choice when doing standard iterations, but using the Next property technique (the "linked-list method" formally) has proved a valuable addition to my Word macro toolbox.
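    If you want to reproduce the timings on your own documents, a rough harness along these lines should do. It assumes the three subs above live in the same module; TimeIterations is just a name I've picked for the sketch. Timer counts seconds since midnight, so the figures are approximate and will go wrong if a run crosses midnight.

```vba
Sub TimeIterations()
    ' Rough stopwatch around each of the three iteration subs.
    Dim t As Single

    t = Timer
    IterateParasTheSlowWay
    Debug.Print "For...Next:    " & Format(Timer - t, "0.00") & " s"

    t = Timer
    IterateParasTheFastestWay
    Debug.Print "For...Each:    " & Format(Timer - t, "0.00") & " s"

    t = Timer
    IterateParasTheFastAndFlexibleWay
    Debug.Print "Next property: " & Format(Timer - t, "0.00") & " s"
End Sub
```

    The results show up in the Immediate window (Ctrl+G in the VBA editor).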

    Cheers!

  2. #2
    Plutonium Lounger
    Join Date
    Mar 2002
    Posts
    84,353
    Thanks
    0
    Thanked 28 Times in 28 Posts

    Re: Iterating Word objects efficiently (Word VBA)

    Thanks for sharing! This will come in handy, I'm sure.

    For others reading this, Next and Previous are properties of the following objects:
    Cell
    Column
    Field
    FormField
    MailMergeField
    Pane
    Row
    TabStop
    TextFrame
    Window

  3. #3
    3 Star Lounger
    Join Date
    Apr 2004
    Location
    Boston, Massachusetts, USA
    Posts
    389
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: Iterating Word objects efficiently (Word VBA)

    Thanks for putting that list together, Hans!

  4. #4
    Platinum Lounger
    Join Date
    Nov 2001
    Location
    Melbourne, Victoria, Australia
    Posts
    5,016
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: Iterating Word objects efficiently (Word VBA)

    Thanks Andrew. Neat trick and neat code samples too.

    Alan

  5. #5
    Super Moderator
    Join Date
    Dec 2000
    Location
    New York, NY
    Posts
    2,970
    Thanks
    3
    Thanked 29 Times in 27 Posts

    Re: Iterating Word objects efficiently (Word VBA)

    Andrew,

    That is neat, thanks for sharing it. There have been threads here before with regard to using .Next to iterate quickly, but that "Set obj = objNext" is a really nice trick.

    Another object that needs to be added to the list of objects that support First/Next is the Range object; in particular, this allows you to iterate through the Characters collection (you can also do it with For Each, but that is minus the benefit of your method). In this example, all characters that are upper-case get highlighted (don't try this on a 200-page document!):
    <pre>Sub IterateCharactersNext()

        Dim doc As Document
        Dim char As Range
        Dim charNext As Range
        Set doc = ActiveDocument

        Set char = doc.Characters.First
        Do While Not char Is Nothing
            Set charNext = char.Next
            If char.Case = wdUpperCase Then
                char.HighlightColorIndex = wdBrightGreen
            End If
            Set char = charNext
        Loop

        Set doc = Nothing
        Set char = Nothing
        Set charNext = Nothing

    End Sub
    </pre>

    Gary

  6. #6
    3 Star Lounger
    Join Date
    Apr 2004
    Location
    Boston, Massachusetts, USA
    Posts
    389
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: Iterating characters selectively (Word VBA)

    Hi Gary,

    Thanks for the info on the Range object -- that also means you can go by word as well as by character (and sentences, but Word's definition of a sentence is a bit sketchy).

    Another recurring topic on the board is iterating over each character, just like you've described. The standard objection is that doing so is slow (which is why you've warned against running your macro on a long document).

    But sometimes iterating each character is the best or only way to tackle a problem, so the question moves to how to optimize the iteration, so that you (well, not you specifically, Gary) only iterate characters when you absolutely have to.

    For example, if you wanted to work on any characters in a document whose formatting was different from that defined by its paragraph or character style (such as direct bold or italic applied), one fairly efficient approach is the following.

    This macro uses two supporting functions to isolate only those words in the document that contain some degree of direct formatting (for illustration purposes, I've confined 'direct formatting' to mean bold, italic, size or font name change -- in practice, that's usually sufficient).

    <pre>'=============================
    Sub IterateCharactersSelectively()
        Dim doc As Document
        Dim wrd As Range
        Dim char As Range
        Dim para As Paragraph

        Set doc = ActiveDocument
        For Each para In doc.Paragraphs
            If AnyDiffFontsInPara(para) = True Then
                For Each wrd In para.Range.Words
                    If AnyDiffFontsInWord(wrd) = True Then
                        wrd.Select
                        MsgBox "This word has character formatting " & _
                               "that is inconsistent with its style"
                        ' now you only have to iterate each character
                        ' in a word, rather than a whole paragraph
                        ' or a whole document. Put your character
                        ' iterating/modifying code here
                    End If
                Next wrd
            End If

        Next para
    End Sub
    </pre>

    Basically, there's no point iterating all the characters in a particular paragraph if none of them are any different from the paragraph style properties. So by checking that first, we can move quickly past a lot of text. If and when we do find a paragraph that contains differing formatting, then we go word by word to isolate the problem, only then iterating each character. Depending on the amount of direct formatting in a document, and the average number of characters per word in your document, this technique can be several orders of magnitude faster than iterating each character in the document. Your mileage may vary.

    Here are the two supporting functions used by the main macro. These could be adjusted as needed to look for things like highlighting or superscripting.
    <pre>'===============================================
    Function AnyDiffFontsInPara(para As Paragraph) As Boolean
        Dim lDiffBold As Long
        Dim lDiffItal As Long
        Dim lDiffSize As Long
        Dim sDiffName As String

        AnyDiffFontsInPara = False

        With para.Range.Font
            lDiffBold = .Bold
            lDiffItal = .Italic
            lDiffSize = .Size
            sDiffName = .Name
        End With

        ' A property comes back as wdUndefined when it varies within the range
        Select Case wdUndefined
            Case lDiffBold
                AnyDiffFontsInPara = True
                Exit Function
            Case lDiffItal
                AnyDiffFontsInPara = True
                Exit Function
            Case lDiffSize
                AnyDiffFontsInPara = True
                Exit Function
        End Select

        ' A mixed font name comes back as an empty string
        If Len(sDiffName) = 0 Then
            AnyDiffFontsInPara = True
            Exit Function
        End If
    End Function
    '==========================================
    Function AnyDiffFontsInWord(wrd As Range) As Boolean

        Dim docstyles As Styles
        Dim wrdstyle As String
        wrdstyle = wrd.Style
        Set docstyles = wrd.Parent.Styles

        Select Case True
            Case (Not wrd.Font.Bold = docstyles(wrdstyle).Font.Bold)
                AnyDiffFontsInWord = True
            Case (Not wrd.Font.Italic = docstyles(wrdstyle).Font.Italic)
                AnyDiffFontsInWord = True
            Case (Not wrd.Font.Name = docstyles(wrdstyle).Font.Name)
                AnyDiffFontsInWord = True
            Case (Not wrd.Font.Size = docstyles(wrdstyle).Font.Size)
                AnyDiffFontsInWord = True
        End Select
    End Function
    </pre>


    Cheers!

  7. #7
    Super Moderator
    Join Date
    Dec 2000
    Location
    New York, NY
    Posts
    2,970
    Thanks
    3
    Thanked 29 Times in 27 Posts

    Re: Iterating characters selectively (Word VBA)

    Andrew,

    Thanks for posting this as well - this is great stuff, and deserves a star of its own. I may have missed some related threads in the past year or so, but I recall a long one from 2001 or so on this same topic (will post a link later if I can track it down). If I recall right, Klaus Linke suggested a similar approach to optimizing by filtering what gets searched, but it's safe to say that nothing posted back then approached this for elegance.

    Thanks also for demonstrating some unusual ways to use Select Case structures:<pre>Select Case wdUndefined
    Case lDiffBold

    Select Case True
    Case (Not wrd.Font.Bold = docstyles(wrdstyle).Font.Bold)</pre>

    Who knew? (shrug) (clapping)

    Gary

  8. #8
    Platinum Lounger
    Join Date
    Feb 2001
    Location
    Yilgarn region of Toronto, Ontario
    Posts
    5,453
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: Iterating characters selectively (Word VBA)

    Andrew, thanks for this. I found that it wasn't isolating individual words within a range, and so I modified it slightly (attached) to collect a "fnt" object from the first character of the paragraph; I also made it a function that accepts a Range as parameter, so I'm not restricted to a document.

    PS I should add that I purchased a copy of WordHacks two weeks ago, and love it.

  9. #9
    3 Star Lounger
    Join Date
    Apr 2004
    Location
    Boston, Massachusetts, USA
    Posts
    389
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: Iterating characters selectively (Word VBA)

    Hi Chris,

    Glad to hear you like the book -- it's very gratifying to hear that people have found it useful.

    I'm a little unclear on what you mean by "wasn't isolating individual words within a range"; could you describe the problem (or post a sample document)? I wasn't able to get it to not isolate on each word. Your revised macro and my original produced the same results for me.

  10. #10
    Platinum Lounger
    Join Date
    Feb 2001
    Location
    Yilgarn region of Toronto, Ontario
    Posts
    5,453
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: Iterating characters selectively (Word VBA)

    > "wasn't isolating individual words within a range"

    Andrew, I have attached a Sample.doc containing two paragraphs, each of which contains some text formatted in a user-defined character style (MacroCharacters). The VBA module has a copy of your code. If I extend the formatting to include the second part of the word preceding the original formatting, it is detected correctly.

    Your code is timely as I am currently analysing 6,000+ documents with a client's request to isolate all non-standard formatting, and had been using an abbreviated "font" object, much as you suggest, for matching:<pre>With .Font
    strresult = strresult & .Bold & strdelim & .Italic & strdelim & _
    .Underline & strdelim & .Size & strdelim & .StrikeThrough & strdelim
    .Bold = wdUndefined</pre>

    >people have found it useful
    I wouldn't have described it as "useful" (grin!)

  11. #11
    3 Star Lounger
    Join Date
    Apr 2004
    Location
    Boston, Massachusetts, USA
    Posts
    389
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: Iterating characters selectively (Word VBA)

    Ah, perhaps I wasn't clear on the macro's evaluation criteria. The goal is to isolate direct formatting not associated with a style -- a paragraph style or a character style. In the case of your sample document, those words that use the "MacroCharacter" style are perfectly acceptable -- the user has correctly applied a character style to differentiate a portion of a paragraph. If, however, you apply additional formatting on top of the MacroCharacter style, like italics, the text will get flagged.

    If you want to detect any deviation from the paragraph style (including the use of character styles), you could probably just change:
    <pre>Dim wrdstyle As String
    wrdstyle = wrd.Style
    </pre>

    to<pre>Dim parastyle as String
    parastyle = wrd.Paragraphs.First.Style
    </pre>

    I have not tested that, by the way.
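    Spelled out as a whole function, that change might look something like the sketch below. It is equally untested; AnyDiffFromParaStyle is a name invented for the example, and it reaches the Styles collection through wrd.Document rather than wrd.Parent.

```vba
Function AnyDiffFromParaStyle(wrd As Range) As Boolean
    ' Hypothetical variant of AnyDiffFontsInWord: compares the word's font
    ' against the PARAGRAPH style, so character styles get flagged as well.
    Dim paraStyleName As String
    Dim styleFont As Font

    ' Name of the paragraph style of the paragraph containing the word
    paraStyleName = wrd.Paragraphs.First.Style
    Set styleFont = wrd.Document.Styles(paraStyleName).Font

    ' True as soon as any of the four properties deviates from the style
    AnyDiffFromParaStyle = _
        (wrd.Font.Bold <> styleFont.Bold) Or _
        (wrd.Font.Italic <> styleFont.Italic) Or _
        (wrd.Font.Name <> styleFont.Name) Or _
        (wrd.Font.Size <> styleFont.Size)
End Function
```

    You would then call this in place of AnyDiffFontsInWord inside the main macro's inner loop.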

    Hope this makes sense. Cheers!

  12. #12
    Platinum Lounger
    Join Date
    Feb 2001
    Location
    Yilgarn region of Toronto, Ontario
    Posts
    5,453
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: Iterating characters selectively (Word VBA)

    > detect any deviation from the paragraph style
    Right, thanks, and yes, it does make sense.
    I realised this morning that an essential part of any detection like this will be establishing the basic criteria.
    My mind was on "different from the first character of the paragraph", but it could have easily been "different from the first word of the paragraph" (including FWIW "undefined" as an allowable basis for comparison). Your example was, then "different from the style of the word". I think I got that right. I'm pretty sure, though, that my problem was caused by my not reading your definition, and trying to make the code do what I wanted without first priming it in the correct manner.

  13. #13
    Silver Lounger
    Join Date
    Jan 2001
    Location
    West Long Branch, New Jersey, USA
    Posts
    1,921
    Thanks
    6
    Thanked 9 Times in 7 Posts

    Re: Iterating Word objects efficiently (Word VBA)

    Hi Andrew,

    I just read the entire thread to date since I haven't had a chance to read much of anything on the Lounge lately. This looked interesting.

    Let me hypothesize that the first loop is doing something different than the 2nd and 3rd. Practically speaking they result in the same output. I don't know if this is really true, so this could be all smoke.

    The For...Next goes thru the paras in reverse order. Is it possible that Word or VBA has to step thru a link list each time to find the i-th para? Although I know why you go in reverse order, would the 1st approach be a little better than it is now if you went forward? Maybe not much.

    The 2nd approach lets Word/VBA do the driving and keeps track of pointers as you go thru the loop of paras. It probably takes advantage of the For...Each construct to step thru the collection of paras using pointer mechanisms. The 3rd approach seems like it could be the same with you doing the work instead of Word. In fact, I wouldn't be surprised if the actual implementation of the 2nd approach looked like your 3rd approach.

    I would also suspect that there could be a difference in how the loop conditions were handled in the first 2 cases. For example, do you know if the looping statement

    For k = doc.Paragraphs.count To 1 Step -1

    has to retrieve the doc.Paragraphs.count in each iteration? Although this would be bad programming in terms of compiling the source code (or interpreting it), I've seen worse. In case it does make a diff, I've always stored this kind of value in a local var. That is:

    paraCount = doc.Paragraphs.count
    For k = paraCount to 1 Step -1

    Also, another key diff between your 1st and 2nd approaches is the need for the Set stmt in the 1st approach. This probably adds overhead in that the code has to set a pointer after retrieving yet other info (doc.Paragraphs(k)). The 2nd approach is letting the loop mechanism take care of this so you're cutting out figuring out what doc.Paragraphs(k) is.

    Even tho the 3rd approach does a Set, you're setting paraNext, itself a "pointer", to a "pointer" in the current para, which you already have access to. So, as I mentioned above, the 2nd and 3rd approaches should be the same.

    I'd also wonder if the size of the doc may have something to do with the big diff. For example, in getting to a 200+ page doc, I doubt that you entered the paras sequentially. Type a few paras, go back and insert a para before another para, copy and paste a few paras from another document in between 2 existing paras. I'm going to guess not and that Word probably relinks the para collection when you insert. So the link list would look the same when all's said and done regardless of whether you entered them "right the first time" or went back and forth as mentioned just above.

    Or this could be way off base.

    Fred

  14. #14
    3 Star Lounger
    Join Date
    Apr 2004
    Location
    Boston, Massachusetts, USA
    Posts
    389
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: Iterating Word objects efficiently (Word VBA)

    Hi Fred,
    > Let me hypothesize that the first loop is doing something different than the 2nd and 3rd. Practically speaking they result in the same output. I don't know if this is really true, so this could be all smoke.
    For...Each loops are an optimized shortcut for iterating a collection of objects, or an array of Variants. The following two loops are functionally (though not performance-wise) equivalent:
    <pre>For Each para in ActiveDocument.Paragraphs
    ' Do something here
    Next para
    ' ------
    For i = 1 to ActiveDocument.Paragraphs.Count
    Set para = ActiveDocument.Paragraphs(i)
    ' do something here
    Next i
    </pre>

    In the case of the For Each loop, the set statement is implicit, and there's no need for an iterator variable, since the 1 to .count is also implicit. The For Each loop is faster because VBA can, in effect, pre-load the objects your loop needs. The For..Next loop on the other hand, can't do that, because VBA has no way of knowing whether or how much the value of i might change between iterations.
    > The For...Next goes thru the paras in reverse order. Is it possible that Word or VBA has to step thru a link list each time to find the i-th para? Although I know why you go in reverse order, would the 1st approach be a little better than it is now if you went forward? Maybe not much.
    The order in which you iterate doesn't matter for speed, but is important if you want to delete any items. Deleting items while moving forward will result in skipped items, which is the same thing that can happen if you try deleting while using a For..Each loop. Consider this example:
    A document with 4 paragraphs, in this order: Heading 1, Heading 2, Heading 2, Normal.
    <pre>For k = 1 to ActiveDocument.Paragraphs.Count
    If ActiveDocument.Paragraphs(k).Style = "Heading 2" Then ActiveDocument.Paragraphs(k).Range.Delete
    Next k
    </pre>

    In this case, the third paragraph won't get deleted (and you'll get an error when k gets to 4).
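    For comparison, here is a minimal sketch of the same deletion run in reverse. Deleting paragraph k never changes the index of any paragraph before it, so nothing is skipped and k never runs past the end of the collection. The sub name is made up for the example, and I've gone through the paragraph's Range to do the delete.

```vba
Sub DeleteHeading2Backward()
    ' Hypothetical example: walk the paragraphs from last to first so that
    ' deletions only affect indexes we have already visited.
    Dim doc As Document
    Dim k As Long

    Set doc = ActiveDocument
    For k = doc.Paragraphs.Count To 1 Step -1
        If doc.Paragraphs(k).Style = "Heading 2" Then
            doc.Paragraphs(k).Range.Delete
        End If
    Next k
End Sub
```

    On the 4-paragraph document above, this visits Normal, then deletes both Heading 2 paragraphs, then finishes on Heading 1.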

    > The 2nd approach lets Word/VBA do the driving and keeps track of pointers as you go thru the loop of paras. It probably takes advantage of the For...Each construct to step thru the collection of paras using pointer mechanisms. The 3rd approach seems like it could be the same with you doing the work instead of Word. In fact, I wouldn't be surprised if the actual implementation of the 2nd approach looked like your 3rd approach.
    I think you're probably pretty close on that.

    > I would also suspect that there could be a difference in how the loop conditions were handled in the first 2 cases. For example, do you know if the looping statement
    > For k = doc.Paragraphs.count To 1 Step -1
    > has to retrieve the doc.Paragraphs.count in each iteration?
    No, the value is computed once at the start of the loop, not during each iteration.
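    You can check this without a document at all. In the sketch below (the sub and variable names are just for illustration), the variable used as the upper bound is changed inside the loop, yet the loop still runs the original number of times.

```vba
Sub ShowLoopBoundEvaluatedOnce()
    Dim n As Long
    Dim k As Long
    Dim trips As Long

    n = 3
    For k = 1 To n
        n = 10              ' has no effect on the loop's upper bound
        trips = trips + 1
    Next k

    Debug.Print trips       ' prints 3, not 10
End Sub
```

    The flip side is that a For...Next loop over a collection won't notice when a deletion shrinks the collection part-way through.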

    > I'd also wonder if the size of the doc may have something to do with the big diff. For example, in getting to a 200+ page doc, I doubt that you entered the paras sequentially. Type a few paras, go back and insert a para before another para, copy and paste a few paras from another document in between 2 existing paras. I'm going to guess not and that Word probably relinks the para collection when you insert. So the link list would look the same when all's said and done regardless of whether you entered them "right the first time" or went back and forth as mentioned just above.
    I actually did insert them sequentially, using the rand() trick. I wouldn't think the order in which the paragraphs were entered would have much impact on the efficiency of the iteration, but I could be wrong.

    Thanks for the insightful comments!

  15. #15
    5 Star Lounger st3333ve
    Join Date
    May 2003
    Location
    Los Angeles, California, USA
    Posts
    705
    Thanks
    0
    Thanked 2 Times in 2 Posts

    Re: Iterating Word objects efficiently (Word VBA)

    Based on my experience, it looks to me like if you use For Each...Next to iterate through a document's paragraphs, deleting some of the paragraphs as you go, no paragraphs get skipped (assuming the code in the loop isn't using some kind of index reference that hasn't been adjusted to account for the deletions). To take your 4-paragraph document example, this works (without any skipping):

    <pre> For Each parX In docX.Paragraphs
    If parX.Style = "Heading 2" Then
    parX.Range.Delete
    End If
    Next parX</pre>


    I've found the same to be true of iterating through a document's styles, and I expect it's true of most of the VBA object collections.
