  1. #1
    Super Moderator
    Join Date
    Dec 2000
    Location
    New York, NY
    Posts
    2,970
    Thanks
    3
    Thanked 29 Times in 27 Posts

    Direct Formatting Detector (Word 97 or later)

    Hello,

    This grew out of a discussion on a separate thread, but I'm posting this as a new thread.

    The attached text file contains a macro that works as a direct formatting detector, both on the paragraph and character level.

    Once you've run this on a document, paragraphs that contain paragraph-level direct formatting (things like left or right indent, space before or after) get highlighted in light gray, and individual characters that contain direct formatting (font properties like bold, italic, size etc.) get highlighted in turquoise.

    This is a demonstration of concept and doesn't detect every possible kind of direct formatting - it detects 7 common kinds of paragraph formatting and 8 common kinds of character formatting. It would be possible to broaden this to include every possible kind of direct formatting; this would just mean a whole lot more (and slower-running) code but it is doable.

    Note that this code runs extremely slowly! - it has to examine every character in the document for a number of properties. So try test-running this on a short document first. If you run it on a long document, plan to let it run during a coffee (or lunch!) break.<g> Actually on a short document, it's kind of fun to watch it putter along highlighting the characters with direct formatting.

    BTW this was done using Word 2000; it should work fine on Word 97 as well, but it hasn't been tested on Word 2002.

    Hope this is useful to someone, and I'd be grateful for feedback and further suggestions.

    Gary
    Attached Files

  2. #2
    Uranium Lounger
    Join Date
    Dec 2000
    Location
    Los Angeles Area, California, USA
    Posts
    7,453
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: Direct Formatting Detector (Word 97 or later)

    Hi Gary:

    I can see its usefulness. I ran it on a heavily formatted 241-page document. It did take a long time. I actually got tired of timing it, so I watched TV :-)

    I got a message that a variable was not defined on the last line of this part:
    'SECOND pass will be for characters:
    For Each aPara In ActiveDocument.Paragraphs
    lngParaLen = aPara.Range.Characters.Count
    For c = 1 To lngParaLen
    With aPara.Range.Characters(c)
    With .Style.Font

    so it never finished & I never got to see the lovely turquoise. If I like the turquoise, maybe I'll make it my default :-)

    I wish it ran faster, but I can see the uses for this. And it's certainly faster than searching through using Shift+F1. While you could just torch a document by pressing Ctrl+A & then removing all direct formatting, this allows you to see what you're removing. You can even search through the document for the highlighting & decide how you want to make changes. Great work!

  3. #3
    Silver Lounger
    Join Date
    Jan 2001
    Location
    West Long Branch, New Jersey, USA
    Posts
    1,921
    Thanks
    6
    Thanked 9 Times in 7 Posts

    Re: Direct Formatting Detector (Word 97 or later)

    Hi Gary,

    Good job - and really useful. I have a client who's resisted styles forever and applied direct formatting all over. He recently came to me and asked for a tutorial on styles. While I haven't done it yet, I've printed Charles Kenyon's style tutorial in preparation.

    As Phil indicated, this beats hitting Shift+F1. But I had suggested to Klaus that using Shift+F1 in a macro and trying to read its output might be the way to go. Is this possible?

    A couple of thoughts to make it run faster:
    On the paras:
    - since you're not letting the IsParaDirectFormatted function find all direct formatting but exiting on the first one, here's a suggestion that might make this part run faster:
    - it seems like all the variables associated with para formatting are declared as either Integer or Single. Is it possible to just add all the properties of the style into a "sum of properties of style", do likewise for the paragraph as formatted, and then compare the sums? If they're not equal, there's direct formatting. Of course, you have to be careful of offsetting values. Since I don't know the values these variables take on, this suggestion may not be worth much.
    - in the IsParaDirectFormatted function, I'd put a False at the very end of the function. Just a style (of programming) thing - if the function falls thru to the end (i.e., no direct formatting), I always like to explicitly set the function's return value and not depend on it being some default value.
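
    A minimal sketch of that last point, assuming the function compares the paragraph's own format against its style's definition one property at a time. The body of the attached macro isn't shown in this thread, so the three properties tested below are illustrative picks, not Gary's actual list:

    ```vba
    Function IsParaDirectFormatted(aPara As Paragraph) As Boolean
        ' Sketch only: which properties to test is an assumption.
        Dim styleFmt As ParagraphFormat
        Set styleFmt = aPara.Style.ParagraphFormat

        With aPara.Format
            If .LeftIndent <> styleFmt.LeftIndent _
               Or .SpaceBefore <> styleFmt.SpaceBefore _
               Or .SpaceAfter <> styleFmt.SpaceAfter Then
                ' Exit on the first difference found.
                IsParaDirectFormatted = True
                Exit Function
            End If
        End With

        ' Fell through every test: set the result explicitly
        ' instead of relying on the Boolean default.
        IsParaDirectFormatted = False
    End Function
    ```

    The explicit False doesn't change behaviour (a Boolean function already defaults to False), but it makes the "no direct formatting" outcome visible to whoever reads the code next.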


    On the chars - this is probably where most of the time is spent (assuming a decent number of chars per para). It might be interesting to test a doc or 2 (with reasonably sized paras) and see how much time is spent on the first pass vs the second pass.
    If I understood this part correctly, it seems you're checking every char in the para to get its formatting and its style's formatting. This seems a little wasteful because, in my opinion, people don't create many char styles, let alone apply them to different chars. I realize that the para style has char attributes as well. (BTW: it wasn't clear to me that the char loop checks both the font characteristics of the para style and the font characteristics of the char style.)
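
    One rough way to measure the split between the two passes, sketched with VBA's Timer function. The loops below only read one property per paragraph and one per character, so they stand in for the real macro's passes rather than reproduce them; the Sub name and the dummy variable are made up for the example:

    ```vba
    Sub TimeBothPasses()
        Dim aPara As Paragraph
        Dim c As Long, n As Long
        Dim sngStart As Single
        Dim dummy As Variant

        ' First pass: touch each paragraph once.
        sngStart = Timer
        For Each aPara In ActiveDocument.Paragraphs
            dummy = aPara.Format.LeftIndent
        Next aPara
        Debug.Print "Paragraph pass (s): "; Timer - sngStart

        ' Second pass: touch each character once, via a counter.
        sngStart = Timer
        For Each aPara In ActiveDocument.Paragraphs
            n = aPara.Range.Characters.Count
            For c = 1 To n
                dummy = aPara.Range.Characters(c).Font.Bold
            Next c
        Next aPara
        Debug.Print "Character pass (s): "; Timer - sngStart
    End Sub
    ```

    The timings appear in the Immediate window of the VBA editor. Timer counts seconds since midnight, so a run that crosses midnight would report a nonsense figure.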

    Is there some way to gather info on the non-direct char formatting allowed for the document ahead of the char loop? That is, loop thru the para styles and the char styles and accumulate all non-direct formatting character attributes. Then in the char loop, compare each char to just 2 sets of char attributes:
    - the char attributes of the underlying para style for the para
    - the char attributes of each char style (of which I'm betting there won't be many)
    Maybe the trick of adding attribute values to some sum might work here as well.


    I still like reading the output of Shift+F1 if it's possible.

    Fred

  4. #4
    Silver Lounger
    Join Date
    Jan 2001
    Location
    West Long Branch, New Jersey, USA
    Posts
    1,921
    Thanks
    6
    Thanked 9 Times in 7 Posts

    Re: Direct Formatting Detector (Word 97 or later)

    Hi Gary,

    It's me again.

    I was just looking at the output of Shift+F1 for paras. I think it might also be useful to apply my "pre-processing" idea that I mentioned for char in the previous email to paras also.
    - build an array of n rows (1 for each para style used in the document) and m+1 cols (1 for each para characteristic and maybe 1 col for the style name)
    - loop thru the paras in the doc
    - set attributes of underlying para style including its name
    - find the row in the array with the matching style
    - compare attributes of the para 1 by 1 with the style
    - if no failures, then no direct formatting

    What this saves is gathering the style's para attributes again for each para when the same para style may be used for many paras. While the payoff may not be as great as for chars, I think there's some benefit here too.
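
    A sketch of how that lookup could work, using a Collection keyed by style name in place of the two-dimensional array (that swap is mine, to keep the example short). The three properties and the Sub name are placeholders, and whether this actually saves measurable time is untested:

    ```vba
    Sub FlagParasUsingStyleCache()
        ' Maps style name -> Array(left indent, space before, space after)
        Dim cache As New Collection
        Dim aPara As Paragraph
        Dim styleName As String
        Dim v As Variant

        For Each aPara In ActiveDocument.Paragraphs
            styleName = aPara.Style.NameLocal

            ' Look the style up; a missing key raises an error,
            ' which leaves v Empty.
            v = Empty
            On Error Resume Next
            v = cache(styleName)
            On Error GoTo 0

            If IsEmpty(v) Then
                ' First time this style is seen: read it once and store it.
                With aPara.Style.ParagraphFormat
                    v = Array(.LeftIndent, .SpaceBefore, .SpaceAfter)
                End With
                cache.Add v, styleName
            End If

            ' Compare the paragraph as formatted with the cached values.
            With aPara.Format
                If .LeftIndent <> v(0) Or .SpaceBefore <> v(1) _
                   Or .SpaceAfter <> v(2) Then
                    aPara.Range.HighlightColorIndex = wdGray25
                End If
            End With
        Next aPara
    End Sub
    ```

    One caveat: Collection keys are not case-sensitive, so two styles whose names differ only by case would share a cache entry.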

    BTW: recording a macro and hitting Shift+F1 resulted in
    Application.HelpTool
    but not much info on it in VBA help (with Word97)

    Fred

  5. #5
    3 Star Lounger
    Join Date
    Jan 2001
    Location
    Wellington, Wellington, New Zealand
    Posts
    378
    Thanks
    1
    Thanked 0 Times in 0 Posts

    Re: Direct Formatting Detector (Word 97 or later)

    Hi Gary

    Good work!

    I might have missed something in the replies you've had so far, but would the following approach speed things up for character formatting:
    - Do this for each type of character formatting.
    - Search from the start of the document for an occurrence.
    - Search for a non-occurrence.
    - Process the character string starting with the occurrence and ending one character before the non-occurrence.
    - Search for the next occurrence.

    No doubt the approach would have its own difficulties, but if it can be made to work, it should be much faster.

    Postscript, added later
    In fact, I wonder if you could go even faster by using this approach to identify strings with ANY character formatting:
    - Search FORWARD for non-occurrence of ALL character formatting.
    - Search BACKWARD for non-occurrence of ALL character formatting.
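
    A hedged sketch of the first idea for a single kind of formatting (bold), letting Find jump from one bold run to the next instead of visiting every character. It assumes a run counts as direct formatting when the paragraph's style is not itself bold. It ignores character styles, and it only checks the first paragraph of a run that spans several, so it illustrates the approach rather than offering a finished detector:

    ```vba
    Sub HighlightDirectBoldViaFind()
        Dim rng As Range
        Set rng = ActiveDocument.Content

        With rng.Find
            .ClearFormatting
            .Text = ""              ' match on formatting only
            .Format = True
            .Font.Bold = True
            .Forward = True
            .Wrap = wdFindStop

            Do While .Execute
                ' rng now spans one bold run.
                If rng.Paragraphs(1).Style.Font.Bold = False Then
                    rng.HighlightColorIndex = wdTurquoise
                End If
                ' Guard against re-finding a run at the very end.
                If rng.End >= ActiveDocument.Content.End Then Exit Do
                rng.Collapse wdCollapseEnd
            Loop
        End With
    End Sub
    ```

    Each further kind of formatting (italic, size and so on) would need its own pass with the matching Find.Font setting.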


    Good luck
    Dale

  6. #6
    5 Star Lounger
    Join Date
    May 2001
    Location
    Stuttgart, Baden-W, Germany
    Posts
    931
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: Direct Formatting Detector (Word 97 or later)

    Hi Fred,

    I'm making a last attempt to do a bit of advertising for my Reveal Codes White Paper

    It is styled rather well (compared to some other docs from Microsoft).

    Running my macro (which takes as long as saving and re-opening the file), you see a text file with some bold and italic formatting (that is normal; if you use "Emphasis" and "Bold" character styles, everything should look like plain text).

    Then two headings stick out: "Contents" and "How to Retain Formatting While You Edit Documents"; they have been manually formatted to look like headings.

    Next, you will notice a lot of empty paragraphs that were not very visible in the original document. Blank paragraphs are a no-no and should be removed.

    Also, you see bulleted and numbered lists. If these had been done with styles, they should not be visible. It's easy to clean them up: remove the manual bullets/numbers and assign a proper list or bullet style.

    Some paragraphs have a (manual) indent. Looking closer it's obvious that most of those should really have been formatted in list or list-continuation styles.

    Last, you'll notice the keyboard shortcuts, which have been manually formatted as 8 pt, and capitalized. They should have been given a special character style.

    All in all, it's easy to see and remove the manual formatting.

    Now if you run Gary's macro, the document will look like before, and you'll see lots and lots of coloured paragraphs.
    On the first pages, you see that manual formatting has been applied to
    Reveal Codes
    White Paper
    Published: August 2000
    Abstract
    Contents ...

    One problem is, the macro takes too long (my documents are of megabyte size, and even with this small file, I stopped the macro after 10 minutes). Even if you could make it run ten times faster, it would be much too slow to be used often.

    Even though the macro doesn't look for all manual formatting, you see too many changes.
    You don't see why it has been changed (is it marked because the numbering was applied manually? or because the indent was changed? or "space after"? or because it's bolded?)
    You still have to decide for every paragraph on how to fix it: remove some or all manual formatting, or apply a different style?

    I have tried this approach, too, but I became convinced that it isn't practical.

    Klaus

  7. #7
    3 Star Lounger
    Join Date
    Jan 2001
    Location
    Wellington, Wellington, New Zealand
    Posts
    378
    Thanks
    1
    Thanked 0 Times in 0 Posts

    Re: Direct Formatting Detector (Word 97 or later)

    Hi Gary

    Here's a first crack at using Find rather than character-by-character (as suggested in my previous post).

    Problems:
    - Doesn't currently deal with more than one occurrence per paragraph. But that's easy enough to fix.
    - Doesn't deal with styles containing character formatting.
    Attached Files

  8. #8
    Uranium Lounger
    Join Date
    Dec 2000
    Location
    Los Angeles Area, California, USA
    Posts
    7,453
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: Direct Formatting Detector (Word 97 or later)

    Hi Klaus:

    I like your idea & tried your macro. I ran it on a 241 page heavily formatted document. At some point, I got the message that:
    "The style name already exists or is reserved for a built-in style".

    When I clicked Debug, the following was highlighted:

    Documents(NewFile).Styles.Add _
    Name:=myStyle.NameLocal, Type:=wdStyleTypeParagraph

    I don't know what happened. Do you?

  9. #9
    5 Star Lounger
    Join Date
    May 2001
    Location
    Stuttgart, Baden-W, Germany
    Posts
    931
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: Direct Formatting Detector (Word 97 or later)

    Hi Phil,

    I am not quite sure; possibly you have the empty paragraph in "Normal.Dot" formatted in another style than "Normal"?

    I put in some code to fix this, and to make the macro a bit safer by removing styles that have been added to Normal.DOT by the user.

    I also added some documentation, and fixed a bug: previously, only user-defined character styles were made "invisible", now, built-in character styles also are formatted like the "default paragraph font".

    Hope it works now; if it doesn't, send me a file that causes problems, and I'll try to find out what goes wrong.

    After you have cleaned up the document, you can select all text and paste it back into the original document. For users who seldom use styles, this looks like magic: paste a "plain text file" into an empty document and -- boom -- it's formatted.

    Regards, Klaus
    Attached Files

  10. #10
    Super Moderator
    Join Date
    Dec 2000
    Location
    New York, NY
    Posts
    2,970
    Thanks
    3
    Thanked 29 Times in 27 Posts

    Re: Direct Formatting Detector (Word 97 or later)

    Edited by Gary Frieder on 02-Dec-01 05:07.

    Hi Klaus,

    Actually I had read the code but hadn't tried running your macro (until now <g>).
    Mainly, the macro I posted was done to respond to JVBNZ's (and Fred's) request for one that would highlight the text that had direct formatting applied. The request is here: http://www.wopr.com/cgi-bin/w3t/showthreaded.pl?Cat=&Board=wrd&Number=96077&page=3 &view=expanded&sb=5&o=0&vc=1#Post96077 [Mods note: sorry, I shouldn't have broken this thread into a new one]
    That, and the fact that I thought it would be fun to try out the brute force approach to this.

    Your macro is a great application of lateral thinking, runs quick and does provide the user with visual cues as to where there's direct formatting that needs to be fixed.
    The constraint I see with this approach is that it may take a fairly experienced eye to understand what one is seeing - that user can look at the output document and know what needs to be done, but a less experienced user might not.

    The idea of providing highlighting for direct formatting is sort of a 'training wheels' approach - make it very obvious (and based on the responses on the thread, there is a desire out there for it).
    To get more details on paragraph-level direct formatting, the user can then use the Shift+F1 feature, but now it's narrowed down to only needing to look at those paragraphs that are flagged with gray highlighting.
    As far as character formatting, the blue highlighting takes you right to it, and in the case of character formatting, it's obvious at a glance what needs to be done.
    All in all, I think the highlighting does provide a useful feature.
    (It would also probably be possible to adapt the code to produce a list of all the direct paragraph-level formatting elements found in a given paragraph, and add a comment at the beginning or end of the paragraph, listing what are the direct formatting elements found in that paragraph.)
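
    A rough sketch of what that adaptation might look like, assuming the same style-versus-paragraph comparison and using Comments.Add to attach the list to each flagged paragraph. The three properties and the Sub name are placeholders, and it hasn't been tried on documents with tables or a TOC:

    ```vba
    Sub CommentOnDirectParaFormatting()
        Dim aPara As Paragraph
        Dim styleFmt As ParagraphFormat
        Dim msg As String

        For Each aPara In ActiveDocument.Paragraphs
            Set styleFmt = aPara.Style.ParagraphFormat
            msg = ""

            ' Collect every difference instead of stopping at the first.
            With aPara.Format
                If .LeftIndent <> styleFmt.LeftIndent Then
                    msg = msg & "left indent; "
                End If
                If .SpaceBefore <> styleFmt.SpaceBefore Then
                    msg = msg & "space before; "
                End If
                If .SpaceAfter <> styleFmt.SpaceAfter Then
                    msg = msg & "space after; "
                End If
            End With

            If Len(msg) > 0 Then
                ActiveDocument.Comments.Add Range:=aPara.Range, _
                    Text:="Direct formatting: " & msg
            End If
        Next aPara
    End Sub
    ```

    Because this version has to run every test on every paragraph, it gives up the early exit and would be somewhat slower than a plain yes/no check.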

    That said, the macro I posted isn't very practical.
    Reason #1: it runs amazingly slowly - I'll probably reply to Fred in a separate post; the main point is that it has to examine each character in the document and that's incredibly slow. The first loop (for paragraphs) probably takes about 1% of the processing time - the rest is for the characters. Fred's suggestions are good but aren't going to help speed that up. Barring some genius method to speed it up 1000x, it's going to be too slow.

    The only testing I'd done on this one was at work, on relatively benign documents and on a fairly fast PC.

    Reason #2: it doesn't catch every type of direct formatting - while it would be quite possible to take the time to write a test into the code for every possible type of formatting, that would just make the code run that many times more slowly.

    [Amending my post here: I initially thought it had choked on running a test on the White Paper, but in fact it just finished and did handle it OK, it just ran really slowly!]

    Based on my tests at work, on a less complex document I'd say it would take about 1 minute per page, which is far too slow but still feasible for shorter documents.
    It does work well on shorter, less complex docs (or on longer but simple docs, if you have the time), so it still might be useful to some.

    What it needs though is some way to make it much faster, and I don't think that's possible. If it were possible to make it run much faster, then it might be worth adding the feature to add a comment listing the direct formatting that was found, to each paragraph, something which might improve its usefulness.

    Lacking that, I would advise JVBNZ and Fred to try your macro, since that will probably work for them.

    Regards,
    Gary

  11. #11
    Super Moderator
    Join Date
    Dec 2000
    Location
    New York, NY
    Posts
    2,970
    Thanks
    3
    Thanked 29 Times in 27 Posts

    Re: Direct Formatting Detector (Word 97 or later)

    Hi Fred,

    Thanks. Per Klaus' and my discussion, I am interested to hear if it seems to be useful for real-life users and real-life documents. Klaus's macro is a good solution and I'd recommend giving that one a try too.

    See my longer reply to Klaus, but to some of your points:

    There doesn't seem to be any way to get at any of the values displayed in the What's This? dialog - clearly that would be easy and superfast if it were possible but this appears SOL...

    As far as methods to speed it up - first off there's not much point in trying to speed up the paragraphs loop - the paragraphs loop probably takes up less than 1% of the processing time.
    What's really slowing things down is needing to step through every character - a For Each loop would be approx. 100 times faster, but there is no way to do, for example:

    Dim aChar As Character
    For Each aChar in ActiveDocument.Characters
    aChar.Font etc.

    You can sort of cheat and do:
    Dim aChar As Variant
    For Each aChar in ActiveDocument.Range.Characters

    but this turns out to run just as slowly as running through all the characters via a counter.

    You're right that I've got it checking for character styles - whether or not it's true that they're rarely used, it seems unfair to leave out allowance for them. And once you decide to account for that, then you have to check for the full set of properties character by character.

    But even if you eliminate checking against character styles, I still don't see how to avoid visiting each character in the document, and that's where things are slowing down - i.e. it's the actual visiting of each character, rather than processing the variables relating to it, that slows things down.

    It might be possible to take advantage of some aspect of Find as suggested in Dale's note, but I'll have to leave that one for tomorrow...

    Regards,
    Gary

  12. #12
    Super Moderator
    Join Date
    Dec 2000
    Location
    New York, NY
    Posts
    2,970
    Thanks
    3
    Thanked 29 Times in 27 Posts

    Re: Direct Formatting Detector (Word 97 or later)

    Dale,

    Thanks for your suggestion.
    Using Find would obviously be a whole lot faster, but I can't at first glance figure out how this will work in this case.
    It is late though so I'll try to play around with your suggested approach tomorrow.

    Regards,
    Gary

  13. #13
    Super Moderator
    Join Date
    Dec 2000
    Location
    New York, NY
    Posts
    2,970
    Thanks
    3
    Thanked 29 Times in 27 Posts

    Re: Direct Formatting Detector (Word 97 or later)

    Hi Phil,

    Thanks...
    Looks like I've already used up a lot of bandwidth tonight so better keep it brief :-)

    Just wondering - did your test document contain a table of contents?
    When I tried running it on the MS White Paper referred to in Klaus's post, I got the same error message when it hit the TOC. When I removed the TOC, it ran OK (just very slowly) - so I might have to code around that.

    Regards,
    Gary

  14. #14
    Uranium Lounger
    Join Date
    Dec 2000
    Location
    Los Angeles Area, California, USA
    Posts
    7,453
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: Direct Formatting Detector (Word 97 or later)

    Hi Gary:
    >>Looks like I've already used up a lot of bandwidth tonight so better keep it brief<<
    I've used up a lot of brain cells, so I'll keep it brief. :-)

    Yes, it had a TOC. I'll try to delete the TOC & run it. After all, I can always put in a TOC. Klaus has a macro too so I'll see which works better in the end.

  15. #15
    5 Star Lounger
    Join Date
    May 2001
    Location
    Stuttgart, Baden-W, Germany
    Posts
    931
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: Direct Formatting Detector (Word 97 or later)

    Hi Gary,

    Perhaps your macro could still be helpful even if you don't check the formatting on a character level (which is, as you said, the slow part of the macro).
    You could check if some much-used font formatting for the whole paragraph corresponds to the style definition:
    With aPara.Range.Font
        strParaFontName = .Name
        intParaFontSize = .Size
        intParaBold = .Bold
        intParaItalic = .Italic
        intParaUnderline = .Underline
        intParaColor = .Color
    End With
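
    To turn those readings into a yes/no test, one possibility (a sketch under my own assumptions, not part of either macro) is to compare them with the style's font directly. When the formatting varies inside the paragraph, Word reports "" for .Name and wdUndefined for the numeric properties, so a partly bolded paragraph also fails the comparison. A paragraph containing a character style would be flagged too, which may or may not be what you want. The function name is made up:

    ```vba
    Function ParaFontDiffersFromStyle(aPara As Paragraph) As Boolean
        Dim styleFont As Font
        Set styleFont = aPara.Style.Font

        With aPara.Range.Font
            ' Mixed formatting shows up as "" or wdUndefined here,
            ' which never equals the style's value.
            ParaFontDiffersFromStyle = _
                (.Name <> styleFont.Name) Or _
                (.Size <> styleFont.Size) Or _
                (.Bold <> styleFont.Bold) Or _
                (.Italic <> styleFont.Italic)
        End With
    End Function
    ```

    This reads four properties per paragraph instead of several per character, so it trades precision (it can't say which characters differ) for speed.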

    Probably giving some different colours to different kinds of manual formatting might be nice; since you can only use one color on a given paragraph, that option is limited.

    You are right that using my macro to clean up documents needs some (rather good) understanding of styles (and a bit of getting used to). I think everybody can use either your macro or mine for a quick check on how well some document is "styled", and once the advantages of styling documents become obvious, the need for a macro like yours or mine diminishes :-)

    Klaus
