  1. #1
    3 Star Lounger
    Join Date
    Jan 2007
    Location
    Massachusetts, USA
    Posts
    249
    Thanks
    3
    Thanked 0 Times in 0 Posts

    Macro needed to convert styles to XML tags (MS Word 2003, SP2)

    (Edited by HansV to make URL clickable - see Help 19)

    Hello,

    I have been looking for a way to convert my Word files to clean XML, using the styles that I have in my documents.

    Basically, I would like to find a way to do this without generating the 400-plus pages of XML code that the standard Microsoft approach produces.

    I found a web article that talks about using a Visual Basic form to convert MS Word documents to XML. But since I do not have Visual Basic, I would like to find out whether the same approach can be used in a Word macro. Does anyone know if this is possible?

    If it is possible, how would the macro be set up, and could I do it all with one macro or would I need two or more?
    Also, I would like the macro to point to a folder full of Word documents and then open and tag all of them.

    I am not worried about the graphics, because I am removing all the graphics anyway before running this Word to XML conversion.

    Here is the web article that mentions this conversion:

    http://www.devx.com/dotnet/Article/17358/1954?pf=true

    And here is the code (see attached) from the website.

    Thanks in advance for any/all suggestions.

    Regards,
    -J

  3. #2
    Super Moderator
    Join Date
    Jan 2001
    Location
    Melbourne, Victoria, Australia
    Posts
    3,515
    Thanks
    3
    Thanked 143 Times in 136 Posts

    Re: Macro needed to convert styles to XML tags (MS

    That is an excellent article and gives good descriptions of the basic process to be followed. The short answer is 'yes': the same can be coded directly from Word, and this would simplify things slightly. I would recommend simplifying some things, such as not bothering with the local formatting, since handling this complexity is not as simple as the article makes out. Character styles, for instance, can span multiple paragraphs and would break the nesting rules for XML. However, the idea of removing formatting from the paragraph marks would allow you to solve this, and might be run more easily by a search and replace early in the process. Ultimately, you need to ask if the bold/italic etc. is REALLY important in the XML file or whether it is only window dressing compared to the graphics which you have been busy getting rid of.
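
    To illustrate the search-and-replace idea, here is a minimal sketch of tagging bold runs from within Word. It assumes an inline tag named <b> (a hypothetical choice; nothing in the thread fixes the tag names) and that formatting has already been cleared from the paragraph marks, so that no bold run crosses a paragraph boundary. TagBoldRuns is an illustrative name, not something from the article.

```vba
' Sketch only: wraps every run of bold text in <b>...</b> and removes the bold.
' The tag name <b> and the routine name are assumptions made for illustration.
Sub TagBoldRuns(ByVal doc As Document)
    With doc.Content.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .MatchWildcards = False
        .Font.Bold = True                 ' search for any bold text
        .Replacement.Font.Bold = False    ' un-bold the result so it is not found again
        ' An empty FindText with Format:=True matches on formatting alone;
        ' ^& in the replacement stands for the text that was found.
        .Execute FindText:="", ReplaceWith:="<b>^&</b>", _
                 Format:=True, Replace:=wdReplaceAll
    End With
End Sub
```

    The same pattern with .Font.Italic and a different tag would presumably cover italics. Run it on a copy of the document first, since the replacement cannot easily be reversed.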

    The number of macros required to perform the task could be one or many, depending on how extreme your programmer is on building modular code. In any case, all could be run from a single start point so the actual number of macros is largely irrelevant to the task.
    Andrew Lockton, Chrysalis Design, Melbourne Australia

  4. #3
    3 Star Lounger
    Join Date
    Jan 2007
    Location
    Massachusetts, USA
    Posts
    249
    Thanks
    3
    Thanked 0 Times in 0 Posts

    Re: Macro needed to convert styles to XML tags (MS

    Huge screenshot shrunk by HansV - don't post images larger than 640x480 PLEASE

    Hello Andrew and all,

    Thanks for the great response. I appreciate it.

    I would like to use one super macro (Word to XML) that would do the following:

    1. Point to a folder full of Word documents (.doc format).
    2. Find and replace all BOLD text with bold text tags.
    3. Find and replace all Italics text with italics tags.
    4. Pass through all the documents and remove paragraph marks (carriage returns) and manual spacing and tabs.
    5. Map all styles in the document to elements. For example, the Paragraph Level1 Style becomes the Paragraph Level1 element and so on...
    6. The article I referred to earlier talks about the preservation of page numbers. I am not too concerned with that now, but would like the macro to convert my page header and page footer styles to elements.
    7. Remove "HARD" page breaks from the document as the macro runs through it.
    8. The macro should also map and convert bullets, numbered paragraphs and numbered steps to corresponding elements.
    9. Finally, the macro should (as discussed on the last page of the previously mentioned web link) remove all of Word's control characters. Basically, I would like anything not related to a specific style, removed from each document, as the macro runs through it. Also, I am not worried about graphics and would like all graphics (embedded, or linked) to be removed from the document (perhaps also leaving a placeholder for the figure/graphic style).
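
    Step 5 above can be sketched roughly as follows. This is only an outline under stated assumptions: the element name is derived from the style name by dropping everything except letters and digits, the root element is called document, and the routine names (ExportStylesAsXml, ElementName, EscapeXml) are all hypothetical. It walks the main body text only, treats each paragraph as plain text (no inline bold/italic tags), and does not strip control characters such as manual line breaks. Print # writes in the system's ANSI code page, so the encoding would need attention for non-ASCII text.

```vba
' Sketch only: writes one XML element per non-empty paragraph, named after its style.
Sub ExportStylesAsXml(ByVal doc As Document, ByVal outPath As String)
    Dim para As Paragraph
    Dim rng As Range
    Dim tagName As String
    Dim fileNum As Integer

    fileNum = FreeFile
    Open outPath For Output As #fileNum
    Print #fileNum, "<document>"
    For Each para In doc.Paragraphs
        Set rng = para.Range
        rng.MoveEnd Unit:=wdCharacter, Count:=-1    ' leave out the paragraph mark
        If Len(rng.Text) > 0 Then
            tagName = ElementName(para.Style.NameLocal)
            Print #fileNum, "  <" & tagName & ">" & EscapeXml(rng.Text) & _
                            "</" & tagName & ">"
        End If
    Next para
    Print #fileNum, "</document>"
    Close #fileNum
End Sub

' Keeps only letters and digits, e.g. "Paragraph Level1" becomes "ParagraphLevel1".
Function ElementName(ByVal styleName As String) As String
    Dim i As Long
    Dim ch As String
    Dim result As String
    For i = 1 To Len(styleName)
        ch = Mid$(styleName, i, 1)
        If ch Like "[A-Za-z0-9]" Then result = result & ch
    Next i
    ' XML names may not be empty or start with a digit, so prefix a letter
    If result = "" Or result Like "[0-9]*" Then result = "S" & result
    ElementName = result
End Function

' Escapes the characters that are not allowed as-is in XML text content.
Function EscapeXml(ByVal s As String) As String
    s = Replace(s, "&", "&amp;")    ' must come first
    s = Replace(s, "<", "&lt;")
    s = Replace(s, ">", "&gt;")
    EscapeXml = s
End Function
```

    It could be called from the folder loop as, for example, ExportStylesAsXml doc, "C:\Test 3\out.xml" (the output path here is made up).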

    I started trying to create this macro, but ran into some coding errors. I am not sure what the differences are between the Visual Basic code for forms and the Word-specific code that would be used in this macro, but I have attached a screenshot that shows some of the initial errors I was hitting. This was after following the exact example in that website link, with a little modification at the beginning.

    My end goal is to take all the (CLEAN) processed XML documents and then open them in FrameMaker 8 with an EDD, add graphics and then deliver structured FrameMaker 8 books.
    This macro would allow writers to potentially bypass the standard route of importing .DOC and/or .RTF formats into FrameMaker and doing extensive cleanup.

    Overall, if I could have some ideas on how to create a "working" macro that will basically tag 20-30 generic styles and remove everything else (save for the bold, italic text) then that would be truly fantastic.

    Thanks in advance for responses related to this posting.

    Regards,

    Jim
    Attached Images
    • File Type: jpg x.jpg (30.4 KB, 13 views)

  5. #4
    Plutonium Lounger
    Join Date
    Mar 2002
    Posts
    84,353
    Thanks
    0
    Thanked 16 Times in 16 Posts

    Re: Macro needed to convert styles to XML tags (MS

    Remove the parentheses ( ) after ClearFormatting
    Replace the opening parenthesis after Execute with a space, and remove the corresponding closing parenthesis.
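
    For context, a short sketch of why the parentheses cause errors in Word's macro editor. The article's URL sits under a dotnet path, so its code is presumably VB.NET, where parentheses around arguments are normal; in VBA they are only used when the return value is used or with the Call keyword. CallSyntaxExamples is a made-up name, and the routine only searches the active document without changing it.

```vba
' Sketch only: the ways a method can be called in VBA.
Sub CallSyntaxExamples()
    Dim found As Boolean
    With ActiveDocument.Content.Find
        .ClearFormatting                     ' statement form: no parentheses
        .Execute FindText:="^p"              ' statement form with a named argument
        found = .Execute(FindText:="^p")     ' result is used, so parentheses are required
        Call .Execute(FindText:="^p")        ' Call keyword: parentheses are required
    End With
    Debug.Print found                        ' True if a paragraph mark was found
End Sub
```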

  6. #5
    3 Star Lounger
    Join Date
    Jan 2007
    Location
    Massachusetts, USA
    Posts
    249
    Thanks
    3
    Thanked 0 Times in 0 Posts

    Re: Macro needed to convert styles to XML tags (MS

    Hello Hans,

    Thanks for the tip. Yes, I followed your instructions and the red lines were removed from the code, but I was still getting an error message.

    Then, I found the error and changed the line to point to clearCRFormatting doc

    There are no more errors now, but I am not sure that this macro has worked, especially since I removed () in various places.

    Here is my current code:

    -----------------------------------------------------------------------------------------------------------------------------

    Sub ReplaceInFolder()
        ' Modify as needed, but keep the trailing backslash
        Const strFolder = "C:\Test 3\"
        Dim strFile As String
        Dim doc As Document
        Application.ScreenUpdating = False
        ' Dir returns the first matching file name, or "" if there is none
        strFile = Dir(strFolder & "*.doc")
        Do While strFile <> ""
            Set doc = Documents.Open(strFolder & strFile)
            clearCRFormatting doc
            doc.Close SaveChanges:=True
            ' Dir with no arguments returns the next match
            strFile = Dir
        Loop
        Application.ScreenUpdating = True
    End Sub

    Sub clearCRFormatting(ByVal doc As Word.Document)
        ' Replace every paragraph mark with one that is not bold, italic, or underlined
        With doc.Content.Find
            .ClearFormatting
            .Replacement.ClearFormatting
            .Replacement.Font.Bold = False
            .Replacement.Font.Italic = False
            .Replacement.Font.Underline = wdUnderlineNone
            .Execute FindText:="^p", ReplaceWith:="^p", _
                Format:=True, _
                Replace:=Word.WdReplace.wdReplaceAll
        End With
    End Sub

    -----------------------------------------------------------------------------------------------------------------------------

    Is there a way to cross-check the document(s) in my Test 3 folder to make sure
    that the paragraph marks (carriage returns) have been replaced with UNFORMATTED paragraph marks?

    Thanks again,

    -J

  7. #6
    Plutonium Lounger
    Join Date
    Mar 2002
    Posts
    84,353
    Thanks
    0
    Thanked 16 Times in 16 Posts

    Re: Macro needed to convert styles to XML tags (MS

    I'd simply perform a visual inspection on a few documents.
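
    If a programmatic check is preferred over eyeballing, something along these lines might do. It is a sketch under assumptions: it looks at the last character of each paragraph in the main body (in table cells that is the end-of-cell mark) and counts those that are still bold, italic, or underlined. CountFormattedParagraphMarks is a hypothetical name.

```vba
' Sketch only: counts paragraph marks that still carry bold, italic, or underline.
Function CountFormattedParagraphMarks(ByVal doc As Document) As Long
    Dim para As Paragraph
    Dim mark As Range
    Dim n As Long
    For Each para In doc.Paragraphs
        ' The paragraph mark is the last character of the paragraph's range
        Set mark = para.Range.Characters.Last
        If mark.Font.Bold <> 0 Or mark.Font.Italic <> 0 _
           Or mark.Font.Underline <> wdUnderlineNone Then
            n = n + 1
        End If
    Next para
    CountFormattedParagraphMarks = n
End Function
```

    With a processed document open, typing ? CountFormattedParagraphMarks(ActiveDocument) in the Immediate window of the Visual Basic editor prints the count; zero would suggest the replacement did its job.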

    BTW, if you are running this from within Word itself, you can change Word.WdReplace.wdReplaceAll to wdReplaceAll
