  1. #1
    4 Star Lounger
    Join Date
    Mar 2002
    Location
    Sacramento, California, USA
    Posts
    509
    Thanks
    4
    Thanked 1 Time in 1 Post

    In search of a "smart" Word-to-HTML conversion tool

    I'm looking for a Word-to-HTML conversion tool that works differently from the ones I know about. I wonder whether anything like it exists.

    The tools I've encountered try to produce an HTML document that looks as much as possible like the original. Some handle an impressive variety of Word formatting features, but the HTML they produce isn't very clean and doesn't lend itself to use with style sheets. As a result, you end up with a "dead end" HTML file: the only practical way to change its appearance or content is to modify the Word document and generate new HTML.

    I need a tool that's more or less the opposite of that. My goal is to convert a group of Word documents to HTML that will be used with a particular style sheet, then discard the Word documents and make the HTML files the "live" documents.

    I don't expect the tool to produce perfect looking output; if it does 95% of the work and I have to do the last 5% by hand, I'll be happy. But it must produce output that is clean enough to make editing practical.

    I also need to be able to configure the tool to produce output that works with a specified style sheet. For example, I need to be able to tell it, "When the source document uses the Courier New font in a paragraph, wrap the target text in a <span> element with such-and-such a class name."
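    A rule like "Courier New in a paragraph becomes a span with a given class" amounts to a lookup table from source formatting to CSS classes, with everything else left to the style sheet. Below is a minimal Python sketch, assuming the paragraph has already been extracted as a list of (text, font name) runs. The `FONT_RULES` table, the `runs_to_html` function and the class name `code` are hypothetical placeholders, not part of any existing tool. Adjacent runs that map to the same class are merged, since Word often splits one visual span into several runs and emitting a span per run is one source of cluttered HTML.

```python
from html import escape

# Hypothetical rule table: source font name -> CSS class defined in the target style sheet.
FONT_RULES = {
    "Courier New": "code",
}

def runs_to_html(runs, rules=FONT_RULES):
    """Convert one paragraph, given as (text, font_name) runs, into a <p> element.

    Runs whose font matches a rule are wrapped in <span class="...">; all other
    formatting is dropped so the style sheet controls appearance.
    """
    pieces = []  # list of [css_class_or_None, text]
    for text, font in runs:
        cls = rules.get(font)
        if pieces and pieces[-1][0] == cls:
            pieces[-1][1] += text  # same class as the previous run: merge them
        else:
            pieces.append([cls, text])
    out = []
    for cls, text in pieces:
        if cls is None:
            out.append(escape(text))
        else:
            out.append(f'<span class="{escape(cls)}">{escape(text)}</span>')
    return "<p>" + "".join(out) + "</p>"

paragraph = [("Run ", "Calibri"), ("make ", "Courier New"),
             ("install", "Courier New"), (" to build.", "Calibri")]
print(runs_to_html(paragraph))
# <p>Run <span class="code">make install</span> to build.</p>
```

    Keeping the rules in a plain table means that supporting a different style sheet only requires editing data, not code.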

    Is anything like this available?

  2. #2
    Super Moderator
    Join Date
    Jan 2001
    Location
    Melbourne, Victoria, Australia
    Posts
    3,852
    Thanks
    4
    Thanked 259 Times in 239 Posts
    I have worked with a company where we parse a 600-page Word document with hundreds of images and tables to load into a Joomla website (don't ask). This is all done with very complex code and is a nightmare to maintain.

    If you find a tool that does what you have described, it would probably be useful to me as well. I think the closest you are going to get is to export your Word document to XML and then write your own stylesheet transformation (for example, XSLT) or perhaps a parser to work from there.
    Andrew Lockton, Chrysalis Design, Melbourne Australia
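    The "export to XML and transform it" approach can be sketched in a few lines. In a .docx file the body text is already XML (WordprocessingML, stored as `word/document.xml`), where paragraphs are `w:p`, runs are `w:r`, paragraph styles are in `w:pPr/w:pStyle` and run fonts in `w:rPr/w:rFonts`. The sketch below uses Python's ElementTree instead of XSLT. The `STYLE_TO_TAG` and `FONT_TO_CLASS` tables and the function names are hypothetical. It only looks at direct formatting, so fonts inherited from styles, lists, tables and images are not handled.

```python
import xml.etree.ElementTree as ET
from html import escape

# Main WordprocessingML namespace, as used in word/document.xml inside a .docx.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

# Hypothetical mapping tables; adjust them to the target style sheet.
STYLE_TO_TAG = {"Heading1": "h1", "Heading2": "h2"}  # paragraph style -> HTML tag
FONT_TO_CLASS = {"Courier New": "code"}              # run font -> CSS class

def paragraph_to_html(p):
    """Render one <w:p> element; formatting not covered by the tables is dropped."""
    style = p.find(f"{W}pPr/{W}pStyle")
    style_id = style.get(f"{W}val") if style is not None else None
    tag = STYLE_TO_TAG.get(style_id, "p")
    parts = []
    for r in p.iter(f"{W}r"):
        text = escape("".join(t.text or "" for t in r.iter(f"{W}t")))
        # Only direct run formatting is checked; fonts inherited from styles are ignored.
        fonts = r.find(f"{W}rPr/{W}rFonts")
        cls = FONT_TO_CLASS.get(fonts.get(f"{W}ascii")) if fonts is not None else None
        parts.append(f'<span class="{cls}">{text}</span>' if cls else text)
    return f"<{tag}>{''.join(parts)}</{tag}>"

def document_to_html(xml_text):
    """Convert every paragraph in a document.xml string to one line of HTML."""
    root = ET.fromstring(xml_text)
    return "\n".join(paragraph_to_html(p) for p in root.iter(f"{W}p"))

SAMPLE = """<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:p><w:pPr><w:pStyle w:val="Heading1"/></w:pPr><w:r><w:t>Setup</w:t></w:r></w:p>
    <w:p><w:r><w:t xml:space="preserve">Type </w:t></w:r>
      <w:r><w:rPr><w:rFonts w:ascii="Courier New"/></w:rPr><w:t>make</w:t></w:r></w:p>
  </w:body>
</w:document>"""

print(document_to_html(SAMPLE))
# <h1>Setup</h1>
# <p>Type <span class="code">make</span></p>
```

    For a real file, `zipfile.ZipFile("report.docx").read("word/document.xml")` (the file name is a placeholder) returns the XML as bytes, which `ET.fromstring` also accepts.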

  3. #3
    4 Star Lounger
    If I have to do it myself, I'll probably do it in VBA. Word's object model is closest to the original content, so it should give me the most information about the document (hopefully in the most accessible form). This would be a major undertaking, though, and I hope to find that someone has already done it.

  4. #4
    4 Star Lounger
    I found a tool named DocToHtml which looks like it may be what I want. It costs $39, which is ridiculously cheap if it works well enough to be useful.

    I also found this web site, which looks like it's designed to do exactly what I want... except that the author doesn't seem to be maintaining it, and it no longer processes native Word files, only OpenOffice files. I'll try it when I have a chance, but converting a document to OpenOffice just so I can convert it to HTML sounds unpromising.
    Last edited by jsachs177; 2013-08-22 at 05:02. Reason: Found additional information
