Thread: HTML and MSWord2000
2001-02-16, 15:30 #1
HTML and MSWord2000
Woody (or whoever :-),
I am part of a joint editing team for an International Standard. That
standard will be released in HTML, that HTML must pass the W3C Validator,
it will be multi-page, it will use Unicode (Arial Unicode MS), and it will
include extensive internal hyperlinking.
Group-development tools for HTML-based documents appear to be
nonexistent, whereas the equivalent MSWord tools (revision marking, comments,
styling, equation editing, etc.) are much better.
We have therefore adopted a production strategy based on MSWord,
followed by a post-processing step that generates valid HTML from the
"messy" HTML that MSWord (both 97 and 2000) can generate. Word 97 clearly does
little in this regard; Word 2000 imposes a *lot* of XML overhead that doesn't
pass the W3C Validator, and it also doesn't seem to produce hyperlinks
consistently.
Fortunately, there is HTML Tidy (see: http://www.w3.org/People/Raggett/tidy/ )
and it does a great number of things right (although handling Unicode doesn't
seem to be one as yet).
Microsoft also provides a separate tool to clean up Word's HTML output:
if you run the MS Filter and then HTML Tidy, you can almost pass validation!
But there is a problem (feature?). We are not (at this time) prepared to switch to
another document production environment, especially as we are a mixed
PC/Mac team. So our problem/feature is significant, to us. And here's where
your team comes in ...
What I really want (among other, more minor issues) is for MSWord's built-in
cross-referencing (automatic references to section heading numbers, caption numbers,
and equation numbers), which keeps references consistent as a document is
reorganized, to be converted to hyperlinks in the HTML. This does
*not* seem to be done by anyone (that I've tracked down, yet); they uniformly
throw away this information and assume that you will have built explicit
hyperlinks to accomplish the same thing.
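A post-processor for this could be fairly small. Below is a minimal Python sketch, assuming (not verified here) that Word 2000's "Save as Web Page" keeps each cross-reference field inside `<!--[if supportFields]>` conditional comments wrapped around the displayed number, and that the hidden `_Ref` bookmarks come out as `<a name="...">` anchors. The markup pattern and the names `REF_FIELD_RE` and `refs_to_links` are illustrative assumptions; check them against real test-case files.

```python
import re

# Pieces of the ASSUMED Word 2000 field markup:
#   <!--[if supportFields]>...REF _Ref123 \h ...<![endif]-->Figure 3<!--[if supportFields]>...field-end...<![endif]-->
OPEN = r"<!--\[if supportFields\]>"
CLOSE = r"<!\[endif\]-->"
# One character that does not begin CLOSE, so a match cannot leak out of one comment into the next.
INSIDE = r"(?:(?!<!\[endif\]-->).)"

REF_FIELD_RE = re.compile(
    OPEN + INSIDE + r"*?\bREF\s+(_Ref\d+)" + INSIDE + "*" + CLOSE  # field code naming the bookmark
    + r"(.*?)"                                                     # displayed result, e.g. "Figure 3"
    + OPEN + INSIDE + r"*?field-end" + INSIDE + "*" + CLOSE,       # field-end marker
    re.DOTALL,
)


def refs_to_links(html: str) -> str:
    """Replace each REF field with a hyperlink to the bookmark anchor it points at.

    Other fields (PAGE, SEQ, ...) are left untouched. Nested fields are not handled.
    """
    return REF_FIELD_RE.sub(lambda m: f'<a href="#{m.group(1)}">{m.group(2)}</a>', html)
```

It would probably need to run before the MS Filter and Tidy, since any cleanup pass that strips Office-specific comments would also remove the field codes this relies on.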
Not only is this painful, but it also doesn't offer the same functionality. While
it is nice that a hyperlink can have a different label than the target text, if the
target text changes (say, a section number) I want the label to change too.
However, the default labels that make this work not only include underscores
but are based on MSWord's internal tag names and *not* the name/text
that actually appears at the target location (e.g., you get "Figure_4_2_1_caption"
instead of "Figure 4-1 Caption"). Using explicit hyperlinks doesn't give me
the functionality that good old references do (inside MSWord). And only the
former seem to get exported into the HTML; the latter are dropped.
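The "label follows the target" behaviour can at least be approximated after export by rewriting each internal link's text from its target anchor. A rough Python sketch, assuming (hypothetically) that targets survive as `<a name="ID">text</a>` and links as `<a href="#ID">label</a>` with double-quoted attributes; it is regex-based, so it only suits fairly regular markup such as Tidy's output, not arbitrary HTML. The function name `refresh_link_labels` is made up for illustration.

```python
import re

# ASSUMED markup: targets look like <a name="ID">text</a>, links like <a href="#ID">label</a>.
TARGET_RE = re.compile(r'<a\s+name="([^"]+)"\s*>(.*?)</a>', re.DOTALL | re.IGNORECASE)
LINK_RE = re.compile(r'(<a\s+href="#([^"]+)"\s*>)(.*?)(</a>)', re.DOTALL | re.IGNORECASE)
TAG_RE = re.compile(r"<[^>]+>")


def refresh_link_labels(html: str) -> str:
    """Set each internal link's text to the current text of its target anchor."""
    # Map anchor name -> visible text at the target (inner tags stripped).
    targets = {name: TAG_RE.sub("", text).strip() for name, text in TARGET_RE.findall(html)}

    def relabel(m: re.Match) -> str:
        label = targets.get(m.group(2))
        if not label:
            return m.group(0)  # unknown or empty target: leave the link as it was
        return m.group(1) + label + m.group(4)

    return LINK_RE.sub(relabel, html)
```

For example, if a figure is renumbered to "Figure 4-1" in the source, a stale link reading "Figure 3-9" that points at the same anchor comes out reading "Figure 4-1".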
This still doesn't seem like an inappropriate expectation on my part, so what
am I missing?
That said, a lot of tools appear to handle CSS, and a host of other niceties.
But it seems that if I want autoreferences to get converted to hyperlinks, I will
need to build my own post-processor to do this.
I can provide additional detail, and test-case files, if it turns out that you
have a possible partial or complete solution.
2001-02-19, 07:21 #2
Re: HTML and MSWord2000
I would bet that if you mention this problem on the VB/VBA board, someone will write you a crude but functional post-processor in about 24 hours. Microsoft, on the other hand, has no incentive to add features to Office 2000 when it can use them to generate sales of the next version.