  1. #1
    Platinum Lounger
    Join Date
    Feb 2001
    Location
    Yilgarn region of Toronto, Ontario
    Posts
    5,453
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Save as HTML loses spaces (Word97/sr2)

    When I saveAsHtml, Word97 squeezes out leading spaces from each line of text. Rather like posting VBA code here without the PRE tags.

    I've tried replacing all spaces (Chr(32)) with hard spaces (Chr(160)), but the SaveAsHTML code is smart/dumb enough to detect that too.

    Short of prefacing each line with a non-white-space character, such as asterisk, has anyone any suggestions for a strategy to outwit this demon?

  2. #2
    Super Moderator jscher2000's Avatar
    Join Date
    Feb 2001
    Location
    Silicon Valley, USA
    Posts
    23,112
    Thanks
    5
    Thanked 93 Times in 89 Posts

    Re: Save as HTML loses spaces (Word97/sr2)

    This is very different in 2000, so I took a quick look at Microsoft's site. The article "Converting Word 97 Documents to HTML" (http://msdn.microsoft.com/library/en-us/dnword97/html/w2h.asp) says: "Tabs are either ignored or converted to spaces (with no way to predict which will happen)" but doesn't mention leading spaces.

    Have you tried downloading the latest converter pack (8/25/2000)? See "WD: Additional Text Converters and Image Filters Available in Microsoft Office Converter Pack" (http://www.microsoft.com/downloads/release.asp?releaseid=24015). Be forewarned: the converter released for Word 2000 creates remarkably verbose output by inserting proprietary Office tags. It might be easier to program around the leading-spaces problem than to clean up this stuff.

  3. #3
    Gold Lounger
    Join Date
    Dec 2000
    Location
    New Hampshire, USA
    Posts
    3,386
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: Save as HTML loses spaces (Word97/sr2)

    Microsoft also supplies a tool to strip the extra stuff.
    It's called the Office 2000 HTML Filter.

  4. #4
    Super Moderator jscher2000's Avatar
    Join Date
    Feb 2001
    Location
    Silicon Valley, USA
    Posts
    23,112
    Thanks
    5
    Thanked 93 Times in 89 Posts

    Re: Save as HTML loses spaces (Word97/sr2)

    Yes, the export filter is much "lighter," but in its efforts to maintain fidelity to the original, it too adds a lot of taggage. This is one line from an exported table of contents:

    <pre><p class=MsoToc1><span class=MsoHyperlink><u><font color=blue>
    <a href="#_Toc530033456">Preparing to Work Offline<font color=black>
    <span style='color:windowtext;display:none;text-decoration:none;'>. </span></font>
    <font color=black><span style='color:windowtext;display:none;text-decoration:none;'>2
    </span></font></a></font></u></span></p></pre>

    Still interested?

    Office 2000 HTML Filter 2.0: http://office.microsoft.com/downloads/2000/Msohtmf2.aspx

    HOWTO: Programmatically Use the HTML Filter DLL to Save Word Documents as Plain HTML: http://support.microsoft.com/support/kb/articles/Q291/3/25.ASP

    I haven't tried the techniques in the latter article, but they appear to provide much more control than the interactive macro that gets installed. I'm not sure if either will work with Word 97.

  5. #5
    Platinum Lounger
    Join Date
    Feb 2001
    Location
    Yilgarn region of Toronto, Ontario
    Posts
    5,453
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: Save as HTML loses spaces (Word97/sr2)

    >This is very different in 2000,

    Yes. I think Word97 was a try-out for a 3rd-party supplier, or "We ran out of time so we wrote a macro", but by Word2000 the conversion code was properly written and embedded right into the product, not as an add-on. I'd anticipate different behaviour.


    > It might be easier to program around the leading spaces

    I think you're right. I dropped off to sleep thinking about this - replace all paragraphmark-whitespace combinations with paragraphmark-asterisk, save as html, then undo the replace after the save.

    That will probably open the door to a host of other things I could do.


    Or I could fire up the hard drive containing WinXP/OfficeXP and see how that deals with it all.
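
The paragraph-mark idea above (swap leading white space for a visible character, save, then undo the swap) can be sketched outside Word. This is a minimal Python illustration on plain text, not Word VBA; the function names and the asterisk marker are hypothetical, and it assumes no line legitimately begins with the marker. Each leading space becomes one marker, so the indentation depth survives the round trip.

```python
import re

# Hypothetical marker; assumes no line of the text legitimately starts with it.
MARKER = "*"


def protect_indent(text: str, marker: str = MARKER) -> str:
    """Swap every leading space on each line for one marker."""
    return re.sub(r"(?m)^ +", lambda m: marker * len(m.group(0)), text)


def restore_indent(text: str, marker: str = MARKER) -> str:
    """Undo protect_indent: turn each leading marker back into a space."""
    pattern = r"(?m)^(?:" + re.escape(marker) + r")+"
    return re.sub(
        pattern,
        lambda m: " " * (len(m.group(0)) // len(marker)),
        text,
    )


if __name__ == "__main__":
    code = "Sub Demo()\n    If x Then\n        y = 1\n    End If\nEnd Sub"
    protected = protect_indent(code)
    print(protected)                  # indentation shown as asterisks
    print(restore_indent(protected))  # original indentation restored
```

In Word itself the protect step would run before the save and the restore step afterwards, both on the document and (with spaces re-encoded as needed) on the exported HTML.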

  6. #6
    Silver Lounger
    Join Date
    Jun 2001
    Location
    Morden, Surrey, United Kingdom
    Posts
    1,838
    Thanks
    3
    Thanked 0 Times in 0 Posts

    Re: Save as HTML loses spaces (Word97/sr2)

    ... Or just find and replace all spaces with something else (not found otherwise!) and reverse once converted ...?
    Beryl M


  7. #7
    Platinum Lounger
    Join Date
    Feb 2001
    Location
    Yilgarn region of Toronto, Ontario
    Posts
    5,453
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: Save as HTML loses spaces (Word97/sr2)

    True.

    Now. In dealing with string-handling procedures, what characters do I *never* use? The solution has to work for 400 procedures in one module, plus the other modules, plus the other templates.

    The para-whitespace method has the advantage of solving the problem without incurring other problems. My problem is with preservation of indentation, that is, the leading white-space in any line.

  8. #8
    Silver Lounger
    Join Date
    Jun 2001
    Location
    Morden, Surrey, United Kingdom
    Posts
    1,838
    Thanks
    3
    Thanked 0 Times in 0 Posts

    Re: Save as HTML loses spaces (Word97/sr2)

    Surely there must be SOME character you don't use otherwise??!! In Symbols, maybe? As against the para-whitespace method, it replaces any number of spaces, not just one (or a set number) after a para ... although I suppose if you always use the same number of spaces that would be fine!

    Otherwise I can only wish you luck!
    Beryl M


  9. #9
    Star Lounger
    Join Date
    Aug 2001
    Location
    St. Louis, Missouri, USA
    Posts
    67
    Thanks
    3
    Thanked 0 Times in 0 Posts

    Re: Save as HTML loses spaces (Word97/sr2)

    When I need a unique identifier for this kind of thing, I use more than one character in combination.
    For instance qq rarely happens in real life, or QQ, or maybe zzz (two Zs aren't enough).

    It depends on what sort of data you have, but this always works for me.

    Lin
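
A multi-character marker such as qq can still collide with its neighbours: "Iraq quit" becomes "Iraqqqquit", which does not restore cleanly. The sketch below, in Python on plain text rather than Word VBA, tries each candidate and keeps the first one that is absent from the text and survives a round trip. The function name and candidate list are hypothetical.

```python
def pick_marker(text: str, candidates=("qq", "QQ", "zzz")) -> str:
    """Return the first candidate that can stand in for spaces safely."""
    for marker in candidates:
        if marker in text:
            continue  # already present, so the restore step would corrupt it
        swapped = text.replace(" ", marker)
        if swapped.replace(marker, " ") == text:
            return marker  # round trip is lossless for this text
    raise ValueError("no candidate marker is safe for this text")


if __name__ == "__main__":
    sample = "Iraq quit"
    marker = pick_marker(sample)
    swapped = sample.replace(" ", marker)
    print(marker, swapped, swapped.replace(marker, " "))
```

The round-trip check is cheap and catches the adjacent-letter case that a simple "is it in the text?" test misses.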

  10. #10
    5 Star Lounger
    Join Date
    May 2001
    Location
    Stuttgart, Baden-W, Germany
    Posts
    931
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: Save as HTML loses spaces (Word97/sr2)

    In Word2000, I use Unicode characters that are in "private use" areas, such as all hexadecimal codes starting with E: ChrW(&HE000) ...
    This gives you 4096 characters that are guaranteed not to appear in the document elsewhere. They all show as boxes, but if you only use them temporarily in a conversion, that usually isn't a big drawback.
    It should work in Word97, too, but I never tried.

    Klaus
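
The private-use approach can be sketched in Python on plain text (again, not Word VBA). The function name is hypothetical; the range U+E000 to U+EFFF matches the "codes starting with E" described above. Although such characters are unlikely to occur in a document, the sketch checks anyway and takes the first code point that is not present.

```python
def unused_private_char(text: str, start: int = 0xE000, end: int = 0xEFFF) -> str:
    """Return the first private-use character in the range not found in text."""
    for code_point in range(start, end + 1):
        ch = chr(code_point)
        if ch not in text:
            return ch
    raise ValueError("every character in the range is already in use")


if __name__ == "__main__":
    line = "    MsgBox strName"
    placeholder = unused_private_char(line)
    swapped = line.replace(" ", placeholder)       # spaces hidden before the save
    restored = swapped.replace(placeholder, " ")   # spaces put back afterwards
    print(restored == line)
```

Because the placeholder is a single character that does not occur in the text, the swap is always reversible, unlike the multi-character case.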
