  1. #1
    3 Star Lounger
    Join Date
    Mar 2003
    Location
    Elkins Park, Pennsylvania, USA
    Posts
    325

    Inserting Unicode Defies Logic (Word XP)

    Hi, again. I have a most bizarre problem with one particular document. (I know how to work around it for business purposes, so this investigation is purely academic.)

    I work at a pharmaceutical company, so we use a lot of special symbols. An add-in template was developed that provides a selection of the most common symbols, so that the user isn't tempted to select an inappropriate font from the Insert: Symbol menu. This template essentially presents these characters as options on a toolbar, and when one is selected, it is inserted using the Selection.TypeText method. This template has served us well for years. But now I have a document in which, when I insert the "less-than-or-equal-to" symbol (ChrW(8804)), it appears in the SimSun font. And I have since found out that this seems to happen in this document with ANY Unicode (double-byte) character, e.g. ChrW(8734) (the infinity symbol), ChrW(8776) (approximately equal to), etc.
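    For anyone checking the numbers: VBA's ChrW(n) returns the character whose Unicode code point is the decimal value n. A quick sketch in Python (used here only because it is easy to run; it is not part of the add-in, and describe is a hypothetical helper) shows which symbols those three values are:

```python
import unicodedata

# Decimal values passed to ChrW() by the add-in, as listed in the post.
CODES = [8804, 8734, 8776]


def describe(code: int) -> str:
    """Format a decimal code point as 'U+XXXX <char> <Unicode name>'."""
    ch = chr(code)  # chr() maps a code point to a character, like VBA's ChrW()
    return f"U+{code:04X} {ch} {unicodedata.name(ch)}"


for code in CODES:
    print(code, describe(code))
```

    All three lie above U+00FF, i.e. outside the 8-bit Latin-1 range.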

    What is also perplexing is the behavior of the character. It sometimes behaves like a protected character (i.e. if you highlight it and press CTRL+spacebar, it doesn't change), but unlike a protected symbol, the font name "SimSun" appears in the Formatting toolbar instead of the name of the paragraph font. (And in case you were wondering, there is no character style defined that might be doing that.) If I copy the text into another document, the problem "goes away." Conversely, I can remove all of the text from the document and still reproduce the behavior. I'm attaching the file to this post, curious as to whether any of you can reproduce it, too.

    To make matters more maddening (I loathe a lot of alliteration!), this "phenomenon" doesn't happen when the desired symbol is inserted directly from the Insert: Symbol menu, nor if it's inserted using the numeric keypad, i.e. ALT+08804, but only when "typed" through code, either with the Selection.TypeText method or the Selection.InsertBefore method.

    Comments? Theories?
    That's what you do in a herd; you look out for each other! - Mike

  2. #2
    Plutonium Lounger
    Join Date
    Mar 2002
    Posts
    84,353

    Re: Inserting Unicode Defies Logic (Word XP)

    That's really strange. It seems that some kind of East Asian feature has become embedded in the document, but why this only occurs when you insert a character through code escapes me.

  3. #3
    3 Star Lounger
    Join Date
    Mar 2003
    Location
    Elkins Park, Pennsylvania, USA
    Posts
    325

    Re: Inserting Unicode Defies Logic (Word XP)

    Maybe it's an escape code? :P

    Seriously, I now have a greater issue with this font phenomenon: we are now tasked with finding all of these characters and making sure they display correctly! The problem is, Word isn't sure if they're Times New Roman or SimSun. The formatting toolbar and the Format: Font dialog show "SimSun" as the font for the character, but the Edit: Find command doesn't "see" anything in that font. Indeed, if I use Format: Font to apply Times New Roman directly, it doesn't change anything; the dialog box acts as though I didn't change the font, so the character stays exactly the same.

    Since I'm pretty sure our Standards committees don't want the character that way, this is quite a holdup! The only workaround I can think of is to search for all Unicode characters (BLEAH!) and check their font, which will not go over well.
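    The "find every Unicode character" half of that workaround can at least be automated. A minimal sketch, assuming the document's text has already been exported as a plain string (how you get it out of Word is left open, and find_non_ansi is a hypothetical name): it flags every character that has no Windows-1252 ("ANSI") equivalent. It says nothing about which font Word has applied, so each hit still has to be checked in Word.

```python
import unicodedata


def find_non_ansi(text: str) -> list[tuple[int, str, str]]:
    """Return (offset, 'U+XXXX', Unicode name) for each character in `text`
    that cannot be encoded in Windows-1252."""
    hits = []
    for offset, ch in enumerate(text):
        try:
            ch.encode("cp1252")
        except UnicodeEncodeError:
            hits.append((offset, f"U+{ord(ch):04X}", unicodedata.name(ch, "?")))
    return hits


# Hypothetical sample text; offsets are character positions in the string.
for hit in find_non_ansi("Dose \u2264 5 mg; ratio \u2248 1:2"):
    print(hit)
```

    Testing against the code page rather than "anything above 255" keeps ordinary Windows-1252 characters such as the euro sign or curly quotes out of the report.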

    Help!
    - Mike

  4. #4
    Plutonium Lounger
    Join Date
    Mar 2002
    Posts
    84,353

    Re: Inserting Unicode Defies Logic (Word XP)

    Since it's only one document, how about recreating it from scratch?

  5. #5
    3 Star Lounger
    Join Date
    Mar 2003
    Location
    Elkins Park, Pennsylvania, USA
    Posts
    325

    Re: Inserting Unicode Defies Logic (Word XP)

    Well, today, a second document has turned up with the same issue...

    Not knowing what else to do, and with little to lose, I decided to dive under the hood and see what I might find. I saved the document as RTF and changed the extension to TXT so that I could open it in Notepad (WordPad actually opened the ersatz TXT file in Word format; impressive, but unwanted). Then I simply scanned the document for anything that jumped out at me (nothing did, but then I've had a rough week!). Not seeing anything particularly funky, I just searched for "SimSun" and replaced it with "Times New Roman."

    Then I backed out the way I came in: saved the file, changed the TXT extension back to RTF, and used Word to perform an Open and Repair. No errors were reported (but then again, none showed up before!), and my document looked and behaved fine. Even the "bad" symbols were now in Times New Roman. And now I can insert ChrW(8804) either from the keyboard or through code, and it behaves as it should.

    For those interested, here are the four "codes" (?) that I found in the RTF file that I modified.
    {\f13\fnil\fcharset134\fprq2{\*\panose 02010600030101010101}SimSun{\*\falt \'cb\'ce\'cc\'e5};}
    {\f133\fnil\fcharset134\fprq2{\*\panose 02010600030101010101}@SimSun;}
    {\f307\fnil\fcharset0\fprq2 SimSun Western{\*\falt \'cb\'ce\'cc\'e5};}
    {\f1507\fnil\fcharset0\fprq2 @SimSun Western;}
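    These are entries from the RTF font table; in the file itself every control word starts with a backslash, e.g. {\f13\fnil\fcharset134...}. In RTF, \fcharset134 is the GB2312 (Simplified Chinese) character set and \fcharset0 is ANSI, and the \falt group's \'hh hex escapes hold an alternate font name. A rough sketch for reading such entries, assuming they are shaped exactly like the four above (it is not a general RTF parser, describe_font_entry is a hypothetical name, and decoding the \falt bytes as GB2312 is an assumption that holds for SimSun here):

```python
import re

# A few \fcharset values from the RTF specification.
CHARSET_NAMES = {0: "ANSI", 134: "GB2312 (Simplified Chinese)"}


def describe_font_entry(entry: str) -> dict:
    """Summarize one RTF font-table entry shaped like {\\fN\\fnil\\fcharsetM...;}."""
    number = int(re.search(r"\\f(\d+)", entry).group(1))
    charset = int(re.search(r"\\fcharset(\d+)", entry).group(1))

    # {\*\falt ...} holds the alternate name as \'hh hex escapes.
    # Assumption: those bytes are GB2312, as they are for SimSun.
    alt = None
    falt = re.search(r"\\falt ((?:\\'[0-9a-f]{2})+)", entry)
    if falt:
        hex_digits = "".join(re.findall(r"\\'([0-9a-f]{2})", falt.group(1)))
        alt = bytes.fromhex(hex_digits).decode("gb2312")

    # Drop nested {\*...} groups, then control words, leaving the font name.
    name = re.sub(r"\{\\\*[^{}]*\}", "", entry)
    name = re.sub(r"\\[a-z]+-?\d*\s?", "", name).strip("{}; ")

    return {"font": number,
            "charset": CHARSET_NAMES.get(charset, str(charset)),
            "name": name,
            "alt": alt}
```

    Run on the first entry, this reports font 13, charset "GB2312 (Simplified Chinese)", name "SimSun" and alternate name "宋体", which is SimSun's Chinese name. The "SimSun Western" entries are the same font declared a second time with the ANSI charset.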

    I suspect that some of those were formatting codes for the symbols that were already in the document, but I'm just guessing.

    My worry is that I may be asked to do this on a large Word document. We commonly process documents of 100+ pages with scores of tables, and I think Notepad has a limit on the size of text file it can open... unless someone can tell me how to "force" WordPad to open a file as plain text and not "interpret" it as Word/RTF.
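    For a large file the Notepad step isn't really needed: the same search-and-replace can be done by a short script, which doesn't care about file size or about how WordPad would interpret the file. A minimal Python sketch (the function names are hypothetical); it mirrors the manual fix exactly, so it is just as blunt, and the patched RTF still has to be reopened in Word as described above.

```python
from pathlib import Path


def replace_font_bytes(data: bytes, old: bytes = b"SimSun",
                       new: bytes = b"Times New Roman") -> tuple[bytes, int]:
    """Replace every occurrence of `old` with `new` in raw RTF bytes.

    Same blunt search-and-replace as doing it by hand in Notepad, so it also
    rewrites "@SimSun" and "SimSun Western" (and any literal "SimSun" in the
    body text). Returns the patched bytes and the number of replacements.
    """
    return data.replace(old, new), data.count(old)


def fix_rtf_file(src: Path, dst: Path) -> int:
    """Write a patched copy of `src` to `dst`; the original is left untouched."""
    # Work on raw bytes: RTF is mostly 7-bit ASCII, and this avoids any
    # code-page decoding of the \'hh escapes.
    patched, count = replace_font_bytes(src.read_bytes())
    dst.write_bytes(patched)
    return count
```

    For example, fix_rtf_file(Path("report.rtf"), Path("report_fixed.rtf")) (file names hypothetical) returns the number of replacements made. Writing to a new file keeps the original RTF as a fallback.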
    - Mike

  6. #6
    Plutonium Lounger
    Join Date
    Mar 2002
    Posts
    84,353

    Re: Inserting Unicode Defies Logic (Word XP)

    Notepad can handle very large files in recent versions of Windows.

  7. #7
    5 Star Lounger
    Join Date
    May 2001
    Location
    Stuttgart, Baden-W, Germany
    Posts
    931

    Re: Inserting Unicode Defies Logic (Word XP)

    Hi Mike,

    If a doc was created with Far East language support installed, Word keeps two different fonts: one for Western characters and one for Far East characters. Unfortunately, it treats a lot of symbols that are neither Western nor Far Eastern as "Far East", and applies the Far East font to them.
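    One way to see how symbols like these can land on the "Far East" side: they are not in Windows-1252, the Western "ANSI" code page, but they are in GB2312, the Simplified Chinese character set (RTF \fcharset134, the value SimSun carries in Mike's font table). This is only an illustration of code-page membership, not Word's actual classification rules, and encodable_in is a hypothetical helper:

```python
def encodable_in(ch: str, codec: str) -> bool:
    """True if `ch` has a representation in the given legacy code page."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# Less-than-or-equal, infinity, almost equal: the symbols from this thread.
for ch in "\u2264\u221e\u2248":
    print(f"U+{ord(ch):04X}",
          "cp1252:", encodable_in(ch, "cp1252"),
          "gb2312:", encodable_in(ch, "gb2312"))
```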

    If you have support for, say, Chinese installed, the "Format > Font" dialog has two different dropdowns for the two fonts, and you can search for the Far East font.
    Unfortunately, that only helps you locate the characters and often doesn't fix the problem, because most Western fonts aren't allowed in the "Far East" font dropdown.
    Luckily for you, "Times New Roman" is one of the few that does work; otherwise, your fix might have run into problems.

    When Word applies the "Far East" font by itself, it also changes the language. And in some simple cases, changing the language back to "English" also changes the font back to the Western font.
    But often, this doesn't work. And -- most annoyingly -- even if it does work when you change the language through the user interface, in my experience it never works when you change it with a macro.

    The only more or less reliable way I've found to get rid of the Asian font (short of editing the RTF, XML or HTML as you have done) is to locate the character, cut it to the clipboard, and reinsert it as plain text.

    But the whole thing is quite a mess. Even after I uninstalled Chinese support, Word still produced docs with those Asian fonts, and docs I had already produced kept showing the problem.
    Probably Normal.dot was changed permanently, and I'll have to rebuild it.

    For now, I kept the Asian language support installed so I can at least find the characters when they turn up.
    Without that support, you can probably search for them in a macro by looking for .Find.Font.NameAscii="SimSun".

    Cheers, Klaus
