  1. #1
    New Lounger
    Join Date
    Feb 2001
    Location
    Jackson, Mississippi, USA
    Posts
    20
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Unicode confusion (Word 2000)

    Our document database has a large number of older WordPerfect 5.x documents that were previously converted, modified and resaved as Word 97 documents. Now I'm finding instances where 'special symbols', like a degree or cent symbol, have not been converted correctly. I don't have a particular 'symbol' to search for, but want to find any non-keyboard character and take a look at it.

    We are currently in the process of pre-testing our documents and programs for a conversion to Word 2000 in a few weeks.

    So, I am trying to find a way to search through multiple documents looking for any occurrence of a character that is not a standard keyboard character.

    I wrote a procedure to test the AscW value of each character, using a range object, but that was way too slow.

    Next, I tried to use the find.execute method to look for any character that had a higher AscW value than 128, for example:
    For x = 128 To 34168
        temp = "^u" + Trim(Str(x))
        With docrange.Find
            .Execute temp
            If .Found Then...

    This works much faster than looping through the character collection, but I immediately came across a problem which I cannot understand. In one of my documents, each instance of the lowercase character i was found to be the unicode character 304.

    I then tried using ^u304 in the search/replace dialog, where it also found every instance of the lowercase i in the document. Using ^u105 in the search and replace dialog found every instance of the character i whether upper or lower (what I expected).

    The font used in this particular document was Courier New. I tested with another document which had a Times New Roman font, and the lower case i was not picked up as a ^u304, and the processing did pick up a degree symbol converted to a Courier New Box Drawing (^u9554) symbol, which was the kind of magic I was looking for.

    I've spent a few days fooling around with this and I'm more confused than when I started. I've looked at numerous postings and web sites but can't quite get the hang of Unicode. Any suggestions or comments would be appreciated.

  2. #2
    5 Star Lounger
    Join Date
    Dec 2000
    Location
    Tallahassee, Florida, USA
    Posts
    901
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: Unicode confusion (Word 2000)

    I realize you're getting a Unicode value, but it's been my experience that in the documents from WP to Word, the WP Typographic Symbol font set is a legacy -- in that particular font set, A = open curly quote, etc. What you may want to search for (although I can't provide a routine for it) is the presence of that font within the document.

    Good luck!
    Karen

  3. #3
    5 Star Lounger
    Join Date
    May 2001
    Location
    Stuttgart, Baden-W, Germany
    Posts
    931
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: Unicode confusion (Word 2000)

    ^u304 is the capital I with dot above (İ), which is used in Turkish. Its lowercase form is the ordinary small i, so if "Match case" is not checked, searching for ^u304 will find the small i, too.
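
    As a rough sketch of what that means for the code-by-code loop in the first post: setting the Find object's MatchCase property to True should keep ^u304 from matching a plain lowercase i. The procedure name and the Debug.Print reporting are assumptions for illustration, not part of the original routine, and the upper bound is simply the one used in the question.

    <pre>Sub FindByCodeMatchCase()
        ' Hypothetical helper: tries each code in turn with a case-sensitive search.
        Dim docRange As Range
        Dim x As Long
        For x = 128 To 34168
            ' Start every search from the whole document again.
            Set docRange = ActiveDocument.Content
            With docRange.Find
                .ClearFormatting
                .Text = "^u" & CStr(x)
                .MatchCase = True          ' without this, ^u304 may also match "i"
                .MatchWildcards = False
                .Forward = True
                .Wrap = wdFindStop
                If .Execute Then
                    Debug.Print "Code " & x & " first found at position " & docRange.Start
                End If
            End With
        Next x
    End Sub</pre>

    This only reports the first hit for each code; it is meant to show where the case setting goes rather than to be a complete scanner.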

    Two suggestions to find "special characters":
    -- use a wildcard search for [!^001-^0255] (any character not in the old Windows code page).
    -- use a macro with
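    <pre>Dim myCset As String, i As Long
    ' Build the set of characters to skip: codes 1 to 255
    For i = 1 To 255
        myCset = myCset & Chr(i)
    Next i
    ...
    ' Move forward past every character in the set; stops at the first one outside it
    Selection.MoveWhile Cset:=myCset, Count:=wdForward</pre>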

    You can add a lot of characters to be skipped to the Cset-String as you go along, and it's pretty fast.
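
    The wildcard suggestion can also be run from a macro. The following is a minimal sketch, assuming the pattern [!^001-^0255] behaves in VBA as it does in the Find dialog; the procedure name and the Debug.Print output are made up for illustration.

    <pre>Sub ListNonAnsiCharacters()
        ' Hypothetical helper: lists every character that the wildcard
        ' pattern treats as lying outside codes 1 to 255.
        Dim rng As Range
        Dim code As Long
        Set rng = ActiveDocument.Content
        With rng.Find
            .ClearFormatting
            .Text = "[!^001-^0255]"
            .MatchWildcards = True
            .Forward = True
            .Wrap = wdFindStop
            Do While .Execute
                ' AscW returns a signed 16-bit value, so codes above 32767
                ' come back negative; shift them into the 0 to 65535 range.
                code = AscW(rng.Text)
                If code < 0 Then code = code + 65536
                Debug.Print "Position " & rng.Start & ": ^u" & code
                ' Continue searching after the character just found.
                rng.Collapse wdCollapseEnd
            Loop
        End With
    End Sub</pre>

    The results appear in the Immediate window of the VBA editor, and each reported code can then be looked up or searched for with ^u.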

    Characters from old, "decorative" fonts such as "Symbol" or "Wingdings" are kept in a special code page starting at Hex F000; selecting such a character, you can see the code in the "Insert > Symbol..." dialog, if you go into the sub-dialog to define a keyboard shortcut. You can then use that code to search for it (for example, "alpha" from the "Symbol" font: ^u61537). Unfortunately, you can't find the "alpha" by searching for ^u61537 *and* the Symbol font if the character was inserted directly from the "Insert > Symbol" dialog.

    For a way to get around that problem, and some additional good information, see the article
    http://www.mvps.org/word/FAQs/MacrosVBA/Fi...laceSymbols.htm
    Most of the "messiness" comes from the use of decorative fonts (which don't conform to the Unicode standard), and old legacy fonts like those from WordPerfect.

    Regards, Klaus

  4. #4
    New Lounger
    Join Date
    Feb 2001
    Location
    Jackson, Mississippi, USA
    Posts
    20
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: Unicode confusion (Word 2000)

    Thanks Klaus for the suggestions. I'll try them.
    Regards,
    Ann

  5. #5
    3 Star Lounger
    Join Date
    Jan 2001
    Location
    Toronto, Ontario, Canada
    Posts
    230
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: Unicode confusion (Word 2000)

    Karen, Sounds as though you know your wp typographic and Word unicode stuff, so: Can you help me find a source that lists all the character values? i.e. especially for Marlett bullets. In Word 2000 we use the Marlett square bullet and at some point in these large docs we want to search and replace the bullet with clipboard contents. TIA.
    Patricia

  6. #6
    5 Star Lounger
    Join Date
    May 2001
    Location
    Stuttgart, Baden-W, Germany
    Posts
    931
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: Unicode confusion (Word 2000)

    Hi Patricia,

    You can use the method I described to Ann to get the code of the bullet:
    Select it, then look in "Insert > Symbol... > Keyboard shortcut".
    Once you have the code (probably 61543 in your case), you can search for that character with ^u61543.

    I append a code table so you can look up all codes at once.
    You can edit the "Decorative" character style to display other decorative fonts.

    Be aware that you won't find characters from decorative fonts like "Marlett" that were inserted directly from the "Insert > Symbol" dialog if you search for both the code and the font, since Word lies about the font in that case. For a work-around, see the link to mvps.org/word I gave previously.

    Regards, Klaus
    Attached Files

  7. #7
    Silver Lounger
    Join Date
    Jun 2001
    Location
    Morden, Surrey, United Kingdom
    Posts
    1,838
    Thanks
    3
    Thanked 0 Times in 0 Posts

    Re: Unicode confusion (Word 2000)

    There is a simpler method to just find one character wherever it appears in a document:
    -- highlight the first instance of the character and copy it
    -- go into Find and paste the character in the find box (Ctrl-V; the menu and toolbar options can't be used here)
    -- click Find Next.
    There are a few (very few!) characters that can't be done this way, but it usually works.

    Just my 2 cents.
    Beryl M


  8. #8
    5 Star Lounger
    Join Date
    May 2001
    Location
    Stuttgart, Baden-W, Germany
    Posts
    931
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: Unicode confusion (Word 2000)

    ... but that doesn't work with decorative fonts like Marlett, Symbol, Wingdings et al (though I agree it *should*).

    Klaus

  9. #9
    Silver Lounger
    Join Date
    Jun 2001
    Location
    Morden, Surrey, United Kingdom
    Posts
    1,838
    Thanks
    3
    Thanked 0 Times in 0 Posts

    Re: Unicode confusion (Word 2000)

    You surprise me! It does in Word97 - why shouldn't it work in Word2000? I just tested it in Word97 with characters in both Marlett and Wingdings - they don't show correctly in the find box, but it finds them alright.

    The ways of Microsoft are definitely beyond my understanding ...
    Beryl M


  10. #10
    3 Star Lounger
    Join Date
    Jan 2001
    Location
    Toronto, Ontario, Canada
    Posts
    230
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: Unicode confusion (Word 2000)

    Yes - it worked nicely in Word97. Word2000 is making me crazy. I haven't quite figured out the code table but I'll try when I feel saner and my jaw is unclenched. Thanks Klaus.
    Patricia

  11. #11
    5 Star Lounger
    Join Date
    Dec 2000
    Location
    Tallahassee, Florida, USA
    Posts
    901
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: Unicode confusion (Word 2000)

    Sorry, Patricia -- I'm afraid I don't have a source as such. My suggestion would be the same as Beryl's, with the addition of formatting the "find" as the Marlett font.

    Another option: if that's the only place in the document where Marlett is used, skip the character and look for the font. In Word 2000's Find and Replace dialog, clicking the "More" button gives you the Format option.
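
    For anyone who would rather do the font-only search from a macro, here is a minimal sketch. The function name and its fontName parameter are hypothetical, and the caveat raised earlier in the thread still applies: Word may misreport the font for characters inserted directly from the "Insert > Symbol" dialog, so those may not be counted.

    <pre>Function CountFontRuns(ByVal fontName As String) As Long
        ' Hypothetical helper: counts stretches of text formatted in the given font.
        Dim rng As Range
        Dim n As Long
        Set rng = ActiveDocument.Content
        With rng.Find
            .ClearFormatting
            .Text = ""                 ' no text: search on formatting only
            .Font.Name = fontName
            .Format = True
            .Forward = True
            .Wrap = wdFindStop
            Do While .Execute
                n = n + 1
                ' Stop if the match reaches the end of the document,
                ' to avoid finding the final paragraph mark repeatedly.
                If rng.End >= ActiveDocument.Content.End Then Exit Do
                rng.Collapse wdCollapseEnd
            Loop
        End With
        CountFontRuns = n
    End Function</pre>

    Typing ?CountFontRuns("Marlett") in the Immediate window would then show how many Marlett runs the active document contains.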

    Good luck!
    Karen
