Results 1 to 4 of 4
  1. #1
    New Lounger
    Join Date
    Jan 2003
    Location
    Cleveland, Ohio, USA
    Posts
    6
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Opening UTF-8 Encoded Files in Excel (2002 (10.2614.3311) and others)

    I need to import a mixed-language UTF-8 encoded CSV or TXT file into excel. In the import text wizard, it doesn't offer me any correct options for the encoding... I can export an Excel spreadsheet in "Unicode text" but never import it correctly again. Word allows me to open the file and choose Unicode (UTF-8). Any advice on how to do this?

    Thanks,
    Jim Jarrett

  2. #2
    New Lounger
    Join Date
    Jan 2003
    Location
    Cleveland, Ohio, USA
    Posts
    6
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: Opening UTF-8 Encoded Files in Excel (2002 (10.2614.3311) and others)

    I would enclose an example mixed-language UTF-8 encoded CSV file, but I'm getting server errors when I try.

  3. #3
    New Lounger
    Join Date
    Jan 2003
    Location
    Cleveland, Ohio, USA
    Posts
    6
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: Opening UTF-8 Encoded Files in Excel (2002 (10.2614.3311) and others)

    I'll try to attach the example again...

  4. #4
    New Lounger
    Join Date
    Jan 2003
    Location
    Cleveland, Ohio, USA
    Posts
    6
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: Opening UTF-8 Encoded Files in Excel (2002 (10.2614.3311) and others)

    I've done more research, and I've found a black-magic sequence that gets me to excel, and ideas how I could get back out of excel as well... but this really stinks! Word has the best Unicode support of the suite; excel and access are awful.

    I tried to import the UTF-8 CSV file into MS Access to see if its conversion was any better. No luck. It's as dumb as Excel.

    1. Open the UTF-8 CSV or TXT file in MS Word and choose "Unicode (UFT-8)" encoding. (If you switch the Normal font to Arial Unicode MS, you'll see more of the characters, but even if you don't, it gets them in okay.)

    2. Select all the text and choose Table/Convert/Text to Table. It should figure out the formatting correctly and put everthing in Cells correctly.

    3. File/Save As... choose File Type of Web, Filtered. You get an HTML file.

    4. Open the HTML file in a text editor and remove this line from the <head> area, and save it again:
    <meta name=Generator content="Microsoft Word 10 (filtered)">

    5. Open Excel. Open the HTML file and it will put the table in cells correctly, with all the unicode characters preserved (because UTF-8 encoding is a standard encoding for HTML files). If you save this as an .xls file, it will preserve all of the data correctly. However, if you save it as CSV or TXT, you will lose all the encoding.

    6. At this point, you have a number of options. If you save it as Unicode Text from Excel, you have to bring it back into word and do all that crap again, and you can't save it in UTF-8. If you save it as HTML from Excel, you get Microsofts unfiltered - horribly namespaced - stuff that is unusable as a text file. If you save it as XML, you get all of the characters preserved in a text form you might even be able to use in another tool, but it's still pretty specific to Excel. At least you can open it again in Excel, though! Ugh.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •