  1. #1
    4 Star Lounger
    Join Date
    Mar 2002
    Location
    Sacramento, California, USA
    Posts
    509
    Thanks
    4
    Thanked 1 Time in 1 Post
    I use Word 2003, and I've got a few files created with Word 2007 that I need to read. No biggie, right? Install Microsoft's compatibility pack and it's done, right?

    Wrong. I installed the compatibility pack. Now, instead of opening a message box that says this version of Word can't read this file, Word opens a box that says "There was an error opening the file."

    Any ideas what's wrong and how to fix it?

  2. #2
    Super Moderator
    Join Date
    Aug 2001
    Location
    Evergreen, CO, USA
    Posts
    6,623
    Thanks
    3
    Thanked 60 Times in 60 Posts
    Hmmm - I've used the compatibility pack for several years with no problems. Is it possible that the files were corrupted as they were being transferred to you? Or, if you tried to open and read them with Word 2003 before you installed the compatibility pack, that might well have corrupted them.
    Wendell

  3. #3
    Super Moderator jscher2000's Avatar
    Join Date
    Feb 2001
    Location
    Silicon Valley, USA
    Posts
    23,112
    Thanks
    5
    Thanked 93 Times in 89 Posts
    Quote: Originally posted by jsachs177
    I installed the compatibility pack. Now, instead of opening a message box that says this version of Word can't read this file, Word opens a box that says "There was an error opening the file."
    When it first came out, I noticed that the converter is picky about file extensions; this may have changed, but just in case... If the file is in Word 2007 native format, make sure it has a .docx extension, and if it was saved in pre-2007 binary format, make sure it has a .doc extension.

  4. #4
    4 Star Lounger
    Join Date
    Mar 2002
    Location
    Sacramento, California, USA
    Posts
    509
    Thanks
    4
    Thanked 1 Time in 1 Post
    I don't think Word corrupted the file when I first tried to open it. First, because I didn't tell it to save, and second, because its time stamp still matches the time when I created the file. (There are several of these files, but I'm referring here to one which I created myself, and which I'd like to be able to read now.)

    And the file type is docx, as it should be.

    Is there a way to retrieve the text from this file that doesn't depend on Word? It's just a list of about a dozen book titles; there are no footnotes, headers, etc., and formatting is unimportant. I looked at the file with a binary editor, but I didn't see anything resembling ASCII.

  5. #5
    Super Moderator
    Join Date
    Dec 2000
    Location
    New York, NY
    Posts
    2,970
    Thanks
    3
    Thanked 29 Times in 27 Posts
    Since the document is in .docx format, which is really a ZIP archive of XML files, you can retrieve the text as follows (note: these steps work in Windows XP; they might be slightly different in later versions of Windows):

    • Locate the file in Windows Explorer.
    • Change the file extension from .docx to .zip. Say OK to the warning message.
    • Double click on the zip archive to open it up in a separate window.
    • Locate the 'word' folder, and within that, the 'document.xml' file.

    You should be able to get to the document text from within the document.xml file.

    (When finished, change the file extension back from .zip to .docx if you need to try using the document again in Word)
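
    For anyone who would rather script this than rename files by hand, here is a minimal Python sketch of the same idea. It assumes the archive contains word/document.xml as described above and that the text lives in w:t elements inside w:p paragraphs; the function name docx_paragraphs is made up for illustration. It reads the archive in place, so no renaming is needed.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by the w: prefix in word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_paragraphs(path):
    """Return the plain text of each paragraph in a .docx file.

    `path` may be a file name or a file-like object. Hypothetical helper,
    not part of any library.
    """
    # A .docx is a ZIP archive; the main body is stored in word/document.xml.
    with zipfile.ZipFile(path) as archive:
        xml_bytes = archive.read("word/document.xml")

    root = ET.fromstring(xml_bytes)
    paragraphs = []
    # Each w:p element is a paragraph; its text is split across w:t runs.
    for para in root.iter(f"{{{W_NS}}}p"):
        text = "".join(t.text or "" for t in para.iter(f"{{{W_NS}}}t"))
        paragraphs.append(text)
    return paragraphs


if __name__ == "__main__":
    import sys

    # Usage: python docx_text.py somefile.docx
    for line in docx_paragraphs(sys.argv[1]):
        print(line)
```

    This ignores formatting, tables, and footnotes entirely, which is fine for something like a short list of titles. If the archive itself is damaged, zipfile will raise an error rather than return partial text.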

    Gary

  6. #6
    4 Star Lounger
    Join Date
    Mar 2002
    Location
    Sacramento, California, USA
    Posts
    509
    Thanks
    4
    Thanked 1 Time in 1 Post
    Yup, that worked. Thank you!

    I was grossed out by the amount of junk in the file. Text buried and lost under a mountain of tags. Yet another reason to avoid Word 2007.

  7. #7
    Super Moderator jscher2000's Avatar
    Join Date
    Feb 2001
    Location
    Silicon Valley, USA
    Posts
    23,112
    Thanks
    5
    Thanked 93 Times in 89 Posts
    Quote: Originally posted by jsachs177
    I was grossed out by the amount of junk in the file. Text buried and lost under a mountain of tags. Yet another reason to avoid Word 2007.
    XML is very heavy on the metadata, and not meant to be read by humans. But until you've seen the bit soup inside of a .DOC file, I'm not sure you can really make a fair comparison.

    == Edit ==

    If you open the document.xml file in Word as a plain text file, you can strip the tags and end up with unformatted text with a few find and replace operations. This was tested in a simple document with headings and ordinary paragraphs with no direct styling.

    (1) Wildcards OFF

    Find what: </w:p>
    Replace with: ^p

    (2) Wildcards ON

    If you want to mark the Heading styles for later restoration:
    Find what: (\<w:pStyle w:val=")(Heading*)("\/\>)
    Replace with: [[\2]]

    Remove all remaining tags:
    Find what: \<*\>
    Replace with:

    (That last replace contains nothing.)
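
    The same three passes can be sketched outside Word with Python regular expressions. This is only an approximation of the find-and-replace steps above: it assumes the paragraph end tag is `</w:p>` and the style element is `<w:pStyle w:val="..."/>`, and the function name strip_wordml_tags is invented for the example.

```python
import re


def strip_wordml_tags(xml_text):
    """Reduce document.xml markup to plain text in three passes.

    Hypothetical helper that mirrors the Word find-and-replace steps.
    """
    # (1) Turn each paragraph end tag into a line break.
    text = xml_text.replace("</w:p>", "\n")

    # (2) Keep heading style names as [[HeadingN]] markers so they
    #     can be restored later.
    text = re.sub(r'<w:pStyle w:val="(Heading[^"]*)"\s*/>', r"[[\1]]", text)

    # (3) Delete every remaining tag.
    text = re.sub(r"<[^>]*>", "", text)
    return text


if __name__ == "__main__":
    sample = (
        '<w:p><w:pPr><w:pStyle w:val="Heading1"/></w:pPr>'
        "<w:r><w:t>Reading list</w:t></w:r></w:p>"
        "<w:p><w:r><w:t>First title</w:t></w:r></w:p>"
    )
    print(strip_wordml_tags(sample))
```

    Like the Word version, this leaves XML entities such as &amp;amp; escaped and has only been thought through for simple documents with headings and ordinary paragraphs.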

  8. #8
    4 Star Lounger
    Join Date
    Mar 2002
    Location
    Sacramento, California, USA
    Posts
    509
    Thanks
    4
    Thanked 1 Time in 1 Post
    Stripping the tags out is pretty much what I did. I'll remember the trick for the next time I need it... which may be quite a while. I don't need to get at the other docx files right away, but when I do, I'll want them as formatted documents.

    Actually, I have looked at the old Word format. Years ago I considered writing an API that would convert Word documents to other forms, and actually started coding. For some reason I'm bothered by heaps of tags more than by thickets of offsets and bitmaps. I won't claim it makes sense.
