  1. #1
    4 Star Lounger
    Join Date
    Mar 2002
    Location
    Sacramento, California, USA
    Posts
    509
    Thanks
    4
    Thanked 1 Time in 1 Post

    Four pages, 80 MB: how did they DO that? (2003)

    My client asked me to clean up a manual produced by a previous technical writer. It came in about three dozen files. About half are between 100 KB and 1 MB. The rest are between 50 MB and 250 MB. I must handle the latter with extreme care to avoid crashing Word before I can get useful information out of them.

    I examined a typical one of the larger files to try to figure out what made it so big. The document was four pages long and occupied about 80 MB. It consisted of text with elementary formatting and about a dozen embedded graphics. The graphics were captures of a screen or part of a screen.

    I tried deleting the graphics to see if they were responsible for the file's size. With all the graphics removed, the file's size was essentially unchanged. Doing a "Save As" made no difference; nor did writing the file to RTF, reading it back, and saving it as a new document.

    When I tried to insert the original document into a new Word document, Word crashed. When I inserted the graphics-free version into a new document (with Insert/File), the resulting file was under 100 KB. I re-introduced the graphics by saving them from the original document as PNG files and then inserting them in the new document, and the document grew to a few hundred KB.

    Some of the graphics have "rotate" handles, which implies that they were not inserted in the normal way (with Insert/Picture/From file). I don't think that the presence of such graphics is the immediate cause of the size problem, though. For example, right now I'm looking at a 229,353,472 byte document file that is six pages long, with ten paragraphs of text and eleven images, none of which have the "rotate" handles.

    I've figured out how to deflate these bloated files, but I'm baffled about how they were created in the first place. Any ideas?

  2. #2
    Plutonium Lounger
    Join Date
    Mar 2002
    Posts
    84,353
    Thanks
    0
    Thanked 29 Times in 29 Posts

    Re: Four pages, 80 MB: how did they DO that? (2003)

    Perhaps the graphics have an extremely high resolution, i.e. they are actually very large pictures that have been shrunk to fit on the page.

  3. #3
    Super Moderator
    Join Date
    Jan 2001
    Location
    Melbourne, Victoria, Australia
    Posts
    3,852
    Thanks
    4
    Thanked 259 Times in 239 Posts

    Re: Four pages, 80 MB: how did they DO that? (2003)

    The file format that the graphics were saved in will make a big difference, as will the colour depth of the images. If the images are screen captures, their resolution is probably not massive, but their colour depth is likely to be 24-bit, which probably looks the same as 8-bit but takes three times the space.

    The PNG format you are converting to is a lossless compression format, so the images occupy less space than they would in BMP (uncompressed) format. This file size difference is especially pronounced when the image has continuous regions of the same colour, as a screen capture does.
    Andrew Lockton, Chrysalis Design, Melbourne Australia
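
    As a rough illustration of the two points above (colour depth and lossless compression), here is a minimal Python sketch. It assumes a 1024x768 capture, ignores bitmap headers and row padding, and uses zlib's DEFLATE as a stand-in for PNG (real PNG also filters each row before compressing). The flat-colour and random buffers are hypothetical pixel data, not taken from the documents discussed here.

```python
import os
import zlib

WIDTH, HEIGHT = 1024, 768  # assumed full-screen capture size


def raw_size(width, height, bits_per_pixel):
    """Bytes for an uncompressed bitmap, ignoring headers and row padding."""
    return width * height * bits_per_pixel // 8


raw_24 = raw_size(WIDTH, HEIGHT, 24)
raw_8 = raw_size(WIDTH, HEIGHT, 8)

# Hypothetical pixel data: one flat grey colour versus random noise.
flat = bytes([200, 200, 200]) * (WIDTH * HEIGHT)
noise = os.urandom(raw_24)

# DEFLATE is the compression method PNG uses internally.
flat_deflated = len(zlib.compress(flat))
noise_deflated = len(zlib.compress(noise))

print(f"24-bit raw: {raw_24:,} bytes; 8-bit raw: {raw_8:,} bytes")
print(f"flat colour after DEFLATE: {flat_deflated:,} bytes")
print(f"random noise after DEFLATE: {noise_deflated:,} bytes")
```

    A real screen capture falls somewhere between the two extremes, but usually much closer to the flat-colour case.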

  4. #4
    4 Star Lounger
    Join Date
    Mar 2002
    Location
    Sacramento, California, USA
    Posts
    509
    Thanks
    4
    Thanked 1 Time in 1 Post

    Re: Four pages, 80 MB: how did they DO that? (2003)

    I puzzled over the original format of the graphics, but I couldn't think of a way to deduce what it was, or even how much space the individual graphics occupied in the documents.

    The experiments with Save As, and with writing a document to RTF and reading it back, were meant to test whether the size of the huge documents was caused by uncompressed bitmap graphics. If these experiments' results are valid, they indicate that the problem is something different, or at least something more. But perhaps something subtle is happening that makes these results misleading.

  5. #5
    Super Moderator jscher2000
    Join Date
    Feb 2001
    Location
    Silicon Valley, USA
    Posts
    23,112
    Thanks
    5
    Thanked 93 Times in 89 Posts

    Re: Four pages, 80 MB: how did they DO that? (2003)

    That's a lot of metadata. Could they have used versions?

  6. #6
    Super Moderator
    Join Date
    May 2002
    Location
    Canberra, Australian Capital Territory, Australia
    Posts
    5,054
    Thanks
    2
    Thanked 417 Times in 346 Posts

    Re: Four pages, 80 MB: how did they DO that? (2003)

    Hi jsachs,

    If a graphic was originally of a full screen dump, but has been cropped to show only a portion of the screen, the complete image will still be there. One way of reducing the file size would be to save the file to HTML, using a suitable screen resolution for the graphics, then re-open the file and replace the graphics with the ones from the HTML version. By default, these will be 8-bit JPEG images, and anything that had been cropped will have been deleted.
    Cheers,

    Paul Edstein
    [MS MVP - Word]

  7. #7
    Plutonium Lounger
    Join Date
    Nov 2001
    Posts
    10,550
    Thanks
    0
    Thanked 7 Times in 7 Posts

    Re: Four pages, 80 MB: how did they DO that? (2003)

    It would probably be worth trying to compress the pictures from within Word, not the best of picture editors but it can often remove bloat quickly.

    Select any picture, choose Format > Picture..., go to the Picture tab and click the Compress... button. Select "All pictures in document" plus whichever quality is appropriate, and tick the two check boxes for "Delete cropped areas of pictures" and "Compress pictures".

    StuartR

  8. #8
    4 Star Lounger
    Join Date
    Mar 2002
    Location
    Sacramento, California, USA
    Posts
    509
    Thanks
    4
    Thanked 1 Time in 1 Post

    Re: Four pages, 80 MB: how did they DO that? (2003)

    jscher2000 wrote: “That's a lot of metadata. Could they have used versions?”

    Good idea, but it turns out they didn’t.

    macropod wrote: “If a graphic was originally of a full screen dump, but has been cropped to show only a portion of the screen, the complete image will still be there.”

    Another good idea that appears not to apply here. I did a spot check and found no cropped images.

    StuartR wrote: “It would probably be worth trying to compress the pictures from within Word, not the best of picture editors but it can often remove bloat quickly.”

    That one tickled me, because I didn’t know Word had this feature. I imagined discovering that the two days of work I did last Friday and Monday could have been accomplished in a few minutes. But it turned out that compressing all of the images in one of the original files saved only a tiny amount of space.

    I tried to shrink both the original file and the version with Word-compressed graphics using the technique that worked after I exported, compressed, and re-inserted the graphics: saving the file and inserting it into a new, empty document. Word crashed in both cases.

    Here are some more numbers (not from the file that I described in the original post, but from a similar one).

    Text contents: 6 pages, 12 paragraphs, 1377 characters including spaces.

    Image contents: four embedded 1024x768 images, nine others ranging from 182x126 to 554x437. Total size is about 4M pixels. Assuming uncompressed 24-bit graphics, that’s about 12 MB.

    Original size: 90,917,888 bytes.

    Size with all images compressed in Word: 90,911,744 bytes.

    So at most 12 MB of this 90 MB file consists of images; my experience shrinking the files tells me that the text and document format overhead probably account for another 100 KB or so. The rest has been gobbled up by an as-yet unidentified “Factor X,” and we may never know what it is.
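
    The budget above can be reproduced with a short Python sketch. Because the exact dimensions of the nine smaller images are not listed, it brackets them between the smallest and largest sizes quoted, assumes 3 bytes per pixel (uncompressed 24-bit), and takes the "100 KB or so" of text and format overhead at face value. The function and constant names are made up for this illustration.

```python
BYTES_PER_PIXEL = 3            # assumes uncompressed 24-bit colour
FILE_SIZE = 90_917_888         # original file size in bytes, from the post
TEXT_OVERHEAD = 100 * 1024     # "another 100 KB or so" (an estimate)

full_screens = 4 * 1024 * 768  # four 1024x768 images
other_count = 9
smallest_other = 182 * 126     # smallest of the nine other images
largest_other = 554 * 437      # largest of the nine other images

# Bounds: all nine at the smallest size, or all nine at the largest size.
low_pixels = full_screens + other_count * smallest_other
high_pixels = full_screens + other_count * largest_other


def unexplained_bytes(pixels):
    """Bytes left over after subtracting image data and text overhead."""
    return FILE_SIZE - pixels * BYTES_PER_PIXEL - TEXT_OVERHEAD


for label, pixels in (("low", low_pixels), ("high", high_pixels)):
    image_bytes = pixels * BYTES_PER_PIXEL
    print(f"{label}: {pixels:,} pixels, {image_bytes:,} image bytes, "
          f"{unexplained_bytes(pixels):,} bytes unexplained")
```

    Even with every small image counted at the largest quoted size, more than 70 MB of the file is left unaccounted for.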

  9. #9
    Super Moderator
    Join Date
    Jan 2001
    Location
    Melbourne, Victoria, Australia
    Posts
    3,852
    Thanks
    4
    Thanked 259 Times in 239 Posts

    Re: Four pages, 80 MB: how did they DO that? (2003)

    The most obvious place to look would be for tracked revisions in the document. I haven't seen you mention that you checked for that.

    I have also seen document size fall dramatically simply by unfloating graphics to put them inline. I have no idea why this might work but it is worth a shot.

    Another thing you might want to do is to examine the headers and footers. If you turn on both odd/even headers and different first page, and make sure each section has at least three pages, then you could reveal a whole bunch of content that doesn't show up otherwise. It might even be possible that there are graphics in a header or footer that are floated off the page, so you might try selecting all and deleting to rule out that possibility.
    Andrew Lockton, Chrysalis Design, Melbourne Australia

  10. #10
    Super Moderator jscher2000
    Join Date
    Feb 2001
    Location
    Silicon Valley, USA
    Posts
    23,112
    Thanks
    5
    Thanked 93 Times in 89 Posts

    Re: Four pages, 80 MB: how did they DO that? (2003)

    I assumed that deleting the images would fix the following, so I didn't mention it before, but perhaps it makes sense to check anyway. If the images were accompanied by embedded applications, then the document will be immensely bloated. To check for this, use Alt+F9 to toggle the display of field codes and check the image locations for any EMBED fields.

  11. #11
    4 Star Lounger
    Join Date
    Mar 2002
    Location
    Sacramento, California, USA
    Posts
    509
    Thanks
    4
    Thanked 1 Time in 1 Post

    Re: Four pages, 80 MB: how did they DO that? (2003)

    Andrew Lockton wrote: "...I have also seen document size fall dramatically simply by unfloating graphics to put them inline. I have no idea why this might work but it is worth a shot."

    I think that's it. I unfloated the graphics in the course of cleaning up the documents, but I didn't think about what I was doing because I have never used floating graphics (apparently with good reason). Now I have tried unfloating them without making any other changes, and when I did a Save As the document size dropped from 210 MB to about 75 MB -- still much larger than it should be, but a substantial reduction. Then I inserted the file into a new empty file and saved that; the resulting file was 325 KB, which is just about right.

    I'm baffled that unfloating plus Save As yielded only part of the expected size reduction, but I'm pretty sure it's the key to the mystery.

    jscher2000 wrote: "...If the images were accompanied by embedded applications, then the document will be immensely bloated. To check for this, use Alt+F9 to toggle the display of field codes and check the image locations for any EMBED fields."

    That's not it (I checked just now), but I want to thank you for the suggestion, because the embedded application is another bit of Word functionality that I didn't know about.

    I assume that this feature is associated with the IncludePicture field, which is not the usual way of inserting a picture. (Toggling field codes did not reveal any fields for the pictures in my document, before or after cleanup.) I looked for information in the help system, but couldn't find any. Perhaps this is a feature that postdates Word 2003.

  12. #12
    Super Moderator jscher2000
    Join Date
    Feb 2001
    Location
    Silicon Valley, USA
    Posts
    23,112
    Thanks
    5
    Thanked 93 Times in 89 Posts

    Re: Four pages, 80 MB: how did they DO that? (2003)

    INCLUDEPICTURE works with a path. EMBED includes an OLE object along with the image data, so it's quite a different beast. More like Insert>Object.
