Page 1 of 3
Results 1 to 15 of 34
  1. #1
    Platinum Lounger
    Join Date
    Feb 2001
    Location
    Yilgarn region of Toronto, Ontario
    Posts
    5,453
    Thanks
    0
    Thanked 0 Times in 0 Posts

    MetaData - defined? (Word97 and on)

    Searching the various forums here, I find several different definitions of MetaData in Word documents (or, if you prefer, Word metadata):

    1) Data under the control of the user, such as is found and can be cleared by choosing File, Properties.

    2) Data under the control of the VBA-enabled user, such as orphaned style definitions, custom document properties, data modules and document variables.

    3) Data apparently not used, but of potential use from outside the document. Bookmarks are the obvious candidate (a user can link to a chunk of bookmarked text).

    4) Data not under the control of the average VBA-enabled user. Here I would lump the stuff that pads out a 64K block (I recall reading that Word picks up garbage from assigned RAM) and the data/VBA code that accumulates during document bloat (such as Fast Saves).
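A rough sketch of how areas 2 and 3 could be inventoried from VBA. Area 1 is what File, Properties already shows, and area 4 is out of reach of the object model. The macro name ReportMetadataAreas is hypothetical, "user-defined styles" is only an approximate stand-in for orphaned style definitions, and code modules are left out because reading them needs access to the VBA project.

```vba
' Hypothetical macro: prints counts for areas 2 and 3 to the
' Immediate window (Ctrl+G in the VBA editor).
Sub ReportMetadataAreas()
    Dim doc As Document
    Dim sty As Style
    Dim userStyles As Long

    Set doc = ActiveDocument

    ' Area 2: data normally reached only through VBA.
    ' User-defined styles are only a rough proxy for orphaned styles.
    For Each sty In doc.Styles
        If Not sty.BuiltIn Then userStyles = userStyles + 1
    Next sty
    Debug.Print "Custom properties:   "; doc.CustomDocumentProperties.Count
    Debug.Print "Document variables:  "; doc.Variables.Count
    Debug.Print "User-defined styles: "; userStyles

    ' Area 3: data other documents can link to. ShowHidden makes the
    ' count include hidden bookmarks (it also changes that display setting).
    doc.Bookmarks.ShowHidden = True
    Debug.Print "Bookmarks:           "; doc.Bookmarks.Count
End Sub
```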


    Several discussions describe a "clean" document, the opposite of a metadata-dirty one, as what you get when you print a hard copy or publish it in a format such as PDF.

    It seems to me that any utility that addresses the MetaData issue, especially to make users feel secure in emailing a copy of a document, ought to state its own definition of MetaData and break cleansing down into at least these four areas.


    If your company is using cleansers to flush out potentially embarrassing stuff, I'd be interested to hear where your definition of "embarrassing stuff" fits into the four broad areas I've listed above. I'm open to other broad areas.

    Embarrassing doesn't have to be limited to confidential. A document that prints as one page of text but files at 500KB because of bloat is, to my mind, embarrassing.

  2. #2
    5 Star Lounger
    Join Date
    May 2001
    Location
    Stuttgart, Baden-W, Germany
    Posts
    931
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: MetaData - defined? (Word97 and on)

    To confuse matters some more: the definition of "metadata" I was familiar with (for example, from SGML or data mining applications) is that it's data about your data.

    In Word, metadata according to that definition would be for example
    -- styles (since a paragraph style "Heading 1" or a character style "First name" tells me something about the text formatted in that style)
    -- the properties in the DocumentProperty object
    -- revisions
    -- bookmarks
    -- or generally anything that helps you to format/structure your doc in a way that the information can be easily accessed and understood.
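In this "data about data" sense, the document properties are easy to inspect. A minimal sketch (the macro name is illustrative) that prints every built-in property that currently has a value. Some properties raise an error when read, which is why each read is guarded.

```vba
' Illustrative macro: print each built-in document property that has a
' value, one "Name = Value" line per property, to the Immediate window.
Sub ListBuiltInProperties()
    Dim prop As Object
    Dim v As String

    For Each prop In ActiveDocument.BuiltInDocumentProperties
        v = ""
        On Error Resume Next      ' some properties raise an error when read
        v = CStr(prop.Value)
        On Error GoTo 0
        If Len(v) > 0 Then Debug.Print prop.Name; " = "; v
    Next prop
End Sub
```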


    I was pretty surprised when I read the threads in the forums about metadata as being something that should be deleted from your documents ...

    Regards, Klaus

  3. #3
    Platinum Lounger
    Join Date
    Feb 2001
    Location
    Yilgarn region of Toronto, Ontario
    Posts
    5,453
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: MetaData - defined? (Word97 and on)

    > it's data about your data.

    Me too; that's what started my confusion. It's probably too late to use a different term. In the meantime, it seems to me that there are varying degrees or levels of data-other-than-what-you-see-printed that give users concern.

    In the strictest sense, macro code in a Word document could be construed as "metadata", since it is stuff (data) associated with the document, yet not visible on the printed page or in the PDF version.

    If I write a meta-cleanser, I think I'll separate the issues much as I suggested above. There'd be a quick-start button that says "get rid of anything that doesn't show up on paper", that is, keep only what is visible in viewing mode. That might entail converting fields to literal values.
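The field-conversion part of such a quick-start button might look like the sketch below. It assumes "literal values" means each field's current result, which is what Fields.Unlink leaves behind. The macro name is hypothetical, and fields are not updated first, so a stale result stays stale.

```vba
' Hypothetical macro: turn every field into plain text (its current
' result) in every story: main text, headers, footers, footnotes, etc.
Sub ConvertAllFieldsToText()
    Dim rng As Range
    For Each rng In ActiveDocument.StoryRanges
        Do
            rng.Fields.Unlink                 ' field code replaced by its result
            Set rng = rng.NextStoryRange      ' linked stories, e.g. other headers
        Loop Until rng Is Nothing
    Next rng
End Sub
```

Looping over StoryRanges and then following NextStoryRange is needed because a document with several sections can have more than one header or footer story of the same type.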

  4. #4
    Silver Lounger
    Join Date
    Jan 2001
    Location
    Sun Prairie, Wisconsin, Wisconsin, USA
    Posts
    2,049
    Thanks
    124
    Thanked 119 Times in 116 Posts

    Re: MetaData - defined? (Word97 and on)

    The term seems to have taken on a meaning beyond the "data about data" formal definition.

    My understanding is that when people talk about stripping metadata, they mean information contained in the file that the ordinary (ignorant) person using Word would not realize is present. If Versions is turned on, or Track Changes is on with the display of change marks turned off, that can be extensive. Fast Saves is another real source (as well as being a source of all sorts of other mischief).
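A pre-send check for the three sources just named could be as simple as this sketch (macro name hypothetical). It only reports and changes nothing. Document.Revisions is used as a convenient count and may not cover every story, such as headers and footers, in every Word version.

```vba
' Hypothetical pre-send check: reports saved versions, tracked changes,
' Track Changes being on, and Fast Saves being enabled. Changes nothing.
Sub AuditHiddenHistory()
    Dim doc As Document
    Dim msg As String
    Set doc = ActiveDocument

    If doc.Versions.Count > 0 Then _
        msg = msg & doc.Versions.Count & " saved version(s)" & vbCr
    If doc.Revisions.Count > 0 Then _
        msg = msg & doc.Revisions.Count & " tracked change(s)" & vbCr
    If doc.TrackRevisions Then msg = msg & "Track Changes is on" & vbCr
    If Options.AllowFastSave Then msg = msg & "Fast Saves is enabled" & vbCr

    If Len(msg) = 0 Then
        MsgBox "No versions, tracked changes or Fast Saves found.", vbInformation
    Else
        MsgBox msg, vbExclamation, "Check before sending"
    End If
End Sub
```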

    I know that there are several companies marketing cleaners. I have a piece of an article about metadata in my Users Guide entitled Confidentiality (http://www.addbalance.com/usersguide/metadata.htm) that links to some of them.
    Charles Kyle Kenyon
    Madison, Wisconsin

  5. #5
    Platinum Lounger
    Join Date
    Feb 2001
    Location
    Yilgarn region of Toronto, Ontario
    Posts
    5,453
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: MetaData - defined? (Word97 and on)

    > my Users Guide entitled Confidentiality

    Thank you. I had already visited it as a result of my earlier search across all forums for "metadata".

    It seems to me that a commercially acceptable "metadata" cleanser would have to be pretty open about what it was doing, and why. For example, potential users might want to inspect the code to satisfy themselves that it did indeed process every field (in reverse order), and so on.

    Users might also need to be convinced that, to properly "disconnect it from its past life", the document may have to be re-created anew and the new copy saved right over the top of the old one.
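One way to re-create a document anew from VBA, offered as a sketch rather than a complete cleanser: copy only the formatted main text into a fresh document based on the same template, close the original without saving, and save the copy under the original name. The macro name is hypothetical. Data that Word does not attach to the text itself (document variables, custom properties, stored versions) should be left behind, and the new file will take the current user name as its Author.

```vba
' Hypothetical macro: rebuild the active document from its formatted main
' text only, then save the rebuilt copy over the original file.
' Assumes the original has been saved before (FullName is a real path).
Sub RecreateAndReplace()
    Dim oldDoc As Document
    Dim newDoc As Document
    Dim targetPath As String

    Set oldDoc = ActiveDocument
    targetPath = oldDoc.FullName

    ' Fresh document on the same template; copy text plus its formatting.
    Set newDoc = Documents.Add(Template:=oldDoc.AttachedTemplate.FullName)
    newDoc.Content.FormattedText = oldDoc.Content.FormattedText

    ' Close the original without saving so its file can be overwritten.
    oldDoc.Close SaveChanges:=wdDoNotSaveChanges
    newDoc.SaveAs FileName:=targetPath, FileFormat:=wdFormatDocument
End Sub
```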

    In short, user education.

  6. #6
    3 Star Lounger
    Join Date
    Jun 2001
    Location
    Los Angeles, California, USA
    Posts
    289
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: MetaData - defined? (Word97 and on)

    Gentlemen,

    Thanks for the links to this discussion. Just as the dot in dotcom is, in fact, a point, or a period, and not a dot (which is something else entirely), the term metadata, because of its "sexy" mouth feel, has become a catch-all phrase for any data the user does not immediately see upon opening the document in Word. My firm has chosen not to use a "metadata cleaner," primarily because we forbid the use of Track Changes by any staff.

    I think if you use Track Changes and always remember to "Accept or Reject" all before sending to adversarial parties, then there should be no "metadata" issues. What's the harm of having the creator or editor identified? Our big concern is adversarial parties discovering edits we don't want them to know about because someone forgot to "Accept or Reject All" and simply turned off highlighting of changes in the document before e-mailing it somewhere.
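The "Accept or Reject All" step can be made part of a macro so that nobody has to remember it. A minimal sketch, with a hypothetical macro name; accepting rather than rejecting is an assumed policy, and deleting comments is an extra step not mentioned above.

```vba
' Hypothetical macro: accept every tracked change, delete all comments,
' and turn Track Changes off. Accepting is a policy choice; use
' doc.Revisions.RejectAll instead to discard the proposed edits.
Sub AcceptAllBeforeSending()
    Dim doc As Document
    Dim i As Long
    Set doc = ActiveDocument

    doc.TrackRevisions = False      ' stop recording new changes
    doc.Revisions.AcceptAll         ' fold tracked changes into the text

    For i = doc.Comments.Count To 1 Step -1   ' last to first keeps indexes valid
        doc.Comments(i).Delete
    Next i
End Sub
```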

  7. #7
    BAM
    Guest

    Re: MetaData - defined? (Word97 and on)

    Hi vswearingen,

    Another item you may want to add to the "forbidden" list is Versions. Under File/Versions, there is a check box for "Automatically save version on close". Not only can this bloat a file and cause document corruption, it is also a prime source of the type of Metadata you do not want to send out.

    The same goes for making sure Fast Saves is disabled in Tools/Options/Save.
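Turning the settings off does not remove versions already stored in a file. A sketch (macro name hypothetical) that switches off automatic versions, deletes the stored ones, disables Fast Saves and then does a full save:

```vba
' Hypothetical clean-up: stop automatic versions, delete the versions
' already stored in the file, turn off Fast Saves, then save in full.
Sub RemoveStoredVersions()
    Dim doc As Document
    Dim i As Long
    Set doc = ActiveDocument

    doc.Versions.AutoVersion = wdAutoVersionOff   ' the File/Versions check box
    For i = doc.Versions.Count To 1 Step -1       ' last to first keeps indexes valid
        doc.Versions(i).Delete
    Next i

    Options.AllowFastSave = False                 ' Tools/Options/Save
    doc.Save                                      ' a full save, not a fast save
End Sub
```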

    Another post that covers Metadata is this one
    ~~~~~~~~~~~~~~
    Cheers!

  8. #8
    3 Star Lounger
    Join Date
    Jun 2001
    Location
    Los Angeles, California, USA
    Posts
    289
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: MetaData - defined? (Word97 and on)

    Thanks. Both items you mentioned are disabled by default with VBA code in AutoNew, and our training includes not turning them on again, EVER, FOR ANY REASON. Fast Saves, aside from being logically unnecessary, is EVIL; it isn't fast, it doesn't really save, and it leads to corruption. We version documents through our DMS, DocsOpen.
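The firm's actual AutoNew code isn't shown; a minimal sketch of that kind of default might be the following. Word runs a macro named AutoNew whenever a new document is created from the template that holds it (for example Normal.dot or a firm-wide template).

```vba
' Sketch of an AutoNew macro; Word runs it for each new document
' created from the template that contains it.
Sub AutoNew()
    Options.AllowFastSave = False                          ' application-wide
    ActiveDocument.Versions.AutoVersion = wdAutoVersionOff ' per document
End Sub
```

Because AllowFastSave is an application option rather than a document setting, re-applying it on every new document also undoes a user who switched it back on.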

    Back to ChrisGreaves' question at top. I think you're on the right track. A one-button solution to what we've stop-gapped at my firm would be nice.

  9. #9
    Platinum Lounger
    Join Date
    Dec 2000
    Location
    Queanbeyan, New South Wales, Australia
    Posts
    3,730
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: MetaData - defined? (Word97 and on)

    Van Swearingen,

    Even with those changes, I think there are things left behind which may be visible to other users.

    I just tried this experiment:
    -- I created a new blank document
    -- I typed in two lines of text
    -- I saved and closed the document
    -- I opened it up again
    -- I deleted those 2 lines
    -- I closed it and saved again.

    I then opened the document in Notepad. The 2 lines I had deleted were still in there. I don't have Fast Saves or Track Changes turned on. I think it's just some sort of work area that Word uses and doesn't clean up.
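The Notepad check can also be scripted. This sketch (function names hypothetical) reads a closed file's raw bytes and counts matches of a phrase stored either as 8-bit text or as 16-bit Unicode text, since Word 97 may save text either way. It assumes an ASCII phrase and a single-byte Windows code page.

```vba
' Hypothetical helper: count occurrences of an ASCII phrase in a file's
' raw bytes, as 8-bit text plus as 16-bit (UTF-16LE) text.
Function CountInFile(ByVal filePath As String, ByVal phrase As String) As Long
    Dim f As Integer
    Dim buf As String
    Dim wide As String
    Dim i As Long

    ' Read the whole file; in Binary mode each byte becomes one character.
    f = FreeFile
    Open filePath For Binary Access Read As #f
    buf = Space$(LOF(f))
    Get #f, , buf
    Close #f

    ' Build the 16-bit form: each character followed by a zero byte.
    For i = 1 To Len(phrase)
        wide = wide & Mid$(phrase, i, 1) & Chr$(0)
    Next i

    CountInFile = CountMatches(buf, phrase) + CountMatches(buf, wide)
End Function

' Count non-overlapping matches of needle in haystack (case-sensitive).
Private Function CountMatches(ByVal haystack As String, ByVal needle As String) As Long
    Dim pos As Long
    If Len(needle) = 0 Then Exit Function
    pos = InStr(1, haystack, needle, vbBinaryCompare)
    Do While pos > 0
        CountMatches = CountMatches + 1
        pos = InStr(pos + Len(needle), haystack, needle, vbBinaryCompare)
    Loop
End Function
```

Usage, with a made-up path and the document closed in Word first: `Debug.Print CountInFile("C:\Temp\Test.doc", "This is the first line.")`.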
    Subway Belconnen- home of the Signboard to make you smile. Get (almost) daily updates- follow SubwayBelconnen on Twitter.

  10. #10
    Super Moderator
    Join Date
    Feb 2001
    Location
    Silicon Valley, USA
    Posts
    23,112
    Thanks
    5
    Thanked 93 Times in 89 Posts

    Re: MetaData - defined? (Word97 and on)

    I think you can flush that with this:

    Sub AutoClose()
        ' Empty the document's Undo list as it closes
        ActiveDocument.UndoClear
    End Sub

    (Of course, if the user presses cancel, he might regret losing the Undo buffer...)

  11. #11
    Platinum Lounger
    Join Date
    Feb 2001
    Location
    Yilgarn region of Toronto, Ontario
    Posts
    5,453
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: MetaData - defined? (Word97 and on)

    > The 2 lines I had deleted were still in there

    This is more interesting than it first appeared.

    As a good scientist, I tried to duplicate your experiment (Word97SR2), first by typing two lines (OK, I used two PARAGRAPHS):
    <pre>This is the first line.
    This is the second line.
    </pre>

    and then with three "lines"
    <pre>This is the first line.
    This is the second line.
    This is the third line.
    </pre>


    After deleting the text and reopening the document in Notepad, I turned Word Wrap on and used Search, Find to look for the string "line".

    In each case, my search found TWO occurrences of "This is the first line.", both towards the end of the document. In no case did I find the second or third lines.

    I'm going to think about why I'd want to preserve that text, were I Word.

  12. #12
    BAM
    Guest

    Re: MetaData - defined? (Word97 and on)

    Hi Chris/Geoff,

    Did you also delete the Title from the Summary tab in File/Properties?

    Since Word uses the first line for the Title, I'd say that is why it is appearing again.
    ~~~~~~~~~~~~~~~
    Cheers!

  13. #13
    Platinum Lounger
    Join Date
    Feb 2001
    Location
    Yilgarn region of Toronto, Ontario
    Posts
    5,453
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: MetaData - defined? (Word97 and on)

    > Word uses the first line for the Title,


    Bingo!


    I reloaded the document in Word, chose File, Properties and deleted the Title (which was the first line, as you had predicted), saved and closed Word, then opened the doc in Notepad; a Search, Find in Notepad reveals no trace.
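The same manual fix can be done by macro. A sketch with a hypothetical name; which summary fields to blank besides Title is an assumption, so adjust the list to suit.

```vba
' Hypothetical macro: blank the Title (which Word may propose from the
' first line of text) and a few other summary fields, then save.
Sub ClearSummaryFields()
    With ActiveDocument.BuiltInDocumentProperties
        .Item(wdPropertyTitle).Value = ""
        .Item(wdPropertySubject).Value = ""
        .Item(wdPropertyKeywords).Value = ""
        .Item(wdPropertyComments).Value = ""
    End With
    ActiveDocument.Saved = False    ' make sure Word treats the file as changed
    ActiveDocument.Save
End Sub
```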

    Here. Let me place another gold star against your name .....

  14. #14
    Platinum Lounger
    Join Date
    Dec 2000
    Location
    Queanbeyan, New South Wales, Australia
    Posts
    3,730
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: MetaData - defined? (Word97 and on)

    BAM,

    No, I didn't try that. It may be worthwhile.

    My experiment (as simplistic as it was) was just intended to show that there is the possibility of data leaking which is not covered by the most obvious plugs. It makes me wonder how many more leaks may be possible.

    I've seen templates with macros blow up in size with very few changes. Because it's not a major concern of mine, I haven't investigated, but I strongly suspect there are remnants of old macros still sitting around.

    And the fact that even the most simplistic experiment has shown a possible leak makes me wary of documents I send to anyone, anywhere.

  15. #15
    Star Lounger
    Join Date
    Jan 2001
    Location
    New York, New York, USA
    Posts
    84
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: MetaData - defined? (Word97 and on)

    In addition to the items mentioned above, metadata can also include the names of users who have edited the document, the path to the document, hidden text, comments, tracked changes, embedded Excel workbooks (even though the user thought they were only including a particular worksheet), etc.
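Two of those items can be handled with a short macro. This sketch (name hypothetical) deletes text formatted as hidden in the main story and counts inline embedded objects such as Excel workbooks, leaving the decision about them to a person. Hidden text is displayed first on the assumption that Find may skip it otherwise; floating (non-inline) objects are not counted.

```vba
' Hypothetical macro: delete hidden text in the main story and report
' inline embedded OLE objects (e.g. Excel workbooks).
Sub RemoveHiddenTextAndListEmbeds()
    Dim shp As InlineShape
    Dim n As Long

    ActiveWindow.View.ShowHiddenText = True   ' so Find can see hidden runs

    ' Find any text formatted as hidden and replace it with nothing.
    With ActiveDocument.Content.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Text = ""
        .Replacement.Text = ""
        .Font.Hidden = True
        .Format = True
        .Execute Replace:=wdReplaceAll
    End With

    For Each shp In ActiveDocument.InlineShapes
        If shp.Type = wdInlineShapeEmbeddedOLEObject Then n = n + 1
    Next shp
    If n > 0 Then MsgBox n & " embedded object(s) found.", vbExclamation
End Sub
```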

    The easiest way to see most of this "metadata" is to open any document using the "recover text from any file" method.

    These are the things we were told to watch out for at my office and this is why we use a "metadata cleaner" that intercepts our "send to" before emailing any documents from our firm.

    In a legal environment where files are traded back and forth between co- and opposing counsel, as well as clients who might be dismayed (not to mention angry) if they realized that "their" documents may actually have been cloned from pre-existing similar documents, it's become standard procedure for us to "clean" our documents.
