  1. #1
    New Lounger
    Join Date
    May 2002
    Location
    New York, New York, USA
    Posts
    22
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Hyphenation Logic (if any?) (Word 2002)

    I'm studying Word's hyphenation, and find there is very little documentation about how it works behind the curtain.

    When hyphenation is turned on, how does Word make the decision about where to place the hyphen? A few things can be controlled (the width of the zone, whether or not to hyphenate a word in caps, and the number of successive hyphens), but apart from those, how does Word know where to put the break?

    Does it refer to the spelling dictionary (which could contain recommended hyphenation points) or perhaps there are some rudimentary algorithms (allow a hyphen after "re", break only next to a vowel, etc.)?

    On older systems such as the GEM version of Ventura Publisher and XyWrite, for example, the quality of the hyphenation logic was a major feature that attracted professional users. But now the typical document that one sees has hyphenation turned off, and when it is on, the quality of the breaks is often poor. Are there third-party add-ins that address hyphenation?

  2. #2
    Plutonium Lounger
    Join Date
    Mar 2002
    Posts
    84,353
    Thanks
    0
    Thanked 29 Times in 29 Posts

    Re: Hyphenation Logic (if any?) (Word 2002)

    As far as I know, hyphenation uses two files per installed language. For English, they are MSHYPH2.DLL and MSHY2_EN.LEX; presumably, the first contains the hyphenation algorithm and the second a list of hyphenated words.

    There used to be third-party software for spell checking etc. in older versions of Word, but I'm not aware of any for Word 2002.

  3. #3
    5 Star Lounger
    Join Date
    May 2001
    Location
    Stuttgart, Baden-W, Germany
    Posts
    931
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: Hyphenation Logic (if any?) (Word 2002)

    If you'd like more DTP-like hyphenation options (Don't hyphenate words with fewer than # letters, Don't hyphenate first # letters/last # letters, ...), you might want to tell MS about it [grin]

    Word doesn't use a full hyphenation and spelling dictionary (as XyWrite used to), but a comparatively small dictionary plus heuristic rules. This works pretty well for English, but not so well for a language like German, where you can combine words to form new words. Many times, the suggestions that Word makes for misspelled words are funny but useless in German.

    Word's hyphenation algorithm gets confused pretty easily. For example, it won't hyphenate any word following a slash /.
    You pretty often have to insert optional hyphens (Ctrl+Hyphen).

    BTW, Word usually puts everything that doesn't fit on the next line. With full justification, it only expands spaces by default.
    If you want Word to compress spaces a bit, too, to make the text fit in the line, you can check "Tools > Options > Compatibility > Do full justification like WP 6.0 for Windows".

    [cheers] Klaus

  4. #4
    New Lounger
    Join Date
    May 2002
    Location
    New York, New York, USA
    Posts
    22
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: Hyphenation Logic (if any?) (Word 2002)

    Hans and Klaus, thanks for your input, which was very useful. MSHY2_EN.LEX is a binary file, but there is some clear text at the top showing a copyright by Inso Corporation 1996, "2.91.8125 English RAM/Disk pref hyphendict.ics". It's a 161K file. According to some Google info, Inso used to be Infosoft and, before that, the software division of Houghton Mifflin. www.inso.com now resolves to stellent.com, a content management firm. There is no indication on their web site that they still do dictionaries.

    I'm looking at my old Ventura Publisher documentation (1989). Ventura came with algorithm-based hyphenation plus a small dictionary to handle special cases where the algorithm failed. "Since the algorithm fails on only a few words, only a few words are actually looked up in the dictionary. The result is very fast hyphenation." And a bit further on: "The standard algorithm used for American English is very fast and quite accurate, but it may miss some hyphenation opportunities." Ventura also had the Edco hyphenation dictionary, which could be installed if desired. It "provides a complete 130,000 word English hyphenation dictionary which has been used by typographic professionals for over nine years" and was quite good. You could add words to it, and it provided extensive control over hyphenation points.
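
    The "algorithm plus small exception dictionary" design described in the Ventura documentation can be sketched roughly as follows. This is a toy model in Python, not Ventura's or Word's actual logic: the rule (break between two consonants that sit between vowels), the `EXCEPTIONS` entries, and all function names are assumptions made for illustration only.

```python
# Hypothetical exception list: words where the toy rule below fails.
EXCEPTIONS = {
    "mother": ["moth", "er"],   # the rule would give "mot-her"
    "record": ["rec", "ord"],   # the rule finds no break at all
}

VOWELS = set("aeiouy")  # assumption: treat "y" as a vowel


def rule_based_breaks(word):
    """Toy rule: split between two consonants flanked by vowels (VC-CV)."""
    w = word.lower()
    parts, start = [], 0
    for i in range(1, len(w) - 2):
        if (w[i - 1] in VOWELS and w[i] not in VOWELS
                and w[i + 1] not in VOWELS and w[i + 2] in VOWELS):
            parts.append(word[start:i + 1])
            start = i + 1
    parts.append(word[start:])
    return parts


def hyphenate(word):
    """Consult the small exception dictionary first, then fall back to the rule."""
    exception = EXCEPTIONS.get(word.lower())
    if exception is not None:
        return exception
    return rule_based_breaks(word)


print("-".join(hyphenate("winter")))
print("-".join(hyphenate("mother")))
```

    The point of the design is that the dictionary stays small: it only needs entries for words the rule gets wrong or misses, which is presumably why Ventura could describe the combination as both fast and fairly accurate.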

    So why, 15 years later, does Word offer such poor quality hyphenation? Mainly I think because Word has lowered typographic expectations to the point where hyphenation is considered a nuisance.

  5. #5
    Uranium Lounger
    Join Date
    Dec 2000
    Location
    Salt Lake City, Utah, USA
    Posts
    9,508
    Thanks
    0
    Thanked 6 Times in 6 Posts

    Re: Hyphenation Logic (if any?) (Word 2002)

    It does seem that Word's English-language line-break hyphenation doesn't follow the general precepts, but those precepts are pretty complex:

    http://www.hyphenologist.co.uk/book/BOOK-ED3.HTM#hbc
    http://www.xs4all.nl/~talo/talo/e_rules.html
    http://www.ruddle.com/writing_grammar.html

    One nuisance is that Word will attempt to break Proper case words; since in English these are usually names, this is incorrect.

    Does anyone have or know of a custom add-in or algorithm for line-break hyphenation in Word?
    -John ... I float in liquid gardens
    UTC -7±DS

  6. #6
    5 Star Lounger
    Join Date
    May 2001
    Location
    Stuttgart, Baden-W, Germany
    Posts
    931
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: Hyphenation Logic (if any?) (Word 2002)

    A few years ago, we used a program called Carlos (by "Text & Satz") to add all possible hyphenation points as optional hyphens.
    But it was for German texts only, cost quite a bit, and the book publishers stopped paying us for the additional work. So we stopped using it.

    I don't know if there is an English program that does the same... a quick Google search didn't turn up anything.

    [cheers] Klaus

  7. #7
    Uranium Lounger
    Join Date
    Dec 2000
    Location
    Salt Lake City, Utah, USA
    Posts
    9,508
    Thanks
    0
    Thanked 6 Times in 6 Posts

    Re: Hyphenation Logic (if any?) (Word 2002)

    Sometime in '95 or '96 I did some work on this, and the issue is recurring for my wife in her present job. As some of the links noted above make clear, English spelling is sufficiently unstructured that there are more exceptions than rules for line-break hyphenation. However, the first general rule is that breaks should be made first at any suffix, then at any prefix.

    Since I have never been able to listen to common sense and reason, and in the face of my complete ignorance of Word VBA (for that matter, my user knowledge of Word is shallow), I wrote the attached code to break many English words longer than 8 characters at suffixes and prefixes. It has a success rate of maybe 60%, with probably 10% incorrect hyphenations. By design it works on the paragraph containing the insertion point. I didn't bother with two-letter suffixes; they are a huge list in themselves.
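
    The attachment is not reproduced here, so the following is only a minimal sketch of the suffix-then-prefix idea, written in Python rather than Word VBA. The affix lists, the `MIN_STEM` limit, and the function name are hypothetical, and the Unicode soft hyphen (U+00AD) merely stands in for Word's optional hyphen.

```python
SOFT_HYPHEN = "\u00ad"  # stand-in for Word's optional hyphen

# Hypothetical, deliberately short affix lists; real lists would be far longer.
SUFFIXES = ("ation", "ment", "ness", "able", "ing", "ful")
PREFIXES = ("inter", "over", "under", "trans", "pre", "non")

MIN_LENGTH = 9  # "longer than 8 characters", as in the post
MIN_STEM = 3    # assumption: leave at least 3 letters beside each break


def mark_breaks(word, mark=SOFT_HYPHEN):
    """Insert a break mark before a known suffix, then after a known prefix."""
    if len(word) < MIN_LENGTH or not word.isalpha():
        return word
    head, tail = "", ""
    # Suffix first: take the first listed suffix that matches.
    for suffix in SUFFIXES:
        if word.lower().endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            tail = mark + word[-len(suffix):]
            word = word[:-len(suffix)]
            break
    # Then the prefix, tested against whatever remains of the word.
    for prefix in PREFIXES:
        if word.lower().startswith(prefix) and len(word) - len(prefix) >= MIN_STEM:
            head = word[:len(prefix)] + mark
            word = word[len(prefix):]
            break
    return head + word + tail


# A visible hyphen is passed as the mark so the result can be read.
print(mark_breaks("government", "-"))
print(mark_breaks("overstatement", "-"))
```

    Pure affix matching like this knows nothing about syllables, which is consistent with the modest success rate and the share of wrong breaks mentioned above.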

    One problem that I haven't been able to figure out is just how Word treats a single-word Range at the end of a sentence. As written, the code won't hyphenate a word at the end of a sentence. This should be changed so that it works on every word at the end of a sentence EXCEPT the last word in the paragraph. How can this be done?

    Any and all kind and gentle feedback invited, especially ideas to make it run faster.
    -John ... I float in liquid gardens
    UTC -7±DS

  8. #8
    Uranium Lounger
    Join Date
    Dec 2000
    Location
    Salt Lake City, Utah, USA
    Posts
    9,508
    Thanks
    0
    Thanked 6 Times in 6 Posts

    Re: Hyphenation Logic (if any?) (Word 2002)

    Can anybody point me in the right direction on this question:

    ... as written the code won't hyphenate a word at the end of a sentence: this should be changed so it works on every word at the end of the sentence EXCEPT the last word in the paragraph. How can this be done?
    -John ... I float in liquid gardens
    UTC -7±DS

  9. #9
    Plutonium Lounger
    Join Date
    Nov 2001
    Posts
    10,550
    Thanks
    0
    Thanked 7 Times in 7 Posts

    Re: Hyphenation Logic (if any?) (Word 2002)

    I don't see the "end of sentence" problem, but as the code is written it tends to skip over words "at random" because...

    When you replace a word with the same word containing an optional hyphen, Word sees the result as TWO words (or more, if there was both a prefix and a suffix). This causes the "For Each wrd In rngP.Words" loop to miss the next word (or two). I think you need to change this construction to step backwards through the paragraph (For i = rngP.Words.Count To 1 Step -1 : With rngP.Words(i) ... etc.)
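
    Why stepping backwards helps can be shown with a small model. This is a Python sketch that treats the paragraph as a plain list of tokens; it does not reproduce Word's own enumerator, and `split_word` is a hypothetical stand-in for "replace the word with a hyphenated form that Word then counts as two words".

```python
def split_word(token):
    """Stand-in for a replacement that turns one long word into two items."""
    if len(token) > 8:
        mid = len(token) // 2
        return [token[:mid], token[mid:]]
    return [token]


def forward(tokens):
    """Walk from the front with the loop bound fixed before the first change."""
    tokens = list(tokens)
    visited = []
    for i in range(len(tokens)):          # bound evaluated once, up front
        visited.append(tokens[i])
        tokens[i:i + 1] = split_word(tokens[i])   # later items shift right
    return visited


def backward(tokens):
    """Walk from the back: a split at index i never moves items below i."""
    tokens = list(tokens)
    visited = []
    for i in range(len(tokens) - 1, -1, -1):
        visited.append(tokens[i])
        tokens[i:i + 1] = split_word(tokens[i])
    return visited[::-1]


words = ["hyphenation", "is", "hard"]
print(forward(words))   # revisits half of the first word and never reaches "hard"
print(backward(words))  # visits each original word exactly once
```

    In the forward walk every split pushes the remaining words one position further along, so the loop runs out before it reaches them; in the backward walk the positions still to be visited are untouched by each split.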

    StuartR

  10. #10
    Plutonium Lounger
    Join Date
    Mar 2002
    Posts
    84,353
    Thanks
    0
    Thanked 29 Times in 29 Posts

    Re: Hyphenation Logic (if any?) (Word 2002)

    On my system, the macro as written by you systematically skips the last word of every second sentence. Looping backwards as Stuart suggested works better, but still not correctly. The following modifications to your code work for me, but they do a lot of redundant work and are much slower:

    Below the line that sets rngP, add

    rngP.MoveEnd Unit:=wdWord, Count:=-3

    This moves the end of the paragraph 3 words back: the paragraph mark, the period, and the last real word (that was to be skipped). Change the line that executes the Find instruction to

    rngP.Find.Execute FindText:=Trim(wrd.Text), MatchCase:=True, Wrap:=wdFindStop, ReplaceWith:=strText, _
    Format:=False, Replace:=wdReplaceAll

  11. #11
    Uranium Lounger
    Join Date
    Dec 2000
    Location
    Salt Lake City, Utah, USA
    Posts
    9,508
    Thanks
    0
    Thanked 6 Times in 6 Posts

    Re: Hyphenation Logic (if any?) (Word 2002)

    Thanks, Hans and Stuart. I modified the loop to step backwards through the words as Stuart suggested, and Hans, I skipped the last word in the paragraph the easy way:

    For intThisWordNumber = (intWordCountInPara - 1) To 1 Step -1
    Set wrd = rngP.Words(intThisWordNumber)

    It does seem a little slower, but tolerable.
    -John ... I float in liquid gardens
    UTC -7±DS

  12. #12
    Plutonium Lounger
    Join Date
    Mar 2002
    Posts
    84,353
    Thanks
    0
    Thanked 29 Times in 29 Posts

    Re: Hyphenation Logic (if any?) (Word 2002)

    If that works for you, fine (it produced inconsistent results when I tried it.)

  13. #13
    Uranium Lounger
    Join Date
    Dec 2000
    Location
    Salt Lake City, Utah, USA
    Posts
    9,508
    Thanks
    0
    Thanked 6 Times in 6 Posts

    Re: Hyphenation Logic (if any?) (Word 2002)

    Hans, it seems to; what kind of inconsistencies were you finding?
    -John ... I float in liquid gardens
    UTC -7±DS

  14. #14
    Plutonium Lounger
    Join Date
    Mar 2002
    Posts
    84,353
    Thanks
    0
    Thanked 29 Times in 29 Posts

    Re: Hyphenation Logic (if any?) (Word 2002)

    For intThisWordNumber = (intWordCountInPara - N) To 1 Step -1

    Whatever value I selected for N, the last word in the paragraph was sometimes hyphenated. Shrinking the range before looping worked better for me.
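
    One possible reason a fixed offset N behaves inconsistently (an assumption, not something confirmed in this thread) is that the number of non-word items after the last real word varies: a paragraph mark, a period, sometimes a closing quote as well. The Python sketch below models that idea only; the tokenizer is a rough stand-in for how the Words collection counts punctuation as separate items, "¶" stands in for the paragraph mark, and the function names are hypothetical.

```python
import re


def tokenize(paragraph):
    """Rough model: runs of letters are words, any other visible character is its own item."""
    return re.findall(r"[A-Za-z]+|[^A-Za-z\s]", paragraph)


def tokens_to_process(paragraph):
    """Drop trailing non-word items, then drop the last real word."""
    tokens = tokenize(paragraph)
    end = len(tokens)
    # Skip the paragraph mark, closing punctuation, quotes and so on.
    while end > 0 and not tokens[end - 1][0].isalpha():
        end -= 1
    # Skip the last real word of the paragraph.
    if end > 0:
        end -= 1
    return tokens[:end]


print(tokens_to_process("Word breaks lines.¶"))
print(tokens_to_process('He said "stop."¶'))
```

    In the first paragraph three trailing items are dropped and in the second four, so no single value of N would cover both.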

  15. #15
    Uranium Lounger
    Join Date
    Dec 2000
    Location
    Salt Lake City, Utah, USA
    Posts
    9,508
    Thanks
    0
    Thanked 6 Times in 6 Posts

    Re: Hyphenation Logic (if any?) (Word 2002)

    Thanks, I'll watch for the problem. I don't suppose it will hurt me to use a belt AND suspenders though ... [grin]
    -John ... I float in liquid gardens
    UTC -7±DS
