  1. #1
    Platinum Lounger
    Join Date: Feb 2001
    Location: Yilgarn region of Toronto, Ontario

    Soundex (crude method) (word97/sr2)

    I wrote a crude Soundex method (attached in PKZip format) based on a web page I found.

    Crude because I haven't spent time making it adaptable.

    While the original Soundex delivers a letter and three digits, I'd rather have a function that lets me vary the length of the returned string, and so on.
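    A minimal sketch of such a function, written in Python rather than Word VBA and not the attached macro. It follows the standard American Soundex rules (keep the first letter; code B F P V as 1, C G J K Q S X Z as 2, D T as 3, L as 4, M N as 5, R as 6; drop vowels and Y; collapse adjacent identical codes, including across H and W and including the first letter's own code) and adds a hypothetical `length` parameter so the key can be longer or shorter than four characters.

```python
# Letter-to-digit table for American Soundex; vowels, Y, H and W have no code.
_CODES = {letter: digit
          for digit, group in (("1", "BFPV"), ("2", "CGJKQSXZ"), ("3", "DT"),
                               ("4", "L"), ("5", "MN"), ("6", "R"))
          for letter in group}


def soundex(name: str, length: int = 4) -> str:
    """Return a Soundex key of `length` characters: first letter plus digits.

    length=4 gives the classic letter-and-three-digits code; a larger value
    keeps more of the name's consonant pattern, padded with zeros if short.
    (`length` is an illustrative extension, not part of standard Soundex.)
    """
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    prev = _CODES.get(first, "")  # the first letter's code still blocks a repeat
    for c in letters[1:]:
        code = _CODES.get(c)
        if code is None:
            if c not in "HW":     # a vowel (or Y) separates two equal codes
                prev = ""
            continue              # H and W are skipped without separating
        if code != prev:
            digits.append(code)
        prev = code
    return (first + "".join(digits) + "0" * length)[:length]
```

    With the default `length=4`, "Robert" and "Rupert" both give R163, while `soundex("Robert", 6)` pads the same pattern out to R16300.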

  2. #2
    Super Moderator jscher2000
    Join Date: Feb 2001
    Location: Silicon Valley, USA

    Re: Soundex (crude method) (word97/sr2)

    For anyone not familiar with Soundex indexing, the Census Bureau has posted a brief explanation: The Soundex Indexing System. Knowing what Soundex is turned out to be useful in a lawsuit I worked on; maybe it will help one of you someday.

    As for creating a system that uses different strings, I suppose one could come up with refinements to the system, but why? The system was developed to compensate for poor spelling skills and fallible hearing, while still giving acceptably speedy results. Unless you want to use Soundex in a domain of words with long and highly tortured prefixes, in which the first letter and next three consonants return too many matches, I don't think the extra work would be justified.

  3. #3
    Plutonium Lounger
    Join Date: Dec 2000
    Location: Sacramento, California, USA

    Re: Soundex (crude method) (word97/sr2)

    I don't like the original algorithm *because* it uses the first letter. Since some letters sound similar and can easily be misspelled, I was never satisfied with the first-letter rule. I wasn't happy with some of its other limitations either, so I built some modified Soundex routines for special purposes, such as returning a Soundex value for each word in a phrase. I haven't looked at this code in years (literally), but I'm attaching it in case anyone is curious.
    Charlotte
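    Charlotte's routines aren't reproduced here, but the first-letter complaint can be illustrated with a hypothetical variant (the names `digits_only_code` and `phrase_codes` are illustrative only) that codes the first letter like any other consonant and returns one code per word of a phrase. "Catherine" and "Katherine" then share a code, which classic Soundex (C365 vs. K365) would not give them; the price is more false matches, since a vowel-initial word loses its first letter altogether.

```python
import re

# Same digit groups as classic Soundex.
_CODES = {letter: digit
          for digit, group in (("1", "BFPV"), ("2", "CGJKQSXZ"), ("3", "DT"),
                               ("4", "L"), ("5", "MN"), ("6", "R"))
          for letter in group}


def digits_only_code(word: str, length: int = 4) -> str:
    """Soundex-style code in which the first letter is coded like the rest.

    Hypothetical variant: no letter is kept, so C/K or F/PH spellings of the
    same sound at the start of a word produce the same code.
    """
    digits = []
    prev = ""
    for c in word.upper():
        code = _CODES.get(c)
        if code is None:
            if c not in "HW":   # vowels, Y and other characters separate codes
                prev = ""
            continue            # H and W are skipped without separating
        if code != prev:        # collapse runs of the same digit
            digits.append(code)
        prev = code
    return ("".join(digits) + "0" * length)[:length]


def phrase_codes(phrase: str, length: int = 4) -> list[str]:
    """Return one code per alphabetic word in `phrase`."""
    return [digits_only_code(w, length) for w in re.findall(r"[A-Za-z]+", phrase)]
```

    For example, `phrase_codes("Catherine Smith")` yields one code for each name, and "Phillips" and "Fillips" come out identical.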

  4. #4
    Platinum Lounger
    Join Date: Feb 2001
    Location: Yilgarn region of Toronto, Ontario

    Re: Soundex (crude method) (word97/sr2)

    > first letter and next three consonants

    This was my impression of Soundex for over twenty years. I heard about it in the early 70s, but it wasn't until yesterday that I thought to check an authoritative source. My stale thought was "first letter and ignore the vowels", giving a variable-length string.


    > refinements to the system, but why?

    I'm looking at a 13,000 record name-and-address file for the Ukrainian population right now. The original Soundex may not deal with the proliferation of consonants. The original designer fudged a key of the first six letters of the surname and then a two-digit serial sequence. That cast the onus on the data-entry operator to devise a unique key ("Let's see, I think I've used as far up as GREAVE06, so this one should be (counts on fingers) ....").
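    For comparison, the old six-letters-plus-serial key is easy to automate, which at least removes the finger-counting. A sketch, assuming serials run from 01 to 99 and that surnames shorter than six letters are not padded (both are assumptions; `next_legacy_key` is a hypothetical name):

```python
def next_legacy_key(surname: str, existing_keys: set[str]) -> str:
    """Return the first unused key of the form SURNAME-PREFIX + two-digit serial.

    Assumptions: the prefix is the first six letters of the surname, shorter
    surnames are not padded, and serials run from 01 to 99.
    """
    prefix = "".join(c for c in surname.upper() if c.isalpha())[:6]
    for serial in range(1, 100):
        key = f"{prefix}{serial:02d}"
        if key not in existing_keys:
            return key
    raise ValueError(f"all 99 serial numbers for {prefix!r} are in use")
```

    With GREAVE01 to GREAVE06 already taken, `next_legacy_key("Greaves", keys)` returns GREAVE07. The key is unique only because of the serial; it still says nothing about how the name sounds.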

    I have found that if I combine "surname" and "given name" and generate a string of 10 characters, I obtain unique keys in all but 4% of the cases, most of which appear to be genuine duplicates (getting rid of 500 duplicate records isn't a bad aim, either!).
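    A figure like that 4% can be measured with a small helper that counts how many records share their key with at least one other record. This is a sketch: `duplicate_share` is a hypothetical name, and `stand_in_key` (the first ten letters of surname plus given name) is only a placeholder for whatever 10-character Soundex-style key is actually used.

```python
from collections import Counter
from typing import Callable, Iterable, Tuple

Record = Tuple[str, str]  # (surname, given name)


def duplicate_share(records: Iterable[Record],
                    key_fn: Callable[[str, str], str]) -> float:
    """Fraction of records whose key is shared with at least one other record."""
    keys = [key_fn(surname, given) for surname, given in records]
    if not keys:
        return 0.0
    counts = Counter(keys)
    shared = sum(1 for k in keys if counts[k] > 1)
    return shared / len(keys)


def stand_in_key(surname: str, given: str) -> str:
    """Placeholder key: first 10 letters of surname + given name, upper-cased."""
    return (surname + given).upper()[:10]


records = [("Kowalenko", "Ivan"), ("Kowalenko", "Ivan"),
           ("Kowalenko", "Olga"), ("Shevchenko", "Taras")]
print(duplicate_share(records, stand_in_key))  # 0.5: the two identical Ivans
```

    "Shevchenko" already fills all ten characters on its own, so every Shevchenko would collide under this stand-in key, which shows how a fixed key length can squeeze out the given name entirely.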

    I'm also thinking that I can do some look-ahead searching of the database, so that as the user begins to type in a surname I can produce a drop-down list of best matches. Ideal for things like Country or City codes.
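    One way to sketch the look-ahead list, assuming the names or codes fit in memory: keep them sorted and use binary search (Python's standard `bisect` module) to find the first entry at or after the typed prefix, then read forward while entries still start with it. `LookAhead` is a hypothetical class name. This is plain prefix matching, so it suits fixed lists such as Country or City codes better than misspelled surnames.

```python
from bisect import bisect_left
from typing import Iterable


class LookAhead:
    """Sorted list that returns the entries starting with what has been typed."""

    def __init__(self, names: Iterable[str]):
        # Sort (upper-cased, original) pairs so matching ignores case.
        self._entries = sorted((n.upper(), n) for n in names)
        self._keys = [key for key, _ in self._entries]

    def matches(self, typed: str, limit: int = 10) -> list[str]:
        """Return up to `limit` entries whose text starts with `typed`."""
        prefix = typed.upper()
        i = bisect_left(self._keys, prefix)  # first key >= prefix
        result = []
        # All keys sharing the prefix are contiguous in sorted order.
        while (i < len(self._keys) and self._keys[i].startswith(prefix)
               and len(result) < limit):
            result.append(self._entries[i][1])
            i += 1
        return result


cities = LookAhead(["Toronto", "Tokyo", "Thunder Bay", "Timmins", "Ottawa"])
print(cities.matches("to"))  # ['Tokyo', 'Toronto']
```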

    Someone is about to tell me that that feature is built into WordVBA, and that I just haven't read far enough, right? (grin!)

  5. #5
    Super Moderator jscher2000
    Join Date: Feb 2001
    Location: Silicon Valley, USA

    Re: Soundex (crude method) (word97/sr2)

    Seems the intention of Soundex is the opposite of unique keys. Any reason not to give up on lastname_firstname_etc. as keys and just number your database items?

  6. #6
    Plutonium Lounger
    Join Date: Dec 2000
    Location: Sacramento, California, USA

    Re: Soundex (crude method) (word97/sr2)

    Soundex keys should never be unique. They're intended for searching near matches, not for uniquely identifying a record.
    Charlotte
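    To make the point concrete, here is a sketch of a near-match lookup in which each record keeps its own number as the unique identifier and the Soundex code is only a bucket label shared by many records. `SoundexIndex` is a hypothetical name, and the embedded `soundex` function is a compact version of the standard four-character code.

```python
from collections import defaultdict

# American Soundex digit groups; vowels, Y, H and W have no code.
_CODES = {ch: d for d, group in (("1", "BFPV"), ("2", "CGJKQSXZ"), ("3", "DT"),
                                 ("4", "L"), ("5", "MN"), ("6", "R"))
          for ch in group}


def soundex(name: str) -> str:
    """Classic four-character American Soundex code."""
    name = "".join(c for c in name.upper() if "A" <= c <= "Z")
    if not name:
        return ""
    out, prev = [name[0]], _CODES.get(name[0], "")
    for c in name[1:]:
        code = _CODES.get(c)
        if code is None:
            if c not in "HW":   # vowels separate equal codes; H and W do not
                prev = ""
            continue
        if code != prev:
            out.append(code)
        prev = code
    return ("".join(out) + "000")[:4]


class SoundexIndex:
    """Maps each Soundex code to every record that produces it (keys are NOT unique)."""

    def __init__(self):
        self._buckets = defaultdict(list)

    def add(self, record_id: int, surname: str) -> None:
        # The record number stays the unique key; the code just groups records.
        self._buckets[soundex(surname)].append((record_id, surname))

    def near_matches(self, surname: str) -> list:
        return list(self._buckets.get(soundex(surname), []))


index = SoundexIndex()
for number, surname in [(1, "Robert"), (2, "Rupert"), (3, "Rubin")]:
    index.add(number, surname)
print(index.near_matches("Robbert"))  # [(1, 'Robert'), (2, 'Rupert')]
```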

  7. #7
    Platinum Lounger
    Join Date: Feb 2001
    Location: Yilgarn region of Toronto, Ontario

    Re: Soundex (crude method) (word97/sr2)

    > Any reason not to give up

    Why do I always feel as if my King has been cornered by two rooks, a queen and at least one knight? (grin!).

    You might be right (even bigger grin!). I guess I just got carried away by actually reading an article after all these years.

    Using the full names ought to give the most unique key there is, since any transformation of that string of characters can only lose information, right?

    Using both names would make an impossibly long key, so Soundex would be a trade-off between uniqueness and length of key.

    The only other reason I can now think of for employing Soundex is to build a list of potential matches as the user types in a name. I've not thought this through, but it seems that the technique used in the VBE when I type the period after an object (the VBE scrolls to the best match so far in a list) would be nice for the user.
