  1. #1

    'Phonetic' matching utility (2000sr1a)

    Dear Woody's gurus,

    I am looking for a "plug-in" search facility that will give ranked hits based on the number of matching letters between short strings (50 characters max), and preferably highlight the differences. I'm interested in strings that differ by 5 or 6 characters at most, but (and here's the tricky bit) some may be "misaligned". For instance, I'd like "th
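Ranking by the number of differing letters while tolerating "misaligned" strings (an added or dropped letter that shifts everything after it) is what edit distance, also called Levenshtein distance, measures. The Python sketch below is an illustration under assumptions, not an Access plug-in: the function names are hypothetical, `max_diff=6` simply mirrors the "5 or 6 characters" above, and the highlighting uses the standard `difflib` module.

```python
import difflib


def levenshtein(a: str, b: str) -> int:
    """Edit distance: the fewest single-letter insertions, deletions or
    substitutions that turn a into b, so one extra or missing letter
    costs 1 instead of misaligning the rest of the string."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute or keep
        prev = curr
    return prev[-1]


def highlight(a: str, b: str) -> str:
    """Show b with the parts that differ from a in [brackets];
    letters present only in a are shown as [-x]."""
    out = []
    for tag, i1, i2, j1, j2 in difflib.SequenceMatcher(None, a, b).get_opcodes():
        if tag == "equal":
            out.append(b[j1:j2])
        elif tag == "delete":
            out.append("[-" + a[i1:i2] + "]")
        else:  # "replace" or "insert": show the new letters from b
            out.append("[" + b[j1:j2] + "]")
    return "".join(out)


def ranked_hits(query: str, candidates: list[str], max_diff: int = 6) -> list[tuple[int, str]]:
    """(distance, candidate) pairs no more than max_diff edits away, closest first."""
    scored = [(levenshtein(query.lower(), c.lower()), c) for c in candidates]
    return sorted(hit for hit in scored if hit[0] <= max_diff)
```

For example, `ranked_hits("Peterson", ["Patterson", "Peterson", "Smith"])` returns `[(0, 'Peterson'), (2, 'Patterson')]`; Smith is dropped as more than six edits away, and `highlight("abc", "abxc")` gives `ab[x]c`.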

  2. #2

    Re: 'Phonetic' matching utility (2000sr1a)

    I'm not sure exactly what you're looking for, but you can use the Split function in Access 2000 to parse each string into an array of words, then compare the elements of that array with the elements of the other string's array, split the same way. Since you're dealing with the individual pieces, you can tally the differences for each grouping and return that tally as the result. Does that help?
    Charlotte
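Charlotte's split-and-compare idea can be sketched in Python, where `str.split()` plays roughly the role of VBA's `Split` (it also collapses repeated spaces, which `Split` does not). The function name and the position-by-position letter count are assumptions made for illustration.

```python
from itertools import zip_longest


def word_differences(a: str, b: str) -> list[int]:
    """Split both strings into words and, for each pair of words in the
    same position, count the letters that differ position by position.
    A word with no partner counts in full."""
    tallies = []
    for word_a, word_b in zip_longest(a.lower().split(), b.lower().split(), fillvalue=""):
        tallies.append(sum(x != y for x, y in zip_longest(word_a, word_b, fillvalue="")))
    return tallies
```

Running it shows the weakness of a positional tally: `word_differences("Charles Patterson", "Charles Peterson")` gives `[0, 7]` even though only two letters really differ, because the dropped "t" shifts every later letter. A middle name does the same to whole words: against "Charles Henry Patterson" the result is `[0, 9, 9]`. Comparing each word pair by edit distance instead avoids the first problem.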

  3. #3

    Re: 'Phonetic' matching utility (2000sr1a)

    Dear Charlotte,

    Maybe you could explain the array approach a little more and I might get it. I suspect it will be tricky, though, because each component of the name would need to be tested against the 30,000 names in the database, and I would have to split each of those names as well.

    Maybe I can give some examples. If I enter the name Charles Patterson, and the following names are in the master name list, I would want them to appear in roughly this order.

    1) Charles Henry Patterson
    2) Charlie Patterson
    3) Charline Patterson
    4) Charlie Mark Patterson
    5) Charles Potterton
    6) Charles Peterson
    7) Charlie Potterton

    Do you get my drift?

    Thanks,

    Mark
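On the worry about splitting 30,000 names for every search: the master list only needs to be split once, up front, and each search then compares pre-split tokens. The Python sketch below uses a hypothetical scoring rule, not a known method: first name against first word and surname against last word, each by edit distance, with middle names used only as a tie-break.

```python
def levenshtein(a: str, b: str) -> int:
    """Fewest single-letter insertions, deletions or substitutions from a to b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def build_index(names: list[str]) -> list[tuple[str, list[str]]]:
    """Split every master-list name once; searches then reuse the tokens."""
    return [(name, name.lower().split()) for name in names]


def rank(query: str, index: list[tuple[str, list[str]]], limit: int = 10) -> list[str]:
    """Score = edit distance of first names + edit distance of surnames.
    Middle names are ignored except as a tie-break (fewer extra names first)."""
    q = query.lower().split()
    if not q:
        return []
    scored = []
    for name, tokens in index:
        if not tokens:
            continue  # skip blank entries in the master list
        distance = levenshtein(q[0], tokens[0]) + levenshtein(q[-1], tokens[-1])
        extra_names = max(len(tokens) - len(q), 0)
        scored.append((distance, extra_names, name))
    scored.sort()
    return [name for _, _, name in scored[:limit]]
```

With the seven names above and the query "Charles Patterson", this ranks Charles Henry Patterson first (distance 0) and Charlie Potterton last (distance 4). The middle of the list does not follow Mark's order exactly; for instance, Charline Patterson (distance 3) lands after Charles Peterson (distance 2), so the weights would need tuning against real examples.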

  4. #4

    Re: 'Phonetic' matching utility (2000sr1a)

    Mark
    This may be right off the wall, but have you considered using the Soundex algorithm? Soundex assigns each word a code such that similar-sounding words get the same code. It *may* allow you to match the various permutations of the names.

    Here's a link to a VB implementation that I found by searching Google for "soundex visual basic":
    http://www.developersdomain.com/vb/articles/soundex.htm

    In your examples, I would guess that you would have to consider the first name and last name separately. I would probably ignore any middle name to keep the logic simple.

    HTH
    --------------------------------------------------
    Jack MacDonald
    Vancouver, Canada
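For reference, here is a minimal Python version of the standard American Soundex rules. It is a sketch written from those rules, not a translation of the VB code at the link above, and it assumes that letters outside A to Z simply get no code.

```python
# Standard American Soundex letter groups.
_GROUPS = {"BFPV": "1", "CGJKQSXZ": "2", "DT": "3", "L": "4", "MN": "5", "R": "6"}
CODES = {letter: digit for letters, digit in _GROUPS.items() for letter in letters}


def soundex(name: str) -> str:
    """First letter plus three digits. Vowels (and Y) separate repeated
    codes; H and W do not; adjacent letters with the same code count once.
    Assumption: letters outside A-Z get no code and behave like vowels."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = [letters[0]]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W are skipped without resetting prev
        code = CODES.get(ch, "")
        if code and code != prev:
            result.append(code)
            if len(result) == 4:
                break
        prev = code  # a vowel sets prev to "", so a repeated code after it counts
    return "".join(result).ljust(4, "0")
```

It shows both sides of the idea: Patterson and Peterson both code to P362, but Potterton gives P363, and Charles (C642), Charlie (C640) and Charline (C645) all differ. Two names either share a code or they don't, so Soundex on its own yields a yes/no match rather than a degree of similarity.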

  5. #5

    Re: 'Phonetic' matching utility (2000sr1a)

    You asked about matching between two strings. You didn't mention that you wanted to compare a string against an entire table, which is a whole different thing. What exactly are you trying to do? Your original post suggested you wanted to find out exactly where the strings differed, but now you're talking about ordering the records by some kind of custom sort order. Since I'm baffled by the rule behind that order (for instance, what would put Charline between two Charlies), I can't really suggest a technique for it. However, Soundex, handy as it is, won't give you the sort of graded ranking you're looking for.

    You'll have to provide more of the necessary details to get more useful answers.
    Charlotte

  6. #6

    Re: 'Phonetic' matching utility (2000sr1a)

    Sorry, the order is not a sort order per se; it is a "degree of matching" order.

    I was hoping to avoid the full explanation, but here goes (can't find the "deep breath" smiley).

    I have a list of the names and ages of all individuals in an area, compiled by community workers there. Every 3 months we see a sample of individuals from that area, partly selected and partly self-selected. When we see them, the individual's name, the head of household and the age are recorded on a form, together with some information from that visit.

    There are several ethnic groups in the area. At least one of the major groups has no written language, and its language is structured completely differently from that of the majority ethnic group, to which nearly all the workers recording the details on the forms belong (so much so that one language is tonal and the other is not). Thus the same name can be written in a number of different ways. The middle name can be very important as well, as the range of given names is quite small, and one of the ethnic groups has no surnames. However, the search would have to cope with and without middle names, as they sometimes appear, then disappear, and sometimes even change, even among individuals from the majority ethnic group. Nobody knows their date of birth, and even age can be uncertain, particularly as it depends on which calendar is being used. It's politically unacceptable to issue individuals with identity numbers, and anyway, with nearly 35,000 people in the population, it might be a bit tricky.
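Since a record is identified by the combination of name, head of household and age, one way to pull candidates is to require a compatible age and then count how many words of the name and of the head's name find a near match anywhere in the other record, regardless of position. A middle name that appears, disappears or changes then costs at most one word. In this Python sketch the `Person` layout, the one-edit word tolerance and `AGE_TOLERANCE = 3` are all hypothetical choices.

```python
from dataclasses import dataclass

AGE_TOLERANCE = 3  # hypothetical allowance for estimated ages and calendar differences


@dataclass
class Person:
    name: str  # every recorded name, space-separated (middle name optional)
    head: str  # name of the head of household
    age: int


def levenshtein(a: str, b: str) -> int:
    """Fewest single-letter insertions, deletions or substitutions from a to b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def shared_words(a: str, b: str) -> int:
    """How many words of a have a near-identical word (at most one edit)
    anywhere in b, whatever its position."""
    b_words = b.lower().split()
    return sum(any(levenshtein(w, v) <= 1 for v in b_words) for w in a.lower().split())


def candidates(sample: Person, population: list[Person]) -> list[Person]:
    """Records of a compatible age that share at least one name or
    head-of-household word with the sample, most shared words first."""
    hits = []
    for person in population:
        if abs(person.age - sample.age) > AGE_TOLERANCE:
            continue
        score = shared_words(sample.name, person.name) + shared_words(sample.head, person.head)
        if score:
            hits.append((score, person))
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return [person for _, person in hits]
```

Ordering by the number of shared words leaves the final choice to a person; records that share nothing at all, or whose ages are too far apart, are left out.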

    My task is to tie each individual in the sample to an individual in the population list. This is usually possible through a combination of the individual's name, the name of the head of household and the age. It's almost impossible to do automatically, so our current system pulls out likely candidates on the basis of the information entered and then asks the user to choose. The difficulty is pulling out all the possible candidates, and that's where I need help. At the moment I run an exact match on first name and surname only, with some modifications for common vowel substitutions and to compensate for the surname problem. It's impossible to keep up with all the possibilities, however. An additional problem is that when names are typed, diacritical marks are often, but not always, coded as an additional letter (the root of most of my letter-by-letter matching problems), so a simple typo can add or remove a letter from a word. There are also 5 more vowels than in English.
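One way to absorb the extra-letter diacritics, vowel variants and typos is to fold every name to a plain key before comparing, then allow a small edit distance between keys. In this Python sketch the two-letter codes in `EXTRA_LETTER_CODES` are hypothetical placeholders (the real list depends on how the marks are actually typed), `max_edits=1` is an assumed tolerance, and diacritics are removed with the standard `unicodedata` module.

```python
import unicodedata

# Hypothetical two-letter codes typed in place of a diacritic (e.g. "aa" for "â").
# Replace with the conventions the data-entry staff actually use.
EXTRA_LETTER_CODES = {"aa": "a", "ee": "e", "oo": "o", "uw": "u"}


def fold(name: str) -> str:
    """Lower-case, drop diacritics and tone marks, and collapse the
    hypothetical two-letter codes, so variant spellings share one key."""
    decomposed = unicodedata.normalize("NFD", name.lower())
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    for code, letter in EXTRA_LETTER_CODES.items():
        base = base.replace(code, letter)
    return base


def levenshtein(a: str, b: str) -> int:
    """Fewest single-letter insertions, deletions or substitutions from a to b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def pull_candidates(entered: str, names: list[str], max_edits: int = 1) -> list[str]:
    """Every name whose folded form is within max_edits of the folded entry,
    so a stray or missing letter from a typo still finds the record."""
    key = fold(entered)
    return [n for n in names if levenshtein(key, fold(n)) <= max_edits]
```

Because the entry and the master list are folded the same way, an over-eager folding rule only adds candidates for the user to reject; it never hides the right record behind an exact-match test. Folding also removes tone marks, which merges names that differ only by tone.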

    The Soundex algorithm is unsuitable without extensive modification. I may be able to modify it, but it would be very tricky. For instance, "thu" with no tone, with a falling tone, with a rising tone, and with a falling-then-rising tone are all different names (coming out in text as thu, thu
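Because tone marks distinguish different names, stripping them is only safe for widening the search. A possible compromise, sketched in Python with the standard `unicodedata` module (the function names are hypothetical): pull every name that matches once all marks are ignored, then list the names whose tones also match first, so the user still sees every tone variant but the exactly matching spelling is at the top.

```python
import unicodedata


def strip_marks(name: str) -> str:
    """Lower-case and remove all combining marks, tones included."""
    decomposed = unicodedata.normalize("NFD", name.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def exact(name: str) -> str:
    """Lower-case composed form: keeps tones, but treats a precomposed letter
    and the same letter typed as base + combining mark as equal."""
    return unicodedata.normalize("NFC", name.lower())


def tone_aware_matches(entered: str, names: list[str]) -> list[str]:
    """Names that match once tones are ignored; exact-tone matches first."""
    key = strip_marks(entered)
    hits = [n for n in names if strip_marks(n) == key]
    # False sorts before True, so exact matches come first; sort is stable.
    return sorted(hits, key=lambda n: exact(n) != exact(entered))
```

Using NFC for the exact comparison means the ranking is not thrown off by two different byte encodings of the same accented letter.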
