Results 1 to 6 of 6

Thread: Close Matches

  1. #1
    Bronze Lounger
    Join Date
    Jan 2001
    Location
    La Jolla, CA
    Posts
    1,468
    Thanks
    30
    Thanked 61 Times in 57 Posts

    Close Matches

    I'm about to undertake a project that will require matching first and last names from one file with another file to determine if records match.
    Two questions initially come to mind:

    1. Would a VBA program be more efficient than running brute force formulas to do the matching? If so, I'll ask for some guidance later on that.
    The files are 65K names in one and about 500K names in another.

    2. Has anyone developed an approach for "close matching"? The two files were people-generated, so typing errors could occur and "close" could also occur.
    There might be a case where Charles became Charlie or Anderson became Andersen. It seems to me at first blush that I would need another list of possible exceptions and somehow make those comparisons.

    Any and all thoughts are GREATLY appreciated.

  2. #2
    Gold Lounger Maudibe's Avatar
    Join Date
    Aug 2010
    Location
    Pa, USA
    Posts
    2,629
    Thanks
    114
    Thanked 645 Times in 589 Posts
    KW,

    How close does it have to be. You could easily match the first two letters such as John and Jon. But then Williams and Wilson would also match. How about the first three? You could also set them in combinations with parameters such as the length within a difference of 1, 75% of the characters the same, or a matching segment of at least 3 characters. It is a matter of degree of how close. I can think of ways to write code that will reference a cross-list of similar words (maintained on a hidden sheet) and a form to add new listings to it but that would take some complex coding. Something you would save for a rainy week.

    I believe Zeddy wrote some nice code a while ago to compare 2 workbooks if he wants to post. Apologies Zeddy if I am wrong. If not, I'll chime in later if no one else kicks in.

    Maud
    Last edited by Maudibe; 2013-11-01 at 22:23.

  3. #3
    Bronze Lounger
    Join Date
    Jan 2001
    Location
    La Jolla, CA
    Posts
    1,468
    Thanks
    30
    Thanked 61 Times in 57 Posts
    Thanks, Maud, for the comments.

    Would formulas using MATCH and some calculations for first n-letters of the last name be just as efficient with two large files as VBA or does VBA beat raw code by a mile?
    I've tried a small set of data with MATCH and some COUNTIFs and looking at the first few letters of the last name to get approx. hits and it works just fine, but I don't know how it'll be with first name/last name in 500k rows of going against 65K rows.

  4. #4
    Super Moderator RetiredGeek's Avatar
    Join Date
    Mar 2004
    Location
    Manning, South Carolina
    Posts
    9,433
    Thanks
    371
    Thanked 1,456 Times in 1,325 Posts
    Kevin,

    I think what you need is Soundex. Here's one article but if you search on VBA SOUNDEX you'll get plenty of hits. HTH
    May the Forces of good computing be with you!

    RG

    PowerShell & VBA Rule!

    My Systems: Desktop Specs
    Laptop Specs

  5. #5
    Bronze Lounger
    Join Date
    Jan 2001
    Location
    La Jolla, CA
    Posts
    1,468
    Thanks
    30
    Thanked 61 Times in 57 Posts
    RG, VERY interesting. Thanks for that tip. I'll have to do some experimenting with this.

  6. #6
    WS Lounge VIP
    Join Date
    Mar 2002
    Location
    Newcazzle, UK
    Posts
    2,819
    Thanks
    133
    Thanked 480 Times in 457 Posts
    Hi Kweaver

    First of all, you should be aware that just because names may exactly match, it doesn't mean it's the same person.
    Do you have any other cross-referencing data available such as date-of-birth, address details, phone number etc etc???

    RG beat me to replying with SOUNDEX.

    For the benefit of others I have attached an example file from J.Walk which demonstrates the SOUNDEX function.
    Soundex is an indexing system that translates a name into a 4-digit code consisting of one letter and three numbers. The advantage of Soundex is its ability to locate names by the way they sound, rather than by exact spelling. For example, consider the name Maris. This name has a Soundex code of M620. Other variations on this name (such as Mares, Marriss, Mariss, and Mairis) all have the same Soundex code.

    Some time ago I needed to do name cross-checking when trying to consolidate patient records.
    Hospital patient IDs were typically assigned on admission.
    If a patient had been admitted years before, they might have a different Hospital ID number.
    It is always much nicer to have all the relevant records for a particular patient tracked by a single ID number.
    Names for the same patient were often entered differently.

    Apart from checking for exact name matches, at least I had date-of-birth as another 'key' lookup value.
    You could also check first line of address etc, but that can change too when people change address.
    I even had twins! i.e. same patient names (at the same address) like Jack Smith and Jackie Smith with same date-of-birth.

    Any way, good luck with your project and post here for any further help.

    zeddy

    HI RG: same source! you are the man!
    Attached Files Attached Files

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •