Results 1 to 3 of 3
  1. #1
    3 Star Lounger
    Join Date
    Apr 2001
    Location
    Joliette, Quebec, Canada
    Posts
    290
    Thanks
    9
    Thanked 0 Times in 0 Posts
    While I can fiind duplicates entreys with the find exact duplicates with the wizard, I need to find duplicates entries cause by a slight mis-spellings
    such as Paul St-George and Paul Saint-George. I suppose I could look for duplicates based on a new field created by the first three characters the first name and the last three of characters of the last name. Does anyone have any other suggestions.

  2. #2
    2 Star Lounger
    Join Date
    Dec 2009
    Location
    Manchester, United Kingdom
    Posts
    116
    Thanks
    8
    Thanked 17 Times in 16 Posts
    *Fuzzy match

    Ultimately you need a VBA fuzzy match function and I looked for one online a while ago without success. Most of the offered solutions were pretty unstable, offering poor matches for what were visually good matches and vice-versa depending on where in the string the mismatches were located.

    I usually do this manually by sorting the table and comparing adjacent rows, but its a long process for a big table. A lot depends on the associated data - often you can use another field like date of birth and highlight entries where two people share a birthdate as likely duplicates, reducing the number of manual checks.Something like this

    query1: select birthdate as dupbirth, count(name) as dups from people
    query2: select birthdate, name from people inner join query1 on birthdate = dupbirth where dups > 1 order by birthdate, name

    I also sometimes count child records because in some datasets you find names with very few child records are duplicates. Ultimately you must check associated data because even people with the same name might be different.

  3. #3
    Super Moderator
    Join Date
    Aug 2001
    Location
    Evergreen, CO, USA
    Posts
    6,623
    Thanks
    3
    Thanked 60 Times in 60 Posts
    You might consider the use of a "soundex" approach where you code the names as numeric characters. See this explanation from http://ancestry.com. But be forewarned that it requires a good deal of VBA to implement this on an ongoing basis.
    Wendell

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •