Results 1 to 6 of 6
  1. #1
    Platinum Lounger
    Join Date
    Feb 2002
    Location
    A Magic Forest in Deepest, Darkest Kent
    Posts
    5,681
    Thanks
    0
    Thanked 1 Time in 1 Post

    Fuzzy Search? (Access/SQL)

    I have the unenviable task of trying to match data between two tables from seperate DBs. Sadly there are no defined keys to link to but this is the eventual wish of my client. The only thing that is similar but dissimilar is the address fields but this causes further complication. The address fields are in the BS7776 format (Secondary addressable object, Primary addressable object (description), Primary addressable object (number),Street,Town,Postcode) for both tables. There are approximately 200,000 properties

    The main system I am trying to match is a Geographic Information System (GIS), the other is a Tax system. The Tax system has clean data and the data is formatted to the correct BS7776 standard.

    The GIS system however has been built to assist search and because of this may vary the format the data is presented eg:

    Bromley Town Hall, 6 High Street, Bromley, Kent BR1 WE4 or

    Town Hall ,High Street, Bromley, Kent

    Note slight variance in no post code and Primary addressable object (number) missing. The addresses shown above are in a concatenated state as they do actually have fields of their own in the GIS system.

    My ultimate goal is to match the data together ( and I know it is not going to be perfect) as best as possible either by SQL or VBA, the former preferably. The data will then undergo manual testing and changed as required.

    Can anyone suggest a good approach to this conundrum?
    Jerry

  2. #2
    Plutonium Lounger
    Join Date
    Mar 2002
    Posts
    84,353
    Thanks
    0
    Thanked 29 Times in 29 Posts

    Re: Fuzzy Search? (Access/SQL)

    I assume that you mean BS7666 (there is also a BS7776, but although "Glossary of torque terminology for threaded fasteners" might be of interest for kitchen table science projects, I doubt it relates to your problem).

    You could create a series of queries
    1. <LI>Match on all relevant fields.
      <LI>Match on all relevant fields except one, e.g. post code, and require that the exception is blank in one or both of the tables. You'd have a separate query for each field that could be blank for legitimate reasons.
      <LI>Match on all relevant fields except two, e.g. number and post code, and require that the exception fields are blank in one or both of the tables. You'd have a separate query for each pair of fields that could be blank for legitimate reasons.
      <LI>You could go on, but it quickly becomes unwieldy.
    Step 1 yields the most dependable results, the results of step 2 may need to be reviewed, those of step 3 more so. You can decide which of the queries you want to use.

    If you also want to use fuzzy criteria within a field, it becomes more complicated (e.g. differences in spelling such as Bromley vs Bromly.)

  3. #3
    Platinum Lounger
    Join Date
    Feb 2002
    Location
    A Magic Forest in Deepest, Darkest Kent
    Posts
    5,681
    Thanks
    0
    Thanked 1 Time in 1 Post

    Re: Fuzzy Search? (Access/SQL)

    >"Glossary of torque terminology for threaded fasteners" <img src=/S/rofl.gif border=0 alt=rofl width=15 height=15>

    Thanks 1 and 2 seem to be a good way forward.

    I have been talking to a Developer in the other office who has had to do something similar recently. It appears he had success doing it another way using a series of replace statements in a function. Due to various faulty inputting (historic) properties could show Flat 1, Flat A, Ground Floor or Ground Floor flat etc
    His approach was to trim and replace all spaces with null in both tables in a new field to look similar to BromleyTownHallBromleyKent or GroundFloorFlat123AcaciaAvenueBromley. The replace function then checked through the string and replaced GroundFloor with GRD or basementFlat with BST in both tables fields to try and bring a basic convention to property naming.

    He then used your version 1 and 2 to make various compares and matching. It appears that there was a very high hit rate, certainly manageable for manual checking. We propose to get a temporary team in to do exeption checking and this project will alow us to see how big the task is ahead of us. Thanks
    Jerry

  4. #4
    Plutonium Lounger
    Join Date
    Dec 2000
    Location
    Sacramento, California, USA
    Posts
    16,775
    Thanks
    0
    Thanked 1 Time in 1 Post

    Re: Fuzzy Search? (Access/SQL)

    You could also use concatenation to return a string of the relevant fields for each record and strip the spaces out along the way. This requires additional pre-processing, of course, before you do the comparison, but you find matches you might otherwise miss.
    Charlotte

  5. #5
    Platinum Lounger
    Join Date
    Feb 2002
    Location
    A Magic Forest in Deepest, Darkest Kent
    Posts
    5,681
    Thanks
    0
    Thanked 1 Time in 1 Post

    Re: Fuzzy Search? (Access/SQL)

    Thanks Charlotte

    Just got back from a meeting about this with the client and had a bit of a hard discussion about it. They have eventually agreed for me to strip out all unnecessary records(there is a reference field I have not told you about), remove rogue records with incorrect reference length along with other smaller problem records.

    So ultimately I will have reduced the recordset by approximately 35,000 records and a further set of approx 50,000 records which need to be updated by the client. Having done this I have now got a relatively clean set of data. I have taken on board your idea and I think it is a good approach and along the same lines of what I was thinking, thank you
    Jerry

  6. #6
    Platinum Lounger
    Join Date
    Feb 2002
    Location
    A Magic Forest in Deepest, Darkest Kent
    Posts
    5,681
    Thanks
    0
    Thanked 1 Time in 1 Post

    Re: Fuzzy Search? (Access/SQL)

    Just to up date you and to tell you method 1 and 2 did work well with a concatenation of the field data with space stripping gave me a 97% yield after a union between different sets of data <img src=/S/thankyou.gif border=0 alt=thankyou width=40 height=15>
    Jerry

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •