Page 1 of 2 12 LastLast
Results 1 to 15 of 20
  1. #1
    2 Star Lounger
    Join Date
    Aug 2002
    Location
    Adelaide, South Australia, Australia
    Posts
    146
    Thanks
    5
    Thanked 5 Times in 5 Posts

    Query for duplicate names where at least one record's YesNo field is False

    I have a Find Duplicates query that identifies people with duplicate names in a table. When the initial set of data are imported to the table, this enables the user to check whether the same person appears twice; however, in some cases the duplicate names do in fact relate to different people, and these legitimate duplicates are flagged as True/Yes in a YesNo field.

    The table has new data appended from another source every month, and I want to identify whether any of the imported records equate directly to existing records in the database. The Find Duplicates query will pick out all duplicates, but as the list grows I would prefer not to show pairs of names from the original dataset that have already been flagged.

    However, a new record from the appended data might relate to an already-flagged original. I would therfore like to modify the Find Duplicates SQL to add the condition that at least one of the duplicate records should have the YesNo field set to False/No (if all records in a duplicate grouping are set to True/Yes, it would mean the records have already been checked and don't need to be checked again).

    Of course, some duplications will be entirely new - ie both will have their YesNo fields set to False/No.

    This is an area of SQL I haven't previously explored - any suggestions?

    (I know I can create a function to do this, but feel that an SQL solution would work faster.)

    Thanks and regards

    Alison C

  2. #2
    Administrator
    Join Date
    Jun 2010
    Location
    Portugal
    Posts
    10,354
    Thanks
    130
    Thanked 1,162 Times in 1,069 Posts
    Don't you have access to the original SQL query? It would seem to me that it would be rather easy to use that query as a starting point and add the required changes to it.

  3. #3
    2 Star Lounger
    Join Date
    Aug 2002
    Location
    Adelaide, South Australia, Australia
    Posts
    146
    Thanks
    5
    Thanked 5 Times in 5 Posts
    Yes, here it is - sorry if it would have helped to include it in the original post. The problem for me is knowing what I need to add to the SQL!

    SELECT
    itblLGAClaimantsUnique.strLast,
    itblLGAClaimantsUnique.strFirst,
    itblLGAClaimantsUnique.dtmDoB,
    itblLGAClaimantsUnique.strLGAID,
    itblLGAClaimantsUnique.ysnLastFirstDupOK
    FROM
    itblLGAClaimantsUnique
    WHERE
    (((itblLGAClaimantsUnique.strLast)
    In (SELECT [strLast] FROM [itblLGAClaimantsUnique]
    As Tmp GROUP BY [strLast],[strFirst]
    HAVING Count(*)>1 And [strFirst] = [itblLGAClaimantsUnique].[strFirst])))
    ORDER BY
    itblLGAClaimantsUnique.strLast;

    I thought I could use a Domain Aggregate function to count how many False records there are with any particular last/first combination, although I know from experience that they can really slow down a query. Having said that, I suppose that's what the Find Duplicates query is using already, so perhaps it doesn't matter. I tried adding the following field:

    DCount("[strLGAID]","itblLGAClaimantsUnique","[strLast] = '" & [strLast] & "' AND [strFirst] = '" & [strFirst] & "' AND [ysnLastFirstDupOK]=False")

    This changes the WHERE clause to:

    WHERE
    (((itblLGAClaimantsUnique.strLast)
    In (SELECT [strLast] FROM [itblLGAClaimantsUnique]
    As Tmp GROUP BY [strLast],[strFirst]
    HAVING Count(*)>1 And [strFirst] = [itblLGAClaimantsUnique].[strFirst]))
    AND ((DCount("[strLGAID]","itblLGAClaimantsUnique",
    "[strLast] = '" & [strLast] & "'
    AND [strFirst] = '" & [strFirst] & "'
    AND [ysnLastFirstDupOK]=False"))>0))

    It correctly returns 0, 1 or whatever the number of False fields adds up to. But when I add >0 into the Criteria section, it spits the dummy with two successive error messages:

    Syntax error (missing operator) in query expression '[strLast] = 'Nurk' AND [strFirst] = 'Fred' AND [ysnLastFirstDupOK]=False"

    and

    Unknown

    Now, Fred Nurk is not one of the people who is duplicated, so I'm assuming the problem is because the record is being eliminated from the dynaset by the first part of the WHERE clause, and is therefore not available by the time it's working on the DCOUNT bit. I thought I could fix it by leaving the first query as is, and creating a second query, based on the Find Duplicates query, with the new DCOUNT field and criteria of >0, but I get the same error messages.

    I could probably make it work (assuming the problem isn't with something in my syntax) if I made a table from the first dynaset and then applied a query to the table; but I'm trying to change the data dynamically, and it seems to be an awful lot of steps to then use the resulting table's changes to update the original data.

    So if all else fails, I can use a VBA function to do the same thing; I just thought it would be neater in the original SQL.

    Here is the guts of the VBA code for the function "Checked", which works fine with the original Find Duplicates query:

    Dim dbs As Database
    Dim rst As Recordset
    Dim strSQL As String

    Set dbs = CurrentDb()
    strSQL = "SELECT itblLGAClaimantsUnique.strLast, " & _
    "itblLGAClaimantsUnique.strFirst, " & _
    "itblLGAClaimantsUnique.ysnLastFirstDupOK " & _
    "FROM itblLGAClaimantsUnique " & _
    "WHERE (((itblLGAClaimantsUnique.strLast)=""" & strLast & """) " & _
    "AND ((itblLGAClaimantsUnique.strFirst)=""" & strFirst & """) " & _
    "AND ((itblLGAClaimantsUnique.ysnLastFirstDupOK)=False) );"
    Set rst = dbs.OpenRecordset(strSQL, dbOpenSnapshot)
    Checked = rst.RecordCount

    (I'm a lot more comfortable with VBA than SQL; my knowledge of the latter comes from pulling apart the code after I've used the query window to create it!)

    Finally, so you don't think I'm a complete nong, [strLGAID] is not a substitute for an autonumber PKF; this is an interim table between importing some data, filtering for unique entries, and appending to the main table where the autonumber PKF is allocated!

    Thanks for your interest.

    Cheers

    Alison C

  4. #4
    Administrator
    Join Date
    Jun 2010
    Location
    Portugal
    Posts
    10,354
    Thanks
    130
    Thanked 1,162 Times in 1,069 Posts
    Maybe I didn't exactly get what you wanted, but if I got it right, you can do it like this:

    SELECT * FROM
    (
    SELECT
    itblLGAClaimantsUnique.strLast,
    itblLGAClaimantsUnique.strFirst,
    itblLGAClaimantsUnique.dtmDoB,
    itblLGAClaimantsUnique.strLGAID,
    itblLGAClaimantsUnique.ysnLastFirstDupOK
    FROM
    itblLGAClaimantsUnique
    WHERE
    (((itblLGAClaimantsUnique.strLast)
    In (SELECT [strLast] FROM [itblLGAClaimantsUnique]
    As Tmp GROUP BY [strLast],[strFirst]
    HAVING Count(*)>1 And [strFirst] = [itblLGAClaimantsUnique].[strFirst])))
    ORDER BY
    itblLGAClaimantsUnique.strLast
    ) As T1
    WHERE T1.
    ysnLastFirstDupOK=False


    P.S.: For some reason, the editor is not helping with line breaking, so please make sure those last two lines are actually just one.
    Last edited by ruirib; 2012-06-29 at 03:25.

  5. #5
    2 Star Lounger
    Join Date
    Aug 2002
    Location
    Adelaide, South Australia, Australia
    Posts
    146
    Thanks
    5
    Thanked 5 Times in 5 Posts
    Thanks - that doesn't give any errors, but only calls up records that are unchecked, which isn't quite what I'm after.

    What I'm trying to do is identify if there are any unchecked records and, if so, pull out all records with matching names (including those that have previously been marked as checked). This is so that, each month, if a new record is imported with the same name as an existing one, the old and new can be displayed to identify if they are the same person or not; but if there are no new records, I don't want to bother showing old records that have already been identified as two separate people.

    Sorry, I know it's a bit convoluted and therefore hard to explain clearly; so I'm not surprised if you've had trouble working out what I'm after!

    I'll take a closer look at your code and see if I can adapt it. Meanwhile, the VBA function is working, albeit slowly, so I do have a fall-back position.

    Cheers

    Alison C

  6. #6
    Administrator
    Join Date
    Jun 2010
    Location
    Portugal
    Posts
    10,354
    Thanks
    130
    Thanked 1,162 Times in 1,069 Posts
    I hadn't really realized that was what you wanted. I will propose a different query in a few hours.

  7. #7
    Administrator
    Join Date
    Jun 2010
    Location
    Portugal
    Posts
    10,354
    Thanks
    130
    Thanked 1,162 Times in 1,069 Posts
    See if this does it:

    SELECT

    itblLGAClaimantsUnique.strLast,

    itblLGAClaimantsUnique.strFirst,

    itblLGAClaimantsUnique.dtmDoB,

    itblLGAClaimantsUnique.strLGAID,

    itblLGAClaimantsUnique.ysnLastFirstDupOK

    FROM

    itblLGAClaimantsUnique

    WHERE

    (((itblLGAClaimantsUnique.strLast)

    In (SELECT [strLast] FROM [itblLGAClaimantsUnique]

    As Tmp
    WHERE [strLast] IN (SELECT [strLast] FROM [itblLGAClaimantsUnique] itb WHERE ysnLastFirstDupOK=False )
    GROUP BY [strLast],[strFirst]

    HAVING Count(*)>1 And [strFirst] = [itblLGAClaimantsUnique].[strFirst])))

    ORDER BY

    itblLGAClaimantsUnique.strLast

  8. The Following User Says Thank You to ruirib For This Useful Post:

    alifrog (2012-07-01)

  9. #8
    Star Lounger tgw7078's Avatar
    Join Date
    Jul 2010
    Location
    Seattle, WA., USA
    Posts
    84
    Thanks
    1
    Thanked 10 Times in 10 Posts
    Quote Originally Posted by alifrog View Post
    Thanks - that doesn't give any errors, but only calls up records that are unchecked, which isn't quite what I'm after.

    What I'm trying to do is identify if there are any unchecked records and, if so, pull out all records with matching names (including those that have previously been marked as checked). This is so that, each month, if a new record is imported with the same name as an existing one, the old and new can be displayed to identify if they are the same person or not; but if there are no new records, I don't want to bother showing old records that have already been identified as two separate people.

    Sorry, I know it's a bit convoluted and therefore hard to explain clearly; so I'm not surprised if you've had trouble working out what I'm after!

    I'll take a closer look at your code and see if I can adapt it. Meanwhile, the VBA function is working, albeit slowly, so I do have a fall-back position.

    Cheers

    Alison C
    Hi Alison,
    Is it possible for you to post a sample database, with some dummy names, but with enough combinations of your Yes/No field checked and unchecked? Also, post the results you expect the query to deliver. It may very well be that you need to do this using VBA code.

    Regarding the slowness of your procedure, how many records is your query + code working against? Are fields used for query criteria properly indexed? Is this a split application (FE/BE) with a network wire that separates you from the data? If "yes", have you established a persistant connection to the BE database? Here is an article I wrote that provides many suggestions for getting better performance:

    Implementing a Successful Multiuser Access/JET Application
    http://www.accessmvp.com/TWickerath/.../multiuser.htm

    You can likely speed up your VBA procedure by declaring the database and recordset variables as global variables, and set the database variable one time, instead of setting it for each iteration. Also, I highly recommend that you "disambiguate" your recordset declaration:

    Dim rst As Recordset ----> Dim rst As DAO.Recordset

    This will help prevent the possiblity of a situation where your code compiles fine, yet you experience run-time error 13: Type Mismatch. More information here:

    ADO and DAO Library References in Access Databases
    http://www.accessmvp.com/TWickerath/articles/adodao.htm
    Tom Wickerath
    Microsoft Access MVP
    4/1/2006 - 3/31/2012

  10. The Following User Says Thank You to tgw7078 For This Useful Post:

    alifrog (2012-07-01)

  11. #9
    2 Star Lounger
    Join Date
    Aug 2002
    Location
    Adelaide, South Australia, Australia
    Posts
    146
    Thanks
    5
    Thanked 5 Times in 5 Posts

    Progress!

    Hi Ruirib -

    That last solution seems to give me what I want, and faster than the VBA function I had written.

    I need to apply this in a couple of other contexts (checking Last/DoB and First/DoB as well as the same 3 options on another table) so would like to get my head around the key bit of the code - ie the WHERE section - that you provided.
    WHERE
    (((itblLGAClaimantsUnique.strLast)
    In (SELECT [strLast] FROM [itblLGAClaimantsUnique]
    As Tmp
    WHERE [strLast] IN (SELECT [strLast] FROM [itblLGAClaimantsUnique] itb WHERE ysnLastFirstDupOK=False )
    GROUP BY [strLast],[strFirst]
    HAVING Count(*)>1 And [strFirst] = [itblLGAClaimantsUnique].[strFirst])))

    I think I get most of it - it's just a little bit in the second WHERE line that I'm not sure of: what does "itb" do?

    Many thanks for the time you have spent on this - I really appreciate your persistence!

    Cheers

    Alison

  12. #10
    2 Star Lounger
    Join Date
    Aug 2002
    Location
    Adelaide, South Australia, Australia
    Posts
    146
    Thanks
    5
    Thanked 5 Times in 5 Posts
    Hi Tom -

    I was beginning to think it might be a good idea to create some dummy data, but Ruirib's response seems to have done what I need.

    However, I really appreciate your comments about speeding up databases - it's one of my biggest problems, as I never seem to get the simple ones to deal with!

    At this stage it's all in the same file, as I usually do the main development that way so I can fine tune table designs on the fly; I'll separate the FE and BE when I'm happy that I have the data structure right.

    I have used persistent connections on other databases, and anticipated doing so on this one, but will be very interested to look at your article, as I might not be doing it as well as I could. We're currently looking at a database that will grow every month. It starts at about 150 records, the first update will take it to about 2,500 records (first merge of two currently completely separate historical datasets) and will then settle down to increasing at a rate of about 10 per month (but with a lot of updated data in the existing records).

    I didn't realise setting the database as a global variable would improve speed - I just assumed it would be better to clean it out of memory when I finished running the subroutine. Your recommendation makes a lot of sense in light of the number of iterations, though.

    Disambiguation looks as though it might help the problems I sometimes have when clients install a copy on a new PC that doesn't have the correct references set up.

    As for indexing - I live in constant fear of being found to be a complete fraud if anyone else looks at my indexing. I try to keep it tight, but as I also try to offer a lot of flexibility in the user interface, I find myself saying "They might want to sort on this field" or "They might want to select on that one" or even "They might want to sort/select on Last Name only on some occasions, but on others it will be based on Last/First" and ending up falling down the other side of the optimisation bell curve with far too many indexed fields.

    A quick look at your links tells me two things immediately: I'm not doing everything wrong; and I'm in for some good reading when I get away from work tonight...

    Many thanks for your helpfull suggestions.

    Cheers

    Alison

  13. #11
    2 Star Lounger
    Join Date
    Aug 2002
    Location
    Adelaide, South Australia, Australia
    Posts
    146
    Thanks
    5
    Thanked 5 Times in 5 Posts
    Aaarrrggghhh! Deficient testing on my part - I tested with neither records' fields checked and one record's field checked, but for some reason completely omitted to test with both records' fields checked - and unfortunately have now found that the query returns both checked records (haven't even started to play with three matching records). As the reason I'm trying to do this is to eliminate the display of matching records where they have all been OK'd already, I'm back to the drawing board!

    I thought I'd made sense of the SQL, so can't really figure out what it doesn't like, but will fiddle around and see where it takes me.

    Thanks again

    Cheers

    Alison

  14. #12
    2 Star Lounger
    Join Date
    Aug 2002
    Location
    Adelaide, South Australia, Australia
    Posts
    146
    Thanks
    5
    Thanked 5 Times in 5 Posts
    OK! I've got it - at least, I know why it's throwing up the apparent error. There is somebody else with the same surname, but different first name; because there is no duplicate First/Last combination for this person, the check box is false - I wasn't bothering with the check boxes of people who didn't have both the first name and last name matching. So yes, the "WHERE [strLast] IN (SELECT [strLast] FROM [itblLGAClaimantsUnique] itb WHERE ysnLastFirstDupOK=False )" is working the way we've asked it to.

    I suppose it would work if I had a compound field that concatenated the First and Last names, but will now go and do some more testing to see if I can avoid having to go back and create this field in other objects, or if I can just use it dynamically.

    Cheers

    Alison

  15. #13
    Star Lounger tgw7078's Avatar
    Join Date
    Jul 2010
    Location
    Seattle, WA., USA
    Posts
    84
    Thanks
    1
    Thanked 10 Times in 10 Posts
    I have used persistent connections on other databases, and anticipated doing so on this one, but will be very interested to look at your article, as I might not be doing it as well as I could.
    I usually just open a form in hidden mode that is bound to a table with one record in the BE database. The user never knows this form is open. You will need to close it first, in order to compact.

    I didn't realise setting the database as a global variable would improve speed - I just assumed it would be better to clean it out of memory when I finished running the subroutine. Your recommendation makes a lot of sense in light of the number of iterations, though.
    The statement Set db = CurrentDB() is fairly costly, as far as time goes. If you do this only once, when a procedure runs, no big deal. But, if this line of code is being executed for each record in a query that calls your function, the penalty can become quite noticeable, especially if you are dealing with hundreds or thousands of records.

    Disambiguation looks as though it might help the problems I sometimes have when clients install a copy on a new PC that doesn't have the correct references set up.
    Disambiguation only helps with reference priority issues. It does not help at all if a required reference is not selected.

    As for indexing - I live in constant fear of being found to be a complete fraud if anyone else looks at my indexing. I try to keep it tight, but as I also try to offer a lot of flexibility in the user interface, I find myself saying "They might want to sort on this field" or "They might want to select on that one" or even "They might want to sort/select on Last Name only on some occasions, but on others it will be based on Last/First" and ending up falling down the other side of the optimisation bell curve with far too many indexed fields.
    Isn't this a question that you can ask your users? If you provide them with some type of Query-by-Form (QBF), then you can easily restrict which fields they can choose as criteria or for sorting. It sounds like (?) you may be providing your users with direct access to the database window? That's something I don't do.

    A quick look at your links tells me two things immediately: I'm not doing everything wrong; and I'm in for some good reading when I get away from work tonight...

    Many thanks for your helpfull suggestions.
    Great! And you're very welcome.
    Tom Wickerath
    Microsoft Access MVP
    4/1/2006 - 3/31/2012

  16. #14
    Star Lounger tgw7078's Avatar
    Join Date
    Jul 2010
    Location
    Seattle, WA., USA
    Posts
    84
    Thanks
    1
    Thanked 10 Times in 10 Posts
    Hi Alison,

    Here's a thought....can you append all names to the table and then use the Soundex function, with the appropriate constant, to help locate possible duplicates? You can display the possible duplicates in a form which includes a checkbox where you, or another human being [but not a computer] decides on actual duplicates. You may have to experiment with the constant to find the best value for the majority of names in your database. If memory serves me correctly, a value of 4 works pretty good for most names.

    More information on Soundex, and an alternate method (Levenshtein Distance) here:

    April 2005: "Close" only counts in horseshoes...and databases (467 KB)
    A couple of techniques to help determine when entries are "close enough" to be considered the same

    available on this page, by Access MVP Doug Steele:
    http://www.accessmvp.com/djsteele/SmartAccess.html
    Last edited by tgw7078; 2012-07-02 at 01:43.
    Tom Wickerath
    Microsoft Access MVP
    4/1/2006 - 3/31/2012

  17. #15
    2 Star Lounger
    Join Date
    Aug 2002
    Location
    Adelaide, South Australia, Australia
    Posts
    146
    Thanks
    5
    Thanked 5 Times in 5 Posts
    Clearly, the next thing I need to tackle is how to split a message into multiple quotes - thought it might have something to do with "Multi-quote this message" but if it does, I'm too tired at the end of a long day to work out how...

    Meanwhile, though, here are my responses to each of your points:

    - Persistent connections: That's how I've been doing them, right down to closing for the purpose of compacting - good to know I got that right!
    - Global variable for Db: Makes sense; I have then used AutoExec to run a short function using Set Db when the database is opened - is that the best way?
    - Disambiguation: That should fix about a third of the problems I encounter with references on other people's computers, at least.
    - Indexing: Don't worry, I don't give users access to the database window - don't want fiddling around by the people who know just enough to be dangerous! No, it's more that the dialog boxes I provide for initiating reports or data searching often have a number of options - mainly because I did ask the users, and they kept thinking of more options they wanted. And I do admit to empathising with the users a bit too much sometimes - other developers I know tend just to say "Can't do that" to keep it uncomplicated, whereas I look at all the ways they can get something from the database then try to give them the tools. There's a happy medium there somewhere, I just have to find it...

    Cheers

    Alison

Page 1 of 2 12 LastLast

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •