Page 1 of 2 12 LastLast
Results 1 to 15 of 17
  1. #1
    New Lounger
    Join Date
    Dec 2015
    Posts
    24
    Thanks
    1
    Thanked 0 Times in 0 Posts

    Finding duplicates, and removing duplicates

    This is a 2 part problem. I have a sorted query - [Most_FT_3g_v6_Query] that shows the most free throws made for 3 consecutive games. My listing shows many of the same player (i.e. uses 2 of the games that the leader used.) I figured that if I asked for duplicates (players listed for the same season) I would then have a better listing of top 3 game free throws made. But when I make my list of duplicates, it includes the top records. I would expect that the duplicate list would NOT include the top listing, but would include those with the same Roster_ID as shown above. Something tells me that this will involve indexes. I'm not sure how to 'create' indexes.Creating duplicates.docx
    My second problem is when the duplicate list is appropriate, how do I remove those records from the 'original' query?

  2. #2
    Star Lounger
    Join Date
    Jul 2013
    Location
    Murphy, NC
    Posts
    66
    Thanks
    0
    Thanked 8 Times in 8 Posts
    Hi jlwood44,
    I'm not sure why you mention "index," the purpose of which is to allow the database management system to more quickly fetch database rows that match the query criteria. A database would have to be quite large before the overhead of creating an index would be outweighed by the benefit.

    Also, perhaps you could more clearly explain what you consider "duplicate," that is, what field(s) you want to essentially be the unique identifier. Would it be player and year, so that there would be only one result for <2015, "Vironnica Drake">, namely, the result where free throws made over the three games was highest for that combination of player and year? If there were several rows for a given player and year that had the same three-game sum (as for <2007, "Ronicka Drake">), which one would you want selected?

    It might help also if you provided the query statement(s) you're using.

    Hope this helps,
    Dave

  3. #3
    New Lounger
    Join Date
    Dec 2015
    Posts
    24
    Thanks
    1
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by DavidHLevin View Post
    Hi jlwood44,
    I'm not sure why you mention "index," the purpose of which is to allow the database management system to more quickly fetch database rows that match the query criteria. A database would have to be quite large before the overhead of creating an index would be outweighed by the benefit.

    Also, perhaps you could more clearly explain what you consider "duplicate," that is, what field(s) you want to essentially be the unique identifier. Would it be player and year, so that there would be only one result for <2015, "Vironnica Drake">, namely, the result where free throws made over the three games was highest for that combination of player and year? If there were several rows for a given player and year that had the same three-game sum (as for <2007, "Ronicka Drake">), which one would you want selected?

    It might help also if you provided the query statement(s) you're using.

    Hope this helps,
    Dave
    I agree with your assessment that I find all those with the same name and season, and have only 1 entry, and the others are duplicates. I used the duplicate query and used Roster_ID to find the results, BUT I expected that the 'original' would be the top one in the sorted query. When I ran the duplicates, as you can see the duplicates include the top records.
    SQL for the duplicates.docx
    It appears that the sorting happens AFTER the duplicates are determined. If the sorting is done before the duplicates are determined, then the appropriate listing would be created.

  4. #4
    New Lounger
    Join Date
    Dec 2015
    Posts
    24
    Thanks
    1
    Thanked 0 Times in 0 Posts
    I think that I misunderstood what the duplicate query would show. I was thinking that it would show the 2nd, 3rd, etc. occurrence of a record, when if there are duplicates, it shows the 1st, 2nd, 3rd occurrence of the record. Then the duplicate query was 'correct' as it ran. BUT, it still brings up my other question, how do I (really ACCESS) remove the records after the 'original' record.
    I think back to a year ago when I ran a duplicate query for a table to find if I had entered results more than once, and it showed both occurrences. Because this was records in a table, and it was only 1 or 2 duplicates, I manually deleted the duplicates.
    In this case, the 'original' info is from a query search. I do NOT want to remove the inputted data, but 'restrict' the results from the search to only show the 'first' occurrence. And more data is constantly being added - Vironnica just had a game where she made 14 free throws - I'm not sure what her consecutive game results were.

  5. #5
    Administrator
    Join Date
    Jun 2010
    Location
    Portugal
    Posts
    12,519
    Thanks
    152
    Thanked 1,398 Times in 1,221 Posts
    I think you have a bigger problem. How can you be having multiple records where many of columns are different between these multiple records, and yet you still call them duplicates? I think your problem is the actual query that is creating these records. Shouldn't your query be creating just one record per player per season?
    Rui
    -------
    R4

  6. #6
    Star Lounger
    Join Date
    Jul 2013
    Location
    Murphy, NC
    Posts
    66
    Thanks
    0
    Thanked 8 Times in 8 Posts
    Hi jlwood44,
    I agree with Rui that "find duplicates" is apparently being used here in a way for which it wasn't really intended.

    Also, the tables might have been designed to be more robust. Suppose a player had a given three-game FTM sequence occur more than once in a season, such as 12 in game 3, 9 in game 4, and 10 in game 5 and then 12 in game 17, 9 in game 18, and 10 in game 19. This could be accommodated if an additional field that I'll call g1_number were in the table that contains the FTM triplets. This would allow <3, 12, 9, 10> and <17, 12, 9, 10> to coexist.

    These entries wouldn't be "duplicates"; they would be in conformance with the table layout. This representation would show where a player achieved a given three-day total more than once in a season, but you could still obtain one query row by including the appropriate criteria in the select statement, and your select statement could specify that the results were to be sorted descending by year.

    Perhaps if you posted the present tables and explained what the FTM data are used for, we would be able to offer more specific suggestions.

    Dave

  7. #7
    Administrator
    Join Date
    Jun 2010
    Location
    Portugal
    Posts
    12,519
    Thanks
    152
    Thanked 1,398 Times in 1,221 Posts
    Do you want a single record per player? Or do you want the overall top 3 consecutive games with most free throws made?

    In any case, you can take the query you have now and use it as the entry table for a new query. Then you apply a condition that selects what you want: a TOP 3 SELECT clause, for the latter, or a use a correlated subquery in the WHERE clause or something close to either or both, depending on what you want.
    Rui
    -------
    R4

  8. #8
    New Lounger
    Join Date
    Dec 2015
    Posts
    24
    Thanks
    1
    Thanked 0 Times in 0 Posts
    As you indicated, I want 1 entry per player. It seems that I would use a WHERE statement. How will the WHERE statement compare to the WHERE statement that was used for the Duplicates Query? The SQL statements are included in an earlier comment (#3).

  9. #9
    Administrator
    Join Date
    Jun 2010
    Location
    Portugal
    Posts
    12,519
    Thanks
    152
    Thanked 1,398 Times in 1,221 Posts
    The SQL is not usable, as it is an image. If you post the SQL for the original query, I will write you a correlated subquery to select the record with the higher Total for 3 games per player.
    Rui
    -------
    R4

  10. #10
    New Lounger
    Join Date
    Dec 2015
    Posts
    24
    Thanks
    1
    Thanked 0 Times in 0 Posts
    SQL for 'original' Query
    SELECT Most_FT_3g_v2_3_Query.Player_ID, Most_FT_3g_v2_3_Query.Roster_ID, Most_FT_3g_v2_3_Query.Fall_Year, Most_FT_3g_v2_3_Query.Full_Name, Most_FT_3g_v2_3_Query.g1FTM, Most_FT_3g_v2_3_Query.g2FTM, Most_FT_3g_v2_3_Query.g3FTM, Most_FT_3g_v2_3_Query.Tot3gFTM, Most_FT_3g_v2_3_Query.Tot3gFTA
    FROM Most_FT_3g_v2_3_Query
    ORDER BY Most_FT_3g_v2_3_Query.Tot3gFTM DESC , Most_FT_3g_v2_3_Query.Game_ID DESC;

    SQL for duplicates Query
    SELECT Most_FT_3g_v2_6_Query.Roster_ID, Most_FT_3g_v2_6_Query.Full_Name, Most_FT_3g_v2_6_Query.Player_ID, Most_FT_3g_v2_6_Query.Fall_Year, Most_FT_3g_v2_6_Query.g1FTM, Most_FT_3g_v2_6_Query.g2FTM, Most_FT_3g_v2_6_Query.g3FTM, Most_FT_3g_v2_6_Query.Tot3gFTM, Most_FT_3g_v2_6_Query.Tot3gFTA
    FROM Most_FT_3g_v2_6_Query
    WHERE (((Most_FT_3g_v2_6_Query.Roster_ID) In (SELECT [Roster_ID] FROM [Most_FT_3g_v2_6_Query] As Tmp GROUP BY [Roster_ID] HAVING Count(*)>1 )))
    ORDER BY Most_FT_3g_v2_6_Query.Tot3gFTM DESC;

  11. #11
    Administrator
    Join Date
    Jun 2010
    Location
    Portugal
    Posts
    12,519
    Thanks
    152
    Thanked 1,398 Times in 1,221 Posts
    Replace your original query by a query based on this SQL statement. Should be enough for what you want.

    Code:
    SELECT M.Player_ID, M.Roster_ID, M.Fall_Year, M.Full_Name, M.g1FTM, M.g2FTM, M.g3FTM, M.Tot3gFTM, M.Tot3gFTA
    FROM Most_FT_3g_v2_3_Query As M
    WHERE M.Tot3gFTM = (SELECT MAX(T.Tot3gFTM) FROM Most_FT_3g_v2_3_Query As T WHERE T.Player_ID=M.Player_ID)
    I have a doubt, though. It is possible that the condition in red has to be replaced by T.Roster_ID=M.Roster_ID

    This latter version would probably give you multiple seasons for the same player, if the Most_FT_3g_v2_3_Query query gives you values for multiple seasons. If not, the replacement is not needed, even if it can still be performed.
    Rui
    -------
    R4

  12. #12
    New Lounger
    Join Date
    Dec 2015
    Posts
    24
    Thanks
    1
    Thanked 0 Times in 0 Posts
    Run of Query without duplicates.docx
    That kind of works. And, I changed to Roster_ID so that a player might be listed for different years. BUT that seems to want to find matching results for a player. Vironnica Drake is listed with 2 times of 31 makes, but she has one of 36, and maybe a couple others bigger than the 31. Katrinna Blackmon had a 29 and a couple others in the 20's. According to your query run the 2nd highest total is 16.

  13. #13
    Administrator
    Join Date
    Jun 2010
    Location
    Portugal
    Posts
    12,519
    Thanks
    152
    Thanked 1,398 Times in 1,221 Posts
    Well, if you want it ordered on M.Tot3gFTA, the subquery needs to get the max of T.Tot3gFTA and not T.Tot3gFTM.

    Code:
    SELECT M.Player_ID, M.Roster_ID, M.Fall_Year, M.Full_Name, M.g1FTM, M.g2FTM, M.g3FTM, M.Tot3gFTM, M.Tot3gFTA
    FROM Most_FT_3g_v2_3_Query As M
    WHERE M.Tot3gFTM = (SELECT MAX(T.Tot3gFTA) FROM Most_FT_3g_v2_3_Query As T WHERE T.Roster_ID=M.Roster_ID)
    Yeah, it can show repeats for a player, if the player has multiple instances with the same Tot3gFTA total value. If you want to specify a tie breaking criterium, I may try and solve it.
    Rui
    -------
    R4

  14. #14
    New Lounger
    Join Date
    Dec 2015
    Posts
    24
    Thanks
    1
    Thanked 0 Times in 0 Posts
    You misunderstood my concern. Look at the results from #12, and compare with #1.
    I want the MAX for Tot3gFTM, but the query in #12 is NOT showing the top totals as shown in both parts of #1
    What happened to Drake making 36 free throws, Blackmon making 29, Guarneri making 26, R. Drake making 25? For some reason none of these are showing in the new query. The next to last field is what really matters.
    Although the query seems to ask for MAX, it does not seem to show the actual MAX.

  15. #15
    Administrator
    Join Date
    Jun 2010
    Location
    Portugal
    Posts
    12,519
    Thanks
    152
    Thanked 1,398 Times in 1,221 Posts
    Quote Originally Posted by jlwood44 View Post
    You misunderstood my concern. Look at the results from #12, and compare with #1.
    I want the MAX for Tot3gFTM, but the query in #12 is NOT showing the top totals as shown in both parts of #1
    What happened to Drake making 36 free throws, Blackmon making 29, Guarneri making 26, R. Drake making 25? For some reason none of these are showing in the new query. The next to last field is what really matters.
    Although the query seems to ask for MAX, it does not seem to show the actual MAX.
    I only work with what you give me and it seems you gave me two different queries - V2_6 and V2_3. I don't know enough about your queries to distinguish them. It seems the queries are different and give different results. The solution seems obvious to me: if you want the results from V2_6, use V2_6. and change the SQL for the query I wrote accordingly - replace all references to V2_3 with V2_6.
    Rui
    -------
    R4

Page 1 of 2 12 LastLast

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •