Results 1 to 12 of 12
  1. #1
    3 Star Lounger Jimmy-W's Avatar
    Join Date
    Jan 2001
    Location
    Helena, Montana, USA
    Posts
    220
    Thanks
    13
    Thanked 0 Times in 0 Posts

    Remove duplicates based on two fields

    I know that I easily can remove duplicates based upon one field's records, and that I can do so with respect to more than one field at one time. However, I want to remove duplicates if, for example, A1=A2 and B1=B2. In that case, I want record 1 or 2 removed. I attached a spreadsheet as an example. In the example, the duplicates would be records 2 and 4, so I want one those records removed. I think that I could do this through a couple of queries, but I'm looking for something easier. Thanks!
    Attached Files Attached Files
    JimmyW
    Helena, MT

  2. #2
    Super Moderator RetiredGeek's Avatar
    Join Date
    Mar 2004
    Location
    Manning, South Carolina
    Posts
    9,436
    Thanks
    372
    Thanked 1,457 Times in 1,326 Posts
    Jimmy,

    This code should solve your problem.
    Code:
    Option Explicit
    
    Sub RemoveDups()
    
      Dim lCurRow        As Long
      Dim lTestRowOffset As Long
      
      lCurRow = 2
      
      Do
      
        lTestRowOffset = 1
     
        Do
        
          If Cells(lCurRow, 1).Value = Cells(lCurRow + lTestRowOffset, 1).Value And _
             Cells(lCurRow, 2).Value = Cells(lCurRow + lTestRowOffset, 2).Value Then
             
             Rows(lCurRow + lTestRowOffset).EntireRow.Delete
             
          Else
             lTestRowOffset = lTestRowOffset + 1
          End If
        
        Loop Until Cells(lCurRow + lTestRowOffset, 1).Value = ""
        
        lCurRow = lCurRow + 1
        
      
      Loop Until Cells(lCurRow, 1).Value = ""
      
    End Sub
    Please note that I solved the problem as indicated. If the data had been sorted first the code could be a little more efficient but as it stands you have to loop through the entire data set for each itteration to make sure you have all the duplicates. Of course if your dataset {table} isn't too big this isn't a real problem.

    Test Data
    TestData.PNG
    Results
    Results.PNG
    Last edited by RetiredGeek; 2012-04-24 at 13:30.
    May the Forces of good computing be with you!

    RG

    PowerShell & VBA Rule!

    My Systems: Desktop Specs
    Laptop Specs

  3. #3
    3 Star Lounger Jimmy-W's Avatar
    Join Date
    Jan 2001
    Location
    Helena, Montana, USA
    Posts
    220
    Thanks
    13
    Thanked 0 Times in 0 Posts
    Thanks! If I may impose with a couple of questions, can I edit the code if the two fields are not contiguous, e.g., A2 and C2? Would it then be 2TestRowOffset? I also wonder about date/time granularity. Two times may be formatted to appear as 13:10, but really may be 13:10:45 and 13:10:55, etc. Does the code use the formatted time or the actual numeric time? (My worksheet is sorted by date/time, descending order.)
    JimmyW
    Helena, MT

  4. #4
    Super Moderator RetiredGeek's Avatar
    Join Date
    Mar 2004
    Location
    Manning, South Carolina
    Posts
    9,436
    Thanks
    372
    Thanked 1,457 Times in 1,326 Posts
    Jimmy,

    1. Can I edit the code if the two fields are not contiguous, e.g., A2 and C2? Yes just change the Column Numbers, they are the hard coded numbers. e.g. A=1 B=2 C=3 thus A2 & C2 would be
    Code:
          If Cells(lCurRow, 1).Value = Cells(lCurRow + lTestRowOffset, 1).Value And _ 
             Cells(lCurRow, 3).Value = Cells(lCurRow + lTestRowOffset, 3).Value Then
    2. Does the code use the formatted time or the actual numeric time? Actual Time
    Although the code could be modified to use formatted time.
    Code:
      
        If Format(Cells(lCurRow, 1).Value, "hh:mm") = _
           Format(Cells(lCurRow + lTestRowOffset, 1).Value, "hh:mm") And _
           Cells(lCurRow, 2).Value = Cells(lCurRow + lTestRowOffset, 2).Value Then
    May the Forces of good computing be with you!

    RG

    PowerShell & VBA Rule!

    My Systems: Desktop Specs
    Laptop Specs

  5. #5
    Lounger
    Join Date
    Apr 2012
    Location
    Abergavenny, Wales, UK
    Posts
    25
    Thanks
    0
    Thanked 3 Times in 3 Posts
    Hi
    You can solve this without having to resort to VBA.
    In C2 enter
    =A2&"|"&B2
    In D2 enter
    =COUNTIF($C$2:C2,C2)>1
    Copy both down as far as required

    Filter on column D for TRUE and delete the selected rows

    Regards
    Roger Govier
    Microsoft Excel MVP
    Attached Files Attached Files

  6. #6
    Super Moderator RetiredGeek's Avatar
    Join Date
    Mar 2004
    Location
    Manning, South Carolina
    Posts
    9,436
    Thanks
    372
    Thanked 1,457 Times in 1,326 Posts
    Roger,

    Nice! However the VBA is more robust since it will work if the records are sorted or not. I tried your solution on my test data and it didn't get the right answer. Then I remembered the OP has said his data was sorted by date/time decending, my test data was not. Upon sorting your solution worked perfectly and is an interesting approach to the problem.

    I notice you're a fairly new poster here so let me take this opportunity to welcome you to the Lounge as a poster. It's always nice to have more talent, especially MVPs, to add new perspectives to the Lounge's talent base.
    Last edited by RetiredGeek; 2012-04-26 at 12:12.
    May the Forces of good computing be with you!

    RG

    PowerShell & VBA Rule!

    My Systems: Desktop Specs
    Laptop Specs

  7. #7
    Lounger
    Join Date
    Apr 2012
    Location
    Abergavenny, Wales, UK
    Posts
    25
    Thanks
    0
    Thanked 3 Times in 3 Posts
    Hi
    Many thanks for the welcome.

    However, I fear you are incorrect in your assertion regarding sorting.
    The formula solution will work with either sorted or unsorted data, as the attached workbook shows. I added some more data to jumble the order up a bit.

    The formula in D2
    =COUNTIF($C$2:C2,C2)>1
    which becomes
    =COUNTIF($C$2:C14,C14)>1
    by the time it gets to D14
    is not concerned with any order, or referencing the row above, it is doing a Countif on an ever expanding range to see if there is more than one occurrence of the concatenated value.
    Attached Files Attached Files
    Regards
    Roger Govier
    Microsoft Excel MVP

  8. #8
    Super Moderator RetiredGeek's Avatar
    Join Date
    Mar 2004
    Location
    Manning, South Carolina
    Posts
    9,436
    Thanks
    372
    Thanked 1,457 Times in 1,326 Posts
    Roger,

    I'm still getting the same results.
    Here's my test workbook with both your formula and my macro, maybe you can see what I am doing wrong.
    Attached Files Attached Files
    May the Forces of good computing be with you!

    RG

    PowerShell & VBA Rule!

    My Systems: Desktop Specs
    Laptop Specs

  9. #9
    Lounger
    Join Date
    Apr 2012
    Location
    Abergavenny, Wales, UK
    Posts
    25
    Thanks
    0
    Thanked 3 Times in 3 Posts
    Hi

    You have the wrong formula in D2

    You have
    =COUNTIF($C$2:C2,C1)>1
    It should be
    =COUNTIF($C$2:C2,C2)>1
    Regards
    Roger Govier
    Microsoft Excel MVP

  10. #10
    New Lounger
    Join Date
    Apr 2012
    Posts
    1
    Thanks
    0
    Thanked 0 Times in 0 Posts
    If you're using Excel 2007 or greater you can use the remove duplicates function once you've created a key field through concatenation (so the count formula is unecessary).

    FWIW it's not necessary to delimit the fields (in your concatenated key field) as this is a reference field only, ie

    In C2 enter
    =A2&B2
    note that it can easily be extended to cover as many fields as necessary (and they don't need to be contiguous)

    and the count field should cover the entire column so that the data doesn't have to be sorted

    In D2 enter
    =COUNTIF(C:C,C2)

  11. #11
    Lounger
    Join Date
    Apr 2012
    Location
    Abergavenny, Wales, UK
    Posts
    25
    Thanks
    0
    Thanked 3 Times in 3 Posts
    Hi Simon

    You are quite right that there is a remove duplicate function in later versions, I was trying to give a generic solution.
    With this particular set of data you are correct, a separator is not required in the concatenation.
    However, I do it as a matter of course with all concatenations, because there are occasions when you can get tripped up with different numbers of characters in the concatenated fields.

    On the matter of the Countif function, however, you are totally wrong.
    Your formula will count the number of occurrence in the whole data set, and that would not provide a list of uniques.
    You would not be able to select the rows you wanted to delete, as it would also contain the first occurrence of the value.

    The formula I proposed, does not need to be sorted.
    Regards
    Roger Govier
    Microsoft Excel MVP

  12. #12
    Super Moderator RetiredGeek's Avatar
    Join Date
    Mar 2004
    Location
    Manning, South Carolina
    Posts
    9,436
    Thanks
    372
    Thanked 1,457 Times in 1,326 Posts
    Just to keep you all updated Roger & I had taken our discussion offline to try to resolve why we were getting different results.
    It turns out it was due to my final version of code checking the time in HH:MM format vs his solution checking in HH:MM:SS format. Roger offered the following to make his do the same thing.

    =TEXT(A2,"hh:mm")&"|"&B2

    I'd like to take this opportunity to thank Roger for hanging with me through this and discovering the problem, and of course for his elegant solution.

    Thanks to him we all now have a new technique to add to our Excel toolbox.
    May the Forces of good computing be with you!

    RG

    PowerShell & VBA Rule!

    My Systems: Desktop Specs
    Laptop Specs

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •