Results 1 to 13 of 13
  1. #1
    5 Star Lounger jujuraf's Avatar
    Join Date
    Jun 2001
    Location
    San Jose, California, USA
    Posts
    1,061
    Thanks
    0
    Thanked 0 Times in 0 Posts

    deleting duplicate data in many rows (Excel 2000)

    Hi all,
    I have 20,000+ rows of data in multiple cols. I need to be able to occasionally go through this data and delete duplicates.

    In this case a duplicate is defined as any occurrence of the same data in more than one row. In any given row, the data can occur more than once in one or more columns, however. <img src=/S/confused.gif border=0 alt=confused width=15 height=20>

    Example (3 data rows)
    This is OK.
    AA, BB, AA, DD
    EE, KK, LL, MM
    ZZ, FF, FF, WW

    This is NOT ok: (3 data rows)
    AA, BB, AA, DD
    EE, KK, LL, AA
    ZZ, FF, FF, AA

    Rows 2 and 3 contain data that already exist in previous rows (AA is found in row 1). These last two rows would get deleted.

    I think I need VBA here, not a formula. It'd have to be some triply nested loop I think. One for the whole data set, one within sequencial dupes in one column, and then one to check all columns for a pair of cells. The code would also have to keep track of the "used" data so that it searches through the list of previous data and if any matches are found (for the new rows) that row is deleted.

    This is more than a search/replace since there is no known data to search for (it's any duplicate data).

    I'll start working on the algorithm now but wanted to post this to the group in case someone comes up with a good methodology. I'll post what I came up with when I get it working.

    Thnx, Deb <img src=/S/compute.gif border=0 alt=compute width=40 height=20>

  2. #2
    Uranium Lounger
    Join Date
    Jan 2001
    Location
    South Carolina, USA
    Posts
    7,295
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: deleting duplicate data in many rows (Excel 2000)

    A couple of questions:

    1- How many columns are involved in the check for duplicates (it looks like four from your example)?

    2- What should happen in this case:

    AA, BB, AA, DD
    EE, KK, LL, AA
    ZZ, FF, FF, DD
    EE,GG,GG,HH
    Legare Coleman

  3. #3
    5 Star Lounger jujuraf's Avatar
    Join Date
    Jun 2001
    Location
    San Jose, California, USA
    Posts
    1,061
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: deleting duplicate data in many rows (Excel 2000)

    The # of cols is not fixed, but I was going to assume that either the source data is the current worksheet or a selected range before starting the search. I'd then just query for the # of cols that contain data for the known rows.

    I was thinking of first making a pass through the data to gather all the possible data and store them in a collection (which would by definition get rid of dups for purpose of searching). This collection would be my compare list when I'm checking if any given data for a cell has already been found. If yes, I'd increment a counter by 1 and that would tell me that the data has already occured. I was going to create a user defined data type to store in the collection (e.g., DataFound as variant, DataCount as integer, DataRow as long). If I'm still on the same row, DataCount can be > 1 as long as DataFound has not already been found in a previous row. (I know this because DataCount = 1 and DataRow <> current row for the active cell.) This is quite confusing I know.

    To answer your 2nd question, rows 2,3,4 would be deleted since they all contain at least one data element that exists in a previous row. (For row 2, AA appears in row 1; for row 3, DD appears in row 1; for row 4, EE appears in row 2).

    I haven't started this algorithm yet as I'm working on something else but will try it later tonight.

    Thanks for looking into this,
    Deb <img src=/S/crazy.gif border=0 alt=crazy width=15 height=15>

  4. #4
    Super Moderator jscher2000's Avatar
    Join Date
    Feb 2001
    Location
    Silicon Valley, USA
    Posts
    23,112
    Thanks
    5
    Thanked 93 Times in 89 Posts

    Re: deleting duplicate data in many rows (Excel 2000)

    This is some strange data. If you don't care what row the data is (was) in, or how it was grouped with anything else, you might take each non-empty cell, copy it to a single column on a separate sheet, sort that, and then the work of killing duplicates should be simple.

  5. #5
    Uranium Lounger
    Join Date
    Jan 2001
    Location
    South Carolina, USA
    Posts
    7,295
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: deleting duplicate data in many rows (Excel 2000)

    But if 2,3 and 4 should be deleted, then EE would not appear in the results. Is that what you want? Or, would row four not be deleted because row two got deleted before row four is checked?
    Legare Coleman

  6. #6
    Super Moderator
    Join Date
    May 2002
    Location
    Canberra, Australian Capital Territory, Australia
    Posts
    5,054
    Thanks
    2
    Thanked 417 Times in 346 Posts

    Re: deleting duplicate data in many rows (Excel 2000)

    Hi Deb,
    See if this meets your needs. It simply trawls through the active sheet finding rows with duplicates and deleting them. I've also included an alternative for deleting just the duplicate cells, in case that's what you're after.

    Sub ClearDuplicateRowsInColumns()

    Dim A As Long
    Dim B As Long
    Dim C As Range
    Dim D As Range

    On Error GoTo Abort
    Application.ScreenUpdating = False
    Application.Calculation = xlCalculationManual

    Set D = ActiveSheet.UsedRange.Columns
    Set C = ActiveSheet.UsedRange.Rows
    For B = D.Columns.Count To 1 Step -1
    For A = C.Rows.Count To 1 Step -1
    If Application.WorksheetFunction.CountIf(Range(Rows(1 ).Columns(1), Rows(A).Columns([img]/forums/images/smilies/cool.gif[/img]), C.Rows(A).Columns([img]/forums/images/smilies/cool.gif[/img]) > 1 Then
    'If you only want to delete the offending cell, use: If Application.WorksheetFunction.CountIf(Columns([img]/forums/images/smilies/cool.gif[/img], C.Rows(A).Columns([img]/forums/images/smilies/cool.gif[/img]) > 1 Then
    C.Rows(A).EntireRow.Delete
    'If you only want to delete the offending cell, use: Rows(A).Columns([img]/forums/images/smilies/cool.gif[/img].Delete Shift:=xlUp
    End If
    Next A
    Next B

    Abort:
    Application.ScreenUpdating = True
    Application.Calculation = xlCalculationAutomatic
    End Sub

    Cheers
    Cheers,

    Paul Edstein
    [MS MVP - Word]

  7. #7
    5 Star Lounger jujuraf's Avatar
    Join Date
    Jun 2001
    Location
    San Jose, California, USA
    Posts
    1,061
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: deleting duplicate data in many rows (Excel 2000)

    Ok, I now know what I'm talking about <img src=/S/clapping.gif border=0 alt=clapping width=19 height=23> <img src=/S/clapping.gif border=0 alt=clapping width=19 height=23>

    I talked to the user and I was making it too complicated. They need to delete duplicate rows where ALL the cells in a given row match 100% of the cells in any previous row.

    They tried doing it with MapPoint and it worked but took 30 minutes for each data set (which contains 100,000 data ponts) and there are 120 data sets. (It was on a slow 400MHz PC with 15GB disk).

    So, back to the beginning...If my data is

    row1: AA, BB, AA, DD
    row2: AA, DD, CC, AA
    row3: BB,CC,AA, DD
    row4: AA, DD, CC, AA

    only row 4 would be deleted.

    Now I wonder if there is a formula to do this as well. A VBA solution would be fine too.

    Thanks all for the time spent to figure this one out.
    Deb <img src=/S/bow.gif border=0 alt=bow width=15 height=15>

  8. #8
    5 Star Lounger jujuraf's Avatar
    Join Date
    Jun 2001
    Location
    San Jose, California, USA
    Posts
    1,061
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: deleting duplicate data in many rows (Excel 2000)

    Hey I like that! Cool. I did try it and it seemed to work but now I find that my requirement is not as complicated as I had thought. See my updated post which says only the entire row needs to be a duplicate of a previous row before it's deleted.

    Thanks a lot. Deb <img src=/S/groovin.gif border=0 alt=groovin width=21 height=21>

  9. #9
    Uranium Lounger
    Join Date
    Jan 2001
    Location
    South Carolina, USA
    Posts
    7,295
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: deleting duplicate data in many rows (Excel 2000)

    OK, try this code:

    <pre>Public Sub DeleteDeptNums()
    Dim lLastRow As Long, lLastCol As Long, I As Long, J As Long, K As Long
    lLastRow = Worksheets("Sheet1").Range("A65536").End(xlUp).Row - 1
    lLastCol = Worksheets("Sheet1").UsedRange.Columns.Count - 1
    For I = 0 To lLastRow - 1
    For J = lLastRow To I + 1 Step -1
    For K = 0 To lLastCol
    If Worksheets("Sheet1").Range("A1").Offset(I, K).Value <> _
    Worksheets("Sheet1").Range("A1").Offset(J, K).Value Then
    Exit For
    End If
    Next K
    If K > lLastCol Then
    Worksheets("Sheet1").Range("A1").Offset(J, 0).EntireRow.Delete
    End If
    Next J
    Next I
    End Sub
    </pre>

    Legare Coleman

  10. #10
    Silver Lounger
    Join Date
    Mar 2001
    Location
    Springfield, Ohio, USA
    Posts
    2,136
    Thanks
    0
    Thanked 1 Time in 1 Post

    Re: deleting duplicate data in many rows (Excel 2000)

    Deb, you don't need VBA to do this, just use the Data | Filter | Advance Filter menu and use a blank criteria and check the box "Unique records only." HTH --Sam
    <font face="Comic Sans MS">Sam Barrett, CACI </font face=comic>
    <small>And the things that you have heard... commit these to faithful men who will be able to teach others also. 2 Timothy 2:2</small>

  11. #11
    5 Star Lounger jujuraf's Avatar
    Join Date
    Jun 2001
    Location
    San Jose, California, USA
    Posts
    1,061
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: deleting duplicate data in many rows (Excel 2000)

    Hey that actually works! This is great. I obviously have never look at that menu before. Thanks for filling me in.

    Deb <img src=/S/fanfare.gif border=0 alt=fanfare width=31 height=23>

  12. #12
    5 Star Lounger jujuraf's Avatar
    Join Date
    Jun 2001
    Location
    San Jose, California, USA
    Posts
    1,061
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: deleting duplicate data in many rows (Excel 2000)

    Oopps, I spoke too soon or I'm not using this feature right. <img src=/S/doh.gif border=0 alt=doh width=15 height=15>

    I entered this data in A17
    AA BB AA DD
    DD DD AA BB
    AA BB AA DD
    ZZ HH KK LL
    DD DD AA BB

    I selected the area, did Data/Filter. Advanced Filter. I checked 'unique records' only and did the blank criteria (it prompts for a cell so I selected an empty cell).

    I got this:
    AA BB AA DD
    DD DD AA BB
    AA BB AA DD
    ZZ HH KK LL

    So it did not remove duplicates, I still have two copies of "AA BB AA DD"

    What did I do wrong?

    AAH, I did it again and found out it ONLY works on numbers, not text!!! I must have headers on each column.

    When I changed the data to numbers with text labels it worked:

    A, B, C, D <---- labels
    11,22,33,44
    44,33,11,22
    11,22,33,44
    77,88,99,77
    11,22,33,44
    77,88,99,77

    This created:
    11,22,33,44
    44,33,11,22
    77,88,99,77

    which is correct. My data is a mix of text and numbers so I'll have to figure a way to turn it into numbers.

    This all helps though... At least I have a start on the solution (and there's always the code another lounger came up with).

    Deb <img src=/S/bash.gif border=0 alt=bash width=35 height=39>

  13. #13
    Silver Lounger
    Join Date
    Mar 2001
    Location
    Springfield, Ohio, USA
    Posts
    2,136
    Thanks
    0
    Thanked 1 Time in 1 Post

    Re: deleting duplicate data in many rows (Excel 2000)

    It works on everything, it's just tough to figure out the first time. To make it easier, we will use two sheets, one for the original data and one for the criteria.

    First, make sure that your data has a row of headers, one for each column. You can just make them up: C1, C2, etc. if you want.

    Next, copy the row of headers and paste it on sheet 2, cell A1.

    Now, you are ready to do the Advanced filter. Click in the original table and do Data | Filter | Advanced Filter. Excel tries to read your mind, so hopefully it will already have you original data selected and entered in the list range. If not, enter it there.

    Now for the criteria range, select the first row of headers and a blank row beneath the headers. Finally, check the "unique records only," and press OK.

    Sometimes, Excel cannot figure out your header row, if you get this alert, just press OK to use the first row as headers.

    Since this was done in place, Excel has hidden the duplicate rows, so, if it looks OK, you can do Edit | GoTo | Special, Visible cells only, copy this data and paste it on another worksheet. If you don't like the filtered data, you can do a Data | Filter | Show All. HTH --Sam
    <font face="Comic Sans MS">Sam Barrett, CACI </font face=comic>
    <small>And the things that you have heard... commit these to faithful men who will be able to teach others also. 2 Timothy 2:2</small>

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •