Results 1 to 7 of 7

20011211, 20:39 #1
 Join Date
 Aug 2001
 Posts
 11
 Thanks
 0
 Thanked 0 Times in 0 Posts
huge duplicate problem (2000 sr1)
Hi!
I have a spreadsheet with 21 columns of data. I need to discover a pattern of duplication within 37000 records. I've exported this data in chunks from Access (too big to do all at once) using a duplicate query with 2 of the columns used as criteria.
Now I need to know how many of each duplicate I have in my Excel spreadsheet. For a record to be unique, each of the 21 columns must check out.
Other solutions I've seen involve sorting (I don't think I can sort on 21 columns at a time) and pivot tables, which can't handle all the columns I'm trying to analyze at once.
I'm thinking maybe a programming solution is the way to go, or maybe a formula, but I'm not sure where to begin with either.
Many thanks in advance for any help you can give me!
<img src=/S/exclamation.gif border=0 alt=exclamation width=15 height=15>

20011211, 21:14 #2
 Join Date
 Jun 2001
 Location
 New York, New York, Lebanon
 Posts
 1,449
 Thanks
 1
 Thanked 1 Time in 1 Post
Re: huge duplicate problem (2000 sr1)
elisap
I would <font color=red> STRONGLY </font color=red> suggest you use SQL for this kind of a deal. Check what Access can do for you.
If by all means you don't have a strong SQL helper, Like I have in Satish <img src=/S/bravo.gif border=0 alt=bravo width=16 height=30>, do the following:
1) Use the Filter command on all 21 Columns. Filter each column and copy the first record, if there are more than one, of the dataset to another worksheet. The you will have to do all 21 columns to get to the unique records.
2) Sort by the first 3 columns. Make a column with numbers 1,2,3,4,...37000
Sort by the next 3 columns, Make another column with numbers 1,2,3,4,...37000
Sort by the next 3 columns, Make another column with numbers 1,2,3,4,...37000
and so on. When you are done 7 times over add the numbers in the new columns, and number 7 should be your first record and number 777000 will be your last record.
I really hope you use SQL.
Wassim<img src=/S/compute.gif border=0 alt=compute width=40 height=20> in the <img src=/S/bagged.gif border=0 alt=bagged width=22 height=22>

20011211, 23:04 #3
 Join Date
 Jan 2001
 Location
 South Carolina, USA
 Posts
 7,295
 Thanks
 0
 Thanked 0 Times in 0 Posts
Re: huge duplicate problem (2000 sr1)
The code below might give you a start on what you want to do. It will look down the rows in Sheet1 for duplicates. It will put the number of duplicates (not counting the first row of the set of duplicates) in column V of the first row of each set of duplicates (zero if there are no duplictes). For all of the duplicate rows, it will put the row number of the first row of the set of duplicates into column W.
<pre>Public Sub FindDups()
Dim I As Long, J As Long, K As Long, lMax As Long, lCount As Long
With Worksheets("Sheet1")
lMax = .Range("A65536").End(xlUp).Row
For I = 0 To lMax  1
.Range("A1").Offset(I, 21).Value = ""
.Range("A1").Offset(I, 22).Value = ""
Next I
For I = 0 To lMax  1
lCount = 0
If .Range("A1").Offset(I, 22) = "" Then
For J = I + 1 To lMax  1
If .Range("A1").Offset(J, 22) = "" Then
For K = 0 To 20
If .Range("A1").Offset(I, K) <> .Range("A1").Offset(J, K) Then
Exit For
End If
Next K
If K > 20 Then
lCount = lCount + 1
.Range("A1").Offset(J, 22).Value = I + 1
End If
End If
Next J
.Range("A1").Offset(I, 21).Value = lCount
End If
Next I
End With
End Sub
</pre>
Legare Coleman

20011211, 23:36 #4
 Join Date
 Feb 2001
 Location
 Dallas plus 20 miles or so, Texas, USA
 Posts
 876
 Thanks
 0
 Thanked 0 Times in 0 Posts
Re: huge duplicate problem (2000 sr1)
Well, this may not be what you are looking for, but I gathered from your post that each of the 21 columns constitute part of a record (row). If you are checking out rows to see which row is unique you might try something similar to the following:
<pre>=CONCATENATE(A1,B1,C1,D1,E1,F1,G1,H1)
</pre>
If you put that in a new workbook, say in J1, it will add the contents of A1 to H1 as text. If I am thinking correctly, a unique record would have to have the same characters in the same order. You might then sort on this column instead of all 21.
HTH,

20011212, 03:00 #5
 Join Date
 Feb 2001
 Location
 Youngstown, Ohio, USA
 Posts
 705
 Thanks
 1
 Thanked 0 Times in 0 Posts
Re: huge duplicate problem (2000 sr1)
That's part of the solution that I was thinking of, but you may want to take it another step. Sort on the concatenated field (think of it as a huge index field <img src=/S/smile.gif border=0 alt=smile width=15 height=15>) and then do an IF comparison:
<pre>If(j2=j1,"duplicate","unique")</pre>
Use the autofilter to select all records that are flagged as a "unique". Copy this selection to a new workbook.

20011212, 03:32 #6
 Join Date
 Jan 2001
 Location
 Perth, Western Australia, Australia
 Posts
 190
 Thanks
 0
 Thanked 0 Times in 0 Posts
Re: huge duplicate problem (2000 sr1)
It could be done with an array formula like
=sum((A1=(range of column a))*(B1=(range of column [img]/forums/images/smilies/cool.gif[/img])*(C1=(range of column C))* etc for the whole 31 columns ).
Dont forget to use controlshiftenter when you enter the formula as Excel will not recognise it as an array formula.
The formula will return the number of time that the row is duplicated. I have assumed that the data commences is in Cell A1, you will have to adjust it to suit.
Enter the formula in the top row. Make sure that the the ranges are absolute references and the cell reference is relative. Then just doubleclick the bottom right corner of the cell for the formula to autofill down the sheet.
it may take a minute or two to calculate so be patient.

20011212, 14:18 #7
 Join Date
 Aug 2001
 Posts
 11
 Thanks
 0
 Thanked 0 Times in 0 Posts
Re: huge duplicate problem (2000 sr1)
Thank you so much for your help. I think I'm going to try the concatenation, sort, comparison approach first since I understand it. The macro _almost_ makes sense, as does the array formula. I'll look into them further if concatenation doesn't do the trick.
Really appreciate it!