Results 1 to 14 of 14
Thread: Compare Lists

20130517, 08:51 #1
 Join Date
 Jan 2001
 Location
 West Long Branch, New Jersey, USA
 Posts
 1,935
 Thanks
 6
 Thanked 9 Times in 7 Posts
Compare Lists
Hi All,
I've searched for various ways of comparing lists. What I found a lot of is use of Conditional Formatting to compare 2 lists in a "static" way. What I need to do is a bit different.
Attached is a simple spreadsheet showing what I need.
The sheet "Group Rosters" has 4 lists of groups, each group having several names (Name1, Name2, etc.). A Name can belong to one or more lists.
The sheet "Sked" has 2 possible ways of arranging the groups. A group is shaded red (done manually here) to indicate there is at least 1 Namex in a group that is also in the next group; a group is shaded green (or can be left unshaded) to indicate there are no names in common between that group and the next group. It doesn't matter if the shading is done for a group and the next group, or a group and the previous group. I chose the former here; hence the last group will never be shaded since no group follows it.
For this application, groups can be rearranged to ensure no overlap (or minimal overlap, recognizing there may be no arrangement of groups that result in no overlap).
The sheet "Compare Groups" was also generated manually to show the overlap of Name's between any two groups but should be generated automatically (and this may be somewhere on the Internet although I didn't see it). This may help to rearrange groups to ensure the minimum overlap. If there was some way of generating a list of groups that ensures minimal overlap, that would be great. However, there can be 40 or 50 groups, so I'm not holding my breath for this.
It would be ideal to have some way of shading the groups on the "Sked" worksheet using Conditional Formatting rather than using a macro. The person actually using this will not know much Excel (right now, she's doing this all manually). Therefore, I also added a check on the "Sked" sheet to show that an arrangement of groups has no duplications of groups (eg, she may rearrange groups to avoid overlaps but forget to change a group). If there are duplications, having the number of duplications would be useful (under limited circumstances, a duplication may be ok but that's more the exception than the rule).
TIA
Fred

20130517, 10:48 #2
 Join Date
 Jul 2002
 Location
 Pittsburgh, Pennsylvania, USA
 Posts
 11,225
 Thanks
 14
 Thanked 342 Times in 335 Posts
If you create names for the "groups" in the 'Group Roster' sheet:
Group1 ='Group Rosters'!$A$1:$A$4
Group2 ='Group Rosters'!$B$1:$B$3
Group3 ='Group Rosters'!$C$1:$C$3
Group4 ='Group Rosters'!$D$2:$D$4
Then in 'Compare Groups' you can create the ARRAY formula (confirm with ctrlshiftenter) in B2:
=IF($A2=B$1,"",SUM(ISNUMBER(MATCH(INDIRECT($A2),INDIRECT(B$1),0 ))*1))
This can be copied down the Bcolumn and the B column copied to the other columns to calculate the number for each comparison.
In the 'sked' sheet you can use conditional formuatting. Select A1:C4, home  conditional formatting New rule  use a formula...
nter the formula:
=SUM(ISNUMBER(MATCH(INDIRECT(A1),INDIRECT(A2),0))* 1)>0
Then [formal] fill  select red  okok
Then you can select A1:A4, home format painter  select C1:C4 and the format will color it red as well.
You can add the Green formatting if desired by changing the formula to:
=SUM(ISNUMBER(MATCH(INDIRECT(A1),INDIRECT(A2),0))* 1)=0
and formatting it green.
Steve

20130517, 11:43 #3
 Join Date
 Jan 2001
 Location
 West Long Branch, New Jersey, USA
 Posts
 1,935
 Thanks
 6
 Thanked 9 Times in 7 Posts
Steve,
Thanks a million.
If I spent an eternity, I probably would still have not come up with your solution. And if I did almost get it, I probably would not have had the *1 at the end of the array formulas and conditional formatting. So why is that necessary?
Another question: you mention that "In the 'sked' sheet, you can use...Select A1:C4, home..." But later you mention "select A1:A4, ...format painter...C1:C4...". Since I'm using the format painter, shouldn't the first C4 be just A4 and the format painter will take care of column C?
I am using Excel 2003 (the real user may actually be using Excel on a MAC). I know that 2007 onwards did a lot on conditional formatting (beyond just the # of conditions). So I don't know if 2007 (or MAC) would allow me to count the number of cells shaded red in a column so I can give the "No Dups" or "x Dups" warning at the bottom of the column of the "Sked" sheet. Assuming it's not allowed (certainly not in 2003), how would I do this? Note that columns can have a different number of groups (up to the 40 or 50 I mentioned in my original post); I just happened to show 2 columns each with 4 groups. I'm thinking an array formula is needed with use of the ROW function to see if any of the adjacent row pairs have duplicates.
Thanks again.
Fred

20130517, 15:53 #4
 Join Date
 Jul 2002
 Location
 Pittsburgh, Pennsylvania, USA
 Posts
 11,225
 Thanks
 14
 Thanked 342 Times in 335 Posts
I probably would not have had the *1 at the end of the array formulas and conditional formatting. So why is that necessary?
Another question: you mention that "In the 'sked' sheet, you can use...Select A1:C4, home..." But later you mention "select A1:A4, ...format painter...C1:C4...". Since I'm using the format painter, shouldn't the first C4 be just A4 and the format painter will take care of column C?
I am using Excel 2003 (the real user may actually be using Excel on a MAC). I know that 2007 onwards did a lot on conditional formatting (beyond just the # of conditions). So I don't know if 2007 (or MAC) would allow me to count the number of cells shaded red in a column so I can give the "No Dups" or "x Dups" warning at the bottom of the column of the "Sked" sheet.
Perhaps:
=SUM(ISNUMBER(MATCH(INDIRECT(A1),INDIRECT(A2),0))* 1)+SUM(ISNUMBER(MATCH(INDIRECT(A2),INDIRECT(A3),0) )* 1)+SUM(ISNUMBER(MATCH(INDIRECT(A3),INDIRECT(A4),0) )* 1)&" Duplicates"
Assuming it's not allowed (certainly not in 2003), how would I do this? Note that columns can have a different number of groups (up to the 40 or 50 I mentioned in my original post); I just happened to show 2 columns each with 4 groups. I'm thinking an array formula is needed with use of the ROW function to see if any of the adjacent row pairs have duplicates.
Steve

20130517, 16:51 #5
 Join Date
 Jan 2001
 Location
 West Long Branch, New Jersey, USA
 Posts
 1,935
 Thanks
 6
 Thanked 9 Times in 7 Posts
The *1 is to ensure that the result is a number. Isnumber gives a true/false which when multiplied by 1 gives a 1/0 respectively. I thought you wanted to do it with cond. formatting to not have to manually do it.
Yes. Originally I was doing the cond format on A1:C4, but since B1:B4 are not formatted I changed the scheme to A1:A4 then copy format to C1:C4, so it should be A1:A4
The cond format will work in XL2003 (I have no experience with MACs so can't comment on that. Without user defined functions, colors whether explict or cond formatted can not be counted. You will have to calculated based on the original data. What is "X" and how would you count it?
Perhaps:
=SUM(ISNUMBER(MATCH(INDIRECT(A1),INDIRECT(A2),0))* 1)+SUM(ISNUMBER(MATCH(INDIRECT(A2),INDIRECT(A3),0) )* 1)+SUM(ISNUMBER(MATCH(INDIRECT(A3),INDIRECT(A4),0) )* 1)&" Duplicates"
But if there is a way to calculate whether there are dups in the column, X can be
 the fact that there is or is not dups between successive groups somewhere in the col (a "yes" or "no"); so both col A and C in the original attachment would show "yes" even tho there are different number of dups between successive groups
 the number of successive groups that have dups in their respective lists. So if the order of the groups is per col A, this "check" would show "2 dups"; if the arrangement is per col C, it would show "1 dups" (don't care about grammar); if no successive groups had dups, show "0 dups". This is preferable in that it alerts the user as to how many successive groups to check.
Thks.
Fred

20130517, 19:46 #6
 Join Date
 Jul 2002
 Location
 Pittsburgh, Pennsylvania, USA
 Posts
 11,225
 Thanks
 14
 Thanked 342 Times in 335 Posts
If there are 50 groups, I would do the array calculations on the 'compare groups' sheet and then just see if the sum of the combinations of interest are zero.
Steve

20130518, 00:17 #7
 Join Date
 Jan 2001
 Location
 West Long Branch, New Jersey, USA
 Posts
 1,935
 Thanks
 6
 Thanked 9 Times in 7 Posts
Steve  I'm not sure I follow the suggestion.
The "compare groups" sheet has theoretical values: if Group2 follows Group1, then the value at the intersection shows the number of duplicates between the 2 groups. But the arrangement in the "sked" sheet does not have to have Group1 and Group2 consecutively  maybe Group3 is between those 2 groups.
Maybe I'm missing something in the last part of your suggestion? Are you suggesting adding a column in the "sked" sheet that records for each row the value from the "compare groups" sheet that contains that row's Group# and the following row's Group#, then adding the new column to see if it's 0 or >0? Could even hide the column and put a "verdict" at the bottom of the col with the Group#'s?
Fred

20130518, 05:00 #8
 Join Date
 Jul 2002
 Location
 Pittsburgh, Pennsylvania, USA
 Posts
 11,225
 Thanks
 14
 Thanked 342 Times in 335 Posts
The compare groups has every combination. It will be zero if there are no duplicates.
If you are going to put an order down, In addition to using cond format to highlight the ones with duplicates, you could just calculate each pair in order and sum this column up...
Steve

20130518, 07:35 #9
 Join Date
 Jan 2001
 Location
 West Long Branch, New Jersey, USA
 Posts
 1,935
 Thanks
 6
 Thanked 9 Times in 7 Posts
Yes  since "compare groups" has every combination, it will be zero if there are no duplicates.
But it can be nonzero and a particular order can still have no duplicates whereas a different order does have duplicates.
So what I thought you were saying in the previous post is what I did and what you said in your latest post: add a column next to the group and use your SUM(ISNUMBER... formula to calculate the number for each pair and then this up. I could have gotten the value from the "compare groups" sheet by looking up the pair from its col/row entry but just decided to use the formula.
Thanks for the help. I've shown it to the person who'll be using it and she was quite happy. Now I just need to expand it to fit the actual usage.
Fred

20130607, 21:14 #10
 Join Date
 Jan 2001
 Location
 West Long Branch, New Jersey, USA
 Posts
 1,935
 Thanks
 6
 Thanked 9 Times in 7 Posts
Hi,
Picking up from where the thread was about 3 weeks ago, I've made some improvements over what I posted back then (see attachment). But there are still a few issues:
 On the "Compare Groups" sheet, as Steve mentioned in post #8, I calculate, for a given class, whether there are duplicate student names for that class compared to all classes (see the "Class Rosters" sheet). But this makes use of a named Range of the form "classn" where "n" is a unique number assigned for each class. In the example I posted originally, there were only 4 classes. But in the "real" case, there can be about 5060 classes; so I've allowed room for 72 classes just to be sure. However, creating 72 Range Names will be tedious. Is there some way to go about finding the number of duplicate names for a pair of classes that does NOT use Range Names (or VBA). I've started to play around with some formulas just to get the names of the students in the first class (see first INDIRECT in formula for cell E5 for example). See my Note in cols TW for what I've done so far.
 the Range Names of classn were defined specifically for the #rows that each class had (see post #2), including the header row. That is not useful either, so I redefined the Names to start with the actual first row for students and end with the max # students allowed  (max 30 students but probably no more than 20 students/class). See sheet "Class Rosters". While probably not the end of the world, the formula that computes whether there are duplicates in a pair of classes (again, see cell E5 of the "Compare Groups" sheet) uses 2 INDIRECTS. The first INDIRECT returns all the student names in the first class to be compared to the student names of the second class (returned by the second INDIRECT). So if one class has 10 students while the other has 5 students, the formula still compares 30 "names" for the first class with 30 "names" for the second class  the rows without names return 0 for each list. Would it be useful (more efficient, etc.) to define the Range Name for each class (or other mechanism per above) to be dynamic so as to only look at the number of names in each class? If so, how would this be done? Again, I've made some notes in the "Class Rosters" sheet on things I've tried so far.
As a side question of the way the duplicates are calculated (again see E5 of "Compare Groups"), the first INDIRECT returns 0 for rows that are blank in the "Class Rosters" sheet as does the second INDIRECT. How come the 0 from the first INDIRECT is not matched to a 0 from the second INDIRECT? Instead the MATCH is returning an #N/A error, which is fine for the way things need to work with the ISNUMBER.
 Lastly, is there a way in the "Compare Groups" sheet to only have as many columns and rows for classes as there are actual class names assigned in the "Class Names" sheet? No VBA allowed. I'm imagining all kinds of IF statements with Conditional Formatting but not sure about how to actually do this.
TIA
Fred

20130608, 06:23 #11
 Join Date
 Jul 2002
 Location
 Pittsburgh, Pennsylvania, USA
 Posts
 11,225
 Thanks
 14
 Thanked 342 Times in 335 Posts
It can be done without named ranges if you are going to have a roster sheet like you have. You can put the array formula (confirm with ctrlshiftenter) in ClassbyClass Dupl Comparison! D5:
=SUM(ISNUMBER(MATCH(OFFSET('Class Rosters'!$A$1,2,$B5,COUNTA(INDEX('Class Rosters'!$B$3:$E$32,0,$B5)),1),OFFSET('Class Rosters'!$A$1,2,D$3,COUNTA(INDEX('Class Rosters'!$B$3:$E$32,0,D$3)),1),0))*1)
Then copy it down the rows and then copy that column across the columns.
The red "2" in the formula is because you have 2 headers in Class Roster sheet. It can be adjusted accordingly.
I assume by "Compare Groups" sheet you mean the ClassbyClass Dupl Comparison sheet. There is no way to adjust without VBA. You can "mask" the items with conditional formatting.
Select the range A1:R19.
Add the cond formatting formula:
=OR(A$3>COUNTA('Class Names'!$B$2:$B$73),$B1>COUNTA('Class Names'!$B$2:$B$73))
[Format]
Fonttab, color = white
Bordertab = None
Fill tab = No Color
[ok][ok][ok]
It could be simplified by naming the "ClassRoster" range ('Class Rosters'!$B$3:$E$32) which has all the names in it. The "2" in the formula (shown as red) could also be a range name (no brackets):[= Row('Class Rosters'!$B$3:$E$32)1]
Note: you could, of course, use its range name instead if named (no brackets), for example: [=row(ClassRosters)1]
SteveLast edited by sdckapr; 20130610 at 11:25. Reason: Corrected formula

20130610, 11:15 #12
 Join Date
 Jan 2001
 Location
 West Long Branch, New Jersey, USA
 Posts
 1,935
 Thanks
 6
 Thanked 9 Times in 7 Posts
Thanks a million Steve.
I think the two $B21 refs in your formula should have been $B5, but I fixed that.
I got the red 2 in your formula. I decided a way around worrying about that (i.e., if header rows are added or deleted) is to name the first cell containing a student name of the "Class_Rosters" sheet with something like first_student and then use ROW(first_student) to always know the row needed. So the first part of the OFFSET becomes OFFSET(first_student, ROW(first_student), ...) with corresponding adjustments. I think this was what you were referring to in your last 2 paragraphs.
I also decided to add a row for the count of the #students per class to the Class_Rosters using COUNTA. So the COUNTA(INDEX...) part became a little simpler (just an OFFSET to the correct column in that new row. But I did like the use of COUNTA(INDEX(...,0,...)) to return all the entries in the corresponding col  something to keep in mind for future use.
Yes, you did assume correctly on the switch in names of the last sheet. I thought I revised the name of that sheet from my original attachment but it looks like the original name stuck in the 2nd attachment.
As to just showing the number of rows/columns equal to the number of classes actually defined, I haven't decided if I just want to have the headers for the rows/columns with blanks/zeros in the table or do some "fancy" Conditional Formatting as you suggested. What I may do is put on the "Compare Groups" sheet the # classes actually defined from the "Class Names" sheet (which I now have also) so the user knows not to look at "garbage." In other cases, I also have done something like:
 select the row for a # columns equal to the max # classes allowed (72 in this case)
 enter the following array formula for the entire selection:
{=IF(column() <= nbr_classes, "header_info", "")}
 where column() is adjusted to account for the headers at the left  something like column()3; header_info is something that may involve some calculation depending on the col#
 do similarly for rows
But I'd still need Condtional Formatting if I want a border to appear/not appear depending on if the class is actually defined or not.
So all is left is my "side" question from the 4th paragraph in my post #10: why did the MATCH work in the original formula if both lists returned a "0" for rows where there was no name entered.
Thanks again.
Fred

20130610, 11:30 #13
 Join Date
 Jul 2002
 Location
 Pittsburgh, Pennsylvania, USA
 Posts
 11,225
 Thanks
 14
 Thanked 342 Times in 335 Posts
Yes you were correct about the formula. I copied one from the wrong row. i corrected the formula.
As to your "side question", your formula is:
=SUM(ISNUMBER(MATCH(INDIRECT("class"&$B5),INDIRECT ("class"&D$3),0))*1)
If the range is blank, MATCH does not find a match and so gives a #N/A error. The ISNUMBER yields a FALSE for the error and the sum of the FALSEs add to zero (in excel, FALSE=0 and TRUE = 1 when converted to numbers).
Steve

20130610, 11:57 #14
 Join Date
 Jan 2001
 Location
 West Long Branch, New Jersey, USA
 Posts
 1,935
 Thanks
 6
 Thanked 9 Times in 7 Posts
Thks for the response on the side question.
I was looking at the results of the formula by using F9 while selecting a part of the formula in the Formula Bar. That showed 0's for the result of the INDIRECT for the entire range for both INDIRECT's. Seemed like there should have been a match. Of course, the real range had blanks, hence the #N/A. So that was a point of confusion for me. I did get the rest of the formula with the ISNUMBER, etc.
Thanks for all the help. I think this project is done as far as the Lounge goes.
Fred