Results 1 to 12 of 12

20090618, 05:01 #1
 Join Date
 Jun 2009
 Location
 Brisbane, Australia
 Posts
 18
 Thanks
 0
 Thanked 0 Times in 0 Posts
Hoping someone can help. I have a spreadsheet that contains about 400,000 rows of data (text) and multiple columns. I have a list of about 100 search terms that I would like to run over the data in one column, to identify and then extract a copy of the relevent rows into a separate worksheet. An example would be a list of 400,000 names (firstname, middle names and lastnames in the same cell), where I would like to find and extract a copy of all rows containing the names John; Mary; Fred; Jane regardless of where in the cell contents they appear.
This is a work related task, and in my paranoid organisation, access to VB is denied, so no chance of using a script. I was hoping for a formula (or multiple formulae) or macro(s). Any suggestions gratefully received.
Thanks in anticipation.

20090618, 05:14 #2
 Join Date
 Mar 2002
 Posts
 84,353
 Thanks
 0
 Thanked 30 Times in 30 Posts
Welcome to the Lounge!
Could you post a small sample workbook with some dummy data so that we have an idea what your data look like? Thanks in advance.

20090622, 05:02 #3
 Join Date
 Jun 2009
 Location
 Brisbane, Australia
 Posts
 18
 Thanks
 0
 Thanked 0 Times in 0 Posts
Hi Hans (and any others with suggestions/solutions),
I have uploaded some dummy data (sorry using Excel 2003 not 2007 and obviously not a file with 800,000 rows. I would want to run a list of about 100 search terms over the contents of column A to identify and then copy the rows with the valid data to a new worksheet.
Option 1: In the sample data, I would be looking to identify every instance of the names John, Jane, Alice and Henry no matter where the name appears in the contents of each cell, then copy those rows to another sheet. That would be the optimal result.
Option 2: If that is not possible, then a formula that approximates this logic: If the contents of colum A contains "john" or "Jane" or etc etc then return the value of yes or no to column B. I can then filter and copy the relevant rows to a new sheet.
In the real worksheet, there are 800,000 rows of data in 3 columns, and I have approximately 100 search terms to run over the data. This is something that is likely to need to be repeated periodically, using different search terms, hence my desire to do multiple terms in one go.
Any suggestions gratefully received.

20090622, 15:15 #4
 Join Date
 Mar 2002
 Posts
 84,353
 Thanks
 0
 Thanked 30 Times in 30 Posts
I don't know whether this will work with a list of 100,000+ rows (I suspect it'll be excruciatingly slow), but you could try the following:
 In B1, enter a column heading such as Occurs.
 In B2, enter the following array formula (confirm with Ctrl+Shift+Enter):
=SUM(1ISERROR(SEARCH({"John","Mary","Fred","Jane"},A2)))
 Fill down as far as needed (for example by doubleclicking the fill handle in the lower right corner of B2).
 You can now apply AutoFilter or Advanced filter to select the rows where Occurs is >0, and copy them elsewhere.

20090622, 18:18 #5
 Join Date
 Jul 2002
 Location
 Pittsburgh, Pennsylvania, USA
 Posts
 11,225
 Thanks
 14
 Thanked 342 Times in 335 Posts
With 100 items and 800,000 rows I agree that Hans' solution will be excrutiatinly slow. I think a faster method may be more the brute approach.
Make 100 columns each with the term as the header. Create the isnumber search functions for each and an OR to combine them (see attached). You can filter on the Any column or the individual. This will still have many functions and be a little slow, but should be faster than the Array formulas.
Steve

20090623, 05:15 #6
 Join Date
 Jun 2009
 Location
 Brisbane, Australia
 Posts
 18
 Thanks
 0
 Thanked 0 Times in 0 Posts
[quote name='HansV' post='781074' date='22Jun2009 16:15']I don't know whether this will work with a list of 100,000+ rows (I suspect it'll be excruciatingly slow), but you could try the following:
 In B1, enter a column heading such as Occurs.
 In B2, enter the following array formula (confirm with Ctrl+Shift+Enter):
=SUM(1ISERROR(SEARCH({"John","Mary","Fred","Jane"},A2)))
 Fill down as far as needed (for example by doubleclicking the fill handle in the lower right corner of B2).
 You can now apply AutoFilter or Advanced filter to select the rows where Occurs is >0, and copy them elsewhere.[/quote]

20090623, 05:35 #7
 Join Date
 Jun 2009
 Location
 Brisbane, Australia
 Posts
 18
 Thanks
 0
 Thanked 0 Times in 0 Posts
Hi Hans,
Thanks  I tried your formula and it worked for stage one of my problem. I also used this =OR(ISNUMBER(SEARCH("john",A1)),ISNUMBER(SEARCH("j ane",A1)),ISNUMBER(SEARCH("alice",A1)))*1...etc, and it worked too  for the first stage.
Stage two of the problem is identifying where the incidences of john; jane etc appear in column A so that I can then extract those to another worksheet. I have no preference for how things are identified. To try and give you a better picture, column A has 800,000 names. The result of using either your formula or the one above is a new list of for example 5000 full names. I now need to find where in the 800,000 list, each of the 100 search terms occurs (by cell reference, colour  anything that anyone can think of).
My apologies if this is not really making sense, but I cannot reference the real data due to secrecy concerns, so I am trying to come up with dummy data, which is not as easy as I hoped it might be.
Thanks again

20090623, 05:41 #8
 Join Date
 Jun 2009
 Location
 Brisbane, Australia
 Posts
 18
 Thanks
 0
 Thanked 0 Times in 0 Posts
Hi,
Thanks  interesting take  might help me with Stage 2, which is locating where in the 800,000, each of the successful hits occurs, so that they can be extracted to another sheet or workbook (see my response to Hans for a more comprehensive explanation). I will try your approach tomorrow at work, and let you know how I go.
Oh and slow is a always a relative thing...manual edit find was the only solution my IT section could come up with...that and telling me it could not be done...tragic really.
Thanks for taking the time to help out.
Wyldchild

20090623, 06:30 #9
 Join Date
 Mar 2002
 Posts
 84,353
 Thanks
 0
 Thanked 30 Times in 30 Posts
You could use conditional formatting to highlight the rows where the formula (whichever one you use) returns a value > 0.

20090624, 04:50 #10
 Join Date
 Jun 2009
 Location
 Brisbane, Australia
 Posts
 18
 Thanks
 0
 Thanked 0 Times in 0 Posts
[quote name='HansV' post='781195' date='23Jun2009 06:30']You could use conditional formatting to highlight the rows where the formula (whichever one you use) returns a value > 0.[/quote]
Hi,
Thanks  I have tried adding conditional formatting and while slow and labour intensive, it does work.
Another question for you  is it possible to write into the formula to change the colour of the text?? Remembering that I have a list of 5000 items I am trying to locate in a list of 800,000, where many of the 5,000 items appear multiple times.
I've tried a number of options, but nothing has worked so far  and it's one of those puzzles that just wont let go  I keep trying to solve it while doing the manual slog work.
Thanks again for your help
Wyldchild

20090624, 04:55 #11
 Join Date
 Mar 2002
 Posts
 84,353
 Thanks
 0
 Thanked 30 Times in 30 Posts
You can't specify font formatting in a formula, and you can't color parts of the result of a formula. Sorry!

20090624, 05:21 #12
 Join Date
 Jun 2009
 Location
 Brisbane, Australia
 Posts
 18
 Thanks
 0
 Thanked 0 Times in 0 Posts
[quote name='HansV' post='781418' date='24Jun2009 04:55']You can't specify font formatting in a formula, and you can't color parts of the result of a formula. Sorry![/quote]
Hi,
Thanks for confirming that  I had been hoping to achieve something similar to showing negative numbers in red, but I will stop chasing that particular rabbit down the hole. You've saved me time and no doubt a bit of sanity.
Cheers
Wyldchild