Results 1 to 12 of 12
  1. #1
    New Lounger
    Join Date
    Jun 2009
    Location
    Brisbane, Australia
    Posts
    18
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Hoping someone can help. I have a spreadsheet that contains about 400,000 rows of data (text) and multiple columns. I have a list of about 100 search terms that I would like to run over the data in one column, to identify and then extract a copy of the relevent rows into a separate worksheet. An example would be a list of 400,000 names (firstname, middle names and lastnames in the same cell), where I would like to find and extract a copy of all rows containing the names John; Mary; Fred; Jane regardless of where in the cell contents they appear.

    This is a work related task, and in my paranoid organisation, access to VB is denied, so no chance of using a script. I was hoping for a formula (or multiple formulae) or macro(s). Any suggestions gratefully received.

    Thanks in anticipation.

  2. #2
    Plutonium Lounger
    Join Date
    Mar 2002
    Posts
    84,353
    Thanks
    0
    Thanked 29 Times in 29 Posts
    Welcome to the Lounge!

    Could you post a small sample workbook with some dummy data so that we have an idea what your data look like? Thanks in advance.

  3. #3
    New Lounger
    Join Date
    Jun 2009
    Location
    Brisbane, Australia
    Posts
    18
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Hi Hans (and any others with suggestions/solutions),

    I have uploaded some dummy data (sorry using Excel 2003 not 2007 and obviously not a file with 800,000 rows. I would want to run a list of about 100 search terms over the contents of column A to identify and then copy the rows with the valid data to a new worksheet.

    Option 1: In the sample data, I would be looking to identify every instance of the names John, Jane, Alice and Henry no matter where the name appears in the contents of each cell, then copy those rows to another sheet. That would be the optimal result.

    Option 2: If that is not possible, then a formula that approximates this logic: If the contents of colum A contains "john" or "Jane" or etc etc then return the value of yes or no to column B. I can then filter and copy the relevant rows to a new sheet.

    In the real worksheet, there are 800,000 rows of data in 3 columns, and I have approximately 100 search terms to run over the data. This is something that is likely to need to be repeated periodically, using different search terms, hence my desire to do multiple terms in one go.

    Any suggestions gratefully received.
    Attached Files Attached Files

  4. #4
    Plutonium Lounger
    Join Date
    Mar 2002
    Posts
    84,353
    Thanks
    0
    Thanked 29 Times in 29 Posts
    I don't know whether this will work with a list of 100,000+ rows (I suspect it'll be excruciatingly slow), but you could try the following:
    - In B1, enter a column heading such as Occurs.
    - In B2, enter the following array formula (confirm with Ctrl+Shift+Enter):

    =SUM(1-ISERROR(SEARCH({"John","Mary","Fred","Jane"},A2)))

    - Fill down as far as needed (for example by double-clicking the fill handle in the lower right corner of B2).
    - You can now apply AutoFilter or Advanced filter to select the rows where Occurs is >0, and copy them elsewhere.

  5. #5
    WS Lounge VIP sdckapr's Avatar
    Join Date
    Jul 2002
    Location
    Pittsburgh, Pennsylvania, USA
    Posts
    11,225
    Thanks
    14
    Thanked 342 Times in 335 Posts
    With 100 items and 800,000 rows I agree that Hans' solution will be excrutiatinly slow. I think a faster method may be more the brute approach.

    Make 100 columns each with the term as the header. Create the isnumber search functions for each and an OR to combine them (see attached). You can filter on the Any column or the individual. This will still have many functions and be a little slow, but should be faster than the Array formulas.

    Steve
    Attached Files Attached Files

  6. #6
    New Lounger
    Join Date
    Jun 2009
    Location
    Brisbane, Australia
    Posts
    18
    Thanks
    0
    Thanked 0 Times in 0 Posts
    [quote name='HansV' post='781074' date='22-Jun-2009 16:15']I don't know whether this will work with a list of 100,000+ rows (I suspect it'll be excruciatingly slow), but you could try the following:
    - In B1, enter a column heading such as Occurs.
    - In B2, enter the following array formula (confirm with Ctrl+Shift+Enter):

    =SUM(1-ISERROR(SEARCH({"John","Mary","Fred","Jane"},A2)))

    - Fill down as far as needed (for example by double-clicking the fill handle in the lower right corner of B2).
    - You can now apply AutoFilter or Advanced filter to select the rows where Occurs is >0, and copy them elsewhere.[/quote]

  7. #7
    New Lounger
    Join Date
    Jun 2009
    Location
    Brisbane, Australia
    Posts
    18
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Hi Hans,

    Thanks - I tried your formula and it worked for stage one of my problem. I also used this =OR(ISNUMBER(SEARCH("john",A1)),ISNUMBER(SEARCH("j ane",A1)),ISNUMBER(SEARCH("alice",A1)))*1...etc, and it worked too - for the first stage.

    Stage two of the problem is identifying where the incidences of john; jane etc appear in column A so that I can then extract those to another worksheet. I have no preference for how things are identified. To try and give you a better picture, column A has 800,000 names. The result of using either your formula or the one above is a new list of for example 5000 full names. I now need to find where in the 800,000 list, each of the 100 search terms occurs (by cell reference, colour - anything that anyone can think of).

    My apologies if this is not really making sense, but I cannot reference the real data due to secrecy concerns, so I am trying to come up with dummy data, which is not as easy as I hoped it might be.

    Thanks again

  8. #8
    New Lounger
    Join Date
    Jun 2009
    Location
    Brisbane, Australia
    Posts
    18
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Hi,

    Thanks - interesting take - might help me with Stage 2, which is locating where in the 800,000, each of the successful hits occurs, so that they can be extracted to another sheet or workbook (see my response to Hans for a more comprehensive explanation). I will try your approach tomorrow at work, and let you know how I go.

    Oh and slow is a always a relative thing...manual edit find was the only solution my IT section could come up with...that and telling me it could not be done...tragic really.

    Thanks for taking the time to help out.

    Wyldchild

  9. #9
    Plutonium Lounger
    Join Date
    Mar 2002
    Posts
    84,353
    Thanks
    0
    Thanked 29 Times in 29 Posts
    You could use conditional formatting to highlight the rows where the formula (whichever one you use) returns a value > 0.

  10. #10
    New Lounger
    Join Date
    Jun 2009
    Location
    Brisbane, Australia
    Posts
    18
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Post

    [quote name='HansV' post='781195' date='23-Jun-2009 06:30']You could use conditional formatting to highlight the rows where the formula (whichever one you use) returns a value > 0.[/quote]

    Hi,

    Thanks - I have tried adding conditional formatting and while slow and labour intensive, it does work.

    Another question for you - is it possible to write into the formula to change the colour of the text?? Remembering that I have a list of 5000 items I am trying to locate in a list of 800,000, where many of the 5,000 items appear multiple times.

    I've tried a number of options, but nothing has worked so far - and it's one of those puzzles that just wont let go - I keep trying to solve it while doing the manual slog work.

    Thanks again for your help

    Wyldchild

  11. #11
    Plutonium Lounger
    Join Date
    Mar 2002
    Posts
    84,353
    Thanks
    0
    Thanked 29 Times in 29 Posts
    You can't specify font formatting in a formula, and you can't color parts of the result of a formula. Sorry!

  12. #12
    New Lounger
    Join Date
    Jun 2009
    Location
    Brisbane, Australia
    Posts
    18
    Thanks
    0
    Thanked 0 Times in 0 Posts
    [quote name='HansV' post='781418' date='24-Jun-2009 04:55']You can't specify font formatting in a formula, and you can't color parts of the result of a formula. Sorry![/quote]

    Hi,

    Thanks for confirming that - I had been hoping to achieve something similar to showing negative numbers in red, but I will stop chasing that particular rabbit down the hole. You've saved me time and no doubt a bit of sanity.

    Cheers

    Wyldchild

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •