Results 1 to 6 of 6
  1. #1
    3 Star Lounger
    Join Date
    Jun 2003
    Location
    Utah, USA
    Posts
    352
    Thanks
    6
    Thanked 1 Time in 1 Post

    Conditionally Void a Cumulative Sum

    I have a large data set (800K rows), so I'm looking for 1) any solution to my problem, but 2) an efficient one would be better.

    See the attached file ...

    I have columns A and B filled with binary data (ones and zeros). The ones and zeros come in bunches. The two columns are not correlated.

    I want to fill column C from column A. If the cell in A is zero, set the cell in C to be zero (no problem). If the cell in A is a 1, start a cumulative sum in column C (so 0,1,1,1,0) in A converts to (0,1,2,3,0) in C. Again, not a problem.

    Here's the problem: I want to mask some of column C and put the result in column D. If there's any cell in column B that's a 1, and there's a cumulative sum next to it in column C, then I want the entire set of cells in the cumulative sum to be reset to 0 in column D, including the rows before and after the one where there was a 1 in column B. In the example above, a (1,0,0,0,0) or (0,0,0,0,1) in B does nothing to C, but a (0,1,0,0,0), or a (0,0,1,0,0) or a ((0,0,0,1,0) would convert D to a (0,0,0,0,0). So do other combinations with more than one 1 in any of those three spots.

    For me, the roadblock is that I never know how many cells in D — before the 1, or after the 1, shows up in B — need to be reset to 0.

  2. Subscribe to our Windows Secrets Newsletter - It's Free!

    Get our unique weekly Newsletter with tips and techniques, how to's and critical updates on Windows 7, Windows 8, Windows XP, Firefox, Internet Explorer, Google, etc. Join our 480,000 subscribers!

    Excel 2013: The Missing Manual

    + Get this BONUS — free!

    Get the most of Excel! Learn about new features, basics of creating a new spreadsheet and using the infamous Ribbon in the first chapter of Excel 2013: The Missing Manual - Subscribe and download Chapter 1 for free!

  3. #2
    WS Lounge VIP sdckapr's Avatar
    Join Date
    Jul 2002
    Location
    Pittsburgh, Pennsylvania, USA
    Posts
    11,212
    Thanks
    14
    Thanked 338 Times in 331 Posts
    How about this in D2:
    Code:
    =IF(OR(A2=0,B2=1,AND(A1>0,D1=0)),0,IF(SUM(OFFSET(A2,0,1,MATCH(0,OFFSET(A2,0,0,COUNT(A:A),1),0)-1,1))=0,1+D1,0))
    A formula for C2 (if desired) would be:
    =IF(A2=0,0,1+C1)

    Steve

  4. The Following User Says Thank You to sdckapr For This Useful Post:

    boobounder (2014-08-01)

  5. #3
    3 Star Lounger
    Join Date
    Jun 2003
    Location
    Utah, USA
    Posts
    352
    Thanks
    6
    Thanked 1 Time in 1 Post
    Promising (seems OK at the top). But ... it returned a #REF! from row 262,600 downward.

    I think I know where the problem is, but I'm not sure how to address it.

    Because there are 786,435 rows in my worksheet, the count(A:A) portion of the formula will return a large height =offset(), which may carry that range beyond the number of rows allocated by Excel.

    ***********************

    Also, I have 2 rows of labels at the top, and 1 row of labels at the bottom which will make the count(A:A) slightly off. But, I'm not quite sure how this is working yet, so I'm not sure how to adjust for those. What's the best way to make that modification?

  6. #4
    3 Star Lounger
    Join Date
    Jun 2003
    Location
    Utah, USA
    Posts
    352
    Thanks
    6
    Thanked 1 Time in 1 Post
    Need confirmation ...

    I believe I have solved this problem by changing the count(A:A) bit to count(A2:A$786434) for an entry in row 2.

    Am I missing anything that anyone can see?

  7. #5
    WS Lounge VIP sdckapr's Avatar
    Join Date
    Jul 2002
    Location
    Pittsburgh, Pennsylvania, USA
    Posts
    11,212
    Thanks
    14
    Thanked 338 Times in 331 Posts
    You could use:
    =IF(OR(A2=0,B2=1,AND(A1>0,D1=0)),0,IF(SUM(OFFSET(A 2,0,1,MATCH(0,OFFSET(A2,0,0,COUNT(A:A)-COUNT($A1:A1),1),0)-1,1))=0,1+D1,0))

    I reduce the count by the number of entries before. I just used count since I needed to ensure that I had the whole range of rows in a "bunch". If you know the bunch will have a max number, you could just use that number (10, 100, etc). The exact number is not critical, it just has to be large enough to see the transition in the "bunch" it currently is on.

    Steve

  8. The Following User Says Thank You to sdckapr For This Useful Post:

    boobounder (2014-08-01)

  9. #6
    3 Star Lounger
    Join Date
    Jun 2003
    Location
    Utah, USA
    Posts
    352
    Thanks
    6
    Thanked 1 Time in 1 Post
    Thanks a bunch.

    I'm actually parsing through a matrix that's written out as a vector. And I can replace the count() with a large integer, since what I'm counting up has to all fall within a single row that's about 1000 elements long. I'm sure that will save time.

    EOM

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •