1. statistical analysis (Excel 2000)

Hi all,
I happened to land this site during my desperate search last night for any help regarding my situation. I got so glued to the site that I read 20 pages of responses in 24 hours. Clearly, I was not expecting to learn coding overnight, but it was the overwhelming support from the gurus that amazed me. Details of my situation is as follows (any rescue effort would be greatly appreciated)

I need to do statistical analysis of a component test data file. For a given part, at the start of the testing the high and low limits for various parameters were set based on judgement (since no data is existed). It should also noted that the limits are generally different for different Modes during same test for each parameter. Now, the idea is to calculate +/- 3 sigma and average from the real time data as sample gets bigger with every test and find out how far the initial limits are from these statistical limits. Ideally, the limits should be tweaked based on stat data. The test data files (like the one attached for view) are saved in excel format at the server and could reach upto 400 files a day. Another thing to mention is that the total no of files could reach 50,000 to 60,000 by the time whole batch of parts is done. Each data file contains a unique Test Run No., however Part No., operator, software version, could be repeated. Columns C, F, I, L, O, R, U, X are data columns, whereas D-E, G-H, J-K, M-N, P-Q, S-T, V-W, Y-Z are low and high limits for each data column respectively. Information in cells A1, A2, A3, B1 & B2 also needs to be captured in this resulting stat analysis workbook.
Many many thanks for the help.

2. Re: statistical analysis (Excel 2000)

COuld you add some more detail as to what you want to actually do with these files?

I expect you need to add everything up somehow and calculate averages and sigma's, right?

3. Re: statistical analysis (Excel 2000)

I second Jan's request for some details on what you need.

I will "take a leap" and give a thought on some preliminary analyses:

If you want to "highlight", on the current list, the items that are "outside the ranges", you could use conditional formatting.

Select the range of cells (C5:Z57)
Format - conditional formatting...
Select "formula is" from the pick list
Enter in the box to the right of the picklist:
<pre>=AND(ISNUMBER(FIND(":",C\$4)),OR(C5<D5,C5>E5)) </pre>

[format..]
Patterns (tab)
Select a color
[ok][ok]
Now cells that are in the columns containing a time (they have a ":") that are either < than the Low value or > than the High value, are colored in the color you selected.

Steve

4. Re: statistical analysis (Excel 2000)

THANKS Jane!
Lets see if the following explanation helps!!

One way I think of doing this is shown in the attached file "sampledata2".

Create a new worksheet with all the parameters in column "A" as shown in the attached file
Column "B" will get the data for each parameter by copying the data locations for all the modes and then paste special/transpose in columns B
Repeat same process for each parameter in a test (I showed upto parameter 3)
Repeat whole process for next test file. (I showed upto test 4)
Once you have 2 data sets the sheet will do calculation in columns I, J, K.

With the number of tests in mind (50,000 to 60,000) I am sure I will end up using 8 or 10 workbooks, if not more (I want to keep the file size manageable)

This may not be the only way or the best way to accomplish what I am trying to do.
In my opinion, no matter how I approach it, I need to somehow prompt the user (me) to get the data file and then align the data as shown in "sampledata2"
The next thing is, I want to take the +/- 3 sigma values for each parameter at each mode back in the original data lay out for comparison, i.e. new calculated limits and old limits columns next to each other.

Any and all ideas are welcome.

Thanks

5. Re: statistical analysis (Excel 2000)

When I hear 50-60,000 tests, I start thinking you might want something other than excel. Excel is limited to 65,536 rows rows and if each test has multiple parameters, you are rapidly over excel's limits.

A database such as access might be better to setup in as it is more suitable for handling larger datasets.

But even sticking with excel (as we are discussing in these boards), it is not completely clear to me what you start with and what you end up with. If you are starting with the attachment from post1, it seems that you are extracting data into 1 column of the second attachment. The second attachment looks more like a results table (It looks like the results excel would give from a pivot table) and (to me) it would make more sense to take the "original source data" and convert it into some form of datatable. The datatable could be converted into this summary table directly, but could also more easily convert into other tables for analysis.

I think more clarification of what your goals are and what you have is needed (at least for me). I have (I think) a vague notion of what you are after, but I am not entirely sure of what you have to start and what you need at the end. I am not entirely sure (as I mentioned) if excel is the program you should be using.
Steve

6. Re: statistical analysis (Excel 2000)

Hi Steve
Thanks a bunch for ideas.
Keep in mind you are dealing with a rookie here. Let me try to explain further.

The whole idea is to use the data to build statistical basis for the limits. The tests that are generating these data files already have passed/failed criteria defined for each parameter at each mode. However, the pass /fail limits were guessed based on similar parts performance during same tests. I don

7. Re: statistical analysis (Excel 2000)

I am still not clear what your goal is. You seem to want to analyze a great deal of data and to me (though i could be speaking from ignorance of your processes) this seems like overkill.

Instead of looking at individual "times" (I think they are times) why not create subgroups of a day's data and use that to create statistical control charts (plotting the average of the day and an indication of day's variance) for particular tests
[There are plenty of online info to google on "Statiistical process control"]

Also why look at all the tests. Based on the 80/20 rule, I woul expect 80% of the problems to come from 20% of the items. Concentrate first in reducing the variation of the largest items then gradually work on the others.

It is important to first get your "process" into control and you can do that with a small number of key ingredients/tests.

But you haven't really asked any specific "excel questions" (at least that I can see) your questions and comments are more "general". If you have specific questions that you need answered, I think you need to more specific in the those questions. The more general the questions the more general the answers, the more specific they are the more specific we can be.

I still think a database package or a statistical control package might be better suited to your needs (especially if you are a rookie). I would imagine there are many pre-packaged programs for these type of analyses.

just my <img src=/S/2cents.gif border=0 alt=2cents width=15 height=15> and I hope this is helping,

Steve

8. Re: statistical analysis (Excel 2000)

I agree with Steve on this.

When e.g. you have at least 20 independent measurements, you can reliably start predicting the standard deviation of the entire population.

To be on the safe side, choose the 20 (or more) measurements spaced over a longer time span, e.g. a whole day or a week. That should give you a pretty good starting point.
Then for those measures you expect to have a large variation, check the S.D. you just calculated against a larger dataset.

If you still want to use all data, I urge you to consider a statistical package. I know STATISTICA has what they call the Mega File Manager, which allows millions of rows and many more columns than 255. See http://www.statsoft.com.

They also have an excellent online statistical resource: The Electronic Statistics Textbook highly recommended!

9. Re: statistical analysis (Excel 2000)

I like to thank Jan and Steve for taking the time and steering me in the right direction. I discussed the situation with my management based on the knowledge I gain from you folks and last Friday got the news that we are going to explore Minitab and Statictica.
Thanks
Siz

10. Re: statistical analysis (Excel 2000)

You're welcome and good luck evaluating your data.

Posting Permissions

• You may not post new threads
• You may not post replies
• You may not post attachments
• You may not edit your posts
•