Results 1 to 11 of 11

Thread: Statistics

  1. #1
    Chris Glover
    Guest

    Statistics

    I have a frequency distribution table. One column gives the age, the second column the frequency. How do I get Excel97 to give me the mean and SD for this data

  2. #2
    4 Star Lounger
    Join Date
    Jan 2001
    Location
    Kortrijk, Belgium
    Posts
    571
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: Statistics

    Either you work with the raw data and calculate the average and stdev with the worksheetfunctions or create a pivottable. Therefore, you need to define classes for your age date, e.g. class 1: from 0 to 10 years old, class 2 : from 11 to 20 years old etc. This may correspond to the classes you used to calculate the frequency (eventually these classes only have a width of 1). Then put your data in two columns, one with the age data (and with label age in the first row) and the second column with the class data, that is, put next to each age-value the corresponding class value. Then select Data >> Pivottable, and drag the label class to the Row field and drag it also to the data field. Double click it and change the function to 'count'. Now drag the label age to the data field, double click it and change the function into 'average'. repeat the last step and change the function into 'stdev'. This should give you a pivottable with all you want.

  3. #3
    4 Star Lounger
    Join Date
    Jan 2001
    Location
    Kortrijk, Belgium
    Posts
    571
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: Statistics

    With an example
    Attached Files Attached Files

  4. #4
    Chris Glover
    Guest

    Re: Statistics

    Thanks for the suggestion. I was hoping that Excel could calculate the mean and SD direct rather than long hand. The long hand involves using the mean m = sum of the frequencies times the values and divided by the sum of frequencies i.e. m=Sum(f*x)/sum(f). Excel only seems to calculate the simple mean of sum of values divided by number of values. Similarly the SD calculation does not cater for frequencies of data. Any ideas?

    Thanks

    Chris

  5. #5
    Uranium Lounger
    Join Date
    Dec 2000
    Location
    Salt Lake City, Utah, USA
    Posts
    9,508
    Thanks
    0
    Thanked 6 Times in 6 Posts

    Re: Statistics

    See if =sumproduct() helps for true mean. See <A target="_blank" HREF=http://www.wopr.com/cgi-bin/w3t/showflat.pl?Cat=&Board=xl&Number=36909&page=1&view =expanded&sb=5>this thread</A> for a similar discussion.
    -John ... I float in liquid gardens
    UTC -7ąDS

  6. #6
    Uranium Lounger
    Join Date
    Dec 2000
    Location
    Salt Lake City, Utah, USA
    Posts
    9,508
    Thanks
    0
    Thanked 6 Times in 6 Posts

    Re: Statistics

    It's been a mere 30 years since I've had to work with standard deviation, so I can't help there, but <A target="_blank" HREF=http://www.beyondtechnology.com/tips016.shtml>beyondtechnology</A> looks to have "soup to nuts" coverage.
    -John ... I float in liquid gardens
    UTC -7ąDS

  7. #7
    Star Lounger
    Join Date
    Jan 2001
    Location
    Adelaide, South Australia, Australia
    Posts
    85
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: Statistics

    I don't think Excel has a formula for calculating mean and sd from grouped data. The long-hand formulae are

    mean = sum(f*x)/sum(f)

    sd = sqrt( (sum(x*x*f) - (sum(f*x)^2)/sum(f))/(sum(f)-1) )
    or
    sd = sqrt( sum(f*((x-mean)^2))/(sum(f)-1) )
    (the second version is less prone to rounding error, but requires you to have the mean calculated separately)

  8. #8
    Uranium Lounger
    Join Date
    Dec 2000
    Location
    Salt Lake City, Utah, USA
    Posts
    9,508
    Thanks
    0
    Thanked 6 Times in 6 Posts

    Re: Statistics

    Can you see if this SD is correct? I can't figure how to enable the second formula, because I don't understand how to derive "(x-mean)" for multiple x values.
    Attached Files Attached Files
    -John ... I float in liquid gardens
    UTC -7ąDS

  9. #9
    3 Star Lounger
    Join Date
    Jan 2001
    Location
    Baltimore, MD, Maryland, USA
    Posts
    254
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: Statistics

    I have attached a word document with instructions on how to compute the weighted Std Deviation in XL. It is essentially the same as below, except for the failure of the subscripts to carry over from what I copied and pasted from Word below:

    If you have NI values of age AI, then you get the weighted mean, WA, as follows:
    1. create an additional column;
    2. insert = NI * AI and copying it down the length of your age column;
    3. sum the new column; and
    4. divide that sum by the sum of the frequencies

    The data points for the weighted variance calculation are (NI * AI ), so the variance can be computed as follows:
    a) create an other column for (AI - WA) and another for [NI * (AI - WA)]^2; and
    [img]/forums/images/smilies/cool.gif[/img] add these [NI * (AI - WA)]^2 up and divide by sum(NI )

    You get the weighted Std Deviation by taking the square root of the number calculated in [img]/forums/images/smilies/cool.gif[/img], above.
    Attached Files Attached Files

  10. #10
    Star Lounger
    Join Date
    Jan 2001
    Location
    Adelaide, South Australia, Australia
    Posts
    85
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: Statistics

    Hi John:

    Yes, your formula is correct. The second version of the formula for your spreadsheet is

    {=SQRT( SUM(Units*(Cost-B31)^2)/(SUM(Units)-1))}

    since you have stored the mean of the Costs in B31.
    This gives the same value as you get.

    Ian.

  11. #11
    Star Lounger
    Join Date
    Jan 2001
    Location
    Adelaide, South Australia, Australia
    Posts
    85
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: Statistics

    epic60sman's version is the same formula as mine, but with the terms in the sum spread out over a column of cells. It would be better for explaining the calculations, though takes up more space on the sheet.

    Incidentally, does anyone know if there is any performance difference between using SUMPRODUCT or SUM in formulae like these?

    eg is it 'better' to use
    = SUMPRODUCT(A1:A1000,B1:B1000)
    or
    {=SUM(A1:A1000*B1:B1000)}
    where the latter is entered as an array formula (ctrl/shift/enter)

    Ian.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •