Results 1 to 15 of 17
Thread: Bell Curve/Trend (2K3)

20071203, 18:18 #1
 Join Date
 Feb 2002
 Location
 A Magic Forest in Deepest, Darkest Kent
 Posts
 5,681
 Thanks
 0
 Thanked 1 Time in 1 Post
Bell Curve/Trend (2K3)
I am out of my comfort zone when it comes to statistics so my apologies in advance.
I have a series the X axis is Minutes and the Y is the count of people who ran that time. I have charted these figures and it "vaguely" looks like a Bell curve. The sample contains 1560 people so a reasonable sample what is the method to depict a smoothed out curve following this data in the form of a trend line.
Hoping this is enough information, data available if required.Jerry

20071203, 18:27 #2
 Join Date
 Mar 2002
 Posts
 84,353
 Thanks
 0
 Thanked 31 Times in 31 Posts
Re: Bell Curve/Trend (2K3)
This is not really symmetrical, so another distribution than the normal distribution is probably a better fit, for example a binomial distribution.
Do you have a statistics program such as SPSS or SAS at your disposal?

20071203, 19:01 #3
 Join Date
 Jul 2002
 Location
 Pittsburgh, Pennsylvania, USA
 Posts
 11,225
 Thanks
 14
 Thanked 342 Times in 335 Posts
Re: Bell Curve/Trend (2K3)
Looks like it could be fit with Weibull or lognormal or even the Gamma function.
perhaps Distribution Fitting or 8.2.1. How do you choose an appropriate life distribution model? can help you out...
You can do some fitting of the curves using solver (similar to what <!profile=HansV>HansV<!/profile> did for exponential fitting in <post:=679,974>post 679,974</post:>)
Steve

20071204, 19:39 #4
 Join Date
 Jul 2002
 Location
 Pittsburgh, Pennsylvania, USA
 Posts
 11,225
 Thanks
 14
 Thanked 342 Times in 335 Posts
Re: Bell Curve/Trend (2K3)
I had a chance to play with this.
As I alluded to here is an example (using an estimate of data points from your picture) with the lognormal and Weibull distributions. I included the normal (bell curve) distribution to show you the differences. The Solver is setup to minimize the squared deviation of all 3 equations equations together. The lognormal appears to fit your data the best. It gets the right tail better and the leftmost points the best in addition to having the lowest sum of squares.
Steve

20071205, 01:30 #5
 Join Date
 Feb 2002
 Location
 A Magic Forest in Deepest, Darkest Kent
 Posts
 5,681
 Thanks
 0
 Thanked 1 Time in 1 Post
Re: Bell Curve/Trend (2K3)
Wow
That is very impressive Steve, thank you. That has whetted my appetite to investigate this further, but this model is fantastic as I want to introduce this in the results of some of the time trials we do... <img src=/S/thankyou.gif border=0 alt=thankyou width=40 height=15>Jerry

20071205, 01:51 #6
 Join Date
 Jul 2002
 Location
 Pittsburgh, Pennsylvania, USA
 Posts
 11,225
 Thanks
 14
 Thanked 342 Times in 335 Posts
Re: Bell Curve/Trend (2K3)
Thanks for the kind words.
This method of calculating the minimum in the squares of the deviation is general and works with any model whether or not it can be linearized. (The LINEST function in Excel only works with linearized equations). It is a type of nonlinear regression. If you have a particular theoretical model, I would start with that. In your example, like Hans also mentioned, it does not appear to be "normal" (bellshaped) as it has signicant skewness to the right. This occurs alot in nature, especially when there is natural minimum: (the best when everything goes well), but there can be some things which can "delay it" causing small numbers of extreme cases. (eg the time to travel from home to work: no traffic 15 mins, normal traffic 1725 min, but some days (snow/traffic accidents, etc) it can be hours!). This type of shape is why "average salaries" are generally reported as the median (50 percentile) rather than the mean. The mean is affected dramatically by the few extreme cases of multimillionaires and billionaires.
One caveat about this method and Solver: the final results may depend on the initial "seed values"/approximations you use. I tend to try to "play" with the parameters until the fitted chart looks "close" and then run the solver. Sometimes manually tweaking thhis can also help and then run again. Changing the Solver options for the iterations can also affect the results.
Also even though I based my fitted line on the points from data for the example, this is not required nor always a good idea. The values are needed for calculations but if the dataset is limited or you want to extrapolate outside the range, I typically will create a separate range from start to finish in the X spaced with enough points to give a nice smooth curve.
Good Luck. Post back if you need further assistance.
Steve

20071205, 12:48 #7
 Join Date
 Feb 2001
 Location
 Weert, Limburg, Netherlands
 Posts
 4,812
 Thanks
 0
 Thanked 0 Times in 0 Posts
Re: Bell Curve/Trend (2K3)
Hi Steve,
Have you ever seen a way to compute R squares or better yet the f value of fits like these from their squared differences? Sort of an ANOVA?Jan Karel Pieterse
Microsoft Excel MVP, WMVP
www.jkpads.com
Professional Office Developers Association

20071205, 16:50 #8
 Join Date
 Feb 2001
 Location
 Weert, Limburg, Netherlands
 Posts
 4,812
 Thanks
 0
 Thanked 0 Times in 0 Posts
Re: Bell Curve/Trend (2K3)
Hi Steve,
Good explanation. If I find time I will implement that in a more generic approach I have taken in the attached workbook, which also allows you to solve problems like these (also using Solver).Jan Karel Pieterse
Microsoft Excel MVP, WMVP
www.jkpads.com
Professional Office Developers Association

20071205, 17:01 #9
 Join Date
 Mar 2002
 Posts
 84,353
 Thanks
 0
 Thanked 31 Times in 31 Posts
Re: Bell Curve/Trend (2K3)
Thanks Steve and Jan Karel, this is very useful.

20071205, 17:26 #10
 Join Date
 Mar 2002
 Location
 Buenos Aires, Argentina
 Posts
 877
 Thanks
 0
 Thanked 0 Times in 0 Posts
Re: Bell Curve/Trend (2K3)
<hr>Do you have a statistics program such as SPSS or SAS at your disposal?<hr>You might want to try <img src=/S/free.gif border=0 alt=free width=30 height=15>ware, open source R. It does have a steep learning curve, but it looks quite promising. I'm studying it on my own and I can give some references I've found regarding distributions fitting if you are interested.<img src=/w3timages/blue3line.gif width=33% height=2>
<img src=/S/flags/Argentina.gif border=0 alt=Argentina width=30 height=18> <big><font color=4682b4><font face="Comic Sans MS">Diegol</font face=comic></font color=4682b4> </big>

20071205, 17:33 #11
 Join Date
 Mar 2002
 Posts
 84,353
 Thanks
 0
 Thanked 31 Times in 31 Posts
Re: Bell Curve/Trend (2K3)
R is very useful, but it's geared towards programmers. It's definitely not suitable for casual use. (I know, one might argue that this is true of any statistics package, but it's more so for R).

20071205, 19:34 #12
 Join Date
 Mar 2002
 Location
 Buenos Aires, Argentina
 Posts
 877
 Thanks
 0
 Thanked 0 Times in 0 Posts
Re: Bell Curve/Trend (2K3)
Yes.
I mentioned it in case someone was interested in taking the plunge, but a fair amount of "exposure" is required before being able to do something worth considering. Even importing the information is a real chore. JIC, those interested check the R Commander package, which makes many simple tasks (i.e., importing data), well, simpler.
I've never used SAS / SPSS / Minitab / Statistica, so I cannot tell how R compares to these.<img src=/w3timages/blue3line.gif width=33% height=2>
<img src=/S/flags/Argentina.gif border=0 alt=Argentina width=30 height=18> <big><font color=4682b4><font face="Comic Sans MS">Diegol</font face=comic></font color=4682b4> </big>

20071206, 13:22 #13
 Join Date
 Jul 2002
 Location
 Pittsburgh, Pennsylvania, USA
 Posts
 11,225
 Thanks
 14
 Thanked 342 Times in 335 Posts
Re: Bell Curve/Trend (2K3)
<P ID="edit" class=small>(Edited by sdckapr on 06Dec07 06:22. Modified my post, I was a little mistaken on calculation of Degrees of freedom)</P>I see some discrepancies in what I would calculate in ANOVA table (though I am not a statisitician, so I may be mistaken). I have been thinking about the discrepancies and the primary difference is in the "degrees of freedom". After further reading I am editing this and my earlier post since I missed some of more general detail (my lack of formal statistical training shows!)
in B2:
=B5B3
in B3:
=COUNT(yValues)COUNT(Constants)
in C3:
=SUM(yDelta)
You could also add the "prob" in G2.
=FDIST(F2,B2,B3)
The formula I am not sure how to make general is B5. The "1" is not always there. The "1" comes from the use of an "intercept" (an independent constant, which depends on its use in the equation not its presence as a constant) You may want to create an "independent parameter" (essentially an "offset" in the Ydirection)
This will then match the numbers (especially the F) found using LINEST as well as other stat software (our company uses JMP). Your method does not match their Fvalue, while it will with the above changes
Some remarks:
The DF in the residual/error (DFres) is always the total degrees of freedom (LINEST reports this if used). It is the count of the Y values  number of parameters (N  k).
The regression or model degrees of freedom (DFreg) is always the total degrees of freedom  the residual degrees of freedon (DFtotal  DFres)
The DFtotal is NOT always the count of Yvalues  1! The "1" comes from the use of the INTERCEPT (a parameter independent of X). If there is a parameter independent of X then the DFtotal is (N 1) otherwise it is N
[This is the related to why the estimate of the sample std dev is "n1" weighted, since it loses a degree of freedom by using the mean and estimate of the population mean]
So if there are 10 yvalues and you are fitting to an equation:
y = mx + b
DFres = 8 [=102 (m,[img]/forums/images/smilies/cool.gif[/img]]
DFtotal = 9 [10 1 ([img]/forums/images/smilies/cool.gif[/img]]
DFreg = 1 [98]
But if you fit to the equation:
y = mx
DFres = 9 [=101 (m)]
DFtotal = 10 [No "1", no constant]
DFreg = 1 [10  9]
I apologize ofr any confusion I may have created.
Steve

20071206, 14:11 #14
 Join Date
 Feb 2001
 Location
 Weert, Limburg, Netherlands
 Posts
 4,812
 Thanks
 0
 Thanked 0 Times in 0 Posts
Re: Bell Curve/Trend (2K3)
Well, whichever package you use, any of them is useless without proper statistical insight/training.
I have used StatGraphics (do they even exist anymore?) and Statistica (very complete, but a bit overwhelming) quite a bit, along with Design Expert, which is a DOE software package (the best around IMO). I have worked a bit with (I think it was called) RS/1, which was very good, but back in the mainframe days, so text based and cumbersome to work with.Jan Karel Pieterse
Microsoft Excel MVP, WMVP
www.jkpads.com
Professional Office Developers Association

20071206, 14:47 #15
 Join Date
 Jul 2002
 Location
 Pittsburgh, Pennsylvania, USA
 Posts
 11,225
 Thanks
 14
 Thanked 342 Times in 335 Posts
Re: Bell Curve/Trend (2K3)
<P ID="edit" class=small>(Edited by sdckapr on 06Dec07 07:47. Changed some info on Degrees of Freedom after doing more investigation.
Edited by sdckapr on 05Dec07 09:05. Corrected a mistake. Brainfart, I was using SS for the F instead the MS.)
)</P>Yes they are both readily calculated.
This is probably more than you want or need but just some FYI if people try adapt/expand this technique and want to crunch some of the numbers:
The value we are minimizing is the sum of squares of the deviations. it is the sum [(YobsYpred)²] is is the "Sum of Squares of the residuals" (SSres). [We are presuming that the Ys are the dependent variables and the Xs are independent so all the error is in the Y]. The Total Sum of squares of the data is calculated from the Y values using the VARP and COUNT functions [SStotal = VARP(Ys)*COUNT(Ys)].
For the Example file:
SStotal = VARP($B$7:$B$61)*COUNT($B$7:$B$61) = 10,134.91
For the lognormal example fitting is in H6
SSres = SUM(H7:H61) = 3,327.18
The R² value is the ratio of the Regression sum of squares (SSreg) to the SStotal [R² = SSreg/SStotal]. It is essentially what percentage of the total variation is explained by the model chosen. Since the SSres is the amount not explained by the mode, SSreg = SStotal  SSres and therefore:
R² = SSreg/SStotal
R² = (SStotal  SSres)/SStotal
R² = 1 SSres/SStotal
R² = 1 3,327.18/10,134.91
R² = 0.672
This indicates that the "model" fits only 67.2% of the total variation in data.
The degrees of Freedom for the Res (DFres) is "N  k" the total points minus the number of parameters in the model (all in the Jezza examples used 55  3 = 52 ), the degrees of Freedom in total are in many models "N1" = 54, though in these examples there is no "offset" parameter (like the intercept in y = mx + [img]/forums/images/smilies/cool.gif[/img] which is independent of the X so the total is N. The DFreg = are DFtotal  DFres = 3. The DFreg is typically the number of parameters  1 ("k1") since most models include the "offset" (y = A0 + A1X1 + A2X2). The models in the Jezza examples could include an offset (if desired). The data could be suggesting that it might be needed since the data "tail" seems to start and end at 2 instead of zero
The Fvalue is the ratio of the Mean Squares (MSreg to the MSres). The MS are the SS/DF
F = MSreg/MSres
F = (SStotalSSres)/(SStotalDFres) / (SSres / DFres)
F = ((10,134.91 3,327.18) / (5552) ) / (3,327.18 / 52)
F = 35.5
The "F Prob dist" at 3 and 52 degrees of freedom for that value is:
=Fdist(F, DFreg, DFres)
=Fdist(35.5, 3, 52) <<0.0001
[Note: since the SSres generally will decrease as more parameters are added, an adjusted R² can also be calculated if you are comparing "correlation coefficients" for various models instead of the Fvalue (the F accounts for this since it explicitly uses the "degress of freedom") :
AdjR² = 1  (SSres/DFres) / (SStotal / DFtotal)
AdjR² = 1  (3,327.18/(553)) / (10,134.91 / 55)
AdjR² = 1  (3,327.18/52) / (10,134.91 / 55)
AdjR² = 0.653
Thus the SSres must decrease enough for the increase on the DF to get the best "bang for the buck" and not add parameters which do not contribute anything significant to the model (the presumption is the simpler the better.]
Steve