Results 1 to 13 of 13

20030703, 15:25 #1
 Join Date
 Jan 2001
 Location
 Redcliff, Alberta, Canada
 Posts
 4,066
 Thanks
 2
 Thanked 5 Times in 5 Posts
Re: Statistical flaws in Excel (Excel 97 / 2000 / XP)
Thanks very much  it is (amazingly) a topic which has been on my mind lately. Just don't expect me to read it quickly <img src=/S/grin.gif border=0 alt=grin width=15 height=15>
[b]Catharine Richardson (WebGenii)
WebGenii Home Page
Moderator: Spreadsheets, Other MS Apps, Presentation Apps, Visual Basic for Apps, Windows Mobile

20030703, 23:46 #2
 Join Date
 Mar 2002
 Posts
 84,353
 Thanks
 0
 Thanked 30 Times in 30 Posts
Re: Statistical flaws in Excel (Excel 97 / 2000 / XP)
Thanks for sharing this. It's useful to have an overview.

20030704, 04:45 #3
 Join Date
 Feb 2001
 Location
 Weert, Limburg, Netherlands
 Posts
 4,812
 Thanks
 0
 Thanked 0 Times in 0 Posts
Re: Statistical flaws in Excel (Excel 97 / 2000 / XP)
Hi Hans,
Excellent article!
Would it be OK if I posted this article in the Microsoft Excel MVP newsgroup?
A message I sometimes refer to with regards to Excel's poor Stat qualities:
==========
From: Jerry W. Lewis (JWLewis53@mediaone.net)
Subject: Re: LINEST with r2 = 1.18 ???
Newsgroups: microsoft.public.excel.worksheet.functions
View: Complete Thread (35 articles)  Original Format
Date: 20010926 06:40:28 PST
LINEST() (also SLOPE(), INTERCEPT(), VAR(), STDEV(), LOGEST(), TREND(),
FORECAST(), etc.) uses a numerically unstable algorithm. With
challenging data (such as yours), rounding error has accumulated to the
point that none of its calculations (slope, intercept, etc.) can be
believed. In your case, you were lucky enough to get an impossible R^2,
so that it was obvious that there was a problem. There may still be a
problem even with data that give more reasonable R^2 values. These
problems with Excel's algorithms have been well documented for years
(cf. Sawitzki, 1994, "Report on the reliability of data analysis
systems" Comput. Statist. Data Anal. 18:289301) yet Microsoft continues
to ignore them.
Harlan Grove's matrix formulation simply recreates the same problem.
DEVSQ(), COVAR(), and CORREL() are the only 2nd moment functions in
Excel that are numerically reliable. For simple linear regression, use
the following formulas instead of LINEST(), SLOPE(), INTERCEPT(), RSQ(),
STEYX(), etc.
slope = COVAR(y,x)/DEVSQ(x)*COUNT(y)
intercept = AVERAGE(y)  slope*AVERAGE(x)
rsq = CORREL(y,x)^2
SSreg = rsq*DEVSQ(y)
SSresid = (1rsq)*DEVSQ(y)
df = COUNT(y)2
F = SSreg/SSresid*df
steyx = SQRT(SSresid/df)
se1 = steyx/SQRT(DEVSQ(x))
seb = steyx/SQRT(1/COUNT(y)+AVERAGE(x)^2/DEVSQ(x))
This approach has the added advantage over LINEST that it allows missing
values in the data range. However that cuts both ways, because they
will give a wrong answer if there are data pairs where only x or y (but
not both) are missing.
Similarly, for univariate statistics use the following formulas instead
of VAR(), VARP(), STDEV(), and STDEVP()
var = DEVSQ(x)/(COUNT(x)1)
varp = DEVSQ(x)/COUNT(x)
stdev = SQRT(var)
stdevp = SQRT(varp)
Since Microsoft has already programmed routines that would be superior
to their unstable routines, it is puzzling why they continue to maintin
redundant inferior code. The unstable formulas that Excel programed are
mathematically exact (with infinite precision), so my formulas will
agree with the Excel functions for nonchallenging data sets. When they
disagree, the dedicated Excel functions are wrong.
There is no DEVSQA function, there is no hel for VARA(), VARPA(),
STDEVA(), or STDEVPA() other than doing those calculations manually.
If you are wedded to using LINEST(), then test to see if
STDEV(x) = SQRT(DEVSQ(x)/COUNT(x))
STDEV(y) = SQRT(DEVSQ(y)/COUNT(y))
PEARSON(y,x) = CORREL(y,x)
If all three of these are approximately true (say to at least 12 figures
each), then LINEST() can probably be believed for simple linear
regression. Figuring out when LINEST() can be believed for more complex
models is not so simple.
Jerry
Richard Nolan wrote:
> Having used LINEST for Linear regression, I think
> successfully a few times, I now have a data set that
> returns an r2 value of 1.18, which is not possible. I can
> look at the data and tell r2 must be +, not negative.
>
> Are there two logic problems with LINEST. (a) r2 can never
> greater than +/ 1, and ([img]/forums/images/smilies/cool.gif[/img] I can see the relationship is
> +, not .Jan Karel Pieterse
Microsoft Excel MVP, WMVP
www.jkpads.com
Professional Office Developers Association

20030704, 05:35 #4
 Join Date
 Jan 2001
 Location
 Kortrijk, Belgium
 Posts
 571
 Thanks
 0
 Thanked 0 Times in 0 Posts
Re: Statistical flaws in Excel (Excel 97 / 2000 / XP)
Thanks, Jan Karel for your additional information. Yes, you can post this article in the Microsoft Excel MVP newsgroup. No problem!

20030704, 09:14 #5
 Join Date
 Feb 2001
 Location
 Weert, Limburg, Netherlands
 Posts
 4,812
 Thanks
 0
 Thanked 0 Times in 0 Posts
Re: Statistical flaws in Excel (Excel 97 / 2000 / XP)
OK, on it's way then.
Jan Karel Pieterse
Microsoft Excel MVP, WMVP
www.jkpads.com
Professional Office Developers Association

20070618, 19:52 #6
 Join Date
 Jan 2001
 Location
 Redcliff, Alberta, Canada
 Posts
 4,066
 Thanks
 2
 Thanked 5 Times in 5 Posts
Re: Statistical flaws in Excel (2007)
I'm wondering if Excel 2007 has addressed any of these issues.
Comments?[b]Catharine Richardson (WebGenii)
WebGenii Home Page
Moderator: Spreadsheets, Other MS Apps, Presentation Apps, Visual Basic for Apps, Windows Mobile

20070618, 20:40 #7
 Join Date
 Mar 2002
 Posts
 84,353
 Thanks
 0
 Thanked 30 Times in 30 Posts
Re: Statistical flaws in Excel (2007)
Nope, but some of the issues were addressed in Excel 2003  see Description of improvements in the statistical functions in Excel 2003 and in Excel 2004 for Mac and also Statistics:Numerical Methods/Numerics in Excel As far as I know, no significant changes were made in this area in Excel 2007.
For an exhaustive review of the weaknesses of statistics in all versions of Excel, including Excel 2007, see Errors, Faults and Fixes for Excel Statistical Functions and Routines (as of May/21/2007).

20070711, 15:40 #8
 Join Date
 Jan 2001
 Location
 Redcliff, Alberta, Canada
 Posts
 4,066
 Thanks
 2
 Thanked 5 Times in 5 Posts
Re: Statistical flaws in Excel (2007)
Just ran across this
http://www.robweir.com/blog/2007/07/...rfailure.html
Am I right in thinking that these problems only come to light if the spreadsheet is saved in XML format?[b]Catharine Richardson (WebGenii)
WebGenii Home Page
Moderator: Spreadsheets, Other MS Apps, Presentation Apps, Visual Basic for Apps, Windows Mobile

20070711, 18:44 #9
 Join Date
 Mar 2002
 Posts
 84,353
 Thanks
 0
 Thanked 30 Times in 30 Posts
Re: Statistical flaws in Excel (2007)
That article refers to the Office Open XML format, which is used as default in Office 2007 for workbooks without macros; the file extension is .xlsx. I don't know whether the implementation in Excel 2007 actually includes the errors mentioned in the article, or whether they are merely errors in the description of the format.

20080312, 19:59 #10
 Join Date
 Mar 2002
 Location
 Buenos Aires, Argentina
 Posts
 877
 Thanks
 0
 Thanked 0 Times in 0 Posts
Re: Statistical flaws in Excel (Excel 97 / 2000 / XP)
Hans P.,
I'm having trouble trying to download your attachment. Maybe it disappeared when the Lounge went down? Would you or someone else please reattach it?
Thank you!<img src=/w3timages/blue3line.gif width=33% height=2>
<img src=/S/flags/Argentina.gif border=0 alt=Argentina width=30 height=18> <big><font color=4682b4><font face="Comic Sans MS">Diegol</font face=comic></font color=4682b4> </big>

20080312, 20:53 #11
 Join Date
 Jan 2001
 Location
 Kortrijk, Belgium
 Posts
 571
 Thanks
 0
 Thanked 0 Times in 0 Posts
Statistical flaws in Excel (Excel 97 / 2000 / XP)
Edited by HansV to reattach the zipped document, it was lost in the server crash of August 2007
Many people, including myself, use Excel's statistical functions and the Analysis Toolpak. However, Excel has some flaws that you should know when you are using these tools. I summarized many of these flaws in one document, which I wanted to present to the forum. I would appreciate your comments and suggestions.
It's a zipped pdf document to reduce it to the acceptable file size for uploading.

20080312, 20:54 #12
 Join Date
 Mar 2002
 Posts
 84,353
 Thanks
 0
 Thanked 30 Times in 30 Posts
Re: Statistical flaws in Excel (Excel 97 / 2000 / XP)
I had a copy of Hans Pottel's PDF file; I have reattached it to the original post.

20080312, 21:33 #13
 Join Date
 Mar 2002
 Location
 Buenos Aires, Argentina
 Posts
 877
 Thanks
 0
 Thanked 0 Times in 0 Posts
Re: Statistical flaws in Excel (Excel 97 / 2000 /
It worked. Thanks!
<img src=/w3timages/blue3line.gif width=33% height=2>
<img src=/S/flags/Argentina.gif border=0 alt=Argentina width=30 height=18> <big><font color=4682b4><font face="Comic Sans MS">Diegol</font face=comic></font color=4682b4> </big>