The use of Excel for statistics is wide spread but is it correct? An extract from the Practical Stats website newsletter is presented below. It dates back to 2009 and may not all still be relevant to 2021 versions of excel but worth noting and checking . The text has been reprinted here but you can also use the link to the Practical Stats Website.
Also included below is a more recent link to an IBM newsletter (also trying to push the use of SPSS) but worth a look. https://www.ibm.com/downloads/cas/7YEX9BKK and a more detailed analysis from 2000 by Hans Pottel. Statistical Flaws in Excel
B. McCullough has co‐authored a series of papers over the years evaluating the accuracy (and lack of it) for MS Excel’s statistical routines. In 2008 he collected several papers into a special issue of the journal Computational Statistics and Data Analysis (vol. 52, issue 10) evaluating the routines in Excel 2007. It appears that not much has changed in the Excel world. Problems exist in several statistical functions and routines that are likely used by any scientist who uses Excel. Tests may return inaccurate p‐values. Perhaps the most egregious error is in Excel’s normal probability plot produced for regression residuals. This plot incorrectly compares data to a uniform distribution, not a normal distribution, so the evaluations of whether data follow a normal distribution are completely bogus. His conclusion: “…it is perhaps worth comparing quality assurance in Microsoft’s Excel to quality assurance in its game division … It is difficult not to think that if Microsoft tested business software the way it tested game software, then the statistical functions in Excel would be as accurate as those found in any other major software package. If that were the case, then none of the articles in this special section would have been written.”
Some highlights of the papers include McCullough and Heiser (2008), pages 4570‐4578:
“… it is not safe to assume that Microsoft Excel’s statistical procedures give the correct answer. Persons who wish to conduct statistical analyses should use some other package.” Specifically,
1. “a major flaw in Excel Solver” which is used to solve nonlinear equations. Similar flaws in other nonlinear functions such as LOGEST.
2. “the Excel random number generator does not fulfill the basic requirements … for scientific purposes.”
3. The normal probability plot of residuals in regression analysis is totally wrong. “Excel computes the plot for the wrong variable, the dependent variable instead of the residuals, and has managed to confuse the uniform distribution for the normal distribution.”
4. Inaccurate t‐test results in the presence of missing values. (I was unable to verify this, but that is what they state)
5. Inaccurate p‐values from a t‐test.
6. The trendline function computes incorrect regression equations and can produce a negative r‐ squared. These problems are “by no means exhaustive”.
Yalta (2008), pages 4579‐4586:
The accuracy of the binomial, Poisson, inverse standard normal, inverse beta, inverse student’s t and inverse F distributions is very poor. The inverse distributions are used to produce p‐values for hypothesis tests. In contrast, the equivalent results in the free OpenOffice’s Calc spreadsheet and the open‐source Gnumeric spreadsheet are superior. A table of numerical errors reported for Excel 97 is carried through for later versions. Almost all are listed as “Not fixed” or “Poor fix” in Excel 2000, 2003 and now in 2007. The summary: “researchers should continue to avoid using the statistical functions in Excel 2007 for any scientific purpose.”
Su (2008), pages 4594‐4601: Graphics in Excel have defaults that lead to chartjunk – “redundant symbols, fill‐ins and other extraneous graphical elements” that don’t tell readers anything about the data themselves. It is possible to make good graphs with Excel, graphs that adhere to current principles of good graphics, but these require the user to invoke options. So that requires users to be familiar with principles of good graphics and override the defaults.
There you will find three possible solutions to the problems:
1) Purchase Excel statistical add‐ins
2) Use free, open‐source alternatives that do not have the issues found in Excel. There are alternative spreadsheets out there, for free.
3) Purchase commercial software, or use the free R statistical software system.