The foundations of statistics concern the epistemological debate in statistics over how one should conduct inductive inference from data. Among the issues considered in statistical inference are the question of Bayesian inference versus frequentist inference, the distinction between Fisher's "significance testing" and Neyman–Pearson "hypothesis testing", and whether the likelihood principle should be followed. Some of these issues have been debated for up to 200 years without resolution.[1]

Bandyopadhyay & Forster[2] describe four statistical paradigms: "(i) classical statistics or error statistics, (ii) Bayesian statistics, (iii) likelihood-based statistics, and (iv) the Akaikean-Information Criterion-based statistics".

Savage's text Foundations of Statistics has been cited over 12000 times on Google Scholar.[3] It states the following:

"It is unanimously agreed that statistics depends somehow on probability. But, as to what probability is and how it is connected with statistics, there has seldom been such complete disagreement and breakdown of communication since the Tower of Babel. Doubtless, much of the disagreement is merely terminological and would disappear under sufficiently sharp analysis."

1 Fisher's significance testing vs Neyman–Pearson hypothesis testing

In the development of classical statistics in the second quarter of the 20th century two competing models of inductive statistical testing were developed.[4][5] Their relative merits were hotly debated[6] (for over 25 years) until Fisher's death. While a hybrid of the two methods is widely taught and used, the philosophical questions raised in the debate have not been resolved.

1.1 Significance testing

Fisher popularized significance testing, primarily in two popular and highly influential books.[7][8] Fisher's writing style in these books was strong on examples and relatively weak on explanations. The books lacked proofs or derivations of significance test statistics (which placed statistical practice in advance of statistical theory). Fisher's more explanatory and philosophical writing was written much later.[9] There appear to be some differences between his earlier practices and his later opinions.

Fisher was motivated to obtain scientific experimental results without the explicit influence of prior opinion. The significance test is a probabilistic version of modus tollens, a classic form of deductive inference. The significance test might be simplistically stated, "If the evidence is sufficiently discordant with the hypothesis, reject the hypothesis". In application, a statistic is calculated from the experimental data, a probability of exceeding that statistic is determined and the probability is compared to a threshold. The threshold (the numeric version of "sufficiently discordant") is arbitrary (usually decided by convention). A common application of the method is deciding whether a treatment has a reportable effect based on a comparative experiment. Statistical significance is a measure of probability, not practical importance. It can be regarded as a requirement placed on statistical signal/noise. The method is based on the assumed existence of an imaginary infinite population corresponding to the null hypothesis.

The significance test requires only one hypothesis. The result of the test is to reject the hypothesis (or not), a simple dichotomy. The test cannot distinguish between truth of the hypothesis and insufficiency of evidence to disprove the hypothesis, so it is like a criminal trial in which the defendant is assumed innocent until proven guilty.

1.2 Hypothesis testing

Neyman & Pearson collaborated on a different, but related, problem: selecting among competing hypotheses based on the experimental evidence alone. Of their joint papers the most cited was from 1933.[10] The famous result of that paper is the Neyman–Pearson lemma. The lemma says that a ratio of probabilities is an excellent criterion for selecting a hypothesis (with the threshold for comparison being arbitrary). The paper proved an optimality of Student's t-test (one of the significance tests). Neyman expressed the opinion that hypothesis testing was a generalization of and an improvement on significance testing. The rationale for their methods is found in their joint papers.[11]

Hypothesis testing requires multiple hypotheses. A hypothesis is always selected, a multiple choice. A lack of evidence is not an immediate consideration. The method is based on the assumption of a repeated sampling of the same population (the classical frequentist assumption).

1.3 Grounds of disagreement

The length of the dispute allowed the debate of a wide range of issues regarded as foundational to statistics.

In this exchange Fisher also discussed the requirements for inductive inference, with specific criticism of cost functions penalizing faulty judgments. Neyman countered that Gauss and Laplace used them. This exchange of arguments occurred 15 years after textbooks began teaching a hybrid theory of statistical testing.

Fisher and Neyman were in disagreement about the foundations of statistics (although united in opposition to the Bayesian view):

• The interpretation of probability[5]
  The disagreement over Fisher's inductive reasoning vs Neyman's inductive behavior contained elements of the Bayesian/Frequentist divide. Fisher was willing to alter his opinion (reaching a provisional conclusion) on the basis of a calculated probability while Neyman was more willing to change his observable behavior (making a decision) on the basis of a computed cost.
• The proper formulation of scientific questions with special concern for modeling[6][15]
• Whether it is reasonable to reject a hypothesis based on a low probability without knowing the probability of an alternative

Fisher and Neyman were separated by attitudes and perhaps language. Fisher was a scientist and an intuitive mathematician. Inductive reasoning was natural. Neyman was a rigorous mathematician. He was convinced by deductive reasoning rather than by a probability calculation based on an experiment.[4] Thus there was an underlying clash between applied and theoretical, between science and mathematics.

1.4 Related history

Neyman, who had occupied the same building in England as Fisher, accepted a position on the west coast of the United States of America in 1938. His move effectively ended his collaboration with Pearson and their development of hypothesis testing.[4] Further development was continued by others.

Textbooks provided a hybrid version of significance and hypothesis testing by 1940.[16] None of the principals had any known personal involvement in the further development of the hybrid taught in introductory statistics today.

Statistics later developed in different directions including decision theory (and possibly game theory), Bayesian statistics, exploratory data analysis, robust statistics and nonparametric statistics. Neyman–Pearson hypothesis testing contributed strongly to decision theory which is very heavily used (in statistical quality control for example). Hypothesis testing readily generalized to accept prior probabilities which gave it a Bayesian flavor. Neyman–Pearson hypothesis testing has become an abstract mathematical subject taught in post-graduate statistics,[17] while most of what is taught to under-graduates and used under the banner of hypothesis testing is from Fisher.
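
As a rough illustration of the two procedures described under Significance testing and Hypothesis testing above, the following Python sketch contrasts them on one data set. Everything specific in it is an assumption made for illustration and is not taken from the cited sources: the data are treated as normal with known standard deviation 1, the null hypothesis is a mean of 0, the single alternative is a mean of 1, and the function names, the 0.05 threshold and the likelihood-ratio threshold of 1 are hypothetical choices.

```python
from math import log, sqrt
from statistics import NormalDist, fmean


def fisher_p_value(sample, mu0=0.0, sigma=1.0):
    """Significance test with a single (null) hypothesis.

    Computes a statistic from the data, then the probability under the
    null hypothesis of a statistic at least that large (one-sided).
    Assumes normal data with known sigma (illustrative assumption).
    """
    n = len(sample)
    z = (fmean(sample) - mu0) / (sigma / sqrt(n))
    return 1.0 - NormalDist().cdf(z)


def neyman_pearson_choice(sample, mu0=0.0, mu1=1.0, sigma=1.0, threshold=1.0):
    """Choose between two simple hypotheses with a likelihood ratio.

    For normal data with a common known sigma, the log of
    L(mu1) / L(mu0) has the closed form used below. A hypothesis is
    always returned; "not enough evidence" is not an outcome.
    """
    log_ratio = sum((x - mu0) ** 2 - (x - mu1) ** 2 for x in sample) / (2 * sigma ** 2)
    return "H1" if log_ratio > log(threshold) else "H0"


# Hypothetical measurements from a comparative experiment.
sample = [0.9, 1.4, 0.2, 1.1, 0.7, 1.3, 0.5, 1.0]

p = fisher_p_value(sample)
# The threshold 0.05 is a convention, not something derived from the data.
print("p-value:", round(p, 4), "-> reject H0" if p < 0.05 else "-> do not reject H0")
print("Neyman-Pearson choice:", neyman_pearson_choice(sample))
```

The first function needs only the null hypothesis and reports how discordant the data are with it. The second cannot run without an explicit alternative, and changing `threshold` trades one kind of error against the other.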
reflects common statistical practice. The merged terminology is also somewhat inconsistent. There is strong empirical evidence that the graduates (and instructors) of an introductory statistics class have a weak understanding of the meaning of hypothesis testing.[19]

1.6 Summary

• The interpretation of probability has not been resolved (but fiducial probability is an orphan).
• Neither test method has been rejected. Both are heavily used for different purposes.
• Texts have merged the two test methods under the term hypothesis testing.
  • Mathematicians claim (with some exceptions) that significance tests are a special case of hypothesis tests.
  • Others treat the problems and methods as distinct (or incompatible).
• The dispute has adversely affected statistical education.

2.1 Major contributors

Main article: History of statistics

Two major contributors to frequentist (classical) methods were Fisher and Neyman.[4] Fisher's interpretation of probability was idiosyncratic (but strongly non-Bayesian). Neyman's views were rigorously frequentist. Three major contributors to 20th century Bayesian statistical philosophy, mathematics and methods were de Finetti,[21] Jeffreys[22] and Savage.[23] Savage popularized de Finetti's ideas in the English-speaking world and made Bayesian mathematics rigorous. In 1965, Dennis Lindley's 2-volume work "Introduction to Probability and Statistics from a Bayesian Viewpoint" brought Bayesian methods to a wide audience. Statistics has advanced over the past three generations; the "authoritative" views of the early contributors are not all current.

2.2 Contrasting approaches

2.2.1 Frequentist inference

Main article: Frequentist inference
Bayesian speaks of the probability of a theory while a true frequentist can speak only of the consistency of the evidence with the theory. Example: A frequentist does not say that there is a 95% probability that the true value of a parameter lies within a confidence interval, saying instead that 95% of confidence intervals contain the true value.

2.3 Mathematical results

Neither school is immune from mathematical criticism and neither accepts it without a struggle. Stein's paradox (for example) illustrated that finding a flat or uninformative prior probability distribution in high dimensions is subtle.[1] Bayesians regard that as peripheral to the core of their philosophy while finding frequentism to be riddled with inconsistencies, paradoxes and bad mathematical behavior. Frequentists can explain most. Some of the bad examples are extreme situations, such as estimating the weight of a herd of elephants from measuring the weight of one (Basu's elephants), which allows no statistical estimate of the variability of weights. The likelihood principle has been a battleground.

Bayesians are united in opposition to the limitations of frequentism, but are philosophically divided into numerous camps (empirical, hierarchical, objective, personal, subjective), each with a different emphasis. One (frequentist) philosopher of statistics has noted a retreat from the statistical field to philosophical probability interpretations over the last two generations.[28] There is a perception that successes in Bayesian applications do not justify the supporting philosophy.[29] Bayesian methods often create useful models that are not used for traditional inference and which owe little to philosophy.[30] None of the philosophical interpretations of probability (frequentist or Bayesian) appears robust. The frequentist view is too rigid and limiting while the Bayesian view can be simultaneously objective and subjective, etc.

2.6 Illustrative quotations

• "carefully used, the frequentist approach yields broadly applicable if sometimes clumsy answers"[31]
• "To insist on unbiased [frequentist] techniques may lead to negative (but unbiased) estimates of a variance; the use of p-values in multiple tests may lead to blatant contradictions; conventional 0.95-confidence regions may actually consist of the whole real line. No wonder that mathematicians find it often difficult to believe that conventional statistical methods are a branch of mathematics."[32]
• "The two philosophies, Bayesian and frequentist, are more orthogonal than antithetical."[24]
• "An hypothesis that may be true is rejected because it has failed to predict observable results that have not occurred. This seems a remarkable procedure."[22]
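
The frequentist reading of a confidence interval given earlier (95% of confidence intervals contain the true value) can be illustrated by simulation. The sketch below is only an illustration under stated assumptions: normal data with a known standard deviation, a fixed "true" mean that the simulation knows but the analyst would not, and hypothetical parameter values and function name chosen for the example.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def interval_coverage(true_mean=10.0, sigma=2.0, n=25, trials=2000,
                      level=0.95, seed=0):
    """Fraction of simulated confidence intervals that contain true_mean.

    Each trial draws a fresh sample of size n and builds the usual
    known-sigma interval: sample mean +/- z * sigma / sqrt(n).
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level=0.95
    half_width = z * sigma / sqrt(n)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        centre = fmean(sample)
        if centre - half_width <= true_mean <= centre + half_width:
            hits += 1
    return hits / trials


# The reported fraction should be close to 0.95; it describes the
# long-run behaviour of the procedure, not any single interval.
print("fraction of intervals containing the true mean:", interval_coverage())
```

The probability statement attaches to the repeated procedure: any one computed interval either contains the fixed true value or it does not.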
For a short introduction to the foundations of statistics, see ch. 8 ("Probability and statistical inference") of Kendall's Advanced Theory of Statistics (6th edition, 1994).

6 See also

[17] Lehmann & Romano 2005.
[18] Hubbard & Bayarri c. 2003.
[19] Sotos et al. 2007.
[26] Yu 2009.
[27] Berger 2003.
[39] Savage 1960, p. 585.
[40] Forster & Sober 2001.
[41] Royall 1997.
[42] Lindley 2000.
[43] Some large models attempt to predict the behavior of voters in the United States of America. The population is around 300 million. Each voter may be influenced by many factors. For some of the complications of voter behavior (most easily understood by the natives) see: http://www.stat.columbia.edu/~gelman/presentations/redbluetalkubc.pdf
[44] Efron mentions millions of data points and thousands of parameters from scientific studies.
[45] Tabachnick & Fidell 1996.
[46] Forster & Sober 1994.
[47] Freedman 1995.
[48] Breiman 2001.

8 References

Abelson, Robert P. (1995). Statistics as Principled Argument. Lawrence Erlbaum Associates. ISBN 0-8058-0528-1. "... the purpose of statistics is to organize a useful argument from quantitative evidence, using a form of principled rhetoric."

Aldrich, John (2002). "How likelihood and identification went Bayesian". International Statistical Review. 70 (1): 79–98. doi:10.1111/j.1751-5823.2002.tb00350.x.

Backe, Andrew (1999). "The likelihood principle and the reliability of experiments". Philosophy of Science. 66: S354–S361. doi:10.1086/392737.

Breiman, Leo (2001). "Statistical Modeling: The Two Cultures". Statistical Science. 16 (3): 199–231. doi:10.1214/ss/1009213726.

Chin, Wynne W. (n.d.). "Structural Equation Modeling in IS Research - Understanding the LISREL and PLS perspective". University of Houston lecture notes?

Cox, D. R. (2005). "Frequentist and Bayesian Statistics: a Critique". Statistical Problems in Particle Physics, Astrophysics and Cosmology. PHYSTAT05.

de Finetti, Bruno (1964). "Foresight: its Logical laws, its Subjective Sources". In Kyburg, H. E. Studies in Subjective Probability. H. E. Smokler. New York: Wiley. pp. 93–158. Translation of the 1937 French original with later notes added.

Edwards, A.W.F. (1999). "Likelihood". Preliminary version of an article for the International Encyclopedia of the Social and Behavioral Sciences.

Efron, Bradley (1978). "Controversies in the foundations of statistics" (PDF). The American Mathematical Monthly. 85 (4): 231–246. doi:10.2307/2321163.

Fienberg, Stephen E. (2006). "When did Bayesian inference become "Bayesian"?". Bayesian Analysis. 1 (1): 1–40. doi:10.1214/06-ba101.

Fisher, R. A. (1925). Statistical Methods for Research Workers. Edinburgh: Oliver and Boyd.

Fisher, Sir Ronald A. (1935). Design of Experiments. Edinburgh: Oliver and Boyd.

Fisher, R (1955). "Statistical Methods and Scientific Induction" (PDF). Journal of the Royal Statistical Society, Series B. 17 (1): 69–78.

Fisher, Sir Ronald A. (1956). The logic of scientific inference. Edinburgh: Oliver and Boyd.

Forster, Malcolm; Sober, Elliott (1994). "How to Tell when Simpler, More Unified, or Less Ad Hoc Theories will Provide More Accurate Predictions". British Journal for the Philosophy of Science (45): 1–36.

Forster, Malcolm; Sober, Elliott (2001). "Why likelihood". Likelihood and evidence: 89–99.

Freedman, David (March 1995). "Some issues in the foundation of statistics". Foundations of Science. 1 (1): 19–39.

Gelman, Andrew (2008). "Rejoinder". Bayesian Analysis. 3 (3): 467–478. doi:10.1214/08-BA318REJ. A joke escalated into a serious discussion of Bayesian problems by 5 authors (Gelman, Bernardo, Kadane, Senn, Wasserman) on pages 445–478.

Gelman, Andrew; Shalizi, Cosma Rohilla (2012). "Philosophy and the practice of Bayesian statistics". British Journal of Mathematical and Statistical Psychology. 66: 8–38. doi:10.1111/j.2044-8317.2011.02037.x.

Gigerenzer, Gerd; Swijtink, Zeno; Porter, Theodore; Daston, Lorraine; Beatty, John; Kruger, Lorenz (1989). "Part 3: The Inference Experts". The Empire of Chance: How Probability Changed Science and Everyday Life. Cambridge University Press. pp. 70–122. ISBN 978-0-521-39838-1.

Halpin, P F; Stam, HJ (Winter 2006). "Inductive Inference or Inductive Behavior: Fisher and Neyman: Pearson Approaches to Statistical Testing in Psychological Research (1940–1960)". The American Journal of Psychology. 119 (4): 625–653. doi:10.2307/20445367. JSTOR 20445367. PMID 17286092.

Hubbard, Raymond; Bayarri, M. J. (c. 2003). "P Values are not Error Probabilities" (PDF). A working paper that explains the difference between Fisher's evidential p-value and the Neyman–Pearson Type I error rate.

Jeffreys, H. (1939). The theory of probability. Oxford University Press.

Kass (c. 2012). "Why is it that Bayes' rule has not only captured the attention of so many people but inspired a religious devotion and contentiousness, repeatedly across many years?" (PDF).

Lehmann, E. L. (December 1993). "The Fisher, Neyman–Pearson Theories of Testing Hypotheses: One Theory or Two?". Journal of the American Statistical Association. 88 (424): 1242–1249. doi:10.1080/01621459.1993.10476404.

Lehmann, E. L. (2011). Fisher, Neyman, and the creation of classical statistics. New York: Springer. ISBN 978-1441994998.

Lehmann, E.L.; Romano, Joseph P. (2005). Testing Statistical Hypotheses (3E ed.). New York: Springer. ISBN 0-387-98864-5.

Lenhard, Johannes (2006). "Models and Statistical Inference: The Controversy between Fisher and Neyman–Pearson". Brit. J. Phil. Sci. 57: 69–91. doi:10.1093/bjps/axi152.

Lindley, D.V. (2000). "The philosophy of statistics". Journal of the Royal Statistical Society, Series D. 49. pp. 293–337. doi:10.1111/1467-9884.00238.

Little, Roderick J. (2006). "Calibrated Bayes: A Bayes/Frequentist Roadmap". 60 (3).

Louçã, Francisco (2008). "Should The Widest Cleft in Statistics - How and Why Fisher opposed Neyman and Pearson" (PDF). Working paper contains numerous quotations from the original sources of the dispute.

Mayo, Deborah G. (February 2013). "Discussion: Bayesian Methods: Applied? Yes. Philosophical Defense? In Flux". The American Statistician. 67 (1): 11–15. doi:10.1080/00031305.2012.752410.

Neyman, J; Pearson, E. S. (January 1, 1933). "On the Problem of the most Efficient Tests of Statistical Hypotheses". Phil. Trans. R. Soc. Lond. A. 231 (694–706): 289–337. doi:10.1098/rsta.1933.0009.

Neyman, J (1967). Joint statistical papers of J.Neyman and E.S.Pearson. Cambridge University Press.

Neyman, Jerzy (1956). "Note on an Article by Sir Ronald Fisher". Journal of the Royal Statistical Society, Series B. 18 (2): 288–294.

Royall, Richard (1997). Statistical evidence: a likelihood paradigm. London New York: Chapman & Hall. ISBN 978-0412044113.

Savage, L.J. (1972). Foundations of Statistics (second ed.).

Senn, Stephen (2011). "You May Believe You Are a Bayesian But You Are Probably Wrong". RMM. 2: 48–66.
9 Further reading
Barnett, Vic (1999). Comparative Statistical Inference (3rd ed.). Wiley. ISBN 978-0-471-97643-1.
10 External links
Citations of Savage (1972) at Google Scholar. [Over
10000 citations.]