Chapter 1
Inference: When we make an inference, we draw a conclusion from incomplete
information. The science of statistics formulates rules about the validity of these
inferences. The realm of valid inferences is called the scope of inference.
Population inference can be justified only from a random sample. Only when
we take a random sample from the population do we have mathematical models for
quantifying the behavior of our population estimates. Randomization gives us the highest
probability that the sample is representative of the population's proportions and
distribution.
Example: Lead content in teeth.
Researchers measured the lead content in teeth and the IQ scores of 3,229
children attending first and second grade between 1975 and 1978.
Model: Y* = Y + δ
where Y is a subject's score without taking the questionnaire, Y* is the score after,
and δ measures the difference. Hence δ is the treatment effect, and our hypothesis test is
then H0: δ = 0 versus HA: δ ≠ 0.
A test statistic is a numerical quantity that we calculate from our data to test the
hypotheses of interest. For the creativity data, the test statistic is the difference of the
averages between the two treatment groups. In order to use the test statistic we must
know its sampling distribution, or its randomization distribution.
P-value: the probability, in a randomized experiment, that a test statistic as extreme as
or more extreme than the one observed arises from the randomization alone. To
calculate this value we need to know the mathematical curve of the sampling distribution
of the test statistic, or to conduct simulation or resampling.
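The resampling approach can be sketched directly: reshuffle the group labels many times and see how often the randomization alone produces a difference as extreme as the one observed. The two groups and their values below are invented purely for illustration.

```python
import random

# Hypothetical scores for two treatment groups (values invented for illustration).
group_a = [12.0, 15.5, 18.2, 11.3, 16.8, 14.1]
group_b = [10.2, 13.4, 9.8, 12.6, 11.1, 10.9]

def mean(xs):
    return sum(xs) / len(xs)

# Observed test statistic: difference of the two group averages.
observed = mean(group_a) - mean(group_b)

# Approximate the randomization distribution: shuffle the pooled values,
# split them into two groups of the original sizes, and recompute the statistic.
random.seed(1)
pooled = group_a + group_b
n_a = len(group_a)
n_reps = 10_000
count_extreme = 0
for _ in range(n_reps):
    random.shuffle(pooled)
    stat = mean(pooled[:n_a]) - mean(pooled[n_a:])
    if abs(stat) >= abs(observed):  # two-sided: as extreme or more extreme
        count_extreme += 1

p_value = count_extreme / n_reps
print(f"observed difference = {observed:.4f}, approximate p-value = {p_value:.4f}")
```

The p-value here is only an approximation to the exact randomization p-value; its accuracy improves with the number of reshuffles.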
Chapter 2
Normal Model: A probability distribution. Most of the analyses in this course will use
the normal model, or assume that the underlying distribution is regular and symmetric
enough that the sample estimates are approximately normal, and hence the test statistics
are either normal or t.
One Sample t Test: a test about the population mean where the test statistic has a t
distribution. A true one-sample t test is rarely performed. Often we are interested in the
average difference between paired observations in a sample. In this case, under certain
circumstances, the scaled difference of the two sample averages has a t distribution and
hence we use the same methods as in the one sample t test.
Standard error: the standard error of a statistic is an estimate of the standard
deviation of its sampling distribution.
s = \sqrt{ \dfrac{\sum_{i=1}^{n} ( y_i - \bar{y} )^2}{n - 1} }
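The sample standard deviation above can be computed step by step (a small sketch with made-up data):

```python
import math

y = [4.0, 7.0, 6.0, 5.0, 8.0]  # made-up sample for illustration
n = len(y)
y_bar = sum(y) / n

# s: square root of the sum of squared deviations divided by n - 1
s = math.sqrt(sum((y_i - y_bar) ** 2 for y_i in y) / (n - 1))
print(f"sample mean = {y_bar}, sample standard deviation = {s:.4f}")
```

The standard library's `statistics.stdev` uses the same n − 1 divisor and gives the same result.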
Z- and t-ratios: the ratio of an estimate's error (its deviation from the hypothesized
value) to its standard error is a convenient test statistic. When the standard deviation is
known, we use the Z-ratio. When the standard deviation is estimated, we use the t-ratio.
Most often we need to estimate the standard error, and hence we most often use the t-ratio.
Example: t = \dfrac{\bar{y} - \mu_0}{SE(\bar{y})} is the t-ratio used to test the hypothesis that the true mean
is \mu_0 in a one-sample t test.
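A minimal sketch of the one-sample t-ratio, using SE(ȳ) = s/√n. The measurements and the hypothesized mean µ0 below are made up for illustration:

```python
import math

y = [5.1, 4.8, 5.6, 5.0, 4.7, 5.3]  # made-up measurements
mu_0 = 5.0                           # hypothesized true mean (assumed for this example)

n = len(y)
y_bar = sum(y) / n
s = math.sqrt(sum((v - y_bar) ** 2 for v in y) / (n - 1))
se = s / math.sqrt(n)                # standard error of the sample mean

t = (y_bar - mu_0) / se
print(f"t = {t:.4f} on {n - 1} degrees of freedom")
```

The same calculation serves the paired case: apply it to the within-pair differences with µ0 = 0.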
Example: t = \dfrac{\bar{y}_1 - \bar{y}_2 - 0}{SE(\bar{y}_1 - \bar{y}_2)} is the t-ratio to test the hypothesis that the true
difference between paired observations is zero.
Two sample t tests: In addition to the normal or near-normal assumption, for the two
sample t test we also require the assumption of independent samples. The observations
within a sample must be both independent of each other and of all observations in the
other sample. The two-sample test statistic is given by
t = \dfrac{(\bar{y}_1 - \bar{y}_2) - (\mu_1 - \mu_2)}{SE(\bar{y}_1 - \bar{y}_2)}
where

SE(\bar{y}_1 - \bar{y}_2) = \sqrt{ \dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2} }

when equal variances cannot be assumed, and

SE(\bar{y}_1 - \bar{y}_2) = \sqrt{ \dfrac{s_p^2}{n_1} + \dfrac{s_p^2}{n_2} } = s_p \sqrt{ \dfrac{1}{n_1} + \dfrac{1}{n_2} }, \quad \text{where} \quad s_p = \sqrt{ \dfrac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2} },

when equal variances can be assumed.
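The pooled and unpooled standard-error estimates can be compared side by side; the two samples below are invented for illustration.

```python
import math

def sample_var(xs):
    """Sample variance with the n - 1 divisor."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

y1 = [23.1, 25.4, 22.8, 26.0, 24.3]        # invented sample 1
y2 = [20.5, 21.9, 19.8, 22.4, 20.1, 21.0]  # invented sample 2
n1, n2 = len(y1), len(y2)
s1_sq, s2_sq = sample_var(y1), sample_var(y2)

# Unpooled SE: used when equal variances cannot be assumed.
se_unpooled = math.sqrt(s1_sq / n1 + s2_sq / n2)

# Pooled SE: weights the two sample variances by their degrees of freedom,
# n1 - 1 and n2 - 1, and assumes a common population variance.
sp = math.sqrt(((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2))
se_pooled = sp * math.sqrt(1 / n1 + 1 / n2)

diff = sum(y1) / n1 - sum(y2) / n2
print(f"unpooled SE = {se_unpooled:.4f}, pooled SE = {se_pooled:.4f}")
print(f"pooled t = {diff / se_pooled:.4f}")
```

When the two sample variances (and sample sizes) are close, the two estimates nearly agree; they diverge as the variances differ, which is why the unpooled version is safer for a plain two-sample test.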
Note: The course text uses the pooled estimate of the standard error because this is the
estimate used in ANOVA, an extension of the t-test. As mentioned in 444, the unpooled
estimate is considered superior for most tests. However, the unpooled estimate cannot be
used in the ANOVA situation, so we review the pooled estimate here.