Two-sample t procedures
Robustness of two-sample t procedures
Details of the t approximation
Avoid the pooled two-sample t procedures
Sociology 360
Statistics for Sociologists I
Chapter 19
Two-Sample Problems
One-sample test:
Is a population mean >, <, or different from some fixed value?
Two-sample test:
Goal: Compare responses to two treatments or characteristics of two
populations.
Independent samples for each treatment or population (i.e., the data
are not matched pairs).
Are the population means the same as each other, or is one greater
than the other?
Research questions:
3. Mothers of twins were surveyed and asked how often in the past
month strangers had asked whether the twins were identical.
Most frequently, the null hypothesis is that the two means are the
same.
Option 1:
$H_0: \mu_1 = \mu_2$
Option 2:
$H_0: \mu_1 - \mu_2 = 0$
Of course both options mean the same thing, since Option 2 is obtained
algebraically from Option 1 by subtracting $\mu_2$ from both sides.
Two-tailed:
Option 1: $H_A: \mu_1 \neq \mu_2$ or
Option 2: $H_A: \mu_1 - \mu_2 \neq 0$
One-tailed (right):
Option 1: $H_A: \mu_1 > \mu_2$ or
Option 2: $H_A: \mu_1 - \mu_2 > 0$
One-tailed (left):
Option 1: $H_A: \mu_1 < \mu_2$ or
Option 2: $H_A: \mu_1 - \mu_2 < 0$
In each case, Option 1 is equivalent to Option 2.
G B = 0.4
The sampling distribution of $(\bar{x}_1 - \bar{x}_2)$ will be Normal under the right
circumstances.
And the mean of that sampling distribution will be $(\mu_1 - \mu_2)$.
All that remains to be discovered about the sampling distribution is
its standard error (or estimated standard deviation).
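The claim about the center of the sampling distribution can be checked with a small simulation. The sketch below is only illustrative: the population means, standard deviations, sample sizes, and the function name `simulate_mean_differences` are all hypothetical values chosen for the demonstration, not figures from the course.

```python
import random
import statistics

def simulate_mean_differences(mu1, sigma1, n1, mu2, sigma2, n2, reps=5000, seed=0):
    """Draw `reps` pairs of independent Normal samples; return each x-bar1 minus x-bar2."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    diffs = []
    for _ in range(reps):
        sample1 = [rng.gauss(mu1, sigma1) for _ in range(n1)]
        sample2 = [rng.gauss(mu2, sigma2) for _ in range(n2)]
        diffs.append(statistics.fmean(sample1) - statistics.fmean(sample2))
    return diffs

# Hypothetical populations whose means differ by 0.4.
diffs = simulate_mean_differences(mu1=5.0, sigma1=1.0, n1=30,
                                  mu2=4.6, sigma2=1.0, n2=30)
center = statistics.fmean(diffs)  # should land near mu1 - mu2
spread = statistics.stdev(diffs)  # should land near sqrt(sigma1^2/n1 + sigma2^2/n2)
print(center, spread)
```

With these settings the simulated differences should center close to 0.4, and their spread should be close to the square root of 1/30 + 1/30.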
Standard error
In fact, its standard error is simply the square root of the sum of the
squared standard errors of each sample considered separately:
$$SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
Degrees of freedom
Since we are using a standard error, estimated from the data, rather
than a known standard deviation, the procedures will be t rather than z
based.
df = the smaller of $n_1 - 1$ and $n_2 - 1$
You should use this rule for problems done by hand; for example, on
the exam.
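The hand calculation can be sketched in a few lines of Python. The function names and the input numbers are hypothetical. The degrees-of-freedom rule coded here (the smaller of n1 - 1 and n2 - 1) is the conservative textbook rule and is assumed to be the one these notes intend for hand work.

```python
import math

def two_sample_se(s1, n1, s2, n2):
    """Unpooled standard error of (x-bar1 - x-bar2): sqrt(s1^2/n1 + s2^2/n2)."""
    return math.sqrt(s1**2 / n1 + s2**2 / n2)

def conservative_df(n1, n2):
    """Assumed hand-calculation rule: the smaller of n1 - 1 and n2 - 1."""
    return min(n1 - 1, n2 - 1)

# Hypothetical summary statistics, for illustration only.
se = two_sample_se(s1=2.0, n1=25, s2=3.0, n2=36)  # sqrt(4/25 + 9/36) = sqrt(0.41)
df = conservative_df(n1=25, n2=36)                # min(24, 35)
print(se, df)
```

Note that the sample variances, not the standard deviations, are divided by the sample sizes before adding.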
Two-sample t-test
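The null hypothesis is that both population means $\mu_1$ and $\mu_2$ are equal,
so the hypothesized difference is 0 and the test statistic is
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$
[Table: Gender (Male, Female)]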
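As a rough sketch of the two-sample t statistic, the function below divides the observed difference in sample means, minus the difference claimed by the null hypothesis, by the unpooled standard error. The name `two_sample_t`, the `null_diff` parameter, and the input values are hypothetical.

```python
import math

def two_sample_t(xbar1, s1, n1, xbar2, s2, n2, null_diff=0.0):
    """t = ((x-bar1 - x-bar2) - null_diff) / sqrt(s1^2/n1 + s2^2/n2)."""
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)
    return ((xbar1 - xbar2) - null_diff) / se

# Hypothetical summary statistics, for illustration only.
t_stat = two_sample_t(xbar1=10.0, s1=2.0, n1=25, xbar2=9.0, s2=3.0, n2=36)
print(t_stat)  # observed difference of 1.0 divided by sqrt(0.41)
```

The resulting statistic is then compared against a t distribution with the chosen degrees of freedom. That lookup is left out here because it needs a t table or a statistics package.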
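Confidence interval
[Table: Gender (Male, Female)]
Note:
$$CI = (\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
$$SE = \sqrt{\frac{.89^2}{374} + \frac{.92^2}{416}} = .064$$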
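The interval can be sketched as a small function that takes the critical value as an input, since looking it up requires a t table or software. The standard deviations and sample sizes below are the ones in the standard-error calculation above. The two sample means and the critical value 1.97 are placeholders assumed for illustration (1.97 is roughly what a t table gives for 95% confidence with several hundred degrees of freedom).

```python
import math

def two_sample_ci(xbar1, s1, n1, xbar2, s2, n2, t_star):
    """Return (lower, upper) for (x-bar1 - x-bar2) +/- t_star * SE, with unpooled SE."""
    diff = xbar1 - xbar2
    margin = t_star * math.sqrt(s1**2 / n1 + s2**2 / n2)
    return diff - margin, diff + margin

# s and n come from the notes; the means and t_star are hypothetical placeholders.
low, high = two_sample_ci(xbar1=3.0, s1=0.89, n1=374,
                          xbar2=2.8, s2=0.92, n2=416, t_star=1.97)
print(low, high)
```

Because the margin of error is t* times the standard error, the interval is centered on the observed difference and widens as either sample gets smaller or more variable.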
$$SE = \sqrt{\frac{11.01^2}{21} + \frac{17.15^2}{23}} = 4.31$$
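Details of the t approximation
$$df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{1}{n_1 - 1}\left(\frac{s_1^2}{n_1}\right)^2 + \frac{1}{n_2 - 1}\left(\frac{s_2^2}{n_2}\right)^2}$$
Robustness of two-sample t procedures
1. When $n_1 + n_2 < 15$, the data from both samples must be close to
normal (roughly symmetric, single peak) and without outliers.
2. When $15 \le n_1 + n_2 < 40$, mild skewness is acceptable, but not
outliers.
3. When $n_1 + n_2 \ge 40$, the t statistic will be valid even with strong
skewness.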
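The degrees-of-freedom approximation is tedious by hand but short in code. This sketch applies it to the standard deviations and sample sizes from the 4.31 standard-error calculation in these notes. The function name `welch_df` is a label chosen here, not one from the notes.

```python
def welch_df(s1, n1, s2, n2):
    """Approximate df for the unpooled two-sample t procedures."""
    v1 = s1**2 / n1  # squared standard error of sample 1's mean
    v2 = s2**2 / n2  # squared standard error of sample 2's mean
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

df_approx = welch_df(s1=11.01, n1=21, s2=17.15, n2=23)
df_conservative = min(21 - 1, 23 - 1)  # the simpler rule: smaller of n1 - 1 and n2 - 1
print(df_approx, df_conservative)
```

The approximation always falls between the smaller of n1 - 1 and n2 - 1 and the total n1 + n2 - 2, so using the smaller value instead errs on the cautious side.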
Pooled procedures are often the default choice in stat packages (e.g.,
Stata, including the current version, 10.0).
The reasons that the pooled approach is often used are: 1) it was
historically easier to calculate; 2) it leads to a smaller estimated
standard error when the assumptions are met; 3) it amounts to a special
case of a very important technique called the analysis of variance.
But Moore is right to emphasize: 1) the assumptions of normality and
equal variances can't be tested effectively when the sample sizes are
small (i.e., when the pooled procedure would be most advantageous);
2) the pooled procedure can lead to incorrect inferences when the
assumptions aren't met; 3) the reduction in SEs is small for large n's.
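To make the comparison concrete, the sketch below computes both standard errors side by side. The pooled formula is the standard textbook one and is not given in these notes, so treat it as an assumption here. The first call uses the standard deviations and sample sizes from the .064 calculation earlier, and the second uses hypothetical values with very unequal spreads and sample sizes.

```python
import math

def unpooled_se(s1, n1, s2, n2):
    """sqrt(s1^2/n1 + s2^2/n2): no equal-variance assumption."""
    return math.sqrt(s1**2 / n1 + s2**2 / n2)

def pooled_se(s1, n1, s2, n2):
    """Standard pooled SE: assumes both populations share one variance."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)  # pooled variance
    return math.sqrt(sp2 * (1 / n1 + 1 / n2))

# Large samples with similar spreads (figures from the notes).
large_gap = abs(pooled_se(0.89, 374, 0.92, 416) - unpooled_se(0.89, 374, 0.92, 416))

# Hypothetical small/unequal case: spreads and sample sizes differ a lot.
pooled_bad = pooled_se(1.0, 10, 5.0, 50)
unpooled_bad = unpooled_se(1.0, 10, 5.0, 50)
print(large_gap, pooled_bad, unpooled_bad)
```

In the large-sample case the two standard errors are nearly identical, so pooling gains almost nothing. In the unequal case the pooled value is roughly double the unpooled one, which is the kind of discrepancy that can distort an inference.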