
Ismor Fischer, 4/22/2017 Solutions / 6.4-1

6.4
1.
(a) Let X = the number of children weighing at most 2500 grams; then X ~ Bin(102, 0.15).

Therefore, $P(X \le 13) = \sum_{x=0}^{13} \binom{102}{x} (.15)^x (.85)^{102-x} = 0.3178$.

Note: Run pbinom(13, 102, 0.15).

(b) Recall that, via § 4.2, $X \sim \mathrm{Bin}(n, \pi) \approx N(\mu, \sigma)$, with $\mu = n\pi$ and $\sigma = \sqrt{n\pi(1-\pi)}$. Thus, with the continuity correction of 0.5,

$$P(X \le 13) \approx P\left(Z \le \frac{X - n\pi + 0.5}{\sqrt{n\pi(1-\pi)}}\right) = P\left(Z \le \frac{13 - (102)(0.15) + 0.5}{\sqrt{(102)(0.15)(0.85)}}\right) = P(Z \le -0.5) = 0.3088.$$

COMPARE: Note that the answers to (a) and (b) differ somewhat because one of the conditions under which the normal approximation to the binomial distribution applies is barely met. Which one?

(c) Via § 6.1.3, $\hat{\pi} \approx N\left(\pi, \sqrt{\pi(1-\pi)/n}\right)$. The sample-based estimate of $\pi$ is $\hat{\pi} = 13/102 = 0.128$.

Therefore, the 95% margin of error for the confidence interval is

$$(z_{.025}) \sqrt{\frac{\left(\frac{13}{102}\right)\left(\frac{89}{102}\right)}{102}} = (1.96)(0.033) = 0.065.$$

Hence the 95% confidence interval = (0.128 − 0.065, 0.128 + 0.065) = (0.063, 0.192).

With the continuity correction, the associated p-value is

$$2\,P\left(\hat{\pi} \le \frac{13}{102}\right) = 2\,P\left(Z \le \frac{\frac{13}{102} - 0.15 + \frac{0.5}{102}}{\sqrt{\frac{(0.15)(0.85)}{102}}}\right) = 2\,P(Z \le -0.5) = 0.6177.$$

Note that (except for the factor of 2), this is the identical probability computation as in (b)!
Clearly, the 95% confidence interval does indeed contain the hypothesized proportion 0.15, and
the p-value >> .05, so the null hypothesis cannot be rejected at the $\alpha$ = .05 level. Thus, our
sample data seem to be consistent with the null hypothesis that $\pi$ = 0.15.
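The calculations in (a) through (c) can be checked numerically. The following is a minimal sketch in base R, not part of the original solution; the variable names are illustrative, and the confidence interval uses the sample proportion in the standard error while the p-value uses the null value, as in the text.

```r
n <- 102; x <- 13; pi0 <- 0.15

# (a) Exact lower-tail binomial probability P(X <= 13)
p_exact <- pbinom(x, n, pi0)

# (b) Normal approximation with continuity correction
z <- (x - n * pi0 + 0.5) / sqrt(n * pi0 * (1 - pi0))
p_approx <- pnorm(z)

# (c) Approximate 95% confidence interval (standard error from the sample proportion)
pi_hat <- x / n
moe <- qnorm(0.975) * sqrt(pi_hat * (1 - pi_hat) / n)
ci <- c(pi_hat - moe, pi_hat + moe)

# Two-sided p-value (standard error from the null value, same z as in (b))
p_value <- 2 * p_approx

print(round(c(exact = p_exact, approx = p_approx, p_value = p_value), 4))
print(round(ci, 3))
```

Because `qnorm(0.975)` is used instead of the rounded table value 1.96, the last digit of a result may differ slightly from the hand calculation.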

(d) R code

binom.test(13, 102, 0.15)

Exact binomial test

data: 13 and 102


number of successes = 13, number of trials = 102, p-value = 0.6767
alternative hypothesis: true probability of success is not
equal to 0.15
95 percent confidence interval:
0.06964188 0.20808057
sample estimates:
probability of success
0.1274510

Note that the answers to (a) and (b) differ a bit, since R utilizes the exact binomial distribution
in its calculations, whereas we used the normal approximation to the binomial.

2. “Smart pill”

(a) p-value $= 2\,P(\bar{X} \ge 109.9) = 2\,P\left(Z \ge \frac{109.9 - 100}{27/\sqrt{36}}\right) = 2\,P\left(Z \ge \frac{9.9}{4.5}\right) = 2\,P(Z \ge 2.2) = .0278$

(b) C.I. $= (109.9 - 4.5\,z_{\alpha/2},\ 109.9 + 4.5\,z_{\alpha/2})$, where $z_{\alpha/2}$ = 1.645, 1.960, 2.575, respectively.

Decision rule: Is p = .0278 < $\alpha$? Equivalently, is 100 outside of the CI? If so, reject $H_0$.

Significance Level α    Confidence Level 1 − α    Confidence Interval    Decision about H0
.10                     .90                       (102.5, 117.3)         Strong Rejection
.05                     .95                       (101.1, 118.7)         Moderate Rejection
.01                     .99                       (98.3, 121.5)          Cannot Reject

(c) Generally, a lower significance level at the outset of an experiment will make it subsequently
more difficult to reject a given null hypothesis, resulting in a more conservative testing
procedure. The same p-value (e.g., .0278) that results in rejection at some significance level
(e.g., $\alpha$ = .10 or .05) might not result in rejection at a lower level (e.g., $\alpha$ = .01). Likewise, a
lower significance level gives rise to a larger confidence level, and hence a larger confidence
interval, which then may indeed contain the null value (e.g., $\mu$ = 100), resulting in an inability
to reject. We will see this idea again in “Bonferroni correction.”
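The relationship between significance level, confidence interval, and decision can be tabulated directly. This is a small sketch in base R, assuming the summary values used in (a) (sample mean 109.9, null mean 100, σ = 27, n = 36); the data-frame column names are illustrative.

```r
xbar <- 109.9; mu0 <- 100; sigma <- 27; n <- 36
se <- sigma / sqrt(n)                        # standard error, 4.5

# Two-sided p-value from the Z-statistic
z <- (xbar - mu0) / se
p_value <- 2 * pnorm(z, lower.tail = FALSE)

# Confidence interval and decision at each significance level
alpha <- c(0.10, 0.05, 0.01)
z_crit <- qnorm(1 - alpha / 2)
results <- data.frame(
  alpha  = alpha,
  lower  = round(xbar - z_crit * se, 1),
  upper  = round(xbar + z_crit * se, 1),
  reject = p_value < alpha                   # TRUE exactly when 100 is outside the CI
)
print(results)
```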

3. Serum cholesterol levels

(a) $H_0: \mu = 211$ mg/dL, versus $H_A: \mu \ne 211$ mg/dL

(b) 95% margin of error $= (z_{.025})(46/\sqrt{12})$ = (1.96)(13.28 mg/dL) = 26.03 mg/dL.

Hence the 95% confidence interval = (217 − 26.03, 217 + 26.03) = (190.97, 243.03).

(c) p-value $= 2\,P(\bar{X} \ge 217) = 2\,P\left(Z \ge \frac{217 - 211}{13.28}\right) = 2\,P(Z \ge 0.45) = 0.653$.

(d) The 95% confidence interval does indeed contain the hypothesized mean of 211 mg/dL, and
the p-value > .05 as well, so the null hypothesis cannot be rejected at the $\alpha$ = .05 level.
Interpretation: Based on the data from this sample, there has been no statistically significant
difference demonstrated between the true mean serum cholesterol level of the population of
hypertensive male smokers, and the mean serum cholesterol level of the general male
population (211 mg/dL).

(e) The 95% acceptance region for $H_0$ is (211 − 26.03, 211 + 26.03) = (184.97, 237.03).
Therefore, the complementary rejection region is $(-\infty, 184.97] \cup [237.03, +\infty)$. This is
entirely consistent with our conclusion in (d), as the sample mean of 217 lies in the acceptance region.

4. Plasma aluminum levels

(a) $H_0: \mu = 4.13$ μg/L, versus $H_A: \mu \ne 4.13$ μg/L.

(b) 95% margin of error $= (t_{9,\,.025})(7.13/\sqrt{10})$ = (2.262)(2.255 μg/L) = 5.1 μg/L. Hence the
95% confidence interval = (37.2 − 5.1, 37.2 + 5.1) = (32.1, 42.3) μg/L.

(c) p-value $= 2\,P(\bar{X} \ge 37.20) = 2\,P\left(T_9 \ge \frac{37.20 - 4.13}{2.255}\right) = 2\,P(T_9 \ge 14.67)$ << 0.05.

Note: Run 2 * pt(14.67, df = 9, lower.tail = F) for the exact p-value.

(d) The 95% confidence interval certainly does not contain the hypothesized mean of 4.13 μg/L, and
the p-value << .05 as well, so the null hypothesis is strongly rejected at the $\alpha$ = .05 level.
Interpretation: Based on the data from this sample (37.2 μg/L), there is an extremely
statistically significant difference demonstrated between the true mean plasma aluminum level
of the population of infants receiving antacids, and the mean plasma aluminum level of the
population of infants not receiving antacids (4.13 μg/L).

(e) $H_0: \mu \le 4.13$ μg/L, versus $H_A: \mu > 4.13$ μg/L. The p-value for this one-sided test is $P(\bar{X} \ge 37.20)$
$= P(T_9 \ge 14.67)$, or exactly half the p-value for the two-sided test above; consequently, it is << 0.025
< 0.05. As before, this leads to an extremely strong rejection of the null hypothesis in favor of
the alternative, at the $\alpha$ = .05 significance level. Interpretation: Based on the data from this
sample, the true mean plasma aluminum level of the population of infants receiving antacids is
significantly higher than the mean plasma aluminum level of the population of infants not
receiving antacids (4.13 μg/L).
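The one-sample t calculations above can be reproduced from the summary statistics alone. The sketch below uses base R with illustrative variable names; it assumes only the values quoted in the solution (sample mean 37.2, s = 7.13, n = 10, null mean 4.13).

```r
xbar <- 37.2; s <- 7.13; n <- 10; mu0 <- 4.13
se <- s / sqrt(n)                            # estimated standard error

# (b) 95% confidence interval based on the t-distribution with n - 1 df
ci <- xbar + c(-1, 1) * qt(0.975, df = n - 1) * se

# (c) Two-sided p-value; (e) one-sided (right-tailed) p-value
t_stat <- (xbar - mu0) / se
p_one <- pt(t_stat, df = n - 1, lower.tail = FALSE)
p_two <- 2 * p_one

print(round(ci, 1))
print(c(t = t_stat, two_sided = p_two, one_sided = p_one))
```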

5. Money wheel – revisited

(a) With $n = 80$, $\bar{x} = 28.75$, $\mu = 25$, and $\sigma^2 = 125$ (i.e., $\sigma = \sqrt{125}$), we have $\bar{X} \approx N\left(25, \frac{\sqrt{125}}{\sqrt{80}}\right) = N(25, 1.25)$. Therefore...

p-value $= 2\,P(\bar{X} \ge 28.75) = 2\,P\left(Z \ge \frac{28.75 - 25}{1.25}\right) = 2\,P(Z \ge 3) = 2(.00135) = .0027$ < .05,

so the null hypothesis $H_0: \mu = 25$ can be rejected at the $\alpha$ = .05 significance level. However, this
null hypothesis is true, because as the calculations in 4.4/2 show, the mean is indeed equal
to 25! Therefore, our decision to reject is incorrect, and we have committed a Type I error,
the probability of which is the significance level $\alpha$ = .05, by definition. In other words, the
mean $\bar{x} = 28.75$ of this particular random sample is one of the unlucky ones that led to an
erroneous conclusion, an event that should only occur by chance 5% of the time, if $H_0$ is true.

(b) Now testing the false null hypothesis $\mu_0 = 28$, vs. $\mu_1 = 25$, using the same sample data as above.

Power $= 1 - \beta = P\left(Z \le -z_{.025} + \frac{|\mu_1 - \mu_0|}{\sigma}\sqrt{n}\right) = P\left(Z \le -1.96 + \frac{|25 - 28|}{\sqrt{125}}\sqrt{80}\right) = P(Z \le 0.44) = 0.67003$,

i.e., only 67% power of correctly rejecting $H_0: \mu = 28$ in favor of the alternative $H_A: \mu = 25$. Indeed,

p-value $= 2\,P(\bar{X} \ge 28.75) = 2\,P\left(Z \ge \frac{28.75 - 28}{1.25}\right) = 2\,P(Z \ge 0.6) = 2(0.27425) = 0.54851$ > .05 = $\alpha$,

which indicates that this false null hypothesis cannot be
rejected at this significance level. This is a Type II error.
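The power and p-value in (b) follow from two short normal-distribution calculations. The sketch below is in base R and uses the same power formula as the solution, $P(Z \le -z_{\alpha/2} + |\mu_1 - \mu_0|\sqrt{n}/\sigma)$; variable names are illustrative.

```r
mu0 <- 28; mu1 <- 25; sigma <- sqrt(125); n <- 80; alpha <- 0.05

# Power of the two-sided Z-test against the alternative mu1
power <- pnorm(-qnorm(1 - alpha / 2) + abs(mu1 - mu0) / sigma * sqrt(n))

# Two-sided p-value for the observed sample mean under H0: mu = 28
xbar <- 28.75
z <- (xbar - mu0) / (sigma / sqrt(n))
p_value <- 2 * pnorm(z, lower.tail = FALSE)

print(round(c(power = power, p_value = p_value), 4))
```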

6. “Does chicken soup make a significant difference in the duration of cold symptoms?”

(a) 95% margin of error $= (t_{15,\,.025})(3.0/\sqrt{16})$ = (2.131)(0.75 days) = 1.598 days. Hence the
95% confidence interval = (5.5 − 1.6, 5.5 + 1.6) = (3.9, 7.1) days. The associated p-value

$= 2\,P(\bar{X} \le 5.5) = 2\,P\left(T_{15} \le \frac{5.5 - 7.0}{0.75}\right) = 2\,P(T_{15} \le -2) = 2\,P(T_{15} \ge +2)$.

Now since 1.753 < 2 < 2.131 from the t-table on df = 15, it must follow that the p-value is between
2(.025) and 2(.05), i.e., .05 < p < .10. (Using R, the exact p-value = .0639.) Clearly, the 95%
confidence interval does indeed contain the null value of 7 days, and the p-value > .05 as well,
so the null hypothesis cannot be rejected at the $\alpha$ = .05 level.

(b) Sample size: With scaled difference $\Delta = \frac{|5 - 7|}{3.0} = 2/3$, and $1 - \beta = .99$, we have $z_{.01} \approx 2.33$, so

$$n = \left(\frac{1.96 + 2.33}{2/3}\right)^2 = (6.435)^2 = 41.4, \text{ so take } n \ge 42.$$

Note that in the absence of the population standard deviation $\sigma$, we have substituted the
sample standard deviation s in the calculation. As the resulting sample size is large enough
for the Central Limit Theorem to be valid, it is not unreasonable that the difference between
using the latter rather than the former in hypothesis testing will be negligible.
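The sample-size formula in (b) is a one-line computation. A minimal sketch in base R, assuming the same inputs as the solution (α = .05 two-sided, power .99, scaled difference 2/3); because it uses exact normal quantiles rather than the rounded table values 1.96 and 2.33, the unrounded result differs slightly from 41.4, but the required sample size is unchanged.

```r
alpha <- 0.05; power <- 0.99
delta <- abs(5 - 7) / 3.0                    # scaled difference, 2/3

# n = ((z_{alpha/2} + z_{beta}) / delta)^2, rounded up to a whole subject
n_raw <- ((qnorm(1 - alpha / 2) + qnorm(power)) / delta)^2
n_needed <- ceiling(n_raw)

print(n_needed)
```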

(c) 95% margin of error $= (t_{48,\,.025})(2.8/\sqrt{49})$ = (2.0106)(0.4 days) = 0.8043 days. Hence
the 95% confidence interval = (5.5 − 0.804, 5.5 + 0.804) = (4.696, 6.304) days. The
associated p-value $= 2\,P(\bar{X} \le 5.5) = 2\,P\left(T_{48} \le \frac{5.5 - 7.0}{0.4}\right) = 2\,P(T_{48} \le -3.75)$. Using the
R command 2 * pt(-3.75, 48), the exact p-value = .0004758. Clearly, the 95%
confidence interval does not contain the null value of 7 days, and p-value << .05 as well, so
the null hypothesis can be (strongly) rejected at the $\alpha$ = .05 level. (Note: Using the Z-distribution
yields slightly different answers, but the same conclusions.)

Note: To find $t_{48,\,.025}$, use qt(.975, 48) = 2.010635 in R.

7. Toxicity Testing

Since we are told that X is normally distributed, and n = 121 and s = 5.5 ppb for all four
samples, the standard error of the sampling distribution of $\bar{X}$ is estimated by
$\frac{s}{\sqrt{n}} = \frac{5.5 \text{ ppb}}{\sqrt{121}} = 0.5$ ppb. Also, σ is unknown but n is large, so we may use either the z-test or
the t-test with degrees of freedom df = 121 – 1 = 120. The solution presented here uses the latter
(which is more precise), but both tests will produce very similar results (… except if n is small).

(a) Two-sided: $H_0: \mu = 10$ ppb, versus $H_A: \mu \ne 10$ ppb, at α = .05

Source 1: $\bar{x} = 11.43$ ppb

(i) Simply by inspection, the sample mean of $\bar{x} = 11.43$ ppb is 1.43 ppb above 10
ppb, which would seem to suggest that the true mean arsenic concentration is
significantly higher than the null value 10 ppb, thereby indicating that the
water is unsafe for drinking. But this assertion still needs to be formally shown.

(ii) p-value $= 2\,P(\bar{X} \ge 11.43) = 2\,P\left(T_{120} \ge \frac{11.43 - 10}{0.5}\right) = 2\,P(T_{120} \ge 2.86) = 2(.0025) = .005$.
This is much smaller than the significance level α = .05, hence we have a very
strong rejection of the null hypothesis.

[Figure: $t_{120}$ density with tail areas of .0025 on each side, beyond ±2.86.]

(iii) This evidence is very strongly statistically significant, on the high side of 10 ppb,
very clearly indicating that the water is indeed unsafe… exactly as we suspected from
the informal inspection in (i).

Source 2: $\bar{x} = 8.57$ ppb

(i) Simply by inspection, the sample mean of $\bar{x} = 8.57$ ppb is 1.43 ppb below 10
ppb, which would seem to suggest that the true mean arsenic concentration is
significantly lower than the null value 10 ppb, thereby indicating that the
water is safe for drinking. Moreover…

(ii) Because this sample mean is on the opposite side of Source 1 by the same amount, it
follows that the two-sided p-value will remain unchanged. [Explicitly, we have
p-value $= 2\,P(\bar{X} \le 8.57) = 2\,P\left(T_{120} \le \frac{8.57 - 10}{0.5}\right) = 2\,P(T_{120} \le -2.86) = 2(.0025) = .005$
via symmetry.] Again, this is much smaller than the significance level α = .05, hence
we have a very strong rejection of the null hypothesis.

[Figure: $t_{120}$ density with tail areas of .0025 on each side, beyond ±2.86.]

(iii) As with Source 1, this evidence is very strongly statistically significant, but on the
low side of 10 ppb, very clearly indicating that the water is indeed safe… exactly as
we suspected from the informal inspection in (i).

Source 3: $\bar{x} = 9.1$ ppb

(i) As with Source 2, this sample mean of $\bar{x} = 9.1$ ppb is still below the null value
10 ppb, but not by quite as much (0.9 ppb) as Source 2 was (1.43 ppb). Hence
we may be able to conclude that the water is safe for drinking, but only if statistical
significance can be formally shown. This might indeed be a borderline case.

(ii) p-value $= 2\,P(\bar{X} \le 9.1) = 2\,P\left(T_{120} \le \frac{9.1 - 10}{0.5}\right) = 2\,P(T_{120} \le -1.8)$, which is equal to
$2\,P(T_{120} \ge 1.8)$ via symmetry. According to the t-table on 120 df, the t-score of 1.8 is
between the entries 1.658 (corresponding to an upper tail area of .05) and 1.980
(corresponding to an upper tail area of .025). Therefore, doubling vis-à-vis the two-sided
test, it follows that .05 < p < .10. (R gives the exact p-value as .0744.) This
indicates that we cannot reject $H_0: \mu = 10$ ppb at the 5% significance level.

[Figure: $t_{120}$ density with tail areas of .0372 on each side, beyond ±1.8.]

(iii) This result is in fact not statistically significant, i.e., we are unable to conclude
that $\mu = 10$ ppb can be rejected in favor of $\mu \ne 10$ ppb, at the 5% significance
level. As far as a determination of the drinking water goes, although the evidence
suggests that it is probably safe, the formal result is inconclusive… confirming
our informal inspection in (i).

Source 4: $\bar{x} = 10.9$ ppb

(i) As with Source 1, this sample mean of $\bar{x} = 10.9$ ppb is above the null value 10
ppb, but not by quite as much (0.9 ppb) as Source 1 was (1.43 ppb). Hence we may
be able to conclude that the water is unsafe for drinking, but only if statistical
significance can be formally shown. This again might indeed be borderline.

(ii) Because this sample mean is on the opposite side of Source 3 by the same amount, it
follows that the two-sided p-value will remain unchanged. [Explicitly, we have
p-value $= 2\,P(\bar{X} \ge 10.9) = 2\,P\left(T_{120} \ge \frac{10.9 - 10}{0.5}\right) = 2\,P(T_{120} \ge 1.8)$, so that .05 < p < .10.
(Again, the exact p-value is .0744.)] This indicates that we cannot reject $H_0: \mu = 10$
ppb at the 5% significance level.

[Figure: $t_{120}$ density with tail areas of .0372 on each side, beyond ±1.8.]

(iii) As with Source 3, this result is in fact not statistically significant, i.e., we are
unable to conclude that $\mu = 10$ ppb can be rejected in favor of $\mu \ne 10$ ppb, at
the 5% significance level. As far as a determination of the drinking water goes,
although the sample evidence suggests that it is probably unsafe, the formal result
is inconclusive… confirming our informal inspection in (i).

(b) We have seen that Sources 1 and 2 (both 1.43 ppb from 10 ppb) have the same p-value by
symmetry, and likewise for Sources 3 and 4 (both 0.9 ppb from 10 ppb), but the former
are statistically significant, while the latter are not. The 95% acceptance region endpoints
must therefore be between 0.9 ppb and 1.43 ppb away from 10 ppb, and are given by
$\mu_0 \pm (t_{120,\,.025})\,\frac{s}{\sqrt{n}}$ = 10 ppb ± (1.980)(0.5 ppb) = 10 ppb ± 0.99 ppb. Therefore, the null
hypothesis is rejected if either $\bar{x} \le 9.01$ ppb or $\bar{x} \ge 10.99$ ppb, consistent with the data.

[Figure: null distribution of $\bar{X}$ centered at 10, with tail areas of .025 below 9.01 and above 10.99.]

(c) One-sided: $H_0: \mu \le 10$ ppb (safe), versus $H_A: \mu > 10$ ppb (unsafe), at α = .05

Source 1: $\bar{x} = 11.43$ ppb

(i) The informal intuition here is similar to part (i) of (a) for Source 1, above. Namely,
the very high sample mean of 11.43 ppb seems to provide evidence that clearly
refutes the null hypothesis $\mu \le 10$ ppb, and thus clearly supports the alternative
hypothesis $\mu > 10$ ppb, i.e., the water is unsafe for drinking.

(ii) One-sided p-value is equal to half of the two-sided p-value for Source 1.
Because the direction of the p-value always follows the alternative hypothesis (i.e.,
reflecting the investigator’s belief), its calculation here is simply the right-tailed
probability $P(\bar{X} \ge 11.43)$, i.e., exactly half of the two-sided p-value computed in (a)
for Source 1. That is, p-value = .0025. As before, this is much smaller than the
significance level α = .05, hence we have a strong rejection of the null hypothesis.

[Figure: $t_{120}$ density with right-tail area of .0025 beyond 2.86.]

(iii) This is very strong statistically significant evidence (11.43 ppb) to reject the null
hypothesis in favor of the alternative that $\mu > 10$ ppb, very clearly indicating that
the water is indeed unsafe… confirming our suspicion in (i).

Source 2: $\bar{x} = 8.57$ ppb

(i) Again, the informal intuition here is similar to part (i) of (a) for Source 2, above.
Namely, the very low sample mean of 8.57 ppb seems to provide evidence that
clearly supports the null hypothesis $\mu \le 10$ ppb, and clearly refutes the alternative
hypothesis $\mu > 10$ ppb, i.e., the water is safe for drinking.

(ii) One-sided p-value is equal to 1 – (half of the two-sided p-value) for Source 2!
Because the direction of the p-value always follows the alternative hypothesis (i.e.,
reflecting the investigator’s belief), its calculation here is still the right-tailed
probability $P(\bar{X} \ge 8.57)$. But this is NOT equal to half of the corresponding two-sided
p-value in (a) for Source 2! Rather, it is its complementary area:
p-value = 1 – .0025 = .9975. This is much greater than the significance level α = .05,
in fact very close to 1, thus we actually have very strong support of (i.e., we do not reject)
the null hypothesis.

[Figure: $t_{120}$ density with area of .9975 to the right of –2.86.]

(iii) This is not statistically significant evidence (8.57 ppb) to reject the null hypothesis
$\mu \le 10$ ppb. On the contrary, it is very strong evidence to “accept” it, very
clearly indicating that the water is indeed safe… confirming our inspection in (i).

Source 3: $\bar{x} = 9.1$ ppb

(i) As with Source 2, the low sample mean of 9.1 ppb seems to provide evidence that
clearly supports the null hypothesis $\mu \le 10$ ppb, and clearly refutes the alternative
hypothesis $\mu > 10$ ppb, i.e., the water is safe for drinking.

(ii) One-sided p-value is equal to 1 – (half of the two-sided p-value) for Source 3!
Again, here the p-value $= P(\bar{X} \ge 9.1)$ is the complement of the area to the left of 9.1,
or 1 – half the two-sided p-value in (a) for Source 3. That is, .95 < p < .975. (Via R,
the exact p-value is .9628.) And as with Source 2, this is much greater than the
significance level α = .05, in fact very close to 1, hence again we actually have very
strong support of (i.e., we do not reject) the null hypothesis.

[Figure: $t_{120}$ density with area of .9628 to the right of –1.8.]

(iii) This is not statistically significant evidence (9.1 ppb) to reject the null hypothesis
$\mu \le 10$ ppb. On the contrary, it is very strong evidence to “accept” it, very
clearly indicating that the water is indeed safe… confirming our inspection in (i).

Source 4: $\bar{x} = 10.9$ ppb

(i) As with Source 1, the high sample mean of 10.9 ppb seems to provide evidence that
clearly refutes the null hypothesis $\mu \le 10$ ppb, and thus clearly supports the
alternative hypothesis $\mu > 10$ ppb. However, it is not as high as Source 1 was
(11.43 ppb). Hence we may be able to conclude that the water is unsafe for
drinking, but only if statistical significance can be formally shown.

(ii) One-sided p-value is equal to half of the two-sided p-value for Source 4.
This p-value is $P(\bar{X} \ge 10.9)$, i.e., exactly half of the two-sided p-value computed in
(a) for Source 4. That is, .025 < p < .05. (Via R, the exact p-value is .0372.) This
indicates that we can reject the null hypothesis at the 5% level.

[Figure: $t_{120}$ density with right-tail area of .0372 beyond 1.8.]

(iii) In fact, this is moderately strong statistically significant evidence (10.9 ppb) to
reject the null hypothesis in favor of the alternative that $\mu > 10$ ppb, clearly
indicating that the water is indeed unsafe… confirming our suspicion in (i).

(d) Source 5: $\bar{x} = 10.6445$ ppb

(i) As with Source 4, the high sample mean of 10.6445 ppb seems to provide evidence
that refutes the null hypothesis $\mu \le 10$ ppb, and thus supports the alternative
hypothesis $\mu > 10$ ppb. However, it is not as high as Source 4 was (10.9 ppb).
Hence, once again, this may or may not turn out to be statistically significant.

(ii) One-sided p-value $= P(\bar{X} \ge 10.6445) = P\left(T_{120} \ge \frac{10.6445 - 10}{0.5}\right) = P(T_{120} \ge 1.289)$, so that
p = .10 > .05, hence the null hypothesis cannot be rejected at this level.

[Figure: $t_{120}$ density with right-tail area of .10 beyond 1.289.]

(iii) This result is not statistically significant, i.e., we are unable to conclude that
$\mu \le 10$ ppb can be rejected in favor of $\mu > 10$ ppb, at the 5% significance level.
The sample evidence suggests that it is probably unsafe, but the formal result is
inconclusive… confirming our informal inspection in (i).

(e) We have seen that, at the α = .05 level, Source 5 with $\bar{x} = 10.6445$ ppb was not significant,
but Source 4 with $\bar{x} = 10.9$ ppb was significant. Therefore, the cutoff level will be the $\bar{x}$
value between these two that corresponds to a right-tail area of α = .05 exactly. That is,
$\mu_0 + (t_{120,\,.05})\,\frac{s}{\sqrt{n}}$ = 10 ppb + (1.658)(0.5 ppb) = 10.829 ppb. Therefore, this null
hypothesis is rejected if $\bar{x} \ge 10.829$ ppb, consistent with the data from all five sources.
See the pictured null distribution.

[Figure: null distribution of $\bar{X}$ centered at 10, with right-tail area of .05 beyond 10.829.]

(f) Summary and Conclusions

Source 1, with $\bar{x} = 11.43$ ppb, is high enough above 10 ppb to be statistically significant for
either the two-sided test $H_0: \mu = 10$ ppb vs. $H_A: \mu \ne 10$ ppb, or the one-sided test
$H_0: \mu \le 10$ ppb (safe) vs. $H_A: \mu > 10$ ppb (unsafe), at the α = .05 level. (The corresponding
p-values are .005 and .0025, respectively.) Thus we have the following.

In either the one-sided or two-sided scenario, Source 1 can be concluded to be unsafe.

Likewise, Source 2, with $\bar{x} = 8.57$ ppb, is low enough below 10 ppb to be statistically
significant for the two-sided test (p = .005), while its one-sided p-value of .9975, computed in (c), very strongly supports the null hypothesis (safe). Thus…

In either the one-sided or two-sided scenario, Source 2 cannot be concluded to be unsafe.

However, Source 3 with $\bar{x} = 9.1$ ppb, and Source 4 with $\bar{x} = 10.9$ ppb, are both closer to
10 ppb, and both have a corresponding two-sided p-value between .05 and .10, hence…

Neither Source 3 nor Source 4 is statistically significant in the two-sided test.

However, as it has a one-sided p-value extremely close to 1, it follows that…

Source 3 very strongly supports the null hypothesis (safe) in the one-sided test.

Likewise, with a one-sided p-value between .025 and .05, it follows that…

Source 4 is statistically significant (unsafe) in the one-sided test.

Finally, Source 5 with $\bar{x} = 10.6445$ ppb is closer to 10 ppb than Source 4, and has a larger
one-sided p-value = .10, so even though its sample mean value is above 10 ppb…

Source 5 is not statistically significant (i.e., cannot be declared unsafe, and is in fact,
probably safe) in the one-sided test. Note that the two-sided p-value = .20 >> .05!

All of these conclusions are consistent with the corresponding rejection regions.

In general, one-sided tests are less conservative (i.e., they reject the null
hypothesis more often) than corresponding two-sided tests, and should be used sparingly. In
circumstances like this however, when one direction is impossible or of no practical
importance (e.g., we need only be concerned if arsenic levels are too high), a one-sided test
might be more appropriate. Here, we see that Source 4 is not significant at α = .05 using a
two-sided test, but is significant – on the unsafe side – with a one-sided test. Therefore, in
this toxicity context, a one-sided test errs more on the side of caution than a two-sided test.
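All five sources can be compared side by side by computing the t-statistics and both kinds of p-value at once. The following is a minimal base R sketch, not part of the original solution; the labels S1 to S5 and the variable names are illustrative, and the right-tailed p-value corresponds to the one-sided alternative that the mean exceeds 10 ppb.

```r
xbar <- c(S1 = 11.43, S2 = 8.57, S3 = 9.1, S4 = 10.9, S5 = 10.6445)
mu0 <- 10; s <- 5.5; n <- 121
se <- s / sqrt(n)                            # 0.5 ppb
t_stat <- (xbar - mu0) / se

# Two-sided p-value, and right-tailed p-value for the one-sided (unsafe) alternative
p_two   <- 2 * pt(-abs(t_stat), df = n - 1)
p_right <- pt(t_stat, df = n - 1, lower.tail = FALSE)

# Rejection cutoffs for the sample mean at alpha = .05
cutoff_two   <- mu0 + c(-1, 1) * qt(0.975, df = n - 1) * se
cutoff_right <- mu0 + qt(0.95, df = n - 1) * se

print(round(cbind(t_stat, p_two, p_right), 4))
print(round(c(cutoff_two, cutoff_right), 3))
```

Comparing each sample mean against `cutoff_two` and `cutoff_right` gives the same decisions as comparing the corresponding p-values against .05.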

8. We have power $= 1 - \beta = P(Z \le z)$, where $z = -1.96 + 0.5\sqrt{n}$ for n = 25, 16, 9, 4, 1,
respectively. Hence z = 0.54, 0.04, −0.46, −0.96, −1.46, from which it follows via the Z-table
[or via the R command pnorm(c(0.54, 0.04, -0.46, -0.96, -1.46))] that
power (customarily expressed as a percentage) = 70.5%, 51.6%, 32.3%, 16.9%, 7.2%,
respectively. Clearly, the power steadily decreases as sample size decreases.
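The dependence of power on sample size can be tabulated in one vectorized step. This short base R sketch uses the same expression as above, with the coefficient 0.5 and the critical value 1.96 taken directly from the solution.

```r
n <- c(25, 16, 9, 4, 1)

# z = -1.96 + 0.5 * sqrt(n), then power = P(Z <= z)
z <- -1.96 + 0.5 * sqrt(n)
power <- pnorm(z)

print(data.frame(n = n, z = z, power_pct = round(100 * power, 1)))
```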

9. The dotplot of the original data shows the strong presence of positive skew, i.e., many outliers
greater than the mean. However, the “log-transformed” dotplot restores a more symmetric
appearance, consistent with a normally distributed set of values.

10. In both situations, we are testing $H_0: \mu_1 = \mu_2$ or equivalently, $H_0: \mu_1 - \mu_2 = 0$, vs. $H_A: \mu_1 - \mu_2 \ne 0$.
Notice that the sample numerical data are the same in both situations! So we have the following
summary statistics for both (a) and (b): $\bar{x}_1 = 250$, $s_1^2 = 1300$, and $\bar{x}_2 = 200$, $s_2^2 = 400$.

(a) In the clinical trial example, we have two small, independent samples. This is therefore a good
candidate for the two-sample t-test, provided that we can establish equivariance, via the
informal condition that the ratio of sample variances lies between 0.25 and 4. We have
$\frac{s_1^2}{s_2^2} = \frac{1300}{400} = 3.25$, which satisfies this criterion. Next, for $\alpha$ = .05 and df = 3 + 3 − 2 = 4,

• “point estimate” of mean difference $\mu_1 - \mu_2$ is $\bar{x}_1 - \bar{x}_2$ = 250 − 200 = 50
• critical value $= t_{4,\,.025} = 2.776$
• standard error $= \sqrt{s_{\text{pooled}}^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = \sqrt{\frac{(3-1)(1300) + (3-1)(400)}{4}\left(\frac{1}{3} + \frac{1}{3}\right)} = \sqrt{850 \times 0.667} = 23.805$

Note how the pooled variance 850 is between 1300 and 400.

Multiplying the critical value by the standard error, the 95% margin of error = 2.776 × 23.805 = 66.08.

Putting these together, the 95% C.I. = (50 − 66.08, 50 + 66.08) = (−16.08, 116.08), which
does contain the hypothesized mean difference of 0. Also, the p-value $= 2\,P\left(T_4 \ge \frac{50 - 0}{23.805}\right) = 2\,P(T_4 \ge 2.1)$,
which is larger than 0.10. Thus, the null hypothesis cannot be rejected at the
$\alpha$ = .05 level. Based on the data from this sample, a statistically significant difference has not
been demonstrated between the treatment group and control group.

Note: Run 2 * pt(2.1, df = 4, lower.tail = F) for the exact p-value, and compare below.

R code
placebo <- c(220, 240, 290)
drug <- c(180, 200, 220)

# Pooled variance is used if var.equal = TRUE is specified.
# Otherwise, the Welch (Satterthwaite) test is implemented.
t.test(placebo, drug, var.equal = TRUE, paired = FALSE)

# Output

        Two Sample t-test

data:  placebo and drug
t = 2.1004, df = 4, p-value = 0.1036
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -16.09261 116.09261
sample estimates:
mean of x mean of y
      250       200

(b) In the second example, we have two small, paired samples. Therefore, a paired t-test is
appropriate, i.e., a t-test on the single sample of individual differences D = {40, 40, 70}. Here,
$\bar{d}$ = 50 as above, and variance $s_d^2$ = 300. So, for $\alpha$ = .05 and df = 3 − 1 = 2, we have…

• critical value $= t_{2,\,.025} = 4.303$
• standard error $= s_d/\sqrt{n} = \sqrt{300/3} = 10$

Multiplying these, the 95% margin of error = 43.03, and the 95% C.I. = (50 − 43.03, 50 + 43.03) =
(6.97, 93.03), which does not contain the hypothesized mean difference of 0. Also, the p-value
$= 2\,P\left(T_2 \ge \frac{50 - 0}{10}\right) = 2\,P(T_2 \ge 5)$, which is between .02 and .05. Thus, the null hypothesis
can be rejected at the $\alpha$ = .05 level. Based on the data from this sample, a statistically
significant difference has been demonstrated between the pre-treatment group and the
post-treatment group.

Note: Run 2 * pt(5, df = 2, lower.tail = F) for the exact p-value, and compare below.

R code
pre.tx <- c(220, 240, 290)
post.tx <- c(180, 200, 220)

t.test(pre.tx, post.tx, paired = TRUE)

# Output

        Paired t-test

data:  pre.tx and post.tx
t = 5, df = 2, p-value = 0.03775
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
  6.973473 93.026527
sample estimates:
mean of the differences
                     50

(c) Clearly, even though the sample data are the same in both cases, the study designs are
different, which affects the statistical analysis. A computer will only crunch the numbers,
according to the type of test that a knowledgeable experimenter specifies. As we can see from
these examples, the results can be radically different!
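To make the contrast concrete, both test statistics can be computed by hand from the same six numbers, without calling t.test. The following is a sketch in base R with illustrative variable names; `x` and `y` stand for the first and second group (or the pre- and post-treatment measurements).

```r
x <- c(220, 240, 290)   # placebo / pre-treatment
y <- c(180, 200, 220)   # drug / post-treatment

# (a) Independent samples: pooled two-sample t-statistic on n1 + n2 - 2 df
n1 <- length(x); n2 <- length(y)
s2_pooled <- ((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2)
se_pooled <- sqrt(s2_pooled * (1 / n1 + 1 / n2))
t_pooled <- (mean(x) - mean(y)) / se_pooled
p_pooled <- 2 * pt(abs(t_pooled), df = n1 + n2 - 2, lower.tail = FALSE)

# (b) Paired samples: one-sample t-statistic on the differences, n - 1 df
d <- x - y
se_paired <- sd(d) / sqrt(length(d))
t_paired <- mean(d) / se_paired
p_paired <- 2 * pt(abs(t_paired), df = length(d) - 1, lower.tail = FALSE)

print(round(c(t_pooled = t_pooled, p_pooled = p_pooled,
              t_paired = t_paired, p_paired = p_paired), 4))
```

The point estimate (50) is the same in both designs; only the standard error and the degrees of freedom change.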

11. Iron levels in CF children

(a) $H_0: \mu_1 = \mu_2$ versus $H_A: \mu_1 \ne \mu_2$, or equivalently, $H_0: \mu_1 - \mu_2 = 0$ versus $H_A: \mu_1 - \mu_2 \ne 0$.

(b) X is approximately normally distributed with unknown standard deviation in each population,
and the independent samples are clearly small. This is therefore a good candidate for the
two-sample t-test, provided that we can establish equivariance, via the informal condition that
the ratio of sample variances lies between 0.25 and 4. We have $\frac{s_1^2}{s_2^2} = \frac{(5.9)^2}{(6.3)^2} = \frac{34.81}{39.69} = 0.877$,
which does satisfy this criterion. Next, for $\alpha$ = .05 and df = 9 + 13 − 2 = 20, we calculate...

• “point estimate” of mean difference = 18.9 − 11.9 = 7.0 μmol/l
• critical value $= t_{20,\,.025} = 2.086$
• standard error $= \sqrt{s_{\text{pooled}}^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = \sqrt{\frac{(9-1)(34.81) + (13-1)(39.69)}{20}\left(\frac{1}{9} + \frac{1}{13}\right)} = \sqrt{37.738 \times 0.188} = 2.663$ μmol/l

Multiplying the critical value by the standard error, the margin of error = 2.086 × 2.663 μmol/l = 5.56 μmol/l.

Putting these together, the 95% C.I. = (7.0 − 5.56, 7.0 + 5.56) = (1.44, 12.56) μmol/l.

(c) p-value $= 2\,P(\bar{X}_1 - \bar{X}_2 \ge 7.0) = 2\,P\left(T_{20} \ge \frac{7.0 - 0}{2.663}\right) = 2\,P(T_{20} \ge 2.629)$. From the table, we
see that 2.629 falls between the $t_{20}$-scores of 2.528 and 2.845, which means that the amount of
area in each corresponding tail must be between .01 and .005. Hence, doubling, .01 < p < .02.
[The exact p-value is .016, via 2 * pt(2.629, df = 20, lower.tail = F).]

(d) The 95% confidence interval does not contain the hypothesized mean difference of 0 μmol/l,
and the p-value < .05 as well, so the null hypothesis is moderately rejected at the $\alpha$ = .05
level. Interpretation: Based on the data from this sample, there is a statistically significant
difference demonstrated between the mean serum iron level of the population of healthy
children, and the mean serum iron level of the population of children with cystic fibrosis.
Furthermore, the data suggest that the latter population suffers from an iron deficiency.
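When only summary statistics are available, as here, the pooled two-sample t-test can be wrapped in a small helper. The function below, `pooled_t_test`, is a hypothetical name introduced for illustration (it is not a base R function); it assumes equal population variances, as the solution does.

```r
# Pooled-variance two-sample t-test from summary statistics (illustrative helper)
pooled_t_test <- function(m1, s1, n1, m2, s2, n2, conf = 0.95) {
  df <- n1 + n2 - 2
  s2_pooled <- ((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / df
  se <- sqrt(s2_pooled * (1 / n1 + 1 / n2))
  t_stat <- (m1 - m2) / se
  moe <- qt(1 - (1 - conf) / 2, df) * se
  list(se = se, t = t_stat, df = df,
       ci = (m1 - m2) + c(-1, 1) * moe,
       p_value = 2 * pt(abs(t_stat), df, lower.tail = FALSE))
}

# Healthy children (n = 9) versus children with cystic fibrosis (n = 13)
res <- pooled_t_test(18.9, 5.9, 9, 11.9, 6.3, 13)
print(res)
```

Since no intermediate values are rounded, the t-statistic may differ from the hand-computed 2.629 in the third decimal place.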

12. Crossover study: Methylphenidate versus Placebo


(a) 95% margin of error for placebo $= (t_{19,\,.025})(4.8/\sqrt{20})$ = (2.093)(1.073) = 2.246
95% confidence interval for placebo = (14.0 − 2.246, 14.0 + 2.246) = (11.754, 16.246).
(b) 95% margin of error for methyl $= (t_{19,\,.025})(2.9/\sqrt{20})$ = (2.093)(0.648) = 1.357
95% confidence interval for methyl = (10.8 − 1.357, 10.8 + 1.357) = (9.443, 12.157).
(c) Comparing these two confidence intervals side-by-side, even though there is some overlap,
there seems to be a suggestion that children with ADD on methylphenidate generally have
lower rating scores, and therefore improved attention, than when on placebo. However, this is
an informal evaluation, and not the formal conclusion of a rigorous statistical analysis. The
appeal of crossover studies such as this, is that each patient essentially acts as his or her own
control. Note that each of the n = 20 children in the sample has two scores at different times:
one when on placebo, and one when on methylphenidate. The appropriate statistical analysis
involves methods which are outside the scope of this course.

13.
(a) Given $X_1 \sim N(\mu_1, 50)$ and $X_2 \sim N(\mu_2, 50)$, with respective means $\bar{x}_1 = 215$ and $\bar{x}_2 = 200$
from random samples, each of size n = 100. Thus, the 95% margin of error $= (1.96)\frac{50}{\sqrt{100}} = 9.8$, so that...

• 95% confidence interval for $\mu_1$ is (215 − 9.8, 215 + 9.8) = (205.2, 224.8)
• 95% confidence interval for $\mu_2$ is (200 − 9.8, 200 + 9.8) = (190.2, 209.8),

and clearly, there is overlap between them. Despite this however, $\bar{x}_1 - \bar{x}_2 = 15$, and the standard
error for the true mean difference $\mu_1 - \mu_2$ is $\sqrt{\frac{(50)^2}{100} + \frac{(50)^2}{100}} = \sqrt{50} = 5\sqrt{2} = 7.07$. The 95% margin of error
is therefore (1.96)(7.07) = 13.86, so that the 95% confidence interval for $\mu_1 - \mu_2$ is (15 − 13.86, 15 + 13.86) = (1.14, 28.86). This
does not contain 0, and so the null hypothesis can be rejected at the $\alpha$ = .05 level.
Moral: At a fixed significance level, the fact that two confidence intervals overlap may
suggest that the null hypothesis cannot be rejected, but it is not conclusive!
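The moral can be verified numerically. The sketch below, in base R with illustrative variable names, checks whether the two individual intervals overlap and, separately, whether the interval for the difference excludes 0; the margin of error for the difference is taken as $z_{.025}$ times the standard error of the difference.

```r
sigma <- 50; n <- 100; xbar1 <- 215; xbar2 <- 200
z <- qnorm(0.975)

# Individual 95% confidence intervals
moe_single <- z * sigma / sqrt(n)
ci1 <- xbar1 + c(-1, 1) * moe_single
ci2 <- xbar2 + c(-1, 1) * moe_single
overlap <- ci1[1] < ci2[2]                   # left end of CI 1 below right end of CI 2

# 95% confidence interval for the difference of means
moe_diff <- z * sqrt(sigma^2 / n + sigma^2 / n)
ci_diff <- (xbar1 - xbar2) + c(-1, 1) * moe_diff
reject <- ci_diff[1] > 0                     # 0 lies outside the interval

# Scaled distance between the sample means
d <- abs(xbar1 - xbar2) / moe_single

print(c(overlap = overlap, reject = reject))
print(round(c(d = d, lower = ci_diff[1], upper = ci_diff[2]), 2))
```

Here `d` falls between $\sqrt{2}$ and 2, which is exactly the range in which the individual intervals overlap and yet the null hypothesis is rejected.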

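(b) Given $X_1 \sim N(\mu_1, \sigma)$ and $X_2 \sim N(\mu_2, \sigma)$, with respective means $\bar{x}_1$ and $\bar{x}_2$ from random
samples, each of size n. Without loss of generality, we may assume that $\bar{x}_1 \ge \bar{x}_2$; the resulting
$100(1-\alpha)\%$ confidence limits for $CI_{\mu_1}$ and $CI_{\mu_2}$ are then given by $\bar{x}_1 \pm z_{\alpha/2}(\sigma/\sqrt{n})$ and
$\bar{x}_2 \pm z_{\alpha/2}(\sigma/\sqrt{n})$, respectively. Hence, overlap occurs if and only if the left endpoint of
$CI_{\mu_1}$ is less than the right endpoint of $CI_{\mu_2}$, i.e., $\bar{x}_1 - z_{\alpha/2}(\sigma/\sqrt{n}) < \bar{x}_2 + z_{\alpha/2}(\sigma/\sqrt{n})$, or

$$d = \frac{|\bar{x}_1 - \bar{x}_2|}{z_{\alpha/2}\,\sigma/\sqrt{n}} < 2.$$

On the other hand, as the standard error for $\mu_1 - \mu_2$ is given by

$$\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = \sqrt{\frac{\sigma^2}{n} + \frac{\sigma^2}{n}} = \sigma\sqrt{\frac{2}{n}},$$

it follows that the $100(1-\alpha)\%$ confidence limits for $CI_{\mu_1 - \mu_2}$ are $(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\,\sigma\sqrt{2/n}$. Hence, 0 is not contained within this interval (i.e.,
the null hypothesis can be rejected) if and only if the left endpoint
$(\bar{x}_1 - \bar{x}_2) - z_{\alpha/2}\,\sigma\sqrt{2/n} > 0$, or

$$d = \frac{|\bar{x}_1 - \bar{x}_2|}{z_{\alpha/2}\,\sigma/\sqrt{n}} > \sqrt{2}.$$

Combining these two inequalities yields the result(s).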

14. Z-tests and Chi-squared Tests


(a) Test of Independence

• $\hat{\pi}_{A|B} - \hat{\pi}_{A|B^C} = \frac{335}{500} - \frac{915}{1500} = 0.67 - 0.61 = 0.06$. Under the null hypothesis of equal
proportions, $H_0: \pi_{A|B} - \pi_{A|B^C} = 0$, their common value can be estimated by

$$\hat{\pi}_{pooled} = \frac{335 + 915}{500 + 1500} = \frac{1250}{2000} = 0.625.$$

Hence the standard error is estimated by

$$s.e._0 = \sqrt{\hat{\pi}_{pooled}\,(1 - \hat{\pi}_{pooled})}\,\sqrt{\frac{1}{n_B} + \frac{1}{n_{B^C}}} = \sqrt{(0.625)(0.375)}\,\sqrt{\frac{1}{500} + \frac{1}{1500}} = 0.025,$$

and so the corresponding Z-score $= \frac{0.06 - 0}{0.025} = 2.4$. Because this is greater than the critical value
$z_{.025} = 1.96$, the null hypothesis can be rejected at the $\alpha = .05$ significance level,
indicating a statistically significant association between the A and B responses. (In
particular, liking Brand A is associated with a statistically greater probability of liking
Brand B, than it is with not liking Brand B.)

• $\hat{\pi}_{B|A} - \hat{\pi}_{B|A^C} = \frac{335}{1250} - \frac{165}{750} = 0.268 - 0.22 = 0.048$. Under the null hypothesis of equal
proportions, $H_0: \pi_{B|A} - \pi_{B|A^C} = 0$, their common value can be estimated by

$$\hat{\pi}_{pooled} = \frac{335 + 165}{1250 + 750} = \frac{500}{2000} = 0.25.$$

Hence the standard error is estimated by

$$s.e._0 = \sqrt{\hat{\pi}_{pooled}\,(1 - \hat{\pi}_{pooled})}\,\sqrt{\frac{1}{n_A} + \frac{1}{n_{A^C}}} = \sqrt{(0.25)(0.75)}\,\sqrt{\frac{1}{1250} + \frac{1}{750}} = 0.02,$$

and so the corresponding Z-score $= \frac{0.048 - 0}{0.02} = 2.4$ also. (Not a coincidence!) Again because this is
greater than the critical value $z_{.025} = 1.96$, the null hypothesis can be rejected at the
$\alpha = .05$ significance level, indicating a statistically significant association between the
A and B responses. (In particular, liking Brand B is associated with a statistically greater
probability of liking Brand A, than it is with not liking Brand A.)

• The Chi-squared statistic

$$\chi^2 = \frac{(+22.5)^2}{312.5} + \frac{(-22.5)^2}{937.5} + \frac{(-22.5)^2}{187.5} + \frac{(+22.5)^2}{562.5} = 5.76 \text{ on 1 df.}$$

Because this is greater than the critical value $\chi^2_{1,\,.05} = 3.84$, the null hypothesis can again
be rejected at the $\alpha = .05$ level, indicating an association between liking the two brands
A and B. Note that (not surprisingly), both Z-test statistics are identical (2.4), and the
Chi-squared statistic (5.76) is equal to their square.
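The agreement between the Z-test and the Chi-squared test can be confirmed with a short R sketch. The 2-by-2 counts below are reconstructed from the fractions used above; the fourth cell, 585 = 750 − 165, is derived from the totals rather than quoted in the solution.

```r
# Rows = likes Brand A (Yes/No), columns = likes Brand B (Yes/No);
# the cell 585 is derived from the totals (750 - 165)
brands <- matrix(c(335, 165, 915, 585), nrow = 2,
                 dimnames = list(A = c("Yes", "No"), B = c("Yes", "No")))

# Z-test for pi_{A|B} - pi_{A|B^C}, using the pooled standard error
p1 <- 335 / 500
p2 <- 915 / 1500
p.pooled <- (335 + 915) / (500 + 1500)
z <- (p1 - p2) / sqrt(p.pooled * (1 - p.pooled) * (1 / 500 + 1 / 1500))

# Pearson Chi-squared test without continuity correction
chi2 <- unname(chisq.test(brands, correct = FALSE)$statistic)

c(z = z, z.squared = z^2, chi.squared = chi2)
```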

(b) Test of Homogeneity


In this scenario, only the design structure and null hypothesis are different; the values of the
test statistics are identical to those in (a). Hence, the formal interpretation is that the two
cities are not homogeneous with respect to liking Brand A. In particular, there is a
significant difference between City 1 (at 67%) and City 2 (at 61%), at the $\alpha = .05$ level.

15. Is there an association between MI and diabetes in the U.S. Navajo population?
(a)
                            MI
                   Yes            No             Totals
Diabetes    Yes    46 (35.5)      25 (35.5)      71
            No     98 (108.5)     119 (108.5)    217
Totals             144            144            288

With the expected values indicated in parentheses, the Chi-squared test statistic is given by

$$\chi^2 = \frac{(46 - 35.5)^2}{35.5} + \frac{(25 - 35.5)^2}{35.5} + \frac{(98 - 108.5)^2}{108.5} + \frac{(119 - 108.5)^2}{108.5} = 8.244$$

on 1 degree of freedom. This value is between 6.635 and 10.828 which, from the textbook’s
Chi-squared table, cut respective areas of 0.01 and 0.001 from the upper tail of that distribution
having df = 1. Hence, .001 < p-value < .01 = $\alpha$. Therefore, we can reject the null hypothesis at
the 1% significance level. Interpretation: There is a statistically significant difference in the
proportion of diabetes between the two groups, indicating an association between diabetes and
the occurrence of myocardial infarction among Navajos living in the United States. In particular,
victims of acute MI are more likely to suffer from diabetes than individuals free of heart disease.
(b) For paired data, we calculate the appropriate test statistic for McNemar’s Test on the modified
contingency table, resulting in $\chi^2 = \frac{(16 - 37)^2}{16 + 37} = 8.321$, on 1 degree of freedom. As above,
.001 < p-value < .01 = $\alpha$, and we can reject the null hypothesis of no association at the 1%
significance level. Interpretation: As before, there is a statistically significant difference in the
proportion of diabetes between the two groups, indicating an association between diabetes and
the occurrence of myocardial infarction among Navajos living in the United States. In particular,
victims of acute MI are more likely to suffer from diabetes than the individuals free of heart
disease, who have been matched on age and gender.
(c) If both members of a case-control pair were – or likewise, were not – exposed to a candidate risk
factor, then this reveals nothing about a potential association between it, and the disease in
question. That is, the concordant pairs – or the pairs of responses in which either two diabetics
or two non-diabetics are matched – provide no information for testing how differences in
diabetic status may be associated with myocardial infarction. Therefore, these data are naturally
discarded by the test statistic, which focuses only on the discordant pairs, or the pairs of
responses in which an individual who has diabetes is paired with one who does not.
(d) R code

navajo <- matrix(c(46, 98, 25, 119), ncol = 2, nrow = 2,
  dimnames = list("Diabetes" = c("Yes", "No"), "MI" = c("Yes", "No")))
navajo
chisq.test(navajo, correct = FALSE)
navajo.paired <- matrix(c(9, 37, 16, 82), ncol = 2, nrow = 2,
  dimnames = list("Diabetes" = c("Yes", "No"), "MI" = c("Yes", "No")))
navajo.paired
mcnemar.test(navajo.paired, correct = FALSE)

16. Fetal monitoring

(a) $OR = \frac{(358)(2745)}{(2492)(229)} = 1.722$
Interpretation: The odds of being delivered by caesarian section are 1.722 times higher for
fetuses that are electronically monitored during labor than for fetuses that are not monitored.
Therefore, there does appear to be a moderate association between the use of electronic fetal
monitoring and the eventual method of delivery.

(b) The odds ratio OR can take on any positive value. However, because its distribution is
skewed to the right, we typically work in the logarithm scale ln(OR), which is more
symmetric and approximately normally distributed:

95% confidence limits for ln(OR): $\ln(OR) \pm z_{.025}\sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}}$

= $\ln(1.722) \pm 1.96\sqrt{\frac{1}{358} + \frac{1}{2492} + \frac{1}{229} + \frac{1}{2745}}$

= 0.543 ± 1.96 (0.089), so that…

95% confidence interval for ln(OR): (0.369, 0.717). Hence,

95% confidence interval for OR: ($e^{0.369}$, $e^{0.717}$) = (1.446, 2.048).
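The same interval can be obtained in R. This is a minimal sketch that assumes the labels a, b, c, d refer to the four counts in the order they appear in the odds ratio of part (a); carrying full precision through the calculation may change the third decimal place of the upper limit relative to the hand computation.

```r
# Cell counts in the order used in part (a): OR = (a * d) / (b * c)
cells <- c(a = 358, b = 2492, c = 229, d = 2745)

or     <- unname(cells["a"] * cells["d"] / (cells["b"] * cells["c"]))
se.log <- sqrt(sum(1 / cells))                        # s.e. of ln(OR)
ci.log <- log(or) + c(-1, 1) * qnorm(0.975) * se.log  # 95% CI for ln(OR)
ci.or  <- exp(ci.log)                                 # 95% CI for OR

round(c(OR = or, se.log = se.log, lower = ci.or[1], upper = ci.or[2]), 3)
```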

(c) The 95% confidence interval for the odds ratio does not contain the value 1; therefore, the null
hypothesis H0: OR = 1 can be rejected in favor of the alternative HA: OR ≠ 1, at the $\alpha = .05$
significance level. Interpretation: We can be 95% confident that the odds of delivery by
caesarian section are between 1.446 and 2.048 times higher for fetuses that are monitored
during labor than for those that are not monitored. There does seem to be a statistically
significant association between caesarian section delivery and electronic fetal monitoring.

(d) Association does not necessarily imply causation. It is possible that the fetuses who are at
higher risk for caesarian delivery are also the ones that are monitored.

17. Summary Odds Ratio


(a) $OR_1 = \frac{(31)(379)}{(93)(80)} = 1.579$          $OR_2 = \frac{(39)(465)}{(74)(149)} = 1.645$
Interpretation: In the first study, the odds of epithelial ovarian cancer in women who have
had no term pregnancies are 1.579 times higher than the odds of epithelial ovarian cancer in
women who have had one or more term pregnancies. In the second study, the odds are 1.645
times higher. With values so close, it is possible that both studies are actually estimating the
same quantity.

(b) We conduct a Test of Homogeneity on the null hypothesis H0: OR1 = OR2, vs. the alternative
HA: OR1 ≠ OR2, by performing the following steps.
Step 1: $l_1 = \ln(1.579) = 0.457$          $l_2 = \ln(1.645) = 0.498$
Step 2: $w_1 = \frac{1}{1/31 + 1/93 + 1/80 + 1/379} = 17.197$          $w_2 = \frac{1}{1/39 + 1/74 + 1/149 + 1/465} = 20.826$
Step 3: $L = \frac{(17.197)(0.457) + (20.826)(0.498)}{17.197 + 20.826} = 0.479$
Step 4: $\chi^2 = 17.197\,(0.457 - 0.479)^2 + 20.826\,(0.498 - 0.479)^2 = 0.016$
Step 5: For a Chi-squared distribution with 1 degree of freedom, this test statistic
corresponds to a p-value >> .05 = $\alpha$. Hence we cannot reject the null
hypothesis that OR1 = OR2, and so we may combine the information.

(c) Mantel-Haenszel estimate: $OR_{summary} = \frac{(31)(379)/583 + (39)(465)/727}{(93)(80)/583 + (74)(149)/727} = 1.615$

(d) Step 1: $\frac{(124)(111)}{583} + \frac{(113)(188)}{727} = 52.83$,          $\frac{(124)(472)}{583} + \frac{(113)(539)}{727} = 184.17$,
$\frac{(459)(111)}{583} + \frac{(614)(188)}{727} = 246.17$,          $\frac{(459)(472)}{583} + \frac{(614)(539)}{727} = 826.83$
Because each of these expected values is > 5, we may proceed.

Step 2: From above, L = 0.479. Now, s.e.(L) = $\frac{1}{\sqrt{17.197 + 20.826}} = 0.162$.

Step 3: Therefore, 95% confidence limits for ln(OR) are 0.479 ± 1.96 (0.162), so that the 95%
confidence interval for ln(OR) is (0.161, 0.797). Hence, the 95% confidence interval
for OR is ($e^{0.161}$, $e^{0.797}$) = (1.17, 2.22).

(e) From § 6.2.3: The variance of Study 1 is $V_1 = \frac{(124)(459)(111)(472)}{(583)^2\,(583 - 1)} = 15.074$, and the variance
of Study 2 is $V_2 = \frac{(113)(614)(188)(539)}{(727)^2\,(727 - 1)} = 18.323$. Therefore, summing over both studies yields

Observed D+ totals          Expected D+ totals                       Total Variance
O1 = 31 + 39 = 70           E1 = 52.83 (computed in (d) above)       V = 15.074 + 18.323 = 33.397
O2 = 80 + 149 = 229         E2 = 246.17 (computed in (d) above)

and the formal $\chi^2_1$ test statistic given by $\chi^2 = \frac{(70 - 52.83)^2}{33.397} = 8.827$.

(f) The 95% confidence interval for the summary odds ratio does not contain the value 1, and the
Chi-squared statistic indicates that the p-value < .05. Therefore, the null hypothesis H0: OR = 1
can be rejected in favor of the alternative HA: OR ≠ 1, at the $\alpha = .05$ significance level.
Interpretation: We can be 95% confident that the odds of epithelial ovarian cancer are between
1.17 and 2.22 times higher in women who have had no term pregnancies than in women who
have had one or more term pregnancies. There does seem to be a statistically significant
association between epithelial ovarian cancer and number of term pregnancies, at the 5% level.

(g) R output (using given code):

> # 2-by-2 contingency tables for 2 strata
> ovarian
, , Stratum = Study 1
Disease
Pregnancies Cancer No Cancer
None 31 93
One or More 80 379

, , Stratum = Study 2
Disease
Pregnancies Cancer No Cancer
None 39 74
One or More 149 465
> test.stat
[1] 0.01559548      # Note: This value corresponds to Step 4 of part (b).
>
> p.value
[1] 0.900617
> # Note: If p.value > .05, then strata can be combined; proceed.
>
> # Test of Association: ORsummary = 1
> mantelhaen.test(ovarian, correct = FALSE)

Mantel-Haenszel chi-squared test without continuity correction


data: ovarian
Mantel-Haenszel X-squared = 8.827, df = 1, p-value = 0.002968
alternative hypothesis: true common odds ratio is not equal to 1
95 percent confidence interval: 1.175135 2.218911
sample estimates: common odds ratio 1.614781
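The code that produced this output is not reproduced in these solutions. The following R sketch is one way to obtain output of this form; the object names `ovarian`, `test.stat`, and `p.value` are taken from the output above, while the construction itself is an assumption that follows Steps 1 to 5 of part (b).

```r
# 2-by-2 contingency tables for 2 strata (each table filled column by column)
ovarian <- array(c(31, 80, 93, 379,      # Study 1
                   39, 149, 74, 465),    # Study 2
                 dim = c(2, 2, 2),
                 dimnames = list(Pregnancies = c("None", "One or More"),
                                 Disease = c("Cancer", "No Cancer"),
                                 Stratum = c("Study 1", "Study 2")))

# Test of Homogeneity: OR1 = OR2, following Steps 1-5 of part (b)
log.or <- apply(ovarian, 3, function(tab) log(tab[1, 1] * tab[2, 2] / (tab[1, 2] * tab[2, 1])))
w      <- apply(ovarian, 3, function(tab) 1 / sum(1 / tab))   # weights
L      <- sum(w * log.or) / sum(w)                            # weighted mean log odds ratio
test.stat <- sum(w * (log.or - L)^2)
p.value   <- pchisq(test.stat, df = 1, lower.tail = FALSE)

# Test of Association: ORsummary = 1
mh <- mantelhaen.test(ovarian, correct = FALSE)
mh
```

Because no intermediate rounding takes place, `test.stat` comes out as 0.0156 rather than the hand-computed 0.016.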

18. This exercise brings out some of the more subtle aspects of the Chi-squared Test.

(a) In this case, the null hypothesis $H_0: \pi_1 = \pi_2 = \pi_3$ can be more precisely written in terms of
conditional probability as $H_0: \pi_{\text{Men | Left}} = \pi_{\text{Men | Mid}} = \pi_{\text{Men | Right}}$, i.e., “The proportion of men in
each political category is the same.” The sample data give us the estimates $\hat{\pi}_1$ = 12/60 = 0.2,
$\hat{\pi}_2$ = 18/60 = 0.3, and $\hat{\pi}_3$ = 30/60 = 0.5. Under the null hypothesis assumption of the uniform
distribution, the expected values should be equally divided among the 60, as shown below.

          Left    Middle    Right
Men       20      20        20        60

To test the null hypothesis, we use the Chi-squared Goodness-of-Fit test. Therefore the test
statistic is $\chi^2 = \frac{(-8)^2}{20} + \frac{(-2)^2}{20} + \frac{(+10)^2}{20} = 8.4$ on 2 df. From the table, this $\chi^2$-score is
between 7.378 and 9.210; hence .010 < p-value < .025. (Using R, the exact p-value is
pchisq(8.4, 2, lower.tail = F) = .015.) As this is less than $\alpha = .05$, we may
reject the null hypothesis at this significance level. Thus, according to the sample, there is a
statistically significant difference between the proportions of men in the 3 political categories.

(b) Here, $H_0: \pi_1 = \pi_2 = \pi_3$ can be written as $H_0: \pi_{\text{Women | Left}} = \pi_{\text{Women | Mid}} = \pi_{\text{Women | Right}}$, i.e.,
“The proportion of women in each political category is the same.” Note that the sample
counts are nine times the men’s, so the proportions are the same as theirs: $\hat{\pi}_1$ = 108/540 = 0.2,
$\hat{\pi}_2$ = 162/540 = 0.3, and $\hat{\pi}_3$ = 270/540 = 0.5. Under the null hypothesis assumption of
equality, the expected values out of 540 are as shown below.

          Left    Middle    Right
Women     180     180       180       540

The Chi-squared Goodness-of-Fit test statistic yields $\chi^2 = \frac{(-72)^2}{180} + \frac{(-18)^2}{180} + \frac{(+90)^2}{180}$ =
75.6 on 2 df (exactly nine times the test statistic for the men). Therefore, the p-value is
extremely small, much less than $\alpha = .05$, and so again we may reject the null hypothesis at
this significance level. Thus, according to the sample, there is a very strong, statistically
significant difference between the proportions of women in the 3 political categories.

(c) When combined, the informal null hypothesis $H_0: \pi_1 = \pi_2 = \pi_3$ is more formally expressed as
$H_0: \pi_{\text{Men | Left}} = \pi_{\text{Men | Mid}} = \pi_{\text{Men | Right}}$ AND $\pi_{\text{Women | Left}} = \pi_{\text{Women | Mid}} = \pi_{\text{Women | Right}}$. That is,
“The proportion of men in each political category is the same AND the proportion of women
in each political category is the same.” When computed, the expected values are equal to the
observed values, so the p-value = 1 exactly, indicating the strongest possible support for the
null hypothesis! But why? Because the proportions are taken from different (i.e., mixed
gender) populations. Hence the null hypothesis $H_0: \pi_1 = \pi_2 = \pi_3$ means something different!

          Left    Middle    Right
Men       12      18        30        60
Women     108     162       270       540
          120     180       300       600
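Parts (a) through (c) can be reproduced in R from the observed counts. This is a minimal sketch; the object names are illustrative only.

```r
men   <- c(Left = 12, Middle = 18, Right = 30)
women <- c(Left = 108, Middle = 162, Right = 270)

# (a), (b): Goodness-of-Fit tests against equal proportions
gof.men   <- chisq.test(men)
gof.women <- chisq.test(women)

# (c): Chi-squared test on the combined 2-by-3 table
both  <- rbind(Men = men, Women = women)
indep <- chisq.test(both)

c(men = unname(gof.men$statistic),
  women = unname(gof.women$statistic),
  combined = unname(indep$statistic))
indep$expected    # compare with the observed table `both`
indep$p.value
```

The expected counts of the combined table coincide with the observed counts, which is why its test statistic is 0 and its p-value is 1.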

(d) Here, the sample estimates yield the following values:

$\hat{\pi}_{\text{Men | Left}}$ = 12/120 = 0.1, $\hat{\pi}_{\text{Men | Mid}}$ = 18/180 = 0.1, $\hat{\pi}_{\text{Men | Right}}$ = 30/300 = 0.1,

so the first part of the null hypothesis $H_0: \pi_{\text{Men | Left}} = \pi_{\text{Men | Mid}} = \pi_{\text{Men | Right}}$ is confirmed, AND

$\hat{\pi}_{\text{Women | Left}}$ = 108/120 = 0.9, $\hat{\pi}_{\text{Women | Mid}}$ = 162/180 = 0.9, $\hat{\pi}_{\text{Women | Right}}$ = 270/300 = 0.9,

so the second part of the null hypothesis $H_0: \pi_{\text{Women | Left}} = \pi_{\text{Women | Mid}} = \pi_{\text{Women | Right}}$ is confirmed.

(e) Moreover, since rows and columns of a contingency table should be interchangeable, it is also
possible to express this via a mathematically equivalent null hypothesis, namely

$H_0: \pi_{\text{Left | Men}} = \pi_{\text{Left | Women}}$ AND $\pi_{\text{Mid | Men}} = \pi_{\text{Mid | Women}}$ AND $\pi_{\text{Right | Men}} = \pi_{\text{Right | Women}}$.

In this case, the sample estimates are

$\hat{\pi}_{\text{Left | Men}}$ = 12/60 = 0.2          $\hat{\pi}_{\text{Mid | Men}}$ = 18/60 = 0.3          $\hat{\pi}_{\text{Right | Men}}$ = 30/60 = 0.5
$\hat{\pi}_{\text{Left | Women}}$ = 108/540 = 0.2      $\hat{\pi}_{\text{Mid | Women}}$ = 162/540 = 0.3      $\hat{\pi}_{\text{Right | Women}}$ = 270/540 = 0.5.

Comparing for equal gender proportions within the same political category (i.e., $\pi_{\text{Men | Left}}$ vs.
$\pi_{\text{Women | Left}}$, $\pi_{\text{Men | Mid}}$ vs. $\pi_{\text{Women | Mid}}$, and $\pi_{\text{Men | Right}}$ vs. $\pi_{\text{Women | Right}}$) would involve three
separate null hypotheses, each of which is analyzed by the Goodness-of-Fit test (but see (d)):

          Men   Women                     Men   Women                    Men   Women
Left      12    108     120     Middle    18    162     180     Right    30    270     300

19. Chi-squared Goodness of Fit Test

(a) $H_0: \pi_{\text{Vanilla}} = \pi_{\text{Chocolate}} = \pi_{\text{Strawberry}}$

Vanilla      Chocolate    Strawberry
416          419          365
(400)        (400)        (400)

With the uniform expected values indicated in parentheses, we have the test statistic

$$\chi^2 = \frac{(+16)^2}{400} + \frac{(+19)^2}{400} + \frac{(-35)^2}{400} = 4.605, \text{ on 2 degrees of freedom.}$$

From the table, we see that the p-value = $P(\chi^2_2 \ge 4.605)$ = .10 > .05 = $\alpha$. Therefore, on the
basis of this finding, we conclude that we cannot reject the null hypothesis in favor of the
alternative, at the 5% significance level. Interpretation: A statistically significant difference
has not been demonstrated among the three preferences in the general population, at the 5%
significance level. Although there is a suggestion that consumers do prefer vanilla and
chocolate equally over strawberry, the difference is not statistically significant at the 5% level.

(b) Test of Independence between gender and flavor preference:

$H_0: \pi_{\text{Males|Vanilla}} = \pi_{\text{Males|Chocolate}} = \pi_{\text{Males|Strawberry}}$ and
$\pi_{\text{Females|Vanilla}} = \pi_{\text{Females|Chocolate}} = \pi_{\text{Females|Strawberry}}$

or Test of Homogeneity of equal flavor preferences between Male and Female populations:

$H_0: \pi_{\text{Vanilla|Males}} = \pi_{\text{Vanilla|Females}}$ and
$\pi_{\text{Chocolate|Males}} = \pi_{\text{Chocolate|Females}}$ and
$\pi_{\text{Strawberry|Males}} = \pi_{\text{Strawberry|Females}}$

       Vanilla        Chocolate      Strawberry
M      200 (208.0)    190 (209.5)    210 (182.5)
F      216 (208.0)    229 (209.5)    155 (182.5)

With the expected values indicated in parentheses, we have the test statistic

$$\chi^2 = \frac{(-8.0)^2}{208.0} + \frac{(-19.5)^2}{209.5} + \frac{(+27.5)^2}{182.5} + \frac{(+8.0)^2}{208.0} + \frac{(+19.5)^2}{209.5} + \frac{(-27.5)^2}{182.5} = 12.533, \text{ on 2 degrees of freedom.}$$

From the table, we see that the p-value = $P(\chi^2_2 \ge 12.533)$ << .05 = $\alpha$. Therefore, on the basis
of this finding, we conclude that we can reject the null hypothesis in favor of the alternative,
at the 5% significance level. Interpretation: A statistically significant difference has been
demonstrated among the three preferences between males and females in the population, at
the 5% significance level. Therefore, there is an association between gender and ice cream
flavor preference, at this level. Note that this does not contradict the findings in part (a)!

(c) R code
• For (a):
chisq.test(c(416, 419, 365))

        Chi-squared test for given probabilities

data:  c(416, 419, 365)
X-squared = 4.605, df = 2, p-value = 0.1

• For (b):
icecream <- matrix(c(200, 216, 190, 229, 210, 155), ncol = 3,
  nrow = 2, dimnames = list(c("Males", "Females"), c("Vanilla",
  "Chocolate", "Strawberry")))
# Output
icecream
chisq.test(icecream, correct = FALSE)

        Vanilla Chocolate Strawberry
Males       200       190        210
Females     216       229        155

        Pearson's Chi-squared test

data:  icecream
X-squared = 12.5331, df = 2, p-value = 0.001899

20.
                                 Degree of Hair Growth
           No Growth    New Vellus    Minimal Growth    Moderate Growth    Dense Growth    Totals
Rogaine    301          172           178               58                 5               714
           (361.24)     (160.66)      (145.69)          (43.41)            (2.99)
Placebo    423          150           114               29                 1               717
           (362.76)     (161.34)      (146.31)          (43.59)            (3.01)
Totals     724          322           292               87                 6               1431

(a) With the expected values indicated in parentheses, the Chi-squared test statistic is given by

$$\chi^2 = \frac{(301 - 361.24)^2}{361.24} + \frac{(172 - 160.66)^2}{160.66} + \frac{(178 - 145.69)^2}{145.69} + \frac{(58 - 43.41)^2}{43.41} + \frac{(5 - 2.99)^2}{2.99}$$
$$\quad + \frac{(423 - 362.76)^2}{362.76} + \frac{(150 - 161.34)^2}{161.34} + \frac{(114 - 146.31)^2}{146.31} + \frac{(29 - 43.59)^2}{43.59} + \frac{(1 - 3.01)^2}{3.01}$$

= 48.416, on 4 degrees of freedom. This value is much larger than 13.277 which, from the
Chi-squared table, cuts an area of 0.01 from the upper tail of that distribution having df = 4.
Hence, the p-value << .01 = $\alpha$, and we can strongly reject the null hypothesis at the 1%
significance level. Interpretation: There is a statistically significant difference between the
two groups, providing very strong evidence for the effectiveness of Rogaine versus placebo,
across these categories.
Note: The fact that the expected values in the last column are both less than 5 is not really a
concern, as they comprise no more than 20% of the total number of cells. (See Lecture Notes.)

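[Figure: density of the $\chi^2_4$ distribution, with upper-tail area $\alpha = .01$ to the right of the critical value 13.277; the test statistic 48.416 lies far beyond it.]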

(b)
           No Growth    Growth       Totals
Rogaine    301          413          714
           (361.24)     (352.76)
Placebo    423          294          717
           (362.76)     (354.24)
Totals     724          707          1431

With the expected values indicated in parentheses, the Chi-squared test statistic is given by

$$\chi^2 = \frac{(301 - 361.24)^2}{361.24} + \frac{(413 - 352.76)^2}{352.76} + \frac{(423 - 362.76)^2}{362.76} + \frac{(294 - 354.24)^2}{354.24} = 40.58,$$

on 1 degree of freedom. This value is much larger than 6.635 which, from the Chi-squared
table, cuts an area of 0.01 from the upper tail of that distribution having df = 1. Hence, the
p-value << .01. Therefore, we can again strongly reject the null hypothesis at the 1%
significance level. Interpretation: There is a statistically significant difference between the
two groups, providing very strong evidence for the effectiveness of Rogaine versus placebo.

(c) To use a Z-test on the null hypothesis of equality between the proportions of success (“growth”)
in the two populations (Rogaine vs. Placebo), we first calculate their sample estimates:
$\hat{\pi}_{\text{Growth|Rogaine}}$ = 413/714 = 0.5784, $\hat{\pi}_{\text{Growth|Placebo}}$ = 294/717 = 0.4100, and the pooled estimate
$\hat{\pi}_{pooled}$ = 707/1431 = 0.4941. The standard error estimate is calculated via

$$\sqrt{0.4941\,(1 - 0.4941)}\,\sqrt{\frac{1}{714} + \frac{1}{717}} = 0.5 \times 0.05287 = 0.02644.$$

Thus, the p-value = $2\,P\!\left(Z \ge \frac{0.5784 - 0.4100}{0.02644}\right) = 2\,P(Z \ge 6.37)$ << .01, so we may once again reject the
null hypothesis at the 1% significance level. Notice that, to two decimal places, $(6.37)^2 = 40.58$,
emphasizing the connection between the standard normal distribution and Chi-squared distribution
with 1 degree of freedom.
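The Z-test, and its connection to the Chi-squared statistic of part (b), can be verified with the following R sketch. The variable names are illustrative; `prop.test()` without continuity correction reports the squared Z-score as its X-squared statistic.

```r
x <- c(Rogaine = 413, Placebo = 294)   # numbers of subjects with growth
n <- c(Rogaine = 714, Placebo = 717)   # group sizes

p.hat    <- x / n
p.pooled <- sum(x) / sum(n)
se0      <- sqrt(p.pooled * (1 - p.pooled) * sum(1 / n))   # pooled standard error
z        <- unname((p.hat["Rogaine"] - p.hat["Placebo"]) / se0)
p.value  <- 2 * pnorm(abs(z), lower.tail = FALSE)          # two-sided p-value

# Two-sample test of equal proportions, no continuity correction
pt <- prop.test(x, n, correct = FALSE)

c(z = z, z.squared = z^2, X.squared = unname(pt$statistic), p.value = p.value)
```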

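[Figure: the $\chi^2_1$ density, with upper-tail area $\alpha = .01$ beyond the critical value 6.635 and the test statistic 40.58 (p << .01), shown alongside the N(0, 1) density, with tail areas $\alpha/2 = .005$ beyond ±2.575 and the Z-scores ±6.37 (p/2 << .005 in each tail).]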

(d) R code
# 2 by 5 contingency table
none <- c(301, 423)
vellus <- c(172, 150)
min <- c(178, 114)
mod <- c(58, 29)
dense <- c(5, 1)
hair.growth <- matrix(c(none, vellus, min, mod, dense), nrow = 2,
  ncol = 5, dimnames = list(Treatment = c("Rogaine", "Placebo"),
  Growth = c("None", "Vellus", "Minimal", "Moderate", "Dense")))

# Output
hair.growth
chisq.test(hair.growth)

Growth
Treatment None Vellus Minimal Moderate Dense
Rogaine 301 172 178 58 5
Placebo 423 150 114 29 1

Pearson's Chi-squared test

data: hair.growth
X-squared = 48.4158, df = 4, p-value = 7.73e-10

# 2 by 2 contingency table
hair.growth <- matrix(c(none, vellus + min + mod + dense),
  nrow = 2, ncol = 2, dimnames = list(Treatment = c("Rogaine",
  "Placebo"), Growth = c("No", "Yes")))

# Output
hair.growth
chisq.test(hair.growth, correct = FALSE)

Growth
Treatment No Yes
Rogaine 301 413
Placebo 423 294

Pearson's Chi-squared test

data: hair.growth
X-squared = 40.5816, df = 1, p-value = 1.886e-10

21. ANOVA

(a) $s_{within}^2 = MS_{Err} = \frac{SS_{Err}}{df_{Err}} = \frac{(21 - 1)(0.246) + (16 - 1)(0.274) + (23 - 1)(0.248)}{21 + 16 + 23 - 3} = \frac{14.486}{57}$ = 0.254 liters²

(b) $\bar{x} = \frac{21\,(2.63) + 16\,(3.03) + 23\,(2.88)}{21 + 16 + 23}$ = 2.83 liters

$s_{between}^2 = MS_{Trt} = \frac{SS_{Trt}}{df_{Trt}} = \frac{21\,(2.63 - 2.83)^2 + 16\,(3.03 - 2.83)^2 + 23\,(2.88 - 2.83)^2}{3 - 1} = \frac{1.538}{2}$ = 0.769 liters²

(c) ANOVA Table (F = MSTrt / MSErr)

Source        df     SS        MS       F-ratio    p-value
Treatment     2      1.538     0.769    3.028      p > .05
Error         57     14.486    0.254
Total         59     16.024

The p-value corresponding to the upper tail area of the $F_{2,\,57}$-distribution to the right of 3.028 is
larger than .05 (extrapolating from $F_{2,\,60}$, whose area to the right of 3.15 is exactly equal to .05).
[The exact p-value is .056, via pf(3.028, df1 = 2, df2 = 57, lower.tail = F).]
Therefore, we cannot reject the null hypothesis at the $\alpha = .05$ level of significance.
Interpretation: There is no statistically significant difference between the mean baseline FEV1
amounts of the three groups of men, at the 5% level. (This is desirable to have in an experiment!)
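The ANOVA table can be rebuilt in R from the summary statistics alone. This sketch assumes, as the formula in (a) suggests, that 0.246, 0.274, and 0.248 are the three sample variances; because it does not round the grand mean to 2.83, the F-ratio agrees with the hand calculation only to about two decimal places.

```r
# Summary statistics as used in the solution (s2 assumed to be sample variances)
n    <- c(21, 16, 23)
xbar <- c(2.63, 3.03, 2.88)
s2   <- c(0.246, 0.274, 0.248)

grand.mean <- sum(n * xbar) / sum(n)

ss.trt <- sum(n * (xbar - grand.mean)^2)   # between-groups sum of squares
df.trt <- length(n) - 1
ss.err <- sum((n - 1) * s2)                # within-groups sum of squares
df.err <- sum(n) - length(n)

f.ratio <- (ss.trt / df.trt) / (ss.err / df.err)
p.value <- pf(f.ratio, df.trt, df.err, lower.tail = FALSE)

round(c(SS.Trt = ss.trt, SS.Err = ss.err, F = f.ratio, p = p.value), 3)
```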

22.
(a) Combining the two samples yields the “grand mean” $\bar{x} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}$. For the
“grand variance” $s_{Total}^2$, we have, by definition of any sample variance, $s^2 = \frac{SS}{df}$, where
df = n − 1. That is, $SS = (n - 1)\,s^2$, or in particular, $SS_{Total} = (n_1 + n_2 - 1)\,s_{Total}^2$. But via
ANOVA, we know that $SS_{Total} = SS_{Trt} + SS_{Err}$, where $SS_{Err} = (n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2$
and $SS_{Trt} = n_1 (\bar{x}_1 - \bar{x})^2 + n_2 (\bar{x}_2 - \bar{x})^2 = \frac{n_1 n_2}{n_1 + n_2} (\bar{x}_1 - \bar{x}_2)^2$ when simplified. Therefore,
putting everything together, we have

$$(n_1 + n_2 - 1)\,s_{Total}^2 = (n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2 + \frac{n_1 n_2}{n_1 + n_2} (\bar{x}_1 - \bar{x}_2)^2,$$

from which $s_{Total}$ may be easily obtained.

(b) In this case, we have $n_1 = 4000$, $\bar{x}_1 = \$30$, $s_1^2 = 100$, and $n_2 = 1000$, $\bar{x}_2 = \$0$, $s_2^2 = 0$
(no variance in the second sample, since all of its values are the same). Plugging these
values into the formulas above yields $\bar{x} = \$24$ and $s_{Total}^2 = 224.025$, or $s_{Total} = \$14.97$.
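The formulas of part (a) translate directly into a small R function. The function name `pooled.stats` and the two short vectors used as a sanity check are hypothetical, introduced here only for illustration.

```r
# Grand mean and grand variance of two combined samples, from summary statistics
pooled.stats <- function(n1, xbar1, s1.sq, n2, xbar2, s2.sq) {
  grand.mean <- (n1 * xbar1 + n2 * xbar2) / (n1 + n2)
  ss.err   <- (n1 - 1) * s1.sq + (n2 - 1) * s2.sq
  ss.trt   <- n1 * n2 / (n1 + n2) * (xbar1 - xbar2)^2
  ss.total <- ss.err + ss.trt                  # SS_Total = SS_Err + SS_Trt
  c(mean = grand.mean, var = ss.total / (n1 + n2 - 1))
}

# Part (b)
part.b <- pooled.stats(4000, 30, 100, 1000, 0, 0)
part.b
sqrt(part.b["var"])

# Sanity check on two small made-up samples: formula vs. direct computation
u <- c(1, 4, 7, 10)
v <- c(2, 3)
check <- pooled.stats(length(u), mean(u), var(u), length(v), mean(v), var(v))
all.equal(unname(check["var"]), var(c(u, v)))
```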

23.
(a) First, it is useful to establish the following relations:

a + b = R1
c + d = R2
a + c = C1
b + d = C2
a + b + c + d = R1 + R2 = C1 + C2 = n

Observed values:              Expected values:

a     b     | R1              R1C1/n    R1C2/n    | R1
c     d     | R2              R2C1/n    R2C2/n    | R2
C1    C2    | n               C1        C2        | n

Now, with the table of observed values above, it follows that the table of expected values is as
shown beside it. Next consider the Chi-squared statistic $\chi^2_1 = \sum_{\text{all cells}} \frac{(\text{Obs} - \text{Exp})^2}{\text{Exp}}$. The first term of
this sum, corresponding to the first cell, is

$$\frac{\left(a - \frac{R_1 C_1}{n}\right)^2}{\frac{R_1 C_1}{n}}, \quad \text{which simplifies to} \quad \frac{(ad - bc)^2}{n\,R_1 C_1},$$

and similarly for the remaining cells. Thus,

$$\chi^2_1 = \frac{(ad - bc)^2}{n} \left[ \frac{1}{R_1 C_1} + \frac{1}{R_1 C_2} + \frac{1}{R_2 C_1} + \frac{1}{R_2 C_2} \right].$$

The expression in brackets can be rewritten as $\frac{R_2 C_2 + R_2 C_1 + R_1 C_2 + R_1 C_1}{R_1 R_2 C_1 C_2}$, the numerator
of which is simply $(R_1 + R_2)(C_1 + C_2) = (n)(n) = n^2$. Hence, putting all of this together,
we have the Chi-squared score

$$\chi^2_1 = \frac{(ad - bc)^2}{n} \cdot \frac{n^2}{R_1 R_2 C_1 C_2} = \frac{n\,(ad - bc)^2}{R_1 R_2 C_1 C_2}. \quad \square$$

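(b) Without loss of generality, we may assume that the first row of the observed table
corresponds to the number of “Successes” in each column, so that $\hat{\pi}_1 = \frac{a}{C_1}$ and $\hat{\pi}_2 = \frac{b}{C_2}$.
Thus, the difference between these two point estimates is

$$\hat{\pi}_1 - \hat{\pi}_2 = \frac{a}{C_1} - \frac{b}{C_2} = \frac{a C_2 - b C_1}{C_1 C_2} = \frac{a\,(b + d) - b\,(a + c)}{C_1 C_2} = \frac{ad - bc}{C_1 C_2}.$$

The pooled sample proportion is $\hat{\pi}_p = \frac{a + b}{C_1 + C_2} = \frac{R_1}{n}$, hence it follows that the standard error
s.e. can be estimated by

$$\sqrt{\frac{R_1}{n}\left(1 - \frac{R_1}{n}\right)}\,\sqrt{\frac{1}{C_1} + \frac{1}{C_2}} = \sqrt{\frac{R_1 (n - R_1)}{n^2}}\,\sqrt{\frac{C_1 + C_2}{C_1 C_2}} = \frac{\sqrt{R_1 R_2}}{n}\,\sqrt{\frac{n}{C_1 C_2}} = \frac{1}{\sqrt{n}}\,\sqrt{\frac{R_1 R_2}{C_1 C_2}}.$$

Therefore, from these, we have

$$z\text{-score} = \frac{\hat{\pi}_1 - \hat{\pi}_2}{s.e.} = \frac{\dfrac{ad - bc}{C_1 C_2}}{\dfrac{1}{\sqrt{n}}\sqrt{\dfrac{R_1 R_2}{C_1 C_2}}} = (ad - bc)\,\sqrt{\frac{n}{R_1 R_2 C_1 C_2}},$$

the square of which is clearly the $\chi^2_1$ score found in (a). $\square$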
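Both identities can be confirmed numerically in R. Any 2-by-2 table will do; this sketch reuses the counts from Exercise 15(a), and the names `cc`, `row.tot`, and `col.tot` are illustrative stand-ins for c, R, and C above.

```r
# Observed table, filled column by column: a = 46, c = 98, b = 25, d = 119
tab <- matrix(c(46, 98, 25, 119), nrow = 2)
a <- tab[1, 1]; b <- tab[1, 2]; cc <- tab[2, 1]; d <- tab[2, 2]
row.tot <- rowSums(tab)   # R1, R2
col.tot <- colSums(tab)   # C1, C2
n <- sum(tab)

# Shortcut formula from (a)
shortcut <- n * (a * d - b * cc)^2 / (row.tot[1] * row.tot[2] * col.tot[1] * col.tot[2])

# Pearson statistic computed cell by cell, without continuity correction
pearson <- unname(chisq.test(tab, correct = FALSE)$statistic)

# Z-score from (b): columns are the two groups, first row counts the "Successes"
p.pool <- row.tot[1] / n
z <- (a / col.tot[1] - b / col.tot[2]) /
  sqrt(p.pool * (1 - p.pool) * (1 / col.tot[1] + 1 / col.tot[2]))

c(shortcut = shortcut, pearson = pearson, z.squared = z^2)
```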

24. Below is a typical plot of normally-distributed random sample quantiles; note how the graph is
very linear, with most values concentrated symmetrically about 0 (which, recall, is the mean and
the median), and a smattering of outliers in both tails.

However, the following plots are representative of sample quantiles that are randomly generated
by a t1-distribution. Roughly half of the generated values are above 0, and half are below
(i.e., 0 is the median). But the high nonlinearity at both ends indicates very pronounced skew in
the left and right tails. That is, the presence of so many outliers in such heavy tails on both sides
causes the means $\bar{x}$ of multiple samples to “jump around” asymmetrically around 0, and not
converge to a single “expected value” µ. The moral: Beware of highly skewed data!
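The instability of the sample mean can be illustrated with a short simulation in R. The seed, the number of samples, and the sample size below are arbitrary choices made for this sketch, not values from the exercise.

```r
set.seed(1)                      # arbitrary seed, for reproducibility only
n.samples <- 1000                # hypothetical number of repeated samples
n <- 100                         # hypothetical size of each sample

means.normal <- replicate(n.samples, mean(rnorm(n)))
means.t1     <- replicate(n.samples, mean(rt(n, df = 1)))

# Spread of the sample means: small for the normal, large for the t1-distribution
c(IQR.normal = IQR(means.normal), IQR.t1 = IQR(means.t1))
c(median.normal = median(means.normal), median.t1 = median(means.t1))

if (interactive()) {             # draw the q-q plot only in an interactive session
  x.t1 <- rt(n, df = 1)
  qqnorm(x.t1, main = "Sample from the t1-distribution")
  qqline(x.t1)
}
```

Increasing n shrinks the spread of the normal sample means but leaves that of the t1 sample means essentially unchanged, while their median stays near 0.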

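[Figures: three normal q-q plots of samples generated from the t1-distribution, each annotated with its sample mean $\bar{x}$ in relation to 0.]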

25. The following are typical plots.

(a)

(b)

(c) The histogram of x clearly shows strong positive skew, which is also suggested by the
highly curved q-q plot. Taking the logarithm, however, transforms the data into a sample that
appears more normally distributed. The histogram of y shows a closer resemblance to a
“typical” bell curve, and the q-q plot is much closer to a straight line. It’s not perfect, of
course, and formal testing for normality can (and should) be done via any of the techniques
on the bottom of page 6.1-25, for example.
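A sketch of this comparison in R follows. The data here are a hypothetical log-normal sample standing in for x, and the Shapiro-Wilk test is used as one possible formal test of normality; it is an assumption that this is among the techniques the lecture notes list.

```r
set.seed(1)                     # arbitrary seed, for reproducibility only
x <- rlnorm(500)                # hypothetical right-skewed sample (log-normal)
y <- log(x)                     # log-transformed data

# Shapiro-Wilk test of normality, before and after the transformation
sw.x <- shapiro.test(x)
sw.y <- shapiro.test(y)
c(W.x = unname(sw.x$statistic), W.y = unname(sw.y$statistic))
c(p.x = sw.x$p.value, p.y = sw.y$p.value)

if (interactive()) {            # draw the plots only in an interactive session
  par(mfrow = c(2, 2))
  hist(x); qqnorm(x); qqline(x)
  hist(y); qqnorm(y); qqline(y)
}
```

A W statistic closer to 1 indicates closer agreement with a normal distribution, so W for y should exceed W for x.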
