
Statistics 24500, Winter 2006

Solutions to Homework 6

1. (Maximum likelihood in one-way ANOVA) (For notational simplicity, we assume J = Ji for
all i in this solution, but all results also hold in the case of unequal Ji; compare Rice,
p. 449, bottom.)

(a) The likelihood is:


\[
\mathrm{lik}(\mu_1, \dots, \mu_I, \sigma^2)
  = \prod_{i=1}^{I} \prod_{j=1}^{J} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(Y_{ij}-\mu_i)^2/2\sigma^2}
  = (2\pi)^{-IJ/2} \sigma^{-IJ} \exp\!\Biggl( -\frac{1}{2\sigma^2} \sum_{i=1}^{I} \sum_{j=1}^{J} (Y_{ij}-\mu_i)^2 \Biggr).
\]

The log-likelihood is:


\[
l = \log \mathrm{lik}(\mu_1, \dots, \mu_I, \sigma^2)
  = C - \frac{IJ}{2} \log(\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^{I} \sum_{j=1}^{J} (Y_{ij}-\mu_i)^2,
\]

where C is a constant. For each i = 1, . . . , I, we have


\[
l_{\mu_i} = \frac{1}{\sigma^2} \sum_{j=1}^{J} (Y_{ij}-\mu_i)
          = \frac{1}{\sigma^2} \Biggl( \sum_{j=1}^{J} Y_{ij} - J\mu_i \Biggr).
\]

Also,
\[
l_{\sigma^2} = -\frac{IJ}{2\sigma^2} + \frac{1}{2(\sigma^2)^2} \sum_{i=1}^{I} \sum_{j=1}^{J} (Y_{ij}-\mu_i)^2
             = \frac{1}{2(\sigma^2)^2} \Biggl( \sum_{i=1}^{I} \sum_{j=1}^{J} (Y_{ij}-\mu_i)^2 - IJ\sigma^2 \Biggr).
\]

To get the critical points we set all these partial derivatives equal to 0 simultaneously and
solve the resulting system of equations. For each i = 1, . . . , I we obtain

\[
\hat\mu_i = \frac{1}{J} \sum_{j=1}^{J} Y_{ij} = \bar Y_{i\cdot}.
\]

Substituting these values into the last equation we obtain


\[
\hat\sigma^2 = \frac{1}{IJ} \sum_{i=1}^{I} \sum_{j=1}^{J} (Y_{ij}-\hat\mu_i)^2.
\]

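As a quick numeric illustration (not part of the original solution), the MLE formulas above can be checked on a small made-up data set; all the numbers below are arbitrary.

```python
# Illustrative check of the one-way ANOVA MLEs on a small, made-up data set
# (I = 2 groups, J = 3 observations each; the numbers are arbitrary).
Y = [[4.1, 3.9, 4.0],   # group 1
     [3.5, 3.7, 3.6]]   # group 2
I, J = len(Y), len(Y[0])

# muhat_i = Ybar_i. (the i-th group mean)
mu_hat = [sum(row) / J for row in Y]

# sigma2hat = within-group sum of squares divided by I*J
sigma2_hat = sum((Y[i][j] - mu_hat[i]) ** 2
                 for i in range(I) for j in range(J)) / (I * J)

print([round(m, 3) for m in mu_hat], round(sigma2_hat, 4))   # [4.0, 3.6] 0.0067
```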


To verify that this unique critical point is indeed where the maximum value of l is achieved, we
can reason as follows. First notice that
\[
\begin{aligned}
\sum_{i=1}^{I} \sum_{j=1}^{J} (Y_{ij}-\mu_i)^2
 &= \sum_{i=1}^{I} \sum_{j=1}^{J} (Y_{ij}-\hat\mu_i+\hat\mu_i-\mu_i)^2 \\
 &= \sum_{i=1}^{I} \Biggl( \sum_{j=1}^{J} (Y_{ij}-\hat\mu_i)^2 + 2(\hat\mu_i-\mu_i)\sum_{j=1}^{J} (Y_{ij}-\hat\mu_i) + J(\hat\mu_i-\mu_i)^2 \Biggr) \\
 &= \sum_{i=1}^{I} \Biggl( \sum_{j=1}^{J} (Y_{ij}-\hat\mu_i)^2 + J(\hat\mu_i-\mu_i)^2 \Biggr) \\
 &= \sum_{i=1}^{I} \sum_{j=1}^{J} (Y_{ij}-\hat\mu_i)^2 + J \sum_{i=1}^{I} (\hat\mu_i-\mu_i)^2,
\end{aligned}
\]

since \(\sum_{j=1}^{J} (Y_{ij}-\hat\mu_i) = 0\) for each i.

Therefore, using values of μ1, . . . , μI other than μ̂1, . . . , μ̂I can only decrease l. Given
that, we can consider l(μ̂1, . . . , μ̂I, ·) as a function of σ² only; it is easy to see that
l_{σ²}(σ²) is positive for σ² < σ̂² and negative for σ² > σ̂², so this critical point is indeed
the MLE.
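The decomposition used in this argument can also be verified numerically; here is a small sketch with arbitrary toy data and an arbitrary candidate mean vector mu (none of these numbers come from the assignment).

```python
# Numeric check (arbitrary toy numbers) of the decomposition
#   sum_ij (Y_ij - mu_i)^2 = sum_ij (Y_ij - muhat_i)^2 + J * sum_i (muhat_i - mu_i)^2,
# which shows that the group means muhat_i maximize l over mu_1, ..., mu_I.
Y = [[4.1, 3.9, 4.0],
     [3.5, 3.7, 3.6]]
I, J = len(Y), len(Y[0])
mu_hat = [sum(row) / J for row in Y]    # group means
mu = [3.8, 3.9]                         # arbitrary candidate means

lhs = sum((Y[i][j] - mu[i]) ** 2 for i in range(I) for j in range(J))
rhs = (sum((Y[i][j] - mu_hat[i]) ** 2 for i in range(I) for j in range(J))
       + J * sum((mu_hat[i] - mu[i]) ** 2 for i in range(I)))
print(abs(lhs - rhs) < 1e-12)   # True
```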

(b) The (generalized) likelihood ratio statistic is given by:

\[
\Lambda = \frac{\max_{(\mu_1,\dots,\mu_I,\sigma^2)\in\omega_1} \mathrm{lik}(\mu_1, \dots, \mu_I, \sigma^2)}
               {\max_{(\mu_1,\dots,\mu_I,\sigma^2)\in\omega_0} \mathrm{lik}(\mu_1, \dots, \mu_I, \sigma^2)},
\]

where ω1 = {(μ1, . . . , μI, σ²) : μi ∈ R, σ² > 0} and ω0 = {(μ1, . . . , μI, σ²) : μ1 = μ2 = · · · =
μI ∈ R, σ² > 0}. By definition, the maximum in the numerator is achieved when the MLEs from
part (a) are substituted for μ1, . . . , μI and σ².

To obtain the denominator, we maximize the likelihood


\[
\mathrm{lik}(\mu, \sigma^2) = (2\pi)^{-IJ/2} \sigma^{-IJ}
  \exp\!\Biggl( -\frac{1}{2\sigma^2} \sum_{i=1}^{I} \sum_{j=1}^{J} (Y_{ij}-\mu)^2 \Biggr).
\]

Notice that this is exactly the same setup for finding the MLEs for i.i.d. normals, since all the
means are the same. We know what the MLEs are for this case (see the Notes on Normal MLEs
in the Chalk Class Documents):
\[
\hat\mu_0 = \frac{1}{IJ} \sum_{i=1}^{I} \sum_{j=1}^{J} Y_{ij} = \bar Y_{\cdot\cdot}
\]

and

\[
\hat\sigma_0^2 = \frac{1}{IJ} \sum_{i=1}^{I} \sum_{j=1}^{J} (Y_{ij}-\bar Y_{\cdot\cdot})^2.
\]



But notice that we can decompose σ̂02 as follows:
\[
\begin{aligned}
\hat\sigma_0^2 &= \frac{1}{IJ} \sum_{i=1}^{I} \sum_{j=1}^{J} (Y_{ij}-\bar Y_{\cdot\cdot})^2 \\
 &= \frac{1}{IJ} \sum_{i=1}^{I} \sum_{j=1}^{J} (Y_{ij}-\bar Y_{i\cdot}+\bar Y_{i\cdot}-\bar Y_{\cdot\cdot})^2 \\
 &= \frac{1}{IJ} \sum_{i=1}^{I} \sum_{j=1}^{J} \Bigl( (Y_{ij}-\bar Y_{i\cdot})^2 + 2(Y_{ij}-\bar Y_{i\cdot})(\bar Y_{i\cdot}-\bar Y_{\cdot\cdot}) + (\bar Y_{i\cdot}-\bar Y_{\cdot\cdot})^2 \Bigr) \\
 &= \frac{1}{IJ} \sum_{i=1}^{I} \sum_{j=1}^{J} (Y_{ij}-\bar Y_{i\cdot})^2 + \frac{1}{I} \sum_{i=1}^{I} (\bar Y_{i\cdot}-\bar Y_{\cdot\cdot})^2,
\end{aligned}
\]

since the cross terms vanish.

Now we plug these values back into Λ. Plugging either σ̂² or σ̂₀² into the exponential factor
yields e^{−IJ/2}; these factors cancel, as do the constants (2π)^{−IJ/2}, leaving

\[
\Lambda = \Biggl( \frac{\hat\sigma^2}{\hat\sigma_0^2} \Biggr)^{-IJ/2}
        = \Biggl( 1 + \frac{J \sum_{i=1}^{I} (\bar Y_{i\cdot}-\bar Y_{\cdot\cdot})^2}
                           {\sum_{i=1}^{I} \sum_{j=1}^{J} (Y_{ij}-\bar Y_{i\cdot})^2} \Biggr)^{IJ/2}.
\]

The LRT rejects H0 if Λ is large; that is, if Λ > c for a cutoff c ≥ 0 that satisfies

\[
\mathrm{Prob}[\Lambda > c \mid H_0] \le \alpha,
\]

where α is the specified significance level. We have to show that this test is equivalent to an F-test.
Indeed, raising Λ to the power 2/(IJ), subtracting 1, and multiplying by I(J − 1)/(I − 1) to
obtain the correct degrees of freedom (all these operations preserve order), we find that Λ > c
is equivalent to

\[
\frac{J \sum_{i=1}^{I} (\bar Y_{i\cdot}-\bar Y_{\cdot\cdot})^2 / (I-1)}
     {\sum_{i=1}^{I} \sum_{j=1}^{J} (Y_{ij}-\bar Y_{i\cdot})^2 / (I(J-1))} > c'
\]

for some cutoff c′. This is the F-test (see Rice, page 448).
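The monotone relationship between Λ and the F statistic can be checked numerically. The sketch below (arbitrary toy data, not from the assignment) verifies the algebraic identity Λ^{2/(IJ)} = 1 + (I−1)F/(I(J−1)).

```python
# Toy-data check (arbitrary numbers) that Lambda is an increasing function of F:
#   Lambda^(2/(I*J)) = 1 + (I-1)/(I*(J-1)) * F.
Y = [[4.1, 3.9, 4.0],
     [3.5, 3.7, 3.6]]
I, J = len(Y), len(Y[0])
ybar_i = [sum(row) / J for row in Y]
ybar = sum(ybar_i) / I   # grand mean (equal group sizes)

ssw = sum((Y[i][j] - ybar_i[i]) ** 2 for i in range(I) for j in range(J))
ssb = J * sum((m - ybar) ** 2 for m in ybar_i)

Lambda = (1 + ssb / ssw) ** (I * J / 2)   # = (sigma0hat^2 / sigmahat^2)^(IJ/2)
F = (ssb / (I - 1)) / (ssw / (I * (J - 1)))

print(abs(Lambda ** (2 / (I * J)) - (1 + (I - 1) / (I * (J - 1)) * F)) < 1e-9)   # True
```

For these toy numbers ssw = 0.04 and ssb = 0.24, so Λ = 7³ = 343 and F = 24, and both sides of the identity equal 7.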

(c) Since this is just a reparameterization of the model in (a), in which

\[
\mu = \frac{1}{I} \sum_{i=1}^{I} \mu_i, \qquad \alpha_i = \mu_i - \mu \quad \text{for } i = 1, \dots, I,
\]

and σ² remains unchanged, the MLEs are obtained by transforming the MLEs obtained in (a) in
the same way:

\[
\hat\mu = \frac{1}{I} \sum_{i=1}^{I} \hat\mu_i = \frac{1}{I} \sum_{i=1}^{I} \bar Y_{i\cdot} = \bar Y_{\cdot\cdot},
\]



\[
\hat\alpha_i = \hat\mu_i - \hat\mu = \bar Y_{i\cdot} - \bar Y_{\cdot\cdot} \quad \text{for } i = 1, \dots, I,
\]

and

\[
\hat\sigma^2 = \frac{1}{IJ} \sum_{i=1}^{I} \sum_{j=1}^{J} (Y_{ij}-\bar Y_{i\cdot})^2.
\]

Finding these MLEs can also be attempted directly, but the computations involved range from
complicated to very complicated.
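As a sanity check (toy data, arbitrary numbers), the reparameterized MLEs satisfy ∑ α̂i = 0 and recover the group-mean MLEs from (a):

```python
# Toy-data check: the alphahat_i sum to zero and muhat + alphahat_i = Ybar_i.
Y = [[4.1, 3.9, 4.0],
     [3.5, 3.7, 3.6]]
I, J = len(Y), len(Y[0])
ybar_i = [sum(row) / J for row in Y]     # group means = muhat_i from part (a)
mu_hat = sum(ybar_i) / I                 # grand mean Ybar.. (equal group sizes)
alpha_hat = [m - mu_hat for m in ybar_i]

print(abs(sum(alpha_hat)) < 1e-12)                         # True
print(all(abs(mu_hat + a - m) < 1e-12
          for a, m in zip(alpha_hat, ybar_i)))             # True
```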

2. (Simultaneous confidence intervals for differences in means)

We restrict our data set to labs 1, 4, 5, and 6; cf. Example A of Section 12.2.2.1 in Rice.
(Depending on your interpretation of the problem, you might also use the whole data set; that
can give you a different estimate of variability; in particular, you would then use the t
distribution with 63 degrees of freedom instead of 36, as below, and the pooled standard error
computed from all the labs.)

The means are, then:

Lab Mean
1 4.062
4 3.920
5 3.957
6 3.955

Thus, the means are unchanged; however, the pooled standard error is different from the one in
the book (since we have fewer data): sp = 0.0498.

There are 3 comparisons to be made. Let’s say that we want the confidence level to be 95% for
all the intervals simultaneously. Following the Bonferroni method, we take α = 0.05/3 = 0.0167.
Then each interval is given by:

\[
(\bar Y_{i_1\cdot} - \bar Y_{i_2\cdot}) \pm s_p \sqrt{\tfrac{2}{10}}\; t_{36,\,0.025/3}.
\]
The simultaneous confidence intervals are:
Comparison Lower Upper
μlab1 − μlab4 0.0861 0.1979
μlab1 − μlab5 0.0491 0.1609
μlab1 − μlab6 0.0511 0.1629
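These intervals can be reproduced with a short script. The critical value t_{36, 0.025/3} ≈ 2.510 is hardcoded below as an assumed constant (the Python standard library has no t quantile function); sp, the lab means, and n = 10 observations per lab come from the solution above.

```python
import math

# Bonferroni simultaneous CIs for lab 1 vs labs 4, 5, 6 (illustrative sketch;
# t_crit ~ t_{36, 0.025/3} ~ 2.510 is an assumed, hardcoded value).
means = {1: 4.062, 4: 3.920, 5: 3.957, 6: 3.955}
sp, n, t_crit = 0.0498, 10, 2.510

half_width = t_crit * sp * math.sqrt(2 / n)   # common half-width of each CI
ci = {lab: (means[1] - means[lab] - half_width,
            means[1] - means[lab] + half_width) for lab in (4, 5, 6)}
for lab, (lo, hi) in ci.items():
    print(f"mu_lab1 - mu_lab{lab}: ({lo:.4f}, {hi:.4f})")
```

The printed intervals agree with the table above to four decimal places.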

REMARK: If you interpreted the question as asking for CIs for all the pairwise comparisons among
the four labs considered, then you will have 6 comparisons. In that case you should use t_{36, 0.025/6}
in order to get the desired significance level for all the comparisons simultaneously.



3. (One-way and two-way ANOVA)

First, the p-value for Dosage in the two-way ANOVA table at the top of page 464 in Rice is
7.9 × 10⁻⁹.

To compute the one-way ANOVA table for Dosage, we pool the sums of squares corresponding to
Iron form and to the interaction into the error SS; we can do that because of the algebraic identity

SST OT = SSA + SSB + SSAB + SSE ,

which holds here because the design is balanced (all cells have the same number of observations).
The degrees of freedom for Dosage remain the same, but the error gains 3 degrees of freedom.
Using this information we can compute the mean sums of squares, the F statistics and the p-values
to complete the new table:

Analysis of Variance Table


Source    df    SS       MS      F        p
Dosage     2    15.588   7.794   21.412   1.59 × 10⁻⁸
Error    105    38.18    0.364
Total    107    53.768
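The arithmetic of the collapsed table can be double-checked with a few lines (a sketch: the SS values are those quoted from Rice, and F is recomputed from unrounded mean squares, so it differs slightly from the 21.412 obtained with the rounded MS of 0.364).

```python
# Sanity check of the one-way table: pooled SS and the F statistic.
ss_dosage, df_dosage = 15.588, 2
ss_error, df_error = 38.18, 105   # two-way error SS plus Iron form SS plus interaction SS
ss_total = 53.768

ms_dosage = ss_dosage / df_dosage   # 7.794
ms_error = ss_error / df_error      # ~0.364
F = ms_dosage / ms_error            # ~21.4

print(abs(ss_dosage + ss_error - ss_total) < 1e-9)   # True: the SS add up
print(round(F, 1))                                   # 21.4
```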

Notice that the new p-value is approximately twice as large as the p-value for the two-way ANOVA
(although it is still extremely small). This is because, by disregarding Iron form and the
interaction term, we increased the error SS by more than enough to compensate for the extra
error degrees of freedom. In the two-way table, inferences about Dosage are adjusted for the
effect of the iron form, while in the one-way table there is no such adjustment.

4. (Additive model without interaction)

(a) The table below contains the original data and the marginal averages:

                                          Ȳi·
       4.4    4.3     4.0    3.6    4.0 → 3.3    3.92
       4.1    4.1     4.0    4.0    3.9    4.02
       3.2    3.9     3.8    4.1    4.4    3.88
       3.9    4.0     3.8    4.1    4.4    4.04
Ȳ·j    3.9    4.075   3.9    3.95   4.0    3.965 = Ȳ··



Now,

μ̂ = 3.965
α̂1 = 3.92 − 3.965 = −0.045
α̂2 = 4.02 − 3.965 = 0.055
α̂3 = 3.88 − 3.965 = −0.085
α̂4 = 4.04 − 3.965 = 0.075
β̂1 = 3.9 − 3.965 = −0.065
β̂2 = 4.075 − 3.965 = 0.11
β̂3 = 3.9 − 3.965 = −0.065
β̂4 = 3.95 − 3.965 = −0.015
β̂5 = 4.0 − 3.965 = 0.035

(b) The fitted values are:

3.855  4.030  3.855  3.905  3.955
3.955  4.130  3.955  4.005  4.055
3.815  3.990  3.815  3.865  3.915
3.975  4.150  3.975  4.025  4.075

And the residuals are:

 0.545   0.270   0.145  -0.305  -0.655
 0.145  -0.030   0.045  -0.005  -0.155
-0.615  -0.090  -0.015   0.235   0.485
-0.075  -0.150  -0.175   0.075   0.325

As you can observe, many of the residuals are larger in magnitude than the estimated row and
column effects, in some cells by almost an order of magnitude. This suggests that those effects
are not significant.
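The whole fit can be recomputed in a few lines; this is an illustrative re-implementation of the additive-model estimates above (not code from the original assignment), using the 4 × 5 data table.

```python
# Additive two-way fit without interaction: muhat = grand mean,
# alphahat_i / betahat_j = row / column mean deviations,
# fitted_ij = muhat + alphahat_i + betahat_j, residual = Y - fitted.
Y = [[4.4, 4.3, 4.0, 3.6, 3.3],
     [4.1, 4.1, 4.0, 4.0, 3.9],
     [3.2, 3.9, 3.8, 4.1, 4.4],
     [3.9, 4.0, 3.8, 4.1, 4.4]]
I, J = len(Y), len(Y[0])

row_mean = [sum(r) / J for r in Y]
col_mean = [sum(Y[i][j] for i in range(I)) / I for j in range(J)]
mu = sum(row_mean) / I
alpha = [m - mu for m in row_mean]
beta = [m - mu for m in col_mean]

fitted = [[mu + alpha[i] + beta[j] for j in range(J)] for i in range(I)]
resid = [[Y[i][j] - fitted[i][j] for j in range(J)] for i in range(I)]

print(round(mu, 3))                      # 3.965
print([round(a, 3) for a in alpha])      # [-0.045, 0.055, -0.085, 0.075]
print([round(b, 3) for b in beta])       # [-0.065, 0.11, -0.065, -0.015, 0.035]
```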

5. (Data Analysis: Two-way ANOVA)

We perform the analysis in R.

First, we enter the data in the form of columns of a data frame (this can be done by hand). Here
is a print-out of how the data is organized:



time poison ttt time poison ttt
1 3.1 I A 25 4.3 I C
2 4.6 I A 26 6.3 I C
3 3.6 II A 27 4.4 II C
4 4.0 II A 28 3.1 II C
5 2.2 III A 29 2.3 III C
6 1.8 III A 30 2.4 III C
7 4.5 I A 31 4.5 I C
8 4.3 I A 32 7.6 I C
9 2.9 II A 33 3.5 II C
10 2.3 II A 34 4.0 II C
11 2.1 III A 35 2.5 III C
12 2.3 III A 36 2.2 III C
13 8.2 I B 37 4.5 I D
14 8.8 I B 38 6.6 I D
15 9.2 II B 39 5.6 II D
16 4.9 II B 40 7.1 II D
17 3.0 III B 41 3.0 III D
18 3.8 III B 42 3.1 III D
19 11.0 I B 43 7.1 I D
20 7.2 I B 44 6.2 I D
21 6.1 II B 45 10.0 II D
22 12.4 II B 46 3.8 II D
23 3.7 III B 47 3.6 III D
24 2.9 III B 48 3.3 III D
Make sure that the vectors poison and ttt (the treatment factor) are created as factors (if
they are just character vectors you will get an error when you apply aov).

a <- aov( time ~ poison + ttt + poison:ttt )

summary(a)

Df Sum Sq Mean Sq F value Pr(>F)


poison       2 103.043  51.521 23.5699 2.863e-07 ***
ttt          3  91.904  30.635 14.0146 3.277e-06 ***
poison:ttt   6  24.745   4.124  1.8867    0.1100
Residuals 36 78.692 2.186
---
Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1

(I am including the whole output here, but you are not required to; in fact, you are encouraged
to delete unnecessary information when you include R output in your report.)

Remark: The term poison:ttt in the model formula inside aov refers to the interaction term.

As you can see from the output, both the poison and treatment effects appear to be significant, yet
the interaction term is not.



Of course, inferences made using the F -test depend on how well the model assumptions hold; in
particular, we should routinely check the residuals for evidence of failure of the independence, nor-
mality, or constant variance assumptions. Figure 1 contains a normal probability plot, to check for
the normality assumption about the residuals, and a residuals-vs.-fitted values plot. The probabil-
ity plot reveals that several points are either outliers, or deviate from the normality assumption (in
the form of tails heavier than normal). The residuals-vs.-fitted values plot shows a clear pattern
that indicates that the variance of the error term is not constant, but is rather approximately
proportional to the magnitude of the fitted values.

The observations in the previous paragraph cast some doubts on the computed p-values. However,
the p-values are so small that the conclusions about significant effects might still hold. Furthermore,
moderate violations of the normality assumption or (especially in the case when the number of
observations in each group is the same) the constant variance assumption, do not affect the F test
strongly (see Rice, page 450).


Figure 1: Diagnostic plots for the analysis of variance in Problem 5, using the original time data
(not transformed): on the left, a normal probability plot of the residuals; on the right, the residuals
plotted vs. the fitted values.

However, if we use the logarithm to transform the time data, we obtain similar results from the anal-
ysis of variance (regarding the significance of the different terms, not the values of the coefficients),
and we obtain better (although not perfect) behavior from the residuals:

logtime <- log(time)

b <- aov( logtime ~ poison + ttt + poison:ttt )



summary(b)

Df Sum Sq Mean Sq F value Pr(>F)


poison 2 5.2322 2.6161 48.8606 5.519e-11 ***
ttt 3 3.5511 1.1837 22.1077 2.722e-08 ***
poison:ttt 6 0.3919 0.0653 1.2198 0.319
Residuals 36 1.9275 0.0535


Figure 2: Diagnostic plots for the analysis of variance in Problem 5, using the log-transformed time
data: on the left, a normal probability plot of the residuals; on the right, the residuals plotted
vs. the fitted values.

Appendix to Problem 5: The R instructions used to create Figure 1 (you do not have to include
this sort of information in your reports).

postscript("hw6_sol_plots.eps")
par(mfrow=c(1,2)) # splitting the plotting area in two
qqnorm(a$res)
plot(a$fit,a$res,xlab="Fitted values",ylab="Residuals")
dev.off() # saving the file with the picture

