
Stat 544, Lecture 23


Linear and Nonlinear Mixed-Effect Models


In the last lecture, we looked at the theory and practice of modeling longitudinal data using generalized estimating equations (GEE). GEE methods are semiparametric because they do not rely on a fully specified probability model. With GEE, the estimates are efficient if the working covariance assumptions are correct. If the working covariance assumptions are wrong, the estimated coefficients are still approximately unbiased, and SEs from the sandwich (empirical) method are reasonable if the sample is large. The philosophy of GEE is to treat the covariance structure as a nuisance. An alternative to GEE is the class of nonlinear mixed-effect (NLME) models. These are fully parametric and model the within-subject covariance structure more explicitly. NLMEs are best understood as an extension of linear mixed-effect (LME) models.

Linear mixed-effect models. Also known as:

- multilevel models
- linear mixed-effects models
- random-effects models
- random-coefficient models
- hierarchical linear models

Implemented in HLM, SAS PROC MIXED, S-PLUS/R (the function lme), and Stata. Adopting the now-standard notation of Laird and Ware (1982), the model is

$$y_i = X_i \beta + Z_i b_i + \epsilon_i, \qquad i = 1, \ldots, m,$$

where

- $y_i = (y_{i1}, y_{i2}, \ldots, y_{i,n_i})^T$ is the response vector for unit i,
- $\beta$ = fixed effects,
- $b_i \sim N_q(0, \psi)$ = random effects for unit i,
- $\epsilon_i \sim N_{n_i}(0, \sigma^2 V_i)$,
- $\psi$ = between-unit covariance matrix,
- $\sigma^2 V_i$ = within-unit covariance matrix.

- Handles unequal ni's, time-varying covariates, and unequally spaced responses.
- Often we use Vi = I, but other structures (e.g., autoregressive) are useful, especially when the ni's are large.
- Measurement times are often incorporated into Xi and Zi as polynomials.
- Zi contains a subset of the columns of Xi.

Averaging over the distribution of the latent random effects bi, the marginal (population-averaged) distribution of yi is

$$y_i \sim N(X_i \beta, \Sigma_i), \qquad \Sigma_i = Z_i \psi Z_i^T + \sigma^2 V_i.$$

If we take Zi = (1, 1, ..., 1)^T (random intercepts) and Vi = I, then Σi has compound symmetry. The elements of β represent the effects of the covariates in Xi on the mean response, both for a single subject and on average for the population.
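To spell out the compound-symmetry claim above (a supplementary detail not written out in the original notes): with a single random intercept, ψ reduces to a scalar and the marginal covariance matrix becomes

$$\Sigma_i = \psi\,\mathbf{1}\mathbf{1}^T + \sigma^2 I =
\begin{pmatrix}
\psi + \sigma^2 & \psi & \cdots & \psi\\
\psi & \psi + \sigma^2 & \cdots & \psi\\
\vdots & \vdots & \ddots & \vdots\\
\psi & \psi & \cdots & \psi + \sigma^2
\end{pmatrix},$$

so every pair of responses within a subject has the same covariance ψ and the same correlation ψ/(ψ + σ²) — the compound-symmetry (exchangeable) structure.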


ML estimates of β and the covariance parameters are obtained by maximizing the likelihood function

$$L(\beta, \psi, \sigma^2) = (2\pi)^{-N/2} \prod_i |W_i|^{1/2} \exp\left\{ -\frac{1}{2} \sum_i (y_i - X_i\beta)^T W_i (y_i - X_i\beta) \right\},$$

where $W_i = (Z_i \psi Z_i^T + \sigma^2 V_i)^{-1}$ and $N = \sum_i n_i$. Given the covariance parameters, L is maximized at the GLS estimate

$$\hat\beta = \left( \sum_{i=1}^m X_i^T W_i X_i \right)^{-1} \left( \sum_{i=1}^m X_i^T W_i y_i \right).$$

Maximizing a modified version of L (a version with β removed) produces restricted maximum likelihood (RML) estimates for the covariance parameters. In large samples, RML and ML are essentially identical; in moderate-sized samples, RML may be somewhat better. A great reference on fitting linear mixed models with PROC MIXED is the text by Fitzmaurice, Laird and Ware (2004).
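For reference, one standard way to write the restricted-likelihood objective (a supplement, not part of the original notes): it equals the profile log-likelihood at the GLS estimate minus a determinant penalty,

$$\ell_R(\psi, \sigma^2) = \ell\bigl(\hat\beta,\ \psi,\ \sigma^2\bigr) - \frac{1}{2}\log\left|\sum_{i=1}^m X_i^T W_i X_i\right| + \text{const},$$

where $\hat\beta$ is the GLS estimate above. The penalty accounts for the degrees of freedom used in estimating β, which is the source of RML's reduced small-sample bias in the variance components.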

Example. Recall the data from the psychiatric trial in the last lecture.

Plot of average responses in the placebo and drug groups versus square-root of week:


The important covariates are group (0=placebo, 1=drug) and week. Notice that group is a characteristic of the subject and does not change over time, whereas the value of week for any given subject does change over time. Any covariate whose value changes over time for a subject is called

time-varying. The most common example of a time-varying covariate is time! Suppose we fit a model in which the columns of Xi are a constant, group, √week, and group × √week. This will allow for a different average slope and intercept in each of the two groups. Moreover, if the columns of Zi are a constant and √week, then the slope and intercept for any given subject will randomly vary about the group average. Under this model, β = (β0, β1, β2, β3)^T, where

- β0 = avg. response at week 0 for placebo
- β1 = avg. at week 0 for drug − avg. at week 0 for placebo
- β2 = avg. slope for placebo
- β3 = avg. slope for drug − avg. slope for placebo

and bi = (bi0, bi1)^T, where

- bi0 = predicted response at week 0 for subject i − avg. at week 0 for subject i's group
- bi1 = slope for subject i − avg. slope for subject i's group

SAS code for fitting this model is shown below.
options nocenter nodate nonumber linesize=72;

data schiz;
   infile "d:\jls\stat504\lectures\schiz.dat";
   input id drug week y;
   sqrtweek = sqrt(week);
run;

proc mixed data=schiz method=ml;
   class id;
   model y = drug sqrtweek drug*sqrtweek / solution;
   random intercept sqrtweek / sub=id type=un;
run;

The solution option in the model statement tells SAS to print out a table of estimates and SEs for β. The random statement tells SAS which random effects to include. Notice that an intercept must be explicitly declared. The option type=un tells SAS to allow the ψ-matrix to be unstructured (what we usually want).
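To make the design concrete, here is a sketch (not written out in the original notes) of the row of Xi and Zi corresponding to occasion j for subject i, implied by the model and random statements above:

$$X_i[j,\cdot] = \bigl(1,\ \text{drug}_i,\ \sqrt{\text{week}_{ij}},\ \text{drug}_i\sqrt{\text{week}_{ij}}\bigr), \qquad Z_i[j,\cdot] = \bigl(1,\ \sqrt{\text{week}_{ij}}\bigr).$$

As noted earlier, Zi is a subset of the columns of Xi.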


The Mixed Procedure

            Covariance Parameter Estimates

            Cov Parm     Subject     Estimate
            UN(1,1)      id            0.3468
            UN(2,1)      id           0.03517
            UN(2,2)      id            0.2293
            Residual                   0.5916

                     Fit Statistics
            -2 Log Likelihood             4352.4
            AIC (smaller is better)       4368.4
            AICC (smaller is better)      4368.5
            BIC (smaller is better)       4400.6

            Null Model Likelihood Ratio Test
              DF    Chi-Square    Pr > ChiSq
               3        493.44        <.0001

                     Solution for Fixed Effects

                                Standard
Effect            Estimate       Error     DF    t Value    Pr > |t|
Intercept           5.3617     0.08999    411      59.58      <.0001
drug               0.04561      0.1034    677       0.44      0.6592
sqrtweek           -0.3232     0.07072    408      -4.57      <.0001
drug*sqrtweek      -0.6493     0.08044    677      -8.07      <.0001

The estimated covariance parameters are

$$\hat\psi = \begin{pmatrix} 0.3468 & 0.0352 \\ 0.0352 & 0.2293 \end{pmatrix}, \qquad \hat\sigma^2 = 0.5916.$$

The estimated coefficients are interpreted as population-average effects. The coefficient of primary interest, drug*sqrtweek, is negative and highly significant. Comparing the results for β to those from GEE in the last lecture, we see that they are quite similar.

Do we really need random slopes? No random slopes is equivalent to setting

$$\psi = \begin{pmatrix} \psi_{00} & \psi_{01} \\ \psi_{10} & \psi_{11} \end{pmatrix} = \begin{pmatrix} \psi_{00} & 0 \\ 0 & 0 \end{pmatrix},$$

which has two fewer parameters than the current model. Fitting the reduced model produces −2 loglik = 4520.3, so twice the log-likelihood difference is 4520.3 − 4352.4 = 167.9. For technical reasons, this is not a standard likelihood-ratio test, but we can get an approximate p-value from a 50:50 mixture of χ²₁ and χ²₂ distributions (Stram and Lee, 1994; Olsen and Schafer, 2001); here the tail area is essentially zero,

$$\tfrac{1}{2}\, P(\chi^2_2 \ge 167.9) \approx 0.$$

This is strong evidence that subjects' slopes vary.
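For completeness, here is a plausible sketch of the reduced (random-intercepts-only) fit referred to above; the original notes do not show this code, but it follows the same pattern as the full model:

proc mixed data=schiz method=ml;
   class id;
   model y = drug sqrtweek drug*sqrtweek / solution;
   random intercept / sub=id;   /* drop the random slope for sqrtweek */
run;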

Nonlinear mixed-effect models. What can we do for longitudinal responses that are clearly non-normal and discrete? The natural extension is to allow the responses to come from an exponential-family distribution, and to connect the mean of this distribution to subject i's linear predictor Xi β + Zi bi by a link function. The resulting class is called:

- nonlinear mixed-effects models
- generalized linear mixed models
- generalized linear models with random effects

Like GLIMs, these models are regarded as linear or nonlinear, depending on your point of view. Let's look at the application of NLMEs to binary responses. The response is $y_i = (y_{i1}, y_{i2}, \ldots, y_{i,n_i})^T$, where

$$y_{ij} = \begin{cases} 1 & \text{if subject } i \text{ displayed the trait of interest at occasion } j, \\ 0 & \text{otherwise,} \end{cases}$$

$$y_{ij} \sim \text{Bernoulli}(\pi_{ij}), \qquad \eta_{ij} = \text{logit}(\pi_{ij}) = \log\!\left[\frac{\pi_{ij}}{1 - \pi_{ij}}\right],$$

$$\eta_i = (\eta_{i1}, \eta_{i2}, \ldots, \eta_{i,n_i})^T = X_i \beta + Z_i b_i, \qquad b_i \sim N(0, \psi).$$

With a binary-response model, it's difficult to have any random effects beyond a random intercept Zi = (1, 1, ..., 1)^T if lots of subjects have yi = (0, 0, ..., 0)^T or (1, 1, ..., 1)^T, because slopes for these subjects are poorly estimated. The likelihood function

$$L = \prod_i \int P(y_i \mid b_i)\, P(b_i)\, db_i$$

cannot be computed exactly; it can only be approximated by numerical techniques such as Gauss-Hermite quadrature.
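As a supplement (not in the original notes), here is the form the approximation takes for a single random intercept $b_i \sim N(0, \sigma_b^2)$, using Gauss-Hermite nodes $x_k$ and weights $w_k$:

$$\int P(y_i \mid b)\,\phi(b; 0, \sigma_b^2)\, db \;\approx\; \frac{1}{\sqrt{\pi}} \sum_{k=1}^{K} w_k\, P\bigl(y_i \mid b = \sqrt{2}\,\sigma_b x_k\bigr),$$

with accuracy improving as the number of nodes K grows. Adaptive quadrature, used by PROC NLMIXED, recenters and rescales the nodes around the mode of each subject's integrand.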

The algorithms for model fitting are considerably more complicated than for the normal linear mixed model. Early programs for NLMEs used penalized quasi-likelihood (PQL) (Breslow and Clayton, 1993). Unfortunately, PQL estimates can be badly biased for the binary model, so PQL is no longer recommended. More recent programs use true ML, which is much better. ML for the binary NLME has been implemented in:

- HLM
- MIXOR (designed for ordinal responses, but also works for binary)
- MLWin
- PROC NLMIXED

Bayesian inference is also possible by Markov chain Monte Carlo (MCMC) in MLWin and WinBUGS.

Some additional issues to consider about this model:

1. Elements of β estimate subject-specific (SS), but not population-average (PA), effects. SS and PA have different interpretations and are appropriate in different circumstances (Zeger, Liang and


Albert, 1988). The difference between SS and PA grows as between-subjects variation increases.

2. Assuming an underlying normal distribution for the random effects is often unrealistic; there may be a large group of yi = (0, 0, ..., 0)^T subjects for which πij ≈ 0 at all occasions (Carlin, Wolfe, Brown and Gelman, 2001).

Example. Here is a hypothetical dataset containing 5 repeated binary measurements for a sample of 25 subjects.


Subject   Week 0   Week 1   Week 2   Week 3   Week 4
   1         1        1        0        1        1
   2         1        1        1        1        1
   3         0        0        0        0        0
   4         0        0        0        0        0
   5         1        0        0        0        1
   6         0        1        1        1        0
   7         0        1        0        1        1
   8         1        0        1        1        1
   9         1        1        1        1        1
  10         0        0        0        1        1
  11         0        1        1        1        1
  12         1        0        0        1        1
  13         0        0        1        0        0
  14         0        0        0        0        1
  15         1        1        1        1        1
  16         1        1        1        1        1
  17         0        0        0        1        1
  18         1        1        0        1        1
  19         0        1        1        0        1
  20         0        0        0        1        0
  21         0        1        1        1        1
  22         0        0        1        1        1
  23         0        1        0        0        1
  24         0        1        0        0        0
  25         0        1        0        0        1

Plot of the sample proportion of successes by week:

I created these data by supposing that the response for subject i at week j is Bernoulli with

$$\log\!\left[\frac{\pi_{ij}}{1 - \pi_{ij}}\right] = \alpha_i + \beta\, \text{week}_j,$$

where β = 0.5 and αi ~ N(−1, 2). That is, each subject's logit-probabilities of success follow a linear trajectory with a random intercept and fixed slope of 0.5; the subjects' intercepts are randomly distributed about −1.0 with variance 2.0.
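A hypothetical sketch of how such data could be generated in a SAS data step (the original notes do not show the simulation code; the seed here is arbitrary and the variable names are made up):

data fake;
   call streaminit(2023);                    /* arbitrary seed */
   do id = 1 to 25;
      alpha = rand('normal', -1, sqrt(2));   /* random intercept, N(-1, 2) */
      do week = 0 to 4;
         eta = alpha + 0.5*week;             /* subject's logit at this week */
         p = exp(eta) / (1 + exp(eta));
         y = rand('bernoulli', p);
         output;
      end;
   end;
   keep id week y;
run;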

In the notation of a nonlinear mixed-effects model, the response vector for subject i, $y_i = (y_{i1}, y_{i2}, y_{i3}, y_{i4}, y_{i5})^T$, has mean $\mu_i = (\mu_{i1}, \mu_{i2}, \mu_{i3}, \mu_{i4}, \mu_{i5})^T$ (in this case, the μij's are actually πij's). The logit of this vector is

$$\eta_i = \text{logit}(\mu_i) = X_i \beta + Z_i b_i,$$

where

$$X_i = \begin{pmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \\ 1 & 3 \\ 1 & 4 \end{pmatrix}, \qquad
Z_i = \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{pmatrix}, \qquad
\beta = \begin{pmatrix} -1.0 \\ 0.5 \end{pmatrix}, \qquad
b_i \sim N(0, 2).$$

The interpretation of the elements of β is a bit subtle. The intercept β0 is an average log-odds of success at week 0 (average means averaged across subjects). But exp(β0) is not the average odds of success at

week 0 in the population, because the average of the log is not the log of the average. As long as the coefficients vary from one subject to another, the average of the log won't be the log of the average. For this reason, the elements of β are subject-specific (SS) rather than population-average (PA) effects. Let's fit this model in PROC NLMIXED. First, we arrange the data in a file called fake.dat.
 1 0 1
 1 1 1
 1 2 0
 1 3 1
 1 4 1
 2 0 1
 2 1 1
 2 2 1
 2 3 1
 2 4 1
 ---lines omitted---
25 0 0
25 1 1
25 2 0
25 3 0
25 4 1

Here's the SAS code.


options nocenter nodate nonumber linesize=72;

data fake;
   infile "d:\jls\stat504\lectures\fake.dat";
   input id week y;
run;

proc nlmixed data=fake;

   parms beta0=-1.0 beta1=0.5 varb0=2.0;
   eta = beta0 + beta1*week + b0;
   mu = exp(eta) / ( 1 + exp(eta) );
   model y ~ binary(mu);
   random b0 ~ normal(0, varb0) subject=id;
run;

The parms statement gives initial values for the parameters to start the iterative fitting procedure. NLMIXED requires good starting values! If you don't give a parms statement, all the starting values are set to 1.0, which can be very bad. In this example, I used the true values for the parameters as the starting values. The next four lines after parms define the model. The types of models that can be fit include normal (better to use PROC MIXED if you have an identity link), binary, binomial, Poisson, gamma, and negative binomial. In principle, it's possible to include random slopes as well as random intercepts in any of these models, but (particularly with binary data) the data often provide little information to estimate random slopes.
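For illustration only (this variant is not in the original notes), a random slope could be added as follows; the starting values for the extra covariance parameters are guesses, and with binary data this model is often weakly identified, as noted above:

proc nlmixed data=fake;
   parms beta0=-1.0 beta1=0.5 v0=2.0 c01=0 v1=0.5;   /* v0, c01, v1: guessed starting values */
   eta = (beta0 + b0) + (beta1 + b1)*week;            /* random intercept and random slope */
   mu  = exp(eta) / ( 1 + exp(eta) );
   model y ~ binary(mu);
   random b0 b1 ~ normal([0,0], [v0, c01, v1]) subject=id;
run;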
The NLMIXED Procedure

                     Specifications
Data Set                                  WORK.FAKE
Dependent Variable                        y

Distribution for Dependent Variable       Binary
Random Effects                            b0
Distribution for Random Effects           Normal
Subject Variable                          id
Optimization Technique                    Dual Quasi-Newton
Integration Method                        Adaptive Gaussian Quadrature

                      Dimensions
Observations Used                    125
Observations Not Used                  0
Total Observations                   125
Subjects                              25
Max Obs Per Subject                    5
Parameters                             3
Quadrature Points                      5

                      Parameters
   beta0      beta1      varb0      NegLogLike
    -1.0        0.5        2.0       98.484698

                     Iteration History
Iter   Calls    NegLogLike        Diff     MaxGrad       Slope
   1       2    83.6783055    14.80639    10.42582    -244.053
   2       4    76.6855886    6.992717    8.149659    -7.83795
   3       5    76.1614056    0.524183    1.917327    -0.84272
   4       6    76.127471     0.033935    0.095085    -0.06761
   5       8    76.1239194    0.003552    0.1152      -0.00098
   6       9    76.1206696    0.00325     0.030752    -0.00304
   7      11    76.1203964    0.000273    0.002024    -0.00057
   8      13    76.1203953    1.089E-6    0.00009     -2.17E-6
   9      15    76.1203953    5.15E-10    4.273E-6    -1.1E-9

NOTE: GCONV convergence criterion satisfied.

                     Fit Statistics
-2 Log Likelihood                        152.2
AIC (smaller is better)                  158.2
AICC (smaller is better)                 158.4
BIC (smaller is better)                  161.9

                              Parameter Estimates

                      Standard
Parameter   Estimate     Error   DF   t Value   Pr > |t|   Alpha     Lower     Upper    Gradient
beta0        -0.7030    0.4762   24     -1.48     0.1529    0.05   -1.6858    0.2798    3.255E-6
beta1         0.5047    0.1670   24      3.02     0.0059    0.05    0.1601    0.8493    -4.27E-6
varb0         1.9875    1.2021   24      1.65     0.1113    0.05   -0.4936    4.4686    9.529E-7

The likelihood function, which cannot be computed exactly, is being approximated by five-point adaptive Gaussian quadrature. The derivatives (gradient) of the log of this approximate likelihood are computed, and the algorithm tries to find the location where the gradient vector is zero. In practice, five-point quadrature might not be accurate enough; you can increase the number of quadrature points to get a better approximation, but the fitting procedure takes longer. If you increase it to 6, 7, 8, etc. and the results don't change much, then you have evidence that the approximation is working well.
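A minimal sketch of that check (not shown in the original notes), using the QPOINTS= option to raise the number of quadrature points:

proc nlmixed data=fake qpoints=10;   /* re-fit with 10 quadrature points and compare estimates */
   parms beta0=-1.0 beta1=0.5 varb0=2.0;
   eta = beta0 + beta1*week + b0;
   mu = exp(eta) / ( 1 + exp(eta) );
   model y ~ binary(mu);
   random b0 ~ normal(0, varb0) subject=id;
run;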

In this example, the parameter estimates are reasonably close to the true population parameter values. Are random intercepts needed? Notice that the parameter varb0 has a standard error, and an approximate 95% confidence interval based on this SE goes into the negative range, which makes no sense. We cannot use a confidence interval or the Wald method to test the null hypothesis of no random intercepts, because the null value for this parameter lies on the boundary of the parameter space. A reasonable way to test this hypothesis is to fit the model with and without the random statement, compare the difference in −2 log-likelihood to a chi-square distribution, and then cut the p-value in half.
proc nlmixed data=fake;
   parms beta0=-1.0 beta1=0.5;
   eta = beta0 + beta1*week;
   mu = exp(eta) / ( 1 + exp(eta) );
   model y ~ binary(mu);
run;

The result is
                     Fit Statistics
-2 Log Likelihood                        164.0
AIC (smaller is better)                  168.0
AICC (smaller is better)                 168.1
BIC (smaller is better)                  173.6

Adding the random intercept dropped −2 log-likelihood by 164.0 − 152.2 = 11.8, so the approximate p-value is

$$\tfrac{1}{2}\, P(\chi^2_1 \ge 11.8) \approx 0.$$

Let's also fit a binary logit model using GEE and see what we get. The GEE version is a marginal model; it says that logit(πi) = Xi β and treats the within-subject covariances as a nuisance.
data fake;
   infile "d:\jls\stat504\lectures\fake.dat";
   input id week y;
run;

proc genmod data=fake descending;
   class id;
   model y = week / dist=binom link=logit;
   repeated subject=id / type=exch;
run;

Notice the use of the descending option, which tells SAS to express the model in terms of the logit of P (yij = 1) rather than P (yij = 0). I used an exchangeable structure, which loosely corresponds to a model of random intercepts.
                GEE Model Information
Correlation Structure                  Exchangeable
Subject Effect                         id (25 levels)
Number of Clusters                     25
Correlation Matrix Dimension           5
Maximum Cluster Size                   5
Minimum Cluster Size                   5

Algorithm converged.

            Analysis Of GEE Parameter Estimates
             Empirical Standard Error Estimates

                       Standard       95% Confidence
Parameter   Estimate      Error           Limits              Z    Pr > |Z|
Intercept    -0.5206     0.3175    -1.1428     0.1016     -1.64      0.1010
week          0.3694     0.1003     0.1728     0.5661      3.68      0.0002

Notice that the estimated slope and intercept are smaller in magnitude than those from the nonlinear mixed-effects model. These are population-average (PA) rather than subject-specific (SS) effects. In this case, exp(−0.5206) can be interpreted as the overall odds of success at week 0 in the population.
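As a supplementary note (not part of the original lecture), the shrinkage seen here matches a well-known approximation relating SS and PA coefficients in a random-intercept logit model (Zeger, Liang and Albert, 1988):

$$\beta_{\text{PA}} \;\approx\; \frac{\beta_{\text{SS}}}{\sqrt{1 + c^2\,\sigma_b^2}}, \qquad c = \frac{16\sqrt{3}}{15\pi} \approx 0.588.$$

Plugging in the NLMIXED estimates gives $0.5047 / \sqrt{1 + 0.346 \times 1.9875} \approx 0.39$ for the slope, roughly in line with the GEE estimate of 0.3694.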
