Biometrika (1987), 74, 2, pp. 221-32
Printed in Great Britain
An extended quasi-likelihood function
J. A. NELDER
AND D. PREGIBON
Statistics and Data Analysis Research Department, AT&T Bell Laboratories, Murray Hill,
New Jersey 07974, U.S.A.
SUMMARY
Wedderburn's original definition of quasi-likelihood for generalized linear models is
extended to allow the comparison of variance functions as well as those of linear predictors
and link functions. The relationship between generalized linear models and the use of
transformations of the response variable is explored, and the ideas are illustrated by
three examples.
Some key words: Data transformation; Exponential family; Saddlepoint approximation; Second-moment
assumptions; Variance function.
1. INTRODUCTION
Wedderburn's (1974) introduction of quasi-likelihood greatly widened the scope of
generalized linear models by allowing the full distributional assumption about the random
component in the model to be replaced by a much weaker assumption in which only the
first and second moments were defined. In making this extension Wedderburn widened
the scope of generalized linear models in a way very similar to that of Gauss when he
replaced the assumption of normality in classical linear models by that of equal variance.
For generalized linear models with distributions in the exponential family, likelihood
ratio and score tests are used for testing hypotheses concerning nested subsets of covariates
in the linear predictor and for assessing hypothesized link functions. These methods are
also applicable with Wedderburn's form of quasi-likelihood. However neither of these
methods is suitable for the comparison of different variance functions. In this paper we
introduce an extended quasi-likelihood function which allows for the comparison of
various forms of all the components of a generalized linear model, i.e. the linear predictor,
the link function, and the variance function. We then apply the ideas to the analysis of
several sets of data.
2. QUASI-LIKELIHOOD
2.1. General
In this section we introduce the original and extended quasi-likelihood functions, and
describe their properties and limitations. More comprehensive treatments of some of the
material are given by Wedderburn (1974), McCullagh (1983) and McCullagh & Nelder
(1983, Ch. 8).
222 J. A. NELDER AND D. PREGIBON
2.2. Wedderburn's original form
Wedderburn (1974) defined the quasi-likelihood, strictly the quasi-log-likelihood, $Q$
for an observation $y$ with mean $\mu$ and variance $V(\mu)$ by the equation
$$Q(y;\mu) = \int_y^{\mu} \frac{y-u}{V(u)}\,du. \qquad (1)$$
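As an illustration of definition (1), the integral can be evaluated numerically for any variance function and compared with a closed form. The sketch below is not part of the original paper: the function names and the use of Simpson's rule are assumptions made for illustration, and the closed form shown is the familiar Poisson-type case $V(u) = u$.

```python
import math

def quasi_likelihood(y, mu, variance, steps=2000):
    """Approximate Q(y; mu) = integral from y to mu of (y - u) / V(u) du
    with the composite Simpson rule (steps must be even)."""
    if y == mu:
        return 0.0
    h = (mu - y) / steps  # signed step, so mu < y also works

    def integrand(u):
        return (y - u) / variance(u)

    total = integrand(y) + integrand(mu)
    for k in range(1, steps):
        weight = 4 if k % 2 else 2
        total += weight * integrand(y + k * h)
    return total * h / 3.0

def poisson_quasi_likelihood(y, mu):
    """Closed form of (1) for V(u) = u and y > 0: y*log(mu/y) - (mu - y)."""
    return y * math.log(mu / y) - (mu - y)

# Illustrative values only (not data from the paper).
q_numeric = quasi_likelihood(3.0, 5.0, lambda u: u)
q_exact = poisson_quasi_likelihood(3.0, 5.0)
q_constant = quasi_likelihood(3.0, 5.0, lambda u: 1.0)  # equals -(y - mu)**2 / 2
```

With a constant variance function the same routine returns $-(y-\mu)^2/2$, the least-squares case.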
Now consider $n$ independent observations $(y_i, x_i;\ i = 1, \ldots, n)$, where $y_i$ is the $i$th
response variable with mean $\mu_i$ and variance $V(\mu_i)$ and $x_i$ is an associated vector of $q$
covariates. We suppose that the relationship between $\mu_i$ and the covariate vector $x_i$ is
given by the equation
$$k! \simeq (2\pi k)^{1/2} k^k e^{-k}. \qquad (8)$$
Though the form (7) may look unfamiliar, $\exp(Q^+)$ is in fact the unnormalized
saddlepoint approximation for exponential families discussed by Barndorff-Nielsen &
Cox (1979). However, these authors discuss its use in connection with the asymptotic
distribution of maximum likelihood estimates, whereas we apply it to single observations
with generalized linear model structure.
A distribution can be formed from an extended quasi-likelihood by normalizing
$\exp(Q^+)$ with a suitable factor to make the sum or integral equal to unity. While the
normalizing factor is usually a function of $\mu$, $\phi$ and $\theta$, see § 3.2, it can happen that its
value varies only slightly for large changes in certain of these parameters. Then it can
be argued that the normalizing factor contains almost no information about these
particular parameters, so that little is lost by optimizing the unnormalized extended
quasi-likelihood for estimation purposes. Our use of the unnormalized extended
quasi-likelihood function therefore has a certain 'partial likelihood' flavour, though we have
not explored the connection in depth.
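The link between Stirling's approximation (8) and the unnormalized $\exp(Q^+)$ can be checked numerically in the Poisson case. The sketch below assumes that the form (7), which is not reproduced in this text, is $Q^+ = -\tfrac{1}{2}\log\{2\pi\phi V(y)\} - \tfrac{1}{2}D(y;\mu)/\phi$ with $V(y) = y$, $\phi = 1$ and the Poisson deviance; the function names are hypothetical and chosen for illustration. Under that assumption $\exp(Q^+)$ is the Poisson probability with $y!$ replaced by its Stirling approximation, and the normalizing sum over $y \geq 1$ can be inspected for different values of $\mu$.

```python
import math

def stirling(k):
    """Stirling's approximation (8) to k!, for integer k >= 1."""
    return math.sqrt(2.0 * math.pi * k) * k ** k * math.exp(-k)

def poisson_pmf(y, mu):
    """Exact Poisson probability of y for mean mu."""
    return math.exp(-mu) * mu ** y / math.factorial(y)

def exp_q_plus_poisson(y, mu, phi=1.0):
    """exp(Q+) under the assumed form
    Q+ = -0.5*log(2*pi*phi*V(y)) - D(y; mu) / (2*phi), with V(y) = y.
    Defined for y >= 1 only, since V(0) = 0."""
    deviance = 2.0 * (y * math.log(y / mu) - (y - mu))
    return math.exp(-0.5 * deviance / phi) / math.sqrt(2.0 * math.pi * phi * y)

def normalizing_sum(mu, y_max=400):
    """Sum of the unnormalized exp(Q+) over y = 1, ..., y_max."""
    return sum(exp_q_plus_poisson(y, mu) for y in range(1, y_max + 1))

# Illustrative means only: how much does the normalizing sum move with mu?
sums = {mu: normalizing_sum(mu) for mu in (2.0, 5.0, 20.0)}
```

Inspecting `sums` for a range of means gives a direct feel for how slowly the normalizing factor varies in this case.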
3. NONLINEAR PARAMETERS AND THE EXTENDED QUASI-LIKELIHOOD FUNCTION
3.1. General
By 'nonlinear' parameters we mean those parameters not in the linear predictor. In
this section we illustrate how the extended quasi-likelihood function can be used to
estimate nonlinear parameters affecting the variability of the response. See Jorgensen
(1987) and Efron (1986) for related work in this area.
Most common values of $\theta$ are the values 0, 1, 2, 3, which correspond to variance functions
associated with normal, Poisson, gamma, and inverse Gaussian distributions respectively.
Tweedie (1981) discusses distributions with this variance function, and shows that an
exponential family exists for $\theta = 0$, and $\theta \geq 1$. See also Jorgensen (1987).
For the variance function family (10), the deviance function is
$$D_\theta(y;\mu) = \begin{cases}
2\{y \log(y/\mu) - (y - \mu)\} & (\theta = 1), \\
2\{y/\mu - \log(y/\mu) - 1\} & (\theta = 2), \\
\dfrac{2\{y^{2-\theta} - (2-\theta)\,y\,\mu^{1-\theta} + (1-\theta)\,\mu^{2-\theta}\}}{(1-\theta)(2-\theta)} & \text{otherwise.}
\end{cases}$$
For fixed $\theta$, the maximum quasi-likelihood estimate of $\beta$ is easily obtained using standard
techniques, such as those provided by GLIM.
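The three branches of the deviance function can be coded directly. The following is a minimal sketch, assuming the power variance function $V_\theta(\mu) = \mu^\theta$ and strictly positive $y$ and $\mu$; the function name is illustrative.

```python
import math

def power_deviance(y, mu, theta):
    """Unit deviance D_theta(y; mu) for V(mu) = mu**theta, with y, mu > 0."""
    if theta == 1:
        return 2.0 * (y * math.log(y / mu) - (y - mu))
    if theta == 2:
        return 2.0 * (y / mu - math.log(y / mu) - 1.0)
    numerator = (y ** (2 - theta)
                 - (2 - theta) * y * mu ** (1 - theta)
                 + (1 - theta) * mu ** (2 - theta))
    return 2.0 * numerator / ((1 - theta) * (2 - theta))

# Illustrative values only.
d_normal = power_deviance(3.0, 5.0, 0)            # (y - mu)**2
d_inverse_gaussian = power_deviance(3.0, 5.0, 3)  # (y - mu)**2 / (mu**2 * y)
```

The general branch tends to the two special cases as $\theta \to 1$ and $\theta \to 2$, which offers a simple check on the coding.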
is greater than some pre-assigned value d is the profile quasi-likelihood interval. Thus d
may be defined by some percentage point of a $\chi^2$ distribution.
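A profile interval of this kind can be read off a grid of profile values. The sketch below assumes the usual likelihood-ratio calibration, in which the interval contains every $\theta$ whose profile value lies within half the $\chi^2_1$ percentage point of the maximum; that calibration, the function names and the toy quadratic profile are assumptions for illustration and are not taken from the paper's data.

```python
from statistics import NormalDist

def chi2_1_quantile(level=0.95):
    """Quantile of chi-squared on 1 d.f., as the square of a normal quantile."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return z * z

def profile_interval(thetas, profile_values, level=0.95):
    """Smallest and largest grid value of theta whose profile value is within
    0.5 * chi2_1(level) of the maximum (assumes the accepted set is connected)."""
    cutoff = max(profile_values) - 0.5 * chi2_1_quantile(level)
    inside = [t for t, q in zip(thetas, profile_values) if q >= cutoff]
    return min(inside), max(inside)

# Toy profile (made up): a quadratic with its maximum at theta = 2.
thetas = [0.1 * i for i in range(41)]
profile = [-(t - 2.0) ** 2 for t in thetas]
lower, upper = profile_interval(thetas, profile)
```

On a real problem each profile value would come from refitting the linear predictor with $\theta$ held fixed.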
When $Q^+$ corresponds to an exponential family distribution, asymptotic justification
of profile quasi-likelihood intervals is available (Jorgensen, 1983; Efron, 1986). Given
assumptions about the first two moments only of the error distribution, we recognize that
properties of such intervals remain to be justified. We have attempted a justification for
particular data sets, e.g. Example 1 in § 6, by use of resampling schemes to derive
'confidence' intervals for these parameters. The general method is as follows:
(i) compute the standardized Pearson residuals $e_i = (y_i - \hat\mu_i)/\{\hat\phi V_{\hat\theta}(\hat\mu_i)\}^{1/2}$, where $\hat\mu$
and $\hat\phi$ are evaluated at $\hat\theta$;
(ii) resample the $e_i$ to form $e_i^*$, and construct the pseudo-response variable
$y_i^* = \hat\mu_i + \{\hat\phi V_{\hat\theta}(\hat\mu_i)\}^{1/2} e_i^*$;
(iii) using $y^*$ in place of $y$, find the maximum quasi-likelihood estimate of $\theta$;
(iv) repeat (ii) and (iii) $M$ times to generate a resampling distribution for $\hat\theta$.
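Steps (i) to (iv) can be sketched as follows. This is an illustrative outline rather than the authors' implementation: it assumes the power variance function $V_\theta(\mu) = \mu^\theta$, uses sampling with replacement, and takes the refitting step as a hypothetical user-supplied callable `fit_theta` that maps a pseudo-response to an estimate of $\theta$.

```python
import random

def variance_power(mu, theta):
    """Assumed variance function V_theta(mu) = mu**theta."""
    return mu ** theta

def pearson_residuals(y, mu, phi, theta):
    """Step (i): e_i = (y_i - mu_i) / sqrt(phi * V_theta(mu_i))."""
    return [(yi - mi) / (phi * variance_power(mi, theta)) ** 0.5
            for yi, mi in zip(y, mu)]

def pseudo_response(mu, phi, theta, residuals, rng):
    """Step (ii): resample residuals with replacement and rebuild y*."""
    e_star = rng.choices(residuals, k=len(residuals))
    return [mi + (phi * variance_power(mi, theta)) ** 0.5 * ei
            for mi, ei in zip(mu, e_star)]

def resampling_distribution(mu, phi, theta, residuals, fit_theta, M, seed=0):
    """Steps (iii) and (iv): refit theta on each of M pseudo-responses.
    fit_theta is a hypothetical callable: y_star -> estimate of theta."""
    rng = random.Random(seed)
    return [fit_theta(pseudo_response(mu, phi, theta, residuals, rng))
            for _ in range(M)]
```

Note that a pseudo-response built in this way is not guaranteed to respect range restrictions on $y$, such as positivity, so a practical implementation would need to decide how to handle such cases.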
Possible resampling schemes at stage (ii) include (a) sampling $e_i$ without replacement,
(b) sampling $e_i$ with replacement (Efron, 1979), (c) random sampling from a parametric
family of distributions for $e$, with its parameter being possibly estimated from the data,
and (d) random sampling from $\hat f(e)$, where $\hat f$ is a nonparametric density estimate for $e$.
Given the resampling distribution of $\hat\theta$, there are also several possible ways of forming
an interval for $\theta$. We adopt the 'quantile' method whereby the quantiles of the distribution
of $\hat\theta_m - \hat\theta$ are used to estimate the quantiles of the distribution of $\hat\theta - \theta$. Let $a_L$ and $a_U$
denote the lower and upper symmetric percentage points of the resampling distribution
of $\hat\theta_m - \hat\theta$. The interval estimate $(\theta_L, \theta_R)$ of $\theta$ is $(\hat\theta - a_U, \hat\theta - a_L)$.
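The 'quantile' method amounts to reflecting the quantiles of the resampled differences about $\hat\theta$. A minimal sketch follows; the linear-interpolation rule used for the empirical quantiles and the function names are assumptions for illustration, and the numbers are made up.

```python
def empirical_quantile(sorted_values, p):
    """Quantile by linear interpolation between order statistics (0 <= p <= 1)."""
    position = p * (len(sorted_values) - 1)
    low = int(position)
    high = min(low + 1, len(sorted_values) - 1)
    fraction = position - low
    return sorted_values[low] * (1.0 - fraction) + sorted_values[high] * fraction

def quantile_interval(theta_hat, theta_resampled, level=0.95):
    """Interval (theta_hat - a_U, theta_hat - a_L), where a_L and a_U are the
    lower and upper quantiles of the differences theta_m - theta_hat."""
    differences = sorted(t - theta_hat for t in theta_resampled)
    alpha = (1.0 - level) / 2.0
    a_lower = empirical_quantile(differences, alpha)
    a_upper = empirical_quantile(differences, 1.0 - alpha)
    return theta_hat - a_upper, theta_hat - a_lower

# Made-up resampled estimates, skewed upwards relative to theta_hat = 2.5.
interval = quantile_interval(2.5, [2.0, 2.5, 3.5], level=1.0)
```

Because of the reflection, resampled estimates that are skewed upwards produce an interval that extends further below $\hat\theta$ than above it.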
For the discrete distributions mentioned above, the use of the modified Stirling's
approximation (12) yields the results displayed in Table 1, with $V(y; c)$ replacing $V(y)$
in (7). Note that the maximum quasi-likelihood estimates of $\beta$ and $\phi$ are unchanged if
$V(y)$ is replaced by $V(y; c)$; however, the use of $V(y; c)$ allows $Q^+$ to be defined for
all sample sets and will be important if $V(\cdot)$ itself contains unknown parameters.
$$f(y;\mu) \propto V^{-1/2}(y)\exp\left\{\frac{1}{\phi}\int_y^{\mu}\frac{y-u}{V(u)}\,du\right\}.$$
Since to first order, $E\{g(y)\} \simeq g(\mu)$ and $\mathrm{var}(y) \propto 1/\{g'(\mu)\}^2$, quasi-likelihood and
data-transformation models are first-order equivalent. The term $V^{-1/2}(y)$ in the quasi-likelihood
formulation is a Jacobian-like adjustment for scale changes. The major conceptual
difference between the models is that squared-loss on the transformed scale is replaced
by an appropriately chosen deviance function. In practice this difference is often unimportant
unless a grossly inadequate data-transformation, or quasi-likelihood model, is chosen.
An interesting feature of the subset of quasi-likelihood models which intersect
data-transformation models is that the estimated asymptotic covariance matrix of $\hat\beta$ is
proportional to $(X^TX)^{-1}$. This has implications for both fitting, where the Hessian need not be
updated at each iteration, and for inferences, where in balanced designs, the components
of deviance are asymptotically independent.
Response-variable transformation has several disadvantages compared to the method
of quasi-likelihood:
(i) range restrictions on $g(y)$ following from $y$, including discreteness, technically
imply that a normal approximation for $g(y)$ is not appropriate;
(ii) the assessment of the variance function of the response based on decomposing
the normal log likelihood requires replicate observations; and
(iii) a single common scale for linearity and homogeneity of variance is required.
While the first of these is not important in practice, the other two are. The fact that the
extended quasi-likelihood model allows variance-function assessment without replication
is especially attractive. The third point is also important though cases do occur when a
single transformation achieves both linearity and homogeneity. However, note that the
quasi-likelihood model allows one to assess whether a common transformation does
achieve these dual aims. For the family of power transformations, this is obtained by
considering the general quasi-likelihood model $\mu^{\psi} = x^T\beta$, $V_\theta(\mu) = \mu^\theta$ and testing the
hypothesis $H: \theta = 2(1 - \psi)$. An example is given below.
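The relation $\theta = 2(1 - \psi)$ follows from the first-order argument: if $g(y) = y^\psi$ has approximately constant variance, then $\mathrm{var}(y) \propto 1/\{g'(\mu)\}^2 \propto \mu^{2(1-\psi)}$. The sketch below recovers the implied variance power numerically as a log-log slope; the function name and the two evaluation points are arbitrary choices for illustration, with $\psi = 0$ taken to mean the log transformation.

```python
import math

def implied_variance_power(psi, mu_low=1.0, mu_high=10.0):
    """Power theta such that 1 / g'(mu)**2 is proportional to mu**theta,
    where g(y) = y**psi (or log y when psi == 0)."""
    def g_prime(mu):
        return 1.0 / mu if psi == 0 else psi * mu ** (psi - 1.0)

    v_low = 1.0 / g_prime(mu_low) ** 2
    v_high = 1.0 / g_prime(mu_high) ** 2
    return math.log(v_high / v_low) / math.log(mu_high / mu_low)

# Log, square-root and reciprocal transformations.
powers = {psi: implied_variance_power(psi) for psi in (0.0, 0.5, -1.0)}
```

The log transformation pairs with $\theta = 2$ and the square-root transformation with $\theta = 1$, as expected.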
Ultimately, the determination of which loss function is appropriate for estimation of
$\beta$ depends on the purpose of the analysis and the type of response variable under
consideration. Box & Cox (1964) distinguish between extensive and nonextensive
variables. The former class has the property of physical additivity, where response-variable
transformation would not be appropriate. Nonextensive variables can usefully be
modelled on any scale unless there is some prior preference for a particular one. Thus, for
nonextensive variables, the choice between data-transformation and quasi-likelihood
models is seldom made on statistical grounds but rather on consideration of subject matter.
6. EXAMPLES
Several examples are presented in this section to illustrate the flexibility of the extended
quasi-likelihood model in statistical analysis. The presentation is necessarily incomplete,
as features not relevant to the present discussion are ignored. In practice, one would
supplement the analyses with further model checking, including residual and other
diagnostic plots.
Example 1: Textile data. This example concerns the behaviour of worsted yarn under
cycles of repeated loading (Box & Cox, 1964). The response variable $y$ is the number of
cycles to failure, resulting from a single replicate of a $3^3$ experiment with factors $x_1$:
length of test specimen, $x_2$: amplitude of loading cycle, and $x_3$: load. Box & Cox
recommend a log transform of the response variable, both to enhance additivity of effects
and increase sensitivity of the analysis. Their methods for examining the question of
variance homogeneity are not applicable since there is no replication at each design
point. We now consider how quasi-likelihood methods can help.
Consider the quasi-likelihood model
$$\log(\mu) = x_1\beta_1 + x_2\beta_2 + x_3\beta_3, \qquad V_\theta(\mu) = \mu^\theta.$$
The plot of the profile quasi-likelihood function $Q^+(y; \mu)$ is displayed in Fig. 1. A 95%
likelihood-type interval for $\theta$ is (1.75, 3.35). In contrast, a 95% bootstrap-type interval
for $\theta$ is (1.55, 3.42), this being based on 100 samples generated according to the
prescription in § 3.4. The two intervals are in general agreement though the latter is more
[Fig. 1 plot: profile quasi-likelihood (vertical axis, -130 to -122) against $\theta$ (horizontal axis, 1.0 to 4.0).]
Fig. 1. Profile quasi-likelihood function of the variance-function parameter $\theta$ for Example 1. The maximizing
value is $\hat\theta = 2.5$. Dashed line indicates a 95% likelihood-type interval for $\theta$; shaded bar, a 95% bootstrap-type
interval for $\theta$. A variance function with power $\theta$ between 1.7 and 3.4 is supported by the data.
An extended quasi-likelihood function 229
conservative, i.e. wider. Both intervals indicate that $\theta = 2$ is plausible given the data. As
$\mathrm{var}(y) \propto \mu^2$ implies $\mathrm{var}\{\log(y)\} \simeq$ const, the log transformation of the data is justified.
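The claim that a squared variance function stabilizes the variance on the log scale can be illustrated by simulation. The sketch below is an assumption-laden illustration, not an analysis of the textile data: it draws gamma variates, whose variance is $\mu^2$ divided by the shape parameter, at several made-up means and compares the sample variance of $\log y$ across them.

```python
import math
import random
import statistics

def log_scale_variance(mu, shape, n, rng):
    """Sample variance of log(y) for n gamma variates with mean mu and
    variance mu**2 / shape (so var(y) is proportional to mu**2)."""
    scale = mu / shape
    logs = [math.log(rng.gammavariate(shape, scale)) for _ in range(n)]
    return statistics.variance(logs)

rng = random.Random(1)
# Made-up means spanning two orders of magnitude; shape fixed at 4.
variances = [log_scale_variance(mu, 4.0, 20000, rng) for mu in (1.0, 10.0, 100.0)]
```

The three sample variances agree up to simulation error, while the variance of $y$ itself changes by a factor of $10^4$ between the smallest and largest mean.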
The decision whether to report conclusions using a log transformation of the data or
a generalized linear model with log-link and squared variance function seems to matter
little. Table 2 gives the estimated regression coefficients and their standard errors for
both models, while Table 3 displays the standard analysis of variance and deviance tables
for the $3^3$ design. As suggested by our discussion in § 5, they are remarkably similar. In
either case confidence intervals for functions of the parameter estimates would be
calculated on the log scale, with back transformation to the original scale if needed.
[Fig. 2 plot: profile function $Q_{PL}(\theta)$ on the vertical axis, with ticks at -0.5, -1.0, -1.5 and -2.0.]
The use of the extended quasi-likelihood function provides a means of estimating jointly
the parameters $\pi_{ij}$ and $\theta$.
Consider the extended quasi-likelihood function $Q^+$ defined by a logit-linear model for
$\pi_{ij}$ together with
$$\phi_{ij} = 1 + \theta(n_{ij} - 1), \qquad \mathrm{var}(y_{ij}) = \phi_{ij} V(\pi_{ij}),$$
where $V(\pi_{ij})$ is the usual binomial variance function. Figure 2 is a plot of the profile
quasi-likelihood function of $\theta$ obtained by maximizing $Q^+(y; \pi)$ over $\pi$ holding $\theta$ fixed.
The maximum quasi-likelihood estimate of $\theta$ is 0.0129. This is not significantly different
from zero as the significance level based on a $\chi^2$ approximation to the quasi-likelihood
ratio statistic is 0.12.
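The dispersion model and the $\chi^2$ calibration used here are easy to compute. In the sketch below the function names are illustrative, the binomial index used in the example is made up, and the tail probability assumes one degree of freedom for the single parameter $\theta$.

```python
import math

def dispersion(theta, n):
    """Extra-binomial dispersion factor phi = 1 + theta * (n - 1)."""
    return 1.0 + theta * (n - 1)

def chi2_1_sf(x):
    """Upper tail probability P(chi-squared on 1 d.f. > x), via erfc."""
    return math.erfc(math.sqrt(x / 2.0))

# With theta = 0.0129 and a made-up binomial index of 50, the variance is
# inflated by roughly 63% over the binomial value.
inflation = dispersion(0.0129, 50)

# A ratio statistic of about 2.4 would give a significance level near 0.12.
level = chi2_1_sf(2.42)
```

Setting $\theta = 0$ recovers the ordinary binomial variance, which is why the test of $\theta = 0$ is a test for extra-binomial variation.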
For comparative purposes the profile likelihood function of $\theta$ assuming a beta-binomial
distribution for $y$ is also plotted in Fig. 2. The shapes of the two functions are qualitatively
similar, both attaining their maximum near 0.012. For small values of $\theta$ less than 0.012
the two functions are nearly identical. Hence both specifications indicate that $\theta$ is not
significantly different from zero. Evidently Pearson's $X^2$ is picking up departures from
the model of a form different from that suggested by these models of extra-binomial
variation.
The 95% likelihood interval for $\theta$ under the beta-binomial specification is shorter than
that based on the quasi-likelihood model. This is intuitively reasonable since stronger
assumptions should lead to tighter intervals.
Example 3: Insurance claims. McCullagh & Nelder (1983, Ch. 7) cite an example,
given by L. A. Baxter, S. M. Coutts and G. A. F. Ross in unpublished conference
proceedings, concerning damage claims to privately owned and comprehensively insured
cars. The data consist of the average claims $y$ for each combination of policy holder's
age PA, with 8 levels, car group CG, with 4 levels, and vehicle age VA, with 4 levels. The
number of claims on which each average is based is also available.
Fig. 3. Profile quasi-likelihood function of the variance and link function parameters $(\theta, \psi)$ for Example 3.
The maximizing value is $(\hat\theta, \hat\psi) = (2.4, -0.75)$. The solid contours are labelled in powers $2^k$ for $k = -1(1)4$.
Dashed contour, a 95% likelihood-type interval for plausible values of $(\theta, \psi)$; □, McCullagh & Nelder's
(1983) model; dotted line, intersection of data transformation and quasi-likelihood models. A power
transformation, $y^\psi$, with $\psi$ between -2 and 0 is supported by the data.
In their analysis, L. A. Baxter et al. use a standard linear model with additive main
effects. McCullagh & Nelder (1983) use the generalized linear model with
$$\mu^{-1} = PA + CG + VA, \qquad (13a)$$
$$V(\mu) = \mu^2/N, \qquad (13b)$$
where $N$ is the number of claims on which $y$ is based. They justify this model by:
(i) holding the variance function fixed as in (13b), and relaxing (13a) to $\mu^{\psi} =
PA + CG + VA$, and plotting $D(y; \hat\mu)$ versus $\psi$; and
(ii) holding the linear predictor fixed as in (13a), and relaxing (13b) to $V(\mu) = \mu^\theta$,
and tabulating values of $Q_{PL}(\theta)$ for several values of $\theta$.
We extend McCullagh & Nelder's treatment of these data by considering the general model
$$\mu^{\psi} = PA + CG + VA, \qquad V_\theta(\mu) = \mu^\theta/N.$$
Figure 3 shows the contour plot of $Q_{PL}(\theta, \psi)$. The contours are oriented almost parallel
to the coordinate axes, indicating that the parameters $\theta$ and $\psi$ are approximately
uncorrelated. The maximum of the extended quasi-likelihood function is at $(2.4, -0.75)$
and the dashed contour gives the limit based on $\chi^2_2(0.05)$. The point $(\theta, \psi) = (2, -1)$ indicated
by the square on Figure 3 corresponds to McCullagh & Nelder's model. Thus, this
particular choice is not contradicted by the data. The line superimposed on the plot is
$\theta = 2(1 - \psi)$, which corresponds to the intersection of quasi-likelihood and
data-transformation models. Several data-transformation models are supported by the data, including
$\log y$ and $y^{-1/3}$.
The quasi-likelihood model (13) and the data-transformation model based on $\log y$ yield
similar conclusions about damage claims. See McCullagh & Nelder (1983) for details.
Here the $y$-variable is of the extensive type, so that the data-transformation model,
although simpler to fit, is the less attractive.
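For a joint region in two parameters the $\chi^2_2$ percentage point has a closed form, and the points on the data-transformation line are simple to tabulate. The sketch below assumes that the 95% contour lies half the $\chi^2_2$ percentage point below the maximum, a common likelihood-ratio calibration that the text does not spell out; the function names are illustrative.

```python
import math

def chi2_2_quantile(level=0.95):
    """Quantile of chi-squared on 2 d.f.: its distribution function is 1 - exp(-x/2)."""
    return -2.0 * math.log(1.0 - level)

def theta_on_transformation_line(psi):
    """Variance power matching the power transformation y**psi (log y for psi = 0)."""
    return 2.0 * (1.0 - psi)

# Assumed drop in the profile from its maximum to the 95% joint contour.
contour_drop = 0.5 * chi2_2_quantile(0.95)

# Points on the line for the two transformations named in the text.
theta_for_log = theta_on_transformation_line(0.0)
theta_for_cube_root = theta_on_transformation_line(-1.0 / 3.0)

# McCullagh & Nelder's model (theta, psi) = (2, -1) is a quasi-likelihood model
# that is not on the line, since the line gives theta = 4 at psi = -1.
on_line = abs(theta_on_transformation_line(-1.0) - 2.0) < 1e-12
```

This makes explicit that being inside the 95% region and being a data-transformation model are separate questions.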
7. CONCLUSION
The use of the extended quasi-likelihood Q+ allows the comparison of generalized
linear models in which the random component is specified only in respect of its first two
moments, and in which the link function, variance function, and/or dispersion parameter
may be parameterized. Standard techniques of model fitting and comparison may then
be applied to this flexible class of models.
ACKNOWLEDGEMENTS
The Department of Mathematics at the University of Western Australia and the Division
of Mathematics and Statistics at CSIRO, Perth, provided a stimulating environment
where our initial work on this paper originated. We gratefully acknowledge the support
of these institutions and many helpful discussions with our colleagues there, especially
N. A. Campbell and I. R. James. We also thank David M. Gay of AT&T Bell Laboratories
for computing assistance in carrying out a number of bootstrap experiments.
REFERENCES
BARNDORFF-NIELSEN, O. & Cox, D. R. (1979). Edgeworth and saddlepoint approximations with statistical
applications (with discussion). J. R. Statist. Soc. B 41, 279-312.
Box, G. E. P. & Cox, D. R. (1964). An analysis of transformations (with discussion). J. R. Statist. Soc. B
26, 211-43.
BROOKS, R. J. (1984). Approximate likelihood ratio tests in the analysis of beta-binomial data. Appl. Statist.
33, 285-9.
CROWDER, M. J. (1978). Beta-binomial anova for proportions. Appl. Statist. 27, 34-7.
EFRON, B. (1979). Bootstrap methods: Another look at the jackknife. Ann. Statist. 7, 1-26.
EFRON, B. (1986). Double exponential families and their use in generalized linear regression. J. Am. Statist.
Assoc. 81, 709-21.
JORGENSEN, B. (1983). Maximum likelihood estimation and large-sample inference for generalized linear
and nonlinear regression models. Biometrika 70, 19-28.
JORGENSEN, B. (1987). Exponential dispersion models (with discussion). J. R. Statist. Soc. To appear.
MCCULLAGH, P. (1983). Quasi-likelihood functions. Ann. Statist. 11, 59-67.
MCCULLAGH, P. & NELDER, J. A. (1983). Generalized Linear Models. London: Chapman and Hall.
MORRIS, C. N. (1982). Natural exponential families with quadratic variance functions. Ann. Statist. 10, 65-80.
MORTON, R. (1981). Efficiency of estimating equations and the use of pivots. Biometrika 68, 227-33.
PREGIBON, D. (1984). Review of Generalized Linear Models by McCullagh and Nelder. Ann. Statist. 12,
1589-96.
TWEEDIE, M. C. K. (1981). An index which distinguishes between some important exponential families. In
Proc. Ind. Statist. Inst. Golden Jub. Int. Conf., pp. 579-604. Calcutta: Indian Statistical Institute.
WEDDERBURN, R. W. M. (1974). Quasi-likelihood functions, generalized linear models and the Gauss-
Newton method. Biometrika 61, 439-47.
WILLIAMS, D. A. (1982). Extra-binomial variation in logistic linear models. Appl. Statist. 31, 144-8.