3.1987 - An Extended Quasi-Likelihood Functions - Nelder-Pregibon

Biometrika Trust
An Extended Quasi-Likelihood Function

Author(s): J. A. Nelder and D. Pregibon
Reviewed work(s):
Source: Biometrika, Vol. 74, No. 2 (Jun., 1987), pp. 221-232
Published by: Biometrika Trust
Stable URL: http://www.jstor.org/stable/2336136 .
Biometrika (1987), 74, 2, pp. 221-32
Printed in Great Britain
An extended quasi-likelihood function

BY J. A. NELDER
Department of Mathematics, Imperial College, London SW7 2BZ, U.K.
AND D. PREGIBON
Statistics and Data Analysis Research Department, AT&T Bell Laboratories, Murray Hill,
New Jersey 07974, U.S.A.
SUMMARY
Wedderburn's original definition of quasi-likelihood for generalized linear models is
extended to allow the comparison of variance functions as well as those of linear predictors
and link functions. The relationship between generalized linear models and the use of
transformations of the response variable is explored, and the ideas are illustrated by
three examples.
Some key words: Data transformation; Exponential family; Saddlepoint approximation; Second-moment
assumptions; Variance function.
1. INTRODUCTION
Wedderburn's (1974) introduction of quasi-likelihood greatly widened the scope of
generalized linear models by allowing the full distributional assumption about the random
component in the model to be replaced by a much weaker assumption in which only the
first and second moments were defined. In making this extension Wedderburn widened
the scope of generalized linear models in a way very similar to that of Gauss when he
replaced the assumption of normality in classical linear models by that of equal variance.
For generalized linear models with distributions in the exponential family, likelihood
ratio and score tests are used for testing hypotheses concerning nested subsets of covariates
in the linear predictor and for assessing hypothesized link functions. These methods are
also applicable with Wedderburn's form of quasi-likelihood. However neither of these
methods is suitable for the comparison of different variance functions. In this paper we
introduce an extended quasi-likelihood function which allows for the comparison of
various forms of all the components of a generalized linear model, i.e. the linear predictor,
the link function, and the variance function. We then apply the ideas to the analysis of
several sets of data.
2. QUASI-LIKELIHOOD
2·1. General
In this section we introduce the original and extended quasi-likelihood functions, and
describe their properties and limitations. More comprehensive treatments of some of the
material are given by Wedderburn (1974), McCullagh (1983) and McCullagh & Nelder
(1983, Ch. 8).
2·2. Wedderburn's original form
Wedderburn (1974) defined the quasi-likelihood, strictly the quasi-log-likelihood, $Q$
for an observation $y$ with mean $\mu$ and variance $V(\mu)$ by the equation
$$Q(y;\mu) = \int^{\mu} \frac{y-\mu'}{V(\mu')}\,d\mu' \tag{1}$$
plus some function of $y$ only, or equivalently by

$$\partial Q(y;\mu)/\partial\mu = (y-\mu)/V(\mu). \tag{2}$$
The deviance function, which measures the discrepancy between the observation and its
expected value, is obtained from the analogue of the log-likelihood-ratio statistic
$$D(y;\mu) = -2\{Q(y;\mu) - Q(y;y)\} = -2\int_{y}^{\mu} \frac{y-u}{V(u)}\,du. \tag{3}$$
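As a rough numerical illustration of (3), the following sketch evaluates the deviance by composite Simpson's rule for a user-supplied variance function. The function name `deviance` and the argument `n_steps` are illustrative assumptions and are not part of the original paper; the closed form used for comparison is the standard Poisson-type deviance.

```python
import math


def deviance(y, mu, variance, n_steps=2000):
    """Approximate D(y; mu) = -2 * integral from y to mu of (y - u) / V(u) du.

    Uses composite Simpson's rule; n_steps must be even.
    """
    if y == mu:
        return 0.0
    h = (mu - y) / n_steps

    def integrand(u):
        return (y - u) / variance(u)

    total = integrand(y) + integrand(mu)
    for k in range(1, n_steps):
        weight = 4 if k % 2 else 2
        total += weight * integrand(y + k * h)
    return -2.0 * total * h / 3.0


# Constant variance V(mu) = 1 recovers the squared error (y - mu)^2.
d_normal = deviance(3.0, 5.0, lambda u: 1.0)

# V(mu) = mu gives the Poisson-type deviance 2{y log(y/mu) - (y - mu)}.
d_poisson = deviance(3.0, 5.0, lambda u: u)
closed_form = 2.0 * (3.0 * math.log(3.0 / 5.0) - (3.0 - 5.0))
```

Quadrature is only needed when no closed form is available; for the common variance functions the integral in (3) can be written down directly.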
Now consider $n$ independent observations $(y_i, x_i;\ i = 1, \ldots, n)$, where $y_i$ is the $i$th
response variable with mean $\mu_i$ and variance $V(\mu_i)$ and $x_i$ is an associated vector of $q$
covariates. We suppose that the relationship between $\mu_i$ and the covariate vector $x_i$ is
given by the equation
$$g(\mu_i) = x_i\beta, \tag{4}$$

where $g$, the link function, is some known strictly monotone differentiable function. The
sum of individual quasi-likelihoods gives the quasi-likelihood for the sample. Unless
otherwise noted, this will be denoted simply by $Q(y;\mu)$. Similarly we denote the sample
deviance function by $D(y;\mu)$.
Wedderburn (1974) and McCullagh (1983) show that quasi-likelihoods and their
associated maximum quasi-likelihood estimates have many properties analogous to those
of likelihoods and their associated maximum likelihood estimates. In particular, the
maximum quasi-likelihood estimate $\hat\beta$ is asymptotically normal with mean $\beta$, and
asymptotic covariances may be derived in the usual fashion from the second derivative
matrix of $Q$. Further, if $H_A$ and $H_B$ are two nested hypotheses of dimension $A < B$, then,
under $H_A$, the change in deviance
$$D(\hat\mu_B;\hat\mu_A) = D(y;\hat\mu_A) - D(y;\hat\mu_B) \tag{5}$$
has an asymptotic $\chi^2_{B-A}$ distribution.
Wedderburn relaxes the assumption of a known variance function of $y$ by allowing
an unknown constant of proportionality $\phi$, so that $\operatorname{var}(y) = \phi V(\mu)$. The introduction of
this 'dispersion' parameter does not alter the estimation of the regression coefficients $\beta$.
However $\phi$ does appear as a scale factor in the asymptotic distributions described above,
and for these purposes an estimate is required. Wedderburn suggested the 'bias corrected'
mean $X^2$ statistic
$$\tilde\phi = \frac{X^2}{n-q} = \frac{1}{n-q}\sum_{i=1}^{n}\frac{(y_i-\hat\mu_i)^2}{V(\hat\mu_i)}. \tag{6}$$
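A minimal sketch of the statistic in (6) is given below. The function name `pearson_dispersion` and the data are hypothetical, chosen only to show the arithmetic; the fitted means are taken as given rather than estimated here.

```python
def pearson_dispersion(y, mu_hat, variance, n_params):
    """Bias-corrected mean Pearson statistic X^2 / (n - q), as in (6)."""
    n = len(y)
    if n <= n_params:
        raise ValueError("need more observations than fitted parameters")
    x2 = sum((yi - mi) ** 2 / variance(mi) for yi, mi in zip(y, mu_hat))
    return x2 / (n - n_params)


# Hypothetical responses and fitted means (illustrative numbers only),
# with the Poisson-type variance function V(mu) = mu and q = 2 parameters.
y = [2.0, 5.0, 9.0, 4.0]
mu_hat = [3.0, 4.0, 8.0, 5.0]
phi_tilde = pearson_dispersion(y, mu_hat, lambda m: m, n_params=2)
```

Here $X^2 = 1/3 + 1/4 + 1/8 + 1/5 = 109/120$, so the estimate is $109/240 \approx 0.454$.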
Clearly many of the ideas about, and procedures for fitting, generalized linear models
can be extended without difficulty when likelihoods are replaced by quasi-likelihoods.
One remaining problem concerns the comparison of different variance functions on the
same data set. Since the variance function determines the units of measurement for
$D(y;\mu)$ and $X^2$, differencing these discrepancy measures across variance functions is
not possible. To assess variance functions it is necessary to extend the definition of
quasi-likelihood and this we do in § 2·3.
We note that a distribution with a given variance function may exist, but without
belonging to the class of distributions which are required for a generalized linear model
proper. Quasi-likelihood enables a fitting procedure to be defined for such distributions,
using only the first two moments. The efficiency of such a procedure as compared with
maximum likelihood under the exact distribution is not known.
2·3. An extended quasi-likelihood definition

We define a function $Q^+$ for a single observation $y$ with mean $\mu$ and variance $\phi V(\mu)$
by
$$Q^+(y;\mu) = -\tfrac12\log\{2\pi\phi V(y)\} - \tfrac12 D(y;\mu)/\phi, \tag{7}$$
where $D(y;\mu)$ is the deviance as defined in (3) and $\phi$ is the dispersion parameter. Note
that $Q^+$, like $Q$, does not presuppose a full distributional assumption but only the form
of the first two moments.
The estimates of $\beta$ obtained by maximizing $Q^+$ are the same as those obtained from
maximizing $Q$, that is they are maximum quasi-likelihood estimates. This follows because
$Q^+$ is a linear function of $Q$ with coefficients independent of $\beta$.
The estimate of $\phi$ obtained from maximizing $Q^+$ is $\hat\phi = D(y;\hat\mu)/n$, the mean deviance.
For the special cases where $Q^+$ corresponds to the normal and inverse Gaussian distributions,
see below, $\hat\phi$ is the maximum likelihood estimate of $\phi$.
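The form of the dispersion estimate follows in one step from (7). The derivation is spelled out here for a sample of $n$ observations, with the fitted means held fixed and $D(y;\hat\mu)$ denoting the sample deviance:

```latex
\begin{align*}
\sum_{i=1}^{n} Q^{+}(y_i;\hat\mu_i)
  &= -\tfrac12 \sum_{i=1}^{n} \log\{2\pi\phi V(y_i)\}
     - \frac{D(y;\hat\mu)}{2\phi}, \\
\frac{\partial}{\partial\phi} \sum_{i=1}^{n} Q^{+}(y_i;\hat\mu_i)
  &= -\frac{n}{2\phi} + \frac{D(y;\hat\mu)}{2\phi^{2}} = 0
  \quad\Longrightarrow\quad
  \hat\phi = \frac{D(y;\hat\mu)}{n}.
\end{align*}
```

The term $\log V(y_i)$ does not involve $\phi$, so it plays no part in this step.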
We need to justify Q+. For the original definition of quasi-likelihood, an equivalent
likelihood proper exists if there is a distribution from a natural exponential family with
the same variance function (Morris, 1982). This remains true for Q+ for the normal and
inverse Gaussian distributions; for the gamma distribution $Q^+$ differs from the log
likelihood by a factor depending only on $\phi$; for the discrete Poisson, binomial, and
negative binomial distributions, $Q^+$ is obtainable from the respective log likelihood
function by replacing any factorial $k!$ by Stirling's approximation
$$k! \approx (2\pi k)^{1/2} k^{k} e^{-k}. \tag{8}$$
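The Poisson case can be checked numerically. The sketch below assumes $\phi = 1$, $V(y) = y$ and the standard Poisson deviance $2\{y\log(y/\mu) - (y-\mu)\}$; the function names are illustrative and do not come from the paper.

```python
import math


def q_plus_poisson(y, mu):
    """Extended quasi-likelihood (7) with phi = 1 and V(y) = y, for y > 0."""
    dev = 2.0 * (y * math.log(y / mu) - (y - mu))
    return -0.5 * math.log(2.0 * math.pi * y) - 0.5 * dev


def poisson_loglik_stirling(y, mu):
    """Poisson log likelihood with log y! replaced by the log of (8)."""
    log_factorial = 0.5 * math.log(2.0 * math.pi * y) + y * math.log(y) - y
    return y * math.log(mu) - mu - log_factorial


def poisson_loglik_exact(y, mu):
    """Exact Poisson log likelihood, using lgamma(y + 1) = log y!."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)


y, mu = 7, 4.5
gap_stirling = q_plus_poisson(y, mu) - poisson_loglik_stirling(y, mu)
gap_exact = q_plus_poisson(y, mu) - poisson_loglik_exact(y, mu)
```

The first gap vanishes up to rounding, while the second is roughly $1/(12y)$ and does not depend on $\mu$, which is why it does not affect estimation of the mean.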
Though the form (7) may look unfamiliar, exp (Q+) is in fact the unnormalized
saddlepoint approximation for exponential families discussed by Barndorff-Nielsen &
Cox (1979). However these authors discuss its use in connection with the asymptotic
distribution of maximum likelihood estimates, whereas we apply it to single observations
with generalized linear model structure.
A distribution can be formed from an extended quasi-likelihood by normalizing
exp (Q+) with a suitable factor to make the sum or integral equal to unity. While the
normalizing factor is usually a function of $\mu$, $\phi$ and $\theta$, see § 3·2, it can happen that its
value varies only slightly for large changes in certain of these parameters. Then it can
be argued that the normalizing factor contains almost no information about these
particular parameters, so that little is lost in optimizing the unnormalized extended
quasi-likelihood for estimation purposes. Our use of the unnormalized extended
quasi-likelihood function therefore has a certain 'partial likelihood' flavour, though we have
not explored the connection in depth.
3. NONLINEAR PARAMETERS AND THE EXTENDED QUASI-LIKELIHOOD FUNCTION
3·1. General
By 'nonlinear' parameters we mean those parameters not in the linear predictor. In
this section we illustrate how the extended quasi-likelihood function can be used to
estimate nonlinear parameters affecting the variability of the response. See Jorgensen
(1987) and Efron (1986) for related work in this area.
3·2. Unknown parameters in the variance function

Wedderburn's original quasi-likelihood model required knowing the variance function
up to a multiplicative constant. Using $Q^+$, this requirement can be relaxed. Consider
embedding the variance function in a family of functions indexed by an unknown
parameter $\theta$, so that $\operatorname{var}(y) = \phi V_\theta(\mu)$. This implies that the quasi-likelihood contribution
of a single observation is
$$Q^+_\theta(y;\mu) = -\tfrac12\log\{2\pi\phi V_\theta(y)\} - \tfrac12\phi^{-1}D_\theta(y;\mu), \tag{9}$$

where $D_\theta(y;\mu)$ is given by
$$D_\theta(y;\mu) = -2\int_{y}^{\mu}\frac{y-u}{V_\theta(u)}\,du.$$
A useful family is obtained by considering powers of $\mu$:

$$V_\theta(\mu) = \mu^{\theta}. \tag{10}$$
The most common values of $\theta$ are 0, 1, 2 and 3, which correspond to variance functions
associated with normal, Poisson, gamma, and inverse Gaussian distributions respectively.
Tweedie (1981) discusses distributions with this variance function, and shows that an
exponential family exists for $\theta = 0$ and $\theta \ge 1$. See also Jorgensen (1987).
For the variance function family (10), the deviance function is
$$D_\theta(y;\mu) = \begin{cases}
2\{y\log(y/\mu) - (y-\mu)\} & (\theta = 1), \\
2\{y/\mu - \log(y/\mu) - 1\} & (\theta = 2), \\
\dfrac{2\{y^{2-\theta} - (2-\theta)\,y\,\mu^{1-\theta} + (1-\theta)\,\mu^{2-\theta}\}}{(1-\theta)(2-\theta)} & \text{otherwise.}
\end{cases}$$
For fixed $\theta$, the maximum quasi-likelihood estimate of $\beta$ is easily obtained using standard
techniques, such as those provided by GLIM.
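The three cases of the power-family deviance can be collected in one function. The sketch below is illustrative only (the name `power_deviance` is an assumption); it also checks that the general expression approaches the two special cases as $\theta$ tends to 1 or 2, and that $\theta = 0$ and $\theta = 3$ give the familiar normal and inverse Gaussian forms.

```python
import math


def power_deviance(y, mu, theta):
    """Deviance for V_theta(mu) = mu**theta, assuming y > 0 and mu > 0."""
    if theta == 1:
        return 2.0 * (y * math.log(y / mu) - (y - mu))
    if theta == 2:
        return 2.0 * (y / mu - math.log(y / mu) - 1.0)
    num = (y ** (2 - theta)
           - (2 - theta) * y * mu ** (1 - theta)
           + (1 - theta) * mu ** (2 - theta))
    return 2.0 * num / ((1 - theta) * (2 - theta))


y, mu = 3.0, 5.0
d0 = power_deviance(y, mu, 0)   # normal: (y - mu)^2
d1 = power_deviance(y, mu, 1)   # Poisson
d2 = power_deviance(y, mu, 2)   # gamma
d3 = power_deviance(y, mu, 3)   # inverse Gaussian: (y - mu)^2 / (y mu^2)

# The general formula is continuous in theta at the two special cases.
d_near_1 = power_deviance(y, mu, 1 + 1e-6)
d_near_2 = power_deviance(y, mu, 2 - 1e-6)
```

Treating $\theta = 1$ and $\theta = 2$ separately avoids the division by zero in the general expression; very close to those values the general formula suffers from cancellation, so a production implementation would want a series expansion there.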
3·3. Nonconstant dispersion parameter

Wedderburn's original quasi-likelihood model assumed that the dispersion parameter
$\phi$ is constant for all observations. In certain applications, it may be desirable to check
this assumption, or perhaps model $\phi$ as a function of known covariates. The extended
quasi-likelihood function provides an objective framework upon which such an analysis
may be based.
Consider the general quasi-likelihood model (Pregibon, 1984)
$$g(\mu) = x\beta, \qquad h(\phi) = z\theta, \qquad \operatorname{var}(y) = \phi V(\mu), \tag{11}$$
where both the mean and the dispersion parameters vary across observations in a
parameterized fashion. Typically, the particular application will guide the choice of the
link function $h$ for the dispersion; it will usually be either the identity, as in Example 2 in
§ 6, or the logarithmic function.
As with ordinary quasi-likelihood, $\phi$ is implicitly assumed to be functionally independent
of $\mu$ so that the dependence of the variance of $y$ on the mean is contained wholly
in the variance function $V(\mu)$. Thus, we are thinking of applications where $z$ represents
stratification variables or covariates affecting dispersion only. If the same covariate vector
$z = x$ is used in modelling both mean and dispersion, intrinsic aliasing of the parameters
could occur.
Fitting the general quasi-likelihood model in (11) is accomplished by maximizing
$Q^+(y;\mu)$ as given in (7), where now $\phi$ depends upon unknown parameters $\theta$. Without
causing undue confusion with the notation of § 3·2, let $Q^+_\theta$ denote the corresponding
extended quasi-likelihood function with $\phi$ occurring implicitly as an argument. The usual
iterative weighted least-squares algorithm for maximizing $Q^+$ can be adapted to this
problem. Holding $\phi$ fixed at the current estimate $\hat\phi$, the estimate of $\beta$ is obtained as
usual, using $1/\hat\phi$ as the prior weight. Holding $\mu$ fixed at the current estimate $\hat\mu$, the estimate
of $\theta$ is obtained by fitting the quasi-likelihood model, $h(\phi) = z\theta$, $V(\phi) = \phi^2$, where $\phi$ is
the mean parameter for the 'dependent' variable $D(y;\hat\mu)$. Cycling between holding $\mu$
fixed and holding $\phi$ fixed will lead to estimates $(\hat\beta, \hat\theta)$ which maximize $Q^+_\theta$. Note that
the covariance estimates of each set of parameter estimates obtained from this procedure
will be conditioned on the values of the other set being set equal to their estimates. The
MACRO facility in GLIM provides a convenient means of implementing this algorithm.
Morton (1981) discusses optimum linear estimating equations for models defined by
first and second moments only, and gives an example where the variance is a function
of parameters occurring in the model for the mean. We conjecture that the above procedure
is equivalent to solving two coupled sets of such equations for the parameters $\beta$ in the
mean and $\theta$ in the dispersion.
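The alternating scheme can be sketched on a deliberately simplified case. The assumptions here are not from the paper: a single common mean with identity link and $V(\mu) = 1$, and a dispersion that differs only between strata. With stratum indicators as the only dispersion covariates, the fit with $V(\phi) = \phi^2$ to the deviance components reduces to the stratum mean of those components, so no general fitting routine is needed. The function name and the data are hypothetical.

```python
def fit_mean_and_dispersion(y, group, n_iter=200):
    """Alternate between the mean step and the dispersion step.

    Toy model: common mean mu, V(mu) = 1, one dispersion phi per stratum.
    """
    groups = sorted(set(group))
    phi = {g: 1.0 for g in groups}
    mu = sum(y) / len(y)
    for _ in range(n_iter):
        # Mean step: weighted least squares with prior weights 1 / phi.
        w = [1.0 / phi[g] for g in group]
        mu = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
        # Dispersion step: deviance components are the 'response'; with
        # stratum indicators the fitted phi is the stratum mean deviance.
        d = [(yi - mu) ** 2 for yi in y]
        for g in groups:
            d_g = [di for di, gi in zip(d, group) if gi == g]
            phi[g] = sum(d_g) / len(d_g)
    return mu, phi


# Hypothetical data: stratum 1 is visibly more variable than stratum 0.
y = [1.0, 1.2, 0.8, 1.1, 3.0, -1.0, 2.5, -0.5]
group = [0, 0, 0, 0, 1, 1, 1, 1]
mu_hat, phi_hat = fit_mean_and_dispersion(y, group)
```

In this example the fitted mean is pulled towards the low-dispersion stratum, since its observations receive the larger prior weights. A general implementation would replace both steps by iterative weighted least-squares fits with the chosen links.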
3·4. Inference concerning nonlinear parameters

For inferences about 'nonlinear' parameters, we propose the use of profile
quasi-likelihood intervals constructed in the usual way. The nonlinear parameters concerned
will be those occurring in the link and variance functions, and their use will often comprise
a form of model checking in which prior forms are embedded in a parameterized family.
For example, for the nonlinear parameter $\theta$ introduced in equations (9) and (11), the
profile quasi-likelihood interval is obtained by maximizing $Q^+_\theta$ for fixed $\theta$, obtaining
estimates $\hat\beta(\theta)$ and $\hat\phi(\theta)$, and then varying $\theta$ over some interval of interest. Let $Q^+_{\max}$
denote the maximum of $Q^+_\theta$. The interval $(\theta_L, \theta_R)$ over which
$$Q_{PL}(\theta) = Q^+_\theta - Q^+_{\max}$$
is greater than some pre-assigned value $d$ is the profile quasi-likelihood interval. Thus $d$
may be defined by some percentage point of a $-\tfrac12\chi^2_1$ distribution.
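Once the profile has been evaluated on a grid of $\theta$ values, the interval can be read off directly. The sketch below assumes a unimodal profile and uses the 95% point of $\chi^2_1$, 3.841; the function name and the quadratic profile are hypothetical and serve only as an illustration.

```python
def profile_interval(thetas, q_values, cutoff):
    """Return (theta_L, theta_R) from a grid of profile values.

    Keeps the grid points whose standardized profile value
    Q - max(Q) exceeds the (negative) cutoff d.
    """
    q_max = max(q_values)
    inside = [t for t, q in zip(thetas, q_values) if q - q_max > cutoff]
    return min(inside), max(inside)


# d = -0.5 times the 95% point of chi-squared on 1 degree of freedom.
d = -0.5 * 3.841

# Hypothetical quadratic profile peaked at theta = 2.5 (illustration only).
thetas = [1.0 + 0.01 * k for k in range(301)]
q_values = [-125.0 - 2.0 * (t - 2.5) ** 2 for t in thetas]
theta_l, theta_r = profile_interval(thetas, q_values, d)
```

The resolution of the interval is limited by the grid spacing; interpolation between neighbouring grid points, or a root finder applied to $Q_{PL}(\theta) - d$, would refine the end points.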
When $Q^+$ corresponds to an exponential family distribution, asymptotic justification
of profile quasi-likelihood intervals is available (Jorgensen, 1983; Efron, 1986). Given
assumptions about the first two moments only of the error distribution, we recognize that
properties of such intervals remain to be justified. We have attempted a justification for
particular data sets, e.g. Example 1 in § 6, by use of resampling schemes to derive
'confidence' intervals for these parameters. The general method is as follows:
(i) compute the standardized Pearson residuals $e_i = (y_i - \hat\mu_i)/\{\hat\phi V_{\hat\theta}(\hat\mu_i)\}^{1/2}$, where $\hat\mu$
and $\hat\phi$ are evaluated at $\hat\theta$;
(ii) resample the $e_i$ to form $e_i^*$, and construct the pseudo-response variable
$y_i^* = \hat\mu_i + \{\hat\phi V_{\hat\theta}(\hat\mu_i)\}^{1/2} e_i^*$;
(iii) using $y^*$ in place of $y$, find the maximum quasi-likelihood estimate of $\theta$;
(iv) repeat (ii) and (iii) $M$ times to generate a resampling distribution for $\hat\theta$.
Possible resampling schemes at stage (ii) include (a) sampling $e_i$ without replacement,
(b) sampling $e_i$ with replacement (Efron, 1979), (c) random sampling from $f(e;\phi)$ where
$f$ is a parametric family of distributions, with $\phi$ being possibly estimated from the data,
and (d) random sampling from $\hat f(e)$, where $\hat f$ is a nonparametric density estimate of $e$.
Given the resampling distribution of $\hat\theta$, there are also several possible ways of forming
an interval for $\theta$. We adopt the 'quantile' method whereby the quantiles of the distribution
of $\hat\theta_m - \hat\theta$ are used to estimate the quantiles of the distribution of $\hat\theta - \theta$. Let $a_L$ and $a_U$
denote the lower and upper symmetric percentage points of the resampling distribution
of $\hat\theta_m - \hat\theta$. The interval estimate $(\theta_L, \theta_R)$ of $\theta$ is $(\hat\theta - a_U, \hat\theta - a_L)$.
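Steps (ii) and the 'quantile' interval can be sketched as follows, using scheme (b), sampling residuals with replacement. The refitting in step (iii) depends on the model and is not shown; the helper names, the interpolation rule for empirical quantiles and all numbers are assumptions made for illustration.

```python
import random


def resample_responses(mu_hat, scale, residuals, rng):
    """Step (ii): draw residuals with replacement and rebuild y*.

    scale[i] is assumed to hold {phi_hat * V(mu_hat[i])} ** 0.5.
    """
    draws = [rng.choice(residuals) for _ in mu_hat]
    return [m + s * e for m, s, e in zip(mu_hat, scale, draws)]


def quantile(sorted_values, p):
    """Empirical quantile by linear interpolation between order statistics."""
    pos = p * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def quantile_interval(theta_hat, theta_star, level=0.95):
    """Interval (theta_hat - a_U, theta_hat - a_L) from resampled estimates."""
    diffs = sorted(t - theta_hat for t in theta_star)
    alpha = (1.0 - level) / 2.0
    a_l = quantile(diffs, alpha)
    a_u = quantile(diffs, 1.0 - alpha)
    return theta_hat - a_u, theta_hat - a_l


rng = random.Random(0)
y_star = resample_responses([10.0, 20.0], [2.0, 4.0], [-1.0, 0.5, 0.5], rng)

# Hypothetical resampled estimates, evenly spread around theta_hat = 2.
theta_star = [2.0 + (k - 50) / 100 for k in range(101)]
interval = quantile_interval(2.0, theta_star)
```

Because the interval is built from $\hat\theta - a_U$ and $\hat\theta - a_L$, a resampling distribution skewed to the right produces an interval extended to the left, and conversely.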
4. ADJUSTMENT OF $V(y)$ AT THE ORIGIN
Certain quasi-likelihood functions, e.g. those with $\theta \ge 2$ in (10), restrict $y$ to be strictly
greater than zero. Otherwise the quasi-likelihood function $Q$ becomes infinite. In some
important cases where $y = 0$ is allowed, the form of $Q^+$ is unsatisfactory because $V(y) = 0$
for $y = 0$, so that the extended quasi-likelihood function $Q^+$ becomes infinite. If $Q^+$ is
viewed as an approximation to a discrete distribution such as the binomial, Poisson, or
negative binomial, then the problem lies with the form (8) of the Stirling approximation
used for the factorials, which approaches zero for $y \to 0$ instead of unity. The solution is
to use the modified form
$$k! \approx \{2\pi(k+c)\}^{1/2} k^{k} e^{-k}. \tag{12}$$
For $c = \tfrac16$, this approximation gives 1.023 for $0!$ and the correct first term, $(12k)^{-1}$, of the
Stirling series for large $k$. The approximation is better than (8) for all integer $k$.
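The two approximations are easy to compare against exact factorials. The sketch below is illustrative (the function name `stirling` is an assumption); it tabulates the relative errors of (8) and of (12) with $c = 1/6$ for $k = 0, \ldots, 10$.

```python
import math


def stirling(k, c=0.0):
    """{2*pi*(k + c)}**0.5 * k**k * exp(-k).

    c = 0 gives the plain form (8); c > 0 gives the modified form (12).
    """
    return math.sqrt(2.0 * math.pi * (k + c)) * k ** k * math.exp(-k)


rows = []
for k in range(0, 11):
    exact = math.factorial(k)
    plain_err = abs(stirling(k) - exact) / exact
    modified_err = abs(stirling(k, c=1 / 6) - exact) / exact
    rows.append((k, plain_err, modified_err))

value_at_zero = stirling(0, c=1 / 6)
```

At $k = 0$ the plain form returns zero, whereas the modified form returns $(2\pi/6)^{1/2}$, which is the value 1.023 quoted in the text.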
Table 1. Modified empirical variance functions for some discrete distributions

Distribution         $V(y) = V(y; 0)$      $V(y; c)$
Poisson              $y$                   $y + c$
Binomial             $y(N-y)/N$            $(y+c)(N-y+c)/(N+c)$
Negative binomial    $y(y+\nu)/\nu$        $(y+\nu)^2(y+c)(\nu+c)/\{\nu^2(y+\nu+c)\}$

For the discrete distributions mentioned above, the use of the modified Stirling's
approximation (12) yields the results displayed in Table 1, with $V(y; c)$ replacing $V(y)$
in (7). Note that the maximum quasi-likelihood estimates of $\beta$ and $\phi$ are unchanged if
$V(y)$ is replaced by $V(y; c)$; however the use of $V(y; c)$ allows $Q^+$ to be defined for
all sample sets and will be important if $V(\cdot)$ itself contains unknown parameters.
5. RELATIONSHIP OF QUASI-LIKELIHOOD AND RESPONSE-VARIABLE TRANSFORMATION

There is a close correspondence between data-transformation models, where, for some
monotone function $g$, $g(y)$ is assumed to be approximately normally distributed with
mean $\mu_g = x\beta$ and constant variance $\sigma_g^2$, and the subset of quasi-likelihood models defined
by $g(\mu) = x\beta$ and $V(\mu) = 1/\{g'(\mu)\}^2$.
Consider the typical data-transformation model which utilizes the power family of
transformations $g(y) = y^{\psi}$. The approximation to the distribution of $y$ is
$$f(y) \approx \left|\frac{dg(y)}{dy}\right| (2\pi\sigma_g^2)^{-1/2} \exp\left[-\tfrac12\{g(y) - x\beta\}^2/\sigma_g^2\right]$$
and standard likelihood methods for estimating $\beta$ are applicable.

The corresponding quasi-likelihood model is $\mu^{\psi} = x\beta$ and $V_\psi(\mu) = \mu^{2(1-\psi)}$. The
saddlepoint approximation to the distribution of $y$ is
$$f(y) \approx \{2\pi\phi V_\psi(y)\}^{-1/2} \exp\left\{\frac{1}{\phi}\int_{y}^{\mu}\frac{y-u}{V_\psi(u)}\,du\right\}.$$
Since to first order, $E\{g(y)\} \approx g(\mu)$ and $\operatorname{var}(y) \approx \sigma_g^2/\{g'(\mu)\}^2$, quasi-likelihood and
data-transformation models are first-order equivalent. The term $V_\psi^{-1/2}(y)$ in the quasi-likelihood
formulation is a Jacobian-like adjustment for scale changes. The major conceptual
difference between the models is that squared-loss on the transformed scale is replaced
by an appropriately chosen deviance function. In practice this difference is often unimportant
unless a grossly inadequate data-transformation, or quasi-likelihood model, is chosen.
An interesting feature of the subset of quasi-likelihood models which intersect
data-transformation models is that the estimated asymptotic covariance matrix of $\hat\beta$ is
proportional to $(X^{T}X)^{-1}$. This has implications for both fitting, where the Hessian need not be
updated at each iteration, and for inferences, where in balanced designs, the components
of deviance are asymptotically independent.
Response-variable transformation has several disadvantages compared to the method
of quasi-likelihood:
(i) range restrictions on $g(y)$ following from $y$, including discreteness, technically
imply that a normal approximation for $g(y)$ is not appropriate;
(ii) the assessment of the variance function of the response based on decomposing
the normal log likelihood requires replicate observations; and
(iii) a single common scale for linearity and homogeneity of variance is required.
While the first of these is not important in practice, the other two are. The fact that the
extended quasi-likelihood model allows variance-function assessment without replication
is especially attractive. The third point is also important though cases do occur when a
single transformation achieves both linearity and homogeneity. However, note that the
quasi-likelihood model allows one to assess whether a common transformation does
achieve these dual aims. For the family of power transformations, this is obtained by
considering the general quasi-likelihood model $\mu^{\psi} = x\beta$, $V_\theta(\mu) = \mu^{\theta}$ and testing the
hypothesis $H$: $\theta = 2(1 - \psi)$. An example is given below.
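The form of this hypothesis can be motivated by a first-order (delta-method) argument. The sketch below assumes $\operatorname{var}(y) = \phi\mu^{\theta}$ and $g(y) = y^{\psi}$ with $\psi \neq 0$:

```latex
\begin{align*}
\operatorname{var}\{g(y)\}
  &\approx \{g'(\mu)\}^{2}\operatorname{var}(y)
   = \psi^{2}\mu^{2(\psi-1)}\,\phi\,\mu^{\theta}
   = \phi\,\psi^{2}\,\mu^{\theta-2(1-\psi)} .
\end{align*}
```

The right-hand side is free of $\mu$ exactly when $\theta = 2(1-\psi)$, so a single power achieves both linearity and constant variance only on that line. In the limiting case $g(y) = \log y$, the same argument gives $\operatorname{var}\{\log y\} \approx \phi\mu^{\theta-2}$, which is constant when $\theta = 2$.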
Ultimately, the determination of which loss function is appropriate for estimation of
$\beta$ depends on the purpose of the analysis and the type of response variable under
consideration. Box & Cox (1964) distinguish between extensive and nonextensive variables.
The former class has the property of physical additivity, where response-variable
transformation would not be appropriate. Nonextensive variables can usefully be modelled
on any scale unless there is some prior preference for a particular one. Thus, for
nonextensive variables, the choice between data-transformation and quasi-likelihood
models is seldom made on statistical grounds but rather on consideration of subject matter.
6. EXAMPLES
Several examples are presented in this section to illustrate the flexibility of the extended
quasi-likelihood model in statistical analysis. The presentation is necessarily incomplete,
as features not relevant to the present discussion are ignored. In practice, one would
supplement the analyses with further model checking, including residual and other
diagnostic plots.
Example 1: Textile data. This example concerns the behaviour of worsted yarn under
cycles of repeated loading (Box & Cox, 1964). The response variable y is the number of
cycles to failure, resulting from a single replicate of a $3^3$ experiment with factors $x_1$:
length of test specimen, $x_2$: amplitude of loading cycle, and $x_3$: load. Box & Cox
recommend a log transform of the response variable, both to enhance additivity of effects
and increase sensitivity of the analysis. Their methods for examining the question of
variance homogeneity are not applicable since there is no replication at each design
point. We now consider how quasi-likelihood methods can help.
Consider the quasi-likelihood model
$$\log(\mu) = x_1\beta_1 + x_2\beta_2 + x_3\beta_3, \qquad V_\theta(\mu) = \mu^{\theta}.$$
The plot of the profile quasi-likelihood function $Q^+_\theta(y;\hat\mu)$ is displayed in Fig. 1. A 95%
likelihood-type interval for $\theta$ is (1.75, 3.35). In contrast, a 95% bootstrap-type interval
for $\theta$ is (1.55, 3.42), this being based on 100 samples generated according to the
prescription in § 3·4. The two intervals are in general agreement though the latter is more
conservative, i.e. wider. Both intervals indicate that $\theta = 2$ is plausible given the data. As
$\operatorname{var}(y) \propto \mu^2$ implies $\operatorname{var}\{\log(y)\} \approx$ const, the log transformation of the data is justified.
[Figure 1: profile quasi-likelihood, vertical axis from -130 to -122, plotted against $\theta$, horizontal axis from 1.0 to 4.0.]
Fig. 1. Profile quasi-likelihood function of the variance-function parameter $\theta$ for Example 1. The maximizing
value is $\hat\theta = 2.5$. Dashed line indicates a 95% likelihood-type interval for $\theta$; shaded bar, a 95% bootstrap-type
interval for $\theta$. A variance function with power $\theta$ between 1.7 and 3.4 is supported by the data.
The decision whether to report conclusions using a log transformation of the data or
a generalized linear model with log-link and squared variance function seems to matter
little. Table 2 gives the estimated regression coefficients and their standard errors for
both models, while Table 3 displays the standard analysis of variance and deviance tables
for the $3^3$ design. As suggested by our discussion in § 5, they are remarkably similar. In
either case confidence intervals for functions of the parameter estimates would be
calculated on the log scale, with back transformation to the original scale if needed.
Table 2. Estimates and standard errors for data-transformation and quasi-likelihood models, linear terms, fitted to the textile data

               Data transformation        Quasi-likelihood
Variable       Estimate    Std error      Estimate    Std error
Intercept       6.335       0.0357         6.349       0.0352
Length          0.832       0.0438         0.842       0.0431
Amplitude      -0.631       0.0438        -0.631       0.0431
Load           -0.392       0.0438        -0.385       0.0431
Dispersion     $\hat\sigma^2 = 0.0420$    $\hat\phi = 0.0335$
Table 3. Analysis of variance and analysis of deviance tables for the textile data, with a main-effects model

                      Data transformation       Quasi-likelihood
Source       d.f.     SS       MS = SS/d.f.     Dev.     Dev./d.f.
Length         2      12.51    6.25             12.69    6.34
Amplitude      2       7.17    3.58              6.89    3.45
Load           2       2.80    1.40              2.60    1.30
Residual      20       0.72    0.036             0.71    0.035
Total         26      23.20                     22.89

d.f., degrees of freedom; SS, sum of squares; MS, mean square; Dev., deviance.
Example 2: Seed cultivation. This example concerns the dependence of germination
rate on seed type and cultivation medium (Crowder, 1978). The response variable $y_{ij}$ is
the number of germinating seeds of type $i$ planted in medium $j$ out of a total of $n_{ij}$
planted. Crowder introduced this example to illustrate the applicability of the
beta-binomial distribution for data which exhibit extra-binomial variation; the value of
Pearson's $X^2$ statistic is 31.51 on 17 degrees of freedom. Subsequently, Williams (1982)
introduced a quasi-likelihood model for analysis of such data that includes the
beta-binomial as a special case.
Consider unobservable random variables $p_{ij}$, independently distributed on the unit
interval with means $\pi_{ij}$ and variances $\theta\pi_{ij}(1-\pi_{ij})$. Assume that conditional on $p_{ij}$,
$y_{ij} \sim B(n_{ij}, p_{ij})$, so that unconditionally $y_{ij}$ has mean $\mu_{ij} = n_{ij}\pi_{ij}$ and variance
$\phi_{ij}\,n_{ij}\pi_{ij}(1-\pi_{ij})$, where $\phi_{ij} = 1 + \theta(n_{ij} - 1)$. For fixed $\theta$, maximum quasi-likelihood estimates
of $\pi_{ij}$ are easily obtained. As an estimate of $\theta$, Williams suggested a moment-like estimator
based on equating Pearson's $X^2$ to its expectation under the model. Subsequently, Brooks
(1984) suggested an alternative estimate, based on maximizing the beta-binomial profile
likelihood function of $\theta$, evaluated at the maximum quasi-likelihood estimates of $\pi$
rather than at the maximum likelihood estimates, assuming the beta-binomial distribution.
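The unconditional variance quoted above follows from the law of total variance. The step is written out here with subscripts dropped, using only the stated moments $E(p) = \pi$ and $\operatorname{var}(p) = \theta\pi(1-\pi)$:

```latex
\begin{align*}
\operatorname{var}(y)
  &= E\{\operatorname{var}(y \mid p)\} + \operatorname{var}\{E(y \mid p)\}
   = n\,E\{p(1-p)\} + n^{2}\operatorname{var}(p), \\
E\{p(1-p)\}
  &= \pi - \{\operatorname{var}(p) + \pi^{2}\}
   = (1-\theta)\,\pi(1-\pi), \\
\operatorname{var}(y)
  &= n\pi(1-\pi)\{(1-\theta) + n\theta\}
   = \{1 + \theta(n-1)\}\,n\pi(1-\pi).
\end{align*}
```

Only the first two moments of $p$ enter, which is why the quasi-likelihood model contains the beta-binomial as a special case without requiring it.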
[Figure 2: $Q_{PL}(\theta)$, vertical axis from -2.0 to 0, plotted against $\theta$, horizontal axis from 0 to 0.06.]
Fig. 2. Profile quasi-likelihood function, solid curve, and profile log-likelihood function, dashed curve, of $\theta$
for Example 2. The latter is based on a beta-binomial distribution for $y$. Both functions have been standardized
to have a maximum of zero. The dotted line indicates a 95% likelihood-type interval for plausible values
of $\theta$ under the two models.
The use of the extended quasi-likelihood function provides a means of estimating jointly
the parameters $\pi_{ij}$ and $\theta$.
Consider the extended quasi-likelihood function $Q^+_\theta$ defined by
$$\operatorname{logit}(\mu_{ij}/n_{ij}) = \alpha_i + \beta_j + \gamma_{ij}, \qquad \phi_{ij} = 1 + \theta(n_{ij} - 1), \qquad \operatorname{var}(y_{ij}) = \phi_{ij}V(\mu_{ij}),$$
where $V(\mu_{ij})$ is the usual binomial variance function. Figure 2 is a plot of the profile
quasi-likelihood function of $\theta$ obtained by maximizing $Q^+_\theta(y;\mu)$ over $\mu$ holding $\theta$ fixed.
The maximum quasi-likelihood estimate of $\theta$ is 0.0129. This is not significantly different
from zero as the significance level based on a $\chi^2_1$ approximation to the quasi-likelihood
ratio statistic is 0.12.
For comparative purposes the profile likelihood function of $\theta$ assuming a beta-binomial
distribution for $y$ is also plotted in Fig. 2. The shapes of the two functions are qualitatively
similar, both attaining their maximum near 0.012. For small values of $\theta$ less than 0.012
the two functions are nearly identical. Hence both specifications indicate that $\theta$ is not
significantly different from zero. Evidently Pearson's $X^2$ is picking up departures from
the model of a form different from that suggested by these models of extra-binomial
variation.
The 95% likelihood interval for 0 under the beta-binomial specification is shorter than
that based on the quasi-likelihood model. This is intuitively reasonable since stronger
assumptions should lead to tighter intervals.
Example 3: Insurance claims. McCullagh & Nelder (1983, Ch. 7) cite an example,
given by L. A. Baxter, S. M. Coutts and G. A. F. Ross in unpublished conference
proceedings, concerning damage claims to privately owned and comprehensively insured
cars. The data consist of the average claims $y$ for each combination of policy holder's
age PA, with 8 levels, car group CG, with 4 levels, and vehicle age VA, with 4 levels. The
number of claims on which each average is based is also available.
[Figure 3: contour plot of the profile quasi-likelihood over $(\theta, \psi)$, with $\theta$ from 1.0 to 3.0 and $\psi$ from -2.0 to 0.]
Fig. 3. Profile quasi-likelihood function of the variance and link function parameters $(\theta, \psi)$ for Example 3.
The maximizing value is $(\hat\theta, \hat\psi) = (2.4, -0.75)$. The solid contours are labelled in powers of $2^k$ for $k = -1(1)4$.
Dashed contour, a 95% likelihood-type interval for plausible values of $(\theta, \psi)$; square, McCullagh & Nelder's
(1983) model; dotted line, intersection of data transformation and quasi-likelihood models. A power
transformation, $y^{\psi}$, with $\psi$ between $-\tfrac12$ and 0 is supported by the data.
In their analysis, L. A. Baxter et al. use a standard linear model with additive main
effects. McCullagh & Nelder (1983) use the generalized linear model with
$$\mu^{-1} = \mathrm{PA} + \mathrm{CG} + \mathrm{VA}, \tag{13a}$$
$$V(\mu) = \mu^{2}/N, \tag{13b}$$
where $N$ is the number of claims on which $y$ is based. They justify this model by:
(i) holding the variance function fixed as in (13b), and relaxing (13a) to
$\mu^{\psi} = \mathrm{PA} + \mathrm{CG} + \mathrm{VA}$, and plotting $D(y;\hat\mu)$ versus $\psi$; and
(ii) holding the linear predictor fixed as in (13a), and relaxing (13b) to $V(\mu) = \mu^{\theta}$,
and tabulating values of $Q_{PL}(\theta)$ for several values of $\theta$.
We extend McCullagh & Nelder's treatment of these data by considering the general model
$$\mu^{\psi} = \mathrm{PA} + \mathrm{CG} + \mathrm{VA}, \qquad V_\theta(\mu) = \mu^{\theta}/N.$$
Figure 3 shows the contour plot of $Q_{PL}(\theta, \psi)$. The contours are oriented almost parallel
to the coordinate axes, indicating that the parameters $\theta$ and $\psi$ are approximately
uncorrelated. The maximum of the extended quasi-likelihood function is at $(2.4, -0.75)$
and the dotted line gives the limit based on $\chi^2_2(0.05)$. The point $(\theta, \psi) = (2, -1)$ indicated
by the square on Figure 3 corresponds to McCullagh & Nelder's model. Thus, this
particular choice is not contradicted by the data. The line superimposed on the plot is
$\theta = 2(1 - \psi)$, which corresponds to the intersection of quasi-likelihood and
data-transformation models. Several data-transformation models are supported by the data including
$g(y) = \log y$ and $g(y) = y^{-1/3}$.
The quasi-likelihood model (13) and the data-transformation model, $g(y) = \log y$, yield
similar conclusions about damage claims. See McCullagh & Nelder (1983) for details.
Here the $y$-variable is of the extensive type, so that the data-transformation model,
although simpler to fit, is the less attractive.
7. CONCLUSION
The use of the extended quasi-likelihood Q+ allows the comparison of generalized
linear models in which the random component is specified only in respect of its first two
moments, and in which the link function, variance function, and/or dispersion parameter
may be parameterized. Standard techniques of model fitting and comparison may then
be applied to this flexible class of models.
ACKNOWLEDGEMENTS
The Department of Mathematics at the University of Western Australia and the Division
of Mathematics and Statistics at CSIRO, Perth, provided a stimulating environment
where our initial work on this paper originated. We gratefully acknowledge the support
of these institutions and many helpful discussions with our colleagues there, especially
N. A. Campbell and I. R. James. We also thank David M. Gay of AT&T Bell Laboratories
for computing assistance in carrying out a number of bootstrap experiments.
REFERENCES
BARNDORFF-NIELSEN, O. & COX, D. R. (1979). Edgeworth and saddlepoint approximations with statistical
applications (with discussion). J. R. Statist. Soc. B 41, 279-312.
BOX, G. E. P. & COX, D. R. (1964). An analysis of transformations (with discussion). J. R. Statist. Soc. B
26, 211-43.
BROOKS, R. J. (1984). Approximate likelihood ratio tests in the analysis of beta-binomial data. Appl. Statist.
33, 285-9.
CROWDER, M. J. (1978). Beta-binomial anova for proportions. Appl. Statist. 27, 34-7.
EFRON, B. (1979). Bootstrap methods: Another look at the jackknife. Ann. Statist. 7, 1-26.
EFRON, B. (1986). Double exponential families and their use in generalized linear regression. J. Am. Statist.
Assoc. 81, 709-21.
JORGENSEN, B. (1983). Maximum likelihood estimation and large-sample inference for generalized linear
and nonlinear regression models. Biometrika 70, 19-28.
JORGENSEN, B. (1987). Exponential dispersion models (with discussion). J. R. Statist. Soc. To appear.
MCCULLAGH, P. (1983). Quasi-likelihood functions. Ann. Statist. 11, 59-67.
MCCULLAGH, P. & NELDER, J. A. (1983). Generalized Linear Models. London: Chapman and Hall.
MORRIS, C. N. (1982). Natural exponential families with quadratic variance functions. Ann. Statist. 10, 65-80.
MORTON, R. (1981). Efficiency of estimating equations and the use of pivots. Biometrika 68, 227-33.
PREGIBON, D. (1984). Review of Generalized Linear Models by McCullagh and Nelder. Ann. Statist. 12,
1589-96.
TWEEDIE, M. C. K. (1981). An index which distinguishes between some important exponential families. In
Proc. Ind. Statist. Inst. Golden Jub. Int. Conf., pp. 579-604. Calcutta: Indian Statistical Institute.
WEDDERBURN, R. W. M. (1974). Quasi-likelihood functions, generalized linear models and the Gauss-
Newton method. Biometrika 61, 439-47.
WILLIAMS, D. A. (1982). Extra-binomial variation in logistic linear models. Appl. Statist. 31, 144-8.
[Received November 1984. Revised August 1986]
