
Biometrika (1974), 61, 3, p. 439

Printed in Great Britain

Quasi-likelihood functions, generalized linear models,
and the Gauss-Newton method

BY R. W. M. WEDDERBURN

Rothamsted Experimental Station, Harpenden, Herts.

SUMMARY

To define a likelihood we have to specify the form of distribution of the observations, but to define a quasi-likelihood function we need only specify a relation between the mean and variance of the observations, and the quasi-likelihood can then be used for estimation. For a one-parameter exponential family the log likelihood is the same as the quasi-likelihood, and it follows that assuming a one-parameter exponential family is the weakest sort of distributional assumption that can be made. The Gauss-Newton method for calculating nonlinear least squares estimates generalizes easily to deal with maximum quasi-likelihood estimates, and a rearrangement of this produces a generalization of the method described by Nelder & Wedderburn (1972).

Some key words: Estimation; Exponential families; Gauss-Newton method; Generalized linear models; Maximum likelihood; Quasi-likelihood.

1. INTRODUCTION

This paper is mainly concerned with fitting regression models, linear or nonlinear, in which the variance of each observation is specified to be either equal to, or proportional to, some function of its expectation. If the form of distribution of the observations were specified, the method of maximum likelihood would give estimates of the parameters in the model. For instance, if it is specified that the observations have normally distributed errors with constant variance, then the method of least squares provides expressions for the variances and covariances of the estimates, exact for linear models and approximate for nonlinear ones, and these estimates and the expressions for their errors remain valid even if the observations are not normally distributed but merely have a fixed variance; thus, with linear models and a given error variance, the variance of least squares estimates is not affected by the distribution of the errors, and the same holds approximately for nonlinear ones.
A more general situation is considered in this paper, namely the situation when there is a given relation between the variance and mean of the observations, possibly with an unknown constant of proportionality. A similar problem was considered from a Bayesian viewpoint by Hartigan (1969). We define a quasi-likelihood function, which can be used for estimation in the same way as a likelihood function. With constant variance this again leads to least squares estimation. When other mean-variance relationships are specified, the quasi-likelihood sometimes turns out to be a recognizable likelihood function; for instance, for a constant coefficient of variation the quasi-likelihood function is the same as the likelihood obtained by treating the observations as if they had a gamma distribution.
Then computational methods are discussed. The well-known Gauss-Newton method for calculation of nonlinear least squares estimates is generalized to provide a method of calculating maximum quasi-likelihood estimates.

When there exists a function of the means that is linear in the parameters, a rearrangement of the calculations in the generalized Gauss-Newton method gives a procedure identical to that described by Nelder & Wedderburn (1972). This method produces maximum likelihood estimates by iterative weighted least squares when the distribution of observations has a certain form and there is a transformation of the mean which makes it linear in the parameters. The distributions that can be treated in this way are those for which the likelihoods are identical to the quasi-likelihoods; thus the present result generalizes that of Nelder & Wedderburn.

The approach described in this paper sheds new light on some existing data-analytic techniques, and also suggests new ones. An example is given to illustrate the method.

2. DEFINITION OF THE QUASI-LIKELIHOOD FUNCTION

Suppose we have independent observations z_i (i = 1, ..., n) with expectations μ_i and variances V(μ_i), where V is some known function. Later we shall relax this specification and say var(z_i) ∝ V(μ_i). We suppose that for each observation, μ_i is some known function of a set of parameters β_1, ..., β_m. Then for each observation we define the quasi-likelihood function K(z_i, μ_i) by the relation

\[ \frac{\partial K(z_i, \mu_i)}{\partial \mu_i} = \frac{z_i - \mu_i}{V(\mu_i)}, \tag{1} \]

or equivalently

\[ K(z_i, \mu_i) = \int^{\mu_i} \frac{z_i - t}{V(t)}\, dt + \text{function of } z_i. \]

From now on, when convenient, the subscript i will be dropped, so that z and μ will refer to an observation and its expectation, respectively. Also, the notation S(.) will be used to denote summation over the observations, so that S(z) is the sum of the observations. We shall find that K has many properties in common with a log likelihood function. In fact, we find that K is the log likelihood function of the distribution if z comes from a one-parameter exponential family, as will be shown in §4.
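As an added numerical aside (not in the original paper): for the variance function V(μ) = μ, the integral defining K has the closed form K = z log μ − μ, which differs from the Poisson log likelihood by a function of z alone, anticipating the exponential-family result of §4. A minimal sketch:

```python
import math

def quasi_loglik(z, mu):
    """K(z, mu) for V(mu) = mu: integral of (z - t)/t dt = z*log(mu) - mu."""
    return z * math.log(mu) - mu

def poisson_loglik(z, mu):
    """Poisson log likelihood: z*log(mu) - mu - log(z!)."""
    return z * math.log(mu) - mu - math.lgamma(z + 1)

# The difference between the two depends on z only, not on mu:
z = 3
d1 = poisson_loglik(z, 1.5) - quasi_loglik(z, 1.5)
d2 = poisson_loglik(z, 4.0) - quasi_loglik(z, 4.0)
print(abs(d1 - d2) < 1e-12)  # True
```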

3. PROPERTIES OF QUASI-LIKELIHOODS

It is now shown that the function K has properties similar to those of log likelihoods.

THEOREM 1. Let z and K be defined as in §2, and suppose that μ is expressed as a function of parameters β_1, ..., β_m. Then K has the following properties:

\[ \text{(i)}\quad E\Big(\frac{\partial K}{\partial \mu}\Big) = 0, \qquad \text{(ii)}\quad E\Big(\frac{\partial K}{\partial \beta_i}\Big) = 0, \]

\[ \text{(iii)}\quad E\Big\{\Big(\frac{\partial K}{\partial \mu}\Big)^2\Big\} = -E\Big(\frac{\partial^2 K}{\partial \mu^2}\Big) = \frac{1}{V(\mu)}, \]

\[ \text{(iv)}\quad E\Big(\frac{\partial K}{\partial \beta_i}\,\frac{\partial K}{\partial \beta_j}\Big) = -E\Big(\frac{\partial^2 K}{\partial \beta_i\,\partial \beta_j}\Big) = E\Big\{\frac{1}{V(\mu)}\,\frac{\partial \mu}{\partial \beta_i}\,\frac{\partial \mu}{\partial \beta_j}\Big\}. \]
Proof. First, (i) follows immediately from the definition of K. Then (ii) follows on noting that ∂K/∂β_i = (∂K/∂μ)(∂μ/∂β_i), and (iii) is a special case of (iv).

To prove (iv), we note that

\[ E\Big(\frac{\partial K}{\partial \beta_i}\,\frac{\partial K}{\partial \beta_j}\Big) = E\Big\{\frac{(z-\mu)^2}{V(\mu)^2}\Big\}\,\frac{\partial \mu}{\partial \beta_i}\,\frac{\partial \mu}{\partial \beta_j} = \frac{1}{V(\mu)}\,\frac{\partial \mu}{\partial \beta_i}\,\frac{\partial \mu}{\partial \beta_j}, \]

since V(μ) = var(z). Also we have

\[ -E\Big(\frac{\partial^2 K}{\partial \beta_i\,\partial \beta_j}\Big) = -E\Big[\frac{\partial}{\partial \mu}\Big\{\frac{z-\mu}{V(\mu)}\Big\}\Big]\,\frac{\partial \mu}{\partial \beta_i}\,\frac{\partial \mu}{\partial \beta_j} - E\Big\{\frac{z-\mu}{V(\mu)}\Big\}\,\frac{\partial^2 \mu}{\partial \beta_i\,\partial \beta_j} = \frac{1}{V(\mu)}\,\frac{\partial \mu}{\partial \beta_i}\,\frac{\partial \mu}{\partial \beta_j}, \]

which completes the proof.


A result which will be discussed further in §4 is as follows.

COROLLARY. If the distribution of z is specified in terms of μ, so that the log likelihood L can be defined, then

\[ -E\Big(\frac{\partial^2 L}{\partial \mu^2}\Big) \geq -E\Big(\frac{\partial^2 K}{\partial \mu^2}\Big). \tag{2} \]

Proof. From the theorem just proved, the above statement is equivalent to

\[ \mathrm{var}(z) \geq 1\Big/\Big\{-E\Big(\frac{\partial^2 L}{\partial \mu^2}\Big)\Big\}, \]

a result which follows immediately from the Cramér-Rao inequality (Kendall & Stuart, 1973, §17.14).

4. LIKELIHOODS OF EXPONENTIAL FAMILIES

It is possible to define a log likelihood if a one-parameter family of distributions with μ as parameter is specified for z. The following theorem shows that the log likelihood function is identical to the quasi-likelihood if and only if this family is an exponential family.

THEOREM 2. For one observation of z, the log likelihood function L has the property

\[ \frac{\partial L}{\partial \mu} = \frac{z - \mu}{V(\mu)}, \tag{3} \]

where μ = E(z) and V(μ) = var(z), if and only if the density of z with respect to some measure can be written in the form exp{zθ − g(θ)}, where θ is some function of μ.

Proof. If ∂L/∂μ has the form (3), then integrating with respect to μ and defining

\[ \theta = \int \frac{d\mu}{V(\mu)}, \]
we have the required result. Conversely, suppose for some measure m on the real line the distribution of z is given by exp{zθ − g(θ)} dm(z). Then ∫ e^{zθ} dm(z) = e^{g(θ)}, and so the moment generating function M(t) of z is

\[ \int e^{z(\theta + t)}\, e^{-g(\theta)}\, dm(z) = e^{g(\theta + t) - g(\theta)}. \]

It follows that g(θ + t) − g(θ), regarded as a function of t, is the cumulant generating function of z. Hence g′(θ) = μ and g″(θ) = V(μ); also dμ/dθ = g″(θ) = V(μ). Then we have

\[ \frac{\partial L}{\partial \mu} = \{z - g'(\theta)\}\,\frac{d\theta}{d\mu} = \frac{z - \mu}{V(\mu)}. \]

This completes the proof.
If K really is a log likelihood, then the theorem shows that, given V(μ), we can construct θ and g(θ) by integration. Theorem 9 of Lehmann (1959) shows that V and g must be analytic functions and that the characteristic function φ(t) of z is analytic on the whole real line and given by φ(t) = exp{g(θ + it) − g(θ)}. Thus, in principle, the problem of determining whether or not K is a log likelihood is reduced to the problem of determining whether a given complex function φ(t), analytic over the whole real line, is a characteristic function; this point will not be pursued further here.

In the corollary to Theorem 1 in §3 it was shown that

\[ -E\Big(\frac{\partial^2 L}{\partial \mu^2}\Big) \geq -E\Big(\frac{\partial^2 K}{\partial \mu^2}\Big). \]

Then Theorem 2 shows that this inequality becomes an equality for a one-parameter exponential family. Thus, for a given mean-variance relationship, a one-parameter exponential family minimizes the information −E(∂²L/∂μ²), provided that an exponential family exists for that relationship.

It seems reasonable to regard −E(∂²K/∂μ²), which is equal to 1/var(z), as a measure of the information z gives concerning μ when only the mean-variance relationship is known, and to regard E{∂²(K − L)/∂μ²}, which is always nonnegative, as the additional information provided by knowing the distribution of z. From this point of view, assuming a one-parameter exponential family for z is equivalent to making no assumption other than the mean-variance relationship.
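An added worked instance of the construction by integration (not in the original): for the variance function V(μ) = μ(1 − μ), integrating dμ/V(μ) gives θ = log{μ/(1 − μ)} and g(θ) = log(1 + e^θ), recovering the Bernoulli family exp{zθ − g(θ)}. The sketch below checks g′(θ) = μ and g″(θ) = V(μ) numerically:

```python
import math

def theta(mu):
    # theta = integral of d(mu) / {mu (1 - mu)} = logit(mu)
    return math.log(mu / (1.0 - mu))

def g(th):
    # cumulant function of the Bernoulli family
    return math.log(1.0 + math.exp(th))

mu = 0.3
h = 1e-5
th = theta(mu)
g1 = (g(th + h) - g(th - h)) / (2 * h)            # numerical g'(theta)
g2 = (g(th + h) - 2 * g(th) + g(th - h)) / h**2   # numerical g''(theta)
print(abs(g1 - mu) < 1e-8, abs(g2 - mu * (1 - mu)) < 1e-4)
```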

5. ESTIMATION USING QUASI-LIKELIHOODS

This section discusses maximum quasi-likelihood estimates and shows that their precision may be estimated from the expected second derivatives of K in the same way as the precision of maximum likelihood estimates may be estimated from the expected second derivatives of the log likelihood.

For each observation, let u be the vector whose components are ∂K/∂β_i. Then, from Theorem 1, u has mean 0 and dispersion matrix with elements

\[ E\Big\{\frac{1}{V(\mu)}\,\frac{\partial \mu}{\partial \beta_i}\,\frac{\partial \mu}{\partial \beta_j}\Big\}. \]

Let H be the matrix with elements ∂²S(K)/∂β_i∂β_j; then, if the observations are independent, S(u) has mean 0 and dispersion D = −E(H). Now let β̂ be the maximum quasi-likelihood estimate of β, obtained by setting S(u) equal to its expectation, 0. To first order in β̂ − β we have S(u) ≈ H(β − β̂), whence β − β̂ ≈ H⁻¹S(u).
Approximating to H by its expectation, −D, we have

\[ \hat{\beta} \simeq \beta + D^{-1} S(u). \]

Now D⁻¹S(u) has dispersion D⁻¹; hence we have deduced, rather informally, the following result.

THEOREM 3. Maximum quasi-likelihood estimates have approximate dispersion matrix D⁻¹ = {−E(H)}⁻¹, where H is the matrix of second derivatives of S(K).

Next, we consider the case where the mean-variance relation is not known completely, but the variance is known to be proportional to a given function of the mean, i.e. var(z) = γV(μ), where V is a known function but γ is unknown. Clearly the maximum quasi-likelihood estimate of β is not affected by the value of γ, so that we can calculate β̂ as if γ were known to be 1. To obtain error estimates we need some estimate of γ. Assuming that μ is approximately linear in β and that V(μ̂) differs negligibly from V(μ), we have the approximation

\[ E\Big[S\Big\{\frac{(z - \hat{\mu})^2}{V(\hat{\mu})}\Big\}\Big] \simeq (n - m)\,\gamma, \]

which leads to an estimate of γ given by

\[ \hat{\gamma} = \frac{1}{n - m}\, S\Big\{\frac{(z - \hat{\mu})^2}{V(\hat{\mu})}\Big\}. \]

For normal linear models, this gives the usual estimate of variance.
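The estimate of γ can be sketched as follows; the fitted values, the variance function V(μ) = μ, and the numbers are all invented for illustration:

```python
def gamma_hat(z, mu_hat, V, m):
    """gamma-hat = S{(z - mu)^2 / V(mu)} / (n - m), with m fitted parameters."""
    n = len(z)
    return sum((zi - mi) ** 2 / V(mi) for zi, mi in zip(z, mu_hat)) / (n - m)

z      = [2.0, 5.0, 1.0, 7.0]
mu_hat = [2.5, 4.5, 1.5, 6.5]   # fitted means (made up)
g = gamma_hat(z, mu_hat, V=lambda mu: mu, m=1)
print(round(g, 4))  # 0.1202
```

With V(μ) = 1 the same formula reduces to the usual residual mean square of a normal linear model, as the text notes.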

6. A GENERALIZATION OF THE GAUSS-NEWTON METHOD

When V(μ) = 1, maximum quasi-likelihood estimation reduces to least squares. One method of calculating the estimates is then the Gauss-Newton method. This is an iterative process in which one calculates a regression of the residuals on the quantities ∂μ/∂β_i by linear least squares, the residuals and ∂μ/∂β_i being calculated from the current estimate of μ. The resulting regression coefficients are then used as corrections to the β̂_i. It will now be shown that, to calculate maximum quasi-likelihood estimates with a general V, the Gauss-Newton method can be modified simply by using the current estimate of 1/V(μ) as a weight variate in the least squares calculation.

Writing v_i for ∂μ/∂β_i, and r for z − μ, we have

\[ \frac{\partial S(K)}{\partial \beta_i} = S\Big\{\frac{r\,v_i}{V(\mu)}\Big\}, \]
and, using Theorem 1,

\[ E\Big(\frac{\partial^2 S(K)}{\partial \beta_i\,\partial \beta_j}\Big) = -S\Big\{\frac{v_i\,v_j}{V(\mu)}\Big\}. \]

Then, if we obtain successive approximations to β̂ using the Newton-Raphson method with the second derivatives of K replaced by their expectations, we obtain corrections δβ_j to the estimates given, for i = 1, ..., m, by

\[ \sum_j S\Big\{\frac{v_i\,v_j}{V(\mu)}\Big\}\,\delta\beta_j = S\Big\{\frac{r\,v_i}{V(\mu)}\Big\}. \tag{4} \]
Hence we have proved the following:

THEOREM 4. Using the Newton-Raphson method with expected second derivatives of K to calculate β̂ is equivalent to iteratively calculating a weighted linear regression of the residuals, r, on the derivatives of μ with respect to the β's, with weight 1/V(μ), and using the regression coefficients as corrections to β̂.

Here V(μ) and the derivatives of μ are calculated at the current estimate of μ.
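As an added illustration (not from the paper), the following sketch applies the generalized Gauss-Newton iteration to an assumed one-parameter model μ_i = exp(βx_i) with V(μ) = μ and made-up data: each cycle regresses the residuals r_i = z_i − μ_i on v_i = ∂μ_i/∂β with weight 1/V(μ_i), here written out directly from (4).

```python
import math

# Assumed model and invented data: mu_i = exp(beta * x_i), V(mu) = mu.
x = [0.0, 1.0, 2.0, 3.0]
z = [1.0, 3.0, 7.0, 21.0]

beta = 0.0
for _ in range(50):
    mu = [math.exp(beta * xi) for xi in x]
    v = [xi * mi for xi, mi in zip(x, mu)]                         # v = d(mu)/d(beta)
    num = sum(vi * (zi - mi) / mi for vi, zi, mi in zip(v, z, mu)) # S{r v / V}
    den = sum(vi * vi / mi for vi, mi in zip(v, mu))               # S{v^2 / V}
    beta += num / den                                              # weighted LS correction

# At convergence the quasi-score S{v (z - mu)/V(mu)} vanishes:
mu = [math.exp(beta * xi) for xi in x]
score = sum(xi * mi * (zi - mi) / mi for xi, mi, zi in zip(x, mu, z))
print(abs(score) < 1e-8)  # True
```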

7. GENERALIZED LINEAR MODELS

We now derive a result which includes the result of Nelder & Wedderburn (1972) as a special case. Suppose that some function of the mean f(μ) can be expressed in the form

\[ f(\mu) = \sum_i \beta_i x_i = Y, \]

say, where the x's are known variables. Then, in the notation of the previous section, v_i = x_i dμ/dY. Hence (4) may be rewritten

\[ \sum_j S\Big\{\frac{x_i\,x_j}{V(\mu)}\Big(\frac{d\mu}{dY}\Big)^2\Big\}\,\delta\beta_j = S\Big\{\frac{x_i\,r}{V(\mu)}\,\frac{d\mu}{dY}\Big\}. \]

Then if β_i denotes the current estimates and β_i* = β_i + δβ_i the corrected ones, and if Y* = Y + r dY/dμ, we have

\[ \sum_j S\Big\{\frac{x_i\,x_j}{V(\mu)}\Big(\frac{d\mu}{dY}\Big)^2\Big\}\,\beta_j^* = S\Big\{\frac{x_i}{V(\mu)}\Big(\frac{d\mu}{dY}\Big)^2 Y^*\Big\}, \]

which proves the next theorem.
THEOREM 5. When Y = f(μ) = Σ_i β_i x_i, a method equivalent to the generalized Gauss-Newton method already described is to calculate repeatedly a weighted linear regression of

\[ Y^* = Y + (z - \mu)\,\frac{dY}{d\mu} \]

on x_1, ..., x_m, using as weighting variate

\[ W = \Big(\frac{d\mu}{dY}\Big)^2 \Big/ V(\mu). \tag{5} \]

Nelder & Wedderburn showed that this technique could be used to obtain maximum likelihood estimates when there was a linearizing transformation of the mean f(μ), and the distribution of z could be expressed in the form

\[ \pi(z;\theta,\phi) = \exp\big[\alpha(\phi)\{z\theta - g(\theta) + h(z)\} + \beta(z,\phi)\big], \]

where θ is a function of μ and φ is a nuisance parameter. For fixed φ this gives a one-parameter exponential family, so that the likelihood is the same as the quasi-likelihood. Also, by a simple extension of the argument used in Theorem 2, we have var(z) = g″(θ)/α(φ). Hence the mean-variance relationship is of the form given in (3), and the result of Nelder & Wedderburn is a special case of Theorem 5.

A good starting approximation in this process is usually given by setting μ = z and calculating w from (5) and y as f(z), but some modification may be needed when f has singularities at the ends of the range of possible z.
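A minimal sketch of the iteration of Theorem 5, for an assumed log-linear model log μ_i = βx_i with V(μ) = μ and invented data: here dμ/dY = μ, so the weight (5) is W = μ and the working variate is Y* = Y + (z − μ)/μ. Being a rearrangement of the generalized Gauss-Newton method, each cycle is now a single weighted regression of Y* on x.

```python
import math

# Assumed model and invented data: log(mu_i) = beta * x_i, V(mu) = mu.
x = [0.0, 1.0, 2.0, 3.0]
z = [1.0, 3.0, 7.0, 21.0]

beta = 0.0
for _ in range(50):
    mu = [math.exp(beta * xi) for xi in x]
    w = mu                                                       # W = (dmu/dY)^2 / V(mu)
    ystar = [math.log(mi) + (zi - mi) / mi                       # Y* = Y + r dY/dmu
             for mi, zi in zip(mu, z)]
    beta = (sum(wi * xi * yi for wi, xi, yi in zip(w, x, ystar))
            / sum(wi * xi * xi for wi, xi in zip(w, x)))         # weighted regression

print(round(beta, 3))  # ≈ 1.01
```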

8. EXAMPLE

J. F. Jenkyn, in an unpublished Aberystwyth Ph.D. thesis, discussed the data of Table 1, which gives estimates of the percentage leaf area of barley infected with Rhynchosporium secalis, or leaf blotch, for 10 different varieties grown at 9 different sites in a variety trial in 1965.

Jenkyn applied the angular transformation to the data, and then applied the method of Finlay & Wilkinson (1963), calculating, for each variety, regressions of the transformed percentages on the site means of the transformed percentages. He found a marked relationship between the variety means and the slopes of the regressions, and also between the variety means and the residual variances from the regression. Thus the angular transformation failed to produce additivity or to stabilize the variance; in fact, it appeared that a transformation with a more extreme effect at the ends of the range, or at least at the lower end, was needed; Jenkyn suggested a logarithmic transformation. Two others suggest themselves: the logistic transformation, log(p/q), and the complementary log log transformation, log(−log q).

Table 1. Incidence of R. secalis on leaves of 10 varieties grown at nine sites; percentage leaf area affected

                                      Variety
Site      1      2      3      4      5      6      7      8      9     10    Mean
 1      0.05   0.00   0.00   0.10   0.25   0.05   0.50   1.30   1.50   1.50   0.52
 2      0.00   0.05   0.05   0.30   0.75   0.30   3.00   7.50   1.00  12.70   2.56
 3      1.25   1.25   2.50  16.60   2.50   2.50   0.00  20.00  37.50  26.25  11.03
 4      2.50   0.50   0.01   3.00   2.50   0.01  25.00  55.00   5.00  40.00  13.35
 5      5.50   1.00   6.00   1.10   2.50   8.00  16.50  29.50  20.00  43.50  13.36
 6      1.00   5.00   5.00   5.00   5.00   5.00  10.00   5.00  50.00  75.00  16.60
 7      5.00   0.10   5.00   5.00  50.00  10.00  50.00  25.00  50.00  75.00  27.51
 8      5.00  10.00   5.00   5.00  25.00  75.00  50.00  75.00  75.00  75.00  40.00
 9     17.50  25.00  42.50  50.00  37.50  95.00  62.50  95.00  95.00  95.00  61.50

Mean    4.20   4.77   7.34   9.57  14.00  21.76  24.17  34.81  37.22  49.33

An attempt was made to analyze the data using a logistic transformation. To do this, zero had to be replaced by some suitably small value; since the value 0.01% occurs in the data we could hardly replace zero by something greater than this, unless 0.01% were to be increased too. It was found that some of these small values in the data gave large negative residuals which had a serious distorting effect on the analysis, and only when these values were ignored was it possible to obtain a satisfactory analysis.

The logistic transformation appeared to be about right for stabilizing variance and producing additivity, except for the undesirable effect of the small observations. This led to a different formulation of the model which was the same to a first order of approximation, but which avoided the problems caused by applying a logistic transformation to small observations.

Let p_ij denote the proportion of leaf area infected in the ith variety at the jth site. Let P_ij = E(p_ij) and Q_ij = 1 − P_ij. The model is stated as

\[ \mathrm{logit}\, P_{ij} = Y_{ij} = m + a_i + b_j, \qquad \mathrm{var}(p_{ij}) \propto P_{ij}^2 Q_{ij}^2. \]
When the method of §6 is applied, the weighting variate is equal to 1. This is useful because it means that the iterative analysis remains orthogonal. The modified y variate, f(μ), takes the form of a 'working logit', namely

\[ Y_{ij}^* = Y_{ij} + \frac{p_{ij} - P_{ij}}{P_{ij} Q_{ij}}. \]

The quantities r_ij = (p_ij − P_ij)/(P_ij Q_ij) can be regarded as weighted residuals. They are proportional to residuals divided by their estimated standard error. Evidently the calculations are like the usual ones for fitting a logistic model to quantal data, but simpler, because the weights are equal to 1.

When the model was fitted by the method described, there were, of course, clear differences between sites, and between varieties. The estimate of γ obtained from the residual mean square came to 0.995, a value which indicates high variability in the data. It implies, for instance, that if P_ij is 0.20, the standard deviation of p_ij is about 0.16. Clearly, for this to happen, p_ij would have to have a rather skew distribution for P_ij not near 0.5; examination of the weighted residuals showed this skewness.
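A one-parameter sketch of this section's fitting scheme with invented data (the two-way site-by-variety model of the paper is not reproduced here): for logit P = βx with var(p) ∝ P²Q², the weight (5) is identically 1, so each cycle is an ordinary unweighted regression of the working logit Y* = Y + (p − P)/(PQ) on x.

```python
import math

# Invented data: observed proportions p at covariate values x.
x = [-1.0, 0.5, 1.0, 2.0]
p = [0.10, 0.60, 0.75, 0.90]

beta = 0.0
for _ in range(50):
    P = [1.0 / (1.0 + math.exp(-beta * xi)) for xi in x]
    # working logit: current linear predictor plus weighted residual
    ystar = [beta * xi + (pi - Pi) / (Pi * (1.0 - Pi))
             for xi, pi, Pi in zip(x, p, P)]
    beta = sum(xi * yi for xi, yi in zip(x, ystar)) / sum(xi * xi for xi in x)

# The weighted residuals r = (p - P)/(P Q) are orthogonal to x at the fit:
P = [1.0 / (1.0 + math.exp(-beta * xi)) for xi in x]
r = [(pi - Pi) / (Pi * (1.0 - Pi)) for pi, Pi in zip(p, P)]
print(abs(sum(xi * ri for xi, ri in zip(x, r))) < 1e-8)  # True
```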

Table 2. Means over sites of fitted values of logit P_ij

                              Variety
               1      2      3      4      5      6      7      8      9     10
Mean of
logit P_ij  -4.05  -4.51  -3.96  -3.09  -2.69  -2.71  -1.71  -0.78  -0.91  -0.16

(Standard error ± 0.331.)

The mean values of Y_ij for each variety are shown, with their standard errors, in Table 2. Clearly there are differences between varieties; there seem to be 3 highly resistant varieties and 3 less so, while the remaining 4 are much more susceptible.

Starting with the final set of working logits, the technique of Finlay & Wilkinson (1963) was applied noniteratively, but there was no sign of any interaction; nor was there any when a single degree of freedom for nonadditivity (Tukey, 1949) was isolated.

It seems, then, that a simple summary of the data has been achieved which makes it easy to see what conclusions can be drawn. The simpler method of working with logit p_ij might have worked better if the variance had not been so large; part of the trouble is that with such a large variance the approximations

\[ E(\mathrm{logit}\, p_{ij}) \simeq \mathrm{logit}\, P_{ij}, \qquad \mathrm{var}(\mathrm{logit}\, p_{ij}) \simeq \mathrm{var}(p_{ij})/(P_{ij}^2 Q_{ij}^2) \]

break down.

9. CONCLUSIONS

It may be difficult to decide what distribution one's observations follow, but the form of the mean-variance relationship is often much easier to postulate; this is what makes quasi-likelihoods useful. It has been seen how maximum quasi-likelihood estimation produced a satisfactory analysis of rather difficult data, and how these estimates can be computed.
Some procedures used in the past are best understood in terms of quasi-likelihoods. For instance, in probit analysis, when the variance of the observations is found to be greater than that predicted by the binomial distribution, it is common to accept the maximum likelihood estimates regardless, while estimating the degree of heterogeneity as in Chapter 4 of Finney (1971). If the variance is still proportional to binomial variance then this procedure can be justified in terms of quasi-likelihoods. Also Fisher (1949), finding that in some data the variance was proportional to the mean, treated them effectively as if they had a Poisson distribution, even though the measurement involved was a continuous one. Thus quasi-likelihoods improve understanding of some past procedures, as well as providing new ones.

The author wishes to thank the director of the National Institute of Agricultural Botany and Dr J. F. Jenkyn for their permission to use the data, Mr M. J. R. Healy, whose comments on an earlier version of the paper improved the presentation, and Mr R. W. Payne for running the calculations on the GENSTAT statistical program developed at Rothamsted.

REFERENCES

FINLAY, K. W. & WILKINSON, G. N. (1963). The analysis of adaptation in a plant breeding programme. Aust. J. Agric. Res. 14, 742-54.
FINNEY, D. J. (1971). Probit Analysis, 3rd edition. Cambridge University Press.
FISHER, R. A. (1949). A biological assay of tuberculins. Biometrics 5, 300-16.
HARTIGAN, J. A. (1969). Linear Bayesian models. J. R. Statist. Soc. B 31, 446-54.
KENDALL, M. G. & STUART, A. (1973). The Advanced Theory of Statistics, Vol. 2, 3rd edition. London: Griffin.
LEHMANN, E. L. (1959). Testing Statistical Hypotheses. New York: Wiley.
NELDER, J. A. & WEDDERBURN, R. W. M. (1972). Generalized linear models. J. R. Statist. Soc. A 135, 370-84.
TUKEY, J. W. (1949). One degree of freedom for non-additivity. Biometrics 5, 232-42.

[Received November 1973. Revised June 1974]
