Printed in Great Britain
Quasi-likelihood functions, generalized linear models, and the Gauss-Newton method
BY R. W. M. WEDDERBURN
Rothamsted Experimental Station, Harpenden, Herts.
SUMMARY
To define a likelihood we have to specify the form of distribution of the observations, but to define a quasi-likelihood function we need only specify a relation between the mean and variance of the observations, and the quasi-likelihood can then be used for estimation. For a one-parameter exponential family the log likelihood is the same as the quasi-likelihood, and it follows that assuming a one-parameter exponential family is the weakest sort of distributional assumption that can be made. The Gauss-Newton method for calculating nonlinear least squares estimates generalizes easily to deal with maximum quasi-likelihood estimates, and a rearrangement of this produces a generalization of the method described by Nelder & Wedderburn (1972).
Some key words: Estimation; Exponential families; Gauss-Newton method; Generalized linear models;
Maximum likelihood; Quasi-likelihood.
1. INTRODUCTION
$$K(z_i, \mu_i) = \int^{\mu_i} \frac{z_i - \mu'}{V(\mu')}\,d\mu' + \text{function of } z_i.$$
From now on, when convenient, the subscript i will be dropped, so that z and $\mu$ will refer to an observation and its expectation, respectively. Also, the notation S(.) will be used to denote summation over the observations, so that S(z) is the sum of the observations. We shall find that K has many properties in common with a log likelihood function. In fact, we find that K is the log likelihood function of the distribution if z comes from a one-parameter exponential family, as will be shown in § 4.
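The defining integral can be evaluated numerically for any assumed variance function. The following is a minimal sketch, not taken from the paper: the function name and the use of the trapezoid rule are illustrative choices. With V(μ) = μ, differences in K reproduce differences in the Poisson log likelihood z log μ − μ, as the exponential-family connection of § 4 leads one to expect.

```python
import math

def quasi_lik(z, mu, V, lo=1.0, n=10000):
    """K(z, mu) = integral from lo to mu of (z - t)/V(t) dt, computed by
    the trapezoid rule; the lower limit lo only shifts K by a function
    of z, so differences in K are unaffected by it."""
    h = (mu - lo) / n
    ts = [lo + i * h for i in range(n + 1)]
    f = [(z - t) / V(t) for t in ts]
    return h * (sum(f) - 0.5 * (f[0] + f[-1]))

# With V(mu) = mu, K(z, mu2) - K(z, mu1) should match the difference of
# the Poisson log likelihoods z*log(mu) - mu at mu2 and mu1.
diff = quasi_lik(3.0, 2.0, lambda t: t) - quasi_lik(3.0, 1.0, lambda t: t)
```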
3. PROPERTIES OF QUASI-LIKELIHOODS
It is now shown that the function K has properties similar to those of log likelihoods.
THEOREM 1. Let z and K be defined as in § 2, and suppose that $\mu$ is expressed as a function of parameters $\beta_1, \ldots, \beta_m$. Then K has the following properties:

(i) $E(\partial K/\partial\mu) = 0$,

(ii) $E(\partial K/\partial\beta_i) = 0$,

(iii) $E(\partial^2 K/\partial\mu^2) = -E\{(\partial K/\partial\mu)^2\} = -1/V(\mu)$,

(iv) $E\left(\dfrac{\partial^2 K}{\partial\beta_i\,\partial\beta_j}\right) = -E\left(\dfrac{\partial K}{\partial\beta_i}\dfrac{\partial K}{\partial\beta_j}\right) = -\dfrac{1}{V(\mu)}\dfrac{\partial\mu}{\partial\beta_i}\dfrac{\partial\mu}{\partial\beta_j}$.
Proof. First, (i) follows immediately from the definition of K. Then (ii) follows on noting that $\partial K/\partial\beta_i = (\partial K/\partial\mu)(\partial\mu/\partial\beta_i)$, and (iii) is a special case of (iv). To prove (iv), we note that

$$-E\left(\frac{\partial^2 K}{\partial\beta_i\,\partial\beta_j}\right) = -E\left[\frac{\partial}{\partial\beta_j}\left\{\frac{z-\mu}{V(\mu)}\frac{\partial\mu}{\partial\beta_i}\right\}\right] = \frac{1}{V(\mu)}\frac{\partial\mu}{\partial\beta_i}\frac{\partial\mu}{\partial\beta_j},$$

since $E(z-\mu) = 0$, while

$$E\left(\frac{\partial K}{\partial\beta_i}\frac{\partial K}{\partial\beta_j}\right) = \frac{E\{(z-\mu)^2\}}{V(\mu)^2}\frac{\partial\mu}{\partial\beta_i}\frac{\partial\mu}{\partial\beta_j} = \frac{1}{V(\mu)}\frac{\partial\mu}{\partial\beta_i}\frac{\partial\mu}{\partial\beta_j}.$$
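These identities lend themselves to a quick simulation check. The sketch below is not from the paper: it assumes normally distributed observations with constant variance, so that V(μ) = σ² and $\partial K/\partial\mu = (z-\mu)/\sigma^2$, and verifies property (iii) by Monte Carlo.

```python
import random

# Monte Carlo check of property (iii): with z ~ N(mu, sigma^2) and
# V(mu) = sigma^2 constant, dK/dmu = (z - mu)/sigma^2, so the mean of
# (dK/dmu)^2 should be close to 1/V(mu). The seed makes the run repeatable.
random.seed(0)
mu, sigma = 5.0, 2.0
draws = [random.gauss(mu, sigma) for _ in range(200000)]
mean_sq_score = sum(((z - mu) / sigma**2) ** 2 for z in draws) / len(draws)
# mean_sq_score should be close to 1/sigma^2 = 0.25
```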
COROLLARY. If the distribution of z is specified in terms of $\mu$, so that the log likelihood L can be defined, then

$$-E(\partial^2 K/\partial\mu^2) \le -E(\partial^2 L/\partial\mu^2). \quad (2)$$

Proof. From the theorem just proved, the above statement is equivalent to

$$E\{(\partial K/\partial\mu)^2\} \le E\{(\partial L/\partial\mu)^2\},$$

and since $E\{(\partial K/\partial\mu)(\partial L/\partial\mu)\} = 1/V(\mu) = E\{(\partial K/\partial\mu)^2\}$, this follows from the Cauchy-Schwarz inequality.
we have the required result. Conversely, suppose that for some measure m on the real line the distribution of z is given by $\exp\{z\theta - g(\theta)\}\,dm(z)$. Then $\int e^{z\theta}\,dm(z) = e^{g(\theta)}$, and so the moment generating function M(t) of z is

$$M(t) = \int e^{z(\theta+t)}e^{-g(\theta)}\,dm(z) = e^{g(\theta+t)-g(\theta)}.$$

This completes the proof.
If K really is a log likelihood, then the theorem shows that, given $V(\mu)$, we can construct $\theta$ and $g(\theta)$ by integration. Theorem 9 of Lehmann (1959) shows that V and g must be analytic functions and that the characteristic function $\phi(t)$ of z is analytic on the whole real line and given by $\phi(t) = \exp\{g(\theta + it) - g(\theta)\}$. Thus, in principle, the problem of determining whether or not K is a log likelihood is reduced to the problem of determining whether a given complex function $\phi(t)$, analytic over the whole real line, is a characteristic function; this point will not be pursued further here.
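The construction of θ by integration can be sketched in a few lines. This is an illustration, not the paper's procedure verbatim: for a one-parameter exponential family $\mu = g'(\theta)$ and $V(\mu) = g''(\theta) = d\mu/d\theta$, so $\theta(\mu) = \int d\mu/V(\mu)$; taking V(μ) = μ recovers the canonical Poisson parameter θ = log μ.

```python
import math

def theta_of_mu(mu, V, lo=1.0, n=10000):
    """theta(mu) = integral from lo to mu of dt/V(t) (trapezoid rule),
    using d(theta)/d(mu) = 1/V(mu); the lower limit fixes the arbitrary
    additive constant in theta."""
    h = (mu - lo) / n
    ts = [lo + i * h for i in range(n + 1)]
    f = [1.0 / V(t) for t in ts]
    return h * (sum(f) - 0.5 * (f[0] + f[-1]))

# For V(mu) = mu this gives theta = log(mu) (normalized so theta(1) = 0),
# the canonical parameter of the Poisson family.
```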
In the corollary to Theorem 1 in § 3 it was shown that

$$-E(\partial^2 K/\partial\mu^2) \le -E(\partial^2 L/\partial\mu^2).$$

Let u be the vector with components $\partial K/\partial\beta_i$ and let H be the matrix of second derivatives $\partial^2 S(K)/\partial\beta_i\,\partial\beta_j$; then, if the observations are independent, S(u) has mean 0 and dispersion $D = -E(H)$. Now let $\hat\beta$ be the maximum quasi-likelihood estimate of $\beta$, obtained by setting S(u) equal to its expectation, 0. To first order in $\hat\beta - \beta$ we have $S(u) \approx -H(\hat\beta - \beta)$, whence $\hat\beta - \beta \approx -H^{-1}S(u)$.
Approximating to H by its expectation, $-D$, we have

$$\hat\beta \approx \beta + D^{-1}S(u).$$

Now $D^{-1}S(u)$ has dispersion $D^{-1}$; hence we have deduced, rather informally, the following result.
THEOREM 3. Maximum quasi-likelihood estimates have approximate dispersion matrix $D^{-1} = \{-E(H)\}^{-1}$, where H is the matrix of second derivatives of S(K).
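By Theorem 1 (iv), the elements of D are $D_{ij} = S\{(1/V(\mu))(\partial\mu/\partial\beta_i)(\partial\mu/\partial\beta_j)\}$, so the approximate covariance matrix can be assembled directly. The sketch below is illustrative only: it assumes a hypothetical two-parameter log-linear model $\mu = \exp(\beta_1 + \beta_2 x)$ with V(μ) = μ, neither of which appears in the paper.

```python
import math

def dispersion(xs, b1, b2):
    """Approximate covariance D^{-1} of Theorem 3 for the illustrative
    model mu = exp(b1 + b2*x), V(mu) = mu. By Theorem 1(iv),
    D_ij = S{(1/V(mu)) dmu/db_i dmu/db_j}."""
    D = [[0.0, 0.0], [0.0, 0.0]]
    for x in xs:
        mu = math.exp(b1 + b2 * x)
        grad = [mu, mu * x]                      # dmu/db1, dmu/db2
        for i in range(2):
            for j in range(2):
                D[i][j] += grad[i] * grad[j] / mu  # weight 1/V(mu) = 1/mu
    det = D[0][0] * D[1][1] - D[0][1] * D[1][0]
    return [[ D[1][1] / det, -D[0][1] / det],     # 2x2 matrix inverse
            [-D[1][0] / det,  D[0][0] / det]]
```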
Next, we consider the case where the mean-variance relation is not known completely, but the variance is known to be proportional to a given function of the mean, i.e.

$$\mathrm{var}(z) = \gamma V(\mu), \quad (3)$$

where V is a known function but $\gamma$ is unknown. Clearly the maximum quasi-likelihood estimate of $\beta$ is not affected by the value of $\gamma$, so that we can calculate $\hat\beta$ as if $\gamma$ were known to be 1. To obtain error estimates we need some estimate of $\gamma$. Assuming that $\mu$ is approximately linear in $\beta$ and that $V(\hat\mu)$ differs negligibly from $V(\mu)$, we have the approximation

$$E[S\{(z - \hat\mu)^2/V(\hat\mu)\}] \approx (n - m)\gamma,$$

where n is the number of observations and m the number of parameters, so that $\gamma$ may be estimated by $S\{(z - \hat\mu)^2/V(\hat\mu)\}/(n - m)$.
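The moment estimate of γ is a one-liner; this sketch only restates the formula above in code (the function name is illustrative).

```python
def dispersion_factor(z, mu_hat, V, m):
    """Moment estimate of gamma in var(z) = gamma*V(mu):
    gamma_hat = S{(z - mu_hat)^2 / V(mu_hat)} / (n - m),
    where n = number of observations, m = number of fitted parameters."""
    n = len(z)
    pearson = sum((zi - mi) ** 2 / V(mi) for zi, mi in zip(z, mu_hat))
    return pearson / (n - m)
```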
6. A GENERALIZATION OF THE GAUSS-NEWTON METHOD
When $V(\mu) = 1$, maximum quasi-likelihood estimation reduces to least squares. One method of calculating the estimates is then the Gauss-Newton method. This is an iterative process in which one calculates a regression of the residuals on the quantities $\partial\mu/\partial\beta_i$ by linear least squares, the residuals and $\partial\mu/\partial\beta_i$ being calculated from the current estimate of $\beta$. The resulting regression coefficients are then used as corrections to $\hat\beta_i$. It will now be shown that to calculate maximum quasi-likelihood estimates with a general V, the Gauss-Newton method can be modified simply by using the current estimate of $1/V(\mu)$ as a weight variate in the least squares calculation.
Writing $v_i$ for $\partial\mu/\partial\beta_i$, and r for $z - \mu$, we have

$$\frac{\partial S(K)}{\partial\beta_i} = S\left\{\frac{rv_i}{V(\mu)}\right\},$$

and, using Theorem 1,

$$-E\left\{\frac{\partial^2 S(K)}{\partial\beta_i\,\partial\beta_j}\right\} = S\left\{\frac{v_iv_j}{V(\mu)}\right\}.$$

Then if we obtain successive approximations to $\hat\beta$ using the Newton-Raphson method with the second derivatives of K replaced by their expectations, we obtain corrections $\delta\beta_j$ to the estimates given, for $i = 1, \ldots, m$, by

$$\sum_j S\left\{\frac{v_iv_j}{V(\mu)}\right\}\delta\beta_j = S\left\{\frac{rv_i}{V(\mu)}\right\}. \quad (4)$$
Hence we have proved the following theorem.

THEOREM 4. Maximum quasi-likelihood estimates can be calculated by the Gauss-Newton method, modified by using the current estimate of $1/V(\mu)$ as a weight variate in the least squares calculation.
$$f(\mu) = \sum_i \beta_ix_i = Y,$$

say, where the x's are known variables. Then in the notation of the previous section $v_i = x_i\,d\mu/dY$. Hence (4) may be rewritten

$$\sum_j S\left\{\frac{(d\mu/dY)^2}{V(\mu)}x_ix_j\right\}\delta\beta_j = S\left\{\frac{(d\mu/dY)^2}{V(\mu)}\,x_i\,r\,\frac{dY}{d\mu}\right\}.$$

Then if $\hat\beta_i$ denotes the current estimates and $\hat\beta_i^* = \hat\beta_i + \delta\beta_i$ the corrected ones, and if $Y = \sum_i\hat\beta_ix_i$, we have

$$\sum_j S(wx_ix_j)\hat\beta_j^* = S\left\{wx_i\left(Y + r\frac{dY}{d\mu}\right)\right\},$$

where $w = (d\mu/dY)^2/V(\mu)$, which proves the next theorem.
THEOREM 5. When $Y = f(\mu) = \sum_i\beta_ix_i$, a method equivalent to the generalized Gauss-Newton method already described is to calculate repeatedly a weighted linear regression of the variate

$$Y + r\,\frac{dY}{d\mu}$$

on the variables $x_i$, with weight

$$w = (d\mu/dY)^2/V(\mu). \quad (5)$$
Nelder & Wedderburn showed that this technique could be used to obtain maximum likelihood estimates when there was a linearizing transformation of the mean, $f(\mu)$, and the distribution of z could be expressed in the form

$$\pi(z;\theta,\phi) = \exp[\alpha(\phi)\{z\theta - g(\theta) + h(z)\} + \beta(z,\phi)],$$

where $\theta$ is a function of $\mu$ and $\phi$ is a nuisance parameter. For fixed $\phi$ this gives a one-parameter exponential family, so that the likelihood is the same as the quasi-likelihood. Also, by a simple extension of the argument used in Theorem 1, we have $\mathrm{var}(z) = g''(\theta)/\alpha(\phi)$. Hence the mean-variance relationship is of the form given in (3), and the result of Nelder & Wedderburn is a special case of Theorem 5.
A good starting approximation in this process is usually given by setting $\mu = z$ and calculating w from (5) and Y as f(z), but some modification may be needed when f has singularities at the ends of the range of possible z.
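The iteration of Theorem 5 can be sketched compactly. The choices below are assumptions for illustration, not taken from the paper: a log link $f(\mu) = \log\mu$, V(μ) = μ, and a single covariate with intercept. With this link $d\mu/dY = \mu$, so the weight is $w = \mu^2/\mu = \mu$ and the working variate is $Y + (z-\mu)/\mu$; the normal equations of the weighted regression yield the updated coefficients directly.

```python
import math

def fit_quasi_loglink(xs, zs, iters=25):
    """Theorem 5 iteration for the illustrative case log(mu) = b1 + b2*x,
    V(mu) = mu: repeated weighted linear regression of the working variate
    Y* = Y + (z - mu)/mu on (1, x) with weight w = mu. Assumes the mean
    response is positive (crude start mu = mean(z))."""
    b1, b2 = math.log(sum(zs) / len(zs)), 0.0
    for _ in range(iters):
        # accumulate weighted normal equations  S(w x_i x_j) b* = S(w x_i Y*)
        a11 = a12 = a22 = c1 = c2 = 0.0
        for x, z in zip(xs, zs):
            Y = b1 + b2 * x
            mu = math.exp(Y)
            w = mu
            ystar = Y + (z - mu) / mu
            a11 += w; a12 += w * x; a22 += w * x * x
            c1 += w * ystar; c2 += w * x * ystar
        det = a11 * a22 - a12 * a12
        b1, b2 = (a22 * c1 - a12 * c2) / det, (a11 * c2 - a12 * c1) / det
    return b1, b2
```

For data lying exactly on the curve the iteration converges to the generating coefficients, since the working residuals then vanish at the fixed point.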
8. EXAMPLE
J. F. Jenkyn, in an unpublished Aberystwyth Ph.D. thesis, discussed the data of Table 1, which gives estimates of the percentage leaf area of barley infected with Rhynchosporium secalis, or leaf blotch, for 10 different varieties grown at 9 different sites in a variety trial in 1965.
Jenkyn applied the angular transformation to the data, and then applied the method of Finlay & Wilkinson (1963), calculating, for each variety, regressions of the transformed percentages on the site means of the transformed percentages. He found a marked relationship between the variety means and the slopes of the regressions, and also between the variety means and the residual variances from the regressions. Thus the angular transformation failed to produce additivity or to stabilize the variance; in fact, it appeared that a transformation with a more extreme effect at the ends of the range, or at least at the lower end, was needed; Jenkyn suggested a logarithmic transformation. Two others suggest themselves: the logistic transformation, log (p/q), and the complementary log log transformation, log (-log q).
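The point about the lower end of the range is easy to see numerically. A quick illustrative comparison (not in the paper) of the three transformations at small proportions shows that the logistic and complementary log log stretch the region near zero far more than the angular (arcsine square root) transformation:

```python
import math

# The three transformations discussed, applied to small proportions p
# (q = 1 - p). The logit and complementary log log change much faster
# near p = 0 than the angular transformation does.
def angular(p): return math.asin(math.sqrt(p))
def logit(p):   return math.log(p / (1.0 - p))
def cloglog(p): return math.log(-math.log(1.0 - p))

for p in (0.001, 0.01, 0.1):
    print(round(angular(p), 3), round(logit(p), 3), round(cloglog(p), 3))
```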
Table 1. Percentage leaf area of barley infected with R. secalis

                                    Variety
Site      1      2      3      4      5      6      7      8      9     10    Mean
 1      0.05   0.00   0.00   0.10   0.25   0.05   0.50   1.30   1.50   1.50   0.52
 2      0.00   0.05   0.05   0.30   0.75   0.30   3.00   7.50   1.00  12.70   2.56
 3      1.25   1.25   2.50  16.60   2.50   2.50   0.00  20.00  37.50  26.25  11.03
 4      2.50   0.50   0.01   3.00   2.50   0.01  25.00  55.00   5.00  40.00  13.35
 5      5.50   1.00   6.00   1.10   2.50   8.00  16.50  29.50  20.00  43.50  13.36
 6      1.00   5.00   5.00   5.00   5.00   5.00  10.00   5.00  50.00  75.00  16.60
 7      5.00   0.10   5.00   5.00  50.00  10.00  50.00  25.00  50.00  75.00  27.51
 8      5.00  10.00   5.00   5.00  25.00  75.00  50.00  75.00  75.00  75.00  40.00
 9     17.50  25.00  42.50  50.00  37.50  95.00  62.50  95.00  95.00  95.00  61.50
Mean    4.20   4.77   7.34   9.57  14.00  21.76  24.17  34.81  37.22  49.33
Variety              1      2      3      4      5      6      7      8      9     10
Mean of logit p̂   -4.05  -4.51  -3.96  -3.09  -2.69  -2.71  -1.71  -0.78  -0.91  -0.16

(Standard error ± 0.331.)
9. CONCLUSIONS
It may be difficult to decide what distribution one's observations follow, but the form of the mean-variance relationship is often much easier to postulate; this is what makes quasi-likelihoods useful. It has been seen how maximum quasi-likelihood estimation produced a satisfactory analysis of rather difficult data, and how these estimates can be computed. Some procedures used in the past are best understood in terms of quasi-likelihoods.
For instance, in probit analysis, when the variance of the observations is found to be greater than that predicted by the binomial distribution, it is common to accept the maximum likelihood estimates regardless, while estimating the degree of heterogeneity as in Chapter 4 of Finney (1971). If the variance is still proportional to the binomial variance then this procedure can be justified in terms of quasi-likelihoods. Also Fisher (1949), finding that in some data the variance was proportional to the mean, treated them effectively as if they had a Poisson distribution, even though the measurement involved was a continuous one. Thus quasi-likelihoods improve understanding of some past procedures, as well as providing new ones.
REFERENCES
[Received November 1973. Revised June 1974]