1. INTRODUCTION
There are many situations where inference on a parameter of interest has to be
carried out in the presence of nuisance parameters. Considerable simplification may
be achieved by either conditioning on a minimal sufficient statistic for the nuisance
parameter or making use of a pivotal quantity which only depends on the parameter
of interest. There are essentially two classes of models for which exact conditional and
marginal inference is available: linear exponential family models and regression-scale
models. Nevertheless, exact calculation usually involves, respectively, the derivation of
Alessandra R. Brazzale is a PhD Student at the Department of Mathematics, Swiss Federal Institute of Technology, 1015 Lausanne, Switzerland (Email: Alessandra.Brazzale@epfl.ch).
© 1999 American Statistical Association, Institute of Mathematical Statistics, and Interface Foundation of North America
Journal of Computational and Graphical Statistics, Volume 8, Number 3, Pages 653-661
2. BACKGROUND RESULTS
Consider a linear exponential family of order $p$ with natural parameter $(\psi, \lambda)$ and minimal sufficient statistic $(T, S)$ having log-likelihood function

$$\ell(\psi, \lambda) = \psi t + \lambda^\top s - K(\psi, \lambda), \qquad (2.1)$$

where $\ell_p(\psi) = \ell(\psi, \hat\lambda_\psi)$, $\hat\lambda_\psi$ is the constrained maximum likelihood estimate of the nuisance parameter, and $j_{\lambda\lambda}(\psi, \lambda)$ is the $(p - p_0) \times (p - p_0)$ submatrix of the observed information matrix.
$$r = \mathrm{sign}(\hat\psi - \psi)\,\bigl[2\{\ell(\hat\psi, \hat\lambda) - \ell(\psi, \hat\lambda_\psi)\}\bigr]^{1/2}, \qquad w = (\hat\psi - \psi)\, j_p(\hat\psi)^{1/2}, \qquad (2.4)$$

where $(\hat\psi, \hat\lambda)$ are the maximum likelihood estimates of the parameters, $j_p(\psi) = -\ell_p''(\psi)$ is the observed information from the profile log-likelihood, and $\rho$ is a correction factor which takes into account the presence of nuisance parameters.

An equivalent approximation can be obtained directly from the log-likelihood for the joint distribution (2.1) and takes the form

$$\Pr(T \le t \mid S = s; \psi) = \Phi(r) + \phi(r)\left(\frac{1}{r} - \frac{1}{v}\right). \qquad (2.5)$$
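Approximation (2.5) is a one-liner once $r$ and $v$ are available. The sketch below, in Python rather than S-Plus and with a function name of our choosing, evaluates it (valid away from $r = 0$):

```python
import math

def cond_tail_prob(r, v):
    """Tail approximation Phi(r) + phi(r)*(1/r - 1/v), as in (2.5).

    r: signed likelihood ratio statistic; v: Wald-type statistic.
    Breaks down near r = 0, where the exact limit must be used instead.
    """
    Phi = 0.5 * (1.0 + math.erf(r / math.sqrt(2.0)))          # standard normal cdf
    phi = math.exp(-0.5 * r * r) / math.sqrt(2.0 * math.pi)   # standard normal density
    return Phi + phi * (1.0 / r - 1.0 / v)
```

When $r = v$ the correction term vanishes and the value reduces to $\Phi(r)$, the usual first-order normal approximation.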
Furthermore, in the same article (sec. 3) they suggested a decomposition of the higher order adjustment in (2.3) into two parts: one analogous to the adjustment needed in one-parameter models, and the other pertaining to the effects of estimating the nuisance parameter. We may rewrite $r^*$ as

$$r^* = r - \mathrm{INF} + \mathrm{NP},$$
where

$$\mathrm{INF} = r^{-1}\log(r/v), \qquad \mathrm{NP} = r^{-1}\log(\rho), \qquad (2.6)$$

and we refer to INF and NP as, respectively, the information aspect and the nuisance parameter aspect. Taking absolute values over a reasonable $\psi$-range of interest, their maxima can be used as a diagnostic for the behavior of the small-sample approximations.
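The decomposition and the diagnostic maxima are straightforward to compute on a grid. The following Python sketch (function names are ours, not part of the S-Plus library) evaluates INF, NP, and $r^*$ from $r$, $v$, and $\rho$:

```python
import math

def rstar_decomposition(r, v, rho):
    """Return (r_star, INF, NP) with r* = r - INF + NP.

    INF = log(r/v)/r is the information aspect, NP = log(rho)/r the
    nuisance parameter aspect; requires r != 0 and r/v > 0.
    """
    INF = math.log(r / v) / r
    NP = math.log(rho) / r
    return r - INF + NP, INF, NP

def diagnostic_max(values):
    """Maximum absolute value over a grid -- the small-sample diagnostic."""
    return max(abs(x) for x in values)
```

In practice both aspects would be evaluated over the whole $\psi$-range of interest and the two maxima inspected separately.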
The foregoing methods are defined only if the maximum likelihood estimate exists and is finite. This is no longer the case if the observed value $T = t$ lies on the boundary of the conditional sample space. The problem can be overcome by moving $t$ half a step away, that is, using $T = t \pm 0.5$, while leaving the observed value of $S = s$ unchanged.
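A minimal sketch of this half-step correction (the boundary arguments are hypothetical, not part of the library interface):

```python
def adjust_boundary(t, t_min, t_max):
    """Move t half a step into the conditional sample space when it lies
    on the boundary; otherwise leave it unchanged."""
    if t == t_min:
        return t + 0.5
    if t == t_max:
        return t - 0.5
    return t
```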
3. IMPLEMENTATION IN S-PLUS
As mentioned at the beginning, small-sample asymptotics are generally thought to
be difficult to implement. Our aim is to make the use of these approximations straightforward for applied statisticians and data analysts. Our S-Plus library for approximate
conditional inference in logistic and loglinear regression models is a first step in this
direction. We decided to use the programming environment S-Plus, as it is well-suited
to the implementation of small-sample asymptotics. Its growing popularity guarantees a
large user base, and the object-oriented programming language can easily be extended.
An extensive collection of tools and facilities for statistics and data analysis, including
large-sample inference for linear and nonlinear statistical models, is already available.
In particular, the function glm(), which may be used to fit a generalized linear model, also includes the binomial and Poisson families, that is, logistic and loglinear regression models. All the output from the fit is thereby stored in an object of class glm. This last feature proves extremely useful: because both models considered are linear exponential families, the approximations given in Section 2 can be computed straightforwardly using standard glm() output (Davison 1988, p. 451).
The aim of the following is to give a short description of how the methods have
been implemented. The basic idea is to create a new object class, cond, such that
objects of this class store all the elements needed in small-sample inference. Going
back to the expressions in Section 2, we notice that the basic elements we need are
the profile log-likelihood $\ell_p(\psi) = \ell(\psi, \hat\lambda_\psi)$, the signed likelihood ratio and Wald statistics $r$ and $w$, and the determinant of the observed information matrix $j_{\lambda\lambda}(\psi, \lambda)$ evaluated at both the maximum likelihood estimate $(\hat\psi, \hat\lambda)$ and the constrained estimate $\hat\lambda_\psi$. Once they are known, all small-sample methods of Section 2 are readily obtained.
One problem for the models considered is that there is no closed-form expression for the constrained maximum likelihood estimate $\hat\lambda_\psi$. For a given value of $\psi$, this can be bypassed by treating the model term $\psi t$ in (2.1) as fixed (in S-Plus terminology, by including it as an offset in the regression formula) and estimating the corresponding value of $\lambda$. To obtain a pointwise approximation to $\hat\lambda_\psi$, we repeat this step for several values of $\psi$ ranging over a reasonable interval of interest. The two basic quantities that have to be retained at each iteration are the deviance of the reduced model, used to compute the profile
log-likelihood and the statistic $r$, and the determinant $|j_{\lambda\lambda}(\psi, \hat\lambda_\psi)|$. Note that the matrix $j_{\lambda\lambda}(\psi, \hat\lambda_\psi)$ has as its inverse the variance-covariance matrix of the constrained maximum likelihood estimate, which is stored in the corresponding glm object. The remaining quantities, which depend on the unconstrained maximum likelihood estimate, may be retrieved from the glm object created by fitting the full model (2.1): the observed information matrices $j(\hat\psi, \hat\lambda)$ and $j_p(\hat\psi)$, which correspond to the inverses of, respectively, the variance-covariance matrix of $(\hat\psi, \hat\lambda)$ and the squared standard error of $\hat\psi$. Consequently $|j_{\lambda\lambda}(\hat\psi, \hat\lambda)|$ can be calculated remembering that $|j(\hat\psi, \hat\lambda)| = j_p(\hat\psi)\,|j_{\lambda\lambda}(\hat\psi, \hat\lambda)|$. Finally, for a fixed value of $\psi$, the likelihood ratio statistic $2\{\ell(\hat\psi, \hat\lambda) - \ell(\psi, \hat\lambda_\psi)\}$ is equal to the difference between the deviances of, respectively, the full and the reduced model with offset $\psi t$.
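The offset device is easy to reproduce outside S-Plus. The following Python sketch (pure NumPy; all names are ours, not part of the library) performs the constrained fit for a logistic model by Newton-Raphson with the interest term held fixed as an offset:

```python
import numpy as np

def fit_logistic_offset(X, y, offset, n_iter=25):
    """Constrained logistic fit: maximize the log-likelihood over the
    nuisance coefficients with the interest term fixed in `offset`
    (the psi * t component of the linear predictor).

    Returns the estimate and the observed information X' W X, whose
    determinant plays the role of |j_lambda_lambda| in Section 2.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta + offset
        mu = 1.0 / (1.0 + np.exp(-eta))        # fitted probabilities
        W = mu * (1.0 - mu)                    # iterative weights
        info = X.T @ (W[:, None] * X)          # observed information
        beta = beta + np.linalg.solve(info, X.T @ (y - mu))  # Newton step
    return beta, info
```

Repeating the call over a grid of $\psi$ values, with offset $= \psi t$, gives the pointwise approximation to $\hat\lambda_\psi$ and the determinants $|j_{\lambda\lambda}(\psi, \hat\lambda_\psi)|$ described above.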
The foregoing steps allow the calculation of all the approximations discussed in Section 2 exactly for several values of the parameter of interest $\psi$, and represent the bulk of the algorithm we have implemented. To extend them over the whole $\psi$-range of interest, a spline interpolation is used. Estimates and confidence bounds can then be read off. The spline interpolation works extremely well provided the functions considered are analytic, for example the profile and approximate conditional log-likelihoods. Care must be taken, however, in approximating expressions (2.3) and (2.5), as they have a singularity at $\psi = \hat\psi$ which causes instabilities in the computation. The problem is a purely numerical one, due to the tiny values of the determinants involved in the computation (often of order $10^{-6}$ or smaller). Around $\hat\psi$ they interact with the statistics $r$ and $v$, which also tend to zero. Skovgaard (1987) proved that the limits exist and are
finite. The implementation of the exact expressions (Davison 1988, sec. 2) is, however, rather cumbersome and not worthwhile compared to the advantages it may bring. What we do instead is to exclude from the spline interpolation the exactly calculated points that fall in a small interval around $\hat\psi$, and to extrapolate over this range. This does not interfere with the tail approximations and consequently with the analysis. The numerical instabilities
grow even worse for the information and nuisance parameter aspects. As the method applied to approximations (2.3) and (2.5) does not work satisfactorily for (2.6), and we are merely interested in the maxima of the absolute values, a less accurate resolution is used. We approximate the INF and NP curves by a third-order polynomial in $r$, corresponding to a third-order Taylor expansion of INF and NP as functions of the signed likelihood ratio statistic. The approximation is not as satisfactory as we hoped, but still provides insight into the behavior of the methods. Undoubtedly this aspect needs further consideration.
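The two numerical devices just described, excluding a window around $\hat\psi$ before interpolating and smoothing INF or NP by a cubic in $r$, can be sketched as follows (Python; np.interp is a deliberately simplified piecewise-linear stand-in for the spline interpolation used in the library, and all function names are ours):

```python
import numpy as np

def interp_excluding(psi, values, psi_hat, half_width):
    """Interpolate values(psi), dropping the exactly computed points in a
    small window around psi_hat, where r and v tend to zero and the
    evaluation is numerically unstable."""
    keep = np.abs(psi - psi_hat) > half_width
    return lambda p: np.interp(p, psi[keep], values[keep])

def smooth_diagnostic(r_grid, inf_grid):
    """Fit a third-order polynomial in r to the INF (or NP) values and
    return the maximum absolute value over the grid -- the diagnostic."""
    coef = np.polyfit(r_grid, inf_grid, deg=3)
    return np.max(np.abs(np.polyval(coef, r_grid)))
```

The least-squares cubic damps the purely numerical noise in the exactly computed INF and NP values while preserving the overall size of the adjustment, which is all the diagnostic requires.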
The computations are coded so that just one command is necessary to generate the conditional object: the function glm.cond produces an object of class cond by updating or fitting a generalized linear model. The variable of interest has to be scalar, either numerical or a two-level factor. Families supported are binomial and Poisson. By default $m = 20$ points are calculated exactly and are equally spaced over the $\psi$-range of $\hat\psi \pm 3.5\, j_p(\hat\psi)^{-1/2}$, but both the value of $m$ and the length of the interval can be changed if needed. Method functions have been added and produce the necessary output. summary.cond returns a summary list for objects of class cond which contains the unconditional and approximate conditional maximum likelihood estimates of the parameter of interest and their standard errors, large- and small-sample confidence intervals
> babies.glm <- glm(formula = y ~ day + lull - 1, family = binomial, data = babies)
> babies.cond <- glm.cond(glm.obj = babies.glm, offset = lull)
> summary(babies.cond)

COEFFICIENTS
          Value     Std. Error
uncond.   1.43237    0.733757
cond.     1.27718    0.695223

CONFIDENCE INTERVALS
level = 95 %
                      lower       upper
                     -0.005767    2.87051
                     -0.085434    2.63979
Directed deviance     0.122813    3.08569
                      0.007007    2.75580
                                  3.09805

DIAGNOSTICS
   INF      NP
0.0768   0.285

Approximation based on 20 points

> plot(babies.cond)
All small-sample results are stored in the conditional object babies.cond created by glm.cond. The function summary is used to produce a short summary of the approximate conditional fit. Figure 1 shows some examples of graphical output produced by plot. There is a clear difference between the profile log-likelihood and the approximate conditional log-likelihood for the parameter of interest $\psi$, see Figure 1(a), and consequently between the related estimates. Note that, in the same figure, the approximate conditional and exact conditional log-likelihood functions agree to within drawing accuracy (see Davison 1988). From a theoretical point of view the conditional log-likelihood is preferable, as it only depends on the parameter of interest and, unlike the profile log-likelihood, produces a nearly consistent estimate.
[Figure 1. Examples of graphical output for the coefficient of lull: (a) profile and approximate conditional log-likelihoods; (b) directed and modified directed deviance statistics; (c) information and nuisance parameter aspects.]
For testing the significance of the treatment effect, several large-sample (first three in the summary output) and small-sample (last two) confidence intervals are available. Figure 1(b) shows the directed and modified directed deviance statistics, $r$ and $r^*$. The dotted horizontal lines correspond to, respectively, the 2.5% and 97.5% quantiles of the standard normal distribution and are used to read off a 95% confidence interval. The dashed straight lines represent the normal approximation to the Wald statistics based on either the unconditional or the conditional maximum likelihood estimate.
The information and nuisance parameter aspects are given in Figure 1(c). The absolute value of the NP correction term exceeds the limiting value of .2 given by Pierce and Peters (1992), suggesting a need for small-sample results. This is because our data consist of 18 independent strata, with one scalar nuisance parameter associated with each and the parameter of interest in common, so that the information on $\psi$ within each stratum is very small (see Pierce and Peters 1992, sec. 3).
studies using Markov chain Monte Carlo techniques help us to assess systematically the
conditional properties of the methods. A second aspect we are dealing with concerns the
S-Plus implementation: the maximization routines already available have to be adapted
to the situation at hand in such a way that they not only provide the maximum likelihood
estimates of the parameters, but also the output needed to compute the small-sample
results.
The implementation of small-sample asymptotics not only involves the development
of a set of computational tools, but also gives rise to theoretical questions. For instance,
small-sample asymptotics are not robust to model failure. The assessment of sensitivity to model assumptions, the development of diagnostics for model failure, and the implementation of robust resolutions are all needed. Research in this direction is only in its initial phase; see for instance the work by Field and Welsh (1997). The extension of small-sample results to a broader context is not yet as well developed as for linear exponential families and regression-scale models. No systematic study is available which tries to assess numerically the reliability of the methods. These are only a few examples of the work
required before an object-oriented implementation of the methods will be available for
use in practice.
[Received July 1998. Revised August 1998.]
REFERENCES
Barndorff-Nielsen, O. E. (1991), "Modified Signed Log-Likelihood Ratio," Biometrika, 78, 557-564.
Barndorff-Nielsen, O. E., and Cox, D. R. (1979), "Edgeworth and Saddlepoint Approximations With Statistical Applications" (with discussion), Journal of the Royal Statistical Society, Ser. B, 41, 279-312.
Cox, D. R. (1970), Analysis of Binary Data, London: Chapman and Hall, p. 61.
Davison, A. C. (1988), "Approximate Conditional Inference in Generalized Linear Models," Journal of the Royal Statistical Society, Ser. B, 50, 445-461.
Field, C. A., and Welsh, A. H. (1997), "Robust Confidence Intervals for Regression Parameters," to appear in the Australian Journal of Statistics.
Fisher, R. A. (1934), "Two New Properties of Mathematical Likelihood," Proceedings of the Royal Society, Ser. A, 144, 285-307.
Lehmann, E. L. (1986), Testing Statistical Hypotheses (2nd ed.), New York: Wiley.
Pierce, D. A., and Peters, D. (1992), "Practical Use of Higher Order Asymptotics for Multiparameter Exponential Families" (with discussion), Journal of the Royal Statistical Society, Ser. B, 54, 701-737.
Skovgaard, I. M. (1987), "Saddlepoint Expansions for Conditional Distributions," Journal of Applied Probability, 24, 875-887.