
SPECIAL SECTION: WINNING ENTRIES OF THE 1998 STUDENT PAPER COMPETITION

The Statistical Computing Section of the American Statistical Association held its annual Student Paper Competition in 1998. Winners in the competition received a prize and the opportunity to publish their winning entry in this special section of JCGS. The papers were subjected to an additional review by the Editorial Board of JCGS. We publish here two of the four winning entries that have passed the JCGS review at this point; we hope the remaining two entries will follow at a later time. We congratulate the young authors for their excellent work. -- The Editor

Approximate Conditional Inference in Logistic and Loglinear Models

Alessandra R. Brazzale
Recently developed small-sample asymptotics provide nearly exact inference for parametric statistical models. One approach is via approximate conditional and marginal inference, respectively, in multiparameter exponential families and regression-scale models. Although the theory is well developed, these methods are under-used in practical work. This article presents a set of S-Plus routines for approximate conditional inference in logistic and loglinear regression models. It represents the first step of a project to create a library for small-sample inference which will include methods for some of the most widely used statistical models. Details of how the methods have been implemented are discussed. An example illustrates the code.

Key Words: Conditional inference; Exponential family; Logistic model; Loglinear model; Small-sample asymptotics; S-Plus.

1. INTRODUCTION
There are many situations where inference on a parameter of interest has to be
carried out in the presence of nuisance parameters. Considerable simplification may
be achieved by either conditioning on a minimal sufficient statistic for the nuisance
parameter or making use of a pivotal quantity which only depends on the parameter
of interest. There are essentially two classes of models for which exact conditional and
marginal inference is available: linear exponential family models and regression-scale
models. Nevertheless, exact calculation usually involves, respectively, the derivation of the conditional cumulant generating function or multidimensional numerical integration, and is hard to do in practice.

Alessandra R. Brazzale is a PhD student at the Department of Mathematics, Swiss Federal Institute of Technology, 1015 Lausanne, Switzerland (Email: Alessandra.Brazzale@epfl.ch).

© 1999 American Statistical Association, Institute of Mathematical Statistics, and Interface Foundation of North America. Journal of Computational and Graphical Statistics, Volume 8, Number 3, Pages 653-661.

In the last 15 or so years, there has been rapid progress in small-sample parametric inference. Saddlepoint approximations and related methods provide highly accurate approximations for conditional or marginal densities and distribution functions. However, although the theory is well developed, these approximations are still under-used in practice. They are widely thought to be too complicated to be computed by hand, but the largest obstacle to wider use is the lack of software. Although a few implementations of parts of the theory exist, they were developed in very specific contexts and apply to limited classes of problems. No library is available that includes an extensive choice of small-sample methods and models and that could be applied in routine data analysis.

This article presents an S-Plus library for approximate conditional inference in logistic and loglinear regression models. It represents the initial phase of a project that aims to compare the various formulas in small-sample asymptotics and to implement the more promising of them for some widely used classes of statistical models. Section 2 contains a brief review of the main theoretical results. Section 3 gives some technical remarks concerning the implementation. Section 4 illustrates the software by means of an example. Section 5 contains a short discussion and proposals for future work.

2. BACKGROUND RESULTS
Consider a linear exponential family of order $p$ with natural parameter $(\psi, \lambda)$ and minimal sufficient statistic $(T, S)$, having log-likelihood function

$$\ell(\psi, \lambda) = \psi^\top t + \lambda^\top s - K(\psi, \lambda), \qquad (2.1)$$

where $K(\psi, \lambda)$ is the cumulant generating function, $\psi$ the $p_0$-dimensional parameter of interest, and $\lambda$ a nuisance parameter of dimension $p - p_0$. Although exact conditional methods are well defined (Lehmann 1986, chap. 4), their application is possible in practice only in special cases. Exact calculation requires, in fact, the derivation of the conditional cumulant generating function of $T$ given $S = s$. It can be bypassed by recently developed small-sample asymptotics, which provide nearly exact approximations for the conditional density and distribution function and can readily be constructed from the log-likelihood function. The problem was reviewed by Pierce and Peters (1992). I will give only a short description here.

Using the double saddlepoint approximation to the conditional density of $T$ given $S = s$ (Barndorff-Nielsen and Cox 1979), the conditional log-likelihood function for $\psi$, $\ell_c(\psi)$, can be approximated by an adjustment to the profile log-likelihood function, $\ell_p(\psi)$, obtained from the full model (2.1); that is,

$$\ell_c(\psi) \approx \ell_p(\psi) - \tfrac{1}{2} \log \left| j_{\lambda\lambda}(\psi, \hat\lambda_\psi) \right|, \qquad (2.2)$$

where $\ell_p(\psi) = \ell(\psi, \hat\lambda_\psi)$, $\hat\lambda_\psi$ is the constrained maximum likelihood estimate of the nuisance parameter, and $j_{\lambda\lambda}(\psi, \lambda)$ is the $(p - p_0) \times (p - p_0)$ submatrix of the observed information matrix corresponding to $\lambda$. The error in (2.2) is of order $O_p(n^{-1})$. Note that the approximate conditional log-likelihood function depends only on the parameter of interest and can be used to provide an approximation to the conditional maximum likelihood estimate of $\psi$ and its standard error.

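As a concrete illustration, the following minimal sketch in R-compatible S syntax shows how the conditional maximum likelihood estimate and its standard error can be read off any approximate conditional log-likelihood supplied as a function of a scalar $\psi$; the function name cmle and its interface are illustrative assumptions, not part of the library.

cmle <- function(la, interval, h = 1e-4) {
    ## maximize the approximate conditional log-likelihood (2.2)
    opt <- optimize(la, interval, maximum = TRUE)
    psi.c <- opt$maximum
    ## observed information by a central second difference
    j.c <- -(la(psi.c + h) - 2 * la(psi.c) + la(psi.c - h)) / h^2
    c(estimate = psi.c, std.error = 1 / sqrt(j.c))
}
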
If the parameter of interest is scalar, a very accurate approximation for conditional tail probabilities may be based on the modified directed deviance (Barndorff-Nielsen 1991)

$$r^* = r + \frac{1}{r} \log\left(\frac{v}{r}\right); \qquad (2.3)$$

that is,

$$\Pr(T \le t \mid S = s; \psi) = \Phi(r^*), \qquad (2.4)$$

and the relative error in (2.4) is of order $O_p(n^{-3/2})$. Here $r$ and $w$ are the likelihood root and the Wald statistic constructed from the full model (2.1),

$$r = \operatorname{sign}(\hat\psi - \psi)\left[2\{\ell(\hat\psi, \hat\lambda) - \ell(\psi, \hat\lambda_\psi)\}\right]^{1/2}, \qquad w = (\hat\psi - \psi)\, j_p(\hat\psi)^{1/2},$$

and $v = w\,\rho(\psi, \hat\lambda_\psi)$, where $(\hat\psi, \hat\lambda)$ are the maximum likelihood estimates of the parameters, $j_p(\hat\psi) = -\ell_p''(\hat\psi)$ is the observed information from the profile log-likelihood, and $\rho$ is a correction factor which takes into account the presence of nuisance parameters,

$$\rho(\psi, \hat\lambda_\psi) = \left\{ \frac{\left| j_{\lambda\lambda}(\hat\psi, \hat\lambda) \right|}{\left| j_{\lambda\lambda}(\psi, \hat\lambda_\psi) \right|} \right\}^{1/2}.$$

An equivalent approximation can be obtained directly from the log-likelihood for the joint distribution (2.1) and takes the form

$$\Pr(T \le t \mid S = s; \psi) = \Phi(r) + \phi(r)\left(\frac{1}{r} - \frac{1}{v}\right). \qquad (2.5)$$

Approximation (2.5) is due to Skovgaard (1987).
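
For concreteness, here is a minimal sketch of the two tail approximations in R-compatible S syntax, assuming numeric values of the likelihood root $r$ and of the adjusted Wald statistic $v = w\rho$ are at hand; the function name cond.tail is hypothetical.

cond.tail <- function(r, v) {
    rstar <- r + log(v / r) / r                          ## modified directed deviance (2.3)
    c(barndorff.nielsen = pnorm(rstar),                  ## Phi(r*), approximation (2.4)
      skovgaard = pnorm(r) + dnorm(r) * (1/r - 1/v))     ## approximation (2.5)
}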


If the sample space in the reference model (2.1) is a lattice, Pierce and Peters (1992, sec. 4) discussed suitable continuity corrections: for computing $\Pr(T \le t \mid S = s; \psi)$, $w$ is multiplied by the factor

$$\frac{1 - \exp\{-(\hat\psi - \psi)\}}{\hat\psi - \psi}.$$

Furthermore, in the same article (sec. 3) they suggested a decomposition of the higher order adjustment in (2.3) into two parts, one analogous to the adjustment needed in one-parameter models and the other pertaining to the effects of estimating the nuisance parameter. We may rewrite $r^*$ as

$$r^* = r - \mathrm{INF} + \mathrm{NP}, \qquad (2.6)$$
where

$$\mathrm{INF} = \frac{1}{r} \log\left(\frac{r}{w}\right), \qquad \mathrm{NP} = \frac{1}{r} \log \rho,$$

and refer to INF and NP as, respectively, the information aspect and the nuisance parameter aspect. Taking absolute values over a reasonable $\psi$-range of interest, their maxima can be used as a diagnostic for the behavior of the small-sample approximations.

The foregoing methods are defined only if the maximum likelihood estimate exists and is finite. This is no longer the case if the observed value $T = t$ lies on the boundary of the conditional sample space. The problem can be overcome by moving $t$ half a step away, that is, using $T = t \pm 0.5$, while leaving the observed value of $S = s$ unchanged.
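
The decomposition is a one-line computation; the following sketch (R-compatible S syntax, hypothetical name decomp) also verifies that it recombines into (2.3):

decomp <- function(r, w, rho) {
    INF <- log(r / w) / r      ## information aspect
    NP  <- log(rho) / r        ## nuisance parameter aspect
    ## r - INF + NP = r + log(w * rho / r) / r, i.e. (2.3) with v = w * rho
    c(INF = INF, NP = NP, rstar = r - INF + NP)
}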

3. IMPLEMENTATION IN S-PLUS
As mentioned at the beginning, small-sample asymptotics are generally thought to be difficult to implement. Our aim is to make the use of these approximations straightforward for applied statisticians and data analysts. Our S-Plus library for approximate conditional inference in logistic and loglinear regression models is a first step in this direction. We decided to use the programming environment S-Plus, as it is well suited to the implementation of small-sample asymptotics. Its growing popularity guarantees a large user base, and the object-oriented programming language can easily be extended. An extensive collection of tools and facilities for statistics and data analysis, including large-sample inference for linear and nonlinear statistical models, is already available. In particular, the function glm(), which may be used to fit a generalized linear model, includes the binomial and Poisson families, that is, logistic and loglinear regression models. All the output from the fit is stored in an object of class glm. This last feature proves extremely useful: because both models considered are linear exponential families, the approximations given in Section 2 can be computed straightforwardly using standard glm() output (Davison 1988, p. 451).

The aim of the following is to give a short description of how the methods have been implemented. The basic idea is to create a new object class, cond, such that objects of this class store all the elements needed in small-sample inference. Going back to the expressions in Section 2, we notice that the basic elements we need are the profile log-likelihood $\ell(\psi, \hat\lambda_\psi)$, the signed likelihood ratio and Wald statistics $r$ and $w$, and the determinant of the observed information matrix $j_{\lambda\lambda}(\psi, \lambda)$ evaluated at both the maximum likelihood estimate $(\hat\psi, \hat\lambda)$ and the constrained estimate $\hat\lambda_\psi$. Once these are known, all the small-sample methods of Section 2 are readily obtained.

One problem for the models considered is that there is no closed-form expression for the constrained maximum likelihood estimate $\hat\lambda_\psi$. For a given value of $\psi$, this can be bypassed by treating the model term $\psi t$ in (2.1) as fixed (in S-Plus terminology, by including it as an offset in the regression formula) and estimating the corresponding value of $\lambda$. To obtain a pointwise approximation to $\hat\lambda_\psi$, we repeat this step for several values of $\psi$ ranging over a reasonable interval of interest. The two basic quantities that have to be retained at each iteration are the deviance of the reduced model, used to compute the profile log-likelihood and the statistic $r$, and the determinant $|j_{\lambda\lambda}(\psi, \hat\lambda_\psi)|$. Note that the matrix $j_{\lambda\lambda}(\psi, \hat\lambda_\psi)$ has as its inverse the variance-covariance matrix of the constrained maximum likelihood estimate, which is stored in the corresponding glm object. The remaining quantities, which depend on the unconstrained maximum likelihood estimate, may be retrieved from the glm object created by fitting the full model (2.1); these are the observed information matrices $j(\hat\psi, \hat\lambda)$ and $j_p(\hat\psi)$, which correspond to the inverses of, respectively, the variance-covariance matrix of $(\hat\psi, \hat\lambda)$ and the squared standard error of $\hat\psi$. Consequently, $|j_{\lambda\lambda}(\hat\psi, \hat\lambda)|$ can be calculated by recalling that $|j(\hat\psi, \hat\lambda)| = j_p(\hat\psi)\, |j_{\lambda\lambda}(\hat\psi, \hat\lambda)|$. Finally, for a fixed value of $\psi$, the likelihood ratio statistic $2\{\ell(\hat\psi, \hat\lambda) - \ell(\psi, \hat\lambda_\psi)\}$ is equal to the difference between the deviances of, respectively, the full model and the reduced model with offset $\psi t$.

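A rough sketch of this profiling step follows, in R-compatible S syntax and borrowing the crying babies setup of Section 4 (a dataframe babies with columns y, day, and lull); the function name profile.pieces is illustrative and this is not the library's actual code.

profile.pieces <- function(psi.grid, data) {
    out <- matrix(NA, length(psi.grid), 2,
                  dimnames = list(NULL, c("deviance", "logdet")))
    for (i in seq(along = psi.grid)) {
        psi <- psi.grid[i]
        ## hold psi * lull fixed as an offset; only lambda is estimated
        red <- glm(y ~ day - 1, family = binomial, data = data,
                   offset = psi * lull)
        ## cov.unscaled is the inverse of j_{lambda lambda}(psi, lambdahat_psi),
        ## so minus its log-determinant is log |j_{lambda lambda}(psi, lambdahat_psi)|
        out[i, ] <- c(deviance(red),
                      -determinant(summary(red)$cov.unscaled)$modulus)
    }
    out
}

The profile log-likelihood at each grid point is then, up to a constant, minus half the stored deviance.
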
The foregoing steps allow the exact calculation of all the approximations discussed in Section 2 for several values of the parameter of interest $\psi$, and represent the bulk of the algorithm we have implemented. To extend them over the whole $\psi$-range of interest, a spline interpolation is used, from which estimates and confidence bounds can be read off. The spline interpolation works extremely well provided the functions considered are analytic, as are, for example, the profile and approximate conditional log-likelihoods. Care must be taken, however, in approximating expressions (2.3) and (2.5), as they have a singularity at $\psi = \hat\psi$ which causes instabilities in the computation. The problem is a purely numerical one, due to the tiny values of the determinants involved in the computation (often of order $10^{-6}$ or smaller). Around $\hat\psi$ they interact with the statistics $r$ and $v$, which also tend to zero. Skovgaard (1987) proved that the limits exist and are finite. The implementation of the exact limiting expressions (Davison 1988, sec. 2) is, however, rather cumbersome and not worthwhile compared to the advantages it may bring. What we do is to exclude from the spline interpolation the exactly calculated points that fall in a small interval around $\hat\psi$, and to extrapolate over this range. This does not interfere with the tail approximations and consequently with the analysis. The numerical instabilities are even worse for the information and nuisance parameter aspects. As the method applied to approximations (2.3) and (2.5) does not work satisfactorily for (2.6), and we are merely interested in the maxima of their absolute values, a less accurate resolution is used. We approximate the INF and NP curves by a third-order polynomial in $r$, corresponding to a third-order Taylor expansion of INF and NP as functions of the signed likelihood ratio statistic. The approximation is not as satisfactory as we hoped, but still provides insight into the behavior of the methods. Undoubtedly this aspect needs further consideration.
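
A minimal sketch of this device in R-compatible S syntax follows; the name ci.bound, the exclusion window, and the root-finding step are illustrative assumptions rather than the library's interface.

ci.bound <- function(psi.grid, rstar, psihat, window, level = 0.975) {
    keep <- abs(psi.grid - psihat) > window   ## drop points near the singularity
    f <- splinefun(psi.grid[keep], rstar[keep], method = "natural")
    ## a confidence bound solves rstar(psi) = z_level; rstar decreases in psi,
    ## so this root is the lower bound of the equi-tailed interval
    uniroot(function(p) f(p) - qnorm(level), range(psi.grid))$root
}
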
The computations are coded so that just one command is necessary to generate the conditional object: the function glm.cond produces an object of class cond by updating or fitting a generalized linear model. The variable of interest has to be scalar, either numerical or a two-level factor. The families supported are binomial and Poisson. By default $m = 20$ points are calculated exactly and are equally spaced over the $\psi$-range $\hat\psi \pm 3.5\, j_p(\hat\psi)^{-1/2}$, but both the value of $m$ and the length of the interval can be changed if needed. Method functions have been added and produce the necessary output. summary.cond returns a summary list for objects of class cond which contains the unconditional and approximate conditional maximum likelihood estimates of the parameter of interest and their standard errors, large- and small-sample confidence intervals and, if needed, tail probabilities for testing a hypothesis on $\psi$. By default, only two-sided 95% confidence intervals are returned. plot.cond creates a set of plots that summarize the output graphically.
More details on the implementation and further features are given in the documentation that accompanies the code. (The code is available at http://dmawww.epfl.ch/~brazzale.) An example is discussed in the following section.

4. EXAMPLE: CRYING BABIES DATA

Cox (1970, p. 61) gives a dataset in the form of matched pairs of binary observations concerning the crying of babies. The babies were observed on 18 days, and on each day one child was lulled. Interest focuses on whether lulling has any effect on crying. The binary logistic model for this situation is that on day $j$ the probability that a treated individual responds positively, that is, does not cry, is $\exp(\lambda_j + \psi)/\{1 + \exp(\lambda_j + \psi)\}$, and the corresponding probability for an untreated individual is $\exp(\lambda_j)/\{1 + \exp(\lambda_j)\}$, where $\psi$ represents the treatment effect and $\lambda_1, \ldots, \lambda_{18}$ are regarded as nuisance parameters.

Let the dataframe babies in S-Plus contain the dataset. The variables lull, day, and y represent, respectively, an index vector for treatment, an 18-level factor with one level for each day, and the response variable "not crying." The following screen dump shows the steps needed to perform approximate conditional inference with our library.

> babies.glm <- glm(formula = y ~ day + lull - 1, family = binomial, data = babies)
> babies.cond <- glm.cond(glm.obj = babies.glm, offset = lull)
> summary(babies.cond)

COEFFICIENTS
           Value     Std. Error
uncond.    1.43237   0.733757
cond.      1.27718   0.695223

CONFIDENCE INTERVALS
level = 95 %
                                       lower      upper
MLE normal approximation           -0.005767    2.87051
CMLE normal approximation          -0.085434    2.63979
Directed deviance                   0.122813    3.08569
Modified directed deviance          0.007007    2.75580
Modified directed deviance (c.c.)  -0.155096    3.09805

DIAGNOSTICS
   INF      NP
0.0768   0.285

Approximation based on 20 points

> plot(babies.cond)


All small-sample results are stored in the conditional object babies.cond created by glm.cond. The function summary is used to produce a short summary of the approximate conditional fit. Figure 1 shows some examples of graphical output produced by plot. There is a clear difference between the profile log-likelihood and the approximate conditional log-likelihood for the parameter of interest $\psi$ (see Figure 1(a)), and consequently between the related estimates. Note that, in the same figure, the approximate conditional and exact conditional log-likelihood functions agree to within drawing accuracy (see Davison 1988). From a theoretical point of view the conditional log-likelihood is preferable, as it depends only on the parameter of interest and, unlike the profile log-likelihood, produces a nearly consistent estimate.

[Figure 1 appears here: three panels, each plotted against the coefficient of lull.]
Figure 1. Examples of graphical output obtained with plot.cond for the crying babies data. (a) Comparison of log-likelihoods: thin line = profile log-likelihood; thick line = exact and approximate conditional log-likelihood. (b) Comparison of test statistics: thin line = directed deviance; thick line = modified directed deviance; dashed line = unconditional and conditional maximum likelihood estimate normal approximations. (c) Information and nuisance parameter adjustments.


For testing the significance of the treatment effect, several large-sample (first three in the summary output) and small-sample (last two) confidence intervals are available. Figure 1(b) shows the directed and modified directed deviance statistics, $r$ and $r^*$. The dotted horizontal lines correspond to, respectively, the 2.5% and 97.5% quantiles of the standard normal distribution and are used to read off a 95% confidence interval. The dashed straight lines represent the normal approximation to the Wald statistics based on either the unconditional or the conditional maximum likelihood estimates.

The information and nuisance parameter aspects are given in Figure 1(c). The absolute value of the NP correction term exceeds the limiting value of .2 given by Pierce and Peters (1992), suggesting a need for small-sample results. This is because our data consist of 18 independent strata, each with one scalar nuisance parameter, and the parameter of interest in common, so that the information on $\psi$ within each stratum is very small (see Pierce and Peters 1992, sec. 3).

5. FINAL REMARKS AND ONGOING RESEARCH


The S-Plus routines presented in this article are part of a wider project that aims to create a library for small-sample inference which will include methods for some of the most widely used statistical models. I decided to start off with the implementation of approximate conditional inference for logistic and loglinear regression models essentially for two reasons. First of all, these models are widely used in practice, especially in epidemiology and medical applications. Standard inference generally relies on large-sample results, even in situations where the sample size does not justify them. Exact conditional inference, as for instance performed by the packages LogXact and StatXact, is quite cumbersome and seldom possible due to the complexity of the conditional sample space. My experience using the S-Plus functions described here is that small-sample results are imperative for those kinds of datasets and produce extremely accurate results. The second reason is that small-sample asymptotics are best developed for linear exponential families. The approximations given in Section 2 are essentially unique, in the sense that several asymptotically equivalent, though different, methods used to approximate either the log-likelihood or the distribution function yield the same expression. Moreover, all we need to implement the methods is provided by the standard glm() output, so there is no need for extra tools.

The extension of the foregoing S-Plus routines we are currently working on is to the analysis of regression-scale models, that is, to models of the form $y = X\beta + \sigma\varepsilon$, where $y$ is the response variable, $X$ the design matrix, and the errors $\varepsilon$ follow a known though nonnormal distribution. Various authors suggest conditioning on the sample configuration $a = (y - X\hat\beta)/\hat\sigma$, where $\hat\beta$ and $\hat\sigma$ are the maximum likelihood estimates of the parameters. Inference on any of the parameters is achieved through pivots whose joint distribution is known up to a constant (Fisher 1934). Numerous small-sample asymptotic methods have been proposed to provide accurate approximations of the required marginal tail probabilities, thus avoiding high-dimensional numerical integration. We are reviewing the various approximations and comparing their numerical accuracy, stability, and ease of implementation in order to determine the best ones to implement. Conditional simulation studies using Markov chain Monte Carlo techniques help us to assess systematically the conditional properties of the methods. A second aspect we are dealing with concerns the S-Plus implementation: the maximization routines already available have to be adapted to the situation at hand in such a way that they provide not only the maximum likelihood estimates of the parameters, but also the output needed to compute the small-sample results.
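
As a small sketch of the quantity conditioned on, in R-compatible S syntax (the function name configuration is hypothetical, since this part of the library does not yet exist):

configuration <- function(y, X, betahat, sigmahat) {
    ## the sample configuration a = (y - X betahat) / sigmahat,
    ## an ancillary statistic for regression-scale models
    as.vector(y - X %*% betahat) / sigmahat
}
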
The implementation of small-sample asymptotics not only involves the development of a set of computational tools, but also gives rise to theoretical questions. For instance, small-sample asymptotics are not robust to model failure. The assessment of sensitivity to model assumptions, the development of diagnostics for model failure, and the implementation of robust resolutions are needed. Research on this is only in its initial phase; see for instance the work by Field and Welsh (1997). The extension of small-sample results to a broader context is not yet as well developed as for linear exponential families and regression-scale models. No systematic study is available which tries to assess numerically the reliability of the methods. These are only a few examples of the work required before an object-oriented implementation of the methods will be available for use in practice.

[Received July 1998. Revised August 1998.]

REFERENCES

Barndorff-Nielsen, O. E. (1991), "Modified Signed Log-Likelihood Ratio," Biometrika, 78, 557-564.
Barndorff-Nielsen, O. E., and Cox, D. R. (1979), "Edgeworth and Saddlepoint Approximations With Statistical Applications" (with discussion), Journal of the Royal Statistical Society, Ser. B, 41, 279-312.
Cox, D. R. (1970), Analysis of Binary Data, London: Chapman and Hall.
Davison, A. C. (1988), "Approximate Conditional Inference in Generalized Linear Models," Journal of the Royal Statistical Society, Ser. B, 50, 445-461.
Field, C. A., and Welsh, A. H. (1997), "Robust Confidence Intervals for Regression Parameters," Australian Journal of Statistics (to appear).
Fisher, R. A. (1934), "Two New Properties of Mathematical Likelihood," Proceedings of the Royal Society, Ser. A, 144, 285-307.
Lehmann, E. L. (1986), Testing Statistical Hypotheses (2nd ed.), New York: Wiley.
Pierce, D. A., and Peters, D. (1992), "Practical Use of Higher Order Asymptotics for Multiparameter Exponential Families" (with discussion), Journal of the Royal Statistical Society, Ser. B, 54, 701-737.
Skovgaard, I. M. (1987), "Saddlepoint Expansions for Conditional Distributions," Journal of Applied Probability, 24, 875-887.
