[Figure: plot of labeled units; vertical axis "Proportion Democratic"]

Ecological Regression

In ecological regression (Goodman, 1953), observed row and column marginals are expressed as proportions and each column is regressed separately on the row proportions, thus performing $C$ regressions. Regression coefficients then estimate the population internal cell proportions. For a given unit $i$, define

$$T_{ci} = \sum_{r=1}^{R} \beta_{rci} X_{ri} \quad \text{and} \quad \sum_{c=1}^{C} \beta_{rci} = 1$$

Defining the population cell fractions $\beta_{rc}$ such that $\sum_{c=1}^{C} \beta_{rc} = 1$ for every $r$, ecological regression assumes that $\beta_{rc} = \beta_{rci}$ for all $i$, and estimates the regression equations $T_{ci} = \sum_{r=1}^{R} \beta_{rc} X_{ri} + \epsilon_{ci}$. Under the standard linear regression assumptions, including $E[\epsilon_{ci}] = 0$ and $\mathrm{Var}[\epsilon_{ci}] = \sigma_c^2$ for all $i$, these regressions recover the population parameters $\beta_{rc}$. eiPack implements frequentist and Bayesian regression models (via ei.reg and ei.reg.bayes, respectively).

In the Bayesian implementation, we offer two options for the prior on $\beta_{rc}$. As a default, truncate = FALSE uses an uninformative flat prior that provides point estimates approaching the frequentist estimates (even when those estimates are outside the [...] cases where the cell estimates are near the boundaries, choosing truncate = TRUE imposes a uniform prior over the unit hypercube such that all cell fractions are restricted to the range [0, 1].

Output from ecological regression can be summarized numerically just as in lm, or graphically using density plots. We also include functions to calculate estimates and standard errors of shares of a subset of columns in order to address questions such as, "What is the Democratic share of 2-party registration for each group?" For the Bayesian model, densities [...]

Figure 2: Density plots of ecological regression output. [Axes: "Proportion Democratic" (horizontal), "Density" (vertical).]

Multinomial-Dirichlet (MD) model

In the Multinomial-Dirichlet model proposed by Rosen et al. (2001), the data is expressed as counts and a hierarchical Bayesian model is fit using a Metropolis-within-Gibbs algorithm implemented in C. Level 1 models the observed column marginals as multinomial (and independent across units); the choice of the multinomial corresponds to sampling with replacement from the population. Level 2 models the unobserved row cell fractions as Dirichlet (and independent across rows and units); Level 3 models the Dirichlet parameters as i.i.d. Gamma. More formally, without a covariate, the model is [...]
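As a rough illustration of the two models described above, the following base R sketch simulates counts from a three-level Multinomial-Dirichlet hierarchy and then applies ecological regression with lm. It does not call eiPack and is not the package's implementation. The number of units, the unit sizes, the Gamma shape and rate, and the group and party labels are all assumptions made for the example.

```r
# Hypothetical sizes, labels, and hyperparameters, chosen only for illustration.
set.seed(42)
n_units <- 300                        # number of units (e.g. precincts)
rows <- c("black", "white", "natam")  # row categories (R = 3)
cols <- c("dem", "rep", "non")        # column categories (C = 3)
R <- length(rows)
C <- length(cols)

# Draw one Dirichlet vector by normalising independent Gamma variates.
rdirichlet1 <- function(a) {
  g <- rgamma(length(a), shape = a)
  g / sum(g)
}

# Level 3: Dirichlet parameters alpha[r, c] are i.i.d. Gamma (assumed shape/rate).
alpha <- matrix(rgamma(R * C, shape = 40, rate = 2), nrow = R, ncol = C)
pop_frac <- alpha / rowSums(alpha)    # implied mean cell fractions; rows sum to 1

# Observed row proportions X[i, r] and unit sizes N[i].
X <- t(replicate(n_units, rdirichlet1(rep(1, R))))
colnames(X) <- rows
N <- rep(1000L, n_units)

counts <- matrix(0L, nrow = n_units, ncol = C, dimnames = list(NULL, cols))
for (i in seq_len(n_units)) {
  # Level 2: each row's cell fractions are Dirichlet, independent across rows and units.
  beta_i <- t(apply(alpha, 1, rdirichlet1))   # R x C matrix, rows sum to 1
  # Level 1: column counts are multinomial with probabilities sum_r beta_rci * X_ri.
  theta_i <- as.vector(X[i, ] %*% beta_i)
  counts[i, ] <- rmultinom(1, size = N[i], prob = theta_i)
}

# Ecological regression: regress each column proportion on the row proportions
# without an intercept; lm() fits the C regressions at once for a matrix response.
Tmat <- counts / N
fit <- lm(Tmat ~ X - 1)
est <- coef(fit)                      # R x C matrix of estimated cell fractions

# Share of a subset of columns: Democratic share of two-party totals, by group.
dem_share <- est[, "dem"] / (est[, "dem"] + est[, "rep"])
```

Because each row of Tmat and each row of X sums to one, the estimated fractions for each row group also sum to one across columns. Nothing in the least-squares fit keeps an individual estimate inside [0, 1], which is the situation the truncate option addresses in the Bayesian model. The sketch computes point estimates of the shares only, not their standard errors.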
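[Figure: vertical axis "Proportion of White Democrats"]

[...] by Rosen et al. (2001), or they may specify normal priors for each $\gamma_{rc}$ and $\delta_{rc}$ as follows:

$$\gamma_{rc} \sim N(\mu_{\gamma_{rc}}, \sigma^2_{\gamma_{rc}})$$

$$\delta_{rc} \sim N(\mu_{\delta_{rc}}, \sigma^2_{\delta_{rc}})$$

As Wakefield (2004) notes, the weak identification that characterizes hierarchical models in the EI context [...]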
$$\log \frac{E(\beta_{rci})}{E(\beta_{rCi})} = \gamma_{rc} + \delta_{rc} Z_i$$

Conducting an analysis using the MD model requires two steps. First, tuneMD calibrates the tuning parameters used for Metropolis-Hastings sampling:

> tune.nocov <- tuneMD(cbind(dem, rep, non)
+     ~ cbind(black, white, natam), data = senc,
+     ntunes = 10, totaldraws = 100000)

Second, ei.MD.bayes fits the model by calling C code to generate MCMC draws:

> out.nocov <- ei.MD.bayes(cbind(dem, rep, non)
+     ~ cbind(black, white, natam),
+     covariate = NULL, data = senc,
+     tune.list = tune.nocov)

Data Management

In the MD model, reasonable-sized problems produce unreasonable amounts of data. For example, a model for voting in Ohio includes 11000 precincts, 3 racial groups, and 4 parties, or 11000 x 3 x 4 = 132000 unit-level parameters per iteration. Implementing 1000 iterations yields about 130 million parameter draws. Stored as 8-byte double-precision values, these draws occupy about 1GB of RAM, and this is almost certainly not enough iterations. We provide a few options to users in order to make this model tractable for large EI problems.

The unit-level parameters present the most significant data management problem. Rather than storing unit-level parameters in the workspace, users can save each chain as a .tar.gz file on
disk using the option ei.MD.bayes(..., ret.beta = "s"), or discard the unit-level draws entirely using ei.MD.bayes(..., ret.beta = "d"). To reconstruct the chains, users can select the row marginals, column marginals, and units of interest, without reconstructing the entire matrix of unit-level draws:

> read.betas(rows = c("black", "white"),
+     columns = "dem", units = 1:150,
+     dir = getwd())

If users are interested in some function of the unit-level parameters, the implementation of the MD model allows them to define a function in R that will be called from within the C sampling algorithm, in which case the unit-level parameters need not be saved for post-processing.

Acknowledgments

eiPack was developed with the support of the Institute for Quantitative Social Science at Harvard University. Thanks to John Fox, Gary King, Kevin Quinn, D. James Greiner, and an anonymous referee for suggestions and Matt Cox and Bob Kinney for technical advice. For further information, see http://www.olivialau.org/software.

Bibliography

W. T. Cho and A. H. Yoon. Strange bedfellows: Politics, courts and statistics: Statistical expert testimony in voting rights cases. Cornell Journal of Law and Public Policy, 10:237–264, 2001.

O. D. Duncan and B. Davis. An alternative to ecological correlation. American Sociological Review, 18:665–666, 1953.

L. Goodman. Ecological regressions and the behavior of individuals. American Sociological Review, 18:663–664, 1953.

B. Grofman. A primer on racial bloc voting analysis. In N. Persily, editor, The Real Y2K Problem: Census 2000 Data and Redistricting Technology. Brennan Center for Justice, New York, 2000.

K. Imai and Y. Lu. eco: R Package for Fitting Bayesian Models of Ecological Inference in 2x2 Tables, 2005. URL http://imai.princeton.edu/research/eco.html.

A. D. Martin and K. M. Quinn. Applied Bayesian inference in R using MCMCpack. R News, 6:2–7, 2006.

M. Plummer, N. Best, K. Cowles, and K. Vines. CODA: Convergence diagnostics and output analysis for MCMC. R News, 6:7–11, 2006.

O. Rosen, W. Jiang, G. King, and M. A. Tanner. Bayesian and frequentist inference for ecological inference: The R × C case. Statistica Neerlandica, 55(2):134–156, 2001.

J. Wakefield. Ecological inference for 2 × 2 tables (with discussion). Journal of the Royal Statistical Society, 167:385–445, 2004.

Olivia Lau, Ryan T. Moore, Michael Kellermann
Institute for Quantitative Social Science
Harvard University, Cambridge, MA
olivia.lau@post.harvard.edu
ryantmoore@post.harvard.edu
kellerm@fas.harvard.edu