

Measurement errors in a spatial context

Article in Regional Science and Urban Economics · January 2012


DOI: 10.1016/j.regsciurbeco.2011.08.004


2 authors:

Julie Le Gallo Bernard Fingleton


Institut national supérieur des sciences agro… University of Cambridge

Author's personal copy

Regional Science and Urban Economics 42 (2012) 114–125

Contents lists available at SciVerse ScienceDirect

Regional Science and Urban Economics


journal homepage: www.elsevier.com/locate/regec

Measurement errors in a spatial context ☆


Julie Le Gallo ᵃ,⁎, Bernard Fingleton ᵇ,¹

ᵃ CRESE, Université de Franche-Comté, 45D, avenue de l'Observatoire, 25030 Besançon Cedex, France
ᵇ Department of Economics, University of Strathclyde, Sir William Duncan Building, 130 Rottenrow, Glasgow G4 0GE, UK

Article info

Article history:
Received 12 October 2010
Received in revised form 1 August 2011
Accepted 8 August 2011
Available online xxxx

JEL classification: C13, C21, R15

Keywords: Measurement error; Spatial autocorrelation; Instrumental variables; GMM; Monte-Carlo simulations

Abstract

Measurement error in an independent variable is one reason why OLS estimates may not be consistent. However, as shown by Dagenais (1994), in some circumstances the OLS bias may be ameliorated somewhat given the presence of serially correlated disturbances, and OLS may prove superior to standard techniques used to correct for serial correlation. This paper considers the case of cross-sectional regression models with measurement errors in the explanatory variables and with spatial dependence. The study focuses on the evidence provided by an empirical illustration and Monte Carlo experiments examining measurement error impact in the presence of autoregressive error processes and autoregressive spatial lags.

© 2011 Elsevier B.V. All rights reserved.

☆ This paper has been presented at the 9th International Workshop Spatial Econometrics and Statistics (Orléans, France, 24–25 June 2010). We thank F. Bavaud, J. Mutl and two anonymous referees for their useful comments. The usual disclaimer applies.
⁎ Corresponding author. Tel.: +33 3 81 66 67 52; fax: +33 3 81 66 65 76.
E-mail addresses: jlegallo@univ-fcomte.fr (J. Le Gallo), bf100@cam.ac.uk (B. Fingleton).
¹ Tel.: +44 141 548 3842; fax: +44 141 548 4445.

1. Introduction

In cross-sectional models with spatial data, spatial dependence often arises as a consequence of spatial interaction between individuals and geographic spillovers. When it involves simultaneous interaction between values of a dependent variable y at different locations, it can be modeled in a regression model with a spatial lag variable of the form Wy as an explanatory variable, where W is a spatial weights matrix and y is the vector of observations on the dependent variable. As is well-known in the spatial econometrics literature, the spatial lag is endogenous and the appropriate treatment and estimation of the endogenous spatial lag has therefore received a good deal of attention (Cliff and Ord, 1981; Upton and Fingleton, 1985; Anselin, 1988, 2006). However, latterly the analysis of other possible endogenous explanatory variables has also begun to garner attention from an applied (Anselin and Lozano-Gracia, 2008; Dall'erba and Le Gallo, 2008) and a theoretical (Kelejian and Prucha, 2007; Fingleton and Le Gallo, 2008a, 2008b, 2009) perspective. While endogeneity may come from simultaneity or omitted variable(s) correlated with variables in the regression model, our aim here is to focus on the specific case of endogeneity due to measurement errors (errors-in-variables or EIV).

Often, we can regress variables measured with error with little or no consequence. For instance, measurement error in a dependent variable is absorbed as part of the error term and therefore this form of measurement error has no effect on the consistency of parameter estimates. Also, while an observed explanatory variable may be an imprecise measure of an unobserved variable, we may be happy to estimate the model using the observed explanatory variable and treat that as the true object of interest, without wishing to infer anything about the unobserved variable. The presence of measurement error becomes an issue when we are actually interested in obtaining consistent estimates of a parameter relating to the unobserved variable, but only observe a variable that is the unobserved variable plus measurement error. In this case, the OLS estimate of the parameter obtained using the observed variable is a biased and inconsistent estimate of the parameter on the unobserved variable.

Temple (1998) has demonstrated that estimated technology parameters and convergence rates obtained from cross-sectional Solow growth models are highly sensitive to measurement error. However, it is interesting to note that Dagenais (1994) shows that, in some circumstances, OLS bias may be ameliorated somewhat given the presence of serially correlated disturbances, and OLS may prove superior to standard techniques used to correct for serial correlation. This paper


looks at the parallel case of cross-sectional regression models with measurement errors in the explanatory variables and with spatial dependence, exploring whether the conclusions of Dagenais carry over to the spatial setting using the evidence of Monte Carlo experiments.

The outline of the paper is as follows. First, we recall some important results pertaining to measurement errors. Second, the contribution of Dagenais (1994) is analyzed. Third, we present our research problem in a spatial context and provide some empirical evidence. Fourth, the Monte-Carlo simulation set-up is defined. Fifth, we present and analyze the simulation results. The final section provides some concluding remarks and future research directions.

2. Theoretical and empirical background

To see the way in which measurement error affects the bias and consistency of OLS parameter estimates in a simple regression model $Y_i = b_1 X_i + u_i$ with $u_i \sim IID(0, \sigma^2)$, assume that the explanatory variable $X_i$ is unobserved, but is represented by the observed variable $\tilde{X}_i = X_i + \xi_i$, where $\xi_i \sim IID(0, \sigma_\xi^2)$ is independent of Y, X and u. We wish to estimate the true relationship $Y_i = b_1 X_i + u_i$. Suppose we estimate $Y_i = b_1 \tilde{X}_i + \eta_i$. Since $\eta_i = b_1 (X_i - \tilde{X}_i) + u_i = u_i - b_1 \xi_i$, then $\mathrm{Cov}(\tilde{X}_i, \eta_i) \neq 0$ and the OLS estimator $\hat{b}_1$ is biased and inconsistent.

The amount of bias induced by EIV depends on the variance of the measurement error in relation to the variance of the regressor. It is possible to show that, since $\xi$ and $u$ are assumed to be independent, and given that $E(\xi'\xi) = \sigma_\xi^2$ and $E[\tilde{X}'\tilde{X}] = \sigma_{\tilde{X}}^2$, the bias is:

$$E\left[(\tilde{X}'\tilde{X})^{-1}(-b_1 \xi'\xi)\right] = -b_1 \frac{\sigma_\xi^2}{\sigma_{\tilde{X}}^2}. \tag{1}$$

Also, since $\sigma_{\tilde{X}}^2 = \sigma_X^2 + \sigma_\xi^2$, then the bias is equal to:

$$\mathrm{bias}(\hat{b}_1) = -b_1 \frac{1}{1 + \pi} \tag{2}$$

where $\pi = \sigma_X^2 / \sigma_\xi^2$ is the signal to noise ratio. If $\sigma_\xi^2 > \sigma_X^2$ then the bias increases greatly. This result means that OLS underestimates the true regression coefficient in the presence of measurement error, in other words it is always biased towards zero, even asymptotically. In fact, this provides a lower bound for the true coefficient.

These results carry over to EIV for multiple regression models. If just one of several variables is subject to EIV, it turns out that not only is the coefficient on the badly measured variable biased towards zero, but the coefficients on other correlated regressors are affected also, although in unknown directions and consequently 'a badly measured variable contaminates all the least squares estimates' (Greene, 2003). This effect is known as the contamination bias. Moreover, with more than one variable with measurement error, the problems are exacerbated, but 'there is very little that can be said'.

The problem of measurement error in a spatial context was highlighted by Temple (1998), who looked at the highly influential estimates made by Mankiw et al. (1992) of the Solow–Swan growth model. The Mankiw et al. paper has been widely cited and is the basis of much subsequent work. It provided evidence in favor of diminishing returns to accumulable factors and supported the notion of conditional convergence of economies at a rate of 2% per annum. However, Temple argues that this famous result could be entirely due to measurement error.

3. The contribution of Dagenais (1994)

Dagenais (1994) analyzes the EIV problem in a time-series context. More specifically, consider a model in which:

$$Y_t = b_1 \tilde{X}_t + u_t \quad \text{with } t = 1 \dots T \tag{3}$$

The author assumes serial correlation in the disturbances of the form:

$$u_t = \rho_u u_{t-1} + \varepsilon_t, \qquad \varepsilon_t \sim Nid(0, \sigma^2) \tag{4}$$

OLS requires disturbances that are IID, and to achieve this one can transform the variables using quasi-differences, so that $Y_t^* = Y_t - \rho_u Y_{t-1}$ and $\tilde{X}_t^* = \tilde{X}_t - \rho_u \tilde{X}_{t-1}$, $|\rho_u| < 1$, and hence $\varepsilon_t = u_t - \rho_u u_{t-1} \sim Nid(0, \sigma^2)$. In other words, using this GLS approach, we can simply estimate the model $Y_t^* = b_1 \tilde{X}_t^* + \varepsilon_t$ using standard OLS.

However, we know that since $\tilde{X}_t$ contains measurement error, OLS will incur bias, equal to $-b_1 \cdot \sigma_\xi^2 / \sigma_{\tilde{X}}^2$, where $\sigma_{\tilde{X}}^2$ is the variance of $\tilde{X}_t$ and $E(\xi'\xi) = \sigma_\xi^2$ where $\tilde{X}_t = X_t + \xi_t$. Following earlier work by Grether and Maddala (1973), Dagenais (1994) shows that when one controls for serial correlation in the disturbances, where the explanatory variable also incurs measurement errors, the method of estimation typically adopted to eliminate error dependence may increase bias. Therefore, in the regression $Y_t = b_1 \tilde{X}_t + u_t$, the bias under OLS will be equal to $-b_1 \cdot \sigma_\xi^2 / \sigma_{\tilde{X}}^2$, and given that $\sigma_{\tilde{X}}^2 > \sigma_{\tilde{X}^*}^2$, we expect to see greater bias under GLS, equal to $-b_1 \cdot \sigma_\xi^2 / \sigma_{\tilde{X}^*}^2$, than under OLS. Since, with $\mathrm{cov}(X, \xi) = 0$:

$$\mathrm{var}(\tilde{X}) = \sigma_{\tilde{X}}^2 = \sigma_X^2 + \sigma_\xi^2, \tag{5}$$

$$\mathrm{var}(\tilde{X}^*) = (1 - \rho)^2 \, \mathrm{var}(\tilde{X}_t) = \sigma_{\tilde{X}^*}^2 \tag{6}$$

and given $\rho < 1$ it follows that $\sigma_{\tilde{X}}^2 > \sigma_{\tilde{X}^*}^2$. Also, typically with serial correlation in $X_t$, the variance of $X_t$ is greater than the variance of $X_t^*$. The variance of the variable $X_t = \rho X_{t-1} + \varepsilon$, where $\varepsilon \sim IID(0, \sigma_\varepsilon^2)$ and $|\rho| < 1$, is $\sigma_X^2 = \sigma_\varepsilon^2 / (1 - \rho^2)$. In contrast, the variance of $X_t^* = X_t - \rho X_{t-1}$ is $\sigma_\varepsilon^2$. This suggests that the GLS bias incurred with quasi-differencing the observable variable will be greater than the bias when one does not attempt to allow for serial dependence in the disturbances.

Dagenais (1994) carries out Monte Carlo experiments in which $Y_t = b_1 X_t + u_t$, $\tilde{X}_t = X_t + \upsilon_t$, $u_t = \rho_u u_{t-1} + \varepsilon_t$, $X_t = \rho_x X_{t-1} + \xi_t$, and the errors are also serially correlated, so that $\upsilon_t = \rho_\upsilon \upsilon_{t-1} + \omega_t$. In these, the disturbance terms are assumed to be normally distributed, are each independent of the corresponding right hand side variables, additionally $\upsilon$ is independent of $u$, $\varepsilon$ is independent of $X$ and $\upsilon$, and $\xi$ and $\omega$ are also independent of all other stochastic variables. The various autoregressive coefficients (the $\rho$s) all satisfy the stationarity condition of being less than 1 in absolute value.

4. Measurement error with spatial data

4.1. The research problem

Consider next the parallel case of spatially autocorrelated disturbances and error in variables. Is it possible that the results found by Dagenais (1994) carry over to the spatial set-up? Assume that the disturbance process for one observation i is given by:

$$u_i = \rho_u \sum_{j \neq i} w_{ij} u_j + \varepsilon_i \quad \text{with } i = 1 \dots n \tag{7}$$

where $\varepsilon_i$ is identically and normally distributed. In matrix terms, let W denote the n by n spatial weights matrix for n locations, with typical cell $w_{ij}$. Assume for simplicity that $w_{ij} = 1$ if locations i and j are 'nearby' and $w_{ij} = 0$ otherwise. Written in matrix terms, the disturbance process becomes:

$$u = \rho_u W u + \varepsilon \tag{8}$$
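The time-series argument of Section 3 can be checked numerically. The following is a minimal sketch, not the design used by Dagenais (1994): it assumes white-noise measurement error (ρ_υ = 0), a known ρ_u, a regression without intercept, and illustrative parameter values (b1 = 1, ρ_u = ρ_x = 0.8, σ_X² = 1, σ_ξ² = 0.25) that do not come from the paper. The helper names `ar1`, `slope` and `simulate_bias` are hypothetical.

```python
import numpy as np

def ar1(innov, rho):
    """Stationary AR(1) path z_t = rho * z_{t-1} + innov_t."""
    z = np.empty_like(innov)
    z[0] = innov[0] / np.sqrt(1.0 - rho**2)  # start from the stationary distribution
    for t in range(1, len(innov)):
        z[t] = rho * z[t - 1] + innov[t]
    return z

def slope(y, x):
    """OLS slope of y on x in a regression without intercept."""
    return float(x @ y / (x @ x))

def simulate_bias(T=20000, b1=1.0, rho_u=0.8, rho_x=0.8,
                  sigma2_X=1.0, sigma2_xi=0.25, seed=0):
    """Return (OLS bias, quasi-differenced GLS bias, bias predicted by Eq. (2))."""
    rng = np.random.default_rng(seed)
    # True regressor: AR(1) with unconditional variance sigma2_X.
    X = ar1(rng.normal(0.0, np.sqrt(sigma2_X * (1.0 - rho_x**2)), T), rho_x)
    # Serially correlated disturbances, as in Eq. (4).
    u = ar1(rng.normal(0.0, 1.0, T), rho_u)
    # Observed regressor = truth + white-noise measurement error (assumption).
    X_obs = X + rng.normal(0.0, np.sqrt(sigma2_xi), T)
    Y = b1 * X + u

    bias_ols = slope(Y, X_obs) - b1

    # Quasi-differencing with the true rho_u (GLS with known rho_u).
    Y_star = Y[1:] - rho_u * Y[:-1]
    X_star = X_obs[1:] - rho_u * X_obs[:-1]
    bias_gls = slope(Y_star, X_star) - b1

    bias_eq2 = -b1 * sigma2_xi / (sigma2_X + sigma2_xi)
    return bias_ols, bias_gls, bias_eq2

bias_ols, bias_gls, bias_eq2 = simulate_bias()
print(f"OLS bias: {bias_ols:.3f}  GLS bias: {bias_gls:.3f}  Eq. (2): {bias_eq2:.3f}")
```

Under these assumptions the OLS bias should sit near the value implied by Eq. (2), while quasi-differencing removes much of the variance of the true regressor but not of the measurement error, so the GLS estimate is attenuated more strongly.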



Assume also that $Y = \tilde{X} b + u$, in which $\tilde{X}$ is an n by k matrix of observable variables, b is a k by 1 column vector of regression coefficients, and Y and u are n by 1 column vectors. The variance-covariance matrix for this model is $V = \sigma^2 D$, $D \neq I$, in which $\sigma^2$ is a scalar and D is a symmetric matrix with a positive determinant. The GLS transformation described above in the time series context is $Y^* = \tilde{X}^* b + \varepsilon^*$, so that now $V = \sigma^2 I$. In this, $Y^* = P^{-1} Y$, $\tilde{X}^* = P^{-1} \tilde{X}$ and $\varepsilon^* = P^{-1} u$, and $P^{-1} = (I - \rho W)$, with $P P' = D$. Given $\rho$, and thus having eliminated spatially dependent disturbances, the regression $Y^* = \tilde{X}^* b + \varepsilon^*$ can be estimated via OLS. However, as we have seen, this GLS approach may incur greater bias than under OLS estimation of the regression $Y = \tilde{X} b + u$. Hence, analogous to Dagenais, we examine with an empirical example the effect of measurement error in X for two estimators. One is ML estimation in which we model error dependence, as typically accomplished in the spatial econometrics literature. The other is OLS in which we ignore error dependence. Based on Dagenais (1994), ML should lead to greater bias in estimated b than occurs when ignoring the existence of spatial autocorrelation.

4.2. Preliminary evidence using real data

There are various situations in which we might encounter measurement error in practice. For example, many countries build databases in which the variables are undoubtedly measured with error, and also in many cases pragmatic decisions have to be made to use a variable that is only a proxy of a true variable. With census data, for instance, detailed data for small areas may not be available immediately, and therefore an analyst might choose to compromise by constructing variables for small areas from available data for larger spatial units, rather than wait for the release of the micro-level data. This motivates one of our examples below. Undoubtedly such a construction would only be an approximation to the true values of the variable. In this section, we illustrate the impact of measurement error in the context of the convergence of the EU regional economies. In order to estimate the rate of convergence, we estimate a very simple model in which the log of GVA per worker by NUTS2 region for 2003 is regressed on the log level of GVA per worker for 1995 plus a few covariates. The data relate to 255 NUTS2 regions of the EU plus Switzerland and Norway.

This specification is based on the neoclassical convergence model, obtained by linearizing the dynamics around the steady state using a Taylor series expansion, and then integrating to obtain the well-known neoclassical reduced form:

$$(1/T) \ln y_t = \alpha - (1/T) e^{-\beta T} \ln y_{t-T} + u_t \tag{9}$$

in which t denotes the current year, and t − T an earlier year; $\beta$ is the annual rate of convergence towards steady state, u is the disturbance term, and $\alpha$ is the steady state level of GVA per worker towards which each region converges as $\beta T$ becomes large. Of course this specification can be elaborated, because there are numerous simplifying assumptions embodied within this specification, for example we assume that the capital share, the savings rate, the rate of technological progress, the depreciation rate and the population growth rate are constant across regions. In our model, although it remains simple, we add covariates to try to pick up some of these effects, and also add an autoregressive error process so that:

$$u_t = \rho W u_t + \varepsilon_t, \qquad \varepsilon_t \sim N(0, \sigma^2) \tag{10}$$

In order to operationalize this model, W is a matrix of interregional 'distances', in our case with 1s indicating that regions are contiguous and 0 otherwise, standardized so that rows sum to 1.

The expectation is that a regression coefficient will be estimated that is significantly below 1, which would indicate that the economies are converging. In order to simulate the impact of measurement error, we assume that the observed level of GVA per worker is measured without error (the data are official statistics published mainly by the EU commission). Of course this may not be true, but under this assumption we estimate the 'true' convergence rate based on the regression coefficient on the 1995 log level. We then introduce measurement error into this regressor. We obtain the relative bias by fitting our two alternative models; one (the SEM) is the same as the specification used to obtain the 'true' regression coefficient, in other words comprising the same covariates and spatial autoregressive error process, but the regressor used is the log of the 1995 GVA per worker² plus measurement error. The second model omits the spatial error process but is otherwise identical to the SEM specification. The first model is estimated by maximizing the likelihood using an iterative routine, and the second is estimated by OLS. The anticipation is that, following Dagenais, controlling for spatial error dependence will introduce more bias than simply estimating the model by OLS.

Consider first our basic specification assuming no errors in variables leading to the 'true' annual rate of convergence. The results are summarized by Table 1, column 4. Table 1 shows that in addition to the main regressor of interest, the log of GVA per worker (ln GVA pw), we include 'new entrants', which is a dummy variable equal to 1 if the region is part of a new entrant³ country into the EU and equal to zero otherwise. The covariate 'Ln emp. Density' is the natural logarithm of density of employment per square km, which is included to pick up the effects of capital cities and large urban centers. The 'true' rate of convergence is provided by the estimated coefficient for ln GVA pw in 1995 (indicated by the column SEM, no error in variable, or EIV). Table 1 gives the point estimate for $b_1$, equal to 0.801066 together with the 95% confidence interval (0.7303, 0.8718). Denote the annual convergence rate by $\beta$, which is given by $\beta = -\ln(b_1)/T$ with standard error $se(\beta) = se(b_1)/(T e^{-\beta T})$. We can calculate the half-life equal to the time by which the expected log of GVA per worker equals one half of its original value plus $\alpha$, which is given by $\ln(2)/\beta$. Hence for our model with no measurement error, we find that the annual convergence rate is 0.0276 and the half-life is about 25 years.

Consider next what happens when we introduce errors into our measure of ln GVA pw in 1995. Under the more artificial of our two scenarios, this is achieved by adding 'rnd', which is equal to a unit normal random variable multiplied by $\pi$ (equal to 0.15), giving the variable g = ln GVA pw in 1995 + rnd. The chosen value of $\pi$ produces estimates and bias comparable to those obtained in our more realistic measurement error example given below. Table 1 shows that, relative to the 'true' coefficient, the estimated coefficient given by ML is more significantly negatively biased than is the OLS estimate, meaning that the rate of convergence is positively biased. Note that var(g) = 0.5030. By comparison, calculating the spatially filtered variable $P^{-1} g = (I - \hat{\rho} W) g$ we obtain $\mathrm{var}(P^{-1} g) = 0.1491$. This is, by analogy with Dagenais, consistent with the smaller OLS bias. In order to carry out the spatial filtering, we use $\hat{\rho} = 0.591997$ as estimated by the SEM (Table 1). Observe that spatially filtering regressors $\tilde{X}^*$ and regressand $Y^*$ in this way, we can fit $Y^* = \tilde{X}^* b + \varepsilon^*$ by OLS (equivalently GLS) and obtain estimates $\hat{b}$ 'identical' to those given by ML estimation of the SEM specification. This is analogous to the use of quasi-differencing in the non-spatial case above.

² The variable 'ln GVA pw 1995' is highly spatially autocorrelated. The value of Moran's I using the (unstandardized) contiguity matrix is 0.8920, the expected value under the null is −0.0039, the variance is 0.0016, z = 22.2905 which has a p-value of almost 0 in the N(0,1) distribution. This is a very relevant observation, as we show subsequently.
³ The new entrant countries are the Czech Republic, Estonia, Hungary, Lithuania, Latvia, Poland, Slovenia and Slovakia.
⁴ Austria, Belgium, Germany, Spain, France, Greece, Italy, Netherlands, Sweden, UK.

Table 2 gives the outcome of using the mean GVA pw, denoted by 'Ln mean GVA pw 1995', instead of the 'true' value. For the larger and older EU members,⁴ we allot NUTS1 region means to the NUTS2

Table 1
Estimates with and without measurement error (π = 0.15).

                          SEM (with EIV)      OLS (with EIV)      SEM (no EIV)
Constant                  6.235973            5.504855            2.279034
                          (18.365625)         (18.426776)         (6.260786)
Ln GVA pw 1995            0.408450            0.488007            0.801066
(+rnd = with EIV = g)     (0.3427, 0.4742)    (0.4303, 0.5457)    (0.7303, 0.8718)
New entrants              −0.506982           −0.458513           0.127931
                          (−7.473842)         (−8.108535)         (1.979711)
Ln Emp. density           0.115874            0.070736            0.046030
                          (4.471489)          (2.980003)          (2.396382)
ρ                         0.591997            –                   0.744998
                          (10.750996)                             (18.379250)
R²                        0.9454              0.9175              0.9744
R̄²                        0.9447              0.9166              0.9741
σ²                        0.0159              0.0244              0.0075

Table 2
Parameter estimated based on NUTS 1 and NUTS 0 data.

                          SEM (with EIV)      OLS (with EIV)      SEM (no EIV)
Constant                  6.305478            5.366299            2.279034
                          (11.455802)         (12.250507)         (6.260786)
Ln mean GVA pw 1995       0.387640            0.495164            0.801066
                          (0.2837, 0.4916)    (0.4188, 0.5715)    (0.7303, 0.8718)
New entrants              −0.485747           −0.460696           0.127931
                          (−4.866630)         (−6.479082)         (1.979711)
Ln Emp. density           0.182275            0.100811            0.046030
                          (6.460657)          (3.807168)          (2.396382)
ρ                         0.655972            –                   0.744998
                          (13.263662)                             (18.379250)
R²                        0.9330              0.8949              0.9744
R̄²                        0.9322              0.8937              0.9741
σ²                        0.0195              0.0311              0.0075
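As a worked illustration of the formulas β = −ln(b1)/T, se(β) = se(b1)/(T e^{−βT}) and half-life = ln(2)/β given in Section 4.2, the sketch below converts the 'Ln GVA pw 1995' coefficients of Table 1 into annual convergence rates. It assumes T = 8 (2003 minus 1995) and, as a further assumption not stated in the paper, backs out se(b1) from the reported 95% interval by treating it as a symmetric normal interval. The function names are hypothetical.

```python
import math

T = 2003 - 1995  # years between the two cross-sections in the regression

def convergence_rate(b1, T):
    """Annual convergence rate beta = -ln(b1) / T."""
    return -math.log(b1) / T

def half_life(beta):
    """Half-life in years, ln(2) / beta."""
    return math.log(2) / beta

def se_convergence_rate(se_b1, beta, T):
    """Standard error se(beta) = se(b1) / (T * exp(-beta * T))."""
    return se_b1 / (T * math.exp(-beta * T))

# Point estimates of the coefficient on ln GVA pw 1995, taken from Table 1.
estimates = {
    "SEM (no EIV)": 0.801066,
    "OLS (with EIV)": 0.488007,
    "SEM (with EIV)": 0.408450,
}
rates = {name: convergence_rate(b1, T) for name, b1 in estimates.items()}

# Assumption: the 95% interval (0.7303, 0.8718) is b1 +/- 1.96 * se(b1).
se_b1 = (0.8718 - 0.7303) / (2 * 1.96)
se_beta = se_convergence_rate(se_b1, rates["SEM (no EIV)"], T)

for name, beta in rates.items():
    print(f"{name}: beta = {beta:.4f}, half-life = {half_life(beta):.1f} years")
print(f"se(beta) for SEM (no EIV): {se_beta:.4f}")
```

With the 'true' coefficient this gives a rate of roughly 0.028 per year and a half-life of about 25 years, in line with the figures quoted in the text. Because the EIV coefficients are smaller, the implied convergence rates are larger, which is the positive bias in the convergence rate discussed in Section 4.2.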

regions. For the smaller and newer EU 'members'⁵ we use national averages (NUTS 0). By using the means by NUTS1 or NUTS0 region rather than the NUTS 2 data, we are inducing measurement error. The effect is to again produce negative bias, with the OLS estimate of $b_1$ less negatively biased than ML taking account of the error dependence in the data. Again, in this case, this is reflected by the larger variance of 'Ln mean GVA pw 1995' (0.4613) compared with the variance of the spatially filtered 'Ln mean GVA pw 1995', which is equal to 0.1067.

⁵ Czech Republic, Denmark, Estonia, Finland, Hungary, Ireland, Lithuania, Latvia, Luxembourg, Poland, Portugal, Slovakia, Slovenia, plus Norway and Switzerland.

5. Monte-Carlo simulation

5.1. The experimental set-up

We begin with the following model:

$$y_i = \beta_1 + \beta_2 x_{i1} + \beta_3 x_{i2} + u_i \quad \text{with } i = 1 \dots n \tag{12}$$

where $x_i$ is a non-stochastic explanatory variable, $\beta$ is an unknown parameter to be estimated and $u_i$ is a normally distributed random disturbance. The $x_{i1}$'s are not observable. Instead, they are measured with errors. We observe:

$$\tilde{x}_{i1} = x_{i1} + v_i \tag{13}$$

where $v_i$ is a normally and independently distributed stochastic measurement error, which is independent of the $x_i$'s as well as of the $u_i$'s. On the contrary, $y_i$ is measured without error. The error terms $u_i$ are spatially autocorrelated:

$$u_i = \rho_u \sum_{j \neq i} w_{ij} u_j + \varepsilon_i \tag{14}$$

where $\varepsilon_i$ is normally distributed and is independent of $v_i$.

The $x_{i1}$'s are also spatially correlated:

$$x_{i1} = \rho_x \sum_{j \neq i} w_{ij} x_{j1} + x_{i3} + x_{i4} + x_{i5} + \xi_i \tag{15}$$

where $x_3$, $x_4$ and $x_5$ are random normal variables with variance equal to 0.25 and $\xi_i$ is a normally distributed innovation with variance 0.25, which is independent of all the stochastic variables of the model. We make $x_1$ depend on $x_3$, $x_4$ and $x_5$ and we then use $x_4$ and $x_5$ as optimal instruments in the simulation. In order to be general $x_1$ is also assumed to be spatially autocorrelated as is usually the case in applied work. Finally, the errors in the variables are spatially correlated:

$$v_i = \rho_v \sum_{j \neq i} w_{ij} v_j + \omega_i \tag{16}$$

where $\omega_i$ is normally distributed and is independent of all the stochastic variables of the model. Indeed, Dagenais (1994) indicates that it is realistic that both the error terms and explanatory variables are correlated (temporally in his case, spatially in our case). Moreover "although very little is known about the nature and the importance of measurement errors in economic data, the measurement errors might also be serially correlated, but presumably less so than the true data series themselves". Therefore we also assume that the measurement errors are spatially correlated.

Consider now these different aspects again but in matrix form. The model is written as:

$$y = \beta x + u \tag{17}$$

where $\beta' = [\beta_1 \; \beta_2 \; \beta_3]'$ and x is an n by 3 matrix with $x = [e_n \; x_1 \; x_2]$ and $e_n$ is the unit vector of order n. The vector $x_1$ is observed with error. Instead, we observe:

$$\tilde{x}_1 = x_1 + v \tag{18}$$

with:

$$u = (I - \rho_u W)^{-1} \varepsilon \tag{19}$$

$$x_1 = (I - \rho_x W)^{-1} (x_3 + x_4 + x_5 + \xi) \tag{20}$$

$$v = (I - \rho_v W)^{-1} \zeta \tag{21}$$

On the other hand, the estimated model is:

$$y = \beta \tilde{x} + u \tag{22}$$

with $\tilde{x} = [e_n \; \tilde{x}_1 \; x_2]$.

The Monte-Carlo simulation runs as follows:

• First, we generate $x_1$ using Eq. (20) with a given value of $\rho_x$.
• Then, we generate the measurement errors in a similar manner using Eq. (21) with a given value of $\rho_v$. We can consider different values for the variance of v, corresponding to various percentages of the variance of x.
• Then, the error terms u are generated using Eq. (19) with a given value of $\rho_u$ and assuming that $\varepsilon \to Nid(0, 1)$.
• Finally, the y are computed using Eq. (17) and $\tilde{x}_1$ is computed using Eq. (18).

Considering specifically the second point, we need to generate v with a variance that is a fraction $\lambda$ of the variance of $x_1$. However, as both v and $x_1$ are spatially autocorrelated, they are also heteroskedastic. Since we assume that $x_3$, $x_4$, $x_5$ and $\xi_i$ are independent normal variables with variance equal to 0.25, then the variance covariance matrix of $x_1$ is

Table 3
Effect of the factors and covariates on the β2 bias and RMSE.

Parameter          Bias                 RMSE
ρu                 −0.03154⁎⁎⁎          0.07684⁎⁎⁎
                   (−22.50)             (54.76)
ρx                 −0.12025⁎⁎⁎          0.11186⁎⁎⁎
                   (−85.77)             (79.73)
ρv                 0.01365⁎⁎⁎           −0.01220⁎⁎
                   (6.40)               (−5.71)
λ                  −0.48178⁎⁎⁎          0.44238⁎⁎⁎
                   (−188.00)            (172.49)
n (sample size)    5.993 × 10⁻⁵⁎⁎⁎      −2.6345 × 10⁻⁵⁎⁎⁎
                   (18.16)              (−79.76)
OLS                −0.03418⁎⁎⁎          0.10936⁎⁎⁎
                   (−19.53)             (62.43)
GMM                −0.06932⁎⁎⁎          0.12813⁎⁎⁎
                   (−39.60)             (73.14)
ML                 −0.08751⁎⁎⁎          0.14585⁎⁎⁎
                   (−49.99)             (83.26)
IV_3g              −0.02478⁎⁎⁎          0.10992⁎⁎⁎
                   (−14.15)             (62.75)
IV_x4x5            0.16995⁎⁎⁎           0.02380⁎⁎⁎
                   (97.09)              (13.59)
IVgmm_3g           −0.05678⁎⁎⁎          0.12224⁎⁎⁎
                   (−32.44)             (69.78)
IVgmm_x4x5         0.16880⁎⁎⁎           0.00766⁎⁎⁎
                   (96.43)              (4.37)
queen              0.000120             −0.001958⁎⁎
                   (0.14)               (−2.25)
N                  32,256               32,256
σ²                 0.006120             0.006129

Notes: t-ratio in brackets. * significant at 10%, ** significant at 5%, *** significant at 1%.

Table 5
Analysis of variance with polynomial contrasts for β2 bias and RMSE.

                     Degrees of   Bias                           RMSE
                     freedom      Mean square   Variance ratio   Mean square   Variance ratio
Method               6            57.000746     10695.83⁎⁎⁎      13.444515     2833.85⁎⁎⁎
  Linear             1            269.898157    50644.70⁎⁎⁎      63.139644     13,308.64⁎⁎⁎
  Quadratic          1            34.548532     6482.82⁎⁎⁎       4.477564      943.79⁎⁎⁎
  Cubic              1            0.123098      23.10⁎⁎⁎         0.088539      18.66⁎⁎⁎
  Quartic            1            16.192877     3038.49⁎⁎⁎       5.973715      1259.15⁎⁎⁎
  Deviations         2            10.620906     1992.95⁎⁎⁎       3.493814      736.43⁎⁎⁎
ρu                   7            0.453893      85.17⁎⁎⁎         3.149956      663.95⁎⁎⁎
ρx                   7            9.719796      1823.86⁎⁎⁎       8.510049      1793.76⁎⁎⁎
ρv                   2            0.126842      23.80⁎⁎⁎         0.100754      21.24⁎⁎⁎
λ                    2            108.973611    20448.22⁎⁎⁎      91.544049     19295.75⁎⁎⁎
n                    3            0.952095      178.65⁎⁎⁎        16.746282     3529.80⁎⁎⁎
Interactions with levels of method
  n.ml               3            0.007048      1.32             0.492035      103.71⁎⁎⁎
  n.gmm              3            0.001696      0.32             0.563503      118.78⁎⁎⁎
  n.iv_gmm_3g        3            0.000251      0.05             0.459010      96.75⁎⁎⁎
  n.ols              3            0.006245      1.17             0.420719      88.68⁎⁎⁎
  n.iv_3g            3            0.011753      2.21⁎            0.452598      95.40⁎⁎⁎
  n.ivgmm_x4x5       3            0.000563      0.11             0.065267      13.76⁎⁎⁎
  n.iv_x4x5          3            0.003568      0.67             0.191594      40.38⁎⁎⁎
Covariate queen      1            0.000117      0.02             0.030918      6.52⁎⁎
Residual             32,206       0.005329                       0.004744
Total                32,255

Note: * significant at 10%, ** significant at 5%, *** significant at 1%.
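The data-generating steps of Section 5.1 (Eqs. (17) to (21)) can be sketched as follows. This is an illustrative sketch only, not the authors' code. Several choices are assumptions that the text does not specify: the true coefficients β = (1, 1, 1), the distribution used for x2, and the use of a rook lattice for the example run. The variance of ζ is scaled with the mean diagonal element of (I − ρW)⁻¹(I − ρW′)⁻¹, the proxy the paper uses to make the variance of v a fraction λ of the variance of x1. The names `rook_weights`, `mean_diag_var` and `simulate_sample` are hypothetical.

```python
import numpy as np

def rook_weights(side):
    """Row-standardized rook contiguity matrix for a side x side regular lattice."""
    n = side * side
    W = np.zeros((n, n))
    for r in range(side):
        for c in range(side):
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < side and 0 <= cc < side:
                    W[r * side + c, rr * side + cc] = 1.0
    return W / W.sum(axis=1, keepdims=True)

def mean_diag_var(W, rho):
    """Mean diagonal element of (I - rho W)^{-1} (I - rho W')^{-1}."""
    A = np.linalg.inv(np.eye(len(W)) - rho * W)
    return float(np.mean(np.diag(A @ A.T)))

def simulate_sample(W, rho_u, rho_x, rho_v, lam, beta=(1.0, 1.0, 1.0), rng=None):
    """Draw one sample; beta and the law of x2 are assumptions, not from the paper."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(W)
    I = np.eye(n)
    # Eq. (20): x3, x4, x5 and xi each have variance 0.25 (standard deviation 0.5).
    x3, x4, x5, xi = rng.normal(0.0, 0.5, size=(4, n))
    x1 = np.linalg.solve(I - rho_x * W, x3 + x4 + x5 + xi)
    # Scale var(zeta) so that the mean variance of v is lam times that of x1.
    sigma2_zeta = lam * mean_diag_var(W, rho_x) / mean_diag_var(W, rho_v)
    # Eq. (21): spatially correlated measurement error.
    v = np.linalg.solve(I - rho_v * W, rng.normal(0.0, np.sqrt(sigma2_zeta), n))
    # Eq. (19): spatially autocorrelated disturbances with unit-variance innovations.
    u = np.linalg.solve(I - rho_u * W, rng.normal(0.0, 1.0, n))
    x2 = rng.uniform(0.0, 1.0, n)  # assumption: law of x2 is not given in the text
    # Eq. (17) for y, Eq. (18) for the observed regressor.
    y = beta[0] + beta[1] * x1 + beta[2] * x2 + u
    return {"y": y, "x1_obs": x1 + v, "x2": x2, "x4": x4, "x5": x5,
            "sigma2_zeta": sigma2_zeta}

W = rook_weights(7)  # n = 49, the smallest sample size in the design
sample = simulate_sample(W, rho_u=0.5, rho_x=0.5, rho_v=0.25, lam=0.2,
                         rng=np.random.default_rng(1))
```

Each replication would then pass `sample["y"]`, `sample["x1_obs"]` and `sample["x2"]` to the estimators under comparison, with `x4` and `x5` available as instruments.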

$(I - \rho_x W)^{-1} (I - \rho_x W')^{-1}$. We proxy the variance of $x_1$ by the mean of the diagonal elements of this variance-covariance matrix, denoting this mean variance by $\sigma_{x_1}^2$. Likewise, we have $v = (I - \rho_v W)^{-1} \zeta$ with $\zeta \to Nid(0, \sigma_\zeta^2)$. We denote by $\bar{\sigma}^2$ the mean diagonal element of $(I - \rho_v W)^{-1} (I - \rho_v W')^{-1}$. Then the variance of v can be proxied by $\sigma_\zeta^2 \cdot \bar{\sigma}^2$. Finally, $\lambda$, as a fraction of the variance of v over the variance of $x_1$, can be proxied by $\lambda = \sigma_\zeta^2 \cdot \bar{\sigma}^2 / \sigma_{x_1}^2$. Hence, in order to generate v for a given value of $\lambda$, one first needs to compute $\sigma_\zeta^2$ as $\sigma_\zeta^2 = \lambda \cdot \sigma_{x_1}^2 / \bar{\sigma}^2$. Once $\sigma_\zeta^2$ has been computed, then we can generate v as: $v = (I - \rho_v W)^{-1} \zeta$ with $\zeta \to Nid(0, \sigma_\zeta^2)$.

Each experiment is characterized by given values of $\rho_x$, $\rho_u$, $\rho_v$ and $\lambda = \sigma_v^2 / \sigma_x^2$. We use the following values: n = {49, 121, 225, 400}, $\rho_u$ = {0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.8, 0.95}, $\rho_x$ = {0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.8, 0.95}, $\rho_v$ = {−0.25, 0, 0.25} and $\lambda$ = {0.1, 0.2, 0.5}. For each experiment, we perform 1,000 replications. We estimate $\beta$ in Eq. (22) using the following methods:

• OLS estimator (OLS). In this case, neither the spatial error autocorrelation, nor the EIV is taken into account in the estimation.
• GMM estimator (GMM) suggested by Kelejian and Prucha (1999). In this case, we only allow for the fact that the error terms are spatially correlated. The EIV is not taken into account.
• ML estimator (ML) as it is widely used in the applied spatial econometrics literature (Anselin, 1988; Lee, 2004). As previously, in this case, we also only allow for spatial error autocorrelation whereas EIV is not taken into account.
• IV estimator with 3-group variable as instruments (IV_3g). This estimator aims to correct for EIV. The instruments are defined using a quasi-instrument with values equal to −1, 0, 1 depending on whether the observed variable ($x_1$ plus measurement error) is in the lower, middle or upper third of values when placed in rank order. We also use the spatial lag of this quasi-instrument as an additional instrument. This instrument is defined following the 3-group method for measurement errors (Kennedy, 2003). It has been used in a spatial framework by Fingleton (2003) and its properties have been investigated in Fingleton and Le Gallo (2008a, 2008b, 2009).
• IV estimator with $x_4$ and $x_5$ as instruments (IV_x4x5). This estimator corrects for EIV using "optimal" instruments, which are $x_4$ and $x_5$ since $x_1$ depends upon these two variables. Then we can evaluate how the choice of instruments affects the bias caused by EIV.
• IV estimator with 3-group variable as instruments combined with GMM (IVgmm_3g) as suggested by Fingleton and Le Gallo (2008a). Estimation comprises three stages. In the first stage, model (24) is estimated by IV, using the quasi-instrument and its spatial lag as above. The second stage uses the resulting consistent 2SLS residuals to estimate $\rho_u$ and $\sigma^2$ using a GM procedure, with the moments derived by Kelejian and Prucha (1999). In the third stage, the estimated $\rho_u$ is used to perform a Cochrane-Orcutt transformation to account for the spatial dependence in the residuals. IV estimation is then performed with the transformed variables.
• IV estimator with $x_4$ and $x_5$ as instruments combined with GMM (IVgmm_x4x5). This is the same as the previous estimator but in this case we use the optimal instruments.

Table 4
Mean β2 bias and RMSE across methods.

               Bias        RMSE
Grand mean     −0.16871    0.23371
ML             −0.26568    0.28714
GMM            −0.24749    0.26941
IVgmm_3g       −0.23495    0.26353
IV_3g          −0.20294    0.25121
OLS            −0.21235    0.25065
IV_x4x5        −0.00822    0.16509
IVgmm_x4x5     −0.00937    0.14894

The weights matrices of increasing sizes are based on the rook and the queen criteria for regular lattices. Moreover they are row-standardized. There are an infinite number of possible W matrices that we could use, so we have to restrict our simulations to a subset, and these two are among the most popular. They are useful because they are distinguished from each other in terms of the amount of connectivity between places that is assumed, which might have a bearing on the outcome of our simulations, bearing in mind that the
properties of many estimators depend on the density of connectivity of the W matrix (see for instance Baltagi and Liu, 2009; Farber et al., 2009; Smith, 2009; Stakhovych and Bijmolt, 2009). A fuller appraisal of the impact of network density and of the importance of lattice regularity is beyond the scope of the current paper, but is something that could be considered in future work.

6. Results

First, as a benchmark case, we have produced simulation results when no measurement errors are present in the DGP, focusing on parameter bias and RMSE. Bias is equal to the median of the estimated parameter distribution minus the true value. However, attention should also focus on the RMSE, which gives equal weight to the two important considerations for optimal estimation, bias and dispersion. In other words, we have analyzed the bias and RMSE in β2 for the following model:

$$y_i = \beta_1 + \beta_2 x_{i1} + \beta_3 x_{i2} + u_i, \quad i = 1, \dots, n \tag{23a}$$

$$u_i = \rho_u \sum_{j \neq i} w_{ij} u_j + \varepsilon_i \tag{23b}$$

$$x_{i1} = \rho_x \sum_{j \neq i} w_{ij} x_{j1} + \xi_i \tag{23c}$$

with the same values as before for the simulation: n = {49, 121, 225, 400}, ρu = {0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.8, 0.95}, ρx = {0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.8, 0.95}. Eq. (23a) has been estimated using OLS, GMM and ML. The mean β2 bias when Eq. (23a) is estimated by OLS is −1.2 × 10⁻⁵, whereas it is equal to −1.2 × 10⁻⁴ (resp. −2.7 × 10⁻⁵) when GMM (resp. ML) is used. This difference is not statistically significant. This is as expected because it is well known that the presence of spatial error autocorrelation leaves OLS estimators unbiased. However, it does affect the efficiency of the estimator, with the outcome that the mean β2 RMSE is 0.114 under OLS compared with 0.083 (resp. 0.084) with GMM (resp. ML). This is a statistically significant difference.

We turn now to the case where measurement error is present. One consequence of our experimental design is that the Monte Carlo exercise produces large multi-dimensional arrays of data. We obtain data showing the bias and RMSE in β2 for each of the 4,032 cases, and analyzing 4 sample sizes and 2 weights matrices this gives 32,256 cases in total.⁶ Our initial exploratory analysis is aimed at highlighting the significance and direction of the effects of our variables. We use OLS to perform these regressions,⁷ treating each of ‘ρu’, ‘ρx’, ‘ρv’, ‘λ’ and sample size ‘n’ as a variate with values as given earlier, and ‘queen’ as a dummy variable, equal to 1 when the W matrix is based on the queen's case and equal to 0 when it is based on the rook's case. Each estimation method is also a dummy variable, with the constant omitted from the model to identify each one.

The results are provided in Table 3. With such a large sample size, each effect is highly significant, but we can still see large differences between them. The most prominent effect on both bias and RMSE is due to λ, which has a negative effect on bias and a positive effect on RMSE. The effect of sample size is mixed and the structure of connectivity of the W matrix does not impact the bias of the estimators although it has an effect on RMSE. However, we focus here on the effect of the 7 different estimators, ‘OLS’, ‘GMM’, ‘ML’, ‘IV_3g’, ‘IV_x4x5’, ‘IVgmm_3g’ and ‘IVgmm_x4x5’, since we are principally interested in whether or not it is necessary to control for spatial dependence in the presence of measurement error, given that Dagenais (1994) found that in the time series case the standard approach of controlling for serial correlation in disturbances performed relatively worse than OLS.

The results given in Table 3 show a clear distinction between the estimators based on the optimal instruments, namely IV_x4x5 and IVgmm_x4x5, and the other estimators. Our two optimal instruments estimators have minimal bias and by far the lowest RMSE compared with the other estimators, and this is the case regardless of whether error autocorrelation is ignored, as in the case of IV_x4x5, or modeled, as in the case of IVgmm_x4x5. Ostensibly, this outcome is contrary to what one would anticipate from the analogous Dagenais time series result. However, a more realistic situation is the case in which we do not know the optimal instruments.

If we consider the remaining 5 estimators, we do find that there is a larger negative bias and a larger RMSE when one allows for residual autocorrelation by means of the ‘ML’, ‘GMM’ and ‘IVgmm_3g’ estimators than when one simply uses the OLS or IV_3g estimators. The ‘ML’ estimator leads to an even larger negative bias than ‘GMM’ and ‘IVgmm_3g’. This would indicate that the conclusion made by Dagenais that controlling for residual serial dependence is sub-optimal in the presence of measurement error also carries over to the spatial case with non-optimal instruments. In terms of RMSE, there appears to be no difference between ‘IV_3g’ and ‘OLS’, even though IV_3g seems to produce slightly less bias.

This is apparent simply from the means for each estimator, which are listed in Table 4. Moreover, comparing these figures to the ones obtained for the benchmark case (Eqs. (23a), (23b), (23c)) makes the attenuation bias affecting β2 quite apparent while the detrimental effect on RMSE is also evident.

In order to capture the essential sources of variation in our large array of results in more detail, we next subject the data to an analysis of covariance⁸ (ANCOVA) in which there are two responses, ‘bias’ and ‘RMSE’, six factors, namely ‘method’, ‘ρu’, ‘ρx’, ‘ρv’, ‘λ’ and sample size ‘n’, and one covariate, ‘queen’. Factor ‘method’ has seven levels, corresponding to the seven estimation methods described above. The factors ‘ρu’, ‘ρx’, ‘ρv’, ‘λ’ have 8, 8, 3 and 3 levels respectively, the values of which are as designated above. The sample size has four levels according to whether the matrix is 7 by 7 and so on up to a 20 by 20 matrix. The covariate ‘queen’ is a dummy variable.

We show how the bias and RMSE change as we move through the different levels of ‘method’, going from ML (level 1) to GMM (2), IVgmm_3g (3), OLS (4), IV_3g (5), IVgmm_x4x5 (6) and finally IV_x4x5 (7), by fitting linear, quadratic, cubic and finally polynomial of degree 4 (quartic) contrasts to these data.⁹ For each contrast, a mean square is obtained equal to its sum of squares divided by the number of degrees of freedom. Each mean square is divided by the residual mean square and the resulting variance ratios are referred to the relevant F distribution. These are displayed in Table 5.

Table 6
Analysis of variance contrasts for β2 bias and RMSE.

Source of variation                    Degrees of   Bias                            RMSE
                                       freedom      Mean square   Variance ratio    Mean square   Variance ratio
Method                                 6            57.000746     10,695.83***      13.444515     2833.85***
  Error model versus no error model    1            18.354680     3444.14***        3.140340      661.92***
  Deviations                           5            64.729959     12,146.17***      15.505350     3268.23***
Residual                               32,206       0.005329                        0.004744
Total                                  32,255

Note: * significant at 10%, ** significant at 5%, *** significant at 1%. The results for the covariates are equal to those reported in Table 4.

⁶ In this paper, we only present the effect of EIV on the estimate of β2. We have also analyzed the properties of the estimators of β1 and β3 but no evidence of contamination bias was found in our setup.
⁷ For a smaller subset of our data, we have also obtained robust estimates using 8 different weighting functions, and the default tuning constants suggested by the MATLAB documentation. On the whole the results are very similar to those obtained by OLS, although in general the robust means tend to be less extreme than the OLS estimates. Details are available on request.
⁸ See Griffith and Paelinck (2007) for a Regional Science application of ANCOVA in another context.
⁹ We do not consider higher order orthogonal polynomials. Their effect is given under ‘deviations’ in Table 5.
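To make the benchmark design of Eqs. (23a) to (23c) concrete, the following is a minimal Python sketch rather than the authors' code. Everything not stated in the text is an assumption made for illustration: the true coefficients (all set to 1), a standard normal x2, unit-variance innovations, a rook lattice only, the mean rather than the median as the bias summary, and white-noise measurement error (the ρv = 0 case) scaled so that its variance is λ times the sample variance of x1. The helper names `rook_neighbours`, `sar_draw`, `ols` and `mean_ols_bias` are hypothetical. Only OLS is implemented.

```python
import random


def rook_neighbours(side):
    """Neighbour index lists for a side x side regular lattice (rook contiguity)."""
    nbrs = []
    for r in range(side):
        for c in range(side):
            cell = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < side and 0 <= cc < side:
                    cell.append(rr * side + cc)
            nbrs.append(cell)
    return nbrs


def sar_draw(nbrs, rho, rng, iters=30):
    """Draw z = (I - rho W)^(-1) e, e ~ N(0, 1), with row-standardised W.

    Solved by iterating z <- rho W z + e, which converges for |rho| < 1
    (more iterations are needed as |rho| approaches 1).
    """
    e = [rng.gauss(0.0, 1.0) for _ in nbrs]
    z = e[:]
    for _ in range(iters):
        z = [e[i] + rho * sum(z[j] for j in nb) / len(nb)
             for i, nb in enumerate(nbrs)]
    return z


def ols(y, cols):
    """OLS coefficients from the normal equations, via Gauss-Jordan elimination."""
    k = len(cols)
    a = [[sum(p * q for p, q in zip(cols[i], cols[j])) for j in range(k)]
         + [sum(p * q for p, q in zip(cols[i], y))] for i in range(k)]
    for i in range(k):
        piv = max(range(i, k), key=lambda r: abs(a[r][i]))
        a[i], a[piv] = a[piv], a[i]
        for r in range(k):
            if r != i:
                f = a[r][i] / a[i][i]
                a[r] = [vr - f * vi for vr, vi in zip(a[r], a[i])]
    return [a[i][k] / a[i][i] for i in range(k)]


def mean_ols_bias(side=8, rho_u=0.5, rho_x=0.5, lam=0.0, reps=150, seed=0):
    """Mean OLS bias in beta2 over `reps` replications (assumed design, see lead-in)."""
    rng = random.Random(seed)
    nbrs = rook_neighbours(side)
    n = side * side
    beta = (1.0, 1.0, 1.0)  # assumed true values, not taken from the paper
    total = 0.0
    for _ in range(reps):
        x1 = sar_draw(nbrs, rho_x, rng)                 # Eq. (23c)
        x2 = [rng.gauss(0.0, 1.0) for _ in range(n)]    # assumed exogenous regressor
        u = sar_draw(nbrs, rho_u, rng)                  # Eq. (23b)
        y = [beta[0] + beta[1] * a + beta[2] * b + e    # Eq. (23a)
             for a, b, e in zip(x1, x2, u)]
        mean_x1 = sum(x1) / n
        var_x1 = sum((a - mean_x1) ** 2 for a in x1) / n
        sd_v = (lam * var_x1) ** 0.5                    # lam = var(v) / var(x1)
        x1_obs = [a + rng.gauss(0.0, sd_v) for a in x1]  # mismeasured regressor
        coef = ols(y, [[1.0] * n, x1_obs, x2])
        total += coef[1] - beta[1]
    return total / reps


bias_clean = mean_ols_bias(lam=0.0)  # benchmark: no measurement error
bias_eiv = mean_ols_bias(lam=0.5)    # noisy x1: attenuation is expected
print(f"mean beta2 bias, lambda = 0.0: {bias_clean:+.3f}")
print(f"mean beta2 bias, lambda = 0.5: {bias_eiv:+.3f}")
```

Under these assumptions the first figure should be close to zero, in line with the statement that spatial error autocorrelation leaves OLS unbiased, while the second should be clearly negative, which is the attenuation bias discussed in the text. The sketch does not attempt the GMM, ML or IV estimators.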
Fig. 1. Bias and RMSE of β2 as a function of λ; n = 400; ρv = 0; ρu = 0.5; ρx = 0.5.

Fig. 2. Bias and RMSE of β2 as a function of λ; n = 400; ρv = 0; ρu = 0.5; ρx = 0.8.

Fig. 3. Bias and RMSE of β2 as a function of λ; n = 400; ρv = 0; ρu = 0.8; ρx = 0.5.

Fig. 4. Bias and RMSE of β2 as a function of λ; n = 400; ρv = 0; ρu = 0.8; ρx = 0.8.

Table 5 highlights the fact that there is an overwhelmingly linear trend in bias and RMSE as one goes through the sequence ML, GMM, IVgmm_3g, OLS, IV_3g, IVgmm_x4x5 and finally IV_x4x5. We find that as level increases, bias and RMSE diminish systematically, with the two optimal instrument estimators at the highest level. That these trends are strongly but not perfectly linear is shown by plotting mean bias and RMSE against level, and by the statistically significant nonlinear contrasts.¹⁰ An additional element in Table 5 is the presence of interactions between sample size and individual estimation methods. With regard to bias, we see almost no significant interactions. There are statistically significant interactions between estimation method and sample size when one considers the RMSE results, but these effects are relatively small when one compares variance ratios, and are of little empirical significance. Likewise, the effect of W matrix connectivity, as represented by the covariate ‘queen’, is insignificant in the case of bias, and evidently very small in the case of RMSE.

¹⁰ The orthogonal polynomial coefficients used to make the linear, quadratic, cubic and quartic contrasts are given, for example, in the Table on page 272, Draper and Smith (1981) or Wetherill (1981), Table C.11.

Using ANCOVA to explicitly test whether modeling the error process increases bias and RMSE, the treatment structure for error process model versus no error process model is −0.25, −0.25, −0.25, 0.333, 0.333, −0.25, 0.333, thus giving equal weight to the various estimation methods. It is evident from Table 6 that attempting to control for error dependence produces significantly more negative bias and a higher
Table 7
Analysis of covariance for β2 bias and RMSE.

Source of variation    Degrees of   Bias                             RMSE
                       freedom      Mean square    Variance ratio    Mean square    Variance ratio
Method                 6            57             97,621.34***      13.44          23,653.43***
λ                      2            109            186,600***        91.54          161,100***
ρx                     7            9.720          16,646.44***      8.510          14,972.04***
ρu                     7            0.4539         777.35***         3.150          5541.83***
ρv                     2            0.1268         217.23***         0.1008         177.26***
method.λ               12           6.407          10,972.05***      4.403          7746.24***
method.ρx              42           1.194          2045.07***        1.177          2070.07***
λ.ρx                   14           0.5146         881.40***         0.6792         1194.87***
method.ρu              42           0.0898         153.79***         0.1339         235.51***
λ.ρu                   14           9.360 × 10⁻⁴   1.60*             0.02830        49.78***
method.ρv              12           0.03107        53.21***          0.03189        56.10***
λ.ρv                   4            8.854 × 10⁻³   15.16***          0.01011        17.78***
ρx.ρu                  49           0.08701        149.02***         0.09940        174.87***
ρu.ρv                  14           0.02512        43.03***          0.01903        33.48***
ρx.ρv                  14           3.606 × 10⁻³   6.18***           2.699 × 10⁻³   4.75***
method.λ.ρx            84           0.05202        89.09***          0.04794        84.34***
method.λ.ρu            84           7.541 × 10⁻⁴   1.29**            0.01464        25.75***
method.λ.ρv            24           1.928 × 10⁻³   3.30***           2.150 × 10⁻³   3.78***
method.ρx.ρu           294          0.01621        27.77***          0.01792        31.53***
method.ρu.ρv           84           4.372 × 10⁻³   7.49***           4.291 × 10⁻³   7.55***
method.ρx.ρv           84           2.977 × 10⁻³   5.10***           2.698 × 10⁻³   4.75***
λ.ρx.ρu                98           4.726 × 10⁻³   8.09***           5.282 × 10⁻³   9.29***
λ.ρu.ρv                28           7.638 × 10⁻⁴   1.31              8.261 × 10⁻⁴   1.45*
λ.ρx.ρv                28           4.348 × 10⁻⁴   0.74              4.441 × 10⁻⁴   0.78
ρx.ρu.ρv               98           3.015 × 10⁻⁴   0.52              3.242 × 10⁻⁴   0.57
n                      3            0.9521         1630.59***        16.75          29,462.34***

Interactions with levels of method
n.ml                   3            7.048 × 10⁻³   12.07***          0.4920         865.66***
n.gmm                  3            1.696 × 10⁻³   2.90**            0.5635         991.39***
n.iv_gmm_3g            3            2.509 × 10⁻⁴   0.43              0.4590         807.55***
n.ols                  3            6.245 × 10⁻³   10.69***          0.4207         740.19***
n.iv_3g                3            0.01175        20.13***          0.4526         796.27***
n.ivgmm_x4x5           3            5.633 × 10⁻⁴   0.96              0.06527        114.83***
n.iv_x4x5              3            3.568 × 10⁻³   6.11***           0.1916         337.08***

Covariate Queen        1            1.168 × 10⁻⁴   0.20              0.03092        54.39***
Residual               31,083       5.839 × 10⁻⁴                     5.684 × 10⁻⁴
Total                  32,255

Note: * significant at 10%, ** significant at 5%, *** significant at 1%.
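The arithmetic behind these tables can be cross-checked from the published figures alone. The sketch below is an illustration, not the authors' procedure: it assumes a balanced design with 32,256 / 7 = 4,608 cases per estimator, takes the bias means from Table 4, and uses the contrast weights quoted in the text (−0.25 for the four estimators that model the error process, 1/3 for the three that do not). It recomputes the ‘Method’ mean square, the sum of squares for the ‘error model versus no error model’ contrast in Table 6, one variance ratio from Table 7, and the Table 7 degrees of freedom. All variable names are hypothetical.

```python
# Mean beta2 bias per estimator, copied from Table 4.
bias_means = {
    "ML": -0.26568, "GMM": -0.24749, "IVgmm_3g": -0.23495, "OLS": -0.21235,
    "IV_3g": -0.20294, "IVgmm_x4x5": -0.00937, "IV_x4x5": -0.00822,
}
cases_per_method = 32256 // 7  # 4,608 under the assumed balanced design

# 'Method' sum of squares: between-group variation of the seven means.
grand_mean = sum(bias_means.values()) / len(bias_means)
ss_method = cases_per_method * sum((m - grand_mean) ** 2
                                   for m in bias_means.values())
ms_method = ss_method / 6  # 6 degrees of freedom

# Single-degree-of-freedom contrast: error model versus no error model.
models_error = {"ML", "GMM", "IVgmm_3g", "IVgmm_x4x5"}
weights = {k: (-0.25 if k in models_error else 1.0 / 3.0) for k in bias_means}
contrast = sum(weights[k] * bias_means[k] for k in bias_means)
ss_contrast = cases_per_method * contrast ** 2 / sum(w * w
                                                     for w in weights.values())

# Variance ratio = mean square / residual mean square (Table 7, bias, rho_x row).
residual_ms_bias = 5.839e-4
ratio_rho_x = 9.720 / residual_ms_bias

# Degrees of freedom listed in Table 7, in row order, should sum to the total.
df_table7 = ([6, 2, 7, 7, 2,
              12, 42, 14, 42, 14, 12, 4, 49, 14, 14,
              84, 84, 24, 294, 84, 84, 98, 28, 28, 98,
              3]
             + [3] * 7       # n interacted with each level of 'method'
             + [1, 31083])   # covariate 'queen' and residual

print(f"grand mean bias        : {grand_mean:.5f}")
print(f"method mean square     : {ms_method:.4f}")
print(f"contrast sum of squares: {ss_contrast:.4f}")
print(f"variance ratio (rho_x) : {ratio_rho_x:.2f}")
print(f"total degrees of freedom: {sum(df_table7)}")
```

Because the inputs are rounded to the precision printed in the tables, the recomputed values agree with the published ones only up to rounding. The positive sign of `contrast` indicates that the estimators that ignore the error process have, on average, a less negative bias.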

RMSE than ignoring error dependence, thus reaffirming the main point of this paper.

To summarize, and focusing on the non-optimal instrument estimators, the tables indicate that there is a hierarchy in terms of both bias and RMSE regardless of the value of λ. For instance, in terms of bias, ML has the largest effect, so that the negative bias associated with ML is greater than that of other estimators, and the negative bias for IV_3g is the least, the order being ML < GMM < IVgmm_3g < OLS < IV_3g. In terms of RMSE, the order from worst to best is: ML, GMM, IVgmm_3g, IV_3g, OLS. Thus in terms of our preferred RMSE measure, OLS turns out to be the best estimator, despite the presence of error dependence. Of course, our best estimators are IVgmm_x4x5 and IV_x4x5, but these require knowledge of the optimal instruments, which are rarely if ever known. Secondly, the bias in β2 always increases (becomes more negative) with λ and this causes the RMSE in β2 to increase with λ as well.

Let us next consider interaction effects in more detail. These are first illustrated graphically in Figs. 1 to 4. These figures display the bias and RMSE in β2 as a function of λ for n = 400, ρv = 0, ρu = {0.5, 0.8}, ρx = {0.5, 0.8}. The order described above is maintained, but the differences between the estimation methods (OLS and IV_3g on the one hand and ML, GMM and IVgmm_3g on the other hand) increase with λ. In addition the differences between estimation methods also increase with ρx and ρu. Finally, the stand-out feature of this and other graphs is the far superior performance of the optimal instrument estimators. However we focus attention on the other estimators, which the applied worker is more likely to use in practice.

The statistical significance of these interactions is evident from fitting ANCOVA models involving three-way and lower order interactions based on the six factors plus the covariate. Given six factors, there are in principle 6 one-way effects, 15 two-way effects, and 20 three-way effects, but in order to keep the analysis and table manageable, we restrict interactions involving sample size ‘n’ to the individual levels of ‘method’. This gives 10 two-way effects, 10 three-way effects excluding ‘n’, and in addition we have another 7 two-way effects involving ‘n’. These are shown in Table 7, which gives the outcomes for β2 bias and RMSE.

The large sample means that we detect significance in almost all cases, so we compare the relative sizes of the variance ratios in order to obtain a measure of relative significance of each interaction. This indicates that the main effects have a significant effect on bias, but the covariate ‘queen’ is statistically insignificant even with such a large sample. Of the two-way effects, the interaction between ‘method’ and λ stands out prominently. We see that the increasingly negative bias as λ increases differs according to the estimation method, with a widening gap between the bias incurred under estimation methods that include error dependence modeling (‘ML’, ‘GMM’ and ‘IVgmm_3g’) and that of the better performing OLS and IV_3g methods. Comparing this with the corresponding panel in Fig. 2 (or equivalently comparing
Figs. 3 and 4) illustrates the significant three-way interaction involving ‘method’, λ and ρx, denoted by method.λ.ρx in Table 7. Hence, the effect of λ on the impact of ‘method’ on bias itself changes according to ρx. As shown by Table 7, the interaction between ‘method’ and λ barely changes according to ρv or according to ρu. Most of the three-way interactions involving ρv are either comparatively weak or not significant.

Table 7 also gives the results of fitting the three-way interaction model to the RMSE data for β2. This shows that all of the one-way and two-way effects are significant, with the ‘method’ and λ interaction again prominent. The ‘method’ and λ interaction is illustrated by the right hand side panel of Fig. 1. Here, we see that RMSE depends on method, but RMSE increases with increasing λ, once again with a widening gap between the error dependence modeling methods and OLS and IV_3g. The most prominent three-way interaction involving λ, ‘method’ and ρx (denoted by the method.λ.ρx term in Table 7) is illustrated by comparing the right hand side panels of Figs. 1 and 2 (or equivalently the panels of Figs. 3 and 4). This shows that as we move to a higher level of ρx, the effect of λ on the ‘method’ effect changes. Comparing the right hand side panels of Figs. 1 and 3 illustrates the λ, ‘method’ and ρu interaction (method.λ.ρu). The same effect can be seen by comparing the right hand side panels of Figs. 2 and 4.

With regard to the effect of ‘n’ and its interaction with the levels of ‘method’, we see that for bias they are relatively weak or insignificant, although the main effect of ‘n’ is very strong, with bias becoming less as sample size increases. Likewise the impact of sample size on RMSE is very strong, with falling RMSE as sample size increases from 49 through to 400. There are also significant interactions involving ‘n’ and the ‘method’ levels in the case of RMSE. For example for OLS, at small sample sizes RMSE is smaller with OLS estimation than otherwise, but with large samples the RMSE is smaller for non-OLS estimation. Finally, spatial connectivity has a significant effect on RMSE, with increasing connectivity (the ‘queen’ case W matrix) causing a reduction in RMSE.

Fig. 5. Bias and RMSE of β2 as a function of λ; n = 400; ρv = 0; ρu = 0.5; ρx = 0.

Finally, consider Figs. 5 and 6, which are the outcomes when we restrict the spatial autocorrelation in the explanatory variable to zero. While the superior performance of the optimal instruments remains in evidence, it is difficult to see how one might choose any one of the other instruments in preference to the others. This reinforces the point that OLS and IV_3g, which make no allowance for spatial error dependence, outperform the other estimators provided the regressor is spatially autocorrelated. However, as illustrated in our empirical example, this is very likely to be the case across a range of applications typically considered by the applied spatial econometrician.

7. Conclusions

The aim of this paper is to analyze the effects of measurement error on estimator bias and RMSE in a spatial context. Previously, Fingleton and Le Gallo (2008a, 2008b) considered endogeneity stemming from omitted variables or as a consequence of simultaneity, and LeSage and Pace (2009) analyzed the case of spatially autocorrelated omitted variables, but up to this point in time the relative performance of various estimators in the presence of measurement errors in spatial data has not been explored.
Fig. 6. Bias and RMSE of β2 as a function of λ; n = 400; ρv = 0; ρu = 0.8; ρx = 0.

Our point of departure is the time series oriented paper by Dagenais, who shows that in some circumstances, given the presence of serially correlated errors, OLS proves superior to standard techniques used to correct for serial correlation. In this paper, we consider the parallel case of cross-sectional regression models. Our data generating process introduces measurement error into a (spatially dependent) explanatory variable, and we also introduce spatial dependence into the measurement errors and the disturbances. We use Monte Carlo simulation to compare the properties of various estimators, which do and do not control for spatially dependent disturbances. In the case of OLS there is no attempt to allow for the spatially dependent disturbances and no control for endogeneity introduced by EIV. In the case of IV_3g we control for EIV using instrumental variables, but do not allow for the disturbance process. Our GMM estimators and our ML estimator allow for spatially dependent disturbances, and in the case of IVgmm_3g we also instrument the variable with measurement error. We also consider estimators based on optimal instruments, which will usually be unknown. These perform far better than the other estimators both in terms of bias and RMSE, regardless of whether we control for error dependence (IVgmm_x4x5) or not (IV_x4x5).

For our non-optimal instrument estimators (which is what the applied researcher will typically be working with), simulation results show that coefficient estimation bias for the mismeasured variable becomes increasingly negative with increasingly positive spatial dependence in the disturbances. Bias also becomes more negative as the ratio of measurement error variance to explanatory variable variance increases. We find that OLS and IV_3g estimation outperform GMM-based and ML estimation, a conclusion that is reinforced (at least for small samples) when one also takes account of the precision of the estimates via RMSE. These results, like those of Dagenais (1994) in the time series context, indicate that measurement error plus a disturbance process involving spatial dependence is, somewhat paradoxically, best accommodated by an estimation method that ignores the spatial dependence.

References

Anselin, L., 1988. Spatial econometrics: methods and models. Kluwer, Dordrecht.
Anselin, L., 2006. Spatial econometrics. In: Mills, T.C., Patterson, K. (Eds.), Handbook of Econometrics: Volume 1, Econometric Theory. Palgrave MacMillan, Berlin.
Anselin, L., Lozano-Gracia, N., 2008. Errors in variables and spatial effects in hedonic house price models of ambient air quality. Empirical Economics 34, 5–34.
Baltagi, B., Liu, L., 2009. Spatial lag test with equal weights. Economics Letters 104, 81–82.
Cliff, A.D., Ord, J.K., 1981. Spatial processes: models and applications. Pion, London.
Dagenais, M.G., 1994. Parameter estimation in regression models with errors in the variables and autocorrelated disturbances. Journal of Econometrics 64, 145–163.
Dall'erba, S., Le Gallo, J., 2008. Regional convergence and the impact of European structural funds over 1989–1999: a spatial econometric analysis. Papers in Regional Science 87, 219–244.
Draper, N.R., Smith, H., 1981. Applied Regression Analysis, 2nd Edition. Wiley, New York.
Farber, S., Paez, A., Volz, E., 2009. Topology and dependency tests in spatial and network autoregressive models. Geographical Analysis 41, 158–180.
Fingleton, B., 2003. Increasing returns: evidence from local wage rates in Great Britain. Oxford Economic Papers 55, 716–739.
Fingleton, B., Le Gallo, J., 2008a. Finite sample properties of estimators of spatial models with autoregressive, or moving average, disturbances and system feedback. Annals of Economics and Statistics 87–88, 39–62.
Fingleton, B., Le Gallo, J., 2008b. Estimating spatial models with endogenous variables, a spatial lag and spatially dependent disturbances: finite sample properties. Papers in Regional Science 87, 319–339.
Fingleton, B., Le Gallo, J., 2009. Endogeneity in a spatial context. In: Páez, A., Le Gallo, J., Dall'erba, S., Buliung, R. (Eds.), Progress in Spatial Analysis: Theory and Computation, and Thematic Applications. Springer-Verlag, Berlin.
Greene, W.H., 2003. Econometric Analysis, 5th Edition. Prentice-Hall, New Jersey.
Grether, D.M., Maddala, G.S., 1973. Errors in variables and serially correlated disturbances in distributed lag models. Econometrica 41, 255–262.
Griffith, D., Paelinck, J.H.P., 2007. An equation by any other name is still the same: on spatial econometrics and spatial statistics. The Annals of Regional Science 41, 209–227.
Kelejian, H.H., Prucha, I.R., 1999. A generalized moments estimator for the autoregressive parameter in a spatial model. International Economic Review 40, 509–533.
Kelejian, H.H., Prucha, I.R., 2007. HAC estimation in a spatial framework. Journal of Econometrics 140, 131–154.
Kennedy, P., 2003. A Guide to Econometrics, Fifth Edition. Blackwell, Oxford.
Lee, L.-F., 2004. Asymptotic distributions of quasi-maximum likelihood estimators for spatial autoregressive models. Econometrica 72, 1899–1925.
LeSage, J.P., Pace, K.P., 2009. Introduction to Spatial Econometrics. CRC Press.
Mankiw, N.G., Romer, D., Weil, D.N., 1992. A contribution to the empirics of economic growth. Quarterly Journal of Economics 107, 407–437.
Smith, T.E., 2009. Estimation bias in spatial models with strongly connected weight matrices. Geographical Analysis 41, 307–332.
Stakhovych, S., Bijmolt, T.H.A., 2009. Specification and spatial models: a simulation study on weights matrices. Papers in Regional Science 88, 389–408.
Temple, J.R.W., 1998. Robustness tests of the augmented Solow model. Journal of Applied Econometrics 13, 361–375.
Upton, G.J.G., Fingleton, B., 1985. Spatial Data Analysis by Example, Volume 1. Wiley, Chichester.
Wetherill, G.B., 1981. Intermediate Statistical Methods. Chapman and Hall, London.
