Data Markets and The Production of Surveys

The Review of Economic Studies, Ltd.

Author(s): Tomas Philipson
Source: The Review of Economic Studies, Vol. 64, No. 1 (Jan., 1997), pp. 47-72
Published by: Oxford University Press
Stable URL: http://www.jstor.org/stable/2971740 .
Review of Economic Studies (1997) 64, 47-72    0034-6527/97/00030047$02.00
© 1997 The Review of Economic Studies Limited
Data Markets and the Production of Surveys
TOMAS PHILIPSON
University of Chicago
First version received June 1995; final version accepted August 1996 (Eds.)
The production of data, and the functioning of the market for observations, are universal concerns to all fields of positive economics. Economists, however, have typically placed greater emphasis on systematically analyzing the consumption of data than on considering its production. In the production of data through surveys, an important input market is that of labour, in which a demander trades observations with the supplying sample members. This paper analyses optimal monopsony compensation in such data markets, the important relationship it bears to estimation using the data that are obtained, and the statistical effects of implicit public wage regulations that are present in U.S. markets for observations.
1. INTRODUCTION
Statistics may be interpreted as a normative theory of guessing the features of a population while having access to information on only part of it, and the field obviously has a long history.1 Absent from this history, however, but highly present in the actual production of statistics on human populations, is the influence that market forces have on the economic exchanges made between suppliers and demanders of observations. These market forces are important because statistics exists due to economic constraints. For if there were no economic constraints, then statistics, in the sense defined above, would not be necessary: data for the whole population would always be produced. The very essence of statistics is therefore economic, but this paper argues that economics has not been fully utilized in statistics. This is particularly apparent in the failure to address the question of the optimal production of data, as opposed to their consumption. Although economists appreciate well-produced data unlike many other fields of empirical inquiry, they seldom produce their own data. That is, economists are more often consumers than producers of data. This is manifest in the focus of econometrics on the consumption aspects of data, with virtually no attention given to their production.2
In particular, despite the fact that sample surveys provide the cornerstone for empirical research in all fields of positive economics, little attention has been paid by economists to their production. This paper argues that this focus among economists is misguided because production biases in surveys may be larger than those introduced by the analysis of the resulting data. Therefore, a systematic understanding of the market incentives involved in the statistical exchanges underlying the production of data is important in order to lower the magnitudes of these production biases. The main argument of the paper is that a better understanding of data production may be gained by recognizing the labour economic aspects inherent in the market for observations. Consequently, better inferences may be made using the data that are produced. More specifically, we will be concerned with how the compensation offered to suppliers may affect both the quantity and quality of labour demanded. Quantity, as used here, refers to the sample size. Quality may be determined by how representative the supply is of the larger population being investigated, a major concern here, but it may also refer to the degree of error in the supplied observations, and the failure to supply answers to all questions.3

1. For an historical account see, e.g. Stigler (1987).
2. For example, The Handbook of Econometrics (1983) deals exclusively with consumption issues in data analysis.
The market of interest is the market for human observations, which is the most important factor market in the production of data sets. Sample members in surveys provide the supply side in this labour market and those producing the survey represent the demand side, with the exchange between the two making up analyzable data and the absence of such an exchange yielding missing data in the survey. This labour market for observations has several distinctive features. The first is that the demander of observations often enjoys monopsony power and therefore wage discrimination is important. We will therefore pay particular attention to optimal compensation schemes for sample members. The second is that asymmetric information is an essential component of survey production: if the demander had complete information about the supply side, the survey itself would be unnecessary. Thus, this information asymmetry limits the ability of the demander to engage in (third-degree) price discrimination, as characteristics of the sample members are unknown at the time of exchange. The third distinctive feature is that the value of the output is, ceteris paribus, increasing in how representative the labour demanded is of the larger population the survey aims to learn about. The important implication of this is that there is low substitutability between sample and non-sample members in the demand for labour. This low substitutability implies, in turn, that substantial resources are devoted to searching for sample members, despite the ready availability of non-sample members at lower wages. The demand is for a randomly selected sample, not the cheapest sample that could supply observations. These search costs are substantial in survey production and are important to keep low in order to retain enough resources to generate the largest number of observational trades with sample members. Such trades, in turn, serve to limit the high standard errors and bias stemming from missing data.
To illustrate our concerns, consider Table I, which depicts the production costs that resulted from this labour market for a set of surveys conducted by the National Opinion Research Center (NORC) at the University of Chicago.4

[TABLE I. Production costs for a set of NORC surveys. Columns: (1) total cost of production; (2) response rate; (3) sample size; (4) cost per interviewed sample member; (5) weeks of production; (6) growth in cost per observation from start to finish; (7) average growth in cost per observation per week.]

The columns of the table show, from left to right, the total costs of production; the fraction of the sample reached and interviewed (labelled response rate); the total size of the sample that was attempted to be interviewed; the average cost of production, as measured by the cost per interviewed sample member; the number of weeks it took to produce the survey; and two measures of the growth in the cost per observation throughout the survey: growth from start to finish and average growth per week. One question of particular importance here is whether the compensation policy of the monopsonist, NORC in this case, can increase the response rates in column 2 without raising the total costs in column 1. The lowering of production costs is central to statistical analysis because it allows for a more efficient use of resources within a given survey budget, resources that can therefore be used to reduce bias and standard errors arising from the lack of observational trades (i.e. missing data). The issue of freeing up the budget for compensation is therefore fundamental: missing data presumably occurs because an observational exchange is not mutually beneficial. The table also illustrates the dramatic rise in the cost per observation throughout surveys: they rise about 5.5% per week (column 7), and double on average from start to finish (column 6). We will argue that these increasing unit costs are due to the relatively more expensive searches occurring at the end of the survey, and that this raises the value of compensation by reducing spending on unproductive searches for sample members that refuse to participate in the survey. This value of increased compensation, we will then argue, is important for the increase in bias and standard errors induced by wage regulations that have been imposed upon survey production in the U.S.

3. These three quality attributes are referred to as unit non-response, measurement error, and item non-response, respectively.
4. NORC is located on the campus of the University of Chicago, with outstations in New York and Washington, D.C., and is the oldest survey research facility in the U.S.
Section 2 of the paper sets forth the basic model and discusses compensation of sample members under costly search, when the outcomes measured by the survey are not correlated with who makes observational trades. The central tradeoff that determines optimal compensation is that increased compensation reduces search costs but raises wage outlays. The section shows how to operationalize optimal wage discrimination after estimating hazard functions describing the probability of exit out of not being found, given a set of consecutive searches. We discuss the impact of increasing search costs throughout the survey, as depicted in Table I above, and show how they put upward pressure on optimal compensation.
Section 3 discusses survey production when those trading observations may be different in their outcomes than those who do not trade, which is commonly known as non-response bias. We demonstrate how to easily incorporate estimation of production biases into standard regressions using data from the Health and Retirement Study (HRS). We argue that instead of elasticity-based wage discrimination, a randomly assigned wage may be desirable because it serves as an instrumental variable to estimate bias induced by the absence of observational trades. We compare such random wage discrimination with intertemporal wage discrimination, in which re-sampling takes place and a large premium is paid to those who do not supply observations prior to the re-sampling. The effectiveness of such a premium is limited by the incentive it creates for initial non-participation. We therefore argue that the re-sampling probability must reduce the value of this incentive to delay response, in order for unbiased re-sampling to be incentive compatible.
Section 4 discusses the implications of the public regulation of surveys, stressing the effects of the implicit maximum wage policies of the Office of Management and Budget (OMB) for publicly financed surveys in the U.S. The important point about such regulations is not so much that they raise production costs but that they introduce production biases in surveys, biases which we show may dominate traditional biases introduced by the way the data are analyzed. In particular, the trade reductions induced by maximum wages lead to regulation-induced increases in both the bias and standard errors of estimators.
Finally, Section 5 concludes by discussing several topics of importance that have been omitted from, but suggested by, the analysis. The importance of data production to virtually all fields of positive economics, as well as the other social sciences, raises a rich and exciting set of issues that may be usefully addressed by economic analysis of data markets.
It is well known that there exists an extensive literature on survey design outside of economics.5 Although there is a vast theoretical literature on optimal statistical decision making from the standpoint of a single person,6 little focus has been placed on how statistical markets, and the aspects of exchange within them, affect the production of statistics. Economists have a comparative advantage in understanding these markets for observations, particularly in relation to the large amount of literature on incentives in mechanism design (see, e.g. Green and Laffont (1979), Laffont and Maskin (1982)). Because a survey is a mechanism, in the economic sense of the word, that transfers information between the suppliers and demanders of observations, survey design represents a great area for the practical application of economic work on mechanism design.

5. The literature is too extensive to review here in any meaningful manner. Statistical classics include the books by Hansen, Hurwitz, and Madow (1953), Cochrane (1979), and Kish (1986). Representative treatments on the design of survey questionnaires and interviewer practices include, for example, Groves (1989), Bradburn and Sudman (1988), Beimer et al. (1992), Lessler and Kalsbeek (1992).
2. THE PRODUCTION OF DATA
The purpose of a survey is to learn about the outcomes $Y$ of a larger population, from which the survey produces data on a smaller, randomly-selected sample.7 Due to the non-substitutability of sample and non-sample members induced by the value of randomly selecting the sample, the search for sample members is extensive, despite the ready availability of non-sample members at lower wages. Thus, the problem facing the producer of the data set is to search for and trade observations with sample members in the most cost-efficient way.
For a sample of size $n$, consider a production process in which the demander searches for all unreached sample members until the fraction of the sample interviewed reaches a given level $f$, referred to throughout the paper as the response rate. If a sample member is reached, he is offered the opportunity to participate in the survey at the wage $w$. Let the reservation wages for participation on the supply side be denoted $z$, and denote by $G(z)$ its cumulative distribution function in the population, with support $[0, \bar{z}]$ and corresponding density $g(z)$. The value $G(w)$ is therefore the participation rate at wage $w$, the rate at which sample members who have been reached enter into the survey. Search continues for unreached sample members, at a unit cost denoted $c$, until the desired response rate has been reached. Denote by $S(t \mid w)$ a decreasing survival function representing the fraction of the sample that has not been reached after a duration of search $t$, with corresponding hazard function $h(t \mid w)$ representing the propensity to be reached at $t$ when not already reached. We will refer to the search process as being compensation dependent (independent) when the ability to reach sample members depends (does not depend) positively on the wage $w$. Depending on the search technology, the fact that compensation is offered may or may not affect the ease with which sample members can be found.
6. See, e.g. Savage (1977) and Berger (1988).
7. Survey production entails making explicit a finite listing of the larger population, the survey frame, from which the smaller sample can be drawn.

The population may thus be described by the joint distribution of the random vector $(Y, T, Z)$ representing the outcomes measured ($Y$), the search duration ($T$), and the reservation wage ($Z$). This section considers the special case of data production when the elements of $(Y, T, Z)$ are all independently distributed, while subsequent sections focus on the dependence between them. Let $F_1(t \mid w)$ and $F_0(t \mid w)$ denote the fraction of the sample members that have been reached and participate and those who do not participate, respectively. Given the wage $w$, the sample members' acceptance strategy is simply to accept if they get an offer to supply observations above their reservation wage and to reject otherwise, which yields

$$F_1(t \mid w) = [1 - S(t \mid w)]\,G(w) \quad\text{and}\quad F_0(t \mid w) = [1 - S(t \mid w)]\,[1 - G(w)]. \tag{1}$$
This says that the participants (non-participants) are those who have been reached who had reservation levels below (above) the wage. Since the states of not being reached, participation, and non-participation are mutually exclusive, we have $S(t \mid w) + F_1(t \mid w) + F_0(t \mid w) = 1$ for all durations $t \ge 0$. The rate at which non-participation occurs relative to participation is denoted by $a(w)$ and defined by

$$a(w) \equiv \frac{F_0(t \mid w)}{F_1(t \mid w)} = \frac{1 - G(w)}{G(w)}, \tag{2}$$

which is independent of the duration of search and falls with the wage since more participants are attracted: $da/dt = 0$ and $da/dw < 0$. Let $\tau_f(w)$ be defined as the duration of search for the entire survey when the given response rate $f$ has been produced. It is implicitly defined through the relationship8

$$F_1(\tau_f(w) \mid w) = f \;\Longrightarrow\; S(\tau_f(w) \mid w) = 1 - f\,[1 + a(w)]. \tag{3}$$
The implicit function theorem implies that the effect of the wage on this duration of the survey is negative because sample members participate more:

$$\frac{d\tau_f}{dw} = -\,\frac{f\,\dfrac{da}{dw} + \dfrac{dS}{dw}}{\dfrac{dS}{dt}} < 0. \tag{4}$$
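Equations (1) to (4) can be checked numerically in a special case. The following is a minimal sketch, assuming a constant search hazard, so that $S(t) = e^{-ht}$, and reservation wages uniform on $[0, \bar{z}]$; both functional forms, the function names, and all numbers are illustrative assumptions rather than part of the model above.

```python
import math

def participation_rate(w, z_bar):
    """G(w) for reservation wages uniform on [0, z_bar] (an illustrative assumption)."""
    return min(max(w / z_bar, 0.0), 1.0)

def odds_of_refusal(w, z_bar):
    """a(w) = [1 - G(w)] / G(w), equation (2)."""
    G = participation_rate(w, z_bar)
    if G <= 0.0:
        raise ValueError("nobody participates at this wage")
    return (1.0 - G) / G

def survey_duration(w, f, h, z_bar):
    """Solve S(tau) = 1 - f [1 + a(w)] for tau when S(t) = exp(-h t), equation (3)."""
    target = 1.0 - f * (1.0 + odds_of_refusal(w, z_bar))
    if target <= 0.0:
        raise ValueError("response rate f is not feasible at this wage (needs f < G(w))")
    return -math.log(target) / h

def fractions(t, w, h, z_bar):
    """Return (S, F1, F0) after search duration t, equation (1)."""
    S = math.exp(-h * t)
    G = participation_rate(w, z_bar)
    return S, (1.0 - S) * G, (1.0 - S) * (1.0 - G)

if __name__ == "__main__":
    f, h, z_bar = 0.5, 0.25, 10.0          # hypothetical parameter values
    for w in (6.0, 7.0, 8.0, 9.0):
        tau = survey_duration(w, f, h, z_bar)
        S, F1, F0 = fractions(tau, w, h, z_bar)
        print(f"w={w:4.1f}  tau={tau:6.3f}  S={S:.3f}  F1={F1:.3f}  F0={F0:.3f}")
```

In this sketch the duration needed to reach a fixed response rate falls as the wage rises, which is the sign stated in equation (4), and the three states always sum to one.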
The present value of the recruitment costs $R_f(w)$ to produce the response rate of the survey is made up of search costs, as in

$$R_f(w) \equiv n \int_{t=0}^{\tau_f} S(t \mid w)\, c\, e^{-rt}\, dt, \tag{5}$$

where $r$ is the continuous time discount rate. These are the searches made on members who survived all previous searches, priced out at the present value of the unit cost of search. The present value of the wage expenditures $L_f(w)$ equals

$$L_f(w) \equiv n \int_{t=0}^{\tau_f} f_1(t \mid w)\, w\, e^{-rt}\, dt, \tag{6}$$

where $f_1(t \mid w)$ is the derivative of $F_1(t \mid w)$ and represents the fraction who get reached and participate at $t$. These are the sample members reached at each duration who agree to participate, priced out at the present value of the wage paid. The total cost of producing a given response rate, $C(f)$, is thereby given by both the recruitment and wage expenditures, as in

$$C(f) \equiv \min\{R_f(w) + L_f(w) \mid w \ge 0\}. \tag{7}$$
8. Since $f \le G(w) \le 1$ for any feasible wage, the right-hand side of the last equality is bounded above by unity and below by zero.

The optimal wage for a given response rate is denoted $w(f)$ and is determined by balancing increased wage costs with reduced search costs, so that the necessary first-order condition for an interior solution, $dR_f/dw + dL_f/dw = 0$, may be stated as

$$\int_{t=0}^{\tau_f} \frac{dS}{dw}\, c\, e^{-rt}\, dt + \frac{d\tau_f}{dw}\left[S c\, e^{-r\tau_f}\right] + \int_{t=0}^{\tau_f} \frac{d(f_1 w)}{dw}\, e^{-rt}\, dt + \frac{d\tau_f}{dw}\left[f_1 w\, e^{-r\tau_f}\right] = 0, \tag{8}$$

evaluated at the optimal survey wage and survey duration, $w(f)$ and $\tau_f(w(f))$. The first pair of terms is the marginal wage effect on the present value of search costs. It consists of the reduction in those searched throughout the duration of the survey, as well as the marginal effect of shortening this duration. The second pair of terms is the marginal wage effect on the present value of wage outlays, which are increased throughout the survey by more sample members participating but lowered by the reduction in the duration of the survey. Thus, the essential tradeoff in raising the wage is that it decreases search costs but increases wage outlays, so that at an optimal wage the two balance each other out.
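The cost minimization in equations (5) to (7) can be illustrated numerically. The sketch below is only an illustration under stated assumptions: a constant, compensation-independent hazard ($S(t) = e^{-ht}$), reservation wages uniform on $[0, \bar{z}]$, a simple trapezoid rule for the integrals, and a grid search in place of the first-order condition. The function names and all parameter values are hypothetical.

```python
import math

def trapezoid(func, a, b, steps=2000):
    """Composite trapezoid rule for the integral of func over [a, b]."""
    width = (b - a) / steps
    total = 0.5 * (func(a) + func(b))
    for i in range(1, steps):
        total += func(a + i * width)
    return total * width

def present_value_costs(w, f, n, c, h, r, z_bar):
    """Return (R_f, L_f) from equations (5) and (6) by numerical integration."""
    G = w / z_bar                        # uniform reservation wages (assumption)
    if not f < G <= 1.0:
        raise ValueError("need f < G(w) <= 1")
    tau = -math.log(1.0 - f / G) / h     # survey duration implied by equation (3)
    S = lambda t: math.exp(-h * t)       # fraction of the sample still unreached
    f1 = lambda t: h * S(t) * G          # time derivative of F1 = (1 - S) G
    R = n * trapezoid(lambda t: S(t) * c * math.exp(-r * t), 0.0, tau)
    L = n * trapezoid(lambda t: f1(t) * w * math.exp(-r * t), 0.0, tau)
    return R, L

def cheapest_wage(f, n, c, h, r, z_bar, grid=100):
    """Grid search for the wage minimizing R_f + L_f, as in equation (7)."""
    best = None
    for i in range(1, grid + 1):
        w = z_bar * i / grid
        if w / z_bar <= f:
            continue                     # response rate f cannot be produced at this wage
        R, L = present_value_costs(w, f, n, c, h, r, z_bar)
        if best is None or R + L < best[1]:
            best = (w, R + L)
    return best

if __name__ == "__main__":
    w_star, cost = cheapest_wage(f=0.5, n=1000, c=2.0, h=0.25, r=0.01, z_bar=10.0)
    print(f"cost-minimizing wage on the grid: {w_star:.1f}, total cost: {cost:.0f}")
```

A grid search is used here only because it makes the tradeoff visible without solving equation (8); a root-finder applied to the first-order condition would serve the same purpose.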
To make this tradeoff most transparent, consider the case when discounting is negligible, $r = 0$, and the search hazard is time- and compensation-independent, $h(t) = h$, so that the survival function of unreached sample members is $S(t) = e^{-ht}$. In this case, the recruitment costs and wage outlays reduce to

$$R_f(w) = n c\, \frac{1 - e^{-h\tau_f}}{h} \quad\text{and}\quad L_f(w) = n f w. \tag{9}$$

However, using that $F_1(\tau_f \mid w) = [1 - S(\tau_f \mid w)]\,G(w) = f$ implies $1 - e^{-h\tau_f} = f/G(w)$, the total costs become

$$R_f(w) + L_f(w) = n f \left[\frac{c}{h\,G(w)} + w\right]. \tag{10}$$

This says that for each of the $nf$ observations produced, the total cost is the total cost of search, together with the wage paid. The total cost of search is the average duration of search until the sample member is found, $1/h$, times the average length of search until a found sample member is willing to participate, $1/G(w)$, both priced out at the unit cost of search $c$. As wages rise, the first term representing search costs falls, while the second term representing wage outlays rises.
An illustrative case is when there is a non-zero mass of individuals that supply at no wage, $G_0 \equiv G(0) > 0$, with the rest being uniformly distributed over the remainder of its support, as in $G(z) = G_0 + [1 - G_0]\,z/\bar{z}$. It can be shown that in this case the necessary condition for an interior solution implies a closed form solution for the optimal wage $w(f)$, as in

$$w(f) = \sqrt{\frac{c}{h}\,\frac{\bar{z}}{1 - G_0}} \;-\; \frac{\bar{z}\,G_0}{1 - G_0}. \tag{11}$$

The wage is independent of the response rate produced because of the time-independence in the hazard rate of search. Due to the cost of wasting search on non-participants, the wage is increasing in the effective cost of reaching a sample member, $c/h$, defined as the unit cost of search times the expected search time until a sample member is reached. The corner solution of no pay occurs whenever the optimal wage is less than zero, which turns out to be true when $c/h < [\bar{z}\,G_0]/a(0)$, which holds when effective search costs are relatively low or free supply is relatively large.
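The closed form for this illustrative case can be checked against a direct minimization of the cost per observation, $c/[h\,G(w)] + w$, with $G(z) = G_0 + [1 - G_0]z/\bar{z}$. The sketch below takes the closed form to be $w = \sqrt{(c/h)\,\bar{z}/(1 - G_0)} - \bar{z}G_0/(1 - G_0)$, truncated at zero for the corner solution; the function names and parameter values are hypothetical, and $G_0 > 0$ is assumed throughout.

```python
import math

def optimal_wage(c, h, z_bar, G0):
    """Closed-form interior wage for the uniform-plus-mass case, truncated at w = 0."""
    interior = math.sqrt((c / h) * z_bar / (1.0 - G0)) - z_bar * G0 / (1.0 - G0)
    return max(interior, 0.0)

def unit_cost(w, c, h, z_bar, G0):
    """Cost per produced observation, c / (h G(w)) + w, with G(z) = G0 + (1 - G0) z / z_bar."""
    G = G0 + (1.0 - G0) * w / z_bar
    return c / (h * G) + w

def brute_force_wage(c, h, z_bar, G0, grid=100000):
    """Minimize unit_cost over a fine grid on [0, z_bar] as an independent check (needs G0 > 0)."""
    candidates = (z_bar * i / grid for i in range(grid + 1))
    return min(candidates, key=lambda w: unit_cost(w, c, h, z_bar, G0))

if __name__ == "__main__":
    # Interior case: effective search cost c/h = 8 exceeds z_bar * G0 / a(0) = 0.5.
    print(optimal_wage(2.0, 0.25, 10.0, 0.2), brute_force_wage(2.0, 0.25, 10.0, 0.2))
    # Corner case: c/h = 4 is below z_bar * G0 / a(0) = 5, so no pay is optimal.
    print(optimal_wage(1.0, 0.25, 10.0, 0.5), brute_force_wage(1.0, 0.25, 10.0, 0.5))
```

The two example calls correspond to the interior and corner regimes described in the text: in the second, free supply is large enough that paying nothing minimizes cost.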
2.1. Operationalizing wage discrimination
For some surveys, differences among sample members are observable to the demander before the search for the sample starts.9 For example, for household surveys, a characteristic that is frequently available is gender, because it can be inferred from name lists of households. In these cases, wage discrimination will naturally reduce labour costs, and the question becomes how to operationalize estimates of the reservation wage and search distributions of sub-populations into relative wages.
Consider the case when a response rate $f$ is to be produced for different groups characterized by the covariate $x$. For any two groups denoted by $x = 0$ and $x = 1$, let $(w_0, w_1)$ denote the optimal wages. The necessary first-order condition for an interior optimum under the objective function in equation (10) implies that their relative levels can be shown to satisfy

$$\frac{h(\tau_f(w_1) \mid x = 1)}{h(\tau_f(w_0) \mid x = 0)} = \frac{g(w_1 \mid x = 1)}{g(w_0 \mid x = 0)} \left[\frac{G(w_0 \mid x = 0)}{G(w_1 \mid x = 1)}\right]^2. \tag{12}$$

In other words, under optimal compensation the relative hazards equal the relative wage effects on the duration of search, $d(1/G)/dw$, to find a reached sample member that participates. Let the search hazard functions across groups be given by the easily estimated proportional hazard model $h(t \mid x) = q(x)\,h_B(t)$, where $h_B(t)$ is the baseline hazard function which is scaled proportionally by the loadings $q(x)$ for each group. A very intuitive case turns out to be when the reservation wage distribution $G$ is uniform, in which case estimates of search hazard functions and participation rates can be directly translated into relative wages, as in

$$\frac{w_0}{w_1} = \sqrt{\frac{\hat{q}(x = 1)\,\hat{G}(w \mid x = 1)}{\hat{q}(x = 0)\,\hat{G}(w \mid x = 0)}}. \tag{13}$$

Here, $\hat{G}(w \mid x)$ is the estimated participation rate for $x$, conditional on being reached at any common wage $w$ in a past survey, and $\hat{q}(x)$ is the estimated loading in the proportional hazard model. In other words, the relative wage has the intuitive interpretation of being determined by relative participation and hazard rates. Ceteris paribus, the larger are the estimated loadings in a proportional hazard regression or the larger are the estimated participation rates, the lower is the wage.
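The step from the group-wise first-order conditions to the relative wage can be sketched as follows. The example assumes reservation wages uniform on $[0, \bar{z}_x]$ in each group and a constant baseline hazard, so that the baseline hazard cancels from the ratio; under these assumptions each group's cost per observation, $c\,\bar{z}_x/[q(x)\,h_B\,w] + w$, is minimized at $w_x = \sqrt{c\,\bar{z}_x/[q(x)\,h_B]}$. The relative wage is taken to be the square root of the product of relative loadings and relative participation rates at a common wage. All names and numbers are hypothetical.

```python
import math

def relative_wage(q0, q1, G0_hat, G1_hat):
    """w0 / w1 from relative hazard loadings and participation rates at a common wage."""
    return math.sqrt((q1 * G1_hat) / (q0 * G0_hat))

def group_optimal_wage(c, q, h_base, z_bar):
    """Wage minimizing c / (q h_base G(w)) + w when G(w) = w / z_bar (uniform case)."""
    return math.sqrt(c * z_bar / (q * h_base))

if __name__ == "__main__":
    c, h_base = 2.0, 0.25                 # hypothetical unit search cost and baseline hazard
    q0, q1 = 1.0, 1.2                     # group 1 is reached 20% faster (assumed)
    z_bar0, z_bar1 = 12.0, 10.0           # group 0 has higher reservation wages (assumed)
    common_wage = 5.0                     # wage paid to both groups in a past survey
    G0_hat, G1_hat = common_wage / z_bar0, common_wage / z_bar1

    from_estimates = relative_wage(q0, q1, G0_hat, G1_hat)
    from_optimum = group_optimal_wage(c, q0, h_base, z_bar0) / group_optimal_wage(c, q1, h_base, z_bar1)
    print(from_estimates, from_optimum)   # the two ratios coincide in this uniform case
```

The point of the comparison is that the ratio built from observable quantities (loadings and participation rates) matches the ratio of the underlying optimal wages, without requiring knowledge of $c$, $h_B$, or $\bar{z}_x$.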
The optimal degree of such wage discrimination across genders may be estimated using unique data from the 1993 Health and Retirement Study (HRS), which allows for estimates of the search survival functions $S(t \mid x)$. The total sample size in HRS was $n = 14{,}370$, of which $1 - f = 12.2\%$ did not exchange observations. The HRS collected data on the total number of searches for each sample member. The average number of searches for the overall sample was $E[T] = 3.9$ per sample member, including participants, non-participants, and unreached sample members. Since some searches were censored for unreached sample members, this underestimates the mean search duration. The variance was $V[T] = 16.7$, and the range of the support of the distribution of $T$ was $[1, 74]$, where the right tail of the distribution involved remarkably high numbers of searches but was rather thin.

9. This may occur when the listing of the population (the survey frame) contains measures used to generate a stratified sample or under so-called cluster sampling, when sampling takes place across clusters, such as census tracts or blocks.
[Figure: Kaplan-Meier survival curves by gender (Group 0: males; Group 1: females); horizontal axis: total number of calls, 0 to 30.]
FIGURE 1
Gender effects on non-response survival curves
When household listings from which samples are drawn contain names, gender can be inferred from the first name for all but a small fraction of the sample with dual-gender names (e.g. Kelly or Francis). To demonstrate the differences in male and female search durations, Figure 1 above depicts the Kaplan-Meier estimates of the male and female survival curves $S(t \mid x = 0)$ and $S(t \mid x = 1)$ for the HRS.

The empirical male survival function first-order dominates the female one. The Mantel-Haenszel test statistic for testing the equality of the two survival functions was 9.71, with a p-value of 0.000, suggesting that the two survival functions differed significantly at any standard level of significance. Furthermore, the average number of searches for females was $E[T \mid x = 1] = 3.6$, which was significantly lower than the mean for males, $E[T \mid x = 0] = 4.3$. Finally, a fraction $P(T > \tau_f \mid x = 0) = 12.9\%$ of males were censored, with the corresponding fraction $P(T > \tau_f \mid x = 1) = 11.7\%$ for females also suggesting that search durations were larger for males, because censored observations involved longer durations in the HRS.
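For readers unfamiliar with the estimator behind such survival curves, the following is a minimal sketch of a Kaplan-Meier calculation on call-count data. It is not the code used for the HRS estimates; the function name and the toy data are hypothetical, and search that ends without contact is treated as a censored duration.

```python
from collections import Counter

def kaplan_meier(durations, reached):
    """Kaplan-Meier estimate of S(t): the share of sample members still unreached after t calls.

    durations[i] is the number of calls made to member i; reached[i] is True if the member
    was reached on that call and False if search stopped without contact (censored).
    Returns a list of (t, S(t)) pairs at each call count where someone was reached.
    """
    events = Counter(t for t, r in zip(durations, reached) if r)
    exits = Counter(durations)            # everyone leaves the risk set at their last call
    at_risk = len(durations)
    survival, curve = 1.0, []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            survival *= 1.0 - d / at_risk  # share of those at risk who remain unreached
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve

if __name__ == "__main__":
    # Toy data: six sample members, two of whom were never reached.
    calls = [1, 1, 2, 3, 3, 4]
    was_reached = [True, True, True, False, True, False]
    for t, s in kaplan_meier(calls, was_reached):
        print(t, round(s, 4))
```

Running the estimator separately on each gender and plotting the two step functions against the number of calls would give curves of the kind described for Figure 1.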
These descriptive statistics may be summarized more succinctly by the magnitude of the coefficient for a female dummy in a proportional hazard specification, as reported in Table 2 below.

TABLE 2
Optimal wage discrimination by gender

            Hazard ratio    Standard error    Sample size    95% confidence interval
Female      1.166           0.02              14,243         [1.125, 1.207]

Unconditional effect of female dummy on cooperation rate

            Cooperation rate    Standard error    Sample size
Male        0.827               0.005             6,712
Female      0.842               0.004             7,657

Implied relative wage

$$\frac{\text{Male wage}}{\text{Female wage}} = \frac{w_0}{w_1} = \sqrt{1.166 \times \frac{0.842}{0.827}} = 1.09$$

The table reports that being female raises the hazard rate into being reached by 16.6%, and that this increase in the female hazard is significant at any standard level. Again, this suggests that men are harder to reach than women not controlling for other factors, since such factors are not available before the data are collected, that is, at the time of contracting between the monopsonist and sample members. Naturally, other things constant, in particular the allocation of home relative to market production, gender may not have an effect on search durations. The participation rates conditional on being reached for men and women are also reported in Table 2 and reveal a significant difference between the genders. Using our formula for optimal relative wages, the estimated male premium in cost-efficient survey production is reported to be 9%. The male premium in household surveys, when the female distribution of search durations is stochastically dominated by that of the male, is a result of the larger benefit of getting a male to supply when he is not only harder to reach, but also harder to get to participate once reached.10
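The reported premium can be reproduced from the Table 2 estimates. The short calculation below assumes that the relative wage is the square root of the female hazard ratio times the ratio of female to male cooperation rates, which is the reading of the relative wage formula that matches the reported 9%; the variable names are illustrative.

```python
import math

hazard_ratio_female = 1.166      # estimated loading q(x=1) / q(x=0), Table 2
cooperation_male = 0.827         # estimated G(w | x=0), Table 2
cooperation_female = 0.842       # estimated G(w | x=1), Table 2

# Relative wage w0 / w1 under the assumed square-root form of the formula.
relative_wage = math.sqrt(hazard_ratio_female * cooperation_female / cooperation_male)
male_premium = relative_wage - 1.0

print(f"w0/w1 = {relative_wage:.3f}, male premium = {male_premium:.1%}")
```

Most of the premium comes from the hazard ratio: the cooperation rates differ by under two percentage points, so the participation term moves the ratio only slightly.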
2.2. The wage effects due to duration dependence in search hazards
One important aspect of survey production is differences in the availability of sample members. An important aspect of these unobserved differences is the increasing costs of search throughout the survey that they imply. In the case of a compensation independent search process, $dS/dw = 0$, but one with potential duration dependence in the hazard rate, $dh/dt < 0$, this may be seen from the necessary condition for an interior optimal wage derived above. Substituting the expression for $d\tau_f/dw$ from equation (4) into the left-hand side of the necessary first-order condition in equation (8), one gets

$$\frac{c}{h(\tau_f)}\left|\frac{da}{dw}\right| = \frac{e^{r\tau_f}}{nf}\,\frac{dL_f}{dw}, \tag{14}$$

again evaluated at the optimal level. The left-hand side is the marginal effective cost of search of finding additional sample members, $c/h(\tau_f)$, times the absolute value of the wage derivative of the odds-ratio of the participation rate; it represents the reduction in search costs due to the fact that unproductive searches on non-participants are eliminated by a wage increase. If search hazards decrease to zero quickly, therefore, then almost regardless of the size of the search costs, there will be substantial upward pressure on wages to eliminate unproductive search for non-participants.
Heterogeneity in the hazard functions across sample members will make search hazards
decrease to low levels and thus raise effective search costs. Consider a population in
which the search survival function is a mixture of exponentials as in
\[ S(t \mid w) = \int S(t \mid h, w)\, dH(h), \tag{15} \]
10. Furthermore, the age of eligibility for the HRS was 51-61 years. Presumably, gender availability should
be even more differentiated for lower ages, when relatively more females take care of children at home.
where \(H(h)\) is the mixture distribution and the conditional survival \(S(t \mid h, w)\) is
exponentially distributed. It is well known that such unobserved differences imply that the
unconditional hazard rate is of lower slope than the conditional one, in this case decreasing
when the conditional is constant across durations
\[ \frac{dh(t \mid w)}{dt} < 0. \tag{16} \]
This occurs because the sample members who are relatively more available are found
early, making the surviving pool relatively less available.11
This is important because the unobserved heterogeneity in hazard rates increases the
optimal wage, due to an increased benefit of getting sample members to participate early,
when search costs are low. This can be seen by comparing a homogeneous population,
with a degenerate mixture distribution \(H_0(h) = 1\{h \ge h_0\}\), to a population with a
mixture distribution \(H(h)\), obtained by a mean-preserving spread of \(H_0\), such that the
mean and variance satisfy \(E_H[h] = h_0\), but \(V_H[h] > 0\). This implies the hazard relationship
\[ h_H(t) < h_0 \tag{17} \]
because the two hazards start at the same level, \(h_H(0) = h_0\), but the hazard under a
non-degenerate mixture distribution is decreasing, as opposed to being constant, as is the case
when the hazards are homogeneous.12 This implies that the duration of the survey is
increased by heterogeneity, \(\tau_{H_0} \le \tau_H\). The first-order condition for the optimal wage above,
together with the lower hazard under a more heterogeneous population, directly implies
that the optimal wage under heterogeneity is larger than the one under homogeneity
\[ w_H(f) \ge w_{H_0}(f). \tag{18} \]
That is, the unobserved differences increase the wage because the effective search costs
increase with the duration of the survey so that the benefit of attracting people earlier in
the survey, through larger wages, rises.13
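The mixture argument can be illustrated numerically. The following is a minimal sketch, assuming a two-point mixture of exponential hazards; the hazard levels and weights are hypothetical and are not taken from the paper. It shows that the unconditional hazard starts at the common mean \(h_0\) and then declines, while the homogeneous hazard stays constant.

```python
import math


def mixture_hazard(t, hazards, weights):
    """Unconditional hazard at duration t for a finite mixture of exponentials.

    The survival function is S(t) = sum_k p_k * exp(-h_k * t), and the
    hazard is the density -S'(t) divided by S(t).
    """
    survival_parts = [p * math.exp(-h * t) for h, p in zip(hazards, weights)]
    density_parts = [h * s for h, s in zip(hazards, survival_parts)]
    return sum(density_parts) / sum(survival_parts)


# Hypothetical values: both populations share the mean hazard H0 = 0.25.
H0 = 0.25
HOMOGENEOUS = ([H0], [1.0])
HETEROGENEOUS = ([0.05, 0.45], [0.5, 0.5])  # mean-preserving spread of H0

for t in (0, 5, 10, 20):
    print(t,
          round(mixture_hazard(t, *HOMOGENEOUS), 4),
          round(mixture_hazard(t, *HETEROGENEOUS), 4))
```

In this sketch the more available type is found early, so the surviving pool is increasingly made up of the less available type and the pooled hazard drifts towards the lower of the two hazard levels.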
2.3. The dynamics of average costs
Although the reduction in search hazards that raises optimal compensation may be hard
to observe directly, it will manifest itself in the intertemporal pattern of production costs.
Define the duration specific average costs \(A_t\) to be the total expenditures on both wages
and search spread over all participants, as in
\[ A_t = \frac{f(t \mid w)\,w + S(t \mid w)\,c}{f(t \mid w)} = w + \frac{c}{h(t \mid w)\,G(w)}. \tag{19} \]
The first wage term is due to the fact that only individuals who participate are paid. The
second search term follows from the fact that, among those survived to be searched, only
a fraction \(h(t \mid w)G(w)\) trade observations. This implies that the average costs rise over the
duration of the survey, \(dA_t/dt > 0\), since the search hazard falls due to the lower availability
of sample members later in the survey. Only when there is no duration dependence in
search hazards do the average costs not grow over time.
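As a small worked illustration of the average cost expression \(w + c/(hG)\), the sketch below evaluates it along a falling hazard path. The wage, search cost, participation rate and hazard values are all hypothetical and chosen only to show the direction of the effect.

```python
def average_cost(wage, search_cost, hazard, participation_rate):
    """Duration-specific average cost per observation: w + c / (h * G)."""
    return wage + search_cost / (hazard * participation_rate)


# Hypothetical inputs: wage w = 20, unit search cost c = 10, participation G = 0.8.
FALLING_HAZARDS = [0.25, 0.20, 0.10, 0.05]
COSTS = [average_cost(20.0, 10.0, h, 0.8) for h in FALLING_HAZARDS]

for h, a in zip(FALLING_HAZARDS, COSTS):
    print(f"hazard={h:.2f}  average cost={a:.2f}")
```

Holding the wage fixed, every fall in the hazard raises the cost per observation, which is the pattern the cost data are used to detect.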
11. See, e.g. Lancaster (1990).
12. For a general discussion see, e.g. Lancaster (1990).
13. An important alternative policy not considered here is intertemporal wage discrimination, paying
individuals who are reached later less, which would also involve increased compensation early, relative to the
case of no duration dependence in search hazards.
The existence of duration dependency in search hazards and the increased compensation
they imply can be evaluated using direct measures on the costs of production. We
consider the average costs \(A_t\) that were discussed above, which come from several surveys
conducted by the National Opinion Research Center (NORC). We obtained access to
unique data in terms of NORC's internal cost accounting forms, which were coded and
served as the basis for the survey production data in this section. Over the past few years,
NORC has implemented a cost-monitoring programme which records the cumulative
production costs throughout the duration of their surveys. Many recent surveys have been
monitored in such a manner, including some that are frequently used by economists, such
as the National Longitudinal Survey of Youth (NLS) and the General Social Survey
(GSS).
To illustrate the pattern in these data, Figure 2 depicts average costs across the
duration of the survey for cost-monitored NLS surveys.14 It plots these average costs,
approximated by the cost per observation, as a function of calendar time, measured in
weeks. The cost per observation in a given period of a survey was constructed by dividing
the flow of cumulative costs in that week by the flow of observations obtained in the same
period.
The overall pattern suggested by these figures is that average costs increase dramatically
over time. Some of the figures even display increasing average costs at increasing
rates. Furthermore, these patterns may be understating the increases in average costs due
to increases in search costs if there is learning-by-doing, which would reduce the cost per
observation as the number of observations collected increased.15
Table 3 documents these patterns more systematically through regressions investigating
the effects that the response rate at a given period had on average costs in the
subsequent period.
The unit of observation of the table is a survey-week, and the table reports the
coefficient estimates for the fixed-effects specification of the type
\[ A_{it} = \lambda_0 + \lambda_1 F_{i(t-1)} + \lambda_2 Q + \Lambda_i + \varepsilon_{it}, \tag{20} \]
where the dependent variable \(A_{it}\) is the cost per observation for survey \(i\) in week \(t\). The
independent variables are \(F_{i(t-1)}\), RespRate in the table, which is the response rate
produced in survey \(i\) at the start of the survey week \(t\); \(Q\), which is a set of controls including
type, year, and size of survey; and \(\Lambda_i\), an unobservable fixed effect. Our main interest is
in the independent effect of the response rate. Naturally, the response rate is correlated
with duration as measured by the variable Week, so that when that variable is included,
both the size and significance of the response rate effect are lowered. However, the response
rate effect is highly significant, regardless. The table may be interpreted to display strong
positive effects of the produced response rate on the subsequent cost per observation.16
14. All costs discussed in this section are reported in 1994 dollars. All of the surveys in the sample, those
listed in Table 1 in the introduction, were field surveys except Baccalaureate and Beyond, which was done
primarily by telephone, fielding only that part of the sample for which the phone searches were unsuccessful.
The figures are initiated at the second period of the survey because the initial period of many surveys contained
fixed costs, such as development of the questionnaire.
15. This effect should be smaller for more experienced interviewers. With inexperienced interviewers, who
are quite common due to high turnover among interviewers, learning-by-doing is likely to manifest itself in
reductions in non-participation over the duration of the survey (i.e., \(da/dt < 0\)) or, alternatively, a reduction in
interview time resulting from increased familiarity with the questions of the survey.
16. The sample in these regressions was limited due to the few surveys that have been cost-monitored at
NORC, which, in turn, excluded a more elaborate set of specifications without loss of power to detect effects.
[Figure 2: panels plotting costs per observation over the duration of the survey]
TABLE 3
Fixed effects regressions of increasing average costs

             Model 1     Model 2     Model 3     Model 4     Model 5
RespRate     0.0062      0.0062      0.0049      0.0061      0.0062
             (9.811)     (9.808)     (4.895)     (9.649)     (9.741)
SampSize                 0.000007    0.000007    0.00004     0.00005
                         (0.233)     (0.207)     (1.358)     (1.003)
Week                                 0.0044
                                     (1.717)
GSS                                              1.3483      1.4663
                                                 (3.475)     (2.592)
NLS                                              0.4066      0.521
                                                 (1.125)     (0.899)
1989                                                         -0.0621
                                                             (-0.072)
1990                                                         0.0108
                                                             (0.017)
1991                                                         -0.1699
                                                             (-0.257)
1992                                                         -0.0111
                                                             (-0.013)
1993                                                         0.2551
                                                             (0.566)
Constant     4.1931      4.1287      4.1516      3.325       3.176
             (22.124)    (12.2)      (11.835)    (9.868)     (5.793)

Notes:
(1) Number of observations: 313
(2) t-statistics are reported in parentheses.
Data Source: National Opinion Research Center (NORC).
Of course, the response rate effects may be correlated with other aspects of survey
production that make average costs rise throughout the survey's duration. Figure 3 therefore
investigates the role of effective search costs by direct evidence. It illustrates the
increasing nature of effective search costs in the HRS by plotting the baseline hazard
estimate of the proportional hazard regression from the previous section.
The underlying hazard estimate of the figure involved searches being successful about
25% of the time, after which they declined drastically with the number of searches made.
The sharp decline in the baseline hazard implies high effective search costs. If the unit cost
of search is \(c\) = $10, for example, then after 35 searches the effective cost of a single search
is \(c/h\) = $230. The difference in the hazard forms between genders was not significant, and
thus the unconditional hazard will be similarly shaped with a sharp negative slope.
Together with the increasing average costs of the surveys, the rising pattern in effective
search costs suggests that there is substantial upward pressure on compensation due to
sharply falling search hazards.
3. PRODUCTION WHEN ABSENCE OF TRADE INDUCES BIAS
The costs of production must be separated from the value of the produced output, which
is affected by whether missing data (due to not completing all observational trades) affect
the inferences made using the data produced. Consider the random vector \((Y, T, Z, U)\)
for the population represented, as before, by the outcomes measured, the search duration,
and the reservation wage, but now also including \(U\), representing an unobservable factor
that affects the outcome. In this section, we are interested in two main production biases
FIGURE 3
Baseline hazard function estimate (horizontal axis: total number of calls)
which are the result of dependence between \(U\) and \(T\), respectively \(U\) and \(Z\). Under a
mean-square error objective, the case of no dependence considered in the previous section would
imply that all that mattered for the output was the produced response rate, which, when
combined with the sample size, yields a given level of precision in estimation. However,
bias occurs whenever those individuals for whom observations are exchanged differ in
their measured outcomes from those who do not engage in the observational exchange.17
This section argues that such bias, so called non-response bias, may be addressed through
wage discrimination in the compensation of sample members.18
Table 4 below illustrates in a standard regression framework the type of specifications
that can be used to provide evidence on the existence and magnitude of such production
biases, using different wages and search data in the unique data set provided by HRS.19
The table reports the effects of searches and compensation in an earnings regression
estimating annual earned income for individuals in the HRS, who are all aged 51-61. The
survey involved two separate wage levels for sample members, as indicated by the dummy
variable Survey Wage. The survey also recorded the number of searches made before each
participating sample member was reached, as measured by the variable Searches. Both
the main effects of these production parameters and their interactions with age, education,
and self-reported health are reported.20 The table excludes a large set of alternative
controls, described in the footnote of the table, that were included in the specifications.
The table reports large and significant effects of survey search and compensation on
estimated earnings: the main effects in the first specification indicate that a single search
17. There is substantial systematic empirical evidence that, when validated, differences exist between those
trading and not trading observations. See, e.g. the reviews in Madow et al. (1983).
18. Selection bias, occurring when those who supply labour differ from those who do not, has of course
been studied widely by economists in labour markets other than data markets; see, e.g. Heckman (1976). One
major difference exploited here is that the researcher coincides with the employer and hence can control wages.
19. We discuss the parameters of this re-sampling scheme in greater detail in the next section.
20. All variables in the regression were normalized to deviations from their means so that the main effects
in the model with interactions represent the effects at average levels.
TABLE 4
Production biases in an earnings regression

Variable  Model 1  Model 2  Model 3
Searches  664.7  542.7
  (6.2)  (4.8)
Survey Wage  3317.3  1347.8
  (3.4)  (1.2)
Interaction terms:
Searches * Good Health  491.5  584.0
  (1.6)  (1.9)
Searches * Years of Education  261.0  231.0
  (6.8)  (5.9)
Searches * Age  103.1  56.4
  (3.2)  (1.7)
Survey Wage * Good Health  6086.7  6256.1
  (1.7)  (1.8)
Survey Wage * Years of Education  1958.9  1853.6
  (5.0)  (4.7)
Survey Wage * Age  970.6  843.7
  (3.3)  (2.6)

Notes:
(1) t-statistics are reported in parentheses.
(2) Control variables: Wealth, Unearned Income, Education, Marital Status, Sex, Age,
Region, Race, and Health Status.
Data Source: Health and Retirement Survey (HRS), Wave 1.
raises earnings by about $665, and that the more highly compensated group had earnings
in excess of $3,317 more than those who were not compensated. This may stem from the
fact that labour supply towards other work and survey work are most likely gross
substitutes, in the sense that as the wage for other work rises, the survey supply falls. Indeed,
the joint labour supply decision involved in survey and alternative work seems important
to understand when producing observations on any labour market. The interactions are
significant as well. They indicate the degree to which searches or compensation affect the
returns to education, the age-earnings profile, or the impact of health on earnings. For
example, the compensated group has a return to education that is $1,850-$1,950 larger
than the uncompensated group.
If such potential production biases are present in actual surveys of interest to
economists, the question becomes whether they can be overcome through alternative production
methods. One way would be through full response, \(f = 1\), produced by full search and
maximum compensation. Since the entire sample would engage in observational trades
under such a policy, the compensation would induce unbiased estimation of outcome
parameters. However, the cost of such a policy may be substantial since it is obtained by
letting search go on until all sample members have been found, \(\tau_1\), and compensation is
set at \(w = z\). Denote the resulting maximum compensation costs \(C_M\), as given by
\[ C_M = R_1(z) + L_1(z) = n\left[ \int_{t=0}^{\tau_1} S(t \mid z)\, c\, e^{-rt}\, dt + z \int_{t=0}^{\tau_1} f(t \mid z)\, e^{-rt}\, dt \right]. \tag{21} \]
For the case of no discounting, with which we will be exclusively concerned in this section,
this reduces to the simple expression21
\[ C_M = n\big[E[T \mid z]\,c + z\big]. \tag{22} \]
For each sample member, this is simply the expected number of searches priced out at
the unit cost of search, together with the wage paid after the sample member has been
reached.
21. This follows from the fact that for any survival function, its integral equals the expected survival time
\(\int S(t)\,dt = E[T]\).
3.1. Random wage discrimination
One method of reducing compensation-based production bias in a cheaper manner is
through (third-degree) wage discrimination. However, instead of being tailored by a
monopsonist in the standard manner, to differences in supply elasticities, wage discrimination
should be random in order to generate information about the compensation-based bias.
In essence, random wage discrimination serves as a high-quality instrumental variable
for non-response bias, a variable that is correlated with the supply of observations but
uncorrelated with other unobserved determinants of investigated outcomes.
Consider the simple case of estimating the unconditional mean of a univariate outcome
variable \(Y\) in a canonical linear model of the form considered empirically in Table 4 above,
\[ Y = \beta_0 + \beta_R D_R + \beta_L D_L + U, \tag{23} \]
where \(D_R\) is a dummy variable indicating whether the sample member has been reached
and \(D_L\) is a dummy variable indicating a trade of an observation taking place. The
conditional mean of the outcome variable as a function of the wage is then
\[ E[Y \mid w] = \beta_0 + \beta_R P(D_R = 1 \mid w) + \beta_L P(D_L = 1 \mid w) + E[U \mid w], \tag{24} \]
where \(P(D_R = 1 \mid w) = 1 - S(t \mid w)\) is the probability of being reached given the wage, and
\(P(D_L = 1 \mid w) = G(w)\) is the participation rate given the wage. Now consider the case of a
random wage-discrimination policy in which two wage levels, a high one and a low one
denoted \(\bar{w}\) and \(\underline{w}\) respectively, are randomized out across the sample. If the wage
discrimination scheme is to be unbiased for all populations \(\beta\), the scheme must naturally involve
full search on each wage. Due to randomization of the wages across the sample, the
distribution of \((Y, Z, T, U)\) is the same across the two treatment-groups receiving the high
and the low wage when both groups are being fully searched. In particular, the
unobservables are independent of the assigned wage, so that
\[ E[U \mid \underline{w}] = E[U \mid \bar{w}] = E[U]. \tag{25} \]
This implies that the means of the outcome conditional on the two wages and full search
are
\[ E[Y \mid \underline{w}] = \beta_0 + \beta_L G(\underline{w}) + E[U] \tag{26} \]
\[ E[Y \mid \bar{w}] = \beta_0 + \beta_L G(\bar{w}) + E[U], \tag{27} \]
which implies the effect of the reservation wage on the outcome according to
\[ \beta_L = \frac{E[Y \mid \bar{w}] - E[Y \mid \underline{w}]}{G(\bar{w}) - G(\underline{w})}. \tag{28} \]
Substituting in sample analogues to the population values of the outcome mean and
participation rates for the two wage groups is nothing more than the IV-estimator for
the effect of compensation on the outcome. The non-response bias that random wage
discrimination allows us to estimate and eliminate is therefore the one introduced by
compensation bias.
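The ratio of differences in equation (28) can be sketched in code. This is a minimal illustration, assuming that group mean outcomes and participation rates for the two randomly assigned wage groups are already available; the function name and all numerical values are hypothetical. The example builds the two group means from the linear relationship in equations (26) and (27) and checks that the ratio recovers the assumed coefficient.

```python
def wald_beta_l(mean_y_high, mean_y_low, g_high, g_low):
    """Ratio of the difference in mean outcomes to the difference in
    participation rates between the high-wage and low-wage groups."""
    if g_high == g_low:
        raise ValueError("participation rates must differ across wage groups")
    return (mean_y_high - mean_y_low) / (g_high - g_low)


# Hypothetical population values, used only to generate the group means.
BETA_0, BETA_L, MEAN_U = 30000.0, 5000.0, 0.0
G_LOW, G_HIGH = 0.6, 0.9  # participation rates at the low and high wage

MEAN_LOW = BETA_0 + BETA_L * G_LOW + MEAN_U
MEAN_HIGH = BETA_0 + BETA_L * G_HIGH + MEAN_U

ESTIMATE = wald_beta_l(MEAN_HIGH, MEAN_LOW, G_HIGH, G_LOW)
print(round(ESTIMATE, 6))
```

With sample means and sample participation rates in place of the population values, the same ratio is the estimator described in the text; the guard against equal participation rates reflects the fact that the wage only carries information if it shifts supply.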
The key tradeoff in the cost of eliminating bias this way, relative to that of full
compensation, is determined by how elastic search durations and participation rates are
with respect to the wage. Consider the undiscounted costs under the wage discrimination
scheme \(C_D\), as in
\[ C_D = \bar{n}\big[c\,E[T \mid \bar{w}] + G(\bar{w})\,\bar{w}\big] + \underline{n}\big[c\,E[T \mid \underline{w}] + G(\underline{w})\,\underline{w}\big], \tag{29} \]
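where \((\bar{n}, \underline{n})\) are the sample sizes of the two wage groups. Since the maximum compensation
costs \(C_M\) constitute the special case when \(\bar{w} = \underline{w} = z\), the difference in costs may be written
\[ C_D - C_M = \bar{n}\big[c\,(E[T \mid \bar{w}] - E[T \mid z]) + (G(\bar{w})\,\bar{w} - z)\big] + \underline{n}\big[c\,(E[T \mid \underline{w}] - E[T \mid z]) + (G(\underline{w})\,\underline{w} - z)\big]. \tag{30} \]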
The two terms are for the two wage groups. Within each group, the cost differential
between the two wage policies is made up of the increase in search expenditures under the
discriminating policy relative to its lower wage outlays. The key tradeoff, as in the previous
section, is between search and wage expenditures. The more search durations are lowered
by compensation, the more search expenditures increase when attempting to wage
discriminate, relative to the full-cost solution. The full compensation survey finds sample members
much faster than any other one, the survey employing wage discrimination, in particular.
Therefore, in order for the wage discrimination to reduce costs, increases in search costs
must be offset by reductions in wage expenditures. In particular, dropping the wage to
zero for the low-wage group will naturally lower wage outlays but also increase search
costs.22
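The cost comparison between random wage discrimination and maximum compensation can be made concrete with a short sketch of the undiscounted cost expressions. All inputs below (group sizes, wages, expected numbers of searches, participation rates) are hypothetical and serve only to show how the search and wage components trade off.

```python
def max_compensation_cost(n, c, expected_searches_at_z, z):
    """Undiscounted cost of full search at the maximum wage z."""
    return n * (c * expected_searches_at_z + z)


def discrimination_cost(groups, c):
    """Undiscounted cost of a wage-discrimination scheme.

    groups: iterable of (size, wage, expected_searches, participation_rate).
    """
    return sum(size * (c * e_t + g * w) for size, w, e_t, g in groups)


# Hypothetical design: 1,000 sample members split evenly across two wages.
C = 10.0
GROUPS = [
    (500, 50.0, 4.0, 0.9),  # high-wage group
    (500, 10.0, 6.0, 0.7),  # low-wage group: more searches, fewer participants
]
C_D = discrimination_cost(GROUPS, C)
C_M = max_compensation_cost(1000, C, 3.0, 100.0)
print(C_D, C_M, C_D - C_M)
```

In this example the extra search expenditure under the lower wages is more than offset by the smaller wage outlays, so the difference is negative; with a larger unit search cost or more wage-sensitive search durations the sign could reverse.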
3.2. Intertemporal wage discrimination
Instead of discriminating across sample members at one point in time, the monopsonist
may discriminate among them across time. Consider a full search, two-period wage policy
in which the sample is offered a given wage in the first period, after which, in the second
period, those who do not trade at this initial wage are re-sampled at a substantially higher
wage. The purpose of such a scheme would be to buy many observations at the low wage
and learn about those not trading through the second higher wage. Such a compensation
scheme is characterized by the two wages \((w_1, w_2)\) and the fraction \(\pi\) of those not initially
trading, who are then resampled in the follow-up phase.
Consider the estimator \(\bar{Y}\) of the unconditional mean \(E[Y]\) under such an intertemporal
discrimination scheme, as in
\[ \bar{Y} = \frac{n_1}{n}\,\bar{Y}_1 + \frac{n - n_1}{n}\,\bar{Y}_2, \tag{31} \]
where \(\bar{Y}_1\) is the mean of the \(n_1\) sample members who supply at the first wage, while \(\bar{Y}_2\) is
the mean of the group of size \(n_2\) sample members who supply at the second wage. If there
is full participation in the follow-up, then \(n_2 = \pi[n - n_1]\). The bias of this estimator is
\[ E[\bar{Y} \mid w_1, w_2] - E[Y] = G(w_1)\,E[Y \mid Z \le w_1] + [1 - G(w_1)]\,E[Y \mid w_1 < Z \le w_2] - E[Y]. \tag{32} \]
22. Our main concern here is compensation. Naturally, the analogue arguments could be made through
randomizing out differential amounts of search, possibly having four feasible treatment groups in which high/
low compensation is combined with high/low search durations, where both wages and search durations get
randomized out.
This implies that a sufficient condition for the wage policy to yield an unbiased estimator
of \(E[Y]\) for all distributions of \((Y, Z)\) is full pay in either the first or second period:
\(w_1 = z\) or \(w_2 = z\). The important advantage of this intertemporal discrimination over random wage
discrimination is that the estimator is unbiased, regardless of the linearity of the conditional
mean function of the outcome. Therefore, whenever non-linearities are important and
creating instrumental variables through wage discrimination is therefore problematic,
intertemporal discrimination may be more advantageous.
The first type of unbiased wage policy with full initial compensation (\(w_1 = z\)) is simply
the previously-discussed maximum wage policy occurring as a special case of intertemporal
discrimination. The second policy is the focus here and involves a bias reduction bonus
in the second period. Indeed, the second period wage must be larger to get any supply in
the second period. For under decreasing wages, anyone who decides to participate does
so in the first period. However, an increasing wage schedule will create an incentive to
supply later rather than earlier, so as to capture a higher wage. This is particularly true
if, as is common, the duration between the periods is too short to make differential
discounting relevant to intertemporal price discrimination. Therefore, for the intertemporal
re-sampling scheme to be incentive compatible, the inclusion probability \(\pi\) of the second
period must be low enough to eliminate the incentive to avoid supplying in the first period.
The probability of being included in the second round needs to be low enough to make
the sample members act myopically whenever contacted by the survey in the first period.
Under risk-neutrality on the part of sample members, this implies that incentive compatible
re-sampling must have the wage growth limited by the size of the inclusion probability as
in \(w_1 \ge \pi w_2\).
An incentive compatibility constraint such as this may be more important in long-term
contracts (panel surveys) than in spot markets (cross-sectional surveys). Few individuals in
a population will ever be trading in more than a few cross-sections. On the other hand,
under the repeated measurements in panel surveys, incentive compatibility is an issue. This
is so because learning may take place more readily in panels than in cross-sections, as the
demander may be faced with the same type of incentives every period in the panel.
However, in a cross-sectional spot market there is no room for learning-by-doing by suppliers.
In sum, the incentive compatibility constraint makes the costs per period in panel surveys
higher, relative to repeated cross-sections.
For the price discrimination considered in the Health and Retirement Study, which
is a panel survey, the sequential nature of the compensation offered in their re-sampling
scheme is shown in Figure 4 below.
This survey involved a ten-fold increase in compensation. In particular, \(w_1\) = $10 or
$30 was offered to single individuals and couples, respectively, for their initial participation.
An initial decision to not participate followed by subsequent participation, on the other
hand, was rewarded with a payment of \(w_2\) = $100 or $300. The follow-up re-sampling rate
was \(\pi\) = 68.5% of those who did not participate initially. The produced response rate was
\(f_1 = n_1/n\) = 77.7% in the first period and \(f_2 = n_2/[\pi(n - n_1)]\) = 25.9% in the second period.
The sample members were not aware of this design, as the higher compensation for those
not participating initially was decided on after the first period. However, if such a design
were understood by sample members, which it presumably would be if it were widely used
in cross-sectional surveys or used repeatedly in a panel survey, it would not be incentive
compatible. The incentive to wait, and possibly collect the large compensation, would be
much too large under standard risk preferences of sample members. Under risk neutrality,
the certainty equivalent of not participating is \(\pi w_2\) = $68.50 and $205.50 for singles and
couples, respectively, as compared to participating, which is \(w_1\) = $10 for singles and \(w_1\) =
$30 for couples. Furthermore, the second period under-compensated in terms of unbiased
estimation, since participation was far from full, \(f_2 \ll 1\). In the HRS, on the other hand,
the large re-sampling expenditures were spent on a large fraction \(\pi\), which for similar
designs in the future should be reduced to increase the follow-up participation rate above
25.9% towards unity.

FIGURE 4
Incentive incompatible and biased design of the Health and Retirement Survey (HRS)
n = 15,444
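The risk-neutral incentive compatibility condition \(w_1 \ge \pi w_2\) can be checked directly against the HRS payments and re-sampling rate quoted in the text. The sketch below is illustrative only; the function names are hypothetical, and the implied maximum re-sampling rate simply rearranges the condition to \(\pi \le w_1 / w_2\).

```python
def expected_payoff_from_waiting(w2, pi):
    """Risk-neutral certainty equivalent of declining in the first period."""
    return pi * w2


def is_incentive_compatible(w1, w2, pi):
    """True when participating now pays at least as much as waiting."""
    return w1 >= expected_payoff_from_waiting(w2, pi)


def max_resampling_rate(w1, w2):
    """Largest inclusion probability satisfying w1 >= pi * w2."""
    return w1 / w2


HRS_PI = 0.685  # follow-up re-sampling rate reported for the HRS
HRS_WAGES = {"singles": (10.0, 100.0), "couples": (30.0, 300.0)}

for label, (w1, w2) in HRS_WAGES.items():
    print(label,
          round(expected_payoff_from_waiting(w2, HRS_PI), 2),
          is_incentive_compatible(w1, w2, HRS_PI),
          max_resampling_rate(w1, w2))
```

Under these payments, waiting is worth far more than the first-period offer for both groups, and the re-sampling rate would have to fall to roughly one tenth for the condition to hold with the same wages.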
The expected cost of an unbiased and incentive compatible re-sampling scheme, such
as that of HRS, is determined by
\[ C_I = n_1\big[c\,E[T \mid w_1] + w_1\big] + [n - n_1]\,\pi\,\big[c\,E[T \mid z] + z\big], \qquad w_1 \ge \pi z. \tag{33} \]
The first term is the search expenditures and wage outlays in the first period, similar to
that discussed in earlier sections. The second term is the cost in the second period, made up
of search and wage expenditures at the maximum wage. Due to the incentive compatibility
constraint, the first wage is restricted to be above that wage which provides an incentive
to delay the supply until the second period. The difference between a maximum wage
policy in which the whole sample receives the maximum pay, and the intertemporal
discrimination cost, in which only the second period sample members get maximum pay, is
thereby
\[ C_I - C_M = n_1\big[c\,(E[T \mid w_1] - E[T \mid z]) + (w_1 - z)\big] - (n - n_1)(1 - \pi)\big[c\,E[T \mid z] + z\big]. \tag{34} \]
This difference is again determined by the tradeoffs in search and wage expenditures. More
search expenditures are incurred for the low wage group under intertemporal discrimination
than for the same group under the maximum wage: the first term in the equation
above is positive. These costs must be offset by the lower pay for first-period sample
members, the second term, as well as the elimination of pay for those who do not trade
in either the first or second period, the third term.23
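The three-way decomposition described above can be verified numerically. The sketch below assumes the undiscounted cost expressions for the intertemporal scheme and for maximum compensation, and checks that the search term, the first-period wage term and the term for those not re-sampled add up to the cost difference. All numbers are hypothetical, and the inclusion probability is chosen so that \(w_1 \ge \pi z\) holds.

```python
def intertemporal_cost(n, n1, pi, c, e_t_w1, w1, e_t_z, z):
    """First-period search and wages plus second-period costs at the maximum wage."""
    return n1 * (c * e_t_w1 + w1) + (n - n1) * pi * (c * e_t_z + z)


def max_compensation_cost(n, c, e_t_z, z):
    """Full search with everyone paid the maximum wage z."""
    return n * (c * e_t_z + z)


# Hypothetical design satisfying w1 >= pi * z.
N, N1, PI, C = 1000, 700, 0.1, 10.0
E_T_W1, W1, E_T_Z, Z = 6.0, 10.0, 3.0, 100.0

DIFFERENCE = (intertemporal_cost(N, N1, PI, C, E_T_W1, W1, E_T_Z, Z)
              - max_compensation_cost(N, C, E_T_Z, Z))

SEARCH_TERM = N1 * C * (E_T_W1 - E_T_Z)                    # extra first-period search
WAGE_TERM = N1 * (W1 - Z)                                  # lower first-period pay
NOT_RESAMPLED_TERM = -(N - N1) * (1 - PI) * (C * E_T_Z + Z)  # no cost for those dropped

print(DIFFERENCE, SEARCH_TERM, WAGE_TERM, NOT_RESAMPLED_TERM)
```

Only the search term is positive; whether the scheme is cheaper than maximum compensation depends on whether the two negative terms outweigh it, as they do in this example.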
4. REGULATION BIAS IN SURVEY PRODUCTION
The previous sections discussed aspects of survey production which put upward pressure
on wages and made variation in their levels across sample members desirable: regular
elasticity-based wage discrimination, reductions in search costs, compensation-induced
instrumental variables, and incentive compatible bias-reduction bonuses. This has
important implications for public wage regulations in surveys, which are extensive in the U.S.
The important aspect of such regulations is not so much the increased production costs,
but their implicit statistical implications, particularly the production biases introduced
through regulation. Most wage regulations of survey research take the form of maximum
wage policies, limiting the size of payments to sample members, as well as prohibiting
wage discrimination altogether. For example, the Office of Management and Budget
(OMB) restricts the wages paid to respondents in these ways for publicly financed surveys
in the United States. Such wage policies are typically justified, surprisingly often by
economists involved in survey production, by the argument that survey costs are already too
large to be able to afford the luxury of compensating sample members.
The regulated cost function \(C_R(f)\) is defined by making compensation homogeneous
and restricted below the level \(w_R\), as in
\[ C_R(f) = \min\{ R_f(w) + L_f(w) : 0 \le w \le w_R \}. \tag{35} \]
Such wage restrictions naturally reduce wage outlays, but with the result that
expenditures on search are larger.24 In the absence of discounting, this excess cost is given
by
\[ C_R(f) - C(f) = n\left[ \int_{t=0}^{\tau_f(w(f))} \big[S(t \mid w_R) - S(t \mid w(f))\big]\,c\,dt + \int_{t=\tau_f(w(f))}^{\tau_f(w_R)} S(t \mid w_R)\,c\,dt + (w_R - w(f))\,f \right], \tag{36} \]
where \(w(f)\) is the optimal wage in producing the response rate \(f\). The first two terms are
due to the larger number of searches up to and beyond the unregulated stopping time
\(\tau_f(w(f))\) under optimal compensation and the larger one \(\tau_f(w_R)\) under regulated
compensation. The third term is the reduction in wage outlays under the regulation.
To consider the statistical effects of regulation, consider the univariate mean estimation
problem discussed in previous sections. Figure 5 below illustrates the effect of such
wage-restrictions on the mean-squared error of the mean estimator, holding constant the
sample size of the survey. The figure depicts two graphs, both of which have the produced
23. Obviously, there are additional effects on the variance of the estimator, beyond that on bias, induced
by the incentive compatible re-sampling probability being less than unity.
24. Note that when the f-percentile of the reservation wage distribution is above regulated pay, \(w_f > w_R\),
the maximum wage policy even makes certain response rates infeasible, as may be the case with surveys of
physicians, who require substantial payments to participate in surveys due to their high opportunity costs.
Regula ted Unregula ted
Costs
Cost Cost
Function Function
C(f C(f)
R
/ I / I Response
< I ~~~~~~Ra te
0
Regula tion
Bia s
- -. -- - - - -(
(
f)2
Regula tion!
Va ria nce
Va ria nce
/
~~~~~~nf
FIGURE 5
Sta tistica l regula tion effects
response ra te on the x-a xis. The first gra ph on the positive pa rt of the y-a xis shows the
regula ted a nd unregula ted cost of functions. Costs a re rising with response ra tes but a re
higher for the unregula ted cost function. The figure indica tes a budget size C, which implies
the response ra tes a fforda ble under the two cost functions. Letf
(p)
denote the unregula ted
response ra te when the regula ted response ra te is p, a s illustra ted in the figure. Since
regula tion constra ins wa ges, costs a re a lwa ys higher a nd responses a re a lwa ys lower:
f (p) _P.
Below the x-axis are depicted the components of the mean-squared error, the squared
bias and the variance, of the sample mean estimator \(\bar{Y}\) of the unconditional mean \(E[Y]\),
obtained as before from trading sample members. In the case considered before, when
the conditional mean function is monotonic in the response rate, \(E[Y \mid f] = \gamma_0 + \gamma_1 f\),
the difference between the unregulated and regulated mean-squared errors is given by

\[ MSE - MSE_R = \gamma_1^2 \left[ (1 - f(p))^2 - (1 - p)^2 \right] + \frac{\sigma^2}{n f(p)} - \frac{\sigma^2}{n p}, \qquad (37) \]
where \(\sigma^2\) is the variance. The first term is the difference in the bias and the second term
is the difference in the variance of the mean estimator. Both of these components are
depicted in the figure, which traces out their magnitudes. The regulation effect operates
in a chain-rule fashion: the more response rates drive the outcome (\(\gamma_1\)) and the larger the
gap between the unregulated and regulated response rates (\(f(p) - p\)), the larger is the slope
of the bias component and thus the larger is the regulation bias. The second component of
the mean-squared error is the variance, and the figure shows that the larger the variance
of the outcome in the sample (\(\sigma^2\)) and the smaller the size of the sample (\(n\)), the larger
is the regulation component of the standard error of the estimate.
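The decomposition in equation (37) can be checked with a few lines of arithmetic. The sketch below assumes the linear conditional mean used in the text, so that the bias at response rate \(f\) is \(\gamma_1 (1 - f)\) and the variance is \(\sigma^2 / (n f)\); the function names and the numerical values are illustrative assumptions, not figures from the paper.

```python
def mse(f, n, gamma1, sigma2):
    """Mean-squared error of the sample mean at response rate f."""
    squared_bias = (gamma1 * (1.0 - f)) ** 2  # bias is gamma1 * (1 - f)
    variance = sigma2 / (n * f)               # n * f observations are traded
    return squared_bias + variance


def mse_gap(f_unreg, p_reg, n, gamma1, sigma2):
    """MSE - MSE_R as in equation (37): unregulated minus regulated."""
    return mse(f_unreg, n, gamma1, sigma2) - mse(p_reg, n, gamma1, sigma2)


# Hypothetical values, not taken from the paper.
gap = mse_gap(f_unreg=0.8, p_reg=0.5, n=1000, gamma1=100.0, sigma2=400.0)
print(gap)  # negative here because f(p) > p: regulation raises the MSE
```

With these inputs the bias term dominates the variance term by several orders of magnitude, which is the pattern the lower panel of the figure is described as tracing out.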
It follows that factors that drive up the optimal wage raise the excess cost and thus
increase the statistical harms as well. For example, ceteris paribus, search costs affect the
excess costs positively, so that they are larger for field searches than they are for phone
searches, which are larger than those for mail searches. Furthermore, maximum wages
increase survey costs more when there are more unobserved differences in search durations,
since such unobservable differences raise the optimal wage. Heterogeneity raises the waste
induced by maximum wage policies because it makes search costs rise with the duration
of the survey, rendering an additional benefit to wages that are raised above the maximum
level allowed. Lastly, the statistical regulation effects are more severe the larger the survey,
which implies that cost differences increase with sample size.
The quantitative magnitude of these regulation effects is important for relevant
parameter values. Table 5 illustrates this by displaying the absolute value of the regulation
biases caused when \(w_R = 0\) for different sample sizes, under a given set of reasonable
parameters for the example discussed in Section 2, with the optimal wage described in
equation (11). The hazard rate was assumed to be one-tenth of successful searches,
\(h = 0.10\), and the fraction of free-suppliers was assumed to be one-quarter of the population,
\(G_0 = 0.25\). Using the unit cost of search as the numeraire, the cost of production was
20,000 search cost units.

TABLE 5
Production biases induced by maximum wage regulations

Sample                     Bias
size             1       100       250       500
1,000         75.0   7,500.0  18,750.0  37,500.0
2,000         65.9   6,591.0  16,477.0  32,955.0
3,000         43.9   4,394.0  10,985.0  21,970.0
4,000         33.0   3,295.0   8,237.5  16,475.0
5,000         26.4   2,636.0   6,590.0  13,180.0
6,000         22.0   2,197.0   5,492.5  10,985.0
7,000         18.8   1,883.0   4,707.5   9,415.0
8,000         16.5   1,648.0   4,120.0   8,240.0
9,000         14.7   1,465.0   3,622.5   7,325.0
10,000        13.2   1,318.0   3,295.0   6,590.0
11,000        12.0   1,198.0   2,995.0   5,990.0
12,000        11.0   1,098.0   2,745.0   5,490.0
13,000        10.1   1,014.0   2,535.0   5,070.0
14,000         9.4     941.6   2,354.0   4,708.0
15,000         8.8     878.8   2,197.0   4,394.0
16,000         8.2     823.9   2,059.8   4,119.5
17,000         7.8     775.4   1,938.5   3,877.0
18,000         7.3     732.3   1,830.8   3,611.5
19,000         6.9     693.8   1,734.5   3,469.0
20,000         6.6     659.1   1,647.8   3,295.5
21,000         6.3     627.7   1,569.3   3,138.5
22,000         6.0     599.2   1,498.0   2,996.0
23,000         5.7     573.1   1,432.8   2,865.5
24,000         5.5     549.2   1,373.0   2,746.0
25,000         5.3     527.3   1,318.3   2,636.5
26,000         5.1     507.0   1,267.5   2,535.0
27,000         4.9     488.2   1,220.5   2,441.0
28,000         4.7     470.8   1,177.0   2,354.0
29,000         4.5     454.5   1,136.3   2,272.5
30,000         4.4     439.4   1,098.5   2,197.0
The different columns of the table are for different levels of the non-response biases,
\(\gamma_1 = 1, 100, 250, 500\), possibly representing dollar amounts, similar to those in the earnings
regressions estimated for HRS in Table 4. The first column therefore represents the
difference in response rates \(f(p) - p\) induced by the regulation, since the bias is unitary. For
example, the table shows that the response rate is reduced by anywhere from the enormous
effect of 75% for a sample of size 1,000, to 4% for a sample of size 30,000. The
non-response bias with a $500 effect per percentage response therefore ranges from
75 x $500 = $37,500 for a survey of size 1,000 to $2,197 for a survey of size 30,000. In sum, for
reasonable parameter values, production biases introduced by public wage regulations
may be substantial.
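The arithmetic linking the columns of Table 5 can be reproduced directly: each entry is the response-rate gap in the first column multiplied by the assumed effect per percentage point. The sketch below uses a few rows of the first column as printed; the function name is an illustrative assumption. Because the printed first column is rounded to one decimal, products such as 4.4 x 500 = 2,200 differ slightly from the printed 2,197.0, which presumably was computed from the unrounded gap.

```python
def regulation_bias(gap_points, gamma1):
    """Bias = response-rate gap (percentage points) times the effect per point."""
    return gap_points * gamma1


# Selected rows of the first column of Table 5 (unit effect per point).
first_column = {1000: 75.0, 15000: 8.8, 30000: 4.4}

for size, gap in first_column.items():
    scaled = [regulation_bias(gap, g) for g in (100, 250, 500)]
    print(size, scaled)
```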
5. CONCLUDING REMARKS
This section concludes by discussing only a few of the many aspects of survey production
suggested by the analysis. The general importance of survey production for virtually all
fields of positive economics raises a rich set of issues that may be usefully addressed by
economic, as opposed to statistical, analysis.
5.1. The quality and quantity tradeoff in survey production
The present discussion has abstracted from the tradeoff between the quality of observations
(e.g. the degree of measurement errors supplied) and their quantity, represented by the
sample's size. If compensation is performance-based, it will reward sample members more
for higher quality observations. One distinction between types of missing data is whether
a question on a survey is not supplied or whether a sample member does not participate
in the survey at all, a distinction commonly referred to in the survey literature as item
versus unit non-response. The differences between these types of missing data depend on
the type of performance-based compensation that is used. Compensation per question
corresponds to a piece-rate for sample members, while compensation based on a binary
participation decision reduces the marginal benefit of supplying more questions to zero.
Another dimension of performance-based compensation arises when sample members are
encouraged to reduce their measurement errors through incentives that are tied to
validation of the answers given. Philipson (1995) discusses the case when measurement
errors in health surveys may be reduced by monitoring a small fraction of the sample
through doctor diagnosis, restricting a high level of compensation to sample members that
supply errorless answers. If such performance-based compensation is used, there may be
a limited tradeoff between quantity and quality: only people with low measurement errors
will have an incentive to enter into the survey and supply observations in the first place.
Furthermore, there have been substantial (quality-adjusted) technological changes in the
detection of erroneous observations, through computer assisted personal, or telephone,
interviewing, for example.25 The cost of monitoring sample members in this way may
therefore be falling, making such schemes more attractive.
25. These technologies are referred to as CAPI and CATI in survey practice.
5.2. Imputations, free lunches, and the production of actual versus missing data
One may distinguish between two forms of missing data. The first are missing data
that are producible, meaning data which are possible, although not necessarily optimal,
to produce. For example, data for those sample members not trading their observations
are missing, but they are also producible, primarily through larger wage offers. The
second are missing data that are non-producible, meaning data which are impossible
to produce, due to infinite production costs. Counterfactual data are of this type:
missing data that are relevant to an evaluation of policies that have yet to be implemented.
There are several important differences between these two forms of missing data.
First, in the case of producible data, the methods used to deal with missing data can
be evaluated by actually producing the missing data. For example, imputation methods
for sample members not trading observations can be evaluated by directly observing
their performance relative to a follow-up study that actually obtains the trades. Such
an evaluation is clearly impossible for missing but non-producible data, since the
counterfactual data cannot be produced.26
Second, when data are missing but producible, any decision about optimal data
production must consider the tradeoff between actually producing the missing data at high
cost and generating it some other way, at lower cost. Methods of producing missing data
typically involve "imputations", inserting data for sample members who did not trade
observations. Common imputation methods involve imputing the best prediction of the
missing data (e.g. through regressions on the actual data) or taking random samples from
the data, using single random samples, multiple samples, or so-called hotdeck procedures
(see Madow et al. (1983) and Little and Rubin (1985)). The important point about such
procedures is that they treat produced and imputed data as perfect substitutes in the
analysis of the data. Implicitly, this is a free lunch assumption: aside from data producers,
few other producers have the luxury of being able to go to their customers and sell them
nothing, at the same time employing the argument that it is equivalent to their real product.
Most people who use imputation methods would object to being sold a car that was never
made or a house that was never built! More importantly, if missing and actual data are
perfectly substitutable inputs into the production of the data set, then this should be taken
into account in the demand for inputs: if missing data is cheaper and perfectly substitutable,
surveys should have very low response rates. More precisely, if \(C_A(f)\) and \(C_M(f)\) denote
cost-functions for the production of actual and imputed (but initially missing) data,
respectively, then the total cost of the survey is
\(n[f C_A(f) + (1 - f) C_M(1 - f)]\), and this implies
a low demand for \(f\) when \(C_M\) is substantially below \(C_A\).
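The implication for input demand can be made concrete with a small calculation of the total cost expression above. This is a sketch under loudly stated assumptions: the constant unit costs (10 for actual data, 1 for imputed data), the sample size, and the function names are hypothetical and serve only to show that treating imputed data as a perfect and cheaper substitute pushes the cost-minimising response rate to its lower bound.

```python
def total_cost(f, n, cost_actual, cost_missing):
    """Total survey cost n * [f * C_A(f) + (1 - f) * C_M(1 - f)]."""
    return n * (f * cost_actual(f) + (1.0 - f) * cost_missing(1.0 - f))


def cheapest_response_rate(n, cost_actual, cost_missing, steps=100):
    """Grid search over response rates in [0, 1] for the lowest total cost."""
    grid = [i / steps for i in range(steps + 1)]
    return min(grid, key=lambda f: total_cost(f, n, cost_actual, cost_missing))


# Hypothetical constant unit costs: actual data are ten times as costly as imputed data.
def unit_cost_actual(f):
    return 10.0


def unit_cost_missing(m):
    return 1.0


n = 1000
best_f = cheapest_response_rate(n, unit_cost_actual, unit_cost_missing)
print(best_f, total_cost(best_f, n, unit_cost_actual, unit_cost_missing))
print(total_cost(0.5, n, unit_cost_actual, unit_cost_missing))
```

Any positive response rate observed in practice is then inconsistent with the perfect-substitutes assumption under these costs, which is the sense in which the text argues that production behaviour should reflect imputation assumptions.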
Of course, few survey producers would agree that lower response rates are desirable
on these grounds, but the point is that their behaviour speaks
louder than their words: standard imputation behaviour implicitly reveals such tastes.
Generally, the better a missing data procedure is argued to be, the less actual data should
be produced. If one is serious about the validity of the assumptions that make standard
imputation methods useful, that one can produce something out of nothing, then this
should be reflected not only in the consumption of the data, but also in its production, in
terms of low response rates. If one is not serious about such assumptions, then consumption
practices that are based on them are obviously not advisable.
26. This inevitably leads to common, but misguided, debates among empirical economists on who has the
best missing data (e.g. whether an instrumental variable is good or bad). The problem is that "best" is not
defined in relation to any actual data since such data are, by definition, non-producible.
Acknowledgements. I would like to thank two anonymous referees and the editor of the Review for very
useful points that improved the paper. I also thank John Cawley, Ricardo Cossa, Stuart Hagen, and Tom
Lawless for research assistance and Avner Ahituv, Gary Becker, Michael Boozer, Pierre-Yves Geoffard, James
Heckman, Casey Mulligan, and Derek Neal for comments. I am grateful to seminar participants at the University
of Chicago, Harvard University, and Yale University, as well as The National Opinion Research Center (NORC),
especially Phil DePoy, for their great hospitality including provision of internal data. Financial support from
the Research Fellows Program of the Alfred P. Sloan Foundation is gratefully acknowledged.
REFERENCES
BRADBURN, N. and SUDMAN, S. (1988) Polls and Surveys (San Francisco, CA: Jossey-Bass Publishers).
BERGER, J. (1988) Statistical Decision Theory and Bayesian Analysis (New York: Springer-Verlag).
COCHRANE, W. (1979) Survey Sampling (New York: Wiley & Sons).
GREEN, J. and LAFFONT, J.-J. (1979) Incentives in Public Decision Making, Volume I (Amsterdam: North-Holland).
HANSEN, M., HURWITZ, W. and MADOW, W. (1953) Sample Survey Methods and Theory (New York: Wiley & Sons).
HECKMAN, J. (1976), "The Common Structure of Statistical Models of Truncation, Sample Selection, and Limited Dependent Variables and a Simple Estimator for such Models", Annals of Economic and Social Measurement, 5, 475-492.
GRILICHES, Z. and INTRILIGATOR, M. (Eds.) (1986) Handbook of Econometrics (New York and Heidelberg: North-Holland).
GROVES, R. (1989) Survey Errors and Survey Costs (New York: Wiley & Sons).
KISH, L. (1965) Survey Sampling (New York: John Wiley & Sons).
LANCASTER, T. (1990) The Econometric Analysis of Transition Data (Cambridge and New York: Cambridge University Press).
LAFFONT, J.-J. and MASKIN, E. (1982), "The Theory of Incentives: An Overview", Chapter 2 in W. Hildebrand (ed.), Advances in Economic Theory (Cambridge: Cambridge University Press).
LESSLER, J. and KALSBECK, W. (1992) Non-Sampling Errors in Surveys (New York: Wiley & Sons).
LITTLE, R. and RUBIN, D. (1985) Statistical Analysis with Missing Data (New York: Wiley & Sons).
MADOW, W., OLKIN, I. and RUBIN, D. (Eds.) (1983) Incomplete Data in Sample Surveys, Vols. I-III (New York: Academic Press).
MANSKI, C. (1995) Identification Problems in the Social Sciences (Cambridge and London: Harvard University Press).
PHILIPSON, T. (1994), "The Production of Health Surveys: A Principal Investigator-Agent Approach to Measurement Error Reduction" (mimeo, Department of Economics, University of Chicago).
STEFFEY, D. and BRADBURN, N. (1994) Counting People in the Information Age (Washington, D.C.: National Academy Press).
STIGLER, S. (1987) The History of Statistics (Cambridge and London: Harvard University Press).
SAVAGE, L. (1977) The Foundations of Statistics (New York: Dover).
SUDMAN, S. (1967) Reducing Costs in Surveys (Chicago: National Opinion Research Center).
