Author(s): Tomas Philipson
Source: The Review of Economic Studies, Vol. 64, No. 1 (Jan., 1997), pp. 47-72
Published by: Oxford University Press
Stable URL: http://www.jstor.org/stable/2971740

Review of Economic Studies (1997) 64, 47-72. 0034-6527/97/00030047$02.00
© 1997 The Review of Economic Studies Limited

Data Markets and the Production of Surveys

TOMAS PHILIPSON
University of Chicago

First version received June 1995; final version accepted August 1996 (Eds.)

The production of data, and the functioning of the market for observations, are universal concerns to all fields of positive economics. Economists, however, have typically placed greater emphasis on systematically analyzing the consumption of data than on considering its production. In the production of data through surveys, an important input market is that of labour, in which the demander trades observations with the supplying sample members.
This paper analyses optimal monopsony compensation in such data markets, the important relationship it bears to estimation using the data that are obtained, and the statistical effects of implicit public wage regulations that are present in U.S. markets for observations.

1. INTRODUCTION

Statistics may be interpreted as a normative theory of guessing the features of a population while having access to information on only part of it, and the field obviously has a long history.¹ Absent from this history, however, but highly present in the actual production of statistics on human populations, is the influence that market forces have on the economic exchanges made between suppliers and demanders of observations. These market forces are important because statistics exists due to economic constraints. For if there were no economic constraints, then statistics, in the sense defined above, would not be necessary: data for the whole population would always be produced. The very essence of statistics is therefore economic, but this paper argues that economics has not been fully utilized in statistics. This is particularly apparent in the failure to address the question of the optimal production of data, as opposed to their consumption.

Although economists appreciate well-produced data unlike many other fields of empirical inquiry, they seldom produce their own data. That is, economists are more often consumers than producers of data. This is manifest in the focus of econometrics on the consumption aspects of data, with virtually no attention given to their production.² In particular, despite the fact that sample surveys provide the cornerstone for empirical research in all fields of positive economics, little attention has been paid by economists to their production.
This paper argues that this focus among economists is misguided because production biases in surveys may be larger than those introduced by the analysis of the resulting data. Therefore, a systematic understanding of the market incentives involved in the statistical exchanges underlying the production of data is important in order to lower the magnitudes of these production biases. The main argument of the paper is that a better understanding of data production may be gained by recognizing the labour economic aspects inherent in the market for observations. Consequently, better inferences may be made using the data that are produced. More specifically, we will be concerned with how the compensation offered to suppliers may affect both the quantity and quality of labour demanded. Quantity, as used here, refers to the sample size. Quality may be determined by how representative the supply is of the larger population being investigated, a major concern here, but it may also refer to the degree of error in the supplied observations, and the failure to supply answers to all questions.³

1. For a historical account see, e.g., Stigler (1987).
2. For example, The Handbook of Econometrics (1983) deals exclusively with consumption issues in data analysis.

We focus on the market for human observations, which is the most important factor market in the production of data sets. Sample members in surveys provide the supply side in this labour market and those producing the survey represent the demand side, with the exchange between the two making up analyzable data and the absence of such an exchange yielding missing data in the survey. This labour market for observations has several distinctive features.
The first is that the demander of observations often enjoys monopsony power and therefore wage discrimination is important. We will therefore pay particular attention to optimal compensation schemes for sample members. The second is that asymmetric information is an essential component of survey production: if the demander had complete information about the supply side, the survey itself would be unnecessary. Thus, this information asymmetry limits the ability of the demander to engage in (third-degree) price discrimination, as characteristics of the sample members are unknown at the time of exchange. The third distinctive feature is that the value of the output is, ceteris paribus, increasing in how representative the labour demanded is of the larger population the survey aims to learn about. The important implication of this is that there is low substitutability between sample and non-sample members in the demand for labour. This low substitutability implies, in turn, that substantial resources are devoted to searching for sample members, despite the ready availability of non-sample members at lower wages. The demand is for a randomly selected sample, not the cheapest sample that could supply observations. These search costs are substantial in survey production and are important to keep low in order to retain enough resources to generate the largest number of observational trades with sample members. Such trades, in turn, serve to limit the high standard errors and bias stemming from missing data.
To illustrate our concerns, consider Table 1, which depicts the production costs that resulted from this labour market for a set of surveys conducted by the National Opinion Research Center (NORC) at the University of Chicago.⁴ The columns of the table show, from left to right, the total costs of production; the fraction of the sample reached and interviewed (labelled response rate); the total size of the sample that was attempted to be interviewed; the average cost of production, as measured by the cost per interviewed sample member; the number of weeks it took to produce the survey; and two measures of the growth in the cost per observation throughout the survey: growth from start to finish and average growth per week.

TABLE 1
[Table contents are not recoverable from the source scan; its seven columns are described in the preceding paragraph.]

One question of particular importance here is whether the compensation policy of the monopsonist, NORC in this case, can increase the response rates in column 2 without raising the total costs in column 1. The lowering of production costs is central to statistical analysis because it allows for a more efficient use of resources within a given survey budget, resources that can therefore be used to reduce bias and standard errors arising from the lack of observational trades (i.e. missing data). The issue of freeing up the budget for compensation is therefore fundamental: missing data presumably occurs because an observational exchange is not mutually beneficial.

3. These three quality attributes are referred to as unit non-response, measurement error, and item non-response, respectively.
4. NORC is located on the campus of the University of Chicago, with outstations in New York and Washington, D.C., and is the oldest survey research facility in the U.S.

The table also illustrates the dramatic rise in the cost per observation throughout surveys: they rise about 5.5% per week (column 7), and double on average from start to finish (column 6).
We will argue that these increasing unit costs are due to the relatively more expensive searches occurring at the end of the survey, and that this raises the value of compensation by reducing spending on unproductive searches for sample members that refuse to participate in the survey. This value of increased compensation, we will then argue, is important for the increase in bias and standard errors induced by wage regulations that have been imposed upon survey production in the U.S.

Section 2 of the paper sets forth the basic model and discusses compensation of sample members under costly search, when the outcomes measured by the survey are not correlated with who makes observational trades. The central tradeoff that determines optimal compensation is that increased compensation reduces search costs but raises wage outlays. The section shows how to operationalize optimal wage discrimination after estimating hazard functions describing the probability of exit out of not being found, given a set of consecutive searches. We discuss the impact of increasing search costs throughout the survey, as depicted in Table 1 above, and show how they put upward pressure on optimal compensation.

Section 3 discusses survey production when those trading observations may be different in their outcomes than those who do not trade, which is commonly known as non-response bias. We demonstrate how to easily incorporate estimation of production biases into standard regressions using data from the Health and Retirement Study (HRS). We argue that instead of elasticity-based wage discrimination, a randomly assigned wage may be desirable because it serves as an instrumental variable to estimate bias induced by the absence of observational trades.
We compare such random wage discrimination with intertemporal wage discrimination, in which re-sampling takes place and a large premium is paid to those who do not supply observations prior to the re-sampling. The effectiveness of such a premium is limited by the incentive it creates for initial non-participation. We therefore argue that the re-sampling probability must reduce the value of this incentive to delay response, in order for unbiased re-sampling to be incentive compatible.

Section 4 discusses the implications of the public regulation of surveys, stressing the effects of the implicit maximum wage policies of the Office of Management and Budget (OMB) for publicly financed surveys in the U.S. The important point about such regulations is not so much that they raise production costs but that they introduce production biases in surveys, biases which we show may dominate traditional biases introduced by the way the data are analyzed. In particular, the trade reductions induced by maximum wages lead to regulation-induced increases in both the bias and standard errors of estimators.

Finally, Section 5 concludes by discussing several topics of importance that have been omitted from, but suggested by, the analysis. The importance of data production to virtually all fields of positive economics, as well as the other social sciences, raises a rich and exciting set of issues that may be usefully addressed by economic analysis of data markets.

It is well known that there exists an extensive literature on survey design outside of economics.⁵ Although there is a vast theoretical literature on optimal statistical decision making from the standpoint of a single person,⁶ little focus has been placed on how statistical markets, and the aspects of exchange within them, affect the production of statistics. Economists have a comparative advantage in understanding these markets for observations, particularly in relation to the large amount of literature on incentives in mechanism design (see, e.g., Green and Laffont (1979), Laffont and Maskin (1982)). Because a survey is a mechanism, in the economic sense of the word, that transfers information between the suppliers and demanders of observations, survey design represents a great area for the practical application of economic work on mechanism design.

5. The literature is too extensive to review here in any meaningful manner. Statistical classics include the books by Hansen, Hurwitz, and Madow (1953), Cochrane (1979), and Kish (1986). Representative treatments on the design of survey questionnaires and interviewer practices include, for example, Groves (1989), Bradburn and Sudman (1988), Beimer et al. (1992), Lessler and Kalsbeek (1992).

2. THE PRODUCTION OF DATA

The purpose of a survey is to learn about the outcomes $Y$ of a larger population, from which the survey produces data on a smaller, randomly-selected sample.⁷ Due to the non-substitutability of sample and non-sample members induced by the value of randomly selecting the sample, the search for sample members is extensive, despite the ready availability of non-sample members at lower wages. Thus, the problem facing the producer of the data set is to search for and trade observations with sample members in the most cost-efficient way. For a sample of size $n$, consider a production process in which the demander searches for all unreached sample members until the fraction of the sample interviewed reaches a given level $f$, referred to throughout the paper as the response rate.
If a sample member is reached, he is offered the opportunity to participate in the survey at the wage $w$. Let the reservation wages for participation on the supply side be denoted $z$, and denote by $G(z)$ its cumulative distribution function in the population, with support $[0, \bar z]$ and corresponding density $g(z)$. The value $G(w)$ is therefore the participation rate at wage $w$, the rate at which sample members who have been reached enter into the survey.

Search continues for unreached sample members, at a unit cost denoted $c$, until the desired response rate has been reached. Denote by $S(t \mid w)$ a decreasing survival function representing the fraction of the sample that has not been reached after a duration of search $t$, with corresponding hazard function $h(t \mid w)$ representing the propensity to be reached at $t$ when not already reached. We will refer to the search process as being compensation dependent (independent) when the ability to reach sample members depends (does not depend) positively on the wage $w$. Depending on the search technology, the fact that compensation is offered may or may not affect the ease with which sample members can be found.

6. See, e.g., Savage (1977) and Berger (1988).
7. Survey production entails making explicit a finite listing of the larger population, the survey frame, from which the smaller sample can be drawn.

The population may thus be described by the joint distribution of the random vector $(Y, T, Z)$ representing the outcomes measured ($Y$), the search duration ($T$), and the reservation wage ($Z$). This section considers the special case of data production when the elements of $(Y, T, Z)$ are all independently distributed, while subsequent sections focus on the dependence between them. Let $F_1(t \mid w)$ and $F_0(t \mid w)$ denote the fractions of the sample members that have been reached and participate and those who do not participate, respectively. Given the wage $w$, the sample members' acceptance strategy is simply to accept if they get an offer to supply observations above their reservation wage and to reject otherwise, which yields

$$F_1(t \mid w) = [1 - S(t \mid w)]\,G(w) \quad\text{and}\quad F_0(t \mid w) = [1 - S(t \mid w)]\,[1 - G(w)]. \tag{1}$$

This says that the participants (non-participants) are those who have been reached who had reservation levels below (above) the wage. Since the states of not being reached, participation, and non-participation are mutually exclusive, we have $S(t \mid w) + F_1(t \mid w) + F_0(t \mid w) = 1$ for all durations $t \ge 0$. The rate at which non-participation occurs relative to participation is denoted by $a(w)$ and defined by

$$a(w) \equiv \frac{F_0(t \mid w)}{F_1(t \mid w)} = \frac{1 - G(w)}{G(w)}, \tag{2}$$

which is independent of the duration of search and falls with the wage since more participants are attracted: $da/dt = 0$ and $da/dw < 0$. Let $\tau_f(w)$ be defined as the duration of search for the entire survey when the given response rate $f$ has been produced. It is implicitly defined through the relationship⁸

$$F_1(\tau_f(w) \mid w) = f \;\Longrightarrow\; S(\tau_f(w) \mid w) = 1 - f\,[1 + a(w)]. \tag{3}$$

The implicit function theorem implies that the effect of the wage on this duration of the survey is negative because sample members participate more:

$$\frac{d\tau_f}{dw} = -\,\frac{f\,\dfrac{da}{dw} + \dfrac{dS}{dw}}{\dfrac{dS}{dt}} < 0. \tag{4}$$

The present value of the recruitment costs $R_f(w)$ to produce the response rate of the survey is made up of search costs, as in

$$R_f(w) \equiv n \int_{0}^{\tau_f} S(t \mid w)\, c\, e^{-rt}\, dt, \tag{5}$$

where $r$ is the continuous time discount rate. These are the searches made on members who survived all previous searches, priced out at the present value of the unit cost of search.
The present value of the wage expenditures $L_f(w)$ equals

$$L_f(w) \equiv n \int_{0}^{\tau_f} f_1(t \mid w)\, w\, e^{-rt}\, dt, \tag{6}$$

where $f_1(t \mid w)$ is the derivative of $F_1(t \mid w)$ with respect to $t$ and represents the fraction who get reached and participate at $t$. These are the sample members reached at each duration who agree to participate, priced out at the present value of the wage paid. The total cost of producing a given response rate, $C(f)$, is thereby given by both the recruitment and wage expenditures, as in

$$C(f) \equiv \min\{R_f(w) + L_f(w) \mid w \ge 0\}. \tag{7}$$

8. Since $f \le G(w) \le 1$ for any feasible wage, the right-hand side of the last equality is bounded above by unity and below by zero.

The optimal wage for a given response rate is denoted $w(f)$ and is determined by balancing increased wage costs with reduced search costs, so that the necessary first-order condition for an interior solution, $dR_f/dw + dL_f/dw = 0$, may be stated as

$$\left[\int_0^{\tau_f} \frac{dS}{dw}\, c\, e^{-rt}\, dt + \frac{d\tau_f}{dw}\, S\, c\, e^{-r\tau_f}\right] + \left[\int_0^{\tau_f} \frac{d(f_1 w)}{dw}\, e^{-rt}\, dt + \frac{d\tau_f}{dw}\, f_1\, w\, e^{-r\tau_f}\right] = 0 \tag{8}$$

evaluated at the optimal survey wage and survey duration, $w(f)$ and $\tau_f(w(f))$. The first bracketed term is the marginal wage effect on the present value of search costs. It consists of the reduction in those searched throughout the duration of the survey, as well as the marginal effect of shortening this duration. The second bracketed term is the marginal wage effect on the present value of wage outlays, which are increased throughout the survey by more sample members participating but lowered by the reduction in the duration of the survey. Thus, the essential tradeoff in raising the wage is that it decreases search costs but increases wage outlays, so that at an optimal wage the two balance each other out.
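The cost minimization in equations (5) to (7) can be explored numerically. The Python sketch below is only an illustration under assumptions that are not part of the general model: a constant search hazard `h`, so that $S(t \mid w) = e^{-ht}$, and a reservation wage distribution with a mass `G0` at zero and a uniform remainder on $(0, \bar z]$. All function names and parameter values are hypothetical.

```python
import math

G0, ZBAR = 0.2, 50.0   # assumed: mass supplying at zero wage, top of the support

def participation_rate(w):
    """Assumed reservation-wage CDF G(w): mass G0 at zero, uniform above."""
    return min(1.0, G0 + (1.0 - G0) * max(w, 0.0) / ZBAR)

def survey_costs(w, f, n, c, h, r):
    """Return (tau_f, R_f, L_f) for a constant search hazard h.

    With S(t) = exp(-h t), equation (3) gives S(tau_f) = 1 - f / G(w).
    """
    G = participation_rate(w)
    if f >= G:                       # response rate f cannot be produced at this wage
        return math.inf, math.inf, math.inf
    tau = -math.log(1.0 - f / G) / h
    # integral of exp(-(h + r) t) over [0, tau]
    disc = (1.0 - math.exp(-(h + r) * tau)) / (h + r)
    R = n * c * disc                 # equation (5): discounted search costs
    L = n * w * G * h * disc         # equation (6), using f_1(t|w) = h exp(-h t) G(w)
    return tau, R, L

def optimal_wage(f, n, c, h, r, steps=5000):
    """Grid search for the wage minimizing R_f(w) + L_f(w), as in equation (7)."""
    best_w, best_cost = None, math.inf
    for i in range(steps + 1):
        w = ZBAR * i / steps
        _, R, L = survey_costs(w, f, n, c, h, r)
        if R + L < best_cost:
            best_w, best_cost = w, R + L
    return best_w, best_cost

if __name__ == "__main__":
    w_star, cost = optimal_wage(f=0.4, n=1000, c=10.0, h=0.5, r=0.05)
    print(f"cost-minimizing wage: {w_star:.2f}, total cost: {cost:.0f}")
```

A grid search is used here only to keep the sketch dependency-free; any one-dimensional minimizer would serve. Raising the wage in `survey_costs` shortens the survey duration `tau` while raising outlays per participant, which is the tradeoff described above.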
To make this tradeoff most transparent, consider the case when discounting is negligible, $r = 0$, and the search hazard is time- and compensation-independent, $h(t) = h$, so that the survival function of unreached sample members is $S(t) = e^{-ht}$. In this case, the recruitment costs and wage outlays reduce to

$$R_f(w) = n\,\frac{c}{h}\left[1 - e^{-h\tau_f}\right] \quad\text{and}\quad L_f(w) = nfw. \tag{9}$$

However, using that $F_1(\tau_f \mid w) = [1 - S(\tau_f \mid w)]\,G(w) = f$ implies $1 - e^{-h\tau_f} = f/G(w)$, the total costs become

$$R_f(w) + L_f(w) = nf\left[\frac{c}{h\,G(w)} + w\right]. \tag{10}$$

This says that for each of the $nf$ observations produced, the total cost is the total cost of search, together with the wage paid. The total cost of search is the average duration of search until the sample member is found, $1/h$, times the average length of search until a found sample member is willing to participate, $1/G(w)$, both priced out at the unit cost of search $c$. As wages rise, the first term representing search costs falls, while the second term representing wage outlays rises.

An illustrative case is when there is a non-zero mass of individuals that supply at no wage, $G_0 \equiv G(0) > 0$, with the rest being uniformly distributed over the remainder of its support as in $G(z) = G_0 + [1 - G_0]\,z/\bar z$. It can be shown that in this case the necessary condition for an interior solution implies a closed form solution for the optimal wage $w(f)$, as in

$$w(f) = \sqrt{\frac{c}{h}\,\frac{\bar z}{1 - G_0}} \;-\; \frac{\bar z\, G_0}{1 - G_0}. \tag{11}$$

The wage is independent of the response rate produced because of the time-independence in the hazard rate of search. Due to the cost of wasting search on non-participants, the wage is increasing in the effective cost of reaching a sample member, $c/h$, defined as the unit cost of search times the expected search time until a sample member is reached.
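As a check on the closed form in equation (11), the sketch below compares it with a direct minimization of the cost per observation in equation (10), $c/(h\,G(w)) + w$, under the same uniform-with-mass-at-zero distribution. The parameter values are invented for illustration and the function names are hypothetical.

```python
import math

def unit_cost(w, c, h, g0, zbar):
    """Cost per completed observation in equation (10): c / (h G(w)) + w."""
    G = g0 + (1.0 - g0) * w / zbar
    return c / (h * G) + w

def closed_form_wage(c, h, g0, zbar):
    """Interior solution of equation (11), set to zero when it would be negative."""
    w = math.sqrt((c / h) * zbar / (1.0 - g0)) - zbar * g0 / (1.0 - g0)
    return max(w, 0.0)

# Illustrative values only: unit search cost, hazard, free supply, top of support.
c, h, g0, zbar = 10.0, 0.5, 0.2, 50.0

w_star = closed_form_wage(c, h, g0, zbar)

# Brute-force minimization over a fine wage grid on [0, zbar].
grid = [zbar * i / 10000 for i in range(10001)]
w_grid = min(grid, key=lambda w: unit_cost(w, c, h, g0, zbar))

print(f"closed form: {w_star:.3f}, grid search: {w_grid:.3f}")
```

Because the response rate $f$ only scales equation (10), it drops out of the minimization, which is why neither function takes it as an argument.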
The corner solution of no pay occurs whenever the optimal wage is less than zero, which turns out to be true when $c/h < \bar z\, G_0 / a(0)$, which holds when effective search costs are relatively low or free supply is relatively large.

2.1. Operationalizing wage discrimination

For some surveys, differences among sample members are observable to the demander before the search for the sample starts.⁹ For example, for household surveys, a characteristic that is frequently available is gender, because it can be inferred from name lists of households. In these cases, wage discrimination will naturally reduce labour costs, and the question becomes how to operationalize estimates of the reservation wage and search distributions of sub-populations into relative wages. Consider the case when a response rate $f$ is to be produced for different groups characterized by the covariate $x$. For any two groups denoted by $x = 0$ and $x = 1$, let $(w_0, w_1)$ denote the optimal wages. The necessary first-order condition for an interior optimum under the objective function in equation (10) implies that their relative levels can be shown to satisfy

$$\frac{h(\tau_f(w_1) \mid x = 1)}{h(\tau_f(w_0) \mid x = 0)} = \frac{g(w_1 \mid x = 1)}{g(w_0 \mid x = 0)} \left[\frac{G(w_0 \mid x = 0)}{G(w_1 \mid x = 1)}\right]^{2}. \tag{12}$$

In other words, under optimal compensation the relative hazards equal the relative wage effects on the duration of search, $d(1/G)/dw$, to find a reached sample member that participates. Let the search hazard functions across groups be given by the easily estimated proportional hazard model $h(t \mid x) = q(x)\,h_B(t)$, where $h_B(t)$ is the baseline hazard function which is scaled proportionally by the loadings $q(x)$ for each group.
A very intuitive case turns out to be when the reservation wage distribution $G$ is uniform, in which case estimates of search hazard functions and participation rates can be directly translated into relative wages, as in

$$\frac{w_0}{w_1} = \left[\frac{\hat q(x = 1)\,\hat G(w \mid x = 1)}{\hat q(x = 0)\,\hat G(w \mid x = 0)}\right]^{1/2}. \tag{13}$$

Here, $\hat G(w \mid x)$ is the estimated participation rate for $x$, conditional on being reached at any common wage $w$ in a past survey, and $\hat q(x)$ is the estimated loading in the proportional hazard model. In other words, the relative wage has the intuitive interpretation of being determined by relative participation and hazard rates. Ceteris paribus, the larger are the estimated loadings in a proportional hazard regression or the larger are the estimated participation rates, the lower is the wage.

The optimal degree of such wage discrimination across genders may be estimated using unique data from the 1993 Health and Retirement Study (HRS), which allows for estimates of the search survival functions $S(t \mid x)$. The total sample size in HRS was $n = 14{,}370$ of which $1 - f = 12.2\%$ did not exchange observations. The HRS collected data on the total number of searches for each sample member. The average number of searches for the overall sample was $E[T] = 3.9$ per sample member, including participants, non-participants, and unreached sample members. Since some searches were censored for unreached sample members, this underestimates the mean search duration. The variance was $V[T] = 16.7$, and the range of the support of the distribution of $T$ was $[1, 74]$, where the right tail of the distribution involved remarkably high numbers of searches but was rather thin.

9. This may occur when the listing of the population (the survey frame) contains measures used to generate a stratified sample or under so-called cluster sampling, when sampling takes place across clusters, such as census tracts or blocks.

[Figure 1 is not reproduced: survival curves for males (group 0) and females (group 1) plotted against the total number of calls.]
FIGURE 1
Gender effects on non-response survival curves

When household listings from which samples are drawn contain names, gender can be inferred from the first name for all but a small fraction of the sample with dual-gender names (e.g. Kelly or Francis). To demonstrate the differences in male and female search durations, Figure 1 above depicts the Kaplan-Meier estimates of the male and female survival curves $S(t \mid x = 0)$ and $S(t \mid x = 1)$ for the HRS. The empirical male survival function first-order dominates the female one. The Mantel-Haenszel test statistic for testing the equality of the two survival functions was 9.71, with a p-value of 0.000, suggesting that the two survival functions differed significantly at any standard level of significance. Furthermore, the average number of searches for females was $E[T \mid x = 1] = 3.6$, which was significantly lower than the mean for males, $E[T \mid x = 0] = 4.3$. Finally, a fraction $P(T > \tau_f \mid x = 0) = 12.9\%$ of males were censored, with the corresponding fraction $P(T > \tau_f \mid x = 1) = 11.7\%$ for females also suggesting that search durations were larger for males, because censored observations involved longer durations in the HRS. These descriptive statistics may be summarized more succinctly by the magnitude of the coefficient for a female dummy in a proportional hazard specification, as reported in Table 2 below.
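The relative wage formula in equation (13) is simple enough to apply by hand. Read here as $w_0/w_1 = [\hat q(1)\hat G(w \mid 1) / (\hat q(0)\hat G(w \mid 0))]^{1/2}$, which is the form implied by equation (12) under uniform reservation wages, it can be evaluated at the gender estimates listed in Table 2: a female hazard ratio of 1.166 and cooperation rates of 0.827 for males and 0.842 for females. The function name below is hypothetical and the snippet is a sketch of the arithmetic, not the paper's estimation code.

```python
import math

def relative_wage(hazard_ratio, coop_rate_1, coop_rate_0):
    """w_0 / w_1 from equation (13), assuming uniform reservation wages.

    hazard_ratio : estimated loading q(x=1) / q(x=0) from a proportional hazard model
    coop_rate_1  : estimated participation rate of group 1 once reached
    coop_rate_0  : estimated participation rate of group 0 once reached
    """
    return math.sqrt(hazard_ratio * coop_rate_1 / coop_rate_0)

# Table 2 estimates: group 0 = males, group 1 = females.
ratio = relative_wage(hazard_ratio=1.166, coop_rate_1=0.842, coop_rate_0=0.827)
print(f"male/female wage ratio: {ratio:.2f}")   # prints 1.09
```

The square root dampens both effects: a 16.6% higher female hazard combined with a slightly higher female cooperation rate translates into a male wage premium of roughly half that size.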
The table reports that being female raises the hazard rate into being reached by 16.6%, and that this increase in the female hazard is significant at any standard level. Again, this suggests that men are harder to reach than women not controlling for other factors, since such factors are not available before the data are collected, that is, at the time of contracting between the monopsonist and sample members. Naturally, other things constant, in particular the allocation of home relative to market production, gender may not have an effect on search durations. The participation rates conditional on being reached for men and women are also reported in Table 2 and reveal a significant difference between the genders. Using our formula for optimal relative wages, the estimated male premium in cost-efficient survey production is reported to be 9%. The male premium in household surveys, when the female distribution of search durations is stochastically dominated by that of the male, is a result of the larger benefit of getting a male to supply when he is not only harder to reach, but also harder to get to participate once reached.¹⁰

TABLE 2
Optimal wage discrimination by gender

            Hazard ratio    Standard error    Sample size    95% confidence interval
Female      1.166           0.02              14,243         [1.125, 1.207]

Unconditional effect of female dummy on cooperation rate

            Cooperation rate    Standard error    Sample size
Male        0.827               0.005             6,712
Female      0.842               0.004             7,657

Implied relative wage

Male wage / Female wage:  $w_0 / w_1 = \sqrt{1.166 \times 0.842 / 0.827} = 1.09$

2.2. The wage effects due to duration dependence in search hazards

One important aspect of survey production is differences in the availability of sample members.
An important aspect of these unobserved differences is the increasing costs of search throughout the survey that they imply. In the case of a compensation independent search process, \(dS/dw = 0\), but one with potential duration dependence in the hazard rate, \(dh/dt < 0\), this may be seen from the necessary condition for an interior optimal wage derived above. Substituting the expression for \(d\tau_f/dw\) from equation (4) into the left-hand side of the necessary first-order condition in equation (8), one gets
\[ \frac{c}{h(\tau_f)} \left| \frac{da}{dw} \right| = e^{r\tau_f} \frac{dL_f}{dw}, \]
again evaluated at the optimal level. The left-hand side is the marginal effective cost of search of finding additional sample members, \(c/h(\tau_f)\), times the absolute value of the odds-ratio of the participation rate; it represents the reduction in search costs due to the fact that unproductive search on non-participants is eliminated by a wage increase. If search hazards decrease to zero quickly, therefore, then almost regardless of the size of the search costs, there will be substantial upward pressure on wages to eliminate unproductive search for non-participants.

10. Furthermore, the age of eligibility for the HRS was 51-61 years. Presumably, gender availability should be even more differentiated for lower ages, when relatively more females take care of children at home.

Heterogeneity in the hazard functions across sample members will make search hazards decrease to low levels and thus raise effective search costs. Consider a population in which the search survival function is a mixture of exponentials as in
\[ S(t \mid w) = \int S(t \mid h, w)\, dH(h), \tag{15} \]
where \(H(h)\) is the mixture distribution and the conditional survival \(S(t \mid h, w)\) is exponentially distributed.
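The behaviour of the mixture in equation (15) can be illustrated numerically. The following Python sketch assumes a hypothetical two-type population in which each type has a constant search hazard; the hazard values and weights are illustrative and do not come from the paper. It evaluates the unconditional hazard as the mixture density divided by the mixture survival function.

```python
import math


def mixture_survival(t, hazards, weights):
    """Unconditional survival S(t) = sum_k p_k * exp(-h_k * t) for a discrete H."""
    return sum(p * math.exp(-h * t) for h, p in zip(hazards, weights))


def mixture_hazard(t, hazards, weights):
    """Unconditional hazard: mixture density divided by mixture survival."""
    density = sum(p * h * math.exp(-h * t) for h, p in zip(hazards, weights))
    return density / mixture_survival(t, hazards, weights)


# Hypothetical population: half the members are easy to reach, half are hard.
hazards = [0.40, 0.10]
weights = [0.5, 0.5]

for t in (0, 5, 10, 20):
    # The hazard starts at the mean of the two types and falls as the
    # easy-to-reach members leave the surviving pool.
    print(t, round(mixture_hazard(t, hazards, weights), 4))
```

With a single type (a degenerate mixing distribution) the same function returns a constant hazard, which is the homogeneous benchmark against which the heterogeneous case is compared.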
It is well known that such unobserved differences imply that the unconditional hazard rate is of lower slope than the conditional one, in this case decreasing when the conditional is constant across durations
\[ \frac{dh(t \mid w)}{dt} < 0. \tag{16} \]
This occurs because the sample members who are relatively more available are found early, making the surviving pool relatively less available.¹¹

This is important because the unobserved heterogeneity in hazard rates increases the optimal wage, due to an increased benefit of getting sample members to participate early, when search costs are low. This can be seen by comparing a homogeneous population, with a degenerate mixture distribution \(H_0(h) = 1\{h \ge h_0\}\), to a population with a mixture distribution \(H(h)\), obtained by a mean-preserving spread of \(H_0\), such that the mean and variance satisfy \(E_H[h] = h_0\), but \(V_H[h] > 0\). This implies the hazard relationship
\[ h_H(t) \le h_0 \tag{17} \]
because the two hazards start at the same level, \(h_H(0) = h_0\), but the hazard under a non-degenerate mixture distribution is decreasing, as opposed to being constant, as is the case when the hazards are homogeneous.¹² This implies that the duration of the survey is increased by heterogeneity, \(\tau_{H_0} \le \tau_H\). The first-order condition for the optimal wage above, together with the lower hazard under a more heterogeneous population, directly implies that the optimal wage under heterogeneity is larger than the one under homogeneity
\[ w_H(f) \ge w_{H_0}(f). \tag{18} \]
That is, the unobserved differences increase the wage because the effective search costs increase with the duration of the survey so that the benefit of attracting people earlier in the survey, through larger wages, rises.¹³

2.3. The dynamics of average costs

Although the reduction in search hazards that raises optimal compensation may be hard to observe directly, it will manifest itself in the intertemporal pattern of production costs. Define the duration specific average costs \(A_t\) to be the total expenditures on both wages and search spread over all participants, as in
\[ A_t = \frac{f(t \mid w)\, w + S(t \mid w)\, c}{f(t \mid w)} = w + \frac{c}{h(t \mid w)\, G(w)}. \tag{19} \]
The first wage term is due to the fact that only individuals who participate are paid. The second search term follows from the fact that, among those survived to be searched, only a fraction \(h(t \mid w) G(w)\) trade observations. This implies that the average costs rise over the duration of the survey, \(dA_t/dt > 0\), since the search hazard falls due to the lower availability of sample members later in the survey. Only when there is no duration dependence in search hazards do the average costs not grow over time.

11. See, e.g. Lancaster (1990).
12. For a general discussion see, e.g. Lancaster (1990).
13. An important alternative policy not considered here is intertemporal wage discrimination, paying individuals who are reached later less, which would also involve increased compensation early, relative to the case of no duration dependence in search hazards.

The existence of duration dependency in search hazards and the increased compensation they imply can be evaluated using direct measures on the costs of production. We consider the average costs \(A_t\) that were discussed above, which come from several surveys conducted by the National Opinion Research Center (NORC).
We obtained access to unique data in terms of NORC's internal cost accounting forms, which were coded and served as the basis for the survey production data in this section. Over the past few years, NORC has implemented a cost-monitoring programme which records the cumulative production costs throughout the duration of their surveys. Many recent surveys have been monitored in such a manner, including some that are frequently used by economists, such as the National Longitudinal Survey of Youth (NLS) and the General Social Survey (GSS).

To illustrate the pattern in these data, Figure 2 depicts average costs across the duration of the survey for cost-monitored NLS surveys.¹⁴ It plots these average costs, approximated by the cost per observation, as a function of calendar time, measured in weeks. The cost per observation in a given period of a survey was constructed by dividing the flow of cumulative costs in that week by the flow of observations obtained in the same period.

The overall pattern suggested by these figures is that average costs increase dramatically over time. Some of the figures even display increasing average costs at increasing rates. Furthermore, these patterns may be understating the increases in average costs due to increases in search costs if there is learning-by-doing, which would reduce the cost per observation as the number of observations collected increased.¹⁵

Table 3 documents these patterns more systematically through regressions investigating the effects that the response rate at a given period had on average costs in the subsequent period. The unit of observation of the table is a survey-week, and the table reports the coefficient estimates for the fixed-effects specification of the type
\[ A_{it} = \lambda_0 + \lambda_1 F_i(t-1) + \lambda_2 Q + \Lambda_i + \varepsilon_{it}, \tag{20} \]
where the dependent variable \(A_{it}\) is the cost per observation for survey i in week t.
The independent variables are \(F_i(t-1)\), RespRate in the table, which is the response rate produced in survey i at the start of the survey week t; Q, which is a set of controls including type, year, and size of survey; and \(\Lambda_i\), an unobservable fixed effect. Our main interest is in the independent effect of the response rate. Naturally, the response rate is correlated with duration as measured by the variable Week, so that when that variable is included, both the size and significance of the response rate effect are lowered. However, the response rate effect is highly significant, regardless. The table may be interpreted to display strong positive effects of the produced response rate on the subsequent cost per observation.¹⁶

14. All costs discussed in this section are reported in 1994 dollars. All of the surveys in the sample, those listed in Table 1 in the introduction, were field surveys except Baccalaureate and Beyond, which was done primarily by telephone, fielding only that part of the sample for which the phone searches were unsuccessful. The figures are initiated at the second period of the survey because the initial period of many surveys contained fixed costs, such as development of the questionnaire.
15. This effect should be smaller for more experienced interviewers. With inexperienced interviewers, who are quite common due to high turnover among interviewers, learning-by-doing is likely to manifest itself in reductions in non-participation over the duration of the survey (i.e., \(da/dt < 0\)) or, alternatively, a reduction in interview time resulting from increased familiarity with the questions of the survey.
16. The sample in these regressions was limited due to the few surveys that have been cost-monitored at NORC, which, in turn, excluded a more elaborate set of specifications without loss of power to detect effects.
[Figure 2. Panels plotting costs per observation over the duration of cost-monitored surveys.]

TABLE 3
Fixed effects regressions of increasing average costs

             Model 1     Model 2     Model 3     Model 4     Model 5
RespRate     0.0062      0.0062      0.0049      0.0061      0.0062
             (9.811)     (9.808)     (4.895)     (9.649)     (9.741)
SampSize                 0.000007    0.000007    0.00004     0.00005
                         (0.233)     (0.207)     (1.358)     (1.003)
Week                                 0.0044
                                     (1.717)
GSS                                              1.3483      1.4663
                                                 (3.475)     (2.592)
NLS                                              0.4066      0.521
                                                 (1.125)     (0.899)
1989                                                         -0.0621
                                                             (-0.072)
1990                                                         0.0108
                                                             (0.017)
1991                                                         -0.1699
                                                             (-0.257)
1992                                                         -0.0111
                                                             (-0.013)
1993                                                         0.2551
                                                             (0.566)
Constant     4.1931      4.1287      4.1516      3.325       3.176
             (22.124)    (12.2)      (11.835)    (9.868)     (5.793)

Notes: (1) Number of observations: 313. (2) t-statistics are reported in parentheses.
Data Source: National Opinion Research Center (NORC).

Of course, the response rate effects may be correlated with other aspects of survey production that make average costs rise throughout the survey's duration. Figure 3 therefore investigates the role of effective search costs by direct evidence. It illustrates the increasing nature of effective search costs in the HRS by plotting the baseline hazard estimate of the proportional hazard regression from the previous section. The underlying hazard estimate of the figure involved searches being successful about 25% of the time, after which they declined drastically with the number of searches made. The sharp decline in the baseline hazard implies high effective search costs.
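The effective cost of search used here is the unit cost of a search divided by the hazard of reaching a sample member, c/h. A minimal Python sketch of that arithmetic follows; the unit cost of $10 and the two lower hazard values are illustrative assumptions, while 0.25 corresponds to the early success rate of about 25% cited above.

```python
def effective_search_cost(unit_cost, hazard):
    """Expected search outlay per sample member reached: c / h."""
    if not 0.0 < hazard <= 1.0:
        raise ValueError("hazard must lie in (0, 1]")
    return unit_cost / hazard


unit_cost = 10.0  # illustrative unit cost of one search, in dollars

# 0.25 is the early success rate cited in the text; 0.10 and 0.05 are
# hypothetical later-stage hazards chosen to show how the cost escalates.
for hazard in (0.25, 0.10, 0.05):
    print(hazard, effective_search_cost(unit_cost, hazard))
```

Because the cost is inversely proportional to the hazard, a hazard that falls by a given factor raises the effective cost of a successful search by the same factor.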
If the unit cost of search is c = $10, for example, then after 35 searches the effective cost of a single search is c/h = $230. The difference in the hazard forms between genders was not significant, and thus the unconditional hazard will be similarly shaped with a sharp negative slope. Together with the increasing average costs of the surveys, the rising pattern in effective search costs suggests that there is substantial upward pressure on compensation due to sharply falling search hazards.

[Figure 3. Baseline hazard function estimate. Horizontal axis: total number of calls.]

3. PRODUCTION WHEN ABSENCE OF TRADE INDUCES BIAS

The costs of production must be separated from the value of the produced output, which is affected by whether missing data (due to not completing all observational trades) affect the inferences made using the data produced. Consider the random vector \((Y, T, Z, U)\) for the population represented, as before, by the outcomes measured, the search duration, and the reservation wage, but now also including U, representing an unobservable factor that affects the outcome. In this section, we are interested in two main production biases which are the result of dependence between U and T and between U and Z, respectively. Under a mean-square error objective, the case of no dependence considered in the previous section would imply that all that mattered for the output was the produced response rate, which, when combined with the sample size, yields a given level of precision in estimation.
However, bias occurs whenever those individuals for whom observations are exchanged differ in their measured outcomes from those who do not engage in the observational exchange.¹⁷ This section argues that such bias, so-called non-response bias, may be addressed through wage discrimination in the compensation of sample members.¹⁸

Table 4 below illustrates in a standard regression framework the type of specifications that can be used to provide evidence on the existence and magnitude of such production biases, using different wages and search data in the unique data set provided by HRS.¹⁹ The table reports the effects of searches and compensation in an earnings regression estimating annual earned income for individuals in the HRS, who are all aged 51-61. The survey involved two separate wage levels for sample members, as indicated by the dummy variable Survey Wage. The survey also recorded the number of searches made before each participating sample member was reached, as measured by the variable Searches. Both the main effects of these production parameters and their interactions with age, education, and self-reported health are reported.²⁰ The table excludes a large set of alternative controls, described in the footnote of the table, that were included in the specifications.

17. There is substantial systematic empirical evidence that, when validated, differences exist between those trading and not trading observations. See, e.g. the reviews in Madow et al. (1983).
18. Selection bias, occurring when those who supply labour differ from those who do not, has of course been studied widely by economists in labour markets other than data markets; see, e.g. Heckman (1976). One major difference exploited here is that the researcher coincides with the employer and hence can control wages.
19. We discuss the parameters of this re-sampling scheme in greater detail in the next section.
20. All variables in the regression were normalized to deviations from their means so that the main effects in the model with interactions represent the effects at average levels.

TABLE 4
Production biases in an earnings regression

Variable                             Estimates across Models 1-3 (t-statistics)
Searches                             664.7 (6.2)      542.7 (4.8)
Survey Wage                          3317.3 (3.4)     1347.8 (1.2)
Interaction terms:
Searches * Good Health               491.5 (1.6)      584.0 (1.9)
Searches * Years of Education        261.0 (6.8)      231.0 (5.9)
Searches * Age                       103.1 (3.2)      56.4 (1.7)
Survey Wage * Good Health            6086.7 (1.7)     6256.1 (1.8)
Survey Wage * Years of Education     1958.9 (5.0)     1853.6 (4.7)
Survey Wage * Age                    970.6 (3.3)      843.7 (2.6)

Notes: (1) t-statistics are reported in parentheses. (2) Control variables: Wealth, Unearned Income, Education, Marital Status, Sex, Age, Region, Race, and Health Status.
Data Source: Health and Retirement Survey (HRS), Wave 1.

The table reports large and significant effects of survey search and compensation on estimated earnings: the main effects in the first specification indicate that a single search raises earnings by about $665, and that the more highly compensated group had earnings in excess of $3,317 more than those who were not compensated. This may stem from the fact that labour supply towards other work and survey work are most likely gross substitutes, in the sense that as the wage for other work rises, the survey supply falls. Indeed, the joint labour supply decision involved in survey and alternative work seems important to understand when producing observations on any labour market. The interactions are significant as well.
They indicate the degree to which searches or compensation affect the returns to education, the age-earnings profile, or the impact of health on earnings. For example, the compensated group has a return to education that is $1,850-$1,950 larger than the uncompensated group.

If such potential production biases are present in actual surveys of interest to economists, the question becomes whether they can be overcome through alternative production methods. One way would be through full response, \(f = 1\), produced by full search and maximum compensation. Since the entire sample would engage in observational trades under such a policy, the compensation would induce unbiased estimation of outcome parameters. However, the cost of such a policy may be substantial since it is obtained by letting search go on until all sample members have been found, \(\tau_1\), and compensation is set at \(w = z\). Denote the resulting maximum compensation costs \(C_M\), as given by
\[ C_M = R_1(z) + L_1(z) = n\left[ \int_{t=0}^{\tau_1} S(t \mid z)\, c\, e^{-rt}\, dt + z \int_{t=0}^{\tau_1} f(t \mid z)\, e^{-rt}\, dt \right]. \tag{21} \]
For the case of no discounting, with which we will be exclusively concerned in this section, this reduces to the simple expression²¹
\[ C_M = n\big[ E[T \mid z]\, c + z \big]. \tag{22} \]
For each sample member, this is simply the expected number of searches priced out at the unit cost of search, together with the wage paid after the sample member has been reached.

21. This follows from the fact that for any survival function, its integral equals the expected survival time \(\int S(t)\, dt = E[T]\).

3.1. Random wage discrimination

One method of reducing compensation-based production bias in a cheaper manner is through (third-degree) wage discrimination.
However, instead of being tailored by a monopsonist in the standard manner, to differences in supply elasticities, wage discrimination should be random in order to generate information about the compensation-based bias. In essence, random wage discrimination serves as a high-quality instrumental variable for non-response bias, a variable that is correlated with the supply of observations but uncorrelated with other unobserved determinants of investigated outcomes. Consider the simple case of estimating the unconditional mean of a univariate outcome variable Y in a canonical linear model of the form considered empirically in Table 4 above,
\[ Y = \beta_0 + \beta_R D_R + \beta_L D_L + U, \tag{23} \]
where \(D_R\) is a dummy variable indicating whether the sample member has been reached and \(D_L\) is a dummy variable indicating a trade of an observation taking place. The conditional mean of the outcome variable as a function of the wage is then
\[ E[Y \mid w] = \beta_0 + \beta_R P(D_R = 1 \mid w) + \beta_L P(D_L = 1 \mid w) + E[U \mid w], \tag{24} \]
where \(P(D_R = 1 \mid w) = 1 - S(t \mid w)\) is the probability of being reached given the wage, and \(P(D_L = 1 \mid w) = G(w)\) is the participation rate given the wage. Now consider the case of a random wage-discrimination policy in which two wage levels, a high one and a low one denoted \(\bar{w}\) and \(\underline{w}\) respectively, are randomized out across the sample. If the wage discrimination scheme is to be unbiased for all populations \(\beta\), the scheme must naturally involve full search on each wage. Due to randomization of the wages across the sample, the distribution of \((Y, Z, T, U)\) is the same across the two treatment groups receiving the high and the low wage when both groups are being fully searched. In particular, the unobservables are independent of the assigned wage, so that
\[ E[U \mid \underline{w}] = E[U \mid \bar{w}] = E[U]. \tag{25} \]
This implies that the means of the outcome conditional on the two wages and full search are
\[ E[Y \mid \underline{w}] = \beta_0 + \beta_L G(\underline{w}) + E[U], \tag{26} \]
\[ E[Y \mid \bar{w}] = \beta_0 + \beta_L G(\bar{w}) + E[U], \tag{27} \]
which implies the effect of the reservation wage on the outcome according to
\[ \beta_L = \frac{E[Y \mid \bar{w}] - E[Y \mid \underline{w}]}{G(\bar{w}) - G(\underline{w})}. \tag{28} \]
Substituting in sample analogues to the population values of the outcome mean and participation rates for the two wage groups yields nothing more than the IV-estimator for the effect of compensation on the outcome. The non-response bias that random wage discrimination allows us to estimate and eliminate is therefore the one introduced by compensation bias.

The key tradeoff in the cost of eliminating bias this way, relative to that of full compensation, is determined by how elastic search durations and participation rates are with respect to the wage. Consider the undiscounted costs under the wage discrimination scheme \(C_D\), as in
\[ C_D = \bar{n}\big[ c\, E[T \mid \bar{w}] + G(\bar{w})\, \bar{w} \big] + \underline{n}\big[ c\, E[T \mid \underline{w}] + G(\underline{w})\, \underline{w} \big], \tag{29} \]
where \((\bar{n}, \underline{n})\) are the sample sizes of the two wage groups. Since the maximum compensation costs \(C_M\) constitute the special case when \(\bar{w} = \underline{w} = z\), the difference in costs may be written
\[ C_D - C_M = \bar{n}\big[ c\,(E[T \mid \bar{w}] - E[T \mid z]) + (G(\bar{w})\, \bar{w} - z) \big] + \underline{n}\big[ c\,(E[T \mid \underline{w}] - E[T \mid z]) + (G(\underline{w})\, \underline{w} - z) \big]. \tag{30} \]
The two terms are for the two wage groups. Within each group, the cost differential between the two wage policies is made up of the increase in search expenditures under the discriminating policy relative to its lower wage outlays. The key tradeoff, as in the previous section, is between search and wage expenditures. The more search durations are lowered by compensation, the more search expenditures increase when attempting to wage discriminate, relative to the full-cost solution.
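The ratio in equation (28) can be computed directly from the two wage groups' mean outcomes and participation rates. The Python sketch below is an illustration under stated assumptions: the function name and all numerical values are hypothetical, and the group means are generated from equations (26) and (27) so that the ratio can be compared with the coefficient used to build them.

```python
def wald_estimate(mean_y_high, mean_y_low, rate_high, rate_low):
    """Difference in mean outcomes divided by the difference in participation rates."""
    if rate_high == rate_low:
        raise ValueError("the two wage groups must differ in participation rates")
    return (mean_y_high - mean_y_low) / (rate_high - rate_low)


# Hypothetical population values, chosen only for illustration.
beta_0, beta_L, mean_u = 30_000.0, 2_000.0, 0.0
g_low, g_high = 0.70, 0.90  # participation rates G(w) at the low and high wage

mean_y_low = beta_0 + beta_L * g_low + mean_u    # equation (26)
mean_y_high = beta_0 + beta_L * g_high + mean_u  # equation (27)

# The ratio recovers beta_L up to floating-point rounding.
print(wald_estimate(mean_y_high, mean_y_low, g_high, g_low))
```

In an actual survey the four inputs would be sample means and observed participation rates from the randomized high-wage and low-wage groups, so the estimate would carry sampling error that this sketch ignores.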
The full compensation survey finds sample members much faster than any other one, the survey employing wage discrimination, in particular. Therefore, in order for the wage discrimination to reduce costs, increases in search costs must be offset by reductions in wage expenditures. In particular, dropping the wage to zero for the low-wage group will naturally lower wage outlays but also increase search costs.²²

3.2. Intertemporal wage discrimination

Instead of discriminating across sample members at one point in time, the monopsonist may discriminate among them across time. Consider a full search, two-period wage policy in which the sample is offered a given wage in the first period, after which, in the second period, those who do not trade at this initial wage are re-sampled at a substantially higher wage. The purpose of such a scheme would be to buy many observations at the low wage and learn about those not trading through the second higher wage. Such a compensation scheme is characterized by the two wages \((w_1, w_2)\) and the fraction \(\pi\) of those not initially trading, who are then resampled in the follow-up phase. Consider the estimator \(\bar{Y}\) of the unconditional mean \(E[Y]\) under such an intertemporal discrimination scheme, as in
\[ \bar{Y} = \frac{n_1}{n}\, \bar{Y}_1 + \frac{n - n_1}{n}\, \bar{Y}_2, \tag{31} \]
where \(\bar{Y}_1\) is the mean of the \(n_1\) sample members who supply at the first wage, while \(\bar{Y}_2\) is the mean of the group of size \(n_2\) sample members who supply at the second wage. If there is full participation in the follow-up, then \(n_2 = \pi[n - n_1]\). The bias of this estimator is
\[ E[\bar{Y} \mid w_1, w_2] - E[Y] = G(w_1)\, E[Y \mid Z \le w_1] + [1 - G(w_1)]\, E[Y \mid w_1 < Z \le w_2] - E[Y]. \tag{32} \]
This implies that a sufficient condition for the wage policy to yield an unbiased estimator of \(E[Y]\) for all distributions of \((Y, Z)\) is full pay in either the first or second period: \(w_1 = z\) or \(w_2 = z\). The important advantage of this intertemporal discrimination over random wage discrimination is that the estimator is unbiased, regardless of the linearity of the conditional mean function of the outcome. Therefore, whenever non-linearities are important and creating instrumental variables through wage discrimination is therefore problematic, intertemporal discrimination may be more advantageous.

22. Our main concern here is compensation. Naturally, the analogue arguments could be made through randomizing out differential amounts of search, possibly having four feasible treatment groups in which high/low compensation is combined with high/low search durations, where both wages and search durations get randomized out.

The first type of unbiased wage policy with full initial compensation (\(w_1 = z\)) is simply the previously-discussed maximum wage policy occurring as a special case of intertemporal discrimination. The second policy is the focus here and involves a bias reduction bonus in the second period. Indeed, the second period wage must be larger to get any supply in the second period. For under decreasing wages, anyone who decides to participate does so in the first period. However, an increasing wage schedule will create an incentive to supply later rather than earlier, so as to capture a higher wage. This is particularly true if, as is common, the duration between the periods is too short to make differential discounting relevant to intertemporal price discrimination.
Therefore, for the intertemporal re-sampling scheme to be incentive compatible, the inclusion probability \(\pi\) of the second period must be low enough to eliminate the incentive to avoid supplying in the first period. The probability of being included in the second round needs to be low enough to make the sample members act myopically whenever contacted by the survey in the first period. Under risk-neutrality on the part of sample members, this implies that incentive compatible re-sampling must have the wage growth limited by the size of the inclusion probability as in \(w_1 \ge \pi w_2\).

An incentive compatibility constraint such as this may be more important in long-term contracts (panel surveys) than in spot markets (cross-sectional surveys). Few individuals in a population will ever be trading in more than a few cross-sections. On the other hand, under the repeated measurements in panel surveys, incentive compatibility is an issue. This is so because learning may take place more readily in panels than in cross-sections, as the demander may be faced with the same type of incentives every period in the panel. However, in a cross-sectional spot market there is no room for learning-by-doing by suppliers. In sum, the incentive compatibility constraint makes the costs per period in panel surveys higher, relative to repeated cross-sections.

For the price discrimination considered in the Health and Retirement Study, which is a panel survey, the sequential nature of the compensation offered in their re-sampling scheme is shown in Figure 4 below. This survey involved a ten-fold increase in compensation. In particular, \(w_1\) = $10 or $30 was offered to single individuals and couples, respectively, for their initial participation. An initial decision to not participate followed by subsequent participation, on the other hand, was rewarded with a payment of \(w_2\) = $100 or $300.
The follow-up re-sampling rate was \(\pi = 68.5\%\) of those who did not participate initially. The produced response rate was \(f_1 = n_1/n = 77.7\%\) in the first period and \(f_2 = n_2/[\pi(n - n_1)] = 25.9\%\) in the second period. The sample members were not aware of this design, as the higher compensation for those not participating initially was decided on after the first period. However, if such a design were understood by sample members, which it presumably would be if it were widely used in cross-sectional surveys or used repeatedly in a panel survey, it would not be incentive compatible. The incentive to wait, and possibly collect the large compensation, would be much too large under standard risk preferences of sample members. Under risk neutrality, the certainty equivalent of not participating is \(\pi w_2\) = $68.50 and $205.50 for singles and couples, respectively, as compared to participating, which is \(w_1\) = $10 for singles and \(w_1\) = $30 for couples. Furthermore, the second period under-compensated in terms of unbiased estimation, since participation was far from full, \(f_2 \ll 1\). In the HRS, on the other hand, the large re-sampling expenditures were spent on a large fraction \(\pi\), which for similar designs in the future should be reduced to increase the follow-up participation rate above 25.9% towards unity.

[Figure 4. Incentive incompatible and biased design of the Health and Retirement Survey (HRS); initial sample n = 15,444.]

The expected cost of an unbiased and incentive compatible re-sampling scheme, such as that of HRS, is determined by
\[ C_I = n_1\big[ c\, E[T \mid w_1] + w_1 \big] + [n - n_1]\, \pi\, \big[ c\, E[T \mid z] + z \big], \qquad w_1 \ge \pi z. \tag{33} \]
The first term is the search expenditures and wage outlays in the first period, similar to that discussed in earlier sections.
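The incentive-compatibility condition \(w_1 \ge \pi w_2\) and the HRS figures quoted above can be checked with a few lines of Python. This is a sketch under the risk-neutrality assumption stated in the text; the function names are hypothetical, and the wages and re-sampling rate are the ones reported for the HRS.

```python
def certainty_equivalent_of_waiting(resampling_rate, second_wage):
    """Expected payment from refusing initially, under risk neutrality: pi * w2."""
    return resampling_rate * second_wage


def is_incentive_compatible(first_wage, second_wage, resampling_rate):
    """True when participating now pays at least as much as waiting: w1 >= pi * w2."""
    return first_wage >= certainty_equivalent_of_waiting(resampling_rate, second_wage)


def max_compatible_resampling_rate(first_wage, second_wage):
    """Largest pi that satisfies w1 >= pi * w2, capped at one."""
    return min(1.0, first_wage / second_wage)


hrs_resampling_rate = 0.685  # follow-up re-sampling rate reported for the HRS

for label, w1, w2 in (("singles", 10.0, 100.0), ("couples", 30.0, 300.0)):
    print(
        label,
        certainty_equivalent_of_waiting(hrs_resampling_rate, w2),
        is_incentive_compatible(w1, w2, hrs_resampling_rate),
        max_compatible_resampling_rate(w1, w2),
    )
```

Under these assumptions the ten-fold wage increase would require a re-sampling rate of at most one in ten for either group, far below the rate actually used.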
The second term is the cost in the second period, made up of search and wage expenditures at the maximum wage. Due to the incentive compatibility constraint, the first wage is restricted to be above that wage which provides an incentive to delay the supply until the second period. The difference between a maximum wage policy in which the whole sample receives the maximum pay, and the intertemporal discrimination cost, in which only the second period sample members get maximum pay, is thereby
\[ C_I - C_M = n_1\big[ c\,(E[T \mid w_1] - E[T \mid z]) + (w_1 - z) \big] - (n - n_1)(1 - \pi)\big[ c\, E[T \mid z] + z \big]. \tag{34} \]
This difference is again determined by the tradeoffs in search and wage expenditures. More search expenditures are incurred for the low wage group under intertemporal discrimination than for the same group under the maximum wage: the first term in the equation above is positive. These costs must be offset by the lower pay for first-period sample members, the second term, as well as the elimination of pay for those who do not trade in either the first or second period, the third term.²³

4. REGULATION BIAS IN SURVEY PRODUCTION

The previous sections discussed aspects of survey production which put upward pressure on wages and made variation in their levels across sample members desirable: regular elasticity-based wage discrimination, reductions in search costs, compensation-induced instrumental variables, and incentive compatible bias-reduction bonuses. This has important implications for public wage regulations in surveys, which are extensive in the U.S.
The important aspect of such regulations is not so much the increased production costs, but their implicit statistical implications, particularly the production biases introduced through regulation.

Most wage regulations of survey research take the form of maximum wage policies, limiting the size of payments to sample members, as well as prohibiting wage discrimination altogether. For example, the Office of Management and Budget (OMB) restricts the wages paid to respondents in these ways for publicly financed surveys in the United States. Such wage policies are typically justified, surprisingly often by economists involved in survey production, by the argument that survey costs are already too large to be able to afford the luxury of compensating sample members. The regulated cost function \(C_R(f)\) is defined by making compensation homogeneous and restricted below the level \(w_R\), as in

\[ C_R(f) = \min\{\, R_f(w) + L_f(w) : 0 \le w \le w_R \,\}. \tag{35} \]

Such wage restrictions naturally reduce wage outlays, but with the result that expenditures on search are larger.24 In the absence of discounting, this excess cost is given by

\[ C_R(f) - C(f) = n\left[ \int_0^{\tau_f(w(f))} \bigl[S(t \mid w_R) - S(t \mid w(f))\bigr]\,c\,dt + \int_{\tau_f(w(f))}^{\tau_f(w_R)} S(t \mid w_R)\,c\,dt + (w_R - w(f))\,f \right], \tag{36} \]

where \(w(f)\) is the optimal wage in producing the response rate \(f\). The first two terms are due to the larger number of searches up to and beyond the unregulated stopping time \(\tau_f(w(f))\) under optimal compensation and the larger one \(\tau_f(w_R)\) under regulated compensation. The third term is the reduction in wage outlays under the regulation.

To consider the statistical effects of regulation, consider the univariate mean estimation problem discussed in previous sections. Figure 5 below illustrates the effect of such wage restrictions on the mean-squared error of the mean estimator, holding constant the sample size of the survey.
The figure depicts two graphs, both of which have the produced response rate on the x-axis. The first graph, on the positive part of the y-axis, shows the regulated and unregulated cost functions. Costs are rising with response rates but are higher for the regulated cost function. The figure indicates a budget size C, which implies the response rates affordable under the two cost functions. Let \(f(p)\) denote the unregulated response rate when the regulated response rate is \(p\), as illustrated in the figure. Since regulation constrains wages, costs are always higher and responses are always lower: \(f(p) \ge p\).

[Figure 5. Statistical regulation effects. Upper panel: regulated and unregulated cost functions, \(C_R(f)\) and \(C(f)\), plotted against the response rate. Lower panel: regulation bias and regulation variance.]

23. Obviously, there are additional effects on the variance of the estimator, beyond that on bias, induced by the incentive compatible re-sampling probability being less than unity.
24. Note that when the f-percentile of the reservation wage distribution is above regulated pay, \(w_f > w_R\), the maximum wage policy even makes certain response rates infeasible, as may be the case with surveys of physicians, who require substantial payments to participate in surveys due to their high opportunity costs.

Below the x-axis are depicted the components of the mean-squared error, the squared bias and the variance, of the sample mean estimator \(\bar{Y}\) of the unconditional mean \(E[Y]\), obtained as before from trading sample members.
In the case considered before, when the conditional mean function is monotonic in the response rate, \(E[Y \mid f] = \gamma_0 + \gamma_1 f\), the difference between the regulated and unregulated mean-squared errors is given by

\[ MSE - MSE_R = \gamma_1^2\bigl[(1 - f(p))^2 - (1 - p)^2\bigr] + \frac{\sigma^2}{n f(p)} - \frac{\sigma^2}{n p}, \tag{37} \]

where \(\sigma^2\) is the variance. The first term is the difference in the bias and the second term is the difference in the variance of the mean estimator. Both of these components are depicted in the figure, which traces out their magnitudes. The regulation effect operates in a chain-rule fashion: the more response rates drive the outcome (\(\gamma_1\)) and the larger the gap between the unregulated and regulated response rates (\(f(p) - p\)), the larger is the slope of the bias component and thus the larger is the regulation bias. The second component of the mean-squared error is the variance, and the figure shows that the larger the variance of the outcome in the sample (\(\sigma^2\)) and the smaller the size of the sample (\(n\)), the larger is the regulation component of the standard error of the estimate.

It follows that factors that drive up the optimal wage raise the excess cost and thus increase the statistical harms, as well. For example, ceteris paribus, search costs affect the excess costs positively, so that they are larger for field searches than they are for phone searches, which are larger than those for mail searches. Furthermore, maximum wages increase survey costs more when there are more unobserved differences in search durations, since such unobservable differences raise the optimal wage.
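As a rough numerical companion to equation (37), the sketch below evaluates the mean-squared error of the respondent mean under the linear conditional mean used there, with squared bias \(\gamma_1^2(1 - f)^2\) and variance \(\sigma^2/(nf)\). The function names and the parameter values in the usage lines are illustrative assumptions, not figures from the paper.

```python
def mse(f, gamma1, sigma2, n):
    """Mean-squared error of the respondent mean at response rate f.

    Assumes E[Y | f] = gamma0 + gamma1 * f, so the bias relative to full
    response (f = 1) is gamma1 * (f - 1), and n * f observations are traded.
    """
    if not 0.0 < f <= 1.0:
        raise ValueError("response rate f must lie in (0, 1]")
    squared_bias = (gamma1 * (1.0 - f)) ** 2
    variance = sigma2 / (n * f)
    return squared_bias + variance


def mse_gap(f_unregulated, p_regulated, gamma1, sigma2, n):
    """MSE - MSE_R: unregulated minus regulated mean-squared error."""
    return mse(f_unregulated, gamma1, sigma2, n) - mse(p_regulated, gamma1, sigma2, n)


# Hypothetical inputs: regulation lowers the response rate from 0.9 to 0.6.
gap = mse_gap(f_unregulated=0.9, p_regulated=0.6, gamma1=100.0, sigma2=400.0, n=1000)
print(gap < 0)  # True: the unregulated design has the smaller mean-squared error
```

In this made-up case almost all of the gap comes from the squared-bias term, which is the pattern the chain-rule reading of the regulation effect would suggest when \(\gamma_1\) is large relative to \(\sigma^2/n\).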
Heterogeneity raises the waste induced by maximum wage policies because it makes search costs rise with the duration of the survey, rendering an additional benefit to wages that are raised above the maximum level allowed. Lastly, the statistical regulation effects are more severe the larger the survey, which implies that cost differences increase with sample size.

The quantitative magnitudes of these regulation effects are important for relevant parameter values. Table 5 illustrates this by displaying the absolute value of the regulation biases caused when \(w_R = 0\) for different sample sizes, under a given set of reasonable parameters for the example discussed in Section 2, with the optimal wage described in equation (11). The hazard rate was assumed to be one-tenth of successful searches, \(h = 0.10\), and the fraction of free-suppliers was assumed to be one-quarter of the population, \(G_0 = 0.25\). Using the unit cost of search as the numeraire, the cost of production was 20,000 search cost units. The different columns of the table are for different levels of the non-response biases, \(\gamma_1 = 1, 100, 250, 500\), possibly representing dollar amounts, similar to those in the earnings regressions estimated for HRS in Table 4. The first column therefore represents the difference in response rates \(f(p) - p\) induced by the regulation, since the bias is unitary. For example, the table shows that the response rate is reduced by anywhere from the enormous effect of 75% for a sample of size 1,000, to 4% for a sample of size 30,000. The non-response bias with a $500 effect per percentage response therefore ranges from 75 x $500 = $37,500 for a survey of size 1,000 to $2,197 for a survey of size 30,000. In sum, for reasonable parameter values, production biases introduced by public wage regulations may be substantial.

TABLE 5
Production biases induced by maximum wage regulations

Sample                      Bias
size            1        100         250         500
 1,000       75.0    7,500.0    18,750.0    37,500.0
 2,000       65.9    6,591.0    16,477.0    32,955.0
 3,000       43.9    4,394.0    10,985.0    21,970.0
 4,000       33.0    3,295.0     8,237.5    16,475.0
 5,000       26.4    2,636.0     6,590.0    13,180.0
 6,000       22.0    2,197.0     5,492.5    10,985.0
 7,000       18.8    1,883.0     4,707.5     9,415.0
 8,000       16.5    1,648.0     4,120.0     8,240.0
 9,000       14.7    1,465.0     3,622.5     7,325.0
10,000       13.2    1,318.0     3,295.0     6,590.0
11,000       12.0    1,198.0     2,995.0     5,990.0
12,000       11.0    1,098.0     2,745.0     5,490.0
13,000       10.1    1,014.0     2,535.0     5,070.0
14,000        9.4      941.6     2,354.0     4,708.0
15,000        8.8      878.8     2,197.0     4,394.0
16,000        8.2      823.9     2,059.8     4,119.5
17,000        7.8      775.4     1,938.5     3,877.0
18,000        7.3      732.3     1,830.8     3,611.5
19,000        6.9      693.8     1,734.5     3,469.0
20,000        6.6      659.1     1,647.8     3,295.5
21,000        6.3      627.7     1,569.3     3,138.5
22,000        6.0      599.2     1,498.0     2,996.0
23,000        5.7      573.1     1,432.8     2,865.5
24,000        5.5      549.2     1,373.0     2,746.0
25,000        5.3      527.3     1,318.3     2,636.5
26,000        5.1      507.0     1,267.5     2,535.0
27,000        4.9      488.2     1,220.5     2,441.0
28,000        4.7      470.8     1,177.0     2,354.0
29,000        4.5      454.5     1,136.3     2,272.5
30,000        4.4      439.4     1,098.5     2,197.0

5. CONCLUDING REMARKS

This section concludes by discussing only a few of the many aspects of survey production suggested by the analysis. The general importance of survey production for virtually all fields of positive economics raises a rich set of issues that may be usefully addressed by economic, as opposed to statistical, analysis.

5.1. The quality and quantity tradeoff in survey production

The present discussion has abstracted from the tradeoff between the quality of observations (e.g. the degree of measurement errors supplied) and their quantity, represented by the sample's size. If compensation is performance-based, it will reward sample members more for higher quality observations.
One distinction between types of missing data is whether a question on a survey is not supplied or whether a sample member does not participate in the survey at all, a distinction commonly referred to in the survey literature as item versus unit non-response. The differences between these types of missing data depend on the type of performance-based compensation that is used. Compensation per question corresponds to a piece-rate for sample members, while compensation based on a binary participation decision reduces the marginal benefit of supplying more questions to zero.

Another dimension of performance-based compensation arises when sample members are given incentives, tied to validation of the answers given, to reduce their measurement errors. Philipson (1995) discusses the case when measurement errors in health surveys may be reduced by monitoring a small fraction of the sample through doctor diagnosis, restricting a high level of compensation to sample members that supply errorless answers. If such performance-based compensation is used, there may be a limited tradeoff between quantity and quality: only people with low measurement errors will have an incentive to enter into the survey and supply observations in the first place. Furthermore, there have been substantial (quality-adjusted) technological changes in the detection of erroneous observations, through computer assisted personal, or telephone, interviewing, for example.25 The cost of monitoring sample members in this way may be falling through such innovation, making such schemes more attractive.

25. These technologies are referred to as CAPI and CATI in survey practice.

5.2. Imputations, free lunches, and the production of actual versus missing data

One may distinguish between two forms of missing data. The first are missing data that are producible, meaning data which are possible, although not necessarily optimal, to produce. For example, data for those sample members not trading their observations are missing, but they are also producible, primarily through larger wage offers. The second are missing data that are non-producible, meaning data which are impossible to produce, due to infinite production costs. Counterfactual data are of this type: missing data that are relevant to an evaluation of policies that have yet to be implemented.

There are several important differences between these two forms of missing data. First, in the case of producible data, the methods used to deal with missing data can be evaluated by actually producing the missing data. For example, imputation methods for sample members not trading observations can be evaluated by directly observing their performance relative to a follow-up study that actually obtains the trades. Such an evaluation is clearly impossible for missing but non-producible data, since the counterfactual data cannot be produced.26

Second, when data are missing but producible, any decision about optimal data production must consider the tradeoff between actually producing the missing data at high cost and generating it some other way, at lower cost. Methods of producing missing data typically involve "imputations", inserting data for sample members who did not trade observations. Common imputation methods involve imputing the best prediction of the missing data (e.g. through regressions on the actual data) or taking random samples from the data, using single random samples, multiple samples, or so-called hotdeck procedures (see Madow et al. (1983) and Little and Rubin (1985)). The important point about such procedures is that they treat produced and imputed data as perfect substitutes in the analysis of the data. Implicitly, this is a free lunch assumption: aside from data producers, few other producers have the luxury of being able to go to their customers and sell them nothing, at the same time employing the argument that it is equivalent to their real product. Most people who use imputation methods would object to being sold a car that was never made or a house that was never built!

More importantly, if missing and actual data are perfectly substitutable inputs into the production of the data set, then this should be taken into account in the demand for inputs: if missing data is cheaper and perfectly substitutable, surveys should have very low response rates. More precisely, if \(C_A(f)\) and \(C_M(f)\) denote cost-functions for the production of actual and imputed (but initially missing) data, respectively, then the total cost of the survey is \(n[f\,C_A(f) + (1 - f)\,C_M(1 - f)]\), and this implies a low demand for \(f\) when \(C_M\) is substantially below \(C_A\). Of course, few survey producers would agree that lower response rates are desirable on these grounds, but the point is that their behaviour speaks louder than their words: standard imputation behaviour implicitly reveals such tastes. Generally, the better a missing data procedure is argued to be, the less actual data should be produced. If one is serious about the validity of the assumptions that make standard imputation methods useful, that one can produce something out of nothing, then this should be reflected not only in the consumption of the data, but also in its production, in terms of low response rates.
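To see why perfectly substitutable imputations push the cost-minimizing response rate down, the following sketch evaluates the total cost expression above on a grid of response rates. The two unit-cost functions are hypothetical stand-ins chosen only for illustration; the paper does not specify functional forms, and the grid search is an assumed shortcut rather than the author's method.

```python
def total_cost(f, n, cost_actual, cost_imputed):
    """Total survey cost n * [f * C_A(f) + (1 - f) * C_M(1 - f)]."""
    return n * (f * cost_actual(f) + (1.0 - f) * cost_imputed(1.0 - f))


def cheapest_response_rate(n, cost_actual, cost_imputed, steps=100):
    """Grid search over f in {0, 1/steps, ..., 1} for the lowest total cost."""
    grid = [i / steps for i in range(steps + 1)]
    return min(grid, key=lambda f: total_cost(f, n, cost_actual, cost_imputed))


# Hypothetical unit costs: actual data get dearer as the response rate rises,
# while imputed data cost a flat amount per missing observation.
def rising_actual_cost(f):
    return 10.0 + 40.0 * f


def cheap_imputation(share_missing):
    return 1.0


def costly_imputation(share_missing):
    return 30.0


low_f = cheapest_response_rate(1000, rising_actual_cost, cheap_imputation)
high_f = cheapest_response_rate(1000, rising_actual_cost, costly_imputation)
print(low_f, high_f)
```

With the cheap imputation cost the search settles on a response rate of zero, whereas the costly imputation case yields an interior rate of 0.25. That contrast is the sense in which a believer in costless, perfectly substitutable imputation should also demand very little actual data.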
If one is not serious about such assumptions, then consumption practices that are based on them are obviously not advisable.

26. This inevitably leads to common, but misguided, debates among empirical economists on who has the best missing data (e.g. whether an instrumental variable is good or bad). The problem is that "best" is not defined in relation to any actual data since such data are, by definition, non-producible.

Acknowledgements. I would like to thank two anonymous referees and the editor of the Review for very useful points that improved the paper. I also thank John Cawley, Ricardo Cossa, Stuart Hagen, and Tom Lawless for research assistance and Avner Ahituv, Gary Becker, Michael Boozer, Pierre-Yves Geoffard, James Heckman, Casey Mulligan, and Derek Neal for comments. I am grateful to seminar participants at the University of Chicago, Harvard University, and Yale University, as well as The National Opinion Research Center (NORC), especially Phil DePoy, for their great hospitality including provision of internal data. Financial support from the Research Fellows Program of the Alfred P. Sloan Foundation is gratefully acknowledged.

REFERENCES

BRADBURN, N. and SUDMAN, S. (1988) Polls and Surveys (San Francisco, CA: Jossey-Bass Publishers).
BERGER, J. (1988) Statistical Decision Theory and Bayesian Analysis (New York: Springer-Verlag).
COCHRANE, W. (1979) Survey Sampling (New York: Wiley & Sons).
GREEN, J. and LAFFONT, J.-J. (1979) Incentives in Public Decision Making, Volume I (Amsterdam: North-Holland).
HANSEN, M., HURWITZ, W. and MADOW, W. (1953) Sample Survey Methods and Theory (New York: Wiley & Sons).
HECKMAN, J. (1976), "The Common Structure of Statistical Models of Truncation, Sample Selection, and Limited Dependent Variables and a Simple Estimator for such Models", Annals of Economic and Social Measurement, 5, 475-492.
GRILICHES, Z. and INTRILIGATOR, M. (Eds.) (1986) Handbook of Econometrics (New York and Heidelberg: North-Holland).
GROVES, R. (1989) Survey Errors and Survey Costs (New York: Wiley & Sons).
KISH, L. (1965) Survey Sampling (New York: John Wiley & Sons).
LANCASTER, T. (1990) The Econometric Analysis of Transition Data (Cambridge and New York: Cambridge University Press).
LAFFONT, J.-J. and MASKIN, E. (1982), "The Theory of Incentives: An Overview", Chapter 2 in W. Hildebrand (ed.), Advances in Economic Theory (Cambridge: Cambridge University Press).
LESSLER, J. and KALSBECK, W. (1992) Non-Sampling Errors in Surveys (New York: Wiley & Sons).
LITTLE, R. and RUBIN, D. (1985) Statistical Analysis with Missing Data (New York: Wiley & Sons).
MADOW, W., OLKIN, I. and RUBIN, D. (Eds.) (1983) Incomplete Data in Sample Surveys, Vols. I-III (New York: Academic Press).
MANSKI, C. (1995) Identification Problems in the Social Sciences (Cambridge and London: Harvard University Press).
PHILIPSON, T. (1994), "The Production of Health Surveys: A Principal Investigator-Agent Approach to Measurement Error Reduction" (mimeo, Department of Economics, University of Chicago).
STEFFEY, D. and BRADBURN, N. (1994) Counting People in the Information Age (Washington, D.C.: National Academy Press).
STIGLER, S. (1987) The History of Statistics (Cambridge and London: Harvard University Press).
SAVAGE, L. (1977) The Foundations of Statistics (New York: Dover).
SUDMAN, S. (1967) Reducing Costs in Surveys (Chicago: National Opinion Research Center).