
Should We Treat Data as Labor?

Moving Beyond “Free”

By Imanol Arrieta Ibarra, Leonard Goff, Diego Jiménez Hernández, Jaron Lanier, and E. Glen Weyl∗

∗ Arrieta: Department of Management Science and Engineering, School of Engineering, Stanford University, Huang Engineering Center, 475 Via Ortega Avenue, Stanford, CA 94305 (imanol@stanford.edu). Goff: Department of Economics, Columbia University, 1022 International Affairs Building, 420 West 118th Street, New York, NY 10027 (ltg2111@columbia.edu). Jiménez: Department of Economics, Stanford University, 579 Serra Mall, Stanford, CA 94305 (diego.jimenez@stanford.edu). Lanier: Office of the Chief Technology Officer, Microsoft Corporation, One Microsoft Way, Redmond, WA 98052 (jalani@microsoft.com). Weyl: Microsoft Research, One Memorial Drive, Cambridge, MA 02142 and Yale University Department of Economics and Law School (glenweyl@microsoft.com). We are grateful to many colleagues for comments, but especially to Microsoft business leaders Satya Nadella and Kevin Scott for their encouragement. All errors are our own.

In the previous paper in this session and in a forthcoming book (Posner and Weyl, 2018), one of us argues that by creating or strengthening absent markets, we can simultaneously address the inequality, stagnation and sociopolitical conflict afflicting developed countries. He calls such cases “radical markets” because of their transformative emancipatory potential. A promising example was suggested years earlier by another of us, who wrote a book (Lanier, 2013) highlighting the social problems with the culture of “free” online, in which users are neither paid for their data contributions to digital services nor pay directly for the value they receive from these services. While free data for free services is a barter, he argued that the lack of targeting of incentives undermines market principles of evaluation, skews distribution of financial returns from the data economy and stops users from developing themselves into “first-class digital citizens”. In this paper we explore whether and how treating the market for data like a labor market could serve as a radical market that is practical in the near term.

I. The High Cost of Free Data

The digital economy is perhaps the leading source of innovation today, delivers massive surplus to users (Brynjolfsson et al., 2017) and is “free” (at point of use) to users. Despite these benefits, popular anxiety and backlash are rising.

The most common concern is employment and income distribution. Many fear that artificial intelligence (AI) systems will replace human workers. Economists rightly respond that greater technological disruptions in the past, while causing shifts in employment, have largely left labor’s share of income constant or even growing (Autor, 2015). Yet recent secular declines in labor’s share (Karabarbounis and Neiman, 2014) belie its universal stability.

Furthermore, the employment numbers of leading technology companies give little cause for optimism. The market capitalization and value-added of firms like Facebook, Google and Microsoft are similar to or greater than those of a firm like Walmart, yet they employ 1-2 orders of magnitude fewer workers, and our primitive attempts to estimate the labor income shares of these companies from publicly available statistics suggest they are a small fraction of the traditional average of 60-70%. The “future” such firms represent would validate Piketty (2013)’s foreboding of high capital shares.

Simultaneously, the lack of payment to users for data may drag on the contributions of AI to productivity growth. Despite the widespread hype about AI, its contributions to productivity seem to have been limited thus far (Gordon, 2016; Nadella, 2017). A potential explanation relates to the role of data. The first generation of AI systems largely failed to achieve their goals because they relied too heavily on hard-coding by engineers. The new generation of AI uses statistical methods called “machine learning” (ML), which adapt to patterns in examples of humans performing similar tasks (“big data”). Yet the free data model has made productivity-related data much less accessible than consumption-oriented data. Workers who expect to be compensated are the primary

Electronic copy available at: https://ssrn.com/abstract=3093683


2 PAPERS AND PROCEEDINGS MAY 2018

performers of productivity-related tasks, and these often occur within firms unwilling to surrender their proprietary internal data to AI companies for free. More broadly, many AI systems depend on active participation by humans to generate relevant data. This ranges from users granting permission to access data naturally created in the course of consumption experiences, through users that go out of their way to provide examples of translations or feedback on translations generated by AIs as they use these systems, to the sort of active labeling and analysis tasks currently supplied in digital labor markets such as Amazon’s Mechanical Turk or Mighty AI (Gray and Suri, 2017) and even to the creative content displayed on blogs and video sharing sites.

However, these systems seem inefficient as they generally do not reward those with the greatest expertise and context (usually those producing the data that others currently label in the first place), either reassigning tasks to those with little context or coaxing those with context to provide feedback for free as part of accessing online services (as in the case of DuoLingo or reCAPTCHA). They appear to be workarounds to avoid directly paying those best able to supply high-quality data rather than efficient procurement practices. A purely free data economy acts as a drag on productivity growth that continues to lag worldwide (Byrne et al., 2016) despite bold hopes for AI’s potential.

Finally, recent anxiety about employment and the digital economy goes beyond the purely economic. On the one hand, increasing numbers of workers, especially away from cosmopolitan and high-tech cities, are disillusioned with and disenfranchised by technological and economic progress. Many believe these feelings helped stimulate populist movements of the left and right throughout the developed world.

Simultaneously, young people spend increasing time on and have developed increasing expertise in digital interactions such as social media and video games (Perrin, 2015; Aguiar et al., 2017). Because such activities are overwhelmingly framed as consumption rather than production, these growing online lives are widely seen as running contrary to or undermining the dignity provided by work. Many of these young people seem to have become involved with antisocial activities (such as cyberbullying and hate speech) or to have declining self-esteem. Thinkers promoting the idea of a “universal basic income (UBI)” have even suggested that dignity based on work is becoming outdated and that, as AI replaces humans, leisure may be a growing source of identity (Parijs and Vanderborght, 2017). Whatever the promise of this idea, for the medium term treating online experiences as purely consumption holds risks for the social and political fabric of developed countries.

II. Capital or Labor?

We contend that the key aspect of the current political economy of data that causes these problems is treating data as capital rather than as labor. While it might seem that assets either are one or the other, and that treatment is irrelevant, transitions in the social attitude towards assets across these categories have played important roles in history. Slavery and, to a lesser extent, feudalism treated (largely agricultural) work as a possession of a master or lord, while liberal and labor reform worked to give recognition and its marginal economic product to labor. To understand what we are trying to accomplish, it is useful to contrast several attitudes towards data at present under the “Data as Capital (DaC)” paradigm to those appropriate in a world where we see data as labor (DaL); we summarize these in Table 1.

DaC treats data as natural exhaust from consumption to be collected by firms, while DaL treats them as user possessions that should primarily benefit their owners. DaC channels payoffs from data to AI companies and platforms to encourage entrepreneurship and innovation, while DaL channels them to individual users to encourage increased quality and quantity of data. DaC prepares for AI to displace workers either by supporting UBI or reserving spheres of work where AI will fail for humans, while DaL sees ML as just another production technology enhancing labor productivity and creating a new class of “data jobs”. DaC encourages workers to find dignity in leisure or in human interactions outside the digital economy, while DaL views data work as a new source of “digital dignity”. DaC sees the online social contract as free services in exchange for prevalent surveillance, while DaL sees the need for large-scale institutions to check the ability of data platforms



VOL. 1 NO. 1 DATA AS LABOR 3

Issue                  Data as Capital              Data as Labor

Ownership              Corporate                    Individual
Incentives             Entrepreneurship             “Ordinary” contributions
Future of work         Universal Basic Income       Data work
Source of self-esteem  Beyond work                  Digital dignity
Social contract        Free services for free data  Countervailing power to create data labor market

TABLE 1. LEADING CHARACTERISTICS OF THE “DATA AS CAPITAL” VERSUS “DATA AS LABOR” PERSPECTIVES.

to exploit monopsony power over data providers and ensure a fair and vibrant market for data labor.

Describing DaL versus DaC as a binary is obviously too simplistic and extreme. The production functions for data and the AI systems built on top of them are certainly more continuous: data, capital (e.g. computational power), skilled labor (e.g. programmers), entrepreneurial talent and “land” (e.g. rents on network effects) all matter, and these different inputs can likely be substituted reasonably smoothly. The socially optimal shares of each factor depend on as-yet-unmeasured details of production functions, and data themselves are not purely created by users: they require firms to track, record and organize user behavior.

Yet we doubt the optimal (viz. competitive) share of user data contributions is a negligible fraction of the total value of the digital economy. While the marginal value of data in estimating any finite dimensional quantity eventually steeply declines, the power of the latest generation of ML has been its ability to tackle increasingly sophisticated tasks as the quality and quantity of data improve. Many of these more sophisticated tasks are impossible to even get started on without ample data, as the neural networks and other learning algorithms required cannot learn the right representations of complex phenomena without many training examples. This suggests that the returns to data may decline only gradually or there may even be increasing returns to data if more sophisticated tasks are disproportionately more valuable. This is consistent with the empirically-observed dominance of the data economy by a few large firms.

Luckily, the production function for AI may be easier to measure than other production functions because the relevant ML algorithms and their performance at different times and for different data sets are usually well-documented, at least internally to companies. Combining these with advances in ML that allow estimation of the marginal effect of new data on predictions (Koh and Liang, 2017) suggests a promising avenue for valuing data (and one we are pursuing at Microsoft), though there are many conceptual and computational challenges still to be overcome.

Whatever the precise balance, the only “third way” out of the DaL-DaC spectrum we see is the failure of AI: if AI proves to be relatively unproductive or irrelevant, neither DaL nor DaC will much matter. But if AI lives up to even a part of its hype, failure to move towards DaL will leave us trapped in the problems we highlight with DaC.

III. How Did We Get Here?

If treating data purely as capital is economically and socially irrational, how have we ended up in the present equilibrium? As in the nineteenth century labor struggles, the usual culprits are a combination of prejudice (viz. the weight of precedent created by historical accidents) and privilege (viz. entrenched interests that derive rents from the inefficient equilibrium). In the present setting, user expectations of “lightweight” online experiences have conspired with the monopsony power of the technology giants (what one of us has called “siren servers”) to maintain the status quo.

The internet economy largely began with a venture-capital fueled bubble that chased usage with little sense for a business model. The social movement for “free software” collided with a counter-cultural streak in Silicon Valley that declared information wants to be free and built users’ expectations of digital services being offered freely. Searching for a way to monetize this activity, Google and then Facebook turned to advertising targeted using user data. This accustomed users to surrendering data in exchange for free services (Carrascal et al., 2013), expectations that have persisted as the value of such data to broader AI services has risen. Few users

are even aware of the productive value of their data or the role they play in enabling ML.

Yet historical accidents have not only entrenched expectations and norms, they also have created powerful interests in maintaining the status quo. The largest siren servers, especially Facebook and Google, but also Microsoft and others, benefit from the free or extremely cheap availability to them of data. While the total value created by data might be much larger in a DaL world, users aware of the value of their data would likely demand compensation in a range of settings, dramatically reducing the share of value that could be captured by the siren servers as profits. This is just an extreme version of the standard logic of monopsony: while a usual monopsonist just depresses wages, the historical background we explain above has made it attractive for siren servers to maintain a DaC equilibrium where users are not even aware of the value their data daily create for siren servers.

Recent evidence suggests significant monopsony power in online task labor markets. Dube et al. (2018) use randomly varied wages on Amazon Mechanical Turk to find elasticities of the labor supply curve facing a task-poster that are well below unity. These small task-posters almost certainly have more elastic residual labor supply than does a siren server, suggesting extreme monopsony power in the latter case: a question we have been investigating in on-going work with Microsoft data. In on-going work using a large Microsoft program that pays users in loyalty points for Bing searches, we estimate even smaller elasticities in the number of searches performed among active users of the program. This reinforces the idea that monopsony may be an important force blocking the potential productivity gains from DaL.

IV. Sources of Countervailing Power

The inefficient exploitation of labor by concentrated capital was a constant theme of political economy before the Cold War. Galbraith (1952) summarized various solutions to this problem as forms of “countervailing power” by large scale social institutions.

In the data economy, the first and most natural balancing factor is competition. While Facebook and Google rely heavily on DaC, other leading technology companies (e.g. Amazon and Apple) mostly follow different business models, and a productivity-oriented company like Microsoft might even benefit from users perceiving themselves more as producers online. These other companies also lag Facebook and Google in the data race to train ML systems. Returning more of the gains to data laborers might help them compete in creating AI systems. Smaller companies or start-ups could also make a difference, and many (e.g. Meeco) have been formed around DaL-related ideas. Yet we doubt, given the economies of scale related to data in producing AI systems, that a smaller player could succeed without a significant partnership with one of the largest technology companies.

Second, data laborers could organize a “data labor union” that would collectively bargain with siren servers. While no individual user has much bargaining power, a union that filters platform access to user data could credibly call a powerful strike. Such a union could be an access gateway, making a strike easy to enforce; on a social network, where users would be pressured by friends not to break a strike, this might be particularly effective. A union could also be useful in certifying data quality and guiding users to develop their earning potential.

Finally, governments can play an important role in helping facilitate DaL both on the positive and negative side. On the positive side, new regulatory frameworks such as the European General Data Protection Regulation are increasingly shifting ownership rights in data to the users that generate them. Data collectors increasingly must allow users to understand, withdraw and transfer their data across competitors. On the other hand, existing labor laws fit poorly with a world where much data labor may be done in the course of consumption experiences rather than as a dedicated activity. Adapting labor laws to defend workers against monopsony while allowing the flexibility that data work requires will demand a combination of economic and technical sophistication that we hope labor economists can increasingly provide to support policy-makers.

V. A Radical Data Market

Ultimately, we believe all three of these factors must coordinate for DaL to succeed, just

as in historical labor movements. Whatever the mix, however, building a market for data labor offers economists an exciting chance to design a market on a much broader scale than most work on market design in the past (Roth, 2015). For example, we are currently working to use regularized measures of the marginal value of data points to design and make transparent efficient payments for data workers. With studies projecting that AI might automate as many as 50% of jobs in the coming decades (Frey and Osborne, 2017), data labor has the potential to constitute a significant fraction of national income. At the same time, economists, in their roles as advisors to governments and technology companies, are likely to play a central role in defining the texture of these markets. A radical market in data labor offers a near-term opportunity for economists, in collaboration with the other social and computer scientists they regularly work with in the technology industry, to bring years of research in labor economics and market design to bear on a central social problem of our times.

REFERENCES

Aguiar, Mark, Mark Bils, Kerwin Kofi Charles, and Erik Hurst, “Leisure Luxuries and the Labor Supply of Young Men,” 2017. http://www.nber.org/papers/w23552.
Autor, David H., “Why Are There Still So Many Jobs? The History and Future of Workplace Automation,” Journal of Economic Perspectives, 2015, 29 (3), 3–30.
Brynjolfsson, Erik, Felix Eggers, and Avinash Gannamaneni, “Using Massive Online Choice Experiments to Measure Changes in Well-being,” 2017. Latest version available from authors.
Byrne, David M., John G. Fernald, and Marshall B. Reinsdorf, “Does the United States have a Productivity Slowdown or a Measurement Problem?,” Brookings Papers on Economic Activity, 2016, (Spring), 109–182.
Carrascal, Juan Pablo, Christopher Riederer, Vijay Erramilli, Mauro Cherubini, and Rodrigo de Oliveira, “Your Browsing Behavior for a Big Mac: Economics of Personal Information Online,” in “Proceedings of the 22nd International Conference on World Wide Web,” WWW ’13, New York, NY, USA: ACM, 2013, pp. 189–200.
Dube, Arindrajit, Jeff Jacobs, Suresh Naidu, and Siddharth Suri, “Monopsony in Online Labor Markets,” 2018. This paper is under preparation. Contact Suresh Naidu at suresh.naidu@gmail.com for a copy.
Frey, Carl Benedikt and Michael A. Osborne, “The Future of Employment: How Susceptible are Jobs to Computerisation?,” Technological Forecasting and Social Change, 2017, 114, 254–280.
Galbraith, John Kenneth, American Capitalism, New York: Houghton Mifflin, 1952.
Gordon, Robert J., The Rise and Fall of American Growth: The U.S. Standard of Living since the Civil War, Princ, 2016.
Gray, Mary L. and Siddharth Suri, “The Humans Working Behind the AI Curtain,” Harvard Business Review, January 9 2017.
Karabarbounis, Loukas and Brent Neiman, “The Global Decline of the Labor Share,” Quarterly Journal of Economics, 2014, 129 (1), 61–103.
Koh, Pang Wei and Percy Liang, “Understanding Black-Box Predictions Via Influence Functions,” in “Proceedings of Machine Learning Research,” Vol. 70, 2017, pp. 1885–1894.
Lanier, Jaron, Who Owns the Future?, New York: Simon & Schuster, 2013.
Nadella, Satya, Hit Refresh: The Quest to Rediscover Microsoft’s Soul and Imagine a Better Future for Everyone, New York: Harper Business, 2017.
Parijs, Philippe Van and Yannick Vanderborght, Basic Income: A Radical Proposal for a Free Society and a Sane Economy, Cambridge, MA: Harvard University Press, 2017.
Perrin, Andrew, “Social Media Usage: 2005-2015,” Technical Report, Pew Research Center, 2015.
Piketty, Thomas, Le Capital au XXIe Siècle, Paris: Éditions du Seuil, 2013.
Posner, Eric A. and E. Glen Weyl, Radical Markets: Uprooting Capitalism and Democracy for a Just Society, Princeton, NJ: Princeton University Press, 2018.
Roth, Alvin E., Who Gets What – and Why: The New Economics of Matching and Market Design, New York: Houghton Mifflin Harcourt, 2015.
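
Illustrative note on the monopsony logic of Section III. The paper reports labor supply elasticities "well below unity" but does not spell out what that implies for pay. A minimal sketch, assuming the textbook single-wage monopsony model (an assumption of this note, not a model stated in the paper): a buyer facing labor supply elasticity e maximizes revenue minus wage bill, which gives a wage equal to the marginal revenue product times e / (1 + e). The function name and the elasticity values below are hypothetical and chosen only for illustration; they are not estimates from Dube et al. (2018) or from the Microsoft data.

```python
def wage_share_of_marginal_product(elasticity: float) -> float:
    """Fraction of marginal revenue product paid as wage under textbook monopsony.

    Assumes a single-wage, profit-maximizing buyer facing labor supply with
    constant elasticity e > 0. The first-order condition MRP = w * (1 + 1/e)
    rearranges to w / MRP = e / (1 + e).
    """
    if elasticity <= 0:
        raise ValueError("elasticity must be positive")
    return elasticity / (1.0 + elasticity)


if __name__ == "__main__":
    # Hypothetical elasticities, for illustration only.
    for e in (0.1, 0.5, 1.0, 5.0):
        share = wage_share_of_marginal_product(e)
        print(f"elasticity {e}: wage is {share:.1%} of marginal product")
```

Under these assumptions, any elasticity below one implies a wage of less than half of the marginal product, and the share approaches one only as supply becomes very elastic. This is one way to read why elasticities well below unity are described in the text as significant monopsony power.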
