

Rise of the Data Scientist

June 4, 2009

Topic: Design, Statistics

Photo by majamarko

As we’ve all read by now, Google’s chief economist Hal Varian commented in January that the sexy job of the next 10 years would be statistician. Obviously, I whole-heartedly agree. Heck, I’d go a step further and say statisticians are sexy now, mentally and physically.

However, if you went on to read the rest of Varian’s interview, you’d know that by statisticians, he actually meant a general title for someone who is able to extract information from large datasets and then present something of use to non-data experts.

Sexy Skills of Data Geeks

As a follow-up to Varian’s now-popular quote among data fans, Michael Driscoll of Dataspora discusses the three sexy skills of data geeks. I won’t rehash the post, but here are the three skills that Michael highlights:

1. Statistics – traditional analysis you’re used to thinking about
2. Data Munging – parsing, scraping, and formatting data
3. Visualization – graphs, tools, etc.
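For readers who haven’t done much of it, here is a minimal Python sketch of what the munging step can look like in practice. It assumes a small, hypothetical CSV export with a padded header, a quoted value containing a thousands separator, and a blank line; the names `RAW`, `munge`, and `format_rows` are illustrative only and do not come from Driscoll’s post.

```python
import csv
import io

# Hypothetical raw export: a padded header, a quoted value with a
# thousands separator, and a blank line.
RAW = """region , sales
West, "1,200"

East,950
"""


def munge(raw_text):
    """Parse CSV text and extract (region, sales) pairs as clean Python values."""
    # skipinitialspace lets the reader recognize a quoted field after ", "
    reader = csv.reader(io.StringIO(raw_text), skipinitialspace=True)
    header = [name.strip().lower() for name in next(reader)]
    rows = []
    for record in reader:
        if not record:  # csv.reader yields [] for a blank line
            continue
        fields = dict(zip(header, (value.strip() for value in record)))
        # Drop the thousands separator before converting to an integer.
        rows.append((fields["region"], int(fields["sales"].replace(",", ""))))
    return rows


def format_rows(rows):
    """Format the cleaned pairs as an aligned two-column text table."""
    return "\n".join(f"{region:<6}{sales:>6}" for region, sales in rows)


if __name__ == "__main__":
    print(format_rows(munge(RAW)))
```

Parsing, extracting, and formatting are kept as separate steps so each can be swapped out when the input source changes.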

Oh, but there’s more…

These skills actually fit tightly with Ben Fry’s dissertation on Computational Information Design (2004). However, Fry takes it a step further and argues for an entirely new field that combines the skills and talents from often disjoint areas of expertise:

1. Computer Science – acquire and parse data
2. Mathematics, Statistics, & Data Mining – filter and mine
3. Graphic Design – represent and refine
4. Infovis and Human-Computer Interaction (HCI) – interaction

And after two years of highlighting visualization on FlowingData, it seems collaborations between the fields are growing more common, but more importantly, computational information design edges closer to reality. We’re seeing data scientists, people who can do it all, emerge from the rest of the pack.

Advantages of the Data Scientist

Think about all the visualization stuff you’ve been most impressed with or the groups that
always seem to put out the best work. Martin Wattenberg. Stamen Design. Jonathan Harris.
Golan Levin. Sep Kamvar. Why is their work always of such high quality? Because they’re
not just students of computer science, math, statistics, or graphic design.

They have a combination of skills that not only makes independent work easier and quicker but also makes collaboration more exciting and opens up possibilities in what can be done.
Oftentimes, visualization projects are disjoint processes and involve a lot of waiting. Maybe
a statistician is waiting for data from a computer scientist; or a graphic designer is waiting for
results from an analyst; or an HCI specialist is waiting for layouts from a graphic designer.

Let’s say you have several data scientists working together, though. There’s going to be less waiting, and the communication gaps between the fields tighten.

How often have we seen a visualization tool that had an excellent concept and looked great on paper but lacked the touch of HCI, which made it hard to use, so no one gave it a chance? How many important (and interesting) analyses have we missed because certain ideas could not be communicated clearly? The data scientist can solve your troubles.

An Application

This need for data scientists is quite evident in business applications where educated decisions need to be made swiftly. A delayed decision could mean lost opportunity and profit. Terabytes of data are coming in, whether from websites or from sales across the country, but in an area where Excel is the tool of choice (or force), there are limitations; hence all the tools, applications, and consultancies that exist to help out. This of course applies to areas outside of business as well.

Learn and Prosper

Even if you’re not into visualization, you’re going to need at least a subset of the skills that
Fry highlights if you want to seriously mess with data. Statisticians should know APIs,
databases, and how to scrape data; designers should learn to do things programmatically; and
computer scientists should know how to analyze and find meaning in data.

Basically, the more you learn, the more you can do, and the more in demand you will be as the amount of data grows and more people want to make use of it.

Related

Data Scientist: The hottest job you haven’t heard of
Hiring a data scientist
What data scientists really do

47 Comments
jeff — June 4, 2009 at 3:31 am

i’m curious about how you chose the term “data scientist” to describe this role. that’s
precisely the title we used for folks on the data team at facebook, chosen somewhat
arbitrarily as a contraction of “data analyst” and “research scientist”, with the same
skills in mind as you mention above. i also titled my chapter for “beautiful data”
“information platforms and the rise of the data scientist”. quite amazing if you
formulated the phrase separately! something in the air…

Nathan Yau — June 4, 2009 at 10:29 am

@jeff – ha, yes, there must be something in the air. i read a grant proposal two or
three years ago pushing for a new area of study called “data science.” it’s stuck
with me ever since.

Michael E Driscoll — June 4, 2009 at 3:38 am

Nathan – Nice synthesis and thanks for the shout-out. Ben Fry’s model captures the
various fields that comprise this interdisciplinary ‘data science’ quite elegantly. I’d
even venture to add some bidirectional arrows. Between the four core activities —
Munging, Modeling, Visualizing, and Interacting — there’s a lot of feedback.

And as far as sexiness goes, I’m still holding my breath for People magazine to release
its Sexiest Data Scientist Alive issue. It still may be a decade or more away.

Matt — June 4, 2009 at 9:35 am

Power to the Data Scientists!

Nathan Yau — June 4, 2009 at 10:39 am

@Michael – re:feedback definitely. fry gets into this as it’s one of his arguments for an interdisciplinary field, whereas in a collaboration, a person would have to explain to another what he wanted, have some misunderstandings along the way, and then wait.

re:sexy we’ll have to start a letter campaign :)

Pingback: Data scientists « Jabberwocky Ecology

Craig — June 4, 2009 at 12:37 pm

I have an open question. My graduate training has a foundation in statistics building to psychometrics. I’ve spent some time in human factors and usability, and I was even an art major for a while. This has all given me a solid base to understand the story data is telling as well as put it into visualizations that facilitate understanding among non-specialists. However, while I appreciate the opportunity to have a sexy job, I am currently on track to senior management in a major corporation. So here’s the question: how do you balance becoming a specialist and developing the breadth necessary to perform as a business manager?

Mike Figueroa — June 12, 2009 at 12:40 am

I don’t see that you can’t apply the ‘data scientist’ skill set to the ‘senior
management in a major corporation’ job.
Being able to gather/create, parse/analyze, then present data in a meaningful way
would likely not hinder you.

The only problem with being amazing is that it makes others uncomfortable in their own skin, i.e.: “Jeez, I’ve been here 14 years and I never knew THAT. This guy is dangerous.”

Jérôme Cukier — June 4, 2009 at 1:18 pm

I like Fry’s approach, but in this day and age you’d have to accept that these skills are not necessarily mastered by the same person.

The best mathematician is not necessarily the best graphic artist, who is not necessarily the best interface designer, who is not always a subject-matter expert, etc.

Some people have all those skills and then some, but that’s more the exception than the
rule.

John — June 4, 2009 at 1:56 pm

What about “storyteller?” Not trying to trivialize the issue at all, but the ability to
effectively communicate the relevance and import of the findings would seem to be the
skill that ties it all together. Completing the analysis isn’t the end of the project; getting the HiPPO’s sign-off is. If all the effort doesn’t go toward meeting an organizational goal, it’s wasted.

anon — June 4, 2009 at 3:40 pm

@Craig: The best answer is to look at senior managers whose work you admire, and
see how they did it. For example, often specialized knowledge is replaced by the
ability to notice, nurture, and exploit technical talent.

And note well: if you can’t find a senior manager you’d like to imitate, that’s a sign
that being an executive will make you unhappy.

Nathan Yau — June 4, 2009 at 5:24 pm

@Craig, @Jérôme – i think it’s not so much about learning all there is to know
about all the fields. Instead, you’re learning a collection of skills from the fields with
the primary purpose of visualization (or computational information design). So for
example, you might learn graphic design, but you’re not going to learn logo design;
you’re going to learn how to display data.

From a management standpoint, you’ll know what everyone is talking about and the work involved, which makes it easier to delegate and to keep things moving.

Pingback: Analytics Team » Blog Archive » The role of the data scientist

Enrique — June 5, 2009 at 6:04 am

I would add functional expertise to the list of skills. You need to understand the domain
you are analyzing if you really want to understand your data.

Scott Drzyzga — June 5, 2009 at 12:21 pm

I appreciate Dr. Fry’s explicit recognition and acknowledgement that “cartographers have mastered the ability to successfully organize geographic data in a manner that communicates effectively.” Fry goes on to suggest cartography could serve as a “useful model” and be extended in the “direction of Computational Information Design.” My only comment to Fry (and to Nathan’s post) is that it already has: a field called ‘Geographic Information Science and Technology’ (GIS&T). The GIS&T body of knowledge was outlined by Mike Goodchild and others in the mid-1990s and fully scoped in 2006 by the UCGIS. Perhaps new ‘data scientists’ will consider the groundwork already laid by so many geographers and cartographers as they expand into new territories and exploit emerging technologies.

NCGIA Core Curriculum in Geographic Information Science
http://www.ncgia.ucsb.edu/giscc/units/u002/u002.html

Geographic Information Science & Technology (GIS&T) Body of Knowledge
http://www.aag.org/bok/

University Consortium for Geographic Information Science (UCGIS)
http://www.ucgis.org

And a shameless plug for GIS&T at Shippensburg University (PA)
http://webspace.ship.edu/geog/GIS/

Subhankar Ray — June 5, 2009 at 12:38 pm

Can we predict a theory [like special theory of relativity] using data?

Can data help us to prove a theorem like Fermat’s Last Theorem?

John L. Taylor — June 10, 2009 at 12:48 pm

Subhankar Ray,

I am not sure what you are getting at, but these are quite difficult questions.

1) Can we predict a theory [like special theory of relativity] using data?

Understanding you not to be asking the trivial question of whether data and analysis methods are used to make discoveries, but rather about the algorithmic discovery of new laws or regularities: yes, but we are not very good at unaided computer discovery yet. Since the inception of AI, computer scientists, philosophers, mathematicians, and other researchers have endeavored to find methods for automated discovery of regularities from empirical data. Herbert Simon (BACON) and Paul Thagard (PI) are two historical examples of automated discovery. Today, much is being done in the area of machine learning and discovery, but that is a book (or ten) by itself.

2) Can data help us to prove a theorem like Fermat’s Last Theorem?

While proving such theorems can be assisted by computers (see the four color theorem and the Coq proof assistant), theorems are derived through mathematical induction, construction, negation, etc., not through empirical data. Quasi-empirical data, however, is used: the results of enumeration (such as the ever-growing list of prime numbers) or random evaluation of a complex mathematical object (such as Monte Carlo methods on probability density functions) give us approximations or enumerations for further mathematical consideration.

Hope this addresses your questions.


Andreas Weigend — June 5, 2009 at 5:53 pm

I think what’s missing is a discussion of incentives: how to get people to contribute data, both C2C and C2W.
See:
http://facebook.com/SocialDataRevolution,
http://blogs.harvardbusiness.org/now-new-next/2009/05/the-social-data-revolution.htm (recent Harvard Business article)
http://stanford2009.wikispaces.com (course wiki of Spring 2009 Data Mining course)
Andreas Weigend

Pingback: Google needs more sexy statisticians « The Bernoulli Trial

Pingback: Bing Community

Sandréa — June 9, 2009 at 2:19 pm

It’s interesting that the graphic you represented in this post resembles the job of librarians…
We acquire books (or information) and filter these books, or the information, to the right user or client; or we mine the stacks or systems in pursuit of this information; the files are represented via catalog (cards or systems); and the reference service refines and interacts with our users.
Do you think that being a Data Scientist requires a librarian’s expertise too? And vice versa?

Nathan Yau — June 10, 2009 at 11:52 am

having worked with information scientists, there are definitely several parallels
in what we do.

Dave — June 10, 2009 at 11:45 am

Well, given that it was a sexy maths lady who first pointed me to this article in the first
place… ^^

Pingback: Science Etcetera, Jupiterday 20090611 | ideonexus.com

Pingback: Data Scientists Apply Within - The Environment - Firefly Ecometrics

Krish Swamy — June 14, 2009 at 6:11 pm

Great summary. The data-munging/scraping part was something I particularly identified with. Working in the financial services industry (ahead of others when it comes to using data to drive strategy), I am still surprised by the amount of data that just goes untapped and unused. The systems (and the vision) required to capture this data and put it in a form which is ‘analyzable’ just do not exist.

Pingback: Cerebral Mastication » Blog Archive » Keeping Technical Talent or Why I Just Quit My Job

Pingback: Four short links: 16 June 2009 | Tech-monkey.info Blogs

Silverfern — June 16, 2009 at 8:24 pm

I would add one other domain/skill to Fry’s descriptions: Library/Information Science, to preserve the data (incl. access and retrieval, etc.) for the indefinite long term.

Pingback: Visual Communications » Information Graphics cont’d

Pingback: Four short links: 16 June 2009 | Design Website

Pingback: The Rise of the Data Scientist [From Flowing Data] | Computational Legal Studies

Jessica — June 30, 2009 at 6:56 pm


This post reminds me of a post you made last year, about why Data Visualization isn’t
popular. One point you made was that people know, but don’t know what it’s called (so
they know.. but they just don’t know they know). But my question is where does ‘data
visualization’ stop?

The post here makes data visualization seem so complex (and I’m sure it is..) that I’m
almost afraid to call simple graphs/maps ‘data visualization’.

Like.. is this visualization?
http://www.townme.com/san-francisco-ca/yuppie-locator/yuppies
It’s a map, it uses census data, but it’s certainly not as complex as the typical visualizations on a blog like infosthetics..

Or, is this blog (that one of your comment-ers suggested on your 37 visualizations post) also visualization?
http://thisisindexed.com/

I’m into visualization because reading lots of little words hurts my eyes, because I believe there’s a more efficient way to convey information, and it really is awesome looking sometimes. But, does everything count?

Jessica — June 30, 2009 at 7:14 pm

More on my previous comment. Here is a definition of visualization by http://eagereyes.org. (http://eagereyes.org/theory/Definition-of-Visualization.html)

To summarize… Robert Kosara gives a lot of definitions that I think would rule out the
links in my previous comment as “data visualizations”.

And then…the reason data visualization isn’t popular might also be because it’s so
exclusive/scientific/precise that people don’t want to touch/enjoy/appreciate it! What
do you think?

Jessica — June 30, 2009 at 7:17 pm

Here is a definition of visualization from (a blog that was also suggested by a reader in
your 37 visualizations post) http://eagereyes.org
http://eagereyes.org/theory/Definition-of-Visualization.html

To summarize… he gave a lot of very precise definitions and requirements for ‘information that results in a picture’ to be called visualization.

The definitions (to me at least… casual graph-onlooker) seem pretty intense. And
going back to your post about why data visualization wasn’t popular… it might be
because data visualization is viewed as a science.. very precise/definite/intense that
people are afraid to enjoy/appreciate it at all.

Pingback: The Rise of The Data Scientist « Visualness

Pingback: Rise of the Data Scientist « GIS and Science

Jen — July 7, 2009 at 3:56 pm

This is my favorite unusually good data visualization site:
http://www.babynamewizard.com/voyager
Somebody showed me this one in grad school, and I immediately proceeded to spend several hours riveted to my screen, looking up the popularity and geographic trends associated with the names of everyone I knew.

Pingback: In The Tech News « Caintech.co.uk

Pingback: Ivan Frantar (ivanico) 's status on Wednesday, 08-Jul-09 08:14:31 UTC - Identi.ca

Pingback: Daily Links #75 | CloudKnow

Pingback: Flow » Blog Archive » Daily Digest for July 9th - The zeitgeist daily

Pingback: Data Scientist > Data Geek > Designer « Visualizing Economics

Pingback: Coast to Coast Bio Podcast » Blog Archive » Episode 23: So, why were you talking about iPhones?

Pingback: The LinkedIn Blog » Blog Archive » Data Scientists: Wrangling Data for Professionals «

Pingback: Data Scientists: Finding Patterns in LinkedIn Data | CITI Recruitment

Pingback: 9 tips for newbies to paid search | Wise SEO Service



Copyright © 2007-Present FlowingData. All rights reserved.
