
Introduction

One year of multivariable calculus and linear algebra


One year of intro CS
One year of intro probability and inference

Core Classes
Data science
Machine learning
Linear modeling
Predictive modeling

CS electives
Theory of computation / Analysis of algorithms
Data structures and algorithms
Software engineering
Visualization
Parallel programming
Network analysis
More machine learning

Stats electives
More linear models
Time series analysis
Statistical software
Experimental design
Survey analysis



Ben Hamner
361 Votes by Jack Rae (Quora Data Scientist), Yair Livne (Econ PhD from Stanford, took 2 years of stats
P...), Michael Hochster (PhD in Statistics, Stanford; Director of Data S...), Don van der Drift (In PhD
Physics program for 2.5 years at Technis...), Andrei Kucharavy (PhD student in Bioinformatics), and 356
more.
Here are some that I've found useful:
http://blog.echen.me/ - Edwin regularly gives insanely clear and practical examples of data analyses,
complete with code samples and visualizations, as well as the occasional very insightful and down-to-
earth explanation of sophisticated machine learning algorithms
http://hunch.net - John Langford is one of the foremost applied machine learning researchers and the
author of Vowpal Wabbit, a large-scale ML tool used widely in the tech industry. His blog runs
at the intersection of theory and practice; see Clever Methods of Overfitting (Page on Hunch) for an
example of the best his blog has to offer.
Simply Statistics

FastML (fastml.com)
Statistical Modeling, Causal Inference, and Social Science
Walking Randomly
Normal Deviate
No Free Hunch (disclaimer - I work at Kaggle) - Regular "how I did it" posts from machine learning
competition winners (warning - these showcase the state-of-the-art methods for given datasets, which
don't necessarily make the best production models), as well as more general data science tips from a
practitioner's perspective




William Chen, Quora Data Science Intern
102 Votes by Daniel Layon, Robert Eckhardt, Edwin Khoo, and 99 more.
Check out Harvard's data science class at CS109.org

The course is taught by Joe Blitzstein and Hanspeter Pfister.

Lecture/lab videos are freely available at Distance Education Harvard University Extension School

All problem sets, problem set solutions, lab IPython notebooks, and course datasets are available
at CS109 GitHub

Also check out:
Data Science: What are some good free resources to learn data science?
Where can I learn pandas or numpy for data analysis?
What are some good resources for learning about statistical analysis?
Data Science: How do I become a data scientist?
What are some good "toy problems" in data science?
What are some good resources for learning about machine learning?

Also follow my blog Storytelling with Statistics for regular posts about data science!




Neil Kodner, Data Engineer at Facebook
210 Votes by Jay Wacker (Professor at Stanford University in Particle Ph...), Joseph Misiti, Jonathan E.
Chen, and 207 more.
The Twitter Streaming API will enable you to capture a lot of data in a relatively short period of time. Use the
statuses/sample method to capture the (limited) public feed. You can also use the statuses/track method,
which retrieves tweets that mention a given keyword or list of keywords.
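
As a rough sketch of what keyword tracking amounts to, the snippet below applies a similar filter locally to tweets that have already been captured and saved as one JSON object per line. It does not call the Twitter API. The `text` field name, the one-object-per-line layout, and the helper names `matches_keywords` and `filter_tweets` are assumptions made for illustration, and the plain case-insensitive substring match only approximates how statuses/track selects tweets.

```python
import json


def matches_keywords(text, keywords):
    """Return True if any keyword occurs in the text (case-insensitive)."""
    lowered = text.lower()
    return any(keyword.lower() in lowered for keyword in keywords)


def filter_tweets(lines, keywords):
    """Yield parsed tweets whose 'text' field mentions any keyword.

    `lines` is any iterable of strings, e.g. an open file of captured tweets.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the capture
        tweet = json.loads(line)
        if matches_keywords(tweet.get("text", ""), keywords):
            yield tweet


# Hypothetical captured tweets, standing in for a saved stream.
sample_lines = [
    '{"text": "Watching Seinfeld reruns again"}',
    '',
    '{"text": "Just brewed my first #homebrew IPA"}',
    '{"text": "Nothing to see here"}',
]

matched = list(filter_tweets(sample_lines, ["seinfeld", "#homebrew"]))
for tweet in matched:
    print(tweet["text"])
```

Saving the raw stream first and filtering afterwards means the same capture can be reused for several different analyses.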

From there you can perform nearly an unlimited number of analyses. For inspiration, here is a small
sample of the many experiments I've run:
An analysis of tens of thousands of Canabalt scores posted to Twitter, including comparisons of method
of death (wall, fell, squashed) vs. device type (iPod touch, iPhone, iPad)
At what times people most frequently tweet about Seinfeld
What percentage of tweets contain URLs
How people spell "goal" during the World Cup (e.g. goooooooal vs. goooooalllllll vs.
gooooooaaaaallll). For example, I found 1158 mentions of GOOOOOOOOL and 2981 mentions of
ggol! I also found a lot of instances where people used their entire 140 characters to celebrate goals, most often
a g followed by 138 Os and then an L. Sometimes 137 Os and an AL.
A market basket analysis of hashtags used in conjunction with other hashtags
Using classification and NLP to tell if tweets mentioning #homebrew are talking about beer-brewing
or homebrew software
Learning about graph theory and community detection using friends/followers lists.
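
A few of the analyses above can be sketched with nothing but the standard library. The version below is a minimal illustration, not the code used for the experiments described: the regular expressions, the function names, and the sample tweets are all assumptions. Counting hashtag pairs is only the first step of a market basket analysis (the co-occurrence counts that support is computed from).

```python
import re
from collections import Counter
from itertools import combinations

URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#(\w+)")
# One or more g, one or more o, optional a's, one or more l, as a whole word.
GOAL_RE = re.compile(r"\bg+o+a*l+\b", re.IGNORECASE)


def url_percentage(texts):
    """Percentage of tweets containing at least one URL."""
    if not texts:
        return 0.0
    with_url = sum(1 for text in texts if URL_RE.search(text))
    return 100.0 * with_url / len(texts)


def goal_spellings(texts):
    """Count each distinct (lowercased) spelling of 'goal'."""
    counts = Counter()
    for text in texts:
        for spelling in GOAL_RE.findall(text):
            counts[spelling.lower()] += 1
    return counts


def hashtag_pairs(texts):
    """Count how often each pair of hashtags appears in the same tweet."""
    counts = Counter()
    for text in texts:
        tags = sorted({tag.lower() for tag in HASHTAG_RE.findall(text)})
        counts.update(combinations(tags, 2))
    return counts


# Hypothetical sample tweets for illustration only.
texts = [
    "GOOOL #worldcup #bra",
    "ggol! http://example.com/x",
    "what a goooal #worldcup #bra #fifa",
    "playing golf today",
]

print(url_percentage(texts))
print(goal_spellings(texts).most_common())
print(hashtag_pairs(texts).most_common(3))
```

The word boundaries in `GOAL_RE` keep words such as "golf" out of the counts, at the cost of missing spellings glued to other text.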

Other people, far smarter than myself, have written their own programs to detect what's currently
trending.

With Twitter data, the possibilities are truly endless. All you need to get started is some curiosity. The rest
will fall into place. Although these seem like 'toy projects', they've enabled me to learn a great deal in the
process. I've also been able to leverage what I've learned into my own
