The document outlines the core and elective classes for a data science program, including:
- Introductory classes in calculus, linear algebra, computer science, and probability/inference
- Core data science, machine learning, and linear modeling classes
- Computer science electives like algorithms, data structures, software engineering, and visualization
- Statistics electives like linear models, time series analysis, and experimental design
Introductory classes
- One year of multivariable calculus and linear algebra
- One year of intro CS
- One year of intro probability and inference

Core classes
- Data science
- Machine learning
- Linear modeling
- Predictive modeling

CS electives
- Theory of computation / Analysis of algorithms
- Data structures and algorithms
- Software engineering
- Visualization
- Parallel programming
- Network analysis
- More machine learning

Stats electives
- More linear models
- Time series analysis
- Statistical software
- Experimental design
- Survey analysis
Ben Hamner
361 votes by Jack Rae, Yair Livne, Michael Hochster, Don van der Drift, Andrei Kucharavy, and 356 more.

Here are some that I've found useful:
- http://blog.echen.me/ - Edwin regularly gives insanely clear and practical examples of data analyses, complete with code samples and visualizations, as well as the occasional very insightful and down-to-earth explanation of sophisticated machine learning algorithms
- http://hunch.net - John Langford is one of the foremost applied machine learning researchers and the author of Vowpal Wabbit, a large-scale ML tool used widely in the tech industry. His blog runs at the intersection of theory and practice; see "Clever Methods of Overfitting" on hunch.net for an example of the best his blog has to offer.
- Simply Statistics
- FastML (fastml.com)
- Statistical Modeling, Causal Inference, and Social Science
- Walking Randomly
- Normal Deviate
- no free hunch (disclaimer - I work at Kaggle) - Regular "how I did it" posts from machine learning competition winners (warning - these showcase the state-of-the-art methods for given datasets, which don't necessarily make the best production models), as well as more general data science tips from a practitioner's perspective
William Chen, Quora Data Science Intern
102 votes by Daniel Layon, Robert Eckhardt, Edwin Khoo, and 99 more.

Check out Harvard's data science class at CS109.org:
- The course is taught by Joe Blitzstein and Hanspeter Pfister.
- Lecture and lab videos are freely available through Distance Education at the Harvard University Extension School.
- All problem sets, problem set solutions, lab IPython notebooks, and course datasets are available on the CS109 GitHub.

Also check out:
- Data Science: What are some good free resources to learn data science?
- Where can I learn pandas or numpy for data analysis?
- What are some good resources for learning about statistical analysis?
- Data Science: How do I become a data scientist?
- What are some good "toy problems" in data science?
- What are some good resources for learning about machine learning?

Also follow my blog, Storytelling with Statistics, for regular posts about data science!
Neil Kodner, Data Engineer at Facebook
210 votes by Jay Wacker, Joseph Misiti, Jonathan E. Chen, and 207 more.

The Twitter Streaming API will enable you to capture a lot of data in a relatively short period of time. Use the statuses/sample method to capture the (limited) public feed. You can also use the statuses/track method, which retrieves tweets that mention a given keyword or list of keywords.
From there you can perform a nearly unlimited number of analyses. For inspiration, here is a small sample of the many experiments I've run:
- An analysis of tens of thousands of Canabalt scores posted to Twitter, including comparisons of method of death (wall, fell, squashed) vs. device type (iPod touch, iPhone, iPad)
- At what times people most frequently tweet about Seinfeld
- What percentage of tweets contain URLs
- How people spell "goal" during the World Cup (e.g. goooooooal vs goooooalllllll vs gooooooaaaaallll). For example, I found 1158 mentions of GOOOOOOOOL and 2981 mentions of ggol! I also found a lot of instances where people used their entire 140 characters to celebrate goals, most often a G followed by 138 Os and then an L, and sometimes 137 Os and an AL.
- A market basket analysis of hashtags used in conjunction with other hashtags
- Using classification and NLP to tell whether tweets mentioning #homebrew are talking about beer brewing or homebrew software
- Learning about graph theory and community detection using friends/followers lists
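A few of these experiments need very little code once the tweet texts are in hand. The sketch below is one possible way to approach three of them: the share of tweets containing URLs, the spellings of "goal", and hashtag pairs. It is not the code used for the analyses above. The sample tweets, the regular expressions, and the function names are all made up for illustration, and the hashtag step only counts co-occurring pairs, which is a simple first step toward a market basket analysis rather than a full one.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical sample data; real input would be tweet texts captured from the stream.
tweets = [
    "GOOOOL!!! what a strike",
    "gooooal http://example.com/replay",
    "Watching the match #worldcup #football",
    "New batch is fermenting #homebrew #beer",
    "goooooal #worldcup #football",
]

# Illustrative patterns (assumptions, not an exhaustive definition of a URL or a "goal").
URL_RE = re.compile(r"https?://\S+")
# One or more g, one or more o, optional run of a, one or more l, as a whole word.
GOAL_RE = re.compile(r"\bg+o+a*l+\b", re.IGNORECASE)
HASHTAG_RE = re.compile(r"#\w+")


def url_share(texts):
    """Return the fraction of tweets that contain at least one URL."""
    if not texts:
        return 0.0
    return sum(1 for t in texts if URL_RE.search(t)) / len(texts)


def goal_spellings(texts):
    """Count each distinct (lowercased) spelling of 'goal' across all tweets."""
    counts = Counter()
    for t in texts:
        for match in GOAL_RE.findall(t):
            counts[match.lower()] += 1
    return counts


def hashtag_pairs(texts):
    """Count how often each unordered pair of hashtags appears in the same tweet."""
    counts = Counter()
    for t in texts:
        tags = sorted({h.lower() for h in HASHTAG_RE.findall(t)})
        counts.update(combinations(tags, 2))
    return counts


if __name__ == "__main__":
    print(url_share(tweets))
    print(goal_spellings(tweets).most_common())
    print(hashtag_pairs(tweets).most_common())
```

Lowercasing before counting treats GOOOOL and gooool as the same spelling; drop the `.lower()` calls if case matters for the analysis.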
Other people, far smarter than myself, have written their own programs to detect what's currently trending.
With Twitter data, the possibilities are truly endless. All you need to get started is some curiosity. The rest will fall into place. Although these seem like "toy projects", they've enabled me to learn a great deal in the process. I've also been able to leverage what I've learned into my own