
Big Data

Luis Borbon
19 February 2016
Agenda
1. Data Facts
2. What is Big Data?
3. Getting Value / Applications of Big Data
4. Past vs Future / DM vs ML
5. Big Data Landscapes
6. Data Analysis Programming Languages
7. Jupyter Demo
Conclusion
Questions
1. Data Facts
Over 90% of all the data in the world was created in the past 2 years.

The total amount of data being captured doubles every 1.2 years.

570 new websites spring into existence every minute of the day.
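
To get a feel for what "doubles every 1.2 years" means, the arithmetic can be sketched in a few lines of Python. This is only an illustration: it assumes steady exponential growth, and the function names `growth_factor` and `annual_growth_rate` are hypothetical helpers, not part of the original talk.

```python
def growth_factor(years: float, doubling_time: float = 1.2) -> float:
    """Factor by which a quantity grows over `years`,
    assuming it doubles every `doubling_time` years."""
    return 2.0 ** (years / doubling_time)


def annual_growth_rate(doubling_time: float = 1.2) -> float:
    """Year-over-year growth rate implied by a given doubling time."""
    return growth_factor(1.0, doubling_time) - 1.0


if __name__ == "__main__":
    # Growth after one doubling period and after two doubling periods.
    print(f"after 1.2 years: x{growth_factor(1.2):.2f}")
    print(f"after 2.4 years: x{growth_factor(2.4):.2f}")
    # Implied yearly growth under the steady-growth assumption.
    print(f"implied annual growth: {annual_growth_rate():.1%}")
```

Under that assumption, a 1.2-year doubling time corresponds to growth of somewhat under 80% per year.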
1. Data Facts
IoT: the number of devices connected to the Internet is expected to grow from 13 billion to 50 billion by 2020.

Retailers could increase their profit margins by 60%.

The big data industry is expected to grow from US$10.2 bn to US$54.3 bn by 2017.
2. What is Big Data?
"Big Data": everything we do leaves a digital trace, which can be used and analysed.

Big Data therefore refers to our ability to make use of the ever-increasing volumes of data.
3.1 Getting Value
3.2 Applications of Big Data
● Better understand and target customers.
● Understand and optimise business processes.
● Improve health.
● Improve security and law enforcement.
● Improve sports performance.
4.1 Past vs. Future
4.2 Data Mining vs. Machine Learning
Data Mining
● Computer science subfield
● Big data sets
● Usually involves human interaction
● Pattern recognition
● Methods from AI, ML, statistics, databases
● Techniques
○ Cluster analysis
○ Classification
○ Regression trees
○ Neural networks

Machine Learning
● Computer science subfield
● Within artificial intelligence
● Learns without being explicitly programmed
● Predictions on data
● Problem types
○ Supervised learning
○ Unsupervised learning
○ Reinforcement learning
● Applications
○ Computer vision, OCR
○ Natural language processing
○ Information retrieval, search engines
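
The difference between cluster analysis (unsupervised) and classification (supervised) can be illustrated with a minimal sketch using only the Python standard library. Everything here is an assumption made for illustration: the purchase amounts are invented, the data is one-dimensional, and a nearest-centroid rule stands in for the classification techniques listed above. It is not the demo from the talk.

```python
from statistics import mean


def kmeans_1d(values, k=2, iterations=20):
    """Unsupervised: group unlabeled numbers into k clusters (Lloyd's algorithm)."""
    ordered = sorted(values)
    # Simple deterministic start: spread initial centroids across the sorted data.
    step = max(k - 1, 1)
    centroids = [ordered[i * (len(ordered) - 1) // step] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Recompute each centroid; keep the old one if a cluster is empty.
        centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids, clusters


def train_nearest_centroid(samples):
    """Supervised: learn one centroid per label from (value, label) pairs."""
    by_label = {}
    for value, label in samples:
        by_label.setdefault(label, []).append(value)
    return {label: mean(vals) for label, vals in by_label.items()}


def predict(model, value):
    """Assign the label whose learned centroid is closest to the value."""
    return min(model, key=lambda label: abs(value - model[label]))


# Hypothetical purchase amounts, invented for this example.
amounts = [5, 6, 7, 48, 50, 52]

# Cluster analysis: no labels are given, the groups are discovered.
centroids, clusters = kmeans_1d(amounts, k=2)

# Classification: labels are given up front and used to predict new cases.
labeled = [(5, "small"), (6, "small"), (7, "small"),
           (48, "large"), (50, "large"), (52, "large")]
model = train_nearest_centroid(labeled)

if __name__ == "__main__":
    print("clusters found:", clusters)
    print("prediction for 10:", predict(model, 10))
    print("prediction for 40:", predict(model, 40))
```

In the clustering half the algorithm is never told what the groups mean. In the classification half the labels come from a human, which is what lets the model name the group for a new value.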
5. Big Data Landscapes / Ecosystems
6. Programming Languages
7. Jupyter Demo
Questions
References
Big Data: Using SMART Big Data, Analytics and Metrics To Make Better Decisions and Improve Performance by Bernard Marr

Jupyter notebooks
https://try.jupyter.org
https://github.com/donnemartin/data-science-ipython-notebooks#spark

Mining the Social Web
https://github.com/ptwobrussell/Mining-the-Social-Web-2nd-Edition

Learn Data Science
http://learnds.com
https://github.com/donnemartin/data-science-ipython-notebooks

Big Data Analytics
https://plot.ly/python/big-data-analytics-with-pandas-and-sqlite

Spark
http://lintool.github.io/SparkTutorial/slides/day1_intro.pdf
http://stanford.edu/~rezab/sparkclass/slides/itas_workshop.pdf
