
Data Science Course Content Yashoda Technologies

Data Science Introduction

 Introduction to Data Science

 Data science and its importance?
 Case Studies and Real-Time Examples
 How Data Science can help us in each domain
 Data Science Project Cycle Overview
 How Data science is different from Business Intelligence & Reporting?
 Data science Career opportunities and Roles
 What are the basic skills required to learn Data science?

Introduction to Big Data Analytics

 Big data and analytics?
 Why Big Data?
 What is Big Data?
 Big Data Scalability
 Big Data Architectures
 Handling Big Data
 Introduction to Hadoop
 Hadoop Distributed File System (HDFS)
 Map and Reduce blocks
 What is Hive?
 What is Pig?

Python For Data Science

 Introduction
 Why Python for Data Science?
 Working Environment Setup
 Python Distributions for Data Science
 Python 2.x vs. Python 3.x
 Installing Anaconda Distribution
 Jupyter Notebook
o Basics
o Magic Functions
 Extracting Data
o From Databases
o Through APIs
o Using Web Scraping
 Reproducible Script for Getting Data
 Public Datasets
 Exploring and Processing Data
 Introduction to NumPy and Pandas
 Investigating Basic Structure
 Selection, Indexing, and Filtering
 Basic Statistics with Python
 Univariate Distributions and Distribution Plots
 Grouping and Aggregation
 Crosstab
 Pivot Table
 Data Munging
 Missing Values
 Missing Value Imputation Techniques
 Treating Missing Values Using Pandas
 Detecting and Treating Outliers Using Pandas and NumPy
 Feature Creation Using Pandas and NumPy
 Categorical Feature Encoding: Binary Encoding Using Pandas
 Reproducible Script for Data Processing Using Pandas and NumPy

Introduction to R-Programming

 A Primer on R-Programming
 History of R
 R Overview
 Why R?
 Why Learn R Programming?
 Install R on Windows (or Install R on Linux or Install R on Mac)
 Hello World in R
 Editors and IDEs for R
 Install RStudio on Windows (or on Linux or on Mac)
 RStudio Desktop Overview
 Built-In Help
a. Using Help Commands
b. Using Demo Commands
c. Using Vignettes
 Web Search
 Community Support
a. Mailing List
b. Forums
c. Blogs
 R-Variable
a. Naming Convention
b. Naming Guide
c. Assign Variable
 Environments and Variables
 Operators
a. Arithmetic Operators
b. Special Numbers: Inf, NaN, NA
c. Logical Operators
 Vectorized Operations
a. Types of Vectorized Operations
 Data Structures in R
a. Atomic Vectors & Common Operations on Atomic Vectors
b. Factors & Operations
c. Lists & Common Operations on Lists
d. Data Frames & Common Operations on Data Frames
e. Matrices & Common Operations on Matrices
f. Arrays & examples
 Functions
a. Overview
b. Components
c. Naming Guidelines
d. Argument Matching
e. Arguments with Default Values
f. Additional Arguments Using Ellipsis

g. Lazy Evaluation
h. Multiple Return Values
i. Functions as Objects
j. Anonymous Function
 R - Flow Control
a. If
b. If-Else
c. Multiple If-Else
d. Switch
e. Vectorized If
f. Repeat
g. Repeat With Break
h. Repeat With Next
i. While
j. For
k. Apply
l. Functions in Apply Family
 R - Packages
a. About R Package
b. Load R Package
c. Install R Package
d. Manage R Package
 R - Import Data
a. Working Directory
b. Import CSV Files
c. Import Table
d. Import from URL
e. Import XML Files
f. Import Excel Files
g. Import Other File Types
h. Import Built-In Datasets
i. Import from Database
j. Import Database Using RODBC Package
 Miscellaneous
a. Creating new variables or Updating Existing Variables
b. String Manipulations
c. Subsetting data from matrices and data frames
d. Casting and melting data to long and wide format
e. Merging data frames

Statistics Basics
 Basics of Statistics
 Definitions of Basic Statistical Terms
o Three Ms (Mean, Median, and Mode)
o Variance
o Standard Deviation
 Significant Difference
o Significance & P-value
 Correlation
o Positive
o Negative
o No Correlation
 Spurious correlation
 Correlation vs causation
 Sampling
 Business Statistics
 Data types
 Variables
o Continuous Variables
o Ordinal Variables
o Categorical Variables
 Time Series
 Miscellaneous
 Descriptive Statistics
 Sampling
o Need of Sampling
o Types of Sampling
 Simple random sampling
 Systematic sampling
 Stratified sampling
 Data distributions
 Normal Distribution and its characteristics
 Binomial Distribution
 Inferential Statistics
 Hypothesis Testing
 Type I error
 Type II error
 Null and alternative hypotheses
 Rejection or acceptance criterion

Exploratory data analysis and visualization

 Working with data
 Getting data into R
o Reading from files, Connecting to DB
 Data Munging
 Cleaning and preparing the data
o Converting data types (Character to Numeric etc.)
 Handling Missing values
o Imputation or replacing with placeholder values
 Cleaning Data with tidyr
o What is tidy Data?
o Wide to Long Conversion
o Long to Wide Conversion
o Splitting Cells
o Joins in dplyr
 Data Filtering and Querying with dplyr and data.table
o Queries at Row and Column Level
o Combined Queries
o Converting to data.table
o Filtering Big Data
 Data Visualizations with R
 Visualization in R using ggplot2 (plots and charts)
o Histograms
o Bar charts
o Boxplot
o Scatterplots
 Adding more dimensions to the plots
o Geom(), Dodge, etc.
 Visualization using Tableau

Introduction to Machine learning

 Machine Learning Basics
 Spam Classification
 Performance Metrics
o Accuracy & F1 Score
o Precision and Recall
 Types of Machine Learning
o Supervised
o Unsupervised
 Machine Learning Workflow
o Data Preparation
o Algorithm Selection
o Training Process
o Testing Model's Accuracy
o Improving Model Performance

Statistical Modelling

 Linear Regression
 Modeling Relationships between Variables Using Regression
 Understanding Simple Regression Models
 Solving the Regression Problem
 Residuals and the Regression Assumptions
 R-squared as Explained Variance
 Prediction Using Simple Regression
 Sum of Least Squares
 Multiple Regression in R
 Disadvantages of Linear Models
 Logistic Regression
 Importance of Logistic Regression
 Modeling Relationships between Variables Using Regression
 Applications of Logistic Regression
i. Analysis
ii. Allocation
iii. Prediction
iv. Classification
 Understanding the S-curve
 Maximum likelihood estimation (MLE)
 Confusion Matrix
 ROC Curve
 Logistic Regression and Linear Regression – Similarities and Differences
 Advantages and disadvantages of logistic regression models
 Underfitting vs. Overfitting
 Cross validation
 K-Fold Cross validation
 Decision Trees
 Classification and Regression Trees (CART)
 Process of Tree Building
 Entropy and Gini Index
 Problem of Overfitting
 Pruning a tree back
 Trees for Prediction (Linear) – example
 Trees for Classification models – example
 Advantages of tree-based models
 KNN – K Nearest Neighbors
 Advantages and disadvantages of KNN
 Resampling and Ensemble Methods
 Bagging
 Random Forests
 Boosting – Gradient Boosting machines
 Advanced methods
 Support Vector Machines (SVM)
 Probabilistic methods
 Naïve Bayes
 Unsupervised Learning
 Cluster Analysis
o Hierarchical clustering
o K-Means Clustering
o Distance measures
o Applications of cluster analysis
 Principal Component Analysis (PCA)
 Advantages of Principal Components
 Applications of PCA
 Time series analysis – Forecasting
 Simple Moving Averages
 Exponential Smoothing
 Time series decomposition
 ARIMA
 Association Rules (Market Basket Analysis)
 Apriori
 Recommender Systems
 Collaborative filtering
o User-based filtering
o Item-based filtering
 Text Analytics
 Introduction to Natural Language Processing (NLP)
 Finding frequently occurring words in a document corpus
 WordCloud
 Term Document Matrix
 Sentiment Analysis
 Text classification models (Spam Detection)

Introduction to Deep learning

 Overview of Deep Learning
 Relationships among
o Artificial Intelligence (AI)
o Machine Learning (ML)
o Deep Learning (DL)
 Artificial Neurons
 Neural networks
 Deep Neural networks
 Deep learning Techniques
 Convolutional Neural Networks
 Recurrent Neural Networks
 Fully Connected Neural Networks
 Generative Adversarial Networks
 Deep Reinforcement Learning
 Deep Learning Applications
 Tables
 Text files
 Audio files
 Video files
 Impact of Deep Learning
