Data Science Course Content Yashoda Technologies
Python For Data Science
Introduction
Why Python for Data Science?
Working Environment Setup
Python Distributions for Data Science
Python 2.x vs. Python 3.x
Installing Anaconda Distribution
Jupyter Notebook
o Basics
o Magic Functions
Extracting Data
o From Databases
o Through APIs
o Using Web Scraping
Reproducible Script for Getting Data
Public Datasets
Exploring and Processing Data
Introduction to NumPy and Pandas
Investigating Basic Structure
Selection, Indexing, and Filtering
Basic Statistics with Python
Univariate Distributions and Distribution Plots
Grouping and Aggregation
Crosstab
Pivot Table
Data Munging
Missing Values
Missing Value Imputation Techniques
Treating Missing Values Using Pandas
Detecting and Treating Outliers Using Pandas and NumPy
Feature Creation Using Pandas and NumPy
Categorical Feature Encoding: Binary Encoding Using Pandas
Reproducible Script for Data Processing Using Pandas and NumPy
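The missing-value and outlier topics above can be illustrated with a minimal sketch. It assumes median imputation and the 1.5 × IQR rule, which are common choices but not necessarily the ones taught in the course, and it uses only the Python standard library rather than Pandas and NumPy so that it runs without extra installs. The `ages` data and the function names are hypothetical.

```python
import statistics

def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical column with two missing entries and one extreme value.
ages = [22, 25, None, 24, 23, 95, None, 26]
filled = impute_median(ages)
outliers = iqr_outliers(filled)
print(filled)
print(outliers)
```

In Pandas the same steps are typically written with `Series.fillna` and `Series.quantile`.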
Introduction to R-Programming
A Primer on R Programming
History of R
R Overview
Why R?
Why Learn R Programming?
Install R on Windows (or Install R on Linux or Install R on Mac)
Hello World in R
Editors and IDEs for R
Install R-Studio on Windows (or on Linux or on Mac)
R-Studio Desktop Overview
Built-In Help
a. Using Help Commands
b. Using Demo Commands
c. Using Vignettes
Web Search
Community Support
a. Mailing List
b. Forums
c. Blogs
R-Variable
a. Naming Convention
b. Naming Guide
c. Assign Variable
Environments and Variables
Operators
a. Arithmetic Operators
b. Special Numbers: Inf, NaN, NA
c. Logical Operators
Vectorized Operations
a. Types of Vectorized Operations
Data Structures in R
a. Atomic Vectors & Common Operations on Atomic Vectors
b. Factors & Operations
c. Lists & Common Operations on Lists
d. Data Frames & Common Operations on Data Frames
e. Matrices & Common Operations on Matrices
f. Arrays & examples
Functions
a. Overview
b. Components
c. Naming Guidelines
d. Argument Matching
e. Arguments with Default Values
f. Additional Arguments Using Ellipsis
g. Lazy Evaluation
h. Multiple Return Values
i. Functions as Objects
j. Anonymous Function
R - Flow Control
a. If
b. If-Else
c. Multiple If-Else
d. Switch
e. Vectorized If
f. Repeat
g. Repeat With Break
h. Repeat With Next
i. While
j. For
k. Apply
l. Functions in Apply Family
R - Packages
a. About R Package
b. Load R Package
c. Install R Package
d. Manage R Package
R - Import Data
a. Working Directory
b. Import CSV Files
c. Import Table
d. Import from URL
e. Import XML Files
f. Import Excel Files
g. Import Other File Types
h. Import Built-In Datasets
i. Import from Database
j. Import Database Using RODBC Package
Miscellaneous
a. Creating new variables or Updating Existing Variables
b. String Manipulations
c. Subsetting data from matrices and data frames
d. Casting and melting data to long and wide format
e. Merging data frames
Statistics Basics
Basics of Statistics
Definitions of Basic Statistical Terms
o Three Ms (Mean, Median, and Mode)
o Variance
o Standard Deviation
Significant Difference
o Significance & P-value
Correlation
o Positive
o Negative
o No Correlation
Spurious correlation
Correlation vs causation
Sampling
Business Statistics
Data types
Variables
o Continuous Variables
o Ordinal Variables
o Categorical Variables
Time Series
Miscellaneous
Descriptive Statistics
Sampling
o Need of Sampling
o Types of Sampling
Simple random sampling
Systematic sampling
Stratified sampling
Data distributions
Normal Distribution and its characteristics
Binomial Distribution
Inferential Statistics
Hypothesis Testing
Type I error
Type II error
Null and alternate hypothesis
Rejection or acceptance criterion
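The basic statistical terms listed in this section (mean, median, mode, variance, standard deviation, and positive or negative correlation) can be computed as in the following sketch. It uses only the Python standard library. The `scores`, `xs`, and `ys` values and the `pearson` helper are hypothetical illustrations, and the population (not sample) variance is assumed.

```python
import math
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)
variance = statistics.pvariance(scores)   # population variance
std_dev = statistics.pstdev(scores)       # population standard deviation

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
positive = pearson(xs, ys)         # ys rises with xs
negative = pearson(xs, ys[::-1])   # ys falls as xs rises
print(mean, median, mode, variance, std_dev, positive, negative)
```

A coefficient near +1 or -1 indicates a strong linear relationship, while a value near 0 indicates no linear correlation. Neither case establishes causation.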
Exploratory data analysis and visualization
Statistical Modelling
Linear Regression
Modeling Relationships between Variables Using Regression
Understanding Simple Regression Models
Solving the Regression Problem
Residuals and the Regression Assumptions
R-squared as Explained Variance
Prediction Using Simple Regression
Least Squares (Minimizing the Sum of Squared Residuals)
Multiple Regression in R
Disadvantages of Linear Models
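The simple regression topics above (least squares fit, residuals, R-squared, and prediction) can be sketched as follows. This is a minimal illustration in plain Python, assuming a single predictor and the closed-form ordinary least squares solution. The data and the function names are hypothetical, and the course itself covers multiple regression in R.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def r_squared(xs, ys, slope, intercept):
    """Share of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical data lying close to the line y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
r2 = r_squared(xs, ys, slope, intercept)
prediction = intercept + slope * 6   # predicted y for a new x = 6
print(slope, intercept, r2, prediction)
```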
Logistic Regression
Importance of Logistic Regression
Modeling Relationships between Variables Using Regression
Applications of Logistic Regression
i. Analysis
ii. Allocation
iii. Prediction
iv. Classification
Understanding the S-curve
Maximum likelihood estimation (MLE)
Confusion Matrix
ROC Curve
Logistic Regression and Linear Regression – Similarities and Differences
Advantages and disadvantages of logistic regression models
Underfitting vs. Overfitting
Cross validation
K-Fold Cross validation
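K-fold cross validation splits the data into k folds and uses each fold once as the test set while training on the rest. The sketch below generates the index splits only. It assumes contiguous, unshuffled folds, and `k_fold_indices` is a hypothetical helper, not a library function.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) index lists for k contiguous folds."""
    # The first n_samples % k folds receive one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test_idx = list(range(start, stop))
        train_idx = [i for i in range(n_samples) if i < start or i >= stop]
        yield train_idx, test_idx
        start = stop

folds = list(k_fold_indices(10, 3))
for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
```

In practice the rows are usually shuffled before splitting, and the model's score is averaged over the k test folds.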
Decision Trees
Classification and Regression Trees (CART)
Process of Tree Building
Entropy and Gini Index
Problem of Overfitting
Pruning a tree back
Trees for Prediction (Linear) – example
Trees for Classification models – example
Advantages of Tree-Based Models
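Entropy and the Gini index, listed above, measure how mixed the class labels in a tree node are. The following sketch computes both for a list of labels. The function names and sample labels are hypothetical, and base-2 logarithms are assumed for entropy.

```python
import math
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1 - sum((count / n) ** 2 for count in Counter(labels).values())

def entropy(labels):
    """Shannon entropy in bits of the class distribution."""
    n = len(labels)
    return -sum((count / n) * math.log2(count / n)
                for count in Counter(labels).values())

mixed = ["spam", "spam", "ham", "ham"]   # evenly split node
pure = ["spam", "spam", "spam", "spam"]  # single-class node

print(gini(mixed), entropy(mixed))
print(gini(pure), entropy(pure))
```

Both measures are zero for a pure node and largest for an evenly mixed one, so a tree-building procedure prefers splits that reduce them the most.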
KNN – K Nearest Neighbors
Advantages and disadvantages of KNN
Resampling and Ensemble Methods
Bagging
Random Forests
Boosting – Gradient Boosting machines
Advanced methods
Support Vector Machines (SVM)
Probabilistic methods
Naïve Bayes
Unsupervised Learning
Cluster Analysis
o Hierarchical clustering
o K-Means Clustering
o Distance measures
o Applications of Cluster Analysis
Principal Component Analysis (PCA)
Advantages of Principal Components
Applications of PCA
Time series analysis – Forecasting
Simple Moving Averages
Exponential Smoothing
Time series decomposition
ARIMA
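Two of the forecasting techniques listed above, simple moving averages and exponential smoothing, can be sketched in a few lines of plain Python. The `sales` series, the window size, and the smoothing factor `alpha` are hypothetical, and the smoothing is initialized with the first observation, which is one common convention among several.

```python
def simple_moving_average(series, window):
    """Mean of each run of `window` consecutive observations."""
    return [sum(series[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(series))]

def exponential_smoothing(series, alpha):
    """Simple exponential smoothing, seeded with the first observation."""
    smoothed = [series[0]]
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed

sales = [10, 12, 14, 16, 18]
sma = simple_moving_average(sales, window=3)
ses = exponential_smoothing(sales, alpha=0.5)
print(sma)
print(ses)
```

A larger `alpha` weights recent observations more heavily, while a larger window makes the moving average smoother but slower to react.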
Association Rules (Market Basket Analysis)
Apriori
Recommender Systems
Collaborative filtering
o User based filtering
o Item based filtering
Text Analytics
Introduction to Natural Language Processing (NLP)
Finding Frequently occurring words in a document corpus
WordCloud
Term Document Matrix
Sentiment Analysis
Text classification models (Spam Detection)
Introduction to Deep learning