INTERNATIONAL SCHOOL OF ENGINEERING

http://www.insofe.edu.in

Big Data Analytics
and Optimization


Certificate Program in Engineering Excellence



LIST OF COURSES



Essential Business Skills for a Data Scientist
Planning and Thinking Skills for Architecting Data Science Solutions
Essential Engineering Skills in Big Data Analytics
Statistical Modeling for Predictive Analytics in Engineering and Business
Engineering Big Data with R and Hadoop Ecosystem
Text Mining, Social Network Analysis and Natural Language Processing
Methods and Algorithms in Machine Learning
Optimization and Decision Analysis
Communication, Ethical and IP challenges for Analytics Professionals

CSE 7110c
Essential Business Skills for a Data Scientist

This module is also offered independently to CXOs and senior managers across
the globe, and is highly appreciated as one of the most hands-on managerial
introductions to data science. You learn to become a consumer of analytics, a
role for which McKinsey has predicted unprecedented demand.

Why should we build models or use data to run a business: The edge of evidence
over intuition
What kinds of models data scientists build, and where they do not work
When you want a prediction
o How do you estimate how much to pay and how long to wait
o How do you precisely define for the teams what to deliver
o How do you evaluate how good their predictions are
When does big unstructured data become really important
When you want to build an analytics group
o What software or hardware should you invest in
o Several engagement models and the ideal teams for each
Business plan: Each team develops and presents a complete business plan for
setting up an analytics organization.
Case analysis: Participants are divided into teams and given several
high-level business problems. They must identify the prediction problems
with high ROI and provide concise requirements.




CSE 7111c
Planning and Thinking Skills for Architecting Data Science
Solutions

This module trains data scientists to design and architect practical, workable
solutions. They also learn the skills needed to coordinate between business
and technical teams.

Thinking tools
o Approximations and estimations
o Geometric visualization of data and models
o Probabilistic analysis of data and models
o Analyzing networks and graphs
o Analyzing transitions, Markov chains and unstructured data
o Estimating complexity of algorithms
Choosing the right models and architecting a solution
o Structure and anatomy of models
o Problematic data and choosing the best experimentation
Sources of errors in predictive models and techniques to minimize them
Interacting with technical and business teams
o Translating typical business problems into technical specifications
o Brainstorming and analyzing data and designing transformations
o Manual analysis of the models
Case study: Participants are given business problems. For each problem, they need to:
o Translate it into a specific technical solution
o Brainstorm data and design transformations
o Architect a complete solution plan





CSE 7212c
Essential Engineering Skills in Big Data Analytics

This module gives engineers hands-on training in Big Data and analytics tools such as R, Hadoop, Hive
and Pig. Students work on several real-world data sets.
Reading from Excel, CSV and other forms
Data exploration (histograms, bar chart, box plot, line graph, scatter plot)
Data storytelling - The science, ggplot, bubble charts with multiple dimensions, gauge charts,
treemap, heat map and motion charts
Data preprocessing of structured data - Handling missing values, Binning, Standardization,
Outlier/Noise, PCA, Type Conversion, etc.
Visualization
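
The preprocessing steps listed above (handling missing values, standardization, binning) can be sketched in a few lines. This is only an illustrative sketch: the module itself works in R, Python is used here purely for demonstration, and the function names and the sample ages are made up for the example. Mean imputation and equal-width bins are just one possible choice for each step.

```python
import statistics

def impute_missing(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = statistics.fmean(observed)
    return [mean if v is None else v for v in values]

def standardize(values):
    """Rescale to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

def equal_width_bins(values, n_bins):
    """Assign each value a bin index in 0..n_bins-1 using equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    # The maximum value would fall just past the last bin, so clamp it.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

ages = [22, None, 35, 41, None, 58]  # hypothetical sample data
filled = impute_missing(ages)        # missing ages become the mean, 39.0
scaled = standardize(filled)
bins = equal_width_bins(filled, 3)
print(filled, bins)
```

Whether mean imputation is appropriate depends on why values are missing; the sketch only shows the mechanics.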
CSE 7302c
Statistical Modeling for Predictive Analytics in Engineering and
Business

This module aims to teach how to think like a statistician. "Statistical thinking will one day be as
necessary for efficient citizenship as the ability to read and write," wrote H. G. Wells in the year 1895. That
day and age has arrived with Data Analytics going mainstream ("For Today's Graduate, Just One Word:
Statistics" - http://www.nytimes.com/2009/08/06/technology/06stats.html).
This course thoroughly trains candidates on the following and uses Excel and R to explain concepts:
Computing the properties of an attribute: Central tendency (Mean, Median, Mode) and dispersion
(Range, Variance, Standard Deviation); Expectations of a Variable; Moment Generating Functions
Describing an attribute: Probability distributions (Discrete and Continuous) - Bernoulli, Geometric,
Binomial and Poisson distributions
Describing the relationship between attributes: Covariance; Correlation; Chi-square
Describing a single variable continued: Exponential distribution; Special emphasis on Normal
distribution; Central Limit Theorem
Inferential statistics: How to learn about the population from a sample and vice versa; Sampling
distributions; Confidence Intervals, Hypothesis Testing
ANOVA
Regression (Linear, Multivariate Regression) in forecasting
Analyzing and interpreting regression results
Logistic Regression
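
As an illustration of the inferential statistics topic above (learning about a population from a sample), the following sketch computes a confidence interval for a mean. It rests on stated assumptions: it uses the normal approximation (for a sample this small a t-distribution would normally be preferred), the measurements are invented, and Python stands in for the Excel and R used in the course.

```python
import math
import statistics

def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for the population mean."""
    n = len(sample)
    mean = statistics.fmean(sample)
    # Standard error uses the sample standard deviation (n - 1 denominator).
    std_err = statistics.stdev(sample) / math.sqrt(n)
    # Two-sided critical value, roughly 1.96 for 95% confidence.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * std_err, mean + z * std_err

# Hypothetical measurements, for illustration only.
sample = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7]
low, high = mean_confidence_interval(sample)
print(f"95% CI for the mean: ({low:.3f}, {high:.3f})")
```

Raising the confidence level widens the interval, which is the trade-off that hypothesis testing builds on.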

CSE 7304c
Engineering Big Data with R and Hadoop Ecosystem

Companies collect and store large amounts of data during daily transactions. This data is both structured
and unstructured. The volume of data being collected has grown from MB to TB in the past few years
and continues to grow at an exponential pace. Very large size, lack of structure and a rapid pace of
growth are what characterize Big Data.
To analyze long-term trends and patterns in the data and provide actionable intelligence to managers, this
data needs to be consolidated and processed using specialized techniques; those techniques form the core
of the module.
From a tools perspective, this course introduces you to Hadoop. You will learn one of the most powerful
combinations for Big Data, namely R and Hadoop.
Introduction to Big Data
o The world uses more Big Apps than you realize - A taxonomy and demonstrations of apps
Data center as a computer
o From Cells and Grids to Master-Slave Clouds - Evolution of clusters
o Design Considerations: Cost, failure
o What's so special about Hadoop?
Storing big bytes
o GFS, HDFS, Next Generation HDFS
Rapidly ingesting & organizing unstructured data
o Chukwa, Flume, Avro
o NoSQL: Big Table, HBase, Document stores, Graph stores, Key-Value stores
Your key tool: Split and Merge
o Sequential and Concurrent algorithm design, metrics
o Two S&M Paradigms - Map Reduce versus BSP
o Yarn, MR2, ZooKeeper
Querying big data
o SQL, Sqoop, Hive, Hive variants like Impala, Spark and Storm
Processing big data
o R-Hadoop, Hadoop Streaming with Python/C++
o PIG programming, Oozie
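
The "Split and Merge" idea behind Map Reduce, listed above, can be illustrated with the classic word count. This is a single-process sketch in plain Python, not Hadoop code: the function names are hypothetical, and a real cluster would run the map and reduce phases in parallel across machines.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: merge the values for one key into a single count."""
    return key, sum(values)

def word_count(lines):
    # Split: each line is mapped independently of the others.
    pairs = [pair for line in lines for pair in map_phase(line)]
    # Merge: values are grouped by key and reduced per key.
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

lines = ["big data needs big tools", "data beats intuition"]  # made-up input
counts = word_count(lines)
print(counts)
```

Because each line is mapped independently and each key is reduced independently, both phases can be distributed, which is the property frameworks such as Hadoop exploit.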

CSE 7206c
Text Mining, Social Network Analysis and Natural Language
Processing

This module is tightly integrated with the CSE 7304c module, and the topics in the two modules are
interwoven.
Text mining: Unstructured data comprises more than 80% of the stored business information (primarily as
text). This helped text mining emerge as a leading-edge technology. This module describes practical
techniques for text mining, including pre-processing (tokenization, part-of-speech tagging), document
clustering and classification, information retrieval, search and sentiment extraction in a business context.
Predictive modeling with social network data: Social network mining is extremely useful in targeted
marketing, on-line advertising and fraud detection. The course teaches how incorporating social media
analysis can help improve the performance of predictive models.
Natural Language Processing:
By the end of the course, you will be able to answer questions like how to classify or tag a document into
a category, how to rank some people in a network as more likely customers than others, etc.
Taming big text
o Text ingestion using crawlers; Preprocessing - making text into data vectors
Handling big graphs
o Why graphs? How to represent, measure and query them? NoSQL graph stores
o Implementing Graph processing in Map Reduce, Hama & Giraph
The purpose of it all: Finding patterns in data
Finding patterns in text
o Mahout, text mining, text as a graph
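
The preprocessing step named above, "making text into data vectors", can be sketched as a simple bag-of-words model. The sketch assumes a very basic tokenizer (lowercase alphanumeric runs) and made-up documents; the function names are hypothetical, and production pipelines would typically add stop-word removal, stemming or TF-IDF weighting.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_vocabulary(documents):
    """Collect every distinct token across all documents, in sorted order."""
    return sorted({token for doc in documents for token in tokenize(doc)})

def to_vector(text, vocabulary):
    """Count how often each vocabulary term occurs in the text."""
    counts = Counter(tokenize(text))
    return [counts[term] for term in vocabulary]

docs = ["Great service, great price", "Poor service"]  # hypothetical reviews
vocab = build_vocabulary(docs)
vectors = [to_vector(doc, vocab) for doc in docs]
print(vocab)
print(vectors)
```

Once documents are vectors of term counts, the clustering and classification methods from the other modules can be applied to them directly.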



CSE 7305c
Methods and Algorithms in Machine Learning

This module discusses the principles and ideas underlying the current practice of data mining and
introduces a powerful set of useful data analytics tools (such as K-Nearest Neighbors, Neural Networks,
etc.). Real-world business problems are used for practice.
Rule based knowledge: Logic of rules, evaluating rules, Rule induction and association rules
Construction of Decision Trees through simplified examples; Choosing the "best" attribute at each
non-leaf node; Entropy; Information Gain
Generalizing Decision Trees; Information Content and Gain Ratio; Dealing with numerical variables;
Other measures of randomness
Pruning a Decision Tree; Cost as a consideration; Unwrapping Trees as rules
Specialized decision trees (oblique trees)
Ensemble and Hybrid models
AdaBoost, Random Forests and Gradient boosting machines
K-Nearest Neighbor method
Wilson editing and triangulations
K-nearest neighbors in collaborative filtering, digit recognition
Motivation for Neural Networks and their applications
Perceptron and Single Layer Neural Network, and hand calculations
Learning in a Neural Net: Backpropagation and conjugate gradient techniques
Application of Neural Nets in Face and Digit Recognition
Linear learning machines and Kernel methods in learning
VC (Vapnik-Chervonenkis) dimension; Shattering power of models
Algorithm of Support Vector Machines (SVM)
Connectivity models (hierarchical clustering)
Centroid models (k-means algorithm)
Distribution models (expectation maximization)
Trend analysis and Time Series
Cyclical and Seasonal analysis; Box-Jenkins method
Smoothing; Moving averages; Auto-correlation; ARIMA; Holt-Winters method
Bayesian analysis and Naive Bayes classifier
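
Among the topics above, choosing the "best" attribute at a decision tree node using Entropy and Information Gain lends itself to a short worked example. The sketch below uses a tiny invented data set and hypothetical function names; it shows the standard Shannon entropy calculation, not any particular library's implementation.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(rows, labels, attribute):
    """Reduction in entropy obtained by splitting on one attribute."""
    total = len(labels)
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attribute], []).append(label)
    # Weighted average entropy of the subsets produced by the split.
    remainder = sum(len(s) / total * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder

# Made-up data: "outlook" separates the classes perfectly, "windy" not at all.
rows = [
    {"outlook": "sunny", "windy": False},
    {"outlook": "sunny", "windy": True},
    {"outlook": "rainy", "windy": False},
    {"outlook": "rainy", "windy": True},
]
labels = ["no", "no", "yes", "yes"]
best = max(["outlook", "windy"],
           key=lambda a: information_gain(rows, labels, a))
print(best)
```

A tree builder would split on the attribute with the highest gain and then repeat the calculation within each resulting subset.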







CSE 7213c
Optimization and Decision Analysis

This module is designed to teach linear and non-linear optimization models, namely Genetic Algorithms,
Linear programming and Goal programming. The application areas originate from problems in finance and
operations.
Genetic Algorithms: The algorithm and the process
Representing data for a Genetic Algorithm
Why and how do Genetic Algorithms work?
Linear programming: Graphical analysis
Sensitivity and Duality analyses
Integer, binary programming; Applications, problem formulation and solving through R
Goal programming
Data envelopment analysis
Quadratic programming
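
The graphical analysis of linear programs mentioned above rests on the fact that, when a bounded problem has an optimum, it is attained at a corner of the feasible region. The sketch below enumerates those corners for a two-variable problem. It is an illustration under stated assumptions: the course solves such problems in R, the function name is hypothetical, the example is a standard textbook-style problem chosen for illustration, and the method assumes a bounded, non-empty feasible region.

```python
from itertools import combinations

def solve_lp_2d(objective, constraints, tol=1e-9):
    """Maximize objective . (x, y) subject to a*x + b*y <= c for each (a, b, c).

    Checks every intersection of two constraint boundaries and keeps the
    feasible one with the largest objective value.
    """
    best_point, best_value = None, float("-inf")
    for (a1, b1, c1), (a2, b2, c2) in combinations(constraints, 2):
        det = a1 * b2 - a2 * b1
        if abs(det) < tol:
            continue  # parallel boundaries never meet in a corner
        # Cramer's rule for the intersection of the two boundary lines.
        x = (c1 * b2 - c2 * b1) / det
        y = (a1 * c2 - a2 * c1) / det
        if all(a * x + b * y <= c + tol for a, b, c in constraints):
            value = objective[0] * x + objective[1] * y
            if value > best_value:
                best_point, best_value = (x, y), value
    return best_point, best_value

# Maximize 3x + 5y subject to x <= 4, 2y <= 12, 3x + 2y <= 18, x >= 0, y >= 0.
constraints = [(1, 0, 4), (0, 2, 12), (3, 2, 18), (-1, 0, 0), (0, -1, 0)]
point, value = solve_lp_2d((3, 5), constraints)
print(point, value)
```

Corner enumeration grows quickly with the number of constraints, which is why practical solvers use the simplex or interior-point methods instead.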







CSV 1103
Communication, Ethical and IP challenges for Analytics
Professionals

This module emphasizes the importance of communication for Analytics professionals, especially since
they are expected to deal with technical and non-technical users more closely than in any other discipline.
Students also learn to appreciate the importance of ethical, legal and IP issues, given that regulations are
still sketchy in this field where adoption is increasing rapidly. Students learn how to avoid ethical and
legal pitfalls and what issues to be aware of when dealing with data.

Why is Communication important?
How to communicate effectively: Telling stories
Communication issues from daily life with examples using audio, video, blogs, charts, email, etc.
Seeing the big picture; Paying attention to details; Seeing things from multiple perspectives
Challenges: Mix of stakeholders, Explicability of results, Visualization
Guiding Principles: Clarity, Transparency, Integrity, Humility
Framework for Effective Presentations; Examples of bad and good presentations
Writing effective technical reports
Difference between Legal and Ethical issues
Challenges in current laws, regulations and fair information practices: Data protection, Intellectual
property rights, Confidentiality, Contractual liability, Competition law, Licensing of Open Source
software and Open Data
How to handle legal, ethical and IP issues at an organization and an individual level
The Ethics Check questions
