BIRLA INSTITUTE OF TECHNOLOGY & SCIENCE, PILANI

WORK INTEGRATED LEARNING PROGRAMMES


Digital Learning
Part A: Course Design
Course Title

Data Mining

Course No(s)

IS ZC415

Credit Units

Credit Model
Content Authors

Surendra Singh Samanth

Course Objectives
No
CO1

To introduce basic concepts of data mining.

CO2

To familiarize students with practical technologies in data mining.

CO3

To provide students interesting problems in the field of data mining to solve.

Text Book(s)
T1
Tan P. N., Steinbach M. & Kumar V. Introduction to Data Mining. Pearson Education, 2006.
T2
Han J., Kamber M. & Pei J. Data Mining: Concepts and Techniques, Third Edition. Morgan
Kaufmann Publishers.
Reference Book(s) & other resources
R1
Kotu V. & Deshpande B. Predictive Analytics and Data Mining: Concepts and Practice with
RapidMiner. Morgan Kaufmann Publishers, 2015.

Content Structure
Modules
No. Title of the Module
M1 Introduction to Data Mining

M2 Data Preprocessing:
To understand the need for data preprocessing and various techniques used in the context of
Data Mining
M3 Data Exploration:
A preliminary exploration of the data to better understand its characteristics
M4 Classification and prediction:
To learn different techniques and algorithms for classification, a major predictive and
supervised Data Mining task
M5 Association Analysis:
To understand the descriptive relation between the entities by identifying associations among
them and to learn various algorithms to find them
M6 Clustering:
To learn different techniques and algorithms for clustering, a major descriptive and
unsupervised Data Mining task
M7 Anomaly Detection:
Detecting outliers and noise in data sets is an important Data Mining task. This module focuses
on techniques needed for anomaly detection
M8 Data Mining on unstructured (Big) data:
Graph Mining, Social Network Analysis, Multimedia Data Mining, Text Mining, Mining the
World Wide Web
M9 Data Mining Applications:
Recommendation Systems
Fraud Detection
Sentiment Analysis

Glossary of Terms:
1. Contact Hour (CH) stands for an hour-long live session with students, conducted either in a physical
classroom or enabled through technology. In this model of instruction, instructor-led sessions will be
for 20 CH.
a. Pre CH = Self Learning done prior to a given contact hour
b. During CH = Content to be discussed during the contact hour by the course instructor
c. Post CH = Self Learning done after the contact hour
2. RL stands for Recorded Lecture or Recorded Lesson. It is presented to the student through an online
portal. A given RL unfolds as a sequence of video segments interleaved with exercises.
3. SS stands for Self-Study, to be done as a study of relevant sections from textbooks and reference
books. It could also include study of external resources.
4. LE stands for Lab Exercises.
5. HW stands for Home Work, which will consist of discussed/new problems; it could be a selection of
problems from the text.


M1: Introduction to Data Mining
Type
Description/Plan/Reference
RL1.1

RL1.1.1 = Definition of Data Mining


RL1.1.2 = What type of data can be mined?

RL1.2

RL1.2.1 = What kind of patterns can be mined?


RL1.2.2 = What kind of applications are targeted?

RL1.3

DM Process (R1) & DM Challenges (T2)


RL1.3.1 = Process/Technologies used in DM.
RL1.3.2 = Challenges in DM.

CS1.1

CS1.1.1 = Review of Data Mining basics; examples of patterns that can be mined
CS1.1.2 = Examples of technologies used in DM; approaches to overcome challenges;
discussion of one example case study for data mining

LE1.1

Exploration of Weka, operations, features, arff files.

SS1.1

T1, Chapter 1; T2, Ch 1

HW1.1

Exercises at the end of T2, Ch 1

QZ1.1
M2: Data Preprocessing
Type
Description/Plan/Reference
RL2.1

RL2.1.1 = Why does data need preprocessing?


RL2.1.2 = Major tasks in data preprocessing

RL2.2

RL2.2.1 = Data Cleaning techniques


RL2.2.2 = Data discretization, transformation, integration, reduction

CS2.1

CS2.1.1 = Review of concepts of data preprocessing


CS2.1.2 = Examples of application of preprocessing techniques.

LE2.1

Experiments with Weka - filters, discretization

SS2.1
HW2.1
QZ2.1
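Illustration for RL2.2.2 (data discretization): a minimal sketch of equal-width binning, one common discretization scheme. The function name `equal_width_bins` and the sample ages are assumptions made for this example; they are not taken from the course material or from Weka.

```python
def equal_width_bins(values, k):
    """Map each numeric value to one of k equal-width intervals (indices 0..k-1)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so a single bin is enough.
        return [0 for _ in values]
    width = (hi - lo) / k
    # The maximum value would fall into bin k, so clamp it to the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]


# Hypothetical sample data: ages split into three intervals of width 14.
ages = [18, 22, 25, 31, 40, 47, 58, 60]
print(equal_width_bins(ages, 3))  # [0, 0, 0, 0, 1, 2, 2, 2]
```

Equal-width bins are simple but sensitive to skewed data; equal-frequency binning is a frequent alternative.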

M3: Data Exploration


Type
Description/Plan/Reference
RL3.1

RL3.1.1 = Various types of data to be mined


RL3.1.2 = Statistical descriptions of data

RL3.2

RL3.2.1 = Measuring data similarity & dissimilarity


RL3.2.2 = Data Visualization

CS3.1

CS3.1.1 = Review of concepts of data exploration


CS3.1.2 = Examples of similarities & dissimilarities.

LE3.1
SS3.1
HW3.1
QZ3.1
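Illustration for RL3.2.1 (measuring data similarity and dissimilarity): a minimal sketch of three widely used measures, using only the Python standard library. The function names are chosen for this example, and the sketch assumes non-zero vectors for cosine similarity and at least one non-empty set for the Jaccard coefficient.

```python
import math


def euclidean(x, y):
    """Euclidean distance (a dissimilarity) between two numeric vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def cosine_similarity(x, y):
    """Cosine of the angle between two non-zero numeric vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def jaccard(a, b):
    """Jaccard coefficient: size of the intersection over size of the union."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


print(euclidean((0, 0), (3, 4)))                   # 5.0
print(jaccard({"a", "b", "c"}, {"b", "c", "d"}))   # 0.5
```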
M4: Classification and Prediction
Type
Description/Plan/Reference
RL4.1

RL4.1.1 = Introduction to classification and prediction


RL4.1.2 = Decision trees for classification
RL4.1.3 = Rule based classification, Bayesian classification, Support vector machines

RL4.2

RL4.2.1 = Issues regarding classification and prediction


RL4.2.2 = Linear Regression, Nonlinear Regression

CS4.1

CS4.1.1 = Review of concepts of recorded lectures, Algorithm for Decision trees induction,
Classification by back propagation, Comparison of methods of classification
CS4.1.2 = Prediction: Other Regression-Based Methods.

LE4.1

Experiments with Weka - decision trees, rules, prediction

SS4.1
HW4.1
QZ4.1
M5: Association Analysis
Type
Description/Plan/Reference
RL5.1

RL5.1.1 = What is association rule mining?


RL5.1.2 = Frequent Itemsets, Closed Itemsets, and Association Rules

RL5.2

RL5.2.1 = What is Apriori Algorithm?


RL5.2.2 = Finding Frequent Itemsets Using Candidate Generation, Generating Association
Rules from Frequent Itemsets

CS5.1

CS5.1.1 = Review of concepts of recorded lectures, Improving the Efficiency of Apriori


CS5.1.2 = Mining Frequent Itemsets without Candidate Generation.

LE5.1

Experiments with Weka - mining association rules

SS5.1

HW5.1
QZ5.1
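Illustration for RL5.2 (Apriori, candidate generation): a minimal sketch of level-wise frequent itemset mining with the join and prune steps. It is a teaching sketch, not the implementation used by Weka; the function name `apriori`, the absolute support threshold, and the market-basket data are assumptions made for this example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for itemsets with count >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1 candidates: every individual item.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


# Hypothetical market-basket data.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
freq = apriori(baskets, min_support=3)
# Confidence of the rule {beer} -> {diapers}: support(both) / support(beer).
print(freq[frozenset({"diapers", "beer"})] / freq[frozenset({"beer"})])  # 1.0
```

With a minimum support count of 3, the sketch finds four frequent single items and four frequent pairs; the only surviving 3-item candidate, {bread, milk, diapers}, occurs in two baskets and is rejected.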
M6: Clustering
Type
Description/Plan/Reference
RL6.1

RL6.1.1 = What is cluster analysis? Types of data in Cluster analysis.


RL6.1.2 = Partitioning methods: k-means

RL6.2

RL6.2.1 = Hierarchical algorithms


RL6.2.2 = Introduction to density based approach

CS6.1

CS6.1.1 = Review of concepts of recorded lectures


CS6.1.2 = Density based algorithm: DBSCAN

LE6.1

Experiments with Weka - k-means

SS6.1
HW6.1
QZ6.1
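Illustration for RL6.1.2 (partitioning methods: k-means): a minimal sketch of the standard assignment/update iteration (Lloyd's algorithm) in plain Python. The function name `kmeans`, the sample points, and the choice of the first k points as initial centroids are assumptions made to keep the example deterministic; practical tools typically initialize randomly or with k-means++.

```python
import math


def kmeans(points, k, iterations=100):
    """Cluster equal-length numeric tuples into k groups; returns (centroids, clusters)."""
    # Assumption: the first k points act as the initial centroids.
    centroids = [tuple(p) for p in points[:k]]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # Converged: no centroid moved.
        centroids = new_centroids
    return centroids, clusters


# Hypothetical 2-D data with two visibly separate groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, k=2)
print(clusters)
```

The result depends on the initial centroids, which is why k-means is usually run several times with different starting points.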
M7: Anomaly Detection
Type
Description/Plan/Reference
RL7.1

RL7.1.1 = Preliminaries
RL7.1.2 = Statistical approach

RL7.2

RL7.2.1 = Proximity based outlier detection


RL7.2.2 = Density based outlier detection

CS7.1

CS7.1.1 = Review of concepts of recorded lectures


CS7.1.2 = Clustering based techniques

LE7.1
SS7.1
HW7.1
QZ7.1
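Illustration for RL7.1.2 (statistical approach to anomaly detection): a minimal sketch that flags values whose z-score exceeds a threshold, assuming the data are roughly normally distributed. The function name `zscore_outliers`, the threshold values, and the sample readings are assumptions made for this example.

```python
import statistics


def zscore_outliers(values, threshold=3.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        return []  # All values are identical, so nothing stands out.
    return [v for v in values if abs(v - mean) / stdev > threshold]


# Hypothetical sensor readings with one extreme value.
readings = [10, 12, 11, 13, 12, 11, 10, 12, 95]
print(zscore_outliers(readings, threshold=2.0))  # [95]
```

In a small sample a single extreme value inflates the standard deviation, so the default threshold of 3 would not flag 95 here; this masking effect is one reason proximity-based and density-based methods are also studied.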
M8: Data mining on unstructured (Big) data
Type
Description/Plan/Reference
RL8.1

RL8.1.1 = Graph Mining methods and applications: Graph Indexing, Similarity Search,
Classification, and Clustering
RL8.1.2 = Multimedia Data Mining: Classification and Prediction Analysis of Multimedia
Data, Mining Associations in Multimedia Data, Audio and Video Data Mining

RL8.2

RL8.2.1 = Text Mining - Text Data Analysis and Information Retrieval


RL8.2.2 = Dimensionality Reduction for Text, Text Mining Approaches

CS8.1

CS8.1.1 = Social Network Analysis


CS8.1.2 = Mining the World Wide Web

LE8.1
SS8.1
HW8.1
QZ8.1
M9: Data Mining Applications
Type
Description/Plan/Reference
RL9.1

RL9.1.1 = Recommendation systems


RL9.1.2 = Case study for Recommendation systems

RL9.2

RL9.2.1 = Fraud Detection


RL9.2.2 = Case study for Fraud Detection

CS9.1

CS9.1.1 = Sentiment Analysis


CS9.1.2 = Case study for Sentiment Analysis

LE9.1
SS9.1
HW9.1
QZ9.1

Part B: Contact Session Plan


Academic Term

First Semester 2016-2017
Course Title

Data Mining

Course No

IS ZC415

Content Developer

Surender Singh Samanth

Contact hour  Pre-contact hour prep             During contact hour
1             RL1.1, RL1.2                      CS1.1
2             RL1.3                             CS1.2
3             RL2.1                             CS2.1
4             RL2.2                             CS2.2
5             RL3.1                             CS3.1
6             RL3.2, RL3.3                      CS3.2
7             RL4.1, RL4.2, RL4.3               CS4.1
8             RL4.4                             CS4.2
9             RL5.1, RL5.2                      CS5.1
10            Review
11            Review
12            RL5.3, RL5.4                      CS5.2
13            RL6.1                             CS6.1
14            RL6.2, RL6.3, RL6.4               CS6.2
15            RL7.1                             CS7.1
16            RL7.2, RL7.3                      CS7.2
17            RL8.1, RL8.2, RL8.3               CS8.1, CS8.2
18            RL9.1, RL9.2, RL9.3               CS9.1
19            Python basics, scikitlearn        Class notes/case study
20            Earlier case study/python basics  Class notes/case study
21            Review
22            Review

Notes:

Post-contact hour
LE1.1, HW1.1, SS1.1
LE2.1, SS2.1, HW2.1
LE3.1, SS3.1, HW3.1
LE4.1, SS4.1, HW4.1

LE5.1, SS5.1, HW5.1


LE6.1, SS6.1, HW6.1
LE7.1, SS7.1, HW7.1
SS8.1, SS9.1, HW8.1

Evaluation Scheme:
Legend: EC = Evaluation Component; AN = After Noon Session; FN = Fore Noon Session
No    Name                  Type         Duration  Weight  Day, Date, Session, Time
EC-1  Quiz-I/ Assignment-I  Online       -         5%      September 1-10, 2016
      Quiz-II               Online       -         5%      October 1-10, 2016
      Lab                   Online       -         10%     To be announced
EC-2  Mid-Semester Test     Closed Book  2 hours   30%     24/09/2016 (AN) 2 PM to 4 PM
EC-3  Comprehensive Exam    Open Book    3 hours   50%     05/11/2016 (AN) 2 PM to 5 PM
Syllabus for Mid-Semester Test (Closed Book): Topics in Session Nos. 1 to 11
Syllabus for Comprehensive Exam (Open Book): All topics (Session Nos. 1 to 22)
Important links and information:
Elearn portal: https://elearn.bits-pilani.ac.in
Students are expected to visit the Elearn portal on a regular basis and stay up to date with the latest
announcements and deadlines.
Contact sessions: Students should attend the online lectures as per the schedule provided on the Elearn
portal.
Evaluation Guidelines:
1. EC-1 consists of either two Assignments or three Quizzes. Students will attempt them through the
course pages on the Elearn portal. Announcements will be made on the portal, in a timely manner.
2. For Closed Book tests: No books or reference material of any kind will be permitted.
3. For Open Book exams: Use of books and any printed / written reference material (filed or bound) is
permitted. However, loose sheets of paper will not be allowed. Use of calculators is permitted in all
exams. Laptops/Mobiles of any kind are not allowed. Exchange of any material is not allowed.
4. If a student is unable to appear for the Regular Test/Exam due to genuine exigencies, the student
should follow the procedure to apply for the Make-Up Test/Exam which will be made available on
the Elearn portal. The Make-Up Test/Exam will be conducted only at selected exam centres on the
dates to be announced later.
It shall be the responsibility of the individual student to be regular in maintaining the self study schedule as
given in the course handout, attend the online lectures, and take all the prescribed evaluation components
such as Assignment/Quiz, Mid-Semester Test and Comprehensive Exam according to the evaluation scheme
provided in the handout.