Data Mining
Course objectives
Grading scheme
Class Standing
10% Assignment
25% Cases
40% Quizzes
25% Long Quiz
Policies
Attendance will be checked.
No make-up quizzes
Make-up long exam only for excused absence.
Set schedule within a week after the exam date
References
Han, J. & Kamber, M. (2006). Data Mining: Concepts
and Techniques, 2nd Edition. Morgan Kaufmann
Publishers, Elsevier Inc., California.
Tan, P., Steinbach, M. & Kumar, V. (2006). Introduction to Data
Mining. Addison Wesley.
Software Links
Data Mining Software Links by Dr. Pang-Ning Tan :
www.cse.msu.edu/~cse980/software.html
RapidMiner : http://rapidi.com/content/view/26/84/lang,en/
Weka : http://www.cs.waikato.ac.nz/ml/weka/
Objectives
Define Data Mining and knowledge discovery in
databases.
Discuss some business applications of data mining
Identify the elements of the data mining process
Discuss the steps in CRISP-DM
Data Mining
Exploratory data analysis
Has its roots in classical statistics, artificial
intelligence and machine learning
Looks for actionable information, that is, information
that can be used in a concrete way to improve
profitability
Knowledge Discovery
A preconceived notion may not be present
Relationships can be identified by looking into the data
Telecommunications
Churn: customer turnover or switching carriers
Medicine
Cancer Cell Detection
Machine Vision
Pattern Recognition
CRISP-DM Process
Cross-Industry Standard Process for Data Mining
Phases
Business Understanding
Data Understanding
Data Preparation
Modeling
Evaluation
Deployment
Business Understanding
Knowing what the study is for
Identify business task
Data Understanding
Select the related data from many available
databases to correctly describe a given
business task
Data Preparation
Also known as data preprocessing
Clean selected data for better quality
Filter, aggregate and fill in missing values (imputation)
Filter: remove outliers and redundancies
Aggregate: data is reduced to obtain aggregated
information
Filling-in or Smoothing: missing values are found and
replaced with reasonable values
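The three cleaning operations above can be sketched in a few lines of Python. This is only an illustration: the sales records, the function names, the "more than 10 times the median" outlier rule and the use of the mean as the fill-in value are all assumptions made for the example, not part of the course material.

```python
from statistics import mean, median

# Hypothetical sales records; None marks a missing value.
records = [
    {"region": "north", "sales": 120.0},
    {"region": "north", "sales": None},
    {"region": "south", "sales": 95.0},
    {"region": "south", "sales": 95.0},    # redundant duplicate row
    {"region": "south", "sales": 105.0},
    {"region": "north", "sales": 9000.0},  # implausibly large value
    {"region": "north", "sales": 130.0},
]

def remove_duplicates(rows):
    """Filter: keep only the first copy of each (region, sales) pair."""
    seen, unique = set(), []
    for row in rows:
        key = (row["region"], row["sales"])
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

def remove_outliers(rows, max_ratio=10.0):
    """Filter: drop rows above max_ratio times the median (assumed rule)."""
    known = [r["sales"] for r in rows if r["sales"] is not None]
    limit = max_ratio * median(known)
    return [r for r in rows if r["sales"] is None or r["sales"] <= limit]

def impute_mean(rows):
    """Filling-in: replace missing values with the mean of the known ones."""
    fill = mean(r["sales"] for r in rows if r["sales"] is not None)
    return [dict(r, sales=fill if r["sales"] is None else r["sales"])
            for r in rows]

def aggregate_by_region(rows):
    """Aggregate: reduce the rows to one total per region."""
    totals = {}
    for r in rows:
        totals[r["region"]] = totals.get(r["region"], 0.0) + r["sales"]
    return totals

cleaned = impute_mean(remove_outliers(remove_duplicates(records)))
totals = aggregate_by_region(cleaned)
print(totals)
```

The order matters in this sketch: outliers are removed before imputation so that the implausible value does not distort the mean used to fill the gap.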
Data Preparation
Data transformation
Uses mathematical formulations to convert
different measurements into a unified numerical
scale
Numerical to numerical scales
Shrink or enlarge the data
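One common way to bring different measurements onto a unified numerical scale is min-max scaling, which shrinks or stretches each variable into a fixed range. The sketch below assumes a target range of 0 to 1; the function name and the sample values are hypothetical.

```python
def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so the smallest maps to new_min
    and the largest maps to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no spread to rescale.
        return [new_min for _ in values]
    span = hi - lo
    return [new_min + (v - lo) * (new_max - new_min) / span for v in values]

# Two hypothetical measurements on very different scales.
heights_cm = [150.0, 160.0, 170.0, 200.0]
incomes = [20000.0, 35000.0, 50000.0, 120000.0]

scaled_heights = min_max_scale(heights_cm)
scaled_incomes = min_max_scale(incomes)
print(scaled_heights)
print(scaled_incomes)
```

After scaling, both variables lie between 0 and 1, so neither dominates a distance calculation simply because of its units.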
Modeling
Data mining software is used to generate results for
various situations
Data is divided into:
Training set: used for the development of the model
Test set: used to test the model that is built
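The division into a training set and a test set can be sketched as a shuffled split. The 25% test fraction, the fixed seed and the function name are assumptions chosen for illustration; the slides do not prescribe a ratio.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows, then hold out a fraction for testing."""
    rng = random.Random(seed)   # fixed seed so the split is repeatable
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

rows = list(range(20))          # stand-in for 20 data records
train, test = train_test_split(rows)
print(len(train), len(test))
```

Shuffling before splitting avoids a test set that contains only the most recent or last-entered records.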
Modeling
Data Modeling Techniques
Association: the relationship of a particular item in a
data transaction to other items in the same transaction
is used to predict patterns
Classification: learning functions that map
each item of the selected data into one of a predefined
set of classes
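Both techniques can be illustrated on toy data. For association, the sketch uses support and confidence, two measures commonly used to score item relationships. For classification, it uses a nearest-neighbour rule as one simple stand-in for a learned mapping; the slides do not name a specific algorithm. All data, labels and function names here are hypothetical.

```python
# Association: how often do items appear together in a transaction?
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Of the transactions containing antecedent, the fraction that
    also contain consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

# Classification: map an item to one of a predefined set of classes.
labeled = [
    ((1.0, 1.0), "low risk"),
    ((1.5, 2.0), "low risk"),
    ((8.0, 9.0), "high risk"),
    ((9.0, 8.5), "high risk"),
]

def classify_nearest(point, labeled_points):
    """Return the class of the closest labeled example."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    nearest = min(labeled_points, key=lambda pair: sq_dist(point, pair[0]))
    return nearest[1]

print(support({"bread", "milk"}, transactions))
print(confidence({"bread"}, {"milk"}, transactions))
print(classify_nearest((2.0, 1.5), labeled))
```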
Modeling
Clustering: takes ungrouped data and uses automatic
techniques to put this data into groups
Prediction Analysis: discovers the relationship between
the dependent and independent variables
Sequential Pattern Analysis: seeks to find similar
patterns in data transactions over a business period
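Clustering and prediction analysis can each be sketched briefly. The example uses k-means on one-dimensional data for clustering and a least-squares straight line for prediction; both are illustrative choices, since the slides do not name particular algorithms, and the data, starting centers and function names are hypothetical.

```python
def k_means_1d(values, centers, iterations=10):
    """Clustering: repeatedly assign each value to its nearest center,
    then move each center to the mean of its group."""
    centers = list(centers)
    groups = [[] for _ in centers]
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for v in values:
            idx = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[idx].append(v)
        # An empty group keeps its previous center.
        centers = [sum(g) / len(g) if g else c
                   for g, c in zip(groups, centers)]
    return centers, groups

def fit_line(xs, ys):
    """Prediction: least-squares slope and intercept relating the
    dependent variable ys to the independent variable xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centers, groups = k_means_1d(values, centers=[0.0, 5.0])

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]   # generated from y = 2x + 1
slope, intercept = fit_line(xs, ys)

print(centers, groups)
print(slope, intercept)
```

Note the difference in inputs: clustering receives no labels at all, while prediction needs known values of the dependent variable to fit against.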
Evaluation
Data interpretation stage
Two things to consider:
How to recognize business value from knowledge
patterns discovered
How to visualize the results to properly interpret
patterns
Deployment
The results are reported to project sponsors
The result is applied to business task or data mining
objective
Data Cleaning
Data Integration
Data Selection
Data Transformation
Data Mining
Pattern Evaluation
Knowledge Presentation
Pattern Evaluation
Knowledge Base
Data Mining Engine
Database
Data Warehouse
WWW
Other Repositories
Relational Databases
Data Warehouses
Transactional Databases
Object-Relational Databases
Temporal, Sequence or Time-Series Database
Spatial Databases and Spatiotemporal Databases