Chapter 1
Jiawei Han, Micheline Kamber, and Jian Pei
University of Illinois at Urbana-Champaign &
Simon Fraser University
© 2013 Han, Kamber & Pei. All rights reserved.
Chapter 1. Introduction
Summary
1 byte = 8 bits
1 kilobyte (K/KB) = 2 ^ 10 bytes = 1,024 bytes
1 megabyte (M/MB) = 2 ^ 20 bytes = 1,048,576 bytes
1 gigabyte (G/GB) = 2 ^ 30 bytes = 1,073,741,824 bytes
1 terabyte (T/TB) = 2 ^ 40 bytes = 1,099,511,627,776 bytes
1 petabyte (P/PB) = 2 ^ 50 bytes = 1,125,899,906,842,624 bytes
1 exabyte (E/EB) = 2 ^ 60 bytes = 1,152,921,504,606,846,976 bytes
1 zettabyte (Z/ZB) = 2 ^ 70 bytes = 1,180,591,620,717,411,303,424 bytes (about 10 ^ 21 bytes)
1 yottabyte (Y/YB) = 2 ^ 80 bytes = 1,208,925,819,614,629,174,706,176 bytes (about 10 ^ 24 bytes)
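The unit list follows one rule under the power-of-two convention it uses for kilobyte through exabyte: each unit is 2 ^ 10 = 1,024 times the previous one. A minimal sketch of that arithmetic follows; the names `UNITS` and `unit_sizes` are illustrative and do not come from the slides.

```python
# Binary storage units: each step up multiplies the size by 2**10 = 1,024.
UNITS = ["kilobyte", "megabyte", "gigabyte", "terabyte",
         "petabyte", "exabyte", "zettabyte", "yottabyte"]


def unit_sizes():
    """Return {unit name: size in bytes} using power-of-two prefixes."""
    return {name: 2 ** (10 * (i + 1)) for i, name in enumerate(UNITS)}


if __name__ == "__main__":
    for name, size in unit_sizes().items():
        print(f"1 {name} = {size:,} bytes")
```

Python integers have arbitrary precision, so the larger units are computed exactly rather than rounded.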
Alternative names
[Figure: knowledge discovery process. Databases -> Data Cleaning and Data Integration -> Data Warehouse -> Selection -> Task-relevant Data]
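The stages named above (data cleaning, data integration, selection) can be illustrated with a toy pipeline. This is only a sketch under simple assumptions: the records, the field names, and the helper functions `clean`, `integrate`, and `select` are all hypothetical, and real systems perform each step with far more care.

```python
# Toy walk-through of three preprocessing steps: cleaning, integration, selection.
# All records and field names below are made up for illustration.

def clean(records):
    """Data cleaning: drop records that contain missing values."""
    return [r for r in records if all(v is not None for v in r.values())]


def integrate(*sources):
    """Data integration: combine several sources into one collection."""
    merged = []
    for source in sources:
        merged.extend(source)
    return merged


def select(records, fields):
    """Selection: keep only the fields relevant to the mining task."""
    return [{f: r[f] for f in fields} for r in records]


store_a = [{"customer": "c1", "item": "milk", "price": 2.5},
           {"customer": "c2", "item": None, "price": 1.0}]
store_b = [{"customer": "c3", "item": "bread", "price": 3.0}]

warehouse = integrate(clean(store_a), clean(store_b))
task_relevant = select(warehouse, ["customer", "item"])
```

Here `warehouse` stands in for the integrated store and `task_relevant` for the reduced data handed to a mining step.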
[Figure: layered pyramid, listed top to bottom, with the associated role in parentheses]
Decision Making (End User)
Data Presentation: Visualization Techniques (Business Analyst)
Data Mining: Information Discovery (Data Analyst)
Data Exploration: Statistical Summary, Querying, and Reporting
Data Preprocessing/Integration, Data Warehouses
Data Sources: Paper, Files, Web documents, Scientific experiments, Database Systems (DBA)
Data warehousing
A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management's decision-making process.
Data to be mined
Database data (relational), data warehouse, transactional data, time-series, sequence, etc.
Knowledge to be mined (or: Data mining functions)
Characterization, discrimination, association, classification,
clustering, trend/deviation, outlier analysis, etc.
Descriptive vs. predictive data mining
Multiple/integrated functions and mining at multiple levels
Techniques utilized
Data warehouse (OLAP), machine learning, statistics, pattern recognition, visualization, high-performance computing, etc.
Applications adapted
Retail, telecommunication, banking, fraud analysis, bio-data
mining, stock market analysis, text mining, Web mining, etc.
Data warehouse
Time-series data,
Two categories
Descriptive
Predictive
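To make the two categories concrete, the sketch below contrasts them on a tiny series: a descriptive task summarizes the data already observed, while a predictive task fits a model and applies it to an unseen case. The sales figures and the helper names `describe` and `fit_line` are hypothetical, and least-squares line fitting is only one of many possible predictive methods.

```python
from statistics import mean

# Hypothetical monthly sales figures, used only for illustration.
months = [1, 2, 3, 4, 5]
sales = [10.0, 12.0, 14.0, 16.0, 18.0]


def describe(values):
    """Descriptive: summarize properties of the data already observed."""
    return {"min": min(values), "max": max(values), "mean": mean(values)}


def fit_line(xs, ys):
    """Predictive: fit y = a*x + b by ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    a = numerator / denominator
    b = y_bar - a * x_bar
    return a, b


summary = describe(sales)      # characterizes the existing data
a, b = fit_line(months, sales)
forecast = a * 6 + b           # inference about an unseen month
```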
Clustering
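As a small illustration of clustering, the sketch below groups one-dimensional points without using class labels. It uses k-means (Lloyd's algorithm), which is one common clustering method and is swapped in here as an example rather than taken from the slides; the data, the starting centers, and the function name `kmeans_1d` are assumptions.

```python
def kmeans_1d(points, centers, iterations=10):
    """Run Lloyd's algorithm on 1-D points from the given starting centers."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters


# Hypothetical data with two visibly separate groups.
points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centers, clusters = kmeans_1d(points, centers=[0.0, 5.0])
```

The result depends on the starting centers; practical implementations usually try several initializations.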
Outlier analysis
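One simple way to illustrate outlier analysis is a z-score test, which flags values that lie far from the mean in units of standard deviation. This is a sketch of one basic statistical approach, not a method the slides prescribe; the readings, the threshold of 2.0, and the function name `zscore_outliers` are assumptions.

```python
from statistics import mean, pstdev


def zscore_outliers(values, threshold=2.0):
    """Return the values whose absolute z-score exceeds the threshold."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []           # all values identical, so nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical readings with one value that does not fit the rest.
readings = [10, 11, 9, 10, 12, 10, 50]
outliers = zscore_outliers(readings)
```

An extreme value inflates both the mean and the standard deviation, so a z-score test can miss outliers in very small samples; more robust statistics such as the median are often preferred.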
[Figure: Data Mining at the center, surrounded by related disciplines: Applications, Algorithm, Pattern Recognition, Database Technology, Statistics, Visualization, High-Performance Computing]
Mining Methodology
User Interaction
Interactive mining
KDD Conferences
Pacific-Asia Conf. on Knowledge Discovery and Data Mining (PAKDD)
DB conferences: ACM SIGMOD, VLDB, ICDE, EDBT, ICDT,
PR conferences: CVPR,
Journals
KDD Explorations
Statistics
Conferences: Machine learning (ML), AAAI, IJCAI, COLT (Learning Theory), CVPR, NIPS, etc.
Journals: Machine Learning, Artificial Intelligence, Knowledge and Information Systems, IEEE-PAMI, etc.
Web and IR
Visualization