
Advanced Database Applications:
Database Indexing and Data Mining

CS562 -- Fall 2005

George Kollios
Boston University
Prof. George Kollios
Office: MCS 288
Office Hours: Monday 2:30pm-4:00pm
Thursday 11:00am-12:30pm
Mailing List: cs562
Web: http://www.cs.bu.edu/faculty/gkollios/ada05
Book1:
http://www.cs.bu.edu/faculty/gkollios/ada05/Book/
History of Database Technology
1960s: Data collection, database creation, IMS and network DBMS
1970s: Relational data model, relational DBMS implementation
1980s: RDBMS, advanced data models (extended-relational, OO, deductive, etc.) and application-oriented DBMS (spatial, scientific, engineering, etc.)
1990s-2000s: Data mining and data warehousing, multimedia databases, and Web databases
Modern Database Systems
Extend these layers

Structure of a RDBMS
A DBMS is an OS for data!
A typical RDBMS has a layered architecture (top to bottom):
Query Optimization and Execution
Relational Operators
Files and Access Methods
Buffer Management
Disk Space Management
DB
Index Methods for RDBMS
Hashing Methods:
Linear Hashing, Extensible Hashing

B-tree family:
B+-trees and variations

Both of them are one-dimensional
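To make the "one-dimensional" point concrete, here is a minimal Python sketch. It is not a real Linear Hashing or B+-tree implementation: a dict stands in for a hash index and a sorted key list stands in for the leaf level of a B+-tree. The table contents and the function names are hypothetical.

```python
import bisect

# Hypothetical toy table of (key, record) pairs on a single attribute.
rows = [(17, "r17"), (3, "r3"), (42, "r42"), (8, "r8"), (25, "r25")]

# Hash-style index: supports exact-match lookups on the key only.
hash_index = {key: rec for key, rec in rows}

# Ordered index (stand-in for B+-tree leaves): sorted keys also support ranges.
sorted_rows = sorted(rows)
sorted_keys = [key for key, _ in sorted_rows]


def point_lookup(key):
    """Return the record with this exact key, or None if absent."""
    return hash_index.get(key)


def range_lookup(lo, hi):
    """Return records with lo <= key <= hi, in key order."""
    start = bisect.bisect_left(sorted_keys, lo)
    end = bisect.bisect_right(sorted_keys, hi)
    return [rec for _, rec in sorted_rows[start:end]]


print(point_lookup(8))
print(range_lookup(5, 25))
```

Both structures order or hash a single key value, which is why they do not directly answer queries over points, lines and regions in two or more dimensions.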


Overview of the course
Spatial Database Systems
GIS, CAD/CAM, NASA's EOSDIS project
Manages points, lines and regions
Temporal Database Systems
Billing, medical records
Spatio-temporal Databases
Moving objects, changing regions, etc
Overview of the course
Multimedia databases
A multimedia system can store and
retrieve objects/documents with text,
voice, images, video clips, etc
Time series databases
Stock market, ECG, trajectories, etc
Multimedia databases
Applications:
Digital libraries, entertainment, office
automation
Medical imaging: digitized X-rays and
MRI images (2 and 3-dimensional)
Query by content (or QBE):
Efficient
Complete (no false dismissals)
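One common way to get both properties is filter-and-refine with a cheap distance that never exceeds the true one. The sketch below assumes objects are already represented as fixed-length feature vectors and uses Euclidean distance on the first k coordinates as the cheap filter. The data, the function names and the choice of k are hypothetical, and this is only an illustration of the "no false dismissals" idea, not a method taken from the slides.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def range_query(database, query, eps, k=2):
    """Return all objects within eps of query, using filter-and-refine."""
    # Filter: distance on the first k coordinates is a lower bound of the
    # full distance, so any true answer always survives this step.
    candidates = [obj for obj in database if dist(obj[:k], query[:k]) <= eps]
    # Refine: discard false alarms with the full (expensive) distance.
    return [obj for obj in candidates if dist(obj, query) <= eps]


# Hypothetical 4-dimensional feature vectors.
database = [(0, 0, 0, 0), (1, 0, 0, 0), (0, 0, 3, 4), (5, 5, 5, 5)]
query = (0, 0, 0, 0)

print(range_query(database, query, eps=1.5))
```

Here (0, 0, 3, 4) passes the filter but is removed in the refine step: a false alarm costs extra work, while a false dismissal would make the answer incomplete.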
What is Data Mining?
Data mining (knowledge discovery in
databases):
The efficient discovery of previously unknown,
valid, potentially useful and understandable
information or patterns from data in large databases
Alternative names:
Knowledge discovery (mining) in databases (KDD),
knowledge extraction, data/pattern analysis, data
archeology, data dredging, etc.
DM Applications

Database analysis and decision support


Market analysis: target marketing, market
basket analysis, market segmentation
Fraud detection and management
Biology and medicine
Text mining (news group, email,
documents) and Web analysis.
Data Mining: Confluence of Multiple Disciplines
Database Technology
Statistics
Machine Learning
Visualization
Information Science
Other Disciplines
Overview of terms
Data: a set of facts (items) D, stored in
a database
Pattern: an expression E in a language
L, that describes a subset of facts
Attribute: a field in an item i in D.
Interestingness: a function I_{D,L} that
maps an expression E in L to a value in a
measure space M
The Data Mining Task
For a given dataset D, language of
facts L, interestingness function I_{D,L}
and threshold c, efficiently find the
expressions E such that:
I_{D,L}(E) > c
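The task statement can be read almost literally as a filter over candidate expressions. The sketch below is a naive illustration under stated assumptions: the language L is a small hypothetical set of named predicates, and the interestingness function is simply the fraction of facts an expression covers. Neither choice comes from the slides, and a real miner would avoid enumerating every expression.

```python
# Hypothetical dataset D: each fact (item) is a dict of attributes.
D = [
    {"age": 25, "buys_pc": True},
    {"age": 31, "buys_pc": False},
    {"age": 22, "buys_pc": True},
    {"age": 45, "buys_pc": False},
]

# Hypothetical language L: named expressions, each describing a subset of D.
L = {
    "age < 30": lambda item: item["age"] < 30,
    "buys_pc": lambda item: item["buys_pc"],
    "age >= 40": lambda item: item["age"] >= 40,
}


def coverage(D, E):
    """Assumed interestingness I_{D,L}(E): fraction of facts that satisfy E."""
    return sum(1 for item in D if L[E](item)) / len(D)


def mine(D, L, interestingness, c):
    """Return every expression E in L with interestingness(D, E) > c."""
    return [E for E in L if interestingness(D, E) > c]


print(mine(D, L, coverage, 0.4))
```

The exhaustive loop is correct but not efficient. The "efficiently" in the task statement is where the actual algorithms come in.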
How Data Mining is used
Identify the problem
Use data mining techniques to
transform the data into information
Act on the information
Measure the results
DM Functionalities
Concept description:
Generalize, summarize, and contrast data
characteristics, e.g., dry vs. wet regions
Association (correlation and causality):
Multi-dimensional vs. single-dimensional association
age(X, 20..29) ^ income(X, 20..29K) => buys(X,
PC) [support = 2%, confidence = 60%]
contains(T, computer) => contains(T, software)
[support = 1%, confidence = 75%]
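The bracketed numbers can be computed directly from a transaction table. The sketch below uses the usual definitions, which the slides do not spell out: support is the fraction of transactions containing both sides of the rule, and confidence is that support divided by the support of the left-hand side. The transactions are made up for illustration.

```python
# Hypothetical transactions: each is the set of items bought together.
transactions = [
    {"computer", "software"},
    {"computer"},
    {"printer"},
    {"computer", "software", "printer"},
    {"software"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(lhs, rhs, transactions):
    """Estimate P(rhs | lhs); assumes lhs occurs in at least one transaction."""
    return support(lhs | rhs, transactions) / support(lhs, transactions)


# Rule: contains(T, computer) => contains(T, software)
rule_support = support({"computer", "software"}, transactions)
rule_confidence = confidence({"computer"}, {"software"}, transactions)

print(f"support = {rule_support:.0%}, confidence = {rule_confidence:.0%}")
```

On this toy data, 2 of the 5 transactions contain both items and 3 contain a computer, so the rule has support 40% and confidence about 67%.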
DM Functionalities
Cluster analysis
Class label is unknown: Group data to
form new classes, e.g., cluster houses
to find distribution patterns
Clustering is based on the principle of
maximizing the intra-class similarity
and minimizing the inter-class
similarity
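As a small illustration of grouping unlabeled data, here is a sketch of k-means on one-dimensional points. k-means is one well-known clustering method chosen for illustration; the slides do not name a specific algorithm. The data, the starting centers and the function name are hypothetical.

```python
def kmeans_1d(points, centers, iterations=10):
    """Plain k-means on numbers: assign to nearest center, then re-average."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            # Assign each point to its closest current center.
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster (keep it if empty).
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters


# Hypothetical data with no class labels, e.g. house positions along a road.
points = [1.0, 1.5, 2.0, 10.0, 10.5, 11.0]
centers, clusters = kmeans_1d(points, centers=[1.0, 2.0])

print(centers)
print(clusters)
```

Points inside each resulting group are close to one another (high intra-class similarity), while the two groups are far apart (low inter-class similarity).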
DM Functionalities
Classification and Prediction
Finding models (functions) that describe and
distinguish classes or concepts for future
prediction
E.g., classify countries based on climate, or
classify cars based on gas mileage
Presentation: decision-tree, classification rule,
neural network
Prediction: Predict some unknown or missing
numerical values
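For the "classify cars based on gas mileage" example, a minimal sketch is a one-level decision tree (a decision stump) that learns a single threshold from labeled examples. The training data, the "low"/"high" labels and the function names are hypothetical, and real decision-tree learners choose splits with criteria such as information gain rather than raw error counts.

```python
def predict(mpg, threshold):
    """Classify a car as 'high' mileage if mpg exceeds the threshold."""
    return "high" if mpg > threshold else "low"


def fit_stump(samples):
    """Pick the midpoint threshold with the fewest training errors."""
    values = sorted({mpg for mpg, _ in samples})
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(threshold):
        return sum(1 for mpg, label in samples if predict(mpg, threshold) != label)

    return min(candidates, key=errors)


# Hypothetical labeled training set: (miles per gallon, class label).
cars = [(15, "low"), (18, "low"), (22, "low"),
        (30, "high"), (35, "high"), (40, "high")]

threshold = fit_stump(cars)
print(threshold)
print(predict(28, threshold))
```

The learned model is the rule "mpg > threshold implies high", which is the kind of classification rule the slide lists as a presentation format, and it can then label cars that were not in the training set.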
