transaction records
Data cleaning and data integration techniques are applied.
Ensure consistency in naming conventions,
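[Figure: lattice of cuboids for the dimensions time, item, location, supplier]
0-D (apex) cuboid: all
1-D cuboids: time; item; location; supplier
2-D cuboids: time,item; time,location; time,supplier; item,location; item,supplier; location,supplier
3-D cuboids: time,item,location; time,item,supplier; time,location,supplier; item,location,supplier
4-D (base) cuboid: time, item, location, supplier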
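The lattice above can be enumerated mechanically: every subset of the dimensions is one cuboid, so n dimensions give 2^n cuboids. The following is a minimal sketch; the function name `cuboid_lattice` is an illustrative assumption, not something defined in the slides.

```python
from itertools import combinations

def cuboid_lattice(dimensions):
    """Map k -> list of k-D cuboids (each cuboid is a tuple of kept dimensions)."""
    return {k: list(combinations(dimensions, k))
            for k in range(len(dimensions) + 1)}

dims = ("time", "item", "location", "supplier")
lattice = cuboid_lattice(dims)

for k, cuboids in lattice.items():
    # The 0-D cuboid keeps no dimension ("all"); the n-D cuboid keeps every one.
    names = [", ".join(c) if c else "all" for c in cuboids]
    print(f"{k}-D: {names}")

total_cuboids = sum(len(c) for c in lattice.values())
print("total cuboids:", total_cuboids)
```

With four dimensions the sketch lists 1 + 4 + 6 + 4 + 1 cuboids, matching the apex-to-base structure of the figure.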
Conceptual Modeling of Data Warehouses
Modeling data warehouses: dimensions & measures
Star schema: A fact table in the middle connected to a set of dimension tables
Snowflake schema: A refinement of star schema where some dimensional hierarchy is normalized into a set of smaller dimension tables, forming a shape similar to a snowflake
Fact constellations: Multiple fact tables share dimension tables, viewed as a collection of stars
Example of Star Schema
Sales Fact Table: time_key, item_key, branch_key, location_key
Measures: units_sold, dollars_sold, avg_sales
time: time_key, day, day_of_the_week, month, quarter, year
item: item_key, item_name, brand, type, supplier_type
branch: branch_key, branch_name, branch_type
location: location_key, street, city, province_or_street, country
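The star schema above maps directly onto relational tables: one fact table whose foreign keys point at the dimension tables. The sketch below uses Python's built-in `sqlite3` module; the column types and all inserted rows are made-up assumptions for illustration, and `avg_sales` is computed at query time rather than stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "time" (
    time_key INTEGER PRIMARY KEY, day INTEGER, day_of_the_week TEXT,
    month INTEGER, quarter TEXT, year INTEGER);
CREATE TABLE item (
    item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT,
    type TEXT, supplier_type TEXT);
CREATE TABLE branch (
    branch_key INTEGER PRIMARY KEY, branch_name TEXT, branch_type TEXT);
CREATE TABLE location (
    location_key INTEGER PRIMARY KEY, street TEXT, city TEXT,
    province_or_street TEXT, country TEXT);
CREATE TABLE sales (
    time_key INTEGER REFERENCES "time"(time_key),
    item_key INTEGER REFERENCES item(item_key),
    branch_key INTEGER REFERENCES branch(branch_key),
    location_key INTEGER REFERENCES location(location_key),
    units_sold INTEGER, dollars_sold REAL);
""")

# Illustrative rows only; none of these values come from the slides.
conn.executemany('INSERT INTO "time" VALUES (?, ?, ?, ?, ?, ?)',
                 [(1, 5, "Mon", 1, "Q1", 2024), (2, 9, "Tue", 4, "Q2", 2024)])
conn.execute("INSERT INTO item VALUES (1, 'TV', 'BrandA', 'electronics', 'wholesale')")
conn.execute("INSERT INTO branch VALUES (1, 'B1', 'retail')")
conn.executemany("INSERT INTO location VALUES (?, ?, ?, ?, ?)",
                 [(1, "1 Main St", "Vancouver", "BC", "Canada"),
                  (2, "2 King St", "Toronto", "ON", "Canada")])
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?, ?)",
                 [(1, 1, 1, 1, 10, 1000.0), (1, 1, 1, 2, 5, 500.0),
                  (2, 1, 1, 1, 7, 700.0), (1, 1, 1, 1, 3, 300.0)])

# A typical star join: the fact table joined to two of its dimensions.
rows = conn.execute("""
SELECT t.quarter, l.city, SUM(s.units_sold), SUM(s.dollars_sold)
FROM sales AS s
JOIN "time" AS t ON s.time_key = t.time_key
JOIN location AS l ON s.location_key = l.location_key
GROUP BY t.quarter, l.city
ORDER BY t.quarter, l.city
""").fetchall()

for row in rows:
    print(row)
```

Every analytical query follows the same shape: join the fact table to the dimensions it needs, then group by dimension attributes and aggregate the measures.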
Example of Snowflake Schema
Sales Fact Table: time_key, item_key, branch_key, location_key
Measures: units_sold, dollars_sold, avg_sales
time: time_key, day, day_of_the_week, month, quarter, year
item: item_key, item_name, brand, type, supplier_key
supplier: supplier_key, supplier_type
branch: branch_key, branch_name, branch_type
location: location_key, street, city_key
city: city_key, city, province_or_street, country
Example of Fact Constellation
Sales Fact Table: time_key, item_key, branch_key, location_key
Measures: units_sold, dollars_sold, avg_sales
Shipping Fact Table: time_key, item_key, shipper_key, from_location, to_location
Measures: dollars_cost, units_shipped
time: time_key, day, day_of_the_week, month, quarter, year
item: item_key, item_name, brand, type, supplier_type
branch: branch_key, branch_name, branch_type
location: location_key, street, city, province_or_street, country
shipper: shipper_key, shipper_name, location_key, shipper_type
A Data Mining Query Language, DMQL: Language Primitives
Cube Definition (Fact Table)
define cube <cube_name> [<dimension_list>]: <measure_list>
Dimension Definition (Dimension Table)
define dimension <dimension_name> as (<attribute_or_subdimension_list>)
Special Case (Shared Dimension Tables)
First time as "cube definition"
define dimension <dimension_name> as <dimension_name_first_time> in cube <cube_name_first_time>
Defining a Star Schema in DMQL
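The DMQL text of this slide did not survive extraction. As a hedged illustration only, the sketch below renders statements that follow the `define cube` and `define dimension` templates for the star schema example; the helper names and the cube name `sales_star` are assumptions, and the measures are passed as plain names rather than full aggregate expressions.

```python
def define_cube(cube_name, dimensions, measures):
    """Render: define cube <cube_name> [<dimension_list>]: <measure_list>"""
    return f"define cube {cube_name} [{', '.join(dimensions)}]: {', '.join(measures)}"

def define_dimension(dimension_name, attributes):
    """Render: define dimension <dimension_name> as (<attribute_list>)"""
    return f"define dimension {dimension_name} as ({', '.join(attributes)})"

# Dimension tables and attributes taken from the star schema example.
star_dimensions = {
    "time": ["time_key", "day", "day_of_the_week", "month", "quarter", "year"],
    "item": ["item_key", "item_name", "brand", "type", "supplier_type"],
    "branch": ["branch_key", "branch_name", "branch_type"],
    "location": ["location_key", "street", "city", "province_or_street", "country"],
}

statements = [define_cube("sales_star", list(star_dimensions),
                          ["dollars_sold", "avg_sales", "units_sold"])]
statements += [define_dimension(name, attrs)
               for name, attrs in star_dimensions.items()]

print("\n".join(statements))
```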
Specification of hierarchies
Schema hierarchy
day < {month < quarter; week} < year
Set_grouping hierarchy
{1..10} <
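The schema hierarchy `day < {month < quarter; week} < year` is a partial order rather than a single chain: a day rolls up to a year along two separate paths, and week is not comparable with month or quarter. A minimal sketch of that order as a small graph follows; the names `FINER_TO_COARSER` and `rolls_up_to` are illustrative assumptions.

```python
# Direct "finer < coarser" edges read off: day < {month < quarter; week} < year
FINER_TO_COARSER = {
    "day": {"month", "week"},
    "month": {"quarter"},
    "quarter": {"year"},
    "week": {"year"},
    "year": set(),
}

def rolls_up_to(finer, coarser):
    """Return True if `coarser` is reachable from `finer` along the hierarchy."""
    stack, seen = [finer], set()
    while stack:
        level = stack.pop()
        for parent in FINER_TO_COARSER[level]:
            if parent == coarser:
                return True
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return False

print(rolls_up_to("day", "year"))      # both paths lead to year
print(rolls_up_to("week", "quarter"))  # week and quarter are not comparable
```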
Multidimensional Data
Sales volume as a function of product, month, and region
Dimensions: Product, Location, Time
Hierarchical summarization paths
[Figure: cube with axes Product, Region, and Month; surviving hierarchy labels: Office, Day, Month]
A Sample Data Cube
[Figure: data cube with axes Date (1Qtr, 2Qtr, 3Qtr, 4Qtr, sum), Product (TV, PC, VCR, sum), and Country (U.S.A., Canada, Mexico, sum); one highlighted cell holds the total annual sales of TV in U.S.A.]
Cuboids Corresponding to the Cube
0-D (apex) cuboid: all
1-D cuboids: product; date; country
2-D cuboids: product,date; product,country; date,country
3-D (base) cuboid: product, date, country
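Each cuboid listed above is one group-by over the base data. A minimal sketch, assuming a tiny made-up fact list (the figures are not from the slides), computes every cuboid by brute force; the cell for (TV, U.S.A.) in the product,country cuboid corresponds to the "total annual sales of TV in U.S.A." cell of the sample cube.

```python
from collections import defaultdict
from itertools import combinations

DIMS = ("product", "date", "country")

# Illustrative base facts: (product, date, country, sales). Values are made up.
facts = [
    ("TV", "1Qtr", "U.S.A", 100),
    ("TV", "2Qtr", "U.S.A", 120),
    ("PC", "1Qtr", "Canada", 80),
    ("VCR", "1Qtr", "Mexico", 30),
    ("TV", "1Qtr", "Canada", 50),
]

def compute_cube(rows):
    """Return {cuboid: {cell_key: total}} for every subset of DIMS."""
    cube = {}
    for k in range(len(DIMS) + 1):
        for kept in combinations(range(len(DIMS)), k):
            totals = defaultdict(int)
            for row in rows:
                # Project the row onto the kept dimensions and add the measure.
                totals[tuple(row[i] for i in kept)] += row[-1]
            cube[tuple(DIMS[i] for i in kept)] = dict(totals)
    return cube

cube = compute_cube(facts)
print(cube[()])                                       # apex cuboid: grand total
print(cube[("product", "country")][("TV", "U.S.A")])  # TV sales in U.S.A., all dates
```

This brute-force version rescans the base data once per cuboid, which is exactly the cost that efficient cube computation methods try to avoid.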
Browsing a Data Cube
Visualization
OLAP capabilities
Interactive manipulation
Typical OLAP Operations
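The operations named in this deck (rolling up, drilling down, slicing, dicing, pivoting) can be sketched on a small in-memory table. The rows, the function names, and the city/country hierarchy below are assumptions for illustration only.

```python
from collections import defaultdict

# Illustrative fact rows; values are made up.
sales = [
    {"product": "TV", "quarter": "1Qtr", "city": "Chicago", "country": "U.S.A", "units": 10},
    {"product": "TV", "quarter": "2Qtr", "city": "New York", "country": "U.S.A", "units": 15},
    {"product": "PC", "quarter": "1Qtr", "city": "Toronto", "country": "Canada", "units": 7},
    {"product": "PC", "quarter": "1Qtr", "city": "Chicago", "country": "U.S.A", "units": 4},
    {"product": "VCR", "quarter": "2Qtr", "city": "Vancouver", "country": "Canada", "units": 3},
]

def aggregate(rows, dims, measure="units"):
    """Group by `dims` and sum the measure."""
    totals = defaultdict(int)
    for r in rows:
        totals[tuple(r[d] for d in dims)] += r[measure]
    return dict(totals)

def slice_rows(rows, dim, value):
    """Slice: fix one dimension to a single value."""
    return [r for r in rows if r[dim] == value]

def dice_rows(rows, criteria):
    """Dice: restrict several dimensions to sets of allowed values."""
    return [r for r in rows if all(r[d] in ok for d, ok in criteria.items())]

def pivot(table):
    """Pivot: swap the two axes of a 2-D result keyed by (row, column)."""
    return {(c, r): v for (r, c), v in table.items()}

by_country = aggregate(sales, ["country"])        # roll-up: city -> country
by_city = aggregate(sales, ["country", "city"])   # drill-down: country -> city
first_quarter = slice_rows(sales, "quarter", "1Qtr")
us_tv_pc = dice_rows(sales, {"product": {"TV", "PC"}, "country": {"U.S.A"}})
by_quarter_product = pivot(aggregate(sales, ["product", "quarter"]))

print(by_country)
print(by_city)
```

Roll-up and drill-down are the same aggregation at a coarser or finer level of the location hierarchy; slice and dice only filter, and pivot only reorients the result.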
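[Figure: radial diagram with one line per dimension, including Time (ANNUALLY, QTRLY, DAILY), Product (PRODUCT ITEM, PRODUCT GROUP), Location, Promotion, and Organization; other surviving labels: ORDER, TRUCK, PRODUCT LINE, CITY, SALES PERSON, COUNTRY, DISTRICT, REGION, DIVISION. Each circle is called a footprint.]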
Chapter 2: Data Warehousing and OLAP Technology for Data Mining
(mature)
Bottom-up: Starts with experiments and prototypes (rapid)
From software engineering point of view
Waterfall: structured and systematic analysis at each
invoices, etc.
Choose the grain (atomic level of data) of the business process
Multi-Tiered Architecture
[Figure: operational DBs and other sources are extracted, transformed, loaded, and refreshed, with a monitor & integrator and metadata, into the data warehouse and data marts; an OLAP server serves the analysis, query, reports, and data mining tools.]
[Figure: enterprise data warehouse and data marts.]
matrix techniques)
fast indexing to pre-computed summarized data
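Multi-way Array Aggregation for Cube Computation
[Figure: 3-D array with dimensions A (a0 to a3), B (b0 to b3), and C (c0 to c3), partitioned into 64 chunks numbered 1 to 64; chunks 1 to 4 run along A at b0, c0, chunks 13 to 16 along A at b3, c0, and chunks 61 to 64 along A at b3, c3.]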
Multi-Way Array Aggregation for Cube Computation (Cont.)
dimensions
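The idea behind multi-way array aggregation can be sketched as follows: the base array is read chunk by chunk in the numbered order of the figure, and each cell contributes to the AB, AC, and BC group-bys during that single pass. The array size, chunk size, and cell values below are assumptions for illustration, and the sketch keeps all three 2-D planes fully in memory, which a real implementation would avoid by ordering the scan to minimize the memory held per plane.

```python
SIZE = 8                # cells per dimension (assumed)
CHUNK = 2               # cells per chunk side (assumed)
PARTS = SIZE // CHUNK   # 4 partitions per dimension, so 4 * 4 * 4 = 64 chunks

# Dense base cuboid indexed as base[a][b][c]; deterministic made-up values.
base = [[[(7 * a + 3 * b + c) % 10 for c in range(SIZE)]
         for b in range(SIZE)]
        for a in range(SIZE)]

def multiway_aggregate(array):
    """Compute the AB, AC, and BC planes in one pass over the chunks."""
    ab = [[0] * SIZE for _ in range(SIZE)]
    ac = [[0] * SIZE for _ in range(SIZE)]
    bc = [[0] * SIZE for _ in range(SIZE)]
    chunks_read = 0
    # Chunk order 1..64: A varies fastest, then B, then C.
    for cp in range(PARTS):
        for bp in range(PARTS):
            for ap in range(PARTS):
                chunks_read += 1
                for a in range(ap * CHUNK, (ap + 1) * CHUNK):
                    for b in range(bp * CHUNK, (bp + 1) * CHUNK):
                        for c in range(cp * CHUNK, (cp + 1) * CHUNK):
                            value = array[a][b][c]
                            # One read of the cell feeds three group-bys.
                            ab[a][b] += value
                            ac[a][c] += value
                            bc[b][c] += value
    return ab, ac, bc, chunks_read

ab, ac, bc, chunks_read = multiway_aggregate(base)
print("chunks read:", chunks_read)
```

The point of the technique is that no chunk is read more than once, in contrast to computing each group-by with its own scan of the base cuboid.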
Efficient Processing of OLAP Queries
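[Figure: layered architecture. Layer1 (data repository): databases and a data warehouse, populated through data cleaning, data integration, and filtering. Layer2 (MDDB): a multidimensional database with metadata, built through filtering & integration over a database API.]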
Summary
Data warehouse
A subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management's decision-making process
A multi-dimensional model of a data warehouse
Star schema, snowflake schema, fact constellations
A data cube consists of dimensions & measures
OLAP operations: drilling, rolling, slicing, dicing and pivoting
OLAP servers: ROLAP, MOLAP, HOLAP
Efficient computation of data cubes
Partial vs. full vs. no materialization
Multiway array aggregation
Bitmap index and join index implementations
Further development of data cube technology
Discovery-driven and multi-feature cubes
From OLAP to OLAM (on-line analytical mining)
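The bitmap index listed in the summary can be illustrated with a short sketch: each distinct value of a column gets one bit vector, and bit i is set when row i holds that value, so selections combine with bitwise operations. Python integers stand in for the bit vectors here; the column contents and function names are assumptions for illustration.

```python
def build_bitmap_index(column):
    """Return {value: bitmap}, where bit i of the bitmap is set if row i has the value."""
    index = {}
    for row_id, value in enumerate(column):
        index[value] = index.get(value, 0) | (1 << row_id)
    return index

def matching_rows(bitmap, n_rows):
    """Decode a bitmap back into the list of row ids whose bit is set."""
    return [i for i in range(n_rows) if (bitmap >> i) & 1]

# Illustrative base table columns; values are made up.
region = ["Asia", "Europe", "Asia", "America", "Europe"]
cust_type = ["Retail", "Dealer", "Dealer", "Retail", "Dealer"]

region_index = build_bitmap_index(region)
type_index = build_bitmap_index(cust_type)

# Rows where region = Asia AND type = Dealer: a single bitwise AND.
asia_dealers = matching_rows(region_index["Asia"] & type_index["Dealer"], len(region))
print(asia_dealers)
```

Bitmap indexes suit low-cardinality columns, where the number of bit vectors stays small and comparisons reduce to fast bit arithmetic.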