Data Warehousing
Course Outcomes:
Understand the need for data warehouse and data
mining
Design a data warehouse to support a business
problem
Apply different algorithms to large databases to
solve problems and to support strategic business decisions
Which customers are most likely to go to the competition?
What impact will new products/services have on revenue and margins?
Data Warehouse
A single, complete and
consistent store of data
obtained from a variety of
different sources made
available to end users in a way
they can understand and use in
a business context.
[Barry Devlin]
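A data warehouse is a subject-oriented, integrated, time-variant,
non-volatile collection of data in support of management's
decision-making process.
- W. H. Inmon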
[Figure: the Data Warehouse is organized by subject: Leads, Prospects, Customers, Products, Quotes, Orders, Regions, Time]
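[Figure: integration. Source encodings (m,f; balance dec fixed (13,2); date (julian)) and source field names (Appl A - bal-on-hand, Appl B - current-balance, Appl C - cash-on-hand) are brought to one warehouse form, e.g. Current balance]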
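The field names in the integration figure can be turned into a small illustration. The sketch below is only one possible reading: the target name `current_balance`, the `integrate` helper, and the sample records are assumptions made for this example, not part of the original slides.

```python
# Hypothetical mapping from application-specific field names
# (taken from the figure) to one warehouse field name (assumed).
FIELD_MAP = {
    "bal-on-hand": "current_balance",      # Appl A
    "current-balance": "current_balance",  # Appl B
    "cash-on-hand": "current_balance",     # Appl C
}


def integrate(record: dict) -> dict:
    """Rename source fields to the common warehouse names; keep other fields."""
    return {FIELD_MAP.get(name, name): value for name, value in record.items()}


# Made-up sample records, one per source application.
appl_a = {"account": "A-1", "bal-on-hand": 120.50}
appl_b = {"account": "B-7", "current-balance": 75.00}
appl_c = {"account": "C-3", "cash-on-hand": 10.25}

unified = [integrate(r) for r in (appl_a, appl_b, appl_c)]
```

After this step every record carries the same field name, so a single query can compare balances across the three applications.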
[Figure: Operational data accepts insert, change, replace and delete; the Data Warehouse accepts load and read only access]
[Figure: Operational vs. Data Warehouse. The data warehouse stores historical data as snapshot data; time horizon: 5-10 years]
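The load-only, snapshot-based behaviour described above might be sketched as follows. The class name `SnapshotStore`, its methods, and the sample balances are hypothetical; the point is only that rows are appended with a snapshot date and never changed in place.

```python
from datetime import date


class SnapshotStore:
    """Append-only store: rows are loaded with a snapshot date, never updated."""

    def __init__(self):
        self._rows = []

    def load(self, snapshot_date, rows):
        # Loading adds new dated rows; existing rows are left untouched.
        for row in rows:
            self._rows.append({"snapshot_date": snapshot_date, **row})

    def as_of(self, snapshot_date):
        # Read-only access: return copies of the rows for one snapshot.
        return [dict(r) for r in self._rows if r["snapshot_date"] == snapshot_date]

    def history(self, key, value):
        # All snapshots for one entity, oldest first.
        matches = [dict(r) for r in self._rows if r.get(key) == value]
        return sorted(matches, key=lambda r: r["snapshot_date"])


store = SnapshotStore()
store.load(date(2023, 12, 31), [{"customer": "C1", "balance": 100}])
store.load(date(2024, 12, 31), [{"customer": "C1", "balance": 140}])
trend = [r["balance"] for r in store.history("customer", "C1")]
```

Because old snapshots stay in place, the same customer can be examined at any point within the time horizon.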
Alternate Definitions
- Imhoff
Unfriendly
Slow
Dependent on IS programmers
Inflexible
Analysis limited to defined reports
Focus on Reporting
Trend Analysis
What if?
Cross Dimensional Comparisons
Statistical profiles
Automated pattern and rule discovery
Focus on Online Analysis
Business Queries
Typical Business Queries
OLTP vs. Warehouse
Operational System | Data Warehouse
Transaction Processing | Query Processing
Time Sensitive | History Oriented
Operator View | Managerial View
Normalized, Efficient Design for TP | Query Processing
Organized by transactions (Order, Input, Inventory) | Organized by subject (Customer, Product)
Volatile Data |
Performance Sensitive |
Not Flexible | Flexible
Efficiency | Effectiveness
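The contrast between transaction processing and query processing can be illustrated with two queries over the same data. This is a minimal sketch using Python's built-in `sqlite3` module; the `orders` table, its columns, and the rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, "
    "customer TEXT, year INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "Acme", 2023, 100.0),
        (2, "Acme", 2024, 150.0),
        (3, "Bolt", 2024, 80.0),
    ],
)

# Operational style: fetch one record by key (operator view, time sensitive).
one_order = conn.execute(
    "SELECT customer, amount FROM orders WHERE order_id = ?", (2,)
).fetchone()

# Warehouse style: summarize many records over history (managerial view).
revenue_by_year = conn.execute(
    "SELECT year, SUM(amount) FROM orders GROUP BY year ORDER BY year"
).fetchall()
```

The first query touches a single row and must be fast; the second scans the whole table, which is why the two workloads are usually served by differently designed systems.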
Retailers
Target Marketing
Market Segmentation
Budgeting
Profitability Management
Event tracking
Customers
Data Marts
A Logical Subset of the Complete Data Warehouse
Subject or Application Oriented Business View of the Warehouse
Finance, Manufacturing, Sales etc.
Smaller amount of data used for Analytic Processing
Address a single business process
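One way to picture a data mart as a logical subset of the warehouse is a database view restricted to a single subject. The sketch below uses `sqlite3`; the table `warehouse_facts`, the view `sales_mart`, and all rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE warehouse_facts (subject TEXT, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO warehouse_facts VALUES (?, ?, ?)",
    [
        ("Sales", "North", 500.0),
        ("Sales", "South", 300.0),
        ("Finance", "North", 900.0),
        ("Manufacturing", "South", 250.0),
    ],
)

# The mart exposes only the Sales subject of the warehouse.
conn.execute(
    "CREATE VIEW sales_mart AS "
    "SELECT region, amount FROM warehouse_facts WHERE subject = 'Sales'"
)

sales_rows = conn.execute(
    "SELECT region, amount FROM sales_mart ORDER BY region"
).fetchall()
```

A physical data mart would copy this subset into its own store; the view only illustrates the subset relationship.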
Data Marts
Scope
Application Neutral
Centralized, Shared
Specific Application Requirement
Business Process Oriented
Data Perspective
Subjects
Many
Operational/External Data
Few
Operational, external data
OLTP snapshots
Implementation Time Frame
Characteristics
4-12 months
Restrictive, non-extensible
Short life/tactical
Project Orientation
Expensive
Relatively cheap
Change management is difficult
Technical challenges in building large databases
Cleansing, transformation, modeling techniques may be incompatible
Current
Recent
Historical
detailed data
Characteristic | OLTP | ODS | Data Warehouse
Audience | Operating personnel | Analysts | Managers and analysts
Data access | Individual records, transaction driven | Individual records, transaction or analysis driven | Set of records, analysis driven
Data content | Current, real-time | Current and near-current | Historical
Data granularity | Detailed | Detailed and lightly summarized | Summarized and derived
Data organization | Functional | Subject-oriented | Subject-oriented
Data quality | All application specific detailed data needed to support a business activity | |
Characteristic | OLTP | ODS | Data Warehouse
Data redundancy | Non-redundant within system; unmanaged redundancy among systems | Somewhat redundant with operational databases | Managed redundancy
Data stability | Dynamic | |
Data update | Field by field | Field by field | Controlled batch
Data usage | Highly structured, repetitive | Somewhat structured, some analytical | Highly unstructured, heuristic or analytical
Database size | Moderate | Moderate | Large to very large
Database structure stability | Stable | Somewhat stable | Dynamic
Characteristic | OLTP | ODS | Data Warehouse
Development methodology | Requirements driven, structured | Data driven, somewhat evolutionary | Data driven, evolutionary
Operational priorities | Performance and availability | Availability | Access flexibility and end user autonomy
Philosophy | Support day-to-day decisions & operational activities | | Support managing the enterprise
Predictability | Stable | Mostly stable, some unpredictability | Unpredictable
Response time | Sub-second | Seconds to minutes | Seconds to minutes
Return set | Small amount of data | Small to medium amount of data | Small to large amount of data
[Figure: Operational Systems/Data; Data Preparation (Select, Extract, Transform, Integrate, Maintain); Data Warehouse; Metadata; Data Marts; Middleware/API; EIS/DSS, Query Tools, OLAP/ROLAP, Web Browsers, Data Mining]
[Figure: Operational Systems/Data; Data Preparation (Select, Extract, Transform, Integrate); ODS; Data Preparation (Select, Extract, Transform, Load, Maintain); Data Warehouse; Metadata]
BOTTOM UP APPROACH
[Figure: Operational Systems/Data; Data Preparation (Select, Extract, Transform, Integrate, Maintain); Data Marts; Data Warehouse; Metadata; Middleware/API; EIS/DSS, Query Tools, OLAP/ROLAP, Web Browsers, Data Mining]
A Practical Approach
The steps in the practical approach are:
1. Plan and define requirements at the overall corporate level.
2. Create an architecture for a complete warehouse.
Benefits of DWH
These capabilities empower the corporate...
To formulate effective business, marketing
and sales strategies.
To precisely target promotional activity.
To discover and penetrate new markets.
To successfully compete in the marketplace
from a position of informed strength.
To build predictive models.
Warehouse Architecture - 1
[Figure: Operational Systems/Data; Data Preparation (Select, Extract, Transform, Integrate, Maintain); Data Warehouse; Metadata; Middleware/API; EIS/DSS, Query Tools, OLAP/ROLAP, Web Browsers, Data Mining]
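The data preparation stages named in the architecture (Extract, Transform, Integrate) could be sketched as a small pipeline. Everything concrete here is an assumption for illustration: the source systems `billing` and `crm`, the gender encodings, and the function bodies are not from the slides.

```python
def extract(source):
    """Extract: copy the selected rows out of a source system."""
    return [dict(row) for row in source]


def transform(rows, system):
    """Transform: standardize keys and codes (encodings are hypothetical)."""
    gender_codes = {"m": "M", "male": "M", "f": "F", "female": "F"}
    return [
        {
            "customer_id": f"{system}-{row['id']}",
            "gender": gender_codes[row["gender"].lower()],
        }
        for row in rows
    ]


def integrate(*batches):
    """Integrate: merge batches from all sources, one row per customer_id."""
    merged = {}
    for batch in batches:
        for row in batch:
            merged[row["customer_id"]] = row
    return list(merged.values())


# Made-up operational data from two systems with different encodings.
billing = [{"id": 1, "gender": "m"}]
crm = [{"id": 1, "gender": "Female"}, {"id": 2, "gender": "male"}]

warehouse = integrate(
    transform(extract(billing), "billing"),
    transform(extract(crm), "crm"),
)
```

In a real deployment each stage would be handled by an ETL tool and the result loaded into the warehouse database; the list here only stands in for that target.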
Warehouse Architecture - 2
[Figure: Operational Systems/Data; Data Preparation (Select, Extract, Transform, Integrate, Maintain); three Data Marts, each with Metadata; Middleware/API; EIS/DSS, Query Tools, OLAP/ROLAP, Web Browsers, Data Mining]
Warehouse Architecture - 3
[Figure: Operational Systems/Data; Data Preparation (Select, Extract, Transform, Integrate, Maintain); Operational Data Store; Data Warehouse; Metadata; Data Marts; Middleware/API; EIS/DSS, Query Tools, OLAP/ROLAP, Web Browsers, Data Mining]
Kimball's View
[Figure: Operational Systems; Staging Area; Presentation Server; LAN; Data Warehouse Server]
Each Star is a Data Mart and has both summary and detail data
DW is sum total of all Data Marts
DW Bus using Conformed Dimensions
Processes: Extract, Scrubbing, Transformation, Load Jobs, Aggregation Jobs, Replication, Monitoring, Management, Meta Data Repository, Meta Data Population, Meta Data Maintenance
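The idea of stars sharing conformed dimensions can be illustrated with two fact tables that reference the same dimension table. This is a minimal sketch with `sqlite3`; the table names, columns, and rows are invented, and it is not taken from Kimball's own material.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- One conformed dimension shared by two stars (hypothetical schema).
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE fact_sales (product_key INTEGER REFERENCES dim_product, units INTEGER);
    CREATE TABLE fact_inventory (product_key INTEGER REFERENCES dim_product, on_hand INTEGER);
    INSERT INTO dim_product VALUES (1, 'Widget'), (2, 'Gadget');
    INSERT INTO fact_sales VALUES (1, 10), (1, 5), (2, 7);
    INSERT INTO fact_inventory VALUES (1, 40), (2, 3);
    """
)

# Because both stars use the same product keys, their facts can be
# combined in one report through the shared dimension.
report = conn.execute(
    """
    SELECT p.name, s.units_sold, i.on_hand
    FROM dim_product AS p
    JOIN (SELECT product_key, SUM(units) AS units_sold
          FROM fact_sales GROUP BY product_key) AS s
      ON s.product_key = p.product_key
    JOIN fact_inventory AS i ON i.product_key = p.product_key
    ORDER BY p.name
    """
).fetchall()
```

If each mart defined its own product keys, this cross-mart join would not be possible without a mapping step, which is the motivation for conforming dimensions.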
Inmon's View
[Figure: Operational Systems; Staging Area; Data Warehouse; Data Marts; LAN; Data Warehouse Server]
Detail Data in ER format
Summarized Data in Star formats
Processes: Extract, Scrubbing, Transformation, Load Jobs, Aggregation Jobs, Replication, Monitoring, Management, Meta Data Repository, Meta Data Population, Meta Data Maintenance
[Figure: Source Databases; Data Modeling Tool; ETL Tool; Central Metadata; Central Warehouse (RDBMS); ROLAP Engine; Architected Datamarts (RDBMS, MDDB); Local meta data; Warehouse Admin Tool; Desktop OLAP, ROLAP, MOLAP, Data Mining]
Warehouse Databases
Metadata Component
technology.
To transform a data warehouse into a Web-enabled data warehouse,
first bring the data warehouse to the Web, and then bring the
Web to the data warehouse.
Key Issues
Risk Assessment
Top-down or Bottom-up
Build or Buy
support tools
Preliminary survey
Interviews
Group Sessions
Subject Areas
Data Sources
Available data sources
Data Transformation
Data Storage
Information Delivery
Drill-down analysis
Roll-up analysis
Drill-through analysis
Ad hoc reports
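Roll-up and drill-down can be pictured as aggregating the same facts at a coarser or finer level of a hierarchy. The sketch below assumes a year/quarter hierarchy and made-up sales figures; the `aggregate` helper is hypothetical.

```python
from collections import defaultdict

# Made-up facts at the finest level available (year, quarter).
sales = [
    {"year": 2024, "quarter": "Q1", "amount": 100},
    {"year": 2024, "quarter": "Q2", "amount": 150},
    {"year": 2025, "quarter": "Q1", "amount": 120},
]


def aggregate(rows, *levels):
    """Sum amounts grouped by the given hierarchy levels."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[level] for level in levels)] += row["amount"]
    return dict(totals)


by_year = aggregate(sales, "year")                # roll-up: coarser level
by_quarter = aggregate(sales, "year", "quarter")  # drill-down: finer level
```

Drill-through would go one step further and return the underlying detail rows behind a chosen total, here the matching entries of `sales`.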
User expectations.
User participation and sign-off
General implementation plan.
Data Visualization
Parallel Processing
Query Tools
Browser Tools
Data Fusion
Trends
Multidimensional Analysis