
The Data Warehouse Environment

Session 2
Learning Outcomes

Identify the basic concepts, components, and architecture of a
data warehouse.
Analyse the strategy for designing and implementing a data
warehouse, and explain why a data warehouse is the suitable
solution for a given need.

Acknowledgments

These slides have been adapted from Inmon, W.H. (2005).
Building The Data Warehouse. Third edition. John Wiley &
Sons. New York.
Outline
Structure of Data Warehouse
Subject Orientation
Day 1 to Day n Phenomenon
Granularity
Exploration and Data Mining
Living Sample Database
Partitioning as a Design Approach
Structuring Data in the Data Warehouse
Data Warehouse: The Standards Manual
Auditing and the Data Warehouse
Cost Justification
Data Homogeneity and Heterogeneity
Purging Warehouse Data
Reporting and the Architected Environment
The Operational Window of Opportunity
Incorrect Data in the Data Warehouse

Data Warehouse
A data warehouse is a subject-oriented, integrated,
nonvolatile, and time-variant collection of data in support
of management's decisions.
The data warehouse contains granular corporate data.
Data in the data warehouse can be used for many
different purposes, including sitting and waiting for future
requirements that are unknown today.

Subject Orientation
Classical operational systems are organized
around the functional applications of the
company. The data warehouse, by contrast,
is organized around the company's major
subject areas.

Bina Nusantara University
Integrated
Data is fed from
multiple, disparate
sources into the data
warehouse.
As the data is fed, it is
converted, reformatted,
re-sequenced,
summarized, and so
forth.

Non-Volatile
Data warehouse data is loaded
(usually, but not always, en masse)
and accessed, but it is not
updated (in the general sense).
Instead, when data in the data
warehouse is loaded, it is loaded
in a snapshot, static format. When
subsequent changes occur, a new
snapshot record is written.
In doing so, a historical record of
data is kept in the data
warehouse.
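
The snapshot behaviour described above can be sketched as an append-only store. This is only a minimal illustration under stated assumptions: the `SnapshotStore` class, its method names, and the sample values are hypothetical and do not come from the source text.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Snapshot:
    key: str      # business key, e.g. an account identifier (hypothetical)
    as_of: date   # moment in time at which the value was accurate
    value: float


class SnapshotStore:
    """Append-only store: rows are loaded and read, never updated in place."""

    def __init__(self):
        self._rows = []

    def load(self, key, as_of, value):
        # A later change in the source produces a NEW snapshot row;
        # existing rows are left untouched.
        self._rows.append(Snapshot(key, as_of, value))

    def history(self, key):
        # All snapshots for one key, oldest first.
        return sorted((r for r in self._rows if r.key == key),
                      key=lambda r: r.as_of)

    def value_as_of(self, key, when):
        # Latest snapshot taken on or before `when`, or None if there is none.
        candidates = [r for r in self.history(key) if r.as_of <= when]
        return candidates[-1].value if candidates else None


store = SnapshotStore()
store.load("ABC", date(2024, 1, 31), 100.0)
store.load("ABC", date(2024, 2, 29), 250.0)
```

Because nothing is overwritten, `store.history("ABC")` keeps both rows, and `value_as_of` can answer what the value was at any earlier moment.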
Time Variant

Time variancy implies that every unit of data in the data
warehouse is accurate as of some moment in time.

Data Warehouse Architecture

A staging area, or landing zone, is an intermediate
storage area used for data processing during the
extract, transform, and load (ETL) process.
Structure of Data Warehouse

There is an older level of detail (usually on alternate,
bulk storage), a current level of detail, a level of lightly
summarized data (the data mart level), and a level of
highly summarized data.
Data flows into the data warehouse from the
operational environment.

Day 1 to Day n Phenomenon

The day 1 to day n phenomenon
is the ideal way to get to the
data warehouse.
Data warehouses are not built
all at once. Instead, they are
designed and populated a step
at a time, and as such are
evolutionary, not revolutionary.

Granularity
Granularity refers to the level
of detail or summarization of
the units of data in the data
warehouse.
The more detail there is, the
lower the level of granularity.
The less detail there is, the
higher the level of
granularity.
A summary of all transactions
for the month would be at a
high level of granularity.
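
To make the two ends of the scale concrete, the following sketch rolls individual transactions (low granularity, much detail) up to one total per account per month (high granularity, little detail). The sample rows and the `monthly_summary` helper are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical detailed data: one row per transaction
# (transaction date, account, amount).
transactions = [
    (date(2024, 1, 5), "ABC", 120.0),
    (date(2024, 1, 20), "ABC", 80.0),
    (date(2024, 2, 3), "ABC", 50.0),
    (date(2024, 1, 9), "XYZ", 10.0),
]


def monthly_summary(rows):
    """Roll transactions up to one total per (account, year, month)."""
    totals = defaultdict(float)
    for day, account, amount in rows:
        totals[(account, day.year, day.month)] += amount
    return dict(totals)


summary = monthly_summary(transactions)
```

The summary holds fewer rows than the detail, but the individual transactions cannot be recovered from it, which is the trade-off the level of granularity controls.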

The Benefits of Granularity
Looking at the data in different
ways
Reconciling data
Flexibility
Keeping a history of activities
and events across the
corporation
Example of
Granularity

Exploration and Data Mining

The granular data found in the data warehouse
supports more than data marts. It also supports
the processes of exploration and data mining.
Exploration and data mining take masses of
detailed, historical data and examine it for
previously unknown patterns of business activity.
The data warehouse is a very useful source
of data for the explorer and data miner.
The data found in the data warehouse is cleansed,
integrated, and organized.

Living Sample Database
The living sample database refers to a subset of either true archival
data or lightly summarized data taken from a data warehouse.
The term "sample" stems from the fact that it is a subset (a
sample) of a larger database, and the term "living" stems
from the fact that the database needs to be refreshed
periodically.
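
A periodic refresh of a living sample could look roughly like the sketch below. It assumes simple random sampling without replacement; the function name, the 1% fraction, and the integer stand-in rows are all hypothetical.

```python
import random


def refresh_living_sample(warehouse_rows, fraction, seed=None):
    """Return a fresh random subset of the warehouse rows.

    Assumption: rows are chosen by simple random sampling without
    replacement; other sampling schemes are equally possible.
    """
    rng = random.Random(seed)
    wanted = max(1, round(len(warehouse_rows) * fraction))
    size = min(len(warehouse_rows), wanted)   # never ask for more than exists
    return rng.sample(warehouse_rows, size)


# Stand-in for detailed warehouse records.
warehouse_rows = list(range(1000))
sample = refresh_living_sample(warehouse_rows, fraction=0.01, seed=42)
```

Calling the function again on a later date, against the then-current warehouse contents, yields the refreshed sample.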

Partitioning as a Design Approach

Partitioning of data refers to the breakup of data into
separate physical units that can be handled
independently.
The purpose of partitioning current detail data is to break
data up into small, manageable physical units.
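
As a rough sketch of the idea, the code below splits detail records into one unit per year. Using the date as the partitioning key is an assumption made for the example, and in-memory lists stand in for the separate physical units (files or tables) a real warehouse would use.

```python
from collections import defaultdict
from datetime import date

# Hypothetical current-detail records: (transaction date, account, amount).
records = [
    (date(2022, 3, 1), "ABC", 10.0),
    (date(2023, 7, 15), "ABC", 20.0),
    (date(2023, 11, 2), "XYZ", 5.0),
    (date(2024, 1, 9), "XYZ", 7.5),
]


def partition_by_year(rows):
    """Split rows into separate units keyed by the year of the record."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[0].year].append(row)
    return dict(partitions)


partitions = partition_by_year(records)

# Each unit can be handled on its own, for example archived or dropped,
# without touching the other years.
archived_2022 = partitions.pop(2022)
```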

Structuring Data in the Data Warehouse
There are many ways to
structure data within the data
warehouse. The most common are
these:
Simple cumulative
Rolling summary
Simple direct
Continuous
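
Two of the structures listed above can be sketched as follows, on the understanding that a simple cumulative structure keeps one accumulated record per day and a rolling summary folds older daily records into coarser totals. This is a simplified reading: the sketch rolls days directly into months, and the seven-slot limit, function names, and sample data are assumptions.

```python
from collections import defaultdict
from datetime import date

DAILY_SLOTS = 7   # assumption: keep one week of daily totals before rolling up


def simple_cumulative(transactions):
    """One summary record per day, accumulated from that day's transactions."""
    daily = defaultdict(float)
    for day, amount in transactions:
        daily[day] += amount
    return dict(sorted(daily.items()))


def rolling_summary(daily_totals, slots=DAILY_SLOTS):
    """Keep the newest `slots` daily totals; fold older days into monthly totals.

    `slots` is expected to be at least 1.
    """
    days = sorted(daily_totals)
    recent = {d: daily_totals[d] for d in days[-slots:]}
    monthly = defaultdict(float)
    for d in days[:-slots]:
        monthly[(d.year, d.month)] += daily_totals[d]
    return recent, dict(monthly)


# Hypothetical input: ten days of activity, with two transactions on day 10.
transactions = [(date(2024, 1, d), 10.0) for d in range(1, 11)]
transactions.append((date(2024, 1, 10), 5.0))

daily = simple_cumulative(transactions)
recent, monthly = rolling_summary(daily)
```

The rolling form stores far fewer records than the cumulative form, at the price of losing day-level detail for everything older than the retained slots.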

Data Homogeneity and Heterogeneity

Homogeneous
All of the types of records are the same.
The data in the data warehouse is then
subdivided by the following criteria:
Subject area
Table
Occurrences of data within a table

Purging Warehouse Data
There are several ways in which data is purged or the
detail of data is transformed, including the following:
Data is added to a rolling summary file where
detail is lost.
Data is transferred to a bulk storage medium
from a high-performance medium such as DASD.
Data is actually purged from the system.
Data is transferred from one level of the
architecture to another, such as from the
operational level to the data warehouse level.
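
Two of the options above, transfer to bulk storage and outright purging, can be combined into an age-based policy. The sketch below is one possible arrangement: the function name, the two thresholds, and the sample rows are hypothetical, and plain lists stand in for the storage media.

```python
from datetime import date


def age_out(active, bulk, today, archive_after_days, purge_after_days):
    """Move aged rows from the fast store to bulk storage, then purge.

    Rows are (as_of_date, payload) tuples. Both thresholds are
    hypothetical policy values.
    """
    def age(row):
        return (today - row[0]).days

    still_active = [r for r in active if age(r) < archive_after_days]
    aged = [r for r in active if age(r) >= archive_after_days]
    # Rows past the purge threshold are dropped from the system entirely.
    new_bulk = [r for r in bulk + aged if age(r) < purge_after_days]
    return still_active, new_bulk


today = date(2024, 6, 30)
active = [(date(2024, 6, 1), "a"), (date(2023, 1, 1), "b"), (date(2015, 1, 1), "c")]
bulk = [(date(2010, 1, 1), "d")]

active_after, bulk_after = age_out(active, bulk, today,
                                   archive_after_days=365,
                                   purge_after_days=3650)
```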

Reporting and the Architected Environment
Data warehouse or
informational processing
focuses on management and
contains summary or
otherwise calculated
information.
In the data warehouse style of
reporting, little use is made of
line item, detailed information
once the basic calculation of
data is made.

Incorrect Data in the Data Warehouse

For example, suppose that on July 1 an entry for $5,000 is made
into an operational system for account ABC. On July 2 a
snapshot for $5,000 is created in the data warehouse for
account ABC. Then on August 15 an error is discovered.
Instead of an entry for $5,000, the entry should have been for
$750.
How can the data in the data warehouse be corrected?
Choice 1: Go back into the data warehouse for July 2 and
find the offending entry. Then, using update capabilities,
replace the value $5,000 with the value $750.
Choice 2: Enter offsetting entries.
Choice 3: Reset the account to the proper value on August
16.
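
Choice 2 can be sketched with the figures from the example. The sketch assumes the offsetting entries are a reversal of the wrong amount plus an entry for the right amount, both dated August 16; the year, the function names, and the tuple layout are hypothetical.

```python
from datetime import date

YEAR = 2024  # arbitrary assumption; the example gives only month and day

# Append-only rows: (entry date, account, amount). The first row is the
# July 2 snapshot from the example.
entries = [(date(YEAR, 7, 2), "ABC", 5000)]


def add_offsetting_entries(rows, account, wrong, right, when):
    """Append a reversal of the wrong amount and an entry for the right one."""
    return rows + [(when, account, -wrong), (when, account, right)]


def balance(rows, account, as_of):
    """Sum of all entries for the account dated on or before `as_of`."""
    return sum(amount for day, acct, amount in rows
               if acct == account and day <= as_of)


corrected = add_offsetting_entries(entries, "ABC", 5000, 750, date(YEAR, 8, 16))
```

The July 2 row is left untouched, so the warehouse stays nonvolatile. The balance is correct from August 16 onward, while any query as of a date between July 2 and August 15 still reflects the erroneous $5,000.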
