Session 2
Learning Outcomes
Acknowledgments
Data Warehouse
A data warehouse is a subject-oriented, integrated,
nonvolatile, and time-variant collection of data in support
of management's decisions.
The data warehouse contains granular corporate data.
Data in the data warehouse can be used for many
different purposes, including being held for future
requirements that are unknown today.
Subject Orientation
Classical operational
systems are organized
around the functional
applications of the
company.
Bina Nusantara University
Integrated
Data is fed from
multiple, disparate
sources into the data
warehouse.
As the data is fed, it is
converted, reformatted,
re-sequenced,
summarized, and so
forth.
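A minimal sketch of the integration step described above, assuming two hypothetical source systems (a billing system and a CRM) that encode the same customer facts differently. The field names, encodings, and sample rows are illustrative assumptions, not taken from the slides.

```python
from datetime import date

# Hypothetical extracts from two operational systems (illustrative only).
billing_rows = [{"cust": "C2", "gender": "F", "opened": "2021-03-05"}]
# Assumed CRM conventions: sex 0 = male, 1 = female; dates as DD/MM/YYYY.
crm_rows = [{"customer_id": "C1", "sex": 0, "open_date": "05/01/2020"}]


def from_billing(row):
    """Convert a billing row to the common warehouse layout."""
    return {
        "customer_id": row["cust"],
        "gender": row["gender"].lower(),
        "opened": date.fromisoformat(row["opened"]),
    }


def from_crm(row):
    """Convert a CRM row to the common warehouse layout."""
    day, month, year = (int(part) for part in row["open_date"].split("/"))
    return {
        "customer_id": row["customer_id"],
        "gender": "m" if row["sex"] == 0 else "f",
        "opened": date(year, month, day),
    }


def integrate(billing, crm):
    """Convert and reformat both feeds, then re-sequence by customer id."""
    rows = [from_billing(r) for r in billing] + [from_crm(r) for r in crm]
    return sorted(rows, key=lambda r: r["customer_id"])


integrated = integrate(billing_rows, crm_rows)
```

After this step every row uses the same keys, the same gender encoding, and real date values, regardless of which source it came from.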
Non-Volatile
Data warehouse data is loaded
(usually, but not always, en masse)
and accessed, but it is not
updated (in the general sense).
Instead, when data in the data
warehouse is loaded, it is loaded
in a snapshot, static format. When
subsequent changes occur, a new
snapshot record is written.
In doing so, a historical record of
data is kept in the data
warehouse.
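The snapshot behavior above can be sketched as an append-only table. This is an illustrative sketch only: the class `SnapshotTable` and its methods are hypothetical names, and a real warehouse would implement this in its database layer.

```python
from datetime import date


class SnapshotTable:
    """Append-only store: changes create new snapshots, never updates."""

    def __init__(self):
        self._rows = []

    def load(self, key, as_of, attributes):
        # A change in the source is recorded as a NEW snapshot row;
        # existing rows are never modified.
        self._rows.append({"key": key, "as_of": as_of, **attributes})

    def history(self, key):
        # Return copies so callers cannot alter stored snapshots.
        return [dict(r) for r in self._rows if r["key"] == key]

    def as_of(self, key, when):
        # Latest snapshot taken on or before the requested date.
        candidates = [
            r for r in self._rows if r["key"] == key and r["as_of"] <= when
        ]
        if not candidates:
            return None
        return dict(max(candidates, key=lambda r: r["as_of"]))


table = SnapshotTable()
table.load("C1", date(2023, 1, 1), {"city": "Jakarta"})
table.load("C1", date(2023, 6, 1), {"city": "Bandung"})  # change = new row
```

Because both snapshots are retained, `table.as_of("C1", date(2023, 3, 1))` still returns the earlier state, which is the historical record the slide refers to.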
Time Variant
Data Warehouse Architecture
A staging area,
or landing
zone, is an
intermediate
storage area
used for data
processing
during the
extract, transform, and load (ETL) process.
Structure of Data Warehouse
Day 1 to Day n Phenomenon
Granularity
Granularity refers to the level
of detail or summarization of
the units of data in the data
warehouse.
The more detail there is, the
lower the level of granularity.
The less detail there is, the
higher the level of
granularity.
A summary of all transactions
for the month would be at a
high level of granularity.
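To make the two levels concrete, here is a small sketch that rolls individual transactions (low granularity, full detail) up into one record per account per month (high granularity, less detail). The sample transactions and the function name `monthly_summary` are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date

# Low level of granularity: one record per transaction (made-up data).
transactions = [
    {"account": "A1", "day": date(2024, 1, 3), "amount": 120.0},
    {"account": "A1", "day": date(2024, 1, 17), "amount": 80.0},
    {"account": "A1", "day": date(2024, 2, 2), "amount": 50.0},
]


def monthly_summary(rows):
    """High level of granularity: one record per account per month."""
    totals = defaultdict(lambda: {"count": 0, "total": 0.0})
    for row in rows:
        key = (row["account"], row["day"].year, row["day"].month)
        totals[key]["count"] += 1
        totals[key]["total"] += row["amount"]
    return dict(totals)


summary = monthly_summary(transactions)
```

The summary is smaller and quicker to query, but the individual transaction dates and amounts can no longer be recovered from it.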
The Benefits of Granularity
Looking at the data in different
ways
Reconcile data
Flexibility
Contains a history of activities
and events across the
corporation.
Example of
Granularity
Exploration and Data Mining
Living Sample Database
The living sample database refers to a subset of either true archival
data or lightly summarized data taken from a data warehouse.
The term "living" stems from the fact that the database needs to be
refreshed periodically, and the term "sample" stems from the fact
that it is a subset (a sample) of a larger database.
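One way to picture a living sample is as a random subset that is redrawn whenever the warehouse is refreshed. The sketch below assumes simple random sampling and a hypothetical helper `draw_living_sample`; the slides do not prescribe a sampling method.

```python
import random


def draw_living_sample(warehouse_rows, fraction, seed=None):
    """Draw a random subset of the warehouse rows.

    Calling this again after new data is loaded "refreshes" the sample.
    """
    if not warehouse_rows:
        return []
    rng = random.Random(seed)
    # Keep at least one row, but never ask for more rows than exist.
    size = min(len(warehouse_rows), max(1, round(len(warehouse_rows) * fraction)))
    return rng.sample(warehouse_rows, size)


warehouse = list(range(1000))          # stand-in for warehouse records
sample = draw_living_sample(warehouse, fraction=0.01, seed=1)
```

Analysis on the sample is much cheaper than on the full warehouse, at the cost of results that are only statistically representative.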
Partitioning as a Design Approach
Structuring Data in the Data
Warehouse
There are many ways to
structure data within the data
warehouse. The most common are
these:
Simple cumulative
Rolling summary
Simple direct
Continuous
Data Homogeneity and
Heterogeneity
Homogeneous
All records are of
the same type.
The data in the data
warehouse then is
subdivided by the
following criteria:
Subject area
Table
Occurrences of data
within table
Purging Warehouse Data
There are several ways in which data is purged or the
detail of data is transformed, including the following:
Data is added to a rolling summary file where
detail is lost.
Data is transferred to a bulk storage medium from a
high-performance medium such as DASD (direct access storage device).
Data is actually purged from the system.
Data is transferred from one level of the
architecture to another, such as from the
operational level to the data warehouse level.
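The first option, a rolling summary where detail is lost, can be sketched as follows. The class `RollingSummary`, the window length, and the single "older" bucket are simplifying assumptions for illustration; real rolling summaries often cascade through several levels (for example daily, weekly, monthly).

```python
from collections import deque


class RollingSummary:
    """Keep recent daily totals; fold older days into one running total."""

    def __init__(self, days_kept=7):
        self.days_kept = days_kept
        self.daily = deque()      # most recent daily totals, oldest first
        self.older_total = 0.0    # everything that has aged out

    def add_day(self, total):
        self.daily.append(total)
        if len(self.daily) > self.days_kept:
            # The oldest day's detail is lost: only its contribution
            # to the aggregate survives.
            self.older_total += self.daily.popleft()


summary = RollingSummary(days_kept=3)
for day_total in [1.0, 2.0, 3.0, 4.0, 5.0]:
    summary.add_day(day_total)
```

After the loop only the three most recent days remain individually visible; the first two days exist only as part of `older_total`.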
Reporting and the Architected
Environment
Data warehouse or
informational processing
focuses on management and
contains summary or
otherwise calculated
information.
In the data warehouse style of
reporting, little use is made of
line item, detailed information
once the basic calculation of
data is made.
Incorrect Data in the Data
Warehouse