DATA WAREHOUSING
WHAT IS DW?
DW: a storage area for processed and integrated data from different sources (operational data and external data).
A data warehouse allows its users to extract the data they need for business analysis and strategic decision making.
In other words, a data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data:
SUBJECT-ORIENTED
Example for an insurance company:
Applications area: Commercial and Life Insurance Systems; Auto and Fire Policy Processing Systems; Accounting System; Claims Processing System; Billing System
Data warehouse (subject areas): Customer, Policy, Premium, Losses
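To make the difference between application-oriented and subject-oriented storage concrete, here is a minimal Python sketch, assuming each operational record can be tagged with the subject it describes. The record layout, the `subject` field and the `organize_by_subject` helper are hypothetical illustrations, not part of the original material. The sketch regroups records from the insurance applications above into the Customer, Policy, Premium and Losses subject areas.

```python
from collections import defaultdict

# Hypothetical records as they might sit in separate operational applications.
# Each record is tagged with the subject area it describes (an assumption).
APPLICATION_RECORDS = {
    "Auto and Fire Policy Processing Systems": [
        {"subject": "Policy", "policy_id": "P1", "customer_id": "C1", "type": "auto"},
        {"subject": "Customer", "customer_id": "C1", "name": "Ana"},
    ],
    "Commercial and Life Insurance Systems": [
        {"subject": "Policy", "policy_id": "P2", "customer_id": "C2", "type": "life"},
        {"subject": "Customer", "customer_id": "C2", "name": "Bruno"},
    ],
    "Billing System": [
        {"subject": "Premium", "policy_id": "P1", "amount": 120.0},
    ],
    "Claims Processing System": [
        {"subject": "Losses", "policy_id": "P1", "amount": 800.0},
    ],
}


def organize_by_subject(app_records):
    """Regroup application-oriented records into subject areas."""
    subjects = defaultdict(list)
    for app, records in app_records.items():
        for record in records:
            # Drop the routing tag and remember which application the row came from.
            row = {k: v for k, v in record.items() if k != "subject"}
            row["source_app"] = app
            subjects[record["subject"]].append(row)
    return dict(subjects)


by_subject = organize_by_subject(APPLICATION_RECORDS)
for subject in sorted(by_subject):
    print(subject, len(by_subject[subject]))
```

Keeping a `source_app` column is one possible design choice: it preserves lineage, so analysts can still trace a subject-area row back to the application that produced it.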
INTEGRATED
Data is stored once in a single integrated location. For example, in the insurance company, data about the subject Customer is kept in one place instead of separately in each application system.
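Integration usually means reconciling different field names and codes for the same entity. The following Python sketch assumes two hypothetical source extracts (`BILLING` and `CLAIMS`) that describe the same customers with different column names, name casing and gender codes. It stores each customer once under a single convention. All names and values are illustrative assumptions.

```python
# Hypothetical source extracts: two applications describe the same customers
# using different field names, name casing and gender codes.
BILLING = [
    {"cust_no": "C1", "full_name": "ana silva", "sex": "Female"},
    {"cust_no": "C2", "full_name": "bruno costa", "sex": "Male"},
]
CLAIMS = [
    {"customer": "C1", "name": "ANA SILVA", "gender": "F", "open_claims": 1},
]

# One agreed coding convention for the warehouse.
GENDER_CODES = {"Male": "M", "Female": "F", "M": "M", "F": "F"}


def integrate_customers(billing, claims):
    """Store each customer once, keyed by customer id, with unified codes."""
    customers = {}
    for rec in billing:
        customers[rec["cust_no"]] = {
            "customer_id": rec["cust_no"],
            "name": rec["full_name"].title(),
            "gender": GENDER_CODES[rec["sex"]],
            "open_claims": 0,
        }
    for rec in claims:
        # Reuse the existing customer record if billing already supplied it.
        cust = customers.setdefault(rec["customer"], {
            "customer_id": rec["customer"],
            "name": rec["name"].title(),
            "gender": GENDER_CODES[rec["gender"]],
            "open_claims": 0,
        })
        cust["open_claims"] += rec["open_claims"]
    return customers


integrated = integrate_customers(BILLING, CLAIMS)
print(integrated["C1"])
```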
TIME-VARIANT
Data is stored as a series of snapshots or views that record how the data changes over time.
Figure: each Data Warehouse Data record combines a Key, a Time element and the Data itself.
Data is tagged with some element of time (creation date, as-of date, etc.). Data is available online for long periods of time, for example five or more years, for trend analysis and forecasting.
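A minimal Python sketch of time-variant storage, assuming each snapshot row is tagged with an `as_of` date alongside its key. The `snapshots` list, the `take_snapshot` and `history` helpers and the balance figures are hypothetical. Because new snapshots are appended rather than replacing old values, the history needed for trend analysis stays available.

```python
from datetime import date

# Each row is identified by (as_of, customer_id), so older values are kept
# alongside newer ones. Dates and figures are hypothetical.
snapshots = []


def take_snapshot(as_of, balances):
    """Append one snapshot of customer balances tagged with its as-of date."""
    for customer_id, balance in balances.items():
        snapshots.append({"as_of": as_of, "customer_id": customer_id, "balance": balance})


def history(customer_id):
    """Return (as_of, balance) pairs for one customer in date order."""
    rows = [r for r in snapshots if r["customer_id"] == customer_id]
    return [(r["as_of"], r["balance"]) for r in sorted(rows, key=lambda r: r["as_of"])]


take_snapshot(date(2022, 12, 31), {"C1": 100.0, "C2": 50.0})
take_snapshot(date(2023, 12, 31), {"C1": 140.0, "C2": 45.0})
print(history("C1"))
```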
NON-VOLATILE
Existing data in the warehouse is not overwritten or updated.
Figure: External Sources -> Load -> Data Warehouse Database (Read-Only)
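The load-then-read-only behaviour can be sketched as an append-only store. This is a minimal Python illustration under the assumption that the warehouse is a simple in-memory list. The `ReadOnlyWarehouse` class and its methods are hypothetical, not a real product API. Loads add rows, reads return copies, and updates are refused.

```python
class ReadOnlyWarehouse:
    """Minimal sketch: rows can be bulk-loaded and read, never updated in place."""

    def __init__(self):
        self._rows = []

    def load(self, rows):
        # Loading appends new rows; existing rows are left untouched.
        self._rows.extend(dict(r) for r in rows)

    def read(self):
        # Return copies so callers cannot modify stored rows in place.
        return [dict(r) for r in self._rows]

    def update(self, *args, **kwargs):
        raise PermissionError("warehouse data is non-volatile: updates are not allowed")


dw = ReadOnlyWarehouse()
dw.load([{"policy_id": "P1", "premium": 120.0}])
dw.load([{"policy_id": "P1", "premium": 130.0}])  # a new version is added, not overwritten
print(len(dw.read()))  # 2
```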
ARCHITECTURE
EXTRACT-TRANSFORM-LOAD
ETL is a process in data warehousing responsible for pulling data out of the source systems and placing it into a data warehouse. ETL involves the following tasks:
1. Extracting the data: data from the source systems (SAP, ERP, other operational systems) is extracted and converted into one consolidated data warehouse format that is ready for transformation processing.
2. Transforming the data, which may involve the following tasks:
- Applying business rules (so-called derivations, e.g., calculating new measures and dimensions)
- Cleaning (e.g., mapping NULL to 0, or "Male" to "M" and "Female" to "F")
- Filtering (e.g., selecting only certain columns to load)
- Splitting a column into multiple columns and vice versa
- Joining together data from multiple sources (e.g., lookup, merge)
- Transposing rows and columns
- Applying any kind of simple or complex data validation (e.g., if the first 3 columns in a row are empty, reject the row from processing)
3. Loading the data into a data warehouse, data repository, or other reporting applications.
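The three ETL steps can be sketched end to end in Python with only the standard library. This is a minimal illustration, not a production pipeline. It assumes a hypothetical CSV extract (`SOURCE_CSV`) standing in for a source system and an in-memory SQLite database standing in for the warehouse. The table name `customer_fact` and all column names are illustrative. The transform step covers validation, cleaning, splitting, filtering and one derived measure from the task list; joining and transposing are omitted to keep it short.

```python
import csv
import io
import sqlite3

# Hypothetical extract from an operational system (CSV text stands in for the source).
SOURCE_CSV = """customer_id,full_name,gender,premium,claims,notes
C1,Ana Silva,Female,120.0,NULL,vip
C2,Bruno Costa,Male,80.0,1,
,,,10.0,0,bad row
"""

GENDER = {"Male": "M", "Female": "F"}


def extract(text):
    """Extract: read source rows into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: validate, clean, split, derive and filter columns."""
    out = []
    for row in rows:
        # Validation: reject the row if its first 3 columns are empty.
        if all(not v for v in list(row.values())[:3]):
            continue
        first, _, last = row["full_name"].partition(" ")  # split one column into two
        claims = 0 if row["claims"] in ("", "NULL") else int(row["claims"])  # NULL -> 0
        premium = float(row["premium"])
        out.append({
            "customer_id": row["customer_id"],
            "first_name": first,
            "last_name": last,
            "gender": GENDER.get(row["gender"], row["gender"]),  # "Male" -> "M", etc.
            "premium": premium,
            "claims": claims,
            # Derived measure (business rule); undefined when there are no claims.
            "premium_per_claim": premium / claims if claims else None,
            # "notes" is filtered out: only selected columns are loaded.
        })
    return out


def load(rows, conn):
    """Load: insert the transformed rows into a warehouse table."""
    conn.execute("""CREATE TABLE IF NOT EXISTS customer_fact (
        customer_id TEXT, first_name TEXT, last_name TEXT, gender TEXT,
        premium REAL, claims INTEGER, premium_per_claim REAL)""")
    conn.executemany(
        "INSERT INTO customer_fact VALUES (:customer_id, :first_name, :last_name, "
        ":gender, :premium, :claims, :premium_per_claim)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(SOURCE_CSV)), conn)
print(conn.execute("SELECT customer_id, gender, claims FROM customer_fact "
                   "ORDER BY customer_id").fetchall())
```

Keeping each step as a separate function mirrors the extract, transform and load stages, so each stage can be tested or replaced on its own.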
DATA MARTS
A data mart is a subset of the data warehouse that is usually oriented to a specific business line or team. The information in a data mart pertains to a single department.
Each department or business unit is considered the owner of its data mart, including all the hardware, software and data.
This enables each department to use, manipulate and develop its data in any way it sees fit without altering information inside other data marts or the data warehouse.
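A data mart can be illustrated as a departmental copy of a slice of the warehouse. The Python sketch below uses an in-memory SQLite database in which a separate table stands in for the department's own data mart. In the slides each department owns its own hardware and software, so placing both tables in one database is a simplification. The `sales_fact` table, the `build_data_mart` helper and the sample rows are hypothetical. The sketch shows that changes made inside the mart leave the warehouse untouched.

```python
import sqlite3


def build_data_mart(conn, department):
    """Copy one department's rows from the warehouse into a separate mart table."""
    # The department name becomes part of a table name, so restrict it to
    # alphanumeric characters to avoid SQL injection through the identifier.
    if not department.isalnum():
        raise ValueError("department name must be alphanumeric")
    mart = f"mart_{department.lower()}"
    conn.execute(f"DROP TABLE IF EXISTS {mart}")
    conn.execute(f"CREATE TABLE {mart} (product TEXT, amount REAL)")
    conn.execute(
        f"INSERT INTO {mart} SELECT product, amount FROM sales_fact WHERE department = ?",
        (department,),
    )
    conn.commit()
    return mart


# Hypothetical warehouse table holding data for several departments.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_fact (department TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales_fact VALUES (?, ?, ?)",
    [("Auto", "policy", 120.0), ("Life", "policy", 300.0), ("Auto", "rider", 20.0)],
)

auto_mart = build_data_mart(conn, "Auto")
# The Auto department changes its own mart; the warehouse table is unaffected.
conn.execute(f"UPDATE {auto_mart} SET amount = amount * 1.1")
print(conn.execute(f"SELECT COUNT(*) FROM {auto_mart}").fetchone()[0])  # 2
```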
The implementation time frame is shorter than for a data warehouse, typically around 4 to 12 months, and a data mart is relatively cheaper than a data warehouse.
Figure: levels of structuring, from Data to Information: Data Warehouse (organizationally structured) -> data marts (departmentally structured) -> individually structured information