BUILDING BLOCKS
Characteristics: 1. A central database that is loaded from multiple operational databases for the purpose of end-user access and decision support.
Definition
Bill Inmon defines a central data warehouse as a database that is:
1. Subject Oriented
The warehouse is organized around the major subjects of the enterprise (e.g. customers, products, and sales) rather than the major application areas (e.g. customer invoicing, stock control, and product sales). This is reflected in the need to store decision-support data rather than application-oriented data.
2. Integrated
The data warehouse integrates corporate application-oriented data from different source systems, and that data is often inconsistent across systems.
The integrated data must be made consistent so that users see a unified view of the data.
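As a minimal sketch of what "made consistent" can mean in practice, the snippet below maps two differently encoded source records onto one warehouse representation. The source names (`billing`, `crm`), the field names, and the `GENDER_MAP` encodings are all hypothetical; the slides do not name any specific systems.

```python
# Hypothetical encodings used by different source systems for the same attribute.
GENDER_MAP = {"m": "M", "male": "M", "1": "M", "f": "F", "female": "F", "0": "F"}


def integrate(record, source):
    """Map one source record onto a single, consistent warehouse format."""
    raw = str(record["gender"]).strip().lower()
    return {
        # Prefix the key with the source name so ids from different systems cannot collide.
        "customer_id": f"{source}:{record['id']}",
        # Unrecognized encodings are flagged instead of being guessed.
        "gender": GENDER_MAP.get(raw, "UNKNOWN"),
    }


billing = {"id": 17, "gender": "male"}  # text encoding
crm = {"id": 17, "gender": 1}           # numeric encoding
unified = [integrate(billing, "billing"), integrate(crm, "crm")]
```

Both records end up with the same gender code even though the sources disagree on how to store it.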
3. Time Variant
Data in the warehouse is only accurate and valid at some point in time or over some time interval.
Time-variance is also shown in the extended time that the data is held, the implicit or explicit association of time with all data, and the fact that the data represents a series of snapshots.
4. Non-Volatile Data
Data in the warehouse is not updated in real-time but is refreshed from operational systems on a regular basis. New data is always added as a supplement to the database, rather than a replacement.
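The time-variant and non-volatile properties can be illustrated together with an append-only table in which every row carries its snapshot date. This is only a sketch under assumed names (`SnapshotTable`, `load`, `as_of`), not a description of any particular warehouse product.

```python
from datetime import date


class SnapshotTable:
    """Append-only store: each periodic refresh adds rows, nothing is overwritten."""

    def __init__(self):
        self._rows = []

    def load(self, snapshot_date, rows):
        # Every row is stamped with the date of the refresh that produced it.
        for row in rows:
            self._rows.append({**row, "snapshot_date": snapshot_date})

    def as_of(self, snapshot_date):
        # Earlier snapshots stay queryable after later loads.
        return [r for r in self._rows if r["snapshot_date"] == snapshot_date]

    def __len__(self):
        return len(self._rows)


table = SnapshotTable()
table.load(date(2024, 1, 31), [{"product": "A", "stock": 10}])
table.load(date(2024, 2, 29), [{"product": "A", "stock": 7}])
```

After the second load the January figure is still present, so the table holds a series of snapshots instead of only the current state.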
5. Data Granularity
Data in the warehouse is summarized at different levels. Granularity levels are based on the data types and the expected system performance for queries.
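To make the idea of granularity levels concrete, the following sketch summarizes the same detail rows at a daily and a monthly level. The sample sales rows and the `summarize` helper are invented for illustration.

```python
from collections import defaultdict

# Hypothetical detail-level rows (finest granularity).
sales = [
    {"day": "2024-01-05", "product": "A", "amount": 100.0},
    {"day": "2024-01-05", "product": "B", "amount": 40.0},
    {"day": "2024-01-20", "product": "A", "amount": 60.0},
    {"day": "2024-02-02", "product": "A", "amount": 25.0},
]


def summarize(rows, key):
    """Total the amounts per group; `key` decides the level of granularity."""
    totals = defaultdict(float)
    for row in rows:
        totals[key(row)] += row["amount"]
    return dict(totals)


daily = summarize(sales, lambda r: r["day"])        # one total per day
monthly = summarize(sales, lambda r: r["day"][:7])  # one total per "YYYY-MM"
```

The monthly summary is smaller and faster to query, while the daily one answers more detailed questions; that trade-off is what drives the choice of levels.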
Data Warehouse vs. Data Mart In Terms of Data Granularity

Data Warehouse
1. Corporate/enterprise-wide
2. Union of all data marts
3. Data received from staging area
4. Queries on presentation source
5. Structure for corporate view of data
6. Organized on E-R model

Data Mart
1. Departmental
2. A single business process
3. Star-join (facts & dimensions)
4. Technology optimal for data access and analysis
5. Structure to suit the departmental view of data
Top-Down Approach

Advantages
1. A truly corporate effort, an enterprise view of data
2. Inherently architected, not a union of disparate data marts
3. Single, central storage of data about the content
4. Centralized rules and control
5. May see quick results if implemented with iterations

Disadvantages
1. Takes longer to build even with an iterative method
2. High exposure/risk to failure
3. Needs high level of cross-functional skills
4. High outlay without proof of concept
Bottom-Up Approach

Advantages
1. Faster and easier implementation of manageable pieces
2. Favorable return on investment and proof of concept
3. Less risk of failure
4. Inherently incremental; can schedule important data marts first
5. Allows project team to learn and grow

Disadvantages
1. Each data mart has its own narrow view of data
2. Permeates redundant data in every data mart
3. Perpetuates inconsistent and irreconcilable data
4. Proliferates unmanageable interfaces
An operational data store (ODS) provides the basis for operational processing and may be used to feed the data warehouse. It consists of the following:
1. Production data
2. Internal data
3. Archived data
4. External data
Data staging provides a place and a set of functions to clean, change, combine, convert, deduplicate, and prepare source data for storage and use in the data warehouse.
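A few of these staging functions can be sketched in a single pass over raw rows. The `stage` function, the field names, and the sample data are hypothetical, and the rules shown (trimming names, converting amounts, dropping repeats) are only examples of cleaning, converting, and deduplicating.

```python
def stage(rows):
    """Clean, convert, and deduplicate raw source rows before loading."""
    seen = set()
    staged = []
    for row in rows:
        name = row["name"].strip().title()  # clean: trim whitespace, normalize case
        amount = float(row["amount"])       # convert: text to a numeric type
        key = (name, amount)
        if key in seen:                     # deduplicate: skip rows already staged
            continue
        seen.add(key)
        staged.append({"name": name, "amount": amount})
    return staged


raw = [
    {"name": " alice ", "amount": "10.50"},
    {"name": "Alice", "amount": "10.5"},  # same record, formatted differently
    {"name": "bob", "amount": "3"},
]
staged = stage(raw)
```

Note that the duplicate is only detectable after cleaning and conversion, which is one reason these steps are grouped in a staging area.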
What Is Metadata?
The data warehouse provides a means for implementing an effective decision support environment by building on existing data from disparate sources scattered all over an organization. Metadata (the meta model) can be compared to an information directory: it serves as the yellow pages and the road map for navigating a data warehouse.
Types of Metadata
1. Extraction and Transformation Metadata: used by the extraction and loading processes to map data sources to a common view of information within the warehouse.
2. Operational Metadata: used by the warehouse management process to automate the production of summary tables.
3. End-User Metadata: used by the query management process to direct a query to the most appropriate data source.
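One way to picture metadata directing a query to the most appropriate data source is a small catalog that records the grain and size of each table. The catalog entries, table names, and the `route` function below are assumptions made for this sketch; real query managers use far richer metadata.

```python
# Hypothetical metadata catalog describing the detail table and its summary tables.
CATALOG = [
    {"table": "sales_detail", "grain": "day", "rows": 1_000_000},
    {"table": "sales_by_month", "grain": "month", "rows": 12_000},
    {"table": "sales_by_year", "grain": "year", "rows": 1_000},
]

# Lower number means finer grain.
GRAIN_ORDER = {"day": 0, "month": 1, "year": 2}


def route(requested_grain):
    """Pick the smallest table whose grain is fine enough to answer the query."""
    need = GRAIN_ORDER[requested_grain]
    candidates = [t for t in CATALOG if GRAIN_ORDER[t["grain"]] <= need]
    return min(candidates, key=lambda t: t["rows"])["table"]
```

For example, `route("month")` selects `sales_by_month` in preference to the much larger `sales_detail`, while `route("day")` has to fall back to the detail table.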