
Presented By

Ashutosh Chandra
Prachi Sharma
Richa Palyaal
Shikha Jain
• A data warehouse is a repository of information gathered from
multiple sources, stored under a unified schema at a single
site.
• The data warehouse is a relational database organized to hold
information in a structure that best supports reporting and
analysis.
• A copy of transaction data specifically structured for query and
analysis (Ralph Kimball, 1996)
• A single, complete and consistent store of data obtained from
a variety of different sources, made available to end users in
a way they can understand and use in a business context (Barry
Devlin, 1992)
• A process of transforming data into information and making it
available to users in a timely enough manner to make a
difference (Forrester Research, 1996)
• A collection of integrated, subject-oriented databases designed
to support the decision support system (DSS) function, where each
unit of data is relevant to some moment in time (Bill Inmon, 1991)
Subject-Oriented:
• Organized around major subjects, such as customer, product, and
sales.
• Focuses on the modeling and analysis of data for decision
makers, not on daily operations or transaction processing.
• Provides a simple and concise view of particular subject
issues by excluding data that is not useful in the decision
support process.
Integrated:
• Constructed by integrating multiple, heterogeneous data
sources:
  • relational databases, flat files, online transaction records
• Data cleaning and data integration techniques are applied.
• These ensure consistency in naming conventions, encoding
structures, attribute measures, etc. among the different data
sources.
  • E.g., hotel price: currency, tax, whether breakfast is covered, etc.
• When data is moved to the warehouse, it is converted.
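
The hotel-price example above can be sketched in Python. This is only an illustration under stated assumptions: the two source layouts, the field names, the exchange rate, the tax rate, and the breakfast price are all hypothetical and do not come from any real system.

```python
from dataclasses import dataclass

# Hypothetical conversion constants, chosen only for illustration.
EUR_TO_USD = 1.10      # assumed exchange rate
TAX_RATE = 0.10        # assumed tax applied to pre-tax prices
BREAKFAST_USD = 12.0   # assumed price of breakfast per night


@dataclass
class HotelPrice:
    """Unified warehouse record: USD, tax included, breakfast included."""
    hotel: str
    price_usd: float


def from_source_a(row: dict) -> HotelPrice:
    # Source A (hypothetical): EUR, tax included, breakfast included.
    return HotelPrice(row["name"], round(row["eur"] * EUR_TO_USD, 2))


def from_source_b(row: dict) -> HotelPrice:
    # Source B (hypothetical): USD, pre-tax, breakfast not included.
    price = row["usd_net"] * (1 + TAX_RATE) + BREAKFAST_USD
    return HotelPrice(row["hotel_name"], round(price, 2))


# Both sources end up in one schema with one meaning for "price".
unified = [
    from_source_a({"name": "Alpha Inn", "eur": 100.0}),
    from_source_b({"hotel_name": "Beta Lodge", "usd_net": 80.0}),
]
```

Each source gets its own small conversion function, so the rules for currency, tax, and breakfast are written down once and every record in `unified` can be compared directly.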
Time-Variant:
• The time horizon for the data warehouse is significantly longer
than that of operational systems.
  • Operational database: current-value data.
  • Data warehouse data: provides information from a historical
  perspective (e.g., the past 5-10 years).
• Every key structure in the data warehouse contains an element
of time, explicitly or implicitly.
• The key of operational data, by contrast, may or may not contain
a time element.
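
The difference between a current-value key and a key with a time element can be sketched as follows. The dictionaries, the `record_balance` helper, and the customer data are hypothetical and stand in for real tables.

```python
from datetime import date

# Operational store: one current value per key, overwritten on update.
operational = {}

# Warehouse: the key includes a time element, so history accumulates.
warehouse = {}


def record_balance(customer_id: str, as_of: date, balance: float) -> None:
    operational[customer_id] = balance          # previous value is lost
    warehouse[(customer_id, as_of)] = balance   # previous values are kept


record_balance("C1", date(2020, 1, 31), 500.0)
record_balance("C1", date(2020, 2, 29), 650.0)

# The warehouse can answer "what was the balance over time?".
history = sorted((k[1], v) for k, v in warehouse.items() if k[0] == "C1")
```

After both calls the operational store holds only the latest balance, while `history` lists one entry per date.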
Non-Volatile:
• A physically separate store of data transformed from the
operational environment.
• Operational updates of data do not occur in the data
warehouse environment.
• It does not require transaction processing, recovery, or
concurrency control mechanisms.
• It requires only two operations in data accessing:
  • initial loading of data and access of data.
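
A toy illustration of a store that offers only those two operations, loading and read access, is given below. The `WarehouseTable` class is a hypothetical name invented for this sketch; real warehouses enforce this through their load processes and permissions rather than through a Python class.

```python
from types import MappingProxyType


class WarehouseTable:
    """Toy store exposing only the two operations named above."""

    def __init__(self):
        self._rows = []

    def load(self, rows):
        # Loading: rows are appended as read-only copies, never updated.
        self._rows.extend(MappingProxyType(dict(r)) for r in rows)

    def query(self, predicate=lambda row: True):
        # Access: returns read-only views of the matching rows.
        return [r for r in self._rows if predicate(r)]


sales = WarehouseTable()
sales.load([{"product": "pen", "qty": 3}, {"product": "ink", "qty": 1}])
pens = sales.query(lambda r: r["product"] == "pen")
```

Because each stored row is wrapped in `MappingProxyType`, an attempt such as `pens[0]["qty"] = 99` raises `TypeError`, which mirrors the absence of in-place updates.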
• A data warehouse architecture (DWA) is a way of representing the overall
structure of data, communication, processing and presentation that exists for
end-user computing within the enterprise.
• Three parts of the data warehouse:
  • The data warehouse itself, which contains the data and associated software
  • Data acquisition (back-end) software, which extracts data from legacy
  systems and external sources, consolidates and summarizes it,
  and loads it into the data warehouse
  • Client (front-end) software, which allows users to access and analyze
  data from the warehouse
• Data flows into the data warehouse through the "load
manager". The data is extracted from the operational databases
and supplemented by data imported from external sources.
• The load manager primarily performs an extract, transform,
load (ETL) operation:
  • Data extraction.
  • Data transformation.
  • Data loading.
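
The three ETL steps can be sketched with the Python standard library. This is a minimal sketch, not a real load manager: the CSV text, the `fact_orders` table, and the cleaning rules are assumptions made up for the example, and an in-memory SQLite database stands in for the warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical operational export; a real load manager would read
# from operational databases and external sources instead.
RAW_CSV = """order_id,customer,amount
1, alice ,10.50
2,BOB,7.25
3,alice,
"""


def extract(text):
    # Extraction: parse source rows into dictionaries.
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    # Transformation: drop rows with no amount, standardize names and types.
    return [
        (int(r["order_id"]), r["customer"].strip().lower(), float(r["amount"]))
        for r in rows
        if r["amount"].strip()
    ]


def load(conn, rows):
    # Loading: write the cleaned rows into a warehouse table.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders "
        "(order_id INTEGER, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))
loaded = conn.execute(
    "SELECT customer, amount FROM fact_orders ORDER BY order_id"
).fetchall()
```

Keeping the three steps as separate functions makes it possible to test the cleaning rules in `transform` without touching a database.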
• The query manager provides an interface between the warehouse and
its users. It performs tasks such as directing queries to the
appropriate tables, monitoring the effectiveness of the indexes and
summary data, and scheduling queries.
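
One way to picture "directing queries to the appropriate tables" is a small routing function that prefers a summary table whenever it can answer the request. The catalog, table names, and `route` function below are hypothetical; actual products make this choice inside their query optimizers.

```python
# Hypothetical catalog: which columns each table can answer, and whether it
# is a pre-aggregated summary table (cheaper to scan) or the detail table.
TABLES = [
    {"name": "sales_by_month", "columns": {"month", "revenue"}, "summary": True},
    {"name": "sales_detail",
     "columns": {"month", "day", "product", "revenue"}, "summary": False},
]


def route(requested_columns):
    """Return the name of the table that should serve the query."""
    needed = set(requested_columns)
    candidates = [t for t in TABLES if needed <= t["columns"]]
    if not candidates:
        raise LookupError(f"no table provides {sorted(needed)}")
    # Prefer a summary table when one covers every requested column.
    candidates.sort(key=lambda t: not t["summary"])
    return candidates[0]["name"]


monthly = route(["month", "revenue"])        # served by the summary table
by_product = route(["product", "revenue"])   # needs the detail table
```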
• The primary components of a data warehouse are:
  • Data Sources
  • Data Transformation
  • Data Warehouse
  • Reporting
  • Metadata
  • Operations
  • Optional Components
Data Sources:
A data source is any electronic repository of information
that feeds the data warehouse. Data is passed from these systems
to the data warehouse either on a transaction-by-transaction basis
(for real-time data warehouses) or on a regular cycle.
Data Transformation:
The Data Transformation layer receives data from the
data sources, cleans and standardizes it, and loads it into the
data repository.
Data Warehouse:
The data warehouse is a relational database organized to
hold information in a structure that best supports reporting and
analysis.
Reporting:
The data in the data warehouse must be available to all the
users if the data warehouse is to be useful.
Metadata:
Metadata, or "data about data", is used to inform users of the
data warehouse about its status and the information held within the
data warehouse.
Operations:
Data warehouse operations comprise the processes of
loading, manipulating and extracting data from the data warehouse.
Operations also cover user management, security, capacity
management and related functions.
In addition, the following components also exist in some data
warehouses:
• Dependent Data Marts: A dependent data mart is a physical
database (either on the same hardware as the data warehouse
or on a separate hardware platform) that receives all its
information from the data warehouse.
• Logical Data Marts: A logical data mart is a filtered view of
the main data warehouse that does not physically exist as a
separate data copy.
• Operational Data Store: An ODS is an integrated database
of operational data. Its sources include legacy systems, and it
contains current or near-term data.
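
The contrast between a logical and a dependent data mart can be sketched with SQLite: a view for the logical mart and a separately stored table for the dependent one. The table names, columns, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE warehouse_sales (region TEXT, product TEXT, amount REAL);
    INSERT INTO warehouse_sales VALUES
        ('north', 'pen', 10.0), ('south', 'pen', 4.0), ('north', 'ink', 6.0);

    -- Logical data mart: a filtered view, no separate copy of the data.
    CREATE VIEW mart_north_logical AS
        SELECT product, amount FROM warehouse_sales WHERE region = 'north';

    -- Dependent data mart: a physical table filled only from the warehouse.
    CREATE TABLE mart_north_dependent AS
        SELECT product, amount FROM warehouse_sales WHERE region = 'north';
""")

# A later warehouse load is visible through the view immediately, but the
# dependent mart stays unchanged until it is refreshed.
conn.execute("INSERT INTO warehouse_sales VALUES ('north', 'cap', 2.0)")

logical_rows = conn.execute(
    "SELECT COUNT(*) FROM mart_north_logical"
).fetchone()[0]
dependent_rows = conn.execute(
    "SELECT COUNT(*) FROM mart_north_dependent"
).fetchone()[0]
```

The trade-off shown here is freshness against isolation: the view always reflects the warehouse, while the physical copy must be refreshed but can live on separate hardware.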
Advantages:
• Helps in reporting on and analyzing the data.
• Increases data consistency.
• Increases productivity and decreases computing costs.
• Is able to combine data from different sources in one place.
• Provides an infrastructure that could support changes to data
and replication of the changed data back into the operational
systems.
Disadvantages:
• Extracting, cleaning and loading data can be time
consuming.
• Compatibility problems can arise with systems already in place,
e.g. a transaction processing system.
• Training must be provided to end users, who may end up not using
the data warehouse.
• Security can develop into a serious issue, especially if the
data warehouse is web accessible.
Data warehousing is such a new field that it is difficult to
estimate which new developments are likely to affect it most.
Clearly, the development of parallel database servers with
improved query engines is likely to be one of the most
important. Parallel servers will make it possible to access huge
databases in much less time.
Data Warehousing is not a new phenomenon. All large
organizations already have data warehouses, but they are just not
managing them. Over the next few years, the growth of data
warehousing is going to be enormous with new products and
technologies coming out frequently. In order to get the most out of
this period, it is going to be important that data warehouse planners
and developers have a clear idea of what they are looking for and
then choose strategies and methods that will provide them with
performance today and flexibility for tomorrow.
