DATA WAREHOUSE
A data warehouse is the main repository of the organization's historical data, its corporate memory. For example, an organization would use the information that's stored in its data warehouse to find out what day of the week they sold the most widgets in May 1992, or how employee sick leave the week before the winter break differed between California and New York from 2001-2005. In other words, the data warehouse contains the raw material for management's decision support system. The critical factor leading to the use of a data warehouse is that a data analyst can perform complex queries and analysis on the information without slowing down the operational systems.
While operational systems are optimized for simplicity and speed of modification (online transaction processing, or OLTP) through heavy use of database normalization and an entity-relationship model, the data warehouse is optimized for reporting and analysis (online analytical processing, or OLAP). Frequently data in data warehouses is heavily denormalised, summarised and/or stored in a dimension-based model, but this is not always required to achieve acceptable query response times.
More formally, Bill Inmon (one of the earliest and most influential practitioners) defined a data warehouse as follows:
• Subject-oriented, meaning that the data in the database is organized so that all the data elements relating to the same real-world event or object are linked together;
• Time-variant, meaning that the changes to the data in the database are tracked and recorded so that reports can be produced showing changes over time;
• Non-volatile, meaning that data in the database is never over-written or deleted; once committed, the data is static, read-only, but retained for future reporting;
• Integrated, meaning that the database contains data from most or all of an organization's operational applications, and that this data is made consistent.
History of data warehousing
Data warehouses became a distinct type of computer database during the late 1980s and early 1990s. They were developed to meet a growing demand for management information and analysis that could not be met by operational systems. Operational systems were unable to meet this need for a range of reasons:
• The processing load of reporting reduced the response time of the operational systems,
• The database designs of operational systems were not optimized for information analysis and reporting,
• Most organizations had more than one operational system, so company-wide reporting could not be supported from a single system, and
• Development of reports in operational systems often required writing specific computer programs, which was slow and expensive.
As a result, separate computer databases began to be built that were specifically designed to support management information and analysis purposes. These data warehouses were able to bring in data from a range of different data sources, such as mainframe computers, minicomputers, as well as personal computers and office automation
Data Warehousing & INFORMATICA
software such as spreadsheets, and integrate this information in a single place. This capability, coupled with user-friendly reporting tools and freedom from operational impacts, has led to a growth of this type of computer system.
As technology improved (lower cost for more performance) and user requirements increased (faster data load cycle times and more features), data warehouses have evolved through several fundamental stages:
• Offline Operational Databases - Data warehouses in this initial stage are developed by simply copying the database of an operational system to an off-line server where the processing load of reporting does not impact on the operational system's performance.
• Offline Data Warehouse - Data warehouses in this stage of evolution are updated on a regular time cycle (usually daily, weekly or monthly) from the operational systems and the data is stored in an integrated reporting-oriented data structure.
• Real Time Data Warehouse - Data warehouses at this stage are updated on a transaction or event basis, every time an operational system performs a transaction (e.g. an order or a delivery or a booking etc.).
• Integrated Data Warehouse - Data warehouses at this stage are used to generate activity or transactions that are passed back into the operational systems for use in the daily activity of the organization.
DATA WAREHOUSE ARCHITECTURE
The term data warehouse architecture is primarily used today to describe the overall structure of a Business Intelligence system. Other historical terms include decision support systems (DSS), management information systems (MIS), and others.
The data warehouse architecture describes the overall system from various perspectives such as data, process, and infrastructure needed to communicate the structure, function and interrelationships of each component. The infrastructure or technology perspective details the various hardware and software products used to implement the distinct components of the overall system. The data perspective typically diagrams the source and target data structures and aids the user in understanding what data assets are available and how they are related. The process perspective is primarily concerned with communicating the process and flow of data from the originating source system through the process of loading the data warehouse, and often the process that client products use to access and extract data from the warehouse.
DATA STORAGE METHODS
In OLTP (online transaction processing) systems, relational database design uses the discipline of data modeling and generally follows the Codd rules of data normalization in order to ensure absolute data integrity. Complex information is broken down into its most simple structures (a table) where all of the individual atomic level elements relate to each other and satisfy the normalization rules. Codd defines 5 increasingly stringent rules of normalization, and typically OLTP systems achieve a 3rd level normalization. Fully normalized OLTP database designs often result in having information from a business transaction stored in dozens to hundreds of tables. Relational database managers are efficient at managing the relationships between tables and deliver very fast insert/update performance because only a little bit of data is affected in each relational transaction.
OLTP databases are efficient because they are typically only dealing with the information around a single transaction. In reporting and analysis, thousands to billions of transactions may need to be reassembled, imposing a huge workload on the relational database. Given enough time the software can usually return the requested results, but because of the negative performance impact on the machine and all of its hosted applications, data warehousing professionals recommend that reporting databases be physically separated from the OLTP database.
In addition, data warehousing suggests that data be restructured and reformatted to facilitate query and analysis by novice users. OLTP databases are designed to provide good performance to rigidly defined applications built by programmers fluent in the constraints and conventions of the technology. Add in frequent enhancements, and to many a database is just a collection of cryptic names and seemingly unrelated, obscure structures that store data using incomprehensible coding schemes. These are all factors that, while improving performance, complicate use by untrained people. Lastly, the data warehouse needs to support high volumes of data gathered over extended periods of time, is subject to complex queries, and needs to accommodate formats and definitions inherited from independently designed package and legacy systems.
Designing the data warehouse data architecture is the realm of Data Warehouse Architects. The goal of a data warehouse is to bring data together from a variety of existing databases to support management and reporting needs. The generally accepted principle is that data should be stored at its most elemental level because this provides the most useful and flexible basis for use in reporting and information analysis. However, because of different focus on specific requirements, there can be alternative methods for designing and implementing data warehouses. There are two leading approaches to organizing the data in a data warehouse: the dimensional approach advocated by Ralph Kimball and the normalized approach advocated by Bill Inmon. Whilst the dimensional approach is very useful in data mart design, it can result in a rat's nest of long term data integration and abstraction complications when used in a data warehouse.
In the "dimensional" approach, transaction data is partitioned into either measured "facts", which are generally numeric data that capture specific values, or "dimensions", which contain the reference information that gives each transaction its context. As an example, a sales transaction would be broken up into facts such as the number of products ordered and the price paid, and dimensions such as date, customer, product, geographical location and salesperson. The main advantage of a dimensional approach is that the data warehouse is easy for business staff with limited information technology experience to understand and use. Also, because the data is pre-joined into the dimensional form, the data warehouse tends to operate very quickly. The main disadvantage of the dimensional approach is that it is quite difficult to add or change later if the company changes the way in which it does business.
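The fact/dimension split described above can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the table names (fact_sales, dim_date, dim_product, dim_customer), the column names and the sample rows are all hypothetical and do not come from any particular warehouse.

```python
import sqlite3


def build_star_schema(conn):
    """Create one fact table and three dimension tables, then load sample rows."""
    conn.executescript(
        """
        CREATE TABLE dim_date     (date_id INTEGER PRIMARY KEY, day TEXT);
        CREATE TABLE dim_product  (product_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT, region TEXT);
        -- Facts: numeric measures plus one key per dimension.
        CREATE TABLE fact_sales (
            date_id     INTEGER REFERENCES dim_date(date_id),
            product_id  INTEGER REFERENCES dim_product(product_id),
            customer_id INTEGER REFERENCES dim_customer(customer_id),
            quantity    INTEGER,
            price_paid  REAL
        );
        """
    )
    conn.executemany("INSERT INTO dim_date VALUES (?, ?)",
                     [(1, "2001-05-01"), (2, "2001-05-02")])
    conn.executemany("INSERT INTO dim_product VALUES (?, ?)",
                     [(1, "widget"), (2, "gadget")])
    conn.executemany("INSERT INTO dim_customer VALUES (?, ?, ?)",
                     [(1, "Alice", "California"), (2, "Bob", "New York")])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?)",
                     [(1, 1, 1, 3, 30.0), (1, 2, 2, 1, 15.0), (2, 1, 2, 2, 20.0)])


def sales_by_region(conn):
    """Aggregate the facts by one attribute of the customer dimension."""
    cursor = conn.execute(
        """
        SELECT c.region, SUM(f.quantity), SUM(f.price_paid)
        FROM fact_sales AS f
        JOIN dim_customer AS c ON c.customer_id = f.customer_id
        GROUP BY c.region
        ORDER BY c.region
        """
    )
    return cursor.fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    build_star_schema(connection)
    for region, quantity, revenue in sales_by_region(connection):
        print(region, quantity, revenue)
```

A report needs only one join per dimension it uses, which is part of why this layout tends to be easy for business users to query.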
The "normalized" approach uses database normalization. In this method, the data in the data warehouse is stored in third normal form. Tables are then grouped together by subject areas that reflect the general definition of the data (customer, product, finance, etc.). The main advantage of this approach is that it is quite straightforward to add new information into the database; the primary disadvantage is that, because of the number of tables involved, it can be rather slow to produce information and reports. Furthermore, since the segregation of facts and dimensions is not explicit in this type of data model, it is difficult for users to join the required data elements into meaningful information without a precise understanding of the data structure.
Subject areas are just a method of organizing information and can be defined along any lines. The traditional approach has subjects defined as the subjects or nouns within a problem space. For example, in a financial services business, you might have customers, products and contracts. An alternative approach is to organize around the business transactions, such as customer enrollment, sales and trades.
Advantages of using data warehouse
There are many advantages to using a data warehouse, some of them are:
• Enhances end-user access to a wide variety of data.
• Business decision makers can obtain various kinds of trend reports, e.g. the item with the most sales in a particular area / country for the last two years.
• A data warehouse can be a significant enabler of commercial business applications, most notably customer relationship management (CRM).
Concerns in using data warehouses
• Extracting, cleaning and loading data is time consuming.
• Data warehousing project scope must be actively managed to deliver a release of defined content and value.
• Compatibility problems with systems already in place.
• Security could develop into a serious issue, especially if the data warehouse is web accessible.
• Data storage design controversy warrants careful consideration and perhaps prototyping of the data warehouse solution for each project's environments.
HISTORY OF DATA WAREHOUSING
Data warehousing emerged for many different reasons as a result of advances in the field of information systems. A vital discovery that propelled the development of data warehousing was the fundamental differences between operational (transaction processing) systems and informational (decision support) systems. Operational systems are run in real time, where in contrast informational systems support decisions on a historical point-in-time. Below is a comparison of the two.

Characteristic   | Operational Systems (OLTP)                                     | Informational Systems (OLAP)
-----------------+----------------------------------------------------------------+--------------------------------------------------------------
Primary Purpose  | Run the business on a current basis                            | Support managerial decision making
Type of Data     | Real time based on current data                                | Snapshots and predictions
Primary Users    | Clerks, salespersons, administrators                           | Managers, analysts, customers
Scope            | Narrow, planned, and simple updates and queries                | Broad, complex queries and analysis
Design Goal      | Performance throughput, availability                           | Ease of flexible access and use
Database concept | Complex                                                        | Simple
Normalization    | High                                                           | Low
Time-focus       | Point in time                                                  | Period of time
Volume           | Many - constant updates and queries on one or a few table rows | Periodic batch updates and queries requiring many or all rows

Other aspects that also contributed to the need for data warehousing are:
• Improvements in database technology
o The beginning of relational data models and relational database management systems (RDBMS)
• Advances in computer hardware
o The abundant use of affordable storage and other architectures
• The importance of end-users in information systems
o The development of interfaces allowing easier use of systems for end users
• Advances in middleware products
o Enabled enterprise database connectivity across heterogeneous platforms
Data warehousing has evolved rapidly since its inception. Here is the timeline of data warehousing:
1970's - Operational systems (such as data processing) were not able to handle large and frequent requests for data analyses. Data was stored in mainframe files and static databases. A request was processed from recorded tapes for specific queries and data gathering. This proved to be time consuming and an inconvenience.
1980's - Real time computer applications became decentralized. Relational models and database management systems started emerging and becoming the wave. Retrieving data from operational databases was still a problem because of "islands of data."
1990's - Data warehousing emerged as a feasible solution to optimize and manipulate data both internally and externally to allow businesses to make accurate decisions.
What is data warehousing?
After information technology took the world by storm, many revolutionary concepts were created to make it more effective and helpful. During the nineties, as new technology was being born and becoming obsolete in no time, there was a need for a concrete, foolproof idea that could help make database administration more secure and reliable. The concept of data warehousing was thus invented to help the business decision making process. The working of data warehousing and its applications has been a boon to information technology professionals all over the world. It is very important for managers to understand the architecture of how it works and how it can be used as a tool to improve performance. The concept has revolutionized business planning techniques.
Concept
Information processing and managing a database are the two important components for any business to have a smooth operation. Data warehousing is a concept where the information systems are computerized. Since there would be a lot of applications that run simultaneously, there is a possibility that each individual process creates its own exclusive "secondary data" which originates from the source. Data warehouses are useful in tracking all this information down, analyzing it and improving performance. They offer a wide variety of options and are highly compatible with virtually all working environments. They help the managers of companies to gauge the progress that is made by the company over a period of time and also explore new ways to improve the growth of the company. There are many "it's" in business, and these data warehouses are read-only integrated databases that help to answer these questions. They are useful to form a structure of operations and analyze the subject matter over a given time period.
The structure
As is the case with all computer applications, there are various steps involved in planning a data warehouse. The need is analyzed, and most of the time the end user is taken into consideration; their input forms an invaluable asset in building a customized database. The business requirements are analyzed and the "need" is discovered. That then becomes the focus area. If a company wants to analyze all its records and use the research to improve performance, a data warehouse allows the manager to focus on this area.
After the need is zeroed in on, a conceptual data model is designed. This model is then used as the basic structure that companies follow to build a physical database design. A number of iterations, technical decisions and prototypes are formulated. Then the systems development life cycle of design, development, implementation and support begins.
Collection of data
The project team analyzes the various kinds of data that need to go into the database and also where they can find the information needed to build it. There are two different kinds of data: data that can be found internally in the company, and data that comes from another source. Another team of professionals works on creating the extraction programs that are used to collect all the information that is needed from a number of databases, files or legacy systems. They identify these sources and then copy them onto a staging area outside the database. They clean all the data, which is described as cleansing, and make sure that it does not contain any errors. They then copy all the data into the data warehouse. This approach of extracting data from the source, followed by the selection and transformation processes, is a distinguishing feature of data warehousing. It is very important for the project to become successful. A lot of meticulous planning is involved in arriving at a step by step configuration of all the data from the source to the data warehouse.
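The extract, stage, cleanse and load steps described above can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not a description of any specific tool: the sample source rows, the function names (extract, cleanse, load), the sales table and the cleansing rules are all hypothetical.

```python
import sqlite3

# Hypothetical source rows standing in for an operational database, file or legacy system.
SOURCE_ROWS = [
    {"customer": " Alice ", "amount": "30.00", "sold_on": "2001-05-01"},
    {"customer": "Bob", "amount": "not-a-number", "sold_on": "2001-05-01"},
    {"customer": "", "amount": "12.50", "sold_on": "2001-05-02"},
    {"customer": "bob", "amount": "20", "sold_on": "2001-05-02"},
]


def extract(rows):
    """Copy source rows into a staging list so the source itself is never modified."""
    return [dict(row) for row in rows]


def cleanse(staged):
    """Split staged rows into clean tuples and rejected rows."""
    clean, rejected = [], []
    for row in staged:
        name = row["customer"].strip().title()  # normalise spacing and capitalisation
        try:
            amount = float(row["amount"])
        except ValueError:
            rejected.append(row)  # amount is not numeric
            continue
        if not name:
            rejected.append(row)  # customer name is missing
            continue
        clean.append((name, amount, row["sold_on"]))
    return clean, rejected


def load(conn, clean):
    """Insert the cleansed rows into the warehouse table and return its row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL, sold_on TEXT)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", clean)
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]


if __name__ == "__main__":
    staged_rows = extract(SOURCE_ROWS)
    clean_rows, rejected_rows = cleanse(staged_rows)
    warehouse = sqlite3.connect(":memory:")
    print("loaded:", load(warehouse, clean_rows), "rejected:", len(rejected_rows))
```

Keeping the rejected rows rather than silently dropping them is one possible design choice; it lets the team review what the cleansing step filtered out.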
Use of metadata
The whole process of extracting data and collecting it to make it an effective component in the operation requires "metadata". The transformation from an operational system to an analytical system is achieved only with maps of metadata. The transformational metadata includes the changes in names, the data changes and the physical characteristics that exist. It also includes the description of the data, its origin and its updates. Algorithms are used in summarizing the data. Metadata supports a graphical user interface that helps non-technical end users, which makes navigating and accessing the database richer. There is another form of metadata called operational metadata. This forms the fundamental structure for accessing the procedures and monitoring the growth of the data warehouse in relation to the available storage space. It also records who is responsible for accessing the data in the warehouse and in operational systems.
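One way to picture a transformation metadata map is as a list of records that tie each cryptic operational column to a readable warehouse name, its origin and a description. The sketch below is an assumption-laden illustration: the ColumnMapping class, the column names (CUST_NM, ORD_AMT, RGN_CD), the source system names and the region codes are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable

# Hypothetical lookup used to decode a coded source value.
REGION_CODES = {"CA": "California", "NY": "New York"}


@dataclass(frozen=True)
class ColumnMapping:
    source_name: str               # column name in the operational system
    target_name: str               # readable name used in the warehouse
    origin: str                    # system the value comes from
    description: str               # business meaning shown to end users
    convert: Callable[[str], Any]  # change applied to the raw value


MAPPINGS = [
    ColumnMapping("CUST_NM", "customer_name", "orders_db", "Name of the customer", str.strip),
    ColumnMapping("ORD_AMT", "order_amount", "orders_db", "Order total", float),
    ColumnMapping("RGN_CD", "region", "crm_system", "Sales region",
                  lambda code: REGION_CODES.get(code, "Unknown")),
]


def transform(record, mappings):
    """Rename and convert one operational record according to the metadata map."""
    return {m.target_name: m.convert(record[m.source_name]) for m in mappings}


def describe(mappings):
    """Build a small data dictionary that a non-technical user could browse."""
    return {
        m.target_name: f"{m.description} (from {m.origin}.{m.source_name})"
        for m in mappings
    }


if __name__ == "__main__":
    raw = {"CUST_NM": " Alice ", "ORD_AMT": "30.50", "RGN_CD": "CA"}
    print(transform(raw, MAPPINGS))
    print(describe(MAPPINGS))
```

The same map drives both the transformation and the data dictionary, so the names end users see stay in step with what the load process actually does.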
Data marts - specific data
Every database system needs to be updated. Some are updated by the day and some by the minute. However, if a specific department needs to monitor its own data in sync with the overall business process, it stores that data in a data mart. Data marts are not as big as a data warehouse and are useful for storing the data and information of a specific business module. The latest trend in data warehousing is to develop smaller data marts, manage each of them individually and later integrate them into the overall business structure.
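As a rough sketch of the idea, a data mart can be pictured as a smaller, department-specific table derived from the warehouse. The example below uses Python's sqlite3 module; the warehouse_sales table, the mart_<department> naming scheme and the sample rows are assumptions made purely for illustration.

```python
import sqlite3


def build_warehouse(conn):
    """Create a tiny stand-in for the enterprise-wide warehouse table."""
    conn.execute(
        "CREATE TABLE warehouse_sales (department TEXT, product TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO warehouse_sales VALUES (?, ?, ?)",
        [
            ("retail", "widget", 30.0),
            ("retail", "gadget", 15.0),
            ("wholesale", "widget", 200.0),
        ],
    )


def build_data_mart(conn, department):
    """Derive a per-department summary table (the mart) from the warehouse."""
    if not department.isidentifier():
        # The department becomes part of a table name, so reject unsafe values.
        raise ValueError("department must be a simple identifier")
    table = f"mart_{department}"
    conn.execute(f"DROP TABLE IF EXISTS {table}")
    conn.execute(f"CREATE TABLE {table} (product TEXT, total REAL)")
    conn.execute(
        f"INSERT INTO {table} "
        "SELECT product, SUM(amount) FROM warehouse_sales "
        "WHERE department = ? GROUP BY product",
        (department,),
    )
    return table


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    build_warehouse(connection)
    mart = build_data_mart(connection, "retail")
    print(connection.execute(f"SELECT * FROM {mart} ORDER BY product").fetchall())
```

Rebuilding the mart from the warehouse on each refresh keeps the department's view in sync with the overall data, at the cost of recomputing the summary every time.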
Security and reliability
As with any information system, the trustworthiness of data is determined by the trustworthiness of the hardware, software, and the procedures that created it. The reliability and authenticity of the data and information extracted from the warehouse will be a function of the reliability and authenticity of the warehouse and the various source systems that it encompasses.
In data warehouse environments specifically, there needs to be a means to ensure the integrity of data, first by having procedures to control the movement of data to the warehouse from operational systems and second by having controls to protect warehouse data from unauthorized changes. Data warehouse trustworthiness and security are contingent upon acquisition, transformation and access metadata and systems documentation.
In short, the basic need for every database is that it be secure and trustworthy.
Han and Kamber (2001) define a data warehouse as "A repository of information collected from multiple sources, stored under a unified schema, and which usually resides at a single site."
In educational terms, all past information available in electronic format about a school or district, such as budget, payroll, student achievement and demographics, is stored in one location where it can be accessed using a single set of inquiry tools.
These are some of the drivers that have led organizations to initiate data warehousing.
• CRM - Customer relationship management. There is a threat of losing customers due to poor quality and sometimes for unknown reasons that nobody ever explored. Retaining existing customers has been the most important feature of present day business. To facilitate good customer relationship management, companies are investing a lot of money to find out the exact needs of the consumer. As a result of this direct competition, the concept of customer relationship management came into existence. Data warehousing techniques have helped this cause enormously.
• Diminishing profit margins: Global competition has forced many companies that enjoyed generous profit margins on their products to reduce their prices to remain competitive. Since the cost of goods sold remains constant, companies need to manage their operations better to improve their operating margins. Data warehouses enable management decision support for managing business operations.
• Deregulation: The ever growing competition and the diminishing profit margins have made companies explore various new possibilities to play the game better. A company develops in one direction and establishes a particular core competency in the market. After it has its own speciality, it looks for new avenues into a new market with a completely new set of possibilities. For a company to venture into developing a new core competency, the concept of deregulation is very important. Data warehouses are used to provide this information. Data warehousing is useful in generating a cross-reference database that helps companies get into cross selling. This is the single most effective way that this can happen.
• The complete life cycle. The industry is very volatile: a wide range of new products appears every day and becomes obsolete in no time. The waiting time for the complete life cycle often results in a heavy loss of company resources. There was a need for a concept that would help in tracking all the volatile changes and updating them by the minute. This allowed companies to be extra safe in regard to all their products. The system is useful in tracking all the changes and helps the business decision process a great deal. In that respect, these are also described as business intelligence systems.
• Merging of businesses: As described above, as a direct result of growing competition, companies join forces to carve a niche in a particular market. This helps the companies to work towards a common goal with twice the number of resources. In case of such an event, there is a huge amount of data that has to be integrated. This data might be on different platforms and different operating systems. To have a centralized authority over the data, it is important that a business tool be created which is not only effective but also reliable. Data warehousing fits the need.
Relevance of Data Warehousing for organizations
Enterprises today, both nationally and globally, are in perpetual search of competitive advantage. An incontrovertible axiom of business management is that information is the key to gaining this advantage. Within this explosion of data are the clues management needs to define its market strategy. Data warehousing technology is a means of discovering and unearthing these clues, enabling organizations to competitively position themselves within market sectors. It is an increasingly popular and powerful concept of applying information technology to solving business problems. Companies use data warehouses to store information for marketing, sales and manufacturing to help managers get a feel for the data and run the business more effectively. Managers use sales data to improve forecasting and planning for brands, product lines and business areas. Retail purchasing managers use warehouses to track fast-moving lines and ensure an adequate supply of high-demand products. Financial analysts use warehouses to manage currency and exchange exposures, oversee cash flow and monitor capital expenditures.
Data warehousing has become very popular among organizations seeking competitive advantage by getting strategic information fast and easy (Adhikari, 1996). The reasons for organizations to have a data warehouse can be grouped into four sections:
• Warehousing data outside the operational systems:
The primary concept of data warehousing is that the data stored for business analysis can most effectively be