Overview
Why Business Intelligence?
Data analysis problems
Data Warehouse (DW) introduction
Analysis technologies that use the DW
  OLAP
  Data mining
  Visualization
A good DW is a prerequisite for using these technologies
BI Is Important
Palo Alto Management Group: BI = $113 billion in 2002
The Web makes BI more necessary
  Customers do not appear physically in the store
  Customers can change to other stores more easily
Thus:
  Know your customers using data and BI!
  Web logs make it possible to analyze customer behavior in more detail than before (what was not bought?)
  Combine web data with traditional customer data
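The "what was not bought?" question can be illustrated by combining a web log with traditional sales data. The following is only a minimal sketch: the customer IDs, product names, and the function `viewed_not_bought` are hypothetical and not part of the course material.

```python
# Hypothetical web log entries: (customer_id, product viewed)
WEB_LOG = [(1, "tent"), (1, "stove"), (2, "tent"), (2, "boots")]

# Hypothetical traditional sales data: (customer_id, product bought)
PURCHASES = [(1, "tent"), (2, "boots")]


def viewed_not_bought(web_log, purchases):
    """Return, per customer, the products that were viewed but never bought."""
    bought = set(purchases)
    result = {}
    for customer, product in web_log:
        if (customer, product) not in bought:
            result.setdefault(customer, []).append(product)
    return result


not_bought = viewed_not_bought(WEB_LOG, PURCHASES)
```

Sales data alone only records what was bought. The products that were looked at and then abandoned only become visible once the web log is joined in.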
Data Warehousing
Solution: new analysis environment (DW) where data are
  Subject oriented (versus function oriented)
  Integrated (logically and physically)
  Stable (data not deleted, several versions)
  Time variant (data can always be related to time)
  Supporting management decisions (different organization)
[Figure: source applications (Appl.) with their databases (DB) feed, via transformation (Trans.), the DW and data marts (DM); OLAP, data mining, and visualization run on top.]
[Figure: Subject-oriented systems. Applications (Appl.) and databases (DB) feed, via transformation (Trans.), a DW and data marts (DM) for selected subjects, used by D-Appl. applications.]
n x m versus n + m
[Figure: architecture diagrams built from Appl., DB, Trans., DM, and D-App boxes, illustrating n x m versus n + m connections.]
Architecture Alternative
[Figure: alternative architecture diagram (labels: Appl., DB, Trans., DM, DW, D-Appl.)]
Torben Bach Pedersen 2006 - DWML course
[Figure: architecture diagram (labels: DB, Appl., Trans., DM, DW, D-Appl.)]
In-between:
  1. Design of DW for DM1
  2. Design of DM2 and integration with DW
  3. Design of DM3 and integration with DW
  4. ...
Staging area
  Large, sequential bulk operations => flat files best?
Cleansing
  Data checked for missing parts and erroneous values
  Default values provided and out-of-range values marked
Transformation
  Data transformed to decision-oriented format
  Data from several sources merged, optimized for querying
Aggregation?
  Are individual business transactions needed in the DW?
Loading into DW
  Large bulk loads rather than SQL INSERTs
  Fast indexing (and pre-aggregation) required
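The cleansing, transformation, and loading steps above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the tooling used in the course: the field names (`store`, `amount`, `date`), the default value, the valid range, and all function names are assumptions made up for the example, and the "DW" is just an in-memory list.

```python
from collections import defaultdict

# Hypothetical raw rows as they might arrive in a staging area.
RAW_ROWS = [
    {"store": "Aalborg", "amount": "120.0", "date": "2001-03-14"},
    {"store": "", "amount": "80.0", "date": "2001-03-15"},            # missing store
    {"store": "Copenhagen", "amount": "-5.0", "date": "2001-04-02"},  # out of range
    {"store": "Copenhagen", "amount": "200.0", "date": "2001-04-03"},
]

DEFAULT_STORE = "UNKNOWN"        # assumed default for missing values
AMOUNT_RANGE = (0.0, 10_000.0)   # assumed valid range for amounts


def cleanse(row):
    """Provide default values and mark (not drop) out-of-range values."""
    amount = float(row["amount"])
    low, high = AMOUNT_RANGE
    return {
        "store": row["store"] or DEFAULT_STORE,
        "amount": amount,
        "date": row["date"],
        "out_of_range": not (low <= amount <= high),
    }


def transform(row):
    """Reshape a cleansed row into a decision-oriented format with a month key."""
    return {
        "store": row["store"],
        "month": row["date"][:7],  # "YYYY-MM"
        "amount": row["amount"],
        "out_of_range": row["out_of_range"],
    }


def bulk_load(rows):
    """Load all rows in one batch and pre-aggregate valid amounts per (store, month)."""
    warehouse = list(rows)
    totals = defaultdict(float)
    for r in warehouse:
        if not r["out_of_range"]:
            totals[(r["store"], r["month"])] += r["amount"]
    return warehouse, dict(totals)


warehouse, totals = bulk_load(transform(cleanse(r)) for r in RAW_ROWS)
```

Note that the out-of-range row is kept and flagged rather than deleted, and that the individual rows are loaded alongside the pre-aggregated totals. Whether to keep individual transactions at all is the open question raised under "Aggregation?".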
OLAP operations
Aggregation of data
  Standard aggregation operators, e.g., SUM
Starting level: (Quarter, City)
Roll Up: Less detail, Quarter->Year
Drill Down: More detail, Quarter->Month
Slice/Dice: Selection, Year=1999
Drill Across: Join
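The roll-up and slice operations can be illustrated on a tiny fact table. This is a sketch in plain Python under assumed data: the rows in `FACTS` and the helper `aggregate` are hypothetical, and a real OLAP server would run these operations against a cube rather than a list.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, quarter, city, sales)
FACTS = [
    (1999, "Q1", "Aalborg", 10),
    (1999, "Q1", "Copenhagen", 20),
    (1999, "Q2", "Aalborg", 30),
    (2000, "Q1", "Aalborg", 40),
    (2000, "Q1", "Copenhagen", 50),
]


def aggregate(facts, key):
    """SUM the sales measure, grouped by key(row)."""
    totals = defaultdict(int)
    for row in facts:
        totals[key(row)] += row[3]
    return dict(totals)


# Starting level: (Quarter, City)
by_quarter_city = aggregate(FACTS, lambda r: (r[0], r[1], r[2]))

# Roll up: less detail, Quarter -> Year
by_year_city = aggregate(FACTS, lambda r: (r[0], r[2]))

# Slice: selection Year = 1999, then aggregate the remaining rows
slice_1999 = [r for r in FACTS if r[0] == 1999]
by_quarter_1999 = aggregate(slice_1999, lambda r: (r[1], r[2]))
```

Drill down is the reverse direction of roll up: going from `by_year_city` back to `by_quarter_city`, or further to months, is only possible if the data is stored at that finer level of detail.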
Cube Example
[Figure: bar chart of Sales (scale 0 to 350) by Year (2000, 2001) and City (Aalborg, Copenhagen), including totals.]
OLAP example
Millions of clicks
Still fast query response due to specialized DBMS technology
OLAP applications
Reporting and querying
Problem and opportunity analysis
I (and most) use Business Intelligence to mean more than this
DW Applications: Visualization
Graphical presentation of complex results
Color, size, and form help to give a better overview
Prediction
  Predict/estimate unknown value based on similar cases
Clustering
  Partition data into groups so that the similarity within individual groups is greatest and the similarity between groups is smallest
Affinity grouping/associations
  Find associations/dependencies between data that occur together
  Rules: A -> B (c%,s%): if A occurs, B occurs with confidence c and support s
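Confidence and support for a rule A -> B can be computed directly from a set of transactions. The sketch below uses the commonly used definitions (support = fraction of all transactions containing both A and B, confidence = fraction of transactions containing A that also contain B). The slides do not spell these definitions out, so treat them as an assumption. The baskets and the function `rule_stats` are hypothetical.

```python
# Hypothetical market baskets (one set of items per transaction)
BASKETS = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"milk"},
]


def rule_stats(baskets, a, b):
    """Return (confidence, support) of the rule a -> b.

    support    = fraction of all baskets that contain both a and b
    confidence = fraction of the baskets containing a that also contain b
    """
    n_a = sum(1 for t in baskets if a <= t)
    n_ab = sum(1 for t in baskets if (a | b) <= t)
    support = n_ab / len(baskets)
    confidence = n_ab / n_a if n_a else 0.0
    return confidence, support


confidence, support = rule_stats(BASKETS, {"bread"}, {"butter"})
```

Here bread appears in three of the four baskets and bread together with butter in two, so the rule bread -> butter has support 2/4 and confidence 2/3.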
Common DW Issues
Metadata management
  Need to understand data = metadata needed
  Greater need than in OLTP applications as raw data is used
  Need to know about:
    Data definitions, dataflow, transformations, versions, usage, security
DW project management
  DW projects are large and different from ordinary SW projects
    12-36 months and 1+ million US$ per project
  Data marts are smaller and safer (bottom up approach)
Summary
Why Business Intelligence?
Data analysis problems
Data Warehouse (DW) introduction
Analysis technologies that use the DW
  OLAP
  Data mining
  Visualization
DWML Software
DW software
  MS SQL Server 2005 RDBMS
  MS Analysis Services
  MS Integration Services
  MS Reporting Services