Você está na página 1de 9

Road Map

Facts and dimensions


Steps in dimensional modeling
Modeling example
Summary

Dimensional Modeling
Prof.dr.ing. Florin Radulescu
Universitatea Politehnica din Bucureti

Florin Radulescu, Note de curs

DMDW-12

Facts and dimensions

Facts

In the previous lesson we saw that:


A large enterprise data warehouse will
consist of 20 or more very similar-looking
data marts, with similar dimensional
models.
Each data mart may contain several fact
tables, each with 5 to 15 dimension tables.
Many of these dimension tables will be
shared between several fact tables.

A fact table represents a business process


and contains the values for the main
measurements describing that process.
For a sale business process, this table
contains for example quantity sold, total
price, total cost, total gross profit.
The attributes of a fact table may be
additive, semi-additive or non-additive

Florin Radulescu, Note de curs

DMDW-12

Florin Radulescu, Note de curs

Additive measures

Semi-Additive

Additive measures can be aggregated


across all dimensions.
All examples in the previous slide are
additive.
For example, SUM(total price) is
meaningful on all dimensions: time,
location, store, customer, etc.

 Semi-Additive measures can be aggregated


across some dimensions but not all.
 Here are for example periodic measurements:
account balance for a bank account or
inventory level for a retail chain.

 In the first case an average may be computed for


knowing the average daily balance but the sum of
daily balances is not meaningful.
 In the second case, inventory level is additive on
product and warehouse but not across time: the
sum of yesterday and today inventory level for a
given product is not a meaningful value.

Florin Radulescu, Note de curs

DMDW-12

DMDW-12

Florin Radulescu, Note de curs

DMDW-12

Non-additive measures
Non-additive measures cannot be aggregated
across all/any dimension.
A classical example is the unit price.
Considering a retail company, the sum of unit
prices along any dimension (product, customer,
location, etc.) is not meaningful.
For that reason, if these values can be computed
based on additive measures, the non-additive
measures are not stored in the fact tables.
For our example, the unit price can always be
computed dividing total cost by the quantity sold.

Grain
The level of detail of o record in a fact
table is called the grain of the table.
Besides business process measurements,
the fact table contains also foreign keys for
all the dimension tables and possibly
some pseudo-foreign keys for some
degenerate dimensions.
More details on this topic in the next
paragraphs of this lesson.

Florin Radulescu, Note de curs

DMDW-12

Florin Radulescu, Note de curs

DMDW-12

Dimensions

Dimensions

In [CS680, 2004], dimension tables are


characterized as follows:
 Represent the who, what, where, when and how of
a measurement/artifact
 Represent real-world entities not business
processes
 Give the context of a measurement (subject)
Example: in a retail company DW, the Sales fact
table can be linked with dimensions like Location
(Where), Time (When), Product (What), Customer
(Who), Sales Channel (How).

The Dimension Attributes are the columns of


the dimension table. [Wikipedia] lists some
features for these attributes:
Verbose - labels consisting of full words,
Descriptive,
Complete - no missing values,
Discretely valued - only one value per row in
dimensional table,
Quality assured - no misspelling, no
impossible values.

Florin Radulescu, Note de curs

DMDW-12

Florin Radulescu, Note de curs

DMDW-12

10

Star scheme

Advantages
 Each fact table is surrounded by several linked
dimension tables, as in Figure 1.
 Because of its appearance, such a construction is
called a star scheme.
 A star scheme has several advantages:
Is easy to understand. Graphic representations have
almost always this advantage
Provide better performance: data is de-normalized in
fact and dimension tables, so for obtaining a query
result needs only the joins between the fact table and
the implied dimensions
Is extensible. Attributes and dimensions may be
added easily
Florin Radulescu, Note de curs

11

DMDW-12

Florin Radulescu, Note de curs

12

DMDW-12

SQL Query

Snow-flake schemes

SELECT P.Name, SUM(S.Sales), . . .

Sometimes the dimension tables are


normalized: each dimension is stored as a
set of tables.
In this case, the scheme is called Snowflake scheme

FROM Sales S, Date D, Product P, Location L, Promotion R


WHERE S.Date_Id = D.Date_Id

// join condition

AND S.Poduct_Id = P.Product_Id


AND S.Location_Id = L.Location_Id
AND S.Promotion_Id = R.Promotion_Id
// additional condition
AND D.Month='JUN' AND D.Year='2012' AND L.Country_Name='ROU'
GROUP BY P.Product_Id, P.Name

// P.Name is needed because


// is in the SELECT clause

Florin Radulescu, Note de curs

DMDW-12

13

Florin Radulescu, Note de curs

DMDW-12

14

Example

Road Map
Facts and dimensions
Steps in dimensional modeling
Modeling example
Summary

Florin Radulescu, Note de curs

15

DMDW-12

Florin Radulescu, Note de curs

The four step approach


There are four steps in dimensional
modeling design.
These steps must be performed in
particular order and every review of a step
triggers the review of all subsequent steps.

The four step approach


These four steps are:
1. Select the business processes
2. Declare the grain
3. Choose the dimensions
4. Identify the facts

Florin Radulescu, Note de curs

17

DMDW-12

DMDW-12

16

Florin Radulescu, Note de curs

18

DMDW-12

Select the business processes


 An organization has several departments
and carries out several business
processes.
Selecting the business process does not
refer to one structural department of the
organization but to a process carried out
by one or more departments together.

Select the business processes


In a general company for example some
main business processes are:
Supply chain management
Orders,
Shipments,
Invoicing,
Stocking and inventory
General ledger

Florin Radulescu, Note de curs

DMDW-12

19

Florin Radulescu, Note de curs

No duplicate data

No duplicate data

 This approach also ensures that data contains no duplicate


data.
 If a department approach in structuring the data warehouse is
used, same data may be used by several departments and
must be presented redundantly in the DW.
 For example, inventory data are used for supply chain
management but also for production management in a car
factory.
 A data warehouse organized based on departmental structure
will duplicate inventory data but organizing it on business
processes will avoid redundancy and both departments
supply management and production will use the same data.

No duplicate data means also that:


Consistency is better preserved; redundancy
is known for inducing consistency problems
(as the update anomaly)
Data are published once and this single view
is used as published for decision support in all
departments and activities.

Florin Radulescu, Note de curs

21

DMDW-12

Florin Radulescu, Note de curs

22

Step 2: Declare the grain


 Each line in a fact table is a grain in our data
warehouse. In step 2 of the dimensional design
process the level of detail for these lines / grains must
be defined.
 Thinking at a retail company with registered users (like
Metro or Selgros), for the POS sales business
process, a grain may be:
1. An individual line item on a customers retail sales
ticket or invoice, as measured by a scanner device
(in that case the same item may be on several lines
in the same ticket/invoice because the quantity was
greater than one and each product was scanned
individually).
DMDW-12

DMDW-12

Step 2: Declare the grain


2. The same significance as above but lines
containing the same part number are
summarized in a single line.
3. A daily reunion of the sales tickets of a customer
containing items and prices.
4. A sales ticket / invoice for a customer. The same
customer may have several sales tickets each
day.
5. A daily summary on the sales tickets of a
customer, containing only total sales amount.
6. A weekly summary on the sales tickets of a
customer containing only total sales amount.

Florin Radulescu, Note de curs

23

DMDW-12

20

Florin Radulescu, Note de curs

24

DMDW-12

Discussion

Discussion

In the first three cases data on what products


bought each customer is preserved in the
DW, so there is a low detail lever allowing
much more queries and much more slicing
and dicing actions: reports, analysis, etc.
For the last three cases these data are
removed, the level of detail is higher and only
a part of the above actions may be
performed.

A key idea in choosing the granularity level is


emphasized in [Kimball, Ross, 2002]:
 Preferably you should develop dimensional models
for the most atomic information captured by a
business process. Atomic data is the most detailed
information collected; such data cannot be subdivided
further.
and
 A data warehouse almost always demands data
expressed at the lowest possible grain of each
dimension not because queries want to see individual
low-level rows, but because queries need to cut
through the details in very precise ways.

Florin Radulescu, Note de curs

DMDW-12

25

Florin Radulescu, Note de curs

Atomic data features

Atomic data features

Some features of atomic data listed in


Kimball & Ross book are:
Is highly dimensional,
Being highly dimensional, detailed and atomic
fact measurement allow more dimensions and
so data may be drilled in more ways,
Dimensional approach is favored by atomic
data, each extra dimension being easily
added to the star schemes,

Features cont.:
Provides maximum analytic flexibility,
Detailed data allow more ad hoc queries,
Low level grain does not prohibit adding also
summary high level grain in the DW for speeding
up frequent queries and reports.

Note that declaring the grain is a critical step.


If later the granularity choice is proved to be
wrong, the process must go back to step 2 for
re-declaring the grain correctly, and after that
steps 3 and 4 must be run again.

Florin Radulescu, Note de curs

DMDW-12

27

Step 3: Choose the dimensions


 Knowing the grain, dimensions can be determined
easily: each dimension represents a possible fact table
line description.
 Example of common dimensions used in a sales data
warehouse:
 Product,
 Customer,
 Date,
 Ticket number
 Status,
 Store,
 Salesperson
 Promotion

Florin Radulescu, Note de curs

28

DMDW-12

DMDW-12

Step 3: Choose the dimensions


Dimensions can be found by asking
ourselves haw can we describe a single
line in the fact table.
In the above example, a line in the fact
table represents a single line on a sales
ticket.
This line is about selling a product to a
customer at a given date and in a given
store possibly under a promotion.

Florin Radulescu, Note de curs

29

DMDW-12

26

Florin Radulescu, Note de curs

30

DMDW-12

Step 3: Choose the dimensions

Step 4: Identify the facts

The line is on a ticket having a number, a


status (for example paid by credit card)
and has been made by a particular
salesperson.
The number of attributes in a dimension
table is not so small.
This lesson presents an example showing
that each dimension may have several
tens of attributes.

 Every line in a fact table must contain some attribute


values.
 These attributes represents the measures assigned to
that business process that must be determined at this
step.
 In the case of a star scheme containing data on POS
retail sales in a store chain, possible attributes of the
fact table are:
 Quantity sold additive value
 Total line value amount additive value
 Line cost amount additive value
 Line profit amount additive value
 Unit price not an additive value

Florin Radulescu, Note de curs

DMDW-12

31

Florin Radulescu, Note de curs

DMDW-12

32

Discussion

Discussion

The product sold, store, date, time,


customer, promotion, sales ticket and
other data are also identified by the linked
records in the corresponding dimension
tables.
In the fact table are stored only information
specific to an association of an instance
for each dimension.

Additive measures are preferred. So unit price,


which is not an additive value, will be removed
because it can be computed by division from the
total line value amount and quantity sold.
Redundant data can be stored in a fact table if
they are additive or semi-additive.
For example, Line profit amount may be computed
by subtracting the cost amount from the value
amount.
The presence of these redundant values is
allowed for speeding up processing.

Florin Radulescu, Note de curs

DMDW-12

33

Florin Radulescu, Note de curs

Road Map

Modeling example
 A retail sales modeling example is presented in
[Kimball, Ross, 2002] for a store chain.
 Each store has several departments and sales
several tens of thousands items (called stock
keeping units SKU).
 Each SKU has either a universal product code
imprinted by the manufacturers or a local code for
bulk goods (for example agricultural products vegetables and fruits, meat, bakery, etc.).
 Package variation of a product is another SKU and
by consequence has a different code.

Facts and dimensions


Steps in dimensional modeling
Modeling example
Summary

Florin Radulescu, Note de curs

35

DMDW-12

34

DMDW-12

Florin Radulescu, Note de curs

36

DMDW-12

Modeling example

Step 1

Some products are sold under some


promotions. There are four types of
promotions in the modeling example cited:

Step 1: Select the business process


The most important business process in a
retail company is customer purchases as
captured by the POS system, so the
business process modeled is POS retail
sales.

Temporary price reductions,


Ads in newspapers and newspaper inserts,
Displays in the store (end-aisle displays
included),
Coupons.
Florin Radulescu, Note de curs

DMDW-12

37

Florin Radulescu, Note de curs

DMDW-12

38

Step 2

Step 3

Step 2: Declare the grain


As seen earlier, in dimensional modeling is
preferably to store atomic information,
data collected at the POS location and not
summarized or aggregated data based on
POS transactions.
So the grain in this modeling example will
be an individual line on the sales ticket
generated by the POS.

Step 3: Choose the dimensions


As presented earlier, a line on a sales
ticket is about selling a product to a
customer at a given date and in a given
store possibly under a promotion.
The line is on a ticket having a number
and is made by a particular salesperson.

Florin Radulescu, Note de curs

DMDW-12

39

Florin Radulescu, Note de curs

Step 3 star scheme


Product
Product_Key (PK)
Product attributes

Date
Date_Key (PK)
Date attributes

Step 3 - details

Store
Store_Key (PK)
Store attributes
POS_Sales
Product_Key (FK)
Date_Key (FK)
Store_Key (FK)
SP_Key (FK)
Promotion_Key (FK)
Ticket_number (FK)
Fact table attributes

From this definition of the grain, the


dimensions are:
Product
Date
Store
Promotion
Salesperson
Sales ticket

Salesperson
SP_Key (PK)
SP attributes

Promotion
Promotion_Key (PK)
Promotion attributes

Florin Radulescu, Note de curs

41

DMDW-12

40

DMDW-12

Florin Radulescu, Note de curs

42

DMDW-12

Degenerate dimensions

Degenerate dimensions

The sales ticket is a so-called degenerate


dimension.
Such degenerate dimensions come from
operational control numbers: ticket
number, order number, invoice number,
and so on.
These dimensions are empty - without
other attributes.

No associated dimension table is present in the


star scheme, but they are necessary in some
queries, for example for finding products sold
together in the same sales basket.
For these dimensions only the pseudo-foreign
key associated with the dimension is present as
attribute in the fact table.
Every dimension has a surrogate primary key
and this key is also contained as foreign key in
the fact table.

Florin Radulescu, Note de curs

DMDW-12

43

Florin Radulescu, Note de curs

DMDW-12

44

Step 4: Identify the facts

Star scheme again

The candidate fact attributes:

Product
Product_Key (PK)
Product attributes

Quantity_sold additive value


Line_amount additive value
Cost_amount additive value
Profit amount additive value
Unit_price not an additive value

Date
Date_Key (PK)
Date attributes

POS_Sales
Product_Key (FK)
Date_Key (FK)
Store_Key (FK)
SP_Key (FK)
Promotion_Key (FK)
Ticket_number (FK)
Quantity_sold
Line_amount
Cost_amount
Profit amount

DMDW-12

Florin Radulescu, Note de curs

DMDW-12

46

Discussion

Dimension attributes example

Because Unit_price is not additive, this


attribute will not be stored in the fact table
but can always be computed by dividing
Line_amount to Quantity_sold.
This applies to percentages and ratios,
(almost all are non-additive values) and
and in these cases the numerator and
denominator must be stored in the fact
table.

Date Key (PK)


Date
Full Date Description
Day of Week
Day Number in Epoch
Week Number in Epoch
Month Number in Epoch
Day Number in Calendar Month
Day Number in Calendar Year
Day Number in Fiscal Month
Day Number in Fiscal Year
Last Day in Week Indicator
Last Day in Month Indicator
Calendar Week Ending Date
Calendar Week Number in Year
Calendar Month Name
Calendar Month Number in Year
Calendar Year-Month (YYYY-MM)

Florin Radulescu, Note de curs

47

Salesperson
SP_Key (PK)
SP attributes

Promotion
Promotion_Key (PK)
Promotion attributes

Florin Radulescu, Note de curs

45

Store
Store_Key (PK)
Store attributes

DMDW-12

Calendar Quarter
Calendar Year-Quarter
Calendar Half Year
Calendar Year
Fiscal Week
Fiscal Week Number in Year
Fiscal Month
Fiscal Month Number in Year
Fiscal Year-Month
Fiscal Quarter
Fiscal Year-Quarter
Fiscal Half Year
Fiscal Year
Holiday Indicator
Weekday Indicator
Selling Season
Major Event
SQL Date Stamp

Florin Radulescu, Note de curs

48

DMDW-12

Product attributes example


Product Key (PK)
SKU Number (Natural Key)
Category Description
Package Type Description
Fat Content
Weight
Storage Type
Shelf Width
Shelf Depth

Store attributes example

Product Description
Brand Description
Department Description
Package Size
Diet Type
Weight Units of Measure
Shelf Life Type
Shelf Height

Store Name
Store Number (Natural Key)
Store Street Address
Store City
Store County
Store State
Store Zip Code
Store Manager
Store District

Store Region
Floor Plan Type
Photo Processing Type
Financial Service Type
Selling Square Footage
Total Square Footage
First Open Date
Last Remodel Date

Florin Radulescu, Note de curs

DMDW-12

49

Florin Radulescu, Note de curs

Promotion attributes example


Promotion Key (PK)
Promotion Name
Price Reduction Type
Promotion Media Type
Ad Type
Display Type

DMDW-12

50

Summary
This course presented the dimensional model
of data warehouses:

Coupon Type
Ad Media Name
Display Provider
Promotion Cost
Promotion Begin Date
Promotion End Date

Definitions for facts and dimensions, definitions


for star scheme and snow-flake scheme.
The four steps in dimensional modeling: identify
the business process, declare the grain, choose
dimensions and identify the facts
A modeling example for a sales chain with
illustration of attributes in fact and dimension
tables

Next week: Data warehouse case study


Florin Radulescu, Note de curs

DMDW-12

51

Florin Radulescu, Note de curs

52

DMDW-12

References
[CS680, 2004] Introduction to Data Warehouses, Drexel Univ. CS
680 Course notes, 2004 (page
https://www.cs.drexel.edu/~dvista/cs680/2.DW.Overview.ppt
visited 2010)
[Kimball, Ross, 2002] Ralph Kimball, Margy Ross - The Data
Warehouse Toolkit, Second Edition, Wiley & Sons, 2002
[Wikipedia] Wikipedia, the free encyclopedia, en.wikipedia.org

Florin Radulescu, Note de curs

53

DMDW-12

Você também pode gostar