Star Schema and Indexes

Data Warehouse
Fundamentals of Star Transformations
04/06/2015
Overview
Learning Objectives
Data Integrity
Keys, Indexes, Cardinality
Star Transformation
Source to Target Documentation
Conclusion
Questions
Learning Objectives
• To understand the purpose of the star transformation
• To understand what to index
• To understand when to index
Data Integrity
Data integrity refers to the overall completeness, accuracy, and consistency of data. It is normally enforced in a database system by a series of integrity constraints or rules:
• Null rule
• Unique column values
• Entity integrity
• Referential integrity
Null Rule
• A null rule is defined on a single column and allows or disallows inserts or updates of rows containing a null in that column.
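A minimal sketch of the null rule, using Python's built-in `sqlite3` module as a stand-in for the warehouse database (an assumption; the slides target Oracle). The `patient` table and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NOT NULL disallows nulls in patient_name; patient_address allows them.
conn.execute(
    "CREATE TABLE patient (patient_name TEXT NOT NULL, patient_address TEXT)"
)
conn.execute("INSERT INTO patient VALUES ('SMITH, JOHN', NULL)")  # allowed

try:
    conn.execute("INSERT INTO patient VALUES (NULL, '1 Main St')")
    null_rejected = False
except sqlite3.IntegrityError:
    # The null rule on patient_name rejects this row.
    null_rejected = True

print(null_rejected)  # True
```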
Unique Column Values
• A unique value rule defined on a column (or set of columns) allows the insert or update of a row only if it contains a unique value in that column (or set of columns).
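A sketch of a unique rule on a set of columns, again using `sqlite3` as a stand-in for Oracle. The hypothetical `claim_line` table makes each (claim, line) pair unique, mirroring the CLAIM and LINE columns used later in these slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# UNIQUE on a set of columns: each (claim, line) pair may appear only once.
conn.execute(
    "CREATE TABLE claim_line (claim INTEGER, line INTEGER, totbill REAL, "
    "UNIQUE (claim, line))"
)
conn.execute("INSERT INTO claim_line VALUES (98675, 1, 100.00)")
conn.execute("INSERT INTO claim_line VALUES (98675, 2, 100.00)")  # new pair: allowed

try:
    conn.execute("INSERT INTO claim_line VALUES (98675, 1, 250.00)")  # repeated pair
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)  # True
```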
Entity Integrity
• Concept of a primary key
• Every table must have a primary key, and the column or columns chosen as the primary key must be unique and not null.
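A sketch of entity integrity with `sqlite3` standing in for Oracle, using SHC# as a hypothetical natural primary key. `NOT NULL` is written out explicitly because SQLite, unlike Oracle, does not imply it for a non-INTEGER primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Primary key = unique + not null. SQLite needs NOT NULL spelled out here.
conn.execute(
    "CREATE TABLE patient_dim (shc TEXT NOT NULL PRIMARY KEY, patient_name TEXT)"
)
conn.execute("INSERT INTO patient_dim VALUES ('100-232-563', 'SMITH, JOHN')")

rejected = []
for row in [("100-232-563", "DOE, JANE"),  # duplicate key value
            (None, "MILLER, MIKE")]:       # null key value
    try:
        conn.execute("INSERT INTO patient_dim VALUES (?, ?)", row)
    except sqlite3.IntegrityError:
        rejected.append(row)

print(len(rejected))  # 2
```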
Referential Integrity
• Concept of a foreign key
• Usually the foreign-key value refers to a primary key value of another table in the database. Each foreign-key value must either match an existing key value in that table or be null.
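A sketch of referential integrity between a fact table and a dimension, with `sqlite3` standing in for Oracle. The table layouts are simplified assumptions; note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is on (per connection).
conn.execute("PRAGMA foreign_keys = ON")
conn.execute(
    "CREATE TABLE vendor_dim (vendor_dim_id INTEGER PRIMARY KEY, vendor_name TEXT)"
)
conn.execute(
    "CREATE TABLE claim_fact (claim INTEGER, line INTEGER, "
    "vendor_dim_id INTEGER REFERENCES vendor_dim (vendor_dim_id))"
)
conn.execute("INSERT INTO vendor_dim VALUES (3287, 'VU, LIU')")
conn.execute("INSERT INTO claim_fact VALUES (98675, 1, 3287)")  # parent exists

try:
    conn.execute("INSERT INTO claim_fact VALUES (25555, 6, 1256)")  # no vendor 1256
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(orphan_rejected)  # True
```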
Cardinality
• In data modeling, cardinality refers to how many rows in table A relate to rows in table B, for example "one-to-many" or "many-to-many." This is said to be the cardinality of a given table in relation to another.
• The "many" in a one-to-many relationship does not mean that there must be more than one instance of the child connected to a parent. It really means zero, one, or more instances of the child paired with the parent.
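The "zero, one, or more" point can be checked with a LEFT JOIN, which keeps parents that have no children. This sketch uses `sqlite3` and IDs from the star schema table later in the deck; the 'NEW, PATIENT' row (ID 99999) is a hypothetical parent with no claims.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patient_dim (patient_dim_id INTEGER PRIMARY KEY, patient_name TEXT);
    CREATE TABLE claim_fact (claim INTEGER, line INTEGER, patient_dim_id INTEGER);
    INSERT INTO patient_dim VALUES
        (68593, 'SMITH, JOHN'), (42563, 'DOE, JANE'), (99999, 'NEW, PATIENT');
    INSERT INTO claim_fact VALUES
        (98675, 1, 68593), (98675, 2, 68593), (25555, 6, 42563);
""")
# LEFT JOIN keeps parents without children, so a count of 0 appears too.
rows = conn.execute("""
    SELECT p.patient_name, COUNT(c.claim) AS claim_lines
    FROM patient_dim p
    LEFT JOIN claim_fact c ON c.patient_dim_id = p.patient_dim_id
    GROUP BY p.patient_dim_id, p.patient_name
    ORDER BY p.patient_name
""").fetchall()
print(rows)  # [('DOE, JANE', 1), ('NEW, PATIENT', 0), ('SMITH, JOHN', 2)]
```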
Cardinality
[Figure: Crow’s Foot notation used in an ERD (Entity Relationship Diagram)]
Cardinality
• When referring to indexes, cardinality refers to the number of distinct values in a particular column. In a PERSON table, for example, GENDER is likely to be a very low-cardinality column (only 5 values in GENDER_DIM in the DW), while PATIENT_ID is likely to be a very high-cardinality column (every row will have a different value).
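Index cardinality is simply a distinct count per column. This sketch uses `sqlite3` with hypothetical sample data: 1,000 patients whose gender cycles through five made-up codes, matching the "5 values" mentioned above. `column_cardinality` is a hypothetical helper, not a library function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (patient_id INTEGER, gender TEXT)")
# Hypothetical sample: 1,000 patients, gender cycling through 5 codes.
genders = ["M", "F", "U", "O", "X"]
conn.executemany(
    "INSERT INTO person VALUES (?, ?)",
    [(i, genders[i % 5]) for i in range(1000)],
)

def column_cardinality(conn, table, column):
    """Number of distinct values in table.column (identifiers are trusted here)."""
    sql = f"SELECT COUNT(DISTINCT {column}) FROM {table}"
    return conn.execute(sql).fetchone()[0]

print(column_cardinality(conn, "person", "gender"))      # 5    (low cardinality)
print(column_cardinality(conn, "person", "patient_id"))  # 1000 (high cardinality)
```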
Cardinality
• When looking at query plans, cardinality refers to the number of rows the optimizer expects a particular operation to return.
Indexes
Indexes provide faster access to data for operations that return a small portion of a table's rows.
In general, you should create an index on a column in any of the following situations:
• The column is filtered frequently.
• A UNIQUE key integrity constraint exists on the column (PK).
• A referential integrity constraint exists on the column (FK).
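To see an index on a frequently filtered column being picked up, this sketch compares SQLite's `EXPLAIN QUERY PLAN` output before and after creating it. SQLite is an assumption standing in for Oracle (whose plans look different), and the index name `idx_claim_fact_vendor` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE claim_fact (claim INTEGER, line INTEGER, "
    "totbill REAL, vendor_dim_id INTEGER)"
)

def plan(sql):
    # EXPLAIN QUERY PLAN rows have 4 columns; the last one is the detail text.
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT claim, totbill FROM claim_fact WHERE vendor_dim_id = 3287"
before = plan(query)  # a full scan, e.g. "SCAN claim_fact"
conn.execute("CREATE INDEX idx_claim_fact_vendor ON claim_fact (vendor_dim_id)")
after = plan(query)   # e.g. "SEARCH claim_fact USING INDEX idx_claim_fact_vendor ..."
print(before)
print(after)
```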
Indexes
Limit the Number of Indexes for Each Table
• The more indexes, the more overhead is incurred as the table is altered. When rows are inserted or deleted, all indexes on the table must be updated. When a column is updated, all indexes on the column must be updated.
• If a table is primarily read-only, you might use more indexes; but if a table is heavily updated, you might use fewer indexes.
Bitmap Index vs. B-tree Index
Internally, bitmap and B-tree indexes are very different, but functionally they serve the same purpose: helping Oracle retrieve rows faster than a full-table scan. The basic differences between B-tree and bitmap indexes include:
1. Syntax differences: The bitmap index includes the "bitmap" keyword (CREATE BITMAP INDEX). The B-tree index does not.
2. Cardinality differences: The bitmap index is generally for columns with lots of duplicate values (low cardinality), while B-tree indexes are best for high-cardinality columns.
B-tree Index vs. Bitmap Index
• A B-tree index keeps data sorted in a tree-like structure, and it walks the branches until it hits the node with the answer.
• A bitmap index keeps data sorted in a two-dimensional array with one column for every row in the table being indexed. It finds the answer by merging the bitmaps.
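The "merging the bitmaps" idea can be sketched in plain Python: each distinct value gets one bitmap with one bit per table row, and AND/OR predicates become bitwise AND/OR. This is a simplified in-memory model under stated assumptions, not Oracle's storage format; the rows and helper names are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows of a PATIENT table: (gender, state)
rows = [("F", "CA"), ("M", "CA"), ("F", "NY"), ("U", "CA"), ("F", "CA")]

def build_bitmap_index(rows, col):
    """Map each distinct value to an int whose bit i is set when row i has that value."""
    index = defaultdict(int)
    for rowid, row in enumerate(rows):
        index[row[col]] |= 1 << rowid
    return dict(index)

def rowids(bitmap):
    """Translate set bits back into row positions."""
    return [i for i in range(bitmap.bit_length()) if bitmap >> i & 1]

gender_idx = build_bitmap_index(rows, 0)
state_idx = build_bitmap_index(rows, 1)

# WHERE gender = 'F' AND state = 'CA'  ->  bitwise AND of two bitmaps
print(rowids(gender_idx["F"] & state_idx["CA"]))  # [0, 4]
# WHERE gender = 'M' OR gender = 'U'   ->  bitwise OR
print(rowids(gender_idx["M"] | gender_idx["U"]))  # [1, 3]
```

The low-cardinality GENDER column needs only three bitmaps here, which is why bitmap indexes suit columns with many duplicate values.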
Composite Index
You can create a composite index (using several columns, up to 32), and the same index can be used for queries that reference all of these columns, or just some of them; it is used most efficiently when the query filters on the leading column or columns.
In general, you should put the column expected to be used most often first in the index.
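A sketch of why column order matters, using SQLite's `EXPLAIN QUERY PLAN` as an assumed stand-in for Oracle. With the hypothetical index `idx_vendor_patient` on (vendor_dim_id, patient_dim_id), filters that include the leading column use the index, while a filter on only the second column falls back to a scan in SQLite (without `ANALYZE` statistics). Oracle can sometimes use an index skip scan in that last case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE claim_fact (claim INTEGER, vendor_dim_id INTEGER, "
    "patient_dim_id INTEGER, totbill REAL)"
)
# Composite index: vendor_dim_id is the leading column.
conn.execute(
    "CREATE INDEX idx_vendor_patient ON claim_fact (vendor_dim_id, patient_dim_id)"
)

def plan(sql):
    # Join the detail column of each EXPLAIN QUERY PLAN row.
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

both = plan("SELECT totbill FROM claim_fact "
            "WHERE vendor_dim_id = 3287 AND patient_dim_id = 68593")
leading = plan("SELECT totbill FROM claim_fact WHERE vendor_dim_id = 3287")
trailing = plan("SELECT totbill FROM claim_fact WHERE patient_dim_id = 68593")
for p in (both, leading, trailing):
    print(p)
```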
Primary Key vs. Unique Key
Indexes
Questions to ask clients:
• What data do you plan to filter?
• What are your required fields vs. ‘nice to haves’?
• Which columns uniquely identify each record?
Database Normalization
Database normalization is the process of organizing the attributes and tables of a relational database to minimize data redundancy.
First Normal Form (1NF)
First normal form sets the fundamental rules for database normalization and relates to a single table within a relational database system.
• Each column name in the table must be unique.
• Separate tables must be created for each set of related data.
• Each table must be identified with a unique column or concatenated columns called the primary key.
• No rows may be duplicated.
• No row/column intersection contains a null value.
• No row/column intersection contains a multivalued field.
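A small sketch of removing a multivalued field to reach 1NF. The unnormalized `dx_codes` field, which packs several diagnosis codes into one cell, is a hypothetical source layout; the codes come from the flat table later in the deck.

```python
# Hypothetical unnormalized rows: dx_codes packs several diagnoses into one field.
unnormalized = [
    {"claim": 98675, "dx_codes": "250.0, 276."},
    {"claim": 25555, "dx_codes": "656.20"},
]

# 1NF: one atomic value per row/column intersection, so emit one row per
# (claim, dx_code); that pair can then serve as a concatenated primary key.
first_normal_form = [
    (row["claim"], code.strip())
    for row in unnormalized
    for code in row["dx_codes"].split(",")
]
print(first_normal_form)
# [(98675, '250.0'), (98675, '276.'), (25555, '656.20')]
```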
Second Normal Form (2NF)
Second normal form builds on the first normal form (1NF).
1. Split up all data resulting in many-to-many relationships and store the data as separate tables.
2. Each nonkey attribute in the relation must be functionally dependent on the whole primary key, not on just part of a composite key.
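Third Normal Form (3NF)
3NF states that only foreign key columns should be used to reference another table; no other columns from the parent table should be repeated in the referencing (child) table. In other words, every nonkey column must depend on the primary key directly, not through another nonkey column.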
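A sketch of the 3NF rule above: vendor attributes copied into every claim line are moved into a parent table keyed by vendor number, and the child rows keep only that foreign key. The vendor numbers `V1` and `V2` are hypothetical; names and divisions come from the flat table later in the deck.

```python
# Hypothetical claim lines that repeat parent (vendor) columns in each child row.
claim_lines = [
    {"claim": 98675, "line": 1, "vendor_number": "V1",
     "vendor_name": "VU, LIU", "vendor_hmo_div": "FAMILY MEDICINE"},
    {"claim": 98675, "line": 2, "vendor_number": "V1",
     "vendor_name": "VU, LIU", "vendor_hmo_div": "FAMILY MEDICINE"},
    {"claim": 25555, "line": 6, "vendor_number": "V2",
     "vendor_name": "JONES, MARK", "vendor_hmo_div": "ORTHO"},
]

# Columns that depend on vendor_number rather than on the claim line's key.
parent_cols = ("vendor_name", "vendor_hmo_div")

# Parent table: each vendor's attributes stored once, keyed by vendor_number.
vendors = {}
for row in claim_lines:
    vendors[row["vendor_number"]] = {c: row[c] for c in parent_cols}

# Child table: keep only the foreign key column that references the parent.
claims = [
    {k: v for k, v in row.items() if k not in parent_cols}
    for row in claim_lines
]
print(vendors)
print(claims)
```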
Source to Target
Star Schema
The star schema is a physical database model that consists of one or more fact tables referencing any number of dimension tables.
Benefits of using a star schema:
• Improved query performance
• Load performance and administration
• Referential integrity
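A sketch of a typical star-schema query: join the fact table to its dimensions on indexed foreign keys and aggregate the fact measure. It uses `sqlite3` as a stand-in for Oracle and the rows from the star schema table in this deck; the patient and vendor names are taken from the flat-table example, matched row by row, and only two of the four dimensions are modeled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patient_dim (patient_dim_id INTEGER PRIMARY KEY, patient_name TEXT);
    CREATE TABLE vendor_dim  (vendor_dim_id  INTEGER PRIMARY KEY, vendor_name  TEXT);
    CREATE TABLE mca_claim_fact (
        claim INTEGER, line INTEGER, totbill REAL,
        patient_dim_id INTEGER REFERENCES patient_dim,
        vendor_dim_id  INTEGER REFERENCES vendor_dim
    );
    -- Index the fact table's foreign keys, the columns the star joins use.
    CREATE INDEX idx_fact_patient ON mca_claim_fact (patient_dim_id);
    CREATE INDEX idx_fact_vendor  ON mca_claim_fact (vendor_dim_id);
    INSERT INTO patient_dim VALUES
        (68593, 'SMITH, JOHN'), (42563, 'DOE, JANE'), (23254, 'MILLER, MIKE');
    INSERT INTO vendor_dim VALUES
        (3287, 'VU, LIU'), (1256, 'JONES, MARK'), (6985, 'KELLY, KRIS');
    INSERT INTO mca_claim_fact VALUES
        (98675, 1, 100.00, 68593, 3287),
        (98675, 2, 100.00, 68593, 3287),
        (25555, 6, 250.00, 42563, 1256),
        (68395, 3, 500.00, 23254, 6985);
""")
# Star join: descriptive labels come from the dimensions, the measure from the fact.
totals = conn.execute("""
    SELECT p.patient_name, v.vendor_name, SUM(f.totbill) AS total_billed
    FROM mca_claim_fact f
    JOIN patient_dim p ON p.patient_dim_id = f.patient_dim_id
    JOIN vendor_dim  v ON v.vendor_dim_id  = f.vendor_dim_id
    GROUP BY p.patient_name, v.vendor_name
    ORDER BY p.patient_name
""").fetchall()
print(totals)
```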
Star Schema
[Figure: star schema with MCA_CLAIM_FACT at the center, referencing four dimension tables]
• Patient_DIM: Patient_DIM_ID, Patient Name, Patient Address, SHC#
• Vendor_DIM: Vendor_DIM_ID, Vendor Number, Vendor Name, Vendor Address
• Diagnosis_DIM: DX_DIM_ID, Diagnosis Code, Diagnosis Desc
• Service Dt_DIM: DATE_DIM_ID, Month, Day, Quarter

Star Schema Table
CLAIM | LINE | TOTBILL | PATIENT_DIM_ID | VENDOR_DIM_ID | DX_DIM_ID
98675 | 1    | 100.00  | 68593          | 3287          | 2364
98675 | 2    | 100.00  | 68593          | 3287          | 3265
25555 | 6    | 250.00  | 42563          | 1256          | 8996
68395 | 3    | 500.00  | 23254          | 6985          | 5688

Flat Table (without Star Schema)
CLAIM | LINE | TOTBILL | PATIENT_SHC | PATIENT_NAME | VENDOR_NAME | VENDOR_HMO_DIV  | DX_CODE | DX_Desc
98675 | 1    | 100.00  | 100-232-563 | SMITH, JOHN  | VU, LIU     | FAMILY MEDICINE | 250.0   | DIABETES
98675 | 2    | 100.00  | 100-232-563 | SMITH, JOHN  | VU, LIU     | FAMILY MEDICINE | 276.    | CHF
25555 | 6    | 250.00  | 101-103-600 | DOE, JANE    | JONES, MARK | ORTHO           | 656.20  | ANKLE SPRAIN
68395 | 3    | 500.00  | 102-896-405 | MILLER, MIKE | KELLY, KRIS | PEDIATRICS      | 426.11  | FEVER
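The move from the flat table to the star schema table can be sketched as a surrogate-key lookup: each distinct natural key in a dimension gets a new integer ID, and the fact row stores only those IDs plus its own measures. The `Dimension` helper is hypothetical, vendor name is used as the vendor's natural key because the flat table has no vendor number (an assumption), and the generated IDs start at 1 rather than matching the IDs shown in the star schema table.

```python
from itertools import count

# Rows from the flat table above (a subset of its columns).
flat = [
    (98675, 1, 100.00, "100-232-563", "SMITH, JOHN", "VU, LIU", "250.0"),
    (98675, 2, 100.00, "100-232-563", "SMITH, JOHN", "VU, LIU", "276."),
    (25555, 6, 250.00, "101-103-600", "DOE, JANE", "JONES, MARK", "656.20"),
    (68395, 3, 500.00, "102-896-405", "MILLER, MIKE", "KELLY, KRIS", "426.11"),
]

class Dimension:
    """Hypothetical helper: assigns a surrogate key the first time a natural key is seen."""

    def __init__(self):
        self.keys = {}   # natural key -> surrogate key
        self.rows = {}   # surrogate key -> dimension attributes
        self._next_id = count(1)

    def lookup(self, natural_key, attributes):
        if natural_key not in self.keys:
            surrogate = next(self._next_id)
            self.keys[natural_key] = surrogate
            self.rows[surrogate] = attributes
        return self.keys[natural_key]

patient_dim, vendor_dim, dx_dim = Dimension(), Dimension(), Dimension()
fact = []
for claim, line, totbill, shc, patient_name, vendor_name, dx_code in flat:
    fact.append((
        claim, line, totbill,
        patient_dim.lookup(shc, {"shc": shc, "patient_name": patient_name}),
        vendor_dim.lookup(vendor_name, {"vendor_name": vendor_name}),
        dx_dim.lookup(dx_code, {"dx_code": dx_code}),
    ))

for row in fact:
    print(row)
```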
Snowflake Schema
• Structure in which a single fact table is
surrounded by one or more multileveled
dimensions
• Designed for flexible querying across more complex dimension relationships
• Suitable for many-to-many and one-to-
many relationships among related
dimension levels
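Snowflake Schema
[Figure: snowflake schema with MCA_CLAIM_FACT at the center, referencing Patient_DIM, Vendor_DIM, Diagnosis_DIM, and Service Dt_DIM; Gender_DIM forms an outer level on the Patient_DIM side and Division_DIM an outer level on the Vendor_DIM side]
• Gender_DIM: Gender_DIM_ID, Gender Code, Gender Desc
• Patient_DIM: Patient_DIM_ID, Patient Name, Patient Address, SHC#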

When to use Star vs. Snowflake
• If there are attributes in the lowest level dimension that need to be filtered on, use SNOWFLAKE.
Example: users plan to filter on Gender values (e.g., Male or Female) during their reporting.
• Gender_DIM: Gender_DIM_ID, Gender Code, Gender Desc
• Patient_DIM: Patient_DIM_ID, Patient Name, Patient Address, SHC#, Gender_DIM_ID
• MCA_CLAIM_FACT: Claim, Patient_DIM_ID, Vendor_DIM_ID, Diagnosis_DIM_ID, Service_Dt_DIM_ID
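A sketch of the snowflake case with `sqlite3` standing in for Oracle: the gender filter reaches the fact table through Patient_DIM and then Gender_DIM. The gender assignments and the Gender_DIM rows are hypothetical sample data; the claim rows come from the star schema table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE gender_dim (gender_dim_id INTEGER PRIMARY KEY,
                             gender_code TEXT, gender_desc TEXT);
    CREATE TABLE patient_dim (patient_dim_id INTEGER PRIMARY KEY, patient_name TEXT,
                              gender_dim_id INTEGER REFERENCES gender_dim);
    CREATE TABLE mca_claim_fact (claim INTEGER, line INTEGER, totbill REAL,
                                 patient_dim_id INTEGER REFERENCES patient_dim);
    INSERT INTO gender_dim VALUES (1, 'M', 'Male'), (2, 'F', 'Female');
    INSERT INTO patient_dim VALUES
        (68593, 'SMITH, JOHN', 1), (42563, 'DOE, JANE', 2), (23254, 'MILLER, MIKE', 1);
    INSERT INTO mca_claim_fact VALUES
        (98675, 1, 100.00, 68593), (98675, 2, 100.00, 68593),
        (25555, 6, 250.00, 42563), (68395, 3, 500.00, 23254);
""")
# Snowflake: filtering on gender takes two joins out from the fact table.
male_total = conn.execute("""
    SELECT SUM(f.totbill)
    FROM mca_claim_fact f
    JOIN patient_dim p ON p.patient_dim_id = f.patient_dim_id
    JOIN gender_dim  g ON g.gender_dim_id  = p.gender_dim_id
    WHERE g.gender_code = 'M'
""").fetchone()[0]
print(male_total)  # 700.0
```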
When to use Star vs. Snowflake
• If there are attributes in the lowest level dimension that
will only be displayed in reporting, but not filtered on,
“flatten out” the attributes in the highest level dimension.
This will create a Star Schema.
Example: the Patient_DIM table should contain the Gender values.
• Patient_DIM: Patient_DIM_ID, Patient Name, Patient Address, SHC#, Gender Code, Gender Desc
• MCA_CLAIM_FACT: Claim, Patient_DIM_ID, Vendor_DIM_ID, Diagnosis_DIM_ID, Service_Dt_DIM_ID
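The flattened version of the same data, again sketched with `sqlite3` and hypothetical gender values: Gender Code and Gender Desc live directly in Patient_DIM, so displaying them in a report takes a single join and there is no Gender_DIM table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Gender attributes flattened into Patient_DIM: no separate Gender_DIM table.
    CREATE TABLE patient_dim (patient_dim_id INTEGER PRIMARY KEY, patient_name TEXT,
                              gender_code TEXT, gender_desc TEXT);
    CREATE TABLE mca_claim_fact (claim INTEGER, line INTEGER, totbill REAL,
                                 patient_dim_id INTEGER REFERENCES patient_dim);
    INSERT INTO patient_dim VALUES
        (68593, 'SMITH, JOHN', 'M', 'Male'),
        (42563, 'DOE, JANE', 'F', 'Female'),
        (23254, 'MILLER, MIKE', 'M', 'Male');
    INSERT INTO mca_claim_fact VALUES
        (98675, 1, 100.00, 68593), (98675, 2, 100.00, 68593),
        (25555, 6, 250.00, 42563), (68395, 3, 500.00, 23254);
""")
# Star: gender is displayed straight from Patient_DIM with one join.
report = conn.execute("""
    SELECT f.claim, f.line, p.patient_name, p.gender_desc, f.totbill
    FROM mca_claim_fact f
    JOIN patient_dim p ON p.patient_dim_id = f.patient_dim_id
    ORDER BY f.claim, f.line
""").fetchall()
for row in report:
    print(row)
```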
Conclusion
Learning Objectives
Data Integrity
Keys, Indexes, Cardinality
Star Transformation
Source to Target Documentation
Questions?
