The Key Acronyms And Definitions
What's the need for a Data Warehouse?
Difference Between DWH and OLTP
Data Acquisition
Data Loading
Good Link
Difference between Data mart and DWH
Top-down and Bottom-up Approach
ETL Application Development Example
Staging
Informatica (9.5)
Informatica Corporation Products
PowerCenter Integration Service (PCIS)
Transformations
Mapplets
Tasks and types of Tasks
Worklet and types of Worklets
Dimensions
Installation and Practice
What is the purpose of this article?
This article outlines what you need to know to build a solid understanding of ETL, BI and
Informatica.
Homogeneous Joins :
Homogeneous means same. A join made between sources of the same kind of database is known as a
Homogeneous Join.
Heterogeneous Joins :
Heterogeneous means different. A join made between sources of different kinds of database is known
as a Heterogeneous Join.
Flat Files :
A file saved with an extension such as .txt, .csv or .dat. There are two kinds of Flat Files:
(1) Delimited Flat Files and (2) Fixed-width Flat Files. Fixed-width Flat Files can perform better
than Delimited Flat Files during extraction, because each field sits at a known position in the line.
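As a minimal sketch of the two layouts, the Python snippet below parses the same records from a delimited file and from a fixed-width file. The field names, the column widths and the sample values are assumptions made for illustration; they do not come from any Informatica source definition.

```python
import csv
import io

# Hypothetical fixed-width layout: (field name, width in characters).
FIXED_WIDTHS = [("cust_id", 5), ("cust_name", 10), ("gender", 1)]
FIELD_NAMES = [name for name, _ in FIXED_WIDTHS]


def parse_delimited(text, delimiter=","):
    """Parse delimited rows; field boundaries are found by scanning for the delimiter."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return [dict(zip(FIELD_NAMES, row)) for row in reader if row]


def parse_fixed_width(text):
    """Parse fixed-width rows; each field is sliced at a known offset."""
    records = []
    for line in text.splitlines():
        if not line:
            continue
        record, start = {}, 0
        for name, width in FIXED_WIDTHS:
            record[name] = line[start:start + width].strip()
            start += width
        records.append(record)
    return records


sample = [("101", "Asha", "F"), ("102", "Ravi", "M")]

# Build both file contents from the same sample rows.
delimited_data = "".join(",".join(row) + "\n" for row in sample)
fixed_width_data = "".join(f"{cid:<5}{name:<10}{g}\n" for cid, name, g in sample)

delimited_rows = parse_delimited(delimited_data)
fixed_rows = parse_fixed_width(fixed_width_data)
```

Both parsers return the same list of records; the only difference is how the field boundaries are located.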
File List :
A File List is a list of data files that are grouped together. The files should share similar
metadata definitions.
XML Source Qualifier Transformation :
It reads data from XML files, which are saved with the .xml extension.
Detailed data
Summarized data
Data Acquisition
Data Acquisition means ETL. It is the process of:
Extracting data from multiple OLTP source systems
Transforming the data into the desired business format
Loading the data into a destination system
There are two types of ETL for performing Data Acquisition:
Data Loading
It is the process of inserting data into a destination system. There are two types of data loading:
1. Initial Load (or Full Load)
2. Incremental Load (or Delta Load)
1. Initial Load: The process of inserting the source rows into empty target tables. In the Initial
Load, all of the source data is loaded into the target.
2. Incremental Load: The process of inserting only the records that are new since the Initial Load.
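The two load types can be sketched in Python with an in-memory SQLite table standing in for the target. This is an illustration under stated assumptions, not how PowerCenter implements loading: the function names are hypothetical, and new records are detected by comparing primary keys, whereas real projects often rely on timestamps or change data capture.

```python
import sqlite3

INSERT_SQL = "INSERT INTO dim_cust (cust_id, cust_name) VALUES (?, ?)"


def initial_load(conn, source_rows):
    """Initial (full) load: insert every source row into the empty target."""
    conn.executemany(INSERT_SQL, source_rows)
    conn.commit()
    return len(source_rows)


def incremental_load(conn, source_rows):
    """Incremental (delta) load: insert only rows whose key is not in the target yet."""
    existing = {row[0] for row in conn.execute("SELECT cust_id FROM dim_cust")}
    new_rows = [row for row in source_rows if row[0] not in existing]
    conn.executemany(INSERT_SQL, new_rows)
    conn.commit()
    return len(new_rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dim_cust (cust_id INTEGER PRIMARY KEY, cust_name TEXT)")

source_day1 = [(1, "Asha"), (2, "Ravi")]
source_day2 = source_day1 + [(3, "Meena")]  # one new record arrives later

loaded_initial = initial_load(conn, source_day1)
loaded_delta = incremental_load(conn, source_day2)
total_rows = conn.execute("SELECT COUNT(*) FROM dim_cust").fetchone()[0]
```

Here the initial load inserts both day-one rows, and the incremental load inserts only the one record that was not already in the target.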
This raises the question "How do we do it?" and brings us to the Top-Down Approach and the
Bottom-Up Approach.
Good Link
http://www.nagesh.com/publications/technology/173-inmon-vs-kimball-an-analysis.html
The DWH gurus Inmon and Kimball described the Top-Down Approach and the Bottom-Up Approach.
1. Top-down Approach (INMON)
In Inmon's approach, we develop an Enterprise Data Warehouse first and then derive
department-specific, subject-oriented databases, called Data marts, from it.
2. Bottom-up Approach (KIMBALL)
In Kimball's approach, we develop department-specific, subject-oriented databases, called Data
marts, first and then consolidate the Data marts into an Enterprise Data Warehouse.
DWH
Designed for top management, who take decisions at the enterprise level (CEO etc.)
Recommended to be built on a high-range database, for example Teradata Database
Data mart
Designed to store department-specific information
Designed for middle-level management, who take decisions at the department level
Recommended to be built on a mid-range database, for example Oracle Database
[Figure: Enterprise DWH and Finance Data mart]
Bottom-up Approach:
[Figure: Finance and HR Data marts and Enterprise DWH]
[Figure: ETL application development example. Source: OLTP Database table customer (cust_id,
cust_first_name, gender). Target: DWH Database table dim_cust (cust_id, cust_name, gender). The
source and target metadata are shown with ODBC connections.]
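The customer-to-dim_cust example above can be sketched end to end in Python. Two in-memory SQLite databases stand in for the OLTP and DWH databases, and the table and column names follow the figure. The transformation rules (trimming and title-casing the name, upper-casing the gender) are assumptions chosen only to make the transform step visible; the original does not state them.

```python
import sqlite3


def extract(oltp):
    """Extract: read the source rows from the OLTP customer table."""
    query = "SELECT cust_id, cust_first_name, gender FROM customer ORDER BY cust_id"
    return oltp.execute(query).fetchall()


def transform(rows):
    """Transform: apply the (assumed) business rules to each row."""
    return [
        (cust_id, first_name.strip().title(), gender.strip().upper())
        for cust_id, first_name, gender in rows
    ]


def load(dwh, rows):
    """Load: insert the transformed rows into the dim_cust target table."""
    dwh.executemany(
        "INSERT INTO dim_cust (cust_id, cust_name, gender) VALUES (?, ?, ?)", rows
    )
    dwh.commit()


# Source system with a few sample rows.
oltp = sqlite3.connect(":memory:")
oltp.execute(
    "CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, cust_first_name TEXT, gender TEXT)"
)
oltp.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)", [(1, "  asha ", "f"), (2, "RAVI", "m")]
)

# Empty target system.
dwh = sqlite3.connect(":memory:")
dwh.execute(
    "CREATE TABLE dim_cust (cust_id INTEGER PRIMARY KEY, cust_name TEXT, gender TEXT)"
)

load(dwh, transform(extract(oltp)))
dim_rows = dwh.execute(
    "SELECT cust_id, cust_name, gender FROM dim_cust ORDER BY cust_id"
).fetchall()
```

Keeping extract, transform and load as separate functions mirrors the three steps of Data Acquisition described earlier.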
Staging
Staging is a temporary database and an intermediate layer between the OLTP system and the
DWH.
The staging area is a replica of the OLTP system.
It can be thought of as a data parking area where the data parks temporarily. It can hold
data collected from one or more OLTP databases.
Informatica (9.5)
This brings us to the main Informatica topics. Informatica PowerCenter is a data integration
platform that accesses data from multiple OLTP source systems, transforms it, and delivers it
throughout the enterprise at any speed.
Founded by Mr. Gaurav Dhillon.
The CEO at the time of writing is Mr. Sohaib Abbasi (his professional achievements make
interesting and inspiring reading. Do not miss them!)
Transformations
A Transformation is a PowerCenter object that lets you define business rules for processing data.
There are two types of Transformations:
1) Active Transformation (New feature from Informatica PowerCenter 9.1)
2) Passive Transformation
1) Active Transformation :
A Transformation that can change the number of rows passing through it while processing the
data is known as an Active Transformation. Below is the list of Active Transformations used for
processing the data :
Note : t/r means Transformation
1. Source Qualifier t/r
2. Filter t/r
3. Router t/r
4. Sorter t/r
5. Aggregator t/r
6. Rank t/r
7. Joiner t/r
8. Union t/r
9. Update Strategy t/r
10. Transaction Control t/r
11. Normalizer t/r
2) Passive Transformation :
A Transformation that does not change the number of rows passing through it while
processing the data is known as a Passive Transformation. Below is the list of Passive
Transformations used for processing the data :
1. Expression t/r
2. Lookup t/r (Very important Topic)
3. Sequence Generator t/r
4. Stored Procedure t/r
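The difference between the two kinds of transformation can be illustrated in plain Python. This is only an analogy under stated assumptions: the function names, the sample orders and the 10% tax rate are hypothetical, and the code does not use any PowerCenter API.

```python
def filter_transformation(rows, predicate):
    """Active: rows failing the predicate are dropped, so the row count can change."""
    return [row for row in rows if predicate(row)]


def expression_transformation(rows, expression):
    """Passive: each input row produces exactly one output row."""
    return [expression(row) for row in rows]


orders = [
    {"order_id": 1, "amount": 250.0},
    {"order_id": 2, "amount": 40.0},
    {"order_id": 3, "amount": 975.5},
]

# Filter-style step: keeps only the orders of 100 or more.
large_orders = filter_transformation(orders, lambda row: row["amount"] >= 100)

# Expression-style step: adds a derived column to every order.
orders_with_tax = expression_transformation(
    orders,
    lambda row: {**row, "amount_with_tax": round(row["amount"] * 1.10, 2)},
)
```

The filter step returns fewer rows than it received, while the expression step returns the same number of rows with an extra column.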
Mapplets
A Mapplet is a reusable object that implements business rules using a set of transformations.
A Mapplet is created using the Mapplet Designer tool in the Designer client component.
There are two types of Mapplets :
1. Active Mapplet :
A Mapplet that contains at least one Active Transformation is known as an Active Mapplet.
2. Passive Mapplet :
A Mapplet that contains only Passive Transformations is known as a Passive Mapplet.
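The classification rule can be sketched as a small Python model: a mapplet is treated as a reusable chain of steps and counts as active if any step is active. The `Transformation` and `Mapplet` classes and the sample steps are hypothetical names introduced for illustration; they are not PowerCenter objects.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Rows = List[dict]


@dataclass(frozen=True)
class Transformation:
    name: str
    apply: Callable[[Rows], Rows]
    is_active: bool  # True if the step may change the number of rows


@dataclass(frozen=True)
class Mapplet:
    name: str
    steps: Tuple[Transformation, ...]

    @property
    def is_active(self) -> bool:
        """Active if at least one contained transformation is active."""
        return any(step.is_active for step in self.steps)

    def run(self, rows: Rows) -> Rows:
        """Apply every step in order; the same mapplet can be reused on any input."""
        for step in self.steps:
            rows = step.apply(rows)
        return rows


upper_name = Transformation(
    "upper_name",
    lambda rows: [{**r, "cust_name": r["cust_name"].upper()} for r in rows],
    is_active=False,
)
drop_unknown_gender = Transformation(
    "drop_unknown_gender",
    lambda rows: [r for r in rows if r["gender"] in ("F", "M")],
    is_active=True,
)

passive_mapplet = Mapplet("standardize", (upper_name,))
active_mapplet = Mapplet("standardize_and_filter", (upper_name, drop_unknown_gender))

sample_rows = [
    {"cust_name": "asha", "gender": "F"},
    {"cust_name": "ravi", "gender": "?"},
]
passive_result = passive_mapplet.run(sample_rows)
active_result = active_mapplet.run(sample_rows)
```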
Non-Reusable Worklet :
A Worklet that is created specifically for one Workflow is known as a Non-Reusable Worklet.
It is created using the "Workflow Designer" tool and can be converted to a Reusable Worklet.
Dimensions
Star Schema :
A Star Schema is a database design that contains a centrally located Fact Table surrounded
radially by multiple Dimension Tables.
Because the design looks like a star, it is known as a Star Schema. There are two variants of
Star Schema:
1. Simple Star Schema
2. Complex Star Schema
Simple Star Schema :
It contains only one Fact Table.
Complex Star Schema :
It contains more than one Fact Table.
Fact Table :
Dimension table :
A Dimension table holds descriptive data and text that describe the Key Performance
Indicators, known as Facts.
Related dimensions are organized in dimension tables.
A dimension table can organize the data in a hierarchical format, and dimension tables are
de-normalized.
A dimension can answer business questions such as "who, what, when,
where".
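A minimal sketch of a simple star schema, using Python's sqlite3 module: one fact table references two dimension tables, and each query answers a business question by joining the fact table to a dimension. Apart from dim_cust, the table names, columns and sample data are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_cust (cust_id INTEGER PRIMARY KEY, cust_name TEXT, gender TEXT);
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
    -- The fact table sits at the centre and holds the measures plus foreign keys.
    CREATE TABLE fact_sales (
        cust_id INTEGER REFERENCES dim_cust (cust_id),
        product_id INTEGER REFERENCES dim_product (product_id),
        sale_amount REAL
    );
    INSERT INTO dim_cust VALUES (1, 'Asha', 'F'), (2, 'Ravi', 'M');
    INSERT INTO dim_product VALUES (10, 'Laptop', 'Electronics'), (20, 'Desk', 'Furniture');
    INSERT INTO fact_sales VALUES (1, 10, 900.0), (1, 20, 150.0), (2, 10, 1100.0);
    """
)

# "What" was sold: total sales per product category.
sales_by_category = conn.execute(
    """
    SELECT p.category, SUM(f.sale_amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.category
    ORDER BY p.category
    """
).fetchall()

# "Who" bought: total sales per customer.
sales_by_customer = conn.execute(
    """
    SELECT c.cust_name, SUM(f.sale_amount)
    FROM fact_sales AS f
    JOIN dim_cust AS c ON c.cust_id = f.cust_id
    GROUP BY c.cust_name
    ORDER BY c.cust_name
    """
).fetchall()
```

Each query needs only one join per dimension because the dimension tables are de-normalized.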
Snowflake Schema :
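Very large de-normalized dimension tables are split into normalized dimension tables.
A dimension table may have parent tables.
Advantage : redundancy is reduced and the dimension tables are smaller, which can help some queries
Disadvantage : more joins are needed, which usually makes queries slower and more complex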
Snowflake Schema
Galaxy Schema :
It is also known as Hybrid Schema (or) Complex Star Schema (or) Multi Star Schema (or) Integrated
Schema (or) Constellation Schema
Fact Constellation :
It is the process of joining two Fact Tables using constraints (Primary and Foreign Keys)
Conformed Dimensions :
A dimension table that is shared by multiple Fact Tables is known as a Conformed
Dimension.
Slowly Changing Dimensions (SCD) :
A dimension that captures the changes that take place over time is known as a
Slowly Changing Dimension.
There are three types of Slowly Changing Dimensions:
1. TYPE 1 Dimension : It captures only current data. It does not maintain history.
2. TYPE 2 Dimension : It captures complete historical data. For each update in the OLTP system it
inserts a new record in the target.
3. TYPE 3 Dimension : It captures partial historical information in the target, that is, the
current and previous values.
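The three types can be compared with a small Python sketch that applies the same change (a customer moving to a new city) in each style. The `city`, `previous_city` and `is_current` columns and the function names are assumptions made for illustration. Real Type 2 dimensions usually also carry a surrogate key and effective dates, which are left out here to keep the sketch short.

```python
def apply_type1(dimension, cust_id, new_city):
    """Type 1: overwrite the attribute in place; no history is kept."""
    for row in dimension:
        if row["cust_id"] == cust_id:
            row["city"] = new_city


def apply_type2(dimension, cust_id, new_city):
    """Type 2: flag the current row as old and insert a new current row."""
    for row in dimension:
        if row["cust_id"] == cust_id and row["is_current"]:
            row["is_current"] = False
    dimension.append({"cust_id": cust_id, "city": new_city, "is_current": True})


def apply_type3(dimension, cust_id, new_city):
    """Type 3: keep the current and the previous value in the same row."""
    for row in dimension:
        if row["cust_id"] == cust_id:
            row["previous_city"] = row["city"]
            row["city"] = new_city


type1_dim = [{"cust_id": 1, "city": "Pune"}]
type2_dim = [{"cust_id": 1, "city": "Pune", "is_current": True}]
type3_dim = [{"cust_id": 1, "city": "Pune", "previous_city": None}]

# The same source update is applied in all three styles.
apply_type1(type1_dim, 1, "Delhi")
apply_type2(type2_dim, 1, "Delhi")
apply_type3(type3_dim, 1, "Delhi")
```

After the update, the Type 1 table holds one row with only the new city, the Type 2 table holds two rows (the old one flagged as not current), and the Type 3 table holds one row with both the current and the previous city.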
Link to Download :
Registering a normal Oracle account is sufficient to download the software:
https://edelivery.oracle.com/
The following covers the 64-bit installation. Navigate to the download page, click "Go" and then "Continue".
Scroll down or use Ctrl+F to find the download and start downloading.
Tip :