
1. What is the definition of data warehousing?

A data warehouse is a subject-oriented, integrated, time-variant, non-volatile
collection of data in support of management's decision-making process.
Subject Oriented
Data warehouses are designed to help you analyze data. For example,
to learn more about your company's sales data, you can build a warehouse
that concentrates on sales. Using this warehouse, you can answer questions
like "Who was our best customer for this item last year?" This ability to
define a data warehouse by subject matter, sales in this case, makes the data
warehouse subject oriented.
Integrated
Integration is closely related to subject orientation. Data warehouses must
put data from disparate sources into a consistent format. They must
resolve such problems as naming conflicts and inconsistencies among units
of measure. When they achieve this, they are said to be integrated.
Nonvolatile
Nonvolatile means that, once entered into the warehouse, data
should not change. This is logical because the purpose of a warehouse is
to enable you to analyze what has occurred.
Time Variant
In order to discover trends in business, analysts need large amounts
of data. This is very much in contrast to online transaction processing
(OLTP) systems, where performance requirements demand that historical
data be moved to an archive. A data warehouse's focus on change over time
is what is meant by the term time variant.
2. How many stages are there in data warehousing?
A data warehouse implementation generally includes two stages:
ETL
Report Generation
ETL
Short for extract, transform, load, three database functions that are
combined into one tool

Extract -- the process of reading data from a source database.

Transform -- the process of converting the extracted data from its previous form into the required form.
Load -- the process of writing the data into the target database.
ETL is used to migrate data from one database to another, to form data
marts and data warehouses, and also to convert databases from one format to
another.

It is used to retrieve data from various operational databases, transform it
into useful information, and finally load it into the data warehousing system.
1. Informatica
2. Ab Initio
3. DataStage
4. BODI
5. Oracle Warehouse Builder
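As a rough illustration only, the three ETL steps can be sketched in plain SQL; all table and column names below are hypothetical, and in practice a tool such as those listed above generates or manages this logic:

-- Extract from the source tables, transform, and load into the target in one pass
INSERT INTO fact_sales (customer_key, product_key, sale_date, sales_amount)
SELECT c.customer_key,
       p.product_key,
       s.sale_date,
       s.quantity * s.unit_price        -- transform: derive the measure
FROM   src_sales    s
JOIN   dim_customer c ON c.customer_id = s.customer_id
JOIN   dim_product  p ON p.product_id  = s.product_id;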
Report generation
In report generation, OLAP (online analytical processing) is used. It
is a set of specifications that allows client applications to retrieve
data for analytical processing.
It is a specialized tool that sits between a database and the user in order to
provide various analyses of the data stored in the database.
An OLAP tool is a reporting tool that generates reports useful for
decision support for top-level management.
1. Business Objects
2. Cognos
3. MicroStrategy
4. Hyperion
5. Oracle Express
6. Microsoft Analysis Services


Difference between OLTP and OLAP

OLTP -- Application oriented (e.g., a purchase order is the functionality of an application); OLAP -- Subject oriented (subject in the sense of customer, product, item, time)
OLTP -- Used to run the business; OLAP -- Used to analyze the business
OLTP -- Detailed data; OLAP -- Summarized data
OLTP -- Repetitive access; OLAP -- Ad-hoc access
OLTP -- Few records accessed at a time (tens), simple queries; OLAP -- Large volumes accessed at a time (millions), complex queries
OLTP -- Small database; OLAP -- Large database
OLTP -- Current data; OLAP -- Historical data
OLTP -- Clerical user; OLAP -- Knowledge user
OLTP -- Row-by-row loading; OLAP -- Bulk loading
OLTP -- Time invariant; OLAP -- Time variant
OLTP -- Normalized data, E-R schema; OLAP -- De-normalized data, star schema


3. What are the types of data warehousing?
EDW (Enterprise data warehouse)
It provides a central database for decision support throughout the enterprise.
It is a collection of data marts.
DATAMART
It is a subset of a data warehouse.
It is a subject-oriented database that supports the needs of individual
departments in an organization.
It is called a high-performance query structure.
It supports a particular line of business, like sales, marketing, etc.
ODS (Operational data store)
It is defined as an integrated view of operational databases designed to
support operational monitoring.
It is a collection of operational data sources designed to support transaction
processing.
Data is refreshed in near real time and used for business activity.
It is an intermediate layer between OLTP and OLAP which helps to create
instant reports.

4. What data modeling is involved in data warehouse architecture?

5. What are the types of approaches in DWH?


Bottom-up approach: first we develop the data marts, then we integrate
these data marts into an EDW.
Top-down approach: first we develop the EDW, then from that EDW we
develop the data marts.
Bottom up: OLTP -> ETL -> Data mart -> DWH -> OLAP
Top down: OLTP -> ETL -> DWH -> Data mart -> OLAP
Top down
Cost of initial planning & design is high.
Takes a longer duration, often more than a year.
Bottom up
Planning & designing the data marts proceeds without waiting for the global
warehouse design.
Immediate results from the data marts.
Tends to take less time to implement.
Errors in critical modules are detected earlier.
Benefits are realized in the early phases.
It is generally considered the best approach.
Data Modeling Types:
Conceptual Data Modeling
Logical Data Modeling
Physical Data Modeling
Dimensional Data Modeling
1. Conceptual Data Modeling
The conceptual data model includes all major entities and relationships, does
not contain much detail about attributes, and is often used in the INITIAL
PLANNING PHASE.
The conceptual data model is created by gathering business requirements from
various sources like business documents, discussions with functional teams,
business analysts, subject matter experts, and end users who do the reporting
on the database. Data modelers create the conceptual data model and forward
that model to the functional team for their review.
Conceptual data modeling gives an idea to the functional and
technical team about how business requirements would be
projected in the logical data model.

2. Logical Data Modeling


This is the actual implementation and extension of a conceptual data
model. Logical data model includes all required entities, attributes, key
groups, and relationships that represent business information and
define business rules.
3. Physical Data Modeling
Physical data model includes all required tables, columns, relationships,
database properties for the physical implementation of databases.

Database performance, indexing strategy, physical storage and
denormalization are important parameters of a physical model.
Logical vs. Physical Data Modeling

Logical Data Model -- Physical Data Model
Represents business information and defines business rules -- Represents the physical implementation of the model in a database
Entity -- Table
Attribute -- Column
Primary Key -- Primary Key Constraint
Alternate Key -- Unique Constraint or Unique Index
Inversion Key Entry -- Non-Unique Index
Rule -- Check Constraint, Default Value
Relationship -- Foreign Key
Definition -- Comment


Dimensional Data Modeling


A dimensional model consists of fact and dimension tables.
It is an approach used to develop the schema designs for the database.
Types of Dimensional modeling
Star schema
Snow flake schema
Star flake schema (or) Hybrid schema
Multi star schema
What is Star Schema?
A star schema is a logical database design that contains a centrally located
fact table surrounded by one or more dimension tables.
Since the database design looks like a star, it is called a star schema.
The dimension tables contain primary keys and textual descriptions; they
contain de-normalized business information.
A fact table contains a composite key and measures.
The measures are key performance indicators used to evaluate the
enterprise's performance in terms of success and failure.
Eg: total revenue, product sales, discount given, number of customers.
To generate a meaningful report, the report should draw on at least one
dimension table and one fact table.
Advantages of the star schema:
Fewer joins
Improved query performance
Easy slicing and dicing
Easy understanding of data
Disadvantage:
Requires more storage space

Example of Star Schema:
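As a sketch only (the table and column names are hypothetical), a sales star schema could be declared like this:

-- Dimension tables: surrogate primary keys plus textual descriptions
CREATE TABLE dim_product  (product_key  INTEGER PRIMARY KEY, product_name VARCHAR(100), category VARCHAR(50));
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, customer_name VARCHAR(100), city VARCHAR(50));
CREATE TABLE dim_time     (time_key     INTEGER PRIMARY KEY, calendar_date DATE, month_name VARCHAR(20), year_no INTEGER);

-- Fact table: a composite key of dimension keys plus the measures
CREATE TABLE fact_sales (
    product_key  INTEGER REFERENCES dim_product(product_key),
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    time_key     INTEGER REFERENCES dim_time(time_key),
    sales_amount DECIMAL(12,2),
    quantity     INTEGER
);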


Snowflake Schema
A snowflake schema results when the dimension tables of a star schema are
split into one or more additional dimension tables: the de-normalized
dimension tables are split into normalized dimension tables.
Example of Snowflake Schema:
In Snowflake schema, the example diagram shown below has 4 dimension
tables, 4 lookup tables and 1 fact table. The reason is that hierarchies
(category, branch, state, and month) are being broken out of the dimension
tables (PRODUCT, ORGANIZATION, LOCATION, and TIME) respectively and
separately.
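A hedged SQL sketch of that normalization for the PRODUCT dimension (hypothetical names), showing why an extra join appears:

-- The category hierarchy is broken out of the product dimension into a lookup table
CREATE TABLE dim_category (category_key INTEGER PRIMARY KEY, category_name VARCHAR(50));
CREATE TABLE dim_product  (
    product_key  INTEGER PRIMARY KEY,
    product_name VARCHAR(100),
    category_key INTEGER REFERENCES dim_category(category_key)
);
-- A report on sales by category now joins fact -> dim_product -> dim_category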
It increases the number of joins and leads to poorer performance when
retrieving data.
A few organizations normalize the dimension tables in this way to save space.
Since dimension tables hold relatively little space anyway, the snowflake
schema approach is often avoided.
Bitmap indexes cannot be utilized as effectively.
Important aspects of Star Schema & Snow Flake Schema
In a star schema every dimension will have a primary key.
In a star schema, a dimension table will not have any parent table.
Whereas in a snow flake schema, a dimension table will have one or more
parent tables.
Hierarchies for the dimensions are stored in the dimensional table itself in
star schema.
Whereas hierarchies are broken into separate tables in snow flake schema.
These hierarchies help to drill down the data from topmost hierarchies to the
lowermost hierarchies.
Star flake schema (or) Hybrid Schema
Hybrid schema is a combination of Star and Snowflake schema
Multi Star schema
Multiple fact tables sharing a set of dimension tables
Conformed dimensions are nothing but reusable dimensions: the dimensions
you use multiple times or in multiple data marts. They are common across
different data marts.
Measure Types (or) Types of Facts
Additive - Measures that can be summed up across all dimensions.
  Ex: Sales revenue
Semi Additive - Measures that can be summed up across some dimensions but not others.
  Ex: Current balance
Non Additive - Measures that cannot be summed up across any of the dimensions.
  Ex: Student attendance
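For instance (hypothetical tables), sales revenue can be summed across every dimension, while a balance from a periodic snapshot must not be summed across time:

-- Additive: revenue can be summed over time, product and customer
SELECT time_key, SUM(sales_amount) AS total_sales
FROM   fact_sales
GROUP  BY time_key;

-- Semi-additive: a balance is averaged (or taken as the latest value) across time, never summed
SELECT account_key, AVG(balance) AS avg_balance
FROM   fact_account_snapshot
GROUP  BY account_key;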
Surrogate Key
Joins between fact and dimension tables should be based on surrogate keys
Users should not obtain any information by looking at these keys
These keys should be simple integers

A sample data warehouse schema


WHY DO WE NEED A STAGING AREA FOR A DWH?
A staging area is needed to clean operational data before loading it into the
data warehouse.
Cleaning here means merging data that comes from different sources.
It is the area where most of the ETL work is done.
Data Cleansing
It is used to remove duplicates.
It is used to correct wrong email addresses.
It is used to identify missing data.
It is used to convert data types.
It is used to capitalize names and addresses.
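A minimal SQL sketch of a couple of these cleansing steps, assuming a hypothetical customer feed:

-- De-duplicate, standardize case, and drop rows with missing e-mail addresses
INSERT INTO stg_customer (customer_id, customer_name, email)
SELECT DISTINCT customer_id,
       UPPER(customer_name),            -- capitalize names
       LOWER(email)
FROM   src_customer
WHERE  email IS NOT NULL;               -- exclude rows with missing data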
Types of Dimensions:
The main types of dimensions are:
Conformed Dimensions
Junk (Garbage) Dimensions
Degenerate Dimensions
Slowly Changing Dimensions
Conformed Dimension: something that can be shared by multiple fact tables or
multiple data marts.
Junk (Garbage) Dimension: a grouping of flag values.
Degenerate Dimension: something dimensional in nature that exists in the fact
table (e.g., invoice number). It is neither a fact nor strictly a dimension
attribute, but it is useful for some kinds of analysis; such attributes are
kept in the fact table and are called degenerate dimensions.
Degenerate dimension: A column of the key section of the fact table
that does not have the associated dimension table but used for
reporting and analysis, such column is called degenerate dimension or line
item dimension.
For example, we have a fact table with customer_id, product_id, branch_id,
employee_id, bill_no, and date in the key section and price, quantity, and
amount in the measure section. In this fact table, bill_no in the key section
is a single value; it has no associated dimension table. Instead of creating a
separate dimension table for that single value, we can include it in the fact
table to improve performance. So here the column bill_no is a degenerate
dimension or line item dimension.
Informatica Architecture

The PowerCenter domain

It is the primary unit of administration.
An installation can have a single domain or multiple domains.
A domain is a collection of nodes and services.
Nodes
A node is the logical representation of a machine in a domain
One node in the domain acts as a gateway node to receive service requests
from clients and route them to the appropriate service and node
Integration Service:
Integration Service does all the real job. It extracts data from sources,
processes it as per the business logic and loads data to targets.
Repository Service:
The Repository Service is used to fetch data from the repository and send it
back to the requesting components (mostly the client tools and the
Integration Service).
Power Center Repository:
Repository is nothing but a relational database which stores all the metadata
created in Power Center.
Power Center Client Tools:
The Power Center Client consists of multiple tools.
Power Center Administration Console:
This is simply a web-based administration tool you can use to administer the
Power Center installation.

Q. How can you define a transformation? What are the different types of
transformations available in Informatica?
A. A transformation is a repository object that generates, modifies, or passes
data. The Designer provides a set of transformations that perform specific
functions. For example, an Aggregator transformation performs calculations
on groups of data. Below are the various transformations available in
Informatica:
Aggregator
Custom
Expression
External Procedure
Filter
Input
Joiner
Lookup
Normalizer
Rank
Router
Sequence Generator
Sorter
Source Qualifier
Stored Procedure
Transaction Control
Union
Update Strategy
XML Generator
XML Parser
XML Source Qualifier
Q. What is a source qualifier? What is meant by Query Override?
A. Source Qualifier represents the rows that the PowerCenter Server reads
from a relational or flat file source when it runs a session. When a relational
or a flat file source definition is added to a mapping, it is connected to a
Source Qualifier transformation.
The PowerCenter Server generates a query for each Source Qualifier
transformation whenever it runs the session. The default query is a SELECT
statement containing all the source columns. The Source Qualifier can
override this default query by changing the default settings of the
transformation properties. The list of selected ports and the order in which
they appear in the default query should not be changed in the overridden query.
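As an example (hypothetical table and columns), an override might only add a filter while keeping the same ports in the same order as the default query:

SELECT CUSTOMERS.CUSTOMER_ID,
       CUSTOMERS.CUSTOMER_NAME,
       CUSTOMERS.CITY
FROM   CUSTOMERS
WHERE  CUSTOMERS.STATUS = 'ACTIVE'      -- added condition; port list and order unchanged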
Q. What is aggregator transformation?
A. The Aggregator transformation allows performing aggregate calculations,
such as averages and sums. Unlike Expression Transformation, the
Aggregator transformation can only be used to perform calculations on

groups. The Expression transformation permits calculations on a row-by-row
basis only.
Aggregator Transformation contains group by ports that indicate how to
group the data. While grouping the data, the aggregator transformation
outputs the last row of each group unless otherwise specified in the
transformation properties.
Various group by functions available in Informatica are : AVG, COUNT, FIRST,
LAST, MAX, MEDIAN, MIN, PERCENTILE, STDDEV, SUM, VARIANCE.
Q. What is Incremental Aggregation?
A. Whenever a session is created for a mapping Aggregate Transformation,
the session option for Incremental Aggregation can be enabled. When
PowerCenter performs incremental aggregation, it passes new source
data through the mapping and uses historical cache data to perform
new aggregation calculations incrementally.
Q. How Union Transformation is used?
A. The union transformation is a multiple input group transformation that can
be used to merge data from various sources (or pipelines). This
transformation works just like UNION ALL statement in SQL, that is used to
combine result set of two SELECT statements.
Q. Can two flat files be joined with Joiner Transformation?
A. Yes, joiner transformation can be used to join data from two flat file
sources.
Q. What is a look up transformation?
A. This transformation is used to lookup data in a flat file or a relational table,
view or synonym. It compares lookup transformation ports (input ports) to
the source column values based on the lookup condition. Later returned
values can be passed to other transformations.
Q. Can a lookup be done on Flat Files?
A. Yes.
Q. What is a mapplet?
A. A mapplet is a reusable object that is created using mapplet designer. The
mapplet contains set of transformations and it allows us to reuse that
transformation logic in multiple mappings.
Q. What does reusable transformation mean?
A. Reusable transformations can be used multiple times in a mapping. The
reusable
transformation is stored as a metadata separate from any other mapping
that uses the
transformation. Whenever any changes to a reusable transformation are
made, all the mappings where the transformation is used will be invalidated.
Q. What is update strategy and what are the options for update
strategy?

A. Informatica processes the source data row by row. By default every row is
marked to be inserted in the target table. If the row has to be updated or
inserted based on some logic, the Update Strategy transformation is used.
The condition can be specified in the Update Strategy to mark the processed
row for update or insert.
Following options are available for update strategy:
DD_INSERT: If this is used the Update Strategy flags the row for insertion.
Equivalent numeric value of DD_INSERT is 0.
DD_UPDATE: If this is used the Update Strategy flags the row for update.
Equivalent numeric value of DD_UPDATE is 1.
DD_DELETE: If this is used the Update Strategy flags the row for deletion.
Equivalent numeric value of DD_DELETE is 2.
DD_REJECT: If this is used the Update Strategy flags the row for rejection.
Equivalent numeric value of DD_REJECT is 3.

Q. What are the types of loading in Informatica?


There are two types of loading, 1. Normal loading and 2. Bulk loading.
In normal loading, it loads record by record and writes log for that. It takes
comparatively a longer time to load data to the target.
In bulk loading, it loads number of records at a time to target database. It
takes less time to load data to target.
Q. What is aggregate cache in aggregator transformation?
The aggregator stores data in the aggregate cache until it completes
aggregate calculations. When you run a session that uses an aggregator
transformation, the informatica server creates index and data caches in
memory to process the transformation. If the informatica server requires
more space, it stores overflow values in cache files.
Q. What type of repositories can be created using Informatica
Repository Manager?
A. Informatica PowerCenter includes following type of repositories:
Standalone Repository: A repository that functions individually and this
is unrelated to any other repositories.
Global Repository: This is a centralized repository in a domain. This
repository can
contain shared objects across the repositories in a domain. The objects are
shared through global shortcuts.
Local Repository: A local repository is within a domain and is not a
global repository. A local repository can connect to a global repository using
global shortcuts and can use objects in its shared folders.

Versioned Repository: This can be either a local or a global repository, but it
allows version control for the repository. A versioned repository can store
multiple copies, or versions, of an object. This feature allows efficiently
developing, testing and deploying metadata into the production environment.
Q. What is a code page?
A. A code page contains encoding to specify characters in a set of one or
more languages. The code page is selected based on source of the data. For
example if source contains Japanese text then the code page should be
selected to support Japanese text.
When a code page is chosen, the program or application for which the code
page is set, refers to a specific set of data that describes the characters the
application recognizes. This influences the way that application stores,
receives, and sends character data.
Q. Which all databases PowerCenter Server on Windows can connect
to?
A. PowerCenter Server on Windows can connect to following databases:
IBM DB2
Informix
Microsoft Access
Microsoft Excel
Microsoft SQL Server
Oracle
Sybase
Teradata
Q. Which all databases PowerCenter Server on UNIX can connect to?
A. PowerCenter Server on UNIX can connect to following databases:
IBM DB2
Informix
Oracle
Sybase
Teradata
Q. How to execute PL/SQL script from Informatica mapping?
A. Stored Procedure (SP) transformation can be used to execute PL/SQL
Scripts. In SP
Transformation PL/SQL procedure name can be specified. Whenever the
session is executed, the session will call the pl/sql procedure.
Q. What is Data Driven?
The informatica server follows instructions coded into update strategy
transformations within the session mapping which determine how to flag
records for insert, update, delete or reject. If we do not choose data driven
option setting, the informatica server ignores all update strategy
transformations in the mapping.

Q. What are the types of mapping wizards that are provided in Informatica?
The Designer provides two mapping wizards.
1. Getting Started Wizard - Creates mapping to load static facts and
dimension tables as well as slowly growing dimension tables.
2. Slowly Changing Dimensions Wizard - Creates mappings to load
slowly changing dimension tables based on the amount of historical
dimension data we want to keep and the method we choose to handle
historical dimension data.
Q. What is Load Manager?
A. While running a Workflow, the PowerCenter Server uses the Load
Manager
process and the Data Transformation Manager Process (DTM) to run
the workflow and carry out workflow tasks. When the PowerCenter Server
runs a workflow, the Load Manager performs the following tasks:
1. Locks the workflow and reads workflow properties.
2. Reads the parameter file and expands workflow variables.
3. Creates the workflow log file.
4. Runs workflow tasks.
5. Distributes sessions to worker servers.
6. Starts the DTM to run sessions.
7. Runs sessions from master servers.
8. Sends post-session email if the DTM terminates abnormally.
When the PowerCenter Server runs a session, the DTM performs the
following tasks:
1. Fetches session and mapping metadata from the repository.
2. Creates and expands session variables.
3. Creates the session log file.
4. Validates session code pages if data code page validation is enabled.
Checks query conversions if data code page validation is disabled.
5. Verifies connection object permissions.
6. Runs pre-session shell commands.
7. Runs pre-session stored procedures and SQL.
8. Creates and runs mappings, reader, writer, and transformation threads to
extract,
transform, and load data.
9. Runs post-session stored procedures and SQL.
10. Runs post-session shell commands.
11. Sends post-session email.
Q. What is Data Transformation Manager?
A. After the load manager performs validations for the session, it creates the
DTM process. The DTM process is the second process associated with the
session run. The primary purpose of the DTM process is to create and manage
the threads that carry out the session tasks.
The DTM allocates process memory for the session and divides it into buffers.
This is also known as buffer memory. It creates the main thread, which is
called the master thread. The master thread creates and manages all other
threads.
If we partition a session, the DTM creates a set of threads for each partition
to allow concurrent processing. When the Informatica server writes messages to
the session log it includes the thread type and thread ID.
Following are the types of threads that the DTM creates:
Master Thread - Main thread of the DTM process. Creates and manages all other
threads.
Mapping Thread - One thread for each session. Fetches session and mapping
information.
Pre and Post Session Thread - One thread each to perform pre- and post-session
operations.
Reader Thread - One thread for each partition for each source pipeline.
Writer Thread - One thread for each partition if a target exists in the source
pipeline, to write to the target.
Transformation Thread - One or more transformation threads for each partition.
Q. What is Session and Batches?
Session - A session is a set of instructions that tells the Informatica Server
how and when to move data from sources to targets. After creating the session,
we can use either the Server Manager or the command line program pmcmd to
start or stop the session.
Batches - Provide a way to group sessions for either serial or parallel
execution by the Informatica Server. There are two types of batches:
1. Sequential - Runs sessions one after the other.
2. Concurrent - Runs sessions at the same time.

Q. In how many ways can you update a relational source definition and what
are they?
A. Two ways
1. Edit the definition
2. Reimport the definition
Q. What is a transformation?
A. It is a repository object that generates, modifies or passes data.
Q. What are the designer tools for creating transformations?
A. Mapping designer
Transformation developer
Mapplet designer
Q. In how many ways can you create ports?
A. Two ways
1. Drag the port from another transformation
2. Click the add button on the ports tab.
Q. What are reusable transformations?
A. A transformation that can be reused is called a reusable transformation
They can be created using two methods:
1. Using transformation developer
2. Create normal one and promote it to reusable
Q. What is the aggregate cache in the aggregator transformation?
A. The aggregator stores data in the aggregate cache until it completes the
aggregate calculations. When you run a session that uses an aggregator
transformation, the Informatica server creates index and data caches in
memory to process the transformation. If the Informatica server requires
more space, it stores overflow values in cache files.
Q. What are the settings that you use to configure the joiner
transformation?
Master and detail source
Type of join
Condition of the join
Q. What are the join types in joiner transformation?
A. Normal (Default) -- only matching rows from both master and detail
Master outer -- all detail rows and only matching rows from master
Detail outer -- all master rows and only matching rows from detail
Full outer -- all rows from both master and detail (matching or non
matching)
Q. What are the joiner caches?
A. When a Joiner transformation occurs in a session, the Informatica Server
reads all the records from the master source and builds index and data
caches based on the master rows. After building the caches, the Joiner
transformation reads records from the detail source and performs joins.


Q. What are the types of lookup caches?
Static cache: You can configure a static, or read-only, cache for any
lookup table. By default the Informatica server creates a static cache. It
caches the lookup table and lookup values in the cache for each row that comes
into the transformation. When the lookup condition is true, the Informatica
server does not update the cache while it processes the lookup transformation.
Dynamic cache: If you want to cache the target table and insert new
rows into cache and the target, you can create a look up transformation to
use dynamic cache. The Informatica server dynamically inserts data to the
target table.
Persistent cache: You can save the lookup cache files and reuse them
the next time the Informatica server processes a lookup transformation
configured to use the cache.
Recache from database: If the persistent cache is not synchronized
with the lookup table, you can configure the lookup transformation to rebuild
the lookup cache.
Shared cache: You can share the lookup cache between multiple
transformations. You can share an unnamed cache between transformations in
the same mapping.
Q. What is Transformation?
A: Transformation is a repository object that generates, modifies, or passes
data.
Transformation performs specific function. They are two types of
transformations:
1. Active
An active transformation affects the rows passing through it and can change
the number of rows that pass through it. Eg: Aggregator, Filter, Joiner,
Normalizer, Rank, Router, Source Qualifier, Update Strategy, ERP Source
Qualifier, Advanced External Procedure.
2. Passive
Does not change the number of rows that pass through it. Eg: Expression,
External Procedure, Input, Lookup, Stored Procedure, Output, Sequence
Generator, XML Source Qualifier.
Q. What are Options/Type to run a Stored Procedure?
A: Normal: During a session, the stored procedure runs where the
transformation exists in the mapping on a row-by-row basis. This is useful for
calling the stored procedure for each row of data that passes through the
mapping, such as running a calculation against an input port. Connected
stored procedures run only in normal mode.
Pre-load of the Source. Before the session retrieves data from the source,
the stored procedure runs. This is useful for verifying the existence of tables
or performing joins of data in a temporary table.

Post-load of the Source. After the session retrieves data from the source,
the stored procedure runs. This is useful for removing temporary tables.
Pre-load of the Target. Before the session sends data to the target, the
stored procedure runs. This is useful for verifying target tables or disk space
on the target system.
Post-load of the Target. After the session sends data to the target, the
stored procedure runs. This is useful for re-creating indexes on the database.
It must contain at least one Input and one Output port.
Q. What kinds of sources and of targets can be used in Informatica?
Sources may be Flat file, relational db or XML.
Target may be relational tables, XML or flat files.
Q: What is Session Process?
A: The Load Manager process. Starts the session, creates the DTM
process, and
sends post-session email when the session completes.
Q. What is DTM process?
A: The DTM process creates threads to initialize the session, read, write,
transform
data and handle pre and post-session operations.
Q. What is the different type of tracing levels?
Tracing level represents the amount of information that Informatica
Server writes in a log file. Tracing levels store information about mapping
and transformations. There are 4 types of tracing levels supported
1. Normal: It specifies the initialization and status information and
summarization of the success rows and target rows and the information
about the skipped rows due to transformation errors.
2. Terse: Specifies initialization information, error messages, and
notification of rejected data (less detail than Normal).
3. Verbose Initialization: In addition to Normal tracing, specifies the
location of the data cache files and index cache files that are created, and
detailed transformation statistics for each and every transformation within
the mapping.
4. Verbose Data: Along with verbose initialization records each and every
record processed by the informatica server.
Q. TYPES OF DIMENSIONS?
A dimension table consists of the attributes about the facts. Dimensions
store the textual descriptions of the business.
Conformed Dimension:
Conformed dimensions mean the exact same thing with every possible fact
table to which they are joined.
Eg: The date dimension table connected to the sales facts is identical to the
date dimension connected to the inventory facts.
Junk Dimension:

A junk dimension is a collection of random transactional codes, flags and/or
text attributes that are unrelated to any particular dimension. The junk
dimension is simply a structure that provides a convenient place to store the
junk attributes.
Eg: Assume that we have a gender dimension and marital status dimension.
In the fact table we need to maintain two keys referring to these dimensions.
Instead of that create a junk dimension which has all the combinations of
gender and marital status (cross join gender and marital status table and
create a junk table). Now we can maintain only one key in the fact table.
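A hedged SQL sketch of building such a junk dimension (all names are hypothetical):

-- Every combination of the low-cardinality flags becomes one row of the junk dimension
CREATE TABLE dim_junk AS
SELECT ROW_NUMBER() OVER (ORDER BY g.gender, m.marital_status) AS junk_key,
       g.gender,
       m.marital_status
FROM   (SELECT DISTINCT gender         FROM src_customer) g
CROSS JOIN
       (SELECT DISTINCT marital_status FROM src_customer) m;
-- The fact table then carries only junk_key instead of two separate keys.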
Degenerated Dimension:
A degenerate dimension is a dimension which is derived from the fact table
and doesn't have its own dimension table.
Eg: A transactional code in a fact table.
Slowly changing dimension:
Slowly changing dimensions are dimension tables that have slowly increasing
data as well as updates to existing data.
Q. What are the output files that the Informatica server creates
during the
session running?
Informatica server log: Informatica server (on UNIX) creates a log for all
status and
error messages (default name: pm.server.log). It also creates an error log for
error
messages. These files will be created in Informatica home directory
Session log file: Informatica server creates session log file for each session.
It writes
information about session into log files such as initialization process, creation
of sql
commands for reader and writer threads, errors encountered and load
summary. The
amount of detail in session log file depends on the tracing level that you set.
Session detail file: This file contains load statistics for each target in
mapping.
Session detail includes information such as table name, number of rows
written or
rejected. You can view this file by double clicking on the session in monitor
window.
Performance detail file: This file contains information known as session
performance
details which helps you where performance can be improved. To generate
this file
select the performance detail option in the session property sheet.

Reject file: This file contains the rows of data that the writer does not write
to
targets.
Control file: Informatica server creates control file and a target file when
you run a
session that uses the external loader. The control file contains the
information about
the target flat file such as data format and loading instructions for the
external
loader.
Post session email: Post session email allows you to automatically
communicate
information about a session run to designated recipients. You can create two
different messages. One if the session completed successfully the other if
the session
fails.
Indicator file: If you use the flat file as a target, you can configure the
Informatica
server to create indicator file. For each target row, the indicator file contains
a
number to indicate whether the row was marked for insert, update, delete or
reject.
Output file: If session writes to a target file, the Informatica server creates
the
target file based on file properties entered in the session property sheet.
Cache files: When the Informatica server creates memory cache it also
creates cache
files.
For the following circumstances Informatica server creates index and data
cache
files:
Aggregator transformation
Joiner transformation
Rank transformation
Lookup transformation
Q. What is meant by lookup caches?
A. The Informatica server builds a cache in memory when it processes the
first row
of a data in a cached look up transformation. It allocates memory for the
cache
based on the amount you configure in the transformation or session
properties. The

Informatica server stores condition values in the index cache and output
values in
the data cache.
Q. How do you identify existing rows of data in the target table
using lookup
transformation?
A. There are two ways to lookup the target table to verify a row exists or not :
1. Use a connected dynamic cache lookup and then check the value of the
NewLookupRow output port to decide whether the incoming record already exists
in the table/cache or not.
2. Use an unconnected lookup, call it from an expression transformation, and
check the lookup condition port value (Null / Not Null) to decide whether the
incoming record already exists in the table or not.
Q. What are Aggregate tables?
An aggregate table contains the summary of existing warehouse data grouped to
certain levels of dimensions. Retrieving the required data from the actual
table, which may have millions of records, takes more time and also affects
server performance. To avoid this we can aggregate the table to the required
level and use it. These tables reduce the load on the database server,
increase query performance, and return results very fast.
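A minimal SQL sketch of building one such summary table (hypothetical names):

-- Pre-aggregate daily sales to month level so reports avoid scanning the detail fact table
CREATE TABLE agg_sales_month AS
SELECT t.year_no,
       t.month_name,
       f.product_key,
       SUM(f.sales_amount) AS total_sales
FROM   fact_sales f
JOIN   dim_time   t ON t.time_key = f.time_key
GROUP  BY t.year_no, t.month_name, f.product_key;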
Q. What is a level of Granularity of a fact table?
The level of granularity is the level of detail that you put into the fact
table in a data warehouse. For example, based on the design you can decide to
store the sales data for each transaction. The level of granularity then means
how much detail you are willing to keep for each transactional fact: for
instance, product sales recorded per transaction, or aggregated up to the
minute (or a coarser level) before being stored.
Q. What is session?
A session is a set of instructions to move data from sources to targets.
Q. What is worklet?
A worklet is an object that represents a set of workflow tasks, allowing a set
of workflow logic to be reused in several workflows.
Use of a worklet: you can bind many of the tasks in one place so that they
can easily be identified and can serve a specific purpose.
Q. What is workflow?
A workflow is a set of instructions that tells the Informatica server how to
execute the tasks.

Q. Why can we not use the sorted input option for incremental aggregation?
In incremental aggregation, the aggregate calculations are stored in
historical cache on the server. In this historical cache the data need not be in
sorted order. If you give sorted input, the records come as presorted for that
particular run but in the historical cache the data may not be in the sorted
order. That is why this option is not allowed.
Q. What is target load order plan?
You specify the target load order based on the source qualifiers in a mapping.
If you have multiple source qualifiers connected to multiple targets, you can
designate the order in which the Informatica server loads data into the
targets.
The target load plan defines the order in which data is extracted from the
source qualifier transformations. It is set under Mappings (tab) > Target
Load Plan.
Q. What is constraint based loading?
Constraint based load order defines the order of loading the data into the
multiple targets based on primary and foreign keys constraints.
To set the option: double-click the session, then on the Config Object tab
check Constraint Based Load Ordering.
Q. What is the status code in stored procedure transformation?
The status code provides error handling for the Informatica server during the
session. The stored procedure issues a status code that notifies whether or
not the stored procedure completed successfully. This value cannot be seen by
the user; it is only used by the Informatica server to determine whether to
continue running the session or to stop.
Q. Define Informatica Repository?
The Informatica repository is a relational database that stores information, or
metadata, used by the Informatica Server and Client tools. Metadata can
include information such as mappings describing how to transform source
data, sessions indicating when you want the Informatica Server to perform
the transformations, and connect strings for sources and targets.
The repository also stores administrative information such as usernames and
passwords, permissions and privileges, and product version.
Use the Repository Manager to create the repository. The Repository Manager
connects to the repository database and runs the code needed to create the
repository tables. These tables store metadata in a specific format that the
Informatica Server and Client tools use.
Q. What is a metadata?

Designing a data mart involves writing and storing a complex set of
instructions. You need to know where to get data (sources), how to change it,
and where to write the information (targets). PowerMart and PowerCenter
call this set of instructions metadata. Each piece of metadata (for example,
the description of a source table in an operational database) can contain
comments about it.
In summary, Metadata can include information such as mappings describing
how to transform source data, sessions indicating when you want the
Informatica Server to perform the transformations, and connect strings for
sources and targets.

Q. What is metadata reporter?


It is a web-based application that enables you to run reports against
repository metadata. With the Metadata Reporter you can access information
about your repository without having knowledge of SQL, the transformation
language, or the underlying tables in the repository.

Q. What are the types of metadata that stores in repository?


Source definitions. Definitions of database objects (tables, views,
synonyms) or files that provide source data.
Target definitions. Definitions of database objects or files that contain the
target data.
Multi-dimensional metadata. Target definitions that are configured as cubes
and dimensions.
Mappings. A set of source and target definitions along
with transformations containing business logic that you build into the
transformation. These are the instructions that the Informatica Server uses to
transform and move data.
Reusable transformations. Transformations that you can use in
multiple mappings.
Mapplets. A set of transformations that you can use in multiple mappings.
Sessions and workflows. Sessions and workflows store information about
how and when the Informatica Server moves data. A workflow is a set of
instructions that describes how and when to run tasks related to extracting,
transforming, and loading data. A session is a type of task that you can put
in a workflow. Each session corresponds to a single mapping.
Following are the types of metadata that stores in the repository
Database Connections
Global Objects
Multidimensional Metadata
Reusable Transformations
Short cuts
Transformations
Q. How can we store previous session logs?
Go to Session Properties > Config Object > Log Options
Select the properties as follows.
Save session log by > Session Runs
Save session log for these runs > change the number to how many log files you
want to keep (default is 0)
If you want to save all of the log files created by every run, then select
the option
Save session log by > Session Timestamp
You can find these properties in the session/workflow Properties.
Q. What is Changed Data Capture?
Changed Data Capture (CDC) helps identify the data in the source system
that has changed since the last extraction. With CDC, data extraction takes
place at the same time that the insert, update or delete operations occur in
the source tables, and the change data is stored inside the database in change
tables.

The change data thus captured is then made available to the target systems
in a controlled manner.
Q. What is an indicator file and how can it be used?
An indicator file is used for event-based scheduling when you don't know when
the source data will be available. A shell command, script or batch file
creates and sends this indicator file to a directory local to the Informatica
Server. The server waits for the indicator file to appear before running the
session.

Q. What is an audit table and what are the columns in it?

The audit table is nothing but a table that contains your workflow names and
session names, along with information about workflow and session status and
their details. Typical columns include:
WKFL_RUN_ID
WKFL_NME
START_TMST
END_TMST
ROW_INSERT_CNT
ROW_UPDATE_CNT
ROW_DELETE_CNT
ROW_REJECT_CNT
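A hedged sketch of such an audit table; the data types below are assumptions:

CREATE TABLE etl_audit (
    wkfl_run_id    INTEGER,
    wkfl_nme       VARCHAR(100),
    start_tmst     TIMESTAMP,
    end_tmst       TIMESTAMP,
    row_insert_cnt INTEGER,
    row_update_cnt INTEGER,
    row_delete_cnt INTEGER,
    row_reject_cnt INTEGER
);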
Q. If session fails after loading 10000 records in the target, how can
we load 10001th record when we run the session in the next time?
Select the Recovery Strategy in the session properties as "Resume from the
last checkpoint". Note: set this property before running the session.
Q. Informatica reject file: how to identify the rejection reason?
D - Valid data or Good Data. Writer passes it to the target database. The
target accepts it unless a database error occurs, such as finding a duplicate
key while inserting.
O - Overflowed Numeric Data. Numeric data exceeded the specified
precision or scale for the column. Bad data, if you configured the mapping
target to reject overflow or truncated data.
N - Null Value. The column contains a null value. Good data. Writer passes
it to the target, which rejects it if the target database does not accept null
values.
T - Truncated String Data. String data exceeded a specified precision for
the column, so the Integration Service truncated it. Bad data, if you
configured the mapping target to reject overflow or truncated data.
Also to be noted that the second column contains column indicator flag value
D which signifies that the Row Indicator is valid.
Now let us see how Data in a Bad File looks like:
0,D,7,D,John,D,5000.375,O,,N,BrickLand Road Singapore,T

Q. What is Insert Else Update and Update Else Insert?


These options are used when dynamic cache is enabled.
Insert Else Update option applies to rows entering the lookup
transformation with the row type of insert. When this option is enabled the
integration service inserts new rows in the cache and updates existing rows.
When disabled, the Integration Service does not update existing rows.
Update Else Insert option applies to rows entering the lookup
transformation with the row type of update. When this option is enabled, the
Integration Service updates existing rows, and inserts a new row if it is new.
When disabled, the Integration Service does not insert new rows.

Q. What are the Different methods of loading Dimension tables?


Conventional Load - Before loading the data, all the Table constraints will
be checked against the data.
Direct load (Faster Loading) - All the Constraints will be disabled. Data will
be loaded directly. Later the data will be checked against the table
constraints and the bad data won't be indexed.
Q. What are the different types of Commit intervals?
The different commit intervals are:
Source-based commit. The Informatica Server commits data based on
the number of source rows. The commit point is the commit interval you
configure in the session properties.
Target-based commit. The Informatica Server commits data based on
the number of target rows and the key constraints on the target table. The
commit point also depends on the buffer block size and the commit interval.
Q. How to add source flat file header into target file?
Edit Task-->Mapping-->Target-->Header Options--> Output field names
Q. How to load the name of the file into a relational target?
Source Definition-->Properties-->Add currently processed file name port
Q. How to return multiple columns through un-connect lookup?
Suppose your look table has f_name,m_name,l_name and you are using
unconnected lookup. In override SQL of lookup use f_name||~||m_name||
~||l_name you can easily get this value using unconnected lookup in
expression. Use substring function in expression transformation to separate
these three columns and make then individual port for downstream
transformation /Target.
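A sketch of such a lookup SQL override (the table and column names are hypothetical):

-- Return the three name parts as one delimited string through a single lookup port
SELECT f_name || '~' || m_name || '~' || l_name AS full_name,
       cust_id
FROM   customer_lkp

The expression transformation then splits full_name on the '~' delimiter (e.g., with SUBSTR/INSTR in Oracle) into three separate output ports.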
-----------------------------------------------------------------------------------------

Q. What is a factless fact table? For what purpose do we use it in our DWH
projects?
It is a fact table which does not contain any measurable data.
EX: Student attendance fact (it contains only Boolean values, whether
student attended class or not ? Yes or No.)
A Factless fact table contains only the keys but there is no measures or in
other way we can say that it contains no facts. Generally it is used to
integrate the fact tables
Factless fact table contains only foreign keys. We can have two kinds of
aggregate functions from the factless fact one is count and other is distinct
count.
Two purposes of a factless fact table:
1. Coverage: to indicate what did NOT happen. For example: which product did
not sell well in a particular region?
2. Event tracking: to know whether an event took place or not. For example: a
fact for tracking student attendance will not contain any measures.
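A hedged example of querying such a factless fact table (hypothetical names):

-- Event tracking: count classes attended per student in a given month
SELECT student_key, COUNT(*) AS classes_attended
FROM   fact_attendance
WHERE  month_key = 202401
GROUP  BY student_key;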
Q. What is staging area?
The staging area is where we apply our logic to extract the data from the
source, cleanse it, and put it into a meaningful, summarized form for the
data warehouse.
Q. What is constraint based loading
Constraint based load order defines the order of loading the data into the
multiple targets based on primary and foreign keys constraints.
Q. Why union transformation is active transformation?
The only condition for a transformation to become active is that the row
numbers change.
Now, how can a row number change? There are 2 conditions:
1. Either the number of rows coming in and going out differs,
e.g., in the case of a filter we have data like
id name dept row_num
1 aa 4 1
2 bb 3 2
3 cc 4 3
and we have a filter condition like dept=4 then the o/p wld
b like
id name dept row_num
1 aa 4 1

3 cc 4 2
So the row numbers changed, and the Filter is therefore an active
transformation.
2. Or the order of the rows changes,
e.g., when the Union transformation pulls in data. Suppose we have 2 sources:
sources1:
id name dept row_num
1 aa 4 1
2 bb 3 2
3 cc 4 3
source2:
id name dept row_num
4 aaa 4 4
5 bbb 3 5
6 ccc 4 6
it never restricts the data from any source so the data can
come in any manner
id name dept row_num old row_num
1 aa 4 1 1
4 aaa 4 2 4
5 bbb 3 3 5
2 bb 3 4 2
3 cc 4 5 3
6 ccc 4 6 6
So the row_num values are changing; thus we say that Union is an active
transformation.
Q. What is use of batch file in informatica? How many types of batch
file in informatica?
With the batch file, we can run sessions either sequentially or concurrently.
Grouping of Sessions is known as Batch.
Two types of batches:
1)Sequential: Runs Sessions one after another.
2)Concurrent: Run the Sessions at the same time.
If you have sessions with source-target dependencies you have to go for a
sequential batch to start the sessions one after another. If you have several
independent sessions you can use concurrent batches, which run all the
sessions at the same time.

Q. What is joiner cache?


When we use the Joiner transformation, the Integration Service maintains a
cache; all the records are stored in the joiner cache. The joiner cache has
two parts: 1. Index cache 2. Data cache.
The index cache stores all the port values that participate in the join
condition, and the data cache stores all the ports that do not participate in
the join condition.
Q. What is the location of parameter file in Informatica?
$PMBWPARAM
Q. How can you display only hidden files in UNIX
$ ls -a | grep "^\."
(ls -a or ls -la lists all files including hidden ones; piping through
grep "^\." keeps only the names that begin with a dot, i.e. the hidden files.)
Q. How to delete the data in the target table after loaded.
SQ ---> Properties tab ---> Post SQL:
delete from target_tablename
Post SQL statements are executed using the database connection after the
pipeline is run. Alternatively, write the post SQL on the target as a truncate
of the table, or use the session-level truncate target table option.
Q. What is polling in informatica?
It displays the updated information about the session in the monitor window.
The monitor window displays the status of each session when you poll the
Informatica server.
Q. How i will stop my workflow after 10 errors
Use the session-level error handling property and set the stop-on-errors
condition to 10:
---> Config Object > Error Handling > Stop on errors = 10
Q. How can we calculate fact table size?
The number of rows in a fact table is a multiple of the combinations of its
dimension values. For example, to estimate the fact table size for 3 years of
daily history with 200 products and 200 stores:
3 * 365 * 200 * 200 = number of fact rows (multiply by the row width to get
the size in bytes).
Q. Without using emailtask how will send a mail from informatica?

By using the 'mailx' command in a UNIX shell script.


Q. How will compare two mappings in two different repositories?
In the Designer client, go to the Mappings menu; there is a 'Compare' option
with which we can compare two mappings in two different repositories.
In Informatica Designer go to Mappings ---> Compare.
We can compare 2 folders within the same repository.
We can compare 2 folders within different repositories.
Q. What is constraint based load order
Constraint based load order defines the order in which data loads into the
multiple targets based on primary key and foreign key relationship.
Q. What is target load plan
Suppose I have 3 pipelines in a single mapping in the Designer:
emp source ---> SQ ---> tar1
dept source ---> SQ ---> tar2
bonus source ---> SQ ---> tar3
If the requirement is to load tar2 first, then tar1, and finally tar3, we use
the target load plan to control the order in which the source qualifiers
extract data from the sources.
Q. What is meant by data driven.. in which scenario we use that..?
Data driven is available at the session level. It means that, when we are
using an Update Strategy transformation, the Integration Service follows the
instructions coded in that transformation to decide how to update or insert
each row in the target database.
Data driven is nothing but instructing what action the source rows should
take on the target (update, delete, reject, insert). If we use an Update
Strategy transformation in a mapping, then we select the Data Driven option
in the session.
Q. How to run workflow in unix?
Syntax: pmcmd startworkflow -sv <service name> -d <domain name> -u
<user name> -p <password> -f <folder name> <workflow name>
Example:
pmcmd startworkflow -sv ${INFA_SERVICE} -d ${INFA_DOMAIN} -uv xxx_PMCMD_ID -pv PSWD -folder ${ETLFolder} -wait ${ETLWorkflow}
Q. What is the main difference between a Joiner Transformation and
Union Transformation?
Joiner Transformation merge horizontally

Union Transformation merge vertically


A Joiner transformation is used to join data from heterogeneous sources
(e.g., a SQL database and a flat file), whereas a Union transformation is used
to join data from the same kind of relational sources (an Oracle table and
another Oracle table).
The Joiner transformation combines data records horizontally based on a join
condition and can combine data from two different sources having different
metadata. The Joiner transformation supports heterogeneous and homogeneous
data sources.
Union Transformation combines data record vertically from multiple sources,
having same metadata.
Union transformation also support heterogeneous data source.
Union transformation functions as UNION ALL set operator.

Q. What is constraint based loading exactly? And how do we do this? I think it
applies when we have a primary key-foreign key relationship. Is that correct?
Constraint based load order defines loading the data into multiple targets
depending on the primary key-foreign key relationships.
To set the option: double-click the session, then on the Config Object tab
check Constraint Based Load Ordering.
Q. Difference between the top-down (W. H. Inmon) and bottom-up (Ralph Kimball)
approaches?
Top-down approach: As per W. H. Inmon, first we need to build the data
warehouse and after that we build the data marts; however, the DWH is somewhat
difficult to maintain this way.
Bottom-up approach: As per Ralph Kimball, first we build the data marts, then
we build the data warehouse.
This approach is the most commonly used in real projects when creating a data
warehouse.
Q. What are the different caches used in informatica?

Static cache
Dynamic cache
Shared cache
Persistent cache

Q. What is the command to get the list of files in a directory in unix?


$ls -lrt
Q. How to import multiple flat files in to single target where there is
no common column in the flat files
In the workflow session properties, on the Mapping tab, set the source
properties: choose Source filetype = Indirect and give the Source
filename: <file_path>.
This <file_path> file should contain the names of all the files that you want
to load.
Q. How to connect two or more table with single source qualifier?
Create an Oracle source with however many columns you want and write the
join query in the SQL query override. The column order and data types must be
the same as in the SQL query.
Q. How to call unconnected lookup in expression transformation?
:LKP.LKP_NAME(PORTS)
Q. What is diff between connected and unconnected lookup?
Connected lookup:
It is used to join two tables.
It can return multiple columns (outputs).
It must be in the mapping pipeline.
You can implement a lookup condition.
Using a connected lookup you can generate sequence numbers by enabling the
dynamic lookup cache.
Unconnected lookup:
It returns a single output through the return port.
It acts as a lookup function (:LKP).
It is called by another transformation.
It is not connected to either a source or a target.
CONNECTED LOOKUP:
>> It participates in the data pipeline.
>> It has multiple inputs and multiple outputs.
>> It supports static and dynamic caches.
UNCONNECTED LOOKUP:
>> It does not participate in the data pipeline.
>> It has multiple inputs and a single output.
>> It supports a static cache only.


Q. Types of partitioning in Informatica?
There are 5 types of partitioning:
1. Simple pass-through
2. Key range
3. Hash
4. Round robin
5. Database partitioning

Q. Which transformations use a cache?


1. Lookup transformation
2. Aggregator transformation
3. Rank transformation
4. Sorter transformation
5. Joiner transformation
Q. Explain about union transformation?
A Union transformation is a multiple-input-group transformation used to merge data from multiple sources, similar to the UNION ALL SQL statement that combines the results of two or more SELECT statements. Like UNION ALL, the Union transformation does not remove duplicate rows. It is an active transformation.
Q. Explain about Joiner transformation?
Joiner transformation is used to join source data from two related
heterogeneous sources. However this can also be used to join data from the
same source. Joiner t/r join sources with at least one matching column. It
uses a condition that matches one or more pair of columns between the 2
sources.
To configure a Joiner t/r various settings that we do are as below:
1) Master and detail source
2) Types of join
3) Condition of the join
Q. Explain about Lookup transformation?
Lookup t/r is used in a mapping to look up data in a relational table, flat file,
view or synonym.
The informatica server queries the look up source based on the look up ports
in the transformation. It compares look up t/r port values to look up source
column values based on the look up condition.
Look up t/r is used to perform the below mentioned tasks:
1) To get a related value.
2) To perform a calculation.

3) To update SCD tables.


Q. How to identify this row for insert and this row for update in
dynamic lookup cache?
Based on the NewLookupRow port, the Informatica server indicates whether a row is an insert or an update:
NewLookupRow = 0 ... no change
NewLookupRow = 1 ... insert
NewLookupRow = 2 ... update
Q. How many ways can we implement SCD2?
1) Date range
2) Flag
3) Versioning
Q. How will you check the bottle necks in informatica? From where
do you start checking?
You start as per this order
1. Target
2. Source
3. Mapping
4. Session
5. System
Q. What is incremental aggregation?
When the aggregator transformation executes, the output data is stored in a temporary location called the aggregator cache. The next time the mapping runs, the aggregator transformation processes only the new records loaded after the first run, and their values are combined with the values already in the aggregator cache. This is called incremental aggregation, and it improves performance.
--------------------------
Incremental aggregation means applying only the captured changes in the source to the aggregate calculations in a session. When the source changes only incrementally and we can capture those changes, we can configure the session to process only those changes. This allows the Informatica server to update the target table incrementally, rather than forcing it to process the entire source and recalculate the same calculations each time the session runs. This obviously increases session performance.
Q. How can I explain my project architecture in an interview? Tell me your project flow from source to target.
A typical project architecture looks like this:
1. Source systems: Mainframe, Oracle, PeopleSoft, DB2, etc.
2. Landing tables: These tables act as the source. They are used for easy access, for backup purposes, and as reusable sources for other mappings.
3. Staging tables: From the landing tables we extract the data into staging tables after all validations are done on the data.
4. Dimension/fact tables: These are the tables used to analyze the data and make decisions.
5. Aggregation tables: These tables hold summarized data, useful for managers who want to view month-wise or year-wise sales, etc.
6. Reporting layer: Phases 4 and 5 are used by reporting developers to generate reports.
Q. What type of transformation is not supported by mapplets?

Normalizer transformation
COBOL sources, joiner
XML source qualifier transformation
XML sources
Target definitions
Pre & Post Session stored procedures
Other mapplets

Q. How does Informatica recognize a mapping?
Everything is organized by the Integration Service. The PowerCenter client talks to the Integration Service, and the Integration Service runs the session. The session holds the mapping structure. This is the flow of execution.
Q. Can every transformation be made reusable? How?
Except for the Source Qualifier transformation, all transformations support the reusable property. A reusable transformation can be created in two ways:
1. In a mapping, select the transformation you want to reuse, double-click it, and check the option to make it a reusable transformation (this converts a non-reusable transformation into a reusable one).
2. By creating it in the Transformation Developer.
Q. What is Pre Sql and Post Sql?
Pre SQL means that the integration service runs SQL commands against the
source database before it reads the data from source.
Post SQL means that the integration service runs SQL commands against the target database after it writes to the target.

Q. Insert else update option - in which situation will we use it?
When a row is not found in the dynamic lookup cache, the Integration Service inserts it into the cache; when the row is found, the Integration Service updates the cached row if the data in the associated ports has changed.
---------------------
We set this property when the Lookup transformation uses a dynamic cache and the session property Treat Source Rows As is set to "Insert".
-------------------
We use this option when we want to maintain history: if records are not available in the target table they are inserted, and if they are available they are updated.
Q. What is incremental loading? In which situations do we use incremental loading?
Incremental loading is an approach. Suppose you have a mapping that loads data from an employee table to an employee_target table based on hire date, and you have already moved the employee data for hire dates up to 31-12-2009. When you run the load on employee_target today, the target already has the data for employees hired up to 31-12-2009, so you pick up only the source rows with hire dates from 1-1-2010 onwards. You need not load the data before that date; doing so would only be extra overhead, because it already exists in the target. So in the Source Qualifier you filter the records by hire date, and you can also parameterize the hire date to control from which date data is loaded into the target.
This is the concept of incremental loading.
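For illustration, a Source Qualifier SQL override for this scenario might look like the sketch below; the table, columns and the mapping parameter $$LAST_LOAD_DATE are assumed names, not from an actual project:
-- Pick up only the rows added after the last successful load (illustrative names).
SELECT emp_id, emp_name, hire_date
FROM   employee
WHERE  hire_date > TO_DATE('$$LAST_LOAD_DATE', 'DD-MM-YYYY');
At run time the parameter file would supply a value such as $$LAST_LOAD_DATE=31-12-2009, and that value is advanced after every successful load.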
Q. What is target update override?
By default, the Integration Service updates the target based on key columns. But we might want to update non-key columns as well; in that case we can override the UPDATE statement for each target in the mapping. The target update override takes effect only when the source rows are marked as update by an Update Strategy transformation in the mapping.
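As a rough sketch (the target T_EMP and its columns are assumed for illustration), a target update override typically uses the :TU port references:
-- Update non-key columns whenever the row is flagged as update.
UPDATE T_EMP
SET    EMP_NAME = :TU.EMP_NAME,
       SAL      = :TU.SAL
WHERE  EMP_ID   = :TU.EMP_ID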

Q. What is the Mapping parameter and Mapping variable?


Mapping parameter: A mapping parameter is a constant value that is defined before the mapping runs. A mapping parameter lets you reuse the mapping for various constant values.
Mapping variable: A mapping variable represents a value that can change during the mapping run. The value is stored in the repository, and the Integration Service retrieves it from the repository and uses the incremented value for the next run.
Q. What is rank and dense rank? Give an example and the SQL query for both.
for eg: the file contains the records with column
100
200(repeated rows)
200
300
400
500
the rank function gives output as
1
2
2
4
5
6
and dense rank gives
1
2
2
3
4
5
for eg: the file contains the records with column
empno sal
100 1000
200(repeated rows) 2000
200 3000
300 4000
400 5000
500 6000
Rank:
select empno, sal, rank() over (order by empno) as rnk from emp;


1
2
2
4
5
6
Dense Rank:
select empno, sal, dense_rank() over (order by empno) as drnk from emp;
and dense rank gives
1
2
2
3
4
5
Q. What is the incremental aggregation?
The first time you run an upgraded session using incremental aggregation,
the Integration Service upgrades the index and data cache files. If you want
to partition a session using a mapping with incremental aggregation, the
Integration Service realigns the index and data cache files.
Q. What is session parameter?
A parameter file is a text file where we define the values for parameters. Session parameters are used to assign session-level values such as database connections and source/target file names.
Q. What is mapping parameter?
A mapping parameter represents a constant value that is defined before the mapping runs. Its value is supplied through a parameter file, which is saved with a .prm extension. A mapping parameter lets you reuse the mapping for various constant values.
Q. What is parameter file?
A parameter file is a text file used to define the values for the parameters and variables used in a session. It is created with a text editor such as WordPad or Notepad. You can define the following values in a parameter file:

Mapping parameters

Mapping variables

Session parameters

Q. What is session override?


Session override is an option in informatica at session level. Here we can
manually give a sql query which is issued to the database when the session
runs. It is nothing but over riding the default sql which is generated by a
particular transformation at mapping level.
Q. What are the diff. b/w informatica versions 8.1.1 and 8.6.1?
There is a small change in the Administrator Console. In 8.1.1 it handles the creation of the Integration Service, Repository Service, Web Services, domain, nodes and grid (with a licensed version). In 8.6.1 the Informatica Admin Console manages both a Domain page and a Security page. The Domain page covers all of the above (creation of Integration Service, Repository Service, Web Services, domain, nodes, grid, etc.). The Security page covers the creation of users and privileges, LDAP configuration, export/import of users and privileges, etc.
Q. What are the uses of a Parameter file?
A parameter file contains the values of mapping variables and parameters.
Type this in Notepad and save it:
[foldername.sessionname]
$$inputvalue1=
--------------------------------
Parameter files are created with an extension of .prm. They are created to pass values that can be changed for mapping parameters and session parameters during the mapping run.
Mapping parameters:
A parameter is defined in the parameter file for a parameter that has already been created in the mapping with a data type, precision and scale.
Mapping parameter file syntax (xxxx.prm):
[FolderName.WF:WorkFlowName.ST:SessionName]
$$ParameterName1=Value
$$ParameterName2=Value
After that, on the Properties tab of the session, set the Parameter Filename including the physical path of this xxxx.prm file.
Session parameters:
Session parameter file syntax (yyyy.prm):
[FolderName.SessionName]
$InputFileValue1=Path of the source flat file
After that, on the Properties tab of the session, set the Parameter Filename including the physical path of this yyyy.prm file.
Then make the following changes in the Source Qualifier's properties on the Mapping tab:
Source File Type      --------> Direct
Source File Directory --------> (empty)
Source File Name      --------> $InputFileValue1
Q. What is the default data driven operation in informatica?
Data driven is the default option when the mapping contains an Update Strategy transformation. The Integration Service follows the instructions coded in the Update Strategy transformations within the session's mapping to determine how to flag records for insert, delete, update or reject. If you do not choose the Data Driven option, the Integration Service ignores the Update Strategy transformations in the mapping.
Q. What is threshold error in informatica?
When the Update Strategy transformation flags rows as DD_REJECT or DD_UPDATE and a limit is set on the number of allowed errors, the session ends with a failed status if the number of rejected records exceeds that limit. This is called a threshold error.

Q. So many times I saw "$PM parser error". What is meant by PM?
PM: PowerMart
1) Parsing error will come for the input parameter to the lookup.
2) Informatica is not able to resolve the input parameter CLASS for your
lookup.
3) Check the Port CLASS exists as either input port or a variable port in your
expression.
4) Check data type of CLASS and the data type of input parameter for your
lookup.
Q. What is a candidate key?
A candidate key is a combination of attributes that can be uniquely used to
identify a database record without any extraneous data (unique). Each table
may have one or more candidate keys. One of these candidate keys is selected as the table's primary key; the others are called alternate keys.
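A minimal sketch: in the hypothetical table below both emp_id and email identify a row uniquely, so both are candidate keys; emp_id is chosen as the primary key and email remains an alternate key.
CREATE TABLE employee (
  emp_id   NUMBER        PRIMARY KEY,  -- chosen candidate key
  email    VARCHAR2(100) UNIQUE,       -- alternate (candidate) key
  emp_name VARCHAR2(50)
);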
Q. What is the difference between Bitmap and Btree index?
Bitmap index is used for repeating values.
ex: Gender: male/female

Account status:Active/Inactive
Btree index is used for unique values.
ex: empid.
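For example (the table and column names are assumed), the two index types are created in Oracle as follows:
CREATE BITMAP INDEX idx_emp_gender ON employee (gender);  -- low-cardinality, repeating values
CREATE INDEX idx_emp_empid ON employee (empid);           -- B-tree index (the default type)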
Q. What is ThroughPut in Informatica?
Throughput is the rate at which the PowerCenter server reads rows in bytes from the source, or writes rows in bytes to the target, per second.
You can find this in the Workflow Monitor: right-click the session, choose Properties, and the Source/Target Statistics tab shows the throughput details for each instance of source and target.
Q. What are set operators in Oracle
UNION
UNION ALL
MINUS
INTERSECT
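A small illustration of how the four operators differ, assuming two tables dept_a and dept_b that both have a deptno column:
SELECT deptno FROM dept_a UNION     SELECT deptno FROM dept_b;  -- distinct rows from both tables
SELECT deptno FROM dept_a UNION ALL SELECT deptno FROM dept_b;  -- all rows, duplicates kept
SELECT deptno FROM dept_a MINUS     SELECT deptno FROM dept_b;  -- rows in dept_a but not in dept_b
SELECT deptno FROM dept_a INTERSECT SELECT deptno FROM dept_b;  -- rows common to both tables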
Q. How i can Schedule the Informatica job in "Unix Cron scheduling
tool"?
Crontab
The crontab (cron derives from chronos, Greek for time; tab stands for table)
command, found in Unix and Unix-like operating systems, is used to schedule
commands to be executed periodically. To see what crontabs are currently
running on your system, you can open a terminal and run:
sudo crontab -l
To edit the list of cronjobs you can run:
sudo crontab -e
This will open the default editor (which could be vi or pico; you can change the default editor if you want) to let us manipulate the crontab. If you save and exit the editor, all your cron jobs are saved into the crontab. Cron jobs are written in the following format:
* * * * * /bin/execute/this/script.sh
Scheduling explained
As you can see there are 5 stars. The stars represent different date parts in
the following order:
1. minute (from 0 to 59)
2. hour (from 0 to 23)
3. day of month (from 1 to 31)
4. month (from 1 to 12)
5. day of week (from 0 to 6) (0 = Sunday)
Execute every minute
If you leave the star, or asterisk, it means every. Maybe that's a bit unclear. Let's use the previous example again:
* * * * * /bin/execute/this/script.sh
They are all still asterisks! So this means
execute /bin/execute/this/script.sh:
1. every minute
2. of every hour
3. of every day of the month
4. of every month
5. and every day in the week.
In short: This script is being executed every minute.
Without exception.
Execute every Friday 1AM
So if we want to schedule the script to run at 1AM every
Friday, we would need the following cronjob:
0 1 * * 5 /bin/execute/this/script.sh
Get it? The script is now being executed when the system
clock hits:
1. minute: 0
2. of hour: 1
3. of day of month: * (every day of month)
4. of month: * (every month)
5. and weekday: 5 (= Friday)
Execute on weekdays at 1AM
So if we want to schedule the script to run at 1AM on every weekday (Monday to Friday), we would need the following cron job:
0 1 * * 1-5 /bin/execute/this/script.sh
Get it? The script is now being executed when the system
clock hits:
1. minute: 0
2. of hour: 1
3. of day of month: * (every day of month)
4. of month: * (every month)
5. and weekday: 1-5 (= Monday till Friday)
Execute at 10 past every hour on the 1st of every month
Here's another one, just for practice:
10 * 1 * * /bin/execute/this/script.sh
Fair enough, it takes some getting used to, but it offers great flexibility.

Q. Can anyone tell me the difference between persistent and dynamic caches? Under which conditions do we use these caches?
Dynamic:
1) When you use a dynamic cache, the Informatica Server updates the lookup cache as it passes rows to the target.
2) With a dynamic cache, we can update the cache with new data as well.
3) A dynamic cache is not reusable.
(We need a dynamic cache when the cache data must be kept up to date during the run.)
Persistent:
1) A Lookup transformation can use a non-persistent or a persistent cache. The PowerCenter Server saves or deletes lookup cache files after a successful session based on the Lookup Cache Persistent property.
2) With a persistent cache, we are not able to update the cache with new data.
3) A persistent cache is reusable.
(We need a persistent cache when we need the cache data from a previous run.)
---------------------------------
A few more additions to the above answer:
1. A dynamic lookup allows modifying the cache, whereas a persistent lookup does not allow us to modify the cache.
2. A dynamic lookup uses NewLookupRow, a default port in the cache, but a persistent cache does not use any default ports.
3. When the session completes, a dynamic cache is removed, but a persistent cache is saved on the Informatica PowerCenter server.

Q. How to obtain performance data for individual transformations?


There is a property at session level Collect Performance Data, you can
select that property. It gives you performance details for all the
transformations.

Q. List of Active and Passive Transformations in Informatica?


Active Transformation - An active transformation changes the number of
rows that pass through the mapping.
Source Qualifier Transformation
Sorter Transformations
Aggregator Transformations
Filter Transformation
Union Transformation
Joiner Transformation
Normalizer Transformation
Rank Transformation
Router Transformation
Update Strategy Transformation
Advanced External Procedure Transformation
Passive Transformation - Passive transformations do not change the
number of rows that pass through the mapping.
Expression Transformation
Sequence Generator Transformation
Lookup Transformation
Stored Procedure Transformation
XML Source Qualifier Transformation
External Procedure Transformation
Q. Eliminating of duplicate records without using dynamic lookups?
You can identify duplicate records with a simple one-line SQL query:
Select id, count(*) from seq1 group by id having count(*) > 1;
Below are the ways to eliminate duplicate records in a mapping:
1. By enabling the Select Distinct option in the Source Qualifier transformation.
2. By enabling the Distinct option in the Sorter transformation.
3. By marking all ports as Group By in the Aggregator transformation.
Q. Can anyone give idea on how do we perform test load in
informatica? What do we test as part of test load in informatica?
With a test load, the Informatica Server reads and transforms data without
writing to targets. The Informatica Server does everything, as if running the
full session. The Informatica Server writes data to relational targets, but rolls
back the data when the session completes. So, you can enable collect
performance details property and analyze the how efficient your mapping is.
If the session is running for a long time, you may like to find out the

bottlenecks that are existing. It may be bottleneck of type target, source,


mapping etc.
The basic idea behind test load is to see the behavior of Informatica Server
with your session.
Q. What is ODS (Operational Data Store)?
A collection of operational or base data that is extracted from operational databases and standardized, cleansed, consolidated, transformed, and loaded into the enterprise data architecture.
An ODS is used to support data mining of operational data, or as the store for base data that is summarized for a data warehouse.
The ODS may also be used to audit the data warehouse, to assure that summarized and derived data are calculated properly. The ODS may further become the enterprise-shared operational database, allowing operational systems that are being reengineered to use the ODS as their operational database.

Q. How many tasks are there in informatica?


Session Task
Email Task
Command Task
Assignment Task
Control Task
Decision Task
Event-Raise
Event- Wait
Timer Task
Link Task
Q. What are business components in Informatica?

Domains
Nodes
Services

Q. WHAT IS VERSIONING?
Its used to keep history of changes done on the mappings and workflows
1. Check in: You check in when you are done with your changes so that
everyone can see those changes.
2. Check out: You check out from the main stream when you want to make
any change to the mapping/workflow.
3. Version history: It will show you all the changes made and who made it.

Q. Diff between $$$sessstarttime and sessstarttime?


$$$SessStartTime - Returns session start time as a string value
(String datatype)
SESSSTARTTIME - Returns the date along with date timestamp
(Date datatype)
Q. Difference between $,$$,$$$ in Informatica?
1. $ Refers
These are the system variables/Session Parameters like $Bad file,$input
file, $output file, $DB connection,$source,$target etc..
2. $$ Refers
User defined variables/Mapping Parameters like $$State,$$Time, $$Entity, $
$Business_Date, $$SRC,etc.
3. $$$ Refers
System Parameters like $$$SessStartTime
$$$SessStartTime returns the session start time as a string value. The format
of the string depends on the database you are using.
Q. Finding Duplicate Rows based on Multiple Columns?
SELECT firstname, surname, email, COUNT(*)
FROM employee
GROUP BY firstname, surname, email
HAVING COUNT(*) > 1;

Q. Finding Nth Highest Salary in Oracle?


Pick out the Nth highest salary, say the 4th highest salary.
Select * from
(select ename,sal,dense_rank() over (order by sal desc) emp_rank from emp)
where emp_rank=4;
Q. Find out the third highest salary?
SELECT MIN(sal) FROM
(SELECT DISTINCT sal FROM emp ORDER BY sal DESC)
WHERE ROWNUM <= 3;
Q. How do you handle error logic in Informatica? What are the
transformations that you used while handling errors? How did you
reload those error records in target?
Row indicator: This generally applies when working with an Update Strategy transformation; the writer/target rejects the rows going to the target.
Column indicators:
D - Valid
O - Overflow
N - Null
T - Truncated
When the data contains nulls, overflows or truncation, the row is rejected instead of being written to the target.
The rejected data is stored in reject (bad) files. You can check the data and reload it into the target using the reject loader utility.
Q. Difference between STOP and ABORT?
Stop - If the Integration Service is executing a Session task when you issue
the stop command, the Integration Service stops reading data. It continues
processing and writing data and committing data to targets. If the
Integration Service cannot finish processing and committing data, you can
issue the abort command.
Abort - The Integration Service handles the abort command for the Session
task like the stop command, except it has a timeout period of 60 seconds. If
the Integration Service cannot finish processing and committing data within
the timeout period, it kills the DTM process and terminates the session.
Q. WHAT IS INLINE VIEW?
An inline view is the term given to a sub query in the FROM clause of a query, which can be used as a table. An inline view is effectively a named sub query.
Ex: Select Tab1.col1, Tab1.col2, Inview.col1, Inview.col2
From Tab1, (Select statement) Inview
Where Tab1.col1 = Inview.col1
SELECT D.DNAME, E.ENAME, E.SAL
FROM EMP E, (SELECT DNAME, DEPTNO FROM DEPT) D
WHERE E.DEPTNO = D.DEPTNO
In the above query (SELECT DNAME, DEPTNO FROM DEPT) D is the inline view.
Inline views are resolved at runtime and, in contrast to normal views, they are not stored in the data dictionary.
The disadvantages of using a normal view are:
1. A separate view needs to be created, which is an overhead.
2. Extra time is taken in parsing the view.
These problems are avoided by an inline view, which uses a select statement in the sub query and treats it as a table.
Advantages of using inline views:
1. Better query performance
2. Better visibility of code
Practical uses of inline views:
1. Joining grouped data with non-grouped data
2. Getting data to use in another query
Q. WHAT IS GENERATED KEY AND GENERATED COLUMN ID IN
NORMALIZER TRANSFORMATION?

The integration service increments the generated key (GK) sequence


number each time it process a source row. When the source row contains a
multiple-occurring column or a multiple-occurring group of columns, the
normalizer transformation returns a row for each occurrence. Each row
contains the same generated key value.
The normalizer transformation has a generated column ID (GCID) port for
each multiple-occurring column. The GCID is an index for the instance of the
multiple-occurring data. For example, if a column occurs 3 times in a source
record, the normalizer returns a value of 1, 2 or 3 in the generated column
ID.
Q. WHAT IS DIFFERENCE BETWEEN SUBSTR AND INSTR?
INSTR function search string for sub-string and returns an integer indicating
the position of the character in string that is the first character of this
occurrence.
SUBSTR function returns a portion of string, beginning at character position,
substring_length characters long. SUBSTR calculates lengths using
characters as defined by the input character set.
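For instance:
SELECT INSTR('informatica', 'ma') FROM dual;   -- returns 6, the position where 'ma' starts
SELECT SUBSTR('informatica', 3, 5) FROM dual;  -- returns 'forma', 5 characters from position 3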
Q. WHAT ARE DIFFERENT ORACLE DATABASE OBJECTS?

TABLES
VIEWS
INDEXES
SYNONYMS
SEQUENCES
TABLESPACES
Q. WHAT IS @@ERROR?
The @@ERROR automatic variable returns the error code of the last Transact-SQL statement. If there was no error, @@ERROR returns zero. Because @@ERROR is reset after each Transact-SQL statement, it must be saved to a variable if it is needed for further processing after checking it.
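A minimal T-SQL sketch of saving @@ERROR immediately after the statement it refers to (the emp table is assumed):
DECLARE @err INT;
UPDATE emp SET sal = sal * 1.1 WHERE deptno = 10;
SET @err = @@ERROR;   -- capture it before the next statement resets it
IF @err <> 0
    PRINT 'Update failed with error ' + CAST(@err AS VARCHAR(10));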
Q. WHAT IS DIFFERENCE BETWEEN CO-RELATED SUB QUERY AND
NESTED SUB QUERY?

Correlated subquery runs once for each row selected by the outer query. It
contains a reference to a value from the row selected by the outer query.
Nested subquery runs only once for the entire nesting (outer) query. It
does not contain any reference to the outer query row.
For example,
Correlated Subquery:
Select e1.empname, e1.basicsal, e1.deptno from emp e1 where e1.basicsal
= (select max(basicsal) from emp e2 where e2.deptno = e1.deptno)
Nested Subquery:
Select empname, basicsal, deptno from emp where (deptno, basicsal) in
(select deptno, max(basicsal) from emp group by deptno)
Q. HOW DOES ONE ESCAPE SPECIAL CHARACTERS WHEN BUILDING
SQL QUERIES?
The LIKE keyword allows for string searches. The _ wildcard character is used to match exactly one character, and % is used to match zero or more occurrences of any characters. These characters can be escaped in SQL.
Example:
SELECT name FROM emp WHERE id LIKE '%\_%' ESCAPE '\';
Use two quotes for every one displayed. Examples:
SELECT 'Frank''s Oracle site' FROM DUAL;
SELECT 'A ''quoted'' word.' FROM DUAL;
SELECT 'A ''''double quoted'''' word.' FROM DUAL;

Q. DIFFERENCE BETWEEN SURROGATE KEY AND PRIMARY KEY?


Surrogate key:
1. Query processing is fast.
2. It is only numeric
3. Developer develops the surrogate key using
sequence generator transformation.
4. Eg: 12453
Primary key:
1. Query processing is slow
2. Can be alpha numeric
3. Source system gives the primary key.
4. Eg: C10999
Q. HOW DOES ONE ELIMINATE DUPLICATE ROWS IN AN
ORACLE TABLE?
Method 1:

DELETE from table_name A


where rowid > (select min(rowid) from table_name B where A.key_values =
B.key_values);
Method 2:
Create table table_name2 as select distinct * from table_name1;
Drop table table_name1;
Rename table_name2 to table_name1;
In this method, all the indexes, constraints, triggers etc. have to be re-created.
Method 3:
DELETE from table_name t1
where exists (select 'x' from table_name t2 where
t1.key_value = t2.key_value and t1.rowid > t2.rowid);
Method 4:
DELETE from table_name where rowid not in (select max(rowid) from
my_table group by key_value )
Q. QUERY TO RETRIEVE NTH ROW FROM AN ORACLE TABLE?
The query is as follows:
select * from my_table where rownum <= n
MINUS
select * from my_table where rownum < n;

Q. How does the server recognize the source and target databases?
If it is a relational source or target - by using a relational (ODBC/native) connection.
If it is a flat file - by using an FTP connection or a local file path.
Q. WHAT ARE THE DIFFERENT TYPES OF INDEXES SUPPORTED
BY ORACLE?
1. B-tree index
2. B-tree cluster index
3. Hash cluster index
4. Reverse key index
5. Bitmap index
6. Function-based index
Q. TYPES OF NORMALIZER TRANSFORMATION?
There are two types of Normalizer transformation.
VSAM Normalizer transformation
A non-reusable transformation that is a Source Qualifier transformation for a
COBOL source. The Mapping Designer creates VSAM Normalizer columns
from a COBOL source in a mapping. The column attributes are read-only. The
VSAM Normalizer receives a multiple-occurring source column through one
input port.
Pipeline Normalizer transformation
A transformation that processes multiple-occurring data from relational
tables or flat files. You might choose this option when you want to process
multiple-occurring data from another transformation in the mapping.
A VSAM Normalizer transformation has one input port for a multiple-occurring column. A pipeline Normalizer transformation has multiple input ports for a multiple-occurring column.
When you create a Normalizer transformation in the Transformation
Developer, you create a pipeline Normalizer transformation by default. When
you create a pipeline Normalizer transformation, you define the columns
based on the data the transformation receives from another type of
transformation such as a Source Qualifier transformation.
The Normalizer transformation has one output port for each single-occurring
input port.
Q. WHAT ARE ALL THE TRANSFORMATION YOU USED IF SOURCE AS
XML FILE?

XML Source Qualifier


XML Parser
XML Generator

Q. List the files in ascending order in UNIX?


ls -lt (sort by last modified date, newest first)
ls -ltr (reverse order, oldest first)
ls -lS (sort by file size)
Q. How do you identify empty lines in a flat file in UNIX? How to remove them?
grep -v "^$" filename   (redirect the output to a new file to remove the empty lines)
Q. How do you send the session report (.txt) to the manager after the session is completed?
Use the email variables: %a (attach a file) and %g (attach the session log file).
Q. How to check all the running processes in UNIX?
$ ps -ef
Q. How can i display only and only hidden file in the current
directory?
ls -a|grep "^\."
Q. How to display the first 10 lines of a file?
# head -10 logfile
Q. How to display the last 10 lines of a file?
# tail -10 logfile

Q. How did you schedule sessions in your project?


1. Run once - set the date and time parameters for when the session should start.
2. Run every - the Informatica server runs the session at a regular interval as configured; parameters: days, hours, minutes, end on, end after, forever.
3. Customized repeat - repeat every 2 days, daily frequency in hours and minutes, every week, every month.
Q. What is lookup override?
This feature is similar to entering a custom query in a Source Qualifier
transformation. When entering a Lookup SQL Override, you can enter the
entire override, or generate and edit the default SQL statement.
The lookup query override can include WHERE clause.
Q. What is Sql Override?
The Source Qualifier provides the SQL Query option to override the default
query. You can enter any SQL statement supported by your source database.
You might enter your own SELECT statement, or have the database perform
aggregate calculations, or call a stored procedure or stored function to read
the data and perform some tasks.
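For example, a hedged sketch of a Source Qualifier SQL override that joins EMP and DEPT and filters the rows (the ports in the Source Qualifier must match the select list in order and data type):
SELECT e.empno, e.ename, e.sal, d.dname
FROM   emp e, dept d
WHERE  e.deptno = d.deptno
AND    e.sal > 1000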
Q. How to get sequence value using Expression?
v_temp = v_temp+1
o_seq = IIF(ISNULL(v_temp), 0, v_temp)
Q. How to get Unique Record?
Source > SQ > SRT > EXP > FLT or RTR > TGT
In the Expression transformation (ports evaluated in this order):
flag = DECODE(TRUE, eid = pre_eid, 'Y', 'N')
flag_out = flag
pre_eid = eid
Then filter or route out the rows where flag_out = 'Y' to keep only unique records.
Q. WHAT ARE THE DIFFERENT TRANSACTION LEVELS AVAILABLE IN
TRANSACTION CONTROL TRANSFORMATION (TCL)?

The following are the transaction levels or built-in variables:


TC_CONTINUE_TRANSACTION: The Integration Service does not
perform any transaction change for this row. This is the default value of the
expression.
TC_COMMIT_BEFORE: The Integration Service commits the transaction,
begins a new transaction, and writes the current row to the target. The
current row is in the new transaction.
TC_COMMIT_AFTER: The Integration Service writes the current row to
the target, commits the transaction, and begins a new transaction. The
current row is in the committed transaction.
TC_ROLLBACK_BEFORE: The Integration Service rolls back the current
transaction, begins a new transaction, and writes the current row to the
target. The current row is in the new transaction.

TC_ROLLBACK_AFTER: The Integration Service writes the current row to


the target, rolls back the transaction, and begins a new transaction. The
current row is in the rolled back transaction.
Q. What is difference between grep and find?
Grep is used for finding any string in a file.
Syntax - grep <string> <filename>
Example - grep 'compu' details.txt
This displays every line in which the string 'compu' is found.
Find is used to locate files or directories under a given path.
Syntax - find <path> -name <pattern>
Example - find . -name "compu*"
This displays all file names starting with 'compu' under the current directory.

Q. WHAT ARE THE DIFFERENCE BETWEEN DDL, DML AND


DCL COMMANDS?

DDL is Data Definition Language statements


CREATE to create objects in the database
ALTER alters the structure of the database
DROP delete objects from the database
TRUNCATE remove all records from a table, including all spaces
allocated for the records are removed
COMMENT add comments to the data dictionary
GRANT gives users access privileges to database
REVOKE withdraw access privileges given with the GRANT command
DML is Data Manipulation Language statements
SELECT retrieve data from a database
INSERT insert data into a table
UPDATE updates existing data within a table
DELETE deletes records from a table; the space for the records remains
CALL call a PL/SQL or Java subprogram
EXPLAIN PLAN explain access path to data
LOCK TABLE control concurrency
DCL is Data Control Language statements
COMMIT save work done
SAVEPOINT identify a point in a transaction to which you can later roll
back
ROLLBACK restore database to original since the last COMMIT
SET TRANSACTION Change transaction options like what rollback
segment to use

Q. What is Stored Procedure?


A stored procedure is a named group of SQL statements that have been
previously created and stored in the server database. Stored procedures
accept input parameters so that a single procedure can be used over the
network by several clients using different input data. And when the
procedure is modified, all clients automatically get the new version. Stored
procedures reduce network traffic and improve performance. Stored
procedures can be used to help ensure the integrity of the database.
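A simple Oracle sketch (the emp table and the procedure name are assumed) of a stored procedure that accepts input parameters:
CREATE OR REPLACE PROCEDURE raise_salary (p_empno IN NUMBER, p_pct IN NUMBER)
AS
BEGIN
    -- increase the salary of one employee by the given percentage
    UPDATE emp
    SET    sal = sal * (1 + p_pct / 100)
    WHERE  empno = p_empno;
    COMMIT;
END;
/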
Q. What is View?
A view is a tailored presentation of the data contained in one or more
tables (or other views). Unlike a table, a view is not allocated any storage
space, nor does a view actually contain data; rather, a view is defined by a
query that extracts or derives data from the tables the view references.
These tables are called base tables.
Views present a different representation of the data that resides within
the base tables. Views are very powerful because they allow you to tailor
the presentation of data to different types of users.
Views are often used to:
Provide an additional level of table security by restricting access to
a predetermined set of rows and/or columns of a table
Hide data complexity
Simplify commands for the user
Present the data in a different perspective from that of the base table
Isolate applications from changes in definitions of base tables
Express a query that cannot be expressed without using a view
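For example, a view that restricts both the rows and the columns visible to a reporting user (table and column names assumed):
CREATE OR REPLACE VIEW v_dept10_emp AS
SELECT empno, ename, job
FROM   emp
WHERE  deptno = 10;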
Q. What is Trigger?
A trigger is a SQL procedure that initiates an action when an event (INSERT,
DELETE or UPDATE) occurs. Triggers are stored in and managed by the DBMS.
Triggers are used to maintain the referential integrity of data by changing the
data in a systematic fashion. A trigger cannot be called or executed; the
DBMS automatically fires the trigger as a result of a data modification to the
associated table. Triggers can be viewed as similar to stored procedures in
that both consist of procedural logic that is stored at the database level.
Stored procedures, however, are not event-driven and are not attached to a
specific table as triggers are. Stored procedures are explicitly executed by
invoking a CALL to the procedure while triggers are implicitly executed. In
addition, triggers can also execute stored Procedures.
Nested Trigger: A trigger can also contain INSERT, UPDATE and DELETE
logic within itself, so when the trigger is fired because of data modification it
can also cause another data modification, thereby firing another trigger. A
trigger that contains data modification logic within itself is called a nested
trigger.
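A minimal Oracle sketch of an AFTER UPDATE trigger that writes an audit row; the emp_audit table and all names are assumptions for illustration:
CREATE OR REPLACE TRIGGER trg_emp_sal_audit
AFTER UPDATE OF sal ON emp
FOR EACH ROW
BEGIN
    -- record the old and new salary whenever it changes
    INSERT INTO emp_audit (empno, old_sal, new_sal, changed_on)
    VALUES (:OLD.empno, :OLD.sal, :NEW.sal, SYSDATE);
END;
/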
Q. What is View?

A simple view can be thought of as a subset of a table. It can be used for


retrieving data, as well as updating or deleting rows. Rows updated or
deleted in the view are updated or deleted in the table the view was created
with. It should also be noted that as data in the original table changes, so
does data in the view, as views are the way to look at part of the original
table. The results of using a view are not permanently stored in the
database. The data accessed through a view is actually constructed using
standard T-SQL select command and can come from one to many different
base tables or even other views.
Q. What is Index?
An index is a physical structure containing pointers to the data. Indices are
created in an existing table to locate rows more quickly and efficiently. It is
possible to create an index on one or more columns of a table, and each
index is given a name. The users cannot see the indexes; they are just used
to speed up queries. Effective indexes are one of the best ways to improve
performance in a database application. A table scan happens when there is
no index available to help a query. In a table scan SQL Server examines
every row in the table to satisfy the query results. Table scans are sometimes
unavoidable, but on large tables scans have a severe impact on performance. Clustered indexes define the physical sorting of a database table's rows in the storage media. For this reason, each database table may
have only one clustered index. Non-clustered indexes are created outside of
the database table and contain a sorted list of references to the table itself.
Q. What is the difference between clustered and a non-clustered
index?
A clustered index is a special type of index that reorders the way records in
the table are physically stored. Therefore table can have only one clustered
index. The leaf nodes of a clustered index contain the data pages. A
nonclustered index is a special type of index in which the logical order of the
index does not match the physical stored order of the rows on disk. The leaf
node of a nonclustered index does not consist of the data pages. Instead, the
leaf nodes contain index rows.
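In SQL Server syntax, for example (table and column names assumed):
CREATE CLUSTERED INDEX idx_emp_id ON emp (empid);       -- physically orders the table rows
CREATE NONCLUSTERED INDEX idx_emp_name ON emp (ename);  -- separate structure with pointers back to the rows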
Q. What is Cursor?
Cursor is a database object used by applications to manipulate data in a set
on a row-by row basis, instead of the typical SQL commands that operate on
all the rows in the set at one time.
In order to work with a cursor we need to perform some steps in the
following order:
Declare cursor
Open cursor
Fetch row from the cursor
Process fetched row
Close cursor

Deallocate cursor
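A small T-SQL sketch that follows exactly those steps (the emp table and its columns are assumed):
DECLARE @empno INT, @ename VARCHAR(50);
DECLARE emp_cursor CURSOR FOR SELECT empno, ename FROM emp;  -- declare cursor
OPEN emp_cursor;                                             -- open cursor
FETCH NEXT FROM emp_cursor INTO @empno, @ename;              -- fetch first row
WHILE @@FETCH_STATUS = 0
BEGIN
    PRINT @ename;                                            -- process the fetched row
    FETCH NEXT FROM emp_cursor INTO @empno, @ename;
END
CLOSE emp_cursor;                                            -- close cursor
DEALLOCATE emp_cursor;                                       -- deallocate cursor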
Q. What is the difference between a HAVING CLAUSE and a WHERE
CLAUSE?
1. Specifies a search condition for a group or an aggregate. HAVING can be
used only with the SELECT statement.
2. HAVING is typically used in a GROUP BY clause. When GROUP BY is not
used, HAVING behaves like a WHERE clause.
3. Having Clause is basically used only with the GROUP BY function in a
query. WHERE Clause is applied to each row before they are part of the
GROUP BY function in a query.
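For example (assuming the usual emp table), WHERE filters individual rows before grouping and HAVING filters the groups afterwards:
SELECT deptno, AVG(sal)
FROM   emp
WHERE  job <> 'CLERK'       -- row-level filter, applied before GROUP BY
GROUP  BY deptno
HAVING AVG(sal) > 2000;     -- group-level filter, applied after aggregation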

RANK CACHE
Sample Rank Mapping
When the Power Center Server runs a session with a Rank transformation, it
compares an input row with rows in the data cache. If the input row out-ranks
a Stored row, the Power Center Server replaces the stored row with the input
row.
Example: Power Center caches the first 5 rows if we are finding the top 5 salaried employees. When the 6th row is read, it is compared with the 5 rows in the cache and placed in the cache if needed.
1) RANK INDEX CACHE:
The index cache holds group information from the group by ports. If we are
Using Group By on DEPTNO, then this cache stores values 10, 20, 30 etc.
All Group By Columns are in RANK INDEX CACHE. Ex. DEPTNO
2) RANK DATA CACHE:
It holds row data until the Power Center Server completes the ranking and is
generally larger than the index cache. To reduce the data cache size, connect
only the necessary input/output ports to subsequent transformations.
All Variable ports if there, Rank Port, All ports going out from RANK
Transformations are stored in RANK DATA CACHE.
Example: All ports except DEPTNO In our mapping example.

Aggregator Caches
1. The Power Center Server stores data in the aggregate cache until it
completes Aggregate calculations.
2. It stores group values in an index cache and row data in the data cache. If
the Power Center Server requires more space, it stores overflow values in
cache files.
Note: The Power Center Server uses memory to process an Aggregator
transformation with sorted ports. It does not use cache memory. We do not
need to configure cache memory for Aggregator transformations that use
sorted ports.
1) Aggregator Index Cache:
The index cache holds group information from the group by ports. If we are
using Group By on DEPTNO, then this cache stores values 10, 20, 30 etc.

All Group By Columns are in AGGREGATOR INDEX CACHE. Ex. DEPTNO


2) Aggregator Data Cache:
DATA CACHE is generally larger than the AGGREGATOR INDEX CACHE.
Columns in Data Cache:
Variable ports, if any
Non-group-by input/output ports
Non-group-by input ports used in non-aggregate output expressions
Ports containing aggregate functions

JOINER CACHES
Joiner always caches the MASTER table. We cannot disable caching. It builds
Index cache and Data Cache based on MASTER table.
1) Joiner Index Cache:
All Columns of MASTER table used in Join condition are in JOINER INDEX
CACHE.
Example: DEPTNO in our mapping.
2) Joiner Data Cache:
Master column not in join condition and used for output to other
transformation or target table are in Data Cache.
Example: DNAME and LOC in our mapping example.

Lookup Cache Files


1. Lookup Index Cache:
Stores data for the columns used in the lookup condition.
2. Lookup Data Cache:
For a connected Lookup transformation, stores data for the connected
output ports, not including ports used in the lookup condition.
For an unconnected Lookup transformation, stores data from the return
port.

OLTP and OLAP


Logical Data Modeling Vs Physical Data Modeling

Router Transformation And Filter Transformation


Source Qualifier And Lookup Transformation
Mapping And Mapplet
Joiner Transformation And Lookup Transformation
Dimension Table and Fact Table

Connected Lookup and Unconnected Lookup


Connected Lookup:
Receives input values directly from the pipeline.
We can use a dynamic or static cache.
The cache includes all lookup columns used in the mapping.
If there is no match for the lookup condition, the Power Center Server returns the default value for all output ports. If there is a match, it returns the result of the lookup condition for all lookup/output ports.
Passes multiple output values to another transformation.
Supports user-defined default values.

Unconnected Lookup:
Receives input values from the result of a :LKP expression in another transformation.
We can use only a static cache.
The cache includes all lookup/output ports in the lookup condition and the lookup/return port.
If there is no match for the lookup condition, the Power Center Server returns NULL. If there is a match, it returns the result of the lookup condition into the return port.
Passes one output value to another transformation.
Does not support user-defined default values.

Cache Comparison
Persistent and Dynamic Caches
Dynamic:
1) When you use a dynamic cache, the Informatica Server updates the lookup cache as it passes rows to the target.
2) With a dynamic cache, we can update the cache with new data as well.
3) A dynamic cache is not reusable.
(We need a dynamic cache when the cache data must be kept up to date during the run.)
Persistent:
1) A Lookup transformation can use a non-persistent or a persistent cache. The PowerCenter Server saves or deletes lookup cache files after a successful session based on the Lookup Cache Persistent property.
2) With a persistent cache, we are not able to update the cache with new data.
3) A persistent cache is reusable.
(We need a persistent cache when we need the cache data from a previous run.)
View And Materialized View
Star Schema And Snow Flake Schema

Informatica - Transformations
In Informatica, Transformations help to transform the source data according
to the requirements of target system and it ensures the quality of the data
being loaded into target.
Transformations are of two types: Active and Passive.
Active Transformation
An active transformation can change the number of rows that pass through it
from source to target. (i.e) It eliminates rows that do not meet the condition
in transformation.
Passive Transformation
A passive transformation does not change the number of rows that pass
through it (i.e) It passes all rows through the transformation.
Transformations can be Connected or Unconnected.
Connected Transformation
Connected transformation is connected to other transformations or directly
to target table in the mapping.
Unconnected Transformation
An unconnected transformation is not connected to other transformations in
the mapping. It is called within another transformation, and returns a value
to that transformation.
Following are the list of Transformations available in Informatica:
Aggregator Transformation
Expression Transformation
Filter Transformation
Joiner Transformation
Lookup Transformation
Normalizer Transformation

Rank Transformation
Router Transformation
Sequence Generator Transformation
Stored Procedure Transformation
Sorter Transformation
Update Strategy Transformation
XML Source Qualifier Transformation
In the following pages, we will explain all the above Informatica
Transformations and their significances in the ETL process in detail.
==============================================
================================
Aggregator Transformation
Aggregator transformation is an Active and Connected transformation.
This transformation is useful to perform calculations such as averages and
sums (mainly to perform calculations on multiple rows or groups).
For example, to calculate total of daily sales or to calculate average of
monthly or yearly sales. Aggregate functions such as AVG, FIRST, COUNT,
PERCENTILE, MAX, SUM etc. can be used in aggregate transformation.
==============================================
================================
Expression Transformation
Expression transformation is a Passive and Connected transformation.
This can be used to calculate values in a single row before writing to the
target.
For example, to calculate discount of each product
or to concatenate first and last names
or to convert date to a string field.
==============================================
================================
Filter Transformation
Filter transformation is an Active and Connected transformation.

This can be used to filter rows in a mapping that do not meet the condition.
For example,
To know all the employees who are working in Department 10 or
To find out the products that falls between the rate category $500 and
$1000.
==============================================
================================
Joiner Transformation
Joiner Transformation is an Active and Connected transformation. This can be
used to join two sources coming from two different locations or from same
location. For example, to join a flat file and a relational source or to join two
flat files or to join a relational source and a XML source.
In order to join two sources, there must be at least one matching port. While
joining two sources it is a must to specify one source as master and the other
as detail.
The Joiner transformation supports the following types of joins:
1)Normal
2)Master Outer
3)Detail Outer
4)Full Outer
Normal join discards all the rows of data from the master and detail source
that do not match, based on the condition.
Master outer join discards all the unmatched rows from the master source
and keeps all the rows from the detail source and the matching rows from
the master source.
Detail outer join keeps all rows of data from the master source and the
matching rows from the detail source. It discards the unmatched rows from
the detail source.
Full outer join keeps all rows of data from both the master and detail
sources.
==============================================
================================

Lookup transformation
Lookup transformation is Passive and it can be both Connected and
UnConnected as well. It is used to look up data in a relational table, view, or
synonym. Lookup definition can be imported either from source or from
target tables.
For example, if we want to retrieve all the sales of a product with an ID 10
and assume that the sales data resides in another table. Here instead of
using the sales table as one more source, use Lookup transformation to
lookup the data for the product, with ID 10 in sales table.
Connected lookup receives input values directly from mapping pipeline
whereas
Unconnected lookup receives values from: LKP expression from another
transformation.
Connected lookup returns multiple columns from the same row whereas
Unconnected lookup has one return port and returns one column from each
row.
Connected lookup supports user-defined default values whereas
Unconnected lookup does not support user defined values.
==============================================
================================
Normalizer Transformation
Normalizer Transformation is an Active and Connected transformation.
It is used mainly with COBOL sources where most of the time data is stored
in de-normalized format.
Also, Normalizer transformation can be used to create multiple rows from a
single row of data.
==============================================
================================
Rank Transformation
Rank transformation is an Active and Connected transformation.
It is used to select the top or bottom rank of data.
For example,

To select top 10 Regions where the sales volume was very high
or
To select 10 lowest priced products.
==============================================
================================
Router Transformation
Router is an Active and Connected transformation. It is similar to filter
transformation.
The only difference is, filter transformation drops the data that do not meet
the condition whereas router has an option to capture the data that do not
meet the condition. It is useful to test multiple conditions.
It has input, output and default groups.
For example, if we want to filter data such as State = Michigan, State = California, State = New York and all other states, it is easy to route the data to different tables.
==============================================
================================
Sequence Generator Transformation
Sequence Generator transformation is a Passive and Connected
transformation. It is used to create unique primary key values or cycle
through a sequential range of numbers or to replace missing keys.
It has two output ports to connect transformations. By default it has two
fields CURRVAL and NEXTVAL (You cannot add ports to this transformation).
NEXTVAL port generates a sequence of numbers by connecting it to a
transformation or target. CURRVAL is the NEXTVAL value plus one or
NEXTVAL plus the Increment By value.
==============================================
================================
Sorter Transformation
Sorter transformation is a Connected and an Active transformation.
It allows sorting data either in ascending or descending order according to a
specified field.

Also used to configure for case-sensitive sorting, and specify whether the
output rows should be distinct.
==============================================
================================
Source Qualifier Transformation
Source Qualifier transformation is an Active and Connected transformation.
When adding a relational or a flat file source definition to a mapping, it is
must to connect it to a Source Qualifier transformation.
The Source Qualifier performs the various tasks such as
Overriding Default SQL query,
Filtering records;
join data from two or more tables etc.
==============================================
================================
Stored Procedure Transformation
Stored Procedure transformation is a Passive and Connected &
Unconnected transformation. It is useful to automate time-consuming tasks
and it is also used in error handling, to drop and recreate indexes and to
determine the space in database, a specialized calculation etc.
The stored procedure must exist in the database before creating a Stored
Procedure transformation, and the stored procedure can exist in a source,
target, or any database with a valid connection to the Informatica Server.
Stored Procedure is an executable script with SQL statements and control
statements, user-defined variables and conditional statements.
==============================================
================================
Update Strategy Transformation
Update strategy transformation is an Active and Connected transformation.
It is used to update data in target table, either to maintain history of data or
recent changes.
You can specify how to treat source rows in table, insert, update, delete or
data driven.
==============================================
================================
XML Source Qualifier Transformation

XML Source Qualifier is a Passive and Connected transformation.


XML Source Qualifier is used only with an XML source definition.
It represents the data elements that the Informatica Server reads when it
executes a session with XML sources.
==============================================
================================

Constraint-Based Loading
In the Workflow Manager, you can specify constraint-based loading for a
session. When you select this option, the Integration Service orders the
target load on a row-by-row basis. For every row generated by an active
source, the Integration Service loads the corresponding transformed row first
to the primary key table, then to any foreign key tables. Constraint-based
loading depends on the following requirements:
Active source: Related target tables must have the same active source.
Key relationships: Target tables must have key relationships.
Target connection groups: Targets must be in one target connection group.
Treat rows as insert. Use this option when you insert into the target. You
cannot use updates with constraint based loading.
Active Source:
When target tables receive rows from different active sources, the
Integration Service reverts to normal loading for those tables, but loads all
other targets in the session using constraint-based loading when possible.
For example, a mapping contains three distinct pipelines. The first two
contain a source, source qualifier, and target. Since these two targets receive
data from different active sources, the Integration Service reverts to normal
loading for both targets. The third pipeline contains a source, Normalizer, and
two targets. Since these two targets share a single active source (the
Normalizer), the Integration Service performs constraint-based loading:
loading the primary key table first, then the foreign key table.
Key Relationships:
When target tables have no key relationships, the Integration Service does
not perform constraint-based loading.
Similarly, when target tables have circular key relationships, the Integration
Service reverts to a normal load. For example, you have one target
containing a primary key and a foreign key related to the primary key in a
second target. The second target also contains a foreign key that references
the primary key in the first target. The Integration Service cannot enforce
constraint-based loading for these tables. It reverts to a normal load.
Target Connection Groups:
The Integration Service enforces constraint-based loading for targets in the
same target connection group. If you want to specify constraint-based
loading for multiple targets that receive data from the same active source,
you must verify the tables are in the same target connection group. If the
tables with the primary key-foreign key relationship are in different target
connection groups, the Integration Service cannot enforce constraint-based
loading when you run the workflow. To verify that all targets are in the same
target connection group, complete the following tasks:
Verify all targets are in the same target load order group and receive data
from the same active source.
Use the default partition properties and do not add partitions or partition
points.
Define the same target type for all targets in the session properties.
Define the same database connection name for all targets in the session
properties.
Choose normal mode for the target load type for all targets in the session
properties.
Treat Rows as Insert:
Use constraint-based loading when the session option Treat Source Rows As
is set to insert. You might get inconsistent data if you select a different Treat
Source Rows As option and you configure the session for constraint-based
loading.
When the mapping contains Update Strategy transformations and you need
to load data to a primary key table first, split the mapping using one of the
following options:
Load primary key table in one mapping and dependent tables in another
mapping. Use constraint-based loading to load the primary table.
Perform inserts in one mapping and updates in another mapping.
Constraint-based loading does not affect the target load ordering of the
mapping. Target load ordering defines the order the Integration Service reads
the sources in each target load order group in the mapping. A target load
order group is a collection of source qualifiers, transformations, and targets
linked together in a mapping. Constraint-based loading establishes the order
in which the Integration Service loads individual targets within a set of
targets receiving data from a single source qualifier.
Example
The following mapping is configured to perform constraint-based loading:
In the first pipeline, target T_1 has a primary key; T_2 and T_3 contain
foreign keys referencing the T_1 primary key. T_3 has a primary key that T_4
references as a foreign key.
Since these tables receive records from a single active source, SQ_A, the
Integration Service loads rows to the targets in the following order:
1. T_1
2. T_2 and T_3 (in no particular order)
3. T_4
The Integration Service loads T_1 first because it has no foreign key
dependencies and contains a primary key referenced by T_2 and T_3. The
Integration Service then loads T_2 and T_3, but since T_2 and T_3 have no
dependencies, they are not loaded in any particular order. The Integration
Service loads T_4 last, because it has a foreign key that references a primary
key in T_3. After loading the first set of targets, the Integration Service begins
reading source B. If there are no key relationships between T_5 and T_6, the
Integration Service reverts to a normal load for both targets.
If T_6 has a foreign key that references a primary key in T_5, since T_5 and
T_6 receive data from a single active source, the Aggregator AGGTRANS, the
Integration Service loads rows to the tables in the following order:
T_5
T_6
T_1, T_2, T_3, and T_4 are in one target connection group if you use the
same database connection for each target, and you use the default partition
properties. T_5 and T_6 are in another target connection group together if
you use the same database connection for each target and you use the
default partition properties. The Integration Service includes T_5 and T_6 in a
different target connection group because they are in a different target load
order group from the first four targets.
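To make the key relationships in this example concrete, the first four targets
could be defined with DDL along the following lines (a minimal sketch; the
column names are assumptions, only the primary key-foreign key chain matters):
CREATE TABLE T_1 (PK_1 NUMBER PRIMARY KEY, COL_1 VARCHAR2(30));
CREATE TABLE T_2 (PK_2 NUMBER PRIMARY KEY, FK_1 NUMBER REFERENCES T_1 (PK_1));
CREATE TABLE T_3 (PK_3 NUMBER PRIMARY KEY, FK_1 NUMBER REFERENCES T_1 (PK_1));
CREATE TABLE T_4 (PK_4 NUMBER PRIMARY KEY, FK_3 NUMBER REFERENCES T_3 (PK_3));
With these constraints, loading T_1 first, then T_2 and T_3, and T_4 last is the
only row-by-row order that never violates a foreign key, which matches the
order described above.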
Enabling Constraint-Based Loading:
When you enable constraint-based loading, the Integration Service orders
the target load on a row-by-row basis. To enable constraint-based loading:
1. In the General Options settings of the Properties tab, choose Insert for the
Treat Source Rows As property.
2. Click the Config Object tab. In the Advanced settings, select Constraint
Based Load Ordering.
3. Click OK.
Target Load Plan
When you use a mapplet in a mapping, the Mapping Designer lets you set
the target load plan for sources within the mapplet.
Setting the Target Load Order
You can configure the target load order for a mapping containing any type of
target definition. In the Designer, you can set the order in which the
Integration Service sends rows to targets in different target load order groups
in a mapping. A target load order group is the collection of source qualifiers,
transformations, and targets linked together in a mapping. You can set the
target load order if you want to maintain referential integrity when inserting,
deleting, or updating tables that have the primary key and foreign key
constraints.
The Integration Service reads sources in a target load order group
concurrently, and it processes target load order groups sequentially.
To specify the order in which the Integration Service sends data to targets,
create one source qualifier for each target within a mapping. To set the
target load order, you then determine in which order the Integration Service
reads each source in the mapping.
The following figure shows two target load order groups in one
mapping:
In this mapping, the first target load order group includes ITEMS, SQ_ITEMS,
and T_ITEMS. The second target load order group includes all other objects in
the mapping, including the TOTAL_ORDERS target. The Integration Service
processes the first target load order group, and then the second target load
order group.
When it processes the second target load order group, it reads data from
both sources at the same time.
To set the target load order:
1. Create a mapping that contains multiple target load order groups.
2. Click Mappings > Target Load Plan. The Target Load Plan dialog box lists all
Source Qualifier transformations in the mapping and the targets that receive
data from each source qualifier.
3. Select a source qualifier from the list.
4. Click the Up and Down buttons to move the source qualifier within the
load order.
5. Repeat steps 3 to 4 for other source qualifiers you want to reorder.
6. Click OK.
Mapping Parameters & Variables
Mapping parameters and variables represent values in mappings and
mapplets.
When we use a mapping parameter or variable in a mapping, first we declare
the mapping parameter or variable for use in each mapplet or mapping.
Then, we define a value for the mapping parameter or variable before we run
the session.
Mapping Parameters
A mapping parameter represents a constant value that we can define before
running a session.
A mapping parameter retains the same value throughout the entire session.
Example: When we want to extract records of a particular month during the ETL
process, we will create a Mapping Parameter of date/time data type and use it
in the source qualifier SQL override to compare it with the timestamp field.
After we create a parameter, it appears in the Expression Editor.
We can then use the parameter in any expression in the mapplet or
mapping.
We can also use parameters in a source qualifier filter, user-defined join, or
extract override, and in the Expression Editor of reusable transformations.
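For the month-extraction example above, the SQL override might look roughly
like the sketch below; the parameter name $$LoadMonth, the ORDERS table and
its columns are assumptions for illustration, and the Integration Service
expands the parameter textually before the query is sent to the database:
SELECT ORDER_ID, ORDER_DATE, AMOUNT
FROM   ORDERS
-- $$LoadMonth is a hypothetical mapping parameter, e.g. '200408'
WHERE  TO_CHAR(ORDER_DATE, 'YYYYMM') = '$$LoadMonth'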
Mapping Variables
Unlike mapping parameters, mapping variables are values that can change
between sessions.
The Integration Service saves the latest value of a mapping variable to
the repository at the end of each successful session.
We can override a saved value with the parameter file.
We can also clear all saved values for the session in the Workflow
Manager.
We might use a mapping variable to perform an incremental read of the
source. For example, we have a source table containing time stamped
transactions and we want to evaluate the transactions on a daily basis.
Instead of manually entering a session override to filter source data each
time we run the session, we can create a mapping variable, $$IncludeDateTime.
In the source qualifier, create a filter to read only rows whose transaction
date equals $$IncludeDateTime, such as:
TIMESTAMP = $$IncludeDateTime
In the mapping, use a variable function to set the variable value to increment
one day each time the session runs. If we set the initial value of
$$IncludeDateTime to 8/1/2004, the first time the Integration Service runs the
session, it reads only rows dated 8/1/2004. During the session, the
Integration Service sets $$IncludeDateTime to 8/2/2004. It saves 8/2/2004 to
the repository at the end of the session. The next time it runs the session, it
reads only rows from August 2, 2004.
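A rough equivalent of that source qualifier filter, written as a SQL override
(the TRANSACTIONS table, its columns, and the date format are assumptions for
illustration; the variable is expanded textually before the query runs):
SELECT TXN_ID, TXN_AMOUNT, TXN_DATE
FROM   TRANSACTIONS
WHERE  TXN_DATE = TO_DATE('$$IncludeDateTime', 'MM/DD/YYYY')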
Used in following transformations:
Expression
Filter
Router
Update Strategy
Initial and Default Value:
When we declare a mapping parameter or variable in a mapping or a
mapplet, we can enter an initial value. When the Integration Service needs
an initial value, and we did not declare an initial value for the parameter or
variable, the Integration Service uses a default value based on the data type
of the parameter or variable.
Data Type -> Default Value
Numeric -> 0
String -> Empty String
Date/Time -> 1/1/1
Variable Values: Start value and current value of a mapping variable
Start Value:
The start value is the value of the variable at the start of the session. The
Integration Service looks for the start value in the following order:
Value in parameter file
Value saved in the repository
Initial value
Default value
Current Value:
The current value is the value of the variable as the session progresses.
When a session starts, the current value of a variable is the same as the start
value. The final current value for a variable is saved to the repository at the
end of a successful session. When a session fails to complete, the Integration
Service does not update the value of the variable in the repository.
Note: If a variable function is not used to calculate the current value of a
mapping variable, the start value of the variable is saved to the repository.
Variable Data Type and Aggregation Type
When we declare a mapping variable in a mapping, we need to configure the
data type and aggregation type for the variable. The Integration Service uses
the aggregation type of a mapping variable to determine the final current
value of the mapping variable.
Aggregation types are:
Count: Only integer and small integer data types are valid.
Max: All transformation data types except binary data type are valid.
Min: All transformation data types except binary data type are valid.
Variable Functions
Variable functions determine how the Integration Service calculates the
current value of a mapping variable in a pipeline.
SetMaxVariable: Sets the variable to the maximum value of a group of
values. It ignores rows marked for update, delete, or reject. Aggregation type
set to Max.
SetMinVariable: Sets the variable to the minimum value of a group of
values. It ignores rows marked for update, delete, or reject. Aggregation type
set to Min.
SetCountVariable: Increments the variable value by one. It adds one to the
variable value when a row is marked for insertion, and subtracts one when
the row is marked for deletion. It ignores rows marked for update or reject.
Aggregation type set to Count.
SetVariable: Sets the variable to the configured value. At the end of a
session, it compares the final current value of the variable to the start value
of the variable. Based on the aggregate type of the variable, it saves a final
value to the repository.
Creating Mapping Parameters and Variables
1. Open the folder where we want to create the parameter or variable.
2. In the Mapping Designer, click Mappings > Parameters and Variables. -or-
In the Mapplet Designer, click Mapplet > Parameters and Variables.
3. Click the Add button.
4. Enter a name. Do not remove $$ from the name.
5. Select Type and Data type. Select Aggregation type for mapping variables.
6. Give an Initial Value. Click OK.
Example: Use of Mapping Parameters and Variables
EMP will be the source table.
Create a target table MP_MV_EXAMPLE having columns: EMPNO, ENAME,
DEPTNO, TOTAL_SAL, MAX_VAR, MIN_VAR, COUNT_VAR and SET_VAR.
TOTAL_SAL = SAL + COMM + $$BONUS ($$BONUS is a mapping parameter that
changes every month)
SET_VAR: We will add one month to the HIREDATE of every employee.
Create shortcuts as necessary.
Creating Mapping
1. Open folder where we want to create the mapping.
2. Click Tools -> Mapping Designer.
3. Click Mapping-> Create-> Give name. Ex: m_mp_mv_example
4. Drag EMP and target table.
5. Transformation -> Create -> Select Expression for list -> Create > Done.
6. Drag EMPNO, ENAME, HIREDATE, SAL, COMM and DEPTNO to Expression.
7. Create Parameter $$Bonus and Give initial value as 200.
8. Create variable $$var_max of MAX aggregation type and initial value 1500.
9. Create variable $$var_min of MIN aggregation type and initial value 1500.
10. Create variable $$var_count of COUNT aggregation type and initial value 0.
COUNT is visible when the data type is INT or SMALLINT.
11. Create variable $$var_set of MAX aggregation type.
12. Create 5 output ports out_TOTAL_SAL, out_MAX_VAR, out_MIN_VAR,
out_COUNT_VAR and out_SET_VAR.
13. Open the expression editor for TOTAL_SAL. Do the same as we did earlier for
SAL + COMM. To add $$BONUS to it, select the variable tab and pick the
parameter from the mapping parameters list: SAL + COMM + $$Bonus
14. Open Expression editor for out_max_var.
15. Select the variable function SETMAXVARIABLE from left side pane. Select
$$var_max from variable tab and SAL from ports tab as shown below.
SETMAXVARIABLE($$var_max,SAL)
16. Validate the expression.
17. Open Expression editor for out_min_var and write the following
expression:
SETMINVARIABLE($$var_min,SAL). Validate the expression.
18. Open Expression editor for out_count_var and write the following
expression:
SETCOUNTVARIABLE($$var_count). Validate the expression.
19. Open Expression editor for out_set_var and write the following
expression:
SETVARIABLE($$var_set,ADD_TO_DATE(HIREDATE,'MM',1)). Validate.
20. Click OK.
21. Link all ports from expression to target and Validate Mapping and Save it.
PARAMETER FILE
A parameter file is a list of parameters and associated values for a workflow,
worklet, or session.
Parameter files provide flexibility to change these variables each time we run
a workflow or session.
We can create multiple parameter files and change the file we use for a
session or workflow. We can create a parameter file using a text editor such
as WordPad or Notepad.
Enter the parameter file name and directory in the workflow or session
properties.
A parameter file contains the following types of parameters and variables:
Workflow variable: References values and records information in a
workflow.
Worklet variable: References values and records information in a worklet.
Use predefined worklet variables in a parent workflow, but we cannot use
workflow variables from the parent workflow in a worklet.
Session parameter: Defines a value that can change from session to
session, such as a database connection or file name.
Mapping parameter and Mapping variable
USING A PARAMETER FILE
Parameter files contain several sections preceded by a heading. The heading
identifies the Integration Service, Integration Service process, workflow,
worklet, or session to which we want to assign parameters or variables.
Make session and workflow.
Give connection information for source and target table.
Run workflow and see result.
Sample Parameter File for Our example:
In the parameter file, folder and session names are case sensitive.
Create a text file in notepad with name Para_File.txt
[Practice.ST:s_m_MP_MV_Example]
$$Bonus=1000
$$var_max=500
$$var_min=1200
$$var_count=0
CONFIGURING PARAMETER FILE
We can specify the parameter file name and directory in the workflow or
session properties.
To enter a parameter file in the workflow properties:
1. Open a Workflow in the Workflow Manager.
2. Click Workflows > Edit.
3. Click the Properties tab.
4. Enter the parameter directory and name in the Parameter Filename field.
5. Click OK.
To enter a parameter file in the session properties:
1. Open a session in the Workflow Manager.
2. Click the Properties tab and open the General Options settings.
3. Enter the parameter directory and name in the Parameter Filename field.
4. Example: D:\Files\Para_File.txt or $PMSourceFileDir\Para_File.txt
5. Click OK.
Mapplet
A mapplet is a reusable object that we create in the Mapplet Designer.
It contains a set of transformations and lets us reuse that transformation
logic in multiple mappings.
Created in Mapplet Designer in Designer Tool.
Suppose we need to use the same set of 5 transformations in, say, 10 mappings.
Instead of building those 5 transformations in each of the 10 mappings, we
create a mapplet of these 5 transformations and use this mapplet in all 10
mappings. Example: To create a surrogate key in the target, we create a
mapplet using a stored procedure to create the primary key for the target
table. We give the target table name and key column name as input to the
mapplet and get the surrogate key as output.
Mapplets help simplify mappings in the following ways:
Include source definitions: Use multiple source definitions and source
qualifiers to provide source data for a mapping.
Accept data from sources in a mapping
Include multiple transformations: As many transformations as we need.
Pass data to multiple transformations: We can create a mapplet to feed
data to multiple transformations. Each Output transformation in a mapplet
represents one output group in a mapplet.
Contain unused ports: We do not have to connect all mapplet input and
output ports in a mapping.
Mapplet Input:
Mapplet input can originate from a source definition and/or from an Input
transformation in the mapplet. We can create multiple pipelines in a
mapplet.
We use Mapplet Input transformation to give input to mapplet.
Use of Mapplet Input transformation is optional.
Mapplet Output:
The output of a mapplet is not connected to any target table.
We must use Mapplet Output transformation to store mapplet output.
A mapplet must contain at least one Output transformation with at least one
connected port in the mapplet.
Example1: We will join EMP and DEPT table. Then calculate total salary. Give
the output to mapplet out transformation.
EMP and DEPT will be source tables.
Output will be given to transformation Mapplet_Out.
Steps:
Open folder where we want to create the mapping.
Click Tools -> Mapplet Designer.
Click Mapplets-> Create-> Give name. Ex: mplt_example1
Drag EMP and DEPT table.
Use Joiner transformation as described earlier to join them.
Transformation -> Create -> Select Expression for list -> Create -> Done
Pass all ports from joiner to expression and then calculate total salary as
described in expression transformation.
Now Transformation -> Create -> Select Mapplet Out from list -> Create
-> Give name and then done.
Pass all ports from expression to Mapplet output.
Mapplet -> Validate
Repository -> Save
Use of mapplet in mapping:
We can use a mapplet in a mapping by just dragging the mapplet from the
mapplet folder in the left pane, the same way we drag source and target tables.
When we use the mapplet in a mapping, the mapplet object displays only the
ports from the Input and Output transformations. These are referred to as the
mapplet input and mapplet output ports.
Make sure to give correct connection information in session.
Making a mapping: We will use mplt_example1, and then create a filter
transformation to filter records whose Total Salary is >= 1500.
mplt_example1 will be source.
Create a target table with the same structure as the Mapplet_Out transformation.
Creating Mapping
Open folder where we want to create the mapping.
Click Tools -> Mapping Designer.
Click Mapping-> Create-> Give name. Ex: m_mplt_example1
Drag mplt_Example1 and target table.
Transformation -> Create -> Select Filter for list -> Create -> Done.
Drag all ports from mplt_example1 to filter and give filter condition.
Connect all ports from filter to target. We can add more transformations
after filter if needed.
Validate mapping and Save it.
Make session and workflow.
Give connection information for mapplet source tables.
Give connection information for target table.
Run workflow and see result.
Indirect Loading For Flat Files
Suppose you have 10 flat files of the same structure. All the flat files have the
same number of columns and data types. Now we need to load all 10 files into
the same target.
Names of files are say EMP1, EMP2 and so on.
Solution1:
1. Import one flat file definition and make the mapping as per need.
2. Now in session give the Source File name and Source File Directory
location of one file.
3. Make workflow and run.
4. Now open session after workflow completes. Change the Filename and
Directory to give information of second file. Run workflow again.
5. Do the above for all 10 files.
Solution2:
1. Import one flat file definition and make the mapping as per need.
2. Now in session give the Source Directory location of the files.
3. Now in the Source Filename field, use $InputFileName. This is a session parameter.
4. Now make a parameter file and give the value of $InputFileName.
$InputFileName=EMP1.txt
5. Run the workflow
6. Now edit parameter file and give value of second file. Run workflow again.
7. Do same for remaining files.
Solution3:
1. Import one flat file definition and make the mapping as per need.
2. Now make a notepad file that contains the location and name of each 10
flat files.
Sample:
D:\EMP1.txt
E:\EMP2.txt
E:\FILES\DWH\EMP3.txt and so on
3. Now make a session and in Source file name and Source File Directory
location fields, give the name and location of above created file.
4. In Source file type field, select Indirect.
5. Click Apply.
6. Validate Session
7. Make Workflow. Save it to repository and run.
Incremental Aggregation
When we enable the session option Incremental Aggregation, the Integration
Service performs incremental aggregation: it passes source data through the
mapping and uses historical cache data to perform aggregation calculations
incrementally.
When using incremental aggregation, you apply captured changes in the
source to aggregate calculations in a session. If the source changes
incrementally and you can capture changes, you can configure the session to
process those changes. This allows the Integration Service to update the
target incrementally, rather than forcing it to process the entire source and
recalculate the same data each time you run the session.
For example, you might have a session using a source that receives new data
every day. You can capture those incremental changes because you have
added a filter condition to the mapping that removes pre-existing data from
the flow of data. You then enable incremental aggregation.
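One hedged way to implement such a filter is in the source qualifier SQL, for
example (the DAILY_SALES table and its columns are illustrative assumptions;
the condition assumes the session runs once per day):
SELECT TXN_ID, STORE_ID, SALE_AMOUNT, TXN_DATE
FROM   DAILY_SALES
WHERE  TXN_DATE >= TRUNC(SYSDATE)   -- only rows time-stamped today reach the Aggregator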
When the session runs with incremental aggregation enabled for the first
time on March 1, you use the entire source. This allows the Integration
Service to read and store the necessary aggregate data. On March 2, when
you run the session again, you filter out all the records except those timestamped March 2. The Integration Service then processes the new data and
updates the target accordingly. Consider using incremental aggregation in
the following circumstances:
You can capture new source data. Use incremental aggregation when you
can capture new source data each time you run the session. Use a Stored
Procedure or Filter transformation to process new data.
Incremental changes do not significantly change the target. Use incremental
aggregation when the changes do not significantly change the target. If
processing the incrementally changed source alters more than half the
existing target, the session may not benefit from using incremental
aggregation. In this case, drop the table and recreate the target with
complete source data.
Note: Do not use incremental aggregation if the mapping contains percentile
or median functions. The Integration Service uses system memory to process
these functions in addition to the cache memory you configure in the session
properties. As a result, the Integration Service does not store incremental
aggregation values for percentile and median functions in disk caches.
Integration Service Processing for Incremental Aggregation
(i)The first time you run an incremental aggregation session, the Integration
Service processes the entire source. At the end of the session, the
Integration Service stores aggregate data from that session run in two files,
the index file and the data file. The Integration Service creates the files in the
cache directory specified in the Aggregator transformation properties.
(ii)Each subsequent time you run the session with incremental aggregation,
you use the incremental source changes in the session. For each input
record, the Integration Service checks historical information in the index file
for a corresponding group. If it finds a corresponding group, the Integration
Service performs the aggregate operation incrementally, using the aggregate
data for that group, and saves the incremental change. If it does not find a
corresponding group, the Integration Service creates a new group and saves
the record data.
(iii)When writing to the target, the Integration Service applies the changes to
the existing target. It saves modified aggregate data in the index and data
files to be used as historical data the next time you run the session.
(iv) If the source changes significantly and you want the Integration Service
to continue saving aggregate data for future incremental changes, configure
the Integration Service to overwrite existing aggregate data with new
aggregate data.
Each subsequent time you run a session with incremental aggregation, the
Integration Service creates a backup of the incremental aggregation files.
The cache directory for the Aggregator transformation must contain enough
disk space for two sets of the files.
(v)When you partition a session that uses incremental aggregation, the
Integration Service creates one set of cache files for each partition.
The Integration Service creates new aggregate data, instead of using
historical data, when you perform one of the following tasks:
Save a new version of the mapping.
Configure the session to reinitialize the aggregate cache.
Move the aggregate files without correcting the configured path or
directory for the files in the session properties.
Change the configured path or directory for the aggregate files without
moving the files to the new location.
Delete cache files.
Decrease the number of partitions.
When the Integration Service rebuilds incremental aggregation files, the
data in the previous files is lost.
Note: To protect the incremental aggregation files from file corruption or disk
failure, periodically back up the files.
Preparing for Incremental Aggregation:
When you use incremental aggregation, you need to configure both mapping
and session properties:
Implement mapping logic or filter to remove pre-existing data.
Configure the session for incremental aggregation and verify that the file
directory has enough disk space for the aggregate files.
Configuring the Mapping
Before enabling incremental aggregation, you must capture changes in
source data. You can use a Filter or Stored Procedure transformation in the
mapping to remove pre-existing source data during a session.
Configuring the Session
Use the following guidelines when you configure the session for incremental
aggregation:
(i) Verify the location where you want to store the aggregate files.
The index and data files grow in proportion to the source data. Be sure the
cache directory has enough disk space to store historical data for the
session.
When you run multiple sessions with incremental aggregation, decide where
you want the files stored. Then, enter the appropriate directory for the
process variable, $PMCacheDir, in the Workflow Manager. You can enter
session-specific directories for the index and data files. However, by using
the process variable for all sessions using incremental aggregation, you can
easily change the cache directory when necessary by changing
$PMCacheDir.
Changing the cache directory without moving the files causes the Integration
Service to reinitialize the aggregate cache and gather new aggregate data.
In a grid, Integration Services rebuild incremental aggregation files they
cannot find. When an Integration Service rebuilds incremental aggregation
files, it loses aggregate history.
(ii) Verify the incremental aggregation settings in the session
properties.
You can configure the session for incremental aggregation in the
Performance settings on the Properties tab.
You can also configure the session to reinitialize the aggregate cache. If you
choose to reinitialize the cache, the Workflow Manager displays a warning
indicating the Integration Service overwrites the existing cache and a
reminder to clear this option after running the session.
When should we go for hash partitioning?
Scenarios for choosing hash partitioning:
Not enough knowledge about how much data maps into a given range.
Sizes of range partitions differ quite substantially, or are difficult to balance
manually.
Range partitioning would cause data to be clustered undesirably.
Features such as parallel DML, partition pruning, joins, etc., are important.
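On the database side, hash partitioning in Oracle looks roughly like the sketch
below (the table and column names are illustrative); Oracle spreads the rows
across the partitions using a hash of the partitioning key:
Create table items_hash (item_id number(10),
item_name varchar2(30),
price number(10,2))
partition by hash (item_id)
partitions 4;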
You can define the following partition types in the Workflow Manager:
1) Database Partitioning
The Integration Service queries the IBM DB2 or Oracle system for table
partition information. It reads partitioned data from the corresponding nodes
in the database. Use database partitioning with Oracle or IBM DB2 source
instances on a multi-node table space, and with DB2 targets.
2) Hash Partitioning
Use hash partitioning when you want the Integration Service to distribute
rows to the partitions by group. For example, you need to sort items by item
ID, but you do not know how many items have a particular ID number.
3) Key Range
You specify one or more ports to form a compound partition key. The
Integration Service passes data to each partition depending on the ranges
you specify for each port. Use key range partitioning where the sources or
targets in the pipeline are partitioned by key range.
4) Simple Pass-Through
The Integration Service passes all rows at one partition point to the next
partition point without redistributing them. Choose pass-through partitioning
where you want to create an additional pipeline stage to improve
performance, but do not want to change the distribution of data across
partitions.
5) Round-Robin
The Integration Service distributes data evenly among all partitions. Use
round-robin partitioning where you want each partition to process
approximately the same number of rows.
Partition Types Overview
Creating Partition Tables
To create a range-partitioned table, give the following statement:
Create table sales (year number(4),
product varchar2(10),
amt number(10))
partition by range (year)
(
partition p1 values less than (1992),
partition p2 values less than (1993),
partition p3 values less than (1994),
partition p4 values less than (1995),
partition p5 values less than (MAXVALUE)
);
The following example creates a table with list partitioning:
Create table customers (custcode number(5),
Name varchar2(20),
Addr varchar2(40),
City varchar2(20),
Bal number(10,2))
Partition by list (city)
(
Partition north_India values ('DELHI','CHANDIGARH'),
Partition east_India values ('KOLKOTA','PATNA'),
Partition south_India values ('HYDERABAD','BANGALORE','CHENNAI'),
Partition west_India values ('BOMBAY','GOA')
);
alter table sales add partition p6 values less than (1996);
alter table customers add partition central_India values ('BHOPAL','NAGPUR');
Alter table sales drop partition p5;
Alter table sales merge partitions p2, p3 into partition p23;
The following statement adds a new set of cities ('KOCHI', 'MANGALORE') to
an existing partition list.
ALTER TABLE customers
MODIFY PARTITION south_india
ADD VALUES ('KOCHI', 'MANGALORE');
The statement below drops a set of cities ('KOCHI' and 'MANGALORE') from
an existing partition value list.
ALTER TABLE customers
MODIFY PARTITION south_india
DROP VALUES ('KOCHI','MANGALORE');
SPLITTING PARTITIONS
You can split a single partition into two partitions. For example to split the
partition p5 of sales table into two partitions give the following command.
Alter table sales split partition p5 into
(Partition p6 values less than (1996),
Partition p7 values less than (MAXVALUE));
TRUNCATING PARTITION
Truncating a partition will delete all rows from the partition.
To truncate a partition give the following statement
Alter table sales truncate partition p5;
LISTING INFORMATION ABOUT PARTITION TABLES
To see how many partitioned tables are there in your schema give the
following statement
Select * from user_part_tables;
To see on partition level partitioning information
Select * from user_tab_partitions;
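To narrow the listing to a single table, a query along these lines can be used
(the column selection and the table name 'SALES' are illustrative):
Select partition_name, high_value, num_rows
from user_tab_partitions
where table_name = 'SALES';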
TASKS
The Workflow Manager contains many types of tasks to help you build
workflows and worklets. We can create reusable tasks in the Task Developer.
Types of tasks:
Task Type          | Tool where task can be created | Reusable or not
Session            | Task Developer                 | Yes
Email              | Workflow Designer              | Yes
Command            | Worklet Designer               | Yes
Event-Raise        | Workflow Designer              | No
Event-Wait         | Worklet Designer               | No
Timer              | Workflow or Worklet Designer   | No
Decision           | Workflow or Worklet Designer   | No
Assignment         | Workflow or Worklet Designer   | No
Control            | Workflow or Worklet Designer   | No
SESSION TASK
A session is a set of instructions that tells the Power Center Server how and
when to move data from sources to targets.
To run a session, we must first create a workflow to contain the Session task.
We can run as many sessions in a workflow as we need. We can run the
Session tasks sequentially or concurrently, depending on our needs.
The Power Center Server creates several files and in-memory caches
depending on the transformations and options used in the session.
EMAIL TASK
The Workflow Manager provides an Email task that allows us to send email
during a workflow.
It is usually created by the Administrator, and we just drag and use it in our workflow.
Steps:
In the Task Developer or Workflow Designer, choose Tasks-Create.
Select an Email task and enter a name for the task. Click Create.
Click Done.
Double-click the Email task in the workspace. The Edit Tasks dialog box
appears.
Click the Properties tab.
Enter the fully qualified email address of the mail recipient in the Email User
Name field.
Enter the subject of the email in the Email Subject field. Or, you can leave
this field blank.
Click the Open button in the Email Text field to open the Email Editor.
Click OK twice to save your changes.
Example: To send an email when a session completes:
Steps:
1. Create a workflow wf_sample_email
2. Drag any session task to workspace.
3. Edit Session task and go to Components tab.
4. See On Success Email Option there and configure it.
5. In Type select reusable or Non-reusable.
6. In Value, select the email task to be used.
7. Click Apply -> Ok.
8. Validate workflow and Repository -> Save
9. We can also drag the email task and use as per need.
10. We can set the option to send email on success or failure in the
Components tab of a session task.
COMMAND TASK
The Command task allows us to specify one or more shell commands in UNIX
or DOS commands in Windows to run during the workflow.
For example, we can specify shell commands in the Command task to delete
reject files, copy a file, or archive target files.
Ways of using command task:
1. Standalone Command task: We can use a Command task anywhere in the
workflow or worklet to run shell commands.
2. Pre- and post-session shell command: We can call a Command task as the
pre- or post-session shell command for a Session task. This is done in
COMPONENTS TAB of a session. We can run it in Pre-Session Command or
Post Session Success Command or Post Session Failure Command. Select the
Value and Type option as we did in Email task.
Example: to copy a file sample.txt from D drive to E.
Command: COPY D:\sample.txt E:\ in windows
Steps for creating command task:
1. In the Task Developer or Workflow Designer, choose Tasks-Create.
2. Select Command Task for the task type.
3. Enter a name for the Command task. Click Create. Then click done.
4. Double-click the Command task. Go to commands tab.
5. In the Commands tab, click the Add button to add a command.
6. In the Name field, enter a name for the new command.
7. In the Command field, click the Edit button to open the Command Editor.
8. Enter only one command in the Command Editor.
9. Click OK to close the Command Editor.
10. Repeat steps 5-9 to add more commands in the task.
11. Click OK.
Steps to create the workflow using command task:
1. Create a task using the above steps to copy a file in Task Developer.
2. Open Workflow Designer. Workflow -> Create -> Give name and click ok.
3. Start is displayed. Drag session say s_m_Filter_example and command task.
4. Link Start to Session task and Session to Command Task.
5. Double-click the link between the Session and Command tasks and give the
condition in the editor as: $S_M_FILTER_EXAMPLE.Status=SUCCEEDED
6. Workflow -> Validate
7. Repository -> Save
WORKING WITH EVENT TASKS
We can define events in the workflow to specify the sequence of task
execution.
Types of Events:
Pre-defined event: A pre-defined event is a file-watch event. This event
waits for a specified file to arrive at a given location.
User-defined event: A user-defined event is a sequence of tasks in the
Workflow. We create events and then raise them as per need.
Steps for creating User Defined Event:
1. Open any workflow where we want to create an event.
2. Click Workflow-> Edit -> Events tab.
3. Click to Add button to add events and give the names as per need.
4. Click Apply -> Ok. Validate the workflow and Save it.
Types of Events Tasks:
EVENT RAISE: Event-Raise task represents a user-defined event. We use
this task to raise a user defined event.
EVENT WAIT: Event-Wait task waits for a file watcher event or user defined
event to occur before executing the next session in the workflow.
Example1: Use an event wait task and make sure that session
s_filter_example runs when abc.txt file is present in D:\FILES folder.
Steps for creating workflow:
1. Workflow -> Create -> Give name wf_event_wait_file_watch -> Click ok.
2. Task -> Create -> Select Event Wait. Give name. Click create and done.
3. Link Start to Event Wait task.
4. Drag s_filter_example to workspace and link it to event wait task.
5. Right click on event wait task and click EDIT -> EVENTS tab.
6. Select Pre Defined option there. In the blank space, give directory and
filename to watch. Example: D:\FILES\abc.txt
7. Workflow validate and Repository Save.
Example 2: Raise a user defined event when session s_m_filter_example
succeeds. Capture this event in event wait task and run session
S_M_TOTAL_SAL_EXAMPLE
Steps for creating workflow:
1. Workflow -> Create -> Give name wf_event_wait_event_raise -> Click ok.
2. Workflow -> Edit -> Events Tab and add events EVENT1 there.
3. Drag s_m_filter_example and link it to START task.
4. Click Tasks -> Create -> Select EVENT RAISE from list. Give name
ER_Example. Click Create and then done. Link ER_Example to
s_m_filter_example.
5. Right click ER_Example -> EDIT -> Properties Tab -> Open Value for User
Defined Event and Select EVENT1 from the list displayed. Apply -> OK.
6. Click link between ER_Example and s_m_filter_example and give the
condition $S_M_FILTER_EXAMPLE.Status=SUCCEEDED
7. Click Tasks -> Create -> Select EVENT WAIT from list. Give name EW_WAIT.
Click Create and then done.
8. Link EW_WAIT to START task.
9. Right click EW_WAIT -> EDIT -> EVENTS tab.
10. Select User Defined there. Select EVENT1 by clicking the Browse Events
button.
11. Apply -> OK.
12. Drag S_M_TOTAL_SAL_EXAMPLE and link it to EW_WAIT.
13. Workflow -> Validate
14. Repository -> Save.
Run workflow and see.
TIMER TASK
The Timer task allows us to specify the period of time to wait before the
Power Center Server runs the next task in the workflow. The Timer task has
two types of settings:
Absolute time: We specify the exact date and time or we can choose a
user-defined workflow variable to specify the exact time. The next task in
workflow will run as per the date and time specified.
Relative time: We instruct the Power Center Server to wait for a specified
period of time after the Timer task, the parent workflow, or the top-level
workflow starts.
Example: Run session s_m_filter_example 1 minute after the Timer task
starts.
Steps for creating workflow:
1. Workflow -> Create -> Give name wf_timer_task_example -> Click ok.
2. Click Tasks -> Create -> Select TIMER from list. Give name TIMER_Example.
Click Create and then done.
3. Link TIMER_Example to START task.
4. Right click TIMER_Example-> EDIT -> TIMER tab.
5. Select Relative Time Option and Give 1 min and Select From start time of
this task Option.
6. Apply -> OK.
7. Drag s_m_filter_example and link it to TIMER_Example.
8. Workflow-> Validate and Repository -> Save.
DECISION TASK
The Decision task allows us to enter a condition that determines the
execution of the workflow, similar to a link condition.
The Decision task has a pre-defined variable called
$Decision_task_name.condition that represents the result of the decision
condition.
The Power Center Server evaluates the condition in the Decision task and
sets the pre-defined condition variable to True (1) or False (0).
We can specify one decision condition per Decision task.
Example: Command Task should run only if either s_m_filter_example or
S_M_TOTAL_SAL_EXAMPLE succeeds. If any of s_m_filter_example or
S_M_TOTAL_SAL_EXAMPLE fails, then S_m_sample_mapping_EMP should run.
Steps for creating workflow:
1. Workflow -> Create -> Give name wf_decision_task_example -> Click ok.
2. Drag s_m_filter_example and S_M_TOTAL_SAL_EXAMPLE to workspace and
link both of them to START task.
3. Click Tasks -> Create -> Select DECISION from list. Give name
DECISION_Example. Click Create and then done. Link DECISION_Example to
both s_m_filter_example and S_M_TOTAL_SAL_EXAMPLE.
4. Right click DECISION_Example-> EDIT -> GENERAL tab.
5. Set Treat Input Links As to OR. Default is AND. Apply and click OK.
6. Now edit decision task again and go to PROPERTIES Tab. Open the
Expression editor by clicking the VALUE section of Decision Name attribute
and enter the following condition: $S_M_FILTER_EXAMPLE.Status =
SUCCEEDED OR $S_M_TOTAL_SAL_EXAMPLE.Status = SUCCEEDED
7. Validate the condition -> Click Apply -> OK.
8. Drag command task and S_m_sample_mapping_EMP task to workspace and
link them to DECISION_Example task.
9. Double click link between S_m_sample_mapping_EMP & DECISION_Example
& give the condition: $DECISION_Example.Condition = 0. Validate & click OK.
10. Double click link between Command task and DECISION_Example and give
the condition: $DECISION_Example.Condition = 1. Validate and click OK.
11. Workflow Validate and Repository Save.
Run workflow and see the result.
CONTROL TASK
We can use the Control task to stop, abort, or fail the top-level workflow or
the parent workflow based on an input link condition.
A parent workflow or worklet is the workflow or worklet that contains the
Control task.
We give the condition to the link connected to Control Task.
Control Option     | Description
Fail Me            | Fails the Control task.
Fail Parent        | Marks the status of the WF or worklet that contains the Control task as failed.
Stop Parent        | Stops the WF or worklet that contains the Control task.
Abort Parent       | Aborts the WF or worklet that contains the Control task.
Fail Top-Level WF  | Fails the workflow that is running.
Stop Top-Level WF  | Stops the workflow that is running.
Abort Top-Level WF | Aborts the workflow that is running.
Example: Drag any 3 sessions and if anyone fails, then Abort the top level
workflow.
Steps for creating workflow:
1. Workflow -> Create -> Give name wf_control_task_example -> Click ok.
2. Drag any 3 sessions to workspace and link all of them to START task.
3. Click Tasks -> Create -> Select CONTROL from list. Give name cntr_task.
4. Click Create and then done.
5. Link all sessions to the control task cntr_task.
6. Double click link between cntr_task and any session say s_m_filter_example
and give the condition: $S_M_FILTER_EXAMPLE.Status = SUCCEEDED.
7. Repeat above step for remaining 2 sessions also.
8. Right click cntr_task-> EDIT -> GENERAL tab. Set Treat Input Links As to
OR. Default is AND.
9. Go to PROPERTIES tab of cntr_task and select the value Fail Top-Level
Workflow for the Control Option. Click Apply and OK.
10. Workflow Validate and Repository Save.
Run workflow and see the result.
ASSIGNMENT TASK
The Assignment task allows us to assign a value to a user-defined workflow
variable.
See Workflow variable topic to add user defined variables.
To use an Assignment task in the workflow, first create and add the
Assignment task to the workflow. Then configure the Assignment task to
assign values or expressions to user-defined variables.
We cannot assign values to pre-defined workflow variables.
Steps to create Assignment Task:
1. Open any workflow where we want to use Assignment task.
2. Edit Workflow and add user defined variables.
3. Choose Tasks-Create. Select Assignment Task for the task type.
4. Enter a name for the Assignment task. Click Create. Then click done.
5. Double-click the Assignment task to open the Edit Task dialog box.
6. On the Expressions tab, click Add to add an assignment.
7. Click the Open button in the User Defined Variables field.
8. Select the variable for which you want to assign a value. Click OK.
9. Click the Edit button in the Expression field to open the Expression Editor.
10. Enter the value or expression you want to assign.
11. Repeat steps 7-10 to add more variable assignments as necessary.
12. Click OK.
Scheduler
We can schedule a workflow to run continuously, repeat at a given time or
interval, or we can manually start a workflow. The Integration Service runs a
scheduled workflow as configured.
By default, the workflow runs on demand. We can change the schedule
settings by editing the scheduler. If we change schedule settings, the
Integration Service reschedules the workflow according to the new settings.
A scheduler is a repository object that contains a set of schedule settings.
A scheduler can be non-reusable or reusable.
The Workflow Manager marks a workflow invalid if we delete the scheduler
associated with the workflow.
If we choose a different Integration Service for the workflow or restart the
Integration Service, it reschedules all workflows.
If we delete a folder, the Integration Service removes workflows from the
schedule.
The Integration Service does not run the workflow if:
The prior workflow run fails.
We remove the workflow from the schedule.
The Integration Service is running in safe mode.
Creating a Reusable Scheduler
For each folder, the Workflow Manager lets us create reusable schedulers so
we can reuse the same set of scheduling settings for workflows in the folder.
Use a reusable scheduler so we do not need to configure the same set of
scheduling settings in each workflow.
When we delete a reusable scheduler, all workflows that use the deleted
scheduler become invalid. To make the workflows valid, we must edit them
and replace the missing scheduler.
Steps:
1. Open the folder where we want to create the scheduler.
2. In the Workflow Designer, click Workflows > Schedulers.
3. Click Add to add a new scheduler.
4. In the General tab, enter a name for the scheduler.
5. Configure the scheduler settings in the Scheduler tab.
6. Click Apply and OK.
Configuring Scheduler Settings
Configure the Schedule tab of the scheduler to set run options, schedule
options, start options, and end options for the schedule.
There are 3 run options:
1. Run on Demand
2. Run Continuously
3. Run on Server initialization
1. Run on Demand:
Integration Service runs the workflow when we start the workflow manually.
2. Run Continuously:
Integration Service runs the workflow as soon as the service initializes. The
Integration Service then starts the next run of the workflow as soon as it
finishes the previous run.
3. Run on Server initialization
Integration Service runs the workflow as soon as the service is initialized. The
Integration Service then starts the next run of the workflow according to
settings in Schedule Options.
Schedule options for Run on Server initialization:
Run Once: To run the workflow just once.
Run Every: Run the workflow at regular intervals, as configured.
Customized Repeat: Integration Service runs the workflow on the dates and
times specified in the Repeat dialog box.
Start options for Run on Server initialization:
Start Date
Start Time
End options for Run on Server initialization:
End On: IS stops scheduling the workflow on the selected date.
End After: IS stops scheduling the workflow after the set number of workflow
runs.
Forever: IS schedules the workflow as long as the workflow does not fail.
Creating a Non-Reusable Scheduler
1. In the Workflow Designer, open the workflow.
2. Click Workflows > Edit.
3. In the Scheduler tab, choose Non-reusable. Select Reusable if we want to
select an existing reusable scheduler for the workflow.
Note: If we do not have a reusable scheduler in the folder, we must create
one before we choose Reusable.
4. Click the right side of the Scheduler field to edit scheduling settings for the
non-reusable scheduler.
5. If we select Reusable, choose a reusable scheduler from the Scheduler
Browser dialog box.
6. Click OK.
Points to Ponder:
To remove a workflow from its schedule, right-click the workflow in the
Navigator window and choose Unschedule Workflow.
To reschedule a workflow on its original schedule, right-click the
workflow in the Navigator window and choose Schedule Workflow.
Pushdown Optimization Overview
You can push transformation logic to the source or target database using
pushdown optimization. When you run a session configured for pushdown
optimization, the Integration Service translates the transformation logic into
SQL queries and sends the SQL queries to the database.
The source or target database executes the SQL queries to process the
transformations.
The amount of transformation logic you can push to the database
depends on the database, transformation logic, and mapping and session
configuration. The Integration Service processes all transformation logic that
it cannot push to a database.
Use the Pushdown Optimization Viewer to preview the SQL statements
and mapping logic that the Integration Service can push to the source or
target database. You can also use the Pushdown Optimization Viewer to view
the messages related to pushdown optimization.
The following figure shows a mapping containing transformation logic that
can be pushed to the source database:
This mapping contains a Filter transformation that filters out all items
except those with an ID greater than 1005. The Integration Service can push
the transformation logic to the database. It generates the following SQL
statement to process the transformation logic:
INSERT INTO ITEMS(ITEM_ID, ITEM_NAME, ITEM_DESC, n_PRICE) SELECT
ITEMS.ITEM_ID, ITEMS.ITEM_NAME, ITEMS.ITEM_DESC, CAST(ITEMS.PRICE AS
INTEGER) FROM ITEMS WHERE (ITEMS.ITEM_ID >1005)
The Integration Service generates an INSERT SELECT statement to get the
ID, NAME, and DESCRIPTION columns from the source table. It filters the data
using a WHERE clause. The Integration Service does not extract data from
the database at this time.
Pushdown Optimization Types
You can configure the following types of pushdown optimization:
Source-side pushdown optimization. The Integration Service pushes as
much transformation logic as possible to the source database.
Target-side pushdown optimization. The Integration Service pushes as
much transformation logic as possible to the target database.
Full pushdown optimization. The Integration Service attempts to push all
transformation logic to the target database. If the Integration Service cannot
push all transformation logic to the database, it performs both source-side
and target-side pushdown optimization.
Running Source-Side Pushdown Optimization Sessions
When you run a session configured for source-side pushdown optimization,
the Integration Service analyzes the mapping from the source to the target
or until it reaches a downstream transformation it cannot push to the
database.
The Integration Service generates and executes a SELECT statement based
on the transformation logic for each transformation it can push to the
database. Then, it reads the results of this SQL query and processes the
remaining transformations.
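For the ITEMS filter mapping shown earlier, the generated source-side query
would be roughly of this form (a sketch of the idea, not the exact SQL the
Integration Service emits):
SELECT ITEMS.ITEM_ID, ITEMS.ITEM_NAME, ITEMS.ITEM_DESC, ITEMS.PRICE
FROM   ITEMS
WHERE  ITEMS.ITEM_ID > 1005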
Running Target-Side Pushdown Optimization Sessions
When you run a session configured for target-side pushdown optimization,
the Integration Service analyzes the mapping from the target to the source
or until it reaches an upstream transformation it cannot push to the
database. It generates an INSERT, DELETE, or UPDATE statement based on
the transformation logic for each transformation it can push to the database.
The Integration Service processes the transformation logic up to the point
that it can push the transformation logic to the target database. Then, it
executes the generated SQL.
Running Full Pushdown Optimization Sessions
To use full pushdown optimization, the source and target databases must be
in the same relational database management system. When you run a
session configured for full pushdown optimization, the Integration Service
analyzes the mapping from the source to the target or until it reaches a
downstream transformation it cannot push to the target database. It
generates and executes SQL statements against the source or target based
on the transformation logic it can push to the database.
When you run a session with large quantities of data and full pushdown
optimization, the database server must run a long transaction. Consider the
following database performance issues when you generate a long
transaction:
A long transaction uses more database resources.
A long transaction locks the database for longer periods of time. This reduces
database concurrency and increases the likelihood of deadlock.
A long transaction increases the likelihood of an unexpected event.
To minimize database performance issues for long transactions, consider
using source-side or target-side pushdown optimization.
Integration Service Behavior with Full Optimization
When you configure a session for full optimization, the Integration Service
analyzes the mapping from the source to the target or until it reaches a
downstream transformation it cannot push to the target database. If the
Integration Service cannot push all transformation logic to the target
database, it tries to push all transformation logic to the source database. If it
cannot push all transformation logic to the source or target, the Integration
Service pushes as much transformation logic to the source database,
processes intermediate transformations that it cannot push to any database,
and then pushes the remaining transformation logic to the target database.
The Integration Service generates and executes an INSERT SELECT, DELETE,
or UPDATE statement for each database to which it pushes transformation
logic.
For example, a mapping contains the following transformations:
The Rank transformation cannot be pushed to the source or target database.
If you configure the session for full pushdown optimization, the Integration
Service pushes the Source Qualifier transformation and the Aggregator
transformation to the source, processes the Rank transformation, and pushes
the Expression transformation and target to the target database. The
Integration Service does not fail the session if it can push only part of the
transformation logic to the database.
Active and Idle Databases
During pushdown optimization, the Integration Service pushes the
transformation logic to one database, which is called the active database. A
database that does not process transformation logic is called an idle
database. For example, a mapping contains two sources that are joined by a
Joiner transformation. If the session is configured for source-side pushdown
optimization, the Integration Service pushes the Joiner transformation logic
to the source in the detail pipeline, which is the active database. The source
in the master pipeline is the idle database because it does not process
transformation logic.
The Integration Service uses the following criteria to determine which
database is active or idle:
1. When using full pushdown optimization, the target database is active and
the source database is idle.
2. In sessions that contain a Lookup transformation, the source or target
database is active, and the lookup database is idle.
3. In sessions that contain a Joiner transformation, the source in the detail
pipeline is active, and the source in the master pipeline is idle.
4. In sessions that contain a Union transformation, the source in the first
input group is active. The sources in other input groups are idle.
To push transformation logic to an active database, the database user
account of the active database must be able to read from the idle databases.
Working with Databases
You can configure pushdown optimization for the following databases:
IBM DB2
Microsoft SQL Server
Netezza
Oracle
Sybase ASE
Teradata
Databases that use ODBC drivers
When you push transformation logic to a database, the database may
produce different output than the Integration Service. In addition, the
Integration Service can usually push more transformation logic to a database
if you use a native driver, instead of an ODBC driver.
Comparing the Output of the Integration Service and Databases
The Integration Service and databases can produce different results when
processing the same transformation logic. The Integration Service sometimes
converts data to a different format when it reads data. The Integration
Service and database may also handle null values, case sensitivity, and sort
order differently.
The database and Integration Service produce different output when the
following settings and conversions are different:
Nulls treated as the highest or lowest value. The Integration Service
and a database can treat null values differently. For example, you want to
push a Sorter transformation to an Oracle database. In the session, you
configure nulls as the lowest value in the sort order. Oracle treats null values
as the highest value in the sort order.
Sort order. The Integration Service and a database can use different sort
orders. For example, you want to push the transformations in a session to a
Microsoft SQL Server database, which is configured to use a sort order that is
not case sensitive. You configure the session properties to use the binary sort
order, which is case sensitive. The results differ based on whether the
Integration Service or Microsoft SQL Server database process the
transformation logic.
Case sensitivity. The Integration Service and a database can treat case
sensitivity differently. For example, the Integration Service uses case
sensitive queries and the database does not. A Filter transformation uses the
following filter condition: IIF(col_varchar2 = 'CA', TRUE, FALSE). You need the
database to return only rows that match 'CA'. However, if you push this
transformation logic to a Microsoft SQL Server database that is not case
sensitive, it returns rows that match the values Ca, ca, cA, and CA.
Numeric values converted to character values. The Integration Service
and a database can convert the same numeric value to a character value in
different formats. The database can convert numeric values to an
unacceptable character format. For example, a table contains the number
1234567890. When the Integration Service converts the number to a
character value, it inserts the characters 1234567890. However, a

database might convert the number to 1.2E9. The two sets of characters
represent the same value. However, if you require the characters in the
format 1234567890, you can disable pushdown optimization.
Precision. The Integration Service and a database can have different
precision for particular datatypes. Transformation datatypes use a default
numeric precision that can vary from the native datatypes. For example, a
transformation Decimal datatype has a precision of 1-28. The corresponding
Teradata Decimal datatype has a precision of 1-18. The results can vary if the
database uses a different precision than the Integration Service.
Using ODBC Drivers
When you use native drivers for all databases, except Netezza, the
Integration Service generates SQL statements using native database SQL.
When you use ODBC drivers, the Integration Service usually cannot detect
the database type. As a result, it generates SQL statements using ANSI SQL.
The Integration Service can generate more functions when it generates SQL
statements using the native language than ANSI SQL.
Note: Although the Integration Service uses an ODBC driver for the Netezza
database, the Integration Service detects that the database is Netezza and
generates native database SQL when pushing the transformation logic to the
Netezza database.
In some cases, ANSI SQL is not compatible with the database syntax. The
following sections describe problems that you can encounter when you use
ODBC drivers. When possible, use native drivers to prevent these problems.

Working with Dates


The Integration Service and database can process dates differently. When
you configure the session to push date conversion to the database, you can
receive unexpected results or the session can fail.
The database can produce different output than the Integration Service when
the following date settings and conversions are different:
Date values converted to character values. The Integration Service
converts the transformation Date/Time datatype to the native datatype that
supports subsecond precision in the database. The session fails if you
configure the datetime format in the session to a format that the database
does not support. For example, when the Integration Service performs the
ROUND function on a date, it stores the date value in a character column,
using the format MM/DD/YYYY HH:MI:SS.US. When the database performs
this function, it stores the date in the default date format for the database. If
the database is Oracle, it stores the date as the default DD-MON-YY. If you
require the date to be in the format MM/DD/YYYY HH:MI:SS.US, you can
disable pushdown optimization.
Date formats for TO_CHAR and TO_DATE functions. The Integration
Service uses the date format in the TO_CHAR or TO_DATE function when the
Integration Service pushes the function to the database. The database
converts each date string to a datetime value supported by the database.
For example, the Integration Service pushes the following expression to the
database:
TO_DATE( DATE_PROMISED, 'MM/DD/YY' )
The database interprets the date string in the DATE_PROMISED port based on
the specified date format string MM/DD/YY. The database converts each date
string, such as 01/22/98, to the supported date value, such as Jan 22 1998
00:00:00.
If the Integration Service pushes a date format to an IBM DB2, a Microsoft
SQL Server, or a Sybase database that the database does not support, the
Integration Service stops pushdown optimization and processes the
transformation.
The Integration Service converts all dates before pushing transformations to
an Oracle or Teradata database. If the database does not support the date
format after the date conversion, the session fails.
HH24 date format. You cannot use the HH24 format in the date format
string for Teradata. When the Integration Service generates SQL for a
Teradata database, it uses the HH format string instead.
Blank spaces in date format strings. You cannot use blank spaces in the
date format string in Teradata. When the Integration Service generates SQL
for a Teradata database, it substitutes the space with B.
Handling subsecond precision for a Lookup transformation. If you
enable subsecond precision for a Lookup transformation, the database and
Integration Service perform the lookup comparison using the subsecond
precision, but return different results. Unlike the Integration Service, the

database does not truncate the lookup results based on subsecond precision.
For example, you configure the Lookup transformation to show subsecond
precision to the millisecond. If the lookup result is 8:20:35.123456, a
database returns 8:20:35.123456, but the Integration Service returns
8:20:35.123.
SYSDATE built-in variable. When you use the SYSDATE built-in variable,
the Integration Service returns the current date and time for the node
running the service process. However, when you push the transformation
logic to the database, the SYSDATE variable returns the current date and
time for the machine hosting the database. If the time zone of the machine
hosting the database is not the same as the time zone of the machine
running the Integration Service process, the results can vary.
I have listed below the Informatica scenarios that are frequently asked in
Informatica interviews. Working through these scenario-based questions will
help you gain confidence for interviews.
1. How to generate sequence numbers using expression transformation?
Solution:
In the expression transformation, create a variable port and increment it by
1. Then assign the variable port to an output port. In the expression
transformation, the ports are:
V_count=V_count+1
O_count=V_count
2. Design a mapping to load the first 3 rows from a flat file into a target?
Solution:
You have to assign row numbers to each record. Generate the row numbers
either using the expression transformation as mentioned above or use
sequence generator transformation.
Then pass the output to filter transformation and specify the filter condition
as O_count <=3
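If the same data were staged in a relational table instead of a flat file, the
equivalent result could be sketched in Oracle SQL as below (src is an assumed
table name; without an ORDER BY the "first" three rows are arbitrary, much like
reading a flat file).

-- Sketch only: return the first 3 rows of an assumed table src.
SELECT *
FROM   src
WHERE  ROWNUM <= 3;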
3. Design a mapping to load the last 3 rows from a flat file into a target?
Solution:
Consider the source has the following data.
col
a
b
c

d
e
Step1: You have to assign row numbers to each record. Generate the row
numbers using the expression transformation as mentioned above and call
the row number generated port O_count. Create a DUMMY output port in
the same expression transformation and assign 1 to that port, so that the
DUMMY output port always returns 1 for each row.
In the expression transformation, the ports are
V_count=V_count+1
O_count=V_count
O_dummy=1
The output of expression transformation will be
col, o_count, o_dummy
a, 1, 1
b, 2, 1
c, 3, 1
d, 4, 1
e, 5, 1
Step2: Pass the output of expression transformation to aggregator and do
not specify any group by condition. Create an output port O_total_records in
the aggregator and assign O_count port to it. The aggregator will return the
last row by default. The output of aggregator contains the DUMMY port which
has value 1 and O_total_records port which has the value of total number of
records in the source.
In the aggregator transformation, the ports are
O_dummy
O_count
O_total_records=O_count
The output of aggregator transformation will be
O_total_records, O_dummy
5, 1
Step3: Pass the output of expression transformation, aggregator
transformation to joiner transformation and join on the DUMMY port. In the
joiner transformation check the property sorted input, then only you can
connect both expression and aggregator to joiner transformation.

In the joiner transformation, the join condition will be


O_dummy (port from aggregator transformation) = O_dummy (port from
expression transformation)
The output of joiner transformation will be
col, o_count, o_total_records
a, 1, 5
b, 2, 5
c, 3, 5
d, 4, 5
e, 5, 5
Step4: Now pass the output of the joiner transformation to a filter transformation
and specify the filter condition as O_total_records (port from aggregator) - O_count (port from expression) <= 2
In the filter transformation, the filter condition will be
O_total_records - O_count <=2
The output of filter transformation will be
col, o_count, o_total_records
c, 3, 5
d, 4, 5
e, 5, 5
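For comparison, the whole last-3-rows logic can be sketched as a single Oracle
SQL query (src and col are assumed names); the analytic COUNT(*) plays the role
of the aggregator/joiner combination.

-- Sketch only: keep the rows whose distance from the last row is at most 2.
SELECT col
FROM  (SELECT col,
              ROWNUM           AS o_count,
              COUNT(*) OVER () AS o_total_records
       FROM   src)
WHERE  o_total_records - o_count <= 2;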
4. Design a mapping to load the first record from a flat file into one table A,
the last record from a flat file into table B and the remaining records into
table C?
Solution:
This is similar to the above problem; the first 3 steps are same. In the last
step instead of using the filter transformation, you have to use router
transformation. In the router transformation create two output groups.
In the first group, the condition should be O_count=1 and connect the
corresponding output group to table A. In the second group, the condition
should be O_count=O_total_records and connect the corresponding output
group to table B. The output of default group should be connected to table C.
5. Consider the following products data which contain duplicate rows.
A
B
C

C
B
D
B
Q1. Design a mapping to load all unique products in one table and the
duplicate rows in another table.
The first table should contain the following output
A
D
The second target should contain the following output
B
B
B
C
C
Solution:
Use a sorter transformation and sort the products data. Pass the output to an
expression transformation and create a dummy port O_dummy and assign 1
to that port, so that the DUMMY output port always returns 1 for each row.
The output of expression transformation will be
Product, O_dummy
A, 1
B, 1
B, 1
B, 1
C, 1
C, 1
D, 1
Pass the output of expression transformation to an aggregator
transformation. Check the group by on the product port. In the aggregator,
create an output port O_count_of_each_product and assign the expression
COUNT(product) to it.
The output of aggregator will be
Product, O_count_of_each_product
A, 1
B, 3
C, 2

D, 1
Now pass the output of expression transformation, aggregator
transformation to joiner transformation and join on the products port. In the
joiner transformation check the property sorted input, then only you can
connect both expression and aggregator to joiner transformation.
The output of joiner will be
product, O_dummy, O_count_of_each_product
A, 1, 1
B, 1, 3
B, 1, 3
B, 1, 3
C, 1, 2
C, 1, 2
D, 1, 1
Now pass the output of joiner to a router transformation, create one group
and specify the group condition as O_dummy=O_count_of_each_product.
Then connect this group to one table. Connect the output of default group to
another table.
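The same unique/duplicate split can be verified with one Oracle SQL query (a
sketch only; src is an assumed table name).

-- Sketch only: rows with a count of 1 belong in the unique table,
-- all remaining rows belong in the duplicates table.
SELECT product,
       COUNT(*) OVER (PARTITION BY product) AS o_count_of_each_product
FROM   src;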
Q2. Design a mapping to load each product once into one table and the
remaining products which are duplicated into another table.
The first table should contain the following output
A
B
C
D
The second table should contain the following output
B
B
C
Solution:
Use sorter transformation and sort the products data. Pass the output to an
expression transformation and create a variable port,V_curr_product, and
assign product port to it. Then create a V_count port and in the expression
editor write IIF(V_curr_product=V_prev_product, V_count+1,1). Create one
more variable port V_prev_product and assign the product port to it. Now create an
output port O_count port and assign V_count port to it.

In the expression transformation, the ports are


Product
V_curr_product=product
V_count=IIF(V_curr_product=V_prev_product,V_count+1,1)
V_prev_product=product
O_count=V_count
The output of expression transformation will be
Product, O_count
A, 1
B, 1
B, 2
B, 3
C, 1
C, 2
D, 1
Now Pass the output of expression transformation to a router transformation,
create one group and specify the condition as O_count=1. Then connect this
group to one table. Connect the output of default group to another table.
Informatica Scenario Based Questions - Part 2
1. Consider the following employees data as source
employee_id, salary
10, 1000
20, 2000
30, 3000
40, 5000

Q1. Design a mapping to load the cumulative sum of salaries of employees


into target table?
The target table data should look like as
employee_id, salary, cumulative_sum
10, 1000, 1000
20, 2000, 3000
30, 3000, 6000
40, 5000, 11000
Solution:

Connect the source Qualifier to expression transformation. In the expression


transformation, create a variable port V_cum_sal and in the expression editor
write V_cum_sal+salary. Create an output port O_cum_sal and assign
V_cum_sal to it.
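An equivalent cumulative sum can be written in Oracle SQL with an analytic SUM
(a sketch; employees is an assumed table name).

-- Sketch only: running total of salary ordered by employee_id.
SELECT employee_id,
       salary,
       SUM(salary) OVER (ORDER BY employee_id) AS cumulative_sum
FROM   employees;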

Q2. Design a mapping to get the previous row salary for the current row. If
no previous row exists for the current row, then the previous row salary
should be displayed as null.
The output should look like as
employee_id, salary, pre_row_salary
10, 1000, Null
20, 2000, 1000
30, 3000, 2000
40, 5000, 3000
Solution:
Connect the source Qualifier to expression transformation. In the expression
transformation, create a variable port V_count and increment it by one for
each row entering the expression transformation. Also create V_salary
variable port and assign the expression IIF(V_count=1,NULL,V_prev_salary)
to it . Then create one more variable port V_prev_salary and assign Salary to
it. Now create output port O_prev_salary and assign V_salary to it. Connect
the expression transformation to the target ports.
In the expression transformation, the ports will be
employee_id
salary
V_count=V_count+1
V_salary=IIF(V_count=1,NULL,V_prev_salary)
V_prev_salary=salary
O_prev_salary=V_salary
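The same result can be obtained in Oracle SQL with the LAG analytic function
(a sketch; employees is an assumed table name).

-- Sketch only: previous row's salary, NULL for the first row.
SELECT employee_id,
       salary,
       LAG(salary) OVER (ORDER BY employee_id) AS pre_row_salary
FROM   employees;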

Q3. Design a mapping to get the next row salary for the current row. If there
is no next row for the current row, then the next row salary should be
displayed as null.
The output should look like as

employee_id, salary, next_row_salary


10, 1000, 2000
20, 2000, 3000
30, 3000, 5000
40, 5000, Null
Solution:
Step1: Connect the source qualifier to two expression transformation. In
each expression transformation, create a variable port V_count and in the
expression editor write V_count+1. Now create an output port O_count in
each expression transformation. In the first expression transformation, assign
V_count to O_count. In the second expression transformation, assign V_count-1 to O_count.
In the first expression transformation, the ports will be
employee_id
salary
V_count=V_count+1
O_count=V_count
In the second expression transformation, the ports will be
employee_id
salary
V_count=V_count+1
O_count=V_count-1
Step2: Connect both the expression transformations to joiner transformation
and join them on the port O_count. Consider the first expression
transformation as Master and second one as detail. In the joiner specify the
join type as Detail Outer Join. In the joiner transformation check the property
sorted input, then only you can connect both expression transformations to
joiner transformation.
Step3: Pass the output of joiner transformation to a target table. From the
joiner, connect the employee_id, salary which are obtained from the first
expression transformation to the employee_id, salary ports in target table.
Then from the joiner, connect the salary which is obtained from the second
expression transformation to the next_row_salary port in the target table.
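The same result can be sketched in Oracle SQL with the LEAD analytic function
(employees is an assumed table name).

-- Sketch only: next row's salary, NULL for the last row.
SELECT employee_id,
       salary,
       LEAD(salary) OVER (ORDER BY employee_id) AS next_row_salary
FROM   employees;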

Q4. Design a mapping to find the sum of salaries of all employees and this
sum should repeat for all the rows.
The output should look like as
employee_id, salary, salary_sum
10, 1000, 11000
20, 2000, 11000
30, 3000, 11000
40, 5000, 11000
Solution:
Step1: Connect the source qualifier to the expression transformation. In the
expression transformation, create a dummy port and assign value 1 to it.
In the expression transformation, the ports will be
employee_id
salary
O_dummy=1
Step2: Pass the output of expression transformation to aggregator. Create a
new port O_sum_salary and in the expression editor write SUM(salary). Do
not specify group by on any port.
In the aggregator transformation, the ports will be
salary
O_dummy
O_sum_salary=SUM(salary)
Step3: Pass the output of expression transformation, aggregator
transformation to joiner transformation and join on the DUMMY port. In the
joiner transformation check the property sorted input, then only you can
connect both expression and aggregator to joiner transformation.
Step4: Pass the output of joiner to the target table.
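An Oracle SQL sketch of the same requirement uses an analytic SUM over the
whole result set (employees is an assumed table name).

-- Sketch only: the empty OVER () window repeats the grand total on every row.
SELECT employee_id,
       salary,
       SUM(salary) OVER () AS salary_sum
FROM   employees;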

2. Consider the following employees table as source


department_no, employee_name
20, R
10, A
10, D
20, P
10, B
10, C
20, Q
20, S

Q1. Design a mapping to load a target table with the following values from
the above source?
department_no, employee_list
10, A
10, A,B
10, A,B,C
10, A,B,C,D
20, A,B,C,D,P
20, A,B,C,D,P,Q
20, A,B,C,D,P,Q,R
20, A,B,C,D,P,Q,R,S
Solution:
Step1: Use a sorter transformation and sort the data using the sort key as
department_no and then pass the output to the expression transformation. In
the expression transformation, the ports will be
department_no
employee_name
V_employee_list =
IIF(ISNULL(V_employee_list),employee_name,V_employee_list||','||
employee_name)
O_employee_list = V_employee_list
Step2: Now connect the expression transformation to a target table.

Q2. Design a mapping to load a target table with the following values from
the above source?
department_no, employee_list
10, A
10, A,B
10, A,B,C
10, A,B,C,D
20, P
20, P,Q
20, P,Q,R
20, P,Q,R,S

Solution:
Step1: Use a sorter transformation and sort the data using the sort key as
department_no and then pass the output to the expression transformation. In
the expression transformation, the ports will be
department_no
employee_name
V_curr_deptno=department_no
V_employee_list = IIF(V_curr_deptno != V_prev_deptno, employee_name,
V_employee_list||','||employee_name)
V_prev_deptno=department_no
O_employee_list = V_employee_list
Step2: Now connect the expression transformation to a target table.

Q3. Design a mapping to load a target table with the following values from
the above source?
department_no, employee_names
10, A,B,C,D
20, P,Q,R,S
Solution:
The first step is same as the above problem. Pass the output of expression to
an aggregator transformation and specify the group by as department_no.
Now connect the aggregator transformation to a target table.
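For comparison, the same per-department list can be sketched in Oracle SQL with
LISTAGG, assuming Oracle 11gR2 or later and an assumed table name employees.

-- Sketch only: one comma-separated employee list per department.
SELECT department_no,
       LISTAGG(employee_name, ',')
         WITHIN GROUP (ORDER BY employee_name) AS employee_names
FROM   employees
GROUP  BY department_no;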
Informatica Scenario Based Questions - Part 3
1. Consider the following product types data as the source.

Product_id, product_type
10, video
10, Audio
20, Audio
30, Audio
40, Audio
50, Audio
10, Movie
20, Movie
30, Movie
40, Movie
50, Movie
60, Movie
Assume that only 3 product types are available in the source. The
source contains 12 records, and you don't know how many products are
available in each product type.

Q1. Design a mapping to select 9 products in such a way that 3 products


should be selected from video, 3 products should be selected from Audio and
the remaining 3 products should be selected from Movie.
Solution:
Step1: Use sorter transformation and sort the data using the key as
product_type.
Step2: Connect the sorter transformation to an expression transformation.
In the expression transformation, the ports will be
product_id
product_type
V_curr_prod_type=product_type
V_count = IIF(V_curr_prod_type = V_prev_prod_type,V_count+1,1)
V_prev_prod_type=product_type
O_count=V_count
Step3: Now connect the expression transformaion to a filter transformation
and specify the filter condition as O_count<=3. Pass the output of filter to a
target table.

Q2. In the above problem Q1, if the number of products in a particular
product type is less than 3, then you won't get a total of 9 records in the
target table. For example, see the video type in the source data. Now
design a mapping in such a way that even if the number of products in a
particular product type is less than 3, the shortfall is taken from the other
product types. For example, if the number of video products is 1, then the
remaining 2 records should come from audio or movie. So, the total number
of records in the target table should always be 9.
Solution:
The first two steps are same as above.
Step3: Connect the expression transformation to a sorter transformation
and sort the data using the key as O_count. The ports in the sorter transformation
will be
product_id
product_type
O_count (sort key)
Step4: Discard the O_count port and connect the sorter transformation to an
expression transformation. The ports in expression transformation will be
product_id
product_type
V_count=V_count+1
O_prod_count=V_count
Step5: Connect the expression to a filter transformation and specify the
filter condition as O_prod_count<=9. Connect the filter transformation to a
target table.

2. Design a mapping to convert column data into row data without using the
normalizer transformation.
The source data looks like
col1, col2, col3
a, b, c
d, e, f

The target table data should look like


Col
a
b
c
d
e
f
Solution:
Create three expression transformations with one port each. Connect col1
from Source Qualifier to port in first expression transformation. Connect col2
from Source Qualifier to port in second expression transformation. Connect
col3 from source qualifier to port in third expression transformation. Create a
union transformation with three input groups and each input group should
have one port. Now connect the expression transformations to the input
groups and connect the union transformation to the target table.
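The same column-to-row conversion can be sketched in Oracle SQL with UNION ALL
(src is an assumed table name), which mirrors the three-group Union
transformation used above.

-- Sketch only: stack the three columns into a single column.
SELECT col1 AS col FROM src
UNION ALL
SELECT col2 FROM src
UNION ALL
SELECT col3 FROM src;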

3. Design a mapping to convert row data into column data.


The source data looks like
id, value
10, a
10, b
10, c
20, d
20, e
20, f
The target table data should look like
id, col1, col2, col3
10, a, b, c
20, d, e, f
Solution:
Step1: Use sorter transformation and sort the data using id port as the key.
Then connect the sorter transformation to the expression transformation.

Step2: In the expression transformation, create the ports and assign the
expressions as mentioned below.
id
value
V_curr_id=id
V_count= IIF(v_curr_id=V_prev_id,V_count+1,1)
V_prev_id=id
O_col1= IIF(V_count=1,value,NULL)
O_col2= IIF(V_count=2,value,NULL)
O_col3= IIF(V_count=3,value,NULL)
Step3: Connect the expression transformation to aggregator transformation.
In the aggregator transformation, create the ports and assign the
expressions as mentioned below.
id (specify group by on this port)
O_col1
O_col2
O_col3
col1=MAX(O_col1)
col2=MAX(O_col2)
col3=MAX(O_col3)
Step4: Now connect the ports id, col1, col2, col3 from the aggregator
transformation to the target table.
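For comparison, the same row-to-column pivot can be sketched in Oracle SQL (src
is an assumed table name); ROW_NUMBER plays the role of V_count and the
conditional MAX plays the role of the aggregator.

-- Sketch only: pivot up to three values per id into col1, col2, col3.
SELECT id,
       MAX(CASE WHEN rn = 1 THEN val END) AS col1,
       MAX(CASE WHEN rn = 2 THEN val END) AS col2,
       MAX(CASE WHEN rn = 3 THEN val END) AS col3
FROM  (SELECT id,
              value AS val,
              ROW_NUMBER() OVER (PARTITION BY id ORDER BY value) AS rn
       FROM   src)
GROUP  BY id;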
Informatica Scenario Based Questions - Part 4
Take a look at the following tree structure (reproduced here as indented text).
From the tree structure, you can easily derive the parent-child relationship
between the elements. For example, B is parent of D and E.

A
  B
    D
      H
      I
    E
  C
    F
    G

The above tree structure data is represented in a table as shown below.


c1, c2, c3, c4
A, B, D, H
A, B, D, I
A, B, E, NULL
A, C, F, NULL
A, C, G, NULL
Here in this table, column C1 is parent of column C2, column C2 is parent of
column C3, column C3 is parent of column C4.
Q1. Design a mapping to load the target table with the below data. Here you
need to generate sequence numbers for each element and then you have to
get the parent id. As the element "A" is at root, it does not have any parent
and its parent_id is NULL.
id, element, parent_id
1, A, NULL
2, B, 1
3, C, 1
4, D, 2
5, E, 2
6, F, 3
7, G, 3
8, H, 4
9, I, 4
The solution to this problem can also be written as an Oracle SQL query.


Q2. This is an extension to problem Q1. Let's say column C2 has null for all
the rows; then C1 becomes the parent of C3, and C3 is the parent of C4. Let's
say both columns C2 and C3 have null for all the rows; then C1 becomes the
parent of C4. Design a mapping to accommodate these types of null
conditions.
Q1. The source data contains only column 'id'. It will have sequence
numbers from 1 to 1000. The source data looks like as
Id
1
2
3
4
5
6
7
8
....
1000
Create a workflow to load only the Fibonacci numbers in the target table. The
target table data should look like as
Id
1
2
3
5
8
13
.....
In the Fibonacci series each subsequent number is the sum of the previous two
numbers. Here assume that the first two numbers of the Fibonacci series are
1 and 2.
Solution:
STEP1: Drag the source to the mapping designer and then in the Source
Qualifier Transformation properties, set the number of sorted ports to one.

This will sort the source data in ascending order, so that we get the
numbers in sequence as 1, 2, 3, ..., 1000.
STEP2: Connect the Source Qualifier Transformation to the Expression
Transformation. In the Expression Transformation, create three variable ports
and one output port. Assign the expressions to the ports as shown below.
Ports in Expression Transformation:
id
v_sum = v_prev_val1 + v_prev_val2
v_prev_val1 = IIF(id=1 or id=2,1, IIF(v_sum = id, v_prev_val2, v_prev_val1) )
v_prev_val2 = IIF(id=1 or id =2, 2, IIF(v_sum=id, v_sum, v_prev_val2) )
o_flag = IIF(id=1 or id=2,1, IIF( v_sum=id,1,0) )
STEP3: Now connect the Expression Transformation to the Filter
Transformation and specify the Filter Condition as o_flag=1
STEP4: Connect the Filter Transformation to the Target Table.
Q2. The source table contains two columns "id" and "val". The source data
looks like as below
id  val
1   a,b,c
2   pq,m,n
3   asz,ro,liqt

Here the "val" column contains comma delimited data and has three fields in
that column.
Create a workflow to split the fields in val column to separate rows. The
output should look like as below.
id  val
1   a
1   b
1   c
2   pq
2   m
2   n
3   asz
3   ro
3   liqt

Solution:
STEP1: Connect three Source Qualifier transformations to the Source
Definition
STEP2: Now connect all the three Source Qualifier transformations to
the Union Transformation. Then connect the Union Transformation to
the Sorter Transformation. In the sorter transformation sort the data
based on Id port in ascending order.
STEP3: Pass the output of Sorter Transformation to the Expression
Transformation. The ports in Expression Transformation are:
id (input/output port)
val (input port)
v_current_id (variable port) = id
v_count (variable port) = IIF(v_current_id != v_previous_id, 1, v_count+1)
v_previous_id (variable port) = id
o_val (output port) = DECODE(v_count, 1,
SUBSTR(val, 1, INSTR(val,',',1,1)-1 ),
2,
SUBSTR(val, INSTR(val,',',1,1)+1, INSTR(val,',',1,2)-INSTR(val,',',1,1)-1),
3,
SUBSTR(val, INSTR(val,',',1,2)+1),
NULL
)
STEP4: Now pass the output of Expression Transformation to the
Target definition. Connect id, o_val ports of Expression Transformation
to the id, val ports of Target Definition.
This problem can also be solved in Oracle SQL; such a query can provide a
dynamic solution where the "val" column has a varying number of fields in
each row.
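A sketch of such a query for the fixed three-field case shown above is given
below (src is an assumed table name; the dynamic version generalizes the upper
bound on the number of fields).

-- Sketch only: split a comma-delimited val column into separate rows.
SELECT id,
       REGEXP_SUBSTR(val, '[^,]+', 1, parts.n) AS val
FROM   src,
       (SELECT LEVEL AS n FROM dual CONNECT BY LEVEL <= 3) parts
WHERE  REGEXP_SUBSTR(val, '[^,]+', 1, parts.n) IS NOT NULL
ORDER  BY id, parts.n;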

Generate rows based on a column value - Informatica


Q) How to generate or load values in to the target table based on a column
value using informatica etl tool.

I have the products table as the source and the data of the products table is
shown below.

Table Name: Products


Product   Quantity
-------   --------
Samsung   NULL
Iphone    3
LG        0
Nokia     4

Now I want to duplicate or repeat each product in the source table as many
times as the value in the quantity column. The output is

Product   Quantity
-------   --------
Iphone    3
Iphone    3
Iphone    3
Nokia     4
Nokia     4
Nokia     4
Nokia     4

The Samsung and LG products should not be loaded, as their quantities are
NULL and 0 respectively.
Now create informatica workflow to load the data in to the target table?
Solution:
Follow the below steps

Create a new mapping in the mapping designer

Drag the source definition in to the mapping

Create the java transformation in active mode

Drag the ports of source qualifier transformation in to the java


transformation.
Now edit the java transformation by double clicking on the title bar of
the java transformation and go to the "Java Code" tab.
Enter the below java code in the "Java Code" tab.

// Code entered in the "Java Code" tab: for each input row, generate the row
// as many times as the value in the quantity column. Rows with a NULL
// quantity are skipped, and a quantity of 0 generates no rows.
if (!isNull("quantity"))
{
    double cnt = quantity;         // number of copies to generate
    for (int i = 1; i <= cnt; i++)
    {
        product = product;         // pass the input values through
        quantity = quantity;       // to the output ports unchanged
        generateRow();             // emit one output row
    }
}

Now compile the java code. The compile button is shown in red circle in
the image.
Connect the ports of the java transformation to the target.
Save the mapping, create a workflow and run the workflow.
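The same row generation can also be checked on the database side with an Oracle
SQL sketch that joins the products table to a row generator (the fixed bound of
100 on quantity is an assumption). Rows with a NULL or 0 quantity find no match
in the join, so Samsung and LG drop out automatically.

-- Sketch only: repeat each product as many times as its quantity.
SELECT p.product,
       p.quantity
FROM   products p
       JOIN (SELECT LEVEL AS n
             FROM   dual
             CONNECT BY LEVEL <= 100) t   -- assumed upper bound on quantity
         ON t.n <= p.quantity
ORDER  BY p.product, t.n;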
Flat file header row, footer row and detail rows to multiple tables
Assume that we have a flat file with header row, footer row and detail rows.
Now Lets see how to load header row into one table, footer row into other
table and detail rows into another table just by using the transformations
only.
First pass the data from source qualifier to an expression transformation. In
the expression transformation assign unique number to each row (assume
exp_count port). After that pass the data from expression to aggregator. In
the aggregator transformation don't check any group by port. So that the
aggregator will provide last row as the default output (assume agg_count
port).
Now pass the data from expression and aggregator to joiner transformation.
In the joiner select the ports from aggregator as master and the ports from
expression as details. Give the join condition on the count ports and select
the join type as master outer join. Pass the joiner output to a router
transformation and create two groups in the router. For the first group give

the condtion as exp_count=1, which gives header row. For the second group
give the condition as exp_count=agg_count, which gives the footer row. The
default group will give the detail rows.
Reverse the Contents of Flat File Informatica
Q1) I have a flat file, want to reverse the contents of the flat file which
means the first record should come as last record and last record should
come as first record and load into the target file.
As an example consider the source flat file data as

Informatica Enterprise Solution


Informatica Power center
Informatica Power exchange
Informatica Data quality

The target flat file data should look as

Informatica Data quality


Informatica Power exchange
Informatica Power center
Informatica Enterprise Solution

Solution:
Follow the below steps for creating the mapping logic

Create a new mapping.

Drag the flat file source into the mapping.

Create an expression transformation and drag the ports of source


qualifier transformation into the expression transformation.

Create the below additional ports in the expression transformation and


assign the corresponding expressions

Variable port: v_count = v_count+1


Output port o_count = v_count

Now create a sorter transformation and drag the ports of expression


transformation into it.
In the sorter transformation specify the sort key as o_count and sort
order as DESCENDING.
Drag the target definition into the mapping and connect the ports of
sorter transformation to the target.
Q2) Load the header record of the flat file into first target, footer record into
second target and the remaining records into the third target.
The solution to this problem I have already posted by using aggregator and
joiner. Now we will see how to implement this by reversing the contents of
the file.
Solution:

Connect the source qualifier transformation to the expression


transformation. In the expression transformation create the additional ports
as mentioned above.
Connect the expression transformation to a router. In the router
transformation create an output group and specify the group condition as
o_count=1. Connect this output group to a target and the default group to
sorter transformation.
Sort the data in descending order on o_count port.
Connect the output of the sorter transformation to an expression
transformation (don't connect the o_count port).
Again in the expression transformation create the same additional
ports mentioned above.
Connect this expression transformation to router and create an output
group. In the output group specify the condition as o_count=1 and connect
this group to the second target. Connect the default group to the third target.
Dynamic Target Flat File Name Generation in Informatica

Informatica 8.x or later versions provides a feature for generating the target
files dynamically. This feature allows you to
Create a new file for every session run
create a new file for each transaction.
Informatica provides a special port, "FileName", in the Target file definition.
You have to add this port explicitly. See the below diagram for adding the
"FileName" port.

Go to the Target Designer or Warehouse builder and edit the file definition.
You have to click on the button indicated in red color circle to add the special
port.
Now we will see some informatica mapping examples for creating the target
file name dynamically and load the data.
1. Generate a new file for every session run.
Whenever the session runs you need to create a new file dynamically and
load the source data into that file. To do this just follow the below steps:
STEP1: Connect the source qualifier to an expression transformation. In the
expression transformation create an output port (call it as File_Name) and
assign the expression as 'EMP_'||to_char(sessstarttime,
'YYYYMMDDHH24MISS')||'.dat'
STEP2: Now connect the expression transformation to the target and
connect the File_Name port of the expression transformation to the FileName
port of the target file definition.
STEP3: Create a workflow and run the workflow.
Here I have used sessstarttime, as it is constant throughout the session run.
If you have used sysdate, a new file will be created whenever a new
transaction occurs in the session run.

The target file names created would look like EMP_20120101125040.dat.


2. Create a new file for every session run. The file name should contain suffix
as numbers (EMP_n.dat)
In the above mapping scenario, the target flat file name contains the suffix
as 'timestamp.dat'. Here we have to create the suffix as a number. So, the
file names should looks as EMP_1.dat, EMP_2.dat and so on. Follow the below
steps:
STEP1: Go to the mapping parameters and variables -> Create a new
variable, $$COUNT_VAR, and its data type should be Integer.
STEP2: Connect the source qualifier to the expression transformation. In the
expression transformation create the following new ports and assign the
expressions.

v_count (variable port) = v_count+1


v_file_count (variable port) = IIF(v_count = 1, SETVARIABLE($$COUNT_VAR, $$COUNT_VAR+1), $$COUNT_VAR)
o_file_name (output port) = 'EMP_'||v_file_count||'.dat'

STEP3: Now connect the expression transformation to the target and


connect the o_file_name port of expression transformation to the FileName
port of the target.
3. Create a new file once a day.
You can create a new file only once in a day and can run the session multiple
times in the day to load the data. You can either overwrite the file or append
the new data.
This is similar to the first problem. Just change the expression in expression
transformation to 'EMP_'||to_char(sessstarttime, 'YYYYMMDD')||'.dat'. To avoid
overwriting the file, use Append If Exists option in the session properties.
4. Create a flat file based on the values in a port.
You can create a new file for each distinct values in a port. As an example

consider the employees table as the source. I want to create a file for each
department id and load the appropriate data into the files.
STEP1: Sort the data on department_id. You can either use the source
qualifier or sorter transformation to sort the data.
STEP2: Connect to the expression transformation. In the expression
transformation create the below ports and assign expressions.

v_curr_dept_id (variable port) = dept_id


v_flag (variable port) = IIF(v_curr_dept_id=v_prev_dept_id,0,1)
v_prev_dept_id (variable port) = dept_id
o_flag (output port) = v_flag
o_file_name (output port) = dept_id||'.dat'

STEP3: Now connect the expression transformation to the transaction


control transformation and specify the transaction control condition as

IIF(o_flag = 1, TC_COMMIT_BEFORE, TC_CONTINUE_TRANSACTION)


Informatica Real Time Scenarios - Solutions
This is continuation to my previous post on Informatica Real Time Complex
scenarios which contains around 50 problems. Here i am providing few more
real time Informatica scenarios with answers.
Informatica Real Time Scenarios:
Q1) Alternate Target Loading
My source is a flat file which contains N number of records. I want to load the
source data into two targets such that the first five records are loaded into
the first target, the next five records into the second target table, then the
next five source records into the first target table again, and so on. How do
you implement an Informatica mapping for this?

Solution:

Connect the source qualifier transformation to the expression


transformation. In the expression transformation, create the below additional
ports:

v_cnt (variable port) = v_cnt+1


o_cnt (output port) = v_cnt

Connect the expression transformation to the router transformation.


Create two output groups in the router transformation and specify the
following filter conditions:

--Filter condition for first output group


DECODE(substr(o_cnt,-1,1),1,TRUE,2,TRUE,3,TRUE,4,TRUE,5,TRUE,FALSE)
--Filter condition for second output group
DECODE(substr(o_cnt,-1,1),6,TRUE,7,TRUE,8,TRUE,9,TRUE,0,TRUE,FALSE)

Connect the router transformation output groups to the appropriate


targets.
Q2) Load source data in multiple session run.
I have a flat file as a source which contains N number of records. My
requirement is to load half of the source data into the target table in the first
session run and the remaining half of the records in the second session run.
Create Informatica mapping to implement this logic? Assume that the source
data does not change between session runs.
Solution:

Create a mapping to find out the number of records in the source and
write the count to a parameter file. Let's call this parameter $$SOURCE_COUNT.
Create another mapping. Go to the mapping parameters and variables,
create a mapping variable ($$VAR_SESSION_RUNS) with integer data type.
Connect the source qualifier transformation to the expression
transformation. In the expression transformation, create the below additional
ports.

v_Count (variable port) = v_Count+1


O_Run_flag (output port) = IIF($$VAR_SESSION_RUNS=0,
SETVARIABLE($$VAR_SESSION_RUNS,1),
IIF( !ISNULL($$VAR_SESSION_RUNS)
and v_Count=1,
2,
$$VAR_SESSION_RUNS)
)
O_count (output port) = V_Count

Connect the expression transformation to the filter transformation and
specify the following filter condition, so that the first half of the rows is
loaded on the first run and the remaining rows on the second run:
IIF(O_Run_flag = 1, O_count <= $$SOURCE_COUNT/2, O_count > $$SOURCE_COUNT/2)
Then connect the filter transformation to the target.
Create/Design/Implement SCD Type 1 Mapping in Informatica
Q) How to create or implement or design a slowly changing dimension (SCD)
Type 1 using the informatica ETL tool.
The SCD Type 1 method is used when there is no need to store historical data
in the Dimension table. The SCD type 1 method overwrites the old data with
the new data in the dimension table.

The process involved in the implementation of SCD Type 1 in informatica is


Identifying the new record and inserting it in to the dimension table.

Identifying the changed record and updating the dimension table.


We see the implementation of SCD type 1 by using the customer dimension
table as an example. The source table looks as

CREATE TABLE Customers (
Customer_Id   Number,
Customer_Name Varchar2(30),
Location      Varchar2(30)
);

Now I have to load the data of the source into the customer dimension table
using SCD Type 1. The Dimension table structure is shown below.

CREATE TABLE Customers_Dim (
Cust_Key      Number,
Customer_Id   Number,
Customer_Name Varchar2(30),
Location      Varchar2(30)
);
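For reference, the same Type 1 load can be sketched as a single Oracle MERGE
statement. This is only an illustration of the logic; the sequence
Customers_Dim_Seq used for Cust_Key is an assumed name.

-- Sketch only: update matched customers, insert new ones.
MERGE INTO Customers_Dim d
USING Customers s
ON (d.Customer_Id = s.Customer_Id)
WHEN MATCHED THEN
  UPDATE SET d.Customer_Name = s.Customer_Name,
             d.Location      = s.Location
WHEN NOT MATCHED THEN
  INSERT (Cust_Key, Customer_Id, Customer_Name, Location)
  VALUES (Customers_Dim_Seq.NEXTVAL, s.Customer_Id, s.Customer_Name, s.Location);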

Steps to Create SCD Type 1 Mapping

Follow the below steps to create SCD Type 1 mapping in informatica


Create the source and dimension tables in the database.
Open the mapping designer tool, source analyzer and either create or
import the source definition.

Go to the Warehouse designer or Target designer and import the target


definition.
Go to the mapping designer tab and create new mapping.

Drag the source into the mapping.

Go to the toolbar, Transformation and then Create.

Select the lookup Transformation, enter a name and click on create.


You will get a window as shown in the below image.

Select the customer dimension table and click on OK.

Edit the lkp transformation, go to the properties tab, and add a new
port In_Customer_Id. This new port needs to be connected to the
Customer_Id port of source qualifier transformation.

Go to the condition tab of lkp transformation and enter the lookup


condition as Customer_Id = IN_Customer_Id. Then click on OK.

Connect the customer_id port of source qualifier transformation to the


IN_Customer_Id port of lkp transformation.
Create the expression transformation with input ports as Cust_Key,
Name, Location, Src_Name, Src_Location and output ports as New_Flag,
Changed_Flag
For the output ports of expression transformation enter the below
expressions and click on ok

New_Flag = IIF(ISNULL(Cust_Key),1,0)
Changed_Flag = IIF(NOT ISNULL(Cust_Key)
AND (Name != Src_Name
OR Location != Src_Location),
1, 0 )

Now connect the ports of lkp transformation (Cust_Key, Name,


Location) to the expression transformation ports (Cust_Key, Name, Location)
and the ports of the source qualifier transformation (Name, Location) to the
expression transformation ports (Src_Name, Src_Location) respectively.
The mapping diagram so far created is shown in the below image.

Create a filter transformation and drag the ports of source qualifier


transformation into it. Also drag the New_Flag port from the expression
transformation into it.
Edit the filter transformation, go to the properties tab and enter the
Filter Condition as New_Flag=1. Then click on ok.
Now create an update strategy transformation and connect all the
ports of the filter transformation (except the New_Flag port) to the update
strategy. Go to the properties tab of update strategy and enter the update
strategy expression as DD_INSERT
Now drag the target definition into the mapping and connect the
appropriate ports from update strategy to the target definition.
Create a sequence generator transformation and connect the NEXTVAL
port to the target surrogate key (cust_key) port.
The part of the mapping diagram for inserting a new row is shown
below:

Now create another filter transformation and drag the ports from lkp
transformation (Cust_Key), source qualifier transformation (Name, Location),
expression transformation (changed_flag) ports into the filter transformation.
Edit the filter transformation, go to the properties tab and enter the
Filter Condition as Changed_Flag=1. Then click on ok.
Now create an update strategy transformation and connect the ports of
the filter transformation (Cust_Key, Name, and Location) to the update
strategy. Go to the properties tab of update strategy and enter the update
strategy expression as DD_Update
Now drag the target definition into the mapping and connect the
appropriate ports from update strategy to the target definition.

The complete mapping diagram is shown in the below image.

Design/Implement/Create SCD Type 2 Flag Mapping in Informatica


Q) How to create or implement slowly changing dimension (SCD) Type 2
Flagging mapping in informatica?
SCD type 2 will store the entire history in the dimension table. Know more
about SCDs at Slowly Changing Dimensions Concepts.
We will see how to implement the SCD Type 2 Flag in informatica. As an
example consider the customer dimension. The source and target table
structures are shown below:

--Source Table
Create Table Customers
(
Customer_Id Number Primary Key,
Location    Varchar2(30)
);

--Target Dimension Table
Create Table Customers_Dim
(
Cust_Key    Number Primary Key,
Customer_Id Number,
Location    Varchar2(30),
Flag        Number
);

The basic steps involved in creating a SCD Type 2 Flagging mapping are
Identifying the new records and inserting into the dimension table with
flag column value as one.
Identifying the changed record and inserting into the dimension table
with flag value as one.
Identify the changed record and update the existing record in
dimension table with flag value as zero.
We will divide the steps to implement the SCD type 2 flagging mapping into
four parts.
SCD Type 2 Flag implementation - Part 1
Here we will see the basic set up and mapping flow require for SCD type 2
Flagging. The steps involved are:

Create the source and dimension tables in the database.


Open the mapping designer tool, source analyzer and either create or
import the source definition.
Go to the Warehouse designer or Target designer and import the target
definition.
Go to the mapping designer tab and create new mapping.

Drag the source into the mapping.

Go to the toolbar, Transformation and then Create.

Select the lookup Transformation, enter a name and click on create.


You will get a window as shown in the below image.

Select the customer dimension table and click on OK.

Edit the lookup transformation, go to the ports tab and remove


unnecessary ports. Just keep only Cust_key, customer_id and location ports
in the lookup transformation. Create a new port (IN_Customer_Id) in the
lookup transformation. This new port needs to be connected to the
customer_id port of the source qualifier transformation.

Go to the conditions tab of the lookup transformation and enter the


condition as Customer_Id = IN_Customer_Id
Go to the properties tab of the LKP transformation and enter the below
query in Lookup SQL Override. Alternatively you can generate the SQL query
by connecting the database in the Lookup SQL Override expression editor
and then add the WHERE clause.

SELECT Customers_Dim.Cust_Key    AS Cust_Key,
       Customers_Dim.Location    AS Location,
       Customers_Dim.Customer_Id AS Customer_Id
FROM   Customers_Dim
WHERE  Customers_Dim.Flag = 1

Click on Ok in the lookup transformation. Connect the customer_id port


of source qualifier transformation to the In_Customer_Id port of the LKP
transformation.
Create an expression transformation with input/output ports as
Cust_Key, LKP_Location, Src_Location and output ports as New_Flag,
Changed_Flag. Enter the below expressions for output ports.

New_Flag = IIF(ISNULL(Cust_Key), 1,0)


Changed_Flag = IIF( NOT ISNULL(Cust_Key) AND
LKP_Location != SRC_Location, 1, 0)

The part of the mapping flow is shown below.

SCD Type 2 Flag implementation - Part 2


In this part, we will identify the new records and insert them into the target
with flag value as 1. The steps involved are:

Now create a filter transformation to identify and insert new record in


to the dimension table. Drag the ports of expression transformation

(New_Flag) and source qualifier transformation (Customer_Id, Location) into


the filter transformation.
Go the properties tab of filter transformation and enter the filter
condition as New_Flag=1
Now create a update strategy transformation and connect the ports of
filter transformation (Customer_Id, Location). Go to the properties tab and
enter the update strategy expression as DD_INSERT.
Now drag the target definition into the mapping and connect the
appropriate ports of update strategy transformation to the target definition.
Create a sequence generator and an expression transformation. Call
this expression transformation as "Expr_Flag".
Drag and connect the NextVal port of sequence generator to the
Expression transformation. In the expression transformation create a new
output port (Flag) and assign value 1 to it.
Now connect the ports of expression transformation (Nextval, Flag) to
the Target definition ports (Cust_Key, Flag). The part of the mapping flow is
shown in the below image.

SCD Type 2 Flag implementation - Part 3


In this part, we will identify the changed records and insert them into the
target with flag value as 1. The steps involved are:

Create a filter transformation. Call this filter transformation as


FIL_Changed. This is used to find the changed records. Now drag the ports
from expression transformation (changed_flag), source qualifier
transformation (customer_id, location), LKP transformation (Cust_Key) into
the filter transformation.
Go to the filter transformation properties and enter the filter condition
as changed_flag =1.
Now create an update strategy transformation and drag the ports of
Filter transformation (customer_id, location) into the update strategy
transformation. Go to the properties tab and enter the update strategy
expression as DD_INSERT.

Now drag the target definition into the mapping and connect the
appropriate ports of update strategy transformation to the target definition.
Now connect the Next_Val, Flag ports of expression transformation
(Expr_Flag created in part 2) to the cust_key, Flag ports of the target
definition respectively. The part of the mapping diagram is shown below.

SCD Type 2 Flag implementation - Part 4


In this part, we will update the changed records in the dimension table with
flag value as 0.

Create an expression transformation and drag the Cust_Key port of


filter transformation (FIL_Changed created in part 3) into the expression
transformation.
Go to the ports tab of expression transformation and create a new
output port (Flag). Assign a value "0" to this Flag port.
Now create an update strategy transformation and drag the ports of
the expression transformation into it. Go to the properties tab and enter the
update strategy expression as DD_UPDATE.
Drag the target definition into the mapping and connect the
appropriate ports of update strategy to it. The complete mapping image is
shown below.

Design/Implement/Create SCD Type 2 Effective Date Mapping in


Informatica
Q) How to create or implement slowly changing dimension (SCD) Type 2
Effective Date mapping in informatica?

SCD type 2 will store the entire history in the dimension table. In SCD type 2
effective date, the dimension table will have Start_Date (Begin_Date) and
End_Date as the fields. If the End_Date is Null, then it indicates the current
row. Know more about SCDs at Slowly Changing Dimensions Concepts.
We will see how to implement the SCD Type 2 Effective Date in informatica.
As an example consider the customer dimension. The source and target table
structures are shown below:

--Source Table
Create Table Customers
(
Customer_Id Number Primary Key,
Location    Varchar2(30)
);

--Target Dimension Table
Create Table Customers_Dim
(
Cust_Key    Number Primary Key,
Customer_Id Number,
Location    Varchar2(30),
Begin_Date  Date,
End_Date    Date
);

The basic steps involved in creating a SCD Type 2 Effective Date mapping are
Identifying the new records and inserting into the dimension table with
Begin_Date as the Current date (SYSDATE) and End_Date as NULL.
Identifying the changed record and inserting into the dimension table
with Begin_Date as the Current date (SYSDATE) and End_Date as NULL.
Identify the changed record and update the existing record in the
dimension table with End_Date as the current date. A plain Oracle SQL sketch
of these three steps is shown below.
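The sketch below illustrates the same logic outside Informatica. It is an
illustration only; the sequence Customers_Dim_Seq is an assumed name and NULL
locations are ignored.

-- Sketch only. Step 1: close the current row for customers whose location changed.
UPDATE Customers_Dim d
SET    End_Date = SYSDATE
WHERE  d.End_Date IS NULL
AND    EXISTS (SELECT 1
               FROM   Customers s
               WHERE  s.Customer_Id = d.Customer_Id
               AND    s.Location   <> d.Location);

-- Step 2: insert a new current row for new and changed customers.
INSERT INTO Customers_Dim (Cust_Key, Customer_Id, Location, Begin_Date, End_Date)
SELECT Customers_Dim_Seq.NEXTVAL, s.Customer_Id, s.Location, SYSDATE, NULL
FROM   Customers s
WHERE  NOT EXISTS (SELECT 1
                   FROM   Customers_Dim d
                   WHERE  d.Customer_Id = s.Customer_Id
                   AND    d.End_Date IS NULL);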
We will divide the steps to implement the SCD type 2 Effective Date mapping
into four parts.
SCD Type 2 Effective Date implementation - Part 1
Here we will see the basic set up and mapping flow require for SCD type 2
Effective Date. The steps involved are:

Create the source and dimension tables in the database.


Open the mapping designer tool, source analyzer and either create or
import the source definition.
Go to the Warehouse designer or Target designer and import the target
definition.
Go to the mapping designer tab and create new mapping.

Drag the source into the mapping.

Go to the toolbar, Transformation and then Create.

Select the lookup Transformation, enter a name and click on create.


You will get a window as shown in the below image.

Select the customer dimension table and click on OK.

Edit the lookup transformation, go to the ports tab and remove


unnecessary ports. Just keep only Cust_key, customer_id and location ports
in the lookup transformation. Create a new port (IN_Customer_Id) in the
lookup transformation. This new port needs to be connected to the
customer_id port of the source qualifier transformation.

Go to the conditions tab of the lookup transformation and enter the


condition as Customer_Id = IN_Customer_Id
Go to the properties tab of the LKP transformation and enter the below
query in Lookup SQL Override. Alternatively you can generate the SQL query
by connecting the database in the Lookup SQL Override expression editor
and then add the WHERE clause.

SELECT Customers_Dim.Cust_Key    AS Cust_Key,
       Customers_Dim.Location    AS Location,
       Customers_Dim.Customer_Id AS Customer_Id
FROM   Customers_Dim
WHERE  Customers_Dim.End_Date IS NULL
WHERE Customers_Dim.End_Date IS NULL

Click on Ok in the lookup transformation. Connect the customer_id port


of source qualifier transformation to the In_Customer_Id port of the LKP
transformation.
Create an expression transformation with input/output ports as
Cust_Key, LKP_Location, Src_Location and output ports as New_Flag,
Changed_Flag. Enter the below expressions for output ports.

New_Flag = IIF(ISNULL(Cust_Key), 1,0)


Changed_Flag = IIF( NOT ISNULL(Cust_Key) AND
LKP_Location != SRC_Location, 1, 0)

The part of the mapping flow is shown below.

SCD Type 2 Effective Date implementation - Part 2

In this part, we will identify the new records and insert them into the target
with Begin Date as the current date. The steps involved are:
Now create a filter transformation to identify and insert new record in
to the dimension table. Drag the ports of expression transformation
(New_Flag) and source qualifier transformation (Customer_Id, Location) into
the filter transformation.
Go the properties tab of filter transformation and enter the filter
condition as New_Flag=1
Now create a update strategy transformation and connect the ports of
filter transformation (Customer_Id, Location). Go to the properties tab and
enter the update strategy expression as DD_INSERT.
Now drag the target definition into the mapping and connect the
appropriate ports of update strategy transformation to the target definition.

Create a sequence generator and an expression transformation. Call


this expression transformation as "Expr_Date".
Drag and connect the NextVal port of sequence generator to the
Expression transformation. In the expression transformation create a new
output port (Begin_Date with date/time data type) and assign value SYSDATE
to it.
Now connect the ports of expression transformation (Nextval,
Begin_Date) to the Target definition ports (Cust_Key, Begin_Date). The part of
the mapping flow is shown in the below image.

SCD Type 2 Effective Date implementation - Part 3

In this part, we will identify the changed records and insert them into the
target with Begin Date as the current date. The steps involved are:
Create a filter transformation. Call this filter transformation as
FIL_Changed. This is used to find the changed records. Now drag the ports
from expression transformation (changed_flag), source qualifier
transformation (customer_id, location), LKP transformation (Cust_Key) into
the filter transformation.
Go to the filter transformation properties and enter the filter condition as Changed_Flag = 1.
Now create an update strategy transformation and drag the ports of
Filter transformation (customer_id, location) into the update strategy
transformation. Go to the properties tab and enter the update strategy
expression as DD_INSERT.
Now drag the target definition into the mapping and connect the
appropriate ports of update strategy transformation to the target definition.
Now connect the Next_Val, Begin_Date ports of expression
transformation (Expr_Date created in part 2) to the cust_key, Begin_Date
ports of the target definition respectively. The part of the mapping diagram is
shown below.

SCD Type 2 Effective Date implementation - Part 4

In this part, we will update the changed records in the dimension table with
End Date as current date.
Create an expression transformation and drag the Cust_Key port of
filter transformation (FIL_Changed created in part 3) into the expression
transformation.
Go to the ports tab of expression transformation and create a new
output port (End_Date with date/time data type). Assign a value SYSDATE to
this port.
Now create an update strategy transformation and drag the ports of
the expression transformation into it. Go to the properties tab and enter the
update strategy expression as DD_UPDATE.
Drag the target definition into the mapping and connect the
appropriate ports of update strategy to it. The complete mapping image is
shown below.
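For readers who think in SQL, the overall effect of this SCD Type 2 (effective date) mapping can be summarised by the two statements below. This is only a rough, hedged sketch in Oracle-style SQL: the table and column names (Customers, Customers_Dim, Cust_Key, Customer_Id, Location, Begin_Date, End_Date) come from the example above, while the sequence Customers_Dim_Seq standing in for the sequence generator transformation is an assumed name.

-- Close the current version of every customer whose location changed
UPDATE Customers_Dim d
SET    d.End_Date = SYSDATE
WHERE  d.End_Date IS NULL
  AND  EXISTS (SELECT 1
               FROM   Customers s
               WHERE  s.Customer_Id = d.Customer_Id
               AND    s.Location   <> d.Location);

-- Insert a fresh version for new and changed customers
INSERT INTO Customers_Dim (Cust_Key, Customer_Id, Location, Begin_Date, End_Date)
SELECT Customers_Dim_Seq.NEXTVAL, x.Customer_Id, x.Location, SYSDATE, NULL
FROM  (SELECT s.Customer_Id, s.Location
       FROM   Customers s
       WHERE  NOT EXISTS (SELECT 1
                          FROM   Customers_Dim d
                          WHERE  d.Customer_Id = s.Customer_Id
                          AND    d.End_Date IS NULL
                          AND    d.Location  = s.Location)) x;

Running the UPDATE before the INSERT matters: once a changed customer's open row is end-dated, the NOT EXISTS check lets the new version through, exactly like the Changed_Flag branch of the mapping.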

Create/Design/Implement SCD Type 3 Mapping in Informatica


Q) How to create or implement or design a slowly changing dimension (SCD)
Type 3 using the informatica ETL tool.
The SCD Type 3 method is used to store partial historical data in the
Dimension table. The dimension table contains the current and previous
data.

The process involved in the implementation of SCD Type 3 in informatica is:

Identifying the new records and inserting them into the dimension table.
Identifying the changed records and updating the existing records in the dimension table.

We will see the implementation of SCD Type 3 using the customer dimension table as an example. The source table looks as follows:

CREATE TABLE Customers (
    Customer_Id Number,
    Location    Varchar2(30)
);

Now I have to load the data of the source into the customer dimension table
using SCD Type 3. The Dimension table structure is shown below.

CREATE TABLE Customers_Dim (
    Cust_Key          Number,
    Customer_Id       Number,
    Current_Location  Varchar2(30),
    Previous_Location Varchar2(30)
);

Steps to Create SCD Type 3 Mapping

Follow the below steps to create SCD Type 3 mapping in informatica


Create the source and dimension tables in the database.
Open the mapping designer tool, source analyzer and either create or
import the source definition.

Go to the Warehouse designer or Target designer and import the target definition.
Go to the mapping designer tab and create new mapping.

Drag the source into the mapping.

Go to the toolbar, Transformation and then Create.

Select the lookup Transformation, enter a name and click on create.


You will get a window as shown in the below image.

Select the customer dimension table and click on OK.

Edit the LKP transformation, go to the ports tab, remove the Previous_Location port and add a new port IN_Customer_Id. This new port needs to be connected to the Customer_Id port of the source qualifier transformation.

Go to the condition tab of the LKP transformation and enter the lookup condition as Customer_Id = IN_Customer_Id. Then click on OK.

Connect the Customer_Id port of the source qualifier transformation to the IN_Customer_Id port of the LKP transformation.
Create the expression transformation with input ports Cust_Key, Prev_Location, Curr_Location and output ports New_Flag, Changed_Flag. For the output ports of the expression transformation, enter the below expressions and click on OK:

New_Flag = IIF(ISNULL(Cust_Key), 1, 0)
Changed_Flag = IIF(NOT ISNULL(Cust_Key) AND Prev_Location != Curr_Location, 1, 0)

Now connect the LKP transformation ports (Cust_Key, Current_Location) to the expression transformation ports (Cust_Key, Prev_Location), and the source qualifier transformation port (Location) to the expression transformation port (Curr_Location).
The mapping diagram so far created is shown in the below image.

Create a filter transformation and drag the ports of the source qualifier transformation into it. Also drag the New_Flag port from the expression transformation into it.
Edit the filter transformation, go to the properties tab and enter the
Filter Condition as New_Flag=1. Then click on ok.
Now create an update strategy transformation and connect all the
ports of the filter transformation (except the New_Flag port) to the update
strategy. Go to the properties tab of update strategy and enter the update
strategy expression as DD_INSERT
Now drag the target definition into the mapping and connect the appropriate ports from the update strategy to the target definition. Connect the Location port of the update strategy to the Current_Location port of the target definition.
Create a sequence generator transformation and connect the NEXTVAL
port to the target surrogate key (cust_key) port.
The part of the mapping diagram for inserting a new row is shown
below:

Now create another filter transformation. Go to the ports tab and create the ports Cust_Key, Curr_Location, Prev_Location and Changed_Flag. Connect the LKP transformation ports (Cust_Key, Current_Location) to the filter transformation ports (Cust_Key, Prev_Location), the source qualifier transformation port (Location) to the filter transformation port (Curr_Location), and the expression transformation port (Changed_Flag) to the Changed_Flag port of the filter transformation.
Edit the filter transformation, go to the properties tab and enter the
Filter Condition as Changed_Flag=1. Then click on ok.
Now create an update strategy transformation and connect the ports of the filter transformation (Cust_Key, Curr_Location, Prev_Location) to the update strategy. Go to the properties tab of the update strategy and enter the update strategy expression as DD_UPDATE.
Now drag the target definition into the mapping and connect the
appropriate ports from update strategy to the target definition.
The complete mapping diagram is shown in the below image.
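As a rough cross-check, the net effect of this SCD Type 3 mapping corresponds to the Oracle-style SQL below. This is only an illustrative sketch: the Customers and Customers_Dim names come from the example, and the sequence Customers_Dim_Seq standing in for the sequence generator transformation is an assumed name.

-- Changed customers: push the current value into Previous_Location
UPDATE Customers_Dim d
SET    d.Previous_Location = d.Current_Location,
       d.Current_Location  = (SELECT s.Location
                              FROM   Customers s
                              WHERE  s.Customer_Id = d.Customer_Id)
WHERE  EXISTS (SELECT 1
               FROM   Customers s
               WHERE  s.Customer_Id = d.Customer_Id
               AND    s.Location   <> d.Current_Location);

-- New customers: insert with an empty Previous_Location
INSERT INTO Customers_Dim (Cust_Key, Customer_Id, Current_Location, Previous_Location)
SELECT Customers_Dim_Seq.NEXTVAL, x.Customer_Id, x.Location, NULL
FROM  (SELECT s.Customer_Id, s.Location
       FROM   Customers s
       WHERE  NOT EXISTS (SELECT 1
                          FROM   Customers_Dim d
                          WHERE  d.Customer_Id = s.Customer_Id)) x;

Note that the SET clause reads the old value of Current_Location for the Previous_Location assignment, which is exactly what the mapping does by routing the looked-up location into Prev_Location.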

Session - a task associated with a mapping to define the connections and other configurations for that mapping.
Workflow - controls the execution of tasks such as commands, emails and sessions.
Worklet - a workflow that can be called within a workflow.
Mapping - represents the flow and transformation of data from source to target.
Mapplet - a group of transformations that can be called within a mapping.

Specify the filter condition as

IIF(O_Run_Flag = 1, v_count <= $$SOURCE_COUNT/2,
IIF(O_Run_Flag = 2, v_count > $$SOURCE_COUNT/2))

Connect the filter transformation to the target.

Here I am assuming that you know how to use a parameter file, which is why I did not specify the complete details.
Get Previous Row Value in Informatica
How to get the previous row value while processing the current row in
informatica?
One of my blog readers asked this question. The source data is shown
below:

Table Name: Customers

cust_id, Year, City


-----------------------
10, 2001, BLR
10, 2002, MUM
10, 2003, SEA
10, 2004, NY
20, 2001, DEL
20, 2002, NCR
20, 2003, HYD

The question: for each customer, while processing the current row, you have to get the city value of the previous row. If there is no previous row, then the previous city value should be null. The output data is shown below:

Table Name: Customers_TGT

cust_id, Year, City, prev_city


------------------------------
10, 2001, BLR, NULL
10, 2002, MUM, BLR
10, 2003, SEA, MUM
10, 2004, NY,  SEA
20, 2001, DEL, NULL
20, 2002, NCR, DEL
20, 2003, HYD, NCR

Getting Previous Row Value Informatica Mapping Logic


Solution:

Connect the source qualifier transformation to the sorter transformation and sort the data on the cust_id, year ports in ascending order.
Connect the sorter transformation to the expression transformation. In
the expression transformation, create the below additional ports and assign
the corresponding expressions:

cust_id (input/output port)
year    (input/output port)
city    (input/output port)

v_current_cust_id   (variable port) = cust_id
v_act_previous_city (variable port) = IIF(v_current_cust_id = v_previous_cust_id, v_previous_city, NULL)
v_previous_city     (variable port) = city
v_previous_cust_id  (variable port) = cust_id
o_previous_city     (output port)   = v_act_previous_city

Note that the port order matters: Informatica evaluates variable ports from top to bottom, so v_act_previous_city reads the values of v_previous_city and v_previous_cust_id from the previous row before they are overwritten with the current row's values.

Connect the output ports of the expression transformation to the target.
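If the same data were already sitting in a relational table, the result this mapping produces is what the LAG analytic function would give you. The query below is only a hedged illustration in Oracle-style SQL of what the expression transformation computes; the mapping itself relies purely on the sorter plus the variable-port ordering described above.

SELECT cust_id,
       year,
       city,
       LAG(city) OVER (PARTITION BY cust_id ORDER BY year) AS prev_city
FROM   Customers;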


Convert multiple Rows to single row (multiple Columns) in
Informatica
I have the sales table as a source. The sales table contains the sales
information of products for each year and month. The data in the source
table is shown below:

Source Data: Sales table

year product month amount


-------------------------
1999 A Jan 9600
1999 A Feb 2000
1999 A Mar 2500
2001 B Jan 3000
2001 B Feb 3500
2001 B Mar 4000

The sales information of a product for each month is available in a separate row. I want to convert the rows for all the months in a specific year to a single row. The output is shown below:

Target Data:

year product Jan_Month Feb_Month Mar_Month
-------------------------------------------
1999 A       9600      2000      2500
2001 B       3000      3500      4000

How to implement a mapping logic for this in informatica?


Solution:

Follow the below steps to implement the mapping logic for the above
scenario in informatica:
Create a new mapping.

Drag the source into the mapping.

Create an expression transformation.

Drag the ports of source qualifier into the expression transformation.

Create the below additional ports in the expression transformation and assign the corresponding expressions:
Jan_Month (output port) = IIF(month='Jan', amount, null)
Feb_Month (output port) = IIF(month='Feb', amount, null)
Mar_Month (output port) = IIF(month='Mar', amount, null)

Connect the expression transformation to an aggregator transformation. Connect only the year, product, Jan_Month, Feb_Month and Mar_Month ports of the expression transformation to the aggregator transformation.
Group by year and product in the aggregator transformation.
Create the below additional ports in aggregator transformation and
assign the corresponding expressions:
o_Jan_Month (output port) = MAX(Jan_Month)
o_Feb_Month (output port) = MAX(Feb_Month)

o_Mar_Month (output port) = MAX(Mar_Month)

Now connect the ports year, product, o_Jan_Month, o_Feb_Month and o_Mar_Month of the aggregator transformation to the target.
Save the mapping.
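For comparison, the same pivot can be expressed directly in SQL with a MAX(CASE ...) per month. This is only a hedged sketch in Oracle-style SQL against a table named Sales (as in the example above); it is shown to clarify why the aggregator uses MAX with a group by on year and product.

SELECT year,
       product,
       MAX(CASE WHEN month = 'Jan' THEN amount END) AS Jan_Month,
       MAX(CASE WHEN month = 'Feb' THEN amount END) AS Feb_Month,
       MAX(CASE WHEN month = 'Mar' THEN amount END) AS Mar_Month
FROM   Sales
GROUP BY year, product;

MAX simply picks the single non-null amount in each group, which is exactly what the o_Jan_Month, o_Feb_Month and o_Mar_Month ports do in the aggregator.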
Cumulative Sum Calculation in Informatica
Q) How to find the cumulative sum of salaries of employees in informatica?
I have employees table as a source. The data in the employees table is
shown below:

Table name: employees

Dept_Id, emp_id, salary


------------------------
10, 201, 10000
10, 202, 20000
10, 203, 30000
20, 301, 40000
20, 302, 50000

I want to sort the data on the department id and employee id, and then find the cumulative sum of salaries of employees in each department. The output is shown below:

Dept_id, emp_id, salary, Cum_salary
-----------------------------------
10, 201, 10000, 10000
10, 202, 20000, 30000
10, 203, 30000, 60000
20, 301, 40000, 40000
20, 302, 50000, 90000

Solution: Follow the below steps for implementing the mapping logic in informatica.
Connect the source qualifier transformation to a sorter transformation. Sort the rows on the dept_id and emp_id ports in ascending order.
Connect the sorter transformation to the expression transformation. In
the expression transformation, create the following additional ports and
assign the corresponding expressions:
v_salary (variable port) = IIF(dept_id = v_last_dept_id, v_salary + salary,
salary)
v_last_dept_id (variable port) = dept_id
o_cum_salary (output port) = v_salary

Connect the expression transformation ports to the target. Save the mapping.
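If you prefer to see the intent in SQL, the running total the expression transformation builds is equivalent to a windowed SUM. This is only an illustrative, hedged sketch in Oracle-style SQL over the employees table from the example; the mapping itself relies on the sorter and the variable-port logic above.

SELECT dept_id,
       emp_id,
       salary,
       SUM(salary) OVER (PARTITION BY dept_id
                         ORDER BY emp_id
                         ROWS UNBOUNDED PRECEDING) AS cum_salary
FROM   employees
ORDER BY dept_id, emp_id;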
Load all records except last N - Informatica
Q) I want to load all the records from my source, which is a file, except the last 5 records. This question can also be asked in an interview as "How do you remove the footer record, which is the last record?"
Example: My source file contains the following records:

Name
----
A
B
C
D
E
F
G

After excluding the last 5 records, I want to load A and B into the target. How to implement a mapping logic for this in informatica?

Solution: Follow the below steps


Connect the source qualifier transformation and the NEXTVAL port of a sequence generator to the sorter transformation.
In the sorter transformation, check the key box corresponding to the NEXTVAL port and change the direction to Descending.
Create one more sequence generator transformation and a filter transformation.
Connect the NEXTVAL port of the second sequence generator transformation to the filter, and the Name port of the sorter transformation to the filter.
Specify the filter condition as NEXTVAL > 5.
Save the mapping. Create a workflow and session. Save the workflow
and run the workflow.
You can use the same approach to remove only the footer record from the source by specifying the filter condition as NEXTVAL > 1. If you have any issues in solving this problem, please do comment here.
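As a sanity check, here is what the same "drop the last 5 rows" logic looks like in SQL. This is only a hedged illustration: it assumes the file has been staged into a table called staged_names with a load-order column seq_no (both assumed names), since a relational table has no inherent row order the way a flat file does.

SELECT Name
FROM  (SELECT Name,
              seq_no,
              ROW_NUMBER() OVER (ORDER BY seq_no DESC) AS rn
       FROM   staged_names) t
WHERE rn > 5
ORDER BY seq_no;

Numbering the rows from the end (DESC) and keeping rn > 5 mirrors the second sequence generator plus the NEXTVAL > 5 filter in the mapping.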

Load Last N Records of File into Target Table - Informatica


Q) How to load only the last N rows from source file into the target table
using the mapping in informatica?
First take a look at the below data in the source file:

Products
--------
Windows
Linux
Unix
Ubuntu
Fedora
Centos
Debian

I want to load only the last record or footer into the target table. The target should contain only the product "Debian". Follow the below steps for implementing the mapping logic in informatica:

The mapping flow and the transformations are shown below:


SRC->SQ->EXPRESSION->SORTER->EXPRESSION->FILTER->TGT

Create a new mapping and drag the source into the mapping. By
default, it creates the source qualifier transformation.
Now create an expression transformation and drag the ports from
source qualifier into the expression transformation. In the expression
transformation, create the below additional ports and assign the
corresponding expressions:
v_count (variable port) = v_count+1
o_count (output port) = v_count

The output of the expression transformation is

Products, o_count
-----------------
Windows, 1
Linux, 2
Unix, 3
Ubuntu, 4
Fedora, 5
Centos, 6
Debian, 7

Now connect the expression transformation to a sorter transformation and sort the rows on the o_count port in descending order. The output of the sorter transformation is shown below:
Products
--------
Debian
Centos
Fedora
Ubuntu
Unix
Linux
Windows

Create another expression transformation and connect the Products port of the sorter to the expression transformation. Create the following ports in the expression transformation:
v_count (variable port) = v_count+1
o_count (output port) = v_count

Connect the expression to a filter transformation and specify the filter condition as o_count = 1.
Connect the filter to the target and save the mapping.
If you are facing any issues in loading the last N records, comment here.
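In SQL terms, the same "keep only the last row" logic would look like the hedged sketch below. As before, staged_products and the load-order column seq_no are assumed names, because a flat file's physical order has to be captured explicitly once the data is in a table. Changing rn = 1 to rn <= N keeps the last N rows.

SELECT Products
FROM  (SELECT Products,
              ROW_NUMBER() OVER (ORDER BY seq_no DESC) AS rn
       FROM   staged_products) t
WHERE rn = 1;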
Load Alternative Records / Rows into Multiple Targets - Informatica
Q) How to load records alternately into multiple targets in informatica? Implement the mapping logic for this.
I have a source file which contains N records. I want to load the source records into two targets, such that the first row goes into target 1, the second row goes into target 2, the third row goes into target 1 again, and so on.
Let us see how to create the mapping logic for this in informatica with an example. Consider the following source flat file:

Products
--------
Informatica
Datastage
Pentaho
MSBI
Oracle
Mysql

The data in the targets should be:

Target1
-------
Informatica
Pentaho
Oracle

Target2
-------
Datastage
MSBI
Mysql

Solution:
The mapping flow and the transformations used are mentioned below:

SRC->SQ->EXP->RTR->TGTS

First create a new mapping and drag the source into the mapping.

Create an expression transformation. Drag the ports of the source qualifier into the expression transformation. Create the following additional ports and assign the corresponding expressions:
v_count (variable port) = v_count+1
o_count (output port) = v_count

Create a router transformation and drag the ports (Products, o_count) from the expression transformation into the router transformation. Note that the variable port v_count cannot be connected to another transformation; only the output port o_count can. Create an output group in the router transformation and specify the following group filter condition:

MOD(o_count, 2) = 1

Now connect the output group of the router transformation to target1 and the default group to target2. Save the mapping.
In the above solution, I have used an expression transformation for generating numbers. You can also use a sequence generator transformation for producing the sequence values.
This is how we have to load alternate records into multiple targets.
For more problems check - informatica scenarios
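The routing rule the two groups implement is easy to see in SQL. The hedged sketch below assumes the file has been staged into a table staged_products with a load-order column seq_no (assumed names); the odd-numbered rows go to the first target and the even-numbered rows to the second.

-- Rows for Target1 (odd positions)
SELECT Products
FROM  (SELECT Products,
              ROW_NUMBER() OVER (ORDER BY seq_no) AS rn
       FROM   staged_products) t
WHERE MOD(rn, 2) = 1;

-- Rows for Target2 (even positions): change the condition to MOD(rn, 2) = 0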
Generate rows based on a column value - Informatica
Q) How to generate or load values into the target table based on a column value using the informatica ETL tool.
I have the products table as the source and the data of the products table is shown below.

Table Name: Products


Product Quantity
-----------------
Samsung NULL
Iphone  3
LG      0
Nokia   4

Now I want to duplicate or repeat each product in the source table as many times as the value in the quantity column. The output is

product Quantity
----------------
Iphone  3
Iphone  3
Iphone  3
Nokia   4
Nokia   4
Nokia   4
Nokia   4

The Samsung and LG products should not be loaded, as their quantities are NULL and 0 respectively.
Now create an informatica mapping and workflow to load the data into the target table.
Solution:
Follow the below steps

Create a new mapping in the mapping designer

Drag the source definition into the mapping

Create the java transformation in active mode

Drag the ports of the source qualifier transformation into the java transformation.
Now edit the java transformation by double clicking on the title bar of
the java transformation and go to the "Java Code" tab.
Enter the below java code in the "Java Code" tab.

if (!isNull("quantity"))
{
double cnt = quantity;
for (int i = 1; i <= quantity; i++)
{
product = product;
quantity = quantity;
generateRow();
}
}

Now compile the java code. The compile button is shown in red circle in
the image.
Connect the ports of the java transformation to the target.
Save the mapping, create a workflow and run the workflow.
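If you want to verify the expected output without Informatica, the same "repeat each row quantity times" result can be produced with a recursive WITH clause. This is just a hedged SQL sketch against the Products table from the example; it is not part of the mapping.

WITH expanded (product, quantity, n) AS (
    SELECT product, quantity, 1 FROM Products WHERE quantity >= 1
    UNION ALL
    SELECT product, quantity, n + 1 FROM expanded WHERE n < quantity
)
SELECT product, quantity
FROM   expanded
ORDER BY product, n;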
Load Source File Name in Target - Informatica
Q) How to load the name of the currently processed flat file along with the data into the target using an informatica mapping?
We will create a simple pass through mapping to load the data and "file
name" from a flat file into the target. Assume that we have a source file
"customers" and want to load this data into the target "customers_tgt". The
structures of source and target are

Source file name: customers.dat


Customer_Id
Location

Target: Customers_TBL
Customer_Id
Location
FileName

The steps involved are:


Login to the powercenter mapping designer and go to the source
analyzer.
You can either create the flat file definition or import it.
Once you have created the flat file definition, edit the source and go to the properties tab. Check the option "Add Currently Processed Flat File Name Port". This option is shown in the below image.

A new port, "CurrentlyProcessedFileName" is created in the ports tab.


Now go to the Target Designer or Warehouse Designer and create or
import the target definition. Create a "Filename" port in the target.
Go to the Mapping designer tab and create new mapping.
Drag the source and target into the mapping. Connect the appropriate
ports of source qualifier transformation to the target.
Now create a workflow and session. Edit the session and enter the
appropriate values for source and target connections.
The mapping flow is shown in the below image

The loading of the filename works for both the Direct and Indirect source file types. After running the workflow, the data and the filename will be loaded into the target. The important point to note is that the complete path of the file will be loaded into the target, that is, the directory path plus the filename (example: /informatica/9.1/SrcFiles/Customers.dat).

If you don't want the directory path and just want the filename to be loaded into the target, then follow the below steps:
Create an expression transformation and drag the ports of source
qualifier transformation into it.
Edit the expression transformation, go to the ports tab, create an
output port and assign the below expression to it.

REVERSE(
    SUBSTR(
        REVERSE(CurrentlyProcessedFileName),
        1,
        INSTR(REVERSE(CurrentlyProcessedFileName), '/') - 1
    )
)

Now connect the appropriate ports of the expression transformation to the target.
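To see what the expression returns, an equivalent string manipulation is easy to test in Oracle SQL. The query below is only an illustration (the literal path is the example from above); it strips everything up to the last '/' using INSTR with a negative start position, which gives the same result as the REVERSE/SUBSTR/INSTR combination used in the mapping.

SELECT SUBSTR(fname, INSTR(fname, '/', -1) + 1) AS file_name_only
FROM  (SELECT '/informatica/9.1/SrcFiles/Customers.dat' AS fname FROM dual) t;

-- Returns: Customers.dat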


Reverse the Contents of Flat File Informatica
Q1) I have a flat file and want to reverse its contents, which means the first record should become the last record and the last record should become the first record, and then load the result into the target file.

As an example consider the source flat file data as

Informatica Enterprise Solution


Informatica Power center
Informatica Power exchange
Informatica Data quality

The target flat file data should look as

Informatica Data quality


Informatica Power exchange
Informatica Power center
Informatica Enterprise Solution

Solution:
Follow the below steps for creating the mapping logic

Create a new mapping.

Drag the flat file source into the mapping.

Create an expression transformation and drag the ports of the source qualifier transformation into the expression transformation.
Create the below additional ports in the expression transformation and
assign the corresponding expressions

Variable port: v_count = v_count+1

Output port o_count = v_count

Now create a sorter transformation and drag the ports of the expression transformation into it.
In the sorter transformation specify the sort key as o_count and sort
order as DESCENDING.
Drag the target definition into the mapping and connect the ports of
sorter transformation to the target.
Q2) Load the header record of the flat file into the first target, the footer record into the second target and the remaining records into the third target.
I have already posted a solution to this problem using an aggregator and a joiner. Now we will see how to implement it by reversing the contents of the file.
Solution:

Connect the source qualifier transformation to the expression transformation. In the expression transformation, create the additional ports as mentioned above.
Connect the expression transformation to a router. In the router
transformation create an output group and specify the group condition as
o_count=1. Connect this output group to a target and the default group to
sorter transformation.
Sort the data in descending order on o_count port.
Connect the output of the sorter transformation to an expression transformation (don't connect the o_count port).
Again, in this expression transformation create the same additional ports mentioned above.
Connect this expression transformation to a router and create an output group. In the output group specify the condition as o_count = 1 and connect this group to the second target. Connect the default group to the third target.
Dynamic Target Flat File Name Generation in Informatica
Informatica 8.x and later versions provide a feature for generating target files dynamically. This feature allows you to:
Create a new file for every session run.
Create a new file for each transaction.

Informatica provides a special port,"FileName" in the Target file definition.


This port you have to add explicitly. See the below diagram for adding the
"FileName" port.

Go to the Target Designer or Warehouse Designer and edit the file definition. You have to click on the button indicated by the red circle to add the special port.
Now we will see some informatica mapping examples for creating the target file name dynamically and loading the data.
1. Generate a new file for every session run.
Whenever the session runs, you need to create a new file dynamically and load the source data into that file. To do this, just follow the below steps:
STEP1: Connect the source qualifier to an expression transformation. In the expression transformation, create an output port (call it File_Name) and assign the expression 'EMP_'||to_char(sessstarttime, 'YYYYMMDDHH24MISS')||'.dat'
STEP2: Now connect the expression transformation to the target and connect the File_Name port of the expression transformation to the FileName port of the target file definition.
STEP3: Create a workflow and run the workflow.
Here I have used sessstarttime, as it is constant throughout the session run.
If you have used sysdate, a new file will be created whenever a new
transaction occurs in the session run.
The target file names created would look like EMP_20120101125040.dat.
2. Create a new file for every session run, where the file name contains a number suffix (EMP_n.dat).
In the above mapping scenario, the target flat file name contains a timestamp suffix. Here we have to create the suffix as a number, so the file names should look like EMP_1.dat, EMP_2.dat and so on. Follow the below steps:
STPE1: Go the mappings parameters and variables -> Create a new
variable, $$COUNT_VAR and its data type should be Integer
STEP2: Connect the source qualifier to the expression transformation. In the expression transformation, create the following new ports and assign the expressions:

v_count (variable port) = v_count + 1
v_file_count (variable port) = IIF(v_count = 1, SETVARIABLE($$COUNT_VAR, $$COUNT_VAR + 1), $$COUNT_VAR)
o_file_name (output port) = 'EMP_'||v_file_count||'.dat'

STEP3: Now connect the expression transformation to the target and connect the o_file_name port of the expression transformation to the FileName port of the target.
3. Create a new file once a day.
You can create a new file only once in a day and can run the session multiple
times in the day to load the data. You can either overwrite the file or append
the new data.
This is similar to the first problem. Just change the expression in expression
transformation to 'EMP_'||to_char(sessstarttime, 'YYYYMMDD')||'.dat'. To avoid
overwriting the file, use Append If Exists option in the session properties.
4. Create a flat file based on the values in a port.
You can create a new file for each distinct values in a port. As an example
consider the employees table as the source. I want to create a file for each
department id and load the appropriate data into the files.
STEP1: Sort the data on department_id. You can use either the source qualifier or a sorter transformation to sort the data.
STEP2: Connect to the expression transformation. In the expression transformation, create the below ports and assign the expressions:

v_curr_dept_id (variable port) = dept_id


v_flag (variable port) = IIF(v_curr_dept_id=v_prev_dept_id,0,1)
v_prev_dept_id (variable port) = dept_id
o_flag (output port) = v_flag
o_file_name (output port) = dept_id||'.dat'

STEP4: Now connect the expression transformation to a transaction control transformation and specify the transaction control condition as

IIF(o_flag = 1, TC_COMMIT_BEFORE, TC_CONTINUE_TRANSACTION)

STEP5: Now connect to the target file definition.
