De-normalized data
ER schema
Star schema
3. What are the types of data warehousing?
EDW (Enterprise data warehouse)
It provides a central database for decision support throughout the enterprise.
It is a collection of data marts.
DATAMART
It is a subset of a data warehouse.
It is a subject-oriented database that supports the needs of individual departments in an organization.
It is called a high-performance query structure.
It supports a particular line of business, such as sales or marketing.
ODS (Operational data store)
It is defined as an integrated view of operational databases designed to support operational monitoring.
It is a collection of operational data sources designed to support transaction processing.
Data is refreshed in near real time and used for business activity.
It sits between the OLTP and OLAP systems and helps to create instant reports.
Entity
Table
Attribute
Column
Primary Key
Alternate Key
Rule
Relationship
Foreign Key
Definition
Comment
The DTM process is the second process associated with the session run. The primary purpose of the DTM process is to create and manage threads that carry out the session tasks.
The DTM allocates process memory for the session and divides it into buffers. This is also known as buffer memory. It creates the main thread, which is called the master thread. The master thread creates and manages all other threads.
If we partition a session, the DTM creates a set of threads for each partition to allow concurrent processing. When the Informatica server writes messages to the session log, it includes the thread type and thread ID.
Following are the types of threads that the DTM creates:
Master thread - Main thread of the DTM process; creates and manages all other threads.
Mapping thread - One thread per session; fetches session and mapping information.
Pre- and post-session thread - One thread each to perform pre- and post-session operations.
Reader thread - One thread for each partition for each source pipeline.
Writer thread - One thread for each partition if a target exists in the source pipeline, to write to the target.
Transformation thread - One or more transformation threads for each partition.
Q. What is Session and Batches?
Session - A session is a set of instructions that tells the Informatica Server how and when to move data from sources to targets. After creating the session, we can use either the Server Manager or the command line program pmcmd to start or stop the session.
Batches - A batch provides a way to group sessions for either serial or parallel execution by the Informatica Server. There are two types of batches:
1. Sequential - Runs sessions one after the other.
2. Concurrent - Runs sessions at the same time.
Q. In how many ways can you update a relational source definition, and what are they?
A. Two ways:
1. Edit the definition
2. Reimport the definition
Q. What is a transformation?
A. It is a repository object that generates, modifies or passes data.
Q. What are the designer tools for creating transformations?
A. Mapping designer
Transformation developer
Mapplet designer
Q. In how many ways can you create ports?
A. Two ways
1. Drag the port from another transformation
2. Click the add button on the ports tab.
Q. What are reusable transformations?
A. A transformation that can be reused is called a reusable transformation.
They can be created using two methods:
1. Using the Transformation Developer
2. Creating a normal one and promoting it to reusable
Q. Is there an aggregate cache in the Aggregator transformation?
A. The Aggregator stores data in the aggregate cache until it completes the aggregate calculations. When you run a session that uses an Aggregator transformation, the Informatica server creates index and data caches in memory to process the transformation. If the Informatica server requires more space, it stores overflow values in cache files.
Q. What are the settings that you use to configure the Joiner transformation?
Master and detail source
Type of join
Condition of the join
Q. What are the join types in the Joiner transformation?
A. Normal (default) -- only matching rows from both master and detail
Master outer -- all detail rows and only matching rows from master
Detail outer -- all master rows and only matching rows from detail
Full outer -- all rows from both master and detail (matching or non-matching)
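For readers who think in SQL terms, these four join types correspond roughly to standard SQL joins. The sketch below is only an analogy; MASTER_TBL, DETAIL_TBL, and key_col are made-up names, not from this document:
-- Normal join  ~ INNER JOIN: only matching rows from both sources
SELECT d.*, m.* FROM DETAIL_TBL d INNER JOIN MASTER_TBL m ON d.key_col = m.key_col;
-- Master outer ~ all detail rows plus matching master rows
SELECT d.*, m.* FROM DETAIL_TBL d LEFT OUTER JOIN MASTER_TBL m ON d.key_col = m.key_col;
-- Detail outer ~ all master rows plus matching detail rows
SELECT d.*, m.* FROM DETAIL_TBL d RIGHT OUTER JOIN MASTER_TBL m ON d.key_col = m.key_col;
-- Full outer   ~ all rows from both master and detail
SELECT d.*, m.* FROM DETAIL_TBL d FULL OUTER JOIN MASTER_TBL m ON d.key_col = m.key_col;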
Q. What are the joiner caches?
A. When a Joiner transformation occurs in a session, the Informatica Server reads all the records from the master source and builds index and data caches based on the master rows. After building the caches, the Joiner transformation reads records from the detail source and performs the joins.
Post-load of the Source. After the session retrieves data from the source,
the stored procedure runs. This is useful for removing temporary tables.
Pre-load of the Target. Before the session sends data to the target, the
stored procedure runs. This is useful for verifying target tables or disk space
on the target system.
Post-load of the Target. After the session sends data to the target, the
stored procedure runs. This is useful for re-creating indexes on the database.
It must contain at least one Input and one Output port.
Q. What kinds of sources and targets can be used in Informatica?
Sources may be flat files, relational databases, or XML.
Targets may be relational tables, XML, or flat files.
Q: What is the Session Process?
A: The Load Manager process starts the session, creates the DTM process, and sends post-session email when the session completes.
Q. What is the DTM process?
A: The DTM process creates threads to initialize the session, read, write, and transform data, and handle pre- and post-session operations.
Q. What are the different types of tracing levels?
The tracing level represents the amount of information that the Informatica Server writes to the session log. Tracing levels store information about the mapping and the transformations. Four tracing levels are supported:
1. Normal: Logs initialization and status information, a summary of successful and rejected rows, and information about rows skipped due to transformation errors.
2. Terse: Logs initialization information, error messages, and notification of rejected data only (the least detailed level).
3. Verbose Initialization: In addition to Normal tracing, logs the names of the index and data cache files used and detailed transformation statistics for each transformation within the mapping.
4. Verbose Data: In addition to Verbose Initialization tracing, logs each and every row processed by the Informatica server.
Q. TYPES OF DIMENSIONS?
A dimension table contains the attributes that describe the facts. Dimensions store the textual descriptions of the business.
Conformed Dimension:
Conformed dimensions mean the exact same thing with every possible fact
table to which they are joined.
Eg: The date dimension table connected to the sales facts is identical to the
date dimension connected to the inventory facts.
Junk Dimension:
A junk dimension combines several low-cardinality flags and indicator attributes (for example, codes and yes/no columns) into a single dimension, keeping them out of the fact table and the other dimension tables.
Reject file: This file contains the rows of data that the writer does not write
to
targets.
Control file: The Informatica server creates a control file and a target file when you run a session that uses the external loader. The control file contains information about the target flat file, such as the data format and loading instructions for the external loader.
Post-session email: Post-session email allows you to automatically communicate information about a session run to designated recipients. You can create two different messages: one if the session completes successfully, the other if the session fails.
Indicator file: If you use a flat file as a target, you can configure the Informatica server to create an indicator file. For each target row, the indicator file contains a number to indicate whether the row was marked for insert, update, delete, or reject.
Output file: If a session writes to a target file, the Informatica server creates the target file based on the file properties entered in the session property sheet.
Cache files: When the Informatica server creates a memory cache, it also creates cache files. The Informatica server creates index and data cache files for the following transformations:
Aggregator transformation
Joiner transformation
Rank transformation
Lookup transformation
Q. What is meant by lookup caches?
A. The Informatica server builds a cache in memory when it processes the first row of data in a cached Lookup transformation. It allocates memory for the cache based on the amount you configure in the transformation or session properties. The Informatica server stores condition values in the index cache and output values in the data cache.
Q. How do you identify existing rows of data in the target table using a Lookup transformation?
A. There are two ways to look up the target table to verify whether a row exists:
1. Use a connected Lookup with a dynamic cache, and then check the value of the NewLookupRow output port to decide whether the incoming record already exists in the table/cache.
2. Use an unconnected Lookup, call it from an Expression transformation, and check the lookup condition port value (NULL / not NULL) to decide whether the incoming record already exists in the table.
Q. What are Aggregate tables?
An aggregate table contains a summary of existing warehouse data grouped to certain levels of dimensions. Retrieving the required data from the actual table, which may have millions of records, takes more time and also affects server performance. To avoid this, we can aggregate the table to the required level and use it. These tables reduce the load on the database server, improve query performance, and return results much faster.
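As an illustration only (sales_fact and its columns are assumed here, not taken from this document), such an aggregate table can be pre-built by rolling the detailed fact up to month level:
-- Assumed detail fact table: sales_fact(product_id, sale_date, amount)
CREATE TABLE sales_month_agg AS
SELECT product_id,
       TRUNC(sale_date, 'MM') AS sale_month,   -- roll up to month level
       SUM(amount)            AS total_amount,
       COUNT(*)               AS txn_count
FROM   sales_fact
GROUP  BY product_id, TRUNC(sale_date, 'MM');
Queries that only need monthly totals can then read sales_month_agg instead of scanning millions of fact rows.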
Q. What is the level of granularity of a fact table?
Level of granularity means the level of detail that you put into the fact table in a data warehouse. For example, based on the design you can decide to store the sales data for each transaction. The level of granularity then describes how much detail you are willing to keep for each transactional fact: product sales for each individual transaction, or sales aggregated up to, say, the minute.
Q. What is a session?
A session is a set of instructions to move data from sources to targets.
Q. What is a worklet?
A worklet is an object that represents a set of workflow tasks and allows you to reuse a set of workflow logic in several workflows.
Use of a worklet: You can bundle many tasks in one place so that they can easily be identified and serve a specific purpose.
Q. What is a workflow?
A workflow is a set of instructions that tells the Informatica server how to execute the tasks.
The change data thus captured is then made available to the target systems
in a controlled manner.
Q. What is an indicator file and how can it be used?
An indicator file is used for event-based scheduling when you don't know when the source data will be available. A shell command, script, or batch file creates and sends this indicator file to a directory local to the Informatica Server. The server waits for the indicator file to appear before running the session.
3 cc 4 2
So the row numbers changed, and therefore it is an active transformation.
2. Or the order of the rows changes.
Eg: when a Union transformation pulls in data, suppose we have 2 sources.
source1:
id name dept row_num
1 aa 4 1
2 bb 3 2
3 cc 4 3
source2:
id name dept row_num
4 aaa 4 4
5 bbb 3 5
6 ccc 4 6
The Union never restricts the data from any source, so the data can come in any order:
id name dept row_num old row_num
1 aa 4 1 1
4 aaa 4 2 4
5 bbb 3 3 5
2 bb 3 4 2
3 cc 4 5 3
6 ccc 4 6 6
So the row numbers are changing. Thus we say that Union is an active transformation.
Q. What is the use of a batch file in Informatica? How many types of batch files are there in Informatica?
With a batch, we can run sessions either sequentially or concurrently.
A grouping of sessions is known as a batch.
Two types of batches:
1) Sequential: Runs sessions one after another.
2) Concurrent: Runs sessions at the same time.
If you have sessions with source-target dependencies, you have to use a sequential batch to start the sessions one after another. If you have several independent sessions, you can use concurrent batches, which run all the sessions at the same time.
Static cache
Dynamic cache
Shared cache
Persistent cache
Normalizer transformation
COBOL sources, joiner
XML source qualifier transformation
XML sources
Target definitions
Pre & Post Session stored procedures
Other mapplets
Mapping parameters
Mapping variables
Session parameters
After that we have to select the Properties tab of the Session and set the parameter file name, including the physical path of this yyyy.prm file.
Make the following changes in the Mapping tab, in the Source Qualifier's Properties section:
Attribute --------> Value
Source File Type --------> Direct
Source File Directory --------> Empty
Source File Name --------> $InputFileValue1
Q. What is the default data driven operation in Informatica?
Data Driven is the default option for the Update Strategy transformation. The Integration Service follows the instructions coded in the Update Strategy transformations within the session mapping to determine how to flag records for insert, delete, update, or reject. If you do not choose the Data Driven option, the Integration Service ignores the Update Strategy transformations in the mapping.
Q. What is a threshold error in Informatica?
When the target is loaded through an Update Strategy (DD_REJECT, DD_UPDATE, etc.) and a reject limit is configured, the session ends with a failed status if the number of rejected records exceeds that limit. This error is called a threshold error.
Q. So many times I have seen "$PM parser error". What is meant by PM?
PM: PowerMart
1) A parsing error will come for the input parameter to the lookup.
2) Informatica is not able to resolve the input parameter CLASS for your lookup.
3) Check that the port CLASS exists as either an input port or a variable port in your expression.
4) Check the data type of CLASS and the data type of the input parameter for your lookup.
Q. What is a candidate key?
A candidate key is a combination of attributes that can be used to uniquely identify a database record without any extraneous data. Each table may have one or more candidate keys. One of these candidate keys is selected as the table's primary key; the others are called alternate keys.
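A minimal sketch (the table and columns are invented for illustration): both EMPNO and EMAIL can uniquely identify a row, so both are candidate keys; EMPNO is chosen as the primary key and EMAIL becomes an alternate key, typically enforced with a unique constraint:
CREATE TABLE employee (
  empno   NUMBER        PRIMARY KEY,        -- candidate key chosen as the primary key
  email   VARCHAR2(100) NOT NULL UNIQUE,    -- remaining candidate key = alternate key
  ename   VARCHAR2(50)
);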
Q. What is the difference between a Bitmap and a B-tree index?
A bitmap index is used for low-cardinality columns with repeating values,
ex: Gender: male/female, Account status: Active/Inactive.
A B-tree index is used for high-cardinality or unique values,
ex: empid.
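For example, in Oracle syntax (the emp table and its columns are assumed for illustration):
CREATE BITMAP INDEX emp_gender_bix ON emp (gender);   -- low-cardinality, repeating values
CREATE INDEX emp_empid_ix ON emp (empid);             -- default B-tree, high-cardinality/unique values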
Q. What is Throughput in Informatica?
Throughput is the rate at which the PowerCenter server reads rows in bytes from the source, or writes rows in bytes into the target, per second.
You can find this option in the Workflow Monitor: right-click the session, choose Properties, and on the Source/Target Statistics tab you can find the throughput details for each instance of the source and target.
Q. What are the set operators in Oracle?
UNION
UNION ALL
MINUS
INTERSECT
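A quick sketch of how they behave, assuming two hypothetical tables emp_2023 and emp_2024 with compatible column lists:
SELECT empno FROM emp_2023 UNION     SELECT empno FROM emp_2024;  -- distinct rows from both queries
SELECT empno FROM emp_2023 UNION ALL SELECT empno FROM emp_2024;  -- all rows, duplicates kept
SELECT empno FROM emp_2023 MINUS     SELECT empno FROM emp_2024;  -- rows in the first query but not the second
SELECT empno FROM emp_2023 INTERSECT SELECT empno FROM emp_2024;  -- rows common to both queries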
Q. How can I schedule an Informatica job with the Unix cron scheduling tool?
Crontab
The crontab (cron derives from chronos, Greek for time; tab stands for table)
command, found in Unix and Unix-like operating systems, is used to schedule
commands to be executed periodically. To see what crontabs are currently
running on your system, you can open a terminal and run:
sudo crontab -l
To edit the list of cronjobs you can run:
sudo crontab -e
This will open the default editor (could be vi or pico; if you want, you can change the default editor) to let us manipulate the crontab. If you save and exit the editor, all your cronjobs are saved into the crontab. Cronjobs are written in the following format:
* * * * * /bin/execute/this/script.sh
Scheduling explained
As you can see there are 5 stars. The stars represent different date parts in the following order:
1. minute (from 0 to 59)
2. hour (from 0 to 23)
3. day of month (from 1 to 31)
4. month (from 1 to 12)
5. day of week (from 0 to 6) (0 = Sunday)
Execute every minute
If you leave the star, or asterisk, in a position, it means "every" possible value for that field.
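As a concrete (hypothetical) example, a crontab entry that starts a PowerCenter workflow every night at 2 AM could call pmcmd; the installation path, service, domain, folder, workflow names, and credentials below are placeholders, and in practice the password is usually supplied through a script or an encrypted variable rather than inline:
0 2 * * * /informatica/server/bin/pmcmd startworkflow -sv INT_SVC -d DOMAIN_DEV -u infa_user -p infa_pwd -f MY_FOLDER wf_daily_load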
Domains
Nodes
Services
Q. WHAT IS VERSIONING?
It is used to keep a history of the changes made to mappings and workflows.
1. Check in: You check in when you are done with your changes so that everyone can see those changes.
2. Check out: You check out from the main stream when you want to make any change to the mapping/workflow.
3. Version history: It shows you all the changes made and who made them.
o - Overflow
n - Null
t - Truncate
When the data contains nulls, overflow, or truncation, the rows are rejected instead of being written to the target.
The rejected data is stored in reject files. You can check the data and reload it into the target using the reject-reload utility.
Q. Difference between STOP and ABORT?
Stop - If the Integration Service is executing a Session task when you issue
the stop command, the Integration Service stops reading data. It continues
processing and writing data and committing data to targets. If the
Integration Service cannot finish processing and committing data, you can
issue the abort command.
Abort - The Integration Service handles the abort command for the Session
task like the stop command, except it has a timeout period of 60 seconds. If
the Integration Service cannot finish processing and committing data within
the timeout period, it kills the DTM process and terminates the session.
Q. WHAT IS AN INLINE VIEW?
An inline view is the term given to a subquery in the FROM clause of a query, which can be used as a table. An inline view is effectively a named subquery.
Ex: SELECT Tab1.col1, Tab1.col2, Inview.col1, Inview.col2
FROM Tab1, (Select statement) Inview
WHERE Tab1.col1 = Inview.col1
SELECT D.DNAME, E.ENAME, E.SAL
FROM EMP E, (SELECT DNAME, DEPTNO FROM DEPT) D
WHERE E.DEPTNO = D.DEPTNO
In the above query, (SELECT DNAME, DEPTNO FROM DEPT) D is the inline view.
Inline views are determined at runtime and, in contrast to normal views, they are not stored in the data dictionary.
The disadvantages of using a normal view instead are:
1. A separate view needs to be created, which is an overhead.
2. Extra time is taken in parsing the view.
An inline view solves these problems by using the select statement as a subquery and using it as a table.
TABLES
VIEWS
INDEXES
SYNONYMS
SEQUENCES
TABLESPACES
Q. WHAT IS @@ERROR?
The @@ERROR automatic variable returns the error code of the last Transact-SQL statement. If there was no error, @@ERROR returns zero. Because @@ERROR is reset after each Transact-SQL statement, it must be saved to a variable if it is needed for further processing after checking it.
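A short Transact-SQL sketch of that pattern (the table and columns are illustrative only):
DECLARE @err INT;
UPDATE emp SET sal = sal * 1.1 WHERE deptno = 10;
SET @err = @@ERROR;   -- save immediately; the next statement resets @@ERROR
IF @err <> 0
    PRINT 'Update failed with error ' + CAST(@err AS VARCHAR(10));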
Q. WHAT IS DIFFERENCE BETWEEN CO-RELATED SUB QUERY AND
NESTED SUB QUERY?
Correlated subquery runs once for each row selected by the outer query. It
contains a reference to a value from the row selected by the outer query.
Nested subquery runs only once for the entire nesting (outer) query. It
does not contain any reference to the outer query row.
For example,
Correlated Subquery:
Select e1.empname, e1.basicsal, e1.deptno from emp e1 where e1.basicsal
= (select max(basicsal) from emp e2 where e2.deptno = e1.deptno)
Nested Subquery:
Select empname, basicsal, deptno from emp where (deptno, basicsal) in
(select deptno, max(basicsal) from emp group by deptno)
Q. HOW DOES ONE ESCAPE SPECIAL CHARACTERS WHEN BUILDING SQL QUERIES?
The LIKE keyword allows for string searches. The _ wildcard character is used to match exactly one character, and % is used to match zero or more occurrences of any characters. These characters can be escaped in SQL.
Example:
SELECT name FROM emp WHERE id LIKE '%\_%' ESCAPE '\';
Use two quotes for every one displayed. Example:
SELECT 'Franks''s Oracle site' FROM DUAL;
SELECT 'A ''quoted'' word.' FROM DUAL;
SELECT 'A ''''double quoted'''' word.' FROM DUAL;
Q. How does the server recognize the source and target databases?
If it is relational - by using an ODBC connection.
If it is a flat file - by using an FTP connection.
Q. WHAT ARE THE DIFFERENT TYPES OF INDEXES SUPPORTED BY ORACLE?
1. B-tree index
2. B-tree cluster index
3. Hash cluster index
4. Reverse key index
5. Bitmap index
6. Function-based index
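Brief examples of a few of these, in Oracle syntax (the emp table and columns are assumed for illustration):
CREATE INDEX emp_ename_fbi ON emp (UPPER(ename));    -- function-based index
CREATE INDEX emp_empno_rev ON emp (empno) REVERSE;   -- reverse key index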
Q. TYPES OF NORMALIZER TRANSFORMATION?
There are two types of Normalizer transformation.
VSAM Normalizer transformation
A non-reusable transformation that is a Source Qualifier transformation for a
COBOL source. The Mapping Designer creates VSAM Normalizer columns
from a COBOL source in a mapping. The column attributes are read-only. The
VSAM Normalizer receives a multiple-occurring source column through one
input port.
Pipeline Normalizer transformation
A transformation that processes multiple-occurring data from relational
tables or flat files. You might choose this option when you want to process
multiple-occurring data from another transformation in the mapping.
A VSAM Normalizer transformation has one input port for a multiple-occurring column. A pipeline Normalizer transformation has multiple input ports for a multiple-occurring column.
When you create a Normalizer transformation in the Transformation
Developer, you create a pipeline Normalizer transformation by default. When
you create a pipeline Normalizer transformation, you define the columns
based on the data the transformation receives from another type of
transformation such as a Source Qualifier transformation.
The Normalizer transformation has one output port for each single-occurring
input port.
Q. WHAT ARE ALL THE TRANSFORMATIONS YOU USE IF THE SOURCE IS AN XML FILE?
Deallocate cursor
Q. What is the difference between a HAVING CLAUSE and a WHERE CLAUSE?
1. HAVING specifies a search condition for a group or an aggregate. HAVING can be used only with the SELECT statement.
2. HAVING is typically used with a GROUP BY clause. When GROUP BY is not used, HAVING behaves like a WHERE clause.
3. The HAVING clause is basically used only with the GROUP BY function in a query, whereas the WHERE clause is applied to each row before it becomes part of the GROUP BY function in a query.
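For example (assuming an EMP table with DEPTNO and SAL columns), WHERE filters individual rows before grouping, while HAVING filters the resulting groups:
SELECT deptno, AVG(sal) AS avg_sal
FROM   emp
WHERE  sal > 1000          -- row-level filter, applied before GROUP BY
GROUP  BY deptno
HAVING AVG(sal) > 2500;    -- group-level filter, applied after aggregation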
RANK CACHE
Sample Rank Mapping
When the PowerCenter Server runs a session with a Rank transformation, it compares an input row with rows in the data cache. If the input row out-ranks a stored row, the PowerCenter Server replaces the stored row with the input row.
Example: PowerCenter caches the first 5 rows if we are finding the top 5 salaried employees. When the 6th row is read, it is compared with the 5 rows in the cache and placed in the cache if needed.
1) RANK INDEX CACHE:
The index cache holds group information from the group by ports. If we are
Using Group By on DEPTNO, then this cache stores values 10, 20, 30 etc.
All Group By Columns are in RANK INDEX CACHE. Ex. DEPTNO
2) RANK DATA CACHE:
It holds row data until the Power Center Server completes the ranking and is
generally larger than the index cache. To reduce the data cache size, connect
only the necessary input/output ports to subsequent transformations.
All variable ports (if any), the rank port, and all ports going out from the Rank transformation are stored in the RANK DATA CACHE.
Example: All ports except DEPTNO in our mapping example.
Aggregator Caches
1. The Power Center Server stores data in the aggregate cache until it
completes Aggregate calculations.
2. It stores group values in an index cache and row data in the data cache. If
the Power Center Server requires more space, it stores overflow values in
cache files.
Note: The Power Center Server uses memory to process an Aggregator
transformation with sorted ports. It does not use cache memory. We do not
need to configure cache memory for Aggregator transformations that use
sorted ports.
1) Aggregator Index Cache:
The index cache holds group information from the group by ports. If we are
using Group By on DEPTNO, then this cache stores values 10, 20, 30 etc.
JOINER CACHES
Joiner always caches the MASTER table. We cannot disable caching. It builds
Index cache and Data Cache based on MASTER table.
1) Joiner Index Cache:
All Columns of MASTER table used in Join condition are in JOINER INDEX
CACHE.
Example: DEPTNO in our mapping.
2) Joiner Data Cache:
Master columns that are not in the join condition but are used as output to other transformations or the target table are in the Data Cache.
Example: DNAME and LOC in our mapping example.
Unconnected Lookup
Cache Comparison
Persistence and Dynamic Caches
Dynamic
1) When you use a dynamic cache, the Informatica Server updates the lookup cache as it passes rows to the target.
2) With a dynamic cache, we can also update the cache with new data.
3) A dynamic cache is not reusable.
(When we need updated cache data, we need a dynamic cache.)
Persistent
1) A Lookup transformation can use a non-persistent or persistent cache. The PowerCenter Server saves or deletes lookup cache files after a successful session based on the Lookup Cache Persistent property.
2) With a persistent cache, we are not able to update the cache with new data.
3) A persistent cache is reusable.
(When we need previous cache data, we need a persistent cache.)
View And Materialized View
Star Schema And Snow Flake Schema
Informatica - Transformations
In Informatica, transformations help to transform the source data according to the requirements of the target system, and they ensure the quality of the data being loaded into the target.
Transformations are of two types: Active and Passive.
Active Transformation
An active transformation can change the number of rows that pass through it from source to target, i.e., it can eliminate rows that do not meet the condition in the transformation.
Passive Transformation
A passive transformation does not change the number of rows that pass through it, i.e., it passes all rows through the transformation.
Transformations can be Connected or Unconnected.
Connected Transformation
Connected transformation is connected to other transformations or directly
to target table in the mapping.
Unconnected Transformation
An unconnected transformation is not connected to other transformations in
the mapping. It is called within another transformation, and returns a value
to that transformation.
Following are the list of Transformations available in Informatica:
Aggregator Transformation
Expression Transformation
Filter Transformation
Joiner Transformation
Lookup Transformation
Normalizer Transformation
Rank Transformation
Router Transformation
Sequence Generator Transformation
Stored Procedure Transformation
Sorter Transformation
Update Strategy Transformation
XML Source Qualifier Transformation
In the following pages, we will explain all the above Informatica
Transformations and their significances in the ETL process in detail.
==============================================
================================
Aggregator Transformation
Aggregator transformation is an Active and Connected transformation.
This transformation is useful to perform calculations such as averages and
sums (mainly to perform calculations on multiple rows or groups).
For example, to calculate total of daily sales or to calculate average of
monthly or yearly sales. Aggregate functions such as AVG, FIRST, COUNT,
PERCENTILE, MAX, SUM etc. can be used in aggregate transformation.
==============================================
================================
Expression Transformation
Expression transformation is a Passive and Connected transformation.
This can be used to calculate values in a single row before writing to the
target.
For example, to calculate discount of each product
or to concatenate first and last names
or to convert date to a string field.
==============================================
================================
Filter Transformation
Filter transformation is an Active and Connected transformation.
This can be used to filter rows in a mapping that do not meet the condition.
For example,
To find all the employees who are working in Department 10, or
To find the products that fall within the price range of $500 to $1000.
==============================================
================================
Joiner Transformation
Joiner Transformation is an Active and Connected transformation. This can be
used to join two sources coming from two different locations or from same
location. For example, to join a flat file and a relational source or to join two
flat files or to join a relational source and a XML source.
In order to join two sources, there must be at least one matching port. While
joining two sources it is a must to specify one source as master and the other
as detail.
The Joiner transformation supports the following types of joins:
1)Normal
2)Master Outer
3)Detail Outer
4)Full Outer
Normal join discards all the rows of data from the master and detail source
that do not match, based on the condition.
Master outer join discards all the unmatched rows from the master source
and keeps all the rows from the detail source and the matching rows from
the master source.
Detail outer join keeps all rows of data from the master source and the
matching rows from the detail source. It discards the unmatched rows from
the detail source.
Full outer join keeps all rows of data from both the master and detail
sources.
==============================================
================================
Lookup transformation
Lookup transformation is Passive and it can be both Connected and
UnConnected as well. It is used to look up data in a relational table, view, or
synonym. Lookup definition can be imported either from source or from
target tables.
For example, if we want to retrieve all the sales of a product with an ID 10
and assume that the sales data resides in another table. Here instead of
using the sales table as one more source, use Lookup transformation to
lookup the data for the product, with ID 10 in sales table.
A connected Lookup receives input values directly from the mapping pipeline, whereas an unconnected Lookup receives values from a :LKP expression in another transformation.
A connected Lookup returns multiple columns from the same row, whereas an unconnected Lookup has one return port and returns one column from each row.
A connected Lookup supports user-defined default values, whereas an unconnected Lookup does not support user-defined default values.
==============================================
================================
Normalizer Transformation
Normalizer Transformation is an Active and Connected transformation.
It is used mainly with COBOL sources where most of the time data is stored
in de-normalized format.
Also, Normalizer transformation can be used to create multiple rows from a
single row of data.
==============================================
================================
Rank Transformation
Rank transformation is an Active and Connected transformation.
It is used to select the top or bottom rank of data.
For example,
To select top 10 Regions where the sales volume was very high
or
To select 10 lowest priced products.
==============================================
================================
Router Transformation
Router is an Active and Connected transformation. It is similar to filter
transformation.
The only difference is, filter transformation drops the data that do not meet
the condition whereas router has an option to capture the data that do not
meet the condition. It is useful to test multiple conditions.
It has input, output and default groups.
For example, if we want to route data where State=Michigan, State=California, State=New York, and all other states to different tables, it is easy to do so with a Router.
==============================================
================================
Sequence Generator Transformation
Sequence Generator transformation is a Passive and Connected
transformation. It is used to create unique primary key values or cycle
through a sequential range of numbers or to replace missing keys.
It has two output ports to connect transformations. By default it has two
fields CURRVAL and NEXTVAL (You cannot add ports to this transformation).
The NEXTVAL port generates a sequence of numbers when connected to a transformation or target. CURRVAL is the NEXTVAL value plus the Increment By value (with the default increment of 1, CURRVAL is NEXTVAL plus one).
==============================================
================================
Sorter Transformation
Sorter transformation is a Connected and an Active transformation.
It allows sorting data either in ascending or descending order according to a
specified field.
Also used to configure for case-sensitive sorting, and specify whether the
output rows should be distinct.
==============================================
================================
Source Qualifier Transformation
Source Qualifier transformation is an Active and Connected transformation.
When adding a relational or flat file source definition to a mapping, you must connect it to a Source Qualifier transformation.
The Source Qualifier performs various tasks, such as
overriding the default SQL query,
filtering records, and
joining data from two or more tables.
==============================================
================================
Stored Procedure Transformation
Stored Procedure transformation is a Passive transformation and can be Connected or Unconnected. It is useful to automate time-consuming tasks and is also used in error handling, to drop and recreate indexes, to determine the space available in the database, for specialized calculations, etc.
The stored procedure must exist in the database before creating a Stored Procedure transformation, and the stored procedure can exist in a source, target, or any database with a valid connection to the Informatica Server.
A stored procedure is an executable script with SQL statements, control statements, user-defined variables, and conditional statements.
==============================================
================================
Update Strategy Transformation
Update strategy transformation is an Active and Connected transformation.
It is used to update data in the target table, either to maintain a history of data or only the recent changes.
You can specify how to treat source rows: insert, update, delete, or data driven.
==============================================
================================
XML Source Qualifier Transformation
Constraint-Based Loading
In the Workflow Manager, you can specify constraint-based loading for a
session. When you select this option, the Integration Service orders the
target load on a row-by-row basis. For every row generated by an active
source, the Integration Service loads the corresponding transformed row first
to the primary key table, then to any foreign key tables. Constraint-based
loading depends on the following requirements:
Active source: Related target tables must have the same active source.
Key relationships: Target tables must have key relationships.
Target connection groups: Targets must be in one target connection group.
Treat rows as insert. Use this option when you insert into the target. You
cannot use updates with constraint based loading.
Active Source:
When target tables receive rows from different active sources, the
Integration Service reverts to normal loading for those tables, but loads all
other targets in the session using constraint-based loading when possible.
For example, a mapping contains three distinct pipelines. The first two
contain a source, source qualifier, and target. Since these two targets receive
data from different active sources, the Integration Service reverts to normal
loading for both targets. The third pipeline contains a source, Normalizer, and
two targets. Since these two targets share a single active source (the
Normalizer), the Integration Service performs constraint-based loading:
loading the primary key table first, then the foreign key table.
Key Relationships:
When target tables have no key relationships, the Integration Service does
not perform constraint-based loading.
Similarly, when target tables have circular key relationships, the Integration
Service reverts to a normal load. For example, you have one target
containing a primary key and a foreign key related to the primary key in a
second target. The second target also contains a foreign key that references
the primary key in the first target. The Integration Service cannot enforce
constraint-based loading for these tables. It reverts to a normal load.
Target Connection Groups:
The Integration Service enforces constraint-based loading for targets in the
same target connection group. If you want to specify constraint-based
loading for multiple targets that receive data from the same active source,
you must verify the tables are in the same target connection group. If the
tables with the primary key-foreign key relationship are in different target
connection groups, the Integration Service cannot enforce constraint-based
loading when you run the workflow. To verify that all targets are in the same
target connection group, complete the following tasks:
Verify all targets are in the same target load order group and receive data
from the same active source.
Use the default partition properties and do not add partitions or partition
points.
Define the same target type for all targets in the session properties.
Define the same database connection name for all targets in the session
properties.
Choose normal mode for the target load type for all targets in the session
properties.
Treat Rows as Insert:
Use constraint-based loading when the session option Treat Source Rows As
is set to insert. You might get inconsistent data if you select a different Treat
Source Rows As option and you configure the session for constraint-based
loading.
When the mapping contains Update Strategy transformations and you need
to load data to a primary key table first, split the mapping using one of the
following options:
Load primary key table in one mapping and dependent tables in another
mapping. Use constraint-based loading to load the primary table.
Perform inserts in one mapping and updates in another mapping.
Constraint-based loading does not affect the target load ordering of the
mapping. Target load ordering defines the order the Integration Service reads
the sources in each target load order group in the mapping. A target load
order group is a collection of source qualifiers, transformations, and targets
linked together in a mapping. Constraint based loading establishes the order
in which the Integration Service loads individual targets within a set of
targets receiving data from a single source qualifier.
Example
The following mapping is configured to perform constraint-based loading:
In the first pipeline, target T_1 has a primary key, and T_2 and T_3 contain foreign keys referencing the T_1 primary key. T_3 has a primary key that T_4 references as a foreign key.
Since these tables receive records from a single active source, SQ_A, the Integration Service loads rows to the targets in the following order:
1. T_1
2. T_2 and T_3 (in no particular order)
3. T_4
The Integration Service loads T_1 first because it has no foreign key
dependencies and contains a primary key referenced by T_2 and T_3. The
Integration Service then loads T_2 and T_3, but since T_2 and T_3 have no
dependencies, they are not loaded in any particular order. The Integration
Service loads T_4 last, because it has a foreign key that references a primary
key in T_3. After loading the first set of targets, the Integration Service begins reading source B. If there are no key relationships between T_5 and T_6, the Integration Service reverts to a normal load for both targets.
If T_6 has a foreign key that references a primary key in T_5, since T_5 and
T_6 receive data from a single active source, the Aggregator AGGTRANS, the
Integration Service loads rows to the tables in the following order:
T_5
T_6
T_1, T_2, T_3, and T_4 are in one target connection group if you use the
same database connection for each target, and you use the default partition
properties. T_5 and T_6 are in another target connection group together if
you use the same database connection for each target and you use the
default partition properties. The Integration Service includes T_5 and T_6 in a
different target connection group because they are in a different target load
order group from the first four targets.
Enabling Constraint-Based Loading:
When you enable constraint-based loading, the Integration Service orders
the target load on a row-by-row basis. To enable constraint-based loading:
1. In the General Options settings of the Properties tab, choose Insert for the
Treat Source Rows As property.
2. Click the Config Object tab. In the Advanced settings, select Constraint
Based Load Ordering.
3. Click OK.
Update Strategy
Initial and Default Value:
When we declare a mapping parameter or variable in a mapping or a
mapplet, we can enter an initial value. When the Integration Service needs
an initial value, and we did not declare an initial value for the parameter or
variable, the Integration Service uses a default value based on the data type
of the parameter or variable.
Data type -> Default Value
Numeric -> 0
String -> Empty String
Datetime -> 1/1/1
Variable Values: Start value and current value of a mapping variable
Start Value:
The start value is the value of the variable at the start of the session. The
Integration Service looks for the start value in the following order:
Value in parameter file
Value saved in the repository
Initial value
Default value
Current Value:
The current value is the value of the variable as the session progresses.
When a session starts, the current value of a variable is the same as the start
value. The final current value for a variable is saved to the repository at the
end of a successful session. When a session fails to complete, the Integration
Service does not update the value of the variable in the repository.
Note: If a variable function is not used to calculate the current value of a
mapping variable, the start value of the variable is saved to the repository.
Variable Data Type and Aggregation Type
When we declare a mapping variable in a mapping, we need to configure the data type and aggregation type for the variable. The Integration Service uses the aggregation type of a mapping variable to determine the final current value of the mapping variable.
Aggregation types are:
Count: Integer and small integer data types are valid only.
Max: All transformation data types except binary data type are valid.
Min: All transformation data types except binary data type are valid.
Variable Functions
Variable functions determine how the Integration Service calculates the
current value of a mapping variable in a pipeline.
SetMaxVariable: Sets the variable to the maximum value of a group of
values. It ignores rows marked for update, delete, or reject. Aggregation type
set to Max.
SetMinVariable: Sets the variable to the minimum value of a group of
values. It ignores rows marked for update, delete, or reject. Aggregation type
set to Min.
SetCountVariable: Increments the variable value by one. It adds one to the
variable value when a row is marked for insertion, and subtracts one when
the row is Marked for deletion. It ignores rows marked for update or reject.
Aggregation type set to Count.
SetVariable: Sets the variable to the configured value. At the end of a
session, it compares the final current value of the variable to the start value
of the variable. Based on the aggregate type of the variable, it saves a final
value to the repository.
Creating Mapping Parameters and Variables
In the Mapping Designer, click Mappings > Parameters and Variables, or in the Mapplet Designer, click Mapplet > Parameters and Variables.
Select Type and Data type. Select Aggregation type for mapping
variables.
17. Open Expression editor for out_min_var and write the following
expression:
SETMINVARIABLE($$var_min,SAL). Validate the expression.
18. Open Expression editor for out_count_var and write the following
expression:
SETCOUNTVARIABLE($$var_count). Validate the expression.
19. Open Expression editor for out_set_var and write the following
expression:
SETVARIABLE($$var_set,ADD_TO_DATE(HIREDATE,'MM',1)). Validate.
20. Click OK. Expression Transformation below:
21. Link all ports from expression to target and Validate Mapping and Save it.
22. See mapping picture on next page.
PARAMETER FILE
A parameter file is a list of parameters and associated values for a workflow,
worklet, or session.
Parameter files provide flexibility to change these variables each time we run
a workflow or session.
We can create multiple parameter files and change the file we use for a
session or workflow. We can create a parameter file using a text editor such
as WordPad or Notepad.
Enter the parameter file name and directory in the workflow or session
properties.
A parameter file contains the following types of parameters and variables:
Workflow variable: References values and records information in a
workflow.
Worklet variable: References values and records information in a worklet.
Use predefined worklet variables in a parent workflow, but we cannot use
workflow variables from the parent workflow in a worklet.
Session parameter: Defines a value that can change from session to
session, such as a database connection or file name.
Mapping parameter and Mapping variable
USING A PARAMETER FILE
Parameter files contain several sections preceded by a heading. The heading
identifies the Integration Service, Integration Service process, workflow,
worklet, or session to which we want to assign parameters or variables.
Make session and workflow.
Give connection information for source and target table.
Run workflow and see result.
Sample Parameter File for Our example:
In the parameter file, folder and session names are case sensitive.
Create a text file in notepad with name Para_File.txt
[Practice.ST:s_m_MP_MV_Example]
$$Bonus=1000
$$var_max=500
$$var_min=1200
$$var_count=0
CONFIGURING PARAMETER FILE
We can specify the parameter file name and directory in the workflow or
session properties.
To enter a parameter file in the workflow properties:
1. Open a Workflow in the Workflow Manager.
2. Click Workflows > Edit.
3. Click the Properties tab.
4. Enter the parameter directory and name in the Parameter Filename field.
5. Click OK.
To enter a parameter file in the session properties:
1. Open a session in the Workflow Manager.
2. Click the Properties tab and open the General Options settings.
3. Enter the parameter directory and name in the Parameter Filename field.
4. Example: D:\Files\Para_File.txt or $PMSourceFileDir\Para_File.txt
5. Click OK.
Mapplet
A mapplet is a reusable object that we create in the Mapplet Designer.
It contains a set of transformations and lets us reuse that transformation
logic in multiple mappings.
Created in Mapplet Designer in Designer Tool.
Suppose we need to use the same set of 5 transformations in, say, 10 mappings. Instead of building the 5 transformations in each of the 10 mappings, we create a mapplet of these 5 transformations and use it in all 10 mappings. Example: to create a surrogate key in the target, we create a mapplet that uses a stored procedure to create the primary key for the target table. We give the target table name and key column name as input to the mapplet and get the surrogate key as output.
Mapplets help simplify mappings in the following ways:
Include source definitions: Use multiple source definitions and source
qualifiers to provide source data for a mapping.
Accept data from sources in a mapping
Include multiple transformations: As many transformations as we need.
Pass data to multiple transformations: We can create a mapplet to feed
data to multiple transformations. Each Output transformation in a mapplet
represents one output group in a mapplet.
Contain unused ports: We do not have to connect all mapplet input and
output ports in a mapping.
Mapplet Input:
Mapplet input can originate from a source definition and/or from an Input
transformation in the mapplet. We can create multiple pipelines in a
mapplet.
We use Mapplet Input transformation to give input to mapplet.
Use of Mapplet Input transformation is optional.
Mapplet Output:
The output of a mapplet is not connected to any target table.
We must use Mapplet Output transformation to store mapplet output.
A mapplet must contain at least one Output transformation with at least one
connected port in the mapplet.
Example1: We will join EMP and DEPT table. Then calculate total salary. Give
the output to mapplet out transformation.
EMP and DEPT will be source tables.
Output will be given to transformation Mapplet_Out.
Steps:
Open folder where we want to create the mapping.
Click Tools -> Mapplet Designer.
Click Mapplets-> Create-> Give name. Ex: mplt_example1
Drag EMP and DEPT table.
Use Joiner transformation as described earlier to join them.
Transformation -> Create -> Select Expression for list -> Create -> Done
Pass all ports from joiner to expression and then calculate total salary as
described in expression transformation.
Now Transformation -> Create -> Select Mapplet Out from list > Create
-> Give name and then done.
Pass all ports from expression to Mapplet output.
Mapplet -> Validate
Repository -> Save
Use of mapplet in mapping:
We can use a mapplet in a mapping by simply dragging the mapplet from the mapplet folder in the left pane, just as we drag source and target tables.
When we use the mapplet in a mapping, the mapplet object displays only the
ports from the Input and Output transformations. These are referred to as the
mapplet input and mapplet output ports.
Make sure to give correct connection information in session.
Making a mapping: We will use mplt_example1, and then create a filter
transformation to filter records whose Total Salary is >= 1500.
mplt_example1 will be source.
Create target table same as Mapplet_out transformation as in picture
above. Creating Mapping
Open folder where we want to create the mapping.
Click Tools -> Mapping Designer.
Click Mapping-> Create-> Give name. Ex: m_mplt_example1
Drag mplt_Example1 and target table.
Transformation -> Create -> Select Filter for list -> Create -> Done.
Drag all ports from mplt_example1 to filter and give filter condition.
Connect all ports from filter to target. We can add more transformations
after filter if needed.
Validate mapping and Save it.
Make session and workflow.
Give connection information for mapplet source tables.
Give connection information for target table.
Run workflow and see result.
Sample:
D:\EMP1.txt
E:\EMP2.txt
E:\FILES\DWH\EMP3.txt and so on
3. Now make a session and in Source file name and Source File Directory
location fields, give the name and location of above created file.
4. In Source file type field, select Indirect.
5. Click Apply.
6. Validate Session
7. Make Workflow. Save it to repository and run.
Incremental Aggregation
When we enable the session option Incremental Aggregation, the Integration Service performs incremental aggregation: it passes source data through the mapping and uses historical cache data to perform aggregate calculations incrementally.
When using incremental aggregation, you apply captured changes in the
source to aggregate calculations in a session. If the source changes
incrementally and you can capture changes, you can configure the session to
process those changes. This allows the Integration Service to update the
target incrementally, rather than forcing it to process the entire source and
recalculate the same data each time you run the session.
For example, you might have a session using a source that receives new data
every day. You can capture those incremental changes because you have
added a filter condition to the mapping that removes pre-existing data from
the flow of data. You then enable incremental aggregation.
When the session runs with incremental aggregation enabled for the first
time on March 1, you use the entire source. This allows the Integration
Service to read and store the necessary aggregate data. On March 2, when
you run the session again, you filter out all the records except those timestamped March 2. The Integration Service then processes the new data and
updates the target accordingly. Consider using incremental aggregation in
the following circumstances:
You can capture new source data. Use incremental aggregation when you
can capture new source data each time you run the session. Use a Stored
Procedure or Filter transformation to process new data.
Incremental changes do not significantly change the target. Use incremental
aggregation when the changes do not significantly change the target. If
processing the incrementally changed source alters more than half the
existing target, the session may not benefit from using incremental
aggregation. In this case, drop the table and recreate the target with
complete source data.
Note: Do not use incremental aggregation if the mapping contains percentile
or median functions. The Integration Service uses system memory to process
these functions in addition to the cache memory you configure in the session
properties. As a result, the Integration Service does not store incremental
aggregation values for percentile and median functions in disk caches.
Integration Service Processing for Incremental Aggregation
(i)The first time you run an incremental aggregation session, the Integration
Service processes the entire source. At the end of the session, the
Integration Service stores aggregate data from that session run in two files,
the index file and the data file. The Integration Service creates the files in the
cache directory specified in the Aggregator transformation properties.
(ii)Each subsequent time you run the session with incremental aggregation,
you use the incremental source changes in the session. For each input
record, the Integration Service checks historical information in the index file
for a corresponding group. If it finds a corresponding group, the Integration Service performs the aggregate operation incrementally, using the aggregate data for that group, and saves the incremental change. If it does not find a corresponding group, it creates a new group and saves the record data.
The index and data files grow in proportion to the source data. Be sure the
cache directory has enough disk space to store historical data for the
session.
When you run multiple sessions with incremental aggregation, decide where
you want the files stored. Then, enter the appropriate directory for the
process variable, $PMCacheDir, in the Workflow Manager. You can enter
session-specific directories for the index and data files. However, by using
the process variable for all sessions using incremental aggregation, you can
easily change the cache directory when necessary by changing
$PMCacheDir.
Changing the cache directory without moving the files causes the Integration
Service to reinitialize the aggregate cache and gather new aggregate data.
In a grid, Integration Services rebuild incremental aggregation files they
cannot find. When an Integration Service rebuilds incremental aggregation
files, it loses aggregate history.
(ii) Verify the incremental aggregation settings in the session
properties.
You can configure the session for incremental aggregation in the
Performance settings on the Properties tab.
You can also configure the session to reinitialize the aggregate cache. If you
choose to reinitialize the cache, the Workflow Manager displays a warning
indicating the Integration Service overwrites the existing cache and a
reminder to clear this option after running the session.
TASKS
The Workflow Manager contains many types of tasks to help you build
workflows and worklets. We can create reusable tasks in the Task Developer.
Types of tasks:
Task Type      Tool where task can be created    Reusable or not
Session        Task Developer                    Yes
Email          Workflow Designer                 Yes
Command        Worklet Designer                  Yes
Event-Raise    Workflow Designer                 No
Event-Wait     Worklet Designer                  No
Timer                                            No
Decision                                         No
Assignment                                       No
Control                                          No
SESSION TASK
A session is a set of instructions that tells the Power Center Server how and
when to move data from sources to targets.
To run a session, we must first create a workflow to contain the Session task.
We can run as many sessions in a workflow as we need. We can run the
Session tasks sequentially or concurrently, depending on our needs.
The Power Center Server creates several files and in-memory caches
depending on the transformations and options used in the session.
EMAIL TASK
The Workflow Manager provides an Email task that allows us to send email
during a workflow.
Created by Administrator usually and we just drag and use it in our mapping.
Steps:
In the Task Developer or Workflow Designer, choose Tasks-Create.
Select an Email task and enter a name for the task. Click Create.
Click Done.
Double-click the Email task in the workspace. The Edit Tasks dialog box
appears.
Click the Properties tab.
Enter the fully qualified email address of the mail recipient in the Email User
Name field.
Enter the subject of the email in the Email Subject field. Or, you can leave
this field blank.
Click the Open button in the Email Text field to open the Email Editor.
Click OK twice to save your changes.
5. Double click link between Session and Command and give condition in
editor as
6. $S_M_FILTER_EXAMPLE.Status=SUCCEEDED
7. Workflow-> Validate
8. Repository > Save
WORKING WITH EVENT TASKS
We can define events in the workflow to specify the sequence of task
execution.
Types of Events:
Pre-defined event: A pre-defined event is a file-watch event. This event
waits for a specified file to arrive at a given location.
User-defined event: A user-defined event is a sequence of tasks in the
workflow. We create events and then raise them as per need.
Steps for creating User Defined Event:
1. Open any workflow where we want to create an event.
2. Click Workflow-> Edit -> Events tab.
3. Click the Add button to add events and give the names as per need.
4. Click Apply -> Ok. Validate the workflow and Save it.
Types of Events Tasks:
EVENT RAISE: Event-Raise task represents a user-defined event. We use
this task to raise a user defined event.
EVENT WAIT: Event-Wait task waits for a file watcher event or user defined
event to occur before executing the next session in the workflow.
Example1: Use an event wait task and make sure that session
s_filter_example runs when abc.txt file is present in D:\FILES folder.
Steps for creating workflow:
1. Workflow -> Create -> Give name wf_event_wait_file_watch -> Click ok.
2. Task -> Create -> Select Event Wait. Give name. Click create and done.
3. Link Start to Event Wait task.
4. Drag s_filter_example to workspace and link it to event wait task.
5. Right click on event wait task and click EDIT -> EVENTS tab.
6. Select the Pre Defined option there. In the blank space, give the directory and
filename to watch. Example: D:\FILES\abc.txt
7. Workflow validate and Repository Save.
Example 2: Raise a user defined event when session s_m_filter_example
succeeds. Capture this event in event wait task and run session
S_M_TOTAL_SAL_EXAMPLE
Steps for creating workflow:
1. Workflow -> Create -> Give name wf_event_wait_event_raise -> Click ok.
2. Workflow -> Edit -> Events Tab and add events EVENT1 there.
3. Drag s_m_filter_example and link it to START task.
4. Click Tasks -> Create -> Select EVENT RAISE from list. Give name
5. ER_Example. Click Create and then done. Link ER_Example to
s_m_filter_example.
6. Right click ER_Example -> EDIT -> Properties Tab -> Open Value for User
Defined Event and Select EVENT1 from the list displayed. Apply -> OK.
CONTROL TASK
We can use the Control task to stop, abort, or fail the top-level workflow or
the parent workflow based on an input link condition.
Abort Top-Level WF: Aborts the workflow that is running.
Example: Drag any 3 sessions and if anyone fails, then Abort the top level
workflow.
Steps for creating workflow:
1. Workflow -> Create -> Give name wf_control_task_example -> Click ok.
2. Drag any 3 sessions to workspace and link all of them to START task.
3. Click Tasks -> Create -> Select CONTROL from list. Give name cntr_task.
4. Click Create and then done.
5. Link all sessions to the control task cntr_task.
6. Double click link between cntr_task and any session say s_m_filter_example
and give the condition: $S_M_FILTER_EXAMPLE.Status = SUCCEEDED.
7. Repeat above step for remaining 2 sessions also.
8. Right click cntr_task-> EDIT -> GENERAL tab. Set Treat Input Links As to
OR. Default is AND.
9. Go to PROPERTIES tab of cntr_task and select the value Fail Top-Level
Workflow for Control Option.
10. Click Apply and OK.
11. Workflow Validate and repository Save.
Run workflow and see the result.
ASSIGNMENT TASK
The Assignment task allows us to assign a value to a user-defined workflow
variable.
See Workflow variable topic to add user defined variables.
To use an Assignment task in the workflow, first create and add the
Assignment task to the workflow, then configure it to assign values to the
user-defined variables.
Scheduler
We can schedule a workflow to run continuously, repeat at a given time or
interval, or we can manually start a workflow. The Integration Service runs a
scheduled workflow as configured.
By default, the workflow runs on demand. We can change the schedule
settings by editing the scheduler. If we change schedule settings, the
Integration Service reschedules the workflow according to the new settings.
Steps:
1. Open the folder where we want to create the scheduler.
2. In the Workflow Designer, click Workflows > Schedulers.
3. Click Add to add a new scheduler.
4. In the General tab, enter a name for the scheduler.
5. Configure the scheduler settings in the Scheduler tab.
6. Click Apply and OK.
Configuring Scheduler Settings
Configure the Schedule tab of the scheduler to set run options, schedule
options, start options, and end options for the schedule.
There are 3 run options:
1. Run on Demand
2. Run Continuously
3. Run on Server initialization
1. Run on Demand:
Integration Service runs the workflow when we start the workflow manually.
2. Run Continuously:
Integration Service runs the workflow as soon as the service initializes. The
Integration Service then starts the next run of the workflow as soon as it
finishes the previous run.
3. Run on Server initialization
Integration Service runs the workflow as soon as the service is initialized. The
Integration Service then starts the next run of the workflow according to
settings in Schedule Options.
Schedule options for Run on Server initialization:
Start Date
Start Time
End options for Run on Server initialization:
End After: IS stops scheduling the workflow after the set number of
Workflow runs.
8. Browser dialog box.
9. Click Ok.
Points to Ponder:
database might convert the number to 1.2E9. The two sets of characters
represent the same value. However, if you require the characters in the
format 1234567890, you can disable pushdown optimization.
Precision. The Integration Service and a database can have different
precision for particular datatypes. Transformation datatypes use a default
numeric precision that can vary from the native datatypes. For example, a
transformation Decimal datatype has a precision of 1-28. The corresponding
Teradata Decimal datatype has a precision of 1-18. The results can vary if the
database uses a different precision than the Integration Service.
Using ODBC Drivers
When you use native drivers for all databases, except Netezza, the
Integration Service generates SQL statements using native database SQL.
When you use ODBC drivers, the Integration Service usually cannot detect
the database type. As a result, it generates SQL statements using ANSI SQL.
The Integration Service can generate more functions when it generates SQL
statements using the native language than ANSI SQL.
Note: Although the Integration Service uses an ODBC driver for the Netezza
database, the Integration Service detects that the database is Netezza and
generates native database SQL when pushing the transformation logic to the
Netezza database.
In some cases, ANSI SQL is not compatible with the database syntax. The
following sections describe problems that you can encounter when you use
ODBC drivers. When possible, use native drivers to prevent these problems.
database does not truncate the lookup results based on subsecond precision.
For example, you configure the Lookup transformation to show subsecond
precision to the millisecond. If the lookup result is 8:20:35.123456, a
database returns 8:20:35.123456, but the Integration Service returns
8:20:35.123.
SYSDATE built-in variable. When you use the SYSDATE built-in variable,
the Integration Service returns the current date and time for the node
running the service process. However, when you push the transformation
logic to the database, the SYSDATE variable returns the current date and
time for the machine hosting the database. If the time zone of the machine
hosting the database is not the same as the time zone of the machine
running the Integration Service process, the results can vary.
I have listed below the informatica scenarios that are frequently asked in
informatica interviews. These informatica scenario interview questions will
help you a lot in gaining confidence in interviews.
1. How to generate sequence numbers using expression transformation?
Solution:
In the expression transformation, create a variable port and increment it by
1. Then assign the variable port to an output port. In the expression
transformation, the ports are:
V_count=V_count+1
O_count=V_count
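Under the hood, the expression transformation evaluates the variable port once for every
input row, so V_count behaves like a running counter that keeps its value between rows.
Below is a rough Python sketch of that row-by-row logic (plain Python with made-up sample
rows, not Informatica syntax).

# Simulate an expression transformation that assigns a sequence number per row.
rows = ["a", "b", "c"]          # made-up input rows
v_count = 0                      # variable port V_count, keeps its value across rows
for row in rows:
    v_count = v_count + 1        # V_count = V_count + 1
    o_count = v_count            # O_count = V_count (output port)
    print(row, o_count)          # each row leaves with its sequence number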
2. Design a mapping to load the first 3 rows from a flat file into a target?
Solution:
You have to assign row numbers to each record. Generate the row numbers
either using the expression transformation as mentioned above or use
sequence generator transformation.
Then pass the output to filter transformation and specify the filter condition
as O_count <=3
3. Design a mapping to load the last 3 rows from a flat file into a target?
Solution:
Consider the source has the following data.
col
a
b
c
d
e
Step1: You have to assign row numbers to each record. Generate the row
numbers using the expression transformation as mentioned above and call
the row number generated port as O_count. Create a DUMMY output port in
the same expression transformation and assign 1 to that port, so that the
DUMMY output port always returns 1 for each row.
In the expression transformation, the ports are
V_count=V_count+1
O_count=V_count
O_dummy=1
The output of expression transformation will be
col, o_count, o_dummy
a, 1, 1
b, 2, 1
c, 3, 1
d, 4, 1
e, 5, 1
Step2: Pass the output of expression transformation to aggregator and do
not specify any group by condition. Create an output port O_total_records in
the aggregator and assign O_count port to it. The aggregator will return the
last row by default. The output of aggregator contains the DUMMY port which
has value 1 and O_total_records port which has the value of total number of
records in the source.
In the aggregator transformation, the ports are
O_dummy
O_count
O_total_records=O_count
The output of aggregator transformation will be
O_total_records, O_dummy
5, 1
Step3: Pass the output of expression transformation and aggregator
transformation to a joiner transformation and join on the DUMMY port. In the
joiner transformation, check the property sorted input; only then can you
connect both expression and aggregator to the joiner transformation.
Step4: Pass the output of the joiner transformation to a filter transformation
and specify the filter condition as O_total_records - O_count <= 2. This loads
the last 3 rows into the target.
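The following Python sketch walks through the whole flow on sample data (plain Python,
not Informatica syntax): number the rows, take the total count from the aggregator step,
join it back to every row and keep only the last three.

# Simulate: expression (row number + dummy) -> aggregator (total count) -> joiner -> filter.
rows = ["a", "b", "c", "d", "e"]                              # sample source rows
numbered = [(row, i + 1, 1) for i, row in enumerate(rows)]    # (col, O_count, O_dummy)
o_total_records = numbered[-1][1]                             # aggregator with no group by keeps the last row
last_three = [r for r in numbered if o_total_records - r[1] <= 2]   # filter condition
print(last_three)                                             # [('c', 3, 1), ('d', 4, 1), ('e', 5, 1)]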
Consider the following products data as the source:
Product: A, B, B, B, C, C, D
Q1. Design a mapping to load all unique products in one table and the
duplicate rows in another table.
The first table should contain the following output
A
D
The second target should contain the following output
B
B
B
C
C
Solution:
Use a sorter transformation and sort the products data. Pass the output to an
expression transformation and create a dummy port O_dummy and assign 1
to that port, so that the DUMMY output port always returns 1 for each row.
The output of expression transformation will be
Product, O_dummy
A, 1
B, 1
B, 1
B, 1
C, 1
C, 1
D, 1
Pass the output of expression transformation to an aggregator
transformation. Check the group by on the product port. In the aggregator,
create an output port O_count_of_each_product and write the expression
count(product).
The output of aggregator will be
Product, O_count_of_each_product
A, 1
B, 3
C, 2
D, 1
Now pass the output of expression transformation and aggregator
transformation to a joiner transformation and join on the products port. In the
joiner transformation, check the property sorted input; only then can you
connect both expression and aggregator to the joiner transformation.
The output of joiner will be
product, O_dummy, O_count_of_each_product
A, 1, 1
B, 1, 3
B, 1, 3
B, 1, 3
C, 1, 2
C, 1, 2
D, 1, 1
Now pass the output of joiner to a router transformation, create one group
and specify the group condition as O_dummy=O_count_of_each_product.
Then connect this group to one table. Connect the output of default group to
another table.
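Here is a small Python sketch of the router condition on sample data (plain Python, not
Informatica syntax): a product goes to the unique target only when its dummy value 1
equals its total count, otherwise it falls into the default group.

# Simulate: aggregator counts per product, joiner attaches the count, router splits the rows.
from collections import Counter

products = ["A", "B", "B", "B", "C", "C", "D"]                # sorted sample data
counts = Counter(products)                                     # O_count_of_each_product per product
unique_target = [p for p in products if counts[p] == 1]       # O_dummy = O_count_of_each_product
duplicate_target = [p for p in products if counts[p] != 1]    # default router group
print(unique_target)      # ['A', 'D']
print(duplicate_target)   # ['B', 'B', 'B', 'C', 'C']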
Q2. Design a mapping to load each product once into one table and the
remaining products which are duplicated into another table.
The first table should contain the following output
A
B
C
D
The second table should contain the following output
B
B
C
Solution:
Use a sorter transformation and sort the products data. Pass the output to an
expression transformation and create a variable port, V_curr_product, and
assign the product port to it. Then create a V_count port and in the expression
editor write IIF(V_curr_product=V_prev_product, V_count+1, 1). Create one
more variable port V_prev_product and assign the product port to it. Now create
an output port O_count and assign the V_count port to it. Finally, pass the
output to a router transformation, create a group with the condition O_count=1
and connect it to the first table, and connect the default group to the second table.
Q2. Design a mapping to get the previous row salary for the current row. If
there is no previous row for the current row, then the previous row
salary should be displayed as null.
The output should look like as
employee_id, salary, pre_row_salary
10, 1000, Null
20, 2000, 1000
30, 3000, 2000
40, 5000, 3000
Solution:
Connect the source Qualifier to expression transformation. In the expression
transformation, create a variable port V_count and increment it by one for
each row entering the expression transformation. Also create V_salary
variable port and assign the expression IIF(V_count=1,NULL,V_prev_salary)
to it . Then create one more variable port V_prev_salary and assign Salary to
it. Now create output port O_prev_salary and assign V_salary to it. Connect
the expression transformation to the target ports.
In the expression transformation, the ports will be
employee_id
salary
V_count=V_count+1
V_salary=IIF(V_count=1,NULL,V_prev_salary)
V_prev_salary=salary
O_prev_salary=V_salary
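The trick is the evaluation order of the ports: V_salary reads V_prev_salary before
V_prev_salary is overwritten with the current salary. A rough Python sketch of that
ordering on sample rows (plain Python, not Informatica syntax):

# Simulate the variable-port evaluation order for the previous-row salary.
rows = [(10, 1000), (20, 2000), (30, 3000), (40, 5000)]   # (employee_id, salary)
v_count = 0
v_prev_salary = None
for employee_id, salary in rows:
    v_count += 1
    v_salary = None if v_count == 1 else v_prev_salary    # IIF(V_count=1, NULL, V_prev_salary)
    v_prev_salary = salary                                 # updated only after V_salary is computed
    print(employee_id, salary, v_salary)                   # O_prev_salary = V_salary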
Q3. Design a mapping to get the next row salary for the current row. If there
is no next row for the current row, then the next row salary should be
displayed as null.
The output should look like as
Q4. Design a mapping to find the sum of salaries of all employees and this
sum should repeat for all the rows.
The output should look like as
employee_id, salary, salary_sum
10, 1000, 11000
20, 2000, 11000
30, 3000, 11000
40, 5000, 11000
Solution:
Step1: Connect the source qualifier to the expression transformation. In the
expression transformation, create a dummy port and assign value 1 to it.
In the expression transformation, the ports will be
employee_id
salary
O_dummy=1
Step2: Pass the output of expression transformation to aggregator. Create a
new port O_sum_salary and in the expression editor write SUM(salary). Do
not specify group by on any port.
In the aggregator transformation, the ports will be
salary
O_dummy
O_sum_salary=SUM(salary)
Step3: Pass the output of expression transformation and aggregator
transformation to a joiner transformation and join on the DUMMY port. In the
joiner transformation, check the property sorted input; only then can you
connect both expression and aggregator to the joiner transformation.
Step4: Pass the output of joiner to the target table.
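A short Python sketch of the same flow on sample data (plain Python, not Informatica
syntax): the aggregator produces a single total and the joiner on the dummy port attaches
that total to every detail row.

# Simulate: expression adds O_dummy, aggregator computes the sum, joiner repeats it per row.
rows = [(10, 1000), (20, 2000), (30, 3000), (40, 5000)]     # (employee_id, salary)
detail = [(emp, sal, 1) for emp, sal in rows]                # O_dummy = 1 on every row
o_sum_salary = sum(sal for _, sal in rows)                   # SUM(salary), no group by
joined = [(emp, sal, o_sum_salary) for emp, sal, dummy in detail if dummy == 1]
print(joined)   # every row carries salary_sum = 11000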
Consider the following employees data as the source:
department_no, employee_name
10, A
10, B
10, C
10, D
20, P
20, Q
20, R
20, S
Q1. Design a mapping to load a target table with the following values from
the above source?
department_no, employee_list
10, A
10, A,B
10, A,B,C
10, A,B,C,D
20, A,B,C,D,P
20, A,B,C,D,P,Q
20, A,B,C,D,P,Q,R
20, A,B,C,D,P,Q,R,S
Solution:
Step1: Use a sorter transformation and sort the data using the sort key as
department_no and then pass the output to the expression transformation. In
the expression transformation, the ports will be
department_no
employee_name
V_employee_list =
IIF(ISNULL(V_employee_list),employee_name,V_employee_list||','||
employee_name)
O_employee_list = V_employee_list
Step2: Now connect the expression transformation to a target table.
Q2. Design a mapping to load a target table with the following values from
the above source?
department_no, employee_list
10, A
10, A,B
10, A,B,C
10, A,B,C,D
20, P
20, P,Q
20, P,Q,R
20, P,Q,R,S
Solution:
Step1: Use a sorter transformation and sort the data using the sort key as
department_no and then pass the output to the expression transformation. In
the expression transformation, the ports will be
department_no
employee_name
V_curr_deptno=department_no
V_employee_list = IIF(V_curr_deptno !=
V_prev_deptno, employee_name, V_employee_list||','||employee_name)
V_prev_deptno=department_no
O_employee_list = V_employee_list
Step2: Now connect the expression transformation to a target table.
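The reset works because V_curr_deptno is compared with V_prev_deptno before
V_prev_deptno is updated. A rough Python sketch of the per-department running list on the
sample rows (plain Python, not Informatica syntax):

# Simulate the per-department running employee list.
rows = [(10, "A"), (10, "B"), (10, "C"), (10, "D"),
        (20, "P"), (20, "Q"), (20, "R"), (20, "S")]          # sorted on department_no
v_prev_deptno = None
v_employee_list = ""
for department_no, employee_name in rows:
    if department_no != v_prev_deptno:                       # department changed: restart the list
        v_employee_list = employee_name
    else:
        v_employee_list = v_employee_list + "," + employee_name
    v_prev_deptno = department_no
    print(department_no, v_employee_list)                    # O_employee_list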
Q3. Design a mapping to load a target table with the following values from
the above source?
department_no, employee_names
10, A,B,C,D
20, P,Q,R,S
Solution:
The first step is same as the above problem. Pass the output of expression to
an aggregator transformation and specify the group by as department_no.
Now connect the aggregator transformation to a target table.
Informatica Scenario Based Questions - Part 3
1. Consider the following product types data as the source.
Product_id, product_type
10, video
10, Audio
20, Audio
30, Audio
40, Audio
50, Audio
10, Movie
20, Movie
30, Movie
40, Movie
50, Movie
60, Movie
Assume that only 3 product types are available in the source. The
source contains 12 records and you don't know how many products are
available in each product type.
2. Design a mapping to convert column data into row data without using the
normalizer transformation.
The source data looks like
col1, col2, col3
a, b, c
d, e, f
Step2: In the expression transformation, create the ports and assign the
expressions as mentioned below.
id
value
V_curr_id=id
V_count= IIF(v_curr_id=V_prev_id,V_count+1,1)
V_prev_id=id
O_col1= IIF(V_count=1,value,NULL)
O_col2= IIF(V_count=2,value,NULL)
O_col3= IIF(V_count=3,value,NULL)
Step3: Connect the expression transformation to an aggregator transformation.
In the aggregator transformation, create the ports and assign the
expressions as mentioned below.
id (specify group by on this port)
O_col1
O_col2
O_col3
col1=MAX(O_col1)
col2=MAX(O_col2)
col3=MAX(O_col3)
Step4: Now connect the ports id, col1, col2, col3 from the aggregator
transformation to the target table.
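A compact Python sketch of steps 2 to 4 on sample id/value rows (plain Python, not
Informatica syntax): each row lands in exactly one of the three output columns, and
grouping by id collapses the gaps just like MAX does in the aggregator.

# Simulate: expression spreads values across o_col1..o_col3, aggregator groups them by id.
rows = [(1, "a"), (1, "b"), (1, "c"), (2, "d"), (2, "e"), (2, "f")]   # (id, value), sorted by id
spread = {}
prev_id, count = None, 0
for id_, value in rows:
    count = count + 1 if id_ == prev_id else 1               # V_count resets on a new id
    prev_id = id_
    cols = spread.setdefault(id_, [None, None, None])
    cols[count - 1] = value                                   # O_col1 / O_col2 / O_col3
for id_, (col1, col2, col3) in spread.items():                # one output row per id
    print(id_, col1, col2, col3)                              # 1 a b c / 2 d e f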
Informatica Scenario Based Questions - Part 4
Take a look at the following tree structure diagram. From the tree structure,
you can easily derive the parent-child relationship between the elements. For
example, B is parent of D and E.
This will sort the source data in ascending order, so that we get the
numbers in sequence as 1, 2, 3, ..., 1000.
STEP2: Connect the Source Qualifier Transformation to the Expression
Transformation. In the Expression Transformation, create three variable ports
and one output port. Assign the expressions to the ports as shown below.
Ports in Expression Transformation:
id
v_sum = v_prev_val1 + v_prev_val2
v_prev_val1 = IIF(id=1 or id=2,1, IIF(v_sum = id, v_prev_val2, v_prev_val1) )
v_prev_val2 = IIF(id=1 or id =2, 2, IIF(v_sum=id, v_sum, v_prev_val2) )
o_flag = IIF(id=1 or id=2,1, IIF( v_sum=id,1,0) )
STEP3: Now connect the Expression Transformation to the Filter
Transformation and specify the Filter Condition as o_flag=1
STEP4: Connect the Filter Transformation to the Target Table.
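Reading the expression, v_prev_val1 and v_prev_val2 hold the last two matched numbers and
o_flag marks an id that equals their sum, which is how the Fibonacci-style numbers get
through the filter. A rough Python sketch of that logic over an assumed input of ids 1 to
20 (plain Python, not Informatica syntax):

# Simulate the flag: keep the last two matched values and flag ids equal to their sum.
ids = range(1, 21)                      # assumed sequential input 1..20
v_prev_val1, v_prev_val2 = 0, 0
for id_ in ids:
    v_sum = v_prev_val1 + v_prev_val2
    if id_ in (1, 2):
        v_prev_val1, v_prev_val2, o_flag = 1, 2, 1
    elif v_sum == id_:
        v_prev_val1, v_prev_val2, o_flag = v_prev_val2, v_sum, 1
    else:
        o_flag = 0
    if o_flag == 1:                     # filter condition o_flag = 1
        print(id_)                      # 1 2 3 5 8 13 ...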
Q2. The source table contains two columns "id" and "val". The source data
looks like as below
id  val
1   a,b,c
2   pq,m,n
3   asz,ro,liqt
Here the "val" column contains comma delimited data and has three fields in
that column.
Create a workflow to split the fields in val column to separate rows. The
output should look like as below.
id  val
1   a
1   b
1   c
2   pq
2   m
2   n
3   asz
3   ro
3   liqt
Solution:
STEP1: Connect three Source Qualifier transformations to the Source
Definition
STEP2: Now connect all the three Source Qualifier transformations to
the Union Transformation. Then connect the Union Transformation to
the Sorter Transformation. In the sorter transformation sort the data
based on Id port in ascending order.
STEP3: Pass the output of Sorter Transformation to the Expression
Transformation. The ports in Expression Transformation are:
id (input/output port)
val (input port)
v_current_id (variable port) = id
v_count (variable port) = IIF(v_current_id != v_previous_id, 1, v_count+1)
v_previous_id (variable port) = id
o_val (output port) = DECODE(v_count,
1, SUBSTR(val, 1, INSTR(val,',',1,1)-1),
2, SUBSTR(val, INSTR(val,',',1,1)+1, INSTR(val,',',1,2)-INSTR(val,',',1,1)-1),
3, SUBSTR(val, INSTR(val,',',1,2)+1),
NULL
)
STEP4: Now pass the output of Expression Transformation to the
Target definition. Connect id, o_val ports of Expression Transformation
to the id, val ports of Target Definition.
For those who are interested to solve this problem in oracle sql, Click
Here. The oracle sql query provides a dynamic solution where the "val"
column can have varying number of fields in each row.
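A short Python sketch of the expanded flow on the sample rows (plain Python, not
Informatica syntax): the union of three copies plus the sort gives each id three passes,
and on pass n the expression picks the n-th comma separated field.

# Simulate: three copies of each row (union), sorted by id, then pick field 1, 2 or 3 per pass.
rows = [(1, "a,b,c"), (2, "pq,m,n"), (3, "asz,ro,liqt")]
tripled = sorted(rows * 3)                                # union of three source qualifiers + sorter on id
prev_id, v_count = None, 0
for id_, val in tripled:
    v_count = v_count + 1 if id_ == prev_id else 1        # pass number for this id
    prev_id = id_
    o_val = val.split(",")[v_count - 1]                   # same effect as the DECODE/SUBSTR/INSTR expression
    print(id_, o_val)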
I have the products table as the source and the data of the products table is
shown below.
product, Quantity
Samsung, NULL
Iphone, 3
LG, 0
Nokia, 4
Now I want to duplicate or repeat each product in the source table as many
times as the value in the quantity column. The output is
product, Quantity
Iphone, 3
Iphone, 3
Iphone, 3
Nokia, 4
Nokia, 4
Nokia, 4
Nokia, 4
// Java transformation, On Input Row tab: emit each input row 'quantity' times.
if (!isNull("quantity"))
{
    // generate one output row per unit of quantity; NULL or 0 quantity produces no rows
    for (int i = 1; i <= quantity; i++)
    {
        product = product;     // pass the input port values through to the output ports
        quantity = quantity;
        generateRow();         // write the current output row
    }
}
Now compile the java code. The compile button is shown in red circle in
the image.
Connect the ports of the java transformation to the target.
Save the mapping, create a workflow and run the workflow.
Flat file header row, footer row and detail rows to multiple tables
Assume that we have a flat file with header row, footer row and detail rows.
Now let's see how to load the header row into one table, the footer row into
another table and the detail rows into a third table just by using
transformations.
First pass the data from source qualifier to an expression transformation. In
the expression transformation assign unique number to each row (assume
exp_count port). After that pass the data from expression to aggregator. In
the aggregator transformation don't check any group by port. So that the
aggregator will provide last row as the default output (assume agg_count
port).
Now pass the data from expression and aggregator to joiner transformation.
In the joiner select the ports from aggregator as master and the ports from
expression as details. Give the join condition on the count ports and select
the join type as master outer join. Pass the joiner output to a router
transformation and create two groups in the router. For the first group give
the condition as exp_count=1, which gives the header row. For the second group
give the condition as exp_count=agg_count, which gives the footer row. The
default group will give the detail rows.
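A small Python sketch of the routing logic on a sample file (plain Python, not Informatica
syntax): every row gets a running number, the aggregator supplies the number of the last
row, and the router compares the two.

# Simulate: expression numbers the rows, aggregator keeps the last number, router splits rows.
lines = ["HEADER", "detail 1", "detail 2", "detail 3", "FOOTER"]   # sample flat file rows
numbered = [(line, i + 1) for i, line in enumerate(lines)]          # exp_count per row
agg_count = numbered[-1][1]                                         # last row number (no group by)
header = [l for l, c in numbered if c == 1]                         # group 1: exp_count = 1
footer = [l for l, c in numbered if c == agg_count]                 # group 2: exp_count = agg_count
detail = [l for l, c in numbered if c not in (1, agg_count)]        # default group
print(header, footer, detail)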
Reverse the Contents of Flat File Informatica
Q1) I have a flat file, want to reverse the contents of the flat file which
means the first record should come as last record and last record should
come as first record and load into the target file.
As an example consider the source flat file data as
Solution:
Follow the below steps for creating the mapping logic
Informatica 8.x and later versions provide a feature for generating the target
files dynamically. This feature allows you to
create a new file for every session run, or
create a new file for each transaction.
Informatica provides a special port,"FileName" in the Target file definition.
This port you have to add explicitly. See the below diagram for adding the
"FileName" port.
Go to the Target Designer or Warehouse builder and edit the file definition.
You have to click on the button indicated in red color circle to add the special
port.
Now we will see some informatica mapping examples for creating the target
file name dynamically and load the data.
1. Generate a new file for every session run.
Whenever the session runs you need to create a new file dynamically and
load the source data into that file. To do this just follow the below steps:
STEP1: Connect the source qualifier to an expression transformation. In the
expression transformation create an output port (call it as File_Name) and
assign the expression as 'EMP_'||to_char(sessstarttime,
'YYYYMMDDHH24MISS')||'.dat'
STEP2: Now connect the expression transformation to the target and
connect the File_Name port of the expression transformation to the FileName port
of the target file definition.
STEP3: Create a workflow and run the workflow.
Here I have used sessstarttime, as it is constant throughout the session run.
If you have used sysdate, a new file will be created whenever a new
transaction occurs in the session run.
Consider the employees table as the source. I want to create a file for each
department id and load the appropriate data into the files.
STEP1: Sort the data on department_id. You can either use the source
qualifier or sorter transformation to sort the data.
STEP2: Connect to the expression transformation. In the expression
transformation create the below ports and assign expressions.
Solution:
Create a mapping to find out the number of records in the source and
write the count to a parameter file. Let us call this parameter
$$SOURCE_COUNT.
Create another mapping. Go to the mapping parameters and variables,
create a mapping variable ($$VAR_SESSION_RUNS) with integer data type.
Connect the source qualifier transformation to the expression
transformation. In the expression transformation, create the below additional
ports.
Now I have to load the data of the source into the customer dimension table
using SCD Type 1. The Dimension table structure is shown below.
Cust_Key Number,
Customer_Id Number,
Customer_Name Varchar2(30),
Location Varchar2(30)
Edit the lkp transformation, go to the ports tab, and add a new
port In_Customer_Id. This new port needs to be connected to the
Customer_Id port of the source qualifier transformation.
New_Flag = IIF(ISNULL(Cust_Key),1,0)
Changed_Flag = IIF(NOT ISNULL(Cust_Key)
AND (Name != Src_Name
OR Location != Src_Location),
1, 0 )
Now create another filter transformation and drag the ports from lkp
transformation (Cust_Key), source qualifier transformation (Name, Location),
expression transformation (changed_flag) ports into the filter transformation.
Edit the filter transformation, go to the properties tab and enter the
Filter Condition as Changed_Flag=1. Then click on ok.
Now create an update strategy transformation and connect the ports of
the filter transformation (Cust_Key, Name, and Location) to the update
strategy. Go to the properties tab of update strategy and enter the update
strategy expression as DD_Update
Now drag the target definition into the mapping and connect the
appropriate ports from update strategy to the target definition.
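The two flags drive everything that follows: New_Flag selects the rows to insert and
Changed_Flag selects the rows to update. A rough Python sketch of that SCD Type 1 decision
(plain Python with a dict standing in for the lookup and made-up sample data):

# Simulate SCD Type 1: insert unseen customers, overwrite the row when the data changed.
dimension = {1: ("John", "NY")}                  # Customer_Id -> (Name, Location), existing rows
source = [(1, "John", "LA"), (2, "Mary", "SF")]  # incoming (Customer_Id, Name, Location)
for cust_id, name, location in source:
    lookup = dimension.get(cust_id)              # lookup on Customer_Id (nothing found for new customers)
    new_flag = 1 if lookup is None else 0
    changed_flag = 1 if lookup is not None and lookup != (name, location) else 0
    if new_flag or changed_flag:                 # DD_INSERT for new rows, DD_UPDATE for changed rows
        dimension[cust_id] = (name, location)    # Type 1 keeps no history, latest value wins
print(dimension)                                 # {1: ('John', 'LA'), 2: ('Mary', 'SF')}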
--Source Table
Customer_Id Number,
Location Varchar2(30)
);
--Dimension Table
(
Cust_Key Number Primary Key,
Customer_Id Number,
Location Varchar2(30),
Flag Number
);
The basic steps involved in creating a SCD Type 2 Flagging mapping are
Identifying the new records and inserting into the dimension table with
flag column value as one.
Identifying the changed record and inserting into the dimension table
with flag value as one.
Identify the changed record and update the existing record in
dimension table with flag value as zero.
We will divide the steps to implement the SCD type 2 flagging mapping into
four parts.
SCD Type 2 Flag implementation - Part 1
Here we will see the basic set up and mapping flow required for SCD type 2
Flagging. The steps involved are:
SELECT Customers_Dim.Cust_Key as Cust_Key,
Customers_Dim.Location as Location,
Customers_Dim.Customer_Id as Customer_Id
FROM Customers_Dim
WHERE Customers_Dim.Flag = 1
Now drag the target definition into the mapping and connect the
appropriate ports of update strategy transformation to the target definition.
Now connect the Next_Val, Flag ports of expression transformation
(Expr_Flag created in part 2) to the cust_key, Flag ports of the target
definition respectively. The part of the mapping diagram is shown below.
SCD type 2 will store the entire history in the dimension table. In SCD type 2
effective date, the dimension table will have Start_Date (Begin_Date) and
End_Date as the fields. If the End_Date is Null, then it indicates the current
row. Know more about SCDs at Slowly Changing Dimensions Concepts.
We will see how to implement the SCD Type 2 Effective Date in informatica.
As an example consider the customer dimension. The source and target table
structures are shown below:
--Source Table
Customer_Id Number,
Location Varchar2(30)
);
--Dimension Table
(
Cust_Key Number Primary Key,
Customer_Id Number,
Location Varchar2(30),
Begin_Date Date,
End_Date Date
);
The basic steps involved in creating a SCD Type 2 Effective Date mapping are
Identifying the new records and inserting into the dimension table with
Begin_Date as the Current date (SYSDATE) and End_Date as NULL.
Identifying the changed record and inserting into the dimension table
with Begin_Date as the Current date (SYSDATE) and End_Date as NULL.
Identify the changed record and update the existing record in the
dimension table with End_Date as the current date.
We will divide the steps to implement the SCD type 2 Effective Date mapping
into four parts.
SCD Type 2 Effective Date implementation - Part 1
Here we will see the basic set up and mapping flow required for SCD type 2
Effective Date. The steps involved are:
Customers_Dim
In this part, we will identify the new records and insert them into the target
with Begin Date as the current date. The steps involved are:
Now create a filter transformation to identify and insert new record in
to the dimension table. Drag the ports of expression transformation
(New_Flag) and source qualifier transformation (Customer_Id, Location) into
the filter transformation.
Go the properties tab of filter transformation and enter the filter
condition as New_Flag=1
Now create an update strategy transformation and connect the ports of the
filter transformation (Customer_Id, Location). Go to the properties tab and
enter the update strategy expression as DD_INSERT.
Now drag the target definition into the mapping and connect the
appropriate ports of update strategy transformation to the target definition.
In this part, we will identify the changed records and insert them into the
target with Begin Date as the current date. The steps involved are:
Create a filter transformation. Call this filter transformation as
FIL_Changed. This is used to find the changed records. Now drag the ports
from expression transformation (changed_flag), source qualifier
transformation (customer_id, location), LKP transformation (Cust_Key) into
the filter transformation.
Go to the filter transformation properties and enter the filter condition
as changed_flag =1.
Now create an update strategy transformation and drag the ports of
Filter transformation (customer_id, location) into the update strategy
transformation. Go to the properties tab and enter the update strategy
expression as DD_INSERT.
Now drag the target definition into the mapping and connect the
appropriate ports of update strategy transformation to the target definition.
Now connect the Next_Val, Begin_Date ports of expression
transformation (Expr_Date created in part 2) to the cust_key, Begin_Date
ports of the target definition respectively. The part of the mapping diagram is
shown below.
In this part, we will update the changed records in the dimension table with
End Date as current date.
Create an expression transformation and drag the Cust_Key port of
filter transformation (FIL_Changed created in part 3) into the expression
transformation.
Go to the ports tab of expression transformation and create a new
output port (End_Date with date/time data type). Assign a value SYSDATE to
this port.
Now create an update strategy transformation and drag the ports of
the expression transformation into it. Go to the properties tab and enter the
update strategy expression as DD_UPDATE.
Drag the target definition into the mapping and connect the
appropriate ports of update strategy to it. The complete mapping image is
shown below.
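Putting the four parts together, the dimension keeps one open row per customer (End_Date
is NULL) and closes the old row when the data changes. A rough Python sketch of that
behaviour (plain Python with an in-memory list standing in for the lookup and made-up
sample data):

# Simulate SCD Type 2 Effective Date: insert new/changed rows, close the previous current row.
from datetime import date

dim = []        # rows: [cust_key, customer_id, location, begin_date, end_date]
next_key = 1

def load(customer_id, location):
    global next_key
    # lookup: the current row for this customer has End_Date IS NULL
    current = next((r for r in dim if r[1] == customer_id and r[4] is None), None)
    if current and current[2] == location:
        return                                     # unchanged: nothing to do
    if current:
        current[4] = date.today()                  # DD_UPDATE: set End_Date on the old row
    dim.append([next_key, customer_id, location, date.today(), None])   # DD_INSERT: new current row
    next_key += 1                                  # sequence generator Next_Val

load(1, "NY"); load(1, "LA"); load(2, "SF")
print(dim)      # customer 1 has a closed NY row and an open LA row; customer 2 has one open row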
Identifying the changed record and updating the existing record in the
dimension table.
We will see the implementation of SCD type 3 by using the customer
dimension table as an example. The source table looks as
Now I have to load the data of the source into the customer dimension table
using SCD Type 3. The Dimension table structure is shown below.
Cust_Key Number,
Customer_Id Number,
Current_Location Varchar2(30),
Previous_Location Varchar2(30)
New_Flag = IIF(ISNULL(Cust_Key),1,0)
Changed_Flag = IIF(NOT ISNULL(Cust_Key)
AND Prev_Location != Curr_Location,
1, 0 )
The source contains the customer and city data as shown below:
customer_id, year, city
10, 2001, BLR
10, 2002, MUM
10, 2003, SEA
10, 2004, NY
20, 2001, DEL
20, 2002, NCR
20, 2003, HYD
The question is: for each customer, when processing the record for the current
row, you have to get the previous row city value. If there is no previous row,
then make the previous row value as null. The output data is shown below:
customer_id, year, city, previous_city
10, 2001, BLR, NULL
10, 2002, MUM, BLR
10, 2003, SEA, MUM
10, 2004, NY, SEA
20, 2001, DEL, NULL
20, 2002, NCR, DEL
20, 2003, HYD, NCR
Source Data:
1999, A, Jan, 9600
1999, A, Feb, 2000
1999, A, Mar, 2500
2001, B, Jan, 3000
2001, B, Feb, 3500
2001, B, Mar, 4000
Target Data:
1999, A, 9600, 2000, 2500
2001, B, 3000, 3500, 4000
Follow the below steps to implement the mapping logic for the above
scenario in informatica:
Create a new mapping.
I want to sort the data on the department id, employee id and then find the
cumulative sum of salaries of employees in each department.
Consider the following source data:
Name
----
A
B
C
D
E
F
G
After excluding the last 5 records, I want to load A, B into the target. How do I
implement a mapping logic for this in informatica?
Products
-------Windows
Linux
Unix
Ubuntu
Fedora
Centos
Debian
I want to load only the last record or footer into the target table. The target
should contain only the product "Debian". Follow the below steps for
implementing the mapping logic in informatica:
Create a new mapping and drag the source into the mapping. By
default, it creates the source qualifier transformation.
Now create an expression transformation and drag the ports from
source qualifier into the expression transformation. In the expression
transformation, create the below additional ports and assign the
corresponding expressions:
v_count (variable port) = v_count+1
o_count (output port) = v_count
Products, o_count
-----------------
Windows, 1
Linux, 2
Unix, 3
Ubuntu, 4
Fedora, 5
Centos, 6
Debian, 7
Products
--------Informatica
Datastage
Pentaho
MSBI
Oracle
Mysql
Target1
------Informatica
Pentaho
Oracle
Target2
------Datastage
MSBI
Mysql
Solution:
The mapping flow and the transformations used are mentioned below:
SRC->SQ->EXP->RTR->TGTS
First create a new mapping and drag the source into the mapping.
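The solution text stops here, so the sketch below only shows one common way to finish it
and is an assumption, not necessarily the author's exact steps: number the rows in the
expression and let the router send odd row numbers to Target1 and even row numbers to
Target2 (plain Python, not Informatica syntax).

# Sketch of the EXP -> RTR idea: odd/even row numbers pick the target (assumed approach).
products = ["Informatica", "Datastage", "Pentaho", "MSBI", "Oracle", "Mysql"]
target1, target2 = [], []
for o_count, product in enumerate(products, start=1):   # o_count from the expression transformation
    if o_count % 2 == 1:                                 # router group: MOD(o_count, 2) = 1
        target1.append(product)
    else:                                                # router group: MOD(o_count, 2) = 0
        target2.append(product)
print(target1)   # ['Informatica', 'Pentaho', 'Oracle']
print(target2)   # ['Datastage', 'MSBI', 'Mysql']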
Load Source File Name in Target - Informatica
Q) How to load the name of the current processing flat file along with the
data into the target using informatica mapping?
We will create a simple pass through mapping to load the data and "file
name" from a flat file into the target. Assume that we have a source file
"customers" and want to load this data into the target "customers_tgt". The
structures of source and target are
Target: Customers_TBL
Customer_Id
Location
FileName
The loading of the filename works for both Direct and Indirect Source
filetype. After running the workflow, the data and the filename will be loaded
in to the target. The important point to note is the complete path of the file
will be loaded into the target. This means that the directory path and the
filename will be loaded(example: /informatica/9.1/SrcFiles/Customers.dat).
If you don't want the directory path and just want the filename to be loaded
into the target, then follow the below steps:
Create an expression transformation and drag the ports of source
qualifier transformation into it.
Edit the expression transformation, go to the ports tab, create an
output port and assign the below expression to it.
REVERSE
(
SUBSTR
(
REVERSE(CurrentlyProcessedFileName),
1,
INSTR(REVERSE(CurrentlyProcessedFileName), '/') - 1
)
)
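To see what the expression does: it reverses the full path, takes everything up to the
first '/', and reverses that back, which leaves just the file name. A tiny Python sketch
of the same idea on a sample path (plain Python, not Informatica syntax):

# Simulate REVERSE(SUBSTR(REVERSE(path), 1, INSTR(REVERSE(path), '/') - 1)).
path = "/informatica/9.1/SrcFiles/Customers.dat"              # CurrentlyProcessedFileName (sample value)
reversed_path = path[::-1]
file_name = reversed_path[:reversed_path.index("/")][::-1]    # characters after the last '/'
print(file_name)                                              # Customers.dat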
Solution:
Follow the below steps for creating the mapping logic
Go to the Target Designer or Warehouse builder and edit the file definition.
You have to click on the button indicated in red color circle to add the special
port.
Now we will see some informatica mapping examples for creating the target
file name dynamically and load the data.
1. Generate a new file for every session run.
Whenever the session runs you need to create a new file dynamically and
load the source data into that file. To do this just follow the below steps:
STEP1: Connect the source qualifier to an expression transformation. In the
expression transformation create an output port (call it as File_Name) and
assign the expression as 'EMP_'||to_char(sessstarttime,
'YYYYMMDDHH24MISS')||'.dat'
STEP2: Now connect the expression transformation to the target and
connect the File_Name port of the expression transformation to the FileName port
of the target file definition.
STEP3: Create a workflow and run the workflow.
Here I have used sessstarttime, as it is constant throughout the session run.
If you have used sysdate, a new file will be created whenever a new
transaction occurs in the session run.
The target file names created would look like EMP_20120101125040.dat.
2. Create a new file for every session run. The file name should contain suffix
as numbers (EMP_n.dat)
In the above mapping scenario, the target flat file name contains the suffix
as 'timestamp.dat'. Here we have to create the suffix as a number. So, the
file names should looks as EMP_1.dat, EMP_2.dat and so on. Follow the below
steps:
STEP1: Go to the mapping parameters and variables -> Create a new
variable, $$COUNT_VAR, and its data type should be Integer.
STEP2: Connect the source qualifier to the expression transformation. In the
expression transformation create the following new ports and assign the
expressions.