But what if the source is a flat file? How can we remove the duplicates from a flat file source? Read on...
Now suppose the source system is a flat file. Here, in the Source Qualifier, we will not be able to select the Distinct option, as it is disabled for a flat file source. Hence the next approach may be to use a Sorter Transformation and check the Distinct option. When we select the Distinct option, all the columns are selected as keys, in ascending order, by default.
Sorter Transformation DISTINCT clause
Ans.
After the Source Qualifier place a Router Transformation. Create two groups, namely EVEN and ODD, with filter conditions MOD(SERIAL_NO,2)=0 and MOD(SERIAL_NO,2)=1 respectively. Then output the two groups to two flat file targets.
Source Table:
STUDENT_NAME | MATHS | LIFE_SC | PHY_SC
Sam | 100 | 70 | 80
John | 75 | 100 | 85
Tom | 80 | 100 | 85

Target Table:
STUDENT_NAME | SUBJECT | MARKS
John | Maths | 75
Tom | Maths | 80
Ans.
Here, to convert the columns to rows, we have to use the Normalizer Transformation, followed by an Expression Transformation to decode the column taken into consideration. For more details on how the mapping is performed please visit Working with Normalizer.
Q4. Name the transformations which convert one row to many rows, i.e. increase the i/p:o/p row count. Also, what is the name of its reverse transformation?
Ans.
Normalizer as well as Router Transformations are Active transformations which can increase the number of output rows compared to input rows. The Aggregator Transformation, which collapses many input rows into one output row, performs the reverse operation.
Q5. Suppose we have a source table and we want to load three target tables based on source rows such that the first row moves to the first target table, the second row to the second target table, the third row to the third target table, the fourth row again to the first target table, and so on and so forth. Describe your approach.
Ans.
We can clearly understand that we need a Router Transformation to route or filter source data to the three target tables. Now the question is what the filter conditions will be. First of all we need an Expression Transformation where we have all the source table columns and, along with that, another i/o port, say seq_num, which gets sequence numbers for each source row from the NextVal port of a Sequence Generator with start value 0 and increment by 1. Now the filter conditions for the three router groups will be:
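For illustration (assuming seq_num takes the values 1, 2, 3, ... from NEXTVAL), the three group filter conditions could be written as:
MOD(seq_num, 3) = 1   (routed to the first target table)
MOD(seq_num, 3) = 2   (routed to the second target table)
MOD(seq_num, 3) = 0   (routed to the third target table)
This way the fourth row again satisfies the first condition, giving the round-robin distribution asked for.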
Q6. Suppose we have ten source flat files of the same structure. How can we load all the files into the target database in a single batch run using a single mapping?
Ans.
After we create a mapping to load data into the target database from a flat file, we move on to the session properties of the Source Qualifier. To load a set of source files we need to create a file, say final.txt, containing the source flat file names (ten files in our case) and set the Source filetype option to Indirect. Next we point to this flat file final.txt, fully qualified, through the Source file directory and Source filename properties.
Image: Session Property Flat File
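To illustrate, assuming the ten files are named src_file_1.txt through src_file_10.txt (hypothetical names), final.txt would simply list one file name per line, fully qualified if they are not in the Source file directory:
src_file_1.txt
src_file_2.txt
src_file_3.txt
(and so on up to src_file_10.txt)
With Source filetype set to Indirect, the Integration Service reads final.txt and then loads each listed file in turn.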
Ans.
We will use the very basic property of the Expression Transformation that, within it, we can access the previous row's data as well as the currently processed row's data. What we need are simply Sorter, Expression and Filter transformations to achieve aggregation at the Informatica level.
For a detailed understanding visit Aggregation without Aggregator
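A minimal sketch of the Expression Transformation ports for such an aggregation (hypothetical port names, computing a running count and sum of SAL per DEPTNO; the data is assumed to be already sorted on DEPTNO by the preceding Sorter, and variable ports are evaluated top to bottom while retaining their values from the previous row):
V_COUNT (variable) = IIF(DEPTNO = V_PREV_DEPTNO, V_COUNT + 1, 1)
V_SUM_SAL (variable) = IIF(DEPTNO = V_PREV_DEPTNO, V_SUM_SAL + SAL, SAL)
V_PREV_DEPTNO (variable, placed below the two ports above) = DEPTNO
O_COUNT (output) = V_COUNT
O_SUM_SAL (output) = V_SUM_SAL
A following Filter transformation can then keep only the row carrying the final value for each DEPTNO, as detailed in the linked article.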
Source Table:
STUDENT_NAME | SUBJECT | MARKS
Tom | Maths | 80
John | Maths | 75

Target Table:
STUDENT_NAME | MATHS | LIFE_SC | PHY_SC
John | 75 | 100 | 85
Tom | 80 | 100 | 85
Describe your approach.
Ans.
Here our scenario is to convert many rows to one row, and the transformation which will help us achieve this is the Aggregator. Our mapping will look like this:
Sorter Transformation
Now, based on STUDENT_NAME in the GROUP BY clause, the following output subject columns are populated as:
MATHS: MAX(MARKS, SUBJECT='Maths')
LIFE_SC: MAX(MARKS, SUBJECT='Life Science')
PHY_SC: MAX(MARKS, SUBJECT='Physical Science')
Aggregator Transformation
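For reference, the equivalent logic in plain SQL (assuming a source table named STUDENT_MARKS with columns STUDENT_NAME, SUBJECT and MARKS) is a conditional aggregation:
SELECT STUDENT_NAME,
       MAX(CASE WHEN SUBJECT = 'Maths' THEN MARKS END)            AS MATHS,
       MAX(CASE WHEN SUBJECT = 'Life Science' THEN MARKS END)     AS LIFE_SC,
       MAX(CASE WHEN SUBJECT = 'Physical Science' THEN MARKS END) AS PHY_SC
FROM   STUDENT_MARKS
GROUP  BY STUDENT_NAME;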
Q9. What is a Source Qualifier? What are the tasks we can perform using a SQ, and why is it an ACTIVE transformation?
Ans.
A Source Qualifier is an Active and Connected Informatica transformation that reads the rows from a relational database or flat file source. Using a SQ we can join data originating from the same source database, filter the source rows, sort the source data, select distinct values and apply a custom SQL query. It is ACTIVE because options such as the Source Filter, Select Distinct and user-defined join can change the number of rows that pass through it.
Ans.
The Source Qualifier transformation displays the transformation
datatypes. The transformation datatypes determine how the source
database binds data when the Integration Service reads it.
Now if we alter the datatypes in the Source Qualifier transformation or
the datatypes in the source definition and Source Qualifier
transformation do not match, the Designer marks the mapping as
invalid when we save it.
Q11. Suppose we have used the Select Distinct and the Number Of
Sorted Ports property in the SQ and then we add Custom SQL Query.
Explain what will happen.
Ans.
Whenever we add a Custom SQL or SQL override query, it overrides the User-Defined Join, Source Filter, Number of Sorted Ports, and Select Distinct settings in the Source Qualifier transformation. Hence only the user-defined SQL query will be fired against the database and all the other options will be ignored.
Q12. Describe the situations where we will use the Source Filter,
Select Distinct and Number Of Sorted Ports properties of Source
Qualifier transformation.
Ans.
Source Filter option is used basically to reduce the number of rows
the Integration Service queries so as to improve performance.
Select Distinct option is used when we want the Integration Service
to select unique values from a source, filtering out unnecessary data
earlier in the data flow, which might improve performance.
Number Of Sorted Ports option is used when we want the source data to be in a sorted fashion, so as to use the same in some following transformations like Aggregator or Joiner, which, when configured for sorted input, will improve performance.
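To make this concrete, here is a sketch of the default query the Source Qualifier would generate for a hypothetical EMP source with Select Distinct checked, Source Filter set to DEPTNO = 10, and Number Of Sorted Ports set to 2 (the ORDER BY is built from the first two connected ports):
SELECT DISTINCT EMP.EMPNO, EMP.ENAME, EMP.DEPTNO, EMP.SAL
FROM   EMP
WHERE  EMP.DEPTNO = 10
ORDER  BY EMP.EMPNO, EMP.ENAME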
Q13. What will happen if the SELECT list COLUMNS in the Custom
override SQL Query and the OUTPUT PORTS order in SQ transformation
do not match?
Ans.
A mismatch in, or a change in the order of, the list of selected columns with respect to the connected transformation output ports may result in session failure.
Ans.
We use source filter to reduce the number of source records. If we
include the string WHERE in the source filter, the Integration Service
fails the session .
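For example (hypothetical column), the Source Filter must be entered as the bare condition only:
DEPTNO = 10          -- valid source filter
WHERE DEPTNO = 10    -- the session fails because of the WHERE keyword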
Ans.
We will use the Joiner transformation when joining source data from heterogeneous sources, as well as when joining flat files.
Use the Joiner transformation when we need to join the following types
of sources:
Join data from different Relational Databases.
Join data from different Flat Files.
Join relational sources and flat files.
Ans.
Sybase supports a maximum of 16 columns in an ORDER BY clause. So
if the source is Sybase, do not sort more than 16 columns.
Ans.
In the Workflow Manager, we can Configure Constraint based load
ordering for a session. The Integration Service orders the target load
on a row-by-row basis. For every row generated by an active source,
the Integration Service loads the corresponding transformed row first
to the primary key table, then to the foreign key table.
Hence if we have one Source Qualifier transformation that provides
data for multiple target tables having primary and foreign key
relationships, we will go for Constraint based load ordering.
Ans.
A Filter transformation is an Active and Connected transformation
that can filter rows in a mapping.
Only the rows that meet the Filter Condition pass through the Filter
transformation to the next transformation in the pipeline. TRUE and
FALSE are the implicit return values from any filter condition we set. If
the filter condition evaluates to NULL, the row is assumed to be FALSE.
The numeric equivalent of FALSE is zero (0) and any non-zero value is
the equivalent of TRUE.
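For example, assuming a SAL port, a filter condition that handles NULLs explicitly can be written as:
IIF(ISNULL(SAL), FALSE, SAL > 2000)
Without the ISNULL check, a row with a NULL salary would make the condition evaluate to NULL, and the row would be treated as FALSE and dropped.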
Ans.
Source Qualifier Transformation | Filter Transformation
Filters rows while reading them from a source. | Filters rows from within a mapping.
Can filter rows only from relational sources. | Filters rows coming from any type of source system, at the mapping level.
Ans.
A Joiner is an Active and Connected transformation used to join
source data from the same source system or from two related
heterogeneous sources residing in different locations or file systems.
The Joiner transformation joins sources with at least one matching
column. The Joiner transformation uses a condition that matches one
or more pairs of columns between the two sources.
The two input pipelines include a master pipeline and a detail pipeline
or a master and a detail branch. The master pipeline ends at the Joiner
transformation, while the detail pipeline continues to the target.
Q22. State the limitations where we cannot use Joiner in the mapping
pipeline.
Ans.
The Joiner transformation accepts input from most transformations.
However, following are the limitations:
A Joiner transformation cannot be used when either of the input pipelines contains an Update Strategy transformation.
Joiner transformation cannot be used if we connect a Sequence
Generator transformation directly before the Joiner transformation.
Q23. Out of the two input pipelines of a joiner, which one will you set
as the master pipeline?
Ans.
During a session run, the Integration Service compares each row of the master source against the detail source. For optimal performance, the source with fewer rows (for an unsorted Joiner) or with fewer duplicate key values (for a sorted Joiner) should be configured as the master, because the Integration Service caches the master source.
Ans.
In SQL, a join is a relational operator that combines data from multiple
tables into a single result set. The Joiner transformation is similar to an
SQL join except that data can originate from different types of sources.
Note: A normal or master outer join performs faster than a full outer or detail outer
join.
Ans.
In a normal join , the Integration Service discards all rows of data
from the master and detail source that do not match, based on the join
condition.
A master outer join keeps all rows of data from the detail source and
the matching rows from the master source. It discards the unmatched
rows from the master source.
A detail outer join keeps all rows of data from the master source and
the matching rows from the detail source. It discards the unmatched
rows from the detail source.
A full outer join keeps all rows of data from both the master and
detail sources.
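As an analogy in SQL terms (using hypothetical MASTER and DETAIL tables), the four join types roughly correspond to:
Normal join       : DETAIL INNER JOIN MASTER
Master outer join : DETAIL LEFT OUTER JOIN MASTER  (all detail rows kept)
Detail outer join : MASTER LEFT OUTER JOIN DETAIL  (all master rows kept)
Full outer join   : MASTER FULL OUTER JOIN DETAIL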
Q26. Describe the impact of number of join conditions and join order
in a Joiner Transformation.
Ans.
We can define one or more conditions based on equality between
the specified master and detail sources.
Both ports in a condition must have the same datatype . If we need
to use two ports in the join condition with non-matching datatypes we
must convert the datatypes so that they match. The Designer validates
datatypes in a join condition.
Additional ports in the join condition increase the time necessary to join two sources.
The order of the ports in the join condition can impact the performance
of the Joiner transformation. If we use multiple ports in the join
condition, the Integration Service compares the ports in the order we
specified.
Ans.
The Joiner transformation does not match null values .
For example, if both EMP_ID1 and EMP_ID2 contain a row with a null
value, the Integration Service does not consider them a match and
does not join the two rows.
To join rows with null values, replace null input with default values in
the Ports tab of the joiner, and then join on the default values.
Note: If a result set includes fields that do not contain data in either of
the sources, the Joiner transformation populates the empty fields with
null values. If we know that a field will return a NULL and we do not
want to insert NULLs in the target, set a default value on the Ports tab
for the corresponding port.
Ans.
If we have sorted both the master and detail pipelines in order of the
ports say ITEM_NO, ITEM_NAME and PRICE we must ensure that:
Use ITEM_NO in the First Join Condition.
If we add a Second Join Condition, we must use ITEM_NAME.
If we want to use PRICE as a Join Condition apart from ITEM_NO, we
must also use ITEM_NAME in the Second Join Condition.
If we skip ITEM_NAME and join on ITEM_NO and PRICE, we will lose the
input sort order and the Integration Service fails the session .
Q29. What are the transformations that cannot be placed between the sort origin and the Joiner transformation so that we do not lose the input sort order?
Ans.
The best option is to place the Joiner transformation directly after the
sort origin to maintain sorted data.
However do not place any of the following transformations between
the sort origin and the Joiner transformation:
Custom
Unsorted Aggregator
Normalizer
Rank
Union transformation
XML Parser transformation
XML Generator transformation
Mapplet [if it contains any one of the above mentioned
transformations]
Q30. Suppose we have the EMP table as our source. In the target we
want to view those employees whose salary is greater than or equal to
the average salary for their departments.
First we sort the source data on DEPTNO. Next we place a Sorted Aggregator Transformation, where we will find out the AVERAGE SALARY for each (GROUP BY) DEPTNO.
When we perform this aggregation, we lose the data for individual employees. To
maintain employee data, we must pass a branch of the pipeline to the Aggregator
Transformation and pass a branch with the same sorted source data to the Joiner
transformation to maintain the original data. When we join both branches of the pipeline,
we join the aggregated data with the original data.
Aggregator Ports Tab
Aggregator Properties Tab
So next we need Sorted Joiner Transformation to join the sorted aggregated data with
the original data, based on DEPTNO .
Here we will be taking the aggregated pipeline as the Master and original dataflow as
Detail Pipeline.
Joiner Condition Tab
Joiner Properties Tab
After that we need a Filter Transformation to filter out the employees having salary less
than average salary for their department.
Filter Condition: SAL>=AVG_SAL
Filter Properties Tab
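For reference, the requirement itself can be expressed in plain SQL (assuming the standard EMP table) as a correlated subquery:
SELECT E.*
FROM   EMP E
WHERE  E.SAL >= (SELECT AVG(SAL) FROM EMP WHERE DEPTNO = E.DEPTNO);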
Ans.
A Sequence Generator transformation is a Passive and Connected
transformation that generates numeric values.
It is used to create unique primary key values, replace missing primary
keys, or cycle through a sequential range of numbers.
This transformation by default contains ONLY two OUTPUT ports, namely CURRVAL and NEXTVAL. We cannot edit or delete these ports, nor can we add ports to this unique transformation.
We can create approximately two billion unique numeric values with
the widest range from 1 to 2147483647.
Q32. Define the Properties available in Sequence Generator
transformation in brief.
Ans.
Sequence Generator Properties | Description
End Value | The maximum value the transformation generates. Default is 2147483647.
Now suppose the requirement is that we need to have the same surrogate keys in both the targets.
Then the easiest way to handle the situation is to put an Expression
Transformation in between the Sequence Generator and the Target
tables. The SeqGen will pass unique values to the expression
transformation, and then the rows are routed from the expression
transformation to the targets.
Sequence Generator
Q34. Suppose we have 100 records coming from the source. Now for a
target column population we used a Sequence generator.
Suppose the Current Value is 0 and End Value of Sequence generator
is set to 80. What will happen?
Ans.
End Value is the maximum value the Sequence Generator will
generate. After it reaches the End value the session fails with the
following error message:
TT_11009 Sequence Generator Transformation: Overflow error.
Ans.
When we convert a non-reusable Sequence Generator to a reusable one, we observe that the Number of Cached Values is set to 1000 by default, and the Reset property is disabled.
When we try to set the Number of Cached Values property of a
Reusable Sequence Generator to 0 in the Transformation Developer we
encounter the following error message:
The number of cached values must be greater than zero for
reusable sequence transformation.
Target Table
Store | Sales | Quarter
Store 1 | 100 | 1
Store 1 | 300 | 2
Store 1 | 500 | 3
Store 1 | 700 | 4
Store 2 | 250 | 1
Store 2 | 450 | 2
Store 2 | 650 | 3
Store 2 | 850 | 4
First we need to set the Number of Occurrences property of the Expense Head as 3 in the Normalizer tab of the Normalizer transformation, since we have Food, Houserent and Transportation.
This in turn will create the corresponding 3 input ports in the Ports tab, along with the fields Individual and Month.
In the Ports tab of the Normalizer the ports will be created
automatically as configured in the Normalizer tab. Interestingly we will
observe two new columns namely GK_EXPENSEHEAD and
GCID_EXPENSEHEAD.
The GK field generates a sequence number starting from the value defined in the Sequence field, while the GCID holds the value of the occurrence field, i.e. the column number of the input Expense Head.
Here 1 is for FOOD, 2 is for HOUSERENT and 3 is for TRANSPORTATION.
Now the GCID will give which expense corresponds to which field while
converting columns to rows.
Below is the screen-shot of the expression to handle this GCID
efficiently:
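A sketch of such a decode expression (hypothetical output port name, based on the 1/2/3 mapping described above):
EXPENSE_HEAD = DECODE(GCID_EXPENSEHEAD, 1, 'FOOD', 2, 'HOUSERENT', 3, 'TRANSPORTATION')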
A LookUp cache does not change once built. But what if the underlying
lookup table changes the data after the lookup cache is created? Is
there a way so that the cache always remain up-to-date even if the
underlying table changes?
Let's think about this scenario. You are loading your target table
through a mapping. Inside the mapping you have a Lookup and in the
Lookup, you are actually looking up the same target table you are
loading. You may ask me, "So? What's the big deal? We all do it quite
often...". And yes you are right.
There is no "big deal" because Informatica (generally) caches the
lookup table in the very beginning of the mapping, so whatever record
getting inserted to the target table through the mapping, will have no
effect on the Lookup cache. The lookup will still hold the previously
cached data, even if the underlying target table is changing.
But what if you want your Lookup cache to get updated as and when the target table changes? What if you want your lookup cache to always show the exact snapshot of the data in your target table at that point in time? Clearly this requirement will not be fulfilled if you use a static cache. You will need a dynamic cache to handle this.
Let's suppose you run a retail business and maintain all your customer
information in a customer master table (RDBMS table). Every night, all the customers from your customer master table are loaded into a Customer Dimension table in your data warehouse. Your source
customer table is a transaction system table, probably in 3rd normal
form, and does not store history. Meaning, if a customer changes his
address, the old address is updated with the new address. But your
data warehouse table stores the history (may be in the form of SCD
Type-II). There is a map that loads your data warehouse table from the
source table. Typically you do a Lookup on the target (static cache) and check each incoming customer record to determine whether the customer already exists in the target. If the customer does not exist in the target, you conclude the customer is new and INSERT the record, whereas if the customer already exists, you may want to update the target record with this new record (if the record has changed). This is illustrated below; you don't need a dynamic Lookup cache for this.
• Updating a master customer table with both new and updated customer
information as shown above
• Loading data into a slowly changing dimension table and a fact table at the same
time. Remember, you typically lookup the dimension while loading to fact. So
you load dimension table before loading fact table. But using dynamic lookup,
you can load both simultaneously.
• Loading data from a file with many duplicate records and to eliminate duplicate
records in target by updating a duplicate row i.e. keeping the most recent row or
the initial row
• Loading the same data from multiple sources using a single mapping. Just consider the previous retail business example. If you have more than one shop and Linda has visited two of your shops for the first time, the customer record for Linda will come twice during the same load.
When the Integration Service reads a row from the source, it updates
the lookup cache by performing one of the following actions:
Inserts the row into the cache: If the incoming row is not in the
cache, the Integration Service inserts the row in the cache based on
input ports or generated Sequence-ID. The Integration Service flags
the row as insert.
Updates the row in the cache: If the row exists in the cache, the
Integration Service updates the row in the cache based on the input
ports. The Integration Service flags the row as update.
Makes no change to the cache: This happens when the row exists in the cache but the lookup is configured to insert new rows only; or the row is not in the cache and the lookup is configured to update existing rows only; or the row is in the cache but, based on the lookup condition, nothing changes. The Integration Service flags the row as unchanged.
Notice that Integration Service actually flags the rows based on the
above three conditions. This is a great thing, because, if you know the
flag you can actually reroute the row to achieve different logic. This
flag port is called "NewLookupRow" and using this the rows can be
routed for insert, update or to do nothing. You just need to use a
Router or Filter transformation followed by an Update Strategy.
Oh, forgot to tell you the actual values that you can expect in the NewLookupRow port:
NewLookupRow | Meaning
0 | Integration Service does not update or insert the row in the cache.
1 | Integration Service inserts the row into the cache.
2 | Integration Service updates the row in the cache.
When the Integration Service reads a row, it changes the lookup cache
depending on the results of the lookup query and the Lookup
transformation properties you define. It assigns the value 0, 1, or 2 to
the NewLookupRow port to indicate if it inserts or updates the row in
the cache, or makes no change.
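A sketch of how this flag is typically used (hypothetical group names), with the Router followed by Update Strategy transformations as mentioned above:
INSERT group filter condition : NewLookupRow = 1, followed by an Update Strategy with the expression DD_INSERT
UPDATE group filter condition : NewLookupRow = 2, followed by an Update Strategy with the expression DD_UPDATE
Rows with NewLookupRow = 0 fall into the default group and can simply be discarded.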
And here I provide you the screenshot of the lookup below. Lookup
ports screen shot first,
Image: Dynamic Lookup Ports Tab
If the input value is NULL and we select the Ignore Null inputs for
Update property for the associated input port, the input value does not
equal the lookup value or the value out of the input/output port. When
you select the Ignore Null property, the lookup cache and the target
table might become unsynchronized if you pass null values to the
target. You must verify that you do not pass null values to the target.
When you update a dynamic lookup cache and target table, the source
data might contain some null values. The Integration Service can
handle the null values in the following ways:
Insert null values: The Integration Service uses null values from the
source and updates the lookup cache and target table using all values
from the source.
Ignore Null inputs for Update property : The Integration Service
ignores the null values in the source and updates the lookup cache and
target table using only the not null values from the source.
If we know the source data contains null values, and we do not want
the Integration Service to update the lookup cache or target with null
values, then we need to check the Ignore Null property for the
corresponding lookup/output port.
When we choose to ignore NULLs, we must verify that we output the same values to the target that the Integration Service writes to the lookup cache. We can configure the mapping based on the value we want the Integration Service to output from the lookup/output ports when it updates a row in the cache, so that the lookup cache and the target table do not become unsynchronized:
New values. Connect only lookup/output ports from the Lookup
transformation to the target.
Old values. Add an Expression transformation after the Lookup
transformation and before the Filter or Router transformation. Add
output ports in the Expression transformation for each port in the
target table and create expressions to ensure that we do not output
null input values to the target.
But what if we don't want to compare all ports? We can choose the
ports we want the Integration Service to ignore when it compares
ports. The Designer only enables this property for lookup/output ports
when the port is not used in the lookup condition. We can improve
performance by ignoring some ports during comparison.
We might want to do this when the source data includes a column that
indicates whether or not the row contains data we need to update.
Select the Ignore in Comparison property for all lookup ports
except the port that indicates whether or not to update the
row in the cache and target table.
Database:
• Control File
• Online Redo Log
• Data File
• Temp File
Instance:
• Shared Memory (SGA)
• Processes
Now let's learn some details of both Database and Oracle Instance.
The Database
The database is comprised of different files as follows:
Control File | Contains information that defines the rest of the database, like names, location and types of other files etc.
Redo Log File | Keeps track of the changes made to the database.
Data File | All user and meta data are stored in data files.
Temp File | Stores the temporary information that is often generated when sorts are performed.
Each file has a header block that contains metadata about the file like
SCN or system change number that says when data stored in buffer
cache was flushed down to disk. This SCN information is important for
Oracle to determine if the database is consistent.
The Instance
This is comprised of a shared memory segment (SGA) and a few
processes. The following picture shows the Oracle structure.
Storage Structure
Here we will learn about both physical and logical storage structure. Physical storage is
how Oracle stores the data physically in the system. Whereas logical storage talks about
how an end user actually accesses that data.
Physically, Oracle stores everything in files, called data files; whereas an end user accesses that data in terms of RDBMS tables, which is the logical part.
Let's see the details of these structures.
Physical storage space is comprised of different data files which contain data segments. Each segment can contain multiple extents, and each extent contains blocks, which are the most granular storage structure. The relationship among segments, extents and blocks is shown below.
Data Files → Segments (size: 96k) → Extents (size: 24k) → Blocks (size: 2k)
A connected lookup receives source data, performs a lookup and returns data to the pipeline; while an unconnected lookup is not connected to a source or target and is called by a transformation in the pipeline through a :LKP expression, which in turn returns only one column value to the calling transformation.
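For example (hypothetical lookup and port names), an Expression Transformation could call an unconnected lookup like this:
IIF(ISNULL(:LKP.LKP_CUSTOMER(CUST_ID)), 'NEW', 'EXISTING')
Here LKP_CUSTOMER is an unconnected Lookup transformation whose return port supplies the single value used in the expression.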
A lookup can be Cached or Uncached. If we cache the lookup then we can further go for a static, dynamic or persistent cache, and a named or unnamed cache. By default lookup transformations are cached and static.
Output Ports: Create an output port for each lookup port we want to link to another
transformation. For connected lookups, we must have at least one output port. For
unconnected lookups, we must select a lookup port as a return port (R) to pass a return
value.
Lookup Port: The Designer designates each column of the lookup source as a lookup
port.
Return Port: An unconnected Lookup transformation has one return port that returns
one column of data to the calling transformation through this port.
Notes: We can delete lookup ports from a relational lookup if the mapping does not use those lookup ports, which will give us a performance gain. But if the lookup source is a flat file, then deleting lookup ports fails the session.
Now let us have a look on the Properties Tab of the Lookup Transformation:
Lookup Sql Override: Override the default SQL statement to add a WHERE clause or
to join multiple tables.
Lookup table name: The base table on which the lookup is performed.
Lookup Source Filter: We can apply filter conditions on the lookup table so as to reduce
the number of records. For example, we may want to select the active records of the
lookup table hence we may use the condition CUSTOMER_DIM.ACTIVE_FLAG = 'Y'.
Lookup caching enabled: If this option is checked, the lookup table is cached during the session run. Otherwise the lookup goes for an uncached relational database hit. Remember to implement a database index on the columns used in the lookup condition to provide better performance when the lookup is uncached.
Lookup policy on multiple match: During the lookup, if the Integration Service finds multiple matches, we can configure the lookup to return the First Value, Last Value, Any Value or to Report Error.
Lookup condition: The condition to lookup values from the lookup table based on
source input data. For example, IN_EmpNo=EmpNo.
Connection Information: Query the lookup table from the source or target connection. In case of a flat file lookup we can give the file path and name, whether direct or indirect.
Source Type: Determines whether the source is relational database or flat file.
Tracing Level: It provides the amount of detail in the session log for the transformation. Options available are Normal, Terse, Verbose Initialization, Verbose Data.
Lookup cache directory name: Determines the directory name where the lookup cache
files will reside.
Lookup cache persistent: Indicates whether we are going for persistent cache or non-
persistent cache.
Dynamic Lookup Cache: When checked, we are going for a dynamic lookup cache, else a static lookup cache is used.
Output Old Value On Update: Defines whether the old value for output ports will be
used to update an existing row in dynamic cache.
Cache File Name Prefix: The lookup will use this named persistent cache file based on the base lookup table.
Re-cache from lookup source: When checked, integration service rebuilds lookup cache
from lookup source when the lookup instance is called in the session.
Insert Else Update: Insert the record if not found in cache, else update it. Option is
available when using dynamic lookup cache.
Update Else Insert: Update the record if found in cache, else insert it. Option is
available when using dynamic lookup cache.
Datetime Format: Used when source type is file to determine the date and time format
of lookup columns.
Thousand Separator: By default it is None, used when source type is file to determine
the thousand separator.
Decimal Separator: By default it is "." else we can use ","; used when the source type is file to determine the decimal separator.
Case Sensitive String Comparison: To be checked when we want to go for Case
sensitive String values in lookup comparison. Used when source type is file.
Null ordering: Determines whether NULL is the highest or lowest value. Used when
source type is file.
Sorted Input: Checked whenever we expect the input data to be sorted and is used when
the source type is flat file.
Lookup source is static: When checked it assumes that the lookup source is not going to
change during the session run.
Pre-build lookup cache: Default option is Auto. If we want the Integration Service to start building the cache as soon as the session begins, we can choose the option Always allowed.
Now I am showing a Sorter here just to illustrate the concept. If you already have sorted data from the source, you need not use it, thereby increasing the performance benefit.
Expression (EXP_SAL) Ports Tab
Image: Expression Ports Tab Properties
Sorter (SRT_SAL1) Ports Tab
By default the Integration service creates the reject files or bad files in
the $PMBadFileDir process variable directory. It writes the entire
reject record row in the bad file although the problem may be in any
one of the Columns. The reject files have a default naming convention
like [target_instance_name].bad . If we open the reject file in an
editor we will see comma separated values having some tags/ indicator
and some data values. We will see two types of Indicators in the
reject file. One is the Row Indicator and the other is the Column
Indicator .
For reading the bad file, the best method is to copy the contents of the bad file and save it as a CSV (Comma Separated Values) file. Opening the CSV file will give an Excel-sheet type look and feel. The first column in the reject file is the Row Indicator, which
determines whether the row was destined for insert, update, delete or
reject. It is basically a flag that determines the Update Strategy for the
data row. When the Commit Type of the session is configured as
User-defined the row indicator indicates whether the transaction was
rolled back due to a non-fatal error, or if the committed transaction
was in a failed target connection group.
Row Indicator | Meaning | Rejected By
0 | Insert | Writer or target
1 | Update | Writer or target
2 | Delete | Writer or target
3 | Reject | Writer

Column Indicator | Type of data | Writer Treats As
Oracle Parser
Hard Parse
A hard parse occurs when a SQL statement is executed and the SQL statement is either not in the shared pool, or it is in the shared pool but it cannot be shared. A SQL statement is not shared if the metadata for the two SQL statements is different, i.e. a SQL statement may be textually identical to a preexisting SQL statement, but the tables referenced in the two statements are different, or the optimizer environment is different.
Soft Parse
EXPLAIN PLAN
Oracle Trace
SQL Trace
TKPROF
i) Source Database
ii) Target Database
iii) Data Volume
Category Technique
While loading Staging Tables for FULL LOADS, the Truncate target table option should be checked. Based on the target database and whether a primary key is defined, the Integration Service fires a TRUNCATE or DELETE statement:

Database | Primary Key Defined | No Primary Key
DB2 | TRUNCATE | TRUNCATE
INFORMIX | DELETE | DELETE
ODBC | DELETE | DELETE
ORACLE | DELETE UNRECOVERABLE | TRUNCATE
MSSQL | DELETE | TRUNCATE
SYBASE | TRUNCATE | TRUNCATE
On a general note any Informatica help material would suggest: you can enter any valid SQL
statement supported by the source database in a SQL override of a Source qualifier or a Lookup
transformation or at the session properties level.
While using them as part of a Source Qualifier has no complications, using them in a Lookup SQL override gets a bit tricky. Use of a forward slash followed by an asterisk ("/*") in a lookup SQL override [generally used for commenting purposes in SQL and at times for Oracle hints] would result in session failure with an error like:
Infa 7.x
1. Using a text editor open the PowerCenter server configuration file (pmserver.cfg).
2. Add the following entry at the end of the file:
LookupOverrideParsingSetting=1
3. Re-start the PowerCenter server (pmserver).
Infa 8.x
1. Connect to the Administration Console.
2. Stop the Integration Service.
3. Select the Integration Service.
4. Under the Properties tab, click Edit in the Custom Properties section.
5. Under Name enter LookupOverrideParsingSetting
6. Under Value enter 1.
7. Click OK.
8. And start the Integration Service.
Starting with PowerCenter 8.5, this change could be done at the session task itself
as follows:
Informatica PowerCenter 8x
Key Concepts – 1
We shall look at the fundamental components of the Informatica
PowerCenter 8.x Suite, the key components are
1. PowerCenter Domain
2. PowerCenter Repository
3. Administration Console
4. PowerCenter Client
5. Repository Service
6. Integration Service
PowerCenter Domain
Node
Application services
The services that essentially perform data movement, connect to different data sources and manage data are called Application Services; they are namely the Repository Service, Integration Service, Web Services Hub, SAP BW Service, Reporting Service and Metadata Manager Service. The application services run on each node based on the way we configure the node and the application service.
Domain Configuration
Some of the configurations for a domain involve assigning host names and port numbers to the nodes, setting up Resilience Timeout values, providing connection information for the metadata database, SMTP details etc. All the configuration information for a domain is stored in a set of relational database tables within the repository. Some of the global properties that are applicable to Application Services, like 'Maximum Restart Attempts' and 'Dispatch Mode' as 'Round Robin'/'Metric Based'/'Adaptive', are configured under Domain Configuration.
2. PowerCenter Repository
A set of tasks grouped together becomes a worklet. After we create a workflow, we run the workflow in the Workflow Manager and monitor it in the Workflow Monitor. The Workflow Manager has the following three window panes:
Task Developer – create the tasks we want to accomplish in the workflow.
Worklet Designer – create a worklet in the Worklet Designer. A worklet is an object that groups a set of tasks. A worklet is similar to a workflow, but without scheduling information. You can nest worklets inside a workflow.
Workflow Designer – create a workflow by connecting tasks with links in the Workflow Designer. We can also create tasks in the Workflow Designer as we develop the workflow.
The ODBC connection details are defined in the Workflow Manager "Connections" menu.
Workflow Monitor : We can monitor workflows and tasks in the Workflow Monitor. We can view
details about a workflow or task in Gantt Chart view or Task view. We can run, stop, abort, and
resume workflows from the Workflow Monitor. We can view sessions and workflow log events in
the Workflow Monitor Log Viewer.
The Workflow Monitor displays workflows that have run at least once. The Workflow Monitor
continuously receives information from the Integration Service and Repository Service. It also
fetches information from the repository to display historic information.
Output window – Displays messages from the Integration Service and Repository
Service.
Gantt chart view – Displays details about workflow runs in chronological format.
Repository Manager
We can navigate through multiple folders and repositories and perform basic repository tasks with
the Repository Manager. We use the Repository Manager to complete the following tasks:
2. Add and connect to a repository, we can add repositories to the Navigator window and
client registry and then connect to the repositories.
3. Work with PowerCenter domain and repository connections, we can edit or remove domain
connection information. We can connect to one repository or multiple repositories. We
can export repository connection information from the client registry to a file. We can
import the file on a different machine and add the repository connection information to the
client registry.
4. Change your password. We can change the password for our user account.
5. Search for repository objects or keywords. We can search for repository objects containing
specified text. If we add keywords to target definitions, use a keyword to search for a
target definition.
7. Compare repository objects. In the Repository Manager, we can compare two repository objects of the same type to identify differences between the objects.
8. Truncate session and workflow log entries. We can truncate the list of session and workflow logs that the Integration Service writes to the repository. We can truncate all logs, or truncate all logs older than a specified date.
5. Repository Service
Advanced Properties
> CommentsRequiredFor Checkin: Requires users to add comments when checking
in repository objects.
> Error Severity Level: Level of error messages written to the Repository Service log.
Specify one of the following message levels: Fatal, Error, Warning, Info, Trace &
Debug
Environment Variables
You might want to configure the code page environment variable for a Repository
Service process when the Repository Service process requires a different database
client code page than the Integration Service process running on the same node.
For example, the Integration Service reads from and writes to databases using the
UTF-8 code page. The Integration Service requires that the code page environment
variable be set to UTF-8. However, you have a Shift-JIS repository that requires that
the code page environment variable be set to Shift-JIS. Set the environment variable
on the node to UTF-8. Then add the environment variable to the Repository Service
process properties and set the value to Shift-JIS.
1. The Load Balancer checks different resource provision thresholds on the node
depending on the Dispatch mode set. If dispatching the task causes any threshold
to be exceeded, the Load Balancer places the task in the dispatch queue, and it
dispatches the task later
2. The Load Balancer dispatches all tasks to the node that runs the master
Integration Service process
1. The Load Balancer verifies which nodes are currently running and enabled
2. The Load Balancer identifies nodes that have the PowerCenter resources required
by the tasks in the workflow
3. The Load Balancer verifies that the resource provision thresholds on each
candidate node are not exceeded. If dispatching the task causes a threshold to be
exceeded, the Load Balancer places the task in the dispatch queue, and it
dispatches the task later
4. The Load Balancer selects a node based on the dispatch mode
In the next blog we can see how to implement CDC when reading from
Salesforce.com