Você está na página 1de 26

Performance Tuning for

Informatica PowerCenter

What Affects Performance?


These can make a substantial performance difference:

Database Tuning Sources, Targets, Repositories, Query


Tuning, Statistics Collection, Indexes, other techniques

Network - Capacity, Utilization


Disk Speed, Space, Tuning, Storage Type, Mount Type
Server Database, Informatica RAM, CPU,

Configuration
PowerCenter add on features Grid, Partitioning

Informatica Platform Architecture


Informatica
Analyst

Analyst Service

Model
Repository
Service

Informatica
Developer

Reposito
ry
Data Integration Service

Informatica
Administrator

ISP

Profile
Service

Mapping
Service

Staging Data

SQL
Service

Integration Service

Data Object Cache


Profile
Warehouse

Workflow
Manager

Mapping
Designer

Repository
Service
Reposito
ry

Metadata
Manager

Metadata Manager Service

MM
Warehouse

Comparison Across Tools


Most

techniques are the same except:

PowerCenter
Intended for larger data volumes
Use the Source Qualifier JOIN, SORT
and FILTER features instead of
additional JOINER, SORTER or
FILTER transformations in the
mapping (Why: Includes the join, sort
and filter conditions in the WHERE
clause in the Source Qualified
database query)

Cache Lookup transformations (Why:


Prepare for managing larger amounts
of data from Source)

Memory Settings in Session

Data Services
Intended for small data volumes
Use additional JOINER, SORTER or
FILTER transformations in the
mapping instead of placing in the
READ transformation (Why: Uses
native Pushdown Optimization and
avoids blocking pushdown on
subsequent transformations)

Do not cache Lookup Transformations


(Why: minimize I/O to a few rows of
required data from Source based on
request. Select only what is used for
those rows.)

Memory settings in DIS

Key Techniques

Adjusting Memory for DTM Buffer (PC and IDS)


Prevent Blockage of Pushdown (IDS, and PC if you have
PDO)

Lookup Caching (PC)


Cache and Optimization (IDS)
Avoid Unnecessary I/O
Avoid Duplicate Work

Anatomy of a Session
Integration Service or
Data Integration Service
Data Transformation Manager
(DTM)
DTM Buffer
Source
data READER

WRITER

TRANSFORMER

Target
data

Transformation
caches

PC Session Memory Settings

Generally set the


DTM Buffer Size
to Auto

For large jobs,


set as large as
needed, defining
it using all
numeric values
(2000000000 for
2 GB)

Set to fit 100

rows or more

Maximum Memory Settings

System uses the smaller of the two


Applies to DTM Buffer Size, Buffer Block Size, and
transformation caches set to Auto
Small changes can have a huge impact on performance, as it
provides more DTM Buffer space

Throughput and Busy Percentage in Log


View Throughput values in PowerCenter Workflow Monitor
Values less than 5000 rows/sec are slow and need tuning,
unless the table size is small too

Review log file Busy Percentage to evaluate


performance/bottleneck status
Source, target and transformation Busy Percentage
Even distribution all 100% -- is ideal

One section busy and others not busy indicates a


bottleneck

Other memory efficiency messages in the log review and


modify memory accordingly

Lookups

Minimize Lookup calls with Unconnected Lookups when a conditional lookup is required

Eliminate Records from the Lookup


Leverage the Lookup source filter property
SQL Override to include only rows in the Source for
insert/update check on target or referential integrity checks

For large Lookups, consider joining the rows instead


Uncached lookups require queries of the source for each
lookup row

Cached lookups query the source once for a complete set


of data

Lookup Cache

Reuse Cache within a Mapping

Structure the lookups identically, same ports, same lookup


condition, and it will be reused automatically

Include additional ports needed by all lookups to the table,


even if not used in all to enable reuse

Reuse Cache Between Mappings Named Cache

Enable concurrent caches

Build cache first time and use across several mappings


In the first session, indicate to rebuild the cache if it exists

Add pipelines to build cache at the start of the mapping for


all lookups concurrently, instead of default sequentially when
lookup process the first row

Increase the index and data cache sizes - hold all data in
memory without paging to disk

Determine Potential Cache Size


Source

Lookup

Lookup

Table

Items

Manufacturer

DIM_Items

Rows

5000

200

100000

Example 1 Lookup
Manufacturer
Cached
Build Cache

Example 2 Lookup
Dim_Items
Uncached
Cached
200

Uncached

100000

Read Source Records 5000

5000

5000

5000

Execute Lookup

5000

5000

Total Disk Reads

5200

10000

105000

10000

In this example, for a small lookup table, caching is best, for a


large one, uncached is best

Reduce I/O and Reduce Number of Rows

Reduce amount of data moving across the network - Filter


data in the database

Reduce the amount of data selected from the source Reduce the number of ports selected in sources, lookups

Select Distinct in Source Qualifier Filter rows


Reduce writes to the log file
Set Trace Level lower (e.g. to WARN)
Fix Transformation Errors
Fix Predicate syntax to avoid messages (view in IDS logs)

Single Pass Reads

Read source once and write to multiple targets

Reduce I/O and Reduce Number of Rows

Remove unnecessary ports and links to unnecessary


ports

Reduce the number of transformations - keep the Source


and Target anchors

Tune Database Queries and SQL statements see log file


for SQL

Reduce Rows in Pipeline


Move transformations that decrease rows upstream in
mapping (Filter, Joiner, Aggregator, Rank)

Move transformations that increase rows downstream


(Normalizer, Joiner, SQL, Java, Lookup with Return All Rows
selected)

Avoid Duplicate Work

Place common logic upstream


Perform functions needed in each data stream prior to the
split to reduce duplicating logic or function calls

Calculate once, use many times

Avoid calculating or testing the same value over and over


Calculate it once in an Expression transformation, and set a
true/false flag

Within an Expression, use variables to calculate a value


used several times

Additional Informatica Resources


More resources available at:

http://mysupport.informatica.com
http://velocity.informatica.com/
http://marketplace.informatica.com
PowerCenter and Data Services Help
Product manuals

PowerCenter

and
Data
Performance Tuning Guides

Services

PowerCenter and Informatica Developer


Transformation Guides

PowerCenter and Informatica Administrator


Guides

Developer Community
Informatica Professional Services

Additional Performance
Improvement
Tips/Techniques

Efficient Data Formats

Flat File vs. Database Table - Flat file located on the server
loads faster than database table located on the server

Fixed-width files are faster than delimited - Delimited files


require extra parsing

UNICODE vs. ASCII data movement mode - Unicode data


requires 2 bytes per character; ASCII requires 1 byte per
character

Disable High Precision - Applies to decimal data type.


Use native drivers instead of ODBC drivers

Transformation Tips

Avoid Stored Procedures, External Procedures, and


Advanced External Procedures - Slow and has overhead

Use sorted input for Aggregators, Union and Joiners


Connect only the ports that will be used downstream Lookup and other transformations

Use the where clause in the Lookup Filter instead of a


Lookup SQL Override

Sort in Source Qualifier (PC)


Join in Source Qualifier (PC)
Pushdown Sort (IDS) with Sorter transformation
Pushdown Join (IDS) with Joiner transformation

Mapping Tips

Simplify nested functions

instead of:
IIF(condition1,result1,IIF(condition2, result2,IIF ))))))))))))

try:

DECODE (TRUE, condition1, result1, , conditionn, resultn)


DECODE processing stops as soon as a condition is matched

Entire IIF statement evaluated even if earlier condition succeeds

Functions are more expensive than operators


Use || instead of CONCAT()

Mapping Tips

Avoid Data Type Conversions


Avoid implicit conversion, especially repeated ones
Avoid repeated conversions by explicitly converting and managing
the data type as it moves through the mapping

Trim spaces before comparing CHAR and VARCHAR


ports

Use Integer values for Filter and Lookup conditions, even

if datatype conversion is required, instead of Char or Data


Minimize the number of targets per mapping

More targets = poorer performance

Reduce the use of function calls


SUM(Col A) + SUM(Col B) Can be SUM(Col A + Col B)

Mapping Tips
Use Integers instead of Strings or Dates and Avoid
Complex Expressions

Comparison in Lookups, Filters and Routers


Group By in Aggregator

XML Reader / Mid-Stream XML Parser


Remove unprojected groups
Dont leave port size lengths as infinite.

Minimize use of Update Strategy transformations

Mapping Tips

Avoid Unnecessary Sorting on already sorted input


Replace multiple filter transformations with a router
transformation - Reduces the number of transformations
in the mapping

Use Filters or Routers to drop rejected rows from an


Update Strategy transformation

Select the appropriate Master Source in Joiner


Source with fewer rows (unsorted joiner)
Source with fewer duplicate keys (sorted joiner)

Normal Joins are faster


Join in database

Mapping Tips

Reusable Sequence Generator - Number of Cached

Values Property
Setting too low causes frequent repository access,
which impacts performance set to 1000 or similar
Unused values in a block are lost leading to gaps in the
sequence

Change unused lookup ports to be neither input nor

output reduces generated SQL results


Use a SQL override with an ORDER BY clause to override
the default ORDER BY for lookups with many output ports

Mapping Tips

Minimize unconnected lookup execution in IIF expression


- Set up the condition so unconnected lookup is in the
THEN clause

Use "Return any matching value" in Lookup whenever


possible - Creates index based on key ports rather than all
lookup ports

Place Lookup Conditions in the following order:


Equal to (=) (numeric ports first, then string or date)
Less than (<), greater than (>), less than or equal to (<=),
greater than or equal to (>=)

Not equal to (!=)

Session/Runtime Settings

Increase the Target Commit Interval - Reduce database


overhead

Use pre-post session commands to drop and rebuild


indexes and constraints - Improves load time when loading
many rows, balance with index/constraint rebuild time

Você também pode gostar