
How to Use PowerCenter with Teradata to

Load and Unload Data

2009 Informatica Corporation

Abstract
This article explains how to move data between PowerCenter and Teradata databases. It explains when to use
Teradata relational connections, Teradata load and unload utilities, or pushdown optimization to move data. This article
also lists issues you might encounter when loading data to or unloading data from Teradata and the workarounds for
these issues.

Table of Contents
Overview
    Prerequisites
Teradata Relational Connections
    Creating a Teradata Relational Connection
Standalone Load and Unload Utilities
    Teradata FastLoad
    Teradata MultiLoad
    Teradata TPump
    Teradata FastExport
Teradata Parallel Transporter
Pushdown Optimization
    Achieving Full Pushdown without Affecting the Source System
    Achieving Full Pushdown with Parallel Lookups
    Achieving Pushdown with Sorted Aggregation
    Achieving Pushdown for an Aggregator Transformation
    Achieving Pushdown when a Transformation Contains a Variable Port
    Improving Pushdown Performance in Mappings with Multiple Targets
    Removing Temporary Views when a Pushdown Session Fails
Issues Affecting Loading to and Unloading from Teradata
    Making 32-bit Load and Unload Utilities Work with 64-bit PowerCenter
    Increasing Lookup Performance
    Performing Uncached Lookups with Date/Time Ports in the Lookup Condition
    Restarting a Failed MultiLoad Job Manually
    Configuring Sessions that Load to the Same Table
    Setting the Checkpoint when Loading to Named Pipes
    Loading from Partitioned Sessions
    Loading to Targets with Date/Time Columns
    Hiding Passwords
    Using Error Tables to Identify Problems during Loading

Overview
Teradata is a global technology leader in enterprise data warehousing, business analytics, and data warehousing
services. Teradata provides a powerful suite of software that includes the Teradata Database, data access and
management tools, and data mining applications. PowerCenter works with the Teradata Database and Teradata tools
to provide a data integration solution that allows you to integrate data from virtually any business system into Teradata
as well as leverage Teradata data for use in other business systems.
PowerCenter uses the following techniques when extracting data from and loading data to the Teradata database:
- ETL (extract, transform, and load). This technique extracts data from the source systems, transforms the data within PowerCenter, and loads it to target tables. The PowerCenter Integration Service transforms all data. If you use the PowerCenter Partitioning option, the Integration Service also parallelizes the workload.

- ELT (extract, load, and then transform). This technique extracts data from the source systems, loads it to user-defined staging tables in the target database, and transforms the data within the target system using generated SQL. The SQL queries include a final insert into the target tables. The database system transforms all data and parallelizes the workload, if necessary.

- ETL-T (ETL and ELT hybrid). This technique extracts data from the source systems, transforms the data within PowerCenter, loads the data to user-defined staging tables in the target database, and further transforms the data within the target system using generated SQL. The SQL queries include a final insert into the target tables. The ETL-T technique is optimized within PowerCenter so that the transformations that perform better within the database system can be performed there and the Integration Service performs the other transformations.

To perform ETL operations, configure PowerCenter sessions to use a Teradata relational connection, a Teradata
standalone load or unload utility, or Teradata Parallel Transporter. To use ELT or ETL-T techniques, configure
PowerCenter sessions to use pushdown optimization.
Use a Teradata relational connection to communicate with Teradata when PowerCenter sessions load or extract small
amounts of data (<1 GB per session). Teradata relational connections use ODBC to connect to Teradata. ODBC is a
native interface for Teradata. Teradata provides 32- and 64-bit ODBC drivers for Windows and UNIX platforms. The
driver bit mode must be compatible with the bit mode of the platform on which the PowerCenter Integration Services
runs. For example, 32-bit PowerCenter only runs with 32-bit drivers.
Use a standalone load or unload utility when PowerCenter sessions extract or load large amounts of data (>1 GB per
session). Standalone load and unload utilities can increase session performance by loading or extracting data directly
from a file or pipe rather than running the SQL commands to load or extract the same data. All Teradata standalone
load and unload utilities are fully parallel to provide optimal and scalable performance for loading data to or extracting
data from the Teradata Database. PowerCenter works with the Teradata FastLoad, MultiLoad, and TPump load utilities
and the Teradata FastExport unload utility.
Use Teradata Parallel Transporter for PowerCenter sessions that must quickly load or extract large amounts of data
(>1 GB per session). Teradata Parallel Transporter provides all of the capabilities of the standalone load and unload
utilities, plus it provides more granular control over the load or unload process, enhanced monitoring capabilities, and
the ability to automatically drop log, error, and work tables when a session starts. Teradata Parallel Transporter is a
parallel, multi-function extract and load environment that provides access to PowerCenter using an open API. It can
load dozens of files using a single control file. It also allows you to distribute the workload among several CPUs,
eliminating bottlenecks in the data loading and extraction processes.
Use pushdown optimization to reduce the amount of data passed between Teradata and PowerCenter or when the
Teradata database can process transformation logic faster than PowerCenter. Pushdown optimization improves
session performance by pushing as much transformation logic as possible to the Teradata source or target database.
PowerCenter processes any transformation logic that cannot be pushed to the database. For example, pushing Filter
transformation logic to the source database can reduce the amount of data passed to PowerCenter, which decreases
session run time. When you run a session configured for pushdown optimization, PowerCenter translates the

transformation logic into SQL queries and sends the queries to the Teradata database. The Teradata database
executes the SQL queries to process the transformation logic.

Prerequisites
Before you run sessions that move data between PowerCenter and Teradata, you might want to install Teradata client
tools. You also need to locate the Teradata TDPID.

Teradata Client Tools


Teradata client tools help you communicate with the Teradata database and debug problems that occur when a
session loads data to or extracts data from the Teradata database.
You can install the following Teradata client tools:
y

BTEQ. A general-purpose, command-line utility (similar to Oracle SQL*Plus) that enables you to communicate
with one or more Teradata databases.

Teradata SQL Assistant. A GUI-based tool that allows you to retrieve data from any ODBC-compliant database
server and manipulate and store the data in desktop applications. Teradata Queryman is the older version of this
tool.

Install BTEQ or Teradata SQL Assistant to help you debug problems that occur when loading to and extracting from
Teradata. Both tools are included in the Teradata Utility Pack, which is available from Teradata.
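For example, the following minimal BTEQ script logs on to a Teradata instance, runs a query against the system catalog, and logs off. The TDPID demo1099 and the infatest credentials are hypothetical placeholders:

.LOGON demo1099/infatest,infatest;
SELECT COUNT(*) FROM DBC.Tables;
.LOGOFF;
.QUIT;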

TDPID
The Teradata TDPID indicates the name of the Teradata instance and defines the name a client uses to connect to a
server. When you use a Teradata Parallel Transporter or a standalone load or unload utility with PowerCenter, you
must specify the TDPID in the connection properties.
The Teradata TDPID appears in the hosts file on the machines on which the Integration Service and PowerCenter
Client run. By default, the hosts file appears in the following location:
- UNIX: /etc/hosts
- Windows: %SystemRoot%\system32\drivers\etc\hosts*

* The actual location is defined in the Registry key HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\DataBasePath

The hosts file contains client configuration information for Teradata. In a hosts file entry, the TDPID precedes the string
cop1.
For example, the hosts file contains the following entries:
127.0.0.1       localhost   demo1099cop1
192.168.80.113  td_1        custcop1
192.168.80.114  td_2        custcop2
192.168.80.115  td_3        custcop3
192.168.80.116  td_4        custcop4

The first entry has the TDPID demo1099. This entry tells the Teradata database that when a client tool references the
Teradata instance demo1099, it should direct requests to localhost (IP address 127.0.0.1).
The following entries have the same TDPID, cust. Multiple hosts file entries with the same TDPID indicate the
Teradata instance is configured for load balancing among nodes. When a client tool attempts to reference Teradata
instance cust, the Teradata database directs requests to the first node in the entry list, td_1. If it takes too long for
the node to respond, the database redirects the request to the second node, and so on. This process prevents the first
node, td_1, from becoming overloaded.

Teradata Relational Connections


Teradata relational connections use ODBC to connect to Teradata. PowerCenter uses the ODBC Driver for Teradata
to retrieve metadata and read and write to Teradata. To establish ODBC connectivity between Teradata and
PowerCenter, install the ODBC Driver for Teradata on each PowerCenter machine that communicates with Teradata.
The ODBC Driver for Teradata is included in the Teradata Tools and Utilities (TTU). You can download the driver from
the Teradata web site.
Use a Teradata relational connection when extracting or loading small data sets, usually <1 GB per session. In
sessions that extract or load large amounts of data, a standalone load or unload utility or Teradata Parallel Transporter
is usually faster than a Teradata relational connection.
PowerCenter works with the ODBC Driver for Teradata available in the following TTU versions:

PowerCenter Versions    TTU Versions
7.0 - 8.1.1             8.1
8.5 and later           8.2, 12.0

For more information about the TTU versions that work with PowerCenter, see the TTU Supported Platforms and
Product Versions document, which is available from Teradata @Your Service.
Sessions that perform lookups on Teradata tables must use a Teradata relational connection. If a session performs a
lookup on a large, static Teradata table, you might be able to increase performance by using FastExport to extract the
data to a flat file and configuring the session to look up data in the flat file.
If you experience performance problems when using a Teradata relational connection, and you do not want to use a
load or unload utility, you might be able to configure PowerCenter sessions to use pushdown optimization.
If you load or extract data using a Teradata relational connection on UNIX, you must verify the configuration of
environment variables and the odbc.ini file on the machine on which the Integration Service runs. To verify the
environment variable configuration, ensure the Teradata ODBC path precedes the Data Direct driver path information
in the PATH and shared library path environment variables. Place the Teradata path before the Data Direct path
because both sets of ODBC software use some of the same file names.
To verify the odbc.ini file configuration, make sure there is an entry for the Teradata ODBC driver in the [ODBC Data
Sources] section of odbc.ini. The following excerpt from an odbc.ini file shows a Teradata ODBC driver (tdata.so)
entry on Linux:
[ODBC Data Sources]
intdv12=tdata.so
[intdv12]
Driver=/usr/odbc/drivers/tdata.so
Description=NCR 3600 running Teradata V12
DBCName=intdv12
SessionMode=Teradata
CharacterSet=UTF8
StCheckLevel=0
DateTimeFormat=AAA
LastUser=
Username=
Password=
Database=
DefaultDatabase=

For more information about configuring odbc.ini, see the PowerCenter Configuration Guide and the ODBC Driver for
Teradata User Guide.

Creating a Teradata Relational Connection


When you create a Teradata (relational) connection object in the Workflow Manager, choose Teradata, and not
ODBC, as the connection type in the connection properties. When you choose Teradata as the connection type, the
Integration Service still uses Teradata ODBC to connect to Teradata.
Although both ODBC and Teradata connection types might work, the Integration Service communicates with the
Teradata database more efficiently when you choose the Teradata connection type. This is especially true if you use
pushdown optimization in a session. If you use pushdown optimization in a Teradata session with an ODBC connection
type, the Integration Service generates database connection driver warning messages.
For more information about creating connection objects in the Workflow Manager, see the PowerCenter Workflow
Basics Guide.

Standalone Load and Unload Utilities


Teradata standalone load and unload utilities are fast, reliable tools that help you export large amounts of data from
Teradata databases and load session target files into Teradata databases. Use a standalone load or unload utility
when PowerCenter sessions extract or load large amounts of data. Standalone load and unload utilities are faster than
Teradata relational connections because they load or extract data directly from a file or pipe rather than run SQL
commands to load or extract the data.
PowerCenter works with the following Teradata standalone load and unload utilities:
- FastLoad. Inserts large volumes of data into empty tables in a Teradata database.
- MultiLoad. Updates, inserts, upserts, and deletes large volumes of data in empty or populated Teradata tables.
- TPump. Inserts, updates, upserts, and deletes data in Teradata tables in near real-time.
- FastExport. Exports large data sets from Teradata tables or views to PowerCenter.

All of these load and unload utilities are included in the Teradata Tools and Utilities (TTU), available from Teradata.
PowerCenter supports all of these standalone load and unload utilities. Support for MultiLoad and TPump has been
available since PowerCenter 6.0. Support for FastLoad was added in PowerCenter 7.0. Support for FastExport was
added in PowerCenter 7.1.3.
Before you can configure a session to use a load or unload utility, create a loader or FastExport (application)
connection in the PowerCenter Workflow Manager and enter a value for the TDPID in the connection attributes. For
more information about creating connection objects in PowerCenter, see the PowerCenter Workflow Basics Guide.
To use a load utility in a session, configure the associated mapping to load to a Teradata target, configure the session
to write to a flat file instead of a relational database, and select the loader connection for the session. To use
FastExport in a session, configure the mapping to extract from a Teradata source, configure the session to read from
FastExport instead of a relational database, and select the FastExport connection for the session. For more information
about configuring a session to use a load or unload utility, see the PowerCenter Advanced Workflow Guide.
When a session transfers data between Teradata and PowerCenter, the following files are created:
- A staging file or pipe. PowerCenter creates a staging file or named pipe for data transfer based on how you configure the connection. Named pipes are generally faster than staging files because data is transferred as soon as it appears in the pipe. If you use a staging file, data is not transferred until all data appears in the file.

- A control file. PowerCenter generates a control file that contains instructions for loading or extracting data. PowerCenter creates the control file based on the loader or FastExport attributes you configure for the connection and the session.

- A log file. The load or unload utility creates a log file and writes error messages to it. The PowerCenter session log indicates whether the session ran successfully, but does not contain load or unload utility error messages. Use the log file to debug problems that occur during data loading or extraction.

By default, loader staging, control, and log files are created in the target file directory. The FastExport staging, control,
and log files are created in the PowerCenter temporary files directory. For more information about these files, see the
PowerCenter Advanced Workflow Guide.
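For illustration, a MultiLoad control file that loads a delimited staging file might look roughly like the following sketch. The database, table, field, and file names are hypothetical, and the file PowerCenter actually generates varies with the loader attributes you configure:

.LOGTABLE testdb.ML_customers;
.LOGON demo1099/infatest,infatest;
.BEGIN IMPORT MLOAD
    TABLES testdb.customers
    ERRORTABLES testdb.ET_customers testdb.UV_customers
    CHECKPOINT 10000;
.LAYOUT datalayout;
.FIELD cust_id * VARCHAR(10);
.FIELD cust_name * VARCHAR(50);
.DML LABEL insert_dml;
INSERT INTO testdb.customers (cust_id, cust_name)
VALUES (:cust_id, :cust_name);
.IMPORT INFILE customers.dat
    FORMAT VARTEXT '|'
    LAYOUT datalayout
    APPLY insert_dml;
.END MLOAD;
.LOGOFF;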

Teradata FastLoad
Teradata FastLoad is a command-line utility that quickly loads large amounts of data to empty tables in a Teradata
database. Use FastLoad for a high-volume initial load or for high-volume truncate and reload operations.
FastLoad is the fastest load utility, but it has the following limitations:
- FastLoad uses multiple sessions to load data, but it can load data to only one table in a Teradata database per job.
- It locks tables while loading data, preventing other users and other instances of FastLoad from accessing the tables during data loading.
- FastLoad only works with empty tables with no secondary indexes.
- It can only insert data.

Teradata MultiLoad
Teradata MultiLoad is a command-driven utility for fast, high-volume maintenance on multiple tables and views of a
Teradata database. Each MultiLoad instance can perform multiple data insert, update, and delete operations on up to
five different tables or views. MultiLoad optimizes operations that rapidly acquire, process, and apply data to Teradata
tables. Use MultiLoad for large volume, incremental data loads.
MultiLoad has the following advantages:
- MultiLoad is very fast. It can process millions of rows in a few minutes.
- MultiLoad supports inserts, updates, upserts, deletes, and data-driven operations in PowerCenter.
- You can use variables and embed conditional logic in MultiLoad control files.
- MultiLoad supports sophisticated error recovery. It allows load jobs to be restarted without having to redo all of the prior work.

MultiLoad has the following limitations:
- MultiLoad is designed for the highest possible throughput, so it can be very resource intensive.
- It locks tables while loading data, preventing other users and other instances of MultiLoad from accessing the tables during data loading.
- Because of its phased nature, there are potentially inconvenient windows of time when MultiLoad cannot be stopped without losing access to target tables.

Teradata TPump
Teradata TPump is a highly parallel utility that can continuously move data from data sources into Teradata tables without locking the affected tables. TPump supports inserts, updates, deletes, and data-driven updates. TPump acquires row hash locks on a database table instead of table-level locks, so multiple TPump instances can load data simultaneously to the same table. TPump is often used to trickle-load a database table. Use TPump for low volume, online data loads.
TPump has the following advantages:
- TPump can refresh database tables in near real-time.
- TPump continuously loads data into Teradata tables without locking the affected tables, so users can run queries while TPump is running.
- TPump is less resource-intensive than MultiLoad because it does not write to temporary tables.
- Users can control the rate at which statements are sent to the Teradata database, limiting resource consumption.
- It supports parallel processing.
- TPump can always be stopped and all of its locks dropped with no ill effect.

TPump is not as fast as the other standalone loaders for large volume loads because it changes the same data block multiple times.

Teradata FastExport
Teradata FastExport is a command-driven utility that uses multiple sessions to quickly transfer large amounts of data
from Teradata sources to PowerCenter. Use FastExport to quickly extract data from Teradata sources.
FastExport has the following advantages:
- It is faster than Teradata relational connections when extracting large amounts of data.
- FastExport can be run in streaming mode, which avoids the need to stage the data file.
- You can encrypt the data transfer between FastExport and the Teradata server.
- FastExport is available for sources and pipeline lookups.


When you create a FastExport connection, verify the settings of the following connection attributes:
- Data encryption. Enable this attribute to encrypt the data transfer between FastExport and the Teradata server so that unauthorized users cannot access the data being transferred across the network.
- Fractional seconds. This attribute specifies the precision of the fractional portion of timestamp data. To avoid session failure or possible data corruption, make sure this value matches the timestamp precision of the column in the Teradata database.

For more information about configuring FastExport connection attributes, see the PowerCenter Advanced Workflow
Guide.

Teradata Parallel Transporter


Teradata Parallel Transporter (PT) is a client application that provides scalable, high-speed, parallel data extraction,
loading, and updating. It uses and expands upon the functions and capabilities of the standalone Teradata load and
unload utilities. Teradata PT supports a single scripting environment with different system operators for extracting and
loading data. It also supports massive parallel extraction and loading, so if you partition a Teradata PT session,
multiple Teradata PT instances can extract or load large amounts of data in the same database tables at the same
time.
To provide the functionality of the standalone load and unload utilities, Teradata PT extracts or loads data using one of the following system operators:
- Export. Exports large data sets from Teradata tables or views and imports the data to PowerCenter for processing using the FastExport protocol.
- Load. Bulk loads large volumes of data into empty Teradata database tables using the FastLoad protocol.
- Update. Batch updates, inserts, upserts, and deletes data in Teradata database tables using the MultiLoad protocol.
- Stream. Continuously updates, inserts, upserts, and deletes data in near real-time using the TPump protocol.

Teradata PT has the following advantages:
- Teradata PT is up to 20% faster than the standalone Teradata load and unload utilities, even though it uses the underlying protocols from the standalone utilities.
- Teradata PT supports recovery for sessions that use the Stream operator when the source data is repeatable. This feature is especially useful when running real-time sessions and streaming the changes to Teradata.
- Users can invoke Teradata PT through a set of open APIs that communicate with the database directly, eliminating the need for a staging file or pipe and a control file.
- Teradata PT eliminates the need to invoke different load and unload utilities to extract and load data.

PowerCenter communicates with Teradata PT using PowerExchange for Teradata Parallel Transporter, which is
available through the Informatica-Teradata Enterprise Data Warehousing Solution. PowerExchange for Teradata
Parallel Transporter was released with PowerCenter 8.1.1.
PowerExchange for Teradata Parallel Transporter provides integration between PowerCenter and Teradata databases
for data extraction and loading. PowerExchange for Teradata Parallel Transporter executes Teradata PT operators
directly through API calls. This improves performance by eliminating the staging file or named pipe. It also improves
security by eliminating the control file, so there is no need to overwrite or store passwords in the control file.
PowerExchange for Teradata Parallel Transporter supports session and workflow recovery. It also captures Teradata
PT error messages and displays them in the session log, so you do not need to check the utility log file when errors
occur.
Before you can configure a session to use Teradata PT, you must create a Teradata PT (relational)
connection in the Workflow Manager and enter a value for the TDPID in the connection attributes. To configure a
session to extract data, configure the associated mapping to read from Teradata, change the reader type for the
session to Teradata Parallel Transporter Reader, and select the Teradata PT connection. To configure a session to
load data, configure the associated mapping to load to Teradata, change the writer type for the session to Teradata
Parallel Transporter Writer, and select the Teradata PT connection. In sessions that load to Teradata, you can also
configure an ODBC connection that is used to automatically create the recovery table in the target database and drop
the log, error, and work tables if a session fails.
For more information about using PowerExchange for Teradata Parallel Transporter, see the PowerExchange for
Teradata Parallel Transporter User Guide.

Pushdown Optimization
When you run sessions that move data between PowerCenter and Teradata databases, you might be able to improve
session performance using pushdown optimization. Pushdown optimization allows you to push PowerCenter
transformation logic to the Teradata source or target database. The PowerCenter Integration Service translates the
transformation logic into SQL queries and sends the SQL queries to the database. The Teradata database executes
the SQL queries to process the mapping logic. The Integration Service processes any mapping logic it cannot push to
the database.

The following figure illustrates how pushdown optimization works with a Teradata database system:
[Figure: Today's ETL approach provides a visual, codeless environment with job control, logging, and full metadata in PowerCenter; pushdown processing (ELT) adds MPP-based performance and automatic scalability by running generated SQL against staging tables in the Teradata source database and the warehouse in the Teradata target database.]
The following figure shows a mapping in which you can increase performance using pushdown optimization:

If you configure this mapping for pushdown optimization, the Integration Service generates an SQL query based on the
Filter and Lookup transformation logic and pushes the query to the source database. This improves session
performance because it reduces the number of rows sent to PowerCenter. The Integration Service processes the Java
transformation logic since that cannot be pushed to the database, and then loads data to the target.
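For a mapping like this one, the source-side SQL that the Integration Service generates might look roughly like the following sketch, with the filter condition pushed into the WHERE clause and the lookup pushed as an outer join. The table and column names are hypothetical; the actual query depends on the mapping:

SELECT s.order_id, s.amount, c.cust_name
FROM orders s
LEFT OUTER JOIN customers c ON (s.cust_id = c.cust_id)
WHERE s.amount > 1000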
Use pushdown optimization to improve the performance of sessions that use Teradata relational connections to
connect to Teradata. In general, pushdown optimization can improve session performance in the following
circumstances:
- When it reduces the number of rows passed between Teradata and PowerCenter. For example, pushing a Filter transformation to the Teradata source can reduce the number of rows PowerCenter extracts from the source.
- When the database server is more powerful than the PowerCenter server. For example, pushing a complex Expression transformation to the source or target improves performance when the database server can evaluate the expression faster than the server on which the PowerCenter Integration Service runs.
- When the generated query can take advantage of prebuilt indexes. For example, pushing a Joiner transformation to the Teradata source improves performance when the database can join tables using indexes and statistics that PowerCenter cannot access.


Pushdown optimization is available with the PowerCenter Pushdown Optimization Option and has been supported
since PowerCenter 8.0. To configure a session to use pushdown optimization, choose a Pushdown Optimization type
in the session properties. You can select one of the following pushdown optimization types:
- None. The Integration Service does not push any transformation logic to the database.

- Source-side. The Integration Service analyzes the mapping from the source to the target or until it reaches a downstream transformation it cannot push to the database. It pushes as much transformation logic as possible to the source database. The Integration Service generates SQL in the following form:
  SELECT <columns> FROM source WHERE (filter/join condition) GROUP BY <columns>

- Target-side. The Integration Service analyzes the mapping from the target back to the source or until it reaches an upstream transformation it cannot push to the database. It pushes as much transformation logic as possible to the target database. The Integration Service generates SQL in the following form:
  INSERT INTO target (<columns>) VALUES (?+1, UPPER(?))

- Full. The Integration Service attempts to push all transformation logic to the target database. If the Integration Service cannot push all transformation logic to the database, it performs both source-side and target-side pushdown optimization. The Integration Service generates SQL in the following form:
  INSERT INTO target (<columns>) SELECT <expressions> FROM source

- $$PushdownConfig. Allows you to run the same session with different pushdown optimization configurations at different times.

The Integration Service can push the logic for the following transformations to Teradata:
Transformation          Pushdown Types
Aggregator              Source-side, Full
Expression*             Source-side, Target-side, Full
Filter                  Source-side, Full
Joiner                  Source-side, Full
Lookup, connected       Source-side, Full
Lookup, unconnected     Source-side, Target-side, Full
Router                  Source-side, Full
Sorter                  Source-side, Full
Source Qualifier        Source-side, Full
Target                  Target-side, Full
Union                   Source-side, Full
Update Strategy         Full

* PowerCenter expressions can be pushed down only if there is an equivalent database function. To work around this issue, you can enter an SQL override in the source qualifier.

When you use pushdown optimization with sessions that extract from or load to Teradata, you might need to modify
mappings or sessions to take full advantage of the performance improvements possible with pushdown optimization.
You might also encounter issues if a pushdown session fails.


For example, you might need to perform the following tasks:

- Achieve full pushdown optimization without affecting the source. To achieve full pushdown optimization for a session in which the source and target reside in different database management systems, you can stage the source data in the Teradata target database. For more information, see Achieving Full Pushdown without Affecting the Source System.

- Achieve full pushdown optimization with parallel lookups. To achieve full pushdown optimization for a mapping that contains parallel lookups, redesign the mapping to serialize the lookups. For more information, see Achieving Full Pushdown with Parallel Lookups.

- Achieve pushdown optimization with sorted aggregation. To achieve pushdown optimization for a mapping that contains a Sorter transformation before an Aggregator transformation, redesign the mapping to remove the Sorter transformation. For more information, see Achieving Pushdown with Sorted Aggregation.

- Achieve pushdown optimization for an Aggregator transformation with pass-through ports. To achieve pushdown optimization for a mapping that contains an Aggregator transformation with pass-through ports, redesign the mapping to remove the pass-through ports from the Aggregator transformation. For more information, see Achieving Pushdown for an Aggregator Transformation.

- Achieve pushdown optimization when a transformation contains a variable port. To achieve pushdown optimization for a mapping that contains a transformation with a variable port, update the expression to eliminate the variable port. For more information, see Achieving Pushdown when a Transformation Contains a Variable Port.

- Improve pushdown performance in mappings with multiple targets. To increase performance when using full pushdown optimization for mappings with multiple targets, you can stage the target data in the Teradata database. For more information, see Improving Pushdown Performance in Mappings with Multiple Targets.

- Remove temporary views after a session that uses an SQL query fails. If you run a pushdown session that uses an SQL query, and the session fails, the Integration Service might not drop the views it creates in the source database. You can remove the views manually. For more information, see Removing Temporary Views when a Pushdown Session Fails.

For more information about pushdown optimization, see the PowerCenter Advanced Workflow Guide and the
PowerCenter Performance Tuning Guide.

Achieving Full Pushdown without Affecting the Source System


You can stage source data in the Teradata target database to achieve full pushdown optimization. Stage source data
in the target when the mapping contains a source that does not reside in the same database management system as
the Teradata target.
For example, the following mapping contains an OLTP source and a Teradata target:

Since the source and target tables reside in different database management systems, you cannot configure the
session for full pushdown optimization as it is. You could configure the session for source-side pushdown optimization,
which would push the Filter and Lookup transformation logic to the source. However, pushing transformation logic to a
transactional source might reduce performance of the source database.
To avoid the performance problems caused by pushing transformation logic to the source, you can reconfigure the
mapping to stage the source data in the target database.


To achieve full pushdown optimization, redesign the mapping as follows:

1. Create a simple, pass-through mapping to pass all source data to a staging table in the Teradata target database. Configure the session to use Teradata PT or a standalone load utility to load the data to the staging table. Do not configure the session to use pushdown optimization.

2. Configure the original mapping to read from the staging table. Configure the session to use full pushdown optimization. The Integration Service pushes all transformation logic to the Teradata database, increasing session performance.

Achieving Full Pushdown with Parallel Lookups


The PowerCenter Integration Service cannot push down mapping logic that contains parallel Lookup transformations. When multiple Lookup transformations are present in different branches of a pipeline and the branches merge downstream, the Integration Service processes all transformations after the pipeline branch.
For example, the Integration Service cannot fully push down the following mapping:

To achieve full pushdown optimization, redesign the mapping so that the lookups are serialized as follows:

When you serialize the Lookup transformations, the Integration Service generates an SQL query in which the lookups
become part of a subquery. The Integration Service can then push the entire query to the source database.


Achieving Pushdown with Sorted Aggregation


The Integration Service cannot push an Aggregator transformation to Teradata if it is downstream from a Sorter
transformation. The Integration Service processes the Aggregator transformation.
For example, the Integration Service cannot push down the Aggregator transformation in the following mapping:

To redesign this mapping to achieve full or source-side pushdown optimization, configure the Aggregator
transformation so that it does not use sorted input, and remove the Sorter transformation. For example:

Achieving Pushdown for an Aggregator Transformation


The Integration Service cannot push an Aggregator transformation to Teradata if the Aggregator transformation
contains pass-through ports. To achieve source-side or full pushdown optimization for a mapping that contains an
Aggregator transformation with pass-through ports, redesign the mapping to remove the pass-through ports from the
Aggregator transformation.

Achieving Pushdown when a Transformation Contains a Variable Port


The Integration Service cannot push down transformation logic when the transformation contains a variable port. To
achieve pushdown optimization for a mapping that contains a transformation with a variable port, update the
transformation expression to eliminate the variable port. For example, a transformation contains a variable and an
output port with the following expressions:
- Variable port expression: NET_AMOUNT = AMOUNT - FEE
- Output port expression: DOLLAR_AMT = NET_AMOUNT * RATE

To achieve pushdown optimization for the mapping, remove the variable port and reconfigure the output port as follows:

- Output port expression: DOLLAR_AMT = (AMOUNT - FEE) * RATE

Improving Pushdown Performance in Mappings with Multiple Targets


If you configure a mapping that contains complex transformation logic and multiple targets for full pushdown
optimization, the Integration Service generates one INSERT SELECT SQL query for each target. This makes
pushdown optimization inefficient because it can cause duplicate processing of complex transformation logic within the
database. To improve session performance, redesign the original mapping to stage the target data in the Teradata
database. Then create a second mapping that uses the staging table as the source.


For example, the following mapping contains two Teradata sources and two Teradata targets, all in the same RDBMS:

To achieve full pushdown optimization, redesign the mapping as follows:

1. Configure the original mapping to write to a staging table in the Teradata target database. Configure the session to use full pushdown optimization.

2. Create a second mapping to pass all target data from the staging table to the Teradata targets. Configure the session to use full pushdown optimization.

Removing Temporary Views when a Pushdown Session Fails


In a mapping, the Source Qualifier transformation provides the SQL Query option to override the default query. You
can enter an SQL statement supported by the source database. When you override the default SQL query for a
session configured for pushdown optimization, the Integration Service creates a view to represent the SQL override. It
then runs an SQL query against this view to push the transformation logic to the database.
To use an SQL override in a session configured for pushdown optimization, enable the Allow Temporary View for
Pushdown option in the session properties. This option allows the Integration Service to create temporary view objects
in the database when it pushes the session to the database. The Integration Service uses a prefix of PM_V for the view
objects it creates. When the session completes, the Integration Service drops the view from the database. If the
session does not complete successfully, the Integration Service might not drop the view.
To search for views generated by the Integration Service, run the following query against the Teradata source database:
SELECT TableName FROM DBC.Tables
WHERE CreatorName = USER
AND TableKind = 'V'
AND TableName LIKE 'PM\_V%' ESCAPE '\'
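After you identify the leftover views, drop each one manually. The view name below is a hypothetical placeholder; substitute the names the query returns:

DROP VIEW PM_V1_ORDERS;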

To avoid problems when you run a pushdown session that contains an SQL override, use the following guidelines:
- Ensure that the SQL override syntax is compatible with the Teradata source database. PowerCenter does not validate the syntax, so test the query before you push it to the database.
- Do not use an ORDER BY clause in the SQL override.
- Use ANSI outer join syntax in the SQL override. If the Source Qualifier transformation contains Informatica outer join syntax in the SQL override, the Integration Service processes the Source Qualifier transformation logic.
- If the Source Qualifier transformation is configured for a distinct sort and contains an SQL override, the Integration Service ignores the distinct sort configuration.
- If the Source Qualifier transformation contains multiple partitions, specify the SQL override for all partitions.
- Do not use a Sequence Generator transformation in the mapping. Teradata does not have a sequence generator function or operator.

Issues Affecting Loading to and Unloading from Teradata


This section describes issues you might encounter when you move data between PowerCenter and Teradata.

Making 32-bit Load and Unload Utilities Work with 64-bit PowerCenter
Applies to: FastLoad, MultiLoad, TPump, FastExport
If you use 64-bit PowerCenter, you need to reset the library path to make PowerCenter work with the 32-bit Teradata
load and unload utilities. You must reset the library path before you can run a session that invokes a load or unload
utility.
To reset the library path, you need to replace the loader or FastExport executable with a shell script. The following
procedure explains how to reset the library path for TPump on AIX. You can use the same method to reset the library
path for the other utilities on Linux or other UNIX operating systems.
To reset the library path:

1. Create a shell script named <executable>_infa, for example, tpump_infa:
   #!/bin/sh
   # Point the utility at the 32-bit Teradata libraries.
   LIBPATH=/usr/lib;export LIBPATH
   COPLIB=/usr/lib;export COPLIB
   COPERR=/usr/lib;export COPERR
   PATH=$PATH:$INFA_HOME/server/infa_shared/TgtFiles
   # Invoke the real utility with the arguments PowerCenter passes in.
   exec tpump "$@"
   exit $?

2. In the loader connection in the Workflow Manager, set the External Loader Executable attribute (for a load utility) or the Executable Name attribute (for FastExport) to the name of the shell script. For TPump, for example, change the External Loader Executable from tpump to tpump_infa.

Increasing Lookup Performance


Applies to: Teradata relational connections, FastExport
Sessions that perform lookups on Teradata tables must use Teradata relational connections. If you experience
performance problems when running a session that performs lookups against a Teradata database, you might be able
to increase performance in the following ways:
- Use FastExport to extract data to a flat file and perform the lookup on the flat file.
- Enable or disable the lookup cache.


Using FastExport to Extract Lookup Data


If a session performs a lookup on a large, static Teradata table, you might be able to increase performance by using
FastExport to extract the data to a flat file and configuring the session to look up data in the flat file.
To do this, redesign the mapping as follows:

1. Create a simple, pass-through mapping to pass the lookup data to a flat file. Configure the session to extract data to the flat file using FastExport.

2. Configure the original mapping to perform the lookup on the flat file.

Note: If you redesign the mapping using this procedure, you can further increase performance by specifying an
ORDER BY clause on the FastExport SQL and enabling the Sorted Input property for the lookup file. This prevents
PowerCenter from having to sort the file before populating the lookup cache.
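For example, the FastExport SQL for the staging session can be as simple as the following query. The table and column names are hypothetical:

SELECT cust_id, cust_name, cust_region
FROM testdb.customers
ORDER BY cust_id;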

Enabling or Disabling the Lookup Cache


In sessions that perform lookups on Teradata tables, you might also be able to increase performance by enabling or
disabling the lookup cache. When you enable lookup caching, the Integration Service queries the lookup source once,
caches the values, and looks up values in the cache during the session. The lookup uses ODBC to populate the cache.
When you disable lookup caching, each time a row passes into the transformation, the Integration Service issues a
select statement to the lookup source for lookup values.
Enabling the lookup cache has the following advantages:
- The Integration Service can search the cache very quickly.
- Caches can be kept completely in memory.
- Using a lookup cache prevents the Integration Service from making many separate calls to the database server.

The result of the lookup query and processing is the same whether or not you cache the lookup table. However, using a lookup cache can increase session performance for relatively static data in smaller lookup tables. Generally, it is better to cache lookup tables that need less than 300 MB.
For data that changes frequently or is stored in larger lookup tables, disabling caching can improve overall throughput.
Do not cache the lookup tables in the following circumstances:
- The lookup tables are so large that they cannot be stored on the local system.
- There are not enough inodes or blocks to save the cache files.
- You are not allowed to save cache files on the Informatica system.
- The amount of time needed to build the cache exceeds the amount of time saved by caching.

To enable or disable the lookup cache, enable or disable the Lookup Caching Enabled option in the Lookup
transformation properties. For more information about the lookup cache, see the PowerCenter Transformation Guide
and the PowerCenter Performance Tuning Guide.

Performing Uncached Lookups with Date/Time Ports in the Lookup Condition


Applies to: Teradata relational connections
When the Integration Service performs an uncached lookup on a Teradata database, the session fails if any
transformation port in the lookup condition contains a date/time port. The Integration Service writes the following
Teradata error message to the session log:
[][ODBC Teradata Driver][Teradata RDBMS] Invalid operation on an ANSI Datetime or
Interval value.


To work around this issue, perform either of the following actions:
- Apply the Teradata ODBC patch 3.2.011 or later and remove NoScan=Yes from the odbc.ini file.
- Configure the Lookup transformation to use a lookup cache, or remove the date/time port from the lookup condition.

Restarting a Failed MultiLoad Job Manually


Applies to: MultiLoad
When loading data, MultiLoad puts the target table into the MultiLoad state and creates a log table for the target table. After successfully loading the data, it returns the target table to the normal (non-MultiLoad) state and deletes the log table. When you load data using MultiLoad and the MultiLoad job fails for any reason, MultiLoad reports an error and leaves the target table in the MultiLoad state. Additionally, MultiLoad queries the log table to check for errors. If a target table is in the MultiLoad state or if a log table exists for the target table, you cannot restart the job.
To recover from a failed MultiLoad job, you must release the target table from the MultiLoad state and drop the MultiLoad log table. To do this, enter the following commands using BTEQ or Teradata SQL Assistant:
drop table ML_<table name>;
release mload <table name>;

Note that PowerCenter adds the ML_ prefix to the MultiLoad log table name. If you use a hand-coded MultiLoad control file, the log table can have any name.
For example, to recover from a failed job that attempted to load data to table td_test owned by user infatest, enter
the following commands using BTEQ:
BTEQ -- Enter your DBC/SQL request or BTEQ command:
drop table infatest.mldlog_td_test;
drop table infatest.mldlog_td_test;
*** Table has been dropped.
*** Total elapsed time was 1 second.
BTEQ -- Enter your DBC/SQL request or BTEQ command:
release mload infatest.td_test;
release mload infatest.td_test;
*** Mload has been released.
*** Total elapsed time was 1 second.

Configuring Sessions that Load to the Same Table


Applies to: MultiLoad
While Teradata MultiLoad loads data to a database table, it locks the table. MultiLoad requires that all instances handle
wait events so they do not try to access the same table simultaneously.
If you have multiple PowerCenter sessions that load to the same Teradata table using MultiLoad, set the Tenacity
attribute for the session to a value that is greater than the expected run time of the session. The Tenacity attribute
controls the amount of time a MultiLoad instance waits for the table to become available. Also configure each session
to use unique log file names.
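In the generated MultiLoad control file, the setting surfaces as an option of the .BEGIN IMPORT MLOAD statement, roughly as in the following hypothetical fragment. TENACITY is specified in hours; SLEEP, the retry interval in minutes, is a related option not discussed above:

.BEGIN IMPORT MLOAD
    TABLES testdb.customers
    TENACITY 6
    SLEEP 10;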
For more information about the Tenacity attribute, see the PowerCenter Advanced Workflow Guide.


Setting the Checkpoint when Loading to Named Pipes


Applies to: FastLoad, MultiLoad, TPump
If you configure a session to load to Teradata using a named pipe, set the checkpoint loader attribute to 0 to prevent
the loader from performing checkpoint operations. Teradata loaders use checkpoint values to recover or restart a failed
loader job. When a loader job that uses a staging file fails, you can restart it from the last checkpoint. When the loader
uses a named pipe, checkpoints are not used.
Setting the checkpoint attribute to 0 increases loader performance, since the loader job does not have to keep track of
checkpoints. It also prevents the broken pipe errors and session failures that can occur when a nonzero checkpoint is
used with a named pipe.
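In a generated MultiLoad control file, for example, the checkpoint value appears as a CHECKPOINT option of the .BEGIN IMPORT MLOAD statement. The fragment below is a hypothetical sketch:

.BEGIN IMPORT MLOAD
    TABLES testdb.customers
    CHECKPOINT 0;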

Loading from Partitioned Sessions


Applies to: FastLoad, MultiLoad
When you configure multiple partitions in a session that uses staging files, the Integration Service creates a separate
flat file for each partition. Since FastLoad and MultiLoad cannot load data from multiple files, use round-robin
partitioning to route the data to a single file. When you do this, the Integration Service writes all data to the first partition
and starts only one instance of FastLoad or MultiLoad. It writes the following message in the session log:
MAPPING> DBG_21684 Target [TD_INVENTORY] does not support multiple partitions.
All data will be routed to the first partition.

If you do not route the data to a single file, the session fails with the following error:
WRITER_1_*_1> WRT_8240 Error: The external loader [Teradata Mload Loader] does
not support partitioned sessions.
WRITER_1_*_1> Thu Jun 16 11:58:21 2005
WRITER_1_*_1> WRT_8068 Writer initialization failed. Writer terminating.

For more information about loading from partitioned sessions, see the PowerCenter Advanced Workflow Guide.

Loading to Targets with Date/Time Columns


Applies to: FastLoad, MultiLoad, TPump, Teradata PT
The target date format determines the format in which dates can be loaded into the column. PowerCenter only
supports a limited set of Teradata date formats. Therefore, you must check the target date format to avoid problems
loading date/time data.
When you create a date/time column in a Teradata database table, you specify the display format for the date/time
values. The format you choose determines the format in which date/time values are displayed by Teradata client tools
as well as the format in which date/time values can be loaded into the column. For example, if a column in a Teradata table has the date format yyyy/mm/dd and you run a PowerCenter session that loads a date with the format mm/dd/yyyy into the column, the session fails.
Before running a session that loads date/time values to Teradata, verify that the format of each date/time column
in the mapping matches the format of the corresponding date/time column in the Teradata target. If the session loads
values into multiple date/time columns, check the format of each date/time column in the target because different
tables often use different date/time formats. You can use Teradata BTEQ or SQL Assistant to check the format for a
date/time column in a Teradata database.
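For example, in BTEQ the SHOW TABLE statement displays the table DDL, including the FORMAT clause for each column. The table name here is hypothetical:

SHOW TABLE testdb.orders;

The output includes a line such as order_date DATE FORMAT 'YYYY/MM/DD' for each date/time column.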
If any column in the Teradata target uses the yyyyddd date format (4-digit year followed by the 3-digit day), you must
either redefine the date format in the Teradata table or convert the date to a character string in PowerCenter.
Redefining the date format in the Teradata table does not change the way Teradata stores the date internally.


To convert a Teradata yyyyddd date column to a character column in PowerCenter:

1. Edit the target table definition in PowerCenter and change the date column data type from date to char(7).

2. Create an Expression transformation with the following expression to convert the date into a string with the format yyyyddd:
   TO_CHAR(date_port, 'YYYY') || TO_CHAR(date_port, 'DDD')

   Note: The expression TO_CHAR(date_port, 'YYYYDDD') does not work.

3. Link the output port in the Expression transformation to the char(7) column in the target definition.

Hiding Passwords
Applies to: FastExport, FastLoad, MultiLoad, TPump, Teradata PT
When you create a loader or application (FastExport) connection object, you enter the database user name and
password in the connection properties. The Integration Service writes the password in the control file in plain text and
the Teradata loader does not encrypt the password. To prevent the password from appearing in the control file, enter
PMNullPasswd as the password. When you do this, the Integration Service writes an empty string for the password in
the control file.
If you do not want to use PMNullPasswd, perform either of the following actions:
- Lock the control file directory.
- For load utilities, configure PowerCenter to write the control file to a different directory, and then secure that directory.

By default, the PowerCenter Integration Service writes the loader control file to the target file directory and the
FastExport control file to the temp file directory. To write the loader control file to a different directory, set the
LoaderControlFileDirectory custom property to the new directory for the Integration Service or session. For more
information about setting custom properties for the Integration Service, see the PowerCenter Administrator Guide. For
more information about setting custom properties for the session, see the PowerCenter Workflow Basics Guide.
Finally, MultiLoad and TPump support the RUN FILE command. This command directs control from the current control
file to the control file specified in the login script. Place the login statements in a file in a secure location, and then add
the RUN FILE command to the generated control file to call it. Run chmod -w on the control file to prevent
PowerCenter from overwriting it.
For example, create a login script as follows (in the file login.ctl in a secure directory path):
.LOGON demo1099/infatest,infatest;

Modify the generated control file and replace the login statement with the following command:
.RUN FILE <secure_directory_path>/login.ctl;

Using Error Tables to Identify Problems during Loading


Applies to: FastLoad, MultiLoad, TPump
When problems occur during loading data, the Teradata standalone load utilities generate error tables. (FastExport
generates an error log file.) The load utilities generate different errors during the different phases of loading data.
FastLoad jobs run in two main phases: loading and end loading. During the loading phase, FastLoad initiates the job,
locks the target table, and loads the data. During the end loading phase, the Teradata database distributes the rows of
data to the target table and unlocks it. FastLoad requires an exclusive lock on the target table during the loading
phase.
MultiLoad also loads data during two main phases: acquisition and application. In the acquisition phase, MultiLoad
reads the input data and writes it to a temporary work table. In the application phase, MultiLoad writes the data from
the work table to the actual target table. MultiLoad requires an exclusive lock on the target table during the application
phase.


TPump loads data in a single phase. It converts the SQL in the control file into a database macro and applies the macro to the input data. TPump uses standard SQL and standard table locking.
The following table lists the error tables you can check to troubleshoot load or unload utility errors:

Utility     Data Loading Phase    Default Error Table Name                    Error Types
FastLoad    Loading               ET_<target_table_name>                      Constraint violations, conversion errors, unavailable AMP conditions
FastLoad    End loading           UV_<target_table_name>                      Unique primary index violations
MultiLoad   Acquisition           ET_<target_table_name>                      All acquisition phase errors, and application phase errors if the Teradata database cannot build a valid primary index
MultiLoad   Application           UV_<target_table_name>                      Uniqueness violations, field overflow on columns other than primary index fields, constraint errors
TPump       n/a (single phase)    ET_<target_table_name><partition_number>    All TPump errors

When a load fails, check the ET_ error table first for specific information. The ErrorField or ErrorFieldName column indicates the column in the target table that could not be loaded. The ErrorCode field provides details that explain why the column failed. For MultiLoad and TPump, the most common ErrorCodes are:
- 2689: Trying to load a null value into a non-null field
- 2665: Invalid date format

In the MultiLoad UV_ error table, you can also check the DBCErrorField column and DBCErrorCode field. The
DBCErrorField column is not initialized in the case of primary key uniqueness violations. The DBCErrorCode that
corresponds to a primary key uniqueness violation is 2794.
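For example, the following query summarizes the failures recorded in a MultiLoad ET_ table. The table name is hypothetical, and the error column names vary slightly by utility:

SELECT ErrorCode, ErrorFieldName, COUNT(*)
FROM testdb.ET_td_test
GROUP BY ErrorCode, ErrorFieldName;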
For more information about Teradata error codes, see the Teradata documentation.

Authors
Lori Troy
Senior Technical Writer, Informatica Corporation
Chai Pydimukkala
Senior Product Manager, Informatica Corporation

Acknowledgements
The authors would like to thank Guy Boo, Ashlee Brinan, Eugene Ding, Stan Dorcey, Anudeep Sharma, Lalitha
Sundaramurthy, Raymond To, Rama Krishna Tumrukoti, Sonali Verma, and Rajeeva Lochan Yellanki at Informatica for
their assistance with this article. Additionally, the authors would like to thank Edgar Bartolome, Steven Greenberg,
John Hennessey, and Michael Klassen at Teradata and Stephen Knilans and Michael Taylor at LoganBritton for their
technical assistance.

