Abstract
This article explains how to move data between PowerCenter and Teradata databases. It explains when to use Teradata relational connections, Teradata load and unload utilities, or pushdown optimization to move data. This article also lists issues you might encounter when loading data to or unloading data from Teradata and the workarounds for these issues.
Table of Contents

- Overview
  - Prerequisites
- Teradata Relational Connections
  - Creating a Teradata Relational Connection
- Standalone Load and Unload Utilities
  - Teradata FastLoad
  - Teradata MultiLoad
  - Teradata TPump
  - Teradata FastExport
- Teradata Parallel Transporter
- Pushdown Optimization
  - Achieving Full Pushdown without Affecting the Source System
  - Achieving Full Pushdown with Parallel Lookups
  - Achieving Pushdown with Sorted Aggregation
  - Achieving Pushdown for an Aggregator Transformation
  - Achieving Pushdown when a Transformation Contains a Variable Port
  - Improving Pushdown Performance in Mappings with Multiple Targets
  - Removing Temporary Views when a Pushdown Session Fails
- Issues Affecting Loading to and Unloading from Teradata
  - Making 32-bit Load and Unload Utilities Work with 64-bit PowerCenter
  - Increasing Lookup Performance
  - Performing Uncached Lookups with Date/Time Ports in the Lookup Condition
  - Restarting a Failed MultiLoad Job Manually
  - Configuring Sessions that Load to the Same Table
  - Setting the Checkpoint when Loading to Named Pipes
  - Loading from Partitioned Sessions
  - Loading to Targets with Date/Time Columns
  - Hiding Passwords
  - Using Error Tables to Identify Problems during Loading
Overview
Teradata is a global technology leader in enterprise data warehousing, business analytics, and data warehousing services. Teradata provides a powerful suite of software that includes the Teradata Database, data access and management tools, and data mining applications. PowerCenter works with the Teradata Database and Teradata tools to provide a data integration solution that allows you to integrate data from virtually any business system into Teradata as well as leverage Teradata data for use in other business systems.

PowerCenter uses the following techniques when extracting data from and loading data to the Teradata database:

- ETL (extract, transform, and load). This technique extracts data from the source systems, transforms the data within PowerCenter, and loads it to target tables. The PowerCenter Integration Service transforms all data. If you use the PowerCenter Partitioning option, the Integration Service also parallelizes the workload.
- ELT (extract, load, and then transform). This technique extracts data from the source systems, loads it to user-defined staging tables in the target database, and transforms the data within the target system using generated SQL. The SQL queries include a final insert into the target tables. The database system transforms all data and parallelizes the workload, if necessary.
- ETL-T (ETL and ELT hybrid). This technique extracts data from the source systems, transforms the data within PowerCenter, loads the data to user-defined staging tables in the target database, and further transforms the data within the target system using generated SQL. The SQL queries include a final insert into the target tables. The ETL-T technique is optimized within PowerCenter so that the transformations that perform better within the database system can be performed there, while the Integration Service performs the other transformations.
To perform ETL operations, configure PowerCenter sessions to use a Teradata relational connection, a Teradata standalone load or unload utility, or Teradata Parallel Transporter. To use ELT or ETL-T techniques, configure PowerCenter sessions to use pushdown optimization.

Use a Teradata relational connection to communicate with Teradata when PowerCenter sessions load or extract small amounts of data (<1 GB per session). Teradata relational connections use ODBC to connect to Teradata. ODBC is a native interface for Teradata. Teradata provides 32- and 64-bit ODBC drivers for Windows and UNIX platforms. The driver bit mode must be compatible with the bit mode of the platform on which the PowerCenter Integration Service runs. For example, 32-bit PowerCenter only runs with 32-bit drivers.

Use a standalone load or unload utility when PowerCenter sessions extract or load large amounts of data (>1 GB per session). Standalone load and unload utilities can increase session performance by loading or extracting data directly from a file or pipe rather than running SQL commands to load or extract the same data. All Teradata standalone load and unload utilities are fully parallel to provide optimal and scalable performance for loading data to or extracting data from the Teradata Database. PowerCenter works with the Teradata FastLoad, MultiLoad, and TPump load utilities and the Teradata FastExport unload utility.

Use Teradata Parallel Transporter for PowerCenter sessions that must quickly load or extract large amounts of data (>1 GB per session). Teradata Parallel Transporter provides all of the capabilities of the standalone load and unload utilities, plus more granular control over the load or unload process, enhanced monitoring capabilities, and the ability to automatically drop log, error, and work tables when a session starts.
Teradata Parallel Transporter is a parallel, multi-function extract and load environment that provides access to PowerCenter using an open API. It can load dozens of files using a single control file. It also allows you to distribute the workload among several CPUs, eliminating bottlenecks in the data loading and extraction processes.

Use pushdown optimization to reduce the amount of data passed between Teradata and PowerCenter or when the Teradata database can process transformation logic faster than PowerCenter. Pushdown optimization improves session performance by pushing as much transformation logic as possible to the Teradata source or target database. PowerCenter processes any transformation logic that cannot be pushed to the database. For example, pushing Filter transformation logic to the source database can reduce the amount of data passed to PowerCenter, which decreases session run time. When you run a session configured for pushdown optimization, PowerCenter translates the
transformation logic into SQL queries and sends the queries to the Teradata database. The Teradata database executes the SQL queries to process the transformation logic.
Prerequisites
Before you run sessions that move data between PowerCenter and Teradata, you might want to install Teradata client tools. You also need to locate the Teradata TDPID.
Install BTEQ or Teradata SQL Assistant to help you debug problems that occur when loading to and extracting from Teradata. Both tools are included in the Teradata Utility Pack, which is available from Teradata.
TDPID
The Teradata TDPID indicates the name of the Teradata instance and defines the name a client uses to connect to a server. When you use Teradata Parallel Transporter or a standalone load or unload utility with PowerCenter, you must specify the TDPID in the connection properties.

The Teradata TDPID appears in the hosts file on the machines on which the Integration Service and PowerCenter Client run. By default, the hosts file appears in the following location:

- UNIX: /etc/hosts
- Windows: %SystemRoot%\system32\drivers\etc\hosts (the actual location is defined in the Registry key HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\DataBasePath)

The hosts file contains client configuration information for Teradata. In a hosts file entry, the TDPID precedes the string cop1. For example, the hosts file contains the following entries:
127.0.0.1      localhost demo1099cop1
192.168.80.113 td_1      custcop1
192.168.80.114 td_2      custcop2
192.168.80.115 td_3      custcop3
192.168.80.116 td_4      custcop4
The first entry has the TDPID demo1099. This entry tells the Teradata database that when a client tool references the Teradata instance demo1099, it should direct requests to localhost (IP address 127.0.0.1). The next four entries have the same TDPID, cust. Multiple hosts file entries with the same TDPID indicate the Teradata instance is configured for load balancing among nodes. When a client tool attempts to reference the Teradata instance cust, the Teradata database directs requests to the first node in the entry list, td_1. If it takes too long for that node to respond, the database redirects the request to the second node, and so on. This process prevents the first node, td_1, from becoming overloaded.
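The naming convention above can be checked programmatically. The following sketch (a hypothetical helper, not part of PowerCenter or TTU) parses hosts file text and maps each TDPID to the hosts that serve it, using the rule that Teradata alias names end in cop followed by a number:

```python
import re

def tdpids(hosts_text):
    """Map each TDPID to the hosts that serve it.

    Relies on the convention described above: in a hosts file entry,
    the TDPID precedes the string "cop" followed by a sequence number.
    """
    mapping = {}
    for line in hosts_text.splitlines():
        tokens = line.split('#', 1)[0].split()   # strip comments, split fields
        if len(tokens) < 2:
            continue
        host = tokens[1]                         # canonical host name
        for alias in tokens[1:]:
            m = re.fullmatch(r'(\w+?)cop\d+', alias)
            if m:
                mapping.setdefault(m.group(1), []).append(host)
    return mapping

sample = """\
127.0.0.1      localhost demo1099cop1
192.168.80.113 td_1      custcop1
192.168.80.114 td_2      custcop2
"""
print(tdpids(sample))   # {'demo1099': ['localhost'], 'cust': ['td_1', 'td_2']}
```

Running this against the example entries confirms that demo1099 resolves to localhost and that cust is load-balanced across the td_* nodes.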
For more information about the TTU versions that work with PowerCenter, see the TTU Supported Platforms and Product Versions document, which is available from Teradata @Your Service.

Sessions that perform lookups on Teradata tables must use a Teradata relational connection. If a session performs a lookup on a large, static Teradata table, you might be able to increase performance by using FastExport to extract the data to a flat file and configuring the session to look up data in the flat file. If you experience performance problems when using a Teradata relational connection, and you do not want to use a load or unload utility, you might be able to configure PowerCenter sessions to use pushdown optimization.

If you load or extract data using a Teradata relational connection on UNIX, you must verify the configuration of environment variables and the odbc.ini file on the machine on which the Integration Service runs. To verify the environment variable configuration, ensure the Teradata ODBC path precedes the DataDirect driver path information in the PATH and shared library path environment variables. Place the Teradata path before the DataDirect path because both sets of ODBC software use some of the same file names. To verify the odbc.ini file configuration, make sure there is an entry for the Teradata ODBC driver in the [ODBC Data Sources] section of odbc.ini. The following excerpt from an odbc.ini file shows a Teradata ODBC driver (tdata.so) entry on Linux:
[ODBC Data Sources]
intdv12=tdata.so

[intdv12]
Driver=/usr/odbc/drivers/tdata.so
Description=NCR 3600 running Teradata V12
DBCName=intdv12
SessionMode=Teradata
CharacterSet=UTF8
StCheckLevel=0
DateTimeFormat=AAA
LastUser=
Username=
Password=
Database=
DefaultDatabase=
For more information about configuring odbc.ini, see the PowerCenter Configuration Guide and the ODBC Driver for Teradata User Guide.
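The PATH ordering requirement above can be spot-checked from a shell. This is a rough sketch; both directory names are placeholders, so substitute the actual Teradata and DataDirect install locations on your machine:

```shell
# Put the (hypothetical) Teradata ODBC directory ahead of the DataDirect one.
PATH="/usr/odbc/bin:/opt/informatica/ODBC/bin:$PATH"
# Find the position of each directory in PATH.
td_pos=$(printf '%s\n' "$PATH" | tr ':' '\n' | grep -n '^/usr/odbc' | head -1 | cut -d: -f1)
dd_pos=$(printf '%s\n' "$PATH" | tr ':' '\n' | grep -n 'informatica/ODBC' | head -1 | cut -d: -f1)
if [ "$td_pos" -lt "$dd_pos" ]; then
  echo "PATH order OK: Teradata ODBC comes first"
else
  echo "WARNING: DataDirect ODBC precedes Teradata ODBC"
fi
```

The same check applies to the shared library path variable (LD_LIBRARY_PATH, SHLIB_PATH, or LIBPATH, depending on the UNIX platform).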
Standalone Load and Unload Utilities

The standalone load and unload utilities are included in the Teradata Tools and Utilities (TTU), available from Teradata. PowerCenter supports all of these utilities. Support for MultiLoad and TPump has been available since PowerCenter 6.0. Support for FastLoad was added in PowerCenter 7.0, and support for FastExport was added in PowerCenter 7.1.3.

Before you can configure a session to use a load or unload utility, create a loader or FastExport (application) connection in the PowerCenter Workflow Manager and enter a value for the TDPID in the connection attributes. For more information about creating connection objects in PowerCenter, see the PowerCenter Workflow Basics Guide.

To use a load utility in a session, configure the associated mapping to load to a Teradata target, configure the session to write to a flat file instead of a relational database, and select the loader connection for the session. To use FastExport in a session, configure the mapping to extract from a Teradata source, configure the session to read from FastExport instead of a relational database, and select the FastExport connection for the session. For more information about configuring a session to use a load or unload utility, see the PowerCenter Advanced Workflow Guide.

When a session transfers data between Teradata and PowerCenter, the following files are created:

- A staging file or pipe. PowerCenter creates a staging file or named pipe for data transfer based on how you configure the connection. Named pipes are generally faster than staging files because data is transferred as soon as it appears in the pipe. If you use a staging file, data is not transferred until all data appears in the file.
- A control file. PowerCenter generates a control file that contains instructions for loading or extracting data. PowerCenter creates the control file based on the loader or FastExport attributes you configure for the connection and the session.
- A log file. The load or unload utility creates a log file and writes error messages to it. The PowerCenter session log indicates whether the session ran successfully, but does not contain load or unload utility error messages. Use the log file to debug problems that occur during data loading or extraction.
By default, loader staging, control, and log files are created in the target file directory. The FastExport staging, control, and log files are created in the PowerCenter temporary files directory. For more information about these files, see the PowerCenter Advanced Workflow Guide.
Teradata FastLoad
Teradata FastLoad is a command-line utility that quickly loads large amounts of data to empty tables in a Teradata database. Use FastLoad for a high-volume initial load or for high-volume truncate and reload operations. FastLoad is the fastest load utility, but it has the following limitations:

- FastLoad uses multiple sessions to load data, but it can load data to only one table in a Teradata database per job.
- It locks tables while loading data, preventing other users and other instances of FastLoad from accessing the tables during data loading.
- FastLoad only works with empty tables with no secondary indexes.
- It can only insert data.
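As an illustration of what PowerCenter generates on your behalf, a minimal FastLoad control script follows roughly this shape. This is a hedged sketch: the TDPID, credentials, table, and file names are placeholders, and the Teradata FastLoad Reference is authoritative for the exact syntax:

```
SESSIONS 4;
LOGON demo1099/loader_user,loader_password;
DATABASE mydb;
BEGIN LOADING mydb.customer ERRORFILES mydb.cust_err1, mydb.cust_err2;
DEFINE cust_id (INTEGER), cust_name (VARCHAR(30))
FILE = cust.dat;
INSERT INTO mydb.customer VALUES (:cust_id, :cust_name);
END LOADING;
LOGOFF;
```

Note the two error tables named in BEGIN LOADING; these are the tables discussed later for diagnosing load problems.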
Teradata MultiLoad
Teradata MultiLoad is a command-driven utility for fast, high-volume maintenance on multiple tables and views of a Teradata database. Each MultiLoad instance can perform multiple data insert, update, and delete operations on up to five different tables or views. MultiLoad optimizes operations that rapidly acquire, process, and apply data to Teradata tables. Use MultiLoad for large-volume, incremental data loads. MultiLoad has the following advantages:

- MultiLoad is very fast. It can process millions of rows in a few minutes.
- MultiLoad supports inserts, updates, upserts, deletes, and data-driven operations in PowerCenter.
- You can use variables and embed conditional logic in MultiLoad control files.
- MultiLoad supports sophisticated error recovery. It allows load jobs to be restarted without having to redo all of the prior work.

MultiLoad also has limitations. Because it is designed for the highest possible throughput, it can be very resource intensive. It locks tables while loading data, preventing other users and other instances of MultiLoad from accessing the tables during data loading. And because of its phased nature, there are potentially inconvenient windows of time when MultiLoad cannot be stopped without losing access to target tables.
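For comparison with the FastLoad script, a MultiLoad upsert job is expressed in a control script along these lines. Again this is a sketch with placeholder names (log table, layout, labels, and columns are invented for the example); see the Teradata MultiLoad Reference for the authoritative syntax:

```
.LOGTABLE mydb.cust_logtable;
.LOGON demo1099/loader_user,loader_password;
.BEGIN MLOAD TABLES mydb.customer;
.LAYOUT cust_layout;
  .FIELD cust_id   * INTEGER;
  .FIELD cust_name * VARCHAR(30);
.DML LABEL upsert_cust
  DO INSERT FOR MISSING UPDATE ROWS;
  UPDATE mydb.customer SET cust_name = :cust_name WHERE cust_id = :cust_id;
  INSERT INTO mydb.customer (cust_id, cust_name) VALUES (:cust_id, :cust_name);
.IMPORT INFILE cust.dat LAYOUT cust_layout APPLY upsert_cust;
.END MLOAD;
.LOGOFF;
```

The DO INSERT FOR MISSING UPDATE ROWS clause is what turns the UPDATE/INSERT pair into the upsert behavior mentioned above.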
Teradata TPump
Teradata TPump is a highly parallel utility that can continuously move data from data sources into Teradata tables without locking the affected tables. TPump supports inserts, updates, deletes, and data-driven updates. TPump acquires row-hash locks on a database table instead of table-level locks, so multiple TPump instances can load data simultaneously to the same table. TPump is often used to trickle-load a database table. Use TPump for low-volume, online data loads. TPump has the following advantages:

- TPump can refresh database tables in near real time.
- TPump continuously loads data into Teradata tables without locking the affected tables, so users can run queries while TPump is running.
- TPump is less resource intensive than MultiLoad because it does not write to temporary tables.
- Users can control the rate at which statements are sent to the Teradata database, limiting resource consumption.
- It supports parallel processing.
- TPump can always be stopped, and all of its locks dropped, with no ill effect.

TPump is not as fast as the other standalone loaders for large-volume loads because it changes the same data block multiple times.
Teradata FastExport
Teradata FastExport is a command-driven utility that uses multiple sessions to quickly transfer large amounts of data from Teradata sources to PowerCenter. Use FastExport to quickly extract data from Teradata sources. FastExport has the following advantages:

- It is faster than Teradata relational connections when extracting large amounts of data.
- FastExport can be run in streaming mode, which avoids the need to stage the data file.
- You can encrypt the data transfer between FastExport and the Teradata server.
FastExport is available for sources and pipeline lookups. When you create a FastExport connection, verify the settings of the following connection attributes:

- Data encryption. Enable this attribute to encrypt the data transfer between FastExport and the Teradata server so that unauthorized users cannot access the data being transferred across the network.
- Fractional seconds. This attribute specifies the precision of the fractional seconds portion of timestamp data. To avoid session failure or possible data corruption, make sure this value matches the timestamp precision of the column in the Teradata database.
For more information about configuring FastExport connection attributes, see the PowerCenter Advanced Workflow Guide.
Teradata Parallel Transporter

Teradata Parallel Transporter (Teradata PT) offers the following advantages:

- Teradata PT supports recovery for sessions that use the Stream operator when the source data is repeatable. This feature is especially useful when running real-time sessions and streaming the changes to Teradata.
- Users can invoke Teradata PT through a set of open APIs that communicate with the database directly, eliminating the need for a staging file or pipe and a control file.
- Teradata PT eliminates the need to invoke different load and unload utilities to extract and load data.
PowerCenter communicates with Teradata PT using PowerExchange for Teradata Parallel Transporter, which is available through the Informatica-Teradata Enterprise Data Warehousing Solution. PowerExchange for Teradata Parallel Transporter was released with PowerCenter 8.1.1. It provides integration between PowerCenter and Teradata databases for data extraction and loading.

PowerExchange for Teradata Parallel Transporter executes Teradata PT operators directly through API calls. This improves performance by eliminating the staging file or named pipe. It also improves security by eliminating the control file, so there is no need to overwrite or store passwords in the control file. PowerExchange for Teradata Parallel Transporter supports session and workflow recovery. It also captures Teradata PT error messages and displays them in the session log, so you do not need to check the utility log file when errors occur.

Before you can configure a session to use Teradata PT, you must create a Teradata PT (relational) connection in the Workflow Manager and enter a value for the TDPID in the connection attributes. To configure a session to extract data, configure the associated mapping to read from Teradata, change the reader type for the session to Teradata Parallel Transporter Reader, and select the Teradata PT connection. To configure a session to load data, configure the associated mapping to load to Teradata, change the writer type for the session to Teradata Parallel Transporter Writer, and select the Teradata PT connection. In sessions that load to Teradata, you can also configure an ODBC connection that is used to automatically create the recovery table in the target database and drop the log, error, and work tables if a session fails.

For more information about using PowerExchange for Teradata Parallel Transporter, see the PowerExchange for Teradata Parallel Transporter User Guide.
Pushdown Optimization
When you run sessions that move data between PowerCenter and Teradata databases, you might be able to improve session performance using pushdown optimization. Pushdown optimization allows you to push PowerCenter transformation logic to the Teradata source or target database. The PowerCenter Integration Service translates the transformation logic into SQL queries and sends the SQL queries to the database. The Teradata database executes the SQL queries to process the mapping logic. The Integration Service processes any mapping logic it cannot push to the database.
The following figure illustrates how pushdown optimization works with a Teradata database system:

[Figure: pushdown processing. The traditional path runs ETL through the PowerCenter repository, Repository Server, and Data Server; with pushdown optimization, the Integration Service instead generates SQL that the Teradata staging warehouse executes, shifting the work from ETL to ELT.]
The following figure shows a mapping in which you can increase performance using pushdown optimization:
If you configure this mapping for pushdown optimization, the Integration Service generates an SQL query based on the Filter and Lookup transformation logic and pushes the query to the source database. This improves session performance because it reduces the number of rows sent to PowerCenter. The Integration Service processes the Java transformation logic since that cannot be pushed to the database, and then loads data to the target.

Use pushdown optimization to improve the performance of sessions that use Teradata relational connections to connect to Teradata. In general, pushdown optimization can improve session performance in the following circumstances:

- When it reduces the number of rows passed between Teradata and PowerCenter. For example, pushing a Filter transformation to the Teradata source can reduce the number of rows PowerCenter extracts from the source.
- When the database server is more powerful than the PowerCenter server. For example, pushing a complex Expression transformation to the source or target improves performance when the database server can perform the expression faster than the server on which the PowerCenter Integration Service runs.
- When the generated query can take advantage of prebuilt indexes. For example, pushing a Joiner transformation to the Teradata source improves performance when the database can join tables using indexes and statistics that PowerCenter cannot access.
Pushdown optimization is available with the PowerCenter Pushdown Optimization Option and has been supported since PowerCenter 8.0. To configure a session to use pushdown optimization, choose a pushdown optimization type in the session properties. You can select one of the following pushdown optimization types:

- None. The Integration Service does not push any transformation logic to the database.
- Source-side. The Integration Service analyzes the mapping from the source to the target or until it reaches a downstream transformation it cannot push to the database. It pushes as much transformation logic as possible to the source database. The Integration Service generates SQL in the following form:

      SELECT FROM source WHERE (filter/join condition) GROUP BY

- Target-side. The Integration Service analyzes the mapping from the target back to the source or until it reaches an upstream transformation it cannot push to the database. It pushes as much transformation logic as possible to the target database. The Integration Service generates SQL in the following form:

      INSERT INTO target() VALUES (?+1, UPPER(?))

- Full. The Integration Service attempts to push all transformation logic to the target database. If the Integration Service cannot push all transformation logic to the database, it performs both source-side and target-side pushdown optimization. The Integration Service generates SQL in the following form:

      INSERT INTO target() SELECT FROM source

- $$PushdownConfig. Allows you to run the same session with different pushdown optimization configurations at different times.
The Integration Service can push the logic for the following transformations to Teradata:

  Transformation         Pushdown Types
  ---------------------  ------------------------------
  Aggregator             Source-side, Full
  Expression*            Source-side, Target-side, Full
  Filter                 Source-side, Full
  Joiner                 Source-side, Full
  Lookup, connected      Source-side, Full
  Lookup, unconnected    Source-side, Target-side, Full
  Router                 Source-side, Full
  Sorter                 Source-side, Full
  Source Qualifier       Source-side, Full
  Target                 Target-side, Full
  Union                  Source-side, Full
  Update Strategy        Full
* PowerCenter expressions can be pushed down only if there is an equivalent database function. To work around this issue, you can enter an SQL override in the source qualifier.
When you use pushdown optimization with sessions that extract from or load to Teradata, you might need to modify mappings or sessions to take full advantage of the performance improvements possible with pushdown optimization. You might also encounter issues if a pushdown session fails.
For example, you might need to perform the following tasks:

- Achieve full pushdown optimization without affecting the source. To achieve full pushdown optimization for a session in which the source and target reside in different database management systems, you can stage the source data in the Teradata target database. For more information, see Achieving Full Pushdown without Affecting the Source System.
- Achieve full pushdown optimization with parallel lookups. To achieve full pushdown optimization for a mapping that contains parallel lookups, redesign the mapping to serialize the lookups. For more information, see Achieving Full Pushdown with Parallel Lookups.
- Achieve pushdown optimization with sorted aggregation. To achieve pushdown optimization for a mapping that contains a Sorter transformation before an Aggregator transformation, redesign the mapping to remove the Sorter transformation. For more information, see Achieving Pushdown with Sorted Aggregation.
- Achieve pushdown optimization for an Aggregator transformation with pass-through ports. To achieve pushdown optimization for a mapping that contains an Aggregator transformation with pass-through ports, redesign the mapping to remove the pass-through ports from the Aggregator transformation. For more information, see Achieving Pushdown for an Aggregator Transformation.
- Achieve pushdown optimization when a transformation contains a variable port. To achieve pushdown optimization for a mapping that contains a transformation with a variable port, update the expression to eliminate the variable port. For more information, see Achieving Pushdown when a Transformation Contains a Variable Port.
- Improve pushdown performance in mappings with multiple targets. To increase performance when using full pushdown optimization for mappings with multiple targets, you can stage the target data in the Teradata database. For more information, see Improving Pushdown Performance in Mappings with Multiple Targets.
- Remove temporary views after a session that uses an SQL query fails. If you run a pushdown session that uses an SQL query, and the session fails, the Integration Service might not drop the views it creates in the source database. You can remove the views manually. For more information, see Removing Temporary Views when a Pushdown Session Fails.
For more information about pushdown optimization, see the PowerCenter Advanced Workflow Guide and the PowerCenter Performance Tuning Guide.
Achieving Full Pushdown without Affecting the Source System

Since the source and target tables reside in different database management systems, you cannot configure the session for full pushdown optimization as is. You could configure the session for source-side pushdown optimization, which would push the Filter and Lookup transformation logic to the source. However, pushing transformation logic to a transactional source might reduce performance of the source database. To avoid the performance problems caused by pushing transformation logic to the source, you can reconfigure the mapping to stage the source data in the target database.
To achieve full pushdown optimization, redesign the mapping as follows: 1. Create a simple, pass-through mapping to pass all source data to a staging table in the Teradata target database:
Configure the session to use Teradata PT or a standalone load utility to load the data to the staging table. Do not configure the session to use pushdown optimization. 2. Configure the original mapping to read from the staging table:
Configure the session to use full pushdown optimization. The Integration Service pushes all transformation logic to the Teradata database, increasing session performance.
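The staging table referenced in step 1 must exist before the sessions run. One way to create it in Teradata is to clone the source table's definition; the database and table names below are illustrative:

```sql
-- Clone the column definitions of an existing table without copying any rows.
CREATE TABLE mydb.stg_orders AS mydb.orders WITH NO DATA;
```

Because the staging table lives in the same Teradata database as the target, the second session's generated SQL can read and write entirely within Teradata.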
Achieving Full Pushdown with Parallel Lookups

To achieve full pushdown optimization, redesign the mapping so that the lookups are serialized as follows:
When you serialize the Lookup transformations, the Integration Service generates an SQL query in which the lookups become part of a subquery. The Integration Service can then push the entire query to the source database.
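The generated SQL resembles a nested query in which the first lookup becomes a subquery and the second lookup joins to its result. A minimal sketch, assuming hypothetical tables SRC_TABLE, LKP1, and LKP2:

```sql
-- Illustrative sketch only; all table and column names are hypothetical.
SELECT sub.KEY_COL, sub.AMOUNT, sub.DESC1, l2.DESC2
FROM (
    SELECT s.KEY_COL, s.AMOUNT, s.CODE2, l1.DESC1
    FROM SRC_TABLE s
    LEFT OUTER JOIN LKP1 l1 ON s.CODE1 = l1.CODE1   -- first lookup
) sub
LEFT OUTER JOIN LKP2 l2 ON sub.CODE2 = l2.CODE2;    -- second lookup
```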
To redesign this mapping to achieve full or source-side pushdown optimization, configure the Aggregator transformation so that it does not use sorted input, and remove the Sorter transformation.
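Teradata does not require pre-sorted input for aggregation, so once the Sorter transformation is removed, the Aggregator logic can be pushed down as an ordinary GROUP BY. A minimal sketch, assuming a hypothetical source table EMP_SRC:

```sql
-- Illustrative sketch only; table and column names are hypothetical.
SELECT DEPT_ID, SUM(SALARY) AS TOTAL_SALARY
FROM EMP_SRC
GROUP BY DEPT_ID;
```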
To achieve pushdown optimization for the mapping, remove the variable port and reconfigure the output port as follows:
- Output port expression: DOLLAR_AMT = (AMOUNT - FEE) * RATE
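With the variable port folded into the output port, the Integration Service can push the expression as a single projected column. A minimal sketch, assuming a hypothetical source table FEE_SRC:

```sql
-- Illustrative sketch only; the table name is hypothetical.
SELECT (AMOUNT - FEE) * RATE AS DOLLAR_AMT
FROM FEE_SRC;
```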
For example, the following mapping contains two Teradata sources and two Teradata targets, all in the same RDBMS.
To achieve full pushdown optimization, redesign the mapping as follows:
1. Configure the original mapping to write to a staging table in the Teradata target database. Configure the session to use full pushdown optimization.
2. Create a second mapping that passes all target data from the staging table to the Teradata targets.
To avoid problems when you run a pushdown session that contains an SQL override, use the following guidelines:
- Ensure that the SQL override syntax is compatible with the Teradata source database. PowerCenter does not validate the syntax, so test the query before you push it to the database.
- Do not use an ORDER BY clause in the SQL override.
- Use ANSI outer join syntax in the SQL override. If the Source Qualifier transformation contains Informatica outer join syntax in the SQL override, the Integration Service processes the Source Qualifier transformation logic.
- If the Source Qualifier transformation is configured for a distinct sort and contains an SQL override, the Integration Service ignores the distinct sort configuration.
- If the Source Qualifier contains multiple partitions, specify the SQL override for all partitions.
- Do not use a Sequence Generator transformation in the mapping. Teradata does not have a sequence generator function or operator.
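To illustrate the outer join guideline: Informatica outer join syntax (written inside braces) cannot be pushed to Teradata, while the ANSI equivalent can. A minimal sketch, assuming hypothetical tables orders and customers:

```sql
-- ANSI outer join syntax, which Teradata can execute directly.
-- Table and column names are hypothetical.
SELECT o.ORDER_ID, c.CUST_NAME
FROM orders o
LEFT OUTER JOIN customers c
  ON o.CUST_ID = c.CUST_ID;
```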
Making 32-bit Load and Unload Utilities Work with 64-bit PowerCenter
Applies to: FastLoad, MultiLoad, TPump, FastExport
If you use 64-bit PowerCenter, you must reset the library path to make PowerCenter work with the 32-bit Teradata load and unload utilities. Reset the library path before you run a session that invokes a load or unload utility. To reset the library path, replace the loader or FastExport executable with a shell script. The following procedure explains how to reset the library path for TPump on AIX. You can use the same method to reset the library path for the other utilities on Linux or other UNIX operating systems.
To reset the library path:
1. Create a shell script named <executable>_infa, for example, tpump_infa:
#!/bin/sh
LIBPATH=/usr/lib; export LIBPATH
COPLIB=/usr/lib; export COPLIB
COPERR=/usr/lib; export COPERR
PATH=$PATH:$INFA_HOME/server/infa_shared/TgtFiles
exec tpump "$@"
exit $?
2. In the loader connection in the Workflow Manager, set the External Loader Executable attribute (for a load utility) or the Executable Name attribute (for FastExport) to the name of the shell script. For TPump, change the External Loader Executable from tpump to tpump_infa.
Note: If you redesign the mapping using this procedure, you can further increase performance by specifying an ORDER BY clause on the FastExport SQL and enabling the Sorted Input property for the lookup file. This prevents PowerCenter from having to sort the file before populating the lookup cache.
The result of the Lookup query and processing is the same whether or not you cache the lookup table. However, using a lookup cache can increase session performance for relatively static data in smaller lookup tables. Generally, it is better to cache lookup tables that need less than 300 MB. For data that changes frequently or is stored in larger lookup tables, disabling caching can improve overall throughput.
Do not cache the lookup tables in the following circumstances:
- The lookup tables are so large that they cannot be stored on the local system.
- There are not enough inodes or blocks to save the cache files.
- You are not allowed to save cache files on the Informatica system.
- The amount of time needed to build the cache exceeds the amount of time saved by caching.
To enable or disable the lookup cache, enable or disable the Lookup Caching Enabled option in the Lookup transformation properties. For more information about the lookup cache, see the PowerCenter Transformation Guide and the PowerCenter Performance Tuning Guide.
To work around this issue, perform either of the following actions:
- Apply the Teradata ODBC patch 3.2.011 or later and remove NoScan=Yes from the odbc.ini file.
- Configure the Lookup transformation to use a lookup cache, or remove the Date/Time port from the lookup condition.
Note that PowerCenter adds the ML_ prefix to the MultiLoad log table name. If you use a hand-coded MultiLoad control file, the log table can have any name. For example, to recover from a failed job that attempted to load data to table td_test owned by user infatest, enter the following commands using BTEQ:
BTEQ -- Enter your DBC/SQL request or BTEQ command:
drop table infatest.mldlog_td_test;

drop table infatest.mldlog_td_test;
*** Table has been dropped.
*** Total elapsed time was 1 second.

BTEQ -- Enter your DBC/SQL request or BTEQ command:
release mload infatest.td_test;

release mload infatest.td_test;
*** Mload has been released.
*** Total elapsed time was 1 second.
If you do not route the data to a single file, the session fails with the following error:
WRITER_1_*_1> WRT_8240 Error: The external loader [Teradata Mload Loader] does not support partitioned sessions.
WRITER_1_*_1> Thu Jun 16 11:58:21 2005
WRITER_1_*_1> WRT_8068 Writer initialization failed. Writer terminating.
For more information about loading from partitioned sessions, see the PowerCenter Advanced Workflow Guide.
To convert a Teradata yyyyddd date column to a character column in PowerCenter:
1. Edit the target table definition in PowerCenter and change the date column data type from date to char(7).
2. Create an Expression transformation with the following expression to convert the date into a string with the format yyyyddd:
to_char(date_port, 'YYYY') || to_char(date_port, 'DDD')
Note: The expression to_char(date_port, 'YYYYDDD') does not work.
3. Link the output port in the Expression transformation to the char(7) column in the target definition.
Hiding Passwords
Applies to: FastExport, FastLoad, MultiLoad, TPump, Teradata PT
When you create a loader or application (FastExport) connection object, you enter the database user name and password in the connection properties. The Integration Service writes the password in the control file in plain text, and the Teradata loader does not encrypt the password. To prevent the password from appearing in the control file, enter PMNullPasswd as the password. When you do this, the Integration Service writes an empty string for the password in the control file.
If you do not want to use PMNullPasswd, perform either of the following actions:
- Lock the control file directory.
- For load utilities, configure PowerCenter to write the control file to a different directory, and then secure that directory.
By default, the PowerCenter Integration Service writes the loader control file to the target file directory and the FastExport control file to the temp file directory. To write the loader control file to a different directory, set the LoaderControlFileDirectory custom property to the new directory for the Integration Service or session. For more information about setting custom properties for the Integration Service, see the PowerCenter Administrator Guide. For more information about setting custom properties for the session, see the PowerCenter Workflow Basics Guide.
Finally, MultiLoad and TPump support the RUN FILE command. This command directs control from the current control file to the control file specified in the login script. Place the login statements in a file in a secure location, and then add the RUN FILE command to the generated control file to call it. Run chmod -w on the control file to prevent PowerCenter from overwriting it.
For example, create a login script as follows (in the file login.ctl in a secure directory path):
.LOGON demo1099/infatest,infatest;
Modify the generated control file and replace the login statement with the following command:
.RUN FILE <secure_directory_path>/login.ctl;
TPump loads data in a single phase. It converts the SQL in the control file into a database macro and applies the macro to the input data. TPump uses standard SQL and standard table locking.
The following table lists the error tables you can check to troubleshoot load or unload utility errors:
Utility    Data Loading Phase  Default Error Table Name                   Error Types
---------  ------------------  -----------------------------------------  ---------------------------------------------
FastLoad   Loading             ET_<target_table_name>                     Constraint violations, conversion errors,
                                                                          unavailable AMP conditions
FastLoad   End loading         UV_<target_table_name>                     Unique primary index violations
MultiLoad  Acquisition         ET_<target_table_name>                     All acquisition phase errors; application
                                                                          phase errors if the Teradata database cannot
                                                                          build a valid primary index
MultiLoad  Application         UV_<target_table_name>                     Uniqueness violations; field overflow on
                                                                          columns other than primary index fields;
                                                                          constraint errors
TPump      All                 ET_<target_table_name><partition_number>   All TPump errors
When a load fails, check the ET_ error table first for specific information. The ErrorField or ErrorFieldName column indicates the column in the target table that could not be loaded. The ErrorCode field provides details that explain why the column failed. For MultiLoad and TPump, the most common ErrorCode values are:
- 2689: Trying to load a null value into a non-null field
- 2665: Invalid date format
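To see why rows were rejected, you can query the ET_ table directly with BTEQ or any SQL client. A minimal sketch, reusing the earlier example of target table td_test owned by user infatest (the error table name follows the default ET_ pattern; in some utility versions the column is named ErrorField rather than ErrorFieldName):

```sql
-- List the error code and failing column for each rejected row.
SELECT ErrorCode, ErrorFieldName
FROM infatest.ET_td_test
ORDER BY ErrorCode;
```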
In the MultiLoad UV_ error table, you can also check the DBCErrorField column and DBCErrorCode field. The DBCErrorField column is not initialized in the case of primary key uniqueness violations. The DBCErrorCode that corresponds to a primary key uniqueness violation is 2794. For more information about Teradata error codes, see the Teradata documentation.
Authors
Lori Troy, Senior Technical Writer, Informatica Corporation
Chai Pydimukkala, Senior Product Manager, Informatica Corporation
Acknowledgements
The authors would like to thank Guy Boo, Ashlee Brinan, Eugene Ding, Stan Dorcey, Anudeep Sharma, Lalitha Sundaramurthy, Raymond To, Rama Krishna Tumrukoti, Sonali Verma, and Rajeeva Lochan Yellanki at Informatica for their assistance with this article. Additionally, the authors would like to thank Edgar Bartolome, Steven Greenberg, John Hennessey, and Michael Klassen at Teradata and Stephen Knilans and Michael Taylor at LoganBritton for their technical assistance.