Teradata User's Guide: The Ultimate Companion, Third Edition


by Tom Coffing
Coffing Data Warehousing. (c) 2006. Copying Prohibited.

  


Chapter 9: Teradata Utilities


An Introduction to the Teradata Utilities

"It's not the data load that breaks us down, it's the way you carry it."
Tom Coffing

Teradata has been doing data transfers to and from the largest data warehouses in the world for close to two decades.
While other databases have allowed the loads to break them down, Teradata has continued to set the standards and
break new barriers. The brilliance behind the Teradata load utilities is in their power and flexibility. With five great utilities,
Teradata allows you to pick the right utility for the task at hand. This book is dedicated to explaining these utilities in a complete
and easy manner. It has been written by five Teradata Certified Masters with experience at over 125 Teradata
sites worldwide. Let our experience be your guide.

The intent of this book is twofold. The first is to help you write and use the various utilities. A large part of this is taken
up with showing the commands and their functionality. In addition, it shows examples of using the various utility commands
and SQL in conjunction with each other, which you will come to appreciate.

The second intention is to help you know which utility to use under a variety of conditions. You will learn that some of the
utilities use very large blocks to transfer the data either to or from the Teradata Relational Database Management System
(RDBMS). From this perspective, they provide a high degree of efficiency using a communications path of either the
mainframe channel or network.

The other approach to transferring data rows either to or from the Teradata RDBMS is a single row at a time. The following
sections provide a high level introduction to the capabilities and considerations for both approaches. You can use this
information to help decide which utilities are appropriate for your specific need.
Considerations for Using Block at a Time Utilities

As mentioned above, there are efficiencies associated with using large blocks of data when transferring between
computers. So, the logic might indicate that it is always the best approach. However, there is never one best approach.

You will learn that efficiency comes at the price of other database capabilities. For instance, when using large blocks to
transfer and incorporate data into Teradata the following are not allowed:

- Secondary indices

- Triggers

- Referential integrity

- More than 15 concurrent utilities running at the same time

Therefore, it is important to understand when and where these considerations are present. So, as important as it is to know
the language of the utility and database, it is also important to understand when to use the appropriate utility. The
capabilities and considerations are covered in conjunction with the commands.
Considerations for Using Row at a Time Utilities

The opposite of sending a large block of rows at the same time is sending a single row at a time. The primary difference in
these approaches is speed. It is always faster to send multiple rows in one operation instead of one row.

If it is slower, why would anyone ever use this approach?

The reason is that it provides more flexibility with fewer considerations. By this, we mean that the row at a time utilities
allow the following:

- Secondary indices

- Triggers

- Referential integrity

- More than 15 concurrent utilities running at the same time

As you can see, they allow all the things that the block utilities do not. With that in mind and for more information, continue
reading about the individual utilities and open up a new world of capabilities in working with the Teradata RDBMS.
Welcome to the world of the Teradata Utilities.

An Introduction to BTEQ
Why is it Called BTEQ?

Why is BTEQ available on every Teradata system ever built? Because the Batch TEradata Query (BTEQ) tool was the
original way that SQL was submitted to Teradata as a means of getting an answer set in a desired format. This is the utility
that I used for training at Wal*Mart, AT&T, Anthem Blue Cross and Blue Shield, and Southwestern Bell back in the early
1990s. BTEQ is often referred to as the Basic TEradata Query. It is still used today and continues to be an effective tool.

Here is what is excellent about BTEQ:

- BTEQ can be used to submit SQL in either a batch or interactive environment. Interactive users can submit SQL and
receive an answer set on the screen. Users can also submit BTEQ jobs from batch scripts, have error checking and
conditional logic, and allow for the work to be done in the background.

- BTEQ outputs a report format, whereas Queryman outputs data in a format more like a spreadsheet. This allows BTEQ a
great deal of flexibility in formatting data, creating headings, and utilizing Teradata extensions, such as WITH and
WITH BY, which Queryman has problems handling.

- BTEQ is often used to submit SQL, but it is also an excellent tool for importing and exporting data.

  - Importing Data: Data can be read from a file on either a mainframe or LAN-attached computer and used for
substitution directly into any Teradata SQL using the INSERT, UPDATE or DELETE statements.

  - Exporting Data: Data can be written to either a mainframe or LAN-attached computer using a SELECT from
Teradata. You can also pick the format you desire, ranging from data files to printed reports to spreadsheet
formats.

There are other utilities that are faster than BTEQ for importing or exporting data. We will talk about these in future
chapters, but BTEQ is still used for smaller jobs.

Logging onto BTEQ

"It's choice – not change – that determines your destiny."


– Jean Nidetch

By taking a chance in this industry, you've chosen to arm yourself with an unlimited arsenal of knowledge. But you can't
use that knowledge if you can't log onto the system! The next example is going to teach you how to logon to BTEQ.
Remember that you will be prompted for the password since it's an interactive interface. BTEQ commands begin with a
period (.) and do not require a semi-colon (;) to end the statement. SQL commands never start with a period, and they
must always be terminated with a semi-colon.

Let's logon to BTEQ and show all information in the Employee_Table:


Before you can use BTEQ, you must have user access rights to the client system and privileges to the Teradata DBS.
Normal system access privileges include a userid and a password. Some systems may also require additional user
identification codes depending on company standards and operational procedures. Depending on the configuration of your
Teradata DBS, you may need to include an account identifier (acctid) and/or a Teradata Director Program Identifier
(TDPID).

Using BTEQ to submit queries


Submitting SQL in BTEQ's Interactive Mode

Once you logon to Teradata through BTEQ, you are ready to run your queries. Teradata knows the SQL is finished when
it finds a semi-colon, so don't forget to put one at the end of your query. Below is an example of a Teradata table to
demonstrate BTEQ operations.

Employee_Table

Employee_No   Last_Name    First_Name   Salary     Dept_No
2000000       Jones        Squiggy      32800.50   ?
1256349       Harrison     Herbert      54500.00   400
1333454       Smith        John         48000.00   200
1121334       Strickling   Cletus       54500.00   400

Figure 2-1

BTEQ execution

.LOGON cdw/sql01                  <- Type at the command prompt: logon with TDPID and USERNAME.
Password: XXXXX                   <- Then enter the PASSWORD at the second prompt.

Enter your BTEQ/SQL Request or BTEQ Command.     <- BTEQ will respond and is waiting for a command.

SELECT * FROM Employee_Table                     <- An SQL statement
WHERE Dept_No = 400;

*** Query Completed. 2 rows found. 5 Columns returned.    <- BTEQ displays information about the answer set.
*** Total elapsed time was 1 second.

The result set:

Employee_No   Last_Name    First_Name   Salary     Dept_No
1256349       Harrison     Herbert      54500.00   400
1121334       Strickling   Cletus       54500.00   400

WITH BY Statement

"Time is the best teacher, but unfortunately, it kills all of its students."
– Robin Williams

Investing time in Teradata can be a killer move for your career. We can use the WITH BY statement in BTEQ, whereas we
cannot use it with Nexus or SQL Assistant. The WITH BY statement works like a correlated subquery in that you
can use aggregates based on a distinct column value.

BTEQ has the ability to use WITH BY statements:
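A minimal sketch of the idea, assuming the sample Employee_Table from Figure 2-1 (the column list and the choice of aggregate are illustrative):

SELECT  Dept_No
       ,Last_Name
       ,Salary
FROM    Employee_Table
WITH    SUM(Salary) BY Dept_No;

The WITH...BY clause prints a SUM(Salary) subtotal line each time Dept_No changes, in addition to the detail rows.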

"I've learned that you can't have everything and do everything at the same time."
– Oprah Winfrey

The great thing about the WITH statement is that you can aggregate within a specific group while also having the aggregate done
to a column as a whole. We can get a grand total or an overall average with the WITH statement; just leave out BY. Here's
a good example:


Using WITH on a whole column:
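A minimal sketch, again assuming the sample Employee_Table; leaving out BY produces one grand total for the whole answer set:

SELECT  Last_Name
       ,Salary
FROM    Employee_Table
WITH    SUM(Salary);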

Transactions in Teradata Mode

"He who every morning plans the transaction of the day and follows out that plan, carries a thread that will
guide him through the maze of most busy life."
- Victor Hugo

Victor couldn't have summed up Teradata any better. However, Victor did seem more worried about the hunchback than
the rollback. Turning your queries into a single transaction is often the best plan, but can sometimes make one Miserables.

Often in Teradata we'll see multiple queries within the same transaction. We can use the BT / ET keywords to bundle
several queries into one transaction. You also need to end every query with a semi-colon (;), which isn't the case in Nexus
or SQL Assistant. For example:

In Teradata mode, we're going to put four single statements into a single transaction
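A hedged sketch of the technique; the UPDATE statements and table names are illustrative, and the last statement is deliberately written against a nonexistent table so that it fails:

BT;
UPDATE Employee_Table SET Salary = Salary * 1.05 WHERE Dept_No = 400;
UPDATE Employee_Table SET Salary = Salary * 1.03 WHERE Dept_No = 200;
UPDATE Employee_Table SET Salary = Salary * 1.10 WHERE Dept_No = 100;
UPDATE No_Such_Table  SET Salary = 0;
ET;

Because all four updates sit between BT and ET, the failure of the last one causes every statement in the transaction to roll back.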


"What is defeat? Nothing but education; nothing but the first step to something better."
– Wendell Phillips

The final query in our last transaction is what caused our updates to fail. This was not the sweet taste of victory, but
instead the smell of de Feet! Actually, it was an education that is the first step to something better. When using BT / ET in your
transaction, you're telling Teradata that when it comes to committing, we want either all or none. Since our last query in the
transaction failed, the Transient Journal rolled back all the queries in our entire transaction. Make sure that your syntax is
correct when using the BT and ET method because a mistake causes a massive rollback.

The last query in our set did not work:

Now let's take a look at the Employee_Table:


Our updates didn't work! That's because we bundled all four queries into one transaction. Since our last query failed,
the tables were rolled back to their original state before the transaction took place.

Alternative Transactions in Teradata Mode

"It's not enough that we do our best; sometimes we have to do what's required."
– Sir Winston Churchill

Sometimes we're required to use an alternative method to get the job done if we want to win like Winston. Here's another
way to bundle several queries into one transaction. Notice where we place the semi-colon in our queries and you will
understand this technique. Remember that the semi-colon must be at the very beginning of the next line for a query to be
considered part of the same transaction. Because we are in Teradata mode, if any query fails, then all queries that are
part of the same transaction roll back. How many queries are part of the same transaction below? Four!

Another way to perform a multi-statement transaction in Teradata mode:
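A hedged sketch using the same illustrative updates; note that each semi-colon begins the next line:

UPDATE Employee_Table SET Salary = Salary * 1.05 WHERE Dept_No = 400
;UPDATE Employee_Table SET Salary = Salary * 1.03 WHERE Dept_No = 200
;UPDATE Employee_Table SET Salary = Salary * 1.10 WHERE Dept_No = 100
;UPDATE No_Such_Table  SET Salary = 0;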

Placing the semi-colon at the beginning of the next line (followed by another statement) will bundle those statements
together as one transaction. Notice that our Employee_Table was not updated, just like in the first example.

Transactions in ANSI Mode

"The man who views the world at 50 the same as he did at 20 has wasted 30 years of his life."
- Muhammad Ali

ANSI (American National Standards Institute) allows us to view the same queries in a different way. To change to ANSI
mode, simply type '.set session transaction ANSI' and be sure to do it before you actually logon to BTEQ. Then, you can
logon like you always do, but you will be in ANSI mode. All queries in ANSI mode will also work in Teradata mode and vice
versa. However, three things will be different in ANSI mode versus Teradata mode. Those things are how case sensitivity
is handled, how transactions are committed and rolled back, and how truncation is accepted.

Let's log back onto BTEQ, but this time change it to ANSI mode:
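A minimal sketch, reusing the TDPID and user from the earlier logon example:

.SET SESSION TRANSACTION ANSI
.LOGON cdw/sql01
Password: XXXXX

The .SET SESSION TRANSACTION command must be issued before the .LOGON; it cannot be changed for a session that is already logged on.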

"Be not afraid of growing slowly, be afraid only of standing still."


- Chinese Proverb

Remember the first rule of ANSI mode: all transactions must be committed by the user, actually using the word 'COMMIT'.
Also, in ANSI mode, after any DDL statement (CREATE, DROP, ALTER, DATABASE) we have to use the 'COMMIT'
command immediately. This tells Teradata to commit to what's been done. Our query below will attempt to find anyone with
a last_name of 'larkins'. It will fail even though we have Mike Larkins in our table. This is because ANSI mode is case sensitive
and we did not capitalize the 'L' in 'Larkins'.

Let's run a few queries in ANSI mode:
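A hedged sketch of such a session; the DATABASE and the search are stand-ins consistent with this chapter's examples:

DATABASE SQL_Class;
COMMIT;

SELECT * FROM Employee_Table
WHERE Last_Name = 'larkins';
COMMIT;

The first COMMIT is needed because even setting the DATABASE counts as work that must be committed in ANSI mode; the SELECT then returns no rows because of case sensitivity.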

Notice that we have to COMMIT after any DDL or Update before the transaction is committed. We even have to
COMMIT after setting our DATABASE or we will get an error.

We didn't have any rows returned, but we know there's a Mike Larkins within the table. That's because ANSI mode is case
sensitive. Change 'larkins' to 'Larkins'.

Rollback


"Insanity: doing the same thing over and over again and expecting different results."
– Albert Einstein

The Rollback keyword is the SQL mulligan of Teradata. Rollback will erase any changes made to a table. This can be very
useful if something didn't work. However, you cannot rollback once you've used the commit keyword. Not keeping rollback
in your arsenal would be insane.

Advantages to ANSI Mode

ANSI mode is great because when you bundle several queries into one transaction and one doesn't work, the rest won't be
rolled back to their original state. Using commit will ensure that your successes aren't hidden by your failures.

Now notice that I will have multiple statements in the same transaction and that I purposely fail the last SQL statement:
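A hedged sketch, reusing the illustrative updates from the Teradata-mode example, with the last statement again written to fail:

UPDATE Employee_Table SET Salary = Salary * 1.05 WHERE Dept_No = 400;
UPDATE Employee_Table SET Salary = Salary * 1.03 WHERE Dept_No = 200;
UPDATE No_Such_Table  SET Salary = 0;
COMMIT;

In ANSI mode, only the failed UPDATE is rolled back; the COMMIT preserves the statements that succeeded.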


Which statements were rolled back?

"All truths are easy to understand once they are discovered; the point is to discover them."
– Galileo Galilei

Discovering the advantages in using ANSI will only make SQL easier to write. It might take a little bit more typing, but a little
work now can save you lots of time later.

The Employee_Table was updated!

In ANSI mode, only the failed statements are rolled back when it comes to multi-statement transactions.

Creating a Batch Script for BTEQ

"The cure for boredom is curiosity. There is no cure for curiosity."


– Dorothy Parker

If you've been bored waiting for your queries to finish, then I'm sure you're curious about how we can fix the situation. Batch
scripting allows us to write out pages and pages of queries and execute those queries in one fell swoop. BTEQ can also
run in batch mode under UNIX (IBM AIX, Hewlett-Packard HP-UX, NCR MP-RAS, Sun Solaris), DOS, Macintosh,
Microsoft Windows and OS/2 operating systems. To submit a job in batch mode, do the following:
1. Invoke BTEQ (using the DOS prompt)

2. Type in the input file name

3. Type in the location and output file name.

The following example shows how to create a batch script and how to invoke the script using BTEQ from a DOS command.


When using batch scripting, you will not be prompted for your password. Instead, just add the password after your logon name,
separated by a comma. Be sure to end with either a .QUIT or .LOGOFF so that your queries aren't left hanging.

Simply open up Notepad and type in the following, then save it. I recommend calling it 'BTEQ_First_Script.txt' and
saving it in the C:\Temp folder. However, as long as you can remember what you named it and where you saved it, you'll be
fine. Be sure that you save it as a .txt file.

Using Batch scripting with BTEQ
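A minimal sketch of what the script file might contain, borrowing the sample logon and table from this chapter (the password 'whynot' comes from the mylogon.txt example later in this chapter):

.LOGON cdw/sql01,whynot
DATABASE SQL_Class;

SELECT * FROM Employee_Table
WHERE Dept_No = 400;

.QUIT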

Running your Batch Script in BTEQ

"I do not fear computers. I fear the lack of them."


– Isaac Asimov

The BTEQ utility enables us to run our scripts in batch mode. To run our new batch script, we have to access the BTEQ
utility via the DOS prompt. Simply use the command prompt to access the utility, and follow the steps below:

Let's run our query in Batch!

Once you're in DOS, type in the following: 'BTEQ < c:\temp\BTEQ_First_Script.txt', then hit Enter. BTEQ will automatically
open in DOS, and then it will access the file from the location you listed.


Results from a BTEQ Batch Script

"Don't be afraid to take a big step when one is indicated. You can't cross a chasm in two small steps."
-David Lloyd George

BTEQ will run your query in steps to produce the answer you're looking for. Whether you're accessing a small table or
crossing over a chasm of information, BTEQ will ensure that the steps it takes will be big enough to get the job done.

Our results are returned Interactively

Placing Our BTEQ Output to a File

"The secret to creativity is knowing how to hide your sources."


– Albert Einstein

We can use BTEQ to export our results to another text document. Exporting data also works very well when you're trying to
document your query along with the results.

We can export our results in batch as well
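A sketch of the DOS command line, using the file names from this chapter:

BTEQ < c:\temp\BTEQ_First_Script.txt > c:\temp\BTEQ_First_Export.txt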


Notice that the BTEQ command is immediately followed by the '<BTEQ_First_Script.txt' to tell BTEQ which file contains the
commands to execute. Then, the '>BTEQ_First_Export.txt' names the file where the output messages are written.

Putting password information into a script is risky for security reasons, so inserting the password directly into a script
that is to be processed in batch mode may not be a good idea. It is generally recommended and a common practice to
store the logon and password in a separate file that can be secured. That way, it is not in the script for anyone to see.
For example, the contents of a file called "mylogon.txt" might be: '.LOGON cdw/sql00,whynot'. Then, the script should
contain the following command instead of a .LOGON: .RUN FILE=c:\temp\mylogon.txt

This command opens and reads the file. It then executes every record in that file.

Reading BTEQ Output from the Text File

"The more original a discovery, the more obvious it seems afterwards."


– Arthur Koestler

Discovering how easy it is to export your data in batch mode is a key step in learning Teradata utilities. Here are our
results, including the original query and what BTEQ did to generate its answer set. Simply go to the folder where you
saved the exported data (the previous examples saved the file as c:\temp\BTEQ_First_Export.txt).

What you'll find in our new text document


Using BTEQ Conditional Logic

Below is a BTEQ batch script example. The initial steps of the script will establish the logon, the database, and then delete
all the rows from the Employee_Table. If the table does not exist, the BTEQ conditional logic will instruct Teradata to
create it. However, if the table already exists, then Teradata will move forward and insert data.

Note: In the script example below, a brief description follows each group of commands.

.RUN FILE = c:\temp\mylogon.txt
DATABASE SQL_Class;
DELETE FROM Employee_Table;

.IF ERRORCODE = 0 THEN .GOTO INSEMPS [*]
/* ERRORCODE is a reserved word that contains the outcome
status for every SQL statement executed in BTEQ. A zero (0)
indicates that statement worked. */

This is BTEQ conditional logic that checks whether the delete worked or if the table even existed. If the table did not
exist, then BTEQ will create it. If the table does exist, the CREATE TABLE step will be skipped and processing will
GOTO INSEMPS directly.

CREATE TABLE Employee_Table
(Employee_No INTEGER,
Last_name CHAR(20),
First_name CHAR(12),
Salary DECIMAL(8,2),
Dept_No SMALLINT)
UNIQUE PRIMARY INDEX (Employee_No);

.LABEL INSEMPS [*]
INSERT INTO Employee_Table (1232578, 'Chambers', 'Mandee', 48850.00, 100);
INSERT INTO Employee_Table (1256349, 'Harrison', 'Herbert', 54500.00, 400);
.QUIT

The label INSEMPS marks the point where the BTEQ logic can go directly to inserting records into the
Employee_Table. Once the table has been created, Teradata will then insert the two new rows into the empty table.

[*]
Both labels have to be identical or it will not work.

Using BTEQ to Export Data


"The trouble with facts is that there are so many of them."


– Samuel McChord Crothers

Creating flat files is one of the most important tasks in Teradata, and that's a fact. BTEQ allows data to be exported directly
from Teradata to a file on a mainframe or network-attached computer. In addition, the BTEQ export function has several
export formats that a user can choose from depending on the desired output. Generally, users will export data to a flat file
format that is composed of a variety of characteristics. These characteristics include: record mode, field mode, indicator mode, or
DIF mode.

Syntax of a basic EXPORT command:


.EXPORT <Mode (example: data)> FILE = <filename>

Creating a flat file of what's on the Employee_Table
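A hedged sketch of such a script; the output file name is illustrative, and .EXPORT RESET closes the file when the transfer is done:

.LOGON cdw/sql01,whynot
DATABASE SQL_Class;

.EXPORT DATA FILE = c:\temp\Employee_Flat_File.txt

SELECT * FROM Employee_Table;

.EXPORT RESET
.QUIT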

Executing our BTEQ Script to Export Data

"The past is a foreign country; they do things differently there."


L. P. Hartley

Transferring data from one table to another without the use of a flat file is a thing of the past. Teradata does things differently
now, which is why it's still considered the future of data warehousing. The flat files we create are merely used to store
information contained within a table. The information in these files is written in binary, which is why the text seems
garbled. It may look garbled, but it is perfectly written. When we FastLoad the data back into a table it will look beautiful.

Executing our fastload_creating_flatfile01.txt

What our flat file looks like:


And I thought French was tough; it's like they have a different word for everything…

We now have a flat file that contains all information found in the Employee_Table. We will be able to use this flat file for
future exercises.

BTEQ Export Modes Explained

Below is a list and description of our three data modes:

Record Mode (also called DATA mode): This is set by .EXPORT DATA. This will bring data back as a flat file. Each
parcel will contain a complete record. Since it is not a report, there are no headers or white space between the data
contained in each column, and the data is written to the file (e.g., disk drive file) in native format. For example, this means
that INTEGER data is written as a 4-byte binary field. Therefore, it cannot be read and understood using a normal text
editor.

Field Mode (also called REPORT mode): This is set by .EXPORT REPORT. This is the default mode for BTEQ and
brings the data back as if it was a standard SQL SELECT statement. The output of this BTEQ export would return the
column headers for the fields, white space, expanded packed or binary data (for humans to read) and can be understood
using a text editor.

Indicator Mode: This is set by .EXPORT INDICDATA. This mode writes the data in data mode, but also provides host
operating systems with the means of recognizing missing or unknown data (NULL) fields. This is important if the data is to
be loaded into another Relational Database System (RDBMS).

The issue is that there is no standard character defined to represent either a numeric or character NULL. So, every system
uses a zero for a numeric NULL and a space or blank for a character NULL. If this data is simply loaded into another
RDBMS, it is no longer a NULL, but a zero or space.

To remedy this situation, INDICDATA puts a bitmap at the front of every record written to the disk. This bitmap contains one
bit per field/column. When a Teradata column contains a NULL, the bit for that field is turned on by setting it to a "1".
Likewise, if the data is not NULL, the bit remains a zero. Therefore, the loading utility reads these bits as indicators of
NULL data and identifies the column(s) as NULL when data is loaded back into the table, where appropriate.

Since both DATA and INDICDATA store each column on disk in native format with known lengths and characteristics, they
are the fastest method of transferring data. However, it becomes imperative that you be consistent. When it is exported as
DATA, it must be imported as DATA and the same is true for INDICDATA.

Again, this internal processing is automatic and potentially important. Yet, on a network-attached system, being consistent
is our only responsibility. However, on a mainframe system, you must account for these bits when defining the LRECL in
the Job Control Language (JCL). Otherwise, your length is too short and the job will end with an error.

To determine the correct length, the following information is important. As mentioned earlier, one bit is needed per field
output onto disk. However, computers allocate data in bytes, not bits, so even if only one bit is needed, a minimum of
eight (8 bits per byte) is allocated. As a result, for every eight fields selected, the LRECL becomes 1 byte longer. In
other words, for nine columns selected, 2 bytes are added even though only nine bits are needed.

With this being stated, there is one indicator bit per field selected. INDICDATA mode gives the Host computer the ability to
allocate bits in the form of a byte. Therefore, if one bit is required by the host system, INDICDATA mode will automatically
allocate eight of them. This means that from one to eight columns being referenced in the SELECT will add one byte to the
length of the record. When selecting nine to sixteen columns, the output record will be two bytes longer.

When executing on non-mainframe systems, the record length is automatically maintained. However, when exporting to a
mainframe, the JCL (LRECL) must account for this additional 2 bytes in the length.

DIF Mode: Known as Data Interchange Format, which allows users to export data from Teradata to be directly utilized for
spreadsheet applications like Excel, FoxPro and Lotus.

The optional LIMIT is to tell BTEQ to stop returning rows after a specific number (n) of rows. This might be handy in a test
environment to stop BTEQ before the end of transferring rows to the file.

BTEQ EXPORT Example Using Record (DATA) Mode

The following is an example that displays how to utilize the export Record (DATA) option. Notice the periods (.) at the
beginning of some of the script lines. A period starting a line indicates a BTEQ command. If there is no period, then the
command is an SQL command.

When doing an export on a mainframe or a network-attached (e.g., LAN) computer, there is one primary difference in
the .EXPORT command. The difference is the following:

- Mainframe syntax: .EXPORT DATA DDNAME = data definition statement name (JCL)

- LAN syntax: .EXPORT DATA FILE = actual file name
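A hedged LAN-style sketch of a Record (DATA) mode export; the output file name is illustrative:

.LOGON cdw/sql01,whynot
DATABASE SQL_Class;

.EXPORT DATA FILE = c:\temp\Employee_Data.txt

SELECT Employee_No
      ,Last_Name
      ,Salary
FROM   Employee_Table;

.EXPORT RESET
.QUIT

On a mainframe, the .EXPORT line would instead use DDNAME, with the record length accounted for in the JCL as described above.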

BTEQ Return Codes

Return codes are two-digit values that BTEQ returns to the user after completing each job or task. The value of the return
code indicates the completion status of the job or task as follows:

Return Code   Description

00   Job completed with no errors.

02   User alert to log on to the Teradata DBS.

04   Warning error.

08   User error.

12   Severe internal error.

You can override the standard return codes at the time you terminate BTEQ. This might be handy for debugging purposes. The
error code or "return code" can be any number you specify using one of the following:

Override Code

.QUIT 15

.EXIT 15

BTEQ Commands

The BTEQ commands in Teradata are designed for flexibility. These commands are not used directly on the data inside
the tables. Instead, these 60 different BTEQ commands are utilized in four areas:

- Session Control Commands

- File Control Commands

- Sequence Control Commands

- Format Control Commands


Session Control Commands

ABORT Abort any and all active running requests and transactions for a session, but do not exit BTEQ.
DEFAULTS Reset all BTEQ Format command options to their defaults. This will utilize the default configurations.
EXIT Immediately end the current session or sessions and exit BTEQ.
HALT EXECUTION Abort any and all active running requests and transactions and EXIT BTEQ.
LOGOFF End the current session or sessions, but do not exit BTEQ.
LOGON Starts a BTEQ Session. Every user, application, or utility must LOGON to Teradata to establish a session.
QUIT End the current session or sessions and exit BTEQ.
SECURITY Specifies the security level of messages between a network-attached system and the Teradata
Database.
SESSIONS Specifies the number of sessions to use with the next LOGON command.
SESSION CHARSET Specifies the name of a character set for the current session or sessions.
SESSION SQLFLAG Specifies a disposition of warnings issued in response to violations of ANSI syntax. The SQL will still run,
but a warning message will be provided. The four settings are FULL, INTERMEDIATE, ENTRY, and NONE.
SESSION TRANSACTION Specifies whether transaction boundaries are determined by Teradata SQL or ANSI SQL semantics.
SHOW CONTROLS Displays all of the BTEQ control command options currently configured.
SHOW VERSIONS Displays the BTEQ software release versions.
TDP Used to specify the correct Teradata server for logons for a particular session.

File Control Commands

These BTEQ commands are used to specify the formatting parameters of incoming and outgoing information. This includes
identifying sources and determining I/O streams.

CMS Execute a VM CMS command inside the BTEQ environment.


ERROROUT Write error messages to a specific output file.
EXPORT Open a file with a specific format to transfer information directly from the Teradata database.
HALT EXECUTION Abort any and all active running requests and transactions and EXIT BTEQ.
FORMAT Enable/inhibit the page-oriented format command options.
IMPORT Open a file with a specific format to import information into Teradata.
INDICDATA One of multiple data mode options for data selected from Teradata. The modes are INDICDATA, FIELD, or
RECORD MODE.
OS Execute an MS-DOS, PC-DOS, or UNIX command from inside BTEQ.
QUIET Limit BTEQ output displays to all error messages and request processing statistics.
RECORDMODE One of multiple data mode options for data selected from Teradata. (INDICDATA, FIELD, or RECORD).
REPEAT Submit the next request a certain number of times.
RUN Execute Teradata SQL requests and BTEQ commands directly from a specified run file.
TSO Execute an MVS TSO command from inside the BTEQ environment.

Sequence Control Commands

These commands control the sequence in which Teradata commands operate.

ABORT Abort any active transactions and requests.


ERRORLEVEL Assign severity levels to particular error numbers.


EXIT End the current session or sessions and exit BTEQ.


GOTO Skip all intervening commands and resume after branching forward to the specified label.
HANG Pause BTEQ processing for a specific amount of time.
IF…THEN Test a stated condition, and then resume processing based on the test results.
LABEL The GOTO command will always GO directly TO a particular line of code based on a label.
MAXERROR Specifies a maximum allowable error severity level.
QUIT End the current session or sessions and exit BTEQ.
REMARK Place a comment on the standard output stream.
REPEAT Submit the next request a certain number of times.

Format Control Commands

These commands control the formatting for Teradata and present the data in a report mode to the screen or printer.

DEFAULTS Reset all BTEQ Format command options to their defaults. This will utilize the default configurations.
ECHOREQ Enable the Echo required function in BTEQ returning a copy of each Teradata SQL request and BTEQ
command to the standard output stream.
EXPORT Open a file with a specific format to transfer information directly from the Teradata database.
FOLDLINE Split or fold each line of a report into multiple lines.
FOOTING Specify a footer to appear at the bottom of every report page.
FORMAT Enable/inhibit the page-oriented format command options.
IMPORT Open a file with a specific format to transfer or IMPORT information directly to Teradata.
INDICDATA One of multiple data mode options for data selected from Teradata. The modes are INDICDATA, FIELD, or
RECORD MODE.
NULL Specifies a character or string of characters to represent null values returned from Teradata.
OMIT Omit specific columns from a report.
PAGEBREAK Ejects a page whenever a specified column changes values.
PAGELENGTH Specifies the page length of printed reports based on lines per page.
QUIET Limit BTEQ output displays to all error messages and request processing statistics.
RECORDMODE One of multiple data mode options for data selected from Teradata. (INDICDATA, FIELD, or RECORD).
RETCANCEL Cancel a request when the specified value of the RETLIMIT command option is exceeded.
RETLIMIT Specifies the maximum number of rows to be displayed or written from a Teradata SQL request.
RETRY Retry requests that fail under specific error conditions.
RTITLE Specify a header appearing at the top of all pages of a report.
SEPARATOR Specifies a character string or specific width of blank characters separating columns of a report.
SHOW CONTROLS Displays all of the BTEQ control command options currently configured.
SIDETITLES Place titles to the left or side of the report instead of on top.
SKIPLINE Inserts blank lines in a report when the value of a column changes specified values.
SUPPRESS Replace each and every consecutively repeated value with completely-blank character strings.
TITLEDASHES Display dash characters before each report line summarized by a WITH clause.
UNDERLINE Display a row of dash characters when the specified column changes values.
WIDTH Specifies the width of screen displays and printed reports, based on characters per line.

An Introduction to FastExport
Why it is Called "FAST" Export


FastExport is known for its lightning speed when it comes to exporting vast amounts of data from Teradata and transferring
the data into flat files on either a mainframe or network-attached computer. In addition, FastExport has the ability to use
OUTMOD routines, which provide the user the capability to write, select, validate, and preprocess the exported data. Part
of this speed is achieved because FastExport takes full advantage of Teradata's parallelism.

In this book, we have already discovered how BTEQ can be utilized to export data from Teradata in a variety of formats. As
the demand increases to store data, the ever-growing requirement for tools to export massive amounts of data also
increases.

This is the reason why FastExport (FEXP) is brilliant by design. A good rule of thumb is that if you have more than half a
million rows of data to export to either a flat file format or with NULL indicators, then FastExport is the best choice to
accomplish this task.

Keep in mind that FastExport is designed as a one-way utility — that is, the sole purpose of FastExport is to move data out
of Teradata. It does this by harnessing the parallelism that Teradata provides.

FastExport is extremely attractive for exporting data because it takes full advantage of multiple sessions, which leverages
Teradata parallelism. FastExport can also export from multiple tables during a single operation. In addition, FastExport
utilizes the Support Environment, which provides a job restart capability from a checkpoint if an error occurs during the
process of executing an export job.
How FastExport Works

When FastExport is invoked, the utility logs onto the Teradata database and retrieves the rows that are specified in the
SELECT statement and puts them into SPOOL. From there, it must build blocks to send back to the client. In comparison,
BTEQ starts sending rows immediately for storage into a file.

If the output data is sorted, FastExport may be required to redistribute the selected data two times across the AMP
processors in order to build the blocks in the correct sequence. Remember, a lot of rows fit into a 64K block, and both the
rows and the blocks must be sequenced. While all of this redistribution is occurring, BTEQ continues to send rows, and
FastExport falls behind in the processing. However, when FastExport starts sending the rows back a block at a time, it
quickly overtakes and passes BTEQ's row-at-a-time processing.

The other advantage is that if BTEQ terminates abnormally, all of your rows (which are in SPOOL) are discarded. You
must rerun the BTEQ script from the beginning. However, if FastExport terminates abnormally, all the selected rows are in
worktables and it can continue sending them where it left off. Pretty smart and very fast!

FastExport Fundamentals

#1: FastExport EXPORTS data from Teradata. The reason they call it FastExport is because it takes data off of
Teradata (Exports Data). FastExport does not import data into Teradata. Additionally, like BTEQ it can output multiple files
in a single run.

#2: FastExport only supports the SELECT statement. The only DML statement that FastExport understands is
SELECT. You SELECT the data you want exported and FastExport will take care of the rest.

#3: Choose FastExport over BTEQ when exporting more than half a million rows. When a large amount
of data is being exported, FastExport is recommended over BTEQ Export. The only drawback is the total number of
FastLoads, FastExports, and MultiLoads that can run at the same time, which is limited to 15. BTEQ Export does not have
this restriction. Of course, FastExport will work with less data, but the speed may not be much faster than BTEQ.

#4: FastExport supports multiple SELECT statements and multiple tables in a single run. You can have multiple
SELECT statements with FastExport and each SELECT can join information up to 64 tables.

#5: FastExport supports conditional logic, conditional expressions, arithmetic calculations, and data
conversions. FastExport is flexible and supports the above conditions, calculations, and conversions.

#6: FastExport does NOT support error files or error limits. FastExport does not record particular error types in a
table. The FastExport utility will terminate after a certain number of errors have been encountered.

#7: FastExport supports user-written routines INMODs and OUTMODs. FastExport allows you to write INMOD and
OUTMOD routines so you can select, validate, and preprocess the exported data.
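Pulling these together, here is a minimal FastExport script sketch, assuming the sample Employee_Table in a SQL_Class database on a network-attached (LAN) client; all names are illustrative:

.LOGTABLE SQL_Class.FExp_Log;          /* restart log table for checkpoints */
.LOGON cdw/sql01,whynot;
.BEGIN EXPORT SESSIONS 8;              /* number of sessions with Teradata  */
.EXPORT OUTFILE c:\temp\Employee_Export.dat;

SELECT Employee_No
      ,Last_Name
      ,Salary
FROM   SQL_Class.Employee_Table;

.END EXPORT;
.LOGOFF;

The LOGTABLE provides the checkpoint information used for restarts, and BEGIN EXPORT sets the number of sessions with Teradata.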


Maximum of 15 Loads

The Teradata RDBMS will only support a maximum of 15 simultaneous FastLoad, MultiLoad, or FastExport utility jobs. This
maximum value is determined and configured in the DBS Control record. This value can be set from 0 to 15. When
Teradata is initially installed, this value is set at 5.

The reason for this limitation is that FastLoad, MultiLoad, and FastExport all use large blocks to transfer data. If more than
15 simultaneous jobs were supported, a saturation point could be reached on the availability of resources. In this case,
Teradata does an excellent job of protecting system resources by queuing up additional FastLoad, MultiLoad, and
FastExport jobs that are attempting to connect.

For example, if the maximum number of utilities on the Teradata system is reached and another job attempts to run, that job
does not start. This limitation should be viewed as a safety control feature. A tip for remembering how the load limit applies
is this: "If the name of the load utility contains either the word "Fast" or the word "Load", then there can be only a total of
fifteen of them running at any one time".

BTEQ does not have this load limitation. FastExport is clearly the better choice when exporting data; however, if too many
load jobs are running, BTEQ is an alternate choice for exporting data.

FastExport Support and Task Commands

FastExport accepts both FastExport commands and a subset of SQL statements. The FastExport commands can be
broken down into support and task activities. The table below highlights the key FastExport commands and their
definitions. These commands provide flexibility and control during the export process.

Support Environment Commands (see Support Environment chapter for details)

ACCEPT Allows the value of utility variables to be accepted directly from a file or from environmental variables.
DATEFORM Specifies the style of the DATE data types for FastExport.
DISPLAY Writes messages to the specific location.
ELSE Used in conjunction with the IF statement. ELSE commands and statements will execute when a preceding IF
condition is false.
ENDIF Used in conjunction with the IF or ELSE statements. Delimits the commands that were subject to previous IF or
ELSE conditions.
IF Introduces a conditional expression. If true then execution of subsequent commands will happen.
LOGOFF Disconnects all FastExport active sessions and terminates FastExport.
LOGON LOGON command or string used to connect sessions established through the FastExport utility.
LOGTABLE FastExport utilizes this to specify a restart log table. The purpose is for FastExport checkpoint information.
ROUTE MESSAGES Will route FastExport messages to an alternate destination.
RUN FILE Used to point to a file that FastExport is to use as standard input. This will Invoke the specified external file as the
current source of utility and Teradata SQL commands.
SET Assigns a data type and value to a variable.
SYSTEM Suspends the FastExport utility temporarily and executes any valid local operating system command before
returning.

Task Commands

BEGIN EXPORT Begins the export task and sets the specifications for the number of sessions with Teradata.
END EXPORT Ends the export task and initiates processing by Teradata.
EXPORT Provides two things:

- The client destination and file format specifications for the export data retrieved from Teradata

- A generated MultiLoad script file that can be used later to reload the export data back into Teradata


FIELD Constitutes a field in the input record section that provides data values for the SELECT statement.
FILLER Specifies a field in the input record that will not be sent to Teradata for processing. It is part of the input record to
provide data values for the SELECT statement.
IMPORT Defines the file that provides the USING data values for the SELECT.
LAYOUT Specifies the data layout for a file. It contains a sequence of FIELD and FILLER commands. This is used to describe
the import file that can optionally provide data values for the SELECT.

FastExport Supported SQL Commands

FastExport accepts the following Teradata SQL statements. Each has been placed in alphabetic order for your
convenience.
SQL Commands

ALTER TABLE Change a column or table options of a table.


CHECKPOINT Add a checkpoint entry in the journal table.
COLLECT STATISTICS Collect statistics for one or more columns or indexes in a table.
COMMENT Store or retrieve a comment string for a particular object.
CREATE DATABASE Creates a new database.
CREATE TABLE Creates a new table.
CREATE VIEW Creates a new view.
CREATE MACRO Creates a new macro.
DATABASE Specify a default database for the session.
DELETE Delete rows from a table.
DELETE DATABASE Removes all tables, views, macros, and stored procedures from a database.
DROP DATABASE Drops a database.
GIVE Transfer ownership of a database or user to another user.
GRANT Grant access privileges to an object.
MODIFY DATABASE Change the options for a database.
RENAME Change the name of a table, view, or macro.
REPLACE MACRO Change a macro.
REPLACE VIEW Change a view.
REVOKE Revoke privileges to an object.
SET SESSION COLLATION Override the collation specification during the current session.
UPDATE Change a column value of an existing row or rows in a table.

FastExport Modes

FastExport has two modes: RECORD or INDICATOR. In the mainframe world, only use RECORD mode. In the UNIX or
LAN environment, INDICATOR mode is the default, but you can use RECORD mode if desired. The difference between
the two modes is that INDICATOR mode will set the indicator bits to 1 for column values containing NULLs.

Both modes return data in a client internal format with variable-length records. Each individual record has a value for all of
the columns specified by the SELECT statement. All variable-length columns are preceded by a two-byte control value
indicating the length of the column data. NULL columns have a value that is appropriate for the column data type.
Remember, INDICATOR mode will set bit flags that identify the columns that have a null value.

FastExport Formats

FastExport has many possible formats in the UNIX or LAN environment. The FORMAT statement specifies the format for
each record being exported. The formats are:

n FASTLOAD


n BINARY

n TEXT

n UNFORMAT

The default FORMAT is FASTLOAD in a UNIX or LAN environment.

FASTLOAD format has a two-byte integer, followed by the data, followed by an end-of-record marker. It is called
FASTLOAD because the data is exported in a format ready for FASTLOAD.

BINARY format is a two-byte integer, followed by data.

TEXT format is an arbitrary number of bytes followed by an end-of-record marker.

UNFORMAT format is exactly as received from CLIv2 without any client modifications.
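As a hedged sketch, both the mode and the format ride on the FastExport .EXPORT statement; the file name is illustrative:

.EXPORT OUTFILE c:\temp\Employee_Export.dat
        MODE RECORD FORMAT BINARY;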

An Introduction to FastLoad
Why it is Called "FAST" Load

FastLoad is known for its lightning-like speed in loading vast amounts of data from flat files from a host into empty tables in
Teradata. Part of this speed is achieved because it does not use the Transient Journal. You will see some more of the
reasons enumerated below. But, regardless of the reasons that it is fast, know that FastLoad was developed to load
millions of rows into a table.

The way FastLoad works can be illustrated by home construction, of all things! Let's look at three scenarios from the
construction industry to provide an amazing picture of how the data gets loaded.

Scenario One: Builders prefer to start with an empty lot and construct a house on it, from the foundation right on up to the
roof. There is no pre-existing construction, just a smooth, graded lot. The fewer barriers there are to deal with, the quicker
the new construction can progress. Building custom or spec houses this way is the fastest way to build them. Similarly,
FastLoad likes to start with an empty table, like an empty lot, and then populate it with rows of data from another source.
Because the target table is empty, this method is typically the fastest way to load data. FastLoad will never attempt to
insert rows into a table that already holds data.

Scenario Two: The second scenario in this analogy is when someone buys the perfect piece of land on which to build a
home, but the lot already has a house on it. In this case, the person may determine that it is quicker and more
advantageous just to demolish the old house and start fresh from the ground up — allowing for brand new construction.
FastLoad also likes this approach to loading data. It can just 1) drop the existing table, which deletes the rows, 2) replace
its structure, and then 3) populate it with the latest and greatest data. When dealing with huge volumes of new rows, this
process will run much quicker than using MultiLoad to populate the existing table. Another option is to DELETE all the data
rows from a populated target table and reload it. This requires less updating of the Data Dictionary than dropping and
recreating a table. In either case, the result is a perfectly empty target table that FastLoad requires!

FastLoad Has Some Limits

There are more reasons why FastLoad is so fast. Many of these become restrictions and, therefore, cannot slow it down.
For instance, can you imagine a sprinter wearing cowboy boots in a race? Of course not! Because of its speed, FastLoad,
too, must travel light! This means that it has limitations that may or may not apply to other load utilities. Remembering
this short list will save you much frustration from failed loads and angry colleagues. It may even foster your reputation as a
smooth operator!

Rule #1: No Secondary Indexes are allowed on the Target Table. For high performance, FastLoad utilizes only
Primary Indexes when loading. The reason for this is that Primary (UPI and NUPI) indexes are used in Teradata to
distribute the rows evenly across the AMPs and build only data rows. A secondary index is stored in a subtable block, and
many times on a different AMP from the data row. This would slow FastLoad down, and they would have to call it: get ready
now, HalfFastLoad. Therefore, FastLoad does not support them. If Secondary Indexes exist already, just drop them. You
may easily recreate them after completing the load.

Rule #2: No Referential Integrity is allowed. FastLoad cannot load data into tables that are defined with Referential
Integrity (RI). This would require too much system checking to prevent referential constraints to a different table. FastLoad
only does one table. In short, RI constraints will need to be dropped from the target table prior to the use of FastLoad.

Rule #3: No Triggers are allowed at load time. FastLoad is much too focused on speed to pay attention to the needs of
other tables, which is what Triggers are all about. Additionally, these require more than one AMP and more than one table.
FastLoad does one table only. Simply ALTER the Triggers to the DISABLED status prior to using FastLoad.

Rule #4: Duplicate Rows (in Multi-Set Tables) are not supported. Multiset tables are tables that allow duplicate rows
— that is, rows in which the values in every column are identical. While FastLoad can load data into a multiset table, it will
not load duplicate rows into it: when FastLoad finds duplicate rows, they are discarded!

Rule #5: No AMPs may go down (i.e., go offline) while FastLoad is processing. The down AMP must be repaired
before the load process can be restarted. Other than this, FastLoad can recover from system glitches and perform restarts.
We will discuss Restarts later in this chapter.

Rule #6: No more than one data type conversion is allowed per column during a FastLoad. Why just one? Data
type conversion is a highly resource-intensive job on the system, which requires a "search and replace" effort. And that takes
more time. Enough said!

Three Key Requirements for FastLoad to Run

FastLoad can be run from either MVS/ Channel (mainframe) or Network (LAN) host. In either case, FastLoad requires three
key components. They are a log table, an empty target table and two error tables. The user must name these at the
beginning of each script.

Log Table: FastLoad needs a place to record information on its progress during a load. It uses the table called Fastlog in
the SYSADMIN database. This table contains one row for every FastLoad running on the system. In order for your
FastLoad to use this table, you need INSERT, UPDATE and DELETE privileges on that table.

Empty Target Table: We have already mentioned the absolute need for the target table to be empty. FastLoad does not
care how this is accomplished. After an initial load of an empty target table, you are now looking at a populated table that
will likely need to be maintained.

If you require the phenomenal speed of FastLoad, it is usually preferable, both for the sake of speed and for less
interaction with the Data Dictionary, just to delete all the rows from that table and then reload it with fresh data. The syntax
DELETE <databasename>.<tablename> should be used for this. But sometimes, as in some of our FastLoad sample
scripts below (see Figure 4-1), you want to drop that table and recreate it versus using the DELETE option. To do this,
FastLoad has the ability to run the DDL statements DROP TABLE and CREATE TABLE. The problem with putting DDL in
the script is that it is no longer restartable and you are required to rerun the FastLoad from the beginning. For this reason, we
recommend that you have a script for an initial run and a different script for a restart.

Two Error Tables: Each FastLoad requires two error tables. These are error tables that will only be populated should
errors occur during the load process. These are required by the FastLoad utility, which will automatically create them for
you; all you must do is to name them. The first error table is for any translation errors or constraint violations. For example,
a row with a column containing a wrong data type would be reported to the first error table. The second error table is for
errors caused by duplicate values for Unique Primary Indexes (UPI). FastLoad will load just one occurrence for every UPI.
The other occurrences will be stored in this table. However, if the entire row is a duplicate, FastLoad counts it but does not
store the row. These tables may be analyzed later for troubleshooting should errors occur during the load. For specifics on
how you can troubleshoot, see the section below titled, "What Happens When FastLoad Finishes."

FastLoad Commands

Here is a table of some key FastLoad commands and their definitions. They are used to provide flexibility in control of the load process. Consider this your personal ready-reference guide! You will notice that there are only a few SQL commands that may be used with this utility (CREATE TABLE, DROP TABLE, DELETE and INSERT). This keeps FastLoad from becoming encumbered with additional functions that would slow it down.

AXSMOD Short for Access Module, this command specifies input protocol like OLE-DB or reading a tape from REEL
Librarian. This parameter is for network-attached systems only. When used, it must precede the DEFINE
command in the script.
BEGIN LOADING This identifies and locks the FastLoad target table for the duration of the load. It also identifies the two error tables to be used for the load. CHECKPOINT and INDICATORS are subordinate commands in the BEGIN LOADING clause of the script. CHECKPOINT, which will be discussed below in detail, is not the default for FastLoad. It must be specified in the script. INDICATORS is a keyword related to how FastLoad handles nulls in the input file. It identifies columns with nulls and uses a bitmap at the beginning of each row to show which fields contain a null instead of data. When the INDICATORS option is on, FastLoad looks at each bit to identify the null column. The INDICATORS option does not work with VARTEXT.
CREATE TABLE This defines the target table and follows normal syntax. If used, this should only be in the initial script. If the table
is being loaded, it cannot be created a second time.
DEFINE This names the input file and describes the columns in that file and the data types for those columns.
DELETE Deletes all the rows of a table. This will only work in the initial run of the script. Upon restart, it will fail because
the table is locked.
DROP TABLE Drops a table and its data. It is used in FastLoad to drop previous Target and error tables. At the same time,
this is not a good thing to do within a FastLoad script since it cancels the ability to restart.
END LOADING Success! This command indicates the point at which all the data has been transmitted. It tells FastLoad to proceed to Phase II. As mentioned earlier, it can be used as a way to partition data loads to the same table by omitting it from the script. This is true because the table remains empty until after Phase II.
ERRLIMIT Specifies the maximum number of rejected ROWS allowed in error table 1 (Phase I). This handy command can be a lifesaver when you are not sure how corrupt the data in the input file is. The more corrupt it is, the greater the clean up effort required after the load finishes. ERRLIMIT provides you with a safety valve. You may specify a particular number of error rows beyond which FastLoad will proceed to abort. This provides the option to restart the FastLoad or to scrub the input data more before loading it. Remember, the rows in the error table are not in the data table; dealing with them becomes your responsibility.
HELP Designed for online use, the Help command provides a list of all possible FastLoad commands along with brief,
but pertinent tips for using them.
HELP TABLE Builds the table columns list for use in the FastLoad DEFINE statement when the data matches the Create
Table statement exactly. In real life this does not happen very often.
INSERT This is FastLoad's favorite command! It inserts rows into the target table.
LOGON/LOGOFF No, this is not the WAX ON / WAX OFF from the movie, The Karate Kid! LOGON simply begins a session.
or, QUIT LOGOFF ends a session. QUIT is the same as LOGOFF.
NOTIFY Just like it sounds, the NOTIFY command used to inform the job that follows that some event has occurred. It
calls a user exit or predetermined activity when such events occur. NOTIFY is often used for detailed reporting
on the FastLoad job's success.
RECORD Specifies the beginning record number (or with THRU, the ending record number) of the input data source to be read by FastLoad. Syntactically, this command is placed before the INSERT keyword. Why would it be used? Well, it enables FastLoad to bypass input records that are not needed, such as tape headers, or for a manual restart. When doing a partitioned data load, RECORD is used to override the checkpoint.
SET RECORD Used only in the LAN environment, this command states in what format the data from the Input file is coming:
FastLoad, Unformatted, Binary, Text, or Variable Text. The default is the Teradata RDBMS standard, FastLoad.
SESSIONS This command specifies the number of FastLoad sessions to establish with Teradata. It is written in the script
just before the logon. The default is 1 session per available AMP. The purpose of multiple sessions is to
enhance throughput when loading large volumes of data. Too few sessions will stifle throughput. Too many will
preclude availability of system resources to other users. You will need to find the proper balance for your
configuration.
SLEEP Working in conjunction with TENACITY, the SLEEP command specifies the amount of time in minutes to wait
before retrying to logon and establish all sessions. This situation can occur if all of the loader slots are used or if
the number of requested sessions are not available. The default is 6 minutes. For example, suppose that
Teradata sessions are already maxed-out when your job is set to run. If TENACITY were set at 4 and SLEEP at
10, then FastLoad would attempt to logon every 10 minutes for up to 4 hours. If there were no success by that
time, all efforts to logon would cease.
TENACITY Sometimes there are too many sessions already established with Teradata for a FastLoad to obtain the number
of sessions it requested to perform its task or all of the loader slots are currently used. TENACITY specifies the
amount of time, in hours, to retry to obtain a loader slot or to establish all requested sessions to logon. The
default for FastLoad is "no tenacity", meaning that it will not retry at all. If several FastLoad jobs are executed at
the same time, we recommend setting the TENACITY to 4, meaning that the system will continue trying to logon
for the number of sessions requested for up to four hours.

Fastload Exercise

"Mistakes are a part of being human. Appreciate your mistakes for what they are: precious life lessons that can

Page 26 / 59
Reprinted for CSC/vramanathan, CSC Coffing Data Warehousing, Coffing Publishing (c) 2006, Copying Prohibited
Teradata User's Guide: The Ultimate Companion, Third Edition

only be learned the hard way. Unless it's a fatal mistake, which, at least, others can learn from."
– Al Franken

Fastload is a utility we can use to populate empty tables. Make no mistake about how useful Fastload can be or how fatal
errors can occur. The next 2 slides illustrate the essentials needed when constructing your fastload script. The first will
highlight the important areas about the FastLoad script, and the second slide is a blank copy of the script that you can use
to create your own FastLoad script. Use the flat file we created in the BTEQ chapter to help run the script.

Structuring our Fastload script.

Simply copy the following text into notepad, then save it with a name and location that you can easily remember (we saved
ours as c:\temp\Fastload_First_Script.txt).
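Here is a minimal sketch of the kind of script that file contains; the logon string, the column layout (it mirrors the Employee_Profile example later in this chapter) and the flat file path are assumptions you will need to adjust for your own system. Because it drops and recreates its tables, this version can be rerun from the beginning but is not restartable, a point discussed later in this chapter.

   /* Fastload_First_Script.txt - a minimal sketch; the logon string, */
   /* table layout and file path are assumptions                      */
   LOGON demo/SQL01,SQL01;

   DROP TABLE SQL01.Employee_Table02;
   DROP TABLE SQL01.Emp02_Err1;
   DROP TABLE SQL01.Emp02_Err2;

   CREATE TABLE SQL01.Employee_Table02
   (Employee_No INTEGER
   ,Last_name   CHAR(20)
   ,First_name  VARCHAR(12)
   ,Salary      DECIMAL(10,2)
   ,Dept_No     SMALLINT)
   UNIQUE PRIMARY INDEX (Employee_No);

   SET RECORD VARTEXT ",";     /* the flat file is comma delimited */

   DEFINE Employee_No (VARCHAR(10))
         ,Last_name   (VARCHAR(20))
         ,First_name  (VARCHAR(12))
         ,Salary      (VARCHAR(12))
         ,Dept_No     (VARCHAR(6))
   FILE = c:\temp\flat_file.txt;

   BEGIN LOADING SQL01.Employee_Table02
      ERRORFILES SQL01.Emp02_Err1, SQL01.Emp02_Err2;

   INSERT INTO SQL01.Employee_Table02 VALUES
   ( :Employee_No
    ,:Last_name
    ,:First_name
    ,:Salary
    ,:Dept_No );

   END LOADING;
   LOGOFF;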


This script is going to create a table called Employee_Table02. After the table is created, it's going to take the information
from our flat file and insert it into the new table. Afterwards, the Employee_Table and Employee_Table02 should look
identical.

Executing Our FastLoad Script

"A good plan, violently executed now, is better than a perfect plan next week."
- George S. Patton

We can execute the Fastload utility like we do with BTEQ; however, we use the command "fastload" instead of "BTEQ". If we get a return code of 0, then the Fastload worked perfectly. What did General Patton say when his Fastload gave him a return code of 12? I shall return 0!
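On a network-attached client, for example, the invocation might look like the following; the output file name is an assumption:

   fastload < c:\temp\Fastload_First_Script.txt > c:\temp\Fastload_Output.txt

A return code of 0 in the output file means the load completed cleanly.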

Executing our Fastload script

Let's see if it worked:


The load utilities often scare people because there are many things that appear complicated. In actuality, the load scripts
are very simple. Think of FastLoad as:

n Logging onto Teradata

n Defining the Teradata table that you want to load (target table)

n Defining the INPUT data file

n Telling the system to start loading

Sample FastLoad Script

Normally it is not a good idea to put the DROP and CREATE statements in a FastLoad script. The reason is that when any
of the tables that FastLoad is using are dropped, the script cannot be restarted. It can only be rerun from the beginning.
Since FastLoad has restart logic built into it, a restart is normally the better solution if the initial load attempt should fail.
However, for purposes of this example, it shows the table structure and the description of the data being read.

Let's look at another FastLoad script that you might see in the real world. In the script below, every comment line is placed
inside the normal Teradata comment syntax, [/*. . . . */]. FastLoad and SQL commands are written in upper case in order to
make them stand out. In reality, Teradata utilities, like Teradata itself, are by default not case sensitive. You will also note
that when column names are listed vertically we recommend placing the comma separator in front of the following column.
Coding this way makes reading or debugging the script easier for everyone. The purpose of this script is to load the
Employee_Profile table in the SQL01 database. The input file used for the load is named EMPS.TXT. Below the sample
script each step will be described in detail.

/* FASTLOAD SCRIPT TO LOAD THE Employee_Profile TABLE */
/* Created by Coffing Data Warehousing                */
/* Since this script does not drop the target or      */
/* error tables, it is restartable. This is a good    */
/* thing for production jobs.                         */

/* Setup the FastLoad Parameters */

SESSIONS 100;            /* or, the number of sessions supportable;
                            specifies the number of sessions to logon */
TENACITY 4;              /* the default is no tenacity, meaning no retry;
                            here Tenacity is set to 4 hours */
SLEEP 10;                /* the default is 6; here we wait 10 minutes
                            between retries */

LOGON CW/SQL01,SQL01;

SHOW VERSIONS;           /* Shows the Utility's release number */

/* Set the Record type to comma delimited for FastLoad */

RECORD 2;                /* Starts with the second record */

SET RECORD VARTEXT ","; /* Specifies that the record layout is VARTEXT
                            with a comma delimiter */

/* Define the Text File Layout and Input File */

DEFINE Employee_No (VARCHAR(10))
      ,Last_name   (VARCHAR(20))
      ,First_name  (VARCHAR(12))
      ,Salary      (VARCHAR(5))
      ,Dept_No     (VARCHAR(6))
FILE = EMPS.TXT;

/* Optional, to show the layout of the input. Notice that all fields
   are defined as VARCHAR. When using VARTEXT, the fields do not
   contain the length field used by the text, FastLoad, or
   unformatted formats. */
SHOW;

/* Begin the Load and Insert Process into the Employee_Profile Table */

BEGIN LOADING SQL01.Employee_Profile          /* Specifies the table to load and lock */
   ERRORFILES SQL01.Emp_Err1, SQL01.Emp_Err2  /* Names the error tables */
   CHECKPOINT 100000;                         /* Sets the number of rows at which to pause
                                                 and record progress in the restart log
                                                 before loading further */

INSERT INTO SQL01.Employee_Profile VALUES     /* Defines the INSERT statement to use for
                                                 loading the rows */
( :Employee_No
 ,:Last_name
 ,:First_name
 ,:Salary
 ,:Dept_No );

END LOADING;                                  /* Continues the loading process with Phase 2 */

LOGOFF;                                       /* Logs off of Teradata */

Step One: Before logging onto Teradata, it is important to specify how many sessions you need. The syntax is
[SESSIONS {n}].

Step Two: Next, you LOGON to the Teradata system. You will quickly see that the utility commands in FastLoad are
similar to those in BTEQ. FastLoad commands were designed from the underlying commands in BTEQ. However, unlike
BTEQ, most of the FastLoad commands do not allow a dot ["."] in front of them and therefore need a semicolon. At this
point we chose to have Teradata tell us which version of FastLoad is being used for the load. Why would we recommend
this? We do because as FastLoad's capabilities get enhanced with newer versions, the syntax of the scripts may have to
be revisited.

Step Three: If the input file is not in FastLoad format, then before you describe the INPUT FILE structure in the DEFINE statement, you must first set the RECORD layout type for the file being passed to FastLoad. We have used VARTEXT in our example, with a comma delimiter. The available options are FASTLOAD, TEXT, UNFORMATTED or VARTEXT. You need to know this about your input file ahead of time.

Step Four: Next, comes the DEFINE statement. FastLoad must know the structure and the name of the flat file to be used
as the input FILE, or source file for the load.

Step Five: FastLoad makes no assumptions with regard to what you want loaded. In the BEGIN LOADING statement, the script must name the target table and the two error tables for the load. Did you notice that there is no CREATE TABLE statement for the error tables in this script? FastLoad will automatically create them for you once you name them in the script. In this instance, they are named "Emp_Err1" and "Emp_Err2". Phase 1 uses "Emp_Err1" because it comes first, and Phase 2 uses "Emp_Err2". The names are arbitrary, of course. You may call them whatever you like. At the same time, they must be unique within a database, so using a combination of your userid and target table name helps ensure this uniqueness between multiple FastLoad jobs occurring in the same database.


In the BEGIN LOADING statement we have also included the optional CHECKPOINT parameter. We included [CHECKPOINT 100000]. Although not required, this optional parameter performs a vital task with regard to the load. In the old days, children were always told to focus on the three "R's" in grade school ("reading, 'riting, and 'rithmetic"). There are two very different, yet equally important, R's to consider whenever you run FastLoad. They are RERUN and RESTART. RERUN means that the job is capable of running all the processing again from the beginning of the load. RESTART means that the job is capable of running the processing again from the point where it left off when the job was interrupted, causing it to fail. When CHECKPOINT is requested, it allows FastLoad to resume loading from the first row following the last successful CHECKPOINT. We will learn more about CHECKPOINT in the section on Restarting FastLoad.

Step Six: FastLoad focuses on its task of loading data blocks to AMPs like little Yorkshire terriers do when playing with a ball! It will not stop unless you tell it to stop. Therefore, it will not proceed to Phase 2 without the END LOADING command.

In reality, this provides a very valuable capability for FastLoad. Since the table must be empty at the start of the job, a normal FastLoad would prevent loading rows as they arrive from different time zones. To accomplish this kind of processing, simply omit the END LOADING on the load job. Then, you can run the same FastLoad multiple times and continue loading the worktables until the last file is received. Then run the last FastLoad job with an END LOADING, and you have partitioned your load jobs into smaller segments instead of one huge job. This makes FastLoad even faster!

Of course to make this work, FastLoad must be restartable. Therefore, you cannot use the DROP or CREATE commands
within the script. Additionally, every script is exactly the same with the exception of the last one, which contains the END
LOADING causing FastLoad to proceed to Phase 2. That's a pretty clever way to do a partitioned type of data load.

Step Seven: All that goes up must come down. And all the sessions must LOGOFF. This will be the last utility command in your script. At this point the table lock is released and, if there are no rows in the error tables, they are dropped automatically. However, if even a single row is in one of them, you are responsible for checking it, taking the appropriate action and dropping the table manually.

When You Cannot RESTART FastLoad

There are two types of FastLoad scripts: those that you can restart and those that you cannot without modifying the script.
If any of the following conditions are true of the FastLoad script that you are dealing with, it is NOT restartable:

n The Error Tables are DROPPED

n The Target Table is DROPPED

n The Target Table is CREATED

When You Can RESTART FastLoad

If all of the following conditions are true, then FastLoad is ALWAYS restartable:

n The Error Tables are NOT DROPPED in the script

n The Target Table is NOT DROPPED in the script

n The Target Table is NOT CREATED in the script

n You have defined a checkpoint

So, if you need to drop or create tables, do it in a separate job using BTEQ. Imagine that you have a table whose data
changes so much that you typically drop it monthly and build it again. Let's go back to the script we just reviewed above
and see how we can break it into the two parts necessary to make it fully RESTARTABLE. It is broken up below.

STEP ONE: Run the following SQL statements in Queryman or BTEQ before you start FastLoad:

/* Drops the target table and error tables */
DROP TABLE SQL01.Department;
DROP TABLE SQL01.Dept_Err1;
DROP TABLE SQL01.Dept_Err2;

/* Creates the Department target table in the SQL01 database in Teradata */
CREATE TABLE SQL01.Department
(Dept_No   INTEGER
,Dept_Name CHAR(20)
)
UNIQUE PRIMARY INDEX (Dept_No);

First, you ensure that the target table and error tables, if they existed previously, are blown away. If there had been no
errors in the error tables, they would be automatically dropped. If these tables did not exist, you have not lost anything.
Next, if needed, you create the empty table structure needed to receive a FastLoad.

STEP TWO: Run the FastLoad script

This is the portion of the earlier script that carries out these vital steps:

n Defines the structure of the flat file

n Tells FastLoad where to load the data and store the errors

n Specifies the checkpoint so a RESTART will not go back to row one

n Loads the data

If the restart conditions listed above are met, all you need do is resubmit the FastLoad job and it starts loading data again with the next record after the last checkpoint. With that said, if you did not request a checkpoint, the output message will normally indicate how many records were loaded.

You may optionally use the RECORD command to manually restart on the next record after the one indicated in the
message.

Now, if the FastLoad job aborts in Phase 2, you can simply submit a script with only the BEGIN LOADING and END
LOADING. It will then restart right into Phase 2.
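For instance, a Phase 2 restart script can be as small as the following sketch, reusing the table and error table names from the sample script above:

   LOGON CW/SQL01,SQL01;

   BEGIN LOADING SQL01.Employee_Profile
      ERRORFILES SQL01.Emp_Err1, SQL01.Emp_Err2;

   END LOADING;
   LOGOFF;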

An Introduction to MultiLoad
Why it is Called "Multi" Load

If we were going to be stranded on an island with a Teradata Data Warehouse and we could only take along one Teradata
load utility, clearly, MultiLoad would be our choice. MultiLoad has the capability to load multiple tables at one time from
either a LAN or Channel environment. This is in stark contrast to its fleet-footed cousin, FastLoad, which can only load one
table at a time. And it gets better, yet!

This feature rich utility can perform multiple types of DML tasks, including INSERT, UPDATE, DELETE and UPSERT on
up to five (5) empty or populated target tables at a time. These DML functions may be run either solo or in combinations,
against one or more tables. For these reasons, MultiLoad is the utility of choice when it comes to loading populated tables in the batch environment. As the volume of data being loaded or updated in a single block increases, the performance of MultiLoad improves. MultiLoad shines when it can impact more than one row in every data block. In other words, MultiLoad looks at massive amounts of data and says, "Bring it on!"

Leo Tolstoy once said, "All happy families resemble each other." Like happy families, the Teradata load utilities resemble
each other, although they may have some differences. You are going to be pleased to find that you do not have to learn all
new commands and concepts for each load utility. MultiLoad has many similarities to FastLoad. It has even more
commands in common with TPump. The similarities will be evident as you work with them. Where there are some quirky
differences, we will point them out for you.
Two MultiLoad Modes: IMPORT and DELETE

MultiLoad provides two types of operations via modes: IMPORT and DELETE. In MultiLoad IMPORT mode, you have the
freedom to "mix and match" up to twenty (20) INSERTs, UPDATEs or DELETEs on up to five target tables. The execution
of the DML statements is not mandatory for all rows in a table. Instead, their execution hinges upon the conditions
contained in the APPLY clause of the script. Once again, MultiLoad demonstrates its user-friendly flexibility. For UPDATEs
or DELETEs to be successful in IMPORT mode, they must reference the Primary Index in the WHERE clause.

The MultiLoad DELETE mode is used to perform a global (all AMP) delete on just one table. The reason to use .BEGIN
DELETE MLOAD is that it bypasses the Transient Journal (TJ) and can be RESTARTed if an error causes it to terminate
prior to finishing. When performing in DELETE mode, the DELETE SQL statement cannot reference the Primary Index in the WHERE clause. This is due to the fact that a primary index access goes to a specific AMP, whereas this is a global operation.

The other factor that makes a DELETE mode operation so good is that it examines an entire block of rows at a time. Once all the eligible rows have been removed, the block is written one time and a checkpoint is written. So, if a restart is necessary, it simply starts deleting rows again at the next block after the last checkpoint. This is a smart way to continue. Remember, when using the TJ, all deleted rows are put back into the table from the TJ as a rollback. A rollback can take longer to finish than the delete. MultiLoad does not do a rollback; it does a restart.

In the above diagram, monthly data is being stored in a quarterly table. To keep the contents limited to four months,
monthly data is rotated in and out. At the end of every month, the oldest month of data is removed and the new month is
added. The cycle is "add a month, delete a month, add a month, delete a month." In our illustration, that means that
January data must be deleted to make room for May's data.

Here is a question for you: What if there was another way to accomplish this same goal without consuming all of these
extra resources? To illustrate, let's consider the following scenario: Suppose you have TableA that contains 12 billion rows.
You want to delete a range of rows based on a date and then load in fresh data to replace these rows. Normally, the
process is to perform a MultiLoad DELETE to DELETE FROM TableA WHERE <date-column> < '2002-02-01'. The final
step would be to INSERT the new rows for May using MultiLoad IMPORT.
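A minimal sketch of that DELETE task follows; the logtable name and the date column name (Order_Date) are hypothetical stand-ins:

   .LOGTABLE SQL01.TableA_Log;
   .LOGON CW/SQL01,SQL01;

   .BEGIN DELETE MLOAD TABLES TableA;

   DELETE FROM TableA WHERE Order_Date < '2002-02-01';

   .END MLOAD;
   .LOGOFF;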

MultiLoad Imposes Limits

Rule #1: Unique Secondary Indexes are not supported on a Target Table. Like FastLoad, MultiLoad does not
support Unique Secondary Indexes (USIs). But unlike FastLoad, it does support the use of Non-Unique Secondary Indexes
(NUSIs) because the index subtable row is on the same AMP as the data row. MultiLoad uses every AMP independently
and in parallel. If two AMPs must communicate, they are not independent. Therefore, a NUSI (same AMP) is fine, but a USI
(different AMP) is not.

Rule #2: Referential Integrity is not supported. MultiLoad will not load data into tables that are defined with Referential
Integrity (RI). Like a USI, this requires the AMPs to communicate with each other. So, RI constraints must be dropped from
the target table prior to using MultiLoad.

Rule #3: Triggers are not supported at load time. Triggers cause actions on related tables based upon what happens
in a target table. Again, this is a multi-AMP operation and to a different table. To keep MultiLoad running smoothly, disable
all Triggers prior to using it.


Rule #4: No concatenation of input files is allowed. MultiLoad does not want you to do this because it could impact a restart if the files were concatenated in a different sequence or if data was deleted between runs.

Rule #5: The host will not process aggregates, arithmetic functions or exponentiation. If you need data
conversions or math, you might be better off using an INMOD to prepare the data prior to loading it.

Error Tables, Work Tables and Log Tables

Besides target table(s), MultiLoad requires the use of four special tables in order to function. They consist of two error
tables (per target table), one worktable (per target table), and one log table. In essence, the Error Tables will be used to
store any conversion, constraint or uniqueness violations during a load. Work Tables are used to receive and sort data
and SQL on each AMP prior to storing them permanently to disk. A Log Table (also called, "Logtable") is used to store
successful checkpoints during load processing in case a RESTART is needed.

HINT: Sometimes a company wants all of these load support tables to be housed in a particular database. When these
tables are to be stored in any database other than the user's own default database, then you must give them a qualified
name (<databasename>.<tablename>) in the script or use the DATABASE command to change the current database.

Where will you find these tables in the load script? The Logtable is generally identified immediately prior to the .LOGON command. Worktables and error tables can be named in the BEGIN MLOAD statement. Do not underestimate the value of these tables. They are vital to the operation of MultiLoad. Without them a MultiLoad job cannot run. Now that you have had the "executive summary", let's look at each type of table individually.

Two Error Tables: Here is another place where FastLoad and MultiLoad are similar. Both require the use of two error
tables per target table. MultiLoad will automatically create these tables. Rows are inserted into these tables only when
errors occur during the load process. The first error table is the acquisition Error Table (ET). It contains all translation and
constraint errors that may occur while the data is being acquired from the source(s).

The second is the Uniqueness Violation (UV) table that stores rows with duplicate values for Unique Primary Indexes
(UPI). Since a UPI must be unique, MultiLoad can only load one occurrence into a table. Any duplicate value will be stored
in the UV error table. For example, you might see a UPI error that shows a second employee number "99." In this case, if
the name for employee "99" is Kara Morgan, you will be glad that the row did not load since Kara Morgan is already in the
Employee table. However, if the name showed up as David Jackson, then you know that further investigation is needed,
because employee numbers must be unique.

Each error table does the following:

n Identifies errors

n Provides some detail about the errors

n Stores the actual offending row for debugging

You have the option to name these tables in the MultiLoad script (shown later). Alternatively, if you do not name them, they default to ET_<target_table_name> and UV_<target_table_name>. In either case, MultiLoad will not accept error table names that are the same as target table names. It does not matter what you name them, but we do recommend standardizing on a naming convention to make it easier for everyone on your team. For more details on how these error tables can help you, see the subsection in this chapter titled, "Troubleshooting MultiLoad Errors."

Log Table: MultiLoad requires a LOGTABLE. This table keeps a record of the results from each phase of the load so that
MultiLoad knows the proper point from which to RESTART. There is one LOGTABLE for each run. Since MultiLoad will not
resubmit a command that has been run previously, it will use the LOGTABLE to determine the last successfully completed
step.

Work Table(s): MultiLoad will automatically create one worktable for each target table. This means that in IMPORT mode
you could have one or more worktables. In the DELETE mode, you will only have one worktable since that mode only
works on one target table. The purpose of worktables is to hold two things:
1. The Data Manipulation Language (DML) tasks

2. The input data that is ready to APPLY to the AMPs

The worktables are created in a database using PERM space. They can become very large. If the script uses multiple SQL statements for a single data record, the data is sent to the AMP once for each SQL statement. This replication guarantees fast performance and that no SQL statement will ever be done more than once. So, this is very important. However, there is no such thing as a free lunch; the cost is space. Later, you will see that using a FILLER field can help reduce this disk space by not sending unneeded data to an AMP. In other words, the efficiency of the MultiLoad run is in your hands.

Supported Input Formats

Data input files come in a variety of formats but MultiLoad is flexible enough to handle many of them. MultiLoad supports
the following five format options: BINARY, FASTLOAD, TEXT, UNFORMAT and VARTEXT.

BINARY Each record is a 2-byte integer, n, that is followed by n bytes of data. A byte is the smallest unit of storage for Teradata.
FASTLOAD This format is the same as Binary, plus a marker (X '0A' or X '0D') that specifies the end of the record.
TEXT Each record has a random number of bytes and is followed by an end of the record marker.
UNFORMAT The format for these input records is defined in the LAYOUT statement of the MultiLoad script using the components
FIELD, FILLER and TABLE.
VARTEXT This is variable length text RECORD format separated by delimiters such as a comma. For this format you may only
use VARCHAR, LONG VARCHAR (IBM) or VARBYTE data formats in your MultiLoad LAYOUT. Note that two
delimiter characters in a row will result in a null value between them.

MultiLoad Has Five IMPORT Phases

MultiLoad IMPORT has five phases, but don't be fazed by this! Here is the short list:

n Phase 1: Preliminary Phase

n Phase 2: DML Transaction Phase

n Phase 3: Acquisition Phase

n Phase 4: Application Phase

n Phase 5: Cleanup Phase

Let's take a look at each phase and see what it contributes to the overall load process of this magnificent utility. Should you
memorize every detail about each phase? Probably not. But it is important to know the essence of each phase because
sometimes a load fails. When it does, you need to know in which phase it broke down since the method for fixing the error
to RESTART may vary depending on the phase. And if you can picture what MultiLoad actually does in each phase, you
will likely write better scripts that run more efficiently.
Phase 1: Preliminary Phase

The ancient oriental proverb says, "Measure one thousand times; Cut once." MultiLoad uses Phase 1 to conduct several
preliminary set-up activities whose goal is to provide a smooth and successful climate for running your load. The first task
is to be sure that the SQL syntax and MultiLoad commands are valid. After all, why try to run a script when the system will
just find out during the load process that the statements are not useable? MultiLoad knows that it is much better to identify
any syntax errors, right up front. All the preliminary steps are automated. No user intervention is required in this phase.

Second, all MultiLoad sessions with Teradata need to be established. The default is the number of available AMPs, which Teradata quickly establishes as the basis for the number of sessions to create. The general rule of thumb for the number of sessions to use on smaller systems is the following: use the number of AMPs plus two more. For larger systems with hundreds of AMP processors, the SESSIONS option is available to lower the default. Remember, these sessions are running on your poor little computer as well as on Teradata.

Each session loads the data to Teradata across the network or channel. Every AMP plays an essential role in the
MultiLoad process. They receive the data blocks, hash each row and send the rows to the correct AMP. When the rows
come to an AMP, it stores them in worktable blocks on disk. But, lest we get ahead of ourselves, suffice it to say that there
is ample reason for multiple sessions to be established.

What about the extra two sessions? Well, the first one is a control session to handle the SQL and logging. The second is a
back up or alternate for logging. You may have to use some trial and error to find what works best on your system configuration. If you specify too few sessions it may impair performance and increase the time it takes to complete load
jobs. On the other hand, too many sessions will reduce the resources available for other important database activities.

Third, the required support tables are created. They are the following:

Type of Table Table Details


ERRORTABLES MultiLoad requires two error tables per target table. The first error table contains constraint violations, while the
second error table stores Unique Primary Index violations.
WORKTABLES Work Tables hold two things: the DML tasks requested and the input data that is ready to APPLY to the AMPs.
LOGTABLE The LOGTABLE keeps a record of the results from each phase of the load so that MultiLoad knows the proper
point from which to RESTART.

The final task of the Preliminary Phase is to apply utility locks to the target tables. Initially, access locks are placed on all target tables, allowing other users to read or write to the table for the time being. However, this lock does prevent a user from requesting an exclusive lock. Although these locks will still allow the MultiLoad user to drop the table, no one else may DROP or ALTER a target table while it is locked for loading. This leads us to Phase 2.
Phase 2: DML Transaction Phase

In Phase 2, all of the SQL Data Manipulation Language (DML) statements are sent ahead to Teradata. MultiLoad allows
the use of multiple DML functions. Teradata's Parsing Engine (PE) parses the DML and generates a step-by-step plan to
execute the request. This execution plan is then communicated to each AMP and stored in the appropriate worktable for
each target table. In other words, each AMP is going to work off the same page.

Later, during the Acquisition phase the actual input data will also be stored in the worktable so that it may be applied in
Phase 4, the Application Phase. Next, a match tag is assigned to each DML request that will match it with the appropriate
rows of input data. The match tags will not actually be used until the data has already been acquired and is about to be
applied to the worktable. This is somewhat like a student who receives a letter from the university in the summer that lists
his courses, professor's names, and classroom locations for the upcoming semester. The letter is a "match tag" for the
student to his school schedule, although it will not be used for several months. This matching tag for SQL and data is the
reason that the data is replicated for each SQL statement using the same data record.
Phase 3: Acquisition Phase

With the proper set-up complete and the PE's plan stored on each AMP, MultiLoad is now ready to receive the INPUT
data. This is where it gets interesting! MultiLoad now acquires the data in large, unsorted 64K blocks from the host and
sends it to the AMPs.

At this point, Teradata does not care about which AMP receives the data block. The blocks are simply sent, one after the other, to the next AMP in line. For its part, each AMP begins to deal with the blocks it has been dealt. It is like a game of cards - you take the cards that you have received and then play the game. You want to keep some and give some away.

Similarly, the AMPs will keep some data rows from the blocks and give some away. The AMP hashes each row on the primary index and sends it over the BYNET to the proper AMP where it will ultimately be used. But the row does not get inserted into its target table just yet. The receiving AMP must first do some preparation before that happens. Don't you have to get ready before company arrives at your house? The AMP puts all of the hashed rows it has received from other AMPs into the worktables, where it associates them with the proper SQL. Why? Because once the rows are reblocked, they can be sorted into the proper order for storage in the target table. Now the utility places a load lock on each target table in preparation for the Application Phase. Of course, there is no Acquisition Phase when you perform a MultiLoad DELETE task, since no data is being acquired.
Phase 4: Application Phase

The purpose of this phase is to write, or APPLY, the specified changes to both the target tables and NUSI subtables. Once the data is on the AMPs, it is married up to the SQL for execution. To accomplish this substitution of data into SQL, when sending the data, the host has already attached some sequence information and five (5) match tags to each data row. Those match tags are used to join the data with the proper SQL statement based on the SQL statement within a DML label. In addition to associating each row with the correct DML statement, match tags also guarantee that no row will be updated more than once, even when a RESTART occurs.


The following five columns are the matching tags:

MATCHING TAGS
ImportSeq Sequence number that identifies the IMPORT command where the error occurred
DMLSeq Sequence number for the DML statement involved with the error
SMTSeq Sequence number of the statement within the DML that was being carried out when the error was discovered
ApplySeq Sequence number that tells which APPLY clause was running when the error occurred
SourceSeq The number of the data row in the client file that was being built when the error took place

Remember, MultiLoad allows for the existence of NUSI processing during a load. Every hash-sequence sorted block from
Phase 3 and each block of the base table is read only once to reduce I/O operations to gain speed. Then, all matching
rows in the base block are inserted, updated or deleted before the entire block is written back to disk, one time. This is why
the match tags are so important. Changes are made based upon corresponding data and DML (SQL) based on the match
tags. They guarantee that the correct operation is performed for the rows and blocks with no duplicate operations, a block
at a time. And each time a table block is written to disk successfully, a record is inserted into the LOGTABLE. This permits
MultiLoad to avoid starting again from the very beginning if a RESTART is needed.

What happens when several tables are being updated simultaneously? In this case, all of the updates are scripted as a
multi-statement request. That means that Teradata views them as a single transaction. If there is a failure at any point of
the load process, MultiLoad will merely need to be RESTARTed from the point where it failed. No rollback is required. Any
errors will be written to the proper error table.
Phase 5: Clean Up Phase

Those of you reading these paragraphs who have young children or teenagers will certainly appreciate this final phase! MultiLoad actually cleans up after itself. The utility looks at the final Error Code (&SYSRC). MultiLoad believes the adage,
"All is well that ends well." If the last error code is zero (0), all of the job steps have ended successfully (i.e., all has
certainly ended well). This being the case, all empty error tables, worktables and the log table are dropped. All locks, both
Teradata and MultiLoad, are released. The statistics for the job are generated for output (SYSPRINT) and the system
count variables are set. After this, each MultiLoad session is logged off. So what happens if the final error code is not zero?
Stay tuned. Restarting MultiLoad is a topic that will be covered later in this chapter.

MultiLoad Commands
Two Types of Commands

You may see two types of commands in MultiLoad scripts: tasks and support functions. MultiLoad tasks are commands that
are used by the MultiLoad utility for specific individual steps as it processes a load. Support functions are those commands
that involve the Teradata utility Support Environment (covered in Chapter 9), are used to set parameters, or are helpful for
monitoring a load.

The chart below lists the key commands, their type, and what they do.

MLOAD Command (Type): What the command does

.BEGIN [IMPORT] MLOAD / .BEGIN [DELETE] MLOAD (Support): This command communicates directly with Teradata to specify whether the MultiLoad mode is going to be IMPORT or DELETE. Note that the word IMPORT is optional in the syntax because it is the DEFAULT, but DELETE is required. We recommend using the word IMPORT to make the coding consistent and easier for others to read. Any parameters for the load, such as error limits or checkpoints, will be included under the .BEGIN command, too. It is important to know which commands or parameters are optional since, if you do not include them, MultiLoad may supply defaults that may impact your load.

.DML LABEL (Task): The DML LABEL defines treatment options and labels for the application (APPLY) of data for the INSERT, UPDATE, UPSERT and DELETE operations. A LABEL is simply a name for a requested SQL activity. The LABEL is defined first, and then referenced later in the APPLY clause.

.END MLOAD (Task): This instructs MultiLoad to finish the APPLY operations with the changes to the designated databases and tables.

.FIELD (Task): This defines a column of the data source record that will be sent to the Teradata database via SQL. When writing the script, you must include a FIELD for each data field you need in SQL. This command is used with the LAYOUT command.

.FILLER (Task): Do not assume that MultiLoad has somehow uncovered much of what you used in your term papers at the university! FILLER defines a field that is accounted for as part of the data source's row format, but is not sent to the Teradata DBS. It is used with the LAYOUT command.

.LAYOUT (Task): LAYOUT defines the format of the INPUT DATA record so Teradata knows what to expect. If one record is not large enough, you can concatenate multiple data records by using the LAYOUT parameter CONTINUEIF to tell which value triggers the concatenation. Another option is INDICATORS, which is used to represent nulls by using a bitmap (1 bit per field) at the front of the data record.

.LOGON (Support): This specifies the username or LOGON string that will establish sessions for MultiLoad with Teradata.

.LOGTABLE (Support): This support command names the Restart Log that will be used for storing CHECKPOINT data pertaining to a load. The LOGTABLE is then used to tell MultiLoad where to RESTART, should that be necessary. It is recommended that this command be placed before the .LOGON command.

.LOGOFF (Support): This command terminates any sessions established by the LOGON command.

.IMPORT (Task): This command defines the INPUT DATA FILE, file type, file usage, the LAYOUT to use and where to APPLY the data to SQL.

.SET (Support): Optionally, you can SET utility variables. An example would be {.SET DBName TO 'CDW_Test'}.

.SYSTEM (Support): This interrupts the operation of MultiLoad in order to issue commands to the local operating system.

.TABLE (Task): This is a command that may be used with the .LAYOUT command. It identifies a table whose columns (both their order and data types) are to be used as the field names and data descriptions of the data source records.

Parameters for .BEGIN IMPORT MLOAD

Here is a list of components or parameters that may be used in the .BEGIN IMPORT command. Note: The parameters do
not require the usual dot prior to the command since they are actually sub-commands.

REQUIRED
PARAMETER OR NOT WHAT IT DOES
AMPCHECK Optional NONE specifies that MLOAD starts even with one down AMP per cluster if all tables
{NONE|APPLY|ALL} are Fallback. APPLY (DEFAULT) specifies MLOAD will not start or finish Phase 4
with a down AMP.
ALL specifies not to proceed if any AMPs are down, just like FastLoad.
AXSMOD Optional Short for Access Module, this command specifies input protocol like OLE-DB or
reading a tape from REEL Librarian. This parameter is for network-attached systems
only. When used, it must precede the DEFINE command in the script.
CHECKPOINT Optional You have two options: CHECKPOINT refers to the number of minutes, or frequency,
at which you wish a CHECKPOINT to occur if the number is 60 or less. If the number is
greater than 60, it designates the number of rows at which you want the
CHECKPOINT to occur. This command is NOT valid in DELETE mode.
ERRLIMIT errcount Optional You may specify the maximum number of errors, or the percentage, that you will
[errpercent] tolerate during the processing of a load job.

ERRORTABLES Optional Names the two error tables, two per target table. Note there is no comma separator.
ET_ERR UV_ERR

NOTIFY {LOW|MEDIUM|HIGH|OFF} Optional If you opt to use NOTIFY for any event during a load, you may designate the priority of that notification: LOW for low-level events, MEDIUM for important events, HIGH for events at operational decision points, and OFF to eliminate any notification at all for a given phase.


SESSIONS <MAX> <MIN> Optional This refers to the number of SESSIONS that should be established with Teradata. For
MultiLoad, the optimal number of sessions is the number of AMPs in the system, plus
two more.
You can also use MAX or MIN, which automatically use the maximum or minimum
number of sessions to complete the job. If you specify nothing, it will default to MAX.
SLEEP Optional Tells MultiLoad how frequently, in minutes, to try logging on to the system.

TABLES Tablename1, Tablename2..., Tablename5 Required Names up to 5 target tables.

TENACITY Optional Tells MultiLoad how many hours to try logging on when its initial effort to do so is
rebuffed.
WORKTABLES Tablename1, Tablename2..., Tablename5 Optional Names the worktable(s), one per target table.

Parameters for .BEGIN DELETE MLOAD

Here is a list of components or parameters that may be used in the BEGIN DELETE command. Note: The parameters do
not require the usual dot prior to the command since parameters are actually sub-commands.

REQUIRED OR
PARAMETER NOT WHAT IT DOES
TABLES Tablename1 Required Names the Target table.

WORKTABLES Tablename1 Optional Names the worktable one per target table.

ERRORTABLES Optional Names the two error tables, two per target table and there is no comma
ET_ERR UV_ERR separator between them.

TENACITY Optional Tells MultiLoad how many hours to try establishing sessions when its initial effort
to do so is rebuffed.

A Simple MultiLoad IMPORT Script

"We must use time as a tool, not as a crutch."


– John F. Kennedy

Ask not what your Multiload can do for you; ask what you can do for your Multiload. Multiload is a great tool when you're short on time. Multiload can update, insert, delete or upsert on Teradata tables that are already populated. It can even do all four in one script. Our flat file will contain Employee_numbers and Salaries * 2. We are giving a big raise. We're going to create a flat file to use with Multiload, as shown below:

Let's create a flat file for our Multiload


Let's Execute it:

Remember, we'll still use the BTEQ utility to create our flat file.
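A minimal sketch of what that BTEQ export might look like follows; the logon string, output path and CAST lengths are assumptions:

   .LOGON demo/SQL01,SQL01;

   .EXPORT REPORT FILE = c:\temp\double_salary.txt

   SELECT TRIM(CAST(Employee_No AS VARCHAR(11))) || ',' ||
          TRIM(CAST(Salary * 2  AS VARCHAR(12))) (TITLE '')
   FROM   SQL01.Employee_Table02;

   .EXPORT RESET
   .LOGOFF;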

Building our Multiload Script

"I can accept failure, but I can't accept not trying."


-Michael Jordan

Getting these scripts down is a very hard process, so don't be discouraged if you make a couple of mistakes. The next two slides will show you a blank copy of the basic Multiload script, as well as a marked slide illustrating the important parts of the script:


Building Our Multiload Script

Creating our Multiload script
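As a reference point, here is a minimal sketch of the kind of IMPORT script those slides contain; the logtable name, logon string and input file path are assumptions:

   .LOGTABLE SQL01.Raise_Log;
   .LOGON demo/SQL01,SQL01;

   .BEGIN IMPORT MLOAD TABLES SQL01.Employee_Table02;

   .LAYOUT RAISEIN;
   .FIELD Employee_No * VARCHAR(11);
   .FIELD New_Salary  * VARCHAR(12);

   .DML LABEL RAISES;
   UPDATE SQL01.Employee_Table02
   SET    Salary      = :New_Salary
   WHERE  Employee_No = :Employee_No;

   .IMPORT INFILE c:\temp\double_salary.txt
      FORMAT VARTEXT ','
      LAYOUT RAISEIN
      APPLY RAISES;

   .END MLOAD;
   .LOGOFF;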

Executing Multiload


"Ambition is a dream with a V8 Engine."


– Elvis Presley

You will feel like the King after executing your first Multiload script. Multiload is the Elvis Presley of data warehousing because nobody makes more records than Multiload. If you have the ambition to learn, this book will give you what it takes to steer through these utilities. We initialize the Multiload utility like we do with BTEQ, except that the keyword for Multiload is mload. Remember that this Multiload is going to double the salaries of our employees.
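On a network-attached client, the invocation might look like the following; the script and output paths are assumptions:

   mload < c:\temp\Multiload_First_Script.txt > c:\temp\Multiload_Output.txt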

Let's execute our Multiload script

Here is a before and after image of our Employee_table02:

Another Simple MultiLoad IMPORT Script

MultiLoad can be somewhat intimidating to the new user because there are many commands and phases. In reality, the
load scripts are understandable when you think through what the IMPORT mode does:

n Setting up a Logtable

n Logging onto Teradata

n Identifying the Target, Work and Error tables

n Defining the INPUT flat file

n Defining the DML activities to occur


n Naming the IMPORT file

n Telling MultiLoad to use a particular LAYOUT

n Telling the system to start loading

n Finishing loading and logging off of Teradata

This first script example is designed to show MultiLoad IMPORT in its simplest form. It depicts the loading of a three-column Employee table. The script is shown below with our comments alongside each step. Below the script is a step-by-step description of how this script works.

/* Simple Mload script */

.LOGTABLE SQL01.CDW_Log;           /* Sets up a Logtable and logs on to Teradata */
.LOGON TDATA/SQL01,SQL01;

.BEGIN IMPORT MLOAD TABLES         /* Begins the load process by naming the target */
   SQL01.Employee_Dept1            /* table, worktable and error tables; notice    */
   WORKTABLES SQL01.CDW_WT         /* there is NO comma between the error tables   */
   ERRORTABLES SQL01.CDW_ET
      SQL01.CDW_UV;

.LAYOUT FILEIN;                    /* Names the LAYOUT of the INPUT record and        */
.FIELD  Employee_No * CHAR(11);    /* defines its structure; notice the dots before   */
.FIELD  Last_Name   * CHAR(20);    /* FIELD and FILLER and the semi-colons after each */
.FILLER Junk_stuff  * CHAR(100);   /* definition                                      */
.FIELD  Dept_No     * CHAR(6);

.DML LABEL INSERTS;                /* Names the DML label */

INSERT INTO SQL01.Employee_Dept1   /* Tells MultiLoad to INSERT a row into the target */
(Employee_No                       /* table and defines the row format; lists, in     */
,Last_Name                         /* order, the VALUES (each one preceded by a       */
,Dept_No)                          /* colon) to be INSERTed                           */
VALUES
(:Employee_No
,:Last_Name
,:Dept_No);

.IMPORT INFILE CDW_Join_Export.txt /* Names the import file and its format type;  */
   FORMAT TEXT                     /* cites the LAYOUT to use and tells MultiLoad */
   LAYOUT FILEIN                   /* to APPLY the INSERTs                        */
   APPLY INSERTS;

.END MLOAD;                        /* Ends MultiLoad and logs off all MultiLoad sessions */
.LOGOFF;

Step One: Setting up a Logtable and Logging onto Teradata — MultiLoad requires you specify a log table right at the
outset with the .LOGTABLE command. We have called it CDW_Log. Once you name the Logtable, it will be automatically
created for you. The Logtable may be placed in the same database as the target table, or it may be placed in another
database. Immediately after this you log onto Teradata using the .LOGON command. The order of these two commands is
interchangeable, but it is recommended to define the Logtable first and then to Log on, second. If you reverse the
order, Teradata will give a warning message. Notice that the commands in MultiLoad require a dot in front of the command keyword.

Step Two: Identifying the Target, Work and Error tables — In this step of the script you must tell Teradata which
tables to use. To do this, you use the .BEGIN IMPORT MLOAD command. Then you will preface the names of these tables
with the sub-commands TABLES, WORKTABLES and ERRORTABLES. All you must do is name the tables and specify
what database they are in. Work tables and error tables are created automatically for you. Keep in mind that you get to
name and locate these tables. If you do not do this, Teradata might supply some defaults of its own!

At the same time, these names are optional. If the WORKTABLES and ERRORTABLES had not specifically been named,
the script would still execute and build these tables. They would have been built in the default database for the user. The
name of the worktable would be WT_EMPLOYEE_DEPT1 and the two error tables would be called
ET_EMPLOYEE_DEPT1 and UV_EMPLOYEE_DEPT1, respectively.

Sometimes, large Teradata systems have a work database with a lot of extra PERM space. One customer calls this
database CORP_WORK. This is where all of the logtables and worktables are normally created. You can use a
DATABASE command to point all table creations to it or qualify the names of these tables individually.

Step Three: Defining the INPUT flat file record structure — MultiLoad is going to need to know the structure of the
INPUT flat file. Use the .LAYOUT command to name the layout. Then list the fields and their data types used in your SQL
as a .FIELD. Did you notice that an asterisk is placed between the column name and its data type? This means to
automatically calculate the next byte in the record. It is used to designate the starting location for this data based on the
previous field's length. If you are listing fields in order and need to skip a few bytes in the record, you can either use
the .FILLER (like above) to position the cursor at the next field, or the "*" on the Dept_No field could have been replaced
with the number 132 (CHAR(11)+CHAR(20)+CHAR(100)+1). Then, the .FILLER is not needed. Also, if the input record
fields are exactly the same as the table, the .TABLE command can be used to automatically define all the .FIELDs for you. The
LAYOUT name will be referenced later in the .IMPORT command. If the input file is created with INDICATORS, it is
specified in the LAYOUT.
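To illustrate the explicit-position alternative just described, the same layout could be written without the .FILLER by giving Dept_No a starting position of 132. This is a sketch using the field names from the script above:

.LAYOUT FILEIN;
.FIELD Employee_No * CHAR(11); /* starts at byte 1 */
.FIELD Last_Name * CHAR(20); /* starts at byte 12 */
.FIELD Dept_No 132 CHAR(6); /* CHAR(11)+CHAR(20)+CHAR(100)+1 = byte 132 */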

Step Four: Defining the DML activities to occur — The .DML LABEL names and defines the SQL that is to execute. It
is like setting up executable code in a programming language, but using SQL. In our example, MultiLoad is being told to
INSERT a row into the SQL01.Employee_Dept table. The VALUES come from the data in each FIELD because it is
preceded by a colon (:). Are you allowed to use multiple labels in a script? Sure! But remember this: Every label must be
referenced in an APPLY clause of the .IMPORT clause.

Step Five: Naming the INPUT file and its format type — This step is vital! Using the .IMPORT command, we have
identified the INFILE data as being contained in a file called "CDW_Join_Export.txt". Then we list the FORMAT type as
TEXT. Next, we referenced the LAYOUT named FILEIN to describe the fields in the record. Finally, we told MultiLoad to
APPLY the DML LABEL called INSERTS — that is, to INSERT the data rows into the target table. This is still a sub-
component of the .IMPORT MLOAD command. If the script is to run on a mainframe, the INFILE name is actually the name
of a JCL Data Definition (DD) statement that contains the real name of the file.

Notice that the .IMPORT goes on for 4 lines of information. This is possible because it continues until it finds the semi-
colon that defines the end of the command. This is how MultiLoad tells one operation from another. The semi-colon is
therefore very important; without it, MultiLoad would have attempted to process the .END MLOAD as part of the
.IMPORT, and the job would fail.

Step Six: Finishing loading and logging off of Teradata — This is the closing ceremony for the load. MultiLoad wraps
things up, closes the curtains, and logs off of the Teradata system.

Important note: Since the script above in Figure 5-6 does not DROP any tables, it is completely capable of being restarted
if an error occurs. Compare this to the next script in Figure 5-7. Do you think it is restartable? If you said no, pat yourself on
the back.

Error Treatment Options for the .DML LABEL Command

MultiLoad allows you to tailor how it deals with different types of errors that it encounters during the load process, to fit
your needs. Here is a summary of the options available to you:

ERROR TREATMENT OPTIONS FOR .DML LABEL


.DML LABEL {labelname}

{MARK | IGNORE} DUPLICATE [INSERT | UPDATE] ROWS

{MARK | IGNORE} MISSING [INSERT | UPDATE] ROWS

DO INSERT FOR [MISSING UPDATE] ROWS ;

In IMPORT mode, you may specify as many as five distinct error-treatment options for one .DML statement. For
example, if there is more than one instance of a row, do you want MultiLoad to IGNORE the duplicate row, or to MARK it
(list it) in an error table? If you do not specify IGNORE, then MultiLoad will MARK, or record, all of the errors. Imagine you
have a standard INSERT load that you know will end up recording about 20,000 duplicate row errors. Using the
syntax "IGNORE DUPLICATE INSERT ROWS;" will keep them out of the error table (a minimal label sketch follows the
list below). By ignoring those errors, you gain three benefits:
1. You do not need to see all the errors.

2. The error table is not filled up needlessly.

3. MultiLoad runs much faster since it is not conducting a duplicate row check.
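In context, the option goes between the label name and its SQL. Here is a minimal sketch based on the INSERT label from the earlier script:

.DML LABEL INSERTS
IGNORE DUPLICATE INSERT ROWS;

INSERT INTO SQL01.Employee_Dept1
(Employee_No
,Last_Name
,Dept_No)
VALUES
(:Employee_No
,:Last_Name
,:Dept_No);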

When doing an UPSERT, there are two rules to remember:

n MARK is the default for all operations except UPSERT, where the default is IGNORE MISSING UPDATE ROWS.
When doing an UPSERT, you anticipate that some rows are missing (otherwise, why do an UPSERT?), so this default
keeps those rows out of your error table.

n The DO INSERT FOR MISSING UPDATE ROWS is mandatory. This tells MultiLoad to insert a row from the data
source if that row does not exist in the target table because the update didn't find it.

The table that follows shows you, in more detail, how flexible your options are:

ERROR TREATMENT OPTIONS IN DETAIL


.DML LABEL OPTION: WHAT IT DOES

MARK DUPLICATE INSERT ROWS: This option logs an entry for all duplicate INSERT rows in the UV_ERR table. Use this when you want to know about the duplicates.
IGNORE DUPLICATE INSERT ROWS: This tells MultiLoad to IGNORE duplicate INSERT rows because you do not want to see them.
MARK DUPLICATE UPDATE ROWS: This logs the existence of every duplicate UPDATE row.
IGNORE DUPLICATE UPDATE ROWS: This eliminates the listing of duplicate UPDATE row errors.
MARK MISSING UPDATE ROWS: This option ensures a listing of data rows that had to be INSERTed since there was no row to UPDATE.
IGNORE MISSING UPDATE ROWS: This tells MultiLoad NOT to list missing UPDATE rows as an error. This is a good option when doing an UPSERT, since the UPSERT will INSERT a new row.
MARK MISSING DELETE ROWS: This option makes a note in the ET_Error Table that a row to be deleted is missing.
IGNORE MISSING DELETE ROWS: This option says, "Do not tell me that a row to be deleted is missing."
DO INSERT for MISSING UPDATE ROWS: This is required to accomplish an UPSERT. It tells MultiLoad that if the row to be updated does not exist in the target table, then INSERT the entire row from the data source.

An UPSERT Sample Script

The following sample script is provided to demonstrate how to do an UPSERT — that is, to update a table and, if a row from
the data source table does not exist in the target table, then insert a new row. In this instance we are loading the
Student_Profile table with new data for the next semester. The clause "DO INSERT FOR MISSING UPDATE ROWS"
indicates an UPSERT. The DML statements that follow this option must be in the order of a single UPDATE statement
followed by a single INSERT statement.

/* !/bin/ksh* */ Load Runs from a shell script; Any words between /* … */ are
/* +++++++++++++++++++++++++++++++++++++ */ comments only and are not processed by Teradata;
/* MultiLoad UPSERT SCRIPT */
Names and describes the purpose of the script; names the author.
/*This script Updates the Student_Profile Table */
/* with new data and Inserts a new row into the table */
/* if the row to be updated does not exist. */

/* Version 1.1 */
/* Created by Coffing Data Warehousing */
/* ++++++++++++++++++++++++++++++++++++++ */

/* Setup Logtable, Logon Statements*/ Sets Up a Logtable and then logs on to Teradata.

.LOGTABLE SQL01.CDW_Log;
.LOGON CDW/SQL01,SQL01;

/* Begin Import and Define Work and Error Tables */ Begins the Load Process by telling us first the names of the target
table, work table and error tables.
.BEGIN IMPORT MLOAD TABLES
SQL01.Student_Profile
WORKTABLES SQL01.SWA_WT
ERRORTABLES SQL01.SWA_ET
SQL01.SWA_UV;

/* Define Layout of Input File */ Names the LAYOUT of the INPUT file;
An ALL CHARACTER based flat file.
.LAYOUT FILEIN;
.FIELD Student_ID * INTEGER; Defines the structure of the INPUT file; Notice the dots before the
.FIELD Last_Name * CHAR (20); FIELD command and the semi-colons after each FIELD definition;
.FIELD First_Name * VARCHAR (12);
.FIELD Class_Code * CHAR (2);
.FIELD Grade_Pt * DECIMAL(5,2);

/* Begin INSERT and UPDATE Process on Table */ Names the DML Label
Tells MultiLoad to INSERT a row if there is not one to be
.DML LABEL UPSERTER UPDATED, i.e., UPSERT.
DO INSERT FOR MISSING UPDATE ROWS;
/* Without the above DO, one of these is guaranteed to
fail on this same table. If the UPDATE fails because
the row is missing, it corrects by doing the INSERT */

UPDATE SQL01.Student_Profile Defines the UPDATE.


SET Last_Name = :Last_Name Qualifies the UPDATE.
,First_Name = :First_Name
,Class_Code = :Class_Code
,Grade_Pt = :Grade_Pt
WHERE Student_ID = :Student_ID;

INSERT INTO SQL01.Student_Profile Defines the INSERT.


VALUES ( :Student_ID We recommend placing comma separators in front of the following
,:Last_Name column or value for easier debugging.
,:First_Name
,:Class_Code
,:Grade_Pt );

.IMPORT INFILE CDW_IMPORT.DAT Names the Import File and it names the Layout file to use and tells
LAYOUT FILEIN MultiLoad to APPLY the UPSERTs.
APPLY UPSERTER;

.END MLOAD; Ends MultiLoad and logs off of Teradata


.LOGOFF;

Troubleshooting MultiLoad Errors — More on the Error Tables

The output statistics in the above example indicate that the load was entirely successful. But that is not always the case.
Now we need to troubleshoot in order to identify the errors and correct them, if desired. Earlier on, we noted that MultiLoad
generates two error tables, the Acquisition error table and the Application error table. You may select from these tables to
discover the problem and research the issues.

For the most part, the Acquisition error table logs errors that occur during that processing phase. The Application error
table lists Unique Primary Index violations, field overflow errors on non-PI columns, and constraint errors that occur in the
APPLY phase. MultiLoad error tables not only list the errors they encounter, they also have the capability to STORE those
errors. Do you remember the MARK and IGNORE parameters? This is where they come into play. MARK will ensure that
the error rows, along with some details about the errors, are stored in the error table. IGNORE does neither; it is as if the
error never occurred.
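For example, after a failed run you might inspect the Acquisition error table created by the earlier script. A minimal sketch (the columns are described in the charts below):

SELECT ErrorCode, ErrorField
FROM SQL01.CDW_ET
ORDER BY ErrorCode;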

THREE COLUMNS SPECIFIC TO THE ACQUISITION ERROR TABLE

ErrorCode: System code that identifies the error.
ErrorField: Name of the column in the target table where the error happened; left blank if the offending column cannot be identified.
HostData: The data row that contains the error.

THREE COLUMNS SPECIFIC TO THE APPLICATION ERROR TABLE

Uniqueness: Contains a value that disallows duplicate row errors in this table; can be ignored, if desired.
DBCErrorCode: System code that identifies the error.
DBCErrorField: Name of the column in the target table where the error happened; left blank if the offending column cannot be identified. NOTE: A copy of the target table column immediately follows this column.

RESTARTing MultiLoad

Who hasn't experienced a failure at some time when attempting a load? Don't take it personally! Failures can and do occur
on the host or Teradata (DBC) for many reasons. MultiLoad has the impressive ability to RESTART from failures in either
environment. In fact, it requires almost no effort to continue or resubmit the load job. Here are the factors that determine
how it works:
First, MultiLoad will check the Restart Logtable and automatically resume the load process from the last successful
CHECKPOINT before the failure occurred. Remember, the Logtable is essential for restarts. MultiLoad uses neither the
Transient Journal nor rollbacks during a failure. That is why you must designate a Logtable at the beginning of your
script. MultiLoad either restarts by itself or waits for the user to resubmit the job. Then MultiLoad takes over right where
it left off.

Second, suppose Teradata experiences a reset while MultiLoad is running. In this case, the host program will restart
MultiLoad after Teradata is back up and running. You do not have to do a thing!

Third, if a host mainframe or network client fails during a MultiLoad, or the job is aborted, you may simply resubmit the
script without changing a thing. MultiLoad will find out where it stopped and start again from that very spot.

Fourth, if MultiLoad halts during the Application Phase it must be resubmitted and allowed to run until complete.

Fifth, during the Acquisition Phase the CHECKPOINT (n) you stipulated in the .BEGIN MLOAD clause will be enacted.
The results are stored in the Logtable. During the Application Phase, CHECKPOINTs are logged each time a data
block is successfully written to its target table.

HINT: The default number for CHECKPOINT is 15 minutes, but if you specify the CHECKPOINT as 60 or less, minutes
are assumed. If you specify the checkpoint at 61 or above, the number of records is assumed.
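A brief sketch of the two interpretations, using the target table from the earlier example (other .BEGIN options omitted):

.BEGIN IMPORT MLOAD TABLES SQL01.Employee_Dept1
CHECKPOINT 30; /* 60 or less, so minutes: a checkpoint every 30 minutes */

.BEGIN IMPORT MLOAD TABLES SQL01.Employee_Dept1
CHECKPOINT 50000; /* 61 or above, so records: a checkpoint every 50,000 records */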

RELEASE MLOAD: When You DON'T Want to Restart MultiLoad

What if a failure occurs but you do not want to RESTART MultiLoad? Since MultiLoad has already updated the table
headers, it assumes that it still "owns" them. Therefore, it limits access to the table(s). So what is a user to do? Well,
there is good news and bad news. The good news is that you may use the RELEASE MLOAD command to release the
locks and roll back the job. The bad news is that if you have been loading multiple millions of rows, the rollback may take a
lot of time. For this reason, most customers would rather just go ahead and RESTART.

Before V2R3: In the earlier days of Teradata it was NOT possible to use RELEASE MLOAD if one of the following three
conditions was true:

n In IMPORT mode, once MultiLoad had reached the end of the Acquisition Phase you could not use RELEASE MLOAD.
This is sometimes referred to as the "point of no return."

n In DELETE mode, the point of no return was when Teradata received the DELETE statement.

n If the job halted in the Apply Phase, you had to RESTART the job.

With and since V2R3: The advent of V2R3 brought new possibilities with regard to using the RELEASE MLOAD
command. It can NOW be used in the APPLY Phase, if:

n You are running a Teradata V2R3 or later version

n You use the correct syntax:


RELEASE MLOAD <target-table> IN APPLY

n The load script has NOT been modified in any way

n The target tables either:

¡ Must be empty, or

¡ Must have no Fallback, no NUSIs, no Permanent Journals

You should be very cautious using the RELEASE command. It could potentially leave your table half updated. Therefore, it
is handy for a test environment, but please don't get too reliant on it for production runs. Production runs should be allowed
to finish to guarantee data integrity.
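Using the target table from the earlier script, the two forms might look like this (a sketch; the IN APPLY form is subject to the conditions above):

RELEASE MLOAD SQL01.Employee_Dept1; /* before the point of no return */
RELEASE MLOAD SQL01.Employee_Dept1 IN APPLY; /* V2R3 and later only */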

An Introduction to TPump

The chemistry of relationships is very interesting. Frederick Buechner once stated, "My assumption is that the story of any
one of us is in some measure the story of us all." In this chapter, you will find that TPump has similarities with the rest of
the family of Teradata utilities. But this newer utility has been designed with fewer limitations and many distinguishing
abilities that the other load utilities do not have.

Do you remember the first Swiss Army™ knife you ever owned? Aside from its original intent as a compact survival tool,
this knife has thrilled generations with its multiple capabilities. TPump is the Swiss Army™ knife of the Teradata load
utilities. Just as this knife was designed for small tasks, TPump was developed to handle batch loads with low volumes.
And, just as the Swiss Army™ knife easily fits in your pocket when you are loaded down with gear, TPump is a perfect fit
when you have a large, busy system with few resources to spare. Let's look in more detail at the many facets of this
amazing load tool.
Why It Is Called "TPump"

TPump is the shortened name for the load utility Teradata Parallel Data Pump. To understand this, you must know how
the load utilities move the data. Both FastLoad and MultiLoad assemble massive volumes of data rows into 64K blocks and
then move those blocks. Picture in your mind the way that huge ice blocks used to be floated down long rivers to large
cities prior to the advent of refrigeration. There they were cut up and distributed to the people. TPump does NOT move
data in large blocks. Instead, it loads data one row at a time, using row hash locks. Because it locks at this level,
and not at the table level like MultiLoad, TPump can make many simultaneous, or concurrent, updates on a table.

Envision TPump as the water pump on a well. Pumping in a very slow, gentle manner results in a steady trickle of water
that could be pumped into a cup. But strong and steady pumping results in a powerful stream of water that would require a
larger container. TPump is a data pump which, like the water pump, may allow either a trickle-feed of data to flow into the
warehouse or a strong and steady stream. In essence, you may "throttle" the flow of data based upon your system and
business user requirements. Remember, TPump is THE PUMP!

TPump Has Many Unbelievable Abilities

Just in Time: Transactional systems, such as those implemented for ATMs or Point-of-Sale terminals, are known
for their tremendous speed in executing transactions. But how soon can you get the information pertaining to that
transaction into the data warehouse? Can you afford to wait until a nightly batch load? If not, then TPump may be the utility
that you are looking for! TPump allows the user to accomplish near real-time updates from source systems into the
Teradata data warehouse.

Throttle-switch Capability: What about the throttle capability that was mentioned above? With TPump you may stipulate
how many updates may occur per minute. This is also called the statement rate. In fact, you may change the statement rate
during the job, "throttling up" the rate with a higher number, or "throttling down" the number of updates with a lower
one. For example, you might want to throttle up the rate during the period from 12:00 noon to 1:30
PM, when most of the users have gone to lunch. You could then lower the rate when they return and begin running their
business queries. This way, you need not have such clearly defined load windows as the other utilities require. You can
have TPump running in the background all the time, and just control its flow rate.

DML Functions: Like MultiLoad, TPump does DML functions, including INSERT, UPDATE and DELETE. These can be
run solo, or in combination with one another. Note that it also supports UPSERTs like MultiLoad. But here is one place that
TPump differs vastly from the other utilities: FastLoad can only load one table and MultiLoad can load up to five tables.
But, when it pulls data from a single source, TPump can load more than 60 tables at a time! And the number of concurrent
instances in such situations is unlimited. That's right, not 15, but unlimited as far as Teradata is concerned. Well, OK,
maybe limited by your computer; I cannot imagine my laptop running 20 TPump jobs, but Teradata does not care.

How could you use this ability? Well, imagine partitioning a huge table horizontally into multiple smaller tables and then
performing various DML functions on all of them in parallel. Keep in mind that TPump places no limit on the number of jobs
that may be established. Now, think of ways you might use this ability in your data warehouse environment. The
possibilities are endless.

More benefits: Just when you think you have pulled out all of the options on a Swiss Army™ knife, there always seems
to be just one more blade or tool you had not noticed. Similar to the knife, TPump always seems to have another
advantage in its list of capabilities. Here are several that relate to TPump requirements for target tables. TPump allows
both Unique and Non-Unique Secondary Indexes (USIs and NUSIs), unlike FastLoad, which allows neither, and MultiLoad,
which allows just NUSIs. Like MultiLoad, TPump allows the target tables to either be empty or to be populated with data
rows. Tables allowing duplicate rows (MULTISET tables) are allowed. Besides this, Referential Integrity is allowed and
need not be dropped. As to the existence of Triggers, TPump says, "No problem!"

Support Environment compatibility: The Support Environment (SE) works in tandem with TPump to enable the
operator to have even more control in the TPump load environment. The SE coordinates TPump activities, assists in
managing the acquisition of files, and aids in the processing of conditions for loads. The Support Environment aids in the
execution of DML and DDL that occur in Teradata, outside of the load utility.

Stopping without Repercussions: Finally, this utility can be stopped at any time and all of the locks may be dropped with no
ill consequences. Is this too good to be true? Are there no limits to this load utility? TPump does not like to steal any
thunder from the other load utilities, but it just might become one of the most valuable survival tools for businesses in
today's data warehouse environment.

TPump Has Some Limits

TPump has rightfully earned its place as a superstar in the family of Teradata load utilities. But this does not mean that it
has no limits. It has a few that we will list here for you:
Rule #1: No concatenation of input data files is allowed. TPump is not designed to support this.

Rule #2: TPump will not process aggregates, arithmetic functions or exponentiation. If you need data
conversions or math, you might consider using an INMOD to prepare the data prior to loading it.

Rule #3: The use of the SELECT function is not allowed. You may not use SELECT in your SQL statements.

Rule #4: No more than four IMPORT commands may be used in a single load task. This means that at most,
four files can be directly read in a single run.

Rule #5: Dates before 1900 or after 1999 must be represented by the yyyy format for the year portion of the
date, not the default format of yy. This must be specified when you create the table (see the sketch after Rule #7). Any
dates using the default yy format for the year are taken to mean 20th century years.

Rule #6: On some network attached systems, the maximum file size when using TPump is 2GB. This is true
for a computer running under a 32-bit operating system.

Rule #7: TPump performance will be diminished if Access Logging is used. The reason for this is that TPump
uses normal SQL to accomplish its tasks. Besides the extra overhead incurred, if you use Access Logging for
successful table updates, then Teradata will make an entry in the Access Log table for each operation. This can cause
the potential for row hash conflicts between the Access Log and the target tables.
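Relating to Rule #5 above, the four-digit year is requested in the table definition. A minimal sketch with hypothetical names:

CREATE TABLE SQL01.Sales_History
( Sale_Id INTEGER
, Sale_Date DATE FORMAT 'yyyy-mm-dd' /* yyyy keeps dates outside 1900-1999 unambiguous */
, Amount DECIMAL(10,2))
UNIQUE PRIMARY INDEX (Sale_Id);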

Supported Input Formats

TPump, like MultiLoad, supports the following five format options: BINARY, FASTLOAD, TEXT, UNFORMAT and
VARTEXT. But TPump is quite finicky when it comes to data format errors. Such errors will generally cause TPump to
terminate. You have got to be careful! In fact, you may specify an Error Limit to keep TPump from terminating prematurely
when faced with a data format error. You can specify a number (n) of errors that are to be tolerated before TPump will halt.
Here is a data format chart for your reference:

BINARY: Each record is a 2-byte integer, n, followed by n bytes of data. A byte is the smallest address space you can have in Teradata.
FASTLOAD: This format is the same as BINARY, plus a marker (X'0A' or X'0D') that specifies the end of the record.
TEXT: Each record has a variable number of bytes and is followed by an end-of-record marker.
UNFORMAT: The format for these input records is defined in the LAYOUT statement of the script using the components FIELD, FILLER and TABLE.
VARTEXT: This is variable-length text RECORD format separated by delimiters such as a comma. For this format you may only use VARCHAR, LONG VARCHAR (IBM) or VARBYTE data formats in your LAYOUT. Note that two delimiter characters in a row denote a null value between them.

TPump Commands and Parameters

Each command in TPump must begin on a new line, preceded by a dot. It may utilize several lines, but must always end in
a semi-colon. Like MultiLoad, TPump makes use of several optional parameters in the .BEGIN LOAD command. Some are
the same ones used by MultiLoad. However, TPump has other parameters. Let's look at each group.

LOAD Parameters IN COMMON with MultiLoad

ERRLIMIT errcount [errpercent]: You may specify the maximum number of errors, or the percentage, that you will tolerate during the processing of a load job. The key point here is that you should set the ERRLIMIT to a number greater than the PACK number. The reason for this is that sometimes, if the PACK factor is a smaller number than the ERRLIMIT, the job will terminate, telling you that you have gone over the ERRLIMIT. When this happens, there will be no entries in the error tables.
CHECKPOINT (n): In TPump, the CHECKPOINT refers to the number of minutes, or frequency, at which you wish a checkpoint to occur. This is unlike MultiLoad, which allows either minutes or a number of rows.
SESSIONS (n): This refers to the number of SESSIONS that should be established with Teradata. TPump places no limit on the number of SESSIONS you may have. For TPump, the optimal number of sessions is dependent on your needs and your host computer (like a laptop).
TENACITY: Tells TPump how many hours to keep trying to log on when fewer than the requested number of sessions are available.
SLEEP: Tells TPump how frequently, in minutes, to try establishing additional sessions on the system.

.BEGIN LOAD Parameters UNIQUE to TPump

MACRODB <databasename>: This parameter identifies a database that will contain any macros utilized by TPump. Remember, TPump does not run the SQL statements by itself. It places them into macros and executes those macros for efficiency.
NOMONITOR: Use this parameter when you wish to keep TPump from checking either statement rates or update status information for the TPump Monitor application.
PACK (n): Use this to state the number of statements TPump will "pack" into a multiple-statement request. Multi-statement requests improve efficiency in either a network or channel environment because they use fewer sends and receives between the application and Teradata.
RATE: This refers to the Statement Rate. It shows the initial maximum number of statements that will be sent per minute. A zero means that the rate is unlimited. If the Statement Rate specified is less than the PACK number, then TPump will send requests that are smaller than the PACK number.
ROBUST ON/OFF: ROBUST defines how TPump will conduct a RESTART. ROBUST ON means that one row is written to the Logtable for every SQL transaction. The downside of running TPump in ROBUST mode is that it incurs additional, and possibly unneeded, overhead. ON is the default. If you specify ROBUST OFF, you are telling TPump to utilize "simple" RESTART logic: just start from the last successful CHECKPOINT. Be aware that if some statements are reprocessed, such as those processed after the last CHECKPOINT, then you may end up with extra rows in your error tables. Why? Because some of the statements in the original run may have found errors, in which case they would have recorded those errors in an error table.
SERIALIZE OFF/ON: You only use the SERIALIZE parameter when you are going to specify a PRIMARY KEY in the .FIELD command. For example, ".FIELD Salaryrate * DECIMAL KEY." If you specify SERIALIZE ON, TPump will ensure that all operations on a row occur serially. If you code "SERIALIZE" but do not specify ON or OFF, the default is ON. Otherwise, the default is OFF unless doing an UPSERT.
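A sketch of the KEY usage that SERIALIZE relies on; the field names here are hypothetical:

.BEGIN LOAD
SESSIONS 4
SERIALIZE ON; /* all operations on a given row occur in order */

.LAYOUT TXNREC;
.FIELD Acct_No * INTEGER KEY; /* KEY marks the serialization column */
.FIELD Tx_Amt * DECIMAL(8,2);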

TPump Exercise

"Don't use a big word where a diminutive one will suffice."


- Unknown

Don't use a big utility where TPump will suffice. TPump is great when you just want to trickle information into a table at all
times. Think of it as a water hose filling up a bucket. Instead of filling the bucket up a glass of water at a time (FastLoad), we
can just trickle the information in using a hose (TPump). The great thing about TPump is that, like a pump, we can trickle in
data or we can fire-hose it in. If users are not on the system, then we want to crank up the fire hose. If users are on the
system and many of them are accessing a table, we should trickle in the rows.

For our TPump exercise, let's create an empty table:
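The book shows this CREATE TABLE as a screenshot that is not reproduced here. A definition consistent with the Student_Names load script later in this chapter might look like the following (data types and index choice are assumptions):

CREATE TABLE SQL01.Student_Names
( Student_ID INTEGER
, Last_Name CHAR(20)
, First_Name VARCHAR(14))
UNIQUE PRIMARY INDEX (Student_ID);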

Now execute the script:

Creating a Flat File for our TPump Job to Utilize

"In order to be irreplaceable one must always be different."


– Coco Chanel

TPump is irreplaceable because no other utility works like it. TPump can also use flat files to populate a table. While the
script is somewhat different compared to other utilities, TPump's structure isn't completely foreign.

Let's create our flat file to populate our empty table:

Now we can use the flat file to populate our table:

Creating a TPump Script

"Acting is all about honesty. If you can fake that, you've got it made."
– George Burns

George Burns wasn't a big fan of Teradata, because there's no way that one could fake his/her way through a TPump
script. The following two pages will show a basic TPump script and point out the important parts of that script:

Let's create our TPump script:

Executing the TPump Script

"Unless you believe, you will not understand."


– Saint Augustine

After running through these utility exercises, Teradata is destined to make you, the reader, a believer. Utilities such as
TPump are very hard to grasp at first. But if you believe utilities work and continue to analyze them, enlightenment is just
around the corner.

Executing our new TPump script:

Let's check out our new table:

Much of the TPump command structure should look quite familiar to you. It is quite similar to MultiLoad. In this example, the
Student_Names table is being loaded with new data from the university's registrar. It will be used as an associative table
for linking various tables in the data warehouse.

/* This script inserts rows into a table called Sets Up a Logtable and then logs on with .RUN.
student_names from a single file */ The logon.txt file contains: .logon TDATA/SQL01,SQL01;.
.LOGTABLE WORK_DB.LOG_PUMP; Also specifies the database to find the necessary tables.
.RUN FILE C:\mydir\logon.txt;

DATABASE SQL01;

.BEGIN LOAD Begins the Load Process;


ERRLIMIT 5 Specifies optional parameters.
CHECKPOINT 1
SESSIONS 64
TENACITY 2
PACK 40
RATE 1000
ERRORTABLE SQL01.ERR_PUMP; Names the error table for this run.

.LAYOUT FILELAYOUT; Names the LAYOUT of the INPUT record;

.FIELD Student_ID * INTEGER; Notice the dots before the .FIELD and .FILLER commands and the semi-colons
.FIELD Last_Name * CHAR (20); after each FIELD definition. Also, the more_junk field moves the field pointer to
.FILLER More_Junk * CHAR (20); the start of the First_name data.
.FIELD First_Name * CHAR (14);
Notice the comment in the script.
/* start comment - this could also be coded as:
.FIELD Student_ID * INTEGER;
.FIELD Last_Name * CHAR (20);
.FIELD First_Name 45 CHAR (14);

end of the comment */

.DML LABEL INSREC; Names the DML Label


Tells TPump to INSERT a row into the target table and defines the row format;
INSERT INTO SQL01.Student_Names
( Student_ID Comma separators are placed in front of the following column or value for
,Last_Name easier debugging
,First_Name ) Lists, in order, the VALUES to be INSERTed. Colons precede VALUEs.

VALUES
( :Student_ID
,:Last_Name
,:First_Name );

.IMPORT INFILE CDW_import.txt Names the IMPORT file;


FORMAT TEXT Names the LAYOUT to be called from above; tells TPump which DML Label to
LAYOUT FILELAYOUT APPLY.
APPLY INSREC;

.END LOAD; Tells TPump to stop loading and logs off all sessions.
.LOGOFF;

Step One: Setting up a Logtable and Logging onto Teradata — First, you define the Logtable using the .LOGTABLE
command. We have named it LOG_PUMP in the WORK_DB database. The Logtable is automatically created for you. It
may be placed in any database by qualifying the table name with the name of the database by using syntax like this:
<databasename>.<tablename>

Next, the connection is made to Teradata. Notice that the commands in TPump, like those in MultiLoad, require a dot in
front of the command key word.

Step Two: Begin load process, add parameters, naming the Error Table — Here, the script reveals the parameters
requested by the user to assist in managing the load for smooth operation. It also names the one error table, calling it
SQL01.ERR_PUMP. Now let's look at each parameter:

n ERRLIMIT 5 says that the job should terminate after encountering five errors. You may set the limit that is tolerable for
the load.

n CHECKPOINT 1 tells TPump to pause and evaluate the progress of the load in increments of one minute.

n SESSIONS 64 tells TPump to establish 64 sessions with Teradata.

n TENACITY 2 says that if there is any problem establishing sessions, then to keep on trying for a period of two hours.

n PACK 40 tells TPump to "pack" 40 statements into each multiple-statement request and send them at one time.

n RATE 1000 means that 1,000 data rows will be sent per minute.

Step Three: Defining the INPUT flat file structure — TPump, like MultiLoad, needs to know the structure of the INPUT flat
file record. You use the .LAYOUT command to name the layout. Following that, you list the columns and data types of the
INPUT file using the .FIELD, .FILLER or .TABLE commands. Did you notice that an asterisk is placed between the column
name and its data type? This means to automatically calculate the next byte in the record. It is used to designate the
starting location for this data based on the previous field's length. If you are listing fields in order and need to skip a few
bytes in the record, you can either use the .FILLER with the correct number of bytes as CHAR to position the cursor
at the next field, or the "*" can be replaced by a number that equals the lengths of all previous fields added together plus 1
extra byte. When you use this technique, the .FILLER is not needed. In our example, this says to begin with Student_ID,
continue on to load Last_Name, and finish when First_Name is loaded.

Step Four: Defining the DML activities to occur — At this point, the .DML LABEL names and defines the SQL that is
to execute. It also names the columns receiving data and defines the sequence in which the VALUES are to be arranged.
In our example, TPump is to INSERT a row into the SQL01.Student_NAMES. The data values coming in from the record
are named in the VALUES with a colon prior to the name. This provides the PE with information on what substitution is to
take place in the SQL. Each LABEL used must also be referenced in an APPLY clause of the .IMPORT clause.

Step Five: Naming the INPUT file and defining its FORMAT — Using the .IMPORT INFILE command, we have
identified the INPUT data file as "CDW_import.txt". The file was created using the TEXT format.

Step Six: Associate the data with the description — Next, we told the IMPORT command to use the LAYOUT called,
"FILELAYOUT."

Step Seven: Telling TPump to start loading — Finally, we told TPump to APPLY the DML LABEL called INSREC —
that is, to INSERT the data rows into the target table.

Step Eight: Finishing loading and logging off of Teradata — The .END LOAD command tells TPump to finish the
load process. Finally, TPump logs off of the Teradata system.

TPump Script with Error Treatment Options

/* !/bin/ksh* */ Load with a Shell Script

/* ++++++++++++++++++++++++++++++++++ */ Names and describes the purpose of the script; names the author.
/* TPUMP SCRIPT - CDW */
/*This script loads SQL01.Student_Profile4 */
/* Version 1.1 */
/* Created by Coffing Data Warehousing */
/* ++++++++++++++++++++++++++++++++++ */

/* Setup the TPUMP Logtables, Logon Statements and Sets up a Logtable and then logs on to Teradata.
Database Default */ Specifies the database containing the table.

.LOGTABLE SQL01.LOG_PUMP;
.LOGON CDW/SQL01,SQL01;
DATABASE SQL01;

/* Begin Load and Define TPUMP Parameters and Error BEGINS THE LOAD PROCESS
Tables */ SPECIFIES MULTIPLE PARAMETERS TO AID IN PROCESS CONTROL
.BEGIN LOAD
ERRLIMIT 5 NAMES THE ERROR TABLE; TPump HAS ONLY ONE ERROR
CHECKPOINT 1 TABLE.
SESSIONS 1
TENACITY 2
PACK 40
RATE 1000
ERRORTABLE SQL01.ERR_PUMP;

.LAYOUT FILELAYOUT; Names the LAYOUT of the INPUT file.


.FIELD Student_ID * VARCHAR (11); Defines the structure of the INPUT file; here, all Variable CHARACTER
.FIELD Last_Name * VARCHAR (20); data and the file has a comma delimiter. See .IMPORT below for file
.FIELD First_Name * VARCHAR (14); type and the declaration of the delimiter.
.FIELD Class_Code * VARCHAR (2);
.FIELD Grade_Pt * VARCHAR (8);

.DML LABEL INSREC Names the DML Label;

IGNORE DUPLICATE ROWS SPECIFIES 3 ERROR TREATMENT OPTIONS with the ; after the last
IGNORE MISSING ROWS option.
IGNORE EXTRA ROWS;
Tells TPump to INSERT a row into the target table and defines the row
INSERT INTO Student_Profile4 format.
( Student_ID Note that we place comma separators in front of the following column
,Last_Name or value for easier debugging.
,First_Name Lists, in order, the VALUES to be INSERTed. A colon always precedes
,Class_Code values.
,Grade_Pt)
VALUES
( :Student_ID
,:Last_Name
,:First_Name
,:Class_Code
,:Grade_Pt);

.IMPORT INFILE Cdw_import.txt Names the IMPORT file;


FORMAT VARTEXT "," Names the LAYOUT to be called from above; Tells TPump which DML
LAYOUT FILELAYOUT Label to APPLY.
APPLY INSREC;
Notice the FORMAT with a comma in the quotes to define the delimiter
between fields in the input record.
.END LOAD; Tells TPump to stop loading and Logs Off all sessions.
.LOGOFF;

A TPump UPSERT Sample Script

/* this is an UPSERT TPump script */ Sets Up a Logtable and then logs on to Teradata.
.LOGTABLE SQL01.CDW_LOG;
.LOGON CDW/SQL01,SQL01;

.BEGIN LOAD Begins the load process


ERRLIMIT 5 Specifies multiple parameters to aid in load management
CHECKPOINT 10
SESSIONS 10 Names the error table; TPump HAS ONLY ONE ERROR TABLE PER TARGET
TENACITY 2 TABLE
PACK 10
RATE 10
ERRORTABLE SQL01.SWA_ET;

.LAYOUT INREC INDICATORS; Defines the LAYOUT for the 1st INPUT file; also has the indicators for NULL
.FIELD StudentID * INTEGER; data.
.FIELD Last_name * CHAR(20);
.FIELD First_name * VARCHAR(14);
.FIELD Class_code * CHAR(2);
.FIELD Grade_Pt * DECIMAL(8,2);

.DML LABEL UPSERTER Names the 1st DML Label and specifies the Error Treatment option for the UPSERT.
DO INSERT FOR MISSING UPDATE ROWS;
Tells TPump to INSERT a row into the target table and defines the row format.
UPDATE Student_Profile Lists, in order, the VALUES to be INSERTed. A colon always precedes values.
SET Last_Name = :Last_Name
,First_Name = :First_Name
,Class_Code = :Class_Code
,Grade_Pt = :Grade_Pt
WHERE Student_ID = :StudentID ;

INSERT INTO Student_Profile


VALUES (:StudentID
,:Last_Name
,:First_Name
,:Class_Code
,:Grade_Pt) ;

.IMPORT INFILE UPSERT-FILE.DAT Names the Import File as UPSERT-FILE.DAT. The file name is under Windows, so the "-" is fine.
FORMAT FASTLOAD The file type is FASTLOAD.
LAYOUT INREC
APPLY UPSERTER ;

.END LOAD; Tells TPump to stop loading and logs off all sessions.
.LOGOFF;

Monitoring TPump

TPump comes with a monitoring tool called the TPump Monitor. This tool allows you to check the status of TPump jobs as
they run and to change (remember "throttle up" and "throttle down?") the statement rate on the fly. Key to this monitor is the
"SysAdmin.TpumpStatusTbl" table in the Data Dictionary Directory. If your Database Administrator creates this table,
TPump will update it on a minute-by-minute basis when it is running. You may update the table to change the statement
rate for an IMPORT. If you want TPump to run unmonitored, then the table is not needed.

You can start a monitor program under UNIX with the following command:
tpumpmon [-h] [TDPID/] <UserName>,<Password> [,<AccountID>]

Below is a chart that shows the Views and Macros used to access the "SysAdmin.TpumpStatusTbl" table. Queries may be
written against the Views. The macros may be executed.

Views and Macros to access the table SysAdmin.TpumpStatusTbl


View SysAdmin.TPumpStatus
View SysAdmin.TPumpStatusX
Macro Sysadmin.TPumpUpdateSelect
Macro TPumpMacro.UserUpdateSelect
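For instance, a quick status check might query the restricted view. A minimal sketch (the column list varies by release, so all columns are returned):

SELECT *
FROM SysAdmin.TPumpStatusX;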

Handling Errors in TPump Using the Error Table

TPump uses just one Error Table per target table, not two. If you name the table, TPump will create it automatically. Entries are made to
these tables whenever errors occur during the load process. Like MultiLoad, TPump offers the option to either MARK
errors (include them in the error table) or IGNORE errors (pay no attention to them whatsoever). These options are listed in
the .DML LABEL sections of the script and apply ONLY to the DML functions in that LABEL. The default is to MARK. If you
specify nothing, TPump will assume the default. When doing an UPSERT, this default does not apply.

The error table does the following:

n Identifies errors

n Provides some detail about the errors

n Stores a portion of the actual offending row for debugging

When compared to the error tables in MultiLoad, the TPump error table is most similar to the MultiLoad Acquisition error
table. Like that table, it stores information about errors that take place while it is trying to acquire data. It is the errors that
occur when the data is being moved, such as data translation problems that TPump will want to report on. It will also want
to report any difficulties compiling valid Primary Indexes. Remember, TPump has less tolerance for errors than FastLoad or
MultiLoad.

COLUMNS IN THE TPUMP ERROR TABLE

ImportSeq: Sequence number that identifies the IMPORT command where the error occurred.
DMLSeq: Sequence number for the DML statement involved with the error.
SMTSeq: Sequence number of the DML statement being carried out when the error was discovered.
ApplySeq: Sequence number that tells which APPLY clause was running when the error occurred.
SourceSeq: The number of the data row in the client file that was being built when the error took place.
DataSeq: Identifies the INPUT data source where the error row came from.
ErrorCode: System code that identifies the error.
ErrorMsg: Generic description of the error.
ErrorField: Number of the column in the target table where the error happened; left blank if the offending column cannot be identified. This is different from MultiLoad, which supplies the column name.
HostData: The data row that contains the error, limited to the first 63,728 bytes related to the error.

RESTARTing TPump

Like the other utilities, a TPump script is fully restartable as long as the log table and error tables are not dropped. As
mentioned earlier, you have a choice of setting ROBUST either ON (the default) or OFF. There is more overhead using
ROBUST ON, but it provides a higher degree of data integrity at the cost of some performance.
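A sketch of turning ROBUST off in the .BEGIN LOAD, with the other parameters as in the earlier examples:

.BEGIN LOAD
ERRLIMIT 5
CHECKPOINT 1
SESSIONS 4
ROBUST OFF /* simple restart: resume from the last successful CHECKPOINT */
ERRORTABLE SQL01.ERR_PUMP;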

TPump and MultiLoad Comparison Chart

Function: MultiLoad / TPump

Error Tables must be defined: Optional, 2 per target table / Optional, 1 per target table
Work Tables must be defined: Optional, 1 per target table / No
Logtable must be defined: Yes / Yes
Allows Referential Integrity: No / Yes
Allows Unique Secondary Indexes: No / Yes
Allows Non-Unique Secondary Indexes: Yes / Yes
Allows Triggers: No / Yes
Loads a maximum of n number of tables: Five / 60
Maximum Concurrent Load Instances: 15 / Unlimited
Locks at this level: Table / Row Hash
DML Statements Supported: INSERT, UPDATE, DELETE, "UPSERT" / INSERT, UPDATE, DELETE, "UPSERTs"
How DML Statements are Performed: Runs actual DML commands / Compiles DML into MACROS and executes
DDL Statements Supported: All / All
Transfers data in 64K blocks: Yes / No, moves data at row level
RESTARTable: Yes / Yes
Stores UPI Violation Rows: Yes, with MARK option / Yes, with MARK option
Allows use of Aggregated, Arithmetic calculations or Conditional Exponentiation: No / No
Allows Data Conversion: Yes / Yes
Performance Improvement: As data volumes increase / By using multi-statement requests
Table Access During Load: Uses WRITE lock on tables in Application Phase / Allows simultaneous READ and WRITE access due to Row Hash Locking
Effects of Stopping the Load: Consequences / No repercussions
Resource Consumption: Hogs available resources / Allows consumption management via parameters
