Introduction
Release 5.4.2
B035-2305-106A
October 2016
The product or products described in this book are licensed products of Teradata Corporation or its affiliates.
Teradata, BYNET, DBC/1012, DecisionCast, DecisionFlow, DecisionPoint, Eye logo design, InfoWise, Meta Warehouse, MyCommerce,
SeeChain, SeeCommerce, SeeRisk, Teradata Warehouse Miner, Teradata Source Experts, WebAnalyst, and You've Never Seen Your Business
Like This Before are trademarks or registered trademarks of Teradata Corporation or its affiliates.
Adaptec and SCSISelect are trademarks or registered trademarks of Adaptec, Inc.
AMD Opteron and Opteron are trademarks of Advanced Micro Devices, Inc.
BakBone and NetVault are trademarks or registered trademarks of BakBone Software, Inc.
Cloudera and the Cloudera logo are trademarks of Cloudera, Inc.
This software contains material under license from DUNDAS SOFTWARE LTD., which is © 1994-1999 DUNDAS SOFTWARE LTD., all
rights reserved.
EMC, PowerPath, SRDF, and Symmetrix are registered trademarks of EMC Corporation.
GoldenGate is a trademark of GoldenGate Software, Inc.
Hewlett-Packard and HP are registered trademarks of Hewlett-Packard Company.
Hortonworks, the Hortonworks logo and other Hortonworks trademarks are trademarks of Hortonworks Inc. in the United States and other
countries.
Intel, Pentium, and XEON are registered trademarks of Intel Corporation.
IBM, CICS, DB2, MVS, RACF, Tivoli, and VM are registered trademarks of International Business Machines Corporation.
Linux is a registered trademark of Linus Torvalds.
LSI and Engenio are registered trademarks of LSI Corporation.
MapR, MapR Heatmap, Direct Access NFS, Distributed NameNode HA, Direct Shuffle and Lockless Storage Services are all trademarks of
MapR Technologies, Inc.
Microsoft, Active Directory, Windows, Windows NT, Windows Server, Windows Vista, Visual Studio and Excel are either registered trademarks
or trademarks of Microsoft Corporation in the United States or other countries.
MongoDB, Mongo, and the leaf logo are registered trademarks of MongoDB, Inc.
Novell and SUSE are registered trademarks of Novell, Inc., in the United States and other countries.
QLogic and SANbox are trademarks or registered trademarks of QLogic Corporation.
SAS, SAS/C and Enterprise Miner are trademarks or registered trademarks of SAS Institute Inc.
SPSS is a registered trademark of SPSS Inc.
STATISTICA and StatSoft are trademarks or registered trademarks of StatSoft, Inc.
SPARC is a registered trademark of SPARC International, Inc.
Sun Microsystems, Solaris, Sun, and Sun Java are trademarks or registered trademarks of Sun Microsystems, Inc., in the United States and
other countries.
Symantec, NetBackup, and VERITAS are trademarks or registered trademarks of Symantec Corporation or its affiliates in the United States
and other countries.
Unicode is a collective membership mark and a service mark of Unicode, Inc.
UNIX is a registered trademark of The Open Group in the United States and other countries.
Other product and company names mentioned herein may be the trademarks of their respective owners.
The information contained in this document may contain references or cross-references to features, functions, products, or services that are
not announced or available in your country. Such references do not imply that Teradata Corporation intends to announce such features, functions,
products, or services in your country. Please consult your local Teradata Corporation representative for those features, functions, products, or
services available in your country.
Information contained in this document may contain technical inaccuracies or typographical errors. Information may be changed or updated
without notice. Teradata Corporation may also make improvements or changes in the products or services described in this information at any
time without notice.
To maintain the quality of our products and services, we would like your comments on the accuracy, clarity, organization, and value of this
document. Please email: teradata-books@lists.teradata.com
Any comments or materials (collectively referred to as Feedback) sent to Teradata Corporation will be deemed non-confidential. Teradata
Corporation will have no obligation of any kind with respect to Feedback and will be free to use, reproduce, disclose, exhibit, display, transform,
create derivative works of, and distribute the Feedback and derivative works thereof without limitation on a royalty-free basis. Further, Teradata
Corporation will be free to use any ideas, concepts, know-how, or techniques contained in such Feedback for any purpose whatsoever, including
developing, manufacturing, or marketing products or services incorporating Feedback.
Copyright 1999-2016 by Teradata Corporation. All Rights Reserved.
Purpose
This introductory document provides the following information:
Limitations of the Express Edition
Overview of the TWM family of products
General installation and configuration instructions
Configuring the tutorial environment
Examples of using the product
Audience
This document is intended for professionals interested in evaluating Teradata Warehouse Miner
through the use of the Teradata Warehouse Miner Express product.
Revision Record
The following table lists a history of releases where this guide has been revised:
Convention Description
GUI Item A screen item, especially one that you click or highlight while
following a procedure.
Related Documents
Related Teradata documentation and other sources of information are available from:
http://www.info.teradata.com
Additional technical information on data warehousing and other topics is available from:
http://www.teradata.com/t/resources
Support Information
Services, support and training information is available from:
http://www.teradata.com/services-support
Preface
Purpose
Audience
Revision Record
How This Manual Is Organized
Conventions Used In This Manual
Related Documents
Support Information
Chapter 1: Introduction
Software Dependencies
Client Operating System
Supporting Client Software
Additional Client Software
Teradata Database Support
Installation Instructions
Configuration Instructions
Creating Tutorial User/Databases
Configuring a Data Source
Configuring Connection Properties
Creating Metadata Tables
Installing Tutorial Tables and Functions
Appendix A: References
Teradata Profiler
The first of the products that may be evaluated using the Express Edition of Teradata
Warehouse Miner is the Teradata Profiler. The components available in this offering were
developed to provide a comprehensive data profiling and exploration tool that analyzes data
directly within the data warehouse through the use of generated SQL. A wide variety of
descriptive statistics functions are available to generate reports and graphs with drill-down
capabilities, pointing out potential issues with data quality.
The highlight of this offering is the Data Explorer analysis that can perform the Values,
Frequency, Histogram and Statistics analyses on selected tables or columns, using multiple
threads of SQL operation and providing thumbnail graphs and drill-down capabilities.
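To give a rough sense of what profiling through generated SQL can look like, the following sketch builds Values-style and Frequency-style queries for one column and runs them inside the database. It is an illustration only: an in-memory SQLite database stands in for Teradata, the helper names (values_sql, frequency_sql, profile_column) and the sample table are hypothetical, and the SQL that Teradata Profiler actually generates is not shown in this document.

```python
import sqlite3


def values_sql(table: str, column: str) -> str:
    """Build a query that counts rows, NULLs, and distinct values (hypothetical helper)."""
    return (
        f"SELECT COUNT(*) AS row_cnt, "
        f"SUM(CASE WHEN {column} IS NULL THEN 1 ELSE 0 END) AS null_cnt, "
        f"COUNT(DISTINCT {column}) AS unique_cnt "
        f"FROM {table}"
    )


def frequency_sql(table: str, column: str) -> str:
    """Build a query that counts how often each value occurs (hypothetical helper)."""
    return (
        f"SELECT {column}, COUNT(*) AS freq FROM {table} "
        f"GROUP BY {column} ORDER BY freq DESC, {column}"
    )


def profile_column(conn: sqlite3.Connection, table: str, column: str) -> dict:
    """Run both generated queries in the database and collect the results."""
    row_cnt, null_cnt, unique_cnt = conn.execute(values_sql(table, column)).fetchone()
    frequency = conn.execute(frequency_sql(table, column)).fetchall()
    return {
        "row_cnt": row_cnt,
        "null_cnt": null_cnt,
        "unique_cnt": unique_cnt,
        "frequency": frequency,
    }


def demo() -> dict:
    """Profile the city_name column of a small made-up customer table."""
    conn = sqlite3.connect(":memory:")  # SQLite is only a stand-in for Teradata
    conn.execute("CREATE TABLE customer (cust_id INTEGER, city_name TEXT)")
    conn.executemany(
        "INSERT INTO customer VALUES (?, ?)",
        [(1, "San Diego"), (2, "San Diego"), (3, "Los Angeles"), (4, None)],
    )
    return profile_column(conn, "customer", "city_name")


if __name__ == "__main__":
    print(demo())
```

Only the small summary result sets leave the database in this sketch, which is the general idea behind analyzing data where it is stored rather than extracting it first.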
Software Dependencies
Installation Instructions
Teradata Warehouse Miner Express is installed by opening the supplied TWM_Express.msi file
and following the prompts from the installation dialog. The following points should be
considered before installing the software.
Prior to installing Teradata Warehouse Miner Express, any previous versions of TWM,
Profiler or ADS Generator must first be removed.
When installing Teradata Warehouse Miner or any Teradata Tools and Utilities (TTU)
component, reboot whenever an individual installation program prompts you to. Do
not delay the reboot until all the software has been installed.
Configuration Instructions
There are two ways to configure Teradata Warehouse Miner Express. One is to configure it to
access the tutorial environment, and the other is to configure it to access other data in your
Teradata system. You can configure it both ways, but only configuration for the tutorial
environment is described here.
Configuring Teradata Warehouse Miner Express for the tutorial environment makes it
possible to use the tutorial projects that come with the product and that match the examples
in the online help system. Importing these projects is described later in this section.
Performing these operations requires that the Teradata FastLoad and BTEQ utilities be
installed on the workstation where Teradata Warehouse Miner Express is installed (see
Additional Client Software for details).
Performing these operations also requires an entry in the hosts file on the client machine. (To
add an entry, you must have administrative privileges.) The hosts file is located in the folder
C:\WINDOWS\system32\drivers\etc. For example, if the IP address is 127.0.0.1 and the system
name is dbc, the host entry should be:
127.0.0.1 dbc dbccop1
(The three items in the line above should be separated by a tab character.)
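As a small aid for checking this step, the sketch below formats a hosts entry in the tab-separated form shown above and looks up a name in hosts-file text. The function names are hypothetical, and the rule that the third item is the system name followed by cop1 is inferred from the single dbc example in this section, so treat it as an assumption for other system names.

```python
from typing import Optional


def format_hosts_entry(ip_address: str, system_name: str) -> str:
    """Return a hosts line with three tab-separated items.

    Assumption: the third item is the system name plus "cop1",
    as in the dbc / dbccop1 example in this section.
    """
    return "\t".join([ip_address, system_name, f"{system_name}cop1"])


def find_host_ip(hosts_text: str, name: str) -> Optional[str]:
    """Return the IP address mapped to name in hosts-file text, or None."""
    for raw_line in hosts_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        fields = line.split()  # fields may be separated by tabs or spaces
        if name in fields[1:]:
            return fields[0]
    return None


if __name__ == "__main__":
    sample = "# sample hosts content\n" + format_hosts_entry("127.0.0.1", "dbc") + "\n"
    print(repr(format_hosts_entry("127.0.0.1", "dbc")))
    print(find_host_ip(sample, "dbccop1"))
```

To check a real workstation, the text passed to find_host_ip would be read from the hosts file in the folder named above. The lookup here is case-sensitive, which is a simplification.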
To install the tutorial tables in the twm_source database on host dbc, perform the following.
(Note that this operation requires the Teradata FastLoad utility.)
Execute the program item Start > Programs > Teradata Warehouse Miner 5.4.2 > Load
Demonstration Data.
Hostname: dbc
Userid: twm
Password: twm
Account:
Database: twm_source
Char Set:
To install the Statistical Test tables in the twm database on host dbc, perform the following.
(Note that this operation requires the Teradata FastLoad utility.)
Execute the program item Start > Programs > Teradata Warehouse Miner 5.4.2 > Load Statistical
Test Metadata.
Hostname: dbc
Userid: twm
Password: twm
Account:
Database: twm
Char Set:
To install the PMML User Defined Functions (UDFs) in the twm database on host dbc,
perform the following. (Note that this operation requires the Teradata BTEQ utility and that
the target Teradata system has a suitable C compiler.)
Execute the program item Start > Programs > Teradata Warehouse Miner 5.4.2 > PMML UDF
Creation.
Hostname: dbc
Userid: twm
Password: twm
Database: twm
Account:
Char Set:
This chapter provides examples of using Teradata Warehouse Miner functions. The examples
include:
1 Getting Started with Teradata Warehouse Miner
2 Exploring Data with a Data Explorer Analysis
3 Creating an Analytic Data Set
4 Creating and Scoring a Decision Tree model
Additional information about these and many other features can be found in the applicable
user guide and in the help system. Note that context-sensitive help is also available by pressing
the F1 key.
There are three windows on the main screen, the largest of which is for viewing and editing
analysis forms. On the right is the Project Explorer window where open projects and the
analyses they contain are displayed in a tree view. Underneath both of these areas is the
Execution Status window. Directly over the analysis work area is a toolbar with icons for
primary functions (the names of which can be seen by hovering over them), and over that is a
series of menu topics, including File, View, Project, Tools, Window and Help.
In the sample screen above, the Open Connection icon has been selected to connect to data
source dbc twm, and the Add New Analysis icon has been selected to select Data Explorer from
the Descriptive Statistics category.
Now looking at the Data Explorer input form covering most of the main screen, selectors can
be seen on the left side of the form for selecting databases, tables and columns, and on the
right an area to drag selected columns into. (The arrow buttons in the middle can also be used
to select and de-select columns.)
Over the selectors are tabs for INPUT, OUTPUT and RESULTS, with sub-tabs that depend on
the type of analysis. After the parameters for an analysis have been specified, the analysis can
be executed by clicking the run button above, by right-clicking the project or analysis in the
project work area and selecting run, or by pressing the F5 key on the keyboard. The status of
the execution will be displayed in the Execution Status window below. When execution is
complete, the RESULTS tab will be enabled, and upon selection, the resulting data, graphs and
generated SQL (depending on analysis type) can be viewed.
Graph
The following is a snapshot of the icon displayed when the graph tab is selected.
Figure 4: Graph snapshot
Clicking anywhere in this picture displays the actual graph object.
Figure 5: Graph
Clicking on the city_name thumbnail graph (6th from the left in the second row) leads
to the following display, while clicking on the bar for San Diego adds the drill-down
box to the display. Clicking the drill-down button displays the customers in San
Diego.
6 Because there may be negative values, drag and drop an Absolute Value (Arithmetic) SQL
Element over both interest_amt and principal_amt:
Figure 9: Absolute Value (Arithmetic)
7 Take the average of this expression by dragging and dropping an Average (Aggregation)
on top of the Add:
Figure 10: Average (Aggregation)
8 Because this analysis may generate many NULL values by joining TWM_CUSTOMER to
TWM_CREDIT_TRAN, drag a Coalesce (Case) on top of the Average:
Figure 11: Coalesce (Case)
9 Drag and drop a Number (Literal) 0 into the expressions folder and rename it from
Variable1 to avg_cc_tran_amt to complete the variable:
Figure 12: avg_cc_tran_amt
12 Go to OUTPUT-storage, and select Store the tabular output of this analysis in the database.
Specify that a Table should be created named twm_tutorials_vc1.
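The expression assembled in steps 6 through 9 corresponds roughly to COALESCE(AVG(ABS(interest_amt) + ABS(principal_amt)), 0). The sketch below evaluates an expression of that shape against an in-memory SQLite database so the effect of each element can be checked. Several things here are assumptions rather than facts from this guide: SQLite stands in for Teradata, the join column cust_id and the sample rows are hypothetical, the join is written as a left outer join because step 8 mentions NULL values, and the SQL that Teradata Warehouse Miner actually generates is not shown in this document.

```python
import sqlite3

# Hypothetical SQL with the same shape as the variable built in steps 6-9.
AVG_CC_TRAN_AMT_SQL = """
SELECT c.cust_id,
       COALESCE(AVG(ABS(t.interest_amt) + ABS(t.principal_amt)), 0)
           AS avg_cc_tran_amt
FROM twm_customer AS c
LEFT JOIN twm_credit_tran AS t
       ON t.cust_id = c.cust_id
GROUP BY c.cust_id
ORDER BY c.cust_id
"""


def build_demo_db() -> sqlite3.Connection:
    """Create tiny stand-in tables; customer 2 has no credit transactions."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE twm_customer (cust_id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE twm_credit_tran "
        "(cust_id INTEGER, interest_amt REAL, principal_amt REAL)"
    )
    conn.executemany("INSERT INTO twm_customer VALUES (?)", [(1,), (2,)])
    conn.executemany(
        "INSERT INTO twm_credit_tran VALUES (?, ?, ?)",
        [(1, -10.0, 90.0), (1, 5.0, -45.0)],  # negative amounts on purpose
    )
    return conn


def avg_cc_tran_amt(conn: sqlite3.Connection) -> dict:
    """Return a mapping of cust_id to avg_cc_tran_amt."""
    return dict(conn.execute(AVG_CC_TRAN_AMT_SQL).fetchall())


if __name__ == "__main__":
    print(avg_cc_tran_amt(build_demo_db()))
```

In this sample, the absolute values keep the negative amounts from cancelling the positive ones, and the customer without transactions receives 0 instead of NULL, which is the purpose of the Coalesce element and the literal 0.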
city_name, state_code
female, single
married, separated
ckacct, svacct
avg_ck_bal, avg_sv_bal
avg_ck_tran_amt, avg_ck_tran_cnt
avg_sv_tran_amt, avg_sv_tran_cnt
Tree Splitting: Gain Ratio
Minimum Split Count: 2
Maximum Nodes: 1000
Maximum Depth: 10
Bin Numeric Variables: Disabled
Pruning Method: Gain Ratio
Include Lift Table: Enabled
Response Value: 1
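For readers unfamiliar with the Gain Ratio criterion named in these options, the following sketch computes it in its commonly used form: information gain divided by the split information of the candidate variable. This is a generic illustration with hypothetical function names and made-up data. How Teradata Warehouse Miner computes the measure internally is not described in this document.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy (base 2) of the value distribution."""
    total = len(values)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(values).values()
    )


def gain_ratio(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Information gain of splitting on feature, divided by split information."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    # Weighted entropy of the labels after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    split_info = entropy(feature)
    # A variable with a single value cannot split the data.
    return info_gain / split_info if split_info > 0 else 0.0


if __name__ == "__main__":
    ccacct = [1, 1, 0, 0]            # made-up dependent values
    ckacct = [1, 1, 0, 0]            # separates the classes perfectly
    female = [1, 0, 1, 0]            # carries no information here
    print(gain_ratio(ckacct, ccacct), gain_ratio(female, ccacct))
```

Dividing by the split information penalizes variables with many distinct values, which would otherwise tend to look attractive on information gain alone.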
Run the analysis and click on Results when it completes. For this example, the Decision Tree
analysis generated the following pages.
Variables
Dependent Variable
ccacct
Independent Variable
income
ckacct
Independent Variable
avg_sv_bal
avg_sv_tran_cnt
Confusion Matrix
Decile, Count, Response, Response (%), Captured Response (%), Lift, Cumulative Response, Cumulative Response (%), Cumulative Captured Response (%), Cumulative Lift
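The lift table columns above can be illustrated with a short sketch that ranks scored rows by confidence, splits them into deciles, and derives each measure. The definitions used are the common ones (response rate within the decile, share of all responders captured, and lift as the decile response rate divided by the overall response rate) and are an assumption here, since this document does not state the product's exact formulas. The function name and sample data are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def lift_table(scored: Sequence[Tuple[float, int]], bins: int = 10) -> List[Dict[str, float]]:
    """Build lift table rows from (score, actual) pairs.

    actual is 1 when the row has the response value, otherwise 0.
    """
    ranked = sorted(scored, key=lambda row: row[0], reverse=True)
    n = len(ranked)
    total_resp = sum(actual for _, actual in ranked)
    if n == 0 or total_resp == 0:
        raise ValueError("need at least one row and one responder")
    overall_rate = total_resp / n
    rows = []
    cum_count = cum_resp = 0
    for b in range(bins):
        chunk = ranked[b * n // bins:(b + 1) * n // bins]
        if not chunk:
            continue  # fewer rows than bins
        count = len(chunk)
        resp = sum(actual for _, actual in chunk)
        cum_count += count
        cum_resp += resp
        rows.append({
            "decile": b + 1,
            "count": count,
            "response": resp,
            "response_pct": 100.0 * resp / count,
            "captured_response_pct": 100.0 * resp / total_resp,
            "lift": (resp / count) / overall_rate,
            "cum_response": cum_resp,
            "cum_response_pct": 100.0 * cum_resp / cum_count,
            "cum_captured_response_pct": 100.0 * cum_resp / total_resp,
            "cum_lift": (cum_resp / cum_count) / overall_rate,
        })
    return rows


if __name__ == "__main__":
    # Made-up scores: the four highest-scored rows are the responders.
    sample = [(float(s), 1 if s > 16 else 0) for s in range(1, 21)]
    for row in lift_table(sample):
        print(row)
```

With this definition, the cumulative lift of the last decile is always 1, because the cumulative response rate over all rows equals the overall response rate.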
Graphs
By default the Tree Browser is displayed as follows:
Select the Text Tree tab to view the rules in textual format:
Figure 16: Text Tree tab
Additionally, you can click on Lift Chart to view the Lift Table graphically.
Figure 17: Lift Chart tab
Confusion Matrix
Data
Table 8: Data
1362480 1 0.92
1362481 0 0
1362484 1 0.92
1362485 0 0
1362486 1 0.92
Lift Graph
In this case, the Lift Graph is the same as when the Decision Tree model was built. (Note that
the Lift Graph is available only when the Evaluate or Score and Evaluate option is selected.)
1 Teradata Warehouse Miner Model Manager User Guide, B035-2303-106A, October 2016
2 Teradata Warehouse Miner Release Definition, B035-2494-106C, October 2016
3 Teradata Warehouse Miner User Guide, Volume 1, Introduction and Profiling,
B035-2300-106A, October 2016
4 Teradata Warehouse Miner User Guide, Volume 2, ADS Generation, B035-2301-106A,
October 2016
5 Teradata Warehouse Miner User Guide, Volume 3, Analytic Functions, B035-2302-106A,
October 2016