
What would you do if you knew?
Teradata QueryGrid
Teradata and Hortonworks Hadoop

Installation Guide
Release 14.10, 15.00, 15.01
B035-5989-125K
March 2016
The product or products described in this book are licensed products of Teradata Corporation or its affiliates.
Teradata, Active Data Warehousing, Active Enterprise Intelligence, Applications-Within, Aprimo Marketing Studio, Aster, BYNET, Claraview,
DecisionCast, Gridscale, MyCommerce, QueryGrid, SQL-MapReduce, Teradata Decision Experts, "Teradata Labs" logo, Teradata
ServiceConnect, Teradata Source Experts, WebAnalyst, and Xkoto are trademarks or registered trademarks of Teradata Corporation or its
affiliates in the United States and other countries.
Adaptec and SCSISelect are trademarks or registered trademarks of Adaptec, Inc.
AMD Opteron and Opteron are trademarks of Advanced Micro Devices, Inc.
Apache, Apache Avro, Apache Hadoop, Apache Hive, Hadoop, and the yellow elephant logo are either registered trademarks or trademarks of the
Apache Software Foundation in the United States and/or other countries.
Apple, Mac, and OS X all are registered trademarks of Apple Inc.
Axeda is a registered trademark of Axeda Corporation. Axeda Agents, Axeda Applications, Axeda Policy Manager, Axeda Enterprise, Axeda
Access, Axeda Software Management, Axeda Service, Axeda ServiceLink, and Firewall-Friendly are trademarks and Maximum Results and
Maximum Support are servicemarks of Axeda Corporation.
Data Domain, EMC, PowerPath, SRDF, and Symmetrix are registered trademarks of EMC Corporation.
GoldenGate is a trademark of Oracle.
Hewlett-Packard and HP are registered trademarks of Hewlett-Packard Company.
Hortonworks, the Hortonworks logo and other Hortonworks trademarks are trademarks of Hortonworks Inc. in the United States and other
countries.
Intel, Pentium, and XEON are registered trademarks of Intel Corporation.
IBM, CICS, RACF, Tivoli, and z/OS are registered trademarks of International Business Machines Corporation.
Linux is a registered trademark of Linus Torvalds.
LSI is a registered trademark of LSI Corporation.
Microsoft, Active Directory, Windows, Windows NT, and Windows Server are registered trademarks of Microsoft Corporation in the United
States and other countries.
NetVault is a trademark or registered trademark of Dell Inc. in the United States and/or other countries.
Novell and SUSE are registered trademarks of Novell, Inc., in the United States and other countries.
Oracle, Java, and Solaris are registered trademarks of Oracle and/or its affiliates.
QLogic and SANbox are trademarks or registered trademarks of QLogic Corporation.
Quantum and the Quantum logo are trademarks of Quantum Corporation, registered in the U.S.A. and other countries.
Red Hat is a trademark of Red Hat, Inc., registered in the U.S. and other countries. Used under license.
SAP is the trademark or registered trademark of SAP AG in Germany and in several other countries.
SAS and SAS/C are trademarks or registered trademarks of SAS Institute Inc.
Simba, the Simba logo, SimbaEngine, SimbaEngine C/S, SimbaExpress and SimbaLib are registered trademarks of Simba Technologies Inc.
SPARC is a registered trademark of SPARC International, Inc.
Symantec, NetBackup, and VERITAS are trademarks or registered trademarks of Symantec Corporation or its affiliates in the United States and
other countries.
Unicode is a registered trademark of Unicode, Inc. in the United States and other countries.
UNIX is a registered trademark of The Open Group in the United States and other countries.
Other product and company names mentioned herein may be the trademarks of their respective owners.
The information contained in this document is provided on an "as-is" basis, without warranty of any kind, either express or implied,
including the implied warranties of merchantability, fitness for a particular purpose, or non-infringement. Some jurisdictions do not allow
the exclusion of implied warranties, so the above exclusion may not apply to you. In no event will Teradata Corporation be liable for any
indirect, direct, special, incidental, or consequential damages, including lost profits or lost savings, even if expressly advised of the
possibility of such damages.
The information contained in this document may contain references or cross-references to features, functions, products, or services that are not
announced or available in your country. Such references do not imply that Teradata Corporation intends to announce such features, functions,
products, or services in your country. Please consult your local Teradata Corporation representative for those features, functions, products, or
services available in your country.
Information contained in this document may contain technical inaccuracies or typographical errors. Information may be changed or updated
without notice. Teradata Corporation may also make improvements or changes in the products or services described in this information at any
time without notice.
To maintain the quality of our products and services, we would like your comments on the accuracy, clarity, organization, and value of this
document. Please e-mail: teradata-books@lists.teradata.com
Any comments or materials (collectively referred to as "Feedback") sent to Teradata Corporation will be deemed non-confidential. Teradata
Corporation will have no obligation of any kind with respect to Feedback and will be free to use, reproduce, disclose, exhibit, display, transform,
create derivative works of, and distribute the Feedback and derivative works thereof without limitation on a royalty-free basis. Further, Teradata
Corporation will be free to use any ideas, concepts, know-how, or techniques contained in such Feedback for any purpose whatsoever, including
developing, manufacturing, or marketing products or services incorporating Feedback.
Copyright 2015 - 2016 by Teradata. All Rights Reserved.
Table of Contents
Preface
Purpose
Audience
Revision History
Supported Releases
Additional Information
Related Documents
Related Links
Product Safety Information
Chapter 1: Overview
Teradata QueryGrid Description
Dependencies
Security
Chapter 2: Preparing for Installation
Pre-Installation Checklist
Obtaining a Change Control Number
Obtaining Required Patches
Acquiring Remote Proxy User Information
Identifying the HCatalog Server
Obtaining and Running the System Validation Script
Resolving Hostname Conflicts
Setting Up Hadoop Cluster for LDAP
Setting Up Kerberos
Port Requirements
Chapter 3: Installing Software
Adding Hadoop IP Addresses to the Teradata Host File
Configuring Hadoop for Teradata Proxy Setup
Configuring Teradata Proxy Setup with Hadoop 2.1 or Later Systems Using Ambari
Configuring Hadoop for Teradata Proxy Setup for Hadoop 1.3.2 Systems
Installing the Teradata QueryGrid Packages on the Teradata Nodes Using PUT
Installing the Teradata QueryGrid Package on the Hadoop Nodes
Configuring Kerberos Settings for Teradata QueryGrid
Running the Kerberos Setup Script
Teradata JVM Heap Size Configuration
Configuring JVM Heap Size for Teradata Database 15.0
Configuring JVM Heap Size for Teradata Database 14.10
Calculating JVM Heap Size Values
Chapter 4: Post Software Installation Activities
Validating the Installation
Validating the System
Appendix A: Manual Installation of Teradata QueryGrid Packages
Installing the Teradata QueryGrid Packages
Running the Setup Script
Preface
Purpose
This guide explains how to install Teradata QueryGrid.
Audience
This guide is intended for use by the following personnel:
System administrators
Database administrators
Hadoop administrators
Customers
Teradata Customer Support
Revision History
Date            Description
March 2016      Maintenance release
January 2016    Added related links information in Preface
                Added links to Knowledge Article on performing JVM and FSG cache memory calculations
December 2015   Added support for Teradata QueryGrid 15.01
October 2015    Added support for Teradata Open Distribution for Hadoop (TDH) 2.3 and Hortonworks Data Platform (HDP) 2.3
August 2015     Initial Release
Note: This book is a combination of the previously released Teradata QueryGrid Teradata
Database-to-Teradata Open Distribution for Hadoop and Teradata QueryGrid Teradata
Database-to-Hortonworks Data Platform books.

Supported Releases
For information on Teradata QueryGrid supported releases, see Knowledge Article
KAP314E23E, accessed through https://tays.teradata.com.
Additional Information
Related Documents
Documents are located at http://www.info.teradata.com.
Teradata QueryGrid: Teradata Database-to-Hadoop User Guide
Publication IDs: B035-1203 (Release 15.01), B035-1185 (Release 15.00)
Describes the Teradata QueryGrid: Teradata Database-to-Hadoop SQL interface for transferring data between Teradata Database and remote Hadoop hosts.
SQL Functions, Operators, Expressions, and Predicates
Publication ID: B035-1145 (Release 14.10)
The topic titled LOAD_FROM_HCATALOG describes use of Teradata QueryGrid 14.10 (referred to as Teradata SQL-H).
Parallel Upgrade Tool (PUT) Reference
Publication ID: B035-5716
Describes how to install application software using PUT.
Related Links
https://tays.teradata.com
Secure site for accessing Orange Books, technical alerts, and knowledge repositories; viewing and joining forums; and downloading software packages from Teradata Software Server (TSS).
http://www.info.teradata.com
External site for published Teradata customer documentation.
Product Safety Information

This document may contain information addressing product safety practices related to data
or property damage, identified by the word Notice. A notice indicates a situation which, if not
avoided, could result in damage to property, such as equipment or data, but not related to
personal injury.
Example
Notice: Improper use of the Reconfiguration utility can result in data loss.

CHAPTER 1
Overview
Teradata QueryGrid Description

Teradata QueryGrid is software that provides predefined table operators to access remote
data using SQL and join the remote data with Teradata Database tables.
Teradata QueryGrid releases before tdsqlh_td 15.01.00.xx imported data in parallel
from remote nodes into Teradata Database AMPs, and then converted the data from the
remote data types into corresponding Teradata Database data types for analysis.
Later releases of Teradata QueryGrid add support for data-type conversion on the remote
nodes, which then transmit the converted data back to Teradata Database for parallel
import. This approach, known as Enhanced Concurrency Architecture (ECA), provides for
better performance and higher query concurrency without the need for memory tuning in
Teradata Database.
Dependencies
The following minimum requirements must exist prior to installing Teradata QueryGrid:
Hardware: Network connectivity between the Teradata nodes and the Hadoop master node and all data nodes, through the customer LAN, BYNET, or InfiniBand.
Firmware: None
Packages:
tdsqlh, the license/base package
When the Teradata QueryGrid: Teradata Database-to-Hadoop license is purchased, IPP ships a distribution CD containing the base package (the license package) to the customer.
Note: tdsqlh must be installed before the other packages. Once installed, it does not need to be purchased or downloaded again for upgrades.
tdsqlh_td, the connector package
Package versions are intended to match specific Teradata Database versions. Versions of tdsqlh_td may be older than the current Teradata Database version to allow backward compatibility only. Later packages support older grammar. For example:
The tdsqlh_td_15.0.xx.xx package must be used with Teradata Database 15.00.
Teradata Database 14.10 grammar works with the tdsqlh_td_15.0.xx.xx package.
tdsqlh_hdp, the vendor package
The tdsqlh_hdp version is intended to match the similarly versioned vendor distribution. Multiple instances of tdsqlh_hdp may exist on the Teradata node simultaneously to address different systems at different Hadoop distribution versions.
Connector packages are available from Teradata Software Server (TSS), accessed by clicking Software Downloads at https://tays.teradata.com.
Note: For information on Teradata QueryGrid supported releases, see Knowledge Article KAP314E23E, accessed through https://tays.teradata.com.
Parallel Upgrade Tool (PUT): The most current version of PUT available, installed on the Teradata master node. Download PUT from Teradata Software Server, accessed by clicking Software Downloads at https://tays.teradata.com.
Kerberos: The most current version of Kerberos available.
Security
The physical security of data as it resides on disk or is transferred across the network is not
addressed by Teradata QueryGrid. Teradata QueryGrid does not support encryption across
networks.
Teradata QueryGrid 15.x security includes grammar that supports INSERT and SELECT
privileges on the foreign server. Granting EXECUTE privileges is not recommended for
Teradata QueryGrid 15.x.
Teradata QueryGrid 14.10 includes execution mapping security and user mapping
security:
Execution Mapping Security: The user can use any IP or host name to reach any remote destination. Only the Database Administrator can grant and revoke user execution privileges.
User Mapping Security: Limits the user to only reading data, preventing the user from
making changes to the accessed table.

CHAPTER 2
Preparing for Installation
Pre-Installation Checklist
Later versions of the tdsqlh_td connector support the syntax of earlier versions. However,
certain features and, therefore, installation tasks are version specific, as noted below and in
the corresponding topics of this documentation.
1. Confirm network connectivity is in place, consulting with a Solution Architect as
necessary.
2. Obtain the Teradata QueryGrid base package or media as directed by your sales
representative.
3. If using Kerberos, and the Kerberos Client is not installed on the Teradata Database
system, download krb5-client from Teradata Software Server (accessed by clicking
Software Downloads at https://tays.teradata.com), and use PUT to install it on the
Teradata master node.
Note: Teradata QueryGrid supports Kerberos starting with connector package
tdsqlh_td 15.00.03.xx.
4. Obtain a Change Control number.
5. Obtain the latest required patches.
6. Obtain remote proxy user information, consulting with a Solution Architect as
necessary.
Note: This task applies only to versions of the connector prior to tdsqlh_td
15.00.02.
7. Obtain the Teradata configuration for FSG cache, Java heap, and Perm space.
To do the memory calculations, see Knowledge Article KAC13BA1A, accessed through https://tays.teradata.com, or contact your Customer Support Representative.
8. Identify the HCatalog server.
9. Obtain and run the system validation script.
10. Resolve hostname conflicts.
11. If using LDAP, update the storage format in the LDAP directory. See Setting Up Hadoop Cluster for LDAP.
Note: Teradata QueryGrid supports LDAP starting with connector package tdsqlh_td
15.00.02.01 and through connector package tdsqlh_td 15.00.04.xx.
12. If using Kerberos, set up the Kerberos security feature. See Setting Up Kerberos.
Note: Teradata QueryGrid supports Kerberos starting with connector package
tdsqlh_td 15.00.03.xx.

Obtaining a Change Control Number

1 Open an incident through Teradata At Your Service at https://tays.teradata.com to obtain
a Teradata Change Control Number.
Change Control Numbers must be obtained at least 28 days prior to the installation or
upgrade date.
2 Record the Change Control Number for future use.
Obtaining Required Patches

Download the latest versions of the tdsqlh patches.
1 Log on to https://tays.teradata.com.
2 Click Software Downloads to access the Teradata Software Server (TSS).
3 Click Search.
4 In Patch Name, type tdsqlh.
5 Select Current.
6 Click Submit.
7 From the Search Results table, check the appropriate version.
8 Select a Download Type.
9 Complete the identification fields:
User Name (completed by default)
E-Mail (completed by default)
Site ID
Change Control Number
10 Click Submit.
11 Repeat these steps to download any additional tdsqlh patches.
Acquiring Remote Proxy User Information

This task applies only to versions of the connector prior to tdsqlh_td 15.00.02. The
Hadoop administrator usually performs this task.
For Teradata QueryGrid to work with a Hadoop system, a Teradata proxy user must be
configured on the Hadoop NameNode. This proxy user must be allowed to access HDFS
from the Teradata nodes on behalf of another Hadoop user in a secured way. The Teradata
proxy user for this setting is tdatuser.
1 Confirm the following:
The proxy user values have been provided.
The tdatuser user exists and its home directory is in /home.
For example: /home/tdatuser
The default shell for the user is set in /etc/passwd.
For example: /bin/bash
2 On the Hadoop side, the following configurations are required in core-site.xml to
add tdatuser as a trusted proxy user:
a Determine the file system groups that tdatuser may impersonate.
b Determine the hosts and nodes from which tdatuser may access HDFS.
These configurations must be present; otherwise, impersonation is not allowed and Teradata queries fail with a security error.
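The checks in step 1 can be scripted on each Hadoop node. The following is a minimal bash sketch, assuming the account is visible in a passwd-format file (by default /etc/passwd; for LDAP or NIS accounts you could save the output of getent passwd tdatuser to a file and pass that instead). The function name check_proxy_user is hypothetical and is not part of any Teradata package.

```bash
#!/usr/bin/env bash
# Hypothetical helper: check that a proxy user (default tdatuser) exists in a
# passwd-format file, that its home directory is /home/<user>, and that a
# login shell is set. Defaults to /etc/passwd.
check_proxy_user() {
  local user="${1:-tdatuser}" passwd_file="${2:-/etc/passwd}"
  local entry home shell
  # passwd fields: name:password:UID:GID:GECOS:home:shell
  entry=$(awk -F: -v u="$user" '$1 == u { print; exit }' "$passwd_file")
  if [ -z "$entry" ]; then
    echo "FAIL: user $user not found in $passwd_file"
    return 1
  fi
  home=$(printf '%s\n' "$entry" | cut -d: -f6)
  shell=$(printf '%s\n' "$entry" | cut -d: -f7)
  if [ "$home" != "/home/$user" ]; then
    echo "FAIL: home directory is '$home', expected /home/$user"
    return 1
  fi
  if [ -z "$shell" ]; then
    echo "FAIL: no login shell set for $user"
    return 1
  fi
  # The passwd entry can exist while the directory itself is missing.
  [ -d "$home" ] || echo "WARN: $home does not exist on this node" >&2
  echo "OK: $user home=$home shell=$shell"
}
```

Running check_proxy_user with no arguments checks tdatuser against the local /etc/passwd; a non-zero exit status means one of the conditions in step 1 is not met.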
Identifying the HCatalog Server

All nodes in the Hadoop cluster must have the HCatalog libraries installed. HCatalog is the
entry point for the Teradata QueryGrid connector.
1 If the customer has changed the default Ambari server login credentials of admin/admin,
obtain the new updated credentials.
2 On a Hadoop node, log on to Ambari:
For TDH systems, use port 8081.
For HDP systems, use port 8080.
If an external connection already exists, use SWS through Server Management or the
customer name.
3 Click the Hive tab.
4 Click Configs.
5 Click the Advanced tab.
The Hive Metastore host is displayed.
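As a cross-check of the Ambari lookup, the Hive Metastore host can usually also be read from the hive.metastore.uris property in hive-site.xml on a Hadoop node. The following is a minimal bash sketch, assuming the client configuration is in /etc/hive/conf/hive-site.xml and that each <name> element is followed directly by its <value> element (the common layout). It is a text match rather than a full XML parse, and both function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the value of one property from a Hadoop or Hive
# *-site.xml file. This is a text match, not a full XML parser: it assumes
# <name>KEY</name> is followed directly by <value>...</value>.
site_xml_get() {
  local key="$1" file="${2:-/etc/hive/conf/hive-site.xml}" re_key
  re_key=$(printf '%s' "$key" | sed 's/\./\\./g')   # escape dots for sed
  # Join the file onto one line, then capture the value that follows the key.
  tr -d '\n\r' < "$file" |
    sed -n "s|.*<name>${re_key}</name>[[:space:]]*<value>\([^<]*\)</value>.*|\1|p"
}

# Hypothetical helper: turn thrift://host:port[,thrift://host:port...] into host names.
metastore_hosts() {
  site_xml_get hive.metastore.uris "$@" | tr ',' '\n' |
    sed -e 's|^[[:space:]]*thrift://||' -e 's|:[0-9]*[[:space:]]*$||'
}
```

Running metastore_hosts on a Hadoop node prints one Metastore host per line; if it prints nothing, fall back to the Ambari view described above.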
Obtaining and Running the System Validation Script
1 Obtain the most current version of teradata-gsctools from TSS:
a Log on to https://tays.teradata.com.
b Click Software Downloads.
c Click Search.
d At Patch Name, enter teradata-gsctools.
e At Current, select Current.
f Click Submit.
g Select the teradata-gsctools check box.
h Complete the identification fields:

User Name (completed by default)
E-Mail (completed by default)
Site ID
Change Control Number
i Click Submit.
j Save the compressed teradata-gsctools file, then extract the .rpm file from the compressed file to the following locations:
On the Teradata Control or PDN node, in the /var/opt/teradata/customermodepkgs directory.
On each Hadoop node.
2 Install the extracted .rpm file as follows:
On the Teradata Control or PDN node, use PUT to install the file.
On each Hadoop node, install the file manually.
3 Verify the system by independently running /opt/teradata/gsctools/bin/chk_all on the Teradata master node and the Hadoop master node and confirming that no errors result.
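On clusters with many Hadoop nodes, the manual installation in step 2 and the Hadoop side of step 3 can be looped from an administration host. The following is a minimal bash sketch, assuming password-less SSH and SCP as root to every Hadoop node and an illustrative .rpm file name; the function names are hypothetical. The Teradata side is still installed with PUT as described in step 2. Setting DRY_RUN=1 prints the commands instead of running them.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Hypothetical helper: copy the extracted teradata-gsctools .rpm to each Hadoop
# node, install it with rpm, then run chk_all on the Hadoop master node.
# Usage: install_gsctools RPM_FILE MASTER_NODE NODE...
install_gsctools() {
  local rpm_file="$1" master="$2"; shift 2   # remaining args: all Hadoop nodes
  local rpm_name node
  rpm_name=$(basename "$rpm_file")
  for node in "$@"; do
    run scp "$rpm_file" "root@${node}:/tmp/${rpm_name}"
    run ssh "root@${node}" rpm -Uvh "/tmp/${rpm_name}"
  done
  # Step 3 on the Hadoop side: review the output and confirm no errors result.
  run ssh "root@${master}" /opt/teradata/gsctools/bin/chk_all
}
```

A dry run such as DRY_RUN=1 install_gsctools ./teradata-gsctools.rpm hdp002-1 hdp002-1 hdp002-8 lets you review the exact commands before executing them.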
Resolving Hostname Conflicts

Teradata-to-Hadoop uses node hostnames to resolve network addressing. However, there may be conflicting or duplicate hostnames between Teradata nodes and Hadoop nodes, and these must be resolved before Teradata QueryGrid is installed.
If problems exist, contact the Teradata Global Support Center Hadoop Support team for changes.
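Before contacting support, one quick way to spot duplicate hostnames is to compare hosts-format listings from both systems. The following is a minimal bash sketch, assuming you have collected one hosts-format file (IP address followed by names and aliases) from a Teradata node and one from a Hadoop node; the function name is hypothetical. It reports every name or alias that appears in both files, ignoring comments and localhost entries.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print host names (and aliases) that appear in both
# hosts-format files, with the IP address each side maps them to.
hostname_conflicts() {
  local td_hosts="$1" hadoop_hosts="$2"
  awk '
    { sub(/#.*/, "") }                         # strip comments
    NF < 2 { next }                            # skip blank or malformed lines
    FILENAME == ARGV[1] {                      # first file: Teradata names
      for (i = 2; i <= NF; i++) td[tolower($i)] = $1
      next
    }
    {                                          # second file: Hadoop names
      for (i = 2; i <= NF; i++) {
        n = tolower($i)
        if ((n in td) && n !~ /^localhost/ && !(n in seen)) {
          seen[n] = 1
          printf "%s Teradata=%s Hadoop=%s\n", n, td[n], $1
        }
      }
    }' "$td_hosts" "$hadoop_hosts"
}
```

Empty output means no shared names were found in the two listings; it does not check DNS.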
Setting Up Hadoop Cluster for LDAP

Teradata QueryGrid supports LDAP starting with connector package tdsqlh_td
15.00.02.01 and through connector package tdsqlh_td 15.00.04.xx.
Before using Teradata QueryGrid with LDAP, update the storage format in the LDAP
directory.
1 In Ambari, open the Hive configuration page (Services > Hive > Configs).
2 Set the following properties:
hive.server2.authentication: LDAP
hive.server2.authentication.ldap.baseDN: The directory location where the authenticated users are stored on the LDAP server
hive.server2.authentication.ldap.url: The URL of the LDAP server
Note: HiveServer2 requires the schema for user Distinguished Names (DN) to follow the format uid=<username>,baseDN, where:
username is the name of the user being added
baseDN is the directory where the authenticated usernames are stored
For example:
3 Save the changes and restart all Hive services.
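To confirm that a user's entry matches the uid=<username>,baseDN format, you can build that DN and bind with it using the OpenLDAP ldapsearch client. The following is a minimal bash sketch; the function names are hypothetical, and the server URL, base DN, and user name in the usage line are placeholders for your environment.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build the DN that HiveServer2 binds with when
# hive.server2.authentication is LDAP: uid=<username>,<baseDN>
hs2_user_dn() {
  local user="$1" base_dn="$2"
  printf 'uid=%s,%s\n' "$user" "$base_dn"
}

# Hypothetical helper: bind to the LDAP server as that DN (-W prompts for the
# password) and search for the user's own entry. The attribute list "1.1"
# requests no attributes, so only the matching DN is printed.
check_hs2_ldap_user() {
  local ldap_url="$1" base_dn="$2" user="$3"
  ldapsearch -x -H "$ldap_url" -D "$(hs2_user_dn "$user" "$base_dn")" -W \
    -b "$base_dn" "(uid=${user})" 1.1
}
```

For example: check_hs2_ldap_user ldap://ldap.example.com "ou=people,dc=example,dc=com" jdoe. A failed bind suggests the DN format or base DN does not match how users are stored in the directory.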
Setting Up Kerberos
Teradata QueryGrid supports Kerberos starting with connector package tdsqlh_td
15.00.03.xx.
The Kerberos Security feature permits Teradata QueryGrid to provide connectivity when the
Hadoop cluster is protected with Kerberos security. The connector accesses the services of
HCatalog, Hive, HDFS, and JDBC. Each of these resources is protected in a Kerberos system.
1 Verify the Kerberos client is installed on all nodes of the Teradata Database and Hadoop
systems.
2 Copy the krb5.conf file from /etc/ on the Hadoop system to /etc/ on all Teradata
nodes.
3 Navigate to the krb5.conf files in /etc/ on all nodes in both systems and set up
communication between the Teradata Database and the Kerberos authentication server
or realm.
The following example is for a Hadoop master node 1 named spiral1.mydivision.mycompany.com; be sure to replace the example realm names and host names with the actual values for your environment.
[libdefaults]
  default_realm = MYCLUSTER.HADOOP.MYCOMPANY.COM
  dns_lookup_realm = false
  dns_lookup_kdc = false
  ticket_lifetime = 24h
  forwardable = yes
  udp_preference_limit = 1
[realms]
  EXAMPLE.COM = {
    kdc = kerberos.example.com
    admin_server = kerberos.example.com
  }
  MYCLUSTER.HADOOP.MYCOMPANY.COM = {
    kdc = spiral1.mydivision.mycompany.com:88
    admin_server = spiral1.mydivision.mycompany.com:749
    default_domain = hadoop.com
  }
[domain_realm]
  spiral1.mydivision.mycompany.com = MYCLUSTER.HADOOP.MYCOMPANY.COM
[logging]
  kdc = FILE:/var/log/krb5/krb5kdc.log
  admin_server = FILE:/var/log/krb5/kadmind.log
  default = SYSLOG:NOTICE:DAEMON
4 Verify that the krb5.conf file is readable by all users on both systems.
For example: chmod 744 /etc/krb5.conf
Permissions can change during the copy process.
5 After updating the krb5.conf file on the Teradata nodes, restart the database.
6 For each Hadoop cluster protected by Kerberos, configure the required JAR file:
a Create a directory named for the cluster that will be referenced in the CREATE
FOREIGN SERVER statement.
For example, for a cluster named mycluster:
mkdir mycluster
b Copy the required configuration files to the directory created in the previous step:
cp /etc/hadoop/conf/hdfs-site.xml ./mycluster
cp /etc/hadoop/conf/core-site.xml ./mycluster
cp /etc/hadoop/conf/mapred-site.xml ./mycluster
cp /etc/hadoop/conf/yarn-site.xml ./mycluster
cp /etc/hive/conf/hive-site.xml ./mycluster
c In the same directory, create a JAR file named for the directory and containing the
configuration files:
For connector package 15.00.04.xx or later:
/opt/teradata/jvm64/jdk8/bin/jar cvf mycluster.jar mycluster
For connector package 15.00.03.xx or earlier:
/opt/teradata/jvm64/jdk7/bin/jar cvf mycluster.jar mycluster
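Steps 4 and 6 above can be scripted. The following is a minimal bash sketch, assuming GNU coreutils (stat -c and sort -V) and that you pass the actual connector version (for example 15.00.04.01 rather than 15.00.04.xx). All function names are hypothetical. The jar tool selection mirrors the version rule in step c: jdk8 for 15.00.04.xx or later, jdk7 otherwise. build_cluster_jar copies from the paths listed in step b, so run it where those files exist.

```bash
#!/usr/bin/env bash
# Hypothetical helper (step 4): succeed only if the file is readable by "other".
krb5_world_readable() {
  local file="${1:-/etc/krb5.conf}" mode
  mode=$(stat -c '%a' "$file") || return 1
  # The last octal digit holds the "other" bits; 4 is the read bit.
  (( (8#$mode & 4) != 0 ))
}

# Hypothetical helper (step 6c): pick the jar tool for a connector version.
jar_tool_for() {
  local version="$1" threshold="15.00.04.00"
  # sort -V lists the lower version first; if the threshold comes first (or
  # the versions are equal), the connector is 15.00.04.xx or later.
  if [ "$(printf '%s\n' "$threshold" "$version" | sort -V | head -n1)" = "$threshold" ]; then
    echo /opt/teradata/jvm64/jdk8/bin/jar
  else
    echo /opt/teradata/jvm64/jdk7/bin/jar
  fi
}

# Hypothetical helper (steps 6a to 6c): stage the configuration files and build
# <cluster>.jar in the current directory.
build_cluster_jar() {
  local cluster="$1" version="$2" f
  mkdir -p "$cluster"
  for f in /etc/hadoop/conf/hdfs-site.xml /etc/hadoop/conf/core-site.xml \
           /etc/hadoop/conf/mapred-site.xml /etc/hadoop/conf/yarn-site.xml \
           /etc/hive/conf/hive-site.xml; do
    cp "$f" "./$cluster/" || return 1
  done
  "$(jar_tool_for "$version")" cvf "$cluster.jar" "$cluster"
}
```

For example, krb5_world_readable || chmod o+r /etc/krb5.conf repairs permissions changed during the copy, and build_cluster_jar mycluster 15.00.04.01 produces mycluster.jar.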
Port Requirements
Teradata QueryGrid requires specific ports to be open for specific services. Certain ports
and services vary depending on whether you intend to configure queries using ECA
operators or pre-ECA operators.
ECA operators (available beginning with tdsqlh_td 15.01.xx.xx):
Port 5002: DataNode
Port 10000: HiveServer2
Port 11000: OozieServer
Pre-ECA operators (available with all versions of tdsqlh_td):
Port 8020: NameNode
Port 9083: Metastore
Port 10000: HiveServer2
Port 50010: DataNode
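Before running queries, you can confirm from a Teradata node that the ports for your operator type are reachable. The following is a minimal bash sketch, assuming bash was built with /dev/tcp support (the default on most Linux distributions) and that the timeout command from GNU coreutils is available; the function names are hypothetical. Each service listens on its own node (for example, DataNode ports on the data nodes), so pass the host that runs the services you are checking.

```bash
#!/usr/bin/env bash
# Hypothetical helper: ports from the Port Requirements list, by operator type.
qg_ports() {
  case "$1" in
    eca)     echo "5002 10000 11000" ;;
    pre-eca) echo "8020 9083 10000 50010" ;;
    *)       echo "usage: qg_ports eca|pre-eca" >&2; return 2 ;;
  esac
}

# Hypothetical helper: succeed if a TCP connection to host:port opens within
# 3 seconds, using bash's /dev/tcp redirection.
port_open() {
  timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Hypothetical helper: check every port for one operator type on one host.
check_qg_ports() {
  local host="$1" ports port status=0
  ports=$(qg_ports "$2") || return 2
  for port in $ports; do
    if port_open "$host" "$port"; then
      echo "open    ${host}:${port}"
    else
      echo "CLOSED  ${host}:${port}"
      status=1
    fi
  done
  return "$status"
}
```

For example, check_qg_ports hdp002-1 pre-eca reports each required port on hdp002-1 as open or CLOSED and returns non-zero if any are closed.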

CHAPTER 3
Installing Software
Adding Hadoop IP Addresses to the Teradata Host File
Resolve all hostname conflicts and confirm the hostnames are not being resolved through a
local DNS.
1 On all Teradata TPA nodes, save a copy of the /etc/hosts file:
cp /etc/hosts /etc/orig.hosts
2 Add the IP addresses of the Hadoop nodes to /etc/hosts.
For example:
192.168.135.100 hdp002-8
192.168.135.101 hdp002-9
3 On Hadoop systems that use byn1, confirm the byn1 IP addresses.
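Steps 1 and 2 can be combined into an idempotent script to run as root on each TPA node. The following is a minimal bash sketch, assuming the Hadoop entries have been collected in a hosts-format file (for example, lines like 192.168.135.100 hdp002-8); the function name is hypothetical. It saves the backup only once, so rerunning it does not overwrite the original copy, and it appends only entries whose host name is not already listed.

```bash
#!/usr/bin/env bash
# Hypothetical helper: back up the hosts file to orig.hosts in the same
# directory (only if no backup exists yet), then append each Hadoop entry
# whose host name is not already listed. Defaults to /etc/hosts.
add_hadoop_hosts() {
  local entries="$1" hosts="${2:-/etc/hosts}"
  local backup ip name rest
  backup="$(dirname "$hosts")/orig.hosts"
  [ -e "$backup" ] || cp -p "$hosts" "$backup" || return 1
  while read -r ip name rest; do
    case "$ip" in ''|'#'*) continue ;; esac   # skip blank and comment lines
    # Is $name already present as a host name or alias (field 2 onward)?
    if awk -v n="$name" '{ sub(/#.*/, "") }
         { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
         END { exit !found }' "$hosts"; then
      echo "skip:  $name already in $hosts"
    else
      printf '%s %s%s\n' "$ip" "$name" "${rest:+ $rest}" >> "$hosts"
      echo "added: $ip $name"
    fi
  done < "$entries"
}
```

For example, add_hadoop_hosts hadoop-nodes.txt updates /etc/hosts and leaves the original in /etc/orig.hosts.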
Configuring Hadoop for Teradata Proxy Setup

The method used to configure the Teradata proxy user on the Hadoop NameNode is
determined by the Hadoop system version.
Configuring Teradata Proxy Setup with Hadoop 2.1 or Later Systems Using Ambari
This task applies only to versions of the connector prior to tdsqlh_td 15.00.02.
Use Ambari to edit the core-site.xml file. Note the following:
Property value changes made in Ambari appear in the core-site.xml file.
Property value changes made in core-site.xml through manual editing do not appear
in Ambari.
If Ambari is used for cluster management, then also use Ambari for modifying service
property values.
1 Verify the customer has provided the following information needed for the
configuration.
Ambari server login and password
The default Ambari server username and password are admin/admin. If the customer has changed the Ambari server password, it must be provided prior to the installation.
Network access to Hadoop Master Node 1
The values to complete the Add Property field have been obtained.
2 Log on to Ambari on Hadoop Master Node 1:
For TDH systems, use port 8081.
For HDP systems, use port 8080.
If external connections already exist, use SWS through Server Management or the customer name.
For example, for HDP systems, use http://hdp002-1:8080, and for TDH systems, use http://hdp002-1:8081.
3 Click the Services tab.
4 From the left pane, click HDFS.
5 On the HDFS screen, click the Configs tab.
6 Expand Custom core-site.xml.
7 Configure the Teradata proxy user:
The default wildcard value for these properties is *, which allows impersonation from any host or of any user. If specific groups and hosts have been identified, replace * with the groups and hosts in a comma-separated list.
a Click Add Property and add a property with the key value
hadoop.proxyuser.tdatuser.groups and value *.
b Click Add Property and add a property with the key value
hadoop.proxyuser.tdatuser.hosts and value *.
8 Click Save.
9 When the Restart button appears, restart HDFS by clicking Restart > Restart All >
Confirm Restart All.
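After the restart, you can confirm on a node with the HDFS client configuration deployed that Ambari pushed the new properties into core-site.xml. The following is a minimal bash sketch using the hdfs getconf -confKey command, which reads the node's local client configuration; a non-zero exit status or empty output is treated as a missing property. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the effective proxyuser settings from the local
# Hadoop client configuration and fail if either property is missing.
verify_proxyuser() {
  local user="${1:-tdatuser}" key value status=0
  for key in "hadoop.proxyuser.${user}.groups" "hadoop.proxyuser.${user}.hosts"; do
    # hdfs getconf -confKey prints the value, or reports the key as missing.
    if value=$(hdfs getconf -confKey "$key" 2>/dev/null) && [ -n "$value" ]; then
      echo "${key}=${value}"
    else
      echo "MISSING ${key}"
      status=1
    fi
  done
  return "$status"
}
```

With the wildcard values from step 7, verify_proxyuser should print hadoop.proxyuser.tdatuser.groups=* and hadoop.proxyuser.tdatuser.hosts=*.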
Configuring Hadoop for Teradata Proxy Setup for Hadoop 1.3.2 Systems
This task applies only to versions of the connector prior to tdsqlh_td 15.00.02.
For Hadoop 1.3.2 systems, Ambari is disabled and you must edit the core-site.xml file
manually.
1 Navigate to the Hadoop NameNode configuration file:
/etc/hadoop/conf/core-site.xml
2 Add the hadoop.proxyuser.tdatuser.groups and hadoop.proxyuser.tdatuser.hosts properties to the file. For example:
<property>
  <name>hadoop.proxyuser.tdatuser.groups</name>
  <value>users</value>
  <description>
    Allow the proxy user tdatuser to impersonate any member of the listed
    HDFS group(s). In this example, tdatuser may impersonate users who
    belong to the HDFS group named users.
  </description>
</property>
<property>
  <name>hadoop.proxyuser.tdatuser.hosts</name>
  <value>host1,host2</value>
  <description>
    The proxy user can connect only from host1 and host2 to impersonate
    a user. Here host1 and host2 represent Teradata nodes. All nodes of
    the Teradata system must be listed here for SQL-H queries to be
    processed. Using the IP addresses of the Teradata nodes is recommended.
  </description>
</property>
Set the property values according to the Teradata and Hadoop environment setup requirements.
3 Save the core-site.xml file.
4 Restart the NameNode:
hcli system restart
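Before restarting the NameNode, it can help to confirm that both properties were saved. The following Python sketch, using only the standard library, parses a Hadoop *-site.xml document and reports which tdatuser proxy-user properties are missing or empty. The function names are hypothetical and not part of any Teradata or Hadoop tool.

```python
import xml.etree.ElementTree as ET

REQUIRED = (
    "hadoop.proxyuser.tdatuser.groups",
    "hadoop.proxyuser.tdatuser.hosts",
)


def read_properties(xml_text):
    """Map each <property><name> to its <value> (xml_text may be str or bytes)."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name", default="").strip()
        value = prop.findtext("value", default="").strip()
        if name:
            props[name] = value
    return props


def missing_proxyuser_properties(xml_text):
    """Return the required tdatuser properties that are absent or empty."""
    props = read_properties(xml_text)
    return [name for name in REQUIRED if not props.get(name)]
```

For example, read /etc/hadoop/conf/core-site.xml in binary mode and pass its contents to missing_proxyuser_properties; an empty list means both properties are present.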
Installing the Teradata QueryGrid Packages on the Teradata Nodes Using PUT
Always use the most recent version of PUT. Earlier PUT versions may not perform the
installation correctly.
1 FTP the connector and vendor packages from TSS to the customermodepkgs directory on the PUT master node:
/var/opt/teradata/customermodepkgs
2 Copy the base package from the media to the customermodepkgs directory on the PUT master node:
/var/opt/teradata/customermodepkgs
3 Start PUT and select Install/Upgrade Software.
4 During the Install/Upgrade Software operation, do the following:
a In the Select Nodes step, select the host nodes that have connectivity for Teradata
QueryGrid.
This may not be all nodes.
b When prompted, enter the package locations.

c When prompted, select the packages and required dependencies.
d When prompted, select only non-VM&F mode.
e Click Continue.
5 If the DBS Login Information page appears, enter the login information and click Continue.
Allow PUT to run to completion.
Installing the Teradata QueryGrid Package on the Hadoop Nodes
Prerequisite:
To ensure successful installation of the Teradata QueryGrid package on Hadoop nodes, the
hive user must have administrator privileges and ALL permissions (read/write/execute) to
the following HDFS paths:
hdfs:///apps/querygrid/
hdfs:///apps/querygrid/lib/
This task applies starting with Teradata QueryGrid connector tdsqlh_td 15.01.xx.xx; it
does not apply to earlier versions of the connector.
After installing the Teradata QueryGrid connector package on the Teradata node, install it on
the Hadoop node.
1 If the Hadoop cluster is protected by Kerberos, complete the following steps:
a Create a keytab file for hdfs:
su hdfs
xst -k /etc/security/keytabs/hdfs.headless.keytab hdfs/<fully.qualified.domain.name>
b Retrieve tickets from the Key Distribution Center (KDC) for users hdfs and hive,
replacing hdfs-HDP23TEST1@HDP23TEST1.HADOOP.TERADATA.COM with the
values for the primary, instance, and realm of the actual system:
su hdfs
kinit -kt /etc/security/keytabs/hdfs.headless.keytab hdfs-HDP23TEST1@HDP23TEST1.HADOOP.TERADATA.COM
# must use hdfs to execute this
exit
2 On the Hadoop Master node, install the Teradata QueryGrid connector package:
tdh123m1:/tmp/jr # rpm -ivh tdsqlh_td-15.01.00.00-1.x86_64.rpm
The installation creates /apps/querygrid/ on HDFS and deploys the required UDF
JAR files and libraries.
3 Verify that workflow.xml was deployed to /apps/querygrid/. If the Hadoop cluster is protected by Kerberos, run su hdfs before listing the HDFS directory:

tdh123m1:/tmp/jr # hadoop fs -ls /apps/querygrid/
Found 2 items
drwxr-xr-x - hdfs hdfs 0 2015-11-25 14:28 /apps/querygrid/lib
-rwxr-xr-x 3 hdfs hdfs 2941 2015-11-25 14:28 /apps/querygrid/workflow.xml
The workflow.xml file is required for exporting.

4 Verify that the required UDF JAR files were deployed to /apps/querygrid/lib/. If the Hadoop cluster is protected by Kerberos, run su hdfs before listing the HDFS directory:
tdh123m1:/tmp/jr # hadoop fs -ls /apps/querygrid/lib/
Found 16 items
-rwxr-xr-x 3 hdfs hdfs 294335 2015-11-25 14:28 /apps/querygrid/lib/hive-common-1.2.1.2.3.0.0-2557.jar
-rwxr-xr-x 3 hdfs hdfs 20593816 2015-11-25 14:28 /apps/querygrid/lib/hive-exec-1.2.1.2.3.0.0-2557.jar
hive-hcat-core-1.2.1.2.3.0.0-2557.jar
hive-metastore-1.2.1.2.3.0.0-2557.jar
hive-serde-1.2.1.2.3.0.0-2557.jar
-rw-r--r-- 3 hdfs hdfs 19097 2015-11-25 14:28 /apps/querygrid/lib/hive-site.xml
hiveudf.jar
joda-time-1.6.2.jar
libfb303-0.9.0.jar
mapper.jar
tdefssp.jar
tdgssconfig.jar
tdptl.jar
tdrowconverter.jar
tdsqlh_td.properties
terajdbc4.jar
5 From the Hive CLI, verify installation of the INDICFMT UDF:
tdh123m1:~ # su hive
hive@tdh123m1:/root> hive
hive> describe function INDICFMT;
6 If the INDICFMT UDF failed to install on a cluster protected by Kerberos, install it manually:
a Execute the following commands:
su hive
kinit -kt /etc/security/keytabs/hive.service.keytab hive/tdh123m1.labs.teradata.com@HDP23TEST1.HADOOP.TERADATA.COM
hive
b On another console, open the Teradata QueryGrid connector installation file (for example, tdsqlh_td-15.01.00.00-1.x86_64.rpm) in vim, copy its CREATE FUNCTION statement, and execute it in the Hive session started in step a.
For tdsqlh_td 15.01.01.xx:
SET hive.execution.engine = mr;
use default;
CREATE FUNCTION INDICFMT AS 'com.teradata.dynaload.hcatalog.hiveudf.TDIndicRowTbl'
  USING JAR 'hdfs:///apps/querygrid/lib/tdptl.jar',
  JAR 'hdfs:///apps/querygrid/lib/tdefssp.jar',
  JAR 'hdfs:///apps/querygrid/lib/tdgssconfig.jar',
  JAR 'hdfs:///apps/querygrid/lib/terajdbc4.jar',
  JAR 'hdfs:///apps/querygrid/lib/tdrowconverter.jar',
  JAR 'hdfs:///apps/querygrid/lib/joda-time-2.5.jar',
  JAR 'hdfs:///apps/querygrid/lib/hiveudf.jar';
For tdsqlh_td 15.01.00.xx:
SET hive.execution.engine = mr;
use default;
CREATE FUNCTION INDICFMT AS 'com.teradata.dynaload.hcatalog.hiveudf.TDIndicRowTbl'
  USING JAR 'hdfs:///apps/querygrid/lib/tdptl.jar',
  JAR 'hdfs:///apps/querygrid/lib/tdefssp.jar',
  JAR 'hdfs:///apps/querygrid/lib/tdgssconfig.jar',
  JAR 'hdfs:///apps/querygrid/lib/terajdbc4.jar',
  JAR 'hdfs:///apps/querygrid/lib/tdrowconverter.jar',
  JAR 'hdfs:///apps/querygrid/lib/joda-time-1.6.2.jar',
  JAR 'hdfs:///apps/querygrid/lib/hiveudf.jar';
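The two CREATE FUNCTION statements differ only in the joda-time JAR. The following Python sketch rebuilds the statement for a given tdsqlh_td version from the JAR list shown above. It is illustrative and not part of the connector package: the function name and the version lookup table are assumptions, and the statement embedded in the installation file remains the authoritative source.

```python
# joda-time JAR per connector release, taken from the two statements above.
JODA_TIME_JAR = {
    "15.01.00": "joda-time-1.6.2.jar",
    "15.01.01": "joda-time-2.5.jar",
}
LIB_DIR = "hdfs:///apps/querygrid/lib/"


def indicfmt_statement(connector_version):
    """Return the Hive statements that create the INDICFMT UDF.

    connector_version is a tdsqlh_td version string such as "15.01.01.02".
    """
    release = ".".join(connector_version.split(".")[:3])
    if release not in JODA_TIME_JAR:
        raise ValueError(f"no JAR list known for tdsqlh_td {connector_version}")
    jars = [
        "tdptl.jar",
        "tdefssp.jar",
        "tdgssconfig.jar",
        "terajdbc4.jar",
        "tdrowconverter.jar",
        JODA_TIME_JAR[release],
        "hiveudf.jar",
    ]
    using = ", ".join(f"JAR '{LIB_DIR}{jar}'" for jar in jars)
    return (
        "SET hive.execution.engine = mr;\n"
        "use default;\n"
        "CREATE FUNCTION INDICFMT AS "
        "'com.teradata.dynaload.hcatalog.hiveudf.TDIndicRowTbl' "
        f"USING {using};"
    )


if __name__ == "__main__":
    print(indicfmt_statement("15.01.01.00"))
```

Versions outside the two releases listed raise an error instead of guessing a JAR list.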
Configuring Kerberos Settings for Teradata QueryGrid
15.00.03.xx.
If you are configuring Kerberos, complete this procedure after using PUT to install the
Teradata QueryGrid connector.
1 Copy the JAR file you previously created as part of setting up Kerberos to /opt/teradata/tdsqlh_hdp/<hdp_vendor_package_version>. For example:
/opt/teradata/tdsqlh_hdp/02.03.00.01
2 Edit the tdsqlh_hdp.bteq file so that it installs the JAR file previously created as part of setting up Kerberos and adds the file to the CLASSPATH:
mycluster designates the directory name previously created as part of setting up
Kerberos.
myjar.jar designates the JAR file previously created as part of setting up Kerberos.

a Add the following lines into tdsqlh_hdp.bteq near similar lines of code:
CALL sqlj.install_jar('cj!myjar.jar','mycluster',0);
CALL sqlj.replace_jar('cj!myjar.jar','mycluster');
b Modify the following statements in tdsqlh_hdp.bteq by adding (*,mycluster) to the path string of each statement, as shown for each connector version.
tdsqlh_td 15.00.04.xx and later:
CALL sqlj.alter_java_path('JR_HDP2_3_0','(*,mycluster)(*,tdptl)
(*,tdrowconverter)(*,jr_terajdbc4)(*,jr_tdgssconfig)
(*,oozieclient_HDP2_3_0)(*,json_simple_HDP2_3_0)(*,tdsqlh_hdp_HDP2_3_0)
(*,avro_HDP2_3_0)(*,commons_cli_HDP2_3_0)(*,commons_codec_HDP2_3_0)
(*,commons_configuration_HDP2_3_0)(*,commons_lang_HDP2_3_0)
(*,commons_logging_HDP2_3_0)(*,datanucleus_core_HDP2_3_0)
(*,guava_HDP2_3_0)(*,hadoop_auth_HDP2_3_0)(*,hadoop_common_HDP2_3_0)
(*,hadoop_hdfs_HDP2_3_0)(*,hadoop_mr_common_HDP2_3_0)
(*,hadoop_mr_core_HDP2_3_0)(*,hive_common_HDP2_3_0)
(*,hive_exec_HDP2_3_0)(*,hive_hcat_core_HDP2_3_0)(*,hive_jdbc_HDP2_3_0)
(*,hive_metastore_HDP2_3_0)(*,hive_serde_HDP2_3_0)
(*,hive_service_HDP2_3_0)(*,httpclient_HDP2_3_0)(*,httpcore_HDP2_3_0)
(*,jackson_core_asl_HDP2_3_0)(*,jetty_HDP2_3_0)(*,jetty_util_HDP2_3_0)
(*,libfb303_HDP2_3_0)(*,log4j_HDP2_3_0)(*,pig_HDP2_3_0)
(*,slf4j_api_HDP2_3_0)(*,slf4j_log4j12_HDP2_3_0)(*,snappy_java_HDP2_3_0)
(*,common_collection2_3_0)(*,htrace_core2_3_0)(*,yarn_common2_3_0)
(*,yarn_api2_3_0)(*,commons_io_HDP2_3_0)(*,servlet_api_HDP2_3_0)
(*,tdefssp)');
CALL sqlj.alter_java_path('tdefssp_t2h','(*,mycluster)(*,jr_terajdbc4)
(*,jr_tdgssconfig)(*,avro_HDP2_3_0)(*,commons_cli_HDP2_3_0)
(*,commons_codec_HDP2_3_0)(*,commons_configuration_HDP2_3_0)
(*,commons_lang_HDP2_3_0)(*,commons_logging_HDP2_3_0)
(*,datanucleus_core_HDP2_3_0)(*,guava_HDP2_3_0)(*,hadoop_auth_HDP2_3_0)
(*,hadoop_common_HDP2_3_0)(*,hadoop_hdfs_HDP2_3_0)
(*,hadoop_mr_common_HDP2_3_0)(*,hadoop_mr_core_HDP2_3_0)
(*,hive_common_HDP2_3_0)(*,hive_exec_HDP2_3_0)
(*,hive_hcat_core_HDP2_3_0)(*,hive_jdbc_HDP2_3_0)
(*,hive_metastore_HDP2_3_0)(*,hive_serde_HDP2_3_0)
(*,hive_service_HDP2_3_0)(*,httpclient_HDP2_3_0)(*,httpcore_HDP2_3_0)
(*,jackson_core_asl_HDP2_3_0)(*,jetty_HDP2_3_0)(*,jetty_util_HDP2_3_0)
(*,libfb303_HDP2_3_0)(*,log4j_HDP2_3_0)(*,pig_HDP2_3_0)
(*,slf4j_api_HDP2_3_0)(*,slf4j_log4j12_HDP2_3_0)(*,snappy_java_HDP2_3_0)
(*,common_collection2_3_0)(*,htrace_core2_3_0)(*,yarn_common2_3_0)
(*,yarn_api2_3_0)(*,commons_io_HDP2_3_0)(*,servlet_api_HDP2_3_0)');
tdsqlh_td 15.00.03.xx:
CALL sqlj.alter_java_path('SQLH_HDP2_1_2','(*,tdsqlh_hdp_HDP2_1_2)
(*,avro_HDP2_1_2)(*,commons-cli_HDP2_1_2)(*,commons-codec_HDP2_1_2)
(*,commons-configuration_HDP2_1_2)(*,commons-lang_HDP2_1_2)
(*,commons-logging_HDP2_1_2)(*,datanucleus-core_HDP2_1_2)(*,guava_HDP2_1_2)
(*,hadoop-auth_HDP2_1_2)(*,hadoop-common_HDP2_1_2)
(*,hadoop-hdfs_HDP2_1_2)(*,hadoop-mr-common_HDP2_1_2)(*,hadoop-mr-core_HDP2_1_2)
(*,hive-common_HDP2_1_2)(*,hive-exec_HDP2_1_2)
(*,hive-hcat-core_HDP2_1_2)(*,hive-jdbc_HDP2_1_2)(*,hive-metastore_HDP2_1_2)
(*,hive-serde_HDP2_1_2)(*,hive-service_HDP2_1_2)(*,httpclient_HDP2_1_2)
(*,httpcore_HDP2_1_2)(*,jackson-core-asl_HDP2_1_2)(*,jetty_HDP2_1_2)
(*,jetty-util_HDP2_1_2)(*,libfb303_HDP2_1_2)(*,log4j_HDP2_1_2)
(*,pig_HDP2_1_2)(*,slf4j-api_HDP2_1_2)(*,slf4j-log4j12_HDP2_1_2)
(*,snappy-java_HDP2_1_2)(*,mycluster)');
CALL sqlj.alter_java_path('SQLH_NO_VER','(*,tdsqlh_hdp_HDP2_1_2)
(*,avro_HDP2_1_2)(*,commons-cli_HDP2_1_2)(*,commons-codec_HDP2_1_2)
(*,commons-configuration_HDP2_1_2)(*,commons-lang_HDP2_1_2)
(*,commons-logging_HDP2_1_2)(*,datanucleus-core_HDP2_1_2)(*,guava_HDP2_1_2)
(*,hadoop-auth_HDP2_1_2)(*,hadoop-common_HDP2_1_2)
(*,hadoop-hdfs_HDP2_1_2)(*,hadoop-mr-common_HDP2_1_2)(*,hadoop-mr-core_HDP2_1_2)
(*,hive-common_HDP2_1_2)(*,hive-exec_HDP2_1_2)
(*,hive-hcat-core_HDP2_1_2)(*,hive-jdbc_HDP2_1_2)(*,hive-metastore_HDP2_1_2)
(*,hive-serde_HDP2_1_2)(*,hive-service_HDP2_1_2)(*,httpclient_HDP2_1_2)
(*,httpcore_HDP2_1_2)(*,jackson-core-asl_HDP2_1_2)(*,jetty_HDP2_1_2)
(*,jetty-util_HDP2_1_2)(*,libfb303_HDP2_1_2)(*,log4j_HDP2_1_2)
(*,pig_HDP2_1_2)(*,slf4j-api_HDP2_1_2)(*,slf4j-log4j12_HDP2_1_2)
(*,snappy-java_HDP2_1_2)(*,mycluster)');
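In the statements above, (*,mycluster) appears first in the 15.00.04.xx paths and last in the 15.00.03.xx paths. When several statements in tdsqlh_hdp.bteq must be edited, a small script can add the entry consistently. The following Python sketch is hypothetical: it assumes each statement has the form CALL sqlj.alter_java_path('<name>','<path>'); and rewrites the path on a single line without reordering existing entries.

```python
import re

# Each classpath entry in a sqlj.alter_java_path path string looks like (*,jar_id).
ENTRY = re.compile(r"\(\*,([^)]+)\)")
CALL = re.compile(r"alter_java_path\('([^']+)','([^']*)'\)")


def add_jar_to_path(statement, jar_id, first=False):
    """Return the alter_java_path statement with (*,jar_id) added.

    The entry is appended at the end unless first=True. If the entry is
    already present, the path is left unchanged (only reflowed to one line).
    """
    match = CALL.search(statement)
    if match is None:
        raise ValueError("not a sqlj.alter_java_path call")
    target, path = match.group(1), match.group(2)
    jar_ids = ENTRY.findall(path)
    if jar_id not in jar_ids:
        jar_ids = [jar_id] + jar_ids if first else jar_ids + [jar_id]
    new_path = "".join(f"(*,{j})" for j in jar_ids)
    return f"CALL sqlj.alter_java_path('{target}','{new_path}');"
```

Running the function twice on the same statement is harmless, which makes it safe to apply to a file that was already partly edited.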
Running the Kerberos Setup Script

Use this procedure only if Kerberos is in use.
15.00.03.xx.
1 On the primary Teradata node, navigate to /opt/teradata/tdsqlh_hdp/<version> and run the config.sh script:
./config.sh -l <username> -p <password>
The script installs the Hadoop vendor libraries and links them with the objects created in
the previous step.
2 Review /var/opt/teradata/tdtemp/sqlh_hdp_postinstall_<timestamp>.log for errors.
The following table outlines the most common errors:

Error Type: SYSLIB database
Example: call sqlj.install_jar('cj!pig-withouthadoop.jar','pig',0); *** Failure 2644 No more room in database SYSLIB.
Cause: During installation, Hadoop JARs require 40 megabytes of space. There is not enough available space in the SYSLIB database.
Action: Increase the SYSLIB database size.

Error Type: SQL Failure
Example 1: DROP FUNCTION SYSLIB.load_from_hcatalog; *** Failure 5589 Function 'load_from_hcatalog' does not exist.
Example 2: call sqlj.remove_jar('SQLH',0); *** Failure 7972 Jar 'SYSLIB.SQLH' does not exist.
Example 3: *** Warning: 9241 Check output for possible warnings encountered in Installing or Replacing a JAR.
Cause: Running the setup script for the first time sometimes returns these messages.
Action: These errors are benign and can be ignored.

Teradata JVM Heap Size Configuration
This task applies only to Teradata QueryGrid connector versions before tdsqlh_td 15.01.xx.xx with Teradata Database 14.10 or Teradata Database 15.00 installations.
During upgrades, remove the previous JVM heap size settings and reset them.
Configuring JVM Heap Size for Teradata Database 15.0
15.01.xx.xx with Teradata Database 15.00 installations.
cufconfig was enhanced for Teradata QueryGrid 15.0 with a new option, JVMOptions, so that the environment file no longer has to be placed on every node. The required options can be added using cufconfig and are automatically replicated to all nodes.
1 If upgrading, before beginning the upgrade process remove any existing JVM options:
a On the primary Teradata node, edit /tmp/jvm_base.txt by removing /tmp/jvmopt.txt from the JavaEnvFile field.
b Run cufconfig -f /tmp/jvm_base.txt.
c Run cufconfig -o.
The JavaEnvFile field value must be empty.
2 Calculate the Java Heap values.
3 In the /tmp directory of the primary Teradata node, create a new jvmopt.txt file with
the following options:
-server
-XX:+UseParallelGC
-XX:+UseParallelOldGC
-Xms7100m
-Xmx7100m
-XX:NewSize=2370m
-XX:MaxNewSize=2370m
-XX:MaxPermSize=864m
4 Copy jvmopt.txt to the /tmp directory on all Teradata nodes.

5 Set the jvmopt.txt file permissions:
psh chmod 777 /tmp/jvmopt.txt
6 Run cufconfig -f /tmp/jvmopt.txt.
7 Run cufconfig -o.
The JVMOptions field value must be populated.
8 Restart the Teradata Database:
tpareset -y restart-with-sqlh
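A typo in jvmopt.txt can keep the JVM from starting after the restart. The following Python sketch is a hypothetical pre-check, not a Teradata tool. It assumes HotSpot option syntax: every option starts with a hyphen, and each -XX option is either a boolean flag written -XX:+Name or -XX:-Name, or a valued option written -XX:Name=value. It applies equally to the 15.00 and 14.10 options files before they are loaded with cufconfig.

```python
import re

# A HotSpot -XX option is boolean (-XX:+Name / -XX:-Name) or valued (-XX:Name=value).
XX_OPTION = re.compile(r"^-XX:([+-][A-Za-z0-9]+|[A-Za-z0-9]+=\S+)$")


def check_jvm_options(text):
    """Return (line_number, option) pairs that look malformed."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        for option in line.split():
            if not option.startswith("-"):
                # Every JVM option must begin with a hyphen.
                problems.append((number, option))
            elif option.startswith("-XX:") and not XX_OPTION.match(option):
                # Boolean -XX flags need + or -; valued ones need =value.
                problems.append((number, option))
    return problems
```

An empty result means every option is at least syntactically well formed; it does not validate the heap sizes themselves.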
Configuring JVM Heap Size for Teradata Database 14.10
15.01.xx.xx with Teradata Database 14.10 installations. During upgrades, remove the
previous JVM heap size settings and reset them.
1 If upgrading, before beginning the upgrade process remove any existing JVM options:
a On the primary Teradata node, edit /tmp/jvm_base.txt by removing /tmp/jvmopt.txt from the JavaEnvFile field.
b Run cufconfig -f /tmp/jvm_base.txt.

2 Calculate the Java Heap values.
3 In the /tmp directory of the primary Teradata node, create a new jvmopt.txt file with
the following options:
-server
-XX:MaxPermSize=512m
-Xms6g
-Xmx6g
-XX:NewSize=2g
-XX:MaxNewSize=2g
-XX:ParallelGCThreads=24
-XX:+UseParallelGC
-XX:+UseParallelOldGC
4 Copy jvmopt.txt to the /tmp directory on all Teradata nodes.

5 Set the jvmopt.txt file permissions:
psh chmod 777 /tmp/jvmopt.txt

6 In the /tmp directory of the primary Teradata node, create the jvm_base.txt file and
add JavaEnvFile:/tmp/jvmopt.txt.
7 Run cufconfig -f /tmp/jvm_base.txt.
8 Run cufconfig -o.
The JavaEnvFile field value must be /tmp/jvmopt.txt.
9 Restart the Teradata Database:
tpareset -y restart-with-sqlh
Calculating JVM Heap Size Values
15.01.xx.xx with Teradata Database 14.10 or Teradata Database 15.00 installations.
Teradata QueryGrid query concurrency depends on how the FSG cache and the JVM heap size are configured. The desired concurrency level determines the FSG cache setting and the sizes of the JVM heap and Perm space.


CHAPTER 4
Post Software Installation Activities
Validating the Installation

Validate that the Teradata and Hadoop setups are ready for Teradata-to-Hadoop queries.
1 Create the hcatalog table with data:
a On the primary Teradata node, navigate to /opt/teradata/sqlh/<version>.
b Download tdsqlh_example.hive and tdsql_data.csv and copy them to the /tmp directory on the Hadoop NameNode.
c Log on to the Hadoop NameNode and navigate to the /tmp directory.
d Change the permissions on the copied files:
chmod 777 tdsqlh_example.hive tdsql_data.csv
e Change the user to hive:
su hive
2 Run the sample Hive script, which creates a tdsqlh_test table with 14 columns and populates it with 805 rows:
hive < tdsqlh_example.hive
Verify that the script completes and returns a row count of 805:
Total MapReduce CPU Time Spent: 4 seconds 580 msec
OK
805
Time taken: 33.76 seconds
3 Use SQL Assistant or BTEQ to log into the Teradata primary node as user dbc.
4 Run the Teradata-to-Hadoop query to import rows from the tdsqlh_test table.
5 Run the query to count the rows in the tdsqlh_test table, first replacing all placeholder values (such as MYHCATALOGSERVER) with the actual values for your environment.
Note: In the following query examples, MYHCATALOGSERVER represents the defined
Metastore host name.
For Hadoop 2.3 or Hadoop 2.1 systems with one master node and without Kerberos security, or for Hadoop 1.3.2 systems:

Connector Package Query Example

15.01.xx.xx CREATE FOREIGN SERVER TD_SERVER_DB.tdh123_15_01 USING
server ('MYHCATALOGSERVER')
hosttype ('hive')
hiveport ('10000')
username ('hive')
ip_device ('eth0')
hadoop_properties ('<dfs.client.use.datanode.hostname=true>')
DO IMPORT WITH SYSLIB.LOAD_FROM_HIVE_HDP2_3_0 ,
DO EXPORT WITH SYSLIB.LOAD_TO_HIVE_HDP2_3_0 ;
SELECT count(*) FROM tdsqlh_test@tdh123_15_01;
DROP FOREIGN SERVER TD_SERVER_DB.tdh123_15_01;

hosttype ('hadoop')
port('9083')
hiveport ('10000')
username ('hive')
hadoop_properties ('<dfs.client.use.datanode.hostname=true>')
DO IMPORT WITH SYSLIB.LOAD_FROM_HCATALOG_HDP2_3_0 ,
DO EXPORT WITH SYSLIB.LOAD_TO_HCATALOG_HDP2_3_0 ;
14.10.00.xx SELECT count(*)
FROM SYSLIB.load_from_hcatalog(USING
server('MYHCATALOGSERVER')
port('9083')
hosttype('hadoop')
username('hive')
dbname('default')
tablename('tdsqlh_test')
columns('*')
templeton_port('50111')
hadoop_properties('<dfs.client.use.datanode.hostname=true>')
) as D1;
For Hadoop 2.3 or Hadoop 2.1 systems with multiple master nodes and Kerberos
security:
Note: Each master node in the system must have an entry.

15.01.xx.xx CREATE AUTHORIZATION TD_SERVER_DB.tdh123_remote_auth AS DEFINER TRUSTED
USER 'MYKERBEROSUSER' PASSWORD 'MYKERBEROSUSERPASSWORD';
CREATE FOREIGN SERVER TD_SERVER_DB.tdh123_15_01_krb

EXTERNAL SECURITY DEFINER TRUSTED tdh123_remote_auth

USING
hosttype ('hive')
hiveport ('10000')
security ('kerberos')
clustername ('MYKERBEROSCLUSTER')
ip_device ('MYIPDEVICE')
hadoop_properties ('<dfs.client.use.datanode.hostname=true>,
<dfs.datanode.use.datanode.hostname=true>,
<dfs.nameservices=MYHCATALOGSERVER>,
<dfs.ha.namenodes.MYHCATALOGSERVER=nn1,nn2>,
<dfs.namenode.rpc-address.MYHCATALOGSERVER.nn1=MYNAMENODE1.labs.teradata.com:8020>,
<dfs.namenode.rpc-address.MYHCATALOGSERVER.nn2=MYNAMENODE2.labs.teradata.com:8020>,
<dfs.client.failover.proxy.provider.MYHCATALOGSERVER=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider>
')

SELECT count(*) FROM tdsqlh_test@tdh123_15_01_krb;
DROP FOREIGN SERVER TD_SERVER_DB.tdh123_15_01_krb;
DROP AUTHORIZATION TD_SERVER_DB.tdh123_remote_auth;
15.00.04.xx and 15.00.03.xx CREATE AUTHORIZATION TD_SERVER_DB.tdh123_remote_auth AS DEFINER TRUSTED
USER 'MYKERBEROSUSER' PASSWORD 'MYKERBEROSUSERPASSWORD';
CREATE FOREIGN SERVER TD_SERVER_DB.tdh123_15_00_krb

EXTERNAL SECURITY DEFINER TRUSTED tdh123_remote_auth
USING
hosttype ('hadoop')
port('9083')
hiveport ('10000')
security ('kerberos')
clustername ('MYKERBEROSCLUSTER')
<dfs.nameservices=MYCLUSTER>,
<dfs.ha.namenodes.MYCLUSTER=nn1,nn2>,
<dfs.namenode.rpc-address.MYCLUSTER.nn1=MYNAMENODE1.labs.teradata.com:8020>,
<dfs.client.failover.proxy.provider.MYCLUSTER=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider>
')

DO IMPORT WITH SYSLIB.LOAD_FROM_HCATALOG_5_4_0 ,
DO EXPORT WITH SYSLIB.LOAD_TO_HCATALOG_5_4_0 ;
SELECT count(*) FROM tdsqlh_test@tdh123_15_00_krb;

DROP FOREIGN SERVER TD_SERVER_DB.tdh123_15_00_krb;
DROP AUTHORIZATION TD_SERVER_DB.tdh123_remote_auth;
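With multiple master nodes, the hadoop_properties string grows with each NameNode. The following Python sketch assembles it from the nameservice name and the NameNode IDs and host names recorded in hdfs-site.xml. The function name is hypothetical, the property keys mirror those in the 15.01.xx.xx Kerberos example above, and port 8020 is assumed as in the examples.

```python
FAILOVER_PROVIDER = (
    "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
)


def ha_hadoop_properties(nameservice, namenodes, port=8020):
    """Build a hadoop_properties value for an HA HDFS nameservice.

    namenodes maps each NameNode ID (for example "nn1") to its host name.
    """
    ids = list(namenodes)
    props = [
        "dfs.client.use.datanode.hostname=true",
        "dfs.datanode.use.datanode.hostname=true",
        f"dfs.nameservices={nameservice}",
        f"dfs.ha.namenodes.{nameservice}={','.join(ids)}",
    ]
    # One RPC address entry per NameNode, in the order given.
    for nn_id, host in namenodes.items():
        props.append(f"dfs.namenode.rpc-address.{nameservice}.{nn_id}={host}:{port}")
    props.append(f"dfs.client.failover.proxy.provider.{nameservice}={FAILOVER_PROVIDER}")
    return ",\n".join(f"<{p}>" for p in props)


if __name__ == "__main__":
    value = ha_hadoop_properties(
        "MYHCATALOGSERVER",
        {"nn1": "MYNAMENODE1.labs.teradata.com", "nn2": "MYNAMENODE2.labs.teradata.com"},
    )
    print(f"hadoop_properties ('{value}')")
```

Adding a third master node only requires another entry in the namenodes mapping, which keeps dfs.ha.namenodes and the RPC addresses in step.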
For Hadoop 2.3 or Hadoop 2.1 systems with multiple master nodes and without
Kerberos security:
Note: Each master node in the system must have an entry.

hosttype ('hive')
hiveport ('10000')
username ('hive')
ip_device ('eth0')
')


hosttype ('hadoop')
port('9083')
hiveport ('10000')
username ('hive')
')


14.10.00.xx SELECT count(*)
FROM SYSLIB.load_from_hcatalog(USING
server('MYHCATALOGSERVER')
port('9083')
hosttype('hadoop')
username('hive')
dbname('default')
tablename('tdsqlh_test')
columns('*')
templeton_port('50111')
hadoop_properties('<dfs.client.use.datanode.hostname=true>,
<dfs.namenode.rpc-address.MYCLUSTER.nn1=MYNAMENODE1:8020>,
<dfs.namenode.rpc-address.MYCLUSTER.nn2=MYNAMENODE2:8020>,
node.ha.ConfiguredFailoverProxyProvider>')
) as D1;
The cluster name and NameNode RPC addresses are listed in the hdfs-site.xml settings in Ambari:
Services > HDFS > Configs > Custom hdfs-site.xml
The following are common terms found in this script:
server: DNS host name or IP address of the Hadoop NameNode
port: Port of the Hadoop NameNode service
templeton_port: The WebHCat (Templeton) port
If the query returns an error instead of a row count of 805, the Teradata-to-Hadoop setup requires manual troubleshooting to isolate the problem.

Validating the System
1 Verify the system by independently running /opt/teradata/gsctools/bin/chk_all on the Teradata master node and on the Hadoop master node, and confirm that no errors result.


APPENDIX A
Manual Installation of Teradata QueryGrid Packages
Installing the Teradata QueryGrid Packages

Prerequisite: User privileges must be granted in advance of manual installation.
Under some circumstances, you may prefer or need to install Teradata QueryGrid packages
manually. For example, if you want multiple versions of the tdsqlh_hdp package on the
same node, you must download and install the package manually, because the PUT utility is
not currently configured to support this installation scenario.
1 Download the required packages and transfer them to all impacted Teradata nodes.
2 Change the directory to the location of the downloaded packages:
cd package_location
3 Run tar -xvzf tdsqlh/version.
4 Run tar -xvzf tdsqlh_td/version.
5 Run tar -xvzf tdsqlh_hdp/version.
6 Run rpm -ivh tdsqlh/version.
7 Run rpm -ivh tdsqlh_td/version.
8 Run rpm -ivh tdsqlh_hdp/version.
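The manual steps extract all three archives before installing any RPM, in the order tdsqlh, tdsqlh_td, tdsqlh_hdp. The following Python sketch only builds that command sequence for review rather than running it. The function name and the file names in the test are hypothetical, because actual package file names depend on the release downloaded; the tar flags are written -xvzf because -f must be followed directly by the archive name.

```python
import shlex

# Package order used by the manual installation procedure.
PACKAGE_ORDER = ("tdsqlh", "tdsqlh_td", "tdsqlh_hdp")


def install_commands(archives, rpms):
    """Return shell commands that extract every archive, then install every RPM.

    archives and rpms map each name in PACKAGE_ORDER to a file path.
    """
    commands = []
    for name in PACKAGE_ORDER:
        # -f takes the archive path as its argument, so it comes last.
        commands.append(f"tar -xvzf {shlex.quote(archives[name])}")
    for name in PACKAGE_ORDER:
        commands.append(f"rpm -ivh {shlex.quote(rpms[name])}")
    return commands
```

Printing the list one command per line gives a checklist to run on each impacted Teradata node.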
Running the Setup Script

The setup scripts are run on the Teradata Database. Perform this procedure for both
installations and upgrades.
1 Log on to the Teradata node, navigate to /opt/teradata/sqlh/<version>, and run
the config.sh script.
The script sets up a temporary Unicode user; installs import and export table operators,
stored procedures, and other required components; then removes the temporary user.
2 On the same Teradata node, navigate to /opt/teradata/tdsqlh_hdp/<version>
and run the config.sh script.
The script installs the Hadoop vendor libraries and links them with the objects created in
the previous step.

3 Review /var/opt/teradata/tdtemp/sqlh_postinstall_<timestamp>.log
and /var/opt/teradata/tdtemp/sqlh_hdp_postinstall_<timestamp>.log
for errors.
The following table outlines the most common errors:

Error Type: SYSLIB database
Example: call sqlj.install_jar('cj!pig-withouthadoop.jar','pig',0); *** Failure 2644 No more room in database SYSLIB.
Cause: During installation, Hadoop JARs require 40 megabytes of space. There is not enough available space in the SYSLIB database.
Action: Increase the SYSLIB database size.

Error Type: SQL Failure
Example 1: DROP FUNCTION SYSLIB.load_from_hcatalog; *** Failure 5589 Function 'load_from_hcatalog' does not exist.
Example 2: call sqlj.remove_jar('SQLH',0); *** Failure 7972 Jar 'SYSLIB.SQLH' does not exist.
Example 3: *** Warning: 9241 Check output for possible warnings encountered in Installing or Replacing a JAR.
Cause: Running the setup script for the first time sometimes returns these messages.
Action: These errors are benign and can be ignored.
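When reviewing the postinstall logs, the message codes in the table above can be sorted automatically. The following Python sketch is a hypothetical helper: it assumes the log contains messages in the *** Failure <code> and *** Warning: <code> forms shown in the table, treats 5589, 7972, and 9241 as benign only because the table describes them as expected on a first run, and reports every other code for manual review.

```python
import re

BENIGN_CODES = {5589, 7972, 9241}  # expected on a first run of the setup script
ACTIONS = {2644: "Increase the SYSLIB database size."}

# Matches "*** Failure 2644" and "*** Warning: 9241".
MESSAGE = re.compile(r"\*\*\* (?:Failure|Warning):? (\d+)")


def classify_log(text):
    """Split message codes in a postinstall log into benign, actionable, unknown."""
    benign, actionable, unknown = [], [], []
    for match in MESSAGE.finditer(text):
        code = int(match.group(1))
        if code in BENIGN_CODES:
            benign.append(code)
        elif code in ACTIONS:
            actionable.append((code, ACTIONS[code]))
        else:
            unknown.append(code)
    return benign, actionable, unknown
```

For example, pass the text of sqlh_postinstall_<timestamp>.log or sqlh_hdp_postinstall_<timestamp>.log to classify_log; anything in the third list is not covered by the table and needs investigation.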

