Teradata Tuts

Hello guys....
I have a few links and docs to share which I collected during my interview
preparation...

Among them, the TeraTom book is very nice; it explains Teradata basics in detail...

There is one more doc, 'Teradata Utilities', on Ultimatix (Learning & Dev >>
Learning > Safari Online) which explains FastLoad, MultiLoad and TPT in detail.
I couldn't download that one because of insufficient tokens, so you can read it
there...

This will be useful for your daily work as well, so save it somewhere.

Utilities (FastLoad and MultiLoad): http://www.bi-dw.info/teradata-loading-tools.htm
http://www.javaorator.com/teradata/tutorial/Fast-load-in-Teradata-132.code

SQL Questions:
http://usefulfreetips.com/Teradata-SQL-Tutorial/tag/teradata-sql-query-test/

Interview:
http://www.teradatahelp.com/2010/09/teradata-interview-questions-part-3.html
http://www.teradatahelp.com/2010/08/teradata-interview-questions-part-1.html

IMP DOCS

-----------------------------------------------------------------------------------------

Steps to improve performance of the query?

Explain Primary Index. What are the constraints when selecting a PI?

What is skew factor?

If your skew factor is going up, what are the remedies?

When, how and why do we use Secondary Indexes?

What is the difference between Primary Key and Primary Index?

What is the difference between a database and a user in Teradata? What can you do or
not do in each?

When do you use BTEQ? What other software have you used, or could be used, instead of
BTEQ?

What is AMP?

Types of indexes in Teradata

What is PE?

What is Collect Statistics?

What is the Hashing Algorithm?

What is a Hash Value?

What is a Hash Map?

What is a Hash Bucket?

What is a hash collision?

How does the PI work to insert a row?

How does the PI work to select a row?

What is SI?

How to tune a query?

How will you load a table/flat file with 5000 columns into Teradata?

What is the maximum number of columns supported in a Teradata table, in different
versions of Teradata?

What is the difference between Primary Key and Primary Index?

Primary Key                          | Primary Index
-------------------------------------+---------------------------------------------
Logical concept of data modeling     | Physical mechanism for access and storage
Teradata doesn't need to recognize   | Each table must have exactly one
No limit on number of columns        | 64-column limit
Documented in data model             | Defined in CREATE TABLE statement
Uniquely identifies each row         | Used to place and locate each row on an AMP
Values should not change             | Values may be changed (Del+Ins)
Must be not NULL                     | May be NULL
Does not imply an access path        | Defines most efficient access path
Chosen for logical correctness       | Chosen for physical performance

What is the Hashing Algorithm?

- When the primary index value of a row is input to the hashing algorithm, the
output is called the row hash. The row hash is the logical storage address of the row
and identifies the AMP that owns the row.

What is a Hash Value?

- The hash value determines on which AMP the row will reside. It is always stored
along with the row and is part of what makes the row UNIQUELY identifiable.

What is a Hash Map?

- The hash map contains the buckets, called hash map buckets, laid out in rows and
columns.

What is a Hash Bucket?

- Each hash bucket contains only the number of an AMP attached to the TD system.

What is a hash collision?

- This occurs when the same hash value is generated for two different Primary
Index values.

- To handle hash collisions, increase the contrast between the two column values. If
your input column is char, try changing the values to alphanumeric to get more
contrast in the values.

What is skew factor?

- Skew factor refers to how the rows of a table are distributed among the AMPs. If
the data is highly skewed, some AMPs hold many more rows than others, i.e. the data is
not evenly distributed. This hurts Teradata's performance, because the busiest AMP sets
the pace for all-AMP work. Data distribution (skew) is controlled by the choice of index.
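
To put a number on skew, one common convention compares the average rows per AMP with the busiest AMP. Below is a minimal sketch in Python, assuming the formula (1 - average / maximum) * 100; that formula and the per-AMP row counts are assumptions for illustration and are not defined anywhere in these notes.

```python
def skew_factor(rows_per_amp):
    """Return skew as a percentage; 0 means a perfectly even distribution.

    Assumes the convention (1 - average / maximum) * 100.
    """
    if not rows_per_amp or max(rows_per_amp) == 0:
        return 0.0
    average = sum(rows_per_amp) / len(rows_per_amp)
    return (1 - average / max(rows_per_amp)) * 100


even = [1000, 1000, 1000, 1000]    # hypothetical: every AMP holds the same row count
skewed = [3400, 200, 200, 200]     # hypothetical: one AMP holds most of the table

print(skew_factor(even))           # 0.0
print(round(skew_factor(skewed), 1))
```

A value close to 0 means the AMPs share the work evenly; the closer it gets to 100, the more one AMP is carrying the table.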

If your skew factor is going up, what are the remedies?

- We will choose a new index whose values distribute more evenly, which gives a lower
skew factor.

What is PE?

Parsing Engine (PE): we can think of the PE as the mother of TD. Whenever a user logs in
to TD, the session is actually connected to a PE, and when the user submits a query, the
PE takes action. It performs the following tasks:

1. It creates a plan and instructs the AMPs what to do in order to get the result of
the query.

2. Session control (120 sessions per PE): it checks the access rights of the user,
i.e. whether the user has the privilege to execute the query or not.

3. It acts as the OPTIMIZER: it creates the best possible execution plan and
dispatches the optimized plan to the AMPs.

4. It parses the SQL request, acting as a compiler.

What is AMP?

- Access Module Processor (AMP): an AMP is attached to the PE via the BYNET for
instructions and is connected to its own disk, with the privilege to read or write data
on that disk.

- Each AMP is allowed to read and write on its own disk only; this is known as the
SHARED NOTHING ARCHITECTURE.

- An AMP is best thought of as a computer processor with its own disk attached
to it.

- Whenever it receives instructions from the PE, it fetches the data from its
disk and sends it back to the PE through the BYNET.

SQL

First, second and third highest salary

1. List all employees

2. List only one employee, sorted alphabetically

Finding the 1st and 2nd highest salary from the emp table.
Using rank()
select * from (select empno, ename, sal, rank() over (order by sal desc) rn from emp) a
where a.rn = 1
--- a.rn = 1 will return the 1st highest and a.rn = 2 will return the 2nd highest salary.
Using row_number()
Similarly, we can use the row_number() function in place of rank() to find the 1st
highest, 2nd highest and so on.
select * from (select empno, ename, sal, row_number() over (order by sal desc) rn from
emp) a where a.rn = 2
If we want the highest salary department-wise, we need to add partition by deptno just
before order by, like rank() over (partition by deptno order by sal desc) rn
The basic difference between the two is that rank() gives tied rows the same rank and
then skips the following numbers, whereas row_number() always generates a gap-free
sequence. So if two employees tie for the top salary, rank() has no row with rn = 2.
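
To see the tie behaviour side by side, here is a small sketch that runs both functions over a made-up emp table. It uses Python's sqlite3 module (SQLite 3.25 or later) purely as a stand-in for Teradata; the rows and the empno tie-breaker are assumptions for illustration.

```python
import sqlite3

# SQLite stands in for Teradata here; the table contents are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (empno INTEGER, ename TEXT, sal INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [(1, "Asha", 5000), (2, "Bala", 5000), (3, "Chitra", 4000)],
)

rows = conn.execute(
    """
    SELECT ename,
           RANK()       OVER (ORDER BY sal DESC)        AS rnk,
           ROW_NUMBER() OVER (ORDER BY sal DESC, empno) AS rn
    FROM emp
    ORDER BY rn
    """
).fetchall()

print(rows)  # [('Asha', 1, 1), ('Bala', 1, 2), ('Chitra', 3, 3)]
```

Asha and Bala tie, so both get rank 1 and rank 2 is skipped, while row_number() still counts 1, 2, 3.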
Sql 2) How will you insert the text '2014-02-01' into a table?
Sql 3) If I have the table below
Col
abcdef
abc_def
how will I search for the string abc_def?
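
One possible answer to Sql 3 is LIKE with an ESCAPE character, because a bare underscore in a LIKE pattern matches any single character. The sketch below runs it through Python's sqlite3 as a stand-in for Teradata; the extra row 'abcXdef' is a made-up value added only to show the difference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?)",
    [("abcdef",), ("abc_def",), ("abcXdef",)],  # 'abcXdef' is hypothetical
)

# Without ESCAPE, "_" is a wildcard for exactly one character.
loose = [r[0] for r in conn.execute(
    "SELECT col FROM t WHERE col LIKE 'abc_def' ORDER BY col")]

# With ESCAPE '\', the sequence "\_" matches a literal underscore only.
exact = [r[0] for r in conn.execute(
    r"SELECT col FROM t WHERE col LIKE 'abc\_def' ESCAPE '\' ORDER BY col")]

print(loose)  # ['abcXdef', 'abc_def']
print(exact)  # ['abc_def']
```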
Sql 4) If I have a table as below
Customer Expenditure
A 100
A 200
A 300
B 200
B 400
B 500
C 600
C 700
C 800

We have to find the customer name with the maximum total expenditure, keeping the
performance of the query in mind. The data in the table is huge.
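
A possible approach for Sql 4 is to aggregate once per customer and keep only the top row, so the table is read a single time. This sketch uses Python's sqlite3 as a stand-in; the table name spend is an assumption, and on Teradata the LIMIT 1 would be written with TOP 1 or a QUALIFY clause instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE spend (customer TEXT, expenditure INTEGER)")
conn.executemany(
    "INSERT INTO spend VALUES (?, ?)",
    [("A", 100), ("A", 200), ("A", 300),
     ("B", 200), ("B", 400), ("B", 500),
     ("C", 600), ("C", 700), ("C", 800)],
)

# One pass: sum per customer, then keep the largest total.
top = conn.execute(
    """
    SELECT customer, SUM(expenditure) AS total
    FROM spend
    GROUP BY customer
    ORDER BY total DESC
    LIMIT 1
    """
).fetchone()

print(top)  # ('C', 2100)
```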
Destination question and answer:
Table: route
dname dno
Delhi 1
Nagpur 2
Mumbai 3
Chennai 4

Show the route rows: Delhi to Nagpur, Nagpur to Mumbai, and so on.

select a.dname, b.dname
from route a
inner join (select dno, dname from route) b
on a.dno + 1 = b.dno;

A column has some negative values and some positive values. It is required to find the
sum of negative numbers and the sum of the positive numbers in two separate columns.
SELECT
    SUM(CASE WHEN num < 0 THEN num ELSE 0 END) neg,
    SUM(CASE WHEN num > 0 THEN num ELSE 0 END) pos
FROM neg_pos;

An Employee table has a Gender column. By mistake, all males have flag F and all females
have flag M. How do you correct this?
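
One way to answer this is a single UPDATE with a CASE expression, so both flags are swapped in the same pass and no row is flipped twice. The sketch below runs it through Python's sqlite3 as a stand-in; the table layout and names are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, gender TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [("Ravi", "F"), ("Meena", "M")],   # hypothetical rows with the wrong flags
)

# Swap both values in one statement; two separate UPDATEs would undo each other.
conn.execute(
    """
    UPDATE employee
    SET gender = CASE gender
                     WHEN 'M' THEN 'F'
                     WHEN 'F' THEN 'M'
                     ELSE gender
                 END
    """
)

fixed = dict(conn.execute("SELECT name, gender FROM employee"))
print(fixed)  # {'Ravi': 'M', 'Meena': 'F'}
```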

Find the employees who make more than twice the average salary in their department.
CREATE VIEW DS(D,S,C) AS
SELECT DEPT,SUM(SALARY),COUNT(*)
FROM EMP
GROUP BY DEPT;

SELECT E.NAME
FROM EMP E, DS
WHERE E.DEPT=DS.D
AND E.SALARY>2*(DS.S/DS.C);

Find the departments whose salary total is more than twice the average departmental
salary total.
SELECT D
FROM DS
WHERE S > 2 * (SELECT AVG(S) FROM DS);

Find the employees whose salaries are among the top 100 salaries.
SELECT E1.NAME
FROM EMP E1
WHERE 99>= (SELECT COUNT(DISTINCT SALARY)
FROM EMP E2
WHERE E2.SALARY>E1.SALARY)

____________BASICS______________________________

Read up on architecture, indexes (PI, Secondary, etc.), a little on utilities,
performance tuning, and queries.

What are the types of PI (Primary Index) in Teradata?

There are two types of Primary Index: Unique Primary Index (UPI) and Non-Unique Primary
Index (NUPI). By default, a NUPI is created when the table is created. The UNIQUE keyword
has to be given explicitly when a UPI is to be created.

A UPI will sometimes slow down performance, because the uniqueness of the column value
has to be checked for each and every row, which is additional overhead for the system,
but the distribution of data will be even.
Care should be taken while choosing a NUPI so that the distribution of data is almost
even. The UPI/NUPI decision should be taken based on the data and its usage.

How to Choose Primary Index(PI) in Teradata?

Choosing a Primary Index is based on the data distribution and the join frequency of the
column. If a column is used for joining most of the tables, then it is a PI candidate,
provided it also distributes the data well.
For example, we have an Employee table with EMPID and DEPTID, and this table needs to be
joined to the Department table based on DEPTID.

It is not a wise decision to choose DEPTID as the PI of the Employee table. The reason is
that the Employee table will have thousands of employees, whereas the number of
departments in a company will be less than 100. So choosing EMPID will give better
performance in terms of distribution.
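
The EMPID versus DEPTID point can be checked with a quick simulation. This is only a sketch: MD5 is a stand-in and not Teradata's hashing algorithm, and the 4 AMPs, 1000 employees and 5 departments are made-up numbers.

```python
import hashlib
from collections import Counter

NUM_AMPS = 4  # hypothetical system size


def amp_for(value):
    """Map a PI value to an AMP using MD5 as a stand-in hash."""
    digest = hashlib.md5(str(value).encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_AMPS


# 1000 hypothetical employees spread evenly over 5 departments.
employees = [(empid, empid % 5) for empid in range(1000)]

by_empid = Counter(amp_for(empid) for empid, _ in employees)
by_deptid = Counter(amp_for(deptid) for _, deptid in employees)

print("rows per AMP with EMPID as PI: ", sorted(by_empid.values()))
print("rows per AMP with DEPTID as PI:", sorted(by_deptid.values()))
```

With DEPTID there are only 5 distinct values to hash, so rows land on the AMPs in blocks of 200 and at least one AMP ends up with 400 or more, while EMPID spreads the rows in much smaller steps.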

How the data is distributed among AMPs based on PI in Teradata?

Assume a row is to be inserted into a Teradata table.
The Primary Index value for the row is put into the hashing algorithm.
The output is a 32-bit Row Hash.
The Row Hash points to a bucket in the Hash Map. The first 16 bits of the Row Hash are
used to locate a bucket in the Hash Map.
The bucket points to a specific AMP.
The row, along with the Row Hash, is delivered to that AMP.
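
The steps above can be sketched in a few lines of Python. Everything specific here is an assumption for illustration: MD5 stands in for the real hashing algorithm, the system has 4 AMPs, and the hash map assigns buckets to AMPs round-robin.

```python
import hashlib

NUM_AMPS = 4               # hypothetical system size
NUM_BUCKETS = 2 ** 16      # 16 bucket bits, as described above

# Hypothetical hash map: bucket number -> AMP number, assigned round-robin.
HASH_MAP = [bucket % NUM_AMPS for bucket in range(NUM_BUCKETS)]


def row_hash(pi_value):
    """Return a 32-bit row hash (MD5 is a stand-in, not Teradata's algorithm)."""
    digest = hashlib.md5(str(pi_value).encode()).digest()
    return int.from_bytes(digest[:4], "big")


def locate_amp(pi_value):
    """Follow PI value -> row hash -> bucket -> AMP."""
    rh = row_hash(pi_value)
    bucket = rh >> 16              # first (high-order) 16 bits of the row hash
    return rh, bucket, HASH_MAP[bucket]


rh, bucket, amp = locate_amp(12345)
print(f"row hash={rh:#010x} bucket={bucket} amp={amp}")
```

The same PI value always gives the same row hash, so the same path is used later to find the row again.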

When the AMP receives a row, it places the row into the proper table and checks whether
it has any other rows in the table with the same row hash. If this is the first row with
this particular row hash, the AMP assigns a 32-bit uniqueness value of 1. If this is the
second row with that particular row hash, the AMP assigns a uniqueness value of 2. The
32-bit row hash and the 32-bit uniqueness value make up the 64-bit Row ID. The Row ID is
how tables are sorted on an AMP.
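
A toy model of how an AMP could hand out uniqueness values and build the 64-bit Row ID is sketched below. The Amp class and its methods are hypothetical names for illustration, not anything from Teradata itself.

```python
from collections import defaultdict


class Amp:
    """Toy AMP that assigns a uniqueness value per row hash (illustrative only)."""

    def __init__(self):
        self._last_uniqueness = defaultdict(int)   # row hash -> last value used
        self.rows = {}                             # Row ID -> row data

    def insert(self, row_hash, row):
        self._last_uniqueness[row_hash] += 1
        uniqueness = self._last_uniqueness[row_hash]
        # 32-bit row hash in the high half, 32-bit uniqueness value in the low half.
        row_id = (row_hash << 32) | uniqueness
        self.rows[row_id] = row
        return row_id

    def sorted_row_ids(self):
        """Rows are kept in Row ID order on the AMP."""
        return sorted(self.rows)


amp = Amp()
first = amp.insert(0xABCD1234, "row one")
second = amp.insert(0xABCD1234, "row two")   # same row hash, next uniqueness value
other = amp.insert(0x00000001, "row three")

print([hex(row_id) for row_id in amp.sorted_row_ids()])
```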

This uniqueness value is useful in the case of NUPIs, to distinguish the rows that share
the same NUPI value.
Access by UPI or by NUPI is always a one-AMP operation, as the same values will be stored
on the same AMP.

What are Secondary Indexes (SI), the types of SI, and the disadvantages of Secondary
Indexes in Teradata?

Secondary Indexes provide another path to access data. Teradata allows up to 32 secondary
indexes per table. Keep in mind that defining a secondary index does not redistribute the
rows of the base table. The value of secondary indexes is that they reside in a
subtable and are stored on all AMPs, which is very different from how the primary indexes
(part of the base table) are stored. Keep in mind that Secondary Indexes (when defined)
do take up additional space.

Secondary Indexes are frequently used in a WHERE clause. The Secondary Index can be
changed or dropped at any time. However, because of the overhead for index maintenance,
it is recommended that index values should not be frequently changed.

There are two different types of Secondary Indexes, Unique Secondary Index (USI), and
Non-Unique Secondary Index (NUSI). Unique Secondary Indexes are extremely efficient. A
USI is considered a two-AMP operation. One AMP is utilized to access the USI subtable row
(in the Secondary Index subtable) that references the actual data row, which resides on
the second AMP.

A Non-Unique Secondary Index is an All-AMP operation and will usually require a spool
file. Although a NUSI is an All-AMP operation, it is faster than a full table scan.

Secondary indexes can be useful for:
Satisfying complex conditions
Processing aggregates
Value comparisons
Matching character combinations
Joining tables

How are the data distributed in Secondary Index Subtables in Teradata?

When a user creates a Secondary Index, Teradata automatically creates a Secondary Index
Subtable. The subtable will contain the:
Secondary Index Value
Secondary Index Row ID
Primary Index Row ID

When a user writes an SQL query that has an SI in the WHERE clause, the Parsing Engine
will Hash the Secondary Index Value. The output is the Row Hash, which points to a bucket
in the Hash Map.
That bucket contains an AMP number and the Parsing Engine then knows which AMP contains
the Secondary Index Subtable pertaining to the requested USI information.

The PE will direct the chosen AMP to look-up the Row Hash in the Subtable. The AMP will
check to see if the Row Hash exists in the Subtable and double check the subtable row
with the actual secondary index value. Then, the AMP will pass the Primary Index Row ID
back up the BYNET network. This request is directed to the AMP with the base table row,
which is then easily retrieved.
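
The two-AMP USI lookup described above can be modelled roughly as follows. This is a simplified sketch: MD5 stands in for the real hash, the 4 AMPs and the empid/email columns are made up, and the subtable stores the primary index value where a real subtable would hold the base row's Row ID.

```python
import hashlib

NUM_AMPS = 4  # hypothetical system size


def amp_for(value):
    """Pick an AMP for a value (MD5 is a stand-in for the real hash)."""
    digest = hashlib.md5(str(value).encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_AMPS


# Every AMP holds a slice of the base table and a slice of the USI subtable.
base_table = [dict() for _ in range(NUM_AMPS)]     # per AMP: empid -> row
usi_subtable = [dict() for _ in range(NUM_AMPS)]   # per AMP: email -> empid


def insert(empid, email):
    """Base row goes by PI hash; subtable row goes by SI hash."""
    base_table[amp_for(empid)][empid] = {"empid": empid, "email": email}
    usi_subtable[amp_for(email)][email] = empid


def lookup_by_email(email):
    """Return (row, AMPs touched) for an access through the USI."""
    si_amp = amp_for(email)                  # AMP holding the subtable row
    empid = usi_subtable[si_amp].get(email)
    if empid is None:
        return None, [si_amp]
    base_amp = amp_for(empid)                # AMP holding the base table row
    return base_table[base_amp][empid], [si_amp, base_amp]


insert(101, "a@example.com")
row, amps_touched = lookup_by_email("a@example.com")
print(row, amps_touched)
```

At most two AMPs are involved no matter how many AMPs the system has, which is why the USI access is so cheap.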

What are the types of JOINs available in Teradata?

Types of JOINs are: Inner Join, Outer Join (Left, Right, Full), Self Join, and Cross Join
(Cartesian product join).
