
1) How many error tables are there in FastLoad (fload), and what is their significance/use?

Can we see the data in the error tables?

How many error tables are there in MultiLoad (mload), and what is their use?
When an mload job fails, can we access the mload tables? If yes, then how?

FastLoad uses 2 error tables:

Error table 1: rows rejected because the format of the data is not correct (data conversion errors).

Error table 2: rows that violate the unique primary index (UPI).

MultiLoad also uses 2 error tables (ET and UV), plus 1 work table per target table and 1 log table:

1. ET TABLE - data errors
MultiLoad uses the ET table, also called the Acquisition Phase error table, to store data errors found
during the acquisition phase of a MultiLoad import task.
2. UV TABLE - UPI violations
MultiLoad uses the UV table, also called the Application Phase error table, to store data errors found
during the application phase of a MultiLoad import or delete task.
3. WORK TABLE - WT
MultiLoad loads the selected records into the work table.
4. LOG TABLE
The log table keeps a record of all checkpoints for the load job. Specifying a log table is mandatory
in a MultiLoad job, and it is what allows the job to resume after an abort or restart for
any reason.
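The error tables are ordinary tables, so once the load job has finished or stopped they can be inspected with plain SQL. The sketch below is only an illustration: the database and table names are hypothetical, and it assumes the usual layout of FastLoad error table 1 (ErrorCode, ErrorFieldName, DataParcel), with error table 2 mirroring the target table.

```sql
-- Hypothetical names: sales_db.orders_err1 and sales_db.orders_err2 are the two
-- error tables named in the ERRORFILES clause of the FastLoad BEGIN LOADING command.
-- Assumes the usual layout of error table 1 (ErrorCode, ErrorFieldName, DataParcel).
SELECT ErrorCode, ErrorFieldName, COUNT(*) AS rejected_rows
FROM sales_db.orders_err1
GROUP BY ErrorCode, ErrorFieldName
ORDER BY rejected_rows DESC;

-- Error table 2 has the same columns as the target table,
-- so a simple count shows how many rows violated the UPI.
SELECT COUNT(*) AS upi_violations
FROM sales_db.orders_err2;
```

Grouping by error code and field name is usually enough to see which input column is causing most of the rejects.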
2) What are SET tables and MULTISET tables in Teradata? Explain with an appropriate example.
A SET table does not allow duplicate rows. A MULTISET table allows duplicate rows.
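As a small illustration of the difference, the following sketch creates one table of each kind and inserts the same row twice. The table and column names are made up for the example.

```sql
-- Hypothetical tables, for illustration only.
CREATE SET TABLE dept_set (
    dept_no   INTEGER,
    dept_name VARCHAR(30)
) PRIMARY INDEX (dept_no);

CREATE MULTISET TABLE dept_multiset (
    dept_no   INTEGER,
    dept_name VARCHAR(30)
) PRIMARY INDEX (dept_no);

INSERT INTO dept_multiset VALUES (10, 'Sales');
INSERT INTO dept_multiset VALUES (10, 'Sales');  -- accepted: duplicate rows are allowed

INSERT INTO dept_set VALUES (10, 'Sales');
INSERT INTO dept_set VALUES (10, 'Sales');       -- not stored: a SET table keeps only one copy
```

Both tables use a non-unique primary index here, so the only thing that differs between them is the SET/MULTISET duplicate-row check.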
3) Teradata Optimization and Performance Tuning
Optimization is the technique of selecting the least expensive (fastest) plan for a query to
fetch its results.
How well a query can be optimized depends directly on the availability of:
1. CPU resources
2. System resources - AMPs, PEs, etc.

Teradata performance tuning is the technique of improving a process so that a query runs
faster with minimal use of CPU resources.
4) How many types of errors can occur in the SPOOL process? How will you connect
a database server to another server?
In UNIX we can connect from one server to another using commands such as
ssh or ftp (su only switches to another user on the same server).

5) Join Strategies
1. There are 2 tables: table A has 10 million records and table B has 100 million records. We
join both tables, and the Explain plan shows that Teradata takes table A and
redistributes it.

Now the question is: is the optimizer doing the correct job with that plan or not? Justify your answer.
2. In the same example, the optimizer now takes table B (100 million records) and
redistributes it.

Is the optimizer doing its best now? How do you avoid this situation?
Teradata is smart enough to decide when to redistribute and when to copy.
It compares the tables: are they comparable in size, or is one big compared to the other?
Based on this simple logic it decides whether to redistribute the smaller table across the AMPs or to
copy it, meaning the small table is duplicated into SPOOL space on every AMP. Remember that
JOINs always take place in the AMPs' SPOOL space. By moving the smaller table, Teradata makes sure that
the 100 million row table can take part in an AMP-local JOIN without being moved itself.

Remember the basic thing: whatever Teradata does, it does with space, performance
and efficiency in mind.

My simple formula: if a table is small, redistribute or copy it to all the AMPs to get an AMP-local join.
JOINs are always made AMP-local; if that cannot be done cheaply, you have a high chance of running out
of SPOOL space.
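One common way to help the optimizer pick the smaller table for redistribution is to make sure statistics exist on the join columns, and then to confirm the plan with EXPLAIN. The sketch below assumes hypothetical tables table_a and table_b joined on a hypothetical column cust_id.

```sql
-- Hypothetical tables and join column: table_a (10 million rows),
-- table_b (100 million rows), joined on cust_id.
COLLECT STATISTICS ON table_a COLUMN (cust_id);
COLLECT STATISTICS ON table_b COLUMN (cust_id);

-- Check which table the plan redistributes or duplicates.
EXPLAIN
SELECT a.cust_id, b.order_amt
FROM table_a a
INNER JOIN table_b b
   ON a.cust_id = b.cust_id;
```

If the plan still moves the 100 million row table, the primary index choice of the two tables is the next thing to review, since tables that share the join column as primary index can be joined without moving either one.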
6) What is EXPLAIN in Teradata?
The EXPLAIN facility is a Teradata extension that provides an "English" translation of the
steps chosen by the optimizer to execute an SQL statement. It can be used on any
valid SQL statement by prefixing the statement with the keyword "EXPLAIN".

The following is an example:

EXPLAIN SELECT last_name, first_name FROM employees;

EXPLAIN parses the SQL statement but does not execute it.

This provides the designer with an "execution strategy".

The execution strategy shows what the optimizer does, but not why it chooses those steps.

The EXPLAIN facility is used to analyze joins and complex queries.
7) What is the difference between global temporary tables and volatile temporary tables?
Global temporary tables (GTT) -
1. When a GTT is created, its definition goes into the Data Dictionary.
2. When it is materialized, its data goes into temp space.
3. That is why the data is active only until the session ends, while the definition remains until it is
dropped with a DROP TABLE statement.
If it is dropped from some other session, the statement should be DROP TABLE ... ALL;
4. You can collect statistics on a GTT.

Volatile temporary tables (VTT) -
1. The table definition is stored in the system cache.
2. The data is stored in spool space.
3. That is why both the data and the table definition are active only until the session ends.
4. Statistics cannot be collected on a VTT, at least in older Teradata releases.
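For comparison, here is a minimal sketch of how the two kinds of temporary table are declared. The table and column names are hypothetical, and ON COMMIT PRESERVE ROWS is chosen here only so that the rows survive until the end of the session.

```sql
-- Hypothetical table and column names, for illustration only.

-- Global temporary table: the definition is kept in the Data Dictionary,
-- and each session materializes its own private copy of the data.
CREATE GLOBAL TEMPORARY TABLE gtt_sales (
    sale_id INTEGER,
    amount  DECIMAL(10,2)
) PRIMARY INDEX (sale_id)
ON COMMIT PRESERVE ROWS;

-- Volatile table: both the definition and the data disappear
-- when the session ends.
CREATE VOLATILE TABLE vt_sales (
    sale_id INTEGER,
    amount  DECIMAL(10,2)
) PRIMARY INDEX (sale_id)
ON COMMIT PRESERVE ROWS;
```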
8) How does Teradata make sure that no duplicate rows are inserted into a SET
table?
Teradata routes each newly inserted row to its target AMP according to the PI (on the basis of its row hash
value). If it finds rows with the same row hash value on that AMP (hash synonyms), it compares the
whole row to find out whether it is a duplicate.
If it is a duplicate, an INSERT ... SELECT silently skips it without raising an error, while a single-row INSERT is rejected with a duplicate row error.
9) After creating tables dynamically in Teradata, where is the GRANT on the table usually
done? When tables are newly created, what is the default role and what are the default
privileges that get assigned?
The GRANT option for any particular table depends on the privileges of the user. If it is an
admin user, privileges can be granted at any point in time.

The default roles associated with newly created tables depend on the schema in which
they are created.

10) What is a clique? What is a vdisk, and how does it communicate
with physical data storage at the time of data retrieval through
the AMP?
A clique is a set of Teradata nodes that share a common set of disk arrays. Cabling a subset of
nodes to the same disk arrays creates a clique. Each AMP vproc must have access to an array
controller, which in turn accesses the physical disks. AMP vprocs are associated with one or
more ranks (or mirrored pairs) of data. The total disk space associated with an AMP is called a
vdisk. A vdisk may have up to three ranks. Hence the vdisk communicates with physical
storage through array controllers.
The vdisk provides protection against disk failure.
The node provides protection against AMP failure.
The clique provides protection against node failure.
All the disks under a clique are physically connected to each other and
grouped logically: disk under vdisk, vdisk under AMP, AMP under node.

In case of a disk failure, the mirrored disk provides the fallback protection.
In case of an AMP failure, a second AMP is used to access the data on the disk/vdisk.
In case of a node failure, an alternate node provides access to the data on the disk/vdisk.
In case of a clique failure, an alternate clique provides the fallback protection.

The actual data is on the disk, and that disk can be accessed through another AMP, node
or clique if the default AMP, node, clique or even the BYNET has failed.

11) How do indexes optimize query performance?

Indexing is a way to physically reorganise the records to enable some frequently used queries
to run faster.

The index can be used as a pointer into the large table. It helps to locate the required row
quickly and then return it to the user.

or

Frequently used queries need not hit the large table for data; they can get what they want
from the index itself (covered queries).

An index comes with the overhead of maintenance. Teradata maintains its indexes by itself: each
time an insert/update/delete is done on the table, the indexes also need to be updated and
maintained.

Indexes cannot be accessed directly by users. Only the optimizer has access to the index.
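As a rough sketch of how this looks in practice, the statements below create a non-unique and a unique secondary index, use EXPLAIN to see whether the optimizer picks one up, and list the indexes from the data dictionary view DBC.Indices. The employees table comes from the EXPLAIN example in this document; the badge_no column and the 'Smith' value are hypothetical.

```sql
-- Hypothetical column names on the employees table; adjust to the real schema.

-- Non-unique secondary index (NUSI) on a frequently filtered column.
CREATE INDEX (last_name) ON employees;

-- Unique secondary index (USI) on a column whose values are unique.
CREATE UNIQUE INDEX (badge_no) ON employees;

-- Check whether the optimizer chooses the index for a given query.
EXPLAIN
SELECT last_name, first_name
FROM employees
WHERE last_name = 'Smith';

-- List the indexes defined on the table from the data dictionary.
SELECT IndexNumber, IndexType, UniqueFlag, ColumnName, ColumnPosition
FROM DBC.Indices
WHERE TableName = 'employees'
ORDER BY IndexNumber, ColumnPosition;
```

Because only the optimizer decides whether an index is used, EXPLAIN is the practical way to confirm that a new index actually changes the plan.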
12) What is a common data source for the central enterprise data
warehouse?
Operational data stores.
13) What is the difference between MultiLoad and FastLoad in terms of performance?
Answer-1:
If you want to load an empty table, use FastLoad; it is more useful than MultiLoad for this because
FastLoad loads the data in 2 phases and does not need a work table, so it is faster. It follows the steps below
to load the data into the table:

Phase 1 - It sends the records in blocks to the AMPs; each AMP hashes the rows and redistributes them to the appropriate
AMPs, still unsorted.
Phase 2 - After the END LOADING command, each AMP sorts its rows by row hash and writes them to the table.

MultiLoad:

It does the loading in 5 phases:

Phase 1 (preliminary): it reads the script, checks the commands and SQL, sets up sessions and creates the support tables
Phase 2 (DML transaction): the DML statements are sent to the database
Phase 3 (acquisition): it reads the records from the import file, stores them in the work table and locks the table header
Phase 4 (application): the DML operations are applied to the target tables
Phase 5 (cleanup): the table locks are released and the work tables are dropped

14) Teradata performance tuning and optimization

Collect statistics
Use EXPLAIN statements to check the plan
Avoid product joins when possible
Select an appropriate primary index to avoid skew in storage
Avoid redistributions when possible
Use sub-selects instead of big "IN" lists
Use derived tables
Use GROUP BY instead of DISTINCT (GROUP BY sorts the data locally on the VPROC;
DISTINCT sorts the data after it is redistributed)
Use compression on large tables
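Two of the tips above are easy to show side by side. The sketch below uses hypothetical tables orders and vip_customers; whether either rewrite is actually faster depends on the data and the release, so treat it as something to verify with EXPLAIN rather than a rule.

```sql
-- Hypothetical tables: orders(cust_id, order_amt) and vip_customers(cust_id).

-- GROUP BY instead of DISTINCT to get the unique customer ids.
SELECT cust_id
FROM orders
GROUP BY cust_id;

-- Sub-select instead of a long hard-coded IN list.
SELECT o.cust_id, o.order_amt
FROM orders o
WHERE o.cust_id IN (SELECT v.cust_id FROM vip_customers v);
```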

15) Why does MLOAD need work tables?

Work tables are used to receive and sort data and SQL on each AMP prior
to storing them permanently to disk. The purpose of work tables is to
hold two things:
1. The Data Manipulation Language (DML) tasks
2. The input data that is ready to APPLY to the AMPs
MultiLoad automatically creates one work table for each target table. This means
that in IMPORT mode you could have one or more work tables. In
DELETE mode you will have only one work table, since that mode
works on only one target table.

16) Write a single SQL statement to delete duplicate records from a single table
based on a column value. I need only unique records at the end of the
query.
A nested query method might be required in other databases; however, in TD
we don't need to follow such a difficult way just to find the
unique rows. In TD we have functions like RANK() and ROW_NUMBER() which, in
combination with QUALIFY, help you select the rows that you
want to delete. You can add a condition like 'QUALIFY RANK() OVER (...) > 1'.
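A minimal sketch of the QUALIFY approach follows. It assumes a hypothetical table employee_stage in which emp_id should be unique and load_ts distinguishes the copies of a row; if the duplicates are identical in every column, this DELETE cannot tell them apart and a different approach (for example rebuilding the table from the unique rows) would be needed.

```sql
-- Hypothetical table: employee_stage(emp_id, last_name, load_ts).
-- Assumption: (emp_id, load_ts) is different for every copy of a row.

-- Preview the rows that would be removed: everything except the newest row per emp_id.
SELECT emp_id, last_name, load_ts
FROM employee_stage
QUALIFY ROW_NUMBER() OVER (PARTITION BY emp_id ORDER BY load_ts DESC) > 1;

-- Delete those rows in a single statement.
DELETE FROM employee_stage
WHERE (emp_id, load_ts) IN (
    SELECT emp_id, load_ts
    FROM employee_stage
    QUALIFY ROW_NUMBER() OVER (PARTITION BY emp_id ORDER BY load_ts DESC) > 1
);
```

ROW_NUMBER() is used rather than RANK() here because it never gives two rows the same number, so exactly one row per emp_id is kept.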

17) Why doesn't FLOAD load duplicate rows into a MULTISET table?

Restart logic is the reason. When a FastLoad job is restarted,
some number of rows are sent to the AMPs
again, because the restart starts on the next record after
the value stored in the checkpoint. Hence, when a restart
occurs, the first row after the checkpoint and some of the
consecutive rows are sent a second time. These are
caught as duplicate rows after the sort. This restart logic
is the reason that FastLoad will not load duplicate rows
into a MULTISET table: it assumes they are duplicates
caused by a restart.

18) Why does the MultiLoad utility support only Non-Unique
Secondary Indexes (NUSI) on the target table?
Like FastLoad, MultiLoad does not support Unique Secondary Indexes (USIs).
But unlike FastLoad, it does support the use of Non-Unique Secondary
Indexes (NUSIs) because the index subtable row is on the same AMP as the
data row. MultiLoad uses every AMP independently and in parallel. If two
AMPs must communicate, they are not independent. Therefore, a NUSI (same
AMP) is fine, but a USI (different AMP) is not.

19) We can find information about all the indexes in the system table
"dbc.indices".
20) Can we load a MULTISET table using MLOAD?

We can load both SET and MULTISET tables using MLOAD. When
loading into a MULTISET table using MLOAD, duplicate rows are
not rejected, so we have to take care of them before
loading.

In the case of FLOAD, when we load into a MULTISET table,
duplicate rows are automatically rejected. FLOAD will not
load duplicate rows whether the table is SET or MULTISET.

21) Types of Tables in Teradata :

1.Derived
2.Volatile
3.Global Temp.
4.Permanent
4.1. SET
4.2. Multiset

22) What is the FILLER command in Teradata?

When using MLOAD or FastLoad, if you do not want to load a particular
field from the data file into the target table, use the FILLER command to
skip it.

23) Restart MultiLoad

If the data or the data structure of the table has changed, drop the
work tables, error tables and log table, and release the MLOAD lock on the table into
which it is supposed to insert values.
If MLOAD fails due to any other error, simply restart it after fixing
that error and it will resume from the checkpoint.
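The clean-up for the first case can be sketched as follows. All object names are hypothetical: the target table is assumed to be sales_db.orders, the support tables are assumed to follow the WT_/ET_/UV_ naming pattern, and ml_orders_log stands for whatever log table the job script named.

```sql
-- Hypothetical names: target table sales_db.orders, MultiLoad support tables
-- WT_orders, ET_orders and UV_orders, and a log table ml_orders_log.

-- Release the MultiLoad lock on the target table.
RELEASE MLOAD sales_db.orders;

-- Drop the support tables so the job can be started from scratch.
DROP TABLE sales_db.WT_orders;
DROP TABLE sales_db.ET_orders;
DROP TABLE sales_db.UV_orders;
DROP TABLE sales_db.ml_orders_log;
```

After this the job starts from the beginning rather than from a checkpoint, so it should only be used when a plain restart is not possible.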

24) Can I use a "DROP" statement in the "fload" utility?

Yes.

But you have to declare it outside the FLOAD block, which means
it should not come between .BEGIN LOADING and .END LOADING.

FLOAD also supports DELETE, CREATE and DROP statements, which we
have to declare outside the FLOAD block.

Inside the FLOAD block we can give only INSERT.

25) FastLoad

FastLoad uses 4 tables.
Any loader utility has 2 tables:
the log table and the target table.
For MLOAD and FLOAD we also have the error table (ET) and the UV (uniqueness violation) table.
Errors related to unique values are loaded into the UV table.
Data conversion errors are loaded into the ET table.

In MLOAD there are 5 tables:
the 4 above plus the
work table, into which the data from the source is loaded.

MLOAD has 5 phases:
1) Preliminary phase: all MLOAD commands and SQL syntax are checked, sessions are defined and
support tables are created.
2) DML transaction phase: the SQL is sent to the PE and the SQL plans are built.
3) Acquisition phase: data is captured and a row hash is assigned to each row based on hashing.
4) Application phase: data is sorted and applied to the target table.
5) Cleanup phase: the logs are cleaned up and the sessions are closed.

What is optimization and performance tuning, and how does it really work in practical projects?

Answer-1:
Performance tuning and optimization of a query involves collecting statistics on join columns, avoiding cross product
joins, selecting an appropriate primary index (to avoid skew in storage) and using secondary indexes.

Avoiding NUSIs is advisable.

What are the enhanced features in Teradata V2R5 and V2R6?

V2R6 included a replication feature, in which copies of the database are available on another system. This means V2R6
provides additional data protection
compared to V2R5 in case the data on one system is lost.

What is Basic Teradata Query language?

Answer-1:
BTEQ (Basic Teradata Query)

It allows us to write SQL statements along with BTEQ commands. We can use BTEQ for importing, exporting and
reporting purposes.

BTEQ commands start with a dot (.) and can be terminated with a semicolon (;), although the semicolon is not mandatory. SQL statements
do not start with a dot, but a semicolon (;) is
compulsory to terminate an SQL statement.

BTEQ assumes that anything written without a leading dot is an SQL statement and requires a (;) to terminate it.

How many of Codd's rules are satisfied by the Teradata database?

Answer-1:
There are 12 Codd's rules applied to the Teradata database.

Does the SDLC change when you use Teradata instead of Oracle?

Answer-1:
If Teradata is going to be used only as a database, the system development life cycle (SDLC) won't change.
If you are going to use the Teradata utilities, the architecture or SDLC will change.
If your schema is going to be in 3NF, there won't be a huge change.

Which two statements are true about a foreign key?

Each Foreign Key must exist as a Primary Key.
Foreign Keys can change values over time.

Answer-1:
First: True
Second: False
1. Foreign Keys can change values over time.
2. Each Foreign Key must exist as a Primary Key.

What are two examples of an OLTP environment?

# Transactions take a matter of seconds or less.

# Many transactions involve a small amount of data.

Answer-1:
On Line Banking
On Line Reservation (Transportation like Rail, Air etc.)

Answer-2:
1- ATM
2- POS

Answer-3:
OLTP is typified by a small number of rows (or records), or a few of many possible tables, being accessed in a matter
of seconds or less. Very little I/O processing is required to complete the transaction. For example:

1. This type of transaction takes place when we take out money at an ATM. Once our card is validated, a debit
transaction takes place against our current balance to reflect the amount of cash withdrawn.

2. This type of transaction also takes place when we deposit money into a checking account and the balance gets
updated.

We expect these transactions to be performed quickly. They must occur in real time.
