
Parsing Engine Components

Parser
Optimizer
Generator
Dispatcher
Parser
Receives the request from the client system and performs the following actions through its components:
SYNTAXER: Checks the syntax and builds a parse tree.
RESOLVER: Adds information from the data dictionary.
SECURITY MODULE: Checks access rights on the objects referenced in the request.
Optimizer
Determines the most efficient way to execute a query.
Scans the request to find out which locks are required on the objects.
Passes the optimized parse tree to the generator.

Generator
Comes up with the plastic steps to execute a query (steps that do not yet contain bound parameter values).
Caches the plastic steps in a request cache.
Passes these steps to GNC apply, which binds in the parameters and produces the concrete steps.
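
The split between plastic and concrete steps can be illustrated with a small sketch. The names used here (request_cache, generate_plastic_steps, apply_parameters) and the step text are hypothetical and do not reflect Teradata internals; the sketch only shows cached, unbound steps being reused and bound to values at apply time.

```python
from string import Template

# Hypothetical request cache: request text -> list of plastic (unbound) steps.
request_cache = {}

def generate_plastic_steps(request_text):
    """Return plastic steps for a request, reusing cached ones when possible."""
    if request_text not in request_cache:
        # Stand-in for real step generation: one step with a placeholder.
        request_cache[request_text] = [
            Template("RETRIEVE row WHERE emp_id = $emp_id")
        ]
    return request_cache[request_text]

def apply_parameters(plastic_steps, params):
    """Bind parameter values into plastic steps, producing concrete steps."""
    return [step.substitute(params) for step in plastic_steps]

request = "SELECT * FROM employee WHERE emp_id = :emp_id"
concrete_a = apply_parameters(generate_plastic_steps(request), {"emp_id": 101})
concrete_b = apply_parameters(generate_plastic_steps(request), {"emp_id": 202})
```

Both calls reuse the same cached plastic steps; only the bound value differs between concrete_a and concrete_b.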
Dispatcher
Controls the sequence in which steps are executed.
Performs the following four major tasks:
Receives the concrete steps from GNC apply.
Sends the first step to the BYNET, which then sends it to the specific AMP for processing.
Receives the completion response from the AMP.
Places the next step on the BYNET.
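
The four dispatcher tasks amount to a send-and-wait loop. The following is a minimal sketch under simplifying assumptions: FakeAmp and dispatch are hypothetical names, a direct method call stands in for the BYNET round trip, and every step goes to a single AMP.

```python
from collections import deque

class FakeAmp:
    """Hypothetical AMP stand-in that records the steps it processes."""

    def __init__(self):
        self.processed = []

    def process(self, step):
        self.processed.append(step)
        return f"done: {step}"  # completion response sent back to the dispatcher

def dispatch(concrete_steps, amp):
    """Send steps one at a time, waiting for each completion before the next."""
    pending = deque(concrete_steps)
    completions = []
    while pending:
        step = pending.popleft()
        # The direct call stands in for placing the step on the BYNET
        # and receiving the completion response from the AMP.
        completions.append(amp.process(step))
    return completions

amp = FakeAmp()
responses = dispatch(["lock table", "retrieve rows", "release lock"], amp)
```

The next step is taken from the queue only after the previous step has returned its completion, which is the sequencing role described above.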

PRIMARY INDEX

HASHING ALGORITHM
When the primary index value of a row is input to the hashing algorithm, the output is called the row hash. The row hash is the logical storage address of the row and identifies the AMP that owns the row. The table id plus the row hash identifies the cylinder and data block, and is used for the distribution, placement and retrieval of the row. How evenly the data is distributed depends on how unique the row hash values are.
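
The role of the table id plus the row hash as a lookup key can be sketched as follows. This is an illustration only: CRC32 stands in for Teradata's actual hashing algorithm, the number of AMPs is assumed, and insert_row and select_by_pi are hypothetical helpers.

```python
import zlib

NUM_AMPS = 4  # assumed system size

def row_hash(pi_value):
    """Stand-in hash (CRC32); Teradata's real hashing algorithm differs."""
    return zlib.crc32(str(pi_value).encode("utf-8"))

def amp_for(pi_value):
    """Pick the owning AMP from the row hash (simplified to a modulo)."""
    return row_hash(pi_value) % NUM_AMPS

# One dict per AMP, keyed by (table_id, row_hash) -> list of rows.
amps = [dict() for _ in range(NUM_AMPS)]

def insert_row(table_id, pi_value, row):
    key = (table_id, row_hash(pi_value))
    amps[amp_for(pi_value)].setdefault(key, []).append(row)

def select_by_pi(table_id, pi_value):
    """Only the owning AMP is consulted; the other AMPs are never touched."""
    key = (table_id, row_hash(pi_value))
    return amps[amp_for(pi_value)].get(key, [])

insert_row(1001, 42, {"emp_id": 42, "name": "Ann"})
found = select_by_pi(1001, 42)
```

Because the same primary index value always produces the same row hash, the row is stored on and retrieved from the same AMP. A real system would also have to distinguish different values that happen to share a row hash, which this sketch does not do.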
The table id is a sequential number assigned whenever a table is created. This number changes whenever a table is recreated.
Hash code redistribution is used in join operations. It applies when the foreign key (join column) of one table (table A) is joined to the primary index of another table (table B). For each table A row, the row hash of the foreign key is calculated. The table A row is then sent to the AMP dictated by that row hash, which is the same AMP that holds table B's row for that row hash.
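
Hash code redistribution can be sketched as below. The tables, column names and AMP count are made up for illustration, and CRC32 again stands in for the real hashing algorithm.

```python
import zlib
from collections import defaultdict

NUM_AMPS = 4  # assumed system size

def amp_for(value):
    """Stand-in for row hash -> AMP; not Teradata's real algorithm."""
    return zlib.crc32(str(value).encode("utf-8")) % NUM_AMPS

# Table B is already distributed by its primary index (dept_id).
table_b = [{"dept_id": d, "dept_name": f"dept-{d}"} for d in range(1, 6)]
b_on_amp = defaultdict(list)
for row in table_b:
    b_on_amp[amp_for(row["dept_id"])].append(row)

# Table A rows are redistributed by the hash of the foreign key (dept_id).
table_a = [{"emp_id": e, "dept_id": (e % 5) + 1} for e in range(10)]
a_on_amp = defaultdict(list)
for row in table_a:
    a_on_amp[amp_for(row["dept_id"])].append(row)

# Each AMP joins only its local rows; no cross-AMP lookups are needed.
joined = []
for amp in range(NUM_AMPS):
    local_b = {r["dept_id"]: r for r in b_on_amp[amp]}
    for a in a_on_amp[amp]:
        b = local_b.get(a["dept_id"])
        if b is not None:
            joined.append({**a, **b})
```

Every table A row finds its table B partner locally because both rows were placed using the hash of the same value.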
Join column hash code sequence is the result of a sort. The rows of one table (table A) are sorted by the row hash of the foreign key (join column). They are then matched in that sequence against the rows of the other table (table B) on the same AMP.
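
Matching in row hash sequence is essentially a merge join on one AMP. The sketch below assumes both inputs are already on the same AMP; merge_join is a hypothetical helper and CRC32 stands in for the real row hash.

```python
import zlib

def row_hash(value):
    """Stand-in hash (CRC32); Teradata's real hashing algorithm differs."""
    return zlib.crc32(str(value).encode("utf-8"))

def merge_join(a_rows, b_rows, a_col, b_col):
    """Match rows from both tables in row-hash order of their join columns."""
    a_sorted = sorted(a_rows, key=lambda r: row_hash(r[a_col]))
    b_sorted = sorted(b_rows, key=lambda r: row_hash(r[b_col]))
    out, j = [], 0
    for a in a_sorted:
        ha = row_hash(a[a_col])
        # Skip table B rows whose hash is smaller than the current A hash.
        while j < len(b_sorted) and row_hash(b_sorted[j][b_col]) < ha:
            j += 1
        # Scan the run of B rows sharing this hash.
        k = j
        while k < len(b_sorted) and row_hash(b_sorted[k][b_col]) == ha:
            if b_sorted[k][b_col] == a[a_col]:  # guard against hash collisions
                out.append((a, b_sorted[k]))
            k += 1
    return out

a_rows = [
    {"emp_id": 1, "dept_id": 10},
    {"emp_id": 2, "dept_id": 20},
    {"emp_id": 3, "dept_id": 10},
]
b_rows = [{"dept_id": 10, "name": "Sales"}, {"dept_id": 20, "name": "Ops"}]
pairs = merge_join(a_rows, b_rows, "dept_id", "dept_id")
```

Since both inputs are in the same hash order, each table is read front to back without jumping back past rows that have already been ruled out.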

PART 2
Teradata uses a hashing algorithm to distribute rows among its AMPs. This row distribution is the core of Teradata's parallel architecture. The diagram below illustrates the process.

[Figure: Row distribution in Teradata]

Teradata uses the index to determine the distribution of rows. A hashing algorithm processes the index value and produces a hash value. Part of the hash value identifies a hash bucket, and the hash map assigns that bucket to an AMP, which stores the row. The other AMPs receive their share of rows in the same way, so each row is stored on a specific AMP determined by its hash value. This is why columns with many unique values that are also used in joins are preferred as index columns. Because the work is spread across the AMPs and done in parallel, Teradata is fast; however, a request finishes only when the last AMP finishes, so Teradata is only as fast as its slowest AMP.
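
The chain from hash value to hash bucket to AMP, and the effect of index uniqueness on the distribution, can be sketched as follows. Everything here is an assumption made for illustration: SHA-256 truncated to 32 bits stands in for Teradata's hashing algorithm, the bucket is taken as the high-order 16 bits, and the hash map is a simple round-robin table over four AMPs.

```python
import hashlib
from collections import Counter

NUM_AMPS = 4          # assumed system size
NUM_BUCKETS = 65536   # assumption: 16-bit bucket number

# Hypothetical hash map: bucket number -> AMP number, assigned round-robin.
hash_map = [bucket % NUM_AMPS for bucket in range(NUM_BUCKETS)]

def amp_for(index_value):
    """Index value -> 32-bit hash value -> hash bucket -> AMP."""
    digest = hashlib.sha256(str(index_value).encode("utf-8")).digest()
    hash_value = int.from_bytes(digest[:4], "big")  # 32-bit stand-in hash
    bucket = hash_value >> 16                        # high-order 16 bits
    return hash_map[bucket]

def rows_per_amp(index_values):
    """Count how many rows land on each AMP for the given index values."""
    counts = Counter(amp_for(v) for v in index_values)
    return [counts.get(amp, 0) for amp in range(NUM_AMPS)]

unique_index = rows_per_amp(range(10000))        # e.g. an order id column
skewed_index = rows_per_amp(["M", "F"] * 5000)   # e.g. a two-valued column
```

With the unique index, the rows spread across all four AMPs. With the two-valued column, all 10000 rows land on at most two AMPs, so those AMPs carry the whole workload while the rest sit idle, which is the "slowest AMP" effect described above.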
