
COLLECT STATS

COLLECT STATS is one of the most useful utilities in Teradata.


The purpose of COLLECT STATISTICS is to gather and store demographic data for one or
more columns or indexes of a table or join index.
The optimizer uses this synopsis data to generate efficient table access and join plans.

The following statistics are collected:


The number of rows in the table
The average row size
Information on all indexes on which statistics were collected
The range of values for the column(s) on which statistics were collected
The number of rows per value for the column(s) on which statistics were collected
The number of NULLs for the column(s) on which statistics were collected

There is often confusion, however, between running COLLECT STATS at the column level
and at the table level.

Here I would like to explain the difference between the two scenarios with an
appropriate example.

Suppose we have a table and we want to collect statistics on 3 columns. We can do this with
the queries below:
collect stats on TABLE_NAME column(COL1);
collect stats on TABLE_NAME column(COL2);
collect stats on TABLE_NAME column(COL3);
As discussed above, stats can be collected at both the column level and the table level.
The other way of running COLLECT STATS on the same table is at the table level:


collect stats on TABLE_NAME;

The second query collects stats at the table level. Both approaches gather the same statistics, but
the table-level form cannot be used on its own.
If you collect stats at the table level, the stats must already be defined on the 3 columns
of the table mentioned above.
This can be done when the table is created. Below is an example:
CREATE TABLE Subject2 AS (
SELECT EMP_ID
, EMP_NAME
, salary
FROM Employees
)WITH DATA
PRIMARY INDEX (EMP_ID);
COLLECT STATISTICS Subject2 INDEX(EMP_ID);

If stats have not been defined on any columns beforehand, COLLECT STATS at the table level
returns an error message.
Table-level COLLECT STATS can be used only on a table that already has stats defined on it,
on any number of columns.
Once stats are defined on the columns, you can use table-level COLLECT STATS to refresh the
stats for all the defined columns.
If you use collect stats on TABLE_NAME column(COL1),
it will refresh the stats on the named column (COL1) only.
We can say that COLLECT STATS at the table level is just a shortcut for collecting stats on all
the columns on which we have already defined stats.
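
The difference between the two forms can be sketched as follows. This is only an illustration that reuses the placeholder names TABLE_NAME and COL1 to COL3 from the example above; it assumes the three column-level statements are run once to define the stats.

```sql
-- One-time definition: stats are defined (and collected) per column.
COLLECT STATS ON TABLE_NAME COLUMN (COL1);
COLLECT STATS ON TABLE_NAME COLUMN (COL2);
COLLECT STATS ON TABLE_NAME COLUMN (COL3);

-- Later refresh, option 1: re-collect only the stats on COL1.
COLLECT STATS ON TABLE_NAME COLUMN (COL1);

-- Later refresh, option 2: re-collect every stat already defined
-- on the table (COL1, COL2 and COL3 here) in a single statement.
COLLECT STATS ON TABLE_NAME;
```

Option 2 does not add stats on any new column; it only refreshes what was defined earlier.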

It saves the overhead of writing a COLLECT STATS query for each column every time we want
to gather statistics.
In our case there are only 3 columns whose statistics we want, but in a huge table there may be
more than 10-20 columns that need COLLECT STATS.
There, COLLECT STATS at the table level saves us a lot of typing. It's good practice to
perform a collect stats on the columns of even an empty table.

When data is later loaded into the table, a table-level collect stats can be used to collect all the
statistics without having to collect statistics on the individual columns.
If you want to see the stats that are defined on a particular table, you can use the
HELP command:
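
The empty-table workflow described above might look like the sketch below. The table Emp_Stage and its column types are hypothetical and chosen only for illustration; Employees is the source table from the earlier CREATE TABLE example.

```sql
-- Hypothetical target table; name and column types are assumptions.
CREATE TABLE Emp_Stage (
    EMP_ID   INTEGER,
    EMP_NAME VARCHAR(100),
    salary   DECIMAL(12,2)
) PRIMARY INDEX (EMP_ID);

-- Define the stats while the table is still empty.
COLLECT STATS ON Emp_Stage INDEX (EMP_ID);
COLLECT STATS ON Emp_Stage COLUMN (salary);

-- Load the data.
INSERT INTO Emp_Stage
SELECT EMP_ID, EMP_NAME, salary
FROM Employees;

-- Refresh all previously defined stats in one statement.
COLLECT STATS ON Emp_Stage;
```

Because the stats were defined up front, the load job needs only the final table-level statement, however many columns carry stats.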
help statistics subject1;
