
SQL Tips and Techniques

Paul Derouin
Learning Consultant
Teradata Learning
Table of Contents


Using CASE Statement
Random Sampling
Dynamic SQL
Join and Aggregate Indexes
Timestamp Applications
Performance Reminders
Summary

Using Union For Set Tagging
Show the name of manager 1019 and the names of his direct reports.

SELECT first_name
,last_name
, ' employee ' AS "Employee//Type"
FROM employee
WHERE manager_employee_number = 1019
UNION
SELECT first_name
,last_name
,' manager '
FROM employee
WHERE employee_number = 1019
ORDER BY 2;

Employee
first_name last_name Type
---------------------------- -------------------- --------------
Carol Kanieski employee
Ron Kubic manager
John Stein employee

Using Union For Set Tagging (Cont.)

SELECT first_name ,last_name, ' employee ' AS "Employee//Type"
FROM employee
WHERE manager_employee_number = 1019
UNION
SELECT first_name ,last_name ,' manager '
FROM employee
WHERE employee_number = 1019
ORDER BY 2;
3) We do an all-AMPs RETRIEVE step from CUSTOMER_SERVICE.employee by way of an all-rows
scan with a condition of ( "CUSTOMER_SERVICE.employee.manager_employee_number = 1019")
into Spool 1, which is redistributed by hash code to all AMPs. The size of Spool 1 is estimated with no
confidence to be 3 rows. The estimated time for this step is 0.16 seconds.
4) We do a single-AMP RETRIEVE step from CUSTOMER_SERVICE.employee by way of the unique
primary index "CUSTOMER_SERVICE.employee.employee_number = 1019" with no residual
conditions into Spool 1, which is redistributed by hash code to all AMPs. Then we do a SORT to order
Spool 1 by the sort key in spool field1 eliminating duplicate rows. The size of Spool 1 is estimated
with high confidence to be 2 to 26 rows. The estimated time for this step is 0.15 seconds.

The total estimated time is 0.31 seconds

Using CASE For Set Tagging
Show the name of manager 1019 and the names of his direct reports.

SELECT first_name
,last_name
,CASE WHEN manager_employee_number = 1019 THEN 'employee'
WHEN employee_number = 1019 THEN 'manager'
ELSE NULL END
FROM employee
WHERE employee_number = 1019
OR manager_employee_number = 1019;

first_name last_name <CASE expression>
-------------------------- -------------------- ------------------------------
Carol Kanieski employee
Ron Kubic manager
John Stein employee

Using CASE For Set Tagging
SELECT first_name ,last_name
,CASE WHEN manager_employee_number = 1019 THEN 'employee'
WHEN employee_number = 1019 THEN 'manager'
ELSE NULL END
FROM employee
WHERE employee_number = 1019
OR manager_employee_number = 1019;

3) We do an all-AMPs RETRIEVE step from CUSTOMER_SERVICE.employee by way of an all-rows
scan with a condition of ( "(CUSTOMER_SERVICE.employee.employee_number = 1019) OR
(CUSTOMER_SERVICE.employee.manager_employee_number = 1019)") into Spool 1, which is
built locally on the AMPs. The size of Spool 1 is estimated with no confidence to be 4 rows. The
estimated time for this step is 0.15 seconds.
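The total estimated time is 0.15 seconds, versus 0.31 seconds for the UNION version.

Using CASE requires only a single table scan; the UNION version needs two retrieve steps plus a sort to eliminate duplicate rows.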
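The equivalence of the two tagging queries can be checked outside Teradata. The sketch below is only a stand-in: it runs both the UNION form and the CASE form through Python's built-in sqlite3 module, not Teradata, and apart from the names of manager 1019 and his two direct reports, every row and employee number in it is hypothetical.

```python
import sqlite3

# Hypothetical employee rows: (employee_number, first_name, last_name,
# manager_employee_number). Only the names of manager 1019 and his two
# direct reports come from the slides; the other numbers are invented.
EMPLOYEES = [
    (1019, "Ron", "Kubic", 801),
    (1006, "John", "Stein", 1019),
    (1008, "Carol", "Kanieski", 1019),
    (1010, "Pat", "Example", 1003),  # unrelated to manager 1019
]

UNION_SQL = """
SELECT first_name, last_name, 'employee' AS emp_type
FROM employee WHERE manager_employee_number = 1019
UNION
SELECT first_name, last_name, 'manager'
FROM employee WHERE employee_number = 1019
ORDER BY 2
"""

CASE_SQL = """
SELECT first_name, last_name,
       CASE WHEN manager_employee_number = 1019 THEN 'employee'
            WHEN employee_number = 1019 THEN 'manager'
            ELSE NULL END AS emp_type
FROM employee
WHERE employee_number = 1019 OR manager_employee_number = 1019
ORDER BY 2
"""


def tagged_rows(sql):
    """Load the sample rows into an in-memory SQLite database and run `sql`."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employee (employee_number INTEGER PRIMARY KEY, "
        "first_name TEXT, last_name TEXT, manager_employee_number INTEGER)"
    )
    conn.executemany("INSERT INTO employee VALUES (?, ?, ?, ?)", EMPLOYEES)
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows


print(tagged_rows(UNION_SQL))
print(tagged_rows(CASE_SQL))
```

Both calls return the same three tagged rows. The CASE version here carries ORDER BY 2 only so the two lists can be compared directly; the slide's CASE query has no ORDER BY, so its row order is not guaranteed.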

Reporting By Day of Week
Show the sales figures by day of week as seen below.
Day of
Week Sales
---------------- ----------
Sunday 2950.00
Monday 2200.00
Tuesday 2000.00
Wednesday 2100.00
Thursday 2000.00
Friday 2450.00
Saturday 3250.00

Useful for business reporting

The Teradata system calendar (sys_calendar.calendar) provides the day of week only as a number (1 = Sunday through 7 = Saturday)
Using it requires a join to the system calendar

Creating a Day of Week Table

CREATE TABLE day_of_week
(numeric_day BYTEINT
,char_day CHAR(9)
)
UNIQUE PRIMARY INDEX (numeric_day);

INSERT INTO day_of_week VALUES (1, 'Sunday');
INSERT INTO day_of_week VALUES (2, 'Monday');
INSERT INTO day_of_week VALUES (3, 'Tuesday');
INSERT INTO day_of_week VALUES (4, 'Wednesday');
INSERT INTO day_of_week VALUES (5, 'Thursday');
INSERT INTO day_of_week VALUES (6, 'Friday');
INSERT INTO day_of_week VALUES (7, 'Saturday');

Using a Day Of Week Table
Show the sales figures by day of week.
SELECT dw.char_day "Day of// Week"
,SUM(ds.sales) AS Sales
FROM daily_sales ds
,sys_calendar.calendar sc
,day_of_week dw
WHERE sc.calendar_date = ds.salesdate
AND sc.day_of_week = dw.numeric_day
GROUP BY 1, dw.numeric_day
ORDER BY dw.numeric_day;

Day of
Week Sales
---------------- ----------
Sunday 2950.00
Monday 2200.00
Tuesday 2000.00
Wednesday 2100.00
Thursday 2000.00
Friday 2450.00
Saturday 3250.00

• Requires joining three tables using two join conditions
• The day_of_week table has only seven rows
Total cost of this query is approximately .47 seconds

Using CASE Statement
SELECT CASE sc.day_of_week
WHEN 1 THEN 'Sunday'
WHEN 2 THEN 'Monday'
WHEN 3 THEN 'Tuesday'
WHEN 4 THEN 'Wednesday'
WHEN 5 THEN 'Thursday'
WHEN 6 THEN 'Friday'
WHEN 7 THEN 'Saturday'
ELSE 'Not Found' END
AS "Day of// Week"
,SUM(ds.sales) AS Sales
FROM daily_sales ds ,sys_calendar.calendar sc
WHERE sc.calendar_date = ds.salesdate
GROUP BY 1, sc.day_of_week
ORDER BY sc.day_of_week;

Same result:
Day of
Week Sales
---------------- ----------
Sunday 2950.00
Monday 2200.00
Tuesday 2000.00
Wednesday 2100.00
Thursday 2000.00
Friday 2450.00
Saturday 3250.00

Requires joining only two tables using one join condition
Total cost of this query is approximately .35 seconds
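To confirm that the CASE expression and the lookup table give the same answer, the sketch below runs both forms through Python's sqlite3 module. SQLite has no sys_calendar, so as an assumption it derives the day number with strftime('%w', ...) (0 = Sunday) plus 1, matching the 1 = Sunday numbering of the day_of_week table; the sales rows are hypothetical.

```python
import sqlite3

# Hypothetical daily_sales rows; 2000-01-01 was a Saturday.
DAILY_SALES = [
    ("2000-01-02", 100.0),  # Sunday
    ("2000-01-03", 50.0),   # Monday
    ("2000-01-08", 75.0),   # Saturday
    ("2000-01-09", 25.0),   # Sunday
]
DAY_NAMES = ["Sunday", "Monday", "Tuesday", "Wednesday",
             "Thursday", "Friday", "Saturday"]

# Stand-in for sys_calendar.calendar.day_of_week: 1 = Sunday ... 7 = Saturday.
DAY_NUM = "CAST(strftime('%w', salesdate) AS INTEGER) + 1"

LOOKUP_SQL = f"""
SELECT dw.char_day, SUM(ds.sales)
FROM (SELECT {DAY_NUM} AS day_num, sales FROM daily_sales) ds
JOIN day_of_week dw ON ds.day_num = dw.numeric_day
GROUP BY dw.numeric_day, dw.char_day
ORDER BY dw.numeric_day
"""

CASE_SQL = f"""
SELECT CASE day_num
         WHEN 1 THEN 'Sunday'    WHEN 2 THEN 'Monday'
         WHEN 3 THEN 'Tuesday'   WHEN 4 THEN 'Wednesday'
         WHEN 5 THEN 'Thursday'  WHEN 6 THEN 'Friday'
         WHEN 7 THEN 'Saturday'  ELSE 'Not Found' END AS day_name,
       SUM(sales)
FROM (SELECT {DAY_NUM} AS day_num, sales FROM daily_sales)
GROUP BY day_num
ORDER BY day_num
"""


def run(sql):
    """Build both tables in memory and return the rows produced by `sql`."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE daily_sales (salesdate TEXT, sales REAL)")
    conn.executemany("INSERT INTO daily_sales VALUES (?, ?)", DAILY_SALES)
    conn.execute("CREATE TABLE day_of_week (numeric_day INTEGER PRIMARY KEY, char_day TEXT)")
    conn.executemany("INSERT INTO day_of_week VALUES (?, ?)",
                     list(enumerate(DAY_NAMES, start=1)))
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows


print(run(LOOKUP_SQL))
print(run(CASE_SQL))
```

The CASE form drops the join to the seven-row lookup table, which is the saving the slide measures.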

RANDOM Function
The RANDOM function may be used to generate a random integer within a
specified range.

RANDOM (Lower limit, Upper limit) returns a random number between the lower and
upper limits inclusive. Both limits must be specified.

Example: Assign a random number between 1 and 9 to each department.

SELECT department_number, RANDOM(1,9) FROM department;

department_number Random(1,9)
----------------- -----------
501 2
301 6
201 3
600 7
100 3
402 2
403 1
302 5
401 1

Note that random numbers can repeat. The RANDOM function is evaluated for each row
processed, so duplicate random values are possible.

Duplicate RANDOM Values
Duplicate value likelihood may be reduced by increasing the size of the RANDOM
interval relative to the size of the table.

Example: Assign a random number between 1 and 100 to each department.

SELECT department_number
, RANDOM(1,100)
FROM department;
department_number Random(1,100)
----------------- -------------
501 15
301 19
201 71
600 75
100 61
402 41
403 81
302 31
401 59

No duplicates were generated in this run. Because the pool of possible values is more
than ten times the number of rows to be assigned, duplicates are less likely, though
still possible.

Duplicate RANDOM Values (cont'd)
Conversely, duplicate random values become more frequent as the size of the RANDOM
interval decreases relative to the size of the table.

Example: Assign a random number between 1 and 3 to each department.

SELECT department_number, RANDOM(1,3) FROM department;


department_number Random(1,3)
----------------- -----------
501 2
301 3
201 3
600 1
100 3
402 2
403 1
302 2
401 1

With only three values to distribute over nine rows, duplicates are necessary.
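How likely duplicates are for a given interval can be estimated with the birthday-problem formula. The sketch below assumes, as an idealization not stated in the slides, that each row's RANDOM value is an independent, uniformly distributed integer. It computes the chance that nine rows share at least one value for the three ranges used above.

```python
from math import prod


def duplicate_probability(rows: int, pool: int) -> float:
    """Probability that at least two of `rows` independent, uniform draws
    from `pool` equally likely values are equal (the birthday problem)."""
    if rows > pool:
        return 1.0  # pigeonhole principle: a repeat is unavoidable
    p_all_distinct = prod((pool - i) / pool for i in range(rows))
    return 1.0 - p_all_distinct


for pool in (9, 100, 3):
    print(f"RANDOM(1,{pool}) over 9 rows: {duplicate_probability(9, pool):.3f}")
```

Even with RANDOM(1,100) there is roughly a 31% chance of at least one repeat among nine rows, so the duplicate-free RANDOM(1,100) result was partly luck.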

RANDOM Sampling
Consider the following distribution of employee salaries.

Salary Range Count
------------- -------
$0 to < $30K 6
$30K to < $40K 9
$40K to < $50K 4
$50K + 7

Problem: Select a sample representing two thirds of the employees making
under $30,000. Use the RANDOM function to accomplish this.

SELECT employee_number
, salary_amount
FROM employee
WHERE (salary_amount < 30000
AND RANDOM(1,3) < 3);

employee_number salary_amount
--------------- -------------
1006 29450.00
1023 26500.00
1013 24500.00

Because each qualifying row is kept only when its own RANDOM(1,3) value is 1 or 2, the
number of rows returned varies from run to run. This run produced a 50% sample (3 out of 6)
instead of a 67% sample (4 out of 6).
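Treating each row's RANDOM(1,3) < 3 test as an independent two-in-three chance (an idealizing assumption), the number of rows returned out of six follows a binomial distribution. The sketch below computes that distribution with Python's standard library.

```python
from math import comb


def prob_rows_returned(k: int, n: int, p: float) -> float:
    """Binomial probability that exactly k of n rows pass a filter that
    keeps each row independently with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


P_KEEP = 2 / 3  # RANDOM(1,3) < 3 keeps the values 1 and 2 out of {1, 2, 3}
for k in range(7):
    print(k, f"{prob_rows_returned(k, 6, P_KEEP):.3f}")
```

Four rows is the most likely outcome, yet it occurs in only about a third of runs (240/729, roughly 0.329); three rows, as in the result above, has a probability of about 0.219.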

Using The SAMPLE Function
A sample of a single group can also be generated, with more accuracy, using
the SAMPLE function.

Solution 2:

SELECT employee_number
, salary_amount
FROM employee
WHERE salary_amount < 30000
SAMPLE .67;

employee_number salary_amount
--------------- -------------
1006 29450.00
1023 26500.00
1008 29250.00
1014 24500.00

4 out of 6 employees represents a 67% sample.
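The difference from the RANDOM approach is that SAMPLE .67 asks for a fixed share of the qualifying rows. The sketch below mimics that with Python's random.sample, drawing without replacement. Converting the fraction to a row count by rounding (6 × .67 = 4.02, so 4 rows) is an assumption, not documented Teradata behaviour. The six employee numbers are the ones that appear in the under-$30,000 results in this deck.

```python
import random


def sample_fraction(rows, fraction, rng=random):
    """Return a fixed-size random subset of `rows`, drawn without replacement.

    The size is round(len(rows) * fraction); this rounding rule is an
    assumption made for illustration.
    """
    size = round(len(rows) * fraction)
    return rng.sample(rows, size)


# Employee numbers under $30,000 that appear in this deck's result sets.
UNDER_30K = [1001, 1006, 1008, 1013, 1014, 1023]

picked = sample_fraction(UNDER_30K, 0.67)
print(len(picked), sorted(picked))  # always 4 rows; which 4 varies
```

Unlike the RANDOM filter, the size of the result does not vary between runs; only the choice of rows does.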

SAMPLE Function For Multiple Samples
Permits use of fractions (such as .25) or row counts.
Sampling is without replacement: rows used in one sample set are not reused in subsequent sample sets.

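SELECT department_number
,SAMPLEID
FROM department
SAMPLE .25, .25, .50
ORDER BY SAMPLEID;

department_number SampleId
----------------- --------
301 1
403 1
302 2
401 2
100 3
402 3
201 3
600 3
501 3

SELECT department_number
,SAMPLEID
FROM department
SAMPLE 3, 5, 8
ORDER BY SAMPLEID;

department_number SampleId
----------------- --------
301 1
403 1
302 1
401 2
100 2
402 2
201 2
501 2
600 3

The department table has only nine rows, so after the first two samples take 3 and 5 rows,
the third sample (8 rows requested) receives the single remaining row.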
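The without-replacement rule can be sketched as shuffling the rows once and handing out consecutive slices, one per requested sample. This Python sketch illustrates only that rule, not Teradata's actual allocation algorithm (which may allocate rows proportionally across AMPs). It reproduces the 3/5/1 split seen for SAMPLE 3, 5, 8 on the nine-row department table.

```python
import random


def multi_sample(rows, counts, rng=random):
    """Assign rows to sample ids 1, 2, ... without replacement.

    Rows are shuffled once and handed out in consecutive slices, so a row
    used by one sample is never reused by a later one; a later sample may
    come back short when the rows run out.
    """
    pool = list(rows)
    rng.shuffle(pool)
    assignments = []
    start = 0
    for sample_id, count in enumerate(counts, start=1):
        chosen = pool[start:start + count]
        assignments.extend((row, sample_id) for row in chosen)
        start += len(chosen)
    return assignments


DEPARTMENTS = [501, 301, 201, 600, 100, 402, 403, 302, 401]
result = multi_sample(DEPARTMENTS, [3, 5, 8])
sizes = [sum(1 for _, sid in result if sid == i) for i in (1, 2, 3)]
print(sizes)  # [3, 5, 1]
```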

Complex RANDOM Sampling
The RANDOM function can be used multiple times in the same SELECT statement. It
can be used to produce multiple samples, each using separate criteria.

Example: Create a sample consisting of approximately 67% from each of the under
$50,000 salary ranges.
SELECT employee_number, salary_amount
FROM employee
WHERE (salary_amount < 30000
AND RANDOM(1,3) < 3)
OR (salary_amount BETWEEN 30001
AND 40000 AND RANDOM(1,3) < 3)
OR (salary_amount BETWEEN 40001
AND 50000 AND RANDOM(1,3) < 3)
ORDER BY 2;

employee_number salary_amount
--------------- -------------
1014 24500.00
1001 25525.00
1023 26500.00
1009 31000.00
1005 31200.00
1004 36300.00
1003 37850.00
1021 38750.00
1020 39500.00
1002 43100.00
1024 43700.00
1010 46000.00
1007 49700.00

The result shows the following distribution:
Under $30,000: 3 out of 6 (50%)
Between $30,000 and $39,999: 6 out of 9 (67%)
Between $40,000 and $49,999: 4 out of 4 (100%)

Complex RANDOM Sampling (cont'd)
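Changing the size of the RANDOM range can change the size of the returned sample, although here the selection probability barely moves (67 of 100 values instead of 2 of 3); much of the difference comes from run-to-run variation.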

Example: Perform the same query but change the size of the RANDOM range to 100.

SELECT employee_number, salary_amount
FROM employee
WHERE (salary_amount < 30000
AND RANDOM(1,100) < 68)
OR (salary_amount BETWEEN 30001 AND 40000
AND RANDOM(1,100) < 68)
OR (salary_amount BETWEEN 40001 AND 50000
AND RANDOM(1,100) < 68)
ORDER BY 2;

employee_number salary_amount
--------------- -------------
1013 24500.00
1023 26500.00
1005 31200.00
1022 32300.00
1004 36300.00
1003 37850.00
1007 49700.00

This result shows the following distribution:
Under $30,000: 2 out of 6 (33%)
Between $30,000 and $39,999: 4 out of 9 (44%)
Between $40,000 and $49,999: 1 out of 4 (25%)

Sample Sizing Issues
The larger the pool of rows to be drawn from, the closer one can get to achieving a
specific percentage of rows in the sample.
SEL COUNT(*) FROM agent_sales
WHERE (sales_amt BETWEEN 20000 and 39999);
Returns 100 rows exactly
Each of the following examples attempts to return a 50% sample of the target rows.

SEL COUNT(*) FROM agent_sales


WHERE (sales_amt BETWEEN 20000 and 39999) AND RANDOM(1,100) < 51;

Returns 58 rows or 58%

SELECT COUNT(*) FROM agent_sales


WHERE (sales_amt BETWEEN 20000 and 39999) AND RANDOM(1,10) < 6;

Returns 53 rows or 53%

SELECT COUNT(*) FROM agent_sales


WHERE (sales_amt BETWEEN 20000 and 39999) AND RANDOM(1,4) < 3;

Returns 50 rows or 50%

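All three predicates keep a row with probability exactly 1/2 (50 of 100, 5 of 10 and 2 of 4 values),
so the spread between 58, 53 and 50 rows reflects run-to-run random variation rather than the size
of the RANDOM range. A larger pool of rows, not a smaller RANDOM range, is what brings the sample
closer to the target percentage.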
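The selection probability of each predicate, and how far the row count can be expected to wander, can be worked out directly. This sketch assumes RANDOM(1, n) returns each integer from 1 to n with equal probability, independently per row. Under that assumption the count returned from the 100 target rows is binomial, with a standard deviation of 5 rows for every one of the three predicates.

```python
from fractions import Fraction
from math import sqrt


def keep_probability(upper: int, threshold: int) -> Fraction:
    """Probability that RANDOM(1, upper) < threshold, assuming each integer
    1..upper is equally likely."""
    return Fraction(threshold - 1, upper)


POOL = 100  # rows with sales_amt BETWEEN 20000 AND 39999
PREDICATES = {
    "RANDOM(1,100) < 51": (100, 51),
    "RANDOM(1,10) < 6": (10, 6),
    "RANDOM(1,4) < 3": (4, 3),
}

for text, (upper, threshold) in PREDICATES.items():
    p = keep_probability(upper, threshold)
    expected = POOL * p
    sd = sqrt(float(POOL * p * (1 - p)))  # binomial standard deviation
    print(f"{text}: p = {p}, expected rows = {expected}, sd = {sd:.1f}")
```

A single run returning 58 rows is 1.6 standard deviations above the expected 50, which is unremarkable.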
Limitations On Use Of RANDOM

RANDOM is non-ANSI standard

RANDOM may be used in a SELECT list or a WHERE clause, but not both

RANDOM may be used when updating, inserting or deleting rows

RANDOM may not be used with aggregate or OLAP functions

RANDOM cannot be referenced by numeric position in a GROUP BY or ORDER BY clause

V2R5 Sampling Features
Before V2R5:
Sampling without replacement
Proportional allocation
- each AMP provides the same proportion of sample rows.

With V2R5:
Sampling with or without replacement (user choice)
Proportional allocation
- each AMP provides the same proportion of sample rows.
Randomized allocation
- randomized across the system, not AMP-proportional.

Dynamic SQL and Static SQL
Dynamic SQL
- technique for generating and executing SQL commands dynamically from a
stored procedure at runtime.

Static SQL
- pre-constructed SQL compiled into the stored procedure.
- may be parameterized.
- still optimized prior to each execution
Static SQL Example
REPLACE PROCEDURE static_sql (IN sal DEC(9,2)
,IN emp_num INT)
BEGIN
UPDATE emp1
SET salary_amount = :sal
WHERE employee_number = :emp_num;
END;

CALL static_sql(50000, 1018);

Dynamic SQL (cont'd)
Dynamic SQL Example

REPLACE PROCEDURE dyn_sql (IN col1 CHAR(15)
,IN val1 CHAR(10)
,IN emp_num CHAR(8))
BEGIN
CALL DBC.SysExecSQL('UPDATE emp1 SET ' || :col1 || ' = ' || :val1 ||
' WHERE employee_number = ' || :emp_num);
END;
END;

CALL dyn_sql('salary_amount','50000','1018');
/* Updates employee 1018 salary_amount to $50,000 */

CALL dyn_sql('job_code','567890','1018');
/* Updates employee 1018 job_code to 567890 */

Dynamic SQL
- Constructed as a concatenated character string

- Passed to DBC.SysExecSQL for execution

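- May fail with run-time errors: mistakes in the generated statement (such as a misspelled column name) are detected only when it executes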
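The static/dynamic distinction can be sketched outside Teradata with Python's sqlite3 module (a stand-in; DBC.SysExecSQL is not involved). The static version's statement text never changes, while the dynamic version splices the column name into the text at run time. Checking that name against a fixed list of allowed columns is an addition made here, not part of the slide's dyn_sql, which concatenates whatever text it is passed. The emp1 row is hypothetical.

```python
import sqlite3

# Columns the dynamic version may update. This allow-list is an added
# safeguard for the sketch; the slide's dyn_sql has no such check.
ALLOWED_COLUMNS = {"salary_amount", "job_code"}


def static_update(conn, salary, emp_num):
    """Static SQL: fixed statement text, only the values are parameters."""
    conn.execute(
        "UPDATE emp1 SET salary_amount = ? WHERE employee_number = ?",
        (salary, emp_num),
    )


def dynamic_update(conn, column, value, emp_num):
    """Dynamic SQL: the column name is concatenated into the statement text
    at run time, so it is validated first; the values stay bound parameters."""
    if column not in ALLOWED_COLUMNS:
        raise ValueError(f"column not allowed: {column!r}")
    sql = "UPDATE emp1 SET " + column + " = ? WHERE employee_number = ?"
    conn.execute(sql, (value, emp_num))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE emp1 (employee_number INTEGER PRIMARY KEY, "
    "salary_amount REAL, job_code INTEGER)"
)
conn.execute("INSERT INTO emp1 VALUES (1018, 40000, 100000)")  # hypothetical row

static_update(conn, 45000, 1018)
dynamic_update(conn, "salary_amount", 50000, 1018)
dynamic_update(conn, "job_code", 567890, 1018)
print(conn.execute("SELECT * FROM emp1").fetchall())  # [(1018, 50000.0, 567890)]
```

Because only the column name is spliced in, a misspelled name that passed the check would still surface as an error only when the statement runs, which is the run-time risk the slide mentions.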

Dynamic SQL (cont'd)
The following are restrictions on the use of Dynamic SQL within stored procedures:

Restrictions
The creating user must also be the owner of the procedure in order to have
the right to use dynamic SQL.

The size of the SQL command string cannot exceed 32,000 characters.

Multi-statement requests are not supported.

The ending semicolon on the SQL command is optional.

The following SQL statements cannot be used as dynamic SQL in stored procedures:

- CALL
- CREATE PROCEDURE
- DATABASE
- EXPLAIN
- HELP
- REPLACE PROCEDURE
- SHOW
- SELECT
- SELECT INTO
- SET SESSION ACCOUNT
- SET SESSION COLLATION
- SET SESSION DATEFORM
- SET TIME ZONE
Join Indexes
A Join Index is an optional index which may be created by the
user for one of the following three purposes:
− Pre-join multiple tables (Multi-table Join Index)
− Distribute the rows of a single table on the hash value of a
foreign key value (Single-table Join Index)
− Aggregate one or more columns of a single table or multiple
tables into a summary table (Aggregate Join Index)
If possible, the optimizer will use a Join Index rather than access
tables directly
This typically results in much better performance
Join Indexes are automatically updated as the table rows are
updated
A Join Index may not be accessed directly
It is an option which the optimizer may choose if the index ‘covers’
the query, that is, contains all the columns the query references
Customer and Order Tables
CREATE TABLE customer
( cust_id INTEGER NOT NULL,
cust_name CHAR(15),
cust_addr CHAR(25) )UNIQUE PRIMARY INDEX ( cust_id );

CREATE TABLE orders
( order_id INTEGER NOT NULL,
order_date DATE FORMAT 'yyyy-mm-dd',
cust_id INTEGER,
order_status CHAR(1)) UNIQUE PRIMARY INDEX ( order_id );

Relationship between the two tables (CUSTOMERS and ORDERS):
49 valid customers have orders.
1 valid customer has no orders.
1 order has an invalid customer.

Single Table Query
How many orders have assigned customers?

SELECT COUNT(order_id) FROM orders
WHERE cust_id IS NOT NULL;

Count(order_id)
----------------------
50

The 50 orders are the 49 orders of valid customers plus the 1 order with an invalid customer.

A join index will not help this query
The table ‘orders’ alone covers the query

Will Join Index Help?
How many orders have assigned valid customers?

SELECT COUNT(o.order_id) FROM customer c INNER JOIN orders o
ON c.cust_id = o.cust_id;

Count(order_id)
------------------------
49

A join index can help this query
Two tables are needed to cover the query

Query cost: .39 secs

Creating a Join Index
CREATE JOIN INDEX cust_ord_ix AS
SELECT (c.cust_id, cust_name),(order_id, order_status, order_date)
FROM customer c, orders o
WHERE c.cust_id = o.cust_id
PRIMARY INDEX (cust_id);

Fixed Portion          Variable Portion
CUST_ID  CUST_NAME     ORDER_ID  ORDER_STATUS  ORDER_DATE
1001     ABC Corp      501       C             990120
                       502       C             990220
                       503       C             990320
                       504       C             990420
                       505       C             990520
                       506       C             990620
1002     BCD Corp      507       C             990122
                       508       C             990222
                       509       C             990322
:        :             :

With Join Index
How many orders have assigned valid customers?

SELECT COUNT(o.order_id) FROM customer c INNER JOIN orders o
ON c.cust_id = o.cust_id;

Count(order_id)
------------------------
49

Same SQL query
Optimizer picks Join Index rather than doing a join
Join Index covers query

Without Join Index .39 secs
With Join Index .17 secs
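SQLite has no join indexes, but the idea can be approximated to check the counts on these slides. The sketch below uses Python's sqlite3 module and hypothetical rows shaped like the diagram (customers 1 to 50, one order each for customers 1 to 49, and one order for a nonexistent customer 999). A pre-joined table stands in for cust_ord_ix; unlike a Teradata join index, it is not maintained automatically, and the query has to name it explicitly instead of the optimizer choosing it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, cust_name TEXT)")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, cust_id INTEGER)")
# Hypothetical rows shaped like the diagram: customers 1-50, one order each for
# customers 1-49, and one order (600) whose customer 999 does not exist.
conn.executemany("INSERT INTO customer VALUES (?, ?)",
                 [(i, f"Customer {i}") for i in range(1, 51)])
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(500 + i, i) for i in range(1, 50)] + [(600, 999)])

orders_with_cust = conn.execute(
    "SELECT COUNT(order_id) FROM orders WHERE cust_id IS NOT NULL").fetchone()[0]
valid_orders = conn.execute(
    "SELECT COUNT(o.order_id) FROM customer c "
    "INNER JOIN orders o ON c.cust_id = o.cust_id").fetchone()[0]

# Stand-in for cust_ord_ix: the join is done once and stored. SQLite will not
# keep it up to date or substitute it automatically; the query must name it.
conn.execute("CREATE TABLE cust_ord AS "
             "SELECT c.cust_id, c.cust_name, o.order_id "
             "FROM customer c INNER JOIN orders o ON c.cust_id = o.cust_id")
prejoined = conn.execute("SELECT COUNT(order_id) FROM cust_ord").fetchone()[0]

print(orders_with_cust, valid_orders, prejoined)  # 50 49 49
```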

Join Index Coverage
How many valid customers have assigned orders in January 1999?

SELECT COUNT(C.CUST_ID) FROM customer c INNER JOIN orders o
ON c.cust_id = o.cust_id
WHERE o.order_date BETWEEN 990101 AND 990131;

Count(cust_id)
----------------------
9

Order_date is part of Join Index (see the cust_ord_ix layout above)
Join Index covers query
Optimizer picks Join Index

Without Join Index .40 secs
With Join Index .17 secs

Join Index Comparison
Which valid customers have open orders in January 1999?

SELECT c.cust_name FROM customer c INNER JOIN orders o
ON c.cust_id = o.cust_id
WHERE o.order_date BETWEEN 990101 and 990131
AND o.order_status = 'O';

cust_name
----------------
JKL Corp

All referenced columns are part of the join index (see the cust_ord_ix layout above)
Join Index covers query
Optimizer picks Join Index

Without Join Index .23 secs
With Join Index .15 secs

Aggregate Join Indexes
Aggregate Join Indexes are:
• Designed for queries which use counts, sums and averages
• Aggregated extracts of data, optionally grouped by month or year
• An alternative to summary tables
• Automatically updated as base tables change
• An option for the optimizer when the index covers the query
• Not compatible with MultiLoad or FastLoad

Traditional Aggregation
SELECT EXTRACT(YEAR FROM salesdate) AS Yr
, EXTRACT(MONTH FROM salesdate) AS Mon
, SUM(sales)
FROM daily_sales
WHERE itemid = 10 AND Yr IN (1997, 1998)
GROUP BY 1,2
ORDER BY 1,2;

Yr Mon Sum(sales)
----------- ----------- --------------
1997 1 2150.00
1997 2 1950.00
1997 8 1950.00
1997 9 2100.00
1998 1 1950.00
1998 2 2100.00
1998 8 2200.00
1998 9 2550.00

Explanation
--------------------------------------------------------------------------
1) First, we do a SUM step to aggregate from PED1.daily_sales by
way of the primary index "PED1.daily_sales.itemid = 10" with a
residual condition of ("((EXTRACT(YEAR FROM
(PED1.daily_sales.salesdate )))= 1997) OR ((EXTRACT(YEAR
FROM (PED1.daily_sales.salesdate )))= 1998)"), and the grouping
identifier in field 1. Aggregate Intermediate Results are
computed locally, then placed in Spool 2. The size of Spool 2 is
estimated with high confidence to be 1 to 1 rows.

Creating An Aggregate Index
CREATE SET TABLE daily_sales ,NO FALLBACK
(
itemid INTEGER,
salesdate DATE FORMAT 'YY/MM/DD',
sales DECIMAL(9,2))
PRIMARY INDEX ( itemid );

CREATE JOIN INDEX monthly_sales AS
SELECT itemid AS Item
,EXTRACT(YEAR FROM salesdate) AS Yr
,EXTRACT(MONTH FROM salesdate) AS Mon
,SUM(sales) AS SumSales
FROM daily_sales
GROUP BY 1,2,3;

Query Using Aggregate Index
SELECT EXTRACT(YEAR FROM salesdate) AS Yr
, EXTRACT(MONTH FROM salesdate) AS Mon
, SUM(sales)
FROM daily_sales
WHERE itemid = 10 AND Yr IN (1997, 1998)
GROUP BY 1,2
ORDER BY 1,2;

Yr Mon Sum(sales)
------------ ----------- --------------
1997 1 2150.00
1997 2 1950.00
1997 8 1950.00
1997 9 2100.00
1998 1 1950.00
1998 2 2100.00
1998 8 2200.00
1998 9 2550.00

Explanation
-----------------------------------------------------------------------
1) First, we do a SUM step to aggregate from join index table
PED1.monthly_sales by way of the primary index
"PED1.monthly_sales.Item = 10", and the grouping identifier in
field 1. Aggregate Intermediate Results are computed locally,
then placed in Spool 2. The size of Spool 2 is estimated with low
confidence to be 4 to 4 rows.
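The optimizer can answer this query from monthly_sales because monthly sums can be re-summed into any coarser grouping the query asks for. The sketch below checks that with Python's sqlite3 module, using a plain summary table built once in place of the aggregate join index (SQLite has neither join indexes nor their automatic maintenance). The daily_sales rows are hypothetical, and strftime stands in for EXTRACT.

```python
import sqlite3

# Hypothetical daily_sales rows: (itemid, salesdate, sales).
DAILY_SALES = [
    (10, "1997-01-05", 1000.0),
    (10, "1997-01-20", 1150.0),
    (10, "1997-02-03", 1950.0),
    (10, "1998-01-11", 1950.0),
    (10, "1996-12-31", 700.0),   # outside the requested years
    (11, "1997-01-05", 500.0),   # a different item
]

YEAR = "CAST(strftime('%Y', salesdate) AS INTEGER)"
MONTH = "CAST(strftime('%m', salesdate) AS INTEGER)"

# Query against the base table, as on the "Traditional Aggregation" slide.
BASE_QUERY = f"""
SELECT {YEAR} AS yr, {MONTH} AS mon, SUM(sales)
FROM daily_sales
WHERE itemid = 10 AND {YEAR} IN (1997, 1998)
GROUP BY 1, 2
ORDER BY 1, 2
"""

# The same question answered from the pre-aggregated monthly table.
SUMMARY_QUERY = """
SELECT yr, mon, SUM(sum_sales)
FROM monthly_sales
WHERE item = 10 AND yr IN (1997, 1998)
GROUP BY yr, mon
ORDER BY yr, mon
"""


def build():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE daily_sales (itemid INTEGER, salesdate TEXT, sales REAL)")
    conn.executemany("INSERT INTO daily_sales VALUES (?, ?, ?)", DAILY_SALES)
    # Stand-in for CREATE JOIN INDEX monthly_sales: built once, not maintained.
    conn.execute(f"""
        CREATE TABLE monthly_sales AS
        SELECT itemid AS item, {YEAR} AS yr, {MONTH} AS mon, SUM(sales) AS sum_sales
        FROM daily_sales
        GROUP BY 1, 2, 3
    """)
    return conn


conn = build()
print(conn.execute(BASE_QUERY).fetchall())
print(conn.execute(SUMMARY_QUERY).fetchall())
```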

Join Index Summary
A Join Index:
Is a denormalization tool
Pre-joins existing tables
Aggregates existing columns
Can improve performance for covered queries
Can join more than two tables
Can use inner, outer and cross joins
Costs additional disk space
Costs additional maintenance processing for updates
Cannot be accessed directly by SQL
Is a choice for the optimizer

ANSI Timestamp
Timestamp combines date and time into a single column.

TIMESTAMP(n), where n (0 to 6) is the number of fractional-second digits

Consists of 6 fields of information
YEAR, MONTH, DAY, HOUR, MINUTE, SECOND
Internal format is DATE (4 bytes) + TIME (6 bytes) = 10 bytes

Timestamp representation Character conversion
TIMESTAMP(0) 2001-12-07 11:37:58 CHAR(19)
TIMESTAMP(6) 2001-12-07 11:37:58.213000 CHAR(26)

CREATE TABLE tblb (tmstampb TIMESTAMP);


INSERT INTO tblb (CURRENT_TIMESTAMP);
SELECT * FROM tblb;

tmstampb
---------------------------------------
2001-11-06 13:48:38.580000

Timestamp + Interval
A timestamp may be combined with any year-month or day-time interval to produce a new timestamp:

TIMESTAMP + or - interval = TIMESTAMP

where the interval may be YEAR, YEAR TO MONTH, MONTH, DAY, DAY TO HOUR, DAY TO MINUTE,
DAY TO SECOND, HOUR, HOUR TO MINUTE, MINUTE, MINUTE TO SECOND or SECOND.
Subtract 2 yrs and 6 mos from the designated timestamp:
SELECT TIMESTAMP '1999-10-01 09:30:22'
- INTERVAL '2-06' YEAR TO MONTH;
1997-04-01 09:30:22

Subtract 1 hr, 20 mins and 10 secs from designated timestamp:


SELECT TIMESTAMP '1999-10-01 09:30:22'
- INTERVAL '01:20:10' HOUR TO SECOND;
1999-10-01 08:10:12
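Both results can be reproduced with Python's datetime module. The hour-to-second part maps directly onto a timedelta, but datetime has no month interval, so the sketch adds a small helper for the year-month part. The helper's name, and its choice to raise an error when the target day does not exist (for example, 31 January plus one month), are assumptions of this sketch.

```python
from datetime import datetime, timedelta


def add_months(ts: datetime, months: int) -> datetime:
    """Shift `ts` by a whole number of months (negative moves backwards).

    datetime.replace raises ValueError if the resulting day does not exist,
    e.g. 31 January shifted by one month.
    """
    total = ts.year * 12 + (ts.month - 1) + months
    return ts.replace(year=total // 12, month=total % 12 + 1)


START = datetime(1999, 10, 1, 9, 30, 22)

# TIMESTAMP '1999-10-01 09:30:22' - INTERVAL '2-06' YEAR TO MONTH
print(add_months(START, -(2 * 12 + 6)))  # 1997-04-01 09:30:22

# TIMESTAMP '1999-10-01 09:30:22' - INTERVAL '01:20:10' HOUR TO SECOND
print(START - timedelta(hours=1, minutes=20, seconds=10))  # 1999-10-01 08:10:12
```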

Timestamp Subtraction
TIMESTAMP - TIMESTAMP = interval, where the interval may be YEAR, YEAR TO MONTH, MONTH, DAY,
DAY TO HOUR, DAY TO MINUTE, DAY TO SECOND, HOUR, HOUR TO MINUTE, MINUTE, MINUTE TO SECOND or SECOND.

Given the following two timestamps, calculate the difference between them as directed:

In months?
SELECT (TIMESTAMP '1999-10-20 10:25:40' -
TIMESTAMP '1998-09-19 08:20:00') MONTH;
13

In years?
SELECT (TIMESTAMP '1999-10-20 10:25:40' -
TIMESTAMP '1998-09-19 08:20:00') YEAR;
1

In days?
SELECT (TIMESTAMP '1999-10-20 10:25:40' -
TIMESTAMP '1998-09-19 08:20:00') DAY(3);
396

DAY(3) is needed because the default interval precision of two digits cannot hold 396 days.
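The three answers can be cross-checked in Python. The sketch assumes one reading that reproduces the slide's results: YEAR and MONTH differences count calendar fields only, ignoring the day and time, while DAY counts whole elapsed days and drops the leftover hours. Other timestamp pairs could expose differences from Teradata's actual rules.

```python
from datetime import datetime


def year_diff(end: datetime, start: datetime) -> int:
    # Calendar-year difference; day and time fields are ignored (assumption).
    return end.year - start.year


def month_diff(end: datetime, start: datetime) -> int:
    # Calendar-month difference; day and time fields are ignored (assumption).
    return (end.year - start.year) * 12 + (end.month - start.month)


def day_diff(end: datetime, start: datetime) -> int:
    # Whole elapsed days; the remaining 2:05:40 in this example is dropped.
    return (end - start).days


T_START = datetime(1998, 9, 19, 8, 20, 0)
T_END = datetime(1999, 10, 20, 10, 25, 40)
print(month_diff(T_END, T_START), year_diff(T_END, T_START), day_diff(T_END, T_START))
# 13 1 396
```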

Using Timestamp In An Application
CREATE TABLE Repair_time
( serial_number INTEGER
,product_desc CHAR(8)
,start_time TIMESTAMP(0)
,end_time TIMESTAMP(0))
UNIQUE PRIMARY INDEX (serial_number);

SELECT * FROM Repair_time ORDER BY 1;

serial_number product_desc start_time end_time
-------------------- ----------------- ---------------------------- ----------------------------
100 TV 2000-01-15 10:30:00 2000-01-17 13:20:00
101 TV 2000-01-20 08:30:00 2000-01-23 12:20:00
102 TV 2000-01-25 13:40:00 2000-01-26 14:20:00
103 TV 2000-02-02 11:30:00 2000-02-09 08:50:00
104 TV 2000-02-07 09:00:00 2000-02-10 08:50:00
105 TV 2000-02-10 08:40:00 2000-02-12 14:50:00
106 TV 2000-02-15 12:30:00 2000-02-20 15:20:00
107 TV 2000-02-19 14:30:00 2000-02-21 10:50:00
108 TV 2000-02-21 11:30:00 2000-02-23 16:40:00

Calculating Time Intervals
Produce a report showing each TV by serial number and how long, in days, hours and
minutes, it took to repair.

SELECT serial_number, (end_time - start_time) DAY TO MINUTE AS work_time
FROM Repair_time ORDER BY 1;

serial_number work_time
------------------- --------------
100 2 02:50
101 3 03:50
102 1 00:40
103 6 21:20
104 2 23:50
105 2 06:10
106 5 02:50
107 1 20:20
108 2 05:10

What is the average amount of time it takes to repair a TV?
Show the answer in days, hours and minutes.

SELECT AVG( (end_time - start_time) DAY TO MINUTE)
AS avg_repair_time
FROM Repair_time;

avg_repair_time
---------------------
3 01:40
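The same figures can be reproduced in Python from the Repair_time rows shown above. In this sketch a timedelta plays the role of the DAY TO MINUTE interval, and the formatting helper (a hypothetical name) prints it in the same "days hh:mm" layout.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"
# Repair_time rows from the slide: serial_number -> (start_time, end_time)
REPAIRS = {
    100: ("2000-01-15 10:30:00", "2000-01-17 13:20:00"),
    101: ("2000-01-20 08:30:00", "2000-01-23 12:20:00"),
    102: ("2000-01-25 13:40:00", "2000-01-26 14:20:00"),
    103: ("2000-02-02 11:30:00", "2000-02-09 08:50:00"),
    104: ("2000-02-07 09:00:00", "2000-02-10 08:50:00"),
    105: ("2000-02-10 08:40:00", "2000-02-12 14:50:00"),
    106: ("2000-02-15 12:30:00", "2000-02-20 15:20:00"),
    107: ("2000-02-19 14:30:00", "2000-02-21 10:50:00"),
    108: ("2000-02-21 11:30:00", "2000-02-23 16:40:00"),
}


def work_time(serial: int) -> timedelta:
    """end_time - start_time for one TV."""
    start, end = (datetime.strptime(s, FMT) for s in REPAIRS[serial])
    return end - start


def day_to_minute(delta: timedelta) -> str:
    """Format a non-negative timedelta like a DAY TO MINUTE interval, e.g. '2 02:50'."""
    minutes = int(delta.total_seconds()) // 60
    days, rest = divmod(minutes, 24 * 60)
    hours, mins = divmod(rest, 60)
    return f"{days} {hours:02d}:{mins:02d}"


durations = {s: work_time(s) for s in sorted(REPAIRS)}
for serial, delta in durations.items():
    print(serial, day_to_minute(delta))

average = sum(durations.values(), timedelta()) / len(durations)
print("avg_repair_time", day_to_minute(average))  # avg_repair_time 3 01:40
```

The nine repairs total 39,780 minutes, and 39,780 / 9 = 4,420 minutes, which is exactly 3 days, 1 hour and 40 minutes.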

Comparing Intervals
Show the serial number and repair time, in days, hours and minutes, of
each TV that took longer than 2 days to repair.

SELECT serial_number,
(end_time - start_time) DAY TO MINUTE
AS #_DaysHrsMns
FROM Repair_time
WHERE #_DaysHrsMns >
INTERVAL '02 00:00' DAY TO MINUTE;

serial_number #_DaysHrsMns
-------------------- --------------------
106 5 02:50
101 3 03:50
108 2 05:10
100 2 02:50
104 2 23:50
103 6 21:20
105 2 06:10

Advanced Use of Timestamp - Example 1

Produce a list that pairs, by serial number, any two TVs that
were being repaired at the same time.

SELECT a.serial_number, b.serial_number
FROM Repair_time a CROSS JOIN Repair_time b
WHERE (a.start_time, a.end_time) OVERLAPS
(b.start_time, b.end_time)
AND a.serial_number < b.serial_number;

serial_number serial_number
------------------- -------------------
106 107
103 104
104 105
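Alternative Approach Using DISTINCT

SELECT DISTINCT a.serial_number, b.serial_number
FROM Repair_time a CROSS JOIN Repair_time b
WHERE (a.start_time, a.end_time) OVERLAPS
(b.start_time, b.end_time);

Caution: without the a.serial_number < b.serial_number condition, this query also pairs each TV
with itself and returns every overlapping pair in both orders (for example, 103 104 and 104 103).
DISTINCT removes neither, so the result is not the same as that of the query above.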
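The pairing logic can be checked in Python against the Repair_time rows. The overlap test below follows the ANSI definition of OVERLAPS for periods whose start is not after their end: two periods overlap if each starts before the other ends, or if they start at the same instant. Running it with and without the serial-number filter shows why DISTINCT alone is not enough.

```python
from datetime import datetime
from itertools import product

FMT = "%Y-%m-%d %H:%M:%S"
# Repair_time rows from the slide: serial_number -> (start_time, end_time)
REPAIRS = {
    100: ("2000-01-15 10:30:00", "2000-01-17 13:20:00"),
    101: ("2000-01-20 08:30:00", "2000-01-23 12:20:00"),
    102: ("2000-01-25 13:40:00", "2000-01-26 14:20:00"),
    103: ("2000-02-02 11:30:00", "2000-02-09 08:50:00"),
    104: ("2000-02-07 09:00:00", "2000-02-10 08:50:00"),
    105: ("2000-02-10 08:40:00", "2000-02-12 14:50:00"),
    106: ("2000-02-15 12:30:00", "2000-02-20 15:20:00"),
    107: ("2000-02-19 14:30:00", "2000-02-21 10:50:00"),
    108: ("2000-02-21 11:30:00", "2000-02-23 16:40:00"),
}

PERIODS = {s: tuple(datetime.strptime(t, FMT) for t in times)
           for s, times in REPAIRS.items()}


def overlaps(s1, e1, s2, e2):
    """ANSI OVERLAPS for periods with start <= end."""
    return s1 == s2 or (s1 < e2 and s2 < e1)


all_pairs = [(a, b) for a, b in product(sorted(PERIODS), repeat=2)
             if overlaps(*PERIODS[a], *PERIODS[b])]
ordered_pairs = [(a, b) for a, b in all_pairs if a < b]

print(ordered_pairs)   # [(103, 104), (104, 105), (106, 107)]
print(len(all_pairs))  # 15: 9 self-pairs plus 3 pairs in both orders
```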

Advanced Use of Timestamp - Example 2
What percentage of all TVs took more than 2 days to repair?
SELECT (100 * COUNT(serial) / cnt) (FORMAT '99%')
FROM (SELECT COUNT(*) FROM Repair_time) AS temp1(cnt),
(SELECT serial_number, (end_time - start_time) DAY AS Num_Days
FROM Repair_time
WHERE Num_days > INTERVAL '02' DAY) AS temp2(serial, Number_days)
GROUP BY cnt;

((100*Count(serial))/cnt)
----------------------------------
33% Incorrect Answer

Subtracting at DAY granularity drops the hours and minutes, so a repair of 2 days 23:50
counts as only 2 days and fails the > 2 test.

SELECT (100 * COUNT(serial) / cnt) (FORMAT '99%')
FROM (SELECT COUNT(*) FROM Repair_time) AS temp1(cnt),
(SELECT serial_number, (end_time - start_time) DAY TO MINUTE AS Num_Days
FROM Repair_time
WHERE Num_days > INTERVAL '02 00:00' DAY TO MINUTE)
AS temp2(serial, Number_days)
GROUP BY cnt;

((100*Count(serial))/cnt)
----------------------------------
78% Correct Answer
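The gap between the two answers comes entirely from truncation. The Python sketch below makes it visible using the Repair_time rows: it counts repairs longer than two days once using whole days only (as the DAY interval does) and once using the full elapsed time (as DAY TO MINUTE does).

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"
# Repair_time rows from the slide: serial_number -> (start_time, end_time)
REPAIRS = {
    100: ("2000-01-15 10:30:00", "2000-01-17 13:20:00"),
    101: ("2000-01-20 08:30:00", "2000-01-23 12:20:00"),
    102: ("2000-01-25 13:40:00", "2000-01-26 14:20:00"),
    103: ("2000-02-02 11:30:00", "2000-02-09 08:50:00"),
    104: ("2000-02-07 09:00:00", "2000-02-10 08:50:00"),
    105: ("2000-02-10 08:40:00", "2000-02-12 14:50:00"),
    106: ("2000-02-15 12:30:00", "2000-02-20 15:20:00"),
    107: ("2000-02-19 14:30:00", "2000-02-21 10:50:00"),
    108: ("2000-02-21 11:30:00", "2000-02-23 16:40:00"),
}

durations = {s: datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
             for s, (start, end) in REPAIRS.items()}

# DAY interval: whole days only, so 2 days 23:50 becomes 2.
day_only = sorted(s for s, d in durations.items() if d.days > 2)
# DAY TO MINUTE interval: hours and minutes are kept.
full = sorted(s for s, d in durations.items() if d > timedelta(days=2))

print(day_only, f"{100 * len(day_only) / len(durations):.1f}%")  # [101, 103, 106] 33.3%
print(full, f"{100 * len(full) / len(durations):.1f}%")
# [100, 101, 103, 104, 105, 106, 108] 77.8%
```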

Performance Reminders

• Consider CASE when testing against a small set of values

• Use the appropriate sampling function: RANDOM or SAMPLE

• Use Dynamic SQL with Stored Procedures

• Join indexes can help query performance by pre-joining tables

• Aggregate indexes are preferable to aggregated views or tables

• Use TIMESTAMP and INTERVALs for time-related processing

Summary

• SQL is a very versatile language
• Usually, if there’s a will, there’s a way
• Often there are several ways to write a query
• Find the one that performs best, using EXPLAIN
