Overview
Analytic functions, which have been available since Oracle 8.1.6, are
designed to address such problems as "Calculate a running total", "Find
percentages within a group", "Top-N queries", "Compute a moving average"
and many more. Most of these problems can be solved using standard
PL/SQL; however, the performance is often not what it should be. Analytic
functions add extensions to the SQL language that not only make these
operations easier to code, they also make them faster than could be achieved
with pure SQL or PL/SQL. At the time of writing these extensions were under
review by the ANSI SQL committee for inclusion in the SQL specification;
window functions have since become part of the standard, in SQL:2003.
Analytic functions are the last set of operations performed in a query except
for the final ORDER BY clause. All joins and all WHERE, GROUP BY, and
HAVING clauses are completed before the analytic functions are processed.
Therefore, analytic functions can appear only in the select list or ORDER BY
clause.
The Syntax
Analytic-Function(<Argument>,<Argument>,...)
OVER (
<Query-Partition-Clause>
<Order-By-Clause>
<Windowing-Clause>
)
● Analytic-Function
● Arguments
● Query-Partition-Clause
● Order-By-Clause
● Windowing-Clause
This example shows the cumulative salary within a department row by row,
with each row including a summation of the prior rows' salaries.
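A minimal sketch of such a query, assuming the standard EMP demo table. The column aliases are illustrative choices, not necessarily the ones used for the plan shown here.

```sql
-- Sketch only: assumes the standard EMP demo table (ENAME, DEPTNO, SAL).
SELECT ename,
       deptno,
       sal,
       -- running total over the entire ordered result set
       SUM(sal) OVER (ORDER BY deptno, ename)                 AS running_total,
       -- running total that restarts for each department
       SUM(sal) OVER (PARTITION BY deptno ORDER BY ename)     AS dept_total,
       -- position of the row within its department
       ROW_NUMBER() OVER (PARTITION BY deptno ORDER BY ename) AS seq
  FROM emp
 ORDER BY deptno, ename;
```

All three functions can share one sort on (deptno, ename), which is consistent with a single WINDOW (SORT) step in a plan.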
Execution Plan
---------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE
   1    0   WINDOW (SORT)
   2    1     TABLE ACCESS (FULL) OF 'EMP'
Statistics
---------------------------------------------------
0 recursive calls
0 db block gets
3 consistent gets
0 physical reads
0 redo size
1658 bytes sent via SQL*Net to client
503 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
14 rows processed
The example shows how to calculate a "Running Total" for the entire
query. This is done using the entire ordered result set, via SUM(sal) OVER
(ORDER BY deptno, ename).
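The execution plan shows that the whole query is resolved with a single
window sort over one full scan of EMP and only 3 consistent gets. An
equivalent written in standard SQL (for example with correlated subqueries)
or in PL/SQL would typically need considerably more work.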
Top-N Queries
There are some problems with Top-N queries, however, mostly in the way
people phrase them. It is something to be careful about when designing
reports. Consider this seemingly sensible request: give me the top three
paid sales people in each department. The request is ambiguous
because of repeated values: there might be four people who all make the
same salary, so what should we do then?
Let's look at three examples, all of which use the well-known EMP table.
Example 1
Sort the sales people by salary from greatest to least. Give the first three
rows. If there are fewer than three people in a department, this will return
fewer than three records.
SELECT * FROM (
  SELECT deptno, ename, sal,
         ROW_NUMBER() OVER (
           PARTITION BY deptno ORDER BY sal DESC
         ) Top3
    FROM emp
)
WHERE Top3 <= 3
/
    20 SCOTT    3000    1
       FORD     3000    2
       JONES    2975    3
    30 BLAKE    2850    1
       ALLEN    1600    2
       TURNER   1500    3
9 rows selected.
Execution Plan
--------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE
   1    0   VIEW
   2    1     WINDOW (SORT)
   3    2       TABLE ACCESS (FULL) OF 'EMP'
This query works by sorting each partition (or group, here the deptno) in
descending order of the salary column and then assigning a sequential row
number to each row in the group as it is processed. A WHERE clause in the
outer query then keeps just the first three rows in each partition.
Example 2
Give me the set of sales people who make the top 3 salaries - that is, find
the set of distinct salary amounts, sort them, take the largest three, and give
me everyone who makes one of those values.
SELECT * FROM (
  SELECT deptno, ename, sal,
         DENSE_RANK() OVER (
           PARTITION BY deptno ORDER BY sal DESC
         ) TopN
    FROM emp
)
WHERE TopN <= 3
ORDER BY deptno, sal DESC
/
    30 BLAKE    2850    1
       ALLEN    1600    2
    30 TURNER   1500    3
10 rows selected.
Execution Plan
--------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE
   1    0   VIEW
   2    1     WINDOW (SORT PUSHED RANK)
   3    2       TABLE ACCESS (FULL) OF 'EMP'
Here the DENSE_RANK function was used to get the top three salaries. We
assigned the dense rank based on the salary column, sorted in descending
order.
The DENSE_RANK function does not skip numbers and assigns the same
number to rows with the same value. Hence, after the result set is built
in the inline view, we can simply select all of the rows with a dense rank of
three or less. This gives us everyone who makes one of the top three
salaries in each department.
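To see why Example 1 returns 9 rows and Example 2 returns 10, it can help to put both functions side by side. The following is an illustrative sketch against the standard EMP table, restricted to department 20, where SCOTT and FORD both earn 3000. The aliases rn and drnk are hypothetical.

```sql
-- Sketch only: compares ROW_NUMBER and DENSE_RANK on the same ordering.
SELECT deptno, ename, sal,
       -- always 1, 2, 3, ...; ties are broken arbitrarily
       ROW_NUMBER() OVER (PARTITION BY deptno ORDER BY sal DESC) AS rn,
       -- equal salaries share a number and no numbers are skipped
       DENSE_RANK() OVER (PARTITION BY deptno ORDER BY sal DESC) AS drnk
  FROM emp
 WHERE deptno = 20
 ORDER BY sal DESC, ename;
```

With the standard EMP data, SCOTT and FORD receive row numbers 1 and 2 in no guaranteed order but share dense rank 1, so JONES has dense rank 2 and ADAMS (1100) has dense rank 3. ADAMS therefore passes the filter TopN <= 3 but not Top3 <= 3, which accounts for the extra row.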
Windows
Let's look at an example with a sliding window within a group that computes
the sum of the current row's SAL column plus the previous 2 rows in that
group. The report shows, for each employee, the sum of that employee's
salary and the preceding two salaries within the department, using SUM(sal)
over a window partitioned by DEPTNO, ordered by ENAME and limited to
ROWS 2 PRECEDING.
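A query along these lines produces such a report. This is a sketch assuming the standard EMP table; the alias sliding_total is an illustrative name.

```sql
-- Sketch only: sliding sum of the current row and the two rows before it,
-- computed separately for each department.
SELECT deptno,
       ename,
       sal,
       SUM(sal) OVER (PARTITION BY deptno
                      ORDER BY ename
                      ROWS 2 PRECEDING) AS sliding_total
  FROM emp
 ORDER BY deptno, ename;
```

For the first row of each department the window holds only that row, and for the second row it holds two rows, so the first two totals in a department are partial sums.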
The partition clause makes the SUM(sal) be computed within each
department, independent of the other groups. The SUM(sal) is 'reset' as
the department changes. The ORDER BY ENAME clause sorts the data within
each department by ENAME; this allows the window clause, ROWS 2
PRECEDING, to access the 2 rows prior to the current row in a group
in order to sum the salaries.
For example, the SLIDING TOTAL value for SMITH is 6775, which is the sum
of 800, 3000, and 2975. That is simply SMITH's row plus the salaries from
the preceding two rows in the window.
Range Windows
Range windows collect rows together based on the value of the ORDER BY
expression, much as a WHERE clause would. If I say 'RANGE 5 PRECEDING',
for example, this will generate a sliding window that has the set of all
preceding rows in the group whose ORDER BY value is within 5 units of
the current row's. These units may be either numeric or date comparisons;
it is not valid to use RANGE with an offset on datatypes other than
numbers and dates.
Example
Count the employees who were hired within the 100 days preceding each
employee's own hiredate. The range window goes back 100 days from the
current row's hiredate and then counts the rows within this range. The
solution is to use a window ordered by HIREDATE with the specification
RANGE 100 PRECEDING.
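As a sketch, assuming the standard EMP table, the window specification and a query using it could look like the following. The aliases window_start and cnt are illustrative.

```sql
-- Sketch only: for a DATE ordering column, the RANGE offset is in days.
SELECT ename,
       hiredate,
       hiredate - 100 AS window_start,   -- earliest hiredate still in the window
       COUNT(*) OVER (ORDER BY hiredate ASC
                      RANGE 100 PRECEDING) AS cnt
  FROM emp
 ORDER BY hiredate;
```

The count includes the current row itself, so the smallest possible value is 1.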
As another example, compute for each employee the average salary of the
people hired within the 100 days before that employee's hiredate. The query
applies AVG(sal) over the same kind of RANGE 100 PRECEDING window.
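A possible form of this query, given as a sketch against the standard EMP table. The alias avg_sal_100_days is a hypothetical name.

```sql
-- Sketch only: average salary over the current row and every row whose
-- hiredate falls within the 100 days before the current row's hiredate.
SELECT ename,
       hiredate,
       sal,
       AVG(sal) OVER (ORDER BY hiredate ASC
                      RANGE 100 PRECEDING) AS avg_sal_100_days
  FROM emp
 ORDER BY hiredate;
```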
Look at CLARK again, since we understand his range window within the
group. We can see that the average salary of 2758 is equal to
(2975+2850+2450)/3. This is the average of the salaries for CLARK and the
rows preceding CLARK within 100 days, those of JONES and BLAKE. The data
must be sorted in ascending order.
Row Windows
Row windows are defined in physical units: the physical number of rows to
include in the window. For example, you can calculate the average salary of
a given record together with the (up to 5) employees hired before them, or
after them, by using ROWS 5 PRECEDING with the data ordered by hiredate
ascending or descending.
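One way to write this, as a sketch assuming the standard EMP table, is to use the same ROWS 5 PRECEDING window twice, once over ascending and once over descending hiredate. The aliases are illustrative.

```sql
-- Sketch only: "5 rows in front" depends on the sort direction, so ordering
-- by hiredate DESC turns ROWS 5 PRECEDING into "hired after this employee".
SELECT ename,
       hiredate,
       sal,
       AVG(sal) OVER (ORDER BY hiredate ASC  ROWS 5 PRECEDING) AS avg_before,
       COUNT(*) OVER (ORDER BY hiredate ASC  ROWS 5 PRECEDING) AS cnt_before,
       AVG(sal) OVER (ORDER BY hiredate DESC ROWS 5 PRECEDING) AS avg_after,
       COUNT(*) OVER (ORDER BY hiredate DESC ROWS 5 PRECEDING) AS cnt_after
  FROM emp
 ORDER BY hiredate;
```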
The window consists of up to 6 rows: the current row and the five rows
"in front of" this row, where "in front of" is defined by the ORDER BY
clause. With ROWS windows we do not have the limitation of RANGE windows:
the data may be of any type and the ORDER BY may include many columns.
Notice that we selected a COUNT(*) as well. This is useful just to
demonstrate how many rows went into making up a given average. We can see
clearly that for ALLEN's record, the average salary computation for people
hired before him used only 2 records, whereas the computation for salaries
of people hired after him used 6.
Frequently you want to access data not only from the current row but also
from the rows "in front of" or "behind" it. For example, let's say you need
a report that shows, by department, all of the employees, their hire date,
how many days before was the last hire, and how many days after was the
next hire.
Using straight SQL, this query would be difficult to write. Not only that,
but its performance would once again be questionable. The approach I
typically took in the past was either to "select a select" or to write a
PL/SQL function that would take some data from the current row and "find"
the previous and next rows' data. This worked, but introduced a large
overhead into both the development of the query and its run-time execution.
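With LAG and LEAD the report can be written as a single pass over EMP. The following is a sketch assuming the standard EMP table; the aliases last_hire, days_last, next_hire and days_next are illustrative names.

```sql
-- Sketch only: previous and next hire dates within each department,
-- plus the number of days between them and the current row's hiredate.
SELECT deptno,
       ename,
       hiredate,
       LAG(hiredate)  OVER (PARTITION BY deptno ORDER BY hiredate) AS last_hire,
       hiredate
         - LAG(hiredate) OVER (PARTITION BY deptno ORDER BY hiredate) AS days_last,
       LEAD(hiredate) OVER (PARTITION BY deptno ORDER BY hiredate) AS next_hire,
       LEAD(hiredate) OVER (PARTITION BY deptno ORDER BY hiredate)
         - hiredate AS days_next
  FROM emp
 ORDER BY deptno, hiredate;
```

The first hire in each department has no previous row and the last hire has no next row, so the corresponding columns are NULL there.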
The LEAD and LAG routines could be considered a way to "index into your
partitioned group". Using these functions you can access any individual row.
In the printout for this report, for example, the record for KING includes
the data from the prior row (LAST HIRE) and the next row (NEXT HIRE). We
can easily access the fields in records preceding or following the current
record in an ordered partition.
LAG
LAG provides access to more than one row of a table at the same time
without a self join. Given a series of rows returned from a query and a
position of the cursor, LAG provides access to a row at a given physical offset
prior to that position.
If you do not specify offset, then its default is 1. The optional default value is
returned if the offset goes beyond the scope of the window. If you do not
specify default, then its default value is null.
The following example provides, for each person in the EMP table, the salary
of the employee hired just before:
SELECT ename, hiredate, sal,
       LAG(sal, 1, 0)
         OVER (ORDER BY hiredate) AS PrevSal
  FROM emp
 WHERE job = 'CLERK';
LEAD
LEAD provides access to more than one row of a table at the same time
without a self join. Given a series of rows returned from a query and a
position of the cursor, LEAD provides access to a row at a given physical
offset beyond that position.
If you do not specify offset, then its default is 1. The optional default value is
returned if the offset goes beyond the scope of the table. If you do not
specify default, then its default value is null.
The following example provides, for each employee in the EMP table, the hire
date of the employee hired just after:
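A sketch of such a query, mirroring the LAG example and assuming the standard EMP table. The alias NextHired is an illustrative name.

```sql
-- Sketch only: hire date of the employee hired immediately after each one.
-- No default is given, so the last hired employee gets NULL.
SELECT ename, hiredate,
       LEAD(hiredate, 1)
         OVER (ORDER BY hiredate) AS NextHired
  FROM emp;
```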
The FIRST_VALUE and LAST_VALUE functions allow you to select the first
and last rows from a group. These rows are especially valuable because they
are often used as the baselines in calculations.
Example
The following example selects, for each employee in each department, the
name of the employee with the lowest salary.
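A minimal sketch, assuming the standard EMP table. The alias min_sal_name is hypothetical.

```sql
-- Sketch only: ordering each department by ascending salary makes the
-- first row of the partition the lowest paid employee.
SELECT deptno,
       ename,
       sal,
       FIRST_VALUE(ename) OVER (PARTITION BY deptno
                                ORDER BY sal ASC) AS min_sal_name
  FROM emp
 ORDER BY deptno, ename;
```

If two employees share the lowest salary in a department, which of the two names is returned is not determined by this ORDER BY alone.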
The following example selects, for each employee in each department, the
name of the employee with the highest salary.
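This can be sketched in two ways against the standard EMP table: FIRST_VALUE over a descending sort, or LAST_VALUE over an ascending sort with an explicit window. The aliases are illustrative.

```sql
-- Sketch only: two ways to pick the highest paid employee per department.
SELECT deptno,
       ename,
       sal,
       -- highest salary first, so the first row is the top earner
       FIRST_VALUE(ename) OVER (PARTITION BY deptno
                                ORDER BY sal DESC) AS max_sal_name,
       -- with ORDER BY the default window ends at the current row, so the
       -- window must be widened for LAST_VALUE to see the whole partition
       LAST_VALUE(ename)  OVER (PARTITION BY deptno
                                ORDER BY sal ASC
                                ROWS BETWEEN UNBOUNDED PRECEDING
                                         AND UNBOUNDED FOLLOWING) AS max_sal_name2
  FROM emp
 ORDER BY deptno, ename;
```

Where several employees tie for the highest salary, as SCOTT and FORD do in department 20, the two columns may name different employees.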
Example
Let's say you want to show the top 3 salary earners in each department as
columns. The query needs to return exactly 1 row per department, and the
row would have 4 columns: the DEPTNO, the name of the highest paid
employee in the department, the name of the next highest paid, and so on.
Using analytic functions this is almost easy; without analytic functions it
was virtually impossible.
SELECT deptno,
       MAX(DECODE(seq,1,ename,null)) first,
       MAX(DECODE(seq,2,ename,null)) second,
       MAX(DECODE(seq,3,ename,null)) third
  FROM (SELECT deptno, ename,
               row_number()
                 OVER (PARTITION BY deptno
                       ORDER BY sal desc NULLS LAST) seq
          FROM emp)
 WHERE seq <= 3
 GROUP BY deptno
/
Note the inner query, which assigns a sequence (seq) to each employee
within each department in descending order of salary.
30 WARD 1250 4
30 MARTIN 1250 5
30 JAMES 950 6
The WHERE clause in the outer query keeps only rows with sequences 1, 2 or
3, and the DECODE assigns each name to the correct "column". The MAX with
GROUP BY gets rid of the redundant rows and we are left with our collapsed
result. It may be easier to understand if you see the result set without the
aggregate function MAX and the GROUP BY deptno.
SELECT deptno,
       DECODE(seq,1,ename,null) first,
       DECODE(seq,2,ename,null) second,
       DECODE(seq,3,ename,null) third
  FROM (SELECT deptno, ename,
               row_number()
                 OVER (PARTITION BY deptno
                       ORDER BY sal desc NULLS LAST) seq
          FROM emp)
 WHERE seq <= 3
/
Conclusion