
Lecture 1.

Introduction and Motivation

CpSc 838: Advanced


Data Structures
Clemson University
Fall 2006

Instructor and Student Introductions


Dr. Brian C. Dean
Email: bcdean@cs.clemson.edu
Office: McAdams 205
Office Hours: Preferably MWF 2:15-3:15,
although students welcome any time.
Research Interests: most areas of
algorithms and combinatorial optimization
Student introductions

Course Overview
CpSc 838 is a comprehensive, graduate-level introduction to data structures.
It is a theory course, so we will analyze
the performance of data structures in a
mathematically rigorous fashion.
Prereq: reasonable mathematical maturity
Prereq: some prior exposure to basic
algorithms and data structures.
No programming assignments.
Note: just because this is a theory course
doesn't mean we ignore practical applicability!

Grading and Homework


Grading:

Weekly homework assignments: 35%


Two quizzes: 2 x 20%
Final exam: 25%
No preset letter grade cutoffs.

Homework:
Collaboration is encouraged, but you should write up
solutions independently.
List collaborators and cite external resources
Neatness is crucial. Typesetting encouraged.
Describe algorithms and data structures in a precise
yet concise fashion in English (preferable) or using
high-level pseudocode.

Textbook and Readings


Algorithms Explained, an electronic multimedia
textbook currently being written by the instructor.
Chapters to appear on the course website soon.
These chapters include hyperlinks to animated
whiteboard lectures that explain the details of
certain topics. Most of these are under construction,
and the instructor will try to make as many available
as possible over the course of the semester.

Introduction to Algorithms, by Cormen et al.


Various original papers will be posted on the
course website as the semester progresses.

Miscellaneous Details
Feedback:
Please feel free to ask for or give constructive
feedback at any time. Your instructor values quality
teaching very highly.
The instructor will occasionally solicit feedback on the
effectiveness of his animated whiteboard lectures (in
general, the instructor is interested in ways to use
technology to improve the effectiveness of teaching).

Algorithm seminar: Fridays at 3:30. Students (especially
those considering research in algorithms and
theory) are highly encouraged to attend.
Advanced graph algorithms: TTh 11:00-12:15.

Data Structures : Motivation


Data structures are fundamentally
important in computer science, both in
theory and in practice.
The study of data structures goes hand in
hand with the study of algorithms.
The fastest known algorithms for many
problems (e.g., shortest paths, network flows,
matchings) have fancy data structures to
thank for their remarkable efficiency.

Motivation : Databases
We can think of the records in a database as
points in a high-dimensional space:
(figure: records plotted as points in two dimensions, with axes Age and Household income)

How do we store these points so that we can
answer range queries efficiently?
Ex: Tell me all records with age in the range [18, 24] and household
income in the range [$50,000, $80,000].

Data structures: kd-trees, range trees.

Motivation : Computational Geometry


Given a planar map, preprocess it to build a point location
data structure that can quickly determine
the region in which the user clicks.

Motivation : BioInformatics
The human genome is a string of As, Cs, Gs,
and Ts that is roughly 3 billion characters in
length.
Quick: how many occurrences of CAT appear
in this string? At what positions do they occur?
TACATGCGCTACGTGCTTCATCGA
What is the longest substring that occurs twice
or more? What is the shortest substring that
doesn't occur?
Data structures: suffix arrays, suffix trees

Motivation : Graph Connectivity


Design a dynamic connectivity data structure
that can:
Insert edges
Delete edges
Query whether nodes x and y
are connected by some path.


Encoding connectivity:
The connectivity λij between nodes i and j denotes the
maximum # of edge-disjoint paths connecting i and j.
Using Cartesian trees, we can encode λij for all pairs
of nodes in an n-node graph in only O(n) space, such
that each λij can be obtained in only O(1) time!
The same data structure techniques are useful for
many other problems: minimum spanning tree
verification, reconstructing evolutionary trees, etc.

Motivation : Graph Algorithms


Disjoint set data structures
Minimum spanning trees (Kruskal's algorithm)
Maximum matching in general graphs (Micali-Vazirani)

Fibonacci heaps
Shortest paths with nonnegative edge costs (Dijkstra's)
Minimum spanning trees (Jarnik, Prim, Dijkstra)

Dynamic trees
Maximum flows (shortest augmenting path algorithm)
Maximum flows in planar graphs (Weihe, Borradaile-Klein)
Multi-source shortest paths in planar graphs (Klein)

Lowest common ancestor data structures


Minimum cost perfect matchings (Gabow)
Connectivity data structures (Gomory-Hu, Chazelle)

Motivation : Slow Memory


External memory data structures: B-trees
Cache-oblivious data structures:
Most computer memories are hierarchical:

Small fast cache


Larger slow cache
Even larger, slower main memory
Disk (very large and extremely slow)
Network

If we know the page size for block transfers
between levels, we can tune our data
structures accordingly.
But what if we don't know the page size?

Tools and Techniques


Amortized analysis
Useful for analyzing data structures with non-uniform
performance (most invocations fast, but a few are slow)
Worst case for a single operation looks bad, but worst-case
performance amortized over any sequence of operations is
typically much more reasonable.

Use of randomization to simplify the design


and/or analysis of a data structure

Randomized mergeable heaps


Randomized balancing for binary search trees
Skip lists
Universal hashing, perfect hashing
Randomized incremental construction of geometric data
structures

Abstract Data Types Versus Concrete


Implementations, Tradeoffs
A specification of a data structure (often called an abstract
data type) only states the operations to be supported, and
not how these are actually implemented:
Example: a priority queue maintains a set of elements with
associated keys and supports the operations:
Insert(e, k) : insert a new element e with a specified key k.
Remove-min : remove and return the element with minimum key.

There are usually many different ways to implement a


certain type of data structure, each with certain strengths
and weaknesses.
Example: Implementations of priority queues:

                   Insert      Remove-Min
  Unsorted Array   O(1)        O(n)
  Sorted Array     O(n)        O(1)
  Binary Heap      O(log n)    O(log n)

Example: arrays and linked lists can both implement a sequence.
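To make the gap between the abstract data type and a concrete implementation tangible, here is a minimal sketch (in Python; the class and method names are illustrative, not part of any prescribed interface) of a priority queue backed by an unsorted list, matching the first row of the table above: insert is O(1), remove-min is O(n).

    class UnsortedListPQ:
        """Priority queue ADT backed by an unsorted Python list."""
        def __init__(self):
            self.items = []                 # list of (key, element) pairs

        def insert(self, e, k):             # O(1): just append
            self.items.append((k, e))

        def remove_min(self):               # O(n): scan for the minimum key
            i = min(range(len(self.items)), key=lambda j: self.items[j][0])
            return self.items.pop(i)[1]

A sorted-list or binary-heap implementation would expose exactly the same two methods but with the other cost profiles shown in the table.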

Coming Up
In the next lecture, we will say a few words
about models of computation.
Then, we'll spend a few lectures discussing
amortized analysis techniques.
And then we'll spend roughly one week
investigating our first class of data
structures, priority queues.
(binary heaps, leftist heaps, skew heaps, randomized
mergeable binary heaps, binomial heaps, Fibonacci heaps)

Lecture 2. Models of Computation,


Arrays & Linked Lists, Amortized Analysis

CpSc 838: Advanced


Data Structures
Clemson University
Fall 2006

Models of Computation
Question: What is the worst-case running
time of the fastest possible algorithm for
sorting n numbers?
A. O(n log n)
B. O(n (log log n)^(1/2))
C. O(n)
D. O(1)

Models of Computation
Answer: It depends on our model of
computation!
A. In the comparison model: O(n log n)
B. In the RAM model: O(n (log log n)^(1/2))
C. In the RAM model with small word size: O(n)
D. In a (very unrealistic!) model where our
machine has a "sort n integers" instruction: O(1)

Models of Computation
A model of computation defines the
primitive operations of the abstract
computing environment on which an
algorithm or data structure will execute.
E.g., addition, subtraction, memory read/write.

Allows us to analyze the running time of


an algorithm
Ideally, our model of computation should
reflect the actual capabilities and
limitations of a modern digital computer,
so our runtime analyses are realistic.

The RAM Model


RAM = Random Access Machine.
Our default model from now on.
Memory is a long array of integer words.
Memory accesses take O(1) time.
Random access to memory is allowed.

Simple arithmetic operations (addition,


multiplication, division, remainder, etc.) all take
O(1) time.
Is this a reasonable model for an actual
computer? (i.e., will runtime analysis in the RAM
model correspond well to the actual running time
of an algorithm on a real computer?)

The Comparison-Based Model


Input elements not necessarily numeric.
All we can do with input elements is compare
them pairwise to see if one element is <, >, or =
another element.
Careful! Algorithms in the comparison model often
have slower running times than those in the RAM
model, but this doesn't necessarily mean they are
worse -- they just don't have the advantage of the
more powerful RAM model.

Very general model: a comparison-based data


structure works perfectly fine when given any
type of comparable data (numbers, text strings,
etc.)

Other Models
Real RAM: Like the RAM, except words
can hold real numbers if desired.
Not realistic, but can simplify the development
of algorithms that deal with real #s.
Behaves similarly to the comparison model.

Pointer Machine: Like the RAM, except


random access prohibited. Can only
perform sequential access to memory and
follow pointers.

Lower Bounds
Weak models of computation like the
comparison model and the pointer
machine are useful for proving lower
bounds.
For example, any sorting algorithm in the
comparison model must take Ω(n log n)
time in the worst case.
(on a RAM, we can sort faster than this!)

RAM vs. Comparison-Based Algorithms


and Data Structures
A RAM algorithm has the advantage of
knowing that its input contains integers.
It can exploit this fact by using tricks like
hashing and lookup tables, where input
elements are used as array indices.
Example: Given an integer x in the range
1..10, what is the largest power of 2 less
than x?
On a RAM, can answer this in 1 step using a
precomputed lookup table.
In the comparison model, we need to make
several comparisons to determine the answer.
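As a concrete illustration of this RAM-style trick, here is a hedged sketch (Python; the table name POW2 is my own, and I interpret the query as "largest power of 2 that is at most x" so the table is well defined for every x in 1..10):

    # Hypothetical precomputed lookup table: POW2[x] = largest power of 2 <= x,
    # for x in 1..10 (index 0 unused).  A single array read answers the query.
    POW2 = [None, 1, 2, 2, 4, 4, 4, 4, 8, 8, 8]

    def largest_power_of_two(x):
        return POW2[x]          # the integer x is used directly as an array index

In the comparison model no such indexing is possible, so the answer must instead be pinned down by a small sequence of comparisons.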

RAM vs. Comparison-Based Algorithms


and Data Structures
For many data structures, we will see two
fundamentally different types of
implementations:
RAM implementations that require and exploit
integrality of the input
Comparison-based implementations that make no
such assumptions, and hence are somewhat more
general but slightly slower.

RAM data structures often have runtimes that


are input sensitive, depending on the
magnitude of the integers in the input.
Ex: a radix tree storing integers in the range 1..C
takes O(log C) time per operation.
Note that comparison-based algorithms are always
input-insensitive.

Annoying Subtlety: Word Size


For the RAM model, how many bits in a
word?
Constant number? Too small!
Θ(log n)? Just about right.
Θ(n)? Too many!

Standard assumption: Θ(log n) bits in each
word.
Some fancy RAM data structures leverage
this fact to achieve an O(log n) speedup in
running time using lookup tables and word-level parallelism.

Switching Gears: Arrays & Linked Lists


Arrays and Linked Lists are two different ways
to implement a dynamic sequence.
Absolutely crucial to understand the strengths
and limitations of both arrays and linked lists!!!
Arrays support random access in O(1) time.
Since arrays must remain contiguous in memory, it
takes O(n) time to insert or delete an interior element.
Linked lists don't support random access -- O(n) time is
required to scan to a particular element.
However, once we've scanned to the appropriate
position in a linked list, we can insert or delete in only
O(1) time.

Sometimes linked lists should be doubly-linked


for convenience.

Stacks and Queues


With both arrays and linked lists, it's easy
to insert or delete elements from the
endpoints of a sequence in only O(1) time.
If we insert and delete from the same end,
we have a stack.
Insert and delete called push and pop.

If we insert and delete from different ends,


we have a FIFO queue.
Queues are often implemented using
circular arrays that wrap around within a
block of memory.

Memory Allocation
In most computing environments, memory
is allocated in fixed-size blocks.
Usually memory allocation takes only O(1)
time irrespective of block size, although we
aren't guaranteed that our memory block will
be initialized.

Since memory blocks often cannot expand


after allocation, what do we do when a
memory block fills up?
Example: suppose we allocate 100 words of
memory space for a stack (implemented as
an array), but then realize we have more than
100 elements to push onto the stack!

Memory Allocation : Successive Doubling


A common technique for block expansion:
whenever our current block fills up,
allocate a new block of twice its size and
transfer the contents to the new block.
Unfortunately, now some of our push
operations will be quite slow (although
most still run in O(1) time), so from a
worst-case perspective, the running time
of push can be quite bad.
This motivates the importance of
amortized analysis

Successive Doubling of Memory Blocks

How much time does it take to perform a sequence of push
operations, if we start with a memory block of size 1?

  Op #:   1  2  3  4  5  6  7  8  9 10 ... 16 17 18
  Push:   1  1  1  1  1  1  1  1  1  1 ...  1  1  1
  Copy:      1  2     4           8           16
  Total:  1  2  3  1  5  1  1  1  9  1 ...  1 17  1

What is the best way to characterize
the running time of push?

Amortized Analysis
It turns out that any sequence of n pushes
requires at most 3n = O(n) units of time.
On average, each individual push
therefore takes O(1) time.
In this case, we say that push runs in O(1)
amortized time.
An operation runs in O(f(n)) amortized
time if any sequence of k such operations
runs in O(k f(n)) time.

Amortized Analysis : Motivation


Amortized analysis is an ideal way to
characterize the worst-case running time of
many operations with highly non-uniform
performance.
We are still performing a worst-case running
time analysis, just smoothed over a sequence
of operations.
Amortized analysis gives us a much clearer
picture of the true performance of a data
structure.
For example, if we had described our push operation
as having an O(n) worst-case running time, then this
might lead us to believe that n successive push
operations might take O(n2) time.

Amortized Analysis : Motivation


Suppose we have 2 implementations of a data
structure to choose from:
A: O(log n) worst-case time / operation.
B: O(log n) amortized time / operation.

There is no difference if we use either A or B as


part of a larger algorithm. For example, if our
algorithm makes n calls to the data structure, the
running time is O(n log n) in either case.
The choice between A and B only matters in a
real-time setting when the response time of an
individual operation is important.


Lecture 3. Amortized Analysis

CpSc 838: Advanced


Data Structures
Clemson University
Fall 2006

Amortized Analysis : Motivation


Amortized analysis is perhaps the most important
technique in the study of data structures.
Useful way to characterize the performance of
data structure operations with non-uniform
behavior (most invocations fast, but some slow).
From the perspective of the worst-case running
time of a larger algorithm using some data
structure D, it makes no difference if D's
operations have amortized or worst-case
performance guarantees.

Definitions
We say an operation A requires O(f(n))
amortized time if any sequence of k
invocations of A requires O(k f(n)) time in
the worst case.
We say operations A and B have amortized
running times of O(f_A(n)) and O(f_B(n)) if any
sequence containing k_A invocations of A
and k_B invocations of B requires
O(k_A·f_A(n) + k_B·f_B(n)) time in the worst case.
And so on, for 3 or more operations

Trivial Example : Length-n Buffer


When writing a sequence of integers to disk,
suppose we hold them in a length-n buffer
and only write to disk when the buffer fills
up.
One operation: write(v)
Requires 1 unit of time, except every nth call
takes an additional 20n units of time to write
and clear the buffer.

Worst-case runtime of write: 20n + 1 = O(n).


Amortized running time: 21 = O(1).
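A minimal sketch of this buffer (Python; the names and the flush_to_disk callback are illustrative, and the "additional 20n units" of the slide are simply whatever that callback costs):

    class BufferedWriter:
        def __init__(self, n, flush_to_disk):
            self.n = n                      # buffer capacity
            self.buf = []                   # in-memory buffer
            self.flush_to_disk = flush_to_disk

        def write(self, v):                 # O(1) amortized
            self.buf.append(v)
            if len(self.buf) == self.n:     # every nth call is the expensive one
                self.flush_to_disk(self.buf)
                self.buf = []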

Aggregate Analysis
Aggregate analysis is perhaps the simplest
approach to amortized analysis, since it follows
directly from the definition:
Compute the worst-case running time for an arbitrary
sequence of k operations, then divide by k.

For the length-n buffer, it is fairly easy to prove


that any sequence of k write operations takes at
most 21k units of time.
Unfortunately, aggregate analysis is usually hard
to apply, due to the difficulty in bounding the
running time of an arbitrary sequence of k
operations (especially if the operations are of
several types).

The Accounting Method


Mentally overcharge ourselves for earlier cheap
operations to build up sufficient credit to pay for
later expensive operations.
Real-life example: suppose we pay $50 per month
for gas, plus $1200 in December for car
maintenance.
The worst-case monthly cost of owning our car is
$1250.
However, if we amortize the maintenance cost over the
whole year, we see it really only costs us $150 per
month (amortized).
Let's set aside $150 each month: $50 for
gas, and $100 into the bank. When we reach
December, we have enough credit to pay the entire
expensive maintenance cost!

The Accounting Method


With the length-n buffer example, let us
charge ourselves 21 units for each write
operation.
1 unit for the actual cost of the write.
20 units into the bank to pay for this elements
share of the upcoming expensive write.

When it comes time to empty the buffer, we
have accumulated a credit of 20n units of
time -- exactly what we need!

The Accounting Method


We are still spending the same amount of
running time, only accounting for some work
earlier than it actually happens.
Important: we never go into debt:
Can only account for work earlier than it
happens, not later.
At any point in time t, the amount of work we
have charged ourselves by t must be at least as
large as the amount of actual work done by t.

Life from the Perspective of an


Element of Data
In both algorithms and data structures, it's
often useful to think about running time from
the perspective of individual elements of
data.
How much work is being done to me during my
lifetime in the data structure?
In the example of the length-n buffer: 21 units
In its simplest form, the accounting method just
charges all of this work (21 units) up front when
an element is inserted into the data structure.
Afterwards, subsequent operations involving
this particular element are all paid for.

Credit Attached to Data Elements


Using the accounting method, suppose we charge
ourselves 21 units of running time to write an
element e into the buffer:
We spend 1 unit actually performing the write.
The remaining 20 units contributes to a global credit
available for paying the cost of expensive future
operations.
Think of a $20 bill attached to the current element!

Our accounting scheme should be such that any
time we encounter an expensive operation, we
can always find enough money in the data structure
to pay for it.

Example : Memory Allocation


Recall our earlier example of an expanding
stack where the push operation takes:
1 unit of time
plus n units of time if we fill up our current
memory buffer and as a result need to transfer
the entire stack into a new memory buffer of
size 2n.

Claim: push takes O(1) amortized time.


We could prove this with aggregate analysis,
but the accounting method is much easier

Example : Memory Allocation


Charge 3 units (still O(1) amortized time) for each
push operation.
1 unit for the immediate push.
$2 credit for future memory expansions.

When it comes time to expand our buffer from size
n to 2n (at a cost of n), note that exactly n/2 of the
elements in our current buffer have been newly-added
since the last memory expansion.
All these elements have $2 credit on them.
So we have $n worth of credit -- enough to pay for the
current memory expansion!
After expansion, no credit remains (subsequently-added
items will contribute toward the next expansion).
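A sketch of the expanding stack with the accounting charges above noted in comments (Python; real Python lists already resize themselves, so the fixed-size memory block is simulated explicitly here, and the names are illustrative):

    class DoublingStack:
        def __init__(self):
            self.buf = [None]               # current memory block, initially size 1
            self.n = 0                      # number of elements stored

        def push(self, x):                  # charge 3 units: 1 now + $2 credit on x
            if self.n == len(self.buf):     # block full: expand to twice the size
                new_buf = [None] * (2 * len(self.buf))
                new_buf[:self.n] = self.buf # copying costs n, paid by the $2 credits
                self.buf = new_buf
            self.buf[self.n] = x            # the 1-unit immediate push
            self.n += 1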

Example : Memory Allocation


Actual running time of push:

  Op #:        1  2  3  4  5  6  7  8  9 10 ... 16 17 18
  Push:        1  1  1  1  1  1  1  1  1  1 ...  1  1  1
  Copy:           1  2     4           8            16
  Total:       1  2  3  1  5  1  1  1  9  1 ...  1 17  1
  Cumulative:  1  3  6  7 12 13 14 15 24 25 ... 31 48 49

Amortized running time (charging 3 units per push):

  Total:       3  3  3  3  3  3  3  3  3  3 ...  3  3  3
  Cumulative:  3  6  9 12 15 18 21 24 27 30 ... 48 51 54

Key property: The cumulative amortized time we've charged
ourselves must always be at least as large as the
cumulative amount of actual time spent so far (i.e., we
avoid going into debt).

Simple Problem : The Min-Stack


Let's try to implement a stack supporting each of
these operations in O(1) worst-case time:
Push
Pop
Find-min : return the minimum
value present in the stack.

Implementation:
Keep pointer to current minimum
Augment each element e with
a pointer to the minimum element
below e (i.e., the minimum at
the time e was pushed).

(figure: a stack containing, from top to bottom, 6 2 1 8 4 3 9; the
current minimum is 1, and each element points to the minimum at
or below it)
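A sketch of the min-stack (Python; rather than a separate pointer field, each stack entry simply records the minimum at the time it was pushed, which amounts to the same augmentation, and the names are illustrative):

    class MinStack:
        def __init__(self):
            self.data = []                  # entries are (value, min at or below here)

        def push(self, x):                  # O(1)
            m = x if not self.data else min(x, self.data[-1][1])
            self.data.append((x, m))

        def pop(self):                      # O(1)
            return self.data.pop()[0]

        def find_min(self):                 # O(1): stored with the top entry
            return self.data[-1][1]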

Harder Problem : The Min-Queue


Using either a linked list or a (circular)
array, it is easy to implement a FIFO
queue supporting the insert and delete
operations both in O(1) worst-case time.
Suppose that we also want to support a
find-min operation, which returns the value
of the minimum element currently present
in the queue.
Is it possible to implement a min-queue
supporting insert, delete, and find-min all
in O(1) worst-case time?

A Pair of Back-to-Back Min-Stacks


(figure: two min-stacks placed back to back, each with its own
current-minimum pointer; deleted elements are popped from one
side while newly-inserted elements are pushed onto the other side,
so one side grows while the other shrinks)

What happens when the yellow (outgoing) min-stack becomes empty?

A Pair of Back-to-Back Min-Stacks


(figure: the yellow stack is empty; the blue stack still holds the
current elements)

When the yellow stack becomes empty, spend O(n) time and
transfer the contents of the blue stack into the yellow stack.
Worst-case running time for delete: O(n)

Amortized analysis:
Charge insert 2 units of time: 1 for the push, and $1 in credit for
each new element.
Now we have enough credit to pay
for the entire transfer when it occurs!

Final running times:


- Insert and Delete: O(1) amortized time
- Find-Min: O(1) worst-case time
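A sketch of the min-queue built from two back-to-back min-stacks as described above (Python; it reuses the MinStack sketch from the earlier min-stack slide, and assumes the queue is nonempty when delete or find-min is called):

    class MinQueue:
        def __init__(self):
            self.incoming = MinStack()      # new elements pushed here
            self.outgoing = MinStack()      # deletions popped from here

        def insert(self, x):                # O(1) amortized (charge 2: push + $1 credit)
            self.incoming.push(x)

        def delete(self):                   # O(1) amortized
            if not self.outgoing.data:      # outgoing side empty: transfer everything,
                while self.incoming.data:   # paid for by the $1 credits
                    self.outgoing.push(self.incoming.pop())
            return self.outgoing.pop()

        def find_min(self):                 # O(1) worst case
            mins = [s.find_min() for s in (self.incoming, self.outgoing) if s.data]
            return min(mins)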

Lecture 4. Amortized Analysis II

CpSc 838: Advanced


Data Structures
Clemson University
Fall 2006

Recap: Accounting Method


Account for some work earlier than it
actually happens.
Overcharge cheap operations to build up
sufficient credit to pay for expensive
operations.
Example: length-n buffer
Example: the min-queue
Example: successively doubling the
memory buffer for a stack.

Another Quick Example:


Incremental Priority Queues
A priority queue maintains a set of elements with
associated keys.
Keys indicate priorities. In this example, a high key value
indicates high priority.
We'd like to be able to locate and remove the highest-priority
element efficiently.
Fundamental operations of a priority queue:
Insert(e)
remove-max

We'll study general priority queues in a few days, but for
now, consider the special case of an incremental priority
queue:
Keys are nonnegative integers
We additionally support an increment-priority(e) operation that
takes a pointer to an element and increases its priority by 1.

Implementing an Incremental
Priority Queue
(figure: an array of buckets indexed by key value 0, 1, 2, 3, 4, ...,
where bucket k holds a linked list of all elements whose key is
currently k -- for example, bucket 1 holds all elements with key
value 1 -- and a pointer tracks the current maximum nonempty
bucket)

The Remove-Max Operation


(figure: remove-max removes and returns an element from the
current maximum nonempty bucket; the maximum-bucket pointer
then moves down to the next nonempty bucket)

Analysis of Incremental Priority Queue


Let M denote the amount by which the
current maximum bucket pointer moves.
                        Worst-Case            Amortized
                        Running Time          Running Time
  insert                1                     1
  increment-priority    1                     2
  remove-max            1 + M (not bounded!)  1

All operations have O(1) amortized running times!

Plenty of Credit to Spare


(figure: the bucket structure again, now with credit attached to the
elements -- e.g. $1, $1, $5, $5, $8 -- so there is plenty of credit to
spare for the movement of the maximum-bucket pointer during
remove-max)
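A sketch of the bucket-based incremental priority queue (Python; the bucket array, the max_key capacity bound, the default insertion key of 0, and the use of sets inside buckets are all illustrative choices; keys are assumed never to exceed max_key, and remove-max assumes the queue is nonempty):

    class IncrementalPQ:
        def __init__(self, max_key):
            self.bucket = [set() for _ in range(max_key + 1)]   # bucket[k]: elements with key k
            self.key = {}                   # current key of each element
            self.top = 0                    # current maximum nonempty bucket

        def insert(self, e, k=0):           # O(1)
            self.bucket[k].add(e)
            self.key[e] = k
            self.top = max(self.top, k)

        def increment_priority(self, e):    # O(1): move e up one bucket
            k = self.key[e]
            self.bucket[k].remove(e)
            self.bucket[k + 1].add(e)
            self.key[e] = k + 1
            self.top = max(self.top, k + 1)

        def remove_max(self):               # 1 + M actual time, O(1) amortized
            while self.top > 0 and not self.bucket[self.top]:
                self.top -= 1               # pointer movement paid by earlier credits
            e = self.bucket[self.top].pop()
            del self.key[e]
            return e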

Example: Dynamic Memory Allocation


Revisited
Many data structures grow or shrink over time,
causing us to move them around in memory
among allocated blocks of different sizes.
Recall earlier example of a stack.
When the current memory block fills up due to
successive pushes, we allocate a new block of twice
the size and copy the entire contents of our stack over.
push requires O(1) amortized time.
We always keep our memory block at least half full (so
we don't waste much memory).

Suppose now that we add the pop operation.


We might end up wasting memory if we start with a
large stack and pop most of its elements.
Goal: Always use a memory buffer of size O(n), if n
denotes the number of elements currently in the stack.

Block Expansion and Contraction


Notation:
n = # elements currently in our stack.
m = size of our memory buffer.

We would like to maintain the invariant that


m = O(n), preferably with push and pop each
taking only O(1) amortized time.
Idea:

Expand buffer (to size 2m) if n = m.
Contract buffer (to size m/2) if n < m/2.
This ensures that m/2 ≤ n ≤ m.
What about the running time of push and pop?

Block Expansion and Contraction


The first idea is no good, since push and pop can take
Θ(n) amortized time.
New idea:
Expand buffer (to size 2m) if n = m.
Contract buffer (to size m/2) if n < m/4.
This ensures that m/4 ≤ n ≤ m.

The accounting method shows that push


and pop run in only O(1) amortized time.
When we expand, we need m units of credit.
When we contract, we need m/4 units of credit.

Potential Functions
A potential function provides a somewhat formulaic way to
perform amortized analysis.
It's really just another way of looking at the accounting
method.
Express the total amount of credit present in our data structure
using a non-negative potential function Φ of the state of our
data structure.
Example: for the memory allocation problem, our
potential function is:

  Φ = 2n − m    if n ≥ m/2
  Φ = m/2 − n   if n < m/2

If n = m/2, then Φ = 0. No credit right after expansion or contraction.
If n = m, then Φ = m. Just enough credit to expand!
If n = m/4, then Φ = m/4. Just enough credit to contract!

Potential Functions
Required properties of a potential function:
It should start out initially at zero (no credit initially).
It should be nonnegative (can't go into debt).

Some notation:
Let c_1, c_2, ..., c_k denote the actual cost (running time) of
each of k successive invocations of some operation.
Let Φ_j denote the potential function value right after the
jth invocation.

The amortized cost a_j of the jth operation is now:

  a_j = c_j + (Φ_j − Φ_{j−1})

where c_j is the actual cost and (Φ_j − Φ_{j−1}) is the change in
potential (i.e., total credit added or consumed).

Potential Functions : Example

Recall a_j = c_j + (Φ_j − Φ_{j−1}), with

  Φ = 2n − m    if n ≥ m/2
  Φ = m/2 − n   if n < m/2

Amortized cost of push:
Without expansion: a_j = c_j + (Φ_j − Φ_{j−1}) = 1 + 2 = 3.
(contributes 2 units of potential)
With expansion: a_j = c_j + (Φ_j − Φ_{j−1}) = 1 + m + (−m) = 1.
(draws m units of potential to pay for expansion)

Amortized cost of pop:
Without contraction: a_j = c_j + (Φ_j − Φ_{j−1}) = 1 + 1 = 2.
(contributes 1 unit of potential)
With contraction: a_j = c_j + (Φ_j − Φ_{j−1}) = 1 + m/4 + (−m/4) = 1.
(draws m/4 units of potential to pay for contraction).

Amortized Running Times as


Upper Bounds
Recall: a_j = c_j + (Φ_j − Φ_{j−1}), where c_j is the actual cost and
(Φ_j − Φ_{j−1}) is the change in potential (i.e., total credit added or
consumed).

Over a sequence of k operations (using Φ_0 = 0 and Φ_k ≥ 0):

  Σ_j a_j = Σ_j (c_j + Φ_j − Φ_{j−1}) = (Σ_j c_j) + Φ_k − Φ_0 ≥ Σ_j c_j


Therefore, over any sequence of operations, the
total amortized running time gives us an upper
bound on the total actual running time (as we
expected!)

Designing Potential Functions


Potential function should reflect the amount of credit present
in a data structure.
Prior to an expensive operation, the potential should be high
enough to pay for the operation (and potential should
reduce accordingly after the operation completes).
For simple data structures, it's easy to design a potential
function by summing up the total per-element credits we
would get from the accounting method.
Examples:
Length-n memory buffer: Φ = 20·(# elements in buffer).
Min-queue: Φ = # elements in the incoming min-stack (on the left).
Incremental priority queue: Φ = Σ_e key(e).

Designing and Using Potential Functions


Some potential functions look substantially more
complicated than the ones we've seen so far.
However, they still correspond to the total per-element
credit from the accounting method, so it's easy to
construct a potential function if accounting works.

Although it is more or less equivalent to the
accounting method, potential functions give us a
widely-accepted formulaic means of performing
amortized analysis:
State the potential function.
Show that it's zero initially and always nonnegative.
Then use a_j = c_j + (Φ_j − Φ_{j−1}) to compute the amortized
running time of each operation.

Lecture 5. Priority Queues, Binary Heaps

CpSc 838: Advanced


Data Structures
Clemson University
Fall 2006

Priority Queues
In a simple FIFO queue, elements exit in the
same order as they enter.
In a priority queue, the element with highest
priority is always the first to exit.
Many uses:
Scheduling: Manage a set of tasks, where you always
perform the highest-priority task next.
Sorting: Insert n elements into a priority queue and
they will emerge in sorted order.
Complex Algorithms: For example, Dijkstra's shortest
path algorithm is built on top of a priority queue.

Priority Queues
All priority queues support:
Insert(e, k) : Insert a new element e with key k.
Remove-Min : Remove and return the element with minimum key.

In practice (mostly due to Dijkstra's
algorithm), many support:
Decrease-Key(e, k) : Given a pointer to element e within the heap,
reduce e's key by k.

Some priority queues also support:


Increase-key(e, k) : Increase e's key by k.
Delete(e) : Remove e from the structure.
Find-min : Return a pointer to the element with minimum key.

Redundancies Among Operations


Given insert and delete, we can implement
increase-key and decrease-key.
Given decrease-key and remove-min, we
can implement delete.
Given find-min and delete, we can
implement remove-min.
Given insert and remove-min, we can
implement find-min.

Storing Elements Outside a Heap


Most priority queues lack the ability to find an
element quickly given its key.
Hence, if we need to implement decrease-key,
increase-key, or delete:
these operations require a pointer to an element's
record within the heap data structure.
Elements themselves are typically stored outside the
data structure.
Each record in the data structure maintains a pointer to
the actual location of the element it represents, and
each physical element maintains a complementary
pointer to its record in the data structure.
This is a common approach in many data structures.

Storing Elements Outside a Heap


(figure: the priority queue data structure on one side and the actual
elements of data e1 ... e7, say stored in an array, on the other, with
a pointer in each direction between every element and its record)

Each record in the data structure keeps a pointer to the physical element of data
it represents, and each element of data maintains a pointer to its corresponding
record in the data structure. It is easy to keep these pointers up to date.

Priority Queue Implementations


There are many simple ways to implement a
priority queue:
                                  insert      remove-min
  Unsorted array or linked list   O(1)        O(n)
  Sorted array or linked list     O(n)        O(1)
  Binary heap (today's lecture)   O(log n)    O(log n)

Note that in the comparison-based model,
either insert or remove-min must run in
Ω(log n) time. Otherwise we could sort
faster than O(n log n).

The Binary Heap


An almost-complete binary tree (all levels full except the
last, which is filled from the left side up to some point).
Satisfies the heap property: for every element e,
key(parent(e)) ≤ key(e).
Minimum element always resides at root.

Physically stored in an array A[1..n].
Easy to move around the array in a tree-like fashion:
Parent(i) = floor(i/2)
Left-child(i) = 2i
Right-child(i) = 2i + 1

  A: 2 5 3 9 8 6 5 13 14 10    (actual array representation in memory)

(figure: the same heap drawn as an almost-complete binary tree,
the mental picture, with 2 at the root)

The Binary Heap


You can also build a max heap where the
maximum element resides at the root (if large key
value corresponds to high priority).
The heap property is common to many different
data structures we'll see (e.g., Cartesian trees).
Binary heaps are popular due to their simplicity of
implementation:
Elements are stored in nothing more than an n-element
array, even though they represent a more sophisticated
tree.
The almost-complete property of the tree is crucial for
allowing this -- with most other trees, we need to
maintain pointers from elements to their parents and
children.

Implementing Operations Using


Sift-Up and Sift-Down
All binary heap operations are built from the two
fundamental operations sift-up and sift-down:
sift-up(i) : Repeatedly swap element A[i] with its parent
as long as A[i] violates the heap property with respect to
its parent (i.e., as long as A[i] < A[parent(i)]).
sift-down(i) : As long as A[i] violates the heap property
with one of its children, swap A[i] with its smallest child.

Both operations run in O(log n) time since the


height of an n-element heap is O(log n).
In CLRS and elsewhere, sift-down is often known
as heapify.

Implementing Operations Using


Sift-Up and Sift-Down
The remaining operations are now easy to
implement in terms of sift-up and sift-down:

insert : place the new element in A[n+1], then sift-up(n+1).
remove-min : swap A[1] and A[n], remove the old minimum (now
in A[n]), then sift-down(1).
decrease-key(i, k) : decrease A[i] by k, then sift-up(i).
increase-key(i, k) : increase A[i] by k, then sift-down(i).
delete(i) : swap A[i] with A[n], remove A[n], then sift-up(i), sift-down(i).

All of these clearly run in O(log n) time.


General idea: modify the heap, then fix any
violation of the heap property with one or two calls
to sift-up or sift-down.
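A sketch of sift-up, sift-down, and the two main operations built from them (Python; the array is kept 1-indexed with A[0] unused so the index arithmetic matches the slides, the names are illustrative, and the heap is assumed nonempty when remove-min is called):

    class BinaryHeap:
        def __init__(self):
            self.A = [None]                         # A[1..n]; A[0] unused

        def sift_up(self, i):                       # swap upward while A[i] < its parent
            while i > 1 and self.A[i] < self.A[i // 2]:
                self.A[i], self.A[i // 2] = self.A[i // 2], self.A[i]
                i //= 2

        def sift_down(self, i):                     # swap downward with the smaller child
            n = len(self.A) - 1
            while 2 * i <= n:
                c = 2 * i
                if c + 1 <= n and self.A[c + 1] < self.A[c]:
                    c += 1
                if self.A[i] <= self.A[c]:
                    break
                self.A[i], self.A[c] = self.A[c], self.A[i]
                i = c

        def insert(self, k):                        # place in A[n+1], then sift-up
            self.A.append(k)
            self.sift_up(len(self.A) - 1)

        def remove_min(self):                       # swap A[1] and A[n], shrink, sift-down
            self.A[1], self.A[-1] = self.A[-1], self.A[1]
            m = self.A.pop()
            if len(self.A) > 1:
                self.sift_down(1)
            return m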

Building a Binary Heap


We could build a binary heap in O(n log n)
time using n successive calls to insert.
Interesting note: if elements are inserted in
random order, then the running time drops to O(n)
with high probability!

Another way to build a heap: start with our n


elements in arbitrary order in A[1..n], then
call sift-down(i) for each i from n down to 1.
Remarkable fact #1: this builds a valid heap!
Remarkable fact #2: this runs in only O(n) time!

Bottom-Up Binary Heap Construction


The key property of sift-down is that it fixes an
isolated violation of the heap property at the root:
(figure: a node whose left and right subtrees are already valid
heaps; the only possible heap-property violations are between this
node and its children, and sift-down repairs exactly that)

Using induction, it is now easy to prove that our


bottom-up construction yields a valid heap.

Bottom-Up Binary Heap Construction


To analyze the running time of bottom-up
construction, note that:
At most n elements reside in the bottom level of the heap. Only 1
unit of work done to them by sift-down.
At most n/2 elements reside in the 2nd lowest level, and at most 2
units of work are done to each of them.
At most n/4 elements reside in the 3rd lowest level, and at most 3
units of work are done to them.

So total time T = n + 2(n/2) + 3(n/4) + 4(n/8) + ···


(for simplicity, we carry the sum out to infinity, as
this will certainly give us an upper bound).
Claim: T = 4n = O(n)

Shifting Technique for Infinite Sums

    T    = n + 2(n/2) + 3(n/4) + 4(n/8) + ···
  − T/2  =       n/2  + 2(n/4) + 3(n/8) + ···
  -------------------------------------------
    T/2  = n +   n/2  +   n/4  +   n/8  + ···

Applying the same trick again (writing T as twice the line above):

    T    = 2n + n + (n/2) + (n/4) + ···
  − T/2  =      n + (n/2) + (n/4) + ···
  -------------------------------------
    T/2  = 2n

so T = 4n = O(n).

Laziness and Amortization:


Heaps of Heaps
Given that we can build a heap (a batch insert of n
elements) in only O(n) time, perhaps we can
reduce the amortized running time of insert to
O(1)
Idea: be lazy!
Maintain an unordered list of inserted elements.
When remove-min is called, build a heap on this list.
We therefore end up with a collection of low-level
heaps, one for every sequence of insert operations
between successive calls to remove-min.
Maintain a top-level heap on the roots of all of these
heaps.

Laziness and Amortization:


Heaps of Heaps
(figure: a top-level heap containing one entry for the root of each
low-level heap; below it, the low-level heaps themselves; and an
unordered list of elements inserted since the last remove-min,
which is built into a new low-level heap when remove-min is called)

Heapsort
Any priority queue can be used to sort. Just use n
inserts followed by n remove-mins.
The binary heap gives us a particularly nice way
to sort in O(n log n) time, known as heapsort:

Start with an array A[1..n] of elements to sort.


Build a heap (bottom up) on A in O(n) time.
Call remove-min n times.
Afterwards, A will end up reverse-sorted (it would be
forward-sorted if we had started with a max heap)

Heapsort compares favorably to


Merge sort, because heapsort runs in place.
Randomized quicksort, because heapsort is
deterministic.
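A sketch of heapsort as just described (Python; 0-indexed here for brevity, and using a min-heap, so the array ends up reverse-sorted exactly as the slide notes):

    def sift_down(A, i, n):                 # restore the heap property within A[0..n-1]
        while 2 * i + 1 < n:
            c = 2 * i + 1                   # pick the smaller of the two children
            if c + 1 < n and A[c + 1] < A[c]:
                c += 1
            if A[i] <= A[c]:
                break
            A[i], A[c] = A[c], A[i]
            i = c

    def heapsort(A):                        # in place; min-heap => descending output
        n = len(A)
        for i in range(n - 1, -1, -1):      # bottom-up build, O(n) time
            sift_down(A, i, n)
        for end in range(n - 1, 0, -1):     # n-1 remove-mins, each O(log n)
            A[0], A[end] = A[end], A[0]     # move the current minimum to the back
            sift_down(A, 0, end)
        return A                            # e.g. heapsort([5, 2, 8, 1]) -> [8, 5, 2, 1]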

Lecture 6. Mergeable Heaps

CpSc 838: Advanced


Data Structures
Clemson University
Fall 2006

Mergeable Heaps : Motivation


A mergeable (meldable) heap supports all
fundamental priority queue operations as well as:
merge(h1, h2) : merge heaps h1 and h2 into a single heap.
(the individual heaps h1 and h2 are destroyed in the process).

The binary heap doesn't seem to support merge
any faster than in O(n) time.
Why study mergeable heaps?
They illustrate some elegant data structure design
techniques.
Mergeable heaps are the first step along the road to
more powerful comparison-based priority queues. For
example, we'll soon see how a Fibonacci heap can
perform decrease-key in only O(1) amortized time,
thereby speeding up Dijkstra's shortest path algorithm!

Heap-Ordered Trees
Suppose we store our priority queue in
a heap-ordered binary tree.
Not necessarily an almost-complete tree though, so
we can't encode it succinctly in a single array as with
the binary heap.
Each node maintains a pointer to
its parent, left child, and right child.
The tree is not necessarily
balanced. It could conceivably
be nothing more than a single
sorted path.
(figure: a small heap-ordered binary tree with root 2)

All You Need is Merge


Suppose we can merge two heap-ordered trees in O(log n) time.
All priority queue operations are now easy
to implement in O(log n) time!
insert: merge with a new 1-element tree.
remove-min: remove root, merge left & right subtrees.
decrease-key(e): detach e's subtree, decrease e's key,
then merge it back into the main tree.
delete: use decrease-key + remove-min.
increase-key: use delete + insert.

Merging Two Heap-Ordered Trees


(Recursive Viewpoint)
Take two heap-ordered trees h1 and h2, where h1 has the
smaller root.
Clearly, h1's root must become the root of the merged tree.
To complete the merge, recursively merge h2 into either
the left or right subtree of h1:
(figure: h1 with root 2; h2 is recursively merged into either the left
subtree L or the right subtree R of h1)

As a base case, the process ends when we merge a heap


h1 with an empty heap, the result being just h1.

Merging Two Heap-Ordered Trees


(Null Path Merging Viewpoint)
A null path is a path from the root of a tree down to an
empty space at the bottom of the tree.
Given specific null paths in h1 and h2, it's easy to merge h1
and h2 along these paths.
The keys along a null path are a sorted sequence.
Merging along null paths is like merging two sorted sequences.
This process is also equivalent to the recursive merging process
from the previous slide.
(figure: two heaps h1 and h2, each with a null path from its root
down to an empty space at the bottom highlighted)

Merging Two Heap-Ordered Trees


(Null Path Merging Viewpoint)
(figure: h1 and h2 merged along their chosen null paths to form
merge(h1, h2); the keys on the two null paths are interleaved
exactly as when merging two sorted sequences)

Running Time Analysis


The time required to merge two heaps
along null paths is proportional to the
combined lengths of these paths.
So all we need is a method to find short
null paths and we will have an efficient
merging algorithm.
Note that every n-node binary tree has a
null path of length O(log n).
There are many ways to find short null
paths, each of which leads us to a different
mergeable heap data structure

The Randomized Mergeable Binary Heap


Perhaps the simplest possible idea: choose null
paths at random!
I.e., starting from the root, repeatedly step left or right, each
with probability 1/2.

In terms of our recursive outlook for merging h1


and h2 (h1 having the smaller root) this
corresponds to the following simple procedure:
Flip a fair coin.
If heads, recursively merge h2 into h1s left subtree.
If tails, recursively merge h2 into h1s right subtree.

Remarkably, this trivial procedure merges any two


heaps in O(log n) time with high probability!
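A sketch of this randomized merge on a simple pointer-based representation (Python; the Node class and names are illustrative, and insert/remove-min can be built from merge exactly as on the earlier slide):

    import random

    class Node:
        def __init__(self, key):
            self.key = key
            self.left = None
            self.right = None

    def merge(h1, h2):                      # returns the root of the merged heap
        if h1 is None: return h2            # base case: merging with an empty heap
        if h2 is None: return h1
        if h2.key < h1.key:                 # keep the smaller root on top
            h1, h2 = h2, h1
        if random.random() < 0.5:           # flip a fair coin
            h1.left = merge(h1.left, h2)    # heads: merge into the left subtree
        else:
            h1.right = merge(h1.right, h2)  # tails: merge into the right subtree
        return h1

For example, insert is merge(h, Node(k)), and remove-min returns the root's key and replaces the heap by merge(root.left, root.right).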

Definition : With High Probability


We say an algorithm with input size n runs in
O(log n) time with high probability if, for any
constant c, we can find another constant k such
that
Pr[running time > k log n] 1 / nc.
That is, the probability we fail to run in O(log n)
time is at most 1 / nc, for any constant c of our
choosing (as long as we choose a sufficiently
large hidden constant in the O(log n) notation).
Well discuss and motivate this definition in
more detail later in the course.

Reducing Problem Size by


a Constant Fraction per Iteration
Suppose we have an algorithm for which:
We start with a problem of size n.
In every iteration, the effective size of the
problem is reduced to a constant fraction of its
original size.

Then, it's easy to see that our algorithm
must perform only O(log n) iterations.
Prototypical example: binary search.

The Randomized Reduction Lemma


Suppose we have an algorithm for which:
We start with a problem of size n.
In every iteration, the effective size of the
problem is reduced to a constant fraction of its
original size with some constant probability.

Then, our algorithm performs only O(log n)


iterations with high probability.
We call this the randomized reduction
lemma, and it will be one of the main tools
we use for analyzing randomized algorithms
and data structures.

Applying the Randomized


Reduction Lemma
Suppose we build a null path in an arbitrary heap-ordered
binary tree by walking downward at
random from the root.
In each step (say, at element e), one of e's left or
right subtrees must contain fewer than 1/2 of the
total number of elements in e's subtree.
So with probability 1/2, each step reduces the number of
nodes in our current subtree by a factor of at least 2.
Hence, a randomly-built null path has length O(log n)
with high probability.

On a randomized mergeable binary heap, merge


and all other priority queue operations therefore
run in O(log n) time with high probability.

Size-Augmented Mergeable Heaps


Let's try to remove the randomness from
our preceding approach:
Augment each element with the number of elements in its subtree.
Now we can easily find a null path of length O(log n) by repeatedly
stepping to whichever child has a smaller size.
Each step reduces the size of our current subtree by at least a
factor of 2.
(figure: a heap-ordered tree with each element augmented by its
subtree size)

Updating Augmented Information


After Merging
When we merge two heaps, we walk back up the
merge path and update augmented information.
(figure: two size-augmented heaps are merged, and the subtree
sizes stored along the merge path are recomputed on the way
back up)

Trouble with Decrease-Key


Recall: decrease-key(e) removes the subtree
rooted at element e, decreases e's key, and then
merges the result back into the main tree.
When we remove e's subtree (say it contains p elements), we need
to update the augmented size information along the path
from the root down to e, subtracting p from each ancestor's size.
However, if e is deep in the
tree, this can be too expensive!
This also affects delete and
increase-key, since these are
built using decrease-key.
(figure: removing a subtree of p elements forces a -p update at
every ancestor of e)

Augmenting with Null Path Lengths


The null path length of element e, npl(e), is the
shortest distance from e down to an empty space
at the bottom of e's subtree.
Suppose we augment every element e in our
heap with npl(e).
Since npl(root) = O(log n), we can find a null path
of length O(log n) by repeatedly
stepping to a child with the
smaller null path length.
This allows us to merge
in O(log n) time.
(figure: a heap-ordered tree with each element labeled by its null
path length)

Updating Augmented Information


Just as with size-augmented mergeable heaps:
We can update null path length information after a
merge by walking back up the merge path.
The decrease-key(e) operation is hard to accommodate
quickly if e resides deep within the tree. The same goes for
increase-key and delete.

So both size-augmented and npl-augmented


heaps support insert, remove-min, and merge in
O(log n) time in the worst case, but the other
operations are more difficult.

Leftist Heaps
The leftist property says that npl(left(e)) ≥ npl(right(e)) for every
element e in our tree.
A leftist heap is a heap-ordered leftist tree. Each element in a leftist
heap is augmented with its null path length.
The shortest null path in a leftist tree is its right spine.
And the right spine always has length O(log n).

So we can merge two leftist heaps in O(log n) time by merging their


right spines together:
(figure: two leftist heaps h1 and h2 merged along their right spines
to form merge(h1, h2))

Restoring the Leftist Property


After merging, we walk back up the right spine of the
merged heap
recalculating npl(e) for each element e.
and swapping the left and right subtrees of any element that
now violates the leftist property.

Therefore, merge (hence insert and remove-min) all take


O(log n) time on a leftist heap.
The same problem exists with decrease-key, delete, and
increase-key as with size-augmented and npl-augmented
heaps, though.
In fact, the leftist heap is really nothing more than an npl-augmented
heap where we always treat the child of
smaller npl as the left child.


Skew Heaps
A relaxed variant of leftist heaps.
O(log n) amortized performance for all
operations.
Skew heaps and their relatives, splay trees,
are called self-adjusting data structures
since they manage to adjust their structure
according to simple local update rules that
do not require us to maintain any
augmented information (subtree sizes, null
path lengths, etc.) within the tree!

Skew Heaps
Merge along right spines, just as with leftist
heaps.
After merge, walk up the right spine of the
merged heap and swap the left and right
subtrees of all nodes (except the lowest).
Remarkably, this keeps the tree sufficiently
leftist that merge requires only O(log n)
amortized time.
All other priority queue operations, including
decrease-key, delete, and increase-key, run
in O(log n) amortized time on a skew heap.
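A sketch of the skew-heap merge (Python, reusing the same Node representation as the randomized-heap sketch earlier; this is the common recursive formulation, which merges down the right spines and swaps children as the recursion unwinds back up):

    def skew_merge(h1, h2):
        if h1 is None: return h2
        if h2 is None: return h1
        if h2.key < h1.key:                     # keep the smaller root on top
            h1, h2 = h2, h1
        h1.right = skew_merge(h1.right, h2)     # merge along the right spine...
        h1.left, h1.right = h1.right, h1.left   # ...then swap children on the way up
        return h1

No subtree sizes, null path lengths, or other augmented fields are maintained, which is exactly what makes the structure self-adjusting.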


Skew Heaps : Amortized Analysis


We call an element right-heavy if its right
subtree contains more elements than its left
subtree. Otherwise, it is left-heavy.
The right spine of a binary tree can contain
at most O(log n) left-heavy elements.
When walking down the right spine, every time
we pass a left-heavy element our current
subtree size drops to at most 1/2 of its current size.

All right-heavy elements will have an extra


$1 credit on them.

Skew Heaps : Amortized Analysis


Consider some merge operation.
Let L denote the length of the right spine of our
merged heap (so the actual running time of the
merge operation was O(L)).
As we walk up the right spine swapping children,
Right-heavy elements are free since we can use their
$1 credits (these elements become left-heavy after
swapping).
Left-heavy elements cost us 2 units each (since they
become right-heavy after swapping and hence need a
new $1 credit attached to them). However, there are at
most O(log n) of these on the right spine!


Lecture 7. Binomial Heaps

CpSc 838: Advanced


Data Structures
Clemson University
Fall 2006

Review : Null Path Mergeable Heaps


Many ways to merge binary heaps along null
paths in an efficient fashion:

Randomized mergeable binary heaps


Size-augmented mergeable binary heaps
Npl-augmented mergeable binary heaps
Leftist heaps
Skew heaps

Of these, the randomized and skew variants are
our favorites, since they don't require us to
maintain any augmented information.
And they also easily accommodate decrease-key,
delete, and increase-key in an efficient fashion.

Binomial Heaps : Motivation


What's missing?
Skew heaps have O(log n) amortized performance.
Randomized mergeable binary heaps have O(log n)
performance with high probability.

In this lecture, we'll discuss an elegant mergeable
priority queue (not based on null path merging)
called a binomial heap that can perform all
operations in O(log n) worst-case time.
Later, we'll extend the binomial heap to a
Fibonacci heap, which supports decrease-key in
only O(1) amortized time.

The Binomial Heap : Tree Shapes


A binomial heap is built from a collection of
heap-ordered trees (not necessarily binary)
that come in particular shapes:

(figure: heap-ordered binomial trees of rank 3, rank 2, rank 1, and
rank 0)

Building a Tree of Rank k From


Trees of Smaller Ranks
Take one tree each of ranks 0 through k − 1, and make
them all children under a common root:
(figure: trees of rank 2, 1, and 0 placed under a new root to form a
rank-3 tree)

Alternatively, take two trees of rank k − 1 and make one a
child of the root of the other:
(figure: two rank-2 trees joined to form a rank-3 tree)

The Binomial Heap : Tree Properties


A tree of rank k:
Has a root with k children.
Has height k.
Contains exactly 2^k elements.
Contains C(k, d) elements (that is, "k choose d" elements)
at depth d. This is how binomial heaps get their name,
since C(k, d) = k! / ((k−d)!·d!) is the coefficient of x^d in the
expansion of the binomial (1 + x)^k.

Trees are stored by maintaining along with each


element a pointer to its parent, first and/or last
child, and previous / next siblings (so the children
of an element are connected by a linked list).

The Binomial Heap


A binomial heap on n elements consists of a
linked list of trees, at most one of each rank:

(figure: a 13-element binomial heap consisting of one tree each of
rank 3, rank 2, and rank 0)

Which trees are present in this list corresponds
precisely to the binary representation of n.
In this case n = 13 = 1101 in binary.

The Binomial Heap


Since a rank-k tree contains 2^k elements, log n is
the maximum rank of any tree in an n-element
binomial heap.
If there were a tree of rank higher than log n, then it
would contain more than n elements!

So our list of trees contains at most log n entries.
(This agrees with the fact that the binary representation
of n consists of about log n digits.)

Since our trees are all heap-ordered, we


can find the minimum element by scanning
the root list in O(log n) time.

All You Need is Merge


Suppose we can merge two binomial heaps in
O(log n) time.
Now its easy to implement both insert and
remove-min in O(log n) time each!
insert : Merge with a new 1-element binomial heap.
remove-min: Find the minimum element e by scanning the root
list. Delete e, and merge the binomial heap comprised of e's
children back into the main heap.
(figure: the minimum root e is removed, and its children -- which
themselves form a binomial heap! -- are merged back in)

Remaining Operations
For decrease-key, we use sift-up.
Since height = rank, all trees have height at most O(log n).
Hence, sift-up (and therefore decrease-key) runs in O(log n) time.
Note: sift-up doesn't change the shape of our trees.
Also note: sift-up isn't affected by the fact that our trees aren't
binary (sift-down would run slower in this case).

For delete, use decrease-key + remove-min.


For increase-key, use delete + insert.
Once again, all we need to do is implement
merge in O(log n) time

Merge and Binary Addition


Consider merging two binomial heaps h1 and h2 of sizes
n1 and n2 to obtain a heap of size n = n1 + n2.
Recall: the configuration of trees in an n-element
binomial heap reflects the binary representation of n.
The merge operation therefore corresponds to the
process of adding the binary representations of n1 and n2
to obtain the binary representation of n:
  n1 = 5 = 101 (binary)
  n2 = 6 = 110 (binary)
  n  = 11 = 1011 (binary)

(figure: the trees of the two heaps lined up by rank like the columns
of a binary addition, with carries where both heaps contain a tree of
the same rank)

Merge and Binary Addition


Scan the root lists from right to left simultaneously in both
input heaps, moving trees into the merged heap.
If we encounter a rank-k tree in both of the input heaps,
we join the two together in O(1) time into a rank-(k+1)
tree (just like a carry operation in binary addition).
Total running time: O(log n).
(figure: the same addition 5 + 6 = 11 in binary; wherever both
heaps contain a rank-k tree, the two are joined into a single
rank-(k+1) tree, just like a carry)

Building a Binomial Heap


Like a binary heap, we can build a binomial heap
in only O(n) time.
Appropriately implemented, the insert operation
in a binomial heap takes O(1) amortized time.
It's just like incrementing a binary counter (from the
first homework assignment)!

Next Lecture: improving the Binomial heap to a


Fibonacci heap, supporting insert, merge, and
decrease-key all in O(1) amortized time, and all
remaining operations in O(log n) amortized time.

Lecture 8. Fibonacci Heaps

CpSc 838: Advanced


Data Structures
Clemson University
Fall 2006

Starting Point: The Binomial Heap


A binomial heap on n elements consists of a
linked list of trees, at most one of each rank:
(figure: a binomial heap consisting of trees of rank 3, rank 2, and
rank 0; no rank-1 tree is present)

The root list contains at most O(log n) trees,
since the maximum possible rank is log n.
Recall that a tree of rank k contains 2^k elements.

Insert and Merge in O(1) Amortized Time


Suppose we allow multiple trees of the same rank
in the root list.
This allows us to insert and merge in O(1) time:
insert : add a new rank-0 tree to the root list.
merge : link together the root lists of two heaps.

When we call remove-min, we first clean up the


root list using a consolidate procedure that joins
equal-rank trees together until at most one tree of
each rank remains.
Afterwards, the root list contains O(log n) trees, and
the rest of remove-min takes O(log n) time.

Consolidate
Let R denote the number of roots in the root list.
Create an array of log n buckets B[0] ... B[log n].
In O(R) time, scan the root list and deposit each
tree of rank k into bucket B[k].
Afterwards, B[k] contains a list of all trees of rank k.

In O(R + log n) time:
For k = 0 to log n,
While bucket B[k] contains ≥ 2 trees,
Remove 2 trees from B[k], join them to form a rank-(k+1)
tree, and deposit this new tree in B[k+1].

Finally, in O(log n) time, form a new root list using
the contents of B[0] ... B[log n].

Total time: O(R + log n).
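A sketch of consolidate (Python; the tree representation is abstracted away: each root is assumed to carry a rank field, and link(t1, t2) is an assumed O(1) helper that joins two equal-rank trees into one tree of the next rank and returns its root):

    import math

    def consolidate(roots, n, link):
        # O(log n) buckets; the factor 2 is slack so the sizing also covers the
        # somewhat larger ranks that appear once decrease-key cuts are allowed.
        num_buckets = 2 * max(1, int(math.log2(max(n, 2)))) + 3
        bucket = [[] for _ in range(num_buckets)]
        for t in roots:                     # O(R): deposit each tree by rank
            bucket[t.rank].append(t)
        for k in range(num_buckets - 1):    # O(R + log n): join equal-rank trees
            while len(bucket[k]) >= 2:
                t1, t2 = bucket[k].pop(), bucket[k].pop()
                bucket[k + 1].append(link(t1, t2))
        return [t for b in bucket for t in b]   # new root list, at most one tree per rank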

Consolidate : Amortized Analysis


Consolidate takes O(R + log n) time to clean up a
root list with R entries.
Let's make sure we have a $1 credit attached to
each root.
Use these credits to pay off the O(R) part of
consolidate's running time.
Consolidate needs to place $1 of new credit on the
roots after it's finished, but this only costs an extra
O(log n).
Insert is charged an extra $1 as well.

So consolidate runs in O(log n) amortized time.


Therefore, so does remove-min.

Summary of our Current Progress


Our relaxed binomial heap now has the
following performance:

insert, merge : O(1) amortized


remove-min : O(log n) amortized
decrease-key : O(log n) worst-case (using sift-up)
delete, increase-key : O(log n) amortized

Next: decrease-key in O(1) amortized time.


delete, and increase-key will stay O(log n) amortized
since they involve a call to remove-min.

In the future: any operation that changes the root


list needs to be careful to ensure that $1 credit
remains attached to all roots.

A New Idea for Decrease-Key


decrease-key(e) :
Detach e's subtree from its parent.
Insert it in the root list (pay $1 extra to add the necessary
credit to e, since now it's a root).
Decrease e's key.
(figure: the subtree rooted at e, whose key is 5, is cut from its
parent and added to the root list with $1 credit attached; e's key is
then decreased from 5 to 1)

This only takes O(1) amortized time. Are we


missing something?

Ramifications of Removing Subtrees


With the binomial heap, trees come in very
specific shapes, one for each rank.
If we allow removals of arbitrary subtrees, we
might end up with trees of all shapes and sizes.
So we need to define rank more carefully now. We'll
say the rank of a tree is the # of children of its root.

Previously:
  A tree of rank k contains exactly 2^k elements
  => maximum possible rank = log2 n
  => root list contains at most O(log n) trees
  => remove-min takes O(log n) amortized time.
No longer true if we allow subtree removals!

How to Fix Things


We'll make sure that we don't remove too much
mass from any tree due to decrease-key.
As a result, we'll be able to show that a rank-k
tree must contain at least φ^k elements, where
φ = (1 + √5) / 2 ≈ 1.618 is the so-called golden
ratio.
Therefore:
  A tree of rank k contains at least φ^k elements
  => maximum possible rank = log_φ n
  => root list contains at most O(log n) trees
  => remove-min takes O(log n) amortized time.

Cascading Cuts
We allow at most one child to be removed from
every non-root element.
Let e be some non-root element. If a call to
decrease-key removes one of e's children, we
mark e to remember that it has lost one child.
We also pay an extra $2 to leave on e, so every
marked element will have $2 credit on it.
(figure: roots carry $1 credit each, and the marked non-root
element e carries $2 credit)

Cascading Cuts
If a call to decrease-key removes a child from a
marked element e, we then detach e as well and
insert it into the root list.
Since e will become a root, we remove its mark.
Leave $1 credit on e, since every root needs $1.
We also need to mark e's former parent, since it has
lost a child.
(figure: e is detached and added to the root list with $1 credit; its
former parent becomes marked and receives $2 credit)

Cascading Cuts
If a call to decrease-key removes a child from a
marked element e, we then detach e as well and
insert it into the root list.
Since e will become a root, we remove its mark.
Leave the $1 credit on e, since every root needs $1.
We also need to mark e's former parent, since it has
lost a child. If e's former parent was already marked,
however, we detach it as well, and then mark / detach
e's former grandparent, and so on.
The resulting series of subtree detachments is called a
cascading cut.
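A minimal Python sketch of decrease-key with cascading cuts, tying the pieces above together. The node fields (key, parent, marked) and the helpers H.cut(e) (detach e's subtree and append it to the root list, placing the $1 credit) and H.update_min(e) are assumptions of this sketch, not the lecture's exact interface.

    def decrease_key(H, e, new_key):
        e.key = new_key
        p = e.parent
        if p is not None and e.key < p.key:
            H.cut(e)                     # move e (now unmarked) to the root list
            e.marked = False
            # cascade: each marked ancestor that just lost a child is cut too
            while p.parent is not None and p.marked:
                grandparent = p.parent
                H.cut(p)
                p.marked = False
                p = grandparent
            if p.parent is not None:     # a non-root parent lost a child: mark it
                p.marked = True
        H.update_min(e)                  # e may be the new minimum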

Cascading Cuts : Amortized Analysis


A cascading cut of length L costs 2L + 2 units:
L units to actually perform the cut.
L units of credit to leave on the L newly-added roots ($1 per root).
2 units of credit ($2) to leave on the marked element (if any) at the
endpoint of the cascading cut.

Fortunately, the cascading cut gets rid of L marks,
allowing us to draw $2L units of credit to pay for the cost
of the cut.
The amortized cost of a cascading cut is therefore just 2
units, which we simply charge to the decrease-key
operation initiating the cut.
Using potential functions:
Φ = (# roots) + 2 (# marked elements).

Back to Subtree Sizes


With cascading cuts, we finally have an O(1)
amortized running time for decrease-key.
In addition, each non-root element can have at
most one child removed.
Let Sk denote the minimum possible number of
elements in a tree of rank k.
We claimed earlier that Sk ≥ φ^k.
We now prove this by showing that Sk ≥ Fk+2,
where Fk is the kth Fibonacci number.
It is well known that Fk+2 ≥ φ^k.

Fibonacci Numbers
Recursively defined:
F0 = 0
F1 = 1
Fk = Fk-1 + Fk-2, for k ≥ 2.

First few terms: 0, 1, 1, 2, 3, 5, 8, 13, …

Another useful recursive characterization:
Fk+2 = 1 + F0 + F1 + F2 + … + Fk for k ≥ 0.
This is easy to prove by induction on k:
Fk+2 = Fk + Fk+1 = Fk + (1 + F0 + F1 + … + Fk-1)

Properties of Subtree Sizes


Sk : minimum # of elements in a rank-k tree.
Consider a tree of rank k with root e.
Let x1, x2, …, xk denote the k current children of e, in the
order in which they were linked to e.
Claim #1: rank(xj) ≥ j − 2.
Proof:
rank(xj) = rank(e) ≥ j − 1 right before xj became e's child.
Since that time, xj might have lost at most 1 child.

Claim #2: Sk ≥ 2 + S0 + S1 + … + Sk-2.
Proof:
Let size(xj) denote the number of elements currently in xj's subtree.
According to Claim #1, size(x1) ≥ 1, size(x2) ≥ S0, size(x3) ≥ S1, etc.
In e's subtree, we thus have at least 1 + 1 + S0 + S1 + … + Sk-2 elements
(the two 1's count e itself and x1's subtree).

Properties of Subtree Sizes


Fk : kth Fibonacci number.
Fk+2 = 1 + F0 + F1 + F2 + … + Fk.

Sk : minimum # of elements in a rank-k tree.
Sk ≥ 2 + S0 + S1 + … + Sk-2.
Claim: Sk ≥ Fk+2
Proof: by induction on k:
Sk ≥ 2 + S0 + S1 + … + Sk-2
   ≥ 2 + F2 + F3 + … + Fk
   = 1 + F0 + F1 + F2 + F3 + … + Fk = Fk+2
(using F0 + F1 = 1).

Lecture 9. Binary Search Trees

CpSc 838: Advanced


Data Structures
Clemson University
Fall 2006

Dictionary Data Structures


Last week we saw many data structures for
implementing priority queues.
In the next several lectures we turn our attention to
a new class of data structures: dictionaries.
A dictionary maintains a set of elements with
associated keys, supporting the operations:
find(k) : return a pointer to the element having key k, or
indicate that such an element does not exist.
insert(e, k) : insert a new element e with key k.
delete(e) : delete element e, given a pointer to e.
To delete an element given its key: delete(find(k)).

Dictionary Data Structures


Dictionaries are probably the most common
type of data structure found in practice.
Any time we want to maintain a collection
of data records in a fashion that supports
fast searching, we need a dictionary.
E.g., student records (key = name or ID #)
E.g., library catalog (key = author, ISBN, etc.)

If we want the ability to search on several


different keys, we can cross-reference our
data in several dictionary data structures
simultaneously.

Binary Search Trees


Today we introduce the binary search tree
(BST), a remarkably versatile data structure
that can efficiently implement all the
fundamental dictionary operations, and
much, much more...
In a few weeks, we will also see several
relatives of the BST, including skip lists
and B-trees. We'll also learn about hash
tables, another popular alternative for
implementing a dictionary.

Binary Search Trees


A BST is a binary tree satisfying the binary
search tree property: for a node with key k, all
elements in its left subtree have keys ≤ k, and all
elements in its right subtree have keys > k.
(Figure: a node with key k, its left subtree of keys ≤ k, and its right subtree of keys > k.)

Each node typically maintains a pointer to


its parent, left child, and right child.

Example
(Figure: an example BST on the keys −1, 4, 11, 12, 15, 17, 20, 21, 23, 25, with 17 at the root.)

Fundamental Operations
Most BST operations can be implemented in a simple
recursive fashion. For example:
Find(T, k):
if T == NULL, then return NULL.
if k == T.key, then return T.
if k < T.key, then return Find(T.left, k).
else return Find(T.right, k).
Insert(T, e, k):
if T == NULL, then return NewNode(e, k).
if k < T.key, then T.left = Insert(T.left, e, k).
else T.right = Insert(T.right, e, k).
return T.
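For concreteness, here is a runnable Python version of the recursive Find and Insert above. The Node class and its field names are choices made for this sketch, not prescribed by the lecture.

    class Node:
        # A minimal BST node: key, element payload, child and parent links.
        def __init__(self, key, elem=None):
            self.key, self.elem = key, elem
            self.left = self.right = self.parent = None

    def find(t, k):
        # Mirrors Find(T, k) above.
        if t is None:
            return None
        if k == t.key:
            return t
        return find(t.left, k) if k < t.key else find(t.right, k)

    def insert(t, key, elem=None):
        # Mirrors Insert(T, e, k); returns the (possibly new) subtree root.
        if t is None:
            return Node(key, elem)
        if key < t.key:
            t.left = insert(t.left, key, elem)
            t.left.parent = t
        else:
            t.right = insert(t.right, key, elem)
            t.right.parent = t
        return t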

Fundamental Operations
Let's postpone discussion of delete for a
moment...
All fundamental BST operations run in O(h)
time, where h is the height of the tree.
We say a BST is balanced if h = O(log n),
so operations on a balanced BST run in
O(log n) time.
We'll discuss techniques for maintaining
balance starting with the next lecture.

Tree Traversals
We often enumerate the contents of an n-element BST in
O(n) time using an inorder tree traversal:
Inorder(T):
if T == NULL, then return.
Inorder(T.left).
print T.key.
Inorder(T.right).
Other types of common traversals:
Preorder: root, left subtree, right subtree.
Postorder: left subtree, right subtree, root.
Eulerian: root, left subtree, root, right subtree, root.
(sequence of nodes encountered when we walk around a BST)

Sorting with a BST


An inorder traversal prints the contents of
an n-element BST in sorted order in O(n)
time.
Therefore, we can use BSTs to sort: insert n
elements, then do an inorder traversal.
Since the BST is a comparison-based data
structure, this means insert must run in
Ω(log n) time in the worst case; otherwise
we could sort faster than O(n log n) in the
comparison model.

Min and Max


Its easy to locate the minimum and
maximum elements in a BST in O(h) time.
For the minimum, start at the root and walk
left as long as possible:
minimum

Vice-versa for the maximum

Predecessor and Successor


The pred(e) operation takes a pointer to e
and locates the element immediately
preceding e in an inorder traversal.
If e has a left child, then pred(e) is the
maximum element in e's left subtree.
(Figure: pred(e) is the rightmost node in e's left subtree.)

Otherwise, how do we find pred(e)?

Predecessor and Successor


If e has no left child, then pred(e) is the first left
parent we encounter when walking from e up to
the root:
pred(e)

The successor operation, succ(e), is analogous


and completely symmetric.
Both pred and succ take O(h) time.
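A Python sketch of pred(e) covering both cases above, using the Node class with parent pointers from the earlier sketch; succ(e) is completely symmetric.

    def subtree_max(t):
        # Rightmost (maximum) node of the subtree rooted at t.
        while t.right is not None:
            t = t.right
        return t

    def pred(e):
        # Inorder predecessor of node e (None if e is the minimum).
        if e.left is not None:                 # case 1: max of e's left subtree
            return subtree_max(e.left)
        p = e.parent                           # case 2: first "left parent" above e
        while p is not None and e is p.left:
            e, p = p, p.parent
        return p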

Delete
It's easy to delete an element with zero or
one children.
To delete an element e with two children,
first swap e with either pred(e) or succ(e),
then proceed to delete e.
Note that if e has two children, then pred(e) and
succ(e) can each have at most one child.
Also note that replacing e with pred(e) or
succ(e) is OK according to the BST property.

Using min combined with delete, we can


implement a priority queue using a BST.
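A recursive delete-by-key sketch along the lines described above (swap with the predecessor when there are two children). It assumes distinct keys and, for brevity, does not maintain parent pointers; delete(find(k)) in the lecture's pointer-based interface is equivalent.

    def delete(t, k):
        # Returns the new root of the subtree after deleting key k.
        if t is None:
            return None
        if k < t.key:
            t.left = delete(t.left, k)
        elif k > t.key:
            t.right = delete(t.right, k)
        else:
            if t.left is None:
                return t.right               # zero or one child: splice t out
            if t.right is None:
                return t.left
            p = t.left                       # two children: find pred(t) ...
            while p.right is not None:
                p = p.right
            t.key, t.elem = p.key, p.elem    # ... copy it into t ...
            t.left = delete(t.left, p.key)   # ... then delete the predecessor
        return t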

Using pred and succ for Inexact Searches


In addition to pred(e) and succ(e), which take
pointers to elements, we could implement:
pred(k) : find the element with largest key ≤ k.
succ(k) : find the element with smallest key ≥ k.

This allows us to perform inexact searches: if an
element with key k isn't present, we can at least
find the closest nearby elements in the BST.
This is one reason a BST is so powerful. Other
dictionary data structures, notably hash tables,
don't support pred / succ and cannot perform
inexact searches.

Paging Through Nearby Elements


Starting from any element e, we can use
pred / succ to enumerate elements near e
very quickly.
Prototypical application: library catalog.
After searching for an author's name, you are
presented with an alphabetized list in which you
can scroll through nearby names.

Can also answer range queries: output all


elements whose keys are in the range [a, b].
Start at the element e = succ(a). Then
repeatedly call succ(e) until you reach b.

Amortized Behavior of pred and succ


pred and succ each take O(h) time in the worst
case, which could be as bad as O(n) in a very
unbalanced tree.
However, if we start at the minimum element in a
BST and repeatedly call succ to enumerate the
contents of the tree in sorted order, this takes only
O(n) total time.
So O(1) amortized time per call to succ.
This follows the exact same path as an in-order
traversal!

If we call succ to enumerate all k elements in the


range [a, b], this takes O(h + k) time, so its O(1)
amortized plus a fixed cost of O(h).
Why O(h + k)?

Augmenting a BST
We often extend the power of a data structure by
augmenting it with extra information.
Allows for more powerful query operations.
But be careful that update operations can still maintain
augmented data in an efficient fashion.

Example: augment each element with a pointer to


its predecessor, successor, subtree min and max.
pred/succ, and min/max now take O(1) worst-case time.
Its also easy to update insert and delete so they
maintain our augmented data without harming their
O(h) worst-case running times.

Augmenting with Subtree Sizes


A very common type of augmentation is to
have each element maintain the total # of
elements in its subtree.
(Figure: a BST in which each node is labeled with the number of elements in its subtree.)
Augmenting with Subtree Sizes


It's easy to keep this information up to date
when new elements are inserted.
(Figure: inserting a new element increments the subtree size of every node along the insertion path.)

Augmenting with Subtree Sizes


It's easy to keep this information up to date
when new elements are inserted...
and deleted.
(Figure: deleting an element decrements the subtree size of every node along the deletion path.)
Augmenting with Subtree Sizes


A BST augmented with subtree sizes can
support two additional useful operations:
select(k) : return a pointer to the kth smallest
element in the tree (k = 1 is the min, k = n is the
max, and k = n/2 is the median).
rank(e) : given a pointer to e, return the
number of elements with keys ≤ e's key
(that is, e's index within an inorder traversal).

How can we implement both of these
operations in O(h) worst-case time?

Select and rank


Select(T, k):
r = size(T.left) + 1.
(r = rank of the root within T)
if k = r, then return T.
if k < r, then return Select(T.left, k).
else, return Select(T.right, k − r).
Rank(e):
Add up the total # of elements lying to the left of the search path
(the yellow elements in the original figure) as we walk from e up to the root.
(these are all the elements less than e)
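In Python, assuming each node carries a size field equal to its subtree size (plus the parent pointers from the earlier sketches), select and rank might look like this sketch:

    def select(t, k):
        # k-th smallest element (1-indexed) in the subtree rooted at t.
        r = (t.left.size if t.left else 0) + 1     # rank of the root within t
        if k == r:
            return t
        return select(t.left, k) if k < r else select(t.right, k - r)

    def rank(e):
        # Number of keys <= e.key, i.e. e's position in an inorder traversal.
        r = (e.left.size if e.left else 0) + 1
        while e.parent is not None:
            if e is e.parent.right:                # count the parent and its
                r += (e.parent.left.size if e.parent.left else 0) + 1
            e = e.parent                           # left subtree when we come
        return r                                   # up from a right child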

Order Statistic Trees


The element of rank k in a set is also called
the kth order statistic of the set.
Accordingly, the CLRS textbook calls a BST
augmented this way an order statistic tree.
This brings us to one of the other main uses
of a BST: implementing a dynamic
sequence

12

Dynamic Sequences
Suppose we use an array or linked list to encode
a sequence of n elements.
We can insert/delete at the endpoints of the sequence
in O(1) time,
But insertions/deletions in the middle of the sequence
take O(n) worst-case time.

Using a balanced BST augmented with subtree


sizes, we can perform all the following operations
in O(log n) time:
Insert / delete anywhere in the sequence.
Access or modify any element by its index in the
sequence.

Dynamic Sequences
The BST property is slightly different when we
use a BST to encode a sequence A[1..n]:
the node storing A[i] has A[1..i-1] (all elements
preceding A[i]) in its left subtree and A[i+1..n]
(all elements following A[i]) in its right subtree.

Elements no longer have keys, and we no longer
use find. Rather, we rely on select to access
elements based on their index within the sequence.

13

Dynamic Sequences : Example


(Figure: a 13-node BST encoding the sequence 3 1 4 1 5 9 2 6 5 3 5 8 9, with each node augmented by its subtree size.)

Dynamic Sequences
Note that an inorder traversal prints the contents
of our sequence in order, and succ and pred allow
us to move between successive elements (as in a
linked list).
You might want to think of a BST as a structure
that fundamentally encodes an arbitrary sequence
from left to right. In the case of a dictionary, this
sequence happens to be the sorted ordering of
the elements we are storing.
Next lecture: Balancing a BST

14

Lecture 10. Balanced Binary


Search Trees

CpSc 838: Advanced


Data Structures
Clemson University
Fall 2006

Balancing a BST
A BST is balanced if its height h = O(log n).
(so all operations run in O(log n) time).
There are many ways to modify the insert and
delete operations of a BST to keep it balanced:
Worst-case mechanisms: h = O(log n) always.
AVL trees, BB[α] trees, red-black trees

Amortized mechanisms: starting with an empty tree,
any sequence of m operations takes O(m log n) time.
splay trees, batch rebalancing methods

Randomized mechanisms: always balanced with high
probability. randomly-balanced BSTs, treaps

There are also several data structures that give
the same performance as a balanced BST but
are not binary (B-trees) or not trees (skip lists).

Worst-Case Balancing Mechanisms


Today well focus on worst-case methods that
always maintain balance, so every operation runs
in O(log n) time in the worst case.
All of these mechanisms work by storing some
augmented information in our tree, and using this
information after insert or delete to restructure the
tree in a more balanced fashion.
We'll see three examples today:
AVL trees
Red-black trees
Bounded-balance (BB[α]) trees

Rotations
Almost all balancing mechanisms use
rotations to restructure a BST.
(Figure: a right rotation about the edge (x, y) and its inverse left rotation; the subtrees A, B, C keep their inorder order.)
A single rotation takes O(1) time and
preserves the BST property.

Rotations : Preserving Augmented Data


We often augment the nodes in a binary
search tree with extra information.
For example, to use the rank / select operations or to maintain a
dynamic sequence, we augment nodes with subtree sizes.

We must take care to update any such


augmented information after a rotation.
General rule: If you can compute the augmented data at a node x
in O(1) time using only the augmented data at the left and right
children of x, then it's easy to update augmented data after any
rotation in only O(1) time.
For most augmentations we consider, this is true.
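A sketch of a right rotation that also repairs subtree-size augmentation, following the general rule above. Field names are from the earlier Node sketch; the caller is responsible for re-attaching the returned node to the old parent's child pointer.

    def rotate_right(y):
        # Right rotation about the edge (x, y), where x = y.left.
        # Returns x, the new root of this subtree.
        x = y.left
        y.left = x.right
        if x.right is not None:
            x.right.parent = y
        x.right = y
        x.parent, y.parent = y.parent, x
        # Repair subtree sizes bottom-up (y first, then x): each node's size
        # depends only on its children's sizes, per the general rule above.
        for node in (y, x):
            if hasattr(node, "size"):
                node.size = 1 + (node.left.size if node.left else 0) \
                              + (node.right.size if node.right else 0)
        return x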

AVL (Height-Balanced) Trees


AVL: Adelson-Velskii and Landis, 1962.
A binary tree is height balanced if:
Its left and right subtrees differ in height by at most 1, and
Its left and right subtrees are themselves height balanced.

It's easy to show that height-balanced trees are
balanced (proof on next slide).
An AVL tree is a height-balanced tree where
every node is augmented with the height of its
subtree.
Note: easy to keep this info up to date after a rotation.

After each insertion or deletion, we restore the


height balance property by performing O(log n)
carefully chosen rotations.

Height Balanced ⇒ Balanced

Claim: A height-balanced tree of height h contains ≥ Fh
nodes, where Fh is the hth Fibonacci number.
Recall that Fh = Θ(φ^h), where φ = (1 + √5) / 2. Therefore,
an n-element height-balanced tree can have height at
most O(log n).
Easy proof by induction:
Consider any arbitrary height-balanced tree of height h.
Without loss of generality, suppose its left subtree is tallest.
By induction, the left subtree contains ≥ Fh-1 elements.
By induction, the right subtree contains ≥ Fh-2 elements.
So the total # of elements is ≥ 1 + Fh-1 + Fh-2 = 1 + Fh.
Don't forget the base cases:
A tree of height 0 contains 1 element (F0 = 0).
A tree of height 1 contains ≥ 2 elements (F1 = 1).

Restoring the Height Balance Property


After an Insertion
After an insert, walk back up the insertion path and increment
augmented subtree heights.
Stop at the first node (z in the figure below) at which we encounter a
height-balancing violation (a height imbalance of exactly 2).
(Figure: the insertion path from the newly inserted element up to z, the first node at which we find a violation of the height-balance property.)
Using 1 or 2 rotations, we will be able to rebalance our tree at z.
The height of z's subtree will decrease by 1 to its original value, so we
don't need to visit the remaining nodes on the insertion path above z.

Restoring the Height Balance Property


After an Insertion
Let's suppose we have an imbalance at node z due to an
insertion that makes z's left subtree too tall (the same
argument will apply to the right subtree as well).
Easy case: Insertion into subtree A makes it too tall.
(Figure: a right rotation at z rebalances the tree when the newly-inserted element lies in subtree A.)
Harder case: Insertion into subtree B makes it too tall...

Restoring the Height Balance Property


After an Insertion
Harder case: Subtree B too tall. There are now 2 possible
subcases, depending on whether the new element fell into BL or into BR.
(Figure: the two subcases, with x = z's child and y = x's child whose subtree B is split into BL and BR.)
In both cases, a left rotation about the edge (x, y) brings
us back to the previous case (with A being too tall).
So 2 total rotations are needed for these cases.

Restoring the Height Balance Property


After a Deletion
If a new element is inserted, we walk up the tree
and perform at most 1 or 2 rotations at the first
point of imbalance. Then we stop.
For deletion, we need to continue walking upward
all the way to the root, performing 1 or 2 rotations
at every imbalanced node we encounter along the
deletion path (so O(log n) total rotations).
These rotations are quite similar in structure to the
cases we've seen for insert. We omit further
details.
Don't worry: in this class you won't need to memorize
all the different cases! It's the high-level ideas that are
important.

Bounded-Balance (BB[α]) Trees

An n-element binary tree is α-balanced if:
Its left and right subtrees each contain ≥ αn elements, and
Its left and right subtrees are themselves α-balanced.

For any 0 < α < 1/2, an α-balanced tree with n
elements has height O(log n).
Very simple proof: Walk down the tree starting from the root. The
number of elements in our current subtree shrinks to at most (1 − α)
times its original value in each step, so at most log_{1/(1−α)} n steps are possible.

Like the AVL tree, we can restore the α-balance
property after insertion or deletion using at most
O(log n) rotations (if α ≤ 1 − √2 / 2 ≈ 0.293).
Further details omitted...

Red-Black Trees
Color each node of a tree red or black.
Attach dummy leaf nodes below each leaf element, so
all our elements are stored as internal nodes in the tree.
A BST is a red-black tree if:
The children of a red element are all black.
All leaves are black.
Every root-leaf path contains the
same number of black elements
(we call this the black height
of the tree).

As before, these properties guarantee that a tree is


balanced, and we can maintain them in O(log n) time after
any insert or delete.
Lots of messy special cases though!

Red-Black Property ⇒ Balance

Claim: A red-black tree of black height h contains ≥ 2^h
elements (so the black height of an n-element tree is ≤ log n).
Easy proof by induction:
Case 1. Root is red: both subtrees have black height h, so each
contains ≥ 2^h elements. Total # elements ≥ 1 + 2^(h+1).
Case 2. Root is black: both subtrees have black height h − 1, so each
contains ≥ 2^(h−1) elements. Total # elements ≥ 1 + 2^h.

Finally, the height of a red-black tree is at most twice its
black height, since every red element has black children.

Maintaining the Red-Black Property


After an Insertion
When we insert a new element e, we color it red.
If parent(e) happens to be black, wonderful! We don't
violate any of the red-black properties.
So let's assume parent(e) is red, violating the rule that red
elements must have black children.
If uncle(e) is red, then we can recolor and push the
violation higher in the tree:
(Figure: recolor parent(e) and uncle(e) black and the grandparent red, possibly creating a new red-red violation higher in the tree.)

Maintaining the Red-Black Property


After an Insertion
OK, so what if parent(e) is red but uncle(e) is black?
First, if e is not a left child of parent(e), make this so by performing a
left rotation with parent(e).
(Figure: a left rotation at parent(e) makes e a left child.)
Then do a right rotation and some recoloring.
(Figure: a right rotation at the grandparent, followed by recoloring, removes the red-red violation.)

Maintaining the Red-Black Property


Since our tree is balanced, it takes O(log n) extra time to
insert a new element (potentially O(log n) work recoloring
and up to 2 rotations).
Deletion involves even more special cases than insertion,
but also takes just O(log n) extra time (details omitted).
As you can see, it seems that a multitude of messy cases
is the price we pay for maintaining balance at all times (so
operations take O(log n) worst-case time).
In the next few lectures, well see several much simpler
balancing mechanisms (with far fewer or no special
cases) that give either O(log n) amortized performance, or
O(log n) performance with high probability.

Lecture 11. B-Trees

CpSc 838: Advanced


Data Structures
Clemson University
Fall 2006

Summary
The following balancing mechanisms give us
O(log n) worst-case running times for all
fundamental BST operations:

AVL trees (height-balanced)
BB[α] trees (size-balanced)
Red-black trees
B-trees (today)

B-trees are somewhat similar to the preceding
methods, except they aren't binary trees.
Balancing a B-tree is fairly simple, and doesn't
involve so many special cases.

The B-Tree : Structure


Each node in a binary tree stores 1 key and each
non-leaf node has 1 or 2 children.
In a B-tree, each node stores between B − 1 and 2B − 1
keys and has between B and 2B children.
(Figure: a B-tree node storing several keys, with a child pointer on either side of each key.)
The root is special, and has no lower limits. It
could store a single key.
B = 2 is a common special case called a 2-3-4
tree, since each node has 2, 3, or 4 children.

The B-Tree : Structure


All leaves have the same depth, so the total tree
height is O(log_B n).
(Figure: an example B-tree on the keys 12, 17, 20, 25, 30, 32, 34, 40, 41, 47, 50, 57, 72, with all leaves at the same depth.)
All operations on a B-tree (e.g., insert, delete,
find, pred, succ, min, max, rank, select) can be
easily implemented in O(B log_B n) time.

Block Memory Transfers


Most memories are hierarchical in structure:
(Figure: cache, main memory, disk; each successive level is larger but slower.)
Between levels, data is often transferred in large
blocks (say, 1K at a time).
Often the true performance of a data structure is
determined by the number of blocks it accesses.
A B-tree is ideally suited for this setting (if we
know the block transfer size).

Finding an Element
Scan the root node sequentially to find the
appropriate child pointer, then recursively
search this child subtree.
O(B log_B n) in the worst case.
If we assume B = O(1), this is O(log n).
If we have a model where all we care about is
memory accesses, then only O(log_B n) time!
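A minimal sketch of B-tree search in Python. The node layout (a sorted keys list, and a children list with one more entry than keys, empty at a leaf) is an assumption of this example, not the lecture's exact representation.

    import bisect

    def btree_find(node, k):
        # Binary search within the node (a linear scan is also O(B)),
        # then descend into the appropriate child.
        i = bisect.bisect_left(node.keys, k)
        if i < len(node.keys) and node.keys[i] == k:
            return node
        if not node.children:            # leaf: key is not present
            return None
        return btree_find(node.children[i], k)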

Other Operations
Its easy to implement pred, succ, min, and
max all in O(B logB n) time.
Similar to their BST counterparts.
Can support pred(e) / succ(e) as well as
pred(k) / succ(k), so can do inexact search and
page through nearby results, just like with a
BST.

If we augment nodes in a B-tree with


subtree sizes, we can also support rank
and select in O(B logB n) time.

Insert and Delete


Insertions and deletions always take place
in leaf nodes.
To ensure we delete from a leaf node, we
may need to swap first with our
predecessor or successor (just like with a
BST when we delete a node with 2
children).
The concern: insertion might make a node
too large, or deletion might make a node
too small.

Sharing With Your Siblings


Rotations allow us to donate or steal
elements from a sibling.
(Figure: a left or right rotation moves a key from one sibling up into the parent and a key from the parent down into the adjacent sibling.)
A rotation can be implemented in O(1) time,
although O(B) time is usually also OK.

Splitting a Node after Insertion


Insertion of a new element into a leaf node might give the
leaf node too many keys (2B of them).
If so, we can split the leaf and donate its median element
to the parent.
(Figure, B = 3: a leaf with 2B keys splits into two nodes with B and B − 1 keys, and its median key moves up into the parent.)
This might give the parent too many elements, causing it
to split, and then the grandparent, etc.
We could have also donated to a sibling, if possible...

Joining Two Nodes After Deletion


Deletion of an element might give a leaf node too few
keys (B − 2 of them).
Try to fix this by stealing from a sibling (via a rotation).
If this fails, then we steal 1 element from our parent and
join with an adjacent sibling.
(Figure, B = 4: a leaf with B − 2 keys joins with an adjacent sibling, pulling one key down from the parent; the joined node has 2B − 2 keys.)
This might give the parent too few elements, causing it to
join, and then the grandparent, etc.

Insert and Delete : Summary


Insert and delete both take O(B log_B n) total time.
Note that we can easily preserve simple augmented
data like subtree heights / sizes during the process.

Insert might result in a chain of splits that
propagates up the tree.
If the root is split, this is the only case where a B-tree
can increase in height.

Delete might result in a chain of joins that
propagates up the tree.
If the root is consumed by joining its two children, this
is the only case where a B-tree can decrease in height.

Equivalence with Red-Black Trees


Interesting bit of trivia: a 2-3-4 tree is
essentially equivalent to a red-black tree!
(Figure: each 2-3-4 tree node corresponds to a small cluster of red-black nodes, with one black node and up to two red children.)

Cache-Oblivious Tree Layout


Lets return to the external memory model
where our data structure resides on disk or
in some other slow block-transfer media.
B-trees are often used in this setting, since
the bottleneck is the # of block accesses.
Want to choose B so that a single node fits
exactly in a memory block.
But what if we dont know the block transfer
size? How should we choose B?

Lecture 12. Splay Trees

CpSc 838: Advanced


Data Structures
Clemson University
Fall 2006

Introduction
So far, we've learned about the following
balancing mechanisms:
AVL trees (height-balanced)
BB[α] trees (size-balanced)
Red-black trees
B-trees

All of these provide deterministic O(log n) worst-case
performance guarantees for all fundamental
operations.
Today we'll study a remarkable type of balanced
BST called a splay tree that provides O(log n)
amortized guarantees.

Static Optimal BSTs


Consider the problem of building a fixed, static
BST on a set of n elements.
If we build a balanced BST, then subsequent
accesses take O(log n) time.
However, suppose we know in advance that
some elements will be accessed with much
higher frequency than others.
Problem: Given the anticipated access frequency
of each element, build a BST that will execute
these accesses in minimum total time.
For example, if an element is accessed very
frequently, it probably belongs near the root.

Static Optimal BSTs : Example

An average-case access only examines 3.44 nodes, even though most of the
elements require 4 or 5 node examinations to reach!

Static Optimal BSTs


Static Optimal BST Problem: Given the anticipated
access frequency of each element, build a BST that will
execute these accesses in minimum total time.
Equivalently, minimize the expected, or average, access time per
element.

One can actually solve this problem in O(n^3) time using
dynamic programming (take CpSc 840 for more detail!)
O(n^2) time if we're very clever.

In real life, however, we often don't know the frequency
with which elements will be accessed.
Natural question: can we design a BST that adjusts its
structure appropriately (to try to minimize average access
time) if it notices that some elements are being accessed
more frequently than others?

A Few Ideas
Idea #1: Whenever element e is accessed, rotate
e with its parent, so e moves one step closer to
the root.
(Figure: a single rotation moves the accessed element e one level closer to the root.)
Over time, it seems like this should move
frequently-accessed elements toward the top of
the tree.
Does this look like a good approach?

A Few Ideas
If we take two elements e and p = parent(e) deep
within the tree and access e and p in alternation,
neither one moves closer to the root over time!
Idea #2: When we access an element, flip a
coin. If heads: rotate one step closer to the root.
Breaks symmetry in the bad case above.
Leads to some promising ideas, which we'll discuss
later along with skip lists.

Idea #3: When we access an element, rotate it all
the way up to the root.

Rotate to Root : Bad Case


Start with a path and access elements from
bottom to top:
6
5
4
3
2
1

Rotate to Root : Bad Case


Start with a path and access elements from
bottom to top:
1

5
4

2
1

2
3
4
5

Then repeat. Average access time: (n)

Rotate to Root : An Improvement?


Idea #4: When an element is accessed,
rotate it up to the root as before, but using
more sophisticated double rotations:
If the element is one step below the root, rotate it up to
the root.
If the element is in line with its parent and
grandparent, rotate the parent first, then the element.
Otherwise, rotate the element up two steps.

Double Rotations

Double Rotations : Example


Consider the path example that was bad for
single rotations.
(Figure: a left path on the elements 7, 6, 5, 4, 3, 2, 1.)
Again, suppose we access all elements in
sequence from lowest to highest.

Double Rotations : Example

Consider the path example that was bad for
single rotations.
(Figure: after accessing the bottom element with double rotations, the depth of the remaining path is roughly halved.)
We essentially halve the length of the path,
thereby making the tree more balanced!

Splay Trees
Splay(e) : Move e to the root using double
rotations.
A splay tree is a BST in which we splay an
element every time it is accessed:
find(e) : Find e as usual, then splay(e).
insert(e) : Insert e as usual, then splay(e).
delete(e) : Discussed in a moment...

A splay tree is called a self-adjusting tree, since
it continually modifies its structure according to
simple local update rules that do not depend on
any augmented information stored within the tree.
The skew heap is another self-adjusting data structure.
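A bottom-up splay sketch in Python using the three double-rotation cases just described. It assumes nodes with parent pointers, as in the earlier BST sketches; rotate_up is a helper introduced here for the example.

    def rotate_up(x):
        # Rotate x above its parent p, preserving the BST property and
        # fixing the grandparent's child pointer.
        p, g = x.parent, x.parent.parent
        if p.left is x:                  # x is a left child: right rotation
            p.left = x.right
            if x.right is not None:
                x.right.parent = p
            x.right = p
        else:                            # x is a right child: left rotation
            p.right = x.left
            if x.left is not None:
                x.left.parent = p
            x.left = p
        p.parent, x.parent = x, g
        if g is not None:
            if g.left is p:
                g.left = x
            else:
                g.right = x

    def splay(x):
        # Move x to the root of its tree; returns x (the new root).
        while x.parent is not None:
            p, g = x.parent, x.parent.parent
            if g is None:
                rotate_up(x)                         # one step below the root
            elif (g.left is p) == (p.left is x):
                rotate_up(p)                         # in line: rotate the parent
                rotate_up(x)                         # first, then the element
            else:
                rotate_up(x)                         # otherwise: rotate the
                rotate_up(x)                         # element up two steps
        return x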

Split and Join


Splay trees easily support the extended BST
operations split and join:
split(T, k) : Split the BST T into two BSTs, one
containing key k and the other keys > k.
join(T1, T2) : Take two BSTs T1 and T2, the keys in T1
all being less than the keys in T2, and join them into a
single BST.

On a splay tree:
split(T, k) : Find the element e of key k. Then splay e
to root and remove its right subtree.
join(T1, T2) : Splay the maximum element in T1 to the
root. Then attach T2 as its right subtree.

Insert and Delete Using Split and Join


If we can split and join easily, then we can
also insert and delete easily:
insert(T, e) : split T on es key into T1 and T2.
Then make e the root, with left subtree T1 and
right subtree T2.
delete(e) : replace e with the join of its left and
right subtrees.

On a splay tree, this is how we implement


delete (insert is usually done by inserting
as in a normal BST then splaying to root).

Splay Trees : Performance


Remarkable property: all operations on a
splay tree run in O(log n) amortized time!

So a splay tree magically stays balanced


(in an amortized sense), even though it
maintains no augmented information to
help it do so!

Splay Trees : Performance


Static Optimality Theorem:
Suppose we build a static optimal BST, T, for some
access sequence S with known element access
frequencies. Let X be the total amount of time T
spends processing S.
A splay tree will spend only O(X) time processing S,
even though it doesnt know the access frequencies!

Dynamic Optimality Conjecture: For any


access sequence S, a splay tree spends only
O(X) time, where X is the best possible
processing time for any dynamic BST (rotations
allowed) that knows S in advance.
Resolving this conjecture is one of the biggest current
open problems in the field of data structures today.

Lecture 13. Amortized Analysis


of Splay Trees

CpSc 838: Advanced


Data Structures
Clemson University
Fall 2006

Board Image #1

Board Image #2

Lecture 14. Randomly-Balanced BSTs

CpSc 838: Advanced


Data Structures
Clemson University
Fall 2006

Randomly-Built BSTs
If we build a BST on n elements by inserting them in
random order, then with high probability every call to
insert will take O(log n) time.
Therefore, with high probability:
Each element will have depth O(log n).
The entire tree will have depth O(log n).
The entire tree will take O(n log n) time to build.
Randomized quicksort runs in O(n log n) time!!!

Recall: a call to insert takes O(log n) time with high
probability (w.h.p.) if
Pr[insert fails to take O(log n) time] ≤ 1 / n^k
for any constant k > 0 of our choosing.

Randomized Quicksort
To quicksort an array A[1..n],
Select a pivot element p.
In linear time, partition A
into two blocks: elements ≤ p
and elements > p.
Recursively sort both blocks.

A common way to select
the pivot is to choose an
element of the array
uniformly at random.
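A short Python sketch of randomized quicksort as described above (a list-building version for clarity, rather than the in-place partition).

    import random

    def quicksort(a):
        # Random pivot, three-way partition, recurse on both sides.
        if len(a) <= 1:
            return a
        p = random.choice(a)
        less    = [x for x in a if x < p]
        equal   = [x for x in a if x == p]
        greater = [x for x in a if x > p]
        return quicksort(less) + equal + quicksort(greater)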

Quicksort and BST Construction


There is a direct analog 3 7
between quicksort and
the process of building 1 2
a BST.
1
Hence, both have the
same running times.
Randomized quicksort
is analogous to building
a BST by inserting elements
in random order.

12 14 15 13

13 14 15

13

15

= random choices of pivots


4

Two Useful Tools for Analyzing


Randomized Algorithms
Nearly all of the randomized algorithms and data
structures we encounter in this course can be
easily analyzed using a combination of two
powerful but simple tools:
The randomized reduction lemma (introduced
previously): Starting with a problem of size n, if each
iteration of an algorithm has some constant probability
of reducing our effective problem size to a
constant fraction of its original size, then
the algorithm will take O(log n) iterations w.h.p.
The union bound: If an algorithm spends O(log n)
time w.h.p. on a single generic input element, then it
spends O(log n) time/element on each of its n input
elements w.h.p.

Unions of Events
If A and B are probabilistic events (sets of outcomes of some
probabilistic experiment), then A ∪ B is the event that either
A or B occurs, or both.
Pr[A ∪ B] = Pr[A] + Pr[B] − Pr[A ∩ B], as we can see from
the Venn diagram below (for rolling a 6-sided die).
(Venn diagram: A = roll an odd number, B = roll a number > 3.)

The Union Bound


Recall that Pr[A ∪ B] = Pr[A] + Pr[B] − Pr[A ∩ B].
It typically suffices just to use the rough upper
bound Pr[A ∪ B] ≤ Pr[A] + Pr[B].
For multiple events E1 … Ek, this gives us what is
known as the union bound or Boole's inequality:

Pr[E1 ∪ E2 ∪ … ∪ Ek] ≤ Pr[E1] + … + Pr[Ek].

Examples:
If each of 50 parts in a complex machine fails with
probability 1/100, then Pr[entire machine fails] ≤ 1/2.
If Pr[more than O(log n) time spent on element i] ≤ 1 / n^100
for each input element i = 1…n, then our entire algorithm
fails to run in O(n log n) time with probability ≤ 1 / n^99.

Back to Randomly-Built BSTs


Focus on any specific
element e:
One unit of work is spent on e
in each level of the tree.
In each successive level, the
size of the current subproblem
containing e shrinks to at most 2/3 its
original size with probability at least 1/3.
So by the randomized reduction
lemma, depth(e) = O(log n) w.h.p.
(Figure: the recursion from the previous slide, tracking a single element e.)

Now by applying the union bound,
depth(ei) = O(log n) for all elements e1 … en, w.h.p.
So tree depth = O(log n) w.h.p.
Build time (and runtime of randomized quicksort) = O(n log n) w.h.p.

Maintaining Randomness
If we build a BST at random on n elements, then
with high probability it will be balanced.
However, subsequent calls to insert and delete
(which are not random!) might cause the tree to
become unbalanced.
Remarkably, we can fix this by doing some
carefully chosen random rotations after each
insert and delete so the tree is always in a state
that is as if it was just randomly built from
scratch.
This leads to a collection of simple randomized
balancing mechanisms that provide O(log n)
performance guarantees with high probability.
9

Randomly-Structured BSTs
Well say a BST is randomly-structured if:
Every element is equally likely to be at the root
The left and right subtrees of the root are themselves
randomly-structured.

As weve seen, a randomly-built BST will be


randomly-structured, and a randomly-structured
BST is balanced with high probability.
Well investigate two simple ways to maintain the
randomly-structured property after insertions and
deletions:
Treaps
Randomly-balanced BSTs.
10

Treaps
(Figure: a treap; each node is labeled with its heap key and its BST key.)
A treap is a binary tree in which each node contains two
keys, a heap key and a BST key.
It satisfies the heap property with respect to the heap
keys, and the BST property with respect to the BST keys.
The BST keys will store the elements of our BST; we'll
choose heap keys as necessary to assist in balancing.

Treaps
If heap keys are all distinct, then there is only one
valid shape for a treap. (why?)
If we choose heap keys at random, then our treap
will be randomly-structured!
What about insert and delete?
insert : Insert the new element as a leaf using the standard
BST insertion procedure (so the BST property is satisfied).
Assign the new element a random heap key. Then restore
the heap property (while preserving the BST property) using
sift-up implemented with rotations.
delete : Give the element a heap key of +∞, sift it down
(again using rotations, to preserve the BST property) to a
leaf, then delete it.
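A sketch of treap insertion in Python: insert as in a plain BST, draw a random heap key, and rotate the new node upward while it violates the heap property (a min-heap on the random priorities is assumed here). The class and field names are choices made for this example.

    import random

    class TreapNode:
        def __init__(self, key):
            self.key = key                  # BST key
            self.prio = random.random()     # random heap key
            self.left = self.right = None

    def treap_insert(t, key):
        # Returns the new root of this subtree.
        if t is None:
            return TreapNode(key)
        if key < t.key:
            t.left = treap_insert(t.left, key)
            if t.left.prio < t.prio:        # heap violation: right rotation
                x = t.left
                t.left = x.right
                x.right = t
                return x
        else:
            t.right = treap_insert(t.right, key)
            if t.right.prio < t.prio:       # heap violation: left rotation
                x = t.right
                t.right = x.left
                x.left = t
                return x
        return t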

Randomly-Balanced BSTs
A randomly-balanced BST gives us yet
another way to preserve the randomly-structured
BST (RBST) property after insertion and deletion.
It's technically equivalent to a treap, but
offers another nice way to think about
randomized balancing.
Let's now take a detailed look at the insert
and delete operations on a randomly-balanced BST.

Randomly-Balanced BSTs : Insertion


To insert an element e into an (n − 1)-element
tree:
With probability 1/n, insert e at the root (insert as
usual, then rotate it up to the root).
Otherwise (with probability 1 − 1/n), recursively insert
into the left or right subtree of the root.

Claim: If we start with a randomly-structured BST
(RBST) and use this procedure to insert a new
element, then the RBST property is preserved.
It's easy to show that in the new tree, every
element has an equal probability of being at the
root, so we only need to show that the RBST
property still holds for both subtrees of the root.
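A sketch of the insertion rule above, assuming nodes carry a size field and a hypothetical insert_at_root helper (an ordinary BST insertion followed by rotating the new element up to the root of the given subtree).

    import random

    def rb_insert(t, key, insert_at_root):
        # With probability 1/(current subtree size + 1), insert key at the
        # root of this subtree; otherwise recurse into a child.
        n = t.size if t is not None else 0
        if random.randrange(n + 1) == 0:
            return insert_at_root(t, key)    # assumed helper, see lead-in
        if key < t.key:
            t.left = rb_insert(t.left, key, insert_at_root)
        else:
            t.right = rb_insert(t.right, key, insert_at_root)
        t.size = n + 1
        return t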
14

Randomly-Balanced BSTs : Insertion


To be even more formal,
Let e be the element we have just inserted,
Let x be a randomly chosen element in the tree
(after insertion), and
Let R denote the event that x is now the root.

Then,
Pr[R] = Pr[R | x = e] Pr[x = e] + Pr[R | x ≠ e] Pr[x ≠ e]
      = (1/n)(1/n) + (1/(n−1)) (1 − 1/n) ((n−1)/n)
      = 1/n,
where 1/(n−1) = Pr[x at root before insertion] and
(1 − 1/n) = Pr[x not replaced by e].

Randomly-Balanced BSTs : Insertion


Simpler Claim: After insertion, both
subtrees of the root are RBSTs.
In the case where we dont insert at the
root (i.e., we insert recursively into the left
or right subtrees), this holds by induction.
So now all we need to show is:
Even Simpler Claim: After insertion of a
new element at the root, both subtrees of
the root are RBSTs.
16

Preserving the RBST Property (For


Subtrees) During Insertion at Root

Consider the insertion of a new element e at the root of an RBST (so e is
inserted as a leaf and rotated to the root).
Assume w.l.o.g. that key(e) ≥ key(r), where r is the old root.
(Figure: after the final rotation, e sits at the root with r below it; the deeper subtrees are RBSTs by induction.)
The subtree rooted at r is an RBST, since r was equally likely to
be any element in its original subtree with key ≤ key(e).

Randomly-Balanced BSTs : Deletion


To delete an element e,
replace e with the
randomized join
of its two subtrees L and R (both RBSTs).

Randomized join takes the two RBSTs
and joins them into a single RBST:

With probability |L| / (|L| + |R|): keep the root of L as the
root of the join; its left subtree Left(L) remains as-is (an RBST),
and its right subtree becomes Join(Right(L), R) (an RBST by induction).

With probability |R| / (|L| + |R|): keep the root of R as the
root of the join; its right subtree Right(R) remains as-is (an RBST),
and its left subtree becomes Join(L, Left(R)) (an RBST by induction).

Randomly-Balanced BSTs : Deletion


To prove that delete produces an RBST, all we need to do
is prove that randomized join produces an RBST.
The subtrees of the joined trees will be RBSTs by
induction, so all we need to do is focus on the root:
Let x be a randomly-chosen element from L ∪ R prior to the join.
Let R be the event that x ends up at the root of the joined tree.
Let EL be the event that x was chosen from L, and ER be the event
that x was chosen from R. Then,
Pr[R] = Pr[R | EL] Pr[EL] + Pr[R | ER] Pr[ER]
      = (1/|L|) (|L|/(|L|+|R|)) (|L|/(|L|+|R|)) + (1/|R|) (|R|/(|L|+|R|)) (|R|/(|L|+|R|))
      = 1 / (|L| + |R|) = 1/n,
where 1/|L| = Pr[x at the root of L before the join] and
|L|/(|L|+|R|) = Pr[the root of L is chosen to become the root of the joined tree]
(and symmetrically for R).

Lecture 15. Skip Lists

CpSc 838: Advanced


Data Structures
Clemson University
Fall 2006

Introduction
A skip list is a simple randomized dictionary data
structure that provides O(log n) w.h.p.
performance guarantees (more or less equivalent
to a randomly-balanced BST).
Very simple to implement and analyze.
Based on linked lists rather than BSTs (often
billed as an alternative to balanced BSTs)
Suppose we store a dictionary as a sorted linked
list. Recall that scanning down the list is the
bottleneck operation (taking O(n) time in the
worst case).
How might we speed this up?
2

Example

We insert a dummy start element (with effective
key value −∞) that is present on all levels.
Define L as the maximum level in the skip list.

Fundamental Operations
To find an element, repeatedly scan right until the
next step would take us too far, then step down.
To insert a new element,
Insert into the level-0 list.
Flip a fair coin. If heads, also insert into the level-1 list,
then flip another fair coin, and if heads again, insert
into level-2 list, etc.

To delete, simply remove an element from every


level on which it exists.
Other operations like pred, succ, min, max, rank,
and select, are easy to implement.
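A compact Python sketch of a skip list supporting find and insert as described above (delete would simply unlink the node from every level it appears on). The class layout is an assumption of this example.

    import random

    class SkipNode:
        def __init__(self, key, level):
            self.key = key
            self.next = [None] * (level + 1)    # next[i] = successor on level i

    class SkipList:
        def __init__(self):
            self.head = SkipNode(float("-inf"), 0)   # dummy start element

        def find(self, k):
            x = self.head
            for i in range(len(self.head.next) - 1, -1, -1):
                while x.next[i] is not None and x.next[i].key < k:
                    x = x.next[i]                # scan right, then step down
            x = x.next[0]
            return x if x is not None and x.key == k else None

        def insert(self, k):
            level = 0
            while random.random() < 0.5:         # flip fair coins for the height
                level += 1
            while len(self.head.next) <= level:  # grow the dummy start element
                self.head.next.append(None)
            node = SkipNode(k, level)
            x = self.head
            for i in range(len(self.head.next) - 1, -1, -1):
                while x.next[i] is not None and x.next[i].key < k:
                    x = x.next[i]
                if i <= level:                   # splice the new node into level i
                    node.next[i] = x.next[i]
                    x.next[i] = node
            return node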
4

Analysis
The running time of each operation is dominated
by the running time of finding an element, so lets
focus on analyzing the running time of find.
Clever idea: work backwards!
Starting from some element e in the level-0 list, retrace
the find path in reverse.
Step up whenever possible, otherwise step to the left.
Claim: This runs in O(log n) time w.h.p.
Proof: Wed like to use the randomized reduction
lemma. But how
Using the union bound, we can extend this to an
O(log n) w.h.p. for all elements in the skip list.
(and this also shows that L = O(log n) w.h.p.)
5

Analysis
Let N denote the number of elements in the
current level to the left of our current
location during the backward scan.
(Figure: N = 5 when the current location is the element with key 26 on level 1.)

Claim: In each step, N is reduced to at most half of
its current value with probability at least 1/4.
6

Analysis
Claim: In each step, N is reduced to at most half of its current
value with probability at least 1/4.
Proof: For N to be reduced to ≤ N/2, two events must
occur:
A: Our next step moves up, since we'd flipped heads at the current
(element, level).
B: At most half of the N elements to our left in the current level also
flipped heads (and hence also exist on the next level).

Pr[A ∩ B] = Pr[A] Pr[B] since A and B are independent.
Pr[A] = 1/2 (unbiased coin flip)
Pr[B] = Pr[at most N/2 heads in N coin flips] ≥ 1/2.
Hence, Pr[N reduced to ≤ N/2] = Pr[A ∩ B] ≥ 1/4.
7

Lecture 16. Range Queries

CpSc 838: Advanced


Data Structures
Clemson University
Fall 2006

Range Queries : Examples


In a dictionary:
Tell me all elements with keys in the range [a, b].
How many elements are there with keys in the range [a, b]?
What is the min / max / sum of all elements in the range [a, b]?

In a sequence A1 … An:
What is the min / max / sum of all elements in Ai … Aj?
What are the k largest values in the range Ai … Aj?

In more than one dimension:
Tell me all the points in this region.
Tell me some aggregate statistic
about all points in this region
(e.g., count, min, max, sum, etc.).
(Figure: points in the (household income, age) plane, with a rectangular query region.)

Range Updates : Examples


In a dictionary:
Delete all elements in the range [a, b].
Apply some operation to all elements in the range [a, b].

In a sequence A1 … An:
Delete all elements in the range Ai … Aj.
Increase all elements in Ai … Aj by a common value v.

In more than one dimension:
Apply some operation to all points
in this region (e.g., delete, change
some attribute by a common value).
(Figure: points in the (household income, age) plane, with a rectangular update region.)

Range Queries in BSTs


Binary search trees are ideal data structures for
range queries and updates in both dictionaries
and sequences.
For now, well focus on one-dimensional range
queries, leaving the multi-dimensional case for
later when we discuss data structures for
computational geometry.
Well also focus on range queries rather than
updates.
As well see in the next homework assignment, range
updates can typically be performed in the same
amount of time as range queries.
4

Finding all Elements in [a, b]


in a Dictionary
First find a (or the successor of a, if a is not present).
Then call successor repeatedly until we've stepped
through all elements in [a, b].
(Figure: the elements of [a, b] hang beneath LCA(a, b), the lowest common ancestor of a (or succ(a)) and b (or pred(b)).)

Total time: O(k + log n) on a balanced BST, where k is the
number of elements written as output.
This is called an output-sensitive running time, and we'll see
such running times often in the study of data structures.

Computing Aggregate Statistics


Over a Range
We can count or find the min/max/sum of elements in a range in
O(log n) time on a balanced BST.
This works for a dictionary or a sequence encoded within a BST.
On a sequence, we can use this to implement the operations
range-sum(i, j), range-min(i, j), and range-max(i, j).

Aggregate the individual node values along the two boundary search paths
and the augmented subtree values of the subtrees hanging between them.
(Figure: the paths from LCA(a, b) down to a (possibly succ(a), or select(i) if encoding a sequence) and down to b (possibly pred(b), or select(j) if encoding a sequence).)

Range Queries in Splay Trees


Range queries (and updates) are particularly nice on splay
trees.
Given a range query over [a, b] in a dictionary:
Splay(b).
Splay(a), making sure we perform a single rotation at the root.

This effectively isolates all the elements in (a, b) in a single
subtree!
(Figure: after the two splays, all elements in (a, b) form one subtree; here a may stand for succ(a), or select(i) if encoding a sequence, and b for pred(b), or select(j).)

And this of course works for a sequence too...

Static Data Structures


To date, most of our data structures have been
dynamic, supporting insertion and deletion of
elements as well as various query operations.
A static data structure is built once and then
queried (no subsequent inserts / deletes).
Some data structures are so static that they can
operate from read-only memory, but we won't make
this a requirement for calling a structure static.

It is often much easier to design a static data
structure than a dynamic data structure.
And the query time is often much faster for static data
structures as well...

Static Data Structures for Range Queries


On Sequences
Lets think about how we can take a sequence
A1An and preprocess it so as to handle:

range-sum
range-min / range-max
(focus on range-min, since range-max is equivalent)

As with any static data structure, wed like to


minimize preprocessing time, space, and
query time.
Example: For a BST, we have:
O(n log n) preprocessing time.
O(n) space.
O(log n) query time for range-sum and range-min.
9

Alternative Range Query Structure:


Building a Tree Atop an Array
O(n) preprocessing time.
O(n) space.
O(log n) query time.
Advantage over encoding
a sequence within a BST?
We have O(1) access to
array elements by index
(versus O(log n) access
on a BST using select).
Easy to navigate using
clever encoding in a single
array (reminiscent of a
binary heap)

range-sum(i, j), range-min(i, j), etc.

10

Prefix Sums
Consider a sequence of numbers A1 … An.
Let Bj = A1 + A2 + … + Aj.
The prefix sums B1 … Bn take only O(n) total time to
compute, since Bj = Bj-1 + Aj.
range-sum(i, j) = Bj − Bi-1.
So for range sums, we can achieve O(n) preprocessing
time and space, and O(1) query time.
Extended Example:
Input: n x n matrix A with N = n^2 total elements.
We'd like to support a query range-sum(i1..i2, j1..j2) returning the
sum of Aij over all i1 ≤ i ≤ i2 and j1 ≤ j ≤ j2.
Can we achieve O(N) preprocessing time and space and O(1)
query time?
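The one-dimensional case as a short Python sketch (the two-dimensional extension is left as posed above).

    class PrefixSums:
        # 1-indexed prefix sums: B[j] = A[1] + ... + A[j], with B[0] = 0.
        def __init__(self, a):                 # a is a 0-indexed Python list
            self.b = [0]
            for x in a:
                self.b.append(self.b[-1] + x)  # O(n) preprocessing

        def range_sum(self, i, j):             # sum of A[i..j], 1-indexed, O(1)
            return self.b[j] - self.b[i - 1]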

Static Range Minimum Queries (RMQs)


On a Sequence A1 … An
Can't use prefix sums as with range-sum...
Using a BST or a tree atop an array:
O(n) preprocessing time and space.
O(log n) query time.

This is also easy: (why?)
O(n^2) preprocessing time and space.
O(1) query time.

As it turns out (perhaps surprisingly), we can
achieve O(n) preprocessing time / space and
O(1) query time for RMQs!
It will take some work to get there, however...
12

Easy Case : Fixed-Length RMQs


Suppose we know that all our RMQs will be for
subsequences of some fixed length L:
Divide A1 … An into blocks of length L.
Compute prefix mins and suffix mins within each block
in O(n) total time.
Now the min of any range (say, straddling blocks X
and Y) can be found in O(1) by taking the min of an
appropriate suffix min in X and a prefix min in Y.
(Figure: a query of length L starting at position i straddles blocks X and Y; its answer is the min of a suffix min of X and a prefix min of Y.)

O(1) Query Time with O(n log n)


Preprocessing Time and Space

(Figure: the minima of all windows of lengths 2, 4, 8, … over the array A, one level per power of two.)
Now we can compute the value of range-min(i, j) by taking the minimum of at most
two precomputed window mins (whose sizes are the next-smallest power of 2 less
than the size of our query window Ai … Aj).
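A Python sketch of this structure (often called a sparse table): O(n log n) preprocessing time and space, O(1) range-min queries by overlapping two power-of-two windows.

    class SparseTableRMQ:
        # table[k][i] holds the min of the length-2^k window starting at i.
        def __init__(self, a):
            n = len(a)
            self.table = [list(a)]                      # k = 0: length-1 windows
            k = 1
            while (1 << k) <= n:
                prev, half = self.table[k - 1], 1 << (k - 1)
                self.table.append([min(prev[i], prev[i + half])
                                   for i in range(n - (1 << k) + 1)])
                k += 1

        def range_min(self, i, j):
            # Min of a[i..j] (0-indexed, inclusive): overlap two 2^k windows.
            k = (j - i + 1).bit_length() - 1
            return min(self.table[k][i], self.table[k][j - (1 << k) + 1])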

O(1) Query Time with O(n)


Preprocessing Time and Space*
Divide the original array A into blocks of size log n.
Compute prefix and suffix mins inside blocks in O(n) time.
Construct a new smaller array A' of length m = n / log n containing the
block minima.
Build the previous data structure on A'.
Preprocessing time/space = O(m log m) = O(n).

range-min(i, j) = min(suffix min in i's block, RMQ in A', prefix min in j's block),
each piece of which takes O(1) time.
(Figure: the array A divided into blocks, the block-minima array A', and its precomputed window minima.)
* For queries of length at least log n.

15

Lecture 17. Range Minimum Queries


And Lowest Common Ancestors

CpSc 838: Advanced


Data Structures
Clemson University
Fall 2006

Range Minimum Queries : Review


Given an n-element sequence A1 … An, preprocess it so that we can
subsequently process range-min(i, j) queries efficiently.
Previous results:

  Structure                              Preprocessing Time   Space        Query Time
  BST                                    O(n)                 O(n)         O(log n)
  Tree atop array                        O(n)                 O(n)         O(log n)
  Huge lookup table                      O(n^2)               O(n^2)       O(1)
  Fixed-length queries                   O(n)                 O(n)         O(1)
  Precomputed window minima of
    lengths 1, 2, 4, 8, ...              O(n log n)           O(n log n)   O(1)
  Hybrid of the previous two approaches
    for queries of length > log n        O(n)                 O(n)         O(1)

Today we'll see a data structure for short queries that requires only
O(n) preprocessing time / space and O(1) query time.

The Lowest Common Ancestor (LCA)


Problem in a Rooted Tree (Static)
Given an n-node rooted tree (not necessarily binary),
preprocess it so we can efficiently compute LCA(i, j) for
any pair (i, j) of nodes.
Many applications: matching algorithms in graphs,
connectivity encoding, MST verification, string matching, ...
Easy:
O(n^2) preprocessing time
O(n^2) space
O(1) query time.

We can improve this by
transforming an instance
of the LCA problem into an
equivalent RMQ problem!
(Figure: a rooted tree with LCA(i, j) marked.)

LCA RMQ in O(n) time


Do an Euler tour traversal (walking around the tree) in linear
time and output each node along with its depth.
For each node, compute its first
and last occurrence in the
traversal sequence.
O(n) total time / space.
Now to find LCA(i, j), issue
an RMQ in the depth array from
the first occurrence of i to the
last occurrence of j.
(Figure: an example tree and its Euler tour; the node sequence and the corresponding depth sequence are written below the tree, with first(i), last(i), and first/last(j) marked, and the shallowest node between first(i) and last(j) is LCA(i, j).)
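A Python sketch of the reduction: build the Euler tour arrays, then answer LCA(i, j) with one RMQ over the depth array (any RMQ structure, such as the sparse table sketched earlier, can be plugged in). The node fields u.id and u.children are assumptions of this example.

    def euler_tour(root):
        # Returns (nodes, depths, first): the Euler tour node sequence, the
        # parallel depth sequence, and each node's first occurrence index.
        nodes, depths, first = [], [], {}
        def visit(u, d):
            first.setdefault(u.id, len(nodes))
            nodes.append(u.id)
            depths.append(d)
            for c in u.children:
                visit(c, d + 1)
                nodes.append(u.id)     # return to u after visiting each child
                depths.append(d)
        visit(root, 0)
        return nodes, depths, first

    def lca(nodes, depths, first, i, j):
        lo, hi = sorted((first[i], first[j]))
        k = min(range(lo, hi + 1), key=lambda t: depths[t])  # O(n) RMQ stand-in
        return nodes[k]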

Converting an RMQ Problem to an


Equivalent LCA Problem
We can also convert an RMQ problem into an equivalent
LCA problem by building what is called a Cartesian tree:
Suppose Am is the minimum element in our sequence.
Am is placed at the root of the Cartesian tree.
The left and right subtrees are recursively constructed from A1 … Am-1 and
Am+1 … An respectively.
(Figure: a sequence A and its Cartesian tree; min{Ai … Aj} in the sequence corresponds to LCA(i, j) in the Cartesian tree.)
How is this similar to a treap?

We can build a Cartesian tree from an n-element
sequence in only O(n) time (on the next homework).
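A naive Python sketch that simply mirrors the recursive definition above (worst-case O(n^2) overall; the O(n) construction is the homework problem and is deliberately not shown here).

    def cartesian_tree(a, lo=0, hi=None):
        # Returns nested tuples (index_of_min, left_subtree, right_subtree)
        # for the subsequence a[lo:hi].
        if hi is None:
            hi = len(a)
        if lo >= hi:
            return None
        m = min(range(lo, hi), key=lambda i: a[i])   # position of the minimum
        return (m, cartesian_tree(a, lo, m), cartesian_tree(a, m + 1, hi))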

Back to RMQs
We'll now describe a means of answering short
RMQs (length ≤ log n) with O(n) preprocessing
time and space, and O(1) query time.
Combined with our previous hybrid block data
structure, this allows us to answer any RMQ with
O(n) preprocessing time / space and O(1) query
time.
Using the same data structure, we can therefore
also solve LCA problems using only O(n)
preprocessing time / space and O(1) query time.
6

Short RMQs
To build a data structure for short RMQs (length
at most log n):
In O(n) time, convert the RMQ instance into an
equivalent LCA instance by building a Cartesian tree.
In O(n) time, convert this LCA instance back into an
equivalent RMQ instance.

What on Earth could we have possibly


accomplished by converting an RMQ problem
back into another RMQ problem?

Short RMQs
The new RMQ problem has a very special
property: each successive element differs by
exactly one from its neighbors!
We can encode any such sequence of length n
as a binary number having n − 1 bits.
Encode +1 using 1 and −1 using 0.

Example:
RMQ:
A: 0 1 2 1 0 1 2 3 2 3 4 3 4 3 2 1 2 1 2 3 2 3 2 3 2 1 0
B: 1 1 0 0 1 1 1 0 1 1 0 1 0 0 0 1 0 1 1 0 1 0 1 0 0 0

How does this help us answer short RMQs?


8

Short RMQs
A short RMQ (length ≤ ½ log n) corresponds to a
binary number with ≤ ½ log n bits, and moreover
the answer to the RMQ is completely
determined by this binary number.
There are only 2^(½ log n) = √n binary numbers of
length exactly ½ log n bits.
How many binary numbers of length ≤ ½ log n bits?
At most (½ log n)(√n) = O(n) of them (this is why we
need the ½, since otherwise it would be O(n log n)).
Build a lookup table with O(n) entries containing all
possible short RMQ answers!

Computational model alert: the use of a lookup


table requires the RAM model of computation.

Lecture 18. Applications of Cartesian


Trees

CpSc 838: Advanced


Data Structures
Clemson University
Fall 2006

Cartesian Trees : Review


We have already seen how to build a Cartesian
tree from a sequence (in O(n) time, as we'll see
on the next homework).
This allows us to transform static range min query
(RMQ) problems on a sequence into static LCA
problems on a rooted tree.
(Figure: the sequence A and its Cartesian tree from last lecture; min{Ai … Aj} corresponds to LCA(i, j).)

Applications of Cartesian Trees


Cartesian trees have many other useful
applications:

RMQs in trees.
Verifying minimum spanning trees.
Constructing evolutionary trees.
Encoding graph connectivity.
Building a suffix tree from a suffix array.

Well discuss the first few now, leaving suffix


array construction for later in the course.
First application: RMQs in trees
3

Free Trees
Data structures often deal with rooted trees.
In graph theory, a tree is just a connected graph with no cycles.
To emphasize that such a tree has no designated root, we often use
the term free tree.
Free trees typically represented in memory with each node
maintaining an adjacency list of its neighbors.
We can root an n-node free tree in O(n) time (very common as a
preprocessing operation) by traversing it and specifying for each node
(except the root) which neighbor is its parent.
Since there is a unique
path connecting any pair
(u, v) of nodes in a tree,
it makes sense to think
about range queries (and
updates) in trees,
e.g., range-max(u, v), the maximum weight on the u-v path.
(Figure: a weighted free tree with the path between u and v highlighted.)

Building a Cartesian Tree from a
Free Tree
The maximum-weight edge of the free tree becomes the root of the
Cartesian tree, and the two components obtained by removing that edge
are recursively converted into its left and right subtrees.
(Figure, three build-up steps: the max-weight edge is pulled out as the root, the construction recurses on each remaining component, and in the finished Cartesian tree the max-weight edge on the u-v path is LCA(u, v).)

Range Queries on a Free Tree


Just like a Cartesian tree allows us to transform
an RMQ problem on a sequence into an
equivalent LCA problem on a rooted tree, we can
also transform an RMQ problem on a free tree
into an equivalent LCA problem on a rooted tree.
So we can answer range min/max queries in free
trees in O(1) time.
Preprocessing takes slightly longer: O(n log n).
What about range-sum(u, v) queries on a free
tree? Can we answer these in O(1) time with
O(n) preprocessing time and space?
8

Minimum Spanning Trees (MSTs)


An MST is a minimum-cost subset of the edges of a graph
that provides connectivity between all nodes (a minimum-cost
spanning tree of the graph).
(Figure: an edge-weighted graph with its minimum spanning tree highlighted.)
MSTs have many applications in practice (e.g., network
design) and serve as a building block for many more
sophisticated algorithms (e.g., approximation algorithms
for the traveling salesman and Steiner tree problems).
9

MST Verification Using Cartesian Trees


We'll discuss MSTs in more detail in the next lecture (and
also in CpSc 840 next semester). For now, we focus on
characterizing an optimal spanning tree.
(Figure: a weighted graph, a spanning tree T, and a non-tree edge uv.)
Theorem: A spanning tree T is an MST if and only if for
every non-tree edge uv, cost(uv) ≥ range-max(u, v), the
maximum cost of a tree edge on the u-v path in T.
So Cartesian trees allow us to verify a prospective MST
in an n-node, m-edge graph in O(m + n log n) time.
(O(m) is also possible, but much trickier to achieve)
10

Maximum Spanning Trees


The maximum spanning tree problem is more or less
equivalent to the minimum spanning tree problem.
[Figure: the same weighted graph with its maximum spanning tree and a non-tree edge uv.]

Theorem: A spanning tree T is a maximum spanning tree
if and only if cost(uv) ≤ range-min(u, v) for every non-tree
edge uv.
So we can also verify a prospective maximum spanning
tree efficiently using Cartesian trees.

11

Evolutionary Trees
In the field of bioinformatics, a common problem
is to try and reconstruct the most likely
evolutionary history for a set of organisms given
the differences in features they exhibit at the
present time.
This gives rise to a host of related computational
problems involving the computation of an
evolutionary tree for a set of organisms.
Two common variants:
Distance-based
Feature-based
12

Distance-Based Evolutionary Trees


[Figure: a pairwise distance matrix on {humans, monkeys, lions, tigers, bacteria} (values such as 400, 24, and 10) and the corresponding evolutionary tree, drawn against a vertical axis of evolutionary time.]


13

Feature-Based Evolutionary Trees


(a.k.a. Character-Based Trees)
[Figure: a phylogenetic tree drawn against evolutionary time; its edges are labeled with the features (A, C, D, E, B, F) that come into being along them.]
Feature / Organisms with feature:  A: {1, 2, 3, 4}   B: {5, 6}   C: {2}   D: {3, 4}   E: {4}   F: {6}

Phylogenetic tree: Assumes each feature evolves into being at a


unique point in time (not perfectly realistic, but simplifies the tree
construction problem substantially).
14

Distance-Based Evolutionary Trees


And Ultrametrics
For any three organisms (x, y, z), consider the distances
d(x, y), d(x, z), and d(y, z).
The largest two of these three distances are always equal.
This is called the ultrametric property, and a set of
distances satisfying this property is called an ultrametric.

[Figure: the distance matrix and evolutionary tree from the previous slide, illustrating the ultrametric property.]


15

Distance-Based Evolutionary Trees


And Ultrametrics
A metric is a distance function satisfying the triangle
inequality for all triples (x, y, z):
d(x, y) ≤ d(x, z) + d(y, z)
An ultrametric satisfies an even more stringent condition:
for all triples (x, y, z),
d(x, y) ≤ max{ d(x, z), d(y, z) }.
Some properties of ultrametrics:
Since max{ d(x, z), d(y, z) } ≤ d(x, z) + d(y, z), an ultrametric is also a metric.
The ultrametric condition above is completely equivalent to our earlier condition: of any three distances d(x, y), d(x, z), and d(y, z), the two largest are always equal.
For any sequence of points x1, x2, ..., xk, the above condition implies (via induction) that d(x1, xk) ≤ max { d(xj, xj+1) : 1 ≤ j < k }.
16

Distance-Based Evolutionary Trees


And Ultrametrics
The distance function derived from an evolutionary tree is clearly an ultrametric.
As it turns out, we can compute an evolutionary tree from a distance
function d if and only if d is an ultrametric.
And this task is easy, using Cartesian trees:
1. Find a minimum spanning tree T in a graph where the cost of edge uv is d(u, v).
2. Compute the Cartesian tree of T.
3. Divide the value of each internal node of the Cartesian tree by two.

[Figure: applying this recipe to the distance matrix from before; the resulting Cartesian tree is exactly the evolutionary tree, its internal node values being the halved distances (e.g., 400 becomes 200).]


17

Distance-Based Evolutionary Trees


And Ultrametrics
Suppose d is an ultrametric, and let T be a
minimum spanning tree of d's associated graph
(i.e., an n-node graph where the cost of each
edge uv is d(u, v)).
Key property: For any pair of organisms (u, v),
d(u, v) = range-max(u, v) in T.
Simple Proof:
For edges uv in T, this property obviously holds. For non-tree edges uv, we have:
d(u, v) ≥ range-max(u, v) since T is an MST.
d(u, v) ≤ range-max(u, v) since d is an ultrametric.
18

The Perfect Phylogeny Problem

[Figure: the phylogenetic tree and feature table from before (features A-F and the organism sets {1, 2, 3, 4}, {5, 6}, {2}, {3, 4}, {4}, {6}).]

Perfect Phylogeny Problem: Given as input the sets {1, 2, 3, 4}, {5, 6},
{2}, {3, 4}, {4}, and {6}, find a valid phylogenetic tree, if it exists.
Theorem: A solution only exists if our input sets are laminar (nesting).

19

Min-Ultrametrics and Phylogenetic Trees


Suppose our input sets are laminar (nesting).
Let d(x, y) be the number of features that
organisms x and y share in common.
d is a min-ultrametric: d(x, y) ≥ min{ d(x, z), d(y, z) }.
We can build a phylogenetic tree by constructing a
min-ultrametric tree from d
(a Cartesian tree built from a maximum, rather than
minimum, spanning tree).
Feature / Organisms with feature:  A: {1, 2, 3, 4}   B: {5, 6}   C: {2}   D: {3, 4}   E: {4}   F: {6}

20

10

Min-Ultrametrics and Phylogenetic Trees


d(x, y) = # of features that organisms x and y share in common.
[Figure: the phylogenetic tree with shared-feature counts (0, 1, ...) marked at its internal nodes, next to the feature table A: {1, 2, 3, 4}, B: {5, 6}, C: {2}, D: {3, 4}, E: {4}, F: {6}.]

21

Review : Encoding Ultrametrics and


Min-Ultrametrics
In general, we need a table of size n² to
specify an arbitrary distance function on n points.
Any ultrametric distance function can be succinctly
encoded in only O(n) space, within the edge
weights of a minimum spanning tree.
A min-ultrametric distance function can be similarly
encoded, using a maximum spanning tree.

By converting the MST into a Cartesian tree and


preprocessing it for fast LCAs, we can still
compute the distance d(x, y) between any pair of
points (x, y) in O(1) time, even in this compressed
representation!
22

11

Encoding Edge Connectivity in a Graph


The edge connectivity λ(x, y) between two nodes
x and y in a graph is:
the minimum number of edges whose removal
disconnects x from y;
equivalently, the maximum number of edge-disjoint
paths connecting x to y.
[Figure: a graph in which λ(x, y) = 3.]
23

Encoding Edge Connectivity in a Graph


One can show that λ(x, y) is a min-ultrametric.
Therefore, we can use a Cartesian tree to encode
the edge connectivity between all n(n-1)/2 pairs of
nodes in a graph in only O(n) space, such that
λ(x, y) can be computed in only O(1) time!
[Figure: the same graph with λ(x, y) = 3.]
24

12

Lecture 19. Disjoint Sets

CpSc 838: Advanced


Data Structures
Clemson University
Fall 2006

Motivation : Kruskals MST Algorithm


Kruskal's algorithm is perhaps the best known
algorithm for computing an MST.
It's a simple greedy algorithm:
Sort the edges in our graph in increasing order of cost.
For each edge in this sorted order:
Add it to our spanning tree if it doesn't create a cycle
with edges we've already added (otherwise, skip it).
Correctness follows directly from the MST
optimality conditions from the last lecture:
Each non-tree edge uv was not included because it
formed a cycle with edges of lower cost, so each such
edge satisfies cost(uv) ≥ range-max(u, v).

Kruskals Algorithm : Example

Kruskals Algorithm : Running Time


Kruskals algorithm:
1. Sort the edges in our graph in increasing order of cost.
2. For each edge uv in this sorted order:
Check if uv would create a cycle.
If not, add uv to our solution.

On a graph with n nodes and m edges, step 1


takes O(m log m) time.
Usually written as O(m log n) since m ≤ n².

In order to implement step 2 efficiently, we


maintain the components of our partial spanning
tree using a disjoint set data structure
4

Disjoint Set Data Structures


A disjoint set data structure maintains a set of
elements partitioned into disjoint sets.
For example, if our elements are {1, 2, 3, 4, 5, 6},
then we might have sets {1, 5}, {2}, and {4, 3, 6}.
It supports these operations:
make-set(e) : Add a new element e and a singleton
set containing just e.
find-set(e) : Return an identifier that specifies the set to
which e belongs (usually a pointer to some canonical
element in es set).
union(S1, S2) : Combine sets S1 and S2 into one set.

Sometimes, disjoint set data structures are also


known as union-find data structures.

Kruskals Algorithm Using Disjoint Sets


Kruskals algorithm:
1. Sort the edges in our graph in increasing order of cost.
1b. Call make-set(j) on each node j, so we start out with
each node being its own singleton set.
2. For each edge uv in this sorted order:
Su = find-set(u), Sv = find-set(v).
If Su ≠ Sv, then union(Su, Sv).

Running time:

O(m log n) for initial sort


O(n) calls to make-set
O(n) calls to union.
O(m) calls to find-set.
6
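To make the bookkeeping concrete, here is a minimal sketch (Python) of Kruskal's algorithm driven by a disjoint set structure. The union-find here uses only path compression for brevity (union by rank comes up shortly), and the edge-list representation is an assumption made for the example.

def kruskal(n, edges):
    """edges: list of (cost, u, v) with u, v in 0..n-1. Returns the MST edges."""
    parent = list(range(n))              # make-set for every node

    def find(x):                         # find-set with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst = []
    for cost, u, v in sorted(edges):     # step 1: sort edges by cost
        ru, rv = find(u), find(v)        # step 2: would this edge create a cycle?
        if ru != rv:
            parent[ru] = rv              # union the two components
            mst.append((cost, u, v))
    return mst

# Example on a small 4-node graph.
print(kruskal(4, [(1, 0, 1), (2, 1, 2), (3, 0, 2), (4, 2, 3)]))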

Kruskals Algorithm : Running Time


There are many ways to implement a disjoint set
data structure.
Some of these are so efficient that the bottleneck
step in Kruskals algorithm becomes the initial
sorting of edges.
So Kruskals algorithm runs in O(m log n) time.
In the special case where we start with a tree and
run Kruskals algorithm on it: O(n log n) time.
This gives us an O(n log n) algorithm for constructing a
Cartesian tree from a free tree in a bottom up
fashion!
7

Building a Cartesian Tree Using Kruskal


[Figure: running Kruskal's algorithm on the weighted free tree from before builds its Cartesian tree bottom-up; the max-weight edge on the u-v path again corresponds to LCA(u, v). A sketch of this construction appears below.]
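A minimal sketch (Python) of this bottom-up construction: process the free tree's edges in increasing weight order, and let each edge become a new internal node of the Cartesian tree whose children are the roots of the two components it merges. The input/output representation (edge list in, value/left/right arrays out) is an illustrative assumption.

def cartesian_tree_of_free_tree(n, edges):
    """edges: list of (weight, u, v) forming a tree on nodes 0..n-1.
    Leaves are nodes 0..n-1; internal nodes n..2n-2 carry edge weights."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    value = [None] * (2 * n - 1)
    left  = [None] * (2 * n - 1)
    right = [None] * (2 * n - 1)
    comp_root = list(range(n))       # Cartesian-tree root of each component
    nxt = n
    for w, u, v in sorted(edges):    # Kruskal order: increasing weight
        ru, rv = find(u), find(v)
        value[nxt] = w
        left[nxt], right[nxt] = comp_root[ru], comp_root[rv]
        parent[ru] = rv              # merge the two components
        comp_root[rv] = nxt          # the new internal node roots the merged component
        nxt += 1
    return value, left, right        # the overall root is node 2n-2

# In the resulting tree, LCA(u, v) is the max-weight edge on the u-v path.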

Disjoint Set Implementations :


Element Set Pointers
Suppose we store the elements in a disjoint set
data structure in an array, and augment each one
with the identifier of the set to which it belongs.
Make-set: O(1)
Find-set: O(1)
Union: O(n)

Disjoint Set Implementations :


Storing Sets Using Linked Lists
Suppose we store the elements in each set within
a doubly-linked list.
We maintain pointers to the first and last elements in
the list, or alternatively use a circular doubly linked list.

Make-set: O(1)
Find-set: O(n)
Union: O(1)

10

A Hybrid Approach
Store the elements in each set within a doubly-linked list (possibly circular).
Each element maintains a pointer to the
canonical element in its set.
Make-set: O(1)
Find-set: O(1)
Union: O(n)
Can we improve
the running time
of union?
11

Union By Rank
Augment each set with its size.
During union, always relabel the pointers in the
smaller set.
Each element now relabeled at most O(log n) times
during its lifetime in the data structure, since when you
are relabeled, the size of your set at least doubles.

Amortized runtimes:
Make-set: O(log n)
Find-set: O(1)
Union: O(1)
12

Tree-Based Disjoint Set Implementations


Each set stored as a rooted tree.
Each element only maintains a pointer to its parent.
The root is the canonical element of a set.
No restrictions (for now) on allowable tree shapes, so we
could end up with bad tree shapes (paths) or very
good tree shapes (depth 1, like S4).

Make-set: O(1)
Find-set: O(n)
Union: O(1)

13

Union by Rank on Trees


Augment each element e with a value rank(e).
The rank of a tree is the rank of its root.
Initially (after make-set(e) called), rank(e) = 0.

During union of two trees with ranks r1 and r2:


If r1 > r2, link r2 as a child of r1.
If r1 = r2, link arbitrarily, but then make the rank of the
resulting tree r1 + 1.

Claim:
A tree of rank r must contain at least 2^r elements.
(easy proof by induction)
14

Union by Rank on Trees


Recall: A tree of rank r contains at least 2^r elements.
Therefore, the maximum rank is log n.
Note that a tree of rank r also has height r, so
union-by-rank ensures that all heights are ≤ log n.
Worst-case running times:
Make-set: O(1)
Find-set: O(log n)
Union: O(1)

15

Path Compression
Ideally, trees should have depth 1 (like S4 below).
After we call find-set(e), compress the path from
e to the root by walking up from e and linking all
its ancestors directly to the root.
We can show that path compression reduces the
amortized running time of every disjoint set
operation to O(log n).
(details omitted)

16

Path Compression and Union-by-Rank


Suppose we apply both path compression and
union-by-rank to the tree implementation of
disjoint sets.
Note that ranks no longer correspond to tree heights,
since path compression may subsequently reduce the
height of a tree.
However, it is still true that a tree whose root has rank r
contains at least 2^r elements.
Amazingly, this hybrid approach reduces the
amortized running time of every disjoint set
operation to O(α(n)), where α(·) is the inverse
Ackermann function.
17
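A sketch (Python) of the tree-based implementation with both optimizations. The class and method names are illustrative; the amortized O(α(n)) bound is what the analysis below is about, and is not argued by the code itself.

class DisjointSets:
    def __init__(self, n):
        self.parent = list(range(n))       # make-set(0..n-1)
        self.rank = [0] * n

    def find(self, x):                     # find-set with path compression
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:      # second pass: relink ancestors to the root
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, x, y):                 # union by rank
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx               # attach the lower-rank root below the higher
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1

ds = DisjointSets(6)
ds.union(1, 5); ds.union(3, 4); ds.union(4, 0)            # sets {1,5}, {0,3,4}, {2}
print(ds.find(0) == ds.find(4), ds.find(1) == ds.find(2)) # True False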

The log* n Function


log* n = the minimum # of times you must
successively apply log2 to n before its value
drops to 1.

log* n    Range of values of n
0         1 (= R0)
1         2 (= R1)
2         3–4 (= R2)
3         5–16 (= R3)
4         17–2^16 (= R4)
5         2^16 + 1 through 2^65536 (= R5)

where R_{i+1} = 2^{R_i}.

For all practical purposes, log* n is a constant
(but in theory, we can't just ignore it).
18
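For concreteness, a tiny sketch (Python) computing log* n by repeated application of log2; interpreting "drops to 1" as "drops to at most 1" is an assumption made here.

import math

def log_star(n):
    """Number of times log2 must be applied to n before the value is <= 1."""
    count = 0
    while n > 1:
        n = math.log2(n)
        count += 1
    return count

print([log_star(x) for x in (1, 2, 4, 16, 65536)])   # [0, 1, 2, 3, 4]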

The Inverse Ackermann Function (n)


log** n = the minimum # of times you must successively
apply log* to n before its value drops to 1.
log*** n = the minimum # of times you must successively
apply log** to n before its value drops to 1.
α(n) = the minimum value of k such that
log***...* n ≤ k, where the log carries k stars.
There are several variants of α(n) in the literature, all of
which behave the same asymptotically.
α(n) grows very slowly (slower than log***...* n with any fixed
number of stars), so a running time of O(α(n)) is extremely
fast; α(n) ≤ 4 for all practical purposes.

19

Amortized Analysis of Path Compression


And Union-By-Rank
We'll prove that any sequence of m disjoint set operations
involving n ≤ m total elements takes O(m log* n) time
(slightly weaker than the best bound of O(m α(n)) time).
Easy initial observations:
Total time for all calls to make-set: O(n).
Total time for all calls to union: O(n).
We will show that find-set spends O(m log* n) total time:
O(m log* n) total time on big steps, and
O(n log* n) time in total on small steps (defined later).

Remaining details to be worked out on board

20

10

Lecture 20. Amortized Analysis of


Disjoint Sets

CpSc 838: Advanced


Data Structures
Clemson University
Fall 2006

Board Image #1
(From Animated Explanation)

Board Image #2
(From Animated Explanation)

Lecture 21. Random Variables and


Expected Value

CpSc 838: Advanced


Data Structures
Clemson University
Fall 2006

Analyzing Randomized Algorithms and


Data Structures
So far, our analyses of randomized algorithms
have focused on proving with high probability
results. For example:
Randomized quicksort runs in O(n log n) time w.h.p.
A randomly-built BST is balanced w.h.p.
Randomized mergeable binary heaps support a merge
operation running in O(log n) time w.h.p.

Our primary tools so far: the randomized


reduction lemma and the union bound.
Today, well see another type of randomized
analysis where we look at the average, or
expected running time of an algorithm.
2

Random Variables : Introduction


In elementary school, we all learn that a variable is a
name, or placeholder for some specific value.
By contrast, a random variable stands for a numeric
value that is determined by the outcome of some random
experiment. Examples:
Let X be the number we see when we roll a 6-sided die.
Let Y be the number of heads in 100 fair coin flips.
Let T be the number of comparisons made by applying randomized
quicksort to a length-n array.

Every r.v. has an associated probability distribution.


Think of the r.v. as a placeholder for a value that will be
instantiated, or drawn from this distribution once our
random experiment actually happens.
Example: X takes values 1..6 each with probability 1/6.
The distributions of Y and T are somewhat more complicated.

Formally, a r.v. is defined as a function mapping


experimental outcomes to numbers.

Equations Involving Random Variables


Just like we can write equations involving
standard variables, we can also write equations
involving random variables.
Simple example: Z = X + Y, where
X is the number we roll on our first roll of a die,
Y is the number we roll on the second roll, and
Z is the sum of the two numbers on both dice.

More interesting example: T = Σj Xj, where
T is the total amount of time spent by an algorithm, and
Xj is the amount of time spent only on element j.

Key property: any equation or inequality


involving random variables must hold for every
possible random instantiation of these variables.

Events Derived from Random Variables


Let D be the largest face value we see when we roll two
6-sided dice.
The probability distribution for D is:
1: 1/36 2: 3/36 3: 5/36 4: 7/36 5: 9/36 6: 11/36

"D is even" and "D > 3" are events, so we can consider
computing Pr[D is even] or Pr[D > 3].
Another example: If T denotes the running time of
randomized quicksort applied to an array of length n, then
Pr[T = O(n log n)] ≥ 1 − 1/n^c, for any constant c > 0.
Just be careful never to write Pr[D].
This is a syntax error, since D is a random variable and
not an event (a set of outcomes).

Expected Value
The expected value of a discrete random variable
X, denoted E[X], is defined as
E[X] = Σ_v v · Pr[X = v].
Think of E[X] informally as the center of mass of
X's probability distribution.
Example: Let D be the max of two dice rolls.
Recall the probability distribution of D from the previous slide.
Thus, E[D] = 1(1/36) + 2(3/36) +
3(5/36) + 4(7/36) + 5(9/36) +
6(11/36) = 161/36 = 4 17/36.

[Figure: the probability distribution of D (bars 1/36, 3/36, 5/36, 7/36, 9/36, 11/36 at values 1 through 6) with E[D] marked between 4 and 5.]
Careful: Don't write E[A] if A is an event (another syntax error!)

Computing Expected Values


There are generally 4 different ways we will
compute expected values in this class:
1. Directly using the definition E[X] = Σ_v v · Pr[X = v].
2. The special case of an indicator random variable.
3. The special case of a geometric random variable.
4. Expressing a complicated random variable in terms of
a sum of simpler r.v.'s and applying linearity of
expectation.

Indicator Random Variables


Suppose E is some event.
(e.g., roll a 3 on a 6-sided die).
Let X be a random variable taking the value 1
when E occurs, and 0 otherwise.
We say X is an indicator random variable for E.
(also called a Bernoulli r.v.)
Easy to compute E[X]:
E[X] = 1 Pr[X = 1] + 0 Pr[X = 0]
= Pr[X = 1]
= Pr[E] (= 1/6 in our example).
The expected value of any indicator r.v. is just the
probability of its associated event.
8

Geometric Random Variables


(Expected Trials Until Success)
Suppose we perform a series of
independent random trials, where each trial
succeeds with probability p.
Let X denote the number of trials until the
first success.
X has a geometric probability distribution.
E[X] = 1 / p (easy to prove via the definition).
Example: if X denotes the number of dice
rolls until we first see a 3, then E[X] = 6.
9

Linearity of Expectation
E[·] is a linear operator:
E[cX] = c·E[X] if c is a constant
E[X + Y] = E[X] + E[Y]

The above holds for any random variables X and Y,


regardless of whether or not they are independent!
This gives us a very powerful tool for computing
expectations of complicated random variables.
Example:
Let H be the total number of heads in 100 coin flips.
Computing E[H] by definition of E[] looks messy!
Instead, write H = H1 + H2 + + H100, where Hj = 1 if the
jth coin toss comes up heads.
Now E[H] = E[H1] + + E[H100] = 100(1/2) = 50.
10

Linearity of Expectation : Examples


If everyone in this room is wearing a hat and we
randomly permute the hats, what is the expected
number of people ending up with their original hat?
If we randomly throw n balls into m bins, what is the
expected number of balls landing in a specific bin?
If we put n people in a room, what is the expected
number of pairs of people sharing the same
birthday? (assuming all birthdays equally likely)
Again, suppose n people are in a room. Person 1
is given a ball, and every second whoever has the
ball throws it to a randomly-selected person. What
is the expected number of seconds until everyone
has held the ball at least once?
11

Example : Ball Throwing


Whats the expected number of seconds until
everyone has held the ball at least once?
T: total # of seconds
Tj : # of seconds in the jth phase of the game,
where exactly j − 1 people have held the ball so
far, and we are waiting for the jth.
E[Tj] = n / (n − j + 1).
Now T = T1 + ... + Tn.
So E[T] = E[T1 + ... + Tn] = E[T1] + ... + E[Tn]
= n/n + n/(n−1) + ... + n/1
= n(1/n + 1/(n−1) + ... + 1/2 + 1) = Θ(n log n).
12
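A quick numeric sanity check (Python), comparing n·H_n with a simulation. Treating the initial hand-off as the first uniformly random throw is an assumption made so that the simulation matches the n·H_n sum above.

import random

def expected_time(n):                     # n * (1/1 + 1/2 + ... + 1/n) = n * H_n
    return n * sum(1.0 / k for k in range(1, n + 1))

def simulate(n, trials=2000):
    total = 0
    for _ in range(trials):
        seen, steps = set(), 0
        while len(seen) < n:              # wait until everyone has held the ball
            seen.add(random.randrange(n)) # each throw hits a uniformly random person
            steps += 1
        total += steps
    return total / trials

n = 20
print(expected_time(n), simulate(n))      # both should be near 71.95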

Example : Per Element Running Time


Analysis
Recall (due to the union bound):
If an algorithm spends O(T) time on a generic input
element w.h.p., then it spends O(nT) time on all
input elements w.h.p.
Linearity of expectation gives us a similar result:
If an algorithm spends O(T) expected time on a
generic input element, then it spends O(nT)
expected time on all input elements.
So when trying to find the expected running time of
an algorithm, we can often simplify this problem to
the computation of expected time on one element.

13

Lecture 22. Hash Tables

CpSc 838: Advanced


Data Structures
Clemson University
Fall 2006

Overview
Basic operations of a dictionary: insert, delete, find.
Using balanced BSTs, B-trees, or skip lists, we can
perform all of these operations in O(log n) time.
And in addition, we can implement dynamic sequences, perform
inexact searches, implement the min, max, pred, succ, select,
rank, split, and join operations, perform range queries and
updates, sort, implement a priority queue, etc.

Today, we will show how to implement the basic


dictionary operations insert, delete, and find all in O(1)
time using a universal hash table.
Requires a RAM (so elements must have integer-valued keys),
whereas the preceding data structures are all comparison-based.
Unfortunately, hash tables dont support any of the fancier
functionalities above supported by balanced BSTs
2

Simple Idea : The Direct Access Table


Suppose keys are integers in the range 0 ... C − 1
(where C fits in a single machine word).
Construct a huge array A[0 ... C − 1], and store the
element of key k (or a pointer to it) in A[k].
Insert, delete, and find now all take O(1) time!
Can even initialize the structure in O(1) time
using virtual initialization.
Only drawback: requires excessive space.
E.g., If keys are 9-digit student ID numbers, then we
need 1 billion words of memory, even if we are only
storing a few dozen records.

Reducing Space Requirements with


Hashing
Ideally, we should only use O(n) space to store a
data structure containing n elements of data.
So, lets do the following:
Use a table of size m = O(n).
Construct a hash function, h(k), that maps a key
k ∈ {0, ..., C − 1} to a value h(k) ∈ {0, ..., m − 1}.
Store the element of key k in cell h(k) of our hash table.
This works great, except for the situation where
two elements with different keys k1 ≠ k2 collide
because h(k1) = h(k2).

Collisions are inherently unavoidable any time we map


a large space of keys down to a smaller range!
4

One Common Method for Dealing with


Collisions : (Linear) Probing
Suppose we want to insert an element with key k, but the cell with
index h(k) is already occupied.
Keep scanning forward from h(k) until we find an empty table cell
(wrap around to the beginning if necessary).
[Figure: a table with a run of occupied cells starting at h(k).]
Sometimes we use a more complicated probing pattern,
e.g., probe locations (h(k) + g(i)) mod m (for i = 0, 1, ...) where g
is a secondary hash function.

Careful: If you delete an element, mark its cell so that you
don't stop scanning there during subsequent calls to find.
We won't analyze the performance of probing methods in
this class (it can be slightly complicated).
5

Another Method for Handling


Collisions: Chaining
Build a linked list off each table cell containing all
elements hashing to that cell:

Chaining : Performance Analysis


Insert and delete take O(1) time.
This becomes O(1) amortized time if we dynamically
resize the table to maintain the property that m = (n).
(although O(1) worst case still possible if we use
fancier worst-case table resizing mechanism).

Find(k) takes O(1 + L) time, where L is the length


of the linked list attached to cell h(k).
If our hash function is somewhat random, then
we expect L to be roughly n / m = O(1).
For good performance, we want to spread out
elements uniformly over the hash table, so an
ideal hash function should be somewhat
random.
This is the origin of the term hash (i.e., to mix up).

Choosing a Random Hash Function


In Practice
In practice, one often uses a deterministic hash
function that behaves somewhat randomly.
Examples:

h(k) = k mod m.
h(k) = ⌊mk / C⌋.
h(k) = (ak) mod m.
h(k) = (ak + b) mod m.

a and b should be chosen in a somewhat


arbitrary fashion (try to make them
relatively prime with m, if possible).
E.g., h(k) = (29713k + 9071) mod m.

Be sure to fully utilize all the bits in a key. For


example, h(k) = k mod 256 is a somewhat poor
hash function if k is a 32-bit IP address.
Be sure to use the entire hash table; e.g., make
sure h(k) isnt always even.
8

The Case Against Deterministic


Hash Functions
Note that for any deterministic choice of hash
function, we can always find a bad set of input
elements that all hash to the same cell!
A consequence of the pigeonhole principle:
If you map 100 elements into 99 slots, then some slot
must end up containing ≥ 2 elements.
Similarly, if C ≥ m(n − 1) + 1, then some set of n
different keys will all hash to the same table cell.

This reduces our hash table to nothing more than
a fancy linked list, so find takes Θ(n) time!
So if we want a good worst-case guarantee for
find, we cannot use a deterministic hash function.
We must use randomness in some way.
9

Why Not Use a Completely


Random Hash Function?
Suppose we choose a hash function h(k) that maps
every key k ∈ {0, ..., C − 1} to a completely random
location in {0, ..., m − 1}.
This gives us E[L] = n / m, so find does indeed run
in O(1 + n / m) = O(1) time.
Please always assume (from here on) that we use
dynamic resizing to maintain m = Θ(n), so n / m = O(1).
How do we show that E[L] = n / m? Linearity of
expectation!

Fatal flaw: requires Θ(C) space to store the hash
function, the same as the direct access table.
10

Universal Hashing
We say that a class of hash functions is universal
if the probability (over a random choice of hash
function from the class) of two different keys
colliding is at most 1 / m.
Example of a universal hash function:
h(k) = [(ak + b) mod p] mod m, with
p: any prime number ≥ C.
a: a random integer in {1, ..., p − 1}.
b: a random integer in {0, ..., p − 1}.

With a universal hash function, find runs in O(1)


expected time.
Easy proof using linearity of expectation (next class).
11

Applications of Hashing
Approximate Comparison:
Instead of directly comparing two large objects (files, web pages),
hash them and compare fingerprints instead.
String matching (similar to above).

Security:
Store fingerprints of files or packets to detect tampering.
Digital signatures (encrypted hash of message text).
Store passwords as hashes rather than in the clear.

Pseudorandom number generation


Load balancing (assign incoming webpage request to server
based on hash of source IP).
Associative arrays (emulate large direct access table).
Decomposing large problems into smaller problems
Anagram detection, near neighbor finding, bucket sort, etc.
12

Lecture 23. Universal Hashing


(and Modular Arithmetic)

CpSc 838: Advanced


Data Structures
Clemson University
Fall 2006

Motivation
By resolving collisions with chaining:
Insert and delete both run in O(1) time (amortized).
Find runs in O(1 + L) time, where L is the length of the
list we end up searching.

L = Θ(n) in the worst case for any deterministic
choice of hash function.
E[L] = n / m = O(1) for a completely random hash
function, except this takes Θ(C) space to store.
Hence, we want to use a partially random
function: a function specified by a small number
of randomly-chosen parameters.
2

Universal Hash Functions


A set of hash functions is universal if the
probability (over a random choice of function from
the set) of two different keys colliding is ≤ 1 / m.
I.e., Pr[ h(k1) = h(k2) ] ≤ 1 / m for any two keys k1 ≠ k2
(where the probability is over our random choice of
hash function).
Example:
Select any prime number p ≥ C.
Choose a random integer a in {1, 2, ..., p − 1}.
Choose a random integer b in {0, 1, ..., p − 1}.
h(k) = [(ak + b) mod p] mod m.
This describes a set of p(p − 1) hash functions.


3
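A sketch of this family in Python. The particular prime p and table size m used in the example are assumptions chosen for illustration.

import random

def make_universal_hash(p, m):
    """Return a random member of the family h(k) = ((a*k + b) mod p) mod m,
    where p is a prime at least as large as the key universe size C."""
    a = random.randrange(1, p)        # a in {1, ..., p-1}
    b = random.randrange(0, p)        # b in {0, ..., p-1}
    return lambda k: ((a * k + b) % p) % m

# Example: hash 9-digit IDs into a table of size 13, using the prime 10**9 + 7.
h = make_universal_hash(10**9 + 7, 13)
keys = [314159265, 271828182, 161803398]
print([h(k) for k in keys])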

O(1) Expected Running Time for Find


With Universal Hashing
Let T denote the running time of an unsuccessful
call to find(k).
Clearly, E[T] is at least the expected time required for a
successful find operation, so our analysis also provides
a bound on the expected time of a successful find.
Compute E[T] using linearity of expectation:
Let k1, ..., kn denote the keys stored in our hash table.
Write T = X1 + X2 + ... + Xn.
Xj: indicator random variable taking value 1 if h(kj) = h(k).
Since k ≠ kj, E[Xj] ≤ 1 / m (using universal hashing).
So E[T] = E[X1] + ... + E[Xn] ≤ n · (1/m) = n / m = O(1).
4

Example Applications
Consider the following problems:
Element Uniqueness: Given n numbers A1, ..., An, are
all of these distinct, or are two of them equal?
Set Intersection / Union / Difference: Given two sets
A and B (specified by unsorted arrays) containing n
elements in total, output an array containing A ∩ B,
A ∪ B, or A \ B.
All of these problems have Ω(n log n) worst-case
lower bounds in the comparison model.
However, we can solve them in O(n) expected
time with universal hashing, as sketched below.
5
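A minimal sketch (Python) of two of these tasks. Using Python's built-in hashed sets as a stand-in for a universal hash table is a simplifying assumption for the example.

def all_distinct(a):                     # element uniqueness
    return len(set(a)) == len(a)

def intersection(a, b):                  # A intersect B, for unsorted arrays
    in_a = set(a)
    return [x for x in b if x in in_a]

print(all_distinct([4, 8, 15, 16, 23, 42]), intersection([1, 3, 5, 7], [5, 6, 7, 8]))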

Modular Arithmetic
When we perform arithmetic modulo n, we care about
the remainder of a number when divided by n, so we are
effectively only using the numbers {0, 1, ..., n − 1}.
x ≡ y (mod n) means that the integers x and y have the
same remainder when divided by n.
Read "x is congruent to y, mod n".
Example: 17 ≡ 12 (mod 5).
The equation x ≡ y (mod n) can be manipulated much like
any other equation: we can add, subtract, and multiply
both sides by the same value and it remains true.
Division is much more interesting:
In order to solve for x in 4x ≡ 5 (mod 7), we need to somehow
divide both sides by 4...
6

Arithmetic Modulo a Prime


Remarkable fact: when we perform arithmetic modulo a
prime p, every nonzero integer has a unique multiplicative
inverse (so we can perform division!).
Example: 4x ≡ 5 (mod 7).
The unique multiplicative inverse of 4 is 2 (in mod 7 arithmetic),
since 4(2) ≡ 1 (mod 7).
Multiplying both sides by 4⁻¹ = 2 we obtain:
(2)4x ≡ (2)5 (mod 7)
x ≡ 3 (mod 7)
(and indeed, 4(3) ≡ 5 (mod 7)).
Arithmetic mod p is an example of an algebraic field:
A set of numbers over which we can perform addition, subtraction,
multiplication, and division.
The associative, commutative, and distributive properties hold.
7
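The same computation in Python (a three-line sketch): pow(a, -1, p) returns the multiplicative inverse mod a prime p (this three-argument form with a negative exponent assumes Python 3.8 or later; pow(a, p-2, p) via Fermat's little theorem works too).

p = 7
inv4 = pow(4, -1, p)            # multiplicative inverse of 4 mod 7
x = (inv4 * 5) % p              # solve 4x = 5 (mod 7)
print(inv4, x, (4 * x) % p)     # 2 3 5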

Arithmetic Modulo a Prime


Arithmetic mod p is an example of an algebraic field.
(arithmetic over the reals is another familiar example)
Many types of mathematical facts and operations we
initially learned in the context of real number arithmetic
actually extend to any algebraic field:
Solving a system of linear equations.
Interpolating a polynomial (i.e., determining the coefficients of a
degree-d polynomial given its value at d+1 different points).
A polynomial a(x) = a_d x^d + a_{d−1} x^{d−1} + ... + a_1 x + a_0 of degree d can
have at most d roots (values of x for which a(x) = 0).
Example (system of linear equations):
6x + 5y ≡ 2 (mod 7)
3x + y ≡ 4 (mod 7)
Unique solution: (x, y) = (2, 5).


8

Example : Comparison of Large Arrays


Suppose Adam has a binary sequence A1 ... An and Bo
has a binary sequence B1 ... Bn.
With minimal communication (much less than n bits), we'd
like to check whether A = B.
Clever idea: compare hashes of A and B.
Construct two polynomials:
Adam:  A(x) = A1·x + A2·x² + ... + An·x^n
Bo:    B(x) = B1·x + B2·x² + ... + Bn·x^n
Select a prime p ≥ n^11, and a random x from {0, ..., p − 1}.
Adam transmits p, x, and A(x) mod p to Bo (only O(log n) bits!).
Bo checks whether A(x) ≡ B(x) (mod p). If so, he declares A = B.
If A = B, this always works.
If A ≠ B, Pr[failure] ≤ 1 / n^10 (and we can replace 10 with any
constant of our choosing, so this is a w.h.p. guarantee).
9

Example : Comparison of Large Arrays


Clever idea:
Construct two polynomials:
Adam:  A(x) = A1·x + A2·x² + ... + An·x^n
Bo:    B(x) = B1·x + B2·x² + ... + Bn·x^n
Select a prime p ≥ n^11, and a random x from {0, ..., p − 1}.
Adam transmits p, x, and A(x) mod p to Bo (only O(log n) bits!).
Bo checks whether A(x) ≡ B(x) (mod p). If so, he declares A = B.
Suppose that A ≠ B. Let C(x) = A(x) − B(x).
Pr[erroneously conclude that A = B]
= Pr[A(x) ≡ B(x) (mod p)]
= Pr[C(x) ≡ 0 (mod p)]
≤ n / p   (since a polynomial of degree n has at most n roots)
≤ 1 / n^10.

10
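A sketch of the fingerprint exchange in Python. Treating the sequences as 0/1 coefficient lists and using the Mersenne prime 2^61 − 1 (rather than a prime chosen as a function of n) are assumptions made to keep the example short.

import random

def fingerprint(bits, x, p):
    """Evaluate A(x) = A1*x + A2*x^2 + ... + An*x^n (mod p) by Horner's rule."""
    acc = 0
    for coeff in reversed(bits):
        acc = (acc + coeff) * x % p
    return acc

A = [random.randint(0, 1) for _ in range(1000)]
B = list(A)                               # identical sequence
C = list(A); C[500] ^= 1                  # differs in one position

p = 2**61 - 1                             # a prime far larger than n
x = random.randrange(p)
print(fingerprint(A, x, p) == fingerprint(B, x, p))   # True: equal sequences always agree
print(fingerprint(A, x, p) == fingerprint(C, x, p))   # False, except with tiny probability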

Proving that h(k) = [(ak + b) mod p] mod m


Is Universal

For simplicity, first suppose the table size is a prime: m = p.
The hash function then simplifies to h(k) = (ak + b) mod p.
Consider two different keys k1 ≠ k2.
Note that h(k1) ≠ h(k2), so k1 and k2 can't collide!
Why? Suppose h(k1) = h(k2).
Then ak1 + b ≡ ak2 + b (mod p),
so ak1 ≡ ak2 (mod p), and since a ≠ 0 we can multiply
both sides by a⁻¹ to obtain k1 ≡ k2 (mod p), contradicting
the fact that k1 ≠ k2.
So k1 and k2 map to two different cells r ≠ s in our
hash table.
11

Proving that h(k) = [(ak + b) mod p] mod m


Is Universal
Still considering just h(k) = (ak + b) mod p.
Suppose h(k1) = r and h(k2) = s for two hash table
cells r ≠ s. This gives us a linear system:
ak1 + b ≡ r (mod p)
ak2 + b ≡ s (mod p)
which has a unique solution for (a, b).
Thus, exactly one pair (a, b) causes k1 and k2 to
hash to r and s respectively.
So there is a 1-to-1 mapping between pairs (a, b) and
pairs (r, s).
Since we choose (a, b) at random, every pair (r, s) of
destination cells (with r ≠ s) is equally likely.
12

Proving that h(k) = [(ak + b) mod p] mod m


Is Universal
We now return to h(k) = [(ak + b) mod p] mod m.
To prove h is universal, we must show for any two
keys k1 ≠ k2 that Pr[h(k1) = h(k2)] ≤ 1 / m. That is,
Pr[ [(ak1 + b) mod p] mod m = [(ak2 + b) mod p] mod m ] ≤ 1 / m.
By the previous slide, the inner values (ak1 + b) mod p and (ak2 + b) mod p
are equally likely to be any pair (r, s) with r ≠ s and r, s ∈ {0, 1, ..., p − 1}.
Equivalent simpler problem: Let r and s be chosen
uniformly from {0, 1, ..., p − 1} such that r ≠ s.
Show that Pr[r mod m = s mod m] ≤ 1 / m.
13

Proving that h(k) = [(ak + b) mod p] mod m


Is Universal
Let r, s be chosen uniformly from {0, 1, ..., p − 1}
such that r ≠ s.
Show: Pr[r mod m = s mod m] ≤ 1 / m.
Equivalently, fix any r in {0, 1, ..., p − 1}, and show
that at most a 1/m fraction of the choices for s ≠ r
collide with r when taken mod m:
At most ⌈p / m⌉ − 1 choices for s will share the same
remainder (mod m) with r.
Since ⌈p / m⌉ ≤ (p + m − 1) / m = 1 + (p − 1) / m,
⌈p / m⌉ − 1 ≤ (p − 1) / m.
And there are (p − 1) choices for s, so picking one of
these at random gives us at most a 1/m chance of collision!
14

Lecture 24. Perfect Hashing

CpSc 838: Advanced


Data Structures
Clemson University
Fall 2006

Review : Universal Hashing


A class of hash functions is universal if the
probability (over a random choice of hash
function from the class) of two different keys
colliding is at most 1/m.
I.e., Pr[h(k1) = h(k2)] ≤ 1/m for k1 ≠ k2.
Note that O(1/m) would also work fine, rather than 1/m.
Running time bounds:
Insert: O(1) amortized
Delete: O(1) amortized
Find: O(1) in expectation
Space: m = Θ(n).
2

Static Dictionaries and Perfect Hashing


The static dictionary problem: given n elements (each
with a numeric key), build a data structure that supports
find as efficiently as possible.
[Hagerup, Bro Miltersen, Pagh 01]: Deterministic static
dictionary construction algorithm with:
O(n log n) worst-case construction time.
O(n) space.
O(1) worst-case time for find.
Today we'll see how to use randomization to build a static
perfect hash table with:
Θ(n) expected construction time.
Θ(n) space.
O(1) worst-case time for find.
3

A First Step : Perfect Hashing with


Quadratic Space
With universal hashing, it's easy to achieve:
Θ(n²) space,
Θ(n) expected build time,
O(1) worst-case time for find.
Idea: Pick a universal hash function and hash all
n elements into a table of size n². Repeat the
whole process until we see no collisions.
Claim: Pr[no collisions] ≥ ½.
So E[# attempts] ≤ 2, and hence E[build time] = O(n).
This claim is one example of the famous birthday
paradox...

The Birthday Paradox


Suppose that each of m possible birthdays is equally likely.
Claim: There is at most a probability of ½ that two people
share the same birthday if fewer than √m people are in a room
(beyond √m people, the probability grows fairly rapidly).
In hashing terms: If we use universal hashing to place n
elements in a table of size n², Pr[no collisions] ≥ ½.
Why?
X: total # of pairs of elements that collide.
Xij = 1 if elements i & j collide, 0 otherwise.
Universal hashing: E[Xij] = Pr[elements i & j collide] ≤ 1/m.
Linearity of expectation: E[X] = Σ E[Xij] ≤ (n choose 2)(1/m) ≤ (n²/2)/m = ½.
So we know the expected number of collisions is at most ½,
but how do we then show that the probability of a collision is
at most ½?
5

Tail Bounds
We are often interested in the probability that a random variable
deviates significantly from its expected value.
This leads to a collection of common tail bounds: Markov's
inequality, Chebyshev's inequality, and Chernoff / Hoeffding bounds.
Today (and in this class) we'll focus on Markov's inequality.
You are encouraged to see the book for more information on the other
types of bounds; Chernoff bounds in particular are quite powerful and
useful, and they are the main tool we use to prove our highly-useful
randomized reduction lemma.
[Figure: a probability distribution for X with E[X] = 50 and the tail event X ≥ 75 shaded.]

Markovs Inequality
If X is any nonnegative random variable, then
Pr[X ≥ k·E[X]] ≤ 1/k.
Sometimes written as Pr[X ≥ a] ≤ E[X] / a.
Example: let X be the number of heads we see when
flipping 100 coins.
E[X] = 50.
Pr[X ≥ 75] = Pr[X ≥ (3/2)E[X]] ≤ 2/3.
Due to its generality, Markov's inequality is a rather weak
bound, although it's still quite useful.
If expected running time = T, then the probability our
algorithm takes ≥ kT time is at most 1/k.
In the case of the birthday paradox:
Let X denote the # of collisions we obtain when hashing n
elements into a table of size n². Recall that E[X] ≤ ½.
So Pr[≥ 1 collision] = Pr[X ≥ 1] ≤ E[X] / 1 ≤ ½.
7

A Two-Level Approach

A Two-Level Approach
Universally hash the n keys into a size-n table in O(n)
time, resolving collisions (we expect a few of
them) using chaining.
Now go through each cell j = 1..n in our hash
table, and for each cell (say, containing a chain of
Bj colliding elements) build a 2nd-level
collision-free hash table of size Bj².
This takes O(Σj Bj²) expected time and space.
We will show that, as a consequence of universal
hashing at the top level, E[Σj Bj²] ≤ 2n = O(n).
So we have O(n) expected construction time and
space.
Question: How can we obtain O(n) worst-case space
(still with O(n) expected construction time)?

O(n) Space in the Worst Case


Let S = Σj Bj² be a random variable that denotes
the amount of space used by second-level tables.
We'll shortly prove that E[S] ≤ 2n.
Note that we can evaluate S after hashing at the
top level (but before we commit to building the
second-level tables).
If we notice that S > 4n, then give up and restart.
Markov: Pr[S > 4n] ≤ Pr[S ≥ 2E[S]] ≤ ½.
So E[# trials until success] ≤ 2, each of which
takes O(n) time (followed by construction of 2nd-level
tables taking O(n + S) expected time).
Therefore, E[total construction time] = O(n).

10

E[Σj Bj²] ≤ 2n
Note: (Bj choose 2) = Bj(Bj − 1)/2 = ½(Bj² − Bj), so Bj² = 2·(Bj choose 2) + Bj.

E[ Σ_{j=1..n} Bj² ] = E[ Σ_{j=1..n} ( 2·(Bj choose 2) + Bj ) ]
                    = n + 2·E[ Σ_{j=1..n} (Bj choose 2) ]        (since Σj Bj = n)
                    = n + 2·E[ total # of colliding pairs ]
                    = n + 2·E[ Σ_{i<j} Xij ]                     (Xij = 1 if elements i and j collide, 0 otherwise)
                    = n + 2·Σ_{i<j} E[Xij]
                    ≤ n + 2·(n choose 2)·(1/m) ≤ n + n²/m = 2n   (since m = n at the top level).
11
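A sketch of the whole two-level (FKS-style) construction in Python, using the (ak + b) universal family from the previous lecture; the retry loops implement "repeat until no collisions" and "restart if S > 4n". The prime 2^61 − 1 and the example keys are illustrative assumptions.

import random

P = 2**61 - 1                                  # prime at least as large as any stored key

def universal(m):
    a, b = random.randrange(1, P), random.randrange(P)
    return lambda k: ((a * k + b) % P) % m

def build_perfect(keys):
    n = len(keys)
    while True:                                # top level: restart if space blows up
        h = universal(n)
        buckets = [[] for _ in range(n)]
        for k in keys:
            buckets[h(k)].append(k)
        if sum(len(b) ** 2 for b in buckets) <= 4 * n:
            break
    second = []
    for b in buckets:                          # second level: collision-free tables of size Bj^2
        m2 = len(b) ** 2
        while True:
            h2 = universal(m2) if m2 else None
            table = [None] * m2
            ok = True
            for k in b:
                cell = h2(k)
                if table[cell] is not None:    # collision: retry this bucket with a new h2
                    ok = False
                    break
                table[cell] = k
            if ok:
                break
        second.append((h2, table))
    return h, second

def find(structure, k):
    h, second = structure
    h2, table = second[h(k)]
    return len(table) > 0 and table[h2(k)] == k

keys = [10, 22, 37, 40, 52, 60, 70, 72, 75]
S = build_perfect(keys)
print(all(find(S, k) for k in keys), find(S, 999))   # True False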

Dynamic Perfect Hashing


[Dietzfelbinger et al. 94]: Two-level hashing we
just described + lots of amortized table expansion
and contraction (in top level and 2nd level tables)
O(1) worst-case time for find.
O(1) expected amortized time for insert and delete.

[Dietzfelbinger and Meyer Auf Der Heide 90]:


O(1) worst-case time for find.
O(1) amortized time for insert and delete w.h.p.

Cuckoo Hashing [Pagh and Rodler 04]:


O(1) worst-case time for find.
O(1) expected amortized time for insert and delete.
Much simpler to implement in practice
12

Cuckoo Hashing
Two very similar variants with essentially the
same performance:
Variant #1:
Use two hash tables T1 and T2, each of length (1+ε)n.
Use two hash functions h1 and h2.
Each key k is stored either in T1[h1(k)] or T2[h2(k)].
Variant #2:
Use a single hash table T of length (2+ε)n.
Use two hash functions h1 and h2.
Each key k is stored either in T[h1(k)] or T[h2(k)].

Since each key resides in only 2 possible


locations, find takes O(1) time in the worst case.
13

Cuckoo Hashing : Picture


[Figure: in the two-table variant, key k can reside at T1[h1(k)] or T2[h2(k)]; in the one-table variant, at T[h1(k)] or T[h2(k)].]

Cuckoo Hashing : Insertion


Assume two-table variant (one-table is similar):
To insert a new key k,
If T1[h1(k)] is empty, place k there.
Otherwise, if T2[h2(k)] is empty, place k there.
Otherwise, evict the current occupant of T1[h1(k)] and
replace it with k there. We then try to insert the
displaced key in its alternate location in T2 (possibly
displacing a 3rd key, and so on).
If our chain of displacements takes more than L steps
(for some predetermined threshold L = Θ(log n)), then
give up, pick new hash functions, and try to re-build
our tables from scratch.
Also do the usual amortized rebuilding whenever our
tables become too densely occupied.

15
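A sketch of the two-table variant in Python. The hash functions here are just members of the earlier universal family (not the stronger (O(1), O(log n))-universal functions the formal analysis needs), table growth is omitted, and the displacement cap and rebuild-by-rehashing follow the outline above.

import random

P = 2**61 - 1

def rand_hash(m):
    a, b = random.randrange(1, P), random.randrange(P)
    return lambda k: ((a * k + b) % P) % m

class Cuckoo:
    def __init__(self, m=16):
        self.m = m
        self.rebuild([])

    def rebuild(self, keys):
        while True:                              # pick fresh h1, h2 and reinsert everything
            self.h = [rand_hash(self.m), rand_hash(self.m)]
            self.T = [[None] * self.m, [None] * self.m]
            if all(self._insert(k) is None for k in keys):
                return

    def _insert(self, k):
        limit = max(1, 6 * self.m.bit_length())            # displacement cap, roughly O(log n)
        side = 0
        for _ in range(limit):
            cell = self.h[side](k)
            if self.T[side][cell] is None:
                self.T[side][cell] = k
                return None                                # success: nobody left homeless
            self.T[side][cell], k = k, self.T[side][cell]  # evict the current occupant
            side = 1 - side                                # the displaced key tries the other table
        return k                                           # give up: k is still homeless

    def insert(self, k):
        homeless = self._insert(k)
        if homeless is not None:
            self.rebuild(self.keys() + [homeless])

    def find(self, k):
        return self.T[0][self.h[0](k)] == k or self.T[1][self.h[1](k)] == k

    def keys(self):
        return [x for t in self.T for x in t if x is not None]

c = Cuckoo()
for k in (5, 21, 37, 53, 69, 85):
    c.insert(k)
print(all(c.find(k) for k in (5, 21, 37, 53, 69, 85)), c.find(101))   # True False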

Cuckoo Hashing : Results


Cuckoo hashing is simple to implement and
seems to perform well in practice.
Its analysis is somewhat complicated,
unfortunately, and technically requires that we
use (O(1), O(log n))-universal hash functions:
A hash function family is (c, k)-universal if the probability of
any k different keys hashing to any specified set of k
table cells is at most c / m^k.
The universal hash functions we've discussed so far
are (O(1), 2)-universal.
It's much harder (albeit still possible) to construct an
(O(1), O(log n))-universal hash function that can be
evaluated in constant time.
16

Lecture 25. Radix Trees and


Y-Fast Trees

CpSc 838: Advanced


Data Structures
Clemson University
Fall 2006

Limitations of Hash Tables


In the comparison-based world, balanced BSTs
support not only the standard dictionary
operations insert, delete, and find, but also a host
of other useful operations.
In the RAM model, hash tables give us a faster
way to implement only insert, delete, and find.
Wouldnt it be nice if we could develop a fast
RAM data structure that also supports
pred(k) and succ(k) (for inexact search)
min and max
(to implement priority queues)
pred(e) and succ(e) (to enumerate nearby elements)
2

The (Binary) Radix Tree

Height = log2 C

Suppose our keys live in the range 0 ... C − 1,
where C is a power of 2.
A natural way to store such keys in the RAM
model is in the leaves of a radix tree:

Keys (3-bit integers


in this case) stored
in leaves
Most-significant bit = 0

Most-significant bit = 1

The (Binary) Radix Tree


The path from the root to a key k corresponds to
the binary representation of k.
So height = log2 C.

Easy to implement insert/delete, pred/succ(k),


pred/succ(e), and min/max all in O(log C) time.
Assuming n ≤ C (true if no duplicate keys), we have
log n ≤ log C, so this is not really faster than a balanced BST.
By also storing elements in an auxiliary universal hash
table, however, we can find in O(1) expected time.

Space required: O(n log C).


Can we reduce this to O(n)?
4

Reducing Space by Contracting Paths


Compress long paths by removing non-branching
nodes, assigning multi-bit edge labels accordingly:
[Figure: a radix tree before and after path compression; non-branching nodes are removed, and the edges carry multi-bit labels such as 0, 11, 000, 001, 011, 110, 111.]

The resulting binary tree has n leaves, and every
internal node is a branching node.
It is easy to show by induction that any such tree must
have n − 1 internal nodes, so O(n) total space!
And it's still easy to perform all of our usual operations
(e.g., insert/delete) in O(log C) time.

Reducing Space by Indirection


Idea: radix tree as a top level data structure that
bottoms out on blocks of elements, rather than
individual elements:
Each block stored at the
leaf node in the radix tree
corresponding to the min
key value in the block.

Blocks are stored as arrays or linked lists, each containing
between log C and 2 log C elements.

To access key k, search the radix tree


for pred(k) to find ks block, then scan
this block to search for k.

Reducing Space by Indirection


Total space usage now O(n), since less than
n / log C records stored in the radix tree.
Block management reminiscent of a B-tree:
If insertions cause a block to contain > 2 log C keys,
split it (thereby inserting a new record in the radix tree).
If deletions cause a block to contain < log C elements,
merge it with a neighboring block (deleting a record
from the radix tree in the process).

Appropriately implemented,
all operations still require
only O(log C) time.
7

Radix Trees : Further Thoughts


Often useful to store augmented information at
internal nodes.
Example: For a router, we might want to store a record
that corresponds to the IP block 1.2.3.* (all addresses
starting with first 24 bits = 1.2.3, last 8 bits arbitrary).

Can easily construct a B-ary radix tree:

Each node has B children labeled 0 B 1.


Node-to-leaf path corresponds to key written in base B.
Height = logB n.
Operations take O(B logB n) time (no better than with
the binary radix tree).

The radix tree is the basis for a popular type of


monotone priority queue called a radix heap.

Speeding Things Up : The Y-Fast Tree


As weve mentioned, the O(log C) performance of a radix
tree generally doesnt improve the O(log n) bounds we get
with a balanced BST.
But using a fancy radix tree called a Y-fast tree, we can
improve O(log C) to O(log log C).
First steps:
Start with a plain radix tree (no path compression / indirection yet).
Store the label of all nodes (even internal nodes) in a hash table.
(that is, the binary key value or prefix represented by a node)
Augment each element e with pointers to pred(e) and succ(e).
Augment each internal node in the tree with a pointer to the
minimum and maximum leaves in its subtree.
9

The Y-Fast Tree


Easy to compute succ(k) in O(1) time if we can first
compute the lowest present ancestor of k:

And we can compute LPA(k) in only O(log log C) time


using binary search over node labels!

10
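A sketch (Python) of just this binary-search step: store the label (depth, prefix) of every radix-tree node in a hash set, and binary search on the depth to find the lowest present ancestor of a query key. The word size w and the toy key set are assumptions; the surrounding succ/pred machinery (min/max pointers, element-level links) is omitted.

def node_labels(keys, w):
    """All (depth, prefix) labels of radix-tree nodes present for the given keys."""
    labels = set()
    for k in keys:
        for d in range(w + 1):
            labels.add((d, k >> (w - d)))
    return labels

def lowest_present_ancestor_depth(labels, k, w):
    """Largest depth d such that the depth-d prefix of k is a present node.
    Binary search over d: O(log w) = O(log log C) hash-set lookups."""
    lo, hi = 0, w                      # depth 0 (the root) is always present
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if (mid, k >> (w - mid)) in labels:
            lo = mid
        else:
            hi = mid - 1
    return lo

w = 8
keys = [0b00110001, 0b01100101, 0b10000110]
labels = node_labels(keys, w)
print(lowest_present_ancestor_depth(labels, 0b00111111, w))   # 4: shares the prefix 0011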

The Y-Fast Tree


We can now implement pred(k) and succ(k) in
O(log log C) expected time.
Note that pred(e) and succ(e) take O(1) time,
since each element e maintains a pointer to
pred(e) and succ(e).
And note that we can update these pointers in O(1) time
after an insert or delete.

Problems yet to resolve:


Insert and delete still take O(log C) time.
Space required is O(n log C).
Can we fix both of these problems together?
11

Indirection to the Rescue


Use indirection with the Y-fast tree to reduce space
requirement back down to O(n).
Instead of storing blocks with arrays or linked lists, encode
the sequence within each block in a balanced BST.
Since blocks contain between log C and 2 log C elements each, our BST
operations run in O(log log C) time!
When insert and delete cause blocks to be split or joined, we can
do this in O(log log C) time by splitting and joining BSTs.

Amortization: Inserts and delete in the top level radix tree


still occur (and these take O(log C) time), but they only
happen roughly every O(log C) calls to insert or delete.
So insert and delete now run in O(log log C) amortized time.
Think of the low-level blocks as functioning to slow down the rate
of inserts and deletes in the top-level radix tree.

12

Indirection with BSTs

Top-level radix tree


(operations here take
O(log C) time)

Low-level BSTs, each containing between log C and 2 log C elements
(operations on these take O(log log C) time).
13

Lecture 26. Stratified Trees and the


van Emde Boas Structure

CpSc 838: Advanced


Data Structures
Clemson University
Fall 2006

Recap : Y-Fast Trees


Enhancement of a radix tree.
O(log log C) time:
insert / delete (amortized)
pred(k) / succ(k) (expected)
O(1) time:
pred(e) / succ(e) (due to extra augmented data)
min / max (due to extra augmented data)
find (expected, via an auxiliary hash table)

Some Applications:
Fast versatile RAM dictionary.
Priority queue with ops running in O(log log C) time.
RAM sorting algorithm with O(n log log C) runtime.

The van Emde Boas (vEB) Structure


Named after Peter van Emde Boas.
Another means of achieving exactly the same
performance bounds as the Y-fast tree.
More or less equivalent to a Y-fast tree, although it
looks very different at first glance.

Think of each (log C)-bit key k as k = khigh ∘ klow.
khigh and klow are the high-order and low-order half-words of k,
each of length ½ log C bits.
Example: k = 167, which in binary is 1010 0111.
khigh = 1010 (= 10)
klow = 0111 (= 7)

The van Emde Boas (vEB) Structure


Maintain the min and max keys being stored in
our structure.
If min = max, then only 1 element being stored,
so we dont bother storing anything else.
Otherwise,
Maintain a hash table T containing the khigh values of
all keys.
Maintain a recursive vEB structure, H, on the khigh
values of all keys.
Associated with each entry khigh in T, we store a low-level
recursive vEB structure L[khigh] containing all the
keys with khigh as their high half-word. L[khigh] is keyed
on klow.

Example
Suppose we store
these 8-bit keys:
0011 0001
0011 0110
0110 0001
0110 0101
0110 1001
1000 0000
1000 0110
1000 1111

Min = 0011 0001


Max = 1000 1111

T: hash table containing:


{0011, 0110, 1000}

H: recursive vEB
structure containing the
half-size keys
{0011, 0110, 1000}

L[0011]: recursive
vEB structure containing
the half-size keys
{0001, 0110}
L[0110]: recursive
vEB structure containing
the half-size keys
{0001, 0101, 1001}

L[1000]: recursive
vEB structure containing
the half-size keys
{0000, 0110, 1111}
5

Implementing the succ(k) Operation


(pred(k) is Symmetric)
succ(k):
If k > max, then k has no successor.
If khigh is in T:
If klow < L[khigh].max, then return L[khigh].succ(klow).
If klow ≥ L[khigh].max, then return L[H.succ(khigh)].min.
Otherwise, return L[H.succ(khigh)].min.
Note that in every case, we make at most 1
recursive call to a vEB structure using half-sized
keys, so the running time is at most O(log log C)
(in expectation, due to the hash lookup).
6

Stratified Trees : Another Way to


Recursively Decompose a Tree
Split a tree in half height-wise:

Then recursively continue to decompose each


half-height tree the same way
7

vEB Tree Picture of a Radix Tree


[Figure: the vEB view of a radix tree: the top half is the structure H on high half-words, a hash table T contains all khigh values, and each L[khigh] (e.g., L[00], L[01], L[11]) is the bottom-half structure for one high half-word, with its min and max marked.]

Lecture 27. String Matching

CpSc 838: Advanced


Data Structures
Clemson University
Fall 2006

Introduction to String Matching


Input:
Text T[1..n] (usually quite long).
Pattern P[1..m] (usually much shorter; assume m ≤ n).
Goal: Find the occurrences of the pattern in the text:
T = ACGTGATACTGCTATATACATCGATATGCTCA
P = ATA   (occurrences at positions 6, 14, 16, 24)
String matching is a classical algorithmic problem
with many applications in practice. For example,
Searching the human genome (n ≈ 3 billion).
Searching large documents (e.g., word processing) or
entire document collections (e.g., the web).
2

Two Common Approaches


(Today) Solve a single problem instance once:
Brute Force approach : Compare the pattern at
each possible offset in the text. O(mn) time.
Karp-Rabin : Hash pattern in O(m) time, then compare
this hash against the hash of a sliding window through
the text. O(n) time, and finds all matches w.h.p.
Morris-Pratt / Knuth-Morris-Pratt : Preprocess
pattern in O(m) time, then match against text in only
O(n) time.

(Next lecture) Preprocess the text, building a data


structure so that we can search for any pattern
quickly (in time proportional to pattern length).
Suffix arrays, suffix trees.
3

Karp-Rabin String Matching


Suppose the characters in T and P are integers in
the range {0, ..., C − 1}.
Typical assumption: characters in T and P are small
integers (small enough to sort).
If not, we can always use a balanced BST as a
preprocessing step to reduce the alphabet used by T
and P to {0, ..., m + n − 1} in O(n log n) time.

Regard P as an m-digit number written in base C.


Regard each length-m substring of T as a
number written in base C.
We have a match if these two numbers are equal.
4

Karp-Rabin String Matching


Problem: It takes O(m) time to compare two m-digit numbers (they are much too large to fit into a
single word).
Obvious solution: hashing!
Hash each m-digit number using the hash function
h(k) = k mod p, where p is a randomly-chosen prime.
We use this particular hash function because it is easy
to update in O(1) time per offset when we slide a
length-m window through the text:
h_{i+1} = (C·h_i − C^m·T[i] + T[i+m]) mod p,
where h_i denotes the hash of T[i ... i + m − 1].

Running time: O(m) to hash P, O(n) to hash all


substrings of T, so O(m + n) = O(n) total.
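A sketch of the sliding-window matcher in Python. The base C = 256 and the fixed prime 2^61 − 1 are illustrative assumptions (the algorithm as analyzed picks a random prime), and the sketch reports hash matches directly, so false positives are possible but unlikely.

def karp_rabin(T, P, C=256, p=2**61 - 1):
    n, m = len(T), len(P)
    if m > n:
        return []
    hp = 0
    for ch in P:                                  # hash of the pattern, O(m)
        hp = (hp * C + ord(ch)) % p
    h = 0
    for ch in T[:m]:                              # hash of the first window
        h = (h * C + ord(ch)) % p
    Cm = pow(C, m, p)                             # C^m mod p, for the rolling update
    hits = []
    for i in range(n - m + 1):
        if h == hp:
            hits.append(i + 1)                    # 1-indexed, as in the slides
        if i + m < n:                             # slide the window: drop T[i], add T[i+m]
            h = (h * C - Cm * ord(T[i]) + ord(T[i + m])) % p
    return hits

print(karp_rabin("ACGTGATACTGCTATATACATCGATATGCTCA", "ATA"))   # [6, 14, 16, 24]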

False Positives
If P matches a particular substring of the text, we
always detect this correctly.
However, we may erroneously conclude that
there is a match where none exists, if there is a
hashing collision.
However, we can show that collisions occur very
infrequently if we choose p large enough, then
the algorithm will be correct w.h.p.
From how large a range do we need to choose
our random prime p?...
6

False Positives : Analysis


To bound the probability of a false positive, let x
and y be two different m-digit numbers written in
base C (so 0 ≤ x, y < C^m).
If p is chosen at random, we want to compute
Pr[x mod p = y mod p] = Pr[x − y ≡ 0 (mod p)].
|x − y| is an integer in the range [1, C^m].
We only fail if we happen to pick a prime p that is
a factor of |x − y|.
So how many bad choices for p are there?
(I.e., at most how many prime factors can a
number in the range [1, C^m] have?)

False Positives : Analysis


Q: How many prime factors can a number in the range
[1, C^m] have?
A: At most log2(C^m) = m·log2 C of them!
(If there were more, then their product would be
larger than 2^{m log2 C} = C^m.)
Since there are ≤ m·log C bad choices for p, let's choose p
from among n^{k+1}·m·log C different primes.
So Pr[false positive at any one index] ≤ 1 / n^{k+1}.
By the union bound, Pr[any false positive] ≤ 1 / n^k.
Since we can choose k to be any constant we like, this
gives us a w.h.p. bound of success.
8

Choosing a Random Prime


We need to choose p from among n^{k+1}·m·log C
different primes. How?
Simple approach: guess and test
Choose p randomly from [1, X].
Test if p is prime. If not, repeat.
(note that there are efficient ways to test for primality;
one can even test for primality in polynomial time)

Prime number density theorem: as X grows larger


and larger, the fraction of numbers in [1, X] that
are prime tends to [1 / ln X]. (hard to prove!)
How large should we choose X?
What is the expected number of iterations of the
algorithm above for choosing p?

Hashing Prefixes
Recall that to answer range-sum queries in O(1)
time in a sequence, we start by precomputing
prefix sums.
The same idea is useful with string matching!
As a preprocessing step, hash all prefixes of T
(using the same hash function: h(k) = k mod p).
O(n) total time, since we can update the hash of a
prefix in O(1) time when extending it by one character:
h_{j+1} = (C·h_j + T[j+1]) mod p, where h_j is the hash of T[1...j].
In O(n) time, also precompute and store C^k and (C^{−1})^k
(mod p) for k = 1, ..., n.
Now we can hash any substring T[i...j] in O(1) time: the base-C
value of T[i...j] is h_j − C^{j−i+1}·h_{i−1}, so h_{i,j} = (h_j − C^{j−i+1}·h_{i−1}) mod p.

10

Morris-Pratt String Matching


A fancier version of the brute force approach.
When a mismatch occurs at P[j], shift the pattern forward so that
the longest border of P[1...j−1] lines up with the text already matched
(a shift of (j − 1) minus that border's length):

11

Morris-Pratt String Matching :


Precomputation
In O(m) time, find the length of the longest border
of every prefix P[1...j] of the pattern.
This is accomplished by matching P with itself! (A sketch of both phases follows.)

12
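A sketch of both phases in Python: computing the longest-border lengths of every prefix by matching P against itself, then scanning the text with those border lengths. This is the Morris-Pratt matcher; Knuth-Morris-Pratt tweaks the border table slightly.

def border_lengths(P):
    """b[j] = length of the longest proper border of P[0..j-1]
    (a border is a prefix that is also a suffix)."""
    m = len(P)
    b = [0] * (m + 1)
    b[0] = -1                       # sentinel for the empty prefix
    k = -1
    for j in range(m):
        while k >= 0 and P[k] != P[j]:
            k = b[k]
        k += 1
        b[j + 1] = k
    return b

def morris_pratt(T, P):
    b = border_lengths(P)
    hits, k = [], 0                 # k = number of pattern characters currently matched
    for i, ch in enumerate(T):
        while k >= 0 and (k == len(P) or P[k] != ch):
            k = b[k]                # fall back to the longest border
        k += 1
        if k == len(P):
            hits.append(i - len(P) + 2)     # 1-indexed starting position
    return hits

print(morris_pratt("ACGTGATACTGCTATATACATCGATATGCTCA", "ATA"))   # [6, 14, 16, 24]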

Morris-Pratt String Matching : Summary


Total running time: O(n)
Deterministic, and always correct.
Knuth-Morris-Pratt: Common variant of this same
approach, slightly optimized.
Note that for each index i at which T[i] does not
match P, we still learn the length of the longest
prefix of P that matches there (useful for approximate
matching, as we'll see on the next homework).
We can also interpret Morris-Pratt (or Knuth-Morris-Pratt) in a more data-structure context as
building a pattern-matching automaton.

13

A Pattern-Matching Automaton

Following a solid
edge consumes 1 character
from the text. A dotted edge
does not consume a character.

Pattern P: XYXYZX.
Example text T: XYXYQXYXYZXM.

14

Lecture 28. Suffix Arrays and


Suffix Trees

CpSc 838: Advanced


Data Structures
Clemson University
Fall 2006

Motivation
The Morris-Pratt / Knuth-Morris-Pratt algorithm allows us
to preprocess a pattern so it can subsequently be
matched against any text in linear time.
Perhaps a more common problem, however, is that we
want to preprocess a large text so that we can then
search quickly for many different patterns.
This leads us to suffix arrays and suffix trees:
Both can be built in O(n) time & space from a text T[1...n].
(Provided T[1] ... T[n] are sufficiently small integers that we
can sort them in linear time. Since radix sort takes O(n log_n C)
time, we can handle T[i] ≤ n^{O(1)}, which is reasonably large.)
A suffix array can count the # of matches of a pattern P[1..m] in
only O(m + log n) time, and a suffix tree requires only O(m) time!
We can then step through the indices at which these matches
occur in O(1) time per index.
2

The Suffix Array


An array of starting indices for the n sorted
suffixes of a text T[1n].
For example, take T[17] = BANANA$:
(we use a dummy end-of-string character, $, for convenience)
Suffix

Starting index

A$

ANA$

ANANA$

BANANA$

NA$

NANA$

To use only O(n) space,


We maintain only this array
[7, 6, 4, 2, 1, 5, 3], since the
suffixes themselves can be
obtained by indexing into T.
3

Searching a Suffix Array


A pattern P[1..m] will match a contiguous
range of entries in the suffix array.
Use binary search in O(m log n) time to
locate the endpoints of this range.
Then O(1) time per index to enumerate all matches.
Example: the pattern P = ANA matches at indices 2 and 4.

  Suffix     Starting index
  $          7
  A$         6
  ANA$       4
  ANANA$     2
  BANANA$    1
  NA$        5
  NANA$      3
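A small Python sketch of the O(m log n) binary search (my own helper names; sa holds 1-indexed starting positions as in the table above):

def suffix_array_search(T, sa, P):
    # Return the starting positions of all occurrences of P, by binary
    # searching for the contiguous block of suffixes that begin with P.
    m = len(P)

    def suffix_prefix(k):
        start = sa[k] - 1
        return T[start:start + m]     # first m characters of that suffix

    l, r = 0, len(sa)                 # left endpoint: first suffix prefix >= P
    while l < r:
        mid = (l + r) // 2
        if suffix_prefix(mid) < P:
            l = mid + 1
        else:
            r = mid
    lo = l
    l, r = 0, len(sa)                 # right endpoint: first suffix prefix > P
    while l < r:
        mid = (l + r) // 2
        if suffix_prefix(mid) <= P:
            l = mid + 1
        else:
            r = mid
    return [sa[k] for k in range(lo, l)]

print(suffix_array_search("BANANA$", [7, 6, 4, 2, 1, 5, 3], "ANA"))   # [4, 2]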

Longest Common Prefixes


We often augment a suffix array with an array
of longest common prefix (LCP) lengths between consecutive suffixes:

  Suffix     Starting index   LCP length (with previous suffix)
  $          7                -
  A$         6                0
  ANA$       4                1
  ANANA$     2                3
  BANANA$    1                0
  NA$        5                0
  NANA$      3                2
This reduces search time to O(m + log n)

Suffix Arrays : Construction


A naïve approach to building a suffix array takes
O(n² log n) time:
Use any O(n log n) sorting algorithm.
Each comparison between suffixes made
by the algorithm takes O(n) time.

As we'll see in a few days, it is possible to build a
suffix array in only O(n) time!
(even a suffix array augmented with LCPs)
After building a suffix array, we can transform it
into a more powerful suffix tree in O(n) time
(details to follow in the next lecture).
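For reference, the naïve construction described above is a one-liner in Python (sorting suffix start positions by the suffixes they name):

def naive_suffix_array(T):
    # O(n^2 log n) worst case: each of the O(n log n) comparisons
    # may inspect O(n) characters.
    return sorted(range(1, len(T) + 1), key=lambda i: T[i - 1:])

print(naive_suffix_array("BANANA$"))   # [7, 6, 4, 2, 1, 5, 3]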

Suffix Trees
Rooted tree (sometimes called a trie) where each
root-to-leaf path corresponds to a suffix T[i..n].
Children are typically stored in sorted order.
Each leaf is labeled with the starting index i of
its corresponding suffix T[i..n]. Each suffix ends
at a leaf, thanks to our use of the $ marker.
[Figure: the uncompressed suffix trie of T[1..7] = banana$, one character per edge.]
Potential problem: this structure could
take Θ(n²) space in the worst case.

Suffix Trees (Compressed) : In Only O(n) Space
Remove non-branching internal nodes.
Store only start : end indices (into T) for the substring labeling each edge.
[Figure: the compressed suffix tree of T[1..7] = banana$; e.g., the edge labeled
banana$ is stored as the index pair 1:7, and the other edges carry labels such
as $, a, na, and na$.]
T[1..7] = banana$
8

Searching a Compressed Suffix Tree

Since there are n leaves (each corresponding to
a suffix of the text) and each internal node is
branching, total space = O(n).
We can now search for any pattern P[1..m] in
only O(m) time by walking down the tree.
[To make each step O(1), we often
assume that the alphabet of T has
constant size (e.g., with DNA).
Otherwise, we would need to use
a hash table to store the children
of each node.]
[Figure: the compressed suffix tree of T[1..7] = banana$ with its leaves labeled
by starting index.]
9

Searching a Compressed Suffix Tree

Once we've searched down the tree following a
pattern P, we arrive at a subtree whose leaves
tell us the indices of all matches.
Example: P = an leads us to a subtree whose
leaves indicate matches at indices 2 and 4.
Since there are no non-branching internal nodes,
a subtree with k leaves has total size O(k), so we can
enumerate all k matches by traversing the matching
subtree in O(k) total time.
[Figure: the subtree reached by searching for an in the suffix tree of
T[1..7] = banana$.]

10

Letter Depths
Given any suffix tree, we can traverse it in O(n)
time and assign letter depths to its nodes
(the number of characters along the root-to-node path).
The letter depth of a leaf is easy to derive from its
corresponding index i: letter-depth(leaf i) = (n + 1) − i
(e.g., 3 for the leaf of na$ in banana$).
Many suffix tree problems become easier once we compute
letter depths as a preprocessing step.
[Figure: the suffix tree of T[1..7] = banana$ with letter depths assigned.]

11

Suffix Tree Applications


Suffix trees can be used to solve a wide range of
text searching problems in O(n) time:
Find the longest recurring substring within T, or
between two texts T1 and T2. Are there any substrings
in T (or in common between T1 and T2) that occur at
least k times?
Find the longest palindrome.
What is the shortest substring that does not occur in
our text?
Which substring of length L occurs most frequently?

Many of these problems are quite useful in


application areas like bioinformatics.

12

Lecture 29. Building Suffix Arrays


and Suffix Trees

CpSc 838: Advanced


Data Structures
Clemson University
Fall 2006

Today's 3 Topics
Suffix tree → suffix array in O(n) time.
Suffix array → suffix tree in O(n) time.
Building a suffix array in O(n) time.
Notes:
All suffix arrays will be augmented with LCPs.
All suffix trees will be sorted (children listed in order).
Suffix tree construction in O(n) time assumes the
characters in our text T[1..n] are integers that are
small enough to sort in linear time (e.g., integers of
size O(n^c) where c = O(1), if we use radix sort).
2

Suffix Tree → Suffix Array in O(n) Time
Traverse the suffix tree and write out the leaves in order; this gives the suffix array.
LCP lengths are given by the letter depths of the LCAs between successive leaves.
[Figure: the suffix tree of T[1..7] = banana$ alongside the resulting table of
suffixes ($, A$, ANA$, ANANA$, BANANA$, NA$, NANA$), their starting indices,
and their LCP lengths.]
3

Suffix Array → Suffix Tree
LCPs of 0 tell us the branches emanating from the root:
[Figure: the suffix array of BANANA$ with its LCP array; the entries with LCP 0
split the array into the root's subtrees: $, the block of suffixes beginning with
A (A$, ANA$, ANANA$), BANANA$, and the block beginning with N (NA$, NANA$).]

Suffix Array → Suffix Tree by Building a Cartesian Tree
Build a (non-binary) Cartesian tree using the LCPs!
[Figure: the Cartesian tree taking shape from the LCP array of BANANA$
(starting indices 7, 6, 4, 2, 1, 5, 3).]

Suffix Array → Suffix Tree by Building a Cartesian Tree
Build a (non-binary) Cartesian tree using the LCPs!
[Figure: the completed Cartesian tree, with leaves $ (7), A$ (6), ANA$ (4),
ANANA$ (2), BANANA$ (1), NA$ (5), NANA$ (3) grouped under internal nodes for
the shared prefixes A, ANA, and NA.]

Suffix Array → Suffix Tree by Building a Cartesian Tree
Voilà: a suffix tree! (augmented with letter depths)
[Figure: the same tree with each internal node labeled by its letter depth
(0 at the root; 1, 3, and 2 for the A, ANA, and NA nodes) and each leaf labeled
by its starting index.]

Suffix Array → Suffix Tree in O(n) Time
[Figure: the suffix array of banana$ with LCP lengths 0, 1, 3, 0, 0, 2 and the
resulting suffix tree with edges labeled $, a, na, banana$, na$, etc.]
All we need to do is build a Cartesian tree using the LCP array.
It's not a binary Cartesian tree (the kind to which we are most accustomed),
but our O(n)-time algorithm for building a Cartesian tree
from a sequence works the same way in this more general setting!
Finally, edge labels can be added with a simple O(n) traversal
through the tree (using both the LCPs / letter depths and the suffix array).
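A Python sketch of this Cartesian-tree construction (0-indexed for brevity; the Node class and the convention that lcp[i] is the LCP of the i-th and (i−1)-th suffixes in sorted order are my own illustrative choices):

class Node:
    def __init__(self, depth, suffix=None):
        self.depth = depth          # letter depth
        self.suffix = suffix        # starting index, for leaves
        self.children = []

def suffix_array_to_tree(T, sa, lcp):
    # Build the suffix tree skeleton (nodes + letter depths) in O(n) time,
    # as a non-binary Cartesian tree over the LCP array.
    n = len(sa)
    root = Node(0)
    leaf = Node(n - sa[0], suffix=sa[0])
    root.children.append(leaf)
    stack = [root, leaf]            # rightmost path of the tree built so far
    for i in range(1, n):
        l, last = lcp[i], None
        while stack[-1].depth > l:  # pop everything deeper than the new LCP
            last = stack.pop()
        if stack[-1].depth < l:     # split the edge to `last` with a new node
            mid = Node(l)
            stack[-1].children.remove(last)
            stack[-1].children.append(mid)
            mid.children.append(last)
            stack.append(mid)
        leaf = Node(n - sa[i], suffix=sa[i])
        stack[-1].children.append(leaf)
        stack.append(leaf)
    return root

With T = "banana$", sa = [6, 5, 3, 1, 0, 4, 2] and lcp = [0, 0, 1, 3, 0, 0, 2], this reproduces the tree shown on the previous slides; edge labels can then be filled in with one more traversal, as noted above.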

Building a Suffix Array in O(n) Time


We describe the very clever skew algorithm,
invented in 2003 by Kärkkäinen and Sanders.
Standard assumption: the characters in T[1..n] are
integers small enough to sort in linear time.
By sorting the characters and replacing them with
their ranks, we can reduce the alphabet of
T[1..n] down to the set {0, …, n − 1} in O(n) time.
Goal: build a suffix array from T[1..n], whose
characters are in {0, …, n − 1}, in O(n) time.
9

Brief Detour : RAM Sorting Algorithms


Suppose we are sorting n integers in the range
0 … C − 1 in the RAM model of computation.
Counting sort: O(n + C) time.
Sorts integers of magnitude C = O(n) in linear time.
Radix sort: O(n · max(1, log_n C)) time.
Sorts integers of magnitude C = O(n^k), k = O(1), in
linear time.
Sorts constant-length strings with characters drawn
from the alphabet {0, 1, …, n − 1} in linear time.
10

Counting Sort
Scan A[1..n] in O(n) time
and build an array N[1..C]
of element counts.
By scanning N once, we then
reconstruct A in sorted order in
O(n + C) time.
Ideally suited for C = O(n).
With care, can be made stable
(equal elements remain in the
same order).
[Example: an array A[1..n] with entries drawn from {1, 2, 3, 4} gives counts
N[1] = 2, N[2] = 2, N[3] = 4, N[4] = 1, from which the sorted array is rebuilt.]
11
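A short Python sketch of stable counting sort (my own function name and 1..C value convention):

def counting_sort(A, C):
    # Stable counting sort of integers in the range 1..C, in O(n + C) time.
    count = [0] * (C + 1)
    for a in A:
        count[a] += 1
    # position[v] = index at which the next copy of value v should be placed
    position, total = [0] * (C + 1), 0
    for v in range(1, C + 1):
        position[v] = total
        total += count[v]
    out = [None] * len(A)
    for a in A:                 # scanning A in its original order keeps the sort stable
        out[position[a]] = a
        position[a] += 1
    return out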

Radix Sort
Write the elements of A[1..n]
in some base (radix) r
(usually we set r = n).
Sort on each digit, starting
with the least significant,
using a stable sort
(usually counting sort).
# digits = log_n C, which is
constant if C = n^{O(1)}.
Runtime O(n) if C = n^{O(1)}.

  A (input)   after 1s pass   after 10s pass   after 100s pass
  446         120             309              115
  712         420             509              120
  309         712             712              146
  442         442             715              309
  435         892             115              420
  120         435             120              435
  638         715             420              437
  715         115             435              442
  437         595             437              446
  892         446             638              509
  115         146             442              595
  509         437             446              638
  420         638             146              712
  146         309             892              715
  595         509             595              892
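A brief Python sketch of LSD radix sort using a stable bucket pass per digit (the bucket lists play the role of the stable counting sort; names are mine):

def radix_sort(A, r):
    # LSD radix sort of nonnegative integers in base r.
    def stable_sort_by_digit(A, shift):
        buckets = [[] for _ in range(r)]
        for a in A:
            buckets[(a // shift) % r].append(a)
        return [a for bucket in buckets for a in bucket]

    shift, biggest = 1, max(A, default=0)
    while shift <= biggest:
        A = stable_sort_by_digit(A, shift)
        shift *= r
    return A

print(radix_sort([446, 712, 309, 442, 435, 120, 638], r=10))
# [120, 309, 435, 442, 446, 638, 712]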
12

The Skew Algorithm : 3 Simple Steps


T[0..11] = m2 i1 s4 s4 i1 s4 s4 i1 p3 p3 i1 $0
(mississippi$, with each character's rank shown as a subscript)
Step 1: build a suffix array for the suffixes starting at indices ≡ 0, 1 (mod 3):
i$ (10), ippi$ (7), issippi$ (4), ississippi$ (1), mississippi$ (0), pi$ (9),
sippi$ (6), sissippi$ (3).
Step 2: using Step 1, sort the suffixes starting at indices ≡ 2 (mod 3):
$ (11), ppi$ (8), ssippi$ (5), ssissippi$ (2).
Step 3 (merge): merge the two sorted lists to obtain the complete suffix array.

13

Step 1 : Building a Suffix Array from Indices 0, 1 (mod 3)
T[0..n−1] = m2 i1 s4 s4 i1 s4 s4 i1 p3 p3 i1 $0
Read off the length-3 blocks starting at indices ≡ 0 (mod 3): mis, sis, sip, pi$,
and at indices ≡ 1 (mod 3): iss, iss, ipp, i$_.
In O(n) time, radix sort these 2n/3 blocks
and replace each one with its rank.
String the results together to form T′ of
length 2n/3 over the alphabet {0, …, 2n/3 − 1}:
T′ = mis3 sis6 sip5 pi$4 iss2 iss2 ipp1 i$_0
Recursively build a suffix array from T′,
giving the entries at indices ≡ 0, 1 (mod 3)
in the suffix array for T:

  Suffix          Index in T′ (in T)
  i$_             7 (10)
  ippi$_          6 (7)
  issippi$_       5 (4)
  ississippi$_    4 (1)
  mississippi$    0 (0)
  pi$…            3 (9)
  sippi$…         2 (6)
  sissippi$…      1 (3)
14

Step 2 : Computing Indices 2 (mod 3) Using Indices 0, 1 (mod 3)
T[0..11] = m2 i1 s4 s4 i1 s4 s4 i1 p3 p3 i1 $0
We can now use the partial suffix array of the suffixes ≡ 0, 1 to reduce
each suffix ≡ 2 to an ordered pair (first character, rank of the remaining suffix).
Examples:
ssissippi$ → (s, sissippi$) → (4, 8)
ssippi$ → (s, sippi$) → (4, 7)
ppi$ → (p, pi$) → (3, 6)
$ → ($, -) → (0, 0)
Using these ordered pairs as keys,
we can radix sort the suffixes ≡ 2 in
O(n) time.
[Table: the partial suffix array of the suffixes at indices ≡ 0, 1 (mod 3), whose
positions supply the ranks used above.]

15

Step 3 : Merging Indices 2 (mod 3) with Indices 0, 1 (mod 3)
We use the standard merging algorithm (e.g., from merge sort).
However, to achieve O(n) time, we must be able to compare two suffixes
in different tables in O(1) time.
(Suffixes in the same table are already sorted, so comparing them in O(1)
time is easy using table indices.)
To compare two suffixes s1 (in the first table) and s2 (in the second table),
use the same trick as before!
[Tables: indices ≡ 0, 1 (mod 3): i$ (10), ippi$ (7), issippi$ (4), ississippi$ (1),
mississippi$ (0), pi$ (9), sippi$ (6), sissippi$ (3); indices ≡ 2 (mod 3): $ (11),
ppi$ (8), ssippi$ (5), ssissippi$ (2).]
16

Comparing Across Tables : Examples
Here we are comparing a suffix s1 (first table) against a suffix s2 (second table)
in T = m2 i1 s4 s4 i1 s4 s4 i1 p3 p3 i1 $0: each character maps to its rank, and
each trailing suffix maps to its rank in the table that already contains it.
Case 1: s1 starts at an index ≡ 0 (mod 3).
E.g., s1 = sissippi$ vs. s2 = ssissippi$:
s1 → (s, issippi$) → (4, 3);  s2 → (s, sissippi$) → (4, 8).
Case 2: s1 starts at an index ≡ 1 (mod 3).
E.g., s1 = issippi$ vs. s2 = ssissippi$:
s1 → (i, s, sippi$) → (1, 4, 7);  s2 → (s, s, issippi$) → (4, 4, 3).
(In Case 1, stripping one character leaves two suffixes whose ranks both appear
in the first table; in Case 2 we must strip two characters before that happens,
hence the triple.)
[Tables: ranks 1-8 of the suffixes at indices ≡ 0, 1 (mod 3) and ranks 1-4 of the
suffixes at indices ≡ 2 (mod 3), as on the previous slide.]
17

Comparing Across Tables
s1 and s2 can be compared in O(1) time by mapping each to a pair or triple of
small integers.
If we had initially decomposed into even and odd suffixes (mod 2), this trick
wouldn't work.
Decomposition into even & odd suffixes leads to a much more complicated
(albeit still linear-time) merging algorithm, which was the best known approach
prior to the skew algorithm.
This is why we use suffixes mod 3.
[Tables: the partial suffix arrays for indices ≡ 0, 1 (mod 3) and ≡ 2 (mod 3),
as on the previous slide.]
18

Running Time Analysis


Let T(n) denote the running time required to build
a suffix array from a length-n text.
We can write T(n) as a recurrence (a recursively-defined function):
T(n) = T(2n/3) + Θ(n),
(with T(1) = O(1) as a base case)

Expanding T(n), we have
T(n) = T(2n/3) + Θ(n)
     = T(4n/9) + Θ(2n/3) + Θ(n)
     = T(8n/27) + Θ(4n/9) + Θ(2n/3) + Θ(n)
     = …
     = Θ(n).  (linear total time, since the Θ terms form a geometrically decreasing series)


19

10

Lecture 30. External-Memory and


Cache-Oblivious Data Structures

CpSc 838: Advanced


Data Structures
Clemson University
Fall 2006

Hierarchical Memory Layout


In the RAM model, we assume every memory
access takes O(1) time.
This isn't particularly realistic.
Most memories have a hierarchical structure, running from smaller and faster
to larger and slower: primary cache, secondary cache, main memory, disk, network.
2

A Simple 2-Level External Memory Model


Two layers of memory:
Fast cache of size M having M / B blocks of size B.
Slower main memory.

If a memory access reads or writes an element


not in the cache, we transfer the entire block (of
size B) containing the element into the cache.
For simplicity, assume cache is fully associative, so
each memory block can reside anywhere in cache.

Running time = total # of block transfers.


Neglect the running time of all other operations (e.g.,
addition, multiplication, comparison), since these
happen in CPU registers and typically run much faster
than memory transfers.

Block Replacement Policies


Cache contains M / B blocks each of size B.
When a new block transferred into the cache, we must
evict some existing block.
For simplicity, we assume an optimal (or ideal) page
replacement policy:
Among existing blocks in cache, evict the one that will be used
farthest ahead in the future.
This is highly unrealistic, but if an algorithm has running time
T(M, B) = O(T(M / 2, B)) on an ideal cache (i.e., if its performance
slows down by only a constant factor when the cache size is
halved), then its running time will be O(T(M, B)) using either:
LRU (least recently used), or
FIFO (first-in-first-out)

page replacement (both common policies in practice).


4

Cache-Aware Algorithms
If an algorithm or data structure knows M and B,
it can optimize its performance accordingly.
This is called a cache-aware algorithm.
Example: Searching a sorted array.
Binary search uses O(log(n/B)) = O(log n − log B)
block transfers. This is not optimal.
Using a B-tree (with its branching factor set to the block size B), we obtain
O(log_B n), which is optimal in the comparison model.

However, we often don't know M and B, and in a
multi-level memory system these parameters
may vary substantially from level to level.
5

Cache-Oblivious Algorithms
An algorithm is said to be cache-oblivious if it
doesn't know M or B, and yet its running time is
always within a constant factor of an optimal
cache-aware algorithm.
Example: Reversing a length-n array.
Algorithm: scan and swap from both ends inward.
This uses n / B block transfers, which is optimal.

Note: On a multi-level memory with O(1) levels,


since the running time of a cache-oblivious
algorithm is within an O(1) factor of optimal between
each successive pair of levels, it will still be
within an O(1) factor of optimal overall!
6

Cache-Oblivious Algorithms
For many problems, the best solution on a
standard RAM is not cache-oblivious
Examples:
Searching a sorted array with binary search: runs
in O(log n − log B) time versus the optimal O(log_B n).
Sorting: quicksort / merge sort run in O((n/B) log(n/B))
time, versus the optimal O((n/B) log_{M/B}(n/B)).

For many algorithm and data structure problems,


it is very interesting to ask whether or not we can
develop cache-oblivious solutions!
E.g., priority queues, dynamic search trees, etc.
7

Cache-Oblivious Searching
Consider the problem of searching a sorted
array in the comparison model.
Optimal cache-aware running time: O(log_B n)
using a B-tree.
Binary search runs in O(log(n/B)) = O(log n − log B)
time, so it is not an optimal cache-oblivious solution!
Can we store a sorted array in memory so that
searching can be done in a cache-oblivious
fashion, using only O(log_B n) block transfers?
(remember, we don't know B)
8

Cache-Oblivious Searching
with the vEB Tree Layout

Build a complete BST on top of our array A[1..n].
Recursively decompose according to the vEB layout: a top tree T of height
½ log n (with √n leaves), below which hang the bottom trees L1, …, L_√n,
each also of height ½ log n (with √n leaves), sitting above A[1..n].
In memory, store T followed by L1, …, L_√n (each
recursively subdivided in the same fashion).
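A small Python sketch of this recursive layout, producing the memory order of the nodes of a complete BST stored heap-style (1-indexed); the exact top/bottom height split below is my own choice, and any constant-factor split gives the same asymptotics:

def veb_layout(n_levels):
    # Return the heap positions of a complete BST with n_levels levels,
    # listed in van Emde Boas (recursive) memory order.
    order = []
    def rec(root, levels):
        if levels == 1:
            order.append(root)
            return
        top = levels // 2            # height of the top recursive piece
        bottom = levels - top
        rec(root, top)               # lay out the top tree first...
        for i in range(2 ** top):    # ...then each bottom tree, left to right
            rec(root * (2 ** top) + i, bottom)
    rec(1, n_levels)
    return order

print(veb_layout(4))
# [1, 2, 3, 4, 8, 9, 5, 10, 11, 6, 12, 13, 7, 14, 15]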

Cache-Oblivious Searching
with the vEB Tree Layout

Recursion effectively stops once we reach
subtrees with ≈ B leaves, since each of these fits
into O(1) blocks.
Such subtrees have height Θ(log B).
So a root-to-leaf path passes through only
Θ(log n / log B) = Θ(log_B n) of them
(and therefore we hit Θ(log_B n) blocks).
[Figure: the recursive vEB decomposition, with subtree heights ½ log n, above A[1..n].]
10

Lecture 31. Data Structures


and Computational Geometry

CpSc 838: Advanced


Data Structures
Clemson University
Fall 2006

Introduction
Many data structure problems are motivated by
applications in computational geometry.
In the next 3 lectures, we will investigate several
elegant geometric data structures for a wide
variety of different problems.
Standard computational geometry caveats:
Watch out for special cases!
Euclidean distances in 2+ dimensions can be irrational,
so we must always be careful about potential round-off
errors (in theory, we often use the real RAM model for
computational geometry algorithms for simplicity).
2

A Common Technique : Plane Sweep


(a.k.a. Sweep Line) Algorithms
Input: n line segments in
the 2d plane (assume
axially-aligned for simplicity).
Goal: Count total # of
intersections.
The obvious algorithm (check all pairs of segments
for intersection) runs in O(n²) time.
Using the plane sweep technique, we'll be able
to count intersections in O(n log n) time and write
down all k intersections in O(k + n log n) time.
3

Counting Intersections with a


Sweep Line
Initially sort all endpoints by x coordinate.
Sweep a vertical line from left to right, stopping
at each endpoint.
Maintain a balanced BST, T, with the set of y coordinates
of the horizontal segments currently
intersecting the sweep line.
When we encounter the left/right
endpoint of a horizontal segment,
perform an insert/delete in T.
When we encounter a vertical
segment, do a range query in T.
[Figure: horizontal segments at y = 7, 5, 3, 1 crossing the sweep line;
T currently contains {1, 3, 5, 7}.]
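A Python sketch of this sweep (my own event encoding; a sorted Python list stands in for the balanced BST T, so inserts/deletes cost O(n) here, whereas a real BST would make every operation O(log n) as analyzed on the next slide; ties in x are broken so inserts precede queries precede deletes):

from bisect import bisect_left, bisect_right, insort

def count_intersections(horizontal, vertical):
    # horizontal: list of (y, x1, x2) with x1 <= x2
    # vertical:   list of (x, y1, y2) with y1 <= y2
    events = []
    for y, x1, x2 in horizontal:
        events.append((x1, 0, ('insert', y)))
        events.append((x2, 2, ('delete', y)))
    for x, y1, y2 in vertical:
        events.append((x, 1, ('query', y1, y2)))
    events.sort()

    active, total = [], 0            # sorted list of active y coordinates
    for _, _, payload in events:
        if payload[0] == 'insert':
            insort(active, payload[1])
        elif payload[0] == 'delete':
            active.pop(bisect_left(active, payload[1]))
        else:                        # range query: # of active y's in [y1, y2]
            _, y1, y2 = payload
            total += bisect_right(active, y2) - bisect_left(active, y1)
    return total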

Counting Intersections with a


Sweep Line : Running Time
Initial sort: O(n log n).
At most n inserts and n
deletions in T (one for each
horizontal segment).
At most n range queries.
Total time: O(n log n).


More Difficult Example : Area of


Union of Rectangles
Input: n rectangles in the plane.
Goal: compute area of their union.
Initially sort all 2n interesting
x-coordinates and sweep a line
from left to right, stopping at each one.
Using a balanced BST, T, maintain
cross-sectional configuration
currently intersected by sweep line.
Must be able to update T in O(log n) time when we reach the
leading or trailing vertical edge of a rectangle.
Must be able to query T in O(log n) time to obtain the total
current cross-sectional area intersected by sweep line.
What kind of augmented BST should we use for this task?
6

The Segment Tree

[Figure: segments on the number line; each elementary interval is labeled with
its overlap count (0, 1, or 2).]

A segment tree encodes, in its left-to-right (inorder)


ordering, the set of O(n) interesting intervals on the
number line defined by a collection of n line segments.

The Segment Tree : Operations


Each node corresponds to an interval.
Inserting a segment:
Splits two intervals, and so requires two insertions.
Requires a range update to increment the overlap counts of all
intervals in the extent of the segment.

Deleting a segment:
Merges two pairs of intervals, and so requires two deletions.
Requires a range update to decrement overlap counts.

Query for area of current footprint:


Augment tree so each node maintains
the footprint area of its subtree.

Query for # of segments


containing a given point
8

The Interval Tree


Another useful way to maintain a set of n static
intervals [a1, b1], …, [an, bn] is an interval tree.
It can answer tell me all intervals containing the
point x queries in O(k + log n) time.
Divide the intervals into 3 sets around the median x coordinate x_mid:
M: intervals crossing x_mid.
L: intervals with right endpoint < x_mid.
R: intervals with left endpoint > x_mid.
[Figure: intervals on the number line partitioned into L, M, R around x_mid.]

The Interval Tree


Root node stores two copies of intervals in M
One copy sorted by increasing left endpoint.
Other copy sorted by increasing right endpoint.

Left subtree recursively stores L.


Right subtree recursively stores R.
Total space: O(n)
Total height: O(log n)

10

The Interval Tree : Queries


Goal: find all intervals containing a query point x.
First, find all intervals in M containing x:
If x ≤ x_mid, scan the left-endpoint-sorted ordering, stopping at the first
interval whose left endpoint exceeds x.
If x > x_mid, scan the right-endpoint-sorted ordering, stopping at the first
interval whose right endpoint is less than x.
Next, recurse on L (if x < x_mid) or R (if x > x_mid).
Total query time: O(k + log n).
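A Python sketch of an interval tree and its stabbing query (class and function names are mine; balance follows from choosing the median endpoint at each node):

class IntervalTreeNode:
    def __init__(self, intervals):
        xs = sorted(x for iv in intervals for x in iv)
        self.xmid = xs[len(xs) // 2]                       # median endpoint
        mid = [iv for iv in intervals if iv[0] <= self.xmid <= iv[1]]
        left = [iv for iv in intervals if iv[1] < self.xmid]
        right = [iv for iv in intervals if iv[0] > self.xmid]
        self.by_left = sorted(mid, key=lambda iv: iv[0])   # M by left endpoint
        self.by_right = sorted(mid, key=lambda iv: iv[1])  # M by right endpoint
        self.left = IntervalTreeNode(left) if left else None
        self.right = IntervalTreeNode(right) if right else None

def stab(node, x, out):
    # Report all intervals containing x: O(k + log n) with balanced medians.
    if node is None:
        return
    if x <= node.xmid:
        for iv in node.by_left:          # scan by increasing left endpoint
            if iv[0] > x:
                break
            out.append(iv)
        stab(node.left, x, out)
    else:
        for iv in reversed(node.by_right):   # scan by decreasing right endpoint
            if iv[1] < x:
                break
            out.append(iv)
        stab(node.right, x, out)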
11

Binary Space Partition (BSP) Trees


Input: Static scene comprised of geometric
objects in 2 or 3 dimensions.
Goal: Preprocess scene so that we can quickly
determine, from any viewing location, the correct
back-to-front order in which to render objects so
that closer objects occlude objects further away.
[Figure: a scene with three objects, the viewer's location and orientation, and
the back-to-front order (drawn 1st, 2nd, 3rd) as seen by the viewer.]

12

Binary Space Partition (BSP) Trees


Choose any line (in 3D, a plane) through the
scene. Make this the root of the tree.
Slice objects (if necessary) so they appear on one
side or the other of root line, but not both.

Left subtree recursively built from the contents of the +
side of the root line; right subtree recursively built
from the contents of the − side.

13

The Painter's Algorithm


To render a scene, compare the position of the viewer
against the line (plane) corresponding to the root.
If viewer on + side of root line:
Render left subtree, then root, then right subtree.

If the viewer is on the − side of the root line:
Render right subtree, then root, then left subtree.

Remaining question: how to select the best line (plane)


for the root during tree construction?
Want to avoid slicing too many objects,
since this increases space required.
Choosing randomly usually works well.

14

Lecture 32. Multidimensional


Range Queries

CpSc 838: Advanced


Data Structures
Clemson University
Fall 2006

1D Range Queries : Recap


Input: An array A[1..n] of points in 1 dimension.
Common Problems:
1. Find the sum or min/max of A[i..j].
2. Tell me all the points in the range [a, b].
3. Count the number of points in the range [a, b].
4. Find the sum/average/etc. of the points in [a, b].
In the dynamic case, all of these require O(log n) query
time and O(log n) update time using a balanced BST.
Note: O(k + log n) query time for #2.
Range updates are also possible in O(log n) time.
For the static case, we can answer #1 in O(1) time and
the rest in O(log n) time.
What is the best static data structure for #2-#4? A sorted array!
2

Multidimensional Range Queries


We can think of the records in a database as
points in a high-dimensional space:
Age

Household income

Example of a multidimensional range query:


Tell me all records with age in the range [18, 24] and household
income in the range [$50,000, $80,000].

Today, we'll focus on static multidimensional
range queries (usually 2D); no range updates.

The Quadtree
The root node splits the plane into 4 quadrants at some
point (usually (x_mid, y_mid)).
Keep subdividing regions until each contains at
most a single point.
Preprocessing time: O(n log n)
Space: O(n)
Height: O(log n)
Generalizes naturally to d = 3 (octrees) and higher dimensions.
[Figure: a point set subdivided into quadrants at (x_mid, y_mid), and the
corresponding quadtree.]

The Quadtree : Range Queries


To perform a range query, recursively traverse
the parts of the quadtree intersecting the query rectangle.
[Figure: a query rectangle overlaid on the quadtree subdivision.]

In practice, this usually runs reasonably quickly.


In theory, however, worst-case performance is
quite bad
5

The Quadtree : Range Queries


Bad example of a quadtree query:

The query essentially traverses the entire quadtree,
but returns no points. Running time: Θ(n).

The kd-Tree
The first split (at the root) is in the x direction, the next level
splits on y, and so on.
In d > 2 dimensions, we cycle through splits along
each dimension as we move down the tree.
O(n log n) preprocessing time
O(n) space
O(log n) height
Worst-case query time:
d = 2: O(k + √n)
In general: O(k + n^{1−1/d}).
[Figure: a point set split first at x_mid, then by y within each side, and the
corresponding kd-tree.]
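A Python sketch of a 2D kd-tree with points stored in leaf regions, as described above (the dictionary-based node representation and helper names are mine):

def build_kd(points, depth=0):
    if len(points) == 1:
        return {'point': points[0]}
    axis = depth % 2                      # alternate x (0) and y (1) splits
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        'axis': axis,
        'split': points[mid - 1][axis],   # left half has coordinates <= split
        'left': build_kd(points[:mid], depth + 1),
        'right': build_kd(points[mid:], depth + 1),
    }

def range_query(node, lo, hi, out):
    # Report points p with lo[d] <= p[d] <= hi[d] for d = 0, 1.
    if 'point' in node:
        p = node['point']
        if all(lo[d] <= p[d] <= hi[d] for d in range(2)):
            out.append(p)
        return
    axis, split = node['axis'], node['split']
    if lo[axis] <= split:                 # left region may intersect the query
        range_query(node['left'], lo, hi, out)
    if hi[axis] >= split:                 # right region may intersect the query
        range_query(node['right'], lo, hi, out)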

The kd-Tree : Analysis (2D)


Since internal nodes are all branching, worst-case query
time = O(# leaves in kd-tree visited during a query).
Leaves in kd-tree correspond to rectangular regions of the
plane, each containing a single point:
Call a leaf region good if its single point is inside the
query rectangle, and bad otherwise.
There are ≤ k good regions; visiting these takes only O(k) time.
Claim: at most O(√n) bad regions.
So visiting them all takes O(√n) time, and our total query time is
therefore O(k + √n).
8

The kd-Tree : Analysis (2D)


Claim: At most O(√n) bad regions.
Note that each bad region straddles the boundary of the query rectangle:
[Figure: a query rectangle with a bad leaf region straddling one of its sides.]
Simpler Claim: At most O(√n) bad regions straddle each
of the 4 sides of our query rectangle.
Even Simpler Claim: At most O(√n) leaf regions (good or
bad) straddle any vertical or horizontal line.
This implies the simpler claim, which implies the original claim.
9

The kd-Tree : Analysis (2D)


Even Simpler Claim: At most O(√n) leaf regions (good or
bad) straddle any vertical or horizontal line.
Consider a vertical line L (the same argument works for horizontal).
Traverse the kd-tree to find all leaf regions intersecting L:
At each x split, we recursively visit only the left subtree or only the right
subtree, since L lies on one side of the vertical splitting line.
At each y split, we may need to traverse both the left and right subtrees.
So for every 2 levels of depth, the footprint of our traversal at
most doubles.
If h ≤ log₂ n denotes the total height of the kd-tree, the maximum
number of leaves we encounter at the bottom is at most
2^{h/2} ≤ 2^{(log₂ n)/2} = n^{1/2} = √n.
(In d > 2 dimensions, this argument easily generalizes to yield n^{1−1/d}.)

10

The kd-Tree : Final Remarks


The query time bound of O(k + √n) is tight, due to the
same bad example as for the quadtree.
Despite their poor worst-case performance guarantees,
kd-trees and quadtrees/octrees/etc. are used frequently in
practice and often perform reasonably well.
They're also useful for a variety of other geometric
tasks; e.g., image compression, nearby-point queries,
viewing the contents of a small window moving through a
large geometric scene, etc.

11

The Range Tree (2D)


Step 1: Sort all n points (x1, y1), …, (xn, yn) by
x coordinate and build a complete balanced
binary tree on top of this ordering, with interior nodes
augmented with the x ranges of their subtrees.
Height = log₂ n.
[Figure: the points (1,7), (3,2), (4,9), (6,1), (7,0), (10,4), (12,5), (15,6) at the
leaves, under interior nodes labeled with x ranges 1..3, 4..6, 7..10, 12..15,
1..6, 7..15, and 1..15 at the root.]


12

The Range Tree (2D)

Height = log2 n

Recall that we can answer a range query in x with
a collection of ≤ 2 log₂ n subtrees.
[Figure: an x range query highlighting the subtrees it decomposes into.]
13

The Range Tree (2D)


Step 2: Augment each internal node with an
array of all the points in its subtree, sorted by y.
[Figure: the same tree, with each node's y-sorted array shown; e.g., the root
stores (7,0), (6,1), (3,2), (10,4), (12,5), (15,6), (1,7), (4,9).]
Total preprocessing time and space: O(n log n)
(since each point appears in only log₂ n arrays).

14

The Range Tree : Answering Queries


To find all points in [x1, x2] × [y1, y2], first do a range query
in the top-level tree based on x, obtaining all points with
x coordinate in [x1, x2] as a union of ≤ 2 log₂ n subtrees.
At the root of each of these subtrees,
query the augmented y array over the y range [y1, y2].
15

The Range Tree : Query Performance


Each 2D range query results in our performing
≤ 2 log₂ n individual 1D range queries, each with
O(log n) overhead (for a binary search).
Total query time: O(k + log² n).
In d dimensions: O(k + log^d n).
(In a d-dimensional range tree, our top-level tree is sorted
by the 1st coordinate, and each internal node is augmented
with a (d−1)-dimensional range tree built using the
other coordinates.)
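A Python sketch of a 2D range tree that counts the points in a query rectangle (the dictionary node layout and helper names are mine; reporting instead of counting would simply walk the matching y ranges):

import bisect

def build_range_tree(points):
    # points must be sorted by x; each node covers a contiguous x-range
    # and stores the y coordinates of its points in sorted order.
    node = {
        'xlo': points[0][0], 'xhi': points[-1][0],
        'ys': sorted(p[1] for p in points),
    }
    if len(points) > 1:
        mid = len(points) // 2
        node['left'] = build_range_tree(points[:mid])
        node['right'] = build_range_tree(points[mid:])
    return node

def count_in_range(node, x1, x2, y1, y2):
    # Count points with x in [x1, x2] and y in [y1, y2]: O(log^2 n) time.
    if node['xhi'] < x1 or node['xlo'] > x2:
        return 0                                    # disjoint in x
    if x1 <= node['xlo'] and node['xhi'] <= x2:     # canonical subtree
        ys = node['ys']
        return bisect.bisect_right(ys, y2) - bisect.bisect_left(ys, y1)
    return (count_in_range(node['left'], x1, x2, y1, y2) +
            count_in_range(node['right'], x1, x2, y1, y2))

pts = [(1,7), (3,2), (4,9), (6,1), (7,0), (10,4), (12,5), (15,6)]
tree = build_range_tree(pts)
print(count_in_range(tree, 3, 10, 0, 4))   # 4 points: (3,2), (6,1), (7,0), (10,4)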

16

Fractional Cascading
Fractional cascading can remove one log factor
from the query time: O(k + log^{d−1} n).
[Figure: the same range tree, where each node's y-sorted array keeps pointers
into its children's y-sorted arrays.]
Extra pointers allow us to avoid binary searching
in all but the top-level y-sorted array.
17

Lecture 33. Point Location

CpSc 838: Advanced


Data Structures
Clemson University
Fall 2006

Point Location : Problem Statement


Preprocess a planar map (consisting of polygonal cells)
so that we can quickly determine the cell to which a query
point belongs.

Goal: O(log n) query time, where n denotes the number of


vertices in the scene.
2

Application : The Post Office Problem


(Nearest Neighbor Queries)
Goal: Preprocess n points in the plane so that we
can quickly determine the closest neighbor to any
given query point.
We can reduce this to a point location problem on
the Voronoi diagram of our point set:

Note: We can compute the Voronoi diagram of a
point set in O(n log n) time using a complicated plane
sweep approach.

Trapezoidal Decompositions
A popular approach for point location is to construct
(using an elegant randomized approach) a trapezoidal
decomposition (also called a vertical decomposition)
and an associated tree for searching it.

Results:
O(n) expected space.
O(n log n) expected construction time.
O(log n) expected query time.
4

Successive Triangulations
Instead of trapezoidal decompositions, we'll discuss an
elegant approach due to Kirkpatrick using successive
triangulations.
Note that an n-vertex polygon can be triangulated using
2n − 3 edges (in fact, this can be done in O(n) time,
although it's somewhat complicated; O(n log n) is easier).
So let's assume our scene consists only of triangles.
(The outer boundary is usually taken to be a triangle; we use
a square here since it's easier to picture.)
5

Euler's Formula
[Euler] For any connected planar graph: F + V − E = 2, where
F = # of faces (including the external, or outer, face)
E = # of edges
V = # of vertices
As a corollary, since each face is bounded by at least 3 edges,
we obtain E ≤ 3V − 6.
The degree of a vertex is the number of edges incident to
it (i.e., the # of neighboring vertices).
Let's call a vertex low degree if
it has fewer than 12 neighbors.
Claim: every n-vertex planar graph
has ≥ n/2 low-degree vertices.
(Otherwise, E = ½ Σ_v degree(v) ≥ ½ · 12 · (V/2) = 3V > 3V − 6.)

Low-Degree Vertices
A vertex v has low degree if degree(v) < 12.
An n-vertex planar graph has ≥ n/2 low-degree vertices.
Claim: In an n-node planar graph, one can always find a
set of ≥ n/24 independent (i.e., mutually non-adjacent)
low-degree vertices.
Pick any low-degree vertex, discard it and its ≤ 11 neighbors, and repeat.
As the first step in building our data
structure, find an independent set L
of low-degree vertices in the interior
of the planar map.
(note that |L| ≥ n/24 − 4)
[Figure: the triangulated map with the vertices of L highlighted.]

Re-Triangulation
Remove the vertices in L and re-triangulate each of the
resulting empty star-shaped polygonal regions:

Each newly created triangle intersects at most 11
triangles from the old triangulation (those that were destroyed
when L was removed).
Further, our new triangulation uses ≤ (23/24)n + 4 vertices.
8

Successive Re-Triangulation
Since each re-triangulation reduces the # of vertices in
our planar map by a constant fraction, after O(log n)
successive re-triangulations the map reduces to:

Now, to locate a query point p:


Find triangle T1 containing p in the simple map above in O(1) time.
Now look at the next-finer level of triangulation. T1 intersects at
most 11 triangles here, so check them all in O(1) to find the
triangle T2 containing p.
Continue in this fashion through all O(log n) levels, after which we
will have found the triangle in our original map containing p.

Total query time: O(log n)


9

Lecture 34. Dynamic Connectivity

CpSc 838: Advanced


Data Structures
Clemson University
Fall 2006

Review : Data Structures and Graphs


We have already developed many data structures
that have useful applications in graphs and graph
algorithms. For example,
Disjoint sets : Kruskal's minimum spanning tree algorithm
Fibonacci heaps : Dijkstra's shortest path algorithm
Cartesian trees and lowest common ancestors : evolutionary
trees, verifying MSTs, encoding static graph connectivity

Also, recall that by storing the adjacency list of


each node in a universal hash table, we can:
answer adjacent?(x, y) queries in O(1) expected time,
insert and delete edges in O(1) time, and
enumerate the k neighbors of a node in O(1 + k) time.
2

Connectivity Data Structures


Goal: Encode a graph so we can answer
connectivity queries:
connected(x,y) : are x and y in the same connected
component?

Problem Variants:
In the static case, we only answer queries (after first
preprocessing the graph).
In the incremental case, we must support edge
insertions as well as queries.
In the decremental case, we must support edge
deletions as well as queries (preprocessing allowed).
In the fully dynamic case, we must support edge
insertions and deletions, as well as queries.
3

Easy Cases
Static case: O(1) query time easy to achieve.
Label the nodes in each component with integer IDs as
a preprocessing step (say, using DFS or BFS).

Incremental case: O(α(n)) amortized time for
both insert and connected operations.
Use our fancy tree-based disjoint set data structure
that employs the union by rank and path
compression heuristics.
Decremental case in a forest: O(q + n log n) total time
for any sequence of operations starting with an
n-node tree (q of them being queries).
Note that the # of delete operations is ≤ n − 1.

Decremental Connectivity in a Forest


Similar idea to our list based disjoint set data structures:
Label each node with an integer ID identifying the component
(tree) to which it belongs.
x and y are connected if ID(x) = ID(y). O(1) time / query.
When we delete edge (x, y), this splits a tree into two pieces.
Relabel the smaller one (say, containing k nodes) in O(k) time.
[Figure: deleting edge (x, y) splits a tree labeled 1 into two pieces; the smaller
piece is relabeled 2.]
Each node is relabeled ≤ log₂ n times, since each time a node is
relabeled, its tree has shrunk to at most half its previous size.
Implementation detail: how to relabel the smaller half?
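One standard answer, sketched in Python (names are mine): advance two traversals, one from each endpoint, in lockstep; whichever side runs out of vertices first is (within one step) the smaller side, so only O(size of the smaller side) work is spent before relabeling it.

def delete_and_relabel(adj, label, x, y, fresh_label):
    # adj: dict mapping each node to a set of neighbors; label: dict of component IDs.
    adj[x].discard(y)
    adj[y].discard(x)
    sides = [{'stack': [x], 'seen': {x}}, {'stack': [y], 'seen': {y}}]
    while True:
        for side in sides:
            if not side['stack']:            # this side finished first: the smaller one
                for v in side['seen']:
                    label[v] = fresh_label
                return
            v = side['stack'].pop()
            for w in adj[v]:
                if w not in side['seen']:
                    side['seen'].add(w)
                    side['stack'].append(w)

Since each relabeled node's tree has at most halved, each node pays for O(log n) relabelings over the whole sequence, matching the O(q + n log n) bound above.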

Euler-Tour (ET) Trees :


Fully Dynamic Connectivity in a Forest
Consider an n-node tree T with root r (chosen arbitrarily).
Perform an Euler tour around T to obtain a
sequence of length 2n − 1.
Encode this sequence in a balanced BST (say, a
splay tree).
[Example: the tree T rooted at a, with children b, c, d, and with e, f the
children of d, yields the Euler-tour sequence abacadedfda.]
Augment each tree node with
a pointer to its first and last
occurrence in the sequence.
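A tiny Python sketch of producing this Euler-tour sequence (the adjacency lists below reconstruct the example tree from its tour and are illustrative):

def euler_tour(adj, root):
    # Visit each node, and list a subtree's root again after each of its children.
    seq = []
    def visit(v, parent):
        seq.append(v)
        for w in adj[v]:
            if w != parent:
                visit(w, v)
                seq.append(v)
    visit(root, None)
    return seq

adj = {'a': ['b', 'c', 'd'], 'b': ['a'], 'c': ['a'],
       'd': ['a', 'e', 'f'], 'e': ['d'], 'f': ['d']}
print(''.join(euler_tour(adj, 'a')))   # abacadedfda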
6

Euler-Tour (ET) Trees :


Fully Dynamic Connectivity in a Forest
To answer connected(x, y) queries, check whether x and
y are in the same ET tree.
(Splay x, then splay y, and see whether x was displaced from the root.)
Insertion and deletion of edges each require O(1) splits
and joins in the ET trees.
Example (deleting the tree edge between a and d):
Initial ET tree contents: abacadedfdaga
2 splits: abaca | dedfd | aga
1 join: abacaga (an a is deleted in the process); dedfd becomes its own ET tree.
7

Fully Dynamic Connectivity


ET trees give us a fully-dynamic connectivity data
structure for the special case of a forest
All operations take O(log n) amortized time.

Next, we'll use ET trees to obtain a fully dynamic
structure for general graphs with performance:
connected: O(log n) amortized time.
insert / delete: O(log² n) amortized time.

Lecture 35. Fully Dynamic Connectivity


in a General Graph

CpSc 838: Advanced


Data Structures
Clemson University
Fall 2006

Fully Dynamic Connectivity


Maintain an n-node graph G in a data structure
supporting the following operations:
insert(x, y) : insert a new edge
delete(x, y) : delete an existing edge
connected(x, y) : determine whether x and y belong to the
same connected component (i.e., whether x and y are
reachable from each other).
Today's goal:
insert / delete : O(log² n) amortized time
connected : O(log n) amortized time
2

Review : Euler-Tour (ET) Trees


Store the Euler traversal of a tree in a splay tree.
Supports the following in O(log n) amortized time:
Link two trees (via a newly inserted edge).
Cut a tree into two trees (by deleting an edge).
Determine whether x and y lie in the same tree.
Example (deleting the tree edge between a and d):
Initial ET tree contents: abacadedfdaga
2 splits: abaca | dedfd | aga
1 join: abacaga (an a is deleted in the process); dedfd becomes its own ET tree.
3

Review : Maximum Spanning Trees


[Figure: a weighted graph with its maximum spanning tree highlighted.]
Cut optimality conditions : A spanning tree T


is a maximum spanning tree if and only if for
every cut (a partition of its nodes into two sets), T
contains a maximum-value edge crossing the cut.
(for the minimum spanning tree problem, simply replace maximum with minimum)

Restoring a Maximum Spanning Tree


After Edge Deletion
Let T be a maximum spanning tree of G.
If we delete a non-tree edge from G, T remains a
maximum-spanning tree.
If we delete a tree edge (x, y), this splits T into two smaller
trees T1 and T2:
[Figure: deleting a tree edge splits T into T1 and T2; several non-tree edges
cross between them, and the maximum-value one is used to reconnect.]

To restore a maximum spanning tree, reconnect T1 and


T2 with a maximum-value edge running between them

Restoring a Maximum Spanning Tree


After Edge Deletion
Suppose all edge values are nonnegative integers.
Delete tree edge (x, y) of value v, splitting T into T1 and T2
(let T1 be the smaller of the two).
To restore the maximum spanning tree:
For k = v, v − 1, v − 2, …, 0:
Search T1 for a non-tree edge of value k connecting to T2. If
found, use it to reconnect T1 and T2 and stop.
Note that all of the
unsuccessful edges
we search will connect
two nodes within T1.
[Figure: T1 and T2 after deleting (x, y), with the candidate non-tree edges
examined in decreasing order of value.]
Maximum Spanning Forests


All of our earlier discussion easily generalizes to a
maximum spanning forest (a maximum spanning tree on
each connected component of G).
[Figure: a graph with several components and its maximum spanning forest.]
To delete a tree edge (x, y) of value v from T, we apply
the same procedure as on the previous slide to try and
restore the maximum spanning tree on T's component.
To insert a new edge (x, y) of value 0, link two trees if x
and y belong to separate components.

Back to Dynamic Connectivity


Assign each edge e in G a level l(e) in the range 0 … L = log₂ n.
We maintain a maximum spanning forest F0 on G with respect to
these edge levels.
Each maximum spanning tree in F0 is stored as an ET tree.
[Figure: a graph whose edges are labeled with levels; the maximum spanning
forest with respect to levels is highlighted.]
To insert a new edge e = (x, y), set l(e) = 0. If e joins two previously
disconnected trees, link them together.
8

A Hierarchy of Forests
F0 is a maximum spanning forest of our graph.
F1 is the subforest of F0 induced by all edges of level ≥ 1.
Similarly, we define F2, F3, …, FL.
We store every tree at every level in an ET tree.
Each non-tree edge is stored inside only one of these ET
trees: the one containing it at level l(e).
[Figure: F0 with its edge levels, and the induced forest F1.]

Insertion and Deletion


O(n log n) storage space is required to store F0, …, FL.
Insertion of an edge only interacts with F0, since newly
inserted edges e are assigned level l(e) = 0.
So the immediate cost of insertion is O(log n) amortized, to link
together two of the ET trees in F0.
Deletion of a non-tree edge e = (x, y) takes O(log n) time:
Delete e from the single ET tree in F_l(e) storing e
(i.e., look up x and y in this tree, and remove e from the adjacency
lists of x and y).
Tricky case: deletion of a tree edge.
[Figure: a graph with edge levels, highlighting a tree edge about to be deleted.]
10

Deletion of a Tree Edge e = (x, y)


(High-Level Idea)
[Figure: deleting tree edge (x, y) splits its tree into Tx and Ty; several non-tree
edges are incident to Tx.]
Note that edge e is stored in the ET trees T0, …, T_l(e) in F0, …, F_l(e).
Split each of these trees into two trees by removing (x, y).
For k = l(e), l(e) − 1, l(e) − 2, …, 0:
Consider the two ET trees Tx and Ty into which Tk was split.
Assume without loss of generality that Tx is the smaller of Tx and Ty (we
know which is smaller since we can augment each ET tree with its size).
For all non-tree edges e′ incident to Tx (they are all of level l(e′) = k):
If e′ connects to Ty, use it to re-link Tx and Ty (at this level, k, and also at
all smaller levels k − 1, k − 2, …, 0), and stop.

11

Deletion of a Tree Edge e = (x, y)


(Full Detail)
[Figure: as on the previous slide.]
Note that edge e is stored in the ET trees T0, …, T_l(e) in F0, …, F_l(e).
Split each of these trees into two trees by removing (x, y).
For k = l(e), l(e) − 1, l(e) − 2, …, 0:
Consider the two ET trees Tx and Ty into which Tk was split.
Assume without loss of generality that Tx is the smaller of Tx and Ty (we know which is
smaller since we can augment each ET tree with its size).
For all tree edges e′ in Tx with l(e′) = k: increment l(e′) to k + 1.
(This links two trees together at level k + 1.)
Repeatedly query Tx for an incident non-tree edge e′ (it will be of level l(e′) = k):
If e′ connects to Ty, use it to re-link Tx and Ty (at this level, k, and also at all
smaller levels k − 1, k − 2, …, 0), and stop.
Else, increment l(e′) to k + 1 (and accordingly move e′ so it is now stored at level k + 1).
12

Deletion of a Tree Edge e = (x, y)


(Analysis)
We maintain two important invariants:
1. F0 is a maximum spanning forest of our graph (with respect to
the l(e)'s).
2. Each tree comprising Fk contains at most n / 2^k nodes.
(This implies that the maximum edge level is L = log₂ n.)
Proof sketches:
1 is maintained: Incrementing l(e) for a tree edge doesn't cause
trouble. And we only increment l(e) for a non-tree edge e = (x, y)
from k to k + 1 if x and y are already connected by a path at level
k + 1 (so we satisfy the MST cycle optimality conditions).
2 is maintained: When we increment the tree edges in Tx from level
k to level k + 1, we might end up with a tree of size |Tx| on level
k + 1. However, since Tx is at most half the size of a tree from level k, which
by induction has size ≤ n / 2^k, the resulting level-(k + 1) tree will
have size ≤ n / 2^{k+1}.

13

Deletion of a Tree Edge e = (x, y)


(Analysis)
We only spend O(log n) amortized time on each edge
per level, so we spend O(log² n) total time per edge.
Charge this up front: O(log² n) amortized time for insert.
Not counting the work done to edges that we pay for by
incrementing edge levels, delete takes O(log² n) time.
For connected(x, y), query the ET trees containing x and y
in F0 to see whether they have the same root: O(log n) time.
(This can actually be improved to O(log n / log log n); details omitted.)

Generalizations of this data structure (due to Holm, de


Lichtenberg, and Thorup) can also maintain the minimum
or maximum spanning tree of a fully dynamic graph.
14
