
Parallel and Distributed Algorithms

Lecturer: Dr. Kathleen Steinhöfel


Tutor: Leonidas Kapsokalivas
Lectures: Tuesdays 13:00 and 14:00
Tutorial: Tuesdays 15:00
Web Page:
www.dcs.kcl.ac.uk/teaching/units/material/7ccspda/
Literature
• J. JáJá, Introduction to Parallel Algorithms,
Addison-Wesley, 1992.
• F.T. Leighton, Introduction to Parallel Algorithms and
Architectures: Arrays, Trees and Hypercubes, Morgan
Kaufmann, 1991.
• V. Kumar et al., Introduction to Parallel Computing,
Benjamin/Cummings, 2nd edition, 2003.
Why do we study parallel algorithms?
• The speed of serial computers cannot increase forever.
• The price per megaflop rises sharply for serial computers
above a certain performance level.
• Finding better solutions faster is needed for ever larger
problems, e.g., weather forecasting, image processing, ...
• Fast and cheap PC clusters have overtaken specially
designed parallel supercomputers.
• The human brain is massively parallel and for some tasks
still outperforms current technology.
Outline
• Part I:
Introduction: Parallel Models, Performance of Parallel
Algorithms, Communication Complexity.
• Part II:
Basic Techniques: Trees, Pointer Jumping, Divide and
Conquer, Partitioning, Pipelining.
• Part III:
Searching, Merging, Sorting; Graph Algorithms; String
Matching and Pattern Analysis.
• If time permits: Selected Arithmetic Computations.
Prerequisites
You should have a good understanding of elementary data
structures and of basic techniques for designing and analysing
sequential algorithms (e.g., calculating their run-time
complexity). There are many references, one is:
Cormen et al., Introduction to Algorithms,
2nd edition, MIT Press, 2001.
Introduction
The bounds on the resources (e.g., time and space) required
by a sequential algorithm are measured as a function of the
input size.
• Worst-case analysis of algorithms: the maximum amount
of a resource required by any instance of size n.
• Bounds are expressed asymptotically using the
following standard notation:
– T(n) = O(f(n))
There exist positive constants c and n_0 such that
T(n) ≤ c·f(n) for all n ≥ n_0.
– T(n) = Ω(f(n))
There exist positive constants c and n_0 such that
T(n) ≥ c·f(n) for all n ≥ n_0.
– T(n) = Θ(f(n))
Both T(n) = O(f(n)) and T(n) = Ω(f(n)) hold.
• The running time of a sequential algorithm is
estimated by the number of basic operations required.
• Uniform cost criterion: one unit of time is charged for
reading from and writing to memory, and for basic
arithmetic and logic operations (adding, subtracting,
comparing, or multiplying two numbers, logical OR or AND of
two words). The cost does not depend on the word size.
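As a small illustration (a sketch in Python, not part of the original notes), the following code counts the basic operations of a simple summation under the uniform cost criterion; the count grows linearly with n, i.e., the running time is Θ(n).

    # Count basic operations under the uniform cost criterion:
    # each read, write, comparison and addition costs one unit,
    # regardless of the word size.
    def sum_with_cost(values):
        ops = 0
        total = 0              # one write
        ops += 1
        for v in values:       # one read and one comparison per iteration
            ops += 2
            total += v         # one addition and one write
            ops += 2
        return total, ops

    for n in (10, 100, 1000):
        _, ops = sum_with_cost(range(n))
        print(n, ops)          # ops = 4n + 1, i.e. T(n) = Θ(n)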
Speedup and Efficiency
Let P be a computational problem and n be its input size.
Definition: The speedup obtained by a parallel algorithm A
using p > 1 processors is

    S_p(n) = T(n) / T_p(n),

where T(n) denotes the best known sequential running time
for P and A solves P in T_p(n) steps.
Note that S_p(n) ≤ p. Therefore, we would like to design
algorithms that achieve a speedup close to p.
Definition: The efficiency obtained by a parallel algorithm
A using p processors is

    E_p(n) = T_1(n) / (p · T_p(n)),

where T_1(n) denotes the running time of A when p = 1.
Note that T_1(n) is not necessarily the same as T(n).
E_p(n) indicates the effective utilization of the p processors
relative to A. If T_1(n) = T(n), then E_p(n) = S_p(n)/p.
Clearly, E_p(n) ≤ 1; stated as a percentage, the efficiency is
at most 100%.
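The two definitions translate directly into code. Below is a minimal Python sketch (the function names are our own, not from the notes):

    def speedup(T_seq, T_p):
        """S_p(n) = T(n) / T_p(n), where T(n) is the best known sequential time."""
        return T_seq / T_p

    def efficiency(T_1, T_p, p):
        """E_p(n) = T_1(n) / (p * T_p(n)), where T_1(n) is the time of A on one processor."""
        return T_1 / (p * T_p)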
Example 1: Suppose the best known sequential algorithm
solves a problem P with input size n = 100 in 35000 steps. A
parallel algorithm A uses 60 processors to solve P in 2500
steps. What is the speedup obtained by A and how efficient is
A?
n = 100, T(100) = 35000, p = 60, T_60(100) = 2500,

    S_60(100) = 35000 / 2500 = 14,
    E_60(100) = 14 / 60 ≈ 0.23.
Is A a good algorithm?
Note that we assume T_1(n) = T(n).
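Plugging the numbers of Example 1 into the formulas (a quick check in Python; as stated above, we assume T_1(n) = T(n)):

    # Example 1: T(100) = 35000, T_60(100) = 2500, p = 60
    S = 35000 / 2500      # speedup S_60(100) = 14.0
    E = S / 60            # efficiency E_60(100) ≈ 0.233, i.e. roughly 23%
    print(S, E)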
What happens if we have more processors available?
Limitations on the Running Time
There exists a limiting bound on the running time, denoted
T_∞(n), beyond which the algorithm cannot run any faster, no
matter how many processors are used.
For any value of p:

    T_p(n) ≥ T_∞(n),

and hence

    E_p(n) = T_1(n) / (p · T_p(n)) ≤ T_1(n) / (p · T_∞(n)).

Therefore, the efficiency of an algorithm degrades quickly as p
grows beyond

    p ≥ T_1(n) / T_∞(n).
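A small numerical sketch of this bound (the values T_1(n) = 10000 and T_∞(n) = 100 are made up for illustration, so the threshold is T_1(n)/T_∞(n) = 100 processors):

    # Upper bound on the efficiency: E_p(n) <= T_1(n) / (p * T_inf(n)).
    T1, T_inf = 10_000, 100            # hypothetical values; threshold p = T1 / T_inf = 100
    for p in (10, 50, 100, 200, 1000):
        bound = T1 / (p * T_inf)
        print(p, min(bound, 1.0))      # the bound only bites once p exceeds the threshold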
Example 2: Suppose that for problem P the best known
sequential algorithm runs in O(n) steps, where n is the input
size. A parallel algorithm A uses p processors to solve P in
O(log n + n/p) steps. What is the speedup obtained by A?
    S_p(n) = O( n / (log n + n/p) )

If p ≤ n / log n, then log n ≤ n/p and therefore

    S_p(n) = O( n / (2n/p) ) = O(p).
Example 2 (cont.):
What happens if p = n?
    S_p(n) = O( n / (log n + 1) ) = O( n / log n )
This is certainly not good, especially as we need n processors!
Conclusion: Algorithm A has good speedup and efficiency
for up to n/ log n processors but not for more.
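A rough numerical check of Example 2 (a sketch only: the constants hidden in the O-notation are ignored and log is taken to base 2, so the numbers are merely indicative):

    import math

    # T_p(n) = log n + n/p (constants ignored); speedup relative to T(n) = n.
    n = 1_000_000
    for p in (16, 1024, n // round(math.log2(n)), n):
        Tp = math.log2(n) + n / p
        print(p, round(n / Tp))        # grows like p up to about n / log n, then flattens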
Our aim is to develop parallel algorithms that provably
achieve the best possible speedup. Therefore, our model of
parallel computation must allow the mathematical derivation
of an estimate of the running time T_p(n) and the
establishment of lower bounds on the best possible speedup
for a given problem.
Models of Parallel Computation
• The classic model for serial computation is the Turing
machine.
• A more realistic but still idealised model is the Random
Access Machine (RAM), which has been used
successfully to predict the performance of sequential
algorithms.
• Modelling parallel computation is considerably more
challenging even if we assume unlimited memory and
unit cost.
• The interconnection of many processors adds a new dimension.
• We need a suitable framework for presenting and analysing
parallel algorithms, one which is:
Simple enough to describe parallel algorithms easily, and to
analyse performance measures such as speed, communication
and memory utilization.
General enough that it does not rely on a particular class of
architectures, i.e., is as hardware-independent as possible.
Implementable enough that the parallel algorithms
developed for the model can easily be implemented on parallel
computers.
Accurate enough that the analysis performed captures the
actual performance of the algorithms on parallel computers.
Categorisation of Parallel Architectures
Flynn’s taxonomy, introduced in 1966:
SISD Single Instruction stream, Single Data stream.
This is a standard Von Neumann serial computer.
SIMD Single Instruction stream, Multiple Data stream.
Multiple processors, possibly using different data
streams, execute the same instruction synchronously at
each time step (or are switched off).
Examples: Illiac IV, MasPar MP-1, MasPar MP-2,
Thinking Machines CM-1.
MISD Multiple Instruction stream, Single Data stream.
Multiple processors, using the same data stream, execute
possibly different instructions. (This model is not commonly used.)
MIMD Multiple Instruction stream, Multiple Data stream.
Multiple processors, possibly using different data streams,
execute possibly different instructions. There is no central
control unit. The processors operate autonomously and,
usually, asynchronously. SPMD (Single Program,
Multiple Data) is often used to design MIMD software
(a toy SPMD sketch in Python is given below).
Examples: Beowulf cluster (e.g., a fast Ethernet network of
PCs running Linux), Myrinet, ASCI Red
See http://www.top500.org for current fastest computers.
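As a toy illustration of the SPMD style mentioned under MIMD, the following Python sketch (using the standard multiprocessing module; not from the notes) runs the same program on different parts of the data and combines the partial results:

    from multiprocessing import Pool

    def work(chunk):
        # The same program runs in every process, but on different data:
        # here, each worker computes a partial sum.
        return sum(chunk)

    if __name__ == "__main__":
        data = list(range(1_000_000))
        chunks = [data[i::4] for i in range(4)]   # split the data among 4 workers
        with Pool(processes=4) as pool:
            partial = pool.map(work, chunks)      # each worker executes the same code
        print(sum(partial))                       # combine the partial results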
Address-Space Organisation
• Message-passing
– Processors are connected via an interconnection network.
– Each processor has local memory that is accessible only by
that processor.
– Processors interact by passing messages via the network.
– Referred to as a distributed-memory or
private-memory architecture.
– MIMD message-passing computers are referred to as
multicomputers.
• Shared-Address-Space
– Hardware support for read and write access by all
processors to a shared address space.
– Processors interact by modifying data objects stored
in the shared-address space.
– MIMD shared-address-space computers are called
multiprocessors.
Shared-memory parallel computers are shared-address-space
computers whose shared memory is equally accessible to all
processors via an interconnection network.
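The two organisations can be caricatured in a few lines of Python (a sketch using the multiprocessing module, not from the notes): in the message-passing style a worker sees only its local data and communicates by sending explicit messages, whereas in the shared-address-space style workers update a common object directly.

    from multiprocessing import Process, Queue, Value

    # Message-passing style: the worker owns its data and sends a result message.
    def mp_worker(local_data, out_queue):
        out_queue.put(sum(local_data))      # explicit communication

    # Shared-address-space style: workers read and write a shared object directly.
    def sm_worker(local_data, shared_total):
        with shared_total.get_lock():       # synchronise access to the shared memory
            shared_total.value += sum(local_data)

    if __name__ == "__main__":
        q = Queue()
        p1 = Process(target=mp_worker, args=([1, 2, 3], q))
        p1.start()
        print("message passing:", q.get())
        p1.join()

        total = Value("i", 0)
        p2 = Process(target=sm_worker, args=([1, 2, 3], total))
        p2.start()
        p2.join()
        print("shared memory:", total.value)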
