
PARALLEL COMPUTING

Bhairvee Singh, Asst. Professor, Dept. of Computer Science & Engineering, School of Engineering & Technology, Sharda University

TOPICS TO BE COVERED

Introduction to parallel computing
Need for parallel computing
Parallel architectural classification schemes:
(1) Flynn's classification
(2) Feng's classification

INTRODUCTION TO PARALLEL COMPUTING


Traditionally, software has been written for serial computation:
- to be run on a single computer having a single Central Processing Unit (CPU);
- a problem is broken into a discrete series of instructions;
- instructions are executed one after another;
- only one instruction may execute at any moment in time.

INTRODUCTION TO PARALLEL COMPUTING


In the simplest sense, parallel computing is the simultaneous use of multiple compute resources to solve a computational problem:
- to be run using multiple CPUs;
- a problem is broken into discrete parts that can be solved concurrently;
- each part is further broken down to a series of instructions;
- instructions from each part execute simultaneously on different CPUs.
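The idea can be sketched in a few lines of code. The following toy example (using Python's standard multiprocessing module; all function and variable names are illustrative, not from the slides) breaks a summation into discrete parts and solves the parts concurrently on multiple CPUs:

    # Toy example: break a sum into discrete parts and solve the parts
    # concurrently on multiple CPUs (names are illustrative).
    from multiprocessing import Pool

    def partial_sum(chunk):
        # Each worker runs the same series of instructions on its own part.
        return sum(chunk)

    if __name__ == "__main__":
        data = list(range(1_000_000))
        n_parts = 4
        size = len(data) // n_parts
        chunks = [data[i * size:(i + 1) * size] for i in range(n_parts)]
        with Pool(processes=n_parts) as pool:
            total = sum(pool.map(partial_sum, chunks))
        print(total == sum(data))  # True: same answer, computed in parallel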

INTRODUCTION TO PARALLEL COMPUTING


For parallel processing there may be:
- a single computer with multiple processors;
- an arbitrary number of computers connected by a network;
- a combination of both.

The computational problem usually demonstrates characteristics such as the ability to be:
- broken apart into discrete pieces of work that can be solved simultaneously;
- executed as multiple program instructions at any moment in time;
- solved in less time with multiple compute resources than with a single compute resource.

DEFINITION

Parallel processing is an efficient form of information processing which emphasizes the exploitation of concurrent events in the computing process. Concurrency implies parallelism, simultaneity, and pipelining:
- parallel events may occur in multiple resources during the same time interval;
- simultaneous events may occur at the same time instant;
- pipelined events may occur in overlapped time spans.

NEED FOR PARALLEL COMPUTING

Save time and/or money: in theory, throwing more resources at a task will shorten its time to completion, with potential cost savings. Parallel clusters can be built from cheap, commodity components.

Solve larger problems: many problems are so large and/or complex that it is impractical or impossible to solve them on a single computer, especially given limited computer memory. For example, web search engines/databases processing millions of transactions per second.

Provide concurrency: a single compute resource can only do one thing at a time, whereas multiple computing resources can be doing many things simultaneously. For example, the Access Grid provides a global collaboration network where people from around the world can meet and conduct work "virtually".

Use of non-local resources: using compute resources on a wide area network, or even the Internet, when local compute resources are scarce.

APPLICATIONS OF PARALLEL COMPUTING

1) Design of VLSI circuits.
2) CAD/CAM applications in all spheres of engineering activity.
3) Solving field problems. These are modeled using partial differential equations and require operations on large-sized matrices. Examples:
   1. Structural dynamics in aerospace and civil engineering
   2. Material and nuclear problems in physics
   3. Particle system problems in physics
   4. Weather forecasting
   5. Intelligent systems
   6. Modeling and simulation in economics, planning and many other areas
   7. Remote sensing and processing of large data
   8. Problems in nuclear energy

Parallelism can be achieved by:
(1) Parallelism in uniprocessor systems
(2) Parallel computers

PARALLELISM IN UNIPROCESSOR SYSTEM

A number of parallel processing mechanisms have been developed in uniprocessor computers:

(1) Multiplicity of functional units: many of the functions of the ALU can be distributed to multiple, specialized functional units which can operate in parallel. For example, the CDC 6600 has 10 functional units in its CPU, independent of each other, and the IBM 360/91 has two parallel execution units, one for fixed-point and one for floating-point arithmetic.

PARALLELISM IN UNIPROCESSOR SYSTEM

(2) Parallelism and pipelining within the CPU: in contrast to the bit-serial adder, carry look-ahead and carry-save adders are used. High-speed multiplication and division techniques are used for exploiting parallelism. The various phases of instruction execution are pipelined, including instruction fetch, decode, operand fetch, execution, and store.
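To illustrate the carry look-ahead idea, here is a minimal 4-bit sketch in Python (not the circuit of any particular machine; all names are mine): every carry is computed directly from per-bit generate and propagate signals, instead of rippling bit by bit as in a serial adder.

    # Minimal 4-bit carry look-ahead sketch (illustrative only).
    def cla_add_4bit(a, b, c0=0):
        g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(4)]  # generate: g_i = a_i AND b_i
        p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(4)]  # propagate: p_i = a_i XOR b_i
        # Each carry is a two-level function of g, p and c0 alone,
        # so hardware can evaluate all of them in parallel:
        c1 = g[0] | (p[0] & c0)
        c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
        c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
        c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
              | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
        carries = [c0, c1, c2, c3]
        s = sum((p[i] ^ carries[i]) << i for i in range(4))
        return s | (c4 << 4)  # 4-bit sum plus carry-out

    print(cla_add_4bit(9, 7))  # 16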

PARALLELISM IN UNIPROCESSOR SYSTEM
(3) Overlapped CPU and I/O operations: I/O operations can be performed simultaneously with CPU computation by using separate I/O controllers, channels and I/O processors. DMA can be used to provide direct communication between memory and I/O devices.

(4) Use of hierarchical memory system: a hierarchical memory system can be used to close the speed gap between the CPU and memory.

PARALLELISM IN UNIPROCESSOR SYSTEM
(5) Multiprogramming: within the same time interval there may be multiple processes active in a computer, competing for memory, I/O and CPU resources. When a process P1 is tied up with I/O operations, the system scheduler can switch the CPU to process P2. This allows several programs to make progress concurrently in the system. When P2 is done, the CPU can be switched to P3 or back to P1. This interleaving of CPU and I/O operations among several programs is called multiprogramming.
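A toy illustration of this interleaving (a sketch only, not a real OS scheduler; the task names are mine): while one task is blocked on I/O, the CPU makes progress on another.

    # Toy illustration of overlap: while "P1" is tied up with I/O,
    # the CPU makes progress on "P2".
    import threading, time

    def p1_io_bound():
        time.sleep(1.0)                 # stands in for a blocking I/O operation
        print("P1: I/O finished")

    def p2_cpu_bound():
        total = sum(range(5_000_000))   # CPU work that proceeds during P1's I/O
        print("P2: computed", total)

    t1 = threading.Thread(target=p1_io_bound)
    t2 = threading.Thread(target=p2_cpu_bound)
    t1.start(); t2.start()
    t1.join(); t2.join()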

PARALLELISM IN UNIPROCESSOR SYSTEM
(6) Time sharing: sometimes a high-priority program may occupy the CPU for too long, preventing others from sharing it. This problem can be overcome by using a time-sharing operating system. The concept extends multiprogramming by assigning fixed or variable time slices to multiple programs. Each user thinks that he or she is the sole user of the system, because the response is so fast.
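A minimal round-robin sketch of fixed time slices (illustrative; the burst times and quantum below are assumed values, not from the slides):

    # Minimal round-robin time-slicing sketch: each program runs for at
    # most one quantum, then goes to the back of the ready queue.
    from collections import deque

    def round_robin(burst_times, quantum):
        ready = deque(burst_times.items())
        schedule = []
        while ready:
            name, remaining = ready.popleft()
            ran = min(quantum, remaining)
            schedule.append((name, ran))
            if remaining > ran:
                ready.append((name, remaining - ran))
        return schedule

    print(round_robin({"P1": 5, "P2": 3, "P3": 1}, quantum=2))
    # [('P1', 2), ('P2', 2), ('P3', 1), ('P1', 2), ('P2', 1), ('P1', 1)]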

ARCHITECTURAL CLASSIFICATION SCHEMES


(1) Flynn's classification: in general, digital computers may be classified into four categories, according to the multiplicity of instruction and data streams.

This scheme was introduced by Michael J. Flynn. The essential computing process is the execution of a sequence of instructions on a set of data. The term stream is used here to denote a sequence of items (instructions or data):
- instruction stream: a sequence of instructions;
- data stream: a sequence of data.

FLYNN'S CLASSIFICATION

- Single instruction stream, single data stream (SISD)
- Single instruction stream, multiple data stream (SIMD)
- Multiple instruction stream, single data stream (MISD)
- Multiple instruction stream, multiple data stream (MIMD)

Both instructions and data are fetched from memory. Instructions are decoded by the control unit, which sends the decoded instruction stream to the processor units for execution. Data streams flow between the processors and the memory modules. Each instruction stream is generated by an independent control unit.

SISD

Instructions are executed sequentially, but may be overlapped in their execution phases. Most SISD uniprocessors are pipelined. All functional units are under the supervision of a single control unit.

SIMD

This class corresponds to array processors. There are multiple processing elements supervised by the same control unit. All PEs receive the same instruction but operate on different data sets. The shared memory subsystem may contain multiple modules.
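The SIMD idea in miniature, using NumPy as a software analogy (the slides describe hardware array processors, not NumPy itself): one "instruction" is applied to many data elements at once.

    # One "instruction" (an add) applied to many data elements at once;
    # each element plays the role of one PE's private operand pair.
    import numpy as np

    a = np.array([1, 2, 3, 4])
    b = np.array([10, 20, 30, 40])
    print(a + b)  # [11 22 33 44]: same operation, different data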

MISD

There are n processor units, each receiving distinct instructions but operating on the same data stream and its derivatives. The result of one processor becomes the input of the next processor. Example: systolic arrays.
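A toy software analogy of the MISD arrangement (illustrative only; real MISD machines are hardware pipelines, and the stage functions here are arbitrary): distinct "instructions" are applied over the same data stream, each stage's result feeding the next.

    # Distinct "instructions" (stages) applied over the same data stream;
    # the result of one processor becomes the input of the next.
    stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x ** 2]

    def misd_pipeline(value):
        for stage in stages:
            value = stage(value)
        return value

    print(misd_pipeline(3))  # ((3 + 1) * 2) ** 2 = 64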

MIMD

Most multiprocessor systems and multiple-computer systems can be classified in this category. If the n data streams were derived from disjoint subspaces of the shared memory, then we would have MSISD (multiple SISD), which is nothing but a set of n independent SISD uniprocessor systems.

FENG'S CLASSIFICATION

Tse-yun Feng has suggested the use of the degree of parallelism to classify various computer architectures. The maximum number of binary digits that can be processed within a unit time by a computer system is called the maximum parallelism degree P. Let Pi be the number of bits that can be processed within the i-th processor cycle. If there are T cycles in total, then the average parallelism degree Pa is defined as

Pa = (P1 + P2 + ... + PT) / T

The utilization rate μ of a computer within T cycles is

μ = Pa / P
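As a worked example under assumed values (the Pi below are hypothetical, chosen only for illustration): suppose T = 4 cycles with P1 = 8, P2 = 8, P3 = 4 and P4 = 4 bits processed per cycle, on a machine whose maximum parallelism degree is P = 16. In LaTeX notation:

    P_a = \frac{8 + 8 + 4 + 4}{4} = 6, \qquad
    \mu = \frac{P_a}{P} = \frac{6}{16} = 0.375

That is, on average 6 bits are processed per cycle, and the machine's bit-processing capacity is utilized at 37.5%.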

FENG'S CLASSIFICATION

Here the horizontal axis shows the word length n, and the vertical axis shows the bit-slice length m (a bit slice is a string of bits, one from each word at the same vertical bit position). The maximum parallelism degree P(C) of a given computer system C is represented by the product of the word length n and the bit-slice length m:

P(C) = n · m

The pair (n, m) corresponds to a point in this coordinate system, and P(C) is equal to the area of the rectangle defined by the integers n and m.
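As a quick check of the formula (the machine coordinates here are the ones commonly cited in the literature for Feng's chart; treat the exact figures as illustrative): the fully parallel Illiac IV is usually placed at (n, m) = (64, 64), giving P(C) = 64 × 64 = 4096, while the bit-serial STARAN is usually placed at (n, m) = (1, 256), giving P(C) = 1 × 256 = 256.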

FENG'S CLASSIFICATION

There are 4 types of processing methods:
(1) Word-serial, bit-serial (WSBS): n = m = 1. One bit is processed at a time, as in first-generation computers.
(2) Word-parallel, bit-serial (WPBS): n = 1, m > 1. Also known as bit-slice processing, because a slice of m bits (one bit from each of m words) is processed at a time.

FENG'S CLASSIFICATION

(3) Word-serial, bit-parallel (WSBP): n > 1, m = 1, as found in most existing computers. Also known as word-slice processing, because one word of n bits is processed at a time.
(4) Word-parallel, bit-parallel (WPBP): n > 1, m > 1. Known as fully parallel processing, in which an array of n × m bits is processed at one time.
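The four modes follow mechanically from the (n, m) pair, as this tiny sketch shows (the function name and labels are mine, for illustration only):

    # Tiny helper that maps an (n, m) pair to Feng's processing mode.
    def feng_mode(n, m):
        if n == 1 and m == 1:
            return "WSBS (word-serial, bit-serial)"
        if n == 1 and m > 1:
            return "WPBS (word-parallel, bit-serial; bit-slice)"
        if n > 1 and m == 1:
            return "WSBP (word-serial, bit-parallel; word-slice)"
        return "WPBP (word-parallel, bit-parallel; fully parallel)"

    print(feng_mode(64, 64))  # WPBP (word-parallel, bit-parallel; fully parallel)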
