
Some General Parallel Terminology

Like everything else, parallel computing has its own "jargon".


Some of the more commonly used terms associated with parallel computing are listed
below. Most of these will be discussed in more detail later.
Task

A logically discrete section of computational work. A task is typically a program or
program-like set of instructions that is executed by a processor. A task is "an execution
path through address space", in other words, a set of program instructions that are loaded
in memory. The address registers have been loaded with the initial address of the program;
at the next clock cycle, the CPU begins execution, in accordance with the program. The
sense is that some part of a plan is being accomplished. As long as the program remains in
this part of the address space, the task can continue, in principle, indefinitely, unless
the program instructions contain a halt, exit, or return.

• In the computer field, "task" has the sense of a real-time application, as
  distinguished from a process, which takes up space (memory) and execution time.
  See operating system.
  o Both "task" and "process" should be distinguished from an event, which takes
    place at a specific time and place, and which can be planned for in a
    computer program.
    - In a computer graphical user interface (GUI), an event can be as
      simple as a mouse click or keystroke.

Parallel Task

A task that can be executed by multiple processors safely (yields correct results). Task
parallelism (also known as function parallelism and control parallelism) is a form of
parallelization of computer code across multiple processors in parallel computing
environments. Task parallelism focuses on distributing execution processes (threads)
across different parallel computing nodes. It contrasts with data parallelism, another
form of parallelism.
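
As a rough illustration (not part of the original text), the Python sketch below expresses
task parallelism by running two unrelated functions concurrently; the function names and
data are made up. Note that in CPython, threads of a CPU-bound program are interleaved
rather than truly simultaneous, so a process-based approach would be used in practice for
compute-heavy tasks; the structure, however, is the same.

    # A minimal sketch of task (function/control) parallelism: two logically
    # distinct tasks execute concurrently, each in its own thread.
    import threading

    def compute_checksum(data):
        # Task 1: one kind of work
        print("checksum:", sum(data) % 255)

    def find_maximum(data):
        # Task 2: a different, independent kind of work
        print("maximum:", max(data))

    if __name__ == "__main__":
        data = list(range(1_000_000))
        t1 = threading.Thread(target=compute_checksum, args=(data,))
        t2 = threading.Thread(target=find_maximum, args=(data,))
        t1.start(); t2.start()   # both tasks run concurrently
        t1.join(); t2.join()     # wait for both tasks to finish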

Serial Execution
Execution of a program sequentially, one statement at a time. In the simplest
sense, this is what happens on a one processor machine. However, virtually all
parallel tasks will have sections of a parallel program that must be executed
serially.
Parallel Execution

Execution of a program by more than one task, with each task being able to execute the
same or different statement at the same moment in time. The simultaneous use of more
than one CPU or processor core to execute a program or multiple computational threads.
Ideally, parallel processing makes programs run faster because there are more engines
(CPUs or cores) running it. In practice, it is often difficult to divide a program in such a
way that separate CPUs or cores can execute different portions without interfering with
each other. Most computers have just one CPU, but some models have several, and multi-
core processor chips are becoming the norm. There are even computers with thousands of
CPUs.
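
As a hedged sketch (not from the original text), the following Python snippet contrasts
serial execution of a loop with parallel execution of the same work by a pool of worker
processes; the work function and problem sizes are invented for illustration.

    # Serial execution: one statement at a time, on one processor.
    # Parallel execution: the same jobs handed to several worker processes.
    from multiprocessing import Pool
    import math
    import time

    def work(n):
        # Some CPU-bound computation
        return sum(math.sqrt(i) for i in range(n))

    if __name__ == "__main__":
        jobs = [200_000] * 8

        start = time.perf_counter()
        serial_results = [work(n) for n in jobs]      # executed sequentially
        print("serial:  ", round(time.perf_counter() - start, 3), "s")

        start = time.perf_counter()
        with Pool() as pool:                          # one worker per core by default
            parallel_results = pool.map(work, jobs)   # jobs run simultaneously
        print("parallel:", round(time.perf_counter() - start, 3), "s")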

With single-CPU, single-core computers, it is possible to perform parallel processing by
connecting the computers in a network. However, this type of parallel processing requires
very sophisticated software called distributed processing software.

Note that parallel processing differs from multitasking, in which a CPU provides the
illusion of simultaneously executing instructions from multiple different programs by
rapidly switching between them, or "interleaving" their instructions.

Parallel processing is also called parallel computing. In the quest for cheaper computing
alternatives, parallel processing provides a viable option: the idle processor cycles of
machines across a network can be used effectively by sophisticated distributed computing
software.

Pipelining

Breaking a task into steps performed by different processor units, with inputs streaming
through, much like an assembly line; a type of parallel computing. In computing, a
pipeline is a set of data processing elements connected in series, so that the output of one
element is the input of the next one. The elements of a pipeline are often executed in
parallel or in time-sliced fashion; in that case, some amount of buffer storage is often
inserted between elements. As the assembly line example shows, pipelining doesn't
decrease the time for a single datum to be processed; it only increases the throughput of
the system when processing a stream of data.
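
A small, hedged sketch of a software pipeline (not from the original text): two processing
elements connected in series, with bounded queues acting as the buffer storage between
them; the stage functions and data are arbitrary.

    # Data streams through the pipe: stage 1 doubles each item, stage 2 adds one.
    import threading
    import queue

    def stage(inbox, outbox, fn):
        while True:
            item = inbox.get()
            if item is None:              # sentinel value: shut this stage down
                if outbox is not None:
                    outbox.put(None)      # pass the shutdown signal downstream
                break
            result = fn(item)
            if outbox is not None:
                outbox.put(result)        # hand the result to the next stage
            else:
                print("output:", result)  # last stage: emit the final result

    if __name__ == "__main__":
        q1 = queue.Queue(maxsize=4)       # buffer storage between the stages
        q2 = queue.Queue(maxsize=4)
        stages = [
            threading.Thread(target=stage, args=(q1, q2, lambda x: x * 2)),
            threading.Thread(target=stage, args=(q2, None, lambda x: x + 1)),
        ]
        for t in stages:
            t.start()
        for item in range(5):             # inputs stream through the pipeline
            q1.put(item)
        q1.put(None)
        for t in stages:
            t.join()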

Deep pipelining increases latency - the time required for a signal to propagate through a
full pipe.

A pipelined system typically requires more resources (circuit elements, processing units,
computer memory, etc.) than one that executes one batch at a time, because its stages
cannot reuse the resources of a previous stage. Moreover, pipelining may increase the
time it takes for an instruction to finish.

Shared Memory

From a strictly hardware point of view, describes a computer architecture where all
processors have direct (usually bus based) access to common physical memory. In a
programming sense, it describes a model where parallel tasks all have the same "picture"
of memory and can directly address and access the same logical memory locations
regardless of where the physical memory actually exists. In computing, shared memory
is memory that may be simultaneously accessed by multiple programs with an intent to
provide communication among them or avoid redundant copies. Depending on context,
programs may run on a single processor or on multiple separate processors. Using
memory for communication inside a single program, for example among its multiple
threads, is generally not referred to as shared memory.
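
As a hedged sketch of the shared-memory programming model (not from the original text),
the snippet below has several worker processes update one counter that lives in memory
visible to all of them; the process and iteration counts are arbitrary.

    # All workers see the same logical memory location and must coordinate access.
    from multiprocessing import Process, Value, Lock

    def worker(counter, lock, n):
        for _ in range(n):
            with lock:                    # serialize access to the shared location
                counter.value += 1

    if __name__ == "__main__":
        counter = Value("i", 0)           # an integer placed in shared memory
        lock = Lock()
        procs = [Process(target=worker, args=(counter, lock, 10_000))
                 for _ in range(4)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        print("counter =", counter.value) # 40000: every task addressed the same memory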

Symmetric Multi-Processor (SMP)

Hardware architecture where multiple processors share a single address space and access
to all resources; shared memory computing. In computing, symmetric multiprocessing
or SMP involves a multiprocessor computer architecture where two or more identical
processors can connect to a single shared main memory. Most common multiprocessor
systems today use an SMP architecture. In the case of multi-core processors, the SMP
architecture applies to the cores, treating them as separate processors.

SMP systems allow any processor to work on any task no matter where the data for that
task are located in memory; with proper operating system support, SMP systems can
easily move tasks between processors to balance the workload efficiently. SMP has many
uses in science, industry, and business which often use custom-programmed software for
multithreaded (multitasked) processing. However, most consumer products such as word
processors and computer games are written in such a manner that they cannot gain large
benefits from concurrent systems. For games this is usually because writing a program to
increase performance on SMP systems can produce a performance loss on uniprocessor
systems. Recently, however, multi-core chips are becoming more common in new
computers, and the balance between installed uni- and multi-core computers may change
in the coming years.

The nature of the different programming methods would generally require two separate
code-trees to support both uniprocessor and SMP systems with maximum performance.
Programs running on SMP systems may experience a performance increase even when
they have been written for uniprocessor systems. This is because hardware interrupts that
usually suspend program execution while the kernel handles them can execute on an idle
processor instead. The effect in most applications (e.g. games) is not so much a
performance increase as the appearance that the program is running much more smoothly.
In some applications, particularly compilers and some distributed computing projects,
one will see an improvement by a factor of (nearly) the number of additional processors.

In situations where more than one program executes at the same time, an SMP system
will have considerably better performance than a uni-processor because different
programs can run on different CPUs simultaneously.

Systems programmers must build support for SMP into the operating system: otherwise,
the additional processors remain idle and the system functions as a uniprocessor system.
In cases where an SMP environment processes many jobs, administrators often
experience a loss of hardware efficiency. Software programs have been developed to
schedule jobs so that the processor utilization reaches its maximum potential. Good
software packages can achieve this maximum potential by scheduling each CPU
separately, as well as being able to integrate multiple SMP machines and clusters.

Access to RAM is serialized; this and cache coherency issues cause performance to lag
slightly behind the number of additional processors in the system.

Distributed Memory
In hardware, refers to network based memory access for physical memory that is
not common. As a programming model, tasks can only logically "see" local
machine memory and must use communications to access memory on other
machines where other tasks are executing.
Communications
Parallel tasks typically need to exchange data. There are several ways this can be
accomplished, such as through a shared memory bus or over a network; however,
the actual event of data exchange is commonly referred to as communications
regardless of the method employed.
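
As a hedged sketch (not from the original text), the snippet below shows one task sending
data it owns to another task through an explicit communication channel; a multiprocessing
pipe stands in here for the network message passing (for example, MPI) used on real
distributed-memory machines.

    # The two tasks share no memory; data moves only through explicit messages.
    from multiprocessing import Process, Pipe

    def sender(conn):
        local_data = [1, 2, 3]        # exists only in this task's local memory
        conn.send(local_data)         # the communication event
        conn.close()

    def receiver(conn):
        data = conn.recv()            # block until the message arrives
        print("received:", data)
        conn.close()

    if __name__ == "__main__":
        recv_end, send_end = Pipe()
        p1 = Process(target=sender, args=(send_end,))
        p2 = Process(target=receiver, args=(recv_end,))
        p1.start(); p2.start()
        p1.join(); p2.join()
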
Synchronization
The coordination of parallel tasks in real time, very often associated with
communications. Often implemented by establishing a synchronization point
within an application where a task may not proceed further until another task(s)
reaches the same or logically equivalent point.

Synchronization usually involves waiting by at least one task, and can therefore
cause a parallel application's wall clock execution time to increase.
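
A minimal sketch of a synchronization point (not from the original text): no thread
proceeds past the barrier until every participating thread has reached it, so the faster
threads spend wall-clock time waiting. The thread count and sleep times are arbitrary.

    import random
    import threading
    import time

    barrier = threading.Barrier(4)        # synchronization point for 4 tasks

    def task(name):
        time.sleep(random.random())       # simulate unequal amounts of work
        print(name, "reached the barrier")
        barrier.wait()                    # wait until all tasks arrive here
        print(name, "continuing past the barrier")

    if __name__ == "__main__":
        threads = [threading.Thread(target=task, args=(f"task-{i}",))
                   for i in range(4)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()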

Granularity
In parallel computing, granularity is a qualitative measure of the ratio of
computation to communication.

• Coarse: relatively large amounts of computational work are done between
  communication events
• Fine: relatively small amounts of computational work are done between
  communication events
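
As a hedged sketch (not from the original text), the snippet below runs the same total
work with a few large chunks (coarse granularity) and with many small chunks (fine
granularity); finer chunks mean more task hand-offs, that is, more communication relative
to computation. The sizes are arbitrary.

    from multiprocessing import Pool

    def work(chunk):
        return sum(i * i for i in chunk)

    def run(chunk_size):
        data = list(range(1_000_000))
        chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
        with Pool() as pool:
            return sum(pool.map(work, chunks))

    if __name__ == "__main__":
        print(run(chunk_size=250_000))    # coarse: 4 chunks, little coordination
        print(run(chunk_size=1_000))      # fine: 1000 chunks, far more hand-offs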

Parallel Overhead
The amount of time required to coordinate parallel tasks, as opposed to doing
useful work. Parallel overhead can include factors such as:

• Task start-up time
• Synchronizations
• Data communications
• Software overhead imposed by parallel compilers, libraries, tools,
operating system, etc.
• Task termination time
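
As a hedged sketch (not from the original text), the snippet below makes this overhead
visible: for a trivially small job, process start-up and data communication dominate, so
the parallel version can be slower than the serial one. The job and data sizes are made up.

    from multiprocessing import Pool
    import time

    def tiny_job(x):
        return x + 1                               # almost no useful work per task

    if __name__ == "__main__":
        data = list(range(1000))

        start = time.perf_counter()
        serial = [tiny_job(x) for x in data]
        print("serial:  ", round(time.perf_counter() - start, 4), "s")

        start = time.perf_counter()
        with Pool() as pool:                       # pays task start-up time
            parallel = pool.map(tiny_job, data)    # pays data communication costs
        print("parallel:", round(time.perf_counter() - start, 4),
              "s  (overhead dominates)")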

Massively Parallel
Refers to the hardware that comprises a given parallel system - having many
processors. The meaning of "many" keeps increasing, but currently the largest parallel
computers are made up of hundreds of thousands of processors.
Embarrassingly Parallel

Solving many similar, but independent tasks simultaneously; little to no need for
coordination between the tasks. In parallel computing, an embarrassingly parallel
workload (or embarrassingly parallel problem) is one for which little or no effort is
required to separate the problem into a number of parallel tasks. This is often the case
where there exists no dependency (or communication) between those parallel tasks.[1]

Embarrassingly parallel problems are ideally suited to distributed computing and are also
easy to perform on server farms which do not have any of the special infrastructure used
in a true supercomputer cluster.
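
A hedged sketch (not from the original text): estimating pi by many independent Monte
Carlo trials is a classic embarrassingly parallel workload, since the tasks never
communicate until the final results are collected. The sample counts are arbitrary.

    from multiprocessing import Pool
    import random

    def estimate_pi(n_samples):
        # Each task works on its own random samples; no coordination needed.
        rng = random.Random()
        inside = sum(rng.random() ** 2 + rng.random() ** 2 <= 1.0
                     for _ in range(n_samples))
        return 4.0 * inside / n_samples

    if __name__ == "__main__":
        with Pool() as pool:
            estimates = pool.map(estimate_pi, [100_000] * 8)  # 8 independent tasks
        print("pi is roughly", sum(estimates) / len(estimates))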

Scalability
Refers to a parallel system's (hardware and/or software) ability to demonstrate a
proportionate increase in parallel speedup with the addition of more processors.
Factors that contribute to scalability include:

• Hardware - particularly memory-CPU bandwidths and network communications
• Application algorithm
• Related parallel overhead
• Characteristics of your specific application and coding
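
As a hedged sketch (not from the original text), the snippet below measures parallel
speedup directly: the same set of jobs is run with 1, 2, and 4 workers and speedup is
reported as T(1 worker) / T(N workers). The workload is invented; real scaling depends on
the factors listed above.

    from multiprocessing import Pool
    import math
    import time

    def work(n):
        return sum(math.sqrt(i) for i in range(n))

    def timed_run(n_workers, jobs):
        start = time.perf_counter()
        with Pool(processes=n_workers) as pool:
            pool.map(work, jobs)
        return time.perf_counter() - start

    if __name__ == "__main__":
        jobs = [300_000] * 8
        baseline = timed_run(1, jobs)
        print(f"1 worker : {baseline:.3f} s  (speedup = 1.00)")
        for n in (2, 4):
            t = timed_run(n, jobs)
            print(f"{n} workers: {t:.3f} s  (speedup = {baseline / t:.2f})")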

Multi-core Processors

Multiple processors (cores) on a single chip. A multi-core processor is a processing
system composed of two or more independent cores (or CPUs). The cores are typically
integrated onto a single integrated circuit die (known as a chip multiprocessor or CMP),
or they may be integrated onto multiple dies in a single chip package. A many-core
processor is one in which the number of cores is large enough that traditional multi-
processor techniques are no longer efficient (this threshold is somewhere in the range of
several tens of cores) and likely requires a network on chip.
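
As a small, hedged sketch (not from the original text), a program can ask the operating
system how many cores it can use; the exact numbers depend on the machine and, on some
systems, on CPU-affinity settings.

    import os

    logical_cores = os.cpu_count()          # logical processors visible to the OS
    print("logical cores:", logical_cores)

    # On platforms that support CPU affinity (e.g. Linux), the cores actually
    # usable by this process may be fewer than the number installed.
    if hasattr(os, "sched_getaffinity"):
        print("usable cores:", len(os.sched_getaffinity(0)))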

Cluster Computing
Use of a combination of commodity units (processors, networks or SMPs) to
build a parallel system. A computer cluster is a group of linked computers,
working together closely so that in many respects they form a single computer.
The components of a cluster are commonly, but not always, connected to each
other through fast local area networks. Clusters are usually deployed to improve
performance and/or availability over that provided by a single computer, while
typically being much more cost-effective than single computers of comparable
speed or availability.[1]
Supercomputing / High Performance Computing
Use of the world's fastest, largest machines to solve large problems. A supercomputer is a
computer at the frontline of current processing capacity, particularly speed of
calculation; it is among the most powerful machines available at a given time.
Supercomputers introduced in the 1960s were designed
primarily by Seymour Cray at Control Data Corporation (CDC), and led the market into
the 1970s until Cray left to form his own company, Cray Research. He then took over the
supercomputer market with his new designs, holding the top spot in supercomputing for
five years (1985–1990). In the 1980s a large number of smaller competitors entered the
market, in parallel to the creation of the minicomputer market a decade earlier, but many
of these disappeared in the mid-1990s "supercomputer market crash".

Today, supercomputers are typically one-of-a-kind custom designs produced by "traditional"
companies such as Cray, IBM and Hewlett-Packard, which purchased many of the 1980s
companies to gain their experience. As of July 2009, the IBM Roadrunner, located at Los
Alamos National Laboratory, is the fastest supercomputer in the world.

The term supercomputer itself is rather fluid, and today's supercomputer tends to become
tomorrow's ordinary computer. CDC's early machines were simply very fast scalar
processors, some ten times the speed of the fastest machines offered by other companies.
In the 1970s most supercomputers were dedicated to running a vector processor, and
many of the newer players developed their own such processors at a lower price to enter
the market. The early and mid-1980s saw machines with a modest number of vector
processors working in parallel to become the standard. Typical numbers of processors
were in the range of four to sixteen. In the later 1980s and 1990s, attention turned from
vector processors to massive parallel processing systems with thousands of "ordinary"
CPUs, some being off the shelf units and others being custom designs. Today, parallel
designs are based on "off the shelf" server-class microprocessors, such as the PowerPC,
Opteron, or Xeon, and most modern supercomputers are now highly-tuned computer
clusters using commodity processors combined with custom interconnects.

Grid Computing
Grid computing is the most distributed form of parallel computing. It makes use of
computers communicating over the Internet to work on a given problem. Because of the
low bandwidth and extremely high latency available on the Internet, grid computing
typically deals only with embarrassingly parallel problems. Many grid computing
applications have been created, of which SETI@home and Folding@Home are the best-
known examples.[31]

Most grid computing applications use middleware, software that sits between the
operating system and the application to manage network resources and standardize the
software interface. The most common grid computing middleware is the Berkeley Open
Infrastructure for Network Computing (BOINC). Often, grid computing software makes
use of "spare cycles", performing computations at times when a computer is idling.
