Number Crunching
Linux clusters are all the rage. As this article shows, programming a cluster with MPI (Message Passing Interface) need
not be too difficult. Of course, we realize that most readers will not have a cluster of Linux machines under their desks,
so the sample programs will run on any normal PC. BY HEIKO BAUKE
MPI is the undisputed leader when it comes to programming Linux clusters and other massively parallel computers. The development of parallelized programs raises a number of issues: the load needs to be spread evenly, and the work performed by the multiple simultaneous processes needs coordinating.

Message Exchanges
Message exchanges are the central concept at the heart of MPI. Many processes combine to solve a large problem and talk to each other to coordinate their work. The concept is highly generic and can be implemented on a variety of computer architectures. This makes it more or less irrelevant to the programmer whether the program will run on a large-scale SMP machine or overnight on a collection of networked, and otherwise idle, office machines.

MPI programs are usually designed on the basis of the SPMD paradigm (Single Program, Multiple Data): multiple processes run identical program code, but each process handles different data. Each process is assigned a unique rank that influences its execution.

Implementations
The MPI standard defines only one API (or three, to be more precise: one each for Fortran, C, and C++). Every supercomputer manufacturer offers its own implementation, optimized for its own hardware. Besides these there are also free implementations such as LAM-MPI [3] and MPICH [2]. How an MPI program is compiled, debugged, and launched depends on the implementation. The information in this article is based on LAM-MPI and the C++ API. The installation steps for LAM are well documented, and in case of installation issues, competent help is always at hand via the LAM mailing list [4].

Hello World!
Listing 1 shows a simple MPI program that outputs the names of the computers running the processes. The header file included here, mpi.h, provides MPI-specific prototypes. All the MPI classes and functions reside within the MPI namespace. Each MPI program is bracketed by MPI::Init and MPI::Finalize. MPI calls are illegal if they occur before MPI::Init has initialized the MPI environment, or after MPI::Finalize has closed it.

The communicator is one of MPI's fundamental concepts: it groups processes that can exchange messages. Communicators are implemented by the MPI::Comm class. The MPI::COMM_WORLD communicator instance always exists; it contains all the MPI processes and is quite sufficient for simple programs. Libraries that need to shield their communication from the application use their own communicators. The Get_size method tells you how many processes have been assigned to a communicator; Get_rank tells you a process's rank. The rank and size variables are local, just like all variables in MPI programs, and can thus assume different values in each process.

Calling the Get_processor_name function places the name of the computer running the process in a buffer referenced by the proc_name pointer; proc_name_length contains the length of the name. The string is null-terminated before being sent to the process with rank zero. This process uses a for loop to receive the processor names of the other processes and store them in a file. If the file cannot be opened, the Abort method can be used to kill all the processes.

If you are wondering why the process ranked zero (and not 42, for example) collects the data: there will always be a process with rank zero. This ensures that the data will be sent to an existing process, and that the program will run independently of the number of processes.
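The structure described so far — every MPI call bracketed by MPI::Init and MPI::Finalize, with rank and size queried from MPI::COMM_WORLD — can be pieced together into a minimal skeleton. This is only a sketch, not the article's Listing 1:

```cpp
#include <iostream>
#include <mpi.h>

int main(int argc, char* argv[]) {
  MPI::Init(argc, argv);                  // no MPI calls are legal before this

  int size = MPI::COMM_WORLD.Get_size(); // number of processes in the communicator
  int rank = MPI::COMM_WORLD.Get_rank(); // this process's rank: 0 .. size-1

  // rank and size are local variables; they hold different values in each
  // process (output to stdout is for demonstration only, see below)
  std::cout << "process " << rank << " of " << size << std::endl;

  MPI::Finalize();                        // no MPI calls are legal after this
  return 0;
}
```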
The number of processes is stipulated when launching the program.

All non-zero processes use the Send method to send their strings. The prototype of the Send method is as follows:

void Comm::Send(const void* buf, int count, const Datatype& datatype, int dest, int tag) const

It expects a pointer to a contiguous memory area first. count stipulates the number of elements of type datatype that should be sent to the process ranked dest. Table 1 shows the MPI datatype available for each standard C++ datatype. tag is used to separate the messages, which are uniquely identified by their communicator, a datatype, and a tag. The receiving process must issue a receive that matches the send, so the communicator, datatype, and tag of the sending and receiving methods must match. The prototype of the receiving method, Recv, is:

void Comm::Recv(void* buf, int count, const Datatype& datatype, int source, int tag) const

The arguments mean the same as in the sending method, but in this case they refer to the source of the message instead of the target, and specify a receive buffer rather than a send buffer. Programmers must ensure that the receive buffer is large enough.

Table 1: An MPI datatype is available for each standard C++ datatype

MPI Datatype              C++ Datatype
MPI::CHAR                 char
MPI::WCHAR                wchar_t
MPI::SHORT                signed short
MPI::INT                  signed int
MPI::LONG                 signed long
MPI::SIGNED_CHAR          signed char
MPI::UNSIGNED_CHAR        unsigned char
MPI::UNSIGNED_SHORT       unsigned short
MPI::UNSIGNED             unsigned int
MPI::UNSIGNED_LONG        unsigned long int
MPI::FLOAT                float
MPI::DOUBLE               double
MPI::LONG_DOUBLE          long double
MPI::BOOL                 bool
MPI::COMPLEX              Complex<float>
MPI::DOUBLE_COMPLEX       Complex<double>
MPI::LONG_DOUBLE_COMPLEX  Complex<long double>

Our Hello World program outputs the processor name to a file and not to standard output. This makes sense because MPI programs are not typically bound to a terminal. Additionally, the MPI standard does not actually specify what happens when data is output to STDOUT or STDERR; programmers wanting to create portable code are well advised to avoid output of this kind. The LAM implementation passes any standard output produced by the processes on to mpirun; other implementations may redirect standard output to /dev/null.
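Putting the calls described above together, the heart of such a Hello World program might look like the following sketch. The output file name and the tag value 0 are assumptions, not taken from the article's Listing 1:

```cpp
#include <fstream>
#include <mpi.h>

int main(int argc, char* argv[]) {
  MPI::Init(argc, argv);
  const int size = MPI::COMM_WORLD.Get_size();
  const int rank = MPI::COMM_WORLD.Get_rank();

  char proc_name[MPI::MAX_PROCESSOR_NAME];
  int proc_name_length;
  MPI::Get_processor_name(proc_name, proc_name_length);
  proc_name[proc_name_length] = '\0';       // null-terminate before sending

  if (rank == 0) {
    std::ofstream out("hello_world.txt");   // hypothetical output file
    if (!out)
      MPI::COMM_WORLD.Abort(1);             // kills all processes
    out << "process 0 runs on " << proc_name << '\n';
    for (int src = 1; src < size; ++src) {  // collect the other names
      char buf[MPI::MAX_PROCESSOR_NAME];
      MPI::COMM_WORLD.Recv(buf, MPI::MAX_PROCESSOR_NAME, MPI::CHAR, src, 0);
      out << "process " << src << " runs on " << buf << '\n';
    }
  } else {
    // every non-zero rank sends its name to the always-existing rank 0
    MPI::COMM_WORLD.Send(proc_name, proc_name_length + 1, MPI::CHAR, 0, 0);
  }
  MPI::Finalize();
  return 0;
}
```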
The program can be compiled with any C++ compiler, although using the wrapper compiler that is normally supplied with the MPI implementation will ensure that the required libraries and header files are found. The wrapper compiler for LAM-MPI is called mpiCC:

mpiCC -o hello_world hello_world.cc

After compiling, simply enter:

mpirun -np 4 hello_world

to launch four instances of the hello_world program. The -np parameter defines the number of processes. Before doing so, ensure that the LAM daemons have been launched on all the machines that will be running MPI programs later. To do so, you will need to create a nodes file that contains one hostname per line (the simplest case would be just localhost) and launch the daemons by typing:

lamboot -v nodes

You only need to launch the daemons once. If you enter lamhalt to terminate the daemons, you will need to re-launch them explicitly.

Ping Pong
The few MPI functions introduced in the previous section already allow you to author fairly useful programs. Network bandwidth plays an important role in the case of Beowulf clusters. A small ping-pong program can be used to measure the bandwidth (see Listing 2): a fixed-length message is first passed from process 0 to process 1 and then back from process 1 to process 0. The elapsed time is measured, and a mean value over multiple attempts is ascertained.

You should be familiar with the MPI initialization phase from the Hello World program. The actual measurement is performed in two loops (starting in lines 37 and 41 respectively). The Barrier is new; it ensures that a communicator's processes are synchronized. Each pro-
Operand       Meaning
MPI::MAX      Maximum
MPI::MIN      Minimum
MPI::SUM      Sum
MPI::PROD     Product
MPI::LAND     logical AND
MPI::BAND     binary AND
MPI::LOR      logical OR
MPI::BOR      binary OR
MPI::LXOR     logical exclusive OR
MPI::BXOR     binary exclusive OR
MPI::MAXLOC   Maximum and occurrence of maximum
MPI::MINLOC   Minimum and occurrence of minimum

Figure 1: Results of bandwidth test using shared memory, Ethernet, and Myrinet. Myrinet data courtesy of Tobias Czauderna and Andreas Herzog (Univ. Magdeburg)
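Listing 2 itself does not appear in this excerpt. Based on the description above, the core of the measurement might look like the following sketch; the message length, the repeat count, and the use of MPI::Wtime for timing are assumptions:

```cpp
#include <iostream>
#include <mpi.h>

int main(int argc, char* argv[]) {
  MPI::Init(argc, argv);                // run with: mpirun -np 2 ping_pong
  const int rank = MPI::COMM_WORLD.Get_rank();

  const int len = 1 << 20;              // fixed message length: 1 MiB (assumption)
  const int repeats = 100;              // round trips to average over (assumption)
  char* buf = new char[len];

  MPI::COMM_WORLD.Barrier();            // synchronize before taking the time
  double t0 = MPI::Wtime();
  for (int i = 0; i < repeats; ++i) {
    if (rank == 0) {                    // 0 -> 1, then back 1 -> 0
      MPI::COMM_WORLD.Send(buf, len, MPI::CHAR, 1, 0);
      MPI::COMM_WORLD.Recv(buf, len, MPI::CHAR, 1, 0);
    } else if (rank == 1) {
      MPI::COMM_WORLD.Recv(buf, len, MPI::CHAR, 0, 0);
      MPI::COMM_WORLD.Send(buf, len, MPI::CHAR, 0, 0);
    }
  }
  double dt = (MPI::Wtime() - t0) / repeats;  // mean round-trip time

  if (rank == 0)                        // 2*len bytes travel per round trip
    std::cout << "bandwidth: " << 2.0 * len / dt / 1e6 << " MB/s" << std::endl;

  delete[] buf;
  MPI::Finalize();
  return 0;
}
```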
Reduce uses the operation specified by the op argument to combine the data that all processes have placed in the buffer pointed to by sendbuf. The results are placed in the recvbuf of the process referenced by root. The send and receive buffers contain count elements. The count, datatype, op, and root arguments must be identical for all processes.

The case where local data belonging to one process needs to be distributed across all processes is handled by the Scatter function. In contrast to this, Gather collates distributed data within a single process. The prototype for Scatter is as follows:

void Comm::Scatter(const void *sendbuf, int sendcount, const Datatype &sendtype, void *recvbuf, int recvcount, const Datatype &recvtype, int root) const

and Gather as follows:

void Comm::Gather(const void *sendbuf, int sendcount, const Datatype &sendtype, void *recvbuf, int recvcount, const Datatype &recvtype, int root) const

root specifies the process from or to which data of type sendtype are sent. sendcount specifies the amount of data each process sends or receives. The sendcount and recvcount, and the sendtype and recvtype, arguments are typically identical.

In Listing 3, each process generates a pseudo-random number; then the maximum, the minimum, and the sum of the pseudo-random numbers are calculated. The information is collected by process zero and written to a file.

Digital Diffraction
The program on the web site [12] creates diffraction images in parallel. It calculates the diffraction pattern produced on the screen by spherical waves originating at various points. The article at [6] provides additional details.

After initializing MPI, the process ranked zero reads a configuration file that contains geometrical data. A call to Bcast distributes the geometrical data to all the other processes.

The program uses geometrical distribution to parallelize the calculations: the screen area for which the intensity distribution will be calculated is divided into narrow horizontal bands. After this calculation, a call to Reduce discovers the intensity of the brightest point, and Gather collects the image data to allow process zero to output a portable graymap file.

Conclusion
MPI is extremely powerful. Amongst other things, MPI provides non-blocking communication, additional collective communication functions, derived datatypes, and special topologies. [9] provides a discussion of several generic aspects of parallel programming. ■
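To illustrate the collective calls discussed above, the approach described for Listing 3 might be sketched along these lines; the seeding scheme and the output file name are assumptions, not the article's actual code:

```cpp
#include <cstdlib>
#include <fstream>
#include <mpi.h>

int main(int argc, char* argv[]) {
  MPI::Init(argc, argv);
  const int size = MPI::COMM_WORLD.Get_size();
  const int rank = MPI::COMM_WORLD.Get_rank();

  std::srand(rank + 1);                        // each process seeds differently
  double x = std::rand() / double(RAND_MAX);   // one pseudo-random number each

  // count, datatype, op, and root must be identical on every process
  double max, min, sum;
  MPI::COMM_WORLD.Reduce(&x, &max, 1, MPI::DOUBLE, MPI::MAX, 0);
  MPI::COMM_WORLD.Reduce(&x, &min, 1, MPI::DOUBLE, MPI::MIN, 0);
  MPI::COMM_WORLD.Reduce(&x, &sum, 1, MPI::DOUBLE, MPI::SUM, 0);

  // collect every process's number at rank 0 (recvbuf only matters at root)
  double* all = (rank == 0) ? new double[size] : 0;
  MPI::COMM_WORLD.Gather(&x, 1, MPI::DOUBLE, all, 1, MPI::DOUBLE, 0);

  if (rank == 0) {
    std::ofstream out("random.txt");           // hypothetical output file
    for (int i = 0; i < size; ++i)
      out << all[i] << '\n';
    out << "max=" << max << " min=" << min << " sum=" << sum << '\n';
    delete[] all;
  }
  MPI::Finalize();
  return 0;
}
```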