PRESTACIONS/ES PAC 4
Semester: September 2015
The exercises
Evaluation criteria
Formatting
Deadline
Students doing the third option have to deliver: 1) the scripts developed to carry out the
performance characterization; 2) one document with the answers to the proposed
questions.
For both options the maximum length for the document is 4 pages.
The environment and resources
For the practical option, it is important to emphasize that the student is expected to
carry out their own exploration and research to find solutions in the context of
performance analysis, metrics collection and program execution.
The first option corresponds to an open mini-project. As already mentioned, this is a
completely open mini-project in which the student is expected to propose what he/she wants
to do and evaluate, and what the goals and outputs of the project are.
It is suggested that the student divide the document into the following sections:
1) Motivation of the mini-project: why the student selected the given area and topic
for the project.
2) Related work: what related work is associated with the project.
3) The organization: what the project consisted of, the different tasks that the student
has done, and the environments and methodologies that the student has followed.
4) Discussion of the project: results, analysis, or whatever the output of the project is.
5) Conclusions
The total length of the dissertation must be no more than 6 pages, so the student must
synthesize as much as possible.
Describe what the selected algorithm does (include the references that you have used):
2. Parallel implementation
The goal of this second part is to propose a pseudo-code parallel implementation for the parts
identified in 1.2.
2.1 Are there any existing solutions? Briefly describe their approaches and list the references.
2.2
2.2.1 What strategy have you selected (e.g., pipeline, shared memory, message passing)? Why?
2.2.2 What other options could you use?
2.2.3 If possible, can you compare your proposal with the existing solutions identified in 2.1?
2.2.4 Describe the pseudo-code, including comments on why the different parts were selected
for parallelization.
3. Performance projection
3.1 Given the pseudo-code proposed in 2.2 and 1.1, propose a theoretical model to project
the speedup that the parallel implementation may show with respect to the serial
implementation. (It is a model, so it is not expected to be 100% accurate.)
3.1.1 Extra: Can you compare your proposed solutions with one of the existing solutions
identified in 2.1?
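One possible starting point for such a model, given here only as an illustrative assumption and not as the expected answer, is Amdahl's law extended with an overhead term. The symbols are hypothetical names: $T_1$ is the serial runtime, $f$ the fraction of that runtime that can be parallelized, $p$ the number of threads, and $T_{ov}(p)$ any synchronization or communication cost added by the parallel version.

```latex
\begin{align}
T_p  &= (1 - f)\,T_1 + \frac{f\,T_1}{p} + T_{ov}(p) \\
S(p) &= \frac{T_1}{T_p}
      = \frac{1}{(1 - f) + \dfrac{f}{p} + \dfrac{T_{ov}(p)}{T_1}}
\end{align}
```

With $T_{ov}(p) = 0$ this reduces to plain Amdahl's law, whose speedup is bounded by $1/(1-f)$ as $p$ grows.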
3.2 Provide a speedup analysis using the previous model for 1, 2, 4, 16 and 32 threads.
Describe what type of computational system would best suit the proposed implementation
and which components would merit more investment.
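As a minimal sketch of how such an analysis could be tabulated, the script below evaluates a model for the requested thread counts. It assumes, purely for illustration, that the model is plain Amdahl's law, S(p) = 1 / ((1 - f) + f / p), and that the parallel fraction is f = 0.90; both the function name `amdahl_speedup` and that value are hypothetical and should be replaced by the student's own model and estimate.

```shell
#!/bin/bash
# Amdahl's-law projection: S(p) = 1 / ((1 - f) + f / p).
# f = 0.90 below is a hypothetical parallel fraction, not a measured value.

amdahl_speedup() {
    # $1: parallel fraction f (between 0 and 1), $2: thread count p
    awk -v f="$1" -v p="$2" 'BEGIN { printf "%.2f\n", 1 / ((1 - f) + f / p) }'
}

# Thread counts requested in question 3.2.
for p in 1 2 4 16 32; do
    printf '%2d threads -> projected speedup %s\n' "$p" "$(amdahl_speedup 0.90 "$p")"
done
```

Under these assumptions the projection flattens quickly (6.40 at 16 threads), which is the kind of observation the discussion of system components can build on.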
/share/apps/aca/benchmarks/NPB3.2/
Inside this folder the student can find the different implementations of the NPB applications. The
NPB3.2-MPI folder contains the compiled MPI version (see the bin folder). The bin folder contains
a set of executables that correspond to the different applications, with different levels of
parallelism and different working set sizes.
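#!/bin/bash
# SGE job: MPI build of the CG benchmark, class A, compiled for 2 processes.
# Run from the submission directory (-cwd) under bash (-S).
#$ -cwd
#$ -S /bin/bash
# Job name, and stdout/stderr files tagged with the job ID.
#$ -N hello
#$ -o hello.out.$JOB_ID
#$ -e hello.err.$JOB_ID
# Request 8 slots in the "orte" parallel environment.
#$ -pe orte 8
# Launch 2 MPI processes, matching the process count in the executable name.
mpirun -np 2 /share/apps/aca/benchmarks/NPB3.2/NPB3.2-MPI/bin/cg.A.2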
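#!/bin/bash
# SGE job: OpenMP build of the BT benchmark, class S.
#$ -cwd
#$ -S /bin/bash
#$ -N omp1
#$ -o omp1.out.$JOB_ID
#$ -e omp1.err.$JOB_ID
# Request 4 slots in the "openmp" parallel environment.
#$ -pe openmp 4
# SGE sets NSLOTS to the number of granted slots; use one thread per slot.
export OMP_NUM_THREADS=$NSLOTS
/share/apps/aca/benchmarks/NPB3.2/NPB3.2-OMP/bin/bt.S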
Questions
1.1 Select one of the NPB applications. Briefly describe the application: what it solves, what its
input sizes are (search online to see what the classes W, S, etc. translate to for the specific
application), and what levels of parallelism it accepts.
1.2 Propose a methodology to evaluate and characterize the performance of an OpenMP
application for different levels of parallelism. For example, the simplest approach would be a
methodology that describes the performance of the application based on its runtime.
However, Linux systems provide other ways to get metrics from the system (for example, see
http://linux.die.net/man/1/time).
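As one hedged illustration of collecting more than the wall-clock time, the sketch below wraps a command in GNU time and appends one CSV row per run. It assumes GNU time is installed at /usr/bin/time (the bash `time` keyword does not accept `-f`); the function names `measure` and `busy_cores`, the `RESULTS` variable and the CSV layout are all hypothetical choices, not part of the assignment.

```shell
#!/bin/bash
# Assumes GNU time at /usr/bin/time; names and CSV layout are illustrative.
RESULTS=${RESULTS:-metrics.csv}

measure() {
    # measure LABEL CMD [ARGS...]
    # Appends "label,elapsed_s,user_s,sys_s,max_rss_kb" to $RESULTS.
    # %e = elapsed seconds, %U = user CPU s, %S = system CPU s, %M = max RSS (KB)
    local label=$1
    shift
    /usr/bin/time -a -o "$RESULTS" -f "$label,%e,%U,%S,%M" "$@" > /dev/null
}

busy_cores() {
    # busy_cores ELAPSED USER SYS
    # (user + sys) / elapsed approximates the average number of busy cores.
    awk -v e="$1" -v u="$2" -v s="$3" 'BEGIN { printf "%.2f\n", (u + s) / e }'
}
```

Inside a job script, a call such as `measure bt.S.t4 /share/apps/aca/benchmarks/NPB3.2/NPB3.2-OMP/bin/bt.S` would record one run, and `busy_cores` can then be applied to the recorded columns to see how far the run is from using all requested threads.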
1.3 Using the previous methodology, study the impact of the level of parallelism on the
OpenMP implementation. The student needs to define the experiment parameters: working
set(s), level of parallelism, etc. The total number of runs must not exceed 64. Carefully select
the working set, keeping in mind that some working sets may have very long runtimes.
The student must provide the scripts used to launch the experiments at the end of the
document. The study must present its results as plots; tables are not accepted.
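A sweep over the experiment parameters could be launched along the following lines. This is only a sketch under stated assumptions: the class list, thread counts and repetition count are hypothetical, and `omp_job.sh` is a hypothetical job script that takes the class as its first argument and runs the matching executable. The script checks the planned number of runs against the 64-run limit and, by default, only prints the `qsub` commands.

```shell
#!/bin/bash
# Hypothetical experiment plan: adjust to the chosen application and classes.
CLASSES="S W A"
THREADS="1 2 4 8"
REPS=3
MAX_RUNS=64

count_runs() {
    # number of classes x number of thread counts x repetitions
    local nc nt
    nc=$(echo $CLASSES | wc -w)
    nt=$(echo $THREADS | wc -w)
    echo $(( nc * nt * REPS ))
}

submit_all() {
    # Prints one qsub command per run; set DRY_RUN=0 to actually submit.
    # -N and -pe given on the command line override the script's directives.
    local c t r cmd
    for c in $CLASSES; do
        for t in $THREADS; do
            for r in $(seq 1 "$REPS"); do
                cmd="qsub -N bt.$c.t$t.r$r -pe openmp $t omp_job.sh $c"
                if [ "${DRY_RUN:-1}" = 1 ]; then echo "$cmd"; else $cmd; fi
            done
        done
    done
}

total=$(count_runs)
if [ "$total" -gt "$MAX_RUNS" ]; then
    echo "planned $total runs, which exceeds the limit of $MAX_RUNS" >&2
else
    submit_all
fi
```

Repeating each configuration a few times makes it possible to plot the variation between runs, at the cost of covering fewer configurations within the run budget.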
2. OpenMP vs MPI
The goal of this second part is to explore and understand potential ways to compare two
different implementations of the same application. In this case, the student will compare the
OpenMP and MPI implementations for the application selected in 1.1.
Questions
2.1 Extend the methodology proposed in 1.2 to compare and characterize two implementations
of the same algorithm (MPI and OpenMP based).
2.2 As in section 1.3, use the methodology described in 2.1 to perform a study of the
application selected in 1.1. As in question 1, the total number of runs must not exceed 64
(results obtained in 1.3 can be reused). Carefully select the working set, keeping in mind
that some working sets may have very long runtimes. The student must provide the scripts
used to launch the experiments at the end of the document. The study must present its
results as plots; tables are not accepted.
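To put both implementations on the same axes, the raw runtimes can be reduced to speedup and efficiency before plotting. The sketch below assumes a hypothetical CSV layout, `impl,workers,elapsed_s` (for example `omp` or `mpi`, the number of threads or processes, and the measured runtime), with each implementation's 1-worker row listed before its other rows. The function name `summarize` is likewise illustrative.

```shell
#!/bin/bash
# summarize FILE
# Input rows:  impl,workers,elapsed_s   (hypothetical layout)
# Output rows: impl,workers,speedup,efficiency
# Speedup is relative to the same implementation's 1-worker run.
summarize() {
    awk -F, '
        $2 == 1 { base[$1] = $3 }          # remember the 1-worker baseline
        ($1 in base) {
            s = base[$1] / $3              # speedup = T(1) / T(n)
            printf "%s,%d,%.2f,%.2f\n", $1, $2, s, s / $2
        }
    ' "$1"
}
```

Running `summarize results.csv > summary.csv` produces a file that a plotting tool can read directly; comparing each implementation against its own baseline keeps differences in single-worker performance separate from differences in scaling.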
Evaluation criteria
Criteria that will be used in the evaluation: proper use of the MPI or OpenMP models, brevity
and clarity of results, experiment setup, and discussion and analysis.
Format
One PDF document containing all the answers for the selected option:
- All code and scripts developed must be added as an annex section at the end
of the document (no length limit).
- Provide one tar archive with the developed code (if any):
$ tar cvf tot.tar fitxer1 fitxer2 ...
Deadline