net/publication/2281412
All content following this page was uploaded by Julian Clark Cummings on 20 September 2012.
1 Introduction
This paper concerns the use of C++ and the POOMA Framework [1] to model high-intensity particle
accelerators. This work is part of the Computational Accelerator Physics Grand Challenge, sponsored by the U.S.
Department of Energy. Another paper in this conference, "The DOE Grand Challenge in Computational
Accelerator Physics" [2], describes the goals and progress to date of the project.
This Grand Challenge project requires implementation of well-known numerical methods in electromagnetic
particle-in-cell simulation on the latest parallel computer architectures, development of alternative
computational approaches, and smooth interaction of multiple physics packages. For these reasons, we have
chosen to use an object-oriented design for our linear accelerator code. By structuring our model in terms of
abstractions ("objects" and "classes") relevant to accelerator physics, we can develop code that is relatively
easy to understand, maintain, and extend.
5 Performance Issues
Despite all of these benefits, the use of C++ in general and of POOMA in particular would make no sense if
the performance of the resulting code were substantially worse than the performance of equivalent
custom-coded Fortran. Until very recently, numerical codes written in C++ did not perform well in comparison to
equivalent Fortran, but the situation is rapidly changing [3]. One reason for the poor performance of C++
has been the absence of good optimizing compilers. The KCC compiler from Kuck and Associates, Inc. (KAI)
has filled that gap well, and other good optimizing compilers that are fully compliant with the ANSI C++
standard are on the horizon.
Another cause of poor performance is inherent in the C++ language. Consider the following code example:
    class Matrix { /* ... */ };
    Matrix A, B, C, D;
    /* ... */
    A = B + C + D;                                                  (1)
Suppose that class Matrix overloads the operators "+" and "=" to perform elementwise addition and
assignment. The final line will be evaluated as a series of binary operations. These will involve temporary
Matrix objects that store intermediate results: tmp1 = B + C; tmp2 = tmp1 + D; A = tmp2. Creation and
destruction of temporary objects can severely degrade performance, especially if each object contains a lot of
data.
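The cost described above can be made visible with a minimal sketch. The Matrix class below is hypothetical (it is not a POOMA class), and the names allocations and countNaiveAllocations are assumptions introduced for illustration. It counts how many times element storage is allocated while evaluating A = B + C + D with a conventional overloaded operator. The sketch assumes a C++11 compiler, so the final assignment moves the last temporary rather than copying it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical elementwise "Matrix" (not a POOMA class) that counts
// every allocation of element storage.
class Matrix {
public:
    static int allocations;  // number of storage allocations so far

    explicit Matrix(std::size_t n, double value = 0.0) : data_(n, value) {
        ++allocations;
    }
    Matrix(const Matrix& other) : data_(other.data_) { ++allocations; }
    Matrix(Matrix&&) noexcept = default;  // takes over storage, no allocation
    Matrix& operator=(const Matrix&) = default;
    Matrix& operator=(Matrix&&) noexcept = default;

    std::size_t size() const { return data_.size(); }
    double operator[](std::size_t i) const { return data_[i]; }
    double& operator[](std::size_t i) { return data_[i]; }

private:
    std::vector<double> data_;
};

int Matrix::allocations = 0;

// Conventional binary operator: every call builds a full-size result.
Matrix operator+(const Matrix& lhs, const Matrix& rhs) {
    assert(lhs.size() == rhs.size());
    Matrix result(lhs.size());
    for (std::size_t i = 0; i < lhs.size(); ++i) result[i] = lhs[i] + rhs[i];
    return result;
}

// Returns the number of allocations made by the statement A = B + C + D.
int countNaiveAllocations(std::size_t n) {
    Matrix A(n), B(n, 1.0), C(n, 2.0), D(n, 3.0);
    const int before = Matrix::allocations;
    A = B + C + D;  // tmp1 = B + C; tmp2 = tmp1 + D; A = tmp2
    return Matrix::allocations - before;
}
```

In this sketch the single statement allocates two full-size intermediates, one per binary "+", and traverses the data in two separate loops.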
This problem has been recognized for some time, and various attempts have been made to solve it. The
best solution to date is "expression templates" [4], a flexible and general device that avoids the creation of
temporaries. POOMA relies heavily on expression templates to optimize data-parallel expressions involving
particles and fields. POOMA applications thereby retain the benefits of overloaded operators with no loss in
performance.
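The idea behind expression templates can be illustrated with a minimal sketch. This is not POOMA's implementation: the names Expr, Sum, Matrix, allocations, and countFusedAllocations are hypothetical, and only elementwise addition is covered. Here operator+ returns a lightweight node that records its operands by reference, and the assignment operator evaluates the whole expression in a single loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CRTP base class that tags a type as an elementwise expression.
template <typename E>
struct Expr {
    const E& self() const { return static_cast<const E&>(*this); }
};

// Node for "lhs + rhs": stores references only and computes each
// element on demand, so no intermediate Matrix is built.
template <typename L, typename R>
class Sum : public Expr<Sum<L, R>> {
public:
    Sum(const L& lhs, const R& rhs) : lhs_(lhs), rhs_(rhs) {
        assert(lhs.size() == rhs.size());
    }
    std::size_t size() const { return lhs_.size(); }
    double operator[](std::size_t i) const { return lhs_[i] + rhs_[i]; }

private:
    const L& lhs_;
    const R& rhs_;
};

// Hypothetical Matrix (not a POOMA class) that counts allocations.
class Matrix : public Expr<Matrix> {
public:
    static int allocations;

    explicit Matrix(std::size_t n, double value = 0.0) : data_(n, value) {
        ++allocations;
    }

    // Assignment from any expression: a single fused loop.
    template <typename E>
    Matrix& operator=(const Expr<E>& expr) {
        const E& e = expr.self();
        assert(e.size() == data_.size());
        for (std::size_t i = 0; i < data_.size(); ++i) data_[i] = e[i];
        return *this;
    }

    std::size_t size() const { return data_.size(); }
    double operator[](std::size_t i) const { return data_[i]; }

private:
    std::vector<double> data_;
};

int Matrix::allocations = 0;

// operator+ builds a Sum node instead of computing a result.
template <typename L, typename R>
Sum<L, R> operator+(const Expr<L>& lhs, const Expr<R>& rhs) {
    return Sum<L, R>(lhs.self(), rhs.self());
}

// Returns the number of allocations made by the statement A = B + C + D.
int countFusedAllocations(std::size_t n) {
    Matrix A(n), B(n, 1.0), C(n, 2.0), D(n, 3.0);
    const int before = Matrix::allocations;
    A = B + C + D;  // one loop: A[i] = (B[i] + C[i]) + D[i]
    return Matrix::allocations - before;
}
```

The Sum nodes hold references to temporaries, which is safe here only because they are consumed within the same full expression; storing such an expression in a variable for later evaluation would need a different design.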
6 Project Status
Our goal is to produce a "dimension-independent" linear accelerator model capable of simulating beam
behavior for a variety of beamline elements. We will use classes that are parameterized by dimension using C++
templates. This means that a single code base will support both 2D and 3D models. (Other dimensionalities
are formally possible but have little practical use.) POOMA provides classes templated on dimension, so our
accelerator code can use this feature and derive templated classes from POOMA classes as needed.
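A minimal sketch of dimension-parameterized code follows. The names Particle, BeamParticle, drift, and demoDrift are hypothetical stand-ins chosen for illustration; they are not POOMA classes, and the actual POOMA interfaces may differ. The point is only that one templated routine serves both the 2D and the 3D case, and that a derived templated class can reuse it.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical base particle type parameterized by spatial dimension.
template <std::size_t Dim>
struct Particle {
    std::array<double, Dim> position{};
    std::array<double, Dim> velocity{};
};

// Hypothetical derived class that adds accelerator-specific data.
template <std::size_t Dim>
struct BeamParticle : Particle<Dim> {
    double charge = 1.0;
};

// Dimension-independent drift: x <- x + v * dt in every coordinate.
template <std::size_t Dim>
void drift(Particle<Dim>& p, double dt) {
    for (std::size_t d = 0; d < Dim; ++d) p.position[d] += p.velocity[d] * dt;
}

// The same drift routine is instantiated for 2D and 3D particles.
void demoDrift() {
    BeamParticle<2> p2;
    p2.velocity = {1.0, 2.0};
    drift(p2, 0.5);
    assert(p2.position[0] == 0.5 && p2.position[1] == 1.0);

    BeamParticle<3> p3;
    p3.velocity = {1.0, 2.0, 4.0};
    drift(p3, 0.25);
    assert(p3.position[2] == 1.0);
}
```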
We have a 2D prototype code implemented in C++ and POOMA. It supports a K-V or Gaussian initial
beam distribution in the x-y plane and integrates the beam particles through a series of drift and quadrupole
elements. The integration is performed using a split-operator approach. The beam's self-consistent
electrostatic potential is computed by scattering charge density into a Field, performing an FFT, applying a
Green's function in Fourier space, and inverting the FFT. POOMA provides simple functions for scattering
the particle charge density, computing the gradient of the electrostatic potential, and gathering the resulting
electric field at the particle positions.
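The transform, multiply, and inverse-transform sequence of the field solve can be sketched in a simplified setting. The example below is an illustration under stated assumptions, not the prototype's solver: it is one-dimensional rather than 2D, it assumes periodic boundaries with the kernel 1/k^2 (units with epsilon_0 = 1) in place of whatever Green's function the code uses, and it substitutes a naive O(N^2) discrete Fourier transform for an FFT. The names dft and solvePoissonPeriodic are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// Naive O(N^2) discrete Fourier transform (stand-in for an FFT).
// sign = -1 gives the forward transform, sign = +1 the unnormalized inverse.
std::vector<cplx> dft(const std::vector<cplx>& in, int sign) {
    const std::size_t n = in.size();
    const double twoPi = 2.0 * std::acos(-1.0);
    std::vector<cplx> out(n);
    for (std::size_t k = 0; k < n; ++k) {
        cplx sum(0.0, 0.0);
        for (std::size_t j = 0; j < n; ++j) {
            const double angle =
                sign * twoPi * static_cast<double>(k) * static_cast<double>(j) /
                static_cast<double>(n);
            sum += in[j] * cplx(std::cos(angle), std::sin(angle));
        }
        out[k] = sum;
    }
    return out;
}

// Solves -phi'' = rho on a periodic 1D grid spanning `length`,
// returning the zero-mean potential at the grid points.
std::vector<double> solvePoissonPeriodic(const std::vector<double>& rho,
                                         double length) {
    const std::size_t n = rho.size();
    assert(n > 0 && length > 0.0);
    const double twoPi = 2.0 * std::acos(-1.0);

    // Step 1: forward transform of the charge density.
    std::vector<cplx> hat = dft(std::vector<cplx>(rho.begin(), rho.end()), -1);

    // Step 2: multiply by the Fourier-space kernel 1/k^2.
    for (std::size_t k = 0; k < n; ++k) {
        // Map the index to a signed mode number in (-n/2, n/2].
        const double mode = (k <= n / 2)
                                ? static_cast<double>(k)
                                : static_cast<double>(k) - static_cast<double>(n);
        const double wave = twoPi * mode / length;
        hat[k] = (k == 0) ? cplx(0.0, 0.0) : hat[k] / (wave * wave);
    }

    // Step 3: inverse transform, with the 1/n normalization applied here.
    const std::vector<cplx> phiC = dft(hat, +1);
    std::vector<double> phi(n);
    for (std::size_t j = 0; j < n; ++j) {
        phi[j] = phiC[j].real() / static_cast<double>(n);
    }
    return phi;
}
```

For a density of cos(x) on a grid of length 2*pi, the only nonzero modes have k^2 = 1, so the returned potential should reproduce cos(x) to rounding error, which gives a quick consistency check.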
Our results are in agreement with results of Ryne and Habib's 2D HPF code. The POOMA code is
instrumented to send particle and field data to ACLVIS, a Los Alamos visualization package, during a code
run. This provides real-time data visualization capabilities that enable users to quickly spot problems in
code behavior and to study the effects of various beamline elements. Furthermore, POOMA provides a
simple mechanism for profiling application codes with the Tau profiling tools [5]. Simple macros in the
accelerator code generate timing data. Tau uses the data to chart the CPU time spent in each instrumented
routine by each processor. We are using these profiling tools to analyze the performance of our code, and to
compare it with the performance of the HPF code. Our most recent tests, run on an Origin 2000 symmetric
multiprocessor computer, indicate that the POOMA code is comparable to the HPF code. More studies need
to be done before specific performance data can be provided. Our future work includes such studies and
recasting the current code into a generic templated form.
References
[1] J. V. Reynders, V. W. John, P. J. Hinker, J. C. Cummings, S. R. Atlas, S. Banerjee, W. F. Humphrey, S.
R. Karmesin, K. Keahey, M. Srikant, M. Tholburn, in Parallel Programming Using C++, G. V. Wilson
and P. Lu, eds. (MIT Press, Cambridge, 1996).
[2] R. D. Ryne, S. Habib, K. Ko, Z. Li, W. Mi, C.-K. Ng, J. Qiang, M. Saparov, V. Srinivas, Y. Sun, X.
Zhan, Proceedings ICNSP'98.
[3] T. Veldhuizen, http://monet.uwaterloo.ca/~tveldhui/DrDobbs2/drdobbs2.html
[4] T. Veldhuizen, C++ Report 7:5, 26 (June, 1995). Reprinted in C++ Gems, Stanley B. Lippman, ed.
(Sigs Books, NY, 1996).
[5] Tau. http://www.acl.lanl.gov/tau/