Escolar Documentos
Profissional Documentos
Cultura Documentos
BATTERIES INCLUDED
Python offers basic facilities for interactive work and a comprehensive library on top of
which more sophisticated systems can be built. The IPython project provides an enhanced
interactive environment that includes, among other features, support for data visualization
and facilities for distributed and parallel computation.
T
he backbone of scientic computing is All these systems offer an interactive command
mostly a collection of high-perfor- line in which code can be run immediately, without
mance code written in Fortran, C, and having to go through the traditional edit/com-
C++ that typically runs in batch mode pile/execute cycle. This exible style matches well
on large systems, clusters, and supercomputers. the spirit of computing in a scientific context, in
However, over the past decade, high-level environ- which determining what computations must be
ments that integrate easy-to-use interpreted lan- performed next often requires signicant work. An
guages, comprehensive numerical libraries, and interactive environment lets scientists look at data,
visualization facilities have become extremely popu- test new ideas, combine algorithmic approaches,
lar in this eld. As hardware becomes faster, the crit- and evaluate their outcome directly. This process
ical bottleneck in scientic computing isnt always the might lead to a nal result, or it might clarify how
computers processing time; the scientists time is also they need to build a more static, large-scale pro-
a consideration. For this reason, systems that allow duction code.
rapid algorithmic exploration, data analysis, and vi- As this article shows, Python (www.python.org)
sualization have become a staple of daily scientic is an excellent tool for such a workflow.1 The
work. The Interactive Data Language (IDL) and IPython project (http://ipython.scipy.org) aims to
Matlab (for numerical work), and Mathematica and not only provide a greatly enhanced Python shell
Maple (for work that includes symbolic manipula- but also facilities for interactive distributed and par-
tion) are well-known commercial environments of allel computing, as well as a comprehensive set of
this kind. GNU Data Language, Octave, Maxima tools for building special-purpose interactive envi-
and Sage provide their open source counterparts. ronments for scientic computing.
MAY/JUNE 2007 23
In [36]: ls by typing the objects name and one or two ?
tt0.dat tt1.DAT tt2.dat tt3.DAT (two for extra details). These features are useful
# var = !cmd captures a system command into a when developing code, exploring a problem, or us-
Python variable: ing an unfamiliar library because direct experi-
In [37]: les = !ls mentation with the system can help produce
== working code that the user can then copy into an
editor as part of a larger program.
[tt0.dat, tt1.DAT, tt2.dat, tt3.DAT]
Figure 5 shows the information returned by
# Rename the les, using uniform case and 3-digit
IPython after querying an object called
numbers:
DeepThought from a module called universe. In
In [38]: for i, name in enumerate(les): line 2, weve hit the Tab key, so IPython com-
....: newname = time%03d.dat % i pletes a list of all the attributes defined for
....: !mv $name $newname DeepThought. Then, for the sequence
....: DeepThought??, IPython tries to find as much
In [39]: ls information about the object as it can, including
time000.dat time001.dat time002.dat time003.dat its entire source code.
MAY/JUNE 2007 25
PROJECTS USING IPYTHON nrao.edu) uses IPython in its interactive shell.
The Ganga system (http://ganga.web.cern.ch/ganga/),
The PyMAD project at the neutron scattering fa- dow appears. IPython addresses this problem by of-
cility of the Institut Laue Langevin in Grenoble, fering special startup flags that let users choose
France, uses this feature for interactive control of which toolkit they want to control interactively in
experimental devices. The IPython console runs a nonblocking manner.
on a system that connects to the neutron spec- This feature is necessary for one of scientists
trometer over a network, but users interact with most common tasks: interactive data plotting and
the remote system as if it were local, and Tab com- visualization. Many traditional plotting libraries
pletion operates over the network to fetch infor- and programs have Python bindings or process-
mation about remote objects for display in the based interfaces, but most have various limitations
users local console. for interactive use. The matplotlib project (http://
matplotlib.sourceforge.net) is a sophisticated plot-
Graphical Interface Toolkits and Plotting ting library capable of producing publication-qual-
Python provides excellent support for GUI toolk- ity graphics in a variety of formats, and with full
its. It ships by default with bindings for Tk, and LaTeX support.3 Matplotlib renders its plots to
third-party bindings are available for GTK, several back ends, the components responsible for
WxWidgets, Qt, and Cocoa (under Apple OS X). generating the actual gure. Some back ends (such
You can use essentially every major toolkit to as for PostScript, PDF, and Scalable Vector Graph-
write graphical applications from Python. Al- ics) are purely disk-based and meant to generate
though few scientists look forward to GUI design, files; others are meant for display in a window.
they increasingly have to write small- to medium- Matplotlib supports all these toolkits, letting users
sized graphical applications to interface with sci- choose which to use via a conguration le setting.
entific code, drive instruments, or collect data. (The Scientic Programming department on p. 90
Python lets scientists choose the toolkit that best explores matplotlib in more detail.)
fits their needs. IPython and matplotlib developers have collab-
However, graphical applications are notoriously orated to enable automatic coordination between
difficult to test and control from an interactive the two systems. If given the special pylab startup
command line. In the default Python shell, if a user flag, for example, IPython detects the users mat-
instantiates a Qt application, for example, the com- plotlib settings and automatically congures itself
mand line stops responding as soon as the Qt win- to enable nonblocking interactive plotting. This
1
cos ( x sin ) d .
0
J0 ( x ) =
Figure 8. IPython using the pylab flag to enable interactive use of
The last line shows matplotlibs capabilities for ar- the matplotlib library. Plot windows can open without blocking the
ray plotting with a simple 32 32 set of random interactive terminal, using any of the GUI toolkits supported by
numbers. matplotlib (Tk, WxWidgets, GTK, or Qt).
Although matplotlibs main focus is 2D plotting,
several packages exist for 3D plotting and visual-
ization in Python. The Visualization Toolkit
(VTK) is a mature and sophisticated visualization
library written in C++ that ships with Python bind-
ings. Recently, developers have introduced a new
set of bindings called Traits-enabled VTK
(TVTK),5 which provides seamless integration
with the NumPy array objects and libraries as well
as a higher-level API for application development.
Figure 9 shows how to use TVTK interactively
from within an IPython session. Because matplotlib
has WXPython support, you can use both TVTK
and matplotlib concurrently from within IPython.
Interactive Parallel
and Distributed Computing
Although interactive computing environments Figure 9. An IPython session showing a 3D plot done with TVTK.
can be extremely productive, theyve traditionally The GUI toolkit used is WXPython, so IPython is started with the
had one weakness: they havent been able to take -wthread flag.
advantage of parallel computing hardware such as
multicore CPUs, clusters, and supercomputers.
Thus, although scientists often begin projects us- computational nodes, is ParGAP (www.ccs.neu.
ing an interactive computing environment, at edu/home/gene/pargap.html), a parallel-enabled
some point they switch to using languages such as version of the open source package Groups, Al-
C, C++, and Fortran when performance becomes gorithms, and Programming (GAP) for computa-
critical and the projects call for parallelization. In tional group theory.
recent years, several vendors have begun offering In the Python world, several projects also exist
distributed computing capabilities for the major that seek to add support for distributed computing.
commercial technical computing systems (see the The Python-community-maintained wiki keeps a
Distributed Computing Toolkits for Commer- list of such efforts (http://wiki.python.org/moin/
cial Systems sidebar for some examples). These ParallelProcessing). Of particular interest to scien-
provide various levels of integration between the tific users, Python has been used in parallel com-
computational back ends and interactive front puting contexts both with the message-passing
ends. An early precursor to these systems, whose interface (MPI, http://sourceforge.net/projects/
model was one of full interactive access to the pympi; http://mpi4py.scipy.org)6,7 and the Bulk
MAY/JUNE 2007 27
grained parallelism, and a lower-level interface that
DISTRIBUTED COMPUTING gives users direct interactive access to a set of run-
ning engines. This second interface is useful for
TOOLKITS FOR COMMERCIAL SYSTEMS both medium- and fine-grained parallelism that
uses MPI for communications between engines.
Most importantly, advanced users and developers
S ome vendors offer distributed computing capabilities for the ma-
jor commercial technical computing systems: can use these components to build customized in-
teractive parallel/distributed applications in
Matlab Distributed Computing Toolbox, www.mathworks.com/ Python. End users work with the system interac-
products/distribtb. tively by connecting to a controller using a Web
FastDL, www.txcorp.com/products/FastDL. browser, an IPython- or Python-based front end,
Mathematica Parallel Computing Toolkit, http://documents. or a traditional GUI.
wolfram.com/applications/parallel. Specic constraints that are relevant in scientic
Mathematica Personal Grid Edition, www.wolfram.com/products/ computing guided this design:
personalgrid.
Grid Mathematica, www.wolfram.com/products/gridmathematica. It should support many different styles of paral-
HPC-Grid, www.maplesoft.com/products/toolboxes/HPCgrid. lelism, such as message passing using MPI, task
Star-P, www.interactivesupercomputing.com. farming, and shared memory.
It should run on everything from multicore lap-
Other projects seek to support distributed computing using Python tops to supercomputers.
(see http://wiki.python.org/moin/ParallelProcessing). It should integrate well with existing parallel
code and libraries written using C, C++, or For-
tran, and MPI for communications.
All network communications, events, and error
Synchronous Parallel8 models. handling should be fully asynchronous and
Building on Pythons and IPythons strengths as nonblocking.
an interactive computing system, weve begun a It should support all of IPythons existing fea-
signicant effort to add interactive parallel and dis- tures in parallel contexts.
tributed capabilities to IPython. More specically,
our goal is to enable users to develop, test, debug, The architectural requirements for running
execute, and monitor parallel and distributed ap- IPython in a distributed manner are similar to
plications interactively using IPython. To make this those required for decoupling a user front end from
possible, weve refactored IPython to support these a computational back end. Therefore, this restruc-
new features. Weve deliberately built a system turing effort also lets IPython offer new types of
whose basic components make no specic assump- user interfaces for remote and distributed work,
tions about communications models, data distrib- such as a Web browser-based IPython GUI and
ution, or network protocols. The redesigned collaborative interfaces that enable multiple remote
IPython consists of users to simultaneously access and share running
computational resources and data.
the IPython core, which exposes IPythons core
functionality, abstracted as a Python library
T
rather than as a terminal-based application; he first public release of these new
the IPython engine, which exposes the IPython components was in late 2006. While it
cores functionality to other processes (either lo- should still be considered under heavy
cal to the same machine or remote) over a stan- development and subject to changes,
dard network connection; and weve already been contacted by several projects
the IPython controller, which is a process that ex- that have begun using it as a tool in production
poses a clean asynchronous interface for work- codes. Details about this work are available on the
ing with a set of IPython engines. IPython Web site.
MAY/JUNE 2007 29