
An Architecture for Interactive and Collaborative Parallel Computing

Brian E. Granger, Santa Clara University
Fernando Perez, University of Colorado
Andrew Giustini, Santa Clara University

The Challenge
Hardware
Cheap, fast and widely available.
Our free lunch is over -> Single CPUs aren’t getting much faster.
Transition to multi-CPU and multi-core CPU based machines.
Clusters and grids.
Software
Software development is labor intensive.
Development of parallel codes is very labor intensive.
Parallel programming tools and paradigms have not evolved much in the last 2 decades.

The bottleneck in scientific computation is quickly becoming software development, NOT hardware.
Scientific Software Development
Complex algorithms; labor intensive.
Lots of legacy code still used (BLAS, LAPACK, your own) -> Cross language
Need for high performance -> Parallelism
The code is always changing -> Compile/Exec/Debug
Large amounts of data -> High level languages
Scientists love MATLAB, IDL, Mathematica -> Interactive
Collaborative development/execution -> Collaboration


Interactive and
Collaborative
Parallel Computing
Goal: Create a lightweight architecture that
enables parallel programs to be developed,
monitored, executed and debugged
interactively and collaboratively.
Why Python?
1. It is open source and accessible to everyone.
2. Can be used interactively (like MATLAB, Mathematica, IDL, etc.)
3. Simple, expressive syntax that is readable by human beings.
4. Powerful enough to use in large, complex applications.
5. Supports functional, object-oriented, generic and meta programming.
6. Extremely robust garbage collection.
7. Powerful built-in data-types and libraries.
8. Excellent tools for wrapping Fortran/C/C++/ObjC code (SWIG, F2PY,
Pyrex, Boost, Weave, PyObjC).
9. High quality external libraries for visualization (MayaVi), plotting
(matplotlib), numerical/scientific computing (NumPy/SciPy),
networking (Twisted), etc.
10. Python bindings for major GUI toolkits (wx, Tk, GTK, Qt).
11. Cross platform.
IPython: An enhanced
Interactive Python Shell
IPython is an enhanced interactive Python shell
It is the de facto shell for scientific computing in
Python.
Already comes with every major Linux distribution.
[Screenshot: an IPython session]
Capabilities:
Extensible syntax
GUI integration (wx, Qt, GTK, etc.)
Seamless system shell access
Object/namespace introspection
Command history/recall
Session logging
Embeddable
http://ipython.scipy.org
Message Passing
Interface (MPI)
Pros:
Robust, optimized, standardized, portable, common
Existing parallel libraries (FFTW, BLACS, ScaLAPACK, ...)
Runs over Ethernet, Infiniband, Myrinet.
Cons:
Trivial things are not trivial -> lots of boilerplate code (see the sketch after this list).
Orthogonal to how scientists think and work.
Load balancing and fault tolerance are difficult to implement (even
for simple cases).
Emphasis on compiled languages (C/C++/Fortran).
Non-interactive and non-collaborative.
Difficult to integrate into other computing environments (GUIs,
visualization and plotting tools, Web based tools, etc.).
Labor intensive compile/execute/debug cycles.
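
To make the boilerplate point concrete, here is a minimal sketch (not from the talk) of a trivial task written against the raw MPI model, using the mpi4py bindings; the library choice, file name and variable names are illustrative assumptions. Even summing a list needs explicit rank logic, scatter/reduce calls, and a separate mpiexec launch step:

# Hypothetical mpi4py sketch: summing a list the MPI way.
# Launched separately, e.g.:  mpiexec -n 4 python sum_mpi.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

if rank == 0:
    data = list(range(100))
    chunks = [data[i::size] for i in range(size)]   # one chunk per process
else:
    chunks = None

local = comm.scatter(chunks, root=0)                # each rank gets its piece
partial = sum(local)                                # local work
total = comm.reduce(partial, op=MPI.SUM, root=0)    # combine on rank 0

if rank == 0:
    print "total =", total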
Architectural Overview
[Diagram: several IPython sessions, a Web interface and an SSH interface all connect to a pool of kernels, exchanging Python objects and commands.]
Kernel = network-aware Python instance


The Kernel
Python instance that listens on a network port
Multi-threaded or multi-process with an execution queue
Uses Twisted -> asynchronous, non-blocking sockets
Multi-protocol aware
Custom control protocol
SSH, HTTP, . . .
Can be started at any time using SSH, Xgrid, PBS,
GridEngine, Condor, . . .
Built-in GUI Integration (wx, Qt, Tk, GTK, Cocoa, . . .)
Pass Python objects, commands, modules, I/O, . . .
Auto-discovery using Bonjour/ZeroConf
The User Interface
Lightweight object-oriented user interface in regular Python
Additional syntax in IPython (enhanced Interactive Python)
Medium level of abstraction
Higher level than MPI
Doesn’t assume a particular high-level model
Automatic synchronization of kernels (no barrier() calls)
Non-blocking and blocking modes
Clean handling of remote I/O
User's process can be transient; kernels are persistent
Security
Needed if the system is used on an open network.
Start kernels as user "nobody".
Firewall all but a few Gateway Kernels.
Gateway Kernels can have SSL enabled for encrypted communications.
Authenticate users.
Twisted has SSL/Authentication capabilities built-in.


Interactive Usage
In [2]: ic = InteractiveCluster()    # Create an object
In [3]: ic.start(16)                 # Start 16 kernels

...do interactive parallel calculations...

In [56]: ic.start(16)                # Start 16 more kernels

...do more calculations...

In [97]: ic.save("mycluster")        # Save the cluster to a file

...turn off laptop, go home, eat dinner, resume...

In [2]: ic = InteractiveCluster()
In [3]: ic.load("mycluster")         # Reload cluster from a file

Multiple users can connect simultaneously
Kernels started dynamically at any time
Working with Cluster Objects
In [2]: ic1 = InteractiveCluster()
In [3]: ic1.start(16)                    # Start 16 kernels

...do interactive parallel calculations with ic1...

In [56]: ic2 = InteractiveCluster()
In [57]: ic2.load("myfriendscluster")    # Reload cluster from a file

...work with both simultaneously...

In [63]: ic1['a'] = ic2[0]['b']          # Move data between clusters
In [64]: ic1.activate()                  # %px and %autopx active for ic1
In [80]: ic2.activate()                  # %px and %autopx active for ic2

In [90]: ic3 = ic1 + ic2                 # Create a supercluster
In [91]: ic3.activate()                  # %px and %autopx active for ic3

In [101]: ic3.remove(45)                 # Removes a dead kernel
In [150]: ic3.reset()                    # Resets a running kernel
In [151]: ic3.kill()                     # Kill a running kernel
Sending Python Commands
execute(): execute command on kernels
In [4]: ic.execute("a = 5")                  # Execute a = 5 on all

In [5]: ic.execute("print a", block=True)    # Blocking mode
[129.210.112.39] Out[27]: 5
...
[129.210.112.35] Out[27]: 5

Execute on a subset of kernels:

In [6]: ic.execute("a = 5", kernels=[0,2,4,6])    # On kernels 0,2,4,6

...Can also select a kernel using a list/array syntax...

In [7]: ic[0].execute("q = 10")              # q = 10 on kernel 0

The kernel will automatically queue pending commands
Parallel Magics
It is annoying to type ic.execute(...)
Use IPython’s magic command system. Extended syntax!
%cmd args --> magic_cmd(args)
In [11]: ic.activate()          # Activate %px and %autopx
In [12]: %px a = 5              # ic.execute("a = 5")
In [13]: %px b = 10             # ic.execute("b = 10")
In [14]: %px c = a + b          # ic.execute("c = a + b")

In [15]: %autopx [0,1,2,3,4]    # All local commands sent to
Auto Parallel Enabled           # kernels 0,1,2,3,4

In [16]: q = sin(a)
In [17]: p = sin(b)

In [18]: %autopx
Auto Parallel Disabled

ic.block=True/False toggles I/O forwarding


Moving Python Objects
push(): one-way send to the kernels

In [35]: ic.push("w", 3.023)                     # Send 3.023 to kernels as w
In [36]: ic.push("x", 2.4, kernels=[0,1,2,3])    # Only to a subset

pull(): one-way receive from the kernels

In [38]: wlist = ic.pull("w")    # Like an MPI gather
In [39]: print wlist             # Print local copy
[3.023, ..., 3.023]

Graceful error handling:

In [40]: ic.pull("x")
Out[40]:
[2.4, 2.4, 2.4, 2.4, <NotDefined: x>, <NotDefined: x>,
 <NotDefined: x>, <NotDefined: x>, <NotDefined: x>,
 <NotDefined: x>]
Dictionary Syntax
Again, it is annoying to type ic.push() and ic.pull()

In [6]: ic['a'] = 20         # ic.push('a', 20)
In [7]: ic['a']              # ic.pull('a')
Out[7]: [20, 20, 20, 20, 20, 20, 20, 20, 20, 20]

In [8]: ic[0]['b'] = 100.0   # ic.push('b', 100.0, [0])
In [9]: ic[0]['b']           # ic.pull('b', kernels=[0])
Out[9]: 100.0

Can also scatter lists/arrays:

In [10]: ic['mylist'] = Scatter(range(20))
In [12]: ic['mylist']
Out[12]:
[[0, 1], [2, 3], [4, 5], [6, 7], [8, 9], [10, 11], [12, 13], [14, 15],
 [16, 17], [18, 19]]

In [16]: ic.pull('mylist', flatten=True)    # gather back into one flat list
Out[16]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
Parallel Functions
Parallel functions: instant trivial parallelization

In [5]: psin = ic.parallelize('sin')    # psin is a ParallelFunction
In [6]: psin(range(100))                # [sin(0), sin(1), ..., sin(99)]
Out[6]:
[0.0, 0.8414709848078965, ...]

...once a ParallelFunction has been created, it can be reused...

In [7]: pcompute = ic.parallelize('compute')
In [8]: result1 = pcompute(parameter_list1)
In [9]: result2 = pcompute(parameter_list2)

Scatters the list/array to the kernels
Each kernel calls the function on the elements of the array
Results are gathered back to the local process
Took 13 lines of code to implement (a rough sketch of the idea follows).
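
As a hedged sketch only (not the actual source), a ParallelFunction along these lines can be written on top of the Scatter/execute/pull primitives from the earlier slides; the class name matches the slides, but the internal variable names are assumptions:

# Hypothetical sketch of a ParallelFunction built on the cluster primitives.
class ParallelFunction:
    def __init__(self, cluster, func_name):
        self.ic = cluster
        self.func_name = func_name
    def __call__(self, sequence):
        self.ic['_pf_args'] = Scatter(sequence)     # scatter input to kernels
        # apply the named function elementwise on each kernel
        self.ic.execute("_pf_result = map(%s, _pf_args)" % self.func_name)
        return self.ic.pull('_pf_result', flatten=True)   # gather flat result

With something like this, ic.parallelize('sin') would presumably just construct and return ParallelFunction(ic, 'sin').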
Other Higher Level
Parallel Models
GOAL: Make it easy to implement high level constructs
Distributed Memory Objects
Data parallel computations
Task Systems
Dynamically load balanced task system
Fault tolerant
Could allow tasks to be tightly coupled
Google’s MapReduce
MapReduce is a high-level programming model for processing and
generating large data sets on large clusters. Inspired by Lisp's map
and reduce.
Interactive implementation is possible.
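
As an illustrative (hypothetical) sketch of what that could look like with the primitives already shown, a word count can be mapped on the kernels and reduced locally; the function name, the documents list and the session flow are assumptions:

# Hypothetical interactive MapReduce-style word count.
In [20]: src = "def count_words(doc):\n    c = {}\n    for w in doc.split():\n        c[w] = c.get(w, 0) + 1\n    return c"
In [21]: ic.execute(src)                        # define the map function on the kernels
In [22]: pmap = ic.parallelize('count_words')   # map phase as a ParallelFunction
In [23]: partials = pmap(documents)             # one dict of counts per document
In [24]: totals = {}
In [25]: for c in partials:                     # reduce phase, done locally
   ....:     for w, n in c.items():
   ....:         totals[w] = totals.get(w, 0) + n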
Dynamic Loading of Code
In the middle of a parallel calculation, you can write a new
Python module and load it into the running kernels

In [29]: import mymodule              # import locally
In [30]: ic.push_module(mymodule)     # Send to kernels
In [31]: %px import mymodule          # import in kernels

...Can now use mymodule on the kernels...

Can also reload() modified modules.


Can use to fix bugs during a calculation
Test new algorithms without restarting
Collaboration
Multiple users can connect to a cluster simultaneously (see the sketch after this list).
Shared namespace and data, common execution queue
Basic chat facility
Separation of control and monitoring of kernels
Some users can monitor the kernels
Others can control them
Arbitrary configurations allowed
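
As a hedged sketch of what this can look like from a second user's session, reusing the saved-cluster mechanism from the earlier slides (the values shown are illustrative only):

# Hypothetical second user's session, attaching to the same kernels.
In [1]: ic = InteractiveCluster()
In [2]: ic.load("mycluster")      # same cluster file as the first user
In [3]: ic.activate()             # enable %px for this cluster
In [4]: ic['a']                   # data pushed by the first user is visible
Out[4]: [5, 5, 5, 5, 5, 5, 5, 5, 5, 5]
In [5]: %px b = a + 1             # commands join the common execution queue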
Future: Moving Objects
Between Kernels
MPI is great at this, so let’s use it
Not needed in many cases -> MPI is optional
Start kernels with mpiexec and call MPI_Init()
Could wrap other MPI-based libraries.
User can directly make calls to MPI through Python bindings (see the sketch at the end of this slide).
A high level move() function:

In [29]: ic.move('a', 'b', {0:1})    # move ic[0]['a'] -> ic[1]['b']

...This calls MPI_Send() on 0 and MPI_Recv() on 1...
...Automatic synchronization...
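
As a hedged example of the direct-MPI-calls point, assuming the kernels were launched under mpiexec and have an MPI binding such as mpi4py installed (both assumptions, not part of the prototype), collective operations can be driven from the interactive session:

# Hypothetical: calling MPI collectives from the interactive session.
In [30]: %px from mpi4py import MPI
In [31]: %px comm = MPI.COMM_WORLD
In [32]: ic['x'] = Scatter(range(16))                     # one chunk per kernel
In [33]: %px total = comm.allreduce(sum(x), op=MPI.SUM)   # MPI does the reduction
In [34]: ic['total']                                      # same value on every kernel
Out[34]: [120, 120, 120, 120, ...]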
Other Future Directions
Collaborative visualization/plotting/GUI control
Other network interfaces (web, ssh)
Notebook-like frontend (like Mathematica)
Integration into other cluster environments (PBS,
Condor, GridEngine, Globus)
Scalability + Performance
Security
Full MPI integration
Other high-level parallel constructs
Distribution
The system is open source (BSD) and is part of the IPython
project:
http://ipython.scipy.org
IPython is the de facto shell for interactive scientific computing
in Python and comes with every major Linux distribution.
The kernel will become the foundation of a new version of
IPython.
The working prototype is publicly available on the IPython
subversion repository:
svn co http://ipython.scipy.org/svn/ipython/ipython/branches/chainsaw ipython1
Conclusion
Python is a useful tool in scientific computation.
The future of parallel computing is interactive and collaborative.
Scientists want free, open source and extensible tools.
We don't have to give up the tools (Fortran/C/C++/MPI) we love.
Lots of work remains.
