A codeless introduction to GPU parallelism
Will Landau
Iowa State University
September 23, 2013
Outline
A review of GPU parallelism
Examples of parallelism
Vector addition
Pairwise summation
Matrix multiplication
K-means clustering
Markov chain Monte Carlo
A review of GPU parallelism
The single instruction, multiple data (SIMD) paradigm

SIMD: apply the same command to multiple places in a dataset.

    for (i = 0; i < 1e6; ++i)
      a[i] = b[i] + c[i];

On CPUs, the iterations of the loop run sequentially.
With GPUs, we can easily run all 1,000,000 iterations simultaneously.

    i = threadIdx.x;
    a[i] = b[i] + c[i];

We can similarly parallelize a lot more than just loops.
CPU / GPU cooperation

The CPU (host) is in charge.
The CPU sends computationally intensive instruction sets to the GPU (device), just like a human uses a pocket calculator.
How GPU parallelism works
1. The CPU sends a command called a kernel to a GPU.
2. The GPU executes several duplicate realizations of this command, called threads.
   These threads are grouped into bunches called blocks.
   The sum total of all threads in a kernel is called a grid.

Toy example:

CPU says: "Hey, GPU. Sum pairs of adjacent numbers. Use the array, (1, 2, 3, 4, 5, 6, 7, 8)."
GPU thinks: "Sum pairs of adjacent numbers" is a kernel that I need to apply to the array, (1, 2, 3, 4, 5, 6, 7, 8).
The GPU spawns 2 blocks, each with 2 threads:

    Block   |   0   |   0   |   1   |   1
    Thread  |   0   |   1   |   0   |   1
    Action  | 1 + 2 | 3 + 4 | 5 + 6 | 7 + 8

I could have also used 1 block with 4 threads and given the threads different pairs of numbers.
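
The toy example can be emulated on a CPU to make the block and thread bookkeeping concrete. The sketch below is plain C, not CUDA: the two nested loops stand in for the grid, and on a real GPU their iterations would run at the same time. The names `sum_adjacent_thread` and `launch_sum_adjacent` are hypothetical and do not come from the slides.

```c
/* Work done by one hypothetical thread: add one pair of adjacent numbers. */
static void sum_adjacent_thread(int block, int thread, int threads_per_block,
                                const int *in, int *out)
{
    /* Position of this thread within the whole grid. */
    int i = block * threads_per_block + thread;
    out[i] = in[2 * i] + in[2 * i + 1];
}

/* Sequential stand-in for a kernel launch. On a GPU, all iterations
   of these two loops would be executed simultaneously. */
void launch_sum_adjacent(int num_blocks, int threads_per_block,
                         const int *in, int *out)
{
    for (int block = 0; block < num_blocks; ++block)
        for (int thread = 0; thread < threads_per_block; ++thread)
            sum_adjacent_thread(block, thread, threads_per_block, in, out);
}
```

Calling `launch_sum_adjacent(2, 2, in, out)` mirrors the 2 blocks of 2 threads in the table, and `launch_sum_adjacent(1, 4, in, out)` mirrors the alternative of 1 block with 4 threads. Both give each thread a different pair.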
Examples of parallelism
Vector addition

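Say I have 2 vectors,

\[
a = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix},
\qquad
b = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix}.
\]

I want to compute their component-wise sum,

\[
c = \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix}
  = \begin{bmatrix} a_1 + b_1 \\ a_2 + b_2 \\ \vdots \\ a_n + b_n \end{bmatrix}.
\]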
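
Since every component of the sum is independent of the others, each one can go to its own thread. The sketch below emulates that in plain C, assuming a hypothetical block size parameter `threads_per_block`. The loops stand in for threads that a GPU would run simultaneously, and the function names are illustrative only.

```c
/* Work done by one hypothetical thread: add a single pair of components. */
static void vector_add_thread(int i, int n,
                              const double *a, const double *b, double *c)
{
    /* Guard: the grid may contain more threads than there are components. */
    if (i < n)
        c[i] = a[i] + b[i];
}

/* Sequential stand-in for launching the threads of the whole grid. */
void vector_add(int n, int threads_per_block,
                const double *a, const double *b, double *c)
{
    /* Enough blocks to cover all n components, rounding up. */
    int num_blocks = (n + threads_per_block - 1) / threads_per_block;
    for (int block = 0; block < num_blocks; ++block)
        for (int thread = 0; thread < threads_per_block; ++thread)
            vector_add_thread(block * threads_per_block + thread, n, a, b, c);
}
```

The bounds check matters whenever `n` is not a multiple of the block size: the last block then has spare threads that must do nothing.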
Pairwise summation

Let's take the pairwise sum of the vector (5, 2, -3, 1, 1, 8, 2, 6) using 1 block of 4 threads.

Round 1: each of the 4 threads adds one entry from the first half of the vector to the matching entry in the second half.

    Vector:     5   2  -3   1   1   8   2   6
    Thread 0:   5 + 1 =  6
    Thread 1:   2 + 8 = 10
    Thread 2:  -3 + 2 = -1
    Thread 3:   1 + 6 =  7
    Result:     6  10  -1   7

Synchronize threads.

Synchronizing threads

Synchronization: waiting for all parallel tasks to reach a checkpoint before allowing any of them to continue.
Threads from the same block can be synchronized easily.
In general, do not try to synchronize threads from different blocks. It's possible, but extremely inefficient.

Round 2: threads 0 and 1 repeat the process on the 4 partial sums.

    Partial sums:  6  10  -1   7
    Thread 0:   6 + (-1) =  5
    Thread 1:  10 + 7    = 17
    Result:     5  17

Synchronize threads.

Round 3: thread 0 adds the last pair.

    Thread 0:   5 + 17 = 22

The pairwise sum of the vector is 22.

Compare the pairwise sum to the sequential sum

The pairwise sum requires only $\log_2(n)$ sequential steps, while the sequential sum requires $n - 1$ sequential steps. For the 8-entry vector above, that is 3 steps instead of 7.
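
The rounds of the pairwise sum can be emulated in plain C to check the step count. This is a sketch under assumptions: the function name `pairwise_sum` is hypothetical, the inner loop stands in for threads that would run simultaneously on a GPU, and the comment marks where a real kernel would synchronize the block. Rounding the half-length up lets the same loop handle lengths that are not powers of 2, which the slides do not discuss.

```c
#include <stddef.h>

/* Pairwise sum of x[0..n-1], computed in place (x is overwritten).
   Returns the total and, if rounds is not NULL, stores the number
   of sequential rounds that were needed. */
int pairwise_sum(int *x, int n, int *rounds)
{
    int steps = 0;
    if (n <= 0) {
        if (rounds != NULL)
            *rounds = 0;
        return 0;
    }
    for (int len = n; len > 1; ) {
        int half = (len + 1) / 2;   /* number of entries left after this round */
        /* On a GPU, each value of t would be a separate thread. */
        for (int t = 0; t < len / 2; ++t)
            x[t] += x[t + half];
        /* A real kernel would synchronize the threads of the block here. */
        len = half;
        ++steps;
    }
    if (rounds != NULL)
        *rounds = steps;
    return x[0];
}
```

Within a round, thread `t` only writes `x[t]` and only reads entries that no other thread writes in that round, so the threads never interfere. That is why one synchronization per round is enough.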
Reductions and scans

Reductions

Pairwise sum and pairwise multiplication are examples of reductions.
Reduction: an algorithm that applies some binary operation on a vector to produce a scalar.

Scans

Scan (prefix sum): an operation on a vector that produces a sequence of partial reductions.
Example: computing the sequence of partial sums in pairwise fashion.
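
The slides do not say which pairwise scheme they have in mind for the scan. As one possibility, the sketch below emulates a well-known scheme, usually attributed to Hillis and Steele, in plain C. In each round, every entry adds the entry `offset` positions to its left, and the offset doubles from round to round. The function name `pairwise_scan` and the `scratch` buffer are assumptions made for this illustration.

```c
#include <string.h>

/* Inclusive prefix sum of in[0..n-1], written to out[0..n-1].
   scratch must have room for n ints. */
void pairwise_scan(const int *in, int *out, int *scratch, int n)
{
    if (n <= 0)
        return;
    memcpy(out, in, (size_t)n * sizeof(int));
    for (int offset = 1; offset < n; offset *= 2) {
        /* One round: on a GPU, each i would be its own thread. */
        for (int i = 0; i < n; ++i)
            scratch[i] = (i >= offset) ? out[i] + out[i - offset] : out[i];
        /* Synchronization point: every thread must finish the round
           before its results feed the next one. */
        memcpy(out, scratch, (size_t)n * sizeof(int));
    }
}
```

Writing each round into a separate buffer keeps a thread from reading a value that a neighbor has already updated in the same round.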
Matrix multiplication

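Take an $m \times n$ matrix, $A = (a_{ij})$, and an $n \times p$ matrix, $B = (b_{jk})$. Compute $C = A \cdot B$:

Write $A$ in terms of its rows:

\[
A = \begin{bmatrix} a_{1\cdot} \\ \vdots \\ a_{m\cdot} \end{bmatrix}
\quad \text{where} \quad
a_{i\cdot} = \begin{bmatrix} a_{i1} & \cdots & a_{in} \end{bmatrix}.
\]

Write $B$ in terms of its columns:

\[
B = \begin{bmatrix} b_{\cdot 1} & \cdots & b_{\cdot p} \end{bmatrix}
\quad \text{where} \quad
b_{\cdot k} = \begin{bmatrix} b_{1k} \\ \vdots \\ b_{nk} \end{bmatrix}.
\]

Compute $C = A \cdot B$ by taking the product of each row of $A$ with each column of $B$:

\[
C = A \cdot B =
\begin{bmatrix}
(a_{1\cdot} \cdot b_{\cdot 1}) & \cdots & (a_{1\cdot} \cdot b_{\cdot p}) \\
\vdots & \ddots & \vdots \\
(a_{m\cdot} \cdot b_{\cdot 1}) & \cdots & (a_{m\cdot} \cdot b_{\cdot p})
\end{bmatrix}.
\]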
Parallelizing matrix multiplication

Entry $(i, k)$ of matrix $C$ is

\[
c_{ik} = a_{i1} b_{1k} + a_{i2} b_{2k} + \cdots + a_{in} b_{nk}
       = c_{i1k} + c_{i2k} + \cdots + c_{ink}.
\]

Assign block $(i, k)$ to compute $c_{ik}$:

1. Spawn $n$ threads.
2. Tell the $j$th thread to compute $c_{ijk} = a_{ij} b_{jk}$.
3. Synchronize threads to make sure we have finished calculating $c_{i1k}, c_{i2k}, \ldots, c_{ink}$ before continuing.
4. Compute $c_{ik} = \sum_{j=1}^{n} c_{ijk}$ as a pairwise sum.
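
The four steps above can be emulated in plain C, with loops standing in for the blocks and threads that a GPU would run simultaneously. This is a sketch under assumptions: the names `matmul` and `matmul_block`, the row-by-row storage of the matrices, and the cap `MAX_N` on the inner dimension are all choices made for the illustration.

```c
#define MAX_N 64   /* hypothetical cap on the inner dimension n */

/* Work of block (i, k): compute entry c_ik of C = A B.
   A is m x n and B is n x p, both stored row by row. */
static double matmul_block(int i, int k, int n, int p,
                           const double *A, const double *B)
{
    double prod[MAX_N];
    /* Step 2: thread j computes c_ijk = a_ij * b_jk. */
    for (int j = 0; j < n; ++j)
        prod[j] = A[i * n + j] * B[j * p + k];
    /* Step 3: a real kernel would synchronize the block's threads here. */
    /* Step 4: pairwise sum of the n products. */
    for (int len = n; len > 1; ) {
        int half = (len + 1) / 2;
        for (int t = 0; t < len / 2; ++t)
            prod[t] += prod[t + half];
        len = half;
    }
    return prod[0];
}

/* Sequential stand-in for launching an m x p grid of blocks.
   Returns 0 on success, -1 if n is outside 1..MAX_N. */
int matmul(int m, int n, int p,
           const double *A, const double *B, double *C)
{
    if (n < 1 || n > MAX_N)
        return -1;
    for (int i = 0; i < m; ++i)
        for (int k = 0; k < p; ++k)
            C[i * p + k] = matmul_block(i, k, n, p, A, B);
    return 0;
}
```

Each call to `matmul_block` touches only its own entry of `C`, so the blocks never need to wait for one another.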
Matrix multiplication

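Say I want to compute $A \cdot B$, where:

\[
A = \begin{bmatrix} 1 & 2 \\ 1 & 5 \\ 7 & 9 \end{bmatrix},
\qquad
B = \begin{bmatrix} 8 & 8 & 7 \\ 3 & 5 & 2 \end{bmatrix}.
\]

I write the multiplication as an array of products:

\[
C = \begin{bmatrix}
\begin{bmatrix} 1 & 2 \end{bmatrix} \cdot \begin{bmatrix} 8 \\ 3 \end{bmatrix} &
\begin{bmatrix} 1 & 2 \end{bmatrix} \cdot \begin{bmatrix} 8 \\ 5 \end{bmatrix} &
\begin{bmatrix} 1 & 2 \end{bmatrix} \cdot \begin{bmatrix} 7 \\ 2 \end{bmatrix} \\[2ex]
\begin{bmatrix} 1 & 5 \end{bmatrix} \cdot \begin{bmatrix} 8 \\ 3 \end{bmatrix} &
\begin{bmatrix} 1 & 5 \end{bmatrix} \cdot \begin{bmatrix} 8 \\ 5 \end{bmatrix} &
\begin{bmatrix} 1 & 5 \end{bmatrix} \cdot \begin{bmatrix} 7 \\ 2 \end{bmatrix} \\[2ex]
\begin{bmatrix} 7 & 9 \end{bmatrix} \cdot \begin{bmatrix} 8 \\ 3 \end{bmatrix} &
\begin{bmatrix} 7 & 9 \end{bmatrix} \cdot \begin{bmatrix} 8 \\ 5 \end{bmatrix} &
\begin{bmatrix} 7 & 9 \end{bmatrix} \cdot \begin{bmatrix} 7 \\ 2 \end{bmatrix}
\end{bmatrix}
\]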

We don't need to synchronize blocks because they operate independently: each block computes its own entry of $C$.

Consider block (0, 0), which computes

\[
\begin{bmatrix} 1 & 2 \end{bmatrix} \cdot \begin{bmatrix} 8 \\ 3 \end{bmatrix}.
\]
Lloyd's K-means algorithm

Cluster N vectors in Euclidean space into K groups.


Step 1: choose initial cluster centers.

The circles are the cluster means, the squares are the
data points, and the color indicates the cluster.
Step 2: assign each data point (square) to its
closest center (circle).
Step 3: update the cluster centers to be the
within-cluster data means.
Repeat step 2: reassign points to their closest cluster centers.

... and repeat until convergence.
Parallel K-means

Step 2: assign points to closest cluster centers.
  Spawn N blocks with K threads each.
  Let thread (n, k) compute the distance between data point n and cluster center k.
  Synchronize threads.
  Let thread (n, 1) assign data point n to its nearest cluster center.

Step 3: recompute cluster centers.
  Spawn one block for each cluster.
  Within each block, compute the mean of the data in the corresponding cluster.
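
One iteration of the two parallel steps can be emulated in plain C, with loops standing in for blocks and threads. The sketch makes several assumptions that are not in the slides: points are 2-dimensional (`DIM`), distances are compared in squared form (which gives the same nearest center), indices start at 0, and an empty cluster keeps its old center. The function names are hypothetical.

```c
#define DIM 2   /* hypothetical: points live in 2-dimensional Euclidean space */

/* Squared Euclidean distance between two DIM-dimensional points. */
static double sq_dist(const double *u, const double *v)
{
    double s = 0.0;
    for (int d = 0; d < DIM; ++d)
        s += (u[d] - v[d]) * (u[d] - v[d]);
    return s;
}

/* Step 2: block n handles data point n, thread (n, k) handles center k.
   dist is scratch space of length K. */
void kmeans_assign(int N, int K, const double *points, const double *centers,
                   double *dist, int *label)
{
    for (int n = 0; n < N; ++n) {          /* one block per data point */
        for (int k = 0; k < K; ++k)        /* one thread per cluster center */
            dist[k] = sq_dist(&points[n * DIM], &centers[k * DIM]);
        /* Synchronize threads, then let a single thread pick the minimum. */
        int best = 0;
        for (int k = 1; k < K; ++k)
            if (dist[k] < dist[best])
                best = k;
        label[n] = best;
    }
}

/* Step 3: block k recomputes center k as the mean of its data points. */
void kmeans_update(int N, int K, const double *points, const int *label,
                   double *centers)
{
    for (int k = 0; k < K; ++k) {          /* one block per cluster */
        double sum[DIM] = {0.0};
        int count = 0;
        for (int n = 0; n < N; ++n) {
            if (label[n] != k)
                continue;
            for (int d = 0; d < DIM; ++d)
                sum[d] += points[n * DIM + d];
            ++count;
        }
        if (count > 0)   /* assumption: an empty cluster keeps its old center */
            for (int d = 0; d < DIM; ++d)
                centers[k * DIM + d] = sum[d] / count;
    }
}
```

Alternating `kmeans_assign` and `kmeans_update` until the labels stop changing gives the full algorithm. The within-block mean in step 3 is itself a reduction, so on a GPU it could be computed as a pairwise sum.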
Markov chain Monte Carlo

Consider a bladder cancer data set:
  Available from http://ratecalc.cancer.gov/.
  Rates of death from bladder cancer of white males from 2000 to 2004 in each county in the USA.

Let:
  $y_k$ = number of observed deaths in county $k$.
  $n_k$ = the number of person-years in county $k$ divided by 100,000.
  $\theta_k$ = expected number of deaths per 100,000 person-years.

The model:

\[
\begin{aligned}
y_k &\overset{\text{ind}}{\sim} \text{Poisson}(n_k \theta_k) \\
\theta_k &\overset{\text{iid}}{\sim} \text{Gamma}(\alpha, \beta) \\
\alpha &\sim \text{Uniform}(0, a_0) \\
\beta &\sim \text{Uniform}(0, b_0)
\end{aligned}
\]

Also assume $\alpha$ and $\beta$ are independent and fix $a_0$ and $b_0$.
Full conditional distributions

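We want to sample from the joint posterior,

\[
\begin{aligned}
p(\theta, \alpha, \beta \mid y)
&\propto p(y \mid \theta, \alpha, \beta) \, p(\theta, \alpha, \beta) \\
&\propto p(y \mid \theta, \alpha, \beta) \, p(\theta \mid \alpha, \beta) \, p(\alpha, \beta) \\
&\propto p(y \mid \theta, \alpha, \beta) \, p(\theta \mid \alpha, \beta) \, p(\alpha) \, p(\beta) \\
&\propto \prod_{k=1}^{K} \left[ p(y_k \mid \theta_k, n_k) \, p(\theta_k \mid \alpha, \beta) \right] p(\alpha) \, p(\beta) \\
&\propto \prod_{k=1}^{K} \left[ e^{-n_k \theta_k} \theta_k^{y_k} \, \frac{\beta^{\alpha}}{\Gamma(\alpha)} \, \theta_k^{\alpha - 1} e^{-\theta_k \beta} \right] I(0 < \alpha < a_0) \, I(0 < \beta < b_0)
\end{aligned}
\]

We iteratively sample from the full conditional distributions.

\[
\begin{aligned}
\alpha &\sim p(\alpha \mid y, \theta, \beta) \\
\beta &\sim p(\beta \mid y, \theta, \alpha) \\
\theta_k &\sim p(\theta_k \mid y, \theta_{-k}, \alpha, \beta) \quad \text{IN PARALLEL!}
\end{aligned}
\]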
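
\[
\begin{aligned}
p(\theta_k \mid y, \theta_{-k}, \alpha, \beta)
&\propto p(\theta, \alpha, \beta \mid y) \\
&\propto e^{-n_k \theta_k} \theta_k^{y_k} \, \theta_k^{\alpha - 1} e^{-\theta_k \beta} \\
&= \theta_k^{y_k + \alpha - 1} e^{-\theta_k (n_k + \beta)} \\
&\propto \text{Gamma}(y_k + \alpha, \; n_k + \beta)
\end{aligned}
\]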
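Conditional distributions of $\alpha$ and $\beta$

\[
\begin{aligned}
p(\alpha \mid y, \theta, \beta)
&\propto p(\theta, \alpha, \beta \mid y) \\
&\propto \prod_{k=1}^{K} \left[ \theta_k^{\alpha - 1} \frac{\beta^{\alpha}}{\Gamma(\alpha)} \right] I(0 < \alpha < a_0) \\
&= \left( \prod_{k=1}^{K} \theta_k \right)^{\alpha - 1} \frac{\beta^{K \alpha}}{\Gamma(\alpha)^{K}} \, I(0 < \alpha < a_0)
\end{aligned}
\]

\[
\begin{aligned}
p(\beta \mid y, \theta, \alpha)
&\propto p(\theta, \alpha, \beta \mid y) \\
&\propto \prod_{k=1}^{K} \left[ e^{-\theta_k \beta} \beta^{\alpha} \right] I(0 < \beta < b_0) \\
&= \beta^{K \alpha} e^{-\beta \sum_{k=1}^{K} \theta_k} \, I(0 < \beta < b_0) \\
&\propto \text{Gamma}\left( K \alpha + 1, \; \sum_{k=1}^{K} \theta_k \right) I(0 < \beta < b_0)
\end{aligned}
\]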
Summarizing the Gibbs sampler
1. Sample $\theta$ from its full conditional.
   Draw the $\theta_k$'s in parallel from independent Gamma($y_k + \alpha$, $n_k + \beta$) distributions.
   In other words, assign each thread to draw an individual $\theta_k$ from its Gamma($y_k + \alpha$, $n_k + \beta$) distribution.
2. Sample $\alpha$ from its full conditional using a random walk Metropolis step.
3. Sample $\beta$ from its full conditional (truncated Gamma) using the inverse cdf method if $b_0$ is low or a non-truncated Gamma if $b_0$ is high.
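
The parallel part of the sampler comes down to each thread working out its own Gamma parameters and then drawing from that distribution. The plain-C sketch below covers only the deterministic bookkeeping for steps 1 and 3, with a loop standing in for the threads. The random draws are left out, because standard C has no built-in Gamma generator. The function names are hypothetical, and the Gamma distributions are taken to be in the shape and rate form, which is what the conditional densities imply.

```c
/* Step 1 without the random draw: thread k works out the shape and rate
   of the Gamma full conditional of theta_k. */
void theta_conditional_params(int K, const double *y, const double *n,
                              double alpha, double beta,
                              double *shape, double *rate)
{
    for (int k = 0; k < K; ++k) {   /* on a GPU, one thread per county k */
        shape[k] = y[k] + alpha;
        rate[k] = n[k] + beta;
    }
}

/* Step 3 without the random draw: shape and rate of the Gamma full
   conditional of beta, before truncating to (0, b_0). */
void beta_conditional_params(int K, const double *theta, double alpha,
                             double *shape, double *rate)
{
    double total = 0.0;
    /* Summing the theta_k is a reduction, so it could be a pairwise sum. */
    for (int k = 0; k < K; ++k)
        total += theta[k];
    *shape = K * alpha + 1.0;
    *rate = total;
}
```

No county's parameters depend on another county's $\theta_k$, which is why the draws in step 1 can all happen at once.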
Preview: a bare bones CUDA C workflow

    #include <stdio.h>
    #include <stdlib.h>
    #include <cuda.h>
    #include <cuda_runtime.h>

    __global__ void some_kernel(...){...}

    int main(void){
      // Declare all variables.
      ...
      // Allocate host memory.
      ...
      // Dynamically allocate device memory for GPU results.
      ...
      // Write to host memory.
      ...
      // Copy host memory to device memory.
      ...
      // Execute kernel on the device.
      some_kernel<<< num_blocks, num_threads_per_block >>>(...);
      // Write GPU results in device memory back to host memory.
      ...
      // Free dynamically allocated host memory.
      ...
      // Free dynamically allocated device memory.
      ...
    }
Resources
1. J. Sanders and E. Kandrot. CUDA by Example. Addison-Wesley, 2010.
2. Prof. Jarad Niemi's STAT 544 lecture notes.
That's all for today.

Series materials are available at http://will-landau.com/gpu.