Escolar Documentos
Profissional Documentos
Cultura Documentos
Dense Matrix
Multiplication
Fall 2012
Ortega, Patricia
Outline
Problem definition
Assumptions
Implementation
Test Results
Future work
Conclusions
Problem definition
mxr rxn
The simplest way of multiplying two
matrices takes about n3 steps.
Problem definition (Cont.)
4. The A sub-blocks are rolled one step to the left and the B
sub-blocks are rolled one step upward.
5. Repeat steps 3 y 4 sqrt(p) times.
Implementation (Cont.)
Example
Matrices to be multiplied
Example:
8 -8
2 1 5 3 6 1
-8 5
0 7 1 6 4 5
4 4
9 2 1 9 2 3
7 2
5 3 4 0 6 5
Example:
Local matrix multiplication.
C0,0= 2 1 X 6 1 = 16 7
0 7 4 5 28 35
8 -8
5 3 16 -25
C0,1 = X -8 5 =
1 6 -40 22
4 4
C1,0 = X 1 9 = 20 36
7 2
4 0 15 63
9 2 2 3 30 37
C1,1 = 5 3 X 6 5 = 42 39
Example:
5 3 2 1 1 9 2 3
1 6 0 7 4 0 6 5
4 4 8 -8
9 2 6 1
7 2 -8 5
5 3 4 5
Example:
Local matrix multiplication.
C0,0= C0,0 + 2 1 X 6 1 = 16 7 + 17 45 = 33 52
0 7 4 5 28 35 25 9 53 44
8 -8
5 3 16 -25 10 11 26 -14
C0,1 = C0,1 + X -8 5 = +
1 6 -40 22 42 35+ = 2 57
4 4
C1,0 = C1,0 + X 1 9 = 20 36 + 62 19 + = 82 55
7 2
4 0 15 63 42 33 57 96
9 2 2 3 30 37 0 -12 30 25
C1,1 = C1,1 + 5 3 X 6 5 = 42 39 + 40 -46 + = 82 -7
Test
Objective:
Analyze speedup achieved by the parallel
algorithm when increases the size of the input
data and the number of cores of the
architecture.
Constraints:
The number of cores used must be a exact
square root.
Must be possible the exact distribution to the
total amount of data into the available cores.
Test
Execution:
Run sequential algorithm on a single processor/
core.
For test the parallel algorithm were used the
following number of cores:
4,9,16,25,36,49,64,100
The results were obtained from the average
over three tests of the algorithms.
Test performed in matrices with dimensions up
1000x1000, increasing with steps of 100.
Test Results - Sequential
Sequential
9
6
Running Time (sec.)
Sequential
4
0
100 200 300 400 500 600 700 800 900 1000
Test Results Sequential vs. Parallel
Sequential vs Parallel
Procs # 100
9
6
Time sec.
5
Procs. 1
4 Procs. 100
0
100 200 300 400 500 600 700 800 900 1000
Test Results - Parallel
Size:600x600
4
3.5
2.5
Time (sec)
2
Size:600x600
1.5
0.5
0
Procs. 1 Procs. 25 Procs. 36 Procs. 64 Procs. 100
Test Results - Parallel
Speedup
16
14
12
10
8
Speedup
0
Procs. 25 Procs. 36 Procs. 64 Procs. 100
Test Results Parallel (Counter
Intuitive in small test data)
Size:100 x 100
0.06
0.05
0.04
Time (sec.)
0.03
Size:100 x 100
0.02
0.01
0
Procs. 1 Procs. 4 Procs. 16 Procs. 25 Procs. 100
Further work