Vector Code Example

EENG/CSCI 641 Computer Architecture 1

Name:
Grade:
Example:
Consider this piece of C code:
for (i = 0; i < 128; i++)
{
    z[i] = a*x[i] + y[i];
}
a) Develop the MIPS scalar assembly code for this C code.
        L.D     F0,a          ; load scalar a
        L.D     R10,128       ; 128 elements to process
        L.D     R1,1000       ; load address of array x
        L.D     R2,2000       ; load address of array y
        L.D     R3,3000       ; load address of array z
Loop:
        L.D     F1,[R1]       ; load vector X
        MUL.D   F2,F1,F0      ; scalar-scalar multiply
        L.D     F3,[R2]       ; load vector Y
        ADD.D   F4,F2,F3      ; add
        S.D     [R3],F4       ; store the result
        DADDUI  R1,R1,8       ; increment array pointer for x[]
        DADDUI  R2,R2,8       ; increment array pointer for y[]
        DADDUI  R3,R3,8       ; increment array pointer for z[]
        DADDUI  R10,R10,-1    ; decrement loop counter
        BNEZ    R10,Loop      ; branch R10 != zero
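As a cross-check on the scalar code, the loop can be written in C in two ways: one element at a time, and in chunks of 16 elements. This is only a sketch. The function names `axpy_scalar` and `axpy_chunked` and the constant `VLEN` are illustrative and do not come from the worksheet; the value 16 is assumed from the `16*8` pointer increments used in part b.

```c
#include <stddef.h>

#define N    128  /* number of elements, as in the worksheet's loop */
#define VLEN 16   /* assumed vector length (elements per vector instruction) */

/* One element per iteration, mirroring the scalar MIPS loop. */
void axpy_scalar(double a, const double *x, const double *y, double *z)
{
    for (size_t i = 0; i < N; i++)
        z[i] = a * x[i] + y[i];
}

/* VLEN elements per outer iteration. Each inner loop stands in for
   the work that one or two vector instructions would do. */
void axpy_chunked(double a, const double *x, const double *y, double *z)
{
    for (size_t base = 0; base < N; base += VLEN) {    /* N / VLEN = 8 passes */
        double v2[VLEN], v4[VLEN];
        for (size_t j = 0; j < VLEN; j++)              /* LV + MULVS.D */
            v2[j] = a * x[base + j];
        for (size_t j = 0; j < VLEN; j++)              /* LV + ADDVV.D */
            v4[j] = v2[j] + y[base + j];
        for (size_t j = 0; j < VLEN; j++)              /* SV */
            z[base + j] = v4[j];
    }
}
```

Both functions should produce identical results for the same inputs; the chunked form only regroups the work so that the outer loop runs 8 times instead of 128.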
b) Develop the VMIPS assembly code for this C code.

        L.D     F0,a          ; load scalar a
        L.D     R10,128       ; 128 elements to process
        L.D     R1,1000       ; load address of array x
        L.D     R2,2000       ; load address of array y
        L.D     R3,3000       ; load address of array z
Loop:
        LV      V1,R1         ; load vector X
        MULVS.D V2,V1,F0      ; vector-scalar multiply
        LV      V3,R2         ; load vector Y
        ADDVV.D V4,V2,V3      ; add
        SV      R3,V4         ; store the result
        DADDUI  R1,R1,16*8    ; increment array pointer for x[]
        DADDUI  R2,R2,16*8    ; increment array pointer for y[]
        DADDUI  R3,R3,16*8    ; increment array pointer for z[]
        DADDUI  R10,R10,-16   ; decrement loop counter
        BNEZ    R10,Loop      ; branch R10 != zero

c) How many cycles does it take to execute the scalar code, assuming no memory latencies?
Instruction                                                   Number of times executed
        L.D     F0,a          ; load scalar a                 1
        L.D     R10,128       ; 128 elements to process       1
        L.D     R1,1000       ; load address of array x       1
        L.D     R2,2000       ; load address of array y       1
        L.D     R3,3000       ; load address of array z       1
Loop:
        L.D     F1,[R1]       ; load vector X                 128
        MUL.D   F2,F1,F0      ; scalar-scalar multiply        128
        L.D     F3,[R2]       ; load vector Y                 128
        ADD.D   F4,F2,F3      ; add                           128
        S.D     [R3],F4       ; store the result              128
        DADDUI  R1,R1,8       ; increment array pointer for x[]   128
        DADDUI  R2,R2,8       ; increment array pointer for y[]   128
        DADDUI  R3,R3,8       ; increment array pointer for z[]   128
        DADDUI  R10,R10,-1    ; decrement loop counter        128
        BNEZ    R10,Loop      ; branch R10 != zero            127*2 + 1*1
The BNEZ entry counts 127 taken branches at 2 cycles each plus one final not-taken branch at 1 cycle.
Total number of instruction cycles = 1(1+1+1+1+1) + 128(9) + 127*2 + 1 = 1412
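
The total above follows a pattern that can be written as a small function. The sketch below is one way to encode it, under the counting model inferred from the table: every instruction costs 1 cycle, except that a taken BNEZ costs 2. The function name `loop_cycles` and its parameters are hypothetical, chosen for illustration.

```c
/* Cycle count for a counted loop under the worksheet's model:
   - setup: instructions executed once before the loop (1 cycle each)
   - body:  non-branch instructions inside the loop (1 cycle each)
   - iters: number of loop iterations (assumed >= 1)
   The closing BNEZ is taken (iters - 1) times at 2 cycles each
   and falls through once at 1 cycle. */
long loop_cycles(long setup, long body, long iters)
{
    long branch = (iters - 1) * 2 + 1;
    return setup + iters * body + branch;
}
```

For the scalar code, `loop_cycles(5, 9, 128)` evaluates to 5 + 1152 + 255 = 1412, which matches the total in the table.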

d) How many cycles does it take to execute the vector code, assuming no memory latencies?
Instruction                                                   Number of times executed
        L.D     F0,a          ; load scalar a                 1
        L.D     R10,128       ; 128 elements to process       1
        L.D     R1,1000       ; load address of array x       1
        L.D     R2,2000       ; load address of array y       1
        L.D     R3,3000       ; load address of array z       1
Loop:
        LV      V1,R1         ; load vector X                 8
        MULVS.D V2,V1,F0      ; vector-scalar multiply        8
        LV      V3,R2         ; load vector Y                 8
        ADDVV.D V4,V2,V3      ; add                           8
        SV      R3,V4         ; store the result              8
        DADDUI  R1,R1,16*8    ; increment array pointer for x[]   8
        DADDUI  R2,R2,16*8    ; increment array pointer for y[]   8
        DADDUI  R3,R3,16*8    ; increment array pointer for z[]   8
        DADDUI  R10,R10,-16   ; decrement loop counter        8
        BNEZ    R10,Loop      ; branch R10 != zero            7*2 + 1*1
Total number of instruction cycles = 1(1+1+1+1+1) + 8(9) + 7*2 + 1 = 92

e) What is the speedup?
1412/92 ≈ 15.35
f) What would be the speedup if the vector length is 32?
This is left for you to work out. Remember that each loop iteration now computes 32 elements instead of 16.
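
The speedup calculation can be parameterized by vector length. The sketch below assumes the worksheet's counting model (1 cycle per instruction, including each vector instruction; 2 cycles for a taken BNEZ) and assumes the element count is a multiple of the vector length. The names `loop_cycles` and `vector_speedup` are hypothetical helpers, not part of the worksheet.

```c
/* Setup instructions cost 1 cycle each, body instructions 1 cycle each,
   a taken BNEZ 2 cycles, and the final not-taken BNEZ 1 cycle. */
static long loop_cycles(long setup, long body, long iters)
{
    return setup + iters * body + (iters - 1) * 2 + 1;
}

/* Speedup of the vector loop over the scalar loop for n elements.
   Both versions in this example have 5 setup instructions and
   9 non-branch instructions in the loop body. Assumes n % vlen == 0. */
double vector_speedup(long n, long vlen)
{
    long scalar = loop_cycles(5, 9, n);          /* one element per iteration */
    long vector = loop_cycles(5, 9, n / vlen);   /* vlen elements per iteration */
    return (double)scalar / (double)vector;
}
```

With `n = 128` and `vlen = 16` this reproduces 1412/92. Calling it with `vlen = 32` is one way to check your own answer to part f.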
Exercise:
Repeat the previous example for this piece of C code:
for (i = 0; i < 128; i++)
{
    z[i] = (x[i] + y[i]) * w[i];
}
a) Develop the MIPS scalar assembly code for this C code.
        L.D     R10,128       ; 128 elements to process
        L.D     R1,1000       ; load address of array x
        L.D     R2,2000       ; load address of array y
        L.D     R3,3000       ; load address of array z
        L.D     R4,4000       ; load address of array w
Loop:
        L.D     F1,[R1]       ; load vector X
        L.D     F2,[R2]       ; load vector Y
        L.D     F4,[R4]       ; load vector W
        ADD.D   F3,F1,F2      ; add X & Y
        MUL.D   F5,F3,F4      ; Z = (X+Y) * W
        S.D     [R3],F5       ; store the result
        DADDUI  R1,R1,8       ; increment array pointer for x[]
        DADDUI  R2,R2,8       ; increment array pointer for y[]
        DADDUI  R3,R3,8       ; increment array pointer for z[]
        DADDUI  R4,R4,8       ; increment array pointer for w[]
        DADDUI  R10,R10,-1    ; decrement loop counter
        BNEZ    R10,Loop      ; branch R10 != zero
b) Develop the VMIPS assembly code for this C code.

        L.D     R10,128       ; 128 elements to process
        L.D     R1,1000       ; load address of array x
        L.D     R2,2000       ; load address of array y
        L.D     R3,3000       ; load address of array z
        L.D     R4,4000       ; load address of array w
Loop:
        LV      V1,R1         ; load vector X
        LV      V2,R2         ; load vector Y
        LV      V4,R4         ; load vector W
        ADDVV.D V3,V1,V2      ; vector-vector add
        MULVV.D V5,V3,V4      ; vector-vector multiply
        SV      R3,V5         ; store the result
        DADDUI  R1,R1,16*8    ; increment array pointer for x[]
        DADDUI  R2,R2,16*8    ; increment array pointer for y[]
        DADDUI  R3,R3,16*8    ; increment array pointer for z[]
        DADDUI  R4,R4,16*8    ; increment array pointer for w[]
        DADDUI  R10,R10,-16   ; decrement loop counter
        BNEZ    R10,Loop      ; branch R10 != zero
c) How many cycles does it take to execute the scalar code, assuming no memory latencies?
Instruction                                                   Number of times executed

Total number of instruction cycles =

d) How many cycles does it take to execute the vector code, assuming no memory latencies?
Instruction                                                   Number of times executed

Total number of instruction cycles =
e) What is the speedup?
f) What would be the speedup if the vector length is 32?
