EENG/CSCI 641 Computer Architecture 1

Vector Code Example


Name:

Grade:

Example:
Consider this piece of C code:

    for (i = 0; i < 128; i++)
    {
        z[i] = a*x[i] + y[i];
    }

a) Develop the MIPS scalar assembly code for this C code.

        L.D     F0,a            ; load scalar a
        L.D     R10,128         ; 128 elements to process
        L.D     R1,1000         ; load address of array x
        L.D     R2,2000         ; load address of array y
        L.D     R3,3000         ; load address of array z
Loop:   L.D     F1,[R1]         ; load element x[i]
        MUL.D   F2,F1,F0        ; scalar-scalar multiply
        L.D     F3,[R2]         ; load element y[i]
        ADD.D   F4,F2,F3        ; add
        S.D     [R3],F4         ; store the result
        DADDUI  R1,R1,8         ; increment array pointer for x[]
        DADDUI  R2,R2,8         ; increment array pointer for y[]
        DADDUI  R3,R3,8         ; increment array pointer for z[]
        DADDUI  R10,R10,-1      ; decrement loop counter
        BNEZ    R10,Loop        ; branch R10 != zero

b) Develop the VMIPS assembly code for this C code.
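
Before writing the vector assembly, it may help to see the same loop split into strips of 16 elements in C, since the vector loop advances 16 elements per pass. The sketch below is only an illustration: the function name `axpy_strips` and the macro `VL` are assumptions introduced here, and the vector length of 16 is taken from the `16*8` pointer increments in the worksheet's loop.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed vector length: 16 elements per pass, as in the worksheet's loop. */
#define VL 16

/* Computes z[i] = a*x[i] + y[i] in strips of VL elements.
 * Assumes n is a multiple of VL (128 = 8 * 16 in the example). */
void axpy_strips(size_t n, double a, const double *x, const double *y, double *z)
{
    assert(n % VL == 0);

    /* Outer loop: one pass per strip, so 8 passes for n = 128. */
    for (size_t base = 0; base < n; base += VL) {
        /* Inner loop: stands in for the vector instructions
         * (load, multiply, load, add, store) applied to one strip. */
        for (size_t j = 0; j < VL; j++) {
            z[base + j] = a * x[base + j] + y[base + j];
        }
    }
}
```

In the vector assembly the inner loop disappears: each vector instruction handles a whole strip, and only the outer loop remains as a branch.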


        L.D     F0,a            ; load scalar a
        L.D     R10,128         ; 128 elements to process
        L.D     R1,1000         ; load address of array x
        L.D     R2,2000         ; load address of array y
        L.D     R3,3000         ; load address of array z
Loop:   LV      V1,R1           ; load vector X
        MULVS.D V2,V1,F0        ; vector-scalar multiply
        LV      V3,R2           ; load vector Y
        ADDVV.D V4,V2,V3        ; add
        SV      R3,V4           ; store the result
        DADDUI  R1,R1,16*8      ; increment array pointer for x[]
        DADDUI  R2,R2,16*8      ; increment array pointer for y[]
        DADDUI  R3,R3,16*8      ; increment array pointer for z[]
        DADDUI  R10,R10,-16     ; decrement loop counter
        BNEZ    R10,Loop        ; branch R10 != zero

c)

How many cycles does it take to execute the scalar code, assuming no memory latencies?

Instruction                                                         Number of times executed

        L.D     F0,a            ; load scalar a                     1
        L.D     R10,128         ; 128 elements to process           1
        L.D     R1,1000         ; load address of array x           1
        L.D     R2,2000         ; load address of array y           1
        L.D     R3,3000         ; load address of array z           1
Loop:   L.D     F1,[R1]         ; load element x[i]                 128
        MUL.D   F2,F1,F0        ; scalar-scalar multiply            128
        L.D     F3,[R2]         ; load element y[i]                 128
        ADD.D   F4,F2,F3        ; add                               128
        S.D     [R3],F4         ; store the result                  128
        DADDUI  R1,R1,8         ; increment array pointer for x[]   128
        DADDUI  R2,R2,8         ; increment array pointer for y[]   128
        DADDUI  R3,R3,8         ; increment array pointer for z[]   128
        DADDUI  R10,R10,-1      ; decrement loop counter            128
        BNEZ    R10,Loop        ; branch R10 != zero                127*2 + 1*1

The BNEZ entry is a cycle count rather than an execution count: the branch is taken 127 times at 2 cycles each and falls through once at 1 cycle.

Total number of instruction cycles = 1(1+1+1+1+1) + 128 (9) + 127*2 + 1 = 1412
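
The same count can be written as a small function, which makes it easier to redo the arithmetic for a different loop. This is a sketch of the worksheet's simplified cycle model, not of a real pipeline: every instruction is assumed to cost 1 cycle except a taken branch. The function name `loop_cycles` and its parameter names are assumptions introduced here.

```c
#include <assert.h>

/* Cycle model assumed from the worksheet's arithmetic:
 *   setup      - instructions executed once before the loop
 *   iterations - number of loop passes (must be at least 1)
 *   body       - non-branch instructions per pass, 1 cycle each
 *   taken_cost - cycles for a taken branch (2 in the worksheet)
 * The final, not-taken branch costs 1 cycle. */
long loop_cycles(long setup, long iterations, long body, long taken_cost)
{
    assert(iterations >= 1);

    /* The branch is taken on every pass except the last one. */
    long branch = (iterations - 1) * taken_cost + 1;

    return setup + iterations * body + branch;
}
```

With 5 setup instructions, 128 passes, and 9 non-branch instructions per pass, `loop_cycles(5, 128, 9, 2)` evaluates 5 + 128*9 + 127*2 + 1.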


d) How many cycles does it take to execute the vector code, assuming no memory latencies?

Each pass of the loop processes 16 elements, so the loop body runs 128/16 = 8 times.

Instruction                                                         Number of times executed

        L.D     F0,a            ; load scalar a                     1
        L.D     R10,128         ; 128 elements to process           1
        L.D     R1,1000         ; load address of array x           1
        L.D     R2,2000         ; load address of array y           1
        L.D     R3,3000         ; load address of array z           1
Loop:   LV      V1,R1           ; load vector X                     8
        MULVS.D V2,V1,F0        ; vector-scalar multiply            8
        LV      V3,R2           ; load vector Y                     8
        ADDVV.D V4,V2,V3        ; add                               8
        SV      R3,V4           ; store the result                  8
        DADDUI  R1,R1,16*8      ; increment array pointer for x[]   8
        DADDUI  R2,R2,16*8      ; increment array pointer for y[]   8
        DADDUI  R3,R3,16*8      ; increment array pointer for z[]   8
        DADDUI  R10,R10,-16     ; decrement loop counter            8
        BNEZ    R10,Loop        ; branch R10 != zero                7*2 + 1*1

Total number of instruction cycles = 1(1+1+1+1+1) + 8 (9) + 7*2 + 1 = 92


e)

What is the speed up?

1412/92 = 15.35

f)

What would be the speed up if the vector length is 32?

I leave it for you to figure this out. Remember that each loop iteration now calculates 32 elements, as opposed to 16.
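
To check a hand calculation for other vector lengths, the cycle counts from parts c) and d) can be parameterized by the number of loop passes. The sketch below restates the worksheet's simplified model as assumptions (5 setup instructions, 9 non-branch instructions per pass, 2 cycles for a taken branch, 1 cycle for everything else, including each vector instruction). The names `cycles` and `speedup` are hypothetical helpers, not part of the worksheet.

```c
#include <assert.h>

/* Total cycles for a loop with the given number of passes, under the
 * assumed model: 5 setup instructions, 9 one-cycle instructions per pass,
 * a 2-cycle taken branch, and a 1-cycle final not-taken branch. */
static long cycles(long passes)
{
    assert(passes >= 1);
    return 5 + passes * 9 + (passes - 1) * 2 + 1;
}

/* Ratio of scalar cycles to vector cycles for n elements and vector
 * length vl. Assumes vl divides n evenly. */
double speedup(long n, long vl)
{
    assert(vl > 0 && n % vl == 0);

    long scalar = cycles(n);       /* one element per pass */
    long vector = cycles(n / vl);  /* vl elements per pass */

    return (double)scalar / (double)vector;
}
```

Calling `speedup(128, 16)` divides the two totals computed in parts c) and d); changing the second argument gives the ratio for another vector length under the same assumptions.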

Exercise:
Repeat the previous example for this piece of C code:

    for (i = 0; i < 128; i++)
    {
        z[i] = (x[i] + y[i]) * w[i];
    }

a) Develop the MIPS scalar assembly code for this C code.

        L.D     R10,128         ; 128 elements to process
        L.D     R1,1000         ; load address of array x
        L.D     R2,2000         ; load address of array y
        L.D     R3,3000         ; load address of array z
        L.D     R4,4000         ; load address of array w
Loop:   L.D     F1,[R1]         ; load element x[i]
        L.D     F2,[R2]         ; load element y[i]
        L.D     F4,[R4]         ; load element w[i]
        ADD.D   F3,F1,F2        ; add X & Y
        MUL.D   F5,F3,F4        ; Z = (X+Y) * W
        S.D     [R3],F5         ; store the result
        DADDUI  R1,R1,8         ; increment array pointer for x[]
        DADDUI  R2,R2,8         ; increment array pointer for y[]
        DADDUI  R3,R3,8         ; increment array pointer for z[]
        DADDUI  R4,R4,8         ; increment array pointer for w[]
        DADDUI  R10,R10,-1      ; decrement loop counter
        BNEZ    R10,Loop        ; branch R10 != zero

b) Develop the VMIPS assembly code for this C code.


        L.D     R10,128         ; 128 elements to process
        L.D     R1,1000         ; load address of array x
        L.D     R2,2000         ; load address of array y
        L.D     R3,3000         ; load address of array z
        L.D     R4,4000         ; load address of array w
Loop:   LV      V1,R1           ; load vector X
        LV      V2,R2           ; load vector Y
        LV      V4,R4           ; load vector W
        ADDVV.D V3,V1,V2        ; vector-vector add
        MULVV.D V5,V3,V4        ; vector-vector multiply
        SV      R3,V5           ; store the result
        DADDUI  R1,R1,16*8      ; increment array pointer for x[]
        DADDUI  R2,R2,16*8      ; increment array pointer for y[]
        DADDUI  R3,R3,16*8      ; increment array pointer for z[]
        DADDUI  R4,R4,16*8      ; increment array pointer for w[]
        DADDUI  R10,R10,-16     ; decrement loop counter
        BNEZ    R10,Loop        ; branch R10 != zero

c)

How many cycles does it take to execute the scalar code, assuming no memory latencies?

Instruction                                                         Number of times executed


Total number of instruction cycles =


d) How many cycles does it take to execute the vector code, assuming no memory latencies?

Instruction                                                         Number of times executed


Total number of instruction cycles =


e)

What is the speed up?

f)

What would be the speed up if the vector length is 32?
