
ARM processor organization

P. Bakowski

bako@ieee.org

ARM register bank


The register bank stores the processor state (registers r0 to r15).

It has two read ports and one write port, each of which can access any register, plus an additional read port and an additional write port that give special access to r15, the program counter.


ARM barrel shifter


The barrel shifter can shift or rotate one operand by any number of bits.
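The shift and rotate operations can be sketched in software as follows. This is a minimal illustration on 32-bit values, assuming a shift amount between 0 and 31; the function names (`lsl`, `lsr`, `asr`, `ror`) follow the usual ARM mnemonics but are otherwise just helpers chosen for this example, not a model of the hardware.

```python
MASK32 = 0xFFFFFFFF  # keep every result within 32 bits


def lsl(value: int, amount: int) -> int:
    """Logical shift left: zeros enter from the right."""
    return (value << amount) & MASK32


def lsr(value: int, amount: int) -> int:
    """Logical shift right: zeros enter from the left."""
    return (value & MASK32) >> amount


def asr(value: int, amount: int) -> int:
    """Arithmetic shift right: the sign bit (bit 31) is replicated."""
    value &= MASK32
    if value & 0x80000000:
        value -= 1 << 32  # reinterpret as a negative Python integer
    return (value >> amount) & MASK32


def ror(value: int, amount: int) -> int:
    """Rotate right: bits leaving on the right re-enter on the left."""
    amount %= 32
    value &= MASK32
    return ((value >> amount) | (value << (32 - amount))) & MASK32
```

A hardware barrel shifter produces any of these results in a single step, whatever the shift amount, whereas a simple shift register would need one clock cycle per bit position.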

ARM ALU
The ALU performs the arithmetic and logic functions required by the instruction set.

ARM 3-stage pipeline


The address register and incrementer select and hold addresses and generate sequential addresses when required.

The data-in and data-out registers hold data passing to and from memory over d[31:0]; fetched instructions also arrive through the data-in register.

The instruction decoder and associated control logic (the control path) turn instructions into the control signals that drive the datapath.

Three stage pipeline : ARM 1,2,3


FETCH: the instruction is fetched from memory and placed in the instruction pipeline.

DECODE: the instruction is decoded and the datapath control signals are prepared for the next cycle.

EXECUTE: the instruction controls the datapath; the register bank is read, an operand is shifted, and the ALU result is generated and written back into a destination register.

Instruction throughput: 1 instruction per clock cycle.
Instruction latency: 3 clock cycles.

clock cycle    1       2       3        4        5
instr 1        fetch   decode  execute
instr 2                fetch   decode   execute
instr 3                        fetch    decode   execute
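The throughput and latency figures above can be checked with a small scheduling sketch. It assumes an ideal pipeline in which every instruction spends exactly one cycle in each stage and nothing stalls; the function names `schedule` and `total_cycles` are hypothetical helpers for this illustration.

```python
STAGES = ("fetch", "decode", "execute")


def schedule(n_instructions: int) -> list[dict[int, str]]:
    """For each instruction, map clock cycle -> stage occupied in that cycle."""
    return [
        {start + offset: stage for offset, stage in enumerate(STAGES)}
        for start in range(1, n_instructions + 1)
    ]


def total_cycles(n_instructions: int) -> int:
    """Cycles to complete n instructions: fill the pipeline, then one per cycle."""
    if n_instructions == 0:
        return 0
    return len(STAGES) + n_instructions - 1


if __name__ == "__main__":
    for number, row in enumerate(schedule(3), start=1):
        print(f"instr {number}:", row)
    print("total cycles:", total_cycles(3))
```

Each instruction still takes three cycles from fetch to the end of execute (the latency), but once the pipeline is full one instruction completes in every cycle (the throughput).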


ARM 1,2,3 architecture


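[Figure: ARM 1, 2, 3 datapath. Address register and incrementer driving a[31:0]; register bank including the PC; multiplier (M); barrel shifter (BS); ALU fed by the A bus and B bus, with its result on the ALU bus; data-out and data-in registers on d[31:0]; instruction register and control path generating the control signals.]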

Multi cycle execution


The STR (store) instruction needs two execution cycles: an address calculation cycle and a data transfer cycle. It therefore occupies the pipeline for 4 clock cycles: fetch, decode, address calculation, data transfer.

Attention: only one memory transfer is possible per clock cycle, so the data transfer cycle of the STR cannot overlap with an instruction fetch.
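The effect of a multi-cycle instruction on the total cycle count can be estimated with a simplified model. The sketch below assumes that each extra execute cycle delays all following instructions by one cycle, and that ADD needs one execute cycle while STR needs two; the table `EXECUTE_CYCLES` and the function `program_cycles` are assumptions made for this illustration.

```python
FILL_CYCLES = 2  # fetch and decode of the first instruction

# Execute cycles per instruction kind (simplified assumption).
EXECUTE_CYCLES = {"ADD": 1, "STR": 2}


def program_cycles(program: list[str]) -> int:
    """Total clock cycles for a straight-line program in a 3-stage pipeline."""
    if not program:
        return 0
    return FILL_CYCLES + sum(EXECUTE_CYCLES[op] for op in program)


if __name__ == "__main__":
    print(program_cycles(["STR"]))
    print(program_cycles(["ADD", "STR", "ADD"]))
```

Under this model a lone STR takes 4 cycles, matching the count given above, and every STR in a sequence costs one cycle more than a single-cycle instruction.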

Processor performance
The time T required to execute a given program is given by:

$$T = \frac{N_{inst} \times CPI}{f_{clk}}$$

$N_{inst}$: number of instructions in the program
$CPI$: average number of clock cycles per instruction (throughput)
$f_{clk}$: clock frequency
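As a worked example of the execution-time formula, the sketch below evaluates T for a hypothetical program. The numbers (one million instructions, a CPI of 1.5, a 50 MHz clock) are invented for illustration and do not describe any particular ARM core.

```python
def execution_time(n_inst: int, cpi: float, f_clk_hz: float) -> float:
    """T = (Ninst * CPI) / fclk, in seconds."""
    return n_inst * cpi / f_clk_hz


if __name__ == "__main__":
    base = execution_time(1_000_000, 1.5, 50e6)
    faster_clock = execution_time(1_000_000, 1.5, 100e6)  # double fclk
    lower_cpi = execution_time(1_000_000, 1.0, 50e6)      # reduce CPI
    print(base, faster_clock, lower_cpi)
```

Doubling the clock frequency halves T, and lowering the CPI from 1.5 to 1.0 cuts T by a third; these are the two levers discussed on the following slides.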

Since $N_{inst}$ is constant for a given program, there are only two ways to increase performance: increase the clock rate, $f_{clk}$, or reduce the average number of clock cycles per instruction, CPI.

Clock rate increase


Increase the clock rate, $f_{clk}$. This requires the logic in each pipeline stage to be simplified and, therefore, the number of pipeline stages to be increased.

[Figure: a 3-stage pipeline compared with a 5-stage pipeline; each of the five stages does less work, so the clock cycle is shorter.]

Note that the clock rate that can be used depends heavily on the implementation technology.

[Figure: the same 3-stage pipeline with a shorter clock cycle in a new implementation technology.]

More hardware resources


Reduce the average number of clock cycles per instruction, CPI. This requires more parallelism, which means more hardware resources used in a given clock cycle.

With a single memory for data and instructions (D/I M), a cycle performs either a data read or an instruction fetch. With separate data (DM) and instruction (IM) memories, a data read and an instruction fetch can take place in the same cycle.

5-stage pipeline organization


Higher-performance ARM cores employ a 5-stage pipeline and have separate instruction and data memories. Breaking instruction execution down into five stages rather than three reduces the maximum work that must be completed in a clock cycle, and hence allows a higher clock frequency to be used. The separate instruction and data memories, which may be separate caches connected to a unified instruction and data main memory, allow a significant reduction in the core's CPI.


Fetch stage
Pipeline stages: fetch, decode, execute, buffer, write.

FETCH: the instruction is fetched from memory (the I-cache) and placed in the instruction pipeline, on its way to the decoder. The incrementer produces the next PC value.

Decode stage
DECODE: the instruction is decoded and the register operands are read from the register file.

There are three operand read ports in the register file, so most instructions can obtain all their operands in one cycle.

Execute stage
EXECUTE: an operand is shifted and the ALU result is generated.


If the instruction is a load or store, the memory address is computed in the ALU.

Buffer stage
BUFFER/DATA: data memory (the D-cache) is accessed if required. Otherwise the ALU result is simply buffered for one clock cycle to give the same pipeline flow for all instructions.

Write back stage


WRITE-BACK: the results generated by the instruction are written back to the register file, including any data loaded from memory.

Data forwarding
In the 5-stage pipeline, instruction execution is spread across three pipeline stages, so the only way to resolve data dependencies without stalling the pipeline is to introduce forwarding paths.

Data dependencies arise when an instruction needs to use the result of one of its predecessors before that result has returned to the register file.

Forwarding paths (bypasses) allow intermediate results to be passed between stages as soon as they are available. In the 5-stage ARM pipeline, each of the three source operands can be forwarded from any of three intermediate result registers.
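The operand selection performed by forwarding paths can be sketched as below. This is a simplified model, not the actual ARM control logic: `in_flight` is a hypothetical list standing for the three intermediate result registers, ordered newest first, and `read_operand` picks the most recent pending value for a register before falling back to the register file.

```python
from typing import Optional


def read_operand(
    reg: int,
    register_file: list[int],
    in_flight: list[Optional[tuple[int, int]]],
) -> int:
    """Return the newest value of register `reg`.

    `in_flight` holds (destination register, value) pairs that have been
    computed but not yet written back, newest first; None is an empty slot.
    """
    for slot in in_flight:
        if slot is not None and slot[0] == reg:
            return slot[1]      # forwarded: bypasses the register file
    return register_file[reg]   # no pending result: normal register read


if __name__ == "__main__":
    registers = [0] * 16
    registers[1] = 10                      # stale value still in the file
    pending = [(2, 99), None, (1, 42)]     # r1 = 42 not yet written back
    print(read_operand(1, registers, pending))
```

Without the forwarding check, the dependent instruction would have to wait until the value 42 reached the register file, which stalls the pipeline.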

PC organization - compatibility
The programming behavior of the PC, accessed through r15, is based on the operational characteristics of the 3-stage ARM pipeline. The 5-stage pipeline reads the instruction operands one stage earlier, so it would naturally obtain a different r15 value (PC+4 rather than PC+8), which is incompatible with the 3-stage design.


PC organization - solution
This problem is resolved by using the PC value incremented in the fetch stage directly in the decode stage, bypassing the pipeline register between the two stages. PC+4 for the next instruction is equal to PC+8 for the current instruction (4 bytes farther), so the correct r15 value is obtained without additional hardware.
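The address arithmetic behind this solution is small enough to check directly. The sketch below assumes 4-byte instructions; the two function names are hypothetical and only restate the two quantities compared in the text.

```python
INSTRUCTION_BYTES = 4


def r15_seen_by(instruction_address: int) -> int:
    """Value an instruction reads from r15: its own address plus 8."""
    return instruction_address + 2 * INSTRUCTION_BYTES


def fetch_stage_next_pc(fetch_address: int) -> int:
    """PC+4 produced by the incrementer in the fetch stage."""
    return fetch_address + INSTRUCTION_BYTES


if __name__ == "__main__":
    current = 0x8000                       # instruction now in decode
    next_fetch = current + INSTRUCTION_BYTES  # instruction now in fetch
    print(hex(fetch_stage_next_pc(next_fetch)), hex(r15_seen_by(current)))
```

While an instruction is in the decode stage, its successor is in the fetch stage, so the incrementer output for the successor is exactly the r15 value the current instruction must see.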


ARM programming model


The Instruction Set Architecture (ISA) defines the operations that the programmer can use to change the state of the system incorporating the processor. This state usually comprises the values of the data items in the visible registers and the memory. Each instruction performs a defined transformation from the state before the instruction is executed to the state after it has completed.


ARM memory subsystem


ARM memory may be viewed as a linear array of bytes numbered from zero up to $2^{32}-1$.

Data items may be 8-bit bytes, 16-bit half-words or 32-bit words.

Words are always aligned on 4-byte boundaries (the two least significant address bits are zero) and half-words are aligned on even byte boundaries.

The figure shows a little-endian organization: the least significant byte of a word is stored at the lowest byte address.
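The alignment rules and the little-endian byte order can be illustrated with a small byte-array model of memory. This is a sketch using Python's standard `struct` module; the helper names are hypothetical, and the memory is only a few bytes long rather than the full address space.

```python
import struct


def is_word_aligned(address: int) -> bool:
    """Word addresses have their two least significant bits equal to zero."""
    return (address & 0b11) == 0


def is_halfword_aligned(address: int) -> bool:
    """Half-word addresses are even."""
    return (address & 0b1) == 0


def store_word_le(memory: bytearray, address: int, value: int) -> None:
    """Store a 32-bit word in little-endian order at an aligned address."""
    if not is_word_aligned(address):
        raise ValueError("word address must be a multiple of 4")
    struct.pack_into("<I", memory, address, value & 0xFFFFFFFF)


def load_word_le(memory: bytearray, address: int) -> int:
    """Load a 32-bit little-endian word from an aligned address."""
    if not is_word_aligned(address):
        raise ValueError("word address must be a multiple of 4")
    return struct.unpack_from("<I", memory, address)[0]


if __name__ == "__main__":
    mem = bytearray(8)
    store_word_le(mem, 4, 0x11223344)
    print([hex(b) for b in mem[4:8]])  # least significant byte comes first
```

After the store, byte address 4 holds 0x44 and byte address 7 holds 0x11, which is the little-endian layout described above.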

ARM load-store architecture


Data processing instructions (add, subtract, and so on) take their operand values from registers and always place their results into a register.


The only instructions that access memory are those that copy memory values into registers (load instructions) or copy register values into memory (store instructions).


ARM instructions
In general, ARM instructions fall into one of the following three categories: data processing instructions, data transfer instructions, and control flow instructions.


ARM data processing


Data processing instructions use and change only register values.

For example, an instruction can add two registers and place the result in a register: ADD r0, r1, r2 adds r1 and r2 and places the sum in r0.

ARM data transfer


Data transfer instructions copy memory values into registers (load instructions) or copy register values into memory (store instructions). An additional form, useful only in systems code, exchanges a memory value with a register value; it can be used, for example, to implement a test-and-set operation.

ARM control flow


Control flow instructions cause execution to switch to a different address, either permanently (branch instructions), or saving a return address (the link address) so that the original sequence can be resumed (branch and link instructions), or trapping into system code (supervisor calls).

ARM supervisor mode


The ARM processor supports a protected supervisor mode. The protection mechanism ensures that user code cannot gain supervisor privileges without appropriate checks being carried out to ensure that the code is not attempting illegal operations.


The upshot of this for the user-level programmer is that system-level functions, such as I/O drivers, can only be accessed through specified supervisor calls.

ARM I/O programming


The ARM handles I/O (input/output) peripherals (such as disk controllers, network interfaces, and so on) as memory-mapped devices with interrupt support.

The internal registers in these devices appear as addressable locations within the ARM's memory map and may be read and written using the same load and store instructions as any other memory locations.
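Memory-mapped I/O can be modelled as a bus that routes each address either to RAM or to a device register. The sketch below is purely illustrative: the `Bus` class, the base address `DEVICE_BASE` and the byte-wide device registers are all assumptions, not the memory map of any real ARM system.

```python
class Bus:
    """Toy memory map: RAM at low addresses, one device above DEVICE_BASE."""

    DEVICE_BASE = 0x1000_0000  # hypothetical start of the device registers

    def __init__(self, ram_size: int) -> None:
        self.ram = bytearray(ram_size)
        self.device_regs: dict[int, int] = {}  # register offset -> value

    def store_byte(self, address: int, value: int) -> None:
        """The same store operation reaches RAM or a device register."""
        if address >= self.DEVICE_BASE:
            self.device_regs[address - self.DEVICE_BASE] = value & 0xFF
        else:
            self.ram[address] = value & 0xFF

    def load_byte(self, address: int) -> int:
        """The same load operation reads RAM or a device register."""
        if address >= self.DEVICE_BASE:
            return self.device_regs.get(address - self.DEVICE_BASE, 0)
        return self.ram[address]


if __name__ == "__main__":
    bus = Bus(16)
    bus.store_byte(3, 0xAB)                    # ordinary memory location
    bus.store_byte(Bus.DEVICE_BASE + 1, 0x5A)  # device control register
    print(hex(bus.load_byte(3)), hex(bus.load_byte(Bus.DEVICE_BASE + 1)))
```

The program issuing the loads and stores does not distinguish between the two cases; only the address decides which hardware responds.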

ARM I/O interrupts


Peripherals may attract the processor's attention by making an interrupt request using either the normal interrupt (IRQ) input or the fast interrupt (FIQ) input.

Both interrupt inputs are level-sensitive and maskable. Normally most interrupt sources share the IRQ input, with just one or two time-critical sources connected to the higher-priority FIQ input.

Some systems may include direct memory access (DMA) hardware external to the processor to handle high-bandwidth I/O traffic on the system bus.

ARM exceptions
The ARM architecture supports a range of interrupts, traps and supervisor calls, all grouped under the general heading of exceptions.


Exception handling follows the same general pattern in all cases: the current state is saved by copying the PC into r14_exc and the CPSR into SPSR_exc (where exc stands for the exception type); the processor operating mode is changed to the appropriate exception mode; the PC is forced to a value between 0x00 and 0x1C, the particular value depending on the type of exception.
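The three steps of exception entry can be sketched as a state transformation. The vector addresses and mode names below are those of the classic ARM exception model; the dictionary-based processor state, the separate `mode` field and the function `take_exception` are simplifications invented for this example. Real hardware also adjusts the saved return address and updates the mode bits inside the CPSR, which this sketch ignores.

```python
# Exception vector addresses (0x14 is reserved).
VECTORS = {
    "reset": 0x00,
    "undefined": 0x04,
    "swi": 0x08,
    "prefetch_abort": 0x0C,
    "data_abort": 0x10,
    "irq": 0x18,
    "fiq": 0x1C,
}

# Processor mode entered for each exception type.
MODES = {
    "reset": "svc",
    "undefined": "und",
    "swi": "svc",
    "prefetch_abort": "abt",
    "data_abort": "abt",
    "irq": "irq",
    "fiq": "fiq",
}


def take_exception(state: dict, kind: str) -> dict:
    """Return the processor state after entering exception `kind`."""
    mode = MODES[kind]
    new_state = dict(state)
    new_state[f"r14_{mode}"] = state["pc"]     # 1. save the PC
    new_state[f"spsr_{mode}"] = state["cpsr"]  #    and the CPSR
    new_state["mode"] = mode                   # 2. change operating mode
    new_state["pc"] = VECTORS[kind]            # 3. force the PC to the vector
    return new_state


if __name__ == "__main__":
    user = {"pc": 0x8000, "cpsr": 0x10, "mode": "usr"}
    print(take_exception(user, "irq"))
```

Because each exception mode has its own r14 and SPSR, the handler can later restore the interrupted program's PC and CPSR.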


ARM instruction execution


[Figure: ARM datapath used in the instruction execution examples that follow: address register and incrementer on a[31:0]; register bank including the PC; multiplier (M); barrel shifter (BS); ALU with A bus, B bus and ALU bus; data-out and data-in registers on d[31:0]; instruction register and control path.]

Data processing instruction


Register-register operations: both operands are read from the register bank onto the A bus and the B bus, the B bus operand passes through the barrel shifter (BS), and the ALU result is written back to the register bank over the ALU bus.

Register-immediate operations: the second operand is an immediate value taken from the instruction register (ir) and placed on the B bus instead of a register value.

Store instruction
First cycle (compute address): the ALU computes the memory address from a base register and an offset, and the new address is sent to the address register.

Second cycle (store data and auto-index): the register value is sent to memory through the data-out register and, if auto-indexing is used, the base register is updated.

Branch instruction
First cycle (compute branch address): the ALU adds the offset held in the instruction register (ir) to the PC, and the branch address is sent to the address register.

Second cycle (store return address): for a branch and link, the return address is written into r14.

Summary
ARM register bank
ARM barrel shifter and ALU
ARM 3-stage and 5-stage pipelines
ARM programming model
ARM instructions
