
Follows Steve Furber, 'ARM System-on-Chip Architecture': Lecture Notes

1. Course overview
2. Intro to PICOBLAZE, C and Number systems and Boolean Algebra
3. Course overview with microprocessor MU0 (I)
4. Course overview with microprocessor MU0 (II)
5. Verilog HDL
6. Digital system components using schematics and Verilog
7. Combinational logic standard forms. Karnaugh maps
8. Combinational circuits and configurable logic devices
9. Simple sequential circuits, flip-flops
10. Sequential circuits, counters, registers, memories
11. Non-ideal effects in digital circuits
12. Finite State Machines
13. Design of FSMs
14. Design of FSMs
15. Datapaths

16. An introduction to Processor Design


17. The ARM Architecture
18. ARM Assembly Language Programming
19. Programming in C
...

3213: Digital Systems & Microprocessors: L#14_15


ARM history
1983: developed by Acorn Computers
To replace the 6502 in BBC computers
4-man VLSI design team
Its simplicity comes from the inexperience of the team
Matches the need of general-purpose SoCs for reasonable power, performance and die size

1990: ARM (Advanced RISC Machines) founded, owned by Acorn, Apple and VLSI Technology
3213: Digital Systems & Microprocessors: L#22_23
ARM Ltd
Designs and licenses ARM cores but does not fabricate them
Why ARM?
One of the most licensed, and thus most widespread, processor cores in the world
Used in PDAs, cell phones, multimedia players, handheld game consoles, digital TVs and cameras
ARM7: GBA, iPod
ARM9: NDS, PSP, Sony Ericsson, BenQ
ARM11: Apple iPhone, Nokia N93, N800
75% of 32-bit embedded processors
Cortex: BeagleBoard (open hardware)
Used especially in portable devices because of its low power consumption and reasonable performance

3213: Digital Systems & Microprocessors: L#16-17


http://beagleboard.org/



http://alwaysinnovating.com



ARM LPC2368 Dev. Board
in HLABs (Futurlec)


ARM LPC2368 Dev. Board (Futurlec)
Includes NXP LPC2368 ARM microcontroller with 512 KB internal flash program memory
Operating Speed up to 72 MHz
Direct In-Circuit Programming via RS232 Connection for Easy Program Updates
Up to 25 I/O points with easy to connect standard headers
Full Speed USB 2.0 Port
Ethernet LAN 10/100Mb Connection for full networking
58 KB data RAM
6 Channels 10-Bit A/D
1 Channel 10-Bit DAC
2 Channels standard CAN network
Real Time Clock with Battery Back-Up
SD Card Connector for Data Storage and Transfer
JTAG Connector for Program Download and Debug
LCD Connection with Contrast Adjustment
Load and Reset Button
On-Board 3.3V Regulator
Ideal as an Interchangeable Controller for Real-Time Systems



Computers
All modern general-purpose computers employ the principles of
the stored-program digital computer (which dates from the 1940s)
First implemented in the SSEM ('Baby'), June 1948, University of Manchester

The Small-Scale Experimental Machine, known as SSEM, or the "Baby", was designed and built at
The University of Manchester, and made its first successful run of a program on June 21st
1948. It was the first machine that had all the components now classically regarded as
characteristic of the basic computer. Most importantly it was the first computer that could store
not only data but any (short!) user program in electronic memory and process it at electronic
speed.

Most advances in computing are due to electronics... but...

Computer architecture: the user's view (instruction set, visible
registers, memory management, ...)
Computer organisation: the user-invisible implementation (pipeline structure,
transparent cache, ...)


The state in a stored-program digital computer

[Figure: a processor (with its registers) sends addresses to a memory and exchanges instructions and data with it; the memory holds both instructions and data and spans addresses 00..00 to FF..FF (hexadecimal)]


MU0 A simple microprocessor
A simple form of processor can be built from:
Program counter (PC)
Accumulator (ACC) or working register
Arithmetic logic unit (ALU)
Instruction register (IR)
Instruction decode and control logic that employs the above
components to achieve the desired results from each instruction


MU0 is a 16-bit machine with a 12-bit address space

Instructions are 16 bits long, with a 4-bit opcode and a 12-bit address field


The datapath: all the components that carry (buses), store (registers) or
process (ALU, mux) bits in parallel form the datapath.
The datapath is described at the register transfer level (RTL).


The control logic: everything else, such as instruction decode and control, designed using the FSM
approach.

MU0 Datapath design
We need a guiding principle to limit the design possibilities; in microprocessors
it is usually based on a clocking constraint:

Each instruction takes a number of clock cycles equal to the number
of memory accesses it must make

We assume an instruction starts when it arrives in the
instruction register. There are then two steps to execute an instruction:

1) Access the memory operand and perform the desired operation

2) Fetch the next instruction to be executed

The processor must start in a known state; we can achieve this with a reset.


MU0 control logic



The MU0 instruction format
12-bit address space => 8 Kbytes of memory

4096 individually addressable 16-bit memory locations

4-bit opcode => 16 possible instructions

Only 8 are used, leaving room for expansion: good practice

4 bits    12 bits
opcode    S
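
The field layout above can be sketched in a few lines of Python. This is only an illustration of the 4-bit opcode / 12-bit address split; the function names `decode` and `encode` are hypothetical and not part of the course material.

```python
def decode(word):
    """Split a 16-bit MU0 instruction word into (opcode, S)."""
    if not 0 <= word <= 0xFFFF:
        raise ValueError("MU0 instructions are 16 bits wide")
    opcode = (word >> 12) & 0xF   # top 4 bits
    s = word & 0x0FFF             # bottom 12 bits: the address field S
    return opcode, s


def encode(opcode, s):
    """Pack a 4-bit opcode and a 12-bit address into one instruction word."""
    if not (0 <= opcode <= 0xF and 0 <= s <= 0xFFF):
        raise ValueError("field out of range")
    return (opcode << 12) | s


if __name__ == "__main__":
    print(decode(0x3007))       # opcode 3, address 7
    print(hex(encode(1, 6)))    # the word for opcode 1, address 6
```

Because S is 12 bits wide, `encode` rejects any address above 0xFFF, which is exactly the 4096-location limit stated above.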



The MU0 instruction set
First four have two memory accesses and will need two clock cycles

Last four could execute in one cycle


Instruction   Opcode   Effect
LDA S         0000     ACC := mem16[S]
STO S         0001     mem16[S] := ACC
ADD S         0010     ACC := ACC + mem16[S]
SUB S         0011     ACC := ACC - mem16[S]
JMP S         0100     PC := S
JGE S         0101     if ACC >= 0 PC := S
JNE S         0110     if ACC != 0 PC := S
STP           0111     stop



The MU0 control
Next we need to determine exactly which control signals (logic levels) are needed to
make the datapath execute the correct function for each opcode

We assume that all registers change state on the rising edge of the clock
(cf. Furber, who uses the negative edge, probably because the external SRAM used for the
actual memory is positive-edge triggered).

For the registers, the control signals allow or prevent transitions at the clock edge.

There are also feedback signals from the datapath to the control FSM:
the opcode bits, and signals from the accumulator indicating whether its contents
are zero or negative, which control the respective conditional jump instructions.

All we need to do is develop a two-state FSM to generate the control signals

Since there are just two states and lots of control
outputs => use a next-state (NS) table and forget the state diagram (SD) approach


FSMs have no memory of
outputs
[Figure: two-state diagram; State 1 and State 2 each list their outputs, and the transitions between them are labelled with input conditions ("some inputs", "other inputs")]


Memory comes from memory
Here is how a 16 bit register...

module vreg16( clk, q, d, en, rs );

  output [15:0] q;
  input  [15:0] d;
  input         clk;
  input         en;   // enable: the register only changes when en is high
  input         rs;   // synchronous reset, only effective when en is high

  reg [15:0] q;

  always @(posedge clk) begin
    if (en && ~rs)     q <= d;       // load
    else if (en && rs) q <= 16'h0;   // clear
    else               q <= q;       // hold
  end

endmodule // vreg16


Dealing with output don't cares
[Figure, from Furber: a register with an input databus, an output databus and a memory address bus, controlled by Enable (Eno), Clear (Clr) and Read/Write (R/W)]

The don't cares in the FSM output set have a different meaning.

As the control logic must generate these signals, we must eventually
define what they will be (not X's)

Note that Eno is always defined; this is because it is essential to know whether the
register is to change or not on any given clock edge. If it is changing (Eno = 1)
then R/W and Clr control what it is doing; however, if it is not changing
(Eno = 0) then it doesn't matter what value is presented to the register: it will
ignore it.

Allowing this latitude at this stage gives more freedom in the logic
reduction, simpler equations and thus smaller (and faster) circuits.

MU0 datapath example
[Figure: PC, IR, ACC and ALU connected to the memory by the address bus and the data bus, together with the control logic]




MU0 register transfer level organization

[Figure: the MU0 datapath at register transfer level: memory (MEMrq, RnW), IR (IRce, opcode output), PC (PCce), ACC (ACCce, ACCoe, outputs ACC[15] and ACCz), ALU (ALUfs, inputs A and B), and multiplexers controlled by Asel and Bsel]


MU0 control logic
             Inputs                                | Outputs
Instruction  Opcode  Reset  Ex/ft  ACCz  ACC15     | Asel  Bsel  ACCce  PCce  IRce  ACCoe  ALUfs  MEMrq  RnW  Ex/ft
Reset        xxxx    1      x      x     x         | 0     0     1      1     1     0      =0     1      1    0
LDA S        0000    0      0      x     x         | 1     1     1      0     0     0      =B     1      1    1
             0000    0      1      x     x         | 0     0     0      1     1     0      B+1    1      1    0
STO S        0001    0      0      x     x         | 1     x     0      0     0     1      x      1      0    1
             0001    0      1      x     x         | 0     0     0      1     1     0      B+1    1      1    0
ADD S        0010    0      0      x     x         | 1     1     1      0     0     0      A+B    1      1    1
             0010    0      1      x     x         | 0     0     0      1     1     0      B+1    1      1    0
SUB S        0011    0      0      x     x         | 1     1     1      0     0     0      A-B    1      1    1
             0011    0      1      x     x         | 0     0     0      1     1     0      B+1    1      1    0
JMP S        0100    0      x      x     x         | 1     0     0      1     1     0      B+1    1      1    0
JGE S        0101    0      x      x     0         | 1     0     0      1     1     0      B+1    1      1    0
             0101    0      x      x     1         | 0     0     0      1     1     0      B+1    1      1    0
JNE S        0110    0      x      0     x         | 1     0     0      1     1     0      B+1    1      1    0
             0110    0      x      1     x         | 0     0     0      1     1     0      B+1    1      1    0
STOP         0111    0      x      x     x         | 1     x     0      0     0     0      x      0      1    0
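
One way to read the control table is as a pure function from the inputs (opcode, Reset, Ex/ft, ACCz, ACC15) to the output signals. The sketch below transcribes the rows of the table into Python; the function name `control`, the key `next_Ex_ft` and the use of `None` for a don't care are illustrative choices, not part of the original design.

```python
X = None  # don't care: the logic designer is free to choose 0 or 1

NAMES = ("Asel", "Bsel", "ACCce", "PCce", "IRce", "ACCoe",
         "ALUfs", "MEMrq", "RnW", "next_Ex_ft")

RESET = (0, 0, 1, 1, 1, 0, "=0", 1, 1, 0)
FETCH = (0, 0, 0, 1, 1, 0, "B+1", 1, 1, 0)   # rows with Ex/ft = 1, and jumps not taken
JUMP = (1, 0, 0, 1, 1, 0, "B+1", 1, 1, 0)    # JMP, and JGE/JNE when the jump is taken
STOP = (1, X, 0, 0, 0, 0, X, 0, 1, 0)
EXECUTE = {                                   # rows with Ex/ft = 0
    0b0000: (1, 1, 1, 0, 0, 0, "=B", 1, 1, 1),   # LDA S
    0b0001: (1, X, 0, 0, 0, 1, X, 1, 0, 1),      # STO S
    0b0010: (1, 1, 1, 0, 0, 0, "A+B", 1, 1, 1),  # ADD S
    0b0011: (1, 1, 1, 0, 0, 0, "A-B", 1, 1, 1),  # SUB S
}


def control(opcode, reset, ex_ft, acc_z, acc15):
    """Return the control outputs for one clock cycle as a dict."""
    if reset:
        row = RESET
    elif opcode in EXECUTE:
        row = FETCH if ex_ft else EXECUTE[opcode]
    elif opcode == 0b0100:                    # JMP S
        row = JUMP
    elif opcode == 0b0101:                    # JGE S: jump unless ACC is negative
        row = FETCH if acc15 else JUMP
    elif opcode == 0b0110:                    # JNE S: jump unless ACC is zero
        row = FETCH if acc_z else JUMP
    elif opcode == 0b0111:                    # STOP
        row = STOP
    else:
        raise ValueError("opcode not used by MU0")
    return dict(zip(NAMES, row))


if __name__ == "__main__":
    print(control(0b0010, reset=0, ex_ft=0, acc_z=0, acc15=0))
```

Note how few distinct rows there are: every fetch cycle and every jump that is not taken produce the same outputs, which is why a next-state table is enough for this two-state machine.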



MU0 machine language program
prog.lst
0006
3007
1006
0006
5000
7000
0004
0001
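
To see what the eight words of prog.lst do, a minimal instruction-level simulator helps. The sketch below is an assumption-laden illustration, not the course's model: it assumes the listing is loaded at address 0, that reset sets PC to 0, that arithmetic wraps to 16 bits (two's complement), and that JGE tests the sign bit of ACC. The name `run_mu0` is hypothetical.

```python
def run_mu0(program, max_steps=1000):
    """Execute MU0 words loaded at address 0; return (acc, memory, steps)."""
    mem = list(program) + [0] * (4096 - len(program))
    acc, pc, steps = 0, 0, 0
    while steps < max_steps:
        word = mem[pc]
        opcode, s = word >> 12, word & 0xFFF
        pc = (pc + 1) & 0xFFF            # default next instruction: PC + 1
        steps += 1
        if opcode == 0x0:                # LDA S
            acc = mem[s]
        elif opcode == 0x1:              # STO S
            mem[s] = acc
        elif opcode == 0x2:              # ADD S
            acc = (acc + mem[s]) & 0xFFFF
        elif opcode == 0x3:              # SUB S
            acc = (acc - mem[s]) & 0xFFFF
        elif opcode == 0x4:              # JMP S
            pc = s
        elif opcode == 0x5:              # JGE S: jump if sign bit clear
            if not acc & 0x8000:
                pc = s
        elif opcode == 0x6:              # JNE S
            if acc != 0:
                pc = s
        elif opcode == 0x7:              # STP
            break
        else:
            raise ValueError("opcode not used by MU0")
    return acc, mem, steps


PROG = [0x0006, 0x3007, 0x1006, 0x0006, 0x5000, 0x7000, 0x0004, 0x0001]

if __name__ == "__main__":
    acc, mem, steps = run_mu0(PROG)
    print(hex(acc), hex(mem[6]), steps)
```

Under these assumptions the program reads as LDA 6, SUB 7, STO 6, LDA 6, JGE 0, STP, followed by the data words 4 and 1: it counts location 6 down by one per pass and stops once the value goes negative.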
MU0 Function



MU0 Extensions
MU0 is a simple processor but not useful as a compiler target.
Some extensions seem appropriate:

Extending the address space
Adding more addressing modes
Allowing the PC to be saved in order to support
a subroutine mechanism
Adding more registers, supporting interrupts,
etc.
More peripherals: watchdog timer, ...

Overall, MU0's instruction set is not a good place to start, so let
us redesign.

Where to start?
Let us start with the core of the microprocessor functionality:
the instruction

Let us look at a basic ADD, for example. It needs:

Some bits to differentiate it from other instructions
Some bits to specify the operand addresses
Some bits to specify where the result should be
placed (the destination)
Some bits to specify the address of the next
instruction


A 4-address instruction format
An assembly language instruction might look like

ADD d, s1, s2, next_i ; d := s1 + s2

Requires 4n + f bits

f bits     n bits       n bits       n bits        n bits
function   op 1 addr.   op 2 addr.   dest. addr.   next_i addr.


A 3-address instruction format
One way to reduce the number of bits required for each instruction
is to make the address of the next instruction implicit.

We assume that the next instruction is at PC + sizeof(instruction)
(note that in MU0 the default next instruction was at PC+1; if
we generalise, an instruction may occupy more than one address).

ADD d, s1, s2 ; d := s1 + s2

f bits     n bits       n bits       n bits
function   op 1 addr.   op 2 addr.   dest. addr.


A 2-address instruction format

A further saving can be made by making the
destination register the same as one of the source registers

ADD d, s1 ; d := d + s1

f bits     n bits       n bits
function   op 1 addr.   dest. addr.


A 1-address (accumulator)
instruction format
If the destination register is implicit then it is often
called the accumulator

ADD s1 ; accumulator := accumulator + s1

f bits     n bits
function   op 1 addr.


A 0-address instruction format

Finally, all registers may be made implicit by introducing
an evaluation stack

ADD ; top_of_stack := top_of_stack + next_on_stack

f bits
function
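
A tiny evaluation-stack interpreter illustrates the 0-address idea. This is a hedged sketch: the tuple encoding and the `PUSH` instruction (which necessarily carries an operand, so it is not itself 0-address) are assumptions made for the example.

```python
def run_stack(code):
    """Execute a list of stack-machine instructions; return the final stack."""
    stack = []
    for instr in code:
        op = instr[0]
        if op == "PUSH":                 # PUSH value: the only op with an operand
            stack.append(instr[1])
        elif op == "ADD":                # ADD: no operand addresses at all
            top = stack.pop()
            next_on_stack = stack.pop()
            stack.append(top + next_on_stack)
        else:
            raise ValueError(f"unknown instruction {op!r}")
    return stack


if __name__ == "__main__":
    # Evaluates 1 + 2 + 4 with both ADDs taking their operands from the stack.
    print(run_stack([("PUSH", 1), ("PUSH", 2), ("ADD",), ("PUSH", 4), ("ADD",)]))
```

The ADD instruction needs only its function bits, because both sources and the destination are implied by the stack.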



Examples of n-address use
All of the above have been used in processor instruction sets
apart from the 4-address form which, although it is used
internally in some microcode designs, is
unnecessarily expensive

The Inmos transputer uses a 0-address evaluation
stack architecture

The MU0 example in the previous section is a 1-address
architecture

The Thumb instruction set, used for high code density in
ARM micros, is predominantly of the 2-address form

The standard ARM instruction set uses a 3-address
architecture
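
The cost of each format follows directly from the field layouts: f function bits plus n bits per explicit address. The helper below is a small sketch of that arithmetic; the name `instruction_bits` is hypothetical, and the f and n values in the usage lines are only examples (f = 4, n = 12 are the MU0 field widths).

```python
def instruction_bits(n_addresses, f, n):
    """Bits needed by an instruction with f function bits and n-bit addresses."""
    if not 0 <= n_addresses <= 4:
        raise ValueError("formats discussed here have 0 to 4 addresses")
    return f + n_addresses * n


if __name__ == "__main__":
    for k in range(4, -1, -1):
        print(f"{k}-address: {instruction_bits(k, f=4, n=12)} bits")
```

With the MU0 field widths the 1-address form comes out at 16 bits, matching the MU0 instruction length, while a 4-address form with the same fields would need 52 bits.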
Addresses

An address in the MU0 architecture is the absolute address of
the memory location which contains the operand

The addresses in the three-address ARM instruction format are
register specifiers

In general, the term 3-address architecture refers to an
instruction set in which the two sources and the destination
can be specified independently


Instruction Types
Data processing instructions such as add, subtract, multiply

Data movement instructions that copy data from one place
to another, such as move (mv)

Control flow instructions that switch control of a program from
one place to another, possibly depending on data values

Special instructions that change the processor's state, such as
entering a privileged mode to carry out an operating system function

Note that many instructions fit into more than one class


Addressing modes
1. Immediate addressing: the operand of an instruction is
present in the instruction itself

Example, 1 + 2:

ADD 0x01 0x02


Addressing modes
2. Absolute addressing: the instruction contains the absolute
memory address of the operand

ADD 0x0007 0x0008

Address   Contents
0x0006
0x0007    0x01
0x0008    0x02


Addressing modes
3. Indirect addressing: the instruction contains the binary
address of a memory location that contains the binary address
of a value

ADD 0x01 0x0007      ; 0x01 is immediate, 0x0007 is indirect

Address   Contents
0x0006
0x0007    0x0008
0x0008    0x02


Addressing modes
4. Register addressing: the instruction contains the
number of a register that contains the operand

ADD 0x01 regnum      ; 0x01 is immediate, regnum is a register number

Register   Contents
regnum     0x02


Addressing modes
5. Register indirect addressing: the instruction contains the
number of a register which contains the binary address of the
operand in memory

ADD 0x01 regnum      ; 0x01 is immediate, regnum is a register number

Register   Contents
regnum     0x0008

Address   Contents
0x0006
0x0007
0x0008    0x02


Addressing modes
6. Base plus offset addressing: the instruction contains a
register number (the base) and an offset; base plus offset selects the actual location that
contains the operand

Actual example: the parallel port registers

// Read from the status port (BASE+1)
printf("status: %d\n", inb(BASEPORT + 1));

// Set the data signals (D0-7) of the port to all low (0)
outb(0, BASEPORT);
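
The six addressing modes differ only in how many look-ups separate the instruction field from the operand. The sketch below models memory and the register file as Python dictionaries; the function name `fetch_operand`, the mode strings and the register numbers 0 to 2 are assumptions for illustration, while the memory contents follow the slide examples (0x0007 holds 0x0008, 0x0008 holds 0x02).

```python
def fetch_operand(mode, field, regs, mem, offset=0):
    """Return the operand selected by an instruction field under a given mode."""
    if mode == "immediate":              # the field is the operand
        return field
    if mode == "absolute":               # the field is the operand's address
        return mem[field]
    if mode == "indirect":               # the field addresses a pointer in memory
        return mem[mem[field]]
    if mode == "register":               # the field is a register number
        return regs[field]
    if mode == "register_indirect":      # the register holds the operand's address
        return mem[regs[field]]
    if mode == "base_plus_offset":       # address = base register + offset
        return mem[regs[field] + offset]
    raise ValueError(f"unknown addressing mode {mode!r}")


MEM = {0x0007: 0x0008, 0x0008: 0x02}
REGS = {0: 0x02, 1: 0x0008, 2: 0x0006}   # hypothetical register contents

if __name__ == "__main__":
    for mode, field in [("immediate", 0x02), ("absolute", 0x0008),
                        ("indirect", 0x0007), ("register", 0),
                        ("register_indirect", 1)]:
        print(mode, fetch_operand(mode, field, REGS, MEM))
    print("base_plus_offset", fetch_operand("base_plus_offset", 2, REGS, MEM, offset=2))
```

Every call in the usage lines reaches the same operand value, which makes the trade-off visible: the more indirect the mode, the more memory or register accesses are needed before the ALU can start.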



Control Flow Instructions
A control flow instruction is used to modify the PC directly

The simplest such instructions are called 'branches' or 'jumps'

Since most of these are short in range, the target is
normally specified relative to the PC

        B LABEL
        ...
LABEL   ...

The assembler works out the displacement which must
be added to the PC in order to force the PC to point to LABEL

The maximum range of the branch is determined by the number of bits
allocated to the displacement.
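
The assembler's calculation can be sketched as follows. This is a simplified model under stated assumptions: the displacement is a signed two's complement field, and it is added to the address of the branch itself (real processors, ARM included, add it to a PC value that has already moved ahead). The name `branch_displacement` is hypothetical.

```python
def branch_displacement(branch_addr, target_addr, bits):
    """Signed displacement from branch to target, checked against the field width."""
    displacement = target_addr - branch_addr
    lowest = -(1 << (bits - 1))          # most negative value of a signed field
    highest = (1 << (bits - 1)) - 1      # most positive value
    if not lowest <= displacement <= highest:
        raise ValueError("branch target out of range for this field width")
    return displacement


if __name__ == "__main__":
    print(branch_displacement(0x100, 0x120, bits=8))    # forward branch
    print(branch_displacement(0x120, 0x100, bits=8))    # backward branch
```

An 8-bit field reaches only 128 locations backwards and 127 forwards; each extra displacement bit doubles the range.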



Conditional branches and condition code
registers
Some processors allow the values of general-purpose registers to
control whether a program will branch or not (as in MU0):

Branch if a particular register is zero / non-zero / negative etc.

Branch if the contents of two specified registers are equal or not

Some processors have special-purpose registers (condition code
registers or flag registers) that store flags recording the results of
particular instructions,

for example whether the result was negative or a carry resulted


Subroutines
Sometimes a branch is made to a subroutine via a call; the subroutine executes
and then returns to the caller on termination

Since the same subroutine may be called from many places, a
record of the calling address in the calling program must be kept

The calling program could compute a suitable return address and place it in
a memory location accessible to the subroutine

The return address could be pushed onto a stack

The return address could be placed in a register

The RET command in the subroutine places the return address in the
PC
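
The stack-based option can be sketched with two small functions. This is an illustrative model only: the names `call` and `ret` and the one-address instruction size are assumptions, and a Python list stands in for the stack memory.

```python
def call(pc, target, stack, size=1):
    """CALL: save the address of the following instruction, then jump to target."""
    stack.append(pc + size)              # return address
    return target                        # new PC


def ret(stack):
    """RET: pop the saved return address; it becomes the new PC."""
    return stack.pop()


if __name__ == "__main__":
    stack = []
    pc = call(10, 100, stack)            # main program calls a subroutine at 100
    pc = call(pc + 3, 200, stack)        # which calls another one from address 103
    print(pc, stack)
    pc = ret(stack)                      # back into the first subroutine
    pc = ret(stack)                      # back into the main program
    print(pc, stack)
```

Because each call pushes its own return address, nested calls unwind in the right order, which a single fixed memory location or a single register cannot provide without extra saving.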



A CALL/RETURN in MU0

To call a subroutine, a stack is needed to store the contents of the PC register

[Figures: CALL/RETURN timing; CALL fetch; CALL execute; RET fetch; RET execute]


System Calls and Exceptions
Some instructions, such as those that directly access functions of the
operating system of a microprocessor, are privileged

System calls pass through protection barriers in a controlled way

E.g. the inb and outb in the parallel port program

Exceptions are operations performed by a microprocessor to handle
an error condition in a controlled way


Continue our Architecture Rethink

Why not ask what a computer does with its time?

Typical dynamic instruction usage

Instruction type        Dynamic usage
Data movement           43%
Control flow            23%
Arithmetic operations   15%
Comparisons             13%
Logical operations       5%
Other                    1%




Pipelined instruction execution
1. Fetch the instruction from memory (fetch)
2. Decode to see what it does (dec)
3. Access any operands from the register bank (reg)
4. Combine the operands to form the result or a memory address (ALU)
5. Access memory for a data operand (mem)
6. Write the result back to the register bank (res)

instruction
1   fetch  dec    reg    ALU    mem    res
2          fetch  dec    reg    ALU    mem    res
3                 fetch  dec    reg    ALU    mem    res
                                                     time -->
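
The staggered diagram implies a simple timing rule: instruction i (counting from 1) occupies stage number (cycle - i) during a given cycle. The sketch below encodes that rule for an ideal pipeline with no stalls; the function names are hypothetical.

```python
STAGES = ["fetch", "dec", "reg", "ALU", "mem", "res"]


def stage_at(instruction, cycle):
    """Stage occupied by a 1-based instruction number in a 1-based clock cycle."""
    index = cycle - instruction
    if 0 <= index < len(STAGES):
        return STAGES[index]
    return None                          # not yet started, or already finished


def total_cycles(n_instructions):
    """Cycles needed to complete n instructions in an ideal, stall-free pipeline."""
    if n_instructions == 0:
        return 0
    return len(STAGES) + n_instructions - 1


if __name__ == "__main__":
    print(stage_at(2, 4))                # second instruction, fourth cycle
    print(total_cycles(3), "cycles pipelined versus", 3 * len(STAGES), "unpipelined")
```

Once the pipeline is full, one instruction completes per cycle, so for long instruction sequences the cycle count approaches the instruction count.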



Read-after-write pipeline hazard

instruction
1   fetch  dec    reg    ALU    mem    res
2          fetch  dec    [---- stall ----]   reg    ALU    mem    res
                                                                  time -->

Instruction 2 cannot read its operand (reg) until instruction 1 has written its result (res).


Branches cause even more
problems for pipelining
The fetch step of the following instruction is affected by the branch target
computation

Unfortunately, subsequent fetches will be taking place while the branch
is being decoded and before it has been recognised as a branch

If, for example, the branch target is calculated in the ALU stage,
then three further fetches will have occurred before the branch target is available

Solutions:

Extra hardware to allow the branch target to be calculated earlier

Speculative calculation of the branch target during the DEC cycle,
i.e. before we know it is a branch!

Pipelined branch behaviour
instruction
1 (branch)          fetch  dec    reg    ALU    mem    res
2                          fetch  dec    reg    ALU    mem    res
3                                 fetch  dec    reg    ALU    mem    res
4                                        fetch  dec    reg    ALU    mem    res
5 (branch target)                               fetch  dec    reg    ALU    mem    res
                                                                                   time -->
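
The number of wasted fetches depends only on the stage in which the branch target becomes known. The sketch below assumes the target is available at the end of that stage and that every taken branch discards all instructions fetched in the meantime; the function names are hypothetical.

```python
STAGES = ["fetch", "dec", "reg", "ALU", "mem", "res"]


def wasted_fetches(resolve_stage):
    """Instructions fetched after a taken branch before its target can be fetched."""
    return STAGES.index(resolve_stage)


def cycles_with_branches(n_instructions, taken_branches, resolve_stage="ALU"):
    """Cycle count for n useful instructions, adding the penalty per taken branch."""
    ideal = len(STAGES) + n_instructions - 1
    return ideal + taken_branches * wasted_fetches(resolve_stage)


if __name__ == "__main__":
    print(wasted_fetches("ALU"))         # instructions 2, 3 and 4 in the diagram
    print(wasted_fetches("dec"))         # with the target computed during decode
    print(cycles_with_branches(10, taken_branches=2))
```

Moving the target calculation from the ALU stage to the decode stage cuts the penalty from three fetches to one, which is the motivation for the two solutions listed earlier.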



The Reduced Instruction Set Computer (RISC) versus
the Complex Instruction Set Computer (CISC)

RISC features:

Hard-wired instruction decode logic (like MU0). CISC processors
used large microcode ROMs to decode their instructions (still in use...)

Pipelined execution. CISC processors usually had little or no pipelining

Single clock cycle execution. CISC processors typically took many clock cycles
to execute a single instruction (it was thought to be economical
to design computers with a rich, high-level instruction set, to
narrow the gap to high-level languages such as C and
to make up for the lack of pipelining)


Complex Instruction Set Computer
(CISC, x86, 68000)

From wired logic to microcode control
  Temptingly easy extensibility
Performance tuning
  HW implementation of some high-level functions
Marketing
  Add successful instructions of competitors
  New feature hype
Compatibility: only extensions are possible


CISC Problems

Performance tuning unsuccessful
  Rarely used high-level instructions
  Sometimes slower than the equivalent sequence of simple instructions
High complexity
  Pipelining bottlenecks, lower clock rates
  Interrupt handling can complicate matters even more
Marketing
  Prolonged design time and frequent microcode errors hurt
  competitiveness


Reduced Instruction Set Computer
(RISC, ARM, MIPS, PPC)

Low complexity
  Generally results in overall speedup
  Less error-prone implementation by hardwired logic or
  simple microcode
VLSI implementation advantages
  Fewer transistors
  Extra space: more registers, cache
Marketing
  Reduced design time, fewer errors and more options
  increase competitiveness


RISC Compiler Issues

The compilers themselves
  Computationally more complex
  More portable
The compiler writer
  Fewer instructions: probably an easier job
  Simpler instructions: probably fewer bugs
  Can reuse optimisation techniques


RISC vs CISC?
CISC
  Effectively realizes one particular High Level Language Computer System in HW:
  recurring HW development costs when change is needed

RISC
  Allows effective realisation of any High Level Language Computer System in SW:
  recurring SW development costs when change is needed

MU0 is a RISC and so is easy to learn.

ARM is the flagship RISC computer


RISC vs CISC
- google the seminal (non-technical) article



ARM Architecture

- Originally Acorn RISC Machine

- Later Advanced RISC Machines Limited

Acorn Computers Limited of Cambridge
originally developed ARM from 1983 to 1985,
based on the ideas of a one-year student project at
Stanford/UC Berkeley that had led to a more cost-effective,
high-performance design able to compete
with CISC (viz. PDP-11 and VAX from DEC)


ARM's visible registers

user mode    fiq mode    svc mode    abort mode    irq mode    undefined mode
r0 to r7
r8           r8_fiq
r9           r9_fiq
r10          r10_fiq
r11          r11_fiq
r12          r12_fiq
r13          r13_fiq     r13_svc     r13_abt       r13_irq     r13_und
r14          r14_fiq     r14_svc     r14_abt       r14_irq     r14_und
r15 (PC)
CPSR         SPSR_fiq    SPSR_svc    SPSR_abt      SPSR_irq    SPSR_und

The registers in the user mode column are usable in user mode; the registers in the other columns are available in system modes only. An empty entry means the mode uses the user mode register.

ARM CPSR format

bits    31-28   27-8     7   6   5   4-0
field   NZCV    unused   I   F   T   mode
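
The bit positions in the CPSR format can be unpacked with shifts and masks. The sketch below follows the layout given above; the mode-number table is taken from the ARM architecture rather than from the slide, and the function name `decode_cpsr` is hypothetical.

```python
MODES = {
    0b10000: "user", 0b10001: "fiq", 0b10010: "irq", 0b10011: "svc",
    0b10111: "abort", 0b11011: "undefined", 0b11111: "system",
}


def decode_cpsr(cpsr):
    """Unpack a 32-bit CPSR value into its flag, control and mode fields."""
    return {
        "N": (cpsr >> 31) & 1,           # negative
        "Z": (cpsr >> 30) & 1,           # zero
        "C": (cpsr >> 29) & 1,           # carry
        "V": (cpsr >> 28) & 1,           # overflow
        "I": (cpsr >> 7) & 1,            # IRQ disable
        "F": (cpsr >> 6) & 1,            # FIQ disable
        "T": (cpsr >> 5) & 1,            # Thumb state
        "mode": MODES.get(cpsr & 0x1F, "reserved"),
    }


if __name__ == "__main__":
    print(decode_cpsr(0x600000D3))
```

The example value has Z and C set, both interrupt types disabled, ARM (not Thumb) state and supervisor mode.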
ARM memory organization

bit 31                      bit 0
 23     22     21     20
 19     18     17     16        (word16)
 15     14     13     12        (half-word14, half-word12)
 11     10      9      8        (word8)
  7      6      5      4        (byte6, half-word4)
  3      2      1      0        (byte3, byte2, byte1, byte0)

The numbers are byte addresses.

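
In the layout above, byte 0 sits at the least significant end of word 0, which corresponds to little-endian byte ordering. The sketch below models that ordering, together with the alignment of half-words and words, on a small byte array; little-endian operation is an assumption read off the figure, and the helper names are hypothetical.

```python
def store(memory, addr, value, size):
    """Store a size-byte value (1, 2 or 4) at an address aligned to its size."""
    if addr % size:
        raise ValueError("unaligned access")
    memory[addr:addr + size] = value.to_bytes(size, "little")


def load(memory, addr, size):
    """Load a size-byte value (1, 2 or 4) from an address aligned to its size."""
    if addr % size:
        raise ValueError("unaligned access")
    return int.from_bytes(memory[addr:addr + size], "little")


if __name__ == "__main__":
    memory = bytearray(24)
    store(memory, 8, 0x11223344, 4)                  # word8
    print(hex(load(memory, 8, 1)))                   # byte 8: least significant byte
    print(hex(load(memory, 10, 2)))                  # half-word 10: upper half of word8
```

Words must start at addresses that are multiples of four and half-words at even addresses, which is why the figure shows word8, word16, half-word12 and half-word14 but no word at address 6.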
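The structure of the ARM cross-development toolkit

[Figure: C source and C libraries feed the C compiler, and asm source feeds the assembler; both produce .aof object files, which the linker combines with object libraries into an .axf image with debug information; ARMsd then runs the image on a system model (ARMulator) or on a development board]
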
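ARM shift operations

[Figure: LSL #5 (five zeros shifted in at bit 0); LSR #5 (five zeros shifted in at bit 31); ASR #5 with a positive operand (bit 31 is 0, five zeros shifted in at the top) and with a negative operand (bit 31 is 1, five ones shifted in at the top); ROR #5 (bits leaving bit 0 re-enter at bit 31); RRX (rotate right by one bit through the carry flag C)]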
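
The five shift types can be modelled on 32-bit values with Python integers. This is a sketch for shift amounts between 0 and 31 only; the function names mirror the mnemonics, and `rrx` returns the new carry as a second result, which is a choice made for this example.

```python
MASK = 0xFFFFFFFF


def lsl(x, n):
    """Logical shift left: zeros enter at bit 0."""
    return (x << n) & MASK


def lsr(x, n):
    """Logical shift right: zeros enter at bit 31."""
    return (x & MASK) >> n


def asr(x, n):
    """Arithmetic shift right: copies of bit 31 enter at the top."""
    x &= MASK
    if x & 0x80000000:
        return ((x >> n) | (MASK << (32 - n))) & MASK
    return x >> n


def ror(x, n):
    """Rotate right: bits leaving bit 0 re-enter at bit 31."""
    x &= MASK
    return ((x >> n) | (x << (32 - n))) & MASK


def rrx(x, carry):
    """Rotate right by one through carry; returns (result, new carry)."""
    x &= MASK
    return (x >> 1) | (carry << 31), x & 1


if __name__ == "__main__":
    print(hex(asr(0x80000000, 5)), hex(asr(0x40000000, 5)))
    print(hex(ror(0x0000001F, 5)))
```

The two ASR calls show the sign extension from the figure: the negative operand gains five ones at the top, the positive operand five zeros.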
Multiple register transfer addressing modes

[Figure: four memory maps spanning addresses 1000 to 1018 (hexadecimal), one each for STMIA r9!, {r0,r1,r5}, STMIB r9!, {r0,r1,r5}, STMDA r9!, {r0,r1,r5} and STMDB r9!, {r0,r1,r5}, showing where r0, r1 and r5 are stored and the value of r9 before and after the transfer; r9 moves between 1000 and 100c (hexadecimal)]

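
The four store-multiple variants differ only in where the block starts relative to the base register and in which direction the base moves. The sketch below computes, for each variant, the address at which every register is stored and the written-back base value; it relies on the ARM rule that the lowest-numbered register always goes to the lowest address. The function name `stm` and the use of plain register numbers are assumptions.

```python
def stm(mode, base, regs):
    """Return ({register: address}, new_base) for STM<mode> base!, {regs}."""
    size = 4 * len(regs)
    if mode == "IA":                     # increment after
        start, new_base = base, base + size
    elif mode == "IB":                   # increment before
        start, new_base = base + 4, base + size
    elif mode == "DA":                   # decrement after
        start, new_base = base - size + 4, base - size
    elif mode == "DB":                   # decrement before
        start, new_base = base - size, base - size
    else:
        raise ValueError(f"unknown mode {mode!r}")
    addresses = {r: start + 4 * i for i, r in enumerate(sorted(regs))}
    return addresses, new_base


if __name__ == "__main__":
    for mode, base in [("IA", 0x1000), ("IB", 0x1000), ("DA", 0x100C), ("DB", 0x100C)]:
        addresses, new_base = stm(mode, base, [0, 1, 5])
        print(mode, {f"r{r}": hex(a) for r, a in addresses.items()}, hex(new_base))
```

The usage lines reproduce the four cases of the figure: the incrementing forms start with r9 at 1000 and end at 100c, the decrementing forms start at 100c and end at 1000.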


The mapping between the stack and
block copy views of the load and
store multiple instructions

                       Ascending             Descending
                       Full      Empty       Full      Empty
Increment   Before     STMIB                           LDMIB
                       STMFA                           LDMED
            After                STMIA       LDMIA
                                 STMEA       LDMFD
Decrement   Before               LDMDB       STMDB
                                 LDMEA       STMFD
            After      LDMDA                           STMDA
                       LDMFA                           STMED

Branch conditions

Branch   Interpretation     Normal uses
B        Unconditional      Always take this branch
BAL      Always             Always take this branch
BEQ      Equal              Comparison equal or zero result
BNE      Not equal          Comparison not equal or non-zero result
BPL      Plus               Result positive or zero
BMI      Minus              Result minus or negative
BCC      Carry clear        Arithmetic operation did not give carry-out
BLO      Lower              Unsigned comparison gave lower
BCS      Carry set          Arithmetic operation gave carry-out
BHS      Higher or same     Unsigned comparison gave higher or same
BVC      Overflow clear     Signed integer operation; no overflow occurred
BVS      Overflow set       Signed integer operation; overflow occurred
BGT      Greater than       Signed integer comparison gave greater than
BGE      Greater or equal   Signed integer comparison gave greater or equal
BLT      Less than          Signed integer comparison gave less than
BLE      Less or equal      Signed integer comparison gave less than or equal
BHI      Higher             Unsigned comparison gave higher
BLS      Lower or same      Unsigned comparison gave lower or same
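
Each branch condition is a Boolean function of the N, Z, C and V flags. The sketch below computes the flags that a 32-bit comparison (a subtraction a - b) would set and evaluates the conditions on them, using the standard ARM flag definitions; the names `cmp_flags` and `condition` are hypothetical.

```python
MASK = 0xFFFFFFFF


def cmp_flags(a, b):
    """Flags (N, Z, C, V) set by comparing two 32-bit values, i.e. by a - b."""
    a &= MASK
    b &= MASK
    result = (a - b) & MASK
    n = result >> 31
    z = int(result == 0)
    c = int(a >= b)                               # carry set means no borrow
    v = ((a ^ b) & (a ^ result)) >> 31            # signed overflow
    return n, z, c, v


def condition(cond, flags):
    """Evaluate a condition suffix (EQ, NE, HI, ...) on (N, Z, C, V)."""
    n, z, c, v = flags
    table = {
        "AL": True, "EQ": z == 1, "NE": z == 0,
        "PL": n == 0, "MI": n == 1,
        "CC": c == 0, "LO": c == 0, "CS": c == 1, "HS": c == 1,
        "VC": v == 0, "VS": v == 1,
        "GT": z == 0 and n == v, "GE": n == v,
        "LT": n != v, "LE": z == 1 or n != v,
        "HI": c == 1 and z == 0, "LS": c == 0 or z == 1,
    }
    return table[cond]


if __name__ == "__main__":
    flags = cmp_flags(0xFFFFFFFF, 1)
    print(condition("HI", flags), condition("LT", flags))
```

The example compares 0xFFFFFFFF with 1: as unsigned numbers the first is higher, so BHI would be taken; as signed numbers it is -1, so BLT would be taken as well.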
Naming ARM
ARMxyzTDMIEJFS
x: series
y: MMU
z: cache
T: Thumb
D: Debug
M: Multiplier
I: EmbeddedICE (in-circuit debug hardware)
E: Enhanced
J: Jazelle
F: Floating-point
S: Synthesizable
Popular ARM architecture
ARM7TDMI
3 pipeline stages
One of the most widely used ARM versions (for low-end
systems)
ARM9TDMI
Compatible with ARM7
5 pipeline stages
Separate instruction and data caches
ARM11


ARM architecture

Load/store architecture
A large array of uniform
registers
Fixed-length 32-bit
instructions
3-address instructions



Processor modes



ARM architecture

37 registers
1 program counter
1 current program status
register
5 saved program status
registers
30 general purpose
registers


Registers
Only 16 registers are visible to a specific mode. A
mode could access
A particular set of r0-r12
r13 (sp, stack pointer)
r14 (lr, link register)
r15 (pc, program counter)
Current program status register (cpsr)



Register organization



General-purpose registers
31      24 23      16 15       8 7        0

8-bit byte
16-bit half word
32-bit word

6 data types (signed/unsigned)

All ARM operations are 32-bit. Shorter data types
are only supported by data transfer operations.

Program counter
Stores the address of the instruction to be executed
All instructions are 32 bits wide and word-aligned
Thus, the last two bits of pc are undefined.


Program status register (CPSR)

[Figure: CPSR fields: negative, zero, carry/borrow and overflow flags; IRQ disable; FIQ disable; state bit; mode bits]


Summary
Load/store architecture
Most instructions are RISCy, operate in single
cycle.
Some multi-register operations take longer.
All instructions can be executed conditionally.



Linux usage on Intel (4-level Von Neumann)

3213: Digital Systems & Microprocessors: L#18
