General-Purpose Processors Explained: Architecture, Operations & Development

Chapter 3
General-Purpose
Processors
Introduction
General-Purpose Processor
Processor designed for a variety of computation

tasks
Low unit cost, in part because manufacturer
spreads NRE over large numbers of units
Motorola sold half a billion 68HC05 microcontrollers in
1996 alone
Carefully designed since higher NRE is acceptable

Can yield good performance, size and power
Low NRE cost, short time-to-market/prototype,

high flexibility
User just writes software; no processor design
a.k.a. microprocessor micro used when they

were implemented on one or a few chips rather
than entire rooms
2
Basic Architecture
Processor
Control unit and

datapath.
Control unit
Datapath
ALU
Controller
Control
/Status
Datapath is general.
Control unit doesnt
store the algorithm
the algorithm is
programmed into
the memory
Registers
PC
IR
I/O
Memory
Datapath Operations
Load
Processor
Read memory
location into register
Control unit
Datapath
ALU
ALU operation
Controller
Input certain registers through

ALU, store back in register
Registers
Store
Write register to
memory location
+1
Control
/Status
10
PC
11
IR
I/O
Memory
...
10
11
...
4
Control Unit
Control unit: configures the

datapath operations
Processor
Control unit
Sequence of desired operations

(instructions) stored in
memory program
Datapath
ALU
Controller
Control
/Status
Instruction cycle broken into

several sub-operations, each
one clock cycle, e.g.:
Fetch: Get next instruction into
IR
Decode: Determine what the
instruction means
Fetch operands: Move data
from memory to datapath
register
Execute: Move data through
the ALU
Store results: Write data from
register to memory
Registers
PC
IR
R0
I/O
100 load R0, M[500]
101
inc R1, R0
102 store M[501], R1
Memory
500
501
R1
...
10
...
5
Control Unit Sub-Operations

Fetch
Get next
instruction into
IR
PC: program
counter, always
points to next
instruction
IR: holds the
fetched
instruction
Processor
Control unit
Datapath
ALU
Controller
Control
/Status
Registers
PC
100
IR
load R0, M[500]
R0
I/O
100 load R0, M[500]
101
inc R1, R0
102 store M[501], R1
Memory
500
501
R1
...
10
...
6

Decode
Processor
Control unit
Determine
what the
instruction
means
Datapath
ALU
Controller
Control
/Status
Registers
PC
100
IR
load R0, M[500]
R0
I/O
100 load R0, M[500]
101
inc R1, R0
102 store M[501], R1
Memory
500
501
R1
...
10
...
7

Fetch
operands
Move data
from memory
to datapath
register
Processor
Control unit
Datapath
ALU
Controller
Control
/Status
Registers
10
PC
100
IR
load R0, M[500]
R0
I/O
100 load R0, M[500]
101
inc R1, R0
102 store M[501], R1
Memory
500
501
R1
...
10
...
8

Execute
Move data
through the
ALU
This particular
instruction
does nothing
during this
sub-operation
Processor
Control unit
Datapath
ALU
Controller
Control
/Status
Registers
10
PC
100
IR
load R0, M[500]
R0
I/O
100 load R0, M[500]
101
inc R1, R0
102 store M[501], R1
Memory
500
501
R1
...
10
...
9

Store results
Write data
from register
to memory
This particular
instruction
does nothing
during this
sub-operation
Processor
Control unit
Datapath
ALU
Controller
Control
/Status
Registers
10
PC
100
IR
load R0, M[500]
R0
I/O
100 load R0, M[500]
101
inc R1, R0
102 store M[501], R1
Memory
500
501
R1
...
10
...
10
Instruction Cycles
PC=100
Processor
Fetch
Fetch Decode ops
Exec.
Store
results
Control unit
Datapath
ALU
clk
Controller
Control
/Status
Registers
10
PC 100
IR
load R0, M[500]
R0
I/O
100 load R0, M[500]
101
inc R1, R0
102 store M[501], R1
Memory
500
501
R1
...
10
...
11
Instruction Cycles
PC=100
Processor
Fetch
Fetch Decode ops
Exec.
Store
results
Control unit
Datapath
ALU
clk
PC=101
Controller
Fetch
Fetch Decode ops
Exec.
+1
Control
/Status
Store
results
Registers
clk
10
PC 101
IR
inc R1, R0
R0
I/O
100 load R0, M[500]
101
inc R1, R0
102 store M[501], R1
Memory
500
501
11
R1
...
10
...
12
Instruction Cycles
PC=100
Processor
Fetch
Fetch Decode ops
Exec.
Store
results
Control unit
Datapath
ALU
clk
PC=101
Controller
Fetch
Fetch Decode ops
Exec.
Control
/Status
Store
results
Registers
clk
10
PC 102
PC=102
Fetch
Fetch Decode ops

clk
Exec.
IR
store M[501], R1
Store
results
R0
I/O
100 load R0, M[500]
101
inc R1, R0
102 store M[501], R1
Memory
11
R1
...
500 10
501 11
...
13
Architectural Considerations
N-bit processor
N-bit ALU,
registers, buses,
memory data
interface
Embedded: 8-bit,
16-bit, 32-bit
common
Desktop/servers:
32-bit, even 64
PC size
determines
address space
Processor
Control unit
Datapath
ALU
Controller
Control
/Status
Registers
PC
IR
I/O
Memory
14
Architectural Considerations
Clock frequency
Inverse of clock
period
Must be longer
than longest
register to
register delay in
entire processor
Memory access
is often the
longest
Processor
Control unit
Datapath
ALU
Controller
Control
/Status
Registers
PC
IR
I/O
Memory
15
Pipelining: Increasing
Instruction Throughput
1
Non-pipelined
1
Decode
Fetch ops.
Time
Instruction 1
pipelined instruction execution
pipelined
Execute
Store res.
Pipelined
non-pipelined
Fetch-instr.
Time
Pipelined
Time
16
Superscalar and VLIW

Architectures
Performance can be improved by:
Faster clock (but theres a limit)
Pipelining: slice up instruction into stages,
overlap stages
Multiple ALUs to support more than one
instruction stream
Superscalar
Scalar: non-vector operations
Fetches instructions in batches, executes as many as possible
May require extensive hardware to detect independent
instructions
VLIW: each word in memory has multiple independent
instructions
Relies on the compiler to detect and schedule instructions
Currently growing in popularity
17
Superscalar Vs VLIM
SuperscalarCPUs use hardware to decide
which operations can run in parallel at runtime
VLIW CPUs use software (the compiler) to
decide which operations can run in parallel in
advance.
f12 = f0 * f4,
f8 = f8 + f12,
f0 = f7-f4;
Because the complexity of instruction
scheduling is pushed off onto the compiler,
complexity of the hardware can be
substantially reduced.
18
Two Memory Architectures

Processor
Princeton
Processor
Fewer memory
wires
Harvard
Simultaneous
program and
data memory
access
Program
memory
Data memory
Harvard
Memory
(program and data)
Princeton
19
Cache Memory
Memory access may
be slow
Cache is small but
fast memory close
to processor
Holds copy of part of
memory
Hits and misses
Fast/expensive technology, usually on

the same chip
Processor
Cache
Memory
Slower/cheaper technology, usually on

a different chip
20
Programmers View
Programmer doesnt need detailed understanding of
architecture
Instead, needs to know what instructions can be executed
Two levels of instructions:

Assembly level
Structured languages (C, C++, Java, etc.)
Most development today done using structured

languages
But, some assembly level programming may still be necessary
Drivers: portion of program that communicates with and/or
controls (drives) another device
Often have detailed timing considerations, extensive bit manipulation
Assembly level may be best for these
21
Assembly-Level Instructions
Instruction 1
opcode
operand1
operand2
Instruction 2
opcode
operand1
operand2
Instruction 3
opcode
operand1
operand2
Instruction 4
opcode
operand1
operand2
...
Instruction Set
Defines the legal set of instructions for that processor
Data transfer: memory/register, register/register, I/O, etc.
Arithmetic/logical: move register through ALU and back
Branches: determine next PC value when not just PC+1
22
Addressing Modes
Addressing
mode
Operand field
Immediate
Data
Register-direct
Register-file
contents
Memory
contents
Register address
Data
Register
indirect
Register address
Memory address
Direct
Memory address
Data
Indirect
Memory address
Memory address
Data
Data
24
Sample Programs
C program
Equivalent assembly program

0
1
2
3
Loop:
int total = 0;
for (int i=10; i!=0; i--)
total += i;
// next instructions...
6
7
Next:
MOV R0, #0;

MOV R1, #10;
MOV R2, #1;
MOV R3, #0;
// total = 0
// i = 10
// constant 1
// constant 0
JZ R1, Next;
ADD R0, R1;
SUB R1, R2;
JZ R3, Loop;
// Done if i=0
// total += i
// i-// Jump always
// next instructions...
25
Programmer Considerations
Program and data memory space
Embedded processors often very limited
e.g., 64 Kbytes program, 256 bytes of RAM
(expandable)
Registers: How many are there?

Only a direct concern for assembly-level
programmers
I/O
How communicate with external signals?
Interrupts
26
Software Development
Process
Compilers
C File
C File
Compiler
Binary
File
Cross compiler
Asm.
File
Runs on one
processor, but
generates
code for
another
Assembler
Binary
File
Binary
File
Linker
Library
Exec.
File
Implementation Phase
Debugger
Profiler
Verification Phase
Assemblers
Linkers
Debuggers
Profilers
27
Running a Program
If development processor is different than
target, how can we run our compiled code?
Two options:
Download to target processor
Simulate
Simulation
One method: Hardware description language
But slow, not always available
Another method: Instruction set simulator (ISS)

Runs on development processor, but executes
instructions of target processor
28
Application-Specific
Instruction-Set Processors
(ASIPs)
General-purpose processors
Sometimes too general to be effective in

demanding application
e.g., video processing requires huge video buffers and
operations on large arrays of data, inefficient on a GPP
But single-purpose processor has high NRE, not

programmable
ASIPs targeted to a particular domain

Contain architectural features specific to that
domain
e.g., embedded control, digital signal processing, video
processing, network processing, telecommunications,
etc.
Still programmable
29
A Common ASIP:
Microcontroller
For embedded control applications
Reading sensors, setting actuators
Mostly dealing with events (bits): data is present, but not
in huge amounts
e.g., VCR, disk drive, digital camera (assuming SPP for
image compression), washing machine, microwave oven
Microcontroller features
On-chip peripherals
Timers, analog-digital converters, serial communication, etc.
Tightly integrated for programmer, typically part of register
space
On-chip program and data memory

Direct programmer access to many of the chips pins
Specialized instructions for bit-manipulation and other
low-level operations
30
Another Common ASIP: Digital

Signal Processors (DSP)
For signal processing applications
Large amounts of digitized data, often
streaming
Data transformations must be applied fast
e.g., cell-phone voice filter, digital TV, music
synthesizer
DSP features
Several instruction execution units
Multiple-accumulate single-cycle instruction,
other instrs.
Efficient vector operations e.g., add two arrays
Vector ALUs, loop buffers, etc.
31
Trend: Even More Customized

ASIPs
In the past, microprocessors were acquired as chips
Today, we increasingly acquire a processor as
Intellectual Property (IP)
e.g., synthesizable VHDL model
Opportunity to add a custom datapath hardware

and a few custom instructions, or delete a few
instructions
Can have significant performance, power and size impacts
Problem: need compiler/debugger for customized ASIP
Remember, most development uses structured languages
One solution: automatic compiler/debugger generation
e.g., www.tensillica.com
Another solution: retargettable compilers

e.g., www.improvsys.com (customized VLIW architectures)
32
Selecting a Microprocessor
Issues
Technical: speed, power, size, cost
Other: development environment, prior expertise, licensing, etc.
Speed: how evaluate a processors speed?

Clock speed but instructions per cycle may differ
Instructions per second but work per instr. may differ
Dhrystone: Synthetic benchmark, developed in 1984.
Dhrystones/sec.
MIPS: 1 MIPS = 1757 Dhrystones per second (based on Digitals VAX
11/780). A.k.a. Dhrystone MIPS. Commonly used today.
So, 750 MIPS = 750*1757 = 1,317,750 Dhrystones per second
SPEC: set of more realistic benchmarks, but oriented to desktops

EEMBC EDN Embedded Benchmark Consortium,
www.eembc.org
Suites of benchmarks: automotive, consumer electronics,
networking, office automation, telecommunications
33
General Purpose Processors

Processor
Clock speed
Intel PIII
1GHz
IBM
PowerPC
750X
MIPS
R5000
StrongARM
SA-110
550 MHz
Intel
8051
Motorola
68HC811
250 MHz
233 MHz
12 MHz
3 MHz
TI C5416
160 MHz
Lucent
DSP32C
80 MHz
Periph.
2x16 K
L1, 256K
L2, MMX
2x32 K
L1, 256K
L2
2x32 K
2 way set assoc.
None
4K ROM, 128 RAM,

32 I/O, Timer, UART
4K ROM, 192 RAM,
32 I/O, Timer, WDT,
SPI
128K, SRAM, 3 T1
Ports, DMA, 13
ADC, 9 DAC
16K Inst., 2K Data,
Serial Ports, DMA
Bus Width
MIPS
General Purpose Processors
32
~900
Power
Trans.
Price
97W
~7M
$900
32/64
~1300
5W
~7M
$900
32/64
NA
NA
3.6M
NA
32
268
1W
2.1M
NA
Microcontroller
~1
~0.2W
~10K
$7
~.5
~0.1W
~10K
$5
Digital Signal Processors

16/32
~600
NA
NA
$34
32
NA
NA
$75
40
Sources: Intel, Motorola, MIPS, ARM, TI, and IBM Website/Datasheet; Embedded Systems Programming, Nov. 1998
34
Designing a General
Purpose Processor
Not something an
embedded system
designer normally
would do
FSMD
Declarations:
bit PC[16], IR[16];
bit M[64k][16], RF[16][16];
PC=0;
Fetch
IR=M[PC];
PC=PC+1
Decode
But instructive to see

how simply we can build
one top down
Remember that real
processors arent usually
built this way
Much more optimized,
Aliases:
much more bottom-up
op IR[15..12]
rn IR[11..8]
design
rm IR[7..4]
Reset
from states
below
Mov1
RF[rn] = M[dir]
to Fetch
Mov2
M[dir] = RF[rn]
to Fetch
Mov3
M[rn] = RF[rm]
to Fetch
Mov4
RF[rn]= imm
to Fetch
op = 0000
0001
0010
0011
0100
dir IR[7..0]
imm IR[7..0]
rel IR[7..0]
0101
0110
Add
RF[rn] =RF[rn]+RF[rm]
to Fetch
Sub
RF[rn] = RF[rn]-RF[rm]
to Fetch
Jz
PC=(RF[rn]=0) ?rel :PC

to Fetch
35
Architecture of a Simple
Microprocessor
Storage devices for each

declared variable
Control unit
register file holds each of

the variables
Controller
(Next-state and
control
logic; state register)
Functional units to carry

out the FSMD operations
One ALU carries out
every required operation
Connections added
among the components
ports corresponding to
the operations required
by the FSM
Unique identifiers created
for every control signal
To all
input
control
signals
16
PCld
PCinc
PC
IR
From all
output
control
signals
Irld
RFs
2x1 mux
RFwa
RFw
RFwe
RF (16)
RFr1a
RFr1e
RFr2a
RFr2e
RFr1
RFr2
ALUs
PCclr
2
Ms
Datapath
3x1 mux
ALUz
ALU
Mre Mwe
Memory
36
A Simple Microprocessor
Reset
PC=0;
PCclr=1;
Fetch
IR=M[PC];
PC=PC+1
MS=10;
Irld=1;
Mre=1;
PCinc=1;
Decode
from states
below
op = 0000
0001
0010
0011
0100
0101
0110
Mov1
RF[rn] = M[dir]
to Fetch
RFwa=rn; RFwe=1; RFs=01;

Ms=01; Mre=1;
Mov2
M[dir] = RF[rn]
to Fetch
RFr1a=rn; RFr1e=1;
Ms=01; Mwe=1;
Mov3
M[rn] = RF[rm]
to Fetch
RFr1a=rn; RFr1e=1;
Ms=10; Mwe=1;
RF[rn]= imm
to Fetch
RF[rn] =RF[rn]+RF[rm]
to Fetch

RFr1a=rn; RFr1e=1;
RFr2a=rm; RFr2e=1; ALUs=00
Sub
RF[rn] = RF[rn]-RF[rm]
to Fetch

RFr1a=rn; RFr1e=1;
RFr2a=rm; RFr2e=1; ALUs=01
Jz
PC=(RF[rn]=0) ?rel :PC

to Fetch
PCld= ALUz;
RFrla=rn;
RFrle=1;
Mov4
Add
FSMD
Control unit
FSM operations that replace the FSMD

operations after a datapath is created
To all
input
control
signals
Controller
(Next-state and
control
logic; state
register)
16
PCld
PCinc
PC
IR
From all
output
control
signals
Irld
RFs
2x1 mux
RFwa
RFwe
RFr1a
RFw
RF (16)
RFr1e
RFr2a
RFr2e
RFr1
RFr2
ALUs
PCclr
2
Ms
Datapath
3x1 mux
ALUz
ALU
Mre Mwe
Memory
You just built a simple microprocessor!

37
Chapter Summary
General-purpose processors
Good performance, low NRE, flexible
Controller, datapath, and memory

Structured languages prevail
But some assembly level programming still necessary
Many tools available

Including instruction-set simulators, and in-circuit emulators
ASIPs
Microcontrollers, DSPs, network processors, more customized
ASIPs
Choosing among processors is an important step

Designing a general-purpose processor is conceptually
the same as designing a single-purpose processor
38

General-Purpose Processors Explained: Architecture, Operations & Development

Enviado por

Dados do documento

Título original

Direitos autorais

Formatos disponíveis

Compartilhar este documento

Compartilhar ou incorporar documento

Opções de compartilhamento

Você considera este documento útil?

Este conteúdo é inapropriado?

Direitos autorais:

Formatos disponíveis

General-Purpose Processors Explained: Architecture, Operations & Development

Enviado por

Direitos autorais:

Formatos disponíveis

Chapter 3

Processor designed for a variety of computation

Carefully designed since higher NRE is acceptable

Low NRE cost, short time-to-market/prototype,

a.k.a. microprocessor micro used when they

Control unit and

Input certain registers through

Control unit: configures the

Sequence of desired operations

Instruction cycle broken into

Control Unit Sub-Operations

Control Unit Sub-Operations

Control Unit Sub-Operations

Control Unit Sub-Operations

Control Unit Sub-Operations

Fetch Decode ops

Fetch Decode ops

Fetch Decode ops

Fetch Decode ops

Fetch Decode ops

Fetch Decode ops

pipelined instruction execution

Superscalar and VLIW

Two Memory Architectures

Fast/expensive technology, usually on

Slower/cheaper technology, usually on

Two levels of instructions:

Most development today done using structured

Equivalent assembly program

MOV R0, #0;

Registers: How many are there?

Another method: Instruction set simulator (ISS)

Sometimes too general to be effective in

But single-purpose processor has high NRE, not

ASIPs targeted to a particular domain

On-chip program and data memory

Another Common ASIP: Digital

Trend: Even More Customized

Opportunity to add a custom datapath hardware

Another solution: retargettable compilers

Speed: how evaluate a processors speed?

SPEC: set of more realistic benchmarks, but oriented to desktops

General Purpose Processors

4K ROM, 128 RAM,

Digital Signal Processors

But instructive to see

PC=(RF[rn]=0) ?rel :PC

Storage devices for each

register file holds each of

Functional units to carry

RFwa=rn; RFwe=1; RFs=01;

RFwa=rn; RFwe=1; RFs=10;

RFwa=rn; RFwe=1; RFs=00;

RFwa=rn; RFwe=1; RFs=00;

PC=(RF[rn]=0) ?rel :PC

FSM operations that replace the FSMD

You just built a simple microprocessor!

Controller, datapath, and memory

Many tools available

Choosing among processors is an important step

Você também pode gostar