
CC513: Computing Systems

Lecture 3: Processors & Memory Hierarchy


Chapter 4: Advanced Computer Architecture
Kai Hwang

By: Dr Wael Hosny


Processors & Memory Hierarchy
 In this chapter the following points will be covered:
1. Instruction Set Architectures
 CISC
 RISC
2. Processors
 Superscalar
 VLIW
 Superpipelined
 Vector
3. Memory Hierarchy & Capacity Planning
4. Virtual memory, address translation & page replacement
policies
Advanced Processor Technology
 Major processor families
 CISC
 RISC
 Superscalar
 VLIW
 Superpipelined
 Vector
 Symbolic
 Scalar & vector processors → numerical computations
 Symbolic processors → AI applications



Design Space of Processors

 Processor families are classified according to:


 Clock rate (Hz)
 CPI (cycles per instruction)
 It is better to have:
 A higher clock rate
 A lower CPI (achieved by hardware & software approaches); see the worked example below
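
As a quick worked illustration (the numbers here are assumed for the example, not taken from the slides), the standard performance equation shows why a higher clock rate f and a lower CPI both reduce execution time:

\[ T_{\mathrm{CPU}} = IC \times CPI \times \frac{1}{f} \]

\[ IC = 10^6,\ CPI = 10,\ f = 50\,\mathrm{MHz} \ \Rightarrow\ T = \frac{10^6 \times 10}{50 \times 10^6} = 0.2\,\mathrm{s} \qquad IC = 10^6,\ CPI = 1.5,\ f = 100\,\mathrm{MHz} \ \Rightarrow\ T = 0.015\,\mathrm{s} \]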



Design Space of Modern Processor Families

[Figure: design space of modern processor families, plotted as CPI versus clock rate. CISC processors sit toward high CPI and low clock rate; scalar RISC and superpipelined processors toward lower CPI and higher clock rate; superscalar RISC, VLIW processors and vector supercomputers occupy the lowest-CPI, highest-clock-rate region.]


 Note that CISC falls at the upper left of this design space, where CPI is high (bad) & clock rate is low (bad)
 Complex Instruction Set Computing, e.g. the Intel i486, Motorola M68040
 Today RISC (Reduced Instruction Set Computer) processors reach faster clock rates
 Superscalar processors:
 A special subclass of RISC processors that allows multiple instructions to be issued simultaneously during each cycle
 VLIW (Very Long Instruction Word): uses more functional units than a superscalar processor, with very long instructions (256–1024 bits per instruction), implemented with microprogrammed control
 Superpipelined processors:
 Use a multiphase clock
 Higher clock rate (good)
 Higher CPI (bad)
 Vector supercomputers
 Processors are pipelined
 Use multiple functional units for concurrent scalar & vector
operations

 Cost increases toward the lower-right corner of the design space (low CPI, high clock rate)



Instruction Pipelines

 Execution of a typical instruction includes 4 phases:


 Fetch
 Decode
 Execute
 Write back
 These phases are often executed in an instruction pipeline



Pipeline Cycle

 The pipeline cycle is the time required for each phase to complete its operation, assuming equal delay in all phases (pipeline stages); see the worked example below
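
As a small worked example (stage count and delay values are assumed for illustration): if the pipeline has k stages of equal delay \tau, the pipeline cycle is \tau, and N instructions complete in (k + N - 1) cycles once the pipeline is kept full:

\[ T_{\mathrm{pipelined}} = (k + N - 1)\,\tau, \qquad k = 4,\ N = 100,\ \tau = 10\,\mathrm{ns} \ \Rightarrow\ T = 103 \times 10\,\mathrm{ns} = 1.03\,\mu\mathrm{s} \]

versus \( N \times k \times \tau = 4\,\mu\mathrm{s} \) without pipelining.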



Basic definitions associated with instruction pipeline
operations:

1. Instruction Pipeline Cycle
 The clock period of the instruction pipeline
2. Instruction Issue Latency
 The time (in cycles) required between the issuing of two adjacent instructions
3. Instruction Issue Rate
 The number of instructions issued per cycle (the degree of a superscalar processor)



4. Simple Operation Latency (measured in number of cycles)
 Simple operations make up the vast majority of the instructions executed by the machine
 E.g. adds, loads, stores, branches, moves, etc.
 Complex operations, by contrast, require latencies an order of magnitude longer
 E.g. divides, cache misses
5. Resource Conflict
 Refers to the situation where two or more instructions demand the use of the same functional unit at the same time



Base Scalar Processor
 A machine with ONE instruction issued per cycle
 A one-cycle latency for a simple operation
 A one-cycle latency between instruction issues (pipeline fully utilized)

 In practice, instruction latency can be more than 1 cycle



Execution in Base Scalar Processor

Successive instructions (F = fetch, D = decode, E = execute, W = write back), time in base cycles:

Cycle:  1  2  3  4  5  6  7
I1:     F  D  E  W
I2:        F  D  E  W
I3:           F  D  E  W
I4:              F  D  E  W


 Maximum utilization of the instruction pipeline occurs when one instruction is issued per cycle; this ideal rate is not achieved in the underutilized cases shown next



Underpipelined with 2 cycles per instruction issue

Successive instructions, time in base cycles:

Cycle:  1  2  3  4  5  6  7  8  9  10
I1:     F  D  E  W
I2:           F  D  E  W
I3:                 F  D  E  W
I4:                       F  D  E  W


 Here the instruction issue latency is 2 cycles per instruction → the pipeline is underutilized



Underpipelined with twice the base cycle

Successive instructions, time in base cycles (each pipeline stage now spans two base cycles: F+D form one stage, E+W the other):

Cycle:  1  2  3  4  5  6  7  8
I1:     F  D  E  W
I2:           F  D  E  W
I3:                 F  D  E  W


 Another underpipelined situation: the pipeline cycle time is doubled by combining pipeline stages
 In this case fetch & decode are combined into one pipeline stage, and execute & write back into another → poor utilization
 The effective CPI is 1 instruction per (doubled) cycle, but the clock rate is lowered by one half, so throughput is again about half that of the base scalar processor; the sketch below compares the three cases
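
A minimal simulation sketch (Python, written for these notes; the instruction count is arbitrary) that reproduces the three timing diagrams above and reports the effective CPI in base cycles:

def total_base_cycles(n_instr, issue_latency, n_stages, cycles_per_stage):
    """Base cycles until the last of n_instr instructions leaves the pipeline."""
    last_issue = (n_instr - 1) * issue_latency          # cycle in which the last instruction enters
    return last_issue + n_stages * cycles_per_stage     # plus the time to drain the pipeline

n = 64
cases = {
    "base scalar":               (1, 4, 1),   # issue every cycle, 4 one-cycle stages
    "2-cycle instruction issue": (2, 4, 1),   # pipeline sits idle every other issue slot
    "twice the base cycle":      (2, 2, 2),   # 2 combined stages of 2 base cycles each
}
for name, (lat, stages, per_stage) in cases.items():
    t = total_base_cycles(n, lat, stages, per_stage)
    print(f"{name:28s} {t:4d} base cycles, effective CPI = {t / n:.2f}")

Both underpipelined cases end up at roughly twice the effective CPI of the base scalar processor, i.e. about half the throughput.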



Processors & Coprocessors
 Processor = Central Processing Unit (CPU)
 CPU is essentially a scalar processor that consists of multiple
functional units
 Functional units
 Arithmetic & Logic Unit (ALU)
 Floating Point Accelerator
 … etc



Architectural Models of a Basic Scalar
Computer System

1. A CPU with a built-in floating point unit (first figure below)

2. A CPU with an attached coprocessor (second figure below)



CPU with Built-in Floating Point Unit

[Figure: the CPU contains an integer ALU, a floating point unit and a control unit; a cache connects the CPU to main memory (also accessed by DMA); the I/O bus connects to the I/O subsystem and mass storage.]


CPU with an Attached Coprocessor

[Figure: the CPU (instruction processor) fetches instructions and data from main memory; the attached coprocessor receives data and the instructions dispatched to it by the CPU; the CPU also connects to the I/O subsystem and mass storage.]


 The coprocessor executes instructions dispatched from the CPU
 A coprocessor may be:
 A floating point accelerator executing scalar data
 A vector processor
 A Digital Signal Processor (DSP)
 A LISP processor executing AI programs


Instruction Set Architecture
 In this section we characterize instruction sets & examine the hardware features built into RISC & CISC scalar processors
 The instruction set of a computer specifies the primitive
commands or machine instructions that a programmer can
use in programming the machine
 The complexity of an instruction set is attributed to the
 Instruction formats
 Data formats
 Addressing modes
 General purpose registers
 Opcode specifications & flow control mechanisms used



Complex Instruction Sets
 In the early days of computer families, instruction sets were kept simple because of the high cost of hardware
 But hardware cost has dropped & software cost has gone up over the last 3 decades
 As a result, more and more functions have been built into the hardware
 Making the instruction set very large & complex
 The growth of instruction sets was also encouraged by the popularity of microprogrammed control (which is flexible to use)
 A typical CISC instruction set contains approx 120 – 350 instructions using variable instruction/data formats



Reduced Instruction Sets
 Computers started with simple instruction sets and gradually moved to complex (CISC) instruction sets
 During the 1980s, after 2 decades of using CISC processors, computer users began to reevaluate the performance relationship between:
 Instruction set architecture
 Available hardware/software technology
 Through many years of programming it was found that:
 Only about 25% of the instructions are used frequently, about 95% of the time
 This implies that about 75% of the hardware-supported instructions are often not used at all
 So why waste valuable chip area on rarely used instructions? (a good design question)
 So remove these rarely used instructions from the hardware & let software deal with them
 In addition, the saved chip area can be used to improve RISC performance
 A RISC instruction set contains fewer than 100 instructions with a fixed instruction format
 Only 3-5 simple addressing modes are used
 Hardwired control is used
 A large register file is provided, which supports fast context switching among multiple users



Architectural Distinctions
 Hardware features built into CISC & RISC processors are
compared below

 The following figures show the architectural distinctions between modern CISC & traditional RISC:
a. CISC architecture with microprogrammed control & a unified cache
b. RISC architecture with hardwired control & split instruction & data caches



CISC Architecture with Microprogrammed Control & Unified Cache

[Figure: the control unit uses a microprogrammed control memory; a single unified cache sits on the shared instruction & data path between the processor and main memory.]


RISC Architecture with Hardwired Control & Split Instruction Cache & Data Cache

[Figure: a hardwired control unit; separate instruction cache and data cache, with separate instruction and data paths to main memory.]


 CISC uses a unified cache for holding both instructions & data, so they must share the same data/instruction path
 RISC uses separate instruction & data caches, accessed over different paths; a rough sketch of the difference follows
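
A rough sketch (Python; the workload fractions are assumed for illustration, not from the slides) of why the shared path matters: with single-ported caches, a unified cache must serialize an instruction fetch and a data access that arrive in the same cycle, while split caches serve them in parallel:

def cycles_needed(n_instr, frac_mem, split_caches):
    """Cycles to process n_instr instructions when a fraction frac_mem of them
    also needs a data access in the same cycle (single-ported caches assumed)."""
    data_accesses = int(n_instr * frac_mem)
    if split_caches:
        return n_instr                       # instruction fetch and data access overlap
    return n_instr + data_accesses           # unified cache serializes the two accesses

print("unified cache:", cycles_needed(1000, 0.3, split_caches=False))   # 1300 cycles
print("split caches: ", cycles_needed(1000, 0.3, split_caches=True))    # 1000 cycles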



Architectural characteristics of CISC vs. RISC:
 Instruction set size & instruction formats: CISC has a large set of instructions with variable formats (16 – 64 bits per instruction); RISC has a small set of instructions with a fixed 32-bit format
 Addressing modes: 12-24 in CISC; limited to 3-5 in RISC
 General purpose registers & cache design: CISC has 8-24 GPRs and a unified cache for instructions & data; RISC has a large number (32-192) of GPRs with mostly split data & instruction caches
 Clock rate & CPI: CISC 30-50 MHz with CPI = 2-15; RISC 50-150 MHz with CPI < 1.5
 CPU control: CISC is mostly micro-coded using control memory (ROM), but modern CISC also uses hardwired control; RISC is mostly hardwired, without control memory
(a rough per-instruction comparison using these numbers follows)
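
Using representative numbers from this comparison (picking CPI = 10 for CISC, in the middle of the 2-15 range; the specific choice is an assumption for the example), the average time per instruction is:

\[ t_{\mathrm{instr}} = \frac{CPI}{f}: \qquad \text{CISC: } \frac{10}{50\,\mathrm{MHz}} = 200\,\mathrm{ns} \qquad \text{RISC: } \frac{1.5}{100\,\mathrm{MHz}} = 15\,\mathrm{ns} \]

keeping in mind that RISC programs are typically longer (more instructions), so the per-program gap is smaller than this per-instruction gap.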



CISC Scalar Processors
Digital Equipment VAX 8600 Processor Architecture
 It is an example of a typical CISC processor architecture
 It consists of 2 functional units for concurrent execution of integer & floating point instructions
 A unified cache is used for both instructions & data
 A translation lookaside buffer (TLB) is used in the memory control unit for fast generation of a physical address from a virtual address; a small sketch of TLB translation follows
 The performance of the processor pipelines relies on the cache hit ratio and on minimal branching damage to the pipeline flow
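
A small sketch (Python; the page size, page-table contents and addresses are invented for illustration, not taken from the VAX 8600) of the translation the TLB speeds up: the virtual page number is looked up in the TLB first, and only on a miss is the page table consulted:

PAGE_SIZE = 4096                        # assumed page size for the example

page_table = {0: 7, 1: 3, 2: 9}         # virtual page number -> physical frame (toy data)
tlb = {}                                # small cache of recent translations

def translate(vaddr):
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    if vpn in tlb:                      # TLB hit: physical address in one fast lookup
        frame = tlb[vpn]
    else:                               # TLB miss: walk the page table, then cache the entry
        frame = page_table[vpn]
        tlb[vpn] = frame
    return frame * PAGE_SIZE + offset

print(hex(translate(0x1234)))           # first access to page 1: TLB miss -> 0x3234
print(hex(translate(0x1560)))           # same page again: TLB hit   -> 0x3560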
Typical CISC Processor Architecture (VAX 8600)

[Figure: the instruction unit and the execution unit (integer ALU with 16 GPRs, floating point unit) are driven by a microprogrammed control memory; virtual addresses go to the memory & I/O control unit (cache and TLB), which connects to main memory and the I/O subsystem; a console attaches via the console bus and operands move over the operand bus.]
Example: The Motorola MC68040 Microprocessor Architecture
 The processor implements over 100 instructions using 16 general purpose registers
 A 4-Kbyte data cache
 A 4-Kbyte instruction cache, with separate memory management units (MMUs) supported by an Address Translation Cache (ATC)
 The ATC plays the role of the TLB used in other systems
 18 addressing modes are supported
 The integer unit is organized as a 6-stage instruction pipeline
 The floating point unit consists of 3 pipeline stages



Motorola MC68040 Microprocessor
Architecture



 Dual MMUs allow interleaved fetch of instructions & data from the main memory
 Both the address bus and the data bus are 32 bits wide
 3 simultaneous memory requests can be generated by the dual MMUs, including data operand read & write
 Snooping logic is built into the memory units to monitor bus events for cache invalidation (a small sketch follows this list)
 Memory management is provided for a virtual, demand-paged operating system
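
A rough sketch (Python; the line size, addresses and cache state are invented for illustration) of what the snooping logic does: it watches writes performed by other bus masters and invalidates any matching line in this processor's cache:

LINE = 16                                # assumed cache line size in bytes
cache = {0x1000 // LINE: "valid"}        # one cached line (toy state)

def snoop_bus_write(addr):
    """Called for every write observed on the bus from another master."""
    tag = addr // LINE
    if cache.get(tag) == "valid":
        cache[tag] = "invalid"           # another master wrote this line: invalidate our copy

snoop_bus_write(0x1008)                  # falls in the cached line
print(cache)                             # {256: 'invalid'}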



RISC Scalar Processor
 Scalar RISC processors are designed to issue one instruction per cycle
 In theory, RISC & CISC scalar processors should perform about the same if they run with the same clock rate & equal program length
 These 2 assumptions are not always valid, because the architecture affects the
 Quality
 Density of code generated by the compiler
 The reliance on a good compiler is much more demanding in a RISC processor than in a CISC processor
 Instruction-level parallelism is exploited by pipelining in both processor architectures
 Neither RISC nor CISC can perform as well as designed without
 A high clock rate
 A low CPI
 Good compilation support
 The simplicity of a RISC processor may lead to the ideal performance of the base scalar processor, shown below (time in base cycles):

Cycle:  1  2  3  4  5  6  7
I1:     F  D  E  W
I2:        F  D  E  W
I3:           F  D  E  W
I4:              F  D  E  W


Representative RISC Processors
 Four representative RISC-based processors all use 32-bit instructions
 Their instruction sets consist of 51 to 124 basic instructions
 Among the 4 scalar RISC processors, we choose to examine the Sun SPARC & Intel i860 architectures
 SPARC stands for Scalable Processor ARChitecture
 The scalability of SPARC refers to the use of a different number of register windows in different SPARC implementations
 Scalability in the M88100, by contrast, refers to the number of special functional units (SFUs)



SUN Microsystems SPARC Architecture
 The SPARC implements the floating point unit (FPU) on a separate coprocessor
 It contains a RISC integer unit (IU) implemented with 2 to 32 register windows
 The SPARC runs each procedure with a set of thirty-two 32-bit IU registers
 8 of these registers are global registers shared by all procedures
 The other 24 are window registers
 Each register window is divided into 3 eight-register sections, labeled INs, LOCALs and OUTs
 The LOCAL registers are only locally addressable by each procedure



 The INs and OUTs are shared among procedures

[Figure: overlapping register windows. In each window, registers r31-r24 are the INs, r23-r16 the LOCALs and r15-r8 the OUTs; the OUTs of one window overlap the INs of the next window, and registers r7-r0 are the Globals shared by all windows.]


 The calling procedure passes parameters to the called procedure via its OUT registers (r8 to r15), which become the INs of the called procedure
 The window of the currently running procedure is called the active window, pointed to by the Current Window Pointer (CWP)
 A Window Invalid Mask (WIM) is used to indicate which windows are invalid
 The overlapping windows can significantly reduce the time required for interprocedure communication, resulting in much faster context switching among cooperative procedures; a small sketch of this overlap follows
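
A minimal sketch (Python; the window count and the register values are invented for illustration) of the overlap: the OUT registers of window w occupy the same physical registers as the IN registers of window w+1, so a call passes its arguments without copying:

N_WINDOWS = 8                                  # assumed number of windows
REGS = [0] * (16 * N_WINDOWS + 8)              # physical window registers (plus room for the last OUTs)

def phys_index(cwp, reg):
    """Physical index of window register r8-r31 for window `cwp`
    (r0-r7 are the globals and live in a separate file)."""
    base = 16 * cwp
    if 24 <= reg <= 31:  return base + (reg - 24)        # INs
    if 16 <= reg <= 23:  return base + 8 + (reg - 16)    # LOCALs
    if 8 <= reg <= 15:   return base + 16 + (reg - 8)    # OUTs = INs of window cwp+1
    raise ValueError("r0-r7 are globals")

REGS[phys_index(0, 8)] = 42        # caller (window 0) writes an argument into r8, its first OUT
print(REGS[phys_index(1, 24)])     # callee (window 1) reads r24, its first IN: prints 42, no copy made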



RISC Impacts
 A RISC processor lacks some of the sophisticated instructions found in CISC processors
 The increase in RISC program length implies more instruction traffic & greater memory demand
 Problems are also caused by the large register file
 Although the register file
 Reduces data traffic between CPU & memory
 Holds intermediate results
 RISC hardwired control is less flexible than microcoded control



Superscalar & Vector Processors
 CISC or RISC scalar processors can be improved with a superscalar or vector architecture
 Scalar processors are those executing 1 instruction per cycle (i.e., only 1 instruction issued per cycle)
 In a superscalar processor, multiple instruction pipelines are used
 This implies multiple instructions are issued per cycle and multiple results are generated per cycle
 A vector processor executes vector instructions on arrays of data
 Thus, each instruction involves a string of repeated operations, which are ideal for pipelining, producing one result per cycle; compare the sketch below
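
A rough sketch (Python; the data is made up) of the difference: the scalar view issues one add instruction per element, while the vector view expresses the whole array operation as a single instruction whose repeated element operations stream through a pipelined adder:

a = [1.0, 2.0, 3.0, 4.0]
b = [5.0, 6.0, 7.0, 8.0]

# scalar processor: one add instruction fetched, decoded and issued per element
c_scalar = []
for x, y in zip(a, b):
    c_scalar.append(x + y)

# vector processor: a single vector-add instruction covers all the elements
c_vector = [x + y for x, y in zip(a, b)]

print(c_scalar == c_vector)     # True: same result, far fewer instructions issued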
Superscalar Processors
 Superscalar processors are designed to exploit more instruction-level parallelism in user programs
 Only INDEPENDENT instructions can be executed in parallel without causing a wait state
 The amount of instruction-level parallelism varies widely depending on the type of code being executed
 A figure in the text illustrates the use of 3 instruction pipelines in parallel in a triple-issue processor
 Superscalar processors were developed as an alternative to vector processors
 A superscalar processor of degree m can issue m instructions per cycle



 Thus, the base scalar processor, implemented in either RISC or CISC, has m = 1
 A superscalar machine that can issue a fixed-point, a floating-point, a load and a branch instruction all in one cycle achieves the same effective parallelism as a vector machine executing a vector load chained to a vector add
 In a typical architecture for a superscalar RISC:
 Multiple instruction pipelines are used
 The instruction cache supplies multiple instructions per fetch
 The actual number of instructions issued to the various functional units may vary each cycle
 The number of instructions issued is constrained by
 Data dependencies
 Resource conflicts among instructions (see the sketch below)
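
A minimal issue-stage sketch (Python; the instruction tuples and unit names are invented for illustration): each cycle, instructions are scanned in program order and issued until the degree is reached, a data dependence on an earlier instruction in the group is found, or a functional unit is already claimed, i.e. exactly the two constraints listed above:

DEGREE = 3                                     # a triple-issue processor (assumed)

def issue_one_cycle(window, busy_units=None):
    """window: list of (dest_reg, source_regs, functional_unit) in program order."""
    busy_units = set() if busy_units is None else busy_units
    issued, written = [], set()
    for dest, srcs, unit in window:
        if len(issued) == DEGREE:
            break
        if any(s in written for s in srcs):    # data dependence on an instruction issued this cycle
            break
        if unit in busy_units:                 # resource conflict on a functional unit
            break
        busy_units.add(unit)
        written.add(dest)
        issued.append(dest)
    return issued

window = [
    ("r1", ("r2", "r3"), "alu"),
    ("r4", ("r5", "r6"), "fpu"),
    ("r7", ("r1", "r8"), "alu"),               # needs r1 (just written) and the busy ALU
]
print(issue_one_cycle(window))                 # ['r1', 'r4']: only 2 of the 3 issue this cycle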
A Typical Architecture for Superscalar RISC

[Figure: instructions flow from instruction memory through the instruction cache to two decoders, one feeding the integer unit and one the floating point unit; each unit has its own register file, reorder buffer and reservation stations (RS); the integer unit contains branch, ALU, shifter, load and store units, while the floating point unit contains float add, convert, multiply, divide, load and store units; loads and stores reach data memory through the data cache over the address/data path.]
 Multiple functional units are built into the integer unit and into the floating point unit
 Multiple data buses exist among the functional units
 All functional units can be used simultaneously, provided there are
 No resource conflicts
 No dependencies among the instructions

