Escolar Documentos
Profissional Documentos
Cultura Documentos
8-2
CISC Architecture
Examples
Intel x86, IBM Z-Series Mainframes, older CPU architectures
Characteristics
Few general purpose registers Many addressing modes Large number of specialized, complex instructions Instructions are of varying sizes
Chapter 8: CPU and Memory: Design, Implementation, and Enhancement 8-3
RISC Features
Examples
Power PC, Sun Sparc, Motorola 68000
Limited and simple instruction set Fixed length, fixed format instruction words
Enable pipelining, parallel fetches and executions
8-6
8-7
8-8
VLIW Architecture
Transmeta Crusoe CPU 128-bit instruction bundle = molecule
4 32-bit atoms (atom = instruction) Parallel processing of 4 instructions
EPIC Architecture
Intel Itanium CPU 128-bit instruction bundle
3 41-bit instructions 5 bits to identify type of instructions in bundle
128 64-bit general purpose registers 128 82-bit floating point registers Intel X86 instruction set included Programmers and compilers follow guidelines to ensure parallel execution of instructions
8-11
Paging
Managed by the operating system Built into the hardware Independent of application
8-12
8-14
8-15
8-16
Memory Enhancements
Memory is slow compared to CPU processing speeds!
2Ghz CPU = 1 cycle in of a billionth of a second 70ns DRAM = 1 access in 70 millionth of a second
Retrieve multiple bytes instead of 1 byte at a time Partition memory into subsections, each with its own address register and data register
Memory Interleaving
Cache Memory
Chapter 8: CPU and Memory: Design, Implementation, and Enhancement 8-17
Memory Interleaving
8-18
Why Cache?
Even the fastest hard disk has an access time of about 10 milliseconds 2Ghz CPU waiting 10 milliseconds wastes 20 million clock cycles!
8-19
Cache Memory
Blocks: 8 or 16 bytes Tags: location in main memory Cache controller
hardware that checks tags
Cache Line
Unit of transfer between storage and cache memory
Hit Ratio: ratio of hits out of total requests Synchronizing cache and memory
Write through Write back
Chapter 8: CPU and Memory: Design, Implementation, and Enhancement 8-20
8-21
8-22
Performance Advantages
Hit ratios of 90% common 50%+ improved execution speed Locality of reference is why caching works
Most memory references confined to small region of memory at any given time Well-written program in small loop, procedure or function Data likely in array Variables stored together
Chapter 8: CPU and Memory: Design, Implementation, and Enhancement 8-23
Two-level Caches
Why do the sizes of the caches have to be different?
8-24
8-25
8-26
Timing Issues
Computer clock used for timing purposes MHz million steps per second GHz billion steps per second Instructions can (and often) take more than one step Data word width can require multiple steps
8-27
Several instructions are fetched in parallel and held in a buffer until decoded and executed IP Instruction Pointer register
Execute Unit
Receives instructions from the decode unit Appropriate execution unit services the instruction
Chapter 8: CPU and Memory: Design, Implementation, and Enhancement 8-28
8-29
Instruction Pipelining
Assembly-line technique to allow overlapping between fetch-execute cycles of sequences of instructions Only one instruction is being executed to completion at a time Scalar processing
Average instruction execution is approximately equal to the clock speed of the CPU
8-30
8-31
Pipelining Example
8-32
Superscalar Processing
Process more than one instruction per clock cycle Separate fetch and execute cycles as much as possible Buffers for fetch and decode phases Parallel execution units
8-33
8-34
8-35
Superscalar Issues
Out-of-order processing dependencies (hazards) Data dependencies Branch (flow) dependencies and speculative execution Parallel speculative execution or branch prediction Branch History Table Register access conflicts
Logical registers
Chapter 8: CPU and Memory: Design, Implementation, and Enhancement 8-36
Hardware Implementation
Hardware operations are implemented by logic gates Advantages
Speed RISC designs are simple and typically implemented in hardware
8-37
Microprogrammed Implementation
Microcode are tiny programs stored in ROM that replace CPU instructions Advantages
More flexible Easier to implement complex instructions Can emulate other CPUs
Disadvantage
Requires more clock cycles
Chapter 8: CPU and Memory: Design, Implementation, and Enhancement 8-38
8-39