
COMPUTER ORGANIZATION AND DESIGN

The Hardware/Software Interface

Chapter 1*
Computer Abstractions
and Technology

* Slides based on the Book


Computer Organization and Design,
The Hardware/Software Interface – RISC-V edition;
2021 (2nd edition); David A. Patterson, John L. Hennessy

Some text, pictures and slides were added to update the original
edition

L.EEC012 António José Araújo (aja@fe.up.pt)


Arquitetura de Computadores Hélio Sousa Mendonça (hsm@fe.up.pt)
§1.1 Introduction
The Computer Revolution
 Progress in computer technology
 Underpinned by Moore’s Law and domain-specific accelerators
 Makes novel applications feasible
 Computers in automobiles
 Cell phones
 Human genome project
 World Wide Web
 Search Engines
 Computers are pervasive
Chapter 1 — Computer Abstractions and Technology — 2
Moore’s Law

 Implications:
 More transistors → more innovations
 Smaller transistors → higher frequency



Classes of Computers
 Personal computers
 General purpose, variety of software
 Subject to cost/performance tradeoff

 Server computers
 Network based
 High capacity, performance, reliability
 Range from small servers to building sized



Classes of Computers
 Supercomputers
 Type of server
 High-end scientific and
engineering calculations
 Highest capability but represent a small
fraction of the overall computer market

 Embedded computers
 Hidden as components of
systems
 Stringent power/performance/cost constraints
The PostPC Era



The PostPC Era
 Personal Mobile Device (PMD)
 Battery operated
 Connects to the Internet
 Hundreds of dollars
 Smart phones, tablets, electronic glasses
 Cloud computing
 Warehouse Scale Computers (WSC)
 Software as a Service (SaaS)
 Portion of software run on a PMD and a
portion run in the Cloud
 Amazon and Google
What You Will Learn
 How programs are translated into the
machine language
 And how the hardware executes them
 The hardware/software interface
 What determines program performance
 And how it can be improved
 How hardware designers improve
performance
 What is parallel processing
Understanding Performance

 Algorithm
 Determines number of operations executed
 Programming language, compiler, architecture
 Determine number of machine instructions executed
per operation
 Processor and memory system
 Determine how fast instructions are executed
 I/O system (including OS)
 Determines how fast I/O operations are executed



§1.2 Seven Great Ideas in Computer Architecture
Seven Great Ideas
 Use abstraction to simplify design

 Make the common case fast

 Performance via parallelism

 Performance via pipelining

 Performance via prediction

 Hierarchy of memories

 Dependability via redundancy



§1.3 Below Your Program
Below Your Program
 Application software
 Written in high-level language
 System software
 Compiler: translates HLL code to
machine code
 Operating System: service code
 Handling input/output
 Managing memory and storage
 Scheduling tasks & sharing resources
 Hardware
 Processor, memory, I/O controllers



Levels of Program Code
 High-level language
 Level of abstraction closer
to problem domain
 Provides for productivity
and portability
 Assembly language
 Textual representation of
instructions
 Hardware representation
 Binary digits (bits)
 Encoded instructions and
data



§1.4 Under the Covers
Components of a Computer
The BIG Picture
 Same components for all kinds of computers
 Desktop, server,
embedded
 Input/output includes
 User-interface devices
 Display, keyboard, mouse
 Storage devices
 Hard disk, CD/DVD, flash
 Network adapters
 For communicating with
other computers



Opening the Box (iPhone XS)



Inside the Processor (CPU)
 Datapath: performs operations on data
 Control: sequences datapath, memory, ...
 Cache memory: small, fast SRAM for immediate access to data

A12 processor



Abstractions
The BIG Picture

 Abstraction helps us deal with complexity


 Hide lower-level detail
 Instruction set architecture (ISA)
 The hardware/software interface
 Application binary interface
 The ISA plus system software interface
 Implementation
 The details underlying the interface
§1.5 Technologies for Building Processors and Memory
Technology Trends
 Electronics technology continues to evolve
 Increased capacity and performance
 Reduced cost

[Figure: DRAM capacity growth over time]

Year Technology Relative performance/cost


1951 Vacuum tube 1
1965 Transistor 35
1975 Integrated circuit (IC) 900
1995 Very large scale IC (VLSI) 2,400,000
2013 Ultra large scale IC 250,000,000,000



Semiconductor Technology

 Silicon: semiconductor
 Add materials to transform properties:
 Conductors
 Insulators
 Switch



Manufacturing ICs

 Yield: proportion of working dies per wafer


Intel Core i7 vs 10th Gen Wafer

 Core i7 (32nm): 300mm wafer, 280 chips; each chip is 20.7 × 10.5 mm
 Core i7 10th Gen (10nm): 300mm wafer, 506 chips; each chip is 11.4 × 10.7 mm



§1.6 Performance
Defining Performance
 Which airplane has the best performance?



Response Time and Throughput
 Response time
 How long it takes to do a task
 Throughput
 Total work done per unit time
 e.g., tasks/transactions/… per hour
 How are response time and throughput affected
by
 Replacing the processor with a faster version?
 Adding more processors?
 We’ll focus on response time for now…



Relative Performance
 Define Performance = 1/Execution Time
 “X is n times faster than Y”

Performance_X / Performance_Y = Execution time_Y / Execution time_X = n

 Example: time taken to run a program
 10s on A, 15s on B
 Execution Time_B / Execution Time_A = 15s / 10s = 1.5
 So A is 1.5 times faster than B
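This ratio is easy to check with a short Python sketch (the function name is ours, not from the book):

```python
def speedup(time_y, time_x):
    """Return n such that X is n times faster than Y (Performance = 1/Time)."""
    return time_y / time_x

# Program runs in 10 s on machine A and 15 s on machine B:
n = speedup(15, 10)
print(n)  # 1.5 -> A is 1.5 times faster than B
```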
Measuring Execution Time
 Elapsed time
 Total response time, including all aspects
 Processing, I/O, OS overhead, idle time
 Determines system performance
 CPU time
 Time spent processing a given job
 Discounts I/O time, other jobs’ shares
 Comprises user CPU time and system CPU
time
 Different programs are affected differently by
CPU and system performance
CPU Clocking
 Operation of digital hardware governed by a
constant-rate clock
[Figure: clock waveform — one clock period per cycle; data transfer and computation occur within the cycle, state updates on the clock edge]

 Clock period: duration of a clock cycle
 e.g., 250ps = 0.25ns = 250×10⁻¹² s
 Clock frequency (rate): cycles per second
 e.g., 4.0GHz = 4000MHz = 4.0×10⁹ Hz
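Period and rate are reciprocals; a quick sketch using the example values above:

```python
# Clock period and clock rate are reciprocals of each other.
period_s = 250e-12       # 250 ps clock period
rate_hz = 1 / period_s   # ~4.0e9 Hz = 4.0 GHz
print(rate_hz)
```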
CPU Time
CPU Time = CPU Clock Cycles × Clock Cycle Time
         = CPU Clock Cycles / Clock Rate

 Performance improved by
 Reducing number of clock cycles
 Increasing clock rate
 Hardware designer must often trade off clock
rate against cycle count



CPU Time Example
 Computer A: 2GHz clock, 10s CPU time
 Designing Computer B
 Aim for 6s CPU time
 Can do faster clock, but causes 1.2 × clock cycles
 How fast must Computer B clock be?
Clock Rate_B = Clock Cycles_B / CPU Time_B = (1.2 × Clock Cycles_A) / 6s

Clock Cycles_A = CPU Time_A × Clock Rate_A = 10s × 2GHz = 20×10⁹

Clock Rate_B = (1.2 × 20×10⁹) / 6s = (24×10⁹) / 6s = 4GHz
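The same derivation, checked numerically (a sketch; variable names are ours):

```python
# Computer A: 2 GHz clock, 10 s CPU time.
# Computer B must finish in 6 s but needs 1.2x the clock cycles.
cycles_a = 10 * 2e9            # CPU Time_A x Clock Rate_A = 20e9 cycles
rate_b = 1.2 * cycles_a / 6    # Clock Cycles_B / CPU Time_B
print(rate_b / 1e9)            # ~4.0 (GHz)
```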
Instruction Count and CPI
Clock Cycles = Instruction Count × Cycles per Instruction (CPI)

CPU Time = Instruction Count × CPI × Clock Cycle Time
         = (Instruction Count × CPI) / Clock Rate

 Instruction Count for a program


 Determined by program, ISA and compiler
 Average cycles per instruction
 Determined by CPU hardware
 If different instructions have different CPI
 Average CPI affected by instruction mix
CPI Example
 Computer A: Cycle Time = 250ps, CPI = 2.0
 Computer B: Cycle Time = 500ps, CPI = 1.2
 Same ISA
 Which is faster, and by how much?
CPU Time_A = Instruction Count × CPI_A × Cycle Time_A
           = I × 2.0 × 250ps = I × 500ps        ← A is faster…
CPU Time_B = Instruction Count × CPI_B × Cycle Time_B
           = I × 1.2 × 500ps = I × 600ps

CPU Time_B / CPU Time_A = (I × 600ps) / (I × 500ps) = 1.2   ← …by this much
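A sketch verifying the comparison; the instruction count I cancels out, so any value works (helper name is ours):

```python
def cpu_time(instr_count, cpi, cycle_time_s):
    """CPU Time = Instruction Count x CPI x Clock Cycle Time."""
    return instr_count * cpi * cycle_time_s

I = 1_000_000                        # arbitrary instruction count
t_a = cpu_time(I, 2.0, 250e-12)      # Computer A
t_b = cpu_time(I, 1.2, 500e-12)      # Computer B
print(t_b / t_a)                     # ~1.2 -> A is 1.2 times faster
```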
CPI in More Detail
 If different instruction classes take different
numbers of cycles
Clock Cycles = Σ (CPI_i × Instruction Count_i),  for i = 1 … n

 Weighted average CPI

CPI = Clock Cycles / Instruction Count
    = Σ [ CPI_i × (Instruction Count_i / Instruction Count) ]

(the bracketed ratio is the relative frequency of instruction class i)



CPI Example
 Alternative compiled code sequences using
instructions in classes A, B, C

Class               A    B    C
CPI for class       1    2    3
IC in sequence 1    2    1    2
IC in sequence 2    4    1    1

 Sequence 1: IC = 5
   Clock Cycles = 2×1 + 1×2 + 2×3 = 10
   Avg. CPI = 10/5 = 2.0
 Sequence 2: IC = 6
   Clock Cycles = 4×1 + 1×2 + 1×3 = 9
   Avg. CPI = 9/6 = 1.5
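The weighted-average CPI calculation can be sketched as (names are illustrative):

```python
cpi_class = {"A": 1, "B": 2, "C": 3}

def avg_cpi(ic_per_class):
    """Weighted average CPI = total clock cycles / total instruction count."""
    cycles = sum(cpi_class[c] * ic for c, ic in ic_per_class.items())
    total_ic = sum(ic_per_class.values())
    return cycles / total_ic

print(avg_cpi({"A": 2, "B": 1, "C": 2}))  # 2.0  (10 cycles / 5 instructions)
print(avg_cpi({"A": 4, "B": 1, "C": 1}))  # 1.5  (9 cycles / 6 instructions)
```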
Performance Summary
The BIG Picture

CPU Time = (Instructions / Program) × (Clock cycles / Instruction) × (Seconds / Clock cycle)

 Performance depends on
 Algorithm: affects IC, possibly CPI
 Programming language: affects IC, CPI
 Compiler: affects IC, CPI
 Instruction set architecture: affects IC, CPI, Tc



§1.7 The Power Wall
Power Trends
 In CMOS IC technology

Power = Capacitive load × Voltage² × Frequency

 Example over the trend: Capacitive load ≈ ×1, Voltage 5V → 1V, Frequency ×1000 ⇒ Power ≈ ×40



Reducing Power
 Suppose a new CPU has
 85% of capacitive load of old CPU
 15% voltage and 15% frequency reduction
P_new / P_old = [ C_old×0.85 × (V_old×0.85)² × F_old×0.85 ] / [ C_old × V_old² × F_old ]
             = 0.85⁴ ≈ 0.52

 The power wall


 We can’t reduce voltage further
 We can’t remove more heat
 How else can we improve performance?
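Both power examples above follow from the same CMOS dynamic-power relation; a sketch (function name is ours):

```python
def power_ratio(c, v, f):
    """Relative dynamic power: Capacitive load x Voltage^2 x Frequency."""
    return c * v**2 * f

# New CPU: 85% of the load, 15% voltage and frequency reduction vs old
print(power_ratio(0.85, 0.85, 0.85))   # 0.85**4 ~ 0.52
# Historical trend: load ~x1, voltage 5V -> 1V, frequency x1000
print(power_ratio(1, 1/5, 1000))       # ~40
```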
§1.8 The Sea Change: The Switch to Multiprocessors
Uniprocessor Performance

 Constrained by power, instruction-level parallelism, memory latency



Multiprocessors
 Multicore microprocessors
 More than one processor per chip
 Requires explicitly parallel programming
 Compare with instruction level parallelism
 Hardware executes multiple instructions at once
 Hidden from the programmer
 Hard to do
 Programming for performance
 Load balancing
 Optimizing communication and synchronization



SPEC CPU Benchmark
 Programs used to measure performance
 Supposedly typical of actual workload
 Standard Performance Evaluation Corp (SPEC)
 Develops benchmarks for CPU, I/O, Web, …
 SPEC CPU2006
 Elapsed time to execute a selection of programs
 Negligible I/O, so focuses on CPU performance
 Normalize relative to reference machine
 Summarize as geometric mean of performance ratios
 CINT2006 (integer) and CFP2006 (floating-point)

Geometric mean = ( ∏ i=1..n Execution time ratio_i ) ^ (1/n)
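A minimal geometric-mean sketch using only the standard library:

```python
import math

def geometric_mean(ratios):
    """n-th root of the product of n performance ratios."""
    return math.prod(ratios) ** (1 / len(ratios))

# Doubling every ratio (e.g., switching reference machine) scales the
# geometric mean uniformly, so the ranking of machines is unchanged:
print(geometric_mean([2.0, 8.0]))  # 4.0
```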



CINT2006 for Intel Core i7 920



Why Geometric Mean?
 The resulting comparison is independent of the reference machine used



§1.11 Fallacies and Pitfalls
Amdahl’s Law
 The performance increase (Speedup) obtained with a given enhancement is
limited by the fraction of time the enhancement can actually be used.

Speedup = Performance_new / Performance_org = t_org / t_new = (t1 + t2) / (t1 + t_new2)

 Fraction of the original time that benefits from the improvement: F_enh = t2 / (t1 + t2)
 Improvement gain: G_enh = t2 / t_new2
 New execution time: t_new = t_org × [(1 − F_enh) + (F_enh / G_enh)]

 Amdahl’s Law: Speedup = 1 / [(1 − F_enh) + (F_enh / G_enh)]

 It is therefore advisable to improve the most frequent case!
 In general it is also the simplest case.
 Programs spend 90% of their time in 10% of the code (locality principle)
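Amdahl’s Law as a one-line function (a sketch; it also shows how the speedup saturates as the gain grows):

```python
def amdahl_speedup(f_enh, g_enh):
    """Overall speedup when a fraction f_enh of the time is improved g_enh times."""
    return 1 / ((1 - f_enh) + f_enh / g_enh)

print(round(amdahl_speedup(0.4, 5), 2))     # 1.47
print(round(amdahl_speedup(0.4, 1e12), 2))  # 1.67  (limit: 1 / (1 - F_enh))
```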



Amdahl’s Law example
 A program takes 100 seconds to run, of
which 40 are spent performing
multiplication. In order to improve its
performance, multiplication became 5
times faster.
 What is the increase in performance
(Speedup)?
 How many times would you have to improve
the performance of multiplications so that
Speedup was equal to 2?



Amdahl’s Law example
F_enh = 40/100 = 0.4        G_enh = 5

Speedup = 1 / [(1 − 0.4) + (0.4 / 5)] = 1 / 0.68 ≈ 1.47

For Speedup = 2:
1 / [(1 − 0.4) + (0.4 / G_enh)] = 2
(1 − 0.4) + (0.4 / G_enh) = 0.5
0.6 + (0.4 / G_enh) = 0.5   →   Impossible!!!
(even G_enh → ∞ leaves the unimproved 0.6, so Speedup can never exceed 1/0.6 ≈ 1.67)



Pitfall: MIPS as a Performance Metric
 MIPS: Millions of Instructions Per Second
 Doesn’t account for
 Differences in ISAs between computers
 Differences in complexity between instructions

MIPS = Instruction count / (Execution time × 10⁶)
     = Instruction count / [ (Instruction count × CPI / Clock rate) × 10⁶ ]
     = Clock rate / (CPI × 10⁶)
 CPI varies between programs on a given CPU
MIPS vs CPU time example
 For a given machine (fck = 100 MHz), the
number of instructions for the same
program but with different compilers was
measured.
Class            A    B    C
CPI for class    1    2    3

Number of inst. per class (10⁶):
Compiler 1       5    1    1
Compiler 2      10    1    1

 What is the fastest code according to the


MIPS metric?
 And according to CPU time?
MIPS vs CPU time
CPI_1 = (5×1 + 1×2 + 1×3) / (5 + 1 + 1) = 10/7 ≈ 1.43
MIPS_1 = Freq / (10⁶ × CPI_1) = 100×10⁶ / (10⁶ × 1.43) ≈ 70
t_exec1 = N_inst1 / (MIPS_1 × 10⁶) = (5 + 1 + 1)×10⁶ / (70×10⁶) = 0.10 s

CPI_2 = (10×1 + 1×2 + 1×3) / (10 + 1 + 1) = 15/12 = 1.25
MIPS_2 = Freq / (10⁶ × CPI_2) = 100×10⁶ / (10⁶ × 1.25) = 80
t_exec2 = N_inst2 / (MIPS_2 × 10⁶) = (10 + 1 + 1)×10⁶ / (80×10⁶) = 0.15 s

 Compiler 2 wins by the MIPS metric (80 vs 70), but Compiler 1’s code actually runs faster (0.10 s vs 0.15 s)
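The whole comparison can be reproduced in a few lines (a sketch with our own helper names):

```python
freq = 100e6                      # 100 MHz clock
cpi_class = {"A": 1, "B": 2, "C": 3}

def mips_and_time(ic):
    """ic: instructions per class, in units of 1e6. Returns (MIPS, CPU time in s)."""
    instr = sum(ic.values()) * 1e6
    cycles = sum(cpi_class[c] * n for c, n in ic.items()) * 1e6
    cpi = cycles / instr
    mips = freq / (cpi * 1e6)
    t = instr * cpi / freq        # CPU time = IC x CPI / clock rate
    return mips, t

print(mips_and_time({"A": 5, "B": 1, "C": 1}))    # ~(70, 0.10 s)  Compiler 1
print(mips_and_time({"A": 10, "B": 1, "C": 1}))   # ~(80, 0.15 s)  Compiler 2
```

The two metrics rank the compilers in opposite order, which is exactly the pitfall.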



§1.12 Concluding Remarks
Concluding Remarks
 Performance/cost is improving
 Due to underlying technology development
 Hierarchical layers of abstraction
 In both hardware and software
 Instruction set architecture
 The hardware/software interface
 Execution time: the best performance
measure
 Power is a limiting factor
 Use parallelism to improve performance
