• Dependability
• Performance
• Quantitative principles of Computer Design
Define and quantify dependability
• Infrastructure providers now offer Service Level
Agreements (SLAs) to guarantee that their service will
be dependable
• Systems alternate between 2 states of service with
respect to an SLA:
1. Service accomplishment, where the service is delivered
as specified in SLA
2. Service interruption, where the delivered service is
different from the SLA
• Failure = transition from state 1 to state 2
• Restoration = transition from state 2 to state 1
7/23/2018 2
Define and quantify dependability
• Module reliability = a measure of continuous service
accomplishment (equivalently, of the time to failure)
2 metrics
1. Mean Time To Failure (MTTF) measures reliability
2. Failures In Time (FIT) = 1/MTTF, the rate of failures
• Traditionally reported as failures per billion hours of operation
• Mean Time To Repair (MTTR) measures service
interruption
– Mean Time Between Failures (MTBF) = MTTF + MTTR
• Module availability = a measure of service accomplishment
with respect to the alternation between the 2 states of
accomplishment and interruption (a number between 0 and 1, e.g. 0.9)
• Module availability = MTTF / (MTTF + MTTR)
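The definitions above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and the 1,000,000-hour MTTF and 24-hour MTTR are hypothetical values, not figures from the slides.

```python
def fit(mttf_hours: float) -> float:
    """Failure rate in FIT: failures per billion (1e9) hours of operation."""
    return 1e9 / mttf_hours


def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Fraction of time spent in the service-accomplishment state."""
    return mttf_hours / (mttf_hours + mttr_hours)


# Hypothetical module (illustrative numbers only).
mttf = 1_000_000.0  # mean time to failure, hours
mttr = 24.0         # mean time to repair, hours

mtbf = mttf + mttr  # mean time between failures
print(f"MTBF         = {mtbf:,.0f} hours")
print(f"FIT          = {fit(mttf):,.0f}")
print(f"Availability = {availability(mttf, mttr):.6f}")
```

Availability approaches 1 as MTTR shrinks relative to MTTF, which is why fast repair matters as much as infrequent failure.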
Example calculating reliability
• If modules have exponentially distributed
lifetimes (age of module does not affect
probability of failure), overall failure rate is the
sum of failure rates of the modules
• Calculate FIT and MTTF for 10 disks (1M hour
MTTF per disk), 1 disk controller (0.5M hour
MTTF), and 1 power supply (0.2M hour MTTF):
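\[
\begin{aligned}
\text{FailureRate} &= 10 \times \frac{1}{1{,}000{,}000} + \frac{1}{500{,}000} + \frac{1}{200{,}000} \\
&= \frac{10 + 2 + 5}{1{,}000{,}000} = \frac{17}{1{,}000{,}000} \\
&= 17{,}000 \ \text{FIT} \\
\text{MTTF} &= \frac{1{,}000{,}000{,}000}{17{,}000} \approx 59{,}000 \ \text{hours}
\end{aligned}
\]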
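The same calculation can be checked with a short Python sketch. It assumes, as the slide does, exponentially distributed lifetimes so that failure rates simply add; the dictionary layout and variable names are choices made for this illustration.

```python
# (count, MTTF in hours) for each component type, taken from the example.
components = {
    "disk": (10, 1_000_000),
    "disk controller": (1, 500_000),
    "power supply": (1, 200_000),
}

# With exponential lifetimes, the system failure rate is the sum of the
# component failure rates (failures per hour).
failure_rate = sum(count / mttf for count, mttf in components.values())

system_fit = failure_rate * 1e9   # failures per billion hours
system_mttf = 1.0 / failure_rate  # hours

print(f"Failure rate = {failure_rate:.2e} failures/hour")
print(f"FIT          = {system_fit:,.0f}")
print(f"MTTF         = {system_mttf:,.0f} hours")
```

Note that the system MTTF is far lower than that of any single component, because every component is a potential point of failure.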
How to measure the performance of a computer?
Performance: What to measure?
• Typical performance metrics:
– Response time
– Throughput
• "X is n times faster than Y" means
– n = Execution time_Y / Execution time_X = Performance_X / Performance_Y
• Execution time
– Wall clock time: includes all system overheads
– CPU time: only computation time
• Benchmarks
– Kernels (e.g. matrix multiply)
– Toy programs (e.g. sorting)
– Synthetic benchmarks (e.g. Dhrystone)
– Benchmark suites (e.g. SPEC06fp, TPC-C)
Performance: How to measure?
• SPEC CPU: popular desktop benchmark suite
– CPU only, split between integer and floating-point programs
– SPECSFS (NFS file server) and SPECWeb (web server) added as
server benchmarks
How to Summarize Suite Performance?
• Arithmetic average of the execution times of all programs?
– But they vary by 4X in speed, so some would be more important
than others in an arithmetic average
• Could add a weight per program, but how to pick the
weights?
– Different companies want different weights for their products
• SPECRatio: normalize execution times to a reference
computer, yielding a ratio proportional to
performance:
\[
\text{SPECRatio} = \frac{\text{time on reference computer}}{\text{time on computer being rated}}
\]
How to Summarize Suite Performance
• If the SPECRatio of a program on Computer A is 1.25
times that on Computer B, then
\[
1.25 = \frac{\text{SPECRatio}_A}{\text{SPECRatio}_B}
= \frac{\dfrac{\text{ExecutionTime}_{reference}}{\text{ExecutionTime}_A}}{\dfrac{\text{ExecutionTime}_{reference}}{\text{ExecutionTime}_B}}
= \frac{\text{ExecutionTime}_B}{\text{ExecutionTime}_A}
= \frac{\text{Performance}_A}{\text{Performance}_B}
\]
• Note that when comparing 2 computers as a ratio,
execution times on the reference computer drop
out, so choice of reference computer is irrelevant
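A quick numeric check of this cancellation, as a sketch: the execution times below are hypothetical, chosen so that Computer A is 1.25 times faster than Computer B, and `spec_ratio` is a helper named for this illustration.

```python
def spec_ratio(time_reference: float, time_rated: float) -> float:
    """SPECRatio = time on reference computer / time on computer being rated."""
    return time_reference / time_rated


# Hypothetical execution times (seconds) of one program on two computers.
time_a = 8.0
time_b = 10.0

# Try several arbitrary reference machines; the A/B comparison never changes.
comparisons = []
for time_reference in (100.0, 250.0, 1000.0):
    ratio = spec_ratio(time_reference, time_a) / spec_ratio(time_reference, time_b)
    comparisons.append(ratio)
    print(f"reference = {time_reference:7.1f} s  ->  SPECRatio_A / SPECRatio_B = {ratio:.2f}")
```

Each line reports the same ratio, which equals ExecutionTime_B / ExecutionTime_A.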
How to Summarize Suite Performance
• Since SPECRatios are ratios, the proper mean is the geometric mean
(SPECRatio is unitless, so the arithmetic mean is meaningless)
\[
\text{GeometricMean} = \sqrt[n]{\prod_{i=1}^{n} \text{SPECRatio}_i}
\]
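A minimal sketch of the geometric mean in Python follows. The SPECRatio values are invented for illustration. The sketch also checks a useful property: the ratio of two geometric means equals the geometric mean of the per-program ratios, so a comparison of two computers is consistent whichever way it is computed.

```python
import math


def geometric_mean(values):
    """n-th root of the product, computed via logarithms for numerical stability."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical SPECRatios for three programs on two computers.
ratios_a = [20.0, 5.0, 12.0]
ratios_b = [16.0, 8.0, 10.0]

gm_a = geometric_mean(ratios_a)
gm_b = geometric_mean(ratios_b)

# Per-program comparison of A against B.
per_program = [a / b for a, b in zip(ratios_a, ratios_b)]

print(f"GM(A) / GM(B) = {gm_a / gm_b:.4f}")
print(f"GM(A_i / B_i) = {geometric_mean(per_program):.4f}")
```

The two printed values agree; the arithmetic mean does not share this property.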
1) Taking Advantage of Parallelism
Pipelined Instruction Execution
[Figure: pipeline diagram. Horizontal axis: time (clock cycles); vertical axis: instruction order. Each of four instructions passes through the stages Ifetch, Reg, ALU, DMem, Reg.]
2) The Principle of Locality
3) Focus on the Common Case
• In making a design trade-off, favor the frequent
case over the infrequent case
– E.g., the instruction fetch and decode unit is used more frequently
than the multiplier, so optimize it first
– E.g., if a database server has 50 disks per processor, storage
dependability dominates system dependability, so optimize it first
• The frequent case is often simpler and can be done
faster than the infrequent case
• What is the frequent case, and how much is performance
improved by making that case faster? => Amdahl’s Law
Amdahl’s Law
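• It states that the performance improvement gained from
using an enhancement is limited by the fraction of the
time the enhancement can be used.
• Speedup:
\[
\text{Speedup} = \frac{\text{Execution time without the enhancement}}{\text{Execution time with the enhancement}}
\]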
Amdahl’s Law
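• Execution time using the computer with the enhancement
= time spent in the unenhanced portion + time spent using
the enhancement
\[
\text{ExTime}_{new} = \text{ExTime}_{old} \times \left[ (1 - \text{Fraction}_{enhanced}) + \frac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}} \right]
\]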
Problem
• Suppose FP square root (FPSQR) is responsible for 20% of the
execution time of a graphics application. One proposal is to enhance the FPSQR
hardware and speed up this operation by a factor of 10. The
alternative is to make all FP instructions in the graphics
processor run faster by a factor of 1.6; FP instructions are responsible
for half of the execution time of the application. The design team
believes that they can make all FP instructions run 1.6 times faster
with the same effort as required for the fast square root. Compare
these two design alternatives.
Design 1: FPSQR enhancement
Fraction enhanced = 20%
Speedup enhanced = 10
Design 2: FP enhancement
Fraction enhanced = 50%
Speedup enhanced = 1.6
Amdahl’s Law
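\[
\text{Speedup}_{overall} = \frac{\text{ExTime}_{old}}{\text{ExTime}_{new}}
= \frac{1}{(1 - \text{Fraction}_{enhanced}) + \dfrac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}}}
\]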
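Applying this formula to the two design alternatives of the FPSQR problem gives a direct comparison. The sketch below uses only the fractions and speedups stated in the problem; the function name `amdahl_speedup` is chosen for this illustration.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when only a fraction of execution time is enhanced."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


# Design 1: FPSQR is 20% of execution time and becomes 10x faster.
design_1 = amdahl_speedup(0.20, 10.0)

# Design 2: all FP instructions are 50% of execution time and become 1.6x faster.
design_2 = amdahl_speedup(0.50, 1.6)

print(f"Design 1 (FPSQR x10): overall speedup = {design_1:.4f}")
print(f"Design 2 (FP x1.6):   overall speedup = {design_2:.4f}")
```

Under these numbers, improving all FP instructions is slightly better (about 1.23 versus about 1.22), because it applies to a larger fraction of the execution time.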
Amdahl’s Law example
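• Suppose a web server gets a new CPU that is 10X faster
• Assume that it is an I/O-bound server that spends 60% of its time
waiting for I/O, so only 40% of the time is enhanced
\[
\text{Speedup}_{overall} = \frac{1}{(1 - \text{Fraction}_{enhanced}) + \dfrac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}}}
= \frac{1}{(1 - 0.4) + \dfrac{0.4}{10}} = \frac{1}{0.64} \approx 1.56
\]
• Apparently, it's human nature to be attracted by "10X
faster" rather than keeping in perspective that it's just 1.6X faster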
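The web server example can be pushed further to show the limit that Amdahl's Law imposes. In this sketch the 100X and 1000X CPU speedups are hypothetical extensions of the example, and `amdahl_speedup` is a helper named for this illustration.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when only a fraction of execution time is enhanced."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


cpu_fraction = 0.4  # 60% of the time is spent waiting for I/O

# Overall speedup for increasingly fast CPUs (100x and 1000x are hypothetical).
speedups = {s: amdahl_speedup(cpu_fraction, s) for s in (10, 100, 1000)}
for cpu_speedup, overall in speedups.items():
    print(f"CPU {cpu_speedup:>4}x faster -> overall speedup = {overall:.3f}")

# As the CPU speedup grows without bound, the overall speedup
# approaches 1 / (1 - Fraction_enhanced).
upper_bound = 1.0 / (1.0 - cpu_fraction)
print(f"Upper bound = {upper_bound:.3f}")
```

No CPU upgrade can make this server more than about 1.67 times faster; the I/O wait has to be reduced instead.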
Processor performance equation
• Microprocessors are based on a clock running at
a constant rate
\[
\text{CPU time} = \text{CPU clock cycles for a program} \times \text{Clock cycle time}
\]
Problem
• Consider a graphics card, with
– FP operations (excluding FPSQR): frequency 25%,
average CPI 4.0
– FPSQR operations only: frequency 2%, average CPI
20
– all other instructions: average CPI 1.3333333
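Original design:
\[
CPI_{Original} = \sum_{i=1}^{n} \frac{IC_i}{\text{Instruction Count}} \times CPI_i
= 0.25 \times 4 + 0.02 \times 20 + 0.73 \times 1.33 = 2.3709
\]
Design 1 (FPSQR CPI reduced from 20 to 2):
\[
CPI_1 = \sum_{i=1}^{n} \frac{IC_i}{\text{Instruction Count}} \times CPI_i
= 0.25 \times 4 + 0.02 \times 2 + 0.73 \times 1.33 = 2.0109
\]
Design 2 (FP CPI reduced from 4 to 2.5):
\[
CPI_2 = \sum_{i=1}^{n} \frac{IC_i}{\text{Instruction Count}} \times CPI_i
= 0.25 \times 2.5 + 0.02 \times 20 + 0.73 \times 1.33 = 1.9959
\]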
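\[
Speedup_1 = \frac{\text{CPU Time}_{Original}}{\text{CPU Time}_1}
= \frac{IC \times \text{Clock Cycle} \times CPI_{Original}}{IC \times \text{Clock Cycle} \times CPI_1}
= \frac{2.3709}{2.0109} \approx 1.179
\]
\[
Speedup_2 = \frac{\text{CPU Time}_{Original}}{\text{CPU Time}_2}
= \frac{IC \times \text{Clock Cycle} \times CPI_{Original}}{IC \times \text{Clock Cycle} \times CPI_2}
= \frac{2.3709}{1.9959} \approx 1.188
\]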
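The CPI and speedup arithmetic can be reproduced with a short Python sketch. It uses the instruction mix from the graphics card problem, with the CPI of "all other instructions" rounded to 1.33 as in the calculation on the slides; the list layout and the helper name `average_cpi` are choices made for this illustration.

```python
def average_cpi(mix):
    """Weighted CPI: sum over instruction classes of frequency * CPI."""
    return sum(frequency * cpi for frequency, cpi in mix)


# (frequency, CPI) for FP (excluding FPSQR), FPSQR, and all other instructions.
original = [(0.25, 4.0), (0.02, 20.0), (0.73, 1.33)]
design_1 = [(0.25, 4.0), (0.02, 2.0), (0.73, 1.33)]   # FPSQR CPI 20 -> 2
design_2 = [(0.25, 2.5), (0.02, 20.0), (0.73, 1.33)]  # FP CPI 4 -> 2.5

cpi_original = average_cpi(original)
cpi_1 = average_cpi(design_1)
cpi_2 = average_cpi(design_2)

# Instruction count and clock cycle time are unchanged, so they cancel
# and the speedup is just the ratio of CPIs.
speedup_1 = cpi_original / cpi_1
speedup_2 = cpi_original / cpi_2

print(f"CPI original = {cpi_original:.4f}")
print(f"CPI design 1 = {cpi_1:.4f}, speedup = {speedup_1:.4f}")
print(f"CPI design 2 = {cpi_2:.4f}, speedup = {speedup_2:.4f}")
```

With this mix, lowering the CPI of all FP instructions gives a slightly larger speedup than the FPSQR-only improvement.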
Problem
• Suppose a program (or a program task) takes 1 billion
instructions to execute on a processor running at 2 GHz. 50% of
the instructions execute in 3 clock cycles, 30% execute in 4
clock cycles, and 20% execute in 5 clock cycles. What is the
execution time for the program or task? Suppose the processor is
redesigned so that all instructions that initially executed in 5
cycles now execute in 4 cycles. What is the overall percentage
improvement?
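One possible way to work this problem is sketched below, using CPU time = instruction count × CPI / clock rate. The slides do not say how "percentage improvement" should be measured, so the sketch assumes it means the reduction in execution time and also prints the speedup for comparison; the helper name `cpu_time_seconds` is made up for this illustration.

```python
def cpu_time_seconds(instruction_count, mix, clock_rate_hz):
    """CPU time = instruction count * average CPI / clock rate."""
    average_cpi = sum(fraction * cycles for fraction, cycles in mix)
    return instruction_count * average_cpi / clock_rate_hz


instruction_count = 1_000_000_000  # 1 billion instructions
clock_rate_hz = 2_000_000_000      # 2 GHz

# (fraction of instructions, clock cycles per instruction)
original_mix = [(0.5, 3), (0.3, 4), (0.2, 5)]
redesigned_mix = [(0.5, 3), (0.3, 4), (0.2, 4)]  # 5-cycle instructions now take 4

time_original = cpu_time_seconds(instruction_count, original_mix, clock_rate_hz)
time_redesigned = cpu_time_seconds(instruction_count, redesigned_mix, clock_rate_hz)

# Assumption: "improvement" is taken as the reduction in execution time.
reduction_percent = (time_original - time_redesigned) / time_original * 100.0
speedup = time_original / time_redesigned

print(f"Original execution time   = {time_original:.2f} s")
print(f"Redesigned execution time = {time_redesigned:.2f} s")
print(f"Execution time reduction  = {reduction_percent:.2f} %")
print(f"Speedup                   = {speedup:.4f}")
```

Under these assumptions the average CPI drops from 3.7 to 3.5, so the execution time falls from 1.85 s to 1.75 s.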