1

Performance
What do we mean by the performance of a computer?
Two important metrics:
Response Time (or Latency): time taken to
complete a single job. Smaller is better.
Throughput: number of jobs completed per unit of time.
Larger is better.

Does one imply the other?
Yes. E.g. if latency decreases, throughput will increase.
No. E.g. in pipelining, latency may have to be increased to
increase throughput!
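
The pipelining case can be illustrated with a minimal sketch. All numbers here are hypothetical (4 ns of work per job, 4 pipeline stages, 0.2 ns of overhead per stage) and the function names are made up for illustration; the point is only that splitting the work into stages can raise latency while still raising throughput.

```python
def unpipelined(work_ns):
    """One job at a time: latency equals the work, throughput is its inverse."""
    latency_ns = work_ns
    throughput_jobs_per_ns = 1.0 / latency_ns
    return latency_ns, throughput_jobs_per_ns


def pipelined(work_ns, stages, overhead_ns):
    """Work split evenly into stages; each stage pays a fixed overhead (assumed)."""
    stage_time_ns = work_ns / stages + overhead_ns
    latency_ns = stage_time_ns * stages          # one job still passes every stage
    throughput_jobs_per_ns = 1.0 / stage_time_ns  # steady state: one job per stage time
    return latency_ns, throughput_jobs_per_ns


lat_u, thr_u = unpipelined(4.0)
lat_p, thr_p = pipelined(4.0, stages=4, overhead_ns=0.2)
print(f"unpipelined: latency {lat_u:.2f} ns, throughput {thr_u:.3f} jobs/ns")
print(f"pipelined:   latency {lat_p:.2f} ns, throughput {thr_p:.3f} jobs/ns")
```

With these assumed numbers the pipelined version has the longer latency for a single job but completes jobs at a higher rate.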
2
CPU Performance Equation
$$\text{CPU Time} = \text{Clock Cycles Needed} \times \text{Clock Cycle Time}$$
$$\text{CPU Time} = \frac{\text{No. Instructions} \times \text{Clocks Per Instruction}}{\text{Clock Rate}}$$

Does this measure Response Time or Throughput?


3
How can we Improve Performance?
No. Instructions can be reduced by:
Better ISA
Better Compiler
Better Algorithm
Clocks Per Instruction can be reduced by:
Better Hardware Design
Make the common case faster
Clock Rate can be increased by:
Hardware Design
4
Numerical Assignment
A computer (3.06 GHz) has the following CPI for each instruction type:

Instruction Type   A   B   C
CPI                1   2   3

An algorithm may be implemented in two ways, I1 and I2. For each implementation,
the number of instructions used (in millions) is as follows:

Instruction Type   A   B   C
I1                 0   2   2
I2                 2   2   1

1. Which implementation has fewer instructions?
2. What is the average CPI for each implementation? Which implementation is
faster?
3. What is the total time taken to execute I1 and I2?
4. What can you say about the MIPS rating?

5
1. No. of Instructions I1 = 4 M
No. of Instructions I2 = 5 M
Hence I1 has fewer instructions.
2. Clocks req. by I1 = 0*1 + 2*2 + 2*3 = 10 M.
by I2 = 2*1 + 2*2 + 1*3 = 9 M.
Average CPI for I1 = 10/4 = 2.5
I2 = 9/5 = 1.8
I2 is faster as it requires fewer
clock cycles. Notice that I1 nevertheless
uses fewer instructions.
3. Total Time for I1 = 10 M / 3.06 GHz = 3.27 ms
I2 = 9 M / 3.06 GHz = 2.94 ms
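
The arithmetic in steps 1 to 3 can be checked with a short sketch. The CPI values, instruction counts, and clock rate are the ones given in the assignment; the function and variable names are made up for illustration.

```python
CLOCK_RATE_HZ = 3.06e9
CPI = {"A": 1, "B": 2, "C": 3}
COUNTS_MILLIONS = {
    "I1": {"A": 0, "B": 2, "C": 2},
    "I2": {"A": 2, "B": 2, "C": 1},
}


def summarize(counts_millions, cpi, clock_rate_hz):
    """Return instruction count, clock cycles, average CPI and CPU time."""
    instructions = sum(counts_millions.values()) * 1e6
    cycles = sum(n * cpi[kind] for kind, n in counts_millions.items()) * 1e6
    return {
        "instructions": instructions,
        "cycles": cycles,
        "avg_cpi": cycles / instructions,
        "time_s": cycles / clock_rate_hz,  # CPU time = cycles / clock rate
    }


results = {name: summarize(c, CPI, CLOCK_RATE_HZ) for name, c in COUNTS_MILLIONS.items()}
for name, r in results.items():
    print(f"{name}: {r['instructions'] / 1e6:.0f} M instr, "
          f"{r['cycles'] / 1e6:.0f} M cycles, "
          f"CPI {r['avg_cpi']:.2f}, {r['time_s'] * 1e3:.2f} ms")
```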
6
4. MIPS rating = Million Instructions Per Second.
This can be calculated from the
CPI and clock rate of the machine:
$$\text{MIPS} = \frac{\text{Clock Rate}}{\text{CPI}} \times 10^{-6}$$

or from the Total Execution Time and Instruction Count:
$$\text{MIPS} = \frac{\text{Instruction Count}}{\text{Total Execution Time}} \times 10^{-6}$$

MIPS rating for I1 = 1224 MIPS
for I2 = 1700 MIPS
MIPS rating for I2 > MIPS rating for I1.
This is as expected, since I2 has the shorter execution
time.
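
As a quick check that the two MIPS formulas agree, here is a small sketch using the clock rate, average CPI, instruction counts, and cycle counts from the worked solution. The function names are hypothetical.

```python
import math

CLOCK_RATE_HZ = 3.06e9


def mips_from_cpi(clock_rate_hz, avg_cpi):
    """MIPS = clock rate / CPI * 10^-6."""
    return clock_rate_hz / avg_cpi * 1e-6


def mips_from_time(instruction_count, time_s):
    """MIPS = instruction count / execution time * 10^-6."""
    return instruction_count / time_s * 1e-6


# I1: 4 M instructions, 10 M cycles; I2: 5 M instructions, 9 M cycles.
i1_a = mips_from_cpi(CLOCK_RATE_HZ, 10 / 4)
i1_b = mips_from_time(4e6, 10e6 / CLOCK_RATE_HZ)
i2_a = mips_from_cpi(CLOCK_RATE_HZ, 9 / 5)
i2_b = mips_from_time(5e6, 9e6 / CLOCK_RATE_HZ)

print(f"I1: {i1_a:.0f} MIPS (from CPI), {i1_b:.0f} MIPS (from time)")
print(f"I2: {i2_a:.0f} MIPS (from CPI), {i2_b:.0f} MIPS (from time)")
assert math.isclose(i1_a, i1_b) and math.isclose(i2_a, i2_b)
```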
7
Probable Conclusions
1. Total Number of instructions is definitely
not a good metric.
2. MIPS is a good metric.

8
Numerical Assignment
A computer (3.06 GHz) has the following CPI for each instruction type:

Instruction Type   A   B   C
CPI                5   2   3

An algorithm may be implemented in two ways, I1 and I2. For each implementation,
the number of instructions used (in millions) is as follows:

Instruction Type   A   B   C
I1                 0   2   2
I2                 1   2   0

1. Which implementation has fewer instructions?
2. What is the average CPI for each implementation? Which implementation is
faster?
3. What is the total time taken to execute I1 and I2?
4. What can you say about the MIPS rating?

9
1. No. of Instructions I1 = 4 M
No. of Instructions I2 = 3 M
Hence I1 uses more instructions
than I2.
2. Clocks req. by I1 = 0*5 + 2*2 + 2*3 = 10 M.
by I2 = 1*5 + 2*2 + 0*3 = 9 M.
Average CPI for I1 = 10/4 = 2.5
I2 = 9/3 = 3
I2 is faster as it requires fewer
clock cycles.
3. Total Time for I1 = 10 M / 3.06 GHz = 3.27 ms
I2 = 9 M / 3.06 GHz = 2.94 ms
10
4. MIPS rating for I1 = 1224 MIPS
for I2 = 1020 MIPS

MIPS rating for I1 > MIPS rating for I2.
This is unexpected, since I2 has the
shorter execution time.

Conclusion
MIPS is also not a good metric for overall system
performance.
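
The mismatch between the two rankings can be reproduced with a short sketch. The CPI table and instruction counts are the ones from this second assignment; the helper name is made up for illustration.

```python
CLOCK_RATE_HZ = 3.06e9
CPI = {"A": 5, "B": 2, "C": 3}
COUNTS_MILLIONS = {
    "I1": {"A": 0, "B": 2, "C": 2},
    "I2": {"A": 1, "B": 2, "C": 0},
}


def time_and_mips(counts_millions):
    """Return (execution time in seconds, MIPS rating) for one implementation."""
    instructions = sum(counts_millions.values()) * 1e6
    cycles = sum(n * CPI[kind] for kind, n in counts_millions.items()) * 1e6
    time_s = cycles / CLOCK_RATE_HZ
    mips = instructions / time_s * 1e-6
    return time_s, mips


metrics = {name: time_and_mips(c) for name, c in COUNTS_MILLIONS.items()}
fastest = min(metrics, key=lambda name: metrics[name][0])
highest_mips = max(metrics, key=lambda name: metrics[name][1])

for name, (time_s, mips) in metrics.items():
    print(f"{name}: {time_s * 1e3:.2f} ms, {mips:.0f} MIPS")
print(f"shortest time: {fastest}, highest MIPS: {highest_mips}")
```

Here the implementation with the shortest execution time is not the one with the highest MIPS rating, which is the point of the conclusion.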
11
Conclusion
Total execution time is always a better metric, as
it sums up all factors and cannot be replaced
by considering any one of
1. MIPS
2. Total number of instructions
3. Clock Rate
alone.

12
Measuring Performance
Now that we know that performance
depends on the program, which
program(s) should be used to measure
performance?
Benchmarks.
13
Benchmarks
A set of programs specifically
chosen for measuring performance.
Types of Benchmarks
Real Programs
Kernel
Extracts the key features from a program
Component
Synthetic
Dhrystone: Integer and String Arithmetic
Whetstone: Floating Point
I/O
Parallel

14
Challenges
1. Vendors may tinker with benchmarks to
make them run better on their platform.
At times this is permitted.
2. Give data set rather than a single
performance number.
3. Concentrate only on computational
power.
15
Popular Benchmarks
SPEC - Standard Performance Evaluation Corporation
Floating point
Integer
Web
Graphics
TPC - Transaction Processing Performance Council
Web Server
Transaction Processing
Decision Support Systems
BAPCo - Business Applications Performance Corporation
Popular business applications
EEMBC - Embedded Microprocessor Benchmark Consortium
Embedded Applications

16
Statistical Summarization of Data
For Response time metric
Arithmetic Mean

For Throughput metric
Harmonic Mean or Geometric Mean.
SPEC uses Geometric Mean
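
A minimal sketch of the three means using Python's standard `statistics` module. All measurements below are hypothetical values chosen only to keep the arithmetic easy to follow; the last list stands in for per-benchmark speed ratios against a reference machine, which is the kind of quantity SPEC summarizes with a geometric mean.

```python
import statistics

# Hypothetical measurements, for illustration only.
response_times_s = [2.0, 4.0, 6.0]        # time per job on three programs
throughputs_jobs_per_s = [30.0, 60.0]     # rates measured on two programs
ratios_vs_reference = [2.0, 8.0]          # speed ratios against a reference machine

mean_time = statistics.mean(response_times_s)                   # arithmetic mean
mean_rate = statistics.harmonic_mean(throughputs_jobs_per_s)    # harmonic mean
mean_ratio = statistics.geometric_mean(ratios_vs_reference)     # geometric mean

print(f"arithmetic mean of times:  {mean_time:.2f} s")
print(f"harmonic mean of rates:    {mean_rate:.2f} jobs/s")
print(f"geometric mean of ratios:  {mean_ratio:.2f}")
```

`statistics.geometric_mean` requires Python 3.8 or later.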
17
Are Benchmarks enough?
Benchmarks give the overall performance. If
one wants to optimize performance, it may
be necessary to know the instruction
or section of the program where the most
time is being spent.

Profilers do this job.
18
Profiling or Dynamic Program Analysis
Program behavior is analysed as it is
being run.
Techniques used
Instruction Set Simulation
Hardware Interrupts
OS Hooks
Code instrumentation
Examples: Intel VTune, gprof
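
To give a feel for what a profiler reports, here is a minimal sketch using Python's built-in `cProfile` and `pstats` modules. This is not one of the tools named above; it is swapped in only because it runs without extra installation. The functions `hot_loop`, `cheap_setup`, and `workload` are made-up examples.

```python
import cProfile
import io
import pstats


def hot_loop(n):
    """Deliberately expensive function: most of the run time is spent here."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def cheap_setup():
    """Deliberately cheap function, for contrast."""
    return list(range(10))


def workload():
    cheap_setup()
    return hot_loop(200_000)


# Record function calls while the workload runs.
profiler = cProfile.Profile()
profiler.enable()
result = workload()
profiler.disable()

# Print the entries with the largest cumulative time.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

The report lists each function with its call count and time, so the section of the program where most time is spent stands out.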

19
Simulation
Building the real system is difficult; simulation is
cost effective.
Beneficial for learning/improving some
aspect of architecture.
Simulators available are:
Keil Instruction Simulator
Little Man Computer Simulator: simulator of a simple
machine
Cacheprof: cache simulator
20
Moore's Law (1965)
Moore's Law states that the number of
transistors on a single chip at the same
price will double every 18 to 24 months.
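
A minimal sketch of what the doubling rule implies over time, assuming simple exponential growth with a fixed doubling period. The 10-year horizon and the function name are arbitrary choices for illustration, not figures from the slides.

```python
def growth_factor(months_elapsed, doubling_period_months):
    """Factor by which the transistor count grows if it doubles every period."""
    return 2.0 ** (months_elapsed / doubling_period_months)


# Compare the two ends of the 18 to 24 month range over an assumed 10 years.
for period in (18, 24):
    factor = growth_factor(120, period)
    print(f"doubling every {period} months: about {factor:.0f}x in 10 years")
```

The two ends of the range diverge quickly: roughly 100x versus 32x after ten years.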
21
Implication?
As more transistors are packed into a chip of
the same area, each transistor becomes smaller and
switches faster, hence circuits become faster, i.e. the
clock rate increases.
Moore's Law, in combination with various
other factors like ILP (Instruction Level
Parallelism), was responsible for major
improvements for a long time.

22
Trends in Computing
(Intel Processors)
                        Fastest processor         Current fastest processor,
                        reported in text, 2003    2008
Intel processor name    Pentium 4                 Intel Core i7-965 Processor Extreme Edition
Processor speed         3.20 GHz                  3.20 GHz
Primary level cache     12KB + 8KB                4x32KB
Secondary cache         512 KB                    4x256KB Level 2 cache
Third level cache       2 MB                      Unified inclusive 8MB L3
23
Observations
Processor Speed or Clock Rate has not changed!!!
24
Observations
What is the "4" in 4x32KB and 4x256KB?
25
The Answer
Multi Core Approach - Actually more
transistors are being used to pack more
cores into a chip, rather than increasing
clock speed.
Why?
1. Power Wall
2. Memory Wall
3. No more ILP.
26
Topics for further Study
Papers
Performance papers
Memory Wall.
Software
Intel VTune or any other profiling tool
Little Man Computer Simulator or any other
simulator apart from Keil.

27
Amdahl's Law
$$\text{Execution time after improvement} = \frac{\text{Execution time affected by improvement}}{\text{Amount of improvement}} + \text{Execution time unaffected by improvement}$$

28
What does this mean?
Even if we substantially increase the performance of any
one component, it may not result in a substantial
overall performance improvement.

A new architecture reduces the time taken by memory
instructions by 50%. If memory instructions
account for 50% of the total time taken, what is the
overall improvement?
$T_{old} = 100$, $T_{new} = 25 + 50 = 75$. Execution time falls by 25%.
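
The formula and this example can be checked with a minimal sketch. The function name is made up for illustration; the numbers are the ones from the example, with the affected part (50 units) taking half as long, i.e. an amount of improvement of 2.

```python
def time_after_improvement(affected, unaffected, amount_of_improvement):
    """Amdahl's Law: affected time shrinks by the improvement factor,
    unaffected time stays as it is."""
    return affected / amount_of_improvement + unaffected


t_old = 100.0
t_new = time_after_improvement(affected=50.0, unaffected=50.0, amount_of_improvement=2.0)
reduction_pct = (t_old - t_new) / t_old * 100.0
overall_speedup = t_old / t_new

print(f"T_new = {t_new:.0f}")
print(f"time reduction = {reduction_pct:.0f}%")
print(f"overall speedup = {overall_speedup:.2f}x")
```

Even though the memory instructions became twice as fast, the program as a whole speeds up by only about a third, because half of the time is untouched.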
29
Which is better?
a. 20% increase in performance of instructions
that account for 90% of execution time.
b. 90% increase in performance of instructions
that account for 20% of execution time.
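
One possible way to work through this comparison with Amdahl's Law, assuming a total time of 100 and reading an x% increase in performance as an amount of improvement of $1 + x/100$ (this reading is an assumption; the slides do not define it):

$$T_a = \frac{90}{1.2} + 10 = 85, \qquad T_b = \frac{20}{1.9} + 80 \approx 90.5$$

Under that reading, option (a) gives the shorter execution time. If x% is instead read as an x% cut in the affected time, the two options tie: $90 \times 0.8 + 10 = 82$ and $20 \times 0.1 + 80 = 82$.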
