
ECE 5367

4436

Introduction to Computer Architecture and Design
Ji Chen
Section: T TH 1:00PM-2:30PM
Prerequisites: ECE 4436


Instructor: Ji Chen
Email: jchen18@uh.edu
Tel: (713)-743-4423
Office: W328
Office Hour: T TH 2:30-3:30 or by appointment

TA: None

Course Contents
1. Introduction, basic computer organization
2. Instruction formats, instruction sets and their design
3. ALU design: Adders, subtracters, logic operations
4. Multiplication, division, floating point arithmetic
5. Datapath design
6. Control design: Hardwired control, microprogrammed control
7. Pipelining
8. Memory systems
9. I/O


Web: http://www.egr.uh.edu/courses/ece/ECE5367/
Grading

HW/Quiz/Lab    10 %
Project        15 %
Exam 1         25 %
Exam 2         25 %
Exam 3         25 %

Academic Honesty Statement

Computer Organization and Design: The Hardware/Software Interface,
by David A. Patterson and John L. Hennessy, 3rd edition

Required

NOT REQUIRED

Homework/quiz: There will be several graded homework/lab assignments.
Homework turned in late will be accepted only under extraordinary circumstances.

Labs: Laboratory assignments may be worked in teams of two (2); however, there
should be no collaboration between teams. Lab assignments turned in late will be
penalized 25 points for each calendar day. Both students in a team will receive
the same grade for the project.

Projects: Teams of four (4): describe the computer architecture of a modern technology.

Exams: There are two mid-term exams and one final exam. A missed exam will
result in a grade of zero. Let me know immediately if any situation arises.
Final Exam - TBD

Grading: Your final grade will be computed as follows:

HW/Quiz/Lab    10 %
Project        15 %
Exam 1         25 %
Exam 2         25 %
Exam 3         25 %

Since 1946 all computers have had 5 components

Processor
  Control
  Datapath
Memory
Input
Output

TI SuperSPARC(tm) TMS390Z50 in Sun SPARCstation 20

[Block diagram. Labels: MBus Module; SuperSPARC (Floating-point Unit, Integer
Unit, Inst Cache, Data Cache, Ref MMU, Store Buffer, Bus Interface); L2 $; CC;
Message Bus (MBus); L64852 MBus control; M-S Adapter; SBus; SBus DMA; SBus
Cards; SCSI; Ethernet; DRAM Controller; STDIO (serial, kbd, mouse, audio, RTC,
Floppy)]

Computer Architecture

[Figure: levels of abstraction, top to bottom: Application; Operating System;
Compiler, Firmware; Instruction Set Architecture (Instr. Set Proc., I/O system);
Datapath & Control; Digital Design; Circuit Design; Layout]

Coordination of many levels of abstraction
Under a rapidly changing set of forces
Design, Measurement, and Evaluation


Forces on Computer Architecture

[Figure: Computer Architecture at the center, surrounded by the forces acting on it]
  Technology
  Programming Languages
  Applications
  Operating Systems
  History
  Cleverness

Mixed-Signal

Where are We Going?
ECE 5367, Spring 08

Arithmetic
[Figure: multiplier datapath with multiplicand register, Booth encoder, 34-bit
ALU, HI and LO registers, and control logic]

Single/multicycle Datapaths

Pipelining
[Figure: overlapped instructions, each passing through IFetch, Dcd, Exec, Mem, WB]

I/O

Memory Systems
[Figure: Processor-Memory Performance Gap. Performance (log scale, 1 to 1000)
vs. Time (1980 to 2000). CPU (Moore's Law): 60%/yr. (2X/1.5 yr); DRAM: 9%/yr.
(2X/10 yrs); the gap grows 50% / year]
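
The growth rates quoted for the processor-memory gap can be checked with a few lines of arithmetic. The sketch below assumes both rates compound once per year (an assumption, not stated on the slide); the function names are illustrative only. With 60%/yr for the CPU and 9%/yr for DRAM, the ratio of the two grows by roughly 47% per year, which the slide rounds to 50%.

```python
import math

# Annual improvement rates quoted on the slide (assumed to compound yearly).
CPU_RATE = 0.60   # processor performance: 60% per year
DRAM_RATE = 0.09  # DRAM performance: 9% per year


def doubling_time(rate):
    """Years needed to double at a given compound annual growth rate."""
    return math.log(2) / math.log(1 + rate)


def gap_growth(cpu_rate, dram_rate):
    """Annual growth of the CPU/DRAM performance ratio."""
    return (1 + cpu_rate) / (1 + dram_rate) - 1


print(f"CPU doubles every {doubling_time(CPU_RATE):.2f} years")
print(f"Gap grows about {gap_growth(CPU_RATE, DRAM_RATE):.0%} per year")
```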


Purchasing perspective
  Given a collection of machines, which has the
    Best performance?
    Least cost?
    Best performance / cost?
Design perspective
  Faced with design options, which has the
    Best performance improvement?
    Least cost?
    Best performance / cost?
Both require
  basis for comparison
  metric for evaluation
Our goal: understand cost & performance implications of architectural choices

Two Notions of Performance


Plane        DC to Paris   Speed      Passengers   Throughput (pmph)
Boeing 747   6.5 hours     610 mph    470          286,700
Concorde     3 hours       1350 mph   132          178,200

Which has higher performance?

Time to do the task (Execution Time)
  execution time, response time, latency
Tasks per day, hour, week, sec, ns ... (Performance)
  throughput, bandwidth
Response time and throughput often are in opposition
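
As a quick check of the throughput column, the sketch below recomputes passenger-miles per hour (pmph) as passengers times speed, using the figures from the table. The dictionary layout and function name are illustrative choices, not part of the course material.

```python
# (hours DC to Paris, speed in mph, passengers), copied from the table
PLANES = {
    "Boeing 747": (6.5, 610, 470),
    "Concorde": (3.0, 1350, 132),
}


def throughput_pmph(speed_mph, passengers):
    """Passenger-miles per hour: people carried times speed."""
    return passengers * speed_mph


for name, (hours, speed, passengers) in PLANES.items():
    pmph = throughput_pmph(speed, passengers)
    print(f"{name}: latency {hours} h, throughput {pmph:,} pmph")
```

The Concorde wins on latency while the 747 wins on throughput, which is the opposition the slide points out.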

Definitions
Performance is in units of things-per-second
  bigger is better
If we are primarily concerned with response time
  performance(x) = 1 / execution_time(x)
"X is n times faster than Y" means
  n = Performance(X) / Performance(Y) = execution_time(Y) / execution_time(X)

Example


Time of Concorde vs. Boeing 747?
  Concorde is 1350 mph / 610 mph = 2.2 times faster
    = 6.5 hours / 3 hours
Throughput of Concorde vs. Boeing 747?
  Concorde is 178,200 pmph / 286,700 pmph = 0.62 "times faster"
  Boeing is 286,700 pmph / 178,200 pmph = 1.60 times faster

Boeing is 1.6 times (60%) faster in terms of throughput
Concorde is 2.2 times (120%) faster in terms of flying time

We will focus primarily on execution time for a single job
Lots of instructions in a program => Instruction throughput important!
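
The "n times faster" ratios in this example can be reproduced directly from the definition performance = 1 / execution time. This is a minimal sketch; the helper names are illustrative, and the input numbers are the ones given in the example.

```python
def performance(execution_time):
    """Performance is the reciprocal of execution time."""
    return 1.0 / execution_time


def times_faster(perf_x, perf_y):
    """Return n such that X is n times faster than Y."""
    return perf_x / perf_y


# Flying time (hours) and throughput (pmph) from the example
concorde_hours, boeing_hours = 3.0, 6.5
concorde_pmph, boeing_pmph = 178_200, 286_700

latency_ratio = times_faster(performance(concorde_hours), performance(boeing_hours))
throughput_ratio = times_faster(boeing_pmph, concorde_pmph)

print(f"Concorde vs. 747, flying time: {latency_ratio:.1f}x")
print(f"747 vs. Concorde, throughput:  {throughput_ratio:.1f}x")
```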

CPU Performance

CPU time = Seconds / Program
         = (Instructions / Program) x (Cycles / Instruction) x (Seconds / Cycle)
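
The three-factor product for CPU time can be sketched as a small function. The workload numbers below (one billion instructions, CPI of 2.2, 1 GHz clock) are hypothetical values chosen only to make the arithmetic easy to follow.

```python
def cpu_time(instruction_count, cpi, clock_rate_hz):
    """Seconds/program = instructions/program x cycles/instruction x seconds/cycle."""
    seconds_per_cycle = 1.0 / clock_rate_hz
    return instruction_count * cpi * seconds_per_cycle


# Hypothetical workload: 1e9 instructions, CPI 2.2, 1 GHz clock
example_seconds = cpu_time(1_000_000_000, 2.2, 1_000_000_000)
print(f"CPU time: {example_seconds:.2f} s")
```

Halving any one factor (fewer instructions, lower CPI, or a shorter cycle) halves the CPU time, provided the other two stay fixed.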


Amdahl's Law
Speedup due to enhancement E:

  Speedup(E) = ExTime w/o E / ExTime w/ E = Performance w/ E / Performance w/o E

Suppose that enhancement E accelerates a fraction F of the task by a factor S
and the remainder of the task is unaffected; then,

  ExTime(with E) = ((1-F) + F/S) x ExTime(without E)
  Speedup(with E) = 1 / ((1-F) + F/S)
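
Amdahl's Law translates directly into code. The sketch below uses F and S as defined above; the sample values (half the task sped up 2x, then sped up enormously) are hypothetical and only illustrate that the unaffected fraction bounds the overall speedup.

```python
def amdahl_speedup(fraction, factor):
    """Overall speedup when `fraction` (F) of the task is accelerated by `factor` (S)."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if factor <= 0:
        raise ValueError("factor must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / factor)


# Hypothetical cases: F = 0.5 with S = 2, and F = 0.5 with a very large S
print(f"F=0.5, S=2:   {amdahl_speedup(0.5, 2):.3f}")
print(f"F=0.5, S=1e9: {amdahl_speedup(0.5, 1e9):.3f}")
```

Even with an enormous S, the speedup for F = 0.5 stays below 1 / (1 - F) = 2.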

Base Machine

Op       Freq   Cycles   CPI(i)   % Time
ALU      50%    1        .5       23%
Load     20%    5        1.0      45%
Store    10%    3        .3       14%
Branch   20%    2        .4       18%
                         2.2
Typical Mix

How much faster would the machine be if a better data cache reduced the
average load time to 2 cycles?
How does this compare with using branch prediction to save a cycle off the
branch time?
What if two ALU instructions could be executed at once?
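
One way to work through these three questions is to recompute the average CPI for each change and compare it with the base CPI of 2.2. The sketch below assumes the clock rate and instruction count stay the same, and it models "two ALU instructions at once" as an ideal case where each ALU instruction costs half a cycle; both are simplifying assumptions, and the helper names are illustrative.

```python
# Instruction mix from the Base Machine table: op -> (frequency, cycles)
BASE_MIX = {
    "ALU": (0.50, 1),
    "Load": (0.20, 5),
    "Store": (0.10, 3),
    "Branch": (0.20, 2),
}


def average_cpi(mix):
    """Average cycles per instruction: sum of frequency x cycles."""
    return sum(freq * cycles for freq, cycles in mix.values())


def with_cycles(mix, op, cycles):
    """Copy of the mix with one operation's cycle count changed."""
    changed = dict(mix)
    changed[op] = (mix[op][0], cycles)
    return changed


def speedup(base, changed):
    """Speedup at equal clock rate and instruction count: ratio of CPIs."""
    return average_cpi(base) / average_cpi(changed)


better_cache = speedup(BASE_MIX, with_cycles(BASE_MIX, "Load", 2))
branch_pred = speedup(BASE_MIX, with_cycles(BASE_MIX, "Branch", 1))
dual_alu = speedup(BASE_MIX, with_cycles(BASE_MIX, "ALU", 0.5))  # ideal pairing

print(f"Load in 2 cycles:      {better_cache:.3f}x")
print(f"Branch in 1 cycle:     {branch_pred:.3f}x")
print(f"Two ALU ops per cycle: {dual_alu:.3f}x")
```

Under these assumptions the data cache improvement helps most, because loads account for the largest share of execution time in the base mix.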
