Processor Technology

Processor technology
The architecture of the computation engine used to implement a
system's desired functionality
A processor does not have to be programmable
"Processor" is not the same as "general-purpose processor"


[Figure: the same summing loop (total = 0; for i = 1 to ...) implemented three ways, each with a controller and a datapath.
General-purpose (software): control logic and state register, IR, PC, register file, general ALU, program memory holding the assembly code, data memory.
Application-specific: control logic and state register, IR, PC, registers, custom ALU, program memory holding the assembly code, data memory.
Single-purpose (hardware): control logic, state register, index and total registers, data memory.]

2004 (jinsoo@cs.kaist.ac.kr)

Processor technology

Processors vary in their customization for the problem at hand

Desired functionality:

total = 0
for i = 1 to N loop
total += M[i]
end loop

[Figure: the desired functionality mapped onto a general-purpose processor, an application-specific processor, and a single-purpose processor]
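
For reference, the desired functionality on this slide (summing the elements of M) might be written in C roughly as follows. This is only a sketch: the function name `sum_array` and the 0-based indexing are choices made here, whereas the slide's loop runs from 1 to N.

```c
#include <assert.h>
#include <stddef.h>

/* Sum the n elements of m[]; mirrors "total = 0; for i = 1 to N: total += M[i]".
   C arrays are 0-based, so the loop covers 0..n-1 (an adaptation made here). */
long sum_array(const int *m, size_t n)
{
    assert(m != NULL || n == 0);   /* an empty array may be passed as NULL */
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        total += m[i];
    }
    return total;
}
```

On a general-purpose processor this loop becomes instructions in program memory; on a single-purpose processor the same behavior would be wired directly into the controller and datapath.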

General-purpose processors
Programmable device used in a variety of
applications

Also known as microprocessor

Features
Program memory
General datapath with large register file and
general ALU

User benefits
Low time-to-market and NRE costs
High flexibility

Pentium is the best known, but there
are hundreds of others

[Figure: general-purpose processor. Controller: control logic and state register, IR, PC. Datapath: register file, general ALU. Program memory holds the assembly code for the summing loop (total = 0; for i = 1 to ...); data memory holds the data.]

Single-purpose processors
Digital circuit designed to execute exactly one
program

a.k.a. coprocessor, accelerator or peripheral

Features
Contains only the components needed to
execute a single program
No program memory

Benefits
Fast
Low power
Small size

[Figure: single-purpose processor. Controller: control logic and state register. Datapath: index and total registers. Data memory only.]

Application-specific processors
Programmable processor optimized for a
particular class of applications having
common characteristics

Compromise between general-purpose and
single-purpose processors

Features
Program memory
Optimized datapath
Special functional units

Benefits
Some flexibility, good performance, size and
power

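[Figure: application-specific processor. Controller: control logic and state register, IR, PC. Datapath: registers, custom ALU. Program memory holds the assembly code for the summing loop (total = 0; for i = 1 to ...); data memory holds the data.]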

Processor Technology
General Purpose (software)
Application Specific
Single Purpose (Hardware)
IC technology
Full Custom/VLSI
Semi-custom ASIC (gate-array, standard cell)
PLD


Custom single-purpose
processors: Hardware

Introduction
Processor
Digital circuit that performs computation tasks
Controller and datapath
General-purpose: variety of computation tasks
Single-purpose: one particular computation task
Custom single-purpose: non-standard task

A custom single-purpose processor may be
Fast, small, low power
But, high NRE, longer time-to-market, less flexible

Digital camera chip

[Figure: digital camera chip. Blocks: lens, CCD, A2D, CCD preprocessor, D2A, JPEG codec, microcontroller, multiplier/accumulator, DMA controller, memory controller, pixel coprocessor, display ctrl, ISA bus interface, UART, LCD ctrl.]

Custom single-purpose processor basic model

[Figure: controller and datapath. Signals shown: external control inputs, external control outputs, datapath control inputs, datapath control outputs, external data inputs, external data outputs.]

[Figure: a view inside the controller and datapath. Controller: next-state and control logic, state register. Datapath: registers, functional units.]

Example: greatest common divisor

First create algorithm
Convert algorithm to "complex" state machine
Known as FSMD: finite-state machine with datapath
Can use templates to perform such conversion
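
To make the FSMD idea concrete, here is a minimal C sketch that steps through an FSMD for the subtraction-based GCD algorithm of this example (repeatedly subtract the smaller value from the larger until both are equal). The state names, the added `S_DONE` state, and the assumption that `go_i` is already asserted are illustrative choices made here, not part of the slides; the join states of the templates are omitted for brevity.

```c
#include <assert.h>

/* FSMD states, numbered after the statements of the GCD algorithm.
   S_DONE is a hypothetical extra state used only to stop the simulation. */
typedef enum { S_LOAD_X = 3, S_LOAD_Y = 4, S_TEST = 5, S_CMP = 6,
               S_SUB_Y = 7, S_SUB_X = 8, S_OUT = 9, S_DONE = 10 } state_t;

/* Step the FSMD until it writes its output; one loop pass = one clock cycle. */
unsigned gcd_fsmd(unsigned x_i, unsigned y_i)
{
    assert(x_i > 0 && y_i > 0);   /* subtraction GCD never terminates on 0 */
    unsigned x = 0, y = 0, d_o = 0;
    state_t s = S_LOAD_X;         /* assumes go_i is already asserted */
    while (s != S_DONE) {
        switch (s) {
        case S_LOAD_X: x = x_i;    s = S_LOAD_Y;                    break;
        case S_LOAD_Y: y = y_i;    s = S_TEST;                      break;
        case S_TEST:               s = (x != y) ? S_CMP : S_OUT;    break;
        case S_CMP:                s = (x < y) ? S_SUB_Y : S_SUB_X; break;
        case S_SUB_Y:  y = y - x;  s = S_TEST;                      break;
        case S_SUB_X:  x = x - y;  s = S_TEST;                      break;
        case S_OUT:    d_o = x;    s = S_DONE;                      break;
        case S_DONE:                                                break;
        }
    }
    return d_o;
}
```

Each `case` holds both an action on variables (the datapath part) and a next-state decision (the controller part), which is what distinguishes an FSMD from a plain FSM.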

(a) black-box view
[Figure: GCD block with inputs go_i, x_i, y_i and output d_o]

(b) desired functionality

0: int x, y;
1: while (1) {
2:    while (!go_i);
3:    x = x_i;
4:    y = y_i;
5:    while (x != y) {
6:       if (x < y)
7:          y = y - x;
         else
8:          x = x - y;
      }
9:    d_o = x;
   }

(c) state diagram
[Figure: FSMD state diagram with states 1, 2, 2-J, 3 (x = x_i), 4 (y = y_i), 5, 6, 7 (y = y - x), 8 (x = x - y), 6-J, 5-J, 9 (d_o = x), 1-J. Transition conditions: 1 / !1 at state 1, !go_i / !(!go_i) at state 2, x!=y / !(x!=y) at state 5, x<y / !(x<y) at state 6.]

State diagram templates

Assignment statement
a = b
next statement
[Template: a state performing a = b, followed by the state of the next statement]

Loop statement
while (cond) {
loop-body-statements
}
next statement
[Template: condition state C; on cond, the loop-body statements; on !cond, join state J, then the next statement]

Branch statement
if (c1)
c1 stmts
else if c2
c2 stmts
else
other stmts
next statement
[Template: condition state C; on c1, the c1 stmts; on !c1*c2, the c2 stmts; on !c1*!c2, the others; then join state J and the next statement]

Creating the datapath

Create a register for any declared variable
Create a functional unit for each arithmetic operation
Connect the ports, registers and functional units
Based on reads and writes
Use multiplexors for multiple sources
Create unique identifier for each datapath component control input and output

[Figure: the FSMD state diagram beside the resulting datapath. Datapath components: inputs x_i and y_i; two n-bit 2x1 multiplexors (selects x_sel, y_sel); registers 0: x and 0: y (loads x_ld, y_ld); a != comparator (5: x!=y, output x_neq_y); a < comparator (6: x<y, output x_lt_y); two subtractors (8: x-y and 7: y-x); register 9: d (load d_ld) driving output d_o.]

Creating the controller's FSM

Same structure as FSMD
Replace complex actions/conditions with datapath configurations

[Figure: FSMD state diagram, controller FSM, and datapath. Controller input go_i; status inputs x_neq_y and x_lt_y from the datapath. Controller states with their encodings and asserted control signals:
0000 1:
0001 2:
0010 2-J:
0011 3: x_sel = 0, x_ld = 1
0100 4: y_sel = 0, y_ld = 1
0101 5: (tests x_neq_y / !x_neq_y)
0110 6: (tests x_lt_y / !x_lt_y)
0111 7: y_sel = 1, y_ld = 1
1000 8: x_sel = 1, x_ld = 1
1001 6-J:
1010 5-J:
1011 9: d_ld = 1
1100 1-J:]

Splitting into a controller and datapath

[Figure: Controller implementation model: combinational logic with inputs go_i, x_neq_y, x_lt_y and current state Q3 Q2 Q1 Q0; outputs x_sel, y_sel, x_ld, y_ld, d_ld and next state I3 I2 I1 I0 feeding the state register. The controller FSM uses state encodings 0000 (1:) through 1100 (1-J:), with transitions labeled x_neq_y=0 / x_neq_y=1 and x_lt_y=0 / x_lt_y=1.
(b) Datapath: inputs x_i and y_i, two n-bit 2x1 multiplexors, registers x, y and d, != and < comparators, two subtractors, output d_o.]

Controller state table for the GCD example

Inputs: Q3, Q2, Q1, Q0, x_neq_y, x_lt_y, go_i
Outputs: I3, I2, I1, I0, x_sel, y_sel, x_ld, y_ld, d_ld

[Table: only the column headings and two * entries (in the x_neq_y and x_lt_y columns) are preserved]

Completing the GCD custom single-purpose processor design

We finished the datapath
We have a state table for the next state and control logic
All that's left is combinational logic design

This is not an optimized design, but we see the basic steps
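
The split into a controller and a datapath can be sketched in C as two cooperating pieces. The control and status signal names (x_sel, x_ld, y_sel, y_ld, d_ld, x_neq_y, x_lt_y) follow the slides; the struct layout, the function names, the use of state 0 as a "finished" marker, and the assumption that go_i is already high are illustrative choices made here.

```c
#include <assert.h>
#include <stdbool.h>

/* Control signals driven by the controller (names follow the slides). */
typedef struct { bool x_sel, x_ld, y_sel, y_ld, d_ld; } ctrl_t;

/* Datapath registers plus the external data inputs. */
typedef struct { unsigned x, y, d, x_i, y_i; } datapath_t;

/* One clock edge of the datapath: multiplexors pick sources, loads commit. */
static void datapath_clock(datapath_t *dp, ctrl_t c)
{
    unsigned x_next = c.x_sel ? dp->x - dp->y : dp->x_i;  /* 2x1 mux for x */
    unsigned y_next = c.y_sel ? dp->y - dp->x : dp->y_i;  /* 2x1 mux for y */
    if (c.d_ld) dp->d = dp->x;
    if (c.x_ld) dp->x = x_next;
    if (c.y_ld) dp->y = y_next;
}

/* Controller FSM: it sees only the status bits, never the data itself. */
unsigned gcd_controller_datapath(unsigned x_i, unsigned y_i)
{
    assert(x_i > 0 && y_i > 0);          /* subtraction GCD needs positive inputs */
    datapath_t dp = { 0, 0, 0, x_i, y_i };
    int state = 3;                       /* start at state 3; go_i assumed high */
    while (state != 0) {                 /* 0 is a hypothetical "finished" state */
        bool x_neq_y = dp.x != dp.y;     /* comparator outputs */
        bool x_lt_y  = dp.x <  dp.y;
        ctrl_t c = { false, false, false, false, false };
        int next = 0;
        switch (state) {
        case 3: c.x_sel = false; c.x_ld = true; next = 4; break;
        case 4: c.y_sel = false; c.y_ld = true; next = 5; break;
        case 5: next = x_neq_y ? 6 : 9;                   break;
        case 6: next = x_lt_y ? 7 : 8;                    break;
        case 7: c.y_sel = true;  c.y_ld = true; next = 5; break;
        case 8: c.x_sel = true;  c.x_ld = true; next = 5; break;
        case 9: c.d_ld = true;                  next = 0; break;
        }
        datapath_clock(&dp, c);
        state = next;
    }
    return dp.d;
}
```

The `switch` plays the role of the next-state and control logic, and `state` plays the role of the state register; in hardware the `switch` would become the combinational logic derived from the state table.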

[Figure: a view inside the controller and datapath. Controller: next-state and control logic, state register. Datapath: registers, functional units.]

Summary
Custom single-purpose processors
Straightforward design techniques
Can be built to execute algorithms
Typically start with FSMD
CAD tools can be of great assistance

General-Purpose
Processors: Software

Introduction
General-Purpose Processor
Processor designed for a variety of computation tasks
Low unit cost, in part because manufacturer spreads NRE over
large numbers of units
Motorola sold half a billion 68HC05 microcontrollers in 1996 alone
ARM processors: 1.5 billion processors
Carefully designed since higher NRE is acceptable
Can yield good performance, size and power
Low NRE cost, short time-to-market/prototype, high flexibility
User just writes software; no processor design
a.k.a. "microprocessor": "micro" used when they were
implemented on one or a few chips rather than entire rooms

Why use microprocessors?

Alternatives: field-programmable gate arrays (FPGAs),
custom logic, etc. (Custom Single-purpose Processor or HW Logic)
Microprocessors are often very efficient: can use same logic to
perform many different functions.
Microprocessors simplify the design of families of products.

The performance paradox


Microprocessors use much more logic to implement a function
than does custom logic.
But microprocessors are often at least as fast:
heavily pipelined;
large design teams;
aggressive VLSI technology.


Power
Custom logic is a clear winner for low power devices.
Modern microprocessors offer features to help control power
consumption.
Software design techniques can help reduce power
consumption.


Basic Architecture

Control unit and datapath
Note similarity to single-purpose processor

Key differences
Datapath is general
Control unit doesn't store the algorithm: the algorithm is
"programmed" into the memory

[Figure: processor containing a control unit (controller, PC, IR) and a datapath (ALU, registers), with control/status signals between them, connected to I/O and memory]

Superscalar and VLIW Architectures


Performance can be improved by:
Faster clock (but there's a limit)
Pipelining: slice up instruction into stages, overlap stages
Multiple ALUs to support more than one instruction stream
Superscalar
Scalar: non-vector operations
Fetches instructions in batches, executes as many as
possible
May require extensive hardware to detect
independent instructions
VLIW: each word in memory has multiple independent
instructions
Currently growing in popularity
Relies on the compiler to detect and schedule instructions
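
The benefit of pipelining listed above can be estimated with a little arithmetic. The sketch below assumes an ideal pipeline (every stage takes one cycle, no stalls or hazards), which real processors do not guarantee; the function names are chosen here for illustration.

```c
#include <assert.h>

/* Cycles to run n instructions through k stages with no overlap. */
unsigned long cycles_non_pipelined(unsigned long n, unsigned long k)
{
    assert(k > 0);
    return n * k;
}

/* Cycles with an ideal k-stage pipeline: k cycles for the first instruction,
   then one more instruction completes on every following cycle. */
unsigned long cycles_pipelined(unsigned long n, unsigned long k)
{
    assert(k > 0);
    return n == 0 ? 0 : k + (n - 1);
}
```

For example, 8 instructions on a 5-stage machine take 40 cycles without overlap and 12 cycles with ideal pipelining.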


Pipelining: Increasing Instruction Throughput

[Figure: non-pipelined versus pipelined dish cleaning (wash, dry) over time, and pipelined instruction execution over time with stages fetch-instr., decode, fetch ops., execute, store res.]

Two Memory Architectures

Princeton
Fewer memory wires

Harvard
Simultaneous program and data memory access

[Figure: Princeton: processor connected to a single memory (program and data). Harvard: processor connected to a separate program memory and data memory.]

Princeton vs. Harvard


Harvard can't use self-modifying code.
Harvard allows two simultaneous memory fetches.
Most DSPs use Harvard architecture for streaming data:
greater memory bandwidth;
more predictable bandwidth.


Cache Memory
Memory access may be slow
Cache is small but fast memory close to processor
Holds copy of part of memory
Hits and misses

[Figure: processor and cache (fast/expensive technology, usually on the same chip) in front of memory (slower/cheaper technology, usually on a different chip)]
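
Hits and misses can be illustrated with a tiny software model of a direct-mapped cache. The slides do not specify a cache organization, so the direct-mapped scheme, the sizes (4 lines of 16 bytes), and all names below are assumptions made purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_LINES  4u    /* illustrative cache size: 4 lines */
#define LINE_BYTES 16u   /* illustrative line size: 16 bytes */

typedef struct {
    bool     valid[NUM_LINES];
    unsigned tag[NUM_LINES];
    unsigned hits, misses;
} cache_t;

/* Look up one address. On a miss the line is (conceptually) fetched from
   memory and replaces whatever was previously stored at that index. */
bool cache_access(cache_t *c, unsigned addr)
{
    assert(c != NULL);
    unsigned block = addr / LINE_BYTES;   /* which memory block holds addr */
    unsigned index = block % NUM_LINES;   /* which cache line it maps to */
    unsigned tag   = block / NUM_LINES;   /* identifies the block in that line */
    if (c->valid[index] && c->tag[index] == tag) {
        c->hits++;
        return true;
    }
    c->valid[index] = true;
    c->tag[index]   = tag;
    c->misses++;
    return false;
}
```

Accessing addresses 0 and 4 in sequence gives a miss followed by a hit, because both fall in the same 16-byte line; a later access to address 64 maps to the same line index and evicts it.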


Application-Specific
Instruction-Set Processors
(ASIPs)

Application-Specific Instruction-Set
Processors (ASIPs)
General-purpose processors
Sometimes too general to be effective in demanding application
e.g., video processing requires huge video buffers and
operations on large arrays of data, inefficient on a GPP
But single-purpose processor has high NRE, not programmable

ASIPs targeted to a particular domain


Contain architectural features specific to that domain
e.g., embedded control, digital signal processing, video
processing, network processing, telecommunications, etc.
Still programmable


Microprocessor varieties
Microcontroller: includes I/O devices, on-board memory.
Digital signal processor (DSP): microprocessor optimized for
digital signal processing.
Typical embedded word sizes: 8-bit, 16-bit, 32-bit.


Embedded Processors

Netsilicon NET+ARM Embedded Processor

Many Types of Programmable Processors

Past
Microprocessor
Microcontroller
DSP
Graphics Processor

Now / Future
Network Processor
Sensor Processor
Cryptoprocessor
Game Processor
Wearable Processor
Mobile Processor

A Common ASIP: Microcontroller


For embedded control applications
Reading sensors, setting actuators
Mostly dealing with events (bits): data is present, but not in huge
amounts
e.g., VCR, disk drive, digital camera (assuming SPP for image
compression), washing machine, microwave oven

Microcontroller features
On-chip peripherals
Timers, analog-digital converters, serial communication, etc.
Tightly integrated for programmer, typically part of register space
On-chip program and data memory
Direct programmer access to many of the chip's pins
Specialized instructions for bit-manipulation and other low-level
operations
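
The bit-manipulation operations mentioned above look roughly like this when written in portable C. The `port_t` variable stands in for a memory-mapped port register; on real hardware it would be a volatile location at a device-specific address, which is not shown here. All names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for an 8-bit port register (hypothetical; a real one would be a
   volatile memory-mapped location defined by the microcontroller vendor). */
typedef uint8_t port_t;

/* Set one pin high without disturbing the others. */
void bit_set(port_t *p, unsigned bit)
{
    assert(p != 0 && bit < 8);
    *p |= (port_t)(1u << bit);
}

/* Drive one pin low without disturbing the others. */
void bit_clear(port_t *p, unsigned bit)
{
    assert(p != 0 && bit < 8);
    *p &= (port_t)~(1u << bit);
}

/* Read the state of one pin. */
bool bit_test(port_t p, unsigned bit)
{
    assert(bit < 8);
    return ((p >> bit) & 1u) != 0;
}
```

Many microcontrollers provide single instructions for these read-modify-write patterns, which is one reason they suit event-driven control code.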


Another Common ASIP: Digital Signal Processors (DSP)
For signal processing applications
Large amounts of digitized data, often streaming
Data transformations must be applied fast
e.g., cell-phone voice filter, digital TV, music synthesizer

DSP features
Several instruction execution units
Multiply-accumulate single-cycle instruction, other instrs.
Efficient vector operations, e.g., add two arrays
Vector ALUs, loop buffers, etc.
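
A multiply-accumulate is the inner step of many signal-processing kernels. As a hedged illustration, the sketch below computes one output sample of a finite impulse response (FIR) filter; the FIR filter itself, the function name, and the 16-bit sample format are choices made here, not taken from the slides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One output sample of an FIR filter: y = sum over k of coef[k] * x[k].
   Each loop iteration is one multiply-accumulate (MAC), the operation a DSP
   typically performs as a single instruction. */
int32_t fir_sample(const int16_t *coef, const int16_t *x, size_t taps)
{
    assert(coef != NULL && x != NULL);
    int32_t acc = 0;                       /* wider accumulator delays overflow */
    for (size_t k = 0; k < taps; k++) {
        acc += (int32_t)coef[k] * x[k];    /* multiply-accumulate */
    }
    return acc;
}
```

On a general-purpose processor each iteration costs several instructions (loads, multiply, add, loop control); DSP features such as loop buffers and MAC units exist to shrink exactly this loop.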


Trend: Even More Customized ASIPs


In the past, microprocessors were acquired as chips
Today, we increasingly acquire a processor as Intellectual Property
(IP)

e.g., synthesizable VHDL model

Opportunity to add custom datapath hardware and a few custom
instructions, or delete a few instructions
Can have significant performance, power and size impacts


Problem: need compiler/debugger for customized ASIP
Remember, most development uses structured languages
One solution: automatic compiler/debugger generation
e.g., www.tensillica.com
Another solution: retargetable compilers
e.g., www.improvsys.com (customized VLIW architectures)


Reconfigurable SoC

Other Examples
Atmel's FPSLIC (AVR + FPGA)
Altera's Nios (configurable RISC on a PLD)
Triscend's A7 CSoC

Selecting a Microprocessor
Issues
Technical: speed, power, size, cost
Other: development environment, prior expertise, licensing, etc.

Speed: how to evaluate a processor's speed?
Clock speed: but instructions per cycle may differ
Instructions per second: but work per instr. may differ
Dhrystone: synthetic benchmark, developed in 1984. Results given in Dhrystones/sec.
MIPS: 1 MIPS = 1757 Dhrystones per second (based on Digital's VAX
11/780). A.k.a. Dhrystone MIPS. Commonly used today.
So, 750 MIPS = 750*1757 = 1,317,750 Dhrystones per second
SPEC: set of more realistic benchmarks, but oriented to desktops
EEMBC: EDN Embedded Benchmark Consortium, www.eembc.org
Suites of benchmarks: automotive, consumer electronics,
networking, office automation, telecommunications
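
The Dhrystone MIPS conversion above is a single multiplication or division by 1757. A small sketch, with function names chosen here for illustration:

```c
#include <assert.h>

/* Dhrystones per second scored by the VAX 11/780, which defines 1 MIPS. */
#define DHRYSTONES_PER_MIPS 1757.0

/* Convert a raw Dhrystone score into Dhrystone MIPS. */
double dhrystone_mips(double dhrystones_per_sec)
{
    assert(dhrystones_per_sec >= 0.0);
    return dhrystones_per_sec / DHRYSTONES_PER_MIPS;
}

/* Convert a Dhrystone MIPS rating back into Dhrystones per second. */
double dhrystones_per_second(double mips)
{
    assert(mips >= 0.0);
    return mips * DHRYSTONES_PER_MIPS;
}
```

Calling `dhrystones_per_second(750.0)` reproduces the 1,317,750 figure worked out on the slide.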


Processors

Processor; Clock speed; Periph.; Bus Width; MIPS; Power; Trans.; Price

General Purpose Processors
Intel PIII; 1GHz; 2x16 K L1, 256K L2, MMX; 32; ~900; 97W; ~7M; $900
IBM PowerPC 750X; 550 MHz; 2x32 K L1, 256K L2; 32/64; ~1300; 5W; ~7M; $900
MIPS R5000; 250 MHz; 2x32 K, 2 way set assoc.; 32/64; NA; NA; 3.6M; NA
StrongARM SA-110; 233 MHz; None; 32; 268; 1W; 2.1M; NA

Microcontroller
Intel 8051; 12 MHz; 4K ROM, 128 RAM, 32 I/O, Timer, UART; 8; ~1; ~0.2W; ~10K; $7
Motorola 68HC811; 3 MHz; 4K ROM, 192 RAM, 32 I/O, Timer, WDT, SPI; 8; ~.5; ~0.1W; ~10K; $5

Digital Signal Processors
TI C5416; 160 MHz; 128K, SRAM, 3 T1 Ports, DMA, 13 ADC, 9 DAC; 16/32; ~600; NA; NA; $34
Lucent DSP32C; 80 MHz; 16K Inst., 2K Data, Serial Ports, DMA; 32; 40; NA; NA; $75

Sources: Intel, Motorola, MIPS, ARM, TI, and IBM Website/Datasheet; Embedded Systems Programming, Nov. 1998

Summary
General-purpose processors
Good performance, low NRE, flexible

Controller, datapath, and memory


Structured languages prevail
But some assembly level programming still necessary

Many tools available


Including instruction-set simulators, and in-circuit emulators

ASIPs
Microcontrollers, DSPs, network processors, more customized ASIPs

Choosing among processors is an important step


Designing a general-purpose processor is conceptually the same as
designing a single-purpose processor


Instruction Sets

RISC vs. CISC

Complex instruction set computer (CISC):
many addressing modes;
many operations.

Reduced instruction set computer (RISC):
load/store;
pipelinable instructions.

CISC
Intel

Year; Processor; Transistors; Notes
1971; 4004; 2,250; Busicom
1972; 8008; 2,500; Mark-8
1974; 8080; 5,000; Altair
1978; 8086/8088; 29,000; IBM-PC XT
1982; 80286; 120,000; IBM-PC AT, 6 5
1985; 80386; 275,000; 32
1989; 80486; 1,180,000;
1993; Pentium; 3,100,000;
1995; Pentium Pro; 5,500,000; Dynamic Execution
1997; Pentium 2; 7,500,000; MMX
1999; Pentium 3; 24,000,000; SIMD, 12
2001; Itanium; 25,000,000; 64, Explicitly Parallel Instruction Computing (EPIC)
2002; Pentium 4; 55,000,000; 20
2003; Itanium 2; 410,000,000; Machine Check Architecture, EPIC, 6MB L3

CISC - History : Packaging


CISC - History


Instruction set characteristics

Fixed vs. variable length.


Addressing modes.
Number of operands.
Types of operands.


ARM data processing Instruction Format (RISC)

Data processing immediate shift
[31:28] cond | [27:25] 000 | [24:21] opcode | [20] S | [19:16] Rn | [15:12] Rd | [11:7] shift amount | [6:5] shift | [4] 0 | [3:0] Rm

Data processing register shift
[31:28] cond | [27:25] 000 | [24:21] opcode | [20] S | [19:16] Rn | [15:12] Rd | [11:8] Rs | [7] 0 | [6:5] shift | [4] 1 | [3:0] Rm

Data processing 32-bit immediate
[31:28] cond | [27:25] 001 | [24:21] opcode | [20] S | [19:16] Rn | [15:12] Rd | [11:8] rotate | [7:0] immediate-8
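
Because every field of the ARM data processing format sits at a fixed bit position, decoding is a matter of shifts and masks. The sketch below decodes the 32-bit immediate form, assuming the classic 32-bit ARM encoding (cond in bits 31:28, 001 in bits 27:25, opcode in 24:21, S in bit 20, Rn in 19:16, Rd in 15:12, rotate in 11:8, immediate in 7:0). The struct and function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    unsigned cond, opcode, s, rn, rd;
    uint32_t imm;            /* 8-bit immediate after rotation */
} arm_dp_imm_t;

/* Rotate right by r bits. r == 0 is handled separately because shifting a
   32-bit value by 32 is undefined behavior in C. */
static uint32_t ror32(uint32_t v, unsigned r)
{
    r &= 31u;
    return r ? (v >> r) | (v << (32u - r)) : v;
}

/* Decode a data processing instruction in the 32-bit immediate form. */
arm_dp_imm_t arm_decode_dp_imm(uint32_t insn)
{
    assert(((insn >> 25) & 0x7u) == 0x1u);   /* bits 27:25 must be 001 */
    arm_dp_imm_t d;
    d.cond   = (insn >> 28) & 0xFu;
    d.opcode = (insn >> 21) & 0xFu;
    d.s      = (insn >> 20) & 0x1u;
    d.rn     = (insn >> 16) & 0xFu;
    d.rd     = (insn >> 12) & 0xFu;
    unsigned rotate = (insn >> 8) & 0xFu;
    d.imm    = ror32(insn & 0xFFu, 2u * rotate);  /* rotate right by 2*rotate */
    return d;
}
```

The fixed-length, fixed-position layout is what makes this decoder a handful of shifts; a variable-length CISC format needs a byte-by-byte parser instead.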

Intel IA-32 Instruction Format (CISC)


Programming model
Programming model: registers visible to the programmer.
Some registers are not visible (IR).


Multiple implementations
Successful architectures have several implementations:

varying clock speeds;


different bus widths;
different cache sizes;
etc.


IC Technology

IC technology
The manner in which a digital (gate-level) implementation is
mapped onto an IC

IC: Integrated circuit, or "chip"
IC technologies differ in their customization to a design
ICs consist of numerous layers (perhaps 10 or more)
IC technologies differ with respect to who builds each layer
and when

[Figure: IC package containing the IC; transistor cross-section showing source, gate, oxide, channel, and drain on a silicon substrate]

IC technology
Three types of IC technologies
Full-custom/VLSI
Semi-custom ASIC (gate array and standard cell)
PLD (Programmable Logic Device)


Outline
Anatomy of integrated circuits

Full-Custom (VLSI) IC Technology


Semi-Custom (ASIC) IC Technology

Programmable Logic Device (PLD) IC Technology


CMOS transistor
Source, Drain
Diffusion area where electrons can flow
Can be connected to metal contacts (vias)

Gate
Polysilicon area where control voltage is applied

Oxide
SiO2: insulator so the gate voltage can't leak

End of Moore's Law?

Every dimension of the MOSFET has to scale
(PMOS) Gate oxide has to scale down to
Increase gate capacitance
Reduce leakage current from S to D
Current gate oxide thickness is about 2.5-3nm
That's about 25 atoms!

[Figure: IC package containing the IC; transistor cross-section showing source, gate, oxide, channel, and drain on a silicon substrate]

NMOS Inverter

NMOS Transistor (NMOS FET)
SiO2 (0.6 micron)
P type silicon
gate oxide (0.05 micron)
polysilicon (Low Pressure Chemical Vapor Deposition)
AS n+: source, drain

NMOS Inverter

[Figure: NMOS inverter layout showing n+ regions, contact, and aluminum. Length unit: λ (micron); a dimension of 2λ is marked.]

NAND

Metal layers for routing (~10)
PMOS doesn't like 0
NMOS doesn't like 1
A stick diagram forms the basis for mask sets

Silicon manufacturing steps

Tape out
Send design to manufacturing

Spin
One time through the manufacturing process

Photolithography
Drawing patterns by using photo-resist to form barriers for deposition

Full Custom
Very Large Scale Integration (VLSI)
Placement
Place and orient transistors

Routing
Connect transistors

Sizing
Make fat, fast wires or thin, slow wires
May also need to size buffer

Design Rules

simple rules for correct circuit function


Metal/metal spacing, min poly width


Full Custom
Best size, power, performance
Hand design
Horrible time-to-market/flexibility/NRE cost
Reserve for the most important units in a processor
ALU, Instruction fetch

Physical design tools


Less optimal, but faster


Semi-Custom
Gate Array
Array of prefabricated gates; place and route
Higher density, faster time-to-market
Does not integrate as well with full-custom

Standard Cell
A library of pre-designed cells
Place and route
Lower density, higher complexity
Integrates well with full-custom

Semi-Custom
Most popular design style

Jack of all trades
Good power, time-to-market, performance, NRE cost, per-unit
cost, area

Master of none
Integrate with full custom for critical regions of design

Programmable Logic Device

Programmable Logic Device
Programmable Logic Array, Programmable Array Logic, Field Programmable Gate Array

All layers already exist
Designers can purchase an IC
Connections on the IC are either created or destroyed to implement
desired functionality

Benefits
Very low NRE costs
Great time to market

Drawback
High unit cost, bad for large volume
Power
Except special PLA
Slower

[Figure caption: 1600 usable gates, 7.5 ns, $7 list price]


Xilinx FPGA


Configurable Logic Block (CLB)


I/O Block



Full-custom/VLSI
All layers are optimized for an embedded system's particular
digital implementation
Placing transistors
Sizing transistors
Routing wires

Benefits
Excellent performance, small size, low power

Drawbacks
High NRE cost (e.g., $300k), long time-to-market


Semi-custom
Lower layers are fully or partially built
Designers are left with routing of wires and maybe placing some
blocks

Benefits
Good performance, good size, less NRE cost than a full-custom
implementation (perhaps $10k to $100k)

Drawbacks
Still require weeks to months to develop


PLD (Programmable Logic Device)


All layers already exist
Designers can purchase an IC
Connections on the IC are either created or destroyed to
implement desired functionality
Field-Programmable Gate Array (FPGA) very popular

Benefits
Low NRE costs, almost instant IC availability

Drawbacks
Bigger, expensive (perhaps $30 per unit), power hungry, slower
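
The tradeoff between NRE cost and unit cost across IC technologies can be made concrete with a break-even calculation. The sketch below models total cost as NRE plus unit cost times volume. The $30 PLD unit price and the semi-custom NRE range of $10k to $100k come from the slides; any other number used with these functions (for instance a semi-custom unit cost of $10 and a PLD NRE of zero) is an assumption for illustration only.

```c
#include <assert.h>

/* Total cost of producing `units` chips: one-time NRE plus per-unit cost. */
double total_cost(double nre, double unit_cost, unsigned long units)
{
    assert(nre >= 0.0 && unit_cost >= 0.0);
    return nre + unit_cost * (double)units;
}

/* Smallest volume at which option B (higher NRE, lower unit cost) is no more
   expensive than option A. Returns 0 if B never costs more than A. */
unsigned long break_even_units(double nre_a, double unit_a,
                               double nre_b, double unit_b)
{
    assert(unit_a > unit_b);              /* B must be cheaper per unit */
    if (nre_b <= nre_a) return 0;
    double n = (nre_b - nre_a) / (unit_a - unit_b);
    unsigned long whole = (unsigned long)n;
    return (double)whole < n ? whole + 1 : whole;   /* round up */
}
```

With a PLD at $30 per unit and an assumed NRE of zero, against a semi-custom part with $100,000 NRE and an assumed $10 unit cost, `break_even_units(0.0, 30.0, 100000.0, 10.0)` gives 5000 units: below that volume the PLD is cheaper, above it the semi-custom IC wins.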


Structured ASIC
From the paper
"Paradigm shift in ASIC technology:
In Standard Metal,
Out Standard Cell,"
Zvi Or-Bach, eASIC founder and CEO

Structured ASIC
About 20 years ago
Full custom design -> Standard Cell
Design cost of full custom: $10 million

Today
Standard Cell: exceeds $10 million

Paradigm shift in ASIC technology?
In: Standard Metal (Structured ASIC)
Out: Standard Cell

ASIC development costs


Definition
Structured ASIC
Key to reducing design cost and complexity
Reducing number of custom mask and via layers
Typically, two or three (sometimes 5) user-modifiable metal
layers
Multiple input lookup tables, F/Fs, and MUXs


Interconnection Taking Over Delay Domination

Interconnection
At 100 nm
Interconnect switching energy = 3 x transistor (TR) switching energy
At 35 nm, 30 times greater

Crosstalk


Improvement (MHz) vs. Previous Geometry


Paradigm Shift
Transistor sizing (Full custom) -> gate sizing (Standard Cell)
Move to an even coarser building block

Immediate benefit of coarse grain cells


Placement and Routing

Basic Cell (14 µm)

Feature-limited Taking over the Yield Domination

RET: Reticle Enhancement Technique

Natural solution for this yield: use repetitive patterns, just as in SRAM


Mask Set Cost Becomes Prohibitive


Implication of the design cost increase


Independence of processor and IC technologies

Basic tradeoff
General vs. custom
With respect to processor technology or IC technology
The two technologies are independent

[Figure: processor technologies (general-purpose processor, ASIP, single-purpose processor) and IC technologies (PLD, semi-custom, full-custom), each ordered from general to customized.
General, providing improved: flexibility, maintainability, NRE cost, time-to-prototype, time-to-market, cost (low volume).
Customized, providing improved: power efficiency, performance, size, cost (high volume).]