Embedded System Lecture Notes by Prof. Dr. Surendra Shrestha Sir

Dr. Surendra Shrestha
surendra@ioe.edu.np, surendtha@gmail.com
Education:
• Post Doc. (Graphene Tech.), University Polytechnica de Madrid, Spain
• PhD (Major: Nanoscience), Sun Moon University, S. Korea
• M.Sc. Engg., Tashkent Electro-Technical Institute of Communication, Uzbekistan
PROFESSIONAL EXPERIENCE:
- Associate Professor, Department of Electronics and Computer Engineering, Pulchowk Campus, Institute of Engineering, Pulchowk, Lalitpur, Nepal
- Program Coordinator, M.Sc. in ICE, Department of Electronics and Computer Engineering, Pulchowk Campus, Institute of Engineering, Pulchowk, Lalitpur, Nepal
Embedded
Systems
Reference Books:
• David E. Simon, “An Embedded Software Primer”, Addison-Wesley, 2005
• Muhammad Ali Mazidi, “8051 Microcontroller and Embedded Systems”, Prentice Hall, 2006
• Frank Vahid, Tony Givargis, “Embedded System Design”, John Wiley & Sons, 2008
• Douglas L. Perry, “VHDL Programming by Example”, McGraw Hill, 2002
• Shibu K V, “Introduction to Embedded Systems”, McGraw Hill, 2009
Unit    Hours   Mark distribution
1       3       4
2       4       8
3       6       8
4       5       8
5       6       8
6       8       12
7       3       8
8       3       8
9       3       8
10      4       8
Total   45      80 (number of questions: 10)
1. Introduction to Embedded System [3 Hrs]
1.1 Embedded Systems overview
1.2 Classification of Embedded Systems
1.3 Hardware and Software in a system
1.4 Purpose and Application of Embedded Systems
1.1 Embedded Systems overview
An Embedded System
• is an electronic/electromechanical system designed to perform a specific function and is a combination of both hardware and firmware (software).
• is a system built to perform its duty, completely or partially independent of human intervention.
• is specially designed to perform a few tasks in the most efficient way.
• interacts with physical elements in our environment, e.g. controlling and driving a motor, sensing temperature, …
Embedded systems overview
• Computing systems are everywhere

• Most of us think of “desktop” computers
– PC’s
– Laptops
– Mainframes
– Servers
• But there’s another type of computing system
– Far more common...
• Embedded computing systems
– Computing systems embedded within electronic devices
– Hard to define. Nearly any computing system other than a desktop computer
– Billions of units produced yearly, versus millions of desktop units
[Figure: everyday electronic devices with the callouts “Computers are in here...”, “and here...”, “and even here...” and “Lots more of these, though they cost a lot less each.”]
General Purpose Computing System vs. Embedded System
General Purpose Computing System:
• A system which is a combination of generic hardware and a General Purpose Operating System for executing a variety of applications
• Contains a General Purpose Operating System (GPOS)
• Applications are alterable (programmable) by the user (it is possible for the end user to re-install the operating system, and also add or remove user applications)
Embedded System:
• A system which is a combination of special purpose hardware and embedded OS for executing a specific set of applications
• May or may not contain an operating system for functioning
• The firmware of the embedded system is pre-programmed and it is non-alterable by the end-user (there may be exceptions for systems supporting OS kernel image flashing through special hardware settings)
General Purpose Computing System vs. Embedded System …
General Purpose Computing System:
• Performance is the key deciding factor in the selection of the system. Always, ‘Faster is Better’
• Less/not at all tailored towards reduced operating power requirements, options for different levels of power management
• Response requirements are not time-critical
• Need not be deterministic in execution behavior
Embedded System:
• Application-specific requirements (like performance, power requirements, memory usage, etc.) are the key deciding factors
• Highly tailored to take advantage of the power saving modes supported by the hardware and the operating system
• For certain categories of ESs like mission critical systems, the response time requirement is highly critical
• Execution behavior is deterministic for certain types of ESs like ‘Hard Real Time’ systems
A “short list” of embedded systems
•Anti-lock brakes •Modems
•Auto-focus cameras •MPEG decoders
•Automatic teller machines •Network cards
•Automatic toll systems •Network switches/routers
•Automatic transmission •Pagers
•Avionic systems •Photocopiers
•Battery chargers •Point-of-sale systems
•Camcorders •Portable video games
•Cell phones •Printers
•Cell-phone base stations •Satellite phones
•Cordless phones •Scanners
•Cruise control •Smart ovens/dishwashers
•Digital cameras •Speech recognizers
•Disk drives •Stereo systems
•Electronic card readers •Teleconferencing systems
•Electronic instruments •Televisions
•Electronic toys/games •Temperature controllers
•Factory control •Theft tracking systems
•Fax machines •TV set-top boxes
•Fingerprint identifiers •VCR’s, DVD players
•Home security systems •Video game consoles
•Life-support systems •Video phones
•Medical testing systems •Washers and dryers
And the list goes on and on … … …
Some common characteristics of ESs
• Single-functioned
– Executes a single program, repeatedly
• Tightly-constrained
– Low cost, low power, small, fast, etc.
• Reactive and real-time
– Continually reacts to changes in the system’s
environment
– Must compute certain results in real-time without
delay
An embedded system example –
a digital camera
[Figure: digital camera chip block diagram: lens, CCD, A2D, CCD preprocessor, pixel coprocessor, D2A, JPEG codec, microcontroller, multiplier/accumulator, DMA controller, display controller, memory controller, ISA bus interface, UART, LCD controller]
• Single-functioned -- always a digital camera
• Tightly-constrained -- Low cost, low power, small, fast
• Reactive and real-time -- only to a small extent
1.2 Classification of Embedded Systems
Based on different criteria:
1. Based on generation
2. Complexity and performance
requirements
3. Based on deterministic behaviour
4. Based on triggering
Classification based on Generation:
• First Generation: ESs were built around 8-bit microprocessors like the 8085 and Z80, and 4-bit microcontrollers. They were simple in hardware circuits, with firmware developed in assembly code. e.g. telephone keypads, stepper motor control units.
• Second Generation: ESs were built around 16-bit microprocessors and 8- or 16-bit microcontrollers, following the first generation ESs. The instruction sets of the second generation processors/controllers were much more complex and powerful than those of the first generation. Some second generation ESs contained embedded operating systems for their operation. e.g. Data Acquisition Systems, SCADA (Supervisory Control And Data Acquisition) systems.
Classification based on Generation: …
• Third Generation: With advances in processor technology, ES developers started making use of powerful 32-bit processors and 16-bit microcontrollers for their designs. e.g. DSPs, Application Specific Integrated Circuits (ASICs), and processors like the Intel Pentium and Motorola 68K.
• Fourth Generation: The advent of System on Chip (SoC), reconfigurable processors and multicore processors is bringing high performance, tight integration and miniaturization into the embedded device market. The SoC technique implements a total system on a chip by integrating different functionalities with a processor core on an IC. e.g. smart phone devices, mobile internet devices.
Classification based on Complexity & Performance:
• Small-Scale ESs: ESs which are simple in application needs and whose performance requirements are not time critical fall under this category, e.g. an electronic toy. They are usually built around low-performance, low-cost 8- or 16-bit microprocessors/microcontrollers.
• Medium-Scale ESs: ESs which are slightly complex in hardware and firmware requirements fall under this category. They are usually built around low-cost 16- or 32-bit microprocessors/microcontrollers or DSPs. They usually contain an embedded operating system (either a general purpose or a real time operating system) for functioning.
Classification based on Complexity & Performance: …
• Large-Scale ESs/Complex Systems: ESs which have highly complex hardware and firmware requirements fall under this category. They are employed in mission critical applications demanding high performance. Such systems are commonly built around high-performance 32- or 64-bit RISC processors/controllers, Reconfigurable System on Chip (RSoC), multi-core processors or programmable logic devices. They may contain multiple processors/controllers and co-units/hardware accelerators for offloading processing requirements from the main processor of the system, e.g. decoding/encoding of media, cryptographic function implementation. They usually use an RTOS for task scheduling, prioritization and management.
1.4 Major Applications of Embedded Systems
1. Consumer electronics: camcorders, cameras
2. Household appliances: TV, DVD players, washing machines
3. Home automation and security systems: air conditioners, CCTV, fire alarms
4. Automotive industry: engine control, ignition system, navigation
5. Telecom: cell phones, telephone switches, handset multimedia applications
6. Computer peripherals: printers, scanners, fax machines
7. Computer networking systems: network routers, switches, hubs
8. Healthcare: different kinds of scanners, EEG, ECG machines
9. Measurement & instrumentation: digital multi-meters, CROs
10. Banking & retail: ATM, currency counters, point of sale (POS)
11. Card readers: barcode readers, smart card readers, hand held devices
1.4 Purpose of Embedded Systems
1. Data Collection/Storage/Representation
2. Data Communication
3. Data (signal) processing
4. Monitoring
5. Control
6. Application specific user interface
Design challenge – optimizing design metrics
• Obvious design goal:
– Construct an implementation with desired
functionality
• Key design challenge:
– Simultaneously optimize numerous design
metrics
• Design metric
– A measurable feature of a system’s
implementation
– Optimizing design metrics is a key challenge
• Common metrics
– Unit cost: the monetary cost of manufacturing each copy of the system, excluding NRE cost
– NRE cost (Non-Recurring Engineering cost): the one-time monetary cost of designing the system
– Size: the physical space required by the system
– Performance: the execution time or throughput of the system
– Power: the amount of power consumed by the system
– Flexibility: the ability to change the functionality of the system without incurring heavy NRE cost
• Common metrics (continued)
– Time-to-prototype: the time needed to build a working version of the system
– Time-to-market: the time required to develop a system to the point that it can be released and sold to customers
– Maintainability: the ability to modify the system after its initial release
– Correctness, safety, many more
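The difference between unit cost and NRE cost can be made concrete with a small calculation. The sketch below assumes the usual amortization model (total cost = NRE cost + unit cost × number of units), which the notes do not state explicitly; the function names and the figures in the usage note are illustrative only.

```c
/* Total cost of producing `units` copies of a system:
 * the one-time NRE cost plus the unit cost of every copy.
 * (Assumed model: total = NRE + unit_cost * units.) */
double total_cost(double nre_cost, double unit_cost, unsigned long units)
{
    return nre_cost + unit_cost * (double)units;
}

/* Effective cost of each product once the NRE cost is spread
 * over all units. The caller must pass units > 0. */
double per_product_cost(double nre_cost, double unit_cost, unsigned long units)
{
    return total_cost(nre_cost, unit_cost, units) / (double)units;
}
```

For example, with a hypothetical NRE cost of 2000 and a unit cost of 100, ten units cost 3000 in total, i.e. 300 per product; the NRE share per product shrinks as the volume grows.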
Design metric competition -- improving one may worsen others
• Expertise with both software and hardware is needed to optimize design metrics
– Not just a hardware or software expert, as is common
– A designer must be comfortable with various technologies in order to choose the best for a given application and constraints
[Figure: competing design metrics (power, performance, size, NRE cost) and the digital camera chip block diagram, divided into hardware and software]
UART: Universal Asynchronous Receiver Transmitter
Embedded Systems
Chapter -2
Hardware Design Issues

2. Hardware Design Issues [4 Hrs]
2.1 Combinational Logic
2.2 Sequential Logic
2.3 Custom Single-Purpose Processor Design
2.4 Optimizing Custom Single-Purpose Processors
Introduction
• Processor
– Digital circuit that performs computation tasks
– Controller and datapath
– General-purpose: variety of computation tasks
– Single-purpose: one particular computation task
– Custom single-purpose: non-standard task
• A custom single-purpose processor may be
– Fast, small, low power
– But, high NRE, longer time-to-market, less flexible
[Figure: digital camera chip block diagram (CCD, A2D, CCD preprocessor, pixel coprocessor, D2A, lens, DMA controller, display ctrl, memory controller, ISA bus interface, UART, LCD ctrl)]
CMOS transistor on silicon
• Transistor
– The basic electrical component in digital systems
– Acts as an on/off switch
– Voltage at “gate” controls whether current flows from source to drain
– Don’t confuse this “gate” with a logic gate
[Figure: transistor symbol (source, gate, drain; conducts if gate=1) and cross-section of a CMOS transistor on silicon (IC package, IC, gate, oxide, source, channel, drain, silicon substrate)]
CMOS transistor implementations
• Complementary Metal Oxide Semiconductor
• We refer to logic levels
– Typically 0 is 0V, 1 is 5V
• Two basic CMOS types
– nMOS conducts if gate=1
– pMOS conducts if gate=0
– Hence “complementary”
• Basic gates
– Inverter: F = x'
– NAND gate: F = (xy)'
– NOR gate: F = (x+y)'
[Figure: nMOS and pMOS transistor symbols, and transistor-level circuits of the inverter, NAND gate and NOR gate]
Basic logic gates
One-input gates:
x    Driver F = x    Inverter F = x'
0    0               1
1    1               0
Two-input gates:
x y    AND F = xy    OR F = x+y    XOR F = x ⊕ y    NAND F = (xy)'    NOR F = (x+y)'    XNOR F = (x ⊕ y)'
0 0    0             0             0                1                 1                 1
0 1    0             1             1                1                 0                 0
1 0    0             1             1                1                 0                 0
1 1    1             1             0                0                 0                 1
Combinational logic design
A) Problem description
y is 1 if a is 1, or b and c are 1. z is 1 if b or c is 1, but not both, or if all are 1.
B) Truth table
Inputs    Outputs
a b c     y z
0 0 0     0 0
0 0 1     0 1
0 1 0     0 1
0 1 1     1 0
1 0 0     1 0
1 0 1     1 1
1 1 0     1 1
1 1 1     1 1
C) Output equations
y = a'bc + ab'c' + ab'c + abc' + abc
z = a'b'c + a'bc' + ab'c + abc' + abc
D) Minimized output equations
K-map for y:
       bc
a      00 01 11 10
0      0  0  1  0
1      1  1  1  1
y = a + bc
K-map for z:
       bc
a      00 01 11 10
0      0  1  0  1
1      0  1  1  1
z = ab + b'c + bc'
E) Logic Gates
[Figure: gate-level circuit with inputs a, b, c and outputs y, z]
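The minimization step can be checked exhaustively, since three inputs give only eight rows. The following is a minimal sketch in C that compares the sum-of-minterms equations from step C with the minimized equations from step D; the function names are illustrative.

```c
#include <stdbool.h>

/* Unminimized output equations, one product term per truth-table row. */
bool y_minterms(bool a, bool b, bool c)
{
    return (!a && b && c) || (a && !b && !c) || (a && !b && c) ||
           (a && b && !c) || (a && b && c);
}

bool z_minterms(bool a, bool b, bool c)
{
    return (!a && !b && c) || (!a && b && !c) || (a && !b && c) ||
           (a && b && !c) || (a && b && c);
}

/* Minimized output equations read off the K-maps. */
bool y_min(bool a, bool b, bool c) { return a || (b && c); }               /* y = a + bc        */
bool z_min(bool a, bool b, bool c) { return (a && b) || (!b && c) || (b && !c); } /* z = ab + b'c + bc' */

/* Returns true if both forms agree on all 8 input combinations. */
bool minimization_is_equivalent(void)
{
    for (int row = 0; row < 8; row++) {
        bool a = (row >> 2) & 1, b = (row >> 1) & 1, c = row & 1;
        if (y_min(a, b, c) != y_minterms(a, b, c)) return false;
        if (z_min(a, b, c) != z_minterms(a, b, c)) return false;
    }
    return true;
}
```

The same brute-force comparison works for any small combinational design; for many inputs a formal equivalence checker would be used instead.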
Combinational components
• n-bit, m x 1 Multiplexor: inputs I(m-1) … I1 I0 (n bits each), select S(log m) … S0, output O (n bits)
– O = I0 if S=0..00, I1 if S=0..01, …, I(m-1) if S=1..11
• log n x n Decoder: inputs I(log n -1) … I0, outputs O(n-1) … O1 O0
– O0 = 1 if I=0..00, O1 = 1 if I=0..01, …, O(n-1) = 1 if I=1..11
– With enable input e: all O’s are 0 if e=0
• n-bit Adder: inputs A, B (n bits each), outputs carry, sum
– sum = A+B (first n bits), carry = (n+1)’th bit of A+B
– With carry-in input Ci: sum = A + B + Ci
• n-bit Comparator: inputs A, B (n bits each), outputs less, equal, greater
– less = 1 if A<B, equal = 1 if A=B, greater = 1 if A>B
• n bit, m function ALU: inputs A, B (n bits each), select S(log m) … S0, output O (n bits)
– O = A op B, op determined by S
– May have status outputs carry, zero, etc.
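The behaviour of these components can be sketched as small C functions. This is a behavioural model under assumed sizes (n = 8 bits, a 4x1 multiplexor, a 2-to-4 decoder), not a hardware description; the type and function names are illustrative.

```c
#include <stdint.h>
#include <stdbool.h>

/* 8-bit, 4x1 multiplexor: O is the input selected by S (0..3). */
uint8_t mux4(const uint8_t in[4], unsigned s)
{
    return in[s & 3u];
}

/* 2-to-4 decoder with enable: output bit i is 1 only for input i;
 * all outputs are 0 if e = 0. */
uint8_t decoder2to4(unsigned i, bool e)
{
    return e ? (uint8_t)(1u << (i & 3u)) : 0;
}

/* 8-bit adder with carry-in: sum is the low 8 bits of A + B + Ci,
 * carry is the 9th bit. */
typedef struct { uint8_t sum; bool carry; } AddResult;

AddResult adder8(uint8_t a, uint8_t b, bool ci)
{
    unsigned full = (unsigned)a + (unsigned)b + (ci ? 1u : 0u);
    AddResult r = { (uint8_t)(full & 0xFFu), ((full >> 8) & 1u) != 0 };
    return r;
}

/* 8-bit comparator: exactly one of the three outputs is 1. */
typedef struct { bool less, equal, greater; } CmpResult;

CmpResult comparator8(uint8_t a, uint8_t b)
{
    CmpResult r = { a < b, a == b, a > b };
    return r;
}
```

For instance, `adder8(200, 100, true)` overflows 8 bits (301), so the model reports a sum of 45 with the carry set.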
Sequential components
• n-bit Register: inputs I (n bits), load, clear; output Q (n bits)
– Q = 0 if clear=1, I if load=1 and clock=1, Q(previous) otherwise.
• n-bit Shift register: inputs I, shift; output Q
– Q = lsb
– Content shifted
– I stored in msb
• n-bit Counter: output Q (n bits)
– Q = 0 if clear=1, Q(prev)+1 if count=1 and clock=1.
Sequential logic design
A) Problem Description
You want to construct a clock divider. Slow down your pre-existing clock so that you output a 1 for every four clock cycles.
B) State Diagram
States 0, 1, 2 and 3. The output is x=0 in states 0, 1 and 2 and x=1 in state 3. In every state, a=1 moves to the next state (state 3 returns to state 0) and a=0 stays in the current state.
C) Implementation Model
[Figure: combinational logic with inputs a, Q1, Q0 and outputs x, I1, I0; a state register holds Q1 Q0 and is loaded from I1 I0]
D) State Table (Moore-type)
Inputs       Outputs
Q1 Q0 a      I1 I0 x
0  0  0      0  0  0
0  0  1      0  1  0
0  1  0      0  1  0
0  1  1      1  0  0
1  0  0      1  0  0
1  0  1      1  1  0
1  1  0      1  1  1
1  1  1      0  0  1
• Given this implementation model
– Sequential logic design quickly reduces to combinational logic design
Sequential logic design (cont.)
E) Minimized Output Equations
K-map for I1:
       Q1Q0
a      00 01 11 10
0      0  0  1  1
1      0  1  0  1
I1 = Q1’Q0a + Q1a’ + Q1Q0’
K-map for I0:
       Q1Q0
a      00 01 11 10
0      0  1  1  0
1      1  0  0  1
I0 = Q0a’ + Q0’a
K-map for x:
       Q1Q0
a      00 01 11 10
0      0  0  1  0
1      0  0  1  0
x = Q1Q0
F) Combinational Logic
[Figure: gate-level circuit with inputs a, Q1, Q0 and outputs x, I1, I0]
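The clock divider can be simulated directly from the minimized equations: on each clock tick the combinational logic computes x, I1 and I0 from the current state, and the state register then loads I1 I0. The sketch below is a software model of that implementation model; the struct and function names are illustrative.

```c
#include <stdbool.h>

/* Contents of the 2-bit state register. */
typedef struct { bool q1, q0; } DividerState;

/* One clock tick: returns the Moore output x of the current state,
 * then loads the next state computed by the combinational logic. */
bool divider_tick(DividerState *s, bool a)
{
    bool q1 = s->q1, q0 = s->q0;
    bool x  = q1 && q0;                                       /* x  = Q1Q0               */
    bool i1 = (!q1 && q0 && a) || (q1 && !a) || (q1 && !q0);  /* I1 = Q1'Q0a + Q1a' + Q1Q0' */
    bool i0 = (q0 && !a) || (!q0 && a);                       /* I0 = Q0a' + Q0'a        */
    s->q1 = i1;
    s->q0 = i0;
    return x;
}

/* Counts how many of n_ticks clock cycles output x = 1 when the
 * input a is held at 1, starting from state 0. */
int count_pulses(int n_ticks)
{
    DividerState s = { false, false };
    int pulses = 0;
    for (int i = 0; i < n_ticks; i++)
        if (divider_tick(&s, true))
            pulses++;
    return pulses;
}
```

With a held at 1 the model outputs x = 1 on every fourth tick, so eight ticks give two pulses; with a held at 0 the state never changes.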
Custom single-purpose processor basic model
[Figure: controller and datapath. The controller receives external control inputs and datapath control inputs and produces external control outputs and datapath control outputs; the datapath receives external data inputs and produces external data outputs.]
[Figure: a view inside the controller and datapath. The controller contains next-state and control logic and a state register; the datapath contains registers and functional units.]
Example: Greatest Common Divisor
• First create algorithm
• Convert algorithm to “complex” state machine
– Known as FSMD: finite-state machine with datapath
– Can use templates to perform such conversion
(a) black-box view: inputs go_i, x_i, y_i; output d_o
(b) desired functionality
0: int x, y;
1: while (1) {
2:   while (!go_i);
3:   x = x_i;
4:   y = y_i;
5:   while (x != y) {
6:     if (x < y)
7:       y = y - x;
       else
8:       x = x - y;
     }
9:   d_o = x;
   }
(c) state diagram
1:   on 1 go to 2; on !1 exit
2:   on !go_i go to 2-J; on !(!go_i) go to 3
2-J: go to 2
3:   x = x_i; go to 4
4:   y = y_i; go to 5
5:   on x!=y go to 6; on !(x!=y) go to 9
6:   on x<y go to 7; on !(x<y) go to 8
7:   y = y - x; go to 6-J
8:   x = x - y; go to 6-J
6-J: go to 5-J
5-J: go to 5
9:   d_o = x; go to 1-J
1-J: go to 1
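The GCD state diagram can be mimicked in software with one `switch` case per FSMD state. The sketch below is a simulation aid, not the hardware itself: it assumes go_i is always asserted, stops after one pass (at state 1-J) instead of looping forever, and requires x_i and y_i to be greater than 0 so that the subtraction loop terminates. The enum and function names are illustrative.

```c
/* One enumerator per state of the GCD FSMD, including the join states. */
typedef enum { S1, S2, S2J, S3, S4, S5, S6, S7, S8, S6J, S5J, S9, S1J } GcdState;

/* Steps through the FSMD once and returns d_o.
 * If states_visited is non-null it receives the number of states visited,
 * i.e. the clock cycles at one state per cycle. Requires x_i, y_i > 0. */
unsigned gcd_fsmd(unsigned x_i, unsigned y_i, unsigned *states_visited)
{
    unsigned x = 0, y = 0, d_o = 0, n = 0;
    const int go_i = 1;                 /* assumption: go is already asserted */
    GcdState s = S1;

    for (;;) {
        n++;
        switch (s) {
        case S1:  s = S2;                    break;  /* while (1): condition is constant */
        case S2:  s = go_i ? S3 : S2J;       break;  /* while (!go_i);                   */
        case S2J: s = S2;                    break;
        case S3:  x = x_i; s = S4;           break;
        case S4:  y = y_i; s = S5;           break;
        case S5:  s = (x != y) ? S6 : S9;    break;  /* while (x != y)                   */
        case S6:  s = (x < y) ? S7 : S8;     break;  /* if (x < y)                       */
        case S7:  y = y - x; s = S6J;        break;
        case S8:  x = x - y; s = S6J;        break;
        case S6J: s = S5J;                   break;
        case S5J: s = S5;                    break;
        case S9:  d_o = x; s = S1J;          break;
        case S1J:
            if (states_visited) *states_visited = n;
            return d_o;                              /* hardware would return to state 1 */
        }
    }
}
```

Calling `gcd_fsmd(42, 8, &n)` returns 2; the value left in `n` shows how many states the unoptimized FSMD steps through, which is what the later optimization slides reduce.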
State diagram templates
Creating the datapath
• Create a register for any declared variable
• Create a functional unit for each arithmetic operation
• Connect the ports, registers and functional units
– Based on reads and writes
– Use multiplexors for multiple sources
• Create unique identifier
– for each datapath component control input and output
[Figure: FSMD state diagram of the GCD example (states 1, 2, 2-J, 3, 4, 5, 6, 7, 8, 6-J, 5-J, 9, 1-J)]
Creating the controller’s FSM
• Same structure as FSMD
• Replace complex actions/conditions with datapath configurations
Controller states (encoding, state, FSMD action, controller outputs or tested condition):
0000  1:
0001  2:    tests go_i
0010  2-J:
0011  3:    x = x_i       x_sel = 0, x_ld = 1
0100  4:    y = y_i       y_sel = 0, y_ld = 1
0101  5:    x != y        tests x_neq_y
0110  6:    x < y         tests x_lt_y
0111  7:    y = y - x     y_sel = 1, y_ld = 1
1000  8:    x = x - y     x_sel = 1, x_ld = 1
1001  6-J:
1010  5-J:
1011  9:    d_o = x       d_ld = 1
1100  1-J:
Datapath: inputs x_i and y_i feed two n-bit 2x1 multiplexors (selected by x_sel and y_sel) in front of registers x and y (loaded by x_ld and y_ld). A != comparator (state 5) produces x_neq_y, a < comparator (state 6) produces x_lt_y, one subtractor computes x-y (state 8) and another computes y-x (state 7). Register d (loaded by d_ld in state 9) drives d_o.
Splitting into a controller and datapath
Controller implementation model
[Figure: combinational logic with inputs go_i, x_neq_y, x_lt_y and the current state Q3 Q2 Q1 Q0, producing the outputs x_sel, y_sel, x_ld, y_ld, d_ld and the next state I3 I2 I1 I0, which is loaded into the state register]
[Figure: (a) controller FSM with states 0000 (1:) to 1100 (1-J:), transitions on go_i, x_neq_y and x_lt_y; (b) datapath with two n-bit 2x1 multiplexors, registers x, y and d, the != and < comparators and two subtractors]
Controller state table for the Greatest
Common Divisor (GCD) example
Inputs                                Outputs
Q3 Q2 Q1 Q0 x_neq_y x_lt_y go_i       I3 I2 I1 I0 x_sel y_sel x_ld y_ld d_ld
0 0 0 0 * * * 0 0 0 1 X X 0 0 0
0 0 0 1 * * 0 0 0 1 0 X X 0 0 0
0 0 0 1 * * 1 0 0 1 1 X X 0 0 0
0 0 1 0 * * * 0 0 0 1 X X 0 0 0
0 0 1 1 * * * 0 1 0 0 0 X 1 0 0
0 1 0 0 * * * 0 1 0 1 X 0 0 1 0
0 1 0 1 0 * * 1 0 1 1 X X 0 0 0
0 1 0 1 1 * * 0 1 1 0 X X 0 0 0
0 1 1 0 * 0 * 1 0 0 0 X X 0 0 0
0 1 1 0 * 1 * 0 1 1 1 X X 0 0 0
0 1 1 1 * * * 1 0 0 1 X 1 0 1 0
1 0 0 0 * * * 1 0 0 1 1 X 1 0 0
1 0 0 1 * * * 1 0 1 0 X X 0 0 0
1 0 1 0 * * * 0 1 0 1 X X 0 0 0
1 0 1 1 * * * 1 1 0 0 X X 0 0 1
1 1 0 0 * * * 0 0 0 0 X X 0 0 0
1 1 0 1 * * * 0 0 0 0 X X 0 0 0
1 1 1 0 * * * 0 0 0 0 X X 0 0 0
1 1 1 1 * * * 0 0 0 0 X X 0 0 0
Completing the GCD custom single-purpose processor design
• We finished the datapath
• We have a state table for the next state and control logic
– All that’s left is combinational logic design
• This is not an optimized design, but we see the basic steps
[Figure: a view inside the controller and datapath (next-state and control logic, state register, registers, functional units)]
RT-level custom single-purpose processor design
• We often start with a state machine
– Rather than algorithm
– Cycle timing often too central to functionality
• Example
– Bus bridge that converts 4-bit bus to 8-bit bus
– Start with FSMD
– Known as register-transfer (RT) level
– Exercise: complete the design
Problem Specification
A single-purpose processor that converts two 4-bit inputs, arriving one at a time over data_in along with a rdy_in pulse, into one 8-bit output on data_out along with a rdy_out pulse.
[Figure: Sender, Bridge and Receiver connected by clock, rdy_in, data_in(4), rdy_out and data_out(8)]
FSMD (Bridge)
Inputs: rdy_in: bit; data_in: bit[4];
Outputs: rdy_out: bit; data_out: bit[8]
Variables: data_lo, data_hi: bit[4];
States:
WaitFirst4:      rdy_in=0 stay; rdy_in=1 go to RecFirst4Start
RecFirst4Start:  data_lo=data_in; go to RecFirst4End
RecFirst4End:    rdy_in=1 stay; rdy_in=0 go to WaitSecond4
WaitSecond4:     rdy_in=0 stay; rdy_in=1 go to RecSecond4Start
RecSecond4Start: data_hi=data_in; go to RecSecond4End
RecSecond4End:   rdy_in=1 stay; rdy_in=0 go to Send8Start
Send8Start:      data_out=data_hi & data_lo, rdy_out=1; go to Send8End
Send8End:        rdy_out=0; go to WaitFirst4
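RT-level custom single-purpose processor design (cont’)
(a) Controller (Bridge)
WaitFirst4:      rdy_in=0 stay; rdy_in=1 go to RecFirst4Start
RecFirst4Start:  data_lo_ld=1; go to RecFirst4End
RecFirst4End:    rdy_in=1 stay; rdy_in=0 go to WaitSecond4
WaitSecond4:     rdy_in=0 stay; rdy_in=1 go to RecSecond4Start
RecSecond4Start: data_hi_ld=1; go to RecSecond4End
RecSecond4End:   rdy_in=1 stay; rdy_in=0 go to Send8Start
Send8Start:      data_out_ld=1, rdy_out=1; go to Send8End
Send8End:        rdy_out=0; go to WaitFirst4
(b) Datapath
[Figure: data_in(4) feeds registers data_hi and data_lo (loaded by data_hi_ld and data_lo_ld); their outputs together feed the register data_out (loaded by data_out_ld); clk goes to all registers; rdy_in and rdy_out connect to the controller]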
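The bridge FSMD can be exercised with a small cycle-by-cycle model. The sketch below makes a few modelling assumptions that are not in the slides: each call is one clock cycle, a state's action is performed in the cycle in which that state is active, and `data_hi & data_lo` is read as concatenation (high nibble, then low nibble). The type and function names are illustrative.

```c
#include <stdint.h>
#include <stdbool.h>

typedef enum {
    WAIT_FIRST4, REC_FIRST4_START, REC_FIRST4_END,
    WAIT_SECOND4, REC_SECOND4_START, REC_SECOND4_END,
    SEND8_START, SEND8_END
} BridgeState;

typedef struct {
    BridgeState state;
    uint8_t data_lo, data_hi;   /* 4-bit variables of the FSMD */
    uint8_t data_out;           /* 8-bit output                */
    bool rdy_out;
} Bridge;

void bridge_init(Bridge *b)
{
    b->state = WAIT_FIRST4;
    b->data_lo = b->data_hi = b->data_out = 0;
    b->rdy_out = false;
}

/* One clock cycle: perform the current state's action, then choose
 * the next state from rdy_in. */
void bridge_clock(Bridge *b, bool rdy_in, uint8_t data_in)
{
    switch (b->state) {
    case WAIT_FIRST4:
        b->state = rdy_in ? REC_FIRST4_START : WAIT_FIRST4;   break;
    case REC_FIRST4_START:
        b->data_lo = data_in & 0x0F; b->state = REC_FIRST4_END; break;
    case REC_FIRST4_END:
        b->state = rdy_in ? REC_FIRST4_END : WAIT_SECOND4;    break;
    case WAIT_SECOND4:
        b->state = rdy_in ? REC_SECOND4_START : WAIT_SECOND4; break;
    case REC_SECOND4_START:
        b->data_hi = data_in & 0x0F; b->state = REC_SECOND4_END; break;
    case REC_SECOND4_END:
        b->state = rdy_in ? REC_SECOND4_END : SEND8_START;    break;
    case SEND8_START:
        b->data_out = (uint8_t)((b->data_hi << 4) | b->data_lo);
        b->rdy_out = true;
        b->state = SEND8_END;                                 break;
    case SEND8_END:
        b->rdy_out = false;
        b->state = WAIT_FIRST4;                               break;
    }
}
```

In this model the sender has to hold rdy_in and data_in for at least two cycles, because the nibble is sampled in the Start state one cycle after rdy_in is first seen. Sending 0xA and then 0x5 this way produces data_out = 0x5A with rdy_out high for one cycle.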
Optimizing single-purpose processors
• Optimization is the task of making
design metric values the best
possible
• Optimization opportunities
–original program
–FSMD
–datapath
–FSM
Optimizing the original program
• Analyze program attributes and look

for areas of possible improvement
–number of computations
–size of variable
–time and space complexity
–operations used
• multiplication and division very expensive
Optimizing the original program (cont’)
original program
0: int x, y;
1: while (1) {
2:   while (!go_i);
3:   x = x_i;
4:   y = y_i;
5:   while (x != y) {
6:     if (x < y)
7:       y = y - x;
       else
8:       x = x - y;
     }
9:   d_o = x;
   }
optimized program
0: int x, y, r;
1: while (1) {
2:   while (!go_i);
     // x must be the larger number
3:   if (x_i >= y_i) {
4:     x=x_i;
5:     y=y_i;
     }
6:   else {
7:     x=y_i;
8:     y=x_i;
     }
9:   while (y != 0) {
10:    r = x % y;
11:    x = y;
12:    y = r;
     }
13:  d_o = x;
   }
Replace the subtraction operation(s) with modulo operation in order to speed up program.
Original program: GCD(42, 8) - 9 iterations to complete the loop. x and y values evaluated as follows: (42, 8), (34, 8), (26, 8), (18, 8), (10, 8), (2, 8), (2, 6), (2, 4), (2, 2).
Optimized program: GCD(42, 8) - 3 iterations to complete the loop. x and y values evaluated as follows: (42, 8), (8, 2), (2, 0).
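The iteration counts quoted for GCD(42, 8) can be reproduced by instrumenting both versions. In the sketch below the counter follows the slide's convention of counting the (x, y) pairs evaluated by the loop test, so it starts at 1 for the initial pair; the function names and the `pairs` out-parameter are illustrative, and both inputs are assumed to be greater than 0.

```c
/* Subtraction-based GCD (original program).
 * *pairs receives the number of (x, y) pairs the loop test evaluates. */
unsigned gcd_subtract(unsigned x, unsigned y, unsigned *pairs)
{
    unsigned n = 1;                 /* the initial pair */
    while (x != y) {
        if (x < y)
            y = y - x;
        else
            x = x - y;
        n++;
    }
    *pairs = n;
    return x;
}

/* Modulo-based GCD (optimized program). */
unsigned gcd_modulo(unsigned x_i, unsigned y_i, unsigned *pairs)
{
    unsigned x, y, r, n = 1;
    /* x must be the larger number */
    if (x_i >= y_i) { x = x_i; y = y_i; }
    else            { x = y_i; y = x_i; }
    while (y != 0) {
        r = x % y;
        x = y;
        y = r;
        n++;
    }
    *pairs = n;
    return x;
}
```

For (42, 8) both functions return 2, with 9 pairs evaluated by the subtraction version and 3 by the modulo version. Whether this is a net gain in hardware depends on the cost of the modulo unit, which the model does not capture.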
Optimizing the FSMD
• Areas of possible improvements
– merge states
• states with constants on transitions can be
eliminated, transition taken is already known
• states with independent operations can be merged
– separate states
• states which require complex operations
(a*b*c*d) can be broken into smaller states to
reduce hardware size
– scheduling
Optimizing the FSMD (cont.)
original FSMD: int x, y; states 1, 2, 2-J, 3, 4, 5, 6, 7, 8, 6-J, 5-J, 9, 1-J, as in the GCD example
Optimization steps:
• eliminate state 1 – transitions have constant values
• merge state 2 and state 2J – no loop operation in between them
• merge state 3 and state 4 – assignment operations are independent of one another
• merge state 5 and state 6 – transitions from state 6 can be done in state 5
• eliminate state 5J and 6J – transitions from each state can be done from state 7 and state 8, respectively
• eliminate state 1-J – transition from state 1-J can be done directly from state 9
optimized FSMD: int x, y;
2: on !go_i stay; on go_i go to 3
3: x = x_i, y = y_i; go to 5
5: on x<y go to 7; on x>y go to 8; otherwise go to 9
7: y = y - x; go to 5
8: x = x - y; go to 5
9: d_o = x; go to 2
Optimizing the datapath
• Sharing of functional units
– one-to-one mapping, as done previously, is not necessary
– if the same operation occurs in different states, they can share a single functional unit
• Multi-functional units
– ALUs support a variety of operations, so an ALU can be shared among operations occurring in different states
Optimizing the FSM
• State encoding
– task of assigning a unique bit pattern to each
state in an FSM
– size of state register and combinational logic vary
– can be treated as an ordering problem
• State minimization
– task of merging equivalent states into a single state
• two states are equivalent if, for all possible input combinations, they generate the same outputs and transition to the same next state
Embedded Systems
Chapter -3
Software design issues

3. Software Design Issues [6 Hrs.]
3.1 Basic Architecture

3.2 Operation
3.3 Programmer’s View
3.4 Development Environment
3.5 Application-Specific Instruction-Set
Processors
3.6 Selecting a Microprocessor
3.7 General-Purpose Processor Design
Introduction
• General-Purpose Processor
– Processor designed for a variety of computation
tasks
– Low unit cost, in part because manufacturer spreads
NRE over large numbers of units
• Motorola sold half a billion 68HC05 microcontrollers in
1996 alone
– Carefully designed since higher NRE is acceptable
• Can yield good performance, size and power
– Low NRE cost, short time-to-market/prototype, high
flexibility
• User just writes software; no processor design
– “microprocessor” – “micro” used when they were
implemented on one or a few chips rather than
entire rooms
Basic Architecture
• Control unit and datapath
– Similarity to single-purpose processor
• Key differences
– Datapath is general
– Control unit doesn’t store the algorithm – the algorithm is “programmed” into the memory
[Figure: processor with a control unit (controller, PC, IR) and a datapath (ALU, control/status, registers), connected to I/O and memory]
Datapath Operations
• Load
– Read memory location into register
• ALU operation
– Input certain registers through ALU, store back in register
• Store
– Write register to memory location
[Figure: processor block diagram showing the value 10 loaded from memory into a register, incremented by the ALU (+1) to 11, and 11 stored back to memory]
Control Unit
• Control unit: configures the datapath operations
– Sequence of desired operations (“instructions”) stored in memory – “program”
• Instruction cycle – broken into several sub-operations, each one clock cycle, e.g.:
– Fetch: Get next instruction into IR (Instruction Register)
– Decode: Determine what the instruction means
– Fetch operands: Move data from memory to datapath register
– Execute: Move data through the ALU
– Store results: Write data from register to memory
[Figure: processor block diagram with PC, IR, R0, R1; memory holds the program (100: load R0, M[500]; 101: inc R1, R0; 102: store M[501], R1) and the data (500: 10; 501: ...)]
Control Unit Sub-Operations
• Fetch
– Get next instruction into IR
– PC: program counter, always points to next instruction
– IR: holds the fetched instruction
• Decode
– Determine what the instruction means
• Fetch operands
– Move data from memory to datapath register
• Execute
– Move data through the ALU
– This particular instruction does nothing during this sub-operation
• Store results
– Write data from register to memory
– This particular instruction does nothing during this sub-operation
[Figure, repeated for each sub-operation: processor with PC = 100 and IR = load R0, M[500]; memory holds the program (100: load R0, M[500]; 101: inc R1, R0; 102: store M[501], R1) and the data (500: 10; 501: ...); from the operand fetch onwards R0 holds 10]
Instruction Cycles
Each instruction goes through Fetch, Decode, Fetch ops, Exec. and Store results, one sub-operation per clock cycle.
– PC=100: load R0, M[500]; afterwards R0 = 10
– PC=101: inc R1, R0; the ALU adds 1, afterwards R1 = 11
– PC=102: store M[501], R1; afterwards M[501] = 11
[Figure, repeated for each instruction: clock timeline of the five sub-operations next to the processor block diagram, with the program at addresses 100-102 and the data at addresses 500-501]
Architectural Considerations
• N-bit processor
– N-bit ALU, registers, buses, memory data interface
– Embedded: 8-bit, 16-bit, 32-bit common
– Desktop/servers: 32-bit, even 64
• PC size determines address space
[Figure: processor block diagram]
Architectural Considerations
• Clock frequency
– Inverse of clock period
– The clock period must be longer than the longest register-to-register delay in the entire processor
– Memory access is often the longest
[Figure: processor block diagram]
Pipelining: Increasing Instruction Throughput
[Figure: non-pipelined vs. pipelined dish cleaning. Dishes 1-8 each go through Wash and then Dry; in the pipelined case one dish is washed while the previous one is dried.]
[Figure: pipelined instruction execution. Instructions 1-8 each go through Fetch-instr., Decode, Fetch ops., Execute and Store res., with the stages of consecutive instructions overlapping in time.]
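The benefit shown in the pipelining figures can be estimated with a simple count. The sketch below assumes an idealized pipeline in which every stage takes exactly one time unit and there are no stalls; these assumptions and the function names are illustrative and not part of the slides.

```c
/* Time units to process n items through k stages one item at a time:
 * each item occupies all k stages before the next one starts. */
unsigned long cycles_non_pipelined(unsigned long n, unsigned long k)
{
    return n * k;
}

/* Time units with an ideal pipeline: the first item takes k units,
 * and every following item finishes one unit after the previous one. */
unsigned long cycles_pipelined(unsigned long n, unsigned long k)
{
    return (n == 0) ? 0 : k + (n - 1);
}
```

For the 8 dishes with 2 stages (wash, dry) this gives 16 versus 9 time units; for 8 instructions with 5 stages it gives 40 versus 12.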
Superscalar and VLIW Architectures
• Performance can be improved by:
– Faster clock (but there’s a limit)
– Pipelining: slice up instruction into stages, overlap
stages
– Multiple ALUs to support more than one instruction
stream
• Superscalar
– Scalar: non-vector operations
– Fetches instructions in batches, executes as many as possible
» May require extensive hardware to detect independent
instructions
– VLIW (Very Long Instruction Word): each word in memory has
multiple independent instructions
» Relies on the compiler to detect and schedule instructions
» Currently growing in popularity
Two Memory Architectures
• Princeton
– Fewer memory wires
• Harvard
– Simultaneous program and data memory access
[Figure: Harvard: processor connected to a separate program memory and data memory; Princeton: processor connected to a single memory (program and data)]
Cache Memory
• Memory access may be slow
• Cache is small but fast memory close to processor
– Holds copy of part of memory
– Hits and misses
[Figure: processor, cache (fast/expensive technology, usually on the same chip) and memory (slower/cheaper technology, usually on a different chip)]
Programmer’s View
• Programmer doesn’t need detailed understanding of
architecture
– Instead, needs to know what instructions can be executed
• Two levels of instructions:

– Assembly level
– Structured languages (C, C++, Java, etc.)
• Most development today done using structured languages

– But, some assembly level programming may still be necessary
– Drivers: portion of program that communicates with and/or controls
(drives) another device
• Often have detailed timing considerations, extensive bit manipulation
• Assembly level may be best for these
Assembly-Level Instructions
Instruction 1 opcode operand1 operand2
...
• Instruction Set
– Defines the legal set of instructions for that processor
• Data transfer: memory/register, register/register, I/O, etc.
• Arithmetic/logical: move register through ALU and back
• Branches: determine next PC value when not just PC+1
A Simple (Trivial) Instruction Set
Assembly instruct.    First byte    Second byte    Operation
MOV Rn, direct        0000 Rn       direct         Rn = M(direct)
MOV direct, Rn        0001 Rn       direct         M(direct) = Rn
MOV @Rn, Rm           0010 Rn       Rm             M(Rn) = Rm
MOV Rn, #immed.       0011 Rn       immediate      Rn = immediate
ADD Rn, Rm            0100 Rn       Rm             Rn = Rn + Rm
SUB Rn, Rm            0101 Rn       Rm             Rn = Rn - Rm
JZ Rn, relative       0110 Rn       relative       PC = PC + relative (only if Rn is 0)
The first four bits of the first byte are the opcode; the remaining fields are the operands.
Addressing Modes
Addressing mode      Operand field       Register-file contents    Memory contents
Immediate            Data
Register-direct      Register address    Data
Register indirect    Register address    Memory address            Data
Direct               Memory address                                Data
Indirect             Memory address                                Memory address, which points to Data
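The five addressing modes differ only in how many lookups separate the operand field from the data. A minimal sketch in C, assuming a 16-entry register file and 256 bytes of memory (sizes and names are illustrative):

```c
#include <stdint.h>

typedef enum {
    IMMEDIATE, REGISTER_DIRECT, REGISTER_INDIRECT, DIRECT, INDIRECT
} AddrMode;

/* Resolves the operand field of an instruction to the data it refers to. */
uint8_t fetch_operand(AddrMode mode, uint8_t field,
                      const uint8_t reg[16], const uint8_t mem[256])
{
    switch (mode) {
    case IMMEDIATE:         return field;                    /* field is the data               */
    case REGISTER_DIRECT:   return reg[field & 0x0F];        /* register holds the data         */
    case REGISTER_INDIRECT: return mem[reg[field & 0x0F]];   /* register holds a memory address */
    case DIRECT:            return mem[field];               /* field is a memory address       */
    case INDIRECT:          return mem[mem[field]];          /* memory holds a memory address   */
    }
    return 0; /* not reached for valid modes */
}
```

With R2 = 40, M[40] = 7 and M[7] = 99, the operand field 2 yields 40 in register-direct mode and 7 in register-indirect mode, while the operand field 40 yields 7 in direct mode and 99 in indirect mode.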
Sample Programs
C program
int total = 0;
for (int i=10; i!=0; i--)
  total += i;
// next instructions...
Equivalent assembly program
      MOV R0, #0;     // total = 0
      MOV R1, #10;    // i = 10
      MOV R2, #1;     // constant 1
      MOV R3, #0;     // constant 0
Loop: JZ R1, Next;    // Done if i=0
      ADD R0, R1;     // total += i
      SUB R1, R2;     // i--
      JZ R3, Loop;    // Jump always
Next: // next instructions...
• Try some others

– Handshake: Wait until the value of M[254] is not 0, set
M[255] to 1, wait until M[254] is 0, set M[255] to 0
(assume those locations are ports).
– (Harder) Count the occurrences of zero in an array stored

in memory locations 100 through 199.
Programmer Considerations
• Program and data memory space
– Embedded processors often very limited
• e.g., 64 Kbytes program, 256 bytes of RAM (expandable)
• Registers: How many are there?
– Only a direct concern for assembly-level programmers
• I/O
– How to communicate with external signals?
• Interrupts
– An interrupt is a signal that changes the normal flow of
program execution.
Use of Interrupts
• I/O data transfer between peripheral devices
and processor/ controller
• Timing applications
• Handling emergency situations (e.g. switch
off the system when the battery status falls
below the critical limit in battery operated
systems)
• Context switching/ Multitasking/ Real-time
application programming
• Event driven programming
Microprocessor Architecture
Overview
Example: parallel port driver
• Using assembly language programming we can configure a PC parallel port to perform digital I/O
– Write to and read from three special registers to accomplish this; the table lists the parallel port connector pins and the corresponding register locations
– Example: parallel port monitors the input switch and turns the LED on/off accordingly
LPT Connection Pin | I/O Direction | Register Address
1 | Output | 0th bit of register #2
2-9 | Output | 0th to 7th bit of register #0
10,11,12,13,15 | Input | 6,7,5,4,3th bit of register #1
14,16,17 | Output | 1,2,3th bit of register #2
[Figure: switch connected to pin 13 and LED connected to pin 2 of the PC parallel port]
Parallel Port Example
C caller:
    extern "C" void CheckPort(void);   // defined in assembly
    void main(void) {
        while( 1 ) {
            CheckPort();
        }
    }

Assembly subroutine:
    ; This program consists of a sub-routine that reads
    ; the state of the input pin, determining the on/off state
    ; of our switch and asserts the output pin, turning the LED
    ; on/off accordingly
            .386
    CheckPort proc
            push ax             ; save the content
            push dx             ; save the content
            mov dx, 3BCh + 1    ; base + 1 for register #1
            in al, dx           ; read register #1
            and al, 10h         ; mask out all but bit # 4 (pin 13, the switch)
            cmp al, 0           ; is it 0?
            jne SwitchOn        ; if not, we need to turn the LED on
    SwitchOff:
            mov dx, 3BCh + 0    ; base + 0 for register #0
            in al, dx           ; read the current state of the port
            and al, 0FEh        ; clear bit 0 (pin 2, the LED), keep the other bits
            out dx, al          ; write it out to the port
            jmp Done            ; we are done
    SwitchOn:
            mov dx, 3BCh + 0    ; base + 0 for register #0
            in al, dx           ; read the current state of the port
            or al, 01h          ; set bit 0 (pin 2, the LED), keep the other bits
            out dx, al          ; write it out to the port
    Done:   pop dx              ; restore the content
            pop ax              ; restore the content
            ret                 ; return to the caller
    CheckPort endp
Operating System
• Optional software layer providing low-level services to a program (application).
– File management, disk access
– Keyboard/display interfacing
– Scheduling multiple programs for execution
• Or even just multiple threads from one program
– Program makes system calls to the OS

        DB file_name "out.txt"   -- store file name
        MOV R0, 1324             -- system call "open" id
        MOV R1, file_name        -- address of file-name
        INT 34                   -- cause a system call
        JZ R0, L1                -- if zero -> error
        . . . read the file
        JMP L2                   -- bypass error cond.
    L1: . . . handle the error
    L2:
Development Environment
• Development processor
– The processor on which we write and debug our
programs
• Usually a PC
• Target processor
– The processor that the program will run on in our
embedded system
• Often different from the development processor
Software Development Process
• Compilers
– Cross compiler
• Runs on one processor, but generates code for another
• Assemblers
• Linkers
• Debuggers
• Profilers
[Figure: Implementation phase: C files go through the compiler and assembly files through the assembler to produce binary files; the linker combines the binary files with a library into an executable file. Verification phase: the executable file is examined with the debugger and the profiler.]
Running a Program
• If development processor is different than
target, how can we run our compiled code? Two
options:
– Download to target processor
– Simulate
• Simulation
– One method: Hardware description language
• But slow, not always available
– Another method: Instruction set simulator (ISS)
• Runs on development processor, but executes instructions
of target processor
Instruction Set Simulator For A Simple Processor
    #include <stdio.h>

    typedef struct {
        unsigned char first_byte, second_byte;
    } instruction;

    instruction program[1024];      // instruction memory
    unsigned char memory[256];      // data memory

    // Runs num_bytes / 2 instructions; returns 0 on success, -1 on an unknown opcode
    int run_program(int num_bytes) {
        int pc = -1;
        unsigned char reg[16] = {0}, fb, sb;

        while( ++pc >= 0 && pc < (num_bytes / 2) ) {
            fb = program[pc].first_byte;
            sb = program[pc].second_byte;
            switch( fb >> 4 ) {
                case 0: reg[fb & 0x0f] = memory[sb]; break;
                case 1: memory[sb] = reg[fb & 0x0f]; break;
                case 2: memory[reg[fb & 0x0f]] = reg[sb >> 4]; break;
                case 3: reg[fb & 0x0f] = sb; break;
                case 4: reg[fb & 0x0f] += reg[sb >> 4]; break;
                case 5: reg[fb & 0x0f] -= reg[sb >> 4]; break;
                case 6: // JZ: branch only if Rn is 0; the offset is signed
                    if (reg[fb & 0x0f] == 0) pc += (signed char)sb;
                    break;
                default: return -1;
            }
        }
        return 0;
    }

    // Prints the data memory, 16 bytes per row
    void print_memory_contents(void) {
        for (int i = 0; i < 256; i++)
            printf("%02x%c", memory[i], (i % 16 == 15) ? '\n' : ' ');
    }

    int main(int argc, char *argv[]) {
        FILE* ifs;

        if( argc != 2 || (ifs = fopen(argv[1], "rb")) == NULL ) {
            return -1;
        }
        if (run_program((int)fread(program, 1, sizeof(program), ifs)) == 0) {
            print_memory_contents();
            return(0);
        }
        else return(-1);
    }
Testing & Debugging
• ISS
– Gives us control over time – set breakpoints, look at register values, set values, step-by-step execution, ...
– But, doesn’t interact with real environment
• Download to board
– Use device programmer
– Runs in real environment, but not controllable
• Compromise: emulator
– Runs in real environment, at speed or near
– Supports some controllability from the PC
[Figure: (a) implementation phase and verification phase both on the development processor, using the debugger/ISS; (b) implementation phase on the development processor, verification phase using external tools: emulator and programmer]
Application-Specific Instruction-Set
Processors (ASIPs)
• General-purpose processors
– Sometimes too general to be effective in demanding
application
• e.g., video processing – requires huge video buffers and
operations on large arrays of data, inefficient on a GPP
– But single-purpose processor has high NRE, not
programmable
• ASIPs – targeted to a particular domain
– Contain architectural features specific to that domain
• e.g., embedded control, digital signal processing, video
processing, network processing, telecommunications, etc.
– Still programmable
A Common ASIP: Microcontroller
• For embedded control applications
– Reading sensors, setting actuators
– Mostly dealing with events (bits): data is present, but not in
huge amounts
– e.g., VCR, disk drive, digital camera (assuming SPP for image
compression), washing machine, microwave oven
• Microcontroller features
– On-chip peripherals
• Timers, analog-digital converters, serial communication, etc.
• Tightly integrated for programmer, typically part of register space
– On-chip program and data memory
– Direct programmer access to many of the chip’s pins
– Specialized instructions for bit-manipulation and other low-
level operations
Another Common ASIP: Digital Signal
Processors (DSP)
• For signal processing applications
– Large amounts of digitized data, often streaming
– Data transformations must be applied fast
– e.g., cell-phone voice filter, digital TV, music synthesizer
• DSP features
– Several instruction execution units
– Multiply-accumulate single-cycle instruction, other
instrs.
– Efficient vector operations – e.g., add two arrays
• Vector ALUs, loop buffers, ….
Trend: Even More Customized ASIPs
• In the past, microprocessors were acquired as chips
• Today, we increasingly acquire a processor as Intellectual

Property (IP)
– e.g., synthesizable VHDL model
• Opportunity to add a custom datapath hardware and a few

custom instructions, or delete a few instructions
– Can have significant performance, power and size impacts
– Problem: need compiler/debugger for customized ASIP
• Remember, most development uses structured languages
• One solution: automatic compiler/debugger generation
• Another solution: retargetable compilers
– (customized VLIW architectures)
Selecting a Microprocessor
• Issues
– Technical: speed, power, size, cost
– Other: development environment, prior expertise, licensing, …
• Speed: how to evaluate a processor’s speed?

– Clock speed – but instructions per cycle may differ
– Instructions per second – but work per instr. may differ
– Dhrystone: Synthetic benchmark, developed in 1984
(A short synthetic benchmark program by Reinhold Weicker, intended to
be representative of system (integer) programming. It is available
in ADA, Pascal and C. ). Dhrystones/sec.
• MIPS (Million Instructions Per Second): 1 MIPS = 1757 Dhrystones per second
(based on Digital’s VAX 11/780). Dhrystone MIPS. Commonly used today.
– So, 750 MIPS = 750*1757 = 1,317,750 Dhrystones per second
– SPEC: set of more realistic benchmarks, but oriented to desktops
– EEMBC – EDN Embedded Benchmark Consortium, www.eembc.org
• Suites of benchmarks: automotive, consumer electronics, networking, office
automation, telecommunications
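The Dhrystone MIPS conversion above is simple arithmetic; a minimal C sketch is given below. The function names `dmips_to_dhrystones` and `dhrystones_to_dmips` are hypothetical, and the only constant used is the 1757 Dhrystones per second figure quoted in the slide.

```c
#include <assert.h>

/* Dhrystones per second of the VAX 11/780 reference machine (= 1 MIPS). */
#define DHRYSTONES_PER_MIPS 1757.0

/* Convert a Dhrystone MIPS rating into Dhrystones per second. */
double dmips_to_dhrystones(double dmips)
{
    assert(dmips >= 0.0);            /* ratings are non-negative */
    return dmips * DHRYSTONES_PER_MIPS;
}

/* Convert a measured Dhrystones-per-second score into Dhrystone MIPS. */
double dhrystones_to_dmips(double dhrystones_per_sec)
{
    assert(dhrystones_per_sec >= 0.0);
    return dhrystones_per_sec / DHRYSTONES_PER_MIPS;
}
```

Calling `dmips_to_dhrystones(750.0)` reproduces the 750 MIPS example from the slide.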
General Purpose Processors
Processor | Clock speed | Periph. | Bus Width | MIPS | Power | Trans. | Price
General Purpose Processors
Intel PIII | 1GHz | 2x16 K L1, 256K L2, MMX | 32 | ~900 | 97W | ~7M | $900
IBM PowerPC 750X | 550 MHz | 2x32 K L1, 256K L2 | 32/64 | ~1300 | 5W | ~7M | $900
MIPS R5000 | 250 MHz | 2x32 K, 2 way set assoc. | 32/64 | NA | NA | 3.6M | NA
StrongARM SA-110 | 233 MHz | None | 32 | 268 | 1W | 2.1M | NA
Microcontroller
Intel 8051 | 12 MHz | 4K ROM, 128 RAM, 32 I/O, Timer, UART | 8 | ~1 | ~0.2W | ~10K | $7
Motorola 68HC811 | 3 MHz | 4K ROM, 192 RAM, 32 I/O, Timer, WDT, SPI | 8 | ~.5 | ~0.1W | ~10K | $5
Digital Signal Processors
TI C5416 | 160 MHz | 128K SRAM, 3 T1 Ports, DMA, 13 ADC, 9 DAC | 16/32 | ~600 | NA | NA | $34
Lucent DSP32C | 80 MHz | 16K Inst., 2K Data, Serial Ports, DMA | 32 | 40 | NA | NA | $75
Sources: Intel, Motorola, MIPS, ARM, TI, and IBM Website/Datasheet; Embedded Systems Programming, Nov. 1998
Designing a General Purpose Processor
• Not something an embedded system designer normally would do
– But instructive to see how simply we can build one top down
– Remember that real processors aren’t usually built this way
• Much more optimized, much more bottom-up design
FSMD
Declarations:
    bit PC[16], IR[16];
    bit M[64k][16], RF[16][16];
Aliases:
    op IR[15..12]    rn IR[11..8]    rm IR[7..4]
    dir IR[7..0]     imm IR[7..0]    rel IR[7..0]
States:
State | Condition | Operation | Next state
Reset | | PC=0; | Fetch
Fetch | | IR=M[PC]; PC=PC+1 | Decode
Decode | | (branches on op) | one of the states below
Mov1 | op = 0000 | RF[rn] = M[dir] | Fetch
Mov2 | op = 0001 | M[dir] = RF[rn] | Fetch
Mov3 | op = 0010 | M[rn] = RF[rm] | Fetch
Mov4 | op = 0011 | RF[rn] = imm | Fetch
Add | op = 0100 | RF[rn] = RF[rn]+RF[rm] | Fetch
Sub | op = 0101 | RF[rn] = RF[rn]-RF[rm] | Fetch
Jz | op = 0110 | PC = (RF[rn]=0) ? rel : PC | Fetch
Architecture of a Simple Microprocessor
• Storage devices for each declared variable
– register file holds each of the variables
• Functional units to carry out the FSMD operations
– One ALU carries out every required operation
• Connections added among the components’ ports corresponding to the operations required by the FSM
• Unique identifiers created for every control signal
[Figure: control unit (controller with next-state and control logic and state register, PC, IR) and datapath (2x1 mux, 16-entry register file RF, ALU, 3x1 mux, memory with address port A and data port D). Control signals: PCld, PCinc, PCclr, Irld, Ms, Mre, Mwe, RFs, RFwa, RFwe, RFr1a, RFr1e, RFr2a, RFr2e, ALUs; status signal: ALUz]
A Simple Microprocessor
The FSM operations in the last column replace the FSMD operations after a datapath is created.
State | Condition | FSMD operation | FSM operations
Reset | | PC=0; | PCclr=1;
Fetch | | IR=M[PC]; PC=PC+1 | Ms=10; Irld=1; Mre=1; PCinc=1;
Decode | | (branches on op) |
Mov1 | op = 0000 | RF[rn] = M[dir] | RFwa=rn; RFwe=1; RFs=01; Ms=01; Mre=1;
Mov2 | op = 0001 | M[dir] = RF[rn] | RFr1a=rn; RFr1e=1; Ms=01; Mwe=1;
Mov3 | op = 0010 | M[rn] = RF[rm] | RFr1a=rn; RFr1e=1; Ms=10; Mwe=1;
Mov4 | op = 0011 | RF[rn] = imm | RFwa=rn; RFwe=1; RFs=10;
Add | op = 0100 | RF[rn] = RF[rn]+RF[rm] | RFwa=rn; RFwe=1; RFs=00; RFr1a=rn; RFr1e=1; RFr2a=rm; RFr2e=1; ALUs=00
Sub | op = 0101 | RF[rn] = RF[rn]-RF[rm] | RFwa=rn; RFwe=1; RFs=00; RFr1a=rn; RFr1e=1; RFr2a=rm; RFr2e=1; ALUs=01
Jz | op = 0110 | PC = (RF[rn]=0) ? rel : PC | PCld=ALUz; RFr1a=rn; RFr1e=1;
Every instruction state returns to Fetch.
[Figure: the same control unit and datapath as on the previous slide]
Embedded Systems
Chapter – 4
Memory
4. Memory [5 Hrs.]
4.1 Memory Write Ability and

Storage Permanence
4.2 Types of Memory
4.3 Composing Memory
4.4 Memory Hierarchy and Cache
Introduction
• Embedded system’s functionality aspects
– Processing
• processors
• transformation of data
– Storage
• memory
• retention of data
– Communication
• buses
• transfer of data
Semiconductor Memory Types
Memory: basic concepts
• Stores large number of bits
– m x n: m words of n bits each
– k = Log2(m) address input signals
– or m = 2^k words
– e.g., 4,096 x 8 memory:
• 32,768 bits
• 12 address input signals
• 8 input/output data signals
• Memory access
– r/w: selects read or write
– enable: read or write only when asserted
– multiport: multiple accesses to different
locations simultaneously
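The relation between the number of words, the number of address signals and the total capacity can be checked with a short C sketch. The helper names `address_lines` and `total_bits` are hypothetical; the arithmetic is just k = log2(m), rounded up, and m x n.

```c
#include <assert.h>

/* Number of address input signals k needed to select one of m words
   (smallest k such that 2^k >= m). */
unsigned address_lines(unsigned long words)
{
    assert(words > 0);
    unsigned long v = words - 1;     /* highest address that must be reachable */
    unsigned k = 0;
    while (v != 0) {                 /* count the bits needed to represent it */
        v >>= 1;
        k++;
    }
    return k;
}

/* Total storage of an m x n memory, in bits. */
unsigned long total_bits(unsigned long words, unsigned bits_per_word)
{
    return words * bits_per_word;
}
```

For the 4,096 x 8 example this gives 12 address signals and 32,768 bits.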
Write ability/ storage permanence
• Traditional ROM/RAM distinctions
– ROM
• read only, bits stored without power
– RAM
• read and write, lose stored bits
without power
• Traditional distinctions blurred
– Advanced ROMs can be written to
• e.g., EEPROM
– Advanced RAMs can hold bits without
power
• e.g., NVRAM
• Write ability
– Manner and speed a memory can be
written
• Storage permanence
– ability of memory to hold stored bits
after they are written
Write ability
• Ranges of write ability
– High end
• processor writes to memory simply and quickly
• e.g., RAM
– Middle range
• processor writes to memory, but slower
• e.g., FLASH, EEPROM
– Lower range
• special equipment, “programmer”, must be used to write to memory
• e.g., EPROM, OTP ROM
– Low end
• bits stored only during fabrication
• e.g., Mask-programmed ROM
• In-system programmable memory
– Can be written to by a processor in the embedded system using the
memory
– Memories in high end and middle range of write ability
Storage permanence
• Range of storage permanence
– High end
• essentially never loses bits
• e.g., mask-programmed ROM
– Middle range
• holds bits days, months, or years after memory’s power source turned off
• e.g., NVRAM
– Lower range
• holds bits as long as power supplied to memory
• e.g., SRAM
– Low end
• begins to lose bits almost immediately after written
• e.g., DRAM
• Nonvolatile memory
– Holds bits after power is no longer supplied
– High end and middle range of storage permanence
Semiconductor Memory
• RAM
–Misnamed as all semiconductor
memory is random access
–Read/Write
–Volatile
–Temporary storage
–Static or dynamic
Memory Cell Operation
Dynamic RAM
• Bits stored as charge in capacitors
• Charges leak
• Need refreshing even when powered
• Simpler construction
• Smaller per bit
• Less expensive
• Need refresh circuits
• Slower
• Main memory
• Essentially analogue
– Level of charge determines value
Dynamic RAM Structure
DRAM Operation
• Address line active when bit read or written
– Transistor switch closed (current flows)
• Write
– Voltage to bit line
• High for 1 low for 0
– Then signal address line
• Transfers charge to capacitor
• Read
– Address line selected
• transistor turns on
– Charge from capacitor fed via bit line to sense amplifier
• Compares with reference value to determine 0 or 1
– Capacitor charge must be restored
Static RAM
• Bits stored as on/off switches
• No charges to leak
• No refreshing needed when powered
• More complex construction
• Larger per bit
• More expensive
• Does not need refresh circuits
• Faster
• Cache
• Digital
– Uses flip-flops
Static RAM Structure
Static RAM Operation
• Transistor arrangement gives stable logic state
• State 1
– C1 high, C2 low
– T1 T4 off, T2 T3 on
• State 0
– C2 high, C1 low
– T2 T3 off, T1 T4 on
• Address line transistors T5 and T6 act as switches
• Write – apply value to line B and its complement to line B'
• Read – value is on line B
Basic types of RAM
• SRAM: Static RAM
– Memory cell uses flip-flop to store bit
– Requires 6 transistors
– Holds data as long as power supplied
• DRAM: Dynamic RAM
– Memory cell uses MOS transistor and capacitor to store bit
– More compact than SRAM
– “Refresh” required due to capacitor leak
• word’s cells refreshed when read
– Typical refresh rate 15.625 microsec.
– Slower to access than SRAM
[Figure: memory cell internals: SRAM cell with Data, Data' and word line W; DRAM cell with Data and word line W]
Enhanced DRAMs
All enhanced DRAMs are built around the
conventional DRAM core.
— Fast page mode DRAM (FPM DRAM)
– Access contents of row with [RAS, CAS, CAS, CAS, CAS] instead
of [(RAS,CAS), (RAS,CAS), (RAS,CAS), (RAS,CAS)].
— Extended data out DRAM (EDO DRAM)
– Enhanced FPM DRAM with more closely spaced CAS signals.
— Synchronous DRAM (SDRAM)
– Driven with rising clock edge instead of asynchronous control
signals.
— Double data-rate synchronous DRAM (DDR SDRAM)
– Enhancement of SDRAM that uses both clock edges as control
signals.
— Video RAM (VRAM)
– Like FPM DRAM, but output is produced by shifting row buffer
– Dual ported (allows concurrent reads and writes)
RAM variations
• PSRAM: Pseudo-static RAM
– DRAM with built-in memory refresh controller
– Popular low-cost high-density alternative to SRAM
• NVRAM: Nonvolatile RAM
– Holds data after external power removed
– Battery-backed RAM
• SRAM with own permanently connected battery
• writes as fast as reads
• no limit on number of writes unlike nonvolatile ROM-based memory
– SRAM with EEPROM or flash
• stores complete RAM contents on EEPROM or flash before power
turned off
Read Only Memory (ROM)
• Permanent storage
–Nonvolatile
• Microprogramming
• Library subroutines
• Systems programs (BIOS)
• Function tables
Types of ROM
• Written during manufacture
– Very expensive for small runs
• Programmable (once)
– PROM
– Needs special equipment to program
• Read “mostly”
– Erasable Programmable (EPROM)
• Erased by UV
– Electrically Erasable (EEPROM)
• Takes much longer to write than read
– Flash memory
• Erase whole memory electrically
Organisation in detail
• A 16Mbit chip can be organised as 1M of 16 bit words
• A bit per chip system has 16 lots of 1Mbit chip with bit 1 of each word in chip 1 and so on
• A 16Mbit chip can be organised as a 2048 x 2048 x 4bit array
– Reduces number of address pins
• Multiplex row address and column address
• 11 pins to address (2^11 = 2048)
• Adding one more pin doubles the range of both the row and the column address, so x4 capacity
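Multiplexing the row and column address over the same 11 pins can be sketched in C as below. This is an illustration only: the function names are hypothetical, and treating the upper 11 bits of the 22-bit cell address as the row is an assumption (the slide does not say which half is which).

```c
#include <assert.h>

#define ROW_BITS 11u   /* 2^11 = 2048 rows    */
#define COL_BITS 11u   /* 2^11 = 2048 columns */

/* Row half of a 22-bit cell address, sent first on the shared pins. */
unsigned row_address(unsigned long addr)
{
    assert(addr < (1UL << (ROW_BITS + COL_BITS)));
    return (unsigned)(addr >> COL_BITS);
}

/* Column half, sent second on the same pins. */
unsigned column_address(unsigned long addr)
{
    assert(addr < (1UL << (ROW_BITS + COL_BITS)));
    return (unsigned)(addr & ((1UL << COL_BITS) - 1));
}
```

With 12 pins instead of 11, both halves would cover 4096 values, which is where the factor of four in capacity comes from.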
ROM: “Read-Only” Memory
• Nonvolatile memory
• Can be read from but not written to, by a processor in an embedded system
• Traditionally written to, “programmed”, before inserting to embedded system
• Uses
– Store software program for general-purpose processor
• program instructions can be one or more ROM words
– Store constant data needed by system
– Implement combinational circuit
[Figure: external view of a 2^k × n ROM with enable input, address inputs A0 … Ak-1 and data outputs Qn-1 … Q0]
Example: 8 x 4 ROM
• Horizontal lines = words
• Vertical lines = data
• Lines connected only at circles
• Decoder sets word 2’s line to 1 if address input is 010
• Data lines Q3 and Q1 are set to 1 because there is a “programmed” connection with word 2’s line
• Word 2 is not connected with data lines Q2 and Q0
• Output is 1010
[Figure: internal view of the 8 × 4 ROM: a 3×8 decoder with enable and address inputs A0, A1, A2 drives the word lines (word 0, word 1, word 2, ...); programmable connections join word lines to the wired-OR data lines Q3, Q2, Q1, Q0]
Implementing combinational function
• Any combinational circuit of n functions of
same k variables can be done with 2^k x n ROM
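As an illustration of this idea, the sketch below stores the truth table of a one-bit full adder in an 8 x 2 "ROM" (k = 3 input variables, n = 2 output functions). The full adder is a hypothetical example chosen here, not one from the slides, and `full_adder_rom` and `rom_read` are made-up names.

```c
#include <assert.h>
#include <stdint.h>

/* 8 x 2 ROM: address = {a, b, cin}, stored word = {carry, sum}. */
static const uint8_t full_adder_rom[8] = {
    0x0, /* a=0 b=0 cin=0 -> carry=0 sum=0 */
    0x1, /* a=0 b=0 cin=1 -> carry=0 sum=1 */
    0x1, /* a=0 b=1 cin=0 -> carry=0 sum=1 */
    0x2, /* a=0 b=1 cin=1 -> carry=1 sum=0 */
    0x1, /* a=1 b=0 cin=0 -> carry=0 sum=1 */
    0x2, /* a=1 b=0 cin=1 -> carry=1 sum=0 */
    0x2, /* a=1 b=1 cin=0 -> carry=1 sum=0 */
    0x3  /* a=1 b=1 cin=1 -> carry=1 sum=1 */
};

/* "Read" the ROM: the k input variables together form the address. */
uint8_t rom_read(unsigned a, unsigned b, unsigned cin)
{
    assert(a < 2 && b < 2 && cin < 2);
    return full_adder_rom[(a << 2) | (b << 1) | cin];
}
```

No logic gates are evaluated at run time; each of the 2^k possible input combinations simply selects a pre-stored n-bit word.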
Mask-programmed ROM
• Connections “programmed” at fabrication
– set of masks
• Lowest write ability
– only once
• Highest storage permanence
– bits never change unless damaged
• Typically used for final design of high-volume
systems
– spread out NRE cost for a low unit cost
OTP ROM: One-time programmable ROM
• Connections “programmed” after manufacture by user
– user provides file of desired contents of ROM
– file input to machine called ROM programmer
– each programmable connection is a fuse
– ROM programmer blows fuses where connections should not exist
• Very low write ability
– typically written only once and requires ROM programmer device
• Very high storage permanence
– bits don’t change unless reconnected to programmer and more
fuses blown
• Commonly used in final products
– cheaper, harder to inadvertently modify
EPROM: Erasable programmable ROM
• Programmable component is a MOS transistor
– Transistor has “floating” gate surrounded by an insulator
– (a) Negative charges form a channel between source and drain storing a logic 1
– (b) Large positive voltage at gate causes negative charges to move out of channel and get trapped in floating gate storing a logic 0
– (c) (Erase) Shining UV rays on surface of floating-gate causes negative charges to return to channel from floating gate restoring the logic 1
– (d) An EPROM package showing quartz window through which UV light can pass
• Better write ability
– can be erased and reprogrammed thousands of times
• Reduced storage permanence
– program lasts about 10 years but is susceptible to radiation and electric noise
• Typically used during design development
[Figure: floating-gate transistor (source, drain, floating gate) shown (a) with 0V at the gate, (b) with +15V at the gate, (c) during UV erase, labelled 5-30 min, and (d) the EPROM package]
EEPROM: Electrically erasable
programmable ROM
• Programmed and erased electrically
– typically by using higher than normal voltage
– can program and erase individual words
• Better write ability
– can be in-system programmable with built-in circuit to provide higher
than normal voltage
• built-in memory controller commonly used to hide details from memory user
– writes very slow due to erasing and programming
• “busy” pin indicates to processor EEPROM still writing
– can be erased and programmed tens of thousands of times
• Similar storage permanence to EPROM (about 10 years)
• Far more convenient than EPROMs, but more expensive
Flash Memory
• Extension of EEPROM
– Same floating gate principle
– Same write ability and storage permanence
• Fast erase
– Large blocks of memory erased at once, rather than one word at a time
– Blocks typically several thousand bytes large
• Writes to single words may be slower
– Entire block must be read, word updated, then entire block written back
• Used with embedded systems storing large data items in
nonvolatile memory
– e.g., digital cameras, TV set-top boxes, cell phones
RAM: “Random-access” memory
• Typically volatile memory
– bits are not held without power supply
• Read and written to easily by
embedded system during execution
• Internal structure more complex
than ROM
– a word consists of several memory cells,
each storing 1 bit
– each input and output data line
connects to each cell in its column
– rd/wr connected to every cell
– when row is enabled by decoder, each
cell has logic that stores input data bit
when rd/wr indicates write or outputs
stored bit when rd/wr indicates read
Example: HM6264 & 27C256 RAM/ROM devices
• Low-cost low-capacity
memory devices
• Commonly used in 8-bit microcontroller-based
embedded systems
• First two numeric
digits indicate device
type
– RAM: 62
– ROM: 27
• Subsequent digits
indicate capacity in
kilobits
Example: TC55V2325FF-100 memory device
• 2-megabit
synchronous
pipelined burst
SRAM memory
device
• Designed to be
interfaced with
32-bit
processors
• Capable of fast
sequential
reads and
writes as well
as single byte
I/O
Composing memory
• Memory size needed often differs from size of readily
available memories
• When available memory is larger, simply ignore unneeded
high-order address bits and higher data lines
• When available memory is smaller, compose several smaller
memories into one larger memory
– Connect side-by-side to increase width of words
– Connect top to bottom to increase number of words
• added high-order address line selects smaller memory
containing desired word using a decoder
– Combine techniques to increase number and width of
words
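Both composition techniques can be modelled together in a few lines of C. The sketch below assumes four hypothetical 1K x 8 memories combined into one 2K x 16 memory; the sizes and the names `chips`, `composed_write` and `composed_read` are illustrative assumptions, not taken from the slides.

```c
#include <assert.h>
#include <stdint.h>

#define CHIP_WORDS 1024u   /* each small memory is 1K x 8 (assumed size) */

/* Four 1K x 8 chips: chips[row][0] holds the low byte of a word,
   chips[row][1] the high byte (side-by-side = wider words);
   row 0 and row 1 are stacked top to bottom (= more words). */
static uint8_t chips[2][2][CHIP_WORDS];

/* Write one 16-bit word into the composed 2K x 16 memory. */
void composed_write(unsigned addr, uint16_t data)
{
    assert(addr < 2 * CHIP_WORDS);
    unsigned row = addr / CHIP_WORDS;      /* added high-order address bit, decoded */
    unsigned offset = addr % CHIP_WORDS;   /* low-order bits go to every chip */
    chips[row][0][offset] = (uint8_t)(data & 0xFF);
    chips[row][1][offset] = (uint8_t)(data >> 8);
}

/* Read one 16-bit word back from the pair of chips selected by the decoder. */
uint16_t composed_read(unsigned addr)
{
    assert(addr < 2 * CHIP_WORDS);
    unsigned row = addr / CHIP_WORDS;
    unsigned offset = addr % CHIP_WORDS;
    return (uint16_t)(chips[row][0][offset] | (chips[row][1][offset] << 8));
}
```

Address 1029, for instance, selects the second row of chips and offset 5 inside each of them.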
Memory hierarchy
• Want inexpensive,
fast memory
• Main memory
– Large, inexpensive,
slow memory
stores entire
program and data
• Cache
– Small, expensive, fast memory stores copy of likely accessed
parts of larger memory
– Can be multiple levels of cache
Cache
• Usually designed with SRAM
– faster but more expensive than DRAM
• Usually on same chip as processor
– space limited, so much smaller than off-chip main memory
– faster access ( 1 cycle vs. several cycles for main memory)
• Cache operation:
– Request for main memory access (read or write)
– First, check cache for copy
• cache hit
– copy is in cache, quick access
• cache miss
– copy not in cache, read address and possibly its neighbors into cache
• Several cache design choices

– cache mapping, replacement policies, and write techniques
Cache mapping
• Far fewer cache addresses are available than main memory addresses
• Are the contents of a given address in the cache?
• Cache mapping used to assign main memory address
to cache address and determine hit or miss
• Three basic techniques:
– Direct mapping
– Fully associative mapping
– Set-associative mapping
• Caches partitioned into indivisible blocks or lines of
adjacent memory addresses
– usually 4 or 8 addresses per line
Direct mapping
• Main memory address divided into 2 fields
– Index
• cache address
• number of bits determined by cache size
– Tag
• compared with tag stored in cache at address indicated by index
• if tags match, check valid bit
• Valid bit
– indicates whether data in slot has been loaded from memory
• Offset
– used to find particular word in cache line
[Figure: address split into Tag, Index and Offset; each cache entry holds valid bit V, tag T and data D; the stored tag is compared (=) with the address tag to produce Valid, and the offset selects the Data word]
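A direct-mapped lookup can be sketched in C as follows. The field widths (2 offset bits, 4 index bits) and all names are assumptions made for illustration, and the sketch tracks only the valid bit and tag, not the cached data itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define OFFSET_BITS 2u                   /* 4 words per line (assumed) */
#define INDEX_BITS  4u                   /* 16 cache lines (assumed)   */
#define NUM_LINES   (1u << INDEX_BITS)

struct cache_line {
    bool     valid;                      /* V: line has been loaded from memory */
    uint32_t tag;                        /* T: upper bits of the cached address */
};

static struct cache_line cache[NUM_LINES];

/* Split a main memory address into its three fields. */
uint32_t addr_offset(uint32_t addr) { return addr & ((1u << OFFSET_BITS) - 1); }
uint32_t addr_index(uint32_t addr)  { return (addr >> OFFSET_BITS) & (NUM_LINES - 1); }
uint32_t addr_tag(uint32_t addr)    { return addr >> (OFFSET_BITS + INDEX_BITS); }

/* Returns true on a hit; on a miss, records the new tag in the indexed line. */
bool cache_access(uint32_t addr)
{
    uint32_t index = addr_index(addr);
    assert(index < NUM_LINES);
    struct cache_line *line = &cache[index];
    if (line->valid && line->tag == addr_tag(addr))
        return true;                     /* tags match and the line is valid */
    line->valid = true;                  /* miss: the line is (re)loaded */
    line->tag = addr_tag(addr);
    return false;
}
```

Two addresses with the same index but different tags evict each other, which is why a direct-mapped cache never has to choose which block to replace.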
Fully associative mapping
• Complete main memory address stored in each cache address
• All addresses stored in cache simultaneously compared with
desired address
• Valid bit and offset same as direct mapping
Set-associative mapping
• Compromise between direct mapping and fully
associative mapping
• Index same as in direct mapping
• But, each cache address contains content and tags of
2 or more memory address locations
• Tags of that set simultaneously compared as in fully associative mapping
• Cache with set size N called N-way set-associative
– 2-way, 4-way, 8-way are common
Cache-replacement policy
• Technique for choosing which block to replace
– when fully associative cache is full
– when set-associative cache’s line is full
• Direct mapped cache has no choice
• Random
– replace block chosen at random
• LRU: least-recently used
– replace block not accessed for longest time
• FIFO: first-in-first-out
– push block onto queue when accessed
– choose block to replace by popping queue
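The LRU policy can be sketched for a single set as shown below. This is a software model under assumptions: a 4-way set, an access counter standing in for time, and hypothetical names (`struct way`, `lru_access`). Real caches usually approximate LRU with a few status bits rather than full timestamps.

```c
#include <assert.h>
#include <stdint.h>

#define WAYS 4u                  /* blocks in the set (assumed) */

struct way {
    int      valid;              /* block currently holds data */
    uint32_t tag;                /* which memory block it holds */
    unsigned last_used;          /* time of the most recent access (0 = never) */
};

static struct way set[WAYS];
static unsigned now;             /* access counter used as a clock */

/* Returns 1 on a hit, 0 on a miss (after replacing the least-recently used block). */
int lru_access(uint32_t tag)
{
    unsigned victim = 0;

    now++;
    assert(now != 0);            /* this sketch does not handle counter wrap-around */

    for (unsigned i = 0; i < WAYS; i++) {
        if (set[i].valid && set[i].tag == tag) {
            set[i].last_used = now;      /* hit: mark as most recently used */
            return 1;
        }
    }
    /* Miss: pick the block with the oldest access time; empty ways have time 0. */
    for (unsigned i = 1; i < WAYS; i++) {
        if (set[i].last_used < set[victim].last_used)
            victim = i;
    }
    set[victim].valid = 1;
    set[victim].tag = tag;
    set[victim].last_used = now;
    return 0;
}
```

A FIFO policy would differ only in that a hit would not update `last_used`, so blocks would be replaced in the order they were loaded.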
Cache write techniques
• When written, data cache must update main
memory
• Write-through
– write to main memory whenever cache is written to
– easiest to implement
– processor must wait for slower main memory write
– potential for unnecessary writes
• Write-back
– main memory only written when “dirty” block replaced
– extra dirty bit for each block set when cache block
written to
– reduces number of slow main memory writes
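The difference between the two write techniques shows up when the main memory writes are counted. The sketch below models a single cache block; the structure and function names are hypothetical and no data is actually stored.

```c
#include <assert.h>
#include <stddef.h>

struct cache_block {
    int      dirty;        /* set when the cached copy differs from main memory */
    unsigned mem_writes;   /* number of (slow) main memory writes so far */
};

/* Write-through: every cache write also writes main memory. */
void write_through_store(struct cache_block *b)
{
    assert(b != NULL);
    b->mem_writes++;
}

/* Write-back: a cache write only marks the block dirty. */
void write_back_store(struct cache_block *b)
{
    assert(b != NULL);
    b->dirty = 1;
}

/* Write-back: main memory is updated only when a dirty block is replaced. */
void write_back_replace(struct cache_block *b)
{
    assert(b != NULL);
    if (b->dirty) {
        b->mem_writes++;
        b->dirty = 0;
    }
}
```

Ten stores to the same block cost ten main memory writes with write-through, but only one (at replacement time) with write-back.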
Cache impact on system performance
• Most important parameters in terms of performance:
– Total size of cache
• total number of data bytes cache can hold
• tag, valid and other house keeping bits not included in total
– Degree of associativity
– Data block size
• Larger caches achieve lower miss rates but higher access cost, e.g.,
• 2 Kbyte cache: miss rate = 15%, hit cost = 2 cycles, miss cost = 20 cycles
– avg. cost of memory access = (0.85 * 2) + (0.15 * 20) = 4.7 cycles
• 4 Kbyte cache: miss rate = 6.5%, hit cost = 3 cycles, miss cost will not change
– avg. cost of memory access = (0.935 * 3) + (0.065 * 20) = 4.105 cycles (improvement)
• 8 Kbyte cache: miss rate = 5.565%, hit cost = 4 cycles, miss cost will not change
– avg. cost of memory access = (0.94435 * 4) + (0.05565 * 20) = 4.8904 cycles (worse)
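The average access cost used in this comparison is a weighted sum of the hit cost and the miss cost. A minimal C sketch, with the hypothetical function name `avg_access_cost`, makes it easy to try other cache sizes:

```c
#include <assert.h>

/* Average cycles per memory access:
   (1 - miss_rate) * hit_cost + miss_rate * miss_cost. */
double avg_access_cost(double miss_rate, double hit_cost, double miss_cost)
{
    assert(miss_rate >= 0.0 && miss_rate <= 1.0);
    return (1.0 - miss_rate) * hit_cost + miss_rate * miss_cost;
}
```

Evaluating it with the three parameter sets from the slide (miss rates 0.15, 0.065 and 0.05565 with hit costs 2, 3 and 4 cycles and a miss cost of 20 cycles) shows the 4 Kbyte cache as the cheapest of the three per access.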
Cache performance trade-offs
• Improving cache hit rate without increasing size
– Increase line size
– Change set-associativity
Advanced RAM
• DRAMs commonly used as main memory in
processor based embedded systems
– high capacity, low cost
• Many variations of DRAMs proposed
– need to keep pace with processor speeds
– FPM DRAM: fast page mode DRAM
– EDO DRAM: extended data out DRAM
– SDRAM/ESDRAM: synchronous and enhanced
synchronous DRAM
– RDRAM: rambus DRAM
Basic DRAM
• Address bus multiplexed between row and column components
• Row and column addresses are latched in, sequentially, by strobing ras
and cas signals, respectively
• Refresh circuitry can be external or internal to DRAM device
– strobes consecutive memory address periodically causing memory content to
be refreshed
– Refresh circuitry disabled during read or write operation
Typical 16 Mb DRAM (4M x 4)
Packaging
Fast Page Mode DRAM (FPM DRAM)
• Each row of memory bit array is viewed as a page
• Page contains multiple words
• Individual words addressed by column address
• Timing diagram:
– row (page) address sent
– 3 words read consecutively by sending column address for each
• Extra cycle eliminated on each read/write of words from same page
Extended data out DRAM (EDO DRAM)
• Improvement of FPM DRAM
• Extra latch before output buffer
– allows strobing of cas before data read operation
completed
• Reduces read/write latency by additional cycle
Advanced DRAM Organization
• Basic DRAM same since first RAM chips

• Enhanced DRAM
– Contains small SRAM as well
– SRAM holds last line read (c.f. Cache!)
• Cache DRAM
– Larger SRAM component
– Use as cache or serial buffer
Synchronous DRAM (SDRAM)
• Access is synchronized with an external clock
• Address is presented to RAM
• RAM finds data (CPU waits in conventional DRAM)
• Since SDRAM moves data in time with system
clock, CPU knows when data will be ready
• CPU does not have to wait, it can do something
else
• Burst mode allows SDRAM to set up stream of data
and fire it out in block
• DDR-SDRAM sends data twice per clock cycle
(leading & trailing edge)
IBM 64Mb SDRAM
SDRAM Operation
(S)ynchronous and
Enhanced Synchronous (ES) DRAM
• SDRAM latches data on active edge of clock
• Eliminates time to detect ras/cas and rd/wr signals
• A counter is initialized to column address then incremented on
active edge of clock to access consecutive memory locations
• ESDRAM improves SDRAM
– added buffers enable overlapping of column addressing
– faster clocking and lower read/write latency possible
Rambus DRAM (RDRAM)
• More of a bus interface architecture
than DRAM architecture
• Data is latched on both rising and falling
edge of clock
• Broken into 4 banks each with own row
decoder
– can have 4 pages open at a time
• Capable of very high throughput
RAMBUS
• Adopted by Intel for Pentium & Itanium
• Main competitor to SDRAM
• Vertical package – all pins on one side
• Data exchange over 28 wires < cm long
• Bus addresses up to 320 RDRAM chips at
1.6Gbps
• Asynchronous block protocol
– 480ns access time
– Then 1.6 Gbps
RAMBUS Diagram
DRAM integration problem
• SRAM easily integrated on same chip as
processor
• DRAM more difficult
– Different chip making process between DRAM and
conventional logic
– Goal of conventional logic (IC) designers:
• minimize parasitic capacitance to reduce signal propagation
delays and power consumption
– Goal of DRAM designers:
• create capacitor cells to retain stored information
– Integration processes beginning to appear
Memory Management Unit (MMU)
• Duties of MMU
– Handles DRAM refresh, bus interface and
arbitration
– Takes care of memory sharing among multiple
processors
– Translates logical memory addresses from processor
to physical memory addresses of DRAM
• Modern CPUs often come with MMU built-in
• Single-purpose processors can be used
Embedded Systems
Chapter – 5
Interfacing
5. Interfacing [6 Hrs.]
5.1 Communication Basics

5.2 Microprocessor Interfacing: I/O
Addressing, Interrupts, DMA
5.3 Arbitration
5.4 Multilevel Bus Architectures
5.5 Advanced Communication Principles
Introduction
• Embedded system functionality aspects
– Processing
• Transformation of data
• Implemented using processors
– Storage
• Retention of data
• Implemented using memory
– Communication
• Transfer of data between processors and memories
• Implemented using buses
• Called interfacing
A simple bus
• Wires:
– Uni-directional or bi-directional
– One line may represent
multiple wires
• Bus
– Set of wires with a single
function
• Address bus, data bus
– Or, entire collection of wires
• Address, data and control
• Associated protocol: rules for
communication
Ports
• Conducting device on periphery

• Connects bus to processor or memory
• Often referred to as a pin
– Actual pins on periphery of IC package that plug into socket on printed-
circuit board
– Sometimes metallic balls instead of pins
– metal “pads” connecting processors and memories within single IC
• Single wire or set of wires with single function
– e.g., 12-wire address port
Timing Diagrams
• Most common method for describing a communication protocol
• Time proceeds to the right on x-axis
• Control signal: low or high
– May be active low (e.g., go’, /go, or go_L)
– Use terms assert (active) and deassert
– Asserting go’ means go=0
• Data signal: not valid or valid
• Protocol may have subprotocols
– Called bus cycle, e.g., read and write
– Each may be several clock cycles
• Read example
– rd’/wr set low, address placed on addr for at least tsetup time before
enable asserted, enable triggers memory to place data on data wires by
time tread
[Figure: timing diagrams of the read protocol (signals rd'/wr, enable, addr, data; intervals tsetup, tread) and the write protocol (signals rd'/wr, enable, addr, data; intervals tsetup, twrite)]
Basic protocol concepts
• Actor: master initiates, servant (slave) responds
• Direction: sender, receiver
• Addresses: special kind of data
– Specifies a location in memory, a peripheral, or a register within a peripheral
• Time multiplexing
– Share a single set of wires for multiple pieces of data
– Saves wires at expense of time
Basic protocol concepts: control methods
(ack – acknowledge
req - request)
A strobe/handshake compromise
ISA bus protocol – memory access
• ISA: Industry
Standard
Architecture
– Common in 80x86’s
• Features
– 20-bit address
– Compromise
strobe/handshake
control
• 4 cycles default
• Unless CHRDY (channel
ready) deasserted –
resulting in additional
wait cycles (up to 6)
Microprocessor interfacing:
I/O addressing
• A microprocessor communicates with other
devices using some of its pins
– Port-based I/O (parallel I/O)
• Processor has one or more N-bit ports
• Processor’s software reads and writes a port just like a
register; e.g., P0 = 0xFF; v = P1.2; -- P0 and P1 are 8-bit ports
– Bus-based I/O
• Processor has address, data and control ports that form a
single bus
• Communication protocol is built into the processor
• A single instruction carries out the read or write protocol on
the bus
Compromises/extensions
• Parallel I/O peripheral
– When processor only supports bus-based
I/O but parallel I/O needed
– Each port on peripheral connected to a
register within peripheral that is
read/written by the processor
• Extended parallel I/O
– When processor supports port-based I/O
but more ports needed
– One or more processor ports interface
with parallel I/O peripheral extending
total number of ports available for I/O
– e.g., extending 4 ports to 6 ports in figure
Types of bus-based I/O:
memory-mapped I/O and standard I/O
• Processor talks to both memory and peripherals using
same bus – two ways to talk to peripherals
– Memory-mapped I/O
• Peripheral registers occupy addresses in same address
space as memory
• e.g., Bus has 16-bit address
– lower 32K addresses may correspond to memory
– upper 32k addresses may correspond to peripherals
– Standard I/O (I/O-mapped I/O)
• Additional pin (M/IO) on bus indicates whether a memory
or peripheral access
• e.g., Bus has 16-bit address
– all 64K addresses correspond to memory when M/IO set to 0
– all 64K addresses correspond to peripherals when M/IO set to 1
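The two decoding schemes above can be sketched in C. This is a minimal illustration, not a model of any particular processor: the type `bus_target_t` and both function names are hypothetical, and the 32K/32K split is simply the example split given in the bullets.

```c
#include <stdint.h>

/* Which device a bus access selects. */
typedef enum { TARGET_MEMORY, TARGET_PERIPHERAL } bus_target_t;

/* Memory-mapped I/O: one 16-bit address space, divided by address.
 * Assumed split (taken from the example): lower 32K = memory,
 * upper 32K = peripherals, so address bit 15 alone decides. */
bus_target_t decode_memory_mapped(uint16_t addr)
{
    return (addr & 0x8000u) ? TARGET_PERIPHERAL : TARGET_MEMORY;
}

/* Standard I/O: the extra M/IO pin decides and the address is not
 * consulted. m_io == 0 selects memory, m_io == 1 selects a peripheral. */
bus_target_t decode_standard_io(uint16_t addr, int m_io)
{
    (void)addr; /* all 64K addresses are valid in either space */
    return m_io ? TARGET_PERIPHERAL : TARGET_MEMORY;
}
```

With this split, address 0x7FFF decodes to memory and 0x8000 to a peripheral under memory-mapped I/O, whereas under standard I/O the same address 0x8000 reaches memory or a peripheral depending only on M/IO.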
Memory-mapped I/O vs. Standard I/O
• Memory-mapped I/O
– Requires no special instructions
• Assembly instructions involving memory like MOV and
ADD work with peripherals as well
• Standard I/O requires special instructions (e.g., IN, OUT)
to move data between peripheral registers and memory
• Standard I/O
– No loss of memory addresses to peripherals
– Simpler address decoding logic in peripherals
possible
• When number of peripherals much smaller than address
space then high-order address bits can be ignored
– smaller and/or faster comparators
ISA bus
• Industry Standard Architecture (ISA) supports standard
I/O
– /IOR (I/O read), distinct from /MEMR (memory read), is used for
peripheral reads
• /IOW used for writes
– 16-bit address space for I/O vs. 20-bit address space for memory
– Otherwise very similar to memory protocol
A basic memory protocol
[Figure: timing diagram of signals P0 (Adr. 7..0, then Data), P2 (Adr. 15..8), Q (Adr. 7..0), ALE, and /RD]
• Interfacing an 8051 to external memory

– Ports P0 and P2 support port-based I/O when 8051 internal
memory being used
– Those ports serve as data/address buses when external memory
is being used
– 16-bit address and 8-bit data are time multiplexed; low 8-bits of
address must therefore be latched with aid of ALE (Address
Latch Enable) signal
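The time multiplexing of P0 can be illustrated with a small software model. This is a sketch under stated assumptions, not 8051 firmware: the struct `ext_bus_t` and the functions `ale_latch` and `read_cycle` are hypothetical names, and the external latch is reduced to a single stored byte.

```c
#include <stdint.h>

/* Hypothetical model of the external latch (output Q) plus a 64K memory. */
typedef struct {
    uint8_t q;              /* latched low address byte, Adr. 7..0 */
    uint8_t mem[65536];     /* external memory contents */
} ext_bus_t;

/* Phase 1: P0 carries the low address byte. The latch captures it under
 * control of ALE, so Q keeps holding that byte afterwards. */
void ale_latch(ext_bus_t *bus, uint8_t p0_low_addr)
{
    bus->q = p0_low_addr;
}

/* Phase 2: P0 is now free for data. With /RD asserted, the memory sees
 * the address {P2, Q} and drives the addressed byte back on P0. */
uint8_t read_cycle(ext_bus_t *bus, uint16_t addr)
{
    uint8_t p2 = (uint8_t)(addr >> 8);          /* Adr. 15..8, held on P2 */
    ale_latch(bus, (uint8_t)(addr & 0xFFu));    /* Adr. 7..0, sent via P0 */
    uint16_t seen_by_memory = (uint16_t)((p2 << 8) | bus->q);
    return bus->mem[seen_by_memory];
}
```

The point of the model is that the memory never receives the full 16-bit address from the processor at one instant: the high byte comes from P2 and the low byte from the latch output Q.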
A more complex memory protocol
• Generates control signals to drive the TC55V2325FF memory

chip in burst read mode (i.e., pipeline read operation)
– Addr0 is the starting address input to device
– GO is enable/disable input to device
Microprocessor interfacing: interrupts
• Servicing: Suppose the program running on a
microprocessor must, among other tasks, read and
process data from a peripheral whenever that
peripheral has new data. Such processing is
called servicing.
• Polling: Repeated checking by the microprocessor
for new data. If the peripheral gets new data at
unpredictable intervals, how can the program
determine when the peripheral has new data?
The most straightforward approach is to
interleave the microprocessor’s other tasks with
a routine that checks for new data in the peripheral,
perhaps by checking for a 1 in a particular bit in
a register of the peripheral.
• Suppose a peripheral intermittently receives
data, which must be serviced by the processor
– The processor can poll the peripheral regularly to
see if data has arrived – wasteful
– The peripheral can interrupt (Int) the processor
when it has data
• Requires an extra pin or pins: Int
– If Int is 1, processor suspends current program,
jumps to an Interrupt Service Routine (ISR)
– Known as interrupt-driven I/O
– Essentially, “polling” of the interrupt pin is built
into the hardware, so no extra time!
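To see why polling is called wasteful, the polling routine can be sketched in C and the unproductive checks counted. Everything here is a hypothetical model: the `peripheral_t` layout, the `DATA_READY` bit position, and both function names are assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical peripheral: bit 0 of the status register is 1 when new
 * data is waiting in the data register. */
typedef struct {
    uint8_t status;
    uint8_t data;
} peripheral_t;

#define DATA_READY 0x01u

/* Polling: called between the processor's other tasks. Returns 1 and
 * stores the byte if data was waiting, otherwise returns 0. */
int poll_once(peripheral_t *p, uint8_t *out)
{
    if ((p->status & DATA_READY) == 0)
        return 0;                        /* nothing new: check was wasted */
    *out = p->data;
    p->status &= (uint8_t)~DATA_READY;   /* reading clears the flag */
    return 1;
}

/* Runs `iterations` passes of a main loop in which data arrives only on
 * pass `arrival`. Returns how many polls found nothing. */
unsigned wasted_polls(unsigned iterations, unsigned arrival)
{
    peripheral_t p = {0, 0};
    uint8_t byte = 0;
    unsigned wasted = 0;
    for (unsigned i = 0; i < iterations; i++) {
        if (i == arrival) {
            p.data = 0x5A;
            p.status |= DATA_READY;
        }
        /* ... the processor's other tasks would run here ... */
        if (!poll_once(&p, &byte))
            wasted++;
    }
    return wasted;
}
```

With 10 loop passes and data arriving on pass 7, nine of the ten checks find nothing. With interrupt-driven I/O those nine checks would not be executed by the program at all.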
• What is the address (interrupt address vector)
of the ISR? (2 - methods)
– Fixed interrupt
• Address built into microprocessor, cannot be changed
• Either ISR stored at address or a jump to actual ISR stored
if not enough bytes available
– Vectored interrupt
• Peripheral must provide the address
• Common when microprocessor has multiple peripherals
connected by a system bus
– Compromise: interrupt address table
Interrupt-driven I/O using fixed ISR location
1(a): μP is executing its main program.
1(b): P1 receives input data in a register with address 0x8000.
2: P1 asserts Int to request servicing by the microprocessor.
3: After completing instruction at 100, μP sees Int asserted, saves the
PC’s value of 100, and sets PC to the ISR fixed location of 16.
4(a): The ISR reads data from 0x8000, modifies the data, and writes the
resulting data to 0x8001.
4(b): After being read, P1 deasserts Int.
5: The ISR returns, thus restoring PC to 100+1=101, where μP resumes
executing.
Program memory:
ISR
16: MOV R0, 0x8000
17: # modifies R0
18: MOV 0x8001, R0
19: RETI # ISR return
...
Main program
...
100: instruction
101: instruction
[Figure: program memory, μP (PC, Int input) and data memory, with peripherals P1 (register 0x8000) and P2 (register 0x8001) on the system bus; one panel per step]
Interrupt-driven I/O using vectored interrupt
1(a): μP is executing its main program.
1(b): P1 receives input data in a register with address 0x8000.
2: P1 asserts Int to request servicing by the microprocessor.
3: After completing instruction at 100, μP sees Int asserted, saves the
PC’s value of 100, and asserts Inta.
4: P1 detects Inta and puts interrupt address vector 16 on the data bus.
5(a): μP jumps to the address on the bus (16). The ISR there reads data
from 0x8000, modifies the data, and writes the resulting data to 0x8001.
5(b): After being read, P1 deasserts Int.
6: The ISR returns, thus restoring the PC to 100+1=101, where the μP
resumes.
Program memory:
ISR
16: MOV R0, 0x8000
17: # modifies R0
18: MOV 0x8001, R0
19: RETI # ISR return
...
Main program
...
100: instruction
101: instruction
[Figure: program memory, μP (PC, Int input, Inta output) and data memory, with peripherals P1 (register 0x8000, vector 16) and P2 (register 0x8001) on the system bus; one panel per step]
Interrupt address table
• Compromise between fixed and vectored
interrupts
– One interrupt pin
– Table in memory holding ISR addresses (may
be 256 words)
– Peripheral doesn’t provide ISR address, but
rather index into table
• Fewer bits are sent by the peripheral
• Can move ISR location without changing
peripheral
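The table lookup can be sketched with an array of function pointers. This is an illustrative model only: `interrupt_table`, `install_isr`, `dispatch`, and the two sample ISRs are hypothetical names, and the variable `last_serviced` exists just to make the effect observable.

```c
#include <stdint.h>

#define TABLE_SIZE 256

typedef void (*isr_t)(void);

static isr_t interrupt_table[TABLE_SIZE];   /* ISR addresses, in memory */
static int last_serviced = -1;              /* for illustration only */

void isr_a(void) { last_serviced = 0; }
void isr_b(void) { last_serviced = 1; }

/* Software installs or moves an ISR; the peripheral keeps its index. */
void install_isr(uint8_t index, isr_t handler)
{
    interrupt_table[index] = handler;
}

/* What the processor does on an interrupt: the peripheral supplies only
 * an 8-bit index, and the ISR address is looked up in the table. */
int dispatch(uint8_t index_from_peripheral)
{
    isr_t handler = interrupt_table[index_from_peripheral];
    if (handler == 0)
        return -1;          /* no ISR installed for this index */
    handler();
    return last_serviced;
}
```

Replacing the entry at index 13 with a different handler changes which ISR runs, while the peripheral continues to send the same index. That is the sense in which the ISR can move without changing the peripheral.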
Additional interrupt issues
• Maskable vs. non-maskable interrupts
– Maskable: programmer can set bit that causes processor to
ignore interrupt
• Important when in the middle of time-critical code
– Non-maskable: a separate interrupt pin that can’t be masked
• Typically reserved for drastic situations, like power failure requiring
immediate backup of data to non-volatile memory
• Jump to ISR
– Some microprocessors treat jump same as call of any
subroutine
• Complete state saved (PC, registers) – may take hundreds of cycles
– Others only save partial state, like PC only
• Thus, ISR must not modify registers, or else must save them first
• Assembly-language programmer must be aware of which registers
stored
Direct memory access
• Buffering
– Temporarily storing data in memory before processing
– Data accumulated in peripherals commonly buffered
• Microprocessor could handle this with ISR

– Storing and restoring microprocessor state inefficient
– Regular program must wait
• DMA controller more efficient

– Separate single-purpose processor
– Microprocessor relinquishes control of system bus to DMA controller
– Microprocessor can meanwhile execute its regular program
• No inefficient storing and restoring state due to ISR call
• Regular program need not wait unless it requires the system bus
– Harvard architecture – processor can fetch and execute instructions as long as
they don’t access data memory – if they do, processor stalls
Peripheral to memory transfer without DMA, using vectored interrupt
1(a): μP is executing its main program.
1(b): P1 receives input data in a register with address 0x8000.
2: P1 asserts Int to request servicing by the microprocessor.
3: After completing instruction at 100, μP sees Int asserted, saves the
PC’s value of 100, and asserts Inta.
4: P1 detects Inta and puts interrupt address vector 16 on the data bus.
5(a): μP jumps to the address on the bus (16). The ISR there reads data
from 0x8000 and then writes it to 0x0001, which is in memory.
5(b): After being read, P1 deasserts Int.
6: The ISR returns, thus restoring PC to 100+1=101, where μP resumes
executing.
Program memory:
ISR
16: MOV R0, 0x8000
17: # modifies R0
18: MOV 0x0001, R0
19: RETI # ISR return
...
Main program
...
100: instruction
101: instruction
[Figure: program memory, μP (PC, Int input, Inta output), data memory (addresses 0x0000 and 0x0001) and peripheral P1 (register 0x8000, vector 16) on the system bus; one panel per step along a time axis]
Peripheral to memory transfer with DMA
1(a): μP is executing its main program. It has already configured the
DMA ctrl registers.
1(b): P1 receives input data in a register with address 0x8000.
2: P1 asserts req to request servicing by DMA ctrl.
3: DMA ctrl asserts Dreq to request control of system bus.
4: After executing instruction 100, μP sees Dreq asserted, releases the
system bus, asserts Dack, and resumes execution. μP stalls only if it
needs the system bus to continue executing.
5: DMA ctrl (a) asserts ack, (b) reads data from 0x8000, and (c) writes
that data to 0x0001. (Meanwhile, processor still executing if not stalled!)
6: DMA ctrl de-asserts Dreq and ack, completing the handshake with P1.
7(a): μP de-asserts Dack and resumes control of the bus.
7(b): P1 de-asserts req.
No ISR needed!
[Figure: program memory (main program at 100 to 101), μP (PC, Dreq input, Dack output), data memory (addresses 0x0000 and 0x0001), DMA ctrl (registers holding 0x0001 and 0x8000; ack output, req input) and P1 (register 0x8000) on the system bus; one panel per step along a time axis]
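The DMA handshake steps above can be traced in a small C model. This is only a sketch: `system_t` and `dma_transfer` are hypothetical names, signals are plain integers, and the steps run strictly in sequence, whereas in hardware the processor keeps executing concurrently with step 5.

```c
#include <stdint.h>

/* Hypothetical signal and register model of the DMA example. */
typedef struct {
    int req, ack;            /* P1 <-> DMA ctrl handshake            */
    int dreq, dack;          /* DMA ctrl <-> processor handshake     */
    uint8_t p1_reg;          /* peripheral register at 0x8000        */
    uint8_t data_mem[2];     /* data memory, addresses 0x0000-0x0001 */
    unsigned pc;             /* processor program counter            */
} system_t;

/* One complete peripheral-to-memory transfer, steps 1(b) to 7. */
void dma_transfer(system_t *s, uint8_t input)
{
    s->p1_reg = input;       /* 1(b): P1 receives input data             */
    s->req = 1;              /* 2: P1 asserts req                        */
    s->dreq = 1;             /* 3: DMA ctrl asserts Dreq                 */
    s->dack = 1;             /* 4: processor releases bus, asserts Dack  */
    s->pc++;                 /*    and keeps executing its program       */
    s->ack = 1;              /* 5(a): DMA ctrl asserts ack               */
    uint8_t tmp = s->p1_reg; /* 5(b): DMA ctrl reads 0x8000              */
    s->data_mem[1] = tmp;    /* 5(c): DMA ctrl writes 0x0001             */
    s->dreq = 0;             /* 6: DMA ctrl de-asserts Dreq and ack      */
    s->ack = 0;
    s->dack = 0;             /* 7(a): processor de-asserts Dack          */
    s->req = 0;              /* 7(b): P1 de-asserts req                  */
}
```

Note that the program counter simply advances from 100 to 101: no state is saved and no ISR runs, which is the efficiency argument made for DMA.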
ISA bus DMA cycles
[Figure: processor, memory, DMA controller and I/O device on the ISA bus, with R and A lines between the DMA controller and the processor and between the DMA controller and the I/O device. Two timing diagrams over cycles C1 to C7 showing CLOCK, D[7-0] (DATA), A[19-0] (ADDRESS), ALE and CHRDY: the DMA memory-write bus cycle (with /IOR and /MEMW) and the DMA memory-read bus cycle (with /MEMR and /IOW)]
Arbitration: Priority arbiter
 Consider the situation where multiple peripherals request service from
single resource (e.g., microprocessor, DMA controller) simultaneously -
which gets serviced first?
 Priority arbiter
 Single-purpose processor
 Peripherals make requests to arbiter, arbiter makes requests to
resource
 Arbiter connected to system bus for configuration only
[Figure: microprocessor (Int, Inta), priority arbiter (Ireq1, Iack1, Ireq2, Iack2), Peripheral1 and Peripheral2 on the system bus; numbered arrows correspond to the arbitration steps]
Arbitration using a priority arbiter
1. Microprocessor is executing its program.
2. Peripheral1 needs servicing so asserts Ireq1. Peripheral2 also needs servicing so asserts Ireq2.
3. Priority arbiter sees at least one Ireq input asserted, so asserts Int.
4. Microprocessor stops executing its program and stores its state.
5. Microprocessor asserts Inta.
6. Priority arbiter asserts Iack1 to acknowledge Peripheral1.
7. Peripheral1 puts its interrupt address vector on the system bus.
8. Microprocessor jumps to the address of ISR read from data bus, ISR executes and returns
(and completes handshake with arbiter).
9. Microprocessor resumes executing its program.
Arbitration: Priority arbiter
 Types of priority
 Fixed priority
 each peripheral has unique rank
 highest rank chosen first with simultaneous requests
 preferred when clear difference in rank between peripherals
 Rotating priority (also called round-robin)
 priority changed based on history of servicing
 better distribution of servicing especially among peripherals with
similar priority demands
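The difference between the two priority schemes can be sketched as two grant functions. This is a minimal illustration under assumptions: four peripherals, request lines packed into one byte, and lower index meaning higher rank in the fixed scheme. The function names are hypothetical.

```c
#include <stdint.h>

#define NUM_PERIPHERALS 4

/* Fixed priority: bit i of `requests` is peripheral i's Ireq line and
 * a lower index means a higher rank (an assumption of this sketch).
 * Returns the index to acknowledge, or -1 if nothing is requesting. */
int fixed_priority_grant(uint8_t requests)
{
    for (int i = 0; i < NUM_PERIPHERALS; i++)
        if (requests & (1u << i))
            return i;
    return -1;
}

/* Rotating (round-robin) priority: the search starts just after the
 * peripheral serviced last, so that one now has the lowest priority.
 * *last_granted must be in the range 0..NUM_PERIPHERALS-1. */
int rotating_priority_grant(uint8_t requests, int *last_granted)
{
    for (int k = 1; k <= NUM_PERIPHERALS; k++) {
        int i = (*last_granted + k) % NUM_PERIPHERALS;
        if (requests & (1u << i)) {
            *last_granted = i;
            return i;
        }
    }
    return -1;
}
```

If peripherals 0 and 1 both request continuously, the fixed scheme grants peripheral 0 every time, while the rotating scheme alternates 0, 1, 0, and so on. This is the "better distribution of servicing" mentioned above.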
Arbitration: Daisy-chain
arbitration
 Arbitration done by peripherals
 Built into peripheral or external logic added
 req input and ack output added to each peripheral
 Peripherals connected to each other in daisy-chain manner
 One peripheral connected to resource, all others connected “upstream”
 Peripheral’s req flows “downstream” to resource, resource’s ack flows
“upstream” to requesting peripheral
 Closest peripheral has highest priority
[Figure: μP (Int, Inta) on the system bus with daisy-chain aware peripherals Peripheral1 and Peripheral2; each peripheral has Ack_in, Ack_out, Req_in and Req_out, and the Req_in of the last peripheral is tied to 0]

Arbitration: Daisy-chain arbitration
 Pros/cons
 Easy to add/remove peripheral - no system redesign needed
 Does not support rotating priority
 One broken peripheral can cause loss of access to other
peripherals
[Figure: side-by-side comparison of the priority arbiter configuration (microprocessor, priority arbiter with Ireq1/Iack1 and Ireq2/Iack2, Peripheral 1 and Peripheral 2) and the daisy-chain configuration (μP with Int/Inta and daisy-chain aware Peripheral1 and Peripheral2 linked by Ack_in/Ack_out and Req_in/Req_out)]
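How the acknowledge travels along the daisy chain can be sketched as a loop over the peripherals, starting with the one nearest the resource. The names `chain_request` and `chain_acknowledge`, and the `broken` parameter used to model a faulty stage, are assumptions for illustration.

```c
/* req[i] is nonzero if peripheral i is requesting; index 0 is the
 * peripheral wired directly to the resource. */

/* Req_out of the peripheral nearest the resource: the OR of every
 * request in the chain, since each stage passes requests on. */
int chain_request(const int req[], int n)
{
    for (int i = 0; i < n; i++)
        if (req[i])
            return 1;
    return 0;
}

/* The resource's ack enters at peripheral 0. A requesting peripheral
 * keeps the ack; a non-requesting one forwards it on Ack_out.
 * `broken` is the index of a stage that fails to forward anything
 * (-1 if all stages work). Returns the acknowledged index, or -1. */
int chain_acknowledge(const int req[], int n, int broken)
{
    for (int i = 0; i < n; i++) {
        if (i == broken)
            return -1;      /* ack is lost here */
        if (req[i])
            return i;       /* absorbs the ack; later stages never see it */
    }
    return -1;
}
```

With requests from peripherals 1 and 2, the ack stops at peripheral 1, which shows why the closest requester has the highest priority. If peripheral 0 is broken, neither of them is ever acknowledged.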
Network-oriented arbitration
 When multiple microprocessors share a bus

(sometimes called a network)
 Arbitration typically built into bus protocol
 Separate processors may try to write simultaneously
causing collisions
 Data must be resent
 Don’t want to start sending again at same time
 statistical methods can be used to reduce chances
 Typically used for connecting multiple distant chips
 Trend – use to connect multiple on-chip processors
Example: Vectored interrupt using an interrupt table
 Fixed priority: i.e., Peripheral1 has highest priority
 Keyword “_at_” followed by memory address forces compiler to place
variables in specific memory locations
 e.g., memory-mapped registers in arbiter, peripherals
 A peripheral’s index into interrupt table is sent to memory-mapped
register in arbiter
 Peripherals receive external data and raise interrupt
[Figure: Processor and MEMORY (holding the Jump Table) on the Memory Bus, together with the Priority Arbiter (registers MASK, IDX0, IDX1, ENABLE) and Peripheral 1 and Peripheral 2 (each with a DATA register)]

unsigned char ARBITER_MASK_REG _at_ 0xfff0;
unsigned char ARBITER_CH0_INDEX_REG _at_ 0xfff1;
unsigned char ARBITER_CH1_INDEX_REG _at_ 0xfff2;
unsigned char ARBITER_ENABLE_REG _at_ 0xfff3;
unsigned char PERIPHERAL1_DATA_REG _at_ 0xffe0;
unsigned char PERIPHERAL2_DATA_REG _at_ 0xffe1;
// table of ISR addresses ("unsigned void*" is not valid C, so plain void* is used)
void* INTERRUPT_LOOKUP_TABLE[256] _at_ 0x0100;

void Peripheral1_ISR(void) {
    unsigned char data;
    data = PERIPHERAL1_DATA_REG;
    // do something with the data
}

void Peripheral2_ISR(void) {
    unsigned char data;
    data = PERIPHERAL2_DATA_REG;
    // do something with the data
}

void InitializePeripherals(void) {
    ARBITER_MASK_REG = 0x03; // enable both channels
    ARBITER_CH0_INDEX_REG = 13;
    ARBITER_CH1_INDEX_REG = 17;
    INTERRUPT_LOOKUP_TABLE[13] = (void*)Peripheral1_ISR;
    INTERRUPT_LOOKUP_TABLE[17] = (void*)Peripheral2_ISR;
    ARBITER_ENABLE_REG = 1;
}

void main() {
    InitializePeripherals();
    for(;;) {} // main program goes here
}
Intel 8237 DMA controller
[Figure: Intel 8237 pin diagram. System side: D[7..0], A[19..0], ALE, MEMR, MEMW, IOR, IOW, HLDA, HRQ. Device side: REQ 0, ACK 0, REQ 1, ACK 1, REQ 2, ACK 2, REQ 3, ACK 3]
Signal        Description
D[7..0]       These wires are connected to the system bus (ISA) and are used by the
              microprocessor to write to the internal registers of the 8237.
A[19..0]      These wires are connected to the system bus (ISA) and are used by the DMA to
              issue the memory location where the transferred data is to be written to. The
              8237 is also addressed by the microprocessor through the lower bits of these
              address lines.
ALE*          This is the address latch enable signal. The 8237 uses this signal when driving
              the system bus (ISA).
MEMR*         This is the memory read signal issued by the 8237 when driving the system bus
              (ISA).
MEMW*         This is the memory write signal issued by the 8237 when driving the system bus
              (ISA).
IOR*          This is the I/O device read signal issued by the 8237 when driving the system bus
              (ISA) in order to read a byte from an I/O device.
IOW*          This is the I/O device write signal issued by the 8237 when driving the system bus
              (ISA) in order to write a byte to an I/O device.
HLDA          This signal (hold acknowledge) is asserted by the microprocessor to signal that it
              has relinquished the system bus (ISA).
HRQ           This signal (hold request) is asserted by the 8237 to signal to the microprocessor
              a request to relinquish the system bus (ISA).
REQ 0,1,2,3   A device attached to one of these channels asserts this signal to request a DMA
              transfer.
ACK 0,1,2,3   The 8237 asserts this signal to grant a DMA transfer to a device attached to one
              of these channels.
*See the ISA bus description in this chapter for complete details.
Intel 8259 programmable priority controller
[Figure: Intel 8259 pin diagram. System side: D[7..0], A[0..0], RD, WR, INT, INTA, CAS[2..0], SP/EN. Device side: IR0 to IR7]
Signal        Description
D[7..0]       These wires are connected to the system bus and are used by the microprocessor
              to write or read the internal registers of the 8259.
A[0..0]       This pin acts in conjunction with the WR/RD signals. It is used by the 8259 to
              decipher various command words the microprocessor writes and status the
              microprocessor wishes to read.
WR            When this write signal is asserted, the 8259 accepts the command on the data
              lines, i.e., the microprocessor writes to the 8259 by placing a command on the
              data lines and asserting this signal.
RD            When this read signal is asserted, the 8259 provides its status on the data lines,
              i.e., the microprocessor reads the status of the 8259 by asserting this signal and
              reading the data lines.
INT           This signal is asserted whenever a valid interrupt request is received by the
              8259, i.e., it is used to interrupt the microprocessor.
INTA          This signal is used to enable 8259 interrupt-vector data onto the data bus by a
              sequence of interrupt acknowledge pulses issued by the microprocessor.
IR 0..7       An interrupt request is executed by a peripheral device when one of these signals
              is asserted.
CAS[2..0]     These are cascade signals to enable multiple 8259 chips to be chained together.
SP/EN         This function is used in conjunction with the CAS signals for cascading purposes.
Multilevel bus architectures
• Don’t want one bus for all communication
– Peripherals would need high-speed, processor-specific bus interface
• excess gates, power consumption, and cost; less portable
– Too many peripherals slows down bus
• Processor-local bus
– High speed, wide, most frequent communication
– Connects microprocessor, cache, memory controllers, etc.
• Peripheral bus
– Lower speed, narrower, less frequent communication
– Typically industry standard bus (ISA, PCI) for portability
• Bridge
– Single-purpose processor converts communication between busses
[Figure: microprocessor, cache, memory controller and DMA controller on the processor-local bus; a bridge connects it to the peripheral bus, which carries three peripherals]
Advanced communication principles
 Layering
 Break complexity of communication protocol into pieces easier to
design and understand
 Lower levels provide services to higher level
 Lower level might work with bits while higher level might work with packets
of data
 Physical layer
 Lowest level in hierarchy
 Medium to carry data from one actor (device or node) to another
 Parallel communication
 Physical layer capable of transporting multiple bits of data
 Serial communication
 Physical layer transports one bit of data at a time
 Wireless communication
 No physical connection needed for transport at physical layer
Parallel communication
 Multiple data, control, and possibly power wires
 One bit per wire
 High data throughput with short distances
 Typically used when connecting devices on same

IC or same circuit board
 Bus must be kept short
 long parallel wires result in high capacitance values which
requires more time to charge/discharge
 Data misalignment between wires increases as length increases
 Higher cost, bulky

Serial communication
 Single data wire, possibly also control and power wires
 Words transmitted one bit at a time
 Higher data throughput with long distances

 Less average capacitance, so more bits per unit of time
 Cheaper, less bulky
 More complex interfacing logic and communication

protocol
 Sender needs to decompose word into bits
 Receiver needs to recompose bits into word
 Control signals often sent on same wire as data increasing
protocol complexity
Wireless communication
 Infrared (IR)
 Electromagnetic wave frequencies just below visible light spectrum
 Diode emits infrared light to generate signal
 Infrared transistor detects signal, conducts when exposed to
infrared light
 Cheap to build
 Need line of sight, limited range
 Radio frequency (RF)

 Electromagnetic wave frequencies in radio spectrum
 Analog circuitry and antenna needed on both sides of
transmission
 Line of sight not needed, transmitter power determines range
Error detection and correction
 Often part of bus protocol
 Error detection: ability of receiver to detect errors during transmission
 Error correction: ability of receiver and transmitter to cooperate to correct

problem
 Typically done by acknowledgement/retransmission protocol
 Bit error: single bit is inverted
 Burst of bit error: consecutive bits received incorrectly
 Parity: extra bit sent with word used for error detection
 Odd parity: data word plus parity bit contains odd number of 1’s
 Even parity: data word plus parity bit contains even number of 1’s
 Always detects single bit errors, but not all burst bit errors
 Checksum: extra word sent with data packet of multiple words

 e.g., extra word contains XOR sum of all data words in packet
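Parity and the XOR checksum are short enough to write out in C. This is a sketch assuming 8-bit data words; the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Even parity: returns the bit that makes the total number of 1s in
 * (data word + parity bit) even. */
uint8_t even_parity_bit(uint8_t word)
{
    uint8_t ones = 0;
    for (int i = 0; i < 8; i++)
        ones += (word >> i) & 1u;
    return ones & 1u;
}

/* Receiver check: 1 if word plus parity bit still holds an even count
 * of 1s, 0 if an error is detected. */
int even_parity_ok(uint8_t word, uint8_t parity)
{
    return even_parity_bit(word) == (parity & 1u);
}

/* Checksum word: XOR of all data words in the packet. */
uint8_t xor_checksum(const uint8_t *packet, size_t len)
{
    uint8_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum ^= packet[i];
    return sum;
}
```

For the word 0x07 (three 1s) the even-parity bit is 1. Flipping one bit of the word makes the check fail, but flipping two bits leaves the count even and the check passes, which is why parity does not catch all burst errors. For the packet 0x12, 0x34, 0x56 the XOR checksum is 0x70.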
Serial protocols: I2C
 I2C (Inter-IC)
 Two-wire serial bus protocol developed by
Philips Semiconductors in the early 1980s
 Enables peripheral ICs to communicate using
simple communication hardware
 Data transfer rates up to 100 kbits/s and 7-bit
addressing possible in normal mode
 3.4 Mbits/s and 10-bit addressing in high-speed mode
 Common devices capable of interfacing to I2C
bus:
 EEPROMs, Flash, and some RAM memory, real-time
clocks, watchdog timers, and microcontrollers
I2C bus structure
[Figure: a micro-controller (master) and three servants, EEPROM (Addr=0x01), temperature sensor (Addr=0x02) and LCD controller (Addr=0x03), share the Serial Clock Line (SCL) and Serial Data Line (SDA); bus capacitance < 400 pF. SDA/SCL waveforms show the start condition, sending 0, sending 1, and the stop condition. Typical read/write cycle: START, address bits A6..A0, R/W, ACK (from servant), data bits D8..D0, ACK (from receiver), STOP]
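The first byte of the read/write cycle, seven address bits followed by the R/W bit, can be assembled as shown below. The helper names are hypothetical; the bit layout (address in bits 7..1, R/W in bit 0 with 1 meaning read, most significant bit sent first) follows the I2C convention for 7-bit addressing.

```c
#include <stdint.h>

#define I2C_WRITE 0u
#define I2C_READ  1u

/* First byte after the start condition: 7-bit servant address in bits
 * 7..1 and the R/W bit in bit 0. */
uint8_t i2c_address_byte(uint8_t addr7, unsigned rw)
{
    return (uint8_t)(((addr7 & 0x7Fu) << 1) | (rw & 1u));
}

/* Bit placed on SDA at position n of a byte, where n = 0 is sent
 * first because the most significant bit goes out first. */
unsigned i2c_bit(uint8_t byte, unsigned n)
{
    return (byte >> (7u - n)) & 1u;
}
```

Reading from the temperature sensor at address 0x02 in the figure gives the address byte 0x05, and writing to the LCD controller at 0x03 gives 0x06.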
Serial protocols: CAN
 CAN (Controller area network)
 Protocol for real-time applications
 Developed by Robert Bosch GmbH
 Originally for communication among components of cars
 Applications now using CAN include:
elevator controllers, copiers, telescopes, production-line control
systems, and medical instruments
 Data transfer rates up to 1 Mbit/s and 11-bit addressing
 Common devices interfacing with CAN:
 8051-compatible 8592 processor and standalone CAN controllers
 Actual physical design of CAN bus not specified in protocol
 Requires devices to transmit/detect dominant and recessive signals to/from
bus
 e.g., ‘1’ = dominant, ‘0’ = recessive if single data wire used
 Bus guarantees dominant signal prevails over recessive signal if asserted
simultaneously
Serial protocols: FireWire
 FireWire (I-Link, or Lynx, IEEE 1394)
 High-performance serial bus developed by Apple Computer Inc.
 Designed for interfacing independent electronic components
 e.g., Desktop, scanner
 Data transfer rates from 12.5 to 400 Mbits/s, 64-bit addressing
 Plug-and-play capabilities
 Packet-based layered design structure
 Applications using FireWire include:
 disk drives, printers, scanners, cameras
 Capable of supporting a LAN similar to Ethernet
 64-bit address:
 10 bits for network ids, 1023 subnetworks
 6 bits for node ids, each subnetwork can have 63 nodes
 48 bits for memory address, each node can have 281 terabytes of distinct
locations
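The 10 + 6 + 48 bit split of the 64-bit address can be checked with a pack/unpack pair. This is a sketch: `fw_address_t`, `fw_pack`, and `fw_unpack` are hypothetical names, and placing the network id in the most significant bits is an assumption about field order.

```c
#include <stdint.h>

typedef struct {
    uint16_t network_id;   /* 10 bits */
    uint8_t  node_id;      /*  6 bits */
    uint64_t offset;       /* 48 bits of memory address within the node */
} fw_address_t;

/* Combine the three fields into one 64-bit address. */
uint64_t fw_pack(fw_address_t a)
{
    return ((uint64_t)(a.network_id & 0x3FFu) << 54) |
           ((uint64_t)(a.node_id & 0x3Fu) << 48) |
           (a.offset & 0xFFFFFFFFFFFFull);
}

/* Split a 64-bit address back into its fields. */
fw_address_t fw_unpack(uint64_t raw)
{
    fw_address_t a;
    a.network_id = (uint16_t)(raw >> 54);
    a.node_id    = (uint8_t)((raw >> 48) & 0x3Fu);
    a.offset     = raw & 0xFFFFFFFFFFFFull;
    return a;
}
```

The 48-bit offset field spans 2^48 = 281,474,976,710,656 locations, which is where the figure of 281 terabytes per node comes from.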
Serial protocols: USB
 USB (Universal Serial Bus)
 Easier connection between PC and monitors, printers, digital speakers,
modems, scanners, digital cameras, joysticks, multimedia game
equipment
 2 data rates:
 12 Mbps for increased bandwidth devices
 1.5 Mbps for lower-speed devices (joysticks, game pads)
 Tiered (layered) star topology can be used
 One USB device (hub) connected to PC
 hub can be embedded in devices like monitor, printer, or keyboard or can be
standalone
 Multiple USB devices can be connected to hub
 Up to 127 devices can be connected like this
 USB host controller
 Manages and controls bandwidth and driver software required by each
peripheral
 Dynamically allocates power downstream according to devices
connected/disconnected
Parallel protocols: PCI Bus
 PCI Bus (Peripheral Component Interconnect)
 High performance bus originated at Intel in the early
1990’s
 Standard adopted by industry and administered by
PCISIG (PCI Special Interest Group)
 Interconnects chips, expansion boards, processor
memory subsystems
 Data transfer rates of 127.2 to 508.6 Mbytes/s and 32-bit
addressing
 Later extended to 64-bit while maintaining compatibility with
32-bit schemes
 Synchronous bus architecture
 Multiplexed data/address lines
Parallel protocols: ARM Bus
 ARM Bus
 Designed and used internally by ARM
Corporation
 Interfaces with ARM line of processors
 Many IC design companies have own bus
protocol
 Data transfer rate is a function of clock speed
 If clock speed of bus is X, transfer rate = 16 x X bits/s
 32-bit addressing
Wireless protocols: IrDA
 IrDA
 Protocol suite that supports short-range point-to-
point infrared data transmission
 Created and promoted by the Infrared Data
Association (IrDA)
 Data transfer rates between 9.6 kbps and 4 Mbps
 IrDA hardware deployed in notebook computers,
printers, PDAs, digital cameras, public phones, cell
phones
 Lack of suitable drivers has slowed use by
applications
 Windows 2000/98 now include support
 Becoming available on popular embedded OS’s
Wireless protocols: Bluetooth
 Bluetooth
 New, global standard for wireless
connectivity
 Based on low-cost, short-range radio
link
 Connection established when within 10
meters of each other
 No line-of-sight required
 e.g., Connect to printer in another room
Wireless Protocols: IEEE 802.11
 IEEE 802.11
 Proposed standard for wireless LANs
 Specifies parameters for PHY and MAC layers of
network
 PHY layer
 physical layer
 handles transmission of data between nodes
 provisions for data transfer rates of 1 or 2 Mbps
 operates in 2.4 to 2.4835 GHz frequency band (RF)
 or 300 to 428,000 GHz (IR)
 MAC layer
 medium access control layer
 protocol responsible for maintaining order in shared medium
 collision avoidance/detection
Embedded Systems
Chapter – 6
Real-Time Operating System
8/11/2015 1
6. Real-Time Operating System [8 Hrs.]
6.1 Operating System Basics

6.2 Task, Process, and Threads
6.3 Multiprocessing and Multitasking
6.4 Task Scheduling
6.5 Task Synchronization
6.6 Device Drivers
How is the increasing need for time-critical
response to tasks/events addressed in
embedded applications?
- Assign priorities to tasks and execute the highest-priority
task that is ready to execute.
- Dynamically change the priorities of tasks
when needed.
- Schedule the execution of tasks based on their
priorities.
- Switch execution to another task when the running task is waiting
for an external event or a system resource,
including an I/O device operation.
Operating System Basics
- acts as a bridge between the user application/
tasks & the underlying system resources through
a set of system functionalities and services.
- Manages the system resources and makes them
available to the user application/task on a need
basis.
- Primary functions are:
- Make the system convenient to use
- Organize & manage the system resources
efficiently and correctly.
Fire alarm system: an example
Central server
TCP/IP over radio
Controllers: ARM based
Low bandwidth radio links
Sensors: microcontroller based
Fire Alarm System
• Problem
– Hundreds of sensors, each fitted with Low Range Wireless
• Sensor information to be logged in a server & appropriate action
initiated
• Possible Solution
– Collaborative Action
• Routing
– Dynamic – Sensors/controllers may go down
– Auto Configurable – No/easy human intervention.
– Fewer collisions / less link clogging
– Fewer intermediate nodes
» Fast Response Time
– Secure
RTOS: Target Architectures
Processors MIPS
Microcontrollers ~20
ARM7 100-133
ARM9 180-250
Strong ARM 206
Intel Xscale 400
Mips4Kcore 400
X86
Operating System Basics contd…
The Kernel is:

- core of operating system
- responsible for managing the system resources
and the communication among the hardware
and other system services.
- acts as the abstraction layer between system
resources and user applications.
- contains a set of system libraries and services.
Process Management:
• deals with managing the processes/tasks.
• Includes setting up the memory space for the process
• Loading the process’s code into the memory space
• Allocating system resources
• Scheduling and managing the execution of the process
• Setting up and managing the process control Block
(PCB)
• Inter process communication and synchronization
• Process termination/deletion
Primary Memory Management:

• Refers to the volatile memory (RAM) where processes
are loaded and variables and shared data associated
with each process are stored.
• Memory Management Unit (MMU) of the kernel is
responsible for
• Keeping track of which part of the memory area is
currently used by which process
• Allocating and de-allocating memory space on a
need basis (Dynamic Memory Allocation)
File System Management: responsible for

• The creation, deletion and alteration of files.
• Creation, deletion and alteration of directories
• Saving of files in the secondary storage
memory
• Providing automatic allocation of file space
based on the amount of free space available
• Providing a flexible naming convention for the
files.
Operating System Basics contd…
I/O System (Device)Management

• loading and unloading of device drivers
• exchanging information and the system
specific control signals to and from the device
Secondary storage management
•Disk storage allocation
•Disk scheduling (time interval at which the
disk is activated to backup data)
•Free disk space management
Operating System Basics contd…
 Protection systems (deal with
implementing the security policies that
restrict access to both user and system
resources by different applications,
processes or users)
 Interrupt Handler (the kernel provides a handler
mechanism for all external/internal
interrupts generated by the system)
Operating System Types contd…
General Purpose Operating System (GPOS)

Real - Time Operating System (RTOS)
• Implies deterministic timing behavior
• Means that the OS services consume only known and
expected amounts of time, regardless of the number of
services.
• Implements policies and rules concerning time-critical
allocation of a system’s resources
• Decides in which order applications should run and how much
time needs to be allocated to each application.
– A more complex software architecture is needed to handle multiple tasks,
coordination, communication, and interrupt handling – an RTOS
architecture
– Distinction:
• Desktop OS – OS is in control at all times and runs applications, OS runs
in different address space
• RTOS – OS and embedded software are integrated, ES starts and
activates the OS – both run in the same address space (RTOS is less
protected)
• RTOS includes only service routines needed by the ES application
• RTOS vendors/products: VxWorks, VRTX, Nucleus, LynxOS, uC/OS
• Most conform to POSIX (IEEE standard for OS interfaces)
• Desirable RTOS properties: use less memory, application programming
interface, debugging tools, support for variety of microprocessors,
already-debugged network drivers
Hard and Soft Real Time Systems
• Hard Real Time System
– Failure to meet deadlines is fatal
– example : Flight Control System
• Soft Real Time System

– Late completion of jobs is undesirable but not fatal.
– System performance degrades as more & more jobs miss
deadlines
– Example: online databases
• Qualitative Definition.
Hard and Soft Real Time Systems
(Operational Definition)
• Hard Real Time System
– Validation by provably correct procedures or extensive
simulation that the system always meets the timing
constraints
• Soft Real Time System

– Demonstration of jobs meeting some statistical
constraints suffices.
• Example – Multimedia System

– 25 frames per second on an average
Operating System Types contd…
 The Real-Time Kernel is highly specialized
and contains only the minimal set of
services required for running the user
applications/tasks. Its basic functions are:
• Task/Process management
• Task/Process scheduling
• Task/Process synchronization
• Error/Exception handling
• Memory management
• Interrupt handling
• Time management
Tasks & Task State
Tasks are very simple to write: under most RTOSs a task
is simply a subroutine. Each task is always in one of three states:
1. Running – the microprocessor is executing the instructions that
make up this task. Unless the system has multiple processors, there is only
one microprocessor, and hence only one task
in the running state at any given time.
2. Ready – some other task is in the running state, but this
task has things that it could do if the microprocessor becomes
available. Any number of tasks can be in this state.
3. Blocked – this task hasn't got anything to do right now, even
if the microprocessor becomes available. Tasks get into this state
because they are waiting for some external event. For example, a
task that handles data coming in from a network will have nothing
to do when there is no data. A task that responds to the user
when he presses a button has nothing to do until the user presses
the button. Any number of tasks can be in this state as well.
• The ES application calls RTOS functions to start tasks,
passing the OS each task's start address, stack pointer, etc.
• Task States:
– Running
– Ready (possibly: suspended, pended)
– Blocked (possibly: waiting, dormant, delayed)
– [Exit]
– Scheduler – schedules/shuffles tasks between the Running and Ready
states
– Blocking is self-blocking by tasks; a blocked task is moved to the Ready state by signaling
from other tasks or interrupts (when the block-factor is removed/satisfied)
– When a task with a higher priority than the ‘running’ task is unblocked,
the scheduler ‘switches’ context immediately (for all pre-emptive
RTOSs)
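The legal moves between task states can be written down as a small state machine. This is a minimal sketch assuming the three-state model (Running, Ready, Blocked) in which an unblocked task first becomes Ready and only the scheduler moves tasks between Ready and Running; the names `task_state`, `task_event` and `task_transition` are hypothetical and do not belong to any particular RTOS.

```c
typedef enum { TASK_RUNNING, TASK_READY, TASK_BLOCKED } task_state;

typedef enum {
    EV_DISPATCH,   /* scheduler picks this task to run            */
    EV_PREEMPT,    /* scheduler switches to a higher-priority task */
    EV_BLOCK,      /* the task itself waits for an event           */
    EV_UNBLOCK     /* another task or an interrupt signals it      */
} task_event;

/* Applies an event to a task state. Returns 1 and updates *s if the
   transition is legal; returns 0 and leaves *s unchanged otherwise. */
int task_transition(task_state *s, task_event ev)
{
    switch (ev) {
    case EV_DISPATCH:
        if (*s == TASK_READY)   { *s = TASK_RUNNING; return 1; }
        break;
    case EV_PREEMPT:
        if (*s == TASK_RUNNING) { *s = TASK_READY;   return 1; }
        break;
    case EV_BLOCK:
        if (*s == TASK_RUNNING) { *s = TASK_BLOCKED; return 1; }
        break;
    case EV_UNBLOCK:
        if (*s == TASK_BLOCKED) { *s = TASK_READY;   return 1; }
        break;
    }
    return 0;
}
```

Note that a Blocked task cannot be dispatched directly and a Ready task cannot block, which matches the statement that blocking is always self-blocking by the running task.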
Tasks
[Figure: Task States – Blocked, Ready and Running, with the transitions between them]
Tasks
Some common questions about the scheduler and task states:
 How does the scheduler know when a task has
become blocked or unblocked?
 What happens if all the tasks are blocked?
 What if two tasks with the same priority are
ready?
• Tasks – 1
– Issue – Scheduler/Task signal exchange for block-unblock of

tasks via function calls
– Issue – All tasks are blocked and scheduler idles forever (not
desirable!)
– Issue – Two or more tasks with same priority levels in Ready
state (time-slice, FIFO)
– Example: scheduler switches from processor-hog vLevelsTask

to vButtonTask (on user interruption by pressing a push-
button), controlled by the main() which initializes the RTOS,
sets priority levels, and starts the RTOS
Tasks
[Figure: Microprocessor Responds to a Button under an RTOS]
[Figure: RTOS Initialization Code]
• Tasks and Data
– Each task has its own context - not shared; private
registers, stack, etc.
– In addition, several tasks share common data (via
global data declaration; use of ‘extern’ in one task to
point to another task that declares the shared data)
– Shared data causes the ‘shared-data problem’
unless it is protected or the functions that access it
are characterized as ‘reentrant’
– (See Fig 6.5, Fig 6.6, Fig 6.7, and Fig 6.8)
Tank Monitoring System
Tasks in the Underground Tank System
Tank Monitoring Design
• Tasks – 2
• Reentrancy – A reentrant function works correctly regardless
of the number of tasks that call it between interrupts
• Characteristics of reentrant functions –
– Accesses shared variables only in an atomic way, or only when the
variable is on the callee’s stack
– A reentrant function calls only reentrant functions
– A reentrant function uses system hardware (shared resource)
atomically
• Inspecting code to determine Reentrancy:
– See Fig 6.9 – Where are data stored in C? Shared, non-shared,
or stacked?
– See Fig 6.10 – Is it reentrant? What about variable fError? Is
printf reentrant?
– If shared variables are not protected, could they be accessed
using single assembly instructions (guaranteeing
atomicity)?
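The difference between a non-reentrant and a reentrant function can be shown with a short sketch. The two functions below are hypothetical illustrations (they are not the figures referenced above): the first keeps its result in a static buffer that every caller shares, the second keeps all state in its arguments and locals.

```c
#include <stdio.h>
#include <string.h>

/* Non-reentrant: every caller shares the same static buffer, so a second
   call (from another task or an interrupt) overwrites the first result. */
const char *label_shared(int tank)
{
    static char buf[16];
    snprintf(buf, sizeof buf, "tank-%d", tank);
    return buf;
}

/* Reentrant: all state lives in the arguments and on the caller's stack,
   so concurrent calls cannot disturb each other. */
void label_reentrant(int tank, char *out, size_t out_size)
{
    snprintf(out, out_size, "tank-%d", tank);
}
```

Two calls to `label_shared` return the same pointer, so the first caller ends up reading the second caller's text; with `label_reentrant` each caller owns its own buffer.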
• Semaphores and Shared Data – A new tool for atomicity
– Semaphore – a system resource for implementing mutual

exclusion in shared resource access or restricting the access to
the shared resources (to avoid shared-data problems in RTOS)
– Protection at the start is via primitive function, called take,
indexed by the semaphore
– Protection at the end is via a primitive function, called release,
also indexed similarly
– Simple semaphores – Binary semaphores are often adequate

for shared data problems in RTOS
• Semaphores and Shared Data – 1
– RTOS Semaphores & Initializing Semaphores
– Using binary semaphores to solve the ‘tank monitoring’ problem

– (See Fig 6.12 and Fig 6.13)
– The nuclear reactor system: The issue of initializing the semaphore

variable in a dedicated task (not in a ‘competing’ task) before initializing
the OS – timing of tasks and priority overrides, which can undermine the
effect of the semaphores
– Solution: Call OSSemInit() before OSInit()
– (See Fig 6.14)

– Reentrancy, Semaphores, Multiple Semaphores, Device Signaling,
– Fig 6.15 – a reentrant function, protecting a shared data, cErrors, in

critical section
– Each shared data (resource/device) requires a separate semaphore for

individual protection, allowing multiple tasks and data/resources/devices
to be shared exclusively, while allowing efficient implementation and
response time
– Fig 6.16 – example of a printer device signaled by a report-buffering task,
via semaphore signaling, on each print of lines constituting the formatted
and buffered report
– Semaphore Problems – ‘Messing up’ with semaphores

• The initial values of semaphores – when not set properly or at the
wrong place
• The ‘symmetry’ of takes and releases – must match or correspond –
each ‘take’ must have a corresponding ‘release’ somewhere in the ES
application
• ‘Taking’ the wrong semaphore unintentionally (issue with multiple
semaphores)
• Holding a semaphore for too long can cause ‘waiting’ tasks’ deadline to
be missed
• Priorities could be ‘inverted’ and usually solved by ‘priority
inheritance/promotion’
• (See Fig 6.17)
• Causing the deadly embrace problem (cycles)
• (See Fig 6.18)
– Variants:
• Binary semaphores – single resource, one-at-a time, alternating in use
(also for resources)
• Counting semaphores – multiple instances of resources,
increase/decrease of integer semaphore variable
• Mutex – protects data shared while dealing with priority inversion
problem
– Summary – Protecting shared data in RTOS

• Disabling/Enabling interrupts (for task code and interrupt routines),
faster
• Taking/Releasing semaphores (can’t use them in interrupt routines),
slower, affecting response times of those tasks that need the
semaphore
• Disabling task switches (no effect on interrupt routines), holds all other
tasks’ response
Process:
- is a program, or part of it, in execution.
- an instance of a program in execution; multiple
instances of the same program can execute
simultaneously.
- Requires various system resources like CPU for
executing the process, memory for storing the
code corresponding to the process and
associated variables, I/O devices for information
exchange.
- is sequential in execution.
Process Structure:
Process
• Stack (Stack Pointer)
• Working registers
• Status registers
• Program Counter (PC)
• Code memory corresponding to the Process
 Process Life Cycle – a process changes its state from newly
created to execution completed
 Created state – the state in which a process is being created. The OS
recognizes the process but no resources are allocated to it.
 Ready State – the state where a process is loaded into the memory and
is awaiting processor time for execution.
 Ready List – the queue of ready processes maintained by the OS.
 Running State – the state in which the source code instructions
corresponding to the process are being executed.
 Blocked State/Wait state – refers to a state where a running process is
temporarily suspended from execution and does not have immediate access to
resources.
 Completed State – a state where the process completes its execution
 State transition – the transition of a process from one state to
another
 Process Management – deals with the creation of a process,
setting up the memory space for the process, loading the
process’s code into the memory space, allocating system
resources, setting up a Process Control Block (PCB) for the
process and process termination / deletion.
[Figure: Process states and state transition representation]
Process Management
• Deals with the creation of a process
• Setting up the memory space for the
process
• Loading the process’s code into the
memory space
• Allocating system resources
• Setting up a Process Control Block (PCB)
for the process
• Process termination / deletion
Threads:
• Is the primitive that can execute code
• Is a single sequential flow of control within a process
• Also known as light weight process
• A process can have many threads of execution
[Figure: Memory organization of a process and its associated threads – stack memory
for the process (containing stack memory for thread 1, thread 2, …), data memory
for the process, code memory for the process]
Threads: contd. …
•Different threads, which are part of a

process, share the same address space;
meaning they share the data memory,
code memory and the heap memory area.
•Threads maintain their own thread status

(CPU register values), Program Counter
(PC) and stack.
Multithreading
• An application may be complex and lengthy,
containing various sub-operations such as:
• Getting input from I/O devices
connected to the processor
• Performing some
internal calculations /
operations
• Updating some I/O
devices
Multithreading ……
If all the sub-functions of a task are executed in sequence,
the CPU utilization may not be efficient
Advantages of multiple threads of execution:
• Better memory utilization (threads of the same process share the
same address space, which also reduces the complexity
of inter-thread communication)
• Speeds up execution of the process (the process is split into different
threads; when one thread enters a wait state, the CPU
can be utilized by the other threads of the process that do
not depend on the event the waiting thread is waiting
for)
• Efficient CPU utilization: the CPU is engaged all the time.
Thread Standards: deals with different standards available
for thread creation and management; utilized by OS
Thread Class libraries are:
• POSIX Threads (Portable Operating System Interface)
• Win 32 Threads
• Java Threads
• POSIX Threads (Portable Operating System Interface)
 POSIX.4 standard deals with the Real-Time extensions
 POSIX.4a standard deals with thread extensions
 “Pthreads” library defines the set of POSIX thread creation
and management functions in C language
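Thread creation and joining with the Pthreads library can be sketched as follows. The calls `pthread_create` and `pthread_join` are standard POSIX functions; the worker function, the `worker_arg` structure and `run_two_threads` are hypothetical names used only for this illustration. On many toolchains the program must be built with the `-pthread` option.

```c
#include <pthread.h>

typedef struct {
    int  id;    /* input: thread number            */
    long sum;   /* output: result computed by thread */
} worker_arg;

/* Thread entry point: each thread works only on its own argument block. */
static void *worker(void *p)
{
    worker_arg *a = p;
    a->sum = 0;
    for (int i = 1; i <= 100; i++)
        a->sum += (long)i * a->id;
    return NULL;
}

/* Starts two threads, waits for both, and reports their results.
   Returns 0 on success and -1 if a thread could not be created. */
int run_two_threads(long *sum1, long *sum2)
{
    pthread_t t1, t2;
    worker_arg a1 = { 1, 0 }, a2 = { 2, 0 };

    if (pthread_create(&t1, NULL, worker, &a1) != 0)
        return -1;
    if (pthread_create(&t2, NULL, worker, &a2) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_join(t1, NULL);   /* wait for completion before reading results */
    pthread_join(t2, NULL);
    *sum1 = a1.sum;
    *sum2 = a2.sum;
    return 0;
}
```

Both threads run the same code in the same address space, but each keeps its working data in a separate argument block, so no synchronization is needed apart from the joins.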
Win 32 Threads:
• are the threads supported by various flavors of windows
OS.
• Win 32 Application Programming Interface (Win 32 API)
libraries provide the standard set of Win 32 thread
creation and management functions.
• Win 32 threads are created with the API
HANDLE CreateThread (LPSECURITY_ATTRIBUTES
lpThreadAttributes, DWORD dwStackSize,
LPTHREAD_START_ROUTINE lpStartAddress, LPVOID
lpParameter, DWORD dwCreationFlags, LPDWORD
lpThreadId ) ;
Thread vs. Process
Thread:
• Is a single unit of execution and is part of a process
• Does not have its own data memory and heap memory. Shares these memory areas with
other threads of the same process
• Cannot live independently; it lives within the process
• There can be multiple threads in a process; the first thread (main thread) calls the
main function and occupies the start of the stack memory of the process
• Is very inexpensive to create
• Context switching is inexpensive and fast
• If a thread expires, its stack is reclaimed by the process
Process:
• Is a program in execution and contains one or more threads
• Has its own code memory, data memory and stack memory
• Contains at least one thread
• Threads within a process share the code, data and heap memory. Each thread holds a
separate memory area for its stack (shares the total stack memory of the process)
• Is very expensive to create; involves much OS overhead
• Context switching is complex, involves a lot of OS overhead and is comparatively slower
• If a process dies, the resources allocated to it are reclaimed by the OS and all the
associated threads of the process also die
Multiprocessing & Multitasking
[Figure: Context switching]
Real-Time Kernels
• A process is an abstraction of a running
program and is the logical unit of work
scheduled by OS
• Threads are light-weighted processes sharing

resources of the parent process
• RTOS task management functions: scheduling,

dispatching, intercommunication and
synchronization
• The kernel of the OS is the smallest portion
that provides for task management functions
• A scheduler determines which task will run

next
• A dispatcher performs the necessary bookkeeping
to start the next task
• Intertask communication and synchronization
ensure that the tasks cooperate
Pseudo-kernels
•Polled Loop
for (;;) { /* do forever */
    if (packet_here) { /* check flag */
        process_data(); /* process data */
        packet_here = 0; /* reset flag */
    }
}
•Synchronized polled loop
for (;;) { /* loop forever */
    if (flag) {
        pause(20); /* wait 20 ms to avoid switch-bounce */
        process_event();
        flag = 0;
    }
}
Cyclic Executives
for (;;) { /* do forever in round-robin fashion */
    Process1();
    Process2();
    /* ... */
    ProcessN();
}
Different rates example:
for (;;) { /* do forever in round-robin fashion */
    Process1();
    Process2();
    Process3(); /* process 3 takes 2 of the 4 slots, so it runs twice as often as the others */
    Process3();
}
State-Driven Code
It uses if-then, case statements or finite state automata to break up
processing of functions into code segments
for (;;) { /* dining philosophers */
    switch (state) {
    case Think: pause(random()); state = Wait; break;
    case Wait: if (forks_available()) state = Eat; break;
    case Eat: pause(random()); return_forks(); state = Think; break;
    }
}
[Figure: state diagram with states Think, Wait and Eat and transitions labeled
Take forks, Wait forks and Return forks]
Coroutines
void process_i() { // code of the i-th process
    switch (state_i) { // state_i is the state variable of the i-th process
    case 1: phase1_i(); break;
    case 2: phase2_i(); break;
    /* ... */
    case N: phaseN_i(); break;
    }
}
void dispatcher() {
    for (;;) { /* do forever */
        process_1();
        /* ... */
        process_M();
    }
}
Interrupt-Driven Systems
Interrupt Service Routine (ISR) takes action in response to the interrupt
Reentrant code can be used by multiple processes. Reentrant ISR can
serve multiple interrupts. Access to critical resources in mutually
exclusive mode is obtained by disabling interrupts
On context switching save/restore:
•General registers
•PC, PSW
•Coprocessor registers
•Memory page register
•Images of memory-mapped I/O locations
The stack model is used mostly in embedded systems
Pseudocode for Interrupt Driven System
main() { // initialize system, load interrupt handlers
    init();
    while (TRUE); // infinite loop
}
intr_handler_i() { // i-th interrupt handler
    save_context(); // save registers to the stack
    task_i(); // launch i-th task
    restore_context(); // restore context from the stack
}
Work with a stack:
Push x: SP-=2; *SP=x;
Pop x: x=*SP; SP+=2;
Preemptive Priority System
A higher-priority task is said to preempt a lower-priority task if it interrupts the
lower-priority task
The priorities assigned to each interrupt are based on the urgency of the task associated
with the interrupt
Prioritized interrupts can be either fixed priority or dynamic priority
Low-priority tasks can face starvation when the resources they need stay occupied by
high-priority tasks
In rate-monotonic systems, tasks with a higher frequency (rate) have higher priority
Hybrid systems
Foreground-background systems (FBS) – a polling loop is used for some jobs (background tasks –
self-testing, watchdog timers, etc.)
Foreground tasks run in round-robin, preemptive priority or hybrid mode
FBS can be extended to a full-featured real-time OS
The Task Control Model of Real-Time Operating System
Each task is associated with a structure called Task Control Block
(TCB). TCB keeps process’ context: PSW, PC, registers, id, status, etc
TCBs may be stored as a linked list
A task typically can be in one of the four following states:
1) Executing; 2) Ready; 3) Suspended (blocked); 4) Dormant (sleeping)
[Figure: task state diagram – Executing, Ready, Suspended, Dormant]
RTOS maintains a list of the ready tasks’ TCBs and another list for the suspended tasks
When a resource becomes available to a suspended task, it is activated
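A TCB and the ready/suspended lists can be sketched in C as below. This is an illustrative layout only: the field names, the register count of 8 and the functions `list_push`, `list_remove` and `task_resume` are assumptions made for the example, not the data structures of any specific RTOS.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { TASK_EXECUTING, TASK_READY, TASK_SUSPENDED, TASK_DORMANT } task_state;

/* Task Control Block: holds the saved context of one task. */
typedef struct tcb {
    int         id;
    int         priority;
    task_state  state;
    uintptr_t   pc;        /* saved program counter       */
    uintptr_t   psw;       /* saved processor status word */
    uintptr_t   regs[8];   /* saved general registers     */
    struct tcb *next;      /* link to the next TCB in a list */
} tcb;

/* Inserts a TCB at the head of a singly linked list. */
void list_push(tcb **head, tcb *t)
{
    t->next = *head;
    *head = t;
}

/* Unlinks and returns the TCB with the given id, or NULL if absent. */
tcb *list_remove(tcb **head, int id)
{
    for (tcb **pp = head; *pp != NULL; pp = &(*pp)->next) {
        if ((*pp)->id == id) {
            tcb *t = *pp;
            *pp = t->next;
            t->next = NULL;
            return t;
        }
    }
    return NULL;
}

/* Called when the resource a suspended task waits for becomes available:
   moves its TCB from the suspended list to the ready list. */
int task_resume(tcb **suspended, tcb **ready, int id)
{
    tcb *t = list_remove(suspended, id);
    if (t == NULL)
        return -1;
    t->state = TASK_READY;
    list_push(ready, t);
    return 0;
}
```

A real kernel would typically keep the ready list ordered by priority; the head insertion here keeps the sketch short.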
Process Scheduling
Pre-run-time and run-time scheduling. The aim is to meet time restrictions
Each task is characterized typically by the following temporal parameters:
1) Precedence constraints; 2) Release or arrival time $r_{i,j}$ of the j-th instance
of task i; 3) Phase $\phi_i$; 4) Response time; 5) Absolute deadline $d_i$;
6) Relative deadline $D_i$;
7) Laxity type – notion of urgency or margin in a task’s execution;
8) Period $p_i$;
9) Execution time $e_i$
$\phi_i = r_{i,1}$, $r_{i,k} = \phi_i + (k-1) p_i$
$d_{i,k} = \phi_i + (k-1) p_i + D_i$
Assume for simplicity: all tasks are periodic and independent, the relative deadline
is equal to the period/frame, tasks are pre-emptible, preemption time is neglected
Round-Robin Scheduling
Cyclic Executives
Scheduling decisions are made periodically, rather than at arbitrary times
Time intervals between scheduling decision points are referred to as frames or
minor cycles, and every frame has a length, f, called the frame size
The major cycle is the minimum time required to execute tasks allocated to
the processor, ensuring that the deadlines and periods of all processes are
met
The major cycle or the hyperperiod is equal to the least common multiple
(lcm) of the periods, that is, lcm(p1,..,pn)
Scheduling decisions are made at the beginning of every frame. The phase of
each task is a non-negative integer multiple of the frame size.
Frames must be long enough to accommodate each task:
$C_1: f \ge \max_{1 \le i \le n} e_i$
Cyclic Executives
The hyperperiod should be a multiple of the frame size, which holds when $f$ evenly
divides the period of at least one task:
$C_2: \lfloor p_i / f \rfloor - p_i / f = 0$
To ensure that every task completes by its deadline, frames must be small
enough that between the release time and deadline of every task, there is at
least one frame.
Cyclic Executives
The following relation is derived for a worst-case scenario, which
occurs when the period of a process starts just after the
beginning of a frame, and, consequently, the process cannot be
released until the next frame:
$C_3: 2f - \gcd(p_i, f) \le D_i$
Let $t$ be the start of a frame and $t'$ the release time of a job of task $i$, with $t' > t$:
$t + 2f \le t' + D_i$
$2f - (t' - t) \le D_i$
$t' - t = l p_i - k f$ for integers $l$, $k$, and $l p_i - k f \ge \gcd(p_i, f)$
$\Rightarrow 2f - \gcd(p_i, f) \le D_i$
Cyclic Executives
For example, for tasks T1(4,1), T2(5,1.8), T3(20,1), T4(20,2), given as T(period, execution time),
the hyper-period is 20
[Figure: cyclic schedules of T1–T4 over one hyper-period (time axis 0 to 20), without frames
and with frames of size f=2]
Fixed Priority Scheduling – Rate-Monotonic Approach (RMA)
Rate-Monotonic Scheduling
Theorem (RMA Bound). Any set of n periodic tasks is RM schedulable if the
processor utilization
$U = \sum_{i=1}^{n} \frac{e_i}{p_i} \le n(2^{1/n} - 1)$
Dynamic-Priority Scheduling – Earliest-Deadline-First
Approach
Theorem (EDF Bound). A set of n periodic tasks, each of whose relative
deadline equals its period, can be feasibly scheduled by EDF if and only if
$U \le 1$
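Both utilization tests reduce to a few lines of arithmetic. The following sketch assumes periodic, independent tasks whose relative deadlines equal their periods; the names `ptask`, `utilization`, `rma_bound`, `rma_sufficient` and `edf_feasible` are hypothetical. The n-th root of 2 is computed with Newton's method only so that the example does not depend on the math library.

```c
#include <stddef.h>

typedef struct {
    double period;   /* p_i */
    double exec;     /* e_i */
} ptask;

/* Processor utilization U = sum of e_i / p_i. */
double utilization(const ptask *ts, size_t n)
{
    double u = 0.0;
    for (size_t i = 0; i < n; i++)
        u += ts[i].exec / ts[i].period;
    return u;
}

/* Rate-monotonic bound n(2^(1/n) - 1). */
double rma_bound(size_t n)
{
    if (n == 0)
        return 0.0;
    double x = 1.5;                         /* start above the root of x^n = 2 */
    for (int it = 0; it < 60; it++) {
        double xn1 = 1.0;                   /* x^(n-1) */
        for (size_t k = 1; k < n; k++)
            xn1 *= x;
        x -= (xn1 * x - 2.0) / ((double)n * xn1);   /* Newton step */
    }
    return (double)n * (x - 1.0);
}

/* Sufficient (not necessary) test for rate-monotonic schedulability. */
int rma_sufficient(const ptask *ts, size_t n)
{
    return utilization(ts, n) <= rma_bound(n);
}

/* Necessary and sufficient test for EDF under the stated assumptions. */
int edf_feasible(const ptask *ts, size_t n)
{
    return utilization(ts, n) <= 1.0;
}
```

For the four tasks T1(4,1), T2(5,1.8), T3(20,1), T4(20,2), U = 0.76, slightly above the RMA bound of about 0.757 for n = 4, so the RMA test is inconclusive while the EDF test is satisfied.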
Intertask Communication and Synchronization
•Buffering data
•Double-buffering
Ring Buffers
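A ring buffer can be sketched as a fixed array with a head (write) index and a tail (read) index that wrap around. The version below is a minimal single-producer illustration with hypothetical names (`ring_buffer`, `rb_put`, `rb_get`); it leaves one slot unused to tell a full buffer from an empty one and does no locking, so sharing it between tasks and interrupt routines would need the protection mechanisms discussed in this chapter.

```c
#include <stddef.h>
#include <stdint.h>

#define RB_SIZE 8   /* array size; usable capacity is RB_SIZE - 1 items */

typedef struct {
    uint8_t data[RB_SIZE];
    size_t  head;   /* index of the next slot to write */
    size_t  tail;   /* index of the next slot to read  */
} ring_buffer;

void rb_init(ring_buffer *rb)
{
    rb->head = 0;
    rb->tail = 0;
}

int rb_is_empty(const ring_buffer *rb)
{
    return rb->head == rb->tail;
}

int rb_is_full(const ring_buffer *rb)
{
    return (rb->head + 1) % RB_SIZE == rb->tail;
}

/* Appends one item; returns 0 on success, -1 if the buffer is full. */
int rb_put(ring_buffer *rb, uint8_t v)
{
    if (rb_is_full(rb))
        return -1;
    rb->data[rb->head] = v;
    rb->head = (rb->head + 1) % RB_SIZE;
    return 0;
}

/* Removes the oldest item; returns 0 on success, -1 if the buffer is empty. */
int rb_get(ring_buffer *rb, uint8_t *out)
{
    if (rb_is_empty(rb))
        return -1;
    *out = rb->data[rb->tail];
    rb->tail = (rb->tail + 1) % RB_SIZE;
    return 0;
}
```

Items come out in the order they went in (FIFO), and the indices simply wrap to the start of the array when they reach its end.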
Mailbox: void pend (int data, s); void post (int data, s);
Access to a mailbox is mutually exclusive; tasks wait until access is granted
•Queues – can be implemented with ring buffers
•Critical regions – sections of code to be used in the mutually exclusive
mode
•Semaphores – can be used to provide critical regions
Mailboxes and Semaphores
Semaphores and mailboxes
Sema mutex = 0 /* open */, proc_sem = 1; /* closed */
Bool full_slots = 0, empty_slots = 1;
void post(int mailbox, int message) {
    while (1) {
        wait(mutex);
        if (empty_slots) {
            insert(mailbox, message); update(); signal(mutex);
            signal(proc_sem); break;
        }
        else { signal(mutex); wait(proc_sem); }
    }
}
void pend(int mailbox, int *message) {
    while (1) {
        wait(mutex);
        if (full_slots) {
            extract(mailbox, message); update(); signal(mutex);
            signal(proc_sem); break;
        }
        else { signal(mutex); wait(proc_sem); }
    }
}
Driver{ while(1){
if(data_for_I/O){
prepare(command);
V(busy); P(done);}
}}
Controller{while(1){
P(busy); exec(command);
V(done);
}}
Counting Semaphores:
Wait: void MP(int &S) {
    S = S - 1; while (S < 0); /* busy-wait until a resource is released */
}
Signal: void MV(int &S) {
    S = S + 1;
}
Problems with semaphores:
Wait: void P(int &S){
while(S==TRUE);
S=TRUE;
}
LOAD R1,S ; address of S in R1
LOAD R2,1 ; 1 in R2
@1 TEST R1,I,R2 ; compare (R1)=*S with R2=1
JEQ @1 ; repeat if *S=1
STORE R2,S,I ; store 1 in *S
An interrupt between JEQ and STORE that passes control to another process
can cause several processes to see *S=FALSE and enter the critical region together
The Test-and-Set Instruction
void P(int &S) {
    while (test_and_set(S) == TRUE); // wait
}
void V(int &S) {
    S = FALSE;
}
The instruction fetches a word from memory and tests the high-order
(or other) bit . If the bit is 0, it is set to 1 and stored again, and a
condition code of 0 is returned. If the bit is 1, a condition code of 1 is
returned and no store is performed. The fetch, test and store are
indivisible.
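In portable C, an indivisible test-and-set is available through the C11 `atomic_flag` type. The sketch below uses the standard functions `atomic_flag_test_and_set` and `atomic_flag_clear` from `<stdatomic.h>`; the wrapper names `spin_sem`, `spin_p`, `spin_try_p` and `spin_v` are hypothetical and stand in for the P and V operations above.

```c
#include <stdatomic.h>

typedef struct {
    atomic_flag locked;   /* set = semaphore taken, clear = free */
} spin_sem;

#define SPIN_SEM_INIT { ATOMIC_FLAG_INIT }

/* Non-blocking attempt: returns 1 if the semaphore was acquired, 0 if it
   was already taken. The read and the write happen as one atomic step. */
int spin_try_p(spin_sem *s)
{
    return !atomic_flag_test_and_set(&s->locked);
}

/* P operation: busy-wait until the flag was previously clear. */
void spin_p(spin_sem *s)
{
    while (atomic_flag_test_and_set(&s->locked)) {
        /* spin */
    }
}

/* V operation: release the semaphore. */
void spin_v(spin_sem *s)
{
    atomic_flag_clear(&s->locked);
}
```

Because the old value is returned by the same atomic operation that sets the flag, two processes can never both observe the flag as clear, which is exactly the gap left by the separate test and store in the earlier assembly sequence.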
Dijkstra’s implementation of semaphore operation (if test-and-set
instruction is not available):
void P(int &S) {
    int temp = TRUE;
    while (temp) {
        disable(); // disable interrupts
        temp = S;
        S = TRUE;
        enable(); // enable interrupts
    }
}
Other Synchronization Mechanisms:
•Monitors generalize critical sections – only one process can execute
a monitor at a time. They provide a public interface for serial use of resources
•Events – similar to semaphores, but usually all waiting processes are
released when the event is signaled. Tasks waiting for an event are called
blocked
Deadlocks
Four conditions are necessary for deadlock:
•Mutual exclusion
•Circular wait
•Hold and wait
•No preemption
Eliminating any one of the four necessary conditions will prevent deadlock
from occurring
One way to eliminate circular wait is to number the resources and give a process all the
resources numbered greater than or equal to the lowest-numbered resource it requires.
For example: Disk – 1, Printer – 2, Motor control – 3, Monitor – 4.
If a process wishes to use the printer, it will be assigned the printer, motor control
and monitor. If another process requires the monitor, it will have to wait until the
monitor is released. This may lead to starvation.
Deadlock avoidance
To avoid deadlocks, it is recommended to:
• Minimize the number of critical regions as well as
their size
• All processes must release any lock before returning to the
calling function
• Do not suspend any task while it controls a critical region
• All critical regions must be error-free
• Do not lock devices in interrupt handlers
• Always perform validity checks on pointers used within critical
regions.
It is difficult to follow these recommendations
A Separate Task Helps Control Shared Hardware
Embedded Systems
Chapter -7
Control System
7. Control System [3 Hrs.]
7.1 Open-Loop and Closed-Loop Control System overview
7.2 Control System and PID Controllers
7.3 Software coding of a PID Controller
7.4 PID Tuning
Control System
• Control physical system’s output
– By setting physical system’s input
• Tracking
• E.g.
– Cruise control
– Thermostat control
– Disk drive control
– Aircraft altitude control
• Difficulty due to
– Disturbance: wind, road, tire, brake; opening/closing door…
– Human interface: feel good, feel right…
Open-Loop Control Systems
• Plant
– Physical system to be controlled
• Car, plane, disk, heater,…
• Actuator
– Device to control the plant
• Throttle, wing flap, disk motor,…
• Controller
– Designed product to control the plant
Figure notation: Vt – car’s current speed, Ut – throttle position, Vt+1 – car’s speed one sec. later
Open-Loop Control Systems
• Output
– The aspect of the physical system we are interested in
• Speed, disk location, temperature
• Reference
– The value we want to see at output
• Desired speed, desired location, desired temperature
• Disturbance
– Uncontrollable input to the plant imposed by environment
• Wind, bumping the disk drive, door opening
Other Characteristics of open loop
• Feed-forward control
• Delay in actual change of the output
• Controller doesn’t know how well things go
• Simple
• Best use for predictable systems
Closed-Loop Control Systems
• Sensor
– Measure the plant output
• Error detector
– Detect error
• Feedback control systems
• Minimize tracking error
Designing Open Loop Control System
• Develop a model of the plant
• Develop a controller
• Analyze the controller
• Consider Disturbance
• Determine Performance
• Example: Open Loop Cruise Control System
Model of the Plant
• May not be necessary
– Can be done through experimenting and tuning
• But,
– Can make it easier to design
– May be useful for deriving the controller
• Example: throttle that goes from 0 to 45 degree
– On flat surface at 50 mph, open the throttle to 40 degree
– Wait 1 “time unit”
– Measure the speed, let’s say 55 mph
– Then the following equation satisfies the above scenario
• vt+1=0.7*vt+0.5*ut
• 55 = 0.7*50+0.5*40
– If the equation holds for all other scenarios
• Then we have a model of the plant
Designing the Controller
• Assuming we want to use a simple linear function
– ut=F(rt)= P * rt
– rt is the desired speed, P is a constant that the designer must specify.
• Linear proportional controller
• vt+1=0.7*vt+0.5*ut = 0.7*vt+0.5P*rt
• Let vt+1=vt at steady state = vss
• vss=0.7*vss+0.5P*rt
• At steady state, we want vss=rt, so 0.3*rt=0.5P*rt
• P=0.6
– I.e. ut=0.6*rt
Analyzing the Controller
• Let v0=20mph, r0=50mph
• vt+1=0.7*vt+0.5(0.6)*rt =0.7*vt+0.3*50=0.7*vt+15
• Throttle position is 0.6*50=30 degrees
Considering the Disturbance
• Assume road grade can affect the speed
– From –5 mph to +5 mph
– vt+1=0.7*vt+10 (disturbance of +5 mph subtracted)
– vt+1=0.7*vt+20 (disturbance of –5 mph subtracted)
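The open-loop analysis can be checked numerically. The sketch below iterates the plant model with the controller ut = 0.6*rt and a constant disturbance w that is subtracted each step, as in the equations above; the function names and the step count are illustrative choices.

```c
#define OPEN_LOOP_P 0.6   /* proportional constant derived in the notes */

/* One step of the plant with an additive disturbance w (mph). */
static double plant_step(double v, double u, double w)
{
    return 0.7 * v + 0.5 * u - w;
}

/* Run the open-loop controller u = P*r for `steps` time units. */
static double open_loop_speed(double v0, double r, double w, int steps)
{
    double v = v0;
    double u = OPEN_LOOP_P * r;   /* throttle never looks at the actual speed */
    for (int t = 0; t < steps; t++) {
        v = plant_step(v, u, w);
    }
    return v;
}
```

Starting from v0 = 20 with r = 50, the speed settles at 50 mph when w = 0, but at about 33 mph for w = +5 and about 67 mph for w = -5, because the open-loop controller cannot react to the disturbance.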
Determining Performance
• $v_{t+1} = 0.7 v_t + 0.5P r_0 - w_0$
• $v_1 = 0.7 v_0 + 0.5P r_0 - w_0$
• $v_2 = 0.7 (0.7 v_0 + 0.5P r_0 - w_0) + 0.5P r_0 - w_0 = 0.7 \cdot 0.7 v_0 + (0.7 + 1.0) \cdot 0.5P r_0 - (0.7 + 1.0) w_0$
• $v_t = 0.7^{t} v_0 + (0.7^{t-1} + 0.7^{t-2} + \dots + 0.7 + 1.0)(0.5P r_0 - w_0)$
• Coefficient of $v_t$ determines rate of decay of $v_0$
– >1 or <-1, $v_t$ will grow without bound
– <0, $v_t$ will oscillate
Designing Close Loop Control System
Stability
• $u_t = P (r_t - v_t)$
• $v_{t+1} = 0.7 v_t + 0.5 u_t - w_t = 0.7 v_t + 0.5P (r_t - v_t) - w_t = (0.7 - 0.5P) v_t + 0.5P r_t - w_t$
• $v_t = (0.7 - 0.5P)^{t} v_0 + ((0.7 - 0.5P)^{t-1} + (0.7 - 0.5P)^{t-2} + \dots + (0.7 - 0.5P) + 1.0)(0.5P r_0 - w_0)$
• Stability constraint (i.e. convergence) requires
– $|0.7 - 0.5P| < 1$
– $-1 < 0.7 - 0.5P < 1$
– $-0.6 < P < 3.4$
Reducing effect of v0
• $u_t = P (r_t - v_t)$
• $v_{t+1} = (0.7 - 0.5P) v_t + 0.5P r_t - w_t$
• $v_t = (0.7 - 0.5P)^{t} v_0 + ((0.7 - 0.5P)^{t-1} + (0.7 - 0.5P)^{t-2} + \dots + (0.7 - 0.5P) + 1.0)(0.5P r_0 - w_0)$
• To reduce the effect of initial condition
– $0.7 - 0.5P$ as small in magnitude as possible
– P=1.4
Avoid Oscillation
• $u_t = P (r_t - v_t)$
• $v_{t+1} = (0.7 - 0.5P) v_t + 0.5P r_t - w_t$
• $v_t = (0.7 - 0.5P)^{t} v_0 + ((0.7 - 0.5P)^{t-1} + (0.7 - 0.5P)^{t-2} + \dots + (0.7 - 0.5P) + 1.0)(0.5P r_0 - w_0)$
• To avoid oscillation
– $0.7 - 0.5P \ge 0$
– $P \le 1.4$
Perfect Tracking
• $u_t = P (r_t - v_t)$
• $v_{t+1} = (0.7 - 0.5P) v_t + 0.5P r_t - w_t$
• $v_{ss} = (0.7 - 0.5P) v_{ss} + 0.5P r_0 - w_0$
• $(1 - 0.7 + 0.5P) v_{ss} = 0.5P r_0 - w_0$
• $v_{ss} = \frac{0.5P}{0.3 + 0.5P} r_0 - \frac{1.0}{0.3 + 0.5P} w_0$
• To make vss as close to r0 as possible
– P should be as large as possible
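The steady-state formula and the coefficient of vt can be evaluated directly to compare candidate gains. The following is a small sketch; closed_loop_vss and closed_loop_pole are illustrative names, not part of the notes.

```c
/* Steady-state speed of the closed loop u = P*(r - v) on the plant
 * v[t+1] = 0.7*v[t] + 0.5*u[t] - w, using the formula from the notes. */
static double closed_loop_vss(double P, double r, double w)
{
    return (0.5 * P * r - w) / (0.3 + 0.5 * P);
}

/* Coefficient of v[t] in the closed-loop update equation.
 * Magnitude below 1 means stable; a negative value means oscillation. */
static double closed_loop_pole(double P)
{
    return 0.7 - 0.5 * P;
}
```

For r = 50 and no disturbance, P = 3.3 gives a steady-state speed of about 42.31 mph, and about 39.74 mph with w = +5. The coefficient for P = 3.3 is -0.95, which is stable but oscillatory, while P = 1.4 makes it zero.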
Close-Loop Design
• ut = P * (rt-vt)
• Finally, setting P=3.3
– Stable, track well, some oscillation
– ut = 3.3 * (rt-vt)
Analyze the controller
• v0=20 mph, r0=50 mph, w=0
• vt+1 = 0.7vt+0.5P*(rt-vt)-w
= 0.7vt+0.5*3.3*(50-vt)
• ut = P * (rt-vt)
= 3.3 * (50-vt)
• But ut ranges from 0 to 45
• Controller saturates
• v0=20 mph, r0=50 mph, w=0
• vt+1 = 0.7vt+0.5*ut
• ut = 3.3 * (50-vt)
– Saturate at 0, 45
• Oscillation!
– “feel bad”
• Set P=1.0 to avoid oscillation
– Terrible SS performance
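The effect of saturation can be reproduced with a short simulation that clamps the throttle to the 0 to 45 degree range before applying it to the plant. This is a sketch under the plant model above with w = 0; saturate and closed_loop_saturated are illustrative names.

```c
#define U_MIN 0.0    /* throttle fully closed (degrees) */
#define U_MAX 45.0   /* throttle fully open (degrees) */

/* Clamp the requested throttle to the physical range. */
static double saturate(double u)
{
    if (u < U_MIN) return U_MIN;
    if (u > U_MAX) return U_MAX;
    return u;
}

/* Simulate the saturating closed loop and store v[0..steps] in out[]. */
static void closed_loop_saturated(double P, double v0, double r,
                                  int steps, double out[])
{
    out[0] = v0;
    for (int t = 0; t < steps; t++) {
        double u = saturate(P * (r - out[t]));   /* controller output, clamped */
        out[t + 1] = 0.7 * out[t] + 0.5 * u;     /* plant update with w = 0 */
    }
}
```

With P = 3.3, v0 = 20 and r = 50 the first throttle request is 99 degrees, which is clamped to 45. The speed then moves to 36.5, rises to about 47.8, and drops back to about 37.1, which is the oscillation described above.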
Analyzing the Controller
Minimize the effect of disturbance
• vt+1 = 0.7vt+0.5*3.3*(rt-vt)-w
– w=-5 or +5
• Steady-state speed with w=+5 and r=50: 39.74
– Close to the disturbance-free 42.31
– Better than the open-loop values of 33 and 66
• Cost
– SS error
– oscillation
General Control System
• Objective
– Causing output to track a reference even in the presence of
• Measurement noise
• Model error
• Disturbances
• Metrics
– Stability
• Output remains bounded
– Performance
• How well an output tracks the reference
– Disturbance rejection
– Robustness
• Ability to tolerate modeling error of the plant
Performance (generally speaking)
• Rise time
– Time it takes to go from 10% to 90% of final value
• Peak time
• Overshoot
– Percentage by which peak exceeds final value
• Settling time
– Time it takes to settle within 1% of final value
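Two of these metrics are easy to compute from a recorded step response. The sketch below assumes the response is stored as an array of samples taken at a fixed period; peak_index and overshoot_percent are illustrative names.

```c
#include <stddef.h>

/* Index of the largest sample, i.e. the peak time in sampling periods. */
static size_t peak_index(const double y[], size_t n)
{
    size_t peak = 0;
    for (size_t i = 1; i < n; i++) {
        if (y[i] > y[peak]) {
            peak = i;
        }
    }
    return peak;
}

/* Overshoot: percentage by which the peak exceeds the final value. */
static double overshoot_percent(const double y[], size_t n, double final_value)
{
    double peak = y[peak_index(y, n)];
    return 100.0 * (peak - final_value) / final_value;
}
```

For a response of 0, 30, 55, 52, 50, 50 with a final value of 50, the peak occurs at sample 2 and the overshoot is 10%.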
Plant modeling is difficult
• May need to be done first
• Plant is usually in continuous time
– Not discrete time
• E.g. car speed continuously reacts to throttle position, not at discrete
intervals
– Sampling period must be chosen carefully
• To make sure “nothing interesting” happens in between
• I.e. small enough
• Plant is usually non-linear
– E.g. shock absorber response may need an 8th-order differential equation
• Iterative development of the plant model and controller
– Have a plant model that is “good enough”
Controller Design: P
• Proportional controller
– A controller that multiplies the tracking error by a
constant
• ut = P * (rt-vt)
– Close loop model with a linear plant
• E.g. vt+1 = (0.7-0.5P)*vt+0.5P*rt-wt
• P affects
– Transient response
• Stability, oscillation
– Steady state tracking
• As large as possible
– Disturbance rejection
• As large as possible
Controller Design: PD
• Proportional and Derivative control
• $u_t = P (r_t - v_t) + D ((r_t - v_t) - (r_{t-1} - v_{t-1})) = P e_t + D (e_t - e_{t-1})$
• Consider the size of error over time
• Intuitively
– Want to “push” more if the error is not reducing fast enough
– Want to “push” less if the error is reducing really fast
PD Controller
• Need to keep track of error derivative
• E.g. Cruise controller example
– $v_{t+1} = 0.7 v_t + 0.5 u_t - w_t$
– Let $u_t = P e_t + D (e_t - e_{t-1})$, $e_t = r_t - v_t$
– $v_{t+1} = 0.7 v_t + 0.5 (P (r_t - v_t) + D ((r_t - v_t) - (r_{t-1} - v_{t-1}))) - w_t$
– $v_{t+1} = (0.7 - 0.5 (P + D)) v_t + 0.5D v_{t-1} + 0.5 (P + D) r_t - 0.5D r_{t-1} - w_t$
– Assume reference input and disturbance are constant, the
steady-state speed is
• $v_{ss} = \frac{0.5P}{1 - 0.7 + 0.5P} r - \frac{1.0}{1 - 0.7 + 0.5P} w$
• Does not depend on D!!!
• P can be set for best tracking and disturbance control
• Then D set to control oscillation/overshoot/rate of convergence
PD Control Example
PI Control
• Proportional plus integral control
– $u_t = P e_t + I (e_0 + e_1 + \dots + e_t)$
• Sum up error over time
– Ensure reaching desired output, eventually
– vss will not be reached until ess=0
• Use P to control disturbance
• Use I to ensure steady state convergence and convergence rate
PID Controller
• Combine Proportional, integral, and derivative control
– $u_t = P e_t + I (e_0 + e_1 + \dots + e_t) + D (e_t - e_{t-1})$
• Available off-the-shelf
Software Coding
• Main function loops forever, during each iteration
– Read plant output sensor
• May require A2D
– Read current desired reference input
– Call PidUpdate, to determine actuator value
– Set actuator value
• May require D2A
Software Coding (continued)
• Pgain, Dgain, Igain are constants
• sensor_value_previous
– For D control
• error_sum
– For I control
Computation
• $u_t = P e_t + I (e_0 + e_1 + \dots + e_t) + D (e_t - e_{t-1})$
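A PidUpdate routine along the lines described above could look as follows. This is a sketch, not the listing from the slides: the struct layout and the parameter names are assumptions, and the D term uses the change in sensor value, which equals et - et-1 only while the reference is constant.

```c
/* Controller state and gains. Field names follow the notes. */
typedef struct {
    double Pgain, Igain, Dgain;      /* constants chosen by the designer */
    double sensor_value_previous;    /* last plant output, for the D term */
    double error_sum;                /* running sum of errors, for the I term */
} PidData;

/* Compute the actuator value for one sampling period. */
static double PidUpdate(PidData *pid, double sensor_value, double reference_value)
{
    double error = reference_value - sensor_value;
    pid->error_sum += error;

    /* With a constant reference, e[t] - e[t-1] = v[t-1] - v[t]. */
    double difference = pid->sensor_value_previous - sensor_value;
    pid->sensor_value_previous = sensor_value;

    return pid->Pgain * error
         + pid->Igain * pid->error_sum
         + pid->Dgain * difference;
}
```

The main loop would read the sensor (through an A2D converter if needed), read the reference, call PidUpdate, and write the returned value to the actuator (through a D2A converter if needed). A practical version would also clamp the result to the actuator range.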
PID tuning
• Analytically deriving P, I, D may not be possible
– E.g. plant model is not available, or too costly to obtain
• Ad hoc method for getting “reasonable” P, I, D
– Start with a small P, I=D=0
– Increase D, until seeing oscillation
• Reduce D a bit
– Increase P, until seeing oscillation
• Reduce P a bit
– Increase I, until seeing oscillation
• Iterate until can’t change anything without excessive oscillation
Practical Issues with Computer-Based Control
• Quantization
• Overflow
• Aliasing
• Computation Delay
Quantization & Overflow
• Quantization
– Can’t store 0.36 as 4-bit fractional number
– Can only store 0.75, 0.50, 0.25, 0.00, -0.25, -0.50, -0.75, -1.00
– Choose 0.25
• Results in quantization error of 0.11
• Sources of quantization error
– Operations, e.g. 0.50*0.25=0.125
• Can use more bits until input/output to the environment/memory
– A2D converters
• Overflow
– Can’t store 0.75+0.50 = 1.25 as 4-bit fractional number
• Solutions:
– Use fixed-point representation/operations carefully
• Time-consuming
– Use floating-point co-processor
• Costly
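The quantization and overflow examples above can be reproduced with a few lines of C. The sketch assumes the value set listed in the notes (steps of 0.25 from -1.00 to 0.75) and truncation toward zero; quantize and overflows are illustrative names.

```c
#include <stdbool.h>

#define Q_STEP 0.25    /* spacing between representable values */
#define Q_MIN (-1.00)  /* smallest representable value */
#define Q_MAX 0.75     /* largest representable value */

/* Truncate x toward zero to a multiple of Q_STEP (assumes x is in range). */
static double quantize(double x)
{
    int steps = (int)(x / Q_STEP);   /* the cast drops the fractional part */
    return steps * Q_STEP;
}

/* True if x lies outside the representable range, i.e. it would overflow. */
static bool overflows(double x)
{
    return x > Q_MAX || x < Q_MIN;
}
```

Here quantize(0.36) returns 0.25, an error of 0.11, the product 0.50*0.25 = 0.125 is truncated to 0.00, and the sum 0.75 + 0.50 = 1.25 is flagged as an overflow.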
Aliasing
• Quantization/overflow
– Due to discrete nature of computer data
• Aliasing
– Due to discrete nature of sampling
Aliasing Example
• Sampling at 2.5 Hz, period of 0.4, the following are indistinguishable
– y(t)=1.0*sin(6πt), frequency 3 Hz
– y(t)=1.0*sin(πt), frequency of 0.5 Hz
• In fact, with sampling frequency of 2.5 Hz
– Can only correctly sample signal below Nyquist frequency 2.5/2 = 1.25 Hz
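The reason the two signals cannot be told apart is that they take the same value at every sampling instant. With a period of 0.4 s the samples are taken at t = 0.4k, and a short derivation shows the 3 Hz signal aliasing to 3 - 2.5 = 0.5 Hz:

```latex
\sin(6\pi \cdot 0.4k) = \sin(2.4\pi k) = \sin(0.4\pi k + 2\pi k)
                      = \sin(0.4\pi k) = \sin(\pi \cdot 0.4k),
\qquad k = 0, 1, 2, \dots
```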
Computation Delay
• Inherent delay in processing
– Actuation occurs later than expected
• Need to characterize implementation delay to make sure it is
negligible
• Hardware delay is usually easy to characterize
– Synchronous design
• Software delay is harder to predict
– Should organize code carefully so delay is predictable and minimized
– Write software with predictable timing behavior (be like hardware)
• Time-Triggered Architecture
• Synchronous Software Language
Benefit of Computer Control
• Cost!!!
– Expensive to make analog control immune to
• Age, temperature, manufacturing error
– Computer control replaces complex analog hardware with complex code
• Programmability!!!
– Computer Control can be “upgraded”
• Change in control mode, gain, are easy to do
– Computer Control can be adaptive to change in plant
• Due to age, temperature, …etc
– “future-proof”
• Easily adapt to change in standards,..etc
Embedded Systems
Chapter – 8
IC Technology
8. IC Technology [3 Hrs.]
8.1 Full-Custom (VLSI) IC Technology
8.2 Semi-Custom (ASIC) IC Technology
8.3 Programmable Logic Device (PLD) IC Technology
CMOS transistor
• Source, Drain
– Diffusion area where electrons can flow
– Can be connected to metal contacts
• Gate
– Polysilicon area where control voltage is applied
• Oxide
– Si O2 Insulator so the gate voltage can’t leak
End of the Moore’s Law?
• Every dimension of the MOSFET has to scale
– (PMOS) Gate oxide has to scale down to
• Increase gate capacitance
• Reduce leakage current from S to D
• Pinch off current from source to drain
– Current gate oxide thickness is about 2.5-3nm
• That’s about 25 atoms!!!
Figure: MOSFET cross-section showing gate, oxide, source, channel, and drain on a silicon substrate, inside an IC package.
NAND
• Metal layers for routing (~10)
• PMOS don’t like 0
• NMOS don’t like 1
• A stick diagram forms the basis for mask sets
Silicon manufacturing steps
• Tape out
– Send design to manufacturing
• Spin
– One time through the manufacturing process
• Photolithography
– Drawing patterns by using photoresist to form barriers for deposition
Introduction to Photolithography
Figure: transistor layers, with a p-channel transistor in an n-well and an n-channel transistor in a p-well on a p+ substrate.
• The patterns are first transferred from the mask to a light-sensitive
material (photoresist) coated on a substrate (e.g. Si wafer, glass)
• The patterned resist functions as a barrier in the following process
(oxidation, etch, ion implantation, etc.)
Photolithography -- Definitions
• Photolithography is used to produce 3-D images using light sensitive
photoresist and controlled exposure to light.
• Microlithography is the technique used to print ultra-miniature patterns -
used primarily in the semiconductor industry.
Photolithography is at the Center of the Wafer Fabrication Process
Figure: wafer fabrication flow with Photo at the center, connected to Thin Films, Polish, Diffusion, Etch, and Implant, followed by Test/Sort of the patterned wafer.
What else is Photolithography?
• 3-dimensional circuit patterning
• Most critical step in IC process
–Determines feature resolution
–Determines overlay accuracy
• Bottleneck in the fab process
• The leading technology
Wafer Conditions Prior to Patterning
• Surface conditions include:
– film composition, e.g.: silicon, nitride, polysilicon, metal,
etc.
– bare surface vs. patterned surface
– surface reflectivity
• Surface conditions may affect
– photoresist-to-wafer adhesion
– alignment accuracy
– linewidth resolution
– exposure settings
– bake time
Wafer Conditions after
Photolithography
• resist coated wafer
• patterned resist layer
• withstands etching process
• withstands ion implanting
• quality measures
– overlay accuracy
– particles & defects
Importance of Resolution and Overlay Registration
Figure: top view and cross-section of a p-channel and n-channel transistor pair (labels Vin, Vout, VSS, VDD), showing the polysilicon gates, gate oxide, field oxide, metal contacts, p+ and n+ source/drain regions, and the p-well in an n-substrate.
Types of Photolithography Processes
Negative: Prints a pattern that is opposite of the pattern that is on the mask.
Positive: Prints a pattern that is the same as the pattern on the mask.
Negative Lithography
• Areas exposed to light become polymerized and resist the develop chemical.
Figure: ultraviolet light shines through a glass mask carrying a chrome island onto photoresist over oxide on a silicon substrate. The island casts a shadow on the photoresist; the resulting pattern after the resist is developed keeps the exposed area of photoresist and leaves a window under the island.
Positive Lithography
• Areas exposed to light become photosoluble.
Figure: ultraviolet light shines through a glass mask carrying a chrome island onto photoresist over oxide on a silicon substrate. The exposed area of photoresist is removed when the resist is developed; the resulting pattern keeps an island of photoresist in the shadow of the chrome island.
Ten Basic Steps of Photolithography
1. Surface Preparation
2. Photoresist Application
3. Soft Bake
4. Align & Expose
5. Develop
6. Hard Bake
7. Develop Inspection
8. Etch
9. Resist Strip
10. Final Inspection
1. Surface Preparation (HMDS vapor prime)
• Dehydration bake in enclosed chamber with exhaust
• Clean and dry wafer surface (hydrophobic)
• Hexamethyldisilazane (HMDS)
• Temp ~ 200 - 250°C
• Time ~ 60 sec.
• Spin coater used at 2500 - 3000 rpm for 15 seconds
• Si wafer surface treatment → adhesion develops
Figure: HMDS layer on the Si wafer.
2. Photoresist Application
• Wafer held onto vacuum chuck
• Dispense ~5ml of photoresist
• Slow spin ~ 500 rpm
• Ramp up to ~ 3000 - 5000 rpm
• Quality measures:
– time
– speed
– thickness
– uniformity
Figure: photoresist dispenser above a wafer on a vacuum chuck, with the spindle connected to a vacuum pump.
• PR = Sensitizer (PAC) + resin + solvent
• Pattern polarity
– Positive type : AZ PR series (Shipley)
– Negative type : HR PR series (Hunt Chemical)
• A spin motor creates a uniform PR coating thickness on the wafer
• Important factors for thickness and uniformity: resin %, cohesion, spin
speed, acceleration, time
Figure: PR layer on the HMDS-treated Si wafer.
Result: Variation of PR thickness
Figure: PR thickness (angstroms, axis from 10000 to 22000) versus spin speed (2000 to 4000 rpm), plotted before and after soft bake.
• Spin speed (rpm) increase → PR thickness decrease
• Condensation after soft bake is caused by solvent evaporation
3. Soft Bake
• Partial evaporation of photo-resist solvents
• Improves adhesion
• Improves uniformity
• Improves etch resistance
• Improves linewidth control
• Optimizes light absorbance characteristics of photoresist
• Soft bake (95°C, 30 min in oven): improves adhesion and removes solvent
from the PR
• Alignment: a photo mask, a square glass plate with patterned emulsion or
metal film on one side, is placed over the wafer
• Exposure (UV light: 12 mW): after development a positive resist leaves bare
SiO2 in the exposed area, whereas a negative resist remains on the surface
wherever it is exposed.
Figure: mask above the PR layer on the HMDS-treated Si wafer.
4. Alignment and Exposure
• Transfers the mask image to the resist-coated wafer
• Activates photo-sensitive components of photoresist
• Quality measures:
– overlay accuracy
Figure: UV light source above a mask and a resist-coated wafer.
• Photoresist thickness is related to spin coating
– spin speed increase → thickness decrease
– After soft bake → thickness decrease
• Critical Dimension is related to exposure time
– expose time increase → CD increase
– proper expose time needed for exact process
Variation of Critical Dimension (CD)
Figure: CD (µm, axis from 20 to 21.4) versus exposure time (9, 12, 15, 18 sec).
• Exposure time increase → CD increase
5. Develop
• Soluble areas of photoresist are dissolved by developer chemical
• Visible patterns appear on wafer
– windows
– islands
• Quality measures:
– line resolution
– uniformity
– particles & defects
Figure: developer dispenser above a wafer on a vacuum chuck, with the spindle connected to a vacuum pump.
• Post exposure bake
• Develop: 6 AZ MIF 300 : 1 H2O (70 sec., room temp.)
• Inspection
1. Contamination
2. Opaque spot
3. Large hole
4. Pin hole
5. Excess material
6. Lack of adhesion
7. Intrusion
8. Scratch
Rework
• Hard bake (110°C, 30 min.): to harden the photoresist and improve adhesion
to the substrate.
6. Hard Bake
• Evaporate remaining photoresist solvent
• Improve adhesion
• Higher temperature than soft bake
7. Develop Inspect
• Optical or SEM
metrology
• Quality issues:
–particles
–defects
–critical dimensions
–linewidth resolution
–overlay accuracy
8. Etch
• Selective removal of upper layer of wafer through windows in photoresist
• Two basic methods:
– wet acid etch
– dry plasma etch (e.g. CF4 plasma)
• Quality measures:
– defects and particles
– step height
– selectivity
– critical dimensions
9. Photoresist Removal (strip)
• No need for photoresist following etch process
• Two common methods:
– wet acid strip
– dry plasma strip (O2 plasma)
• Followed by wet clean to remove remaining resist and strip byproducts
10. Final Inspection
• Photoresist has been
completely removed
• Pattern on wafer
matches mask pattern
(positive resist)
• Quality issues:
– defects
– particles
– step height
– critical dimensions
Full Custom
• Very Large Scale Integration (VLSI)
• Placement
– Place and orient transistors
• Routing
– Connect transistors
• Sizing
– Make fat, fast wires or thin, slow wires
– May also need to size buffer
• Design Rules
– “simple” rules for correct circuit function
• Metal/metal spacing, min poly width…
Full Custom
• Best size, power, performance
• Hand design
– Horrible time-to-market/flexibility/NRE cost…
– Reserve for the most important units in a processor
• ALU, Instruction fetch…
• Physical design tools
– Less optimal, but faster…
Semi-Custom
• Gate Array
– Array of prefabricated gates
– “place” and route
– Higher density, faster time-to-market
– Does not integrate as well with full-custom
• Standard Cell
– A library of pre-designed cells
– Place and route
– Lower density, higher complexity
– Integrate great with full-custom
Semi-Custom
• Most popular design style
• Jack of all trades
– Good
• Power, time-to-market, performance, NRE cost, per-unit cost, area…
• Master of none
– Integrate with full custom for critical regions of design
Programmable Logic Device
• Programmable Logic Device
– Programmable Logic Array, Programmable Array Logic, Field Programmable Gate
Array
• All layers already exist
– Designers can purchase an IC
– Connections on the IC are either created or destroyed to implement desired
functionality
• Benefits
– Very low NRE costs
– Great time to market
• Drawback
– High unit cost, bad for large volume
– Power
• Except special PLA
– slower
Figure: example PLD with 1600 usable gates, 7.5 ns, $7 list price.
Xilinx FPGA
Configurable Logic Block (CLB)
I/O Block
PROGRAMMABLE LOGIC ARRAYS (PLAs)
N-MOS PLA WITH 3 INPUT, 5 PRODUCT
TERMS, AND 4 OUTPUTS
F0=Σm(0,1,4,6)
=A’B’+AC’
F1=Σm(2,3,4,6,7)
=B+AC’
F2=Σm(0,1,2,6)
=A’B’+BC’
F3=Σm(2,3,5,6,7)
=AC+B
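The minimized expressions can be checked against their minterm lists by enumerating all eight input combinations. The sketch below assumes A is the most significant input, so that minterm m = 4A + 2B + C; minterm_mask is an illustrative helper name.

```c
#include <stdbool.h>

/* Sum-of-products forms from the notes; a is the most significant input. */
static bool F0(bool a, bool b, bool c) { return (!a && !b) || (a && !c); }
static bool F1(bool a, bool b, bool c) { return b || (a && !c); }
static bool F2(bool a, bool b, bool c) { return (!a && !b) || (b && !c); }
static bool F3(bool a, bool b, bool c) { return (a && c) || b; }

/* Build a bitmask whose bit m is set when f is true for minterm m = 4a+2b+c. */
static unsigned minterm_mask(bool (*f)(bool, bool, bool))
{
    unsigned mask = 0;
    for (unsigned m = 0; m < 8; m++) {
        if (f((m >> 2) & 1u, (m >> 1) & 1u, m & 1u)) {
            mask |= 1u << m;
        }
    }
    return mask;
}
```

The four functions share only five distinct product terms (A'B', AC', B, BC', AC), which is why the PLA needs five AND-plane rows for four outputs.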
AND-OR ARRAY EQUIVALENT OF NMOS 3 INPUT 5
PRODUCT TERMS AND 4 OUTPUTS
PLA table
REALIZATION OF PLA FOR A GIVEN EQUATION
F1 = Σm(2,3,5,7,8,9,10,11,13,15) = BD+B’C+AB’
F2 = Σm(2,3,5,6,7,10,11,14,15) = C+A’BD
F3 = Σm(6,7,8,9,13,14,15) = BC+AB’C’+ABD
F1 = bd(a+a’) + b’c + ab’(c+c’)

= abd + a’bd + b’c +ab’c’ +ab’c
F2 = c(b+b’) +a’bd
= bc + b’c + a’bd
F3 = bc+ab’c’+abd
Reduced PLA table

Embedded Systems
Chapter – 9
Microcontrollers in
Embedded Systems
9. Microcontrollers in Embedded Systems [3 Hrs.]
9.1 Intel 8051 microcontroller family, its architecture and instruction sets
9.2 Programming in Assembly Language
9.3 A simple interfacing example with 7 segment display
A microcontroller is a highly integrated chip that contains a CPU, scratchpad
RAM, special and general purpose register arrays and integrated peripherals.
Among 8-bit microcontrollers, the 8051 family is the most popular: a
cost-effective and versatile device offering extensive support in the
embedded application domain.
1st popular model: 8031AH built by Intel in 1977

Microprocessor vs. Microcontroller
1. Microprocessor: A Si chip representing a CPU, which is capable of performing arithmetic as well as logical operations according to a predefined set of instructions.
   Microcontroller: Is a highly integrated chip that contains a CPU, scratchpad RAM, special and general purpose register arrays, on-chip ROM/FLASH memory for program storage, timer & interrupt control units and dedicated I/O ports.
2. Microprocessor: CPU is stand-alone; RAM, ROM, I/O, timer are separate.
   Microcontroller: CPU, RAM, ROM, I/O and timer are all on a single chip.
3. Microprocessor: Is a dependent unit. It requires the combination of other chips like timers, program and data memory chips, interrupt controllers, etc. for functioning.
   Microcontroller: Is a self-contained unit & it doesn’t require external interrupt controller, timer, UART (Universal Asynchronous Receiver Transmitter), etc. for its functioning.
4. Microprocessor: Most of the time general purpose in design and operation.
   Microcontroller: Mostly application-oriented or domain-specific.
5. Microprocessor: Targeted for high end market where performance is important.
   Microcontroller: Targeted for embedded market where performance is not so critical (at present this demarcation is invalid).
6. Microprocessor: Doesn’t contain a built-in I/O port. The I/O port functionality needs to be implemented with the help of external programmable peripheral interface chips like 8255.
   Microcontroller: Most of the processors contain multiple built-in I/O ports which can be operated as a single 8 or 16 or 32 bit port or as individual port pins.
7. Microprocessor: Limited power saving options compared to microcontrollers.
   Microcontroller: Includes lots of power saving features.
Factors to be considered in selecting a controller
Feature set: interface, port requirement by the
application, nos. of timers & counters, built-in
ADC/DAC hardware, required performance
Speed of Operation: the number of clocks required per
instruction cycle and the maximum operating clock
frequency supported by the processor greatly affect
the speed of operation of the controller.
Code Memory Space: if the target processor/
controller application is written in C or any other
high level language, does the controller support
sufficient code memory space to hold the
compiled hex code?
Data Memory Space: does the controller support
sufficient internal data memory (on chip RAM) to
hold run time variables and data structures?
Development Support: Does the controller
manufacturer provide cost-effective development
tools and sample products to ease development pains,
support third-party development tools, and offer
technical support?
Availability: referred to as lead time. Lead time
is the time elapsed between the purchase order
approval and supply of the product.
Power Consumption: of the controller should be
minimal. It is a crucial factor since higher power
requirement leads to bulky power supply
designs. The high power dissipation also
demands for cooling fans and it will make the
overall system messy and expensive. Controllers
should support idle and power down modes of
operation to reduce consumption.
Cost: Last but not least, the cost should be
within the reachable limit of the end user and the
target user should not be high tech. Remember
the ultimate aim of a product is to gain marginal
benefit.
Why 8051 Microcontroller?
- Is a very versatile microcontroller featuring a
powerful Boolean processor which supports bit
manipulation instructions for real time industrial
control applications.
- Architecture supports 6 interrupts (2 external
interrupts, 2 timer interrupts & 2 serial interrupts)
- Two 16-bit timers/counters
- 32 I/O lines & a programmable full duplex serial
interface.
- Another notable feature is the way it handles interrupts.
- Low cost
- A flash microcontroller (AT89C51) from Atmel is
available in the market
Microprocessors:
General-purpose microprocessor
• CPU for Computers
• No RAM, ROM, I/O on CPU chip itself
• Example: Intel’s x86, Motorola’s 680x0
• Many chips on the motherboard
Figure: General-Purpose Microprocessor System (CPU connected over the data bus and address bus to separate RAM, ROM, I/O port, timer, and serial COM port chips).
Microcontroller
• A smaller computer
• On-chip RAM, ROM, I/O ports...
• Example: Motorola’s 6811, Intel’s 8051, Zilog’s Z8 and PIC 16X.
Figure: Microcontroller (CPU, RAM, ROM, I/O port, timer, and serial COM port on a single chip).
Companies Producing 8051
Some Companies Producing a Member of the 8051 Family

Company Web Site
Intel www.intel.com/design/mcs51
Atmel www.atmel.com
Philips/Signetics www.semiconductors.philips.com
Siemens www.sci.siemens.com
Dallas Semiconductor www.dalsemi.com

Common Microcontrollers
• Atmel
• ARM
• Intel
– 8-bit: 8XC42, MCS48, MCS51, 8xC251
– 16-bit: MCS96, MXS296
• National Semiconductor
– COP8
• Microchip
– 12-bit instruction PIC
– 14-bit instruction PIC: PIC16F84
– 16-bit instruction PIC
• NEC
• Motorola
– 8-bit: 68HC05, 68HC08, 68HC11
– 16-bit: 68HC12, 68HC16
– 32-bit: 683xx
• Texas Instruments
– TMS370, MSP430
• Zilog
– Z8, Z86E02
8051 Family
Comparison of 8051 Family Members
Feature                                 8051   8052   8031
ROM (on-chip program space, bytes)      4K     8K     0K
RAM (bytes)                             128    256    128
Timers                                  2      3      2
I/O pins                                32     32     32
Serial port                             1      1      1
Interrupt sources                       6      8      6
Intel 8051
• 8051 introduced by Intel in late 1970s

• Now produced by many companies in
many variations
• The most popular microcontroller –
about 40% of market share
• 8-bit microcontroller
Important features of 8051
• 8-bit ALU, Accumulator and Registers;
hence it is an 8-bit microcontroller.
• 8-bit data bus - It can access 8 bits of data
in one operation.
• 16-bit address bus - It can access 2^16
memory locations - 64 KB (65536 locations)
each of RAM and ROM.
• On-chip RAM - 128 bytes (data memory)
• On-chip ROM - 4 KB (program memory)
• Bi-directional input/output ports
• UART (serial port)
• Two 16-bit Counter/timers
• Two-level interrupt
Embedded System General Block Diagram
Figure: sensors feed the microcontroller (µC) through sensor conditioning; the microcontroller drives actuators and indicators through output interfaces.
Three criteria in Choosing a Microcontroller
1. meeting the computing needs of the task efficiently
and cost effectively
• speed, the amount of ROM and RAM, the
number of I/O ports and timers, size, packaging,
power consumption
• easy to upgrade
• cost per unit
2. availability of software development tools
• assemblers, debuggers, C compilers, emulator,
simulator, technical support
3. wide availability and reliable sources of the
microcontrollers.
8051 Architecture
Figure: 8051 block diagram. The CPU connects over the internal data bus to the oscillator and timing circuit, 4096 bytes of program memory, 128 bytes of data memory, two 16-bit timer/event counters, the 64 KB bus expansion control, the programmable I/O ports, and the programmable full-duplex serial port (UART and synchronous shifter). External connections are the external interrupts, control lines, parallel ports (address/data bus and I/O pins), serial input, and serial output.
8051 Memory Architecture
Memory Model
Program Memory
Internal ROM (4k)
External EPROM
Data Memory
Internal RAM (128 bytes)
General Purpose Registers
Special Function Registers
External SRAM
8051 General Purpose Registers
• Some 8-bit registers of the 8051: A, R0, R1, R2, R3, R4, R5, R6, R7
• Some 8051 16-bit registers: DPTR (DPH and DPL), PC
Note: A = accumulator, PC = program counter, DPTR = data pointer
8051 Special Function Registers(SFRs)
Pin Description of the 8051
P1.0 1 40 Vcc
P1.1 2 39 P0.0(AD0)
P1.2 3 38 P0.1(AD1)
P1.3 4 37 P0.2(AD2)
P1.4 5 36 P0.3(AD3)
P1.5 6 35 P0.4(AD4)
P1.6 7 34 P0.5(AD5)
P1.7 8 33 P0.6(AD6)
RST 9 32 P0.7(AD7)
(RXD)P3.0 10 31 EA/VPP
(TXD)P3.1 11 30 ALE/PROG
(INT0)P3.2 12 29 PSEN
(INT1)P3.3 13 28 P2.7(A15)
(T0)P3.4 14 27 P2.6(A14)
(T1)P3.5 15 26 P2.5(A13)
(WR)P3.6 16 25 P2.4(A12)
(RD)P3.7 17 24 P2.3(A11)
XTAL2 18 23 P2.2(A10)
XTAL1 19 22 P2.1(A9)
GND 20 21 P2.0(A8)
Pins of 8051
• Vcc（pin 40）：
– Vcc provides supply voltage to the chip.
– The voltage source is +5V.
• GND（pin 20）：ground
• XTAL1 and XTAL2（pins 19,18）

Pins of 8051
• RST (pin 9): reset
– It is an input pin and is active high (normally low).
• The pulse must stay high for at least 2 machine cycles.
– It is a power-on reset.
• Upon applying a high pulse to RST, the microcontroller will reset and all
values in registers will be lost.
Pins of 8051
• /EA （pin 31）：external access

– There is no on-chip ROM in 8031 and 8032 .
– The /EA pin is connected to GND to indicate the code is
stored externally.
– /PSEN ＆ ALE are used for external ROM.
– For 8051, /EA pin is connected to Vcc.
– “/” means active low.
• /PSEN （pin 29）：program store enable
– This is an output pin and is connected to the OE pin of the
ROM.
Pins of 8051
 ALE （pin 30）：address latch enable
– It is an output pin and is active high.
– 8051 port 0 provides both address and data.
– The ALE pin is used for de-multiplexing the address
and data by connecting to the G pin of the 74LS373
latch.
 I/O port pins

– The four ports P0, P1, P2, and P3.
– Each port uses 8 pins.
– All I/O pins are bi-directional.
Pins of I/O Port
• The 8051 has four I/O ports
– Port 0 （pins 32-39）：P0（P0.0～P0.7）
– Port 1（pins 1-8）：P1（P1.0～P1.7）
– Port 2（pins 21-28）：P2（P2.0～P2.7）
– Port 3（pins 10-17）：P3（P3.0～P3.7）
– Each port has 8 pins.
• Named P0.X （X=0,1,...,7）, P1.X, P2.X, P3.X
• Example：P0.0 is the bit 0（LSB）of P0
• Example：P0.7 is the bit 7（MSB）of P0
• These 8 bits form a byte.
• Each port can be used as input or output (bi-direction).

Port 0
• It can be used for both input and output.
• Each pin must be connected externally to a 10K-ohm pull-up resistor to use
the pins of port 0 as both input and output ports (open drain).
• Dual role of PORT 0
– It is used for both address and data (for 8031).
– It is also designated as AD0-AD7 for address to be
connected to an external memory.
• It must be assigned 1 for input and 0 for output.

Port 1
• It can be used for both input or output.
• Does not need any pull-up resistors.

Port 2
• Can be used for both input or output.
• Dual role of PORT 2

– It is used along with P0 to provide 16-bit address for
external memory(for 8031).
– It is also designated as A8-A15 for address to be

connected to an external memory.
Port 3
• Can be used for both input or output.
• Port 3 Alternate Functions
Minimum hardware connection for 8051
based system
8051 Instruction Set
• The 8051 instruction set is divided into five functional groups:
–Arithmetic instructions
–Logical instructions
–Data transfer instructions
–Boolean variable instructions
–Program branching instructions
Arithmetic Instructions
• With arithmetic instructions, 8051 performs all the arithmetic operations.
• Arithmetic operations affect the flags, such as Carry Flag (CY), Overflow
Flag (OV) etc, in the PSW register.
Note:
- [@Ri] implies contents of memory location pointed to by R0 or R1
- Rn refers to registers R0-R7 of the currently selected register bank
Logical Instructions
• Logical instructions perform Boolean operations (AND, OR, XOR, and NOT) on data bytes on a bit-by-bit basis.
Data Transfer Instructions
• Data transfer instructions can be used
to transfer data between an internal
RAM location and an SFR location
without going through the
accumulator.
• It is also possible to transfer data between the internal and external RAM by using indirect addressing.
Boolean Variable Instructions
• The 8051 processor can perform single-bit operations.
• The operations include set, clear, AND, OR, and complement instructions.
• Also included are bit-level moves and conditional jump instructions.
Program Branching Instructions
• Program branching instructions are used to control the flow of program execution.
• Some instructions provide decision-making capabilities before transferring control to other parts of the program (conditional branches).
8051 Programming with C and Assembly
Write an assembly program that continuously toggles the value of port 0.
// Absolute subroutine call
// Short Jump
Corresponding C program:
#include <regx51.h>
void Delay(unsigned int);
void main(void)
{
P0=0x00; //make P0 an output port
while(1)
{
P0=0x55;
Delay(200);
P0=0xAA;
Delay(200);
}
}
void Delay(unsigned int n)
{
unsigned int i, j;
for(i=0; i<n; i++)
for(j=0; j<1275; j++);
}
Write an assembly program to get data from P0 and send it to P1.
MOV A, #00H
MOV P1, A
Corresponding C program:
#include <regx51.h>
void main(void)
{
P0=0xFF; //make P0 an input port
P1=0x00; //make P1 an output port
while(1)
{
P1=P0;
}
}
Interfacing 7 segment display with 8051
There are two basic types of 7-segment displays:
1. Common Cathode: all the segments share the same cathode.
2. Common Anode: all the segments share the same anode.
Only the Common Anode type is discussed here. In a Common Anode display, a segment is turned ON by setting the corresponding pin to 0 and turned OFF by setting it to 1.
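The active-low rule for a Common Anode display can be checked with a small host-side C sketch. It assumes a wiring that these notes do not state: segments a to g on port bits 0 to 6 and the decimal point on bit 7. The table and the function name common_anode_code are illustrative only. Starting from the set of lit segments for each digit, inverting every bit gives the byte to write to the port.

```c
#include <stdint.h>

/* Segments that are lit for digits 0-9, one bit per segment:
   bit0 = a, bit1 = b, ... bit6 = g, bit7 = decimal point (assumed wiring).
   A 1 in this table means "segment lit". Digit 9 is drawn here without
   the bottom segment d. */
static const uint8_t lit_segments[10] = {
    0x3F, 0x06, 0x5B, 0x4F, 0x66, 0x6D, 0x7D, 0x07, 0x7F, 0x67
};

/* Hypothetical helper: port byte for a common-anode display, where a
   segment is turned ON by driving its pin to 0. Returns 0xFF (all
   segments off) for anything that is not a single decimal digit. */
uint8_t common_anode_code(unsigned digit)
{
    if (digit > 9u) {
        return 0xFFu;
    }
    return (uint8_t)~lit_segments[digit];
}
```

Under this assumed wiring, common_anode_code(0) gives 0xC0 and common_anode_code(1) gives 0xF9, which are the first two values written to P2 in the display programs in these notes.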
Port connection
Lookup Table for 7 Segment Decoding
Hardware connection of 7 segment with
8051
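Assembly program to display 0 to 9 in 7 segment display
; the DELAY subroutine is assumed to be defined elsewhere
MOV A, #00H
MOV P2, A ; make P2 an output port
MOV P2, #0C0H ; 0
ACALL DELAY
MOV P2, #0F9H ; 1
ACALL DELAY
MOV P2, #0A4H ; 2
ACALL DELAY
MOV P2, #0B0H ; 3
ACALL DELAY
MOV P2, #99H ; 4
ACALL DELAY
MOV P2, #92H ; 5
ACALL DELAY
MOV P2, #82H ; 6
ACALL DELAY
MOV P2, #0F8H ; 7
ACALL DELAY
MOV P2, #80H ; 8
ACALL DELAY
MOV P2, #98H ; 9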
C Program
#include <regx51.h>
void Delay(unsigned int);
void main(void)
{
P2=0x00; //make P2 an output port
P2=0xC0; //0
Delay(200);
P2=0xF9; //1
Delay(200);
P2=0xA4; //2
Delay(200);
P2=0xB0; //3
Delay(200);
P2=0x99; //4
Delay(200);
P2=0x92; //5
Delay(200);
P2=0x82; //6
Delay(200);
P2=0xF8; //7
Delay(200);
P2=0x80; //8
Delay(200);
P2=0x98; //9
}
void Delay(unsigned int n)
{
unsigned int i, j;
for(i=0; i<n; i++)
for(j=0; j<1275; j++);
}
Timers
• The 8051 has two timers/counters, which can be used either as
– Timers to generate a time delay
– or as Event counters to count events happening outside the microcontroller
• Both Timer 0 and Timer 1 are 16 bits wide.
• Since the 8051 has an 8-bit architecture, each 16-bit timer is accessed as two separate registers, a low byte and a high byte.
– The low byte register is called TL0/TL1 and
– The high byte register is called TH0/TH1
Timer 0
Timer 1
Both Timer 0 and Timer 1 use the same register, called TMOD (timer mode), to set the various timer operation modes.
When GATE=0, the timer is started and stopped by software (the TR0/TR1 bits).
When GATE=1, the timer is started and stopped by hardware (the external INT0/INT1 pin, with TR0/TR1 set).
TCON (Timer control) Register
Timer mode 1 programming
• The following are the characteristics and operations of mode 1:
– It is a 16-bit timer; therefore, it allows values of 0000 to FFFFH to be loaded into the timer's registers TL and TH.
– After TH and TL are loaded with a 16-bit initial value, the timer must be started.
– This is done by setting TR0 high for Timer 0 and TR1 high for Timer 1.
– After the timer is started, it starts to count up.
– It counts up until it reaches its limit of FFFFH.
– When it rolls over from FFFFH to 0000, it sets high a flag bit called TF (timer flag). Each timer has its own timer flag: TF0 for Timer 0, and TF1 for Timer 1.
– When this timer flag is raised, one option would be to stop the timer.
– After the timer reaches its limit and rolls over, in order to repeat the process TH and TL must be reloaded with the original value, and TF must be cleared to 0.
Delay generation process
1. Load the TMOD register with a value indicating which timer (Timer 0 or Timer 1) is to be used and which timer mode (0 or 1) is selected.
2. Load registers TL and TH with the initial count value.
3. Start the timer.
4. Keep monitoring the timer flag (TF) until it is raised.
5. Stop the timer.
6. Clear the TF flag for the next round.
7. Go back to Step 2 to load TH and TL again.
How to calculate values to be loaded into
TH and TL
Assuming XTAL = 11.0592 MHz, one timer count takes one machine cycle of 12 / 11.0592 MHz ≈ 1.085 µs. The TH and TL values can then be found with the following steps:
1. Divide the desired time delay by 1.085 µs.
2. Calculate 65536 – n, where n is the decimal value from Step 1.
3. Convert the result of Step 2 to hex and call its four digits yyxx; this is the initial value to be loaded into the timer's registers.
4. Set TL = xx and TH = yy.
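The steps above can be sketched as a small host-side C function. This is an illustration under stated assumptions, not code for the 8051 itself: the name timer_reload is hypothetical, a classic 12-clock 8051 at 11.0592 MHz is assumed, and the count is computed from the exact timer clock (921600 counts per second) rather than the rounded 1.085 µs, so results may differ by a count or two from a hand calculation.

```c
#include <stdint.h>

/* Assumed timer clock: XTAL / 12 for a classic 8051 at 11.0592 MHz. */
#define TIMER_HZ 921600ULL

/* Hypothetical helper: compute the mode 1 (16-bit) reload bytes for a
   delay given in microseconds. Returns 0 on success, or -1 if the delay
   is shorter than one count or longer than 65536 counts (about 71 ms). */
int timer_reload(uint32_t delay_us, uint8_t *th, uint8_t *tl)
{
    /* Step 1: number of timer counts n, rounded to the nearest count. */
    uint64_t n = ((uint64_t)delay_us * TIMER_HZ + 500000ULL) / 1000000ULL;
    if (n == 0 || n > 65536ULL) {
        return -1;
    }
    /* Step 2: initial value = 65536 - n. */
    uint32_t initial = (uint32_t)(65536ULL - n);
    /* Steps 3 and 4: split the 16-bit value into yy (TH) and xx (TL). */
    *th = (uint8_t)(initial >> 8);
    *tl = (uint8_t)(initial & 0xFFu);
    return 0;
}
```

For a 5 ms delay this gives n = 4608 and an initial value of EE00H, so TH = EEH and TL = 00H. For 56 ms it gives 3666H under the same assumptions.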
Program that generates 56 ms delay
Interrupt
• Concept behind Interrupt
– Interrupt vs Polling
– What is the advantage of having interrupt based
system over polling system
• Interrupt Process:
Upon activation of an interrupt, the microcontroller goes through the following steps:
1. It finishes the instruction it is executing and saves the address of the next instruction (PC) on the stack.
2. It also saves the current status of all the interrupts internally (i.e., not on the stack).
3. It jumps to a fixed location in memory, called the interrupt vector table, that holds the address of the ISR.
4. The microcontroller gets the address of the ISR from the interrupt vector table and jumps to it.
o It starts to execute the interrupt service subroutine until it reaches the last instruction of the subroutine, which is RETI (return from interrupt).
5. Upon executing the RETI instruction, the microcontroller returns to the place where it was interrupted and starts executing from that address.
Interrupt sources of 8051
Level-triggered (the line is normally HIGH and a LOW level triggers the interrupt) and edge-triggered (falling edge)

Example
Serial communication
• Different ways of communication
– wireless
– Wired
• Protocol: a set of rules agreed by both the sender and receiver on
– How the data is packed
– How many bits constitute a character
– When the data begins and ends
VHDL
CHAPTER TEN
DESIGN PROCESS
Introduction
• VHDL stands for VHSIC Hardware Description Language, where VHSIC is an abbreviation for Very High Speed Integrated Circuit. It can describe the behavior and structure of electronic systems, and is particularly suited to describing digital electronic hardware designs such as ASICs and FPGAs, as well as conventional digital circuits.
• It was developed under the US Department of Defense (DoD) VHSIC program in the early 1980s and was officially standardized as IEEE 1076 in 1987.
Levels of representation and abstraction
• A digital system can be represented at different levels of abstraction. This keeps the description and design of complex systems manageable. The figure below shows the different levels of abstraction.
Figure 1 : Levels of abstraction: Behavioral, Structural and Physical

 Behavioral: The highest level of abstraction that

describes a system in terms of what it does (or how it
behaves) rather than in terms of its components and
interconnection between them. A behavioral description
specifies the relationship between the input and output
signals. This could be a Boolean expression or a more
abstract description such as the Register Transfer or
Algorithmic level.
 Example, let us consider a simple circuit that warns car
passengers when the door is open or the seatbelt is not
used whenever the car key is inserted in the ignition lock.
At the behavioral level this could be expressed as,
Warning = Ignition_on AND ( Door_open OR Seatbelt_off)

 Structural: The structural level, on the other hand,

describes a system as a collection of gates and
components that are interconnected to perform a
desired function. A structural description could be
compared to a schematic of interconnected logic
gates. It is a representation that is usually closer to
the physical realization of a system. For the example
above, the structural representation is shown
in Figure below.
Figure 2: Structural representation of a “buzzer” circuit.

Basic Structure of a VHDL file
 A digital system in VHDL consists of a design entity that

can contain other entities that are then considered
components of the top-level entity. Each entity is
modeled by an entity declaration and an architecture
body.
 One can consider the entity declaration as the interface

to the outside world that defines the input and output
signals, while the architecture body contains the
description of the entity and is composed of
interconnected entities, processes and components, all
operating concurrently, as schematically shown in Figure
below. In a typical design there will be many such entities
connected together to perform the desired function.
Figure 3: A VHDL entity consisting of an interface (entity declaration) and a body (architectural
description).
 VHDL uses reserved keywords that cannot be used

as signal names or identifiers.
 Keywords and user-defined identifiers are case
insensitive.
 Lines with comments start with two adjacent
hyphens (--) and will be ignored by the compiler.
 VHDL also ignores line breaks and extra spaces.
 VHDL is a strongly typed language which implies
that one has always to declare the type of every
object that can have a value, such as signals,
constants and variables.
Entity Declaration:
 The entity declaration defines the NAME of the entity
and lists the input and output ports. The general form is
as follows,
entity NAME_OF_ENTITY is
port (signal_names: mode type;
signal_names: mode type;
:
signal_names: mode type);
end [NAME_OF_ENTITY] ;
 An entity always starts with the keyword entity,

followed by its name and the keyword is. Next are
the port declarations using the keyword port. An
entity declaration always ends with the
keyword end, optionally [] followed by the name of
the entity.
 The NAME_OF_ENTITY is a user-selected
identifier.
 signal_names consists of a comma separated list of
one or more user-selected identifiers that specify
external interface signals.
mode: one of the reserved words that indicate the signal direction:
• in – indicates that the signal is an input
• out – indicates that the signal is an output of the entity whose value can only be read by other entities that use it.
• buffer – indicates that the signal is an output of the entity whose value can be read inside the entity's architecture
• inout – the signal can be an input or an output.
type: a built-in or user-defined signal type. Examples of types are bit, bit_vector, boolean, character, std_logic, and std_ulogic.
• bit – can have the value 0 or 1
• bit_vector – is a vector of bit values (e.g. bit_vector (0 to 7))
• std_logic, std_ulogic, std_logic_vector, std_ulogic_vector: can have 9 values to indicate the value and strength of a signal. Std_ulogic and std_logic are preferred over the bit or bit_vector types.
• boolean – can have the value TRUE or FALSE
• integer – can have a range of integer values
• real – can have a range of real values
• character – any printing character
• time – to indicate time
For the example of Figure 2 above, the entity

declaration looks as follows.
-- comments: example of the buzzer circuit of fig. 2

entity BUZZER is
port(DOOR,IGNITION,SBELT: in std_logic;
WARNING: out std_logic);
end BUZZER;
Note: try out for Mux 4 to 1, D-flipflop

Architecture body
 The architecture body specifies how the circuit operates and
how it is implemented.
The architecture body looks as follows,
architecture architecture_name of NAME_OF_ENTITY is

-- Declarations
-- components declarations
-- signal declarations
-- constant declarations
-- type declarations
begin
-- Statements
:
end architecture_name;
Behavioral model
 The architecture body for the example of Figure 2,
described at the behavioral level, is given below,
architecture behavioral of BUZZER is

begin
WARNING <= (not DOOR and IGNITION) or (not SBELT and IGNITION);
end behavioral;
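One way to check that this signal assignment matches the earlier behavioral expression, Warning = Ignition_on AND (Door_open OR Seatbelt_off), is to enumerate all input combinations. The C sketch below is only a host-side check and not part of the VHDL design. It assumes that DOOR = 1 means the door is closed and SBELT = 1 means the seatbelt is fastened, which is how the not operators in the assignment read; the function names are illustrative.

```c
#include <stdbool.h>

/* Form used in the VHDL architecture body. */
bool warning_vhdl_form(bool door, bool ignition, bool sbelt)
{
    return (!door && ignition) || (!sbelt && ignition);
}

/* Form from the behavioral description, using the assumed polarity:
   door_open = not DOOR, seatbelt_off = not SBELT. */
bool warning_spec_form(bool door, bool ignition, bool sbelt)
{
    bool door_open = !door;
    bool seatbelt_off = !sbelt;
    return ignition && (door_open || seatbelt_off);
}

/* Returns true if both forms agree on all 8 input combinations. */
bool forms_agree(void)
{
    for (unsigned v = 0; v < 8u; v++) {
        bool door = (v & 1u) != 0;
        bool ignition = (v & 2u) != 0;
        bool sbelt = (v & 4u) != 0;
        if (warning_vhdl_form(door, ignition, sbelt) !=
            warning_spec_form(door, ignition, sbelt)) {
            return false;
        }
    }
    return true;
}
```

The two forms differ only by factoring out the common IGNITION term, so forms_agree() is expected to return true.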
 The header line of the architecture body defines the

architecture name, e.g. behavioral, and associates it
with the entity, BUZZER.
 The architecture name can be any legal identifier.
 The main body of the architecture starts with the
keyword begin.
 The “<= ” symbol represents an
assignment operator and assigns the value of the
expression on the right to the signal on the left.
 The architecture body ends with an end keyword
followed by the architecture name.
Note: Try out for Basic Gates Like AND, OR, NOT
Complete program looks like
library ieee;
use ieee.std_logic_1164.all;
entity BUZZER is
port(DOOR,IGNITION,SBELT: in std_logic;
WARNING: out std_logic);
end BUZZER;
architecture behavioral of BUZZER is

begin
WARNING <= (not DOOR and IGNITION) or (not SBELT and IGNITION);
end behavioral;
Concurrency
Structural description
 The circuit of Figure 2 can also be described using a
structural model that specifies what gates are used
and how they are interconnected. The following example
illustrates it.
architecture structural of BUZZER is
-- Declarations
component AND2
port (in1, in2: in std_logic;
out1: out std_logic);
end component;
component OR2
port (in1, in2: in std_logic;
out1: out std_logic);
end component;
component NOT1
port (in1: in std_logic;
out1: out std_logic);
end component;
-- declaration of signals used to interconnect gates
signal DOOR_NOT, SBELT_NOT, B1, B2: std_logic;
begin
-- Component instantiations statements
U0: NOT1 port map (DOOR, DOOR_NOT);
U1: NOT1 port map (SBELT, SBELT_NOT);
U2: AND2 port map (IGNITION, DOOR_NOT, B1);
U3: AND2 port map (IGNITION, SBELT_NOT, B2);
U4: OR2 port map (B1, B2, WARNING);
end structural;
 Following the header is the declarative part that

gives the components (gates) that are going to be
used in the description of the circuits.
 In our example, we use a two- input AND gate, two-
input OR gate and an inverter. These gates have to
be defined first.
 The statements after the begin keyword gives the
instantiations of the components and describes how
these are interconnected.
label: component-name port map (port1=>signal1, port2=>signal2, … portn=>signaln);
Library and Package
 A library can be considered as a place where the

compiler stores information about a design project.
A VHDL package is a file or module that contains
declarations of commonly used objects, data type,
component declarations, signal, procedures and
functions that can be shared among different VHDL
models.
 std_logic is defined in the package

ieee.std_logic_1164 in the ieee library. In order to
use the std_logic one needs to specify the library and
package.
• This is done at the beginning of the VHDL file using the library and the use keywords as follows:
library ieee;
use ieee.std_logic_1164.all;
• The .all extension indicates to use all of the ieee.std_logic_1164 package.
 The Xilinx Foundation Express comes with several

packages.
ieee Library:
 std_logic_1164 package: defines the standard data types
 std_logic_arith package: provides arithmetic, conversion and
comparison functions for the signed, unsigned, integer,
std_ulogic, std_logic and std_logic_vector types
 std_logic_misc package: defines supplemental types, subtypes,
constants and functions for the std_logic_1164 package.
To use any of these one must include the library and use
clause:
library ieee;
use ieee.std_logic_arith.all;
use ieee.std_logic_unsigned.all;
Lexical Elements of VHDL
Identifiers
 Identifiers are user-defined words used to name objects in
VHDL modules. We have seen examples of identifiers for
input and output signals as well as the name of a design entity
and architecture body.
 When choosing an identifier one needs to follow these basic

rules:
 May contain only alpha-numeric characters (A to Z, a to z, 0-9) and the
underscore (_) character.
 The first character must be a letter and the last one cannot be an
underscore.
 An identifier cannot include two consecutive underscores.
 An identifier is case insensitive (ex. And2 and AND2 or and2 refer to the
same object)
 An identifier can be of any length.
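The identifier rules listed above can be expressed as a short host-side C check. This is a sketch only: the function name is_basic_identifier is hypothetical, the default C locale is assumed so that only A to Z and a to z count as letters, and the check does not reject reserved words.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true if s follows the basic rules for a
   VHDL identifier listed in these notes. Reserved words are NOT checked. */
bool is_basic_identifier(const char *s)
{
    /* The first character must be a letter (this also rejects ""). */
    if (s == NULL || !isalpha((unsigned char)s[0])) {
        return false;
    }
    size_t i;
    for (i = 1; s[i] != '\0'; i++) {
        unsigned char c = (unsigned char)s[i];
        if (c == '_') {
            /* Two consecutive underscores are not allowed. */
            if (s[i - 1] == '_') {
                return false;
            }
        } else if (!isalnum(c)) {
            /* Only letters, digits and the underscore are allowed. */
            return false;
        }
    }
    /* The last character cannot be an underscore. */
    return s[i - 1] != '_';
}
```

Because identifiers are case insensitive, a complete tool would also fold case before comparing two names; that step is left out here.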
Keywords (Reserved words)

 Certain identifiers are used by the system as keywords for
special use such as specific constructs. These keywords
cannot be used as identifiers for signals or objects we
define.
Numbers
 The default number representation is the decimal
system. VHDL allows integer literals and real literals.
Integer literals consist of whole numbers without a
decimal point, while real literals always include a
decimal point. Exponential notation is allowed using
the letter “E” or “e”. For integer literals the exponent
must always be positive. Examples are:
Integer literals: 12 10 -100

Real literals: 1.2 256.24 3.14E-2
Characters, Strings and Bit Strings
• A single quotation mark is used to represent a character literal in VHDL, as shown in the examples below:
'a', 'B', ','
• A string of characters is placed in double quotation marks, as shown:
"This is a string"
• A bit string represents a sequence of bit values. To indicate a bit string, one places a 'B' in front of the string: B"1001". One can also write bit strings in the hexadecimal or octal base by using the X or O specifiers, respectively. Some examples are:
Binary: B"1100_1001", b"1001011"
Hexadecimal: X"C9", X"4b"
Octal: O"311", o"113"
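To see that the binary, hexadecimal and octal examples describe the same numbers, the literals can be decoded on a host machine. The C sketch below is illustrative only: bit_string_value is a hypothetical helper, it compares numeric values and ignores the length of the bit string, it is more lenient than VHDL about where underscores may appear, and it does not guard against overflow on very long literals.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: decode a literal written as B"...", O"..." or
   X"..." (specifier in either case) into its numeric value.
   Underscores between digits are skipped. Returns false on bad input. */
bool bit_string_value(const char *lit, unsigned long *value)
{
    unsigned base;
    if (lit == NULL || value == NULL) {
        return false;
    }
    switch (tolower((unsigned char)lit[0])) {
    case 'b': base = 2;  break;
    case 'o': base = 8;  break;
    case 'x': base = 16; break;
    default:  return false;
    }
    if (lit[1] != '"') {
        return false;
    }
    unsigned long v = 0;
    bool any_digit = false;
    const char *p = lit + 2;
    for (; *p != '\0' && *p != '"'; p++) {
        if (*p == '_') {
            continue;                       /* readability separator */
        }
        int c = tolower((unsigned char)*p);
        unsigned d;
        if (c >= '0' && c <= '9') {
            d = (unsigned)(c - '0');
        } else if (c >= 'a' && c <= 'f') {
            d = (unsigned)(c - 'a') + 10u;
        } else {
            return false;
        }
        if (d >= base) {
            return false;                   /* digit not valid in this base */
        }
        v = v * base + d;
        any_digit = true;
    }
    /* Require a closing quote as the final character and at least one digit. */
    if (*p != '"' || p[1] != '\0' || !any_digit) {
        return false;
    }
    *value = v;
    return true;
}
```

With this helper, B"1100_1001", X"C9" and O"311" all decode to 201, and b"1001011", X"4b" and o"113" all decode to 75.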
Data Objects: Signals, Variables, Constants
Constant
 A constant can have a single value of a given type and cannot be
changed during the simulation. A constant is declared as follows,
constant list_of_name_of_constant: type [ := initial value] ;
 where the initial value is optional. Constants can be declared at the

start of an architecture and can then be used anywhere within the
architecture. Constants declared within a process can only be used
inside that specific process.
constant RISE_FALL_TME: time := 2 ns;

constant DELAY1: time := 4 ns;
constant RISE_TIME, FALL_TIME: time:= 1 ns;
constant DATA_BUS: integer:= 16;
Variable
 A variable may be changed during program execution. Variable
value is updated using a variable assignment statement. The
variable is updated without any delay as soon as the statement is
executed. Variables must be declared inside a process (and are local
to the process). The variable declaration is as follows:
variable list_of_variable_names: type [ := initial value] ;
A few examples follow:
variable CNTR_BIT: bit := '0';

variable VAR1: boolean :=FALSE;
variable SUM: integer range 0 to 256 :=16;
variable STS_BIT: std_logic_vector (7 downto 0);
Signal
 Signals are similar to wires on a schematic, and can be used to
interconnect concurrent elements of the design.
signal list_of_signal_names: type [ := initial value] ;
signal SUM, CARRY: std_logic;

signal CLOCK: bit;
signal TRIGGER: integer :=0;
signal DATA_BUS: std_logic_vector (7 downto 0);
signal VALUE: integer range 0 to 100;
• Signals are updated when their signal assignment statement is

executed, after a certain delay.
Data Types
 Each data object has a type associated with it. The

type defines the set of values that the object can have
and the set of operations that are allowed on it.
 It is not allowed to assign a value of one type to an

object of another data type (e.g. assigning an integer
to a bit type is not allowed).
 Data Types defined in the Standard Package

VHDL has several predefined types in
the standard package as shown in the table below:
Type Conversion
 Since VHDL is a strongly typed language one

cannot assign a value of one data type to a signal of
a different data type.
Operators
 VHDL supports different classes of operators that

operate on signals, variables and constants. The
different classes of operators are summarized below.
Sequential Statements
Process
• A PROCESS is a sequential section of VHDL code. It is characterized by the presence of IF, WAIT, CASE, or LOOP statements and by a sensitivity list (except when WAIT is used). A process is executed every time a signal in the sensitivity list changes (or the condition related to WAIT is fulfilled). Its syntax is shown below:
[label:] PROCESS (sensitivity list)
[VARIABLE name : type [range] [:= initial_value];]
BEGIN
(sequential statements)
END PROCESS [label];
An example of a positive edge-triggered D flip-flop
library ieee;
use ieee.std_logic_1164.all;
entity DFF_CLEAR is
port (CLK, CLEAR, D : in std_logic;
Q : out std_logic);
end DFF_CLEAR;
architecture BEHAV_DFF of DFF_CLEAR is
begin
DFF_PROCESS: process (CLK, CLEAR)
begin
-- asynchronous clear has priority over the clock
if (CLEAR = '1') then
Q <= '0';
elsif (CLK'event and CLK = '1') then
Q <= D;
end if;
end process;
end BEHAV_DFF;
If- statements
• The if statement selects which sequence of statements to execute, depending on one or more conditions. The syntax is as follows:
if condition then
sequential statements
[ elsif condition then
sequential statements ]
[ else
sequential statements ]
end if;
Example:
if S1='0' and S0='0' then
Z <= A;
elsif S1='0' and S0='1' then
Z <= B;
else
Z <= C;
end if;
Case Statements
• The case statement executes one of several sequences of statements, based on the value of a single expression. The syntax is as follows:
case expression is
when choices =>
sequential statements
when choices =>
sequential statements
-- branches are allowed
[ when others => sequential statements ]
end case;
Example:
case VALUE is
when 51 to 60 =>
D <= '1';
when 61 to 70 | 71 to 75 =>
C <= '1';
when 76 to 85 =>
B <= '1';
when 86 to 100 =>
A <= '1';
when others =>
F <= '1';
end case;
Finite State Machine (FSM)
 A sequential logic unit which

 Takes an input and a current state
 Produces an output and a new state
 It is called a Finite State Machine because it can

have, at most, a finite number of states.
 It is composed of a combinational logic unit and flip-

flops placed in such a way as to maintain state
information.
• It can also be represented using a state diagram, in which each transition between states (for example State0 and State1) is labeled Input / Output.
Figure: FSM Diagram
Finite State Machine Design
 Design a custom processor that calculates Greatest

Common Divisor (GCD) using FSM.
 STEPS:
Algorithm:
0: int x, y;
1: while (1) {
2: while (!go_i);
3: x = x_i;
4: y = y_i;
5: while (x != y) {
6: if (x < y)
7: y = y - x;
else
8: x = x - y;
}
9: d_o = x;
}
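The inner loop of the algorithm (steps 5 to 8) can be tried on a host machine as an ordinary C function. This is a minimal sketch: the name gcd_subtract is illustrative, and the guard for zero inputs is an addition that the listing above does not have, because the subtraction loop would never terminate if either input were 0.

```c
/* Subtraction-based GCD, mirroring steps 5-8 of the algorithm.
   Returns 0 for a zero input (added guard, not part of the listing). */
unsigned gcd_subtract(unsigned x, unsigned y)
{
    if (x == 0u || y == 0u) {
        return 0u;
    }
    while (x != y) {
        if (x < y) {
            y = y - x;   /* step 7 */
        } else {
            x = x - y;   /* step 8 */
        }
    }
    return x;            /* step 9: d_o = x */
}
```

For example, gcd_subtract(10, 5) returns 5 and gcd_subtract(12, 9) returns 3.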
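State Diagram:
State 1 (Idle): stay in Idle while start=0; go to Initialize when start=1
State 2 (Initialize): x=x_i, y=y_i; go to Check4Condition
State 3 (Check4Condition): if x>y go to Update_x; if y>x go to Update_y; if x=y go to Get Result
State 4 (Update_x): x=x-y; return to Check4Condition
State 5 (Update_y): y=y-x; return to Check4Condition
State 6 (Get Result): output the result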
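The six states of this diagram can be exercised with a small software model before writing any VHDL. The C sketch below is a host-side illustration only: the type and function names are hypothetical, each call to gcd_fsm_step stands for one clock tick, and returning to Idle after Get Result is an assumption about the diagram.

```c
typedef enum { IDLE, INIT, CHECK, UPDATE_X, UPDATE_Y, GET_RESULT } gcd_state;

typedef struct {
    gcd_state state;
    unsigned x, y;     /* working registers */
    unsigned result;   /* valid once done is 1 */
    int done;
} gcd_fsm;

/* Advance the machine by one clock tick (hypothetical model). */
void gcd_fsm_step(gcd_fsm *m, int start, unsigned x_i, unsigned y_i)
{
    switch (m->state) {
    case IDLE:
        if (start) { m->done = 0; m->state = INIT; }
        break;
    case INIT:
        m->x = x_i; m->y = y_i; m->state = CHECK;
        break;
    case CHECK:
        if (m->x == m->y)     m->state = GET_RESULT;
        else if (m->x > m->y) m->state = UPDATE_X;
        else                  m->state = UPDATE_Y;
        break;
    case UPDATE_X:
        m->x -= m->y; m->state = CHECK;
        break;
    case UPDATE_Y:
        m->y -= m->x; m->state = CHECK;
        break;
    case GET_RESULT:
        m->result = m->x; m->done = 1; m->state = IDLE;
        break;
    }
}

/* Run from Idle with start held at 1. Returns the number of ticks used,
   or -1 if the result is not ready within max_ticks. */
int gcd_fsm_run(gcd_fsm *m, unsigned x_i, unsigned y_i, int max_ticks)
{
    m->state = IDLE;
    m->done = 0;
    for (int t = 1; t <= max_ticks; t++) {
        gcd_fsm_step(m, 1, x_i, y_i);
        if (m->done) {
            return t;
        }
    }
    return -1;
}
```

With inputs 10 and 5 the model passes through Idle, Initialize, Check4Condition, Update_x, Check4Condition and Get Result, so the result 5 is available after 6 ticks. A zero input never reaches Get Result, which is why the run function takes a tick limit.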

VHDL Coding:
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;
use ieee.std_logic_arith.all;
entity gcd is
port( clk : in std_logic;
reset: in std_logic;
num_1: in unsigned(3 downto 0);
num_2: in unsigned(3 downto 0);
gcd_num: out unsigned(3 downto 0)
);
end entity;
architecture behav of gcd is

type state is (idle, init, check, update_x, update_y, get_result );
signal pr_state: state:=idle;
signal nx_state: state;
signal flag: std_logic:='0';

signal start: std_logic:='1';
begin
--Sequential Section
sequential:process(clk,reset) is
begin
if(reset='1') then
pr_state<=idle;
elsif(clk'event and clk='1') then
pr_state<=nx_state;
end if;
end process sequential;
--Combinational Section
combinational:process(pr_state,num_1,num_2) is
variable temp_x: unsigned(3 downto 0):=(others=>'0');

variable temp_y: unsigned(3 downto 0):=(others=>'0');
begin
case pr_state is
when idle=>
if(start='1') then
nx_state<=init;
start<='0';
else
nx_state<=idle;
end if;
when init=>
temp_x:=num_1;
temp_y:=num_2;
nx_state<=check;
when check=>
if(temp_x=temp_y) then
nx_state<=get_result;
elsif(temp_x>temp_y) then
nx_state<=update_x;
else
nx_state<=update_y;
end if;
when update_x=>
temp_x:=temp_x-temp_y;
nx_state<=check;
when update_y=>
temp_y:=temp_y-temp_x;
nx_state<=check;
when get_result=>
gcd_num<=temp_x;
nx_state<=idle;
start<='1';
end case;
end process combinational;
end architecture;
Testbench:
LIBRARY ieee ;
USE ieee.std_logic_1164.all ;
USE ieee.std_logic_arith.all ;
USE ieee.std_logic_unsigned.all ;
ENTITY gcd_tb IS
END ;
ARCHITECTURE gcd_tb_arch OF gcd_tb IS

SIGNAL num_1 : unsigned (3 downto 0) ;
SIGNAL gcd_num : unsigned (3 downto 0) ;
SIGNAL num_2 : unsigned (3 downto 0) ;
SIGNAL clk : std_logic ;
SIGNAL reset : std_logic ;
constant clk_period : time := 10 ns;
COMPONENT gcd
PORT (
num_1 : in unsigned (3 downto 0) ;
gcd_num : out unsigned (3 downto 0) ;
num_2 : in unsigned (3 downto 0) ;
clk : in std_logic ;
reset : in std_logic );
END COMPONENT ;
BEGIN
DUT : gcd
PORT MAP (
num_1 => num_1 ,
gcd_num => gcd_num ,
num_2 => num_2 ,
clk => clk ,
reset => reset ) ;
clk_process :process
begin
clk <= '0';
wait for clk_period/2;
clk <= '1';
wait for clk_period/2;
end process;
stim_proc: process
begin
-- wait 5 ns, then drive reset low and apply the first stimulus
wait for 5 ns;
reset<='0';
num_1<="1010";
num_2<="0101";
wait for 10 ns;
num_1<="1100";
num_2<="1001";
wait for 10 ns;
num_1<="1111";
num_2<="1101";
--wait for 10 ms;
wait;
end process;
END ;
Simulation Result
Synthesis Result
