Surendra Shrestha
surendra@ioe.edu.np, surendtha@gmail.com
Education:
• Post Doc. (Graphene Tech.), University Polytechnica de Madrid, Spain
PROFESSIONAL EXPERIENCE:
Embedded computing systems
– Computing systems embedded within electronic devices
– Hard to define. Nearly any computing system other than a desktop computer
– Billions of units produced yearly, versus millions of desktop units
– Lots more of these, though they cost a lot less each
General Purpose Computing System vs. Embedded System
• Definition
– General purpose: a combination of generic hardware and a General Purpose Operating System for executing a variety of applications
– Embedded: a combination of special-purpose hardware and an embedded OS for executing a specific set of applications
• Operating system
– General purpose: contains a General Purpose Operating System (GPOS)
– Embedded: may or may not contain an operating system for functioning
• Alterability
– General purpose: applications are alterable (programmable) by the user; it is possible for the end user to re-install the operating system, and also to add or remove user applications
– Embedded: the firmware is pre-programmed and non-alterable by the end user (there may be exceptions for systems supporting OS kernel image flashing through special hardware settings)
General Purpose Computing System vs. Embedded System (cont.)
• Selection criteria
– General purpose: performance is the key deciding factor in the selection of the system; always, ‘Faster is Better’
– Embedded: application-specific requirements (like performance, power requirements, memory usage, etc.) are the key deciding factors
• Power management
– General purpose: less tailored, or not at all, towards reduced operating power requirements and options for different levels of power management
– Embedded: highly tailored to take advantage of the power saving modes supported by the hardware and the operating system
• Response time
– General purpose: response requirements are not time-critical
– Embedded: for certain categories of ESs, like mission-critical systems, the response time requirement is highly critical
• Determinism
– General purpose: need not be deterministic in execution behavior
– Embedded: execution behavior is deterministic for certain types of ESs, like ‘Hard Real Time’ systems
A “short list” of embedded systems
•Anti-lock brakes •Modems
•Auto-focus cameras •MPEG decoders
•Automatic teller machines •Network cards
•Automatic toll systems •Network switches/routers
•Automatic transmission •Pagers
•Avionic systems •Photocopiers
•Battery chargers •Point-of-sale systems
•Camcorders •Portable video games
•Cell phones •Printers
•Cell-phone base stations •Satellite phones
•Cordless phones •Scanners
•Cruise control •Smart ovens/dishwashers
•Digital cameras •Speech recognizers
•Disk drives •Stereo systems
•Electronic card readers •Teleconferencing systems
•Electronic instruments •Televisions
•Electronic toys/games •Temperature controllers
•Factory control •Theft tracking systems
•Fax machines •TV set-top boxes
•Fingerprint identifiers •VCRs, DVD players
•Home security systems •Video game consoles
•Life-support systems •Video phones
•Medical testing systems •Washers and dryers
And the list goes on and on … … …
Some common characteristics of ESs
• Single-functioned
– Executes a single program, repeatedly
• Tightly-constrained
– Low cost, low power, small, fast, etc.
• Reactive and real-time
– Continually reacts to changes in the system’s
environment
– Must compute certain results in real-time without
delay
An embedded system example – a digital camera
[Figure: digital camera chip, with lens and CCD]
Classification of embedded systems
1. Based on generation
2. Based on complexity and performance requirements
3. …
4. Based on triggering
Classification based on Generation:
• First Generation: ESs were built around 8-bit microprocessors like the 8085 and Z80, and 4-bit microcontrollers. Hardware circuits were simple, with firmware developed in assembly code, e.g. telephone keypads, stepper motor control units.
• Second Generation: ESs were built around 16-bit microprocessors and 8- or 16-bit microcontrollers, following the first-generation ESs. The instruction sets of the second-generation processors/controllers were much more complex and powerful than those of the first generation. Some second-generation ESs contained embedded operating systems for their operation, e.g. Data Acquisition Systems, SCADA (Supervisory Control And Data Acquisition) systems.
Classification based on Generation: …
• Third Generation: With advances in processor technology, ES developers started making use of powerful 32-bit processors and 16-bit microcontrollers for their designs, e.g. DSPs, Application Specific Integrated Circuits (ASICs), and processors like the Intel Pentium and Motorola 68K.
• Common metrics
– Unit cost: the monetary cost of manufacturing each copy of the
system, excluding NRE cost
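Unit cost is usually weighed against the NRE (non-recurring engineering) cost, the one-time cost of designing the system. As a minimal sketch, the usual relations total cost = NRE cost + unit cost × number of units and per-product cost = total cost / number of units can be written in C; the function names are illustrative only.

```c
/* Total cost of a production run: the one-time NRE cost plus the
   recurring unit cost paid for every copy manufactured. */
double total_cost(double nre_cost, double unit_cost, unsigned long units)
{
    return nre_cost + unit_cost * (double)units;
}

/* Per-product cost: the NRE cost is amortized over all units produced.
   units must be greater than zero. */
double per_product_cost(double nre_cost, double unit_cost, unsigned long units)
{
    return total_cost(nre_cost, unit_cost, units) / (double)units;
}
```

For example, with a hypothetical NRE cost of 2000 and unit cost of 100, per_product_cost(2000.0, 100.0, 10) gives 300, while producing 1000 units brings it down to 102, which is why high-volume products can tolerate large NRE costs.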
Chapter – 2
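• A custom single-purpose processor may be …
[Figure: single-purpose processors on a chip, e.g. memory controller, ISA bus interface, UART, LCD controller]
[Figure: a transistor as a switch (conducts if gate = 1), and an IC cross-section: IC package, gate, oxide, source, channel, drain, silicon substrate]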
CMOS transistor implementations
• Complementary Metal Oxide Semiconductor (CMOS)
– Typically 0 is 0 V, 1 is 5 V
– Two transistor types: nMOS (conducts if gate = 1) and pMOS (conducts if gate = 0)
Basic logic gates
• Driver: F = x      • Inverter: F = x'
• AND: F = xy       • NAND: F = (xy)'
• OR: F = x + y     • NOR: F = (x + y)'
• XOR: F = x ⊕ y    • XNOR: F = (x ⊕ y)'

x y | AND OR XOR | NAND NOR XNOR
0 0 |  0   0   0 |   1    1    1
0 1 |  0   1   1 |   1    0    0
1 0 |  0   1   1 |   1    0    0
1 1 |  1   1   0 |   0    0    1
Combinational logic design
A) Problem description
y is 1 if a is 1, or if b and c are both 1. z is 1 if b or c is 1, but not both, or if all are 1.
B) Truth table
a b c | y z
0 0 0 | 0 0
0 0 1 | 0 1
0 1 0 | 0 1
0 1 1 | 1 0
1 0 0 | 1 0
1 0 1 | 1 1
1 1 0 | 1 1
1 1 1 | 1 1
C) Output equations
y = a'bc + ab'c' + ab'c + abc' + abc
z = a'b'c + a'bc' + ab'c + abc' + abc
D) Minimized output equations
K-map for y (rows a, columns bc = 00 01 11 10):
a=0: 0 0 1 0
a=1: 1 1 1 1
y = a + bc
K-map for z (rows a, columns bc = 00 01 11 10):
a=0: 0 1 0 1
a=1: 0 1 1 1
z = ab + b'c + bc'
E) Logic gates
[Figure: gate-level circuits for y and z with inputs a, b, c]
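The minimization can be checked mechanically. The C sketch below (function names are hypothetical) evaluates the canonical sum-of-products equations from step C and the minimized equations from step D for all eight input rows and reports whether they agree.

```c
/* Canonical sum-of-products outputs, read directly from the truth table. */
int y_sop(int a, int b, int c)
{
    return (!a && b && c) || (a && !b && !c) || (a && !b && c) ||
           (a && b && !c) || (a && b && c);
}

int z_sop(int a, int b, int c)
{
    return (!a && !b && c) || (!a && b && !c) || (a && !b && c) ||
           (a && b && !c) || (a && b && c);
}

/* Minimized equations obtained from the K-maps. */
int y_min(int a, int b, int c) { return a || (b && c); }
int z_min(int a, int b, int c) { return (a && b) || (!b && c) || (b && !c); }

/* Returns 1 if the minimized equations agree with the canonical ones
   for all 8 input combinations, 0 otherwise. */
int minimized_equations_match(void)
{
    for (int row = 0; row < 8; row++) {
        int a = (row >> 2) & 1, b = (row >> 1) & 1, c = row & 1;
        if (y_sop(a, b, c) != y_min(a, b, c) || z_sop(a, b, c) != z_min(a, b, c))
            return 0;
    }
    return 1;
}
```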
Combinational components
• n-bit, m × 1 Multiplexor: inputs I(m-1) … I1 I0 (n bits each), select lines S0 … S(log m), output O (n bits)
• log n × n Decoder: inputs I(log n - 1) … I0, outputs O(n-1) … O1 O0
• n-bit Adder: inputs A, B; outputs carry, sum
• n-bit Comparator: inputs A, B; outputs less, equal, greater
• n-bit, m-function ALU: inputs A, B, select lines S0 … S(log m); output O
Sequential components
• n-bit Register: inputs I, load, clear; output Q
– Q = 0 if clear = 1; I if load = 1 and clock = 1; Q(previous) otherwise
• n-bit Shift register: inputs I, shift; output Q
– Q = lsb; content shifted; I stored in msb
• n-bit Counter: inputs count, clear; output Q
– Q = 0 if clear = 1; Q(prev) + 1 if count = 1 and clock = 1
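Sequential logic design
[Figure: state diagram with states 0–3 and input a; a=1 moves to the next state (3 returns to 0), a=0 stays in the current state; output x=0 in states 0, 1, 2 and x=1 in state 3]
• Given this implementation model
– Sequential logic design quickly reduces to combinational logic design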
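Sequential logic design (cont.)
E) Minimized Output Equations
K-map for I1 (rows a, columns Q1Q0 = 00 01 11 10):
a=0: 0 0 1 1
a=1: 0 1 0 1
I1 = Q1'Q0a + Q1a' + Q1Q0'
K-map for I0 (rows a, columns Q1Q0 = 00 01 11 10):
a=0: 0 1 1 0
a=1: 1 0 0 1
I0 = Q0a' + Q0'a
K-map for x (rows a, columns Q1Q0 = 00 01 11 10):
a=0: 0 0 1 0
a=1: 0 0 1 0
x = Q1Q0
F) Combinational Logic
[Figure: gate-level circuits producing I1, I0 and x from a, Q1 and Q0]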
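To confirm that these equations behave as the state diagram intends, the sketch below simulates the two-bit state register in C: each call to fsm_next applies I1 and I0 as one clock edge, and fsm_output applies x = Q1Q0. The struct and function names are illustrative, not part of the original design.

```c
/* State register bits Q1 Q0. */
typedef struct { int q1, q0; } fsm_state;

/* Output equation: x = Q1 Q0 */
int fsm_output(fsm_state s) { return s.q1 && s.q0; }

/* Next-state equations:
   I1 = Q1'Q0a + Q1a' + Q1Q0'
   I0 = Q0a' + Q0'a            */
fsm_state fsm_next(fsm_state s, int a)
{
    fsm_state n;
    n.q1 = (!s.q1 && s.q0 && a) || (s.q1 && !a) || (s.q1 && !s.q0);
    n.q0 = (s.q0 && !a) || (!s.q0 && a);
    return n;
}

/* Clocks the FSM `cycles` times with input a held constant, starting in
   state 0, and returns how many cycles produced x = 1. */
int count_x_high(int a, int cycles)
{
    fsm_state s = {0, 0};
    int high = 0;
    for (int i = 0; i < cycles; i++) {
        high += fsm_output(s);
        s = fsm_next(s, a);
    }
    return high;
}
```

With a held at 1 the machine cycles through states 0, 1, 2, 3, so count_x_high(1, 8) returns 2 (x is 1 once every four cycles); with a held at 0 it stays in state 0.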
Custom single-purpose processor basic model
[Figure: (a) controller and datapath: external control inputs/outputs connect to the controller, external data inputs/outputs to the datapath, linked by datapath control inputs and datapath control outputs; (b) controller: next-state and control logic with a state register; (c) datapath: registers and functional units]
• Convert algorithm to “complex” state machine
[Figure: GCD algorithm (inputs go_i, x_i, y_i; output d_o) converted into a state diagram using templates; states include 2-J, 3: x = x_i, 4: y = y_i, 9: d_o = x, 1-J]
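For reference, the GCD behaviour that the state diagram encodes can be written as a plain C function. This is a sketch assuming the usual subtraction-based GCD; the comments map each statement to the state labels that survive in the diagram (3 to 9), and the go_i handshake of states 1 and 2 is modelled simply by calling the function.

```c
/* Subtraction-based GCD. Comments give the matching state labels.
   Both inputs must be positive: with a zero input the loop would never
   terminate. */
unsigned gcd(unsigned x_i, unsigned y_i)
{
    unsigned x = x_i;           /* 3: x = x_i   */
    unsigned y = y_i;           /* 4: y = y_i   */
    while (x != y) {            /* 5: x != y    */
        if (x < y)              /* 6: x < y     */
            y = y - x;          /* 7: y = y - x */
        else
            x = x - y;          /* 8: x = x - y */
    }
    return x;                   /* 9: d_o = x   */
}
```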
State diagram templates
Creating the datapath
• Create a register for any declared variable
• Create a functional unit for each arithmetic operation
• Connect the ports, registers and functional units
– Based on reads and writes
– Use multiplexors for multiple sources
• Create unique identifier
[Figure: GCD state diagram (states 1, 2, 2-J, 3: x = x_i, 4: y = y_i, 5: x != y, 6: x < y, 7: y = y - x, 8: x = x - y, 6-J, 5-J, 9: d_o = x, 1-J) beside its datapath; example state encodings 1010 (5-J), 1011 (9: d_ld = 1), 1100 (1-J)]
Controller state table for the Greatest Common Divisor (GCD) example
(* = don't-care input, X = don't-care output)
Inputs                         | Outputs
Q3 Q2 Q1 Q0 x_neq_y x_lt_y go_i | I3 I2 I1 I0 x_sel y_sel x_ld y_ld d_ld
0 0 0 0 * * * | 0 0 0 1 X X 0 0 0
0 0 0 1 * * 0 | 0 0 1 0 X X 0 0 0
0 0 0 1 * * 1 | 0 0 1 1 X X 0 0 0
0 0 1 0 * * * | 0 0 0 1 X X 0 0 0
0 0 1 1 * * * | 0 1 0 0 0 X 1 0 0
0 1 0 0 * * * | 0 1 0 1 X 0 0 1 0
0 1 0 1 0 * * | 1 0 1 1 X X 0 0 0
0 1 0 1 1 * * | 0 1 1 0 X X 0 0 0
0 1 1 0 * 0 * | 1 0 0 0 X X 0 0 0
0 1 1 0 * 1 * | 0 1 1 1 X X 0 0 0
0 1 1 1 * * * | 1 0 0 1 X 1 0 1 0
1 0 0 0 * * * | 1 0 0 1 1 X 1 0 0
1 0 0 1 * * * | 1 0 1 0 X X 0 0 0
1 0 1 0 * * * | 0 1 0 1 X X 0 0 0
1 0 1 1 * * * | 1 1 0 0 X X 0 0 1
1 1 0 0 * * * | 0 0 0 0 X X 0 0 0
1 1 0 1 * * * | 0 0 0 0 X X 0 0 0
1 1 1 0 * * * | 0 0 0 0 X X 0 0 0
1 1 1 1 * * * | 0 0 0 0 X X 0 0 0
Completing the GCD custom single-purpose processor design
• We finished the … design, but we see the basic steps
[Figure: a view inside the controller and datapath]
RT-level custom single-purpose processor design
• We often start with a state machine
– Rather than algorithm
– Cycle timing often central to functionality
• Example: Bridge
Problem Specification
A single-purpose processor that converts two 4-bit inputs, arriving one at a time over data_in along with a rdy_in pulse, into one 8-bit output on data_out along with a rdy_out pulse.
[Figure: Sender → Bridge → Receiver, with rdy_in, data_in(4), rdy_out, data_out and a clock]
(a) Controller
[Figure: FSM with states WaitFirst4, RecFirst4Start (data_lo_ld=1), RecFirst4End, …, Send8Start (data_out_ld=1, rdy_out=1), Send8End (rdy_out=0); transitions on rdy_in=0 / rdy_in=1]
(b) Datapath
[Figure: data_in(4) loaded into registers data_hi (data_hi_ld) and data_lo (data_lo_ld), which together load data_out (data_out_ld); clk to all registers]
Optimizing single-purpose processors
• Optimization is the task of making design metric values the best possible
• Optimization opportunities
– original program
– FSMD
– datapath
– FSM
Optimizing the original program
…
Optimizing the FSMD
• merge state 2 and state 2-J – no loop operation in between them
• merge state 5 and state 6 – transitions from state 6 can be done in state 5
• eliminate states 5-J and 6-J – transitions from each state can be done from state 7 and state 8, respectively
• eliminate state 1-J – transition from state 1-J can be done directly from state 9
[Figure: original and optimized GCD state diagrams (2-J: x = x_i, 3: y = y_i, 5: x != y, 7: y = y - x, 8: x = x - y, 9: d_o = x)]
Optimizing the datapath
Chapter – 3
– Datapath is general
– Control unit doesn't store the algorithm – the algorithm is “programmed” into the memory
[Figure: processor (control unit with PC and IR; datapath) connected to memory and I/O]
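Datapath Operations
• Load
– Read memory location into register
• ALU operation
– Input certain registers through ALU, store back in register
• Store
– Write register to memory location
[Figure: processor (control unit, datapath with ALU and registers) and memory, with example values 10 and 11]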
Control Unit
• Control unit: configures the datapath operations
– Sequence of desired operations (“instructions”) stored in memory – “program”
• Instruction cycle – broken into several sub-operations, each one clock cycle (fetch, decode, fetch operands, execute, store results)
• PC: program counter, always points to next instruction
• IR: holds the fetched instruction
[Figure: processor (control unit: controller, PC = 100, IR = load R0, M[500]; datapath: ALU, registers R0 and R1, control/status lines) connected to I/O and memory holding 100: load R0, M[500]; 101: inc R1, R0; 102: store M[501], R1; 500: 10; 501: ...]
Control Unit Sub-Operations
• Decode
– Determine what the instruction means
[Figure: PC = 100, IR = load R0, M[500]; memory 100: load R0, M[500]; 101: inc R1, R0; 102: store M[501], R1; 500: 10]
Control Unit Sub-Operations
• Fetch
– Get next instruction into IR
• Fetch operands
– Move data from memory to datapath register (for load R0, M[500], the value 10 at address 500 moves into R0)
• Execute
– Move data through the ALU
– This particular instruction does nothing during this sub-operation
• Store results
– Write data from register to memory
– This particular instruction does nothing during this sub-operation
[Figure: PC = 100, IR = load R0, M[500], R0 = 10; memory 100: load R0, M[500]; 101: inc R1, R0; 102: store M[501], R1; 500: 10]
Instruction Cycles
• PC=100: load R0, M[500]
– Fetch, Decode, Fetch ops, Exec., Store result (one clock cycle each)
– Afterwards R0 = 10
• PC=101: inc R1, R0
– Afterwards R1 = 11
• PC=102: store M[501], R1
– Afterwards M[501] = 11
[Figure: processor (PC, IR, R0, R1) and memory (100: load R0, M[500]; 101: inc R1, R0; 102: store M[501], R1; 500: 10; 501: 11 after the store) at each step, with a clock timeline]
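The three instruction cycles above can be mimicked in a few lines of C. This is a sketch under assumptions: the opcode names, the instr struct and the array-based memories are hypothetical, and each loop iteration collapses the fetch, decode, fetch-operands, execute and store sub-operations into one step.

```c
/* Hypothetical encoding of the three instructions used in the example. */
typedef enum { OP_LOAD, OP_INC, OP_STORE } opcode;
typedef struct { opcode op; int rd, rs, addr; } instr;

typedef struct {
    int pc;      /* program counter: address of next instruction */
    instr ir;    /* instruction register: the fetched instruction */
    int reg[2];  /* R0, R1 */
} cpu;

/* Runs instructions from program memory (indexed by address) until the
   PC reaches `end`. Each iteration is one complete instruction cycle. */
void run(cpu *p, const instr *prog, int *mem, int end)
{
    while (p->pc < end) {
        p->ir = prog[p->pc];     /* fetch: load IR from program memory */
        p->pc = p->pc + 1;       /* PC now points to the next instruction */
        switch (p->ir.op) {      /* decode, fetch operands, execute, store */
        case OP_LOAD:  p->reg[p->ir.rd] = mem[p->ir.addr];      break;
        case OP_INC:   p->reg[p->ir.rd] = p->reg[p->ir.rs] + 1; break;
        case OP_STORE: mem[p->ir.addr] = p->reg[p->ir.rs];      break;
        }
    }
}
```

Loading the example program at addresses 100 to 102 with M[500] = 10 and starting with PC = 100 leaves R0 = 10, R1 = 11 and M[501] = 11, matching the figure.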
Architectural Considerations
• N-bit processor
– … 16-bit, 32-bit common
– Desktop/servers: 32-bit, even 64
• PC size determines address space
[Figure: processor (control unit with PC and IR; datapath) with memory and I/O]
Architectural Considerations
• Clock frequency
– Inverse of clock period
– Clock period must be longer than the longest register-to-register delay in the entire processor
– Memory access is …
[Figure: processor with controller, control/status lines, ALU, registers, PC, IR, and I/O]
[Figure: laundry analogy, wash and dry stages for loads 1–8, non-pipelined vs. pipelined]
[Figure: pipelined instruction execution; stages Fetch-instr., Decode, Execute and Store res. overlap for instructions 1–8 over time]
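The benefit of pipelining can be quantified with a small calculation. Assuming k stages of one cycle each and no stalls (an idealisation), n instructions take n × k cycles without pipelining but only k + (n - 1) cycles with it, because once the pipeline fills, one instruction completes every cycle. The function names are illustrative.

```c
/* Cycles to finish n instructions on a processor with k one-cycle stages,
   assuming no stalls or hazards. */
unsigned long cycles_non_pipelined(unsigned long n, unsigned long k)
{
    return n * k;                /* each instruction runs all k stages alone */
}

unsigned long cycles_pipelined(unsigned long n, unsigned long k)
{
    /* k cycles to fill the pipeline, then one instruction completes per cycle */
    return n == 0 ? 0 : k + (n - 1);
}
```

For the 8 instructions in the figure and the four stages shown, that is 32 cycles versus 11.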
pipelined instruction execution
Superscalar and VLIW Architectures
• Performance can be improved by:
– Faster clock (but there’s a limit)
– Pipelining: slice up instruction into stages, overlap
stages
– Multiple ALUs to support more than one instruction
stream
• Superscalar
– Scalar: non-vector operations
– Fetches instructions in batches, executes as many as possible
» May require extensive hardware to detect independent instructions
• VLIW (Very Long Instruction Word): each word in memory has multiple independent instructions
» Relies on the compiler to detect and schedule instructions
» Currently growing in popularity
Two Memory Architectures
• Princeton
– Fewer memory wires
• Harvard
– Simultaneous program and data memory access
[Figure: Harvard: processor with separate program memory and data memory; Princeton: processor with a single memory holding program and data]
Cache Memory
• Memory access may be slow
• Cache
– Holds copy of part of memory
[Figure: processor, cache (fast/expensive technology, usually on the same chip) and memory]
• Instruction Set
– Defines the legal set of instructions for that processor
• Data transfer: memory/register, register/register, I/O, etc.
• Arithmetic/logical: move register through ALU and back
• Branches: determine next PC value when not just PC+1
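A Simple (Trivial) Instruction Set
[Table: columns Assembly instruct., First byte, Second byte, Operation; rows not recoverable]
Addressing modes
• Immediate – the operand field holds the data
• Register-direct – the operand field holds a register address; that register holds the data
• Register indirect – the operand field holds a register address; that register holds a memory address, and memory holds the data
• …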
Sample Programs
[Figure: C program and equivalent assembly program]
• Assemblers
• Linkers
• Debuggers
• Profilers
[Figure: implementation phase (… linker, library → executable file) and verification phase (debugger, profiler)]
Running a Program
• If development processor is different than
target, how can we run our compiled code? Two
options:
– Download to target processor
– Simulate
• Simulation
– One method: Hardware description language
• But slow, not always available
– Another method: Instruction set simulator (ISS)
• Runs on development processor, but executes instructions
of target processor
Instruction Set Simulator For A Simple Processor
#include <stdio.h>

typedef struct {
    unsigned char first_byte, second_byte;
} instruction;
/* … remainder of the listing not recoverable … */
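The simulator listing survives only in fragments, and the instruction-set table it depends on is not recoverable. The following is therefore a minimal ISS sketch under an assumed encoding: the opcode numbers and the MOV/ADD/SUB/JZ instruction set are assumptions chosen to fit the two-byte instruction struct and the addressing modes listed earlier, not the original table. It runs on the development processor while interpreting target instructions one at a time.

```c
typedef struct {
    unsigned char first_byte, second_byte;
} instruction;

/* Assumed opcodes in the upper 4 bits of first_byte; the lower 4 bits
   hold Rn. second_byte holds a direct address, an immediate value, a
   signed relative offset, or Rm in its upper 4 bits. */
enum {
    MOV_RN_DIRECT = 0,  /* MOV Rn, direct  : Rn = M(direct)            */
    MOV_DIRECT_RN = 1,  /* MOV direct, Rn  : M(direct) = Rn            */
    MOV_AT_RN_RM  = 2,  /* MOV @Rn, Rm     : M(Rn) = Rm                */
    MOV_RN_IMMED  = 3,  /* MOV Rn, #immed. : Rn = immed.               */
    ADD_RN_RM     = 4,  /* ADD Rn, Rm      : Rn = Rn + Rm              */
    SUB_RN_RM     = 5,  /* SUB Rn, Rm      : Rn = Rn - Rm              */
    JZ_RN_REL     = 6   /* JZ Rn, relative : PC = PC + rel if Rn == 0  */
};

unsigned char memory[256];  /* data memory of the simulated processor */
unsigned char reg[16];      /* register file R0..R15 */

/* Executes the target program on the host. Returns 0 when the PC leaves
   the program, -1 on an unknown opcode. */
int run_program(const instruction *program, int num_instr)
{
    int pc = 0;  /* index of the next instruction to execute */
    while (pc >= 0 && pc < num_instr) {
        unsigned char fb = program[pc].first_byte;
        unsigned char sb = program[pc].second_byte;
        unsigned char rn = fb & 0x0f;  /* register in the first byte */
        unsigned char rm = sb >> 4;    /* register in the second byte */
        int next = pc + 1;
        switch (fb >> 4) {
        case MOV_RN_DIRECT: reg[rn] = memory[sb];      break;
        case MOV_DIRECT_RN: memory[sb] = reg[rn];      break;
        case MOV_AT_RN_RM:  memory[reg[rn]] = reg[rm]; break;
        case MOV_RN_IMMED:  reg[rn] = sb;              break;
        case ADD_RN_RM:     reg[rn] += reg[rm];        break;
        case SUB_RN_RM:     reg[rn] -= reg[rm];        break;
        case JZ_RN_REL:     /* offset counts instructions from this one */
            if (reg[rn] == 0)
                next = pc + (signed char)sb;
            break;
        default:
            return -1;
        }
        pc = next;
    }
    return 0;
}
```

A complete ISS would typically read the program bytes from a file (for example with fopen and fread) before calling run_program, and print the data memory afterwards.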
Sources: Intel, Motorola, MIPS, ARM, TI, and IBM Website/Datasheet; Embedded Systems Programming, Nov. 1998
Designing a General Purpose Processor
[Figure: FSMD of the processor, e.g. op code 0001 leads to state Mov2: M[dir] = RF[rn], then back to Fetch]
• Register file holds each of the variables
• Functional units to carry out the FSMD operations
– One ALU carries out every required operation
• Unique identifiers created for every control signal
[Figure: controller (next-state and control logic; state register) driving the datapath control signals RFwa, RFw, RFwe, RFr1a, RFr1e, RFr2a, RFr2e for the 16-register file RF (16) with read ports RFr1 and RFr2, and Irld, PCld, PCinc, PCclr for IR and PC]
A Simple Microprocessor
[Figure: FSM beginning with Reset: PC=0; PCclr=1; …]
Chapter – 4
Memory
4. Memory [5 Hrs.]
• RAM
– Misnamed as all semiconductor memory is random access
– Read/Write
– Volatile
– Temporary storage
– Static or dynamic
Memory Cell Operation
Dynamic RAM
• Bits stored as charge in capacitors
• Charges leak
• Need refreshing even when powered
• Simpler construction
• Smaller per bit
• Less expensive
• Need refresh circuits
• Slower
• Main memory
• Essentially analogue
– Level of charge determines value
Dynamic RAM Structure
DRAM Operation
• Address line active when bit read or written
– Transistor switch closed (current flows)
• Write
– Voltage to bit line
• High for 1 low for 0
– Then signal address line
• Transfers charge to capacitor
• Read
– Address line selected
• transistor turns on
– Charge from capacitor fed via bit line to sense amplifier
• Compares with reference value to determine 0 or 1
– Capacitor charge must be restored
Static RAM
• Bits stored as on/off switches
• No charges to leak
• No refreshing needed when powered
• More complex construction
• Larger per bit
• More expensive
• Does not need refresh circuits
• Faster
• Cache
• Digital
– Uses flip-flops
Static RAM Structure
Static RAM Operation
• Transistor arrangement gives stable logic state
• State 1
– C1 high, C2 low
– T1 T4 off, T2 T3 on
• State 0
– C2 high, C1 low
– T2 T3 off, T1 T4 on
• Address line transistors T5 and T6 act as switches
• Write – apply value to B and its complement to B'
• Read – value is on line B
Basic types of RAM
• SRAM: Static RAM
– Memory cell uses flip-flop to store bit
– Requires 6 transistors
– Holds data as long as power supplied
[Figure: SRAM memory cell internals, with Data and Data' lines]
Read Only Memory (ROM)
• Permanent storage
– Nonvolatile
• Microprogramming
• Library subroutines
• Systems programs (BIOS)
• Function tables
Types of ROM
• Written during manufacture
– Very expensive for small runs
• Programmable (once)
– PROM
– Needs special equipment to program
• Read “mostly”
– Erasable Programmable (EPROM)
• Erased by UV
– Electrically Erasable (EEPROM)
• Takes much longer to write than read
– Flash memory
• Erase whole memory electrically
Organisation in detail
• A 16 Mbit chip can be organised as 1M of 16-bit words
• A bit-per-chip system has 16 lots of 1 Mbit chips, with bit 1 of each word in chip 1 and so on
• A 16 Mbit chip can be organised as a 2048 × 2048 × 4-bit array
– Reduces number of address pins
• Multiplex row address and column address
• 11 pins to address (2^11 = 2048)
• Adding one more pin doubles the range of both row and column values, so ×4 capacity
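The pin arithmetic can be checked with a short C helper. It assumes power-of-two dimensions and simply counts address bits; with row and column addresses multiplexed over the same pins, the pin count is the larger of the two widths rather than their sum. The function names are illustrative.

```c
/* Number of address bits needed to select one of `count` items
   (count assumed to be a power of two). */
unsigned address_bits(unsigned long count)
{
    unsigned bits = 0;
    while ((1UL << bits) < count)
        bits++;
    return bits;
}

/* Address pins for an array whose row and column addresses are
   multiplexed over the same pins (row first, then column). */
unsigned multiplexed_address_pins(unsigned long rows, unsigned long cols)
{
    unsigned r = address_bits(rows), c = address_bits(cols);
    return r > c ? r : c;
}
```

For the 2048 × 2048 × 4-bit organisation, multiplexed_address_pins(2048, 2048) is 11, whereas a non-multiplexed interface would need 22 address pins; a 12th pin would give a 4096 × 4096 array, four times the capacity.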
ROM: “Read-Only” Memory
• Nonvolatile memory
• Can be read from but not written to, by a processor in an embedded system
• Uses
– Store software program for general-purpose processor
• program instructions can be one or more ROM words
– Store constant data needed by system
– Implement combinational circuit
[Figure: external view of a ROM with address inputs … Ak-1 and data outputs Qn-1 … Q0]
Example: 8 x 4 ROM
• Horizontal lines = words
• Vertical lines = data
• Lines connected only at circles
[Figure: internal view of an 8 × 4 ROM]
Memory hierarchy
• Want inexpensive, fast memory
• Main memory
– Large, inexpensive, slow memory stores entire program and data
• Cache
– Small, expensive, fast memory stores copy of likely accessed parts of larger memory
– Can be multiple levels of cache
Cache
• Usually designed with SRAM
– faster but more expensive than DRAM
• Usually on same chip as processor
– space limited, so much smaller than off-chip main memory
– faster access (1 cycle vs. several cycles for main memory)
• Cache operation:
– Request for main memory access (read or write)
– First, check cache for copy
• cache hit
– copy is in cache, quick access
• cache miss
– copy not in cache, read address and possibly its neighbors into cache
• Valid bit
– indicates whether data in slot has been loaded from memory
• Offset
– used to find particular word in cache line
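In a direct-mapped cache each main-memory address is split into tag, index and offset fields. The sketch below assumes a hypothetical cache with 16-byte lines (4 offset bits) and 64 lines (6 index bits); the field widths and names are illustrative. A lookup is a hit only if the indexed line's valid bit is set and its stored tag matches.

```c
/* Hypothetical field widths: 2^OFFSET_BITS bytes per line,
   2^INDEX_BITS lines in the cache. */
enum { OFFSET_BITS = 4, INDEX_BITS = 6 };

typedef struct { unsigned tag, index, offset; } cache_fields;
typedef struct { int valid; unsigned tag; } cache_line;

/* Splits an address into its tag, index and offset fields. */
cache_fields split_address(unsigned addr)
{
    cache_fields f;
    f.offset = addr & ((1u << OFFSET_BITS) - 1);
    f.index  = (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1);
    f.tag    = addr >> (OFFSET_BITS + INDEX_BITS);
    return f;
}

/* Hit if the indexed line holds valid data with a matching tag. */
int is_hit(const cache_line lines[1 << INDEX_BITS], unsigned addr)
{
    cache_fields f = split_address(addr);
    return lines[f.index].valid && lines[f.index].tag == f.tag;
}
```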
Fully associative mapping
• Complete main memory address stored in each cache address
• All addresses stored in cache simultaneously compared with
desired address
• Valid bit and offset same as direct mapping
Set-associative mapping
• Compromise between direct mapping and fully
associative mapping
• Index same as in direct mapping
• But, each cache address contains content and tags of
2 or more memory address locations
• Tags of that set
simultaneously compared
as in fully associative
mapping
• Cache with set size N called
N-way set-associative
– 2-way, 4-way, 8-way are
common
Cache-replacement policy
• Technique for choosing which block to replace
– when fully associative cache is full
– when set-associative cache’s line is full
• Direct mapped cache has no choice
• Random
– replace block chosen at random
• LRU: least-recently used
– replace block not accessed for longest time
• FIFO: first-in-first-out
– push block onto queue when it is brought into the cache
– choose block to replace by popping queue
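LRU can be implemented by recording when each block was last accessed. The sketch below models a single set of a hypothetical 4-way set-associative cache, using an access counter as a timestamp; real hardware often approximates LRU with a few status bits per set rather than full timestamps.

```c
#define WAYS 4  /* hypothetical associativity */

typedef struct {
    int valid[WAYS];
    unsigned tag[WAYS];
    unsigned long last_used[WAYS];  /* timestamp of most recent access */
    unsigned long now;              /* access counter used as a clock */
} cache_set;

/* Accesses `tag` in one set; returns 1 on a hit, 0 on a miss. On a miss
   an empty way is filled first; otherwise the least-recently-used way is
   replaced. */
int access_set(cache_set *s, unsigned tag)
{
    int w, victim = -1;
    s->now++;
    for (w = 0; w < WAYS; w++) {
        if (s->valid[w] && s->tag[w] == tag) {  /* hit: refresh timestamp */
            s->last_used[w] = s->now;
            return 1;
        }
    }
    for (w = 0; w < WAYS; w++) {                /* miss: prefer an empty way */
        if (!s->valid[w]) { victim = w; break; }
    }
    if (victim < 0) {                           /* set full: evict LRU way */
        victim = 0;
        for (w = 1; w < WAYS; w++)
            if (s->last_used[w] < s->last_used[victim])
                victim = w;
    }
    s->valid[victim] = 1;
    s->tag[victim] = tag;
    s->last_used[victim] = s->now;
    return 0;
}
```

Accessing tags 1, 2, 3, 4, then 1 again, then 5 evicts tag 2, the block not accessed for the longest time.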
Cache write techniques
• When written, data cache must update main
memory
• Write-through
– write to main memory whenever cache is written to
– easiest to implement
– processor must wait for slower main memory write
– potential for unnecessary writes
• Write-back
– main memory only written when “dirty” block replaced
– extra dirty bit for each block set when cache block
written to
– reduces number of slow main memory writes
Cache impact on system performance
• Most important parameters in terms of performance:
– Total size of cache
• total number of data bytes cache can hold
• tag, valid and other housekeeping bits not included in total
– Degree of associativity
– Data block size
• Larger caches achieve lower miss rates but higher access cost e.g.,
• 2 Kbyte cache: miss rate = 15%, hit cost = 2 cycles, miss cost = 20 cycles
– avg. cost of memory access = (0.85 * 2) + (0.15 * 20) = 4.7 cycles
• 4 Kbyte cache: miss rate = 6.5%, hit cost = 3 cycles, miss cost will not change
– avg. cost of memory access = (0.935 * 3) + (0.065 * 20) = 4.105 cycles
(improvement)
• 8 Kbyte cache: miss rate = 5.565%, hit cost = 4 cycles, miss cost will not change
– avg. cost of memory access = (0.94435 * 4) + (0.05565 * 20) = 4.8904 cycles
(worse)
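The three calculations above follow one formula, sketched here in C: average cost = (1 - miss rate) × hit cost + miss rate × miss cost, where, as on this slide, the miss cost is the total cost of an access that misses. The function name is illustrative.

```c
/* Average memory-access cost in cycles. miss_cost is the full cost of an
   access that misses, as used in the examples above. */
double avg_access_cost(double miss_rate, double hit_cost, double miss_cost)
{
    return (1.0 - miss_rate) * hit_cost + miss_rate * miss_cost;
}
```

Evaluating it with the slide's figures reproduces 4.7, 4.105 and 4.8904 cycles, confirming that the 4 Kbyte cache is the best of the three.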
Cache performance trade-offs
• Improving cache hit rate without increasing size
– Increase line size
– Change set-associativity
Advanced RAM
• DRAMs commonly used as main memory in
processor based embedded systems
– high capacity, low cost
• Many variations of DRAMs proposed
– need to keep pace with processor speeds
– FPM DRAM: fast page mode DRAM
– EDO DRAM: extended data out DRAM
– SDRAM/ESDRAM: synchronous and enhanced
synchronous DRAM
– RDRAM: rambus DRAM
Basic DRAM
• Address bus multiplexed between row and column components
• Row and column addresses are latched in, sequentially, by strobing ras
and cas signals, respectively
• Refresh circuitry can be external or internal to DRAM device
– strobes consecutive memory address periodically causing memory content to
be refreshed
– Refresh circuitry disabled during read or write operation
Typical 16 Mb DRAM (4M x 4)
Packaging
Fast Page Mode DRAM (FPM DRAM)
•Each row of memory bit array is viewed as a page
•Page contains multiple words
•Individual words addressed by column address
•Timing diagram:
– row (page) address sent
– 3 words read consecutively by sending column address for each
•Extra cycle eliminated on each read/write of words from same page
Extended data out DRAM (EDO DRAM)
• Improvement of FPM DRAM
• Extra latch before output buffer
– allows strobing of cas before data read operation
completed
• Reduces read/write latency by additional cycle
Advanced DRAM Organization
• Duties of MMU
– Handles DRAM refresh, bus interface and
arbitration
– Takes care of memory sharing among multiple
processors
– Translates logical memory addresses from processor to physical memory addresses of DRAM
• Modern CPUs often come with MMU built-in
• Single-purpose processors can be used
Embedded Systems
Chapter – 5
Interfacing
5. Interfacing [6 Hrs.]
(ack – acknowledge, req – request)
A strobe/handshake compromise
ISA bus protocol – memory access
• ISA: Industry
Standard
Architecture
– Common in 80x86’s
• Features
– 20-bit address
– Compromise
strobe/handshake
control
• 4 cycles default
• Unless CHRDY (channel
ready) deasserted –
resulting in additional
wait cycles (up to 6)
Microprocessor interfacing:
I/O addressing
• A microprocessor communicates with other
devices using some of its pins
– Port-based I/O (parallel I/O)
• Processor has one or more N-bit ports
• Processor’s software reads and writes a port just like a
register; e.g., P0 = 0xFF; v = P1.2; -- P0 and P1 are 8-bit ports
– Bus-based I/O
• Processor has address, data and control ports that form a
single bus
• Communication protocol is built into the processor
• A single instruction carries out the read or write protocol on
the bus
Compromises/extensions
• Parallel I/O peripheral
– When processor only supports bus-based
I/O but parallel I/O needed
– Each port on peripheral connected to a
register within peripheral that is
read/written by the processor
• Extended parallel I/O
– When processor supports port-based I/O
but more ports needed
– One or more processor ports interface
with parallel I/O peripheral extending
total number of ports available for I/O
– e.g., extending 4 ports to 6 ports in figure
Types of bus-based I/O:
memory-mapped I/O and standard I/O
• Processor talks to both memory and peripherals using
same bus – two ways to talk to peripherals
– Memory-mapped I/O
• Peripheral registers occupy addresses in same address
space as memory
• e.g., Bus has 16-bit address
– lower 32K addresses may correspond to memory
– upper 32K addresses may correspond to peripherals
– Standard I/O (I/O-mapped I/O)
• Additional pin (M/IO) on bus indicates whether a memory
or peripheral access
• e.g., Bus has 16-bit address
– all 64K addresses correspond to memory when M/IO set to 0
– all 64K addresses correspond to peripherals when M/IO set to 1
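The two address-decoding schemes can be contrasted in C. The sketch assumes the 16-bit bus of the examples above; the enum and function names are hypothetical.

```c
typedef enum { TARGET_MEMORY, TARGET_PERIPHERAL } bus_target;

/* Memory-mapped I/O: lower 32K addresses are memory, upper 32K are
   peripherals, so address bit 15 alone selects the target. */
bus_target decode_memory_mapped(unsigned short addr)
{
    return (addr & 0x8000u) ? TARGET_PERIPHERAL : TARGET_MEMORY;
}

/* Standard (I/O-mapped) I/O: the M/IO pin selects the target, so all
   64K addresses are available to memory and to peripherals. */
bus_target decode_standard_io(unsigned short addr, int m_io)
{
    (void)addr;  /* the address does not affect which space is selected */
    return m_io ? TARGET_PERIPHERAL : TARGET_MEMORY;
}
```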
Memory-mapped I/O vs. Standard I/O
• Memory-mapped I/O
– Requires no special instructions
• Assembly instructions involving memory like MOV and
ADD work with peripherals as well
• Standard I/O requires special instructions (e.g., IN, OUT)
to move data between peripheral registers and memory
• Standard I/O
– No loss of memory addresses to peripherals
– Simpler address decoding logic in peripherals
possible
• When number of peripherals much smaller than address
space then high-order address bits can be ignored
– smaller and/or faster comparators
ISA bus
• Industry Standard Architecture (ISA) supports standard
I/O
– /IOR (I/O read) distinct from /MEMR (memory read) for peripheral read
• /IOW used for writes
– 16-bit address space for I/O vs. 20-bit address space for memory
– Otherwise very similar to memory protocol
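A basic memory protocol
[Figure: read timing with port P0 carrying Adr. 7..0 then Data, port P2 carrying Adr. 15..8, latch output Q holding Adr. 7..0, and control signals ALE and /RD]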
Interrupt-driven I/O using fixed ISR location
4(a): The ISR reads data from 0x8000, modifies the data, and writes the resulting data to 0x8001.
[Figure: program memory holds the ISR (16: MOV R0, 0x8000; 17: # modifies R0; 18: MOV 0x8001, R0; 19: RETI # ISR return) and the main program (100: instruction; 101: instruction); μP with PC = 100 and Int input; peripherals P1 (0x8000) and P2 (0x8001) on the system bus]
Interrupt address table
• Compromise between fixed and vectored
interrupts
– One interrupt pin
– Table in memory holding ISR addresses (may
be 256 words)
– Peripheral doesn’t provide ISR address, but
rather index into table
• Fewer bits are sent by the peripheral
• Can move ISR location without changing
peripheral
Additional interrupt issues
• Maskable vs. non-maskable interrupts
– Maskable: programmer can set bit that causes processor to
ignore interrupt
• Important when in the middle of time-critical code
– Non-maskable: a separate interrupt pin that can’t be masked
• Typically reserved for drastic situations, like power failure requiring
immediate backup of data to non-volatile memory
• Jump to ISR
– Some microprocessors treat jump same as call of any
subroutine
• Complete state saved (PC, registers) – may take hundreds of cycles
– Others only save partial state, like PC only
• Thus, ISR must not modify registers, or else must save them first
• Assembly-language programmer must be aware of which registers
stored
Direct memory access
• Buffering
– Temporarily storing data in memory before processing
– Data accumulated in peripherals commonly buffered
Peripheral to memory transfer without DMA, using vectored interrupt
2: P1 asserts Int to request servicing by the microprocessor.
[Figure: program memory holds the ISR (16: MOV R0, 0x8000; 17: # modifies R0; 18: MOV 0x0001, R0; 19: RETI # ISR return) and the main program (100: instruction); μP with PC = 100, Int and Inta lines; peripheral P1 with a register at address 0x8000; data memory locations 0x0000 and 0x0001 on the system bus]
Peripheral to memory transfer without DMA, using vectored interrupt (cont’)
3: After completing instruction at 100, μP sees Int asserted and asserts Inta.
[Figure: program memory, μP, system bus, P1 (0x8000) and data memory (0x0000, 0x0001); μP holds PC value 100]
Peripheral to memory transfer without DMA, using vectored interrupt (cont’)
4: P1 detects Inta and puts interrupt address vector 16 on the data bus.
[Figure: program memory, μP (PC value 100), system bus with 16 driven by P1, P1 (0x8000) and data memory (0x0000, 0x0001)]
Peripheral to memory transfer without DMA, using vectored interrupt (cont’)
5(a): μP jumps to the ISR address on the bus (16). The ISR there reads data from 0x8000 and then writes it to 0x0001, which is in memory.
[Figure: PC = 16 executing the ISR (16: MOV R0, 0x8000; 17: # modifies R0; 18: MOV 0x0001, R0; 19: RETI # ISR return), with saved PC value 100; main program at 100: instruction, 101: instruction]
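Peripheral to memory transfer with DMA
2: P1 asserts req to request servicing by the DMA controller, which in turn asserts Dreq to request control of the system bus.
[Figure: program memory, μP and data memory (0x0000, 0x0001)]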
Peripheral to memory transfer with DMA (cont’)
4: After executing … (Meanwhile, processor still executing if not stalled!)
Peripheral to memory transfer with DMA (cont’)
6: DMA de-asserts Dreq and ack. No ISR needed!
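[Figure: ISA bus DMA: processor, memory, DMA controller and I/O device on the ISA bus, with request (R) and acknowledge (A) lines]
[Figure: ISA DMA bus cycles C1–C7, showing CLOCK, ALE, /IOR, /MEMW and CHRDY for one transfer direction, and CLOCK, ALE, /MEMR, /IOW and CHRDY for the other]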
Arbitration: Priority arbiter
• Consider the situation where multiple peripherals request service from a single resource (e.g., microprocessor, DMA controller) simultaneously: which gets serviced first?
• Priority arbiter
– Single-purpose processor
– Peripherals make requests to arbiter, arbiter makes requests to resource
– Arbiter connected to system bus for configuration only
[Figure: microprocessor, priority arbiter, Peripheral1 and Peripheral2 on the system bus; the arbiter connects to the microprocessor via Int/Inta and to the peripherals via Ireq1/Iack1 and Ireq2/Iack2; numbered steps 2–7 mark the order of events]
Arbitration using a priority arbiter
Types of priority
• Fixed priority
– each peripheral has unique rank
– highest rank chosen first with simultaneous requests
– preferred when clear difference in rank between peripherals
• Rotating priority (also called round-robin)
– priority changed based on history of servicing
– better distribution of servicing, especially among peripherals with similar priority demands
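Both policies can be sketched as functions that take a bit mask of pending requests (bit i set when peripheral i requests service) and return the peripheral to grant. The four-peripheral width and the function names are assumptions for illustration.

```c
#define NUM_PERIPHERALS 4  /* hypothetical number of request lines */

/* Fixed priority: peripheral 0 has the highest rank.
   Returns the granted peripheral, or -1 if there are no requests. */
int fixed_priority_grant(unsigned req)
{
    for (int i = 0; i < NUM_PERIPHERALS; i++)
        if (req & (1u << i))
            return i;
    return -1;
}

/* Rotating (round-robin) priority: the search starts just after the
   peripheral granted last time, so it drops to the lowest priority.
   *last holds the previous grant; initialise it to NUM_PERIPHERALS - 1
   so that peripheral 0 is checked first. */
int round_robin_grant(unsigned req, int *last)
{
    for (int k = 1; k <= NUM_PERIPHERALS; k++) {
        int i = (*last + k) % NUM_PERIPHERALS;
        if (req & (1u << i)) {
            *last = i;
            return i;
        }
    }
    return -1;
}
```

With peripherals 0 and 2 requesting continuously, fixed priority always grants peripheral 0, while round_robin_grant alternates between 0 and 2.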
Arbitration: Daisy-chain arbitration
• Arbitration done by peripherals
– Built into peripheral or external logic added
– req input and ack output added to each peripheral
• Peripherals connected to each other in daisy-chain manner
– One peripheral connected to resource, all others connected “upstream”
– Peripheral’s req flows “downstream” to resource, resource’s ack flows “upstream” to requesting peripheral
– Closest peripheral has highest priority
[Figure: daisy-chain aware peripherals: Peripheral1’s Req_out drives the microprocessor’s Int and its Ack_in receives Inta; Peripheral2’s Req_out feeds Peripheral1’s Req_in and Peripheral1’s Ack_out feeds Peripheral2’s Ack_in; Peripheral2’s Req_in is tied to 0. A second version combines a priority arbiter (Ireq1/Iack1, Ireq2/Iack2) with daisy-chained peripherals]
Network-oriented arbitration
[Figure: processor, memory holding a jump table, priority arbiter, Peripheral 1 and Peripheral 2 on the memory bus; peripherals receive external data and raise interrupts]
/* _at_ is a Keil C51 extension that places a variable at a fixed address
   (an xdata qualifier may also be needed, depending on the memory model). */
volatile unsigned char ARBITER_MASK_REG      _at_ 0xfff0;
volatile unsigned char ARBITER_CH0_INDEX_REG _at_ 0xfff1;
volatile unsigned char ARBITER_CH1_INDEX_REG _at_ 0xfff2;
volatile unsigned char ARBITER_ENABLE_REG    _at_ 0xfff3;
volatile unsigned char PERIPHERAL1_DATA_REG  _at_ 0xffe0;
volatile unsigned char PERIPHERAL2_DATA_REG  _at_ 0xffe1;
typedef void (*isr_t)(void);                         // pointer to an ISR
isr_t INTERRUPT_LOOKUP_TABLE[256]            _at_ 0x0100;  // jump table

void Peripheral1_ISR(void) {
    unsigned char value;     // "data" is a reserved memory-type keyword in C51
    value = PERIPHERAL1_DATA_REG;
    // do something with the data
}
void Peripheral2_ISR(void) {
    unsigned char value;
    value = PERIPHERAL2_DATA_REG;
    // do something with the data
}
void InitializePeripherals(void) {
    ARBITER_MASK_REG = 0x03;          // enable both channels
    ARBITER_CH0_INDEX_REG = 13;       // channel 0 uses table entry 13
    ARBITER_CH1_INDEX_REG = 17;       // channel 1 uses table entry 17
    INTERRUPT_LOOKUP_TABLE[13] = Peripheral1_ISR;
    INTERRUPT_LOOKUP_TABLE[17] = Peripheral2_ISR;
    ARBITER_ENABLE_REG = 1;
}
void main(void) {
    InitializePeripherals();
    for (;;) {}                       // main program goes here
}
Intel 8237 DMA controller
[Figure: Intel 8237 pin diagram: D[7..0], A[19..0], ALE, MEMR, MEMW, IOR, IOW, HLDA, HRQ, REQ 0-3, ACK 0-3]
Signal Description
D[7..0] These wires are connected to the system bus (ISA) and are used by the microprocessor to write to the internal registers of the 8237.
A[19..0] These wires are connected to the system bus (ISA) and are used by the DMA to issue the memory location where the transferred data is to be written to. The 8237 is also addressed by the microprocessor through the lower bits of these address lines.
ALE* This is the address latch enable signal. The 8237 uses this signal when driving the system bus (ISA).
MEMR* This is the memory read signal issued by the 8237 when driving the system bus (ISA).
MEMW* This is the memory write signal issued by the 8237 when driving the system bus (ISA).
IOR* This is the I/O device read signal issued by the 8237 when driving the system bus (ISA) in order to read a byte from an I/O device.
IOW* This is the I/O device write signal issued by the 8237 when driving the system bus (ISA) in order to write a byte to an I/O device.
HLDA This signal (hold acknowledge) is asserted by the microprocessor to signal that it has relinquished the system bus (ISA).
HRQ This signal (hold request) is asserted by the 8237 to signal to the microprocessor a request to relinquish the system bus (ISA).
REQ 0,1,2,3 A device attached to one of these channels asserts this signal to request a DMA transfer.
ACK 0,1,2,3 The 8237 asserts this signal to grant a DMA transfer to a device attached to one of these channels.
*See the ISA bus description in this chapter for complete details.
Intel 8259 programmable priority controller
[Figure: Intel 8259 pin diagram: D[7..0], A[0..0], RD, WR, INT, INTA, CAS[2..0], SP/EN, IR0-IR7]
Signal Description
D[7..0] These wires are connected to the system bus and are used by the microprocessor to write or read the internal registers of the 8259.
A[0..0] This pin acts in conjunction with the WR/RD signals. It is used by the 8259 to decipher the various command words the microprocessor writes and the status the microprocessor wishes to read.
WR When this write signal is asserted, the 8259 accepts the command on the data lines, i.e., the microprocessor writes to the 8259 by placing a command on the data lines and asserting this signal.
RD When this read signal is asserted, the 8259 provides its status on the data lines, i.e., the microprocessor reads the status of the 8259 by asserting this signal and reading the data lines.
INT This signal is asserted whenever a valid interrupt request is received by the 8259, i.e., it is used to interrupt the microprocessor.
INTA This signal is used to enable 8259 interrupt-vector data onto the data bus by a sequence of interrupt acknowledge pulses issued by the microprocessor.
SP/EN This function is used in conjunction with the CAS signals for cascading purposes.
Multilevel bus architectures
• Don’t want one bus for all communication
– Peripherals would need high-speed, processor-specific bus interface
• excess gates, power consumption, and cost; less portable
– Too many peripherals slows down bus
[Figure: Microprocessor, cache, memory controller and DMA controller on the processor-local bus; peripherals on the peripheral bus; a bridge connects the two buses]
• Processor-local bus
– High speed, wide, most frequent communication
– Connects microprocessor, cache, memory controllers, etc.
• Peripheral bus
– Lower speed, narrower, less frequent communication
– Typically industry standard bus (ISA, PCI) for portability
• Bridge
– Single-purpose processor converts communication between busses
Advanced communication principles
Layering
Break complexity of communication protocol into pieces easier to
design and understand
Lower levels provide services to higher levels
Lower level might work with bits while higher level might work with packets
of data
Physical layer
Lowest level in hierarchy
Medium to carry data from one actor (device or node) to another
Parallel communication
Physical layer capable of transporting multiple bits of data
Serial communication
Physical layer transports one bit of data at a time
Wireless communication
No physical connection needed for transport at physical layer
Parallel communication
Multiple data, control, and possibly power wires
One bit per wire
Parity: extra bit sent with word used for error detection
Odd parity: data word plus parity bit contains odd number of 1’s
Even parity: data word plus parity bit contains even number of 1’s
Always detects single bit errors, but not all burst bit errors
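A small C sketch of how a sender could compute the parity bit for an 8-bit word and how a receiver could check it; the function names are illustrative.

```c
#include <stdint.h>

/* Returns 1 if the byte contains an odd number of 1 bits, else 0. */
static int count_ones_is_odd(uint8_t word)
{
    int odd = 0;
    while (word) {
        odd ^= word & 1u;   /* toggle for every 1 bit */
        word >>= 1;
    }
    return odd;
}

/* Even parity: pick the bit so that data + parity has an even number of 1s. */
int even_parity_bit(uint8_t word) { return count_ones_is_odd(word); }

/* Odd parity: pick the bit so that data + parity has an odd number of 1s. */
int odd_parity_bit(uint8_t word) { return !count_ones_is_odd(word); }

/* Receiver check for even parity: 1 means no error was detected. */
int even_parity_ok(uint8_t word, int parity_bit)
{
    return (count_ones_is_odd(word) ^ parity_bit) == 0;
}
```

Receiving 0x07 as 0x06 (one flipped bit) fails the even-parity check, but receiving it as 0x04 (two flipped bits) passes, which is why parity cannot catch every burst error.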
[Figure: Typical read/write cycle: START, A6, A5 ... A0, R/w, ACK (from servant), D8, D7 ... D0, ACK (from receiver), STOP]
Serial protocols: CAN
CAN (Controller area network)
Protocol for real-time applications
Developed by Robert Bosch GmbH
Originally for communication among components of cars
Applications now using CAN include:
elevator controllers, copiers, telescopes, production-line control
systems, and medical instruments
Data transfer rates up to 1 Mbit/s and 11-bit addressing
Common devices interfacing with CAN:
8051-compatible 8592 processor and standalone CAN controllers
Actual physical design of CAN bus not specified in protocol
Requires devices to transmit/detect dominant and recessive signals to/from
bus
e.g., ‘0’ = dominant, ‘1’ = recessive if a single data wire is used (CAN’s wired-AND convention)
Bus guarantees dominant signal prevails over recessive signal if asserted
simultaneously
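The dominant/recessive rule is what lets CAN resolve simultaneous transmissions without corrupting the winning frame. A simulation sketch in C, assuming the standard CAN convention (logic 0 dominant, wired-AND bus) and 11-bit identifiers sent most significant bit first; the function is illustrative and not a CAN controller API.

```c
#include <stdint.h>

#define ID_BITS 11   /* CAN 2.0A base identifier length */

/* Simulates bitwise arbitration among n nodes that start sending their
   identifiers at the same time. The bus acts as a wired-AND: a dominant 0
   from any node overrides recessive 1s. A node that sends recessive but
   reads dominant loses and stops transmitting. Identifiers are assumed
   unique. Returns the index of the winner, or -1 on bad input. */
int can_arbitrate(const uint16_t ids[], int n)
{
    int active[32];                      /* sketch limit: 32 contending nodes */
    if (n <= 0 || n > 32) return -1;
    for (int i = 0; i < n; i++) active[i] = 1;

    for (int bit = ID_BITS - 1; bit >= 0; bit--) {     /* MSB first */
        int bus = 1;                                     /* recessive */
        for (int i = 0; i < n; i++)
            if (active[i] && ((ids[i] >> bit) & 1u) == 0)
                bus = 0;                                 /* dominant wins */
        for (int i = 0; i < n; i++)
            if (active[i] && ((ids[i] >> bit) & 1u) != (unsigned)bus)
                active[i] = 0;                           /* lost arbitration */
    }
    for (int i = 0; i < n; i++)
        if (active[i]) return i;
    return -1;
}
```

Because 0 is dominant, the node with the numerically lowest identifier always wins; for identifiers 0x65A, 0x123 and 0x124 the sketch returns index 1.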
Serial protocols: FireWire
FireWire (I-Link, or Lynx, IEEE 1394)
High-performance serial bus developed by Apple Computer Inc.
Designed for interfacing independent electronic components
e.g., Desktop, scanner
Data transfer rates from 12.5 to 400 Mbits/s, 64-bit addressing
Plug-and-play capabilities
Packet-based layered design structure
Applications using FireWire include:
disk drives, printers, scanners, cameras
Capable of supporting a LAN similar to Ethernet
64-bit address:
10 bits for network ids, 1023 subnetworks
6 bits for node ids, each subnetwork can have 63 nodes
48 bits for memory address, each node can have 281 terabytes of distinct
locations
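The 64-bit address can be split into its fields with shifts and masks. A minimal C sketch, assuming the fields are packed with the 10-bit network (bus) ID in the top bits, the 6-bit node ID next, and the 48-bit memory offset in the low bits; the type and function names are illustrative.

```c
#include <stdint.h>

typedef struct {
    uint16_t bus_id;    /* upper 10 bits: network (bus) ID   */
    uint8_t  node_id;   /* next 6 bits: node on that network */
    uint64_t offset;    /* lower 48 bits: memory address     */
} fw_address;

fw_address fw_split(uint64_t addr)
{
    fw_address a;
    a.bus_id  = (uint16_t)((addr >> 54) & 0x3FF);
    a.node_id = (uint8_t)((addr >> 48) & 0x3F);
    a.offset  = addr & 0xFFFFFFFFFFFFULL;
    return a;
}

uint64_t fw_join(uint16_t bus_id, uint8_t node_id, uint64_t offset)
{
    return ((uint64_t)(bus_id & 0x3FF) << 54)
         | ((uint64_t)(node_id & 0x3F) << 48)
         | (offset & 0xFFFFFFFFFFFFULL);
}
```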
Serial protocols: USB
USB (Universal Serial Bus)
Easier connection between PC and monitors, printers, digital speakers,
modems, scanners, digital cameras, joysticks, multimedia game
equipment
2 data rates:
12 Mbps for increased bandwidth devices
1.5 Mbps for lower-speed devices (joysticks, game pads)
Tiered (layered) star topology can be used
One USB device (hub) connected to PC
hub can be embedded in devices like monitor, printer, or keyboard or can be
standalone
Multiple USB devices can be connected to hub
Up to 127 devices can be connected like this
USB host controller
Manages and controls bandwidth and driver software required by each
peripheral
Dynamically allocates power downstream according to devices
connected/disconnected
Parallel protocols: PCI Bus
PCI Bus (Peripheral Component Interconnect)
High performance bus originated at Intel in the early
1990’s
Standard adopted by industry and administered by
PCISIG (PCI Special Interest Group)
Interconnects chips, expansion boards, processor
memory subsystems
Data transfer rates of 127.2 to 508.6 Mbits/s and 32-bit
addressing
Later extended to 64-bit while maintaining compatibility with
32-bit schemes
Synchronous bus architecture
Multiplexed data/address lines
Parallel protocols: ARM Bus
ARM Bus
Designed and used internally by ARM
Corporation
Interfaces with ARM line of processors
Many IC design companies have own bus
protocol
Data transfer rate is a function of clock speed
If clock speed of bus is X, transfer rate = 16 x X bits/s
32-bit addressing
Wireless protocols: IrDA
IrDA
Protocol suite that supports short-range point-to-
point infrared data transmission
Created and promoted by the Infrared Data
Association (IrDA)
Data transfer rates from 9.6 kbps to 4 Mbps
IrDA hardware deployed in notebook computers,
printers, PDAs, digital cameras, public phones, cell
phones
Lack of suitable drivers has slowed use by
applications
Windows 2000/98 now include support
Becoming available on popular embedded OS’s
Wireless protocols: Bluetooth
Bluetooth
New, global standard for wireless
connectivity
Based on low-cost, short-range radio
link
Connection established when within 10
meters of each other
No line-of-sight required
e.g., Connect to printer in another room
Wireless Protocols: IEEE 802.11
IEEE 802.11
Proposed standard for wireless LANs
Specifies parameters for PHY and MAC layers of
network
PHY layer
physical layer
handles transmission of data between nodes
provisions for data transfer rates of 1 or 2 Mbps
operates in 2.4 to 2.4835 GHz frequency band (RF)
or 300 to 428,000 GHz (IR)
MAC layer
medium access control layer
protocol responsible for maintaining order in shared medium
collision avoidance/detection
Embedded Systems
Chapter – 6
8/11/2015 1
6. Real-Time Operating System [8 Hrs.]
[Figure: Central server]
Fire Alarm System
• Problem
– Hundreds of sensors, each fitted with low-range wireless
• Sensor information to be logged in a server and appropriate action
initiated
• Possible Solution
– Collaborative action
• Routing
– Dynamic: sensors/controllers may go down
– Auto-configurable: no or easy human intervention
– Fewer collisions/less link clogging
– Fewer intermediate nodes
» Fast response time
– Secure
RTOS: Target Architectures
Processors          MIPS
Microcontrollers    ~20
ARM7                100-133
ARM9                180-250
StrongARM           206
Intel XScale        400
MIPS 4K core        400
x86
Operating System Basics contd…
Process Management:
• Deals with managing the processes/tasks
• Includes setting up the memory space for the process
• Loading the process’s code into the memory space
• Allocating system resources
• Scheduling and managing the execution of the process
• Setting up and managing the process control block (PCB)
• Inter-process communication and synchronization
• Process termination/deletion
Operating System Basics contd…
– Distinction:
• Desktop OS – the OS is in control at all times and runs the applications; the OS runs
in a different address space from the applications
• RTOS – the OS and the embedded software are integrated; the embedded software starts and
activates the OS, and both run in the same address space (the RTOS is less
protected)
• RTOS includes only the service routines needed by the ES application
• RTOS vendors: VxWorks, VRTX, Nucleus, LynxOS, uC/OS
• Most conform to POSIX (IEEE standard for OS interfaces)
• Desirable RTOS properties: small memory footprint, application programming
interface, debugging tools, support for a variety of microprocessors,
already-debugged network drivers
Hard and Soft Real Time Systems
• Hard Real Time System
– Failure to meet deadlines is fatal
– example : Flight Control System
• Qualitative Definition.
Hard and Soft Real Time Systems
(Operational Definition)
• Hard Real Time System
– Validation by provably correct procedures or extensive
simulation that the system always meets the timing
constraints
Operating System Types contd…
Tasks
[Figure: Task States: Running, Ready, Blocked]
Tasks
Here are answers to some common questions
about the scheduler and task states.
Tasks
Microprocessor Responds to a Button under an RTOS
Tasks
RTOS Initialization Code
• Tasks and Data
– (See Fig 6.5, Fig 6.6, Fig 6.7, and Fig 6.8)
Tank Monitoring System
Tasks in the Underground Tank System
Tank Monitoring Design
• Tasks – 2
– Variants:
• Binary semaphores – single resource, one at a time, alternating in use
(also for resources)
• Counting semaphores – multiple instances of resources,
increase/decrease of an integer semaphore variable
• Mutex – protects shared data while dealing with the priority inversion
problem
Thread
• Is a single sequential flow of control within a process
• Also known as a lightweight process
[Figure: Memory organization of a process and its associated threads: each thread has its own stack, stack pointer, working registers and status registers; the threads share the data memory and code memory of the process]
Multithreading …
• POSIX Threads (Portable Operating System Interface)
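A minimal POSIX threads sketch: one function creates two threads with `pthread_create`, each summing half of a range, and waits for both with `pthread_join`. The job structure and the summing task are illustrative only.

```c
#include <pthread.h>

/* Argument/result block passed to each thread (illustrative). */
typedef struct {
    int  from, to;   /* sum the integers in [from, to] */
    long sum;
} range_job;

static void *sum_range(void *arg)
{
    range_job *job = arg;
    job->sum = 0;
    for (int i = job->from; i <= job->to; i++)
        job->sum += i;
    return NULL;
}

/* Splits 1..n across two threads and returns the combined sum,
   or -1 if a thread could not be created. */
long parallel_sum(int n)
{
    pthread_t t1, t2;
    range_job a = { 1, n / 2, 0 }, b = { n / 2 + 1, n, 0 };

    if (pthread_create(&t1, NULL, sum_range, &a) != 0) return -1;
    if (pthread_create(&t2, NULL, sum_range, &b) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_join(t1, NULL);   /* wait for both threads to finish */
    pthread_join(t2, NULL);
    return a.sum + b.sum;
}
```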
Win 32 Threads:
• Are the threads supported by various flavors of the Windows
OS.
• Win 32 Application Programming Interface (Win 32 API)
libraries provide the standard set of Win 32 thread
creation and management functions.
• Win 32 threads are created with the API
[Figure: Context switching]
Real-Time Kernels
• A process is an abstraction of a running
program and is the logical unit of work
scheduled by OS
Cyclic Executives
for (;;) { /* do forever in round-robin fashion */
    Process1();
    Process2();
    /* ... */
    ProcessN();
}
Different rates example:
for (;;) { /* do forever in round-robin fashion */
    Process1();
    Process2();
    Process3(); /* process 3 executes 50% of the time */
    Process3();
}
State-Driven Code
It uses if-then, case statements or finite state automata to break up
processing of functions into code segments
for (;;) { /* dining philosophers */
    switch (state) {
    case Think: pause(random()); state = Wait; break;            /* think for a random time */
    case Wait:  if (forks_available()) state = Eat; break;       /* take forks when both are free */
    case Eat:   pause(random()); return_forks(); state = Think; break;
    }
}
[Figure: state diagram Think → Wait → Eat → Think, with transitions labeled Take forks, Wait forks and Return forks]
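A runnable sketch of the same state-driven structure for a single philosopher: each call to `philosopher_step` executes one state's code segment and chooses the next state. As simplifying assumptions, fork availability is passed in as a flag and the random pauses are omitted, so the state sequence is deterministic; all names are illustrative.

```c
typedef enum { THINK, WAIT, EAT } phil_state;

typedef struct {
    phil_state state;
    int holding_forks;   /* 1 while the philosopher holds both forks */
    int meals;           /* number of completed EAT phases */
} philosopher;

/* One pass of the state machine; forks_available stands in for the
   forks_available() test in the loop above. */
void philosopher_step(philosopher *p, int forks_available)
{
    switch (p->state) {
    case THINK:
        p->state = WAIT;                 /* finished thinking, wants forks */
        break;
    case WAIT:
        if (forks_available) {           /* take forks */
            p->holding_forks = 1;
            p->state = EAT;
        }                                /* otherwise keep waiting */
        break;
    case EAT:
        p->meals++;
        p->holding_forks = 0;            /* return forks */
        p->state = THINK;
        break;
    }
}
```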
Coroutines
void process_i(void) { // code of the i-th process
    switch (state_i) { // state variable of the i-th process
    case 1: phase1_i(); break;
    case 2: phase2_i(); break;
    /* ... */
    case N: phaseN_i(); break;
    }
}
void Dispatcher(void) {
    for (;;) { /* do forever */
        process_1();
        /* ... */
        process_M();
    }
}
[Figure: Dispatcher and phases 1, 2, ..., N of a process]
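A runnable sketch of the coroutine scheme: each process keeps its own phase variable, runs one phase per call and then returns to the dispatcher, which calls the processes in turn. The phase counter wrapping from N back to 1, the trace buffer, and the bounded number of dispatcher rounds are assumptions added so the interleaving can be observed.

```c
#define NUM_PROCS  2   /* M processes (assumed) */
#define NUM_PHASES 3   /* N phases per process (assumed) */

static int state[NUM_PROCS];   /* phase variable of each process */
static int trace[64];          /* records process * 10 + phase */
static int trace_len;

/* Runs one phase of process i, then advances its phase variable. */
static void process(int i)
{
    int phase = state[i];
    if (trace_len < 64)
        trace[trace_len++] = i * 10 + phase;   /* "execute" the phase */
    state[i] = phase % NUM_PHASES + 1;         /* next phase: 1..N */
}

/* Dispatcher limited to a number of rounds instead of for(;;). */
void dispatcher(int rounds)
{
    for (int p = 0; p < NUM_PROCS; p++) state[p] = 1;
    trace_len = 0;
    for (int r = 0; r < rounds; r++)
        for (int p = 0; p < NUM_PROCS; p++)
            process(p);
}
```

After `dispatcher(3)` the trace is 1, 11, 2, 12, 3, 13: the two processes advance through their phases in lockstep, each giving control back after every phase.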
Interrupt-Driven Systems
Interrupt Service Routine (ISR) takes action in response to the interrupt
Reentrant code can be used by multiple processes. Reentrant ISR can
serve multiple interrupts. Access to critical resources in mutually
exclusive mode is obtained by disabling interrupts
On context switching save/restore:
•General registers
•PC, PSW
•Coprocessor registers
•Memory page register
•Images of memory-mapped I/O locations
The stack model is used mostly in embedded systems
Pseudocode for Interrupt Driven System
int main(void) { // initialize system, load interrupt handlers
    init();
    while (TRUE); // infinite loop; the work is done in the interrupt handlers
}
void Intr_handler_i(void) { // i-th interrupt handler
    save_context();    // save registers to the stack
    task_i();          // launch i-th task
    restore_context(); // restore context from the stack
}
Work with a stack (2-byte words, stack grows toward lower addresses):
Push x: SP -= 2; *SP = x;
Pop x:  x = *SP; SP += 2;
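The push/pop rules can be sketched in C with a downward-growing stack of 16-bit words. Because `sp` is a `uint16_t *`, `sp--` moves it by 2 bytes, which is the `SP -= 2` of the pseudocode. The register set, the stack size and the missing overflow checks are simplifying assumptions, not a particular processor's behavior.

```c
#include <stdint.h>

#define STACK_WORDS 32

static uint16_t stack_mem[STACK_WORDS];
static uint16_t *sp = stack_mem + STACK_WORDS;   /* empty stack, grows downward */

static void push(uint16_t x) { sp--; *sp = x; }                      /* SP -= 2; *SP = x */
static uint16_t pop(void)    { uint16_t x = *sp; sp++; return x; }   /* x = *SP; SP += 2 */

typedef struct { uint16_t pc, psw, r0, r1; } context;   /* hypothetical registers */

/* Save in one order and restore in the reverse order. */
void save_context(const context *c)
{
    push(c->pc); push(c->psw); push(c->r0); push(c->r1);
}

void restore_context(context *c)
{
    c->r1 = pop(); c->r0 = pop(); c->psw = pop(); c->pc = pop();
}
```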
Preemptive Priority System
A higher-priority task is said to preempt a lower-priority task if it interrupts the lower-
priority task
The priorities assigned to each interrupt are based on the urgency of the task associated
with the interrupt
Prioritized interrupts can be either fixed priority or dynamic priority
Low-priority tasks can face starvation due to a lack of resources occupied by high-priority
tasks
In rate-monotonic systems, tasks with higher frequency (rate) have higher priority
Hybrid systems
Foreground-background systems (FBS): a polling loop is used for some jobs (background tasks:
self-testing, watchdog timers, etc.)
Foreground tasks run in round-robin, preemptive priority or hybrid mode
FBS can be extended to a full-featured real-time OS
The Task Control Model of Real-Time Operating System
Each task is associated with a structure called Task Control Block
(TCB). The TCB keeps the process’s context: PSW, PC, registers, id, status, etc.
TCBs may be stored as a linked list
A task typically can be in one of the four following states:
1) Executing; 2) Ready; 3) Suspended (blocked); 4) Dormant (sleeping)
[Figure: task state diagram with states Ready, Dormant, Executing and Suspended]
RTOS maintains a list of the ready tasks’ TCBs and another list for the suspended tasks
When a resource becomes available to a suspended task, it is activated
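A minimal sketch of TCBs kept in linked lists: one list for ready tasks, another for suspended tasks, plus a search for the highest-priority ready task. The field names, the convention that a smaller number means higher priority, and the abbreviated saved context are assumptions; marking the chosen task EXECUTING and the DORMANT transitions are left out.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { EXECUTING, READY, SUSPENDED, DORMANT } task_state;

typedef struct tcb {
    int         id;
    int         priority;   /* assumed: smaller number = higher priority */
    task_state  state;
    uint16_t    pc, psw;    /* saved context (abbreviated) */
    struct tcb *next;       /* link in the ready or suspended list */
} tcb;

static tcb *ready_list, *suspended_list;

static void list_push(tcb **list, tcb *t) { t->next = *list; *list = t; }

static void list_remove(tcb **list, tcb *t)
{
    for (tcb **pp = list; *pp; pp = &(*pp)->next)
        if (*pp == t) { *pp = t->next; t->next = NULL; return; }
}

void make_ready(tcb *t) { t->state = READY; list_push(&ready_list, t); }

/* Task blocks on a resource: move its TCB to the suspended list. */
void suspend(tcb *t)
{
    list_remove(&ready_list, t);
    t->state = SUSPENDED;
    list_push(&suspended_list, t);
}

/* Resource became available: the suspended task is activated again. */
void activate(tcb *t)
{
    list_remove(&suspended_list, t);
    make_ready(t);
}

/* Highest-priority ready task, or NULL if no task is ready. */
tcb *pick_next(void)
{
    tcb *best = NULL;
    for (tcb *t = ready_list; t; t = t->next)
        if (!best || t->priority < best->priority)
            best = t;
    return best;
}
```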
Process Scheduling
Pre-run-time and run-time scheduling. The aim is to meet time restrictions
Each task is characterized typically by the following temporal parameters:
1) Precedence constraints; 2) Release or arrival time $r_{i,j}$ of the j-th instance
of task i; 3) Phase $\phi_i$; 4) Response time; 5) Absolute deadline $d_i$
6) Relative deadline $D_i$
7) Laxity type – notion of urgency or margin in a task’s execution
8) Period $p_i$
9) Execution time $e_i$
$\phi_i = r_{i,1}, \quad r_{i,k} = \phi_i + (k-1)p_i$
$d_{i,k} = \phi_i + (k-1)p_i + D_i$
Assume for simplicity: all tasks are periodic and independent, the relative deadline
equals the period/frame, tasks are preemptible, and preemption time is neglected
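The release-time and deadline formulas translate directly into code. A small C sketch; the struct layout and function names are illustrative.

```c
typedef struct {
    double phase;    /* phi_i: release time of the first instance */
    double period;   /* p_i */
    double exec;     /* e_i */
    double rel_dl;   /* D_i: relative deadline */
} periodic_task;

/* r_{i,k} = phi_i + (k - 1) p_i, for k = 1, 2, ... */
double release_time(const periodic_task *t, int k)
{
    return t->phase + (k - 1) * t->period;
}

/* d_{i,k} = phi_i + (k - 1) p_i + D_i */
double abs_deadline(const periodic_task *t, int k)
{
    return release_time(t, k) + t->rel_dl;
}
```

For a task with phase 2, period 5 and relative deadline 5, the third instance is released at 12 and must finish by 17.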
Round-Robin Scheduling
Cyclic Executives
Scheduling decisions are made periodically, rather than at arbitrary times
Time intervals between scheduling decision points are referred to as frames or
minor cycles, and every frame has a length, f, called the frame size
The major cycle is the minimum time required to execute tasks allocated to
the processor, ensuring that the deadlines and periods of all processes are
met
The major cycle or the hyperperiod is equal to the least common multiple
(lcm) of the periods, that is, lcm(p1,..,pn)
Scheduling decisions are made at the beginning of every frame. The phase of
each task is a non-negative integer multiple of the frame size.
Frames must be long enough to accommodate each task:
$C_1: f \ge \max_{1 \le i \le n} e_i$
Cyclic Executives
$C_2: \lfloor p_i / f \rfloor - p_i / f = 0$ for at least one task i (f divides at least one period, and therefore the hyperperiod)
To ensure that every task completes by its deadline, frames must be small
so that between the release time and deadline of every task, there is at
least one frame.
Cyclic Executives
The following relation is derived for a worst-case scenario, which
occurs when the period of a process starts just after the
beginning of a frame, and, consequently, the process cannot be
released until the next frame:
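$C_3: 2f - \gcd(p_i, f) \le D_i$
Derivation: let a frame start at $t$ and the task be released at $t'$, with $t < t'$:
$t + 2f \le t' + D_i$
$2f - (t' - t) \le D_i$
$t' - t = l p_i - k f \ge \gcd(p_i, f)$ (for integers $l$, $k$)
$\Rightarrow 2f - \gcd(p_i, f) \le D_i$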
Cyclic Executives
For example, for tasks T1(4,1), T2(5,1.8), T3(20,1), T4(20,2), written as Ti(pi, ei), the hyper-period is 20 (schedules shown
without and with frames, f = 2)
[Figure: schedules over 0-20. Without frames: 1 3 2 1 4 2 1 (0-12), 1 2 1 2 (12-20). With frames (f = 2): 1 3 2 1 4 2 1 (0-12), 2 1 1 2 (12-20)]
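The three frame-size constraints can be checked mechanically. A C sketch, assuming integer periods and relative deadlines, and reading C2 as "f divides at least one period p_i"; the type and function names are illustrative.

```c
typedef struct { int period; double exec; int rel_dl; } cyclic_task;

static int gcd(int a, int b) { while (b) { int r = a % b; a = b; b = r; } return a; }

/* Hyperperiod H = lcm(p_1, ..., p_n). */
int hyperperiod(const cyclic_task t[], int n)
{
    int h = 1;
    for (int i = 0; i < n; i++)
        h = h / gcd(h, t[i].period) * t[i].period;
    return h;
}

/* Returns 1 if frame size f satisfies C1, C2 and C3, else 0. */
int frame_size_ok(const cyclic_task t[], int n, int f)
{
    int c2 = 0;
    for (int i = 0; i < n; i++) {
        if (f < t[i].exec) return 0;                               /* C1 */
        if (t[i].period % f == 0) c2 = 1;                          /* C2 */
        if (2 * f - gcd(t[i].period, f) > t[i].rel_dl) return 0;   /* C3 */
    }
    return c2;
}
```

For the example task set (relative deadlines equal to periods), the hyperperiod is 20, and among frame sizes 1 to 5 only f = 2 passes all three checks: f = 1 violates C1, and f = 3, 4, 5 violate C3.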
Fixed Priority Scheduling – Rate-Monotonic Approach (RMA)
Rate-Monotonic Scheduling
Theorem (RMA Bound). Any set of n periodic tasks is RM schedulable if the
processor utilization
$U = \sum_{i=1}^{n} \frac{e_i}{p_i} \le n(2^{1/n} - 1)$
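A C sketch of the RMA bound test. To avoid depending on the math library, it computes the n-th root of 2 by bisection (an implementation choice, not part of the theorem). A result of 0 only means the sufficient bound does not prove schedulability; it does not mean the set is unschedulable.

```c
typedef struct { double exec, period; } rm_task;

double utilization(const rm_task t[], int n)
{
    double u = 0.0;
    for (int i = 0; i < n; i++)
        u += t[i].exec / t[i].period;
    return u;
}

/* n(2^{1/n} - 1), with 2^{1/n} found by bisection on [1, 2]. */
double rm_bound(int n)
{
    double lo = 1.0, hi = 2.0;
    for (int it = 0; it < 100; it++) {
        double mid = (lo + hi) / 2.0, p = 1.0;
        for (int k = 0; k < n; k++) p *= mid;   /* mid^n */
        if (p < 2.0) lo = mid; else hi = mid;
    }
    return n * (lo - 1.0);
}

/* 1 if U <= n(2^{1/n} - 1), i.e. the set is RM schedulable by the bound. */
int rm_bound_ok(const rm_task t[], int n)
{
    return utilization(t, n) <= rm_bound(n);
}
```

For the cyclic-executive example T1(4,1), T2(5,1.8), T3(20,1), T4(20,2), U = 0.76 slightly exceeds 4(2^{1/4} - 1) ≈ 0.757, so this sufficient test is inconclusive for that set.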
Dynamic-Priority Scheduling – Earliest-Deadline-First
Approach
Theorem (EDF Bound). A set of n periodic tasks, each of whose relative
deadline equals its period, can be feasibly scheduled by EDF if and only if
$U = \sum_{i=1}^{n} \frac{e_i}{p_i} \le 1$
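A unit-step EDF simulation sketch in C that can be used to watch the bound in action. It assumes integer periods and execution times, relative deadlines equal to periods, and a horizon equal to the hyperperiod; a late job is counted as a miss and then replaced by the next job. The names are illustrative.

```c
#define MAX_TASKS 8

typedef struct { int period, exec; } edf_task;   /* relative deadline = period */

/* Simulates EDF over [0, horizon] in unit time steps and returns the
   number of deadline misses (or -1 if there are too many tasks). */
int edf_simulate(const edf_task t[], int n, int horizon)
{
    int remaining[MAX_TASKS] = {0}, deadline[MAX_TASKS] = {0};
    int misses = 0;
    if (n > MAX_TASKS) return -1;

    for (int now = 0; now <= horizon; now++) {
        for (int i = 0; i < n; i++) {
            if (now % t[i].period != 0) continue;
            if (remaining[i] > 0) misses++;   /* previous job unfinished at its deadline */
            if (now < horizon) {              /* release the next job */
                remaining[i] = t[i].exec;
                deadline[i] = now + t[i].period;
            }
        }
        if (now == horizon) break;

        int pick = -1;                        /* ready job with the earliest deadline */
        for (int i = 0; i < n; i++)
            if (remaining[i] > 0 && (pick < 0 || deadline[i] < deadline[pick]))
                pick = i;
        if (pick >= 0) remaining[pick]--;     /* run it for one time unit */
    }
    return misses;
}
```

The set (4,1), (5,2), (20,4) has U = 0.85 and runs over its hyperperiod of 20 without a miss, while (2,1), (3,2) has U ≈ 1.17 and misses a deadline by t = 6.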
Intertask Communication and Synchronization
•Buffering data
•Double-buffering
Intertask Communication and Synchronization
Ring Buffers
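A minimal ring buffer sketch in C. It leaves one slot empty so that a full buffer can be told apart from an empty one, and it has no locking: the assumption is a single producer and a single consumer, and in an RTOS the put/get calls from different tasks or ISRs may still need protection (for example, by disabling interrupts). Names and capacity are illustrative.

```c
#define RB_SIZE 8   /* array size; usable capacity is RB_SIZE - 1 */

typedef struct {
    int      data[RB_SIZE];
    unsigned head;   /* next slot to write */
    unsigned tail;   /* next slot to read  */
} ring_buffer;

void rb_init(ring_buffer *rb) { rb->head = rb->tail = 0; }

int rb_empty(const ring_buffer *rb) { return rb->head == rb->tail; }
int rb_full(const ring_buffer *rb)  { return (rb->head + 1) % RB_SIZE == rb->tail; }

/* Returns 1 on success, 0 if the buffer is full (item not stored). */
int rb_put(ring_buffer *rb, int item)
{
    if (rb_full(rb)) return 0;
    rb->data[rb->head] = item;
    rb->head = (rb->head + 1) % RB_SIZE;
    return 1;
}

/* Returns 1 and stores the oldest item in *item, or 0 if empty. */
int rb_get(ring_buffer *rb, int *item)
{
    if (rb_empty(rb)) return 0;
    *item = rb->data[rb->tail];
    rb->tail = (rb->tail + 1) % RB_SIZE;
    return 1;
}
```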
Intertask Communication and Synchronization
Mailbox: void pend (int data, s); void post (int data, s);
Access to the mailbox is mutually exclusive; tasks wait until access is granted
Intertask Communication and Synchronization
•Queues – can be implemented with ring buffers
•Critical regions – sections of code to be used in the mutually exclusive
mode
•Semaphores – can be used to provide critical regions
Intertask Communication and Synchronization
Mailboxes and Semaphores
Intertask Communication and Synchronization
Semaphores and mailboxes
Sema mutex = 0 /* open */, proc_sem = 1; /* closed */
bool full_slots = 0, empty_slots = 1;
void post(int mailbox, int message) {
    while (1) {
        wait(mutex);
        if (empty_slots) {
            insert(mailbox, message); update(); signal(mutex);
            signal(proc_sem); break;
        }
        else {
            signal(mutex); wait(proc_sem);
        }
    }
}
Intertask Communication and Synchronization
Semaphores and mailboxes
void pend(int mailbox, int *message) {
    while (1) {
        wait(mutex);
        if (full_slots) {
            extract(mailbox, message); update(); signal(mutex);
            signal(proc_sem); break;
        }
        else {
            signal(mutex); wait(proc_sem);
        }
    }
}
Intertask Communication and Synchronization
void Driver(void) {
    while (1) {
        if (data_for_IO) {
            prepare(command);
            V(busy); P(done);
        }
    }
}
void Controller(void) {
    while (1) {
        P(busy); exec(command);
        V(done);
    }
}
Intertask Communication and Synchronization
Counting Semaphores:
Wait: void MP(int &S) {
    S = S - 1;
    while (S < 0);   // busy-wait until another task signals
}
Signal: void MV(int &S) {
    S = S + 1;
}
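The `MP`/`MV` pair above spins while waiting. A blocking alternative can be sketched with POSIX mutexes and condition variables; the type and function names (`csem`, `csem_wait`, `csem_signal`) are hypothetical. The second half reuses the `busy`/`done` handshake from the Driver/Controller example, with a bounded number of commands instead of infinite loops.

```c
#include <pthread.h>

typedef struct {
    int             count;   /* number of available units */
    pthread_mutex_t lock;
    pthread_cond_t  nonzero;
} csem;

void csem_init(csem *s, int initial)
{
    s->count = initial;
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->nonzero, NULL);
}

void csem_destroy(csem *s)
{
    pthread_mutex_destroy(&s->lock);
    pthread_cond_destroy(&s->nonzero);
}

/* Wait (P): block while no unit is available, then take one. */
void csem_wait(csem *s)
{
    pthread_mutex_lock(&s->lock);
    while (s->count == 0)
        pthread_cond_wait(&s->nonzero, &s->lock);
    s->count--;
    pthread_mutex_unlock(&s->lock);
}

/* Signal (V): return one unit and wake one waiting thread. */
void csem_signal(csem *s)
{
    pthread_mutex_lock(&s->lock);
    s->count++;
    pthread_cond_signal(&s->nonzero);
    pthread_mutex_unlock(&s->lock);
}

/* Driver/controller handshake: busy and done both start at 0. */
static csem busy, done;
static int command, executed_sum;

static void *controller(void *arg)
{
    int n = *(int *)arg;
    for (int i = 0; i < n; i++) {
        csem_wait(&busy);          /* P(busy): wait for a command */
        executed_sum += command;   /* exec(command) */
        csem_signal(&done);        /* V(done) */
    }
    return NULL;
}

/* Sends commands 1..n and returns the sum the controller executed. */
int run_driver(int n)
{
    pthread_t ctl;
    csem_init(&busy, 0);
    csem_init(&done, 0);
    executed_sum = 0;
    pthread_create(&ctl, NULL, controller, &n);
    for (int i = 1; i <= n; i++) {
        command = i;               /* prepare(command) */
        csem_signal(&busy);        /* V(busy) */
        csem_wait(&done);          /* P(done) */
    }
    pthread_join(ctl, NULL);
    csem_destroy(&busy);
    csem_destroy(&done);
    return executed_sum;
}
```

Blocking on a condition variable lets the waiting thread sleep instead of burning processor time, which matters when the signaling task needs that processor to run.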
Intertask Communication and Synchronization
Problems with semaphores:
Wait: void P(int &S){
while(S==TRUE);
S=TRUE;
}
LOAD R1,S ; address of S in R1
LOAD R2,1 ; 1 in R2
@1 TEST R1,I,R2 ; compare (R1)=*S with R2=1
JEQ @1 ; repeat if *S=1
STORE R2,S,I ; store 1 in *S
An interruption between JEQ and STORE that passes control to another process
can let several processes see *S=FALSE and enter the critical region together
Intertask Communication and Synchronization
The Test-and-Set Instruction
void P(int &S) {
    while (test_and_set(S) == TRUE); // wait
}
void V(int &S) {
    S = FALSE;
}
The instruction fetches a word from memory and tests the high-order
(or other) bit. If the bit is 0, it is set to 1 and stored again, and a
condition code of 0 is returned. If the bit is 1, a condition code of 1 is
returned and no store is performed. The fetch, test and store are
indivisible.
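On current processors, C11 exposes an indivisible test-and-set through `atomic_flag_test_and_set`, which atomically sets the flag and returns its previous value. A spinlock sketch built on it, with four POSIX threads incrementing a shared counter; the `P`/`V` names mirror the pseudocode, and the demo function is illustrative.

```c
#include <stdatomic.h>
#include <pthread.h>

static atomic_flag lock_flag = ATOMIC_FLAG_INIT;   /* clear = critical region free */

static void P(void) { while (atomic_flag_test_and_set(&lock_flag)) ; }  /* spin while it was already set */
static void V(void) { atomic_flag_clear(&lock_flag); }

static long counter;   /* shared data protected by the spinlock */

static void *worker(void *arg)
{
    int n = *(int *)arg;
    for (int i = 0; i < n; i++) {
        P();
        counter++;     /* critical region */
        V();
    }
    return NULL;
}

/* Runs 4 threads that each increment the counter n times. */
long spin_demo(int n)
{
    pthread_t th[4];
    counter = 0;
    for (int i = 0; i < 4; i++) pthread_create(&th[i], NULL, worker, &n);
    for (int i = 0; i < 4; i++) pthread_join(th[i], NULL);
    return counter;
}
```

Without `P()`/`V()`, the unprotected `counter++` from four threads could lose updates, which is the same hazard as the interrupted LOAD/TEST/STORE sequence above.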
Intertask Communication and Synchronization
Dijkstra’s implementation of semaphore operation (if test-and-set
instruction is not available):
void P(int &S) {
    int temp = TRUE;
    while (temp) {
        disable(); // disable interrupts
        temp = S;
        S = TRUE;
        enable();  // enable interrupts
    }
}
Intertask Communication and Synchronization
Other Synchronization Mechanisms:
•Monitors (generalize critical sections – only one process can execute the
monitor at a time; they provide a public interface for serial use of resources)
•Events – similar to semaphores, but usually all waiting processes are
released when the event is signaled. Tasks waiting for an event are called
blocked
Deadlocks
Intertask Communication and Synchronization
Deadlocks:
Deadlocks
Four conditions are necessary for deadlock:
• Mutual exclusion
• Circular wait
• Hold and wait
• No preemption
Eliminating any one of the four necessary conditions will prevent deadlock
from occurring
One way to eliminate circular wait is to number the resources and give a
process all resources whose numbers are greater than or equal to the lowest-numbered
resource it requires. For example: Disk – 1, Printer – 2, Motor control – 3, Monitor – 4.
If a process wishes to use the printer, it will be assigned the printer, motor control
and monitor. If another process requires the monitor, it will have to wait until the
monitor is released. This may lead to starvation.
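A closely related and widely used form of resource numbering is lock ordering: every process acquires the resources it needs in increasing number order, so no cycle of waiting processes can form. This is a variant, not the exact allocation rule described above. The C sketch below uses POSIX mutexes for the numbered resources from the example; the job layout and demo function are illustrative.

```c
#include <pthread.h>

/* Resources numbered as in the text: 1 = disk, 2 = printer,
   3 = motor control, 4 = monitor (index 0 unused). */
#define NUM_RES 4
static pthread_mutex_t res[NUM_RES + 1] = {
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
    PTHREAD_MUTEX_INITIALIZER };

/* Always lock the lower-numbered resource first. */
static void acquire_pair(int a, int b)
{
    int lo = a < b ? a : b, hi = a < b ? b : a;
    pthread_mutex_lock(&res[lo]);
    pthread_mutex_lock(&res[hi]);
}

static void release_pair(int a, int b)
{
    pthread_mutex_unlock(&res[a]);
    pthread_mutex_unlock(&res[b]);
}

static long jobs_done;

typedef struct { int first, second, n; } job;

static void *user(void *arg)
{
    job *j = arg;
    for (int i = 0; i < j->n; i++) {
        acquire_pair(j->first, j->second);
        jobs_done++;   /* safe: the thread holds both resources */
        release_pair(j->first, j->second);
    }
    return NULL;
}

long ordering_demo(int n)
{
    pthread_t t1, t2;
    job a = { 2, 4, n };   /* asks for printer, then monitor */
    job b = { 4, 2, n };   /* asks in the opposite order: could deadlock without ordering */
    jobs_done = 0;
    pthread_create(&t1, NULL, user, &a);
    pthread_create(&t2, NULL, user, &b);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return jobs_done;
}
```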
Deadlock avoidance
To avoid deadlocks, it is recommended to:
• Minimize the number of critical regions as well as their
size
• All processes must release any lock before returning to the
calling function
• Do not suspend any task while it controls a critical region
• All critical regions must be error-free
• Do not lock devices in interrupt handlers
• Always perform validity checks on pointers used within critical
regions.
It is difficult to follow these recommendations
A Separate Task Helps Control Shared Hardware
Embedded Systems
Chapter -7
Control System
7.Control System [3 Hrs.]
7.1 Open-Loop and Closed-Loop Control System Overview
7.2 Control System and PID Controllers
7.3 Software Coding of a PID Controller
7.4 PID Tuning
Control System
• Control physical system’s output
– By setting physical system’s input
• Tracking
• E.g.
– Cruise control
– Thermostat control
– Disk drive control
– Aircraft altitude control
• Difficulty due to
– Disturbance: wind, road, tire, brake; opening/closing door…
– Human interface: feel good, feel right…
[Figure: Tracking]
Open-Loop Control Systems
• Plant
– Physical system to be controlled
• Car, plane, disk, heater,…
• Actuator
– Device to control the plant
• Throttle, wing flap, disk motor,…
• Controller
– Designed product to control the plant
[Figure legend: Vt – car’s current speed; Ut – throttle position; Vt+1 – car’s speed one sec. later]
Open-Loop Control Systems
• Output
– The aspect of the physical system we are interested in
• Speed, disk location, temperature
• Reference
– The value we want to see at output
• Desired speed, desired location, desired temperature
• Disturbance
– Uncontrollable input to the plant imposed by environment
• Wind, bumping the disk drive, door opening
Other Characteristics of open loop
• Feed-forward control
• Delay in actual change of the output
• Controller doesn’t know how well things are going
• Simple
• Best used for predictable systems
Closed-Loop Control Systems
• Sensor
– Measures the plant output
• Error detector
– Detects error
• Feedback control systems
• Minimize tracking error
Designing an Open-Loop Control System
• Develop a model of the plant
• Develop a controller
• Analyze the controller
• Consider disturbance
• Determine performance
• Example: open-loop cruise control system
Model of the Plant
• May not be necessary
– Can be done through experimenting and tuning
• But,
– Can make it easier to design
– May be useful for deriving the controller
• Example: throttle that goes from 0 to 45 degrees
– On a flat surface at 50 mph, open the throttle to 40 degrees
– Wait 1 “time unit”
– Measure the speed, let’s say 55 mph
– Then the following equation satisfies the above scenario
• $v_{t+1} = 0.7 v_t + 0.5 u_t$
• $55 = 0.7 \cdot 50 + 0.5 \cdot 40$
– If the equation holds for all other scenarios
• Then we have a model of the plant
Designing the Controller
• Assuming we want to use a simple linear function
– $u_t = F(r_t) = P \cdot r_t$
– $r_t$ is the desired speed, P is a constant that the designer must specify.
• Linear proportional controller
• $v_{t+1} = 0.7 v_t + 0.5 u_t = 0.7 v_t + 0.5 P r_t$
• Let $v_{t+1} = v_t = v_{ss}$ at steady state
• $v_{ss} = 0.7 v_{ss} + 0.5 P r_t$
• At steady state, we want $v_{ss} = r_t$, so $0.3 r_t = 0.5 P r_t$
• P = 0.6
– I.e. $u_t = 0.6 r_t$
Analyzing the Controller
• Let $v_0 = 20$ mph, $r_0 = 50$ mph
• $v_{t+1} = 0.7 v_t + 0.5(0.6) r_t = 0.7 v_t + 0.3 \cdot 50 = 0.7 v_t + 15$
• Throttle position is 0.6 · 50 = 30 degrees
Considering the Disturbance
• Assume road grade can
affect the speed
– From –5 mph to +5 mph
– $v_{t+1} = 0.7 v_t + 15 - 5 = 0.7 v_t + 10$
– $v_{t+1} = 0.7 v_t + 15 + 5 = 0.7 v_t + 20$
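The open-loop analysis can be checked numerically by iterating the plant model with the fixed throttle $u_t = 0.6 r_t$. Following the performance equations below, the sketch writes the disturbance as a subtracted term w; the function name is illustrative.

```c
/* Open-loop cruise control: the throttle depends only on the reference,
   never on the measured speed. */
double open_loop_speed(double v0, double r, double w, int steps)
{
    const double P = 0.6;          /* controller gain derived above */
    double v = v0;
    for (int t = 0; t < steps; t++) {
        double u = P * r;          /* no feedback */
        v = 0.7 * v + 0.5 * u - w;
    }
    return v;
}
```

Starting from 20 mph, the speed is 29 mph after one step and approaches 50 mph with no disturbance. A disturbance of +5 or −5 mph moves the steady state to about 33.3 or 66.7 mph (10/0.3 and 20/0.3), because nothing in the open-loop controller reacts to the error.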
Determining Performance
• $v_{t+1} = 0.7 v_t + 0.5 P r_0 - w_0$
• $v_1 = 0.7 v_0 + 0.5 P r_0 - w_0$
• $v_2 = 0.7(0.7 v_0 + 0.5 P r_0 - w_0) + 0.5 P r_0 - w_0 = 0.7 \cdot 0.7 v_0 + (0.7 + 1.0)(0.5 P r_0) - (0.7 + 1.0) w_0$
• $v_t = 0.7^t v_0 + (0.7^{t-1} + 0.7^{t-2} + \dots + 0.7 + 1.0)(0.5 P r_0 - w_0)$
• Coefficient of $v_t$ determines rate of decay of $v_0$
– > 1 or < −1: $v_t$ will grow without bound
– < 0: $v_t$ will oscillate
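Designing a Closed-Loop Control System
Stability
• $u_t = P (r_t - v_t)$
• $v_{t+1} = 0.7 v_t + 0.5 u_t - w_t = 0.7 v_t + 0.5 P (r_t - v_t) - w_t = (0.7 - 0.5P) v_t + 0.5 P r_t - w_t$
• $v_t = (0.7 - 0.5P)^t v_0 + ((0.7 - 0.5P)^{t-1} + (0.7 - 0.5P)^{t-2} + \dots + (0.7 - 0.5P) + 1.0)(0.5 P r_0 - w_0)$
• 39.74 mph steady state (P = 3.3, disturbance $w_0 = 5$ mph)
– Close to 42.31 mph (P = 3.3, no disturbance)
– Better than the open-loop results
• 33 mph
• 66 mph
• Cost
– Steady-state (SS) error
– Oscillation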
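A sketch of the closed-loop controller $u_t = P(r_t - v_t)$ applied to the same plant model. The gain P = 3.3 is an assumption here: it is the value that reproduces the 42.31 and 39.74 mph figures, since setting $v_{t+1} = v_t$ gives $v_{ss} = (0.5 P r - w)/(0.3 + 0.5P)$.

```c
/* Closed-loop cruise control: throttle proportional to the tracking error. */
double closed_loop_speed(double v0, double r, double w, double P, int steps)
{
    double v = v0;
    for (int t = 0; t < steps; t++) {
        double u = P * (r - v);       /* feedback on the measured speed */
        v = 0.7 * v + 0.5 * u - w;    /* = (0.7 - 0.5P) v + 0.5P r - w */
    }
    return v;
}

/* Steady state from v = (0.7 - 0.5P) v + 0.5P r - w. */
double closed_loop_steady_state(double r, double w, double P)
{
    return (0.5 * P * r - w) / (0.3 + 0.5 * P);
}
```

With P = 3.3 the coefficient 0.7 − 0.5P is −0.95: its magnitude is below 1, so the speed settles, but its sign is negative, so it oscillates on the way (the first step from 20 mph jumps to 63.5 mph). The remaining gap between 42.31 mph and the 50 mph reference is the steady-state error listed as a cost.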
General Control System
• Objective
– Causing output to track a reference even in the presence of
• Measurement noise
• Model error
• Disturbances
• Metrics
– Stability
• Output remains bounded
– Performance
• How well an output tracks the reference
– Disturbance rejection
– Robustness
• Ability to tolerate modeling error of the plant
Performance (generally speaking)
• Rise time
– Time it takes from
10% to 90% of the final value
• Peak time
• Overshoot
– Percentage by which
the peak exceeds the final
value
• Settling time
– Time it takes for the output to stay within
1% of the final value
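A C sketch that measures these metrics from a sampled step response, counting time in samples. It assumes the response starts at 0 and the final value is positive; the struct, function, and sample data are illustrative.

```c
typedef struct {
    int    rise_samples;    /* samples from 10% to 90% of the final value */
    int    peak_index;      /* index of the largest sample */
    double overshoot_pct;   /* (peak - final) / final * 100, or 0 */
    int    settle_index;    /* first index after which |y - final| <= 1% */
} step_metrics;

step_metrics measure_step(const double y[], int n, double final)
{
    step_metrics m = { -1, 0, 0.0, -1 };
    int t10 = -1, t90 = -1;
    for (int k = 0; k < n; k++) {
        if (t10 < 0 && y[k] >= 0.1 * final) t10 = k;
        if (t90 < 0 && y[k] >= 0.9 * final) t90 = k;
        if (y[k] > y[m.peak_index]) m.peak_index = k;
    }
    if (t10 >= 0 && t90 >= 0) m.rise_samples = t90 - t10;
    if (y[m.peak_index] > final)
        m.overshoot_pct = (y[m.peak_index] - final) / final * 100.0;

    double band = 0.01 * final;
    m.settle_index = 0;
    for (int k = 0; k < n; k++)
        if (y[k] - final > band || final - y[k] > band)
            m.settle_index = k + 1;            /* just after the last excursion */
    if (m.settle_index >= n) m.settle_index = -1;   /* never settled */
    return m;
}
```

For the samples 0, 0.5, 0.9, 1.1, 1.05, 0.98, 1.0, 1.0 with final value 1.0, the sketch reports a rise time of 1 sample, a peak at index 3 with 10% overshoot, and settling at index 6.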
Plant modeling is difficult
• May need to be done first
• Plant usually operates in continuous time
– Not discrete time
• E.g. car speed reacts continuously to throttle position, not at discrete
intervals
– Sampling period must be chosen carefully
• To make sure “nothing interesting” happens in between
• I.e. small enough
• Plant is usually non-linear
– E.g. shock absorber response may need an 8th-order differential equation
• Quantization
• Overflow
• Aliasing
• Computation Delay
Quantization & Overflow
• Quantization
– Can’t store 0.36 as 4-bit fractional number
– Can only store 0.75, 0.50, 0.25, 0.00, -0.25, -0.50, -0.75, -1.00
– Choose 0.25
• Results in a quantization error of 0.11
• Sources of quantization error
– Operations, e.g. 0.50*0.25=0.125
• Can use more bits until input/output to the environment/memory
– A2D converters
• Overflow
– Can’t store 0.75+0.50 = 1.25 as 4-bit fractional number
• Solutions:
– Use fixed-point representation/operations carefully
• Time-consuming
– Use floating-point co-processor
• Costly
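A C sketch of the quantization and overflow examples. It models exactly the eight values listed above as integers k in [−4, 3] scaled by 1/4, rounds to the nearest representable value, and saturates on overflow. Saturation is a design choice assumed here; plain two's-complement hardware would wrap around instead.

```c
#define Q_MIN   (-4)   /* represents -1.00 */
#define Q_MAX   3      /* represents  0.75 */
#define Q_SCALE 4.0    /* value = k / 4 */

typedef struct { int k; int overflow; } qval;

/* Round to the nearest representable value; clamp and flag overflow. */
qval quantize(double x)
{
    double s = x * Q_SCALE;
    long k = (long)(s >= 0 ? s + 0.5 : s - 0.5);
    qval q = { (int)k, 0 };
    if (k > Q_MAX) { q.k = Q_MAX; q.overflow = 1; }
    if (k < Q_MIN) { q.k = Q_MIN; q.overflow = 1; }
    return q;
}

double q_to_double(qval q) { return q.k / Q_SCALE; }

/* Fixed-point addition with saturation on overflow. */
qval q_add(qval a, qval b)
{
    int sum = a.k + b.k;
    qval r = { sum, 0 };
    if (sum > Q_MAX) { r.k = Q_MAX; r.overflow = 1; }
    if (sum < Q_MIN) { r.k = Q_MIN; r.overflow = 1; }
    return r;
}
```

Quantizing 0.36 gives 0.25 (error 0.11), and adding 0.75 + 0.50 flags an overflow and saturates at 0.75.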
Aliasing
• Quantization/overflow
– Due to discrete nature of computer data
• Aliasing
– Due to discrete nature of sampling
Aliasing Example
• Sampling at 2.5 Hz (period of 0.4 s), the following are indistinguishable
– y(t)=1.0*sin(6πt), frequency 3 Hz
– y(t)=1.0*sin(πt), frequency of 0.5 Hz
• In fact, with sampling frequency of 2.5 Hz
– Can only correctly sample signal below Nyquist frequency 2.5/2 = 1.25 Hz
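The apparent frequency after sampling can be computed by folding the true frequency into the range from 0 to the Nyquist frequency: take f modulo fs, then reflect values above fs/2. A small C sketch (it assumes f ≥ 0 and, to stay independent of the math library, truncates by casting instead of calling `floor`):

```c
/* Apparent (aliased) frequency of a sinusoid of frequency f sampled at fs. */
double alias_frequency(double f, double fs)
{
    double r = f - fs * (double)(long)(f / fs);   /* f mod fs, for f >= 0 */
    if (r > fs / 2.0)
        r = fs - r;                                 /* reflect into [0, fs/2] */
    return r;
}
```

With fs = 2.5 Hz, both 3 Hz and 0.5 Hz map to 0.5 Hz, which matches the example above. A 2 Hz signal also maps to 0.5 Hz, although its samples appear phase-inverted.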
Computation Delay
• Inherent delay in processing
– Actuation occurs later than expected
• Need to characterize implementation delay to make sure it is
negligible
• Hardware delay is usually easy to characterize
– Synchronous design
• Software delay is harder to predict
– Should organize code carefully so delay is predictable and minimized
– Write software with predictable timing behavior (be like hardware)
• Time-Triggered Architecture
• Synchronous Software Language
Benefit of Computer Control
• Cost!!!
– Expensive to make analog control immune to
• Age, temperature, manufacturing error
– Computer control replaces complex analog hardware with complex code
• Programmability!!!
– Computer control can be “upgraded”
• Changes in control mode or gain are easy to make
– Computer control can adapt to changes in the plant
• Due to age, temperature, …etc
– “future-proof”
• Easily adapt to change in standards,..etc
Embedded Systems
Chapter – 8
IC Technology
8. IC Technology [3 Hrs.]
8.1 Full-Custom (VLSI) IC Technology
8.2 Semi-Custom (ASIC) IC Technology
8.3 Programmable Logic Device (PLD) IC Technology
CMOS transistor
• Source, Drain
– Diffusion area where electrons can flow
– Can be connected to metal contacts
• Gate
– Polysilicon area where control voltage is applied
• Oxide
– SiO2 insulator so the gate voltage can’t leak
End of the Moore’s Law?
• Every dimension of the MOSFET has to scale
– (PMOS) Gate oxide has to scale down to
• Increase gate capacitance
• Reduce leakage current from S to D
• Pinch off current from source to drain
– Current gate oxide thickness is about 2.5-3nm
• That’s about 25 atoms!!!
[Figure: IC package and transistor cross-section: gate, oxide, source, channel and drain on the silicon substrate]
NAND
• Metal layers for routing (~10)
• PMOS doesn’t like 0 (passes a weak 0)
• NMOS doesn’t like 1 (passes a weak 1)
• A stick diagram forms the basis for mask sets
Silicon manufacturing steps
• Tape out
– Send design to manufacturing
• Spin
– One time through the manufacturing process
• Photolithography
– Drawing patterns by using photoresist to form barriers for deposition
Introduction to Photolithography
[Figure: Transistor layers: n-well and p-well on a p+ substrate, with a p-channel transistor and an n-channel transistor]
Microlithography is the
technique used to print
ultra-miniature patterns
- used primarily in the
semiconductor industry.
Photolithography is at the Center of the
Wafer Fabrication Process
[Figure: wafer fabrication flow: Diffusion, Photo, Etch, Implant and Test/Sort steps producing a patterned wafer]
What else is Photolithography?
• 3-dimensional circuit patterning
• Most critical step in IC process
–Determines feature resolution
–Determines overlay accuracy
• Bottleneck in the fab process
• The leading technology
Wafer Conditions Prior to Patterning
• Surface conditions include:
– film composition, e.g.: silicon, nitride, polysilicon, metal,
etc.
– bare surface vs. patterned surface
– surface reflectivity
• Surface conditions may affect
– photoresist-to-wafer adhesion
– alignment accuracy
– linewidth resolution
– exposure settings
– bake time
Wafer Conditions after Photolithography
• resist coated wafer
• patterned resist layer
• withstands etching process
• withstands ion implanting
• quality measures
– linewidth resolution
– overlay accuracy
– particles & defects
Importance of Resolution and Overlay Registration
[Figure: CMOS inverter (VSS, VDD, Vin, Vout) built from a p-channel and an n-channel transistor: polysilicon gates over gate oxide, metal contacts, field oxide, p+ and n+ source/drain regions, p-well in an n-substrate]
Cross-section of Transistor
Types of Photolithography Processes
[Figure: a chrome island on the glass mask casts a shadow on the photoresist; developing the exposed area of photoresist leaves an oxide island (positive resist) or an oxide window (negative resist); photoresist over HMDS on Si wafer]
2. Photoresist Application
• Wafer held onto vacuum chuck
• Dispense ~5 ml of photoresist
• Slow spin ~500 rpm
• Ramp up to ~3000 to 5000 rpm
• Quality measures:
– time
– speed
– thickness
– uniformity
– particles & defects
[Figure: photoresist dispenser above the wafer on a vacuum chuck; spindle connected to a vacuum pump]
• PR = sensitizer (PAC) + resin + solvent
• Pattern polarity
– Positive type: AZ PR series (Shipley)
– Negative type: HR PR series (Hunt Chemical)
• A spin motor creates a PR coating of uniform thickness on the wafer
• Important factors for thickness and uniformity: resin %, cohesion, spin speed, acceleration, time
[Figure: PR layer over HMDS on Si wafer]
[Figure: Result: variation of PR thickness (Å) before soft bake vs. spin speed (2000, 2500, 3000, 3500 and 4000 rpm)]
• Alignment: a photomask (a square glass plate with a patterned emulsion or metal film on one side) is placed over the wafer
[Figure: exposing time vs. CD: critical dimension CD (µm) against exposure time (9 to 18 s)]
5. Develop
• Develop: 6 AZ MIF 300 : 1 H2O (70 sec., room temp.)
[Figure: developed PR pattern on Si wafer]
• Inspection
1. Contamination
2. Opaque spot
3. Large hole
4. Pin hole
5. Excess material
6. Lack of adhesion
7. Intrusion
8. Scratch
Rework
6. Hard Bake
• Hard bake (110 °C, 30 min.): to harden the photoresist and improve adhesion to the substrate
• Evaporate remaining photoresist solvent
• Improve adhesion
• Higher temperature than soft bake
7. Develop Inspect
• Optical or SEM metrology
• Quality issues:
– particles
– defects
– critical dimensions
– linewidth resolution
– overlay accuracy
8. Etch
• Selective removal of the upper layer of the wafer through windows in the photoresist
• Quality measures:
– defects and particles
– step height
– selectivity
– critical dimensions
[Figure: CF4 plasma etch]
9. Photoresist Removal (strip)
• Photoresist is no longer needed after the etch process
• Two common methods:
– wet acid strip
– dry plasma strip
• Followed by a wet clean to remove remaining resist and strip byproducts
[Figure: O2 plasma strip]
10. Final Inspection
• Photoresist has been completely removed
• Pattern on wafer matches mask pattern (positive resist)
• Quality issues:
– defects
– particles
– step height
– critical dimensions
Full Custom
• Very Large Scale Integration (VLSI)
• Placement
– Place and orient transistors
• Routing
– Connect transistors
• Sizing
– Make fat, fast wires or thin, slow wires
– May also need to size buffers
• Design Rules
– “simple” rules for correct circuit function
• Metal/metal spacing, min poly width…
Full Custom
• Best size, power, performance
• Hand design
– Horrible time-to-market/flexibility/NRE cost…
– Reserve for the most important units in a processor
• ALU, Instruction fetch…
• Physical design tools
– Less optimal, but faster…
Semi-Custom
• Gate Array
– Array of prefabricated gates
– “place” and route
– Higher density, faster time-to-market
– Does not integrate as well with full-custom
• Standard Cell
– A library of pre-designed cells
– Place and route
– Lower density, higher complexity
– Integrates well with full-custom
Semi-Custom
F1 = Σm(2,3,4,6,7) = B + AC’
F2 = Σm(0,1,2,6) = A’B’ + BC’
F3 = Σm(2,3,5,6,7) = AC + B
AND-OR ARRAY EQUIVALENT OF NMOS PLA WITH 3 INPUTS, 5 PRODUCT TERMS AND 4 OUTPUTS
PLA table
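The minimized expressions can be checked against their minterm lists by brute force. The C sketch below assumes A is the most significant input bit. `minterm_mask` is a hypothetical helper that sets bit m whenever the function is 1 for minterm m:

```c
#include <stdbool.h>

/* Minimized forms from the slide (A = MSB, C = LSB). */
bool f1(bool a, bool b, bool c) { return b || (a && !c); }         /* B + AC'   */
bool f2(bool a, bool b, bool c) { return (!a && !b) || (b && !c); } /* A'B' + BC' */
bool f3(bool a, bool b, bool c) { return (a && c) || b; }           /* AC + B    */

/* Evaluate f for all 8 minterms; bit m of the result is f(m). */
unsigned minterm_mask(bool (*f)(bool, bool, bool))
{
    unsigned mask = 0;
    for (unsigned m = 0; m < 8; m++) {
        bool a = (m >> 2) & 1u;
        bool b = (m >> 1) & 1u;
        bool c = m & 1u;
        if (f(a, b, c))
            mask |= 1u << m;
    }
    return mask;
}
```

For example, `minterm_mask(f1)` has exactly bits 2, 3, 4, 6 and 7 set, matching Σm(2,3,4,6,7).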
REALIZATION OF PLA FOR A GIVEN EQUATION
F1 = Σm(2,3,5,7,8,9,10,11,13,15) = BD+B’C+AB’
F2 = Σm(2,3,5,6,7,10,11,14,15) = C+A’BD
F3 = Σm(6,7,8,9,13,14,15) = BC+AB’C’+ABD
Rewriting F2 so that it shares the product term B’C with F1 and BC with F3:
F2 = C(B + B’) + A’BD
   = BC + B’C + A’BD
F3 = BC + AB’C’ + ABD
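A PLA can be modeled as an AND plane (one row per product term) feeding an OR plane (one column per output). The C sketch below encodes the seven product terms of F1 to F3, with B’C and BC shared between outputs, and recovers each output’s minterm list. The `care`/`value` row encoding and all names here are assumptions made for this illustration, not a standard PLA format:

```c
#include <stdint.h>

/* One AND-plane row: the product term is 1 when (input & care) == value. */
struct term { uint8_t care, value; };

/* Input bit positions: A = bit 3 (MSB), B = bit 2, C = bit 1, D = bit 0. */
enum { A = 8, B = 4, C = 2, D = 1 };

static const struct term and_plane[7] = {
    { B | D,     B | D     }, /* P0 = BD    */
    { B | C,     C         }, /* P1 = B'C   */
    { A | B,     A         }, /* P2 = AB'   */
    { B | C,     B | C     }, /* P3 = BC    */
    { A | B | D, B | D     }, /* P4 = A'BD  */
    { A | B | C, A         }, /* P5 = AB'C' */
    { A | B | D, A | B | D }  /* P6 = ABD   */
};

/* OR plane: bit k set means product term Pk feeds that output. */
static const uint8_t or_plane[3] = {
    (1u << 0) | (1u << 1) | (1u << 2), /* F1 = BD + B'C + AB'   */
    (1u << 3) | (1u << 1) | (1u << 4), /* F2 = BC + B'C + A'BD  */
    (1u << 3) | (1u << 5) | (1u << 6)  /* F3 = BC + AB'C' + ABD */
};

/* Evaluate output F(out + 1) for a 4-bit input ABCD. */
int pla_eval(int out, unsigned in)
{
    for (int k = 0; k < 7; k++)
        if (((or_plane[out] >> k) & 1u) &&
            (in & and_plane[k].care) == and_plane[k].value)
            return 1;
    return 0;
}

/* Minterm list of an output as a 16-bit mask (bit m set when F(m) = 1). */
unsigned pla_minterms(int out)
{
    unsigned mask = 0;
    for (unsigned m = 0; m < 16; m++)
        if (pla_eval(out, m))
            mask |= 1u << m;
    return mask;
}
```

Sharing P1 and P3 is the point of rewriting F2. Seven AND-plane rows serve all three outputs, instead of nine if each output had private terms.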
Chapter – 9
Microcontrollers in Embedded Systems
9. Microcontrollers in Embedded Systems [3 Hrs.]
9.1 Intel 8051 microcontroller family, its architecture and instruction sets
9.2 Programming in Assembly Language
9.3 A simple interfacing example with 7 segment display
A microcontroller is a highly integrated chip that contains a CPU, scratchpad RAM, special- and general-purpose register arrays, and integrated peripherals.
[Figure: microcontroller block diagram showing the address bus, I/O port, timer and serial COM port]
Companies Producing 8051
Intel www.intel.com/design/mcs51
Atmel www.atmel.com
Philips/Signetics www.semiconductors.philips.com
Siemens www.sci.siemens.com
Feature            8051  8052  8031
Timers             2     3     2
I/O pins           32    32    32
Serial port        1     1     1
Interrupt sources  6     8     6
Intel 8051
[Figure: a microcontroller (µC) in an embedded system; sensors feed it through sensor conditioning, and output interfaces drive an actuator and an indicator]
Three criteria in Choosing a Microcontroller
1. meeting the computing needs of the task efficiently
and cost effectively
• speed, the amount of ROM and RAM, the
number of I/O ports and timers, size, packaging,
power consumption
• easy to upgrade
• cost per unit
2. availability of software development tools
• assemblers, debuggers, C compilers, emulator,
simulator, technical support
3. wide availability and reliable sources of the
microcontrollers.
8051 Architecture
Memory Model
• Program Memory
– Internal ROM (4 KB)
– External EPROM
• Data Memory
– Internal RAM (128 bytes)
– General Purpose Registers
– Special Function Registers
– External SRAM
8051 General Purpose Registers
[Figure: 8-bit registers A and R0 to R7; 16-bit registers DPTR (DPH, DPL) and PC]
Note:
A = accumulator
PC = program counter
DPTR = data pointer
Some 8-bit Registers of the 8051
8051 Special Function Registers (SFRs)
Pin Description of the 8051
8051 (8031) pin diagram:
P1.0        1   40  Vcc
P1.1        2   39  P0.0 (AD0)
P1.2        3   38  P0.1 (AD1)
P1.3        4   37  P0.2 (AD2)
P1.4        5   36  P0.3 (AD3)
P1.5        6   35  P0.4 (AD4)
P1.6        7   34  P0.5 (AD5)
P1.7        8   33  P0.6 (AD6)
RST         9   32  P0.7 (AD7)
(RXD) P3.0  10  31  EA/VPP
(TXD) P3.1  11  30  ALE/PROG
(INT0) P3.2 12  29  PSEN
(INT1) P3.3 13  28  P2.7 (A15)
(T0) P3.4   14  27  P2.6 (A14)
(T1) P3.5   15  26  P2.5 (A13)
(WR) P3.6   16  25  P2.4 (A12)
(RD) P3.7   17  24  P2.3 (A11)
XTAL2       18  23  P2.2 (A10)
XTAL1       19  22  P2.1 (A9)
GND         20  21  P2.0 (A8)
Pins of 8051
• Vcc (pin 40):
– Vcc provides supply voltage to the chip.
– The voltage source is +5 V.
• GND (pin 20): ground
• RST (pin 9): reset
– It is a power-on reset.
– Upon applying a high pulse to RST, the microcontroller will reset and all values in registers will be lost.
• Rn refers to registers R0-R7 of the currently selected register bank
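The physical bytes that “Rn” names depend on the bank-select bits RS1 and RS0 in the PSW (bits 4 and 3). Each of the four banks holds 8 registers in internal RAM 00H to 1FH. Below is a host-side C sketch of that address calculation; the function names are hypothetical:

```c
#include <stdint.h>

/* PSW bit positions on the 8051: RS1 = bit 4, RS0 = bit 3. */
#define PSW_RS0 3
#define PSW_RS1 4

/* Selected register bank (0 to 3) taken from RS1:RS0 of a PSW value. */
unsigned selected_bank(uint8_t psw)
{
    return (psw >> PSW_RS0) & 0x03u;
}

/* Internal-RAM address of register Rn (n = 0..7) in the current bank.
 * Bank 0 = 00H-07H, bank 1 = 08H-0FH, bank 2 = 10H-17H, bank 3 = 18H-1FH. */
uint8_t rn_address(uint8_t psw, unsigned n)
{
    return (uint8_t)(selected_bank(psw) * 8u + (n & 7u));
}
```

Other PSW flags such as CY do not affect the result, because only bits 4 and 3 are extracted.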
Logical Instructions
• Logical instructions perform Boolean operations (AND, OR, XOR and NOT) on data bytes on a bit-by-bit basis.
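The four logical operations act independently on each bit of a byte. Below is a host-side C illustration (not 8051 code). The helper names are hypothetical and simply mirror the ANL, ORL, XRL and CPL mnemonics:

```c
#include <stdint.h>

/* Byte-wide models of the 8051 logical operations, applied bit by bit. */
uint8_t anl(uint8_t a, uint8_t src) { return a & src; }     /* ANL A, src */
uint8_t orl(uint8_t a, uint8_t src) { return a | src; }     /* ORL A, src */
uint8_t xrl(uint8_t a, uint8_t src) { return a ^ src; }     /* XRL A, src */
uint8_t cpl(uint8_t a)              { return (uint8_t)~a; } /* CPL A      */
```

Typical uses follow directly: ANL with a mask clears bits, ORL sets them, and XRL toggles them. With A = 35H and the mask 0FH, the results are 05H, 3FH and 3AH, and CPL A gives CAH.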
Data Transfer Instructions
• Data transfer instructions can be used
to transfer data between an internal
RAM location and an SFR location
without going through the
accumulator.
Corresponding C program:
#include <regx51.h>
MOV A, #00H ; load 00H into A
MOV P1, A   ; write 00H to P1 (make P1 an output port)
Corresponding C program:
#include <regx51.h>
void main(void)
{
    P0 = 0xFF;      // write 1s to P0 so it can be used as an input port
    P1 = 0x00;      // make P1 an output port
    while (1)
    {
        P1 = P0;    // copy the value read on P0 to P1
    }
}
Interfacing 7 segment display with 8051
Port connection
Lookup Table for 7 Segment Decoding
Hardware connection of 7 segment with 8051
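Assembly program to display 0 to 9 in 7 segment display
MOV A, #00H
MOV P2, A ; make P2 an output port
Corresponding C program:
#include <regx51.h>

// Common-anode codes for digits 0 to 9 (a segment lights when its pin is 0)
unsigned char code seg[10] = {0xC0, 0xF9, 0xA4, 0xB0, 0x99,
                              0x92, 0x82, 0xF8, 0x80, 0x98};

// Crude software delay; the inner loop count (120) is an assumed value
// that must be tuned to the crystal frequency
void Delay(unsigned int ms)
{
    unsigned int i, j;
    for (i = 0; i < ms; i++)
        for (j = 0; j < 120; j++)
            ;
}

void main(void)
{
    unsigned char i;
    P2 = 0x00;              // make P2 an output port
    while (1)
    {
        for (i = 0; i < 10; i++)
        {
            P2 = seg[i];    // show digit i
            Delay(200);
        }
    }
}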
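The lookup-table codes can be derived rather than memorized. The C sketch below assumes the wiring implied by those codes: P2.0 to P2.6 drive segments a to g, P2.7 drives the decimal point, and the display is common anode, so a segment lights when its pin is 0. The names `lit` and `seg_code_common_anode` are hypothetical:

```c
#include <stdint.h>

/* Assumed segment bits: a = bit 0, b = bit 1, ..., g = bit 6, dp = bit 7. */
enum { SA = 1, SB = 2, SC = 4, SD = 8, SE = 16, SF = 32, SG = 64 };

/* Segments lit for each digit (active-high mask, decimal point off). */
static const uint8_t lit[10] = {
    SA | SB | SC | SD | SE | SF,      /* 0 */
    SB | SC,                          /* 1 */
    SA | SB | SD | SE | SG,           /* 2 */
    SA | SB | SC | SD | SG,           /* 3 */
    SB | SC | SF | SG,                /* 4 */
    SA | SC | SD | SF | SG,           /* 5 */
    SA | SC | SD | SE | SF | SG,      /* 6 */
    SA | SB | SC,                     /* 7 */
    SA | SB | SC | SD | SE | SF | SG, /* 8 */
    SA | SB | SC | SF | SG            /* 9, drawn without segment d */
};

/* Common anode: a pin must be 0 to light its segment, so the port code
 * is the bitwise complement of the lit-segment mask. */
uint8_t seg_code_common_anode(unsigned digit)
{
    return (uint8_t)~lit[digit % 10];
}
```

For a common-cathode display, the same table would be used without the complement.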
Timer 1
CHAPTER TEN
DESIGN PROCESS
Introduction
Figure 3: A VHDL entity consisting of an interface (entity declaration) and a body (architectural description).
Entity Declaration:
The entity declaration defines the NAME of the entity and lists the input and output ports. The general form is as follows:
entity NAME_OF_ENTITY is
port (signal_names: mode type;
signal_names: mode type;
:
signal_names: mode type);
end [NAME_OF_ENTITY] ;
Architecture body
The architecture body specifies how the circuit operates and
how it is implemented.
The architecture body looks as follows,
Behavioral model
The architecture body for the example of Figure 2, described at the behavioral level, is given below:
architecture behavioral of BUZZER is
begin
WARNING <= (not DOOR and IGNITION) or (not SBELT and IGNITION);
end behavioral;
library ieee;
use ieee.std_logic_1164.all;
entity BUZZER is
port(DOOR, IGNITION, SBELT: in std_logic;
WARNING: out std_logic);
end BUZZER;
architecture behavioral of BUZZER is
begin
WARNING <= (not DOOR and IGNITION) or (not SBELT and IGNITION);
end behavioral;
Concurrency
Structural description
The circuit of Figure 2 can also be described using a structural model that specifies what gates are used and how they are interconnected. The following example illustrates it:
architecture structural of BUZZER is
-- Declarations (AND2, OR2 and NOT1 must be defined as entities elsewhere)
component AND2
port (in1, in2: in std_logic;
out1: out std_logic);
end component;
component OR2
port (in1, in2: in std_logic;
out1: out std_logic);
end component;
component NOT1
port (in1: in std_logic;
out1: out std_logic);
end component;
-- Signals used to interconnect the gates
signal DOOR_NOT, SBELT_NOT, B1, B2: std_logic;
begin
-- Component instantiation statements
U0: NOT1 port map (DOOR, DOOR_NOT);
U1: NOT1 port map (SBELT, SBELT_NOT);
U2: AND2 port map (IGNITION, DOOR_NOT, B1);
U3: AND2 port map (IGNITION, SBELT_NOT, B2);
U4: OR2 port map (B1, B2, WARNING);
end structural;
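The structural and behavioral architectures describe the same circuit, so their outputs must agree for every input combination. Below is a C sketch of that equivalence check. It assumes each component behaves as an ideal gate; the function names are hypothetical stand-ins for NOT1, AND2 and OR2 and the two architectures:

```c
#include <stdbool.h>

/* Ideal-gate stand-ins for the NOT1, AND2 and OR2 components. */
static bool not1(bool in1)           { return !in1; }
static bool and2(bool in1, bool in2) { return in1 && in2; }
static bool or2(bool in1, bool in2)  { return in1 || in2; }

/* Structural netlist: one line per component instantiation. */
bool warning_structural(bool door, bool ignition, bool sbelt)
{
    bool door_not  = not1(door);          /* U0 */
    bool sbelt_not = not1(sbelt);         /* U1 */
    bool b1 = and2(ignition, door_not);   /* U2 */
    bool b2 = and2(ignition, sbelt_not);  /* U3 */
    return or2(b1, b2);                   /* U4 */
}

/* Behavioral equation from the behavioral architecture body. */
bool warning_behavioral(bool door, bool ignition, bool sbelt)
{
    return (!door && ignition) || (!sbelt && ignition);
}

/* Exhaustively compare both descriptions over all 8 input combinations. */
bool descriptions_agree(void)
{
    for (unsigned m = 0; m < 8; m++) {
        bool door = m & 4u, ignition = m & 2u, sbelt = m & 1u;
        if (warning_structural(door, ignition, sbelt) !=
            warning_behavioral(door, ignition, sbelt))
            return false;
    }
    return true;
}
```

Exhaustive comparison is practical here because the circuit has only three inputs.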
library ieee;
use ieee.std_logic_1164.all;
Identifiers
Identifiers are user-defined words used to name objects in
VHDL modules. We have seen examples of identifiers for
input and output signals as well as the name of a design entity
and architecture body.
Numbers
The default number representation is the decimal
system. VHDL allows integer literals and real literals.
Integer literals consist of whole numbers without a
decimal point, while real literals always include a
decimal point. Exponential notation is allowed using
the letter “E” or “e”. For integer literals the exponent
must always be positive. Examples are:
Constant
A constant can have a single value of a given type and cannot be
changed during the simulation. A constant is declared as follows,
Variable
A variable may be changed during program execution. Variable
value is updated using a variable assignment statement. The
variable is updated without any delay as soon as the statement is
executed. Variables must be declared inside a process (and are local
to the process). The variable declaration is as follows:
Signal
Signals are similar to wires on a schematic, and can be used to
interconnect concurrent elements of the design.
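The difference between the variable and signal descriptions above shows up when two assignments in one process read each other’s targets. The C sketch below is only a software analogy, not how a VHDL simulator is built. Variables update immediately, whereas signal assignments inside a process are scheduled and take effect together when the process suspends. `struct pair`, `run_with_variables` and `run_with_signals` are hypothetical names:

```c
/* Two objects a and b, assigned in order inside one process activation:
 *   variables:  a := b;  b := a;
 *   signals:    a <= b;  b <= a;                                       */
struct pair { int a, b; };

/* Variables: each assignment takes effect at once. */
struct pair run_with_variables(struct pair s)
{
    s.a = s.b; /* a := b, so a now holds the old b            */
    s.b = s.a; /* b := a reads the NEW a, so b keeps its value */
    return s;
}

/* Signals: every right-hand side reads the values from before the
 * process ran; the scheduled updates are applied together afterwards. */
struct pair run_with_signals(struct pair s)
{
    struct pair next = s; /* projected (scheduled) values */
    next.a = s.b;         /* a <= b */
    next.b = s.a;         /* b <= a */
    return next;          /* applied when the process suspends */
}
```

Starting from a = 1, b = 2, the variable version leaves both equal to 2, while the signal version swaps them to a = 2, b = 1.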
Process
A PROCESS is a sequential section of VHDL code. It is characterized by the presence of IF, WAIT, CASE or LOOP statements and by a sensitivity list (except when WAIT is used). A process is executed every time a signal in its sensitivity list changes (or when the condition related to WAIT is fulfilled). Its syntax is shown below:
library ieee;
use ieee.std_logic_1164.all;
entity DFF_CLEAR is
port (CLK, CLEAR, D : in std_logic;
Q : out std_logic);
end DFF_CLEAR;
Example:
if S1 = '0' and S0 = '0' then
Z <= A;
elsif S1 = '0' and S0 = '1' then
Z <= B;
else
Z <= C;
end if;
Case Statements
case expression is
when choices =>
sequential statements
when choices =>
sequential statements
-- more branches are allowed
[ when others => sequential statements ]
end case;
Example:
case VALUE is
when 51 to 60 =>
D <= '1';
when 61 to 70 | 71 to 75 =>
C <= '1';
when 76 to 85 =>
B <= '1';
when 86 to 100 =>
A <= '1';
when others =>
F <= '1';
end case;
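VHDL requires the choices of a case statement to be non-overlapping and to cover every possible value of the expression, which is why the example ends with `when others`. Below is a C counterpart written as an if-chain, since portable C has no range cases. `grade_signal` is a hypothetical name; it returns the letter of the signal the example drives to '1':

```c
/* C counterpart of the case example: which output is set for VALUE. */
char grade_signal(int value)
{
    if (value >= 51 && value <= 60)
        return 'D';
    if ((value >= 61 && value <= 70) || (value >= 71 && value <= 75))
        return 'C';
    if (value >= 76 && value <= 85)
        return 'B';
    if (value >= 86 && value <= 100)
        return 'A';
    return 'F'; /* when others */
}
```

The two ranges 61 to 70 and 71 to 75 combined with `|` behave like a single range 61 to 75.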
Finite State Machine (FSM)
[Figure: FSM diagram with states State0 and State1; transitions labeled Input / Output]
Finite State Machine Design
STEPS:
Algorithm:
int x, y;
while (1) {
    while (!go_i);      // wait until go_i is asserted
    x = x_i;
    y = y_i;
    while (x != y) {
        if (x < y)
            y = y - x;
        else
            x = x - y;
    }
    d_o = x;
}
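Below is a runnable C version of the algorithm. It assumes both inputs are positive; with a zero input the loop never terminates. The go_i/x_i/y_i/d_o handshake is replaced by an ordinary function call, and `gcd_subtract` is a hypothetical name:

```c
/* Subtractive GCD, following the algorithm above step by step.
 * Assumes x_i > 0 and y_i > 0. */
unsigned gcd_subtract(unsigned x_i, unsigned y_i)
{
    unsigned x = x_i, y = y_i;
    while (x != y) {
        if (x < y)
            y = y - x;
        else
            x = x - y;
    }
    return x; /* d_o */
}
```

For the operand pairs used in the testbench later in this chapter (10 and 5, 12 and 9, 15 and 13), it returns 5, 3 and 1.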
State diagram:
• State 1 (Idle): if start = 1, go to Initialize
• State 2 (Initialize): x = x_i, y = y_i
• State 3: Check4Condition
• State 4 (Update_x): x = x - y
• State 5 (Update_y): y = y - x
VHDL Coding:
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;
use ieee.std_logic_arith.all;
entity gcd is
port( clk : in std_logic;
reset: in std_logic;
num_1: in unsigned(3 downto 0);
num_2: in unsigned(3 downto 0);
gcd_num: out unsigned(3 downto 0)
);
end entity;
begin
--Sequential Section
sequential:process(clk,reset) is
begin
if(reset='1') then
pr_state<=idle;
elsif(clk'event and clk='1') then
pr_state<=nx_state;
end if;
end process sequential;
--Combinational Section
combinational:process(pr_state,num_1,num_2) is
when init=>
temp_x:=num_1;
temp_y:=num_2;
nx_state<=check;
when check=>
if(temp_x=temp_y) then
nx_state<=get_result;
elsif(temp_x>temp_y) then
nx_state<=update_x;
else
nx_state<=update_y;
end if;
when update_x=>
temp_x:=temp_x-temp_y;
nx_state<=check;
when update_y=>
temp_y:=temp_y-temp_x;
nx_state<=check;
when get_result=>
gcd_num<=temp_x;
nx_state<=idle;
start <= '1';
end case;
end process combinational;
end architecture;
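The two-process structure can be mimicked cycle by cycle in C: a clocked process holds pr_state, and a combinational process computes nx_state. This is a hypothetical sketch with two simplifying assumptions. First, the idle state, which the slides do not show, moves to init whenever `go` is 1. Second, temp_x and temp_y are treated as registers updated on the clock edge. All names are illustrative:

```c
/* Cycle-level model of the GCD state machine (hypothetical sketch). */
enum state { IDLE, INIT, CHECK, UPDATE_X, UPDATE_Y, GET_RESULT };

struct gcd_fsm {
    enum state pr_state; /* present-state register          */
    unsigned x, y;       /* temp_x, temp_y (assumed registers) */
    unsigned gcd_num;    /* output                           */
};

/* One rising clock edge: compute the next state from the present state,
 * then load it into the state register. */
void gcd_clock(struct gcd_fsm *m, int go, unsigned num_1, unsigned num_2)
{
    enum state nx = m->pr_state;
    switch (m->pr_state) {
    case IDLE:       nx = go ? INIT : IDLE; break; /* assumed idle behavior */
    case INIT:       m->x = num_1; m->y = num_2; nx = CHECK; break;
    case CHECK:      nx = (m->x == m->y) ? GET_RESULT
                        : (m->x > m->y) ? UPDATE_X : UPDATE_Y; break;
    case UPDATE_X:   m->x -= m->y; nx = CHECK; break;
    case UPDATE_Y:   m->y -= m->x; nx = CHECK; break;
    case GET_RESULT: m->gcd_num = m->x; nx = IDLE; break;
    }
    m->pr_state = nx;
}

/* Reset, then clock until the machine is back in IDLE with a result.
 * Assumes positive operands. Returns the number of clock cycles used. */
unsigned gcd_run(struct gcd_fsm *m, unsigned num_1, unsigned num_2)
{
    unsigned cycles = 0;
    m->pr_state = IDLE; /* reset = '1' */
    do {
        gcd_clock(m, 1, num_1, num_2);
        cycles++;
    } while (m->pr_state != IDLE);
    return cycles;
}
```

Under these assumptions, the machine needs 6 cycles for (10, 5), 10 cycles for (12, 9) and 20 cycles for (15, 13). Because the cycle count depends on the operands, it is worth considering when choosing clk_period and how long the testbench holds each operand pair.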
Testbench:
LIBRARY ieee ;
USE ieee.std_logic_1164.all ;
USE ieee.std_logic_arith.all ;
USE ieee.std_logic_unsigned.all ;
ENTITY gcd_tb IS
END ;
COMPONENT gcd
PORT (
num_1 : in unsigned (3 downto 0) ;
gcd_num : out unsigned (3 downto 0) ;
num_2 : in unsigned (3 downto 0) ;
clk : in std_logic ;
reset : in std_logic );
END COMPONENT ;
BEGIN
DUT : gcd
PORT MAP (
num_1 => num_1 ,
gcd_num => gcd_num ,
num_2 => num_2 ,
clk => clk ,
reset => reset ) ;
clk_process :process
begin
clk <= '0';
wait for clk_period/2;
clk <= '1';
wait for clk_period/2;
end process;
stim_proc: process
begin
-- hold reset state for 5 ns.
wait for 5 ns;
reset<='0';
num_1<="1010";
num_2<="0101";
wait for 10 ns;
num_1<="1100";
num_2<="1001";
wait for 10 ns;
num_1<="1111";
num_2<="1101";
--wait for 10 ms;
wait;
end process;
END ;
Simulation Result
Synthesis Result