
INTRODUCTION TO DSP PROCESSORS

K. VIJAYA KUMAR
Asst. Prof.
USHARAMA COLLEGE OF
ENGINEERING & TECHNOLOGY
Introduction to DSP Processors
Introduction to programmable DSPs
Multiplier
Multiplier accumulator (MAC)
Modified Bus Structure
Memory Access Schemes in DSPs
Multiple Access Memory
Multiport memory
VLIW Architecture
Pipelining
Special addressing modes
On-chip peripherals

Architecture of TMS 320C5X
Introduction
Bus structure
Central Arithmetic and Logic Unit
Auxiliary register, Index register, Block move address register
Parallel logic unit, memory mapped registers, Program controller
Flags in Status register
On-Chip registers, On-chip peripherals
Introduction
DSP algorithms (convolution, FFT, filtering, etc.) share common features:
They process arrays of data
The majority of operations are multiply & accumulate
Linear and circular shifts of arrays are required
General purpose processors are not well suited to these operations.
Features of DSP Processors
Should have multiple registers
Should be able to fetch multiple operands simultaneously
Should have circular buffers to support circular shift operations
Should be able to perform multiply & accumulate very fast
Should have multiple pointers to support multiple operands, jumps and shifts
Should have multiprocessing ability
Should have on-chip memory
Should have a powerful interrupt structure and timers, as they are required for real time applications
Comparison between DSP Processors and General Purpose Processors
S. No | Parameter | DSP Processor | General Purpose Processor
1 | Instruction cycle | Instructions executed in a single cycle | Multiple clock cycles are required for 1 instruction
2 | Instruction execution | Parallel execution possible | Always sequential
3 | Operand fetch from memory | Multiple operands fetched simultaneously | Fetched sequentially
4 | Memories | Separate program and data memories | No separate memories present
5 | On chip / off chip memories | PM and DM are present on-chip and extendable off-chip | Cache memory on-chip, main memory off-chip
6 | Program flow control | Program sequencer | Program counter
7 | Queuing / pipelining | Queuing is implicit through instruction register and instruction cache | Queue is performed explicitly by queue registers for pipelining of instructions
8 | Address generation | Generated by DAGs and program sequencer | PC is incremented sequentially to generate the address
9 | Address / data bus multiplexing | Address and data buses are not multiplexed. They are separate on-chip as well as off-chip | Address / data buses can be separate on the chip but are usually multiplexed off the chip
10 | Computational units | 3 separate computational units | ALU is the main computational unit
11 | On-chip address & data buses | Separate address and data buses for program memory and data memory, and a result bus (i.e., PMA, DMA, PMD, DMD and the R bus) | Address and data are the 2 buses on the chip
12 | Addressing modes | Direct and indirect addressing modes are supported | Direct, indirect, register, register indirect, immediate, etc. addressing modes are supported
13 | Suitable for | Array processing operations | General purpose processing
Multiplier & Accumulator (MAC) Unit
Most DSP operations involve array multiplication.
In real time applications, array multiplication and accumulation must be completed before the next input sample arrives. Hence a fast implementation of MAC is required.
The MAC is a dedicated hardware computational unit in the processor.
A complete MAC operation is executed in one clock cycle.
In the Texas Instruments TMS320C5x, the output of the multiplier is stored in the product register, and the contents of the product register are then added to the accumulator register in the central ALU.
DSP processors have a special instruction called MACD (multiply and accumulate with data move).
Modified Bus Structure
Executing a MACD instruction involves four memory accesses:
Fetch the MACD instruction from program memory
Fetch one of the operands from program memory
Fetch the second operand from data memory
Data memory write
If this instruction is executed on a conventional (Von Neumann) architecture, it requires 4 clock cycles, because all four accesses share one memory and one bus. Harvard and Modified Harvard architectures require fewer clock cycles.
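The four MACD accesses can be illustrated with a small software model. This is only a sketch under stated assumptions: the function name `fir_step_macd` and the list-based "memories" are hypothetical, the coefficient list stands in for program memory and the delay line for data memory, and the timing of the real instruction is not modelled.

```python
def fir_step_macd(coeffs, delay_line, new_sample):
    """Compute one FIR output sample with a MACD-style inner loop.

    coeffs     : filter coefficients h[0..N-1] (stand-in for program memory)
    delay_line : N + 1 slots of past inputs (stand-in for data memory);
                 the extra last slot receives the oldest sample when it moves
    """
    n_taps = len(coeffs)
    delay_line[0] = new_sample  # newest sample enters the delay line
    acc = 0
    # Walk from the oldest tap to the newest, as a repeated MACD would.
    for i in range(n_taps - 1, -1, -1):
        product = coeffs[i] * delay_line[i]  # multiplier output (product register)
        acc += product                       # product added to the accumulator
        delay_line[i + 1] = delay_line[i]    # data move: sample shifts one slot
    return acc


# Feeding an impulse returns the coefficients one by one.
h = [1, 2, 3]
x_mem = [0, 0, 0, 0]
outputs = [fir_step_macd(h, x_mem, s) for s in (1, 0, 0)]
```

Each loop pass touches one coefficient, one data sample and one data write, which is why separate program and data buses help this instruction.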
DSP Architectures
Von Neumann Architecture
Harvard Architecture
Modified Harvard Architecture
Von Neumann Architecture
General purpose processors have this architecture
Shares the same memory for program and data
Processor performs instruction fetch, decode and execute operations sequentially
Speed can be increased by pipelining
Contains a common internal address and data bus, ALU, accumulator, I/O devices and common memory for program and data
Not suitable for DSP applications
Harvard Architecture
Separate memories for program and data
Separate address and data buses for program and data
High speed of execution
Includes various registers, ALUs, address generators, etc.
The PMD bus is used to get instructions from program memory
The DMD bus is used to exchange operands & results with data memory
An instruction from program memory and an operand from data memory can be fetched simultaneously. This parallelism increases the speed
It is possible to fetch the next instruction while the current instruction is being executed, i.e., fetch and execute overlap
What is the difference between a von Neumann architecture and a Harvard architecture?
A Harvard architecture has separate data and instruction buses, allowing transfers to be performed simultaneously on both buses. A von Neumann architecture has only one bus, which is used for both data transfers and instruction fetches. Data transfers and instruction fetches must therefore be scheduled, as they cannot be performed at the same time.
It is possible to have two separate memory systems for a Harvard architecture. As long as data and instructions can be fed in at the same time, it doesn't matter whether they come from a cache or from memory. But there are problems with this. Compilers generally embed data (literal pools) within the code, and it is often also necessary to be able to write to the instruction memory space, for example in the case of self modifying code, or, if an ARM debugger is used, to set software breakpoints in memory. If there are two completely separate, isolated memory systems, this is not possible. There must be some kind of bridge between the memory systems to allow this.
Using a simple, unified memory system together with a Harvard
Modified Harvard Architecture
One set of buses is used to access both program & data memories
The DMD bus is used to transfer data from program memory to data memory and vice versa
PM and DM addresses are generated by separate address generators
Used in several programmable DSPs, such as DSP processors from Texas Instruments, Analog Devices, etc.
Multiple Access Memory
Allows more than 1 memory access in 1 clock cycle
Dual access memory allows 2 memory accesses in 1 clock cycle
When connected to the DSP processor through 2 independent address buses & 2 independent data buses, it gives 4 memory accesses in 1 clock cycle
Harvard architecture allows multiple access memories to be interfaced to DSP processors
Multi Ported Memory
Interfaces to multiple address and data buses
Two memory accesses in a single clock cycle
Program and data can be stored in a single memory chip and accessed simultaneously
The extra ports increase the pin count and chip area, which makes the memory more expensive and larger in size
VLIW Architecture and Multiple ALUs
Very Long Instruction Word (VLIW) architecture
Consists of a multiported register file, which is used for fetching the operands and storing the results
Read and write crossbars give the functional units parallel random access to the multiported register file
The PCU supplies the operations that are executed independently and in parallel
Normally 8 functional units are preferred. This number is limited by the hardware cost of the multiported register file and crossbar switch
Pipelining
The instruction cycle is split into the following stages
1. Fetch: the instruction is fetched from program memory.
2. Decode: the instruction is decoded.
3. Read: the operand required for the instruction is fetched from data memory.
4. Execute: the operation is executed and the results are stored in the appropriate place.
[Table: stage occupancy for T = 1 to 8 across the FETCH, DECODE, READ and EXECUTE stages, shown without pipeline and with pipeline]
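The effect of overlapping the four stages can be reproduced with a short timing model. This is a sketch under simplifying assumptions: every stage takes exactly one cycle, there are no stalls or branches, and the names `schedule` and `total_cycles` are illustrative only.

```python
STAGES = ("FETCH", "DECODE", "READ", "EXECUTE")


def schedule(num_instructions, pipelined):
    """Return {cycle T: {stage: instruction label}} for a 4-stage machine."""
    table = {}
    for k in range(num_instructions):
        # Pipelined: instruction k starts one cycle after instruction k-1.
        # Not pipelined: instruction k waits for all 4 stages of k-1.
        start = k if pipelined else k * len(STAGES)
        for s, stage in enumerate(STAGES):
            cycle = start + s + 1
            table.setdefault(cycle, {})[stage] = f"I{k + 1}"
    return table


def total_cycles(num_instructions, pipelined):
    """Number of cycles until the last instruction leaves EXECUTE."""
    return max(schedule(num_instructions, pipelined))


# In 8 cycles: 2 instructions complete without pipelining, 5 with it.
without = total_cycles(2, pipelined=False)
with_pipe = total_cycles(5, pipelined=True)
```

Once the pipeline is full (from T = 4 onward in this model), one instruction completes every cycle.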
Special Addressing Modes
Short Immediate Addressing: the operand is specified using a short constant. This short constant forms part of a single word instruction such as ADD, SUBTRACT, AND, OR, XOR, etc.
Short Direct Addressing: the lower order bits of the operand address are specified in the single word instruction. In TMS320CXX DSP processors, the lower 7 bits of the address are specified as part of the instruction. The higher 9 bits of the address are stored in the data page pointer. Each such page consists of 128 words.
Memory Mapped Addressing: CPU and I/O registers are accessed as memory locations. These registers are mapped into the starting page or final page of the memory space.
Indirect Addressing: the addresses of operands are stored in the indirect address registers (auxiliary registers).
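The short direct addressing scheme described above (9 bit page number plus 7 bit offset) amounts to a simple bit concatenation. The sketch below assumes a 16 bit data address; the function name `direct_address` is hypothetical.

```python
def direct_address(data_page_pointer, offset7):
    """Form a 16 bit data address from a 9 bit page number and a 7 bit offset."""
    if not 0 <= data_page_pointer < 512:
        raise ValueError("data page pointer must fit in 9 bits")
    if not 0 <= offset7 < 128:
        raise ValueError("offset must fit in 7 bits")
    # Upper 9 bits select the page, lower 7 bits select the word in the page.
    return (data_page_pointer << 7) | offset7


# Page 4, word 0 is address 0x0200; each page spans 128 words.
addr = direct_address(4, 0)
```

With 512 pages of 128 words each, the two fields together cover the full 64K word data space.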
Bit Reversed Addressing Mode: for computation of the FFT, input data is required in bit reversed order. Serial data held in a memory buffer can be delivered to the processor in bit reversed order with the help of the bit reversed addressing mode.
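As an illustration of the access order that bit reversed addressing produces, the sketch below reverses the index bits in software. It assumes the buffer length is a power of two; the helper names are illustrative, and a DSP does this in its address generation hardware rather than in a loop.

```python
def bit_reverse(index, num_bits):
    """Reverse the lowest num_bits bits of index."""
    result = 0
    for _ in range(num_bits):
        result = (result << 1) | (index & 1)  # move the lowest bit to the result
        index >>= 1
    return result


def bit_reversed_order(n):
    """Indices 0..n-1 in bit reversed order (n assumed to be a power of two)."""
    num_bits = n.bit_length() - 1
    return [bit_reverse(i, num_bits) for i in range(n)]


# Order in which an 8 point FFT input buffer would be read.
order = bit_reversed_order(8)
```

For an 8 point FFT the buffer is read in the order 0, 4, 2, 6, 1, 5, 3, 7.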

Circular Addressing Mode: data stored in memory can be read or written in circular fashion. This increases the utility of the memory. Memory is organized as a circular buffer. The beginning and ending addresses are monitored continuously. If the address exceeds the ending address, it is reset to the beginning address of the buffer.
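The wrap-around rule for a circular buffer can be written as a modulo update. This is a generic sketch, not the exact hardware rule of any particular device: the function name `next_circular` and the example addresses are assumptions made for illustration.

```python
def next_circular(address, start, end, step=1):
    """Advance an address inside the buffer [start, end], wrapping past the end."""
    length = end - start + 1
    # Work with the offset from the start so the wrap is a plain modulo.
    return start + (address - start + step) % length


# A 4 word buffer at 0x0100..0x0103: stepping from the last word wraps around.
wrapped = next_circular(0x0103, 0x0100, 0x0103)
```

The same update serves both reads and writes, so a delay line can be kept in place without physically shifting the samples.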
On Chip peripherals:
DSP processors have many peripherals on-chip to support their operation. These peripherals reduce the external hardware needed to build a system around programmable DSPs.

On Chip Timers:
A timer generates a single pulse or a periodic train of pulses. Its period can be programmed. It can be used for
Generation of periodic interrupts to programmable DSPs
Generation of sampling clocks for ADCs
Timing signals
Serial Port:
It has input and output buffers, as well as serial to parallel and parallel to serial converters.
The serial port can operate in asynchronous or synchronous mode.
It allows the following operations.
(1) Communication between programmable DSPs (PDSPs) and external peripherals
(2) Parallel writes from the PDSP followed by serial transmission
(3) Serial reception from external peripherals and parallel delivery to the PDSP
(4) Generation of interrupts when the serial port output buffer is empty or the input buffer is full
TDM Serial Port:
[Figure: one TDM frame with 8 slots (CH-1 to CH-8); the TMS320C5x serial port pins (TDX, TDR, TCLKX, TCLKR, TFSX, TFSR) and the shared TDM lines (TDAT, TCLK, TFRM, TADD) connecting several devices (DEV0, DEV1, DEV2)]
P-DSP peripherals communicate using TDM
TDAT: Data transmission and reception on the TDM channel by the authorized device
TCLK: Bit clock used by the transmitting and receiving devices
TFRM: Frame sync signal (indicates the beginning of a TDM frame)
TADD: Address of the serial device authorized to output data in a particular time slot
Parallel Port:
Allows faster data transmission compared to the serial port
Includes data lines as well as additional lines for strobing and handshaking
Sometimes the data bus itself is used as the parallel port and is then addressed using I/O instructions
It is assigned a fixed address space
Bit I/O Port:
Has single bit lines
Lines can be operated individually
Does not have handshaking signals
Used for control purposes and data transfer
Used for conditional branching or calls

Comm Port:
These are parallel ports.
Each port is 8 bits wide and is used for communication between P-DSPs operating in a multiprocessor system.
32 bit wide data can be split into four 8 bit words. The data is then exchanged among P-DSPs over 4 different comm ports
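Splitting a 32 bit word into four 8 bit words, and rebuilding it at the receiver, can be sketched as follows. The byte order (most significant byte first) and the function names are assumptions made for this example; the slides do not specify them.

```python
def split_word(word32):
    """Split a 32 bit word into four 8 bit words, most significant first."""
    return [(word32 >> shift) & 0xFF for shift in (24, 16, 8, 0)]


def join_bytes(parts):
    """Rebuild the 32 bit word from four 8 bit words in the same order."""
    word = 0
    for part in parts:
        word = (word << 8) | (part & 0xFF)
    return word


# Each of the four bytes could travel over its own 8 bit comm port.
parts = split_word(0x12345678)
restored = join_bytes(parts)
```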
Host Port:
It is a parallel port, 8 or 16 bits wide.
PDSPs communicate with host processors such as microprocessors, PCs, etc. through the host port
The host can interrupt the P-DSP and load data on reset through the host port
Used for data communication with the host processor

A/D and D/A converter:
Useful in P-DSPs for voice applications such as mobile phones and answering machines
TMS320C5x DSP PROCESSOR
Manufactured by Texas Instruments
One of the most commonly used DSP processors
Has an advanced Harvard architecture
Can execute up to 50 million instructions per second (MIPS)
Features of TMS320C5x Processors
Powerful 16 bit CPU
20, 25, 35 and 50 ns single cycle instruction execution time for 5V operation and 25, 40 and 50 ns for 3V operation
16 X 16 bit multiply / add operations can be performed in a single cycle
224K X 16 bit max addressable memory space divided into 64K program, 64K data, 64K I/O and 32K global memories
Up to 32K X 16 bit single access on-chip program ROM
Up to 9K X 16 bit single access on-chip program / data RAM (SARAM)
1K X 16 bit dual access on-chip program / data RAM (DARAM)
Full duplex synchronous serial port for coder / decoder (codec) interface
TDM serial port
Hardware / software wait state generation capability
On-chip timer for control operations
Repeat instructions for efficient use of program space
Buffered serial port
Host interface port
Multiple PLL clocking options, i.e., X1, X2, X3, X4, X5 and X9
Block move facility for data / program management
On-chip scan based emulation logic
Boundary scan
Manufactured in high performance static CMOS technology
Low power dissipation and power down modes
IEEE standard test access port (JTAG)
Processors are available in 5 packaging options
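The 50 MIPS figure follows from the fastest cycle time listed above. A small sketch of the arithmetic, assuming every instruction completes in a single cycle (the function name is illustrative):

```python
def mips_from_cycle_time(cycle_time_ns):
    """Peak instruction rate in MIPS when each instruction takes one cycle."""
    # 1 second = 1e9 ns, and 1 MIPS = 1e6 instructions per second,
    # so MIPS = 1e9 / cycle_time_ns / 1e6 = 1000 / cycle_time_ns.
    return 1000.0 / cycle_time_ns


# Peak rates for the 5V cycle times of 20, 25, 35 and 50 ns.
rates = [mips_from_cycle_time(t) for t in (20, 25, 35, 50)]
```

A 20 ns cycle gives 50 MIPS, and a 50 ns cycle gives 20 MIPS.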
Symbols used in Functional Block Diagram
Symbol | Description
ABU | Auto Buffering Unit
ACCB | Accumulator Buffer
ACCH | Accumulator High
ACCL | Accumulator Low
ALU | Arithmetic and Logic Unit
ARAU | Auxiliary Register Arithmetic Unit
ARB | Auxiliary Register Pointer Buffer
ARCR | Auxiliary Register Compare Register
ARP | Auxiliary Register Pointer
BMAR | Block Move Address Register
BRCR | Block Repeat Counter Register
BSP | Buffered Serial Port
C | Carry Bit
CBER1 | Circular Buffer 1 End Address
CBER2 | Circular Buffer 2 End Address
CBSR1 | Circular Buffer 1 Start Address
CBSR2 | Circular Buffer 2 Start Address
DARAM | Dual Access RAM
32 bit ALU / Accumulator:
Performs arithmetic & logic functions
Operations are executed in a single cycle
Performs Boolean operations also
Takes its operands from the accumulator, shifter & multiplier
Scaling Shifter:
16 bit i/p connected to the data bus & 32 bit o/p connected to the ALU
Produces a left shift of 0 to 16 bits on the i/p data
Other shifters perform numerical scaling, bit extraction, extended precision arithmetic and overflow prevention
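The scaling shifter's 16 bit input and 32 bit output can be modelled in a few lines. This is a sketch with one added assumption: the optional sign extension of the input is not stated in the slides and is included here only to show how a negative 2s complement value would be widened. The function name is hypothetical.

```python
def scaling_shift(word16, shift, sign_extend=True):
    """Left shift a 16 bit input by 0 to 16 bits into a 32 bit result."""
    if not 0 <= shift <= 16:
        raise ValueError("shift must be between 0 and 16")
    value = word16 & 0xFFFF
    if sign_extend and value & 0x8000:
        value -= 0x10000  # assumption: treat the input as 2s complement
    # Keep only the 32 bits that reach the ALU input.
    return (value << shift) & 0xFFFFFFFF


# Shifting by 16 places the input in the upper half of the 32 bit word.
upper = scaling_shift(0x0001, 16)
```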

Parallel Logic Unit:
It is a second logic unit. It executes logic operations on data without affecting the contents of the accumulator
Provides bit manipulation which can be used to set, clear, test or toggle bits in data memory, control or status registers
16X16 bit parallel multiplier:
Capable of multiplying two 16 bit signed or unsigned numbers to give a 32 bit product in a single machine cycle
Numbers and results are in 2s complement form
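A signed 16 x 16 multiply with a 32 bit 2s complement result can be checked with the sketch below. The helper names `to_signed` and `multiply_16x16` are illustrative; Python integers are unbounded, so the masks stand in for the fixed register widths.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a 2s complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def multiply_16x16(a_word, b_word):
    """Signed 16 x 16 multiply; returns the product as a 32 bit pattern."""
    product = to_signed(a_word, 16) * to_signed(b_word, 16)
    return product & 0xFFFFFFFF


# 0xFFFF is -1 in 16 bit 2s complement, so -1 * 2 = -2 = 0xFFFFFFFE.
p = multiply_16x16(0xFFFF, 0x0002)
```

The largest magnitude case, -32768 x -32768, gives 0x40000000, which still fits in 32 bits.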

Auxiliary registers and ARAU:
ARAU: Auxiliary Register Arithmetic Unit
Register file of 8 ARs for temporary data storage (AR0 to AR7), connected to the ARAU
AR contents can be stored in data memory or used as i/ps to the central ALU (CALU)
The ARAU helps to speed up the operation of the CALU
Memory:
3 separate spaces for program, data and I/O memory
Each space holds up to 64K 16 bit words
The first 96 (00h to 5Fh) data memory locations are allocated to memory mapped registers
2 types of on-chip RAM: SARAM and DARAM

Interrupts:
4 general purpose interrupts (INT4 to INT1), one reset (RS) and one non-maskable interrupt (NMI)
Internal interrupts are generated by the serial port (RINT, XINT), by the timer (TINT) and through s/w (TRAP, INTR and NMI instructions)
RS has the highest priority, followed by NMI, with INT4 having the lowest priority
Except RS and NMI, any interrupt can be masked
[Figure: TMS320C5x memory map]
Program memory:
0000h to 003Fh: Interrupts and reserved (external)
0040h to 07FFh: External
0800h to 2BFFh: On-chip SARAM (RAM = 1), External (RAM = 0)
2C00h to FDFFh: External
FE00h to FFFFh: On-chip DARAM B0 (CNF = 1), External (CNF = 0)
MP/MC = 1: Microprocessor mode; MP/MC = 0: Microcomputer mode
Data memory (in address order, starting at 0000h):
Memory mapped registers (0000h to 005Fh)
On-chip DARAM B2
Reserved
On-chip DARAM B0 (CNF = 0), Reserved (CNF = 1)
On-chip DARAM B1
Reserved
On-chip SARAM (OVLY = 1), External (OVLY = 0)
External
