
TMS320C54x DSP Design Workshop

Student Guide

DSP54x-NOTES-1.2
May 1997 Technical Training
Copyright © 1997 Texas Instruments Incorporated.
All rights reserved.

Notice
No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any
form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the
prior written permission of Texas Instruments.

Texas Instruments reserves the right to update this Guide to reflect the most current product
information for the spectrum of users. If there are any differences between this Guide and a
technical reference manual, references should always be made to the most current reference
manual. Information contained in this publication is believed to be accurate and reliable.
However, responsibility is assumed neither for its use nor any infringement of patents or rights of
others that may result from its use. No license is granted by implication or otherwise under any
patent or patent right of Texas Instruments or others.

Revision History



Welcome to the

TMS320C54x DSP
Design Workshop

Texas Instruments
Technical Training

Introductions

• Name
• Company
• Project Responsibilities
• DSP Experience
• 320 Experience
• Hardware/Software, Asm/C
• Interests


TMS320C54x Workshop Agenda
I:     1. Introduction and Overview
       2. Assembly Language Environment
       3. Addressing Modes
II:    4. Basic Programming Techniques
       5. Advanced Programming Control
       6. Pipeline Issues
III:   7. Numerical Issues
       8. Fundamental DSP Applications
       9. Advanced DSP Applications
      10. Interrupts
IV:   11. Hardware Interfacing
      12. Other Interfacing
      13. System Design Issues
      14. Using the C Compiler


Introduction and Overview

Learning Objectives

• Describe the requirements of a DSP system.
• Identify the CPU components of the ’C54x.
• List the ’C54x internal buses and their usage.
• List the ’C54x pipeline stages and their actions.
• Describe the memory map of the ’C54x.
• List memory and peripherals of the ’C54x devices.
• Become familiar with the ’C54x simulator.

Module 1

DSP: Sum-of-Products

$$y = \sum_{n=1}^{100} x_n a_n$$

[Figure: block diagram with inputs x and a, a multiplier (MPY), and an adder (ADD).]
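The sum-of-products above can be sketched in C as a loop that performs one multiply and one add per term. This is only an illustration: the function name `sum_of_products`, the 16-bit operand type, and the 64-bit C accumulator are assumptions made for the sketch, not a description of the ’C54x hardware.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum of products: y = x[0]*a[0] + ... + x[n-1]*a[n-1].
 * Hypothetical helper for illustration; not part of any TI library. */
int64_t sum_of_products(const int16_t *x, const int16_t *a, size_t n)
{
    int64_t acc = 0;                     /* stands in for an accumulator */
    for (size_t i = 0; i < n; i++) {
        acc += (int32_t)x[i] * a[i];     /* one multiply-accumulate per term */
    }
    return acc;
}
```

Each loop iteration corresponds to one MPY followed by one ADD in the diagram; a hardware MAC unit is meant to do both in a single step.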

MAC Unit Details


[Figure: MAC unit block diagram. Labels: multiplier (MPY) with signed/unsigned (s/u) inputs, FRCT, adder (ADD) with A, B, or 0 input, M bus, acc A, acc B.]

    MAC *AR2+, *AR3+, A

Accumulators + ALU

General-Purpose Math, ex: t = s + e - r

    LD   s, A
    ADD  e, A
    SUB  r, A
    STL  A, t

[Figure: ALU block diagram. Labels: A bus, B bus, acc A, acc B, ALU, U bus, MUX, and inputs A, B, C, T, D, S, M.]

Barrel Shifter

[Figure: barrel shifter block diagram. Labels: inputs A, B, C, D; SHIFTER (-16 to +31); S BUS; ALU; W BUS.]

    LD   X, 16, A
    STH  B, y
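The two instructions on this slide can be modelled loosely in C: a load that shifts the operand left on its way into an accumulator, and a store that writes back the upper 16 bits of the lower 32. The sketch below assumes sign extension is on and only handles left shifts of 0 to 16; the names `load_shifted` and `store_high` and the 64-bit accumulator type are hypothetical.

```c
#include <stdint.h>

/* Rough model of "LD X, shift, A" for 0 <= shift <= 16:
 * sign-extend x, then scale it by 2^shift. */
int64_t load_shifted(int16_t x, unsigned shift)
{
    return (int64_t)x * ((int64_t)1 << shift);
}

/* Rough model of "STH B, y": return bits 31..16 of the accumulator
 * as a raw 16-bit pattern. */
uint16_t store_high(int64_t acc)
{
    return (uint16_t)(((uint64_t)acc >> 16) & 0xFFFFu);
}
```

With a shift of 16, `store_high(load_shifted(x, 16))` returns the original 16-bit pattern of `x`, which is why loading into the high half of an accumulator and storing the high half pair up naturally.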

Temporary Register
ex: A = xa

    LD   x, T
    MPY  a, A

[Figure: temporary register block diagram. Labels: T register, T bus, inputs D and X, EXP, accumulators A and B, MAC, ALU.]

’C54x Buses
[Figure: ’C54x bus structure. Internal memory and external memory connect through muxes to the P, C, D, and E buses, which serve T, MAC, A, B, ALU, and SHIFT.]

    MAC *AR2+, *AR3+, A

Pipeline - Concept

F: Fetch     Get instruction from memory.
D: Decode    Schedule activity.
R: Read      Get operand from memory.
X: Execute   Perform operation.

Memory Interaction

• Broken into two phases:
  1. Calculate address
  2. Collect data
• Allows more time for memory interface.

’C54x Pipeline - Enhanced

P  Prefetch  Calculate address of instruction.
F  Fetch     Collect instruction.
D  Decode    Interpret instruction.
A  Access    Calculate address of operand.
R  Read      Collect operand.
X  Execute   Perform operation.

Memory Write

• When storing results back to memory
• Two phases
  - Address set up
  - Data written
• Overlaid onto R + X phases
• Best balance of:
  - Processor loading
  - Speed
  - Cost

’C54x Pipeline Events

P  Drive address of instruction            PA
F  Collect instruction                     PD
D  Interpret instruction, plan job         ctlr
A  Set up pointers, calc data address      DA
R  Collect operand                         DD
   Calculate write address                 EA
X  Execute operation                       *,+
   Send result                             ED

’C54x Pipeline Hardware

P  PC, PA
F  Program Mem, PD
D  Controller
A  ARs, DA, ARAUs
R  Data Mem, DD ; AR, ARAU, EA
X  CALU (MAC, ALU) ; ED, Data Mem

’C54x Components and Bus Usage

[Figure: ’C54x components and bus usage. Labels: CNTL, PC, ARs, internal memory, external memory, muxes, T, MAC, A, B, ALU, SHIFT.]

Pipeline Performance

TIME →

P1 F1 D1 A1 R1 X1
   P2 F2 D2 A2 R2 X2
      P3 F3 D3 A3 R3 X3
         P4 F4 D4 A4 R4 X4
            P5 F5 D5 A5 R5 X5
               P6 F6 D6 A6 R6 X6

FULLY LOADED ’PIPE’
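The diagram implies a simple cycle count: once the pipe is full, one instruction completes per cycle. A minimal sketch of that arithmetic follows, assuming one cycle per stage and no stalls; the function name `pipeline_cycles` is hypothetical.

```c
#include <stdint.h>

/* Cycles needed for n instructions to pass through a pipeline with
 * `stages` stages, assuming one stage per cycle and no stalls:
 * the first instruction takes `stages` cycles, each later one adds 1. */
uint32_t pipeline_cycles(uint32_t n, uint32_t stages)
{
    if (n == 0) {
        return 0;
    }
    return stages + (n - 1);
}
```

For the six instructions and six stages drawn above, this gives 6 + 5 = 11 cycles, matching the column in which X6 appears.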

Pipeline Conflicts - External Memory

[Figure: 54x with P and D in external memory.]

P1 F1 D1 A1 R1 X1
   P2 F2 D2 A2 R2 X2
      P3 F3 D3 A3 R3 X3
         P4 -- -- -- F4 D4 A4 R4 X4
            -- -- -- P5 F5 D5 A5 R5 X5
               -- -- -- P6 F6 D6 A6 R6

Pipeline Flow: Internal and External Memories

[Figure: two 54x configurations, each with one memory space (P or D) internal and the other external.]

P1 F1 D1 A1 R1 X1
   P2 F2 D2 A2 R2 X2
      P3 F3 D3 A3 R3 X3
         P4 F4 D4 A4 R4 X4
            P5 F5 D5 A5 R5 X5
               P6 F6 D6 A6 R6 X6

NO CONFLICT

Pipeline: Internal Memory Only


[Figure: ’C54x internal memory. ROM (4K blocks) and DARAM (1K blocks) serve the MAC and ALU.]

Two accesses per block per cycle

’C541 Memory Maps


PROGRAM DATA
0000 0000
RAM ? OVLY MMR / RAM
1400 1400

EXT

EXT
9000

Internal
ROM ?
E000
FF80 DROM ROM ?
VECTORS FFFF
FFFF

1 - 23

’C541 Program Memory Options


All External        28K ROM**           ’RAM’ Option
MP/MC = 1           MP/MC = 0           OVLY = 1
0000 0000 0000
0080 RAM
1400

EXT EXT

EXT
9000
2K ROM
9800 2K ROM
A000 4K ROM
B000
4K ROM EXT
C000
4K ROM or
D000 ROM
4K ROM
E000
4K ROM
FF80 F000
4K ROM w VECs *
FFFF VECTORS* FFFF FFFF VECTORS*
* FF80 - FFFF are the default locations for vectors.
** Internal ROM FF00 - FF7F reserved for TI test.
1 - 24

DSP54x - Introduction and Overview 1 - 13


Module 1

’C541 Data Memory

0000 0000 0000


MMR / RAM MMR MMR
1400 + 0060
RAM a 0080 SPRAM
0400
RAM b
0800
EXT RAM c
RAM a
0C00
RAM d
1000
E000
EXT or ROM RAM e
FFFF 1400 0400

1 - 25

’C542 Memory Maps


P D
0000 0000
RAM? OVLY RAM
2800 2800

EXT EXT

F800
ROM
FF80
VECTORS
FFFF FFFF

1 - 26

1 - 14 DSP54x - Introduction and Overview


Module 1

’C54x Memory Mix

C54x RAM ROM DROM


1 5 28 8
2 10 2
3 10 2
4 4 24 8
5 6 48 16
6 6 48 16
9 32 16

1 - 27

’C54x Peripheral Mix

C54x SER TDM BSP HPI


1 2
2 1 1 1
3 1 1
4 2
5 1 1 1
6 1 1
9 1 2 1

1 - 28

DSP54x - Introduction and Overview 1 - 15


Module 1

Lab 1: Debugger Walkthrough


Window Management
Select Close Open
Move Size Edit
Running Code
Reset Step
Run Breakpoint
Benchmark
Display and Automation
Saving configurations
Using log files

1 - 29

Debugger Screen
command
menu bar

code CPU
window registers

command memory
window window

Lab 1: The Debugger Interface


The Texas Instruments DSP family has moved to a common user interface called the Source
Debugger. Almost all TMS320 tools, including the TMS320C54x simulator, use this interface.

This module will guide you through the basic commands of the source debugger. Upon
completion of the walkthrough, you will be able to:
• Set up and manipulate windows to display variables and data structures
• Single-step C statements and/or assembly instructions
• Set breakpoints and benchmark code
• Issue debugger commands via command menus, keyboard entry, or a mouse

Note: This walkthrough is intended to demonstrate the use of the debugger interface. It is not
meant to be an opportunity to get to know the ’C54x assembly language or C. Please do
not attempt to dwell upon them, as this adds considerable time (and effort) to the process.
The assembly language will be thoroughly presented in succeeding modules.

Simulator Files and Directory


Verify that you are in the proper directory by typing:

cd \dsp54x\labs ↵

The demo program is a C file which simply loads an incrementing value to a variety of data
types. Although of little interest in terms of DSP, it is a useful platform for exercising the
debugger interface and commands.

Sample Program - Source Debugger Walkthrough


(1 of 2)
/*-------------------------------------------------------------------------*/
/* Sample program for Source Debugger Walkthrough */
/*-------------------------------------------------------------------------*/
/* declare globals: int, float array, mixed type structure */
int i;
float a[10];
struct { int i;
float j;
int k[4];
int *p;
} example ;
void init();
/*count from 0 to 1000 forever, call init each count */

main() {
int count;
for (;;)
for (count=0; count<1000; count++)
init(count);
}

Sample Program - Source Debugger Walkthrough


(2 of 2)
/* load all globals with the current count value */
void init(x)
int x;
{
for (i=0; i<10; i++)
a[i] = x;
example.i = x;
example.j = x;
for (i=0; i<4; i++)
example.k[i] = x;
example.p = (int *)(0x0200 + x);
}

Starting the Simulator


To start the debugger and load your linked output file, type:

SIM5XX lab1 ↵

The debugger assumes that the file to be loaded has a default extension of .out. We will learn
how to create output files in Module 2.

You should now see the debugger screen.

Note: If, in the process of this lab, you reach a point where the system no longer responds, or is
otherwise corrupted, you may reload the file by typing LOAD lab1 at the command
prompt. In rare cases, you may have to exit the simulator entirely by typing QUIT and
starting over.

Selecting the Active Window


The active window is shown with a highlighted border. To change the active window, point the
mouse at the desired window and press the left button. Repeat this a few times to cycle through
the active windows.

Make the DISASSEMBLY window active. You can scroll through the code displayed in the
DISASSEMBLY window in two ways: with the keyboard up-arrow, down-arrow, PgUp, and PgDn
keys, or by pointing the mouse at the up and down arrows on the window border and pressing
the left button.

Note: Be careful. If you click while over an element of a window, you may set a breakpoint (if
you are in a FILE or DISASSEMBLY window), or select a register or memory location
for modification. To remove the breakpoint, simply point and click at the highlighted
instruction.

Try scrolling through the DISASSEMBLY window several ways.

When you want to return to a particular label or address, use the command addr nnnn, where
nnnn is the label or address to return to. For example, type:

addr c_int00 ↵

Move to an absolute address by typing:

addr 0x0005 ↵

Move to a function by typing:

addr main ↵

Sizing and Moving Windows


You can size and move any window. Make the CPU window active with the mouse. To change
the size of the window, grab the left or right corner of the window by holding the left
mouse button down, drag the corner to a new position, and release the mouse button.

To move the window, grab the top of the window by holding the left mouse button down and
drag the window to a new position.

To restore the screen to its original state, use the “screen configuration” command with no
arguments:

sconfig ↵

To load a particular screen configuration you may specify the desired file with the SCONFIG
command:

sconfig tc.clr ↵

To save a configuration, use the ssave <file name> command. There is no default
extension, although .CLR (for color) is the extension generally used.

Typing sconfig once again will return you to the original configuration. The sconfig
command uses the default filename init.clr. You may use either of these configurations, or
any of your own creation, whenever using the debugger.

Running the Program


The sample program begins execution at the C reset function labeled c_int00. To reposition
the disassembly window, type:

addr c_int00 ↵

The assembly code shown at c_int00 can be single-stepped by pressing <F8> on the keyboard
or by pointing and clicking the left mouse button.

Try running a few instructions by pressing <F8> and watching the PC value (in the CPU
window) change as the corresponding instruction is executed (highlighted). Modified register and
memory contents are also highlighted.

To skip past this reset function, type:

go main ↵

Notice that the display changes to display the C program in the FILE window. Also notice that
the CPU registers are no longer displayed. The CALLS window is opened to show which C
functions have been called.

The ability to view C source code in its native format is why our debugger is termed a “source”
debugger.

Watch Window
Suppose you want to watch the value of a C variable while single-stepping the program. Type:

wa count ↵

This creates a watch window with the value of the variable count displayed. The value
displayed for count is not meaningful at this point since it has not been initialized yet. You may
discover that opening a watch window on a variable not found in the current function will
generate a warning.

Single-step the program (execute one C statement at a time) by pressing

<F8>

<F8>

Notice that the variable count was assigned the value zero. You should now be at the init()
function call. Press:

<F8>

and you will go to the function. Notice the change in the CALLS window.

Add another variable to the watch window by typing:

wa i ↵

Single-step some more C statements by pressing <F8>.

To watch an array element, type:

wa a[0] ↵

Notice that the display shows a[0] as a floating-point value automatically. The debugger
displays values according to their defined type.

When a watch or display is no longer needed on screen, it may be closed by first selecting it
(using <F6> or a mouse click), and then using the close window command: the <F4> key. <F4>
does not apply to the main simulator windows (CPU, MEM, Disassembly, etc.).

Displaying arrays and structures is a powerful debugging feature. Type:

wa a ↵

You receive the error message Invalid watch expression because you are allowed to
watch only single, scalar values. If you have forgotten the type of the variable a, type:

whatis a ↵

To display the entire array of floating-point values, use the display command:

disp a ↵

You might want to move the DISP window over to the right of the screen.

Display the structure called example by typing:

disp example ↵

This structure has four members called i, j, k, and p. Note that they are displayed in accordance
with their type. Move this window over to the right just below the DISP: a window.

To display the contents of the array example.k, move the cursor down to highlight the line
showing k: [...] and select this by pressing:

<F9> or the left mouse button

A new window is opened which shows the elements of the array. If this had been another
structure (instead of an array), it would be shown as k: {...}. Brackets indicate arrays and
braces indicate structures.

Since this new window showing the array k is opened directly on top of the previous window,
you should move it down to make the example window visible.

Single STEP and NEXT instructions


Now that the display windows are opened, let’s restart from the beginning and then single-step
some instructions. Type:

restart ↵

go main ↵

Press:

<F8>

Continue executing instructions by repeatedly pressing <F8>. Observe how the values in the
watch window and display windows change. Continue stepping through the init function until
it returns to the main function. If you do not wish to see the remainder of the function in step
mode, you can complete the function and return immediately by entering:

ret ↵

Note: If you were not in a sub-function at this point, the simulator will never reach a return and,
therefore, will never halt. To stop the simulator in such an event, simply press <Esc>.

Suppose you want to single-step without seeing the details of each individual function call. You
can step across function calls using:

next ↵

Alternatively, you can press <F10>. Notice that the next C statement is executed without
showing function calls. (Called functions are not skipped; they are just not executed in single-step
mode.)

Both the step command <F8> and next command <F10> can be executed from the command
line with an argument specifying the number of instructions to execute. For example, type:

step 10000 ↵

To stop execution, press:

<Esc>

Note: The step command accepts a Boolean expression in place of a numeric count;
e.g., step (AR0 != 0)

If you are executing within the init() function and want to return, type:

ret ↵

Now try the next command with a count value:

next 10000 ↵

You can sit back and observe the single-step operation.

To stop execution, press:

<Esc>

and you will see a User halt message displayed in the command window.

Debugging Assembly Language and C Programs


This part of the tutorial assumes you have already completed the first part of the walkthrough and
have loaded the lab1.out program into the debugger.

To start execution over again, type:

restart ↵

go main ↵

MIXED Mode
To debug in mixed mode, which allows you to observe assembly instructions and C statements
simultaneously, type:

mix ↵

You should see both the C source code and the corresponding assembly code. The
DISASSEMBLY window shows highlighted memory locations which are associated with the
current C statement.

You may have to move and size your display windows and watch windows to see the CPU and
REGISTER windows. A suggestion is to remove (reset) the watch window using the command:

wr ↵

Try single-stepping by repeatedly pressing:

<F8>

Notice that assembly instructions are stepped. If you are currently executing within the init()
function and want to return from the function, type:

ret ↵

Try the next command by repeatedly pressing:

<F10>

Continue this while observing that the assembly instruction CALL init is skipped over.

To single-step C statements while you are in mixed mode, type either:

cstep ↵

or

cnext ↵

Like their counterparts step and next, cstep and cnext accept a count that specifies how
many statements to execute. For example:

cstep 10 ↵

will execute 10 C statements.

ASM Mode
If you are interested only in debugging an assembly language program, you can switch to
assembly mode by typing:

asm ↵

Notice that the windows that display C data structures disappear when you are in assembly mode.
This is a convenient way to clear up the screen if you want to observe CPU register values or
display memory contents. Try single-stepping by repeatedly pressing:

<F8>

and observe the changing register values in the CPU window. Changed values are highlighted so
you will notice when a change occurs.

You can go back to mixed mode by simply typing:

mix ↵

Notice that your DISP windows reappear.

Review of Modes
In summary, there are three modes of operation:
• Mixed mode (mix command) shows assembly and C (if C source exists).
• Assembly mode (asm command) shows assembly code only.
• C mode (c command) automatically switches from C to assembly displays,
depending on what type of source code is executing.

Breakpoints and Benchmarking


Restart your program and execute to the first call to the init() function. Type:

restart ↵

mix ↵

go init ↵

To set a breakpoint, you can use the command ba xxxx, where xxxx is an absolute
memory location or a valid label. This method requires that you know the address (or label). For
example, type:

ba init ↵

This sets a breakpoint at the entry point to the function. Notice that the instruction is highlighted
when a breakpoint is set.

To list breakpoints that are set, type:

bl ↵

To delete all breakpoints, use the breakpoint reset command. Type:

br ↵

Verify the process by listing again. Type:

bl ↵

As an alternative to the ba command, you can add a breakpoint by pointing the mouse at the
line where the breakpoint is desired and pressing the left mouse button. The line that the
breakpoint is set on should now be highlighted. Pressing the left mouse button again will
remove the breakpoint.

To execute your program up to the breakpoint, type:

run ↵

The program should stop at the breakpoint. If the breakpoint is not reached, press <Esc> and
verify that the breakpoint has been set (use the bl command or look at the
FILE/DISASSEMBLY window to see a highlighted instruction).

To use a previously entered debugger command (for lazy typists), press:

<Tab> ↵

Notice that pressing <Tab> backs up to the previous command entered. Pressing ↵ causes that
command to be executed again. In fact, you can cycle back through all previous commands you
have entered by repeatedly pressing <Tab>. Pressing <Shift><Tab> takes you forward
through this command buffer.

Let’s assume you still have a breakpoint set at the for statement. To “benchmark” the execution
time required to execute from one breakpoint to another, you need to set a second breakpoint. Go
ahead and select another instruction for a breakpoint using either <F9>, a mouse click, or the ba
command. To benchmark, type:

run ↵

runb ↵

? clk ↵

The run command executes to the first breakpoint. The runb command is the “run-with-
benchmarking” command. The ? command tells the debugger to evaluate the following C
expression and display the result. The clk debugger variable is valid only after a runb
command and is set to the number of clock cycles between the run and runb commands.

Evaluating Expressions
To evaluate a C expression, you can use the ? command. This is one way to modify register
values, since C expressions may have side effects such as assignment. Type:

? pc ↵

You should see the pc value displayed. To modify the current pc, type:

? pc = main ↵

To modify a register, type:

? ar0 = 0 ↵

To evaluate an expression without displaying the result in the COMMAND window, use the
eval command instead of the ? command. Type:

eval pc = 0 ↵

eval pc = main ↵

CPU, MEMORY, and WATCH window registers can be modified by pointing the mouse to the
desired register and pressing the left mouse button. When the register is selected, it will be
highlighted and ready for input from the keyboard.

Point to the CPU window AR0 and press the left mouse button.
Enter a new value of 5 and press ↵ when complete.

Displaying Files
You can display any file in the FILE window. Type:

file siminit.cmd ↵

You should see the debugger’s initialization command file displayed. At this point, you can go
back to debugging and the previous C source file will automatically be displayed when you start
executing instructions.

Within the debugger COMMAND window, you can perform DOS-like commands to examine
and change the current directory. Use the command dir nnnn, where nnnn is the directory
name, to display a directory listing. Type:

dir ↵

to display the current directory.

The command cd nnnn, where nnnn is the new directory name, changes the current directory.

To clear the COMMAND window, type:

cls ↵

Some other miscellaneous commands are:


• quit which exits the debugger and returns you to DOS
• restart which sets the PC to the code entry point.

Drop Down Menus


To access the drop-down menus from the menu bar at the top of the screen, press <Alt><key>,
where <key> is the highlighted menu letter (L, B, W, M, C, or D). Once a menu is displayed,
you can execute a command either by typing the designated letter, or by using the arrow keys to
move the selector bar to the desired command and pressing ↵. For example, press:

<Alt>L

then repeatedly press the right arrow key to look at the drop-down menus.

The drop-down menus can also be selected by pointing and pressing the left mouse button. For
example, select the mode menu with the mouse.

Changing the Display Sizes


If you have a display capable of greater than 80 x 25 character resolution, you can get more
information on the screen using the debugger -b[bbbb] option when you invoke the debugger.
Let’s try it. Exit the debugger by typing:

quit ↵

From the DOS prompt, enter:

sim5xx lab1 -bb ↵

and you should get a display that shows more detail, but may also cause more eye strain. A larger
monitor will allow you to take full advantage of the source debugger’s high resolution modes.

The -bb switch creates a 50-line display. Another switch, -b, offers an intermediate-sized 43-
line display. Your preferred display size may be made the default by saving the screen
configuration as init.clr with the ssave command described earlier. Then the need to
explicitly use the -b switch is eliminated.

Batch Operation of Debugger


You can execute debugger commands from a batch file. This can be useful if there is a certain
sequence of commands you want to enter every time you start a debug session for a given
application. The filename should have a .log extension. To execute a .log file while in the
debugger, use the command take <filename>.log. For example, try the batch command
file:

take lab1.log ↵

Congratulations, you have completed the walkthrough. To exit the debugger, press:

<Esc>

Type:

quit ↵

Simulator Quick Reference


Window Management

  Selecting Window
    F6                      rotates to next window
    WIN <name>              selects <name> window
    Click window frame      selects window
    F4                      closes selected window

  Moving Inside Window
    Up Arrow / Down Arrow
    Page Up / Page Down
    Click on window frame arrows
    For DISASSEM window: type ADDR <value>
    For MEMORY window: type MEM <value>

  Moving Window
    Click on top of frame; drag to new location
    Type MOVE and use arrows or type coordinates

  Sizing Window
    Click on bottom right corner; drag to new shape
    Type SIZE and use arrows or type coordinates
    ZOOM                    click on top left corner
    UNZOOM                  click again on top left corner

  Screen Configuration
    SCONFIG <name>          load configuration <name>
    SSAVE <name>            save configuration <name>

  Modes
    ASM                     display ASM info, or <Alt> D,A
    C                       display C info, or <Alt> D,C
    MIX                     display both ASM and C, or <Alt> D,M

Running Code

  Reset
    Type RESET              forces PC to zero
    Type RESTART            return to "entry point"

  Stepping
    F8 or type STEP         one step
    F10 or type NEXT        condense subroutines
    Type STEP <n>           <n> steps
    Type NEXT <n>           <n> nexts

  Running
    RUN                     run until <Esc> or breakpoint
    RUNB                    run with benchmark
    GO <label>              run to <label>

Watches and Breakpoints

    Operation   Watch   Breakpoint
    ADD         WA      BA
    RESET       WR      BR
    LIST        WL      BL
    DELETE      WD #    BD #
    or hot keys or mouse clicks

Other Actions

    ? <label>               display value of <label>
    ? <label> = <n>         load <label> with <n>
    FILE <name>             load file <name> to file window
    TAB                     scroll to prior commands
    SHIFT TAB               scroll to subsequent commands
    F9                      alternate form of mouse click
    TAKE <name>             simulator ’batch’ file
    LOAD <name>             download file <name>

Entry/Exit

    SIM5xx <file>           start simulator with <file>.out
    SIM5xx -bb              high resolution mode
    QUIT                    exit simulator
    SYSTEM                  go to DOS shell

’C54x Review - CALU

• CALU supports:
  - General-purpose operations:
      MAC
      ALU
  - Special functions:
      CSSU (Viterbi)
      EXP (Norm)
      FIRS: MAC + ALU
  - 16- or 32-bit operations:
      C16 mode
      ’Double’ operations

’C54x Review - System


• Four buses allow 1 fetch, 2 reads, and 1 write each cycle.
• Built from and for cDSP:
  - Fast growing family
  - Easy to modify for custom use.
• Attributes
  - Static design
  - Low power
  - Any clock below maximum
  - Low $/MIP
  - Fast/dense instructions
  - Small size for functionality
  - LC version for 3V operation


Assembly Language Tools

Learning Objectives

• Describe steps to create executable output files
• Create an assembly file containing:
  - Code
  - Constants (initialized data)
  - Variables
• Create a linker command file which:
  - Identifies input and output files
  - Describes a system’s available memory
  - Indicates where code and data shall be located
• Develop multi-file systems

Module 2

Software Development Tools


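[Figure: tool flow. Text Editor → .asm → ASM500 → .obj → LNK500 (reads .cmd, -o names the output) → .out → Debug. ASM500 -L produces .lst; LNK500 -m produces .map; .out also feeds HEX500.]

    ASM500 -LS TEST
    LNK500 TEST.CMD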

Software Debug Tools

The .out file feeds the following debug tools:

SIM5xx    • Software only
EVM500    • Contains DSP
          • ISA card
XDS510    • ISA card
          • No DSP
          • PC <-> Target
HEX500    ROM Prog. → Target Board

Lab 2a: COFF Tools


1. Assemble LAB2A.ASM.
   Note error message - inspect .LST file.
2. Edit LAB2A.ASM.
   Replace label ’strt’ with ’start’ - update and exit file.
3. Reassemble LAB2A.ASM.
   Verify error-free assembly - reinspect .LST file.
4. Link using LAB2A.CMD.
   Verify the result in LAB2A.MAP.
5. Simulate LAB2A.OUT.
   Step through the code to verify performance.
6. Inspect batch files: A.BAT, L.BAT, S.BAT, ALS.BAT
   Consider their use to save time in later labs.
7. Add a NOP before the loop to separate the .text label from start.
   Reassemble, link and simulate. Note any change from before.

Assembly Files

• Create an assembly file containing:
  - Code
  - Constants (initialized data)
  - Variables

Assembly Conventions
label:   mnemonic   operand,operand   ;comment

(Fields are separated by tabs or spaces; the colon after the label is optional; the mnemonic is an instruction or directive.)

• Any ASCII text is O.K.
• Use .asm extension
• Instructions and directives cannot be in first column
• Comments O.K. in any column after semicolon

Assembly Files

• Mnemonics
  - Lines of 320 code
  - Generally written in upper case
  - Become components of program memory
• Directives
  - Begin with a period (.) and are lower case
  - Can create constants and variables
  - May occupy no memory space when used to control ASM and LNK process

COFF Data Types

Type                Examples
Binary              1110001b or 11111001B
Octal               226q or 572Q
Decimal             1234 or +1234 or -1234 (default type)
Hexadecimal         0A40h or 0A40H or 0xA40
Floating-point      1.623e-23 (sign and decimal point optional)
Character           'D'
Character strings   "this is a string"
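To make the integer notations in the table concrete, here is a small C sketch that converts such a constant to a number. It only mirrors the table's suffix and prefix conventions (b, q, h, 0x) and is not the assembler's actual parser; the function name `parse_constant` is hypothetical, and floating-point and character constants are deliberately left out.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Parse an integer constant written as in the table: binary (b/B suffix),
 * octal (q/Q suffix), decimal (no suffix), or hexadecimal (h/H suffix or
 * 0x prefix). Returns 1 and stores the value in *out on success,
 * 0 if the text is not a well-formed constant. */
int parse_constant(const char *s, long *out)
{
    size_t len = strlen(s);
    int base = 10;
    char digits[64];
    char *end;

    if (len == 0 || len >= sizeof digits) {
        return 0;
    }
    if (len > 2 && s[0] == '0' && (s[1] == 'x' || s[1] == 'X')) {
        base = 16;                 /* 0x prefix: strip it */
        s += 2;
        len -= 2;
    } else {
        char suffix = (char)tolower((unsigned char)s[len - 1]);
        if (suffix == 'h')      { base = 16; len--; }
        else if (suffix == 'q') { base = 8;  len--; }
        else if (suffix == 'b') { base = 2;  len--; }
    }
    if (len == 0) {
        return 0;
    }
    memcpy(digits, s, len);        /* copy the digits without the suffix */
    digits[len] = '\0';
    *out = strtol(digits, &end, base);
    return end != digits && *end == '\0';
}
```

For example, `parse_constant("0A40h", &v)` and `parse_constant("0xA40", &v)` both store the same value, which is the point of the two hexadecimal spellings in the table.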

Coding Example: z = x + y
; Code
        .text
start   LD    x,A       ; get x
        ADD   y,A       ; add y
        STL   A,z       ; store z
        B     start     ; loop

; Constants
        .data
x       .int  2         ; x=2
y       .int  7         ; y=7

; Variables
        .bss  z,1       ; z

The .bss Directive


u Only directive with assembly label
defined in the operand field
u Use separate .bss statements for each
named variable
u Remember .bss by thinking:
À Block - reserves a block of memory
À Symbol - beginning at address symbol
À Size - of the specified size
u Example: Create a 5-word array ’x’
.bss x,5

2 - 11

Basic Assembler Directives

Assembler
Directive   Example              Definition
.text       .text                Code to follow
.data       .data                Constants to follow
.bss        .bss x,10            Allocate space for variables
.int        TBL .int 53h, 5Ah    Create 16-bit integer constant(s)
.word

2 - 12

DSP54x - Assembly Language Tools 2-7


Module 2

Exercise 2b: ASM Files and Sections

; a = 0,1,2,3,4
; x = input array of length 5 a 0
; y = result array of length 1 1
2
3
4

_______

_____ _______ ________________ x

_______ _______, _______


y
_______ _______, _______

2 - 13

Lab 2b: Assembly Files

table   1, 2, 3, 4    -> x
        8, 6, 4, 2    -> a
        0             -> y

2 - 14

2-8 DSP54x - Assembly Language Tools


Module 2

Lab 2b: Procedure

1. Copy LAB2A.ASM to LAB2B.ASM. In LAB2B :


2. Define three arrays in RAM (x, a, y).
3. Define an initialized data table that contains the nine values above.
4. Write code that begins with the label start, contains four NOP
instructions, and ends with a branch (B) back to start.
5. Assemble the file and inspect the list (.LST) file.

What is the opcode for NOP? ______________


What are the addresses for the .text and .data sections?
.text ____________
.data ____________
Why? ______________________________________________

2 - 15

Linking
u Describe steps to create executable output files
u Create an assembly file containing:
À Code
À Constants (initialized data)
À Variables
u Create a linker command file which:
À Identifies input and output files
À Describes a system’s available memory
À Indicates where code and data shall be located
u Develop multi-file systems
2 - 16

DSP54x - Assembly Language Tools 2-9


Module 2

Linking
u Files: input and output
u Memory description
u How to place s/w into h/w

             link.cmd
                |
.obj  --->     LNK     ---> .out
                |
              .map

2 - 17

Example System

Program Memory          'C54x          Data Memory

8000  PROM (code)                      4000  SRAM (var)
FFFF                                   6000
                                       8000  EPROM (const)
                                       A000

2 - 18

2 - 10 DSP54x - Assembly Language Tools


Module 2

Linker Command File


example1.obj
-o example1.out
-m example1.map
MEMORY
{ Page 0: /* Program */
PROM: org = 8000h , len = 8000h
Page 1: /* Data */
SRAM: org = 4000h , len = 2000h
EPROM: org = 8000h , len = 2000h
}
SECTIONS
{ .text:> PROM PAGE 0
.bss: > SRAM PAGE 1
.data:> EPROM PAGE 1
}
2 - 19
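As a cross-check on the command file above, the MEMORY and SECTIONS entries can be mirrored in a few lines of Python. This is a sketch for reasoning about the map only; the dictionaries and the function names are hypothetical and are not part of the linker.

```python
# Hypothetical Python mirror of the MEMORY and SECTIONS directives above.
# Each region is (origin, length) in 16-bit words.
MEMORY = {
    0: {"PROM": (0x8000, 0x8000)},                              # page 0: program
    1: {"SRAM": (0x4000, 0x2000), "EPROM": (0x8000, 0x2000)},   # page 1: data
}
SECTIONS = {".text": ("PROM", 0), ".bss": ("SRAM", 1), ".data": ("EPROM", 1)}


def last_address(org, length):
    """Highest address covered by a region."""
    return org + length - 1


def find_overlaps(page):
    """Return pairs of region names on one page whose address ranges collide."""
    ranges = sorted((org, last_address(org, length), name)
                    for name, (org, length) in MEMORY[page].items())
    clashes = []
    for (_, prev_end, prev_name), (start, _, name) in zip(ranges, ranges[1:]):
        if start <= prev_end:
            clashes.append((prev_name, name))
    return clashes


def section_start(section):
    """First address of the region a section is routed to."""
    region, page = SECTIONS[section]
    return MEMORY[page][region][0]


for name in SECTIONS:
    print(name, "starts at", hex(section_start(name)))
```

PROM and EPROM both begin at 8000h without conflict because they sit on different pages: page 0 is program space and page 1 is data space.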

Memory Descriptor Suggestions

1. Describe each memory resource on the processor


(internal RAM and/or ROM)
2. Describe each external memory chip in your system
3. Combine contiguous memory segments, if desired
4. Split any memory segment into multiple segments,
if desired
5. Name memory segments with useful names; e.g.:
À Types of memory chips (EPROM, RAM, EEPROM)
À Usage (vectors, code, variables)
À Chip layout names (U1, E2)

2 - 20

DSP54x - Assembly Language Tools 2 - 11


Module 2

Exercise 2c: Example System

Program SPRAM DARAM Data


0 8000
16K ‘C541 32K
SRAM DEPROM
(µP mode)

8000
32K
EPROM

2 - 21

Exercise 2c: Link Command File


example1.obj
-o example1.out
-m example1.map

MEMORY
{ PAGE ___: /* Program Memory */
______: org = ______, len = ______
______: org = ______, len = ______
________:
______: org = ______, len = ______
______: org = ______, len = ______
______: org = ______, len = ______
}

SECTIONS
{ .text: > EPROM PAGE 0
.bss: > SPRAM PAGE 1
.data: > DEPROM PAGE 1
}
2 - 22

2 - 12 DSP54x - Assembly Language Tools


Module 2

Lab 2c: Linking

Program SPRAM DARAM Data


8000
8K ‘C541 32K
EPROM DEPROM
(µP mode)
FFFF

2 - 23

Lab 2c: Linking


Procedure
1. Copy the linker command file LAB2A.CMD to LAB2C.CMD
2. Specify LAB2B as the input; and request output and map files
3. Define a system memory map to include:
TMS320C541 (µP mode) with all internal RAM mapped as Data
8K Program EPROM ending at 64K
32K Data EPROM beginning at 8000h
4. Place code sections as follows:
Code into EPROM
Table into DEPROM
Variable arrays in SPRAM
5. Link and inspect the .MAP file. What addresses are assigned to:
.text _____________
.data _____________
.bss _____________
2 - 24

DSP54x - Assembly Language Tools 2 - 13


Module 2

Multiple Sections

Program Memory          'C54x          Data Memory
                       ROM  RAM

EPROM  (code)                          SRAM    (var)
EPROM  (vectors)                       DEPROM  (const)

How do we put a particular code section into specific memory?


2 - 25

Named Sections

Directive   Example                    Description

.sect       .sect "vectors"            Creates initialized sections
                                       for code or constants

.usect      label .usect "name", 23    Creates uninitialized sections
                                       for variables

2 - 26

2 - 14 DSP54x - Assembly Language Tools


Module 2

Adding Reset

sum.asm                         vectors.asm
        .def   start                    .ref   start
        .text                           .sect  ".vectors"
start   LD     x,A                      B      start
        ADD    y,A
        STL    A,z
        B      start

        .data
x       .int   2
y       .int   7

        .bss   z,1
2 - 27

Linker CMD File with Vectors


sum.obj
vectors.obj
-o sum.out
-m sum.map
MEMORY
{ Page 0: /* Program Memory */
EPROM: org = 0E000h , len = 1F80h
VECS: org = 0FF80h , len = 0080h
Page 1: /* Data Memory */
SPRAM: org = 0060h , len = 20h
DARAM: org = 0080h , len = 1380h
DEPROM:org = 8000h , len = 8000h
}
SECTIONS
{ .text: > EPROM PAGE 0
.data: > DEPROM PAGE 1
.bss: > SPRAM PAGE 1
.vectors: > VECS PAGE 0
}
2 - 28

DSP54x - Assembly Language Tools 2 - 15


Module 2

Lab 2d: Multi-file Linking


P D

x
a .bss
y
start NOP
NOP
NOP .text
B start table 1 2 3 4
.data
8 6 4 2
FF80 B start
.vectors

2 - 29

Lab 2d: Multi-file Systems

Procedure
1. Create VECTORS.ASM
2. Copy LAB2B.ASM to LAB2D.ASM
Modify LAB2D to make start accessible
3. Assemble LAB2D and VECTORS
4. Copy LAB2C.CMD to LAB2D.CMD
Modify LAB2D.CMD to specify the desired input
and output files and the routing of the RESET vector
5. Link the system and inspect the .MAP file
6. Step through the code on the simulator to verify
performance.

2 - 30

2 - 16 DSP54x - Assembly Language Tools


Module 2

COFF Directive Summary

Type            Directive   Purpose

Initialized     .text       Program code
Sections        .data       Data constants
                .sect       User-named
Uninitialized   .bss        Data variables
Sections        .usect      User-named
Constants       .int        Create integer
                .word       Create integer
                .long       Create aligned 32-bit constant
Labels          .def        Define global variable
                .ref        Reference global variable
                .global     Global declaration (.ref + .def)
Misc            .set        Assign a value, similar to .equ or #define
                .end        Halt assembler

2 - 31

Exercise 2d: Multi-file Issues


Procedure
1. Fill in blanks in EX2C.CMD to support reset
vector.
2. Per EX2C.CMD, fill in the post-link addresses
in the left-side blanks in the ASM files.
- Branch is a 2-word instruction.
- Other instructions are single-word.
- Put an ’X’ in any blank that has no address.
- Linkage is performed in the order you
specify, with sections as the major and files
as the minor sort criteria
3. Resolve symbolic references in the right-side
blanks.
2 - 33
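The "sections major, files minor" rule can be sketched in Python. The word counts below are read off the exercise's ASM files (single-word instructions, a two-word branch, one word per .int); the .bss size of 1 and the VECS origin of 0FF80h are assumptions, and `assign_addresses` is a hypothetical helper, not a linker function.

```python
# Files in the order they appear in the command file, with words per section.
FILES = [
    ("mult",    {".text": 6, ".data": 1}),               # 4 one-word ops + 2-word B
    ("sum",     {".text": 5, ".data": 2, ".bss": 1}),    # 3 one-word ops + 2-word B
    ("vectors", {".vectors": 2}),                        # one 2-word B
]

# Origin of the memory region each output section is routed to.
ORIGINS = {".text": 0xE000, ".data": 0x8000, ".bss": 0x0060, ".vectors": 0xFF80}


def assign_addresses(files, origins):
    """Place each section in turn, taking the files in command-file order."""
    placed = {}
    for section, origin in origins.items():        # major sort: section
        address = origin
        for name, sizes in files:                  # minor sort: file
            if section in sizes:
                placed[(name, section)] = address
                address += sizes[section]
    return placed


for (name, section), address in assign_addresses(FILES, ORIGINS).items():
    print(f"{name:8s}{section:10s}{address:04X}h")
```

Because mult.obj is listed first, its .text occupies the start of EPROM and the .text from sum.obj follows immediately after it, even though the two files also contribute .data sections that are placed elsewhere.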

DSP54x - Assembly Language Tools 2 - 17


Module 2

Exercise 2d: EX2C.CMD File


mult.obj
sum.obj
vectors.obj
-o system.out
-m system.map

MEMORY
{ Page 0:
SRAM: org = 0000h , len = 4000h
EPROM: org = 0E000h , len = _____
VECS: org = _____ , len = _____
Page 1:
SPRAM: org = 0060h , len = 0020h
DARAM: org = 0100h , len = 0400h
DEPROM: org = 8000h , len = 8000h
}
SECTIONS
{ .text: > EPROM PAGE 0
.data: > DEPROM PAGE 1
.bss: > SPRAM PAGE 1
_____: > _______________
}
2 - 34

Exercise 2d: mult.asm

mult.asm
____                .ref   z,y
____                .def   c,mult
____  K             .set
____  mult          LD     z,A      ____
____                MPY    y,A      ____
____                ADD    c,A      ____
____                STH    A,z      ____
____  done          B      done     ____
                    .data
____  c             .int   K        ____

Note: order of link is: SECTION major, FILE minor

Example:                    yields:
file1.obj                   file1.text
file2.obj                   file2.text
SECTIONS{                   file1.data
  .text: > ROM              file2.data
  .data: > ROM }

2 - 35

2 - 18 DSP54x - Assembly Language Tools


Module 2

Exercise 2d: sum and vectors.asm

sum.asm
____                .ref   c,mult
____                .def   start,z,y
____  start         LD     x,A      ____
____                ADD    c,A      ____
____                STH    A,z      ____
____                B      mult     ____

____                .data
____  x             .int
____  y             .int
____                .bss   z

vectors.asm
____                .ref   start
____                .sect  ".vectors"
____                B      start    ____
2 - 36

DSP54x - Assembly Language Tools 2 - 19


Module 2

LAB2A.ASM : Solution

; SOLUTION FILE FOR LAB2A.ASM

        NOP
start:  NOP
        NOP
        B       start

2 - 37

Exercise 2b: Solution

; a = 0,1,2,3,4
; x = input array of length 5
; y = result

        .data
a       .int    0,1,2,3,4

        .bss    x,5
        .bss    y,1

2 - 38

2 - 20 DSP54x - Assembly Language Tools


Module 2

LAB2B.ASM : Solution

        .bss    x,4
        .bss    a,4
        .bss    y,1
        .data
        .word   1,2,3,4
        .word   8,6,4,2,0
        .text
        NOP
start:  NOP
        NOP
        NOP
        NOP
        B       start
2 - 39

Exercise 2c: Link Command File


example1.obj
-o example1.out
-m example1.map

MEMORY
{ PAGE 0: /* Program Memory */
SRAM : org = 0000h , len=4000h
EPROM : org = 8000h , len = 8000h
PAGE 1: /* Data Memory */
SPRAM : org = 0060h , len = 0020h
DARAM : org = 0080h , len = 1380h
DEPROM: org = 8000h , len = 8000h
}

SECTIONS
{ .text: > EPROM PAGE 0
.bss: > SPRAM PAGE 1
.data: > DEPROM PAGE 1
}
2 - 40

DSP54x - Assembly Language Tools 2 - 21


Module 2

LAB2C.CMD : Solution

lab2b.obj
-o lab2c.out
-m lab2c.map
MEMORY {
  PAGE 0: EPROM : org = 0E000h  len = 02000h
  PAGE 1: SPRAM : org = 00060h  len = 00020h
          DARAM : org = 00080h  len = 01380h
          DEPROM: org = 08000h  len = 08000h
}
SECTIONS{
  .text : > EPROM   PAGE 0
  .data : > DEPROM  PAGE 1
  .bss  : > SPRAM   PAGE 1
}

2 - 41

Exercise 2d: CMD File Solution


mult.obj
sum.obj
vectors.obj
-o system.out
-m system.map

MEMORY {
Page 0:
SRAM: org = 0000h , len = 4000h
EPROM: org = 0E000h , len = 1F80h
VECS: org =0FF80h , len = 0080h
Page 1:
SPRAM: org = 0060h , len = 0020h
DARAM: org = 0100h , len = 0400h
DEPROM: org = 8000h , len = 8000h
}
SECTIONS {
.text: > EPROM PAGE 0
.data: > DEPROM PAGE 1
.bss: > SPRAM PAGE 1
.vectors:> VECS PAGE 0
}
2 - 42

2 - 22 DSP54x - Assembly Language Tools


Module 2

Exercise 2d: ASM Files Solution


PXOWDVP VXPDVP
[ UHI ]\ [ UHI FPXOW
[ GHI FPXOW [ GHI VWDUW]\
. VHW  (VWDUW /' [$
(PXOW /' ]$ ( $'' F$
( 03< \$ ( 67+ $]
( $'' F$ ( % PXOW(
[ GDWD
( 67+ $] [ LQW 
(GRQH % GRQH( \ LQW 
GDWD
 EVV ]
F LQW  .
YHFWRUVDVP
[ UHI VWDUW
[ VHFW ³YHFWRUV´
)) % VWDUW(
2 - 43

LAB2D & VECTORS : Solution

; VECTORS.ASM
        .ref    start
        .sect   ".vectors"
        b       start

; LAB2D.ASM
        .def    start
        .bss    x,4
        .bss    a,4
        .bss    y,1
        .data
        .word   1,2,3,4
        .word   8,6,4,2,0
        .text
        NOP
start:  NOP
        NOP
        NOP
        NOP
        B       start

2 - 44

DSP54x - Assembly Language Tools 2 - 23


Module 2

LAB2D.CMD : Solution
lab2d.obj
vectors.obj
-o lab2d.out
-m lab2d.map
MEMORY {
  PAGE 0: EPROM:  org = 0E000h  len = 01F80h
          VECS:   org = 0FF80h  len = 00080h
  PAGE 1: SPRAM:  org = 00060h  len = 00020h
          DARAM:  org = 00080h  len = 01380h
          DEPROM: org = 08000h  len = 08000h }
SECTIONS{
  .vectors: > VECS    PAGE 0
  .text   : > EPROM   PAGE 0
  .data   : > DEPROM  PAGE 1
  .bss    : > SPRAM   PAGE 1 }

2 - 45

2 - 24 DSP54x - Assembly Language Tools


Addressing Modes

Learning Objectives

u List the four basic addressing modes and
identify the purpose of each.
u Express constants via immediate addressing.
u Access tables and arrays via indirect addressing -
a pointer-like process.
u Select the optimal mode when using indirect
addressing.
u Perform general purpose access to Data Memory
via direct addressing (two methods).
u Define and implement methods for controlling
page boundary crossings.
u Access stack variables and MMRs via special
versions of direct addressing.
3-2

DSP54x - Addressing Modes 3-1


3-2 DSP54x - Addressing Modes
Module 3

Addressing Modes
Type             Symbol      Purpose, Benefit
Immediate        #           Using constants/initialization
  Long                       16-bit values
  Short                      Single cycle
Indirect         *           Support for pointers - access arrays, lists, tables
  w. Inc/Dec                 0 cycle auto increment/decrement by +/- 1
  w. Index                   0 cycle auto increment by "n"
Direct           <default>   General-purpose access to data
  Absolute       - or -      Access any location in data memory - 'flat memory'
  Paged          @           Single-cycle access within boundary
  SP-relative                Optimal for stack-based values (C)
  MMR                        Optimal for DP0 values (MMR and SPRAM)
Register                     Operate between Acc A and B

3-3

Immediate Addressing
u Long Immediate
À Allows use of constant
À Up to 16-bit operand
À 2 words, 2 cycles
À Optimal for initialization
Example:
        LD    #1234h,A          ;opcode word "Load to A #", then 1234

u Short Immediate
À Available in limited cases
À 9-bit or smaller values
À 1 word, 1 cycle
À Init. Acc (8), DP (9), ASM (5), etc.
Example:
        LD    #12h,A            ;single word: "Load A #" with 12

3-4

DSP54x - Addressing Modes 3-3


Module 3

Indirect Addressing
u Hardware support of pointer concept
u Eight ARs (Address or Auxiliary Registers) available
u AR0 also used as (optional) index
u Allows fast, efficient access to arrays, lists, tables, etc.
Example
$y = \sum_{n=1}^{100} x_n$

        .bss    x,100
        .text
        STM     #x,AR1
        LD      *AR1+,A
        ADD     *AR1+,A
        ADD     *AR1+,A
        ...
        STL     A,y

Data: AR1 steps through the array x (x1, x2, x3, ... x100), which is followed by y.

3-5

Indexed Addressing
u Add step size option to auto increment.
u AR0 holds step size.
u Mode selected by using *ARn+0 or *ARn-0.
u Pre-mod fixed index w. extra cycle: *+ARn(K)
Example
$y = \sum_{n=1}^{100} x_{2n}$

        .bss    x,200
        .text
        STM     #x,AR1
        STM     #2,AR0
        ADD     *AR1+0,A
        ADD     *AR1+0,A
        ...
        STL     A,*(y)

Data: AR1 steps through x in strides of AR0 (x2, x4, x6, ... x200), which is followed by y.

3-6

3-4 DSP54x - Addressing Modes


Module 3

Indirect Addressing Options


No Mod        *ARn          no modification to ARn
Inc/Dec       *ARn+         post increment by 1
              *ARn-         post decrement by 1
Index         *ARn+0        post increment by AR0
              *ARn-0        post decrement by AR0
Circular      *ARn+%        post increment by 1 - circular
              *ARn-%        post decrement by 1 - circular
              *ARn+0%       post increment by AR0 - circular
              *ARn-0%       post decrement by AR0 - circular
Bit-Reverse   *ARn+0B       post increment by AR0 with reverse carry (for FFT)
              *ARn-0B       post decrement by AR0 with reverse carry (for FFT)
Pre-mod       *ARn(lk)      use *(ARn+lk), ARn unchanged
              *+ARn(lk)     use *(ARn+lk), ARn changed
              *+ARn(lk)%    use *(ARn+lk), ARn changed - circular
              *+ARn         pre-increment by 1, during write only
Absolute      *(lk)         absolute direct
3-7
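The bit-reversed mode is the least intuitive entry in the table, so here is a small Python model of a reverse-carry add. It is a sketch only: it assumes the buffer starts on an address whose low bits are zero, it models just those low bits, and it assumes the usual FFT convention of loading the index register with half the transform size. All function names are hypothetical.

```python
def reverse_bits(value, width):
    """Mirror the lowest `width` bits of value."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def reverse_carry_add(pointer, index, width):
    """Add index to pointer with the carry propagating from MSB toward LSB."""
    mask = (1 << width) - 1
    total = (reverse_bits(pointer, width) + reverse_bits(index, width)) & mask
    return reverse_bits(total, width)


def bit_reversed_sequence(n_points):
    """Offsets visited by repeated bit-reversed post-increments."""
    width = n_points.bit_length() - 1      # address bits needed for n_points
    index = n_points // 2                  # assumed index value: half the size
    pointer = 0
    visited = []
    for _ in range(n_points):
        visited.append(pointer)
        pointer = reverse_carry_add(pointer, index, width)
    return visited


print(bit_reversed_sequence(8))   # [0, 4, 2, 6, 1, 5, 3, 7]
```

Reversing the operands, adding normally, and reversing the result is just one convenient way to express a carry that ripples in the opposite direction; hardware does it directly in the address generator.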

Indirect Addressing Caveats


u Load pointers before using
u Pointer (MMR) latencies:
À no latency STM, MVDK
À 1 cycle MVDM, MVKD, MVDD
À 2 cycles STLM, ST, etc
u ARs are read/modified in access phase, so during
debug, will appear to change early.
u CMPT must = 0 (bit5, ST1)
À is 0 on reset
À is forced to 0 with RSBX CMPT
À CMPT = 1 allows old 5x-styled NARP operation
for ARs.

3-8

DSP54x - Addressing Modes 3-5


Module 3

Absolute Direct Addressing


u Actually a form of indirect addressing.
u Allows access to any data memory operand.
u Requires extra word of code and extra cycle(s).

Example
        .data
x:      .word   1000h
y:      .word   0500h
        .text
        LD      *(x),A          ;Acc A = 00 0000 1000
        ADD     *(y),A          ;Acc A = 00 0000 1500

Data Memory
       Addr    Data
x:     01FF    1000
y:     0200    0500

3-9

Paged Direct Addressing


u Allows single-word/single-cycle operation
u Seven-bit address field allows access to 128 words
u Pages are selected by DP field in ST0.
        .data
x:      .word   01000h
y:      .word   00500h
        .text
        LD      #x,DP           ;Acc A = -- ---- ----, DP = 003
        LD      x,A             ;Acc A = 00 0000 1000, DP = 003
        ADD     y,A             ;Acc A = 00 0000 1001, DP = 003
                                ;y is on page 4, so with DP = 3 the ADD reads 0180h

Data Memory
       Addr    Data
       0180    0001
x:     01FF    1000
y:     0200    0500
3 - 10
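The page-crossing effect in the example above follows directly from how the address is formed: the 9-bit DP supplies the upper bits and the instruction supplies only the lower 7. A minimal Python sketch, with hypothetical function names, reproduces the slide's result:

```python
def data_page(address):
    """Upper 9 bits of a 16-bit data address: the value DP must hold."""
    return (address >> 7) & 0x1FF


def paged_direct_address(dp, operand_address):
    """Address actually accessed: DP supplies bits 15-7, the opcode bits 6-0."""
    return (dp << 7) | (operand_address & 0x7F)


# Data memory contents taken from the slide.
memory = {0x0180: 0x0001, 0x01FF: 0x1000, 0x0200: 0x0500}
x, y = 0x01FF, 0x0200

dp = data_page(x)                                   # LD  #x,DP
acc = memory[paged_direct_address(dp, x)]           # LD  x,A
acc += memory[paged_direct_address(dp, y)]          # ADD y,A  (reads the wrong word)

print(f"DP = {dp}, y resolves to {paged_direct_address(dp, y):04X}h, A = {acc:04X}h")
```

Since x is the last word of page 3 and y is the first word of page 4, the ADD wraps to the start of page 3 and picks up 0001h instead of 0500h.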

3-6 DSP54x - Addressing Modes


Module 3

Paged Direct Addressing - Blocking

Single DP can be assured in either of two simple methods:

Specify the blocking argument in the linker command file:


.bss : > RAM BLOCK=128

Group and block variables in ASM file:


.bss x,2,1 ;request all vars together,
;third field requests block
y .set x+1 ;assign vars within block
;orig sets

3 - 11

Paged Direct - Blocking Example

        .bss    x,2,1
y       .set    x+1
        .text
        LD      #x,DP           ;Acc A = -- ---- ----, DP = 004
        LD      x,A             ;Acc A = 00 0000 1000, DP = 004
        ADD     y,A             ;Acc A = 00 0000 1500, DP = 004

Data Memory
Addr    Data
100
1FF     ----
200     1000
201     0500

3 - 12

DSP54x - Addressing Modes 3-7


Module 3

Paged Direct Addressing - Caveats


u Data page must be managed by programmer.
À No warnings issued by tools for crossed page.
u CPL bit in ST1 must be 0 for paged direct.
À Default condition on reset.
À Invoked with RSBX CPL.
u Useful for fast, random access to <100 variables at a time.
À For >100 variables, use pointers.
À Not speed critical - use absolute direct.
u Recommended: Watch DP when debugging code:
WA ST0<<7, Base = , x
will display "Base = " and the first address active for paged
direct addressing cast in hex.

3 - 13

Stack Relative Direct Addressing


u Alternative to paged direct mode
u Uses 16-bit SP instead of 9-MSB DP as base
u Useful for stack-based operations

Example
        .text
        SSBX    CPL
        LD      1,A             ;Acc A = 00 0000 0100
        ADD     2,A             ;Acc A = 00 0000 0150

Data Memory
SP ->   ----
SP+1    0100
SP+2    0050

Notes:
1. SP and DP relative direct are mutually exclusive!
2. Restore CPL = 0 (RSBX CPL) before using paged direct again.
3. CPL = 0 on reset.
3 - 14

3-8 DSP54x - Addressing Modes


Module 3

MMR Direct Addressing


u DP is ignored - not used or modified
u CPL is ignored - not used or modified
u Allows access to all DP0 resources (MMRs and SPRAM)
u Invoked via MMR-specific mnemonics:
LDM, STLM       MMR ↔ Acc
STM             # → MMR
PSHM, POPM      MMR ↔ Stack
MVDM, MVMD      MMR ↔ DMem
MVMM            AR, SP, BK ↔ AR, SP, BK
Example
        .mmregs                 ;Allows MMR names as addresses
        LDM     ST1,B
        OR      #4000h,B
        STLM    B,ST1

3 - 15

Memory-Mapped Registers (MMR)


Name    Addr. (Hex)   Description
IMR     0000          Interrupt Mask Register
IFR     0001          Interrupt Flag Register
-----   0002-0005     Reserved
ST0     0006          Status 0 Register
ST1     0007          Status 1 Register
AL      0008          A accumulator low (A[15:00])
AH      0009          A accumulator high (A[31:16])
AG      000A          A accumulator guard (A[39:32])
BL      000B          B accumulator low (B[15:00])
BH      000C          B accumulator high (B[31:16])
BG      000D          B accumulator guard (B[39:32])
T       000E          Temporary Register
TRN     000F          Transition Register
AR0     0010          Auxiliary Register 0
AR1     0011          Auxiliary Register 1
AR2     0012          Auxiliary Register 2
AR3     0013          Auxiliary Register 3
AR4     0014          Auxiliary Register 4
AR5     0015          Auxiliary Register 5
AR6     0016          Auxiliary Register 6
AR7     0017          Auxiliary Register 7
SP      0018          Stack Pointer Register
BK      0019          Circular Size Register
BRC     001A          Block Repeat Counter
RSA     001B          Block Repeat Start Address
REA     001C          Block Repeat End Address
PMST    001D          PMST Register
-----   001E-001F     Reserved

3 - 16

DSP54x - Addressing Modes 3-9


Module 3

Register Addressing

u Allows interchange between accumulators


u Examples:
LD A,B A → B
ADD B,A A = A + B
u Can sometimes be merged with other action
ADD x,B,A A = B + x

3 - 17

Exercise 3: Addressing
Address/Data (hex)   Scratch        Data1          Data2 B1
Assume:              DP=0           DP=4           DP=6
CPL=0                60: 20h        200: 100h      300: 100h
CMPT=0               61: 120h       201: 60h       301: 30h
                     62:            202: 40h       302: 60h

Program              A      B      DP     AR0    AR1    AR2
LD   #0,DP
STM  #2,AR0
STM  #200h,AR1
STM  #300h,AR2
LD   61h,A           120
ADD  *AR1+,A
SUB  60h,A,B                200
ADD  *AR1+,B,A       260
LD   #6,DP
ADD  1,A
ADD  *AR2+,A         390
SUB  *AR2+,0,A
SUB  #32,A
ADD  *AR1-0,A,B             380
SUB  *AR2-0,B,A      320
STL  A,62h

3 - 18

3 - 10 DSP54x - Addressing Modes


Module 3

Lab 3: Addressing
P                       D
                        .bss     <- AR(dst)
.text
                        .data    <- AR(src)
vectors
                ACC

Code Abstract           Caveats
LD  (src1)              u Don’t put in a loop
STL (dst1)              u Use best addressing mode
LD  (src2)              u Optimizations come in later labs
STL (dst2)
...
...
done: B done
3 - 19

Lab 3: Procedure
1. Copy LAB2D.ASM to LAB3.ASM. Modify LAB3 by
replacing the NOPs with code to copy the nine data
table values into the allocated RAM, as shown in the
diagram above.
2. Copy LAB2D.CMD to LAB3.CMD. Modify LAB3 as
required.
3. Assemble and link your code. Check the .LST and
.MAP files for expected results.
4. Step through the code on the simulator. Verify
performance; debug as necessary.

Optional: If time permits, add components to create a
location "status" and copy ST0 to status. Which
addressing modes are best here? Why?
3 - 20

DSP54x - Addressing Modes 3 - 11


Module 3

Exercise 3: Addressing - Solution


Address/Data (hex)   Scratch        Data1          Data2 B1
Assume:              DP=0           DP=4           DP=6
CPL=0                60: 20h        200: 100h      300: 100h
CMPT=0               61: 120h       201: 60h       301: 30h
                     62:            202: 40h       302: 60h

Program              A      B      DP     AR0    AR1    AR2
LD   #0,DP                         0
STM  #2,AR0                               2
STM  #200h,AR1                                   200
STM  #300h,AR2                                          300
LD   61h,A           120
ADD  *AR1+,A         220                         201
SUB  60h,A,B                200
ADD  *AR1+,B,A       260                         202
LD   #6,DP                         6
ADD  1,A             290
ADD  *AR2+,A         390                                301
SUB  *AR2+,0,A       360                                302
SUB  #32,A           340
ADD  *AR1-0,A,B             380                  200
SUB  *AR2-0,B,A      320                                300
STL  A,62h

3 - 22
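The exercise can be checked by replaying it in Python. The sketch below is not a 'C54x simulator: it models only what this program needs (paged direct reads, indirect reads with a post-modify step, two accumulators) and its helper names are hypothetical. Memory contents are the values given in the exercise.

```python
mem = {0x60: 0x20, 0x61: 0x120,
       0x200: 0x100, 0x201: 0x60, 0x202: 0x40,
       0x300: 0x100, 0x301: 0x30, 0x302: 0x60}
reg = {"A": 0, "B": 0, "DP": 0, "AR0": 0, "AR1": 0, "AR2": 0}


def direct_address(offset):
    """Paged direct: DP supplies the upper 9 bits, the operand the lower 7."""
    return (reg["DP"] << 7) | (offset & 0x7F)


def direct(offset):
    return mem[direct_address(offset)]


def indirect(ar, step):
    """Read through a pointer, then post-modify it by step."""
    value = mem[reg[ar]]
    reg[ar] += step
    return value


reg["DP"] = 0                                        # LD   #0,DP
reg["AR0"] = 2                                       # STM  #2,AR0
reg["AR1"] = 0x200                                   # STM  #200h,AR1
reg["AR2"] = 0x300                                   # STM  #300h,AR2
reg["A"] = direct(0x61)                              # LD   61h,A
reg["A"] += indirect("AR1", 1)                       # ADD  *AR1+,A
reg["B"] = reg["A"] - direct(0x60)                   # SUB  60h,A,B
reg["A"] = reg["B"] + indirect("AR1", 1)             # ADD  *AR1+,B,A
reg["DP"] = 6                                        # LD   #6,DP
reg["A"] += direct(1)                                # ADD  1,A
reg["A"] += indirect("AR2", 1)                       # ADD  *AR2+,A
reg["A"] -= indirect("AR2", 1)                       # SUB  *AR2+,0,A
reg["A"] -= 32                                       # SUB  #32,A
reg["B"] = reg["A"] + indirect("AR1", -reg["AR0"])   # ADD  *AR1-0,A,B
reg["A"] = reg["B"] - indirect("AR2", -reg["AR0"])   # SUB  *AR2-0,B,A
mem[direct_address(0x62)] = reg["A"] & 0xFFFF        # STL  A,62h

print({name: hex(value) for name, value in reg.items()})
```

One detail the replay makes visible: DP is 6 when the final STL executes, so the result is written to 362h rather than to the scratch location 62h.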

LAB3.ASM : Solution
; LAB3.ASM: Data Xfer solution
        .def    start,table,x
        .bss    x,4
        .bss    a,4
        .bss    y,1
        .data
table:  .word   1,2,3,4
        .word   8,6,4,2,0
        .text
        NOP
start:  STM     #table,AR1
        STM     #x,AR2
        LD      *AR1+,A         ;1
        STL     A,*AR2+
        LD      *AR1+,A         ;2
        STL     A,*AR2+
        LD      *AR1+,A         ;3
        STL     A,*AR2+
        LD      *AR1+,A         ;4
        STL     A,*AR2+
        LD      *AR1+,A         ;5
        STL     A,*AR2+
        LD      *AR1+,A         ;6
        STL     A,*AR2+
        LD      *AR1+,A         ;7
        STL     A,*AR2+
        LD      *AR1+,A         ;8
        STL     A,*AR2+
        LD      *AR1+,A         ;9
        STL     A,*AR2+
; Optional process solution
        .mmregs
        .bss    status,1
        .def    status
option: LDM     ST0,A
        STL     A,*(status)
done:   B       done
3 - 23

3 - 12 DSP54x - Addressing Modes


Module 3

LAB3.CMD : Solution
lab3.obj
vectors.obj
-o lab3.out
-m lab3.map
MEMORY {
  PAGE 0: EPROM: org = 0E000h  len = 01F80h
          VECS:  org = 0FF80h  len = 00080h
  PAGE 1: SPRAM: org = 00060h  len = 00020h
          DARAM: org = 00080h  len = 01380h
}
SECTIONS{
  .vectors: > VECS   PAGE 0
  .text   : > EPROM  PAGE 0
  .data   : > DARAM  PAGE 1
  .bss    : > SPRAM  PAGE 1
}

3 - 24

DSP54x - Addressing Modes 3 - 13


Module 3

3 - 14 DSP54x - Addressing Modes


Basic Programming Techniques

Learning Objectives
u Perform simple branch, loop control,
and subroutine operations.
u Set up and employ the stack for
subroutine call and return.
u Use the accumulator to load, store, add
and subtract 16-bit values from data
and program memory.
u Use the multiplier to implement sum-of-products
equations.

4-2

DSP54x - Basic Programming Techniques 4-1


4-2 DSP54x - Basic Programming Techniques
Module 4

Basic Program Control

Branch              Call               Return

B     next          CALL  sub          RET
BACC  src           CALA  src
BC    next,cnd,...  CC    sub,cnd,...  RC    cnd,...

Instruction     Cycles
B, CALL         4
RET             5
BACC, CALA      6
BC, CC, RC      5/3

4-3

Condition Operators

EQ NEQ OV TC C BIO
LEQ GEQ NOV NTC NC NBIO
LT GT

Pick 1 and/or Pick 1 OR Pick 1 and/or Pick 1 and/or Pick 1

Examples
RC TC
CC sub,BNEQ
BC new,AGT,AOV

4-4

DSP54x - Basic Programming Techniques 4-3


Module 4

Loop Counter: BANZ


$y = \sum_{n=1}^{5} x_n$

        .bss    x,5
        STM     #x,AR1
        STM     #4,AR2
        LD      #0,A
loop:   ADD     *AR1+,A
        BANZ    loop,*AR2-
        STL     A,y

4-5
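Why the counter is loaded with 4 for a five-term sum can be seen from a short Python model of the loop. It assumes BANZ tests the register for non-zero before applying the post-decrement, so the loop body runs one more time than the initial count; `banz_sum` is a hypothetical name.

```python
def banz_sum(values, counter):
    """Model of: loop: ADD *AR1+,A / BANZ loop,*AR2-"""
    acc = 0
    pointer = 0
    passes = 0
    while True:
        acc += values[pointer]      # ADD *AR1+,A
        pointer += 1
        passes += 1
        if counter == 0:            # BANZ falls through once the counter is zero
            break
        counter -= 1                # post-decrement; branch back to loop
    return acc, passes


total, passes = banz_sum([1, 2, 3, 4, 5], counter=4)
print(total, passes)   # 15 5
```

In general, looping n times with this pattern means loading the counter register with n - 1.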

Comparison: CMPR
for (n=5; n<10; n++)
        STM     #5,AR1
        STM     #10,AR0
loop:   ...
        ...     *AR1+
        ...
        CMPR    LESS,AR1
        BC      loop,TC

Useful in .include files:
EQUAL   .set    00b
LESS    .set    01b
GRTR    .set    10b
NOTEQ   .set    11b

4-6

4-4 DSP54x - Basic Programming Techniques


Module 4

The Stack

Setup:
STACK   .usect  "STK",100
        STM     #STACK+100,SP

Use:
CALL :  PC → *--SP
RET  :  *SP++ → PC

Data Memory
0
STACK   Open
SP ->   Last Used       (section STK)
        Used
64K

4-7

Measuring Stack Required

Determining the amount of stack to allocate can be done in four steps:
1. Allocate a large stack and fill it with known values:
        LD      #-8531,A        ;-8531 = 0DEADh
        MVMM    SP,AR7
        RPT     #length
        STL     A,*AR7-
2. Run system to exercise all operations
3. Halt and inspect stack for prior value
4. Delete excess (unused) stack

Stack after fill (top to SP):  DEAD  DEAD  DEAD  ...  DEAD  DEAD  DEAD
Stack after run  (top to SP):  DEAD  DEAD  7AB3  ...  0013  6B14  0000

4-8
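The fill-and-inspect technique above amounts to finding a high-water mark. A Python sketch of the idea follows; the stack base, its length, and the pushed return addresses are made-up illustration values, and the helper names are hypothetical.

```python
FILL = -8531 & 0xFFFF      # 0xDEAD: the 16-bit pattern loaded by LD #-8531,A


def make_stack(base, length):
    """Step 1: a stack area pre-filled with the known pattern."""
    return {address: FILL for address in range(base, base + length)}


def push(memory, sp, value):
    """Pre-decrement push, as CALL does with the return address."""
    sp -= 1
    memory[sp] = value & 0xFFFF
    return sp


def words_used(memory, base, length):
    """Step 3: distance from the lowest overwritten word to the stack top."""
    top = base + length
    for address in range(base, top):
        if memory[address] != FILL:
            return top - address
    return 0


base, length = 0x0100, 100                 # hypothetical placement and size
memory = make_stack(base, length)
sp = base + length                         # STM #STACK+100,SP
for return_address in (0xE010, 0xE024, 0xE031):
    sp = push(memory, sp, return_address)  # step 2: exercise the system

print(words_used(memory, base, length))    # 3
```

The scan starts from the far end of the stack, so it still reports the deepest point reached even after the stack has unwound. A pushed value that happens to equal the fill pattern would be undercounted, which is one reason to pick an unlikely pattern.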

DSP54x - Basic Programming Techniques 4-5


Module 4

Exercise 4-1. Program Control

1. What is the difference between Branch and Call?


2. Why is there no Return using Accumulator?
3. Are multi-conditions based on the AND or OR of
conditions?
4. When looping "n" times, what value do you place in
the loop counter?
5. Which register(s) may be used for loop counting?
6. When adding to the stack, what happens to SP?
7. What does SP point to?
8. How many cycles do Branch operations require?
Why?

4-9

Lab 4a
1. Modify VECTORS.ASM to allocate a stack, set up the SP, and call start
2. Copy LAB3.CMD to LAB4A.CMD
3. Modify LAB4A.CMD to route the stack to Data RAM
4. Link LAB3.OBJ with the modified VECTORS.OBJ to produce LAB4A.OUT
5. Simulate LAB4A.OUT to verify your results, especially the placement of a
return address on the stack

VECTORS.ASM
        .sect   ".vectors"
        B       BEGIN
;allocate stack [.usect]
        .text
BEGIN
;[setup SP]
;[call START]

LAB4A.CMD
MEMORY
{ Page 1
  RAM: org=___,len=___
  . . .
}
SECTIONS
  . . .
  STACK: > RAM

4 - 10

4-6 DSP54x - Basic Programming Techniques


Module 4

Dual Accumulators
A B C T A B D S

ALU
A
B
MUX M

AG AH AL BG BH BL
39 - 32 31 - 16 15 - 0 39 - 32 31 - 16 15 - 0

LD x,A ADD *AR2-,16,B


STL A,*AR1+ STH B,y

4 - 11

Instruction Formats

Load acc-Lo with Smem
Load acc-Hi with Smem
Load acc w. T-SHIFT Smem
Load A with shift shft Xmem
Load A with SHIFT Smem

4 - 12

DSP54x - Basic Programming Techniques 4-7


Module 4

Load Accumulator: LD
LD _____, dst
Shift Type     Data Memory       Constant         Accumulator
Low Acc        Smem              #k8
High Acc       Smem, 16          #K, 16 !
T-reg Value    Smem, TS                           src, ASM
Fixed Value    Xmem, [shft]      #K, [shft] !
Extended       Smem, SHIFT !                      src, [SHIFT]

LEGEND
Smem: single data       shft:  0<=S<=15      ASM: Acc. Shifter    K:  16-bit const.
Xmem: ptr. data         SHIFT: -16<=S<=15    TS:  TREG(5-0)       k8: 8-bit const.
src, dst: Acc. A or B   ! = 2-word size

4 - 13

Add and Subtract: ADD, SUB


ADD _____ or SUB _____
Shift Type     Data Memory                   Constant                  Accumulator
Low Acc        Smem, src
High Acc       Smem, 16, src, [dst]          #K, 16, src, [dst] !
T-reg Value    Smem, TS, src                                           src, ASM, [dst]
Fixed Value    Xmem, [shft], src             #K, [shft], src !
Extended       Smem, SHIFT, src, [dst] !                               src, [SHIFT], [dst]

Dual Op:       Xmem, Ymem, dst

LEGEND
Smem: single data       shft:  0<=S<=15      ASM: Acc. Shifter    K:  16-bit const.
Xmem: ptr. data         SHIFT: -16<=S<=15    TS:  TREG(5-0)       k8: 8-bit const.
src, dst: Acc. A or B   ! = 2-word size

4 - 14

4-8 DSP54x - Basic Programming Techniques


Module 4

MIN, MAX

MAX dst     dst = max (A, B)     if A > B then C = 0
MIN dst     dst = min (A, B)     if A < B then C = 0

Example: z = max (x_n)

        .bss    x,100
        .bss    z,1
        STM     #x,AR1
        STM     #98,BRC
        LD      *AR1+,B
        RPTB    loop
        LD      *AR1+,A
loop:   MAX     B
        STL     B,*(z)

4 - 15

Store Accumulator: STL, STH

Shift type      STL                       STH
None            AccLo → Smem              Acc >> 16 → Smem
ASM             Acc << ASM → Smem         Acc >> (16-ASM) → Smem
Short (Xmem)    Acc << shft → Xmem        Acc >> (16-shft) → Xmem
Extended !      Acc << SHIFT → Smem       Acc >> (16-SHIFT) → Smem

LEGEND
Smem: single data       shft:  0<=S<=15      ASM: Acc. Shifter    K:  16-bit const.
Xmem: ptr. data         SHIFT: -16<=S<=15    TS:  TREG(5-0)       k8: 8-bit const.
src, dst: Acc. A or B   ! = 2-word size

4 - 16

DSP54x - Basic Programming Techniques 4-9


Module 4

Store Constant to Memory

ST #K, Smem

u Direct store of constant to memory


u Accumulator not affected
u Two words, two cycles
u Alt. syntax allows store of T or TRN registers

4 - 17

MAC Unit

17 x 17 Multiplier:
- Signed / Unsigned support
- 8000h x 8000h = 7FFFh in SMUL=1 mode

40-Bit Adder:
- Separate from ALU
- Sum & Add in single cycle

[Block diagram: 17 x 17 multiplier feeding a 40-bit adder, with results to accumulators A and B]

4 - 18

4 - 10 DSP54x - Basic Programming Techniques


Module 4

Multiplier Instructions
OP      Options                      Execution
LD      Smem, T                      T = S

MPY     Smem, dst                    dst = S . T
        Xmem, Ymem, dst              dst = X . Y
        Smem, #K, dst !              dst = S . K
        #K, dst !                    dst = K . T
MAC     Smem, src                    src = src + S . T
        Xmem, Ymem, src, [dst]       dst = src + X . Y
        Smem, #K, src, [dst] !       dst = src + S . K
        #K, src, [dst] !             dst = src + K . T
MAS     Smem, src                    src = src - S . T
        Xmem, Ymem, src, [dst]       dst = src - X . Y

4 - 19

Additional Multiplier Instructions

OpCode  Options              Execution

MPYA    Smem                 B = S . AH
        dst                  dst = T . AH
MACA    Smem                 B = B + S . AH
        T, src, [dst]        dst = src + T . AH
MASA    Smem                 B = B - S . AH
        T, src, [dst]        dst = src - T . AH
SQUR    Smem, dst            dst = S^2
        A, dst               dst = AH^2
SQURA   Smem, src            src = src + S^2
SQURS   Smem, src            src = src - S^2

4 - 20

DSP54x - Basic Programming Techniques 4 - 11


Module 4

Examples

z=x+y-w y = mx + b y = x1 . a1 + x2 . a2
LD x,A LD m,T LD x1,T
ADD y,A MPY x,A MPY a1,B
SUB w,A ADD b,A LD x2,T
STL A,z STL A,y MAC a2,B
STL B,y
STH B,y+1

4 - 21

Lab 4b: Basic Programming


$y = \sum_{n=1}^{4} (a_n \cdot x_n)$

Program Data
Memory Memory

T AR1 1 2 3 4 RAM
Lab 3 X AR2 8 6 4 2
y
Lab 4
A LAB 3
Done
1 2 3 4 ROM
Vector 8 6 4 2

4 - 22

4 - 12 DSP54x - Basic Programming Techniques


Module 4

Lab 4b: Procedure


1. Copy LAB3.ASM to LAB4B.ASM. Open LAB4B.ASM.
2. Modify the initialization process to use a BANZ loop.
3. Call a routine that does the following:
a. Initialize pointers to the x and a arrays.
b. Multiply the first two array elements into the accumulator.
c. Multiply and accumulate the remaining pairs using in-line
code -- don’t use BANZ.
d. Store the result to memory location y.
e. Return to the main routine.
4. Set up an appropriate linker command file.
5. Assemble, link, simulate and debug your code.

Optional: Obtain the maximum value of an individual product.


4 - 23
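When debugging on the simulator it helps to know what value should end up in y. The Python lines below compute the expected results from the table values used in these labs (x = 1, 2, 3, 4 and a = 8, 6, 4, 2); they are a reference calculation only and say nothing about how the assembly routine should be structured.

```python
x = [1, 2, 3, 4]       # first four table values, copied into array x
a = [8, 6, 4, 2]       # next four table values, copied into array a

products = [an * xn for an, xn in zip(a, x)]

y = sum(products)              # sum of products stored to y
largest = max(products)        # optional part: maximum individual product

print(products, y, largest)    # [8, 12, 12, 8] 40 12
```

In hexadecimal the expected sum is 28h and the largest product is 0Ch, which is how the simulator's memory window is likely to show them.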

DSP54x - Basic Programming Techniques 4 - 13


Module 4

VECTOR4.ASM : Solution

;Solution for VECTORS.ASM for LAB4A

        .ref    start

LEN     .set    100
STACK   .usect  "STK",LEN

        .sect   ".vectors"
        B       BEGIN

        .text
BEGIN   STM     #STACK+LEN,SP
        call    start

4 - 25

LAB4A.CMD : Solution

lab3.obj
vector4.obj
-o lab4a.out
-m lab4a.map
MEMORY {
  PAGE 0: EPROM: org = 0E000h  len = 01F80h
          VECS : org = 0FF80h  len = 00080h
  PAGE 1: SPRAM: org = 00060h  len = 00020h
          DARAM: org = 00080h  len = 01380h }
SECTIONS{
  .vectors: > VECS   PAGE 0
  .text   : > EPROM  PAGE 0
  .data   : > DARAM  PAGE 1
  .bss    : > SPRAM  PAGE 1
  STK     : > DARAM  PAGE 1 }

4 - 26


LAB4B.ASM : Solution

        .def    start,table,x
        .bss    x,4
        .bss    a,4
        .bss    y,1
        .text
        NOP
start:  STM     #table,AR1
        STM     #x,AR2
        STM     #8,AR7
loop:   LD      *AR1+,A
        STL     A,*AR2+
        BANZ    loop,*AR7-
        CALL    sop
        CALL    maxi
done:   B       done

sop:    STM     #x,AR1
        STM     #a,AR2
        LD      *AR1+,T         ;1
        MPY     *AR2+,A
        LD      *AR1+,T         ;2
        MAC     *AR2+,A
        LD      *AR1+,T         ;3
        MAC     *AR2+,A
        LD      *AR1,T          ;4
        MAC     *AR2,A
        STL     A,*(y)
        RET

        .data
table:  .word   1,2,3,4
        .word   8,6,4,2,0
8,6,4,2,0

4 - 27

LAB4B.ASM Optional : Solution


        .bss    max,1
maxi:   STM     #x,AR1
        STM     #a,AR2
        LD      *AR1+,T         ;1
        MPY     *AR2+,B
        LD      *AR1+,T         ;2
        MPY     *AR2+,A
        MAX     B
        LD      *AR1+,T         ;3
        MPY     *AR2+,A
        MAX     B
        LD      *AR1+,T         ;4
        MPY     *AR2+,A
        MAX     B
        STL     B,max
        RET

4 - 28


LAB4B.CMD : Solution

lab4b.obj
vector4.obj
-o lab4b.out
-m lab4b.map
MEMORY {
    PAGE 0: EPROM: org = 0E000h  len = 01F80h
            VECS:  org = 0FF80h  len = 00080h
    PAGE 1: SPRAM: org = 00060h  len = 00020h
            DARAM: org = 00080h  len = 01380h
}
SECTIONS {
    .vectors : > VECS   PAGE 0
    .text    : > EPROM  PAGE 0
    .data    : > DARAM  PAGE 1
    .bss     : > SPRAM  PAGE 1
    STK      : > DARAM  PAGE 1
}

4 - 29



Advanced Programming

Learning Objectives

u Repeat Functions
u Data Move Functions
u Dual Operands (Xmem, Ymem)
u Long , Double, & Parallel Ops

5-2

Repeat Next: RPT

u Features
À Next instruction iterated N+1 times
À Saves code space (1 or 2 words)
À Low overhead (1 or 2 cycles)
À Easy to use
À Non-interruptible
u Options
À RPT #k8     up to 256 iterations
À RPT #K      up to 64K iterations
À RPT Smem    ref. data mem for count value

Example : int x[5]={0,0,0,0,0};
        .bss    x,5
        STM     #x,AR1
        LD      #0,A
        RPT     #4
        STL     A,*AR1+

5-3

Enhanced Performance with RPT


These instructions execute faster when in a RPT loop:

        MVDM    MVKD    MACD
        MVMD    MVDK    MACP
        MVDP    READA   FIRS
        MVPD    WRITA

Pointer setup and usage become more efficient while the RPT loop is active.

5-4


Non-Repeatable Instructions
Generally, these are operations that are not useful to repeat, e.g., branches and status register operations:

        B[D]      BC[D]   BANZ[D]   INTR    RETE[D]
        CALL[D]   CC[D]   RPT       TRAP    RETF[D]
        RET[D]    RC[D]   RPTZ      RESET   BACC[D]
        Far Ops   XC      RPTB[D]   IDLE    CALA[D]

        ANDM
        ORM       LD DP     RSBX    MVMM
        XORM      LD ASM    SSBX    CMPR
        ADDM      LD ARP    RND     DST

Repeating these instructions can yield errors, but it won’t damage the device.


5-5

Repeat and Zero: RPTZ


u Repeats following instruction N+1 times
u Additionally, zeros specified accumulators
u Uses long constant only
u Requires two words and two cycles
Example :
int x[5]={0,0,0,0,0};

.bss x,5
STM #x,AR2
RPTZ B,#4
STL B,*AR2+

5-6


Block Repeat: RPTB


u Allows ’zero overhead’ looping on any
size code segment
u Is a 2-word, 4-cycle instruction
u Is interruptible
u RSA is next line of code
u REA is operand for RPTB
u BRC must be pre-loaded with ’count-1’
u May operate on any length block

5-7

RPTB Example
Add 1 to each element in the array x[5]

.bss x,5
begin: LD #1,16,B
STM #4,BRC
STM #x,AR4
RPTB next-1
ADD *AR4,16,B,A
STH A,*AR4+ } Loop 5x
next: LD #0,B

‘next-1’ assures complete fetch of possible multiword final instruction

5-8
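To make the "BRC = count-1" rule concrete, the RPTB example above can be mirrored by a small host-side model. This is a sketch in Python, not TI code: the function name `add_one_to_each` is invented, the array is indexed from 0 instead of using a real data address, and 16-bit words are treated as unsigned.

```python
def add_one_to_each(x, brc):
    """Model of the RPTB example: the block runs BRC + 1 times."""
    b = 1 << 16                        # LD #1,16,B  -> B holds 1 in its high word
    ar4 = 0                            # STM #x,AR4  -> index of the current element
    for _ in range(brc + 1):           # RPTB repeats the block BRC + 1 times
        a = (x[ar4] << 16) + b         # ADD *AR4,16,B,A
        x[ar4] = (a >> 16) & 0xFFFF    # STH A,*AR4+ stores the high word
        ar4 += 1                       # post-increment of AR4
    return x


print(add_one_to_each([10, 20, 30, 40, 50], brc=4))
```

Loading BRC with 4 touches all five elements; loading it with 5 would run one pass too many and index past the end of the array.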


Nested Loops
        STM     #L-1,AR7
1st:    (outer-loop code)
        STM     #M-1,BRC
        RPTB    2nd-1
        (middle-loop code)
        RPT     #N-1
        (inner instruction)
        (middle-loop code)
2nd:    (outer-loop code)
        BANZ    1st,*AR7-

Level   Operator   Cycles
1       RPT        1
2       RPTB       4+2
3       BANZ       2+4·N
…       …

u RPT uses (invisible) RC
u RPTB uses BRC, RSA, REA
u Nesting RPTB possible, but not efficient

5-9

Exercise 5-1: Repeat Operations


1. Which repeat functions are interruptible?
2. How many/few lines of code can be in a RPTB?
3. Which repeat function is fastest?
4. What does RPT 5 do?

Write code to:
   a. Add array x[10] to y[10]
   b. Add 100 values in the array x

5 - 10


Data Move

u Faster than Load and Store


u Transfer avoids accumulator
u Allows access to program memory
u Optimal with RPT (speed and code size)

5 - 11

Optimal Initialization: MVPD


Example : x[5]={1,2,3,4,5};

Data Memory (RAM):
        .bss    x,5

Program Memory (ROM):
        .text
START:  STM     #x,AR5
        RPT     #4
        MVPD    TBL,*AR5+
        …
        .data
TBL:    .word   1,2,3,4,5
        .sect   ".vectors"
        B       START

No data ROM required!
5 - 12
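The repeated MVPD above can be pictured as a short copy loop in which the program-memory source address advances on every repeat while the data pointer post-increments. The Python sketch below is a host-side illustration only; the function name `rpt_mvpd`, the dictionary-based memories, and the addresses 2000h and 0060h are assumptions chosen for the example, not values from the workshop.

```python
def rpt_mvpd(program, pmad, data, ar, rpt_count):
    """Copy rpt_count + 1 words from program memory to data memory."""
    for i in range(rpt_count + 1):       # RPT #k runs the next instruction k + 1 times
        data[ar] = program[pmad + i]     # source address advances on each repeat
        ar += 1                          # *AR5+ post-increments the destination pointer
    return ar


# TBL placed at a made-up program address; x placed at a made-up data address.
program = {0x2000 + i: v for i, v in enumerate([1, 2, 3, 4, 5])}
data = {}
end = rpt_mvpd(program, 0x2000, data, ar=0x0060, rpt_count=4)
print([data[0x0060 + i] for i in range(5)], hex(end))
```

After the call, the five table words sit in consecutive data locations and the returned pointer is one past the last element, as AR5 would be after the loop.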


Move Instructions
DATA ↔ DATA            # w/c      DATA ↔ MMR             # w/c
MVDK  Smem, dmad       2/2        MVDM  dmad, MMR        2/2
MVKD  dmad, Smem       2/2        MVMD  MMR, dmad        2/2
MVDD  Xmem, Ymem       1/1        MVMM  mmr, mmr         1/1

PGM ↔ DATA             # w/c      PGM(Acc) ↔ DATA        # w/c
MVPD  pmad, Smem       2/3        READA Smem             1/5
MVDP  Smem, pmad       2/4        WRITA Smem             1/5

LEGEND
Smem:        regular data memory address     dmad:  16-bit data memory address
Xmem, Ymem:  dual operand data mems          pmad:  16-bit pgm. memory address
MMR:         any memory map register         mmr:   AR0-AR7, or SP

5 - 13

Exercise 5-2: Move Operations

1. Which instructions would be best for a context save and restore?
2. Which mmrs does MVMM access?
3. Which move operations allow a run-time selectable pmad?
4. Write a routine to copy x[20] to y[20].

5 - 14


Dual Operand Multiplication

[Figure: DATA MEMORY feeds the MAC UNIT over two buses, the C BUS and the D BUS; the MAC unit works with accumulators A and B.]

5 - 15

Dual Operand MPY


y = mx + b

Standard Solution          Dual Op Solution
LD    m,T                  MPY   *AR2,*AR3,A
MPY   x,A                  ADD   b,A
ADD   b,A                  STL   A,y
STL   A,y

Dual Op Caveats
u May use only AR2-AR5
u Requires less code space
u Executes more quickly
5 - 16


Dual Operand MPY Example


y = \sum_{n=1}^{20} x_n a_n

Single operand (3 cycles per iteration):
        LD      #0,B
        STM     #a,AR2
        STM     #x,AR3
        STM     #19,BRC
        RPTB    done-1
        LD      *AR2+,T
        MPY     *AR3+,A
        ADD     A,B
done:   STH     B,y
        STL     B,y+1

Dual operand (2 cycles per iteration):
sop1:   LD      #0,B
        STM     #a,AR2
        STM     #x,AR3
        STM     #19,BRC
        RPTB    done-1
        MPY     *AR2+,*AR3+,A
        ADD     A,B
done:   STH     B,y
        STL     B,y+1

Total savings: 1 cycle * 20 iterations = 20 cycles


5 - 17

Dual Operand MAC Example


y = \sum_{n=1}^{20} x_n a_n

sop2:   STM     #x,AR2
        STM     #a,AR3
        RPTZ    A,#19
        MAC     *AR2+,*AR3+,A
        STH     A,y
        STL     A,y+1

Performance: N+2 cycles for N iterations


5 - 18


Dual Operand MAC and MPY

MPY   Xmem,Ymem,dst            dst = Xmem * Ymem
MAC   Xmem,Ymem,src,[dst]      dst = src + Xmem * Ymem
MAS   Xmem,Ymem,src,[dst]      dst = src - Xmem * Ymem
MACP  Smem,pmad,src,[dst]      dst = src + Smem * pmad

LEGEND
Smem:        regular data memory address     dmad:  16-bit data memory address
Xmem, Ymem:  dual operand data mems          pmad:  16-bit pgm. memory address
src:         source accumulator              dst:   destination accumulator

5 - 19

X,Y Addressing Rules


Dual operand addressing allows only certain pointers and modes :

Pointers    Modes
AR2         *ARn
AR3         *ARn+
AR4         *ARn-
AR5         *ARn+0%

Modifiers: BK + AR0
Since the only index offered is circular, regular index is possible only if BK is set to 0, or made very large, e.g., FFFFh.
5 - 20
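The remark about BK can be checked with a small model of the circular update. The Python sketch below is a simplified reading of how `*ARn+0%` behaves, written for illustration only: the function name `circular_update` is invented, the buffer base is assumed to be the pointer with its low N bits cleared (N being the smallest value with 2^N > BK), and BK = 0 is modeled as plain linear indexing.

```python
def circular_update(ar, step, bk):
    """Simplified model of *ARn+0%: advance ar by step inside a buffer of bk words."""
    if bk == 0:
        return (ar + step) & 0xFFFF      # assumption: BK = 0 gives linear indexing
    n = bk.bit_length()                  # smallest N with 2**N > bk
    mask = (1 << n) - 1
    base = ar & ~mask                    # start of the circular buffer
    index = (ar & mask) + step           # position inside the buffer after the step
    if index >= bk:
        index -= bk                      # wrap past the end
    elif index < 0:
        index += bk                      # wrap past the start
    return base | index


print(hex(circular_update(0x0104, 2, 5)))        # wraps inside a 5-word buffer
print(hex(circular_update(0x0104, 2, 0)))        # BK = 0: behaves like a plain index
print(hex(circular_update(0x0104, 2, 0xFFFF)))   # very large BK: no wrap in practice
```

With BK = 5 the pointer wraps back into the buffer, while BK = 0 or FFFFh leaves the result identical to a regular AR0-style index for this step.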


Exercise 5-3: Dual Op MAC


u What addressing options exist for dual
operand mode?
u Which multiplication instructions support
dual operands?
u Write the code to solve, for i = 1 to 10 :
y(i) = y(i-1) + x(i) . e

5 - 21


Long Word Operations

Example : Z32 = X32 + Y32

Standard Operations Long Word Operations


LD XHI,16,A DLD XHI,A
ADDS XLO,A DADD YHI,A
ADD YHI,16,A DST A,ZHI
ADDS YLO,A
STH A,ZHI
STL A,ZLO

Words = 6 Words = 3
Cycles = 6 Cycles = 4

5 - 23
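Both listings above compute the same 32-bit sum; the long-word version simply does it in fewer instructions. The Python sketch below models the two sequences on the host to show they agree. It is an illustration under assumptions: the function names are invented, 16-bit words are treated as unsigned values, and the accumulator guard bits are ignored.

```python
def add32_standard(xhi, xlo, yhi, ylo):
    """Six-instruction version built from 16-bit loads and adds."""
    a = xhi << 16            # LD   XHI,16,A
    a += xlo                 # ADDS XLO,A  (low word added without sign extension)
    a += yhi << 16           # ADD  YHI,16,A
    a += ylo                 # ADDS YLO,A
    return (a >> 16) & 0xFFFF, a & 0xFFFF     # STH A,ZHI / STL A,ZLO


def add32_long(x32, y32):
    """Three-instruction version using long-word operands."""
    a = x32 + y32            # DLD XHI,A followed by DADD YHI,A
    return (a >> 16) & 0xFFFF, a & 0xFFFF     # DST A,ZHI


print(add32_standard(0x0001, 0xFFFF, 0x0000, 0x0001))
print(add32_long(0x0001FFFF, 0x00000001))
```

The carry out of the low word propagates into the high word in both models, which is why the two results match.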

Long Operand Instructions

DLD    Lmem, dst             dst = Lmem
DST    src, Lmem             Lmem = src
DADD   Lmem, src, [dst]      dst = src + Lmem
DSUB   Lmem, src, [dst]      dst = src - Lmem
DRSUB  Lmem, src, [dst]      dst = Lmem - src

u Double store requires two cycles for dual E-bus activity
u Double Load/Add/Sub are single cycle in DARAM
u Double operations to single access memories take two cycles
u Default auto-increment step size is TWO
5 - 24


Long Operand Issues


u Long operand instructions read MSW from specified address and LSW at same address with LSB toggled
        Ex1: DLD 100,A        A = @100 @101
        Ex2: DLD 201,B        B = @201 @200

u Recommended: Align words in memory so that MSW is at even address
        Ex1: .long 12345678h        even: 1234
                                    odd:  5678
        Ex2: .bss XHI,2,1,1         even: XHI
                                    odd:  XLO
             (.bss operands: Name, Size, Page Contig, EVEN ALIGN)
5 - 25
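The "LSB toggled" rule is easy to check with a few lines of host code. The Python sketch below is an illustrative model, not TI code: the helper names `long_operand_addresses` and `dld` are invented, and data memory is represented by a plain dictionary.

```python
def long_operand_addresses(addr):
    """Return (MSW address, LSW address) for a long operand at addr."""
    return addr, addr ^ 1            # LSW lives at the same address with the LSB toggled


def dld(memory, addr):
    """Model of DLD: build the 32-bit value from the two 16-bit words."""
    msw_addr, lsw_addr = long_operand_addresses(addr)
    return (memory[msw_addr] << 16) | memory[lsw_addr]


memory = {100: 0x1234, 101: 0x5678}  # .long 12345678h aligned at an even address
print(long_operand_addresses(100))   # even address: LSW follows the MSW
print(long_operand_addresses(201))   # odd address: LSW precedes the MSW
print(hex(dld(memory, 100)))         # aligned access reads the intended value
print(hex(dld(memory, 101)))         # misaligned access swaps the halves
```

The last line shows why the even-address alignment is recommended: starting at the odd word yields the two halves in the wrong order.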

Double Word Operations


Example : Z = X + Y and F = D + E

u Split accumulators into separate LO and HI halves: SSBX C16

1. Interleave Data                      Memory order
        .data                           X
        .align  2                       D
        .word   X,D,Y                   Y
        .word   E,Z,F                   E
        --- or ---                      Z
        .bss    X,6,1,1                 F

2. Write Code
        SSBX    C16
        DLD     X,B
        DADD    Y,B
        DST     B,Z

5 - 26
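The effect of C16 on a double add can be modeled as two independent 16-bit additions with no carry between the halves. The Python sketch below is a host-side illustration only; the function name `dadd` is borrowed from the mnemonic, the guard bits are ignored, and results are kept to 32 bits.

```python
def dadd(src, lmem, c16):
    """Model of DADD on 32-bit values, with and without dual 16-bit mode."""
    if c16:
        hi = ((src >> 16) + (lmem >> 16)) & 0xFFFF         # high halves: Z = X + Y
        lo = ((src & 0xFFFF) + (lmem & 0xFFFF)) & 0xFFFF   # low halves:  F = D + E
        return (hi << 16) | lo                             # no carry from low to high
    return (src + lmem) & 0xFFFFFFFF                       # ordinary 32-bit add


print(hex(dadd(0x0001FFFF, 0x00010001, c16=True)))    # halves added separately
print(hex(dadd(0x0001FFFF, 0x00010001, c16=False)))   # carry ripples into the high half
```

The two printed values differ only because of the carry out of the low half, which is exactly what C16 suppresses.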


Parallel Operations
Example : Z = X + Y and F = D + E

u Parallel load/store instructions use D Bus and E Bus in same cycle.

        .bss    X,3             ; AR5 -> X, Y, Z
        .bss    D,3             ; AR6 -> D, E, F
        STM     #X,AR5
        STM     #D,AR6
        LD      #0,ASM

        LD      *AR5+,16,A
        ADD     *AR5+,16,A
        ST      A,*AR5
     || LD      *AR6+,B
        ADD     *AR6+,16,B
        STH     B,*AR6

u Parallel ops focus on high accumulator.
u Stores in parallel ops are offset by ASM value.
À ASM is a 5-bit signed field in ST1 (bits 4-0)
À ASM is best loaded with: LD #k5,ASM
u What is the error in the above example?
5 - 27

Parallel Instructions
Instruction       Example                    Operation

LD || MAC[R]      LD Xmem,dst                dst = Xmem << 16
LD || MAS[R]      || MAC[R] Ymem,[dst2]      dst2 = dst2 + T * Ymem

ST || MPY         ST src,Ymem                Ymem = src >> (16-ASM)
ST || MAC[R]      || MAC[R] Xmem,dst         dst = dst + T * Xmem
ST || MAS[R]

ST || ADD         ST src,Ymem                Ymem = src >> (16-ASM)
ST || SUB         || ADD Xmem,dst            dst = dst + Xmem
ST || LD

ST || LD          ST src,Ymem                Ymem = src >> (16-ASM)
                  || LD Xmem,T               T = Xmem

5 - 28
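The store half of every ST || xxx pair uses the same shift, Ymem = src >> (16-ASM). The Python sketch below models that one expression so the effect of different ASM values can be tried on the host. It is an illustration under assumptions: the function name `parallel_store` is invented, the accumulator is a plain Python integer, and ASM is checked against the 5-bit signed range -16..15.

```python
def parallel_store(src, asm):
    """Model of the store in ST || xxx: Ymem = src >> (16 - ASM), kept to 16 bits."""
    if not -16 <= asm <= 15:
        raise ValueError("ASM is a 5-bit signed field: -16..15")
    return (src >> (16 - asm)) & 0xFFFF


acc = 0x12345678
print(hex(parallel_store(acc, 0)))    # ASM = 0 stores the high accumulator word
print(hex(parallel_store(acc, 4)))    # positive ASM shifts the window toward the low bits
print(hex(parallel_store(acc, -4)))   # negative ASM shifts the window toward the guard bits
```

With ASM = 0 the store behaves like STH, which is why the slide example loads ASM with 0 before using the parallel instruction.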


Double, Long, and Parallel Review

u How many program words are double, long, and parallel ops?
u How many cycles do they take to execute?
u If a ST || LD refers to the same Acc and DMEM, what happens?
u What is the ASM field? What does it affect? How is it loaded?
u How should 32-bit data be aligned in memory? How is that accomplished?

5 - 29

Bus Usage
Instruction Activity             PB      CB         DB          EB
Program Read                     A,D
Program Write                    A                              D
Data Single Read                                    A,D
Data Dual Read                           A,D        A,D
Data Long (32-bit) Read                  A,D(ms)    A1,D(ls)
Data Single Write                                               A,D
Data Read / Data Write                              A,D         A,D
Dual Read / Coefficient Read     A,D     A,D        A,D
Peripheral Write                                                A,D
Peripheral Read*                                    A,D

* MMRs only accessible via D Bus; MMR access as Ymem op yields bad data!

5 - 30


Module Review

u Fast loops : Repeat


u Fast data transfer : Move Ops
u Faster math : Dual Operands
u Fast 32-bit math : Long Ops
u Double math : Double Ops
u Two actions in one cycle : Parallel Ops

5 - 31

Lab 5: Advanced Programming


Program Memory (ROM)                Data Memory (RAM)

        .text                               .bss
start:  …                           x  ___ ___ ___ ___
        MVPD    tbl,*…              a  ___ ___ ___ ___
        …                           y  ___
        MAC     *,*
        …
        .data
tbl:    .word   1,2,3,4
        .word   8,6,4,2

        .sect   ".vectors"
        B       start

5 - 32


Lab 5: Procedure
1. Copy LAB4B.ASM to LAB5.ASM. Modify LAB5 to:
   a. Perform initialization with a repeated MVPD.
   b. Perform the sum-of-products with a repeated dual operand MAC.
2. Copy LAB4.CMD to LAB5.CMD. Modify LAB5.CMD to:
   a. Load .data to program memory.
   b. Input LAB5.OBJ and create LAB5.OUT and LAB5.MAP.
3. Assemble, link, and simulate the program. Debug and verify performance.
4. Optional: If time permits, modify LAB5.ASM to use MACP. What effects would using MACP have on the system implementation?

5 - 33

Lab 5: Optional
Program Memory (ROM)                Data Memory (RAM)

        .text                               .bss
start:  …                           x  ___ ___ ___ ___
        MVPD    tbl,*+              y  ___
        …
        MACP    coeff,…

        .data
tbl:    .word   1,2,3,4
a       .word   8,6,4,2

        .sect   ".vectors"
        B       start

5 - 34


Exercise 5-1: Solution


Add 100 values in the array x:
        .bss    x,100
        STM     #x,AR6
        RPTZ    A,#99
        ADD     *AR6+,A

Add array x[10] to y[10]:
        .bss    x,10
        .bss    y,10
        STM     #x,AR2
        STM     #y,AR3
        STM     #9,BRC
        RPTB    next-1
        LD      *AR2+,A
        ADD     *AR3,A
        STL     A,*AR3+
next:   LD      #0,A

5 - 36

Exercise 5-2 & 5-3 : Solutions


Copy x[20] to y[20]:
        .bss    x,20
        .bss    y,20
        STM     #x,AR2
        STM     #y,AR3
        RPT     #19
        MVDD    *AR2+,*AR3+

y(i) = y(i-1) + x(i) . e, where i = 1 to 10:
        .bss    x,10
        .bss    y,10
        .bss    e,1
        STM     #x,AR2
        STM     #y,AR3
        STM     #e,AR4
        LD      #0,A
        STM     #10-1,BRC
        RPTB    loop-1
        MAC     *AR2+,*AR4,A
        STL     A,*AR3+
loop:

5 - 37


Lab 5: Solution
        .bss    x,4
        .bss    a,4
        .bss    y,1
        .data
tbl:    .word   1,2,3,4
        .word   8,6,4,2,0
        .text
start:  STM     #x,AR1          ; x, a, y are contiguous in .bss
        RPT     #8              ; copy all 9 table words
        MVPD    tbl,*AR1+
        STM     #x,AR2
        STM     #a,AR3
        RPTZ    A,#3
        MAC     *AR2+,*AR3+,A
        STL     A,*(y)

5 - 38

Lab 5 Optional: Solution

        .bss    x,4
        .bss    y,1
        .data
tbl:    .word   1,2,3,4,0
a:      .word   8,6,4,2
        .text
        STM     #x,AR1
        RPT     #4              ; copy x values plus the initial y
        MVPD    tbl,*AR1+
        STM     #x,AR1
        RPTZ    A,#3
        MACP    *AR1+,a,A       ; coefficients read from program memory
        STL     A,*(y)

Advantages : Faster initialization, Less RAM required


5 - 39



Pipeline Issues

Learning Objectives

u Describe the ’C54x pipeline events.

u Implement delayed branching.

u Identify and resolve pipeline conflicts.

6-2


Pipeline Operation
P  F  D  A  R  X

P  PREFETCH   PAB loaded with PC contents.
F  FETCH      PB loaded by wrapper manager.
D  DECODE     IR loaded with either PB content or IQ content. IR content is decoded.
A  ACCESS     DAB loaded with data1 read address if required.
              CAB loaded with data2 read address if required.
              Auxiliary register update.
R  READ       DB loaded by wrapper manager with data1 if required.
              CB loaded by wrapper manager with data2 if required.
              EAB loaded with data3 write address if required.
X  EXECUTE    Execution of the instruction and EB loaded with write data.
6-3

Pipe Flow

TIME ->

P1 F1 D1 A1 R1 X1
   P2 F2 D2 A2 R2 X2
      P3 F3 D3 A3 R3 X3
         P4 F4 D4 A4 R4 X4
            P5 F5 D5 A5 R5 X5
               P6 F6 D6 A6 R6 X6

6-4
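The staggered pipe-flow picture can be generated for any number of instructions, which is handy when sketching latency cases by hand. The Python snippet below is an illustrative helper, not a TI tool; the names `pipe_flow` and `total_cycles` are invented, and it assumes one instruction enters the pipeline per cycle with no stalls.

```python
STAGES = "PFDARX"   # prefetch, fetch, decode, access, read, execute


def pipe_flow(n_instructions):
    """Return one text row per instruction, each shifted one stage to the right."""
    rows = []
    for i in range(1, n_instructions + 1):
        cells = ["%s%d" % (stage, i) for stage in STAGES]
        rows.append("   " * (i - 1) + " ".join(cells))
    return rows


def total_cycles(n_instructions):
    """Cycles until the last instruction finishes its X phase (no stalls assumed)."""
    return n_instructions + len(STAGES) - 1


for row in pipe_flow(6):
    print(row)
print(total_cycles(6))
```

Each new instruction starts one cycle after the previous one, so six single-cycle instructions drain from the six-stage pipe in 11 cycles under these assumptions.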


Standard vs. Delayed Branch: B & BD

B       P1    F     D!    --    --    --                                    2 WORDS
addr.         P2    F     ADDR  --    --    --                              4 CYCLES
                    P3    F3    FLUSH --    --    --
                          P4    FLUSH --    --    --    --
                                PA    F     D     A     R     XA

BD      P1    F1    D1!   --    --    --                                    2 WORDS
new           P2    F2    NEW   --    --    --                              2 CYCLES
                    P3    F3    D3    A3    R3    X3                        2 FINAL
                          P4    F4    D4    A4    R4    X4                  CODE WORDS
                                PN    FN    DN    AN    RN    XN

6-5

Delayed Branch Examples


Move branch up two words of code.

LD x,A LD x,A
ADD y,A ADD y,A
MPY z,B BD next
STL A,r MPY z,B
B next STL A,r

6 words 6 words
8 cycles 6 cycles

6-6


Delayed Operations

BD CALLD BCD
BACCD CALAD CCD
RETD RCD
BANZD RETED
RPTBD RETFD

Delayed branches are effectively two words faster than their non-delayed version.

6-7

Delay Slot Caveats

u Delay slot is two words deep; cycles or lines of code are not relevant
u Delay operation may not be a branch of any kind (B, CALL, RET, RPT, etc.)
u Conditions in delay slot will be too late
u Do not load BRC in slot of RPTBD
u No PUSH/POP in CALLD or RETD delay slots

6-8


Conditional Execution: XC
u Allows fast choice of running one or two words of code or substitution of NOPs.
u Condition evaluated early, so must be set two instructions prior.
u Avoid change of condition in last two lines prior to XC, as they can be recognized in event of interrupt prior to XC.

        XC      n,cnd,cnd,cnd

Using BC:                           Using XC:
        -pre-                               CMPR    GRTR,AR1
        -pre-                               -other-
        CMPR    GRTR,AR1                    -other-
        BC      next,TC                     XC      1,NTC
        LD      *AR3+,A                     LD      *AR3+,A
next:   ABS     A                           ABS     A

3 words, 5/4 cycles                 2 words, 2 cycles


6-9

Exercise 6-1: Delayed Operations


u How does BD differ from B? How do they differ in code?
u What should not appear in delay slots?
u Why shouldn’t PUSH or POP appear in a CCD slot?
u What should be done if a condition is set in the delay slot of BC?
u Write code using RPTBD to perform:

        y = \sum_{n=1}^{10} x_n a_n

u When would this approach be better than using RPTZ?
6 - 10


Exercise 6-1: Solution

        STM     #7,BRC
        RPTBD   next-1
        MPY     *AR2+,*AR3+,A
        MAC     *AR2+,*AR3+,A
        MAC     *AR2+,*AR3+,A
next:   STL     A,y
        STH     A,y+1

6 - 11

Lab 6

1. Modify VECTORS.ASM to employ BD.
2. What code would be most useful in the delay slot?

Optional: If time permits, modify your sum-of-products to be interruptible.

6 - 12


Pipeline Cases
Average ’C54x System Code
  1  30% C Code: No Problem
     70% Assembly Code
       2  65% CALU Operations: No Problem
          5% MMR Writes
            4  2% Early Write: No Problem
               2% Protected MMR Write
                 5  1.9% Usual Case: No Problem
                 6  0.1% Prior Reg MMR Write: Add 1 Cycle
            3  1% Regular MMR Write: Use Key

Analysis:
u 99% of ’C54x code requires no special attention.
u Latency requirements are resolved via a table.

6 - 13

Pipeline Case 1 - C Code

u Compiler does not produce code with latency issues


u User need not debug C code for pipeline-related issues
u C code is ideal for non-critical speed path code.
À Operating system
À Diagnostics
À Etc.
u Allows portability of software to other platforms as
required.
u Systems can easily mix C and ASM code.

6 - 14


Pipeline Case 2 - CALU Operations

u No pipeline errors exist between CALU operations.
u Special efforts have been made to avoid errors without slowing down the pipeline.
u Only one rare case exists where a CALU activity results in a slowdown, and it is handled automatically by the ’C54x without data errors (W,W,R||R).

6 - 15

CALU Operations - Analysis


u The ’C54x may need to perform a fetch, two reads,
and a write in any given cycle. Depending on the
system setup, this event could occur in one cycle or be
spread over several cycles. In no case are errors
generated. Consider the following environments:
À More than one external access: multiple cycles
À Each resource in separate memories: single cycle
À Note: ’C54x memories are broken into blocks.
À More than one resource in a single ’C54x memory
block - Dual Access RAM :

Early phase P and D


Late phase C and E

6 - 16


Pipeline Events

Single read instructions:     PA PD DA DD
Dual read instructions:       PA PD CA CD
                                    DA DD
Single write instructions:    PA PD EA ED
Dual write instruction:       PA PD EA ED
  (2 cycles)                        EA ED
Read/write instructions:      PA PD DA DD
                                       EA ED

6 - 17

DARAM Events

Single read instructions:     P D
Dual read instructions:       P C D
Single write instructions:    P E
Dual write instruction:       P E
  (2 cycles)                    E
Read/write instructions:      P D E

6 - 18


Case Study - Latencies Avoided

WRITE STL A,*AR3+ P E


READ LD *AR2+,A P D

What if both are to the same address?

WRITE STL A,*AR3+ P E


--- LD #0,A P E
DUAL ADD *AR4,*AR5,A P C D
READ

Early write held off to allow dual access to operate w/o delay.

6 - 19

Case Study - Automatic Latency

WRITE STL A,*AR3+ P E


WRITE STH A,*AR3 P E
DUAL ADD *AR4,*AR5,A P E
READ
CD

One cycle latency automatically inserted by decoder

6 - 20


Pipeline Issues for MMR Activity

5% MMR Writes
  4  2% Early Write: No Problem
     2% Protected MMR Write
       5  1.9% Usual Case: No Problem
       6  0.1% Prior Reg MMR Op: Add 1 Cycle
  3  1% Regular MMR Op: Use Key

6 - 21

MMRs That Affect Pipeline


Name  Description              Ph     Name  Description                   Ph

AR0   Auxiliary Register 0     R*     BRC   Block Repeat Counter          P
AR1   Auxiliary Register 1     R      RSA   Block Repeat Start Address    P
AR2   Auxiliary Register 2     R      REA   Block Repeat End Address      P
AR3   Auxiliary Register 3     R
AR4   Auxiliary Register 4     R      T     Temporary Register            X
AR5   Auxiliary Register 5     R      A     Acc A - written as MMR        X
AR6   Auxiliary Register 6     R      B     Acc B - written as MMR        X
AR7   Auxiliary Register 7     R
                                      ST0   Status Register 0             --
SP    Stack Pointer Register   R/A    ST1   Status Register 1             --
BK    Circular Size Register   A      PMST  Proc. Mode Status Register    --

* AR’s have been designed specially to operate ‘late’: R instead of A.


6 - 22


Pipeline Control Fields

ST0 DP

ST1 BRAF CPL OVM SXM C16 FRCT ASM

PMST IPTR MP/MC OVLY DROM

X OVM, SXM, C16, FRCT, ASM


R
A DP, CPL, DROM
D
F
P BRAF, MP/MC-, OVLY, IPTR

6 - 23

Pipeline Case 3 : Standard Ops on MMRs

u Standard write operations will create latency issues that must be considered by the programmer!
u Consider the following pipeline diagram to understand how much latency a control field requires:

Instr. 0 writes to a control field:    P0 F0 D0 A0 R0 X0    (1 word for effect)

Following      Stage using     Control fields ready
instruction    the field       for this stage
1              --
2              X2              A, B, T, SXM, ASM, OVM, FRCT, C16
3              R3              ARn, SP(0)
4              A4              SP(1), BK, DP, CPL, DROM
5              D5
6              F6
7              P7              OVLY, MP/MC-, IPTR, & BRC, RSA, REA, BRAF

example:
        SSBX    SXM
        NOP
        LD      x,B

6 - 24


Calculating Minimum Number of Protected Cycles

u Latency diagram shows worst case number of NOPs to insert between store to control field and effect to be valid.
u NOPs need not be used; any other non-involved code may intervene.
u Extra cycle from double word dependent instructions may be counted, reducing the number of other intervening cycles required; e.g. :

EXPLICIT PROTECTED CYCLE        IMPLICIT PROTECTED CYCLE
        SSBX    SXM                     SSBX    SXM
        NOP                             LD      *(x),B
        LD      x,B

6 - 25
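The counting rule on this slide amounts to simple bookkeeping: start from the worst-case latency and subtract every unrelated word that already sits between the write and the dependent read. The Python sketch below captures that arithmetic. It is a simplified illustration, not a substitute for the latency tables: the function name `nops_needed` is invented, and the latency value passed in is assumed to come from the tables in this module.

```python
def nops_needed(latency_words, intervening_words):
    """NOPs still required after counting unrelated words already in between."""
    return max(0, latency_words - intervening_words)


# SSBX SXM followed directly by LD x,B: one word of latency, nothing in between.
print(nops_needed(latency_words=1, intervening_words=0))

# SSBX SXM followed by LD *(x),B: the extra word of the two-word LD is counted.
print(nops_needed(latency_words=1, intervening_words=1))
```

The first case corresponds to the explicit NOP on the slide, the second to the implicit protected cycle provided by the two-word load.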

Pipeline Case 4: Early Write


u Many MMRs and bit fields are set up during initialization and don’t get changed during runtime, e.g.:

begin:  …
        SSBX    SXM
        …
        CALL    MAIN
main:   …
        LD      x,A

        Any pipeline read more than 6 words removed from the write is immune.

u These cases are very common and do not present any pipeline concerns to the user.

6 - 26


Pipeline Case 5 : Protected Instructions

u Given the pipeline latency issue, it would be helpful to have optimized instructions that operate early
À Allow faster code
À Easier to write
u Therefore, these instructions offer a one-cycle early execution on MMR writes:
        STM     #K,MMR          ST      #K,MMR
        POPD    Smem            POPM    MMR
        MVDK    Smem,dmad       MVMD    MMR,Smem
        FRAME   n
u Initialization of AR’s with no explicit latency :
        ST, STM, MVMM, MVDK, MVMD
        LD #k9,DP               LD #k5,ASM
u Modify allows early increment, so no latency issues arise :
        MAR     *ARn+
6 - 27

Pipeline Case 6: Protected Instruction Exception

Problem: Protected instructions attempting to write early (in the R phase) can be blocked if a prior standard instruction writes to an addressing register in the X phase:
        STLM    A,AR0           ; E
        MVMM    AR2,AR1         ; x E
        LD      *AR1,B          ; A

Solution: Add one protected cycle before or after STM:
        STLM    A,AR0
        MVMM    AR2,AR1
        nop
        LD      *AR1,B

Note: Problem can be extended through a chain of special instructions:
        STLM    A,AR0
        MVMM    AR7,AR6
        MVMM    AR2,AR1
        nop
        LD      *AR1,B
6 - 28


Execute Unit Latencies

Control Field        Latency 0                   Latency 1

T                    STM; MVDK                   All other stores incl:
                     LD x,T                      EXP
                     ST || LD

ASM                  LD #k5,ASM                  All other stores
                     LD Smem,ASM

SXM, C16,                                        All stores incl:
FRCT, OVM                                        SSXM, RSXM

A or B               All except:                 1. mod acc
                                                 2. read mmr

6 - 29

Access Unit Latencies


Control Field    Latency 0         Latency 1         Latency 2         Latency 3

AR,              STM; ST           MVKD; MVDM        All other
SP (CPL=0)       MVDK; MVMD        MVPD; MVDD        stores
                 MVMM              POPM SP
                                   POPD SP

BK,                                STM; ST           MVKD; MVDM        All other
SP (CPL=1)                         MVDK; MVMD        MVPD; MVDD        stores
                                   MVMM; FRAME       POPM SP
                                   PUSH; POP         POPD SP
                                   RETFD

DP (CPL=0)       LD #K,DP                            STM; ST           All other
                 LD Smem,DP                          MVDK; MVMD        stores

CPL                                                  STM; ST           All other
                                                     MVDK; MVMD        stores incl.
                                                                       SSBX, RSBX

6 - 30


Other Latencies
Ctl Field    Latency 2     Latency 3     Latency 4     Latency 5     Latency 6

DROM         STM; ST       All other
             MVDK          stores
             MVMD

OVLY                                                   STM; ST       All other
IPTR                                                   MVDK          stores
MP/MC-                                                 MVMD

BRC *                                    SRCCD         STM; ST       All stores
                                         pre-loop      MVDK          pre-loop
                                                       MVMD

BRAF **                                                              All stores
                                                                     pre-loop

*  Note: Writing to BRC before RPTB has zero latency for ST, STM, MVDK, MVMD, and one latency for all other stores
** Avoid modifying BRAF in line prior to RPTB[D]
6 - 31

Latency Caveats

u No latency for CALU operations


u Use protected MMR writes whenever possible
u Set status early
u Use latency diagram when writing to MMRs
u For debug: focus on unprotected MMR writes
u Reference Guide has chapter on pipeline use

6 - 32


Exercise 6-2a
1. Determine if latency condition exists.
2. Note why.
3. Add appropriate number of NOPs to correct.

LD      GAIN,T                  MPY     *AR1,A
STM     #input,AR1              POPM    AR0
MPY     *AR1+,A                 MVKD    #table,*AR0

STLM    B,AR2                   ADD     y,A
STM     #input,AR3              LD      #table,DP
MPY     *AR2+,*AR3+,A           ADD     table,A,B

6 - 33

Exercise 6-2b

MAC     x,B                     STL     B,coeff
STLM    B,ST0                   STLM    A,SP
ADD     table,A,B               POPM    AR0

STM     #pointer,AR4            LD      *AR2,A
STM     #stack,SP               SSBX    SXM
LD      VAR1,A                  LD      data,B

6 - 34


Exercise 6-2a - Solution

LD      GAIN,T                  MPY     *AR1,A
STM     #input,AR1              POPM    AR0
MPY     *AR1+,A                 nop
                                MVKD    #table,*AR0
0 latency: STM                  1 latency: pop ARn

STLM    B,AR2                   ADD     y,A
STM     #input,AR3              LD      #table,DP
nop                             ADD     table,A,B
MPY     *AR2+,*AR3+,A
1 latency: STM exc’n            0 latency: LD DP

6 - 36


Exercise 6-2b: Solution

MAC     x,B                     STL     B,coeff
STLM    B,ST0                   STLM    A,SP
nop                             nop
nop                             nop
nop                             POPM    AR0
ADD     table,A,B
3 latencies: DP                 2 latencies: SP(0)

STM     #pointer,AR4            LD      *AR2,A
STM     #stack,SP               SSBX    SXM
LD      VAR1,A                  nop
                                LD      data,B
0 latency: STM                  1 latency: SXM

6 - 37

VECTOR6.ASM : Solution

        .ref    start
LEN     .set    100
STACK   .usect  "STK",LEN

        .sect   ".vectors"
        BD      start
        STM     #STACK+LEN,SP

6 - 38



Numerical Issues

Learning Objectives
u Identify & resolve issues for:
À Multiplication
À Addition / Subtraction
À Division
u Select the appropriate numerical models
À Integer vs. Fraction
À Signed vs. Unsigned math
À Rounding vs. Truncation
À Overflow vs. Carry
À Fixed point vs. Floating point
u List mnemonics to perform
À Extended precision math
À Boolean Operations
7-2

Integer Multiplication
u Integer multiplication yields products larger than the inputs, as can
be seen in the example below, using single digit decimal values as
inputs:

        9       value
    x   9       times value
    -----
      8 1       yields double size result

u Does the user store the lower (1) or upper (8) result?
u Both must be kept, resulting in additional resources (two cycles,
words of code, and RAM locations) to complete the store.
u Worse, how can the double-sized result be used recursively as
an input in later calculations, given that the multiplier inputs
are single-width?
7-3

Fractional Multiplication
u Multiplication of fractions yields products that never exceed the
range of a fraction, as can be seen in the example below, using
single digit decimal fractions as inputs:

       . 9      value
    x  . 9      times value
    ------
     . 8 1      yields double size result

u Don’t we still have a double sized result to store?


u In this case, we can store just the upper result (.8)
u This allows storage of result with fewer resources
u Results may be used recursively
u Has accuracy been lost by dropping the lower accumulator value?
7-4


Accuracy vs. Precision


u Often the programmer wants to retain the fullest accuracy
of a calculation, thus dropping the 16 LSB’s of the result in
the previous example seems a bad choice.
u Note the inputs, though: how much accuracy do they offer?
u The product offers double precision, but its accuracy is based on the single-width inputs.
u Thus, storing a single precision result is not only an efficient solution, but represents the limit of the accuracy of the result.
u The accumulator is double-sized for two reasons:
À To allow for integer operations, which would possibly require the LSBs for the result.
À So that sum-of-product operations will generate accumulative noise at the 32nd vs. the 16th bit.

7-5


Two’s Complement Fractions


u How can fractions be represented in binary?
u Since fractions have a range of +1 to -1, we will have to create a system capable of representing this range.
u Since negative numbers are involved, a two’s complement system is required. Two’s complement numbers follow these rules:
À The bits are a binary weighted progression
À The MSB and only the MSB is of negative sign
À Complement equals invert plus one
À Small values written to large registers require sign extension
u Given items 1 and 2 above, we can create the following fractional model:

        -1    1/2    1/4    1/8    ...


7-7

Fractional Example
u The following example demonstrates how two’s complement fractions perform under multiplication.
u The 4/8 bit model shown here behaves identically to the 16/32 bit TMS320 device

           0100
         x 1101
         ------
           0100
          0000
         0100
        1100
        -------
        1110100

        Accumulator:  1111 0100
        Data Memory:  1110

u What values do the inputs represent?
u What is the result?
u What should be stored to memory?
u What are the Q-types of the input, accumulator, and output values?
7-8
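The 4-bit example can be replayed on a host to answer the questions above. The Python sketch below is an illustrative model: the helper names `to_signed` and `multiply_4bit` are invented, the inputs are read as Q3 values (weights -1, 1/2, 1/4, 1/8), and the value written to memory is assumed to be the upper four bits after the redundant sign bit has been shifted out.

```python
def to_signed(value, width):
    """Interpret the low `width` bits of value as a two's complement integer."""
    value &= (1 << width) - 1
    if value & (1 << (width - 1)):
        value -= 1 << width
    return value


def multiply_4bit(x_bits, y_bits):
    """Return (8-bit accumulator pattern, 4-bit value stored to memory)."""
    product = to_signed(x_bits, 4) * to_signed(y_bits, 4)   # Q3 * Q3 gives Q6
    accumulator = product & 0xFF                            # 8-bit accumulator image
    stored = ((product << 1) >> 4) & 0xF                    # shift out extra sign bit, keep top 4
    return accumulator, stored


acc, mem = multiply_4bit(0b0100, 0b1101)
print(format(acc, "08b"), format(mem, "04b"))
print(to_signed(0b0100, 4) / 8, to_signed(0b1101, 4) / 8, to_signed(mem, 4) / 8)
```

The inputs come out as 0.5 and -0.375, the accumulator holds 1111 0100, and the stored nibble 1110 reads as -0.25, a truncated version of the exact product -0.1875.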


Redundant Sign Bit


        S x x x              Q3
      * S y y y              Q3
      ---------
        S S z z z z z z      Q6

or, with FRCT=1:
        S z z z z z z 0      Q7

u Multiplication of two signed numbers yields product with two sign bits
u Extra sign bit causes problems if stored to memory as result:
À Wastes space
À Creates off-size Q
u Solution: Fractional mode bit!
u When FRCT (mode bit in ST1) is set, the multiplier output is left-shifted by one
u For 16-bit ‘C54x: Q15*Q15=Q15

        SSBX    FRCT
        ...
        MPY     *AR2,*AR3,A
        STH     A,*(z)

7-9
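The effect of FRCT on a 16-bit multiply can be reproduced with ordinary integer arithmetic. The Python sketch below is a host-side model for illustration; the function names `mpy`, `sth`, and `q15` are chosen to echo the mnemonics and are not a TI API, and guard bits and rounding are left out.

```python
def to_signed16(word):
    """Interpret a 16-bit word as a two's complement integer."""
    word &= 0xFFFF
    return word - 0x10000 if word & 0x8000 else word


def mpy(x, y, frct):
    """Model of MPY: 16 x 16 signed multiply, left-shifted by one when FRCT = 1."""
    product = to_signed16(x) * to_signed16(y)
    return product << 1 if frct else product


def sth(acc):
    """Model of STH: keep bits 31-16 of the accumulator."""
    return (acc >> 16) & 0xFFFF


def q15(word):
    """Read a 16-bit word as a Q15 fraction."""
    return to_signed16(word) / 32768


half = 0x4000                                   # 0.5 in Q15
print(q15(sth(mpy(half, half, frct=True))))     # stored result is a proper Q15 value
print(q15(sth(mpy(half, half, frct=False))))    # without FRCT the stored value is off by one bit
```

With FRCT set, 0.5 times 0.5 is stored as 0.25; without it the same bits read back as 0.125, which is the "off-size Q" problem named on the slide.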

Exercise 7a : Multiplier Issues

1. Does the C54x support integer operations?

2. What is the optimal numerical type? Why?

3. How is the extra sign bit in fractional multiply handled?

7 - 10


Accumulation

u With fractions, we were able to guarantee that no multiplicative overflow could occur, i.e., F*F<=F.
u For addition, this rule does not apply, i.e., F+F>F is possible.
u Therefore, we need additional measures to manage the
possibility of overflow for accumulation. Two general
methods apply:
À Guard Bits: the ‘C54x offers an 8-bit extension above the
high accumulator to allow valid representation of the
result of up to 256 summations.
À Non-gain Systems: offer additional criteria that allow a
simple solution for unlimited length summations.

7 - 11

Guard Bits
u Guard Bits: the ‘C54x offers an 8-bit extension above
the high accumulator to allow valid representation of
the result of up to 256 summations.
  39      32 31      16 15       0
 |    AG    |    AH    |    AL    |
 |    BG    |    BH    |    BL    |

u At the conclusion of the summation, what should be done?
À Store all accumulator components?
À Store only high accumulator?
À What should be done about guard values?
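The 256-summation claim can be checked with a short Python model of a 40-bit accumulator. This is a sketch only: `wrap40` and `accumulate` are hypothetical helper names, and the terms are assumed to be full-scale 32-bit values.

```python
def wrap40(value):
    """Wrap an integer into the signed 40-bit range, as the accumulator would."""
    return ((value + (1 << 39)) % (1 << 40)) - (1 << 39)

def accumulate(terms):
    """Sum the terms in a 40-bit accumulator (32 bits plus 8 guard bits)."""
    acc = 0
    for term in terms:
        acc = wrap40(acc + term)
    return acc

worst_positive = accumulate([0x7FFFFFFF] * 256)   # still valid
worst_negative = accumulate([-0x80000000] * 256)  # still valid
one_too_many = accumulate([0x7FFFFFFF] * 257)     # wraps and changes sign
```

The guard bits hold 256 worst-case terms exactly; the 257th positive full-scale term wraps the accumulator into the negative range.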
7 - 12

DSP54x - Numerical Issues 7-7


Module 7

Saturation (SAT)
u SAT instruction saturates value exceeding 32-bit range in
the selected accumulator:
SAT A -or- SAT B
u Provides single-cycle ‘clipping’ function:
[Figure: a waveform spanning -256 to +256 before saturating is clipped to the range -1 to +1 after saturating]
À Values not overflowed are unchanged
À Positive overflows are set to : 00 7FFF FFFF h
À Negative overflows are set to : FF 8000 0000 h
u Is automatic on store if SST=1 (LP devices)
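The clipping rule listed above can be modeled in a few lines of Python. This is a sketch of the behavior described on the slide, not of the hardware; `sat` is a hypothetical name and the accumulator is passed as a 40-bit image.

```python
MASK40 = (1 << 40) - 1

def sat(acc):
    """Clip a 40-bit accumulator image to the signed 32-bit range."""
    value = acc & MASK40
    if value >> 39:                 # guard sign bit set: negative value
        value -= 1 << 40
    if value > 0x7FFFFFFF:
        return 0x007FFFFFFF         # positive overflow
    if value < -0x80000000:
        return 0xFF80000000         # negative overflow
    return acc & MASK40             # in range: unchanged

clipped_high = sat(0x0100000000)
clipped_low = sat(0xFE00000000)
```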
7 - 13

Overflow Bits (OVA, OVB,OVM)


u Overflow (OV) is used to record if the range of AccHi is ever exceeded.
À OV is a latched event: once set it remains set
À OV is cleared only by:
À Test of OV: BC oflow,BOV
À Write to OV : RSBX OVA or STM #val,ST0
À System reset
À OV is largely obsolete, given the presence of the Acc Guard.
u Overflow Mode (OVM) causes accumulator to saturate at 32nd bit:
À Acc = 0x00 7FFF FFFF is the positive limit
À Acc = 0xFF 8000 0000 is the negative limit
u Setting OVM (SSBX OVM) causes guard bits to be unused.
u Setting OVM makes accumulator values non-linear even if subsequent
terms would have corrected for intermediate overflows.
u Overflow mode is generally undesirable, and should usually be turned
off (RSBX OVM)
7 - 14

7-8 DSP54x - Numerical Issues


Module 7

Non-gain Systems
u Many systems can be modeled to have no DC gain:
À Filters with low Q.
À Any system scaled by its maximum gain value.
u Input values from A/D converters are automatically
fractions, if the limits of the A/D are presumed to be +/- 1.
u Coefficient values can similarly be bounded by making the
largest value the scaling factor for all other values.
u For these systems, it is known that the final value of the
process is less than or equal to the input values.
u The accumulator therefore can be allowed to temporarily
overflow, since the final result is known to be bounded by
+/- 1.
u Allows maximum usage of selected A/D and D/A
converters
À D/A bits for gain are more expensive than using analog
components
7 - 15

Number Circle
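  7FF0h
+  100h = 80F0h    Overflowed intermediate result
+   10h = 8100h    Overflowed intermediate result
-  200h = 7F00h    Valid final result

[Figure: number circle. Starting at 0 (0000h) and moving through 0001h, +½ (4000h) and ~1 (7FFFh), the overflow (OV) boundary lies between 7FFFh and 8000h; the circle continues through –1 (8000h), –½ (C000h) and FFFFh back to 0000h.]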
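The number-circle example can be replayed with 16-bit modular arithmetic. This Python sketch uses the three steps from the slide; `wrap16` is a hypothetical helper name.

```python
def wrap16(value):
    """Keep only 16 bits, so sums wrap around the number circle."""
    return value & 0xFFFF

acc = 0x7FF0
trace = []
for step in (+0x100, +0x10, -0x200):
    acc = wrap16(acc + step)
    trace.append(acc)
# trace holds the two overflowed intermediates and the valid final result
```

The two intermediate sums cross the 7FFFh/8000h boundary, yet the last subtraction crosses back, so the final value is correct.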
7 - 16

DSP54x - Numerical Issues 7-9


Module 7

Fractional Representation

Fractions    Integers    Hex
   ~1           32K      7FFFh
    ½           16K      4000h
    0             0      0000h
   –½          –16K      C000h
   –1          –32K      8000h

Fraction * 32768 ⇒ Integer

To store 0.707 type:
.word 32768*707/1000
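The scaling in the table can be reproduced with integer arithmetic. The Python sketch below assumes the assembler expression is evaluated with truncating integer division and that +1 must be clamped to 7FFFh; `q15_word` is a hypothetical name.

```python
def q15_word(numerator, denominator):
    """Return the 16-bit word for numerator/denominator in Q15 format."""
    scaled = 32768 * numerator
    magnitude = abs(scaled) // abs(denominator)       # truncate toward zero
    same_sign = (scaled >= 0) == (denominator > 0)
    value = magnitude if same_sign else -magnitude
    value = max(-32768, min(32767, value))            # +1.0 is not representable
    return value & 0xFFFF

word = q15_word(707, 1000)   # 0.707 as a Q15 word
```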
7 - 17

Handling Amplifier Functions

u Gain is best handled external to the ‘C54x


u Allows DSP to perform frequency shaping functions
À higher precision than analog
À lower cost
À more stable
À readily supports adaptive systems
u Analog system can perform gain functions
À Op Amps and resistors are very low cost
À DC gain can easily be made very accurate
À Adaptive DC gain in analog
À Not as easy, but reasonable
À May be controlled by ‘C54x

7 - 18

7 - 10 DSP54x - Numerical Issues


Module 7

Exercise 7b : Accumulation Issues

1. How wide are the accumulators?

2. What are guard bits for?

3. What is the easiest way to avoid accumulative overflow?

4. When is saturation useful?

5. What benefit do OVA, OVB, and OVM serve?

7 - 19

LAB 7 : Fractional Math


Program Memory (ROM)                 Data Memory (RAM)

        .text                                .bss
start:  …                            x   ___ ___ ___ ___
        MVPD  tbl,*…                 a   ___ ___ ___ ___
        …                            y   ___
        MAC   *,*
        …

        .data
tbl:    .word 0.1 , 0.2
        .word 0.3 , 0.4
        .word 0.8 , 0.6
        .word 0.4 , 0.2

        .vectors
        B start

7 - 20

DSP54x - Numerical Issues 7 - 11


Module 7

LAB 7 : Procedure
1. Copy LAB5.ASM to LAB7.ASM. Modify LAB7 to:
a. Use the fractional data table shown above
b. Perform fractional multiplication
What status bits will be important for this routine to
perform correctly?
2. Copy LAB5.CMD to LAB7.CMD. Modify LAB7.CMD to
input LAB7.OBJ and create LAB7.OUT and LAB7.MAP.
3. Assemble, link, and simulate the program. Debug and verify
performance. What answer did you get?
4. To better view the result on the simulator, try:
WA *(y)/327,y = 0 .,d

5. Optional: Time permitting, repeat your experiment using
some negative array values. Was your result as expected?

7 - 21

Division
u The ‘C54x does not have a single cycle 16-bit divide instruction
À Divide is a rare function in DSP
À Division hardware is expensive
u The ‘C54x does have a single cycle 1-bit divide instruction: conditional
subtract or SUBC
À Preceded by RPT #15, a 16-bit divide is performed
À Is much faster than without SUBC
u The SUBC process operates only on unsigned operands, thus software
must:
À Compare the signs of the input operands
À If they are alike, plan a positive quotient
À If they differ, plan to negate (NEG) the quotient
À Strip the signs of the inputs
À Perform the unsigned division
À Attach the proper sign based on the comparison of the inputs
7 - 22

7 - 12 DSP54x - Numerical Issues


Module 7

Division Routine
        LD    @den,16,A
        MPYA  @num          B = num*den (tells sign)
        ABS   A             Strip sign of denominator
        STH   A,@den
        LD    @num,A
        ABS   A             Strip sign of numerator
        RPT   #15           16 iterations
        SUBC  @den,A        1-bit divide
        XC    1,BLT         If result needs to be negative
        NEG   A             Invert sign
        STL   A,@quot       Store result
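The conditional-subtract loop can be modeled in Python to show why 16 iterations yield a 16-bit quotient. This is a behavioral sketch, assuming the usual SUBC rule (subtract the divisor shifted left by 15; on a non-negative result keep it, shift left and add 1; otherwise just shift left). The function names are hypothetical.

```python
def subc_divide(numerator, denominator):
    """Unsigned 16-bit divide built from 16 conditional subtracts."""
    acc = numerator & 0xFFFF
    for _ in range(16):                       # RPT #15 gives 16 iterations
        trial = acc - (denominator << 15)
        if trial >= 0:
            acc = (trial << 1) + 1            # subtract succeeded: shift in a 1
        else:
            acc <<= 1                         # subtract failed: shift in a 0
    return acc & 0xFFFF, acc >> 16            # (quotient, remainder)

def signed_divide(numerator, denominator):
    """Compare signs, divide the magnitudes, then attach the sign."""
    negative = (numerator < 0) != (denominator < 0)
    quotient, _ = subc_divide(abs(numerator), abs(denominator))
    return -quotient if negative else quotient

quotient, remainder = subc_divide(100, 7)
```

After the loop the quotient sits in the low word and the remainder in the high word, which is why the routine stores the result with STL.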

7 - 23

Learning Objectives
u Identify & resolve issues for:
À Multiplication
À Addition / Subtraction
À Division
u Select the appropriate numerical models
À Integer vs. Fraction
À Signed vs. Unsigned math
À Rounding vs. Truncation
À Overflow vs. Carry
À Fixed point vs. Floating point
u List mnemonics to perform
À Extended precision math
À Boolean Operations
7 - 24

DSP54x - Numerical Issues 7 - 13


Module 7

Rounding
u Result of multiplication can be rounded for MPY, MAC
and MAS operations. This is specified by appending
the instruction with an "R" suffix.
À Example: MAC with rounding is MACR.
À Rounding consists of adding 2^15 to the result and then
clearing the low accumulator.
u In a long sum-of-products, only the last MAC
operation should specify rounding:
RPTZ A,#98
MAC *AR2+,*AR3+,A
MACR *AR2+,*AR3+,A
u Rounding can also be achieved with a load operation:
LDR Smem,dst
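The rounding rule above (add 2^15, then clear the low accumulator) can be sketched in Python. `round_acc` is a hypothetical name and the accumulator is treated as a 40-bit image.

```python
MASK40 = (1 << 40) - 1

def round_acc(acc):
    """Add 2^15 and clear the low 16 bits of a 40-bit accumulator image."""
    return ((acc + 0x8000) & ~0xFFFF) & MASK40

rounded_down = round_acc(0x0012347FFF)   # low word just under one half
rounded_up = round_acc(0x0012348000)     # low word exactly one half
```

Truncation would store 1234h for both inputs; rounding stores 1235h for the second.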

7 - 25

Sign Extension (SXM)


Example: LD #0F794h,8,A

SXM=1
          C   G    ACC
Before    X   00   EF13 648C
After     X   FF   FFF7 9400

SXM=0
          C   G    ACC
Before    X   00   EF13 648C
After     X   00   00F7 9400
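Both results in the example can be reproduced with a short Python model of a shifted load. `load_shifted` is a hypothetical name; the model assumes the operand is sign-extended only when SXM=1 and then shifted into a 40-bit accumulator.

```python
MASK40 = (1 << 40) - 1

def load_shifted(value16, shift, sxm):
    """Model LD #value,shift,A for a 16-bit operand."""
    value = value16 & 0xFFFF
    if sxm and value & 0x8000:
        value -= 0x10000              # sign-extend when SXM=1
    return (value << shift) & MASK40

with_sxm = load_shifted(0xF794, 8, sxm=True)
without_sxm = load_shifted(0xF794, 8, sxm=False)
```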

7 - 26

7 - 14 DSP54x - Numerical Issues


Module 7

Carry Bit (C)


u Carry is:
À Used with unsigned numbers to indicate an
over/under flow condition
À Set or cleared with each calculation - it is not
latched
À Optimal for extending 32-bit accumulators to
larger-word-size calculations
u Example - 64-bit addition:
C
XD XC XB XA

7 - 27

Special Load, Add, Subtract

Carry & Borrow


ADDC Smem,src    src = src + Smem + C
SUBB Smem,src    src = src - Smem - (NOT C)

Sign-suppressed Math
ADDS Smem,src    src = src + u(Smem)
SUBS Smem,src    src = src - u(Smem)

Load unsigned
LDU  Smem,dst    dst = u(Smem)

7 - 28

DSP54x - Numerical Issues 7 - 15


Module 7

64-bit Add & Subtract Code


Example: z64 = w64 + x64 - y64

   w3 w2 w1 w0
 + x3 x2 x1 x0
 - y3 y2 y1 y0
 = z3 z2 z1 z0

DLD   @w1,A       A = w1:w0
DADD  @x1,A       A += x1:x0
DLD   @w3,B       B = w3:w2
ADDC  @x2,B       B += x2+C
ADD   @x3,16,B    B += x3
DSUB  @y1,A       A -= y1:y0
DST   A,@z1       z1:z0 = w1:w0 + x1:x0 - y1:y0
SUBB  @y2,B       B -= y2+C’
SUB   @y3,16,B    B -= y3
DST   B,@z3       z3:z2 = w3:w2 + x3:x2 + C - y3:y2 - C’
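The carry and borrow propagation in this routine can be sketched in Python with 32-bit halves. The helper names are hypothetical; each 64-bit value is a (high, low) pair of unsigned 32-bit words.

```python
MASK32 = 0xFFFFFFFF

def add64(w, x):
    """Add two (high, low) pairs, passing the low-half carry upward."""
    low = w[1] + x[1]
    carry = low >> 32                         # the C bit after the low add
    high = (w[0] + x[0] + carry) & MASK32
    return high, low & MASK32

def sub64(w, y):
    """Subtract two (high, low) pairs, passing the low-half borrow upward."""
    low = w[1] - y[1]
    borrow = 1 if low < 0 else 0              # borrow is the inverse of C
    high = (w[0] - y[0] - borrow) & MASK32
    return high, low & MASK32

total = add64((0x00000001, 0xFFFFFFFF), (0x00000000, 0x00000001))
z = sub64(total, (0x00000000, 0x00000001))
```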
7 - 29

Long Multiplication

          X1  X0        S U
    x     Y1  Y0        S U
    ------------
          X0 * Y0       U*U
       Y1 * X0          S*U
       X1 * Y0          S*U
    Y1 * X1             S*S
    ------------
    W3 W2 W1 W0         S U U U

MACSU Xmem,Ymem,src    src = src + u(Xmem)*Ymem
MPYU  Smem,dst         dst = u(TREG)*u(Smem)

7 - 30

7 - 16 DSP54x - Numerical Issues


Module 7

Long Multiply Routine


STM   #X0,AR2
STM   #Y0,AR3
LD    *AR2,T           T = x0
MPYU  *AR3+,A          A = ux0*uy0
STL   A,@W0            w0 = ux0*uy0
LD    A,-16,A          A = A>>16
MACSU *AR2+,*AR3-,A    A += ux0*y1
MACSU *AR3+,*AR2,A     A += uy0*x1
STL   A,@W1            w1 = A
LD    A,-16,A          A = A>>16
MAC   *AR2,*AR3,A      A += x1*y1
STL   A,@W2            w2 = A-lo
STH   A,@W3            w3 = A-hi
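The partial-product scheme can be checked with a Python model that mirrors the routine step by step. This is a sketch with hypothetical helper names; each 32-bit operand is split into a signed high half and an unsigned low half, as in the S/U table.

```python
def split32(value):
    """Return (signed high 16 bits, unsigned low 16 bits) of a 32-bit value."""
    value &= 0xFFFFFFFF
    high = value >> 16
    if high & 0x8000:
        high -= 0x10000
    return high, value & 0xFFFF

def long_multiply(x, y):
    """Return [w3, w2, w1, w0], the 64-bit product as four 16-bit words."""
    x1, x0 = split32(x)
    y1, y0 = split32(y)
    acc = x0 * y0                  # U*U
    w0 = acc & 0xFFFF
    acc >>= 16
    acc += x0 * y1 + y0 * x1       # the two S*U cross products
    w1 = acc & 0xFFFF
    acc >>= 16                     # arithmetic shift keeps the sign
    acc += x1 * y1                 # S*S
    return [(acc >> 16) & 0xFFFF, acc & 0xFFFF, w1, w0]

def from_words(words):
    """Rebuild a signed 64-bit integer from [w3, w2, w1, w0]."""
    value = 0
    for word in words:
        value = (value << 16) | word
    return value - (1 << 64) if value >> 63 else value

product = from_words(long_multiply(-3, 100000))
```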
7 - 31

Exponent Encoder
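u One cycle exponent ( [ -8, +31 ] range) computation
u Result in T register as 2’s complement value

[Figure: accumulators A and B feed the exponent encoder, whose 6-bit result goes to the T register and the ALU; the exponent scale runs from -8 through 0 and 16 to 31]

exp A ; 1 cycle for exp
norm A ; 1 cycle normalize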
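The [-8, +31] range follows from counting redundant sign bits in a 40-bit value and subtracting the 8 guard bits. The Python sketch below models that behavior under the assumption that a zero accumulator yields an exponent of 0; `exp40` and `norm40` are hypothetical names.

```python
MASK40 = (1 << 40) - 1

def exp40(acc):
    """Count redundant sign bits of a 40-bit value and subtract 8."""
    acc &= MASK40
    if acc == 0:
        return 0
    sign = acc >> 39
    redundant = 0
    for bit in range(38, -1, -1):
        if (acc >> bit) & 1 != sign:
            break
        redundant += 1
    return redundant - 8

def norm40(acc, exponent):
    """Shift the signed 40-bit value by the exponent (left if positive)."""
    value = acc & MASK40
    if value >> 39:
        value -= 1 << 40
    shifted = value << exponent if exponent >= 0 else value >> -exponent
    return shifted & MASK40

t = exp40(0x0000000001)
normalized = norm40(0x0000000001, t)
```

A value that already uses the guard bits gives a negative exponent, so the same normalize step shifts it right into the 32-bit range.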

7 - 32

DSP54x - Numerical Issues 7 - 17


Module 7

Floating Point Usage


Full Floating Point Block Floating Point

e1 m1 e m1
e2 m2 m2
e3 m3 m3

LD e1,T LD e,T
LD m1,T,A LD m1,T,A
LD e2,T ADD m2,T,A
ADD m2,T,A ADD m3,T,A
LD e3,T …
ADD m3,T,A

2*N RAM & Cycles N+1 RAM & Cycles

7 - 33

Exercise 7c : Numerical Issues

1. How is division performed on the 54x?


2. How is rounding performed on the 54x?
3. How are fractions represented in the assembler?
4. What benefit does the carry bit offer?
What instructions employ/affect the carry bit?
5. When are unsigned operations useful?
6. Does the 54x offer any form of floating point
operation?

7 - 34

7 - 18 DSP54x - Numerical Issues


Module 7

7 - 35

Bitfield Test & Bit Extraction


CMPM Smem,#K     TC=1 if Smem=K
BITF Smem,#K     TC=0 if Smem&K=0

BIT  Xmem,bit    TC=Xmem(15-bit)
BITT Smem        TC=Smem(15-T(3-0))

[Figure: bit n of a 16-bit memory word (bits 15..0) is copied into TC]

BIT *AR2,5          LD   @bit,T
BC  true,TC         BITT @x
                    BC   false,NTC

7 - 36

DSP54x - Numerical Issues 7 - 19


Module 7

Boolean Operations

AND OR XOR                  1 cycle
Smem,src                    src = src (op) Smem
src,[SHIFT],[dst]           dst = dst (op) src << SHIFT

AND OR XOR                  2 cycles
#K,[shft],src,[dst]         dst = src (op) #K << shft
#K,16,src,[dst]             dst = src (op) #K << 16

ANDM ORM XORM ADDM          2 cycles
#K, Smem                    Smem = Smem (op) #K

7 - 37

Shift and Rotate Operations


SFTA src,SHIFT,[dst]    left:  C <- [39..32 | 31..0] <- 0
                        right: Sx -> [39..32 | 31..0] -> C

SFTL src,SHIFT,[dst]    left:  C <- [00 | 31..0] <- 0
                        right: 0 -> [00 | 31..0] -> C

ROLTC src               C <- [00 | 31..0] <- TC

ROL src                 C <- [00 | 31..0] <- C

ROR src                 C -> [00 | 31..0] -> C

7 - 38

7 - 20 DSP54x - Numerical Issues


Module 7

Shifter Hardware
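[Figure: barrel shifter block diagram]
Inputs:         accumulator A (40 bits), accumulator B (40 bits), D bus (16 bits), C bus (16 bits)
Sign control:   SXM
Shift count:    T(5-0), (-16, +31) range; ASM(4-0), (-16, +15) range; constant, (-16, +15) or (0, +15) range
Barrel shifter: (-16, +31) range; sets C and TC
Outputs:        to ALU, or through MSW/LSW write select (16 bits) to E bus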
7 - 39

Other Numerical Operations

ABS  src,[dst]    dst = |src|
NEG  src,[dst]    dst = -src
CMPL src,[dst]    dst = NOT src (one's complement)

7 - 40

DSP54x - Numerical Issues 7 - 21


Module 7

Exercise 7d : Boolean Operations

1. How are bits tested on the 54x? What’s unusual about it?

2. What boolean operations are present on the 54x?

3. Must boolean functions operate on the accumulator?

4. What is the difference between shift and rotate?

5. What is the difference between SFTA and SFTL?

6. What is the difference between NEG and CMPL?

7 - 41

7 - 22 DSP54x - Numerical Issues


Module 7

LAB7.ASM : Solution
        .def    start,table,y
        .bss    x,4
        .bss    a,4
        .bss    y,1
        .data
table:  .word   32768*1/10
        .word   32768*2/10
        .word   32768*3/10
        .word   32768*4/10
        .word   32768*8/10
        .word   32768*6/10
        .word   32768*4/10
        .word   32768*2/10
        .text
start:  STM     #x,AR2
        RPT     #8
        MVPD    table,*AR2+
        CALL    sop
done:   B       done
sop:    STM     #x,AR2
        STM     #a,AR3
        RSBX    OVM
        SSBX    SXM
        SSBX    FRCT
        RPTZ    A,#3
        MAC     *AR2+,*AR3+,A
        STH     A,*(y)
        RET
        NOP

7 - 43

DSP54x - Numerical Issues 7 - 23


Module 7

7 - 24 DSP54x - Numerical Issues


Fundamental DSP Applications

Learning Objectives

u Describe how FIR/IIR filters operate


u Implement delay lines in two ways
u Write code for FIR/IIR filters on the 54x
u Translate signal flow diagrams to 54x code
u Employ techniques to avoid IIR instability
u Select the best filter type for a given need

8-2

DSP54x - Fundamental DSP Applications 8-1


8-2 DSP54x - Fundamental DSP Applications
Module 8

Module 8
Finite Impulse Response (FIR) Filter
Circular Buffer or Linear Buffer

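[Figure: three-tap FIR signal flow. x in feeds delay elements X0, X1 and X2, separated by z^-1 delays; each tap is multiplied by its coefficient (a0, a1, a2) and the products are summed to give y out. Each tap maps to an instruction pair such as LD x2,T followed by MAC a2,A.]

y(n) = a0 × x(n) + a1 × x(n–1) + a2 × x(n–2)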
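The difference equation can be tried out with a few lines of Python. This is a floating-point sketch with a hypothetical function name, using a plain list as the delay line (the linear-buffer idea, where every sample shifts down by one place per input).

```python
def fir(coeffs, samples):
    """Apply y(n) = a0*x(n) + a1*x(n-1) + ... to a list of input samples."""
    delay = [0.0] * len(coeffs)              # x(n), x(n-1), x(n-2), ...
    output = []
    for x in samples:
        delay = [x] + delay[:-1]             # shift in the newest sample
        output.append(sum(a * d for a, d in zip(coeffs, delay)))
    return output

impulse_response = fir([0.5, 0.25, 0.125], [1.0, 0.0, 0.0, 0.0])
```

Feeding a unit impulse returns the coefficients in order, followed by zeros, which is the "finite" in finite impulse response.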


8-3

I/O Memory Read & Write

PORTR PA,Smem     PA -> Smem
PORTW Smem,PA     PA <- Smem

u Port operations access I/O devices


u Requires two words & two cycles
u I/O range can be up to 64K locations
u There are no I/O resources on-chip

8-4

DSP54x - Fundamental DSP Applications 8-3


Module 8

Linear Buffer (Delay Line)

1. Access from the oldest to newest sample.


2. Input the newest sample on the top of the buffer.

            Initial      After PORTR JUL    After PORTR AUG
            JUN          JUL                AUG
            MAY          JUN                JUL
            APR          MAY                JUN
            MAR          APR                MAY
            FEB          MAR                APR
*ARn- ->    JAN          FEB                MAR

DELAY *AR2-

Note: DELAY operates in DARAM only!


8-5

Six-Level Circular Buffer

[Figure: the same six-entry buffer drawn as a ring at three successive times; JAN is overwritten by JUL, then FEB by AUG, while the other entries stay in place]

          Time 1        Time 2        Time 3
start     JUN <- ARn    JUN           JUN
          MAY           MAY           MAY
          APR           APR           APR
          MAR           MAR           MAR
          FEB           FEB           AUG <- ARn
end       JAN           JUL <- ARn    JUL

8-6

8-4 DSP54x - Fundamental DSP Applications


Module 8

Circular Addressing Hardware

Top of Buffer        A ... A 0 ... 0    Element 0
(ARn) Index          A ... A x ... x    Circular Buffer Range
                                        Element N-1
End of Buffer + 1    A ... A   BK

BK = Length “n” of Delay Line

8-7

Circular Addressing Code


FIR.ASM
X0      .usect  "D_LINE",32
        .text
        STM     #32,BK        ;BK = size of circular buf
        . . .   *AR3+%        ;circular addressing

LINK.CMD
SECTIONS
{
    D_LINE: align(64) { } > RAM PAGE 1
    . . .
}

Circular buffers of length n must be aligned on 2^K > n boundaries

8-8

DSP54x - Fundamental DSP Applications 8-5


Module 8

Circular Addressing Caveats

u Allows all AR modifications:


u increment or decrement
u indexing (1, K, AR0)
u Is invoked on any AR with the modulo (%) operator
u Implements true modulo addressing (pointer will never
exit array even if incremented past end of array)
u Alignment is to next larger binary boundary
u Alignment can leave gaps in RAM
u Linker will attempt to backfill unused RAM if possible on
a whole-file basis
u Recommended: Link largest blocks first. Why?
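The alignment rule and the wrap-around behavior can be sketched together in Python. This is a behavioral model with hypothetical names: it assumes the buffer base is the pointer with its low K bits cleared, where 2^K is the smallest power of two greater than BK, and that the step size never exceeds BK.

```python
def alignment(length):
    """Smallest power of two strictly greater than the buffer length."""
    k = 0
    while (1 << k) <= length:
        k += 1
    return 1 << k

def circ_update(ar, step, bk):
    """Advance a pointer by step inside a circular buffer of length bk."""
    block = alignment(bk)
    base = ar & ~(block - 1)          # top of buffer: low K bits cleared
    index = (ar & (block - 1)) + step
    if index >= bk:
        index -= bk                   # ran past the end: wrap to the top
    elif index < 0:
        index += bk                   # ran past the top: wrap to the end
    return base + index

wrapped_forward = circ_update(0x0104, +1, 5)
wrapped_back = circ_update(0x0100, -1, 5)
```

A 32-word delay line therefore needs a 64-word boundary, and a 5-word buffer occupies part of an 8-word aligned block, which is where the gaps in RAM come from.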

8-9

FIR Filter
[Figure: five-tap FIR signal flow. x in feeds delay elements X0 through X4, separated by z^-1 delays; the taps are multiplied by a0 through a4 and summed to give y out.]

y(n) = a0 × x(n) + a1 × x(n-1) + a2 × x(n-2) + a3 × x(n-3) + a4 × x(n-4)

or
Y0 = a0 × X0 + a1 × X1 + a2 × X2 + a3 × X3 + a4 × X4

8 - 10

8-6 DSP54x - Fundamental DSP Applications


Module 8

FIR Filter - Linear Buffer


Direct Addressing Indirect Addressing
LD #X0,DP STM #A+3,AR2
STM #X+3,AR1
STM #4,AR0
SSBX FRCT SSBX FRCT
LOOP LD X4,T LOOP LD *AR1-,T
MPY A4,A MPY *AR2-,A
LTD X3 LTD *AR1-
MAC A3,A MAC *AR2-,A
LTD X2 LTD *AR1-
MAC A2,A MAC *AR2-,A
LTD X1 LTD *AR1-
MAC A1,A MAC *AR2-,A
LTD X0 LTD *AR1
MAC A0,A MAC *AR2+0,A
STH A,X0 STH A,*AR1
PORTW X0,PA0 PORTW *AR1,PA0
BD LOOP BD LOOP
PORTR PA1,X0 PORTR PA1,*AR1+0
Note: location X0 used as temporary output location
8 - 11

FIR Filter - Dual Op w. Delay


INIT STM #X+5,AR2 point to last datum
STM #4,AR0 ptr reset value
SSBX FRCT fractional numbers
FIR RPTZ A,#4 5 iterations
MACD *AR2-,Coef,A Mpy, Acc, Delay
STH A,*AR2 X=result
PORTW *AR2+,PA0 X to DAC, inc to X0
BD FIR loop (soon)
PORTR PA1,*AR2+0 get new X0, inc to X4
.data
COEF .word A4,A3,A2,A1,A0 coeffs: old to new
X .usect “daram”,1+5+1 X,d.line, 1st delay

8 - 12

DSP54x - Fundamental DSP Applications 8-7


Module 8

FIR Filter - Dual Op w. Circ.Buffer


STM #A4,AR3 coeff ptr
STM #X4,AR2 circ buf ptr
STM #-1,AR0 dual op dec
STM #5,BK circ buf size
SSBX FRCT fractions
FIR RPTZ A,#4 5 iterations
MAC *AR2+0%,*AR3+0%,A SOP & circ
STH A,*AR2 result to old x
PORTW *AR2,PA0 result to DAC
BD FIR branch soon
PORTR PA1,*AR2+0% get new data,
inc to old

Note: in dual op mode *ARn-% is indirectly supported
via *ARn+0% by setting AR0 = -1
8 - 13

Second-Order IIR Filter


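[Figure: second-order IIR signal flow. x(n) enters a summing node that produces w(n), held in X0; X0 passes through two z^-1 delays to X1 and X2. Feedback path (poles): X1 × -A1 and X2 × -A2 are added into w(n). Forward path (zeros): X0 × B0, X1 × B1 and X2 × B2 are summed to form y(n).]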
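The structure in this diagram can be written as two short equations, w(n) = x(n) - A1 w(n-1) - A2 w(n-2) and y(n) = B0 w(n) + B1 w(n-1) + B2 w(n-2). The Python sketch below implements them in floating point; the function and variable names are hypothetical.

```python
def biquad(samples, a1, a2, b0, b1, b2):
    """Second-order IIR: feedback section first, then forward section."""
    w1 = w2 = 0.0                                 # X1 and X2 delay elements
    output = []
    for x in samples:
        w0 = x - a1 * w1 - a2 * w2                # feedback path (poles)
        output.append(b0 * w0 + b1 * w1 + b2 * w2)  # forward path (zeros)
        w2, w1 = w1, w0                           # age the delay line
    return output

decay = biquad([1.0, 0.0, 0.0], a1=-0.5, a2=0.0, b0=1.0, b1=0.0, b2=0.0)
```

With only one feedback coefficient the impulse response decays by half each sample and never ends, in contrast to an FIR filter.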


8 - 14

8-8 DSP54x - Fundamental DSP Applications


Module 8

IIR Filter - Single Operand


        LD    #x0,DP
        SSBX  FRCT
IIR:    PORTR 0000,x0      x0 as Input Handler
        LD    x0,16,A
        LD    x1,T         Feedback Section
        MAC   a1,A
        LD    x2,T
        MAC   a2,A
        STH   A,x0         x0 as Delay Element
        MPY   b2,A
        LTD   x1           Forward Section
        MAC   b1,A
        LTD   x0
        MAC   b0,A
        STH   A,x0         x0 as Output Handler
        BD    IIR
        PORTW x0,0001
8 - 15

IIR Filter- Dual Operand


SSBX FRCT
STM #X2,AR3
STM #Coeff+4,AR4
MVMM AR4,AR1
STM #6,BK
STM #-1,AR0
IIR: PORTR 0001h,*AR3
LD *AR3 ,16 ,A Feedback
MAC *AR3+0%,*AR4-,A Path
MAC *AR3+0%,*AR4-,A
STH A,*AR3
MPY *AR3+0%,*AR4-,A
MAC *AR3+0%,*AR4-,A Forward
MAC *AR3 ,*AR4-,A Path
STH A, *AR3
MVMM AR1,AR4
BD IIR
PORTW *AR3,0002h
8 - 16

DSP54x - Fundamental DSP Applications 8-9


Module 8

Classical Form IIR

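[Figure: classical form second-order IIR signal flow. x(n) passes through two z^-1 delays weighted by b11 and b12; y(n) passes through two z^-1 delays weighted by a11 and a12; the weighted terms are summed to form y(n).]

u Gain (pole, feedback section) after attenuation (zero, forward section)
u Less need for input scaling
u More robust
u Alternate coding model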
8 - 17

Classical IIR Code


STM #X,AR2
STM #A,AR3
STM #Y,AR4
STM #B,AR5
STM #3,BK
STM #-1,AR0
IIR: PORTR 0001h,*AR2
MPY *AR2+0%,*AR3+0%,A
MAC *AR2+0%,*AR3+0%,A
MAC *AR2 ,*AR4+0%,A Even
MAS *AR4+ ,*AR5+ ,A Iteration
MAS *AR4 ,*AR5- ,A
STH A, *AR4
PORTW *AR4,0002h
...
MAS *AR4+ ,*AR5+ ,A Odd
MAS *AR3 ,*AR5- ,A
STH A, *AR4 Iteration
BD IIR
PORTW *AR3,0002h
8 - 18

8 - 10 DSP54x - Fundamental DSP Applications


Module 8

IIR Solutions Comparison

Parameter 1 Operand 2 Operand Classical


Cycle Count 12(M) + 4(P) 9(M) + 4(P) 6(M) + 4(P)
Code Size 20 24 34
Reg’s Used 0 3 + BK 5 + BK

8 - 19

IIR Implementation Issues

u Break down high-order systems

u Scale down coefficients that are ≥ 1

u Input scaling

u Optimal Topology

8 - 20

DSP54x - Fundamental DSP Applications 8 - 11


Module 8

Break Down High-Order IIR

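[Figure: a high-order IIR filter split into two cascaded second-order sections between x(n) and y(n), with intermediate signals d(n) and w(n). Section 1 uses coefficients a11, a12, b11, b12; section 2 uses a21, a22, b21, b22; each section has two z^-1 delays.]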

8 - 21

Scale Down Coefficient ≥1


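[Figure: the second-order IIR signal flow with the feedback coefficient –A1 replaced by two parallel multiplications of X1 by –(A1)/2, so that each stored coefficient is a valid fraction; –A2, B0, B1 and B2 are unchanged.]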
8 - 22

8 - 12 DSP54x - Fundamental DSP Applications


Module 8

Input Scaling

Without scaling              With scaling
PORTR 0001h,Xin              PORTR 0001h,Xin
LD    Xin,16,A               LD    Xin,16-3,A
Q31 format                   Divide by 8

8 - 23

Optimal IIR Topology

[Figure: three cascaded second-order sections between x(n) and y(n). In each section the zero stage (b11/b12, b21/b22, b31/b32) comes before the pole stage (a11/a12, a21/a22, a31/a32), with z^-1 delays shared between adjacent stages.]

Best blend of efficiency and performance
by preceding a gain stage (pole) with a zero stage
8 - 24

DSP54x - Fundamental DSP Applications 8 - 13


Module 8

IIR vs. FIR Filters

u FIR:
À All zero implementation
À Unconditionally stable
À Linear Phase possible
À Best for phase encoded data
u IIR:
À Pole & zero implementation
À Stable if no errors made
À Much better frequency performance
À Best for frequency discrimination
8 - 25

Lab 8 : Recursive Filter


Implement the signal flow diagram on the ‘C54x.
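[Figure: recursive signal flow. Y0 is written to I/O Port 0 and passes through two z^-1 delays to Y1 and Y2; Y1 × A and Y2 × B are summed to form the next Y0.]

A = 1.975
B = –1.000
y(0) = 0.000
y(1) = 0.1400
y(2) = ?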

Notes: Y0 is the current output based on the two prior outputs, Y1 and Y2.
Initial conditions y(0) and y(1) are given, so the ‘54x will begin processing at t=2.
Since location Y0 is not an input value, results can be directly written to Y1.

8 - 26

8 - 14 DSP54x - Fundamental DSP Applications


Module 8

Lab 8 : Procedure
u Create a new assembly file to:
1. Allocate RAM for coefficients and delay line
2. Establish a ROM table for coefficients and initial conditions
3. Initialize ROM into RAM
4. Initialize processor modes
5. Write code to implement signal flow diagram in infinite loop
6. Build reset vector
u Assemble the program
u Link the program using an appropriate linker command file
u Run the program on the simulator through 40 loops
u Exit the simulator and view your results by typing: PLOT OUT.DAT
u Verify the results with the instructor
u Time permitting, consider optimizing your code

8 - 27

Lab 8: Equations

y(n) = A y(n-1) + B y(n-2)

Y(z) = A z^{-1} Y(z) + B z^{-2} Y(z)
Y(z) [1 - A z^{-1} - B z^{-2}] = 0
solving for roots:
z = [A \pm (A^2 + 4B)^{1/2}] / 2
if z is complex, then A^2 + 4B < 0, so
z = [A \pm j (-A^2 - 4B)^{1/2}] / 2
|z| = [A^2/4 + (-A^2 - 4B)/4]^{1/2}
|z| = [-B]^{1/2}
Therefore, if B = -1,
|z| = 1
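Because |z| = 1 places the poles on the unit circle, the recursion should produce a sine wave that neither grows nor decays. The Python sketch below runs the recursion in floating point with the lab's coefficients and initial conditions; `oscillator` is a hypothetical name, and the fixed-point effects of the real lab are not modeled.

```python
def oscillator(a, b, y0, y1, count):
    """Generate count samples of y(n) = a*y(n-1) + b*y(n-2)."""
    samples = [y0, y1]
    while len(samples) < count:
        samples.append(a * samples[-1] + b * samples[-2])
    return samples

samples = oscillator(1.975, -1.0, 0.0, 0.14, 42)
peak = max(abs(s) for s in samples)
```

The peak stays below 1, so the output remains a valid fraction even though the coefficient A exceeds 1 and has to be stored as A/2.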

8 - 28

DSP54x - Fundamental DSP Applications 8 - 15


Module 8

Lab 8: Solution - Parts 1 & 2


**** 1. Allocate RAM for coefficients and delay line

        .bss  a,4,1              Alloc RAM in 1 page
b       .set  a+1
y1      .set  a+2
y2      .set  a+3

**** 2. Establish a ROM table for coeff’s and init. conds.

        .data
TBL     .word 32768*1975/2000    A/2 (q15)
        .word 32768*(-1)         B (q15)
        .word 32768*14/100       Y1 (q15)
        .word 32768*0            Y2 (q15)
SIZE    .set  $-TBL

8 - 30

Lab 8: Solution - Parts 3 & 4


**** 3. Initialize ROM into RAM

.text Begin code space


start STM #a,AR7 Pointer to RAM array
RPT #SIZE-1 Loop # of TBL elements
MVPD TBL,*AR7+ Copy ROMs to RAMs

**** 4. Initialize processor modes

SSBX FRCT For Q15*Q15 -> Q31


LD #a,DP Set page for direct addressing
RSBX OVM Allow use of guard bits
SSBX SXM Two's comp numbers

8 - 31

8 - 16 DSP54x - Fundamental DSP Applications


Module 8

Lab 8: Solution - Parts 5 & 6


**** 5. Write code to implement signal flow diagram ...

SINE    LD    y2,T           T = y2
        MPY   b,A            A = b*y2
        LTD   y1             T = y1 , y1 -> y2
        MAC   a,A            A = (a/2)*y1 + b*y2
        MAC   a,A            A = a*y1 + b*y2
        STH   A,y1           y0 -> y1
        PORTW y1,0000        write to out.dat file
        B     SINE           loop ...

**** 6. Build reset vector

.include VECTOR6.SSM

8 - 32

SIMINIT.CMD

ma 0x0000,0, 0x0100, R|W|EX Small Ext’l Pgm Mem at 0


ma 0x9000,0, 0x1000, R|W|EX 4k of Ext’l " "
ma 0xe000,0, 0x1000, R|W|EX 4k of Ext’l " "
ma 0xff80,0, 0x0080, R|W|EX Vector Area " "
ma 0x0000,1, 0x0060, R|W MMRs in Data Mem
ma 0x0060,1, 0x0020, R|W SPRAM "
ma 0x0080,1, 0x0380, R|W RAM 0 "
ma 0x0400,1, 0x0400, R|W RAM 1 "
ma 0x1400,1, 0x0400, R|W|EX Small X Data Mem at 1400
ma 0x8000,1, 0x1000, R|W|EX 4k of Extl Mem Org 8000
ma 0x0,2,1,oport Output Port 0
mc 0x0,2,1,out.dat,W

8 - 33

DSP54x - Fundamental DSP Applications 8 - 17


Module 8

8 - 18 DSP54x - Fundamental DSP Applications


Algorithms

Learning Objectives

u List the advanced C54x instructions

u Associate the advanced mnemonic with the algorithmic need

u Identify the architectural components that provide advanced performance

u Experiment with some of these instructions on the simulator
9-2

DSP54x - Algorithms 9-1


9-2 DSP54x - Algorithms
Module 9

Module 9
Advanced Applications

u FIRS Symmetrical FIR filter


u LMS Adaptive filtering
u POLY Polynomial evaluation
u STRCD Code book Search
SACCD
SRCCD
u DADST Viterbi algorithm
DSADT
CMPS
9-3

Symmetric FIR Filter


Symmetric FIR Filters are commonly used in applications where phase
distortion may degrade the signal quality, e.g. modems.

Coeffs    a0    a1    a2    a3    a3    a2    a1    a0
Data      x(8)  x(7)  x(6)  x(5)  x(4)  x(3)  x(2)  x(1)
          New                                      Old

The general form of this FIR equation is written using 8 Mult’s, 7 Adds:
Y(n) = a0 x(8) + a1 x(7) + a2 x(6) + a3 x(5) + a3 x(4) + a2 x(3) + a1 x(2) + a0 x(1)

In the specific case of a Symmetric FIR we can write using 4 Mult’s, 7 Adds:
Y(n) = a0 (x(8)+x(1)) + a1 (x(7)+x(2)) + a2 (x(6)+x(3)) + a3 (x(5)+x(4))
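The equivalence of the two forms is easy to confirm numerically. The Python sketch below uses integer test data so the comparison is exact; the function names are hypothetical, and the data list is ordered newest first, x(8) down to x(1).

```python
def fir_general(half_coeffs, data):
    """8 multiplies: mirror the coefficients and form the full sum."""
    coeffs = half_coeffs + half_coeffs[::-1]      # a0 a1 a2 a3 a3 a2 a1 a0
    return sum(a * x for a, x in zip(coeffs, data))

def fir_symmetric(half_coeffs, data):
    """4 multiplies: add each new/old sample pair before multiplying."""
    last = len(data) - 1
    return sum(a * (data[k] + data[last - k])
               for k, a in enumerate(half_coeffs))

coeffs = [1, 2, 3, 4]                      # a0..a3
data = [5, -3, 2, 7, -1, 4, 0, 6]          # x(8)..x(1)
y_general = fir_general(coeffs, data)
y_symmetric = fir_symmetric(coeffs, data)
```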

9-4

DSP54x - Algorithms 9-3


Module 9

FIRS Implementation
1. Split the data into two parts; New and Old.
2. Set up circular buffers for each part. Set up the pointers for the buffers to the
newest of “New” and the oldest of “Old”. Set up a coefficient table.

New                          Old                          Coefficients
x(5)  <- x(9) from A/D       x(4)                         a0
x(6)                         x(3)                         a1
x(7)                         x(2)                         a2
x(8)  <- AR2                 x(1)  <- AR3, <- x(5)        a3

BH = a0( x(8)+x(1) ) + a1( x(7)+x(2) ) + a2( x(6)+x(3) ) + a3( x(5)+x(4) )
3. Sum the first two data points into the high A accumulator (AH) and
decrement the data pointers.
4. Zero the B accumulator and repeat the following four times:
a. Multiply AH times the coefficient, accumulate the result into the high B
accumulator (BH) and increment the coefficient pointer.
b. Sum the next two data points and decrement the data pointers.
5. Store the result (BH) & set data pointers to oldest “Old” and oldest “New”.
6. Replace oldest “Old” value with oldest “New” value. Dec. “Old” pointer.
7. Replace oldest “New” value with a new input datum and go to step 3.
9-5

FIRS Code Example


X_new   .usect "DATA1",4
X_old   .usect "DATA2",4
        LD    #Y,DP
        SSBX  FRCT
        STM   #X_new,AR2              AR2 points to NEW buf
        STM   #X_old+3,AR3            AR3 points to OLD buf
        STM   #4,BK                   Circular buffer length = 4
        STM   #-1,AR0                 Emulates *ARn-%
FIR     ADD   *AR2+0%,*AR3+0%,A       AH = x(8)+x(1)
        RPTZ  B,#3                    B = 0; do the following 4 times:
        FIRS  *AR2+0%,*AR3+0%,COEFS   B+=AH*a0; AH=x(7)+x(2), etc...
        STH   B,Y
        PORTW Y,0000h                 Output the result
        MAR   *+AR2(2)%               Point to oldest NEW buf
        MAR   *AR3+%                  Point to oldest OLD buf
        MVDD  *AR2,*AR3+0%            Xfer old NEW over old OLD
        BD    FIR
        PORTR 0001h,*AR2              Input new X to NEW buf
        .data
COEFS   .word a0,a1,a2,a3
9-6

9-4 DSP54x - Algorithms


Module 9

Architecture - FIRS
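[Figure: FIRS data flow. The ALU adds the two data operands (C and D buses) into accumulator A, while the MAC unit multiplies the high part of A by the coefficient fetched over the P bus and accumulates the product into accumulator B.]

FIRS *AR2+0% , *AR3+0% , COEFS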

9-7

9-8

DSP54x - Algorithms 9-5


Module 9

Least Mean Square


A least mean square (LMS) approach is widely used for adaptive filter routines.
The technique minimizes an error term by tuning the filter coefficients.

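[Figure: x(n) drives both the real system H(z), producing d(n), and the synthesized system W(z), producing y(n); y(n) is subtracted from d(n) to give e(n).]

H(z) = real system           x(n) = input data
W(z) = synthesized system    d(n) = desired response
e(n) = error                 y(n) = actual response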

9-9

Adaptive FIR Filtering using LMS

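[Figure: adaptive FIR filter. x(n) passes through a chain of z^-1 delays; the taps are weighted by b0, b1, ..., bn-1 and summed to form y(n); y(n) is subtracted from d(n) to give e(n), which drives the LMS block that updates the coefficients.]

FIR type filters are usually used in an adaptive algorithm since
they are more tolerant of non-optimal coefficients.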

9 - 10

9-6 DSP54x - Algorithms


Module 9

LMS Loading
Each Iteration ( only once )
1 - determine error : e(i) = d(i) - y(i)
2 - scale by “rate” term B : e´(i) = 2*B*e(i)
Each Term ( N sets )
3 - Qualify error with signal strength : e´´(i) = x(i-k) * e´(i)
4 - Sum error with coefficient : b(i+1) = b(i) + e´´(i)
5 - Update coefficient : b(i) = b(i+1)
Analysis :
        Step   Count   Discrete instructions   With parallel / LMS instructions
LMS:    1      1       SUB                     SUB
        2      1       MPY                     MPY
        3      N       MPY                     ST || MPY
        4      N       ADD                     LMS
        5      N       STH                     (ST || MPY)
FIR:    a      N       MPY                     (LMS)
        b      N       ADD                     (LMS)
        c      1       STH                     STH
@ 100 tap:             500+ cycles             200+ cycles
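The five steps listed above can be sketched as one update function in floating-point Python. This illustrates the algorithm, not the C54x instruction sequence; `lms_step` and its argument names are hypothetical, and `rate` stands for the B (beta) term.

```python
def lms_step(coeffs, delay, desired, rate):
    """One LMS iteration: filter, form the error, update every coefficient."""
    y = sum(b * x for b, x in zip(coeffs, delay))       # FIR output y(i)
    error = desired - y                                 # step 1
    scaled = 2 * rate * error                           # step 2
    updated = [b + scaled * x                           # steps 3 to 5
               for b, x in zip(coeffs, delay)]
    return updated, y, error

coeffs, y, error = lms_step([0.0, 0.0], [1.0, 0.5], desired=1.0, rate=0.25)
_, y_next, _ = lms_step(coeffs, [1.0, 0.5], desired=1.0, rate=0.25)
```

After one update the filter output for the same input moves from 0 toward the desired value of 1, which is the error-minimizing behavior the slide describes.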
9 - 11

LMS Instruction
LMS Xmem, Ymem     ;A += (Xmem) << 16 + 2^15
                   ;B += (Xmem) * (Ymem)

Example: LMS *AR3+, *AR4+

               Before instruction    After instruction
A              00 1111 2222          00 2111 A222
B              00 1000 0000          00 1200 0000
FRCT           0                     0
AR3            0100                  0101
AR4            0200                  0201
Data memory
0100h          1000                  1000
0200h          2000                  2000

A: 00 1111 2222h + 00 1000 0000h + 8000h = 00 2111 A222h
B: 00 1000 0000h + 1000h * 2000h = 00 1200 0000h

The LMS instruction adapts the coefficient
and performs the MAC for the filtering in the same cycle.
Storing the coefficient will require 1 additional cycle.
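The before/after values in this example can be reproduced with a small Python model of the two updates. It is a sketch with hypothetical names, assuming FRCT=0 as in the example and signed 16-bit operands.

```python
MASK40 = (1 << 40) - 1

def signed16(word):
    """Interpret a 16-bit word as a two's complement value."""
    word &= 0xFFFF
    return word - 0x10000 if word & 0x8000 else word

def lms_instruction(a, b, xmem, ymem):
    """Return the new (A, B) accumulator images after one LMS (FRCT=0)."""
    new_a = (a + (signed16(xmem) << 16) + 0x8000) & MASK40   # update + round
    new_b = (b + signed16(xmem) * signed16(ymem)) & MASK40   # filter MAC
    return new_a, new_b

new_a, new_b = lms_instruction(0x0011112222, 0x0010000000, 0x1000, 0x2000)
```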
9 - 12

DSP54x - Algorithms 9-7


Module 9

LMS Adaptive Filter Code


        ...                         Pre-calculate 2Beta*e(n) ...
        .asg  AR3, Coeffs           AR3 points to Coefficient table ... a(n)
        .asg  AR4, Data             AR4 points to Data table ... x(n)

        LD    B2e, T                T holds the error step amount
        LD    #0, B                 Zero out B
        STM   #N-2, BRC             Load Block Repeat Counter
        RPTBD End-1                 Start RPTB, next two are delay slots
        MPY   *Data+0%, A           A = error * oldest sample
        LMS   *Coeffs, *Data        B += a(n)*x(n) ... filter tap
                                    A += (a(n) << 16) + 2^15 ... coeff. update
        ST    A, *Coeffs+           Store updated coefficient
     || MPY   *Data+0%, A           and form A = x(n-1)*2Beta*e(n)
        LMS   *Coeffs, *Data        B = accumulated filter output
                                    A = updated filter coefficients
End     STH   A, *Coeffs            Store the final updated coefficient
        STH   B, *Result            Store final filter result
9 - 13

Architecture - LMS
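[Figure: LMS data flow. The MAC unit (FIR) multiplies the D-bus and C-bus operands and accumulates into accumulator B; the ALU (LMS) adds the D-bus operand to accumulator A.]

LMS *AR2+0%, *AR3+0%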

9 - 14

9-8 DSP54x - Algorithms


Module 9

9 - 15

Polynomial Evaluation

Polynomial evaluation is commonly used in convolutional encoding.

The general form of a 3rd order polynomial equation can be written as:

P(x) = a_3 x^3 + a_2 x^2 + a_1 x + a_0

The equation can be rewritten as:

P(x) = [(a_3 x + a_2) x + a_1] x + a_0

This process can be extended to any order polynomial
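The nested form is Horner's rule, and it extends to any order with a simple loop. The Python sketch below is a floating-point illustration with a hypothetical function name; coefficients are listed from the highest order down.

```python
def horner(coeffs, x):
    """Evaluate a polynomial given coefficients [a_n, ..., a_1, a_0]."""
    result = coeffs[0]
    for coefficient in coeffs[1:]:
        result = result * x + coefficient    # one multiply-add per order
    return result

a3, a2, a1, a0 = 0.5, 0.25, 0.125, 0.0625
nested = horner([a3, a2, a1, a0], 0.5)
direct = a3 * 0.5**3 + a2 * 0.5**2 + a1 * 0.5 + a0
```

A third-order polynomial needs three multiply-add steps, which matches the three repetitions used for POLY.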

9 - 16

DSP54x - Algorithms 9-9


Module 9

POLY Operation
1. Set up a pointer to the coefficients.
2. Load x into the T register.
3. Load a3 into the high A accumulator (AH). Decrement pointer.
4. Load a2 into the high B accumulator (BH). Decrement pointer.
5. Repeat the following three times:
a. Multiply AH times T, accumulate with BH and round in AH.
b. Load the next coefficient into BH. Decrement pointer.
6. Store AH as result.
[Figure: register usage. T holds x; ARn steps through the coefficient table (a3, a2, a1, a0); BH receives the next coefficient; AH accumulates the running result P(x).]

P(x) = AH = [(a3 x + a2) x + a1] x + a0
9 - 17

Polynomial Evaluation
        SSBX  FRCT              POLY operation is affected by these bits
        SSBX  OVM
        SSBX  SXM

        LD    *AR4+,T           T=X(0)
        LD    *AR3+,16,A        A=A(order)=PX init
        LD    *AR3+,16,B        B=A(order-1)

        RPT   #2                3 times
        POLY  *AR3+             A=PX=Rnd(B+A*T)  B=An<<16

        STH   A,*AR2+           PX=A>>16
     || LD    *AR4+,T           T=new x

A parallel load may be added to do iterative POLY operations with no penalty.

Note: The POLY instruction “expects” Q15 numbers!

9 - 18

9 - 10 DSP54x - Algorithms
Module 9

Architecture - POLY
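[Figure: POLY data flow. The MAC unit multiplies the high part of accumulator A by T and adds accumulator B, writing the result to accumulator A; the D-bus operand passes through the ALU path into accumulator B.]

POLY *AR3+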
9 - 19

9 - 20

DSP54x - Algorithms 9 - 11
Module 9

Code Book Search


A code-excited linear predictive (CELP) speech coder is widely used for
applications requiring speech coding with a bit rate under 16K bps. The
speech coder uses a vector quantization technique to select an excitation
signal from codebooks. This excitation signal is then applied to a linear
predictive-coding (LPC) synthesis filter.

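[Figure: CELP codebook search loop. A codebook entry (0, 1, 2, ...) is scaled by a gain and passed through the synthesis filter to give g(n); the input speech passes through a weighting filter to give p(n); g(n) is subtracted from p(n), and mean-square error minimization selects the codebook entry.]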

9 - 21

Code Book Search


Obtaining the optimum code vector involves minimizing the mean-square error
generated from the weighted input speech and from the zero-input response of
the synthesis filter.

Optimum code vector localization

Mean-Square Error:

E_i = \sum_{n=0}^{N-1} [ p(n) - \beta g_i(n) ]^2

* p(n) is the weighted input speech
* g_i(n) is the zero-input response of the synthesis filter
* \beta is the gain of the codebook
* N is the subframe length
9 - 22


Code Book Search


The cross-correlation, c_i, of p(n) and g_i(n) is:

c_i = \sum_{n=0}^{N-1} g_i(n) p(n)

The energy variable, G_i, is given by:

G_i = \sum_{n=0}^{N-1} g_i(n)^2

Minimize E_i by maximizing c_i^2 / G_i. If the code vector with i = opt is optimal, the
following inequality holds for any i. The codebook search routine evaluates this
inequality for each code vector and finds the optimum one.

c_i^2 / G_i < c_{opt}^2 / G_{opt}     or     c_i^2 G_{opt} < c_{opt}^2 G_i
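The step from minimizing E_i to maximizing c_i^2 / G_i is not spelled out on the slide. One way to see it, assuming the gain \beta is chosen optimally for each code vector (an assumption, not stated in the source), is to expand the error and set its derivative with respect to \beta to zero:

```latex
\begin{aligned}
E_i &= \sum_{n=0}^{N-1} p(n)^2 \;-\; 2\beta c_i \;+\; \beta^2 G_i \\
\frac{\partial E_i}{\partial \beta} = 0 \;&\Rightarrow\; \beta = \frac{c_i}{G_i} \\
E_i \Big|_{\beta = c_i / G_i} &= \sum_{n=0}^{N-1} p(n)^2 \;-\; \frac{c_i^2}{G_i}
\end{aligned}
```

The first term does not depend on i, so the smallest error belongs to the code vector with the largest c_i^2 / G_i. Because G_i and G_{opt} are positive, the comparison can be cross-multiplied, which avoids a division in the search loop.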
9 - 23

Code Book Search


        .mmregs
        .text
CBS:    STM   #C, AR5          ; AR5 -> C(0) ...
        STM   #G, AR2          ; AR2 -> G(0) ...
        STM   #G-opt, AR3      ; AR3 -> Gopt, Copt
        STM   #I-opt, AR4      ; AR4 -> Iopt
        ST    #0, *AR4         ; Iopt = 0
        ST    #1, *AR3+        ; Gopt = 1
        ST    #0, *AR3-        ; Copt = 0
        STM   #N-1, BRC
        RPTBD done
        SQUR  *AR5+, A         ; A = C(i)^2
        MPYA  *AR3+            ; B = C(i)^2 * Gopt       T = G(i)
        MAS   *AR2+, *AR3-, B  ; B = C(i)^2 * Gopt - G(i) * Copt^2
        SRCCD *AR4, BGEQ       ; If (B >= 0) then BRC --> Iopt
        STRCD *AR3+, BGEQ      ;   and T --> Gopt
        SACCD A, *AR3-, BGEQ   ;   and A --> Copt^2
        SQUR  *AR5+, A
done:   MPYA  *AR3+
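The comparison performed inside the loop can be modeled in a few lines of Python. This is a sketch of the search logic only, under the assumption that c holds the cross-correlations and g the (positive) energies; it returns the vector index i, whereas the listing stores the block repeat counter, which counts down.

```python
def codebook_search(c, g):
    """Return the index that maximizes c[i]**2 / g[i] without dividing."""
    i_opt, c2_opt, g_opt = 0, 0, 1      # same initial values as the listing
    for i, (ci, gi) in enumerate(zip(c, g)):
        c2 = ci * ci
        # c2/gi >= c2_opt/g_opt  <=>  c2*g_opt - gi*c2_opt >= 0  (gi, g_opt > 0)
        if c2 * g_opt - gi * c2_opt >= 0:
            i_opt, c2_opt, g_opt = i, c2, gi
    return i_opt

c = [3, -8, 5, 6]     # hypothetical cross-correlations
g = [4, 16, 5, 9]     # hypothetical energies
best = codebook_search(c, g)
```

The three conditional stores (SRCCD, STRCD, SACCD) correspond to the single tuple assignment: they update the index, the energy and the squared correlation only when the test passes, so the loop needs no branch.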
9 - 24


Codebook Search Instructions


Store T register conditionally ...

STRCD Xmem, cond
Xmem = T if condition is true

Store Block Repeat Counter conditionally ...

SRCCD Xmem, cond
Xmem = BRC if condition is true

Store Accumulator conditionally ...

SACCD src, Xmem, cond
Xmem = src << (ASM - 16) if condition is true

9 - 25

Advanced Applications

u FIRS Symmetrical FIR filter


u LMS Adaptive filtering
u POLY Polynomial evaluation
u STRCD Code book Search
SACCD
SRCCD
u DADST Viterbi algorithm
DSADT
CMPS
9 - 26


Data Transmission

[Figure: digital source data 0010110 ... is modulated at XMIT, crosses a channel with fading, multipath and noise, and is demodulated at RCV as 0010100 ...]

u Digital source data is modulated to XMIT
u Signal is demodulated at RCV
u Noise acquired on RCV can cause data errors
u For greater reliability, an error detection and correction (EDAC) technique is desired
9 - 27

Viterbi Encoder
[Figure: convolutional encoder. Input bits pass through four unit delays (Z^-1); two adders tap the delay line to produce the output bits G0 and G1.]

u N bits are fed into network.
u M (>N) bits flow out (... G0 G1 G0 G1 ...)
u e.g. 3 in : 4 out, 4 in : 8 out, etc.
u recognizable "holes" are created in data path, e.g.:

                 3 in : 4 out      4 in : 8 out
  valid codes:   2^3 = 8           2^4 = 16
  total codes:   2^4 = 16          2^8 = 256
  "holes":       8                 240

Receiver can use table of valid vs. invalid codes to detect errors
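A rate-1/2 encoder with four delay elements can be sketched in Python as follows. The slide does not give the tap positions, so the default taps here are an assumption: they are the GSM full-rate speech channel polynomials G0 = 1 + D^3 + D^4 and G1 = 1 + D + D^3 + D^4. The function name conv_encode is illustrative.

```python
def conv_encode(bits, taps=((0, 3, 4), (0, 1, 3, 4))):
    """Rate-1/2 convolutional encoder with a 4-stage delay line.

    taps[k] lists the delays (0 = current bit) XORed to form output Gk.
    The default taps are an assumption, not taken from the slide.
    """
    delay = [0, 0, 0, 0]              # delay[0] is the most recent past bit
    out = []
    for b in bits:
        window = [b] + delay          # window[d] is the input delayed by d
        for tap_set in taps:
            g = 0
            for d in tap_set:
                g ^= window[d]        # modulo-2 addition
            out.append(g)             # outputs interleave as G0 G1 G0 G1 ...
        delay = [b] + delay[:3]
    return out

impulse_response = conv_encode([1, 0, 0, 0, 0])
```

Each input bit produces two output bits, so only a fraction of all possible output sequences are valid; those gaps are what the decoder exploits.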
9 - 28


Viterbi Decoder Concept


u Receive data
u There are now 4 possible sets: 00, 01, 10, 11

[Figure: trellis section with states 0 and 1 at time n connected to states 0 and 1 at time n+1]

u Traditional approach: keep only 1
u Viterbi method: keep 'best' 2
u 'Best' is determined by maximum value along paths between states
u 'Pruning' less likely paths from consideration reduces the storage for
  n samples of data from 2^n locations to just 2*n locations
u After 'M' samples are acquired, the two best paths are compared
  to a table of valid/invalid codes (traceback)
u Invalid set is dropped, valid set is saved as received data
  If both are valid, maximum likelihood set is saved
9 - 29

Viterbi Decoder

[Figure: Viterbi butterfly. Old states 2*J and 2*J+1 feed new states J and J+8. The paths into J add +M to 2*J and -M to 2*J+1 (AH, AL); the paths into J+8 add -M to 2*J and +M to 2*J+1 (BH, BL).]

D-cod:  LD    *AR2,T          ; T = M
        DADST *AR5, A         ; AH = (2*J)+M, AL = (2*J+1)-M
        DSADT *AR5+, B        ; BH = (2*J)-M, BL = (2*J+1)+M
        CMPS  A, *AR4+        ; (J) = max(AH,AL), etc
        CMPS  B, *AR3+        ; (J+8) = max(BH,BL), etc

Often, local distance is the same for consecutive butterflies,
so the benchmark approaches 4 cycles per butterfly.
9 - 30


Viterbi Memory Map

u In one symbol time interval, 8 butterflies yield 16 new states
u This operation repeats over a number of symbol time intervals
u At the end of the sequence of time intervals, a back track routine is
  performed to find the optimal path out of the 16 paths calculated
u This path represents the bit sequence to be decoded

Pointer   Relative location   Contents
AR5       0 - 15              Old states: metrics 2*J & 2*J+1
AR4       16 - 23             New states: metrics J
AR3       24 - 31             New states: metrics J + 8
9 - 31

Viterbi Instructions
CMPS src, Smem
IF { [ src (31-16) ] > [ src (15-0) ] }

THEN :                          ELSE :
(src(31-16)) → Smem             (src(15-0)) → Smem
0 → TC                          1 → TC
(TRN << 1 ) + 0 → TRN           (TRN << 1 ) + 1 → TRN

DADST Lmem,dst    Lmem ( 31-16 ) + (T) → dst (39-16)
                  Lmem ( 15 - 0 ) - (T) → dst (15 - 0)

DSADT Lmem,dst    Lmem ( 31-16 ) - (T) → dst (39-16)
                  Lmem ( 15 - 0 ) + (T) → dst (15 - 0)
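The three instruction definitions combine into one butterfly. The Python model below is a simplified sketch: it wraps every half to 16 bits (the real high half also carries guard bits), and the function names are illustrative rather than part of any TI tool.

```python
def to_s16(v):
    """Wrap an integer to signed 16 bits (simplification: no guard bits)."""
    v &= 0xFFFF
    return v - 0x10000 if v & 0x8000 else v

def dadst(hi, lo, t):
    """DADST: high half plus T, low half minus T."""
    return to_s16(hi + t), to_s16(lo - t)

def dsadt(hi, lo, t):
    """DSADT: high half minus T, low half plus T."""
    return to_s16(hi - t), to_s16(lo + t)

def cmps(hi, lo, trn):
    """CMPS: return (value stored to memory, TC, updated 16-bit TRN)."""
    if hi > lo:
        return hi, 0, (trn << 1) & 0xFFFF
    return lo, 1, ((trn << 1) | 1) & 0xFFFF

def butterfly(old_2j, old_2j1, m, trn=0):
    """One butterfly: two old path metrics in, two new metrics and TRN out."""
    ah, al = dadst(old_2j, old_2j1, m)
    bh, bl = dsadt(old_2j, old_2j1, m)
    new_j, _, trn = cmps(ah, al, trn)
    new_j8, _, trn = cmps(bh, bl, trn)
    return new_j, new_j8, trn
```

For example, butterfly(10, 4, 3) keeps 13 for state J and 7 for state J+8, and shifts the decision bits 0 then 1 into TRN. The decision bits collected in TRN are what the traceback routine reads later.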

9 - 32


Compare Select Store (CSS) Unit


[Figure: Compare Select Store (CSS) unit block diagram showing the DB, CB and EB buses, the T register, the ALU with C16 = 1, the accumulator halves AH, AL, BH and BL, the MSB/LSB select, the comparator (COMP), TRN and TC.]

u Dual 16-bit ALU operations
u T register input as dual 16-bit operand
u 16-bit transition shift register (TRN)
u One cycle store Max and Shift decision
9 - 33

Absolute and Square Distance


ABDST Xmem, Ymem : Absolute Distance

        B += | AH |
        AH = Xmem - Ymem

SQDST Xmem, Ymem : Square Distance

        B += AH^2
        AH = Xmem - Ymem

9 - 34


Review
u What instruction is used to perform adaptive filtering?
u What instructions are used to perform Viterbi decoding?
u What features of the C54x architecture allow the FIRS
instruction to execute in a single cycle?
What might slow it down?
u What features of the C54x architecture allow the Viterbi
operations to execute so quickly?
u What mnemonic is used for solving polynomials?
What concept allows this to run so quickly?

9 - 35

LAB9A .. Acoustic Echo Cancellation


[Figure: acoustic echo canceller. The reference signal x(n) (ref.dat) drives the speaker and the adaptive filter. The microphone picks up the echo together with near-end speech and room noise, giving the echo signal z(n) (echo.dat). The adaptive filter output y(n) is subtracted from z(n) to form the error signal e(n) (error.dat), which drives the LMS update of the coefficients bk.]

The goal of this adaptive filter is to create a replica of the
echo so that when the signal, y(n), is subtracted from the echo
signal, z(n), the result is zero.
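The structure in the figure can be sketched in floating-point Python. This is an illustration of the LMS idea, not the lab code: the synthetic echo path (two delayed copies of the reference), the step size beta and the function name are all assumptions.

```python
import math

def lms_echo_cancel(x, z, taps=16, beta=0.02):
    """Return the error signal e(n) = z(n) - y(n) of an LMS adaptive FIR filter."""
    b = [0.0] * taps                  # adaptive coefficients bk
    history = [0.0] * taps            # x(n), x(n-1), ..., x(n-taps+1)
    errors = []
    for xn, zn in zip(x, z):
        history = [xn] + history[:-1]
        y = sum(bk * xk for bk, xk in zip(b, history))          # echo replica y(n)
        e = zn - y
        b = [bk + beta * e * xk for bk, xk in zip(b, history)]  # LMS update
        errors.append(e)
    return errors

# Synthetic stand-ins for the reference and echo files (hypothetical data):
n_samples = 2000
x = [math.sin(2 * math.pi * 1000 * n / 8000) for n in range(n_samples)]
z = [0.5 * (x[n - 3] if n >= 3 else 0.0) + 0.25 * (x[n - 7] if n >= 7 else 0.0)
     for n in range(n_samples)]
e = lms_echo_cancel(x, z)
```

With noise-free synthetic data the error decays toward zero; with recorded data it settles at the level of the near-end noise instead.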
9 - 36


LAB9A .. Acoustic Echo Cancellation


u Assemble and link LAB9A
u Enter the simulator and invoke LAB9A.TAK. The take file loads LAB9A and
  connects the simulator to the input and output files.
u Run the program. Depending on the speed of the simulator, it may take several
  minutes to process all 2000 points. However, you can stop the simulator at any time
  and proceed to the next step in this procedure.
u Exit the simulator
u Plot the ERROR.DAT file by typing: DRAWHEX ERROR.DAT. Note that the error
  converges from a large amplitude to a small amplitude. This is the time it takes the
  LMS filter to adapt. The remaining signal is ambient noise present in the room when
  the data was collected. Analysis shows that the echo is attenuated by an average of
  28 dB.
u Determine the number of clock cycles required for the filter and LMS update.
u Compare this to the 4N clock cycles required by most DSPs. N is the number of
  taps of the adaptive filter.
u Try changing the number of filter taps and Beta. What is the effect?

9 - 37

LAB9A .. Acoustic Echo Cancellation

The code for the LMS filter is in LAB9A.ASM. The input x(n) is stored in a file
called REF.DAT, representing a 1 kHz sine wave sampled at 8 kHz. To create
the REF.DAT file, a signal was generated with a function generator, then
sampled with a 13-bit A/D converter. At the same time REF.DAT was
collected, the 1 kHz signal was sent to a speaker.

A microphone was used to collect the resulting echoes in a 10' x 15' room.

The echo signal z(n) was sampled in the same manner as the reference signal.
The echo signal is stored in a file called ECHO.DAT. Two thousand samples,
or 0.25 seconds of data, are stored in each file. LAB9A.ASM uses REF.DAT and
ECHO.DAT as inputs to the LMS filter. When the program is run, the
resulting error signal is stored in ERROR.DAT. The LMS filter length is set
to 16 taps. For a sampling rate of 8 kHz, a 16-tap filter can cancel up to
16/8000 = 2 ms of echo delay.

9 - 38


LAB9B .. GSM Channel Coding


[Figure: GSM channel coding block diagram]

Transmitter:
1. RPE-LPC speech encoder: input s(t), 160 samples; output 260 bits
   (50 class 1a bits, 132 class 1b bits, 78 class 2 bits). A voice activity
   detector is attached to the encoder.
2. 3 parity check bits on the 50 class 1a bits (53 bits).
3. Reorder class 1 bits and add 4 zero trailing bits (189 bits).
4. 1/2 rate, constraint length 5 convolutional encoding on class 1a and 1b
   bits (378 bits). This stage is marked LAB9B.
5. Interleaving, A5 encryption and slot formatting, together with the
   78 class 2 bits.
6. GMSK modulation performed in RF codec, then RF transmission over the channel.

Receiver:
1. RF reception, with GMSK demodulation performed in RF codec.
2. Equalization, slot disassembly, de-encryption and de-interleaving
   (378 class 1 bits, 78 class 2 bits).
3. Viterbi decode of the 378 class 1 bits (189 bits).
4. Test the parity check bits (53 bits); discard the block if the check fails.
5. Bit reordering (50 class 1a, 132 class 1b and 78 class 2 bits: 260 bits).
6. RPE-LPC speech decoder: input 260 bits; output s'(t), 160 samples. A comfort
   noise block is attached to the decoder.

GSM stands for Global System for Mobile communications. It is the digital
cellular standard used in Europe and throughout the world.
9 - 39

LAB9B .. GSM Channel Coding


[Figure: convolutional encoder with four unit delays (Z^-1) producing output bits G0 and G1 from the input bits]

u Examine the file LAB9B.ASM. Note the special instructions used for
  implementing the Viterbi butterfly.
u Assemble and link LAB9B.ASM.
u Simulate LAB9B program by typing: take LAB9B.TAK
u Examine the input data array in the MEMORY window.

9 - 40


LAB9B .. continued
u Encode the input data by running to the first break point. Break points were
set by the take file. The encoded data is in the MEMORY1 window. Simulate
transmission errors by making changes to the encoded data. Valid changes
are between -7 and 7. Note: the encoded data is in signed antipodal format,
the format that the GSM equalizer output would be in.
u Run the Viterbi decoder and compare the input data array to the output
data array. Is the output correct?

Convolutional encoder output (G0 and G1)     Signed antipodal format
1                                            -7
0                                             7

9 - 41

LAB9B .. GSM Channel Coding


Every 20 ms, 160 sampled values from the ADC are analyzed by the
Regular Pulse Excitation (RPE) Linear Predictive Coding (LPC) voice
encoder. The filter amounts to a model of the speaker's vocal tract (pharynx,
teeth, tongue, etc.) and the excitation signal represents sounds (pitch,
loudness, etc.). Finding suitable filter coefficients and an excitation signal
yields an appropriate speech signal.

The real reduction in bit rate comes from further analyzing the
excitation signal. The difference between current and previous excitation
signals is found by using Long Term Predictive analysis (LTP). The LTP
algorithm searches all of the previous sequences (15 ms of history) for the
sequence that has the highest correlation to the current sequence. The
difference is transmitted along with a pointer to the sequence that should be
selected for use. The 160 samples are reduced to 260 bits. The resulting bit
rate is 260 bits / 20 ms = 13 kbit/s.

9 - 42


LAB9C : Polynomial Evaluation


P(x) = a3 x^3 + a2 x^2 + a1 x + a0
P(x) = [(a3 x + a2) x + a1] x + a0

x  = 3/4 = 0x6000
a0 = 1/8 = 0x1000
a1 = 1/4 = 0x2000
a2 = 3/8 = 0x3000
a3 = 1/2 = 0x4000

P(x) = 94/128 = 0x5E00

u Examine (or create your own) LAB9C.ASM.
u Assemble and link LAB9C.
u Simulate LAB9C and observe the operation of the POLY instruction
  by single stepping through the code until the "End" label.
u Verify that the code generates the expected result.
u Optional: Modify the inputs to generate new results.
u Note: The POLY instruction "expects" Q15 numbers!

9 - 43

Additional Resources

1. S. M. Redl, M. K. Weber, M. W. Oliphant, "An Introduction to GSM",
   Artech House, 1995.

2. H. Hendrix, "A Brief Tutorial on GSM Decoding Techniques",
   TI Internal paper, 1995.

3. H. Hendrix, "Viterbi Decoding Techniques on the TMS320C54x Family",
   TI Application Report, 1995.

9 - 44

Interrupts

Learning Objectives

u Describe the 'C54x state upon reset.
u Identify interrupt sources.
u Identify the requirements for interrupt
  recognition.
u Describe the sequence of events during
  an interrupt.
u Build vector tables

10 - 2

DSP54x - Interrupts 10 - 1

Module 10
Hardware Reset Actions
u All control signals are driven inactive high
u Address lines are driven to FF80h
u Data bus is driven to high impedance state
u Interrupts are disabled : 1 → INTM
u Prior interrupts are purged : 0 → IFR
u The repeat counter (RC) is cleared
u IACK- is driven low
u An internal reset is sent to the peripherals.
u Seven CLKOUT cycles after RS- is released
the processor will fetch from 0FF80h

10 - 3

Processor Status on Reset


Math:     0 → OVA, 0 → OVB, 0 → OVM, 1 → C, 0 → C16, 0 → ASM, 0 → FRCT, 1 → SXM
Misc:     0 → BRAF, 0 → DP, 0 → CPL, 0 → CMPT, 0 → ARP, 1 → INTM
Memory:   0 → OVLY, 0 → DROM, ? → MP/MC-, 1FFh → IPTR
Pins:     1 → XF, 1 → CLKEN, 0 → AVIS, 0 → HM

10 - 4


Interrupt Locations
Interrupt Offset (Hex) Description
RS 0 Reset
NMI 4 Nonmaskable Interrupt
SINT17-30 8-3C Software Interrupt 17-30
INT0 40 External User Interrupt #0
INT1 44 External User Interrupt #1
INT2 48 External User Interrupt #2
TINT 4C Internal Timer Interrupt
RINT0 50 Serial Port 0 Receive Interrupt
XINT0 54 Serial Port 0 Transmit Interrupt
RINT1 58 Serial Port 1 Receive Interrupt
XINT1 5C Serial Port 1 Transmit Interrupt
INT3 60 External User Interrupt #3
64-7F Reserved
10 - 5

Interrupt Management
IFR and IMR share the same bit layout:

Bit     15-9       8      7       6       5       4       3      2      1      0
IFR     Reserved   INT3   XINT1   RINT1   XINT0   RINT0   TINT   INT2   INT1   INT0
IMR     Reserved   INT3   XINT1   RINT1   XINT0   RINT0   TINT   INT2   INT1   INT0

ST1 bit 11: INTM

Master Enable  : RSBX INTM
Master Inhibit : SSBX INTM
Set IMR Bits   : ST #102h,*(IMR)
Modify IMR     : ORM #40h, *(IMR)
Clear IFR Bit  : ST #1, *(IFR)
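The constants in these instructions follow directly from the bit layout. A small Python helper (the names IMR_BITS and imr_mask are illustrative, not part of any TI tool) makes the mapping explicit:

```python
# Bit positions taken from the IFR/IMR layout on this slide.
IMR_BITS = {"INT0": 0, "INT1": 1, "INT2": 2, "TINT": 3, "RINT0": 4,
            "XINT0": 5, "RINT1": 6, "XINT1": 7, "INT3": 8}

def imr_mask(*names):
    """Return the IMR/IFR mask with one bit set per named interrupt."""
    mask = 0
    for name in names:
        mask |= 1 << IMR_BITS[name]
    return mask

print(hex(imr_mask("INT1", "INT3")))   # value written by ST #102h,*(IMR)
print(hex(imr_mask("RINT1")))          # bit added by ORM #40h,*(IMR)
```

The same helper gives the mask for any other combination, for instance imr_mask("INT0", "INT2") for a handler that allows only those two sources.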
10 - 6


Recognition of Interrupts
1. INT high for 2 cycles, then INT low for 3 cycles
2. IFR Bit Latched
3. IMR Bit = 1?
4. INTM Bit = 0?
5. Interrupt Begins

10 - 7

Post Interrupt Hardware Sequence

CPU Action         Description
1 → INTM           Disable global interrupts
PC → --*(SP)       Push PC onto predecremented stack
Vector(n) → PC     Load PC with int. vector "n" address
0 → IACK pin       IACK signal goes low
0 → IFR (n)        Clear corresponding interrupt flag bit

10 - 8


IACK Decoder

'C54x address bits during an interrupt vector fetch:

Addr    6  5  4  3  2  1  0
INT0    1  0  0  0  0  0  0
INT1    1  0  0  0  1  0  0
INT2    1  0  0  1  0  0  0
INT3    1  1  0  0  0  0  0

'138 (3 - 8 DeMux) connections:

Inputs:    A5 → C, A3 → B, A2 → A, A6 → G1, IACK- → G2-
Outputs:   Y0 → IACK0, Y1 → IACK1, Y2 → IACK2, Y4 → IACK3

Note: For internal vectors set AVIS = 1


10 - 9

Context Save & Restore Instructions

Instruction    Description                          Stack pointer
PSHM mmr       Push MMR onto Stack                  SP - 1 → SP
POPM mmr       Pop from Stack to MMR                SP + 1 → SP
PSHD Smem      Push Data memory value onto Stack    SP - 1 → SP
POPD Smem      Pop top of Stack to Data memory      SP + 1 → SP
FRAME K        Modify Stack Pointer                 SP + K → SP

10 - 10


Context Save
.ref ISR1
.sect ".vectors"
... ...
INT1: BD ISR1
PSHM ST0
PSHM ST1

.mmregs
.def ISR1
.text
ISR1: PSHM AL
PSHM AH
PSHM AG
PSHM AR1
PSHM IMR
PSHM PMST
; ISR FOLLOWS...
10 - 11

Context Restore

; ISR CONCLUDES...

; Context Restore:

POPM PMST
POPM IMR
POPM AR1
POPM AG
POPM AH
POPM AL
POPM ST1
POPM ST0
RETF
10 - 12


Return Instructions

Instruction    Actions                               Cycles
RET[D]         *(SP)++ → PC                          RET 5, RETD 3
RETE[D]        *(SP)++ → PC, 0 → INTM                RETE 5, RETED 3
RETF[D]        RTN → PC, 0 → INTM, *(SP)++           RETF 3, RETFD 1

10 - 13

Nested Interrupts

PSHM IMR Save IMR


STM #5,IMR Enable only Interrupts 0 and 2
RSBX INTM Enable Interrupts INTM=0

; Nestable ISR . . .

SSBX INTM Disable Interrupts INTM=1


POPM IMR Restore IMR value
RETE

10 - 14


Vector Table Structure


.sect ".vectors"
RSV: BD Reset
STM #STK+LEN,SP
NMV: Put NMI
routine here ...

.loop 40h-$
RETE
.endloop
IV1: BD ISR1
PSHM ST0
PSHM ST1
IV2: BD ISR2
PSHM ST0
PSHM ST1
...

10 - 15

Filling Empty Vectors

Standard Vector:
IVn:    BD    ISRn
        PSHM  ST0
        PSHM  ST1

Unused Vector:
IVn:    NOP
        NOP
        NOP
        NOP

Unused Vector - Debug:
IVn:    BD    IVn
        NOP
        NOP

Unused Vector - Production:
IVn:    XORM  #10b,*(IMR)
        RETE

10 - 16


NMI Interrupt

u Supersedes all regular activity.
u Can serve as ultra-high priority interrupt event.
u Ignores state of INTM and IMR.
u Sets INTM=1.
- Cannot supersede RPT or not RDY.
- Can slow time-critical interrupt response.
- Can interrupt itself.
- Can lead to ambiguous return state.

10 - 17

Using Address for NMI Return Status

Put main and ISR code in separate areas

[Figure: program memory map showing, in order, Main, the label Ints_Off, ISRs, and Vectors]

When about to return from NMI:

        POPM  AL              ; Get Return Address
        PSHM  AL              ; Put Return Address Back
        SUB   #Ints_Off,A     ; Is Return Address in ISR region?
        RETC  GEQ             ; If yes, return w/o clearing INTM
        RETE                  ; Else clear INTM (allow interrupts)

10 - 18


Using Flag for NMI Return Status

VecN: BD IsrN Jump to ISR in 2 words
SSBX TC Set flag (non Interruptible)
NOP Room for 1 more...
IsrN: …

RETED Return from ISR in 2 cycles
RSBX TC Clear flag
NOP Last word...

NMI: … NMI ISR code starts here



RETC TC If TC=1 ret to ISR w INTM=1
RETE Else allow interrupts and return
10 - 19

Fast Interrupts
Allows 3-cycle ISR, e.g.:

RINT0: NOP
RETFD
MVKD DRR0,*AR7+%

u Only 2 words of code may follow RETFD.


u One word may precede the RETFD
u Creates “unsupervised” action.

10 - 20


INTR and TRAP Instruction


[Label] INTR k ;0 ≤ k ≤ 31
[Label] TRAP k ;0 ≤ k ≤ 31

INTR = TRAP + 1 → INTM

k     Interrupt   Offset          k     Interrupt   Offset
0     RS          0h              16    INT0        40h
1     NMI         4h              17    INT1        44h
2     SINT17      8h              18    INT2        48h
3     SINT18      Ch              19    TINT        4Ch
4     SINT19      10h             20    RINT0       50h
5     SINT20      14h             21    XINT0       54h
...   ...         ...             22    RINT1       58h
15    SINT30      3Ch             23    XINT1       5Ch
                                  24    INT3        60h
                                  25    Reserved    64h
                                  ...   ...         ...
                                  31    Reserved    7Ch
10 - 21

RESET Instruction

Math:     0 → OVA, 0 → OVB, 0 → OVM, 1 → C, 0 → C16, 0 → ASM, 0 → FRCT, 1 → SXM
Memory:   0 → OVLY, 0 → DROM, ? → MP/MC-
Pins:     1 → XF, 1 → CLKEN, 0 → AVIS, 0 → HM
Misc:     0 → BRAF, 0 → DP, 0 → CPL, 0 → CMPT, 0 → ARP, 1 → INTM
Other:    (IPTR)<<7 → PC, 0 → IFR, 0 → CLOCK OFF (C541)

10 - 22


Interrupt Vector Address


PMST register:

Bit      15-7    6        5      4      3      2        1-0
Field    IPTR    MP/MC-   OVLY   AVIS   DROM   CLKOFF   Reserved

After reset, IPTR = 1 1111 1111b (1FFh).

Interrupt vector address:

Bit      15-7    6-2             1-0
Field    IPTR    Vector number   00

Reset (vector number 00000): 1111 1111 1000 0000b = FF80h
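The address of any vector can be computed from IPTR and the vector number k used by INTR and TRAP. A minimal Python sketch (the function name is illustrative):

```python
def vector_address(iptr, k):
    """Address of interrupt vector k: IPTR in bits 15-7, k in bits 6-2."""
    if not 0 <= iptr <= 0x1FF:
        raise ValueError("IPTR is a 9-bit field")
    if not 0 <= k <= 31:
        raise ValueError("vector number must be 0 to 31")
    return (iptr << 7) | (k << 2)

print(hex(vector_address(0x1FF, 0)))    # reset vector with the reset value of IPTR
print(hex(vector_address(0x1FF, 17)))   # INT1 (k = 17, offset 44h)
```

Changing IPTR relocates the whole 128-word vector table to another 128-word boundary.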

10 - 23

Timer Block Diagram

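[Figure: timer block diagram. CLKOUT clocks the 4-bit prescaler counter PSC, which is paired with the 4-bit TDDR; PSC clocks the 16-bit counter TIM, which is paired with the 16-bit PRD and generates TINT.]

TINT rate = 1 / [ TCLK1 x ( TDDR+1 ) x ( PRD+1 ) ]

where TCLK1 is the CLKOUT period.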
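The rate formula is easy to evaluate or invert when choosing register values. The Python sketch below assumes a 25 ns CLKOUT period in its example; the function names and the sample register values are hypothetical.

```python
def tint_rate(clkout_period_s, tddr, prd):
    """Timer interrupt rate in Hz for the given TDDR and PRD values."""
    return 1.0 / (clkout_period_s * (tddr + 1) * (prd + 1))

def prd_for_rate(clkout_period_s, tddr, rate_hz):
    """PRD value that gives approximately rate_hz for a fixed TDDR."""
    return round(1.0 / (clkout_period_s * (tddr + 1) * rate_hz)) - 1

rate = tint_rate(25e-9, tddr=9, prd=3999)    # about 1 kHz
prd = prd_for_rate(25e-9, tddr=0, rate_hz=8000)
```

Because both counters reload at zero, each register holds one less than the divide ratio it produces.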

10 - 24


Timer Control Register


Bit      15-10      9-6    5     4     3-0
Field    Reserved   PSC    TRB   TSS   TDDR

PSC     Timer prescaler counter
TDDR    Timer divide down ratio
TSS     1 = stop timer, 0 = timer run (start / stop)
TRB     1 = load TIM from PRD

10 - 25

Lab 10 - Interrupt Driven Event


[Figure: lab file structure and program flow, with the steps labelled 1, 2, 3 and A, B, C, D]

VECTORS.ASM    RS, TINT
LAB9_M.ASM     START: CALL TIMER_INIT, CALL SINE_INIT, ENABLE INTS
               MAIN:  A = A + 1, LOOP
LAB9_T.ASM     TIMER_INIT ... RET
LAB9_S.ASM     SINE_INIT ... RET
               SINE_ISR: CONTEXT SAVE, OUT TO PORT0, CONTEXT RESTORE, RET
Other files    LAB9.CMD (modify), OUT.DAT
10 - 26


Lab Procedure

u Write code and get files talking

u Verify that interrupts are working

u Get sine wave working (watch DP)

u Perform/verify full context save/restore

10 - 27

Review

u What are the interrupt sources?


u How do you poll for interrupts?
u What must you set up to respond to an
interrupt?
u What conditions affect interrupt latency?

10 - 28

Hardware Interfacing

Learning Objectives

u Describe the purpose of each interface pin.
u Connect the 'C54x to various memory and
  peripheral devices.
u Identify the key timing for external reads
  and writes.
u Implement software wait states.

11 - 2

DSP54x - Hardware Interfacing 11 - 1



Module 11
Interfacing Memory and Peripherals
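[Figure: TMS320C54x connected to external program (PGM), data and I/O devices over the shared 16-bit ADDRESS and DATA buses. The 'C54x signals shown are PS, DS, IS, MSTRB, IOSTRB and R/W; each device has CS1, CS2, OE, A and DATA pins, plus WE where it can be written.]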
11 - 3

Read Timing

[Figure: read timing diagram over one 25 ns cycle showing address, data and MSTRB-, with intervals 1, 2 and 3 marked]

#   Parameter        25 ns cycle   15 ns cycle
1   Setup Address    5             3
2   Data Valid       5             3
3   Memory Speed     15            9

Notes: 1. Address timing also includes the PS, DS, IS and MSTRB signals.
       2. All times are in nanoseconds.
       3. H = one-half CLOCKOUT1 cycle time.
       4. MSTRB stays low across reads

11 - 4


Bus Collision Avoidance


u Only one external memory - No collision possible
u Multiple external memories
   u Sequential reads within one memory - MS address lines don't change
     Only one memory will respond - No bus collision possible
   u Sequential reads across memories - MS address lines change
      u Multiple devices may respond - Can yield early data collisions,
        or new device may not turn on in time - esp. if de-mux is
        used to feed CS-
      u Won't corrupt read data, since 54x reads only at end of cycle
      u Won't damage memory, since event is brief
      u Can yield noise, wastes power
u Solutions
   u If noise/power not a concern : no problem
   u Use faster memory (higher cost, power, etc)
   u Add wait state only when reading across devices (how?)
11 - 5

BSCR: Bank Switch Control Register


Bit      15-12     11      10-0
Field    BNKCMP    PS-DS   res.

BNKCMP value MSBs compared Bank Size


0 0 0 0 None 64K
1 0 0 0 15 32K
0 1 0 0 15 - 14 16K
0 0 1 0 15 - 13 8K
0 0 0 1 15 - 12 4K

Note: Use only specified values of BNKCMP.

Bit 11: If PS-DS is set, add 1 wait state when access changes
between PS and DS.
11 - 6


Interface Comparison: 2 vs 4 Phase

Device   Phases   I.Rate (ns)   SRAM (ns)   Ratio
C10      4        200           75          38%
C25      4        100           35          35%
C50      2        50            32          64%
C54x     2        25            15          60%

Four phase systems allow about 1/3 cycle for memory,
while the two phase approach offers about 2/3 cycle.

Memory: Cost vs Speed

11 - 7

Memory Interface Protocols


u Four phase memory interface is an industry standard,
  as shown in the diagram below:

[Figure: four-phase cycle (-, A, D, -) showing address, data and strobe]

u The first phase is a strobe off time to allow the
  address to stabilize
u Phase two is strobe on, with valid address
u Phase three is for sending/receiving data
u In the fourth phase the strobe goes off to latch data, allow
  data hold time, and to relinquish the bus before the next
  memory cycle
11 - 8


Memory Interface Protocols


u Read cycles do not require the dead phases, though.

[Figure: the four-phase cycle (-, A, D, -) for a read, showing address, data and strobe]

u Since the 54x is the only device 'listening' on the bus,
  it can latch data near the end of "D" time and not be
  confused by any spurious early data, nor require an
  explicit data strobe signal.
u By eliminating the 'dead' phases, the 54x is able to offer
  a much larger time window to external memory, as seen
  previously

11 - 9

Memory Interface Protocols


u Write cycles do require the dead phases, however.

[Figure: the four-phase cycle (-, A, D, -) for a write, showing address, data and strobe]

u Since here the external memory is 'listening' the protocol
  must guard against attempting a write before the address
  is stabilized (the first 'off' time) and provide a data latch
  and hold time (the last 'off' time)
u Therefore, writes must have a four phase protocol
u These two 'off' phases must be implemented with an
  extra CPU cycle each

11 - 10


External Interface Write Timing


[Figure: an external read cycle (R) followed by a write cycle (W) with a dead phase on each side, showing ADDR, DATA, MSTRB and R/W. Intervals 1, 2 and 3 are marked on the write; R/W is labelled 4H-2.]

#   Parameter                                      Formula   25 ns cycle   20 ns cycle
1   Address valid to MSTRB (min)                   2H-5      20            15
2   Data valid before MSTRB (min) (setup time)     2H-5      20            15
3   Data valid after MSTRB (min) (hold time)       H-5       8             5

Notes: 1. All times are in nanoseconds.
       2. H = one-half CLOCKOUT1 cycle time.
11 - 11

Three Cycle Write Overhead


u The two phase memory interface reduces memory cost,
but yields a three cycle write.
u How much performance is lost as a result?
u Consider the usual activity of DSP: Sum of products.

Order Data Coeffs Code Writes Cycles Ops A.B.R.


20 20 20 10 1 53 51 1.04
50 50 50 10 1 113 111 1.02
100 100 100 10 1 213 211 1.01

u In these examples we can see that the overhead due to
  external writes is small for normal DSP functions
u Can the overhead be reduced further if necessary?
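The rows of the table can be reproduced with a short calculation. This Python sketch assumes that every data, coefficient and code access takes one cycle, that each external write takes three, and that the last column is the ratio of cycles to operations; those readings of the column headings are assumptions, and the function name is illustrative.

```python
def write_overhead(data, coeffs, code, writes, write_cycles=3):
    """Return (cycles, ops, cycles/ops) for a sum-of-products kernel."""
    ops = data + coeffs + code + writes
    cycles = data + coeffs + code + writes * write_cycles
    return cycles, ops, cycles / ops

for order in (20, 50, 100):
    cycles, ops, ratio = write_overhead(order, order, 10, 1)
    print(order, cycles, ops, round(ratio, 2))
```

The single write adds two extra cycles regardless of the filter order, so its relative cost shrinks as the order grows.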

11 - 12


Write Timing Details

u Single write requires three cycles.
u Chained writes use 2N+1 cycles.
u Single write occupies the CPU for only one cycle.
u Internal write is one cycle.
Note : writing an array of internal memory to external
memory is full speed: while external bus is dead,
CPU reads next datum from internal memory - thus
N reads and N writes take ~ 2*N cycles
11 - 13

IO Memory Timing
[Figure: I/O timing diagram over two CLKOUT cycles (0 to 50 ns) showing A(15-0), R/W-, read data, write data and IOSTRB-, with their timing intervals in ns]

11 - 14


Software Wait States


SWWSR: Software Wait State Register. Data Addr: 0028h

Bit      15   14-12   11-9      8-6        5-3       2-0
Field    R    I/O     Hi Data   Low Data   Hi Prog   Low Prog

u 3 bit fields = 0 to 7 software wait states (SW-WS)
u On reset, all P/D memory is 7 WS (SWWSR = 7FFFh).
u On last SW-WS: MSC- will go LOW for 1 cycle.

[Figure: memory maps showing the Lo Prog and Hi Prog, Lo Data and Hi Data, and I/O regions]

11 - 15

Hardware Wait States


u Software wait states may not be sufficient for all systems.
u Therefore hardware wait states may be used when:
u More than 7 wait states are required
u More than 2 speeds of memory exist in a map
u Variable wait-states exist
u Hardware wait-states are not considered for 0 and 1 SWWS areas
u For 2-7 SWWS areas, the MSC- (Micro State Complete) pin falls
at the end of the last SWWS. Therefore, this signal may be used to
indicate n cycles have already transpired, upon which external delay
may be added, if required.
u Hardware wait is completed by a high signal input into the READY pin
u READY is ignored for 0 and 1 SWWS areas
u READY is sampled on falling CLOCKOUT1 (mid-cycle)
u READY is not sampled before MSC- falls

11 - 16


Mixed Wait-State Example


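SWWSR field    R    I/O   Hi Data   Low Data   Hi Prog   Low Prog
Value          x    4     1         1          7         0

[Figure: 25 ns TMS320C54x with two external program memories on the 16-bit ADDRESS and DATA buses. Lo Pgm is a 15 ns SRAM whose CS is formed by ORing PS-, MSTRB- and A15. Hi Pgm is a 200 ns EPROM whose CS is formed from PS-, MSTRB- and the opposite polarity of A15. MSC, CLOCKOUT1 and a flip-flop (FF) feed the READY input.]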
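The register value for this example can be assembled from the five 3-bit fields. The Python sketch below assumes the fields are packed in the order shown (Low Prog in the lowest three bits, I/O in bits 14-12), which agrees with the reset value of 7FFFh; the names are illustrative.

```python
# Shift of each 3-bit field, assuming Low Prog occupies the lowest bits.
SWWSR_SHIFTS = {"low_prog": 0, "hi_prog": 3, "low_data": 6,
                "hi_data": 9, "io": 12}

def pack_swwsr(**wait_states):
    """Pack wait-state counts (0 to 7 per region) into a SWWSR value."""
    value = 0
    for region, waits in wait_states.items():
        if not 0 <= waits <= 7:
            raise ValueError("each field holds 0 to 7 wait states")
        value |= waits << SWWSR_SHIFTS[region]
    return value

mixed = pack_swwsr(io=4, hi_data=1, low_data=1, hi_prog=7, low_prog=0)
print(hex(mixed))
```

The result is the constant that would be written with STM to SWWSR for the mixed example; the EPROM still needs the hardware READY logic because seven software wait states at 25 ns do not cover 200 ns.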
11 - 17

Memory Timing Summary

u All internal accesses are single cycle.
u Internal DARAM action may be two accesses per cycle.
u External-read timing is biased for read access times:
   - 15 ns for 25-ns devices
u Write-cycle timing (cost/performance tradeoffs):
   - 3 cycles for a single write
   - 2 cycles per write with multiple writes
   - 1 CPU cycle to initiate bus cycle(s)
u Software-generated wait states allow for slower memories.

11 - 18


Review Questions

u What signals select program, data or I/O?
u How many wait states can the software wait-state
  generator assign?
u What size boundaries are software wait states
  assigned in program, data and I/O?
u What is the advantage of slower write cycles?

11 - 19

Lab 11 — Hardware Interface


SWWSR field    I/O   Hi Data   Low Data   Hi Prog   Low Prog
Value          ___   ___       ___        ___       ___

STM #________, SWWSR

[Figure: TMS320C54x-40 connected over the 16-bit ADDRESS and DATA buses to three devices: PROG, an 8K EPROM (70 ns); DATA, an 8K SRAM (15 ns); and 1 ADC & 1 DAC (120 ns). The 'C54x pins shown are PS, DS, IS, MSTRB, R/W, MP/MC and IOSTRB; the devices have CS1, CS2, OE, A and D pins, plus WE where they can be written.]
11 - 20



Ports

Learning Objectives

u List the available types of ports
u Demonstrate how to initialize each
  port to a given status
u Demonstrate how to connect each
  port to external devices
u Write code to send & receive data
  to a given port
u Describe when & how a given port
  is best utilized

12 - 2

DSP54x - Ports 12 - 1

Module 12
TMS320C54x Ports

u Standard Serial Port

u Buffered Serial Port (BSP)

u TDM Serial Port

u Host Port Interface (HPI)

12 - 3

SP Pins and Signals


          Transmit   Receive
Clock     CLKX       CLKR
Frame     FSX        FSR
Data      DX         DR

12 - 4


Dual 54x Serial Port Interconnect

54x #1           54x #2
CLKX   ----->    CLKR
FSX    ----->    FSR
DX     ----->    DR
CLKR   <-----    CLKX
FSR    <-----    FSX
DR     <-----    DX

12 - 5

Serial Port Diagram

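[Figure: serial port block diagram. The CPU reaches DRR, SPC and DXR over the data bus and receives RINT and XINT. Control logic sits between these registers and the shift registers RSR and XSR. The pins are DR, FSR, CLKR, CLKX, FSX and DX.]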

12 - 6


Serial Transmit Timing Example

[Figure: transmit timing diagram showing CLKX, FSX, DX and XINT]

DX:  D7 D6 D5 D4 D3 D2 D1 D0 E7 ...

D → DXR      E → DXR
D → XSR      E → XSR

12 - 7

Maximum Data Rate Example

[Figure: maximum data rate timing diagram showing CLKX, FSX, DX and XINT]

DX:  C0 D7 D6 D5 D4 D3 D2 D1 D0 E7 E6 E5 E4 ...

D → XSR      E → DXR      E → XSR      F → DXR

12 - 8


Frame Types - Burst/Continuous

[Figure: frame sync waveforms for data words 1, 2 and 3 in burst mode and in continuous mode]

FSM = 1 : Burst
FSM = 0 : Continuous
12 - 9

Serial Port Control Register


BIT NAME Function 0 State 1 State RS
0 N/A Std / enhance mode Std Enh 0
1 DLB Digital Loopback Run Test 0
2 FO Format 16 b. 8 b. 0
3 FSM Frame Synch Mode Cont. Burst 0
4 MCM Master Clock Mode Ext’l I.Rate/4 0
5 TXM Transmit Mode Follow Lead 0
6 XRST - Transmit Reset Reset Run 0
7 RRST - Receive Reset Reset Run 0
8 IN0 Input value on CLKR In.Val=0 In.Val=1 x
9 IN1 Input value on CLKX In.Val=0 In.Val=1 x
10 RRDY Rcv Ready = RINT No Data Ready 0
11 XRDY Xmit Ready = XINT DXR Full Ready 1
12 XSREMPTY - Xmit Shft Reg Empty OK Error 0
13 RSRFULL Rcv Shft Reg Full OK Error 0
14 FREE Free Run on Break Halt FreeRun 0
15 SOFT Soft stop on Break Hard Soft 0
12 - 10


Serial Port Exercise


u Initialize the '541 XMIT port 0 as follows :
   u Frame synch is internal and burst mode
   u Word size is 16 bit
   u Run at the fastest possible speed.

        SSBX  INTM            ; Disable Ints
        ORM   #______,IMR     ; Enable Xmit Interrupt (XINT)
        STM   #______,SPC     ; Halt SP & Conf SP Ctl reg
        STM   #______,IFR     ; Clear any old Xmit ints (XINT)
        ORM   #______,SPC     ; Start Xmit process
        RSBX  INTM            ; Enable ints
12 - 11

Serial Port Exercise


u Initialize the '541 XMIT port 0 as follows :
   u Frame synch is internal and burst mode
   u Word size is 16 bit
   u Run at the fastest possible speed.

        SSBX  INTM            ; Disable Ints
        ORM   #0020h ,IMR     ; Enable Xmit Interrupt (XINT)
        STM   #00B8h ,SPC     ; Halt SP & Conf SP Ctl reg
        STM   #0020h ,IFR     ; Clear any old Xmit ints (XINT)
        ORM   #0040h ,SPC     ; Start Xmit process
        RSBX  INTM            ; Enable ints
12 - 12

DSP54x - Ports 12 - 7
Module 12

Serial Port Caveats


u On reset DX=HiZ, all other pins are inputs
u When initializing SP, use two writes:
u One to halt the SP & set desired modes
u Second to take SP out of reset
u In Continuous mode XMIT stops if no new data is present
u XMIT restarts when new data is written to DXR
u FSX is asserted to indicate new packet initiated
u After last bit sent, DX = HiZ
u Allows other devices to share bus
u Should add pullup to avoid chatter
u For self-test (DLB)
u Use TXM = 1
u If MCM = 1, CLKX -> CLKR
u if MCM = 0, CLKR -> CLKX
u For Low Power : clear MCM, XRST, RRST
12 - 13

TMS32054x Ports

u Standard Serial Port

u Buffered Serial Port (BSP)

u TDM Serial Port

u Host Port Interface (HPI)

12 - 14

12 - 8 DSP54x - Ports
Module 12

Buffered Serial Port


DMEM
0

ABU
000 800
R-Ping
BKR ARR
11 R-Pong RINT

X-Ping
BKX AXR
11 X-Pong XINT

800 1000
SP CPU
DR RINT FFFF
DRR RCV-ISR
XINT
DX DXR XMT-ISR DBUS
12 - 15

Serial Port Control Expansion Register


BIT NAME Function 0 State 1 State RS
0-4 CLKDV Clock Divisor 1:1 1 : n+1 3
5 FSP Frame Sync Polarity Active Hi Active Lo 0
6 CLKP Clk Polarity : XMIT on Rising Falling 0
7 FE Format Extension 16/8 10/12 0
8 FIG Frame Ignore See 2nd No 2nd 0
9 PCM Pulse Code Mod’n Normal neg=HiZ 0
10 BXE Buffer XMIT Enable Std SP BSP on 0
11 XH XMIT Half in 2nd in 1st 0
12 HALTX Halt XMIT Continue Halt on Int 0
13 BRE Buffer RCV Enable Std SP BSP on 0
14 RH RCV Half in 2nd in 1st 0
15 HALTR Halt RCV Continue Halt on Int 0

12 - 16

DSP54x - Ports 12 - 9
Module 12

Buffered Serial Port Exercise


Initialize the Serial Port to Transmit using:
FSM = Burst TXM = Ext’l Polarities = 0
Clock = Int’l Rate = Max. Format =10bit
PCM = off. ABU(X) = on Halt = off
Rcv.Locn = 800h Array size = 100h

____ ____ Disable interrupts


____ #____h , ____ Work on MMRs
____ #____h , ____ Enable XMIT Int (XINT)
____ #____h , ____ Config. SPC (XRST=0)
____ #____h , ____ Config. SPCE (ABU on)
____ #____h , ____ Init AXR to start of buffers
____ #____h , ____ Init buffer size
____ #____h , ____ Clear any old XINT
____ #____h , ____ Start SPI XMIT
____ ____ Enable interrupts

12 - 17

Buffered Serial Port Solution


Initialize the Serial Port to Transmit using:
FSM = Burst TXM = Ext’l Polarities = 0
Clock = Int’l Rate = Max. Format =10bit
PCM = off. ABU(X) = on Halt = off
Rcv.Locn = 800h Array size = 100h

SSBX INTM Disable interrupts


LD #0000h , DP Work on MMRs
ORM #0020h , IMR Enable XMIT Int (XINT)
STM #0098h , BSPC Config. SPC (XRST=0)
STM #0480h , BSPCE Config. SPCE (ABU on)
STM #0800h , AXR Init AXR to start of buffers
STM #0200h , BKX Init buffer size
ORM #0020h , IFR Clear any old XINT
ORM #0040h , BSPC Start SPI XMIT
RSBX INTM Enable interrupts
12 - 18

12 - 10 DSP54x - Ports
Module 12

Interrupt Service Routine for BSP


Respond to RINT of Buffered Serial Port
with iteration counter

cntr  .usect “SPRAM”, 1         Allocate one word in SPRAM
      .text
      ANDM #0000h , *(cntr)    Clear counter location
      ...                      Other code...
RINT  LD   #0000h , DP         Work on MMRs
      BIT  1 , BSPCE           Extract RH: ping or pong?
      ...                      Do (at least) two other words...
      XC   2 , TC              If in ping,
      STM  #0800h , AR7        reset AR7 to top of ping
      ADDM #0001h , cntr       Increment counter
      CMPM #count , cntr       Is counter at next to last iteration?
      XC   2 , TC              If so, tell ABU to
      ORM  #8000h , BSPCE      stop after next RCV of next array
      ...                      Process current array . . .
12 - 19

ABU Caveats
u BKX, BKR range : min = 2, max = 2047 (not 2048)

u Buffers must be aligned on a 2^N > BK boundary

u Odd size arrays have ping = pong+1

u AXR, ARR, BKX, BKR are all 11-bit registers

u If AXR and ARR are not initialized to base of ‘ping’ arrays, 1st data set will be incomplete

u Xmit & Rcv arrays can overlap to extend array size
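
The range and alignment caveats above can be checked before the ABU is started. This is a minimal sketch under stated assumptions: the function names are hypothetical, and "aligned on a 2^N > BK boundary" is read as "the base address is a multiple of the smallest power of two strictly greater than BK".

```c
#include <stdint.h>
#include <stdbool.h>

/* Smallest power of two strictly greater than bk.
   Safe for bk <= 2047: the result is at most 2048. */
uint16_t abu_boundary(uint16_t bk)
{
    uint16_t p = 1;
    while (p <= bk) {
        p <<= 1;
    }
    return p;
}

/* True when bk is in the legal range (2..2047) and base sits on
   the required power-of-two boundary. */
bool abu_buffer_ok(uint16_t base, uint16_t bk)
{
    if (bk < 2 || bk > 2047) {
        return false;
    }
    return (base % abu_boundary(bk)) == 0;
}
```

For example, a buffer of 200h words needs a 400h boundary, so a base of 800h passes the check and a base of 900h does not.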


12 - 20

DSP54x - Ports 12 - 11
Module 12

Using ABU with Overlapping Arrays

Input : 1A 2B 3A 4B 5A 6B

Process : 1A 2B 3A 4B 5A

Output : 1A 2B 3A 4B ...

u Initiate ABU to receive Array1 in ‘ping’


u RHALT should not be necessary
On RINT begin to process Array1
When finished initiate ABU to send Array1, Wait for RINT
On RINT begin to process Array2 in ‘pong’
ABU RCV will return to ‘ping’, but will not pass ABU XMIT if rates are equal
ABU Xmit may surpass CPU, so XHALT is recommended

12 - 21

TMS32054x Ports

u Standard Serial Port

u Buffered Serial Port (BSP)

u TDM Serial Port

u Host Port Interface (HPI)

12 - 22

12 - 12 DSP54x - Ports
Module 12

TDM Serial Port


TDM
7 0
6 1

5 2
4 3

u Eight or more devices may share the bus.


u Any ‘54x may own multiple slices and/or listener
IDs.
u During its slice, a ‘54x may talk to any combination
of listeners.
u May also be used as regular serial port.
12 - 23

TDM Four-Wire Bus

Device 0    Device 1    ...    Device 7
TCLK
TFRM
TDAT
TADD

TCLKX TCLK
TCLKR
TFSX TFRM
TMS320C54x
TDX TDAT
TDR
TFSR TADD

12 - 24

DSP54x - Ports 12 - 13
Module 12

TDM Signals

TCLK ...

TFRM ...

TDAT ...
bit 1 7 bit 0 7 bit 15 0 bit 14 0 bit 13 0 bit 12 0

TADD ...
a7 a6 a5 a4

u Any one ’C5x generates the clock and frame signals.


u Data to transmit and listener address are generated
by the ’C5x which "owns" the current signal.
u All ’C5x’s capture address and data bits. TDM RCV interrupt
is generated if routing list includes device’s ID.
12 - 25

TDM Port Registers


TRCV : receive data, bits 15-0
TDXR : transmit data, bits 15-0

bit  TSPC      TCSR  TRTA  TRAD
15   res       x     ta7   x
14   res       x     ta6   x
13   rsrfull   x     ta5   x2
12   xsrempty  x     ta4   x1
11   xrdy      x     ta3   x0
10   rrdy      x     ta2   s2
9    in1       x     ta1   s1
8    in0       x     ta0   s0
7    rrst      ch7   ra7   a7
6    xrst      ch6   ra6   a6
5    txm       ch5   ra5   a5
4    mcm       ch4   ra4   a4
3    fsm       ch3   ra3   a3
2    fo        ch2   ra2   a2
1    dlb       ch1   ra1   a1
0    tdm       ch0   ra0   a0

TCSR ch7-ch0 : my time(s) to talk
TRTA ta7-ta0 : routing list for my xmit data
TRTA ra7-ra0 : my listener ID
TRAD x2-x0 : current time
TRAD s2-s0 : time when last msg for me came in
TRAD a7-a0 : current routing list
any ra# + a# both = 1 means “msg for me”
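
The "msg for me" rule from the register slide is a bitwise AND of the listener ID (low byte of TRTA) with the current routing list (low byte of TRAD). The C sketch below only models that rule on plain integers; the function names are hypothetical, and the field positions are taken from the slide rather than from a data sheet.

```c
#include <stdint.h>
#include <stdbool.h>

/* A message is addressed to this device when any bit is set in
   both ra7..ra0 (TRTA low byte) and a7..a0 (TRAD low byte). */
bool tdm_msg_for_me(uint16_t trta, uint16_t trad)
{
    return ((trta & trad) & 0x00FFu) != 0;
}

/* Current time slot: x2..x0 in TRAD bits 13..11. */
unsigned tdm_current_slot(uint16_t trad)
{
    return (trad >> 11) & 0x7u;
}

/* Slot in which the last message for this device arrived:
   s2..s0 in TRAD bits 10..8. */
unsigned tdm_last_msg_slot(uint16_t trad)
{
    return (trad >> 8) & 0x7u;
}
```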
12 - 26

12 - 14 DSP54x - Ports
Module 12

TMS32054x Ports

u Standard Serial Port

u Buffered Serial Port (BSP)

u TDM Serial Port

u Host Port Interface (HPI)

12 - 27

HPI Concept

HOST ’54x MMRs 0000h


HPIC
CONTROL Bk 0
HPI 0800h
10 Bk 1
HPIC (BSP)
000h 1000h
DATA Bk 2
CPU
8 (HPI)
800h 1800h
HPIA

FFFh

12 - 28

DSP54x - Ports 12 - 15
Module 12

HPI Control Signals


Pin From Function
HBIL Host 0 1st Byte 1:2nd Byte
HCNTL0 Host 00 Control 01 Address
HCNTL1 11 Data 10 Data: ++W and R++
HRW- Host 0 to DSP 1:From DSP : Use R/W- or A(n)
HDS1- Host Host Pins HDS1 HDS2 HRW-
HDS2- RD- WE- RD- WE- WE-
STRB- R/W- STRB- VDD R/W-
STRB R/W- STRB gnd R/W-
HAS- Host Host w Mux(A,D): ALE -> HAS-, else VDD -> ALE
HCS- Host Chip select: Use Device select Or A(n)
HRDY- DSP to Host(Ready) if Host rate > DSP rate /5
HINT- DSP to Host Int(n) : Int from DSP

12 - 29

HPIC Register
15 - 8 7-4 3 2 1 0
Copy of 7:0 0000 HINT DSPINT SMODE BOB
Both Host DSP Host

0 BOB Byte Order Bit 0 = MSByte 1st (Big Endian)
1 = LSByte 1st (Little Endian)
1 SMOD Shared Mode 0 = Host Only Mode (HOM)
1 = Shared Access Mode (SAM)
2 DSPINT DSP Interrupt 0 = No Interrupt
1 = DSP Int’d by Host
3 HINT Host Interrupt DSP writes 1 : HINT- driven low
Host writes 1 : HINT cleared (ack)

Mode Max Rate Details
SAM 5 cycles asserted by DSP. 320 active only.
HOM 50 nS reset cond. When DSP is: active, idle 1 or 2, reset, no clock
12 - 30

12 - 16 DSP54x - Ports
Module 12

HPI Process
Columns: HCNTL0/1, B = HBIL, R = HRW-, DD = byte on HD7-0, DatL = data latch, 1234-1238 = memory contents

INFO   0/1 B  R  DD  HPIC  HPIA  DatL  1234  1235  1236  1237  1238
Ctrl.  00  0  0  00  0000  XXXX  XXXX  0102  0304  0506  0708  090A
       00  1  0  00  0000  XXXX  XXXX  0102  0304  0506  0708  090A
Addr.  01  0  0  12  0000  12XX  XXXX  0102  0304  0506  0708  090A
       01  1  0  34  0000  1234  0102  0102  0304  0506  0708  090A
W:D1.  11  0  0  AA  0000  1234  AA02  0102  0304  0506  0708  090A
       11  1  0  BB  0000  1234  AABB  AABB  0304  0506  0708  090A
W:+D2  10  0  0  CC  0000  1235  CCBB  AABB  0304  0506  0708  090A
       10  1  0  DD  0000  1235  CCDD  AABB  CCDD  0506  0708  090A
W:+D3  10  0  0  EE  0000  1236  EEDD  AABB  CCDD  0506  0708  090A
       10  1  0  FF  0000  1236  EEFF  AABB  CCDD  EEFF  0708  090A
Addr.  01  0  0  12  0000  1237  0708  AABB  CCDD  EEFF  0708  090A
       01  1  0  37  0000  1237  0708  AABB  CCDD  EEFF  0708  090A
R:D4+  10  0  1  07  0000  1237  0708  AABB  0304  0506  0708  090A
       10  1  1  08  0000  1237  0708  AABB  CCDD  0506  0708  090A
R:D5+  10  0  1  09  0000  1238  090A  AABB  CCDD  0506  0708  090A
       10  1  1  0A  0000  1238  090A  AABB  CCDD  EEFF  0708  090A
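
The address behavior in the table (pre-increment on write, post-increment on read, no increment for the "11" rows) can be replayed with a small word-level model. This is a toy sketch, not the HPI hardware: the hpi_model type and the function names are invented for the example, byte sequencing via HBIL is ignored, and only an 8-word window of memory is modelled.

```c
#include <stdint.h>

#define HPI_WINDOW 8u   /* number of words modelled (arbitrary) */

typedef struct {
    uint16_t base;              /* address held in mem[0]   */
    uint16_t hpia;              /* HPI address register     */
    uint16_t mem[HPI_WINDOW];   /* window of HPI memory     */
} hpi_model;

/* Location currently addressed by HPIA, wrapped into the window. */
static uint16_t *hpi_cell(hpi_model *h)
{
    return &h->mem[(uint16_t)(h->hpia - h->base) % HPI_WINDOW];
}

/* "Addr." rows: load HPIA. */
void hpi_set_addr(hpi_model *h, uint16_t addr) { h->hpia = addr; }

/* "W:D1." rows (code 11): write with no address change. */
void hpi_write(hpi_model *h, uint16_t data) { *hpi_cell(h) = data; }

/* "W:+D" rows (code 10): HPIA is incremented before the write. */
void hpi_write_autoinc(hpi_model *h, uint16_t data)
{
    h->hpia++;
    *hpi_cell(h) = data;
}

/* "R:D+" rows (code 10): HPIA is incremented after the read. */
uint16_t hpi_read_autoinc(hpi_model *h)
{
    uint16_t value = *hpi_cell(h);
    h->hpia++;
    return value;
}
```

Replaying the slide's sequence (address 1234h, write AABBh, auto-increment writes of CCDDh and EEFFh, address 1237h, two auto-increment reads) on this model leaves memory and HPIA in the state the table shows.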

12 - 31

’C54x Host to ’C54x HPI


’C54x Host ‘C54x HPI
D7-0 HD7-0
A2-0 A2 HCNTRL0
A1
HCNTRL1
A0
HBIL
R/W - HRW -
VCC HAS -
IS - HCS -
IOSTRB - HDS1 -
VCC HDS2 -
READY HRDY
INT1 - HINT -

12 - 32

DSP54x - Ports 12 - 17
Module 12

Motorola 68HC11F1 to ‘54x HPI


MC68HC11F1 ‘C54x HPI

PC7-0 HD7-0
PF2-0 PF2
HCNTRL0
PF1
HCNTRL1
PF0
HBIL
R/W - HRW-
VCC HAS-
CSIO2 HCS-
E HDS1-
HDS2-
NC HRDY
IRQ - HINT-

12 - 33

Intel 80C51 to ‘54x HPI


Intel 80C51BH ‘C54x HPI

P0.7:0.0 HD7-0
P0.3
HCNTRL0
P0.2
HCNTRL1
P0.1
HBIL
P0.0
HPI HRW-
SELECT
ALE LOGIC HCS-
HAS-
P3.7/ RD- HDS1-
P3.6/ WR- HDS2-
N/C HRDY
P3.2/ INT0- HINT-

Note: HCS- must be low when HDS1- or HDS2- is low.


HCS- may be tied to ground or driven low by some HPI select logic.
12 - 34

12 - 18 DSP54x - Ports
System Considerations

Learning Objectives

Become familiar with system level design considerations for the C54x like:

Boot loader
Clock options
Power management
Program security
JTAG emulation
Memory interfacing
Multiprocessor issues

13 - 2

DSP54x - System Considerations 13 - 1


13 - 2 DSP54x - System Considerations
Module 13

Boot Loader

The main function of the boot loader is to transfer user code from an external source to the program memory at power-up.

Depending on the C54x variant, the part can be booted from:

u 8 or 16 bit serial mode (SSP, BSP or TDM)


u 8 or 16 bit parallel I/O mode
u 8 or 16 bit parallel EPROM mode
u Warm boot mode
u HPI boot mode

13 - 3

Boot Sequence

If the MP/MC pin is sampled low during a hardware reset, execution begins at location 0FF80h of the on-chip ROM.

This location contains a branch instruction to the start of the boot loader program.

Unless specified otherwise, the on-chip ROM is factory programmed with the boot loader program.

13 - 4

DSP54x - System Considerations 13 - 3


Module 13

Boot Loader Operation


The boot loader program sets up the CPU status registers before
initiating the boot load ...
u Interrupts are globally disabled ( INTM = 1 )
u Internal DARAM is mapped in program / data space ( OVLY = 1 )
u 7 wait states are selected for the entire program, data and I/O spaces
u External memory bank size is set to 4K words
u 1 cycle is inserted when accesses switch between program and data space
The boot routine then reads the I/O port address 0FFFFh by driving the I/O
strobe pin low.

The lower 8 bits of the word read from this port address specify the mode of
transfer.

13 - 5

Boot Mode Selection

Test
Yes Begin execution at
Begin Initialize INT2: HPI
mode? HPIRAM

No
Read Boot Routine Selection (BRS) word from I/0 address 0FFFFh

BRS
=
?????

Serial I/O Parallel Warm


Boot Mode Boot Mode Boot Mode Boot Mode
13 - 6

13 - 4 DSP54x - System Considerations


Module 13

HPI boot
The first step of the boot loader is to check if Host Port Interface (HPI)
boot option is selected.

In order to do that, HINT is asserted low. In HPI mode, this pin is normally
tied to INT2.

If INT2 and HINT are tied together, INT2’s bit in the Interrupt Flag Register
(IFR) will be set. The bootloader waits 20 CLOCKOUT cycles after asserting
HINT and then reads IFR bit #2

•If bit #2 is a 1, control is transferred to the start of HPI RAM.
NOTE: HPI RAM must already be loaded by the host before bringing the C54x out of reset.
•If bit #2 is a 0, the boot routine skips the HPI mode.

[Diagram: the ‘C54x asserts pin HINT (0), which drives INT2 to 0 and sets bit 2 of IFR.]

13 - 7

HPI boot
Alternative methods:

If it’s inconvenient to tie INT2 and HINT together, the following methods
will work.

Send a valid interrupt to the INT2 input pin within 30 CLOCKOUT cycles
after DSP fetches the reset vector.

or ...

Use the warm boot option described later in this section. This method is
preferred.

13 - 8

DSP54x - System Considerations 13 - 5


Module 13

Serial boot
                    15       8 7    4 3    0
At address 0FFFFh   XXXXXXXX   XXkm   0n00

k = 0, standard serial port
k = 1, TDM serial port
n = 0, 8 bit
n = 1, 16 bit
m = 0, CLKX, FSX output
m = 1, CLKX, FSX input

The ‘541 serial boot option can use either the buffered serial port (BSP) or the
time-division multiplexed (TDM) serial port in standard mode during booting.

13 - 9

Serial boot process


1. Configure SPC register to put SP in reset and then pull SP out of reset. Configure BSPCE register.
2. Read DA from SP. (Note: 8 bit read is high byte then low byte.)
3. Read code length from SP.
4. Read code word from SP and save the code word in DM.
5. Transfer data from DM into PM and increment PM.
6. Code length 0?
   Yes: Branch to DA and start executing code.
   No: Decrement code length and return to step 4.
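
The copy loop in the flowchart can be sketched in C. This is a host-side illustration under explicit assumptions: the stream is taken to be an array holding DA, the code length, then the code words; pm[] stands in for program memory; and the function name is hypothetical. Read literally, the flowchart tests the length after each transfer, so a length field of N moves N+1 words. Whether the on-chip loader counts exactly this way is not confirmed here.

```c
#include <stdint.h>
#include <stddef.h>

/* Copy a boot stream into pm[] and return DA, the address the
   loader would branch to. pm[] must be large enough to hold
   indices DA .. DA + length. */
uint16_t serial_boot(const uint16_t *stream, uint16_t *pm)
{
    size_t i = 0;
    uint16_t da  = stream[i++];   /* read DA from SP            */
    uint16_t len = stream[i++];   /* read code length from SP   */
    uint16_t dst = da;

    for (;;) {
        uint16_t word = stream[i++];  /* code word, staged in DM */
        pm[dst++] = word;             /* DM -> PM, increment PM  */
        if (len == 0) {
            break;                    /* done: branch to DA      */
        }
        len--;                        /* decrement code length   */
    }
    return da;
}
```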
13 - 10

13 - 6 DSP54x - System Considerations


Module 13

I/O Parallel Boot


15 87 43 0
At address 0FFFFh XXXXXXXX XXX1 1000 8 bit mode
15 87 43 0
XXXXXXXX XXXX 1100 16 bit mode

Most common use of this mode is to boot from a slow microprocessor.

13 - 11

EPROM (Parallel) boot


15 87 21 0 AA = 01, 8 bit mode
At address 0FFFFh XXXXXXXX SRC AA AA = 10, 16 bit mode
SRC = 6 bit page address

13 - 12

DSP54x - System Considerations 13 - 7


Module 13

Warm Boot
15 87 21 0
At address 0FFFFh XXXXXXXX ADDR 11 ADDR = 6 bit page address
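
The four BRS layouts shown on the boot slides can be told apart by their low bits. The C sketch below is one possible decoder, not the ROM code: the type and function names are hypothetical, the order of the tests is an assumption, and bit 4 is treated as don't-care for I/O parallel mode.

```c
#include <stdint.h>

typedef enum { BOOT_SERIAL, BOOT_IO_PARALLEL, BOOT_EPROM, BOOT_WARM } boot_kind;

typedef struct {
    boot_kind kind;
    int word_bits;      /* 8 or 16; 0 for warm boot               */
    unsigned page;      /* 6 bit SRC / ADDR field (EPROM, warm)   */
    int tdm_port;       /* serial only: k (1 = TDM serial port)   */
    int clk_fs_input;   /* serial only: m (1 = CLKX, FSX inputs)  */
} boot_mode;

/* Decode the low 8 bits of the word read from I/O address 0FFFFh. */
boot_mode decode_brs(uint16_t brs)
{
    boot_mode m = { BOOT_SERIAL, 0, 0, 0, 0 };
    unsigned low2 = brs & 0x3u;

    if (low2 == 0x3u) {                 /* ADDR 11 : warm boot      */
        m.kind = BOOT_WARM;
        m.page = (brs >> 2) & 0x3Fu;
    } else if (low2 != 0u) {            /* SRC AA : EPROM boot      */
        m.kind = BOOT_EPROM;
        m.word_bits = (low2 == 0x1u) ? 8 : 16;
        m.page = (brs >> 2) & 0x3Fu;
    } else if (brs & 0x8u) {            /* 1000 / 1100 : I/O boot   */
        m.kind = BOOT_IO_PARALLEL;
        m.word_bits = (brs & 0x4u) ? 16 : 8;
    } else {                            /* XXkm 0n00 : serial boot  */
        m.kind = BOOT_SERIAL;
        m.word_bits = (brs & 0x4u) ? 16 : 8;
        m.clk_fs_input = (brs >> 4) & 1;
        m.tdm_port = (brs >> 5) & 1;
    }
    return m;
}
```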

13 - 13

Clock Options
The Phase Locked Loop (PLL) mode is determined at start-up by the input states on three pins: CLKMD1, CLKMD2 and CLKMD3.
These pins should not be reconfigured during normal operation.
PLL options for ‘541, ‘2 ,’3 ,’4, ‘5 and ‘6
CLKMD1 CLKMD2 CLKMD3   Option 1+                Option 2+
0      0      0        PLL x 3, ext. osc.       PLL x 5, ext. osc.
1      1      0        PLL x 2, ext. osc.       PLL x 4, ext. osc.
1      0      0        PLL x 3, int. osc.       PLL x 5, int. osc.
0      1      0        PLL x 1.5, ext. osc.     PLL x 4.5, ext. osc.
0      0      1        Divide by 2, ext. osc.   Divide by 2, ext. osc.
0      1      1        Stop mode*               Stop mode*
1      0      1        PLL x 1, ext. osc.       PLL x 1, ext. osc.
1      1      1        Divide by 2, int. osc.   Divide by 2, int. osc.
+ You can select your device with either option 1 or 2, but not both.
* PLL is disabled. System clock is not provided to CPU / peripherals.
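
As a worked example of reading the table, the Option 1 column can be held in a lookup array indexed by the three pin states. The array and function names below are hypothetical, and only the Option 1 factors from the table are encoded; a factor of 0.0 stands for stop mode.

```c
/* Clock factor for Option 1 devices.
   Index = (CLKMD1 << 2) | (CLKMD2 << 1) | CLKMD3. */
static const double clk_factor_opt1[8] = {
    3.0,   /* 0 0 0 : PLL x 3, ext. osc.     */
    0.5,   /* 0 0 1 : divide by 2, ext. osc. */
    1.5,   /* 0 1 0 : PLL x 1.5, ext. osc.   */
    0.0,   /* 0 1 1 : stop mode              */
    3.0,   /* 1 0 0 : PLL x 3, int. osc.     */
    1.0,   /* 1 0 1 : PLL x 1, ext. osc.     */
    2.0,   /* 1 1 0 : PLL x 2, ext. osc.     */
    0.5    /* 1 1 1 : divide by 2, int. osc. */
};

/* CPU clock in MHz for a given input clock and pin strapping. */
double cpu_clock_mhz(int clkmd1, int clkmd2, int clkmd3, double input_mhz)
{
    unsigned idx = (unsigned)(((clkmd1 & 1) << 2) |
                              ((clkmd2 & 1) << 1) |
                               (clkmd3 & 1));
    return input_mhz * clk_factor_opt1[idx];
}
```

With a 20 MHz input, strapping CLKMD1-3 as 0 1 0 selects the x 1.5 entry and gives 30 MHz.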
13 - 14

13 - 8 DSP54x - System Considerations


Module 13

Clock Options - ‘548

CLKMD1 CLKMD2 CLKMD3 Clock mode / CLKMD value on reset


0 0 0 1/2 with ext. source / CLKMD = 0000h
1 1 0 1/2 with ext. source / CLKMD = 6000h
1 0 0 1/2 with ext. source / CLKMD = 4000h
0 1 0 1/2 with ext. source / CLKMD = 2000h
0 0 1 1/2 with ext. source / CLKMD = 1000h
0 1 1 Stop mode / CLKMD = na
1 0 1 PLL*1 with ext. source / CLKMD = 0007h
1 1 1 1/2 with ext. source / CLKMD = 7000h

Check this and add more

13 - 15

PLL Lockup Time


Since it is an analog system, the PLL requires a lockup time before it is stable.

13 - 16

DSP54x - System Considerations 13 - 9


Module 13

Power Management

The current consumption of a DSP can vary depending on many factors including:

• the instructions being executed


• whether the external pins are exercised or not
• temperature
• supply voltage
• capacitance of the external traces

Conditions: TMS320LC548, CLKOUT = 40 MHz, Vcc = 3.0 V, room temperature, PLL x1 clock mode, internal consumption only

IDLE1        5.6 mA
IDLE2        2 mA
IDLE3        0.55 mA
Repeat NOP   0.3 mA
Inline NOP   0.4 mA
Repeat MAC   16 - 44 mA
Inline MAC   20 - 52 mA

13 - 17

Power Management Hints

Some design techniques for minimizing power consumption ...

• Minimize external trace lengths and their associated capacitance


• Set Address Visibility (AVIS) = 0
• When not being used, make sure the timer and serial ports are in reset
and MCM = 0
• Ensure all input pins are grounded or pulled high
• Set SWWR to 0 wait states when possible
• Use circular addressing instead of DMOV’s
• Use internal instead of external memory accesses
• Minimize the clock frequency to match the task required
• Implement power down modes where possible

13 - 18

13 - 10 DSP54x - System Considerations


Module 13

Power Management - IDLE


Idle mode is entered by executing the IDLE instruction:
IDLE K ( where K = 1, 2 or 3 )
The device will stay in this mode until it is interrupted

IDLE1 : All CPU activities stopped; peripherals active.
        Wake on … Reset, peripheral interrupts and external interrupts
IDLE2 : All CPU activities stopped; peripherals inactive; CLKOUT inactive.
        Wake on … Reset and external interrupts*
IDLE3! : All CPU activities stopped; peripherals inactive; CLKOUT inactive; PLL halted+.
        Wake on … Reset and external interrupts*

* Ints are not latched in idle mode; they must be low for 5 cycles to be acknowledged
+ PLL will require a transitory locking time of 50 µs for restart
! IDLE3 mode on the 545A, 546A and 548 has additional features. See
Technical Reference.
13 - 19

Power Management - HOLD


Power-down mode can also be initiated by the HOLD signal
When Hold initiates
power-down and ...

HM (in ST1) =1 The CPU stops executing and


address, data and control lines go into
high impedance state. All peripherals
remain active.
HM (in ST1) =0 The address, data and control lines
go into high impedance state. All
peripherals remain active. The CPU
continues to execute internally until
an external access occurs, at which
point the processor will halt.

Power-down mode is terminated when HOLD becomes inactive.


13 - 20

DSP54x - System Considerations 13 - 11


Module 13

Power Management - CLKOUT

All C54x devices can disable the internal clock of external interfaces using
CLKOUT, which will place the interface into a lower power consumption mode.

BSCR(0) = 0 CLKOUT pin enabled*

BSCR(0) = 1 CLKOUT pin disabled

PMST(2) = 0 CLKOUT pin enabled*

PMST(2) = 1 CLKOUT pin disabled

* Condition at Reset

13 - 21

Program Security

On-chip ROM security


ROM / RAM security

13 - 22

13 - 12 DSP54x - System Considerations


Module 13

JTAG Emulation
IEEE 1149.1
JTAG Test Bus

JTAG .. Joint Test Action Group


JTAG
Analysis Control
Block Block The JTAG port on the ‘C54x allows:
•Boundary Scan
•Emulation
Internal scan chain Through a 14 pin Test / Emulation header
( registers and state machines )
Boundary Scan
PINS
Vcc

Vcc
Header 4.7K ‘C54x
PD EMU0 EMU0
EMU1 EMU1 Header to device
GND TRST TRST lengths greater
GND TMS TMS than 6 inches
GND TDI TDI require extra circuitry
GND TDO TDO and attention
GND TCK TCK to noise.
TCK_RET TCK_RET
GND 6 inches or more

13 - 23

Multiprocessor Issues

Major how-to’s, signals involved, etc.

13 - 24

DSP54x - System Considerations 13 - 13


Module 13

‘LC548 ROM Features


The on-chip ROM of the ‘LC548 is 2K words in length and is mapped from
0F800h to 0FFFFh if the MP/MC pin is low.

Program space
0x0000
External program space
0xF800
Boot loader
0xFC00
µ-law table
0xFD00
A-law table
0xFE00
Sine lookup table
0xFF00
Built-in self test
0xFF80
Vector table
13 - 25

Run=, Load=

linker protocol
see sheet1
Load =eprom
Run=SRAM
problems with linker linking symbols

this will be the lab topic for the module

assem lang user guide


.label 13 - 26

13 - 14 DSP54x - System Considerations


Module 13

549 features

two voltages ... 2.5 core 3.3 external

power up issues

13 - 27

Level Shifting

3.3 - 5v

13 - 28

DSP54x - System Considerations 13 - 15


Module 13

Power Supply power up sequence

3.3 volt core 5v i/o

2.5v core 3.3. i/o

13 - 29

13 - 16 DSP54x - System Considerations


Using the C Compiler

Learning Objectives
u Invoke the compiler or shell program
À Options and Switches
À The RTS library
À The Optimizer
u Write code in C
À Numerical Types supported
À Accessing MMRs and IO Ports
À Inlining C and ASM functions
À Interrupt service routines
À Optimization tips
u Use the C support files :
À C.CMD : Linker file issues when using C
À BOOT.ASM Pre-main initialization process
u Intermix assembly files within the C environment
À Stack Model
À Register Usage
À Argument passing and result return
14 - 2

DSP54x - Using the C Compiler 14 - 1


14 - 2 DSP54x - Using the C Compiler
Module 14

Compiler Tool Flow
Invoking the Shell : CL500 x.c y.asm -z c.cmd

FILE.C -> C Compiler : CC500 (Parser, -o Optimizer, Code Generator) -> FILE.ASM
FILE.ASM -> Assembler : ASM500 -> FILE.OBJ
FILE.OBJ -> Linker : LNK500 -> FILE.OUT

Shell Program : CL500 runs the compiler and assembler; -z also invokes the linker
14 - 3

Common Compiler Options


Switch Description
-g Global: symbols for debugging
-s Source: C interlist in ASM file
-al Assembler: List file request
-as Assembler: global Symbols
-ms Model Size - optimize for size
-mn Model Normal - full opt. despite -g
-o0 Optimize register use
-o1 Opt. -o0 + local opt.
-o2 Opt. -o1 + global opt.
-o3 Opt. -o2 + file opt.
-oe Eliminate dead code
-x Enable Inlining and -o3
-z LNK500 invoked (link options follow)
14 - 4

DSP54x - Using the C Compiler 14 - 3


Module 14

Compiler Switch Issues


Optimizer should be invoked incrementally:
CL500 -g test -z c.cmd             Symbols kept for debug
CL500 -g -o3 test -z c.cmd         Add optimizer, keep symbols
CL500 -g -o3 -mn test -z c.cmd     Full optimize, some symbols
CL500 -o3 test -z c.cmd            Final rev: optimize, no symbols

Preferred switches can be selected in several ways:

On command line : As above.
With batch file, eg : CL500 -g -o3 -mn %1 %2 -z c.cmd
Via environment variable : SET C_OPTION=-g -o3 -mn

14 - 5

Lab 14-a : Invoking the Compiler

1. Inspect a C file that performs the sine routine


2. Compile the file using CL500
3. Observe the resultant .ASM file
4. Load the .OUT file to the simulator
5. Run the program and
a. Verify correct results obtained
b. Benchmark cycles for sine routine
c. Note lines of code required for sine routine
6. Recompile with optimizer (-o). Repeat steps a - c
7. Compare the results of steps 5 and 6

14 - 6

14 - 4 DSP54x - Using the C Compiler


Module 14

Writing Code in C
u Invoke the compiler or shell program
À Options and Switches
À The RTS library
À The Optimizer
u Write code in C
À Numerical Types supported
À Accessing MMRs and IO Ports
À Inlining C and ASM functions
À Interrupt service routines
À Optimization tips
u Use the C support files :
À C.CMD : Linker file issues when using C
À BOOT.ASM Pre-main initialization process
u Intermix assembly files within the C environment
À Stack Model
À Register Usage
À Argument passing and result return
14 - 7

Inline Assembly
u Allows direct access to assembly language from C
u Useful for operating on components not used by C, ex:

asm ( “label RSBX INTM ” );

u Note: first column after leading quote is label field


u Avoid modifying components used by C (especially with -o )
u Long operations should be written in ASM and called from C
u main C file retains portability
u yields more easily maintained structures
u eliminates risk of interfering with registers in use by C
14 - 8

DSP54x - Using the C Compiler 14 - 5


Module 14

Accessing MMRs from C

u Using pointers to access Memory-Mapped Registers :

u Create a pointer and set its value to the assigned memory address :
volatile unsigned int *SPC_REG = (volatile unsigned int *) 0x0022;

u Read and write to the register as any other pointer :


*SPC_REG = 0xC8;

u Volatile modifier :

u Especially important with optimizer (-o)


u Tells compiler to always recheck actual memory whenever encountered
u Otherwise, optimizer might register-base value, or eliminate construct
14 - 9

Accessing I/O Ports from C

Accessing I/O Ports from C : ioport type portHEXNO


1. create the port : ioport unsigned port8000

2. access the port : x = port8000 ;


port8000 = y ;

Accessing I/O Ports from ASM : Label PORTR 8000h,x


PORTW y,8000

Accessing I/O Ports from Simulator : ma 0x8000,2,1,ioport
mc 0x08000,2,1,out.dat,W
mc 0x8000,2,1,in.dat,R

14 - 10

14 - 6 DSP54x - Using the C Compiler


Module 14

Interrupts in C

u Interrupt Service Routine


À C function to run when interrupt occurs
À All necessary context save/restore performed
automatically
u Interrupt Initialization Code
À Should be called prior to run-time process
À Interrupt status may be modified during run-time
u Interrupt Vector Table
À Written in ASM

14 - 11

Writing ISRs in C
int x[100] ;
int *p = x ;

main() { … } ;

interrupt void name(void)
{
    static int y = 0 ;
    y += 1 ;
    if (y < 100)
        *p++ = port0001;
    else
        asm(" intr 17 ");
}

u Global variables allow sharing of data between main functions & ISR
u Keyword : interrupt
u Name of ISR function
u Void input and return values
u Locals are lost across calls; statics persist across calls
u ISRs should not include calls
u Return is with enable (RETE)
u Avoid -e or -oe options

14 - 12

DSP54x - Using the C Compiler 14 - 7


Module 14

Initializing Interrupts in C
Setup pointers to IMR & IFR. Initialize IMR, IFR, INTM :
volatile unsigned int *IMR = (volatile unsigned int *) 0x0000;
volatile unsigned int *IFR = (volatile unsigned int *) 0x0001;

*IFR = 0xFFFF;
*IMR = 0xFFFF;

asm(" RSBX INTM ");

Create Vector Table :

.sect ".vectors"
…
B ISR1
nop
nop
…

Compiled ISR Sequence :

u I$$SAVE performs context save (from RTS.LIB)
u ISR function runs
u I$$RESTORE performs context restore (RTS.LIB)
u RETE - Return with Enable
14 - 13

Numerical Types in C
    xxxx xxxx xxxx xxxx                       16-bit int
  * yyyy yyyy yyyy yyyy                       16-bit int
    zzzz zzzz zzzz zzzz zzzz zzzz zzzz zzzz   32-bit product

z = ((long)(x) * (long)(y)) >> 15;    z (Q15)
z = x * y;                            z (Q0)

u short, char, etc, all occupy full 16-bit memories
u no byte-addressing/packing on ‘54x
u float operations supported via rts.lib
u float math is multicycle
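
The Q15 expression above can be tried on a host compiler. This minimal sketch uses the fixed-width types from stdint.h in place of the ‘54x int and long, and the function name q15_mul is hypothetical. It assumes the compiler shifts negative values arithmetically (true of common compilers, but implementation-defined in C), and it does no saturation, so -1.0 times -1.0 does not fit the result.

```c
#include <stdint.h>

/* Multiply two Q15 fractions. The 16 x 16 product is a 32 bit
   Q30 value; shifting right by 15 returns it to Q15. */
int16_t q15_mul(int16_t x, int16_t y)
{
    int32_t product = (int32_t)x * (int32_t)y;
    return (int16_t)(product >> 15);
}
```

For example, 16384 represents 0.5 in Q15, and q15_mul(16384, 16384) yields 8192, which represents 0.25.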
14 - 14

14 - 8 DSP54x - Using the C Compiler


Module 14

The Optimizer

u ‘54x Specific Optimizations


u General Optimizations
u Data-flow Optimizations
u Branch & Control-flow Optimizations
u Loop Optimizations

14 - 15

‘54x Specific Optimizations

u Cost-based register allocation
ARn, A, B
u Auto-increment
*ARn+
u Block repeat
RPTB
u Delayed Branch, Call, and Return
BD, CALLD, RETD

14 - 16

DSP54x - Using the C Compiler 14 - 9


Module 14

General Optimizations

u Algebraic re-ordering
example : (a+b) - (c+d) = 6 cycles
becomes : (((a+b)-c)-d) = 4 cycles

u Constant folding
example : a = (b+4) - (c+1)
becomes : a=b-c+3
u Symbolic simplification
u Alias Disambiguation
When only one pointer accesses a given memory
array, compiler may allow registers to hold values

14 - 17

Data-flow Optimizations
u Copy propagation
Following assignment to a variable, references to the variable
are replaced with the value
u Common sub-expression elimination
If two (or more) equations perform the same sub-action,
the value is saved after the first and recalled later
u Redundant Assignment Elimination
Drop assignments not used in later equations
example (int j)
{ int a = 3;                  3 assigned to a & propagated down; a elim’d
  int b = (j*a) + (j*2);      becomes (j*5)
  int c = (j<<a);             dead var: replaced with expression
  int d = (j>>3) + (j<<b);    assignment unused - eliminated
  call (a,b,c);
}
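
The effect of these data-flow steps on the slide's example can be written out by hand. The sketch below is an illustration, not compiler output: the call_args struct is a hypothetical stand-in for the arguments passed to call(a,b,c), and the unused variable d is left out of the first version because it has no effect on the result, which is exactly why the optimizer may drop it.

```c
/* Stand-in for the three arguments of call(a, b, c). */
typedef struct { int a, b, c; } call_args;

/* The example as written (minus the unused d). */
call_args example_original(int j)
{
    int a = 3;
    int b = (j * a) + (j * 2);
    int c = (j << a);
    call_args r = { a, b, c };
    return r;
}

/* After copy propagation and simplification by hand:
   a is replaced by 3, (j*3)+(j*2) becomes j*5, j<<a becomes j<<3. */
call_args example_optimized(int j)
{
    call_args r = { 3, j * 5, j << 3 };
    return r;
}
```

Both versions return the same values for any non-negative j small enough that j << 3 does not overflow.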
14 - 18

14 - 10 DSP54x - Using the C Compiler


Module 14

Branch & Control-flow Optimizations

u Rearrange code to remove branches or


redundancies
u Unreached code is deleted
u Branch to branch is bypassed
u Conditional branch over unconditional branch
becomes single conditional branch ‘not’
u Conditional branches whose conditions are
resolved at compile-time are replaced with
unconditional branches

14 - 19

Loop Optimizations
u Loop induction variables “LIVs” are, for example, the “i” in ‘for i=…
u Process of making LIV op’s more efficient is called strength reduction, eg:

for (i=1; i<100; i++) y+=x[i];
becomes
y+=*x++ using *ARn+

counters --> BANZ or RPTB

u Often loop control variable is removed entirely - debug issue

u Other loop optimizations:
u Loop Rotation: Evaluate loop condition at end vs. beginning
u Loop Invariant Code Motion:
Move static equations out of loop & reference result only
u Inline Expansion of RTS Library Functions: small functions are
inlined, not called. Size is user specifiable, default = 10 lines
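
Strength reduction can be illustrated by writing both forms of the loop by hand. This is a sketch of the idea in portable C, not the code the ‘54x compiler emits; the function names are hypothetical. The second version replaces the index calculation with a moving pointer and a down counter, which is the shape that maps onto *ARn+ addressing and BANZ or RPTB.

```c
/* Indexed form: x[i] needs an address calculation per pass. */
int sum_indexed(const int *x)
{
    int y = 0;
    for (int i = 1; i < 100; i++) {
        y += x[i];
    }
    return y;
}

/* Strength-reduced form: a pointer walks the array and a down
   counter controls the loop (99 passes, starting at x[1]). */
int sum_pointer(const int *x)
{
    int y = 0;
    const int *p = x + 1;
    for (int n = 99; n > 0; n--) {
        y += *p++;
    }
    return y;
}
```

Both functions read x[1] through x[99] and return the same sum.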
14 - 20

DSP54x - Using the C Compiler 14 - 11


Module 14

Inlining C Functions

Code must be present in file for inlining :


Call of Fn Inline Fn
Put code in file
Put source in header : # include

call fn inline fn
Library of dual function types

call fn
Benefit - Faster: inline fn
no branch
no return
no clear of parent fn Fn ... inline fn
no setup of sub-fn
sub-fn ret
merging of fn’s with optimizer

14 - 21

Optimization Steps
u Optimize : Use -o, -mn when compiling
u Use #define instead of variables for parameters
u Globals may be faster than locals
u Minimize mixing signed & unsigned integers
u Inline short/key functions : compile with -x
u Declare function as inline
u Automatically invoked for short routines within file
u Inlines can be passed between files via header
u Give compiler project visibility
u #include sub-files within main
u Optimizer will operate over all files allowing better
inlining, register tracking, etc.
u Tune memory map via C.CMD
u Re-write key code segments in assembly
u Bulletin Board, App notes, 3rd Parties
u S/W Cooperative, Hand written
14 - 22

14 - 12 DSP54x - Using the C Compiler


Module 14

Optimization Process
Write & debug in C, benchmark

Real-time goal met ?   Y -> Done
N
Perform C & C.CMD Optimizations

Real-time goal met ?   Y -> Done
N
Profile. Convert Key Functions to ASM

Real-time goal met ?   Y -> Done
N
14 - 23

Lab 14-b : Writing Code in C

14 - 24

DSP54x - Using the C Compiler 14 - 13


Module 14

C Support Files
u Invoke the compiler or shell program
À Options and Switches
À The RTS library
À The Optimizer
u Write code in C
À Numerical Types supported
À Accessing MMRs and IO Ports
À Inlining C and ASM functions
À Interrupt service routines
À Optimization tips
u Use the C support files :
À C.CMD : Linker file issues when using C
À BOOT.ASM Pre-main initialization process
u Intermix assembly files within the C environment
À Stack Model
À Register Usage
À Argument passing and result return
14 - 25

Components of C.CMD
file1.obj            Files : list here or pass via shell
vectors.obj          Must be written in asm, listed here
-c                   Boot.asm is included
-o test.out          Output file name
-m test.map          Map file name
-i c:\filepath       Paths to search
-l rts.lib           Libraries to search - last on list
-stack 400h          Override stack size
-heap 200h           Override heap size
MEMORY
{P or D, RAM or ROM, F,M or S}   (Pgm,Data,Fast,Med,Slow)
SECTIONS
{ .vectors:> P ROM M   Vector table
.text :> P ROM F       Code
.cinit :> P ROM S      Init table for global/statics
.const :> D ROM M      Constants - several options here
.switch :> P ROM M     Case statement arrays
.bss :> D RAM M        Globals and statics
.stack :> D RAM F      Stack allocation
.sysmem :> D RAM M }   Heap allocation
14 - 26
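To connect the SECTIONS list above to source code, the following sketch annotates ordinary C objects with the section the slide associates with each kind of data. The names `counter`, `gain`, `table` and `scale` are hypothetical, and whether a given switch statement actually produces a table in .switch depends on the compiler.

```c
#include <stdlib.h>

int       counter;                    /* uninitialized global: .bss                     */
int       gain = 3;                   /* initialized global: .bss, init value in .cinit */
const int table[4] = {1, 2, 4, 8};    /* constant data: .const                          */

/* The function body itself is code: .text */
int scale(int sel, int x)
{
    int  result;                             /* local variable: .stack        */
    int *scratch = malloc(sizeof *scratch);  /* heap block: taken from .sysmem */

    if (scratch == NULL)
        return 0;
    *scratch = x * gain;

    switch (sel) {                           /* case tables, if generated: .switch */
    case 0:  result = *scratch + table[0]; break;
    case 1:  result = *scratch + table[1]; break;
    case 2:  result = *scratch + table[2]; break;
    default: result = *scratch + table[3]; break;
    }

    free(scratch);
    counter++;
    return result;
}
```

For example, `scale(0, 2)` computes 2 * 3 + 1. Sizing -stack and -heap in C.CMD then comes down to how much local and `malloc` storage code like this needs.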


Options for Handling .const


u Put a ROM in data memory for .const.
+ True constant
- Extra cost
u Link .const to a ROM whose CS- is an AND of PS- & DS-
+ Lower cost, true constants
- Reduces total memory space, extra gate
u Use: LOAD=PgmRom, RUN=DataRam, and write a
routine to copy ROM to RAM on reset.
+ Low cost
- Extra design effort, not true constants
u Use a host processor to init. constants to data RAM on reset
+ No extra cost if there is already a host and I/F
- Not true constants, extra design effort
u Use initialized globals instead of constants and link
with “-c” to auto-initialize pgm ROM to data RAM
+ Way to autoinit, good use of memory space
- RTS.LIB fns may not apply, not “true” constants

14 - 27
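The third option above (LOAD=PgmRom, RUN=DataRam plus a copy routine run on reset) can be modelled on a host machine as a plain word copy. This is a sketch only: `copy_const` and its parameters are hypothetical names, and on the target the load image sits in program space, so the real routine would more likely be written in assembly with a program-to-data move such as MVPD.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy 'len' 16-bit words from the load image (ROM) to the run
   address (RAM). Both regions are modelled as ordinary arrays here. */
void copy_const(const uint16_t *load, uint16_t *run, size_t len)
{
    assert(load != NULL && run != NULL);
    for (size_t i = 0; i < len; i++)
        run[i] = load[i];        /* one word per iteration */
}
```

Because the run copy lives in RAM, nothing prevents later code from overwriting it, which is why the slide lists "not true constants" as the drawback.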

Global and Static Variable Initialization

u Global and Static variables (G/S vars) are linked under .bss
u G/S vars with no explicit init val are assumed 0 by ANSI
u Compiler does not support the assumed 0 init value
u Solutions:
    u Initialize all G/S vars to 0 explicitly
    u Link with a specified initial value, e.g.:
          .bss:> DatRam, fill=0
    u Add an ASM routine pre-main:
          STM  #.bss,AR7
          RPTZ A,#len
          STL  A,*AR7+
14 - 28
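The pre-main ASM routine above amounts to a simple clearing loop, sketched here in C for a host machine. `clear_bss` and its parameters are hypothetical names. Note that RPTZ loads the repeat counter with its operand and the following instruction then executes operand + 1 times, so `#len` on the slide presumably stands for the word count minus one; the sketch takes the full word count instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Clear 'count' words starting at 'bss_start', mirroring the slide's
   loop with A == 0 and AR7 as the post-incremented pointer. */
void clear_bss(uint16_t *bss_start, size_t count)
{
    uint16_t *p = bss_start;     /* plays the role of AR7 */
    while (count-- > 0)
        *p++ = 0;                /* STL A,*AR7+ with A == 0 */
}
```

Explicitly initializing every global and static to 0 in the C source achieves the same result without extra startup code, at the cost of a larger .cinit table.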


BOOT.ASM - invoked with “-c”


Reset : PC <- FF80

          .ref _c_int00
FF80:     B _c_int00
          nop
          nop

_c_int00  1. Allocate stack
          2. Init SP to end of stack
          3. Initialize status bits (esp. CPL)
          4. Copy .cinit to .bss (skip if “-cr”)
          5. Call “_main”

_main     ...
14 - 29
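Step 4 of the boot sequence (copy .cinit to .bss) can be pictured as a walk over a table of initialization records. The sketch below is a host-side model, not the code in BOOT.ASM: the function name `auto_init` is hypothetical, and the record layout (length word, destination word, data words, terminated by a zero length) is an assumption that is not stated on the slide.

```c
#include <stddef.h>
#include <stdint.h>

/* Walk a .cinit-style table and copy each record's data into 'bss'.
   Assumed record layout (not given on the slide):
     word 0   : number of data words; 0 terminates the table
     word 1   : destination offset within bss
     word 2.. : the initial values */
void auto_init(const uint16_t *cinit, uint16_t *bss)
{
    while (*cinit != 0) {
        uint16_t len = *cinit++;             /* how many words follow      */
        uint16_t dst = *cinit++;             /* where they go in .bss      */
        for (uint16_t i = 0; i < len; i++)
            bss[dst + i] = *cinit++;         /* copy one initial value     */
    }
}
```

A table such as `{1, 0, 42, 2, 3, 7, 9, 0}` would set `bss[0]` to 42 and `bss[3]`, `bss[4]` to 7 and 9. With “-cr” this copy is skipped at boot, as the slide notes.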

Runtime Support Library: RTS.LIB

u Use -l rts.lib at end of file list in LINK.CMD to get access to
specified libraries as needed by prior listed files:

file1.obj      /* can access rts.lib */
-l rts.lib     /* run-time support library */
file2.obj      /* won’t access rts.lib */

u Library functions must be declared to be used in a C file,
usually via Header file.
u Headers can be inserted or included via the #include
directive
u Example - to access math.h type: #include <math.h>
u See compiler UG for full list of headers
u Note: no stdio.h - why?

14 - 30
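The ordering rule above (only files listed before the library can pull members from it) can be illustrated with a toy single-pass resolver. This is a simplified model, not the linker's actual algorithm; the `Item` type, `count_unresolved`, and the symbol names used in the usage note are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define MAX_SYMS 16

typedef struct {
    int         is_lib;    /* 0: object file, 1: library                     */
    const char *syms[4];   /* object: symbols it references;
                              library: symbols it can supply (NULL-padded)   */
} Item;

/* Return 1 if string s appears among the first n entries of list. */
static int in_list(const char *const *list, size_t n, const char *s)
{
    for (size_t i = 0; i < n; i++)
        if (list[i] != NULL && strcmp(list[i], s) == 0)
            return 1;
    return 0;
}

/* One left-to-right pass over the command file; returns how many
   references are still unresolved at the end. */
int count_unresolved(const Item *items, size_t n)
{
    const char *pending[MAX_SYMS]; size_t npend = 0;
    const char *defined[MAX_SYMS]; size_t ndef  = 0;

    for (size_t i = 0; i < n; i++) {
        if (!items[i].is_lib) {
            /* Object file: queue each reference not already satisfied. */
            for (size_t k = 0; k < 4 && items[i].syms[k] != NULL; k++) {
                const char *s = items[i].syms[k];
                if (!in_list(defined, ndef, s) &&
                    !in_list(pending, npend, s) && npend < MAX_SYMS)
                    pending[npend++] = s;
            }
        } else {
            /* Library: resolve only what is pending at this point. */
            size_t kept = 0;
            for (size_t j = 0; j < npend; j++) {
                if (in_list(items[i].syms, 4, pending[j])) {
                    if (ndef < MAX_SYMS)
                        defined[ndef++] = pending[j];
                } else {
                    pending[kept++] = pending[j];
                }
            }
            npend = kept;
        }
    }
    return (int)npend;
}
```

With the slide's ordering (file1.obj, rts.lib, file2.obj), a symbol that only file2.obj references stays unresolved; moving the library to the end of the list resolves both files' references.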


The Archiver: AR500

Command line options:   x - extract
                        r - reinstall
                        a - append

Sequence to modify lib fn, e.g.: Boot.asm:

1. extract file:            AR500 x rts.src boot.asm
2. modify as desired using ASCII editor
3. refresh both archives:   AR500 r rts.src boot.asm
                            AR500 r rts.lib boot.obj

14 - 31

Lab 14-c : C.CMD and BOOT.ASM

14 - 32


Mixing ASM into C System


u Invoke the compiler or shell program
À Options and Switches
À The RTS library
À The Optimizer
u Write code in C
À Numerical Types supported
À Accessing MMRs and IO Ports
À Inlining C and ASM functions
À Interrupt service routines
À Optimization tips
u Use the C support files :
À C.CMD : Linker file issues when using C
À BOOT.ASM Pre-main initialization process
u Intermix assembly files within the C environment
À Stack Model
À Register Usage
À Argument passing and result return
14 - 33

Calling ASM function from C


u Declare / Prototype the assembly language function
u Call the assembly language function.

extern int slope(int,int,int);
void main (void) {
int x,y,b,m ;
y = slope(b,m,x);}

u Define function name (code entry point)
u Declare function name as a global

        .def _slope
_slope: LD   *SP(1),T
        MPY  *SP(2),B
        ADD  B,15,A
        RET

Delayed-return form of the same routine:

_slope: LD   *SP(1),T
        RETD
        MPY  *SP(2),B
        ADD  B,15,A

Data Mem, stack area on entry to _slope:

A        arg b (in), mx+b (out)
SP ->    old PC
         arg. m
         arg. x
         local x
         local y
         local b
         local m

u Note: *SP(1) implies use of SP, but
requires CPL=1 to work properly
14 - 34
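When developing on a host before the assembly routine exists, a C stand-in with the same prototype can take its place. The sketch below assumes plain integer arithmetic for m*x + b and ignores the fixed-point shift used in the ASM version; the wrapper `line_at` is a hypothetical caller added for illustration.

```c
/* Prototype exactly as on the slide; on the target the body is _slope in ASM. */
extern int slope(int, int, int);

/* Host-side stand-in for the assembly routine (assumption: plain integer
   m*x + b, with the arguments in the slide's order b, m, x). */
int slope(int b, int m, int x)
{
    return m * x + b;
}

/* Hypothetical caller: evaluates the line y = m*x + b at x. */
int line_at(int b, int m, int x)
{
    return slope(b, m, x);
}
```

Keeping the argument order identical to the prototype matters because, per the slide, the first argument arrives in accumulator A and the remaining ones on the stack.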


Register Caveats for C


ST0:  ARP  TC  C  OVA  OVB  DP
ST1:  BRAF  CPL  XF  HM  INTM  0  OVM  SXM  C16  FRCT  CMPT  ASM

Registers not free on function call      Registers free on function call

Reg.    Use by C                          Reg.    Use by C
AR7     Long frame ptr                    B       Expression Analysis
SP      Stack Pointer                     T       Expression Analysis
AR1,6   Register Variables                AR0     Pointers and expressions
A       1st Arg. / Rtn Value              AR2-5   Expression Analysis
Use: FRAME N, PSHM, POPM, FRAME N         BRC     Loop reg’s (RSA,REA)

14 - 35

Lab 14-d : ASM routine in C

14 - 36
