Student Guide
DSP54x-NOTES-1.2
May 1997 Technical Training
Copyright © 1997 Texas Instruments Incorporated.
All rights reserved.
Notice
No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any
form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the
prior written permission of Texas Instruments.
Texas Instruments reserves the right to update this Guide to reflect the most current product
information for the spectrum of users. If there are any differences between this Guide and a
technical reference manual, references should always be made to the most current reference
manual. Information contained in this publication is believed to be accurate and reliable.
However, responsibility is assumed neither for its use nor any infringement of patents or rights of
others that may result from its use. No license is granted by implication or otherwise under any
patent or patent right of Texas Instruments or others.
Revision History
TMS320C54x DSP
Design Workshop
Texas Instruments
Technical Training
Introductions
• Name
• Company
• Project Responsibilities
• DSP Experience
• 320 Experience
• Hardware/Software, Asm/C
• Interests
Learning Objectives
Module 1
DSP: Sum-of-Products
$y = \sum_{n=1}^{100} x_n a_n$
Each term takes one multiply (MPY) of x by a and one add (ADD) into the running sum.
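The sum-of-products above can be sketched in C as a multiply followed by an add for each term. This is a minimal illustration, not workshop code: the function name `sum_of_products` is hypothetical, and a 64-bit integer stands in for the 40-bit accumulator.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the workshop): y = sum of x[n] * a[n].
 * int64_t stands in for the 40-bit accumulator of the 'C54x. */
int64_t sum_of_products(const int16_t *x, const int16_t *a, size_t count)
{
    int64_t acc = 0;
    for (size_t n = 0; n < count; n++) {
        int32_t product = (int32_t)x[n] * (int32_t)a[n]; /* MPY step */
        acc += product;                                   /* ADD step */
    }
    return acc;
}
```

Called with the lab data tables {1,2,3,4} and {8,6,4,2}, the function accumulates 8 + 12 + 12 + 8.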
M bus
acc A acc B
1-4
Accumulators + ALU
LD s, A
acc A acc B ALU
ADD e, A
U BUS SUB r, A
MUX STL A, t
A B M
1-5
Notes
1-6
Barrel Shifter
A B C D
S BUS
ALU W BUS
LD X, 16, A
STH B, y
1-7
Temporary Register
A
D X EXP
B
T
ex: A = xa
T BUS LD x, T
MPY a, A
MAC ALU
1-8
’C54x Buses
P
M D
INTERNAL U M EXTERNAL
X C U
MEMORY E X MEMORY
S
E
C D
T MAC A B ALU SHIFT
Notes
1 - 10
Pipeline - Concept
1 - 11
Memory Interaction
1 - 12
1 - 13
Memory Write
1 - 14
1 - 15
P PC, PA
F Program Mem,
Mem, PD
D Controller
A ARs,
ARs, DA, ARAUs
R Data Mem,
Mem, DD ; AR, ARAU, EA
X CALU (MAC, ALU) ; ED, Data Mem
1 - 16
CNTL PC ARs
M
D
INTERNAL U M EXTERNAL
X C U
MEMORY E X MEMORY
S
E
1 - 17
Notes
1 - 18
Pipeline Performance
TIME
P1 F1 D1 A1 R1 X1
P2 F2 D2 A2 R2 X2
P3 F3 D3 A3 R3 X3
P4 F4 D4 A4 R4 X4
P5 F5 D5 A5 R5 X5
P6 F6 D6 A6 R6 X6
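The timing diagram above can be turned into a simple cycle count. The sketch below assumes the six stages shown (P, F, D, A, R, X), one new instruction entering per cycle, and a caller-supplied number of stall cycles; the function name `pipeline_cycles` is hypothetical.

```c
/* Six stages per the diagram: P, F, D, A, R, X. */
enum { PIPELINE_STAGES = 6 };

/* Hypothetical helper: cycles needed to finish `instructions` instructions
 * when one enters the pipeline per cycle and `stall_cycles` extra cycles
 * are lost to conflicts. */
unsigned long pipeline_cycles(unsigned long instructions,
                              unsigned long stall_cycles)
{
    if (instructions == 0)
        return 0;
    /* The first instruction needs all six stages; each later one adds a cycle. */
    return (PIPELINE_STAGES - 1) + instructions + stall_cycles;
}
```

For the six instructions in the diagram this gives 11 cycles with no stalls, compared with 36 cycles if each instruction ran through all six stages before the next one started.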
P
54x
D
P1 F1 D1 A1 R1 X1
P2 F2 D2 A2 R2 X2
P3 F3 D3 A3 R3 X3
P4 -- -- -- F4 D4 A4 R4 X4
-- -- -- P5 F5 D5 A5 R5 X5
-- -- -- P6 F6 D6 A6 R6
1 - 20
54x 54x
P or D
D P
P1 F1 D1 A1 R1 X1
P2 F2 D2 A2 R2 X2
P3 F3 D3 A3 R3 X3
P4 F4 D4 A4 R4 X4
P5 F5 D5 A5 R5 X5
P6 F6 D6 A6 R6 X6
NO CONFLICT
1 - 21
MAC ALU
ROM DARAM
4K 1K
4K 1K
. .
. .
. .
EXT
EXT
9000
Internal
ROM ?
E000
FF80 DROM ROM ?
VECTORS FFFF
FFFF
1 - 23
EXT EXT
EXT
9000
2K ROM
9800 2K ROM
A000 4K ROM
B000
4K ROM EXT
C000
4K ROM or
D000 ROM
4K ROM
E000
4K ROM
FF80 F000
4K ROM w VECs *
FFFF VECTORS* FFFF FFFF VECTORS*
* FF80 - FFFF are the default locations for vectors.
** Internal ROM FF00 - FF7F reserved for TI test.
1 - 24
1 - 25
EXT EXT
F800
ROM
FF80
VECTORS
FFFF FFFF
1 - 26
Debugger Screen
command
menu bar
code CPU
window registers
command memory
window window
1 - 30
This module will guide you through the basic commands of the source debugger. Upon
completion of the walkthrough, you will be able to:
• Set up and manipulate windows to display variables and data structures
• Single-step C statements and/or assembly instructions
• Set breakpoints and benchmark code
• Issue debugger commands via command menus, keyboard entry, or a mouse
Note: This walkthrough is intended to demonstrate the use of the debugger interface. It is not
meant to be an opportunity to get to know the ’C54x assembly language or C. Please do
not attempt to dwell upon them, as this adds considerable time (and effort) to the process.
The assembly language will be thoroughly presented in succeeding modules.
cd \dsp54x\labs ↵
The demo program is a C file which simply loads an incrementing value to a variety of data
types. Although of little interest in terms of DSP, it is a useful platform for exercising the
debugger interface and commands.
main() {
int count;
for (;;)
for (count=0; count<1000; count++)
init(count);
}
SIM5XX lab1 ↵
The debugger assumes that the file to be loaded has a default extension of .out. We will learn
how to create output files in Module 2.
Note: If, in the process of this lab, you reach a point where the system no longer responds, or is
otherwise corrupted, you may reload the file by typing LOAD lab1 at the command
prompt. In rare cases, you may have to exit the simulator entirely by typing QUIT and
starting over.
Make the DISASSEMBLY window active. You can scroll through the code displayed in the
DISASSEMBLY window in two ways: with the keyboard up-arrow, down-arrow, PgUp, and
PgDn keys, or by pointing the mouse at the up and down arrows on the window border and
pressing the left button.
Note: Be careful. If you click while over an element of a window, you may set a breakpoint (if
you are in a FILE or DISASSEMBLY window), or select a register or memory location
for modification. To remove the breakpoint, simply point and click at the highlighted
instruction.
When you want to return to a particular label or address, use the command addr nnnn, where
nnnn is the label or address to return to. For example, type:
addr c_int00 ↵
addr 0x0005 ↵
addr main ↵
To move the window, grab the top of the window by holding the left mouse button down and
drag the window to a new position.
To restore the screen to its original state, use the “screen configuration” command with no
arguments:
sconfig ↵
To load a particular screen configuration you may specify the desired file with the SCONFIG
command:
sconfig tc.clr
To save a configuration, use the ssave <file name> command. There is no default
extension, although .CLR (for color) is the extension generally used.
Typing sconfig once again will return you to the original configuration. The sconfig
command uses the default filename init.clr. You may use either of these configurations, or
any of your own creation, whenever using the debugger.
addr c_int00 ↵
The assembly code shown at c_int00 can be single-stepped by pressing <F8> on the keyboard
or by pointing and clicking the left mouse button.
Try running a few instructions by pressing <F8> and watching the PC value (in the CPU
window) change as the corresponding instruction is executed (highlighted). Modified register and
memory contents are also highlighted.
go main ↵
Notice that the display changes to display the C program in the FILE window. Also notice that
the CPU registers are no longer displayed. The CALLS window is opened to show which C
functions have been called.
The ability to view C source code in its native format is why our debugger is termed a “source”
debugger.
Watch Window
Suppose you want to watch the value of a C variable while single-stepping the program. Type:
wa count ↵
This creates a watch window with the value of the variable count displayed. The value
displayed for count is not meaningful at this point since it has not been initialized yet. You may
discover that opening a watch window on a variable not found in the current function will
generate a warning.
<F8>
<F8>
Notice that the variable count was assigned the value zero. You should now be at the init()
function call. Press:
<F8>
and you will go to the function. Notice the change in the CALLS window.
wa i ↵
wa a[0] ↵
Notice that the display shows a[0] as a floating-point value automatically. The debugger
displays values according to their defined type.
When a watch or display is no longer needed on screen, it may be closed by first selecting it
(using <F6> or a mouse click), and then using the close window command: the <F4> key. <F4>
does not apply to the main simulator windows (CPU, MEM, Disassembly, etc.).
wa a ↵
You receive the error message Invalid watch expression because you are allowed to
watch only single, scalar values. If you have forgotten the type of the variable a, type:
whatis a ↵
To display the entire array of floating-point values, use the display command:
disp a ↵
You might want to move the DISP window over to the right of the screen.
disp example ↵
This structure has four members called i, j, k, and p. Note that they are displayed in accordance
with their type. Move this window over to the right just below the DISP: a window.
To display the contents of the array example.k, move the cursor down to highlight the line
showing k: [...] and select this by pressing:
A new window is opened which shows the elements of the array. If this had been another
structure (instead of an array), it would be shown as k: {...}. Brackets indicate arrays and
braces indicate structures.
Since this new window showing the array k is opened directly on top of the previous window,
you should move it down to make the example window visible.
restart ↵
go main ↵
Press:
<F8>
Continue executing instructions by repeatedly pressing <F8>. Observe how the values in the
watch window and display windows change. Continue stepping through the init function until
it returns to the main function. If you do not wish to see the remainder of the function in step
mode, you can complete the function and return immediately by entering:
ret ↵
Note: If you were not in a sub-function at this point, the simulator will never reach a return and,
therefore, will never halt. To stop the simulator in such an event, simply press <Esc>.
Suppose you want to single-step without seeing the details of each individual function call. You
can step across function calls using:
Next ↵
Alternatively, you can press <F10>. Notice that the next C statement is executed without
showing function calls. (Called functions are not skipped; they are just not executed in single-step
mode.)
Both the step command <F8> and next command <F10> can be executed from the command
line with an argument specifying the number of instructions to execute. For example, type:
step 10000 ↵
<Esc>
Note: The step command accepts a Boolean expression as well as a numeric count;
e.g., step (AR0 != 0)
If you are executing within the init() function and want to return, type:
ret ↵
next 10000 ↵
<Esc>
and you will see a User halt message displayed in the command window.
restart ↵
go main ↵
MIXED Mode
To debug in mixed mode, which allows you to observe assembly instructions and C statements
simultaneously, type:
mix ↵
You should see both the C source code and the corresponding assembly code. The
DISASSEMBLY window shows highlighted memory locations which are associated with the
current C statement.
You may have to move and size your display windows and watch windows to see the CPU and
REGISTER windows. A suggestion is to remove (reset) the watch window using the command:
wr ↵
<F8>
Notice that assembly instructions are stepped. If you are currently executing within the init()
function and want to return from the function, type:
ret ↵
<F10>
Continue this while observing that the assembly instruction CALL init is skipped over.
cstep ↵
or
cnext ↵
Like their counterparts, step and next, you can execute a fixed number of instructions. For
example:
cstep 10 ↵
ASM Mode
If you are interested only in debugging an assembly language program, you can switch to
assembly mode by typing:
asm ↵
Notice that the windows that display C data structures disappear when you are in assembly mode.
This is a convenient way to clear up the screen if you want to observe CPU register values or
display memory contents. Try single-stepping by repeatedly pressing:
<F8>
and observe the changing register values in the CPU window. Changed values are highlighted so
you will notice when a change occurs.
mix ↵
Review of Modes
In summary, there are three modes of operation:
• Mixed mode (mix command) shows assembly and C (if C source exists).
• Assembly mode (asm command) shows assembly code only.
• C mode (c command) automatically switches from C to assembly displays,
depending on what type of source code is executing.
restart ↵
mix ↵
go init ↵
To set a breakpoint, you can use the command ba xxxx, where xxxx is an absolute
memory location or a valid label. This method requires that you know the address (or label). For
example, type:
ba init ↵
This sets a breakpoint at the entry point to the function. Notice that the instruction is highlighted
when a breakpoint is set.
bl ↵
br ↵
bl ↵
In addition to using the ba command, you can add a breakpoint by pointing the mouse at the
line where the breakpoint is desired and pressing the left mouse button. The line that the
breakpoint is set on should now be highlighted. Pressing the left mouse button again will
remove the breakpoint.
run ↵
The program should stop at the breakpoint. If the breakpoint is not reached, press <Esc> and
verify that the breakpoint has been set (use the bl command or look at the
FILE/DISASSEMBLY window to see a highlighted instruction).
<Tab> ↵
Notice that pressing <Tab> backs up to the previous command entered. Pressing ↵ causes that
command to be executed again. In fact, you can cycle back through all previous commands you
have entered by repeatedly pressing <Tab>. Pressing <Shift><Tab> takes you forward
through this command buffer.
Let’s assume you still have a breakpoint set at the for statement. To “benchmark” the execution
time required to execute from one breakpoint to another, you need to set a second breakpoint. Go
ahead and select another instruction for a breakpoint using either <F9>, a mouse click, or the ba
command. To benchmark, type:
run ↵
runb ↵
? clk ↵
The run command executes to the first breakpoint. The runb command is the “run-with-
benchmarking” command. The ? command tells the debugger to evaluate the following C
expression and display the result. The clk debugger variable is valid only after a runb
command and is set to the number of clock cycles between the run and runb commands.
Evaluating Expressions
To evaluate a C expression, you can use the ? command. This is one way to modify register
values, since C expressions may have side effects such as assignment. Type:
? pc ↵
You should see the pc value displayed. To modify the current pc, type:
? pc = main
? ar0 = 0
To evaluate an expression without displaying the result in the COMMAND window, use the
eval command instead of the ? command. Type:
eval pc = 0 ↵
eval pc = main ↵
CPU, MEMORY, and WATCH window registers can be modified by pointing the mouse to the
desired register and pressing the left mouse button. When the register is selected, it will be
highlighted and ready for input from the keyboard.
Point to the CPU window AR0 and press the left mouse button.
Enter a new value of 5 and press ↵ when complete.
Displaying Files
You can display any file in the FILE window. Type:
file siminit.cmd ↵
You should see the debugger’s initialization command file displayed. At this point, you can go
back to debugging and the previous C source file will automatically be displayed when you start
executing instructions.
Within the debugger COMMAND window, you can perform DOS-like commands to examine
and change the current directory. Use the command dir nnnn, where nnnn is the directory
name, to display a directory listing. Type:
dir ↵
The command cd nnnn, where nnnn is the new directory name, changes the current directory.
cls ↵
<Alt>L
then repeatedly press the right arrow key to look at the drop-down menus.
The drop-down menus can also be selected by pointing and pressing the left mouse button. For
example, select the mode menu with the mouse.
quit ↵
and you should get a display that shows more detail, but may also cause more eye strain. A larger
monitor will allow you to take full advantage of the source debugger’s high resolution modes.
The -bb switch creates a 50-line display. Another switch, -b, offers an intermediate-sized 43-
line display. Your preferred display size may be made the default by saving the screen
configuration as init.clr with the ssave command described earlier. Then the need to
explicitly use the -b switch is eliminated.
take lab1.log ↵
Congratulations, you have completed the walkthrough. To exit the debugger, press:
<Esc>
Type:
quit ↵
Sizing Window
  Click on bottom right corner; drag to new shape
  Type SIZE and use arrows or type coordinates
  ZOOM              click on top left corner
  UNZOOM            click again on top left corner
Screen Configuration
  SCONFIG <name>    load configuration <name>
  SSAVE <name>      save configuration <name>
Watches and Breakpoints
  Operation   Watch   Breakpoint
  ADD         WA      BA
  RESET       WR      BR
  LIST        WL      BL
  DELETE      WD #    BD #
  or hot keys or mouse clicks
Entry/Exit
  SIM2xx <file>     start simulator with <file>.out
  SIM2xx -bb        high resolution mode
  QUIT              exit simulator
  SYSTEM            go to DOS shell
Modes
  ASM               display ASM info           or <Alt> D,A
  C                 display C info             or <Alt> D,C
  MIX               display both ASM and C     or <Alt> D,M
1 - 31
• CALU supports:
  - General-purpose operations:
    - MAC
    - ALU
  - Special functions:
    - CSSU (Viterbi)
    - EXP (Norm)
    - FIRS: MAC + ALU
  - 16- or 32-bit operations:
    - C16 mode
    - 'Double' operations
1 - 33
1 - 34
Learning Objectives
• Describe steps to create executable output files
• Create an assembly file containing:
  - Code
  - Constants (initialized data)
  - Variables
• Create a linker command file which:
  - Identifies input and output files
  - Describes a system's available memory
  - Indicates where code and data shall be located
• Develop multi-file systems
2-2
Module 2
-o
.asm .obj .out
Text
ASM500 LNK500 Debug
Editor
-L -m
.lst .map
HEX500
2-3
•Contains DSP
EVM500 •ISA Card
•ISA card
XDS510 •No DSP
•PC<-> Target
2-4
2-5
Assembly Files
• Describe steps to create executable output files
• Create an assembly file containing:
  - Code
  - Constants (initialized data)
  - Variables
• Create a linker command file which:
  - Identifies input and output files
  - Describes a system's available memory
  - Indicates where code and data shall be located
• Develop multi-file systems
2-6
Assembly Conventions
tabs or spaces
colon optional          instruction or directive
2-7
Assembly Files
• Mnemonics
  - Lines of 320 code
  - Generally written in upper case
  - Become components of program memory
• Directives
  - Begin with a period (.) and are lower case
  - Can create constants and variables
  - May occupy no memory space when used to control ASM and LNK process
2-8
Type                Examples
Binary              1110001b or 11111001B
Octal               226q or 572Q
Decimal             1234 or +1234 or -1234 (Default type)
Hexadecimal         0A40h or 0A40H or 0xA40
Floating-point      1.623e-23 (sign and decimal point optional)
Character           'D'
Character strings   "this is a string"
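The integer notations in the table can be illustrated with a small C parser. This is only a sketch of the suffix and prefix rules listed above (b, q, h, 0x, decimal by default); it is not the assembler's actual implementation, and the function name `parse_asm_constant` is hypothetical.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: convert an integer constant written in one of the
 * notations from the table. Returns 1 on success, 0 on malformed input. */
int parse_asm_constant(const char *text, long *value)
{
    size_t len = strlen(text);
    char digits[32];
    int base = 10;                     /* decimal is the default type */
    char *end;

    if (len == 0 || len >= sizeof digits)
        return 0;
    memcpy(digits, text, len + 1);

    if (len > 2 && digits[0] == '0' && (digits[1] == 'x' || digits[1] == 'X')) {
        base = 16;                     /* 0xA40 form: drop the prefix */
        memmove(digits, digits + 2, len - 1);
    } else {
        switch (tolower((unsigned char)digits[len - 1])) {
        case 'b': base = 2;  digits[len - 1] = '\0'; break;  /* 1110001b */
        case 'q': base = 8;  digits[len - 1] = '\0'; break;  /* 226q     */
        case 'h': base = 16; digits[len - 1] = '\0'; break;  /* 0A40h    */
        default:  break;
        }
    }
    if (digits[0] == '\0')
        return 0;
    *value = strtol(digits, &end, base);
    return *end == '\0';
}
```

Note why the hexadecimal examples start with a digit: 0A40h begins with 0 so that it cannot be mistaken for a symbol name.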
2-9
Coding Example: z = x + y
Code
        .text
start   LD   x,A        ; get x
        ADD  y,A        ; add y
        STL  A,z        ; store z
        B    start      ; loop
Constants
        .data
x       .int 2          ; x = 2
y       .int 7          ; y = 7
Variables
        .bss z,1
2 - 10
2 - 11
Assembler
Directive Example Definition
2 - 12
; a = 0,1,2,3,4
; x = input array of length 5 a 0
; y = result array of length 1 1
2
3
4
_______
2 - 13
table 1 x
2
3
4
8 a
6
4
2
0 y
2 - 14
2 - 15
Linking
• Describe steps to create executable output files
• Create an assembly file containing:
  - Code
  - Constants (initialized data)
  - Variables
• Create a linker command file which:
  - Identifies input and output files
  - Describes a system's available memory
  - Indicates where code and data shall be located
• Develop multi-file systems
2 - 16
Linking
• Files: input and output
• Memory description
• How to place s/w into h/w
link.cmd
.map
2 - 17
Example System
Program Data
Memory Memory
’C54x
8000 4000
code SRAM
var
FFFF PROM 6000
8000
EPROM
const A000
2 - 18
2 - 20
8000
32K
EPROM
2 - 21
MEMORY
{ PAGE ___: /* Program Memory */
______: org = ______, len = ______
______: org = ______, len = ______
________:
______: org = ______, len = ______
______: org = ______, len = ______
______: org = ______, len = ______
}
SECTIONS
{ .text: > EPROM PAGE 0
.bss: > SPRAM PAGE 1
.data: > DEPROM PAGE 1
}
2 - 22
2 - 23
Multiple Sections
Program Data
Memory ROM RAM Memory
code SRAM
’C54x
var
EPROM
EPROM DEPROM
vectors const
Named Sections
2 - 26
Adding Reset
sum.asm                         vectors.asm
        .def  start                     .ref  start
        .text                           .sect ".vectors"
start   LD    x,A                       B     start
        ADD   y,A
        STL   A,z
        B     start
        .data
x       .int  2
y       .int  7
        .bss  z,1
2 - 27
x
a .bss
y
start NOP
NOP
NOP .text
B start table 1 2 3 4
.data
8 6 4 2
FF80 B start
.vectors
2 - 29
Procedure
1. Create VECTORS.ASM
2. Copy LAB2B.ASM to LAB2D.ASM
Modify LAB2D to make start accessible
3. Assemble LAB2D and VECTORS
4. Copy LAB2C.CMD to LAB2D.CMD
Modify LAB2D.CMD to specify the desired input
and output files and the routing of the RESET vector
5. Link the system and inspect the .MAP file
6. Step through the code on the simulator to verify
performance.
2 - 30
2 - 31
MEMORY
{ Page 0:
SRAM: org = 0000h , len = 4000h
EPROM: org = 0E000h , len = _____
VECS: org = _____ , len = _____
Page 1:
SPRAM: org = 0060h , len = 0020h
DARAM: org = 0100h , len = 0400h
DEPROM: org = 8000h , len = 8000h
}
SECTIONS
{ .text: > EPROM PAGE 0
.data: > DEPROM PAGE 1
.bss: > SPRAM PAGE 1
_____: > _______________
}
2 - 34
2 - 35
____        .data
____ x      .int
____ y      .int
____        .bss z
2 - 36
LAB2A.ASM : Solution
; SOLUTION FILE FOR LAB2A.ASM
        NOP
start:  NOP
        NOP
        B    start
2 - 37
; a = 0,1,2,3,4
; x = input array of length 5 a 0
1
; y = result 2
3
4
.data
a .int 0,1,2,3,4 x
.bss x,5
.bss y,1 y
2 - 38
LAB2B.ASM : Solution
        .bss  x,4
        .bss  a,4
        .bss  y,1
        .data
        .word 1,2,3,4
        .word 8,6,4,2,0
        .text
        NOP
start:  NOP
        NOP
        NOP
        NOP
        B     start
2 - 39
MEMORY
{ PAGE 0: /* Program Memory */
SRAM : org = 0000h , len=4000h
EPROM : org = 8000h , len = 8000h
PAGE 1: /* Data Memory */
SPRAM : org = 0060h , len = 0020h
DARAM : org = 0080h , len = 1380h
DEPROM: org = 8000h , len = 8000h
}
SECTIONS
{ .text: > EPROM PAGE 0
.bss: > SPRAM PAGE 1
.data: > DEPROM PAGE 1
}
2 - 40
LAB2C.CMD : Solution
lab2b.obj
-o lab2c.out
-m lab2c.map
MEMORY {
  PAGE 0: EPROM : org = 0E000h  len = 02000h
  PAGE 1: SPRAM : org = 00060h  len = 00020h
          DARAM : org = 00080h  len = 01380h
          DEPROM: org = 08000h  len = 08000h
}
SECTIONS {
  .text : > EPROM  PAGE 0
  .data : > DEPROM PAGE 1
  .bss  : > SPRAM  PAGE 1
}
2 - 41
MEMORY {
Page 0:
SRAM: org = 0000h , len = 4000h
EPROM: org = 0E000h , len = 1F80h
VECS: org =0FF80h , len = 0080h
Page 1:
SPRAM: org = 0060h , len = 0020h
DARAM: org = 0100h , len = 0400h
DEPROM: org = 8000h , len = 8000h
}
SECTIONS {
.text: > EPROM PAGE 0
.data: > DEPROM PAGE 1
.bss: > SPRAM PAGE 1
.vectors:> VECS PAGE 0
}
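The MEMORY arithmetic above is worth checking by hand: EPROM starts at E000h with length 1F80h, so it ends exactly where VECS begins at FF80h. The C sketch below checks a memory map for overlapping regions on the same page. The struct and function names are hypothetical illustrations and are not part of the linker.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one MEMORY entry from a linker command file. */
typedef struct {
    const char *name;
    int page;        /* 0 = program memory, 1 = data memory */
    uint32_t org;    /* origin (start address) */
    uint32_t len;    /* length in words */
} mem_region;

/* Two regions collide only if they share a page and their ranges intersect. */
int regions_overlap(const mem_region *a, const mem_region *b)
{
    if (a->page != b->page)
        return 0;
    return a->org < b->org + b->len && b->org < a->org + a->len;
}

/* Returns 1 if any pair of regions in the map overlaps, otherwise 0. */
int any_overlap(const mem_region *regions, size_t count)
{
    for (size_t i = 0; i < count; i++)
        for (size_t j = i + 1; j < count; j++)
            if (regions_overlap(&regions[i], &regions[j]))
                return 1;
    return 0;
}
```

With the values from the command file, no pair overlaps. Raising the EPROM length to 2000h would make it run into VECS, which is why the vector area is carved out of the top of the EPROM range.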
2 - 42
VECTORS.ASM
        .ref  start
        .sect ".vectors"
        B     start
LAB2D.ASM
        .def  start
        .bss  x,4
        .bss  a,4
        .bss  y,1
        .data
        .word 1,2,3,4
        .word 8,6,4,2,0
        .text
        NOP
start:  NOP
        NOP
        NOP
        NOP
        B     start
2 - 44
LAB2D.CMD : Solution
lab2d.obj
vectors.obj
-o lab2d.out
-m lab2d.map
MEMORY {
  PAGE 0: EPROM : org = 0E000h  len = 01F80h
          VECS  : org = 0FF80h  len = 00080h
  PAGE 1: SPRAM : org = 00060h  len = 00020h
          DARAM : org = 00080h  len = 01380h
          DEPROM: org = 08000h  len = 08000h
}
SECTIONS {
  .vectors: > VECS   PAGE 0
  .text   : > EPROM  PAGE 0
  .data   : > DEPROM PAGE 1
  .bss    : > SPRAM  PAGE 1
}
2 - 45
Learning Objectives
Learning Objectives
Module 3
Addressing Modes
Type Symbol Purpose, Benefit
Immediate # Using constants/initialization
Long 16-bit values
Short Single cycle
Indirect * Support for pointers - access arrays, lists, tables
w. Inc/Dec 0 cycle auto increment/decrement by +/- 1
w. Index 0 cycle auto increment by “n”
Direct <default> General-purpose access to data
Absolute - or - Access any location in data memory - ‘flat memory’
Paged @ Single-cycle access within boundary
SP-relative Optimal for stack-based values (C)
MMR Optimal for DP0 values (MMR and SPRAM)
Register Operate between Acc A and B
3-3
Immediate Addressing
• Long Immediate
  - Allows use of constant
  - Up to 16-bit operand
  Example: LD #1234h,A
• Short Immediate
  - Available in limited cases
  - Init. Acc (8), DP (9), ASM (5), etc.
  Example: Load A # 1 2
3-4
Indirect Addressing
• Hardware support of pointer concept
• Eight ARs (Address or Auxiliary Registers) available
• AR0 also used as (optional) index
• Allows fast, efficient access to arrays, lists, tables, etc.
Example
$y = \sum_{n=1}^{100} x_n$
        .bss  x,100
        .text
        STM   #x,AR1
        LD    *AR1+,A
        ADD   *AR1+,A
        ADD   *AR1+,A
        ...
        STL   A,y
Data: AR1 points into the array x (x1, x2, x3, ..., x100), which is followed by y.
3-5
Indexed Addressing
• Add step size option to auto increment.
• AR0 holds step size.
• Mode selected by using *ARn+0 or *ARn-0.
• Pre-mod fixed index w. extra cycle: *+ARn(K)
Example
$y = \sum_{n=1}^{100} x_{2n}$
        .bss  x,200
        .text
        STM   #x,AR1
        STM   #2,AR0
        ADD   *AR1+0,A
        ADD   *AR1+0,A
        ...
        STL   A,*(y)
Data: AR1 steps through the array x two words at a time (x2, x4, x6, ..., x200); y follows.
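In C terms, the indexed mode above behaves like an array walk whose step size is held in a separate variable. The sketch below is an analogy only: `sum_with_step` is a hypothetical name, `index` plays the role of AR1, and `step` plays the role of AR0.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: add `terms` words from `data`, advancing `step`
 * words after every access, as *AR1+0 does with AR0 = step. */
int32_t sum_with_step(const int16_t *data, size_t terms, size_t step)
{
    size_t index = 0;               /* STM #x,AR1 */
    int32_t acc = 0;
    for (size_t n = 0; n < terms; n++) {
        acc += data[index];         /* ADD *AR1+0,A reads the word ...    */
        index += step;              /* ... then post-adds AR0 to AR1      */
    }
    return acc;
}
```

With a step of 1 this reduces to the plain auto-increment mode (*AR1+).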
3-6
3-8
Example
Data Memory
Addr Data
.data . .
. .
x: .word 1000h x: 01FF 1000
y: .word 0500h y: 0200 0500
. .
.text . .
LD *(x),A
Acc A 0 0 0 0 0 0 1 0 0 0
ADD *(y),A
0 0 0 0 0 0 1 5 0 0
3-9
3 - 11
.bss x,2,1
Data Memory
y .set x+1
Addr Data
.text
100
LD #x,DP 1FF ----
LD x,A 200 1000
201 0500
ADD y,A
Acc A DP
- - - - - - - - - - 0 0 4
0 1 0 0 0 0 0 4
0 1 5 0 0 0 0 4
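The DP value of 4 in the example follows from how paged direct addressing forms an address when CPL = 0: the 9-bit DP supplies the upper bits and the instruction supplies a 7-bit offset. A minimal C sketch of that calculation, using hypothetical function names:

```c
#include <stdint.h>

/* Page number for an address: its upper 9 bits (what LD #x,DP loads). */
uint16_t dp_for_address(uint16_t address)
{
    return (uint16_t)(address >> 7);
}

/* Effective data address for paged direct addressing (CPL = 0):
 * 9-bit DP concatenated with the low 7 bits of the operand. */
uint16_t direct_address(uint16_t dp, uint16_t operand)
{
    return (uint16_t)(((dp & 0x1FFu) << 7) | (operand & 0x7Fu));
}
```

For x at 200h the page is 4, and y = x+1 resolves to 201h on the same page, so one DP load serves both accesses.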
3 - 12
3 - 13
Example
Data Memory
.text
SP
SSBX CPL Acc A 0100
LD 1,A 0 0 0 0 0 0 0 1 0 0 0050
ADD 2,A 0 0 0 0 0 0 0 1 5 0
Notes:
1. SP and DP relative direct are mutually exclusive!
2. Restore CPL = 0 (RSBX CPL) before using paged direct again.
3. CPL = 0 on reset.
3 - 14
3 - 15
3 - 16
Register Addressing
3 - 17
Exercise 3: Addressing
Address/Data (hex) Scratch Data1 Data2 B1
Assume: DP=0, CPL=0, CMPT=0
Scratch (DP=0)    Data1 (DP=4)    Data2 (DP=6)
60   20h          200  100h       300  100h
61   120h         201  60h        301  30h
62   ____         202  40h        302  60h

Instruction          Result
STM #2,AR0
STM #200h,AR1
STM #300h,AR2
LD 61h,A             120
ADD *AR1+,A
SUB 60h,A,B          200
ADD *AR1+,B,A        260
LD #6,DP
ADD 1,A
ADD *AR2+,A          390
SUB *AR2+,0,A
SUB #32,A
ADD *AR1-0,A,B       380
SUB *AR2-0,B,A       320
STL A,62h
3 - 18
Lab 3: Addressing
P D
.bss AR(dst
AR(dst)
)
.text
AR(src
AR(src)
)
.data
vectors
ACC
Lab 3: Procedure
1. Copy LAB2D.ASM to LAB3.ASM. Modify LAB3 by
replacing the NOPs with code to copy the nine data
table values into the allocated RAM, as shown in the
diagram above.
2. Copy LAB2D.CMD to LAB3.CMD. Modify LAB3 as
required.
3. Assemble and link your code. Check the .LST and
.MAP files for expected results.
4. Step through the code on the simulator. Verify
performance; debug as necessary.
Instruction          Result      Register after
STM #2,AR0                       AR0 = 2
STM #200h,AR1                    AR1 = 200
STM #300h,AR2                    AR2 = 300
LD 61h,A             A = 120
ADD *AR1+,A          A = 220     AR1 = 201
SUB 60h,A,B          B = 200
ADD *AR1+,B,A        A = 260     AR1 = 202
LD #6,DP                         DP = 6
ADD 1,A              A = 290
ADD *AR2+,A          A = 390     AR2 = 301
SUB *AR2+,0,A        A = 360     AR2 = 302
SUB #32,A            A = 340
ADD *AR1-0,A,B       B = 380     AR1 = 200
SUB *AR2-0,B,A       A = 320     AR2 = 300
STL A,62h
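The exercise can be replayed in C to check each intermediate value. The sketch below is a hand-written model under simplifying assumptions (only the memory words listed in the exercise are set, all values are positive so sign extension is ignored, CPL = 0); the names `exercise3_state`, `direct`, and `run_exercise3` are hypothetical.

```c
#include <stdint.h>

#define MEM_WORDS 0x400

typedef struct {
    int32_t a, b;            /* accumulators A and B */
    uint16_t ar0, ar1, ar2;  /* auxiliary registers  */
    uint16_t dp;             /* data page pointer    */
} exercise3_state;

static uint16_t mem[MEM_WORDS];

/* Paged direct address: 9-bit DP concatenated with a 7-bit offset. */
static uint16_t direct(uint16_t dp, uint16_t offset)
{
    return (uint16_t)((dp << 7) | (offset & 0x7Fu));
}

exercise3_state run_exercise3(void)
{
    exercise3_state s = {0, 0, 0, 0, 0, 0};

    mem[0x060] = 0x020; mem[0x061] = 0x120;
    mem[0x200] = 0x100; mem[0x201] = 0x060; mem[0x202] = 0x040;
    mem[0x300] = 0x100; mem[0x301] = 0x030; mem[0x302] = 0x060;

    s.ar0 = 2;                                /* STM #2,AR0                */
    s.ar1 = 0x200;                            /* STM #200h,AR1             */
    s.ar2 = 0x300;                            /* STM #300h,AR2             */
    s.a = mem[direct(s.dp, 0x61)];            /* LD 61h,A         A = 120h */
    s.a += mem[s.ar1++];                      /* ADD *AR1+,A      A = 220h */
    s.b = s.a - mem[direct(s.dp, 0x60)];      /* SUB 60h,A,B      B = 200h */
    s.a = s.b + mem[s.ar1++];                 /* ADD *AR1+,B,A    A = 260h */
    s.dp = 6;                                 /* LD #6,DP                  */
    s.a += mem[direct(s.dp, 1)];              /* ADD 1,A          A = 290h */
    s.a += mem[s.ar2++];                      /* ADD *AR2+,A      A = 390h */
    s.a -= mem[s.ar2++];                      /* SUB *AR2+,0,A    A = 360h */
    s.a -= 32;                                /* SUB #32,A        A = 340h */
    s.b = s.a + mem[s.ar1]; s.ar1 -= s.ar0;   /* ADD *AR1-0,A,B   B = 380h */
    s.a = s.b - mem[s.ar2]; s.ar2 -= s.ar0;   /* SUB *AR2-0,B,A   A = 320h */
    mem[direct(s.dp, 0x62)] = (uint16_t)s.a;  /* STL A,62h (uses current DP) */
    return s;
}
```

Two details are easy to miss when working the exercise by hand: #32 is decimal (20h), and ADD 1,A is a direct access on page 6, so it reads location 301h.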
3 - 22
LAB3.ASM : Solution
; LAB3.ASM: Data Xfer solution
        .def  start,table,x
        .bss  x,4
        .bss  a,4
        .bss  y,1
        .data
table:  .word 1,2,3,4
        .word 8,6,4,2,0
        .text
        NOP
start:  STM   #table,AR1
        STM   #x,AR2
        LD    *AR1+,A        ;1
        STL   A,*AR2+
        LD    *AR1+,A        ;2
        STL   A,*AR2+
        LD    *AR1+,A        ;3
        STL   A,*AR2+
        LD    *AR1+,A        ;4
        STL   A,*AR2+
        LD    *AR1+,A        ;5
        STL   A,*AR2+
        LD    *AR1+,A        ;6
        STL   A,*AR2+
        LD    *AR1+,A        ;7
        STL   A,*AR2+
        LD    *AR1+,A        ;8
        STL   A,*AR2+
        LD    *AR1+,A        ;9
        STL   A,*AR2+
; Optional process solution
        .mmregs
        .bss  status,1
        .def  status
option: LDM   ST0,A
        STL   A,*(status)
done:   B     done
3 - 23
LAB3.CMD : Solution
lab3.obj
vectors.obj
-o lab3.out
-m lab3.map
MEMORY {
  PAGE 0: EPROM: org = 0E000h  len = 01F80h
          VECS : org = 0FF80h  len = 00080h
  PAGE 1: SPRAM: org = 00060h  len = 00020h
          DARAM: org = 00080h  len = 01380h
}
SECTIONS {
  .vectors: > VECS  PAGE 0
  .text   : > EPROM PAGE 0
  .data   : > DARAM PAGE 1
  .bss    : > SPRAM PAGE 1
}
3 - 24
Learning Objectives
• Perform simple branch, loop control, and subroutine operations.
• Set up and employ the stack for subroutine call and return.
• Use the accumulator to load, store, add and subtract 16-bit values from data
  and program memory.
• Use the multiplier to implement sum-of-products equations.
4-2
Module 4
Basic Program Control
Instruction Cycles
B, CALL 4
RET 5
BACC, CALA 6
BC, CC, RC 5/3
4-3
Condition Operators
EQ NEQ OV TC C BIO
LEQ GEQ NOV NTC NC NBIO
LT GT
Examples
RC TC
CC sub,BNEQ
BC new,AGT,AOV
4-4
.bss x,5
STM #x,AR1
STM #4,AR2
LD #0,A
loop: ADD *AR1+,A
BANZ loop,*AR2-
STL A,y
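The loop above runs five times even though AR2 is loaded with 4, because BANZ tests the counter before decrementing it. A C sketch of the same control flow, assuming that test-then-decrement behaviour; `sum_with_banz` is a hypothetical name:

```c
#include <stdint.h>

/* Hypothetical model of the loop: the counter starts at count - 1 and the
 * branch is taken while the counter is nonzero, decrementing afterwards. */
int32_t sum_with_banz(const int16_t *x, uint16_t count)
{
    uint16_t ar2;
    uint16_t index = 0;            /* stands in for AR1 */
    int32_t a = 0;                 /* LD #0,A */

    if (count == 0)
        return 0;                  /* the assembly loop body always runs once */
    ar2 = (uint16_t)(count - 1);   /* STM #4,AR2 for five elements */
    for (;;) {
        a += x[index++];           /* loop: ADD *AR1+,A */
        if (ar2 == 0)
            break;                 /* BANZ falls through when AR2 is zero */
        ar2--;                     /* *AR2- post-decrement, branch taken */
    }
    return a;
}
```

The general rule this illustrates: to repeat a BANZ loop N times, load the counter with N-1.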
4-5
Comparison: CMPR
For (n=5; n<10; n++)
STM #5,AR1
STM #10,AR0
loop: ...
...
*AR1+
...
...
CMPR LESS,AR1
BC loop,TC
4-6
The Stack
Setup:
STACK   .usect "STK",100
        STM    #STACK+100,SP
Use:
CALL : PC → *--SP
RET  : *SP++ → PC
Data Memory (0 to 64K): the STK section holds STACK. Words below SP are open, SP points
to the last used word, and the words above it are used.
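The CALL and RET notation above maps directly onto C's pre-decrement and post-increment operators. The sketch below models a 100-word stack as an array; the type and function names are hypothetical, and the bounds checks are an addition for safety that the hardware does not perform.

```c
#include <stdint.h>

enum { STACK_LEN = 100 };

typedef struct {
    uint16_t words[STACK_LEN];
    uint16_t sp;   /* index model of SP; STACK_LEN means the stack is empty */
} stack_model;

/* STM #STACK+100,SP: SP starts one past the end of the reserved block. */
void stack_init(stack_model *s)
{
    s->sp = STACK_LEN;
}

/* CALL: PC -> *--SP. Returns 0 if the stack is full. */
int stack_call(stack_model *s, uint16_t return_pc)
{
    if (s->sp == 0)
        return 0;
    s->words[--s->sp] = return_pc;
    return 1;
}

/* RET: *SP++ -> PC. Returns 0 if the stack is empty. */
int stack_ret(stack_model *s, uint16_t *pc)
{
    if (s->sp == STACK_LEN)
        return 0;
    *pc = s->words[s->sp++];
    return 1;
}
```

Because the push pre-decrements, SP must be initialized to the address just past the top of the allocated block, which is why the setup loads #STACK+100 rather than #STACK.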
4-7
Lab 4a
1. Modify VECTORS.ASM to allocate a stack, set up the SP, and call start
2. Copy LAB3.CMD to LAB4A.CMD
3. Modify LAB4A.CMD to route the stack to Data RAM
4. Link LAB3.OBJ with the modified VECTORS.OBJ to produce LAB4A.OUT
5. Simulate LAB4A.OUT to verify your results, especially the placement of a
return address on the stack
VECTORS.ASM
        .sect ".vectors"
        B     BEGIN
;allocate stack [.usect]
        .text
BEGIN
;[setup SP]
;[call START]
LAB4A.CMD
MEMORY
{ Page 1
    RAM: org=___, len=___
    . . .
}
SECTIONS
    . . .
    STACK: > RAM
4 - 10
Dual Accumulators
A B C T A B D S
ALU
A
B
MUX M
AG AH AL BG BH BL
39 - 32 31 - 16 15 - 0 39 - 32 31 - 16 15 - 0
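The three fields of each 40-bit accumulator (guard, high, low) can be pictured as bit slices of one wide integer. The C sketch below extracts them from a 64-bit value standing in for the accumulator; the function names are hypothetical.

```c
#include <stdint.h>

/* AL or BL: bits 15-0 */
uint16_t acc_low(uint64_t acc)
{
    return (uint16_t)(acc & 0xFFFFu);
}

/* AH or BH: bits 31-16 */
uint16_t acc_high(uint64_t acc)
{
    return (uint16_t)((acc >> 16) & 0xFFFFu);
}

/* AG or BG: guard bits 39-32 */
uint8_t acc_guard(uint64_t acc)
{
    return (uint8_t)((acc >> 32) & 0xFFu);
}
```

The low and high slices correspond to the words written by the STL and STH instructions used elsewhere in this guide; the guard bits give headroom above 32 bits during long accumulations.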
4 - 11
Instruction Formats
Load acc-Lo with Smem
Load acc-Hi with Smem
4 - 12
Load Accumulator: LD
LD _____, dst
Shift Type     Data Memory       Constant         Accumulator
Low Acc        Smem              #k8
High Acc       Smem, 16          #K, 16 !
T-reg Value    Smem, TS                           src, ASM
Fixed Value    Xmem, [shft]      #K, [shft] !
Extended       Smem, SHIFT !                      src, [SHIFT]
LEGEND
Smem: single data      shft: 0<=S<=15        ASM: Acc. Shifter    K: 16-bit const.
Xmem: ptr. data        SHIFT: -16<=S<=15     TS: TREG(5-0)        k8: 8-bit const.
src, dst: Acc. A or B  ! = 2-word size
4 - 13
4 - 14
MIN, MAX
4 - 15
LEGEND
Smem: single data      shft: 0<=S<=15        ASM: Acc. Shifter    K: 16-bit const.
Xmem: ptr. data        SHIFT: -16<=S<=15     TS: TREG(5-0)        k8: 8-bit const.
src, dst: Acc. A or B  ! = 2-word size
4 - 16
ST #K, Smem
4 - 17
MAC Unit
D
17 x 17 Multiplier :
T
- Sign / Unsigned support
T
D D P C A - 8000h x 8000h = 7FFFh
in SMUL=1 mode
17 x 17
MULTIPLIER A B 0
A B
4 - 18
Multiplier Instructions
OP     Options                      Execution
LD     Smem, T                      T = S
MPY    Smem, dst                    dst = S . T
       Xmem, Ymem, dst              dst = X . Y
       Smem, #K, dst !              dst = S . K
       #K, dst !                    dst = K . T
MAC    Smem, src                    src = src + S . T
       Xmem, Ymem, src, [dst]       dst = src + X . Y
       Smem, #K, src, [dst] !       dst = src + S . K
       #K, src, [dst] !             dst = src + K . T
MAS    Smem, src                    src = src - S . T
       Xmem, Ymem, src, [dst]       dst = src - X . Y
4 - 19
4 - 20
Examples
z=x+y-w y = mx + b y = x1 . a1 + x2 . a2
LD x,A LD m,T LD x1,T
ADD y,A MPY x,A MPY a1,B
SUB w,A ADD b,A LD x2,T
STL A,z STL A,y MAC a2,B
STL B,y
STH B,y+1
4 - 21
Program Data
Memory Memory
T AR1 1 2 3 4 RAM
Lab 3 X AR2 8 6 4 2
y
Lab 4
A LAB 3
Done
1 2 3 4 ROM
Vector 8 6 4 2
4 - 22
VECTOR4.ASM : Solution
;Solution for VECTORS.ASM for LAB4A
        .ref    start
LEN     .set    100
STACK   .usect  "STK",LEN
        .sect   ".vectors"
        B       BEGIN
        .text
BEGIN   STM     #STACK+LEN,SP
        call    start
4 - 25
LAB4A.CMD : Solution
lab3.obj
vector4.obj
-o lab4a.out
-m lab4a.map
MEMORY  { PAGE 0: EPROM: org = 0E000h  len = 01F80h
                  VECS : org = 0FF80h  len = 00080h
          PAGE 1: SPRAM: org = 00060h  len = 00020h
                  DARAM: org = 00080h  len = 01380h }
SECTIONS{ .vectors : > VECS   PAGE 0
          .text    : > EPROM  PAGE 0
          .data    : > DARAM  PAGE 1
          .bss     : > SPRAM  PAGE 1
          STK      : > DARAM  PAGE 1 }
4 - 26
LAB4B.ASM : Solution
        .def    start,table,x
        .bss    x,4
        .bss    a,4
        .bss    y,1
        .text
        NOP
start:  STM     #table,AR1
        STM     #x,AR2
        STM     #8,AR7
loop:   LD      *AR1+,A
        STL     A,*AR2+
        BANZ    loop,*AR7-
        CALL    sop
        CALL    maxi
done:   B       done

sop:    STM     #x,AR1
        STM     #a,AR2
        LD      *AR1+,T         ;1
        MPY     *AR2+,A
        LD      *AR1+,T         ;2
        MAC     *AR2+,A
        LD      *AR1+,T         ;3
        MAC     *AR2+,A
        LD      *AR1,T          ;4
        MAC     *AR2,A
        STL     A,*(y)
        RET
        .data
table:  .word   1,2,3,4
        .word   8,6,4,2,0
4 - 27
4 - 28
LAB4B.CMD : Solution
lab4b.obj
vector4.obj
-o lab4b.out
-m lab4b.map
MEMORY  { PAGE 0: EPROM: org = 0E000h  len = 01F80h
                  VECS:  org = 0FF80h  len = 00080h
          PAGE 1: SPRAM: org = 00060h  len = 00020h
                  DARAM: org = 00080h  len = 01380h }
SECTIONS{ .vectors : > VECS   PAGE 0
          .text    : > EPROM  PAGE 0
          .data    : > DARAM  PAGE 1
          .bss     : > SPRAM  PAGE 1
          STK      : > DARAM  PAGE 1 }
4 - 29
Learning Objectives
- Repeat Functions
- Data Move Functions
- Dual Operands (Xmem, Ymem)
- Long, Double, & Parallel Ops
5-2
Module 5
Repeat Next: RPT
Features
- Next instruction iterated N+1 times
- Saves code space (1 or 2 words)
- Low overhead (1 or 2 cycles)
- Easy to use
- Non-interruptible
Example: int x[5]={0,0,0,0,0};
        .bss x,5
        STM #x,AR1
        LD #0,A
        RPT #4
        STL A,*AR1+
Options
- RPT #k8    up to 256 iterations
- RPT #K     up to 64K iterations
- RPT Smem   ref. data mem for count value
5-4
Non-Repeatable Instructions
These are generally operations that are not useful to repeat
(e.g., branches, status register ops):
ANDM
ORM LD DP RSBX MVMM
XORM LD ASM SSBX CMPR
ADDM LD ARP RND DST
.bss x,5
STM #x,AR2
RPTZ B,#4
STL B,*AR2+
5-6
5-7
RPTB Example
Add 1 to each element in the array x[5]
.bss x,5
begin: LD #1,16,B
STM #4,BRC
STM #x,AR4
RPTB next-1
ADD *AR4,16,B,A
STH A,*AR4+ } Loop 5x
next: LD #0,B
…
…
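As a cross-check on the RPTB example above, the loop can be modeled on a host machine. This is a minimal Python sketch, not workshop code: it assumes that BRC = 4 gives five passes and that STH stores bits 31-16 of the accumulator, and the function name `rptb_add_one` is hypothetical.

```python
def rptb_add_one(x):
    """Model of the RPTB loop above: add 1 to each element of a 5-word array."""
    b = 1 << 16                        # LD #1,16,B  -> B holds 1 in its high word
    brc = 4                            # STM #4,BRC  -> body runs BRC + 1 times
    out = list(x)
    ar4 = 0                            # STM #x,AR4  -> index into x
    for _ in range(brc + 1):
        a = (out[ar4] << 16) + b       # ADD *AR4,16,B,A
        out[ar4] = (a >> 16) & 0xFFFF  # STH A,*AR4+ (store high word, then increment)
        ar4 += 1
    return out


print(rptb_add_one([10, 20, 30, 40, 50]))
```

Working in the high word is what lets a single STH write the 16-bit result back.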
5-8
Nested Loops
STM #L-1,AR7
Level Operator Cycles
1st: out
out 1 RPT 1
5-9
5 - 10
Data Move
5 - 11
Move Instructions
DATA ↔ DATA          # w/c      DATA ↔ MMR          # w/c
MVDK Smem, dmad      2/2        MVDM dmad, MMR      2/2
MVKD dmad, Smem      2/2        MVMD MMR, dmad      2/2
MVDD Xmem, Ymem      1/1        MVMM mmr, mmr       1/1
LEGEND
Smem: regular data memory address       dmad: 16-bit data memory address
Xmem, Ymem: dual operand data mems      pmad: 16-bit pgm. memory address
MMR: any memory map register            mmr: AR0-AR7, or SP
5 - 13
5 - 14
[Figure: data memory feeding the MAC unit and accumulators A and B over the C bus and D bus]
5 - 15
Dual Op Caveats
- May use only AR2-AR5
- Requires less code space
- Executes more quickly
5 - 16
Single operand:
        LD      #0,B
        STM     #a,AR2
        STM     #x,AR3
        STM     #19,BRC
        RPTB    done-1
        LD      *AR2+,T
        MPY     *AR3+,A
        ADD     A,B
done    STH     B,y
        STL     B,y+1

Dual operand:
sop1:   LD      #0,B
        STM     #a,AR2
        STM     #x,AR3
        STM     #19,BRC
        RPTB    done-1
        MPY     *AR2+,*AR3+,A
        ADD     A,B
done:   STH     B,y
        STL     B,y+1

Repeated dual-operand MAC:
        RPTZ    A,#19
        MAC     *AR2+,*AR3+,A
        STH     A,y
        STL     A,y+1
MPY  Xmem,Ymem,dst          dst = Xmem * Ymem
MAC  Xmem,Ymem,src,[dst]    dst = src + Xmem * Ymem
MAS  Xmem,Ymem,src,[dst]    dst = src - Xmem * Ymem
MACP Smem,pmad,src,[dst]    dst = src + Smem * pmad
LEGEND
Smem: regular data memory address       dmad: 16-bit data memory address
Xmem, Ymem: dual operand data mems      pmad: 16-bit pgm. memory address
src: source accumulator                 dst: destination accumulator
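The effect of a repeated dual-operand MAC (RPTZ followed by MAC *AR2+,*AR3+,A) can be sketched in Python as a host-side model. The function name `sum_of_products` is hypothetical, and the model ignores accumulator width and FRCT mode.

```python
def sum_of_products(a, x):
    """Model of RPTZ A,#N-1 followed by MAC *AR2+,*AR3+,A."""
    acc = 0                      # RPTZ clears accumulator A
    ar2 = ar3 = 0                # pointers into a[] and x[]
    for _ in range(len(a)):      # repeat count + 1 passes
        acc += a[ar2] * x[ar3]   # MAC: A = A + Xmem * Ymem
        ar2 += 1                 # post-increment *AR2+
        ar3 += 1                 # post-increment *AR3+
    return acc


print(sum_of_products([8, 6, 4, 2], [1, 2, 3, 4]))
```

The sample values are the lab table data (1,2,3,4 and 8,6,4,2).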
5 - 19
Modifiers: BK + AR0
Since the only index modifier offered is circular, regular
indexing is possible only if BK is set to 0, or made
very large, e.g., FFFFh.
5 - 20
5 - 21
Notes
5 - 22
Words = 6 Words = 3
Cycles = 6 Cycles = 4
5 - 23
DLD   Lmem, dst            dst = Lmem
DST   src, Lmem            Lmem = src
DADD  Lmem, src, [dst]     dst = src + Lmem
DSUB  Lmem, src, [dst]     dst = src - Lmem
DRSUB Lmem, src, [dst]     dst = Lmem - src
5 - 26
Parallel Operations
Example : Z = X + Y and F = D + E
Parallel Instructions
Instruction       Example                    Operation
LD || MAC[R]      LD Xmem,dst                dst = Xmem << 16
LD || MAS[R]      || MAC[R] Ymem,[dst2]      dst2 = dst2 + T * Ymem
ST || MPY         ST src,Ymem                Ymem = src >> (16-ASM)
ST || MAC[R]      || MAC[R] Xmem,dst         dst = dst + T * Xmem
ST || MAS[R]
ST || ADD         ST src,Ymem                Ymem = src >> (16-ASM)
ST || SUB         || ADD Xmem,dst            dst = dst + Xmem
ST || LD          ST src,Ymem                Ymem = src >> (16-ASM)
                  || LD Xmem,T               T = Xmem
5 - 28
5 - 29
Bus Usage
Instruction Activity PB CB DB EB
Program Read A,D
Program Write A D
Data Single Read A,D
Data Dual Read A,D A,D
Data Long (32-bit) Read A,D(ms) A1,D(ls)
Data Single Write A,D
Data Read / Data Write A,D A,D
Dual Read / Coefficient Read A,D A,D A,D
Peripheral Write A,D
Peripheral Read* A,D
* MMRs only accessible via D Bus; MMR access as Ymem op yields bad data!
5 - 30
Module Review
5 - 31
ROM
  .text
  start: …
         MVPD tbl,*…
         …
         MAC *,*
         …
  .data
  tbl:   .word 1,2,3,4
         .word 8,6,4,2
  .sect ".vectors"
         B start
RAM
  .bss
  x ___ ___ ___ ___
  a ___ ___ ___ ___
  y ___
5 - 32
Lab 5: Procedure
1. Copy LAB4B.ASM to LAB5.ASM. Modify LAB5 to:
   a. Perform initialization with a repeated MVPD.
   b. Perform the sum-of-products with a repeated dual
      operand MAC.
2. Copy LAB4.CMD to LAB5.CMD. Modify LAB5.CMD to:
   a. Load .data to program memory.
   b. Input LAB5.OBJ and create LAB5.OUT and LAB5.MAP.
3. Assemble, link, and simulate the program. Debug and
   verify performance.
4. Optional: If time permits, modify LAB5.ASM to use
   MACP. What effects would using MACP have on the
   system implementation?
5 - 33
Lab 5: Optional
Program Memory (ROM)
  .text
  start: …
         MVPD tbl,*+
         …
         MACP coeff,…
         …
  .data
  tbl:   .word 1,2,3,4
  a      .word 8,6,4,2
  .sect ".vectors"
         B start
Data Memory (RAM)
  .bss
  x ___ ___ ___ ___
  y ___
5 - 34
5 - 36
        .bss    x,20
        .bss    y,20
        STM     #x,AR2
        STM     #y,AR3
        RPT     #19
        MVDD    *AR2+,*AR3+

        .bss    x,10
        .bss    y,10
        .bss    e,1
        STM     #x,AR2
        STM     #y,AR3
        STM     #e,AR4
        LD      #0,A
        STM     #10-1,BRC
        RPTB    loop-1
        MAC     *AR2+,*AR4,A
        STL     A,*AR3+
loop:
5 - 37
Lab 5: Solution
        .bss    x,4
        .bss    a,4
        .bss    y,1
        .data
tbl:    .word   1,2,3,4
        .word   8,6,4,2,0
        .text
start:  STM     #data,AR1
        RPT     #8
        MVPD    tbl,*AR1+
        STM     #data,AR2
        STM     #coeff,AR3
        RPTZ    A,3
        MAC     *AR2+,*AR3+,A
        STL     A,*(result)
5 - 38
        .bss    x,4
        .bss    y,1
        .data
tbl:    .word   1,2,3,4,0
a:      .word   8,6,4,2
        .text
        STM     #data,AR1
        RPT     #4
        MVPD    tbl,*AR1+
        STM     #data,AR1
        RPTZ    A,3
        MACP    *AR1+,coeff,A
        STL     A,*(result)
Learning Objectives
6-2
Module 6
TMS320C54x DSP
Design Workshop
Module 6
Pipeline Issues
Pipeline Operation
P F D A R X
PREFETCH (P): PAB loaded with PC contents.
FETCH (F):    PB loaded by wrapper manager.
DECODE (D):   IR loaded with either PB content or IQ content. IR content is decoded.
ACCESS (A):   DAB loaded with data1 read address if required.
              CAB loaded with data2 read address if required.
              Auxiliary register update.
READ (R):     DB loaded by wrapper manager with data1 if required.
              CB loaded by wrapper manager with data2 if required.
              EAB loaded with data3 write address if required.
EXECUTE (X):  Execution of the instruction and EB loaded with write data.
Pipe Flow
TIME
P1 F1 D1 A1 R1 X1
P2 F2 D2 A2 R2 X2
P3 F3 D3 A3 R3 X3
P4 F4 D4 A4 R4 X4
P5 F5 D5 A5 R5 X5
P6 F6 D6 A6 R6 X6
6-4
Standard vs. Delayed Branch: B & BD
B P1 F D! -- -- --
2 WORDS
addr.
addr. P2 F ADDR -- -- -- 4
CYCLES
P3 F3 FLUSH -- -- --
P4 FLUSH -- -- -- --
PA F D A R XA
BD P1 F1 D1 ! -- -- -- 2 WORDS
2 CYCLES
new P2 F2 NEW -- -- --
P3 F3 D3 A3 R3 X3 2 FINAL
CODE
P4 F4 D4 A4 R4 X4 WORDS
PN FN DN AN RN XN
6-5
LD x,A LD x,A
ADD y,A ADD y,A
MPY z,B BD next
STL A,r MPY z,B
B next STL A,r
6 words 6 words
8 cycles 6 cycles
6-6
Delayed Operations
BD CALLD BCD
BACCD CALAD CCD
RETD RCD
BANZD RETED
RPTBD RETFD
6-7
6-8
Conditional Execution: XC
- Allows fast choice of running one or two words of code or
  substitution of NOPs.
- Condition evaluated early, so must be set two instructions
  prior.
- Avoid change of condition in last two lines prior to XC, as
  they can be recognized in event of interrupt prior to XC.
XC n,cnd,cnd,cnd
-pre- CMPR GRTR,AR1
-pre- -other-
CMPR GRTR,AR1 -other-
BC next,TC XC 1,NTC
LD *AR3+,A LD *AR3+,A
next: ABS A ABS A
        STM     #7,BRC
        RPTBD   next-1
        MPY     *AR2+,*AR3+,A
        MAC     *AR2+,*AR3+,A
        MAC     *AR2+,*AR3+,A
next:   STL     A,y
        STH     A,y+1
6 - 11
Lab 6
6 - 12
Pipeline Cases
Average C54x System Code Analysis:
- 99% of ’C54x code requires no special attention.
- Latency requirements are resolved via a table.
[Figure: decision tree of pipeline cases, boxes numbered 1 to 6. Box labels on the slide: 30% C Code, 70% Assembly Code; 65% CALU Operations, 5% MMR Writes; 2% Early Write, 2% Protected MMR Write, 1% Regular MMR Write; 1.9% Usual Case, 0.1% Prior Reg MMR Write. Outcome labels: No Problem, Use Key, Add 1 Cycle.]
6 - 13
6 - 14
6 - 15
6 - 16
Pipeline Events
6 - 17
DARAM Events
Read/write instructions: P D E
6 - 18
Early write held off to allow dual access to operate w/o delay.
6 - 19
6 - 20
[Figure: MMR-write branch of the pipeline-cases tree, boxes 3 to 6. Box labels on the slide: 5% MMR Writes; 2% Early Write, 2% Protected MMR Write, 1% Regular MMR Op; 1.9% Usual Case, 0.1% Prior Reg MMR Op. Outcome labels: No Problem, Use Key, Add 1 Cycle.]
6 - 21
ST0 DP
6 - 23
Instr. 0 writes to a control field.
P0 F0 D0 A0 R0 X0
1 word for effect.
Word   Stage   Fields ready now
1
2      X2      A, B, T, SXM, ASM, OVM, FRCT, C16
3      R3      ARn, SP(0)
4      A4      SP(1), BK, DP, CPL, DROM
5      D5
6      F6      OVLY, MP/MC-, IPTR, &
7      P7      BRC, RSA, REA, BRAF
example:
        SSBX SXM
        NOP
        LD x,B
6 - 24
6 - 25
begin:  …
        SSBX SXM
        …
        …
        CALL MAIN
main:   …
        LD x,A
Any pipeline read more than 6 words removed from the
write is immune.
6 - 26
6 - 29
6 - 30
Other Latencies
Ctl Field Latency 2 Latency 3 Latency 4 Latency 5 Latency 6
Latency Caveats
6 - 32
Exercise 6-2a
1. Determine if latency condition exists.
2. Note why.
3. Add appropriate number of NOPs to correct.
6 - 33
Exercise 6-2b
LD VAR1,A LD data,B
6 - 34
6 - 36
6 - 37
VECTOR6.ASM : Solution
.ref start
LEN .set 100
STACK .usect "STK",LEN
.sect ".vectors"
BD start
STM #STACK+LEN,SP
6 - 38
Learning Objectives
- Identify & resolve issues for:
  - Multiplication
  - Addition / Subtraction
  - Division
- Select the appropriate numerical models
  - Integer vs. Fraction
  - Signed vs. Unsigned math
  - Rounding vs. Truncation
  - Overflow vs. Carry
  - Fixed point vs. Floating point
- List mnemonics to perform
  - Extended precision math
  - Boolean Operations
7-2
Module 7
Integer Multiplication
- Integer multiplication yields products larger than the inputs, as can
  be seen in the example below, using single digit decimal values as
  inputs:
        9    value
    x   9    times value
       81    yields double size result
- Does the user store the lower (1) or upper (8) result?
- Both must be kept, resulting in additional resources (two cycles,
  words of code, and RAM locations) to complete the store.
- Worse, how can the double-sized result be used recursively as
  an input in later calculations, given that the multiplier inputs
  are single-width?
7-3
Fractional Multiplication
- Multiplication of fractions yields products that never exceed the
  range of a fraction, as can be seen in the example below, using
  single digit decimal fractions as inputs:
       .9    value
    x  .9    times value
      .81    yields double size result
7-5
Notes
7-6
Fractional Example
- The following example demonstrates how two’s
  complement fractions perform under multiplication.
- The 4/8 bit model shown here behaves identically to
  the 16/32 bit TMS320 device.
       0100
     x 1101
       0100
      0000
- What values do the inputs represent?
- What is the result?
- What should be stored to memory?
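One way to check answers to the questions above is to model the 4/8-bit multiply on a host machine. This Python sketch assumes Q3 inputs (one sign bit, three fraction bits) and a one-bit left shift of the product before the upper four bits are stored, as in fractional mode. The helper names are hypothetical.

```python
def to_signed(value, bits):
    """Interpret a bit pattern as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def frac_multiply_4bit(x_bits, y_bits):
    x = to_signed(x_bits, 4)            # Q3 input: value = x / 8
    y = to_signed(y_bits, 4)
    product = x * y                     # 8-bit product in Q6 (two sign bits)
    shifted = (product << 1) & 0xFF     # fractional mode: drop the extra sign bit
    stored = shifted >> 4               # keep the upper 4 bits (Q3 result)
    return x / 8, y / 8, product & 0xFF, stored


x, y, raw, stored = frac_multiply_4bit(0b0100, 0b1101)
print(x, y, format(raw, "08b"), format(stored, "04b"), to_signed(stored, 4) / 8)
```

The exact product is -0.1875, so the stored four-bit value is a truncated approximation of it.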
7-9
7 - 10
Accumulation
7 - 11
Guard Bits
u Guard Bits: the ‘C54x offers an 8-bit extension above
the high accumulator to allow valid representation of
the result of up to 256 summations.
AG AH AL
39 31 15 0
BG BH BL
Saturation (SAT)
- SAT instruction saturates value exceeding 32-bit range in
  the selected accumulator:
        SAT A -or- SAT B
- Provides single-cycle ‘clipping’ function:
  [Figure: signal before saturating and after saturating; axis ticks 256, 1, 0, -1, -256]
  - Values not overflowed are unchanged
  - Positive overflows are set to: 00 7FFF FFFF h
  - Negative overflows are set to: FF 8000 0000 h
- Is automatic on store if SST=1 (LP devices)
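The clipping rule above can be sketched as a host-side Python model. It treats the accumulator as a signed 40-bit integer; `sat` and `as_hex40` are hypothetical helper names, and status-flag side effects are not modeled.

```python
MAX32 = 0x7FFFFFFF
MIN32 = -0x80000000


def sat(acc):
    """Clip a signed 40-bit accumulator value to the signed 32-bit range."""
    if acc > MAX32:
        return MAX32
    if acc < MIN32:
        return MIN32
    return acc


def as_hex40(acc):
    """Format a signed value as guard / high / low accumulator fields."""
    u = acc & 0xFFFFFFFFFF
    return f"{u >> 32:02X} {(u >> 16) & 0xFFFF:04X} {u & 0xFFFF:04X}"


print(as_hex40(sat(0x0123456789)))    # positive overflow
print(as_hex40(sat(-0x0123456789)))   # negative overflow
```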
7 - 13
Non-gain Systems
- Many systems can be modeled to have no DC gain:
  - Filters with low Q.
  - Any system scaled by its maximum gain value.
- Input values from A/D converters are automatically
  fractions, if the limits of the A/D are presumed to be +/- 1.
- Coefficient values can similarly be bounded by making the
  largest value the scaling factor for all other values.
- For these systems, it is known that the final value of the
  process is less than or equal to the input values.
- The accumulator therefore can be allowed to temporarily
  overflow, since the final result is known to be bounded by
  +/- 1.
- Allows maximum usage of selected A/D and D/A
  converters
  - D/A bits for gain are more expensive than using analog
    components
7 - 15
Number Circle
  7FF0h
+  100h = 80F0h    Overflowed Intermediate Results
+   10h = 8100h
-  200h = 7F00h    Valid Final Result
[Figure: number circle with 0 at 0000h, +½ at 4000h, ~1 at 7FFFh, the overflow (OV) boundary between 7FFFh and 8000h, –1 at 8000h, and –½ at C000h]
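The number-circle sums above can be reproduced with 16-bit wraparound arithmetic. This short Python sketch masks each intermediate sum to 16 bits; the helper name `wrap16` is hypothetical.

```python
def wrap16(value):
    """Keep only the low 16 bits, as a 16-bit register would."""
    return value & 0xFFFF


acc = 0x7FF0
steps = []
for delta in (0x100, 0x10, -0x200):
    acc = wrap16(acc + delta)     # intermediate sums may pass through 8000h
    steps.append(acc)

print([format(s, "04X") for s in steps])
```

The two intermediate values fall in the negative half of the circle, yet the final value is the same one an unbounded sum would give.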
Fractional Representation
Fractions    Integers    Hex
   ~1          32K       7FFFh
    ½          16K       4000h
    0           0        0000
   –½         –16K       C000h
   –1         –32K       8000h
Fractions * 32768 ⇒ Integers
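The table above amounts to scaling by 32768. A minimal Python sketch of the conversion follows; it assumes rounding to nearest and clamping +1.0 to 7FFFh, and the function names are hypothetical.

```python
def to_q15(fraction):
    """Convert a fraction in [-1, 1) to a 16-bit two's-complement word."""
    n = int(round(fraction * 32768))
    n = max(-32768, min(32767, n))   # +1.0 cannot be represented; clamp to ~1
    return n & 0xFFFF


def from_q15(word):
    """Convert a 16-bit word back to a fraction."""
    signed = word - 0x10000 if word & 0x8000 else word
    return signed / 32768


for f in (0.5, 0.0, -0.5, -1.0):
    print(f, format(to_q15(f), "04X"))
```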
7 - 18
7 - 19
ROM
  .text
  start: …
         MVPD tbl,*…
         …
         MAC *,*
         …
  .data
  tbl:   .word 0.1 , 0.2
         .word 0.3 , 0.4
         .word 0.8 , 0.6
         .word 0.4 , 0.2
  .vectors
         B start
RAM
  .bss
  x ___ ___ ___ ___
  a ___ ___ ___ ___
  y ___
7 - 20
LAB 7 : Procedure
1. Copy LAB5.ASM to LAB7.ASM. Modify LAB7 to:
   a. Use the fractional data table shown above
   b. Perform fractional multiplication
   What status bits will be important for this routine to
   perform correctly?
2. Copy LAB5.CMD to LAB7.CMD. Modify LAB7.CMD to
   input LAB7.OBJ and create LAB7.OUT and LAB7.MAP.
3. Assemble, link, and simulate the program. Debug and verify
   performance. What answer did you get?
4. To better view the result on the simulator, try:
   WA *(y)/327,y = 0 .,d
7 - 21
Division
- The ‘C54x does not have a single cycle 16-bit divide instruction
  - Divide is a rare function in DSP
  - Division hardware is expensive
- The ‘C54x does have a single cycle 1-bit divide instruction: conditional
  subtract or SUBC
  - Preceded by RPT #15, a 16-bit divide is performed
  - Is much faster than without SUBC
- The SUBC process operates only on unsigned operands, thus software
  must:
  - Compare the signs of the input operands
  - If they are alike, plan a positive quotient
  - If they differ, plan to negate (NEG) the quotient
  - Strip the signs of the inputs
  - Perform the unsigned division
  - Attach the proper sign based on the comparison of the inputs
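The sign handling and the sixteen conditional subtracts listed above can be sketched in Python as a host-side model. It assumes the usual SUBC behavior: subtract the divisor shifted left by 15; if the result is non-negative, keep it shifted left by one with a 1 shifted in, otherwise just shift the accumulator left by one. The function names are hypothetical.

```python
def subc(acc, den):
    """One conditional-subtract step (assumed SUBC semantics)."""
    trial = acc - (den << 15)
    return (trial << 1) + 1 if trial >= 0 else acc << 1


def divide(num, den):
    """Signed 16-bit divide built from 16 SUBC steps."""
    negative = (num < 0) != (den < 0)     # compare the signs of the inputs
    acc, d = abs(num), abs(den)           # strip the signs
    for _ in range(16):                   # RPT #15 -> 16 iterations
        acc = subc(acc, d)
    quotient, remainder = acc & 0xFFFF, acc >> 16
    return (-quotient if negative else quotient), remainder


print(divide(100, 7))
print(divide(-100, 7))
```

After the loop the quotient sits in the low 16 bits and the remainder in the bits above them.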
7 - 22
Division Routine
        LD      @den,16,A
        MPYA    @num            B = num*den (tells sign)
        ABS     A               Strip sign of denominator
        STH     A,@den
        LD      @num,A
        ABS     A               Strip sign of numerator
7 - 23
7 - 24
Rounding
- Result of multiplication can be rounded for MPY, MAC
  and MAS operations. This is specified by appending
  an "R" suffix to the instruction.
  - Example: MAC with rounding is MACR.
  - Rounding consists of adding 2^15 to the result and then
    clearing the low accumulator.
- In a long sum-of-products, only the last MAC
  operation should specify rounding:
        RPTZ    A,#98
        MAC     *AR2+,*AR3+,A
        MACR    *AR2+,*AR3+,A
- Rounding can also be achieved with a load operation:
        LDR     Smem,dst
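The rounding rule above (add 2^15, then clear the low accumulator) can be written as a one-line Python model. The function name `round_acc` is hypothetical; the comparison with plain truncation is included only for illustration.

```python
def round_acc(acc):
    """Round to the high word: add 2**15, then clear bits 15-0."""
    return (acc + (1 << 15)) & ~0xFFFF


def truncate_acc(acc):
    """Truncation for comparison: just clear bits 15-0."""
    return acc & ~0xFFFF


for value in (0x12347FFF, 0x12348000):
    print(format(value, "08X"), format(truncate_acc(value), "08X"),
          format(round_acc(value), "08X"))
```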
7 - 25
SXM=1
Before After
C G ACC C G ACC
X 00 E F 1 3 6 4 8 C X FF F F F 7 9 4 0 0
SXM=0
Before After
C G ACC C G ACC
X 00 E F 1 3 6 4 8 C X 00 0 0 F 7 9 4 0 0
7 - 26
7 - 27
Sign-suppressed Math
ADDS Smem,src     src = src + u(Smem)
SUBS Smem,src     src = src - u(Smem)
Load unsigned
LDU  Smem,dst     dst = u(Smem)
7 - 28
Long Multiplication
            X1 X0       S U
      x     Y1 Y0       S U
            X0 * Y0     U*U
         Y1 * X0        S*U
         X1 * Y0        S*U
      Y1 * X1           S*S
      W3 W2 W1 W0       S U U U
MACSU Xmem,Ymem,src     src = src + u(Xmem)*Ymem
MPYU  Smem,dst          dst = u(TREG)*u(Smem)
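The partial-product scheme above can be checked with a host-side Python model. It splits each 32-bit operand into a signed upper word and an unsigned lower word and sums the four partial products with their shifts. The function names are hypothetical, and the comments name the instruction type each product would suggest rather than a tested instruction sequence.

```python
def split32(value):
    """Split a signed 32-bit value into (signed high word, unsigned low word)."""
    return value >> 16, value & 0xFFFF    # Python's >> is an arithmetic shift


def long_multiply(x, y):
    x1, x0 = split32(x)
    y1, y0 = split32(y)
    w = x0 * y0                   # U*U  (unsigned multiply, e.g. MPYU)
    w += (y1 * x0) << 16          # S*U  (signed by unsigned, e.g. MACSU)
    w += (x1 * y0) << 16          # S*U
    w += (x1 * y1) << 32          # S*S  (ordinary signed MAC)
    return w


print(long_multiply(0x12345678, -0x0BCDEF01) == 0x12345678 * -0x0BCDEF01)
```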
7 - 30
Exponent Encoder
u One cycle exponent ( [ -8, +31 ] range) computation
u Result in T register as 2’s complement value
EXPONENT
EXPONENT
ENCODER
ENCODER
A B
6
T ALU
exp A ; 1 cycle for exp
norm A ; 1 cycle normalize
-8 0 16 31
7 - 32
e1 m1 e m1
e2 m2 m2
e3 m3 m3
LD e1,T LD e,T
LD m1,T,A LD m1,T,A
LD e2,T ADD m2,T,A
ADD m2,T,A ADD m3,T,A
LD e3,T …
ADD m3,T,A
7 - 33
7 - 34
7 - 35
BIT  Xmem,bit     TC = Xmem(15-bit)
BITT Smem         TC = Smem(15-T(3-0))
[Figure: memory word with bits numbered 15 ... n ... 0, showing the tested bit]
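The reversed bit numbering in the two definitions above is easy to misread, so here is a small Python model of them. The function names are hypothetical; the model returns the value that would be copied to TC.

```python
def bit(xmem, bit_code):
    """BIT Xmem,bit: TC = Xmem(15 - bit_code)."""
    return (xmem >> (15 - bit_code)) & 1


def bitt(smem, t):
    """BITT Smem: TC = Smem(15 - T(3-0)); only the low 4 bits of T are used."""
    return (smem >> (15 - (t & 0xF))) & 1


print(bit(0x8000, 0), bit(0x0001, 15), bit(0x0001, 0))
```

A bit code of 0 selects the most significant bit, and a bit code of 15 selects the least significant bit.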
7 - 36
Boolean Operations
7 - 37
Sx 39 32 31 0 C
SFTL src,SHIFT,[dst] C - 00 - 31 0 0
0 - 00 - 31 0 C
ROLTC src C - 00 - 31 0 TC
ROL src C - 00 - 31 0
ROR src C - 00 - 31 0
7 - 38
Shifter Hardware
40 16
A D Bus
40 16
B C Bus
Sign Control SXM
C T(5-0)
(-16, +31) Range
Barrel Shifter ASM(4-0)
(-16, +31) (-16, +15) Range
TC Constant
40 32 (-16, +15) Range
To ALU or
MSW/LSW (0, +15) Range
Write Select
16
E Bus
7 - 39
ABS  src,[dst]     dst = |src|
NEG  src,[dst]     dst = -src
CMPL src,[dst]     dst = NOT src (one's complement)
7 - 40
1. How are bits tested on the 54x? What’s unusual about it?
7 - 41
LAB7.ASM : Solution
        .def    start,table,y
        .bss    x,4
        .bss    a,4
        .bss    y,1
        .data
table:  .word   32768*1/10
        .word   32768*2/10
        .word   32768*3/10
        .word   32768*4/10
        .word   32768*8/10
        .word   32768*6/10
        .word   32768*4/10
        .word   32768*2/10
        .text
        NOP
start:  STM     #x,AR2
        RPT     #8
        MVPD    table,*AR2+
        CALL    sop
done:   B       done
sop:    STM     #x,AR2
        STM     #a,AR3
        RSBX    OVM
        SSBX    SXM
        SSBX    FRCT
        RPTZ    A,#3
        MAC     *AR2+,*AR3+,A
        STH     A,*(y)
        RET
7 - 43
Learning Objectives
8-2
Module 8
Finite Impulse Response (FIR) Filter
Circular Buffer or Linear Buffer
X0 X1 X2
x in z–1 z–1
LD x2, T
a0 × a1 × a2 ×
MAC a2, A
y out
PORTR PA,Smem     PA → Smem
PORTW Smem,PA     Smem → PA
8-4
[Figure: delay-line update, with month names standing in for samples and each new sample read by PORTR.
 Linear buffer (DELAY *AR2-): every sample moves down one location per input.
   JUN MAY APR MAR FEB JAN  ->  JUL JUN MAY APR MAR FEB  ->  AUG JUL JUN MAY APR MAR
 Circular buffer (ARn steps between start and end): the new sample overwrites the oldest one.
   JUN MAY APR MAR FEB JAN  ->  JUN MAY APR MAR FEB JUL  ->  JUN MAY APR MAR AUG JUL]
8-6
Circular
(ARn) Index A ... A x ... x Buffer
Range
Element N-1
End of Buffer + 1 A ... A BK
8-7
LINK.CMD
SECTIONS
{
  D_LINE: align(64) { } > RAM PAGE 1
  . . .
}
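The alignment in the linker file matters because circular addressing derives the buffer base from the pointer itself. The Python sketch below models the update rule as it is usually described for the ’C54x: with N the smallest value for which 2^N > BK, the low N bits of ARn are the index and the upper bits are the base. Treat this as an assumption to verify against the reference manual; the function name is hypothetical.

```python
def circ_update(arn, step, bk):
    """Return the new ARn after a circular post-modify by step (|step| <= bk)."""
    n = bk.bit_length()              # smallest N with 2**N > BK
    mask = (1 << n) - 1
    base, index = arn & ~mask, arn & mask
    index += step
    if index >= bk:                  # ran past the end: wrap to the start
        index -= bk
    elif index < 0:                  # ran before the start: wrap to the end
        index += bk
    return base | index


start = 0x0140                       # hypothetical buffer address, 64-word aligned
print(format(circ_update(start + 5, 1, 6), "04X"))
print(format(circ_update(start, -1, 6), "04X"))
```

If the buffer did not start on a suitable power-of-two boundary, the masked base would not match the real start of the buffer.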
8-8
8-9
FIR Filter
X0 X1 X2 X3 X4
x in z–1 z–1 z–1 z–1
a0 × a1 × a2 × a3 × a4 ×
+ + + + y out
8 - 10
8 - 12
B0
z–1
-A1 B1
× X1 ×
z–1
-A2 B2
× X2 ×
x(n) + + y(n)
z–1 z–1
b 11 a 11
+ +
z–1 z–1
b 12 a 12
8 - 19
u Input scaling
u Optimal Topology
8 - 20
d(n) w(n)
x(n) + + + + y(n)
z–1 z–1
a 11 b 11 a 21 b 21
+ + + +
z–1 z–1
a 12 b 12 a 22 b 22
8 - 21
B0
–(A1)/2
– A1 z–1
B1
×
X1 ×
×
z–1
–(A1)/2
× X2 ×
–A2 B2
8 - 22
Input Scaling
1 7
8 - 23
x(n) + + + + + + y(n)
FIR:
- All zero implementation
- Unconditionally stable
- Linear Phase possible
- Best for phase encoded data
IIR:
- Pole & zero implementation
- Stable if no errors made
- Much better frequency performance
- Best for frequency discrimination
8 - 25
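[Figure: second-order feedback loop. The output passes through two z–1 delays to give Y1 and Y2; Y1 is multiplied by A and Y2 by B, and the products are summed to form the next output.]
A = 1.975
B = –1.000
y(0) = 0.000
y(1) = 0.1400
y(2) = ?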
Notes: Y0 is the current output based on the two prior outputs, Y1 and Y2.
Initial conditions y(0) and y(1) are given, so the ‘54x will begin processing at t=2.
Since location Y0 is not an input value, results can be directly written to Y1.
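The recursion described in the notes above can be tried out on a host machine before writing the assembly. This Python sketch assumes the difference equation y(n) = A*y(n-1) + B*y(n-2) with the listed constants and uses floating point, so it ignores Q15 scaling and rounding. The function name is hypothetical.

```python
def oscillator(a, b, y0, y1, count):
    """Generate count samples of y(n) = a*y(n-1) + b*y(n-2)."""
    out = [y0, y1]
    while len(out) < count:
        out.append(a * out[-1] + b * out[-2])
    return out


samples = oscillator(1.975, -1.0, 0.0, 0.14, 40)
print(round(samples[2], 4))
print(round(max(samples), 3), round(min(samples), 3))
```

Since A = 1.975 lies outside the Q15 range, a fixed-point version has to hold a scaled coefficient, which is why the solution listing applies `MAC a,A` twice.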
8 - 26
Lab 8 : Procedure
- Create a new assembly file to:
  1. Allocate RAM for coefficients and delay line
  2. Establish a ROM table for coefficients and initial conditions
  3. Initialize ROM into RAM
  4. Initialize processor modes
  5. Write code to implement signal flow diagram in infinite loop
  6. Build reset vector
- Assemble the program
- Link the program using an appropriate linker command file
- Run the program on the simulator through 40 loops
- Exit the simulator and view your results by typing: PLOT OUT.DAT
- Verify the results with the instructor
- Time permitting, consider optimizing your code
8 - 27
Lab 8: Equations
8 - 28
.bss a,4,1 Alloc RAM in 1 page
b .set a+1
y1 .set a+2
y2 .set a+3
8 - 30
8 - 31
SINE    LD      y2,T            T = y2
        MPY     b,A             A = b*y2
        LTD     y1              T = y1 , y1 -> y2
        MAC     a,A             A = (a*y1)/2 + b*y2
        MAC     a,A             A = a*y1 + b*y2
        STH     A,y1            y0 -> y1
        PORTW   y1,0000         write to out.dat file
        B       SINE            loop ...
        .include VECTOR6.SSM
8 - 32
SIMINIT.CMD
8 - 33
Learning Objectives
Module 9
Advanced Applications
9-4
FIRS Implementation
1. Split the data into two parts: New and Old.
2. Set up circular buffers for each part. Set up the pointers for the buffers to the
newest of “New” and the oldest of “Old”. Set up a coefficient table.
New Old Coefficients
x(5)
x(9) A/D x(4) Higher a0
addresses
x(6) x(3) a1
AR2 AR3
x(7) x(2) a2
x(8) x(5)
x(1) a3
BH = a0( x(8)+x(1) ) + a1( x(7)+x(2) ) + a2( x(6)+x(3) ) + a3( x(5)+x(4) )
3. Sum the first two data points into the high A accumulator (AH) and
decrement the data pointers.
4. Zero the B accumulator and repeat the following four times:
a. Multiply AH times the coefficient, accumulate the result into the high B
accumulator (BH) and increment the coefficient pointer.
b. Sum the next two data points and decrement the data pointers.
5. Store the result (BH) & set data pointers to oldest “Old” and oldest “New”.
6. Replace oldest “Old” value with oldest “New” value. Dec. “Old” pointer.
7. Replace oldest “New” value with a new input datum and go to step 3.
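The saving behind the steps above comes from adding each pair of samples before the multiply. A host-side Python sketch can confirm that the folded form matches the direct FIR for a symmetric coefficient set. The function names are hypothetical, and the data are arbitrary integers.

```python
def fir_direct(x, a_half):
    """Full FIR with symmetric coefficients a0..a3,a3..a0 (8 multiplies)."""
    coeffs = a_half + a_half[::-1]
    return sum(c * s for c, s in zip(coeffs, x))


def fir_symmetric(x, a_half):
    """Folded form: add each sample pair first, then multiply (4 multiplies)."""
    n = len(x)
    return sum(a * (x[i] + x[n - 1 - i]) for i, a in enumerate(a_half))


x = [8, -3, 5, 1, 7, -2, 4, 6]        # x(8) ... x(1), newest first
a_half = [1, 2, 3, 4]                 # a0 ... a3
print(fir_direct(x, a_half), fir_symmetric(x, a_half))
```

The pairing mirrors the slide's expression a0(x(8)+x(1)) + a1(x(7)+x(2)) + a2(x(6)+x(3)) + a3(x(5)+x(4)).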
9-5
Architecture - FIRS
MAC ALU
A P C D
MPY ALU
B
ADD MUX
Acc B Acc A
9-7
Advanced Applications
d(n)
x(n) H(z) + e(n)
+
-
W(z) y(n)
9-9
+
y(n) -
+ +
e(n)
LMS
9 - 10
LMS Loading
Each Iteration ( only once )
1 - determine error : e(i) = d(i) - y(i)
2 - scale by “rate” term B : e´(i) = 2*B*e(i)
Each Term ( N sets )
3 - Qualify error with signal strength : e´´(i) = x(i-k) * e´(i)
4 - Sum error with coefficient : b(i+1) = b(i) + e´´(i)
5 - Update coefficient : b(i) = b(i+1)
Analysis :
LMS: 1 1 SUB
2 1 MPY
3 N MPY ST
4 N ADD || MPY
5 N STH
FIR a N MPY ADD
LMS
MAC
b N ADD
c 1 STH
@ 100 tap: 500+ cycles @ 100 tap: 200+ cycles
9 - 11
LMS Instruction
LMS Xmem, Ymem          ;A += (Xmem) << 16 + 2^15
                        ;B += (Xmem) * (Ymem)
LMS *AR3+, *AR4+
                Before instruction      After instruction
A               00 1111 2222            00 2111 A222
B               00 1000 0000            00 1200 0000
FRCT            0                       0
AR3             0100                    0101
AR4             0200                    0201
Data memory
0100h           1000                    1000
0200h           2000                    2000
A: 00 1111 2222h + 00 1000 0000h + 8000h = 00 2111 A222h
B: 00 1000 0000h + 1000h * 2000h = 00 1200 0000h
The LMS instruction adapts the coefficient
and performs the MAC for the filtering in the same cycle.
Storing the coefficient will require 1 additional cycle.
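The before and after values on this slide can be verified with a short Python model of the two accumulator updates. It assumes FRCT = 0 and signed 16-bit memory operands; the function names are hypothetical.

```python
MASK40 = 0xFFFFFFFFFF


def signed16(word):
    """Interpret a 16-bit memory word as a two's-complement value."""
    return word - 0x10000 if word & 0x8000 else word


def lms(a, b, xmem, ymem):
    """Model of LMS Xmem,Ymem with FRCT = 0."""
    x, y = signed16(xmem), signed16(ymem)
    a = (a + (x << 16) + (1 << 15)) & MASK40    # coefficient update with rounding
    b = (b + x * y) & MASK40                    # filter tap accumulation
    return a, b


a, b = lms(0x0011112222, 0x0010000000, 0x1000, 0x2000)
print(format(a, "010X"), format(b, "010X"))
```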
9 - 12
        LD      B2e, T                  ; T holds the error step amount
        LD      #0,B                    ; Zero out B
        STM     #N-2, BRC               ; Load Branch Repeat Counter
        RPTBD   End-1                   ; Start RPTB, next two are delay slots
        MPY     *Data+0%, A             ; A = error * oldest sample
        LMS     *Coeffs, *Data          ; B += a(n)*x(n) ... filter tap
                                        ; A += (a(n) << 16)+2^15 ... coeff. update
        ST      A, *Coeffs+             ; Store updated coefficient
     || MPY     *Data+0%, A             ; and form A = x(n-1)*2Beta*e(n)
        LMS     *Coeffs, *Data          ; B = accumulated filter output
                                        ; A = updated filter coefficients
End     STH     A, *Coeffs              ; Store the final updated coefficient
        STH     B, *Result              ; Store final filter result
9 - 13
Architecture - LMS
MAC : FIR ALU : LMS
D C A D
MPY ALU
B
ADD MUX
Acc B Acc A
9 - 14
Advanced Applications
Polynomial Evaluation
The general form of a 3rd order polynomial equation can be written as:
P(x) = a3*x^3 + a2*x^2 + a1*x + a0
9 - 16
POLY Operation
1. Set up a pointer to the coefficients.
2. Load x into the T register.
3. Load a3 into the high A accumulator (AH). Decrement pointer.
4. Load a2 into the high B accumulator (BH). Decrement pointer.
5. Repeat the following three times:
a. Multiply AH times T, accumulate with BH and round in AH.
b. Load the next coefficient into BH. Decrement pointer.
6. Store AH as result.
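[Figure: register usage for POLY. T holds x; accumulator B (BG BH BL) holds the next coefficient; accumulator A (AG AH AL) holds the running result P(x); ARn points into the coefficient table a3, a2, a1, a0.]
P(x) = AH = [(a3 x + a2) x + a1] x + a0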
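The nested form above is Horner's rule: one multiply and one add per coefficient. A plain Python sketch of it follows; it uses ordinary integers rather than Q15 values and does no rounding, and the function name is hypothetical.

```python
def horner(coeffs, x):
    """Evaluate a polynomial given coefficients from highest order to lowest."""
    acc = coeffs[0]                 # start with a3
    for c in coeffs[1:]:            # each pass: acc = acc * x + next coefficient
        acc = acc * x + c
    return acc


a3, a2, a1, a0 = 2, 3, 4, 5
x = 3
print(horner([a3, a2, a1, a0], x), a3 * x**3 + a2 * x**2 + a1 * x + a0)
```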
9 - 17
Polynomial Evaluation
        SSBX    FRCT            POLY operation is affected by these bits
        SSBX    OVM
        SSBX    SXM
        LD      *AR4+,T         T=X(0)
        LD      *AR3+,16,A      A=A(order)=PX init
        LD      *AR3+,16,B      B=A(order-1)
        RPT     #2              3 times
        POLY    *AR3+           A=PX=Rnd(B+A*T)   B=An<<16
Note: The POLY instruction “expects” Q15 numbers!
9 - 18
9 - 10 DSP54x - Algorithms
Module 9
Architecture - POLY
MAC ALU
A T
T D
MPY ALU
B
ADD MUX
Acc A Acc B
POLY *AR3+
9 - 19
Advanced Applications
9 - 21
Mean-Square Error
9 - 22
c_i^2 / G_i < c_opt^2 / G_opt    or    c_i^2 * G_opt < c_opt^2 * G_i
9 - 23
STRCD Xmem, cond
      Xmem = T if condition is true
SRCCD Xmem, cond
      Xmem = BRC if condition is true
SACCD src, Xmem, cond
      Xmem = src << (ASM - 16) if condition is true
9 - 25
Advanced Applications
Data Transmission
Fading
Multipath
XMIT RCV
Noise
Modulate Demodulate
Viterbi Encoder
G0
+
Bits
Input
bits Z-1 Z-1 Z-1 Z-1
G1
+
Bits
- N bits are fed into network.
- M (>N) bits flow out (... G0 G1 G0 G1 ...)
- e.g. 3 in : 4 out, 4 in : 8 out, etc.
- recognizable “holes” are created in data path, e.g.:
                3 in : 4 out      4 in : 8 out
  valid codes:  2^3 = 8           2^4 = 16
  total codes:  2^4 = 16          2^8 = 256
  “holes”       = 8               = 240
state 0 0 state
n 1 1 n+1
Viterbi Decoder
+M AH,AL J
2*J
-M
Old state New state
-M
2*J+1
+M BH,BL J+8
AR5 0
Metrics Old states
2*J & 2*J+1 15
AR4 16
Metrics J
AR3 24 New states
Metrics J + 8
31
Relative location
9 - 31
Viterbi Instructions
CMPS src, Smem
IF { [ src (31-16) ] > [ src (15-0) ] }
THEN :                          ELSE :
(src(31-16)) → Smem             (src(15-0)) → Smem
0 → TC                          1 → TC
(TRN << 1 ) + 0 → TRN           (TRN << 1 ) + 1 → TRN
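The compare, select and store step defined above can be sketched as a small Python model. It assumes the two path metrics are signed 16-bit values packed into the high and low halves of the accumulator; the function names are hypothetical.

```python
def signed16(word):
    """Interpret the low 16 bits as a two's-complement value."""
    word &= 0xFFFF
    return word - 0x10000 if word & 0x8000 else word


def cmps(src, trn):
    """Return (value stored to Smem, TC, new TRN) for one CMPS step."""
    high, low = signed16(src >> 16), signed16(src)
    if high > low:
        stored, tc = high, 0        # keep the metric from the high half
    else:
        stored, tc = low, 1         # keep the metric from the low half
    trn = ((trn << 1) | tc) & 0xFFFF    # shift the decision bit into TRN
    return stored & 0xFFFF, tc, trn


print(cmps(0x00050003, 0x0000))
print(cmps(0x00030005, 0x8001))
```

TRN collects one decision bit per step, and those bits form the record of which path was chosen.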
9 - 32
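- Dual 16-bit ALU operations
- T register input as dual 16-bit operand
- 16-bit transition shift register (TRN)
- One cycle store Max and Shift decision
[Figure: compare, select and store (CSS) unit. The ALU runs with C16=1 on AH/AL and BH/BL with T as an input; a comparator (COMP) drives the MSB/LSB write select, TC and TRN; the selected 16-bit value is written on EB [15:0].]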
9 - 33
B += | AH |
AH = Xmem - Ymem
B += AH^2
AH = Xmem - Ymem
9 - 34
Review
u What instruction is used to perform adaptive filtering?
u What instructions are used to perform Viterbi decoding?
u What features of the C54x architecture allow the FIRS
instruction to execute in a single cycle?
What might slow it down?
u What features of the C54x architecture allow the Viterbi
operations to execute so quickly?
u What mnemonic is used for solving polynomials?
What concept allows this to run so quickly?
9 - 35
9 - 37
A microphone was used to collect the resulting echoes in a 10’ x 15’ room.
The echo signal z(I) was sampled in the same manner as the reference signal.
The echo signal is stored in a file called ECHO.DAT. Two thousand samples
or .25 seconds of data is stored in the files. LAB9A.ASM uses REF.DAT and
ECHO.DAT as inputs to the LMS filter. When the program is run, the
resulting error signal is stored in ERROR.DAT. The LMS filter length is set
for 16 taps. For a sampling rate of 8 kHz, a 16 tap filter can cancel up to
16/8000 = 2 msec of echo delay.
9 - 38
Receiver Channel
[Figure: GSM receive chain. Block labels on the slide: RF reception; GMSK modulation performed in RF codec; equalization, slot disassembly, de-encryption; de-interleaving; Viterbi decode; test parity and check bits, discard block if check fails; bit reordering; RPE-LPC speech decoder with output s’(t); comfort noise. Bit-count labels: 378 bits, 378 class 1 bits, 53 bits, 50 bits, 132 class 1b bits, 78 class 2 bits, 260 bits.]
GSM stands for Global System for Mobile communications. It is the digital
cellular standard used in Europe and throughout the world.
9 - 39
9 - 40
LAB9B .. continued
u Encode the input data by running to the first break point. Break points were
set by the take file. The encoded data is in the MEMORY1 window. Simulate
transmission errors by making changes to the encoded data. Valid changes
are between -7 and 7. Note: the encoded data is in signed antipodal format,
the format that the GSM equalizer output would be in.
u Run the Viterbi decoder and compare the input data array to the output
data array. Is the output correct?
9 - 41
The real reduction in bit rate comes from further analyzing the
excitation signal. The difference between current and previous excitation
signals is found by using Long Term Predictive (LTP) analysis. The LTP
algorithm searches all of the previous sequences (15 ms of history) for the
sequence that has the highest correlation to the current sequence. The
difference is transmitted along with a pointer to the sequence that should be
selected for use. Each 20 ms frame of 160 samples is reduced to 260 bits, so
the resulting bit rate is 260 bits / 20 ms = 13 kbit/s.
9 - 42
9 - 43
Additional Resources
1. S. M. Redl, M. K. Weber, M. W. Oliphant, “An Introduction to GSM”,
Artech House, 1995.
2. H. Hendrix, “A Brief Tutorial on GSM Decoding Techniques”,
TI Internal paper, 1995.
3. H. Hendrix, “Viterbi Decoding Techniques on the TMS320C54x Family”,
TI Application Report, 1995.
9 - 44
Interrupts
Learning Objectives
Module 10
Hardware Reset Actions
u All control signals are driven inactive high
u Address lines are driven to FF80h
u Data bus is driven to high impedance state
u Interrupts are disabled : 1 → INTM
u Prior interrupts are purged : 0 → IFR
u The repeat counter (RC) is cleared
u IACK- is driven low
u An internal reset is sent to the peripherals.
u Seven CLKOUT cycles after RS- is released
the processor will fetch from 0FF80h
10 - 3
10 - 4
Interrupt Locations
Interrupt    Offset (Hex)   Description
RS           0              Reset
NMI          4              Nonmaskable Interrupt
SINT17-30    8-3C           Software Interrupt 17-30
INT0         40             External User Interrupt #0
INT1         44             External User Interrupt #1
INT2         48             External User Interrupt #2
TINT         4C             Internal Timer Interrupt
RINT0        50             Serial Port 0 Receive Interrupt
XINT0        54             Serial Port 0 Transmit Interrupt
RINT1        58             Serial Port 1 Receive Interrupt
XINT1        5C             Serial Port 1 Transmit Interrupt
INT3         60             External User Interrupt #3
             64-7F          Reserved
10 - 5
Interrupt Management
Bit   15-9       8      7       6       5       4       3      2      1      0
IFR   Reserved   INT3   XINT1   RINT1   XINT0   RINT0   TINT   INT2   INT1   INT0
IMR   Reserved   INT3   XINT1   RINT1   XINT0   RINT0   TINT   INT2   INT1   INT0
ST1   bit 11 = INTM
Recognition of Interrupts
INT high 2 cycles
INT low 3 cycles
IFR Bit Latched
IMR Bit = 1?
INTM Bit = 0?
Interrupt Begins
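The recognition sequence above (flag latched in IFR, enabled in IMR, INTM = 0) can be modelled in a few lines of C. This is a sketch for illustration only: the function name `next_interrupt` is hypothetical, and servicing the lowest-numbered pending bit first is an assumption of the example rather than something the slide states.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions follow the IFR/IMR layout:
   0 INT0, 1 INT1, 2 INT2, 3 TINT, 4 RINT0, 5 XINT0,
   6 RINT1, 7 XINT1, 8 INT3; bits 15-9 are reserved. */

/* Returns the bit number of the interrupt that may begin,
   or -1 if nothing is both latched and enabled, or if INTM = 1. */
static int next_interrupt(uint16_t ifr, uint16_t imr, int intm)
{
    assert(intm == 0 || intm == 1);
    if (intm)                             /* INTM = 1: globally disabled */
        return -1;
    uint16_t pending = ifr & imr & 0x01FFu;   /* latched AND unmasked */
    for (int bit = 0; bit <= 8; bit++)        /* assumed: lowest bit first */
        if (pending & (1u << bit))
            return bit;
    return -1;
}
```

For example, with TINT latched (IFR = 0008h) and enabled (IMR = 0008h), the call returns 3 when INTM = 0 and -1 when INTM = 1.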
10 - 7
CPU Action        Description
1 → INTM          Disable global interrupts
PC → --*(SP)      Push PC onto predecremented stack
Vector(n) → PC    Load PC with int. vector “n” address
0 → IACK pin      IACK signal goes low
0 → IFR (n)       Clear corresponding interrupt flag bit
10 - 8
IACK Decoder
‘C54x address bits:
Addr   6 5 4 3 2 1 0
INT0   1 0 0 0 0 0 0
INT1   1 0 0 0 1 0 0
INT2   1 0 0 1 0 0 0
INT3   1 1 0 0 0 0 0
‘138 (3 – 8 DeMux) connections:
Inputs:  A5 → C, A3 → B, A2 → A, A6 → G1, IACK- → G2-
Outputs: Y0 → IACK0, Y1 → IACK1, Y2 → IACK2, Y4 → IACK3
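The decoder wiring can be checked with a small C model. This is a sketch under the assumption that the '138 selects output Y(4C + 2B + A) when G1 is high and G2- is low; the function name `iack_output` is made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Models the '138 as wired in the figure:
   C = A5, B = A3, A = A2, G1 = A6, G2- = IACK-.
   Returns the selected output number Y0..Y7, or -1 if disabled. */
static int iack_output(uint16_t addr, int iack_n)
{
    assert(iack_n == 0 || iack_n == 1);
    int g1 = (addr >> 6) & 1;          /* A6 enables the decoder */
    if (!g1 || iack_n != 0)            /* IACK- must be low */
        return -1;
    int c = (addr >> 5) & 1;
    int b = (addr >> 3) & 1;
    int a = (addr >> 2) & 1;
    return (c << 2) | (b << 1) | a;
}
```

With the vector offsets 40h, 44h, 48h, and 60h on the address bus, the model selects Y0, Y1, Y2, and Y4, which matches the IACK0 to IACK3 outputs in the figure.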
Instruction    Description                            Stack pointer
PSHM mmr       Push MMR onto Stack                    SP - 1 → SP
POPM mmr       Pop from Stack to MMR                  SP + 1 → SP
PSHD Smem      Push Data memory value onto Stack      SP - 1 → SP
POPD Smem      Pop top of Stack to Data memory        SP + 1 → SP
FRAME K        Modify Stack Pointer                   SP + K → SP
10 - 10
Context Save
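        .ref    ISR1
        .sect   ".vectors"
        ...
INT1:   BD      ISR1            ; delayed branch to the ISR
        PSHM    ST0             ; the two pushes execute in the delay slots
        PSHM    ST1
        .mmregs
        .def    ISR1
        .text
ISR1:   PSHM    AL              ; accumulator A: low, high, guard bits
        PSHM    AH
        PSHM    AG
        PSHM    AR1
        PSHM    IMR
        PSHM    PMST
        ; ISR FOLLOWS...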
10 - 11
Context Restore
; ISR CONCLUDES...
; Context Restore:
POPM PMST
POPM IMR
POPM AR1
POPM AG
POPM AH
POPM AL
POPM ST1
POPM ST0
RETF
10 - 12
Return Instructions
Instruction   Actions                            Cycles
RET[D]        *(SP)++ → PC                       RET 5, RETD 3
RETE[D]       *(SP)++ → PC, 0 → INTM             RETE 5, RETED 3
RETF[D]       RTN → PC, 0 → INTM, *(SP)++        RETF 3, RETFD 1
10 - 13
Nested Interrupts
; Nestable ISR . . .
10 - 14
.loop 40h-$
RETE
.endloop
IV1: BD ISR1
PSHM ST0
PSHM ST1
IV2: BD ISR2
PSHM ST0
PSHM ST1
...
10 - 15
10 - 16
NMI Interrupt
10 - 17
Ints_Off
ISRs
Vectors
When about to return from NMI:
10 - 18
VecN:   BD      IsrN            ; Jump to ISR in 2 words
        SSBX    TC              ; Set flag (non Interruptible)
        NOP                     ; Room for 1 more...
IsrN:   …
        …
        RETED                   ; Return from ISR in 2 cycles
        RSBX    TC              ; Clear flag
        NOP                     ; Last word...
Fast Interrupts
Allows 3-cycle ISR, e.g.:
RINT0: NOP
RETFD
MVKD DRR0,*AR7+%
10 - 20
RESET Instruction
10 - 22
PMST
Bit     15-7                6        5      4      3      2        1-0
Field   IPTR                MP/MC-   OVLY   AVIS   DROM   CLKOFF   Reserved
Reset   1 1 1 1 1 1 1 1 1
Interrupt Vector Address
Bit     15-7                6-2                  1-0
Field   IPTR                interrupt number     0 0
Reset   1 1 1 1 1 1 1 1 1   0 0 0 0 0            0 0
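The way IPTR and the interrupt offset combine into a vector address can be sketched in C. The function name `vector_address` is hypothetical; the sketch assumes only what the figure shows, namely that PMST bits 15-7 supply the upper nine address bits and the table offset supplies the lower seven.

```c
#include <assert.h>
#include <stdint.h>

/* Builds the interrupt vector address from PMST and a vector offset
   taken from the Interrupt Locations table (0h, 4h, ... 7Ch). */
static uint16_t vector_address(uint16_t pmst, uint16_t offset)
{
    assert(offset < 0x80u && (offset & 3u) == 0u);  /* 4-word vectors */
    return (uint16_t)((pmst & 0xFF80u) | offset);   /* IPTR : offset */
}
```

At reset IPTR is all ones, so the reset vector (offset 0) is FF80h and the timer vector (offset 4Ch) is FFCCh. Loading a different IPTR relocates the whole table, for instance to 0080h.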
10 - 23
[Figure: timer block diagram: CLKOUT → PSC (4 bits), loaded from TDDR (4 bits)
→ TIM (16 bits), loaded from PRD (16 bits) → TINT]
TINT rate = 1 / ( TCLK1 x ( TDDR+1 ) x ( PRD+1 ) )
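The rate formula can be turned into a short C helper for choosing PRD. This is a sketch only: it assumes TCLK1 is the CLKOUT period in seconds, and the names `tint_rate_hz` and `prd_for_rate` are invented for the example.

```c
#include <assert.h>

/* TINT rate = 1 / (TCLK1 * (TDDR + 1) * (PRD + 1)) */
static double tint_rate_hz(double tclk_s, unsigned tddr, unsigned prd)
{
    assert(tclk_s > 0.0 && tddr <= 0xFu && prd <= 0xFFFFu);
    return 1.0 / (tclk_s * (tddr + 1.0) * (prd + 1.0));
}

/* Solves the same formula for PRD, rounded to the nearest integer.
   Returns -1 if the result does not fit the 16-bit PRD register. */
static long prd_for_rate(double tclk_s, unsigned tddr, double rate_hz)
{
    assert(tclk_s > 0.0 && tddr <= 0xFu && rate_hz > 0.0);
    double counts = 1.0 / (tclk_s * (tddr + 1.0) * rate_hz);
    long prd = (long)(counts + 0.5) - 1;
    return (prd >= 0 && prd <= 0xFFFF) ? prd : -1;
}
```

With a 25 ns clock and TDDR = 9, an 8 kHz interrupt rate needs PRD = 499, since 25 ns x 10 x 500 = 125 µs.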
10 - 24
TCR fields: Reserved, PSC, TRB, TSS, TDDR
10 - 25
MODIFY OUT.DAT
LAB9.CMD
10 - 26
Lab Procedure
10 - 27
Review
10 - 28
Hardware Interfacing
Learning Objectives
11 - 2
Module 11
Interfacing Memory and Peripherals
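[Figure: TMS320C54x connected to PGM, DATA, and I/O devices over a shared
16-bit ADDRESS bus and 16-bit DATA bus. Control signals shown: PS, DS, and IS
(space selects), MSTRB and IOSTRB (strobes), and R/W, wired to the CS1, CS2,
OE, and WE pins of the devices; some device pins are tied to GND.]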
11 - 3
Read Timing
[Timing diagram: address, data, MSTRB-; numbered intervals 1, 2, 3]
#   Parameter        Cycle Time 25   Cycle Time 15
1   Setup Address    5               3
2   Data Valid       5               3
3   Memory Speed     15              9
Notes: 1. Address timing also includes the PS, DS, IS and MSTRB signals.
2. All times are in nanoseconds.
3. H = one-half CLOCKOUT1 cycle time.
4. MSTRB stays low across reads
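In the table, the memory speed is what remains of the cycle after the address setup and data valid times (25 - 5 - 5 = 15 ns, 15 - 3 - 3 = 9 ns). The C sketch below reproduces that arithmetic and extends it to slower memories. The extension is an assumption of this example, not a figure from the table: it treats each wait state as adding one full cycle to the access window and caps the count at 7. Function names are hypothetical.

```c
#include <assert.h>

#define MAX_WAIT_STATES 7   /* assumed upper limit for this sketch */

/* Slowest zero-wait-state memory: cycle - address setup - data valid. */
static double required_access_ns(double cycle_ns, double setup_ns,
                                 double valid_ns)
{
    assert(cycle_ns > setup_ns + valid_ns);
    return cycle_ns - setup_ns - valid_ns;
}

/* Smallest number of wait states for a memory with access time mem_ns,
   assuming each wait state adds one cycle. Returns -1 if more than
   MAX_WAIT_STATES would be needed. */
static int wait_states_needed(double cycle_ns, double setup_ns,
                              double valid_ns, double mem_ns)
{
    for (int ws = 0; ws <= MAX_WAIT_STATES; ws++) {
        double window = (ws + 1) * cycle_ns - setup_ns - valid_ns;
        if (window >= mem_ns)
            return ws;
    }
    return -1;
}
```

Under these assumptions a 15 ns SRAM runs with zero wait states at a 25 ns cycle, while a 70 ns EPROM needs three.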
11 - 4
Bit 11: If PS-DS is set, add 1 wait state when access changes
between PS and DS.
11 - 6
Memory: 4
Cost vs Speed
11 - 7
strobe
— A D —
address
data
strobe
11 - 9
— A D —
address
data
strobe
11 - 10
Cycle Time →                                         25           20
1  Address valid to MSTRB (min)                      2H-5 = 20    15
2  Data valid before MSTRB (min) (setup time)        2H-5 = 20    15
3  Data valid after MSTRB (min) (hold time)          H-5 = 8      5
Notes: 1. All times are in nanoseconds.
2. H = one-half CLOCKOUT1 cycle time.
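The write-timing entries all derive from H, one-half of the cycle time. A minimal C sketch of the table's formulas follows; the struct and function names are invented for the example, and the 5 ns constant is simply the one that appears in the table.

```c
#include <assert.h>

typedef struct {
    double addr_to_mstrb_ns;   /* 1: address valid to MSTRB, 2H - 5 */
    double data_setup_ns;      /* 2: data valid before MSTRB, 2H - 5 */
    double data_hold_ns;       /* 3: data valid after MSTRB,  H - 5 */
} write_timing;

static write_timing write_timing_for(double cycle_ns)
{
    assert(cycle_ns > 10.0);            /* keeps the hold time positive */
    double h = cycle_ns / 2.0;          /* H = one-half cycle time */
    write_timing t = { 2.0 * h - 5.0, 2.0 * h - 5.0, h - 5.0 };
    return t;
}
```

For a 25 ns cycle this gives 20, 20, and 7.5 ns (the table rounds the hold time to 8); for a 20 ns cycle it gives 15, 15, and 5 ns.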
11 - 11
11 - 12
IO Memory Timing
0 12 25 37 50
CLK OUT
5
A(15-0) , R/W-
27 5
Read
7 27 5
Write
5 5 2
IO STRB -
11 - 14
RR I/O
I/O Hi
HiData
Data Low
LowData
Data Hi
HiProg
Prog Low
LowProg
Prog
Lo Prog Lo Data
I/O Mem
Hi Prog Hi Data
11 - 15
11 - 16
TMS320C54x A Lo Pgm
25 nS PS - 15
DATA
ADDRESS MSTRB - OR
16 CS
A15 -
PS
MSTRB 15 nS SRAM
A Hi Pgm
MSC PS - 15
DATA
MSTRB - OR CS D
CLOCKOUT1
A15 +
READY FF - OR
200nS EPROM
DATA
DATA
11 - 17
11 - 18
Review Questions
11 - 19
CS1 PROG
PROG
STM #________, SWWSR OE 8K EPROM
8K EPROM D
CS2
A 70ns
70ns
TMS320C54x-40
16 CS1 DATA
DATA
ADDRESS OE
PS WE 8K
8K SRAM
SRAM D
DS CS2 15
15ns
ns
IS A
MSTRB
CS1 11ADC
ADC
R/ W
OE &
&
MP/ MC 11DAC D
WE DAC
IOSTRB
CS2 120
DATA A 120ns
ns
16
11 - 20
Learning Objectives
Module 12
TMS32054x Ports
u Standard Serial Port
12 - 3
Clock
Frame
Data
12 - 4
CLKX CLKR
FSX FSR
54x DX DR 54x
CLKR CLKX
#1 #2
FSR FSX
DR DX
12 - 5
Data Bus
RSR XSR
12 - 6
CLKX
FSX
DX D7 D6 D5 D4 D3 D2 D1 D0 E7
XINT
D → DXR    E → DXR
D → XSR    E → XSR
12 - 7
CLKX
FSX
DX C0 D7 D6 D5 D4 D3 D2 D1 D0 E7 E6 E5 E4
XINT
12 - 8
Data 1 2 3
Burst
Continuous
FSM = 1 : Burst
FSM = 0 : Continuous
12 - 9
TMS32054x Ports
12 - 14
ABU
000 800
R-Ping
BKR ARR
11 R-Pong RINT
X-Ping
BKX AXR
11 X-Pong XINT
800 1000
SP CPU
DR RINT FFFF
DRR RCV-ISR
XINT
DX DXR XMT-ISR DBUS
12 - 15
12 - 16
12 - 17
cntr    .usect  "SPRAM", 1          ; Allocate one SPRAM location
        .text
        ...                         ; Other code...
        ANDM    #0000h, *(cntr)     ; Clear counter location
        ...                         ; Config. SPCE (ABU on)
        ...
RINT    LD      #0000h, DP          ; Work on MMRs
        BIT     1, BSPCE            ; Extract RH: ping or pong?
        ...                         ; Do (at least) two other words...
        XC      2, TC               ; If in ping,
        STM     #0800h, AR7         ; reset AR7 to top of ping
        ADDM    #0001h, cntr        ; Increment counter
        CMPM    #count, cntr        ; Is counter at next to last iteration?
        XC      2, TC               ; If so, tell ABU to
        ORM     #8000h, BSPCE       ; stop after next RCV of next array
        ...                         ; Process current array . . .
12 - 19
ABU Caveats
u BKX, BKR range : min = 2, max = 2047 (not 2048)
Input : 1A 2B 3A 4B 5A 6B
Process : 1A 2B 3A 4B 5A
Output : 1A 2B 3A 4B ...
12 - 21
TMS32054x Ports
12 - 22
5 2
4 3
Device 0    Device 1    ...    Device 7
TCLK
TFRM
TDAT
TADD
TCLKX TCLK
TCLKR
TFSX TFRM
TMS320C54x
TDX TDAT
TDR
TFSR TADD
12 - 24
TDM Signals
TCLK ...
TFRM ...
TDAT ...
bit 1 7 bit 0 7 bit 15 0 bit 14 0 bit 13 0 bit 12 0
TADD ...
a7 a6 a5 a4
TMS32054x Ports
12 - 27
HPI Concept
FFFh
12 - 28
12 - 29
HPIC Register
15 - 8 7-4 3 2 1 0
Copy of 7:0 0000 HINT DSPINT SMODE BOB
Both Host DSP Host
HPI Process
INFO 01 B R DD HPIC HPIA DatL 1234 1235 1236 1237 1238
Ctrl. 0 0 0 0 00 0000 XXXX XXXX 0102 0304 0506 0708 090A
00 1 0 00 0000 XXXX XXXX 0102 0304 0506 0708 090A
Addr.
Addr. 0 1 0 0 12 0000 12XX XXXX 0102 0304 0506 0708 090A
01 1 0 34 0000 1234 0102 0102 0304 0506 0708 090A
W:D1. 1 1 0 0 AA 0000 1234 AA02 0102 0304 0506 0708 090A
11 1 0 BB 0000 1234 AABB AABB 0304 0506 0708 090A
W:+D2 1 0 0 0 CC 0000 1235 CCBB AABB 0304 0506 0708 090A
10 1 0 DD 0000 1235 CCDD AABB CCDD 0506 0708 090A
W:+D3 1 0 0 0 EE 0000 1236 EEDD AABB CCDD 0506 0708 090A
10 1 0 FF 0000 1236 EEFF AABB CCDD EEFF 0708 090A
Addr.
Addr. 0 1 0 0 12 0000 1237 0708 AABB CCDD EEFF 0708 090A
01 1 0 37 0000 1237 0708 AABB CCDD EEFF 0708 090A
R:D4+ 1 0 0 1 07 0000 1237 0708 AABB 0304 0506 0708 090A
10 1 1 08 0000 1237 0708 AABB CCDD 0506 0708 090A
R:D5+ 1 0 0 1 09 0000 1238 090A AABB CCDD 0506 0708 090A
10 1 1 0A 0000 1238 090A AABB CCDD EEFF 0708 090A
12 - 31
12 - 32
PC7-0 HD7-0
PF2-0 PF2
HCNTRL0
PF1
HCNTRL1
PF0
HBIL
R/W - HRW-
VCC HAS-
CSIO2 HCS-
E HDS1-
HDS2-
NC HRDY
IRQ - HINT-
12 - 33
P0.7:0.0 HD7-0
P0.3
HCNTRL0
P0.2
HCNTRL1
P0.1
HBIL
P0.0
HPI HRW-
SELECT
ALE LOGIC HCS-
HAS-
P3.7/ RD- HDS1-
P3.6/ WR- HDS2-
N/C HRDY
P3.2/ INT0- HINT-
System Considerations
Learning Objectives
Boot loader
Clock options
Power management
Program security
JTAG emulation
Memory interfacing
Multiprocessor issues
13 - 2
Module 13
Boot Loader
13 - 3
Boot Sequence
13 - 4
The lower 8 bits of the word read from this port address specify the mode of
transfer.
13 - 5
[Flowchart: Begin → Initialize → Test INT2: HPI mode?
Yes → Begin execution at HPI RAM.
No → Read Boot Routine Selection (BRS) word from I/O address 0FFFFh → BRS = ?]
HPI boot
The first step of the boot loader is to check whether the Host Port Interface
(HPI) boot option is selected.
To do that, HINT is asserted low. In HPI mode, this pin is normally
tied to INT2.
If INT2 and HINT are tied together, INT2’s bit in the Interrupt Flag Register
(IFR) will be set. The boot loader waits 20 CLOCKOUT cycles after asserting
HINT and then reads IFR bit #2.
13 - 7
HPI boot
Alternative methods:
If it’s inconvenient to tie INT2 and HINT together, the following methods
will work.
Send a valid interrupt to the INT2 input pin within 30 CLOCKOUT cycles
after DSP fetches the reset vector.
or ...
Use the warm boot option described later in this section. This method is
preferred.
13 - 8
Serial boot
At address 0FFFFh:
Bits   15-8       7-4    3-0
       XXXXXXXX   XXkm   0n00
k = 0, standard serial port
k = 1, TDM serial port
m = 0, CLKX, FSX output
m = 1, CLKX, FSX input
n = 0, 8 bit
n = 1, 16 bit
The ‘541 serial boot option can use either the buffered serial port (BSP) or the
time-division multiplexed (TDM) serial port in standard mode during booting.
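The serial boot fields of the BRS word can be extracted with simple masks. The C sketch below decodes only the XXkm 0n00 pattern described here; the type and function names are hypothetical, and rejecting words whose bits 3, 1, or 0 are set is this example's way of matching the pattern, not a description of how the on-chip loader orders its tests.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    int tdm_port;       /* k: 0 = standard serial port, 1 = TDM port */
    int clk_fs_input;   /* m: 0 = CLKX, FSX output, 1 = input */
    int word16;         /* n: 0 = 8 bit, 1 = 16 bit */
} serial_boot_cfg;

/* Returns 1 and fills cfg if the low byte matches XXkm 0n00,
   otherwise returns 0 and leaves cfg untouched. */
static int decode_serial_brs(uint16_t brs, serial_boot_cfg *cfg)
{
    assert(cfg != 0);
    unsigned low = brs & 0xFFu;        /* only the lower 8 bits matter */
    if (low & 0x0Bu)                   /* bits 3, 1, 0 must be zero */
        return 0;
    cfg->tdm_port     = (int)((low >> 5) & 1u);   /* k = bit 5 */
    cfg->clk_fs_input = (int)((low >> 4) & 1u);   /* m = bit 4 */
    cfg->word16       = (int)((low >> 2) & 1u);   /* n = bit 2 */
    return 1;
}
```

For instance, a low byte of 34h (0011 0100) decodes as k = 1, m = 1, n = 1: TDM port, external clock and frame sync, 16-bit words.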
13 - 9
Code Yes
length Branch to DA
0?
No Start executing code
Decrement code length
13 - 10
13 - 11
13 - 12
Warm Boot
At address 0FFFFh:
Bits   15-8       7-2    1-0
       XXXXXXXX   ADDR   11
ADDR = 6 bit page address
13 - 13
Clock Options
The Phase Locked Loop (PLL) mode is determined at start-up by the input
states on three pins: CLKMD1, CLKMD2, and CLKMD3.
These pins should not be reconfigured during normal operation.
PLL options for ‘541, ‘2, ‘3, ‘4, ‘5 and ‘6
CLKMD1  CLKMD2  CLKMD3   Option 1+                  Option 2+
0       0       0        PLL x 3, ext. osc.         PLL x 5, ext. osc.
1       1       0        PLL x 2, ext. osc.         PLL x 4, ext. osc.
1       0       0        PLL x 3, int. osc.         PLL x 5, int. osc.
0       1       0        PLL x 1.5, ext. osc.       PLL x 4.5, ext. osc.
0       0       1        Divide by 2, ext. osc.     Divide by 2, ext. osc.
0       1       1        Stop mode*                 Stop mode*
1       0       1        PLL x 1, ext. osc.         PLL x 1, ext. osc.
1       1       1        Divide by 2, int. osc.     Divide by 2, int. osc.
+ You can select your device with either option 1 or 2, but not both.
* PLL is disabled. System clock is not provided to CPU / peripherals.
13 - 14
13 - 15
13 - 16
Power Management
13 - 17
13 - 18
All C54x devices can disable the internal clock of external interfaces using
CLKOUT, which will place the interface into a lower power consumption mode.
* Condition at Reset
13 - 21
Program Security
13 - 22
JTAG Emulation
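IEEE 1149.1 JTAG Test Bus
[Figure: emulator header connected to the ‘C54x. Header signals EMU0, EMU1,
TRST, TMS, TDI, TDO, TCK, and TCK_RET are shown alongside the device pins
EMU0, EMU1, TRST, TMS, TDI, TDO, and TCK. The figure also shows a 4.7K
resistor to Vcc, the header PD pin, and the header GND pins.]
Header to device lengths greater than 6 inches require extra circuitry and
attention to noise.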
13 - 23
Multiprocessor Issues
Major how-to’s, signals involved, etc.
13 - 24
Program space
0x0000   External program space
0xF800   Boot loader
0xFC00   µ-law table
0xFD00   A-law table
0xFE00   Sine lookup table
0xFF00   Built-in self test
0xFF80   Vector table
13 - 25
Run=, Load=
linker protocol
see sheet1
Load =eprom
Run=SRAM
problems with linker linking symbols
549 features
power up issues
13 - 27
Level Shifting
3.3 - 5v
13 - 28
13 - 29
Learning Objectives
u Invoke the compiler or shell program
À Options and Switches
À The RTS library
À The Optimizer
u Write code in C
À Numerical Types supported
À Accessing MMRs and IO Ports
À Inlining C and ASM functions
À Interrupt service routines
À Optimization tips
u Use the C support files :
À C.CMD : Linker file issues when using C
À BOOT.ASM Pre-main initialization process
u Intermix assembly files within the C environment
À Stack Model
À Register Usage
À Argument passing and result return
14 - 2
Module 14
Compiler Tool Flow
FILE.C
Code Generator
FILE.ASM
Assembler : ASM500
Shell Program :
CL500 -z Linker : LNK500 FILE.OUT
FILE.OBJ
14 - 3
14 - 5
14 - 6
Writing Code in C
u Invoke the compiler or shell program
À Options and Switches
À The RTS library
À The Optimizer
u Write code in C
À Numerical Types supported
À Accessing MMRs and IO Ports
À Inlining C and ASM functions
À Interrupt service routines
À Optimization tips
u Use the C support files :
À C.CMD : Linker file issues when using C
À BOOT.ASM Pre-main initialization process
u Intermix assembly files within the C environment
À Stack Model
À Register Usage
À Argument passing and result return
14 - 7
Inline Assembly
u Allows direct access to assembly language from C
u Useful for operating on components not used by C, ex:
u Create a pointer and set its value to the assigned memory address :
volatile unsigned int *SPC_REG = (volatile unsigned int *) 0x0022;
u Volatile modifier :
14 - 10
Interrupts in C
14 - 11
Writing ISRs in C
int x[100];
int *p = x;
main() { … }
interrupt void name(void)
{
    static int y = 0;
    y += 1;
    if (y < 100)
        *p++ = port0001;
    else
        asm(" intr 17 ");
}
u Global variables allow sharing of data between main functions & ISR
u Keyword: interrupt
u Name of ISR function
u Void input and return values
u Locals are lost across calls; statics persist across calls
u ISRs should not include calls
u Return is with enable (RETE)
u Avoid -e or -oe options
14 - 12
Initializing Interrupts in C
Setup pointers to IMR & IFR. Initialize IMR, IFR, INTM :
volatile unsigned int *IMR = (volatile unsigned int *) 0x0000;
volatile unsigned int *IFR = (volatile unsigned int *) 0x0001;
*IFR = 0xFFFF;
*IMR = 0xFFFF;
asm(" RSBX INTM ");
Numerical Types in C
  xxxx xxxx xxxx xxxx                        16-bit int
* yyyy yyyy yyyy yyyy                        16-bit int
  zzzz zzzz zzzz zzzz zzzz zzzz zzzz zzzz    32-bit product
z = ((long)(x) * (long)(y)) >> 15;    z (Q15)
z = x * y;                            z (Q0)
u short, char, etc. all occupy full 16-bit memory words
u no byte-addressing/packing on ‘54x
u float operations supported via rts.lib
u float math is multicycle
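The Q15 product shown above can be tried on a host machine with fixed-width types. This is a sketch, not compiler output: `int16_t` and `int32_t` stand in for the 16-bit `int` and 32-bit `long` of the ‘54x compiler, the function name `q15_mul` is made up, and the right shift of a negative product is left to the host compiler's behavior, so only non-negative operands are shown in the usage note.

```c
#include <assert.h>
#include <stdint.h>

/* Q15 x Q15 -> Q15: widen to 32 bits, multiply, drop 15 fraction bits. */
static int16_t q15_mul(int16_t x, int16_t y)
{
    int32_t product = (int32_t)x * (int32_t)y;   /* Q30 in 32 bits */
    return (int16_t)(product >> 15);             /* back to Q15 */
}

/* Plain integer (Q0) multiply, kept in 32 bits to show the full product. */
static int32_t q0_mul(int16_t x, int16_t y)
{
    return (int32_t)x * (int32_t)y;
}
```

In Q15, 4000h represents 0.5, so `q15_mul(0x4000, 0x4000)` yields 2000h (0.25), whereas the Q0 product of the same two operands is the integer 268435456.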
14 - 14
The Optimizer
14 - 15
14 - 16
General Optimizations
u Algebraic re-ordering
example : (a+b) - (c+d) = 6 cycles
becomes : (((a+b)-c)-d) = 4 cycles
u Constant folding
example : a = (b+4) - (c+1)
becomes : a=b-c+3
u Symbolic simplification
u Alias Disambiguation
When only one pointer accesses a given memory
array, compiler may allow registers to hold values
14 - 17
Data-flow Optimizations
u Copy propagation
Following assignment to a variable, references to the variable
are replaced with the value
u Common sub-expression elimination
If two (or more) equations perform the same sub-action,
the value is saved after the first and recalled later
u Redundant Assignment Elimination
Drop assignments not used in later equations
example (int j)
{ int a = 3;                 /* 3 assigned to a & propagated down; a elim’d */
  int b = (j*a) + (j*2);     /* becomes (j*5) */
  int c = (j<<a);            /* dead var: replaced with expression */
  int d = (j>>3) + (j<<b);   /* assignment unused - eliminated */
  call (a,b,c);
}
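The effect of these data-flow optimizations on the slide's example can be written out by hand as a before/after pair in C. This is an illustration of the transformations, not actual compiler output: `call` is a hypothetical stand-in that just sums its arguments so the two versions can be compared, and the unused variable `d` is simplified to avoid an out-of-range shift.

```c
#include <assert.h>

/* Hypothetical stand-in for the slide's call(a,b,c). */
static int call(int a, int b, int c)
{
    return a + b + c;
}

/* The example roughly as written on the slide. */
static int example_as_written(int j)
{
    assert(j >= 0 && j < 1000);        /* keep shifts and products in range */
    int a = 3;
    int b = (j * a) + (j * 2);
    int c = (j << a);
    int d = (j >> 3);                  /* simplified; never used afterwards */
    (void)d;
    return call(a, b, c);
}

/* After copy propagation (a -> 3), simplification (j*3 + j*2 -> j*5),
   and redundant assignment elimination (d removed). */
static int example_optimized(int j)
{
    assert(j >= 0 && j < 1000);
    return call(3, j * 5, j << 3);
}
```

Both versions return the same value for any in-range `j`; for `j = 7` that is 3 + 35 + 56 = 94.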
14 - 18
14 - 19
Loop Optimizations
u Loop induction variables (“LIVs”) are, for example, the “i” in ‘for i=…’
u Process of making LIV op’s more efficient is called strength reduction
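A common form of strength reduction replaces a multiply on the loop induction variable with a running pointer increment. The C pair below is a hand-written sketch of that idea, not output from the ‘54x optimizer; the function names and the `demo` array are invented for the example.

```c
#include <assert.h>

static const int demo[12] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 };

/* Before: the index i*stride costs a multiply on every iteration. */
static long sum_stride_mul(const int *a, int n, int stride)
{
    assert(n >= 0 && stride > 0);
    long sum = 0;
    for (int i = 0; i < n; i++)
        sum += a[i * stride];
    return sum;
}

/* After: the multiply is replaced by adding stride to a pointer. */
static long sum_stride_ptr(const int *a, int n, int stride)
{
    assert(n >= 0 && stride > 0);
    long sum = 0;
    const int *p = a;
    for (int i = 0; i < n; i++) {
        sum += *p;
        p += stride;                   /* same addresses as a[i*stride] */
    }
    return sum;
}
```

Both loops visit elements 0, 3, 6, and 9 of `demo` when called with `n = 4` and `stride = 3`, giving 1 + 4 + 7 + 10 = 22. On a DSP the pointer form maps naturally onto auxiliary-register post-modify addressing.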
Inlining C Functions
[Figure: a function invoked with “call fn” and returning with “ret”, versus the
same function expanded in place as “inline fn”; library of dual function types]
Benefit - Faster:
  no branch
  no return
  no clear of parent fn
  no setup of sub-fn
  merging of fn’s with optimizer
14 - 21
Optimization Steps
u Optimize : Use -o, -mn when compiling
u Use #define instead of variables for parameters
u Globals may be faster than locals
u Minimize mixing signed & unsigned integers
u Inline short/key functions : compile with -x
u Declare function as inline
u Automatically invoked for short routines within file
u Inlines can be passed between files via header
u Give compiler project visibility
u #include sub-files within main
u Optimizer will operate over all files allowing better
inlining, register tracking, etc.
u Tune memory map via C.CMD
u Re-write key code segments in assembly
u Bulletin Board, App notes, 3rd Parties
u S/W Cooperative, Hand written
14 - 22
Optimization Process
Write & debug in C, benchmark
Real-time goal met?   Y → Done
N ↓
Perform C & C.CMD Optimizations
Real-time goal met?   Y → Done
N ↓
Profile. Convert Key Functions to ASM
Real-time goal met?   Y → Done
N
14 - 23
14 - 24
C Support Files
u Invoke the compiler or shell program
À Options and Switches
À The RTS library
À The Optimizer
u Write code in C
À Numerical Types supported
À Accessing MMRs and IO Ports
À Inlining C and ASM functions
À Interrupt service routines
À Optimization tips
u Use the C support files :
À C.CMD : Linker file issues when using C
À BOOT.ASM Pre-main initialization process
u Intermix assembly files within the C environment
À Stack Model
À Register Usage
À Argument passing and result return
14 - 25
Components of C.CMD
file1.obj                    Files : list here or pass via shell
vectors.obj                  Must be written in asm, listed here
-c                           Boot.asm is included
-o test.out                  Output file name
-m test.map                  Map file name
-i c:\filepath               Paths to search
-l rts.lib                   Libraries to search - last on list
-stack 400h                  Override stack size
-heap 200h                   Override heap size
MEMORY
{ P or D, RAM or ROM, F, M or S }    (Pgm, Data, Fast, Med, Slow)
SECTIONS
{ .vectors :> P ROM M        Vector table
  .text    :> P ROM F        Code
  .cinit   :> P ROM S        Init table for global/statics
  .const   :> D ROM M        Constants - several options here
  .switch  :> P ROM M        Case statement arrays
  .bss     :> D RAM M        Globals and statics
  .stack   :> D RAM F        Stack allocation
  .sysmem  :> D RAM M }      Heap allocation
14 - 26
14 - 27
.ref _c_int00
FF80: B _c_int00
nop
nop
_main ...
14 - 29
file1.obj     /* can access rts.lib */
-l rts.lib    /* run-time support library */
file2.obj     /* won’t access rts.lib */
14 - 30
14 - 31
14 - 32
ST1
14 - 35
14 - 36