   INDEX REGISTERS      SI   SOURCE index
                        DI   DESTINATION index
   POINTER REGISTERS    SP   STACK pointer
                        BP   BASE pointer
   SEGMENT REGISTERS    CS   CODE segment
                        DS   DATA segment
                        ES   EXTRA segment
                        SS   STACK segment
The SI and DI registers are used by the string move, compare, load, scan, and
store instructions. The program counter (PC) register, often called the
instruction pointer (IP) register, cannot be changed directly; only jump, call,
return, and interrupt instructions alter it. It is used by the microprocessor
for program execution, not by the program itself, so it does not concern us.
The segment registers, in combination with the index or pointer registers, are
used in calculating memory addresses. Whereas we have viewed the spatial
organization of data in memory up until now only in terms of an individual BYTE
or WORD, we now need to look at the larger-scale organization of groups of BYTEs
or WORDs in memory, and take an initial look at how these individual elements
can be manipulated within this context.
As an example of just why we need to know the order in which BYTEs or WORDs
are stored in memory, consider the case of the PASCAL data type, the LONGINT,
which is a 32-bit numeric value:
   bit 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
       |<--------------- high WORD ----------------->|

   bit 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
       |<--------------- low WORD ------------------>|
From our point of view, numerically, the most significant bit is at the left
end and the least significant bit is at the right. This is as it should be, for
us. However, when this is stored in memory, bits 0 through 7 of this double
WORD will be stored at the lowest address, with bits 8-15, bits 16-23, and bits
24-31 following it successively in the next three bytes of memory.
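The following Python sketch (an illustration only, not PASCAL code) packs an arbitrary LONGINT value, 1000000, in the little-endian order the 8086 uses and prints each byte in binary, lowest address first. `struct.pack('<i', ...)` is standard Python; the choice of value is arbitrary.

```python
import struct

value = 1_000_000  # an arbitrary LONGINT value chosen for illustration

# '<i' packs a signed 32-bit integer in little-endian order, the order the
# 8086 uses: the byte holding bits 0-7 goes at the lowest address.
stored = struct.pack('<i', value)

for offset, byte in enumerate(stored):
    print(f"address +{offset}: bits {8 * offset:2}-{8 * offset + 7:2} = {byte:08b}")

# Expected output:
# address +0: bits  0- 7 = 01000000
# address +1: bits  8-15 = 01000010
# address +2: bits 16-23 = 00001111
# address +3: bits 24-31 = 00000000
```

Even this small example shows how tedious it is to read 32 bits written out in binary.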
It is at this point that you should complain, "Why should we try to write all
of this out in binary? Isn't there some easier way?"
OK. There is. Using base sixteen, representing data in hexadecimal, vastly
simplifies both our notation and the effort to understand it, in a great many
cases. Let's introduce a polynomial in powers of sixteen to illustrate how the
number 2434941d (the 'd' is for decimal) is expressed in hexadecimal:
$$a_7 y^7 + a_6 y^6 + a_5 y^5 + a_4 y^4 + a_3 y^3 + a_2 y^2 + a_1 y^1 + a_0 y^0 = 2434941$$
Writing this as a polynomial with y = 16, and grouping the terms so that we
can write them from left to right in the sequence in which they appear as stored
in the computer's memory, we have:
$$\underbrace{\underbrace{(7y^1 + 13y^0)}_{\text{low BYTE}} + \underbrace{(2y^3 + 7y^2)}_{\text{high BYTE}}}_{\text{low WORD}} \; + \; \underbrace{\underbrace{(2y^5 + 5y^4)}_{\text{low BYTE}} + \underbrace{(0y^7 + 0y^6)}_{\text{high BYTE}}}_{\text{high WORD}}$$
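The coefficients in this grouping can be found by dividing by sixteen repeatedly and keeping the remainders, lowest-order coefficient first. The Python sketch below does that for 2434941 and then pairs the coefficients into bytes in memory order; `hex_coefficients` is a hypothetical helper name chosen for this example.

```python
def hex_coefficients(n, digits=8):
    """Return [a0, a1, ..., a7] such that n == sum(a_k * 16**k)."""
    coeffs = []
    for _ in range(digits):
        n, remainder = divmod(n, 16)   # the remainder is the next coefficient
        coeffs.append(remainder)
    return coeffs

n = 2434941
a = hex_coefficients(n)
print(a)                               # [13, 7, 7, 2, 5, 2, 0, 0]

# Each BYTE holds two coefficients: byte k = a[2k+1]*16 + a[2k].
memory = [a[2 * k + 1] * 16 + a[2 * k] for k in range(4)]
print([f"{b:02X}" for b in memory])    # ['7D', '27', '25', '00'], lowest address first
print(f"{n:08X}")                      # 0025277D, as we write it, most significant first
```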
Note that the coefficient of the 0th term of this polynomial is 13 in base
ten. When we represent these coefficients in base sixteen, we use the letters
from A to F to represent coefficients having values from 10 to 15. The
following table shows such values in binary, decimal, and hexadecimal:
   HEXADECIMAL  DECIMAL  BINARY       HEXADECIMAL  DECIMAL  BINARY
        0          00     0000             8          08     1000
        1          01     0001             9          09     1001
        2          02     0010             A          10     1010
        3          03     0011             B          11     1011
        4          04     0100             C          12     1100
        5          05     0101             D          13     1101
        6          06     0110             E          14     1110
        7          07     0111             F          15     1111
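Because each hexadecimal digit stands for exactly four bits, converting binary to hexadecimal is just a matter of splitting the bits into groups of four, starting from the right, and looking each group up in the table. A minimal Python sketch, with `binary_to_hex` as a hypothetical helper name:

```python
HEX_DIGITS = "0123456789ABCDEF"

def binary_to_hex(bits):
    """Convert a string of 0s and 1s to hexadecimal, one digit per 4 bits."""
    bits = bits.zfill((len(bits) + 3) // 4 * 4)   # pad on the left to whole groups of 4
    digits = []
    for i in range(0, len(bits), 4):
        group = bits[i:i + 4]
        digits.append(HEX_DIGITS[int(group, 2)])  # look the group up in the table
    return "".join(digits)

bits = format(2434941, "032b")   # 2434941 as a 32-bit binary string
print(bits)
print(binary_to_hex(bits))       # 0025277D
```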
Fortunately, there are instructions which calculate an address and load the
segment and index registers with the two parts of the address. Because of this,
calculating the actual value of a 32-bit (segment:offset) address is rarely
necessary, but we will have to know something about how memory segmentation
works if we are to use assembly language routines in PASCAL programs.
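As background: on the 8086 in real mode, the 32-bit address is a 16-bit segment plus a 16-bit offset, and the CPU forms the physical (20-bit) memory address as segment * 16 + offset. The Python sketch below illustrates only that arithmetic. The helper names are hypothetical, and `split_far_pointer` assumes the segment is kept in the high WORD and the offset in the low WORD, which matches how such pointers sit in memory (offset at the lower address) when LDS or LES loads them.

```python
def physical_address(segment, offset):
    """Real-mode 8086 address: segment * 16 + offset, kept to 20 bits."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must each fit in 16 bits")
    return ((segment << 4) + offset) & 0xFFFFF   # the original 8086 wraps past 1 MB


def split_far_pointer(pointer):
    """Split a 32-bit value into (segment, offset): segment high WORD, offset low WORD."""
    return pointer >> 16, pointer & 0xFFFF


segment, offset = split_far_pointer(0x12340010)
print(f"{segment:04X}:{offset:04X} -> {physical_address(segment, offset):05X}")  # 1234:0010 -> 12350
print(f"{physical_address(0x1235, 0x0000):05X}")  # 12350, the same byte via a different pair
```

Note that many different segment:offset pairs name the same physical byte, which is one reason we let the address-loading instructions do this work.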
In PASCAL, we simply define two LONGINT variables, for example, and then
use a statement like v1 := v1 + v2 to add v1 and v2 and place the result in the
variable v1. We don't have to be concerned with the values stored in the
microprocessor registers; indexing the variables, performing the addition, and
so on, is all done, somehow, without us losing any sleep over the deal.
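On a CPU whose registers hold only 16 bits, that "somehow" amounts roughly to this: add the low WORDs, then add the high WORDs together with the carry out of the low-WORD addition (the 8086's ADD and ADC instructions). The Python sketch below models only that arithmetic, treating values as 32-bit bit patterns; it is not the code any particular PASCAL compiler generates, and `add16` and `add_longint` are hypothetical names.

```python
MASK16 = 0xFFFF

def add16(a, b, carry_in=0):
    """Add two 16-bit WORDs plus a carry bit; return (16-bit sum, carry out)."""
    total = a + b + carry_in
    return total & MASK16, total >> 16

def add_longint(v1, v2):
    """Add two 32-bit values using only 16-bit additions; returns an unsigned bit pattern."""
    v1 &= 0xFFFFFFFF                                  # work on 32-bit bit patterns
    v2 &= 0xFFFFFFFF
    low, carry = add16(v1 & MASK16, v2 & MASK16)      # low WORDs first (like ADD)
    high, _ = add16(v1 >> 16, v2 >> 16, carry)        # high WORDs plus carry (like ADC)
    return (high << 16) | low                         # carry out of bit 31 is discarded

print(add_longint(65535, 1))          # 65536: the low WORD overflows into the high WORD
print(add_longint(2434941, 1000000))  # 3434941
```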
Since these CPU registers are used for so many purposes, and memory is
limited, values that may need to be stored only temporarily and that cannot be
preserved in registers are saved in areas known as 'stacks'. Calling a
subroutine to perform a certain task usually involves placing the information
needed, or perhaps addresses where that information can be found, on a stack
before calling the subroutine. In PASCAL, this is done for us and we need not
care how; in assembly language, though, we must care a great deal, partly
because various instructions use several different addressing modes, but above
all because of one crucial point: when a subroutine is called, the CALL
instruction places on the stack the address of the next instruction to be
executed after the subroutine finishes its task. At the end of the subroutine,
a return instruction loads this address from the stack back into the IP
(Instruction Pointer) register and, for a far call, into the CS (Code Segment)
register as well. If this return address were somehow modified by the action of
the subroutine, or if the SP (Stack Pointer) register were changed so that the
return instruction copied the wrong address into the CS:IP pair of registers,
there is no way of knowing what might happen. What if the CPU jumps to some
arbitrary address and begins fetching 'instructions' from that and the
following locations when those 'instructions' aren't instructions at all, but
perhaps your "So long, have a nice day!" sign-off message in your program's
data segment?
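This danger is easy to model. The Python sketch below is a toy stack, assuming a near CALL/RET, which saves and restores only IP (a far call also pushes CS). As on the 8086, SP moves down by 2 on each push. A subroutine that pushes a WORD and forgets to pop it leaves that WORD on top of the stack, so RET loads it into IP instead of the real return address. `Stack`, `call`, `balanced`, and `unbalanced` are hypothetical names.

```python
class Stack:
    """Toy model of the 8086 stack: PUSH moves SP down by 2, POP moves it back up."""

    def __init__(self, size_in_words=64):
        self.words = [0] * size_in_words
        self.sp = size_in_words * 2        # empty stack: SP points just past the top

    def push(self, word):
        self.sp -= 2
        self.words[self.sp // 2] = word

    def pop(self):
        word = self.words[self.sp // 2]
        self.sp += 2
        return word


def call(stack, return_address, body):
    """Model a near CALL/RET: push the return address, run the body, pop into IP."""
    stack.push(return_address)    # CALL saves the address of the next instruction
    body(stack)                   # the subroutine does its work
    return stack.pop()            # RET loads IP from whatever SP now points at


def balanced(stack):
    stack.push(0x1111)            # save a register...
    stack.pop()                   # ...and restore it before returning


def unbalanced(stack):
    stack.push(0x1111)            # saves a register but never pops it


stack = Stack()
print(hex(call(stack, 0x0123, balanced)))    # 0x123: back where we came from
print(hex(call(stack, 0x0123, unbalanced)))  # 0x1111: RET "returns" to a wrong address
```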
What we need to do is to understand enough about memory addressing to enable
us to write correct assembly-language subroutines that can be called by PASCAL
instructions just as though those subroutines were PASCAL procedures or
functions.
There are generally three segments used by programs: the DATA segment, the
STACK segment, and the CODE segment. Variables that are static and global are
stored in the data segment. Variables that are temporary and local as well as
return addresses are kept in the stack segment, and executable instructions are
kept in the code segment.
The CS register always contains the segment portion of the address of the
current code segment from which instructions are to be fetched for execution.
The IP register always contains the offset portion of the address of the next
instruction to be executed. Jump, interrupt, and call instructions affect these
registers, but we will never attempt to modify their contents except indirectly
through these instructions.
PASCAL uses the DS register to hold the location of the program's data
segment, so our external subroutines will always save it on the stack if there
is a chance it might be modified while the subroutine is executing. DS is the
default segment for data memory address references. A segment-override prefix
can usually replace it with another segment register, but not in every case:
in the string instructions, DS:SI forms a register pair and ES:DI another pair,
which address the source and destination operands, respectively, and although
the source segment may be overridden, the destination is always addressed
through ES:DI.
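A rough Python model of how the string instructions use these register pairs: REP MOVSB with the direction flag clear copies CX bytes from DS:SI to ES:DI, incrementing SI and DI after each byte. The sketch (with `rep_movsb` as a hypothetical name) ignores segment-override prefixes and the decrementing case (DF = 1), and models memory as a 1 MB bytearray.

```python
def rep_movsb(memory, ds, si, es, di, cx):
    """Sketch of REP MOVSB with DF = 0: copy CX bytes from DS:SI to ES:DI.

    Returns the final (SI, DI, CX), as the registers would be left.
    """
    while cx != 0:
        src = ((ds << 4) + si) & 0xFFFFF   # source: always formed from DS:SI here
        dst = ((es << 4) + di) & 0xFFFFF   # destination: always ES:DI
        memory[dst] = memory[src]
        si = (si + 1) & 0xFFFF             # both index registers advance
        di = (di + 1) & 0xFFFF
        cx -= 1
    return si, di, cx

memory = bytearray(0x100000)               # the 8086's 1 MB address space
message = b"So long!"
memory[0x12340:0x12340 + len(message)] = message   # source string at 1234:0000

si, di, cx = rep_movsb(memory, 0x1234, 0x0000, 0x2000, 0x0010, len(message))
print(bytes(memory[0x20010:0x20018]))      # b'So long!' now also at 2000:0010
```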
The SS register always contains the segment portion of the address of the
current stack segment, and references that use a pointer register (SP or BP) to
hold the offset use SS as the segment: PUSH, POP, CALL, and RET always address
the stack through SS:SP, and BP-based references use SS by default. Thus SS:BP
and SS:SP form two more register pairs, which are normally used for references
to the stack.
The DATA segment provides for permanent storage of data which will remain
unchanged unless it is explicitly changed by an instruction which manipulates
data. Variables in the DATA segment are static; i.e., the address of the memory
assigned to each variable is fixed by the time the program begins running, and
that address does not change. Variables in the STACK segment are volatile
because they are referenced through an address calculation based on the stack
pointer (SP) register, which changes every time a subroutine is called or
terminates.
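To make the point about SP concrete, the Python sketch below tracks SP through a few calls, assuming a simplified frame in which each call pushes only a 2-byte return address and then reserves space for locals (saved BP, parameters, and alignment are ignored). The same hypothetical procedure P gets its local variable at a different stack offset when it is called from the main program than when it is called from inside another hypothetical procedure Q. `StackFrames` is likewise a hypothetical name.

```python
class StackFrames:
    """Simplified model: each call pushes a 2-byte return address, then reserves locals."""

    def __init__(self, top=0xFFFE):
        self.sp = top

    def enter(self, local_bytes):
        self.sp -= 2              # CALL pushes a (near) return address
        self.sp -= local_bytes    # the procedure reserves room for its locals
        return self.sp            # SS offset of the first local byte

    def leave(self, local_bytes):
        self.sp += local_bytes + 2


frames = StackFrames()
first = frames.enter(2)       # P called directly from the main program
frames.leave(2)

frames.enter(4)               # Q is active...
second = frames.enter(2)      # ...when it calls P
print(hex(first), hex(second))   # 0xfffa 0xfff4
```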
There is no definitive answer as to whether a variable should be temporary or
static, and the issue is complex enough that 'profiler' programs are often used
to analyze a program's operation and determine how efficiently the stack and
data segment memory areas are being utilized. The general rule used most often
when no other obvious constraints need be met is that you request space for a
variable only for as long as it is needed; this usually means that within each
subroutine, you request stack space for variables that are used only within
that subroutine, and you only request space in the data segment for those
variables that must be available during the entire time the program is running.
In PASCAL, the VAR keyword precedes a variable declaration that reserves
space for a variable. Those VAR declarations which follow a Procedure or
Function declaration will automatically set aside stack memory for the variables
declared. All other declarations reserve space in the data segment.
A variable declaration has the form
VAR
<VarName> : <VarType> ;
where any number of spaces, tabs, or blank lines may be included. Some sample
declarations for variables of type BYTE, WORD, etc., follow; these illustrate
some alternate forms that can be used as well:
VAR
   Byte1,Byte2 : BYTE;      { This form is very readable, and is often used }
   W,W2        : WORD;      { although it is often wasteful of space. }
   L           : LONGINT;
   A           : ARRAY [1..4] OF LONGINT;

VAR
   Byte1,Byte2,
   Byte3
      : BYTE;
   W,W2
      : WORD;
   L
      : LONGINT;
   Array1,Array2
      : ARRAY [1..4]
           OF LONGINT;
The following short example illustrates the concepts of local and global
variable scope as well as static and temporary storage class. Local variables
declared within a procedure or function exist only on the stack, and may not
necessarily occupy the same position in the stack segment from one procedure
call to the next; it is thus necessary to initialize a local variable each time
the procedure is called, before the variable is to be used.
Program MainProg;

{ The VAR keyword causes the compiler to interpret what follows as a
  variable declaration. }
VAR
   GlobalByte : BYTE;        { GlobalByte is the variable name; BYTE is the
                               variable's TYPE. The scope of GlobalByte is
                               global to the program, and it can be referenced
                               anywhere below this point. This variable is
                               static; i.e., it exists at a single known address
                               throughout program execution, somewhere in the
                               main program's DATA segment. }

{ SecondProc is the procedure's name. Declare this procedure as FORWARD so
  that it can be referenced before it is actually defined. }
Procedure SecondProc; FORWARD;

Procedure SomeProc;
VAR LocalByte : BYTE;        { Local to SomeProc; located on the STACK. }
BEGIN
   LocalByte := 1;
   SecondProc;
END;

Procedure SecondProc;
VAR LocalByte : BYTE;        { This second LocalByte is also located on the STACK. }
BEGIN
   GlobalByte := LocalByte;  { This LocalByte has not been assigned a value.
                               It is local to SecondProc. Both of these
                               variables named LocalByte are temporary. }
END;

{ This is the main program body, which in this case contains only one call
  to a subroutine. Note the period following the END statement. }
BEGIN
   SomeProc;                 { GlobalByte exists during MainProg execution,
                               including during the time SomeProc and
                               SecondProc are executing. }
END.
In this example, GlobalByte is not assigned a value until SecondProc is called
by SomeProc. The value assigned to it, though, comes from the location on the
stack of the LocalByte variable local to SecondProc, and this location has not
been assigned a value. Thus, GlobalByte will have an unpredictable value.
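A small Python model shows why the value is unpredictable: moving SP down to make room for locals does not clear the memory there, so an unassigned local simply reads whatever an earlier call left behind. The earlier activity in this sketch (two nested calls before SomeProc, and the value 99) is invented for illustration; in a real program it might be start-up code or other calls, and the leftover byte could be anything. `StackMemory` is a hypothetical name, and the frame layout is simplified (return address plus locals only).

```python
class StackMemory:
    """Simplified stack: entering a procedure moves SP down but never clears memory."""

    def __init__(self, size=32):
        self.memory = bytearray(size)
        self.sp = size

    def enter(self, local_bytes, return_address=0):
        self.sp -= 2                                   # CALL pushes the return address
        self.memory[self.sp:self.sp + 2] = return_address.to_bytes(2, "little")
        self.sp -= local_bytes                         # room for locals, not cleared
        return self.sp                                 # address of the first local

    def leave(self, local_bytes):
        self.sp += local_bytes + 2                     # the bytes stay where they are


stack = StackMemory()

# Hypothetical earlier activity leaves data behind on the stack.
stack.enter(1)                                 # an outer procedure
inner = stack.enter(1)                         # an inner procedure...
stack.memory[inner] = 99                       # ...stores 99 in its local
stack.leave(1)
stack.leave(1)

# SomeProc: LocalByte := 1; SecondProc;
some_local = stack.enter(1)
stack.memory[some_local] = 1
second_local = stack.enter(1)                  # SecondProc's LocalByte, never assigned
global_byte = stack.memory[second_local]       # GlobalByte := LocalByte
print(global_byte)                             # 99: whatever was left in that byte
```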
One more basic PASCAL data type, the STRING, needs to be introduced. String
variables are arrays of characters, and are most often used to store text. The
first BYTE of a string variable is its length byte, which gives the number of
valid characters following it in the memory area occupied by the variable.
VAR
S : STRING;
S8 : STRING[8];
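A minimal Python sketch of this layout, based on Turbo Pascal's short strings: a STRING[n] variable occupies n + 1 bytes (a plain STRING is STRING[255], 256 bytes), with the length in the first byte and unused bytes after the current length. Assigning a longer value truncates it to n characters. `pascal_string` is a hypothetical helper, and the sketch assumes ASCII text.

```python
def pascal_string(text, max_len=255):
    """Lay out text as a STRING[max_len]: a length byte, then the characters."""
    data = text.encode("ascii")[:max_len]   # assignment truncates to the declared maximum
    storage = bytearray(max_len + 1)        # the variable always occupies max_len + 1 bytes
    storage[0] = len(data)                  # the length byte
    storage[1:1 + len(data)] = data         # bytes past the length are left unused
    return storage

s8 = pascal_string("Hello, world", max_len=8)    # like S8 : STRING[8]
print(len(s8), s8[0], bytes(s8[1:1 + s8[0]]))    # 9 8 b'Hello, w'
```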