 THE PRAGMATIST'S INTRODUCTION TO HEXADECIMAL 


8086 General-Purpose Registers    Index, Pointer, and Segment Registers
Spatial Organization of Groups of Bytes or Words in Memory
LONGINT - High and Low Words    Polynomial in Base 16    Hexadecimal Digits
Memory Addresses and Least Significant Byte    Two-step LONGINT Addition
Stacks    Subroutine Calls and Return Addresses    Instructions vs. Data
Data, Code, and Stack Segments - Instructions and Variable Storage Classes
CS and IP    DS and SI    ES and DI    SS and BP    SS and SP    String Types
The Bricks and Mortar of Programming
-------------------------------------------------------------------------
The 8086 microprocessor has four general-purpose registers, the A, B, C, and D
registers. There are two I registers, which are the index registers, and
four S registers, which are the segment registers. In addition, there are two
P registers, the pointer registers, a program counter register, and a status
register which is called the Flags register.
The size of the general-purpose registers sets the constraint we have referred
to as the machine's WORD size, which in this family of machines is 16 bits.
The general-purpose registers can also be used as two half-registers, so that
operations on individual BYTEs within a WORD may be performed. The diagram
following shows these registers divided into halves; the high half is the H
half and the low half is the L half.

               1 1 1 1 1 1 0 0   0 0 0 0 0 0 0 0
               5 4 3 2 1 0 9 8   7 6 5 4 3 2 1 0
             +-----------------+-----------------+
        AX = |       AH        |       AL        |
             +-----------------+-----------------+
        BX = |       BH        |       BL        |
             +-----------------+-----------------+
        CX = |       CH        |       CL        |
             +-----------------+-----------------+
        DX = |       DH        |       DL        |
             +-----------------+-----------------+
             | 7 6 5 4 3 2 1 0 | 7 6 5 4 3 2 1 0 |
             |    High BYTE    |    Low BYTE     |
             +-----------------+-----------------+
             |          WORD (bits 15..0)        |
             +-----------------------------------+

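The same split can be tried out from PASCAL. The following is only a minimal sketch, assuming a Turbo Pascal compatible compiler: the standard functions Hi and Lo return the high and low BYTE of a WORD, much as AH and AL give the two halves of AX. The program name HalfRegs and the value $1234 are chosen purely for illustration.

```pascal
program HalfRegs;
{ Sketch: split a WORD into its high and low BYTEs, then rebuild it. }
var
  W : Word;
begin
  W := $1234;
  WriteLn(Hi(W));                    { prints 18, which is $12 }
  WriteLn(Lo(W));                    { prints 52, which is $34 }
  WriteLn(Hi(W) * 256 + Lo(W) = W);  { prints TRUE }
end.
```

Multiplying the high BYTE by 256 moves it back into bits 8 through 15, which is why the last line reports TRUE.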
The AX register is known as the accumulator register, and does have some
special uses in particular instructions. It MUST be used in instructions which
perform input or output through the computer's I/O ports, as well as in certain
mathematical or string-processing instructions.
The BX register is used by certain instructions in address calculations, and
is generally used as an index into data tables.
The CX register is used as a counter in repeat or looping instructions, as
well as in some shift or rotate instructions.
The DX and AX registers are used together in 16-bit multiply and divide
instructions. In a multiply, AX holds one of the multiplicands, and after the
operation DX and AX hold the high and low words of the 32-bit product. In a
divide, DX and AX hold the high and low words of the 32-bit dividend, and after
the operation AX holds the 16-bit quotient and DX the remainder. The DX
register MUST be used in input/output instructions to hold the address of the
I/O port being written to or read from if that address is larger than 8 bits in
size, and the AL or AX register must be used to hold the BYTE or WORD being
read from or written to the I/O port.
Apart from these restrictions, these registers can be used freely to store
data, but you must 'work around' the effects that particular instructions have
on them, especially on the AX and CX registers. We will see later on how some
of these workarounds can be carried out. Keeping data in registers is often a
very complicated undertaking; many times you have no choice but to store it in
memory and read it in every time a value is needed.
The other registers cannot be accessed as two halves. These are shown below:

                         REGISTERS
                         +--------+
          SOURCE         |   SI   |
          DESTINATION    |   DI   |    INDEX
                         +--------+
          STACK          |   SP   |
          BASE           |   BP   |    POINTER
                         +--------+
          CODE           |   CS   |
          DATA           |   DS   |
          EXTRA          |   ES   |    SEGMENT
          STACK          |   SS   |
                         +--------+


The SI and DI registers are used in string compare, load, scan, and store
instructions. The program counter (PC) register, usually called the instruction
pointer (IP) register on the 8086, cannot be changed directly; it changes only
in response to instructions such as jump, call, return, or interrupt. It is
used by the microprocessor for program execution, not by the program itself,
so it does not concern us.
The segment registers, in combination with the index or pointer registers, are
used in calculating memory addresses. Whereas we have viewed the spatial
organization of data in memory up until now only in terms of an individual BYTE
or WORD, we now need to look at the larger-scale organization of groups of BYTEs
or WORDs in memory, and take an initial look at how these individual elements
can be manipulated within this context.
As an example of just why we need to know the order in which BYTEs or WORDs
are stored in memory, consider the case of the PASCAL data type, the LONGINT,
which is a 32-bit numeric value:

     3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1     1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
     1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6     5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
     ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?     ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?
     +-----------------------------+     +-----------------------------+
                high WORD                            low WORD

From our point of view, numerically, the most significant bit is at the left
end and the least significant bit is at the right. This is as it should be, for
us. However, when this is stored in memory, bits 0 through 7 of this double
WORD will be stored at the lowest address, with bits 8-15, bits 16-23, and bits
24-31 following it successively in the next three bytes of memory.
It is at this point that you should complain, "Why should we try to write all
of this out in binary? Isn't there some easier way?"
OK. There is. Using base sixteen, representing data in hexadecimal, vastly
simplifies both our notation and the effort to understand it, in a great many
cases. Let's introduce a polynomial in powers of sixteen to illustrate how the
number 2434941d (the 'd' is for decimal) is expressed in hexadecimal:

     a_7*y^7 + a_6*y^6 + a_5*y^5 + a_4*y^4 + a_3*y^3 + a_2*y^2 + a_1*y^1 + a_0*y^0

     2434941  =  0*268435456 + 0*16777216 + 2*1048576 + 5*65536 +
                 2*4096 + 7*256 + 7*16 + 13*1

Writing this as a polynomial with y = 16, and grouping the terms so that we
can write them from left to right in the sequence in which they appear as stored
in the computer's memory, we have:

     ( 7y^1 + 13y^0 ) + ( 2y^3 + 7y^2 )     ( 2y^5 + 5y^4 ) + ( 0y^7 + 0y^6 )
     +--------------+   +-------------+     +-------------+   +-------------+
         low BYTE          high BYTE            low BYTE         high BYTE
     +--------------------------------+     +-------------------------------+
                 low WORD                              high WORD

Note that the coefficient of the 0th term of this polynomial is 13 in base
ten. When we represent these coefficients in base sixteen, we use the letters
from A to F to represent coefficients having values from 10 to 15. The
following table shows such values in binary, decimal, and hexadecimal:

     HEXADECIMAL   DECIMAL   BINARY        HEXADECIMAL   DECIMAL   BINARY
          0           00      0000              8           08      1000
          1           01      0001              9           09      1001
          2           02      0010              A           10      1010
          3           03      0011              B           11      1011
          4           04      0100              C           12      1100
          5           05      0101              D           13      1101
          6           06      0110              E           14      1110
          7           07      0111              F           15      1111
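
The conversion described above can be sketched in PASCAL by peeling off one hexadecimal digit (four bits) at a time, starting with the coefficient of the 0th term. This is only an illustrative sketch; the names HexDigits and HexLong are our own and are not part of any library, and the routine assumes a non-negative value.

```pascal
program ToHex;
{ Sketch: write a LONGINT as eight hexadecimal digits. }
const
  HexDigits : string[16] = '0123456789ABCDEF';

function HexLong(Value : LongInt) : string;
var
  S : string;
  i : Integer;
begin
  S := '';
  for i := 1 to 8 do
  begin
    { The low four bits select one digit, the coefficient of y^0. }
    S := HexDigits[Integer(Value and 15) + 1] + S;
    { Dividing by sixteen brings the next coefficient into place. }
    Value := Value shr 4;
  end;
  HexLong := S;
end;

begin
  WriteLn(HexLong(2434941));   { prints 0025277D }
end.
```

Each pass through the loop yields one coefficient of the polynomial, from a_0 up to a_7, so the digits are collected from right to left.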

We can rewrite the sequence of bytes in memory, in hexadecimal, as follows,
where each hexadecimal 'digit' corresponds to a nibble, and each two such
'digits' to a BYTE:

     7D 27 25 00

The memory address of this LONGINT is actually the location of the least
significant BYTE, which has the value 7Dh (the 'h' is for hexadecimal), but when
we think of the actual numeric value stored in this sequence of four BYTEs of
memory, or when we write the value in a program text as a hexadecimal value, we
write it as:
0025277Dh
or, in PASCAL, as $0025277D
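
To see this byte order for yourself, the following sketch overlays an array of four BYTEs on a LONGINT and prints them from the lowest address upward. It assumes a Turbo Pascal compatible compiler (for the ABSOLUTE clause) running on a machine that stores the least significant BYTE first, as the 8086 does; the names ShowBytes, FourBytes, and HexDigits are chosen for illustration only.

```pascal
program ShowBytes;
{ Sketch: print the four BYTEs of a LONGINT in memory order. }
const
  HexDigits : string[16] = '0123456789ABCDEF';
type
  FourBytes = array[0..3] of Byte;
var
  L : LongInt;
  B : FourBytes absolute L;   { B occupies the same memory as L }
  i : Integer;
begin
  L := $0025277D;
  for i := 0 to 3 do          { B[0] is at the lowest address }
    Write(HexDigits[(B[i] shr 4) + 1],
          HexDigits[(B[i] and 15) + 1], ' ');
  WriteLn;                    { prints 7D 27 25 00 }
end.
```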
Note the zeros at the left of the hexadecimal number. These aren't strictly
necessary in an assembly language program, but at least one is required if the
leftmost hexadecimal digit of a number is a letter. Whether writing values in
hex or binary it is good practice to pad the value with leading zeros so that
a binary BYTE value is always eight digits in the text, a binary WORD value is
always 16 digits, a hex BYTE is two digits, a hex WORD is four digits, and a
hex DWORD (Double-Word or LONGINT) is eight digits.
Since we can only move 8 or 16 bits at a time into a register, we are forced
to deal with such sequences of bytes in memory in small 'chunks'; adding two
LONGINT numbers thus requires first addressing the low word of the first number,
reading it into a register, addressing the low word of the second number, adding
the word in the register to it, and then, taking any possible carry into
account, addressing the high word of the first number, reading this into a
register, addressing the high word of the second number, and then adding the
word in the register to it.
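
The two-step addition can be imitated in PASCAL by holding each LONGINT as a separate low and high WORD. This is a sketch of the idea only, not of the instructions the compiler actually generates; the variable names and the sample values $0001FFFF and $00000001 are assumptions made for the illustration.

```pascal
program TwoStepAdd;
{ Sketch: add two 32-bit values one 16-bit WORD at a time. }
var
  Lo1, Hi1, Lo2, Hi2 : Word;
  LoSum              : LongInt;
  Carry              : Word;
begin
  Lo1 := $FFFF;  Hi1 := $0001;     { first number,  $0001FFFF }
  Lo2 := $0001;  Hi2 := $0000;     { second number, $00000001 }

  { Step 1: add the low WORDs and note whether a carry occurred. }
  LoSum := LongInt(Lo1) + LongInt(Lo2);
  if LoSum > $FFFF then Carry := 1 else Carry := 0;
  Lo1 := Word(LoSum and $FFFF);

  { Step 2: add the high WORDs, taking the carry into account. }
  Hi1 := (Hi1 + Hi2 + Carry) and $FFFF;

  WriteLn(LongInt(Hi1) * 65536 + Lo1);   { prints 131072 }
end.
```

The printed value, 131072, is $00020000: the carry out of the low WORD has been added into the high WORD.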

Fortunately, there are instructions which calculate an address and load the
segment and index registers with the two parts of the address. Although
working out the actual memory location from such a 32-bit segment and offset
pair is rarely necessary because of this, we will have to know something about
how memory segmentation works if we are to utilize assembly language routines
in PASCAL programs.
In PASCAL, we simply define two LONGINT variables, for example, and then
use a statement like v1 := v1 + v2 to add v1 and v2 and place the result in the
variable v1. We don't have to be concerned with the values stored in the
microprocessor registers; indexing the variables, performing the addition, and
so on, is all done, somehow, without us losing any sleep over the deal.
Since these CPU registers are used for so many purposes, and memory is
limited, values that may need to be stored only temporarily and that cannot be
preserved in registers are saved in areas known as 'stacks'. Calling a
subroutine to perform a certain task usually involves placing the information
needed, or perhaps addresses where that information can be found, on a stack
before calling the subroutine. In PASCAL, this is done somehow, we care not
how; in assembly language, though, we must care a great deal how this is done,
because various instructions use several different addressing modes. There is
also a very crucial point to remember: When a subroutine is called, the call
instruction places on the stack the address of the next instruction to be
executed after the subroutine finishes its task. At the end of the subroutine,
there is a return instruction which loads this address from the stack into the
IP (Instruction Pointer) register and, for a far return, into the CS (Code
Segment) register as well.
If this return address were somehow modified by the action of the subroutine, or
if the SP (Stack Pointer) register were changed and the return instruction thus
copied the wrong address into the CS:IP pair of registers, there is no way of
knowing what might happen. What if the CPU jumps to some arbitrary address and
begins fetching 'instructions' from that and the following locations when those
'instructions' aren't instructions, but maybe your "So long, have a nice day!"
sign-off message in your program's data segment?
What we need to do is to understand enough about memory addressing to enable
us to write correct assembly-language subroutines that can be called by PASCAL
instructions just as though those subroutines were PASCAL procedures or
functions.
There are generally three segments used by programs: the DATA segment, the
STACK segment, and the CODE segment. Variables that are static and global are
stored in the data segment. Variables that are temporary and local as well as
return addresses are kept in the stack segment, and executable instructions are
kept in the code segment.
The CS register always contains the segment portion of the address of the
current code segment from which instructions are to be fetched for execution.
The IP register always contains the offset portion of the address of the next
instruction to be executed. Jump, interrupt, and call instructions affect these
registers, but we will never attempt to modify their contents except indirectly
through these instructions.
PASCAL uses the DS register to hold the location of the program's data
segment, so our external subroutines will always save it on the stack if there
is a chance it might be modified while the subroutine is executing. DS is the
default segment for data memory address references. In string instructions,
DS:SI forms a register pair and ES:DI another pair, which address source and
destination operands, respectively. The source segment can be overridden with
a segment prefix, but the destination always uses ES, and this relationship
cannot be circumvented.
The SS register always contains the segment portion of the address of the
current stack segment, and references using the pointer registers (SP and BP) to
contain an offset use SS as the segment by default. Thus SS:BP and SS:SP form
two more register pairs, which are used only in references to the stack.
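
The text above speaks of the segment portion and the offset portion of an address without showing how the two are combined. On the 8086 the segment value is multiplied by 16 and the offset is added to it. The following sketch illustrates only that arithmetic; the function name Physical and the sample values are our own, and the sketch ignores the wraparound that occurs on a real 8086 when the sum exceeds 20 bits.

```pascal
program PhysAddr;
{ Sketch: combine a segment and an offset into one memory location. }

function Physical(Segment, Offset : Word) : LongInt;
begin
  Physical := LongInt(Segment) * 16 + Offset;
end;

begin
  WriteLn(Physical($1234, $0010));   { prints 74576, which is $12350 }
  WriteLn(Physical($1235, $0000));   { prints 74576, the same location }
end.
```

Note that two different segment and offset pairs can name the same BYTE of memory, which is one reason a return address or pointer must be kept intact as a pair.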
The DATA segment provides for permanent storage of data which will remain
unchanged unless it is explicitly changed by an instruction which manipulates
data. Variables in the DATA segment are static; i.e., the address at which
the area of memory assigned to any variable is located is calculated at the
time the program begins running, and that address does not change. Variables
in the STACK segment are volatile because they are referenced through the use
of an address calculation based on the stack pointer (SP) register, which is
guaranteed to change every time a subroutine is called or terminates.
There is no definitive answer as to whether a variable should be temporary or
static, and the issue is complex enough that 'profiler' programs are often used
to analyze a program's operation and determine how efficiently the stack and
data segment memory areas are being utilized. The general rule used most often
when no other obvious constraints need be met is that you request space for a
variable only for as long as it is needed; this usually means that within each
subroutine, you request stack space for variables that are used only within
that subroutine, and you only request space in the data segment for those
variables that must be available during the entire time the program is running.
In PASCAL, the VAR keyword precedes a variable declaration that reserves
space for a variable. Those VAR declarations which follow a Procedure or
Function declaration will automatically set aside stack memory for the variables
declared. All other declarations reserve space in the data segment.
A variable declaration has the form

     VAR
       <VarName> : <VarType> ;

where any number of spaces, tabs, or blank lines may be included. Some sample
declarations for variables of type BYTE, WORD, etc., follow; these illustrate
some alternate forms that can be used as well:

     VAR
       Byte1,Byte2 : BYTE;       { This form is very readable, and is often used }
       W,W2        : WORD;       { although it is often wasteful of space.       }
       L           : LONGINT;
       A           : ARRAY [1..4] OF LONGINT;

     VAR
       Byte1,Byte2,              { This form is my own, and has the advantage that }
       Byte3                     { each variable can have its own comment adjacent }
                 : BYTE;         { to it if needed, additional variables can be    }
       W,W2                      { easily added to the text, and the eye can scan  }
                 : WORD;         { quickly down the left margin column to locate   }
       L                         { or identify variables or their types.           }
                 : LONGINT;
       Array1,Array2
                 : ARRAY [1..4]
                   OF LONGINT;


The following short example illustrates the concepts of local and global
variable scope as well as static and temporary storage class. Local variables
declared within a procedure or function exist only on the stack, and may not
necessarily occupy the same position in the stack segment from one procedure
call to the next; it is thus necessary to initialize a local variable each time
the procedure is called, before the variable is to be used.

Program MainProg;       { This is the first statement of a program, if present. }
                        { It consists of the keyword Program followed by the    }
                        { program name; in this case, MainProg.                 }

{ The VAR keyword causes the compiler to interpret what follows as a }
{ variable declaration.                                              }
VAR
  GlobalByte : BYTE;    { GlobalByte is the variable name; BYTE is its TYPE.    }
                        { The scope of GlobalByte is global to the program,     }
                        { and it can be referenced anywhere below this point.   }
                        { This variable is static; i.e., it exists at a single  }
                        { known address throughout program execution, somewhere }
                        { in the main program's DATA segment.                   }

Procedure SecondProc;   { SecondProc is the procedure's name. Declare this      }
  FORWARD;              { procedure as FORWARD so that it can be referenced     }
                        { before it is actually defined.                        }

Procedure SomeProc;
VAR LocalByte : BYTE;   { LocalByte is located on the STACK.                    }
BEGIN                   { LocalByte exists only during SomeProc execution       }
  LocalByte := 1;       { and its scope is limited to SomeProc, so that it      }
  SecondProc;           { cannot be referenced outside of SomeProc.             }
END;

Procedure SecondProc;
VAR LocalByte : BYTE;   { This second LocalByte is also located on the STACK.   }
BEGIN                   { This LocalByte has not been assigned a value.         }
  GlobalByte := LocalByte;   { It is local to SecondProc. Both of these         }
END;                    { variables named LocalByte are temporary.              }

{ This is the main program body, which in this case contains only one }
{ call to a subroutine. Note the period following the END statement.  }
BEGIN
  SomeProc;             { GlobalByte exists during MainProg execution, including }
END.                    { during the time SomeProc and SecondProc are executing. }

In this example, GlobalByte is not assigned a value until SecondProc is called
by SomeProc. The value assigned to it, though, comes from the location on the
stack of the LocalByte variable local to SecondProc, and this location is not
assigned a value. Thus, GlobalByte will have an unpredictable value.
One more basic PASCAL data type, the STRING, needs to be introduced. String
variables are arrays of characters, and are most often used to store text. The
first BYTE of a string variable is its length byte, which gives the number of
valid characters following it in the memory area occupied by the variable.

     VAR
       S  : STRING;       { Allocate a default total of 255 bytes for characters }
       S8 : STRING[8];    { Allocate enough memory for 8 characters }

In the above declaration, S occupies 255 + 1 bytes of memory and S8 occupies
8 + 1 bytes of memory. A standard STRING type contains at most 255 characters
because its length byte can only have values ranging from 0 to 255.

BEGIN
S := 'This is a string.';
S8 := '12345678';
END.
In the above example, S is initialized to a string of 17 text characters which
ends with a period; beyond S[17], though, the content of the memory area which
is occupied by S is undetermined. Since S8 in the example is initialized to a
string of eight characters, its entire memory area contains known values. The
following diagram shows each byte of S in hexadecimal notation:

     +-- The address of S is the address of S[0]
     |  +-- S[1] = 'T'
     |  |  +-- S[2] = 'h'                               +-- S[17] = '.'
     |  |  |                                            |               +-- S[255]
        T  h  i  s     i  s     a     s  t  r  i  n  g  .
     11 54 68 69 73 20 69 73 20 61 20 73 74 72 69 6E 67 2E ?? ?? ?? ... ??
     |
     +-- 11h = 17d, the length byte of S

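
The length byte and the memory sizes given above can be checked with a short sketch. It assumes a compiler in which these are Turbo Pascal style strings with a length byte at index 0; S is declared as STRING[255] here so that the assumption holds even where a plain STRING means something else. The program name StrLen is chosen for illustration only.

```pascal
program StrLen;
{ Sketch: inspect the length byte and the storage size of a string. }
var
  S  : string[255];
  S8 : string[8];
begin
  S  := 'This is a string.';
  S8 := '12345678';
  WriteLn(Ord(S[0]));     { prints 17, the length byte of S }
  WriteLn(Length(S));     { prints 17, the same value }
  WriteLn(SizeOf(S));     { prints 256, which is 255 + 1 }
  WriteLn(SizeOf(S8));    { prints 9, which is 8 + 1 }
end.
```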
What we have done so far is tantamount to looking at the building materials
for a large building yet to be built and discussing how they are to be used
without having even glanced at the architect's drawings. Of course, you could
say you're not satisfied with architectural renderings, or you wouldn't be
reading this book.
We've already passed through the worst 'bottleneck' of low-level, 'its and
bits' programming, which is the very necessary and important subject of binary
and hexadecimal numbering systems. Too fast, indeed! But, never fear. You'll
have plenty of opportunity to rack your brains mastering this subject before
long, because we will take an in-depth look at the 8086 instruction set, and you
will be forced to practice converting from binary to hexadecimal and back in
order to understand what you are studying!
In the next few chapters, we will see ways of imposing structure on data that
make great sense, even in assembly language, but we won't use the 'bricks and
mortar' of programming to attempt to understand how the building is laid out.
For complex data structures, it is usually much simpler to use the PASCAL
language to define and manipulate them. There are many times, though, as we
will see, when the speed of program execution can be increased by a factor of
ten, a hundred, or more, by the judicious use of external assembly language
subroutines.
