Você está na página 1de 12

Introduction, bits, bytes, BCD, ASCII, characters, strings, inte...

http://physinfo.ulb.ac.be/cit_courseware/datas/data1.htm

Data Structures And Number Systems


Copyright Brian Brown, 1984-2000. All rights reserved.
Part 1

Reference Books:
Program Design : P Juliff
IBM Microcomputer Assembly Language : J Godfrey
Programmers Craft : R Weiland
Data Storage in a computer : CIT
Microcomputer Software Design : S Campbell
DATA STRUCTURES
Just as learning to design programs is important, so is the understanding of the correct format and
usage of data. All programs use some form of data. To design programs which work correctly, a
good understanding of how data is structured will be required.
This module introduces you to the various forms of data used by programs. We shall investigate
how the data is stored, accessed and its typical usage within programs.
A computer stores information in Binary format. Binary is a number system which uses BITS to
store data.
BITS
A bit is the smallest element of information used by a computer. A bit holds ONE of TWO possible
values,
Value Meaning
0

OFF

ON

A bit which is OFF is also considered to be FALSE or NOT SET; a bit which is ON is also
considered to be TRUE or SET.
Because a single bit can only store two values, bits are combined together into large units in order to
hold a greater range of values.
NIBBLE
A nibble is a group of FOUR bits. This gives a maximum number of 16 possible different values.
2 ** 4 = 16 (2 to the power of the number of bits)

1 of 12

24/08/2015 11:02

Introduction, bits, bytes, BCD, ASCII, characters, strings, inte...

http://physinfo.ulb.ac.be/cit_courseware/datas/data1.htm

It is useful, when dealing with groups of bits, to determine which bit of the group has the least
value, and which bit has the most or greatest value.
The Least Significant Bit AND The Most Sigificant Bit
This is deemed to be bit 0, and is always drawn at the extreme right. The Most significant bit is
always shown on the extreme left, and is the bit with the greatest value.
The diagram below shows a NIBBLE, and each bits position and decimal weight value (for more
information, consult the module on Number Systems).
3 2 1 0 Bit Number
+---+---+---+---+
| | | | |
+---+---+---+---+
8 4 2 1 Decimal Weighting Value
MSB
LSB
Lets consider an example of converting binary values into decimal.
Bit
3210
Value 1 0 1 1
Bit 3 is set, so it has a decimal weight value of
8
Bit 2 is not set, so it has a decimal weight value of 0
Bit 1 is set, so it has a decimal weight value of
2
Bit 0 is set, so it has a decimal weight value of
1
Adding up all the decimal weight values for each bit= 11
So 1011 in binary is 11 in decimal!
For more examples, consult the module on Number Systems.
BYTES
Bytes are a grouping of 8 bits. This comprises TWO nibbles, as shown below.
7 6 5 4 3 2 1 0 Bit Number
+---+---+---+---+---+---+---+---+
| | | | | | | | |
+---+---+---+---+---+---+---+---+
128 64 32 16 8 4 2 1 Decimal Weighting Value
MSB
LSB

2 of 12

24/08/2015 11:02

Introduction, bits, bytes, BCD, ASCII, characters, strings, inte...

http://physinfo.ulb.ac.be/cit_courseware/datas/data1.htm

Bytes are often used to store CHARACTERS. They can also be used to store numeric values,
0 to 255
+127 to -128

Binary Coded Decimal [BCD]


Binary code decimal digits (0-9) are represented using FOUR bits. The valid combinations of bits
and their respective values are
Binary Value Digit
0000

0001

0010

0011

0100

0101

0110

0111

1000

1001

The binary combinations 1010 to 1111 are invalid and are not used.
If the computer stores one BCD digit per byte, its called normal BCD. The unused nibble may be
either all 0's or all 1's.
If two BCD digits are stored per byte, its called Packed BCD. This occurs in data transmission
where numbers are being transmitted over a communications link. Packed BCD reduces the amount
of time spent transmitting the numbers, as each data byte transmitted results in the sending of two
BCD digits.
Consider the storing of the digits 56 in Packed BCD format.
7 6 5 4 3 2 1 0 Bit Number
+---+---+---+---+---+---+---+---+
|0|1|0|1|0|1|1|0|
+---+---+---+---+---+---+---+---+
MSB
LSB
The UPPER nibble holds the value 5, whilst the LOWER nibble holds the value 6.

3 of 12

24/08/2015 11:02

Introduction, bits, bytes, BCD, ASCII, characters, strings, inte...

http://physinfo.ulb.ac.be/cit_courseware/datas/data1.htm

Status And Boolean Variables


BOOLEAN variables use a single bit to hold their value, so can only assume one of two possible
states. This is either 0 (considered to be FALSE), or 1 (considered to be TRUE).
The computer handles each boolean variable as a single bit. If the bit is TRUE, then is has a value
of 1. If the bit is FALSE, then it has the value 0.
When a group of bits are grouped together to form a limited range of values, this is known as a
STATUS variable.
Consider the case in a program where we need to keep track of the number of minutes a phone line
is busy for (within the limited range of 0 to 60). This does not require the use of a full integer, so
some programming languages allow you to specify the number of bits used to allocate to variables
with limited ranges.
The advantage of this approach, is that the storage space of status variables can be combined
together into a single 16 or 32 bits, resulting in a saving of space.
Consider where a computer allocates 16 bits of storage per status variable. If we had three status
variables, the space consumed would be 48 bits. BUT, if all the status variables could be combined
and fitted into a single 16 bits of storage, we could save 32 bits of memory. This is very important in
real-time systems where memory space is at a premium.
Consider the following diagram, which illustrates the packing of boolean and status variables
together into a single byte.
7 6 5 4 3 2 1 0 Bit Number
+---+---+---+---+---+---+---+---+
|1|0|0|1|1|0|1|0|
+---+---+---+---+---+---+---+---+
| |
| |
| |
| +--- local call/TOLL call (bit 0)
| |
+------- extension busy/free (bit 1)
| +------------------- minutes (bits 2-4)
+----------------------- extension diverted/TRUE/FALSE

The American Standard Code for Information Interchange


ASCII is a computer code which uses 128 different encoding combinations of a group of seven bits
(27 = 128) to represent,
characters A to Z, both upper and lower case
special characters, < . ? : etc
numbers 0 to 9
special control codes used for device control
Lets now look at the encoding method. The table below shows the bit combinations required for
each character.

4 of 12

24/08/2015 11:02

Introduction, bits, bytes, BCD, ASCII, characters, strings, inte...

00

01

02

03

04

05

http://physinfo.ulb.ac.be/cit_courseware/datas/data1.htm

06

07

08

09

0A

00 NUL SOH STX ETX EOT ENQ ACK BEL BS TAB LF

0B

0C 0D 0E 0F

VT FF CR SO SI

10 DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM SUB ESC FS GS RS US
20

"

&

'

<

>

40 @

L M N

50

60

70

~ DEL

30

A computer usually stores information in eight bits. The eighth bit is unused in ASCII, thus is
normally set to 0. Some systems may use the eight bit to implement graphics or different language
symbols, ie, Greek characters.
Control codes are used in communications and printers. They may be generated from an ASCII
keyboard by holding down the CTRL (control) key and pressing another key (A to Z, plus {, \, ], ^,
<- ).
Example Code the text string 'Hello.' in ASCII using hexadecimal digits.
H = 48
e = 65
l = 6C
l = 6C
o = 6F
. = 2E
thus the string is represented by the byte sequence
48 65 6C 6C 6F 2E

CHARACTERS
Characters are non-numeric symbols used to convey language and meaning. In English, they are
combined with other characters to form words. Examples of characters are;
abcdefghijklmnopqrstuvwxyz
ABCDEFGHIJKLMNOPQRSTUVWXYZ
0123456789!@#$%^&*()_-=+|\

5 of 12

24/08/2015 11:02

Introduction, bits, bytes, BCD, ASCII, characters, strings, inte...

http://physinfo.ulb.ac.be/cit_courseware/datas/data1.htm

`,./;'[]{}:"<> ?

A computer system normally stores characters using the ASCII code. Each character is stored using
eight bits of information, giving a total number of 256 different characters (2**8 = 256).
In the high level language Pascal, characters are defined and used as follows,
var
plus_symbol : char;
begin
plus_symbol := '+';

Variables used in a Pascal program are declared after the keyword var. The above example declares
the variable plus_symbol to be a character type, thus eight bits of memory storage are allocated to
store its value (as yet undetermined).
Inside the main body of the program, after the keyword begin, the statement shown assigns the
symbol + to the character variable plus_symbol. This is equivalent to storing the ASCII value 2B
hexadecimal in the eight bits of memory allocated to the variable plus_symbol.
TEXT STRINGS
Text strings are a sequence of characters (ie, words or multi- character symbols). Each character is
stored one after the other, each occupying eight bits of memory storage.
The text string Hello would be stored as follows
+------+
| 48 |
+------+
| 65 |
+------+
| 6C |
+------+
| 6C |
+------+
| 6F |
+------+
In Turbo Pascal, text strings are defined and used as follows,
var

6 of 12

24/08/2015 11:02

Introduction, bits, bytes, BCD, ASCII, characters, strings, inte...

http://physinfo.ulb.ac.be/cit_courseware/datas/data1.htm

text_message : string[22];
begin
text_message := 'Welcome to text strings';
The above example declares the variable text_message to be a string type of up to 22 characters
long (but no more!). Eight bits of memory storage are allocated to store each character in the string
(a total of 22 bytes), with the value in each byte as yet undetermined.
Inside the main body of the program, after the keyword begin, the statement shown assigns the
message Welcome to text strings to the string variable text_message. This stores the ASCII value of
each character into each successive byte of memory allocated to the variable text_message.
INTEGERS
Numeric information cannot efficiently be stored using the ASCII format. Imagine storing the
number 123,769 using ASCII. This would consume 6 bytes, and it would be difficult to tell if the
number was positive or negative (though we could precede it with the character + or -).
A more efficient way of storing numeric information is to use a different encoding scheme. The
encoding scheme in most use is shown below,
Bits
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
+---+--------------------------------------------+
| S | Numeric Value in Binary
|
+---+--------------------------------------------+
S = Sign Bit
Integers store whole numbers only! They do not contain fractional parts. Consider the examples
below,
Valid
Invalid
123
.987
0
0.0
278903
123.09
The sign bit (which is bit 15) indicates whether the number is positive or negative. A logic 1
indicates negative, a logic 0 indicates positive.
The number is converted to binary and stored in bits 0 to 14 of the two bytes.
Example
Store the value +263 as an integer value.

7 of 12

24/08/2015 11:02

Introduction, bits, bytes, BCD, ASCII, characters, strings, inte...

http://physinfo.ulb.ac.be/cit_courseware/datas/data1.htm

1) The sign bit is 0.


2) The decimal value +263 is 100000111 in binary.
Thus the storage in memory of the integer +263 looks like,
Bits
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
+---+---------------------------------------------+
|0|0 0 0 0 0 0 1 0 0 0 0 0 1 1 1|
+---+---------------------------------------------+

When storing negative numbers, the number is stored using the two's complement format.
In Pascal, integers are defined and used as follows,
var
whole_number : integer;
begin
whole_number := 1267;
The example declares the variable whole_number to be an integer type, thus sixteen bits of memory
storage are allocated to store its value (as yet undetermined).
Inside the main body of the program, after the keyword begin, the statement shown assigns the
numeric value 1267 to the integer variable whole_number. This is equivalent to storing the bit
combination 0000010011110011 in the sixteen bits of memory allocated to the variable
whole_number.
Signed integers using 16 bits have a number range of,
-32768 to +32767 (+-2^15)
To store larger integer values would require more bits. Some systems and languages also support the
use of unsigned integers, which are deemed to be positive only.
FLOATING POINT NUMBERS
There are two problems with integers; they cannot express fractions, and the range of the number is
limited to the number of bits used. An efficient way of storing fractions is called the floating point
method, which involves splitting the fraction into two parts, an exponent and a mantissa.
The exponent represents a value raised to the power of 2.
The mantissa represents a fractional value between 0 and 1.

8 of 12

24/08/2015 11:02

Introduction, bits, bytes, BCD, ASCII, characters, strings, inte...

http://physinfo.ulb.ac.be/cit_courseware/datas/data1.htm

Consider the number


12.50
The number is first converted into the format
2n * 0.xxxxxx
where n represents the exponent and 0.xxxxx is the mantissa.
The computer industry agreed upon a standard for the storage of floating point numbers. It is called
the IEEE 754 standard, and uses 32 bits of memory (for single precision), or 64 bits (for double
precision accuracy). The single precision format looks like,
Bits
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
+---+-----------------------+---------------------------------------------------------------------+
| S | Exponent value
|
Mantissa value
|
+---+-----------------------+---------------------------------------------------------------------+
S = Sign Bit
The sign bit is 1 for a negative mantissa, and 0 for a positive mantissa.
The exponent uses a bias of 127.
The mantissa is stored as a binary value using an encoding technique.
Working out the FP bit patterns
The number we have is
12.5
which expressed as fraction to the power of 2 is,
12.5 / 2 = 6.25
6.25 / 2 = 3.125
3.125 / 2 = 1.5625
1.5625 / 2 = 0.78125
NOTE: Keep dividing by 2 till a fraction between 0 and 1 results. The fraction is the mantissa value,
the number of divisions is the exponent value.
thus our values now are,
0.78125 * (2*2*2*2) (comment: this is 2 to the power of 4)

9 of 12

24/08/2015 11:02

Introduction, bits, bytes, BCD, ASCII, characters, strings, inte...

http://physinfo.ulb.ac.be/cit_courseware/datas/data1.htm

The exponent bit pattern is stored using an excess of 127. This means that this value is added to the
exponent when storing (and subtracted when removing).
The exponent bit pattern to store is,
4 + 127 = 131 = '10000011'
As the mantissa is a positive value, the sign bit is 0.
The mantissa is a little more complicated to work out. Each bit represents 2 to the power of a
negative number. It looks like,
1st bit of mantissa = 0.5
2nd
= 0.25
3rd
= 0.125
4th
= 0.0625
5th
= 0.03125
etc
The mantissa number value we have is 0.78125, which in binary is
11001000000000000000000 (0.5 + 0.25 + 0.03125)
How-ever, to make matters even more complicated, the mantissa is normalized, by moving the bit
patterns to the left (each shift subtracts one from the exponent value) till the first 1 drops off.
The resulting pattern is then stored.
The mantissa now becomes
10010000000000000000000
and the exponent is adjusted to become
131 - 1 = 130 = '10000010'
The final assembled format is,
Bits
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

10 of 12

24/08/2015 11:02

Introduction, bits, bytes, BCD, ASCII, characters, strings, inte...

http://physinfo.ulb.ac.be/cit_courseware/datas/data1.htm

+--+-----------------------+---------------------------------------------------------------------+
| 0| 1 0 0 0 0 0 1 0| 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 |
+--+-----------------------+---------------------------------------------------------------------+
S Exponent
Mantissa

Now lets convert the following storage format back into a decimal number.
Bits
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
+--+-----------------------+---------------------------------------------------------------------+
| 1| 1 0 0 0 0 0 1 1| 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 |
+--+-----------------------+---------------------------------------------------------------------+
S Exponent
Mantissa
This gives a negative number with an exponent value of,
131 - 127 = 4
and a mantissa value of,
1.0 + 0.5 + 0.0625 = 1.5625
(the 1.0 comes from the bit which was shifted off when the mantissa was normalized ), thus the
number is,
-1.5625 * 2^4 = -25.0
The numeric range for floating point numbers using the IEEE-754 method, using 32 bits, is
+- 0.838860808 * 2-128 to +- 0.838860808 * 2127
or, in decimal,
2.4652 * 10-39 to 1.4272 * 1038
In Pascal, floating point numbers are defined and used as follows,
var
fp_number : real;
begin

11 of 12

24/08/2015 11:02

Introduction, bits, bytes, BCD, ASCII, characters, strings, inte...

http://physinfo.ulb.ac.be/cit_courseware/datas/data1.htm

fp_number := 12.50;
The example declares the variable fp_number to be a floating- point type, thus thirty-two bits of
memory storage are allocated to store its value (as yet undetermined).
Inside the main body of the program, after the keyword begin, the statement shown assigns the
numeric value 12.50 to the real variable fp_number. This is equivalent to storing the bit
combinations 01000001010010000000000000000000 in the thirty-two bits of memory allocated to
the variable fp_number.
The advantages of storing floating point numbers in this way are,
multiplication is performed by adding exponents and mantissa's
division is performed by subtracting exponents and mantissa's
it is easy to compare two numbers to see which is the greater or lesser
large number ranges are stored using relatively few bits
The disadvantages of the storage format are,
errors are created by moving the mantissa bits
conversion backwards and forwards takes time

Home | Other Courses | Feedback | Notes | Tests

Copyright Brian Brown, 1984-2000. All rights reserved.

12 of 12

24/08/2015 11:02

Você também pode gostar