
Lesson No. 1
Information Technology Fundamentals
Contents

1. Introduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2. Random access memory (RAM). . . . . . . . . . . . . . . . . . . . . 6
3. Processors. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
4. The diagram of system board elements interaction. . . . . 62
4.1. USB (Universal Serial Bus). . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
4.2. IEEE 1394 (FireWire, i-Link). . . . . . . . . . . . . . . . . . . . . . . . . . . 71
4.3. ATA (Advanced Technology Attachment) . . . . . . . . . . . . . . . . 74
4.4. SATA (Serial ATA). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
4.5. PCI (Peripheral component interconnect). . . . . . . . . . . . . . . . 82
4.6. PCI Express. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
4.7. System resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
4.8. Audio Codec Bus — Audio Codec (AC) Link. . . . . . . . . . . . . 90
5. Form factor of motherboards. . . . . . . . . . . . . . . . . . . . . . . 93
5.1. AT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
5.2. LPX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
5.3. ATX. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
5.4. BTX. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
5.5. NLX. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
5.6. WTX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
5.7. FlexATX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
STEP Computer Academy

1. Introduction
Today we are beginning to study the course «Information
Technology Fundamentals», which will allow us to lay the
foundation for the further training in the field of IT.
The current course is studied within the framework of
cooperation between STEP Computer Academy and the Cisco
educational project (Cisco Networking Academy). The company
Cisco is the absolute leader in the market of network equipment
and network solutions. The company Cisco actively supports
educational establishments by offering them participation in the Cisco Academy project, which is aimed at the effective training of students who are getting an education in the field of network technology and IT. The training programs at Cisco
Academy are designed to provide students with the skills
required in the sphere of design, construction and maintenance
of networks. Program materials are regularly updated, which
allows the students to obtain relevant knowledge to meet the
requirements of today.
Under the project, Cisco Academy offers a number of courses, mainly related to network technology. You will study these courses if you choose the specialization «Network technology and system administration». However, Cisco Academy also offers a course that is not directly related to network technology and is aimed at forming students' general knowledge of IT: this course is called IT Essentials. We are now beginning to explore this course.

In order to start your training at Cisco Academy, your instructor must create an account for you, through which you will have access to the course materials, labs and various examinations; the instructor performs these steps at the first lesson.
Every two weeks, you will receive a lesson as a PDF document,
which describes what topics of the course at the Cisco Academy
website you must study (i.e. read the materials, fulfill tasks
and take examinations). For some topics, additional material will be provided besides the relevant chapters of the Cisco ITE course.
Kindly note: the additional material you receive besides the fundamental ITE course is not intended to serve as a guide to the latest processors (CPUs), memory modules, chipsets, hard drives and video adapters.
Today's assignment consists of two parts: the first part focuses on the study of Chapter 1 of the authorized Cisco ITE course; this chapter is called «Introduction to PC». Go to the second part only after you have finished studying these materials. Use your account on the Cisco Academy portal to access the training materials along with the course content and labs.
The second part is to work through the materials below: they will help you comprehend the issues of modern computer architecture more profoundly and better understand the purpose and features of the key components that make up a computer. These materials cover such components as memory, the central processor and system logic chipsets.


2. Random access memory (RAM)


General principles of data handling.
In the process of exploring the Cisco Academy materials
on the course ITE, you surveyed the basic components that
make up a modern computer. Let us discuss in more detail how
to organize the transfer of data between these components.
The figure below shows a conventional block diagram. Let us look at it in greater detail.

Block diagram illustrating the connection of nodes to a chipset.
In this diagram, the northbridge is the main component of the chipset; it connects the fastest PC components: the processor, RAM, and the dedicated PCI Express bus.
As you may already know, the processor does not execute programs written in a programming language directly; it requires a certain «machine code». That is, it interprets sequences of
bytes in the memory of the computer as commands. Sometimes
a command may be equal to one byte; sometimes it takes a few
bytes. The main memory (RAM) contains data as well. They
may be in separate areas or may be «mixed» with the code.
The difference between code and data is as follows: data is something on which the processor performs operations; code is the set of commands that tells the processor which operation to perform. To simplify, we can imagine
a program and its data as a sequence of bytes of some finite
length that is located continuously (we will not complicate
it too much) in the general memory array. For example, we
have an array of memory of the length of 1,000,000 bytes, and
our program (with the data) is a set of bytes numbered from
1,000 to 20,000. Other bytes represent other programs or
their data or just some memory, not occupied with anything
useful. Thus, «machine code» is a set of processor commands stored in the memory. Data are arranged in the memory as well. In order to perform an operation on the data, the processor must read them from the memory and, after performing the operation, possibly write them back into the memory in updated (changed) form. Thus, it turns out that both commands and data come to the processor from the memory. In fact, everything
is a little more complicated. In systems used until recently, the processor could not access the memory directly at all, because it did not contain the corresponding components. So, it
accesses an «intermediate» specialized device called a memory
controller; in turn, the memory controller accesses the RAM
chips located on the memory modules. Therefore, the role of
the RAM controller is simple: it serves as a «bridge» between
the memory and the devices using it (by the way, they include
not only the CPU, but we will talk about it a little later). Earlier,
the memory controller was a part of the chipset (a northbridge
from the system logic chipset) — a set that was the basis of
the motherboard. The rate of data transmission between the
processor and the memory largely depends on the controller
performance; it is one of the most important components
that affect the overall performance of the computer. In turn,
any processor was necessarily equipped with a processor bus,
which is called FSB (the Front Side Bus). The bus served as a
communication channel between the processor and all other
devices in the computer: memory, video display board, hard
drive, and so on. Accordingly, the processor communicated with the memory controller via the FSB, and the latter accessed the RAM modules on the motherboard via a special bus (for simplicity, let us call it the «memory bus»).
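The picture of code and data sharing one memory array can be sketched with a toy interpreter. Everything here — the opcodes, the addresses and the two-byte command format — is invented purely for illustration; it is not real x86 machine code:

```python
# Toy machine: memory is one flat byte array; the processor reads bytes
# at the program counter and interprets them as commands.
memory = bytearray(100)

# "Program": opcode 1 = load value from an address into the accumulator,
# opcode 2 = add value from an address, opcode 0 = halt.
# Code occupies bytes 10..15; a data byte lives at address 50,
# mixed into the same array.
memory[50] = 7                               # data
memory[10:16] = bytes([1, 50, 2, 50, 0, 0])  # code: acc = mem[50]; acc += mem[50]; halt

def run(mem, pc):
    acc = 0
    while True:
        opcode = mem[pc]        # fetch the command byte
        if opcode == 0:         # halt
            return acc
        addr = mem[pc + 1]      # operand: an address in memory
        if opcode == 1:         # load
            acc = mem[addr]
        elif opcode == 2:       # add
            acc += mem[addr]
        pc += 2                 # every toy command here is two bytes long

print(run(memory, 10))  # 7 + 7 = 14
```

The point of the sketch: the interpreter cannot tell "code" bytes from "data" bytes; only the program counter decides which bytes get treated as commands.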
The above described conventional architecture, with the
memory controller located in the northbridge, was used for a
long time, but today Intel and AMD have abandoned it in favor of moving the memory controller directly into the processor. The advantage of integrating the memory controller is obvious: «the way from the core to the memory» becomes distinctly «shorter», which allows RAM to operate faster. However, such an
approach has some disadvantages. For example, whereas earlier devices such as hard disks or video cards could operate with the memory through a dedicated, independent controller, now they are forced to operate with RAM via the controller located in the CPU, as the CPU is the only device in this architecture that provides direct access to the memory. AMD was the first
who transferred memory controller into the processor; Intel
made a similar decision a few years later.

Diagram of the interaction «the CPU — RAM».


From the above, we can make the obvious conclusion: we
should start the analysis of the effectiveness of the system
organization from the «core» — the sequence «processor —
memory — chipset». In principle, it is possible to begin the
study with any of these components, but the most convenient
way is to start with the memory.


Purpose of RAM.
Using the materials of Cisco, you have studied what the
memory is, why it is needed, and how the performance of
the computer depends on its capacity. Let us consider general
information about RAM.
Random access memory is the workspace of the computer's CPU; it stores programs and data while the computer is powered on. RAM is often seen as temporary storage because it holds data and programs only while the computer is on and until the reset button is pressed.
Before shutting down or pressing the reset button, all data changed during the session must be saved on a storage device that can store information permanently (usually a hard disk). When the power is on again, the stored information can be loaded back into the memory.
Sometimes people confuse RAM with disk memory, since the capacity of both types of memory devices is expressed in the same units — mega- or gigabytes. Let us try to explain the connection between random access memory and disk memory by using the following simple analogy.
Imagine a small office, where a staff member handles the
information stored in the card file. In our example, a card file
cabinet will serve as the system hard drive, where programs
and data can be stored during a long period of time. The work
table will represent the main memory of the system, with which the employee currently works; his actions are similar to the operation of the processor. He has direct access to any documents on the table. However, before a specific document can be on the table, it must be found in the cabinet. The more
cabinets are in the office, the more documents can be stored
in them, and if the work table is large enough, it is possible to work with multiple documents simultaneously.
Adding a hard drive to the system resembles installation of
another filing cabinet in the office — the computer can store
more information permanently.
Increasing the amount of RAM in the system resembles
installation of a large work table — a computer can work with
more programs and data at the same time.
However, there is one difference between the storage of
documents in the office and storage of files on the computer:
when a file is loaded into RAM, its copy is still stored on the hard disk. Note: since permanent file storage is not possible in the main memory, all files changed after loading into memory must be re-saved to the hard disk before the computer shuts down. If a changed file is not saved, the original copy of the file on the hard disk will remain unchanged.
Now let us turn to the hardware itself. In a modern computer, RAM is installed into the corresponding connectors on the motherboard in the form of special modules. Certainly, we will learn what these modules are called and how they differ.
A memory module is essentially a standardized small board, in shape and size, with soldered memory chips and a connector into which it is installed. Above all, the module consists of memory chips, and it is the parameters of these chips (their architecture and performance) that determine the effectiveness of a particular module. So let us agree that, on the one hand, we will explore the types of memory, i.e. the logic of organization, the speed and the efficiency of the various types of memory chips. On the other hand, there are the modules themselves that are installed in the computer, and we will examine the appearance and parameters of the modules built from memory chips of a specific type (architecture).

Classifications of memory.
ROM (Read Only Memory). The name of this memory indicates that it is used only for reading data. ROM is also often called non-volatile memory because any data written to it are preserved when the computer is powered off. Therefore, the commands that start the computer, i.e. the software that boots the system, are stored in ROM.
RAM (Random Access Memory). The peculiarity of this
type of memory is as follows: access to the data stored in each
memory cell can be obtained at any time, if necessary.
DRAM (Dynamic Random Access Memory). The memory cells in a DRAM chip are tiny capacitors, which hold charge. That is how the bits are coded (by the presence or absence of charge).
SRAM (Static RAM). It is named in such a manner because
it does not require periodic regeneration to store its contents,
unlike the dynamic RAM (DRAM). However, this is not its
only advantage. SRAM provides a higher speed of operation
than DRAM and can operate at the same frequency that
modern processors do.
Now let us try to classify the types of electronic memory. The following classifications can be made based on structure and principles of operation:
■■ Dynamic or static.
■■ Asynchronous or synchronous.
■■ Volatile or non-volatile.
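These three classifications are independent axes, and each memory type in this lesson sits somewhere on each of them. Here is a small summary in code form; the tags follow the text of this lesson, and cache SRAM is tagged synchronous here because it runs at the processor clock (ROM and EEPROM fall outside the dynamic/static cell axis, so they get their own labels):

```python
# Memory types from this lesson, tagged along the three axes:
# cell type, bus timing, and volatility.
memory_types = {
    "DRAM":   {"cell": "dynamic", "timing": "asynchronous", "volatile": True},
    "SDRAM":  {"cell": "dynamic", "timing": "synchronous",  "volatile": True},
    "SRAM":   {"cell": "static",  "timing": "synchronous",  "volatile": True},
    "ROM":    {"cell": "fixed",   "timing": "asynchronous", "volatile": False},
    "EEPROM": {"cell": "flash",   "timing": "asynchronous", "volatile": False},
}

# For example, list every volatile type (they all lose data at power-off):
volatile = [name for name, tags in memory_types.items() if tags["volatile"]]
print(volatile)  # ['DRAM', 'SDRAM', 'SRAM']
```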


Let us start with the first classification:


DRAM, the Dynamic Random Access Memory.
Information is stored in a cell consisting of a capacitor; the
access to the capacitor is controlled by the transistor. The presence
or absence of charge of this capacitor actually represents stored
information (0 or 1). The use of the capacitor makes this type of
memory relatively cheap and easy to manufacture, but at the same time it increases access time: first, the capacitor cannot charge or discharge instantly, and for the same reason high frequencies are unavailable to dynamic memory. Second, reading discharges the capacitor, and the capacitor also cannot hold its charge for long, i.e. it discharges gradually. Therefore, the cells of dynamic memory must be constantly refreshed (regenerated), which also wastes precious processor waiting time.

SRAM, the Static Random Access Memory.
In contrast to the dynamic memory, the cell that stores
information is an electronic switch — a trigger that keeps its
value as long as it has power. This element can change its value quickly, which allows it to be used at very high frequencies, and it does not require regeneration, which gives excellent access times. Unfortunately, this type of memory is difficult to manufacture and considerably more expensive. What conclusion can be drawn from considering these two types of memory? DRAM is cheap but slow; SRAM is expensive but fast. Therefore, DRAM and SRAM have different scopes of application: DRAM is used as main storage, while SRAM is used as so-called cache memory, the intermediate high-speed memory between the processor and RAM. We will consider DRAM as main memory today and will talk about the role of SRAM a little later, when we discuss processors.

The second classification:

Asynchronous memory.
Such memory delivers data at a lower frequency than the frequency of the bus on which it operates. A typical example is conventional DRAM, where each read action took about 5 cycles of bus operation. As you can imagine, such memory has high latency; however, at the time it was popular, the processors it served were not particularly fast either. Modified
forms of asynchronous DRAM appeared later; they allowed
to achieve lower latency, but this fact relates to the history
of memory development, and you will learn it further in the
course.
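To make those ~5 wait cycles concrete, here is a rough back-of-the-envelope model; the 66 MHz bus figure is an assumption chosen for illustration, and burst transfers and other refinements are ignored:

```python
# Rough model: single reads completed per second on a memory bus.
bus_hz = 66_000_000  # assumed 66 MHz memory bus, illustration only

def reads_per_second(cycles_per_read):
    """How many reads fit into one second at the given cost per read."""
    return bus_hz / cycles_per_read

async_reads = reads_per_second(5)  # asynchronous DRAM: ~5 bus cycles per read
sync_reads = reads_per_second(1)   # synchronous SDRAM: one read per bus cycle

print(f"{sync_reads / async_reads:.0f}x")  # 5x
```

The same bus, simply used synchronously, completes five times as many reads in this simplified model.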

Synchronous memory.
This type of memory exchanges data with the controller at the same frequency at which the memory bus operates. A
typical example is SDRAM, or Synchronous DRAM. The access time of such memory is determined by the frequency at which it operates. SDRAM efficiency is much higher than that of its predecessors, primarily because the read scheme of SDRAM is much more effective than that of older memory types. This allows a higher speed for the pair «CPU — RAM».

Finally, the third classification:

Volatile.
This type of memory requires constant power supply to
store information, i.e. it only works when the computer is
powered on. All memory types indicated above are volatile.


Non-volatile.
Unlike the previous type of memory, non-volatile memory
can store information even when there is no power supply.
There are several types of non-volatile memory. A conventional
representative is ROM, Read Only Memory. As its name suggests, this memory is designed for read operations only; the information in such chips was written at the production stage. Previously it was used to store the BIOS, but because it could not be rewritten it was later replaced by programmable types of permanent memory. The latest of these types is EEPROM, electrically erasable programmable memory, which we know as Flash memory. Flash memory can store
information without updating for about 10 years, and is now
used not only as a chip for storing BIOS of the motherboard,
but also as a portable information storage device such as USB
Flash drive.
Today, volatile synchronous dynamic memory, called SDRAM, or Synchronous Dynamic RAM, is used as PC main memory. There have been several generations of such memory, namely: SDRAM, DDR (Double Data Rate) SDRAM, DDR2 SDRAM, DDR3 SDRAM and DDR4 SDRAM. DDR2 (rarely), DDR3 and DDR4 SDRAM are commonly used these days.

Memory modules
Now we will consider how RAM is installed on computers.
At first, PC memory came in the form of chips mounted directly on the motherboard; that method satisfied both users and motherboard manufacturers up to a certain point, but later a more flexible way of installing memory on the motherboard was needed. That is, it was necessary to provide system integrators or users with the option of choosing storage capacity,
especially considering the price for the main memory at that
time (1 MB cost about 40 US dollars).
Imagine a motherboard (see Figure) on which 2 MB of memory is installed using 72 (!) chips (highlighted in the Figure), while the description of the motherboard states that the memory can be extended up to 16 MB by replacing the installed chips with more capacious ones. Eventually, computer equipment manufacturers decided to install memory using small cards with soldered chips, inserted into special connectors on the motherboard. These cards are called memory modules.

So let us look at the types of modules that are relevant today for use in PCs. All modern memory modules have a form factor called DIMM, Dual Inline Memory Module, i.e. a module with a two-sided contact configuration. All DIMM modules used today have a 64-bit data bus, i.e. they can transfer 64 bits (8 bytes) of data concurrently.
Each type of memory described above uses one or another
type of module; they usually differ from each other in size,
the number of contact pads and keys, i.e. one or more notches between the pads, which prevent improper installation of the modules into the connectors of the motherboard.
Since only DDR2, DDR3 and DDR4 SDRAM are applied today, we consider the types of modules that are used with these types of memory.
The memory type DDR2 SDRAM uses the DIMM modules
with 240 contact pads (120 pads on each side) and a key
between these pads.

You can see an example of such a memory module in the Figure. Such modules are customarily labeled PC2-xxxx, where xxxx is the peak data rate expressed in MB/s: for example, the module PC2-5300 has a peak bandwidth of 5.3 GB/s. Since the bandwidth of such a module is 5300 MB/s, while the module transfers 8 bytes of data (64 bits) simultaneously, we can conclude that each of the 64 lines transmits 5300/8 ≈ 667 Mbit/s. That is why the memory chips installed on such a module are conventionally called DDR2-667. The

market offered chips with rates from DDR2-400 (modules PC2-3200, 3.2 GB/s) up to DDR2-1066 (modules PC2-8600, 8.6 GB/s).
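The arithmetic connecting PC2-xxxx module labels to DDR2-xxx chip labels can be written out as a pair of helper functions (hypothetical helpers for illustration; the factor of 8 comes from the 64-bit, i.e. 8-byte, module data bus):

```python
# A 64-bit DIMM transfers 8 bytes at once, so peak module bandwidth
# in MB/s is the per-line chip rate times 8, and vice versa.
BYTES_PER_TRANSFER = 8  # 64-bit data bus = 8 bytes per transfer

def chip_rate(module_bandwidth_mb_s):
    """Approximate DDR2-xxx figure from a PC2-xxxx module label."""
    return module_bandwidth_mb_s / BYTES_PER_TRANSFER

def module_bandwidth(chip_rate_mt_s):
    """PC2-xxxx figure from a DDR2-xxx chip label."""
    return chip_rate_mt_s * BYTES_PER_TRANSFER

print(chip_rate(5300))        # 662.5 — both labels are rounded, marketed as DDR2-667
print(module_bandwidth(400))  # 3200 — DDR2-400 corresponds to PC2-3200
```

The same relation holds for DDR3 (PC3-6400 vs DDR3-800) and DDR4 (PC4-xxxxx vs DDR4-xxxx) labels.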
The memory type DDR3 SDRAM uses the DIMM modules
with 240 contact pads (120 pads on each side) and a key
between the pads.

However, these modules are compatible with DDR2 modules neither electrically nor mechanically; to prevent modules of one type from being installed in place of the other, their keys are arranged differently. Such memory modules are marked similarly to the previous ones: for example, PC3-6400 is a DDR3 SDRAM module with a peak bandwidth of 6.4 GB/s. Similarly, such memory chips are marked DDR3-800.

The memory type DDR4 SDRAM uses DIMM modules with 288 contact pads (144 pads on each side) and a key between the pads; the minimum bandwidth of these modules is about 17 GB/s, while the maximum bandwidth is approximately 34 GB/s.

As before, the modules are labeled PC4-xxxxx and the chips are labeled DDR4-xxxx.


3. Processors
First, let us define what a processor is and why we need it.
The main task of the processor is the execution of commands and processing of data; the processor receives both commands and data from the main memory, and it sends the results of its operation back to the main memory as well.

The processor is the main computing unit of the computer, which determines its performance to the greatest extent.
Now we turn to the key features and structural elements of
the processor, which determine its performance in the first
place.

Processor command sets


The processor is a device that executes commands, so it is no surprise that the first and main characteristic of a processor is the set of commands it is able to work with. For example, the processor of your mobile phone is most likely (there are exceptions) unable to run applications written for your PC at all. Today there are a variety of processor architectures, but in general all processors can be divided into two broad categories in accordance with the fundamental approach to forming the set of commands to be executed by the processor.
RISC — Reduced Instruction Set Computer — is a processor
with a reduced command set. Typically, such processors have a set of homogeneous general-purpose registers (internal processor memory cells that store the currently processed data), often in large numbers. The command set has a simple structure: command codes are laid out clearly, usually with a fixed length. In hardware, such an internal architecture allows low-cost decoding and execution of these instructions in a minimal number of clock cycles (ideally one).
CISC — Complex Instruction Set Computer — is a processor with a full set of instructions; the x86 family belongs to this category. The composition and purpose of their registers are significantly heterogeneous; the complex instruction set complicates instruction decoding, which consumes more hardware resources. The number of clock cycles required to execute instructions increases.
Let us illustrate CISC instructions with the example of installing a light bulb:
1. Take the bulb.
2. Insert it into the lamp socket.
3. Turn it all the way.
4. The end.

And a similar example in the form of RISC instructions:
1. Hold out your hand to the light bulb.
2. Take the bulb.
3. Raise your hand to the lamp socket.
4. Insert the bulb in the socket.
5. Turn it.
6. Does the bulb rotate in the socket? If yes, go to step 5.
7. The end.

Many RISC instructions are quite simple, so executing any given operation requires more of them. Their main advantage is that each instruction takes the processor fewer clock cycles, which accordingly reduces the execution time of individual commands and of the whole task (program). Modern PCs use CISC processors; your mobile phone or tablet, on the other hand, most likely has a RISC processor.
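The trade-off can be put into numbers. The figures below are invented purely for illustration: the same task takes fewer, heavier commands on a CISC-style machine and more, lighter ones on a RISC-style machine.

```python
# Invented figures for one and the same task, CISC style vs RISC style.
def total_cycles(instruction_count, cycles_per_instruction):
    """Total clock cycles = how many instructions x how long each takes."""
    return instruction_count * cycles_per_instruction

cisc = total_cycles(4, 6)  # 4 complex instructions, ~6 cycles each
risc = total_cycles(7, 1)  # 7 simple instructions, 1 cycle each

print(cisc, risc)  # 24 7 — more instructions, yet fewer total cycles
```

With these (assumed) numbers the RISC version needs almost twice the instructions but finishes in less than a third of the cycles.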
Having considered various approaches to the formation
of command sets, we can characterize the key instruction
set, which is used in modern PCs. This set of commands is
called x86, and it focuses on integer computation. This command set can also operate on floating-point numbers, but it does so inefficiently; for that reason another command set (x87), designed at an early stage of PC development, focuses on floating-point calculations. That is, all PC processors have supported (at least) these two command sets for a long time. In fact, it is support for the x86/x87 command sets that qualifies a processor as having PC architecture. Apart from these two command sets,
modern PC processors support other additional command
sets, which we will discuss later in this lesson.

Clock rate
Processor performance depends largely on the clock rate,
which is usually measured in gigahertz (GHz) today. The clock
rate is determined by the parameters of a quartz resonator, a quartz crystal mounted in a small metal case. Under applied voltage, electric current oscillations occur in the quartz crystal at a rate defined by the shape and size of the crystal. This frequency is called the clock rate.
The smallest unit of time for the processor as a logic device is
a period of the clock rate or just a clock cycle. The processor
spends a certain number of cycles on each operation (command
execution).
Naturally, the higher the clock rate of the processor, the more
efficient it operates: a greater number of cycles take place and
more commands are performed in a unit of time. It is perfectly
natural that newer processors run at ever higher clock rates (it
is achieved by the improvement of manufacturing methods,
in particular) showing better performance. But the clock rate
is not the only factor determining the performance of the
processor. After all, the number of cycles spent on the commands
execution can also change. The first x86 processors spent about 12 clock cycles on average to perform a command; this figure is about 4.5 cycles for the 286 and 386, and about 2 cycles for the 486, while the Pentium processor executes on average one command per clock cycle.
multiple commands at the same time (due to the parallel
execution of commands). The varying number of cycles that processors spend executing commands makes it difficult to compare them by clock rate alone. It is much
easier to use the average number of operations performed per
unit of time to measure performance. There are units of measurement for this parameter: MIPS (millions of instructions per second) and MFLOPS (millions of floating-point operations per second). These values are used to measure the performance of different units of the processor — the ALU and the FPU.
The ALU (Arithmetic Logic Unit) is traditionally responsible for two types of operations: arithmetic operations on integers (addition, subtraction, multiplication, division) and logic operations on integers (logical «AND», logical «OR», «exclusive OR», and so on). This, in fact, follows from its name. As a rule, there are several ALUs in modern processors.
The FPU (Floating Point Unit) is responsible for executing commands that work with floating-point numbers. As with the ALU, there can be several individual FPUs in the processor, and they can operate concurrently.
As you have probably guessed, a theoretical comparison of two processors is possible by considering efficiency and clock rate together: the fewer cycles the processor spends on average to execute a command, the higher its efficiency (performance), even at the same frequency.
For example, the 486 processor (2 cycles per command on average) running at 133 MHz is slower than the Pentium processor (1 cycle per command on average) running at 75 MHz. Evaluating the real performance of a processor compared to other processors can be rather difficult, and we must understand that such a comparison largely depends on the task the processor solves. The clock rates of modern processors typically range from 2 GHz to 4 GHz.
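The 486-versus-Pentium comparison works out as follows, using the average cycles-per-command figures given above:

```python
# Average commands executed per second = clock rate / cycles per command.
def commands_per_second(clock_hz, cycles_per_command):
    return clock_hz / cycles_per_command

i486 = commands_per_second(133_000_000, 2)    # 486 at 133 MHz, ~2 cycles/command
pentium = commands_per_second(75_000_000, 1)  # Pentium at 75 MHz, ~1 cycle/command

print(i486 / 1e6, pentium / 1e6)  # 66.5 vs 75.0 MIPS: Pentium wins despite the lower clock
```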

Data bus of the processor


Among the most common characteristics of the processor are its data bus size and address bus size.
When we talk about the processor bus, we typically mean
a data bus, which is a set of connections for transmitting and
receiving data. The more signals come to the bus simultaneously,
the more data it transfers over a certain period of time, and
the faster it works. The bus size is similar to the number of
lanes on highways: the greater number of lanes is, the greater
the stream of cars. That is, the wider data bus, the more data it
transmits over the same period of time. The 286 processor uses 16 connections to receive and transmit binary data, so its data bus is considered 16-bit. A processor with a 32-bit data bus (e.g. the 486) provides twice as many connections, so it transmits and receives twice as much data per unit of time as 16-bit processors. Certainly, such processors exchange data more efficiently. Modern processors have
a 64-bit data bus, so they can transfer 64 bits per clock cycle
to the system memory.
Thus, we have determined, which data bus the processor
uses in order to connect to the main memory; the processor
performance surely depends on the bus size. Now let us consider
the way, in which the processor handles data received from
RAM.

CPU registers
Although the processor receives data from the main memory via a bus of a certain width, this does not mean that it processes data in portions of the same size. Let us see how this works.

The length of the portion of data that the processor can handle at one time is characterized by the size of its internal registers.
Essentially, a register is a memory cell inside the processor: for example, the processor can add the numbers written in two different registers and write the result into a third register.
The size of the registers describes the digit capacity of the data
handled by the processor. The size of the registers defines the
characteristics of the software and commands executable by
the processor. For example, a processor with 32-bit internal
registers can execute 32-bit commands that handle data in
the form of 32-bit data portions, and the processors with 16-
bit registers are not able to do it. In modern PC processors,
there are 32- or 64-bit internal registers (64-bit processors
dominate in PCs).

Address bus
The address bus is a set of conductors by means of which the address of the memory cell, to which or from which data is sent, is transferred to the memory controller. Each conductor transmits one bit, corresponding to one digit of the address. Increasing the number of conductors (the bus size) used to form the address increases the number of addressable cells. The address
bus size determines the maximum memory capacity addressable
by the processor. As you may know, the binary system is used
on computers. For example, if the address bus were only one bit wide (one line for address transmission), then only two values (logic zero — no voltage, logic one — voltage on) could be transferred; thus, addressing two memory locations would be possible. The use of two bits for specifying the address would allow for addressing four memory cells (00, 01, 10, 11 on the
bus — four different addresses can be specified). In general,
the number of different values taken by n-bit binary number
is equal to 2 to the power of n. Accordingly, with the width of
the address bus of n bits, the number of different memory cells,
which can be addressed, is 2 to the power of n; therefore, it is
said that the processor supports 2 to the power n bytes of RAM,
or that the address space of the processor is 2 to the power of
n bytes. For example, the processor 8086 has a 20-bit address
bus; it can address (2 to the power of 20 = 1048576) bytes of the
main memory, i.e. 1 MB. Thus, the maximum memory capacity
supported by the processor 8086 is 1 MB. The processor 286
has an address bus equal to 24 bits, thus addressing 16 MB
(note: each new bit in the address bus doubles the capacity
of the addressable memory. It is natural, if we remember the
formula «memory capacity = 2 to the power of bus width»).
Modern processors have an address bus that is equal to 36 bits
at the minimum, which corresponds to the supported main
memory of 64 GB. However, there are processors with an even wider address bus.
Data buses and address buses are independent, so chip developers select their sizes at their own discretion. The sizes of these buses are indicators of processor capability: the size of the data bus determines the processor's ability to exchange information quickly, while the size of the address bus determines the capacity of the memory supported by the processor.
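The rule «memory capacity = 2 to the power of bus width» used throughout this section can be checked directly:

```python
def addressable_bytes(address_bus_bits):
    """An n-bit address bus can select one of 2**n memory cells (bytes)."""
    return 2 ** address_bus_bits

assert addressable_bytes(1) == 2                  # one line: two addresses
assert addressable_bytes(2) == 4                  # 00, 01, 10, 11
assert addressable_bytes(20) == 1_048_576         # 8086: 1 MB
assert addressable_bytes(24) == 16 * 1024 ** 2    # 286: 16 MB
assert addressable_bytes(36) == 64 * 1024 ** 3    # 36-bit bus: 64 GB
# Each extra address line doubles the addressable memory:
assert addressable_bytes(25) == 2 * addressable_bytes(24)
```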


The CPU clock and the system bus clock rate


You already know that a modern processor operates at
the frequencies of about 2–4 GHz. On the other hand, RAM runs at much lower frequencies. Moreover, not only the main memory but also the chipset runs at a low frequency relative to the processor. The motherboard contains no components that would work faster than the system bus. The fact is that the larger the physical size of an object, the harder it is to make it work at high frequencies. The processor chip has a very small physical size, while the motherboard is much larger and cannot operate at a frequency of several gigahertz. Let us
conduct a thought experiment:
Suppose we have a computer with the processor operating at
the frequency of 2 GHz. We are planning to replace the processor
with a more productive one and set this new processor with
a frequency of 3 GHz. So, you remove the 2 GHz processor
from the processor socket and install the 3 GHz processor. The
question is as follows: will the motherboard support the new
processor in the former regime, i.e. in the same way in which
it worked with the 2 GHz processor? If the new processor is one and a half times faster, the question arises: in order to speed the system up by replacing the processor, would we also have to replace the motherboard with one that is one and a half times more productive? That is, can the system be enhanced by a simple replacement of the processor? And if we replaced the motherboard with a new one provided with a high-frequency chipset, would we have to change the main memory as well? If things went this way, the computer would not be subject to upgrade: only complete replacement would be possible.


However, the situation is rather different. In most systems, the speed limit for the chipset is about 800 MHz. So, the
central issue is as follows: how can both 2 GHz processor and
3 GHz processor operate ON THE SAME motherboard? If
the system bus clock rate never exceeds 800 MHz, how does
the processor run at higher frequencies? The answer lies in
the fact that the processor does NOT RUN on the system bus
clock rate (the motherboard, chipset, and memory run at this
frequency, but not the processor).
The processor only uses the system bus clock rate as a reference for its own clock. The fact is that the processor MULTIPLIES the system bus clock rate by a certain factor, thereby producing the resultant clock rate,
at which it runs. For example: our abstract processor from the
thought experiment running at the frequency of 2 GHz uses
the system bus at the frequency of 200 MHz, multiplying it by
10, while the processor running at the frequency of 3 GHz uses
the same system bus clock rate, multiplying it by 15.
Thus, various processors can be installed on the same
motherboard: they can operate at different frequencies due
to the fact that the system bus clock rate is the same, and the
processor multiplies the clock rate of the system bus by some
multiplier fixed in the current processor.
Another important moment is to be considered now: which
system is faster, the processor 2 GHz = 400 MHz x 5 or 2 GHz =
800 MHz x 2.5? In other words, does the system bus clock rate
influence the system performance at a constant processor speed?
Naturally, it does. The computer, whose system bus runs at the clock rate of 800 MHz, will run faster than a system with the same processor but a system bus clock rate of 400 MHz, other things being equal.
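The multiplier arithmetic from the thought experiment can be written down directly (the bus frequencies and multipliers are the illustrative values used above):

```python
def cpu_clock_mhz(system_bus_mhz, multiplier):
    """The core clock is the system bus clock multiplied by a fixed factor."""
    return system_bus_mhz * multiplier

# Two different processors on the same 200 MHz system bus:
assert cpu_clock_mhz(200, 10) == 2000   # the 2 GHz processor
assert cpu_clock_mhz(200, 15) == 3000   # the 3 GHz processor
# The same 2 GHz core clock obtained from different bus/multiplier pairs:
assert cpu_clock_mhz(400, 5) == cpu_clock_mhz(800, 2.5) == 2000
```

The last line shows why the second system is nevertheless faster: at the same core clock, its memory and chipset run on a 800 MHz bus instead of a 400 MHz one.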

After all, the memory and other components are best used at a higher frequency (provided, of course, that their specification permits it, i.e. that they are designed for such a frequency).
That is, the computer has higher performance not only because
it obtains a faster processor; its performance also depends on the
system bus clock rate, on which, in turn, RAM runs.
An important note: of course, chipset manufacturers constantly
seek to increase the system bus clock rate supported by their
products. The system bus clock rate supported by the motherboard
depends on the chipset: even with a very fast processor installed,
the system is not able to exchange data with the memory faster
than the system bus permits, which limits the entire system
performance.
As we study chipsets in the near future, we will see how important the supported memory type and system bus clock rate are in most modern chipsets. In the
final analysis, the struggle for fast system bus is largely a struggle
for the acceleration of the exchange between the memory and
the processor.
On the other hand, as mentioned above, today the memory
controller is installed not on the motherboard (in the chipset)
directly but mounted in the processor. Thus, at first glance,
the motherboard does not need anything to support high-
speed exchange between the processor and the memory, but
this is only at first glance. In fact, the memory is installed not
into the CPU, but into the connectors on the motherboard,
so the motherboard manufacturer still has to provide high-
speed exchange between the processor and the memory. In
fact, we can leave all reasoning that is mentioned above and


simply imagine that the northbridge has become a part of the processor.
Now, when we have considered the main characteristics
and components of the CPU, let us discuss other important
architectural elements of modern processors.

Decoder
In fact, the executive units in all modern desktop x86-
processors … do not operate with the code in the standard x86.
Each processor has its own «internal» command system, which
has nothing in common with those commands (thereby «code»),
which come from the outside. In general, commands executed
by the core are much easier, «primitive» than commands of the
standard x86. To make the processor «have the appearance» of
the CPU x86, such unit as decoder exists: it is responsible for
transformation of the «external» x86-code into the «internal»
commands executed by the core (at that, one command of
the x86-code is converted in a few more simple «internal»
commands quite often). The decoder is a very important part of the modern processor: whether the stream of commands coming to the executive units is constant depends on its performance. After all, the executive units are unable to work with x86 code, so whether they do useful work or stand idle depends largely on the speed of the decoder.


Superscalar architecture
Superscalar architecture is the ability to run multiple machine
instructions per clock cycle. The advent of this technology has
led to a substantial increase in performance.
The main feature of all modern processors is that they are able to dispatch for execution not only the command which (according to the program code) should be performed at the current time, but also other commands following it. Consider a simple example. Suppose we need to execute the following commands:
(1) A = B + C
(2) Z = X + Y
(3) K = A + Z
It is easy to notice that the commands (1) and (2) are completely independent from each other: they intersect neither in the initial data (variables B and C in the first case, X and Y in the second) nor in the result placement (variable A in the first case and Z in the second). So, if we have more than one free executive unit at the moment, the commands can be distributed over them and executed simultaneously, rather than sequentially. Thus, if we take the execution time of each command to be N clock cycles, the execution of the whole sequence would take N * 3 cycles in the conventional case, but only N * 2 cycles under parallel execution (the command (3) cannot be executed without waiting for the results of the previous two commands).

Note: of course, the degree of parallelism is not infinite: the commands may be executed in parallel only if an appropriate number of free executive units of the functional block (FB) is available at the current moment, and only if those units «understand» the commands in question. The simplest example: a unit belonging to the ALU is physically unable to execute instructions designed for the FPU.

In fact, everything is far more complicated. Suppose that we have the following sequence:
(1) A = B + C
(2) K = A + M
(3) Z = X + Y
Now the order in which the processor executes the commands will change: as the commands (1) and (3) are independent from each other (sharing neither initial data nor result placement), they may be performed in parallel, and they will be performed in parallel. The command (2) will be executed after them (as the third command), because the command (1) must be executed before in order to obtain a correct result of calculations. This
mechanism is called Out-of-Order Execution, or in abbreviated
form «OoO»: in cases where the execution order cannot affect
the result, the commands are sent to the execution not in the
order, in which they are located in the program code, but in
the order that allows maximum performance.
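The dependency check behind OoO can be sketched on the two examples above (a simplified model that compares variable names; a real scheduler tracks registers):

```python
# Each command is (destination, (sources)).  Two commands may run in
# parallel if neither one reads or writes what the other one writes.
def independent(cmd_a, cmd_b):
    dst_a, src_a = cmd_a
    dst_b, src_b = cmd_b
    return (dst_a != dst_b
            and dst_a not in src_b
            and dst_b not in src_a)

# Example 1:  A = B + C;  Z = X + Y;  K = A + Z
c1, c2, c3 = ("A", ("B", "C")), ("Z", ("X", "Y")), ("K", ("A", "Z"))
assert independent(c1, c2)          # (1) and (2) may run in parallel
assert not independent(c1, c3)      # (3) must wait for the result of (1)

# Example 2:  A = B + C;  K = A + M;  Z = X + Y
d1, d2, d3 = ("A", ("B", "C")), ("K", ("A", "M")), ("Z", ("X", "Y"))
assert not independent(d1, d2)      # (2) depends on (1)
assert independent(d1, d3)          # (1) and (3) run in parallel
```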
Now it should be completely clear why modern CPUs require such a number of homogeneous executive units: they provide parallel execution of multiple commands which, under the «conventional» approach to processor design, would have been executed one after another, in the sequence in which they appear in the source code.

Processors, equipped with a mechanism of parallel execution of several consecutive commands, are called «superscalar». However,
not all superscalar processors support out-of-order execution.
Thus, in the first example, we need only «simple superscalar»
(simultaneous execution of two successive commands), but
in the second example, we cannot go without rearranging
the commands, if we want to get maximum performance.
All modern CPUs have both qualities: they are superscalar
and they support out-of-order execution of commands. At
the same time, there used to be «simple superscalars» in the
history of x86, which did not support OoO.

Pipelined architecture
Pipelined architecture (or pipelining) was introduced into the CPU in order to enhance performance. Typically, each command requires a number of similar operations to be executed, such as: instruction fetching from RAM, command decoding, operand addressing in RAM, operand fetching from RAM, command execution, and recording the result in RAM. Each of these operations is mapped to a pipeline stage. Let us see how it happens.

34
Lesson 1

Figure — A simple model of the pipeline operation:
1. Data fetch from RAM or cache (1st stage)
2. Decoding of the command and of the data addresses (2nd and 3rd stages)
3. Performing arithmetic and logical operations (4th stage)
4. Recording the results in the cache or RAM (5th stage)
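The speed-up from pipelining can be illustrated with a toy timing model of the five stages shown in the figure (an idealized pipeline with no stalls):

```python
STAGES = 5   # fetch, decode (two stages), execute, write back

def cycles_without_pipeline(n_commands):
    """Each command passes through all stages before the next one starts."""
    return n_commands * STAGES

def cycles_with_pipeline(n_commands):
    """The first command takes STAGES cycles; after that, one command
    completes every cycle, because all stages work simultaneously."""
    return STAGES + (n_commands - 1)

assert cycles_without_pipeline(100) == 500
assert cycles_with_pipeline(100) == 104   # almost five times faster
```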

Let us discuss in detail how a modern processor behaves in such a situation.
Any more or less difficult program contains conditional
jump instructions: «If a certain condition is true — go to the
execution of one code area; if not true — go to the other code
area». In terms of the speed of the program code execution by
the modern processor that supports out-of-order execution,
any conditional jump instruction makes a big problem. After


all, until the conditional jump is completed, it is not known which code area is «relevant», and it is impossible to begin to decode and execute it. In order to somehow reconcile the
concept of the pipeline with the conditional jump commands,
a special unit is designed — a branch prediction unit. As it
is clear from its name, it is engaged, in fact, in «prophecy»: it
tries to predict what code area a conditional jump command
indicates, even before it is executed. As directed by the «regular
intranuclear prophet», the processor performs quite real
actions: a «predicted» code area is loaded and the decoding and
execution of its commands begins. Moreover, the conditional
jump instructions may also be contained among the executable
instructions, and their results are also predicted; this generates a chain of unverified predictions! Of course, if the branch
prediction unit has made a mistake, all work done in accordance
with its predictions is simply canceled and the content of the
pipeline is cleared.
In fact, the algorithms, on which the branch prediction unit
operates, are not masterpieces of artificial intelligence at all.
They are mostly simple. Conditional branch instructions are most often encountered in loops: a counter is set to X, and the value of the counter is decremented by 1 after each pass through the loop. Accordingly, as long as the counter value is greater than zero, the jump to the beginning of the loop is performed. After it becomes zero, the execution continues
further. The branch prediction unit analyzes the result of the
conditional branch instruction execution and thinks that if the
result is a transition to a certain address made N consecutive
times, then in the case of N + 1 the transition will be carried
out to the same address. However, despite all primitivism, this

scheme works just great: for example, if the counter is set to 100
and the «operation threshold» of the branch prediction unit
(N) is equal to two transitions in a row to the same address,
it is easy to notice that 97 transitions from the 98 transitions
will be predicted correctly!
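The «N transitions in a row» rule and the 97-of-98 figure can be verified with a small simulation (a sketch of the predictor described above, not of any real CPU):

```python
def simulate_loop_predictor(iterations, threshold):
    """Predict «taken» once the branch has been taken `threshold` times
    in a row; count predictions made and predictions that were correct."""
    streak = predictions = correct = 0
    for i in range(iterations):
        taken = i < iterations - 1        # taken on every pass but the last
        if streak >= threshold:           # predictor is confident: predict taken
            predictions += 1
            correct += taken
        streak = streak + 1 if taken else 0
    return predictions, correct

# Counter set to 100, threshold N = 2: 98 predictions, 97 of them correct
# (only the final, loop-exiting transition is mispredicted).
assert simulate_loop_predictor(100, 2) == (98, 97)
```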
Of course, despite the relatively high efficiency of simple algorithms, branch prediction mechanisms in modern CPUs are still constantly being improved and made more complex. Here, however, the fight is over single percentage points: for example, improving the efficiency of the branch prediction unit from 95 to 97 percent, or even from 97 to 99 percent.
Branch prediction makes it possible to significantly accelerate the execution of the program, but there is a problem: if the prediction is wrong, all of the pipeline contents will also be incorrect and will have to be flushed. The longer the pipeline is, the greater the loss of time required for flushing it; therefore, processor manufacturers constantly enhance the branch prediction mechanism.

Lookahead execution
Another important tool for optimizing command execution by the processor is lookahead execution. This technology involves beginning the execution of an instruction before all of its operands are available. All possible actions are performed, and the decoded instruction with one operand is placed in the execution unit, where it waits for the availability of the second operand coming from the other pipeline.
By using this method, the processor is able to view commands in the waiting list and execute those commands which it will


probably need later. Thus, the CPU can perform a number of commands in advance and use the results of these calculations later.

Prefetch unit
Another technology that increases the performance of the processor is data prefetch. The objective of this unit is to pre-load the data that the processor will probably need soon.
By the principle of its operation, this tool is very similar to the
branch prediction unit, with the only difference: we are talking
not about the code, but about the data in this case. The general
principle is the same: if the built-in data RAM access analysis
circuit decides that a certain area of the memory will soon be
accessed, it gives the command to load this storage area in a
special, very fast memory called cache (we will talk about this
memory further), before the executable program needs it.
A prefetch unit operating in a «smart» (effective) way can
significantly reduce access time for the necessary data and,
therefore, increase the speed of the program execution. An
efficient prefetch unit also compensates the high latency of the
memory subsystem by loading the required data in the cache,
and thus leveling the delays when accessing them, if they were
not in the cache, but in the main memory.
Of course, negative consequences are inevitable in the case of a prefetch unit error: by loading de facto «unnecessary» data into the cache, prefetch displaces other data from it (data which may be just what is needed). In addition, due to the «anticipation» of the read operation, an additional load is placed on the memory controller (de facto completely useless in the case of an error).


Prefetch algorithms, like the algorithms of the branch prediction unit, are not particularly sophisticated: as a rule, the unit seeks to track whether information is being read from the memory with a certain «step» (by addresses); on the basis of this analysis, it tries to predict from which address data will be read as the program continues to run. However, as in
the case of the branch prediction unit, simplicity of the algorithms
does not mean low efficiency: on average, the prefetch unit
«hits the mark» more often than makes mistakes (this, as in the
previous case, is primarily due to the fact that «massed» data
reading from the memory usually occurs during the execution
of various cycles).
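The «step by addresses» heuristic described above can be sketched as a simple stride detector (a hypothetical simplification, not tied to any real prefetcher):

```python
def predict_next_address(recent_addresses):
    """If the last accesses advance by a constant step (stride),
    predict that the next read will continue the pattern."""
    if len(recent_addresses) < 3:
        return None
    strides = [b - a for a, b in zip(recent_addresses, recent_addresses[1:])]
    if len(set(strides)) == 1:           # constant step detected
        return recent_addresses[-1] + strides[0]
    return None                          # no regular pattern: do not prefetch

# An array scanned in 8-byte elements: 0x1000, 0x1008, 0x1010, ...
assert predict_next_address([0x1000, 0x1008, 0x1010]) == 0x1018
# Irregular accesses produce no prediction:
assert predict_next_address([0x1000, 0x1020, 0x1004]) is None
```

Sequential scans of this kind are exactly what loops over arrays produce, which is why even such a simple rule hits the mark more often than it misses.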
The technologies indicated above are often combined under the general name: dynamic execution of commands.

Cache memory
Another important component of the modern processor
or rather the sub-system «the processor — RAM» is a cache
memory.
Let us look at how information is exchanged between the
CPU and the main memory. In most cases, RAM does not
satisfy the memory requirements of modern processors in
terms of bandwidth, since it operates at significantly lower
frequencies. A modern processor operates at frequencies of about
3 GHz and, certainly, during the exchange with the memory,
the processor will wait for the arrival of new portions of data
for quite a long time, and therefore be idle. In order to avoid
this, an additional small amount of very fast memory, which
operates without delay at the frequency of the processor, is
set between the memory and the processor. This memory

is called cache memory; it belongs to the SRAM type. A certain amount of such memory (from 32 KB to 3 MB) is built
in the modern processor; this memory reduces downtime of
the processor during the operations with RAM.
Another meaning of the word cache is «a hoard», «a
hiding place» («stash»). The mystery of this store lies in its
«transparency»: it does not provide the program with an
additional addressable memory space. The cache is an additional
high-speed storage of the information unit copies from the
main memory, the possibility of accessing which in the near
future is high. The cache cannot store a copy of the entire
main memory, because its capacity is much smaller than the
main memory.
It stores only a limited number of data blocks and a cache
directory, which is a list of their current correspondence to the areas of the main memory. In addition, not all memory
available to the processor can be cached.
Each time the cache-controller accesses the memory, it
checks whether there is a valid copy of the data requested
in the cache, by means of the directory. If it exists, it means
a cache hit, when the data is taken from the cache-memory.
If a valid copy does not exist, it is the case of a cache miss,
where the data is taken from the main memory. According
to the cashing algorithms, the data block that has been read
from the main memory can replace one of the cache blocks
under the certain conditions. The percentage of cache hits as
well as the caching efficiency depends on the intellectuality
of the algorithms.
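The hit/miss logic described above can be modelled with a dictionary playing the role of the cache directory (a toy model; real controllers work with tags and fixed-length lines):

```python
class ToyCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = {}                  # address -> copy of the data
        self.hits = self.misses = 0

    def read(self, address, main_memory):
        if address in self.store:        # cache hit: take data from the cache
            self.hits += 1
        else:                            # cache miss: go to the main memory
            self.misses += 1
            if len(self.store) >= self.capacity:
                # evict the oldest cached block to make room
                self.store.pop(next(iter(self.store)))
            self.store[address] = main_memory[address]
        return self.store[address]

ram = {addr: addr * 2 for addr in range(16)}
cache = ToyCache(capacity=4)
for addr in (1, 2, 1, 1, 3):
    cache.read(addr, ram)
assert (cache.hits, cache.misses) == (2, 3)   # repeated reads of 1 hit the cache
```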
Searching a block in the list must be performed quite fast
in order not to reduce to zero all the benefits of using high-speed memory by «procrastinating» over the decision.


Accessing the main memory can start simultaneously with the search in the directory and be aborted in the case of a cache hit (the Look Aside architecture). This saves time, but the redundant access increases power consumption. The second option is as follows: accessing the external memory starts only after a cache miss is recorded (the Look Through architecture): though at least one processor cycle is lost, power is saved.
In modern computers, the cache is usually designed according to a two-layer scheme. The primary cache, or L1 Cache, is built into all processors starting from the 486. The capacity of such a cache is small (from 8 to 64 KB, sometimes greater). To increase performance, separate caches for data and commands are always used (the so-called Harvard architecture — as opposed to the Princeton architecture, which uses common memory for commands and data). The L2 Cache is external for the 486 and Pentium processors,
i.e. it is mounted on the motherboard; in Pentium Pro and
all successive processors, L2 Cache is placed in one package
with the core and connected to the processor with a special
internal bus or being a part of the processor crystal.
The difference between the cache locations is quite significant.
If it is mounted on the motherboard, it operates at the frequency
of the system (external) bus of the processor and can be accessed
through the same bus. If the cache is connected by means of a
separate bus, its operation frequency and the bus performance
will not depend on the system bus; thus, its performance will be
much higher. The cache controller must provide coherency — the conformity of the data in the cache memory of both layers with the data in the main memory.

The cache controller operates on cache lines of fixed length. A line can store a copy of a main memory block whose size matches the length of the line. Information
about the address and state of the copied main memory block
is connected to every cache line. The line may be valid, which
means that it displays the corresponding block of the main
memory correctly at the current moment, or invalid. The
information about which block occupies the current line (i.e. the upper part of the address, or the page number) and its state is called a tag and is stored in the cell of the tag RAM that is connected to this line.
In the exchange operations with the main memory, the line
usually participates in full (unsectored cache). The option of the
sectored cache when a line contains several adjacent cells-sectors
and their size corresponds to the minimum portion of the data
exchange between the cache and the main memory is also possible.
At that, the directory entry that corresponds to each line must
store bits of validity for each sector of the current line.
A write to a block that does not have a copy in the cache goes to the main memory (to increase performance, the write can be made via a lazy-write buffer). The behavior of the cache controller during a write operation to memory, when a copy of the requested area is in some cache line, is determined by its algorithm, or Write Policy. Two main policies for writing data from the cache to the main memory exist: Write Through (WT) and Write Back (WB).
The WT policy presupposes that each write operation (even a single-byte one) falling into a cached block is performed in the cache line and in the main memory at the same time. At that, the processor has to perform a relatively long write to the main memory with

each write operation. The algorithm is quite simple to implement and easily provides data integrity through the constant matching
of copies of the data in the cache and the main memory. It does not need to store presence and modification flags – tag information is quite enough (it is assumed that any line always reflects some block, and the tag indicates which one). However, this simplicity turns into low efficiency of writes. There are variations of this algorithm using delayed buffered writes, in which data is written into the main memory via a FIFO buffer during free bus cycles.
The WB policy makes it possible to reduce the number of write operations on the main memory bus. If the memory block into which the write is to be performed is displayed in the cache, then the physical write will first be made into the valid cache line; that line will be marked as dirty, or modified, i.e. as requiring a roll-out to the main memory. After the roll-out (recording into the main memory), the line becomes clean and can be used for caching other blocks without loss of data integrity. Data is written back to the main memory as a whole line. Such a roll-out, performed by the controller, may be delayed until it becomes urgent (access to the cached memory by another subscriber, or replacement with new data in the cache) or may be performed in free time, after the modification of the entire line.
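The practical difference between the two policies comes down to how many writes reach the main memory bus (a schematic comparison, not a full controller model):

```python
def main_memory_writes(write_addresses, policy):
    """Count writes reaching main memory for a stream of writes that all
    hit the cache.  WT: every write goes to RAM immediately.
    WB: a dirty line is rolled out once, however often it is rewritten."""
    if policy == "WT":
        return len(write_addresses)
    if policy == "WB":
        return len(set(write_addresses))   # one roll-out per dirty line
    raise ValueError(policy)

# Ten consecutive writes into the same cached line:
stream = [0x40] * 10
assert main_memory_writes(stream, "WT") == 10
assert main_memory_writes(stream, "WB") == 1
```

This is why write-back wins whenever a program repeatedly updates the same small working set, at the price of the extra dirty/clean bookkeeping described above.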

Exclusive and non-exclusive cache


The concepts of exclusive and non-exclusive caching are very
simple: in the case of a non-exclusive cache, information can be
duplicated at all layers of caching. Thus, L2 may contain data
that are already in L1I and L1D, and L3 (if present) may contain
a complete copy of the entire contents of L2 (and, accordingly,

L1I and L1D). Exclusive cache, as opposed to non-exclusive, provides a clear distinction: if the information is not contained
at a certain layer of the cache, it means that it is absent at all other
layers as well. The advantage of the exclusive cache is obvious: the
overall size of the cached information in this case is equal to the
total volume of the caches at all layers — as opposed to a non-
exclusive cache, where the size of the information being cached
(at worst) is equal to the volume of the largest cache layer. The
disadvantage of the exclusive cache is less obvious, but it exists: a special mechanism is required to maintain this «exclusiveness» (for example, before information is deleted from the L1 cache, the process of copying it to L2 is automatically initiated).
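The capacity argument above amounts to simple arithmetic (the cache sizes here are hypothetical, chosen only for illustration):

```python
def effective_cached_bytes(level_sizes, exclusive):
    """Exclusive caches hold distinct data at every layer, so capacities add;
    a non-exclusive hierarchy may duplicate everything, so in the worst
    case only the largest layer holds unique data."""
    return sum(level_sizes) if exclusive else max(level_sizes)

l1, l2 = 64 * 1024, 512 * 1024          # hypothetical 64 KB L1 + 512 KB L2
assert effective_cached_bytes([l1, l2], exclusive=True) == 576 * 1024
assert effective_cached_bytes([l1, l2], exclusive=False) == 512 * 1024
```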

Other instruction sets


We previously discussed the x86 instruction set, which has the following feature: it is designed to work with integers. What if the processor needs to take a square root or find a sine or a logarithm? Certainly, the processor can handle such a problem but, given that it focuses on computations with whole numbers, the execution of such an operation will take many cycles.
At the same time, Intel developed a so-called coprocessor for its processors — a separate crystal also able to execute commands, but commands of its own, not x86: the instruction set supported by the coprocessor (so-called x87) is designed to work with floating-point numbers; thus, the coprocessor is designed to solve the problems indicated above. In the early days of the PC, it was considered that only a small number of users needed the coprocessor (indeed, what does a text editor user need mathematical calculations for?), so it was installed into the system as an optional extra. If desired, the user could purchase the

coprocessor separately and install it into a special slot on the motherboard. Today the situation has changed completely.
These days absolutely all users need the coprocessor: in our «age of multimedia», when the computer increasingly processes realistic 3D graphics, mathematical calculations have become an essential attribute of any multimedia application (for example, modern games). Therefore, the coprocessor has long been installed together with the processor on the same crystal. Indeed, if all users require the coprocessor, it is reasonable to build it into the processor rather than produce it separately: it will surely be cheaper.
Thus, any modern processor supports two basic instruction sets: x86 and x87. Can the processor manufacturers abandon
these instruction sets? No! In that case, the resulting system will
not be software compatible with the PC, because PC software
written on such processor will not work! Support of these two
instruction sets is the key to software compatibility. Can the
processor manufacturers create and add new instruction sets
to the processor? Of course, they can. But what is required for programs to work better on a processor with a new set of commands? Will an old application, designed before the new command set was created, gain any advantage from execution on a processor with the new command set? No! A program is nothing
but a sequence of instructions for the processor, after all. If the
program does not contain a single new command (there is not
any, since no new commands were written during the program
creation), it is natural that old application will have no benefit, if
the processor is able to execute a new set of commands. Therefore,
when a new set of commands is added to the processor, it should
be understood that there will be NO BENEFIT from it until

software developers start writing software considering this new set of commands. The old program will never benefit from the
new instruction set (unless the authors rewrite the old program for the new instruction set). The processor
was originally conceived as a universal device that could solve
any problems, whether the solution of mathematical equations, a
game, a typesetting program, etc., by changing the program. But
over time, special demands on performance have been made to
some areas. To solve some specific tasks, additional instruction
sets have been introduced — expansions of the basic instruction
set x86.

The technology MMX


Depending on the context, the abbreviation MMX means
either multi-media extensions or matrix math extensions.
The technology MMX has been used for quite a long time;
thanks to it, compression/decompression of video data,
image handling, encryption, and execution of peripheral
operations become faster, i.e. almost all of the operations
used in many modern programs. MMX extends the basic
command set by 57 new commands; the innovation also lies
in the introduction of a new mode of command execution
called Single Instruction — Multiple Data (SIMD). As the
name makes clear, a single instruction can handle multiple
data elements at once.

The instruction set 3DNow! / Enhanced 3DNow!


The company AMD developed the technology 3DNow! for the
K6-2 processors, competitors of the Pentium MMX and Pentium
II. As the name suggests, this technology is aimed at accelerating
work with three-dimensional graphics, as well as other
calculations related to video and audio data, though in fact it
is simply a mathematical instruction set. The main goal AMD
pursued in creating 3DNow! was an alternative to x87, because
the coprocessors AMD used in its processors were not as
productive as those of Intel, the developer of x87; and
three-dimensional graphics and games are the main consumers
of mathematical commands. As you may notice, gaming drives
the choice of computer hardware for most consumers, so AMD
made a risky marketing venture: it introduced the new command
set in its processors and began actively presenting them as the
processors best optimized for gaming. As we have said, the
success of a particular set of commands depends entirely on its
software support, so AMD staked on games to make its processors
a success. Its further development — Enhanced 3DNow! —
appeared in the Athlon and Duron processors.

Instruction Set SSE / SSE2 / SSE3


SSE (Streaming SIMD Extensions), an updated MMX technology,
appeared in the Pentium III. It added 70 new instructions for
graphics and sound to the existing MMX commands. The SSE
instructions are similar to the MMX instructions, and they were
previously called MMX2. Floating-point SSE operations are
implemented as a separate module in the processor. The new
SSE instructions make it possible to work more effectively with
three-dimensional graphics, audio and video data streams, and
speech recognition applications. The instruction set SSE2, which
appeared in the Pentium 4, is again the SSE command set,
extended with 144 new commands aimed primarily at streaming
computations. SSE3, which appeared in the latest models of the
Pentium 4, is the next expansion of SSE.

Block diagram of the processor nodes


Now let us try to combine all of the above-mentioned
modules, blocks and nodes into a single diagram and see what
we get if we take a dual-core processor as a basis. We will get
something like this:

Multi-processor systems
The idea of using multiple processors instead of one arose
long ago. Guided by the principle «one head is good, two are
better», the processor manufacturers decided to find

a way to increase the performance of the computer without
changing the parameters of the processors themselves. Thus,
systems with 2, 4 or more processors began to emerge.
Theoretically, the performance of such systems should be 2, 4,
etc. times greater, but in practice it is not. When creating
multi-processor systems, a variety of conditions must be taken
into account:
■■ The motherboard must be able to work with multiple
processors, i.e. have an appropriate chipset installed as well
as the required number of processor sockets.
■■ The processor must support operation in multi-processor
systems.
■■ Operating systems and software must also be able to work
with multiple processors; in the next course, we will mention
this again when we look at the capabilities of various operating
systems.
Multi-processor systems give a good increase in performance
only when they are running specialized software. A computer
with multiple processors is controlled by an operating system
that can distribute different tasks to the different processors
in the computer. The programs that run on such a computer
must be composed of several threads that can be executed
independently. Performance increases through these capabilities.
Accordingly, if the operating system and the software do not
meet these requirements, there will be no benefit from using a
multi-processor system: its performance will be the same as
that of a single-processor system. There are two modes of
operation for multi-processor systems — symmetric and asymmetric.


Let us consider them in more detail:


■■ AMP, Asymmetric Multiprocessing. In this mode, the
operating system runs on some processors, while applications
are executed by others. This mode is ineffective under certain
conditions, because the load is distributed unevenly, so the
second mode is the most commonly used today.
■■ SMP, Symmetric Multiprocessing. In this mode, both the
operating system and applications can be executed by any
processor, depending on its load, which provides greater
flexibility in load distribution and thus greater performance.
In addition to their many advantages, multi-processor systems
have a distinct drawback as well — the cost. Motherboards for
multi-processor systems cost much more than those for
single-processor systems, and buying several processors is not
a cheap pleasure either. Therefore, the manufacturers have
begun to develop cheaper versions of multiprocessing.
The first of these versions, which appeared in the Pentium 4,
is the technology Hyper-Threading (HT), which provides «logical
multiprocessing». What is its feature?
As you already know, the processor is composed of several
units that perform various tasks: for example, two ALU pipelines,
one of which is always the main one and the other auxiliary,
i.e. not always loaded; one FPU; and a unit that loads and
unloads commands and data from the memory. Suppose that
during program execution there is a situation where the ALU
is only 60% loaded across several pipeline stages, while the
FPU is not loaded at all and the load/unload unit is occupied
by 30%. On average, about 30% of the CPU resources are used.
However, this is only a particular case, and there may be
a lot of other situations when the processor will use a larger or
smaller number of its units, which depends largely on the
software. Intel states that the Pentium 4 processor units are
loaded no more than 35% on average. So Intel offered a
technology that allows idle processor units to be used as an
additional, second processor, at the cost of small alterations
to the core (about a 10% increase). To implement it, a second
register file has been added, and programs work with such a
processor as if it were not one but two processors. Of course,
this approach is not a full alternative to multi-processor systems,
but the performance increase can reach 30%, depending on
the specific application of HT.
Also, as the manufacturing process shrank, it became possible
to place two or more processor cores on a single die housed in
a standard package. These are called multi-core processors.
This type of architecture is called CMP, Chip Multiprocessing.
A feature of this architecture, and its difference from SMP,
lies in several cores using a shared memory space.
CPU manufacturers currently produce 2-, 4- and 6-core
processors for workstations, and processors with more cores
for servers. The figure shows a dual-core processor, where
two identical cores are clearly visible.


Modes of the processor operation


All 32-bit and higher Intel processors, starting with the 386,
support program execution in several modes. Processor modes
are designed for executing programs in various environments;
in different modes, the capabilities of the chip are not the same,
because commands are executed in different ways. Depending
on the mode, the way the processor manages system memory
and tasks also varies.
Processors can operate in three modes: real, protected, and
virtual real (real inside protected).

Real mode
The original IBM PC used the 8088 processor, which could
perform 16-bit instructions using 16-bit internal registers and
address only 1 MB of memory via a 20-bit address bus. All
PC software was originally designed for this processor, built
on the 16-bit instruction set and a 1 MB memory model. For
example, DOS: all DOS software is written on the basis of
16-bit instructions. Later processors, such as the 286, could
also perform the same 16-bit instructions as the original 8088,
but much faster.
In other words, the processor 286 was fully compatible
with the original 8088 processor. The 16-bit mode in which
the 8088 and 80286 processors executed commands was named
the real mode. All programs running in the real mode must
use only 16-bit instructions and 20-bit addresses. Such software
is single-tasking, i.e. only one program can run at a time.
There is no built-in protection to prevent the memory cells
occupied by a program from being overwritten by the operating
system or other programs.
It means that the data or code of any program may be corrupted
when multiple programs are executed, which can lead to a
system lock-up.

Protected mode
The first 32-bit processor designed for the PC was the 386.
This chip could perform an entirely new 32-bit instruction set.
To take full advantage of that new command set, a 32-bit
operating system and 32-bit applications were required. The
new mode was called protected, because programs running in
it are protected against other programs overwriting the memory
areas they use.
This protection makes the system more reliable, because
programs with errors will not affect other programs or the
operating system. Since the development of new operating
systems and applications taking advantage of the 32-bit protected
mode would clearly take some time, Intel provided a
backward-compatible real mode in the 386. Due to this, the
386 processor could perform conventional 16-bit applications
and operating systems, and they were performed much faster than on any
other previous generation processors. It was enough for most
users: 32-bit systems and applications were not in demand at
the time, and consumers were satisfied that the available 16-bit
programs ran faster. Unfortunately, because of that, the 386
processor was never used in the protected mode, and, therefore,
all advantages of that mode were lost. When a modern
high-performance processor runs in the real mode, it resembles
a monstrously accelerated 8088! That is, the processor can,
albeit at a huge speed (compared to the original 8088), perform
only 16-bit applications and address only the 1 MB of memory
the 8088 could work with. Therefore, there was a need for new
operating systems and new applications that could run on
modern processors in the protected 32-bit mode. However,
users resisted all attempts to move to a 32-bit environment:
for them, that meant giving up old software, at least partially.
Only in August 1995 (10 years after the release of the first
32-bit processor) did the first consumer 32-bit operating system,
Windows 95, appear, and users accepted it because it was
partially 16-bit and able to perform both new 32-bit programs
and old 16-bit programs. For such backward compatibility,
Windows 95 used the third mode of the processor:

Virtual real mode


Essentially, the virtual real mode is an execution mode for
the 16-bit environment (real mode) implemented within the
32-bit protected mode. Since the protected mode provides true
multitasking, several real-mode sessions can run simultaneously,
with each session running its software on its own virtual machine.
All these applications can run simultaneously, even while other
32-bit programs are being performed.

You should note that any program running in the virtual
real mode can access up to 1 MB of memory, which appears
to each such program as the first and only megabyte of memory
in the system. The virtual real window completely mimics the
8088 processor environment, and, performance aside, software
in the virtual real mode runs as it would in the real mode on
the very first PCs. At the start of each 16-bit application,
Windows 95/98 created a so-called DOS virtual machine,
provided it with 1 MB of memory, and the 16-bit application
ran on that machine. Note that every processor starts operating
in the real mode at power-on; switching to the 32-bit mode
occurs only when a 32-bit operating system is launched.

64-bit extended mode IA-32e (AMD64, x86-64, EM64T)


This mode is an extension of the IA-32 architecture, developed
by AMD and later supported by Intel. Processors that support
the 64-bit extension can run in the real mode (8086), in the
IA-32 mode, or in the IA-32e mode. When using the IA-32
mode, the processor can operate in the protected mode or in
the virtual real mode.
The IA-32e mode allows working in the 64-bit mode or in the
compatibility mode, which permits simultaneous execution of
64- and 32-bit applications.
■■ 64-bit mode. Allows the 64-bit operating system to
execute 64-bit applications.
■■ Compatibility Mode. Allows the 64-bit operating system
to execute 32-bit applications.
The IA-32e compatibility mode allows 32-bit and 16-bit
applications to run on a 64-bit operating system.
System board, or motherboard


One of the most important units of the computer is a system
board, sometimes called the motherboard or the main board.
In this lesson, we will consider different types of system boards
and their components.
So, why do we need it? We know that the system unit of the
computer consists of a large number of various devices: the
processor, the main memory, permanent storage devices, and
a huge amount of other optional equipment, all of which must
be properly interconnected to provide data transmission between
them. The motherboard handles this task, so we can now
formulate its definition:
A motherboard is the main element of the system unit, which
combines all the main components and devices of the computer
and ensures data transmission between them.


Figure — Example of a motherboard (labels: CPU socket, CPU
power module, northbridge, southbridge, expansion slots, ATX
power socket, HDD connector, IDE and SATA connectors, PCI
group slots, external ports of the motherboard)

Some 25 years ago, the system boards (and the vast majority
of other components) of personal computers were based on
digital chips of small- and medium-scale integration (gates,
flip-flops, registers, etc.). If you had dealt with XT / AT
computers, you would have seen a system board with one
hundred and fifty to two hundred integrated circuit (IC) packages.

Figure — motherboard of the standard XT / AT
(IBM PC, 1981 model)

Those chips consumed much power and occupied much
space on the boards. Developers realized quickly enough that
standard components of personal computers — floppy drive
controllers, Direct Memory Access (DMA) controllers,
programmable interrupt controllers — could be implemented
as application-specific integrated circuits (ASIC — Application
Specific Integrated Circuit).
The use of such ASICs made it possible to reduce the number
of components and interconnections on computer boards, as
well as their cost and power consumption. The high degree of
circuit integration also increased computer performance. By
placing complex logic circuits in only a few packages, the length
of signal conductors (and, hence, the parasitic capacitance of
the wiring) was significantly reduced, which improved the
operating frequencies of the logic circuits. Optimization of
conductor routing within the chips increased operating speed
further. Finally, the developers realized
pretty quickly that all the system controllers required for the
operation of modern computers might be «packaged» in a
few chips with a high degree of integration. Since those chips
were designed as sets intended for building motherboards, they
became known as sets of integrated circuits, or chipsets.
Currently, chipsets play a determining role in the design and
production of personal computers. While hundreds of chips
were installed on the first system boards, as mentioned earlier,
their number rarely exceeds two dozen today. The role of
system board chipsets is so great that, as a rule, new varieties
must be designed whenever a new technology or processor
appears.
Ultimately, the overall capabilities of the personal computer
are largely determined by the chipset of the motherboard. The
chipset forms the backbone of any computer system.
The motherboard chipset consists of two components
(typically independent chips connected to each other). These
components are called the northbridge and the southbridge.
The names north and south are historical: they indicate the
location of each bridge with respect to the PCI bus on the
board diagram; north is above it, south is below. Why «bridge»?
The name reflects the function these chips perform: they connect
different buses and interfaces. The northbridge is particularly
complex for the designer, because it works with the fastest
devices and must operate very quickly, providing a fast and
reliable connection between the processor, the memory, the
PCI-E bus, and the southbridge.
The southbridge works with slower devices, such as hard
drives, the USB bus, the PCI bus, etc.

Figure — northbridge and southbridge of the motherboard

All that is good, but why do we need two bridges? Why is a
single one not enough? Why have the chipset manufacturers
divided the bridge into two separate bridges?
There are several reasons for this.
The first (and probably the most important) reason lies in
the functions performed by the chips. The northbridge must
operate much faster than the southbridge, and putting both
bridges on a single chip complicates the development and
production of the chipset. Moreover, peripheral standards are
updated very often. With two chips, the motherboard
manufacturers do not have to change the whole set of logic:
it is enough to change the southbridge.
It is no secret that the chip core is much smaller than the
silicon substrate on which it is installed; this is necessary to
properly route the conductors from the core to the output pins.
Thus, the chip still has a lot of spare space, which disappears
when using two chips instead of
one. You may ask: «What is this empty space needed for?»
Some manufacturers embed graphics cores in the northbridge
precisely because of the unused space, and there is still room
for further flights of the developers' fancy.
However, there are chipsets (called integrated) in which the
north and south bridges are combined in a single chip; a chipset
produced by the company nVidia may serve as an example.

Figure — Motherboard with nVidia chipset.

Now let us look in more detail at what devices can be connected
to the chipsets and try to characterize each of them.

4. The diagram of system board elements interaction
Figure — Diagram of key system components interaction: the
CPU connects to the North Bridge (MCH), which serves the
DIMM memory and PCI-E 16x; the South Bridge (ICH) serves
USB, ATA, SATA, FireWire, audio, LAN, PCI and PCI-E 1x.

As we have discussed, modern motherboards cannot be
imagined without system logic chips. The chipset essentially
defines the system board: any two boards with the same set of
chips are functionally identical.


A system logic chipset includes a processor bus interface
(also called the Front Side Bus, FSB), memory controllers,
bus controllers, input-output controllers, etc. A chipset also
comprises the supporting circuits of the motherboard. If we
compare the computer processor to the engine of a vehicle,
then the chassis is the likely analog for the chipset: the metal
frame on which the engine is installed and which acts as an
intermediary between the engine and the outside environment.
The chipset is the frame, suspension, steering gear, wheels and
tires, transmission, drive shaft, differential, and brakes.
The chassis of a car is the mechanism that converts the energy
of the engine into motion of the vehicle. The chipset, in turn,
is the junction between the processor and the various components
of the computer: without it, the processor cannot communicate
with the memory, adapter cards, or other devices. If we used
medical terminology and compared the processor to the brain,
the system logic chipset would rightly take the place of the
spine and central nervous system.
The chipset controls the interfaces, or junctions, of the processor
with the various components of the computer. Therefore, it
ultimately determines the type and performance of the processor
being used, the bus frequency, and the speed, type and capacity
of the memory. Generally speaking, the chipset is one of the
most important system components, perhaps even more
important than the processor.
In 1986, the company Chips and Technologies was one of
the first to combine separate chips into a single system logic
set. It offered a chip that performed the functions of most
motherboard circuitry: the clock generator, bus controller,
system timer, two interrupt controllers, two direct memory access
controllers (DMA controllers), and even functions of the CMOS
memory chips as well as clock chips.
Besides the processor, all main components of the computer
system board were thus replaced by a single chip. Four additional
chips served as buffers and memory controllers, expanding its
capabilities, so the motherboard contained only five chips in
total. Chips and Technologies named that chipset the CS8220.
It was a radical change in the production of PC motherboards.
In addition to significantly reducing the price and simplifying
the construction of the motherboard, it became possible to
integrate functions for which expansion cards had previously
been installed. Later, the four supporting chips were replaced
by a set of only three chips called New Enhanced AT, and after
some time a system logic chipset consisting of a single chip,
Single Chip AT (SCAT), appeared.
But all that is history now. Today, starting from the Intel 800
series, chipsets are based on the hub architecture. In it, the
northbridge is named the Memory Controller Hub (MCH),
while the southbridge is called the I/O Controller Hub (ICH).
Combining them by means of the high-speed hub interface,
we obtain the standard «north / south bridge» architecture.
So what is the advantage of the hub architecture? First of all,
the following features:
■■ Increased bandwidth between the bridges
■■ Reduced load on the PCI bus (as the hub interface is
independent of PCI)
■■ Simplified wiring
In addition, a new bus, Low Pin Count (LPC), which is a 4-bit
version of the PCI bus designed primarily to support
the chips of the motherboard ROM BIOS and Super I / O,
was included in the ICH (I/O Controller Hub). Together with
the four signals carrying data, addresses and commands, nine
additional signals are required for bus operation, making
thirteen signals in total. This significantly reduces the number
of lines connecting the ROM BIOS and Super I/O chips. For
comparison, in early versions of chipsets the ISA bus was used
for this interface; it required 96 signals.
The main function of the northbridge (MCH) is to control
and direct the data streams of four buses: the memory bus,
PCI-E 16x, the processor system bus, and the interconnect
bus to the southbridge. It must be balanced to minimize stalls
when accessing memory, because each device needs a quick
and easy path to it. This is the main task of the chipset developer:
to distribute all memory requests competently and quickly,
set priorities, and create queues if necessary.
In early chipsets, the memory controller was largely
subordinated to the processor, which therefore had to handle
a large volume of data and memory access requests. This
approach cannot be applied to modern systems: many tasks
require enormous computing power, which would otherwise
be consumed by processing memory access requests. Therefore,
in modern chipsets the memory controller is a stand-alone
unit that provides almost all devices in the computer with
direct access to the memory.
At the same time, a rather old and slow data transmission
technique is used for communication between the chips; as a
result, a problem may occur when transmitting data over a
forward channel to the memory. For example, direct memory
access (DMA) may be requested by the hard drive and, say,
the PCI-E 16x bus at the same time. In such cases delays are
unacceptable, but the memory physically cannot take data
from multiple devices simultaneously. The data transmission
channel therefore operates in time-sharing mode, and data
waiting for the channel to become free is stored in special
buffers of the northbridge.
So, a good chipset must provide proper buffering as well as
a set of arbitration procedures for shared access to the bus,
so that both the memory and the transmission channel are
used effectively. Unfortunately, it is very difficult to find the
ideal buffer size, because its effectiveness often depends on
the software installed on the computer; the efficiency of buffers
from different manufacturers may differ radically.
The southbridge (ICH) provides connection of the PCI,
PCI-E, ATA (2 channels), SATA, USB and FireWire buses,
the input-output controllers, the CMOS memory, and the
flash memory holding the system BIOS. The southbridge also
includes the timer, the interrupt controller, the DMA controller,
and system board peripherals. Logically, the southern hub is
represented as a set of virtual bridges and devices connected
to the main PCI bus. However, when exchanging data with
high-bandwidth devices (IDE, SATA, USB, FireWire, network
adapter, AC'97 or HDA), the PCI bus itself is not used, since
otherwise the point of the southern hub would be lost.
Let us look at the buses and explore what functions they
perform.


Bus
A bus is a data transmission channel shared by the various
units of the system. Information is transmitted over the bus
as groups of bits. The bus may provide a separate line for each
bit of a word (parallel bus), or all bits of the word may use a
single line sequentially in time (serial bus). The figure shows
a typical connection of devices to the data bus.
Many receivers can be connected to the bus. The combination
of control and address signals determines which device the
data on the bus is meant for. The control logic issues special
gating signals to indicate to the recipient when it should receive
the data.
Figure — Schematic representation of the data bus (the
processor, RAM, ROM, and input/output devices are connected
to the bus through buffers)

«Third generation» buses have been emerging onto the
market, including HyperTransport and InfiniBand. They usually
support both the high speeds required for memory, video cards
and interprocessor communication, and low speeds for slow
devices such as disk drives. They also tend to be more flexible
in terms of physical connections, allowing them to be used as
internal
and external buses, for example, for connecting computers. It
leads to complex problems in satisfying the various requirements,
so much of the work on such buses concerns software rather
than hardware. In general, third-generation buses resemble
computer networks more than the original concept of a bus,
with higher overheads than in the early systems. They also
allow multiple devices to use the bus simultaneously.

4.1. USB (Universal Serial Bus)


The Universal Serial Bus is designed for peripheral devices.
USB is a serial data transmission interface for medium-speed
and low-speed peripherals. High-speed devices also use the
IEEE 1394 bus, which has a number of advantages.
A USB cable contains a twisted pair over which data is
transmitted in both directions (a differential connection) and
two wires for powering the peripheral device. Thanks to the
built-in power supply lines, USB often permits using devices
without their own power supply unit (if they consume up to
500 mA).
Up to 127 devices can be connected to a single USB controller
through a chain of hubs (using a «star» topology).
Connector sizes: USB Type A — 4x12 mm, USB Type B —
7x8 mm, USB mini A and USB mini B — 2x7 mm.
Unlike many other standard connector types, USB connectors
are durable and mechanically strong.

Figure — USB connectors: type A, type mini B, type A
Today the third modification of the USB bus has been
introduced.

USB 1.1
Specifications:
■■ two rates of exchange:
■■ high rate — 12 Mbit/s
■■ low rate — 1.5 Mbit/s
■■ maximum cable length for the high rate — 5 m
■■ maximum cable length for the low rate — 3 m
■■ maximum number of connected devices (including
hubs) — 127
■■ ability to connect devices with different rates of
exchange
■■ power supply voltage for peripherals — 5 V
■■ maximum current consumption per device — 500 mA


USB 2.0
USB 2.0 differs from USB 1.1 only in its higher speed and
small modifications to the data transmission protocol for the
Hi-speed mode (480 Mbit/s).
USB 2.0 operates at three speeds:
Low-speed, 10–1500 Kbit/s (used for interactive devices:
keyboards, mice, joysticks)
Full-speed, 0.5–12 Mbit/s (audio/video devices)
Hi-speed, 25–480 Mbit/s (video devices, storage media)
Although the speed of USB 2.0 can reach 480 Mbit/s in theory,
devices such as hard disks, and storage media in general, never
reach such exchange speeds on the bus in practice, even though
they could saturate it. This is explained by the rather large
latency of the USB bus between a request for data transmission
and the actual start of the transmission.

USB 3.0
USB 3.0 is the third version of the USB bus. USB 3.0 is
backward-compatible with USB 2.0 and USB 1.1 and provides
a new data transmission mode SuperSpeed, as well.
USB 3.0 connectors can be distinguished from older versions
by blue color-marking and the initials SS. The companies engaged
in the creation of USB 3.0: Intel, Microsoft, Hewlett-Packard, Texas
Instruments, NEC and NXP Semiconductors.
USB 3.0 can transmit data at the peak throughput of up to
5 Gbit/s.


4.2. IEEE 1394 (FireWire, i-Link)


A serial high-speed bus designed for the exchange of digital
information between the computer and other electronic devices.
Various companies are promoting the standard under their
trademarks: Apple — FireWire; Sony — i.LINK; Yamaha —
mLAN; TI — Lynx
IEEE 1394 devices are organized in a three-layer scheme —
Transaction, Link and Physical — corresponding to the three
lower layers of the OSI model.
Transaction Layer: routing data streams with support for
asynchronous write-read protocol.
Link Layer: creates data packets and provides their delivery.
Physical Layer: converts digital information to analog
for transmission and vice versa; controls the signal level on
the bus; controls bus access.
The Bus Manager implements the connection between the PCI bus
and the Transaction Layer. It assigns the numbers and types of
logical channels to the devices on the bus and detects errors.
Data is transmitted in frames of the length of 125
microseconds. Timeslots for channels are arranged in the
frame. Both synchronous and asynchronous modes of operation
are enabled. Each channel can hold one or more timeslots.
For data transmission, the transmitting device requests a
synchronous channel with the required bandwidth. If the
required number of timeslots is available in the transmitted
frame, an acknowledgment is sent and the channel
is provided.
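The 125-microsecond frame period above implies 8000 frames per second, so the bandwidth of a synchronous channel follows directly from the payload granted to it in each frame. The payload size below is a hypothetical example value.

```python
# Sketch: bandwidth arithmetic for the 125-microsecond IEEE 1394
# frames described above. The per-frame payload is hypothetical.

FRAME_PERIOD_S = 125e-6
FRAMES_PER_S = 1 / FRAME_PERIOD_S  # 8000 frames every second

def channel_mbit_s(payload_bytes_per_frame):
    """Throughput of a synchronous channel granted a fixed slice
    of every frame."""
    return payload_bytes_per_frame * 8 * FRAMES_PER_S / 1e6

print(FRAMES_PER_S)          # 8000.0
print(channel_mbit_s(1024))  # 65.536 Mbit/s for 1 KiB per frame
```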


There are three types of connectors for FireWire:


4pin (IEEE 1394a, without power) is mounted on laptops
and camcorders. Two wires serve for the transmission of a
signal (information) and two wires are intended for reception.
6pin (IEEE 1394a): two additional wires for power.
9pin (IEEE 1394b): additional wires for reception and
transmission of information.

IEEE 1394
In late 1995, IEEE adopted the standard under the serial
number 1394. In Sony digital cameras, the interface IEEE
1394 appeared before the adoption of the standard and was
called i.LINK.
The interface was originally positioned for the transmission
of video streams. It proved a boon for external storage
manufacturers, providing high throughput for modern high-speed
drives. Today, many motherboards, as well as almost all
modern laptop models, support this interface.
Data transmission rate — 100, 200 and 400 Mbit/s, cable
length is up to 4.5 m.


IEEE 1394a
In 2000, the standard IEEE 1394a was approved. A series of
enhancements was made in order to improve the compatibility
of devices.
A waiting time of 1/3 second before bus reset was introduced,
so that the transient process of connecting or disconnecting
a device can settle into a reliable state.

IEEE 1394b
In 2002, the standard IEEE 1394b emerged with new speeds:
S800 — 800 Mbit/s and S1600 — 1600 Mbit/s. The maximum
cable length was also increased to 50, 70 and even 100
meters when using high-quality fiber-optic cables.
Corresponding devices are denoted FireWire 800 and
FireWire 1600, depending on the maximum speed.
Cables and connectors also changed. To achieve
maximum rates at maximum distances, the use of optics was
provided for: plastic optics for lengths up to 50 meters, glass
optics for lengths up to 100 meters.
Despite the change in connectors, the standards remained
compatible, which could be achieved by using adapters.
On December 12, 2007, the specification S3200 [1] with the
maximum speed of up to 3.2 Gbit/s was released.

IEEE 1394.1
In 2004, the standard IEEE 1394.1 was released. It was
adopted to enable building large-scale networks, and it
dramatically increased the maximum number of connected
devices — up to 64,449.


IEEE 1394c
Introduced in 2006, 1394c can be used with Cat 5e cable
from Ethernet. It is possible to use it in parallel with Gigabit
Ethernet, i.e. to run two logical, mutually independent
networks on a single cable. The maximum declared
length is 100 m; the maximum speed corresponds to S800 —
800 Mbit/s.

4.3. ATA (Advanced Technology Attachment)


A parallel interface for connecting storage devices (hard drives
and optical drives) to the computer. In the 1990s, it was the
de facto standard on the IBM PC platform; today it has
been superseded by its successor — SATA. Different versions
of ATA are known under the synonyms IDE, EIDE, UDMA,
ATAPI; with the advent of SATA it is also called PATA (Parallel
ATA).

To connect hard drives with the PATA interface, a 40-conductor
cable is commonly used (also called a loop, or ribbon cable).
Each loop usually has two or three connectors, one of which is
connected to the controller connector on the motherboard (in
older machines, this controller was mounted on a separate
expansion card), and one or two others are connected to the
disks. The PATA loop transmits 16 bits of data at a time.
Sometimes there are IDE loops allowing three drives to be
connected to one IDE channel, but in that case one of the
drives operates in read-only mode.
For a long time, the ATA loop contained 40 conductors, but with
the introduction of the mode Ultra DMA/66 (UDMA4), an
80-conductor version was released. All the additional conductors
are grounding conductors, alternating with the signal conductors.
This alternation of conductors reduces capacitive coupling
between them, thereby reducing crosstalk. Capacitive
coupling is a problem at high data rates, so this change was
necessary to ensure normal operation at the transmission
rate of 66 MB/s (megabytes per second) established by the
specification UDMA4. The faster modes UDMA5 and UDMA6
also require the 80-conductor cable.
Although the number of conductors has been doubled, the
number of contacts remains the same as well as the appearance of
connectors. Internal wiring is of course different. Connectors for
80-conductor cable must connect a large number of grounding
conductors to a small number of the grounding contacts, while
each conductor is attached to its contact in the 40-conductor
cable. 80-conductor cable connectors may typically be of
different colors (blue, gray and black), as opposed to the
40-conductor cables, where all connectors are usually of one
color (more often black).
ATA standard always sets the maximum cable length equal
to 46 cm. This limitation makes it difficult to attach devices in
large cases, or connect multiple drives to a single computer,
and almost completely eliminates the possibility of using
PATA drives as external drives. Although longer cables are
commercially widespread, you should note that they do not
meet the standard. The same can be said about «round» cables,
which are also widespread. The ATA standard describes only
flat cables with specific impedance and capacitive
reactance characteristics. Certainly, this does not mean that
other cables will not work, but, in any case, the use of
non-standard cables should be treated with caution.
If two devices are connected to a single loop, one of them
is usually called a Master and the other device is called a Slave.
Typically, the master goes before the slave in the list of disks
listed by BIOS of the computer or the operating system. In
older versions of BIOS (486 and earlier) drives were often
incorrectly marked with the following letters: «C» for the
master disk and «D» for the slave.
If a loop contains only one drive, in most cases it must
be configured as a master. Some disks (e.g. those produced
by Western Digital) have a special configuration referred to
as single (i.e. «one disk on the cable»). However, in most
cases the only drive on the cable can also operate as a slave
(this often occurs when connecting a CD-ROM on a single
channel).
The setup called cable select was described as optional in
the specification ATA-1 and became widespread starting
from ATA-5, because it eliminated the need to rearrange the
jumpers on the drives at every reconnection. If the drive is set
to the cable select mode, it is automatically set as a master or
a slave, depending on its position on the loop. For position
determination to be possible, the loop must support cable
select. In such a loop, the contact 28 (CSEL) is not
connected to one of the connectors (gray, usually the middle one).
The controller grounds that contact. If the drive sees that the
contact is grounded (i.e. logical 0 on it), it is set as a master,
otherwise (high impedance state), it is set as a slave.
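The cable-select decision described above reduces to one rule: a grounded CSEL contact (logical 0) means master, a high-impedance contact means slave. The sketch below illustrates the rule only; it is not driver code for any real controller.

```python
# Sketch: the cable-select rule for contact 28 (CSEL), as described
# above. A drive sampling a grounded contact becomes the master;
# a high-impedance (floating) contact makes it the slave.

def role_from_csel(csel_grounded: bool) -> str:
    return "master (device 0)" if csel_grounded else "slave (device 1)"

# On a cable-select loop the CSEL wire is cut to one connector,
# so the drive's position on the cable determines its role.
print(role_from_csel(True))   # master (device 0)
print(role_from_csel(False))  # slave (device 1)
```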
The 80-conductor cables designed for UDMA4 are free of these
shortcomings. The master is now always at the end of the loop,
so if only one device is connected, no unnecessary piece of
cable dangles. Cable select is «factory-made» — it is implemented
within the connector simply by omitting the contact.
As 80-conductor loops required their own connectors in any
case, their widespread application posed no big problem. The
standard also requires the use of connectors of different colors
for easy identification by both producers and assemblers:
the blue connector is intended for connection to the controller,
black — to the master, gray — to the slave.
The terms «master» and «slave» were borrowed from
industrial electronics (where this principle is widely used
for the interaction between nodes and devices), but they are
inaccurate in this case and therefore are not used in the current
version of the standard ATA. The master and slave drives should be
properly called device 0 and device 1, respectively. There is a
common myth that the master drive manages disk access to
the channel. In fact, the controller handles disk access control
and precedence (and is, in turn, controlled by the driver of
the operating system). That is, in fact, both devices are slaves
with respect to the controller.

Figure — 80-conductor ATA cables with cable select


4.4. SATA (Serial ATA)


The serial interface designed for data exchange with storage
media (usually hard drives). SATA is an enhanced interface
ATA (IDE), which was renamed PATA (Parallel ATA) after
the emergence of SATA.
SATA uses a 7-pin connector instead of the 40-pin
connector of PATA. The SATA cable has a smaller cross-section,
which reduces the resistance to the air blowing over the
computer components and thus improves cooling.
The SATA cable is more resistant to multiple connections due
to its shape. The SATA power cord is also designed with
multiple connections in mind. The SATA power connector
supplies 3 voltages: +12 V, +5 V and +3.3 V. However,
modern devices can work without the +3.3 V voltage, which
makes it possible to use a passive adapter from the standard
IDE power connector to SATA. A range of SATA devices comes
with two power connectors: SATA and Molex.
SATA standard abandoned the traditional PATA connection
of two devices per loop; each device is provided with a separate
cable that eliminates the problem of the impossibility of
simultaneous operation of devices connected on a single
cable (and the delays arising from this), reduces potential
problems during assembly (the problem of conflict Slave /
Master devices is absent in SATA), eliminates the possibility
of errors when using non-terminated PATA-loops.
SATA standard provides hot-swapping for devices and
function of the command queue (NCQ — Native Command
Queuing).
STEP Computer Academy

SATA/150
Initially the SATA standard provided bus operation at the
frequency of 1.5 GHz, giving a throughput of about 1.2
Gbit/s (150 MB/s). (The 20% loss of throughput is due to the use
of the coding system 8B/10B, where 2 service bits are necessary
for every 8 bits of useful information.) The throughput of
SATA/150 is slightly higher than that of the bus Ultra
ATA (UDMA/133). The main advantage of SATA over PATA is
the use of a serial bus instead of a parallel one. Although
the serial exchange method is inherently slower than the parallel
one, in this case it is compensated by the possibility of operating
at higher frequencies due to the greater noise immunity of the
cable. This is achieved by 1) fewer conductors and 2) combining
the information conductors into 2 twisted pairs, screened by
grounded shielding conductors.
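The 150 MB/s figure above follows directly from the 8B/10B overhead: only 8 of every 10 transmitted bits carry payload. A small sketch of the arithmetic:

```python
# Sketch: the SATA/150 throughput arithmetic above. 8b/10b coding
# carries 8 useful bits in every 10 transmitted, so a 1.5 GHz line
# (one bit per clock) yields 1.2 Gbit/s of payload, i.e. 150 MB/s.

LINE_RATE_HZ = 1.5e9

def payload_mb_s(line_rate_hz):
    useful_bits = line_rate_hz * 8 / 10  # strip the 8b/10b overhead
    return useful_bits / 8 / 1e6         # bits -> bytes -> MB

print(payload_mb_s(LINE_RATE_HZ))  # 150.0
print(payload_mb_s(3.0e9))         # 300.0, i.e. SATA/300
```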

SATA/300
The standard SATA/300 operates at the frequency of 3 GHz
and provides throughput of up to 2.4 Gbit/s (300 MB/s). It was
first implemented in the controller of the chipset nForce 4
produced by the company NVIDIA. Quite often, the standard
SATA/300 is called SATA II or SATA 2.0. [1] In theory, SATA/150
and SATA/300 devices must be compatible (both a SATA/300
controller with a SATA/150 device, and a SATA/150 controller
with a SATA/300 device) through support of rate matching
(downwards). However, some devices and controllers require
manual mode setting (for example, Seagate HDDs that support
SATA/300 provide a special jumper for forced activation of
the mode SATA/150).

80
Lesson 1

The SATA standard provides for the possibility of increasing
the operating speed up to 600 MB/s (6 GHz).

eSATA
eSATA (External SATA) is an interface for connecting external
devices that supports the «hot swapping» mode (Hot-plug).
It was created later than SATA (mid-2004).
Key features of eSATA:
■■ Connectors are less fragile and are designed for a larger
number of connections. Two leads are needed for
connection: a data cable and a power cable.
■■ The data-cable length is limited (about 2 meters).
■■ Average operational data rate is higher than that of IEEE
1394 or USB.
■■ The CPU load is significantly lower.


Figure — SATA power connector (IDE adapter); SATA data bus connector; an example of the SATA controller

4.5. PCI (Peripheral component interconnect)


I / O bus for connecting peripheral devices to the computer
motherboard.
The standard for the PCI bus specifies the following parameters:
■■ physical parameters (e.g. connectors and signal lines
distribution);
■■ electrical parameters (e.g. voltage);
■■ logic model (e.g. the types of bus cycles, bus addressing).
The organization PCI Special Interest Group is engaged in
the development of the standard.
From the user's perspective, PCI devices operate on the
«plug and play» principle. After the computer starts, the
system software examines the configuration space of each
PCI device connected to the bus and allocates resources. Each
device may require up to seven ranges in the PCI memory
address space or the PCI input-output address space. In addition,
devices may have a ROM containing executable code for
x86 or PA-RISC processors.

82
Lesson 1

Interrupt setup is also performed by the system software (unlike
the ISA bus, where interrupt setup was handled by switches
on the card). An interrupt request on the PCI
bus is transmitted by changing the signal level on
an IRQ line, so several devices can share a single
interrupt request line; the system software
usually tries to allocate a separate interrupt to each device
in order to increase performance.

Specification for the bus PCI:


■■ bus frequency — 33.33 MHz or 66.66 MHz,
synchronous transmission;
■■ bus width — 32 or 64 bits, multiplexed bus (address and
data are transmitted over the same lines);
■■ peak bandwidth for the 32-bit version operating at the
frequency of 33.33 MHz — 133 MB per second;
■■ memory address space — 32 bits (4 GB);
■■ address space of input-output ports — 32 bits (4 GB);
■■ configuration address space (for one function) — 256
bytes;
■■ voltage of 3.3 or 5 volts.
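Two figures in the list above can be verified by direct computation: 32-bit addressing gives a 4 GB space, and a 32-bit bus clocked at 33.33 MHz transfers about 133 MB per second.

```python
# Sketch: checking two figures from the PCI specification above —
# the address space implied by 32-bit addressing, and the peak
# bandwidth of the 32-bit, 33.33 MHz variant.

def address_space_bytes(address_bits):
    return 2 ** address_bits

def peak_bandwidth_mb_s(bus_bits, clock_hz):
    """One transfer per clock on a `bus_bits`-wide multiplexed bus."""
    return bus_bits / 8 * clock_hz / 1e6

print(address_space_bytes(32))                  # 4294967296, i.e. 4 GB
print(round(peak_bandwidth_mb_s(32, 33.33e6)))  # 133
```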

83
STEP Computer Academy

Figure — PCI bus and its modifications

4.6. PCI Express


PCI Express, or PCIe, or PCI-E (also known as 3GIO, for
3rd Generation I/O; not to be confused with PCI-X or
PXI) is a computer bus that uses the software model of the PCI
bus and a high-performance physical protocol based on
serial data transmission.
The organization PCI Special Interest Group is involved
in the PCI Express standard development process. Unlike
the PCI bus, which used a common bus to transmit data,
PCI Express is, in the general case, a packet network with a
star topology: PCI Express devices interact with each other
through a medium formed by switches, and each
device is directly connected to a switch by a
«point-to-point» connection.

84
Lesson 1

Furthermore, PCI Express bus supports:


■■ hot swapping for cards;
■■ guaranteed bandwidth (QoS);
■■ energy management;
■■ continuity testing of the transmitted data.
The bus PCI Express is aimed to be used only as a local
bus. Since the PCI Express software model is largely inherited
from the bus PCI, the existing systems and controllers can be
modified to use the bus PCI Express by replacing the physical
layer only, without complete software redesign. High peak
performance of PCI Express permits to use it instead of the
buses AGP and especially PCI and PCI-X; it is expected that
PCI Express will replace these buses in personal computers.
To connect PCI Express devices, a bidirectional serial
«point-to-point» connection called a lane is used; this contrasts
with PCI, in which all devices are connected to a common
32-bit parallel bus.
The connection between two PCI Express devices is called
a link and is composed of one (called 1x) or multiple (2x, 4x,
8x, 12x, 16x and 32x) bidirectional serial lane connections.
Each device must support at least the connection 1x.
At the electrical layer, each connection uses low-voltage
differential signaling (LVDS); reception and transmission of
information are performed by each PCI Express device over two
separate pairs of conductors. Thus, in the simplest case, a device
is connected to the PCI Express switch with just four conductors.
Using this approach has the following advantages:

85
STEP Computer Academy

■■ a PCI Express card fits and operates correctly in any
slot of the same or higher lane count (e.g. an x1 card will
operate in x4 and x16 slots);
■■ a slot of a larger physical size need not have all its lanes
wired (e.g. lanes corresponding to x1 or x8 can be wired to an
x16 slot, and all this will function normally; however, all the
«power supply» and «ground» lines required for the x16 slot
must be connected).
In both cases, the PCI Express link will use the maximum
number of lanes available both to the card and to the
slot. However, this does not allow a device to work in a slot
intended for cards with a lower lane count (e.g. an x4 card
does not physically fit an x1 slot, despite the
fact that it could work in an x1 slot using only one lane).
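The physical and link-width rules above can be condensed into two small functions. This is an illustration of the compatibility rule, not a model of real link training.

```python
# Sketch: the PCI Express card/slot compatibility rule above —
# a card fits any slot of equal or greater lane width, and the
# link then uses the smaller of the widths actually wired.

def card_fits_slot(card_lanes: int, slot_lanes: int) -> bool:
    return card_lanes <= slot_lanes

def negotiated_lanes(card_lanes: int, slot_wired_lanes: int) -> int:
    """Lanes actually used once the link is established."""
    return min(card_lanes, slot_wired_lanes)

print(card_fits_slot(1, 16))    # True: x1 card in an x16 slot
print(card_fits_slot(4, 1))     # False: x4 card cannot enter an x1 slot
print(negotiated_lanes(16, 8))  # 8: x16 slot wired with only 8 lanes
```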
PCI Express transmits all control data, including interrupts,
over the same lines that are used for data transmission. The serial
protocol can never be blocked, so the latency of the PCI Express
bus is fully comparable with the latency of the PCI bus (note
that the PCI bus uses separate physical lines IRQ#A, IRQ#B,
IRQ#C, IRQ#D to transmit interrupt request signals).
As in all high-end serial protocols (for example, Gigabit Ethernet),
timing information is embedded in the transmitted signal.
At the physical level, PCI Express uses the coding technique
8B/10B that has become conventional (8 bits of data are
replaced by 10 bits transmitted over the channel, so 20%
of the traffic transmitted over the channel is redundant),
which raises the noise immunity.

86
Lesson 1

Throughput             x1    x2    x4    x8    x12   x16   x32
PCI Express 1.0, GB/s  0.5   1     2     4     6     8     16
PCI Express 2.0, GB/s  1     2     4     8     12    16    32
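The table's values scale linearly with the lane count, so they can be reproduced from a single per-lane figure per generation. The per-lane constants below are taken from the x1 column of the table (both directions combined).

```python
# Sketch: reproducing the PCI Express throughput table above.
# Per-lane figures follow the table's x1 column: 0.5 for the first
# generation and 1 for the second, counting both directions.

PER_LANE_GB_S = {"1.0": 0.5, "2.0": 1.0}

def link_gb_s(generation, lanes):
    return PER_LANE_GB_S[generation] * lanes

for lanes in (1, 2, 4, 8, 12, 16, 32):
    print(lanes, link_gb_s("1.0", lanes), link_gb_s("2.0", lanes))
# x16 gives 8 on PCI Express 1.0 and 16 on PCI Express 2.0
```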

Figure — Slots PCI Express x4, x16, x1, again x16, and a standard
32-bit PCI slot at the bottom, on the motherboard DFI LanParty
nForce4 SLI-DR

87
STEP Computer Academy

4.7. System resources


System resources are communication channels, addresses
and signals used by the computer nodes to exchange data with
the help of buses. Typically, the system resources include:
■■ memory addresses;
■■ interrupt request channels (IRQ);
■■ direct memory access channels (DMA);
■■ input-output ports addresses.
An interrupt is a signal notifying the processor of the occurrence
of an asynchronous event. At that, execution of the current
command sequence is suspended and control is passed to an
interrupt handler, which performs the job of handling the event
and returns control to the interrupted code.
Types of interrupts:
■■ Hardware (IRQ — Interrupt Request) — events from
peripheral devices (such as keystrokes, mouse movement, a
signal from the timer, network card or hard drive) — external
interrupts, or events in the microprocessor — (for example,
division by zero) — internal interrupts;
■■ Software — initiated by the executing program, that is,
synchronously rather than asynchronously. Software interrupts
can be used to call operating system services.
Interrupt handlers are usually written so that their processing
time is as short as possible.
Before the end of the interrupt handling, processing or
generation of other interrupts is disabled. Some processors
support a hierarchy of interrupts, allowing interrupts of a
higher priority to be called while processing less important
interrupts.

88
Lesson 1

An interrupt vector is a memory cell containing the address
of the interrupt handler. Interrupt interception is the
replacement of the interrupt handler with one's own.
Interrupt vectors are combined into the interrupt vector
table. Location of the table depends on the type and mode of
operation of the microprocessor.
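The vector table and interception mechanism described above can be sketched as a simple dispatch map. Vector numbers and handlers below are hypothetical examples, not real hardware assignments.

```python
# Sketch: dispatch through an interrupt vector table, as described
# above. The vector numbers and handlers are hypothetical.

vector_table = {}  # vector number -> handler (stands in for an address)

def register_handler(vector, handler):
    """Interrupt interception: replace the entry with our own handler."""
    vector_table[vector] = handler

def raise_interrupt(vector):
    """Suspend the 'current code', run the handler, return control."""
    handler = vector_table[vector]
    return handler()

register_handler(0x09, lambda: "keyboard handled")
register_handler(0x00, lambda: "divide-by-zero handled")

print(raise_interrupt(0x09))  # keyboard handled
print(raise_interrupt(0x00))  # divide-by-zero handled
```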
Direct Memory Access (DMA) is a mode of data exchange
without the participation of the CPU. As a result, the transmission
rate is increased because the data are not forwarded to the
CPU and back.
It is available only if there is a hardware DMA controller.
The DMA controller may access the system bus
independently of the CPU. The controller comprises a number
of registers available to the CPU for reading and writing. The
controller registers set the port (to be used), the direction of data
transfer (read / write), the transfer unit (byte by byte / word by
word), and the number of bytes to be transferred.
The CPU programs the DMA controller by setting its registers.
The processor instructs the device (e.g. a disk) to read data
into its internal buffer. The DMA controller starts operating
by sending the device a read request (the device does not
know whether the request comes from the processor or from the
DMA controller). The memory address is already on the address
bus, so the device knows where to send the next word
from its internal buffer. When the write is completed, the
device sends an acknowledgment signal to the DMA controller.
The controller then increments the memory address being used and
decrements its byte counter. The read request is then repeated
until the counter value becomes zero. At the end
of the copying cycle, the DMA controller initiates a processor
interrupt, signifying the end of the data transmission. The
controller can be multichannel, capable of performing multiple
operations simultaneously.
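The copy cycle described above — increment the address, decrement the byte counter, repeat until zero, then raise an interrupt — can be modeled in software. The register names and memory model below are illustrative, not the programming interface of any real DMA controller.

```python
# Sketch: the DMA copy cycle described above, modeled in software.
# The address/counter behavior mirrors the text; everything else
# (buffers as lists, strings as interrupts) is illustrative.

def dma_transfer(device_buffer, memory, start_address, count):
    """Copy `count` bytes from the device buffer into `memory`,
    incrementing the address and decrementing the byte counter,
    then 'raise' a completion interrupt."""
    address = start_address
    remaining = count
    while remaining > 0:                   # repeated read requests
        memory[address] = device_buffer[count - remaining]
        address += 1                       # controller bumps the address
        remaining -= 1                     # ...and counts down the bytes
    return "interrupt: transfer complete"  # end-of-copy interrupt

memory = [0] * 16
status = dma_transfer([0xDE, 0xAD, 0xBE, 0xEF], memory, 4, 4)
print(memory[4:8])  # [222, 173, 190, 239]
print(status)       # interrupt: transfer complete
```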

4.8. Audio Codec Bus — Audio Codec (AC) Link


This feature of the chipset is designed for transmitting a mixed
(analog or digital) signal between the chipset and devices built
into the motherboard, such as an audio card, or network
devices, i.e. a modem or a network card. To ease placement on
the motherboard and to reduce their cost, all these devices
are not as fully functional as their «normal» counterparts: they
are software-controlled, i.e. the CPU and the memory take over
part of their functions. The AC bus was developed by Intel in
order to facilitate the introduction of such software control.
This is why some users disable all built-in functions to offload
the CPU and the memory.
The modern version of the bus, AC'97 2.2, provides a signal
interface for device communication, including 5.1 audio. As for
sound, the bus can be connected to a chip that includes a codec
(coder / decoder) and digital-to-analog and analog-to-digital
converters for connecting the chip to speakers or
headphones and to the line and microphone inputs.
As for telephony, the AC bus also has a physical interface
(PHY) for connection to a telephone line. AC'97 chips can also
be used for communication with embedded network cards.
Using the AC bus entails a number of considerations for the system
designer. The most significant of them is the risk of sharp
performance degradation when using the built-in software
devices, which, as already mentioned, load the CPU. Performance
might be degraded in complex, resource-consuming applications.
This is especially true for sound chips: most computer
games give the sound card not a single sound
but a great number of sounds at once, and this multitude must
be transformed in a certain way and then issued to the final
sound output device (speakers, headphones). Most often,
the sounds have different sample rates,
depending on the desired sound quality. The audio processor
must handle all these sounds, mix them and send the result to
the sound card outputs.
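The mixing job described above — combining sounds with different sample rates into one output stream — can be sketched as follows. Nearest-neighbor resampling is used purely for illustration; real audio processors use proper interpolation and filtering.

```python
# Sketch: mixing two sample streams with different sample rates,
# as the audio processor described above must do. The naive
# nearest-neighbor resampling is for illustration only.

def resample(samples, src_rate, dst_rate):
    """Naive nearest-neighbor resampling to the output rate."""
    n_out = int(len(samples) * dst_rate / src_rate)
    return [samples[int(i * src_rate / dst_rate)] for i in range(n_out)]

def mix(streams_with_rates, out_rate):
    """Resample every stream to `out_rate`, then sum sample-wise."""
    resampled = [resample(s, r, out_rate) for s, r in streams_with_rates]
    length = min(len(s) for s in resampled)
    return [sum(s[i] for s in resampled) for i in range(length)]

a = [1, 1, 1, 1]              # e.g. an effect recorded at 22.05 kHz
b = [2, 2, 2, 2, 2, 2, 2, 2]  # e.g. a music track at 44.1 kHz
print(mix([(a, 22050), (b, 44100)], 44100))  # eight samples of 3
```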
Another example of sound processing is the computation of
complex HRTF («head-related transfer functions»), which
create 3D-positioned sound. These functions require a PCI
card with a DSP processor, as well as some amount of RAM;
however, such a processor is usually not mounted on the
motherboard. Although such a solution would unload the CPU,
it is not a good option for motherboard manufacturers,
because the design is quite complicated. However,
the obtained sound is of rather high quality: users would not
sense the difference between the sound of a good PCI card
and a built-in high-quality audio solution, especially when using
the same acoustics.
Of course, the southbridge can support a much larger number
of embedded devices, but they heavily load the CPU.
For example, software modems that are either integrated into
the motherboard or placed on a PCI card are very popular
among manufacturers today. Certainly, they load the CPU
and RAM quite a lot, as they do not have their own means of
signal processing. The only advantage of such solutions is a
very low cost. However, use of such devices is quite justified
and normally perceived both by producers and consumers,
given the existing computing power of processors.
Nowadays the technology HDA, or High Definition Audio,
is widely implemented: it is a new standard that supports
32-bit audio with a sampling frequency of up to 192 kHz and
the surround sound formats 5.1 and 7.1. Audio processing
functions are assigned to the southbridge of the chipset and
the CPU. The built-in audio controller HDA provides better
audio quality compared with AC'97.
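The raw data rate implied by the HDA figures above follows from multiplying the sample rate, the sample width and the channel count:

```python
# Sketch: raw PCM data rate implied by the HDA figures above
# (32-bit samples at up to 192 kHz; 7.1 means 8 channels).

def pcm_mbit_s(sample_rate_hz, bits_per_sample, channels):
    return sample_rate_hz * bits_per_sample * channels / 1e6

print(pcm_mbit_s(192_000, 32, 8))  # 49.152 Mbit/s for 7.1 at 192 kHz
print(pcm_mbit_s(192_000, 32, 6))  # 36.864 Mbit/s for 5.1
```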


5. Form factor of motherboards


Today, there are five predominant types of sizes of
motherboards — AT, ATX, BTX, LPX and NLX. In addition,
there are smaller versions of the format AT (Baby-AT), ATX
(Mini-ATX, microATX) and NLX (microNLX). Moreover,
the extension for the specification microATX, which adds a
new form factor FlexATX to this list, has been released. All
these specifications, defining the shape and dimensions of
motherboards as well as the location of the components placed
on them and features of cases, are described below.

5.1. AT
The form factor AT is divided into two modifications of
different size — AT and Baby AT. A full-size AT motherboard
is up to 12" wide, which means that such a motherboard is
unlikely to fit most of today's cases. Assembly of such a board
is also complicated by the drive bay (hard drive included) and
the power supply. In addition, placement of the board components
at a large distance from each other can cause problems when
operating at high frequencies. Therefore, this size became
uncommon after the release of motherboards for the 386 CPU.
Thus, the only motherboards made in the form factor AT
still available are those of the smaller format, Baby AT. A Baby
AT board is 8.5" wide and 13" long. In principle, some
manufacturers may reduce the length of the board to save
material or for other reasons. For mounting, the board is
provided with three rows of holes.

All AT boards have some features in common. Almost all
of them have serial and parallel ports attached to
the motherboard via connecting plates. They also have a keyboard
connector soldered at the rear of the board. The socket for
the processor is installed at the front of the board. SIMM
and DIMM slots are located in different places, although they
are almost always in the upper part of the motherboard.
Today, this format has long since left the scene, especially
because more and more of the new capabilities offered
by operating systems are implemented only on ATX
motherboards. To say nothing of usability: all connectors
are often located in one place on the motherboard, so the cables
from the communication ports stretch across practically the
entire motherboard to the rear of the case, and from the IDE
and FDD ports — to the front. Slots for
the memory modules run into the power supply. To put it
mildly, this is not convenient given the very limited freedom of
action inside a MiniTower. In addition, the cooling issue
is poorly resolved — air does not flow directly
to the part most in need of cooling, the processor.


Figure — Baby AT Motherboard. Figure — Form Factor AT Motherboard

5.2. LPX
Even before ATX, the first attempts to reduce the cost of the PC
resulted in the creation of the form factor LPX. It was intended
for use in Slimline or Low-profile cases. The problem was solved
by a rather innovative approach — the introduction of a riser
card. Instead of inserting expansion cards directly into the
motherboard, they are placed in an upright riser connected to
the board, so that the cards sit parallel to the motherboard. This
allowed a significant reduction of the case height, as the height
of expansion cards usually dictates this parameter. The price
of such compactness was the maximum number of connected
cards — 2–3 pieces. Another innovation, which LPX boards
began to use widely, is a video chip integrated on the
motherboard. The size of the case for LPX is 9 x 13'', for
Mini LPX — 8 x 10''.
After the appearance of NLX, it began to replace LPX
gradually.


Figure — LPX form factor motherboard

5.3. ATX
It is no surprise that the ATX form factor has become so
popular in all its versions, and deservedly so. The specification
ATX, proposed by Intel in 1995, was aimed precisely at correcting
all the shortcomings that had emerged in the form factor AT over
time. The solution, in fact, was very simple — to turn the Baby
AT board 90 degrees and make the required enhancements
to the design. By that time, Intel already had experience
in this area — the form factor LPX. ATX embodies
the best of both Baby AT and LPX: extensibility is
taken from Baby AT, and high integration of components
from LPX. Here is what resulted:


Integrated input-output port connectors. In all modern motherboards the input-output controllers are present on the board, so placing their connectors on it as well looks quite natural and significantly reduces the number of cables inside the case. In addition, alongside the traditional parallel and serial ports and the keyboard connector, room was found for newcomers: PS/2 and USB ports. Moreover, the cost of the motherboard was lowered thanks to the reduced number of bundled cables.
Much easier access to the memory modules. As a result of the layout changes, the memory slots moved away from the expansion slots, the CPU, and the power supply. Increasing memory capacity became a matter of minutes, whereas with Baby AT motherboards one often had to take up a screwdriver.
Reduced distance between the board and the drives. The IDE and FDD controller connectors moved almost right next to the devices they serve. This shortens the cables used and thereby increases the reliability of the system.
The processor and the expansion slots are kept apart. The processor socket was moved from the front part of the motherboard to the rear, next to the power supply. This enhancement permits installing full-length boards in the expansion slots: they no longer interfere with the processor. Moreover, the cooling problem was addressed, since air drawn in by the power supply unit now blows directly over the processor.
Improved interaction with the power supply. A single 20-pin connector is used instead of the two connectors of the AT motherboards. The motherboard can now control the power supply: switching on at the right time or upon the occurrence of certain events, powering on from the keyboard, shutting down via the operating system, and so on.
Voltage 3.3 V. The 3.3 V supply voltage widely used by modern system components (for example, PCI cards) now comes directly from the power supply. AT boards had to derive this voltage with a stabilizer mounted on the motherboard; ATX boards no longer need it.
The specific dimensions of the motherboard are set in the specification largely for the convenience of manufacturers: a standard panel (24 x 18'') yields either two ATX boards (12 x 9.6'') or four Mini-ATX boards (11.2 x 8.2''). Compatibility with older cases was also taken into account: the maximum width of an ATX board, 12'', is practically identical to the length of AT boards, so that an ATX board can be used in an AT case without much effort. Today, however, this is purely theoretical, as AT cases have become a rarity. The mounting holes of the ATX motherboard are fully compatible with the AT and Baby AT formats.
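The panel arithmetic quoted above is easy to check. A minimal sketch (the dimensions in inches are the figures given in the text; the simple grid layout with no board rotation is an assumption):

```python
# How many boards of a given size fit on a panel when laid out
# in a plain grid, without rotating the boards (inches).
def boards_per_panel(panel_w, panel_h, board_w, board_h):
    return int(panel_w // board_w) * int(panel_h // board_h)

# A standard 24 x 18'' panel yields two ATX boards (12 x 9.6'') ...
print(boards_per_panel(24, 18, 12, 9.6))    # 2
# ... or four Mini-ATX boards (11.2 x 8.2'').
print(boards_per_panel(24, 18, 11.2, 8.2))  # 4
```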

microATX
The ATX form factor was designed in the heyday of Socket 7 systems, which explains why much of it no longer matches current trends. For example, the typical slot combination on which the specification was based looked like 3 ISA / 3 PCI / 1 shared. It sounds quite irrelevant today, doesn't it? ISA, no AGP, no AMR, and so on. In any case, 99% of systems never use all 7 slots, especially today; in a cheap ATX PC this is simply a waste of resources. Based on such considerations, the microATX specification, a modification of the ATX motherboard designed for 4 expansion card slots, was presented in December 1997.
In fact, the changes compared to ATX were minimal. The board size was reduced to 9.6 x 9.6'', making it completely square, and the power supply was shrunk as well. The I/O connector block remained unchanged, so a microATX motherboard could be used in an ATX 2.01 case with minimal modifications.

Figure — ATX form factor motherboard

5.4. BTX
The abbreviation BTX (Balanced Technology Extended) denotes a new standard that brings numerous important enhancements to the case and its components. The main objective of BTX development was to create a PC with more efficient cooling and maximally quiet operation. BTX divides the PC case into zones; only strictly defined components may be placed within each zone. For example, the CPU socket can only be located in a strictly defined area. Where the ATX standard gave manufacturers quite a lot of freedom, BTX requires them to adhere strictly to the designated zones.
Visually, a BTX case resembles a mirror image of ATX: the expansion slots and the I/O panel swap sides. Among other things, this layout makes it possible to blow the air coming off the CPU cooler over a PCI Express graphics card.

Figure — Differences between ATX and BTX.

The biggest changes in BTX concern the path of the airflow, from its intake to its exhaust at the rear of the case. Fresh air is drawn in by a 120-mm fan at the front panel, after which it usually passes through an air duct. The air flows through the CPU cooler and is exhausted through the rear panel.


The air duct directs air through the CPU cooler (thermal module).

Low-profile version of a Coolermaster cooler.


The SRM fixes the CPU cooler (thermal module) in place.

The name «thermal module» denotes a special CPU cooler with new dimensions and a new area of responsibility. Installing the thermal module requires an SRM (Support and Retention Module), which is fastened to the case underneath the motherboard. The SRM also gives the motherboard additional rigidity and stability.
Today the market mainly offers ATX cases that support conversion to BTX; «native» BTX cases are very rare. One of the first examples is a case manufactured by the Taiwanese company Yeong Yang.


Figure — The internal structure of BTX. The blue air duct is hard to miss.

Figure — A fully equipped BTX PC.

Today the market offers a completely new concept: cases that can easily be converted from the ATX form factor to BTX. It is not yet clear how soon the BTX form factor will spread in the desktop PC market, especially considering that motherboard manufacturers are in no hurry to release new BTX products.


5.5. NLX
With time, the LPX specification, like Baby AT before it, ceased to meet the requirements of the day. New processors and new technologies appeared, and LPX could no longer provide acceptable spatial and thermal conditions for new low-profile systems. As a result, just as ATX came to replace Baby AT, in 1997 the NLX specification appeared as a development of the LPX ideas that took the new technologies into account. The format is aimed at low-profile cases. Its development considered both technical factors (for example, the appearance of AGP and DIMM modules and the integration of audio and video components on the motherboard) and the need for greater ease of maintenance. Indeed, many systems based on this form factor can be assembled and disassembled without a screwdriver at all.


As seen in the diagram, the main characteristics of an NLX motherboard are as follows:
A riser card for the expansion cards, mounted on the right edge of the board.
The motherboard detaches freely from the riser and slides out of the case, for example, to change the processor or the memory.
The processor is installed in the top left corner of the board, directly opposite the cooler.
Tall components such as the processor and memory are grouped at the left end of the board so that full-size expansion cards can be placed on the riser.
Input-output blocks at the far end of the board, of single height (in the area of the expansion cards) and double height, to fit the maximum number of connectors.
In general, the riser card is a very interesting thing. In essence, it is a single motherboard divided into two parts: one part carries the system components, while the other, connected to it at a 90-degree angle via a 340-pin connector, carries the various input-output facilities: expansion cards, port connectors, drive connectors, and the power feed. Thus, first, maintenance becomes more convenient, as there is no need to reach components that are not needed at the moment. Second, manufacturers gain great flexibility: a single model of the main board is designed, while the riser is tailored individually, for a specific customer, with the required components integrated.


Does this description sound familiar? A riser mounted on the motherboard, carrying some of the input-output components instead of integrating them directly on the motherboard, all to simplify maintenance, give manufacturers more flexibility, and so on? Correct: the AMR specification, describing the same ideology for ATX boards, appeared soon after the NLX specification.
In contrast to other, rather rigid specifications, NLX gives manufacturers much more freedom in decision making. The size of an NLX motherboard ranges from 8 x 10'' to 9 x 13.6''. An NLX case must be able to accommodate both of these formats, as well as intermediate ones. Typically, boards at the minimum dimensions are referred to as Mini NLX. One more interesting detail: an NLX case has its USB ports on the front panel, which is quite convenient for identification devices of the e.Token type.
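The size range above can be captured in a short sketch (the limits are the figures quoted in the text; the helper function and its labels are illustrative, not part of the specification):

```python
# Classify a board against the NLX size range quoted above (inches):
# from 8 x 10 (the minimum, usually called Mini NLX) up to 9 x 13.6.
def classify_nlx(width, depth):
    if (width, depth) == (8, 10):
        return "Mini NLX"
    if 8 <= width <= 9 and 10 <= depth <= 13.6:
        return "NLX"
    return "outside the NLX range"

print(classify_nlx(8, 10))     # Mini NLX
print(classify_nlx(9, 13.6))   # NLX (maximum size)
print(classify_nlx(12, 9.6))   # outside the NLX range (an ATX board)
```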


The only thing to add is that the specification requires certain places on the board to be left free, providing room for functions that will appear in future versions of the specification, for example, motherboards for servers and workstations based on the NLX form factor.


5.6. WTX
Powerful workstations and servers, on the other hand, are also not fully served by the AT and ATX specifications. They face problems of a different kind, where price does not play the leading role. Issues such as proper cooling, room for large amounts of memory, convenient support for multiprocessor configurations, high power supply capacity, and space for more storage controller ports and I/O ports come to the forefront. In 1998, the WTX specification was presented. It was oriented toward dual-processor motherboards of any configuration, as well as toward current and future video card and memory technologies.
Particular attention should be paid to two new components: the Board Adapter Plate (BAP) and the Flex Slot.
In this specification, the developers attempted to move away from the conventional model in which the motherboard is mounted to the case via mounting holes in fixed locations. Instead, the board is attached to a BAP, with the manner of fastening left to the motherboard manufacturer's discretion; the standardized BAP is in turn attached to the case.
Besides the usual items such as board size (14 x 16.75'') and power supply characteristics (up to 850 W), the WTX specification describes the Flex Slot architecture, in a sense an AMR for workstations. Flex Slot is designed to improve ease of maintenance, give developers more flexibility, and shorten a motherboard's time to market. A Flex Slot card looks like this:
Such cards can carry any PCI, SCSI, or IEEE 1394 controllers, sound, network interfaces, parallel and serial ports, USB, and facilities for monitoring the state of the system.

In its time, the specification enjoyed support from motherboard manufacturers such as Acer, Asus, and Tyan. The specification was discontinued in 2008.

Figure — WTX form factor motherboard as compared to ATX


5.7. FlexATX
Finally, the FlexATX form factor appeared, in much the same way as the ideas embodied in Baby AT and LPX gave rise to ATX: as a development of the microATX and NLX specifications. It is not even a separate specification, but a supplement to the microATX specification. Looking at the success of the iMac, which in fact offered nothing new except its appearance, PC producers decided to go the same route. In February at the Intel Developer Forum, Intel was the first to present FlexATX, a motherboard whose area is 25-30% smaller than that of microATX.
In theory, a FlexATX motherboard can be used, with some modifications, in cases complying with ATX 2.03 or microATX 1.0. However, there are already enough boards for today's cases; the point here is the fanciful plastic designs that demand such compactness. At the Intel Developer Forum, Intel also demonstrated several case options. The designers' imagination ran free: vases, pyramids, trees, spirals, a wide range of different designs was proposed. Here are a few phrases from the specification to deepen the impression: «aesthetic value», «greater satisfaction from system ownership». Quite a description for a PC motherboard form factor, isn't it?


Flex means flexible. The specification is extremely flexible and leaves to the manufacturer many things that were previously strictly prescribed. Thus, the manufacturer determines the size and placement of the power supply, the construction of the I/O cards, and the ways of moving to new processor technologies while keeping a low-profile design. In practice, only the size, 9 x 7.5'', is defined more or less clearly. By the way, as for new processor technology: at IDF Intel showed a system on a FlexATX motherboard with a Pentium III that was claimed to be Slot 1, but look at the picture and judge for yourself. The specification stresses that FlexATX boards are intended for Socket processors only...
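To pull together the board dimensions scattered through this section, here is a small summary sketch (all figures in inches are the ones quoted in the text; the area comparison is added only for illustration):

```python
# Board dimensions (width x depth, inches) quoted in this lesson.
FORM_FACTORS = {
    "ATX":      (12.0, 9.6),
    "Mini-ATX": (11.2, 8.2),
    "microATX": (9.6, 9.6),
    "FlexATX":  (9.0, 7.5),
    "WTX":      (14.0, 16.75),
}

# List the form factors from largest to smallest board area.
for name, (w, d) in sorted(FORM_FACTORS.items(),
                           key=lambda kv: kv[1][0] * kv[1][1],
                           reverse=True):
    print(f"{name:9s} {w:5.2f} x {d:5.2f}  ({w * d:6.2f} sq in)")

# FlexATX really is about 25-30% smaller than microATX, as claimed:
flex = 9.0 * 7.5
micro = 9.6 * 9.6
print(f"FlexATX is {(1 - flex / micro) * 100:.0f}% smaller than microATX")
```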

112
Lesson 1

Figure — FlexATX case

