Computer Architecture

Basic Structure of
Computers
1
Information Handled by a Computer
 Program:
 A program is a sequence of machine instructions, located and executing from the
memory. Each instruction is a assembly or machine language instruction.
 Two types of information handled by a program: instruction and data.
 Instruction
 Govern the transfer of information within a computer as well as between the
computer and its I/O devices
 Specify the arithmetic and logic operations to be performed
 Data
 Used as operands of the instructions
 Source program is itself some data/information, which is input by the user or the
programmer
 Any instruction or data encoded in binary code: 0 and 1
2
Basic Functional Units
Arithmetic and
Input Logic Unit
(ALU)
Memory
Control Unit
Output (CU)
I/O Devices Processor
Basic Functional Units Of A Computer

3
Basic Functional Units of Computer
 A computer consists of five functionally independent main parts.

 Input unit
 Memory unit
 Arithmetic and logic unit
 Output unit
 Control unit
4
Input Unit
 Computer accepts coded information through input

units which read the data
 Keyboard is used to take input of a letter, digit or
special symbol.
 Joysticks, mouse are used as graphic input device.
 Microphone can be used to capture audio input.
5
Memory Unit
 Store programs and data
 Two types of memory : primary and secondary
 Primary memory / main memory (fast, but costly, so smaller capacity)
 Fast - memory access time in microseconds
 Programs must be stored in memory while they are being executed
 Large number of semiconductor storage cells each capable of storing one bit of
information.
 Memory is processed (read/write) in words. Usually, 1 word = 4 bytes = 32 bits; this is
called 32 bit system. But, in a 64 bit system, 1 word = 64 bits = 8 bytes
 Each byte has an address – called “byte addressable” system
 Example : RAM(Random Access Memory)
 Secondary memory (large, cheapest, but slowest)
 Very slow - access time in milliseconds (which is 1000 times slower than microseconds
access time of RAM, or million times slower than nano-second access time of cache
memory)
 Example : magnetic and optical disks
 Memory hierarchy– cache memory (fastest, smallest, most expensive),
primary memory (RAM) and secondary memory
6
Arithmetic and Logic Unit (ALU)
 ALU: Most computer operations (i.e., Instructions) are executed in
ALU of the processor.
 Load the program into memory
 Bring the instructions & operands to the processor
 Perform operation in ALU
 Store the result back to memory or retain in the processor register.
 Registers: Special places/logical units inside the CPU to hold data or

values.
 Temporarily holds data values during any arithmetic and logical operation
executing inside the CPU.
 Each register can hold 1 word of data
 Registers are extremely fast with nanosecond access time (read/write
time), compared to the microsecond access time of the main memory
(i.e., RAM) or millisecond access time of magnetic or optical storage (i.e.,
Hard disks or CD/DVD)
7
Output Unit
 Output unit is the counterpart of input unit

 It sends processed information to the outside world
 Example: printer
8
Control Unit (CU)
 All computer operations are controlled by the CU

 Performs a sequence of micro-instructions to implement some
meaningful task or sub-task.
 For example, multiplication/division by a sequence of shift and
add/sub.
 The timing signals that govern the I/O transfers, data transfer
between the processor and the memory are also generated by the
control unit.
 Control unit is usually distributed throughout the machine instead

of standing alone.
 Fast control of ALU by the control unit of CPU
9
Operations of a Computer
 Accept information in the form of programs and data through

an input unit and store it in the memory.
 Fetch the information stored in the memory, under program
control, into the ALU, where the info is processed.
 Output the processed information through an output unit.
 Control all activities inside the machine through the CU.
10
A Typical Instruction
 MOV LOCA, R0
 General format :
 Instruction = operation destination_operand, source_operand
 Moves the operand of a register R0 in the processor to a memory
location LOCA.
 The original contents of R0 are preserved.
 The original contents of LOCA is overwritten.
11
Load and Store Instruction
 Instruction that moves data from memory to register is

called load instruction (i.e. MOV R0, LOCA)
 Instruction that moves data from register to memory is

called store instruction (i.e. MOV LOCA, R0)
12
Another Typical Instruction
 Add LOCA, R0
 General format:
 Instruction = operation destination_operand, source_operand
 Add the operand in a register R0 in the processor with the operand
at memory location LOCA.
 Place the sum into the memory location LOCA.
 The original contents of R0 are preserved.
 The original contents of LOCA is overwritten.
 Instruction is fetched from the register into the memory. The
operand at R0 is fetched and added to the contents of LOCA. The
resulting sum is stored in memory location LOCA
13
Connections Between the Processor
and the Memory
Memory
MAR MDR
Control
Unit
PC R0 Processor
R1
.
. ALU
IR
Rn-1
N general purpose registers
Connections Between The Processors And The Memory 14

Examples of a Few Registers:
 Instruction Register (IR): Holds the instruction that is currently
executing by the CPU
 Program Counter Register (PC): Points to (i.e., Holds the address of) the
next instruction that will be fetched from the memory to be executed
by the CPU
 General-purpose Registers (R0 – Rn-1): Generally holds the operands for
executing the instructions of current program
 Memory Address Register (MAR): Holds the memory address to be read. A
read signal from the CPU to the memory module reads the word address
held by the MAR register
 Memory Data Register (MDR): Facilitates the transfer of operands/data
to/from memory from/to the CPU
15
Basic Operating Steps for Executing a
Program
 Programs reside in the main memory (RAM) through input devices
 PC register’s value is set to the first instruction
Repeat the following steps until the “END” instruction is executed

 The contents of PC are transferred to MAR
 A read signal is sent by CU to the memory
 The memory module reads out the location addressed by MAR
register.
 The contents of that location is loaded into (returned by) MDR.
 The contents of MDR are transferred to IR register
 Decode and execute the instruction at IR
 PC is incremented properly to point to the next instruction
16
Decoding and Execution a Typical
Instruction
 Get operands for ALU
 The operand may already in a general-purpose register
 or, may be fetched from memory (send address to MAR - send read signal to
memory module - wait for MFC signal (WMFC) from memory - get the
operand/data from MDR)
 Perform operation in ALU
 Store the result back
 Store in a general-purpose register
 or, store into memory (send the write address to MAR, and send result to
MDR – write signal to memory – WMFC)
 Meanwhile, PC is incremented to the next instruction
17
Interrupt
 Normal execution of programs may be preempted if some device

requires urgent servicing. For example, a key pressed in keyboard,
data arrived in modem, printer is ready.
 The normal execution of the current program must be interrupted if
the device raises an interrupt signal.
 It is a request from an I/O device for service by the processor.
 The processor provides the required service by executing an
appropriate interrupt-service routine
 Current system information backup and restore (PC, general-purpose
registers, control information, specific information)
18
Bus Structures
 There are many ways to connect different parts inside a computer
together.
 A group of lines/wires that serves as a connecting path for several
devices is called a bus.
 In addition to the lines that carry the data the bus must have lines for
address and control purposes.
 For example, set of connection lines of the motherboard between CPU
and ram, hard disk, I/O devices, etc.
19
Single Bus Structure
 The simplest way to interconnect functional units is to use a single bus
 Single-bus architecture: basic block diagram
Input Output Memory Processor
 All units are connected to this bus

 Bus can be used for only one transfer at a time
 Only two units can actively use the bus at any given time
 Low cost and flexible for attaching peripheral devices
20
Speed Issue
 Different devices have different transfer/ operate speed.

 If the speed of bus is bounded by the slowest device connected to it,
the efficiency will be very low.
 To solve this problem a common approach is to use buffers.
 For example: keyboard buffer holds the immediate key pressed,
printer buffer holds the next character to be printed, modem buffer
holds the last, say 10k, data bytes arrived/to be sent
21
Sample Questions
 What are the basic functional units of a computer?
 What is a register? Explain the purpose of the following registers:
PC, IR, MAR, MDR, SP, accumulator.
 Write down the basic/typical operating steps of during the
execution of a program.
 What is bus? Draw the basic block diagram of the single bus
architecture.
 LOAD LOCA, R1,ADD R1, R0- in these instructions whose contents
will be overwritten?
22
Performance
 The most important measure of a computer is how quickly it can
execute programs i.e. runtime of programs.
 Three factors affect performance:
 Hardware design (i.e., CPU clock rate)
 1GHz CPU => 1 billion Hz => 109 clock cycles/sec (Hz=cycles/sec)
 1 basic operation (i.e., Integer addition) possible in 1 cycle => 1 billion basic operations
(109 integer additions) possible in 1 sec.
 1MHz CPU=> 1 million Hz => 106 clock cycles/sec
 Instruction Set Architecture (ISA) (i.e., CISC or RISC ISA)

 CISC => instructions complex, more capable, but runs slower
 RISC => instructions simple, runs faster, but less capable
 Compiler
 How efficient your compiler to optimize your code for pipelining.
23
Improving Performance(Hardwired
Design)
 Processor time to execute a program depends on the hardware
involved in the execution of individual machine instructions.
Cache
Main Memory Processor
memory
Bus
The Processor Cache.

24
Improving Performance(Hardwired
Design)
 The processor and a relatively small cache memory can be
fabricated on a single integrated circuit chip.
 The internal speed of performing the basic steps of instruction
processing on such chips is very high.
 Cache memory is costly.
 Memory management.
25
Processor Clock
 Processor circuits are controlled by a timing signal called a clock.
 Clock defines regular time intervals known as clock cycles.
 The inverse of the length of one clock cycle is known as clock rate.
 Clock rate = 1 GHz = 109 Hz = 109 cycles/second or 109 clock pulses per second. It
also means it has a clock cycle of 1/109 =10-9 sec = 1 ns (nanosecond).
 4GHz CPU => 4x109 cy/sec => 1 clock cycle = 0.25 ns
 500 MHz => 500x106 cycles/sec => 2 ns clock pulses
 1 MHz = 106 cycles/sec; 1KHz=103 cycles/sec
 1 GHz=1000MHz, 1MHz=1000KHz, 1KHz=1000Hz
 The execution of each instruction is divided into several basic steps, each of
which completes in one clock cycle. (i.e., integer addition/subtraction =
just 1 cycle, but a division may require as many as around 30 cycles)
 Hz (hertz) – cycles per second (clock cycles / second)
 Million is denoted by MEGA(M) and billion is denoted by GIGA(G)
26
Basic Performance Equation
 T – Processor time required to execute a program that may have been
prepared in high-level language. Unit: second
 N – Dynamic instruction count. It is the number of actual machine
language instructions needed to complete the execution. A single 1-line
loop may execute more than a billion times. N is computed considering
loops, repeated function calls, recursion, etc. Unit: instructions
 S – Average number of basic steps (or, clock cycles) needed to execute
one machine instruction. Each basic step completes in one clock cycle.
Unit: cycles/instruction
 R – Clock rate: cycles/sec
 The execution time T of a program that has a dynamic instruction count N
is given by:
𝑁×𝑆 𝑖𝑛𝑡𝑟𝑢𝑐𝑡𝑖𝑜𝑛𝑠 × 𝑐𝑦𝑐𝑙𝑒𝑠/𝑖𝑛𝑠𝑡𝑟𝑢𝑐𝑡𝑖𝑜𝑛
 𝑇= unit : second because
𝑅 𝑐𝑦𝑐𝑙𝑒𝑠/𝑠𝑒𝑐
27
Basic Performance Equation
 To improve T we have to reduce N and S, increase R
 the value of N is reduced if the source program is
compiled into fewer machine instructions
 the value of S is reduced if instructions have a smaller
number of basic steps to perform
 using a higher frequency clock increases the value of R
which means that the time required to complete a
basic execution step is reduced.
28
Performance Evaluation
 Example: A program with dynamic instruction count of 1000

instructions, each instruction taking 5 cycles on average and
running at a speed of 1KHz (R = 103 or 1000 cycles/second),
what will be the program execution time T?
 Answer:
𝑁×𝑆
𝑇= 𝑅
1000 𝑖𝑛𝑠𝑡𝑟𝑢𝑐𝑡𝑖𝑜𝑛𝑠 ×5 𝑐𝑦𝑐𝑙𝑒𝑠/𝑖𝑛𝑠𝑡𝑟𝑢𝑐𝑡𝑖𝑜𝑛
𝑇= 1000 𝑐𝑦𝑐𝑙𝑒𝑠/𝑠𝑒𝑐𝑜𝑛𝑑
= 5 second
29
Instruction Throughput
 Instruction throughput is defined as the number of instructions

executed per second.
𝑅
𝑃𝑆 =
𝑆
𝑐𝑦𝑐𝑙𝑒𝑠/𝑠𝑒𝑐𝑜𝑛𝑑
 Unit : instructions/second because 𝑐𝑦𝑐𝑙𝑒𝑠/𝑖𝑛𝑠𝑡𝑟𝑢𝑐𝑡𝑖𝑜𝑛
30
Simple Exercises with T, N, S, R, Ps
 A program has dynamic instruction count of 200 millions and require an average of 3.6
clock cycles to execute an instruction. If the processor has a clock speed of 400 MHz,
then how much time is required to complete the execution of the program?
 T=NxS/R
= (200 x 106 instructions x 3.6 cycles/instructions) / (400x106
cycles/second)
= 1.8 second
 A 1GHz processor requires 2 seconds to complete the execution of a program that has
with a dynamic instruction count of 200 millions. Compute the value of S.
 S=TxR/N
= (2 second x 1x109 cycle/sec) / 200x106 instructions
= 10 cycles/instruction
 A 2GHz processor has an instruction throughput of 500 million instructions per
second. Compute the value of S.
 Ps = R / S so, S = R / 𝑃𝑠
=(2 x 109 cycles/sec) / (500x106 instruction/sec)
= 4 cycles/instruction
31
Pipelining and Superscalar Operation
 Instructions are not necessarily executed one after another.

 The value of S doesn’t have to be the number of clock cycles to
execute one instruction.
 Pipelining is the capability of overlapping the execution of successive
instructions (i.e., parallel/simultaneous execution of multiple
instructions).
 Superscalar operation is multiple instruction pipelines are
implemented in the processor.
 Goal is to increase throughput > 1 instruction/cycle (reduce S could
become <1)
32
Improving Performance: Increasing
Clock Rate
 T = N x S / R. To improve performance we need to reduce runtime T
by increasing clock rate R
 Improve the integrated-circuit (IC) technology to make the logic
circuits faster
 Reduce the time needed to complete a basic step
 Reduce the amount of processing done in one basic step. However, this
may increase the number of basic steps needed). RISC processors try to
do this (i.e., Makes instructions simpler), so each instruction can run in
a faster clock cycle and becomes more suitable for pipelining
 Increases in R that are entirely caused by improvements in IC
technology affect all aspects of the processor’s operation equally
except the time to access the main memory.
33
Improving Performance: Effect of
Instruction Set Architectures (ISA)
 Tradeoff between N and S
 Reduced Instruction Set Computers (RISC): simpler instructions => N ,
S , better than CISC, because pipelining is more effective for RISC
 Complex Instruction Set Computers (CISC): complex instructions => N ,
S , not good, as not suitable for pipelining. Instructions complex, more
capable => the program gets smaller in size (reduced N), but complex
instructions increase S and hampers pipeline. Example of CISC: intel
processors
 So, A key consideration is the use of pipelining
 S is close to 1, means the number of cycles per instruction is nearly
ideal (close to 1) (i.e. RISC processors)
 RISC is better, because easier to implement efficient pipelining with
simpler instruction sets. (i.e. RISC architecture: ARM processors)
34
Improving Performance: Compiler
Based Speedup
 A compiler translates a high-level language program into a sequence
of machine instructions.
 To reduce N, we need a suitable machine instruction set and a
compiler that makes good use of it.
 A compiler may not be designed for a specific processor; however, a
high-quality compiler is usually designed for, and with, a specific
processor.
 Goal of a smart compiler is to reduce N×S to reduce program runtime
T to improve performance.
 Compiler has nothing to do with the clock rate R
35
Performance Measurement
 T is difficult to compute. Also, T has inappropriate unit
(second) for commercial use.
 Measure computer performance using benchmark programs (a set
of sample programs, i.e., Word processing programs, games,
media (audio/video) playback , I/O intensive programs, etc.).
 System Performance Evaluation Corporation (SPEC) selects and
publishes representative application programs for different
application domains, together with test results for many
commercially available computers.
 Reference computer: a previous, renowned computer system,
picked by SPEC
𝑅𝑢𝑛𝑛𝑖𝑛𝑔 𝑡𝑖𝑚𝑒 𝑜𝑛 𝑡ℎ𝑒 𝑟𝑒𝑓𝑒𝑟𝑒𝑛𝑐𝑒 𝑐𝑜𝑚𝑝𝑢𝑡𝑒𝑟
𝑆𝑃𝐸𝐶 𝑟𝑎𝑡𝑖𝑛𝑔 =
𝑅𝑢𝑛𝑛𝑖𝑛𝑔 𝑡𝑖𝑚𝑒 𝑜𝑛 𝑡ℎ𝑒 𝑐𝑜𝑚𝑝𝑢𝑡𝑒𝑟 𝑢𝑛𝑑𝑒𝑟 𝑡𝑒𝑠𝑡
1
𝑆𝑃𝐸𝐶 𝑟𝑎𝑡𝑖𝑛𝑔 = ς𝑛𝑖=1 𝑆𝑃𝐸𝐶𝑖 𝑛
36
Multiprocessors and Multi-computers
 Multiprocessor computer
 Good for executing several different application tasks in parallel
 Good for executing subtasks of a single large task in parallel
 All processors have access to all of the memory that is shared memory
multiprocessor.
 Example: some commercial server computers using two/four processors.
Cost– processors, memory units, complex interconnection networks
 Multi computer
 Such computer only have access to its own memory
 Example: a network of computers, such as a LAN (local area network), WAN
(wide area network) or MAN (metropolitan area network) etc.
 Exchange message via a communication network – message-passing multi-
computers
37
Sample Questions
 Write down the basic performance equation including T, N, S and
R. Explain each of these terms using an example. Explain how you
can improve the overall performance by improving each of these
components
 What do you understand by CISC and RISC architecture? How they
affect the N, S and R components of the basic performance
equation?
 What is SPEC rating? What is its purpose? How the SPEC rating is
calculated?
 What do you understand by multiprocessors and multi-computer
systems?
38

Computer Architecture

Enviado por

Dados do documento

Direitos autorais

Formatos disponíveis

Compartilhar este documento

Compartilhar ou incorporar documento

Opções de compartilhamento

Você considera este documento útil?

Este conteúdo é inapropriado?

Direitos autorais:

Formatos disponíveis

Computer Architecture

Enviado por

Direitos autorais:

Formatos disponíveis

Basic Structure of

I/O Devices Processor

Basic Functional Units Of A Computer

 A computer consists of five functionally independent main parts.

 Computer accepts coded information through input

 Registers: Special places/logical units inside the CPU to hold data or

 Output unit is the counterpart of input unit

 All computer operations are controlled by the CU

 Control unit is usually distributed throughout the machine instead

 Fast control of ALU by the control unit of CPU

 Accept information in the form of programs and data through

 Instruction that moves data from memory to register is

 Instruction that moves data from register to memory is

Connections Between The Processors And The Memory 14

Repeat the following steps until the “END” instruction is executed

 Normal execution of programs may be preempted if some device

Input Output Memory Processor

 All units are connected to this bus

 Different devices have different transfer/ operate speed.

 1MHz CPU=> 1 million Hz => 106 clock cycles/sec

 Instruction Set Architecture (ISA) (i.e., CISC or RISC ISA)

The Processor Cache.

 Example: A program with dynamic instruction count of 1000

 Instruction throughput is defined as the number of instructions

 Instructions are not necessarily executed one after another.

Você também pode gostar