
Machine Language



Machine language is the only language a computer understands directly (i.e. it is native to the processor). It is practically impossible for humans to read and consists of only 0s and 1s, for example:
0001001111110000

Machine code:
- Set of commands directly executable by the CPU
- Commands are in numeric code
- Lowest semantic level
- Generally two ways of execution:
  - Interpretively via microcode
  - Directly via hardware

Machine code command:
- Binary word (fixed length; causes elementary operations within the CPU)

Machine code program:
- Sequence of machine code commands

Machine code language




Structure: | OpCode | OpAddress |

- Operation code (OpCode): defines the executable operation
- Operand address (OpAddress): specification of the operands (constants, register addresses, storage addresses)

Further points:
- Difference between 1-, 2- and 3-address machines
- Data transport commands
- Arithmetic and logical commands
- Process controlling commands
- Input/output commands
- Special commands

Disadvantages:
- Difficult to read
- No symbolic names (mnemonics)

Requirements

Every machine language must contain the following commands:

- Load instruction: the CPU loads data directly from memory into the accumulator (AC) register. It consists of:
  - Operation code (opcode)
  - Address of the main memory cell from which the AC register will be loaded
  Example: 0010000111100011
- Store instruction: stores computed information into a memory cell or a CPU (ALU) register.
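As a minimal sketch of how such a load instruction splits into its two parts, the Python snippet below decodes the 16-bit example word. The split into a 4-bit opcode and a 12-bit address is an assumption made for illustration (the slides do not state the field widths), and `decode_word` is a hypothetical helper name.

```python
def decode_word(word_bits: str) -> tuple[int, int]:
    """Split a 16-bit instruction word into (opcode, address).

    Assumed layout (not given in the slides): the top 4 bits are the
    opcode, the remaining 12 bits are the main-memory address.
    """
    if len(word_bits) != 16 or set(word_bits) - {"0", "1"}:
        raise ValueError("expected a string of 16 binary digits")
    word = int(word_bits, 2)
    opcode = word >> 12          # top 4 bits
    address = word & 0x0FFF      # low 12 bits
    return opcode, address


opcode, address = decode_word("0010000111100011")
print(f"opcode={opcode:04b} address={address:012b} (decimal {address})")
```

Under this assumed layout the example word carries opcode 0010 and refers to memory cell 483.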

Machine code language


Bunch of Bytes

- Machine language consists of the binary codes that the computer executes.
- The computer fetches the instructions from memory and executes them.
- Machine language for a square root program:

    8b 45 e0 89 45 f8 89 45 ec 8b 45 ec
    89 45 f8 8b 45 e0 ba 00 00 00 00 f7
    7d f8 03 45 f8 d1 f8 89 45 ec 3b 45
    f8 75 e2 8b f4


Instruction Format
The general format for a machine language instruction is

Op code Operands

The operands can be a memory address, a register or a value.




Op codes
Each assembler instruction represents a numerical machine language op code (shown in hexadecimal):

    Mnemonic   Op code
    add        05
    cmp        3B
    dec        FF
    idiv       F7
    jmp        E9
    push       68
    sar        D0


Data Location

- Register: the data is in a CPU register.
- Memory: the data is in a location in RAM.
- Immediate: the data is part of the instruction. Immediate data items are read only.

Intel Assembler

- The Intel assembler allows you to use one mnemonic for different op codes.
- There are several versions of the add instruction based on the size of the operands.
- The assembler picks the correct op code.

    Opcode     Instruction         Description
    04 ib      ADD AL,imm8         ADD imm8 to AL
    05 iw      ADD AX,imm16        ADD imm16 to AX
    05 id      ADD EAX,imm32       ADD imm32 to EAX
    80 /0 ib   ADD r/m8,imm8       ADD imm8 to r/m8
    81 /0 iw   ADD r/m16,imm16     ADD imm16 to r/m16
    81 /0 id   ADD r/m32,imm32     ADD imm32 to r/m32
    83 /0 ib   ADD r/m16,imm8      ADD sign-extended imm8 to r/m16
    83 /0 ib   ADD r/m32,imm8      ADD sign-extended imm8 to r/m32
    00 /r      ADD r/m8,r8         ADD r8 to r/m8
    01 /r      ADD r/m16,r16       ADD r16 to r/m16
    01 /r      ADD r/m32,r32       ADD r32 to r/m32
    02 /r      ADD r8,r/m8         ADD r/m8 to r8

Mnemonic to Op Code Mapping

- The Intel assembler uses the same mnemonic for the machine language instructions that:
  - move a byte from memory to a register
  - move a byte from a register to memory
- The Intel mov instruction generates different machine language op codes depending on the size of the operands.

Number of Operands

- In addition to the op code, the instruction might contain zero, one, two or three operands.
- Different architectures use different numbers of operands.
- A single architecture may have instructions with differing numbers of operands.


No Operand Instructions

- Some instructions do not require any operands.
- The data affected is implied in the instruction.

    RET     Return from function
    HLT     Halt
    CPUID   Get details about the CPU
    LAHF    Load status flags into the AH register

Instruction layout: | Op code |

One Operand Instructions

Some unary operations require only one operand:

    inc al          ; increment the al register
    jmp address     ; jump to the address

Two Operand Instructions

- Many instructions act on two operands.
- Most math instructions use two operands and return the result in one of them.

    add  al, varname
    add  bl, al
    imul eax, varname

Three Operand Instructions

- Some machines support three operands (Intel Pentium does not).
- Most MIPS instructions use three operands:

    add R1,R2,R3

Additional Instruction Fields

Most architectures support an addressing mode that combines an address field in the instruction and the contents of a register:

    add R3,addr[R7]

This instruction adds to register R3 the contents of the memory location whose address is the sum of the address field and R7.

Variable or Fixed Length

- Some architectures use variable-length instructions.
  - Instructions with more operands or memory addresses are longer.
  - Saves memory.
- Some architectures always use the same instruction length.
  - It is easier to find the beginning of instructions.
  - Instructions are in aligned words.


Assembler and Machine

- Assembler language is the easy way to write machine language.
- Each line of an assembler program generates one machine language instruction.
- The assembler allows you to use variable names instead of numerical addresses and instruction mnemonics instead of numerical operation codes.


Assembled Code

    addr  machine          assembler
    004a  8b 45 e0         mov  eax, number[ebp]
    004d  89 45 f8         mov  good[ebp], eax
    0050  89 45 ec         mov  better[ebp], eax
                    again:
    0053  8b 45 ec         mov  eax, better[ebp]
    0056  89 45 f8         mov  good[ebp], eax
    0059  8b 45 e0         mov  eax, number[ebp]
    005c  ba 00 00 00 00   mov  edx, 0
    0061  f7 7d f8         idiv good[ebp]
    0064  03 45 f8         add  eax, good[ebp]
    0067  d1 f8            sar  eax, 1
    0069  89 45 ec         mov  better[ebp], eax
    006c  3b 45 f8         cmp  eax, good[ebp]
    006f  75 e2            jne  SHORT again
    0071  8b f4            mov  esi, esp

Example Machine Language

- Assume each instruction of this imaginary computer is 32 bits in length.
- Assembler language format:

    label: mnemonic reg, address[index reg]


Machine Language Program

    addr  machine    assembler
    00    01100018         LOAD  R1, y
    04    2110001C         DIV   R1, z
    08    05100020         ADD   R1, five
    0C    02100014         STORE R1, x
    10    47000000         RET
    14               x     res
    18               y     res
    1C               z     res
    20    00000005   five  +5
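The listing above can be taken apart field by field. The sketch below assumes a layout inferred from the five instruction words (8-bit opcode, 4-bit register, 4-bit index register, 16-bit address); the slides do not spell this layout out, and `decode` and `MNEMONICS` are illustrative names.

```python
def decode(word: int) -> dict:
    """Split a 32-bit instruction word into its assumed fields."""
    return {
        "opcode": (word >> 24) & 0xFF,   # bits 31..24
        "reg": (word >> 20) & 0xF,       # bits 23..20
        "index": (word >> 16) & 0xF,     # bits 19..16
        "address": word & 0xFFFF,        # bits 15..0
    }


# Opcode numbers read off the listing; the table is not exhaustive.
MNEMONICS = {0x01: "LOAD", 0x02: "STORE", 0x05: "ADD", 0x21: "DIV", 0x47: "RET"}

program = [0x01100018, 0x2110001C, 0x05100020, 0x02100014, 0x47000000]
for addr, word in zip(range(0, 4 * len(program), 4), program):
    f = decode(word)
    print(f"{addr:02X}  {MNEMONICS[f['opcode']]:5} R{f['reg']}, {f['address']:04X}[R{f['index']}]")
```

Each address field printed this way matches the location of the corresponding data word (y at 18, z at 1C, five at 20, x at 14), which is what makes the assumed layout plausible.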

World of Numbers

- The op codes are numbers.
- The address is a number.
- The register field is a number.

Instruction Cycle

1. Fetch the instruction from the memory address in the Program Counter register
2. Increment the Program Counter
3. Decode the type of instruction
4. Fetch the operands
5. Execute the instruction
6. Store the results
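The cycle can be sketched as a small simulator for the imaginary 32-bit machine from the Machine Language Program slide. Everything beyond the opcode numbers in that listing is an assumption: the field layout (8-bit opcode, 4-bit register, 4-bit index register, 16-bit address), integer division for DIV, RET treated as "stop", and the sample values placed in y and z.

```python
def run(memory: dict[int, int]) -> dict[int, int]:
    """Run the imaginary machine until RET and return the final memory."""
    regs = [0] * 16
    pc = 0
    while True:
        word = memory[pc]                  # 1. fetch the instruction at PC
        pc += 4                            # 2. increment the program counter
        opcode = (word >> 24) & 0xFF       # 3. decode the instruction
        reg = (word >> 20) & 0xF
        index = (word >> 16) & 0xF
        address = (word & 0xFFFF) + regs[index]
        if opcode == 0x47:                 # RET: end of this simulation
            return memory
        if opcode == 0x02:                 # STORE: 6. store the result
            memory[address] = regs[reg]
            continue
        operand = memory[address]          # 4. fetch the operand
        if opcode == 0x01:                 # 5. execute
            regs[reg] = operand            # LOAD
        elif opcode == 0x05:
            regs[reg] += operand           # ADD
        elif opcode == 0x21:
            regs[reg] //= operand          # DIV (integer division assumed)
        else:
            raise ValueError(f"unknown opcode {opcode:#04x}")


memory = {
    0x00: 0x01100018, 0x04: 0x2110001C, 0x08: 0x05100020,
    0x0C: 0x02100014, 0x10: 0x47000000,
    0x14: 0,            # x (result)
    0x18: 20,           # y (hypothetical sample value)
    0x1C: 4,            # z (hypothetical sample value)
    0x20: 5,            # five
}
print(run(memory)[0x14])   # x = y / z + 5
```

With the sample values the program computes x = 20 / 4 + 5 and leaves 10 at address 14.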


Basic Processor Components

- Program Counter: contains the address of the next instruction to execute.
- Arithmetic Logic Unit: logic to perform arithmetic and logical functions.
- User registers: hold data.
- Memory Address Register: contains the address to be copied to or from RAM.
- Memory Buffer Register: contains data copied to or from RAM.


Instruction Fetch

Increment Program Counter

Decode Instruction

Operand Fetch, Execution, Result

Jump Instruction

- Consider an arithmetic instruction followed by a jump instruction.
- The arithmetic instruction sets bits in the status register.

Stages shown:
- Execution stage of instruction 1
- Result save stage of instruction 1
- Instruction 2 fetch

Write a program to add a data byte located at offset 0500H in the 2000H segment to another data byte at offset 0600H in the same segment, and store the result at offset 0700H in the same segment.

Flow chart

Start -> Initialise segment register -> Get content of 0500H in a general-purpose register -> Perform addition -> Store result in 0700H -> Stop

    MOV AX,2000H     ; INITIALISE DS WITH THE VALUE
    MOV DS,AX        ; 2000H
    MOV AL,[500H]    ; GET FIRST DATA BYTE FROM OFFSET 0500H
    ADD AL,[600H]    ; ADD THE SECOND BYTE FROM OFFSET 0600H
    MOV [700H],AL    ; STORE AL AT OFFSET 0700H
    HLT              ; STOP
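To see which memory cells this program actually touches, the sketch below applies the 8086 real-mode rule that a physical address is the segment value shifted left by four bits plus the offset, wrapped to 20 bits. The helper name `physical_address` is illustrative and not part of the original exercise.

```python
def physical_address(segment: int, offset: int) -> int:
    """8086 real-mode address: segment * 16 + offset, wrapped to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF


for name, offset in [("first operand", 0x0500),
                     ("second operand", 0x0600),
                     ("result", 0x0700)]:
    print(f"{name}: 2000H:{offset:04X}H -> {physical_address(0x2000, offset):05X}H")
```

With DS = 2000H the three offsets resolve to the physical addresses 20500H, 20600H and 20700H.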

Disadvantages

- Complicated and time-consuming.
- Errors are more likely when hand-coding in machine language and entering the program byte by byte.
- Debugging is more difficult.
- Programs are not understood by everyone, and the results are not stored in a user-friendly form.

Assembly Languages

- One step up from machine language
- Originally a more user-friendly way to program
- Now mostly a compiler target
- Model of computation: stored-program computer

ENIAC, 1946: 17k tubes, 5 kHz

Assembler Language

- Translated into machine code language (by the assembler)
- Each operation code (opcode) has one symbolic command
- Symbolic assignment of operand addresses is possible
- Labels for command addresses
- Usage of pseudo commands (commands for the assembler):
  - Assignment of values/addresses (variables)
  - Definition of the program start address
  - Allocation of memory for variables

Assembly language

- To program in assembly you need to understand the concepts behind machine language and the fetch-execute cycle of the CPU.
- Assembly is a mnemonic form of machine language.
- As noted before, assembly is a machine-specific language.
- Although assembly and machine language might look similar, they are in fact two different types of languages:
  - Assembly consists of both numbers and simple words.
  - Machine code is composed only of 0s and 1s.

It is possible to code machine language directly, thus bypassing assembly instructions. This is done by replacing assembly instructions with machine instruction numbers directly.
For example:
- Instead of the load instruction we might say 0004; thus load 001000110001 is equal to 0004 001000110001.
- Add might be equal to 2005.

Execution speed remains the same because basic instructions (like ADD, SUB, LOAD, STORE, etc.) are hardwired into the CPU.

Advantages

- Programming is less complicated, because the coding is performed by the assembler.
- Errors are less likely, because mnemonics are used instead of numeric opcodes.
- It is easier to enter the assembly language program (ALP).
- Debugging is easier.
- Constants and address locations can be given suggestive labels, which makes the program more user-friendly.
- Flexibility.

Human-Readable Machine Language

Computers like ones and zeros:

    0001110010000110

Humans like symbols:

    ADD R6,R2,R6 ; increment index reg.

An assembler is a program that turns symbols into machine instructions.
- ISA-specific: close correspondence between symbols and the instruction set
  - mnemonics for opcodes
  - labels for memory locations
- additional operations for allocating storage and initializing data
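The bit pattern and the symbolic ADD shown on this slide line up if one assumes the LC-3 encoding of register-mode ADD (opcode 0001, then a 3-bit destination register, a 3-bit first source register, three zero bits, and a 3-bit second source register). The slide does not name the ISA, so treat the layout and the helper `encode_add` as an illustrative sketch.

```python
def encode_add(dr: int, sr1: int, sr2: int) -> int:
    """Encode `ADD DR, SR1, SR2` (register mode) as a 16-bit word."""
    for r in (dr, sr1, sr2):
        if not 0 <= r <= 7:
            raise ValueError("registers are R0..R7")
    opcode = 0b0001                                          # ADD
    return (opcode << 12) | (dr << 9) | (sr1 << 6) | sr2    # bits 5..3 stay 0


word = encode_add(6, 2, 6)      # ADD R6,R2,R6
print(f"{word:016b}")
```

Printing the word as 16 binary digits gives 0001110010000110, the same pattern shown for the computer's view of the instruction.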

Assembler languages: structure

    <Label> <Mnemonic> <Operand> <Comments>

Label
Symbolic labeling of an assembler address (command address at machine level)

Mnemonic
Symbolic description of an operation

Operands
Contain variables or addresses, if necessary

Comments

Execution Process

ALU register

The simplest ALU design usually includes:
- Program counter register
- Accumulator register
- Stack pointer register

Memory and the ALU exchange information in words. A word is a fixed-size chunk of data whose size depends on the system design. Usually it is chosen so that one word fits into one memory slot.

Assembler Languages - Machine Instructions

Bit patterns are created and executed as commands by the CPU.

Classes:
- Arithmetic/logical operations (ADD, SUB, XOR, administrative commands such as EQU, shift and rotate commands)
- Data transfer (load/save operations, memory<->register, register<->register)
- Control commands (conditional/unconditional and relative jump operations, control operations such as STOP)
- Input/output commands

What is it good for?


 

Because it is extremely low level, assembly language can be optimized extremely well. Therefore assembly language is used where the utmost performance is required for applications. Assembly language is also useful for communicating with the machine at a hardware level. For this reason, it is often used for writing device drivers. A third benefit of assembly language is the size of the resulting programs. Because no conversion from a higher level by a compiler is required, the resulting programs can be exceedingly small. For this reason, assembly language has been a language of choice for the demo scene. This involves coders writing extremely small programs which show off their creative and technical abilities to other members of the scene. In this tutorial you will learn how to write assembly language programs and how to make use of these to do interesting things such as calculations, graphics, writing windows programs and optimizing programs written in other languages.

Assembler All-Purpose Registers

All-purpose registers: AX, BX, CX, DX, SI, DI, BP, SP

    High byte:  AH  BH  CH
    Low byte:   AL  BL  CL

Arithmetic example (source and destination data widths have to be equal):

    ; arithmetic operations
    ADD AX, BX    ; AX := AX + BX
    SUB AH, AL    ; AH := AH - AL
    MOV AL, CL    ; AL := CL
    INC CX        ; CX := CX + 1
    DEC CL        ; CL := CL - 1
    NEG CX        ; CX := -CX

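Assembler Special Registers

Unlike the all-purpose registers, the special registers (SS, DS, CS, ES, IP):
- are never the destination of arithmetic operations
- cannot be used freely as destination/source of a mov command (IP cannot be accessed by mov at all; a segment register such as DS is loaded from a general-purpose register, as in MOV DS,AX)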

Assembler Flag Register

Flags: Zero, Sign, Trap, Interrupt enable, Direction, Overflow, Parity, Auxiliary carry, Carry

FLAG bits:

    C  Carry        Range overflow of unsigned numbers
    A  Aux. carry   Range overflow in BCD arithmetic
    O  Overflow     Range overflow in arithmetic operations with signed numbers
    S  Sign         True if the result is negative
    Z  Zero         Result = zero
    P  Parity       Result has an even number of 1 bits
    D  Direction    Defines the direction of string commands
    I  Interrupt    Global interrupt enable/disable flag
    T  Trap         Used by debuggers, allows single-step mode
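As a rough illustration of how the arithmetic flags follow from a result, the sketch below adds two 8-bit values and derives C, A, O, S, Z and P the way an x86-style ADD would. The function name `add8` and the dictionary of flags are illustrative choices, and the control flags D, I and T are left out because arithmetic does not set them.

```python
def add8(a: int, b: int) -> tuple[int, dict[str, bool]]:
    """Add two 8-bit values and report the resulting status flags."""
    total = a + b
    result = total & 0xFF
    flags = {
        "C": total > 0xFF,                               # unsigned range exceeded
        "A": (a & 0x0F) + (b & 0x0F) > 0x0F,             # carry out of the low nibble
        "O": ((a ^ result) & (b ^ result) & 0x80) != 0,  # signed range exceeded
        "S": (result & 0x80) != 0,                       # result is negative
        "Z": result == 0,                                # result is zero
        "P": bin(result).count("1") % 2 == 0,            # even number of 1 bits
    }
    return result, flags


for a, b in [(0x7F, 0x01), (0xFF, 0x01)]:
    result, flags = add8(a, b)
    set_flags = "".join(name for name, on in flags.items() if on)
    print(f"{a:02X} + {b:02X} = {result:02X}  flags set: {set_flags}")
```

The two sample additions show the difference between signed overflow (7F + 01 sets O but not C) and unsigned carry (FF + 01 sets C and Z but not O).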

Assembler Flag Register




Missing flags:
- V: two's complement overflow indicator
- H: half carry flag

Operations and flags

    Operations       Affected flags
    ADD, SUB, NEG    O, S, Z, A, P, C
    INC, DEC         O, S, Z, A, P
    MUL, DIV         O, C
    AND, OR, XOR     S, Z, P, C

Assembler Jump Operations

Unconditional and conditional jumps

Example:

           MOV AX, 0
           CMP CX, 0
    again: JZ  end        ; jump if zero (conditional jump)
           ADD AX, CX
           DEC CX
           JMP again      ; unconditional jump
    end:   NOP
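The control flow of this example can be mirrored in a few lines of Python. This is only a sketch: `sum_down` is a hypothetical name, the zero flag is modelled as a plain boolean, and the registers are assumed to be 16 bits wide.

```python
def sum_down(cx: int) -> int:
    """Mirror the jump example: add CX, CX-1, ..., 1 into AX."""
    ax = 0                          # MOV AX, 0
    zf = (cx == 0)                  # CMP CX, 0 sets the zero flag if CX is 0
    while not zf:                   # again: JZ end (leave the loop when ZF is set)
        ax = (ax + cx) & 0xFFFF     # ADD AX, CX
        cx = (cx - 1) & 0xFFFF      # DEC CX
        zf = (cx == 0)              # DEC updates the zero flag
                                    # JMP again (back to the loop test)
    return ax                       # end: NOP


print(sum_down(4))
```

Note that the JZ at the top of the loop tests the flag left by the most recent DEC CX, not by ADD, because DEC is the last flag-setting instruction before the jump back.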

Assembly languages are a type of low-level languages for programming computers, microprocessors, microcontrollers, and other (usually) integrated circuits. They implement a symbolic representation of the numeric machine codes and other constants needed to program a particular CPU architecture. This representation is usually defined by the hardware manufacturer, and is based on abbreviations (called mnemonics) that help the programmer remember individual instructions, registers, etc. An assembly language family is thus specific to a certain physical (or virtual) computer architecture. This is in contrast to most high-level languages, which are (ideally) portable. A utility program called an assembler is used to translate assembly language statements into the target computer's machine code. The assembler performs a more or less isomorphic translation (a one-to-one mapping) from mnemonic statements into machine instructions and data. This is in contrast with high-level languages, in which a single statement generally results in many machine instructions. Many sophisticated assemblers offer additional mechanisms to facilitate program development, control the assembly process, and aid debugging. In particular, most modern assemblers include a macro facility (described below), and are called macro assemblers

 

Assembler

 

  

Typically a modern assembler creates object code by translating assembly instruction mnemonics into opcodes, and by resolving symbolic names for memory locations and other entities.[1] The use of symbolic references is a key feature of assemblers, saving tedious calculations and manual address updates after program modifications. Most assemblers also include macro facilities for performing textual substitution, e.g., to generate common short sequences of instructions as inline, instead of called subroutines, or even generate entire programs or program suites.

Assemblers are generally simpler to write than compilers for high-level languages, and have been available since the 1950s. Modern assemblers, especially for RISC based architectures, such as MIPS, Sun SPARC, and HP PA-RISC, as well as x86(-64), optimize instruction scheduling to exploit the CPU pipeline efficiently.

There are two types of assemblers based on how many passes through the source are needed to produce the executable program. The advantage of a one-pass assembler is speed, which is not as important as it once was with advances in computer speed and capabilities. The advantage of the two-pass assembler is that symbols can be defined anywhere in the program source. As a result, the program can be defined in a more logical and meaningful way. This makes two-pass assembler programs easier to read and maintain.[2]

More sophisticated high-level assemblers provide language abstractions such as: See Language design below for more details.

Note that, in normal professional usage, the term assembler is often used ambiguously: it is frequently used to refer to an assembly language itself, rather than to the assembler utility. Thus: "CP/CMS was written in S/360 assembler" as opposed to "ASM-H was a widely-used S/370 assembler."

      

Assembly language

A program written in assembly language consists of a series of instructions: mnemonics that correspond to a stream of executable instructions, when translated by an assembler, that can be loaded into memory and executed.

For example, an x86/IA-32 processor can execute the following binary instruction ('MOV') as expressed in machine language:

    Hexadecimal: B0 61 (Binary: 10110000 01100001)

The equivalent assembly language representation is easier to remember (example in Intel syntax, more mnemonic):

    MOV AL, 61h

This instruction means: move the hexadecimal value 61 (97 decimal) into the processor register named AL. The mnemonic "mov" represents the opcode 10110000, which actually copies the value in the second operand into the register indicated by the first operand. The mnemonic was chosen by the designer of the instruction set to abbreviate "move", making it easier for the programmer to remember. Typical of an assembly language statement, a comma-separated list of arguments or parameters follows the opcode.

The mnemonic "mov" may refer to a family of numeric opcodes that do the same thing, but imply different registers. The opcode 10110000 specifically copies an 8-bit value into the register AL. The opcode 10100001 is also denoted by the mnemonic "mov", but instead copies a 16-bit value into the register AX, and gets it by reading from system memory (the second operand says where) instead of copying the value of the operand itself.

Transforming assembly into machine language is accomplished by an assembler, and the (partial) reverse by a disassembler. Unlike high-level languages, there is usually a one-to-one correspondence between simple assembly statements and machine language instructions. However, in some cases, an assembler may provide pseudoinstructions (essentially macros) which expand into several machine language instructions to provide commonly needed functionality.
For example, for a machine that lacks a "branch if greater or equal" instruction, an assembler may provide a pseudoinstruction that expands to the machine's "set if less than" and "branch if zero (on the result of the set instruction)". Most full-featured assemblers also provide a rich macro language (discussed below) which is used by vendors and programmers to generate more complex code and data sequences. Each computer architecture and processor architecture usually has its own machine language. On this level, each instruction is simple enough to be executed using a relatively small number of electronic circuits. Computers differ by the number and type of operations they support. For example, a new 64-bit machine would have different circuitry from a 32-bit machine. They may also have different sizes and numbers of registers, and different representations of data types in storage. While most general-purpose computers are able to carry out essentially the same functionality, the ways they do so differ; the corresponding assembly languages reflect these differences. Multiple sets of mnemonics or assembly-language syntax may exist for

   

 

Basic elements

Any assembly language consists of 3 types of instruction statements which are used to define the program operations:

Opcode mnemonics

Instructions (statements) in assembly language are generally very simple, unlike those in high-level languages. Generally, a mnemonic is a symbolic name for a single executable machine language instruction (an opcode), and there is at least one opcode mnemonic defined for each machine language instruction. Each instruction typically consists of an operation or opcode plus zero or more operands. Most instructions refer to a single value, or a pair of values. Operands can be either immediate (typically one byte values, coded in the instruction itself) or the addresses of data located elsewhere in storage. This is determined by the underlying processor architecture: the assembler merely reflects how this architecture works.

Data sections

There are instructions used to define data elements to hold data and variables. They define the type of data, the length and the alignment of data. These instructions can also define whether the data is available to outside programs (programs assembled separately) or only to the program in which the data section is defined.

 

  

Assembly directives and pseudo-ops Assembly directives are instructions that are executed by the assembler at assembly time, not by the CPU at run time. They can make the assembly of the program dependent on parameters input by the programmer, so that one program can be assembled different ways, perhaps for different applications. They also can be used to manipulate presentation of the program to make it easier for the programmer to read and maintain. (For example, pseudo-ops would be used to reserve storage areas and optionally their initial contents.) The names of pseudo-ops often start with a dot to distinguish them from machine instructions. Some assemblers also support pseudo-instructions, which generate two or more machine instructions. Symbolic assemblers allow programmers to associate arbitrary names (labels or symbols) with memory locations. Usually, every constant and variable is given a name so instructions can reference those locations by name, thus promoting self-documenting code. In executable code, the name of each subroutine is associated with its entry point, so any calls to a subroutine can use its name. Inside subroutines, GOTO destinations are given labels. Some assemblers support local symbols which are lexically distinct from normal symbols (e.g., the use of "10$" as a GOTO destination). Most assemblers provide flexible symbol management, allowing programmers to manage different namespaces, automatically calculate offsets within data structures, and assign labels that refer to literal values or the result of simple computations performed by the assembler. Labels can also be used to initialize constants and variables with relocatable addresses. Assembly languages, like most other computer languages, allow comments to be added to assembly source code that are ignored by the assembler. 
Good use of comments is even more important with assembly code than with higher-level languages, as the meaning and purpose of a sequence of instructions is harder to decipher from the code itself. Wise use of these facilities can greatly simplify the problems of coding and maintaining low-level code. Raw assembly source code as generated by compilers or disassemblers (code without any comments, meaningful symbols, or data definitions) is quite difficult to read when changes must be made.

 

Macros Many assemblers support predefined macros, and others support programmer-defined (and repeatedly redefinable) macros involving sequences of text lines that variables and constants are embedded in. This sequence of text lines may include a sequence of instructions, or a sequence of data storage pseudo-ops. Once a macro has been defined using the appropriate pseudo-op, its name may be used in place of a mnemonic. When the assembler processes such a statement, it replaces the statement with the text lines associated with that macro, then processes them just as though they had appeared in the source code file all along (including, in better assemblers, expansion of any macros appearing in the replacement text). Since macros can have 'short' names but expand to several or indeed many lines of code, they can be used to make assembly language programs appear to be much shorter (require fewer lines of source code from the application programmer, as with a higher level language). They can also be used to add higher levels of structure to assembly programs, optionally introduce embedded de-bugging code via parameters and other similar features. Many assemblers have built-in (or predefined) macros for system calls and other special code sequences, such as the generation and storage of data realized through advanced bitwise and boolean operations used in gaming, software security, data management, and cryptography. Macro assemblers often allow macros to take parameters. Some assemblers include quite sophisticated macro languages, incorporating such high-level language elements as optional parameters, symbolic variables, conditionals, string manipulation, and arithmetic operations, all usable during the execution of a given macro, and allowing macros to save context or exchange information. Thus a macro might generate a large number of assembly language instructions or data definitions, based on the macro arguments. 
This could be used to generate record-style data structures or "unrolled" loops, for example, or could generate entire algorithms based on complex parameters. An organization using assembly language that has been heavily extended using such a macro suite can be considered to be working in a higher-level language, since such programmers are not working with a computer's lowest-level conceptual elements. Macros were used to customize large scale software systems for specific customers in the mainframe era and were also used by customer personnel to satisfy their employers' needs by making specific versions of manufacturer operating systems; this was done, for example, by systems programmers working with IBM's Conversational Monitor System/Virtual Machine (CMS/VM) and with IBM's "real time transaction processing" add-ons, CICS, Customer Information Control System, and ACP/TPF, the airline/financial system that began in the 1970s and still runs many large Global Distribution Systems (GDS) and credit card systems today.

It was also possible to use solely the macro processing capabilities of an assembler to generate code written in completely different languages, for example, to generate a version of a program in COBOL using a pure macro assembler program containing lines of COBOL code inside assembly-time operators instructing the assembler to generate arbitrary code. This was because, as was realized in the 1970s, the concept of "macro processing" is independent of the concept of "assembly", the former being in modern terms more word processing, text processing, than generating object code. The concept of macro processing in fact appeared in and appears in the C programming language, which supports "preprocessor instructions" to set variables, and make conditional tests on their values. Note that unlike certain previous macro processors inside assemblers, the C preprocessor was not Turing-complete because it lacked the ability to either loop or "go to", the latter allowing the programmer to loop.

Despite the power of macro processing, it fell into disuse in many high-level languages (a major exception being C/C++) while remaining a perennial for assemblers. This was because many programmers were rather confused by macro parameter substitution and did not disambiguate macro processing from assembly and execution. Macro parameter substitution is strictly by name: at macro processing time, the value of a parameter is textually substituted for its name. The most famous class of bugs resulting was the use of a parameter that itself was an expression and not a simple name when the macro writer expected a name. In the macro:

    foo: macro a
         load a*b

the intention was that the caller would provide the name of a variable, and the "global" variable or constant b would be used to multiply "a". If foo is called with the parameter a-c, the macro expansion of load a-c*b occurs.
To avoid any possible ambiguity, users of macro processors can parenthesize formal parameters inside macro definitions, or callers can parenthesize the input parameters.[3] PL/I and C/C++ feature macros, but this facility can only manipulate text. On the other hand, homoiconic languages, such as Lisp, Prolog, and Forth, retain the power of assembly language macros because they are able to manipulate their own code as data.
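The substitution-by-name pitfall described above can be reproduced with plain text replacement. The sketch below is not any particular assembler's macro processor; `expand` is a hypothetical helper that swaps a formal parameter for the caller's argument wherever it appears as a whole word.

```python
import re


def expand(body: str, param: str, argument: str) -> str:
    """Replace every whole-word occurrence of `param` in `body` with `argument`."""
    pattern = rf"\b{re.escape(param)}\b"
    return re.sub(pattern, lambda _match: argument, body)


# Macro body as written: the argument is pasted in without regard to precedence.
print(expand("load a*b", "a", "a-c"))

# Macro body with the formal parameter parenthesized by the macro writer.
print(expand("load (a)*b", "a", "a-c"))
```

The first expansion yields `load a-c*b`, which multiplies only c by b; the second yields `load (a-c)*b`. With a = 10, c = 4 and b = 3 the two expressions evaluate to -2 and 18 respectively.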

Assembly Language Model

[Figure: the PC points into a program held in Memory; the ALU operates on the Registers]

        add r1,r2
        sub r2,r3
        cmp r3,r4
        bne I1
        sub r4,1
    I1: jmp I3

Assembly Language Instructions

Built from two pieces:

    Add R1, R3, 3

- Opcode: what to do with the data (ALU operation)
- Operands: where to get the data and put the results

Operands

Each operand is taken from a particular addressing mode. Examples:

    Register      add r1, r2, r3
    Immediate     add r1, r2, 10
    Indirect      mov r1, (r2)
    Offset        mov r1, 10(r3)
    PC Relative   beq 100

Addressing modes reflect the processor's data pathways.
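A minimal sketch of what each addressing mode means for the value that ends up being used, assuming a toy machine whose registers and memory are Python dictionaries. The names `fetch_operand` and `branch_target` are illustrative, and real ISAs differ in details such as which PC value a relative branch is measured from.

```python
def fetch_operand(mode, regs, memory, reg=None, const=0):
    """Return the operand value for one addressing mode."""
    if mode == "register":      # add r1, r2, r3 -> value held in the register
        return regs[reg]
    if mode == "immediate":     # add r1, r2, 10 -> value encoded in the instruction
        return const
    if mode == "indirect":      # mov r1, (r2)   -> memory at the address in the register
        return memory[regs[reg]]
    if mode == "offset":        # mov r1, 10(r3) -> memory at register + constant
        return memory[regs[reg] + const]
    raise ValueError(f"unknown addressing mode: {mode}")


def branch_target(pc, offset):
    """PC relative (beq 100): target is the program counter plus the offset."""
    return pc + offset


regs = {"r2": 0x40, "r3": 0x36}
memory = {0x40: 7}
print(fetch_operand("register", regs, memory, reg="r2"))
print(fetch_operand("immediate", regs, memory, const=10))
print(fetch_operand("indirect", regs, memory, reg="r2"))
print(fetch_operand("offset", regs, memory, reg="r3", const=10))
print(branch_target(200, 100))
```

In this toy setup the indirect and offset examples both reach memory cell 0x40, once through r2 directly and once through r3 plus 10.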

Types of Opcodes

- Arithmetic, logical: add, sub, mult; and, or; cmp
- Memory load/store: ld, st
- Control transfer: jmp, bne
- Complex: movs

Types of Assembly Languages

- Assembly language is closely tied to processor architecture
- At least four main types:
  - CISC: Complex Instruction-Set Computer
  - RISC: Reduced Instruction-Set Computer
  - DSP: Digital Signal Processor
  - VLIW: Very Long Instruction Word

   

CISC Assembly Language

- Developed when people wrote assembly language
- Complicated, often specialized instructions with many effects
- Examples from the x86 architecture:
  - String move
  - Procedure enter, leave
- Many, complicated addressing modes
- So complicated, often executed by a little program (microcode)

RISC Assembly Language

- Response to growing use of compilers
- Easier-to-target, uniform instruction sets
- Make the most common operations as fast as possible
- Load-store architecture:
  - Arithmetic only performed on registers
  - Memory load/store instructions for memory-register transfers
- Designed to be pipelined

DSP Assembly Language

- Digital signal processors designed specifically for signal processing algorithms
- Lots of regular arithmetic on vectors
- Often written by hand
- Irregular architectures to save power, area
- Substantial instruction-level parallelism

VLIW Assembly Language

- Response to growing desire for instruction-level parallelism
- Using more transistors is cheaper than running them faster
- Many parallel ALUs
- Objective: keep them all busy all the time
- Heavily pipelined
- More regular instruction set
- Very difficult to program by hand
- Looks like parallel RISC instructions

The program above, written in assembly language

    MOV AX, 47104
    MOV DS, AX
    MOV [3998], 36
    INT 32

When an assembler reads this sample program, it converts each line of code into one CPU-level instruction. This program uses two types of instructions, MOV and INT. On Intel processors, the MOV instruction moves data around, while the INT instruction transfers processor control to the device drivers or operating system. The first instruction, MOV AX, 47104, tells the computer to copy the number 47104 into the location AX. The next instruction, MOV DS, AX, tells the computer to copy the number in AX into the location DS. The next instruction, MOV [3998], 36, tells the computer to put the number 36 into memory location 3998. Finally, INT 32 exits the program by returning to the operating system.

Before we go on, I would like to explain just how this program works. Inside the CPU are a number of locations, called registers, which can store a number. Some registers, such as AX, are general purpose, and don't do anything special. Other registers, such as DS, control the way the CPU works. DS just happens to be a segment register, and is used to pick which area of memory the CPU can write to. In our program, we put the number 47104 into DS, which tells the CPU to access the memory on the video card. The next thing our program does is to put the number 36 into location 3998 of the video card's memory. Since 36 is the code for the dollar sign, and 3998 is the memory location of the bottom right hand corner of the screen, a dollar sign shows up on the screen a few microseconds later. Finally, our program tells the CPU to perform what is called an interrupt. An interrupt is used to stop one program and execute another in its place. In our case, we want interrupt 32, which ends our program and goes back to MS-DOS, or whatever other program was used to start our program.

An Assembly Language Program


;
; Program to multiply a number by the constant 6
;
        .ORIG   x3050
        LD      R1, SIX
        LD      R2, NUMBER
        AND     R3, R3, #0      ; Clear R3. It will
                                ; contain the product.
; The inner loop
;
AGAIN   ADD     R3, R3, R2
        ADD     R1, R1, #-1     ; R1 keeps track of
        BRp     AGAIN           ; the iteration.
;
        HALT
;
NUMBER  .BLKW   1
SIX     .FILL   x0006
;
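To see what the loop computes, here is a rough Python model of the same register traffic. It is a sketch, not an LC-3 simulator: the variable names simply mirror R1, R2 and R3, and results are masked to 16 bits because LC-3 registers are 16 bits wide.

```python
def multiply_by_six(number):
    """Repeated addition, following the AGAIN loop instruction by instruction."""
    r1 = 6                  # LD  R1, SIX     (loop counter)
    r2 = number & 0xFFFF    # LD  R2, NUMBER
    r3 = 0                  # AND R3, R3, #0  (clear the product)
    while True:
        r3 = (r3 + r2) & 0xFFFF   # AGAIN ADD R3, R3, R2
        r1 -= 1                   # ADD R1, R1, #-1
        if r1 <= 0:               # BRp AGAIN branches only while R1 is positive
            break
    return r3                     # HALT


print(multiply_by_six(7))
```

The product wraps around silently if it does not fit in 16 bits, just as it would in the register.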

Assembly Language Syntax

Each line of a program is one of the following:
- an instruction
- an assembler directive (or pseudo-op)
- a comment

Whitespace (between symbols) and case are ignored. Comments (beginning with ";") are also ignored.

An instruction has the following format:


    LABEL  OPCODE  OPERANDS  ; COMMENTS

LABEL and COMMENTS are optional; OPCODE and OPERANDS are mandatory.

Opcodes and Operands


Opcodes
- reserved symbols that correspond to instructions listed in Appendix A
  ex: ADD, AND, LD, LDR

Operands
- registers -- specified by Rn, where n is the register number
- numbers -- indicated by # (decimal) or x (hex)
- label -- symbolic name of memory location
- separated by comma
- number, order, and type correspond to instruction format
  ex: ADD R1,R1,R3
      ADD R1,R1,#3
      LD R6,NUMBER
      BRz LOOP

Labels and Comments


Label
- placed at the beginning of the line
- assigns a symbolic name to the address corresponding to line
  ex: LOOP ADD R1,R1,#-1
           BRp LOOP

Comment
- anything after a semicolon is a comment
- ignored by assembler
- used by humans to document/understand programs
- tips for useful comments:
  - avoid restating the obvious, as in "decrement R1"
  - provide additional insight, as in "accumulate product in R6"
  - use comments to separate pieces of program
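The line format described above (optional label, opcode, operands, optional comment) can be sketched as a small parser. This is a simplified illustration, not how any particular LC-3 assembler is written: the mnemonic set is a deliberately small subset, and `parse_line` and `is_opcode` are hypothetical names.

```python
import re

# Illustrative subset of LC-3 mnemonics; a real assembler knows the full set.
MNEMONICS = {"ADD", "AND", "NOT", "LD", "LDR", "ST", "STR", "LEA", "JMP",
             "TRAP", "HALT", "IN", "OUT", "GETC", "PUTS"}


def is_opcode(token):
    """True for a known mnemonic, a BR variant (BRz, BRnp, ...) or a directive."""
    upper = token.upper()
    return (upper in MNEMONICS
            or upper.startswith(".")
            or re.fullmatch(r"BRN?Z?P?", upper) is not None)


def parse_line(line):
    """Split one source line into (label, opcode, operands, comment)."""
    code, _, comment = line.partition(";")       # comment starts at the semicolon
    tokens = code.replace(",", " ").split()      # commas and whitespace separate symbols
    label = tokens.pop(0) if tokens and not is_opcode(tokens[0]) else None
    opcode = tokens.pop(0).upper() if tokens else None   # case is ignored
    return label, opcode, tokens, comment.strip()


print(parse_line("LOOP ADD R1,R1,#-1 ; R1 keeps track of the iteration"))
print(parse_line("     BRp LOOP"))
```

The first token is treated as a label whenever it is not a recognised opcode, which is the simplest rule that fits the format on the slide.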

Assembler Directives


Pseudo-operations
- do not refer to operations executed by program
- used by assembler
- look like instruction, but "opcode" starts with dot

Opcode     Operand               Meaning
.ORIG      address               starting address of program
.END                             end of program
.BLKW      n                     allocate n words of storage
.FILL      n                     allocate one word, initialize with value n
.STRINGZ   n-character string    allocate n+1 locations, initialize w/characters and null terminator

Trap Codes


The assembler provides "pseudo-instructions" for each trap code, so you don't have to remember them.

Code   Equivalent   Description
HALT   TRAP x25     Halt execution and print message to console.
IN     TRAP x23     Print prompt on console, read (and echo) one character from keybd. Character stored in R0[7:0].
OUT    TRAP x21     Write one character (in R0[7:0]) to console.
GETC   TRAP x20     Read one character from keyboard. Character stored in R0[7:0].
PUTS   TRAP x22     Write null-terminated string to console. Address of string is in R0.

Style Guidelines
Use the following style guidelines to improve the readability and understandability of your programs:
1. Provide a program header, with author's name, date, etc., and purpose of program.
2. Start labels, opcode, operands, and comments in same column for each line. (Unless entire line is a comment.)
3. Use comments to explain what each register does.
4. Give explanatory comment for most instructions.
5. Use meaningful symbolic names. Mixed upper and lower case for readability: ASCIItoBinary, InputRoutine, SaveR1.
6. Provide comments between program sections.
7. Each line must fit on the page -- no wraparound or truncations. Long statements split in aesthetically pleasing manner.

Sample Program


Count the occurrences of a character in a file.


Remember this? (Flowchart of the algorithm, listed step by step.)

1. Count = 0 (R2 = 0)
2. Ptr = 1st file character (R3 = M[x3012])
3. Input char from keybd (TRAP x23)
4. Load char from file (R1 = M[R3])
5. Done? (R1 ?= EOT)
   YES: Convert count to ASCII character (R0 = x30, R0 = R2 + R0), Print count (TRAP x21), HALT (TRAP x25)
   NO: go on to step 6
6. Match? (R1 ?= R0)
   YES: Incr Count (R2 = R2 + 1)
   NO: leave the count unchanged
7. Load next char from file (R3 = R3 + 1, R1 = M[R3]), then go back to step 5

Char Count in Assembly Language (1 of 3)


;
; Program to count occurrences of a character in a file.
; Character to be input from the keyboard.
; Result to be displayed on the monitor.
; Program only works if no more than 9 occurrences are found.
;
;
; Initialization
;
        .ORIG   x3000
        AND     R2, R2, #0      ; R2 is counter, initially 0
        LD      R3, PTR         ; R3 is pointer to characters
        GETC                    ; R0 gets character input
        LDR     R1, R3, #0      ; R1 gets first character
;
; Test character for end of file
;
TEST    ADD     R4, R1, #-4     ; Test for EOT (ASCII x04)
        BRz     OUTPUT          ; If done, prepare the output

Char Count in Assembly Language (2 of 3)


;
; Test character for match. If a match, increment count.
;
        NOT     R1, R1
        ADD     R1, R1, R0      ; If match, R1 = xFFFF
        NOT     R1, R1          ; If match, R1 = x0000
        BRnp    GETCHAR         ; If no match, do not increment
        ADD     R2, R2, #1
;
; Get next character from file.
;
GETCHAR ADD     R3, R3, #1      ; Point to next character.
        LDR     R1, R3, #0      ; R1 gets next char to test
        BRnzp   TEST
;
; Output the count.
;
OUTPUT  LD      R0, ASCII       ; Load the ASCII template
        ADD     R0, R0, R2      ; Convert binary count to ASCII
        OUT                     ; ASCII code in R0 is displayed.
        HALT                    ; Halt machine

Char Count in Assembly Language (3 of 3)


;
; Storage for pointer and ASCII template
;
ASCII   .FILL   x0030
PTR     .FILL   x4000
        .END
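The same algorithm can be sketched in Python to make the register roles easier to follow. This is an illustrative model, not a translation tool: memory is a plain dictionary, the "file" is assumed to end with the EOT code x04, and the function names are invented for the example. The `matches` helper mirrors the NOT / ADD / NOT sequence with 16-bit masking.

```python
EOT = 0x04  # end-of-text code that marks the end of the "file"


def to_word(value):
    """Keep only 16 bits, like an LC-3 register."""
    return value & 0xFFFF


def matches(r1, r0):
    """Mirror NOT, ADD, NOT: the result is x0000 only when R1 equals R0."""
    r1 = to_word(~r1)        # NOT R1, R1
    r1 = to_word(r1 + r0)    # ADD R1, R1, R0   (xFFFF on a match)
    r1 = to_word(~r1)        # NOT R1, R1       (x0000 on a match)
    return r1 == 0


def count_char(memory, ptr, target):
    """Count occurrences of target, then return the count as an ASCII digit."""
    count = 0                               # R2
    while memory[ptr] != EOT:               # Done? (R1 ?= EOT)
        if matches(memory[ptr], ord(target)):
            count += 1                      # Incr Count
        ptr += 1                            # point to next character
    return chr(0x30 + count)                # ASCII template x30 plus count


# Hypothetical file contents stored from x4000 upward, terminated by EOT.
text = "banana"
memory = {0x4000 + i: ord(c) for i, c in enumerate(text)}
memory[0x4000 + len(text)] = EOT
print(count_char(memory, 0x4000, "a"))
```

Adding the count to x30 yields a single digit character, which is why the program header warns that it only works for up to 9 occurrences.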

Assembly Process


Convert assembly language file (.asm) into an executable file (.obj) for the LC-3 simulator.

First Pass:
- scan program file
- find all labels and calculate the corresponding addresses; this is called the symbol table

Second Pass:
- convert instructions to machine language, using information from symbol table

First Pass: Constructing the Symbol Table


1. Find the .ORIG statement, which tells us the address of the first instruction. Initialize location counter (LC), which keeps track of the current instruction.
2. For each non-empty line in the program:
   a) If line contains a label, add label and LC to symbol table.
   b) Increment LC.
   NOTE: If statement is .BLKW or .STRINGZ, increment LC by the number of words allocated.
3. Stop when .END statement is reached.

NOTE: A line that contains only a comment is considered an empty line.
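The first-pass steps above can be sketched in a few lines of Python. Treat this as a simplified model under stated assumptions: the mnemonic set is a small subset, a token is taken to be a label whenever it is not a recognised opcode, comments are cut at the first semicolon (so a semicolon inside a .STRINGZ string is not handled), and an .END line has been added to the multiply-by-6 listing so that step 3 has something to stop on. `first_pass`, `is_opcode` and `parse_number` are hypothetical names.

```python
import re

MNEMONICS = {"ADD", "AND", "NOT", "LD", "LDR", "ST", "STR", "LEA", "JMP",
             "TRAP", "HALT", "IN", "OUT", "GETC", "PUTS"}


def is_opcode(token):
    upper = token.upper()
    return (upper in MNEMONICS or upper.startswith(".")
            or re.fullmatch(r"BRN?Z?P?", upper) is not None)


def parse_number(text):
    """Parse x3000 (hex) or #10 / 10 (decimal)."""
    text = text.lstrip("#")
    return int(text[1:], 16) if text[0] in "xX" else int(text)


def first_pass(lines):
    symbols, lc = {}, None
    for raw in lines:
        code = raw.split(";", 1)[0]              # a comment-only line becomes empty
        tokens = code.replace(",", " ").split()
        if not tokens:
            continue
        label = tokens.pop(0) if not is_opcode(tokens[0]) else None
        opcode = tokens[0].upper() if tokens else None
        if opcode == ".ORIG":                    # step 1: initialize LC
            lc = parse_number(tokens[1])
            continue
        if opcode == ".END":                     # step 3: stop
            break
        if label is not None:                    # step 2a: record label and LC
            symbols[label] = lc
        if opcode is None:                       # label alone on a line
            continue
        if opcode == ".BLKW":                    # step 2b: increment LC
            lc += parse_number(tokens[1])
        elif opcode == ".STRINGZ":
            lc += len(code.split('"')[1]) + 1    # characters plus null terminator
        else:
            lc += 1
    return symbols


PROGRAM = """
        .ORIG x3050
        LD R1, SIX
        LD R2, NUMBER
        AND R3, R3, #0   ; Clear R3
AGAIN   ADD R3, R3, R2
        ADD R1, R1, #-1
        BRp AGAIN
        HALT
NUMBER  .BLKW 1
SIX     .FILL x0006
        .END
""".splitlines()

for name, address in first_pass(PROGRAM).items():
    print(f"{name:8} x{address:04X}")
```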

Practice


Construct the symbol table for the program in Figure 7.1 (Slides 7-11 through 7-13).

Symbol      Address

Second Pass: Generating Machine Language


For each executable assembly language statement, generate the corresponding machine language instruction.
- If operand is a label, look up the address from the symbol table.

Potential problems:
- Improper number or type of arguments
  ex: NOT R1,#7
      ADD R1,R2
      ADD R3,R3,NUMBER
- Immediate argument too large
  ex: ADD R1,R2,#1023
- Address (associated with label) more than 256 from instruction
  can't use PC-relative addressing mode
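As an illustration of the second pass and of two of the problems listed above, the sketch below encodes an ADD with an immediate operand and an LD with a PC-relative offset. It assumes the standard LC-3 field layout (4-bit opcode, 3-bit registers, 5-bit immediate for ADD, 9-bit offset for LD, offset measured from the incremented PC). The function names are hypothetical, and the address x3013 for PTR is inferred from the object file listing for the character count program rather than stated on a slide.

```python
def to_field(value, bits):
    """Two's-complement encode value into a field, or fail if it does not fit."""
    low, high = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not low <= value <= high:
        raise ValueError(f"{value} does not fit in {bits} bits")
    return value & ((1 << bits) - 1)


def encode_add_imm(dr, sr1, imm):
    """ADD DR, SR1, #imm: opcode 0001, bit 5 set selects the immediate form."""
    return (0b0001 << 12) | (dr << 9) | (sr1 << 6) | (1 << 5) | to_field(imm, 5)


def encode_ld(dr, label_address, instruction_address):
    """LD DR, LABEL: opcode 0010, offset is relative to the incremented PC."""
    offset = label_address - (instruction_address + 1)
    return (0b0010 << 12) | (dr << 9) | to_field(offset, 9)


print(f"{encode_add_imm(2, 2, 1):016b}")        # ADD R2, R2, #1
print(f"{encode_ld(3, 0x3013, 0x3001):016b}")   # LD R3, PTR at x3001

try:
    encode_add_imm(1, 2, 1023)                  # immediate argument too large
except ValueError as error:
    print("error:", error)
```

The same range check catches a label that is too far away for PC-relative addressing, because the offset simply fails to fit in its 9-bit field.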

Practice


Using the symbol table constructed earlier, translate these statements into LC-3 machine language.

Statement          Machine Language
LD R3,PTR
ADD R4,R1,#-4
LDR R1,R3,#0
BRnp GETCHAR

LC-3 Assembler


Using "assemble" (Unix) or LC3Edit (Windows) generates several different output files. The object file (.obj) is the one that gets loaded into the simulator.

Object File Format




LC-3 object file contains
- Starting address (location where program must be loaded), followed by
- Machine instructions

Example: Beginning of "count character" object file looks like this:

    0011000000000000    .ORIG x3000
    0101010010100000    AND R2, R2, #0
    0010011000010001    LD R3, PTR
    1111000000100011    TRAP x23
    . . .
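The layout described above (starting address first, then the instructions) can be sketched as a byte string. One assumption is made loudly here: each 16-bit word is written high byte first (big-endian). Check the documentation of the tools you use before relying on that. `build_object_image` is a hypothetical helper name.

```python
import struct


def build_object_image(origin, words):
    """Starting address followed by machine words, each as 2 bytes (assumed big-endian)."""
    return b"".join(struct.pack(">H", word) for word in [origin, *words])


# The first words of the "count character" example, taken from the listing above.
image = build_object_image(0x3000, [
    0b0101010010100000,   # AND R2, R2, #0
    0b0010011000010001,   # LD R3, PTR
    0b1111000000100011,   # TRAP x23
])
print(image.hex())
```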

Multiple Object Files




An object file is not necessarily a complete program.
- system-provided library routines
- code blocks written by multiple developers

For LC-3 simulator, can load multiple object files into memory, then start executing at a desired address.
- system routines, such as keyboard input, are loaded automatically
  - loaded into "system memory," below x3000
  - user code should be loaded between x3000 and xFDFF
- each object file includes a starting address
- be careful not to load overlapping object files
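The last two points can be checked mechanically before loading. The sketch below is a minimal illustration, assuming each object file is described by a name, its starting address and its length in words. The file names and sizes are invented for the example.

```python
from itertools import combinations

USER_LOW, USER_HIGH = 0x3000, 0xFDFF   # user code range given above


def find_overlaps(objects):
    """Return pairs of object files whose address ranges intersect."""
    overlaps = []
    for (name1, start1, len1), (name2, start2, len2) in combinations(objects, 2):
        if start1 < start2 + len2 and start2 < start1 + len1:
            overlaps.append((name1, name2))
    return overlaps


def outside_user_space(objects):
    """Return names of object files that do not fit between x3000 and xFDFF."""
    return [name for name, start, length in objects
            if start < USER_LOW or start + length - 1 > USER_HIGH]


# Hypothetical object files: (name, starting address, length in words).
objects = [("count.obj", 0x3000, 0x14), ("helper.obj", 0x3010, 4), ("data.obj", 0x4000, 10)]
print(find_overlaps(objects))
print(outside_user_space(objects))
```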

Linking and Loading




Loading is the process of copying an executable image into memory.
- more sophisticated loaders are able to relocate images to fit into available memory
- must readjust branch targets, load/store addresses

Linking is the process of resolving symbols between independent object files.
- suppose we define a symbol in one module, and want to use it in another
- some notation, such as .EXTERNAL, is used to tell assembler that a symbol is defined in another module
- linker will search symbol tables of other modules to resolve symbols and complete code generation before loading

Programs: Data Transfer Instructions

- The contents of register AX are replaced by the contents of AX plus the immediate operand: (AX) <- (AX) + Data
- The contents of BX are replaced by the contents of the memory location whose effective address is the contents of BP plus the displacement: (BX) <- ((BP) + DISP)
- MOV, LEA, LDS, LES (depending on addressing mode)

Examples of Mov Ins




    MOV DS,AX           ; MOVE (AX) TO DS
    MOV AL,'E'          ; MOVE ASCII E TO AL
    MOV [BX],DX         ; MOVE (DX) INTO LOCATION INDICATED BY (BX)
    MOV AX,Y[BP][SI]    ; MOVE INTO AX THE CONTENTS OF THE LOCATION FOUND BY
                        ; ADDING (BP), (SI) AND THE OFFSET OF Y
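The address arithmetic behind the last example, MOV AX,Y[BP][SI], amounts to a 16-bit sum. The sketch below is illustrative only: the register and offset values are invented, and the function name is hypothetical. Offsets on the 8086 are 16 bits wide, so the sum wraps around at 64K.

```python
def effective_address(displacement, bp, si):
    """Offset of the operand: displacement (offset of Y) + (BP) + (SI), modulo 2**16."""
    return (displacement + bp + si) & 0xFFFF


# Invented values: Y at offset 0010h, BP = 0100h, SI = 0004h.
print(f"{effective_address(0x0010, 0x0100, 0x0004):04X}")
```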

Program sequence for interchanging the contents of two memory locations

    MOV AL,OPR1     ; (OPR1) TO AL
    MOV BL,OPR2     ; (OPR2) TO BL
    MOV OPR2,AL     ; (AL) TO (OPR2)
    MOV OPR1,BL     ; (BL) TO (OPR1)

Machine language code for the program sequence

PHYSICAL ADDRESS    MACHINE CODE
0C530               8A
0C531               06
0C532               00
0C533               12
0C534               8A
0C535               1E
0C536               20
0C537               1C
0C538               88
0C539               06
0C53A               20
0C53B               1C
0C53C               88
0C53D               1E
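The byte listing can be read back into instructions with a very small decoder. This sketch handles only the two forms that appear here: opcode 8A (byte move from memory to register) and opcode 88 (byte move from register to memory), both with a direct 16-bit address stored low byte first. Reading the bytes this way suggests OPR1 sits at offset 1200h and OPR2 at 1C20h; that is an inference from the listing, not something the slide states, and `decode_mov_direct` is a hypothetical name.

```python
REG8 = ["AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"]   # 8-bit register codes


def decode_mov_direct(code):
    """Decode a 4-byte MOV between an 8-bit register and a direct memory address."""
    opcode, modrm, low, high = code[:4]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0 or rm != 6:
        raise ValueError("only direct addressing is handled in this sketch")
    address = low | (high << 8)          # the address is stored low byte first
    if opcode == 0x8A:
        return f"MOV {REG8[reg]},[{address:04X}h]"
    if opcode == 0x88:
        return f"MOV [{address:04X}h],{REG8[reg]}"
    raise ValueError(f"opcode {opcode:02X} is not handled in this sketch")


listing = bytes.fromhex("8A0600128A1E201C8806201C")   # first three instructions
for start in range(0, len(listing), 4):
    print(decode_mov_direct(listing[start:start + 4]))
```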

Exchange


XCHG works like MOV, but instead of duplicating the source in the destination, the two quantities are interchanged. Because immediate operands cannot be changed, neither operand of XCHG can be immediate.

    MOV AL,OPR1     ; LOAD AL WITH (OPR1)
    XCHG AL,OPR2    ; EXCHANGE (AL) WITH (OPR2)
    MOV OPR1,AL     ; MOVE (AL) TO OPR1

LEA, LDS, LES

    LEA SI,COL[BX]      ; load SI with the address formed by adding the
                        ; contents of BX to the offset of COL

    LEA BX,ARRAY
    MOV SI,0
    MOV AX,[BX][SI]     ; contents of the word beginning at ARRAY are
                        ; transferred to AX

LDS and LES work alike: the former loads the DS register from memory, the latter loads ES from memory. Both also load a second, non-segment register from memory, and neither instruction affects the flags.

    LDS SI,STRING_SOURCE_POINTER
    LES DI,TABLE[BX]

They facilitate switching data segments by simultaneously loading a base or index register and a segment register.

Data transfer instructions

Name        Mnemonic & Format    Description
Move        MOV DST,SRC          DST <- SRC
LEA         LEA REG,SRC          REG <- SRC
LDS         LDS REG,SRC          REG <- SRC, DS <- SRC+2
Exchange    XCHG OPR1,OPR2       OPR1 <-> OPR2

Binary Addition & Subtraction

Name              Mnemonic & Format    Description
Add               ADD DST,SRC          DST <- SRC+DST
Add with carry    ADC DST,SRC          DST <- SRC+DST+CF

Example for single precision: W <- X+Y+24-Z

    MOV AX,X    ; PUT X IN AX
    ADD AX,Y    ; ADD Y TO AX
    ADD AX,24   ; ADD 24 TO SUM
    SUB AX,Z    ; SUB Z FROM X+Y+24
    MOV W,AX    ; STORE RESULT IN W

Double precision computation

DPSUM <- DP1+DP2

    MOV AX,DP1          ; ADD THE LOW-ORDER
    ADD AX,DP2          ; WORDS IN DP1 & DP2
    MOV DPSUM,AX        ; AND PUT SUM IN DPSUM
    MOV AX,DP1+2        ; ADD THE HIGH-ORDER WORDS
    ADC AX,DP2+2        ; IN DP1+2 AND DP2+2 WITH
    MOV DPSUM+2,AX      ; CARRY AND PUT SUM IN DPSUM+2

Example for double precision

W <- X+Y+24-Z

    MOV AX,X        ; ADD THE DOUBLE PRECISION
    MOV DX,X+2      ; NO. IN X & X+2
    ADD AX,Y        ; TO THE DOUBLE PRECISION
    ADC DX,Y+2      ; NO. IN Y & Y+2
    ADD AX,24       ; ADD THE DOUBLE PRECISION
    ADC DX,0        ; NO. 24 TO THE RESULT
    SUB AX,Z        ; SUBTRACT THE DOUBLE PRECISION
    SBB DX,Z+2      ; NO. IN Z & Z+2 FROM THE SUM
    MOV W,AX        ; STORE THE OVERALL
    MOV W+2,DX      ; RESULT IN W & W+2
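To see why the carry-aware instructions ADC and SBB are needed, the double-precision sequence can be modelled with 16-bit halves in Python. This is a sketch with invented helper names: `add16` and `sub16` return the 16-bit result together with the carry (or borrow) flag, and `compute_w` follows the instruction sequence with AX holding the low-order word and DX the high-order word.

```python
MASK16 = 0xFFFF


def add16(a, b, carry_in=0):
    """16-bit add; returns (result, carry flag). carry_in models ADC."""
    total = a + b + carry_in
    return total & MASK16, total >> 16


def sub16(a, b, borrow_in=0):
    """16-bit subtract; returns (result, borrow flag). borrow_in models SBB."""
    diff = a - b - borrow_in
    return diff & MASK16, 1 if diff < 0 else 0


def split(value):
    """Low-order and high-order 16-bit words of a 32-bit value."""
    return value & MASK16, (value >> 16) & MASK16


def compute_w(x, y, z):
    ax, dx = split(x)                 # MOV AX,X / MOV DX,X+2
    y_low, y_high = split(y)
    z_low, z_high = split(z)
    ax, cf = add16(ax, y_low)         # ADD AX,Y
    dx, _ = add16(dx, y_high, cf)     # ADC DX,Y+2
    ax, cf = add16(ax, 24)            # ADD AX,24
    dx, _ = add16(dx, 0, cf)          # ADC DX,0
    ax, cf = sub16(ax, z_low)         # SUB AX,Z
    dx, _ = sub16(dx, z_high, cf)     # SBB DX,Z+2
    return (dx << 16) | ax            # MOV W,AX / MOV W+2,DX


# Invented operands chosen so that the low-order add carries into the high word.
print(hex(compute_w(0x0001FFFF, 0x00000001, 0x00000019)))
```

Replacing ADC with a plain ADD in this model would drop the carry out of the low-order word, which is exactly the error the instruction exists to prevent.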
