
Compilers

A compiler accepts a program written in a high-level language and converts it into machine language. More precisely, a compiler is a computer program (or set of programs) that transforms source code written in a programming language (the source language) into another computer language (the target language, often having a binary form known as object code). The most common reason for transforming source code is to create an executable program.

Structure of a Compiler

The front end
1. Checks whether the program is correctly written in terms of the programming language's syntax and semantics.
2. Legal and illegal programs are recognized; errors, if any, are reported in a useful way.
3. The front end then generates an intermediate representation (IR) of the source code for processing by the middle end.

The middle end
1. Optimization takes place.
2. Removal of useless or unreachable code.
3. Discovery and propagation of constant values; relocation.
4. The middle end generates another IR for the following back end.

The back end
1. Responsible for translating the IR from the middle end into object code.
2. Register allocation.

Statement of the Problem

Compiler
The tasks of the compiler, in the order treated below:
1. Recognizing basic symbols (lexical analysis)
2. Recognizing syntactic units and interpreting meaning (syntactical analysis)
3. Intermediate form
   - Arithmetic statements (parse tree, matrix)
   - Non-arithmetic statements (GO TO, DO, IF; matrix form)
   - Non-executable statements (DECLARE etc.)
4. Storage allocation (identifier table)
5. Code generation

Lexical Analysis

The first phase of the compiler; its job is to isolate the words/tokens of the source program.

Examples of tokens:
- Keywords: while, procedure, var, for, ...
- Identifiers declared by the programmer
- Operators: +, -, *, /, <>
- Numeric literals such as 124, 12.35, 0.09E-23, etc.
- Character constants
- Special characters
- Comments

Recognizing basic elements

The source program is scanned sequentially, and basic elements/tokens are recognized as identifiers, literals, or terminal symbols (operators and keywords). The lexical process can be done in one continuous pass through the data by creating an intermediate form of the program consisting of a chain or table of tokens. The lexical process discards comments, since they have no effect on the processing of the program.
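Below is a minimal sketch, in Python, of such a single-pass scanner. The token classes and regular expressions are illustrative assumptions, not the token set of any particular compiler.

import re

# Illustrative token specification: one regular expression per token class.
TOKEN_SPEC = [
    ("NUMBER",   r"\d+(?:\.\d+)?(?:[Ee][+-]?\d+)?"),   # 124, 12.35, 0.09E-23
    ("KEYWORD",  r"\b(?:while|procedure|var|for|if|do)\b"),
    ("IDENT",    r"[A-Za-z_][A-Za-z0-9_]*"),            # programmer-declared names
    ("OPERATOR", r":=|<>|[+\-*/=<>]"),
    ("PUNCT",    r"[();,]"),
    ("COMMENT",  r"\{[^}]*\}"),                         # Pascal-style comment
    ("SKIP",     r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))

def tokenize(source):
    """One continuous pass over the source, building a chain/table of tokens."""
    tokens = []
    for match in MASTER.finditer(source):
        kind = match.lastgroup
        if kind in ("SKIP", "COMMENT"):                 # comments are discarded
            continue
        tokens.append((kind, match.group()))
    return tokens

print(tokenize("COST := RATE * (START - FINISH); { compute cost }"))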

Syntax and semantics


Syntax refers to formal rules governing the construction of valid statements in a language. Semantics refers to the set of rules which give the meaning of a statement. Errors due to syntax occur in a program when rules of the programming language are violated or misused. Errors due to semantics occur in a program when statements are not meaningful.

Syntax Analysis
Once the program has been broken down into tokens (uniform symbols), the compiler must (1) recognize the phrases (syntactic constructions), each of which is a string of tokens with an associated meaning, and (2) interpret the meaning of each construction. The first step is concerned solely with recognizing, and thus separating, the basic syntactic constructs in the source program; it also notes syntactic errors and assures some sort of recovery. Once the syntax of a statement is ascertained, the second step is to interpret its meaning (semantics).

Intermediate Form
Once the syntactic construction has been determined, the compiler could generate object code directly for each construction. Instead, however, the compiler generates an intermediate form of the source program. The intermediate form affords two advantages:
1. It facilitates optimization of the code.
2. It allows a logical separation between the machine-independent phases (lexical analysis, syntax analysis, interpretation) and the machine-dependent phases (code generation and assembly).

Using an intermediate form raises two questions: What form should it take? What are the rules for converting source code into that form? The form chosen depends on the syntactic construction, e.g., arithmetic, non-arithmetic, or non-executable statements.

ARITHMETIC STATEMENTS
One intermediate form for an arithmetic statement is a parse tree. Rules for converting an arithmetic statement into a parse tree:
1. Any variable is a terminal node of the tree.
2. For every operator, construct (in the order dictated by the rules of algebra) a binary (two-branched) tree whose left branch is the tree for operand 1 and whose right branch is the tree for operand 2.
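As a rough illustration of these two rules, the Python sketch below builds such a tree for a small expression with a hand-written recursive-descent routine. The class and function names are invented for the example.

class Node:
    """A parse-tree node: a variable is a leaf, an operator has two branches."""
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right
    def __repr__(self):
        if self.left is None:                 # terminal node (a variable)
            return self.value
        return f"({self.left} {self.value} {self.right})"

def parse(tokens):
    """Build the tree, giving '*' and '/' precedence over '+' and '-'."""
    def factor():
        tok = tokens.pop(0)
        if tok == "(":
            node = expr()
            tokens.pop(0)                     # drop the ')'
            return node
        return Node(tok)                      # variable: terminal node
    def term():
        node = factor()
        while tokens and tokens[0] in "*/":
            op = tokens.pop(0)
            node = Node(op, node, factor())   # binary tree for the operator
        return node
    def expr():
        node = term()
        while tokens and tokens[0] in "+-":
            op = tokens.pop(0)
            node = Node(op, node, term())
        return node
    return expr()

# RATE * (START - FINISH), tokens as produced by the lexical phase
tree = parse(["RATE", "*", "(", "START", "-", "FINISH", ")"])
print(tree)    # (RATE * (START - FINISH))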

Although this picture makes it easy for us to visualize the structure of the statement, it is not a practical intermediate form for the compiler. The compiler may instead use a linear representation of the parse tree called a matrix. In a matrix, the operations of the program are listed sequentially in the order in which they would be executed; each entry has one operator and two operands.
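A minimal sketch of the linearization, assuming the parse tree is written as nested Python tuples: a post-order walk emits one matrix entry (operator, operand 1, operand 2) per operator, and later entries refer to earlier results as M1, M2, and so on.

def to_matrix(node, matrix):
    """Post-order walk: children first, then one matrix entry per operator node."""
    if isinstance(node, str):              # terminal node: a variable or constant
        return node
    op, left, right = node
    op1 = to_matrix(left, matrix)
    op2 = to_matrix(right, matrix)
    matrix.append((op, op1, op2))
    return f"M{len(matrix)}"               # later entries refer back to this result

# Parse tree for COST = RATE * (START - FINISH), written as nested tuples
tree = ("=", "COST", ("*", "RATE", ("-", "START", "FINISH")))
matrix = []
to_matrix(tree, matrix)
for i, entry in enumerate(matrix, 1):
    print(i, *entry)
# 1 - START FINISH
# 2 * RATE M1
# 3 = COST M2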

NON ARITHMETIC STATEMENTS


The problem of creating an accurate intermediate form for non-arithmetic statements, such as DO, IF, and GO TO, is much the same as for arithmetic ones; these statements, too, are put into matrix form.

Non-Executable Statements

Non-executable statements such as DECLARE or FORTRAN's DIMENSION give the compiler information that clarifies the referencing or allocation of variables and their associated storage. There is no intermediate form for these statements. Instead, the information contained in a non-executable statement is entered into tables, to be used by other parts of the compiler.

Storage Allocation


Code Generation
Once the compiler has generated the matrix and the tables of supporting information, it may generate the object code. One scheme is to have a table defining each type of matrix operation together with the associated object code. The code-generation phase scans the matrix and generates, for each entry, the code defined in the table, using the operands of the matrix entry to further specialize the code.
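The sketch below illustrates that scheme in Python. The code-definition table and the pseudo-assembly mnemonics (L, A, S, M, ST) are illustrative assumptions for a simple register machine, not the code of any real target.

# Illustrative code-definition table: one pseudo-assembly template per matrix
# operator. {a} and {b} are the two operands of the entry; the result of
# matrix entry i is assumed to be left in register Ri.
CODE_TABLE = {
    "-": ["L  R{i},{a}", "S  R{i},{b}"],    # load, subtract
    "+": ["L  R{i},{a}", "A  R{i},{b}"],    # load, add
    "*": ["L  R{i},{a}", "M  R{i},{b}"],    # load, multiply
    "=": ["L  R{i},{b}", "ST R{i},{a}"],    # store the result into the variable
}

def reg(operand):
    """Map a reference such as 'M3' to the register R3 holding that result."""
    if operand.startswith("M") and operand[1:].isdigit():
        return "R" + operand[1:]
    return operand

def generate(matrix):
    """Scan the matrix and emit, for each entry, the code defined in the table."""
    code = []
    for i, (op, op1, op2) in enumerate(matrix, 1):
        for template in CODE_TABLE[op]:
            code.append(template.format(i=i, a=reg(op1), b=reg(op2)))
    return code

matrix = [("-", "START", "FINISH"), ("*", "RATE", "M1"), ("=", "COST", "M2")]
print("\n".join(generate(matrix)))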

1. Was it a good idea to generate code directly from the matrix? (lines 1 and 4: redundant code)
2. Have we made the best use of the machine? (lines 12 and 13)
3. Can we generate machine language directly?

Issues of Optimality
The first two of these questions are issues of optimization:
- Optimality of the matrix as an intermediate form (machine-independent).
- Optimality of the actual machine code (machine-dependent).

Machine Independent Optimization


COST = RATE * (START - FINISH) + 2 * RATE * (START - FINISH - 100);

The matrix for this statement, before optimization:

Line  Operator  Operand 1  Operand 2
1     -         START      FINISH
2     *         RATE       M1
3     *         2          RATE
4     -         START      FINISH
5     -         M4         100
6     *         M3         M5
7     +         M2         M6
8     =         COST       M7

After eliminating the common subexpression START - FINISH (line 4 merely recomputes the result of line 1), the matrix becomes:

Line  Operator  Operand 1  Operand 2
1     -         START      FINISH
2     *         RATE       M1
3     *         2          RATE
4     (eliminated)
5     -         M1         100
6     *         M3         M5
7     +         M2         M6
8     =         COST       M7
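The redundant recomputation of START - FINISH can be detected mechanically. The following is a rough Python sketch of common-subexpression elimination over such a matrix; the list-of-tuples representation and the function name are assumptions made for this illustration.

def eliminate_common_subexpressions(matrix):
    """Replace a matrix entry that repeats an earlier computation with a
    reference to the earlier result, then drop the duplicate entry."""
    seen = {}        # (operator, operand1, operand2) -> name of earlier result
    alias = {}       # eliminated result name -> surviving result name
    optimized = []
    for i, (op, op1, op2) in enumerate(matrix, 1):
        op1, op2 = alias.get(op1, op1), alias.get(op2, op2)
        key = (op, op1, op2)
        if op != "=" and key in seen:
            alias[f"M{i}"] = seen[key]      # reuse the earlier result
            optimized.append(None)          # entry eliminated
        else:
            seen[key] = f"M{i}"
            optimized.append((op, op1, op2))
    return optimized

matrix = [("-", "START", "FINISH"), ("*", "RATE", "M1"), ("*", "2", "RATE"),
          ("-", "START", "FINISH"), ("-", "M4", "100"), ("*", "M3", "M5"),
          ("+", "M2", "M6"), ("=", "COST", "M7")]
for i, entry in enumerate(eliminate_common_subexpressions(matrix), 1):
    print(i, *(entry or ("(eliminated)",)))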

Other Steps
- Compile-time computation of operations both of whose operands are constants (see the sketch after this list).
- Movement of computations involving non-varying operands out of loops.
- Use of the properties of Boolean expressions to minimize their computation (logical operations).
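As an illustration of the first step, compile-time computation of constant operations (constant folding), here is a minimal Python sketch over the same illustrative matrix representation used above.

import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}

def is_constant(operand):
    try:
        float(operand)
        return True
    except ValueError:
        return False

def fold_constants(matrix):
    """Evaluate at compile time any entry whose operands are both constants,
    and substitute the computed value wherever that entry is referenced."""
    value_of = {}                           # 'Mi' -> constant computed at compile time
    folded = []
    for i, (op, op1, op2) in enumerate(matrix, 1):
        op1, op2 = value_of.get(op1, op1), value_of.get(op2, op2)
        if op in OPS and is_constant(op1) and is_constant(op2):
            value_of[f"M{i}"] = str(OPS[op](float(op1), float(op2)))
            folded.append(None)             # no run-time code needed for this entry
        else:
            folded.append((op, op1, op2))
    return folded

# 2 * 3 is computed at compile time; the result is substituted into entry 2.
print(fold_constants([("*", "2", "3"), ("+", "RATE", "M1")]))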

Machine Dependent Optimization


- Use as many registers as possible.
- Store intermediate results only when required.
- Use shorter and faster instructions (e.g., MR in place of M).

Memory space and execution time can both be reduced. This optimization is typically done during code generation.

Assembly Phase
The main tasks are generating code, and defining labels and resolving all references. Operationally, the assembly phase is similar to pass 2 of an assembler.

Structure of a Compiler

Phases: Lexical Analysis, Syntax Analysis, Interpretation, Machine-Independent Optimization, Storage Assignment, Code Selection, Assembly and Output.

Forms of the program passed between the phases: Source Code, Uniform Symbol Table, Matrix, Optimized Matrix, Assembly Code, Relocatable Object Code.

Supporting databases: Terminal Table, Identifier Table, Literal Table, Reductions, Code Production.

Machine-Independent Compiler Features

The machine-independent features described here are:
- The method for handling structured variables such as arrays.
- The problems involved in compiling block-structured code.

STRUCTURED VARIABLES
The structured variables discussed here are arrays, records, and strings.

Arrays: consider array declarations in Pascal.

(i) Single-dimension array:

A: ARRAY [1..10] OF INTEGER


If each integer variable occupies one word of memory, then we require 10 words of memory to store this array.

In general, a single-dimension declaration is ARRAY [i..u] OF INTEGER, and the memory allocated is (u - i + 1) words.

(ii) Two-dimension array:

B: ARRAY [0..3, 1..3] OF INTEGER

For this declaration the total memory required is: 0 to 3 gives 4 values, 1 to 3 gives 3 values, so 4 x 3 = 12 memory words.

In general, ARRAY [l1..u1, l2..u2] OF INTEGER requires (u1 - l1 + 1) * (u2 - l2 + 1) memory words. The data can be stored in memory in two different ways: row-major order and column-major order.

All array elements that have the same value of the first subscript are stored in contiguous locations; this is called row-major order. The storage of B: ARRAY [0..3, 1..3] in the two orders is shown below.

Row-major order (row 0, then rows 1, 2, 3):
(0,1) (0,2) (0,3)   (1,1) (1,2) (1,3)   (2,1) (2,2) (2,3)   (3,1) (3,2) (3,3)

Column-major order (column 1, then columns 2, 3):
(0,1) (1,1) (2,1) (3,1)   (0,2) (1,2) (2,2) (3,2)   (0,3) (1,3) (2,3) (3,3)

Element reference: to refer to an element, we must calculate the address of the element relative to the base address of the array. Indexed addressing modes make it easier to access the desired array element.
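For row-major storage, the relative address can be computed as in the following Python sketch (one word per element; the function name is invented for illustration).

def row_major_offset(i, j, l1, u1, l2, u2, word_size=1):
    """Offset of B[i, j] from the base address of B: ARRAY [l1..u1, l2..u2],
    stored in row-major order with word_size memory words per element.
    (u1 is not needed for the offset; it is listed only to mirror the declaration.)"""
    columns = u2 - l2 + 1                      # elements per row
    return ((i - l1) * columns + (j - l2)) * word_size

# B: ARRAY [0..3, 1..3] OF INTEGER -- B[2, 1] is the 7th element, so offset 6.
print(row_major_offset(2, 1, 0, 3, 1, 3))      # 6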

Storage Allocation
Static allocation vs. dynamic allocation.

Static allocation: variables and temporaries, including the one used to save the return address, are assigned fixed addresses within the program. This type of storage assignment is called static allocation.

Dynamic allocation: when a procedure may be invoked recursively, it is necessary to preserve the previous values of any variables used by the subroutine, including parameters, temporaries, return addresses, and register save areas. This can be accomplished with a dynamic storage allocation technique.

Recursive invocation of a procedure using static storage allocation


Storage Allocation (cont.)


With the dynamic storage allocation technique, each procedure call creates an activation record that contains storage for all the variables used by the procedure. Activation records are typically allocated on a stack.
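A rough Python model of this push/pop discipline follows; the record fields and the example procedure are invented for illustration and do not reflect any particular compiler's layout.

stack = []                               # the run-time stack of activation records

def call(procedure, **arguments):
    """Prologue: push a new activation record, linked to the caller's record."""
    record = {"proc": procedure.__name__,
              "params": arguments,       # parameters and locals live in the record
              "locals": {},
              "caller": stack[-1] if stack else None}
    stack.append(record)
    result = procedure(record, **arguments)
    stack.pop()                          # epilogue: delete the current record
    return result

def factorial(record, n):
    # Each recursive invocation gets its own record, so each level keeps
    # its own value of n and its own temporaries.
    record["locals"]["n"] = n
    if n <= 1:
        return 1
    return n * call(factorial, n=n - 1)

print(call(factorial, n=5))              # 120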


Recursive invocation of a procedure using automatic storage allocation


Storage Allocation (cont.)


The compiler must generate additional code to manage the activation records themselves.

Prologue: at the beginning of each procedure there must be code to create a new activation record, linking it to the previous one and setting the appropriate pointers.

Epilogue: at the end of the procedure, there must be code to delete the current activation record, resetting pointers as needed.

Block-Structured Languages
A block is a portion of a program that has the ability to declare its own identifiers (e.g., a procedure). Blocks may be nested within other blocks.
When a reference to an identifier appears in the source program, the compiler must first check the symbol table for a definition of that identifier by the current block. If no such definition is found, the compiler looks for a definition by the block that surrounds the current one, and so on outward through the nesting.
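A minimal Python sketch of that outward search through nested scopes; the chained per-block symbol tables and the class name are invented for the illustration. (The "display" referred to in the figures below is a related mechanism: an array of pointers, indexed by nesting level, to the block currently accessible at each level.)

class Block:
    """Symbol table for one block, chained to the table of the enclosing block."""
    def __init__(self, name, enclosing=None):
        self.name = name
        self.symbols = {}            # identifiers declared by this block
        self.enclosing = enclosing   # surrounding block, or None for the outermost

    def define(self, identifier, info):
        self.symbols[identifier] = info

    def lookup(self, identifier):
        # Check the current block first, then each surrounding block in turn.
        block = self
        while block is not None:
            if identifier in block.symbols:
                return block.name, block.symbols[identifier]
            block = block.enclosing
        raise NameError(f"{identifier} is not declared in any enclosing block")

main = Block("MAIN")
main.define("X", "integer")
inner = Block("SUB", enclosing=main)
inner.define("Y", "real")
print(inner.lookup("Y"))    # ('SUB', 'real'): found in the current block
print(inner.lookup("X"))    # ('MAIN', 'integer'): found in the surrounding block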

Nesting of blocks in a source program


Use of display for above procedure

