
Compilers

A compiler accepts a program written in a high-level language and converts it into machine language. More precisely, a compiler is a computer program (or set of programs) that transforms source code written in a programming language (the source language) into another computer language (the target language, often having a binary form known as object code). The most common reason for transforming source code is to create an executable program.

Structure of a Compiler

The front end
1. Checks whether the program is correctly written in terms of the programming language's syntax and semantics.
2. Legal and illegal programs are recognized; errors, if any, are reported in a useful way.
3. The front end then generates an intermediate representation (IR) of the source code for processing by the middle end.

The middle end
1. Optimization takes place.
2. Removal of useless or unreachable code.
3. Discovery and propagation of constant values; relocation.
4. The middle end generates another IR for the following back end.

The back end
1. Responsible for translating the IR from the middle end into object code.
2. Register allocation.

Statement of the Problem

Compiler
The tasks of the compiler, in the order treated below:
1. Recognizing basic symbols (lexical analysis)
2. Recognizing syntactic units and interpreting meaning (syntactical analysis)
3. Intermediate form
   - Arithmetic statements (parse tree, matrix)
   - Non-arithmetic statements (GO TO, DO, IF; matrix form)
   - Non-executable statements (DECLARE etc.)
4. Storage allocation (identifier table)
5. Code generation

Lexical Analysis

The first phase of the compiler; its job is to isolate the words/tokens of the source program.

Examples of tokens:
- Keywords: while, procedure, var, for, ...
- Identifiers declared by the programmer
- Operators: +, -, *, /, <>
- Numeric literals such as 124, 12.35, 0.09E-23, etc.
- Character constants
- Special characters
- Comments

Recognizing basic elements

The source program is scanned sequentially, and basic elements/tokens are recognized as identifiers, literals, or terminal symbols (operators and keywords). The lexical process can be done in one continuous pass through the data by creating an intermediate form of the program consisting of a chain or table of tokens. The lexical process discards comments, since they have no effect on the processing of the program.
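Below is a minimal sketch, in Python, of such a single-pass scanner. The token classes and regular expressions are illustrative assumptions, not the token set of any particular compiler.

import re

# Illustrative token specification: one regular expression per token class.
TOKEN_SPEC = [
    ("NUMBER",   r"\d+(?:\.\d+)?(?:[Ee][+-]?\d+)?"),   # 124, 12.35, 0.09E-23
    ("KEYWORD",  r"\b(?:while|procedure|var|for|if|do)\b"),
    ("IDENT",    r"[A-Za-z_][A-Za-z0-9_]*"),            # programmer-declared names
    ("OPERATOR", r":=|<>|[+\-*/=<>]"),
    ("PUNCT",    r"[();,]"),
    ("COMMENT",  r"\{[^}]*\}"),                         # Pascal-style comment
    ("SKIP",     r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))

def tokenize(source):
    """One continuous pass over the source, building a chain/table of tokens."""
    tokens = []
    for match in MASTER.finditer(source):
        kind = match.lastgroup
        if kind in ("SKIP", "COMMENT"):                 # comments are discarded
            continue
        tokens.append((kind, match.group()))
    return tokens

print(tokenize("COST := RATE * (START - FINISH); { compute cost }"))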

Syntax and semantics


Syntax refers to formal rules governing the construction of valid statements in a language. Semantics refers to the set of rules which give the meaning of a statement. Errors due to syntax occur in a program when rules of the programming language are violated or misused. Errors due to semantics occur in a program when statements are not meaningful.

Syntax Analysis
Once the program has been broken down into tokens (uniform symbols), the compiler must (1) recognize the phrases (syntactic constructions), each of which is a string of tokens with an associated meaning, and (2) interpret the meaning of each construction. The first step is concerned solely with recognizing, and thus separating, the basic syntactic constructs in the source program; it also notes syntactic errors and assures some sort of recovery. Once the syntax of a statement is ascertained, the second step is to interpret its meaning (semantics).

Intermediate Form
Once the syntactic construction has been determined, the compiler could generate object code directly for each construction. Instead, however, the compiler generates an intermediate form of the source program. The intermediate form affords two advantages:
1. It facilitates optimization of the code.
2. It allows a logical separation between the machine-independent phases (lexical analysis, syntax analysis, interpretation) and the machine-dependent phases (code generation and assembly).

Using an intermediate form raises two questions: What form should it take? What are the rules for converting source code into that form? The form chosen depends on the syntactic construction, e.g., arithmetic, non-arithmetic, or non-executable statements.

ARITHMETIC STATEMENTS
One intermediate form for an arithmetic statement is a parse tree. Rules for converting an arithmetic statement into a parse tree:
1. Any variable is a terminal node of the tree.
2. For every operator, construct (in the order dictated by the rules of algebra) a binary (two-branched) tree whose left branch is the tree for operand 1 and whose right branch is the tree for operand 2.
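As a rough illustration of these two rules, the Python sketch below builds such a tree for a small expression with a hand-written recursive-descent routine. The class and function names are invented for the example.

class Node:
    """A parse-tree node: a variable is a leaf, an operator has two branches."""
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right
    def __repr__(self):
        if self.left is None:                 # terminal node (a variable)
            return self.value
        return f"({self.left} {self.value} {self.right})"

def parse(tokens):
    """Build the tree, giving '*' and '/' precedence over '+' and '-'."""
    def factor():
        tok = tokens.pop(0)
        if tok == "(":
            node = expr()
            tokens.pop(0)                     # drop the ')'
            return node
        return Node(tok)                      # variable: terminal node
    def term():
        node = factor()
        while tokens and tokens[0] in "*/":
            op = tokens.pop(0)
            node = Node(op, node, factor())   # binary tree for the operator
        return node
    def expr():
        node = term()
        while tokens and tokens[0] in "+-":
            op = tokens.pop(0)
            node = Node(op, node, term())
        return node
    return expr()

# RATE * (START - FINISH), tokens as produced by the lexical phase
tree = parse(["RATE", "*", "(", "START", "-", "FINISH", ")"])
print(tree)    # (RATE * (START - FINISH))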

Although this picture makes it easy for us to visualize the structure of the statement, it is not a practical intermediate form for the compiler. The compiler may instead use a linear representation of the parse tree called a matrix. In a matrix, the operations of the program are listed sequentially in the order in which they would be executed; each entry has one operator and two operands.
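A minimal sketch of the linearization, assuming the parse tree is written as nested Python tuples: a post-order walk emits one matrix entry (operator, operand 1, operand 2) per operator, and later entries refer to earlier results as M1, M2, and so on.

def to_matrix(node, matrix):
    """Post-order walk: children first, then one matrix entry per operator node."""
    if isinstance(node, str):              # terminal node: a variable or constant
        return node
    op, left, right = node
    op1 = to_matrix(left, matrix)
    op2 = to_matrix(right, matrix)
    matrix.append((op, op1, op2))
    return f"M{len(matrix)}"               # later entries refer back to this result

# Parse tree for COST = RATE * (START - FINISH), written as nested tuples
tree = ("=", "COST", ("*", "RATE", ("-", "START", "FINISH")))
matrix = []
to_matrix(tree, matrix)
for i, entry in enumerate(matrix, 1):
    print(i, *entry)
# 1 - START FINISH
# 2 * RATE M1
# 3 = COST M2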

NON ARITHMETIC STATEMENTS


The problem of creating an accurate intermediate form for non-arithmetic statements, such as DO, IF, and GO TO, is much the same as for arithmetic ones; these statements, too, are put into matrix form.

Non-Executable Statements

Non-executable statements such as DECLARE or FORTRAN's DIMENSION give the compiler information that clarifies the referencing or allocation of variables and their associated storage. There is no intermediate form for these statements. Instead, the information contained in a non-executable statement is entered into tables, to be used by other parts of the compiler.

Storage Allocation


Code Generation
Once the compiler has generated the matrix and the tables of supporting information, it may generate the object code. One scheme is to have a table defining each type of matrix operation together with the associated object code. The code-generation phase scans the matrix and generates, for each entry, the code defined in the table, using the operands of the matrix entry to further specialize the code.
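The sketch below illustrates that scheme in Python. The code-definition table and the pseudo-assembly mnemonics (L, A, S, M, ST) are illustrative assumptions for a simple register machine, not the code of any real target.

# Illustrative code-definition table: one pseudo-assembly template per matrix
# operator. {a} and {b} are the two operands of the entry; the result of
# matrix entry i is assumed to be left in register Ri.
CODE_TABLE = {
    "-": ["L  R{i},{a}", "S  R{i},{b}"],    # load, subtract
    "+": ["L  R{i},{a}", "A  R{i},{b}"],    # load, add
    "*": ["L  R{i},{a}", "M  R{i},{b}"],    # load, multiply
    "=": ["L  R{i},{b}", "ST R{i},{a}"],    # store the result into the variable
}

def reg(operand):
    """Map a reference such as 'M3' to the register R3 holding that result."""
    if operand.startswith("M") and operand[1:].isdigit():
        return "R" + operand[1:]
    return operand

def generate(matrix):
    """Scan the matrix and emit, for each entry, the code defined in the table."""
    code = []
    for i, (op, op1, op2) in enumerate(matrix, 1):
        for template in CODE_TABLE[op]:
            code.append(template.format(i=i, a=reg(op1), b=reg(op2)))
    return code

matrix = [("-", "START", "FINISH"), ("*", "RATE", "M1"), ("=", "COST", "M2")]
print("\n".join(generate(matrix)))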

1. Was it a good idea to generate code directly from the matrix? (lines 1 and 4: redundant code)
2. Have we made the best use of the machine? (lines 12 and 13)
3. Can we generate machine language directly?

Issues of Optimality
The first two of these questions are issues of optimization:
- Optimality of the matrix as an intermediate form (machine-independent).
- Optimality of the actual machine code (machine-dependent).

Machine Independent Optimization


COST = RATE * (START - FINISH) + 2 * RATE * (START - FINISH - 100);

The matrix for this statement, before optimization:

Line  Operator  Operand 1  Operand 2
1     -         START      FINISH
2     *         RATE       M1
3     *         2          RATE
4     -         START      FINISH
5     -         M4         100
6     *         M3         M5
7     +         M2         M6
8     =         COST       M7

After eliminating the common subexpression START - FINISH (line 4 merely recomputes the result of line 1), the matrix becomes:

Line  Operator  Operand 1  Operand 2
1     -         START      FINISH
2     *         RATE       M1
3     *         2          RATE
4     (eliminated)
5     -         M1         100
6     *         M3         M5
7     +         M2         M6
8     =         COST       M7
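The redundant recomputation of START - FINISH can be detected mechanically. The following is a rough Python sketch of common-subexpression elimination over such a matrix; the list-of-tuples representation and the function name are assumptions made for this illustration.

def eliminate_common_subexpressions(matrix):
    """Replace a matrix entry that repeats an earlier computation with a
    reference to the earlier result, then drop the duplicate entry."""
    seen = {}        # (operator, operand1, operand2) -> name of earlier result
    alias = {}       # eliminated result name -> surviving result name
    optimized = []
    for i, (op, op1, op2) in enumerate(matrix, 1):
        op1, op2 = alias.get(op1, op1), alias.get(op2, op2)
        key = (op, op1, op2)
        if op != "=" and key in seen:
            alias[f"M{i}"] = seen[key]      # reuse the earlier result
            optimized.append(None)          # entry eliminated
        else:
            seen[key] = f"M{i}"
            optimized.append((op, op1, op2))
    return optimized

matrix = [("-", "START", "FINISH"), ("*", "RATE", "M1"), ("*", "2", "RATE"),
          ("-", "START", "FINISH"), ("-", "M4", "100"), ("*", "M3", "M5"),
          ("+", "M2", "M6"), ("=", "COST", "M7")]
for i, entry in enumerate(eliminate_common_subexpressions(matrix), 1):
    print(i, *(entry or ("(eliminated)",)))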

Other Steps
- Compile-time computation of operations both of whose operands are constants (see the sketch after this list).
- Movement of computations involving non-varying operands out of loops.
- Use of the properties of Boolean expressions to minimize their computation (logical operations).
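As an illustration of the first step, compile-time computation of constant operations (constant folding), here is a minimal Python sketch over the same illustrative matrix representation used above.

import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}

def is_constant(operand):
    try:
        float(operand)
        return True
    except ValueError:
        return False

def fold_constants(matrix):
    """Evaluate at compile time any entry whose operands are both constants,
    and substitute the computed value wherever that entry is referenced."""
    value_of = {}                           # 'Mi' -> constant computed at compile time
    folded = []
    for i, (op, op1, op2) in enumerate(matrix, 1):
        op1, op2 = value_of.get(op1, op1), value_of.get(op2, op2)
        if op in OPS and is_constant(op1) and is_constant(op2):
            value_of[f"M{i}"] = str(OPS[op](float(op1), float(op2)))
            folded.append(None)             # no run-time code needed for this entry
        else:
            folded.append((op, op1, op2))
    return folded

# 2 * 3 is computed at compile time; the result is substituted into entry 2.
print(fold_constants([("*", "2", "3"), ("+", "RATE", "M1")]))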

Machine Dependent Optimization


- Use as many registers as possible.
- Store intermediate results only when required.
- Use shorter and faster instructions (e.g., MR in place of M).

Memory space and execution time can both be reduced. This optimization is typically done during code generation.

Assembly Phase
The main tasks are generating code, and defining labels and resolving all references. Operationally, the assembly phase is similar to pass 2 of an assembler.

Structure of a Compiler

Phases: Lexical Analysis, Syntax Analysis, Interpretation, Machine-Independent Optimization, Storage Assignment, Code Selection, Assembly and Output.

Forms of the program passed between the phases: Source Code, Uniform Symbol Table, Matrix, Optimized Matrix, Assembly Code, Relocatable Object Code.

Supporting databases: Terminal Table, Identifier Table, Literal Table, Reductions, Code Production.

Machine-Independent Compiler Features

The machine-independent features described here are:
- The method for handling structured variables such as arrays.
- The problems involved in compiling block-structured code.

STRUCTURED VARIABLES
The structured variables discussed here are arrays, records, and strings.

Arrays: consider array declarations in Pascal.

(i) Single-dimension array:

A: ARRAY [1..10] OF INTEGER


If each integer variable occupies one word of memory, then we require 10 words of memory to store this array.

In general, a single-dimension declaration is ARRAY [i..u] OF INTEGER, and the memory allocated is (u - i + 1) words.

(ii) Two-dimension array:

B: ARRAY [0..3, 1..3] OF INTEGER

For this declaration the total memory required is: 0 to 3 gives 4 values, 1 to 3 gives 3 values, so 4 x 3 = 12 memory words.

In general, ARRAY [l1..u1, l2..u2] OF INTEGER requires (u1 - l1 + 1) * (u2 - l2 + 1) memory words. The data can be stored in memory in two different ways: row-major order and column-major order.

All array elements that have the same value of the first subscript are stored in contiguous locations; this is called row-major order. The storage of B: ARRAY [0..3, 1..3] in the two orders is shown below.

Row-major order (row 0, then rows 1, 2, 3):
(0,1) (0,2) (0,3)   (1,1) (1,2) (1,3)   (2,1) (2,2) (2,3)   (3,1) (3,2) (3,3)

Column-major order (column 1, then columns 2, 3):
(0,1) (1,1) (2,1) (3,1)   (0,2) (1,2) (2,2) (3,2)   (0,3) (1,3) (2,3) (3,3)

Element reference: to refer to an element, we must calculate the address of the element relative to the base address of the array. Indexed addressing modes make it easier to access the desired array element.
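For row-major storage, the relative address can be computed as in the following Python sketch (one word per element; the function name is invented for illustration).

def row_major_offset(i, j, l1, u1, l2, u2, word_size=1):
    """Offset of B[i, j] from the base address of B: ARRAY [l1..u1, l2..u2],
    stored in row-major order with word_size memory words per element.
    (u1 is not needed for the offset; it is listed only to mirror the declaration.)"""
    columns = u2 - l2 + 1                      # elements per row
    return ((i - l1) * columns + (j - l2)) * word_size

# B: ARRAY [0..3, 1..3] OF INTEGER -- B[2, 1] is the 7th element, so offset 6.
print(row_major_offset(2, 1, 0, 3, 1, 3))      # 6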

Storage Allocation
Static allocation vs. dynamic allocation.

Static allocation: variables and temporaries, including the one used to save the return address, are assigned fixed addresses within the program. This type of storage assignment is called static allocation.

Dynamic allocation: when a procedure may be invoked recursively, it is necessary to preserve the previous values of any variables used by the subroutine, including parameters, temporaries, return addresses, and register save areas. This can be accomplished with a dynamic storage allocation technique.

Recursive invocation of a procedure using static storage allocation


Storage Allocation (cont.)


With the dynamic storage allocation technique, each procedure call creates an activation record that contains storage for all the variables used by the procedure. Activation records are typically allocated on a stack.
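A rough Python model of this push/pop discipline follows; the record fields and the example procedure are invented for illustration and do not reflect any particular compiler's layout.

stack = []                               # the run-time stack of activation records

def call(procedure, **arguments):
    """Prologue: push a new activation record, linked to the caller's record."""
    record = {"proc": procedure.__name__,
              "params": arguments,       # parameters and locals live in the record
              "locals": {},
              "caller": stack[-1] if stack else None}
    stack.append(record)
    result = procedure(record, **arguments)
    stack.pop()                          # epilogue: delete the current record
    return result

def factorial(record, n):
    # Each recursive invocation gets its own record, so each level keeps
    # its own value of n and its own temporaries.
    record["locals"]["n"] = n
    if n <= 1:
        return 1
    return n * call(factorial, n=n - 1)

print(call(factorial, n=5))              # 120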


Recursive invocation of a procedure using automatic storage allocation


Storage Allocation (cont.)


The compiler must generate additional code to manage the activation records themselves.

Prologue: at the beginning of each procedure there must be code to create a new activation record, linking it to the previous one and setting the appropriate pointers.

Epilogue: at the end of the procedure, there must be code to delete the current activation record, resetting pointers as needed.

Block-Structured Languages
A block is a portion of a program that has the ability to declare its own identifiers (e.g., a procedure). Blocks may be nested within other blocks.
When a reference to an identifier appears in the source program, the compiler must first check the symbol table for a definition of that identifier by the current block. If no such definition is found, the compiler looks for a definition by the block that surrounds the current one, and so on outward through the nesting.
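A minimal Python sketch of that outward search through nested scopes; the chained per-block symbol tables and the class name are invented for the illustration. (The "display" referred to in the figures below is a related mechanism: an array of pointers, indexed by nesting level, to the block currently accessible at each level.)

class Block:
    """Symbol table for one block, chained to the table of the enclosing block."""
    def __init__(self, name, enclosing=None):
        self.name = name
        self.symbols = {}            # identifiers declared by this block
        self.enclosing = enclosing   # surrounding block, or None for the outermost

    def define(self, identifier, info):
        self.symbols[identifier] = info

    def lookup(self, identifier):
        # Check the current block first, then each surrounding block in turn.
        block = self
        while block is not None:
            if identifier in block.symbols:
                return block.name, block.symbols[identifier]
            block = block.enclosing
        raise NameError(f"{identifier} is not declared in any enclosing block")

main = Block("MAIN")
main.define("X", "integer")
inner = Block("SUB", enclosing=main)
inner.define("Y", "real")
print(inner.lookup("Y"))    # ('SUB', 'real'): found in the current block
print(inner.lookup("X"))    # ('MAIN', 'integer'): found in the surrounding block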

Nesting of blocks in a source program


Use of display for above procedure

