A compiler accepts a program in a high-level language and converts it into machine language. More precisely, a compiler is a computer program (or set of programs) that transforms source code written in a programming language (the source language) into another computer language (the target language, often having a binary form known as object code). The most common reason for transforming source code is to create an executable program.
Structure of a Compiler
The front end
1. Checks whether the program is correctly written in terms of the programming language's syntax and semantics.
2. Recognizes legal and illegal programs and reports any errors in a useful way.
3. Generates an intermediate representation (IR) of the source code for processing by the middle end.
The middle end
1. Performs optimization.
2. Removes useless or unreachable code.
3. Discovers and propagates constant values, and relocates computations.
4. Generates another IR for the back end.
The back end
1. Translates the IR from the middle end into object code.
2. Performs register allocation.
Compiler
Recognizing Basic Symbols (Lexical Analysis)
Recognizing Syntactic Units and Interpreting Meaning (Syntactic Analysis)
Intermediate Form
   Arithmetic Statements (parse tree, matrix)
   Non-Arithmetic Statements (GO TO, DO, IF; matrix form)
   Non-Executable Statements (DECLARE, etc.)
Storage Allocation (identifier table)
Code Generation
Lexical Analysis
Isolates words/tokens.
Examples of tokens:
   Character constants
   Special characters
   Comments
The source program is scanned sequentially, and its basic elements (tokens) are recognized as identifiers, literals, or terminal symbols (operators and keywords). Lexical analysis can be done in one continuous pass through the data, creating an intermediate form of the program that consists of a chain or table of tokens. The lexical phase discards comments, since they have no effect on the processing of the program.
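The single sequential pass described above can be sketched as follows. This is only an illustration under stated assumptions: the token classes, the keyword set, and the `/* ... */` comment syntax are hypothetical choices for a toy language, not something the notes specify.

```python
import re

# Hypothetical keyword set and token classes for a toy language.
KEYWORDS = {"IF", "THEN", "GOTO", "DO", "DECLARE"}
TOKEN_SPEC = [
    ("COMMENT",    r"/\*.*?\*/"),       # recognized, then discarded
    ("LITERAL",    r"\d+"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("OPERATOR",   r"[+\-*/=<>();,]"),
    ("SKIP",       r"\s+"),             # blanks separate tokens
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC),
                    re.DOTALL)

def tokenize(source):
    """Scan the source once, left to right, and return a table of (class, text) pairs."""
    tokens = []
    position = 0
    while position < len(source):
        match = MASTER.match(source, position)
        if match is None:
            raise SyntaxError(f"unexpected character {source[position]!r} at {position}")
        kind, text = match.lastgroup, match.group()
        position = match.end()
        if kind in ("COMMENT", "SKIP"):
            continue                     # comments have no effect on later phases
        if kind == "IDENTIFIER" and text.upper() in KEYWORDS:
            kind = "KEYWORD"
        tokens.append((kind, text))
    return tokens
```

For example, `tokenize("COST = RATE * 12 /* monthly */")` yields five tokens and drops the comment.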
Syntax Analysis
Once the program is broken down into tokens or uniform symbols, the compiler must:
1. Recognize the phrases (syntactic constructions). Each phrase is a string of tokens that has an associated meaning.
2. Interpret the meaning of each construction.
The first step is concerned solely with recognizing, and thus separating, the basic syntactic constructs in the source program. It also notes syntax errors and ensures some sort of recovery. Once the syntax of a statement is ascertained, the second step is to interpret its meaning (semantics).
Intermediate form
Once a syntactic construction has been determined, the compiler could generate object code for it directly. Instead, the compiler first generates an intermediate form of the source program. The intermediate form affords two advantages:
1. It facilitates optimization of the code.
2. It allows a logical separation between the machine-independent phases (lexical analysis, syntax analysis, interpretation) and the machine-dependent phases (code generation and assembly).
Using an intermediate form raises two questions:
1. What form should it take?
2. What are the rules for converting source code into that form?
The form depends on the syntactic construction, e.g., arithmetic, non-arithmetic, or non-executable statements.
ARITHMETIC STATEMENTS
One intermediate form for an arithmetic statement is a parse tree. The rules for converting an arithmetic statement into a parse tree are:
1. Any variable is a terminal node of the tree.
2. For every operator, construct (in the order dictated by the rules of algebra) a binary (two-branched) tree whose left branch is the tree for operand 1 and whose right branch is the tree for operand 2.
Although a drawing of the parse tree makes it easy for us to visualize the structure of the statement, it is not a practical representation for a compiler. The compiler may instead use a linear representation of the parse tree called a matrix. In a matrix, the operations of a program are listed sequentially in the order in which they would be executed. Each entry has one operator and two operands.
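The conversion from statement to matrix can be sketched as below. As an assumption made purely for brevity, the sketch borrows Python's own `ast` parser to stand in for the syntax phase and build the parse tree; the function name `to_matrix` and the `M1`, `M2`, ... naming of intermediate results are illustrative conventions.

```python
import ast

# Operators this sketch understands (an illustrative subset).
OPERATORS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}

def to_matrix(statement):
    """Turn 'TARGET = expression' into a list of (operator, operand1, operand2) entries."""
    node = ast.parse(statement).body[0]
    if (not isinstance(node, ast.Assign) or len(node.targets) != 1
            or not isinstance(node.targets[0], ast.Name)):
        raise ValueError("expected a single assignment to a simple variable")
    matrix = []

    def emit(expr):
        # Variables and constants are terminal nodes of the tree.
        if isinstance(expr, ast.Name):
            return expr.id
        if isinstance(expr, ast.Constant):
            return str(expr.value)
        # An operator node: emit both subtrees first, then the operator itself,
        # so entries appear in the order they would be executed.
        if isinstance(expr, ast.BinOp) and type(expr.op) in OPERATORS:
            left = emit(expr.left)
            right = emit(expr.right)
            matrix.append((OPERATORS[type(expr.op)], left, right))
            return f"M{len(matrix)}"      # refers to the entry just created
        raise ValueError("unsupported construct in this sketch")

    result = emit(node.value)
    matrix.append(("=", node.targets[0].id, result))
    return matrix
```

For `A = B + C * D` this produces three entries: `('*', 'C', 'D')`, then `('+', 'B', 'M1')`, then `('=', 'A', 'M2')`, where `M1` and `M2` name the results of entries 1 and 2.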
Storage allocation
Code Generation
Once the compiler has generated the matrix and the tables of supporting information, it may generate the object code. One scheme is to have a table that defines, for each type of matrix operation, the associated object code. The code generation phase scans the matrix and generates, for each entry, the code defined in the table, using the operands of the matrix entry to further specialize the code.
This scheme raises three questions:
1. Was it a good idea to generate code directly from the matrix? (Lines 1 and 4: redundant code.)
2. Have we made the best use of the machine? (Lines 12 and 13.)
3. Can we generate machine language directly?
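A table-driven generator of this kind might look like the sketch below. The mnemonics (`LOAD`, `ADD`, `SUB`, `MUL`, `DIV`, `STORE`) describe a hypothetical single-accumulator machine chosen for illustration, and the `M1`, `M2`, ... names for entry results are likewise an assumed convention.

```python
# Code table: one template list per matrix operator (hypothetical accumulator machine).
# {a} and {b} are the entry's operands; {r} is the temporary holding the entry's result.
CODE_TABLE = {
    "+": ["LOAD {a}", "ADD {b}", "STORE {r}"],
    "-": ["LOAD {a}", "SUB {b}", "STORE {r}"],
    "*": ["LOAD {a}", "MUL {b}", "STORE {r}"],
    "/": ["LOAD {a}", "DIV {b}", "STORE {r}"],
    "=": ["LOAD {b}", "STORE {a}"],
}

def generate(matrix):
    """Scan the matrix and expand each entry using the code table."""
    code = []
    for index, (operator, a, b) in enumerate(matrix, start=1):
        for template in CODE_TABLE[operator]:
            # Specialize the template with this entry's operands.
            code.append(template.format(a=a, b=b, r=f"M{index}"))
    return code

sample = [("*", "C", "D"), ("+", "B", "M1"), ("=", "A", "M2")]
for line in generate(sample):
    print(line)
```

Note that the output stores `M2` and immediately loads it again. Generating code entry by entry, with no view of neighboring entries, produces exactly this kind of waste.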
Issues of Optimality
The first two of these questions are issues of optimization:
1. Optimality of the matrix as an intermediate form (machine independent)
2. Optimality of the actual machine code (machine dependent)
[Matrix figure: operator column *, *, *, +, =; operand columns not recovered]
Other Steps
1. Compile-time computation of operations whose operands are both constants.
2. Movement of computations involving nonvarying operands out of loops.
3. Use of the properties of Boolean expressions to minimize their computation (logical operations).
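The first of these steps, compile-time computation of constant operations, can be sketched on a matrix of (operator, operand1, operand2) entries. The representation is an assumption for illustration: operands are strings, `M1`, `M2`, ... name the results of earlier entries, and a folded entry is replaced by `None` so that the remaining `Mi` references keep their meaning.

```python
import operator

# Operators this sketch folds; division is left out to avoid rounding questions.
FOLDABLE = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def is_constant(operand):
    """True if the operand text is an integer literal."""
    try:
        int(operand)
        return True
    except ValueError:
        return False

def fold_constants(matrix):
    """Return a copy of the matrix with constant-only entries computed at compile time."""
    known = {}      # temporary name -> constant text, e.g. {"M1": "6"}
    result = []
    for index, (op, a, b) in enumerate(matrix, start=1):
        # Substitute temporaries that were already folded to constants.
        a = known.get(a, a)
        b = known.get(b, b)
        if op in FOLDABLE and is_constant(a) and is_constant(b):
            known[f"M{index}"] = str(FOLDABLE[op](int(a), int(b)))
            result.append(None)          # entry deleted; position kept
        else:
            result.append((op, a, b))
    return result
```

Given the entries `('*', '2', '3')`, `('+', 'X', 'M1')`, `('=', 'Y', 'M2')`, the first entry is removed and the second becomes `('+', 'X', '6')`.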
Assembly Phase
1. Generating code
2. Defining labels and resolving all references
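Defining labels and then resolving references to them is commonly done in two passes, which can be sketched as follows. The input format is an assumption for illustration: one instruction per line, a line ending in `:` defines a label, and addresses are simply instruction indices.

```python
def assemble(lines):
    """Two passes over symbolic code: define labels, then resolve references."""
    labels = {}
    instructions = []
    # Pass 1: a label takes the address of the next instruction.
    for line in lines:
        line = line.strip()
        if line.endswith(":"):
            labels[line[:-1]] = len(instructions)
        elif line:
            instructions.append(line)
    # Pass 2: replace every operand that names a label with its address.
    resolved = []
    for instruction in instructions:
        opcode, *operands = instruction.split()
        operands = [str(labels.get(name, name)) for name in operands]
        resolved.append(" ".join([opcode, *operands]))
    return resolved
```

Because all labels are collected before any operand is rewritten, forward references such as a jump to a label defined later in the program resolve without special handling.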
Structure of a Compiler
[Diagram: structure of a compiler]
Phases: Lexical Analysis, Syntax Analysis, Interpretation, Machine-Independent Optimization, Storage Assignment, Code Selection
Tables: Terminal Table, Reductions, Code Productions
Data flow: Source Code, Matrix, Optimized Matrix, Assembly Code
STRUCTURED VARIABLES
The structured variables discussed here include arrays, records, and strings.
Arrays: In Pascal array declarations:
(i) Single-dimension array: an array whose subscript ranges over l .. u is allocated (u - l + 1) memory words.
(ii) Two-dimension array: B: ARRAY [0..3, 1..3] OF INTEGER
For this declaration, the first subscript ranges from 0 to 3 (4 values) and the second from 1 to 3 (3 values), so 4 x 3 = 12 memory words are required.
In general, ARRAY [l1..u1, l2..u2] OF INTEGER requires (u1 - l1 + 1) * (u2 - l2 + 1) memory words.
The data can be stored in memory in two different ways: row-major order and column-major order.
In row-major order, all array elements that have the same value of the first subscript are stored in contiguous locations. In column-major order, all elements that have the same value of the second subscript are stored in contiguous locations. Storage of B: ARRAY [0..3, 1..3]:

Row-major order
   Row 0: 0,1 0,2 0,3
   Row 1: 1,1 1,2 1,3
   Row 2: 2,1 2,2 2,3
   Row 3: 3,1 3,2 3,3

Column-major order
   Column 0: 0,1 1,1 2,1 3,1
   Column 1: 0,2 1,2 2,2 3,2
   Column 2: 0,3 1,3 2,3 3,3
Element reference: To refer to an element, we must calculate the address of the element relative to the base address of the array. Indexed addressing then makes it easy to access the desired array element.
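The relative address calculation can be sketched for both storage orders. The sketch assumes one memory word per element, as with the INTEGER arrays above; the function names are illustrative.

```python
def check_bounds(i, j, l1, u1, l2, u2):
    if not (l1 <= i <= u1 and l2 <= j <= u2):
        raise IndexError(f"subscript ({i},{j}) outside [{l1}..{u1}, {l2}..{u2}]")

def row_major_offset(i, j, l1, u1, l2, u2):
    """Word offset of element (i, j) from the array base, rows stored contiguously."""
    check_bounds(i, j, l1, u1, l2, u2)
    row_length = u2 - l2 + 1
    return (i - l1) * row_length + (j - l2)

def column_major_offset(i, j, l1, u1, l2, u2):
    """Word offset of element (i, j) from the array base, columns stored contiguously."""
    check_bounds(i, j, l1, u1, l2, u2)
    column_length = u1 - l1 + 1
    return (j - l2) * column_length + (i - l1)

# B: ARRAY [0..3, 1..3] OF INTEGER
print(row_major_offset(2, 3, 0, 3, 1, 3))      # 8
print(column_major_offset(2, 3, 0, 3, 1, 3))   # 10
```

The element's address is the base address of the array plus this offset. In the row-major listing for B, element 2,3 is indeed the ninth entry (offset 8), and in the column-major listing it is the eleventh (offset 10).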
Storage Allocation
Static allocation vs. dynamic allocation
Static allocation: Variables are assigned fixed addresses within the program. Temporary variables, including the one used to save the return address, are also assigned fixed addresses. This type of storage assignment is called static allocation.
Dynamic allocation: When a procedure may be called recursively, it is necessary to preserve the previous values of any variables used by the subroutine, including parameters, temporaries, return addresses, register save areas, etc. This can be accomplished with a dynamic storage allocation technique.
Epilogue
At the end of the procedure, there must be code to delete the current activation record, resetting pointers as needed.
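Creating an activation record on entry and deleting it in the epilogue can be pictured with a simple stack. This is a toy model under stated assumptions: the class name, the method names `enter` and `leave`, and the fields of each record are hypothetical, and a real compiler would emit machine code that adjusts stack pointers rather than manipulate a Python list.

```python
class ActivationStack:
    """Toy run-time stack; each record holds one invocation's private storage."""

    def __init__(self):
        self.records = []

    def enter(self, procedure, return_address, parameters):
        # Entry code: allocate a fresh record so earlier invocations keep their values.
        record = {
            "procedure": procedure,
            "return_address": return_address,
            "variables": dict(parameters),
        }
        self.records.append(record)
        return record

    def leave(self):
        # Epilogue: delete the current record; the previous one becomes current again.
        record = self.records.pop()
        return record["return_address"]

    @property
    def current(self):
        return self.records[-1] if self.records else None
```

If a hypothetical procedure FACT calls itself, two records for FACT exist at once, each with its own copy of the parameter and its own return address. After `leave()` removes the inner record, the outer invocation's values are intact.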
Block-Structured Languages
A block is a portion of a program that has the ability to declare its own identifiers, e.g., a procedure. Blocks may be nested within other blocks.
When a reference to an identifier appears in the source program, the compiler must first check the symbol table for a definition of that identifier by the current block. If no such definition is found, the compiler looks for a definition by the block that surrounds the current one, then by the block that surrounds that, and so on.
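The outward search through enclosing blocks can be sketched with a chain of per-block symbol tables. The `Block` class and its method names are assumptions for illustration; the notes do not prescribe a particular symbol table layout.

```python
class Block:
    """One block's symbol table, linked to the block that surrounds it."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent      # None for the outermost block
        self.symbols = {}

    def declare(self, identifier, info):
        self.symbols[identifier] = info

    def lookup(self, identifier):
        # Check the current block first, then each surrounding block in turn.
        block = self
        while block is not None:
            if identifier in block.symbols:
                return block.name, block.symbols[identifier]
            block = block.parent
        raise NameError(f"{identifier} is not declared in any enclosing block")

outer = Block("A")
outer.declare("X", "INTEGER")
outer.declare("Y", "INTEGER")
inner = Block("B", parent=outer)
inner.declare("X", "REAL")        # hides A's X inside B

print(inner.lookup("X"))          # ('B', 'REAL')
print(inner.lookup("Y"))          # ('A', 'INTEGER')
```

A reference to X inside block B finds B's own declaration, while a reference to Y falls through to the surrounding block A.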