
B.E.

( COMPUTER SCIENCE AND ENGINEERING) DEGREE EXAMINATIONS, APRIL 2017


SIXTH SEMESTER
611CIT03 PRINCIPLES OF COMPILER DESIGN
(REGULATIONS 2011)
(COMMON TO : B.Tech INFORMATION TECHNOLOGY)
Time: 3 Hours Maximum: 100 Marks

Part A (10 x 2 = 20 marks)


Answer ALL the questions

1. What is a Compiler?
2. State some compiler construction tools.
3. What is a lexeme? Define a regular set.
4. What are the Error-recovery actions in a lexical analyzer?
5. List the properties of LR parser.
6. Write short notes on YACC.
7. What are kernel and non-kernel items?
8. Define back patching.
9. Define basic block and flow graph.
10. What are the characteristics of peephole optimization?

PART B (5 x 16 = 80 Marks)
11. (a) (i) What is a Compiler? Write notes on LEX tools. 8
(ii) Briefly explain grouping of phases. 8
OR
(b) Explain the phases of a compiler with a neat sketch. 16

12. (a) Define a non-deterministic finite automaton. Write an algorithm to simulate an NFA. 16
OR
(b) (i) Explain specification of tokens. (8)
(ii) What is the role of the lexical analyzer in the compilation process? What are lexemes and tokens? 8

13. (a) (i) Explain the algorithm for an operator precedence parser. 6


(ii) Find the SLR parsing table for the given grammar and parse the sentence (a+b)*c. 10
E->E+E | E*E | (E) | id.
OR
(b) Find the predictive parser for the given grammar and parse the sentence (a+b)*c. 16
E->E+E | E*E | (E) | id.

14. (a) (i) Generate intermediate code for the following code segment along with the required syntax
directed translation scheme: (8)
if(a>b)
x=a+b
else
x=a-b where a and x are of real type and b is of int type.
(ii) Write short notes on back-patching. (8)
OR
(b) (i) Explain code generation phase with simple code generation algorithm. (10)
(ii) Write short notes on next-use information with suitable example. 6

15. (a) (i) Explain the principal sources of optimization. (8)


(ii) Write short notes on (a) Storage organization (b) Subdivision of run-time memory. 8
OR
(b) What are basic blocks and flow graphs? Explain PEEPHOLE optimization. 16
B.E. ( COMPUTER SCIENCE AND ENGINEERING) DEGREE EXAMINATIONS, APRIL 2017
SIXTH SEMESTER
611CIT03 PRINCIPLES OF COMPILER DESIGN
ANSWER KEY

1. A Compiler is a program that reads a program written in one language (the source language) and
translates it into an equivalent program in another language (the target language). As an important
part of this translation process, the compiler reports to its user the presence of errors in the source
program.

2. i. Parser generators
ii. Scanner generators
iii. Syntax-directed translation engines
iv. Automatic code generators
v. Data-flow engines.

3. A Lexeme is a sequence of characters in the source program that is matched by the pattern for a
token.
A language denoted by a regular expression is said to be a regular set.
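As a supplementary sketch (not part of the original key), the relation between a lexeme, a pattern, and the regular set the pattern denotes can be illustrated with Python's re module; the token names ID and NUM are hypothetical.

```python
import re

# Hypothetical token patterns: each regular expression denotes a regular set.
patterns = {
    "ID":  re.compile(r"[A-Za-z_][A-Za-z0-9_]*"),
    "NUM": re.compile(r"[0-9]+"),
}

source = "count"  # a lexeme in the source program
# The lexeme "count" is matched by the pattern for the token ID,
# so "count" belongs to the regular set denoted by that pattern.
assert patterns["ID"].fullmatch(source) is not None
assert patterns["NUM"].fullmatch(source) is None
```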

4. 1. Deleting an extraneous character
2. Inserting a missing character
3. Replacing an incorrect character by a correct character
4. Transposing two adjacent characters

5. 1. LR parsers can be constructed to recognize most of the programming languages for which a
context-free grammar can be written.
2. The class of grammars that can be parsed by an LR parser is a superset of the class of grammars
that can be parsed using predictive parsers.
3. LR parsers work using a non-backtracking shift-reduce technique, yet it is an efficient one.

6. YACC is an automatic tool for generating the parser program.
YACC stands for Yet Another Compiler-Compiler; it is a utility available on UNIX.
Basically, YACC is an LALR parser generator. It can report conflicts or ambiguities in the form of
error messages.

7. Kernel items: the set of items which includes the initial item, S'->.S, and all items whose
dots are not at the left end.
Non-kernel items: the set of items which have their dots at the left end (except the initial item).

8. Back patching is the activity of filling in unspecified label information using
appropriate semantic actions during the code generation process.

9. A basic block is a sequence of consecutive statements in which flow of control enters at the
beginning and leaves at the end without halting or the possibility of branching except at the end.
A flow graph is a directed graph obtained by adding flow-of-control information to the set of basic
blocks making up a program.
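As a supplementary sketch (not part of the original key), the standard way to partition three-address code into basic blocks is to find "leaders" (the first statement, every jump target, and every statement after a jump); the representation and function names below are hypothetical.

```python
def find_leaders(code):
    """Leaders: the first statement, targets of jumps, statements after jumps.
    `code` is a list of (op, target) pairs where target is an index or None."""
    leaders = {0}
    for i, (op, target) in enumerate(code):
        if op in ("goto", "if"):        # any (conditional) jump instruction
            leaders.add(target)         # the jump target starts a block
            if i + 1 < len(code):
                leaders.add(i + 1)      # the statement after a jump starts a block
    return sorted(leaders)

def basic_blocks(code):
    """Each block runs from one leader up to (but not including) the next."""
    leaders = find_leaders(code)
    bounds = leaders + [len(code)]
    return [list(range(bounds[k], bounds[k + 1])) for k in range(len(leaders))]

# Example: 0: t=a ; 1: if ... goto 3 ; 2: goto 0 ; 3: x=t
code = [("assign", None), ("if", 3), ("goto", 0), ("assign", None)]
assert basic_blocks(code) == [[0, 1], [2], [3]]
```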

10. Redundant instruction elimination
Flow-of-control optimizations
Algebraic simplifications
Use of machine idioms
11 A i Compiler

11 A ii Grouping of phases


Front and back ends: (3)
Often, the phases are collected into a front end and a back end. The front end has those phases, which
depend primarily on source language and largely independent of the target machine. These include lexical
and syntactic analysis, the creation of symbol table, semantic analysis and the generation of intermediate
code.
Back end has those phases, which depend primarily on target machine and largely independent of the
source language, just the intermediate language. These include code optimization phase, along with
necessary error handling and symbol table operations.
Passes: (2)
Several phases are implemented in a single pass consisting of reading an input file and writing an output
file. The activities of those phases can be interleaved during the pass.
Reducing the number of passes: (3)
It is desirable to have few passes, since it takes time to read and write intermediate files. But, on the other
hand, if we group several phases into one pass, then we must keep entire program in memory, because
one phase may need information in a different order than a previous phase produces it.
For some phases, grouping into one pass may present a few problems:
- The interface between the lexical and syntactic analyzers can be limited to a single token.
- It is often very hard to perform code generation until the intermediate representation has been
completely generated: we cannot generate target code for a construct if we do not know the types of
the variables involved in the construct, and we cannot determine the target address of a forward jump
until we have seen the intervening source code and generated target code for it.
Intermediate and target code generation can be merged into a single pass using a technique called back
patching. Use back patching, in which blank space slot is left for missing information and fill in the slot
when the information becomes available.

11. B) PHASES OF COMPILER


A Compiler operates in phases, each of which
transforms the source program from one
representation into another. The following are the
phases of the compiler:
Main phases:
1) Lexical analysis
2) Syntax analysis
3) Semantic analysis
4) Intermediate code generation
5) Code optimization
6) Code generation
Sub-Phases:
1) Symbol table management
2) Error handling

LEXICAL ANALYSIS:
It is the first phase of the compiler. It gets input from the source program and produces tokens as output.
It reads the characters one by one, starting from left to right and forms the tokens.
Token: It represents a logically cohesive sequence of characters such as keywords,
operators, identifiers, special symbols etc.
Example: a + b = 20
Here, a, b, +, =, 20 are all separate tokens.
Lexeme: A group of characters forming a token is called a lexeme.
The lexical analyser not only generates a token but also enters the lexeme into the symbol
table if it is not already there.
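The grouping of characters into (token, lexeme) pairs described above can be sketched with a small Python tokenizer (not part of the original key); the token names and patterns are hypothetical.

```python
import re

# Hypothetical token specification; patterns are tried in the order listed.
TOKEN_SPEC = [
    ("NUM", r"[0-9]+"),
    ("ID",  r"[A-Za-z_][A-Za-z0-9_]*"),
    ("OP",  r"[+\-*/=]"),
    ("WS",  r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Yield (token name, lexeme) pairs, reading left to right."""
    for m in MASTER.finditer(text):
        if m.lastgroup != "WS":             # whitespace separates tokens
            yield (m.lastgroup, m.group())  # the matched lexeme

# The example from the text: a + b = 20 yields five tokens.
assert list(tokenize("a + b = 20")) == [
    ("ID", "a"), ("OP", "+"), ("ID", "b"), ("OP", "="), ("NUM", "20"),
]
```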
SYNTAX ANALYSIS:
It is the second phase of the compiler. It is also known as the parser. It gets the token stream as input
from the lexical analyser of the compiler and generates a syntax tree as output.
Syntax tree:
o It is a tree in which interior nodes are operators and exterior nodes are operands.
Example: For a=b+c*2, syntax tree is

SEMANTIC ANALYSIS:
It is the third phase of the compiler.
It gets the parse tree from syntax analysis as input and checks whether it is semantically correct.
It performs type checking and inserts type conversions (e.g., int to real) where required.
INTERMEDIATE CODE GENERATION:
It is the fourth phase of the compiler. It gets input from the semantic analysis and converts the input into
output as intermediate code such as three address code.
The three-address code consists of a sequence of instructions, each of which has at most three operands.
Example: t1=t2+t3
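As a supplementary sketch (not part of the original key), three-address code for an expression can be produced by walking the expression tree bottom-up and introducing a fresh temporary per operator; the representation here is hypothetical.

```python
import itertools

temp = itertools.count(1)  # fresh temporary names t1, t2, ...

def gen_tac(expr):
    """expr is a leaf name (str) or an ('op', left, right) tuple.
    Returns (name holding the result, list of three-address instructions)."""
    if isinstance(expr, str):
        return expr, []
    op, left, right = expr
    l_name, l_code = gen_tac(left)
    r_name, r_code = gen_tac(right)
    t = f"t{next(temp)}"
    return t, l_code + r_code + [f"{t} = {l_name} {op} {r_name}"]

# a = b + c * 2  ->  t1 = c * 2 ; t2 = b + t1 ; a = t2
name, code = gen_tac(("+", "b", ("*", "c", "2")))
code.append(f"a = {name}")
assert code == ["t1 = c * 2", "t2 = b + t1", "a = t2"]
```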
CODE OPTIMIZATION:
It is the fifth phase of the compiler. It gets the intermediate code as input and produces optimized
intermediate code as output. This phase reduces the redundant code and attempts to improve the
intermediate code so that faster-running machine code will result.
During the code optimization, the result of the program is not affected.
To improve the code generation, the optimization involves
- deduction and removal of dead code (unreachable code).
- calculation of constants in expressions and terms.
- collapsing of repeated expressions into a single temporary variable.
- loop unrolling.
- moving code outside the loop.
- removal of unwanted temporary variables.
CODE GENERATION:
It is the final phase of the compiler.
It gets input from code optimization phase and produces the target code or object code as result.
Intermediate instructions are translated into a sequence of machine instructions that perform the same
task.
The code generation involves
- allocation of register and memory
- generation of correct references
- generation of correct data types
- generation of missing code
SYMBOL TABLE MANAGEMENT:
Symbol table is used to store all the information about identifiers used in the program.
It is a data structure containing a record for each identifier, with fields for the attributes of the identifier.
It allows to find the record for each identifier quickly and to store or retrieve data from that record.
Whenever an identifier is detected in any of the phases, it is stored in the symbol table.

ERROR HANDLING:
Each phase can encounter errors. After detecting an error, a phase must handle the error so that
compilation can proceed.
In lexical analysis, errors occur in the separation of tokens. In syntax analysis, errors occur during
construction of the syntax tree. In semantic analysis, errors occur when the compiler detects constructs
with the right syntactic structure but no meaning, and during type conversion. In code optimization,
errors occur when the result is affected by the optimization. In code generation, an error is shown when
code is missing, etc.

12 A Non-Deterministic Finite Automata:
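The key gives this answer as a diagram; as a supplementary sketch (not from the key), the standard algorithm to simulate an NFA keeps a set of current states, moving through the transition function on each input symbol and taking the ε-closure after each move. The automaton and function names below are hypothetical.

```python
def eps_closure(states, eps):
    """All states reachable from `states` via epsilon-moves.
    `eps` maps state -> set of states reachable on epsilon."""
    stack, closure = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in eps.get(s, ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure

def simulate_nfa(delta, eps, start, accepting, word):
    """delta maps (state, symbol) -> set of states; returns True on acceptance."""
    current = eps_closure({start}, eps)
    for ch in word:
        moved = set()
        for s in current:
            moved |= delta.get((s, ch), set())
        current = eps_closure(moved, eps)
    return bool(current & accepting)

# A small NFA (no epsilon-moves) for the language (a|b)*ab
delta = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}}
assert simulate_nfa(delta, {}, 0, {2}, "aab") is True
assert simulate_nfa(delta, {}, 0, {2}, "aba") is False
```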


12 B I specification of tokens. (8)
Regular expressions are the notations for specifying the patterns. Each pattern
matches a set of strings
Strings and languages: (2)
An alphabet is a finite set of symbols. A string over an alphabet is a finite sequence of
symbols from the alphabet. Terms for parts of a string: Prefix, Suffix, Substring, Proper
prefix and proper suffix Language: It is a set of strings over some fixed alphabet.
Operations on languages: (2)
Concatenation
Union
Kleene closure
Positive closure
Regular expressions: (2)
ε is a regular expression that denotes {ε}
If a is a symbol in Σ, then a is a regular expression that denotes {a}
Suppose r and s are regular expressions denoting the languages L(r) and L(s).
Then,
(r) | (s) is a regular expression denoting L(r) U L(s)
(r) (s) is a regular expression denoting L(r) L(s)
(r)* is a regular expression denoting (L(r))*
(r) is a regular expression denoting L(r)
A language denoted by a regular expression is said to be a regular set.
Unary operator * has the highest precedence and is left associative
Concatenation has the second highest precedence and is left associative
| has lowest precedence and is left associative
Regular definitions: (2)
It is a sequence of definitions of the form d1->r1, d2->r2, ..., dn->rn
where each di is a distinct name and each ri is a regular expression over the symbols in
Σ U {d1, d2, ..., di-1}.

12 B ii Role of lexical analyser:


13 A I Operator precedence parser
13 B Predictive Parser
Elimination of left recursion (2)
Calculation of First (3)
Calculation of Follow (3)
Predictive parsing table (6)
Parsing the sentence (2)

14 A I Intermediate code

Syntax directed translation scheme for if E then S1 else S2:


E.true:= newlabel;
E.false:=newlabel;
S1.next:=S.next;
S2.next:=S.next;
S.code := E.code || gen(E.true ':') || S1.code || gen('goto' S.next) ||
gen(E.false ':') || S2.code
Intermediate code generated:
if a>b goto L1
goto L2
L1: t1:=inttoreal(b)
x:=a+t1
goto L3
L2: t2:=inttoreal(b)
x:=a-t2
L3:

14 A II Back-patching (8)
Back patching is the activity of filling in unspecified label information using
appropriate semantic actions during the code generation process. (2)
In the semantic actions the functions used are: (2)
makelist(i) - creates a new list containing only i, an index into the array of quadruples.
merge(p1,p2) - merges the two lists pointed to by p1 and p2.
backpatch(p,j) - inserts the target label j for each quadruple on the list pointed to by p.
Example: (4)
Source:
if a or b then
if c then
x= y+1
Translation:
if a go to L1
if b go to L1
go to L3
L1: if c goto L2
goto L3
L2: x= y+1
L3:
After Backpatching:
100: if a goto 103
101: if b goto 103
102: goto 106
103: if c goto 105
104: goto 106
105: x=y+1
106:
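The three list functions described above can be sketched in Python (not part of the original key); quadruples are modelled as (op, arg, target) tuples, which is a hypothetical representation.

```python
def makelist(i):
    """Create a new list containing only quadruple index i."""
    return [i]

def merge(p1, p2):
    """Merge two lists of quadruple indices."""
    return p1 + p2

def backpatch(quads, p, j):
    """Fill target label j into every incomplete jump whose index is in p."""
    for i in p:
        op, arg, _ = quads[i]
        quads[i] = (op, arg, j)

# Quadruples for: if a goto _ ; if b goto _ ; goto _
quads = [("if", "a", None), ("if", "b", None), ("goto", None, None)]
truelist = merge(makelist(0), makelist(1))
backpatch(quads, truelist, 103)       # both true-jumps now target 103
backpatch(quads, makelist(2), 106)    # the fall-through jump targets 106
assert quads == [("if", "a", 103), ("if", "b", 103), ("goto", None, 106)]
```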
14 B I code generation phase with simple code generation algorithm. (10)
It generates target code for a sequence of three address statements. (2)
Assumptions:
For each operator in three address statement, there is a corresponding target language operator.
Computed results can be left in registers as long as possible.
E.g. a=b+c: (2)
Add Rj,Ri where Ri has b and Rj has c and result in Ri. Cost=1;
Add c, Ri where Ri has b and result in Ri. Cost=2;
Mov c, Rj; Add Rj, Ri; Cost=3;
Register descriptor: Keeps track of what is currently in each register
Address descriptor: Keeps tracks of the location where the current value of the name can be found at run
time. (2)
Code generation algorithm: For x = y op z (2)
Invoke the function getreg to determine the location L where the result of y op z should be stored
(register or memory location).
Check the address descriptor for y to determine y', the current location of y.
Generate the instruction op z', L where z' is the current location of z.
If the current values of y and/or z have no next uses, alter the register descriptor.
Getreg: (2)
If y is in a register that holds the values of no other names and y is not live, return the register of y for L.
If that fails, return an empty register.
If that fails and x has a next use, find an occupied register and empty it.
If x is not used in the block, or no suitable register is found, select the memory location of x as L.
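The algorithm above can be sketched in Python (not part of the original key). This is a deliberately simplified version: getreg is reduced to "reuse y's register if it holds only y, otherwise take a free register", and the descriptors are plain dictionaries; all names and the two-register machine are hypothetical.

```python
def gen_assign(x, y, op, z, reg_desc, addr_desc, output):
    """Emit target code for x = y op z using register/address descriptors.
    reg_desc: register -> name it holds; addr_desc: name -> current location."""
    # getreg (simplified): reuse y's register if it holds only y, else load y.
    loc_y = addr_desc[y]
    if loc_y in reg_desc and reg_desc[loc_y] == y:
        L = loc_y                       # result overwrites y's register
    else:
        L = next(r for r in ("R0", "R1") if r not in reg_desc)  # a free register
        output.append(f"MOV {loc_y}, {L}")
    output.append(f"{op.upper()} {addr_desc[z]}, {L}")
    reg_desc[L] = x                     # register descriptor: L now holds x
    addr_desc[x] = L                    # address descriptor: x lives in L
    return L

# t = b add c, with b and c in memory: cost-3 case (MOV then ADD)
out = []
gen_assign("t", "b", "add", "c", {}, {"b": "mem_b", "c": "mem_c"}, out)
assert out == ["MOV mem_b, R0", "ADD mem_c, R0"]
```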

14 B II Next-Use Information


If the name in a register is no longer needed, then the register can be assigned to some other name. This
idea of keeping a name in storage only if it will be used subsequently can be applied in a number of
contexts.
Computing next uses: (2)
The use of a name in a three-address statement is defined as follows: Suppose a three-address statement i
assigns a value to x. If statement j has x as an operand and control can flow from statement i to j along a
path that has no intervening assignments to x , then we say statement j uses the value of x computed at i.
Example:
i: x := b op c
j: a := x op y // statement j uses the value of x computed at statement i
Algorithm to determine next use: (2)
The algorithm to determine next uses makes a backward pass over each basic block, recording for each
name x whether x has a next use in the block and if not, whether it is live on exit from the block (using
data flow analysis). Suppose we reach three-address statement i: x: =y op z in our backward scan. Then
do the following:
Attach to statement i, the information currently found in the symbol table regarding the next use and
the liveness of x, y, and z.
In the symbol table, set x to not live and no next use
In the symbol table, set y and z to live and the next uses of y and z to i.
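The backward pass above can be sketched in Python (not part of the original key). Statements are modelled as (target, operand1, operand2) triples and the symbol table as a dictionary; liveness on exit is simplified away, which is an assumption of this sketch.

```python
def next_use_info(block):
    """Backward pass over a basic block of (target, op1, op2) statements.
    Returns, per statement, a dict of name -> next-use statement index (or None)."""
    table = {}                      # symbol table: name -> next-use index
    info = [None] * len(block)
    for i in range(len(block) - 1, -1, -1):
        x, y, z = block[i]
        # Attach the information currently in the table to statement i.
        info[i] = {n: table.get(n) for n in (x, y, z) if n}
        table[x] = None             # x is (re)defined here: no next use before this
        if y:
            table[y] = i            # operands are next used at statement i
        if z:
            table[z] = i
    return info

# t1 = b * c ; t2 = a + t1  ->  at statement 0, t1's next use is statement 1
block = [("t1", "b", "c"), ("t2", "a", "t1")]
info = next_use_info(block)
assert info[0]["t1"] == 1 and info[1]["t1"] is None
```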

15 Ai Code optimization is needed to make the code run faster or take less space
or both.
Function preserving transformations:
Common sub expression elimination
Copy propagation
Dead-code elimination
Constant folding
Common sub expression elimination: (2)
E is called as a common sub expression if E was previously computed and the
values of variables in E have not changed since the previous computation.
Copy propagation: (2)
Assignments of the form f:=g is called copy statements or copies in short. The idea
here is use g for f wherever possible after the copy statement.
Dead code elimination: (2)
A variable is live at a point in the program if its value can be used subsequently;
otherwise it is dead, and code that computes only dead values can be removed.
Constant folding: (2)
Deducing at compile time that the value of an expression is a constant and using the
constant instead is called constant folding.
Loop optimization: (2)
Code motion: Moving code outside the loop
Takes an expression that yields the same result independent of the number of times
a loop is executed (a loop-invariant computation) and place the expression before
the loop.
Induction variable elimination
Reduction in strength: Replacing an expensive operation by a cheaper one.

15 A II storage organization
Run time storage: The block of memory obtained by compiler from OS to execute the
compiled program. It is subdivided into
Generated target code
Data objects
Stack to keep track of the activations
Heap to store all other information
Activation record: (Frame)
It is used to store the information required by a single procedure call.
Fields of an activation record:
- Returned value
- Actual parameters
- Optional control link
- Optional access link
- Saved machine status
- Local data
- Temporaries
Temporaries are used to hold values that arise in the evaluation of expressions. Local data is the data that
is local to the execution of procedure. Saved machine status represents status of machine just before the
procedure is called. Control link (dynamic link) points to the activation record of the calling procedure.
Access link refers to the non-local data in other activation records. Actual parameters are the one which is
passed to the called procedure. Returned value field is used by the called procedure to return a
value to the calling procedure
Compile time layout of local data:
The amount of storage needed for a name is determined by its type. The field for the local data is laid out
as the declarations in a procedure are examined at compile time. The storage layout for data objects is
strongly influenced by the addressing constraints on the target machine.
Parameter passing: (2)
Call by value
A formal parameter is treated just like a local name. Its storage is in the activation record of the called
procedure
The caller evaluates the actual parameter and places the r-value in the storage for the formal parameter.
Call by reference
If an actual parameter is a name or an expression having an l-value, then that l-value itself is passed.
However, if it is not (e.g. a+b or 2, which have no l-value), then the expression is evaluated in a new
location and the address of that location is passed.
(Subdivision of run-time memory, top to bottom: Code, Static data, Stack, Heap.)
Copy-restore: Hybrid between call-by-value and call-by-reference (copy in, copy out).
The actual parameters are evaluated; their r-values are passed and the l-values of the actuals are
determined. When the called procedure is done, the r-values of the formals are copied back to
the l-values of the actuals.
Call by name:
Inline expansion (the procedure body is treated like a macro).

15 B PEEPHOLE optimization
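As a supplementary sketch (not part of the original key), redundant-instruction elimination, one of the peephole characteristics listed in answer 10, can be shown on a tiny instruction window; the instruction encoding here is hypothetical.

```python
def peephole(instrs):
    """One peephole pass over (op, location, register) tuples: remove a load
    that immediately follows a store to the same location and register
    (redundant-instruction elimination)."""
    out = []
    for ins in instrs:
        if (out and ins[0] == "LOAD" and out[-1][0] == "STORE"
                and ins[1:] == out[-1][1:]):
            continue                # STORE a,R0 ; LOAD a,R0  -> drop the load
        out.append(ins)
    return out

code = [("STORE", "a", "R0"), ("LOAD", "a", "R0"), ("ADD", "b", "R0")]
assert peephole(code) == [("STORE", "a", "R0"), ("ADD", "b", "R0")]
```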
