
SAFFRONY INSTITUTE OF TECHNOLOGY

2150708 System Programming (SP)


Assembler
Q. 1

Describe elements of Assembly language.


An assembly language provides three basic features:
1. Mnemonic operation codes
2. Symbolic operands
3. Data declaration
Let us consider an assembly instruction
MOVER AREG,X

MOVER is a mnemonic opcode for the operation to be performed.

AREG is a register operand in a symbolic form.

X is a memory operand in a symbolic form.

Let us consider another instruction for data declaration:

X    DS    1

DS (Declare Storage) reserves an area of memory.
The name of the variable is X.
The statement reserves a memory area of 1 word and associates the name X with it.

Q. 2

Explain types of Assembly statements


1) Imperative statement
An imperative statement indicates an action to be performed during the execution of the
assembled program.
Each imperative statement typically translates into one machine instruction.
These are executable statements.
Some examples of imperative statements are given below:
MOVER BREG,X
STOP
READ X
PRINT Y
ADD AREG,Z
2) Declaration statement
Declaration statements are for reserving memory for variables.
The syntax of declaration statements is as follows:

[Label]    DS    <constant>
[Label]    DC    <value>

DS: stands for Declare storage, DC: stands for Declare constant.
The DS statement reserves areas of memory and associates names with them.
A DS 10
The above statement reserves 10 words of memory for the variable A.
The DC statement constructs memory words containing constants.
ONE DC 1
The above statement associates the name ONE with a memory word containing the value 1.

Any assembly program can use constants in two ways: as immediate operands, and as
literals.
Many machines support immediate operands in machine instructions. Ex: ADD AREG, 5
But our hypothetical machine does not support immediate operands as a part of the machine
instruction. It can still handle literals.
A literal is an operand with the syntax =<value>. Ex: ADD AREG, =5
It differs from a constant because its location cannot be specified in the assembly program.
3) Assembler Directive
Assembler directives instruct the assembler to perform certain actions during assembly of the
program.
I. START
This directive indicates that the first word of the target program should be placed in the
memory word with the address <constant>.
START <constant>
Ex: START 500
The first word of the target program is stored from memory location 500 onwards.

II. END
This directive indicates the end of the source program.
The operand indicates the address of the instruction where the execution of the program
should begin.
By default it is the first instruction of the program.
END <operand>
Execution control is transferred to the label given in the operand field.

III. ORIGIN
This directive is like the START directive; it indicates the address of the next
consecutive instruction or data word.
The format of this statement is as follows: ORIGIN <operand>
The operand may be a constant, a symbol or a symbolic expression.
The ORIGIN directive is useful when the target code is not to be stored in consecutive
memory locations.

Example:

       Assembly program           LC
       START 100
LOOP   MOVER BREG, =2             100
       MOVER AREG, N              101
       ADD   AREG, =1             102
       ORIGIN LOOP
NEXT   BC ANY, LOOP               100

The ORIGIN LOOP statement resets the location counter to the address of LOOP, so the
statement NEXT is assembled at address 100.

IV. EQU
This directive simply associates the name <symbol> with <operand>, where <operand>
may be a constant or a symbol.

<symbol> EQU <operand>
Ex: A EQU B
The address of B is assigned to A in the symbol table.


V. LTORG
This directive allocates memory to all literals of the current pool and updates the literal
table and pool table.
The format of this directive is as follows: LTORG
If no LTORG statement is present, literals are allocated memory after the END statement.

Q. 3

Explain assembly scheme.

OR

Explain analysis and synthesis phases of an assembler by clearly stating their tasks.
OR
Design specification of an assembler.
Analysis Phase
The primary function performed by the analysis phase is the building of the symbol table.
For this purpose it must determine the addresses of symbolic names.
It is possible to determine some addresses directly, while others must be inferred; this
function is called memory allocation.
To implement memory allocation a data structure called the location counter (LC) is used; it is
initialized to the constant specified in the START statement.
We refer to the processing involved in maintaining the location counter as LC processing.
Tasks of the analysis phase:
1. Isolate the label, mnemonic opcode and operand fields of a statement.
2. If a label is present, enter the pair (symbol, <LC content>) in a new entry of the
symbol table.
3. Check validity of the mnemonic opcode.
4. Perform LC processing.

[Figure: Overview of the assembler: the Analysis phase reads the source program, consults the
mnemonics table (mnemonic, opcode, length: e.g. ADD 01, SUB 02) and builds the symbol table
(symbol, address: e.g. AGAIN 104); the Synthesis phase uses these tables to produce the
target program.]


Synthesis Phase
Consider the assembly statement
MOVER BREG, ONE
We must have the following information to synthesize the machine instruction corresponding
to this statement:
1. Address of the name ONE.
2. Machine operation code corresponding to the mnemonic MOVER.

The first item of information depends on the source program; hence it must be made available
by the analysis phase.
The second item of information does not depend on the source program; it depends only on the
assembly language.
Based on the above discussion, we consider the use of two data structures during the synthesis
phase:
1. Symbol table:
Each entry in the symbol table has two primary fields: name and address. This table is
built by the analysis phase.
2. Mnemonics table:
An entry in the mnemonics table has two primary fields: mnemonic and opcode.
Tasks of the synthesis phase:
1. Obtain the machine opcode through a lookup in the mnemonics table.
2. Obtain the address of the memory operand from the symbol table.
3. Synthesize the machine instruction.
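To make the two lookups concrete, the following small Python sketch (not part of the original notes) synthesizes one instruction from an assumed mnemonics table and symbol table; the table contents and the "opcode register address" output format are illustrative sample values.

    # Sketch of the synthesis tasks: look up the opcode and the operand address,
    # then put the machine instruction together. All table contents are assumed.
    MNEMONICS_TABLE = {"MOVER": 4, "MOVEM": 5, "ADD": 1}   # mnemonic -> machine opcode
    SYMBOL_TABLE = {"ONE": 115, "TERM": 116}               # symbol -> address (built by analysis)
    REGISTERS = {"AREG": 1, "BREG": 2, "CREG": 3, "DREG": 4}

    def synthesize(mnemonic, register, symbol):
        opcode = MNEMONICS_TABLE[mnemonic]      # task 1: machine opcode from mnemonics table
        address = SYMBOL_TABLE[symbol]          # task 2: operand address from symbol table
        return f"{opcode:02d} {REGISTERS[register]} {address:03d}"   # task 3: synthesize

    print(synthesize("MOVER", "BREG", "ONE"))   # prints: 04 2 115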

Q. 4

Explain single pass and two pass assembler.

OR

Write difference between one pass and two pass assembler.

OR

Pass structure of assembler.


Two pass translation
Two pass translation consists of Pass I and Pass II.
LC processing is performed in the first pass and symbols defined in the program are
entered into the symbol table; hence the first pass performs analysis of the source program.
So, two pass translation of an assembly language program can handle forward references easily.
The second pass synthesizes the target form using the address information found in the
symbol table.
The first pass constructs an intermediate representation of the source program which is
used by the second pass.
The IR consists of two main components: data structures + IC (intermediate code).
Single pass translation
A one pass assembler requires only one scan of the source program to generate machine code.
The problem of forward references is tackled using a technique called back patching.
The operand field of an instruction containing a forward reference is left blank initially.

A table of instructions containing forward references is maintained separately, called the
Table of Incomplete Instructions (TII).
This table is used to fill up the addresses in incomplete instructions.
The address of each forward referenced symbol is put in the blank operand field with the help of
this back patching list.
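A minimal Python sketch of back patching with a TII is given below; the instruction and table representations are assumed for illustration and are not taken from the notes.

    # Forward references are left blank (None) and recorded in the TII;
    # back patching fills them in once the symbols are defined.
    code, symtab, tii = [], {}, []

    def assemble(opcode, reg, symbol):
        if symbol in symtab:
            code.append([opcode, reg, symtab[symbol]])
        else:                                    # forward reference: leave operand blank
            code.append([opcode, reg, None])
            tii.append((len(code) - 1, symbol))  # remember which instruction to patch

    def define(symbol, address):
        symtab[symbol] = address

    def backpatch():
        for index, symbol in tii:                # fill the blank address fields
            code[index][2] = symtab[symbol]

    assemble(4, 1, "NEXT")    # e.g. MOVER AREG, NEXT before NEXT is defined
    define("NEXT", 214)
    backpatch()
    print(code)               # [[4, 1, 214]]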

Q. 5

Explain Data structures of assembler pass I

OR

Explain the role of mnemonic opcode table, symbol table, literal table, and pool table in
assembling process of assembly language program.

OR

Describe the following data structures: OPTAB, SYMTAB, LITTAB & POOLTAB.


OPTAB
A table of mnemonic opcodes and related information.
OPTAB contains the fields mnemonic opcode, class and mnemonic info.
The class field indicates whether the opcode belongs to an imperative statement (IS), a
declaration statement (DL), or an assembler directive (AD).
For an imperative statement, the mnemonic info field contains the pair (machine code, instruction
length); otherwise it contains the id of a routine that handles the declaration or directive
statement.
Mnemonic opcode    Class    Mnemonic info
MOVER              IS       (04, 1)
DS                 DL       R#7
START              AD       R#11

SYMTAB
A SYMTAB entry contains the fields symbol name, address and length.
Some addresses can be determined directly, e.g. the address of the first instruction in the
program; however, others must be inferred.
To find the address of such a symbol we must fix the addresses of all program elements preceding
it. This function is called memory allocation.
Symbol    Address    Length
LOOP      202        1
NEXT      214        1
LAST      216        1
A         217        1
BACK      202        1
B         218        1


LITTAB
A table of literals used in the program.
A LITTAB entry contains the fields literal and address.
The first pass uses LITTAB to collect all literals used in a program.
Awareness of different literal pools is maintained using the auxiliary table POOLTAB.
This table contains the literal number of the starting literal of each literal pool.
At any stage, the current literal pool is the last pool in the LITTAB.
On encountering an LTORG statement (or the END statement), literals in the current pool
are allocated addresses starting with the current value in LC and LC is appropriately
incremented.

POOLTAB (literal no)      LITTAB entry
1                 -->     #1
3                 -->     #3
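As an illustration only, the four tables can be sketched in Python as below; the sample entries are assumed, and 0-based list indices are used instead of the 1-based entry numbers in the description above.

    # Sketch of the Pass I data structures.
    OPTAB = {                      # mnemonic -> (class, mnemonic info)
        "MOVER": ("IS", (4, 1)),   # imperative: (machine code, instruction length)
        "DS":    ("DL", "R#7"),    # declaration: id of handling routine
        "START": ("AD", "R#11"),   # assembler directive: id of handling routine
    }
    SYMTAB = []                    # entries: (symbol, address, length)
    LITTAB = []                    # entries: [literal, address]
    POOLTAB = [0]                  # LITTAB index of the first literal of each pool

    def enter_literal(literal):
        LITTAB.append([literal, None])           # address is filled in at LTORG/END

    def process_ltorg(loc_cntr):
        for entry in LITTAB[POOLTAB[-1]:]:       # literals of the current pool
            entry[1] = loc_cntr
            loc_cntr += 1
        POOLTAB.append(len(LITTAB))              # start a new literal pool
        return loc_cntr

    enter_literal("=5")
    enter_literal("=1")
    print(process_ltorg(200), LITTAB, POOLTAB)   # 202 [['=5', 200], ['=1', 201]] [0, 2]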

Q. 6

Detailed design of a two pass assembler.

Pass I
Algorithm for Pass I
1) loc_cntr = 0 (default value);
   pooltab_ptr = 1; POOLTAB[1] = 1;
   littab_ptr = 1;
2) While the next statement is not an END statement
   a) If a label is present then
      this_label = symbol in label field;
      Enter (this_label, loc_cntr) in SYMTAB.
   b) If an LTORG statement then
      (i) Process the literals of the current pool in LITTAB to allocate memory and put
          the addresses in their address fields. Update loc_cntr accordingly.
      (ii) pooltab_ptr = pooltab_ptr + 1;
      (iii) POOLTAB[pooltab_ptr] = littab_ptr;
   c) If a START or ORIGIN statement then
      loc_cntr = value specified in operand field;
   d) If an EQU statement then
      (i) this_address = value specified in <address spec>;
      (ii) Correct the SYMTAB entry for this_label to (this_label, this_address);
   e) If a declaration statement then
      (i) code = code of the declaration statement;
      (ii) size = size of memory area required by DC/DS;
      (iii) loc_cntr = loc_cntr + size;
      (iv) Generate IC '(DL, code)...';
   f) If an imperative statement then
      (i) code = machine opcode from OPTAB;
      (ii) loc_cntr = loc_cntr + instruction length from OPTAB;
      (iii) If the operand is a literal then
               this_literal = literal in operand field;
               LITTAB[littab_ptr] = this_literal;
               littab_ptr = littab_ptr + 1;
            else (i.e. the operand is a symbol)
               this_entry = SYMTAB entry number of operand;
               generate IC '(IS, code)(S, this_entry)';
3) (Processing of the END statement)
   a) Perform step 2(b).
   b) Generate IC '(AD, 02)'.
   c) Go to Pass II.
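The core of this algorithm can be sketched compactly in Python for a small subset of statements (labels, START, DS and imperative statements only); the instruction lengths and the tuple representation of statements are assumptions made for the sketch.

    # Compact sketch of Pass I LC processing and SYMTAB construction.
    OPTAB_LENGTH = {"READ": 1, "MOVER": 1, "ADD": 1, "BC": 1, "STOP": 1}

    def pass_one(statements):
        symtab, loc_cntr = {}, 0
        for label, mnemonic, operand in statements:
            if label:
                symtab[label] = loc_cntr              # enter (symbol, LC) in SYMTAB
            if mnemonic == "START":
                loc_cntr = int(operand)               # LC := operand of START
            elif mnemonic == "DS":
                loc_cntr += int(operand)              # reserve <operand> words
            else:
                loc_cntr += OPTAB_LENGTH[mnemonic]    # imperative: add instruction length
        return symtab

    program = [(None, "START", "100"), (None, "READ", "A"),
               ("LOOP", "MOVER", "AREG,A"), (None, "STOP", None), ("A", "DS", "1")]
    print(pass_one(program))   # {'LOOP': 101, 'A': 103}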

Intermediate code forms:


Intermediate code consists of a set of IC units, each unit consisting of the following three
fields:
1. Address
2. Representation of mnemonics opcode
3. Representation of operands
Mnemonics field
The mnemonics field contains a pair of the form
(statement class, code)
Where statement class can be one of IS, DL, and AD standing for imperative statement,
declaration statement and assembler directive respectively.
For imperative statement, code is the instruction opcode in the machine language.
For declarations and assembler directives, code is an ordinal number within the class.
Thus, (AD, 01) stands for assembler directive number 1 which is the directive START.
Codes for various declaration statements and assembler directives.

Declaration statements          Assembler directives
DC       01                     START    01
DS       02                     END      02
                                ORIGIN   03
                                EQU      04
                                LTORG    05

The information in the mnemonics field is assumed to have the same representation in all
the variants.


Intermediate code for Imperative statement


Variant I
First operand is represented by a single digit number which is a code for a register or the
condition code
Register    Code        Condition    Code
AREG        01          LT           01
BREG        02          LE           02
CREG        03          EQ           03
DREG        04          GT           04
                        GE           05
                        ANY          06

The second operand, which is a memory operand, is represented by a pair of the form
(operand class, code)
where operand class is one of C, S and L, standing for constant, symbol and literal respectively.
For a constant, the code field contains the internal representation of the constant itself.
Ex: the operand descriptor for the statement START 200 is (C, 200).
For a symbol or literal, the code field contains the ordinal number of the operand's entry in
SYMTAB or LITTAB.
Variant II
This variant differs from variant I of the intermediate code in that symbols, condition codes
and CPU registers are not processed in variant II.
So no IC units are generated for them during Pass I; they are retained in their source form and
processed during Pass II.

Source program               Variant I                   Variant II

       START  200            (AD,01) (C,200)             (AD,01) (C,200)
       READ   A              (IS,09) (S,01)              (IS,09) A
LOOP   MOVER  AREG, A        (IS,04) (1)(S,01)           (IS,04) AREG, A
       ..                    ..                          ..
       SUB    AREG, =1       (IS,02) (1)(L,01)           (IS,02) AREG, (L,01)
       BC     GT, LOOP       (IS,07) (4)(S,02)           (IS,07) GT, LOOP
       STOP                  (IS,00)                     (IS,00)
A      DS     1              (DL,02) (C,1)               (DL,02) (C,1)
       LTORG                 (AD,05)                     (AD,05)


Comparison of the variants


Variant I                                      Variant II
Extra work in Pass I                           Extra work in Pass II
Simplifies tasks in Pass II                    Simplifies tasks in Pass I
The IC occupies more memory than in            Memory utilization of the two passes gets
Variant II                                     better balanced

Pass II (Algorithm)
It has been assumed that the target code is to be assembled in the area named code_area.
1. code_area_address = address of code_area;
   pooltab_ptr = 1;
   loc_cntr = 0;
2. While the next statement is not an END statement
   a) Clear machine_code_buffer;
   b) If an LTORG statement then
      (i) Process the literals of the current pool in LITTAB and assemble the literals in
          machine_code_buffer.
      (ii) size = size of memory area required for the literals;
      (iii) pooltab_ptr = pooltab_ptr + 1;
   c) If a START or ORIGIN statement then
      (i) loc_cntr = value specified in operand field;
      (ii) size = 0;
   d) If a declaration statement then
      (i) If a DC statement then assemble the constant in machine_code_buffer;
      (ii) size = size of memory area required by DC/DS;
   e) If an imperative statement then
      (i) Get the operand address from SYMTAB or LITTAB;
      (ii) Assemble the instruction in machine_code_buffer;
      (iii) size = size of the instruction;
   f) If size != 0 then
      (i) Move the contents of machine_code_buffer to the address
          code_area_address + loc_cntr;
      (ii) loc_cntr = loc_cntr + size;
3. (Processing of the END statement)
   a) Perform steps 2(b) and 2(f).
   b) Write code_area into the output file.


Q. 7

Explain error reporting of an assembler.


Error reporting in pass I
Listing the errors in the first pass has the advantage that the source program need not be
preserved till Pass II.
But the listing produced in Pass I can report only certain errors, not all.
In the program below, errors are detected at statements 9 and 21.
Statement 9 gives an invalid opcode error because MVER does not match any mnemonic
in OPTAB.
Statement 21 gives a duplicate definition error because an entry for A already exists in the
symbol table.
The undefined symbol B in statement 10 is harder to detect during Pass I; this error can be
detected only after completing Pass I.
Sr. no.    Statement            Address
           START 200
           MOVER AREG, A        200
..
9          MVER BREG, A         207       **ERROR** Invalid opcode
10         ADD BREG, B          208
..
14         A DS 1               209
..
21         A DC 5               227       **ERROR** Duplicate definition of symbol A
..
35         END
                                          **ERROR** Undefined symbol B in statement 10

Error reporting in pass II


During Pass II, data structures like SYMTAB are available.
Reporting the error at statement 10 is also easy, because the symbol table is searched for an
entry for B; if no match is found, the error is reported.
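A tiny Python sketch of this Pass II check (the table contents and message format are illustrative):

    # Report an undefined symbol while fetching an operand address in Pass II.
    SYMTAB = {"A": 209}

    def operand_address(symbol, statement_no):
        if symbol not in SYMTAB:
            print(f"**ERROR** Undefined symbol {symbol} in statement {statement_no}")
            return None
        return SYMTAB[symbol]

    operand_address("B", 10)   # prints the error for statement 10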


Q. 8

Write N! program and its equivalent machine code.

Assembly program:

        START   101
        READ    N
        MOVER   BREG, ONE
        MOVEM   BREG, TERM
AGAIN   MULT    BREG, TERM
        MOVER   CREG, TERM
        ADD     CREG, ONE
        MOVEM   CREG, TERM
        COMP    CREG, N
        BC      LE, AGAIN
        MOVEM   BREG, RESULT
        PRINT   RESULT
        STOP
N       DS      1
RESULT  DS      1
ONE     DC      1
TERM    DS      1
        END

Equivalent machine code (opcode: 2 digits, register operand: 1 digit, memory operand: 3 digits):

101)  09  0  113
102)  04  2  115
103)  05  2  116
104)  03  2  116
105)  04  3  116
106)  01  3  115
107)  05  3  116
108)  06  3  113
109)  07  2  104
110)  05  2  114
111)  10  0  114
112)  00  0  000
113)
114)
115)  00  0  001
116)

Q.9

Generate intermediate code and symbol table for the following programs.


Program-1

        START   100
        READ    A
        READ    B
        READ    C
        MOVER   AREG, A
        ADD     AREG, B
        ADD     AREG, C
        MULT    AREG, C
        MOVEM   AREG, RESULT
        PRINT   RESULT
        STOP
A       DS      1
B       DS      1
C       DS      1
RESULT  DS      1
        END


Program-1 IC in variant-I
(AD,01)   (C,100)
(IS,09)   (S,01)
(IS,09)   (S,02)
(IS,09)   (S,03)
(IS,04)   (01)(S,01)
(IS,01)   (01)(S,02)
(IS,01)   (01)(S,03)
(IS,03)   (01)(S,03)
(IS,05)   (01)(S,04)
(IS,10)   (S,04)
(IS,00)
(DL,02)   (C,01)
(DL,02)   (C,01)
(DL,02)   (C,01)
(DL,02)   (C,01)
(AD,02)

Program-1 Symbol table

Symbol     Address
A          111
B          112
C          113
RESULT     114

Program-2

        START   101
        READ    A
        READ    B
        MOVER   BREG, A
        MULT    BREG, B
        MOVEM   BREG, D
        STOP
A       DS      1
B       DS      1
D       DS      1
        END

Program-2 Symbol table

Symbol     Address
A          108
B          109
D          110

Program-2 Variant-I              Program-2 Variant-II

(AD,01)  (C,101)                 (AD,01)  (C,101)
(IS,09)  (S,01)                  (IS,09)  A
(IS,09)  (S,02)                  (IS,09)  B
(IS,04)  (2)(S,01)               (IS,04)  BREG, A
(IS,03)  (2)(S,02)               (IS,03)  BREG, B
(IS,05)  (2)(S,03)               (IS,05)  BREG, D
(IS,00)                          (IS,00)
(DL,02)  (C,01)                  (DL,02)  (C,01)
(DL,02)  (C,01)                  (DL,02)  (C,01)
(DL,02)  (C,01)                  (DL,02)  (C,01)
(AD,02)                          (AD,02)


Compiler

Q.1

List out aspects of compilation and their implementation issues.

Two aspects of compilation are:

a) Generate code to implement meaning of a source program in the execution domain (target
code generation)
b) Provide diagnostics for violations of PL semantics in a program (Error reporting)

There are four issues involved in implementing these aspects. (Q: What are the issues in
code generation in relation to compilation of expressions? Explain each issue in
brief. (June-13 GTU))
1. Data types: The semantics of a data type require the compiler to ensure that variables of
a type are assigned or manipulated only through legal operations.
The compiler must generate type specific code to implement an operation.
2. Data structures: To compile a reference to an element of a data structure, the
compiler must develop a memory mapping to access the memory word allocated to
the element.
3. Scope rules: The compiler performs operations called scope analysis and name resolution
to determine the data item designated by the use of a name in the source program.
4. Control structure: Control structure includes conditional transfer of control,
conditional execution, iteration control and procedure calls. The compiler must
ensure that the source program does not violate the semantics of control structures.

Issues in design of code generator are:


1. Input to the code generator
The input to the code generator consists of the intermediate representation of the source
program.
There are several forms of intermediate language, such as postfix notation,
quadruples, and syntax trees or DAGs.
The detection of semantic errors should be done before submitting the input to the code
generator.
The code generation phase requires complete, error free intermediate code as its input.
2. Target program
The output of the code generator is the target program. The output may take a variety
of forms: absolute machine language, relocatable machine language, or assembly
language.
Producing an absolute machine language program as output has the advantage that it can
be placed in a fixed location in memory and immediately executed.
The advantage of producing a relocatable machine language program as output is that
subroutines can be compiled separately. A set of relocatable object modules can be linked
together and loaded for execution by a linking loader.
Producing an assembly language program as output makes the process of code generation
somewhat easier. We can generate symbolic instructions and use the macro facilities of the
assembler to help generate code.


3. Memory management
Mapping names in the source program to addresses of data objects in run time memory is
done cooperatively by the front end and the code generator. We assume that a name in a
three-address statement refers to a symbol table entry for the name.
4. Instruction selection
If we do not care about the efficiency of the target program, instruction selection is
straightforward; otherwise it requires careful handling. For example, the sequence of statements
a := b + c
d := a + e

would be translated into

MOV  b, R0
ADD  c, R0
MOV  R0, a
MOV  a, R0
ADD  e, R0
MOV  R0, d
Here the fourth statement is redundant, so we can eliminate that statement.


5. Register allocation
Instructions involving register operands are usually shorter and faster than those involving
operands in memory.
The use of registers is often subdivided into two sub problems:
During register allocation, we select the set of variables that will reside in registers at a
point in the program.
During a subsequent register assignment phase, we pick the specific register that a
variable will reside in.
6. Choice of evaluation order
The order in which computations are performed can affect the efficiency of the target code.
Some computation orders require fewer registers to hold intermediate results than others.
Picking a best order is another difficult, NP-complete problem
7. Approaches to code generation
The most important criterion for a code generator is that it produces correct code.
Correctness takes on special significance because of the number of special cases that code
generator must face.
Given the premium on correctness, designing a code generator so it can be easily
implemented, tested, and maintained is an important design goal.

Q.2

What is Memory binding? Explain types of memory allocation.

Memory Binding: A memory binding is an association between the 'memory address' attribute
of a data item and the address of a memory area.

Three important tasks of memory allocation are:

1. Determine the amount of memory required to represent the value of a data item.
2. Use an appropriate memory allocation model to implement the lifetimes and scopes of data
items.
3. Determine appropriate memory mappings to access the values in a non scalar data item,
e.g. values in an array.

Memory allocation is mainly divided into two types:
1. Static binding
2. Dynamic binding

Static memory allocation


In static memory allocation, memory is allocated to a variable before the execution of a
program begins.
Static memory allocation is typically performed during compilation.
No memory allocation or deallocation actions are performed during the execution of a
program. Thus, variables remain permanently allocated
Dynamic memory allocation
In dynamic memory allocation, memory bindings are established and destroyed during the
execution of a program.
Dynamic memory allocation has two flavors: automatic allocation and program controlled
allocation.
In automatic dynamic allocation, memory is allocated to the variables declared in a
program unit when the program unit is entered during execution and is deallocated when
the program unit is exited. Thus the same memory area may be used for the variables of
different program units.
In program controlled dynamic allocation, a program can allocate or deallocate memory at
arbitrary points during its execution.
It is obvious that in both automatic and program controlled allocation, the address of the
memory area allocated to a program unit cannot be determined at compilation time.
Dynamic memory allocation techniques
1. Explicit deallocation

Explicit Allocation of Fixed Sized Blocks

It is the simplest form of dynamic allocation.

By linking the blocks in a list, allocation and deallocation can be done quickly with little or
no storage overhead.

Initialization of the area is done by using a portion of each block for a link to the next
block. A pointer 'available' points to the first block.

Allocation consists of taking a block off the list and deallocation consists of putting the
block back on the list. We can treat each block as a variant record.


There is no space overhead because the user program can use the entire block for its own
purposes.

When the block is de-allocated then the compiler routine uses some of the space from the
block itself to link it into the list of available blocks.
Explicit Allocation of Variable-Sized Blocks

When blocks are allocated and de-allocated storage can become fragmented that is the
heap may consists of alternate blocks that are free and in use.

Fragmentation will not occur if blocks are of fixed size, but if they are of variable-size then
it occurs.

One method for allocating variable sized blocks is the first fit method. When a block of size s
is to be allocated, we search for the first free block whose size f >= s (where f is the size of
the free block). This block is then subdivided into a used block of size s and a free block of
size (f - s).
Because of this, the method incurs a time overhead, as we have to search for a free block that
is large enough.

When a block is de-allocated, we check to see whether it is next to a free block. If possible,
the de-allocated block is combined with a free block next to it to create a larger free block.

Combining adjacent free blocks into a larger free block prevents further fragmentation
from occurring.
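A minimal Python sketch of first-fit allocation over a free list follows; the block representation (start address, size) is an assumption of the sketch.

    # First fit: take the first free block with size f >= s and split it.
    free_list = [(100, 4), (120, 10), (150, 6)]      # (start address, size)

    def first_fit(s):
        for i, (start, f) in enumerate(free_list):
            if f >= s:
                if f > s:
                    free_list[i] = (start + s, f - s)   # leftover free block of size f - s
                else:
                    free_list.pop(i)                    # exact fit: remove the free block
                return start
        return None                                     # no free block is large enough

    print(first_fit(8), free_list)   # 120 [(100, 4), (128, 2), (150, 6)]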
2. Implicit De-allocation

Implicit de-allocation requires cooperation between the user program and the run-time
package, because run time package needs to know when a storage block is no longer in
use.

This cooperation is implemented by fixing the format of storage blocks.

The first problem is that of recognizing block boundaries. If the size of blocks is fixed,
then position information can be used.

For example, if each block occupies 20 words, then a new block begins every 20 words.
Otherwise, in the inaccessible storage attached to a block we keep the size of the block, so
we can determine where the next block begins.

The second problem is that of recognizing whether a block is in use. We assume that a block is
in use if it is possible for the user program to refer to the information in the block.

The reference may occur through a pointer or after following a sequence of pointers, so the
compiler needs to know the position in storage of all pointers.

Two approaches can be used for implicit de-allocation

Reference counts

We keep track of the number of blocks that point directly to the present block. If this count
ever drops to 0 then the block can be de-allocated, because it cannot be referred to any more,
i.e. the block has become garbage that can be collected. Maintaining the reference counts can be
costly. Reference counts are best used when pointers between blocks never appear in cycles.

Marking Techniques

An alternative approach is to suspend temporarily execution of the user program and use
the frozen pointers to determine which blocks are in use. This approach requires all the
pointers into the heap to be known.

Conceptually we pour paint into the heap through these pointers. Any block that is reached
by the paint is in use and the rest can be de-allocated.

In more detail, we go through the heap and mark all blocks unused. Then we follow
pointers, marking as used any block that is reached in the process. A final sequential scan
of the heap allows all blocks still marked unused to be de-allocated.
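The idea can be sketched in a few lines of Python; the heap is modelled as a dictionary mapping each block to the blocks it points to, which is an assumption of the sketch, not a description of a real run-time package.

    # Marking technique: blocks reachable from the known root pointers are in use;
    # everything left unmarked is garbage and can be de-allocated.
    heap = {1: [2], 2: [3], 3: [], 4: [5], 5: [4]}   # block -> blocks it points to
    roots = [1]                                      # pointers known to the run-time package

    def collect():
        marked, stack = set(), list(roots)
        while stack:                                 # follow pointers, marking blocks as used
            block = stack.pop()
            if block not in marked:
                marked.add(block)
                stack.extend(heap[block])
        return [b for b in heap if b not in marked]  # still marked unused: collectable

    print(collect())   # [4, 5] -- the unreachable cycle is collected

Note that blocks 4 and 5 point to each other; a reference count scheme would never free them, which is one reason the marking technique is used when pointers can form cycles.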

Q.3

Explain memory allocation in block structured language.


The block is a sequence of statements containing the local data and declarations which are
enclosed within the delimiters.
Ex:
A
{
Statements
..
}
The delimiters mark the beginning and the end of the block. There can be nested blocks for
ex: block B2 can be completely defined within the block B1.
Finding the scope of the variable means checking the visibility within the block
Following are the rules used to determine the scope of the variable:
1. Variable X is accessible within block B1 if it can be accessed by any statement situated
in block B1.
2. Variable X can be accessed by any statement in block B2 if block B2 is situated within
block B1.
There are two types of variable situated in the block structured language
1. Local variable
2. Non local variable
To understand local and non local variable consider the following example
Procedure A
{
    int x, y, z;
    Procedure B
    {
        int a, b;
    }
    Procedure C
    {
        int m, n;
    }
}
Procedure    Local variables    Non local variables
A            x, y, z            -
B            a, b               x, y, z
C            m, n               x, y, z

Variables x, y and z are local variables to procedure A but they are non local to blocks B and
C, because these variables are not defined locally within blocks B and C but are accessible
within these blocks.

Q.4

Explain activation record.


The activation record is a block of memory used for managing information needed by a
single execution of a procedure.
A typical activation record contains the following fields:
Return value
Actual parameters
Control link
Access link
Saved machine status
Local variables
Temporaries

1. Temporary values: The temporary variables are needed during the evaluation of
expressions. Such variables are stored in the temporary field of activation record.
2. Local variables: Data that is local to the executing procedure is stored in this field of the
activation record.
3. Saved machine registers: This field holds the information regarding the status of machine
just before the procedure is called. This field contains the registers and program counter.
4. Control link: This field is optional. It points to the activation record of the calling
procedure. This link is also called the dynamic link.

5. Access link: This field is also optional. It refers to the non-local data in other activation
record. This field is also called static link field.
6. Actual parameters: This field holds the information about the actual parameters. These
actual parameters are passed to the called procedure.
7. Return values: This field is used to store the result of a function call.

Q.5

What is side effect? Explain parameter passing methods.


Side effect: A side effect of a function call is a change in a value of a variable which is not local
to the called function.
Parameter passing mechanism
1. Call by value:

This is the simplest method of parameter passing.

The actual parameters are evaluated and their values are passed to the called procedure (the
formal parameters).

The operations on the formal parameters do not change the values of the actual parameters.

Example: Languages like C and C++ use this parameter passing method.

2. Call by value-result:

This extends the capability of the call by value mechanism by copying the value of the formal
parameter back to the corresponding actual parameter at return.

Thus side effects are realized at return.

3. Call by reference:

This method is also called call by address or call by location.

The address of the actual parameter is passed to the formal parameter.

4. Call by name:

This is a less popular method of parameter passing.

The procedure is treated like a macro. The procedure body is substituted for the call in the
caller, with actual parameters substituted for formals.

The actual parameters can be surrounded by parentheses to preserve their integrity.

The local names of the called procedure and the names of the calling procedure are distinct.

Q.6

Explain operand descriptor and register descriptor with example.


An operand descriptor has the following fields:
1. Attributes: Contains the subfields type, length and miscellaneous information
2. Addressability: Specifies where the operand is located, and how it can be accessed. It has
two subfields
Addressability code: Takes the values 'M' (operand is in memory), and 'R' (operand is in
register). Other addressability codes, e.g. address in register ('AR') and address in memory
('AM'), are also possible,
Address: Address of a CPU register or memory word.
Ex: a*b
MOVER AREG, A
MULT AREG, B
Three operand descriptors are used during code generation. Assuming a, b to be integers
occupying 1 memory word, these are:

Attributes    Addressability
(int, 1)      Address(a)
(int, 1)      Address(b)
(int, 1)      Address(AREG)

Register descriptors
A register descriptor has two fields
1. Status: Contains the code free or occupied to indicate the register status.
2. Operand descriptor #: If status = occupied, this field contains the descriptor # of the
operand contained in the register.

Register descriptors are stored in an array called Register_descriptor. One register
descriptor exists for each CPU register.


In the above example, the register descriptor for AREG after generating code for a*b would be:

Occupied    #3

This indicates that register AREG contains the operand described by descriptor #3.
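For illustration, the descriptors can be represented in Python as below; the exact layout is assumed, and the values follow the a*b example above.

    # Operand descriptors (attributes, addressability) and the AREG register descriptor.
    operand_descriptors = [
        (("int", 1), ("M", "address(a)")),    # #1: a, in memory
        (("int", 1), ("M", "address(b)")),    # #2: b, in memory
    ]
    areg = {"status": "free", "operand_descriptor": None}

    def gen_product():
        code = ["MOVER AREG, A", "MULT AREG, B"]
        operand_descriptors.append((("int", 1), ("R", "AREG")))   # #3: result of a*b
        areg["status"] = "occupied"
        areg["operand_descriptor"] = len(operand_descriptors)     # descriptor #3
        return code

    print(gen_product())
    print(areg)    # {'status': 'occupied', 'operand_descriptor': 3}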

Q.7

Explain intermediate codes for an expression


There are two types of intermediate representation
1. Postfix notation
2. Three address code.
1) Postfix notation

Postfix notation is a linearized representation of a syntax tree.

It is a list of the nodes of the tree in which a node appears immediately after its children.

The postfix notation of x = -a*b + -a*b will be

x a - b * a - b * + =   (here - denotes unary minus)

2) Three address code

In the three address code form, at most three addresses are used to represent a
statement. The general form of three address code is a := b op c,

where a, b and c are operands that can be names or constants.

For an expression like a = b + c + d the three address code will be
t1 = b + c
t2 = t1 + d

Here t1 and t2 are temporary names generated by the compiler. At most three
addresses are allowed per statement; hence this representation is called three-address code.
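A short Python sketch of generating such code from an expression is given below; the nested-tuple form of the expression stands in for a syntax tree and is an assumption of the sketch.

    # Generate three-address code for an expression given as nested (op, left, right) tuples.
    temp_count = 0

    def new_temp():
        global temp_count
        temp_count += 1
        return f"t{temp_count}"

    def gen(expr, code):
        if isinstance(expr, str):
            return expr                        # a name or a constant: no code needed
        op, left, right = expr
        l, r = gen(left, code), gen(right, code)
        t = new_temp()                         # temporary name generated by the compiler
        code.append(f"{t} = {l} {op} {r}")
        return t

    code = []
    gen(("+", ("+", "b", "c"), "d"), code)     # a = b + c + d
    print(code)                                # ['t1 = b + c', 't2 = t1 + d']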

Q.8

Explain implementation of three address code

There are three representations used for three address code: quadruples, triples and
indirect triples.

Quadruple representation
A quadruple is a structure with at most four fields: op, arg1, arg2 and result.
The op field is used to represent the internal code for the operator, arg1 and arg2
represent the two operands used, and the result field is used to store the result of the
expression.

Consider the input statement x := -a*b + -a*b
Its three address code is:
t1 := uminus a
t2 := t1 * b
t3 := uminus a
t4 := t3 * b
t5 := t2 + t4
x  := t5

Quadruple representation:

       Op        Arg1    Arg2    Result
(0)    uminus    a               t1
(1)    *         t1      b       t2
(2)    uminus    a               t3
(3)    *         t3      b       t4
(4)    +         t2      t4      t5
(5)    :=        t5              x
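In a compiler the quadruples are simply records; a minimal Python sketch using tuples (an assumed representation) is:

    # Quadruples for x := -a*b + -a*b held as (op, arg1, arg2, result) tuples.
    quads = [
        ("uminus", "a",  None, "t1"),
        ("*",      "t1", "b",  "t2"),
        ("uminus", "a",  None, "t3"),
        ("*",      "t3", "b",  "t4"),
        ("+",      "t2", "t4", "t5"),
        (":=",     "t5", None, "x"),
    ]
    for i, (op, arg1, arg2, result) in enumerate(quads):
        print(f"({i})  {op:7} {arg1:3} {arg2 or '':3} {result}")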

Triples

In the triple representation, the use of temporary variables is avoided by referring to the
position (number) of the triple that computes the value.

For the expression x := -a*b + -a*b the triple representation is as given below:

Number    Op        Arg1    Arg2
(0)       uminus    a
(1)       *         (0)     b
(2)       uminus    a
(3)       *         (2)     b
(4)       +         (1)     (3)
(5)       :=        x       (4)

Indirect Triples

In the indirect triple representation, a separate listing of pointers to triples is maintained;
statements are represented by these pointers rather than by the triples themselves.

Statement listing:
Statement
(0)    (11)
(1)    (12)
(2)    (13)
(3)    (14)
(4)    (15)
(5)    (16)

Triples:
Number    Op        Arg1    Arg2
(11)      uminus    a
(12)      *         (11)    b
(13)      uminus    a
(14)      *         (13)    b
(15)      +         (12)    (14)
(16)      :=        x       (15)

Q.9

Explain code optimization methods


I. Compile Time Evaluation

Compile time evaluation means shifting computations from run time to compile time.

There are two methods used to obtain compile time evaluation.


1. Folding

In the folding technique the computation of constants is done at compile time instead of
run time.

Example: length = (22/7) * d

Here folding is applied by performing the computation of 22/7 at compile time.

2. Constant propagation

In this technique the value of a constant variable is propagated and the computation of the
expression is done at compile time.
Example: pi = 3.14; r = 5;
Area = pi * r * r

Here at the compilation time the value of pi is replaced by 3.14 and r by 5 then
computation of 3.14 * 5 * 5 is done during compilation.
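Both techniques can be sketched together in Python on a simple three-address representation; the statement format (target, operand1, operator, operand2) is assumed for the sketch.

    # Constant propagation and folding over simple 'target = y op z' statements.
    def fold_and_propagate(statements):
        constants, out = {}, []
        for target, y, op, z in statements:
            y, z = constants.get(y, y), constants.get(z, z)        # propagate known constants
            if op and isinstance(y, (int, float)) and isinstance(z, (int, float)):
                y, op, z = {"*": y * z, "+": y + z, "/": y / z}[op], None, None   # fold now
            if op is None and isinstance(y, (int, float)):
                constants[target] = y                              # target is now a constant
            out.append((target, y, op, z))
        return out

    stmts = [("pi", 3.14, None, None), ("r", 5, None, None),
             ("t1", "pi", "*", "r"), ("area", "t1", "*", "r")]
    print(fold_and_propagate(stmts))   # pi*r and then t1*r are computed at compile time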

II. Common Sub Expression Elimination

A common sub expression is an expression which appears repeatedly in the program and whose
value has been computed previously.

If the operands of this sub expression do not get changed at all, then the result of the
earlier computation is used instead of recomputing it each time.

Example:
t1 := 4 * i
t2 := a[t1]
t3 := 4 * j
t4 : = 4 * i
t5:= n
t6 := b[t4]+t5

The above code can be optimized using common sub expression elimination
t1=4*i
t2=a[t1]
t3=4*j
t5=n
t6=b[t1]+t5

The common sub expression t4 := 4 * i is eliminated as its computation is already available in
t1 and the value of i has not been changed between its definition and use.

III. Loop Invariant Computation (Frequency Reduction)


Loop invariant optimization can be obtained by moving some amount of code outside the
loop and placing it just before entering in the loop.

This method is also called code motion.


Example:

while (i <= max-1)
{
    sum = sum + a[i];
}

can be optimized as:

N = max - 1;
while (i <= N)
{
    sum = sum + a[i];
}
IV. Strength Reduction

Strength of certain operators is higher than others.

For instance strength of * is higher than +.

In this technique the higher strength operators can be replaced by lower strength
operators.

Example:
for (i = 1; i <= 50; i++)
{
    count = i * 7;
}

Here count takes the values 7, 14, 21 and so on as i goes from 1 to 50.

This code can be replaced by using strength reduction as follows:

temp = 7;
for (i = 1; i <= 50; i++)
{
    count = temp;
    temp = temp + 7;
}

V. Dead Code Elimination

A variable is said to be live at a point in a program if its value is used subsequently.

On the other hand, the variable is said to be dead at a point in a program if the value
contained in it is never used afterwards. The code containing such a variable is said to
be dead code, and an optimization can be performed by eliminating such dead code.

Example:
i = 0;
if (i == 1)
{
    a = x + 5;
}

The if statement is dead code, as its condition will never be satisfied; hence the statement can
be eliminated and the optimization can be done.

VI. Code Motion

The aim is to improve the execution time of the program by reducing the evaluation
frequency of expressions.

Evaluation of expressions is moved from one part of the program to another in such a way
that it is evaluated lesser frequently.

Loops are usually executed several times.

We can bring the loop-invariant statements out of the loop.

Example:
a = 200;
while (a > 0)
{
    b = x + y;
    if (a % b == 0)
        printf("%d", a);
}

The statement b = x + y is executed every time within the loop. But because it is loop
invariant, we can bring it outside the loop. It will then be executed only once.

a = 200;
b = x + y;
while (a > 0)
{
    if (a % b == 0)
        printf("%d", a);
}

Q.10

Define the following terms


1) Static pointers: Access to a non local variable can be done using one reserved pointer
called static pointer.
2) Display: To speed up the access to non local variable an array of pointer is maintained
such array is called display.
3) Optimizing transformation: It is a rule of rewriting a segment of a program to
improve its execution efficiency without affecting its meaning.
4) Local optimization: The optimizing transformations are applied over small segments of
a program consisting of a few statements.


5) Global optimization: The optimizing transformations are applied over a program unit.
6) Basic block: A basic block is a sequence of consecutive statements in which flow of
control enters at the beginning and leaves at the end, without a halt or possibility of
branching except at the end.

Q.11

Explain control flow property


Program flow graph: A program flow graph for a program P is a directed graph Gp = (N,
E, n0), where
N: the set of basic blocks in P
E: the set of directed edges (bi, bj) indicating the possibility of control flow from the last
statement of bi to the first statement of bj
n0: the start node of P.
Following are the properties of control flow:
1) Predecessor and successor: If (bi, bj) ∈ E, bi is a predecessor of bj and bj is a
successor of bi.
2) Paths: A path is a sequence of edges such that the destination node of one edge is the
source node of the following edge.
3) Ancestors and descendants: If a path exists from bi to bj, bi is an ancestor of bj and
bj is a descendant of bi.
4) Dominators and post dominators: Block bi is a dominator of block bj if every path from
n0 to bj passes through bi.
bi is a post dominator of bj if every path from bj to the exit node passes through bi.

Q.12

Explain Data flow property


Data Flow Properties

Before discussing the data flow properties, consider some basic terminology that will be
used while stating them.

A program point containing a definition is called a definition point.

A program point at which a reference to a data item is made is called a reference point.

A program point at which an expression is evaluated is called an evaluation point.

For example:

W1: x = 3       (definition point)
W2: y = x       (reference point)
W3: z = a*b     (evaluation point)

I. Available expression

An expression x+y is available at a program point w if and only if, along all paths reaching w:
1. The expression x+y has been evaluated (it is available at its evaluation point), and
2. No definition of any operand of the expression (here either x or y) follows its last
evaluation along the path; in other words, neither of the two operands gets modified
before their use at w.
B1: t1 = 4 * i
B2: t2 = c + d[t1]
B3: t2 = 4 * i
B4: t4 = a[t2]

The expression 4 * i is an available expression for B2, B3 and B4, because this expression has
not been changed by any of the blocks before it is used in B4.

II. Reaching definition

A definition D reaches a point P if there is a path from D to P along which D is not killed.

A definition D of variable x is killed when there is a redefinition of x.

The definition D1 is a reaching definition for block B2, but D1 is not a reaching definition
for block B3, because it is killed by the definition D2 in block B2.

III. Live variable

A variable x is live at a point p if there is a path from p to the exit along which the value
of x is used before it is redefined. Otherwise the variable is said to be dead at that point.

Q.13

Write a short note on Interpreter.

An interpreter is a language processor which bridges an execution gap without generating a


machine language program.

The main components of an interpreter are:
Data store
Symbol table
Data manipulation routines
Types of interpreter
1) Pure interpreter
[Figure: the Source Program and Data are given to the Interpreter, which produces the Result.]
2) Impure interpreter
[Figure: the Source Program is first converted to an IR; the IR and Data are then given to the
Interpreter, which produces the Result.]

It is a program that executes instructions written in a high-level language.

A high-level programming language translator that translates and runs the program at the
same time.

It converts one program statement into machine language, executes it, and then proceeds
to the next statement. This differs from regular executable programs that are presented to
the computer as binary-coded instructions.

Interpreted programs remain in the source language the programmer wrote in, which is
human readable text.

Interpreters are not much different than compilers. They also convert the high level
language into machine readable binary equivalents.

Each time when an interpreter gets a high level language code to be executed, it converts
the code into an intermediate code before converting it into the machine code.

Each part of the code is interpreted and then executed separately, in sequence. If an error
is found in a part of the code, interpretation stops without translating the rest of the code.

The advantage of an interpreter, however, is that it does not need to go through the
compilation stage during which machine instructions are generated.

This process can be time-consuming if the program is long. The interpreter, on the other
hand, can immediately execute high-level programs.

For this reason, interpreters are sometimes used during the development of a program,
when a programmer wants to add small sections at a time and test them quickly.

Interpreter characteristics:

Relatively little time is spent analyzing and processing the program.

The resulting code is some sort of intermediate code.

The resulting code is interpreted by another program.

Program execution is relatively slow.

Q.14

Compare one pass and two pass compilers.


One pass compiler
A single pass compiler makes a single pass over the source text, parsing, analyzing and
generating code all at once.
A one pass assembler passes over the source file exactly once, in the same pass collecting
the labels, resolving future references and doing the actual assembly.
The difficult part is to resolve future label references and assemble code in one pass.
One-pass compiler does not "look back" at code it previously processed.
It is also called narrow compiler.
A one-pass compiler is faster than two-pass compilers.
Unable to generate efficient program because of compiler has limited scope.
Languages like PASCAL can be implemented by one-pass compiler.

[Figure: Source code → Compiler → Machine code; Errors are reported.]
Two pass compiler
A two pass assembler does two passes over the source file (the second pass can be over a file
generated in the first pass).
In the first pass all it does is look for label definitions and introduce them into the symbol table.
In the second pass, after the symbol table is complete, it does the actual assembly by translating
the operations and so on.

Compilers use an Intermediate Representation (IR) for the program being compiled.
A two-pass compiler has a wider scope than a one-pass compiler.
Each pass takes the result of the previous pass as its input and creates an intermediate
output.
It is also called wide compiler.
Languages like C++ can be implemented by two-pass compiler.

[Figure: Source code → Front end → IR → Back end → Machine code; Errors are reported.]

Language Processor

Q.1

Explain following terms.


Semantic: It represents the rules of the meaning of the domain.
Semantic gap: It represents the difference between the semantic of two domains.
Application domain: The designer expresses the ideas in terms related to application domain of
the software
Execution domain: To implement the ideas of designer, their description has to be interpreted
in terms related to the execution domain of computer system.
Specification gap: The gap between the application domain and the PL domain is called the
specification (and design) gap.
Execution gap: The gap between the semantics of programs written in different programming
languages.

[Figure: Application Domain --(specification gap)-- PL Domain --(execution gap)-- Execution Domain]

Language processor: A language processor is software which bridges a specification or execution
gap.
Language translator: A language translator bridges an execution gap to the machine language of a
computer system.
Detranslator: It bridges the same execution gap as language translator, but in the reverse
direction.
Preprocessor: It is a language processor which bridges an execution gap but is not a language
translator
Language migrator: It bridges the specification gap between two programming languages.
Interpreter: An interpreter is a language processor which bridges an execution gap without
generating a machine language program.
Source language: There are more than a thousand high level languages in which source programs
are written; these are called source languages.
Target language: Each machine has its own machine language; these are called target
languages.
Problem oriented language: Programming language features directly model aspects of the
application domain, which leads to very small specification gap. Such a programming language

can only be used for specific application; hence they are called problem oriented languages.
Procedure oriented language: Procedure oriented language provides general purpose facilities
required in most application domains. Such a language is independent of specific application
domains and results in a large specification gap which has to be bridged by an application
designer.

Q.2

Explain Language processing activity


There are mainly two types of language processing activity which bridges the semantic gap
between source language and target language.
1. Program generation activities
2. Program execution activities
Program generation
A program generation activity aims at the automatic generation of a program.
A program generator is software which accepts the specification of a program and generates a
program in the target language.
The program generator introduces a new domain between the application and programming language
domains, called the program generator domain.

[Figure: Program specification → Program generator → Target program; Errors are reported.]

Program Execution
Two popular models for program execution are translation and interpretation.
Translation
The program translation model bridges the execution gap by translating a program written in PL,
called source program, into an equivalent program in machine or assembly language of the
computer system, called target program.

[Figure: Source program → Translator → M/c language (target) program; Data is supplied to the
target program during execution; Errors are reported by the translator.]

Interpretation
The interpreter reads the source program and stores it in its memory.
The CPU uses the program counter (PC) to note the address of the next instruction to be
executed.
The statement would be subjected to the interpretation cycle, which could consist the following
steps:
1. Fetch the instruction.
2. Analyse the statement and determine its meaning, the computation to be performed and its
operands.
3. Execute the meaning of the statement.
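A toy Python sketch of this interpretation cycle for a made-up three-operation statement language (the statement format and operations are assumptions of the sketch):

    # Toy interpreter: fetch, analyse and execute one statement at a time.
    def interpret(program):
        memory, pc = {}, 0                       # data store and program counter
        while pc < len(program):
            op, name, value = program[pc]        # 1. fetch the statement
            if op == "let":                      # 2./3. analyse its meaning and execute it
                memory[name] = value
            elif op == "add":
                memory[name] += value
            elif op == "print":
                print(name, "=", memory[name])
            pc += 1                              # PC notes the next statement to execute
        return memory

    interpret([("let", "a", 1), ("add", "a", 4), ("print", "a", None)])   # prints: a = 5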


[Figure: Interpreter schematic: the interpreter's Memory holds the source program and data; a PC
(program counter) notes the next statement; Errors are reported during interpretation.]

Q.3

What are phases and passes of a compiler?


Language processing= analysis of SP + synthesis of TP.
Each language processor consists of mainly two phases
1. Analysis phase
2. Synthesis phase
The analysis phase uses each component of the source language to determine relevant information
concerning a statement in the source program. Thus, analysis of a source statement consists of
lexical, syntax and semantic analysis. (Front end)
The synthesis phase is concerned with the construction of target language statements. It mainly
includes two activities: memory allocation and code generation. (Back end)

[Figure: Language processor: Source Program → Analysis phase → Synthesis phase → Target Program;
both phases report Errors.]

Language processing could be performed on a statement by statement basis, that is, analysis of a
source statement could be immediately followed by synthesis of the equivalent target statement.
However, this may not be feasible due to:
Forward reference: a forward reference of a program entity is a reference to the entity which
precedes its definition in the program.
This problem can be solved by postponing the generation of target code until more information
concerning the entity becomes available.
It leads to multipass model of language processing.
Language processor pass: a language processor pass is the processing of every statement in a
source program, to perform language processing function.
In Pass I: Perform analysis of the source program and note relevant information.
In Pass II: Analyse the source program once again to generate the target code, using the type
information noted in Pass I.


The language processor performs certain processing more than once.
This can be avoided using an intermediate representation (IR) of the source program

An intermediate representation is a representation of a source program which reflects the effect
of some, but not all, analysis and synthesis task performed during language processing.

[Figure: Source Program → Front End → Intermediate Representation (IR) → Back End → Target Program]

Phases of Language processor (Toy compiler)


Lexical Analysis (Scanning)
Lexical analysis identifies the lexical units in a source statement. It then classifies the units
into different lexical classes, e.g. ids, constants, keywords etc., and enters them into
different tables.
This classification may be based on the nature of a string or on the specification of the source
language. (For example, while an integer constant is a string of digits with an optional sign,
a reserved id is an id whose name matches one of the reserved names mentioned in the
language specification.)
Lexical analysis builds a descriptor, called a token. We represent a token as code#no.
Consider the following code:
i: integer;
a, b: real;
a = b + i;
The statement a = b + i is represented as the following string of tokens:

a       =       b       +       i
Id#1    Op#1    Id#2    Op#2    Id#3
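A small Python sketch of such a scanner for this statement is given below; the operator codes and the regular expression are assumptions of the sketch, while the Id#/Op# convention follows the text above.

    # Toy lexical analysis of 'a = b + i': classify lexical units and build tokens.
    import re

    OPERATORS = {"=": 1, "+": 2}                 # operator -> Op# code

    def scan(statement):
        symtab, tokens = [], []
        for lexeme in re.findall(r"[A-Za-z_]\w*|[=+]", statement):
            if lexeme in OPERATORS:
                tokens.append(f"Op#{OPERATORS[lexeme]}")
            else:
                if lexeme not in symtab:
                    symtab.append(lexeme)        # enter the id into the symbol table
                tokens.append(f"Id#{symtab.index(lexeme) + 1}")
        return tokens, symtab

    print(scan("a = b + i"))   # (['Id#1', 'Op#1', 'Id#2', 'Op#2', 'Id#3'], ['a', 'b', 'i'])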

Syntax analysis (parsing)


Syntax analysis processes the string of tokens to determine the statement class and also checks
whether the given statement is syntactically valid or not.
It then builds an IC which represents the structure of the statement. The IC is passed to
semantic analysis to determine the meaning of the statement.

Semantic Analysis
Semantic analysis of declaration statements differs from the semantic analysis of imperative
statements.
The former results in the addition of information to the symbol table, e.g. type, length and
dimensionality of variables.
The latter identifies the sequence of actions necessary to implement the meaning of a source
statement.
In both cases the structure of a source statement guides the application of the semantic rules.
When semantic analysis determines the meaning of a sub tree in the IC, it adds information to a
table or adds an action to the sequence of actions.
It then modifies the IC to enable further semantic analysis. The analysis ends when the tree has
been completely processed. The updated tables and the sequence of actions constitute the IR
produced by the analysis phase.
[Figure: Successive IC trees for a = b + i: the tree with nodes (a, real), (b, real) and (i, int)
is transformed by converting i to (i*, real); the subtree for the addition is then replaced by
(temp, real).]
Intermediate representation
The IR contains the intermediate code and tables.
Symbol table:

Symbol    Type    Length    Address
i         int
a         real
b         real
i*        real
temp      real
Intermediate code
1. Convert (Id#1) to real, giving (Id#4)
2. Add (Id#4) to (Id#3), giving (Id#5)
3. Store (Id#5) in (Id#2)
Memory allocation
Memory allocation is a simple task given the presence of the symbol table. The memory
requirement of an identifier is computed from its type, length and dimensionality, and memory is
allocated to it.
The address of the memory area is entered in the symbol table. After memory allocation, the
symbol table looks as shown below.


Symbol    Type    Length    Address
i         int               2000
a         real              2001
b         real              2002

SAFFRONY INSTITUTE OF TECHNOLOGY


Language

2150708 System Programming (SP)

Processor
Code generation
The synthesis phase may decide to hold the values of i* and temp in machine registers and may generate the assembly code:

CONV_R   AREG, I
ADD_R    AREG, B
MOVEM    AREG, A

Q.4

Explain following terms


Formal language: A language L can be considered to be a collection of valid sentences. Each sentence consists of a sequence of words, and each word is a sequence of letters or graphic symbols acceptable in L. A language specified in this manner is known as a formal language.
Terminal symbol: The alphabet of L, denoted by the Greek symbol Σ, is the collection of symbols in its character set. Lower case letters a, b, c are used to denote symbols in Σ. A symbol in the alphabet is known as a terminal symbol of L.
Nonterminal symbols: A nonterminal symbol is the name of a syntax category of a language. An NT is written as a single capital letter or as a name enclosed between < and >.
Production: A production, also called a rewriting rule, is a rule of the grammar. A production has the form
Nonterminal symbol := string of Ts and NTs
Ex: <article> := a | an | the
Grammar: A grammar G of a language LG is a quadruple (Σ, SNT, S, P) where
Σ = alphabet of LG, the set of terminals
SNT = set of NTs
S = distinguished symbol (start symbol)
P = set of productions
Binding: a binding is the association of an attribute of a program entity with a value
Binding time: binding time is the time at which binding is performed.
Static binding: a static binding is a binding performed before the execution of a program
begins.
Dynamic binding: a dynamic binding is a binding performed after the execution of a program
has begun.

Q.5

Explain Derivation, Reduction and Parse tree?


Derivation
Let production P1 of grammar G be of the form
P1 : A := α
and let β be a string such that β = γAθ. Then replacement of A by α in string β constitutes a derivation according to production P1.
The derivation operation helps to generate valid strings.

Reduction
Let production P1 of grammar G be of the form
P1 : A := α
and let β be a string such that β = γαθ. Then replacement of α by A in string β constitutes a reduction according to production P1.
The reduction operation helps to recognize valid strings.
Consider the grammar G
<sentence>= <noun phrase><verb phrase>
<noun phrase>= <article><noun>
<verb phrase>= <verb><noun phrase>
<article>= a| an| the
<noun>= boy | apple
<verb>= ate

Derivation (the following strings are sentential forms of LG):
<noun phrase><verb phrase>
<article><noun><verb phrase>
The boy <verb phrase>
The boy <verb><noun phrase>
The boy ate <article><noun>
The boy ate an apple

Reduction (according to the grammar we perform the following reductions on "The boy ate an apple"):
The boy ate an apple
<article> boy ate an apple
<article><noun> ate an apple
<article><noun><verb> an apple
<article><noun><verb><article> apple
<article><noun><verb><article><noun>
<noun phrase><verb><article><noun>
<noun phrase><verb><noun phrase>
<noun phrase><verb phrase>
<sentence>

Parse tree
A sequence of derivation or reduction reveals the syntactic structure of a string with respect to G.
We depict the syntactic structure in the form of a parse tree.
Derivation according to the production A := α gives rise to an elemental parse tree with A as the parent node and the symbols of α as its child nodes.

Ex:
(Figure: parse tree for "The boy ate an apple": <sentence> at the root has children <noun phrase> and <verb phrase>; <noun phrase> derives <article> "The" and <noun> "boy"; <verb phrase> derives <verb> "ate" and a <noun phrase>, which in turn derives <article> "an" and <noun> "apple".)

Q.6

Give classification of grammars


Type-0 grammars
This grammar, known as phrase structure grammar or unrestricted grammar, contains productions of the form
A->X
where A and X can be strings of Ts and NTs.
Type-1 grammar
This grammar is also known as context sensitive grammar because its productions specify that derivation or reduction of strings can take place only in specific contexts.
A grammar G is said to be context sensitive if all the productions are of the form
X->Y
where
X -> combination of Ts and NTs with at least one NT
Y -> combination of Ts and NTs, and should be non-empty.
The length of Y must be greater than or equal to the length of X.
Type-2 grammar
Type2 grammar is also called context free grammar.
A grammar is said to be context free grammar if all the production in the form
A->X
Where A-> single NT
X-> combination of T and NT.
Type-3 grammar (regular grammar)
Left linear grammar
A grammar is said to be left linear grammar if the leftmost character symbol of RHS of
production rule is NT.
A->Ba | a
Right linear grammar
A grammar is said to be right linear grammar if the rightmost character symbol of RHS of
production rule is NT.
A->a | aB
Operator Grammar
An operator grammar is the grammar none of whose production contains two or more
consecutive NT, in any RHS alternative.

Linker & Loader

Q.1

Define the following terms


1) Translation time address: Translation time address is used at the translation time. This
address is assigned by translator
2) Link time address: Link time address is used at link time. This address is assigned by the linker.
3) Load time address: Load time address is used at load time. This address is assigned by the loader.
4) Translated origin: Address of origin assumed by the translator
5) Linked origin: Address of origin assumed by the linker while producing a binary program
6) Load origin: Address of origin assumed by the loader while loading the program for
execution.

Q.2

Describe in detail how relocation and linking is performed.

Program relocation is the process of modifying the addresses used in the address sensitive
instruction of a program such that the program can execute correctly from the designated
area of memory.

If linked origin ≠ translated origin, relocation must be performed by the linker.

If load origin ≠ linked origin, relocation must be performed by the loader.

Let AA be the set of absolute addresses - instruction or data addresses - used in the instructions of a program P.

A nonempty AA implies that program P assumes its instructions and data to occupy memory words with specific addresses.

Such a program, called an address sensitive program, contains one or more of the following:

An address sensitive instruction: an instruction which uses an address i ∈ AA.

An address constant: a data word which contains an address i ∈ AA.

An address sensitive program P can execute correctly only if the start address of the memory
area allocated to it is the same as its translated origin.

To execute correctly from any other memory area, the address used in each address
sensitive instruction of P must be corrected.

Performing relocation

Let the translated and linked origins of program P be t_origin_P and l_origin_P, respectively.

Consider a symbol symb in P. Let its translation time address be t_symb and its link time address be l_symb.

The relocation factor of P is defined as

Relocation_factor_P = l_origin_P - t_origin_P

Note that Relocation_factor_P can be positive, negative or zero.

Consider a statement which uses symb as an operand. The translator puts the address t_symb in the instruction generated for it. Now,

t_symb = t_origin_P + d_symb      .....(1)

where d_symb is the offset of symb in P. Hence

l_symb = l_origin_P + d_symb

Using (1),

l_symb = t_origin_P + Relocation_factor_P + d_symb
       = t_origin_P + d_symb + Relocation_factor_P
       = t_symb + Relocation_factor_P      .....(2)

Let IRP_P designate the set of instructions requiring relocation in program P. Following (2), relocation of program P can be performed by computing the relocation factor for P and adding it to the translation time address(es) in every instruction i ∈ IRP_P.
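A minimal sketch of this computation, assuming a program is represented simply by a dictionary from translated instruction addresses to the operand addresses used in them, together with the set IRP of address sensitive instructions; this representation is an assumption for illustration, not the book's data structure.

def relocate(t_origin, l_origin, code, irp):
    """Add the relocation factor to the operand address of every instruction in IRP.

    code: dict mapping translated instruction address -> operand address used in it.
    irp : set of translated addresses of address sensitive instructions.
    """
    relocation_factor = l_origin - t_origin        # may be positive, negative or zero
    relocated = {}
    for addr, operand in code.items():
        new_addr = addr + relocation_factor        # the instruction itself moves
        if addr in irp:                            # address sensitive: patch the operand too
            operand = operand + relocation_factor
        relocated[new_addr] = operand
    return relocated

# Program translated at origin 500; the instruction at 500 uses address 540.
print(relocate(500, 900, {500: 540, 501: 0}, {500}))   # {900: 940, 901: 0}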

Linking

Consider an application program AP consisting of a set of program units SP = {Pi}.

A program unit Pi interacts with another program unit Pj by using addresses of Pj's instructions and data in its own instructions.

To realize such interactions, Pj and Pi must contain public definitions and external references as defined in the following (explain public definition and external reference):

Public definition: a symbol pub_symb defined in a program unit which may be referenced in other program units.

External reference: a reference to a symbol ext_symb which is not defined in the program unit containing the reference.

Q.3

What is program relocation? Explain characteristics of self-relocating programs.


Definition (program relocation): Program relocation is the process of modifying the addresses used in
the address sensitive instruction of a program such that the program can execute correctly from the
designated area of memory.
Self Relocating Programs

A self relocating program is a program which can perform the relocation of its own address
sensitive instructions.

It contains the following two provisions for this purpose:


o

A table of information concerning the address sensitive instructions exists as a part of


the program.

Code to perform the relocation of address sensitive instructions also exists as a part
of the program. This is called the relocating logic.

The start address of the relocating logic is specified as the execution start address of the
program.

Thus the relocating logic gains control when the program is loaded in memory for the
execution.

It uses the load address and the information concerning address sensitive instructions to perform its own relocation.

Execution control is now transferred to the relocated program.

A self relocating program can execute in any area of the memory.

This is very important in time sharing operating systems where the load address of a
program is likely to be different for different executions.

Q.4

Explain design of linker


1. Relocation & linking requirement in segmented addressing

Use of the segmented addressing structure reduces the relocation requirement of a


program.
Sr. No.   Statement                                     Offset
0001      DATA_HERE   SEGMENT
0002      ABC         DW      25                        0000
0003      B           DW      ?                         0002
...
0012      SAMPLE      SEGMENT
0013                  ASSUME  CS:SAMPLE, DS:DATA_HERE
0014                  MOV     AX, DATA_HERE             0000
0015                  MOV     DS, AX                    0003
0016                  JMP     A                         0005
0017                  MOV     AL, B                     0008
...
0027      A           MOV     AX, BX                    0196
0043      SAMPLE      ENDS
0044                  END

Consider the above program, the ASSUME statement declares the segment register
CS and DS to be available for memory addressing.

Hence all memory addressing is performed using suitable displacement for their
contents.

Translation time address of A is 0196. In statement 16, a reference to A is assembled


as a displacement of 196 form the content of the CS register.

This avoids the use of an absolute address; hence the instruction is not address sensitive. Now no relocation is needed if segment SAMPLE is to be loaded with the address 2000, because the CS register would be loaded with the address 2000 by a calling program.

The effective operand address would be calculated as <CS>+0196, which is the


correct address 2196.

A similar situation exists with the reference to B in statement 17. The reference to B is assembled as a displacement of 0002 from the content of the DS register.

Since the DS register would be loaded with the execution time address of
DATA_HERE, the reference to B would be automatically relocated to correct address.

2. Linking requirement

In FORTRAN all program units are translated separately, hence all sub program calls
and common variable references require linking.

Pascal procedures are typically nested inside the main program; hence procedure
references do not require linking.

In C, program files are translated separately, so only function calls that cross file boundaries and references to global data require linking.

A name table (NTAB) is defined for use in program linking. Each entry of the table
contains the following fields:
Symbol: symbolic name of an external reference or an object module
Linked_address: for a public definition, this field contains linked address of the
symbol. For an object module, it contains the linked origin of the object module.

Q.5

Write a brief note on MS-DOS linker

We discuss the design of a linker for the Intel 8088/80x86 processors which resembles LINK
of MS DOS in many respects.

It may be noted that the object modules of MS DOS differ from the Intel specifications in
some respects.

Object Module Format (Explain object module of the program)

An Intel 8088 object module is a sequence of object records, each object record describing
specific aspects of the programs in the object module.

There are 14 types of object records containing the following five basic categories of
information:

Binary image (i.e. code generated by a translator)

External references

Public definitions

Debugging information (e.g. line number in source program).

Miscellaneous information (e.g. comments in the source program).

We only consider the object records corresponding to first three categories-a total of eight
object record types.

Each object record contains variable length information and may refer to the contents of
previous object records.

Each name in an object record is represented in the following format: length (1 byte) followed by the name.

THEADR, LNAMES and SEGDEF records


THEADR record
80H | length | T-module name | check-sum

The module name in the THEADR record is typically derived by the translator from the source
file name.

This name is used by the linker to report errors.

An assembly programmer can specify the module name in the NAME directive.

LNAMES record
96H | length | name-list | check-sum

The LNAMES record lists the names for use by SEGDEF records.

SEGDEF record
98H | length | attributes (1-4) | segment length (2) | name index (1) | check-sum

A SEGDEF record designates a segment name using an index into this list.

The attributes field of a SEGDEF record indicates whether the segment is relocatable or
absolute, whether (and in what manner) it can be combined with other segments, as also the
alignment requirement of its base address (e.g. byte, word or paragraph, i.e. 16 byte,
alignment).

Stack segments with the same name are concatenated with each other, while common
segments with the same name are overlapped with one another.

The attribute field also contains the origin specification for an absolute segment.

EXTDEF and PUBDEF record


EXTDEF record: 8CH | length | external reference list | check-sum

PUBDEF record: 90H | length | base (2-4) | name | offset | check-sum

The EXTDEF record contains a list of external references used by the programs of this
module.

A FIXUPP record designates an external symbol name by using an index into this list.

A PUBDEF record contains a list of public names declared in a segment of the object module.

The base specification identifies the segment.

Each (name, offset) pair in the record defines one public name, specifying the name of the
symbol and its offset within the segment designated by the base specification.

LEDATA records
A0H | length | segment index (1-2) | data offset (2) | data | check-sum

An LEDATA record contains the binary image of the code generated by the language
translator.

Segment index identifies the segment to which the code belongs, and offset specifies the location of the code within the segment.


FIXUPP record
9CH | length | locat (1) | fix dat (1) | frame datum (1) | target datum (1) | target offset (2) | check-sum

A FIXUPP record contains information for one or more relocation and linking fixups to be
performed.

The locat field contains a numeric code called loc code to indicate the type of a fixup.

The meanings of these codes are as follows; a loc code indicates which part of the fixup location is to be fixed:
  Low order byte is to be fixed.
  Offset is to be fixed.
  Segment is to be fixed.
  Pointer (i.e., segment:offset) is to be fixed.

locat also contains the offset of the fixup location in the previous LEDATA record.

The frame datum field, which refers to a SEGDEF record, identifies the segment to which the
fixup location belongs.

The target datum and target offset fields specify the relocation or linking information.

Target datum contains a segment index or an external index, while target offset contains an
offset from the name indicated in target datum.

The fix dat field indicates the manner in which the target datum and target offset fields are to
be interpreted.

The numeric codes used for this purpose indicate the following contents of the target datum and target offset fields:
  Segment index and displacement.
  External index and target displacement.
  Segment index (offset field is not used).
  External index (offset field is not used).

MODEND record
8AH | length | type (1) | start addr (5) | check-sum

The MODEND record signifies the end of the module, with the type field indicating whether it
is the main program.

This record also optionally indicates the execution start address.

This has two components: (a) the segment, designated as an index into the list of segment
names defined in SEGDEF record(s), and (b) an offset within the segment.

Q.6

What is an overlay? Explain overlay structured program and its execution.

An overlay is part of a program (or software package) which has the same load origin as
some other part of the program.

Overlay is used to reduce the main memory requirement of a program.

Overlay structured program

We refer to a program containing overlays as an overlay structured program. Such a


program consists of
o

A permanently resident portion, called the root.

A set of overlays.

Execution of an overlay structured program proceeds as follows:

To start with, the root is loaded in memory and given control for the purpose of execution.

Other overlays are loaded as and when needed.

Note that the loading of an overlay overwrites a previously loaded overlay with the same load
origin.

This reduces the memory requirement of a program.

It also makes it possible to execute programs whose size exceeds the amount of memory
which can be allocated to them.

The overlay structure of a program is designed by identifying mutually exclusive modules


that is, modules which do not call each other.

Such modules do not need to reside simultaneously in memory.

Execution of an overlay structured program

For linking and execution of an overlay structured program in MS DOS the linker produces a
single executable file at the output, which contains two provisions to support overlays.

First, an overlay manager module is included in the executable file.

This module is responsible for loading the overlays when needed.

Second, all calls that cross overlay boundaries are replaced by an interrupt producing
instruction.

To start with, the overlay manager receives control and loads the root.

A procedure call which crosses overlay boundaries leads to an interrupt.

This interrupt is processed by the overlay manager and the appropriate overlay is loaded into
memory.

When each overlay is structured into a separate binary program, as in IBM mainframe
systems, a call which crosses overlay boundaries leads to an interrupt which is attended by
the OS kernel.

Control is now transferred to the OS loader to load the appropriate binary program.

Q.7
Explain different loading scheme


1) Compile & go loader

Assembler is loaded in one part of memory and assembled program directly into their
assigned memory location

After the loading process is complete, the assembler transfers the control to the starting
instruction of the loaded program.

Advantages
The user need not be concerned with the separate steps of compilation, assembling,
linking, loading, and executing.

Execution speed is generally much superior to interpreted systems.

They are simple and easier to implement.

(Figure: compile-and-go loading: the source program is processed by the compile-and-go assembler, which places the assembled program directly in memory.)

Disadvantages
There is wastage in memory space due to the presence of the assembler.

The code must be reprocessed every time it is run.

2) Absolute loader

It is a simple type of loader scheme which fits object code into main memory without
relocation.

This load accepts the machine text and placed into main memory at location prescribe by the
translator.

Advantage

Very simple

Disadvantage

Programmer must specify load address

In multiple subroutines environment programmer requires to do linking.

3) Subroutine linkage loader

A program unit Pi interacts with another program unit Pj by using addresses of Pj's instructions and data in its own instructions.

To realize such interaction, Pj and Pi must contain public definitions and external references.

Public definition: program unit which may be referenced in other program unit

External reference: This is not defined in program unit containing the reference.

ENTRY statement: this list the public definition of the program unit.

EXTRN statement: lists the symbols to which external references are made in the program unit.
4) Relocating loader (BSS loader)

To avoid possible assembling of all subroutine when a single subroutine is changed and to
perform task of allocation and linking for the programmer, the general class of relocating
loader was introduced.

Binary symbolic loader (BSS) is an example of relocating loader.

The output of assembler using BSS loader is


1. Object program
2. Reference about other program to be accessed
3. Information about address sensitive entities.

Let us consider a program segment as shown below


Offset=10
ADD AREG,X
Offset=30
X DS 1

In the above program the address of variable X in the instruction ADD AREG, X will be 30.

If this program is loaded from memory location 500 for execution, then the address of X in the instruction ADD AREG, X must become 530.

Before loading:                 After loading at 500:
Offset=10   ADD AREG, X         ADD AREG, X   (address of X patched to 530)
Offset=30   X DS 1              530  X DS 1

Use of a segment register makes a program address insensitive.

The actual address is given by: content of segment register + address of operand in the instruction.

So, 500 + 30 = 530 is the actual address of variable X.

5) Direct linking loader

It is a general re-locatable loader and is perhaps the most popular loading scheme presently
used.

Advantages

Allowing multiple segments

Allowing multiple data segment

Flexible intersegment referencing

Accessing ability

Relocation facility

Disadvantage

Not suitable in multitasking


6) Dynamic loader

It uses overlay structure scheme

In order for the overlay structure to work, it is necessary for the module loader to load their
various procedures as they are needed.

The portion of the loader that actually interprets the calls and loads the necessary procedure
is called overlay supervisor or flipper.

This overlay scheme is called dynamic loading or load on call (LOCAL).

Q.8

An algorithm for first pass of a linker


1. Extract load_origin from the command line.
2. Repeat step 3 for each object module to be linked.
3. Select the next object module from the command line. For each record in the object module:
   (a) If an LNAMES record, enter the names in the name directory (NAMED).
   (b) If a SEGDEF record, then
       (i)   i := name index from the record
       (ii)  segment_name := NAMED[i]
       (iii) If an absolute segment, enter (segment_name, segment_addr) in ESD.
       (iv)  If the segment is relocatable, then
             Align load_origin with the next paragraph (i.e. make it a multiple of 16).
             Enter (segment_name, load_origin) in ESD.
             load_origin := load_origin + segment length.
   (c) If a PUBDEF record, then
       (i)   i := base
       (ii)  segment_name := NAMED[i]; symbol := name
       (iii) segment_addr := load address of segment_name in ESD
       (iv)  symbol_addr := segment_addr + offset
       (v)   Enter (symbol, symbol_addr) in ESD.
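The sketch below mirrors the spirit of this first pass in Python. The object-record representation (plain dictionaries with 'type', 'names', 'name_index', 'length', 'base' and 'offset' keys) is an assumption made only for illustration and simplifies away absolute segments; it is not the Intel record layout.

PARA = 16  # paragraph size in bytes

def linker_pass1(load_origin, object_modules):
    """Assign linked addresses to relocatable segments and public symbols."""
    esd = {}                                    # segment/symbol name -> linked address
    for module in object_modules:
        names = []                              # names listed by LNAMES records
        for rec in module:
            if rec["type"] == "LNAMES":
                names.extend(rec["names"])
            elif rec["type"] == "SEGDEF":
                seg = names[rec["name_index"] - 1]
                # align the load origin with the next paragraph boundary
                load_origin = (load_origin + PARA - 1) // PARA * PARA
                esd[seg] = load_origin
                load_origin += rec["length"]
            elif rec["type"] == "PUBDEF":
                seg = names[rec["base"] - 1]
                esd[rec["name"]] = esd[seg] + rec["offset"]
    return esd

module = [
    {"type": "LNAMES", "names": ["SAMPLE"]},
    {"type": "SEGDEF", "name_index": 1, "length": 100},
    {"type": "PUBDEF", "base": 1, "name": "TOTAL", "offset": 40},
]
print(linker_pass1(1000, [module]))   # {'SAMPLE': 1008, 'TOTAL': 1048}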

Q.9

Object module of linker

The object module of a program contains all information necessary to relocate and link the program with other programs.

The object module of a program P consists of 4 components:


1. Header: The header contains the translated origin, size and execution start address of P.
2. Program: This component contains the machine language program corresponding to P.
3. Relocation table (RELOCTAB): This table describes IRP_P. Each RELOCTAB entry contains a single field:
   Translated address: translated address of an address sensitive instruction.
4. Linking table (LINKTAB): This table contains information concerning the public definitions and external references in P. Each LINKTAB entry contains three fields:
   Symbol: symbolic name
   Type: PD/EXT, indicating whether the entry is a public definition or an external reference
   Translated address: For a public definition, this is the address of the first memory word allocated to the symbol. For an external reference, it is the address of the memory word which is required to contain the address of the symbol.
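As a loose illustration of these four components, here is how an object module might be held in memory by a linker written in Python; the field names and the string encoding of code words are assumptions for illustration, not a standard format.

from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class ObjectModule:
    # 1. Header: translated origin, size and execution start address
    t_origin: int
    size: int
    start_addr: int
    # 2. Program: machine language words, indexed from t_origin
    code: List[str] = field(default_factory=list)
    # 3. RELOCTAB: translated addresses of address sensitive instructions
    reloctab: List[int] = field(default_factory=list)
    # 4. LINKTAB: (symbol, 'PD'/'EXT', translated address)
    linktab: List[Tuple[str, str, int]] = field(default_factory=list)

p = ObjectModule(t_origin=500, size=42, start_addr=500,
                 reloctab=[500, 538],
                 linktab=[("ALPHA", "EXT", 518), ("MAX", "EXT", 519), ("TOTAL", "PD", 540)])
print(p.reloctab, p.linktab[0])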

Example:

Statement                           Address    Code
          START   500
          ENTRY   TOTAL
          EXTRN   MAX, ALPHA
          READ    A                 500)       + 09 0 540
LOOP                                501)
          .
          .
          MOVER   AREG, ALPHA       518)       + 04 1 000
          BC      ANY, MAX          519)       + 06 6 000
          .
          .
          BC      LT, LOOP          538)       + 06 1 501
          STOP                      539)       + 00 0 000
A         DS      1                 540)
TOTAL     DS      1                 541)
          END

1. Translated origin = 500, size = 42, execution start address = 500.
2. Machine language instructions as shown in the Code column.
3. Relocation table:
500
538
4. Linking table:
ALPHA    EXT    518
MAX      EXT    519
TOTAL    PD     540


MacroProcessors

Q.1

Explain macro, macro definition and Macro call


Macro: macro is a unit of specification for program generation through expansion.
Macro definition: macro definition is enclosed between macro header and macro end statement

Macro definition consist of


1.

Macro prototype statement: it declares macro name and formal parameter list

2.

One or more model statement: from which an assembly statement can be generated

3.

Macro preprocessor statement: used to perform auxiliary function

Macro call:A macro is called by writing macro name in the mnemonics field and set of actual
parameters.
<macro name>[<actual parameter name>]

Q.2

Explain macro expansion


A macro call leads to macro expansion. During macro expansion, the macro call statement is replaced
by a sequence of assembly statements.
Each expanded statement is marked with a + preceding its label field.
Two key notations concerning macro expansion are:
1. Expansion time control flow
2. Lexical Substitution
1. Flow of control during expansion

This determines the order in which model statements are visited during macro expansion.

Default flow of control during macro expansion is sequential.

A preprocessor statement can alter flow of control during expansion such that model
statements are never visited during expansion (conditional expansion) or repeatedly visited
during expansion (expansion time loop).

The flow control during macro expansion is implemented using a macro expansion
counter(MEC)

Algorithm:
1. MEC:= statement number of first statement following the prototype statement;
2. While statement pointed by MEC is not a MEND statement
(a) If a model statement then
(i) Expand the statement.
(ii) MEC:= MEC+1;
(b) Else (i.e. a preprocessor statement)
(i) MEC:= new value specified in the statement;
3. Exit from macro expansion.

MEC is set to point at the statement following the prototype statement. It is incremented by 1 after expanding a model statement.
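A minimal Python rendering of this control flow is given below; the representation of the macro body as a list of (kind, value) pairs, where a preprocessor statement directly carries the index it transfers to, is an assumption made for illustration.

def expand(body):
    """Walk a macro body using a macro expansion counter (MEC).

    body: list of (kind, value) pairs, where kind is 'model', 'prep' or 'mend'.
    A 'prep' statement's value is the index of the statement to visit next.
    """
    expanded = []
    mec = 0                                   # first statement after the prototype
    while body[mec][0] != "mend":
        kind, value = body[mec]
        if kind == "model":
            expanded.append(value)            # expand the model statement
            mec += 1
        else:                                 # preprocessor statement (e.g. AIF/AGO)
            mec = value                       # new value specified in the statement
    return expanded

body = [("model", "MOVER AREG, X"), ("prep", 3), ("model", "SKIPPED"),
        ("model", "MOVEM AREG, X"), ("mend", None)]
print(expand(body))   # ['MOVER AREG, X', 'MOVEM AREG, X']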


2. Lexical substitution
A model statement consists of 3 types of strings:
1. An ordinary string, which stand for itself
2. Name of formal parameter which is preceded by the character &.
3. Name of preprocessor variable, is preceded by the character &.
During lexical expansion, strings of type 1 are retained without substitution. Strings of types 2 and 3
are replaced by the values of the formal parameters or preprocessor variables.
2.1 Positional parameters

A positional formal parameter starts with '&' sign and it is defined in operand field of macro
name.

The actual parameters of macro call on macro using positional parameters are simply ordinary
string.

The value of first actual parameter of macro call is assigned to first positional formal
parameter defined in operand field of macro name.

The value of second actual parameter of macro call is assigned to second positional" formal
parameter defined in operand field of macro name.

Similarly the value of nth actual parameter is assigned to nth positional formal parameter
defined in operand field of macro name.

Positional parameter is always used at the place of operand2.

Value of positional parameter should not be keywords.

2.2 Keyword parameters

A keyword formal parameter starts with &KW string or &OP string or &REG or &CC depending
on macro processor. It is defined in operand field of macro name.

A keyword formal parameter ends with = sign depending on macro processor. It is defined in
operand field of macro name.

Formal keyword parameter mayor may not have default value. Again this is depends on macro
processor.

The actual parameter of macro call on macro using keyword parameter is simply ordinary
string if they are used as positional parameters.

Keyword parameter is always used at the place of mnemonic instruction or at the place of
operand 1.

Value of keyword parameter is always keywords. That are ADD, SUB, AREG, BREG, LT, LE etc.

2.3 Label parameters

A label formal parameter starts with &LAB string depending on macro processor. It is defined
in operand field of macro name.

A label formal parameter ends with = sign depending on macro processor. It is defined in
operand field of macro name.

Every label formal parameter should not have any default value. Again this depends on macro
processor.

The actual parameter of macro call on macro using label parameter is simply ordinary string if
they are used as a positional parameter.

Label parameter is always used at the place of label field.

Value of label parameter should not be keyword.


2.4 Macros with mixed parameters lists

A macro may be defined to all parameters i.e. positional parameter, keyword parameter and
label parameter

Q.3

Explain types of parameter


Positional Parameter
A positional formal parameter is written as &<parameter name>
The value of a positional parameter XYZ is determined by the rule of positional association as
follows:
1.

Find the original position of XYZ in the list of formal parameter in the macro prototype
statement.

2.

Find the actual parameter specification occupying the same ordinal position in the list
of actual parameter in macro call statement.

Keyword parameter
Keyword parameters are used for following purposes: 1.
Default value can be assigned to the parameter
2.

During a macro call, a keyword parameter is specified by its name. it takes the
following form:
<parameter name>=<parameter value>
MACRO
INCR &VARIABLE=X, &INCR=Y, &REG=AREG
MEND

VARIABLE is a keyword parameter with default value as X


INCR is a keyword parameter with default value as Y
REG is a keyword parameter with default as AREG
The position of keyword parameter during macro call is not important.
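A small sketch of how positional and keyword actual parameters could be matched with defaults to build the table of parameter values; the dictionary-based representation is an assumption made only for illustration.

def bind_parameters(formals, defaults, actuals):
    """Bind actual parameters of a macro call to its formal parameters.

    formals : formal parameter names in prototype order, e.g. ["VARIABLE", "INCR", "REG"]
    defaults: default values of keyword parameters, e.g. {"REG": "AREG"}
    actuals : actual parameter strings from the call, positional first,
              then keyword specifications of the form NAME=VALUE.
    """
    apt = dict(defaults)                       # start with the default values
    pos = 0
    for arg in actuals:
        if "=" in arg:                         # keyword association: NAME=VALUE
            name, value = arg.split("=", 1)
            apt[name] = value
        else:                                  # positional association by ordinal position
            apt[formals[pos]] = arg
            pos += 1
    return apt

print(bind_parameters(["VARIABLE", "INCR", "REG"], {"REG": "AREG"}, ["A", "B", "REG=BREG"]))
# {'REG': 'BREG', 'VARIABLE': 'A', 'INCR': 'B'}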

Q.4

Compare the features of subroutine and macros with respect to following: (i) Execution
Speed (ii) Processing requirement by assembler (iii) Flexibility and generality

Macros use string replacement for their invocation whereas subroutines use calls.

Because of this replacement, a macro can exist as multiple copies in the program whereas a subroutine exists in only one copy.

Because multiple copies are possible, you cannot obtain a macro's address, whereas you can obtain a subroutine's address.

Macros can be faster since they do not incur the call and return time penalty.

Macros can be harder to debug since the replacement may be an obstacle when reading the resulting code.


(i) Execution speed

MACRO: At execution time each macro call has already been replaced by the macro definition, i.e. it is expanded in the main program, so no transfer of control takes place. This does not require any stack manipulation during execution of the program. It requires extra processing time for expansion, but only once. The speed of its object code is fast because no stack manipulation is needed.

SUBROUTINE: At execution time, control transfers to the subroutine and, after executing the subroutine, returns to the main program to execute the remaining instructions. This requires stack manipulation during execution of the program: the current address is stored on the stack, control goes to the subroutine, and after its execution the address is popped from the stack and control returns to the main program. It does not require extra processing time for expansion, but every subroutine call requires stack manipulation. The speed of its object code becomes slower because of the stack manipulation at each subroutine call.

(ii) Processing requirement by assembler

MACRO: In assembly language a macro should be defined before the main program; in a high-level language a macro can be defined anywhere, i.e. before or after the main program, depending on the high-level language. A macro statement in assembly language is written as follows:
[Label] <Macro name> [<parameter list>]
Example: FACTORIAL A, FACT
where FACTORIAL is the name of the macro and A, FACT is the list of actual parameters.

SUBROUTINE: A subroutine can be defined anywhere, in assembly level as well as in high level. A subroutine call in assembly language is written as follows:
[Label] CALL <Subroutine name>
Example: CALL FACTORIAL
where FACTORIAL is the name of the subroutine.

(iii) Flexibility and generality

MACRO: In assembly level programming, looping and nesting facilities can be used in macros; in high-level programming, looping and nested looping should not be used in macros. Its object code requires a large amount of main memory as well as secondary memory. It is suitable in a real-time operating system or environment. Here the time factor is more important than space.

SUBROUTINE: Looping and nesting can be used in subroutines in low level as well as in high level. Its object code requires less main memory as well as secondary memory. It is not suitable in a real-time operating system or environment. Here the space factor is more important than time.

Q.5

Explain nested macro calls OR


Define two macros of your choice to illustrate nested calls to these macros. Also show their
corresponding expansion.

A model statement in a macro may constitute a call on another macro. Such calls are known
as nested macro calls.

We refer to the macro containing the nested call as the outer macro and the called macro as
the inner macro.

Expansion of nested macro calls follows the last-in-first-out (LIFO) rule. Thus, in a structure of
nested macro calls, expansion of the latest macro call (i.e. the innermost macro call in the
structure) is completed first.

Example
The below defined is the definition of INCR_D macro.
MACRO
INCR_D   &MEM_VAL=, &INCR_VAL=, &REG=AREG
         MOVER   &REG, &MEM_VAL
         ADD     &REG, &INCR_VAL
         MOVEM   &REG, &MEM_VAL
MEND
Macro COMPUTE defined below contains a nested call on macro INCR_D defined above.

MACRO
COMPUTE  &FIRST, &SECOND
         MOVEM   BREG, TMP
         INCR_D  &FIRST, &SECOND, REG=BREG
         MOVER   BREG, TMP
MEND

The expanded code for the call
COMPUTE X, Y
is as follows; the nested call INCR_D X, Y, REG=BREG is itself expanded (statements [2]-[4]) while COMPUTE is being expanded:

+   MOVEM   BREG, TMP    [1]
+   MOVER   BREG, X      [2]
+   ADD     BREG, Y      [3]
+   MOVEM   BREG, X      [4]
+   MOVER   BREG, TMP    [5]

Q.6

Advanced macro facilities

1. Alteration of flow of control during expansion


Expansion time statement: OR (Explain expansion time statements AIF and AGO for macro
programming)
AIF

An AIF statement has the syntax:
    AIF (<expression>) <sequencing symbol>
where <expression> is a relational expression involving ordinary strings, formal parameters and their attributes, and expansion time variables.
If the relational expression evaluates to true, expansion time control is transferred to the statement containing <sequencing symbol> in its label field.

AGO

An AGO statement has the syntax:
    AGO <sequencing symbol>
It unconditionally transfers expansion time control to the statement containing <sequencing symbol> in its label field.

Expansion time loops OR (Explain expansion time loop)

It is often necessary to generate many similar statements during the expansion of a macro.

This can be achieved by writing similar model statements in the macro.

Expansion time loops can be written using expansion time variables (EVs) and expansion time
control transfer statements AIF and AGO.

Example

MACRO
CLEAR    &X, &N
         LCL    &M
&M       SET    0
         MOVER  AREG, =0
.MORE    MOVEM  AREG, &X+&M
&M       SET    &M + 1
         AIF    (&M NE N) .MORE
MEND

The LCL statement declares M to be a local EV.

At the start of expansion of the call, M is initialized to zero.

The expansion of the model statement MOVEM AREG, &X+&M thus leads to generation of the statement MOVEM AREG, B (for a call whose first actual parameter is B).

The value of M is incremented by 1 and the model statement MOVEM .. is expanded repeatedly until its value equals the value of N.

2. Expansion time variable or (Explain expansion time variable with example)

Expansion time variables (EV's) are variables which can only be used during the expansion of
macro calls.

A local EV is created for use only during a particular macro call.

A global EV exists across all macro calls situated in a program and can be used in any macro
which has a declaration for it.

Local and global EVs are created through declaration statements with the following syntax:
    LCL <EV specification> [, <EV specification> ..]
    GBL <EV specification> [, <EV specification> ..]

<EV specification> has the syntax &<EV name>, where <EV name> is an ordinary string.

Values of EVs can be manipulated through the preprocessor statement SET. A SET statement is written as:
    <EV specification> SET <SET-expression>
where <EV specification> appears in the label field and SET in the mnemonic field.

A SET statement assigns the value of <SET-expression> to the EV specified in <EV specification>.
Example
MACRO
CONSTANTS
         LCL    &A
&A       SET    1
         DB     &A
&A       SET    &A+1
         DB     &A
MEND

The local EV A is created.

The first SET statement assigns the value '1' to it.

The first DB statement thus declares a byte constant 1.

The second SET statement assigns the value '2' to A and the second DB statement declares a
constant '2'.

3.

Attributes of formal parameter

An attribute is written using the syntax


<attribute name> <formal parameter spec>

It represents information about the value of the formal parameter, i.e. about the corresponding actual parameter.

The type, length and size attributes have the names T, L and S.
Example

MACRO
DCL_CONST  &A
           AIF   (L'&A EQ 1) .NEXT
           --
.NEXT      --
MEND

Here expansion time control is transferred to the statement having .NEXT in its label field only if the actual parameter corresponding to the formal parameter &A has a length of 1.

Q.7

Explain lexical and semantic expansion OR


Explain tasks involved in macro expansion.
Lexical expansion:

Lexical expansion implies replacement of a character string by another character string during
program generation.

Lexical expansion is to replace occurrences of formal parameters by corresponding actual


parameters.

Semantic expansion:

Semantic expansion implies generation of instructions tailored to the requirements of a specific usage.

Semantic expansion is characterized by the fact that different uses of a macro can lead to codes which differ in the number, sequence and opcodes of instructions, e.g. generation of type specific instructions for manipulation of byte and word operands.

It can be achieved by a combination of advanced macro facilities like AIF and AGO statements and expansion time variables.

In the CLEAR macro above, the number of MOVEM AREG, .. statements generated by a call on CLEAR is determined by the value of the second parameter of CLEAR.

Macro EVAL is another instance of conditional expansion, wherein one of two alternative code sequences is generated depending on the peculiarities of the actual parameters of a macro call.

Below example illustrates semantic expansion using the type attribute.

Example
MACRO
CREATE_CONST  &X, &Y
              AIF   (T'&X EQ B) .BYTE
&Y            DW    25
              AGO   .OVER
.BYTE         ANOP
&Y            DB    25
.OVER         MEND

This macro creates a constant 25 with the name given by the 2nd parameter. The type of the constant matches the type of the first parameter.

Q.8

Describe task and data structures considered for the design of a macro preprocessor

Macro preprocessor

The macro preprocessor accepts an assembly program containing macro definitions and calls
and translates it into an assembly program which does not contain any macro definition or
calls.

Below figure shows a schematic of a macro preprocessor.

The program form output by the macro preprocessor can now be handed over to an assembler to obtain the target language form of the program.

(Figure: a program with macro definitions and calls enters the macro preprocessor, which produces a program without macros; that program is passed to the assembler to obtain the target program.)

Following are the task of macro preprocessor:


1. Identify macro calls in the program.
2. Determine the values of formal parameters.
3. Maintain the values of expansion time variables declared in a macro.
4. Organize expansion time control flow.
5. Determine the values of sequencing symbols.
6. Perform expansion of a model statement.
Data Structures

Task has identified the key data structures of the macro preprocessor. To obtain a detailed
design of the data structures it is necessary to apply the practical criteria of processing
efficiency and memory requirements.

The tables APT, PDT and EVT contain pairs which are searched using the first component of
the pair as a key-for example, the formal parameter name is used as the key to obtain its
value from APT. This search can be eliminated if the position of an entity within a table is

known when its value is to be accessed. We will see this in the context of APT.

The value of a formal parameter ABC is needed while expanding a model statement using it,
viz.

MOVER AREG, &ABC

Let the pair (ABC, ALPHA) occupy entry #5 in APT. The search in APT can be avoided if the model statement appears as
MOVER AREG, (P, 5)
in the MDT, where (P, 5) stands for 'parameter #5'.

Thus, macro expansion can be made more efficient by storing an intermediate code for a statement, rather than its source form, in the MDT.

All parameter names could be replaced by pairs of the form (P, n) in the model statement and
preprocessor statement stored in MDT.

An interesting offshoot of this decision is that the first component of the pairs stored in APT is
no longer used during macro expansion, e.g. the information (P, 5) appearing in a model
statement is sufficient to access the value of formal parameter ABC. Hence APT containing
(<formal parameter name>, <value>) pairs is replaced by another table called APTAB which
only contains <value>'s.

To implement this simplification, ordinal numbers are assigned to all parameters of a macro. A table named parameter name table (PNTAB) is used for this purpose.

Parameter names are entered in PNTAB in the same order in which they appear in the prototype statement.

The entry # of a parameter's entry in PNTAB is now its ordinal number. This entry # is used to replace the parameter name in the model and preprocessor statements of the macro while storing it in the MDT.

In effect, the information (<formal parameter name>, <value>) in APT has been split into two tables: PNTAB, which contains the formal parameter names, and APTAB, which contains the formal parameter values (i.e. the actual parameters).
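The sketch below illustrates the (P, n) idea under these simplifying assumptions: model statements are plain strings, &NAME occurrences are rewritten to an intermediate code while the macro is stored, and values are later substituted from APTAB at expansion time.

import re

def to_intermediate(model_stmt, pntab):
    """Replace each &NAME in a model statement by (P, n) using PNTAB."""
    def repl(match):
        name = match.group(1)
        return f"(P,{pntab.index(name) + 1})"          # ordinal number of the parameter
    return re.sub(r"&([A-Za-z_]\w*)", repl, model_stmt)

def expand(intermediate_stmt, aptab):
    """Substitute actual parameter values from APTAB for (P, n) markers."""
    return re.sub(r"\(P,(\d+)\)",
                  lambda m: aptab[int(m.group(1)) - 1], intermediate_stmt)

pntab = ["ABC", "XYZ"]                                  # entry #1 is ABC, entry #2 is XYZ
mdt_line = to_intermediate("MOVER AREG, &ABC", pntab)   # 'MOVER AREG, (P,1)'
print(mdt_line)
print(expand(mdt_line, ["ALPHA", "BETA"]))              # 'MOVER AREG, ALPHA'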

Other data structures are given below:

Table                                        Fields in each entry
Macro Name Table (MNT)                       Macro name, number of positional parameters (#PP), number of keyword parameters (#KP), number of expansion time variables (#EV), MDT pointer (MDTP), KPDTAB pointer (KPDTP), SSTAB pointer (SSTP)
Parameter Name Table (PNTAB)                 Parameter name
EV Name Table (EVNTAB)                       EV name
SS Name Table (SSNTAB)                       SS name
Keyword Parameter Default Table (KPDTAB)     Parameter name, default value
Macro Definition Table (MDT)                 Label, opcode, operands
Actual Parameter Table (APTAB)               Value
SS Table (SSTAB)                             MDT entry #

Q.9

Explain design specification task for macro preprocessor with suitable example


Design Overview

We begin the design by listing all tasks involved in macro expansion.
1. Identify macro calls in the program.
2. Determine the values of formal parameters.
3. Organize expansion time control flow.
4. Maintain the values of expansion time variables declared in a macro.
5. Determine the values of sequencing symbols.
6. Perform expansion of a model statement.

The following 4 step procedure is followed to arrive at a design specification for each task:
1. Identify the information necessary to perform a task.
2. Design a suitable data structure to record the information.
3. Determine the processing necessary to obtain the information.
4. Determine the processing necessary to perform the task.

Application of this procedure to each of the preprocessor tasks is described as follows.


Identify macro calls
A table called the macro name table (MNT) is designed to hold the names of macros defined in a
program. A macro name is entered in this table when a macro definition is processed. While
processing a statement in the source program, the preprocessor compares the string found in its
mnemonic field with the macro names in MNT. A match indicates that the current statement is a
macro call.
Determine value of formal parameters
A table called the actual parameter table (APT) is designed to hold the values formal parameters
during the expansion of a macro call. Each entry in the table is a pair
(<formal parameter name>,<value>)
Two items of information are needed to construct this table, names of formal parameters, and default
values of keyword parameters. For this purpose, a table called parameter default table (PDT) is used
for each macro. This table would be accessible from the MNT entry of a macro and would contain
pairs of the form (<formal parameter name>, <default value>). If a macro call statement does not
specify a value for some parameter par, its default value would be copied from PDT to APT.
Maintain expansion time variables
An expansion time variables table(EVT) contains pairs of the form
(<EV name>, <value>)
The value field of a pair is accessed when a preprocessor statement or a model statement under
expansion refers to an EV.

Organize expansion time control flow


Macro definition table (MDT) stores set of preprocessor statements and model statements. The flow of
control during macro expansion determines when a model statement is to be visited for expansion. It
is updated after expanding a model statement or on processing a macro preprocessor statement.
Determine values of sequencing symbols
A sequencing symbols table(SST) is maintained to hold this information. The table contains pairs of
the form
(<sequencing symbols name >, <MDT entry #>)
where<MDT entry #> is the number of the MDT entry which contains the model statement defining
the sequencing symbol.
Perform expansion of a model statement
This is a trivial task given the following:
1. MEC points to the MDT entry containing the model statement.
2. Values of formal parameters and EVs are available in APT and EVT, respectively.
3. The model statement defining a sequencing symbol can be identified from SST.
4. Expansion of a model statement is achieved by performing a lexical substitution for the parameters and EVs used in the model statement.

Q.10

Write a macro that moves n numbers from the first operand to the second operand, where n is specified as the third operand of the macro.

MACRO
MOVEALL  &source, &dest, &N
         LCL    &M
&M       SET    0
.NEXT    MOVER  AREG, &source + &M
         MOVEM  AREG, &dest + &M
&M       SET    &M + 1
         AIF    (&M NE &N) .NEXT
MEND
Q.11

Write a macro which takes B, C and D as parameters and calculates B*C+C*D.


MACRO
EVAL     &X, &Y, &Z
         MOVER  AREG, &X
         MUL    AREG, &Y
         MOVEM  AREG, &X
         MOVER  AREG, &Y
         MUL    AREG, &Z
         ADD    AREG, &X
MEND
Q.12

Draw a flow chart and explain simple one pass macro processor.


(Flowchart: initialize MDTC = 1 and MNTC = 1, then read source lines one by one. If the line is the MACRO pseudo-op, read the prototype, update MNT and PNTAB, and copy the following lines into MDT (incrementing MDTC) until MEND is reached. Otherwise, search the mnemonic in MNT: if it is found, expand the call by replacing formal parameters and write the expanded lines to the output file; if it is not found, write the line to the output file unchanged. On END, the output is handed over for assembly.)

In this type of preprocessor only one pass is used to construct the data structures and to use them.
It is also called a preprocessor because it runs before the translator, as shown in the figure below.


(Figure: One pass macro processor: source code with macros is processed by the one pass macro processor, which builds and uses MNT, MDT, PNTAB, APTAB, SSTAB and KPDTAB, and produces source code without macros.)


Data Structure
Macro name table (MNT):

This is used to store all information of macro definition that is macro

name, MDTP, Total number of positional parameters.


Macro definition table (MDT): This is used to store all program of macro definition.
Parameter name table (PNTAB): This is used to store all positional parameter name of macro
definition.
Keyword parameter default table (KPDTAB or KPT): This is used to store all keyword parameter
name of macro definition with its default values.
EV Name table (EVNTAB or EVT): this is used to store all expansion time variable name of macro
definition with its type (global or local).
SS Name table (SSNTAB): This is used to store all labels of macro definition.
SS Table (SSTAB): This is used to store MDT entry where sequencing symbol is defined in MDT.
EV Table (EVTAB): This is used to store current all value of the expansion time variables of macro
definition.
Actual parameter table (APTAB): This is used to store name of actual parameters defined in macro
call.
Algorithm:
Step 1: Initialize all other pointer variables to 1 or 0.
MDTP=1, MNTP=1, KPTP=1, LC=1.
Step 2: Read the LC-th line from the source code (the input program).

Step 3: Isolate label instruction and operand from line.


Step 4: If instruction="MACRO"
If yes
4.1: LC=LC+ 1.
4.2: Read the LC-th line from the source code (the input program).

4.3: Isolate label instruction and operand from line.


4.4: Enter macro name in MNT.


Find out total number of parameter, keyword parameter and expansion time variables
and store it in MNT.
Store the value of all pointers in MNT.
4.5: Update PNTAB, KPDTAB, EVNTAB, SSNTAB, SSTAB.
4.6: Increments all the pointers of updated tables.
4.7: MNTP=MNTP+1.
4.8: LC=LC+1.
4.9: Read the LC-th line from the source code (the input program).
4.10: Isolate label instruction and operand from line and store it into MDT at MDTP location.
4.11: MDTP=MDTP+ 1.
4.12 : If instruction="MEND"
If yes
Go to step 2.
If no
Go to step 4.6.
If no
Go to step 4.
Step 5: Search instruction in MNT.
Step 6: If instruction found in MNT?
If yes
6.1: Find out Actual parameter &store it in APTAB.
6.2: Find out MDTP from MNT.
6.3: Search macro definition from MDT at MDTP position.
6.4: Adjust all model statements as follows.
6.4.1: Replace Actual parameters with formal parameters using PNTAB, KPDTAB, and APTAB.
6.4.2: Replace each expansion time variable name with its value using EVNTAB, EVTAB.
6.4.3: Find out labels from SSNTAB and its address from SSTAB, sequence label with
sequence number and replace it in old place.
6.5: Write all these adjusted model statements in output source file.
6.6: LC=LC+1.
6.7: Go to step 2.
If no
6.8: If instruction ="END"
If yes
Go to Assembler.
If no
Write line in output source file LC=LC+1.
Go to step 2.



Parsing


Q.1

What is parsing? Explain types of parsing.


Parsing or syntactic analysis is the process of analyzing a string of symbols according to the rules of a formal grammar.
Parsing is a technique that takes an input string and produces as output either a parse tree, if the string is a valid sentence of the grammar, or an error message indicating that the string is not a valid sentence of the given grammar.
There are mainly two types of parsing:
1. Top down parsing: A top down parser for a given grammar G tries to derive a string through a sequence of derivations starting with the start symbol.
Top down parsing methods are:
Top down parsing (with backtracking / without backtracking)
Recursive descent parser
LL(1) parser
2. Bottom up parsing: In bottom up parsing, the source string is reduced to the start symbol of the grammar. Bottom up parsing is also called shift reduce parsing.
Bottom up parsing methods are:
Naive bottom up parsing
Operator precedence parsing
Q.2

Explain parse tree and abstract syntax tree.


A set of derivations applied to generate a string can be represented using a tree. Such a tree is known as a parse tree.
An abstract syntax tree represents the structure of a source string in a more economical manner.
EX: Write unambiguous production rules (grammar) for an arithmetic expression containing +, -, *, / and ^ (exponentiation). Construct the parse tree and abstract syntax tree for:
<id> - <id> * <id> ^ <id> + <id>    (GTU DEC_11)
Unambiguous grammar for arithmetic expression containing +, -, *, / and ^
E->E-T|T
T->T*F|F
F->F/G|G
G->G^H|H
H->H+I|I
I-><id>



Parse tree
(Figure: parse tree for <id> - <id> * <id> ^ <id> + <id>, derived from E according to the above grammar.)

Abstract Syntax Tree
(Figure: abstract syntax tree for the same expression, with the operators as interior nodes and the identifiers as leaves.)

Q.3

Explain left factoring and left recursion.


Left Factoring:
For each non-terminal A with two or more alternatives(production rules) with a common non
empty prefix, let say
A->1 |.| n| 1|m
Converted it into
A->A| 1|m

TEJAS PATEL
2

Page

SAFFRONY INSTITUTE OF TECHNOLOGY


2150708 System Programming (SP)

Parsing

A-> 1 |.| n
EX:
A->xByA | xByAzA | a
B->b
Left factored, the grammar becomes
A->xByAA | a
A->zA |
B-> b
Left Recursion:
A grammar is left-recursive if we can find some non-terminal A which will eventually derive a sentential form with itself as the leftmost symbol.
Immediate left recursion occurs in rules of the form
A -> Aα | β
where α and β are sequences of non-terminals and terminals, and β does not start with A.
For example, a rule such as E -> E + T is immediately left-recursive.
It can be replaced by the non-left-recursive productions
A -> βA'
A' -> αA' | ε
The general algorithm to remove immediate left recursion follows. Given
A -> Aα1 | ... | Aαm | β1 | ... | βn
where:
A is a left-recursive nonterminal,
each αi is a sequence of non-terminals and terminals that is not null (αi ≠ ε), and
each βj is a sequence of non-terminals and terminals that does not start with A,
replace the A-productions by
A -> β1A' | ... | βnA'
and create a new nonterminal A' with the productions
A' -> α1A' | ... | αmA' | ε
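
A minimal sketch that simply prints the productions produced by this rule for a sample grammar E -> E+T | E-T | T (the sample grammar and the plain-string representation are assumptions made for this illustration):

#include <stdio.h>

int main(void)
{
    const char *A = "E";
    const char *alpha[] = { "+T", "-T" };   /* tails of the left-recursive alternatives */
    const char *beta[]  = { "T" };          /* alternatives that do not start with A */
    int m = 2, n = 1;

    /* A  -> beta1 A' | ... | betan A' */
    printf("%s ->", A);
    for (int i = 0; i < n; i++)
        printf(" %s%s'%s", beta[i], A, (i + 1 < n) ? " |" : "\n");

    /* A' -> alpha1 A' | ... | alpham A' | epsilon */
    printf("%s' ->", A);
    for (int i = 0; i < m; i++)
        printf(" %s%s' |", alpha[i], A);
    printf(" epsilon\n");
    return 0;
}

The output is E -> TE' and E' -> +TE' | -TE' | epsilon, which is the familiar non-left-recursive form of the expression grammar.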
Q.4

Top down parsing methods


1) Naive top down parsing or brute force parsing
Naive top down parsing algorithm:
1. Current sentential form (CSF) = S.
2. Let CSF be of the form βAπ, such that β is a string of Ts and A is the leftmost NT in CSF. Exit with success if CSF equals the source string.
3. Make a derivation A -> β1Bφ according to a production A ::= β1Bφ of G such that β1 is a string of Ts (possibly null). This makes CSF = ββ1Bφπ.
4. Go to step 2.

Ex:
Consider the grammar
S -> aAb
A -> cd | c
and derive the string acb.
Starting from S, the parser derives aAb. It first tries A -> cd, giving acdb, which cannot match the source string acb, so it backtracks and tries A -> c, giving acb, which matches.
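
A minimal sketch of a brute-force recognizer with backtracking for this example grammar (the character-level input handling is an assumption made for this sketch):

#include <stdio.h>

static const char *input;

/* Try alternative 'alt' of A at position i; return the position after the
   matched part, or -1 if the alternative does not match. */
static int matchA(int i, int alt)
{
    if (alt == 0 && input[i] == 'c' && input[i + 1] == 'd') return i + 2;  /* A -> cd */
    if (alt == 1 && input[i] == 'c')                        return i + 1;  /* A -> c  */
    return -1;
}

static int parseS(void)                    /* S -> a A b */
{
    if (input[0] != 'a') return 0;
    for (int alt = 0; alt < 2; alt++) {    /* try A's alternatives, backtracking on failure */
        int j = matchA(1, alt);
        if (j >= 0 && input[j] == 'b' && input[j + 1] == '\0') return 1;
    }
    return 0;
}

int main(void)
{
    input = "acb";
    printf("%s\n", parseS() ? "valid sentence" : "error");
    return 0;
}
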

2) Top down parsing without backtracking
Elimination of backtracking in top down parsing has several advantages: parsing becomes more efficient, and it becomes possible to perform semantic actions and precise error reporting during parsing.
We use left factoring to ensure that the RHS alternatives of a non-terminal produce a unique terminal symbol in the first position.
Consider the grammar
E -> T+E | T
T -> V*T | V
V -> <id>
Performing left factoring on this grammar gives
E -> TE'
E' -> +E | ε
T -> VT'
T' -> *T | ε
V -> <id>
Now parsing of the string <id>+<id>*<id>:

Sr No.  CSF                       Symbol   Prediction
1       E                         <id>     E -> TE'
2       TE'                       <id>     T -> VT'
3       VT'E'                     <id>     V -> <id>
4       <id>T'E'                  +        T' -> ε
5       <id>E'                    +        E' -> +E
6       <id>+E                    <id>     E -> TE'
7       <id>+TE'                  <id>     T -> VT'
8       <id>+VT'E'                <id>     V -> <id>
9       <id>+<id>T'E'             *        T' -> *T
10      <id>+<id>*TE'             <id>     T -> VT'
11      <id>+<id>*VT'E'           <id>     V -> <id>
12      <id>+<id>*<id>T'E'        (end)    T' -> ε
13      <id>+<id>*<id>E'          (end)    E' -> ε
14      <id>+<id>*<id>

3) Recursive descent parser
A top down parser that executes a set of recursive procedures to process the input without backtracking is called a recursive descent parser, and this kind of parsing is called recursive descent parsing.
Ex:
S -> E
E -> VE'
E' -> +VE' | ε
V -> <id>
A recursive descent parser for the above grammar is given below, written here as a small C program (the single-character token handling, with 'i' standing for <id>, is a simplification):
#include <stdio.h>
#include <stdlib.h>

const char *next;                 /* points to the next input symbol */

void E(void);
void Eprime(void);
void V(void);

void S(void)  { E(); }            /* S -> E */

void E(void)  { V(); Eprime(); }  /* E -> V E' */

void Eprime(void)                 /* E' -> + V E' | epsilon */
{
    if (*next == '+') {
        next++;                   /* consume '+' */
        V();
        Eprime();
    }
    /* otherwise choose E' -> epsilon and simply return */
}

void V(void)                      /* V -> <id> */
{
    if (*next == 'i') {
        next++;                   /* consume the identifier token */
    } else {
        printf("error\n");
        exit(1);
    }
}

int main(void)
{
    next = "i+i+i";               /* i.e. <id> + <id> + <id> */
    S();
    printf(*next == '\0' ? "valid\n" : "error\n");
    return 0;
}


4) LL(1) parser OR Describe working of LL(1) parser and parse the given string
An LL(1) parser is a table-driven predictive parser: it scans the input left to right and produces a leftmost derivation.
The '1' in LL(1) indicates that the parser uses a look-ahead of one source symbol; that is, the prediction to be made is determined by the next source symbol.
A major advantage of LL(1) parsing is its amenability to automatic construction by a parser generator.

Consider the grammar given below:
E  ::= TE'
E' ::= +TE' | ε
T  ::= FT'
T' ::= *FT' | ε
F  ::= (E) | <id>
FIRST and FOLLOW for each NT:

NT    FIRST      FOLLOW
E     {(, id}    {$, )}
E'    {+, ε}     {$, )}
T     {(, id}    {+, $, )}
T'    {*, ε}     {+, $, )}
F     {(, id}    {+, *, $, )}

Predictive parsing table (rows: non-terminals, columns: source symbols; a blank entry is an error):

Non-terminal   <id>        +            *            (           )          -|
E              E => TE'                              E => TE'
E'                         E' => +TE'                            E' => ε    E' => ε
T              T => FT'                              T => FT'
T'                         T' => ε      T' => *FT'               T' => ε    T' => ε
F              F => <id>                             F => (E)

A parsing table entry PT(nti, tj) indicates what prediction should be made if nti is the leftmost NT in a sentential form and tj is the next source symbol.
A blank entry in PT indicates an error situation.
A source string is assumed to be enclosed between the symbols '|-' and '-|'.
Hence the parser starts with the sentential form |- E -|.
The sequence of predictions made by the parser for the source string |- <id>*<id>*<id>+<id> -| can be given as follows:

Current sentential form         Symbol   Prediction
|- E -|                         <id>     E => TE'
|- TE' -|                       <id>     T => FT'
|- FT'E' -|                     <id>     F => <id>
|- <id>T'E' -|                  *        T' => *FT'
|- <id>*FT'E' -|                <id>     F => <id>
|- <id>*<id>T'E' -|             *        T' => *FT'
|- <id>*<id>*FT'E' -|           <id>     F => <id>
|- <id>*<id>*<id>T'E' -|        +        T' => ε
|- <id>*<id>*<id>E' -|          +        E' => +TE'
|- <id>*<id>*<id>+TE' -|        <id>     T => FT'
|- <id>*<id>*<id>+FT'E' -|      <id>     F => <id>
|- <id>*<id>*<id>+<id>T'E' -|   -|       T' => ε
|- <id>*<id>*<id>+<id>E' -|     -|       E' => ε
|- <id>*<id>*<id>+<id> -|
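
A minimal sketch of the table-driven loop that produces the predictions above (the single-character symbol encoding, with 'i' for <id>, 'e' for E', 't' for T' and '$' standing for the end markers, is an assumption made for this sketch):

#include <stdio.h>
#include <string.h>
#include <ctype.h>

/* Return the RHS to push for non-terminal nt on lookahead a, "" for an
   epsilon production, or NULL for a blank (error) table entry. */
static const char *predict(char nt, char a)
{
    switch (nt) {
    case 'E': if (a == 'i' || a == '(') return "Te";            /* E  -> T E'      */
              break;
    case 'e': if (a == '+') return "+Te";                       /* E' -> + T E'    */
              if (a == ')' || a == '$') return "";              /* E' -> epsilon   */
              break;
    case 'T': if (a == 'i' || a == '(') return "Ft";            /* T  -> F T'      */
              break;
    case 't': if (a == '*') return "*Ft";                       /* T' -> * F T'    */
              if (a == '+' || a == ')' || a == '$') return "";  /* T' -> epsilon   */
              break;
    case 'F': if (a == 'i') return "i";                         /* F  -> <id>      */
              if (a == '(') return "(E)";                       /* F  -> ( E )     */
              break;
    }
    return NULL;
}

int main(void)
{
    const char *input = "i*i*i+i$";           /* <id>*<id>*<id>+<id> plus end marker */
    char stack[100];
    int top = 0, ip = 0;

    stack[top++] = '$';
    stack[top++] = 'E';                       /* start symbol */

    while (top > 0) {
        char X = stack[--top], a = input[ip];
        if (X == '$' && a == '$') { printf("valid\n"); return 0; }
        if (!isupper((unsigned char)X) && X != 'e' && X != 't') {   /* X is a terminal */
            if (X == a) { ip++; continue; }
            printf("error\n"); return 1;
        }
        const char *rhs = predict(X, a);
        if (rhs == NULL) { printf("error\n"); return 1; }
        for (int k = (int)strlen(rhs) - 1; k >= 0; k--)          /* push RHS in reverse */
            stack[top++] = rhs[k];
    }
    printf("error\n");
    return 1;
}
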

Q.5

Define the following terms:
1) Simple precedence: a grammar symbol a precedes a symbol b, where each of a, b is a T or NT of G, if in a sentential form ...ab..., a should be reduced prior to b in a bottom up parse.
2) Simple precedence grammar: a grammar G is a simple precedence grammar if for all terminal and nonterminal symbols a, b of G a unique precedence relation exists between a and b.
3) Simple phrase: β is a simple phrase of the sentential form αβρ if there exists a production A ::= β of the grammar and αβρ -> αAρ is a reduction in a sequence of reductions αβρ -> αAρ -> ... -> S.
4) Handle: a handle of a sentential form is the leftmost simple phrase in it.
5) Handle pruning: the process of discovering a handle and reducing it to the appropriate LHS NT is known as handle pruning.

Q.6

Bottom up parsing methods


1) Naive bottom up parsing algorithm
1. SSM := 1; n := 0;
2. r := n;
3. Compare the string of r symbols to the left of SSM with all RHS alternatives in G which have a length of r symbols.
4. If a match is found with a production A ::= α, then
   reduce the string of r symbols to the NT A;
   n := n - r + 1;
   go to step 2;
5. r := r - 1;
   if r > 0, then go to step 3;
6. If no more symbols exist to the right of SSM then
   if the current sentential form = S then
      exit with success;
   else report error and exit with failure;
7. SSM := SSM + 1;
   n := n + 1;
   go to step 2;
2) Operator Precedence Parsing


What is operator precedence parsing? Show the operator precedence matrix for the following operators: +, -, *, (, ). Parse the following string: |- <id> + <id> * <id> -| (GTU Dec_11, Jan_13)

Operator precedence parsing is based on bottom-up parsing techniques and uses a precedence table to determine the next action.
The table is easy to construct and is typically hand-coded.
This method is ideal for applications that require a parser for expressions and where embedding full compiler technology is not justified.

Disadvantages
1. It cannot handle the unary minus (the lexical analyzer should handle the unary minus).
2. Small class of grammars.
3. Difficult to decide which language is recognized by the grammar.
Advantages
1. Simple.
2. Powerful enough for expressions in programming languages.

Operator Precedence Matrix for the operators +, -, *, /, id, ( and ), together with the end markers |- and -|, is given as follows (rows: operator on the left/stack side; columns: incoming operator; a blank entry means no precedence relation is defined):

          +    -    *    /    id   (    )    -|
   +      .>   .>   <.   <.   <.   <.   .>   .>
   -      .>   .>   <.   <.   <.   <.   .>   .>
   *      .>   .>   .>   .>   <.   <.   .>   .>
   /      .>   .>   .>   .>   <.   <.   .>   .>
   id     .>   .>   .>   .>             .>   .>
   (      <.   <.   <.   <.   <.   <.   =.
   )      .>   .>   .>   .>             .>   .>
   |-     <.   <.   <.   <.   <.   <.        =.

Now consider the grammar E -> E+E | E*E | id and the string id+id*id.


We will follow these steps to parse the given string:
1. Scan the input string until the first .> is encountered.
2. Scan backward until a <. is encountered.
3. The handle is the string between <. and .>.

|- <. id .> + <. id .> * <. id .> -|    The handle id is obtained between <. and .>; reduce it by E -> id.
E + <. id .> * <. id .> -|              The handle id is obtained between <. and .>; reduce it by E -> id.
E + E * <. id .> -|                     The handle id is obtained between <. and .>; reduce it by E -> id.
E + E * E                               Remove all non-terminals.
+ *                                     Insert |- and -|.
|- + * -|                               Place precedence relations between the operators.
|- <. + <. * .> -|                      The * operator is surrounded by <. and .>, so * becomes the handle; reduce E*E.
|- <. + .> -|                           + becomes the handle; hence reduce E+E.
|- -|                                   Parsing done.

Operator precedence parsing (stack-based algorithm)

Consider parsing of the string
|- <id>a + <id>b * <id>c -|
according to the grammar, where <id>a represents the identifier a.
The figure below shows the steps in its parsing.
Figures (a)-(c) show the stack and the AST when the current operator is '+', '*' and '-|' respectively.
In Fig. (c), the TOS operator .> the current operator.
This leads to reduction of '*'. Figure (d) shows the situation after the reduction.
The new TOS operator, i.e. '+', .> the current operator.
This leads to reduction of '+' as shown in Fig. (e).



Figure (a)-(e): stack contents (from SB to TOS, with the operand_pointer of each entry) and the partial AST at each step of parsing |- <id>a + <id>b * <id>c -|. (Stack/AST diagrams omitted.)
Q.7

Explain Shift Reduce parser


A shift reduce parser attempts to construct the parse tree from the leaves to the root.
Thus it works on the same principle as a bottom up parser.
A shift reduce parser requires the following data structures:
1) Input buffer 2) Stack
The parser performs the following basic operations:
1) Shift
2) Reduce
3) Accept
4) Error
Ex: consider the grammar E -> E-E | E*E | id and perform shift reduce parsing for the string id-id*id.
Stack      Input buffer   Action
$          id-id*id$      shift
$id        -id*id$        reduce E->id
$E         -id*id$        shift
$E-        id*id$         shift
$E-id      *id$           reduce E->id
$E-E       *id$           shift
$E-E*      id$            shift
$E-E*id    $              reduce E->id
$E-E*E     $              reduce E->E*E
$E-E       $              reduce E->E-E
$E         $              accept
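
A minimal sketch of this shift-reduce loop for the same grammar and string (the character encoding with 'i' standing for id, and the policy of reducing greedily after each shift, are simplifying assumptions for this sketch rather than a general LR algorithm):

#include <stdio.h>

static char stack[100];
static int top = 0;              /* number of symbols on the stack */

/* Try to reduce the top of the stack by E -> id, E -> E-E or E -> E*E. */
static int reduce(void)
{
    if (top >= 1 && stack[top - 1] == 'i') {                    /* E -> id  */
        stack[top - 1] = 'E';
        printf("reduce E->id    stack: %.*s\n", top, stack);
        return 1;
    }
    if (top >= 3 && stack[top - 3] == 'E' && stack[top - 1] == 'E'
        && (stack[top - 2] == '-' || stack[top - 2] == '*')) {  /* E -> E-E | E*E */
        char op = stack[top - 2];
        top -= 2; stack[top - 1] = 'E';
        printf("reduce E->E%cE   stack: %.*s\n", op, top, stack);
        return 1;
    }
    return 0;
}

int main(void)
{
    const char *input = "i-i*i";                /* id - id * id */
    for (int k = 0; input[k]; k++) {
        stack[top++] = input[k];                /* shift */
        printf("shift %c         stack: %.*s\n", input[k], top, stack);
        if (input[k] == 'i') reduce();          /* reduce id as soon as it is shifted */
    }
    while (reduce())                            /* reduce the remaining handles */
        ;
    printf("%s\n", (top == 1 && stack[0] == 'E') ? "accept" : "error");
    return 0;
}
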

Q.8


Compare top down and bottom up parser.


Top down parser
A parser is top-down if it discovers a parse tree from the top to the bottom.
A top-down parse corresponds to a preorder traversal of the parse tree.
A leftmost derivation is applied at each derivation step.
Top-down parsers come in two forms:
1) Recursive-descent parsing
- Backtracking is needed (if a choice of a production rule does not work, we backtrack to try other alternatives).
- It is a general parsing technique, but not widely used.
- Not efficient.
2) Predictive parsing
- Predicts the production rule to be applied using lookahead tokens.
- No backtracking.
- Efficient.
- Needs a special form of grammars (LL(1) grammars).
- Recursive predictive parsing is a special form of recursive descent parsing without backtracking.
- The non-recursive (table driven) predictive parser is also known as an LL(1) parser.

Bottom up parser
Bottom-up parsers build parse trees from the leaves and work up to the root.
Bottom-up syntax analysis is known as shift-reduce parsing.
An easy-to-implement shift-reduce parser is called an operator precedence parser.
A bottom-up parser uses two techniques:
1) Shift-reduce parsing
- Shift input symbols until a handle is found; then reduce the substring to the non-terminal on the LHS of the corresponding production.
2) Operator-precedence parsing
- Based on shift-reduce parsing.
- Identifies handles based on precedence rules.
The general method of shift-reduce parsing is called LR parsing.
Shift-reduce parsing attempts to construct a parse tree for an input string beginning at the leaves (the bottom) and working up towards the root (the top).
At each reduction step a particular substring matching the right side of a production is replaced by the symbol on the left of that production; if the substring is chosen correctly at each step, a rightmost derivation is traced out in reverse.

Q.9

Regular expression and DFA for declaring a variable in c language.


Regular expressions to declare a variable in C language (d denotes a digit, l a letter; [ ] encloses an optional part):

integer                               [+|-](d)+
real number                           [+|-](d)+.(d)+
real number with optional fraction    [+|-](d)+.(d)*
identifier                            l(l|d)*

DFA for declaring a variable in c


The figure shows a DFA for recognizing identifiers, unsigned integers and unsigned real numbers with fractions. The DFA has 3 final states, Id, Int and Real, corresponding to identifier, unsigned integer and unsigned real respectively. Note that a string like '25.' is invalid because it leaves the DFA in state S2, which is not a final state.
State    Next symbol
         l        d        .
Start    Id       Int
Id       Id       Id
Int               Int      S2
S2                Real
Real              Real

Figure: DFA for integers, real numbers and identifiers
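
A minimal table-driven recognizer corresponding to this DFA (the state names in code and the sample strings are assumptions made for this sketch):

#include <stdio.h>
#include <ctype.h>

enum { START, ID, INT, S2, REAL, DEAD };

static int step(int s, char c)
{
    int letter = isalpha((unsigned char)c), digit = isdigit((unsigned char)c);
    switch (s) {
    case START: if (letter) return ID;  if (digit) return INT;  break;
    case ID:    if (letter || digit) return ID;                 break;
    case INT:   if (digit) return INT; if (c == '.') return S2; break;
    case S2:    if (digit) return REAL;                         break;
    case REAL:  if (digit) return REAL;                         break;
    }
    return DEAD;                         /* no transition defined */
}

static const char *classify(const char *s)
{
    int state = START;
    for (; *s && state != DEAD; s++)
        state = step(state, *s);
    if (state == ID)   return "identifier";
    if (state == INT)  return "integer";
    if (state == REAL) return "real number";
    return "invalid";                    /* e.g. "25." ends in the non-final state S2 */
}

int main(void)
{
    const char *tests[] = { "area1", "25", "3.14", "25." };
    for (int i = 0; i < 4; i++)
        printf("%-6s -> %s\n", tests[i], classify(tests[i]));
    return 0;
}
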

Q.10

Write algorithm for operator precedence parsing.


Data structures:
Stack: each stack entry is a record with two fields, operator and operand_pointer.
Node: a node is a record with three fields: symbol, left_pointer, and right_pointer.
Function: newnode(operator, l_operand_pointer, r_operand_pointer) creates a node with appropriate pointer fields and returns a pointer to the node.
1. TOS := SB - 1; SSM := 0;
2. Push '|-' on the stack.
3. SSM := SSM + 1;
4. If the current source symbol is an operand, then
   x := newnode(source symbol, null, null);
   TOS.operand_pointer := x;
   Go to step 3;
5. While TOS operator .> current operator
   x := newnode(TOS operator, TOSM.operand_pointer, TOS.operand_pointer);   (TOSM is the stack entry just below TOS)
   Pop an entry off the stack;
   TOS.operand_pointer := x;
6. If TOS operator <. current operator, then
   Push the current operator on the stack;
   Go to step 3;
7. If TOS operator =. current operator, then
   if TOS operator = '|-' then exit successfully;
   if TOS operator = '(' then
      temp := TOS.operand_pointer;
      Pop an entry off the stack;
      TOS.operand_pointer := temp;
   Go to step 3;
8. If no precedence relation is defined between the TOS operator and the current operator, then report error and exit unsuccessfully.
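
A minimal sketch of this algorithm restricted to the operators + and * with single-letter operands ('$' standing for both end markers, and the parallel-array stack layout, are assumptions made for this sketch):

#include <stdio.h>
#include <stdlib.h>

typedef struct Node { char symbol; struct Node *left, *right; } Node;

static Node *newnode(char symbol, Node *l, Node *r)
{
    Node *n = malloc(sizeof *n);
    n->symbol = symbol; n->left = l; n->right = r;
    return n;
}

static int prec(char op)            /* larger value = binds tighter */
{
    if (op == '*') return 2;
    if (op == '+') return 1;
    return 0;                       /* '$' (the end markers) */
}

static void show(Node *n)           /* print the AST in parenthesised form */
{
    if (!n) return;
    if (n->left) putchar('(');
    show(n->left); putchar(n->symbol); show(n->right);
    if (n->left) putchar(')');
}

int main(void)
{
    const char *src = "a+b*c$";     /* i.e. |- a + b * c -| */
    char ops[50];  Node *opnd[50];  /* one stack: operator and operand_pointer per entry */
    int top = 0;

    ops[top] = '$'; opnd[top] = NULL;               /* push |- */
    for (int i = 0; src[i]; i++) {
        char cur = src[i];
        if (cur >= 'a' && cur <= 'z') {             /* operand: attach to the TOS entry */
            opnd[top] = newnode(cur, NULL, NULL);
            continue;
        }
        while (prec(ops[top]) >= prec(cur) && ops[top] != '$') {
            /* TOS operator .> (or .= for equal operators) current operator: reduce */
            Node *x = newnode(ops[top], opnd[top - 1], opnd[top]);
            top--;                                   /* pop */
            opnd[top] = x;
        }
        if (cur == '$') break;                      /* |- meets -| : parsing done */
        top++; ops[top] = cur; opnd[top] = NULL;    /* TOS <. current: push */
    }
    show(opnd[0]);                                  /* prints (a+(b*c)) */
    putchar('\n');
    return 0;
}
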

Q.11

Write a complete grammar for an arithmetic expression containing the operators +, * and $ using recursive specification and Backus-Naur Form (BNF), where $ is the exponentiation operator.
<exp>     ::= <exp> + <term> | <term>
<term>    ::= <term> * <factor> | <factor>
<factor>  ::= <factor> $ <primary> | <primary>
<primary> ::= <id> | <const> | ( <exp> )
<id>      ::= <letter> | <id> [ <letter> | <digit> ]
<const>   ::= [+|-] <digit> | <const> <digit>
<letter>  ::= a | b | c | ... | z
<digit>   ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
