
Intermediate Code Generation

Rupesh Nasre.

CS3300 Compiler Design


IIT Madras
Aug 2015
[Figure: phases of a compiler. Character stream → Lexical Analyzer → token stream → Syntax Analyzer → syntax tree → Semantic Analyzer → syntax tree → Intermediate Code Generator → intermediate representation → Machine-Independent Code Optimizer → intermediate representation → Code Generator → target machine code → Machine-Dependent Code Optimizer → target machine code. The phases up to the Intermediate Code Generator form the frontend, the remaining phases form the backend, and the Symbol Table is shared by the phases.]
Role of IR Generator
● To act as a glue between front-end and
backend (or source and machine codes).
● To lower abstraction from source level.
– To make life simple.
● To maintain some high-level information.
– To keep life interesting.
● Complete some syntactic checks, perform more
semantic checks.
– e.g. break should be inside loop or switch only.
Representations
● Syntax Trees
– Maintains structure of the construct
– Suitable for high-level representations
● Three-Address Code (3AC)
– Maximum three addresses in an instruction
– Suitable for both high and low-level representations
● Two-Address Code (2AC)
● …
– e.g. C

Example: 3 * 5 + 4
    Syntax tree:            + ( * (3, 5), 4 )
    3AC:                    t1 = 3 * 5
                            t2 = t1 + 4
    2AC:                    mult 3, 5
                            add 4
    1AC or stack machine:   push 3
                            push 5
                            mult
                            add 4
Syntax Trees and DAGs
a + a * (b – c) + (b – c) * d
[Figure: the syntax tree of the expression, next to the DAG in which the repeated subexpressions a and (b – c) are shared.]
A small problem: subgraph isomorphism is NP-complete.

Production     Semantic Rules
E → E + T      $$.node = new Op($1.node, '+', $3.node)
E → E - T      $$.node = new Op($1.node, '-', $3.node)
E → T          $$.node = $1.node
T → ( E )      $$.node = $2.node
T → id         $$.node = new Leaf($1)
T → num        $$.node = new Leaf($1)
Value Numbering
a + a * (b – c) + (b – c) * d
[Figure: the syntax tree and the DAG of the expression; the DAG nodes are labelled with value numbers 1 to 9.]
A small problem: subgraph isomorphism is NP-complete.
● Uniquely identifies a node in the DAG.
● A node with value number V contains children of numbers < V.
● Thus, an ordering of the DAG is possible.
● This corresponds to an evaluation order of the underlying expression.
● For inserting l op r, search for node op with children l and r.
● Classwork: Find value numbering for a + b + a + b.
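
The search-before-insert step can be sketched with a hash table keyed on (op, left, right). This is a minimal illustration, not the slide's implementation: the class name DAG and its methods are hypothetical, and the numbers it assigns depend on construction order, so they need not match the figure.

```python
class DAG:
    """Expression DAG built with value numbering (hash-consing)."""

    def __init__(self):
        self.nodes = []   # nodes[v - 1] is the signature of value number v
        self.index = {}   # signature -> value number

    def _number(self, signature):
        # Reuse the existing node if this signature was seen before.
        if signature not in self.index:
            self.nodes.append(signature)
            self.index[signature] = len(self.nodes)
        return self.index[signature]

    def leaf(self, name):
        return self._number(("leaf", name))

    def op(self, operator, left, right):
        # left and right are the value numbers of the children.
        return self._number((operator, left, right))


# a + a * (b - c) + (b - c) * d
dag = DAG()
a, b, c, d = (dag.leaf(n) for n in "abcd")
b_minus_c = dag.op("-", b, c)
left = dag.op("+", a, dag.op("*", a, b_minus_c))
root = dag.op("+", left, dag.op("*", dag.op("-", b, c), d))
print(root, len(dag.nodes))
```

The second request for b - c finds the existing node, so the DAG ends up with 9 nodes, and every operator node is numbered after its children.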
Three-Address Code
● An address can be a name, constant or
temporary.
● Assignments x = y op z; x = op y.
● Copy x = y.
● Unconditional jump goto L.
● Conditional jumps if x relop y goto L.
● Parameters param x.
● Function call y = call p.
● Indexed copy x = y[i]; x[i] = y.
● Pointer assignments x = &y; x = *y; *x = y.
3AC Representations
● Triples
– Instructions cannot be reordered.
● Quadruples
– Instructions can be reordered.

Assignment statement: a = b * - c + b * - c;

3AC               Quadruples                       Triples
                  op     arg1  arg2  result            op     arg1  arg2
t1 = minus c      minus  c           t1            0   minus  c
t2 = b * t1       *      b     t1    t2            1   *      b     (0)
t3 = minus c      minus  c           t3            2   minus  c
t4 = b * t3       *      b     t3    t4            3   *      b     (2)
t5 = t2 + t4      +      t2    t4    t5            4   +      (1)   (3)
a = t5            =      t5          a             5   =      a     (4)
3AC Representations
● Triples
– Instructions cannot be reordered.
● Quadruples

Assignment statement: a = b * - c + b * - c;

Instruction list           Triples
original   reordered           op     arg1  arg2
(0)        (2)             0   minus  c
(1)        (3)             1   *      b     (0)
(2)        (0)             2   minus  c
(3)        (1)             3   *      b     (2)
(4)        (4)             4   +      (1)   (3)
(5)        (5)             5   =      a     (4)

Indirect triples can be reordered
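
To make the difference concrete, here is a small sketch of the three representations for a = b * - c + b * - c. The tuple layouts and the helper names to_triples and run are assumptions made for illustration. An indirect-triple program is modelled as a list of indices into the triple table, so reordering only permutes that list.

```python
# Quadruples: (op, arg1, arg2, result). Results are named explicitly.
quads = [
    ("minus", "c", None, "t1"),
    ("*", "b", "t1", "t2"),
    ("minus", "c", None, "t3"),
    ("*", "b", "t3", "t4"),
    ("+", "t2", "t4", "t5"),
    ("=", "t5", None, "a"),
]


def to_triples(quads):
    """Replace each temporary by the position of the triple computing it."""
    position = {}   # temporary name -> index of its defining triple
    triples = []
    for op, arg1, arg2, result in quads:
        ref = lambda x: ("ref", position[x]) if x in position else x
        if op == "=":
            triples.append((op, result, ref(arg1)))
        else:
            triples.append((op, ref(arg1), ref(arg2)))
            position[result] = len(triples) - 1
    return triples


def run(triples, order, env):
    """Execute the triples in the given order (an indirect-triple list)."""
    value = {}          # triple index -> computed value
    env = dict(env)

    def val(x):
        return value[x[1]] if isinstance(x, tuple) else env[x]

    for i in order:
        op, arg1, arg2 = triples[i]
        if op == "minus":
            value[i] = -val(arg1)
        elif op == "*":
            value[i] = val(arg1) * val(arg2)
        elif op == "+":
            value[i] = val(arg1) + val(arg2)
        elif op == "=":
            env[arg1] = val(arg2)
    return env


triples = to_triples(quads)
original = [0, 1, 2, 3, 4, 5]
reordered = [2, 3, 0, 1, 4, 5]   # the triple table itself is untouched
print(run(triples, original, {"b": 3, "c": 2})["a"])
print(run(triples, reordered, {"b": 3, "c": 2})["a"])
```

Both orders compute the same value for a, because the references (0) to (4) name triples by position in the table rather than by position in the instruction list.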


SSA
● Classwork: Allocate registers to variables.
    p = a + b
    q = p – c
    p = q * d
    p = e – p
    q = p + q
● Some observations
– Definition of a variable kills its previous definition.
– A variable's use refers to its most recent definition.
– A variable holds a register for a long time, if it is live longer.

Two register assignments:
    variable   first   second
    a          r1      r1
    b          r2      r2
    p          r3      r1
    c          r4      r2
    q          r5      r1
    d          r6      r2
    e          r7      r3

With every definition renamed:
    p1 = a + b
    q1 = p1 – c
    p2 = q1 * d
    p3 = e – p2
    q2 = p3 + q1
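
The renaming of the straight-line example can be sketched as follows. The function name to_ssa and the tuple encoding are assumptions for illustration. The sketch covers a single basic block only, so it never needs a Φ function.

```python
def to_ssa(code):
    """Rename straight-line code so every definition gets a fresh version.

    code is a list of (dest, left, op, right) tuples. Names that are never
    assigned in the block (inputs such as a, b, c) stay unversioned.
    """
    version = {}   # variable -> latest version number
    out = []

    def use(name):
        # A use refers to the most recent definition, if there is one.
        return f"{name}{version[name]}" if name in version else name

    for dest, left, op, right in code:
        l, r = use(left), use(right)             # read operands first
        version[dest] = version.get(dest, 0) + 1  # then create a new version
        out.append(f"{dest}{version[dest]} = {l} {op} {r}")
    return out


block = [
    ("p", "a", "+", "b"),
    ("q", "p", "-", "c"),
    ("p", "q", "*", "d"),
    ("p", "e", "-", "p"),
    ("q", "p", "+", "q"),
]
for line in to_ssa(block):
    print(line)
```

Operands are renamed before the destination gets its new version, which is what makes p = e – p read the old p and define a new one.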
SSA
● Static Single Assignment
– Each definition refers to a different variable (instance)

    if (flag)              if (flag)
        x = -1;                x1 = -1;
    else                   else
        x = 1;                 x2 = 1;
                           x3 = Φ(x1, x2)
    y = x * a;             y = x3 * a;

[Figure: control-flow graph. The test on flag branches to x = -1 and to x = 1, and both blocks flow into y = x * a.]
Language Constructs
● Declarations
– Types (int, int [], struct, int *)
– Storage qualifiers (array expressions, const, static)
● Assignments
● Conditionals, switch
● Loops
● Function calls, definitions
SDT Applications
● Finding type expressions
– int a[2][3] is array of 2 arrays of 3 integers.
– in functional style: array(2, array(3, int))

Production          Semantic Rules
T → B id C          T.t = C.t
                    C.i = B.t
B → int             B.t = int
B → float           B.t = float
C → [ num ] C1      C.t = array(num, C1.t)
                    C1.i = C.i
C → ε               C.t = C.i

[Figure: type tree for array(2, array(3, int)).]

Classwork: Write productions and semantic rules for computing types and finding their widths in bytes.
SDT Applications
● Finding type expressions
– int a[2][3] is array of 2 arrays of 3 integers.
– in functional style: array(2, array(3, int))

Production          Semantic Rules
T → B id C          T.t = C.t; C.iw = B.sw;
                    C.i = B.t; T.sw = C.sw;
B → int             B.t = int; B.sw = 4;
B → float           B.t = float; B.sw = 8;
C → [ num ] C1      C.t = array(num, C1.t);
                    C1.i = C.i; C1.iw = C.iw;
                    C.sw = C1.sw * num.value;
C → ε               C.t = C.i; C.sw = C.iw;

[Figure: type tree for array(2, array(3, int)).]

Classwork: Write productions and semantic rules for computing types and finding their widths in bytes.
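
The effect of these rules can be sketched as a recursive function over the list of dimensions. The function name declare and the tuple encoding of array(n, t) are assumptions for illustration; the widths 4 and 8 are the ones used in the rules above.

```python
WIDTH = {"int": 4, "float": 8}   # basic-type widths from the rules


def declare(base, dims):
    """Return (type expression, width in bytes) for `base id[d1][d2]...`."""
    if not dims:                                        # C -> ε
        return base, WIDTH[base]
    elem_type, elem_width = declare(base, dims[1:])     # C -> [ num ] C1
    return ("array", dims[0], elem_type), dims[0] * elem_width


t, w = declare("int", [2, 3])
print(t, w)
```

The base type travels down the recursion like the inherited attributes, and the type expression and width come back up like the synthesized ones.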
Type Equivalence
● Since type expressions are possible, we need
to talk about their equivalence.
● Let's first structurally represent them.
● Question: Do type expressions form a DAG?
Can they contain cycles?
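[Figure: type graphs for the three declarations below. The array type is the tree array(2, array(3, int)); the union has fields x: int and y: float; in struct node, the field next points back to the struct itself, which forms a cycle.]

    int a[2][3]

    union {
        int x;
        float y;
    };

    struct node {
        int data;
        struct node *next;
    };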
Type Equivalence
● Two types are structurally equivalent
iff one of the following conditions is true.
1. They are the same basic type.
2. They are formed by applying the same
constructor to structurally equivalent types.
(Conditions 1 and 2 alone give name equivalence.)
3. One is a type name that denotes the other (typedef).

● int a[2][3] is not equivalent to int b[3][2];
● int a is not equivalent to char b[4];
● struct {int, char} is not equivalent to struct {char, int};
● int * is not equivalent to void *.
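
The three conditions translate almost directly into a recursive check. This is a sketch under stated assumptions: types are encoded as strings (basic types or type names) and tuples (constructor followed by components), the names table stands in for typedef, and the types are assumed to be acyclic, so a recursive struct such as struct node would need extra bookkeeping.

```python
def equivalent(s, t, names=None):
    """Structural equivalence of two type expressions."""
    names = names or {}
    # Condition 3: a type name denotes the type it was declared as.
    if isinstance(s, str) and s in names:
        return equivalent(names[s], t, names)
    if isinstance(t, str) and t in names:
        return equivalent(s, names[t], names)
    # Condition 1: basic types (and constants such as array sizes) must match.
    if not isinstance(s, tuple) or not isinstance(t, tuple):
        return s == t
    # Condition 2: same constructor applied to equivalent components.
    return (len(s) == len(t) and s[0] == t[0]
            and all(equivalent(x, y, names) for x, y in zip(s[1:], t[1:])))


a = ("array", 2, ("array", 3, "int"))
b = ("array", 3, ("array", 2, "int"))
print(equivalent(a, b))
print(equivalent("myint", "int", {"myint": "int"}))
```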
Type Checking
● Type expressions are checked for
– Correct code
– Security aspects
– Efficient code generation
– …
● Compiler determines that type expressions
conform to a collection of logical rules, called
the type system of the source language.
● Type synthesis: if f has type s → t and x has type
s, then expression f(x) has type t.
● Type inference: if f(x) is an expression, then for
some α and β, f has type α → β and x has type α.
Type System
● Potentially, everything can be checked
dynamically...
– if type information is carried to execution time.
● A sound type system eliminates the need for
dynamic type checking.
● A language implementation is strongly typed if a
compiler guarantees that the valid source
programs (it accepts) will run without type errors.

Type Conversions
● int a = 10; float b = 2 * a;
● Widening conversions are safe.
– int → long → float → double.
– Automatically done by compiler, called coercion.
● Narrowing conversions may not be safe.
– int → char.
– Usually, enforced by the programmers, called casts.
– Sometimes, deferred until runtime, dyn_cast<...>.
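
A coercion step in the translator might look like the sketch below. It uses the widening order int → long → float → double from the slide; the helper names widen and new_temp and the cast syntax in the emitted code are assumptions for illustration.

```python
import itertools

RANK = {"int": 0, "long": 1, "float": 2, "double": 3}   # widening order
_temps = itertools.count(1)


def new_temp():
    return f"t{next(_temps)}"


def widen(addr, from_type, to_type, code):
    """Emit a conversion if needed; return the address holding the result."""
    if from_type == to_type:
        return addr
    if RANK[from_type] > RANK[to_type]:
        # Narrowing is not done implicitly in this sketch.
        raise TypeError(f"narrowing {from_type} -> {to_type} needs a cast")
    temp = new_temp()
    code.append(f"{temp} = ({to_type}) {addr}")
    return temp


# int a = 10; float b = 2 * a;
code = []
t = new_temp()
code.append(f"{t} = 2 * a")                 # the product has type int
code.append(f"b = {widen(t, 'int', 'float', code)}")
print("\n".join(code))
```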

Declarations
● When declarations are together, a single offset
on the stack pointer suffices.
– int x, y, z; fun1(); fun2();
● Otherwise, the translator needs to keep track of
the current offset.
– int x; fun1(); int y, z; fun2();
● A similar concept is applicable for fields in
structs.
● Blocks and Nestings
– Need to push the current environment and pop.
Expressions
● We have studied expressions at length.
● To generate 3AC, we will use our grammar and
its associated SDT to generate IR.
● For instance, a = b + - c would be converted to
– t1 = minus c
– t2 = b + t1
– a = t2
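
The translation above can be sketched as a post-order walk over an expression tree. The tuple encoding of the tree and the function names gen_expr and gen_assign are assumptions for illustration.

```python
import itertools


def gen_expr(node, code, temps):
    """Append 3AC for node to code; return the address holding its value."""
    if isinstance(node, str):          # a name or constant is its own address
        return node
    if len(node) == 2:                 # unary: (op, operand)
        op, operand = node
        arg = gen_expr(operand, code, temps)
        t = f"t{next(temps)}"
        code.append(f"{t} = {op} {arg}")
        return t
    op, left, right = node             # binary: (op, left, right)
    l = gen_expr(left, code, temps)
    r = gen_expr(right, code, temps)
    t = f"t{next(temps)}"
    code.append(f"{t} = {l} {op} {r}")
    return t


def gen_assign(target, expr):
    code = []
    addr = gen_expr(expr, code, itertools.count(1))
    code.append(f"{target} = {addr}")
    return code


# a = b + - c
for line in gen_assign("a", ("+", "b", ("minus", "c"))):
    print(line)
```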

Array Expressions
● For instance, create IR for c + a[i][j].
● This requires us to know the types of a and c.
● Say, c is an integer (4 bytes) and a is int [2][3].
● Then, the IR is

    t1 = i * 12    ; 3 * 4 bytes
    t2 = j * 4     ; 1 * 4 bytes
    t3 = t1 + t2   ; offset from a
    t4 = a[t3]     ; assuming base[offset] is present in IR.
    t5 = c + t4
Array Expressions
● a[5] is a + 5 * sizeof(type)
● a[i][j] for a[3][5] is a + i * 5 * sizeof(type) + j *
sizeof(type)
● This works when arrays are zero-indexed.
● Classwork: Find array expression to be generated
for accessing a[i][j][k] when indices start with low,
and array is declared as type a[10][20][30].
● Classwork: What all computations can be
performed at compile-time?
● Classwork: What happens for malloced arrays?
Array Expressions
We view an array to be a D-dimensional matrix. However, for
the hardware, it is simply single dimensional.

    void fun(int a[][]) {
        a[0][0] = 20;
    }
    void main() {
        int a[5][10];
        fun(a);
        printf("%d\n", a[0][0]);
    }

ERROR: type of formal parameter 1 is incomplete

● How to optimize computation of the offset for a
long expression a[i][j][k][l] with declaration as
int a[w4][w3][w2][w1]?
– i * w3 * w2 * w1 + j * w2 * w1 + k * w1 + l
– Use Horner's rule: ((i * w3 + j) * w2 + k) * w1 + l
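
The two formulas can be compared with a short sketch. Both functions return the offset in elements (multiply by the element width to get bytes); the function names are hypothetical, and dims is the list [w4, w3, w2, w1].

```python
def offset_naive(idx, dims):
    """Sum of index * (product of all later dimensions)."""
    total = 0
    for k, i in enumerate(idx):
        stride = 1
        for w in dims[k + 1:]:
            stride *= w
        total += i * stride
    return total


def offset_horner(idx, dims):
    """((i * w3 + j) * w2 + k) * w1 + l: one multiply and one add per index."""
    total = 0
    for i, w in zip(idx, dims):
        total = total * w + i    # the first multiply is by 0, so w4 drops out
    return total


dims = [2, 3, 4, 5]      # int a[2][3][4][5]
idx = [1, 2, 3, 4]       # a[1][2][3][4]
print(offset_naive(idx, dims), offset_horner(idx, dims))
```

For these values both functions return 119. Horner's form needs far fewer multiplications as the number of dimensions grows.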
Array Expressions
● In C, C++, Java, and so far, we have used row-major storage.
– All elements of a row are stored together.
● In Fortran, we use column-major storage format.
– each column is stored together.

[Figure: memory layout of a matrix with row indices 0 to 3 and column indices 0 to 2, in row-major and in column-major order.]
IR for Array Expressions
● L → id [E] | L [E]

Production       Semantic Rules
L → id [ E ]     { L.type = id.type;
                   L.addr = new Temp();
                   gen(L.addr '=' E.addr '*' L.type.width); }
L → L1 [ E ]     { L.type = L1.type;
                   t = new Temp();
                   L.addr = new Temp();
                   gen(t '=' E.addr '*' L.type.width);
                   gen(L.addr '=' L1.addr '+' t); }
E → id           { E.addr = id.addr; }
E → L            { E.addr = new Temp();
                   gen(E.addr '=' L.base '[' L.addr ']'); }
E → E1 + E2      { E.addr = new Temp();
                   gen(E.addr '=' E1.addr '+' E2.addr); }
S → id = E       { gen(id.name '=' E.addr); }
S → L = E        { gen(L.base '[' L.addr ']' '=' E.addr); }
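
The sketch below generates the IR for c + a[i][j] with a declared as int [2][3]. It follows the common textbook variant in which the type shrinks to the element type after every subscript, which is an assumption and may differ in detail from the rules on the slide. The names width and gen_array_ref are hypothetical, and only int elements of 4 bytes are handled.

```python
import itertools


def width(t):
    """Width in bytes of 'int' or of ('array', n, element_type)."""
    return 4 if t == "int" else t[1] * width(t[2])


def gen_array_ref(base, base_type, indices, code, temps):
    """Emit 3AC for base[i1][i2]...; return the temp holding the element."""
    elem, addr = base_type, None
    for index in indices:
        elem = elem[2]                        # type after one more subscript
        t = f"t{next(temps)}"
        code.append(f"{t} = {index} * {width(elem)}")
        if addr is None:                      # L -> id [ E ]
            addr = t
        else:                                 # L -> L1 [ E ]
            s = f"t{next(temps)}"
            code.append(f"{s} = {addr} + {t}")
            addr = s
    result = f"t{next(temps)}"
    code.append(f"{result} = {base}[{addr}]")  # E -> L
    return result


code, temps = [], itertools.count(1)
a_type = ("array", 2, ("array", 3, "int"))
elem = gen_array_ref("a", a_type, ["i", "j"], code, temps)
code.append(f"t{next(temps)} = c + {elem}")    # E -> E1 + E2
print("\n".join(code))
```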
Note that the grammar is left-recursive.


Control Flow
● Conditionals
– if, if-else, switch
● Loops
– for, while, do-while, repeat-until
● We need to worry about
– Boolean expressions
– Jumps (and labels)

Control-Flow – Boolean Expressions
● B → B || B | B && B | !B | (B) | E relop E | true | false
● relop → < | <= | > | >= | == | !=
● What is the associativity of ||?
● What is its precedence over &&?
● How do we optimize evaluation of (B1 || B2) and (B3 &&
B4)?
– Short-circuiting: if (x < 10 && y < 20) ...
– Classwork: Write a C program to find out if C uses
short-circuiting or not.
● if (p && p->next) ...
Control-Flow – Boolean Expressions
● Source code:
– if (x < 100 || x > 200 && x != y) x = 0;
● IR:

    without short-circuit          with short-circuit
    b1 = x < 100                   b1 = x < 100
    b2 = x > 200                   iftrue b1 goto L2
    b3 = x != y                    b2 = x > 200
    iftrue b1 goto L2              iffalse b2 goto L3
    iffalse b2 goto L3             b3 = x != y
    iffalse b3 goto L3             iffalse b3 goto L3
    L2:                            L2:
    x = 0;                         x = 0;
    L3:                            L3:
    ...                            ...
3AC for Boolean Expressions

Production        Semantic Rules
B → B1 || B2      B1.true = B.true;
                  B1.false = newLabel();
                  B2.true = B.true;
                  B2.false = B.false;
                  B.code = B1.code +
                           label(B1.false) +
                           B2.code;

B → B1 && B2      B1.true = newLabel();
                  B1.false = B.false;
                  B2.true = B.true;
                  B2.false = B.false;
                  B.code = B1.code +
                           label(B1.true) +
                           B2.code;
3AC for Boolean Expressions

Production        Semantic Rules
B → !B1           B1.true = B.false;
                  B1.false = B.true;
                  B.code = B1.code;

B → E1 relop E2   B.code = E1.code + E2.code +
                           gen('if' E1.addr relop E2.addr 'goto' B.true) +
                           gen('goto' B.false);

B → true          B.code = gen('goto' B.true);

B → false         B.code = gen('goto' B.false);
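
The rules for Boolean expressions can be sketched as a recursive generator that receives the true and false labels as arguments, playing the role of the inherited attributes. The tuple encoding of expressions and the names gen_bool and new_label are assumptions for illustration, and operands are taken to be plain names or constants.

```python
import itertools


def gen_bool(b, true, false, code, new_label):
    """Emit jumping code for b: control reaches `true` or `false`."""
    kind = b[0]
    if kind == "||":
        mid = new_label()                          # B1.false
        gen_bool(b[1], true, mid, code, new_label)
        code.append(f"{mid}:")
        gen_bool(b[2], true, false, code, new_label)
    elif kind == "&&":
        mid = new_label()                          # B1.true
        gen_bool(b[1], mid, false, code, new_label)
        code.append(f"{mid}:")
        gen_bool(b[2], true, false, code, new_label)
    elif kind == "!":
        gen_bool(b[1], false, true, code, new_label)   # swap the targets
    elif kind == "true":
        code.append(f"goto {true}")
    elif kind == "false":
        code.append(f"goto {false}")
    else:                                          # (relop, E1, E2)
        relop, e1, e2 = b
        code.append(f"if {e1} {relop} {e2} goto {true}")
        code.append(f"goto {false}")


# x < 100 || x > 200 && x != y, with B.true = L2 and B.false = L3
labels = itertools.count(4)
new_label = lambda: f"L{next(labels)}"
cond = ("||", ("<", "x", "100"), ("&&", (">", "x", "200"), ("!=", "x", "y")))
code = []
gen_bool(cond, "L2", "L3", code, new_label)
print("\n".join(code))
```

The output contains jumps to the label on the very next line (goto L4 followed by L4:). Jumps of this kind are redundant and can be removed by a later cleanup.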

SDD for while

Production               Semantic Rules
S → while ( C ) S1       L1 = newLabel();
                         L2 = newLabel();
                         S1.next = L1;
                         C.false = S.next;
                         C.true = L2;
                         S.code = "label" + L1 +
                                  C.code +
                                  "label" + L2 +
                                  S1.code +
                                  gen('goto' L1);
3AC for if / if-else

Production                  Semantic Rules
S → if (B) S1               B.true = newLabel();
                            B.false = S1.next = S.next;
                            S.code = B.code +
                                     label(B.true) +
                                     S1.code;

S → if (B) S1 else S2       B.true = newLabel();
                            B.false = newLabel();
                            S1.next = S2.next = S.next;
                            S.code = B.code +
                                     label(B.true) + S1.code +
                                     gen('goto' S.next) +
                                     label(B.false) + S2.code;
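
The rules for if, if-else and while can be sketched in the same style. To keep the example short, conditions are restricted to a single comparison and simple statements are emitted verbatim. The names gen_cond and gen_stmt and the tuple encoding are assumptions for illustration.

```python
import itertools


def gen_cond(cond, true, false, code):
    e1, relop, e2 = cond
    code.append(f"if {e1} {relop} {e2} goto {true}")
    code.append(f"goto {false}")


def gen_stmt(s, nxt, code, new_label):
    """Emit 3AC for statement s; nxt is S.next, placed by the caller."""
    kind = s[0]
    if kind == "if":
        _, cond, s1 = s
        true = new_label()
        gen_cond(cond, true, nxt, code)           # B.false = S.next
        code.append(f"{true}:")
        gen_stmt(s1, nxt, code, new_label)        # S1.next = S.next
    elif kind == "ifelse":
        _, cond, s1, s2 = s
        true, false = new_label(), new_label()
        gen_cond(cond, true, false, code)
        code.append(f"{true}:")
        gen_stmt(s1, nxt, code, new_label)
        code.append(f"goto {nxt}")
        code.append(f"{false}:")
        gen_stmt(s2, nxt, code, new_label)
    elif kind == "while":
        _, cond, s1 = s
        begin, body = new_label(), new_label()    # L1 and L2 in the SDD
        code.append(f"{begin}:")
        gen_cond(cond, body, nxt, code)           # C.false = S.next
        code.append(f"{body}:")
        gen_stmt(s1, begin, code, new_label)      # S1.next = L1
        code.append(f"goto {begin}")
    else:                                         # ("stmt", text)
        code.append(s[1])


# while (i < 10) if (x > 0) x = x - 1; else i = i + 1;
labels = itertools.count(1)
new_label = lambda: f"L{next(labels)}"
prog = ("while", ("i", "<", "10"),
        ("ifelse", ("x", ">", "0"),
         ("stmt", "x = x - 1"), ("stmt", "i = i + 1")))
code = []
gen_stmt(prog, "Lnext", code, new_label)
print("\n".join(code))
```

Because the body's S.next is the loop header L1, the goto after the then-branch jumps straight back to the test.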

Control-Flow – Boolean Expressions
● Source code: if (x < 100 || x > 200 && x != y) x = 0;
    without optimization           with short-circuit
    b1 = x < 100                   b1 = x < 100
    b2 = x > 200                   iftrue b1 goto L2
    b3 = x != y                    b2 = x > 200
    iftrue b1 goto L2              iffalse b2 goto L3
    goto L0                        b3 = x != y
    L0:                            iffalse b3 goto L3
    iftrue b2 goto L1              L2:
    goto L3                        x = 0;
    L1:                            L3:
    iftrue b3 goto L2              ...
    goto L3
    L2:                            Avoids redundant gotos.
    x = 0;
    L3:
    ...
Homework
● Write SDD to generate 3AC for for.
– for (S1; B; S2) S3
● Write SDD to generate 3AC for repeat-until.
– repeat S until B

Backpatching
● if (B) S required us to pass label while
evaluating B.
– This can be done by using inherited attributes.
● Alternatively, we could leave the label
unspecified now...
– … and fill it in later.
● Backpatching is a general concept for one-pass
code generation
– Recall stack offset computation in A3.
B → true          B.code = gen('goto –');

B → B1 || B2      backpatch(B1.false);
                  ...
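
A one-pass version with backpatching can be sketched as follows. Each jump is emitted with an empty target, and the generator returns the lists of jumps that still need the true and the false target. The instruction encoding and the names gen_bool and backpatch are assumptions for illustration.

```python
def backpatch(code, jumps, target):
    """Fill in the target of every jump whose index is in jumps."""
    for i in jumps:
        code[i][1] = target


def gen_bool(b, code):
    """Emit jumping code; return (truelist, falselist) of unfilled jumps."""
    kind = b[0]
    if kind in ("||", "&&"):
        t1, f1 = gen_bool(b[1], code)
        start2 = len(code)                  # index of B2's first instruction
        t2, f2 = gen_bool(b[2], code)
        if kind == "||":
            backpatch(code, f1, start2)     # B1 false: continue with B2
            return t1 + t2, f2
        backpatch(code, t1, start2)         # B1 true: continue with B2
        return t2, f1 + f2
    if kind == "!":
        t, f = gen_bool(b[1], code)
        return f, t
    relop, e1, e2 = b                       # comparison
    code.append([f"if {e1} {relop} {e2} goto", None])
    code.append(["goto", None])
    return [len(code) - 2], [len(code) - 1]


# if (x < 100 || x > 200 && x != y) x = 0;
cond = ("||", ("<", "x", "100"), ("&&", (">", "x", "200"), ("!=", "x", "y")))
code = []
truelist, falselist = gen_bool(cond, code)
backpatch(code, truelist, len(code))        # the then-part starts here
code.append(["x = 0", ""])
backpatch(code, falselist, len(code))       # first instruction after the if
for i, (text, target) in enumerate(code):
    print(i, text, target)
```

No label has to be known when a jump is emitted; the targets are filled in once the position of the following code is known.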
break and continue
● break and continue are disciplined / special gotos.
● Their IR needs
– currently enclosing loop / switch.
– goto to a label just outside / before the enclosing block.
● How to write the SDD to generate their 3AC?
– either pass on the enclosing block and label as an
inherited attribute, or
– use backpatching to fill-in the label of goto.
– Need additional restriction for continue.
● Classwork: How to support break label?
IR for switch
● Using nested if-else
● Using a table of pairs
– <Vi, Si>
● Using a hash-table
– when i is large (say, > 10)
● Special case when Vis are consecutive integers.
– Indexed array is sufficient.

    switch(E) {
        case V1: S1
        case V2: S2
        …
        case Vn-1: Sn-1
        default: Sn
    }
Functions
● Function definitions
– Type checking / symbol table entry
– Return type, argument types, void
– Stack offset for variables
– Stack offset for arguments
● Function calls
– Push parameters
– Switch scope / push environment
– Jump to label for the function
– Switch scope / pop environment
– Pop parameters
Language Constructs
● Declarations
– Types (int, int [], struct, int *)
– Storage qualifiers (array expressions, const, static)
● Assignments
● Conditionals, switch
● Loops
● Function calls, definitions
