LIACS Leiden University Leiden, Fall 2011 Why Intermediate Code? Front-end: generates intermediate representation Back-end: generates target code
Why Intermediate code? Retargeting is facilitated A machine-independent code optimizer can be applied Intermediate Code Parser Static Checker Intermediate Code Generator Code Generator Intermediate Languages Syntax trees, Postfix notation, Three-address code (x := y op z) Example: a := b * -c + b * -c
assign + * a b uniminus c * b uniminus c Syntax tree: assign + * a b uniminus c DAG: Postfix notation: a b c uniminus * b c uniminus * + assign Three-Address Code Sequence of statements: x := y op z x + y * z => t1 := y * z t2 := x + t1
Example: a := b * -c + b * -c
t1 := - c t2 := b *t1 t3 := - c t4 := b * t3 t5 := t2 + t4 a := t5 Linearized representation of a syntax tree t1 := - c t2 := b *t1 t5 := t2 + t2 a := t5 Linearized representation of a syntax DAG Types of Three-Address Statements Assignment statements: x := y op z and x := op y Indexed assignments: x := y[ i ] and x[ i ] := y Pointer assignments: x := & y, x := * y, * x := y Copy statements: x := y Unconditional jumps: goto L Conditional jumps: if x relop y goto L Function calls: param x, call p, n, and return y
p(x1,..., xn) => param x1 . . . param xn call p, n Syntax-directed Translation into Three-Address Code Temporary names are created E E1 + E2 => t := E1 + E2, e.g. t5 := t2 + t4 a := t5 When expression is an identifier no new temporary Nonterminal E has two attributes: E.place the name which holds the value of E E.code three-address code sequence Syntax-directed Translation Production Semantic Rule S id := E S.code := E.code || gen( id.place := E.place ) E E 1 + E2 E.place := newtemp(); E.code := E1.code || E2.code || gen( E.place := E1.place + E2.place ) E E1 * E2 E.place := newtemp(); E.code := E1.code || E2.code || gen( E.place := E1.place * E2 .place ) E - E1 E.place := newtemp(); E.code := E1.code || gen( E.place := uminus E1.place ) E ( E1 ) E.place := E1.place E.code := E.code E id E.place := id.place E .code := Function newtemp() returns distinct names (t1, t2) gen(x := y op z) => x := y op z To produce three-address code for assignments Flow-of-control Statements Production Semantic Rule S while E do S1 S.begin := newlabel() S.after := newlabel() S.code := gen( S.begin : ) || E.code || gen ( if E.place = 0 goto S.after ) || S1.code || gen ( goto S.begin) || gen( S.after : ) While statements E.code ... if E.place = 0 goto S.after S1.code ... goto S.begin ... S.begin : S.after : Flow-of-control Statements Example i := 2 * n + k while
i := 2 * n + k while i do i := i - k Source program fragment t1 := 2
L1: if i = 0 goto L2
L2: t1 := 2 t2 := t1 * n t3 := t2 + k i := t3 L1: if i = 0 goto L2 t4 := i - k i := t4 goto L1 L2: Three-address code sequence Implementation (1) Quadruple is record: op, arg 1, arg 2, result
Quadruples, triples, and idirect triples op arg 1 arg 2 result (0) uniminus c t1 (1) * b t1 t2 (2) uniminus c t3 (3) * b t3 t4 (4) + t2 t4 t5 (5) := t5 a Pro: easy to rearrange code for global optimization Cons: lots of temporaries t1 := - c t2 := b *t1 t3 := t4 := b * t3 t5 := t2 + t4 a := t5 t1 := - c t2 := b *t1 t3 := - c t4 := b * t3 t5 := t2 + t4 a := t5 a := b * -c + b * -c Three-address code Quads (quadruples) Implementation (2) Triple is record: op, arg 1, arg 2
Quadruples, triples, and idirect triples op arg 1 arg 2 (0) uniminus c (1) * b (0) (2) uniminus c (3) * b (2) (4) + (1) (3) (5) := a (4) Pro: temporaries are implicit Cons: difficult to rearrange code t1 := - c t2 := b *t1 t3 := t4 := b * t3 t5 := t2 + t4 a := t5 t1 := - c t2 := b *t1 t3 := - c t4 := b * t3 t5 := t2 + t4 a := t5 a := b * -c + b * -c Three-address code Triples Implementation (3) Implicit Triples: pointers to triples Quadruples, triples, and idirect triples op arg 1 arg 2 (14) uniminus c (15) * b (14) (16) uniminus c (17) * b (16) (18) + (15) (17) (19) := a (18) Pro: temporaries are implicit and it is easier to rearrange code t1 := - c t2 := b *t1 t3 := t4 := b * t3 t5 := t2 + t4 a := t5 t1 := - c t2 := b *t1 t3 := - c t4 := b * t3 t5 := t2 + t4 a := t5 a := b * -c + b * -c Three-address code Implicit Triples Triple Container statement (0) (14) (1) (15) (2) (16) (3) (17) (4) (18) (5) (19) Names and Scope The three-address code is simplistic it assumes that the names of variables can be easily resolved by the back end in global or local variables
We need local symbol tables to record global and local declarations in procedures, blocks, and structs to resolve names Declarations in a Procedure P M D M { offset := 0 } D D ; D D id : T { enter( id.name, T.type, offset); offset := offset + T.width } T integer { T.type := integer; T.width := 4 } T real { T.type := real; T.width := 8 } T array [ num ] of T1 { T.type := array( num.val, T1.type ); T.width := num.val x T1.width } T ^ T1 { T.type := pointer( T1.type ); T.width := 4 } Keeping Track of Scope Information Nested procedures P D D D ; D | id : T | proc id ; D ; S
D proc id ; D1 ; S: then a new symbol table is created The new table points to the enclosing procedures table
Procedure enter() knows in which table to make an entry Symbol Table Operations mktable( previous ): returns a pointer to a new table which is linked to a previous table in the outer scope
enter( table, name, type, offset ): creates a new entry in table
addwidth( table, width ): accumulates the total width of all entries in table
enterproc( table, name, newtable ): creates a new entry in table for procedure with local scope newtable Processing Declarations Nested Procedures P M D { addwidth( top( tblptr ), top( offset ) ); pop( tblptr ); pop( offset ) } M { t := mktable( nil ); push( t, tblptr ); push( 0, offset ) } D D1 ; D2 D proc id ; N D1 ; S
{ t := top( tblptr ); addwidth( t, top( offset ) ); pop( tblptr ); pop( offset ); enterproc( top( tblptr ), id.name, t ) } D id : T { enter( top ( tblptr ), id.name, T.type, top( offset ) ); top( offset ) := top( offset ) + T.width } N { t := mktable( top( tblptr ) ); push( t, tblptr ); push( 0, offset ) } Assignment Statements S id := E { p := lookup( id.name ); if p <> nil then emit( p := E.place ) else error } E E1 + E2 { E.place := newtemp; emit( E.place := E1.place + E2.place ) } E E1 * E2 { E.place := newtemp; emit( E.place := E1.place * E2.place ) } E - E1
{ E.place := newtemp; emit( E.place := uniminus E1.place )} E ( E1 ) { E.place := E1.place } E id { p := lookup( id.name ); if p <> nil then E.place := p else error } A Translation Scheme Reusing Temporary Names Each time newtemp() generates a new temporary name Can we reuse temporaries? E E1 + E2 results in:
Modify newtemp() to use a stack : Keep a counter c, initialized to 0 newtemp() increments c and returns temporary $c Decrement counter on each use of a $i in a three-address statement evaluate E1 into t1 evaluate E2 into t2 t3 := t1 + t2 evaluate E1 into t1 evaluate E2 into t2 t3 := t1 + t2 If t1 is no longer used, can reuse t1 instead of using new temp t3 Reusing Temporary Names Statement Value of c 0 $0 := a * b 1 $1 := c * d 2 $0 := $0 + $1 1 $1 := e * f 2 $0 := $0 - $1 1 X := $0 0 Example: x := a * b + c * d e * f Addressing Array Elements A : array [low .. high] of type;
i th element of A begins in location base + (i low) x W base = A[low], W = sizeof( type )
Compile time evaluation if rewritten as i x W + ( base low x W )
c = base low x W is stored in symbol-table entry for A A[i] = i x W + c
One-Dimensional Arrays t1 := c t2 := i * W t3 := t1[t2] := t3 t1 := c t2 := i * W t3 := t1[t2] := t3 Addressing Array Elements In Row-major form A[i,j] is equivalent to A[i][j]
Two-Dimensional Arrays... (1) A[1,1] A[1,2] A[1,3] A[2,1] A[2,2] A[2,3] A[1,1] A[2,1] A[1,2] A[2,2] A[1,3] A[2,3] Row-major Column-major First Column First Row A : array [1..2, 1..3] of integer; Addressing Array Elements A[i,j]= base+((ilow i )x N j +j-low j )x W N j = high j -low j +1
((i x Nj) + j) x W + c, where c = base - ((low i x N j ) + low j ) x W
Two-Dimensional Arrays... (2) A : array [1..2, 1..3] of integer; // Row-major form t1 := i * 3 t1 := t1 + j t2 := c t3 := t1 * 4 t4 := t2[t3] := t4 t1 := i * 3 t1 := t1 + j t2 := c t3 := t1 * 4 t4 := t2[t3] := t4 Addressing Array Elements Grammar Synthesized Attributes:
E.place name of temp holding value of E Elist.array array name Elist.place name of temp holding index value Elist.ndim number of array dimensions L.place lvalue (=name of temp) L.offset index into array (=name of temp) null indicates non-array simple id S L := E E
L
Elist
S L := E E E + E | ( E ) | L L Elist ] | id Elist Elist , E | id [ E Addressing Array Elements A Translation Scheme ... (1) S L := E { if L.offset = null then /* L is a simple id */ emit( L.place := E.place ) else emit( L.place [ L.offset ] := E.place ) } E E1 + E2 { E.place := newtemp; emit( E.place := E1.place + E2.place ) } E ( E1 ) { E.place := E1.place } E L
{ if L.offset = null then /* L is a simple id */ E.place := L.place else begin E.place := newtemp; emit( E.place :=L.place [ L.offset ] ) end } Addressing Array Elements A Translation Scheme... (2) L Elist ] { L.place := newtemp; L.offset := newtemp emit( L.place := c( Elist.array ) ) emit( L.offset := Elist.place * width( Elist.array )) } L id { L.place := id.place; L.offset := null } Elist Elist 1, E { t := newtemp; m := Elist 1.ndim +1; emit( t := Elist 1.place * limit( Elist 1.array, m )); emit( t := t + E.place ); Elist.array := Elist 1.array; Elist.place := t; Elist.ndim := m } Elist id [ E
Applied to boolean variables or relational expressions
Relational expression: E1 relop E2, where E1 and E2 are arithmetic expressions
The grammar:
Attribute op: <, , =, , >, represented by relop
E E or E | E and E | not E | ( E ) | id relop id | true | false E E or E | E and E | not E | ( E ) | id relop id | true | false Translating Boolean Expressions True = 1, false = 0
Example: a or b and not c
a < b: if a<b then 1 else 0
Numerical representation t1 := not c t2 := b and t1 t3 := a or t2 t1 := not c t2 := b and t1 t3 := a or t2 100: if a < b goto 103 101: t := 0 102: goto 104 103: t := 1 104: 100: if a < b goto 103 101: t := 0 102: goto 104 103: t := 1 104: Translating Boolean Expressions A Translation Scheme E E1 or E2 { E.place := newtemp; emit( E.place := E1.place or E2.place ) } E E1 and E2 { E.place := newtemp; emit( E.place := E1.place and E2.place ) } E not E1 { E.place := newtemp; emit( E.place := not E1.place ) } E ( E1 ) { E.place := E1.place } E id1 relop id2 { E.place := newtemp; emit( if id1.place relop.op id2.place goto nextstat + 3 ); emit( E.place := 0 ); emit( goto nextstat + 2 ); emit( E.place := 1 ) } E true { E.place := newtemp; emit( E.place := 1 ) } E false { E.place := newtemp; emit( E.place := 0 ) } Backpatching Code generation problem: Labels (addresses) that control must go to may not be known at the time the jump statements are generated
A solution - backpatching: Generate branching statements with empty jumps Put the statements in a goto list Fill the labels of these statements when the proper label is determined Backpatching Quadruples in a quadruple array Labels are indexes into the array
Manipulate the list of lables: makelist( i ) creates a new list containing three-address location i, returns a pointer to the list
merge( p1 , p2 ) concatenates lists pointed to by p1 and p2, returns a pointer to the concatenates list
backpatch( p, i ) inserts i as the target label for each of the statements in the list pointed to by p Backpatching Grammar for boolean expressions Synthesized Attributes:
E.code three-address code E.truelist backpatch list for jumps on true E.falselist backpatch list for jumps on false M.quad location of current three-address quad E E1 or M E2 M E2
id1 relop id2
M E E1 or M E2 | E1 and M E2 | not E1 | ( E1 ) | id1 relop id2 | true | false M Backpatching A Translation Scheme... (1) E E1 or M E2 { backpatch( E1.falselist, M.quad ); E.truelist := merge( E1.truelist, E2.truelist ); E.falselist := E2.falselist }
E E1 and M E2 { backpatch( E1.truelist, M.quad ); E.truelist := E2.truelist; E.falselist := merge( E1.falselist, E2.falselist ) }
E not E1 { E.truelist := E1.falselist; E.falselist := E1.truelist }
E ( E1 ) { E.truelist := E1.truelist; E.falselist := E1.falselist } Backpatching A Translation Scheme... (2) E id1 relop id2 { E.truelist := makelist( nextquad ); E.falselist := makelist( nextquad + 1 ); emit( if id1.place relop.op id2.place goto_ ); emit( goto_ ) } E true { E.truelist := makelist( nextquad ); emit( goto_ ) } E false { E.falselist := makelist( nextquad ); emit( goto_ ) } M { M.quad := nextquad } Backpatching Example: a<b or c<d and e<f E id1 relop id2 E E1 or M E2 (M.quad := 102)
E E1 and M E2 (M.quad := 104)
backpatch( {102}, 104 ) backpatch( {101}, 102 )
100: if a < b goto_ 101: goto_ 102: if c < d 103: goto_ 104: if e < f 105: goto_ 100: if a < b goto_ 101: goto_ 102: if c < d goto_ 103: goto_ 104: if e < f goto_ 105: goto_ 100: if a < b goto TRUE 101: goto 102 102: if c < d 103: goto FALSE 104: if e < f 105: goto FALSE 100: if a < b goto TRUE 101: goto 102 102: if c < d goto 104 103: goto FALSE 104: if e < f goto TRUE 105: goto FALSE backpatch( {102}, 104 ) backpatch( {101}, 102 ) Flow-of-control Statements Grammar Synthesized Attributes:
E.code three-address code E.truelist backpatch list for jumps on true E.falselist backpatch list for jumps on false M.quad location of current three-address quad S.nextlist backpatch list for jumps to the next statement after S (or nil) L.nextlist backpatch list for jumps to the next statement after L (or nil)
S if E then S
L
S if E then S | if E then S else S | while E do S | begin L end | A L L ; S | S Flow-of-Control Statements A Translation Scheme using backpatching... (1) S if E then M1 S1 N else M2 S2 { backpatch( E.truelist, M1.quad ); backpatch( E.falselist, M2.quad ); S.nextlist := merge(S1.nextlist, merge(N.nextlist,S2.nextlist) } N { N.nextlist := makelist( nextquad ); emit( goto_) } M { M.quad := nextquad } S if E then M S1 { backpatch( E.truelist, M.quad ); S.nextlist := merge( E.falselist, S1.nextlist) } S wihle M1 E do M2 S1 { backpatch( S1.nextlist, M1.quad ); backpatch( E.truelist, M2.quad ); S.nextlist := E.falselist; emit( goto_ M1.quad ) } Flow-of-Control Statements A Translation Scheme using backpatching... (2) S begin L end { S.nextlist := L.nextlist } S A { S.nextlist := nil } L L1 ; M S { backpatch( L1.nextlist, M.quad ); L.nextlist := S.nextlist } L S { L.nextlist := S.nextlist } Intermediate Code Generation