Chap. 6, Intermediate Code Generation
J. H. Wang
Dec. 17, 2008
Outline
• Variants of Syntax Trees
• Three-Address Code
• Types and Declarations
• Translation of Expressions
• Type Checking
• Control Flow
• Backpatching
• Switch-Statements
• Intermediate Code for Procedures
Introduction
• In the analysis-synthesis model of a compiler
– Front end
• Details of source language
– Back end
• Details of target machine
• Intermediate code generation
– Intermediate representation
• Syntax trees
• Three-address code
– Static checking
• Type checking
• Remaining syntactic checks
Structure of Compiler Front End

(Figure: source program → Parser → Static Checker → Intermediate Code Generation → intermediate code → Code Generation. Parser, Static Checker, and Intermediate Code Generation form the front end; Code Generation is the back end.)

A Sequence of Intermediate
Representations

(Figure: Source Program → High-level Intermediate Representation → … → Low-level Intermediate Representation → Target Code)
• High-level representations
– Syntax trees
– Suitable for static type checking
• Low-level representations
– Three-address code
– Suitable for register allocation and instruction
selection
Variants of Syntax Trees
• Syntax tree
– Constructs in source program
• Directed acyclic graph (DAG)
– Common subexpressions
• Ex:
– a+a*(b-c)+(b-c)*d
– DAG: (Fig.6.3)
– SDD to produce syntax trees or DAG’s (Fig.6.4)
– Steps for constructing DAG (Fig. 6.5)
Value-Number Method for
Constructing DAG’s
• Nodes of syntax trees or DAG’s are stored
in an array of records
– Ex: i=i+10 (Fig. 6.6)
– The integer index of the record: value number
– Signature of an interior node: <op,l,r>
• Op: label
• L: left child’s value number
• R: right child’s value number
• Algorithm 6.3: the value-number method
for constructing DAG’s
• Input: label op, node l, node r
• Output: the value number of a node in the
array with signature <op,l,r>
• Method: search the array for a node M
with label op, left child l, and right child r
– Create a new node if not found, and return its
value number
• A more efficient approach: hash table
– Efficient for dictionaries
• Insertion, deletion, and membership of a set
• O(1)
– Hash function h
• h(op,l,r) determines the bucket
• Each bucket can be implemented as linked lists
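The value-number method with a hash table can be sketched as follows. This is a minimal illustration, not the textbook's code: the class name `ValueNumberTable`, the `("id", name)` leaf signature, and the use of a Python dict in place of explicit buckets with linked lists are all assumptions made for the example. Value numbers here start at 0.

```python
class ValueNumberTable:
    """DAG nodes stored in an array of records; the index is the value number."""

    def __init__(self):
        self.records = []   # array of records, each a signature (op, l, r)
        self.buckets = {}   # signature -> value number (stands in for h(op, l, r))

    def node(self, op, l=None, r=None):
        """Return the value number of the node with signature <op, l, r>."""
        signature = (op, l, r)
        if signature in self.buckets:      # node already exists: reuse it
            return self.buckets[signature]
        self.records.append(signature)     # otherwise create a new node
        value_number = len(self.records) - 1
        self.buckets[signature] = value_number
        return value_number


def build_example(table):
    """Build the DAG for a+a*(b-c)+(b-c)*d and return the root's value number."""
    a = table.node("id", "a")
    b = table.node("id", "b")
    c = table.node("id", "c")
    d = table.node("id", "d")
    first = table.node("+", a, table.node("*", a, table.node("-", b, c)))
    second = table.node("*", table.node("-", b, c), d)   # b-c is found, not rebuilt
    return table.node("+", first, second)


table = ValueNumberTable()
root = build_example(table)
```

The common subexpression b-c gets a single record, so the DAG has 9 nodes where the syntax tree would have more.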
Three-Address Code
• At most one operator in an instruction
– x+y*z
– t1=y*z
t2=x+t1
• Ex: three-address code is a linearized
representation of a syntax tree or a DAG
– (Fig. 6.8)
Addresses and Instructions
• Address
– A name: pointer to its symbol-table entry
– A constant
– A compiler-generated temporary
• Instruction
– Assignment: x=y op z (binary)
– Assignment: x = op y (unary)
– Copy instruction: x= y
– Unconditional jump: goto L
– Conditional jump: if x goto L, ifFalse x goto L
– Conditional jump: if x relop y goto L
– Procedure calls and returns:
• param x
• call p, n
• y = call p, n
• return y
– Indexed copy: x=y[i], x[i]=y
– Address and pointer assignments: x=&y,
x=*y, *x=y
– Ex. 6.5 (Fig. 6.9)
Three Representations of
Instructions
• Three representations of instructions in a
data structure
– Quadruples
– Triples
– Indirect triples
Quadruples
• Quadruple (quad): four fields
– op, arg1, arg2, result
• Exceptions:
– Unary operators: no arg2
– Param: no arg2 and result
– Conditional and unconditional jumps: put the
target label in result
• Ex. 6.6: a=b*-c+b*-c; (Fig. 6.10)
Triples
• Triple: three fields
– op, arg1, arg2
– Refer to the result by its position, not by an
explicit temporary name
• Value number
– Equivalent to signatures in Algorithm 6.3
• DAG and triple representations of expressions are
equivalent
– Ex. 6.7 (Fig. 6.11)
• Benefit of quadruples over triples
– In optimizing compilers, instructions are
moved around
• Moving instructions for quadruples does not
change other instructions
Indirect Triples
• Indirect triples: a list of pointers to triples
• (Fig. 6.12)
– An optimizing compiler can move
instructions by reordering the instruction list
(pointers)
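The three representations can be compared on a = b*-c + b*-c. The sketch below is illustrative only: the `Quad` and `Triple` tuples, the `("ref", i)` marker for a position reference, and the helper `quads_to_triples` are names invented for this example.

```python
from collections import namedtuple

Quad = namedtuple("Quad", "op arg1 arg2 result")
Triple = namedtuple("Triple", "op arg1 arg2")

# Quadruples: every result has an explicit (temporary) name.
quads = [
    Quad("minus", "c", None, "t1"),
    Quad("*", "b", "t1", "t2"),
    Quad("minus", "c", None, "t3"),
    Quad("*", "b", "t3", "t4"),
    Quad("+", "t2", "t4", "t5"),
    Quad("=", "t5", None, "a"),
]


def quads_to_triples(quads):
    """Replace temporary names by the position of the triple that computes them."""
    position = {}   # temporary name -> index of its defining triple
    triples = []

    def ref(arg):
        return ("ref", position[arg]) if arg in position else arg

    for q in quads:
        if q.op == "=":
            # a copy stores into a named variable: the target goes in arg1
            triples.append(Triple("=", q.result, ref(q.arg1)))
        else:
            triples.append(Triple(q.op, ref(q.arg1), ref(q.arg2)))
            position[q.result] = len(triples) - 1
    return triples


triples = quads_to_triples(quads)

# Indirect triples: a separate list of pointers (here, indices) into `triples`.
# Reordering this list moves instructions without renumbering the triples.
instruction_list = list(range(len(triples)))
```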
Static Single-Assignment Form
• SSA: different from three-address code
– All assignments are to variables with distinct
names
• (Fig. 6.13)
– A notational convention called the φ-function combines
two definitions of a variable
• if (flag) x=-1; else x=1;
y=x*a;
• if (flag) x1=-1; else x2=1;
x3=φ(x1,x2);
y=x3*a;
Types and Declarations
• Type checking
– Ex: the && operator in Java expects boolean operands
• Translation applications
– Storage at run time
Type Expressions
• A type expression: either a basic type or formed
by applying type constructors to a type
expression
– Ex. 6.8 (Fig. 6.14)
– Basic type
– Type name
– Array
– Record
– s  t (function from type s to type t)
– s  t (Cartesian product)
– Variables
Type Equivalence
• When type expressions are represented by
graphs, two types are structurally
equivalent iff
– They are the same basic type
– They are formed by applying the same type
constructor to structurally equivalent types
– One is a type name that denotes the other
• Name equivalence: type names stand for themselves, so only the first two conditions apply
Declarations
• D → T id ; D | ε
T → B C | record ‘{‘ D ’}’
B → int | float
C → ε | [num] C
Storage Layout for Local Names
• Width of a type: number of storage units
for objects of that type
– (Fig. 6.15)
– Ex. 6.9: int [2][3] (Fig. 6.16)
Sequences of Declarations
• (Fig. 6.17)
P  {offset = 0;} D
D  T id; {top.put(id.lexeme, T.type,
offset); offset = offset + T.width; } D1
D
• Offset: the next available relative address
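Widths and offsets can be sketched in a few lines. This is a sketch under stated assumptions: `BASIC_WIDTH` (int = 4, float = 8 storage units), the tuple encoding `("array", n, t)`, and the function names `array_type` and `layout` are all chosen for illustration.

```python
BASIC_WIDTH = {"int": 4, "float": 8}   # assumed widths in storage units


def array_type(base, dims):
    """Type and width for B C, e.g. base='int', dims=[2, 3] for int [2][3]."""
    t, w = base, BASIC_WIDTH[base]
    for n in reversed(dims):           # innermost dimension is applied first
        t = ("array", n, t)
        w = n * w
    return t, w


def layout(decls):
    """decls: list of (name, base, dims). Returns (symbol table, total width)."""
    offset = 0                         # next available relative address
    table = {}
    for name, base, dims in decls:
        t, w = array_type(base, dims)
        table[name] = (t, offset)      # plays the role of top.put(id.lexeme, T.type, offset)
        offset = offset + w
    return table, offset


table, total = layout([("x", "float", []), ("a", "int", [2, 3]), ("i", "int", [])])
```

With these widths, int [2][3] has type array(2, array(3, int)) and width 24, so x, a, and i land at offsets 0, 8, and 32.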
Fields in Records and Classes
• The field names within a record must be distinct
• The offset or relative address for a field name is
relative to the data area for that record
• Ex. 6.10
– float x;
record { float x; float y; } p;
record { int tag; float x; float y; } q;
• (Fig. 6.18)
T  record ‘{‘ { Env.push(top); top=new Env();
Stack.push(offset); offset = 0; }
D ‘}’ { T.type = record(top); T.width=offset;
top.Env.pop(); offset = Stack.pop(); }
Translation of Expressions
• Operations within expressions
– (Fig. 6.19)
– gen(x ‘=‘ y ‘+’ z): x=y+z
– Ex. 6.11: a = b+-c
• t1=minus c
t2 = b+t1
a=t2
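The translation in Ex. 6.11 can be reproduced with a small recursive generator. This is a minimal sketch: the tuple encoding of expressions and the names `translate`, `new_temp`, and `expr` are assumptions for the example. Instructions are appended to a list as they are generated, so no long code strings are concatenated.

```python
import itertools


def translate(assignment):
    """assignment: (target, expr). expr is a name, ('minus', e), or (op, e1, e2)."""
    code = []
    counter = itertools.count(1)

    def new_temp():
        return f"t{next(counter)}"

    def expr(e):
        """Emit code for e and return the address holding its value."""
        if isinstance(e, str):              # a name: its address is itself
            return e
        if e[0] == "minus":                 # unary minus
            a = expr(e[1])
            t = new_temp()
            code.append(f"{t} = minus {a}")
            return t
        op, left, right = e                 # binary operator
        a = expr(left)
        b = expr(right)
        t = new_temp()
        code.append(f"{t} = {a} {op} {b}")
        return t

    target, e = assignment
    code.append(f"{target} = {expr(e)}")
    return code


result = translate(("a", ("+", "b", ("minus", "c"))))
```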
Incremental Translation
• Code attributes can be long strings
• (Fig. 6.20)
– The attribute code is not used
Addressing Array Elements
• For an array A with n elements
– A[i] begins at: base + i*w
• For k-dimensional arrays,
– base+i1*w1+i2*w2+…+ik*wk
– Base + ( (… ((i1*n2+i2)*n3+i3) …)*nk+ik)*w
• row-major vs. column-major
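Both address formulas can be checked against each other for a row-major layout. The sketch below assumes 0-based indices and an element width w. The function names are hypothetical, and `dims` holds n1 … nk.

```python
def address_by_widths(base, indices, dims, w):
    """base + i1*w1 + ... + ik*wk, where wk = w and wj = n(j+1) * w(j+1)."""
    widths = [w] * len(dims)
    for j in range(len(dims) - 2, -1, -1):
        widths[j] = dims[j + 1] * widths[j + 1]
    return base + sum(i * wj for i, wj in zip(indices, widths))


def address_by_counts(base, indices, dims, w):
    """base + ((...((i1*n2 + i2)*n3 + i3)...)*nk + ik) * w."""
    acc = 0
    for i, n in zip(indices, dims):
        acc = acc * n + i       # n1 is multiplied by 0, so it drops out
    return base + acc * w


# Element [1][2] of a 2x3 array of 4-byte integers starting at address 100.
addr1 = address_by_widths(100, (1, 2), (2, 3), 4)
addr2 = address_by_counts(100, (1, 2), (2, 3), 4)
```

Both forms give the same address. The second needs only the dimension sizes, not precomputed row widths.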
Translation of Array References
• LL[E] | id [E]
• (Fig. 6.22)
– L.addr
– L.array
– L.type
• Ex. 6.12: c+a[i][j] (Fig. 6.23 & 6.24)
Type Checking
• To assign a type expression to each component
of the source program
• To determine that these type expressions
conform to logical rules called type system for the
source language
– A sound type system eliminates the need for dynamic
type checking
• Ideas from type checking have been used to
improve the security of systems that allow
software modules to be imported and executed
Rules for Type Checking
• Type synthesis: build up the type of an
expression from the types of its
subexpressions
– If f has type st and x has type s,
then expression f(x) has type t
• Type inference: determine the type of a
language construct from the way it is used
– If f(x) is an expression,
then for some  and , f has type   and x
has type 
Type Conversions
• EE1+E2
– if (E1.tye=integer and E2.type=integer)
E.type=integer;
else if (E1.type=float and E2.type=integer) …

• Widening vs. narrowing conversions
– (Fig. 6.25)
• Implicit vs. explicit conversions
– Implicit: coercions
– Explicit: casts
• Semantic actions
– EE1+E2 { E.type=max(E1.type, E2.type);
a1=widen(E1.addr, E1.type, E.type);
a2=widen(E2.addr, E2.type, E.type);
E.addr=new Temp();
gen(E.addr ‘=‘ a1 ‘+’ a2); }
– max(t1, t2)
– widen(a, t, w): (Fig. 6.26)
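The semantic action for E → E1 + E2 can be sketched with a two-type hierarchy. This is a simplified illustration: the `ORDER` table (integer below float), the `Gen` class, and the `(float)` conversion syntax in the emitted text are assumptions for the example, and a real hierarchy would contain more types.

```python
import itertools

ORDER = {"integer": 0, "float": 1}   # assumed widening hierarchy, narrow to wide


def max_type(t1, t2):
    """Return the wider of two types in the hierarchy."""
    return t1 if ORDER[t1] >= ORDER[t2] else t2


class Gen:
    def __init__(self):
        self.code = []
        self._ids = itertools.count(1)

    def new_temp(self):
        return f"t{next(self._ids)}"

    def widen(self, a, t, w):
        """Return an address holding the value at a, converted from type t to w."""
        if t == w:
            return a
        if t == "integer" and w == "float":
            temp = self.new_temp()
            self.code.append(f"{temp} = (float) {a}")
            return temp
        raise TypeError(f"cannot widen {t} to {w}")

    def add(self, addr1, type1, addr2, type2):
        """Action for E -> E1 + E2: returns (E.addr, E.type)."""
        t = max_type(type1, type2)
        a1 = self.widen(addr1, type1, t)
        a2 = self.widen(addr2, type2, t)
        addr = self.new_temp()
        self.code.append(f"{addr} = {a1} + {a2}")
        return addr, t


g = Gen()
result = g.add("i", "integer", "x", "float")
```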
Overloading of Functions and
Operators
• An overloaded symbol has different
meanings depending on its context
– Ex: operator+
– If f can have type siti, for i=1,…,n, where si!
=sj for i!=j
and x has type sk for some k,
then expression f(x) has type tk
• The value-number method can be applied
to type expressions to resolve overloading
based on argument types, efficiently
Type Inference and Polymorphic
Functions
• Type inference is useful for languages like ML
– Strongly typed, but does not require names to be
declared before they are used
• Polymorphic: any code fragment that can be
executed with arguments of different types
– Ex: fun length(x) =
if null(x) then 0 else length(tl(x))+1;
– Type of length: ∀α. list(α) → integer
– (Fig. 6.29)
• Substitution: mapping from type variables
to type expressions
– S(t)
• Two type expressions t1 and t2 unify if
there exists some substitution S such that
S(t1) = S(t2)

• Algorithm 6.16: Type inference for polymorphic
functions
• Input: A program
• Output: inferred types for the names in the
program
• Method: the type of a function f(x1,x2): s1 × s2 → t
– Check function definitions and expression in the input
• For function definition fun id1(id2)=E
• For function application E1(E2)
• Ex. 6.17 (Fig. 6.30)
An Algorithm for Unification
• Algorithm 6.19: Unification of a pair of nodes in a type
graph
• Input: A graph representing a type and a pair of nodes
m and n to be unified
• Output: Boolean value true if the expressions
represented by the nodes m and n unify; false, otherwise
• Method: the set field of each node in an equivalence
class points (possibly indirectly) to the representative node
– (Fig. 6.32)
– find(n)
– union(m,n)
– Ex. 6.20 (Fig. 6.33)
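A union-find style unifier in the spirit of find and union can be sketched as below. This is not a transcription of Algorithm 6.19: the `Node` class, its field names, and the rule that a non-variable node is preferred as representative are written out as assumptions, and no occurs check is performed.

```python
class Node:
    def __init__(self, label, children=(), is_var=False):
        self.label = label              # constructor, basic type, or variable name
        self.children = list(children)
        self.is_var = is_var
        self.set = self                 # a representative points to itself


def find(n):
    """Follow set pointers to the representative of n's equivalence class."""
    while n.set is not n:
        n = n.set
    return n


def union(m, n):
    """Merge two classes, keeping a non-variable node as representative."""
    s, t = find(m), find(n)
    if s.is_var:
        s.set = t
    else:
        t.set = s


def unify(m, n):
    s, t = find(m), find(n)
    if s is t:
        return True
    if not s.is_var and not t.is_var:
        # same basic type, or same constructor applied to unifiable children
        if s.label != t.label or len(s.children) != len(t.children):
            return False
        union(s, t)
        return all(unify(a, b) for a, b in zip(s.children, t.children))
    union(s, t)                         # at least one side is a variable
    return True


# Unify list(alpha) -> integer with list(integer) -> beta.
alpha = Node("alpha", is_var=True)
beta = Node("beta", is_var=True)
t1 = Node("->", [Node("list", [alpha]), Node("integer")])
t2 = Node("->", [Node("list", [Node("integer")]), beta])
ok = unify(t1, t2)
```

After the call, both variables belong to classes whose representative is an integer node, which is the substitution that makes the two expressions equal.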
Control Flow
• If-else and while statements: boolean
expressions
– Alter the flow of control
– Compute the logical values
• Boolean expressions
– Boolean operators (&&, ||, !) applied to boolean
variables or relational expressions
– BB||B | B&&B | !B | (B) | E rel E | true | false
• rel.op: <, <=, =, !=, >, >=
Short-Circuit Code
• Short-circuit (or jumping) code
– if (x<100 || x>200 && x!=y) x=0;
• if x<100 goto L2
ifFalse x>200 goto L1
ifFalse x!=y goto L1
L2: x=0
L1:
Flow-of-Control Statements
• Sif (B) S1
Sif (B) S1 else S2
Swhile (B) S1
• (Fig. 6.35)
• (Fig. 6.36 & 6.37) SDD for flow-of-control
statements
– if x<100 goto L2
goto L3
L3: if x>200 goto L4
goto L1
L4: if x!=y goto L2
goto L1
L2: x=0
L1:
• The code is not optimal
– Redundant goto’s
Avoiding Redundant Goto’s
– if x>200 goto L4
goto L1
L4: …
– ifFalse x>200 goto L1
L4: …
– B.true=fall
B.false=S1.next=S.next
S.code=B.code||S1.code
– (Fig. 6.39 & 6.40)
– (Fig. 6.41)
Boolean Values and Jumping
Codes
• Use two phases
• Use one pass for statements, but two
passes for expressions
Backpatching
• Key problem: matching a jump instruction
with the target of the jump
– Passing labels as inherited attributes, a
separate pass is needed to bind labels to
addresses
– Backpatching: passing lists of jumps as
synthesized attributes
One-pass Code Generation using
Backpatching
• B.truelist: a list of jump instructions into
which we must insert the label to which
control goes if B is true
• B.falselist
– Makelist(i)
– Merge(p1,p2)
– Backpatch(p,i)
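The three list operations can be tried out on x<100 || x>200 && x!=y. The sketch below makes several assumptions: instructions are stored as `[text, target]` pairs numbered from 0, `backpatch` takes the instruction array as an extra first argument, and the helpers `rel`, `b_or`, and `b_and` stand in for the semantic actions of the corresponding productions. The marker values `m1` and `m2` record the index of the next instruction before the right operand is generated.

```python
def makelist(i):
    """New list containing only i, an index into the instruction array."""
    return [i]


def merge(p1, p2):
    """Concatenate two lists of jump indices."""
    return p1 + p2


def backpatch(code, p, target):
    """Insert target as the jump target of every instruction on list p."""
    for i in p:
        code[i][1] = target


def rel(code, e1, op, e2):
    """B -> E1 rel E2: emit a conditional and an unconditional jump, both unfilled."""
    truelist = makelist(len(code))
    code.append([f"if {e1} {op} {e2} goto", None])
    falselist = makelist(len(code))
    code.append(["goto", None])
    return truelist, falselist


def b_or(code, b1, m, b2):
    """B -> B1 || M B2: if B1 is false, control continues at M."""
    backpatch(code, b1[1], m)
    return merge(b1[0], b2[0]), b2[1]


def b_and(code, b1, m, b2):
    """B -> B1 && M B2: if B1 is true, control continues at M."""
    backpatch(code, b1[0], m)
    return b2[0], merge(b1[1], b2[1])


code = []
b1 = rel(code, "x", "<", "100")
m1 = len(code)
b2 = rel(code, "x", ">", "200")
m2 = len(code)
b3 = rel(code, "x", "!=", "y")
truelist, falselist = b_or(code, b1, m1, b_and(code, b2, m2, b3))
```

The jumps left on `truelist` and `falselist` stay unfilled until the enclosing statement knows where its true and false exits are.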
Backpatching for Boolean
Expressions
• (Fig. 6.43)
• (Fig. 6.44)
For Flow-of-Control Statements
• (Fig. 6.45)
• (Fig. 6.46)
Break, Continue, and Goto-
Statements
• Goto-statement: by maintaining a list of unfilled
jumps for each label, and then backpatching the
target when it’s known
• Break-statement: jumping to the first instruction
after the code for enclosing statement S
• Continue-statement: similar
• for (;;readch()) {
if (peek==' ' || peek=='\t') continue;
else if (peek=='\n') line = line+1;
else break;
}
Switch Statements
• switch (E) {
case V1: S1
case V2: S2
…
case Vn-1: Sn-1
default: Sn
}
Translation of Switch-Statements
• Evaluate E
• Find the value Vj in the list of cases
– An n-way branch
• To create a table of pairs: (value, label)
• Hash table
• An array of (max-min) buckets
• Execute statement Sj
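One common layout places the statement bodies first and the sequence of tests at the end, driven by the table of (value, label) pairs. The generator below is a sketch of that layout only: the label names `test`, `next`, and `L1 … Ln`, the function `translate_switch`, and the string form of the instructions are assumptions for illustration. A hash table or jump table would replace the linear test sequence when there are many cases.

```python
def translate_switch(expr_code, t, cases, default_body):
    """
    expr_code:    instructions that evaluate E into the temporary t
    cases:        list of (value, body) pairs, body being a list of instructions
    default_body: instructions for the default case
    """
    out = list(expr_code)
    out.append("goto test")
    table = []                                   # (value, label) pairs
    for k, (value, body) in enumerate(cases, start=1):
        label = f"L{k}"
        table.append((value, label))
        out.append(f"{label}:")
        out.extend(body)
        out.append("goto next")
    default_label = f"L{len(cases) + 1}"
    out.append(f"{default_label}:")
    out.extend(default_body)
    out.append("goto next")
    out.append("test:")
    for value, label in table:                   # the n-way branch
        out.append(f"if {t} = {value} goto {label}")
    out.append(f"goto {default_label}")
    out.append("next:")
    return out


switch_code = translate_switch(
    ["t = a + b"], "t",
    [(1, ["x = 10"]), (2, ["x = 20"])],
    ["x = 0"],
)
```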
Syntax-directed Translation of
Switch-Statements
• (Fig. 6.49)
• (Fig. 6.50)
Intermediate Code for Procedures
• (Chap. 7)
• Ex. 6.25
• (Fig. 6.52)
• Function types
• Symbol tables
• Type checking
• Function calls
Thanks for Your Attention!
