
CS495P MAIN PROJECT REPORT

Compiler Directed Monitoring of Run Time Data Access using SUIF

SUBMITTED IN PARTIAL FULFILMENT OF THE DEGREE OF

BACHELOR OF TECHNOLOGY

by

Haynes M G Y2033

Job Abraham Y2029

Under the guidance of

Dr. Vineeth Kumar Paleri

Department of Computer Engineering

National Institute of Technology, Calicut

2005, Monsoon Semester
National Institute of Technology, Calicut
Department of Computer Engineering

Certified that this Main Project Report entitled

Compiler Directed Monitoring of Run Time Data Access using SUIF

is a bonafide report of the work done by

Haynes M G Y2033
Job Abraham Y2029

in partial fulfilment of the

Bachelor of Technology Degree

Dr. Vineeth Kumar Paleri                Dr. M.P. Sebastian
Professor                               Professor and Head
Dept. of Computer Engineering           Dept. of Computer Engineering

Acknowledgement
We thank Dr. Vineeth Kumar Paleri, Professor, Department of Computer Science and
Engineering, for his guidance and co-operation in the completion of this project. We also
acknowledge the advice and guidance given to us by our friends and classmates.
Haynes M G Y2033
Job Abraham Y2029
Abstract
This project has two objectives. The first is to study the working of the SUIF infrastructure. The second
is to develop a pass that implements run time monitoring of data access. This
consists of inserting function calls at different points in the input program (which is in SUIF
format) and developing a monitor which is called at run time.
Contents

1 Introduction
  1.1 Problem Specification
  1.2 Literature Survey
  1.3 Motivation
2 The SUIF compiler system and intermediate format
  2.1 SUIF kernel
  2.2 SUIF toolkit
  2.3 SUIF intermediate format
    2.3.1 File Level
    2.3.2 Procedure Level
    2.3.3 Instruction Level
    2.3.4 Symbolic Information
    2.3.5 Other Data
3 The compiler pass
  3.1 Working of the pass
4 The run time monitor
  4.1 RecObj
  4.2 RecAccess
  4.3 RecLink
  4.4 Analyzer
5 Conclusion
1 Introduction

1.1 Problem Specification

The level of representation for programs in a parallelizing compiler is a crucial element of the
compiler design. Traditional compilers for uniprocessor systems generally use program
representations that are too low-level for parallelization. At the other extreme, many parallelizing
compilers are source-to-source translators, and their analyses and optimizations work directly
on abstract syntax trees.
SUIF stands for Stanford University Intermediate Format. The SUIF compiler system is a platform
for research on compiler techniques for high-performance machines. It is powerful, modular,
flexible, clearly documented, and complete enough to compile large
benchmark programs. We therefore chose SUIF as our platform.

The tasks at hand can be classified as follows.

• Setting up the SUIF compiler system and installing additional packages.

• Understanding the SUIF format and the working of the compiler system.

• Implementing a pass to check for data access at run time and analysing these accesses
using a run time monitor.

1.2 Literature Survey

We began by reading about SUIF on the SUIF website at Stanford University (http://www.suif.stanford.edu).
The following documents from the SUIF website were consulted:

• An Overview of the SUIF System - This document gives a complete overview of the SUIF
compiler system: its architecture, the design of the SUIF base kernel and the compiler toolkit
available with it.
• The SUIF Library - This document is a reference manual for the SUIF library. It
gives a complete reference for the SUIF intermediate format and the routines
available to manipulate its data structures.
• The SUIF Cookbook - This guide contains a series of examples describing
various passes. It introduces the writing of new passes and explains how to
compile and run newly created passes.
• The SUIF source code - The actual source code was also consulted extensively to
clarify SUIF internals.

Other documents referred to include:

• Compilers: Principles, Techniques, and Tools: Aho, Sethi & Ullman [1]

• Compiler Directed Monitoring of Program Data Access [2]


1.3 Motivation
Embedded systems suffer from two major limitations: memory and power constraints.
There is therefore a need to add different kinds of optimizations to compilers to address
these issues. This project aims at easing these constraints using compiler-based optimization.

2 The SUIF compiler system and intermediate format

The compiler is structured as a small kernel plus a toolkit consisting of various compilation
analyses and optimizations built using the kernel. The kernel defines the intermediate
representation and the interface between passes of the compiler. This interface is always the same,
so the passes in the toolkit can easily be enhanced, replaced, or rearranged. All program
information necessary to implement scalar and parallel compiler optimizations is easily
available from the SUIF kernel.
The intermediate program representation is a hierarchy of data structures defined in an
object-oriented class library. This intermediate representation retains almost all the high-level
information from the source code. Accessing and manipulating the data structures is generally
straightforward due to the modular design of the kernel.

2.1 SUIF kernel

The SUIF kernel performs three major functions:

• It defines the intermediate representation of programs. This representation supports
high-level program-restructuring transformations as well as low-level analyses and
optimizations.
• It provides functions to access and manipulate the intermediate representation. Hiding
the low-level details of the implementation makes the system easier to use and helps
maintain compatibility if the representation is changed.
• It structures the interface between compiler passes. SUIF passes are separate programs
that communicate via files. The format of these files is the same for all stages of a
compilation. The system supports experimentation by allowing user-defined data in annotations.

SUIF is a mixed-level program representation. Besides the conventional low-level operations,
it includes three high-level constructs: loops, conditional statements, and array accesses. The
loop and conditional representations are similar to abstract syntax trees but are
language-independent. These constructs capture all the high-level information necessary for
parallelization. This approach also reduces the many different ways of expressing the same information
to a canonical form, thus simplifying the design of the analyzers and optimizers.
The symbol tables in a SUIF program hold detailed symbol and type information. The
information is complete enough to translate SUIF back to legal, high-level C code. The
kernel provides an object-oriented implementation of the SUIF intermediate format. This SUIF
library defines a C++ class for each element of the program representation, which allows it to
provide interfaces to the data structures that hide the underlying details.

2.2 SUIF toolkit

The SUIF toolkit consists of a set of compiler passes implemented as separate programs. Each
pass typically performs a single analysis or transformation and then writes the results out to a
file. This is inefficient but flexible. SUIF files always use the same format, so passes
can be reordered simply by running the programs in a different order. New passes can be freely
inserted at any point in a compilation. The new optimization to be implemented in this project
will be one such pass.

2.3 SUIF intermediate format

The SUIF library provides an object-oriented implementation of the SUIF intermediate format.
It is written in C++. The library defines classes to represent all the various elements of the
intermediate format and to perform some common operations on them. It also contains the
code to read and write the binary files that hold the SUIF code between passes of the compiler.
The SUIF representation is roughly organized as a hierarchy. Objects near the top of the hierarchy
contain the lower-level objects. The various elements of SUIF are described here starting from
the top of the hierarchy and working down.

A diagram showing the representation of a SUIF program is shown below.

[Figure: representation of a SUIF program]


2.3.1 File Level

At the root of the hierarchy for a SUIF program is a "file set" containing a list of the files being
compiled. Each entry within the file set is a "file set entry" that contains the input and output
streams for a particular file.
The file level of the SUIF hierarchy also contains the global symbol tables. The file set contains
the global symbol table that is shared across all of the files. This shared symbol table is the key
to supporting interprocedural analysis. References to a global symbol or type from different
files can point to the same entry in the shared global symbol table, making it easy to determine
that they refer to the same entity. Each file set entry also contains its own symbol table for
things declared privately within that file.
Lower levels of the SUIF hierarchy can be reached through the global symbol tables. Besides
the types and variables, the global symbol tables contain symbols for the procedures. The
procedure bodies can be accessed through these procedure symbols. If the body of a procedure
is contained in one of the input files, the corresponding procedure symbol automatically records
a pointer to the input file and provides a method to read the body into memory. The procedure
symbol also has other methods to write the body to an output file and to flush the body from
memory. Many SUIF programs need to process all the procedures. This can be done by
searching through the global symbol tables for the procedure symbols. However, since this is such
a common task, the file set entries include procedure iterators to step through all the procedures.

The "file set" is very helpful when we are working with interprocedural passes. Without it,
we would have to combine all the code into one big file.

2.3.2 Procedure Level

Procedure bodies are represented using a language-independent form of abstract syntax trees
(ASTs). In the first stages of a compilation, the high-level structure is represented by a
language-independent form. This format is called high-SUIF. It is well suited for passes that
require the high-level structure of the code, such as dependence analysis and loop transformation.
Later in the compilation process, the ASTs are reduced to sequential lists of instructions. This
form, called low-SUIF, works well for some scalar optimizations and for code generation. Both
formats are represented using the same tree data structures. They differ only in the amount of
information present.

An AST contains the following types of nodes:

• Instruction nodes: They form the leaf nodes of the AST. Each of these nodes contains
a single instruction or expression tree. In low-SUIF code, a procedure body is reduced
to a list of instruction nodes containing individual instructions. This form resembles the
quadruple representation used by traditional scalar optimizers. Methods are provided to
attach or detach an instruction, apply a function over all instructions in an expression tree,
and so on.
• Block nodes: Block nodes represent nested scopes. A block node contains a symbol table
and a list of the AST nodes within the block. The scope of the symbols and types defined
in the symbol table is restricted to the AST nodes within the block. They cannot be
referenced from outside the block.
• If nodes: Conditional structures may be represented by if nodes. An "if" node has three
parts, each of which is a list of AST nodes. The header list contains code to evaluate
the condition and either branch to the else list or fall through to the then list. Because
the header can contain control flow, it is easy to implement short-circuit evaluation of
conditional expressions.
• Loop nodes: SUIF has two different kinds of loops, the "loop" node and the "for" node.
A loop node represents a "do-while" loop. It contains two lists of AST nodes: the body
and the test. The body list comes first and holds the loop body. The test list contains
code to evaluate the "while" expression and conditionally branch back to the beginning
of the body.
• For nodes: In addition to the loop body, a for node specifies the index variable and the
range of values for the index. The lower bound, upper bound, and step operands are
expressions that are evaluated once at the beginning of the loop. The for node also has an
optional landing_pad part which is used to execute loop-invariant code.

2.3.3 Instruction Level

Each instruction node in an abstract syntax tree holds a SUIF instruction. Most SUIF
instructions perform simple operations; the opcodes resemble those of a typical RISC processor.
However, more complex instructions are used in places where it is important to retain high-level
information.
SUIF supports both expression trees (high-SUIF) and flat lists (low-SUIF) of instructions. In
an expression tree, the instructions for an expression are all grouped together. This works well
for high-level passes. Because expression trees do not totally order the evaluation of the
instructions, they do not work so well for back-end optimization and scheduling passes. Thus SUIF
also provides the flat list representation, where each instruction node contains a single instruction.
Most SUIF instructions use a "quadruple" format with a destination operand and two source
operands; however, some instructions require more specialized formats. For example, ldc (load
constant) instructions have an immediate value field in place of the source operands. The cal
(call) instruction implements a machine-independent procedure call with a list of parameters.
This hides the details of various linkage conventions.

2.3.4 Symbolic Information

SUIF includes detailed symbolic information. Symbols and types are defined in nested scopes
corresponding to the block structure of the program. A symbol table is attached to each
element of the main SUIF hierarchy that defines a new scope. Symbols record information about
variables, labels, and procedures. The SUIF type system is similar to that of C but also has some
support for FORTRAN and other languages.
The symbol tables are defined in a tree structure that forms a hierarchy parallel to the main
SUIF hierarchy. Each table records a pointer to its parent and keeps a list of its children. The
global symbol table at the root is attached to the file set and is shared across all the files. Its
children are the file symbol tables attached to the file set entries. The procedure symbol tables
for the AST procedure nodes are in the next level down, followed by the block symbol tables
for block nodes within the ASTs. The block symbol tables may be nested to any level.
Each symbol table contains a list of symbols that are defined within the corresponding
scope. There are three different kinds of symbols: variables, labels, and procedures.
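
The parent-pointer organisation described above lends itself to a simple scoped lookup. The following C++ sketch is illustrative only: SymbolTable, Symbol, SymKind, define and lookup are hypothetical names chosen for this example and are not the classes of the SUIF library. It shows a lookup that starts in the innermost scope and walks towards the global table.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for illustration; not the SUIF library classes.
enum class SymKind { Variable, Label, Procedure };

struct Symbol {
    std::string name;
    SymKind kind;
};

struct SymbolTable {
    SymbolTable* parent;                   // enclosing scope; null for the global table
    std::vector<SymbolTable*> children;    // scopes nested directly inside this one
    std::unordered_map<std::string, Symbol> symbols;

    explicit SymbolTable(SymbolTable* p = nullptr) : parent(p) {
        if (parent) parent->children.push_back(this);
    }

    void define(const std::string& name, SymKind kind) {
        assert(!name.empty());
        symbols[name] = Symbol{name, kind};
    }

    // Search this scope first, then each enclosing scope up to the root.
    const Symbol* lookup(const std::string& name) const {
        for (const SymbolTable* t = this; t != nullptr; t = t->parent) {
            auto it = t->symbols.find(name);
            if (it != t->symbols.end()) return &it->second;
        }
        return nullptr;
    }
};
```

With a chain of global, file, procedure and block tables, a variable defined in the procedure table is visible from the block table but not from the file table, which mirrors the scoping rule stated for block nodes.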

A variable symbol contains a pointer to the type for the variable. The type determines
the amount of storage used to hold the variable as well as the interpretation of its contents.
Some additional flags are used for variable symbols, for example flags to identify formal
parameters, variables that have their address taken, and variables that represent
machine registers.

Label symbols can only be declared within procedures. The position of a label in the code
is marked with a special instruction.

Procedure symbols can only be declared in the global and file scopes. A procedure symbol
contains a pointer to the AST for the body of the procedure if it exists. It also provides methods
to read the body from an input file, write it to an output file, and flush it from memory. The
procedure symbol also has a pointer to the type for the procedure.

The SUIF type system can represent most, if not all, high-level types for C programs and
for many other languages. The types are implemented with various kinds of type nodes. Each
type node contains an operator that specifies the kind of node. Some of the type operators
define base types that stand alone, while other operators refer to other type nodes. For
example, a type node with the TYPE_INT operator defines a new integer type. A node with
the TYPE_PTR operator can then refer to the integer type node to create a type for pointers
to integers.

2.3.5 Other Data

SUIF is designed to be extended with new kinds of analyses and optimizations. These future
extensions will generally require that additional information be attached to SUIF objects and
propagated between passes. SUIF provides "annotations", which allow user-defined data
structures to be attached to most SUIF objects. This is the primary mechanism for making SUIF
easily extensible.

New annotations can be declared by any program and used to record whatever information
is needed within that program. They can also be written to the SUIF output files so that other
programs can use them. An annotation manager records the annotation names and the format
of the data associated with each kind of annotation.
3 The compiler pass

The pass inserts function calls at appropriate places, that is, at every
data access. Accesses to literal values such as numbers are not included. For example,
applying the pass produces the following changes.

Before applying the pass:

int main()
{
    int a, b;
    a = 3;
    b = a + 1;
    return 0;
}

After applying the pass, the above program becomes:

int main()
{
    int a, b;
    RecObj("a", &a);
    RecObj("b", &b);
    a = 3;
    RecAccess(&a);
    b = a + 1;
    RecAccess(&b);
    RecAccess(&a);
    return 0;
}

RecObj() notifies the monitor about the different variables defined in a procedure.
The arguments of this function are the name of the variable and the address of the variable.
RecAccess() records each data access. The argument of this function is the address
of the variable accessed.
3.1 Working of the pass
We start by iterating through each file in the source program. For each file, we iterate through
each procedure in the file.
The procedure body can be accessed as a list of type tree_node_list.
By iterating through this list, we access the individual tree nodes in the procedure body.
Each node in the procedure body is one of these five types:

• tree_block : For representing nested scopes

• tree_if : For representing the if construct

• tree_loop : For representing loop constructs like while and do-while

• tree_for : For representing the for loop construct

• tree_instr : For representing SUIF instructions

We can directly access the source operand(s) and the destination operand only for nodes of type
tree_instr. For the other node types, we have to traverse their child lists recursively
until we reach a tree_instr node.
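
The recursive descent described above can be sketched on a toy tree. This is an assumption-laden illustration, not SUIF code: Node, Kind and visit_instructions are hypothetical names, and the separate child lists of the real node types (for example header, then and else for an if node) are flattened into a single children vector.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy node kinds mirroring the five node types listed above (hypothetical model).
enum class Kind { Block, If, Loop, For, Instr };

struct Node {
    Kind kind;
    std::string text;             // instruction text; only meaningful for Instr
    std::vector<Node> children;   // flattened child lists; empty for Instr
};

// Depth-first walk: structured nodes are descended into, and the text of
// each instruction node is appended to `out` in program order.
void visit_instructions(const Node& n, std::vector<std::string>& out) {
    if (n.kind == Kind::Instr) {
        assert(n.children.empty());   // instruction nodes are leaves
        out.push_back(n.text);
        return;
    }
    for (const Node& child : n.children)
        visit_instructions(child, out);
}
```

In the actual pass, the point where this sketch appends to `out` is where the operands of the instruction would be examined and the monitoring calls inserted.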
Now that we have access to the individual operands, we can insert code into the tree_node_list. We
make use of the SUIF library itself for this purpose. We create objects of type tree_node and
insert them into the tree_node_list.

The steps involved in making a tree_node for a function call are as follows:

1. We iterate through each source operand of the instruction. If an operand is itself an
instruction, we recursively examine that instruction for its operands. If the operand is an
immediate value, we do nothing. If the operand is a variable, we look it up in the procedure
symbol table and extract the name of the variable. We use this
name to identify each operand.
2. For each source operand which is a variable, we have to insert a function call (RecAccess)
to record its access. This is done by creating an object of type in_cal. in_cal is a class
defined in the SUIF library to represent function calls.
in_cal objects have as attributes the address of the called procedure, the number of
arguments and each argument. We have to set each of these attributes correctly.
3. The above two steps are repeated for the destination operand, so that an in_cal
object is also made for the destination operand.
4. Once we have created an in_cal object, we have to make a tree node for this object and
insert the node into the tree_node_list. We make use of the insert_before/insert_after
functions defined in the tree_node_list class for this purpose.
E.g.:
in_cal *f = (in_cal *)recaccess;
operand addr = f->addr_op().clone();
in_cal *new_f = new in_cal();
new_f->set_addr_op(addr);
int args = f->num_args();
new_f->set_num_args(args);
/* code to create each operand */
......
......
......
new_f->set_argument(0, p);
new_f->set_argument(1, q);
new_f->set_argument(2, r);

tree_node *new_tn = new tree_instr(new_f);

/* tl is the tree_node_list, tn is the tree_node of the instruction which we are analyzing */
tl->insert_after(new_tn, tl->lookup(tn));
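
The operand classification in steps 1 to 3 can be modelled independently of SUIF. The sketch below is a toy under stated assumptions: Operand, collect_accesses and calls_for_assignment are hypothetical names, and the calls are produced as strings rather than as in_cal objects. The destination is handled before the sources so that the order matches the example output shown in section 3.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy operand model; SUIF's own operand class is not used here.
struct Operand {
    enum Kind { Immediate, Variable, Instruction } kind;
    std::string name;               // variable name, if kind == Variable
    std::vector<Operand> sources;   // sub-operands, if kind == Instruction
};

// Appends one "RecAccess(&name)" call per variable reachable from op.
void collect_accesses(const Operand& op, std::vector<std::string>& calls) {
    switch (op.kind) {
    case Operand::Immediate:
        break;                                   // literals are not monitored
    case Operand::Variable:
        assert(!op.name.empty());
        calls.push_back("RecAccess(&" + op.name + ")");
        break;
    case Operand::Instruction:
        for (const Operand& src : op.sources)    // nested expression: recurse
            collect_accesses(src, calls);
        break;
    }
}

// Calls to place after "dst = expr": destination first, then the sources.
std::vector<std::string> calls_for_assignment(const Operand& dst,
                                              const Operand& expr) {
    std::vector<std::string> calls;
    collect_accesses(dst, calls);
    collect_accesses(expr, calls);
    return calls;
}
```

For the statement b = a + 1, modelled as a destination variable b and an instruction operand with sources a and the immediate 1, the sketch yields RecAccess(&b) followed by RecAccess(&a); the immediate contributes nothing.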

We also have to insert function calls for each variable defined in the procedure. These variables
are available from the symbol table of the procedure.
The steps involved are as follows:

1. Iterate through each symbol in the symbol table and extract the name of each symbol.
2. If the symbol is a label, do nothing. If it is a variable symbol, create an in_cal object
(RecObj) with its arguments properly set.
3. Make a tree_node for this object and insert the node into the tree_node_list at the
correct position.

E.g.:
in_cal *f = (in_cal *)recobj;
operand addr = f->addr_op().clone();
in_cal *new_f = new in_cal();
new_f->set_addr_op(addr);
int args = f->num_args();
new_f->set_num_args(args);
/* code to create each operand */
......
......
......
/* RecObj takes two arguments: the variable name and its address */
new_f->set_argument(0, src1);
new_f->set_argument(1, src2);

tree_node *new_tn = new tree_instr(new_f);

tp->body()->push(new_tn);

After we have inserted function calls at the required places, we have to write the changes
back to an output file. This output file is itself in SUIF format.
The output file is converted to normal C code using the s2c utility that comes with
the SUIF 1.x compiler system.

Once we have the normal C code, we can compile it using any standard C compiler such as gcc and
analyse the code.

4 The run time monitor

A transformed program invokes the run time monitor through a high-level interface consisting
mainly of three types of function calls:

• RecObj for a data allocation

• RecAccess for a data access

• RecLink for the extraction of an internal address.

In addition to the above three functions, we have written a function Analyzer() to display the
number of accesses to each variable and the percentage of accesses for each data item.

4.1 RecObj
For each data unit in a RecObj call, the monitor creates a record which contains its memory
address. For fast retrieval of this shadow data, we store the records in a hash table indexed by the starting
address of the data units. We used linear hashing with chaining.
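
A chained hash table keyed by the starting address can be sketched as follows. This is a minimal illustration and not the monitor's code: ShadowRecord, find_record, bucket_of and kBuckets are hypothetical names, the pointer return value of RecObj is added here for convenience, and a fixed number of buckets replaces linear hashing, which would also grow the table over time.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical shadow record; field names are illustrative only.
struct ShadowRecord {
    const void* start;        // starting address of the data unit
    std::string name;         // variable name passed to RecObj
    unsigned long accesses;   // number of recorded accesses
    ShadowRecord* next;       // next record in the same bucket (chaining)
};

const std::size_t kBuckets = 64;            // assumed fixed table size
static ShadowRecord* table[kBuckets] = {};  // all chains start empty

static std::size_t bucket_of(const void* addr) {
    // Drop the low bits, which are often zero because of alignment.
    return (reinterpret_cast<std::uintptr_t>(addr) >> 2) % kBuckets;
}

// Walks the chain of the address's bucket; returns null if not registered.
ShadowRecord* find_record(const void* addr) {
    for (ShadowRecord* r = table[bucket_of(addr)]; r != nullptr; r = r->next)
        if (r->start == addr) return r;
    return nullptr;
}

// Registers a data unit; a repeated call returns the existing record.
// Records are never freed: they live as long as the monitored program.
ShadowRecord* RecObj(const char* name, const void* addr) {
    assert(name != nullptr);
    if (ShadowRecord* r = find_record(addr)) return r;
    std::size_t b = bucket_of(addr);
    table[b] = new ShadowRecord{addr, name, 0, table[b]};
    return table[b];
}
```

New records are pushed at the head of their chain, so registration is constant time and a lookup only scans the records that share a bucket.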

4.2 RecAccess
Access recording in RecAccess involves a hash-table search to find the shadow data, followed by recording
the access. The first parameter of RecAccess is used in the hash-table search. It is either the
starting address of a data unit or an internal address. The hash entry is initialized by RecObj in
the first case and by RecLink in the second case. Note that RecLink happens before a program
takes an internal address from a data unit.

4.3 RecLink
At RecLink, the monitor inserts the extracted address into the hash table and links it to the
shadow record of its data unit. In the worst case, a program stores the address of every data
element, and the hash table has one entry for each data element. However, our experience
shows that a program usually takes at most a constant number of internal addresses from any
data unit.
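
The interplay of RecObj, RecLink and RecAccess can be sketched as below. Several things here are assumptions made for illustration: the two-argument signature of RecLink, the boolean return values, the helper access_count, and the use of std::unordered_map in place of the hand-written hash table. The point shown is that a starting address and any linked internal address resolve to the same shadow record.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>

struct Shadow {
    std::string name;
    unsigned long accesses = 0;
};

// std::unordered_map stands in for the report's own hash table.
static std::unordered_map<const void*, std::shared_ptr<Shadow>> shadow_of;

void RecObj(const char* name, const void* start) {
    assert(name != nullptr);
    auto rec = std::make_shared<Shadow>();
    rec->name = name;
    shadow_of[start] = rec;
}

// Makes an internal address (for example &arr[3]) resolve to the record of
// its data unit. Returns false if the unit was never registered.
bool RecLink(const void* unit_start, const void* internal) {
    auto it = shadow_of.find(unit_start);
    if (it == shadow_of.end()) return false;
    std::shared_ptr<Shadow> rec = it->second;  // copy first: inserting may rehash
    shadow_of[internal] = rec;
    return true;
}

// Records one access; returns false for an address the monitor does not know.
bool RecAccess(const void* addr) {
    auto it = shadow_of.find(addr);
    if (it == shadow_of.end()) return false;
    ++it->second->accesses;
    return true;
}

// Hypothetical helper: number of accesses recorded for the unit behind addr.
unsigned long access_count(const void* addr) {
    auto it = shadow_of.find(addr);
    return it == shadow_of.end() ? 0 : it->second->accesses;
}
```

In this sketch an access through an internal address that was not linked beforehand is simply not recorded, which is why the link has to be established before the program takes the internal address.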

4.4 Analyzer
A call to this function is inserted at the end of the main() function. This function prints out
the number of accesses and the percentage of accesses for each data item.
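
The percentage reported for a data item is its access count divided by the total number of recorded accesses. The sketch below illustrates that arithmetic under assumptions: the Analyzer() of this report takes no arguments and reads the monitor's table, whereas this version receives the counters as a parameter, and Counter and access_percentage are hypothetical names.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical per-variable counter, as the monitor might expose it.
struct Counter {
    std::string name;
    unsigned long accesses;
};

// Share of all recorded accesses that went to one data item, in percent.
double access_percentage(unsigned long accesses, unsigned long total) {
    assert(accesses <= total);
    if (total == 0) return 0.0;   // nothing recorded: avoid dividing by zero
    return 100.0 * static_cast<double>(accesses) / static_cast<double>(total);
}

// Prints one line per data item: name, access count, percentage.
void Analyzer(const std::vector<Counter>& counters) {
    unsigned long total = 0;
    for (const Counter& c : counters) total += c.accesses;
    for (const Counter& c : counters)
        std::printf("%-10s %8lu %6.2f%%\n", c.name.c_str(), c.accesses,
                    access_percentage(c.accesses, total));
}
```

For the example program of section 3, a is recorded twice and b once, so a accounts for two thirds of the accesses and b for one third.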
5 Conclusion

The new pass inserts the appropriate function calls into the input program, and these calls record the
data accesses using the run time monitor. This project serves as a basis for further
application-specific work. For example, the recording of data access can be used for applications like data
offloading. A further improvement possible in this project would be selective monitoring, where
we further optimize where to insert function calls. In our base scheme, we monitor
every data access. However, if a variable is accessed frequently in a short code sequence, we can
record the first access and omit the rest.
The current base scheme implements monitoring of data access for primitive data types like
int, char and float, and for arrays. The implementation can be easily extended to monitor data
access in the case of pointers and structures.
References

[1] Alfred V. Aho, Ravi Sethi, Jeffrey D. Ullman, Compilers: Principles, Techniques, and Tools,
Addison-Wesley Longman Publishing Co., Inc., Boston, MA, 1986.
[2] Chen Ding, Yutao Zhong, Compiler Directed Monitoring of Program Data Access,
Proceedings of the 2002 Workshop on Memory System Performance (MSP '02),
ACM SIGPLAN Notices, Volume 38, Issue 2 supplement, pages 1-12, June 2002.
[3] The SUIF 1.x Compiler System - http://www.suif.stanford.edu
