Mooly Sagiv
Outline
• Subjects Studied
• Questions & Answers
Lexical Analysis (Scanning)
• input
– program text (file)
• output
– sequence of tokens
• Read input file
• Identify language keywords and standard identifiers
• Handle include files and macros
• Count line numbers
• Remove whitespace
• Report illegal symbols
• [Produce symbol table]
The Lexical Analysis Problem
• Given
– A set of token descriptions
– An input string
• Partition the string into tokens
(class, value)
• Ambiguity resolution
– Prefer the longest matching token (maximal munch)
– Between two equal-length matches, select the first rule
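These two rules (maximal munch, then rule order) can be sketched as a small scanner loop. The token classes below are hypothetical, and the sketch uses Python regular expressions rather than a generated automaton:

```python
import re

# Token descriptions: (class, regex), in priority order.
# Keywords come before identifiers so that an equal-length
# match resolves to the earlier rule.
TOKEN_SPECS = [
    ("IF",     r"if"),
    ("ID",     r"[A-Za-z][A-Za-z0-9]*"),
    ("NUM",    r"[0-9]+"),
    ("PLUS",   r"\+"),
    ("ASSIGN", r"="),
    ("WS",     r"[ \t\n]+"),
]

def scan(text):
    """Partition `text` into (class, value) pairs using
    longest match, then first rule, to resolve ambiguity."""
    tokens, pos = [], 0
    while pos < len(text):
        best = None  # (length, class, value)
        for cls, pat in TOKEN_SPECS:
            m = re.match(pat, text[pos:])
            # strict > keeps the first rule on equal-length ties
            if m and (best is None or len(m.group()) > best[0]):
                best = (len(m.group()), cls, m.group())
        if best is None:
            raise SyntaxError(f"illegal symbol at position {pos}")
        length, cls, value = best
        if cls != "WS":              # remove whitespace
            tokens.append((cls, value))
        pos += length
    return tokens
```

Note how `iffy` scans as one `ID` (longest match beats the `if` keyword), while a bare `if` resolves to `IF` by rule order.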
Jlex
• Input
– regular expressions and actions (Java code)
• Output
– A scanner program that reads the input and
applies the corresponding action when a
regular expression matches
[Diagram: regular expressions → JLex → generated scanner]
[Diagram: table-driven parser: a control program with a stack, driven by a parser table]
Efficient Parsers
• Pushdown automata
• Deterministic
• Report an error as soon as the input is not a
prefix of a valid program
• Not usable for all context free grammars
[Diagram: a context-free grammar is fed to CUP, which emits either a parser or "ambiguity errors"; the generated parser reads the input token stream t1 t2 …]
Bottom-Up Parsing
[Diagram: a parse tree built bottom-up over the input token stream t1 t2 … t8]
Example Grammar for Predictive LL Top-
Down Parsing
• Grammar
1 E → E + T
2 E → T
3 T → T * F
4 T → F
5 F → id
6 F → ( E )
• C-code?
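The grammar as given is left-recursive (rules 1 and 3), so it cannot be used directly for predictive top-down parsing. After the standard left-recursion removal (E → T E', E' → + T E' | ε, and likewise for T), each nonterminal becomes one procedure. A sketch of such a recursive-descent parser, in Python rather than C:

```python
def parse(tokens):
    """Predictive (LL(1)) parser for the grammar after
    left-recursion removal:
        E  -> T E'        E' -> + T E' | eps
        T  -> F T'        T' -> * F T' | eps
        F  -> id | ( E )
    `tokens` is a list like ["id", "+", "id", "*", "id", "$"].
    Returns True on success, raises SyntaxError otherwise."""
    pos = 0

    def peek():
        return tokens[pos]

    def eat(t):
        nonlocal pos
        if peek() != t:
            raise SyntaxError(f"expected {t}, found {peek()}")
        pos += 1

    def E():
        T(); Eprime()

    def Eprime():
        if peek() == "+":       # E' -> + T E'
            eat("+"); T(); Eprime()
        # else: E' -> eps  (on ')' or '$')

    def T():
        F(); Tprime()

    def Tprime():
        if peek() == "*":       # T' -> * F T'
            eat("*"); F(); Tprime()

    def F():
        if peek() == "id":      # F -> id
            eat("id")
        else:                   # F -> ( E )
            eat("("); E(); eat(")")

    E()
    eat("$")                    # entire input must be consumed
    return True
```

Each procedure decides which production to use by looking at the next token only, which is why the error is reported as soon as the input stops being a prefix of a valid program.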
Bottom-Up Syntax Analysis
• Input
– A context free grammar
– A stream of tokens
• Output
– A syntax tree or error
• Method
– Construct parse tree in a bottom-up manner
– Find the rightmost derivation (in reverse order)
– For every potential right hand side and token decide when a
production is found
– Report an error as soon as the input is not a prefix of a
valid program
Constructing the LR(0) parsing table
• Add a production S' → S$
• Construct a finite automaton accepting the valid
stack contents (viable prefixes)
• States are sets of items A → α•β
– The states of the automaton become the states of the
parsing table
– Determine shift operations
– Determine goto operations
– Determine reduce operations
– Report an error when conflicts arise
[Diagram: LR(0) automaton for the augmented grammar S' → E$, E → E + T | T, T → i | (E). Each state is a set of items (numbered 1–15); edges are labeled with the symbols i, (, ), +, E, T, and $]
Parsing “(i)$”
[Diagram: the LR(0) automaton from the previous slide, used to trace the parse of the input "(i)$"]
Summary (Bottom-Up)
• LR is a powerful technique
• Generates efficient parsers
• Generator tools exist for LALR(1)
– Bison, yacc, CUP
• But some grammars need to be tuned
– Shift/Reduce conflicts
– Reduce/Reduce conflicts
– Efficiency of the generated parser
Summary (Parsing)
• Context free grammars provide a natural way to
define the syntax of programming languages
• Ambiguity may be resolved
• Predictive parsing is natural
– Good error messages
– Natural error recovery
– But not expressive enough
• But LR bottom-up parsing is more expressive
Abstract Syntax
• Intermediate program representation
• Defines a tree that preserves the program
hierarchy
• Generated by the parser
• Declared using an (ambiguous) context free
grammar (relatively flat)
– Not meant for parsing
• Keywords and punctuation symbols are not
stored (Not relevant once the tree exists)
• Big programs can also be handled (possibly
via virtual memory)
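As an illustration, a minimal AST sketch (the node names here are hypothetical): the tree for a := (b + 1) * 2 records the hierarchy of operations, but not the parentheses or punctuation used to write it.

```python
from dataclasses import dataclass

# Hypothetical AST following a flat, ambiguous abstract grammar:
#   Exp  -> Exp op Exp | id | num
#   Stmt -> id := Exp
# Keywords and punctuation are not stored: "a := (b + 1) * 2"
# and an equivalent unparenthesized parse yield the same tree.

@dataclass
class Num:
    value: int

@dataclass
class Id:
    name: str

@dataclass
class BinOp:
    op: str
    left: object
    right: object

@dataclass
class Assign:
    target: str
    exp: object

# a := (b + 1) * 2 -- the * sits above the +, so grouping is explicit
tree = Assign("a", BinOp("*", BinOp("+", Id("b"), Num(1)), Num(2)))
```

Because grouping is encoded in the tree shape, later phases never need to re-derive precedence or re-read parentheses.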
Semantic Analysis
• Requirements related to the “context” in
which a construct occurs
• Examples
– Name resolution
– Scoping
– Type checking
– Escape analysis
• Implemented via AST traversals
• Guides subsequent compiler phases
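A sketch of one such traversal, combining name resolution, scoping, and type checking over a toy expression AST (the tuple node shapes and the single "int" type are illustrative assumptions):

```python
# One AST traversal doing name resolution and type checking.
# Hypothetical node shapes:
#   ("num", 3)   ("id", "x")   ("+", e1, e2)   ("let", x, e1, e2)

def check(node, env):
    """Return the type of `node` ("int" here), resolving names
    against `env`; raise on undeclared names or type errors."""
    kind = node[0]
    if kind == "num":
        return "int"
    if kind == "id":
        if node[1] not in env:                 # name resolution
            raise NameError(f"undeclared: {node[1]}")
        return env[node[1]]
    if kind == "+":
        t1, t2 = check(node[1], env), check(node[2], env)
        if t1 != "int" or t2 != "int":         # type checking
            raise TypeError("+ expects int operands")
        return "int"
    if kind == "let":                          # let x = e1 in e2
        _, name, e1, e2 = node
        inner = dict(env)                      # new scope for e2
        inner[name] = check(e1, env)
        return check(e2, inner)
    raise ValueError(f"unknown node {kind}")
```

The environment passed down the recursion is what implements scoping: the binding introduced by `let` is visible only inside its body.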
Abstract Interpretation
Static analysis
• Automatically identify program properties
– No user-provided loop invariants
• Sound but incomplete methods
– But can be rather precise
• Non-standard interpretation of the program's
operational semantics
• Applications
– Compiler optimization
– Code quality tools
• Identify potential bugs
• Prove the absence of runtime errors
• Partial correctness
Constant Propagation
[Diagram: flow graph annotated with abstract states. Before z = 3 the state is [x↦?, y↦?, z↦?]; afterwards [x↦?, y↦?, z↦3]. Inside while (x > 0), the condition if (x = 1) splits into a then-branch executing y = 7 (state [x↦1, y↦?, z↦3]) and an else-branch executing y = z + 4 (state [x↦?, y↦?, z↦3]); since z↦3 on both paths, y↦7 holds after the join, proving assert y == 7]
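The branch in this example is handled by joining the abstract states of the two paths. A minimal sketch of the join and transfer functions (variable names follow the example; the state representation is an assumption):

```python
# Constant-propagation sketch: an abstract state maps each variable
# to a known constant or to "?" (unknown). Transferring the two
# branch assignments and joining reproduces the example: after
# z = 3, both y = 7 and y = z + 4 leave y -> 7.

UNKNOWN = "?"

def join(s1, s2):
    """Merge states at a control-flow join point:
    keep a value only if both paths agree on it."""
    return {v: (s1[v] if s1[v] == s2[v] else UNKNOWN) for v in s1}

def assign_const(state, var, c):
    """Transfer function for var = c."""
    new = dict(state); new[var] = c
    return new

def assign_add(state, var, src, c):
    """Transfer function for var = src + c: fold if src is known."""
    new = dict(state)
    new[var] = state[src] + c if state[src] != UNKNOWN else UNKNOWN
    return new

# The if (x = 1) branches, starting from [x?, y?, z->3]:
before = {"x": UNKNOWN, "y": UNKNOWN, "z": 3}
then_branch = assign_const(before, "y", 7)      # y = 7
else_branch = assign_add(before, "y", "z", 4)   # y = z + 4
after = join(then_branch, else_branch)          # y -> 7 on both paths
```

Since both branch states map y to 7, the join retains y↦7, which is exactly what lets the analysis discharge the assert.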
Live Variables Example
/* {c} */
L0: a := 0
/* {a, c} */
L1: b := a + 1
/* {b, c} */
c := c + b
/* {b, c} */
a := b * 2
/* {a, c} */
if c < N goto L1
/* {c} */
return c
The live-after sets are computed iteratively, starting from the bottom:

a := 0 ;
b := a + 1 ;
c := c + b ;
a := b * 2 ;
if c < N goto L1
{c}
return c ;

One set is filled in per step, working backwards through the statements. A further pass over the loop body propagates liveness across the back edge to L1, after which the sets stabilize:

a := 0 ;
{c, a}
b := a + 1 ;
{c, b}
c := c + b ;
{c, b}
a := b * 2 ;
{c, a}
if c < N goto L1
{c, a}
return c ;
Summary: The Iterative Procedure
• Analyze one procedure at a time
– More precise solutions exist
• Construct a control-flow graph for the
procedure
• Initialize the values at every node to the
most optimistic value
• Iterate until convergence
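Instantiated for liveness (a backward analysis) on the earlier example, the procedure looks as follows. The node encoding is an illustrative choice, and N is treated as a constant, matching the sets on the slides:

```python
# Iterative liveness for the running example:
#   0: a := 0        1: b := a + 1     2: c := c + b
#   3: a := b * 2    4: if c < N goto 1
#   5: return c
# Each node: (defined vars, used vars, successor indices).
NODES = [
    ({"a"}, set(),      [1]),     # a := 0
    ({"b"}, {"a"},      [2]),     # b := a + 1
    ({"c"}, {"c", "b"}, [3]),     # c := c + b
    ({"a"}, {"b"},      [4]),     # a := b * 2
    (set(), {"c"},      [5, 1]),  # if c < N goto 1  (N constant)
    (set(), {"c"},      []),      # return c
]

def liveness(nodes):
    # Most optimistic value: nothing is live anywhere.
    live_in = [set() for _ in nodes]
    live_out = [set() for _ in nodes]
    changed = True
    while changed:                 # iterate until convergence
        changed = False
        for i, (defs, uses, succs) in enumerate(nodes):
            out = set().union(*[live_in[s] for s in succs])
            inn = uses | (out - defs)
            if out != live_out[i] or inn != live_in[i]:
                live_out[i], live_in[i] = out, inn
                changed = True
    return live_in, live_out

live_in, live_out = liveness(NODES)
```

The fixed point matches the annotated program above: for instance, both a and c are live across the loop-closing branch because of the back edge to L1.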
Basic Compiler Phases
Overall Structure
Techniques Studied
• Simple code generation
• Basic blocks
• Global register allocation
• Activation records
• Object-oriented implementation
• Assembler/Linker/Loader
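Of these, global register allocation is commonly phrased as coloring an interference graph built from liveness information. A simplified greedy sketch (real allocators use Chaitin-style simplify/select and handle spilling):

```python
# Register allocation sketch: color an interference graph greedily.
# Vertices are temporaries; an edge means the two temporaries are
# live at the same time and so need different registers.

def color(graph, k):
    """graph: {temp: set of interfering temps}; k registers.
    Returns {temp: register index}. Raises StopIteration if the
    greedy order needs more than k colors (spilling not handled)."""
    assignment = {}
    # Simple greedy order: low-degree temporaries first. Chaitin's
    # algorithm instead simplifies low-degree nodes onto a stack
    # and colors them in reverse order.
    for temp in sorted(graph, key=lambda t: len(graph[t])):
        taken = {assignment[n] for n in graph[temp] if n in assignment}
        reg = next(r for r in range(k) if r not in taken)
        assignment[temp] = reg
    return assignment

# a-b interfere and b-c interfere, but a-c do not:
# a and c can share a register, so two registers suffice.
interference = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
regs = color(interference, 2)
```

The payoff of the graph view is exactly the sharing in the example: non-interfering temporaries collapse onto one physical register.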
Heap Memory Management
• Part of the runtime system
• Utilities for dynamic memory allocation
• Utilities for automatic memory reclamation
– Garbage Collection
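A minimal sketch of the allocation side: a first-fit free list with coalescing on free, simulated here over block offsets (headers, alignment, and real memory are omitted):

```python
# First-fit free-list allocator sketch over a flat "heap" of `size`
# words. The free list holds (offset, size) blocks sorted by offset;
# freeing coalesces adjacent blocks back together.

class Heap:
    def __init__(self, size):
        self.free = [(0, size)]            # one big free block

    def malloc(self, n):
        for i, (off, size) in enumerate(self.free):
            if size >= n:                  # first fit
                if size == n:
                    del self.free[i]
                else:                      # split the block
                    self.free[i] = (off + n, size - n)
                return off
        raise MemoryError("out of memory")

    def free_block(self, off, n):
        self.free.append((off, n))
        self.free.sort()
        merged = [self.free[0]]            # coalesce neighbors
        for o, s in self.free[1:]:
            po, ps = merged[-1]
            if po + ps == o:
                merged[-1] = (po, ps + s)
            else:
                merged.append((o, s))
        self.free = merged
```

Coalescing is what keeps repeated malloc/free cycles from fragmenting the heap into unusably small blocks.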
Garbage Collection
• Techniques
– Mark and sweep
– Copying collection
– Reference counting
• Modes
– Generational
– Incremental vs. stop-the-world