
Recap

Mooly Sagiv
Outline
• Subjects Studied
• Questions & Answers
Lexical Analysis (Scanning)
• input
– program text (file)
• output
– sequence of tokens
• Read input file
• Identify language keywords and standard identifiers
• Handle include files and macros
• Count line numbers
• Remove whitespace
• Report illegal symbols
• [Produce symbol table]
The Lexical Analysis Problem
• Given
– A set of token descriptions
– An input string
• Partition the string into tokens
(class, value)
• Ambiguity resolution
– The longest matching token
– Between two equal-length tokens, select the one described first
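The two ambiguity rules above can be sketched in C. This is a minimal illustration, not how Jlex or any generated scanner is implemented: the three token descriptions (`match_if`, `match_ident`, `match_number`) and the function `next_token` are hypothetical names, and each description is a hand-written matcher rather than a regular expression.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* A token description returns the length of the longest prefix of s that it
 * matches, or 0 if it does not match. The three below are illustrative. */
typedef size_t (*Matcher)(const char *s);

static size_t match_if(const char *s) {        /* keyword "if" */
    return strncmp(s, "if", 2) == 0 ? 2 : 0;
}

static size_t match_ident(const char *s) {     /* [a-z][a-z0-9]* */
    size_t n = 0;
    if (!islower((unsigned char)s[0])) return 0;
    while (islower((unsigned char)s[n]) || isdigit((unsigned char)s[n])) n++;
    return n;
}

static size_t match_number(const char *s) {    /* [0-9]+ */
    size_t n = 0;
    while (isdigit((unsigned char)s[n])) n++;
    return n;
}

/* Order matters: an earlier description wins a tie in length. */
enum { TOK_IF, TOK_IDENT, TOK_NUMBER, TOK_NONE = -1 };
static const Matcher matchers[] = { match_if, match_ident, match_number };

/* Returns the class of the token at the start of s and stores its length. */
int next_token(const char *s, size_t *len) {
    int best = TOK_NONE;
    *len = 0;
    for (size_t i = 0; i < sizeof matchers / sizeof matchers[0]; i++) {
        size_t n = matchers[i](s);
        if (n > *len) {          /* strictly longer only, so the first wins ties */
            *len = n;
            best = (int)i;
        }
    }
    return best;
}
```

With these descriptions, "if" is classified as the keyword (a tie in length, and the keyword is listed first), while "iffy" is an identifier because the identifier match is longer.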
Jlex
• Input
– regular expressions and actions (Java code)
• Output
– A scanner program that reads the input and
applies the associated action whenever a regular
expression is matched
[Figure: Jlex takes regular expressions and produces a scanner; the scanner turns the input program into tokens.]


Summary

• For most programming languages lexical
analyzers can be easily constructed
automatically
• Exceptions:
– Fortran
– PL/1
• Lex/Flex/Jlex are useful beyond compilers
Syntax Analysis (Parsing)
• input
– Sequence of tokens
• output
– Abstract Syntax Tree
• Report syntax errors
– e.g. unbalanced parentheses
• [Create “symbol-table” ]
• [Create pretty-printed version of the program]
• In some cases the tree need not be generated
(one-pass compilers)
Pushdown Automaton
[Figure: a pushdown automaton. The control reads the input (u t w $), consults the parser table, and maintains a stack.]
Efficient Parsers
• Pushdown automata
• Deterministic
• Report an error as soon as the input is not a
prefix of a valid program
• Not usable for all context free grammars
[Figure: CUP takes a context free grammar and produces a parser, or reports “ambiguity errors”; the parser turns tokens into a parse tree.]


Kinds of Parsers
• Top-Down (Predictive Parsing) LL
– Construct parse tree in a top-down manner
– Find the leftmost derivation
– For every non-terminal and token predict the next production
– Preorder tree traversal
• Bottom-Up LR
– Construct parse tree in a bottom-up manner
– Find the rightmost derivation in reverse order
– For every potential right hand side and token decide when a production
is found
– Postorder tree traversal
Top-Down Parsing
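[Figure: parse tree nodes numbered in order of construction, starting at the root, above the input tokens t1 t2 …]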
Bottom-Up Parsing
[Figure: parse tree nodes numbered in order of construction, starting at the leaves, above the input tokens t1 t2 … t8]
Example Grammar for Predictive LL Top-Down Parsing

expression → digit | ‘(‘ expression operator expression ‘)’
operator → ‘+’ | ‘*’
digit → ‘0’ | ‘1’ | ‘2’ | ‘3’ | ‘4’ | ‘5’ | ‘6’ | ‘7’ | ‘8’ | ‘9’
static int Parse_Expression(Expression **expr_p) {
    Expression *expr = *expr_p = new_expression();
    /* try to parse a digit */
    if (Token.class == DIGIT) {
        expr->type = 'D'; expr->value = Token.repr - '0';
        get_next_token();
        return 1;
    }
    /* try to parse a parenthesized expression */
    if (Token.class == '(') {
        expr->type = 'P'; get_next_token();
        if (!Parse_Expression(&expr->left)) Error("missing expression");
        if (!Parse_Operator(&expr->oper)) Error("missing operator");
        /* the grammar requires a second operand; the field name `right`
           is assumed by analogy with `left` */
        if (!Parse_Expression(&expr->right)) Error("missing expression");
        if (Token.class != ')') Error("missing )");
        get_next_token();
        return 1;
    }
    /* neither alternative applies */
    return 0;
}
Parsing Expressions
• Try every alternative production
– For P → A1 A2 … An | B1 B2 … Bm
– If A1 succeeds
• Call A2
• If A2 succeeds
– Call A3
• If A2 fails report an error
– Otherwise try B1
• Recursive descent parsing
• Can be applied to certain grammars
• Generalization: LL(1) parsing
int P(...) {
    /* try to parse the alternative P → A1 A2 ... An */
    if (A1(...)) {
        if (!A2()) Error("Missing A2");
        if (!A3()) Error("Missing A3");
        ..
        if (!An()) Error("Missing An");
        return 1;
    }
    /* try to parse the alternative P → B1 B2 ... Bm */
    if (B1(...)) {
        if (!B2()) Error("Missing B2");
        if (!B3()) Error("Missing B3");
        ..
        if (!Bm()) Error("Missing Bm");
        return 1;
    }
    /* no alternative applies */
    return 0;
}
Predictive Parser for Arithmetic Expressions

• Grammar
1: E → E + T
2: E → T
3: T → T * F
4: T → F
5: F → id
6: F → ( E )

• C-code?
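One possible answer, as a hedged sketch. The grammar as written is left-recursive (E → E + T, T → T * F), so a recursive descent routine for E would call itself without consuming input. The sketch below therefore assumes the standard rewriting E → T { + T } and T → F { * F }, which accepts the same language. It is only a recognizer (no tree is built), it reads characters instead of tokens, and the letter 'i' stands in for the token id; the names `parse`, `parse_E`, `parse_T` and `parse_F` are illustrative.

```c
/* Cursor into the input; each character is treated as one token. */
static const char *cur;

static int parse_E(void);

/* F -> id | ( E ) */
static int parse_F(void) {
    if (*cur == 'i') { cur++; return 1; }
    if (*cur == '(') {
        cur++;
        if (!parse_E()) return 0;
        if (*cur != ')') return 0;
        cur++;
        return 1;
    }
    return 0;
}

/* T -> F { * F }   (left recursion replaced by iteration) */
static int parse_T(void) {
    if (!parse_F()) return 0;
    while (*cur == '*') {
        cur++;
        if (!parse_F()) return 0;
    }
    return 1;
}

/* E -> T { + T }   (left recursion replaced by iteration) */
static int parse_E(void) {
    if (!parse_T()) return 0;
    while (*cur == '+') {
        cur++;
        if (!parse_T()) return 0;
    }
    return 1;
}

/* Returns 1 if the whole input is an expression, 0 otherwise. */
int parse(const char *input) {
    cur = input;
    return parse_E() && *cur == '\0';
}
```

For example, `parse("(i+i)*i")` succeeds, while `parse("i+")` fails because no T follows the `+`.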
Bottom-Up Syntax Analysis
• Input
– A context free grammar
– A stream of tokens
• Output
– A syntax tree or error
• Method
– Construct parse tree in a bottom-up manner
– Find the rightmost derivation (in reverse order)
– For every potential right hand side and token decide when a
production is found
– Report an error as soon as the input is not a prefix of valid
program
Constructing LR(0) parsing table
• Add a production S’ → S$
• Construct a finite automaton accepting “valid
stack symbols”
• States are sets of items A → α • β
– The states of the automaton become the states of the
parsing table
– Determine shift operations
– Determine goto operations
– Determine reduce operations
– Report an error when conflicts arise
[Figure: LR(0) automaton for the grammar S → E $, E → T | E + T, T → i | ( E ). Each state is a set of items such as S → • E $, E → E • + T and T → ( E ) •; the edges are labeled T, E, i, +, (, ) and $.]
Parsing “(i)$”

[Figure: the same LR(0) automaton, used to trace the parse of “(i)$”.]
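The trace can be reproduced with a small table-driven shift-reduce recognizer. This is a sketch under assumptions: the tables were built by hand for the grammar S → E $, E → T | E + T, T → i | ( E ), the state numbers 0 to 8 are this sketch's own and do not correspond to the item numbers in the figure, and the name `lr0_parse` is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Rules: 1: E -> T   2: E -> E + T   3: T -> i   4: T -> ( E )
 * (rule 0, S -> E $, is handled by the ACCEPT entry). */
enum { ACCEPT = -1, ERR = -2 };

/* Shift table, columns in the order  i + ( ) $ */
static const int shift_to[9][5] = {
    /* 0 */ { 3,   ERR, 4,   ERR, ERR    },
    /* 1 */ { ERR, 5,   ERR, ERR, ACCEPT },
    /* 2 */ { ERR, ERR, ERR, ERR, ERR    },   /* reduce state */
    /* 3 */ { ERR, ERR, ERR, ERR, ERR    },   /* reduce state */
    /* 4 */ { 3,   ERR, 4,   ERR, ERR    },
    /* 5 */ { 3,   ERR, 4,   ERR, ERR    },
    /* 6 */ { ERR, 5,   ERR, 8,   ERR    },
    /* 7 */ { ERR, ERR, ERR, ERR, ERR    },   /* reduce state */
    /* 8 */ { ERR, ERR, ERR, ERR, ERR    },   /* reduce state */
};
/* Rule to reduce by in each state (0 = shift state). */
static const int reduce_rule[9] = { 0, 0, 1, 3, 0, 0, 0, 2, 4 };
static const int rule_len[5] = { 0, 1, 3, 1, 3 };   /* right-hand side length */
static const int rule_lhs[5] = { 0, 0, 0, 1, 1 };   /* 0 = E, 1 = T */
/* Goto table, columns in the order  E T */
static const int goto_to[9][2] = {
    { 1, 2 }, { ERR, ERR }, { ERR, ERR }, { ERR, ERR }, { 6, 2 },
    { ERR, 7 }, { ERR, ERR }, { ERR, ERR }, { ERR, ERR },
};

/* Column of a terminal in shift_to, or -1 for an illegal character. */
static int term_index(char c) {
    const char *terms = "i+()$";
    const char *p = c ? strchr(terms, c) : NULL;
    return p ? (int)(p - terms) : -1;
}

/* Returns 1 if input (which must end in '$') is accepted, 0 otherwise. */
int lr0_parse(const char *input) {
    int stack[128];
    int top = 0;
    size_t pos = 0;
    stack[0] = 0;
    for (;;) {
        int state = stack[top];
        int rule = reduce_rule[state];
        if (rule != 0) {                       /* reduce: pop the right-hand side */
            top -= rule_len[rule];
            int next = goto_to[stack[top]][rule_lhs[rule]];
            if (next < 0) return 0;
            stack[++top] = next;
            continue;
        }
        int t = term_index(input[pos]);
        if (t < 0) return 0;
        int next = shift_to[state][t];
        if (next == ACCEPT) return 1;
        if (next == ERR || top + 1 >= 128) return 0;
        stack[++top] = next;                   /* shift */
        pos++;
    }
}
```

On “(i)$” the stack of states evolves as 0, 0 4, 0 4 3, 0 4 2 (reduce T → i), 0 4 6 (reduce E → T), 0 4 6 8, 0 2 (reduce T → ( E )), 0 1 (reduce E → T), and the `$` in state 1 accepts.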
Summary (Bottom-Up)
• LR is a powerful technique
• Generates efficient parsers
• Generation tools exist (LALR(1))
– Bison, yacc, CUP
• But some grammars need to be tuned
– Shift/Reduce conflicts
– Reduce/Reduce conflicts
– Efficiency of the generated parser
Summary (Parsing)
• Context free grammars provide a natural way to
define the syntax of programming languages
• Ambiguity may be resolved
• Predictive parsing is natural
– Good error messages
– Natural error recovery
– But not expressive enough
• But LR bottom-up parsing is more expressive
Abstract Syntax
• Intermediate program representation
• Defines a tree - Preserves program
hierarchy
• Generated by the parser
• Declared using an (ambiguous) context free
grammar (relatively flat)
– Not meant for parsing
• Keywords and punctuation symbols are not
stored (Not relevant once the tree exists)
• Big programs can also be handled (possibly
via virtual memory)
Semantic Analysis
• Requirements related to the “context” in
which a construct occurs
• Examples
– Name resolution
– Scoping
– Type checking
– Escape
• Implemented via AST traversals
• Guides subsequent compiler phases
Abstract Interpretation
Static analysis
• Automatically identify program properties
– No user provided loop invariants
• Sound but incomplete methods
– But can be rather precise
• Non-standard interpretation of the program
operational semantics
• Applications
– Compiler optimization
– Code quality tools
• Identify potential bugs
• Prove the absence of runtime errors
• Partial correctness
Constant Propagation
[x ↦ ?, y ↦ ?, z ↦ ?]
z = 3
[x ↦ ?, y ↦ ?, z ↦ 3]
while (x > 0)
    [x ↦ ?, y ↦ ?, z ↦ 3]
    if (x == 1)
        [x ↦ 1, y ↦ ?, z ↦ 3]
        y = 7
        [x ↦ 1, y ↦ 7, z ↦ 3]
    else
        [x ↦ ?, y ↦ ?, z ↦ 3]
        y = z + 4
        [x ↦ ?, y ↦ 7, z ↦ 3]
    assert y == 7
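The key step in this example is the merge after the two branches: y is 7 on one path and z + 4 = 7 on the other, so the assertion holds. A minimal C sketch of that step follows. It assumes a simplified abstract value that is either a known constant or unknown (“?”) and leaves out the “unreached” element a full analysis would also need; the names `AbsVal`, `join`, `abs_add` and `y_after_branches` are illustrative.

```c
#include <stdbool.h>

/* Abstract value: a known constant, or unknown ("?" on the slide). */
typedef struct { bool known; int value; } AbsVal;

AbsVal constant(int v) { AbsVal a = { true, v }; return a; }
AbsVal unknown(void)   { AbsVal a = { false, 0 }; return a; }

/* Merge of two control-flow paths: the result is a constant only if both
 * paths agree on the same constant. */
AbsVal join(AbsVal a, AbsVal b) {
    if (a.known && b.known && a.value == b.value) return a;
    return unknown();
}

/* Abstract addition: constant only when both operands are constants. */
AbsVal abs_add(AbsVal a, AbsVal b) {
    if (a.known && b.known) return constant(a.value + b.value);
    return unknown();
}

/* Value of y after the if/else of the example, given the value of z. */
AbsVal y_after_branches(AbsVal z) {
    AbsVal y_then = constant(7);              /* y = 7     */
    AbsVal y_else = abs_add(z, constant(4));  /* y = z + 4 */
    return join(y_then, y_else);
}
```

With z ↦ 3 the result is the constant 7; with z unknown the result is unknown, and the assertion could not be proved.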
/* c */
L0: a := 0 ;
/* a c */
L1: b := a + 1 ;
/* b c */
c := c + b ;
/* b c */
a := b * 2 ;
/* a c */
if c < N goto L1
/* c */
return c ;

[Figure: control flow graph of the program above, annotated one step at a time. Starting from {c} before “return c” and working backwards, the sets {c}, {c, b}, {c, b} and {c, a} are added, and then the sets around “c < N goto L1” grow from {c} to {c, a} once the loop edge to L1 is taken into account.]

Summary Iterative Procedure
• Analyze one procedure at a time
– More precise solutions exist
• Construct a control flow graph for the
procedure
• Initialize the values at every node to the
most optimistic value
• Iterate until convergence
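The steps above can be sketched for the live-variable example (a := 0; L1: b := a + 1; c := c + b; a := b * 2; if c < N goto L1; return c). This is a minimal illustration under assumptions: variable sets are bit masks, N is treated as a constant rather than a variable, the most optimistic value is taken to be the empty set, and the names `liveness`, `use_set`, `def_set` and `succ` are this sketch's own.

```c
/* One bit per variable. */
enum { VAR_A = 1, VAR_B = 2, VAR_C = 4 };
#define NODES 6

/* Control flow graph of the example, one node per statement:
 * 0: a := 0        1: b := a + 1   2: c := c + b
 * 3: a := b * 2    4: if c < N goto L1 (node 1)    5: return c */
static const unsigned use_set[NODES] = { 0, VAR_A, VAR_C | VAR_B, VAR_B, VAR_C, VAR_C };
static const unsigned def_set[NODES] = { VAR_A, VAR_B, VAR_C, VAR_A, 0, 0 };
static const int succ[NODES][2] = {            /* -1 means "no successor" */
    { 1, -1 }, { 2, -1 }, { 3, -1 }, { 4, -1 }, { 1, 5 }, { -1, -1 }
};

/* Computes the set of variables live on entry to every node. */
void liveness(unsigned live_in[NODES]) {
    int changed = 1;
    for (int n = 0; n < NODES; n++)
        live_in[n] = 0;                        /* optimistic start: nothing is live */
    while (changed) {                          /* iterate until convergence */
        changed = 0;
        for (int n = NODES - 1; n >= 0; n--) { /* backwards: information flows up */
            unsigned out = 0;
            for (int k = 0; k < 2; k++)
                if (succ[n][k] >= 0)
                    out |= live_in[succ[n][k]];
            unsigned in = use_set[n] | (out & ~def_set[n]);
            if (in != live_in[n]) {
                live_in[n] = in;
                changed = 1;
            }
        }
    }
}
```

The result agrees with the comments in the annotated program: {c} before node 0, {a, c} before node 1, {b, c} before nodes 2 and 3, {a, c} before node 4 and {c} before node 5.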
Basic Compiler Phases
Overall Structure
Techniques Studied
• Simple code generation
• Basic blocks
• Global register allocation
• Activation records
• Object Oriented
• Assembler/Linker/Loader
Heap Memory Management
• Part of the runtime system
• Utilities for dynamic memory allocation
• Utilities for automatic memory reclamation
– Garbage Collection
Garbage Collection
• Techniques
– Mark and sweep
– Copying collection
– Reference counting
• Modes
– Generational
– Incremental vs. Stop the world
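As an illustration of the first technique, here is a minimal stop-the-world mark and sweep sketch. It makes strong simplifying assumptions: the heap is a fixed array of eight equally sized objects, each object has at most two outgoing pointers, the roots are passed in explicitly, and marking is recursive. The names `Object`, `gc_alloc` and `gc_collect` are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define HEAP_SIZE 8

typedef struct Object {
    bool in_use;              /* slot is currently allocated */
    bool marked;              /* reached during the current mark phase */
    struct Object *ref[2];    /* outgoing pointers, NULL if unused */
} Object;

static Object heap[HEAP_SIZE];

/* Returns a zeroed object from the first free slot, or NULL if the heap is full. */
Object *gc_alloc(void) {
    for (size_t i = 0; i < HEAP_SIZE; i++) {
        if (!heap[i].in_use) {
            heap[i] = (Object){ .in_use = true };
            return &heap[i];
        }
    }
    return NULL;
}

/* Mark phase: flag everything reachable from o (the flag also stops cycles). */
static void mark(Object *o) {
    if (o == NULL || o->marked) return;
    o->marked = true;
    mark(o->ref[0]);
    mark(o->ref[1]);
}

/* Marks from the roots, then sweeps the heap. Returns the number of
 * objects reclaimed. */
size_t gc_collect(Object *roots[], size_t nroots) {
    size_t freed = 0;
    for (size_t i = 0; i < nroots; i++)
        mark(roots[i]);
    for (size_t i = 0; i < HEAP_SIZE; i++) {
        if (heap[i].in_use && !heap[i].marked) {
            heap[i].in_use = false;            /* unreachable: reclaim the slot */
            freed++;
        }
        heap[i].marked = false;                /* reset for the next collection */
    }
    return freed;
}
```

If three objects a, b and c are allocated, a points to b, and a is the only root, then a collection reclaims exactly c.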
