
Syntax Analysis

Sarfaraz Masood
(Asstt Prof.)

Department of Computer Engineering


Jamia Millia Islamia
(A Central University)
New Delhi – 110025

sarfarazmasood2002@yahoo.com

CH4.1
Syntax Analysis - Parsing
 An overview of parsing :
 Functions & Responsibilities

 Context Free Grammars


 Concepts & Terminology
 Writing and Designing Grammars
 Resolving Grammar Problems / Difficulties
 Top-Down Parsing
 Recursive Descent & Predictive LL
 Bottom-Up Parsing
 LR & LALR
 Concluding Remarks/Looking Ahead
CH4.2
An Overview of Parsing

Why are Grammars, which formally describe
Languages, important ?
1. Precise, easy-to-understand representations
2. Compiler-writing tools can take grammar and
generate a compiler
3. Allow a language to be evolved (new statements,
changes to statements, etc.) Languages are not
static, but are constantly upgraded to add new
features or fix “old” ones
e.g., ADA → ADA9x; C++ adds templates, exceptions, …
How do grammars relate to parsing process ?
CH4.3
Parsing During Compilation

[Diagram: source program → lexical analyzer ⇄ (token / get next token) ⇄ parser
→ parse tree → rest of front end → intermediate representation.
Regular expressions drive the lexical analyzer; all phases report errors
and share the symbol table.]

The parser:
• uses a grammar to check the structure of tokens
• produces a parse tree
• performs syntactic error detection and recovery
• recognizes correct syntax
• reports errors

The rest of the front end (also technically part of parsing) includes
augmenting info on tokens in the source, type checking, and semantic analysis.
CH4.4
Parsing Responsibilities

Syntax Error Identification / Handling


Recall typical error types:
Lexical : Misspellings
Syntactic : Omission, wrong order of tokens
Semantic : Incompatible types
Logical : Infinite loop / recursive call
Majority of error processing occurs during syntax analysis
NOTE: Not all errors are identifiable !! Which ones?

CH4.5
Key Issues – Error Processing

• Detecting errors
• Finding position at which they occur
• Clear / accurate presentation
• Recover (pass over) to continue and find later
errors
• Don’t impact compilation of “correct”
programs

CH4.6
What are some Typical Errors ?
#include <stdio.h>

int f1(int v)
{ int i, j = 0;
  for (i = 1; i < 5; i++)
  { j = v + f2(i) }              /* missing ';' */
  return j; }

int f2(int u)
{ int j;
  j = u + f1(u*u)                /* missing ';' */
  return j; }

int main()
{ int i, j = 0;
  for (i = 1; i < 10; i++)
  { j = j + i*I; printf("%d\n", i);   /* 'I' undeclared; missing '}' */
  printf("%d\n", f1(j));
  return 0;
}

As reported by MS VC++:
  'f2' undefined; assuming extern returning int
  syntax error : missing ';' before '}'
  syntax error : missing ';' before 'return'
  fatal error : unexpected end of file found

Which are “easy” to recover from? Which are “hard” ?
CH4.7
Error Recovery Strategies

Panic Mode – Discard tokens until a “synchronizing” token is
found ( end, “;”, “}”, etc. )
-- Choice of synchronizing tokens is a designer decision
-- Problems:
     skipped input may miss a declaration – causing more errors later
     errors in the skipped material are missed
-- Advantages:
     simple; suited to one error per statement
Phrase Level – Local correction on input
-- “,” ”;” – Delete “,” – insert “;”
-- Also decision of designer
-- Not suited to all situations
-- Used in conjunction with panic mode to
allow less input to be skipped
CH4.8
Error Recovery Strategies – (2)
Error Productions:
-- Augment grammar with rules
-- Augment grammar used for parser
construction / generation
-- example: add a rule for
:= in C assignment statements
Report error but continue compile
-- Self correction + diagnostic messages

Global Correction:
-- Adding / deleting / replacing symbols is
chancy – may do many changes !
-- Algorithms available to minimize changes, but
costly – a key issue

CH4.9
Motivating Grammars

• Regular Expressions
 Basis of lexical analysis
 Represent regular languages
• Context Free Grammars
 Basis of parsing
 Represent language constructs
 Characterize context free languages

Reg. Lang. ⊂ CFLs

EXAMPLE: aⁿbⁿ, n ≥ 1 : Is it regular ?
CH4.10
Context Free Grammars :
Concepts & Terminology
Definition: A Context Free Grammar, CFG, is described
by (T, NT, S, PR), where:
T: Terminals / tokens of the language
NT: Non-terminals that denote sets of strings generated by
the grammar & in the language
S: Start symbol, S ∈ NT, which defines all strings of the
language
PR: Production rules that indicate how T and NT are
combined to generate valid strings of the language.
PR: NT → (T | NT)*
Like a Regular Expression / DFA / NFA, a Context Free
Grammar is a mathematical model
CH4.11
Context Free Grammars : A First Look
assign_stmt → id := expr ;
expr → expr operator term
expr → term
term → id                        What do “blue”
term → real                      symbols represent?
term → integer
operator → +
operator → -

Derivation: A sequence of grammar rule applications


and substitutions that transform a starting non-term
into a sequence of terminals / tokens.
Simply stated: Grammars / production rules allow us to
“rewrite” and “identify” correct syntax.
CH4.12
Derivation

Let’s derive: id := id + real – integer ;

                                                    using production:
assign_stmt                                         assign_stmt → id := expr ;
⇒ id := expr ;                                      expr → expr operator term
⇒ id := expr operator term ;                        expr → expr operator term
⇒ id := expr operator term operator term ;          expr → term
⇒ id := term operator term operator term ;          term → id
⇒ id := id operator term operator term ;            operator → +
⇒ id := id + term operator term ;                   term → real
⇒ id := id + real operator term ;                   operator → -
⇒ id := id + real - term ;                          term → integer
⇒ id := id + real - integer ;

CH4.13
Example Grammar

expr  expr op expr


expr  ( expr )
expr  - expr
Black : NT
expr  id
op  + Blue : T
op  - expr : S
op  * 9 Production rules
op  /
op  

To simplify / standardize notation, we offer a


synopsis of terminology.
CH4.14
Example Grammar - Terminology

Terminals: a, b, c, +, -, punc, 0, 1, …, 9, blue strings
Non-terminals: A, B, C, S, black strings
T or NT: X, Y, Z
Strings of terminals: u, v, …, z in T*
Strings of T / NT: α, β, γ in (T ∪ NT)*
Alternatives of production rules:
A → α1; A → α2; …; A → αk;  ≡  A → α1 | α2 | … | αk
The NT on the LHS of the 1st production rule is designated as
the start symbol !

E → E A E | ( E ) | -E | id
A → + | - | * | / | ↑
CH4.15
Grammar Concepts

A step in a derivation is a single action that
replaces a NT with the RHS of a production rule.
EXAMPLE: E ⇒ -E (the ⇒ means “derives” in one
step) using the production rule: E → -E
EXAMPLE: E ⇒ E A E ⇒ E * E ⇒ E * ( E )
DEFINITIONS:
   ⇒    derives in one step
   ⇒+   derives in ≥ one step
   ⇒*   derives in ≥ zero steps

EXAMPLES: αAβ ⇒ αγβ if A → γ is a production rule
α1 ⇒ α2 ⇒ … ⇒ αn implies α1 ⇒* αn ;  α ⇒* α for all α
If α ⇒* β and β ⇒ γ then α ⇒* γ
CH4.16
How does this relate to Languages?
Let G be a CFG with start symbol S. Then S ⇒+ W
(where W has no non-terminals) represents the language
generated by G, denoted L(G). So W ∈ L(G) ⇔ S ⇒+ W.

W : is a sentence of G

When S ⇒* α (and α may contain NTs), α is called a
sentential form of G.

EXAMPLE: id * id is a sentence
Here’s the derivation:
E ⇒ E A E ⇒ E * E ⇒ id * E ⇒ id * id
(each step yields a sentential form)
E ⇒* id * id                                          CH4.17
Other Derivation Concepts

Leftmost: Replace the leftmost non-terminal symbol
E ⇒lm E A E ⇒lm id A E ⇒lm id * E ⇒lm id * id

Rightmost: Replace the rightmost non-terminal symbol
E ⇒rm E A E ⇒rm E A id ⇒rm E * id ⇒rm id * id

Important Notes:
If S ⇒*lm αAβ, what’s true about α ?
If S ⇒*rm αAβ, what’s true about β ?

Derivations: Actions to parse input can be represented
pictorially in a parse tree.                          CH4.18
Examples of LM / RM Derivations

E  E A E | ( E ) | -E | id
A+|-|*| / |

A leftmost derivation of : id + id * id

A rightmost derivation of : id + id * id

CH4.19
Derivations & Parse Tree
E ⇒ E A E ⇒ E * E ⇒ id * E ⇒ id * id

[The parse tree grows alongside the derivation: root E with children E, A, E;
A rewritten to *; the left E rewritten to id; finally the right E rewritten
to id, yielding the tree for id * id]
CH4.20
Parse Trees and Derivations

Consider the expression grammar:
E → E+E | E*E | (E) | -E | id

A leftmost derivation of id + id * id, with the parse tree grown in step:

E ⇒ E + E ⇒ id + E ⇒ id + E * E
[tree so far: root E with children E, +, E; left child id;
right child expanded to E * E]

CH4.21

⇒ id + id * E ⇒ id + id * id
[tree completed: the E’s under * rewritten to id and id]
CH4.22
Alternative Parse Tree & Derivation

EE*E
E
E+E*E
E * E
 id + E * E E + E id
 id + id * E id id
 id + id * id

WHAT’S THE ISSUE HERE ?


Two distinct leftmost derivations!

CH4.23
Resolving Grammar Problems/Difficulties

Regular Expressions : Basis of Lexical Analysis


Reg. Expr.  generate/represent regular languages
Reg. Languages  smallest, most well defined class
of languages
Context Free Grammars: Basis of Parsing
CFGs  represent context free languages
CFLs  contain more powerful languages

Reg. Lang. ⊂ CFLs ;  aⁿbⁿ is a CFL that’s not regular.

CH4.24
Resolving Problems/Difficulties – (2)

Since Reg. Lang. ⊂ Context Free Lang., it is possible
to go from reg. expr. to CFGs via NFA.

Recall: (a | b)*abb

[NFA: start state 0 with a,b self-loops; 0 –a→ 1 –b→ 2 –b→ 3 (accepting)]

CH4.25
Resolving Problems/Difficulties – (3)
Construct CFG as follows:
1. Each state i has a non-terminal Ai : A0, A1, A2, A3
2. If i –a→ j then Ai → aAj : A0 → aA0, A0 → aA1
3. If i –b→ j then Ai → bAj : A0 → bA0, A1 → bA2,
                              A2 → bA3
4. If i is an accepting state, Ai → ε : A3 → ε
5. If i is the starting state, Ai is the start symbol : A0

T = {a, b}, NT = {A0, A1, A2, A3}, S = A0
PR = { A0 → aA0 | aA1 | bA0 ;
       A1 → bA2 ;
       A2 → bA3 ;
       A3 → ε }
CH4.26
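The state-by-state construction above is mechanical, so it can be sketched in a few lines of code. A minimal Python sketch (names like `nfa_to_grammar` are illustrative, not from the slides), using the NFA for (a | b)*abb with states 0..3, start 0, accepting {3}:

```python
# Transitions of the NFA above: state -> list of (symbol, next-state).
NFA = {0: [('a', 0), ('b', 0), ('a', 1)],
       1: [('b', 2)],
       2: [('b', 3)],
       3: []}
ACCEPT = {3}

def nfa_to_grammar(nfa, accept, start=0):
    """Rules 2/3: i --x--> j gives Ai -> x Aj; rule 4: accepting i gives Ai -> eps."""
    prods = {}
    for i, moves in nfa.items():
        nt = f"A{i}"
        prods.setdefault(nt, [])
        for sym, j in moves:
            prods[nt].append(sym + f"A{j}")
        if i in accept:
            prods[nt].append("")          # "" stands for the epsilon-production
    return f"A{start}", prods             # rule 5: A0 is the start symbol

start, prods = nfa_to_grammar(NFA, ACCEPT)
print(prods["A0"])   # ['aA0', 'bA0', 'aA1']
```

The result is exactly the right-linear grammar PR shown on the slide.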
How Does This CFG Derive Strings ?

[NFA for (a | b)*abb, as above]    vs.    A0 → aA0, A0 → aA1
                                          A0 → bA0, A1 → bA2
                                          A2 → bA3, A3 → ε

How is abaabb derived in each ?

CH4.27
Regular Expressions vs. CFGs

Regular expressions for lexical syntax


1. CFGs are overkill, lexical rules are quite
simple and straightforward
2. REs – concise / easy to understand
3. More efficient lexical analyzer can be
constructed
4. RE for lexical analysis and CFGs for parsing
promotes modularity, low coupling & high
cohesion.
CFGs : Match tokens “(“ “)”, begin / end, if-then-else,
whiles, proc/func calls, …
Intended for structural associations between tokens !
Are tokens in correct order ?
CH4.28
Resolving Grammar Difficulties :
Motivation
The structure of a grammar affects the compiler design
recall “syntax-directed” translation
• Different parsing approaches have different needs
Top-Down vs. Bottom-Up

redesigning a grammar may assist in producing


better parsing methods.

Grammar problems:
- ambiguity
- ε-moves
- cycles
- left recursion
- left factoring
- left factoring
CH4.29
Resolving Problems: Ambiguous
Grammars
Consider the following grammar segment:
stmt  if expr then stmt
| if expr then stmt else stmt
| other (any other statement)
What’s problem here ?
Let’s consider a simple parse tree:

stmt

if expr then stmt else stmt

E1 S1 expr then stmt else stmt


if
Else must match to previous
E2 S2 S3
then. Structure indicates parse
subtree for expression.
CH4.30
Example : What Happens with this string?

If E1 then if E2 then S1 else S2

How is this parsed ?

if E1 then                         if E1 then
    if E2 then                         if E2 then
        S1                 vs.             S1
    else                           else
        S2                             S2

What’s the issue here ?

CH4.31
Parse Trees for Example

Form 1: the else attaches to the inner if:
stmt → if expr(E1) then stmt
         stmt → if expr(E2) then S1 else S2

Form 2:
the else attaches to the outer if:
stmt → if expr(E1) then stmt else S2
         stmt → if expr(E2) then S1

What’s the issue here ?  Two parse trees for one string – ambiguity.

CH4.32
Removing Ambiguity

Take Original Grammar:


stmt  if expr then stmt
| if expr then stmt else stmt
| other (any other statement)

Or to write more simply:

SiEtS
| iEtSeS
| s
Ea
The problem string: i a t i a t s e s

CH4.33
Revise to remove ambiguity:

SiEtS SM|U
| iEtSeS M iEtMeM| s
| s UiEtS|iEtMeU
Ea Ea

Try the above on iatiatses

stmt  matched_stmt | unmatched_stmt


matched_stmt  if expr then matched_stmt else matched_stmt | other
unmatched_stmt  if expr then stmt
| if expr then matched_stmt else unmatched_stmt
CH4.34
Resolving Difficulties : Left Recursion

A left recursive grammar has rules that support the
derivation: A ⇒+ Aα, for some α.

Top-Down parsing can’t reconcile this type of grammar,
since it could consistently make a choice which wouldn’t
allow termination.

A ⇒ Aα ⇒ Aαα ⇒ Aααα … etc.

Transform the left recursive grammar:
A → Aα | β
to the following:
A → βA’
A’ → αA’ | ε
CH4.35
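The A → Aα | β to A → βA’, A’ → αA’ | ε rewrite above is mechanical enough to sketch in code. A minimal Python sketch (the function name is illustrative; productions are lists of symbols, the empty list standing for ε):

```python
def remove_immediate_left_recursion(nt, prods):
    """Split A-productions into left-recursive alphas and non-recursive betas,
    then build A -> beta A' and A' -> alpha A' | eps."""
    alphas = [p[1:] for p in prods if p and p[0] == nt]   # A -> A alpha
    betas  = [p for p in prods if not p or p[0] != nt]    # A -> beta
    if not alphas:
        return {nt: prods}                                # nothing to do
    new = nt + "'"
    return {nt:  [b + [new] for b in betas],
            new: [a + [new] for a in alphas] + [[]]}      # [] is epsilon

# E -> E + T | T  becomes  E -> T E' ;  E' -> + T E' | eps
g = remove_immediate_left_recursion("E", [["E", "+", "T"], ["T"]])
print(g)   # {'E': [['T', "E'"]], "E'": [['+', 'T', "E'"], []]}
```

This handles only *immediate* left recursion; the multi-level case needs the full algorithm given a few slides later.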
Why is Left Recursion a Problem ?
Consider: Derive id + id + id using
E → E+T | T            E ⇒ E+T ⇒ ?
T → T*F | F
F → ( E ) | id

How can left recursion be removed ?
E → E+T | T — what does this generate?
E ⇒ E+T ⇒ T+T
E ⇒ E+T ⇒ E+T+T ⇒ T+T+T
How does this build strings ?
What does each string have to start with ?
CH4.36
Resolving Difficulties : Left Recursion (2)

Informal Discussion:
Take all productions for A and order them as:
A → Aα1 | Aα2 | … | Aαm | β1 | β2 | … | βn
where no βi begins with A.
Now apply the concepts of the previous slide:
A → β1A’ | β2A’ | … | βnA’
A’ → α1A’ | α2A’ | … | αmA’ | ε

For our example:
E → E+T | T            E → TE’
T → T*F | F            E’ → +TE’ | ε
F → ( E ) | id         T → FT’
                       T’ → *FT’ | ε
                       F → ( E ) | id

CH4.37
Resolving Difficulties : Left Recursion (3)
Problem: If left recursion is two-or-more levels deep,
this isn’t enough
S → Aa | b
A → Ac | Sd | ε        S ⇒ Aa ⇒ Sda

Algorithm:
Input: Grammar G with ordered non-terminals A1, ..., An
Output: An equivalent grammar with no left recursion
1. Arrange the non-terminals in some order A1 = start NT, A2, …, An
2. for i := 1 to n do begin
       for j := 1 to i – 1 do begin
           replace each production of the form Ai → Ajγ
           by the productions Ai → δ1γ | δ2γ | … | δkγ,
           where Aj → δ1 | δ2 | … | δk are all current Aj productions;
       end
       eliminate the immediate left recursion among the Ai productions
   end                                                            CH4.38
Using the Algorithm

Apply the algorithm to:  A1 → A2a | b | ε
                         A2 → A2c | A1d
i = 1:
For A1 there is no immediate left recursion.
i = 2:
for j = 1 to 1 do
Take productions A2 → A1γ and replace with
A2 → δ1γ | δ2γ | … | δkγ
where A1 → δ1 | δ2 | … | δk are the A1 productions.
In our case A2 → A1d becomes A2 → A2ad | bd | d
What’s left: A1 → A2a | b | ε
Are we done ?
A2 → A2c | A2ad | bd | d
CH4.39
Using the Algorithm (2)

No ! We must still remove the A2 left recursion !
A1 → A2a | b | ε
A2 → A2c | A2ad | bd | d

Recall:
A → Aα1 | Aα2 | … | Aαm | β1 | β2 | … | βn
⇓
A → β1A’ | β2A’ | … | βnA’
A’ → α1A’ | α2A’ | … | αmA’ | ε

Apply to the above case. What do you get ?

CH4.40
Removing Difficulties : -Moves

Transformation: In order to remove A → ε, find all
rules of the form B → uAv and add the rule B → uv to
the grammar G.
Why does this work ?
Examples:
E → TE’
E’ → +TE’ | ε
T → FT’
T’ → *FT’ | ε
F → ( E ) | id

A1 → A2a | b
A2 → bdA2’ | dA2’
A2’ → cA2’ | adA2’ | ε
CH4.41
Removing Difficulties : Cycles

How would cycles be removed ?
Make sure every production adds some
terminal(s) (except a single ε-production
in the start NT)…
e.g.
S → SS | ( S ) | ε
has a cycle: S ⇒ SS ⇒ S

Transformation: substitute A → uBv by
the productions A → uβ1v | ... | uβnv,
assuming that all the productions for B
are B → β1 | ... | βn

Transform to: S → S(S) | (S) | ε
CH4.42
Removing Difficulties : Left Factoring

Problem : Uncertain which of 2 rules to choose:
stmt → if expr then stmt else stmt
     | if expr then stmt
When do you know which one is valid ?
What’s the general form of stmt ?
A → αβ1 | αβ2        α : if expr then stmt
                     β1 : else stmt    β2 : ε

Transform to:                EXAMPLE:
A → αA’                      stmt → if expr then stmt rest
A’ → β1 | β2                 rest → else stmt | ε
CH4.43
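One step of the A → αβ1 | αβ2 to A → αA’, A’ → β1 | β2 transformation can be sketched directly. A minimal Python sketch (names are illustrative; productions are symbol lists, the empty list standing for ε; this assumes all the listed alternatives share the common prefix):

```python
from functools import reduce

def common_prefix(a, b):
    """Longest common prefix of two symbol lists."""
    i = 0
    while i < min(len(a), len(b)) and a[i] == b[i]:
        i += 1
    return a[:i]

def left_factor(nt, prods):
    """Factor out the common prefix alpha: A -> alpha A'; A' -> beta1 | beta2."""
    alpha = reduce(common_prefix, prods)
    if not alpha:
        return {nt: prods}                      # nothing shared, no change
    new = nt + "'"
    return {nt:  [alpha + [new]],
            new: [p[len(alpha):] for p in prods]}   # empty list = epsilon

# stmt -> if E then S else S | if E then S
g = left_factor("stmt",
                [["if", "E", "then", "S", "else", "S"],
                 ["if", "E", "then", "S"]])
print(g)   # stmt -> if E then S stmt' ;  stmt' -> else S | eps
```

This reproduces the stmt / rest rewrite shown on the slide (with `stmt'` playing the role of rest).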
Resolving Grammar Problems

Note: Not all aspects of a programming language can
be represented by context free grammars / languages.
Examples:
1. Declaring an ID before its use
2. Valid typing within expressions
3. Parameters in definition vs. in call
These features are called context-sensitive and define yet
another language class, CSL.

Reg. Lang. ⊂ CFLs ⊂ CSLs

CH4.44
Context-Sensitive Languages - Examples

Examples:
L1 = { wcw | w is in (a | b)* } : declare before use
L2 = { aⁿ bᵐ cⁿ dᵐ | n ≥ 1, m ≥ 1 }
     aⁿ bᵐ : formal parameters
     cⁿ dᵐ : actual parameters

What does it mean when L is context-sensitive ?

CH4.45
How do you show a Language is a CFL?

L3 = { w c wᴿ | w is in (a | b)* }
L4 = { aⁿ bᵐ cᵐ dⁿ | n ≥ 1, m ≥ 1 }
L5 = { aⁿ bⁿ cᵐ dᵐ | n ≥ 1, m ≥ 1 }
L6 = { aⁿ bⁿ | n ≥ 1 }

CH4.46
Solutions

L3 = { w c wᴿ | w is in (a | b)* }
     S → aSa | bSb | c
L4 = { aⁿ bᵐ cᵐ dⁿ | n ≥ 1, m ≥ 1 }
     S → aSd | aAd
     A → bAc | bc
L5 = { aⁿ bⁿ cᵐ dᵐ | n ≥ 1, m ≥ 1 }
     S → XY
     X → aXb | ab
     Y → cYd | cd
L6 = { aⁿ bⁿ | n ≥ 1 }
     S → aSb | ab
CH4.47
Example of CS Grammar

L2 = { aⁿ bᵐ cⁿ dᵐ | n ≥ 1, m ≥ 1 }

S → aAcH
A → aAc | B
B → bBD | bD
Dc → cD
DDH → DHd
cDH → cd

CH4.48
Top-Down Parsing

• Identify a leftmost derivation for an input string
• Why ?
• By always replacing the leftmost non-terminal symbol via a
production rule, we are guaranteed of developing a parse tree in a
left-to-right fashion that is consistent with scanning the input.
• A ⇒ aBc ⇒ adDc ⇒ adec (scan a, scan d, scan e, scan c – accept!)
• Recursive-descent parsing concepts
•Predictive parsing
• Recursive / Brute force technique
• non-recursive / table driven
• Error recovery
• Implementation
CH4.49
Top-Down Parsing

 From Grammar to Parser, take I

CH4.50
Recursive Descent Parsing

• General category of Top-Down Parsing
• Choose a production rule based on the input symbol
• May require backtracking to correct a wrong choice.
• Example: S → cAd
           A → ab | a
  input: cad

[Parse attempt: build S with children c, A, d and match ‘c’;
expand A → ab, match ‘a’, but ‘b’ fails against ‘d’ — Problem: backtrack;
re-expand A → a, match ‘a’, then match ‘d’ — success]
CH4.51
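The backtracking parse of cad above can be sketched as a tiny recursive-descent parser. A minimal Python sketch (function names are illustrative) for S → cAd, A → ab | a: the parser tries A’s alternatives in order and backs up when the remaining input fails:

```python
def parse_A(s, pos):
    """Yield each position A could end at, trying A -> ab before A -> a."""
    for alt in ("ab", "a"):
        if s.startswith(alt, pos):
            yield pos + len(alt)

def parse_S(s):
    """S -> c A d: match 'c', backtrack over A's alternatives, match 'd'."""
    if not s.startswith("c"):
        return False
    for end in parse_A(s, 1):           # each yield is one choice for A
        if s[end:] == "d":              # rest of the input must be exactly 'd'
            return True                 # accept
    return False                        # all choices exhausted: reject

print(parse_S("cad"))    # True  (uses A -> a)
print(parse_S("cabd"))   # True  (uses A -> ab)
print(parse_S("cd"))     # False
```

The generator makes the backtracking explicit: if the first alternative for A leaves input that cannot finish the S-rule, the loop simply moves on to the next alternative.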
Top-Down Parsing

 From Grammar to Parser, take II

CH4.52
Predictive Parsing
•Backtracking is bad!
•To eliminate backtracking, what must we do/be sure of for grammar?
• no left recursion
• apply left factoring
• (frequently) when grammar satisfies above conditions:
current input symbol in conjunction with current non-terminal
uniquely determines the production that needs to be applied.
• Utilize transition diagrams:
For each non-terminal of the grammar do following:
1. Create an initial and final state
2. If A X1X2…Xn is a production, add path with edges X1, X2,
… , Xn
• Once transition diagrams have been developed, apply a
straightforward technique to algorithmicize transition diagrams with
procedure and possible recursion.

CH4.53
Transition Diagrams
• Unlike their lexical equivalents, each edge represents a token
• Transition implies: if token, match input; else call proc
• Recall the earlier grammar and its associated transition diagrams:
E → TE’          T → FT’          F → ( E ) | id
E’ → +TE’ | ε    T’ → *FT’ | ε

E:  0 –T→ 1 –E’→ 2                     How are transition
                                       diagrams used ?
E’: 3 –+→ 4 –T→ 5 –E’→ 6 ;  3 –ε→ 6
                                       Are ε-moves a
T:  7 –F→ 8 –T’→ 9                     problem ?

T’: 10 –*→ 11 –F→ 12 –T’→ 13 ;         Can we simplify
    10 –ε→ 13                          transition diagrams ?

F:  14 –(→ 15 –E→ 16 –)→ 17 ;          Why is simplification
    14 –id→ 17                         critical ?
CH4.54
How are Transition Diagrams Used ?
main()
{
    TD_E();
}

TD_E()
{
    TD_T();
    TD_E’();
}

TD_E’()
{
    token = get_token();
    if token = ‘+’ then
        { TD_T(); TD_E’(); }
}

TD_T()
{
    TD_F();
    TD_T’();
}

TD_T’()
{
    token = get_token();
    if token = ‘*’ then
        { TD_F(); TD_T’(); }
}

TD_F()
{
    token = get_token();
    if token = ‘(’ then
        { TD_E(); match(‘)’); }
    else if token.value <> id then
        { error + EXIT }
    else
        ...
}

What happened to ε-moves?  … “else unget() and terminate”
NOTE: not all error conditions have been represented.

CH4.55
How can Transition Diagrams be
Simplified ?
E’: 3 –+→ 4 –T→ 5 –E’→ 6 ;  3 –ε→ 6

CH4.56
How can Transition Diagrams be
Simplified ? (2)
E’: 3 –+→ 4 –T→ 5 –E’→ 6 ;  3 –ε→ 6

The trailing call to E’ can be replaced by an ε-edge back to its start:
E’: 3 –+→ 4 –T→ 5 ;  5 –ε→ 3 ;  3 –ε→ 6
CH4.57
How can Transition Diagrams be
Simplified ? (3)
E’: 3 –+→ 4 –T→ 5 –E’→ 6 ;  3 –ε→ 6

E’: 3 –+→ 4 –T→ 5 ;  5 –ε→ 3 ;  3 –ε→ 6

Merging states 5 and 3 gives a loop:
E’: 3 –+→ 4 –T→ 3 ;  3 –ε→ 6
CH4.58
How can Transition Diagrams be
Simplified ? (4)
E’ (simplified, from the previous slide): 3 –+→ 4 –T→ 3 ;  3 –ε→ 6

E: 0 –T→ 1 –E’→ 2
CH4.59
How can Transition Diagrams be
Simplified ? (5)
Substituting the simplified E’ diagram into E:
E: 0 –T→ 3 ;  3 –+→ 4 –T→ 3 ;  3 –ε→ 6
CH4.60
Additional Transition Diagram
Simplifications
• Similar steps for T and T’
• Simplified transition diagrams:

T:  7 –F→ 10 ;  10 –*→ 11 –F→ 10 ;        Why is simplification
    10 –ε→ 13                             important ?

T’: 10 –*→ 11 –F→ 10 ;  10 –ε→ 13         How does the code change?

F:  14 –(→ 15 –E→ 16 –)→ 17 ;
    14 –id→ 17
CH4.61
Top-Down Parsing

 From Grammar to Parser, take III

CH4.62
Motivating Table-Driven Parsing

1. Left-to-right scan of the input
2. Find a leftmost derivation

Grammar: E → TE’            Input : id + id $   ($ is the terminator)
         E’ → +TE’ | ε
         T → id
Derivation: E ⇒

Processing Stack:

CH4.63
Non-Recursive / Table Driven
Input: a + b $  (string + terminator)

Stack of              X ┐
grammar               Y │ → Predictive Parsing Program → Output
symbols of            Z │
the CFG               $ ┘  ($ marks the empty stack)
                           ↓
                      Parsing Table M[A,a]
(tells the parser what action to take, based on stack / input symbol)

General parser behavior:  X : top of stack, a : current input
1. When X = a = $ : halt, accept, success
2. When X = a ≠ $ : POP X off the stack, advance the input, go to 1.
3. When X is a non-terminal, examine M[X,a] :
   if it is an error → call the recovery routine
   if M[X,a] = {X → UVW} : POP X, PUSH W, V, U
   DO NOT advance the input                                    CH4.64
Algorithm for Non-Recursive Parsing
Set ip to point to the first symbol of w$;     (ip: input pointer)
repeat
    let X be the top stack symbol and a the symbol pointed to by ip;
    if X is a terminal or $ then
        if X = a then
            pop X from the stack and advance ip
        else error()
    else /* X is a non-terminal */
        if M[X,a] = X → Y1Y2…Yk then begin
            pop X from the stack;
            push Yk, Yk-1, … , Y1 onto the stack, with Y1 on top;
            output the production X → Y1Y2…Yk
            /* may also execute other code based on the production used */
        end
        else error()
until X = $ /* stack is empty */

CH4.65
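The driver loop above can be sketched almost line for line. A minimal Python sketch (names are illustrative; productions are lists of symbols, the empty list standing for ε), using the table M for the well-worn grammar from the next slide:

```python
# Table M keyed by (non-terminal, lookahead).
M = {("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
     ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
     ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
     ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
     ("T'", ")"): [], ("T'", "$"): [],
     ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"]}
NONTERMS = {"E", "E'", "T", "T'", "F"}

def parse(tokens):
    stack, i, output = ["$", "E"], 0, []
    tokens = tokens + ["$"]
    while stack:
        X, a = stack.pop(), tokens[i]
        if X == a == "$":
            return output                        # halt, accept
        if X not in NONTERMS:                    # terminal: must match input
            if X != a:
                raise SyntaxError(f"expected {X}, got {a}")
            i += 1                               # advance the input
        else:
            if (X, a) not in M:
                raise SyntaxError(f"no rule for ({X}, {a})")
            rhs = M[(X, a)]
            output.append((X, rhs))              # emit the production used
            stack.extend(reversed(rhs))          # push RHS with Y1 on top
    return output

out = parse(["id", "+", "id", "*", "id"])
print(len(out))   # 11 productions, matching the trace two slides ahead
```

Running it on id + id * id emits the same eleven productions, in the same order, as the hand trace on slide CH4.68.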
Example

E  TE’
E’  + TE’ | 
T  FT’ Our well-worn example !
T’  * FT’ | 
F  ( E ) | id

Table M

Non- INPUT SYMBOL


terminal
id + * ( ) $
E ETE’ ETE’
E’ E’+TE’ E’ E’
T TFT’ TFT’
T’ T’ T’*FT’ T’ T’
F Fid F(E)
CH4.66
Trace of Example
STACK INPUT OUTPUT

CH4.67
Trace of Example
STACK INPUT OUTPUT
$E         id + id * id$
$E’T       id + id * id$    E → TE’
$E’T’F     id + id * id$    T → FT’
$E’T’id    id + id * id$    F → id
$E’T’      + id * id$                      (id matched: input advanced)
$E’        + id * id$       T’ → ε
$E’T+      + id * id$       E’ → +TE’
$E’T       id * id$
$E’T’F     id * id$         T → FT’
$E’T’id    id * id$         F → id
$E’T’      * id$
$E’T’F*    * id$            T’ → *FT’
$E’T’F     id$
$E’T’id    id$              F → id
$E’T’      $
$E’        $                T’ → ε
$          $                E’ → ε

CH4.68
Leftmost Derivation for the Example

The leftmost derivation for the example is as follows:

E ⇒ TE’ ⇒ FT’E’ ⇒ id T’E’ ⇒ id E’ ⇒ id + TE’ ⇒ id + FT’E’
⇒ id + id T’E’ ⇒ id + id * FT’E’ ⇒ id + id * id T’E’
⇒ id + id * id E’ ⇒ id + id * id

CH4.69
What’s the Missing Puzzle Piece ?
Constructing the Parsing Table M !
1st : Calculate First & Follow for Grammar
2nd: Apply Construction Algorithm for Parsing Table
( We’ll see this shortly )

Basic Tools:
First: Let α be a string of grammar symbols. First(α) is the set
     that includes every terminal that appears leftmost in α or
     in any string originating from α.
     NOTE: If α ⇒* ε, then ε is in First(α).

Follow: Let A be a non-terminal. Follow(A) is the set of terminals
     a that can appear directly to the right of A in some
     sentential form. (S ⇒* αAaβ, for some α and β.)
     NOTE: If S ⇒* αA, then $ is in Follow(A).                CH4.70
Motivation Behind First & Follow
First: is used to help find the appropriate production to follow,
     given the top-of-the-stack non-terminal and the current
     input symbol.
     Example: If A → α, and a is in First(α), then when
     a = input, replace A with α (in the stack).
     (a is one of the first symbols of α, so when A is on the stack
     and a is the input, POP A and PUSH α.)

Follow: is used when First has a conflict, to resolve choices, or
     when First gives no suggestion. When α ⇒* ε or α = ε,
     then what follows A dictates the next choice to be made.
     Example: If A → α, and b is in Follow(A), then when
     α ⇒* ε and b is the input character, we expand A with
     α, which will eventually expand to ε, which b follows!
     (α ⇒* ε : i.e., First(α) contains ε.)
CH4.71
An example.
STACK INPUT OUTPUT
$S abbd$

SaBCd
B  CB |  | S a
Cb

CH4.72
Computing First(X) :
All Grammar Symbols
1. If X is a terminal, First(X) = {X}
2. If X → ε is a production rule, add ε to First(X)
3. If X is a non-terminal, and X → Y1Y2…Yk is a production rule:
     Place First(Y1) in First(X)
     if Y1 ⇒* ε, place First(Y2) in First(X)
     if Y2 ⇒* ε, place First(Y3) in First(X)
     …
     if Yk-1 ⇒* ε, place First(Yk) in First(X)
     NOTE: As soon as some Yi does not satisfy Yi ⇒* ε, stop.

Repeat the above steps until no more elements are added to any
First( ) set.
Checking “Yj ⇒* ε ?” essentially amounts to checking whether ε
belongs to First(Yj).
CH4.73
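The "repeat until nothing changes" loop above is a fixed-point computation, and can be sketched compactly. A minimal Python sketch (names are illustrative; the grammar maps each non-terminal to its productions as symbol lists, and "" stands for ε inside First sets), using the well-worn grammar:

```python
GRAMMAR = {"E":  [["T", "E'"]],
           "E'": [["+", "T", "E'"], []],
           "T":  [["F", "T'"]],
           "T'": [["*", "F", "T'"], []],
           "F":  [["(", "E", ")"], ["id"]]}

def first_sets(g):
    first = {nt: set() for nt in g}
    changed = True
    while changed:                              # fixed point: repeat until stable
        changed = False
        for nt, prods in g.items():
            for prod in prods:
                before = len(first[nt])
                for sym in prod:
                    if sym not in g:            # terminal: First(sym) = {sym}
                        first[nt].add(sym)
                        break
                    first[nt] |= first[sym] - {""}
                    if "" not in first[sym]:    # sym does not derive eps: stop
                        break
                else:                           # every symbol derives eps
                    first[nt].add("")
                if len(first[nt]) != before:
                    changed = True
    return first

F = first_sets(GRAMMAR)
print(sorted(F["E"]))   # ['(', 'id']
```

The result matches the slide: First(E) = First(T) = First(F) = { (, id }, First(E’) = { +, ε }, First(T’) = { *, ε }.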
Computing First(X) :
All Grammar Symbols - continued
Informally, suppose we want to compute
First(X1 X2 … Xn) = First(X1) “+”
                    First(X2) if ε is in First(X1) “+”
                    First(X3) if ε is in First(X2) “+”
                    …
                    First(Xn) if ε is in First(Xn-1)

Note 1: Only add ε to First(X1 X2 … Xn) if ε
is in First(Xi) for all i
Note 2: For First(X1), if X1 → Z1 Z2 … Zm,
then we need to compute First(Z1 Z2 … Zm) !

CH4.74
Example 1

Given the production rules:

S  i E t SS’ | a
S’  eS | 
E b

CH4.75
Example 1

Given the production rules:

S  i E t SS’ | a
S’  eS | 
E b

Verify that

First(S) = { i, a }
First(S’) = { e,  }
First(E) = { b }

CH4.76
Example 2
Computing First for: E  TE’
E’  + TE’ | 
T  FT’
T’  * FT’ | 
F  ( E ) | id

CH4.77
Example 2
Computing First for: E  TE’
E’  + TE’ | 
T  FT’
T’  * FT’ | 
F  ( E ) | id
First(TE’)
First(T) “+” First(E’)
First(E) * 
Not First(E’) since T 
First(T)

First(F) “+” First(T’) First(F) Not First(T’) since F 


* 

First((E)) “+” First(id) “(“ and “id”

Overall: First(E) = { ( , id } = First(F)


First(E’) = { + ,  } First(T’) = { * ,  }
First(T)  First(F) = { ( , id }
CH4.78
Computing Follow(A) :
All Non-Terminals
1. Place $ in Follow(S), where S is the start symbol and $
   signals end of input
2. If there is a production A → αBβ, then everything in
   First(β) is in Follow(B), except for ε.
3. If A → αB is a production, or A → αBβ and β ⇒* ε
   (First(β) contains ε), then everything in Follow(A) is in
   Follow(B).
   (Whatever followed A must follow B, since nothing
   follows B from the production rule.)

We’ll calculate Follow for two grammars.

CH4.79
The Algorithm for Follow – pseudocode
1. Initialize Follow(X) for all non-terminals X
   to the empty set. Place $ in Follow(S), where S is the start
   NT.
2. Repeat the following step until no modifications are
   made to any Follow set:
   For any production X → X1 X2 … Xm
       For j = 1 to m,
           if Xj is a non-terminal then:
               Follow(Xj) = Follow(Xj) ∪ (First(Xj+1 … Xm) – {ε});
               if First(Xj+1 … Xm) contains ε or Xj+1 … Xm = ε
               then Follow(Xj) = Follow(Xj) ∪ Follow(X);
CH4.80
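The Follow pseudocode above maps directly onto another fixed-point loop. A minimal Python sketch (names are illustrative; "" stands for ε), taking the First sets for the well-worn grammar as given literals:

```python
GRAMMAR = {"E":  [["T", "E'"]],
           "E'": [["+", "T", "E'"], []],
           "T":  [["F", "T'"]],
           "T'": [["*", "F", "T'"], []],
           "F":  [["(", "E", ")"], ["id"]]}
FIRST = {"E": {"(", "id"}, "E'": {"+", ""}, "T": {"(", "id"},
         "T'": {"*", ""}, "F": {"(", "id"}}

def first_of(symbols):
    """First of a symbol string; a terminal is its own First."""
    out = set()
    for s in symbols:
        f = FIRST.get(s, {s})
        out |= f - {""}
        if "" not in f:
            return out
    out.add("")                 # the whole string can derive eps
    return out

def follow_sets(g, start="E"):
    follow = {nt: set() for nt in g}
    follow[start].add("$")                      # step 1
    changed = True
    while changed:                              # step 2: repeat until stable
        changed = False
        for x, prods in g.items():
            for prod in prods:
                for j, sym in enumerate(prod):
                    if sym not in g:            # only non-terminals get Follow
                        continue
                    tail = first_of(prod[j + 1:])
                    add = (tail - {""}) | (follow[x] if "" in tail else set())
                    if not add <= follow[sym]:
                        follow[sym] |= add
                        changed = True
    return follow

print(sorted(follow_sets(GRAMMAR)["F"]))   # ['$', ')', '*', '+']
```

The output agrees with the Follow column computed on the next slides: Follow(E) = Follow(E’) = { ), $ }, Follow(T) = Follow(T’) = { +, ), $ }, Follow(F) = { *, +, ), $ }.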
Computing Follow : 1st Example
Recall: S  i E t SS’ | a First(S) = { i, a }
S’  eS |  First(S’) = { e,  }
E b First(E) = { b }

CH4.81
Computing Follow : 1st Example
Recall: S  i E t SS’ | a First(S) = { i, a }
S’  eS |  First(S’) = { e,  }
E b First(E) = { b }

Follow(S) – Contains $, since S is start symbol


Since S  i E t SS’ , put in First(S’) – not 
Since S’ 
* , Put in Follow(S)

Since S’  eS, put in Follow(S’) So…. Follow(S) = { e, $ }

Follow(S’) = Follow(S) HOW?

Follow(E) = { t }

CH4.82
Example 2
Compute Follow for: E  TE’
E’  + TE’ | 
T  FT’
T’  * FT’ | 
F  ( E ) | id

CH4.83
Example 2
Compute Follow for: E  TE’
E’  + TE’ | 
T  FT’
T’  * FT’ | 
F  ( E ) | id

First Follow
E ( id E $)
E’ + E’ $)
T ( id T +$)
T’ * T’ +$)
F ( id F +*$)

CH4.84
Constructing Parsing Table

Algorithm:
The table has one row per non-terminal / one column per
terminal (incl. $ ).
1. Repeat steps 2 & 3 for each rule A → α
2. Terminal a in First(α)? Add A → α to M[A, a]
3.1 ε in First(α)? Add A → α to M[A, b] for all
    terminals b in Follow(A).
3.2 ε in First(α) and $ in Follow(A)? Add A → α to
    M[A, $]
4. All undefined entries are errors.

CH4.85
Constructing Parsing Table – Example 1
S  i E t SS’ | a First(S) = { i, a } Follow(S) = { e, $ }
S’  eS |  First(S’) = { e,  } Follow(S’) = { e, $ }
E b First(E) = { b } Follow(E) = { t }

CH4.86
Constructing Parsing Table – Example 1
S  i E t SS’ | a First(S) = { i, a } Follow(S) = { e, $ }
S’  eS |  First(S’) = { e,  } Follow(S’) = { e, $ }
E b First(E) = { b } Follow(E) = { t }
S  i E t SS’ Sa Eb
First(i E t SS’)={i} First(a) = {a} First(b) = {b}

S’  eS S
First(eS) = {e} First() = {} Follow(S’) = { e, $ }

Non- INPUT SYMBOL


terminal a b e i t $
S S a S iEtSS’
S’ S’ 
S’ eS S 
E E b
CH4.87
Constructing Parsing Table – Example 2
E  TE’ First(E,F,T) = { (, id } Follow(E,E’) = { ), $}
E’  + TE’ | 
T  FT’ First(E’) = { +,  } Follow(F) = { *, +, ), $ }
T’  * FT’ |  First(T’) = { *,  } Follow(T,T’) = { +, ) , $}
F  ( E ) | id

CH4.88
Constructing Parsing Table – Example 2
E  TE’ First(E,F,T) = { (, id } Follow(E,E’) = { ), $}
E’  + TE’ | 
T  FT’ First(E’) = { +,  } Follow(F) = { *, +, ), $ }
T’  * FT’ |  First(T’) = { *,  } Follow(T,T’) = { +, ) , $}
F  ( E ) | id

Expression Example: E  TE’ : First(TE’) = First(T) = { (, id }


M[E, ( ] : E  TE’
by rule 2
M[E, id ] : E  TE’
(by rule 2) E’  +TE’ : First(+TE’) = + : M[E’, +] : E’  +TE’
(by rule 3) E’   :  in First( ) T’   :  in First( )
M[E’, )] : E’   (3.1) M[T’, +] : T’   (3.1)
M[E’, $] : E’   (3.2) M[T’, )] : T’   (3.1)
(Due to Follow(E’) M[T’, $] : T’   (3.2)

CH4.89
LL(1) Grammars
L : Scan input from Left to right
L : Construct a Leftmost derivation
1 : Use “1” input symbol as lookahead in conjunction
    with the stack to decide on the parsing action
LL(1) grammars == those with no multiply-defined
entries in the parsing table.
Properties of LL(1) grammars:
• The grammar can’t be ambiguous or left recursive
• The grammar is LL(1) when, for each pair A → α | β :
  1. α & β do not derive strings starting with the
     same terminal a
  2. Either α or β can derive ε, but not both.
Note: It may not be possible to manipulate a grammar
into an LL(1) grammar.
CH4.90
Error Recovery
When Do Errors Occur? Recall Predictive Parser Function:

[Predictive parser model, as before: input a + b $; stack X Y Z $;
parsing program consulting table M[A,a]; producing output]

1. If X is a terminal and it doesn’t match the input.
2. If M[ X, Input ] is empty – no allowable actions.
Consider two recovery techniques:
A. Panic Mode
B. Phrase-level Recovery
B. Phrase-level Recovery CH4.91
Panic-Mode Recovery
 Assume a non-terminal on the top of the stack.
 Idea: skip symbols on the input until a token in a selected
set of synchronizing tokens is found.
 The choice for a synchronizing set is important.
 some ideas:
 define the synchronizing set of A to be FOLLOW(A).
then skip input until a token in FOLLOW(A) appears
and then pop A from the stack. Resume parsing...
 add symbols of FIRST(A) into synchronizing set. In
this case we skip input and once we find a token in
FIRST(A) we resume parsing from A.
 Productions that lead to  if available might be used.
 If a terminal appears on top of the stack and does not match
the input == pop it and continue parsing (issuing an
error message saying that the terminal was inserted).

CH4.92
Panic Mode Recovery, II

General Approach: Modify the empty cells of the Parsing Table.


1. if M[A,a] = {empty} and a belongs to Follow(A) then we set
M[A,a] = “synch”
Error-recovery Strategy :
If A=top-of-the-stack and a=current-input,
1. If A is NT and M[A,a] = {empty} then skip a from the input.
2. If A is NT and M[A,a] = {synch} then pop A.
3. If A is a terminal and A!=a then pop token (essentially inserting
it).

CH4.93
Revised Parsing Table / Example

Non-        INPUT SYMBOL
terminal   id        +          *          (        )        $
E          E→TE’                           E→TE’    synch    synch
E’                   E’→+TE’                        E’→ε     E’→ε
T          T→FT’     synch                 T→FT’    synch    synch
T’                   T’→ε       T’→*FT’             T’→ε     T’→ε
F          F→id      synch      synch      F→(E)    synch    synch

“synch” action (from Follow sets): pop the top-of-stack NT.
Blank entries: skip the input symbol.

CH4.94
Revised Parsing Table / Example(2)
STACK INPUT Remark
$E + id * + id$ error, skip +
$E id * + id$
$E’T id * + id$
$E’T’F id * + id$
$E’T’id id * + id$ Possible
$E’T’ * + id$ Error Msg:
$E’T’F* * + id$ “Misplaced +
I am skipping it”
$E’T’F + id$ error, M[F,+] = synch
$E’T’ + id$ F has been popped
$E’ + id$
$E’T+ + id$
$E’T id$
$E’T’F id$ Possible
$E’T’id id$ Error Msg:
$E’T’ $ “Missing Term”
$E’ $
$ $

CH4.95
Writing Error Messages
 Keep input counter(s)
 Recall: every non-terminal symbolizes an abstract language
construct.
 Examples of Error-messages for our usual grammar
 E = means expression.
 top-of-stack is E, input is +
“Error at location i, expressions cannot start with a ‘+’” or
“error at location i, invalid expression”
 Similarly for E, *
 E’= expression ending.
 Top-of-stack is E’, input is * or id
“Error: expression starting at j is badly formed at location i”
 Requires: every time you pop an ‘E’ remember the location

CH4.96
Writing Error-Messages, II
 Messages for Synch Errors.
 Top-of-stack is F input is +
 “error at location i, expected
summation/multiplication term missing”
 Top-of-stack is E input is )
 “error at location i, expected expression missing”

CH4.97
Writing Error Messages, III
 When the top-of-the stack is a terminal that does
not match…
 E.g. top-of-stack is id and the input is +
 “error at location i: identifier expected”
 Top-of-stack is ) and the input is terminal other
than )
 Every time you match an ‘(‘
push the location of ‘(‘ to a “left parenthesis” stack.
– this can also be done with the symbol stack.
 When the mismatch is discovered look at the left
parenthesis stack to recover the location of the
parenthesis.
 “error at location i: left parenthesis at location m has
no closing right parenthesis”
– E.g. consider ( id * + (id id) $
CH4.98
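A sketch of the “left parenthesis stack” bookkeeping described above, using token positions as stand-in source locations. The function name and message wording are ours; in a real parser the pushes would happen as `(` tokens are matched.

```python
# Remember where each '(' was seen so a later mismatch can name
# its location, as in the slide's example  ( id * + (id id) $ .
def check_parens(tokens):
    opens, msgs = [], []
    for i, t in enumerate(tokens):
        if t == "(":
            opens.append(i)          # push location of '('
        elif t == ")":
            if opens:
                opens.pop()          # matched: discard its location
            else:
                msgs.append(f"error at location {i}: unmatched ')'")
    for m in opens:                  # every leftover '(' is unclosed
        msgs.append(f"error: left parenthesis at location {m} "
                    f"has no closing right parenthesis")
    return msgs

msgs = check_parens(["(", "id", "*", "+", "(", "id", "id", ")", "$"])
```

For the slide's token stream, the inner pair matches and the report names the outer `(` at location 0.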
Incorporating Error-Messages to the Table
 Empty parsing table entries can now fill with the
appropriate error-reporting techniques.

CH4.99
Phrase-Level Recovery

• Fill in blank entries of the parsing table with error-
handling routines that do not only report errors but may
also:
• change / insert / delete symbols in the stack and/or
the input stream
• and issue an error message
• Problems:
• Modifying stack has to be done with care, so as to
not create possibility of derivations that aren’t in
language
• infinite loops must be avoided
• Essentially extends panic mode to have more complete
error handling

CH4.100
How Would You Implement a TD Parser?
• Stack – Easy to handle. Write ADT to manipulate its contents
• Input Stream – Responsibility of lexical analyzer
• Key Issue – How is parsing table implemented ?
One approach: Assign unique IDs

Non-                          INPUT SYMBOL
terminal   id        +         *         (        )       $
E          E→TE’     synch               E→TE’    synch   synch
E’                   E’→+TE’                      E’→ε    E’→ε
T          T→FT’     synch               T→FT’    synch   synch
T’                   T’→ε     T’→*FT’             T’→ε    T’→ε
F          F→id      synch    synch     F→(E)     synch   synch

All rules have unique IDs; ditto for synch actions; also for the
blanks, which handle errors.
CH4.101
Revised Parsing Table:

Non-             INPUT SYMBOL
terminal   id    +     *     (     )     $
E          1     18    19    1     9     10
E’         20    2     21    22    3     3
T          4     11    23    4     12    13
T’         24    6     5     25    6     6
F          8     14    15    7     16    17

1: E → TE’       5: T’ → *FT’
2: E’ → +TE’     6: T’ → ε       9 – 17 : synch actions
3: E’ → ε        7: F → (E)      18 – 25 : error handlers
4: T → FT’       8: F → id
CH4.102
Revised Parsing Table: (2)

Each # ( or set of #s) corresponds to a procedure that:


• Uses Stack ADT
• Gets Tokens
• Prints Error Messages
• Prints Diagnostic Messages
• Handles Errors

CH4.103
How is Parser Constructed ?
One large CASE statement:
state = M[ top(s), current_token ]
switch (state)
{
case 1: proc_E_TE’( ) ;
break ;
…
case 8: proc_F_id( ) ;
break ;
case 9: proc_sync_9( ) ;     /* some sync actions may be the
break ;                         same; combine and put them
…                               in another switch */
case 17: proc_sync_17( ) ;
break ;
case 18:
…                            /* procs to handle errors; some
case 25:                        error handlers may be similar */
}
CH4.104
Final Comments – Top-Down Parsing

So far,
• We’ve examined grammars and language theory and
its relationship to parsing
• Key concepts: Rewriting grammar into an acceptable
form
• Examined Top-Down parsing:
Brute Force : Transition diagrams & recursion
Elegant : Table driven
• We’ve identified its shortcomings:
Not all grammars can be made LL(1) !
• Bottom-Up Parsing - Future
CH4.105
Bottom Up Parsing

CH4.106
Bottom Up Parsing
 “Shift-Reduce” Parsing
 Reduce a string to the start symbol of the grammar.
 At every step a particular substring is matched (in
left-to-right fashion) to the right side of some
production and then it is substituted by the non-
terminal in the left hand side of the production.
Consider:
S → aABe
A → Abc | b
B → d

Reduction sequence:
abbcde
aAbcde
aAde
aABe
S

Rightmost derivation (the reverse of the reductions):
S ⇒ aABe ⇒ aAde ⇒ aAbcde ⇒ abbcde
CH4.107
Handles
 Handle of a string = a substring that matches the
RHS of some production AND whose reduction to
the non-terminal on the LHS is a step along the
reverse of some rightmost derivation.
 Formally: a handle of a right sentential form γ
is a pair <A → β, location of β in γ>
that satisfies the above property.
 i.e. A → β is a handle of αβw at the location
immediately after the end of α, if:
S ⇒*rm αAw ⇒rm αβw
 A certain sentential form may have many different handles.
 Right sentential forms of a non-ambiguous grammar
have one unique handle [but many substrings that
potentially look like handles!].
CH4.108
Example

Consider:
S → aABe
A → Abc | b
B → d

S ⇒ aABe ⇒ aAde ⇒ aAbcde ⇒ abbcde

It follows that:
S → aABe is a handle of aABe at location 1.
B → d is a handle of aAde at location 3.
A → Abc is a handle of aAbcde at location 2.
A → b is a handle of abbcde at location 2.

CH4.109
Example, II

Grammar:
S → aABe
A → Abc | b
B → d
Consider aAbcde (it is a right sentential form).
Is [A → b, at the b in aAbcde] a handle?
If it is, then there must be a derivation
S ⇒rm … ⇒rm aAAcde ⇒rm aAbcde

but there is no way ever to get two consecutive
A’s in this grammar ⇒ impossible.

CH4.110
Example, III

Grammar:
S → aABe
A → Abc | b
B → d
Consider aAbcde (it is a right sentential form).
Is [B → d, at the d in aAbcde] a handle?
If it is, then there must be a derivation
S ⇒rm … ⇒rm aAbcBe ⇒rm aAbcde
so we would have to obtain aAbcBe, but it is not a right
sentential form:
S ⇒rm aABe ⇒?? aAbcBe (a rightmost derivation must expand
B before A can ever be rewritten to Abc).

CH4.111
Handle Pruning
 A rightmost derivation in reverse can be obtained
by “handle-pruning.”
 Apply this to the previous example.
S  aABe
A  Abc | b
Bd

abbcde Ab
aAbcde A  Abc
aAde Bd
aABe S  aABe
S

CH4.112
Handle Pruning, II
 Consider the cut of a parse tree of a certain right
sentential form:
[Figure: a triangle rooted at S; the frontier splits into a
left part, the handle, and, to the right of the handle, only
terminals. The left part together with the handle forms a
viable prefix.]
CH4.113
Shift Reduce Parsing with a Stack
 The “big” problem : given the sentential form
locate the handle
 General Idea for S-R parsing using a stack:
1. “shift” input symbols onto the stack until a
handle is found on top of it.
2. “reduce” the handle to the corresponding non-
terminal.
(other operations: “accept” when the input is
consumed and only the start symbol is on the stack,
also: “error”).
 Viable prefix: prefix of a right sentential form that
appears on the stack of a Shift-Reduce parser.
CH4.114
What happens with ambiguous grammars
Consider:
E → E+E | E*E
  | ( E ) | id
Derive id+id*id
by two different rightmost
derivations.

CH4.115
Example
STACK    INPUT           Remark               E → E+E | E*E
$        id + id * id$   Shift                  | ( E ) | id
$id      + id * id$      Reduce by E → id
$E       + id * id$      …

CH4.116
Conflicts
 Conflicts [appear in ambiguous grammars]:
either “shift/reduce” or “reduce/reduce”

 Another Example:

stmt → if expr then stmt
     | if expr then stmt else stmt
     | other (any other statement)

Stack: … if expr then stmt     Input: else …
⇒ Shift/Reduce conflict

CH4.117
More Conflicts
stmt  id ( parameter-list )
stmt  expr := expr
parameter-list  parameter-list , parameter | parameter
parameter  id
expr-list  expr-list , expr | expr
expr  id | id ( expr-list )

Consider the string A(I,J)


Corresponding token stream is id(id, id)
After three shifts:
Stack = id(id Input = , id)
Reduce/Reduce Conflict … what to do?
(it really depends on what is A,
an array? or a procedure? CH4.118
Removing Conflicts
 One way is to manipulate grammar.
 cf. what we did in the top-down approach to
transform a grammar so that it is LL(1).
 Nevertheless:
 We will see that shift/reduce and reduce/reduce
conflicts can be best dealt with after they are
discovered.
 This simplifies the design.

CH4.119
Introduction to LR Parsing

CH4.120
Example
Consider:
S  aABe
A  Abc | b
Bd
Rightmost Derivation of the string abbcde:
S  aABe  aAde  aAbcde  abbcde
The (unique) handle is underlined for each step.

A viable prefix is
(1) a string that equals a prefix of a right-sentential form up
to (and including) its unique handle.
(2) any prefix of a string that satisfies (1)
Examples: a, aA, aAd, aAbc, ab, aAb,…
Not viable prefixes: aAde, Abc, aAA,…
CH4.121
Shift/Reduce Parser
STACK INPUT Remark
$ abbcde$ SHIFT
$a bbcde$ SHIFT
$ab bcde$ REDUCE
$aA bcde$ SHIFT
$aAb cde$ SHIFT (?)
$aAbc de$ REDUCE
$aA de$ SHIFT
$aAd e$ REDUCE
$aAB e$ SHIFT
$aABe $ REDUCE
$S $ ACCEPT Observe: all
Strings in the
stack are viable
prefixes

CH4.122
When to shift? When to Reduce?
 Sometimes on top of the stack something appears
to be a handle (i.e., matches the RHS of a
production).
 But: maybe we have not shifted enough elements to
identify the (real) handle.
 Observe the correct sequence of Shift and Reduce
steps preserves the property that the stack IS a
viable prefix.
Example
$aAb cde$ Shift or Reduce?
 If we shift we obtain aAbc in the stack.
 Recall that Abc is a handle.
 Instead if we reduce we obtain aAA in the stack.
(this is NOT a viable prefix!!!)
CH4.123
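The dilemma on this slide can be made concrete. A small sketch of a strawman parser (the greedy strategy is ours, not an algorithm from the slides) that always reduces whenever the top of the stack matches some RHS: it turns the b of aAb into A too early, producing the non-viable prefix aAA and never reaching S.

```python
# Greedy "reduce whenever you can" shift-reduce parsing fails here:
# the correct parse must SHIFT at aAb (to build Abc), not reduce b.
RULES = [("S", "aABe"), ("A", "Abc"), ("A", "b"), ("B", "d")]

def greedy_parse(s):
    stack, i = "", 0
    while True:
        reduced = True
        while reduced:                    # reduce as long as any RHS
            reduced = False               # matches the top of the stack
            for head, rhs in RULES:
                if stack.endswith(rhs):
                    stack = stack[:-len(rhs)] + head
                    reduced = True
                    break
        if i == len(s):
            return stack
        stack += s[i]                     # otherwise shift one symbol
        i += 1

result = greedy_parse("abbcde")           # gets stuck, never reduces to S
```

The run dead-ends at `aAAcBe`, illustrating why shift/reduce decisions need more information than “a handle-looking substring is on top”.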
When to Shift? When to Reduce? II
 In order to make shift/reduce decisions:
 We need to look to perhaps a few elements inside the
stack.
 We need to make sure that the way we modify the stack
preserves the “viable prefix condition.”
 For our previous example:
 Any b appearing to the right of “A” should not be
reduced.
 In fact we can come up with heuristic decisions based
on the grammar structure:
 A “b” is reduced only if it is to the right of “a”
 PROBLEM: what kind of information do we need to store
inside the stack so that we can make decisions as above just
by looking at the top element?

CH4.124
LR Parsing
 LR (left-to-right, rightmost derivation).
 LR(1) = 1 lookahead symbol.
 Use stack
 Stack should contain “more information” (in a
compressed form) compared to a Top-Down Table-
driven parser.
 LR(1):
 Decisions are taken looking at the top of the
stack + 1 input element.

CH4.125
Anatomy of an LR parser
[Figure: anatomy of an LR parser. Input = a string of tokens,
e.g. a + b, followed by the terminator $. A stack holds
alternating “states” s and grammar symbols X (both non-terminals
and terminals of the CFG). The LR Parsing Program consults a
parsing table with two parts, action[.,.] and goto[.,.], to
decide what action the parser should take based on the stack and
the input, producing output as it goes.]

General parser behavior: s = top-of-stack state, a = current input.

1. If action[s,a] = “accept”: halt, accept, success.
2. If action[s,a] = “reduce by production A → β”, do the following:
2a. Pop 2·|β| elements from the stack.
2b. Push A.
2c. Push goto[s*,A], where s* is the state now exposed on top.
3. If action[s,a] = “shift and goto state s*”:
Shift; push s*.
CH4.126
Example
1. S  aABe
action goto
2. A  Abc
3. A  b a b c d e $ S A B
4. B  d
0 s1 9
1 s3 2
2 s4 s8 5
3 r3 r3
4 s6
5 s7
6 r2 r2
7 r1
8 r4
9 acc
CH4.127
Example, II
STACK INPUT Remark
$0 abbcde$

CH4.128
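The empty trace on the previous slide can be filled in mechanically. A sketch of the LR driver loop, hard-wired with the table shown above; here the stack holds only states (popping |β| states on a reduce is equivalent to the slides' 2·|β| pops of interleaved symbols and states).

```python
# LR driver for  S -> aABe, A -> Abc | b, B -> d  with the table above.
ACTION = {
    (0, "a"): ("s", 1), (1, "b"): ("s", 3),
    (2, "b"): ("s", 4), (2, "d"): ("s", 8),
    (3, "b"): ("r", 3), (3, "d"): ("r", 3),
    (4, "c"): ("s", 6), (5, "e"): ("s", 7),
    (6, "b"): ("r", 2), (6, "d"): ("r", 2),
    (7, "$"): ("r", 1), (8, "e"): ("r", 4),
    (9, "$"): ("acc",),
}
GOTO = {(0, "S"): 9, (1, "A"): 2, (2, "B"): 5}
RULES = {1: ("S", 4), 2: ("A", 3), 3: ("A", 1), 4: ("B", 1)}  # head, |body|

def lr_parse(tokens):
    states, i, trace = [0], 0, []
    while True:
        act = ACTION.get((states[-1], tokens[i]))
        if act is None:
            return trace + ["error"]       # blank table cell
        if act[0] == "acc":
            return trace + ["accept"]
        if act[0] == "s":                  # shift: consume token, push state
            states.append(act[1]); i += 1
            trace.append(f"shift {act[1]}")
        else:                              # reduce by rule k: A -> beta
            head, n = RULES[act[1]]
            del states[-n:]                # pop |beta| states
            states.append(GOTO[(states[-1], head)])
            trace.append(f"reduce {act[1]}")

trace = lr_parse(list("abbcde") + ["$"])
```

The trace it emits mirrors the shift/reduce sequence of the earlier hand-run example on abbcde.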
Interesting Fact + LR Parsing Table
Construction Methods
HOW TO CONSTRUCT SUCH TABLES?
 The set of all viable prefixes is Regular.
 It is possible to write a DFA that recognizes it!
 Use the DFA as an aid to construction of the table.

Design Methodologies:
 SLR (simple LR)
“short table but limited methodology.”
 Canonical LR
“general methodology but big table.”
 LALR (lookahead LR)
“in between”

CH4.129
SLR Parsing

CH4.130
Items
 SLR (Simple LR parsing)
 DEF An LR(0) item is a production with a “marker” (dot).
E.g. S → aA.Be
intuition: it indicates how much of a certain production
we have seen already (up to the point of the marker)
 CENTRAL IDEA OF SLR PARSING: construct a DFA
that recognizes viable prefixes of the grammar.
 Intuition: Shift/Reduce actions can be decided based on
this DFA (what we have seen so far & what are our next
options).
 Use “LR(0) Items” for the creation of this DFA.

CH4.131
Basic Operations
 Augmented Grammar: add a new start symbol E’ with the
production E’ → E.

Original:                Augmented:
E → E+T | T              E’ → E
T → T*F | F              E → E+T | T
F → ( E ) | id           T → T*F | F
                         F → ( E ) | id

CLOSURE OPERATION of a set of items:

function closure(I)
{ J := I;
  repeat
    for each item A → α.Bβ in J and each production
    B → γ of G such that B → .γ is not in J: add B → .γ to J
  until no more items can be added to J;
  return J
}
EXAMPLE: consider I = { E’ → .E }
CH4.132
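The closure function above can be run directly. A sketch in Python for the augmented expression grammar, with items represented as (head, body, dot-position) triples (our encoding):

```python
# closure() for LR(0) items of the augmented expression grammar.
RULES = [("E'", ("E",)),
         ("E", ("E", "+", "T")), ("E", ("T",)),
         ("T", ("T", "*", "F")), ("T", ("F",)),
         ("F", ("(", "E", ")")), ("F", ("id",))]
NONTERMS = {"E'", "E", "T", "F"}

def closure(items):
    items = set(items)
    changed = True
    while changed:                      # repeat until no item can be added
        changed = False
        for head, body, dot in list(items):
            if dot < len(body) and body[dot] in NONTERMS:
                for h, b in RULES:     # add B -> .gamma for each B-rule
                    if h == body[dot] and (h, b, 0) not in items:
                        items.add((h, b, 0))
                        changed = True
    return frozenset(items)

I0 = closure({("E'", ("E",), 0)})       # closure of { E' -> .E }
```

Starting from the single item E’ → .E, the fixed point contains the seven items of the familiar state I0.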
GOTO function
 Definition.
Goto(I,X) = closure of the set of all items
A → αX.β where A → α.Xβ belongs to I.
 Intuitively: Goto(I,X) = the set of all items
“reachable” from the items of I once X has been
“seen.”
 E.g. consider I = { E’ → E. , E → E.+T } and compute
Goto(I,+):

Goto(I,+) = { E → E+.T , T → .T * F , T → .F ,
F → .( E ) , F → .id }

CH4.133
The Canonical Collections of Items for G

Procedure Items(G’ : augmented grammar)
{ C := { closure({ S’ → .S }) };
  repeat
    for each set of items I in C and each
    grammar symbol X
    such that goto(I,X) is not empty and not in C:
      do add goto(I,X) to C
  until no more sets of items can be added to C
}

For the grammar
E’ → E
E → E+T | T
T → T*F | F
F → ( E ) | id

I0 = { E’ → .E , E → .E + T , E → .T ,
       T → .T * F , T → .F , F → .( E ) , F → .id }
I1 = goto(I0,E) = { E’ → E. , E → E. + T }
I2 = goto(I0,T) = { E → T. , T → T. * F }
… up to I11.
CH4.134
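Putting goto and the Items procedure together for the augmented expression grammar; for this grammar the construction is known to yield the twelve states I0..I11. A sketch, with the same (head, body, dot) item encoding as before (ours):

```python
# Canonical collection of sets of LR(0) items, via a worklist.
RULES = [("E'", ("E",)),
         ("E", ("E", "+", "T")), ("E", ("T",)),
         ("T", ("T", "*", "F")), ("T", ("F",)),
         ("F", ("(", "E", ")")), ("F", ("id",))]
NONTERMS = {"E'", "E", "T", "F"}
SYMBOLS = {"E", "T", "F", "+", "*", "(", ")", "id"}

def closure(items):
    items = set(items)
    while True:
        new = {(h, b, 0)
               for head, body, dot in items
               if dot < len(body) and body[dot] in NONTERMS
               for h, b in RULES if h == body[dot]}
        if new <= items:
            return frozenset(items)
        items |= new

def goto(items, X):                      # advance the dot over X, then close
    return closure({(h, b, d + 1) for h, b, d in items
                    if d < len(b) and b[d] == X})

def canonical_collection():
    start = closure({("E'", ("E",), 0)})  # I0
    states, work = {start}, [start]
    while work:                           # add goto(I,X) until no change
        I = work.pop()
        for X in SYMBOLS:
            J = goto(I, X)
            if J and J not in states:
                states.add(J)
                work.append(J)
    return states

C = canonical_collection()
```

The resulting set of states is exactly the state set of the viable-prefix DFA on the previous slide.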
The DFA For Viable Prefixes
 States = Canonical Collection of Sets of Items
 Transitions defined by the Goto Function.
 All states final except I0
I0 --E--> I1 --+--> I6 --T--> I9 --*--> I7
I0 --F--> I3
… (look p. 226 of the text for the full DFA)

Intuition: imagine an NFA whose states are all the items
in the grammar, with transitions of the form:
“A → α.Xβ” goes to “A → αX.β” with an arrow
labeled “X”.
Then the closure used in the goto function
essentially transforms this NFA into the DFA above.
CH4.135
Example
 S’  S
 S  aABe Start with I0 = closure(S’ .S)
 A  Abc
 A b
 Bd

CH4.136
Example, II
E’  E
EE+T | T
TT*F | F
F  ( E ) | id

start with I0 = closure(E’  E)

CH4.137
Relation to Parsing
 An item A → β1.β2 is valid for a viable prefix
αβ1 if there is a rightmost derivation that yields
αAw and in one step yields αβ1β2w.
 An item will be valid for many viable prefixes.
 Whether a certain item is valid for a certain viable
prefix helps in our decision whether to shift or
reduce when αβ1 is on the stack:
 If β2 ≠ ε, it looks like we still need to shift.
 If β2 = ε, it looks like we should reduce A → β1.
 Not a total solution, since two valid items may
tell us different things.
CH4.138
Sanity Check
 E+T* is a viable prefix (and the DFA will be at
state I7 after reading it)
 Indeed: E’ ⇒ E ⇒ E+T ⇒ E+T*F is a rightmost
derivation, T*F is the handle of E+T*F, thus
E+T*F is a viable prefix, thus E+T* is also.
 Examine state I7 … it contains
T → T*.F
F → .(E)
F → .id
 i.e., precisely the items valid for E+T*:
E’ ⇒ E ⇒ E+T ⇒ E+T*F
E’ ⇒ E ⇒ E+T ⇒ E+T*F ⇒ E+T*(E)
E’ ⇒ E ⇒ E+T ⇒ E+T*F ⇒ E+T*id
 There are no other valid items for the viable
prefix E+T*
CH4.139
SLR Parsing Table Construction

Input: the augmented grammar G’


Output: The SLR Parsing table functions ACTION & GOTO

1. Construct C = {I0,…,In}, the collection of sets of LR(0) items for G’.
2. “State i” is constructed from Ii:
   If [A → α.aβ] is in Ii and goto(Ii,a) = Ik, then we set
   ACTION[i,a] to be “shift k” (a is a terminal).
   If [A → α.] is in Ii, then we set ACTION[i,a] to “reduce A → α”
   for all a in Follow(A) --- (note: A is not S’).
   If [S’ → S.] is in Ii, then we set ACTION[i,$] = accept.
3. The goto transitions for state i are constructed as follows:
   for all A, if goto(Ii,A) = Ik then goto[i,A] = k.
4. All entries not defined by rules (2) and (3) are made “error”.
5. The initial state of the parser is the one constructed from the
   set of items I0.
CH4.140
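Steps 1-5 above can be run end-to-end on the slide grammar S’ → S, S → (L) | x, L → S | L , S (no ε-productions, so FIRST and FOLLOW are simple fixed points). A sketch; the rule numbering matches the slides, and the nine states it finds correspond to the numbered automaton built on the following slides, though our state indices may differ.

```python
# SLR(1) table construction + driver for the slide grammar.
RULES = [("S'", ("S",)),          # 0 (augmenting production)
         ("S", ("(", "L", ")")),  # 1
         ("S", ("x",)),           # 2
         ("L", ("S",)),           # 3
         ("L", ("L", ",", "S"))]  # 4
NT = {"S'", "S", "L"}
SYMS = {"S", "L", "(", ")", "x", ","}

def closure(I):
    I = set(I)
    while True:
        new = {(h, b, 0) for hd, bd, d in I
               if d < len(bd) and bd[d] in NT
               for h, b in RULES if h == bd[d]}
        if new <= I:
            return frozenset(I)
        I |= new

def goto(I, X):
    return closure({(h, b, d + 1) for h, b, d in I
                    if d < len(b) and b[d] == X})

# FIRST and FOLLOW by fixed point (valid here: no epsilon rules).
FIRST = {s: {s} for s in SYMS if s not in NT}
FIRST.update({A: set() for A in NT})
changed = True
while changed:
    changed = False
    for h, b in RULES:
        if not FIRST[b[0]] <= FIRST[h]:
            FIRST[h] |= FIRST[b[0]]; changed = True
FOLLOW = {A: set() for A in NT}
FOLLOW["S'"].add("$")
changed = True
while changed:
    changed = False
    for h, b in RULES:
        for k, X in enumerate(b):
            if X in NT:
                add = FIRST[b[k + 1]] if k + 1 < len(b) else FOLLOW[h]
                if not add <= FOLLOW[X]:
                    FOLLOW[X] |= add; changed = True

# Step 1: canonical LR(0) collection (worklist over a growing list).
states = [closure({("S'", ("S",), 0)})]
for I in states:
    for X in SYMS:
        J = goto(I, X)
        if J and J not in states:
            states.append(J)

# Steps 2-4: fill ACTION and GOTO; absent cells are errors.
ACTION, GOTO = {}, {}
for i, I in enumerate(states):
    for h, b, d in I:
        if d < len(b):
            X = b[d]
            j = states.index(goto(I, X))
            if X in NT:
                GOTO[(i, X)] = j
            else:
                ACTION[(i, X)] = ("s", j)
        elif h == "S'":
            ACTION[(i, "$")] = ("acc",)
        else:
            for a in FOLLOW[h]:
                ACTION[(i, a)] = ("r", RULES.index((h, b)))

def parse(tokens):
    st, k = [0], 0
    while True:
        act = ACTION.get((st[-1], tokens[k]))
        if act is None:
            return False
        if act[0] == "acc":
            return True
        if act[0] == "s":
            st.append(act[1]); k += 1
        else:                             # reduce: pop |body|, take goto
            h, b = RULES[act[1]]
            del st[-len(b):]
            st.append(GOTO[(st[-1], h)])

ok = parse(["(", "x", ",", "x", ")", "$"])
```

This grammar is SLR(1), so no table cell is multiply defined, and the driver accepts the slides' running example ( x , x ).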
Example.
I0 = { E’ → .E , E → .E + T , E → .T ,
       T → .T * F , T → .F , F → .( E ) , F → .id }
I1 = Goto(I0,E) = { E’ → E. , E → E. + T }
I2 = Goto(I0,T) = { E → T. , T → T. * F }
I4 = Goto(I0,() = { F → (.E) , E → .E + T , E → .T ,
       T → .T * F , T → .F , F → .( E ) , F → .id }

Since F → .( E ) is in I0 and Goto(I0,() = I4,
we set ACTION(0,() = s4.

Since E’ → E. is in I1,
we set ACTION(1,$) = acc.

Since E → T. is in I2 and Follow(E) = { $, +, ) },
we set ACTION(2,$) = r(E → T)
       ACTION(2,+) = r(E → T)
       ACTION(2,)) = r(E → T)

Follow(T) = Follow(F) = { ) , + , * , $ }
CH4.141
Construct the whole table…
 (The SLR table for this grammar has no multiply defined entries.)

CH4.142
LR(0) parsing : Another Example

 each state in the automaton represents a collection of


LR(0) items:
 an item is a rule from the grammar combined with “.”
to indicate where the parser currently is in the input
 eg: S’ ::= . S $ indicates that the parser is just beginning to
parse this rule and it expects to be able to parse S then $ next
 A whole automaton state looks like this:

1
collection of
S’ ::= . S $
LR(0) items
S ::= . ( L )
state number S ::= . x

• LR(1) states look very similar, it is just that the items contain some look-ahead info
CH4.143
LR(0) parsing

 To construct states, we begin with a particular LR(0) item


and construct its closure
 the closure adds more items to a set when the “.”
appears to the left of a non-terminal
 if the state includes X ::= s . Y s’ and Y ::= t is a rule
then the state also includes Y ::= . t

Grammar:

0. S’ ::= S $ 1
• S ::= ( L )
• S ::= x S’ ::= . S $
• L ::= S
• L ::= L , S

CH4.144
LR(0) parsing

 To construct states, we begin with a particular LR(0) item


and construct its closure
 the closure adds more items to a set when the “.”
appears to the left of a non-terminal
 if the state includes X ::= s . Y s’ and Y ::= t is a rule
then the state also includes Y ::= . t

Grammar:

0. S’ ::= S $ 1
• S ::= ( L ) S’ ::= . S $
• S ::= x S ::= . ( L )
• L ::= S
• L ::= L , S

CH4.145
LR(0) parsing

 To construct states, we begin with a particular LR(0) item


and construct its closure
 the closure adds more items to a set when the “.”
appears to the left of a non-terminal
 if the state includes X ::= s . Y s’ and Y ::= t is a rule
then the state also includes Y ::= . t

Grammar:

0. S’ ::= S $ 1
• S ::= ( L ) S’ ::= . S $
• S ::= x S ::= . ( L ) Full
• L ::= S S ::= . x Closure
• L ::= L , S

CH4.146
LR(0) parsing
 To construct an LR(0) automaton:
 start with start rule & compute initial state with
closure
 pick one of the items from the state and move “.”
to the right one symbol (as if you have just
parsed the symbol)
this creates a new item ...
... and a new state when you compute the closure of
the new item
mark the edge between the two states with:
– a terminal T, if you moved “.” over T
– a non-terminal X, if you moved “.” over X
 continue until there are no further ways to move “.” across items
and generate new states or new edges in the automaton
CH4.147
Grammar:

0. S’ ::= S $
• S ::= ( L )
• S ::= x
• L ::= S
• L ::= L , S

S’ ::= . S $
S ::= . ( L )
S ::= . x

CH4.148
Grammar:

0. S’ ::= S $
• S ::= ( L )
• S ::= x
• L ::= S
• L ::= L , S

S’ ::= . S $
S ::= . ( L )
S ::= . x

S’ ::= S . $

CH4.149
Grammar:

0. S’ ::= S $
• S ::= ( L )
• S ::= x
• L ::= S
• L ::= L , S
(
S’ ::= . S $ S ::= ( . L )
S ::= . ( L ) L ::= . S
S ::= . x L ::= . L , S
S ::= . ( L )
S S ::= . x

S’ ::= S . $

CH4.150
Grammar:

0. S’ ::= S $
• S ::= ( L )
• S ::= x
• L ::= S
• L ::= L , S
(
S’ ::= . S $ S ::= ( . L )
S ::= . ( L ) L ::= . S
S ::= . x L ::= . L , S
S ::= . ( L )
S S ::= . x

S’ ::= S . $

CH4.151
Grammar:

0. S’ ::= S $
• S ::= ( L )
• S ::= x
• L ::= S
• L ::= L , S
(
S’ ::= . S $ S ::= ( . L )
S ::= . ( L ) L ::= . S
S ::= . x L ::= . L , S
S ::= . ( L )
S S ::= . x

S’ ::= S . $

CH4.152
Grammar:

0. S’ ::= S $
• S ::= ( L )
• S ::= x
• L ::= S
• L ::= L , S
(
S’ ::= . S $ S ::= ( . L )
S ::= . ( L ) L ::= . S
S ::= . x L ::= . L , S
S ::= . ( L ) L S ::= ( L . )
S ::= . x L ::= L . , S
S

S’ ::= S . $

CH4.153
Grammar:

0. S’ ::= S $
• S ::= ( L )
• S ::= x
• L ::= S
• L ::= L , S
(
S’ ::= . S $ S ::= ( . L )
S ::= . ( L ) L ::= . S
S ::= . x L ::= . L , S
S ::= . ( L ) L S ::= ( L . )
S ::= . x L ::= L . , S
S

S
S’ ::= S . $
L ::= S .

CH4.154
Grammar:

0. S’ ::= S $
• S ::= ( L )
• S ::= x S ::= x .
• L ::= S
• L ::= L , S x
(
S’ ::= . S $ S ::= ( . L )
S ::= . ( L ) L ::= . S
S ::= . x L ::= . L , S
S ::= . ( L ) L S ::= ( L . )
S ::= . x L ::= L . , S
S

S
S’ ::= S . $
L ::= S .

CH4.155
Grammar:

0. S’ ::= S $
• S ::= ( L )
• S ::= x S ::= x .
• L ::= S
• L ::= L , S x
(
S’ ::= . S $ S ::= ( . L )
S ::= . ( L ) L ::= . S ,
S ::= . x L ::= . L , S
S ::= . ( L ) L S ::= ( L . )
S ::= . x L ::= L . , S
S
)
S
S’ ::= S . $
S ::= ( L ) .
L ::= S .

CH4.156
Grammar:

0. S’ ::= S $
• S ::= ( L )
• S ::= x S ::= x .
L ::= L , . S
• L ::= S
S ::= . ( L )
• L ::= L , S x
S ::= . x
(
S’ ::= . S $ S ::= ( . L )
S ::= . ( L ) L ::= . S ,
S ::= . x L ::= . L , S
S ::= . ( L ) L S ::= ( L . )
S ::= . x L ::= L . , S
S
)
S
S’ ::= S . $
S ::= ( L ) .
L ::= S .

CH4.157
Grammar:

0. S’ ::= S $
• S ::= ( L )
• S ::= x S ::= x .
L ::= L , . S
• L ::= S S ::= . ( L )
• L ::= L , S x
S ::= . x
(
S’ ::= . S $ S ::= ( . L )
S ::= . ( L ) L ::= . S ,
S ::= . x L ::= . L , S
S ::= . ( L ) L S ::= ( L . )
S ::= . x L ::= L . , S
S
)
S
S’ ::= S . $
S ::= ( L ) .
L ::= S .

CH4.158
L ::= L , S .
Grammar:

0. S’ ::= S $ S
• S ::= ( L )
• S ::= x S ::= x .
L ::= L , . S
• L ::= S
S ::= . ( L )
• L ::= L , S x
S ::= . x
(
S’ ::= . S $ S ::= ( . L )
S ::= . ( L ) L ::= . S ,
S ::= . x L ::= . L , S
S ::= . ( L ) L S ::= ( L . )
S ::= . x L ::= L . , S
S
)
S
S’ ::= S . $
S ::= ( L ) .
L ::= S .

CH4.159
L ::= L , S .
Grammar:

0. S’ ::= S $ S
• S ::= ( L )
• S ::= x S ::= x .
L ::= L , . S
• L ::= S
S ::= . ( L )
• L ::= L , S x
( S ::= . x
(
S’ ::= . S $ S ::= ( . L )
S ::= . ( L ) L ::= . S ,
S ::= . x L ::= . L , S
S ::= . ( L ) L S ::= ( L . )
S ::= . x L ::= L . , S
S
)
S
S’ ::= S . $
S ::= ( L ) .
L ::= S .

CH4.160
Grammar: L ::= L , S .
0. S’ ::= S $
• S ::= ( L ) S
• S ::= x
• L ::= S S ::= x .
x
• L ::= L , S L ::= L , . S
x S ::= . ( L )
x
( S ::= . x
(
S’ ::= . S $ S ::= ( . L )
S ::= . ( L ) L ::= . S ,
S ::= . x L ::= . L , S
S ::= . ( L ) L S ::= ( L . )
S ::= . x L ::= L . , S
S
)
S
S’ ::= S . $
S ::= ( L ) .
L ::= S .

CH4.161
Assigning numbers to states:

Grammar: 9 L ::= L , S .

0. S’ ::= S $ S
• S ::= ( L ) 8
• S ::= x x
2 S ::= x .
• L ::= S L ::= L , . S
x
• L ::= L , S x S ::= . ( L )
( S ::= . x
1 (
S’ ::= . S $ S ::= ( . L )
S ::= . ( L ) L ::= . S ,
S ::= . x ( L ::= . L , S 5
S ::= . ( L ) L S ::= ( L . )
3 S ::= . x L ::= L . , S
S
)
S
4 S’ ::= S . $
6 S ::= ( L ) .
7 L ::= S .

CH4.162
computing parse table

 State i contains X ::= s . $ ==> table[i,$] = a
 State i contains rule k: X ::= s . ==> table[i,T] = rk for all
terminals T
 Transition from i to j marked with terminal T ==> table[i,T] = sj
 Transition from i to j marked with non-terminal X ==> table[i,X] = gj

states    terminal seen next: ID, NUM, := …    non-terminals X, Y, Z …
1
2         sn = shift & goto state n            gn = goto state n
3         rk = reduce by rule k
…         a = accept
          blank = error
CH4.163
9 L ::= L , S .

8 S
x
2 S ::= x .
x L ::= L , . S
x S ::= . ( L )
( S ::= . x
1 (
S’ ::= . S $ S ::= ( . L )
S ::= . ( L ) L ::= . S ,
S ::= . x ( L ::= . L , S 5
S ::= . ( L ) L S ::= ( L . )
3 S ::= . x L ::= L . , S
S
)
S
4 S’ ::= S . $ 7 L ::= S . 6 S ::= ( L ) .

states ( ) x , $ S L
1
2
3
4
...

CH4.164
9 L ::= L , S .

8 S
x
2 S ::= x .
x L ::= L , . S
x S ::= . ( L )
( S ::= . x
1 (
S’ ::= . S $ S ::= ( . L )
S ::= . ( L ) L ::= . S ,
S ::= . x ( L ::= . L , S 5
S ::= . ( L ) L S ::= ( L . )
3 S ::= . x L ::= L . , S
S
)
S
4 S’ ::= S . $ 7 L ::= S . 6 S ::= ( L ) .

states ( ) x , $ S L
1 s3
2
3
4
...

CH4.165
9 L ::= L , S .

8 S
x
2 S ::= x .
x L ::= L , . S
x S ::= . ( L )
( S ::= . x
1 (
S’ ::= . S $ S ::= ( . L )
S ::= . ( L ) L ::= . S ,
S ::= . x ( L ::= . L , S 5
S ::= . ( L ) L S ::= ( L . )
3 S ::= . x L ::= L . , S
S
)
S
4 S’ ::= S . $ 7 L ::= S . 6 S ::= ( L ) .

states ( ) x , $ S L
1 s3 s2
2
3
4
...

CH4.166
9 L ::= L , S .

8 S
x
2 S ::= x .
x L ::= L , . S
x S ::= . ( L )
( S ::= . x
1 (
S’ ::= . S $ S ::= ( . L )
S ::= . ( L ) L ::= . S ,
S ::= . x ( L ::= . L , S 5
S ::= . ( L ) L S ::= ( L . )
3 S ::= . x L ::= L . , S
S
)
S
4 S’ ::= S . $ 7 L ::= S . 6 S ::= ( L ) .

states ( ) x , $ S L
1 s3 s2 g4
2
3
4
...

CH4.167
9 L ::= L , S .

8 S
x
2 S ::= x .
x L ::= L , . S
x S ::= . ( L )
( S ::= . x
1 (
S’ ::= . S $ S ::= ( . L )
S ::= . ( L ) L ::= . S ,
S ::= . x ( L ::= . L , S 5
S ::= . ( L ) L S ::= ( L . )
3 S ::= . x L ::= L . , S
S
)
S
4 S’ ::= S . $ 7 L ::= S . 6 S ::= ( L ) .

states ( ) x , $ S L
1 s3 s2 g4
2 r2 r2 r2 r2 r2
3
4
...

CH4.168
9 L ::= L , S .

8 S
x
2 S ::= x .
x L ::= L , . S
x S ::= . ( L )
( S ::= . x
1 (
S’ ::= . S $ S ::= ( . L )
S ::= . ( L ) L ::= . S ,
S ::= . x ( L ::= . L , S 5
S ::= . ( L ) L S ::= ( L . )
3 S ::= . x L ::= L . , S
S
)
S
4 S’ ::= S . $ 7 L ::= S . 6 S ::= ( L ) .

states ( ) x , $ S L
1 s3 s2 g4
2 r2 r2 r2 r2 r2
3 s3 s2
4
...

CH4.169
9 L ::= L , S .

8 S
x
2 S ::= x .
x L ::= L , . S
x S ::= . ( L )
( S ::= . x
1 (
S’ ::= . S $ S ::= ( . L )
S ::= . ( L ) L ::= . S ,
S ::= . x ( L ::= . L , S 5
S ::= . ( L ) L S ::= ( L . )
3 S ::= . x L ::= L . , S
S
)
S
4 S’ ::= S . $ 7 L ::= S . 6 S ::= ( L ) .

states ( ) x , $ S L
1 s3 s2 g4
2 r2 r2 r2 r2 r2
3 s3 s2 g7 g5
4
...

CH4.170
9 L ::= L , S .

8 S
x
2 S ::= x .
x L ::= L , . S
x S ::= . ( L )
( S ::= . x
1 (
S’ ::= . S $ S ::= ( . L )
S ::= . ( L ) L ::= . S ,
S ::= . x ( L ::= . L , S 5
S ::= . ( L ) L S ::= ( L . )
3 S ::= . x L ::= L . , S
S
)
S
4 S’ ::= S . $ 7 L ::= S . 6 S ::= ( L ) .

states ( ) x , $ S L
1 s3 s2 g4
2 r2 r2 r2 r2 r2
3 s3 s2 g7 g5
4 a
...

CH4.171
0. S’ ::= S $ states ( ) x , $ S L
• S ::= ( L ) 1 s3 s2 g4
• S ::= x 2 r2 r2 r2 r2 r2
• L ::= S
3 s3 s2 g7 g5
• L ::= L , S
4 a
5 s6 s8
6 r1 r1 r1 r1 r1
7 r3 r3 r3 r3 r3
8 s3 s2 g9
yet to read
9 r4 r4 r4 r4 r4
input: ( x , x ) $

stack: 1

CH4.172
states ( ) x , $ S L
0. S’ ::= S $ 1 s3 s2 g4
• S ::= ( L )
• S ::= x 2 r2 r2 r2 r2 r2
• L ::= S 3 s3 s2 g7 g5
• L ::= L , S 4 a
5 s6 s8
6 r1 r1 r1 r1 r1
7 r3 r3 r3 r3 r3
yet to read 8 s3 s2 g9
9 r4 r4 r4 r4 r4
input: ( x , x ) $

stack: 1(3

CH4.173
0. S’ ::= S $ states ( ) x , $ S L
• S ::= ( L ) 1 s3 s2 g4
• S ::= x 2 r2 r2 r2 r2 r2
• L ::= S 3 s3 s2 g7 g5
• L ::= L , S 4 a
5 s6 s8
6 r1 r1 r1 r1 r1
7 r3 r3 r3 r3 r3
yet to read 8 s3 s2 g9
9 r4 r4 r4 r4 r4
input: ( x , x ) $

stack: 1(3x2

CH4.174
0. S’ ::= S $ states ( ) x , $ S L
• S ::= ( L ) 1 s3 s2 g4
• S ::= x 2 r2 r2 r2 r2 r2
• L ::= S
• L ::= L , S 3 s3 s2 g7 g5
4 a
5 s6 s8
6 r1 r1 r1 r1 r1
7 r3 r3 r3 r3 r3
yet to read
8 s3 s2 g9
input: ( x , x ) $ 9 r4 r4 r4 r4 r4

stack: 1(3S

CH4.175
0. S’ ::= S $ states ( ) x , $ S L
• S ::= ( L ) 1 s3 s2 g4
• S ::= x 2 r2 r2 r2 r2 r2
• L ::= S 3 s3 s2 g7 g5
• L ::= L , S 4 a
5 s6 s8
6 r1 r1 r1 r1 r1
7 r3 r3 r3 r3 r3
8 s3 s2 g9
yet to read
9 r4 r4 r4 r4 r4
input: ( x , x ) $

stack: 1(3S7

CH4.176
0. S’ ::= S $ states ( ) x , $ S L
• S ::= ( L ) 1 s3 s2 g4
• S ::= x 2 r2 r2 r2 r2 r2
• L ::= S 3 s3 s2 g7 g5
• L ::= L , S
4 a
5 s6 s8
6 r1 r1 r1 r1 r1
7 r3 r3 r3 r3 r3
yet to read 8 s3 s2 g9
( x , x ) $ 9 r4 r4 r4 r4 r4
input:

stack: 1(3L

CH4.177
0. S’ ::= S $ states ( ) x , $ S L
• S ::= ( L ) 1 s3 s2 g4
• S ::= x 2 r2 r2 r2 r2 r2
• L ::= S
3 s3 s2 g7 g5
• L ::= L , S
4 a
5 s6 s8
6 r1 r1 r1 r1 r1
7 r3 r3 r3 r3 r3
yet to read 8 s3 s2 g9
( x , x ) $ 9 r4 r4 r4 r4 r4
input:

stack: 1(3L5

CH4.178
states ( ) x , $ S L
0. S’ ::= S $
1 s3 s2 g4
• S ::= ( L )
• S ::= x 2 r2 r2 r2 r2 r2
• L ::= S 3 s3 s2 g7 g5
• L ::= L , S 4 a
5 s6 s8
6 r1 r1 r1 r1 r1
7 r3 r3 r3 r3 r3
yet to read 8 s3 s2 g9
9 r4 r4 r4 r4 r4
input: ( x , x ) $

stack: 1(3L5,8

CH4.179
0. S’ ::= S $ states ( ) x , $ S L
• S ::= ( L ) 1 s3 s2 g4
• S ::= x 2 r2 r2 r2 r2 r2
• L ::= S
3 s3 s2 g7 g5
• L ::= L , S
4 a
5 s6 s8
6 r1 r1 r1 r1 r1
7 r3 r3 r3 r3 r3
yet to read 8 s3 s2 g9
input: ( x , x ) $ 9 r4 r4 r4 r4 r4

stack: 1(3L5,8x2

CH4.180
0. S’ ::= S $ states ( ) x , $ S L
• S ::= ( L )
1 s3 s2 g4
• S ::= x
• L ::= S 2 r2 r2 r2 r2 r2
• L ::= L , S 3 s3 s2 g7 g5
4 a
5 s6 s8
6 r1 r1 r1 r1 r1
7 r3 r3 r3 r3 r3
yet to read
8 s3 s2 g9
input: ( x , x ) $ 9 r4 r4 r4 r4 r4
stack: 1(3L5,8S

CH4.181
0. S’ ::= S $ states ( ) x , $ S L
• S ::= ( L ) 1 s3 s2 g4
• S ::= x
• L ::= S 2 r2 r2 r2 r2 r2
• L ::= L , S 3 s3 s2 g7 g5
4 a
5 s6 s8
6 r1 r1 r1 r1 r1
7 r3 r3 r3 r3 r3
yet to read 8 s3 s2 g9
input: ( x , x ) $ 9 r4 r4 r4 r4 r4

stack: 1(3L5,8S9

CH4.182
0. S’ ::= S $ states ( ) x , $ S L
• S ::= ( L ) 1 s3 s2 g4
• S ::= x 2 r2 r2 r2 r2 r2
• L ::= S
3 s3 s2 g7 g5
• L ::= L , S
4 a
5 s6 s8
6 r1 r1 r1 r1 r1
7 r3 r3 r3 r3 r3
yet to read 8 s3 s2 g9
( x , x ) $ 9 r4 r4 r4 r4 r4
input:

stack: 1(3L

CH4.183
0. S’ ::= S $ states ( ) x , $ S L
• S ::= ( L ) 1 s3 s2 g4
• S ::= x 2 r2 r2 r2 r2 r2
• L ::= S 3 s3 s2 g7 g5
• L ::= L , S
4 a
5 s6 s8
6 r1 r1 r1 r1 r1
7 r3 r3 r3 r3 r3
yet to read 8 s3 s2 g9
9 r4 r4 r4 r4 r4
input: ( x , x ) $

stack: 1(3L5 etc ......

CH4.184
LR(0)
 Even though we are doing LR(0) parsing we are using
some look-ahead (there is a column for each terminal)
 however, we only use the terminal to figure out which state
to go to next, not to decide whether to shift or reduce

states ( ) x , $ S L

1 s3 s2 g4

2 r2 r2 r2 r2 r2

3 s3 s2 g7 g5

CH4.185
LR(0)
 Even though we are doing LR(0) parsing we are using some look-ahead
(there is a column for each terminal)
 however, we only use the terminal to figure out which state to go to next,
not to decide whether to shift or reduce

states ( ) x , $ S L
1 s3 s2 g4
2 r2 r2 r2 r2 r2
3 s3 s2 g7 g5

ignoring the look-ahead, each state is either purely a shift
state or purely a reduce state:
states no look-ahead S L
1 shift g4
2 reduce 2
3 shift g7 g5
CH4.186
LR(0)

 Even though we are doing LR(0) parsing we are using some
look-ahead (there is a column for each terminal)
 however, we only use the terminal to figure out which state
to go to next, not to decide whether to shift or reduce
 If the same row contains both shift and reduce, we will have
a conflict ==> the grammar is not LR(0)
 Likewise if the same row contains reduce by two different
rules

states no look-ahead S L
1 shift, reduce 5 g4
2 reduce 2, reduce 7
3 shift g7 g5

CH4.187
SLR

 SLR (simple LR) is a variant of LR(0) that reduces the


number of conflicts in LR(0) tables by using a tiny bit of
look ahead
 To determine when to reduce, 1 symbol of look ahead is
used.
 Only put reduce by rule (X ::= RHS) in column T if T is in
Follow(X)

states    (     )     x     ,     $     S     L
1         s3          s2                g4
2               r2          r2    r2
3         s3          s2                g7    g5

cuts down the number of rk slots & therefore cuts down conflicts
CH4.188
LR(1) & LALR

 LR(1) automata are identical to LR(0) except for the “items”


that make up the states
 LR(0) items:
X ::= s1 . s2
 LR(1) items (a look-ahead symbol is added):
X ::= s1 . s2, T
 Idea: sequence s1 is on the stack; input stream is s2 T
 Find closure with respect to X ::= s1 . Y s2, T by adding all
items Y ::= . s3, U when Y ::= s3 is a rule and U is in
First(s2 T)
 Two states are different if they contain the same rules but the
rules have different look-ahead symbols
 Leads to many states
 LALR(1) = LR(1) where states that are identical aside
from look-ahead symbols have been merged
 ML-Yacc & most parser generators use LALR CH4.189
Conflicts
 Shift/Reduce          Sometimes even unambiguous
 Reduce/Reduce         grammars produce multiply defined
                        entries (s/r, r/r conflicts) in the SLR table.
Example:
S’ → S
S → L=R | R
L → *R | id
R → L

I0 = { S’ → .S , S → .L = R , S → .R , L → .* R ,
       L → .id , R → .L }
I1 = { S’ → S. }
I2 = { S → L . = R , R → L . }
I3 = { S → R. }
I4 = { L → *.R , R → .L , L → .* R , L → .id }
I5 = { L → id. }
I6 = { S → L = . R , R → .L , L → .* R , L → .id }
… also I7, I8, I9 …

action[2, = ] ?  It should be s6 (because of S → L . = R),
but also r(R → L) (because of R → L . and = follows R):
a shift/reduce conflict.
CH4.190
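The same SLR construction, run mechanically on this grammar, finds the conflict: the state containing { S → L.=R , R → L. } gets both “shift” and “reduce R → L” in the '=' column, because '=' ∈ Follow(R). A sketch (item encoding and conflict reporting are ours):

```python
# Detect SLR conflicts for  S' -> S, S -> L=R | R, L -> *R | id, R -> L.
RULES = [("S'", ("S",)), ("S", ("L", "=", "R")), ("S", ("R",)),
         ("L", ("*", "R")), ("L", ("id",)), ("R", ("L",))]
NT = {"S'", "S", "L", "R"}
SYMS = {"S", "L", "R", "=", "*", "id"}

def closure(I):
    I = set(I)
    while True:
        new = {(h, b, 0) for hd, bd, d in I
               if d < len(bd) and bd[d] in NT
               for h, b in RULES if h == bd[d]}
        if new <= I:
            return frozenset(I)
        I |= new

def goto(I, X):
    return closure({(h, b, d + 1) for h, b, d in I
                    if d < len(b) and b[d] == X})

# FIRST/FOLLOW by fixed point (no epsilon productions here).
FIRST = {s: {s} for s in SYMS if s not in NT}
FIRST.update({A: set() for A in NT})
changed = True
while changed:
    changed = False
    for h, b in RULES:
        if not FIRST[b[0]] <= FIRST[h]:
            FIRST[h] |= FIRST[b[0]]; changed = True
FOLLOW = {A: set() for A in NT}
FOLLOW["S'"].add("$")
changed = True
while changed:
    changed = False
    for h, b in RULES:
        for k, X in enumerate(b):
            if X in NT:
                add = FIRST[b[k + 1]] if k + 1 < len(b) else FOLLOW[h]
                if not add <= FOLLOW[X]:
                    FOLLOW[X] |= add; changed = True

states = [closure({("S'", ("S",), 0)})]
for I in states:
    for X in SYMS:
        J = goto(I, X)
        if J and J not in states:
            states.append(J)

conflicts = []
for I in states:
    cell = {}                              # terminal -> set of actions
    for h, b, d in I:
        if d < len(b) and b[d] not in NT:
            cell.setdefault(b[d], set()).add("shift")
        elif d == len(b) and h != "S'":
            for a in FOLLOW[h]:            # SLR reduce rule: Follow(head)
                cell.setdefault(a, set()).add("reduce " + h)
    for a, acts in cell.items():
        if len(acts) > 1:
            conflicts.append((a, tuple(sorted(acts))))
```

Exactly one multiply defined cell shows up, on '=', matching the slide's hand analysis (Follow(R) here is {=, $}).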
But Why?
 Let’s consider a string that will exhibit the conflict:
id=id
$0         id=id$    s5
$0id5      =id$      r(L → id)
$0L2       =id$      conflict…

 What is the correct move? (recall: the grammar is non-
ambiguous)
 R=id is not a right sentential form!!!
 Even though = might follow R … it does not in this
case.
 … it does only when R is preceded by *.
 SLR finds a conflict because using Follow + LR(0)
items as the guide to decide when to reduce is not the
best method.
CH4.191
Picture So Far
 SLR construction:
based on canonical collection of LR(0) items –
gives rise to canonical LR(0) parsing table.
 No multiply defined labels => Grammar is called
“SLR(1)”

 More general class: LR(1) grammars.


Using the notion of LR(1) item and the canonical
LR(1) parsing table.

CH4.192
LR(1) Items
 DEF. An LR(1) item is a production with a marker,
together with a terminal:
E.g. [S → aA.Be, c]
intuition: it indicates how much of a certain production
we have seen already (aA), what we could expect next
(Be), and a lookahead that agrees with what should follow
in the input if we ever do reduce by the production
S → aABe.
By incorporating such lookahead information into the
item concept we will make wiser reduce decisions.
 The lookahead in an LR(1) item is used directly only
when considering reduce actions (i.e., when the
marker is rightmost).
 The core of an LR(1) item [S → aA.Be, c] is the LR(0)
item S → aA.Be.
 Different LR(1) items may share the same core.
CH4.193
Usefulness of LR(1) items
 E.g. if we have two LR(1) items of the form
 [ A → α. , a ] and [ B → α. , b ], we can take
advantage of the lookahead to decide which
reduction to use (the same setting would
perhaps produce a reduce/reduce conflict in the
SLR approach).

 How the notion of validity changes:

 An item [ A → β1.β2 , a ] is valid for a viable
prefix αβ1 if there is a rightmost derivation that
yields αAaw and in one step yields αβ1β2aw.
CH4.194
Constructing the Canonical Collection of LR(1) Items
• Initial item: [ S’ → . S , $ ]
• Closure (more refined):
  if [ A → α . Bβ , a ] belongs to the set of items, and
  B → γ is a production of the grammar, then
  we add the item [ B → . γ , b ]
  for all b ∈ FIRST(βa).
• Goto (the same):
  a state containing [ A → α . Xβ , a ] will move to a
  state containing [ A → αX . β , a ] with label X.
• Every state is closed according to Closure.
• Every state has transitions according to Goto.
CH4.195
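The refined Closure can be sketched in a few lines of Python. The item encoding (head, body, dot position, lookahead) and all helper names are mine; the grammar used is the S → CC example from the next slide, which has no ε-productions, so FIRST(βa) reduces to FIRST of the first symbol of β, or {a} when β is empty.

```python
# An LR(1) item is modelled as (head, body, dot_position, lookahead).
GRAMMAR = [("S'", ("S",)), ("S", ("C", "C")),
           ("C", ("c", "C")), ("C", ("d",))]
NONTERMS = {"S'", "S", "C"}

def first_sym(sym, first):
    return first[sym] if sym in NONTERMS else {sym}

def compute_first():
    # fixpoint over body[0]; enough because no production derives epsilon
    first = {nt: set() for nt in NONTERMS}
    changed = True
    while changed:
        changed = False
        for head, body in GRAMMAR:
            add = first_sym(body[0], first)
            if not add <= first[head]:
                first[head] |= add
                changed = True
    return first

def closure(items):
    first = compute_first()
    items, work = set(items), list(items)
    while work:
        head, body, dot, la = work.pop()
        if dot == len(body) or body[dot] not in NONTERMS:
            continue                       # marker at the end or at a terminal
        B, beta = body[dot], body[dot + 1:]
        # lookaheads = FIRST(beta a); with no epsilon productions this is
        # FIRST(beta[0]) if beta is non-empty, else {a}
        lookaheads = first_sym(beta[0], first) if beta else {la}
        for h, b in GRAMMAR:
            if h == B:
                for nl in lookaheads:
                    item = (h, b, 0, nl)
                    if item not in items:
                        items.add(item)
                        work.append(item)
    return items

I0 = closure({("S'", ("S",), 0, "$")})
print(len(I0))   # 6 items
```

Closing the initial item [S’ → .S, $] adds [S → .CC, $], and from there the C-productions with lookaheads FIRST(C) = {c, d}, i.e. [C → .cC, c/d] and [C → .d, c/d].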
Constructing the LR(1) Parsing Table
• Shift actions: (same)
  If [ A → α . bβ , a ] is in state Ik and Ik moves to state
  Im with label b, then we add the action
  action[k, b] = “shift m”.
• Reduce actions: (more refined)
  If [ A → α . , a ] is in state Ik, then we add the action
  action[k, a] = “reduce A → α”.
  Observe that we don’t use information from
  FOLLOW(A) anymore.
• The goto part of the table is as before.
CH4.196
Example I construction

S’ → S
S → CC
C → cC | d

FIRST(S) = {c, d}
FIRST(C) = {c, d}
CH4.197
Example II

S’ → S
S → L = R | R
L → * R | id
R → L

FIRST(S) = {*, id}
FIRST(L) = {*, id}
FIRST(R) = {*, id}
CH4.198
LR(1) is more general than SLR(1):

S’ → S
S → L = R | R
L → * R | id
R → L

I0 = { [S’ → . S , $]
       [S → . L = R , $]
       [S → . R , $]
       [L → . * R , = / $]
       [L → . id , = / $]
       [R → . L , $] }
I1 = { [S’ → S . , $] }
I2 = { [S → L . = R , $]
       [R → L . , $] }
I3 = { [S → R . , $] }
I4 = { [L → * . R , = / $]
       [R → . L , = / $]
       [L → . * R , = / $]
       [L → . id , = / $] }
I5 = { [L → id . , = / $] }
I6 = { [S → L = . R , $]
       [R → . L , $]
       [L → . * R , $]
       [L → . id , $] }
I7 = { [L → * R . , = / $] }
I8 = { [R → L . , = / $] }
I9 = { [L → * . R , $]
       [R → . L , $]
       [L → . * R , $]
       [L → . id , $] }
I10 = { [L → * R . , $] }
I11 = { [L → id . , $] }
I12 = { [R → L . , $] }

action[2, = ] ?  Only s6 (because of S → L . = R);
the reduce by R → L in I2 now applies only on $.
THERE IS NO CONFLICT ANYMORE.
CH4.199
LALR Parsing
• Canonical sets of LR(1) items:
  the number of states is much larger than in the SLR construction.
  • LR(1): of the order of thousands of states for a standard prog. lang.
  • SLR(1): of the order of hundreds of states for a standard prog. lang.
• LALR(1) (lookahead-LR): a tradeoff:
  • Collapse states of the LR(1) table that have the same
    core (the “LR(0)” part of each state).
  • LALR never introduces a shift/reduce conflict if
    LR(1) doesn’t.
  • It might introduce a reduce/reduce conflict (that did
    not exist in the LR(1) table)…
  • Still much better than SLR(1) (handles a larger set of grammars)…
  • … with a table far smaller than LR(1)’s, in fact about the
    size of the SLR(1) table.
  • This is what Yacc and most compilers employ.
CH4.200
Collapsing states with the same core.
• E.g., if I3 and I6 collapse, then whenever the LALR(1)
  parser has I36 on the stack, the LR(1) parser
  would have either I3 or I6.
• A shift/reduce conflict would not be introduced by
  the LALR “collapse”:
  indeed, if the LALR(1) table has a shift/reduce conflict,
  this conflict must also exist in the LR(1) version,
  because two states with the same core have the same
  outgoing arrows.
• On the other hand, a reduce/reduce conflict may be
  introduced.
• Still LALR(1) is preferred: its table size is proportional
  to SLR(1)’s.
• Direct construction (without first building the full LR(1)
  collection) is also possible.
CH4.201
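The collapse itself is mechanical: group states by core and union their item sets. A sketch (the item encoding and function names are mine; I5 and I11 below are the two states from the earlier LR(1) example that share the core L → id., while I2 has a different core and survives unchanged):

```python
def core(state):
    # the core of an LR(1) state: its items with lookaheads stripped
    return frozenset((h, b, d) for (h, b, d, _la) in state)

def lalr_merge(states):
    # group LR(1) states by core and union their items (i.e. lookaheads)
    merged = {}
    for st in states:
        merged.setdefault(core(st), set()).update(st)
    return list(merged.values())

# items are (head, body, dot_position, lookahead)
I5  = {("L", ("id",), 1, "="), ("L", ("id",), 1, "$")}   # [L -> id., =/$]
I11 = {("L", ("id",), 1, "$")}                           # [L -> id., $]
I2  = {("S", ("L", "=", "R"), 1, "$"), ("R", ("L",), 1, "$")}

states = lalr_merge([I5, I11, I2])
print(len(states))   # 2: I5/I11 collapse into one state, I2 stays
```

Because `core` ignores only the lookaheads, the merged state keeps every transition of the originals, which is why no new shift/reduce conflicts can appear.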
Error Recovery in LR Parsing
• An error is detected when, for the current stack $…Ii and
  remaining input s…s’…$,
  action[i, s] = empty.
• Panic-mode error recovery is then applied.
CH4.202
Panic Recovery Strategy I
• Scan down the stack till a state Ij is found such that:
  • Ij moves with the non-terminal A to some state Ik, and
  • Ik moves with s’ to some state Ik’.
• Proceed as follows:
  • Pop all states till Ij.
  • Push A and state Ik.
  • Discard all symbols from the input till s’.
• There may be many choices as above.
  [Essentially the parser in this way decides that the
  string produced by A contains an error; it assumes A
  was parsed correctly and advances.]
• Error message: construct of type “A” has an error at
  location X.
CH4.203
Panic Recovery Strategy II
• Scan down the stack till a state Ij is found such that:
  • Ij moves with the terminal t to some state Ik, and
  • Ik has a valid action on s’.
• Proceed as follows:
  • Pop all states till Ij.
  • Push t and state Ik.
  • Discard all symbols from the input till s’.
• There may be many choices as above.
• Error message: “missing t”.
CH4.204
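Strategy I can be sketched as code. All names and the toy tables below are hypothetical; a real parser would also have to pick among the candidate non-terminals A and candidate states Ij:

```python
def recover(stack, tokens, pos, goto_table, action, A):
    # stack = [state0, sym1, state1, ...]; scan down for a state Ij
    # that has a goto entry on the chosen non-terminal A
    while stack:
        k = goto_table.get((stack[-1], A))
        if k is not None:
            stack += [A, k]                     # push A and state Ik
            # discard input symbols until one the new state can act on
            while pos < len(tokens) and (k, tokens[pos]) not in action:
                pos += 1
            print('construct of type "%s" has an error' % A)
            return stack, pos
        del stack[-2:]                          # pop a state (and the symbol below)
    raise SyntaxError("recovery failed")
```

For example, with the toy tables goto = {(2, "E"): 6} and action = {(6, ")"): "s9"}, calling recover([0, "(", 2], ["+", "id", ")"], 0, goto, action, "E") pops nothing (state 2 already has a goto on E), pushes E and state 6, and skips the input up to the ")".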
Example
E’ → E
E → E + E
  | E * E
  | ( E )
  | id

action / goto table (with error routines e1–e4):

        id    +      *      (     )     $       E
  0     s3    e1     e1     s2    e2    e1      1
  1     e3    s4     s5     e3    e2    acc
  2     s3    e1     e1     s2    e2    e1      6
  3     r4    r4     r4     r4    r4    r4
  4     s3    e1     e1     s2    e2    e1      7
  5     s3    e1     e1     s2    e2    e1      8
  6     e3    s4     s5     e3    s9    e4
  7     r1    r1     s5     r1    r1    r1
  8     r2    r2     r2     r2    r2    r2
  9     r3    r3     r3     r3    r3    r3
CH4.205
E’ → E
E → E + E
  | E * E
  | ( E )
  | id

Collection of LR(0) items:

I0 = { E’ → . E , E → . E + E , E → . E * E , E → . ( E ) , E → . id }
I1 = { E’ → E . , E → E . + E , E → E . * E }
I2 = { E → ( . E ) , E → . E + E , E → . E * E , E → . ( E ) , E → . id }
I3 = { E → id . }
I4 = { E → E + . E , E → . E + E , E → . E * E , E → . ( E ) , E → . id }
I5 = { E → E * . E , E → . E + E , E → . E * E , E → . ( E ) , E → . id }
I6 = { E → ( E . ) , E → E . + E , E → E . * E }
I7 = { E → E + E . , E → E . + E , E → E . * E }
I8 = { E → E * E . , E → E . + E , E → E . * E }
I9 = { E → ( E ) . }

Follow(E’) = {$}
Follow(E) = {+, *, ), $}
CH4.206
The parsing table
        id    +      *      (     )     $       E
  0     s3                  s2                  1
  1           s4     s5                 acc
  2     s3                  s2                  6
  3           r4     r4           r4    r4
  4     s3                  s2                  7
  5     s3                  s2                  8
  6           s4     s5           s9
  7           s4/r1  s5/r1        r1    r1
  8           s4/r2  s5/r2        r2    r2
  9           r3     r3           r3    r3
CH4.207
Error-handling
        id    +      *      (     )     $       E
  0     s3    e1            s2                  1
  1           s4     s5                 acc
  2     s3                  s2                  6
  3           r4     r4           r4    r4
  4     s3                  s2                  7
  5     s3                  s2                  8
  6           s4     s5           s9
  7           s4/r1  s5/r1        r1    r1
  8           s4/r2  s5/r2        r2    r2
  9           r3     r3           r3    r3
CH4.208
Error-handling
I0 = { E’ → . E , E → . E + E , E → . E * E , E → . ( E ) , E → . id }
I2 = { E → ( . E ) , E → . E + E , E → . E * E , E → . ( E ) , E → . id }
I5 = { E → E * . E , E → . E + E , E → . E * E , E → . ( E ) , E → . id }
I8 = { E → E * E . , E → E . + E , E → E . * E }

e1: push E into the stack and move to state 1;
    issue the diagnostic “missing operand”.
Alternatively:
e1: push id into the stack and change to state 3;
    issue the diagnostic “missing operand”.
CH4.209
Error-handling
        id    +      *      (     )     $       E
  0     s3    e1     e1     s2          e1      1
  1           s4     s5                 acc
  2     s3                  s2                  6
  3           r4     r4           r4    r4
  4     s3                  s2                  7
  5     s3                  s2                  8
  6           s4     s5           s9
  7           s4/r1  s5/r1        r1    r1
  8           s4/r2  s5/r2        r2    r2
  9           r3     r3           r3    r3
CH4.210
Error-handling
        id    +      *      (     )     $       E
  0     s3    e1     e1     s2    e2    e1      1
  1           s4     s5           e2    acc
  2     s3                  s2                  6
  3           r4     r4           r4    r4
  4     s3    e1            s2                  7
  5     s3                  s2                  8
  6           s4     s5           s9
  7           s4/r1  s5/r1        r1    r1
  8           s4/r2  s5/r2        r2    r2
  9           r3     r3           r3    r3
CH4.211
Error-handling

e2: remove “)” from the input;
    issue the diagnostic “unbalanced right parenthesis”.

Try the input id+) .
CH4.212
Error-handling state 1
        id    +      *      (     )     $       E
  0     s3    e1     e1     s2    e2    e1      1
  1     e3    s4     s5                 acc
  2     s3                  s2                  6
  3           r4     r4           r4    r4
  4     s3                  s2                  7
  5     s3                  s2                  8
  6           s4     s5           s9
  7           s4/r1  s5/r1        r1    r1
  8           s4/r2  s5/r2        r2    r2
  9           r3     r3           r3    r3
CH4.213
Error-Handling

I1 = { E’ → E . , E → E . + E , E → E . * E }
I3 = { E → id . }
I4 = { E → E + . E , E → . E + E , E → . E * E , E → . ( E ) , E → . id }
I6 = { E → ( E . ) , E → E . + E , E → E . * E }
I7 = { E → E + E . , E → E . + E , E → E . * E }
I9 = { E → ( E ) . }

e3: push + into the stack and change to state 4;
    issue the diagnostic “missing operator”.
CH4.214
Intro to Translation
• Side-effects and Translation Schemes.

  E’ → E
  E → E + E {print(+)}
    | E * E {print(*)}
    | {parenthesis++} ( E ) {parenthesis--}
    | id {print(id); print(parenthesis);}

  Side-effects are attached to the symbols to the right of them.

• Do the construction as before, but:
  • A side-effect in front of a symbol is executed in the
    state where we make the move labelled with that symbol
    to another state.
  • Side-effects at the rightmost end of a production are
    executed during reduce actions.

Do for example id*(id+id)$.
CH4.215
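To make the reduce-time side-effects concrete, here is a sketch of an LR driver over the conflict-resolved table for this grammar (precedence * over +, both left-associative). The encoding and driver are mine; the parenthesis-counting actions are omitted, and only the reduce-time prints are modelled, so the output is the postfix form of the input:

```python
# action table with the shift/reduce conflicts already resolved
ACTION = {
    0: {"id": "s3", "(": "s2"},
    1: {"+": "s4", "*": "s5", "$": "acc"},
    2: {"id": "s3", "(": "s2"},
    3: {t: "r4" for t in ["+", "*", ")", "$"]},
    4: {"id": "s3", "(": "s2"},
    5: {"id": "s3", "(": "s2"},
    6: {"+": "s4", "*": "s5", ")": "s9"},
    7: {"+": "r1", "*": "s5", ")": "r1", "$": "r1"},   # E -> E + E .
    8: {t: "r2" for t in ["+", "*", ")", "$"]},        # E -> E * E .
    9: {t: "r3" for t in ["+", "*", ")", "$"]},        # E -> ( E ) .
}
GOTO = {(0, "E"): 1, (2, "E"): 6, (4, "E"): 7, (5, "E"): 8}
# production number -> (head, body length, side-effect to emit on reduce)
PRODS = {1: ("E", 3, "+"), 2: ("E", 3, "*"),
         3: ("E", 3, None), 4: ("E", 1, "id")}

def parse(tokens):
    out, stack, toks, i = [], [0], tokens + ["$"], 0
    while True:
        act = ACTION[stack[-1]][toks[i]]
        if act == "acc":
            return " ".join(out)
        if act.startswith("s"):                 # shift
            stack += [toks[i], int(act[1:])]
            i += 1
        else:                                   # reduce: run the attached action
            head, n, effect = PRODS[int(act[1:])]
            if effect:
                out.append(effect)
            del stack[-2 * n:]
            stack += [head, GOTO[(stack[-1], head)]]

print(parse(["id", "*", "(", "id", "+", "id", ")"]))   # id id id + *
```

Running the slide's suggested input id*(id+id)$ emits "id id id + *": each id is printed when E → id is reduced, and each operator when its production is reduced, i.e. exactly in postfix order.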
Ambiguous Grammars
• Ambiguous grammars produce “conflicts”:
  shift/reduce, or reduce/reduce.
• Three typical examples:
  • The dangling else
  • The “expression” grammar
  • The eqn pre-processor of troff
CH4.216
Dangling Else Ambiguity
• Recall the grammar:
  stmt → if expr then stmt
       | if expr then stmt else stmt
       | other (any other statement)

• It can be abbreviated as:
  S → iSeS | iS | a
  (i = if expr then, e = else, a = other)

• We compute the LR(0) items + Goto and the SLR
  parsing table.
CH4.217
Grammar Relationships
(Venn diagram:) Inside the unambiguous grammars,
LR(0) ⊂ SLR(1) ⊂ LALR(1) ⊂ LR(1), with LL(0) contained in LR(0)
and LL(1) contained in LR(1).
Ambiguous grammars lie outside all of these classes.
CH4.218
Canonical sets of LR(0) items
I0 = { S’ → . S , S → . iSeS , S → . iS , S → . a }
I1 = { S’ → S . }
I2 = { S → i . SeS , S → i . S , S → . iSeS , S → . iS , S → . a }
I3 = { S → a . }
I4 = { S → iS . eS , S → iS . }
I5 = { S → iSe . S , S → . iSeS , S → . iS , S → . a }
I6 = { S → iSeS . }

Follow(S) = {e, $}
CH4.219
Parsing Table
1. S → iSeS
2. S → iS
3. S → a

        i     e        a     $       S
  0     s2             s3            1
  1                          acc
  2     s2             s3            4
  3           r3             r3
  4           s5/r2          r2
  5     s2             s3            6
  6           r1             r1

Resolve: choose s5.
E.g., parse iiaea to understand the conflict resolution.
CH4.220
Tracing
STACK INPUT Remark
$0 iiaea$

CH4.221
Expressions
• Recall the (unambiguous) grammar:
  E’ → E
  E → E + T | T
  T → T * F | F
  F → ( E ) | id

• Originally written as:
  E → E + E | E * E | ( E ) | id

• In fact we can try to construct the SLR parsing table
  for the ambiguous version and see what will happen.
CH4.222
Canonical Sets
I0 = { E’ → . E , E → . E + E , E → . E * E , E → . ( E ) , E → . id }
I1 = { E’ → E . , E → E . + E , E → E . * E }
I2 = { E → ( . E ) , E → . E + E , E → . E * E , E → . ( E ) , E → . id }
I3 = { E → id . }
I4 = { E → E + . E , E → . E + E , E → . E * E , E → . ( E ) , E → . id }
I5 = { E → E * . E , E → . E + E , E → . E * E , E → . ( E ) , E → . id }
I6 = { E → ( E . ) , E → E . + E , E → E . * E }
I7 = { E → E + E . , E → E . + E , E → E . * E }
I8 = { E → E * E . , E → E . + E , E → E . * E }
I9 = { E → ( E ) . }

Follow(E’) = {$}
Follow(E) = {+, *, ), $}
CH4.223
The Parsing Table
1. E → E + E
2. E → E * E
3. E → ( E )
4. E → id

        id    +      *      (     )     $       E
  0     s3                  s2                  1
  1           s4     s5                 acc
  2     s3                  s2                  6
  3           r4     r4           r4    r4
  4     s3                  s2                  7
  5     s3                  s2                  8
  6           s4     s5           s9
  7           s4/r1  s5/r1        r1    r1
  8           s4/r2  s5/r2        r2    r2
  9           r3     r3           r3    r3
CH4.224
Resolving Ambiguity
• If the state contains E → E op E .
  and we find op on the input, we:
  • Reduce, for left-association, or
  • Shift, for right-association.
• If the state contains E → E op E .
  and we find op’ on the input, we:
  • Reduce, if op’ has lower precedence than op, or
  • Shift, if op’ has higher precedence than op.
CH4.225
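These two rules are exactly how yacc-style precedence/associativity declarations resolve a shift/reduce conflict. A sketch (hypothetical helper: reduce_op is the operator of the rule E → E op E . being reduced, shift_op is the incoming token op’):

```python
PREC  = {"+": 1, "*": 2}
ASSOC = {"+": "left", "*": "left"}

def resolve(reduce_op, shift_op):
    # shift/reduce conflict between reducing E -> E reduce_op E .
    # and shifting shift_op
    if PREC[shift_op] > PREC[reduce_op]:
        return "shift"        # incoming operator binds tighter
    if PREC[shift_op] < PREC[reduce_op]:
        return "reduce"
    return "reduce" if ASSOC[shift_op] == "left" else "shift"

print(resolve("+", "*"))   # shift:  * has higher precedence than +
print(resolve("*", "+"))   # reduce: + has lower precedence than *
print(resolve("+", "+"))   # reduce: left associativity
```

Applied to the table above, this fills state 7 (E → E + E .) with r1 on + and s5 on *, and state 8 (E → E * E .) with r2 on both, removing every conflict.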
Subscript/Superscript
1. E → E sub E sup E
2. E → E sub E
3. E → E sup E
4. E → { E }
5. E → c

Even if we resolve conflicts using precedence and associativity,
rule 1 will still produce problems:
when seeing a } or $ in the state that contains the item
E → E sub E sup E . (which also contains E → E sup E .),
we have two possibilities for reduce:
either r3 or r1 (a reduce/reduce conflict).

Resolving in favor of r1 gives the combined sub/sup construct
its intended special meaning.
CH4.226