Presentation and Class Notes on Syntax Analysis, part of Compiler Design subject.

May 10, 2019

© All Rights Reserved

Sarfaraz Masood

(Assistant Professor)

Jamia Millia Islamia

(A Central University)

New Delhi – 110025

sarfarazmasood2002@yahoo.com

CH4.1

Syntax Analysis - Parsing

An overview of parsing :

Functions & Responsibilities

Concepts & Terminology

Writing and Designing Grammars

Resolving Grammar Problems / Difficulties

Top-Down Parsing

Recursive Descent & Predictive LL

Bottom-Up Parsing

LR & LALR

Concluding Remarks/Looking Ahead

CH4.2

An Overview of Parsing

Why are grammars important?

1. Precise, easy-to-understand representations

2. Compiler-writing tools can take a grammar and generate a compiler

3. Grammars allow a language to be evolved (new statements, changes to statements, etc.). Languages are not static, but are constantly upgraded to add new features or fix “old” ones.

ADA → ADA9x; C++ adds: templates, exceptions, …

How do grammars relate to the parsing process?

CH4.3

Parsing During Compilation

[Diagram: the lexical analyzer (driven by regular expressions) feeds the parser via “get next token”; the parser consults the symbol table and produces a parse tree for the rest of the front end; errors flow out of the parser.]

The parser:
• uses a grammar to check the structure of tokens
• produces a parse tree
• reports syntactic errors and performs recovery
• recognizes correct syntax

The rest of the front end (also technically part of parsing) includes augmenting info on tokens in the source, type checking, and semantic analysis.

CH4.4

Parsing Responsibilities

Recall typical error types:

Lexical : Misspellings

Syntactic : Omission, wrong order of tokens

Semantic : Incompatible types

Logical : Infinite loop / recursive call

Majority of error processing occurs during syntax analysis

NOTE: Not all errors are identifiable !! Which ones?

CH4.5

Key Issues – Error Processing

• Detecting errors

• Finding position at which they occur

• Clear / accurate presentation

• Recover (pass over) to continue and find later

errors

• Don’t impact compilation of “correct”

programs

CH4.6

What are some Typical Errors ?

#include <stdio.h>

int f1(int v)
{ int i, j = 0;
  for (i = 1; i < 5; i++)
  { j = v + f2(i) }                 /* missing ';' */
  return j; }

int f2(int u)
{ int j;
  j = u + f1(u*u)                   /* missing ';' */
  return j; }

int main()
{ int i, j = 0;
  for (i = 1; i < 10; i++)
  { j = j + i*i; printf("%d\n", i); /* missing closing '}' */
  printf("%d\n", f1(j));
  return 0;
}

As reported by MS VC++:
  'f2' undefined; assuming extern returning int
  syntax error : missing ';' before '}'
  syntax error : missing ';' before 'return'
  fatal error : unexpected end of file found

Which are “easy” to recover from? Which are “hard”?

CH4.7

Error Recovery Strategies

Panic Mode – skip input until a synchronizing token is found ( end, “;”, “}”, etc. )

-- Decision of designer
-- Problems:
   skipped input may miss a declaration – causing more errors
   may miss errors in the skipped material
-- Advantages:
   simple; suited to 1 error per statement

Phrase Level – Local correction on input

-- “,” → ”;” – Delete “,” – insert “;”
-- Also decision of designer
-- Not suited to all situations
-- Used in conjunction with panic mode to allow less input to be skipped

CH4.8

Error Recovery Strategies – (2)

Error Productions:

-- Augment the grammar with rules for common errors
-- The augmented grammar is used for parser construction / generation
-- example: add a rule for := in C assignment statements;
   report the error but continue the compile
-- Self correction + diagnostic messages

Global Correction:

-- Adding / deleting / replacing symbols is chancy – may make many changes!
-- Algorithms are available to minimize changes, but they are costly – a key issue

CH4.9

Motivating Grammars

• Regular Expressions

Basis of lexical analysis

Represent regular languages

• Context Free Grammars

Basis of parsing

Represent language constructs

Characterize context free languages

CH4.10

Context Free Grammars :

Concepts & Terminology

Definition: A Context Free Grammar (CFG) is described by (T, NT, S, PR), where:

T: Terminals / tokens of the language

NT: Non-terminals to denote sets of strings generated by the grammar & in the language

S: Start symbol, S ∈ NT, which defines all strings of the language

PR: Production rules to indicate how T and NT are combined to generate valid strings of the language.
PR: NT → (T | NT)*

Like a Regular Expression / DFA / NFA, a Context Free Grammar is a mathematical model

CH4.11

Context Free Grammars : A First Look

assign_stmt → id := expr ;
expr → expr operator term
expr → term
term → id                      What do “blue”
term → real                    symbols represent?
term → integer
operator → +
operator → -

Production rules define the start non-terminal and the substitutions that transform a starting non-term into a sequence of terminals / tokens.

Simply stated: Grammars / production rules allow us to “rewrite” and “identify” correct syntax.

CH4.12

Derivation

assign_stmt                                       using production:
⇒ id := expr ;                                    assign_stmt → id := expr ;
⇒ id := expr operator term ;                      expr → expr operator term
⇒ id := expr operator term operator term ;        expr → expr operator term
⇒ id := term operator term operator term ;        expr → term
⇒ id := id operator term operator term ;          term → id
⇒ id := id + term operator term ;                 operator → +
⇒ id := id + real operator term ;                 term → real
⇒ id := id + real - term ;                        operator → -
⇒ id := id + real - integer ;                     term → integer

CH4.13

Example Grammar

expr → expr op expr
expr → ( expr )
expr → - expr            Black : NT
expr → id                Blue : T
op → +                   expr : S
op → -                   9 production rules
op → *
op → /
op → ↑

We will use this example grammar in the synopsis of terminology.

CH4.14

Example Grammar - Terminology

Non-Terminals: A, B, C, S (black strings)
T or NT: X, Y, Z
Strings of Terminals: u, v, …, z in T*
Strings of T / NT: α, β, γ in (T ∪ NT)*

Alternatives of production rules:
A → α1; A → α2; …; A → αk;   written  A → α1 | α2 | … | αk

The first NT on the LHS of the 1st production rule is designated as the start symbol!

E → E A E | ( E ) | -E | id
A → + | - | * | / | ↑

CH4.15

Grammar Concepts

A derivation step replaces a NT with the RHS of a production rule.

EXAMPLE: E ⇒ -E (the ⇒ means “derives” in one step) using the production rule E → -E

EXAMPLE: E ⇒ E A E ⇒ E * E ⇒ E * ( E )

DEFINITION:
⇒   derives in one step
⇒⁺  derives in one or more steps
⇒*  derives in zero or more steps

If α1 ⇒ α2 ⇒ … ⇒ αn, then α1 ⇒* αn
α ⇒* α for all α
If α ⇒* β and β ⇒ γ, then α ⇒* γ

CH4.16

How does this relate to Languages?

Let G be a CFG with start symbol S. Then S ⇒⁺ W (where W has no non-terminals) represents the language generated by G, denoted L(G). So W ∈ L(G) ⟺ S ⇒⁺ W.

W : a sentence of G
Any α with S ⇒* α : a sentential form of G.

EXAMPLE: id * id is a sentence.
Here’s the derivation:

E ⇒ E A E ⇒ E * E ⇒ id * E ⇒ id * id      (each step yields a sentential form)

E ⇒* id * id                               CH4.17

Other Derivation Concepts

Leftmost derivation — replace the leftmost non-terminal at each step:
E ⇒lm E A E ⇒lm id A E ⇒lm id * E ⇒lm id * id

Rightmost derivation — replace the rightmost non-terminal at each step:
E ⇒rm E A E ⇒rm E A id ⇒rm E * id ⇒rm id * id

If S ⇒*lm α, what’s true about α?
If S ⇒*rm α, what’s true about α?

Derivations are represented pictorially in a parse tree. CH4.18

Examples of LM / RM Derivations

E E A E | ( E ) | -E | id

A+|-|*| / |

A leftmost derivation of : id + id * id

A rightmost derivation of : id + id * id

CH4.19

Derivations & Parse Tree

E ⇒ E A E ⇒ E * E ⇒ id * E ⇒ id * id

[At each step the parse tree grows: root E with children E, A, E; A becomes *; then the left E becomes id; then the right E becomes id.]

CH4.20

Parse Trees and Derivations

E → E+E | E*E | (E) | -E | id

Leftmost derivation of id + id * id:

E ⇒ E + E ⇒ id + E
[Trees: root E with children E + E; then the left E expands to id.]

CH4.21

Parse Tree & Derivations - continued

id + E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id
[The right E expands to E * E, whose children in turn expand to id and id.]

CH4.22

Alternative Parse Tree & Derivation

E ⇒ E * E ⇒ E + E * E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id

[Tree: root E with children E * E; the left E expands to E + E; leaves read id + id * id.]

Two distinct leftmost derivations!

CH4.23

Resolving Grammar Problems/Difficulties

Reg. Expr. generate/represent regular languages
Reg. Languages are the smallest, most well-defined class of languages

Context Free Grammars: Basis of Parsing
CFGs represent context free languages
CFLs contain more powerful languages

Reg. Lang. ⊂ CFLs — e.g. { a^n b^n | n ≥ 1 } is context free but not regular.

CH4.24

Resolving Problems/Difficulties – (2)

It is possible to go from a reg. expr. to a CFG via an NFA.

Recall: (a | b)*abb

[NFA: start state 0 with loops 0 –a→ 0 and 0 –b→ 0; then 0 –a→ 1 –b→ 2 –b→ 3; state 3 accepting]

CH4.25

Resolving Problems/Difficulties – (3)

Construct CFG as follows:

1. Each state i has a non-terminal Ai : A0, A1, A2, A3
2. If i –a→ j then Ai → aAj : A0 → aA0, A0 → aA1
3. If i –b→ j then Ai → bAj : A0 → bA0, A1 → bA2, A2 → bA3
4. If i is an accepting state, add Ai → ε : A3 → ε

PR = { A0 → aA0 | aA1 | bA0 ;
       A1 → bA2 ;
       A2 → bA3 ;
       A3 → ε }

CH4.26

How Does This CFG Derive Strings ?

[NFA for (a | b)*abb, states 0–3]

vs.

A0 → aA0, A0 → aA1
A0 → bA0, A1 → bA2
A2 → bA3, A3 → ε

CH4.27

Regular Expressions vs. CFGs

1. CFGs are overkill for lexical rules, which are quite simple and straightforward
2. REs – concise / easy to understand
3. A more efficient lexical analyzer can be constructed from REs
4. Using REs for lexical analysis and CFGs for parsing promotes modularity, low coupling & high cohesion.

CFGs: Match tokens “(” “)”, begin / end, if-then-else, whiles, proc/func calls, …

Intended for structural associations between tokens!
Are tokens in correct order?

CH4.28

Resolving Grammar Difficulties :

Motivation

The structure of a grammar affects the compiler design — recall “syntax-directed” translation.

• Different parsing approaches have different needs: Top-Down vs. Bottom-Up
• Grammars may need reworking to suit better parsing methods.

Grammar Problems:
- ambiguity
- ε-moves
- cycles
- left recursion
- left factoring

CH4.29

Resolving Problems: Ambiguous

Grammars

Consider the following grammar segment:

stmt → if expr then stmt
     | if expr then stmt else stmt
     | other (any other statement)

What’s the problem here?

Let’s consider a simple parse tree:

[Tree: stmt → if E2 then S2 else S3]

The else must match the previous then; the structure indicates the parse subtree for the expression.

CH4.30

Example : What Happens with this string?

if E1 then if E2 then S1 else S2

Does the else belong to the inner if or to the outer if?

CH4.31

Parse Trees for Example

Form 1: the else matches the inner if:
  stmt → if E1 then ( if E2 then S1 else S2 )

Form 2: the else matches the outer if:
  stmt → if E1 then ( if E2 then S1 ) else S2

CH4.32

Removing Ambiguity

stmt → if expr then stmt
     | if expr then stmt else stmt
     | other (any other statement)

Abbreviated:

S → iEtS | iEtSeS | s
E → a

The problem string: i a t i a t s e s

CH4.33

Revise to remove ambiguity:

S → iEtS | iEtSeS | s        S → M | U
E → a                        M → iEtMeM | s
                             U → iEtS | iEtMeU
                             E → a

matched_stmt → if expr then matched_stmt else matched_stmt | other
unmatched_stmt → if expr then stmt
               | if expr then matched_stmt else unmatched_stmt

CH4.34

Resolving Difficulties : Left Recursion

A grammar is left recursive if it has a derivation A ⇒⁺ Aα, for some string α.

Top-down parsing cannot handle a left-recursive grammar, since it could consistently make a choice which wouldn’t allow termination:

A ⇒ Aα ⇒ Aαα ⇒ Aααα ⇒ … etc.

Rewrite  A → Aα | β
to the following:

A → βA’
A’ → αA’ | ε

CH4.35

Why is Left Recursion a Problem ?

Consider:              Derive: id + id + id

E → E+T | T
T → T*F | F
F → ( E ) | id

E → E+T | T            What does this generate?
E ⇒ E+T ⇒ T+T
E ⇒ E+T ⇒ E+T+T ⇒ T+T+T

How does this build strings?
What does each string have to start with?

CH4.36

Resolving Difficulties : Left Recursion (2)

Informal Discussion:

A → Aα1 | Aα2 | … | Aαm | β1 | β2 | … | βn

where no βi begins with A. Now apply the concept of the previous slide:

A → β1A’ | β2A’ | … | βnA’
A’ → α1A’ | α2A’ | … | αmA’ | ε

For our example:

E → E+T | T        E → TE’
                   E’ → +TE’ | ε
T → T*F | F        T → FT’
                   T’ → *FT’ | ε
F → ( E ) | id     F → ( E ) | id

CH4.37
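The α/β rewrite above is mechanical enough to sketch in code. The helper below is an illustrative sketch, not the slides’ code (the function name and the lists-of-symbols grammar representation are mine); `[]` stands for an ε alternative.

```python
def remove_immediate_left_recursion(nt, prods):
    """Rewrite A -> A a1 | ... | b1 | ... as A -> b_i A', A' -> a_i A' | eps."""
    recursive = [p[1:] for p in prods if p and p[0] == nt]   # the alphas
    others = [p for p in prods if not p or p[0] != nt]       # the betas
    if not recursive:
        return {nt: prods}                  # nothing to do
    nt2 = nt + "'"
    return {
        nt: [p + [nt2] for p in others],
        nt2: [p + [nt2] for p in recursive] + [[]],   # [] is epsilon
    }

# E -> E + T | T   becomes   E -> T E',  E' -> + T E' | eps
print(remove_immediate_left_recursion("E", [["E", "+", "T"], ["T"]]))
# {'E': [['T', "E'"]], "E'": [['+', 'T', "E'"], []]}
```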

Resolving Difficulties : Left Recursion (3)

Problem: If left recursion is two-or-more levels deep, this isn’t enough.

S → Aa | b          S ⇒ Aa ⇒ Sda
A → Ac | Sd | ε

Algorithm:
Input: Grammar G with ordered Non-Terminals A1, ..., An
Output: An equivalent grammar with no left recursion

1. Arrange the non-terminals in some order A1 = start NT, A2, …, An
2. for i := 1 to n do begin
     for j := 1 to i – 1 do begin
       replace each production of the form Ai → Ajγ
       by the productions Ai → δ1γ | δ2γ | … | δkγ
       where Aj → δ1 | δ2 | … | δk are all current Aj productions;
     end
     eliminate the immediate left recursion among the Ai productions
   end

CH4.38

Using the Algorithm

A1 → A2a | b | ε
A2 → A2c | A1d

i = 1
For A1 there is no immediate left recursion

i = 2
for j = 1 to 1 do
  Take productions A2 → A1δ and replace with
  A2 → α1δ | α2δ | … | αkδ
  where A1 → α1 | α2 | … | αk are the A1 productions.
  In our case A2 → A1d becomes A2 → A2ad | bd | d

What’s left: A1 → A2a | b | ε

Are we done?

A2 → A2c | A2ad | bd | d

CH4.39

Using the Algorithm (2)

A1 → A2a | b | ε
A2 → A2c | A2ad | bd | d

Recall:
A → Aα1 | Aα2 | … | Aαm | β1 | β2 | … | βn
becomes
A → β1A’ | β2A’ | … | βnA’
A’ → α1A’ | α2A’ | … | αmA’ | ε

CH4.40

Removing Difficulties : -Moves

To remove A → ε: take every rule of the form B → uAv, add the rule B → uv to the grammar G, and delete A → ε.

Why does this work?

Examples:

E → TE’                 A1 → A2a | b
E’ → +TE’ | ε           A2 → bdA2’ | dA2’
T → FT’                 A2’ → cA2’ | adA2’ | ε
T’ → *FT’ | ε
F → ( E ) | id

CH4.41

Removing Difficulties : Cycles

Make sure every production is adding some terminal(s) (except a single ε-production in the start NT)…

e.g.
S → SS | ( S ) | ε
has a cycle: S ⇒ SS ⇒ S

Transformation: substitute A → uBv by the productions A → uβ1v | ... | uβnv, assuming that all the productions for B are B → β1 | ... | βn

Transform to:
S → S(S) | (S) | ε

CH4.42

Removing Difficulties : Left Factoring

stmt → if expr then stmt else stmt
     | if expr then stmt

When do you know which one is valid?
What’s the general form of stmt?

A → αβ1 | αβ2       α : if expr then stmt
                    β1 : else stmt      β2 : ε

A → αA’             stmt → if expr then stmt rest
A’ → β1 | β2        rest → else stmt | ε

CH4.43

Resolving Grammar Problems

Some language features cannot be represented by context free grammars / languages.

Examples:

1. Declaring an ID before its use
2. Valid typing within expressions
3. Parameters in definition vs. in call

These features are called context-sensitive and define yet another language class, CSL.

CH4.44

Context-Sensitive Languages - Examples

Examples:

L1 = { wcw | w is in (a | b)* } : Declare before use

L2 = { a^n b^m c^n d^m | n ≥ 1, m ≥ 1 }

a^n b^m : formal parameter lists
c^n d^m : actual parameter lists

CH4.45

How do you show a Language is a CFL?

L3 = { w c w^R | w is in (a | b)* }

L4 = { a^n b^m c^m d^n | n ≥ 1, m ≥ 1 }

L5 = { a^n b^n c^m d^m | n ≥ 1, m ≥ 1 }

L6 = { a^n b^n | n ≥ 1 }

CH4.46

Solutions

L3 = { w c w^R | w is in (a | b)* }
  S → aSa | bSb | c

L4 = { a^n b^m c^m d^n | n ≥ 1, m ≥ 1 }
  S → aSd | aAd
  A → bAc | bc

L5 = { a^n b^n c^m d^m | n ≥ 1, m ≥ 1 }
  S → XY
  X → aXb | ab
  Y → cYd | cd

L6 = { a^n b^n | n ≥ 1 }
  S → aSb | ab

CH4.47

Example of CS Grammar

L2 = { a^n b^m c^n d^m | n ≥ 1, m ≥ 1 }

S → aAcH
A → aAc | B
B → bBD | bD
Dc → cD
DDH → DHd
cDH → cd

CH4.48

Top-Down Parsing

• Why? By always replacing the leftmost non-terminal symbol via a production rule, we are guaranteed of developing a parse tree in a left-to-right fashion that is consistent with scanning the input.

• A ⇒ aBc ⇒ adDc ⇒ adec (scan a, scan d, scan e, scan c - accept!)

• Recursive-descent parsing concepts
  • recursive / brute force technique
• Predictive parsing
  • non-recursive / table driven
• Error recovery
• Implementation

CH4.49

Top-Down Parsing

CH4.50

Recursive Descent Parsing

• Choose a production rule based on the input symbol
• May require backtracking to correct a wrong choice.

• Example: S → cAd          input: cad
           A → ab | a

Expand S → cAd: match c, then try A → ab: a matches, but b fails against d.
Problem: backtrack. Retry with A → a: a matches, then d matches — accept.

CH4.51
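The backtracking in this example can be sketched directly as code. A minimal illustrative sketch (the function names are mine, not the slides’): `parse_A` tries the alternative `ab` first and falls back to `a` from the same input position.

```python
def parse_A(s, pos):
    """Try A -> ab first; on failure, backtrack and try A -> a.
    Returns the new position on success, or None."""
    if s[pos:pos + 2] == "ab":
        return pos + 2
    if s[pos:pos + 1] == "a":       # backtrack: retry from the same position
        return pos + 1
    return None

def parse_S(s):
    """S -> c A d, accepting only if the whole input is consumed."""
    if not s.startswith("c"):
        return False
    pos = parse_A(s, 1)
    return pos is not None and s[pos:] == "d"

print(parse_S("cad"))   # True  -- required the backtrack from "ab" to "a"
print(parse_S("cabd"))  # True
```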

Top-Down Parsing

CH4.52

Predictive Parsing

• Backtracking is bad!
• To eliminate backtracking, what must we do / be sure of for the grammar?
  • no left recursion
  • apply left factoring
  • (frequently) when the grammar satisfies the above conditions: the current input symbol in conjunction with the current non-terminal uniquely determines the production that needs to be applied.
• Utilize transition diagrams. For each non-terminal of the grammar do the following:
  1. Create an initial and final state
  2. If A → X1X2…Xn is a production, add a path with edges X1, X2, …, Xn
• Once transition diagrams have been developed, apply a straightforward technique to algorithmicize transition diagrams with procedures and possible recursion.

CH4.53

Transition Diagrams

• Unlike their lexical equivalents, each edge represents a token
• Transition implies: if the edge is a token, match input; else call the procedure for the non-terminal
• Recall the earlier grammar and its associated transition diagrams:

E → TE’          T → FT’          F → ( E ) | id
E’ → +TE’ | ε    T’ → *FT’ | ε

E:  0 –T→ 1 –E’→ 2
E’: 3 –+→ 4 –T→ 5 –E’→ 6 ;  3 –ε→ 6
T:  7 –F→ 8 –T’→ 9
T’: 10 –*→ 11 –F→ 12 –T’→ 13 ;  10 –ε→ 13
F:  14 –(→ 15 –E→ 16 –)→ 17 ;  14 –id→ 17

How are transition diagrams used? Are ε-moves a problem? Can we simplify transition diagrams? Why is simplification critical?

CH4.54

How are Transition Diagrams Used ?

main()
{ TD_E();
}

TD_E()
{ TD_T();
  TD_E’();
}

TD_E’()
{ token = get_token();
  if token = ‘+’ then
    { TD_T(); TD_E’(); }
}
/* What happened to ε-moves? “else unget() and terminate” */

TD_T()
{ TD_F();
  TD_T’();
}

TD_T’()
{ token = get_token();
  if token = ‘*’ then
    { TD_F(); TD_T’(); }
}

TD_F()
{ token = get_token();
  if token = ‘(’ then
    { TD_E(); match(‘)’); }
  else if token.value <> id then
    { error + EXIT }
  else
    ...
}

NOTE: not all error conditions have been represented.

CH4.55

How can Transition Diagrams be Simplified ?

Start with  E’: 3 –+→ 4 –T→ 5 –E’→ 6 ;  3 –ε→ 6

The trailing call to E’ from state 5 can be replaced by a jump back to E’’s own start:

E’: 3 –+→ 4 –T→ 3 ;  3 –ε→ 6

Substituting this simplified E’ into E’s diagram (E: 0 –T→ 1 –E’→ 2) and merging states gives:

E: 0 –T→ 3 ;  3 –+→ 4 ;  4 –T→ 3 ;  3 –ε→ 6

CH4.60

Additional Transition Diagram

Simplifications

• Similar steps for T and T’
• Simplified transition diagrams:

T:  7 –F→ 10 ;  10 –*→ 11 –F→ 10 ;  10 –ε→ 13
T’: 10 –*→ 11 –F→ 10 ;  10 –ε→ 13
F:  14 –(→ 15 –E→ 16 –)→ 17 ;  14 –id→ 17

Why is simplification important?

CH4.61

Top-Down Parsing

CH4.62

Motivating Table-Driven Parsing

Goal: find a leftmost derivation without recursion.

Grammar: E → TE’             Input: id + id $   ($ = terminator)
         E’ → +TE’ | ε
         T → id

Derivation: E ⇒ …
Processing Stack:

CH4.63

Non-Recursive / Table Driven

[Model: input buffer a + b $ (string + terminator); a stack of CFG symbols X Y Z … with $ marking the empty stack; the driver program consults parsing table M[A,a] to decide what action the parser should take based on the stack / input symbols.]

1. When X = a = $ : halt, accept, success
2. When X = a ≠ $ : POP X off the stack, advance the input, go to 1.
3. When X is a non-terminal, examine M[X,a]:
   if it is an error, call the recovery routine;
   if M[X,a] = {X → UVW}, POP X, PUSH W, V, U —
   DO NOT expend any input

CH4.64

Algorithm for Non-Recursive Parsing

Set ip to point to the first symbol of w$;
repeat
  let X be the top stack symbol and a the symbol pointed to by ip;
  if X is a terminal or $ then
    if X = a then
      pop X from the stack and advance ip
    else error()
  else /* X is a non-terminal */
    if M[X,a] = X → Y1Y2…Yk then begin
      pop X from the stack;
      push Yk, Yk-1, …, Y1 onto the stack, with Y1 on top;
      output the production X → Y1Y2…Yk
      /* may also execute other code based on the production used */
    end
    else error()
until X = $ /* stack is empty */

CH4.65
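The driver loop above can be sketched compactly. A minimal illustrative sketch, assuming the parsing table for the well-worn expression grammar (built later in these notes); the names `TABLE`, `NONTERMS` and `ll1_parse` are mine, and an empty list encodes an ε production.

```python
# Table M for: E -> TE', E' -> +TE' | eps, T -> FT', T' -> *FT' | eps,
#              F -> (E) | id.  Keys are (non-terminal, lookahead).
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "*"): ["*", "F", "T'"],
    ("T'", "+"): [], ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
NONTERMS = {"E", "E'", "T", "T'", "F"}

def ll1_parse(tokens):
    """Return the list of (NT, production) used, or None on error."""
    stack = ["$", "E"]
    tokens = tokens + ["$"]
    i = 0
    output = []
    while stack:
        x = stack.pop()
        a = tokens[i]
        if x not in NONTERMS:            # terminal or $: must match input
            if x != a:
                return None
            i += 1
        else:
            rhs = TABLE.get((x, a))
            if rhs is None:
                return None              # empty table entry: error
            output.append((x, rhs))
            stack.extend(reversed(rhs))  # push RHS with leftmost on top
    return output

print(ll1_parse(["id", "+", "id", "*", "id"]) is not None)  # True
```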

Example

E TE’

E’ + TE’ |

T FT’ Our well-worn example !

T’ * FT’ |

F ( E ) | id

Table M

terminal

id + * ( ) $

E ETE’ ETE’

E’ E’+TE’ E’ E’

T TFT’ TFT’

T’ T’ T’*FT’ T’ T’

F Fid F(E)

CH4.66


Trace of Example

STACK      INPUT            OUTPUT
$E         id + id * id$
$E’T       id + id * id$    E → TE’
$E’T’F     id + id * id$    T → FT’
$E’T’id    id + id * id$    F → id
$E’T’      + id * id$       (matched id: expend input)
$E’        + id * id$       T’ → ε
$E’T+      + id * id$       E’ → +TE’
$E’T       id * id$
$E’T’F     id * id$         T → FT’
$E’T’id    id * id$         F → id
$E’T’      * id$
$E’T’F*    * id$            T’ → *FT’
$E’T’F     id$
$E’T’id    id$              F → id
$E’T’      $
$E’        $                T’ → ε
$          $                E’ → ε

CH4.68

Leftmost Derivation for the Example

E ⇒ TE’ ⇒ FT’E’ ⇒ id T’E’ ⇒ id E’ ⇒ id + TE’ ⇒ id + FT’E’
  ⇒ id + id T’E’ ⇒ id + id * FT’E’ ⇒ id + id * id T’E’
  ⇒ id + id * id E’ ⇒ id + id * id

CH4.69

What’s the Missing Puzzle Piece ?

Constructing the Parsing Table M!

1st: Calculate First & Follow for the Grammar
2nd: Apply the Construction Algorithm for the Parsing Table (we’ll see this shortly)

Basic Tools:

First: Let α be a string of grammar symbols. First(α) is the set that includes every terminal that appears leftmost in α or in any string originating from α.
NOTE: If α ⇒* ε, then ε is in First(α).

Follow: Let A be a non-terminal. Follow(A) is the set of terminals a that can appear directly to the right of A in some sentential form (S ⇒* αAaβ, for some α and β).
NOTE: If S ⇒* αA, then $ is in Follow(A). CH4.70

Motivation Behind First & Follow

First is used to help find the appropriate production to apply, given the top-of-the-stack non-terminal and the current input symbol.

Example: If A → α, and a is in First(α), then when a = input, replace A with α (in the stack). (a is one of the first symbols of α, so when A is on the stack and a is input, POP A and PUSH α.)

Follow is used when First gives no suggestion. When α ⇒* ε (or α = ε), then what follows A dictates the next choice to be made.

Example: If A → α, α ⇒* ε, and b is in Follow(A), then when b is the input character we expand A with α, which will eventually expand to ε, which b follows! (α ⇒* ε : i.e., First(α) contains ε.)

CH4.71

An example.

Grammar: S → aBCd ;  B → CB | ε ;  C → b

STACK   INPUT    OUTPUT
$S      abbd$    S → aBCd
…

CH4.72

Computing First(X) :

All Grammar Symbols

1. If X is a terminal, First(X) = {X}
2. If X → ε is a production rule, add ε to First(X)
3. If X is a non-terminal, and X → Y1Y2…Yk is a production rule:
   Place First(Y1) in First(X);
   if Y1 ⇒* ε, place First(Y2) in First(X);
   if Y2 ⇒* ε, place First(Y3) in First(X);
   …
   if Yk-1 ⇒* ε, place First(Yk) in First(X)
   NOTE: As soon as Yi ⇏* ε, stop.

Repeat the above steps until no more elements are added to any First() set.

Checking “Yj ⇒* ε ?” essentially amounts to checking whether ε belongs to First(Yj).

CH4.73

Computing First(X) :

All Grammar Symbols - continued

Informally, suppose we want to compute First(X1 X2 … Xn):

First(X1 X2 … Xn) = First(X1) “+”
  First(X2) if ε is in First(X1) “+”
  First(X3) if ε is in First(X2) “+”
  …
  First(Xn) if ε is in First(Xn-1)

Note 1: ε is in First(X1 X2 … Xn) only if ε is in First(Xi) for all i.
Note 2: For First(X1), if X1 → Z1 Z2 … Zm, then we need to compute First(Z1 Z2 … Zm)!

CH4.74
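The fixpoint described above can be sketched directly. An illustrative sketch (my own naming, not the slides’ code): the grammar is a dict from non-terminal to a list of right-hand sides, and the string `"eps"` stands for ε.

```python
def first_sets(grammar, terminals):
    """Iterate the First() rules until no set changes (a fixpoint)."""
    first = {t: {t} for t in terminals}      # rule 1: First(a) = {a}
    for nt in grammar:
        first[nt] = set()
    changed = True
    while changed:
        changed = False
        for nt, prods in grammar.items():
            for prod in prods:
                before = len(first[nt])
                if prod == []:               # rule 2: X -> eps
                    first[nt].add("eps")
                for sym in prod:             # rule 3: walk Y1 Y2 ... Yk
                    first[nt] |= first[sym] - {"eps"}
                    if "eps" not in first[sym]:
                        break                # Yi does not derive eps: stop
                else:
                    if prod:                 # every Yi derives eps
                        first[nt].add("eps")
                if len(first[nt]) != before:
                    changed = True
    return first

G = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}
F = first_sets(G, {"+", "*", "(", ")", "id"})
print(sorted(F["E"]))   # ['(', 'id']
print(sorted(F["E'"]))  # ['+', 'eps']
```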


Example 1

S → iEtSS’ | a
S’ → eS | ε
E → b

Verify that:

First(S) = { i, a }
First(S’) = { e, ε }
First(E) = { b }

CH4.76


Example 2

Computing First for: E → TE’
                     E’ → +TE’ | ε
                     T → FT’
                     T’ → *FT’ | ε
                     F → ( E ) | id

First(E) = First(TE’) = First(T) “+” First(E’) — but not First(E’), since T ⇏* ε
First(T) = First(F)

First(E) = First(T) = First(F) = { ( , id }
First(E’) = { + , ε }
First(T’) = { * , ε }

Computing Follow(A) :

All Non-Terminals

1. Place $ in Follow(S), where S is the start symbol and $ signals end of input

2. If there is a production A → αBβ, then everything in First(β) is in Follow(B), except for ε.

3. If A → αB is a production, or A → αBβ with β ⇒* ε (First(β) contains ε), then everything in Follow(A) is in Follow(B).
   (Whatever followed A must follow B, since nothing follows B from the production rule.)

CH4.79

The Algorithm for Follow – pseudocode

1. Initialize Follow(X) for all non-terminals X to the empty set. Place $ in Follow(S), where S is the start NT.

2. Repeat the following step until no modifications are made to any Follow set:

   For any production X → X1 X2 … Xm
     For j = 1 to m,
       if Xj is a non-terminal then:
         Follow(Xj) = Follow(Xj) ∪ (First(Xj+1 … Xm) – {ε});
         If First(Xj+1 … Xm) contains ε or Xj+1 … Xm = ε,
           then Follow(Xj) = Follow(Xj) ∪ Follow(X);

CH4.80
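The Follow fixpoint can likewise be sketched, here with the First sets of the expression grammar supplied as data (the names `FIRST`, `first_of_seq` and `follow_sets` are mine; `"eps"` stands for ε).

```python
# Known First sets for: E -> TE', E' -> +TE' | eps, T -> FT',
#                       T' -> *FT' | eps, F -> (E) | id
FIRST = {"E": {"(", "id"}, "E'": {"+", "eps"}, "T": {"(", "id"},
         "T'": {"*", "eps"}, "F": {"(", "id"},
         "+": {"+"}, "*": {"*"}, "(": {"("}, ")": {")"}, "id": {"id"}}

G = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}

def first_of_seq(seq):
    """First(X_{j+1} ... X_m); an exhausted sequence yields {eps}."""
    out = set()
    for sym in seq:
        out |= FIRST[sym] - {"eps"}
        if "eps" not in FIRST[sym]:
            return out
    out.add("eps")
    return out

def follow_sets(grammar, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")                       # step 1
    changed = True
    while changed:                               # step 2: fixpoint
        changed = False
        for x, prods in grammar.items():
            for prod in prods:
                for j, sym in enumerate(prod):
                    if sym not in grammar:       # terminals have no Follow
                        continue
                    rest = first_of_seq(prod[j + 1:])
                    before = len(follow[sym])
                    follow[sym] |= rest - {"eps"}
                    if "eps" in rest:
                        follow[sym] |= follow[x]
                    if len(follow[sym]) != before:
                        changed = True
    return follow

follow = follow_sets(G, "E")
print(sorted(follow["F"]))  # ['$', ')', '*', '+']
```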


Computing Follow : 1st Example

Recall: S → iEtSS’ | a     First(S) = { i, a }
        S’ → eS | ε        First(S’) = { e, ε }
        E → b              First(E) = { b }

Since S → iEtSS’, put First(S’) – {ε} = { e } in Follow(S)
Since S’ ⇒* ε, put Follow(S’) in Follow(S); $ is in Follow(S) since S is the start symbol

Follow(S) = Follow(S’) = { e, $ }
Follow(E) = { t }

CH4.82


Example 2

Compute Follow for: E → TE’
                    E’ → +TE’ | ε
                    T → FT’
                    T’ → *FT’ | ε
                    F → ( E ) | id

        First        Follow
E       { (, id }    { ), $ }
E’      { +, ε }     { ), $ }
T       { (, id }    { +, ), $ }
T’      { *, ε }     { +, ), $ }
F       { (, id }    { +, *, ), $ }

CH4.84

Constructing Parsing Table

Algorithm:

The table has one row per non-terminal and one column per terminal (incl. $).

1. Repeat Steps 2 & 3 for each rule A → α
2. Terminal a in First(α)? Add A → α to M[A, a]
3.1 ε in First(α)? Add A → α to M[A, b] for all terminals b in Follow(A).
3.2 ε in First(α) and $ in Follow(A)? Add A → α to M[A, $]
4. All undefined entries are errors.

CH4.85


Constructing Parsing Table – Example 1

S → iEtSS’ | a   First(S) = { i, a }    Follow(S) = { e, $ }
S’ → eS | ε      First(S’) = { e, ε }   Follow(S’) = { e, $ }
E → b            First(E) = { b }       Follow(E) = { t }

S → iEtSS’ : First(iEtSS’) = {i}    S → a : First(a) = {a}    E → b : First(b) = {b}
S’ → eS : First(eS) = {e}           S’ → ε : First(ε) = {ε}, so use Follow(S’) = { e, $ }

        a       b       e                i            t       $
S       S→a                              S→iEtSS’
S’                      S’→eS, S’→ε                           S’→ε
E               E→b

CH4.87
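Steps 2, 3.1 and 3.2 of the construction can be sketched for this small grammar, with First and Follow supplied as data (the names `FIRST`, `FOLLOW`, `PRODS` and `build_table` are mine, and `"eps"` stands for ε). Note how the dangling-else grammar produces a multiply-defined entry — exactly the conflict shown in the table above.

```python
# First of each right-hand side; "" encodes the eps production S' -> eps.
FIRST = {"iEtSS'": {"i"}, "a": {"a"}, "eS": {"e"}, "": {"eps"}, "b": {"b"}}
FOLLOW = {"S": {"e", "$"}, "S'": {"e", "$"}, "E": {"t"}}
PRODS = [("S", "iEtSS'"), ("S", "a"), ("S'", "eS"), ("S'", ""), ("E", "b")]

def build_table():
    table = {}
    for lhs, rhs in PRODS:
        for a in FIRST[rhs] - {"eps"}:        # rule 2
            table.setdefault((lhs, a), []).append(rhs)
        if "eps" in FIRST[rhs]:               # rules 3.1 / 3.2
            for b in FOLLOW[lhs]:             # Follow already contains $
                table.setdefault((lhs, b), []).append(rhs)
    return table

M = build_table()
print(M[("S", "i")])   # ["iEtSS'"]
print(M[("S'", "e")])  # ['eS', '']  -- multiply-defined: not LL(1)!
```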


Constructing Parsing Table – Example 2

E → TE’         First(E) = First(F) = First(T) = { (, id }   Follow(E) = Follow(E’) = { ), $ }
E’ → +TE’ | ε   First(E’) = { +, ε }                         Follow(F) = { *, +, ), $ }
T → FT’         First(T’) = { *, ε }                         Follow(T) = Follow(T’) = { +, ), $ }
T’ → *FT’ | ε
F → ( E ) | id

M[E, ( ] : E → TE’   (by rule 2)
M[E, id ] : E → TE’  (by rule 2)
E’ → +TE’ : First(+TE’) = {+} : M[E’, +] : E’ → +TE’  (by rule 2)
E’ → ε : ε in First(ε); likewise T’ → ε  (by rule 3)
M[E’, )] : E’ → ε (3.1)          M[T’, +] : T’ → ε (3.1)
M[E’, $] : E’ → ε (3.2)          M[T’, )] : T’ → ε (3.1)
(due to Follow(E’))              M[T’, $] : T’ → ε (3.2)

CH4.89

LL(1) Grammars

L : Scan input from Left to right
L : Construct a Leftmost derivation
1 : Use “1” input symbol of lookahead in conjunction with the stack to decide on the parsing action

LL(1) grammars == grammars with no multiply-defined entries in the parsing table.

Properties of LL(1) grammars:
• The grammar can’t be ambiguous or left recursive
• The grammar is LL(1) when, for each pair of productions A → α | β :
  1. α & β do not derive strings starting with the same terminal a
  2. Either α or β can derive ε, but not both.

Note: It may not be possible for a grammar to be manipulated into an LL(1) grammar.

CH4.90

Error Recovery

When do errors occur? Recall the predictive parser model:
[input a + b $ ; stack … Y Z $ ; the program consults parsing table M[A,a]]

1. If the terminal on top of the stack does not match the current input, or
2. If M[X, input] is empty – no allowable actions

Consider two recovery techniques:
A. Panic Mode
B. Phrase-level Recovery

B. Phrase-level Recovery CH4.91

Panic-Mode Recovery

Assume a non-terminal is on the top of the stack.

Idea: skip symbols on the input until a token in a selected set of synchronizing tokens is found. The choice of a synchronizing set is important. Some ideas:

• Define the synchronizing set of A to be FOLLOW(A): skip input until a token in FOLLOW(A) appears, then pop A from the stack. Resume parsing...
• Add symbols of FIRST(A) to the synchronizing set. In this case we skip input and, once we find a token in FIRST(A), we resume parsing from A.
• Productions that lead to ε, if available, might be used.

If a terminal appears on top of the stack and does not match the input — pop it and continue parsing (issuing an error message saying that the terminal was inserted).

CH4.92

Panic Mode Recovery, II

1. If M[A,a] = {empty} and a belongs to Follow(A), then we set M[A,a] = “synch”.

Error-recovery strategy:
If A = top-of-the-stack and a = current input,
1. If A is a NT and M[A,a] = {empty}, then skip a from the input.
2. If A is a NT and M[A,a] = {synch}, then pop A.
3. If A is a terminal and A != a, then pop the token (essentially inserting it).

CH4.93

Revised Parsing Table / Example

        id      +         *         (       )       $
E       E→TE’                       E→TE’   synch   synch
E’              E’→+TE’                     E’→ε    E’→ε
T       T→FT’   synch               T→FT’   synch   synch
T’              T’→ε      T’→*FT’           T’→ε    T’→ε
F       F→id    synch     synch     F→(E)   synch   synch

(the “synch” action applies when the top of stack is a NT)

CH4.94

Revised Parsing Table / Example(2)

STACK     INPUT           Remark
$E        + id * + id$    error, skip +
$E        id * + id$
$E’T      id * + id$
$E’T’F    id * + id$
$E’T’id   id * + id$                       Possible error msg:
$E’T’     * + id$                          “Misplaced + —
$E’T’F*   * + id$                          I am skipping it”
$E’T’F    + id$           error, M[F,+] = synch
$E’T’     + id$           F has been popped
$E’       + id$
$E’T+     + id$
$E’T      id$
$E’T’F    id$                              Possible error msg:
$E’T’id   id$                              “Missing term”
$E’T’     $
$E’       $
$         $

CH4.95

Writing Error Messages

Keep input counter(s)

Recall: every non-terminal symbolizes an abstract language

construct.

Examples of Error-messages for our usual grammar

E means “expression”.
Top-of-stack is E, input is + :
  “Error at location i, expressions cannot start with a ‘+’” or “error at location i, invalid expression”
Similarly for E, *.

E’ means “expression ending”.
Top-of-stack is E’, input is * or id :
  “Error: expression starting at j is badly formed at location i”
Requires: every time you pop an ‘E’, remember the location.

CH4.96

Writing Error-Messages, II

Messages for Synch Errors.

Top-of-stack is F input is +

“error at location i, expected

summation/multiplication term missing”

Top-of-stack is E input is )

“error at location i, expected expression missing”

CH4.97

Writing Error Messages, III

When the top-of-the stack is a terminal that does

not match…

E.g. top-of-stack is id and the input is +

“error at location i: identifier expected”

Top-of-stack is ) and the input is terminal other

than )

Every time you match an ‘(‘

push the location of ‘(‘ to a “left parenthesis” stack.

– this can also be done with the symbol stack.

When the mismatch is discovered look at the left

parenthesis stack to recover the location of the

parenthesis.

“error at location i: left parenthesis at location m has

no closing right parenthesis”

– E.g. consider ( id * + (id id) $

CH4.98

Incorporating Error-Messages to the Table

Empty parsing table entries can now fill with the

appropriate error-reporting techniques.

CH4.99

Phrase-Level Recovery

Fill empty table entries with pointers to error-handling routines that do not only report errors but may also:

• change / insert / delete symbols in the stack and/or the input stream
• + issue an error message

Problems:
• Modifying the stack has to be done with care, so as to not create the possibility of derivations that aren’t in the language
• infinite loops must be avoided

Essentially extends panic mode to have more complete error handling.

CH4.100

How Would You Implement TD Parser

• Stack – Easy to handle. Write an ADT to manipulate its contents
• Input stream – Responsibility of the lexical analyzer
• Key issue – How is the parsing table implemented?

One approach: Assign unique IDs

        id      +         *         (       )       $
E       E→TE’                       E→TE’   synch   synch
E’              E’→+TE’                     E’→ε    E’→ε
T       T→FT’   synch               T→FT’   synch   synch
T’              T’→ε      T’→*FT’           T’→ε    T’→ε
F       F→id    synch     synch     F→(E)   synch   synch

Ditto for the synch actions and the handlers for errors: unique IDs for each.

CH4.101

Revised Parsing Table:

      id    +     *     (     )     $
E     1     18    19    1     9     10
E’    20    2     21    22    3     3
T     4     11    23    4     12    13
T’    24    6     5     25    6     6
F     8     14    15    7     16    17

1 : E→TE’      9 – 17 : synch actions
2 : E’→+TE’    18 – 25 : error handlers
3 : E’→ε
4 : T→FT’
5 : T’→*FT’
6 : T’→ε
7 : F→(E)
8 : F→id

CH4.102

Revised Parsing Table: (2)

• Uses Stack ADT

• Gets Tokens

• Prints Error Messages

• Prints Diagnostic Messages

• Handles Errors

CH4.103

How is Parser Constructed ?

One large CASE statement:

state = M[ top(s), current_token ]
switch (state)
{
  case 1: proc_E_TE’( ); break;     /* cases 1–8 could be combined */
  …                                 /* and put in another switch   */
  case 8: proc_F_id( ); break;
  case 9: proc_sync_9( ); break;    /* some sync actions may be the same */
  …
  case 17: proc_sync_17( ); break;
  case 18:
  …                                 /* procs to handle errors;            */
  case 25:                          /* some error handlers may be similar */
}

CH4.104

Final Comments – Top-Down Parsing

So far,

• We’ve examined grammars and language theory and

its relationship to parsing

• Key concepts: Rewriting grammar into an acceptable

form

• Examined Top-Down parsing:

Brute Force : Transition diagrams & recursion

Elegant : Table driven

• We’ve identified its shortcomings:

Not all grammars can be made LL(1) !

• Bottom-Up Parsing - Future

CH4.105

Bottom Up Parsing

CH4.106

Bottom Up Parsing

“Shift-Reduce” Parsing

Reduce a string to the start symbol of the grammar.
At every step a particular substring is matched (in left-to-right fashion) to the right side of some production and is then substituted by the non-terminal on the left-hand side of the production.

Consider:          abbcde
S → aABe           aAbcde
A → Abc | b        aAde
B → d              aABe
                   S

Rightmost Derivation:
S ⇒ aABe ⇒ aAde ⇒ aAbcde ⇒ abbcde

CH4.107

Handles

Handle of a string = a substring that matches the RHS of some production AND whose reduction to the non-terminal on the LHS is a step along the reverse of some rightmost derivation.

Formally: a handle of a right-sentential form γ is <A → β, location of β in γ> that satisfies the above property.

i.e. A → β is a handle of αβw at the location immediately after the end of α, if:

S ⇒*rm αAw ⇒rm αβw

Right-sentential forms of a non-ambiguous grammar have one unique handle (but potentially many substrings that look like handles!).

CH4.108

Example

Consider:
  S → aABe
  A → Abc | b
  B → d

It follows that:
  (S →) aABe is a handle of aABe   at location 1.
  (B →) d    is a handle of aAde   at location 3.
  (A →) Abc  is a handle of aAbcde at location 2.
  (A →) b    is a handle of abbcde at location 2.

CH4.109

Example, II

Grammar:
  S → aABe
  A → Abc | b
  B → d

Consider aAbcde (it is a right sentential form).
Is [A → b, aAbcde] a handle?
If it is then there must be:

  S ⇒rm … ⇒rm aAAcde ⇒rm aAbcde

But there are no right sentential forms with two consecutive
A's in this grammar. ⇒ Impossible

CH4.110

Example, III

Grammar:
  S → aABe
  A → Abc | b
  B → d

Consider aAbcde (it is a right sentential form).
Is [B → d, aAbcde] a handle?
If it is then there must be:

  S ⇒rm … ⇒rm aAbcBe ⇒rm aAbcde

But when we try to obtain aAbcBe we find it is not a right
sentential form:

  S ⇒rm aABe  ??  aAbcBe

(a rightmost derivation must expand B, the rightmost non-terminal,
before A).

Handle Pruning

A rightmost derivation in reverse can be obtained by "handle-pruning."
Apply this to the previous example.

  S → aABe
  A → Abc | b
  B → d

  abbcde      A → b
  aAbcde      A → Abc
  aAde        B → d
  aABe        S → aABe
  S

CH4.112
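The handle-pruning sequence above can be reproduced mechanically. The sketch below (Python; the function name is my own) reduces the leftmost, longest RHS match at each step. That heuristic happens to coincide with the unique handle at every step for this toy grammar and input, but it is NOT a correct strategy for grammars in general.

```python
# Naive handle pruning for the toy grammar S->aABe, A->Abc|b, B->d.
PRODUCTIONS = [("S", "aABe"), ("A", "Abc"), ("A", "b"), ("B", "d")]

def prune(sentence):
    """Return the sequence of right sentential forms down to S."""
    forms = [sentence]
    while sentence != "S":
        reduced = False
        for i in range(len(sentence)):              # leftmost position first
            # prefer the longest RHS at a position (Abc before b)
            for lhs, rhs in sorted(PRODUCTIONS, key=lambda p: -len(p[1])):
                if sentence[i:i + len(rhs)] == rhs:
                    sentence = sentence[:i] + lhs + sentence[i + len(rhs):]
                    forms.append(sentence)
                    reduced = True
                    break
            if reduced:
                break
        if not reduced:
            raise ValueError("no reduction possible")
    return forms
```

Calling `prune("abbcde")` yields exactly the five sentential forms listed in the slide.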

Handle Pruning, II

Consider the cut of a parse-tree of a certain right sentential form.

[diagram: a parse tree rooted at S, cut at a right sentential form;
the part of the cut up to and including the handle is a viable prefix]

CH4.113

Shift Reduce Parsing with a Stack

The “big” problem : given the sentential form

locate the handle

General idea for S-R parsing using a stack:

1. “shift” input symbols into the stack until a

handle is found on top of it.

2. “reduce” the handle to the corresponding non-

terminal.

(other operations: “accept” when the input is

consumed and only the start symbol is on the stack,

also: “error”).

Viable prefix: prefix of a right sentential form that

appears on the stack of a Shift-Reduce parser.

CH4.114

What happens with ambiguous grammars

Consider:

E → E+E | E*E
  | ( E ) | id

Derive id+id*id

By two different Rightmost

derivations

CH4.115

Example

Grammar:  E → E+E | E*E | ( E ) | id

STACK        INPUT            Remark
$            id + id * id$    Shift
$id          + id * id$       Reduce by E → id
$E           + id * id$       …

CH4.116

Conflicts

Conflicts [appear in ambiguous grammars]:
either "shift/reduce" or "reduce/reduce".

Another Example:
  stmt → if expr then stmt
       | if expr then stmt else stmt
       | other             (any other statement)

  Stack                        Input
  … if … then stmt             else …      Shift/Reduce conflict

CH4.117

More Conflicts

stmt → id ( parameter-list )
stmt → expr := expr
parameter-list → parameter-list , parameter | parameter
parameter → id
expr-list → expr-list , expr | expr
expr → id | id ( expr-list )

Corresponding token stream is id(id, id)

After three shifts:
  Stack = id(id        Input = , id)

Reduce/Reduce Conflict … what to do?
(it really depends on what the first id is:
an array? or a procedure?)                               CH4.118

Removing Conflicts

One way is to manipulate grammar.

cf. what we did in the top-down approach to

transform a grammar so that it is LL(1).

Nevertheless:

We will see that shift/reduce and reduce/reduce

conflicts can be best dealt with after they are

discovered.

This simplifies the design.

CH4.119

Introduction to LR Parsing

CH4.120

Example

Consider:

  S → aABe
  A → Abc | b
  B → d

Rightmost Derivation of the string abbcde:

  S ⇒ aABe ⇒ aAde ⇒ aAbcde ⇒ abbcde

The (unique) handle is underlined for each step.

A viable prefix is

(1) a string that equals a prefix of a right-sentential form up

to (and including) its unique handle.

(2) any prefix of a string that satisfies (1)

Examples: a, aA, aAd, aAbc, ab, aAb,…

Not viable prefixes: aAde, Abc, aAA,…

CH4.121

Shift/Reduce Parser

STACK INPUT Remark

$ abbcde$ SHIFT

$a bbcde$ SHIFT

$ab bcde$ REDUCE

$aA bcde$ SHIFT

$aAb cde$ SHIFT (?)

$aAbc de$ REDUCE

$aA de$ SHIFT

$aAd e$ REDUCE

$aAB e$ SHIFT

$aABe $ REDUCE

$S $ ACCEPT Observe: all

Strings in the

stack are viable

prefixes

CH4.122

When to shift? When to Reduce?

Sometimes on top of the stack something appears

to be a handle (i.e., matches the RHS of a

production).

But: maybe we have not shifted enough elements to

identify the (real) handle.

Observe the correct sequence of Shift and Reduce

steps preserves the property that the stack IS a

viable prefix.

Example

$aAb cde$ Shift or Reduce?

If we shift we obtain aAbc in the stack.

Recall that Abc is a handle.

Instead if we reduce we obtain aAA in the stack.

(this is NOT a viable prefix!!!)

CH4.123

When to Shift? When to Reduce? II

In order to make shift/reduce decisions:

We need to look to perhaps a few elements inside the

stack.

We need to make sure that the way we modify the stack

preserves the “viable prefix condition.”

For our previous example:

Any b appearing to the right of “A” should not be

reduced.

In fact we can come up with heuristic decisions based

on the grammar structure:

A “b” is reduced only if it is to the right of “a”

PROBLEM: what kind of information do we need to store

inside the stack so that we can make decisions as above just

by looking at the top element?

CH4.124

LR Parsing

LR (left-to-right, rightmost derivation).

LR(1) = 1 lookahead symbol.

Use stack

Stack should contain “more information” (in a

compressed form) compared to a Top-Down Table-

driven parser.

LR(1):

Decisions are taken looking at the top of the

stack + 1 input element.

CH4.125

Anatomy of an LR parser

[diagram: the input a + b $ (string + terminator), a stack holding
states and NT/T symbols of the CFG, the LR Parsing Program producing
output, and the Parsing Table with its two parts action[.,.] and
goto[.,.], which tell the parser what action to take based on the
stack and the input]

1. If action[s,a] = "accept": halt, accept, success.
2. If action[s,a] = "reduce by production A → β" do the following:
   2a. Pop 2*|β| elements from the stack.
   2b. Push A.
   2c. Push goto[s*, A]   (where s* is the state now on top).
3. If action[s,a] = "shift and goto state s*": shift; push s*.

CH4.126
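The three steps above can be sketched as a small driver loop. This is a simplified version: the stack holds states only (grammar symbols are implicit in them), so a reduce by A → β pops |β| entries rather than 2·|β|. The table entries are transcribed by hand from the example for the grammar S → aABe, A → Abc | b, B → d.

```python
# Assumption: states-only stack; ACTION/GOTO transcribed from the example.
RULES = {1: ("S", "aABe"), 2: ("A", "Abc"), 3: ("A", "b"), 4: ("B", "d")}
ACTION = {
    (0, "a"): ("s", 1),
    (1, "b"): ("s", 3),
    (2, "b"): ("s", 4), (2, "d"): ("s", 8),
    (3, "b"): ("r", 3), (3, "d"): ("r", 3),
    (4, "c"): ("s", 6),
    (5, "e"): ("s", 7),
    (6, "b"): ("r", 2), (6, "d"): ("r", 2),
    (7, "$"): ("r", 1),
    (8, "e"): ("r", 4),
    (9, "$"): ("acc", None),
}
GOTO = {(0, "S"): 9, (1, "A"): 2, (2, "B"): 5}

def lr_parse(tokens):
    """Drive the table over `tokens`; return the reductions performed."""
    stack, i, output = [0], 0, []
    tokens = list(tokens) + ["$"]
    while True:
        kind, arg = ACTION.get((stack[-1], tokens[i]), ("err", None))
        if kind == "s":                     # shift: push state, advance
            stack.append(arg); i += 1
        elif kind == "r":                   # reduce by rule `arg`
            lhs, rhs = RULES[arg]
            del stack[len(stack) - len(rhs):]
            stack.append(GOTO[(stack[-1], lhs)])
            output.append(f"{lhs}->{rhs}")
        elif kind == "acc":
            return output
        else:
            raise SyntaxError(f"error at token {i}")
```

Running it on abbcde reproduces the reduction sequence of the shift/reduce trace shown earlier.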

Example

1. S → aABe                    action                        goto
2. A → Abc               a     b     c     d     e     $     S   A   B
3. A → b             0   s1                                  9
4. B → d             1         s3                                2
                     2         s4          s8                        5
                     3         r3          r3
                     4               s6
                     5                           s7
                     6         r2          r2
                     7                                 r1
                     8                           r4
                     9                                 acc

CH4.127

Example, II

STACK INPUT Remark

$0 abbcde$

CH4.128

Interesting Fact + LR Parsing Table

Construction Methods

HOW TO CONSTRUCT SUCH TABLES?

The set of all viable prefixes is Regular.

It is possible to write a DFA that recognizes it!

Use the DFA as an aid to construction of the table.

Design Methodologies:

SLR (simple LR)

“short table but limited methodology.”

Canonical LR

“general methodology but big table.”

LALR (lookahead LR)

“in between”

CH4.129

SLR Parsing

CH4.130

Items

SLR (Simple LR parsing)

DEF: An LR(0) item is a production with a "marker."
E.g.  S → aA.Be

intuition: it indicates how much of a certain production

we have seen already (up to the point of the marker)

CENTRAL IDEA OF SLR PARSING: construct a DFA

that recognizes viable prefixes of the grammar.

Intuition: Shift/Reduce actions can be decided based on

this DFA (what we have seen so far & what are our next

options).

Use “LR(0) Items” for the creation of this DFA.

CH4.131

Basic Operations

Augmented Grammar:

  E → E+T | T               E' → E
  T → T*F | F               E  → E+T | T
  F → ( E ) | id            T  → T*F | F
                            F  → ( E ) | id

Function closure(I)
{  J = I;
   repeat
     for each A → α.Bβ in J and each production B → γ of G
     such that B → .γ is not in J:  ADD B → .γ to J
   until no more items can be added to J;
   return J
}

EXAMPLE: consider I = { E' → .E }

CH4.132
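The pseudocode above can be transcribed directly. A minimal sketch, assuming items are represented as (LHS, RHS-tuple, dot-position) triples:

```python
# closure(I) for LR(0) items of the augmented expression grammar.
GRAMMAR = [("E'", ("E",)),
           ("E", ("E", "+", "T")), ("E", ("T",)),
           ("T", ("T", "*", "F")), ("T", ("F",)),
           ("F", ("(", "E", ")")), ("F", ("id",))]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}

def closure(items):
    J = set(items)
    while True:
        # add B -> .gamma whenever the dot precedes the non-terminal B
        new = {(lhs, rhs, 0)
               for (_, r, d) in J
               if d < len(r) and r[d] in NONTERMINALS
               for lhs, rhs in GRAMMAR if lhs == r[d]}
        if new <= J:
            return J                  # nothing added: fixpoint reached
        J |= new
```

For the example I = { E' → .E } the closure contains 7 items (the dotted versions of all seven productions).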

GOTO function

Definition.
  Goto(I,X) = closure of the set of all items A → αX.β
  where A → α.Xβ belongs to I.

Intuitively: Goto(I,X) is the set of all items "reachable" from the
items of I once X has been "seen."

E.g. consider I = { E' → E. , E → E.+T } and compute Goto(I,+):

  Goto(I,+) = { E → E+.T , T → .T * F , T → .F ,
                F → .( E ) , F → .id }

CH4.133

The Canonical Collections of Items for G

C := { closure({ S' → .S }) }
repeat
  for each set of items I in C and each grammar symbol X
  such that goto(I,X) is not empty and not in C:
      add goto(I,X) to C
until no more sets of items can be added to C

Grammar:                I0:  E' → .E           I1:  E' → E.
  E' → E                     E  → .E + T            E  → E. + T
  E  → E+T | T               E  → .T               …
  T  → T*F | F               T  → .T * F
  F  → ( E ) | id            T  → .F           I2:  E → T.        … I11
                             F  → .( E )           T → T. * F
                             F  → .id

CH4.134
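The loop above can be run mechanically. A self-contained sketch (it repeats closure; the item representation is my own choice):

```python
# Canonical collection of sets of LR(0) items for the expression grammar.
GRAMMAR = [("E'", ("E",)),
           ("E", ("E", "+", "T")), ("E", ("T",)),
           ("T", ("T", "*", "F")), ("T", ("F",)),
           ("F", ("(", "E", ")")), ("F", ("id",))]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}
SYMBOLS = NONTERMINALS | {"+", "*", "(", ")", "id"}

def closure(items):
    J = set(items)
    while True:
        new = {(lhs, rhs, 0) for (_, r, d) in J
               if d < len(r) and r[d] in NONTERMINALS
               for lhs, rhs in GRAMMAR if lhs == r[d]}
        if new <= J:
            return frozenset(J)       # frozen so states can sit in a set
        J |= new

def goto(I, X):
    # move the dot over X in every item that allows it, then close
    return closure({(A, r, d + 1) for (A, r, d) in I
                    if d < len(r) and r[d] == X})

def canonical_collection():
    C = {closure({("E'", ("E",), 0)})}
    while True:
        new = {goto(I, X) for I in C for X in SYMBOLS if goto(I, X)}
        if new <= C:
            return C
        C |= new
```

For this grammar the collection has the 12 states I0 … I11 referred to above.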

The DFA For Viable Prefixes

States = Canonical Collection of Sets of Items.
Transitions defined by the Goto function.
All states final except I0.

[diagram: I0 —E→ I1 —+→ … —T→ … —*→ I7 ; I0 —F→ I3 ; … ;
see the full DFA on p. 226 of the textbook]

Intuition: imagine an NFA whose states are all the items in the
grammar, with transitions of the form "A → α.Xβ" goes to "A → αX.β"
on an arrow labeled "X".

Then the closure used in the Goto function essentially transforms this
NFA into the DFA above.

CH4.135

Example

S' → S
S  → aABe          Start with I0 = closure({ S' → .S })
A  → Abc
A  → b
B  → d

CH4.136

Example, II

E' → E
E  → E+T | T
T  → T*F | F
F  → ( E ) | id

CH4.137

Relation to Parsing

An item A → β1.β2 is valid for a viable prefix αβ1 if we have a
rightmost derivation that yields αAw which in one step yields αβ1β2w.

An item will be valid for many viable prefixes.

Whether a certain item is valid for a certain viable prefix helps in
our decision whether to shift or reduce when αβ1 is on the stack:

  If β2 ≠ ε it looks like we still need to shift.
  If β2 = ε it looks like we should reduce A → β1.

Not a total solution since two valid items may tell us different
things.

CH4.138

Sanity Check

E+T* is a viable prefix (and the DFA will be at state I7 after
reading it).

Indeed: E' ⇒ E ⇒ E+T ⇒ E+T*F is a rightmost derivation, T*F is the
handle of E+T*F, thus E+T*F is a viable prefix, thus E+T* is also.

Examine state I7 … it contains
  T → T*.F
  F → .(E)
  F → .id

i.e., precisely the items valid for E+T*:
  E' ⇒ E ⇒ E+T ⇒ E+T*F
  E' ⇒ E ⇒ E+T ⇒ E+T*F ⇒ E+T*(E)
  E' ⇒ E ⇒ E+T ⇒ E+T*F ⇒ E+T*id

There are no other valid items for the viable prefix E+T*.

CH4.139

SLR Parsing Table Construction

Output: The SLR Parsing table functions ACTION & GOTO.

1. Construct C = { I0, I1, …, In }, the canonical collection of sets
   of LR(0) items for the augmented grammar.
2. "State i" is constructed from Ii:
   If [A → α.aβ] is in Ii and goto(Ii,a) = Ik then we set
   ACTION[i,a] to "shift k" (a is a terminal).
   If [A → α.] is in Ii then we set ACTION[i,a] to "reduce A → α"
   for all a in Follow(A) --- (note: A is not S').
   If [S' → S.] is in Ii then we set ACTION[i,$] = accept.
3. The goto transitions for state i are constructed as follows:
   for all A, if goto(Ii,A) = Ik then goto[i,A] = k.
4. All entries not defined by rules (2) and (3) are made "error".
5. The initial state of the parser is the one constructed from the
   set of items I0.

CH4.140

Example.

I0:  E' → .E          Since F → .( E ) is in I0 and Goto(I0,() = I4,
     E  → .E + T      we set ACTION(0,() = s4.
     E  → .T
     T  → .T * F      Goto(I0,E) = I1   Goto(I0,T) = I2   Goto(I0,() = I4
     T  → .F
     F  → .( E )      Since E' → E. is in I1,
     F  → .id         we set ACTION(1,$) = acc.

I1:  E' → E.          Since E → T. is in I2 and Follow(E) = { $, +, ) },
     E  → E. + T      we set ACTION(2,$) = r(E → T)
                             ACTION(2,+) = r(E → T)
I2:  E  → T.                 ACTION(2,)) = r(E → T)
     T  → T. * F
                      I4:  F → (.E)
                           E → .E + T     Follow(T) = Follow(F)
                           E → .T                   = { ) , + , * , $ }
                           T → .T * F
                           T → .F
                           F → .( E )
                           F → .id

CH4.141

Construct the whole table.

(For this grammar the SLR table has no multiply defined labels.)

CH4.142
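The whole construction (canonical collection, FOLLOW sets, ACTION entries) can be run end to end to check the claim that no cell is multiply defined. A sketch; state numbering is arbitrary, so the test only checks the state count and the absence of conflicts:

```python
# Full SLR(1) ACTION construction for the expression grammar.
GRAMMAR = [("E'", ("E",)),
           ("E", ("E", "+", "T")), ("E", ("T",)),
           ("T", ("T", "*", "F")), ("T", ("F",)),
           ("F", ("(", "E", ")")), ("F", ("id",))]
NTS = {l for l, _ in GRAMMAR}
TERMS = {"+", "*", "(", ")", "id"}

def closure(I):
    J = set(I)
    while True:
        new = {(l, r, 0) for (_, rr, d) in J
               if d < len(rr) and rr[d] in NTS
               for l, r in GRAMMAR if l == rr[d]}
        if new <= J:
            return frozenset(J)
        J |= new

def goto(I, X):
    return closure({(A, r, d + 1) for (A, r, d) in I
                    if d < len(r) and r[d] == X})

def follow_sets():
    # no epsilon-productions here, so FIRST is a fixpoint over first symbols
    first = {t: {t} for t in TERMS}
    first.update({n: set() for n in NTS})
    changed = True
    while changed:
        changed = False
        for l, r in GRAMMAR:
            if first[r[0]] - first[l]:
                first[l] |= first[r[0]]; changed = True
    flw = {n: set() for n in NTS}
    flw["E'"] = {"$"}
    changed = True
    while changed:
        changed = False
        for l, r in GRAMMAR:
            for i, s in enumerate(r):
                if s in NTS:
                    add = first[r[i + 1]] if i + 1 < len(r) else flw[l]
                    if add - flw[s]:
                        flw[s] |= add; changed = True
    return flw

states = {closure({("E'", ("E",), 0)})}
while True:
    new = {goto(I, X) for I in states for X in NTS | TERMS if goto(I, X)}
    if new <= states:
        break
    states |= new
num = {I: i for i, I in enumerate(sorted(states, key=sorted))}
FLW = follow_sets()
action = {}                                  # (state, terminal) -> set of actions
for I in states:
    for (A, r, d) in I:
        if d < len(r) and r[d] in TERMS:                       # shift
            action.setdefault((num[I], r[d]), set()).add(("s", num[goto(I, r[d])]))
        elif d == len(r) and A != "E'":                        # reduce
            for a in FLW[A]:
                action.setdefault((num[I], a), set()).add(("r", A, r))
        elif d == len(r):                                      # accept
            action.setdefault((num[I], "$"), set()).add(("acc",))
conflicts = [cell for cell, acts in action.items() if len(acts) > 1]
```

With this grammar the 12-state table comes out conflict-free, as the slide asserts.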

LR(0) parsing : Another Example

LR(0) items:

an item is a rule from the grammar combined with “.”

to indicate where the parser currently is in the input

eg: S’ ::= . S $ indicates that the parser is just beginning to

parse this rule and it expects to be able to parse S then $ next

A whole automaton state looks like this:

  state number →  1
                  S' ::= . S $       ← collection of
                  S  ::= . ( L )       LR(0) items
                  S  ::= . x

• LR(1) states look very similar, it is just that the items contain
  some look-ahead info

CH4.143

LR(0) parsing

Take the initial state and construct its closure:
the closure adds more items to a set when the "." appears to the left
of a non-terminal:
  if the state includes X ::= s . Y s' and Y ::= t is a rule
  then the state also includes Y ::= . t

Grammar:
0. S' ::= S $                  1
1. S  ::= ( L )                    S' ::= . S $
2. S  ::= x                        S  ::= . ( L )     Full
3. L  ::= S                        S  ::= . x         Closure
4. L  ::= L , S

CH4.146

LR(0) parsing

To construct an LR(0) automaton:

start with start rule & compute initial state with

closure

pick one of the items from the state and move “.”

to the right one symbol (as if you have just

parsed the symbol)

this creates a new item ...

... and a new state when you compute the closure of

the new item

mark the edge between the two states with:

– a terminal T, if you moved “.” over T

– a non-terminal X, if you moved “.” over X

continue until there are no further ways to move “.” across items

and generate new states or new edges in the automaton

CH4.147
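These steps can be run mechanically on the grammar above. One assumption in the sketch: the dot is never moved over the end marker $, matching the slides, which treat S' ::= S . $ as the accepting configuration.

```python
# LR(0) automaton for: S'::=S$, S::=(L), S::=x, L::=S, L::=L,S
GRAMMAR = [("S'", ("S", "$")),
           ("S", ("(", "L", ")")), ("S", ("x",)),
           ("L", ("S",)), ("L", ("L", ",", "S"))]
NTS = {"S'", "S", "L"}
SYMBOLS = {"S", "L", "(", ")", "x", ","}       # no transition on $

def closure(I):
    J = set(I)
    while True:
        new = {(l, r, 0) for (_, rr, d) in J
               if d < len(rr) and rr[d] in NTS
               for l, r in GRAMMAR if l == rr[d]}
        if new <= J:
            return frozenset(J)
        J |= new

def goto(I, X):
    return closure({(A, r, d + 1) for (A, r, d) in I
                    if d < len(r) and r[d] == X})

states = {closure({("S'", ("S", "$"), 0)})}
while True:
    new = {goto(I, X) for I in states for X in SYMBOLS if goto(I, X)}
    if new <= states:
        break
    states |= new
```

The loop terminates with the 9 states that the following slides number 1 through 9.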

The automaton is built one step per slide in the lecture; condensed
here:

Grammar:
0. S' ::= S $       1. S ::= ( L )      2. S ::= x
3. L  ::= S         4. L ::= L , S

Start state:  { S' ::= . S $ ,  S ::= . ( L ) ,  S ::= . x }

Moving "." over S gives           { S' ::= S . $ }
Moving "." over x gives           { S ::= x . }
Moving "." over ( gives closure(S ::= ( . L )) =
    { S ::= ( . L ) , L ::= . S , L ::= . L , S ,
      S ::= . ( L ) , S ::= . x }

From the "(" state:
  over L:  { S ::= ( L . ) , L ::= L . , S }
  over S:  { L ::= S . }
  over ( and x: back to the states already built.

From { S ::= ( L . ) , L ::= L . , S }:
  over ):  { S ::= ( L ) . }
  over ,:  closure(L ::= L , . S) =
           { L ::= L , . S , S ::= . ( L ) , S ::= . x }

From that state:
  over S:  { L ::= L , S . }
  over ( and x: back to the states already built.

No further moves of "." generate new states or new edges.

CH4.161

Assigning numbers to states:

1: S' ::= . S $        2: S ::= x .          3: S ::= ( . L )
   S  ::= . ( L )                               L ::= . S
   S  ::= . x                                   L ::= . L , S
                                                S ::= . ( L )
4: S' ::= S . $        5: S ::= ( L . )         S ::= . x
                          L ::= L . , S
6: S ::= ( L ) .       7: L ::= S .          8: L ::= L , . S
                                                S ::= . ( L )
9: L ::= L , S .                                S ::= . x

Edges: 1 —(→ 3, 1 —x→ 2, 1 —S→ 4;  3 —(→ 3, 3 —x→ 2, 3 —L→ 5, 3 —S→ 7;
       5 —)→ 6, 5 —,→ 8;  8 —(→ 3, 8 —x→ 2, 8 —S→ 9.

CH4.162

computing parse table

State i contains rule k: X ::= s . ==> table[i,T] = rk for all

terminals T

Transition from i to j marked with T ==> table[i,T] = sj

Transition from i to j marked with X ==> table[i,X] = gj

Legend for table entries:
  sn = shift & goto state n        gn = goto state n
  rk = reduce by rule k            a  = accept
  blank = error

CH4.163

Filling in the table from the automaton (shown cell by cell in the
lecture; condensed here):

  edge 1 —(→ 3                        ⇒ table[1,(] = s3
  edge 1 —x→ 2                        ⇒ table[1,x] = s2
  edge 1 —S→ 4                        ⇒ table[1,S] = g4
  state 2 contains rule 2 (S ::= x .) ⇒ table[2,T] = r2 for all terminals T
  edges 3 —(→ 3 and 3 —x→ 2          ⇒ table[3,(] = s3, table[3,x] = s2
  edges 3 —S→ 7 and 3 —L→ 5          ⇒ table[3,S] = g7, table[3,L] = g5
  state 4 contains S' ::= S . $       ⇒ table[4,$] = a

  states    (     )     x     ,     $     S     L
    1      s3          s2                g4
    2      r2    r2    r2    r2    r2
    3      s3          s2                g7    g5
    4                              a
   ...

CH4.171

The complete table, and a trace of the parse of ( x , x ) $:

Grammar:                 states    (     )     x     ,     $     S     L
0. S' ::= S $              1      s3          s2                g4
1. S  ::= ( L )            2      r2    r2    r2    r2    r2
2. S  ::= x                3      s3          s2                g7    g5
3. L  ::= S                4                              a
4. L  ::= L , S            5            s6          s8
                           6      r1    r1    r1    r1    r1
                           7      r3    r3    r3    r3    r3
                           8      s3          s2                g9
                           9      r4    r4    r4    r4    r4

(The stack grows to the right; "yet to read" is the remaining input.)

  stack: 1              input: ( x , x ) $    shift 3
  stack: 1(3            input:   x , x ) $    shift 2
  stack: 1(3x2          input:     , x ) $    reduce 2 (S ::= x)
  stack: 1(3S           input:     , x ) $    goto 7
  stack: 1(3S7          input:     , x ) $    reduce 3 (L ::= S)
  stack: 1(3L           input:     , x ) $    goto 5
  stack: 1(3L5          input:     , x ) $    shift 8
  stack: 1(3L5,8        input:       x ) $    shift 2
  stack: 1(3L5,8x2      input:         ) $    reduce 2 (S ::= x)
  stack: 1(3L5,8S       input:         ) $    goto 9
  stack: 1(3L5,8S9      input:         ) $    reduce 4 (L ::= L , S)
  stack: 1(3L           input:         ) $    goto 5, then shift 6, …

CH4.184

LR(0)

Even though we are doing LR(0) parsing we are using some look ahead
(there is a column for each terminal); however, we only use the
terminal to figure out which state to go to next, not to decide
whether to shift or reduce.

states ( ) x , $ S L

1 s3 s2 g4

2 r2 r2 r2 r2 r2

3 s3 s2 g7 g5

CH4.185

LR(0)

Even though we are doing LR(0) parsing we are using some look ahead
(there is a column for each terminal); however, we only use the
terminal to figure out which state to go to next, not to decide
whether to shift or reduce.

states ( ) x , $ S L

1 s3 s2 g4

2 r2 r2 r2 r2 r2

3 s3 s2 g7 g5

states no look-ahead S L

1 shift g4

2 reduce 2

3 shift g7 g5

CH4.186

LR(0)

Even though we are doing LR(0) parsing we are using some look ahead
(there is a column for each terminal); however, we only use the
terminal to figure out which state to go to next, not to decide
whether to shift or reduce.

If the same row contains both shift and reduce, we will have

a conflict ==> the grammar is not LR(0)

Likewise if the same row contains reduce by two different

rules

states no look-ahead S L

1 shift, reduce 5 g4

2 reduce 2, reduce 7

3 shift g7 g5

CH4.187

SLR

SLR cuts down the number of conflicts in LR(0) tables by using a tiny
bit of look ahead.

To determine when to reduce, 1 symbol of look ahead is used.

Only put reduce by rule (X ::= RHS) in column T if T is in

Follow(X)

states ( ) x , $ S L

1 s3 s2 g4

2 r2 s5 r2

3 r1 r1 r5 r5 g7 g5

cuts down the number of rk slots & therefore cuts down conflicts

CH4.188

LR(1) & LALR

LR(1) & LALR change the items that make up the states.

LR(0) items:
  X ::= s1 . s2
LR(1) items (look-ahead symbol added):
  X ::= s1 . s2 , T

Idea: sequence s1 is on the stack; the input stream is s2 T.

Find closure with respect to X ::= s1 . Y s2 , T by adding all items
Y ::= . s3 , U when Y ::= s3 is a rule and U is in First(s2 T)

Two states are different if they contain the same rules but the

rules have different look-ahead symbols

Leads to many states

LALR(1) = LR(1) where states that are identical aside

from look-ahead symbols have been merged

ML-Yacc & most parser generators use LALR CH4.189

Conflicts

Sometimes even unambiguous grammars produce multiply defined labels
(shift/reduce or reduce/reduce conflicts) in the SLR table.

Example:
  S' → S
  S  → L = R | R
  L  → * R | id
  R  → L

I0 = { S' → .S , S → .L = R , S → .R , L → .* R , L → .id , R → .L }
I1 = { S' → S. }
I2 = { S → L. = R , R → L. }        action[2,=] ?
I3 = { S → R. }                     s6        (because of S → L. = R)
I4 = { L → *.R , R → .L ,           r(R → L)  (because of R → L.
       L → .* R , L → .id }                    and = follows R)
I5 = { L → id. }
I6 = { S → L = .R , R → .L , L → .* R , L → .id }
… also I7, I8, I9 …

CH4.190

But Why?

Let's consider a string that will exhibit the conflict:  id=id

  $0          id=id$     s5
  $0 id 5     =id$       r(L → id)
  $0 L 2      =id$       conflict… (yet the grammar is not ambiguous!)

R=id is not a sentential form!!! Even though = might follow R, it does
not in this case; it does only when R is preceded by *.

SLR finds a conflict because using Follow + LR(0) items as the guide
to find when to reduce is not the best method.

CH4.191

Picture So Far

SLR construction:
  based on the canonical collection of LR(0) items — gives rise to the
  canonical LR(0) parsing table.
  No multiply defined labels ⇒ the grammar is called "SLR(1)".

Next: the notion of an LR(1) item and the canonical LR(1) parsing
table.

CH4.192

LR(1) Items

DEF. An LR(1) item is a production with a marker together with a
terminal, e.g. [S → aA.Be, c].

Intuition: it indicates how much of a certain production we have seen
already (aA) + what we could expect next (Be) + a lookahead that
agrees with what should follow in the input if we ever do Reduce by
the production S → aABe.

By incorporating such lookahead information into the item concept we
will make wiser reduce decisions.

Direct use of the lookahead in an LR(1) item is only performed in
considering reduce actions (i.e. when the marker is rightmost).

Core of an LR(1) item [S → aA.Be, c] is the LR(0) item S → aA.Be.
Different LR(1) items may share the same core.

CH4.193

Usefulness of LR(1) items

E.g. if we have two LR(1) items of the form
  [ A → α. , a ]   and   [ B → α. , b ]
we will take advantage of the lookahead to decide which reduction to
use (the same setting would perhaps produce a reduce/reduce conflict
in the SLR approach).

An item [ A → β1.β2 , a ] is valid for a viable prefix αβ1 if we have
a rightmost derivation that yields αAaw which in one step yields
αβ1β2aw.

CH4.194

Constructing the Canonical Collection of

LR(1) items

Initial item: [ S' → .S , $ ]

Closure (more refined):
  if [A → α.Bβ , a] belongs to the set of items, and B → γ is a
  production of the grammar, then we add the item [B → .γ , b]
  for all b ∈ FIRST(βa).

Goto (the same):
  A state containing [A → α.Xβ , a] will move to a state containing
  [A → αX.β , a] with label X.

Every state is closed according to Closure.
Every state has transitions according to Goto.

CH4.195

Constructing the LR(1) Parsing Table

Shift actions: (same)
  If [A → α.bβ , a] is in state Ik and Ik moves to state Im with
  label b then we add the action  action[k,b] = "shift m".

Reduce actions: (more refined)
  If [A → α. , a] is in state Ik then we add the action
  "reduce A → α" into action[k,a].

Observe that we don't use information from FOLLOW(A) anymore.

The goto part of the table is as before.

CH4.196

Example I

Construct the canonical collection of sets of LR(1) items for:

  S' → S
  S  → CC
  C  → cC | d

FIRST:
  S: c d
  C: c d

CH4.197
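The construction for Example I can be sketched as follows. Items carry a fourth component, the lookahead; since the grammar has no ε-productions, FIRST(βa) is simply FIRST of the first symbol of β, or {a} when β is empty (an assumption that holds only for ε-free grammars).

```python
# Canonical LR(1) collection for S'->S, S->CC, C->cC|d.
GRAMMAR = [("S'", ("S",)), ("S", ("C", "C")),
           ("C", ("c", "C")), ("C", ("d",))]
NTS = {"S'", "S", "C"}
FIRST = {"c": {"c"}, "d": {"d"}, "S": {"c", "d"}, "C": {"c", "d"}}

def closure(I):
    J = set(I)
    while True:
        new = set()
        for (A, r, d, a) in J:
            if d < len(r) and r[d] in NTS:
                # lookaheads for the added items: FIRST(beta a)
                look = FIRST[r[d + 1]] if d + 1 < len(r) else {a}
                for l, rhs in GRAMMAR:
                    if l == r[d]:
                        new |= {(l, rhs, 0, b) for b in look}
        if new <= J:
            return frozenset(J)
        J |= new

def goto(I, X):
    return closure({(A, r, d + 1, a) for (A, r, d, a) in I
                    if d < len(r) and r[d] == X})

I0 = closure({("S'", ("S",), 0, "$")})
states = {I0}
while True:
    new = {goto(I, X) for I in states
           for X in ("S", "C", "c", "d") if goto(I, X)}
    if new <= states:
        break
    states |= new
```

I0 holds 6 LR(1) items ([S'→.S,$], [S→.CC,$], and [C→.cC / .d] with lookaheads c and d), and the full collection has 10 states.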

Example II

S' → S
S  → L = R | R
L  → * R | id
R  → L

FIRST:
  S: * id
  L: * id
  R: * id

CH4.198

LR(1) more general to SLR(1):

S' → S               I0 = { [S' → .S , $]
S  → L = R | R              [S → .L = R , $]      action[2,=] ?
L  → * R | id               [S → .R , $]          s6 (because of
R  → L                      [L → .* R , = / $]        S → L. = R)
                            [L → .id , = / $]
                            [R → .L , $] }        THERE IS NO
                                                  CONFLICT ANYMORE
I1  = { [S' → S. , $] }
I2  = { [S → L. = R , $] , [R → L. , $] }
I3  = { [S → R. , $] }
I4  = { [L → *.R , = / $] , [R → .L , = / $] ,
        [L → .* R , = / $] , [L → .id , = / $] }
I5  = { [L → id. , = / $] }
I6  = { [S → L = .R , $] , [R → .L , $] ,
        [L → .* R , $] , [L → .id , $] }
I7  = { [L → *R. , = / $] }
I8  = { [R → L. , = / $] }
I9  = { [L → *.R , $] , [R → .L , $] ,
        [L → .* R , $] , [L → .id , $] }
I10 = { [L → *R. , $] }
I11 = { [L → id. , $] }
I12 = { [R → L. , $] }

CH4.199

LALR Parsing

Canonical sets of LR(1) items

Number of states much larger than in the SLR construction

LR(1) = Order of thousands for a standard prog. Lang.

SLR(1) = order of hundreds for a standard prog. Lang.

LALR(1) (lookahead-LR)

A tradeoff:

Collapse states of the LR(1) table that have the same

core (the “LR(0)” part of each state)

LALR never introduces a Shift/Reduce Conflict if

LR(1) doesn’t.

It might introduce a Reduce/Reduce Conflict (that did

not exist in the LR(1))…

Still much better than SLR(1) (larger set of languages)

… but smaller than LR(1), actually ~ SLR(1)

What Yacc and most compilers employ.

CH4.200

Collapsing states with the same core.

E.g., If I3 I6 collapse then whenever the LALR(1)

parser puts I36 into the stack, the LR(1) parser

would have either I3 or I6

A shift/reduce action would not be introduced by

the LALR “collapse”

Indeed if the LALR(1) has a Shift/Reduce

conflict this conflict should also exist in the

LR(1) version: this is because two states with

the same core would have the same outgoing

arrows.

On the other hand a reduce/reduce conflict may be

introduced.

Still LALR(1) preferred: table proportional to

SLR(1)

Direct construction is also possible.

CH4.201
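The collapse itself is one grouping step once the LR(1) collection is built: two states merge exactly when their cores (the LR(0) parts of their items) coincide. A sketch on the Example I grammar, with the LR(1) machinery repeated so the block is self-contained:

```python
# LALR(1) = LR(1) states with identical cores merged.
GRAMMAR = [("S'", ("S",)), ("S", ("C", "C")),
           ("C", ("c", "C")), ("C", ("d",))]
NTS = {"S'", "S", "C"}
FIRST = {"c": {"c"}, "d": {"d"}, "S": {"c", "d"}, "C": {"c", "d"}}

def closure(I):
    J = set(I)
    while True:
        new = set()
        for (A, r, d, a) in J:
            if d < len(r) and r[d] in NTS:
                look = FIRST[r[d + 1]] if d + 1 < len(r) else {a}
                for l, rhs in GRAMMAR:
                    if l == r[d]:
                        new |= {(l, rhs, 0, b) for b in look}
        if new <= J:
            return frozenset(J)
        J |= new

def goto(I, X):
    return closure({(A, r, d + 1, a) for (A, r, d, a) in I
                    if d < len(r) and r[d] == X})

states = {closure({("S'", ("S",), 0, "$")})}
while True:
    new = {goto(I, X) for I in states
           for X in ("S", "C", "c", "d") if goto(I, X)}
    if new <= states:
        break
    states |= new

# collapse: group LR(1) states by their LR(0) core
cores = {frozenset((A, r, d) for (A, r, d, _) in I) for I in states}
```

For this grammar the 10 LR(1) states collapse into 7 LALR(1) states (three pairs share a core), illustrating the size reduction described above.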

Error Recovery in LR Parsing

An error is detected as soon as the parser consults the table for
state i and input symbol s and finds that action[i,s] = empty.

CH4.202

Panic Recovery Strategy I

Scan down the stack till a state Ij is found such that:
  Ij moves with the non-terminal A to some state Ik
  Ik moves with s' to some state Ik'

Proceed as follows:

Pop all states till Ij

Push A and state Ik

Discard all symbols from the input till s’

There may be many choices as above.

[essentially the parser in this way determines that a

string that is produced by A has an error; it assumes

it is correct and advances]

Error message: construct of type “A” has error at

location X

CH4.203

Panic Recovery Strategy II

Scan down the stack till a state Ij is found such that:
  Ij moves with the terminal t to some state Ik
  Ik with s' has a valid action.

Proceed as follows:

Pop all states till Ij

Push t and state Ik

Discard all symbols from the input till s’

There may be many choices as above.

Error message: “missing t”

CH4.204

Example

E' → E
E  → E + E | E * E | ( E ) | id

      action                                    goto
      id    +     *     (     )     $            E
0     s3    e1    e1    s2    e2    e1           1
1     e3    s4    s5    e3    e2    acc
2     s3    e1    e1    s2    e2    e1           6
3     r4    r4    r4    r4    r4    r4
4     s3    e1    e1    s2    e2    e1           7
5     s3    e1    e1    s2    e2    e1           8
6     e3    s4    s5    e3    s9    e4
7     r1    r1    s5    r1    r1    r1
8     r2    r2    r2    r2    r2    r2
9     r3    r3    r3    r3    r3    r3
CH4.205

Collection of LR(0) items

E’ → E
E → E + E | E * E | ( E ) | id

I0: E’ → .E,  E → .E + E,  E → .E * E,  E → .( E ),  E → .id
I1: E’ → E.,  E → E. + E,  E → E. * E
I2: E → ( .E ),  E → .E + E,  E → .E * E,  E → .( E ),  E → .id
I3: E → id.
I4: E → E + .E,  E → .E + E,  E → .E * E,  E → .( E ),  E → .id
I5: E → E * .E,  E → .E + E,  E → .E * E,  E → .( E ),  E → .id
I6: E → ( E. ),  E → E. + E,  E → E. * E
I7: E → E + E.,  E → E. + E,  E → E. * E
I8: E → E * E.,  E → E. + E,  E → E. * E
I9: E → ( E ).

Follow(E’) = { $ }
Follow(E) = { +, *, ), $ }

CH4.206

The parsing table

     id   +      *      (    )    $    |  E
 0   s3                 s2             |  1
 1        s4     s5               acc  |
 2   s3                 s2             |  6
 3        r4     r4          r4   r4   |
 4   s3                 s2             |  7
 5   s3                 s2             |  8
 6        s4     s5          s9        |
 7        s4/r1  s5/r1       r1   r1   |
 8        s4/r2  s5/r2       r2   r2   |
 9        r3     r3          r3   r3   |

CH4.207

Error-handling

     id   +      *      (    )    $    |  E
 0   s3   e1            s2             |  1
 1        s4     s5               acc  |
 2   s3                 s2             |  6
 3        r4     r4          r4   r4   |
 4   s3                 s2             |  7
 5   s3                 s2             |  8
 6        s4     s5          s9        |
 7        s4/r1  s5/r1       r1   r1   |
 8        s4/r2  s5/r2       r2   r2   |
 9        r3     r3          r3   r3   |

CH4.208

Error-handling

Recall the item sets:

I0: E’ → .E,  E → .E + E,  E → .E * E,  E → .( E ),  E → .id
I2: E → ( .E ),  E → .E + E,  E → .E * E,  E → .( E ),  E → .id
I5: E → E * .E,  E → .E + E,  E → .E * E,  E → .( E ),  E → .id
I8: E → E * E.,  E → E. + E,  E → E. * E

e1: push id onto the stack and change to state 3; report “missing operand”
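The e1 routine can be sketched as follows (a hypothetical helper, assuming the parser keeps its states and its shifted symbols in two lists):

```python
# A sketch of the e1 error routine from the table: called when an operand
# (id or '(') was expected; it pretends an id was just shifted, pushes
# state 3 (the state reached on id), and reports "missing operand".

def e1(stack, symbols):
    symbols.append('id')      # act as if an id had been shifted
    stack.append(3)           # shifting id leads to state 3
    print('error: missing operand')

stack, symbols = [0], []
e1(stack, symbols)
print(stack, symbols)         # [0, 3] ['id']
```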

CH4.209

Error-handling

     id   +      *      (    )    $    |  E
 0   s3   e1     e1     s2        e1   |  1
 1        s4     s5               acc  |
 2   s3                 s2             |  6
 3        r4     r4          r4   r4   |
 4   s3                 s2             |  7
 5   s3                 s2             |  8
 6        s4     s5          s9        |
 7        s4/r1  s5/r1       r1   r1   |
 8        s4/r2  s5/r2       r2   r2   |
 9        r3     r3          r3   r3   |

CH4.210

Error-handling

     id   +      *      (    )    $    |  E
 0   s3   e1     e1     s2   e2   e1   |  1
 1        s4     s5          e2   acc  |
 2   s3                 s2             |  6
 3        r4     r4          r4   r4   |
 4   s3   e1            s2             |  7
 5   s3                 s2             |  8
 6        s4     s5          s9        |
 7        s4/r1  s5/r1       r1   r1   |
 8        s4/r2  s5/r2       r2   r2   |
 9        r3     r3          r3   r3   |

CH4.211

Error-handling

CH4.212

Error-handling state 1

     id   +      *      (    )    $    |  E
 0   s3   e1     e1     s2   e2   e1   |  1
 1   e3   s4     s5               acc  |
 2   s3                 s2             |  6
 3        r4     r4          r4   r4   |
 4   s3                 s2             |  7
 5   s3                 s2             |  8
 6        s4     s5          s9        |
 7        s4/r1  s5/r1       r1   r1   |
 8        s4/r2  s5/r2       r2   r2   |
 9        r3     r3          r3   r3   |

CH4.213

Error-Handling

Recall the item sets:

I1: E’ → E.,  E → E. + E,  E → E. * E
I3: E → id.
I4: E → E + .E,  E → .E + E,  E → .E * E,  E → .( E ),  E → .id
I6: E → ( E. ),  E → E. + E,  E → E. * E
I7: E → E + E.,  E → E. + E,  E → E. * E
I9: E → ( E ).

e3: report “missing operator”

CH4.214

Intro to Translation

Side-effects and Translation Schemes.

Side-effects (semantic actions) are attached at positions within the productions, and apply to the symbols to their left:

E’ → E
E → E + E { print(+) }
  | E * E { print(*) }
  | { parenthesis++ } ( E ) { parenthesis-- }
  | id { print(id); print(parenthesis); }

A side-effect in front of a symbol is executed when we

make the move on that symbol to another state.

Side-effects at the rightmost end of a production are executed

during reduce actions.

Do for example id*(id+id)$
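The exercise can be checked with a small simulation (not the parser itself; each function plays the semantic actions of one production, executed in the order the bottom-up parse would fire them):

```python
# A sketch executing the translation scheme on the parse tree of
# id*(id+id): {parenthesis++} fires when '(' is shifted, the printing
# actions fire at reduce time, left operands before right ones.

out = []
paren = 0

def E_id():
    out.append("id"); out.append(str(paren))   # E -> id {print(id); print(parenthesis)}

def E_paren(inner):
    global paren
    paren += 1          # {parenthesis++} executed on shifting '('
    inner()
    paren -= 1          # {parenthesis--} executed on reducing E -> (E)

def E_add(l, r):
    l(); r(); out.append("+")                  # E -> E + E {print(+)}

def E_mul(l, r):
    l(); r(); out.append("*")                  # E -> E * E {print(*)}

# id * ( id + id )
E_mul(E_id, lambda: E_paren(lambda: E_add(E_id, E_id)))
print(" ".join(out))    # id 0 id 1 id 1 + *
```

The first id prints nesting depth 0; the two ids inside the parentheses print depth 1; the operators come out in postfix order.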

CH4.215

Ambiguous Grammars

Ambiguous grammars produce “conflicts”:

shift/reduce, or reduce/reduce.

Three typical examples:

The dangling else

The “expression” grammar

The eqn pre-processor of troff

CH4.216

Dangling Else Ambiguity

Recall the grammar:

stmt → if expr then stmt
     | if expr then stmt else stmt
     | other (any other statement)

abbreviated as: S → iSeS | iS | a

CH4.217

Grammar Relationships

[figure: diagram of the relationships among grammar classes, showing LL(1)]

CH4.218

Canonical sets of LR(0) items

Follow(S) = { $, e }

I0: S’ → .S,  S → .iSeS,  S → .iS,  S → .a
I1: S’ → S.
I2: S → i.SeS,  S → i.S,  S → .iSeS,  S → .iS,  S → .a
I3: S → a.
I4: S → iS.eS,  S → iS.
I5: S → iSe.S,  S → .iSeS,  S → .iS,  S → .a
I6: S → iSeS.

CH4.219

Parsing Table

1. S → iSeS
2. S → iS
3. S → a

     i      e      a    $    |  S
 0   s2            s3        |  1
 1                      acc  |
 2   s2            s3        |  4
 3          r3          r3   |
 4          s5/r2       r2   |
 5   s2            s3        |  6
 6          r1          r1   |

Resolve the shift/reduce conflict in state 4 by choosing s5 (shift), which matches each else with the nearest unmatched then.

e.g. parse iiaea to understand the conflict resolution.

CH4.220

Tracing

STACK INPUT Remark

$0 iiaea$

CH4.221
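The trace left as an exercise above can be generated mechanically. A sketch using the dangling-else table, with the state-4 conflict resolved as shift (the stack shows states only; grammar symbols are omitted for brevity):

```python
# Runs the dangling-else LR table on "iiaea$" and prints the trace.
# Conflict in state 4 on 'e' is resolved in favour of shift (s5).

action = {
    (0, 'i'): 's2', (0, 'a'): 's3',
    (1, '$'): 'acc',
    (2, 'i'): 's2', (2, 'a'): 's3',
    (3, 'e'): 'r3', (3, '$'): 'r3',
    (4, 'e'): 's5', (4, '$'): 'r2',   # s5/r2 resolved as shift
    (5, 'i'): 's2', (5, 'a'): 's3',
    (6, 'e'): 'r1', (6, '$'): 'r1',
}
goto = {(0, 'S'): 1, (2, 'S'): 4, (5, 'S'): 6}
prods = {1: ('S', 4), 2: ('S', 2), 3: ('S', 1)}   # rule -> (lhs, rhs length)

def trace(tokens):
    stack, pos = [0], 0
    while True:
        act = action[(stack[-1], tokens[pos])]
        print(f"{stack}  {''.join(tokens[pos:])}  {act}")
        if act == 'acc':
            return True
        if act[0] == 's':                 # shift: push target state
            stack.append(int(act[1:])); pos += 1
        else:                             # reduce: pop |rhs| states, goto
            lhs, n = prods[int(act[1:])]
            del stack[-n:]
            stack.append(goto[(stack[-1], lhs)])

print(trace(list("iiaea$")))   # True: iiaea is accepted
```

Because state 4 shifts on e, the inner iS grabs the else, so iiaea parses as i(iaea): the else binds to the nearest then.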

Expressions

Recall the unambiguous grammar:

E’ → E
E → E + T | T
T → T * F | F
F → ( E ) | id

Now take the ambiguous version instead:

E → E + E | E * E | ( E ) | id

and see what will happen.

CH4.222

Canonical Sets

I0: E’ → .E,  E → .E + E,  E → .E * E,  E → .( E ),  E → .id
I1: E’ → E.,  E → E. + E,  E → E. * E
I2: E → ( .E ),  E → .E + E,  E → .E * E,  E → .( E ),  E → .id
I3: E → id.
I4: E → E + .E,  E → .E + E,  E → .E * E,  E → .( E ),  E → .id
I5: E → E * .E,  E → .E + E,  E → .E * E,  E → .( E ),  E → .id
I6: E → ( E. ),  E → E. + E,  E → E. * E
I7: E → E + E.,  E → E. + E,  E → E. * E
I8: E → E * E.,  E → E. + E,  E → E. * E
I9: E → ( E ).

Follow(E’) = { $ }
Follow(E) = { +, *, ), $ }

CH4.223

The Parsing Table

1. E → E + E
2. E → E * E
3. E → ( E )
4. E → id

     id   +      *      (    )    $    |  E
 0   s3                 s2             |  1
 1        s4     s5               acc  |
 2   s3                 s2             |  6
 3        r4     r4          r4   r4   |
 4   s3                 s2             |  7
 5   s3                 s2             |  8
 6        s4     s5          s9        |
 7        s4/r1  s5/r1       r1   r1   |
 8        s4/r2  s5/r2       r2   r2   |
 9        r3     r3          r3   r3   |

CH4.224

Resolving Ambiguity

If the state contains E → E op E . and we find op on the input, we may:

Reduce (making op left-associative), or

Shift (making op right-associative).

If the state contains E → E op E . and we find a different operator op’ on the input, we may:

Reduce (if op’ has lower precedence than op), or

Shift (if op’ has higher precedence than op).
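The rules above are exactly how yacc-style precedence declarations resolve such conflicts; a minimal sketch (the table values are illustrative):

```python
# Resolving a shift/reduce conflict in a state containing  E -> E op E .
# with op' as the lookahead, using precedence and associativity.

PREC  = {'+': 1, '*': 2}          # higher number = higher precedence
ASSOC = {'+': 'left', '*': 'left'}

def resolve(op, op2):
    """op: operator of the completed handle; op2: lookahead operator."""
    if PREC[op2] > PREC[op]:
        return 'shift'            # op' binds tighter: keep building
    if PREC[op2] < PREC[op]:
        return 'reduce'           # op binds tighter: finish the handle
    return 'reduce' if ASSOC[op] == 'left' else 'shift'

print(resolve('+', '*'))  # shift  : * has higher precedence than +
print(resolve('*', '+'))  # reduce : + has lower precedence than *
print(resolve('+', '+'))  # reduce : + is left-associative
```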

CH4.225

Subscript/Superscript (the eqn grammar)

1. E → E sub E sup E
2. E → E sub E
3. E → E sup E
4. E → { E }
5. E → c

Rule 1 will still produce problems, i.e. when seeing a } or $:

the state that contains the item E → E sub E sup E . also contains E → E sup E . , so

we have two possibilities for reduce:

either r3 or r1 (a reduce/reduce conflict).

CH4.226
