
Parsing & parser generators

• LL(0) parser
- language contains only 1 string

• LL(1) parser
- recursive descent parsing procedures
- top down
- maximal restriction of LL(0) parser

• LR(0) parser
- bottom up
- items / item sets
- shift/reduce parsing
- conflicting actions

• LALR(1) parser
- maximal restriction of LR(0) parser
- lookahead sets
- (TP)YACC

• SLR(1) parser
- restriction of LR(0) parser based on Follow sets
- LALR(1) table ⊆ SLR(1) table ⊆ LR(0) table
[Figure: a parser generator turns a context-free grammar into a parser. The parser maps a symbol sequence (the output of the scanner, tokens) to a derivation (a parse tree, or the information on the basis of which it can be constructed).]

1. ambiguous grammar

    L → L , L
    L → a
syntactic dominoes

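• dominoes (left-hand side on top, right-hand side below)

    L → L , L        L → a

• initial configuration: L at the top, the input a , a , a at the bottom

• complete configurations: the two parse trees of a , a , a

    (a , a) , a   (left subtree expanded)
    a , (a , a)   (right subtree expanded)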

• exhaustive searching

• backtracking
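The exhaustive search over complete configurations can be sketched in Python as a count of parse trees. This is only an illustration under assumptions: the function name `count_trees` and the memoised split-point search are not from the slides, and the input is taken to be a string over the characters `a` and `,`.

```python
from functools import lru_cache

def count_trees(tokens):
    """Count the parse trees of `tokens` for the grammar L -> L , L | a."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of ways to derive tokens[i:j] from L.
        n = 1 if tokens[i:j] == ("a",) else 0        # rule L -> a
        for k in range(i + 1, j - 1):                # rule L -> L , L, split at position k
            if tokens[k] == ",":
                n += count(i, k) * count(k + 1, j)
        return n

    return count(0, len(tokens))

print(count_trees("a,a,a"))   # two complete configurations, so the grammar is ambiguous
```

Every comma is tried as the top-level split point, which corresponds to trying every way of placing the domino L → L , L.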

2. left-recursive grammar, ¬LL(1)

    L → L , a
    L → a

• dominoes

    L → L , a        L → a

• complete configuration: the only parse tree of a , a , a

    L ⇒ L , a ⇒ L , a , a ⇒ a , a , a

3. right-recursive grammar, ¬LL(1) (both rules start with a)

    L → a , L
    L → a

• dominoes

    L → a , L        L → a

• complete configuration: the only parse tree of a , a , a

    L ⇒ a , L ⇒ a , a , L ⇒ a , a , a

4. factorisation applied to 3.

    L  → a RL
    RL → ε
    RL → , L        (or: RL → , a RL)

• extra rule: L′ → L □   (□ marks the end of the input)

• dominoes

    L → a RL      RL → ε      RL → , L      L′ → L □

• complete configuration: the parse tree of a , a , a □

    L′ ⇒ L □ ⇒ a RL □ ⇒ a , L □ ⇒ a , a RL □ ⇒ a , a , L □ ⇒ a , a , a RL □ ⇒ a , a , a □
• lookahead sets

    LA(A → α) = First(α) ∪ Follow(A)   if α ⇒* ε
    LA(A → α) = First(α) ∪ ∅           if ¬(α ⇒* ε)

    First(α)  = {a ∈ T | (∃w :: α ⇒* aw)}
    Follow(A) = {a ∈ T | (∃u, v :: S ⇒* uAav)}

    L′ → L □
    L  → a RL
    RL → ε        LA(RL → ε)   = {□}
    RL → , L      LA(RL → , L) = {,}
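The lookahead sets can be computed by iterating the definitions to a least fixed point. The sketch below is an assumption-laden illustration, not the method of the slides: the name `lookahead_sets`, the rule encoding as `(lhs, rhs-tuple)` pairs, and the character `□` for the end marker are chosen here for the example.

```python
GRAMMAR = [
    ("L'", ("L", "□")),      # extra rule, □ is treated as a terminal
    ("L", ("a", "RL")),
    ("RL", ()),              # RL -> ε
    ("RL", (",", "L")),
]

def lookahead_sets(rules):
    """Return {(lhs, rhs): LA(lhs -> rhs)} for a list of (lhs, rhs) rules."""
    nts = {lhs for lhs, _ in rules}
    nullable = set()
    first = {n: set() for n in nts}
    follow = {n: set() for n in nts}

    def first_of(seq):
        # First(seq), and whether seq derives the empty string.
        out = set()
        for x in seq:
            if x not in nts:
                return out | {x}, False
            out |= first[x]
            if x not in nullable:
                return out, False
        return out, True

    changed = True
    while changed:                       # iterate until nothing changes any more
        changed = False
        for lhs, rhs in rules:
            f, eps = first_of(rhs)
            if eps and lhs not in nullable:
                nullable.add(lhs)
                changed = True
            if not f <= first[lhs]:
                first[lhs] |= f
                changed = True
            for i, x in enumerate(rhs):
                if x in nts:
                    f, eps = first_of(rhs[i + 1:])
                    new = f | (follow[lhs] if eps else set())
                    if not new <= follow[x]:
                        follow[x] |= new
                        changed = True

    la = {}
    for lhs, rhs in rules:
        f, eps = first_of(rhs)
        la[(lhs, rhs)] = f | (follow[lhs] if eps else set())
    return la

for rule, la in lookahead_sets(GRAMMAR).items():
    print(rule, la)
```

The two RL-rules get disjoint lookahead sets, which is what makes the factorised grammar suitable for an LL(1) parser.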

• recursive descent parsing procedures (LL(1) parser)

    proc PL = ( TERM(a); PRL )

    proc PRL =
      ( if sym = commasym → TERM(commasym);
                            PL
        [] sym = eofsym   → skip
        fi
      )

    parser

    PL;
    if sym = eofsym → skip fi
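The same procedures can be written as a runnable Python sketch. The names `parse`, `term`, `p_l`, `p_rl` and `ParseError` are chosen here for illustration, the input is assumed to be a string over `a` and `,`, and `□` stands in for eofsym; a guard that does not hold is modelled by raising an exception.

```python
class ParseError(Exception):
    pass

def parse(text):
    """Recursive descent parser for L -> a RL, RL -> ε | , L."""
    tokens = list(text) + ["□"]          # □ plays the role of eofsym
    pos = 0

    def sym():
        return tokens[pos]

    def term(expected):                  # TERM: check the current symbol, then advance
        nonlocal pos
        if sym() != expected:
            raise ParseError(f"expected {expected!r}, found {sym()!r} at position {pos}")
        pos += 1

    def p_l():                           # proc PL
        term("a")
        p_rl()

    def p_rl():                          # proc PRL, the guards are the lookahead sets
        if sym() == ",":
            term(",")
            p_l()
        elif sym() == "□":
            pass                         # RL -> ε
        else:
            raise ParseError(f"unexpected {sym()!r} at position {pos}")

    p_l()
    if sym() != "□":
        raise ParseError("input not fully consumed")
    return True

print(parse("a,a,a"))
```

Each procedure inspects only the current symbol, so no backtracking is needed.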
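LL-parsing

    input xy: x has been read, y remains to be read

    S□ ⇒* xγ      (on the basis of the constructed tree)
    γ ⇒* y ?      (to be established)
    (γ, y) is the configuration

LR-parsing

    input xy: x has been read, y remains to be read

    γ ⇒* x        (on the basis of the constructed trees)
    S□ ⇒* γy ?    (to be established)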

initial plan

    Z0 = {L′ → ·L □}

derived plans (in as far as new)

    closure(Z0) \ Z0 = {L → ·a, L → ·L , a}

partially completed plans

    s1 = {L → a·}                      closure(s1) \ s1 = ∅
    s2 = {L′ → L · □, L → L · , a}     closure(s2) \ s2 = ∅
    s3 = {L → L , ·a}                  closure(s3) \ s3 = ∅
    s4 = {L → L , a·}                  closure(s4) \ s4 = ∅

partial DFA

    Z0 --a--> s1
    Z0 --L--> s2 --,--> s3 --a--> s4

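NFA (the states are the items)

    L′ → ·L □    --L-->  L′ → L · □
    L → ·a       --a-->  L → a·
    L → ·L , a   --L-->  L → L · , a  --,-->  L → L , ·a  --a-->  L → L , a·

    ε-transitions: from L′ → ·L □ and from L → ·L , a to each of L → ·a and L → ·L , a

The subset construction, combined with a reachability algorithm, transforms the NFA above into the (partial) DFA on the previous slide.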

    L → L , a
    L → a
    L′ → L □

    #   s              closure(s) \ s    a   ,   L
    0   L′ → ·L □      L → ·L , a        1   −   2
                       L → ·a
    1   L → a·         −                 −   −   −
    2   L′ → L · □     −                 −   3   −
        L → L · , a
    3   L → L , ·a     −                 4   −   −
    4   L → L , a·     −                 −   −   −

DFA

    0 --a--> 1
    0 --L--> 2 --,--> 3 --a--> 4
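The table of item sets can be reproduced with a short Python sketch of the subset construction with reachability. Assumptions made here: items are encoded as pairs (rule index, dot position), the end marker is written `□` and is never shifted, and terminals are tried before nonterminals so that the state numbers match the table; the name `lr0_automaton` is illustrative.

```python
RULES = [
    ("L'", ("L", "□")),        # extra rule, must be rule 0
    ("L", ("L", ",", "a")),
    ("L", ("a",)),
]

def lr0_automaton(rules, eof="□"):
    """Return (states, transitions) of the LR(0) DFA; items are (rule index, dot)."""
    nts = {lhs for lhs, _ in rules}

    def closure(kernel):
        items, todo = set(kernel), list(kernel)
        while todo:
            r, dot = todo.pop()
            rhs = rules[r][1]
            if dot < len(rhs) and rhs[dot] in nts:
                for k, (lhs, _) in enumerate(rules):
                    if lhs == rhs[dot] and (k, 0) not in items:
                        items.add((k, 0))
                        todo.append((k, 0))
        return frozenset(items)

    states, trans = [closure({(0, 0)})], {}
    for s in states:                     # the list grows while iterating: reachability
        symbols = {rules[r][1][d] for r, d in s if d < len(rules[r][1])} - {eof}
        for x in sorted(symbols, key=lambda y: (y in nts, y)):
            t = closure({(r, d + 1) for r, d in s
                         if d < len(rules[r][1]) and rules[r][1][d] == x})
            if t not in states:
                states.append(t)
            trans[(states.index(s), x)] = states.index(t)
    return states, trans

states, trans = lr0_automaton(RULES)
print(len(states), trans)
```

Only states reachable from the initial plan are ever created, which is why the result is the partial DFA rather than the full power-set automaton.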

    L → L , a
    L → a
    L′ → L □

LR(0)-table (an annotation ^n marks the numbered parsing steps below that use the entry)

    #   a         ,           □           L
    0   1^1                               2^2,5,6
    1   (1, L)    (1, L)^2    (1, L)^8
    2             3^3         @^7
    3   4^4
    4   (3, L)    (3, L)^5    (3, L)^6

    (n, A): reduce n stack symbols to A;  @: accept

successful parsing sequences (s = shift, r = reduce)

          (0, a,a,a□)                  (0, a□)
    1 s   (01, ,a,a□)            1 s   (01, □)
    2 r   (02, ,a,a□)            8 r   (02, □)
    3 s   (023, a,a□)            7 @   acceptance
    4 s   (0234, ,a□)
    5 r   (02, ,a□)
    3 s   (023, a□)
    4 s   (0234, □)
    6 r   (02, □)
    7 @   acceptance
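The table can be driven by a small shift/reduce loop. The following Python sketch hard-codes the entries of the LR(0)-table for L → L , a | a; the dictionary names `ACT`, `REDUCE`, `NST` and the function `lr0_parse` are assumptions of this example, and `□` is used for the end marker.

```python
ACT = {                       # (state, terminal) -> shift target or "@" (accept)
    (0, "a"): 1,
    (2, ","): 3, (2, "□"): "@",
    (3, "a"): 4,
}
REDUCE = {1: (1, "L"), 4: (3, "L")}   # state -> (n, A); LR(0): independent of the input symbol
NST = {(0, "L"): 2}                   # next-state table for nonterminals

def lr0_parse(text):
    """Return the sequence of actions ('s' or 'r') of a successful parse."""
    stack, rest, trace = [0], list(text) + ["□"], []
    while True:
        s, a = stack[-1], rest[0]
        if s in REDUCE:
            n, A = REDUCE[s]
            del stack[-n:]                        # remove n states
            stack.append(NST[(stack[-1], A)])     # then push the next state for A
            trace.append("r")
        elif ACT.get((s, a)) == "@":
            return trace
        elif (s, a) in ACT:
            stack.append(ACT[(s, a)])             # shift
            rest.pop(0)
            trace.append("s")
        else:
            raise ValueError(f"parser blocks in state {s} on {a!r}")

print("".join(lr0_parse("a,a,a")))
```

For the input a,a,a the loop performs the same shift and reduce steps, in the same order, as the left parsing sequence.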

Properties of the LR(0)-parser (§ 4.3)

• regular stack: path from Z0 in the DFA

• label of a state (≠ Z0): the unique label of all incoming arrows of the state (the symbol before ·)

• label function on a regular stack:

    Z0 s1 s2 … sn ⟼ label(s1) label(s2) … label(sn)

  (sequence of root labels of the sequence of trees corresponding to the stack)

• if (Z0, xy) ⊢* (γ, y), then γ is regular and label(γ) ⇒* x

• if (γ, y) ⊢* (Z0 Z1, □) ⊢ acceptance, then S□ ⇒* label(γ) y
  (Z1 is the unique state containing the item S′ → S · □)

• if (Z0, w□) ⊢* acceptance, then S□ ⇒* w□

• L(R) ⊆ L(G)
(L(R) language accepted by LR(0)-parser R)

• if X ⇒* x (X ∈ VN, x ∈ VT*) and s --X--> t (in the DFA), then (γs, xy) ⊢* (γst, y) for all γ ∈ Γ* and y ∈ VT*□
  (γs is always a prefix of the stack during these steps)

• if w ∈ L(G), then S ⇒* w and, since Z0 --S--> Z1, (Z0, w□) ⊢* (Z0 Z1, □) ⊢ acceptance

• L(G) ⊆ L(R)

• L(R) = L(G)
(also holds for nondeterm. LR(0)-parser)
• if γ is a regular stack, then there exists an x ∈ VT* such that (Z0, xy) ⊢* (γ, y) for any y ∈ VT*□
  (a regular stack can be constructed)

• if γ is a regular stack, then there exists a y ∈ VT*□ such that (γ, y) ⊢* acceptance
  (a regular stack can be completed)

• a deterministic LR(0)-parser blocks at the position of the first error in the input string
  (input: xay, x ∈ pref(L(G)), xa ∉ pref(L(G)))
    L → a , L
    L → a
    L′ → L □

    #   s              closure(s) \ s    a   ,   L
    0   L′ → ·L □      L → ·a , L        1   −   2
                       L → ·a
    1   L → a · , L    −                 −   3   −
        L → a·
    2   L′ → L · □     −                 −   −   −
    3   L → a , ·L     L → ·a , L        1   −   4
                       L → ·a
    4   L → a , L·     −                 −   −   −

DFA

    0 --a--> 1 --,--> 3 --L--> 4
    0 --L--> 2
    3 --a--> 1

LR(0)-table (state 1 has conflicting actions on ,)

    s   a         ,             □         L
    0   1                                 2
    1   (1, L)    3 / (1, L)    (1, L)
    2                           @
    3   1                                 4
    4   (3, L)    (3, L)        (3, L)
LR(0)-table (an annotation ^n marks the numbered parsing steps below that use the entry)

    s   a         ,               □             L
    0   1^1                                     2^6,8
    1   (1, L)    3^2 / (1, L)    (1, L)^4,8
    2                             @^7
    3   1^3                                     4^4,5
    4   (3, L)    (3, L)          (3, L)^5,6

successful parsing sequences (s = shift, r = reduce)

          (0, a,a,a□)                  (0, a□)
    1 s   (01, ,a,a□)            1 s   (01, □)
    2 s   (013, a,a□)            8 r   (02, □)
    3 s   (0131, ,a□)            7 @   acceptance
    2 s   (01313, a□)
    3 s   (013131, □)
    4 r   (013134, □)
    5 r   (0134, □)
    6 r   (02, □)
    7 @   acceptance

LALR(1)-table (= SLR(1)-table)

    s   a   ,   □         L
    0   1                 2
    1       3   (1, L)
    2           @
    3   1                 4
    4           (3, L)
    S′ → S □
    S → a A
    A → ε
    A → b A c

    L(G) = {a bⁿ cⁿ | n ≥ 0}

    #   s               closure(s) \ s    a   b   c   S   A
    1   S′ → ·S □       S → ·a A          2           3
    2   S → a · A       A → ·                 4           5
                        A → ·b A c
    3   S′ → S · □
    4   A → b · A c     A → ·                 4           6
                        A → ·b A c
    5   S → a A·
    6   A → b A · c                               7
    7   A → b A c·

    s   a         b             c         □         S   A
    1   2                                           3
    2   (0, A)    4 / (0, A)    (0, A)    (0, A)        5
    3                                     @
    4   (0, A)    4 / (0, A)    (0, A)    (0, A)        6
    5   (2, S)    (2, S)        (2, S)    (2, S)
    6                           7
    7   (3, A)    (3, A)        (3, A)    (3, A)

parsing sequences (s = shift, r = reduce)

        (1, a□)                  (1, abc□)
    s   (12, □)              s   (12, bc□)
    r   (125, □)             r   (125, bc□)
    r   (13, □)              r   (13, bc□)
    @   acceptance               blocked

        (1, abc□)                (1, abbcc□)
    s   (12, bc□)            s   (12, bbcc□)     ∗
    s   (124, c□)            s   (124, bcc□)     ∗
    r   (1246, c□)           s   (1244, cc□)
    s   (12467, □)           r   (12446, cc□)
    r   (125, □)             s   (124467, c□)
    r   (13, □)              r   (1246, c□)
    @   acceptance           s   (12467, □)
                             r   (125, □)
                             r   (13, □)
                             @   acceptance

    ∗ the other choice leads to a blocked configuration
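A nondeterministic LR(0)-parser can be simulated by backtracking over the conflicting actions. The Python sketch below hard-codes the table for S → a A, A → ε | b A c; the names `SHIFT`, `REDUCE`, `NST`, `ACCEPT` and `accepts` are assumptions of this example, and trying the shift before the reduction is an arbitrary choice made here.

```python
SHIFT = {(1, "a"): 2, (2, "b"): 4, (4, "b"): 4, (6, "c"): 7}
REDUCE = {2: (0, "A"), 4: (0, "A"), 5: (2, "S"), 7: (3, "A")}   # state -> (n, A)
NST = {(1, "S"): 3, (2, "A"): 5, (4, "A"): 6}
ACCEPT = (3, "□")

def accepts(text):
    """True if some sequence of choices leads from (1, text□) to acceptance."""
    def run(stack, rest):
        s, a = stack[-1], rest[0]
        if (s, a) == ACCEPT:
            return True
        # First choice: shift. A False result means the branch ended blocked.
        if (s, a) in SHIFT and run(stack + (SHIFT[(s, a)],), rest[1:]):
            return True
        # Other choice: reduce n symbols to A and push the next state.
        if s in REDUCE:
            n, A = REDUCE[s]
            base = stack[:len(stack) - n]
            t = NST.get((base[-1], A))
            return t is not None and run(base + (t,), rest)
        return False

    return run((1,), tuple(text) + ("□",))

print(accepts("abbcc"), accepts("abbc"))
```

The search accepts exactly when at least one sequence of choices avoids a blocked configuration, as in the sequences marked ∗.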

R: LR(0)-parser, R′: LALR(1)-parser

• @ ∈ act′(Z1, □) ≡ @ ∈ act(Z1, □) ∧ L(G) ≠ ∅
  (the conjunct @ ∈ act(Z1, □) is true)

• t ∈ act′(s, a) ≡ t ∈ act(s, a)
  (regular stacks can be constructed and completed)

• t ∈ nst′(s, A) ≡ t ∈ nst(s, A)

• the acceptance action and the shift actions remain in the act-table under restriction, and the nst-table remains unchanged
    (n, A) ∈ act′(s, a)
  ≡
    (n, A) ∈ act(s, a)
    ∧ (∃γ, δ, t, x, y : γ, δ ∈ Γ* ∧ |γs| = |δ| + n ∧ t ∈ Γ ∧ label(t) = A ∧ xay ∈ VT*□
       : (Z0, xay) ⊢* (γs, ay) ⊢ (δt, ay) ⊢* acceptance)
  ≡
    (∃α : α ∈ Vⁿ : A → α· ∈ s ∧ a ∈ LAbu(s, A, α))

    a ∈ LAbu(s, A, α)
  ≡
    A → α· ∈ s
    ∧ (∃γ, δ, t, x, y : γ, δ ∈ Γ* ∧ |γs| = |δ| + |α| ∧ t ∈ Γ ∧ label(t) = A ∧ xay ∈ VT*□
       : (Z0, xay) ⊢* (γs, ay) ⊢ (δt, ay) ⊢* acceptance)

(for the computation see § 4.7 (least fixed point))


    a ∈ LAbu(s, A, α)
  implies
    (Z0, xay) ⊢* (γs, ay) ⊢ (δt, ay) ⊢* acceptance
  implies
    label(γs) ⇒* x
    label(δt) = label(δ)A ⇒ label(δ)α = label(γs)
    S□ ⇒* label(δt)ay
  implies
    S□ ⇒* label(δ)Aay ⇒ label(γs)ay ⇒* xay      (A is directly followed by a)
  implies
    a ∈ follow(A)

    follow(A) = {a ∈ VT ∪ {□} | (∃α, β : αaβ ∈ V*□ : S□ ⇒* αAaβ)}
SLR(1)-parser:

    (|α|, A) ∈ act_SLR(1)(s, a) ≡ A → α· ∈ s ∧ a ∈ follow(A)

    LR(0)-table ⊇ SLR(1)-table ⊇ LALR(1)-table

    LR(0)-parser --restriction--> SLR(1)-parser --restriction--> LALR(1)-parser

    L(R_SLR(1)) = L(G)
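The SLR(1) restriction of the reduce actions can be sketched directly from the definition. In this Python illustration the function name `slr1_reduce_actions` and the input encoding (completed items per state, follow sets per nonterminal) are assumptions; the example data are the completed items of the grammar L → a , L | a with follow(L) = {□}.

```python
def slr1_reduce_actions(final_items, follow):
    """final_items: {state: [(A, alpha), ...]} for the items A -> alpha· in each state.
    Returns {(state, a): [(len(alpha), A), ...]}, keeping a reduction only if a is in follow(A)."""
    act = {}
    for s, items in final_items.items():
        for A, alpha in items:
            for a in follow[A]:
                act.setdefault((s, a), []).append((len(alpha), A))
    return act

FINAL_ITEMS = {
    1: [("L", ("a",))],             # L -> a·
    4: [("L", ("a", ",", "L"))],    # L -> a , L·
}
FOLLOW = {"L": {"□"}}

print(slr1_reduce_actions(FINAL_ITEMS, FOLLOW))
```

With follow(L) = {□} the reduction (1, L) disappears from the column for the comma in state 1, so the shift/reduce conflict of the LR(0)-table is gone.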
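S′ ⇒* αXYβ

    X0 is accessible
    X0 → X1 … Xn ∈ P′
    Xi+1, …, Xj−1 ∈ EMPTY
    X ∈ LAST(Xi)
    Y ∈ FIRST(Xj)

[Figure: derivation tree with root S′ and an inner node X0 with children X1 … Xi Xi+1 … Xj−1 Xj … Xn. X is the last symbol below Xi, Y is the first symbol below Xj, and Xi+1 … Xj−1 all yield ε, so that X and Y are adjacent between α and β.]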

EMPTY = {Y ∈ VN′ | Y ⇒* ε}

σ : V* → P(V)
σ.ε = ∅
σ.(Xα) = {X} ∪ σ.α        (X ∈ V, α ∈ V*)

computation

    empty := ∅;
    repeat
        h := empty;
        for all A, α : A → α ∈ P′
        do if σ.α ⊆ empty → empty := empty ∪ {A}
           [] σ.α ⊈ empty → skip
           fi
        od
    until h = empty

more efficient replacement of the for-statement

    for all A : A ∈ VN′ \ h
    do for all α : A → α ∈ P′
       do if
          ...
          fi
       od
    od
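The repeat-loop translates almost literally into Python. In this sketch the function name `empty_nonterminals` and the encoding of rules as `(A, alpha)` pairs with `alpha` a tuple of symbols are assumptions; `set(alpha)` plays the role of σ.α.

```python
def empty_nonterminals(rules):
    """Return the set of nonterminals that derive the empty string."""
    empty = set()
    while True:
        h = set(empty)                   # h := empty
        for A, alpha in rules:
            if set(alpha) <= empty:      # σ.α ⊆ empty
                empty.add(A)
        if h == empty:                   # until h = empty
            return empty

RULES = [
    ("S'", ("S", "□")),
    ("S", ("a", "A")),
    ("A", ()),
    ("A", ("b", "A", "c")),
]
print(empty_nonterminals(RULES))
```

Each round can only add nonterminals, so the loop stops after at most as many rounds as there are nonterminals, plus one.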
FIRST(X)  = {Y ∈ V | (∃β :: X ⇒* Yβ)}

FIRST1(X) = {Y ∈ V | (∃α, β :: X → αYβ ∈ P′ ∧ α ⇒* ε)}

computation

    for all X : X ∈ V do first(X) := ∅ od;
    for all A : A ∈ VN′
    do for all α : A → α ∈ P′
       do for all γ, Y, δ : α = γYδ
          do if σ.γ ⊆ EMPTY → first(A) := first(A) ∪ {Y}
             [] σ.γ ⊈ EMPTY → skip
             fi
          od
       od
    od; { first = FIRST1 }
    { transitive closure (Warshall) }
    for all Y : Y ∈ V
    do for all X : X ∈ V
       do if Y ∈ first(X) → first(X) := first(X) ∪ first(Y)
          [] Y ∉ first(X) → skip
          fi
       od
    od; { first = FIRST1⁺ }
    { reflexive closure }
    for all X : X ∈ V do first(X) := first(X) ∪ {X} od
    { first = FIRST1* }
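The three phases (FIRST1, transitive closure, reflexive closure) can be sketched in Python as follows. The function name `first_sets`, the rule encoding and the passing of EMPTY as a parameter are assumptions of this example; note that FIRST as defined here relates symbols to symbols, nonterminals included.

```python
def first_sets(rules, empty):
    """rules: list of (A, alpha); empty: the set EMPTY. Returns {X: FIRST(X)}."""
    symbols = {A for A, _ in rules} | {x for _, alpha in rules for x in alpha}
    first = {x: set() for x in symbols}
    for A, alpha in rules:                   # first = FIRST1
        for Y in alpha:
            first[A].add(Y)
            if Y not in empty:               # symbols after Y no longer have a nullable prefix
                break
    for Y in symbols:                        # transitive closure (Warshall)
        for X in symbols:
            if Y in first[X]:
                first[X] |= first[Y]
    for X in symbols:                        # reflexive closure
        first[X].add(X)
    return first

RULES = [
    ("S'", ("S", "□")),
    ("S", ("a", "A")),
    ("A", ()),
    ("A", ("b", "A", "c")),
]
first = first_sets(RULES, {"A"})
print(first["S'"], first["A"])
```

As in the pseudocode, the intermediate symbol Y is the outer loop variable; this is what makes a single pass of Warshall's algorithm sufficient.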
FOLLOW(X) = {Y ∈ V | (∃α, β : α, β ∈ V* : S′ ⇒* αXYβ)}

    for all X : X ∈ V do follow(X) := ∅ od;
    for all A, α : A → α ∈ P′
    do for all β, B, γ, C, δ : α = βBγCδ
       do if σ.γ ⊆ EMPTY →
             for all Z : Z ∈ LAST(B)
             do follow(Z) := follow(Z) ∪ FIRST(C)
             od
          [] σ.γ ⊈ EMPTY → skip
          fi
       od
    od { follow = FOLLOW }

more efficient processing of one rule A → α (scan from right to left; r is the part of α already scanned)

    l, h := α, ∅;
    { inv.: l ++ r = α ∧ h = FIRST(r) }
    do l ≠ ε →
       if l :: l′X →
          for all Z : Z ∈ LAST(X)
          do follow(Z) := follow(Z) ∪ h
          od;
          if X ∈ EMPTY → h := h ∪ FIRST(X)
          [] X ∉ EMPTY → h := FIRST(X)
          fi;
          l := l′
       fi
    od
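The right-to-left scan can be sketched in Python as below. To keep the block self-contained it also computes FIRST and LAST with a helper; the names `closure_sets` and `follow_sets`, the rule encoding and the treatment of LAST as the mirror image of FIRST (computed on reversed right-hand sides) are assumptions of this example.

```python
def closure_sets(rules, empty, reverse=False):
    """FIRST per symbol, or LAST per symbol when reverse=True."""
    symbols = {A for A, _ in rules} | {x for _, alpha in rules for x in alpha}
    rel = {x: {x} for x in symbols}              # reflexive part
    for A, alpha in rules:
        for Y in (reversed(alpha) if reverse else alpha):
            rel[A].add(Y)
            if Y not in empty:
                break
    for Y in symbols:                            # transitive closure (Warshall)
        for X in symbols:
            if Y in rel[X]:
                rel[X] |= rel[Y]
    return rel

def follow_sets(rules, empty):
    first = closure_sets(rules, empty)
    last = closure_sets(rules, empty, reverse=True)
    follow = {x: set() for x in first}
    for _, alpha in rules:
        h = set()                                # h = FIRST of the part already scanned
        for X in reversed(alpha):
            for Z in last[X]:
                follow[Z] |= h
            h = (h | first[X]) if X in empty else set(first[X])
    return follow

RULES = [
    ("S'", ("S", "□")),
    ("S", ("a", "A")),
    ("A", ()),
    ("A", ("b", "A", "c")),
]
follow = follow_sets(RULES, {"A"})
print(follow["A"], follow["S"])
```

Each right-hand side is visited once per symbol, instead of once per pair of positions as in the first version.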
• the item set closure closure(s) of s is the smallest set t satisfying

    – s ⊆ t

    – if A → α · Bβ ∈ t and B → γ ∈ P′ then B → ·γ ∈ t

• EFF(X)  = {Y ∈ V | (∃β :: X ⇒* Yβ using no ε-rules)}
  EFF1(X) = {Y ∈ V | (∃β :: X → Yβ ∈ P′)}
  EFF     = EFF1*

• computation of closure(s)

    h = (∪ B : B ∈ VN′ ∧ (∃A, α, β :: A → α · Bβ ∈ s) : EFF(B))
    closure(s) = s ∪ {A → ·α | A ∈ h ∧ A → α ∈ P′}
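The computation of the closure via EFF can be sketched in Python. Assumptions of this example: items are triples (A, alpha, dot), the names `eff_sets` and `closure` are illustrative, and EFF is restricted to nonterminals because only nonterminals contribute new items.

```python
def eff_sets(rules):
    """EFF restricted to nonterminals, as reflexive-transitive closure of EFF1."""
    nts = {A for A, _ in rules}
    eff = {A: {A} for A in nts}                  # reflexive part
    for A, alpha in rules:
        if alpha and alpha[0] in nts:
            eff[A].add(alpha[0])                 # EFF1: leftmost symbol of a right-hand side
    for B in nts:                                # transitive closure (Warshall)
        for A in nts:
            if B in eff[A]:
                eff[A] |= eff[B]
    return eff

def closure(s, rules):
    """s: set of items (A, alpha, dot). Returns the item set closure of s."""
    eff = eff_sets(rules)
    h = set()
    for _, alpha, dot in s:
        if dot < len(alpha) and alpha[dot] in eff:
            h |= eff[alpha[dot]]
    return set(s) | {(A, alpha, 0) for A, alpha in rules if A in h}

RULES = [
    ("L'", ("L", "□")),
    ("L", ("L", ",", "a")),
    ("L", ("a",)),
]
print(closure({("L'", ("L", "□"), 0)}, RULES))
```

Because EFF is computed once per grammar, the closure of an item set needs no iteration of its own.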

dimensions of SLR(1)-tables

Pascal grammar:

    50 terminals
    50 nonterminals
    100 production rules

leading to ± 300 states, thus a 300×100 table (one column per terminal or nonterminal)

the table has a regular structure:

• a row contains many instances of the same reduce(p)
• a column contains many instances of the same shift(s)

compact encoding is possible

table compression
(stripping algorithm)

                | a  b  b |
    table M  =  | ·  c  · |        (· = error)
                | a  c  d |

bit matrix B: B[i, j] ≡ (M[i, j] = error)

successive refinements of the lookup of M[i, j]:

    if B[i, j] then error
    else M[i, j]

column 1 stripped:

    if B[i, j] then error
    else if j = 1 then a
                 | b  b |
         else    | c  · | [i, j]
                 | c  d |

row 1 stripped:

    if B[i, j] then error
    else if j = 1 then a
    else if i = 1 then b
                 | c  · |
         else    | c  d | [i, j]

columns 2 and 3 stripped:

    if B[i, j] then error
    else if j = 1 then a
    else if i = 1 then b
    else if j = 2 then c
    else if j = 3 then d

for an entry M[i0, j0] the chain of tests contains:

    ...
    if i = i0 then p
    ...
    if j = j0 then q
    ...

    IPos[i0] = sequence number of the test i = i0
    JPos[j0] = sequence number of the test j = j0
    IVal[i0] = p
    JVal[j0] = q

    if B[i, j] then error
    else if IPos[i] < JPos[j]
         then IVal[i]
         else JVal[j]
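The stripping steps and the resulting lookup can be sketched in Python. Assumptions of this example: error entries are represented by `None` (so the table itself stands in for the bit matrix B), indices are 0-based, columns are tried before rows, rows or columns that are never stripped get position infinity, and the table can be stripped completely; the names `strip` and `lookup` are illustrative.

```python
import math

def strip(M):
    """Strip rows/columns whose remaining non-error entries are all equal.
    Returns (IPos, IVal, JPos, JVal)."""
    rows, cols = list(range(len(M))), list(range(len(M[0])))
    IPos, IVal = [math.inf] * len(M), [None] * len(M)
    JPos, JVal = [math.inf] * len(M[0]), [None] * len(M[0])
    step = 0

    def constant(cells):
        vals = {v for v in cells if v is not None}
        return len(vals) <= 1, next(iter(vals), None)

    changed = True
    while changed and rows and cols:
        changed = False
        for j in cols:                           # try to strip a column
            ok, v = constant(M[i][j] for i in rows)
            if ok:
                step += 1
                JPos[j], JVal[j] = step, v
                cols.remove(j)
                changed = True
                break
        else:
            for i in rows:                       # otherwise try to strip a row
                ok, v = constant(M[i][j] for j in cols)
                if ok:
                    step += 1
                    IPos[i], IVal[i] = step, v
                    rows.remove(i)
                    changed = True
                    break
    return IPos, IVal, JPos, JVal

def lookup(M, tables, i, j):
    IPos, IVal, JPos, JVal = tables
    if M[i][j] is None:                          # B[i, j]
        return None
    return IVal[i] if IPos[i] < JPos[j] else JVal[j]

M = [["a", "b", "b"],
     [None, "c", None],
     ["a", "c", "d"]]
tables = strip(M)
print(tables)
```

For this table the strip order is column 1, row 1, column 2, column 3 (in 1-based numbering), the same order as the chain of tests.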

    | a  b |   −→   | a  b |  =  M′
    | c  d |        | c  d |

    if B[i, j] then error
    else if IPos[i] < JPos[j]
         then IVal[i]
         else if IPos[i] > JPos[j]
              then JVal[j]
              else M′[IVal[i], JVal[j]]
