
The Chinese University of Hong Kong, Fall 2009
CSC 3130: Automata theory and formal languages

Normal forms and parsing

Andrej Bogdanov
http://www.cse.cuhk.edu.hk/~andrejb/csc3130
Testing membership and parsing
• Given a grammar

S → 0S1 | 1S0S1 | T
T → S | ε

• How can we know if a string x is in its language?
• If so, can we obtain a parse tree for x?
• Can we tell if the parse tree is unique?
First attempt

S → 0S1 | 1S0S1 | T
T → S | ε

x = 00111

• Maybe we can try all possible derivations:

S
  0S1
    00S11
    01S0S11
    0T1
  1S0S1
    10S10S1
    ...
  T
    S
    ε

When do we stop?
Problems

S → 0S1 | 1S0S1 | T
T → S | ε

x = 00111

• How do we know when to stop?

S
  0S1
    00S11
    01S0S11
    0T1
  1S0S1
    10S10S1
    ...
Problems

S → 0S1 | 1S0S1 | T
T → S | ε

x = 01011

• Idea: Stop derivation when length exceeds |x|
• Not right because of ε-productions

S ⇒ 0S1 ⇒ 01S0S11 ⇒ 01S011 ⇒ 01011
lengths: 1, 3, 7, 6, 5

(each step that erases S abbreviates S ⇒ T ⇒ ε)

• We might want to eliminate ε-productions too

Problems

S → 0S1 | 1S0S1 | T
T → S | ε

x = 00111

• Loops among the variables (S → T → S) might make us go forever
• We want to eliminate such loops
Removal of ε -productions
• A variable N is nullable if there is a derivation N ⇒* ε
• How to remove ε-productions (except from S)
  1. Find all nullable variables N1, ..., Nk
  2. For every production of the form A → αNiβ, add another production A → αβ
  3. If Ni → ε is a production, remove it
  If S is nullable, add the special production S → ε
Example
• Find the nullable variables

grammar:
S → ACD
A → a
B → ε
C → ED | ε
D → BC | b
E → b

nullable variables: B, C, D

Step 1: Find all nullable variables N1, ..., Nk
Finding nullable variables
• To find nullable variables, we work backwards
– First, mark all variables A s.t. A → ε as nullable
– Then, as long as there are productions of the form A → A1…Ak where all of A1, …, Ak are marked as nullable, mark A as nullable
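The marking procedure above can be sketched in Python. This is a minimal sketch, not the course's own code: the name `nullable_variables` and the encoding of a grammar as a dict from each variable to a list of right-hand-side strings (with `""` standing for ε and one character per symbol) are assumptions made for illustration.

```python
def nullable_variables(grammar):
    """Return the set of variables that can derive the empty string.

    grammar: dict mapping a variable to a list of right-hand sides,
    each a string of one-character symbols; "" stands for ε (assumed encoding).
    """
    nullable = set()
    changed = True
    while changed:  # repeat until a full pass marks nothing new
        changed = False
        for var, bodies in grammar.items():
            if var in nullable:
                continue
            # A body qualifies if every symbol in it is already marked.
            # The empty body "" qualifies trivially (all() of nothing is True).
            if any(all(sym in nullable for sym in body) for body in bodies):
                nullable.add(var)
                changed = True
    return nullable


# The example grammar from the slides.
example = {
    "S": ["ACD"],
    "A": ["a"],
    "B": [""],
    "C": ["ED", ""],
    "D": ["BC", "b"],
    "E": ["b"],
}
print(sorted(nullable_variables(example)))  # ['B', 'C', 'D']
```

The loop stops after a pass that marks nothing new, so it runs at most one pass per variable plus one.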
Eliminating ε -productions

grammar:
S → ACD
A → a
B → ε
C → ED | ε
D → BC | b
E → b

nullable variables: B, C, D

added productions:
D → C
S → AD
D → B
D → ε
S → AC
S → A
C → E

Step 2: For every production of the form A → αNiβ, add another production A → αβ
Step 3: If Ni → ε is a production, remove it
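As an illustration of the elimination steps, here is a small Python sketch. It assumes the same hypothetical encoding as before (dict of variable to right-hand-side strings, `""` for ε, one character per symbol); `eliminate_epsilon` is an illustrative name. Instead of adding productions one at a time, it generates every way of dropping nullable occurrences from a body, which should reach the same set of productions.

```python
from itertools import product


def eliminate_epsilon(grammar, start="S"):
    """Return a grammar without ε-productions (except possibly start -> ε)."""
    # Step 1: find the nullable variables by working backwards from A -> ε.
    nullable = set()
    changed = True
    while changed:
        changed = False
        for var, bodies in grammar.items():
            if var not in nullable and any(
                all(sym in nullable for sym in body) for body in bodies
            ):
                nullable.add(var)
                changed = True

    # Steps 2 and 3: for each body, keep or drop every nullable occurrence,
    # and discard any body that becomes empty.
    result = {}
    for var, bodies in grammar.items():
        new_bodies = set()
        for body in bodies:
            options = [(sym, "") if sym in nullable else (sym,) for sym in body]
            for choice in product(*options):
                new_body = "".join(choice)
                if new_body:
                    new_bodies.add(new_body)
        result[var] = new_bodies

    # Special case: keep start -> ε if the start variable is nullable.
    if start in nullable:
        result[start].add("")
    return result


example = {
    "S": ["ACD"],
    "A": ["a"],
    "B": [""],
    "C": ["ED", ""],
    "D": ["BC", "b"],
    "E": ["b"],
}
for var, bodies in eliminate_epsilon(example).items():
    print(var, "→", " | ".join(sorted(bodies)))
```

On the slide's grammar this adds S → AD, S → AC, S → A, C → E, D → B and D → C, and leaves B with no productions at all.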
Dealing with loops
• A unit production is a production of the form A1 → A2 where A1 and A2 are both variables
• Example

grammar:
S → 0S1 | 1S0S1 | T
T → S | R | ε
R → 0SR

unit productions: S → T, T → S, T → R
Removal of unit productions
• If there is a cycle of unit productions A1 → A2 → ... → Ak → A1, delete it and replace everything with A1
• Example

before:
S → 0S1 | 1S0S1 | T
T → S | R | ε
R → 0SR

after:
S → 0S1 | 1S0S1 | R | ε
R → 0SR

T is replaced by S in the {S, T} cycle

Removal of unit productions
• For other unit productions, replace every chain A1 → A2 → ... → Ak → α by productions A1 → α, ..., Ak → α
• Example

before:
S → 0S1 | 1S0S1 | R | ε
R → 0SR

after:
S → 0S1 | 1S0S1 | 0SR | ε
R → 0SR

S → R → 0SR is replaced by S → 0SR, R → 0SR
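Unit productions can also be removed mechanically. The sketch below does not merge cycles explicitly as the slides do; it uses a different but standard formulation: for each variable, collect all variables reachable through unit productions, then copy their non-unit right-hand sides. The name `remove_unit_productions` and the dict-of-strings encoding (one character per symbol, `""` for ε) are assumptions for illustration.

```python
def remove_unit_productions(grammar):
    """Return an equivalent grammar with no productions of the form A -> B."""
    variables = set(grammar)
    result = {}
    for var in grammar:
        # Collect every variable reachable from var through unit productions.
        reachable, stack = {var}, [var]
        while stack:
            current = stack.pop()
            for body in grammar[current]:
                if body in variables and body not in reachable:
                    reachable.add(body)
                    stack.append(body)
        # var inherits every non-unit body of every reachable variable.
        result[var] = {
            body
            for r in reachable
            for body in grammar[r]
            if body not in variables
        }
    return result


# The grammar from the slides after T has been merged into S.
example = {
    "S": ["0S1", "1S0S1", "R", ""],
    "R": ["0SR"],
}
for var, bodies in remove_unit_productions(example).items():
    print(var, "→", " | ".join(sorted(b or "ε" for b in bodies)))
```

Because reachability handles cycles automatically, the same function applied to the original grammar with S, T and R gives S the same four productions; T is kept but no longer appears in any right-hand side.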


Recap
• After eliminating ε-productions and unit productions, we know that every derivation

S ⇒* a1…ak, where a1, …, ak are terminals,

doesn’t shrink in length and doesn’t go into cycles

• Exception: S → ε
– We will not use this rule at all, except to check if ε ∈ L

• Note: ε-productions must be eliminated before unit productions
Example: testing membership

S → 0S1 | 1S0S1 | T
T → S | ε

After eliminating unit and ε-productions:

S → ε | 01 | 101 | 0S1 | 10S1 | 1S01 | 1S0S1

x = 00111

S
  01, 101
  0S1
    0011, 01011
    00S11: only strings of length ≥ 6
    other rules: strings of length ≥ 6
  10S1
    10011, strings of length ≥ 6
  1S01
    10101, strings of length ≥ 6
  1S0S1: only strings of length ≥ 6

None of the strings of length at most 5 equals 00111, so x is not in L.
Algorithm 1 for testing membership
• How to check if a string x ≠ ε is in L(G)

  1. Eliminate all ε-productions and unit productions
  2. Let X := S
     While some new rule R can be applied to X
       Apply R to X
       If X = x, you have found a derivation for x
       If |X| > |x|, backtrack
     If no more rules can be applied to X, x is not in L
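A possible rendering of Algorithm 1 in Python is sketched below. It assumes the grammar has already had its ε-productions and unit productions removed (and that S → ε, if present, has been left out), and it uses the same hypothetical dict-of-strings encoding with one character per symbol. Two details are choices of this sketch, not of the slides: only the leftmost variable is expanded, and a `seen` set avoids revisiting a sentential form.

```python
def derives(grammar, start, x):
    """Brute-force membership test for a nonempty string x.

    Assumes grammar has no ε-productions and no unit productions.
    """
    variables = set(grammar)
    seen = set()
    stack = [start]  # sentential forms still to explore (backtracking)
    while stack:
        form = stack.pop()
        if form == x:
            return True  # found a derivation for x
        if len(form) > len(x) or form in seen:
            continue  # too long, or already explored: backtrack
        seen.add(form)
        # Position of the leftmost variable, if any.
        pos = next((i for i, s in enumerate(form) if s in variables), None)
        if pos is None:
            continue  # all terminals, but not equal to x
        for body in grammar[form[pos]]:
            stack.append(form[:pos] + body + form[pos + 1:])
    return False  # no more rules can be applied


# The slide's example grammar after elimination, without S → ε.
example = {"S": ["01", "101", "0S1", "10S1", "1S01", "1S0S1"]}
print(derives(example, "S", "00111"))  # False
print(derives(example, "S", "01011"))  # True
```

Expanding only the leftmost variable is enough because every string in the language has a leftmost derivation. The search ends because only finitely many sentential forms have length at most |x|.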
Practical limitations of Algorithm 1
• This method can be very slow if x is long

G = CFG of the Java programming language
x = code for a 200-line Java program

algorithm might take about 10^200 steps!

• There is a faster algorithm, but it requires that we do some more transformations on the grammar
Chomsky Normal Form
• A grammar is in Chomsky Normal Form if every production (except possibly S → ε) is of the type

A → BC or A → a

• Conversion to Chomsky Normal Form is easy:

A → BcDE

replace terminals with new variables:
A → BCDE
C → c

break up sequences with new variables:
A → BX1
X1 → CX2
X2 → DE
C → c
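The two conversion steps can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the grammar is a dict from variables to lists of bodies, each body a tuple of symbols; ε-productions and unit productions are assumed to be gone already; and the fresh names `T_c` (for a terminal c) and `X1`, `X2`, … are hypothetical and assumed not to clash with existing variables. The slide names the new variable for c simply C.

```python
def to_cnf(grammar):
    """Convert a grammar without ε- and unit productions to Chomsky Normal Form."""
    variables = set(grammar)
    result = {var: [] for var in grammar}
    terminal_var = {}  # terminal -> name of the new variable standing for it
    counter = 0

    def var_for_terminal(t):
        if t not in terminal_var:
            terminal_var[t] = f"T_{t}"
            result[f"T_{t}"] = [(t,)]
        return terminal_var[t]

    for var, bodies in grammar.items():
        for body in bodies:
            if len(body) <= 1:
                # A -> a (or the special S -> ε) is already in the right form.
                result[var].append(tuple(body))
                continue
            # Step 1: replace terminals with new variables.
            symbols = [s if s in variables else var_for_terminal(s) for s in body]
            # Step 2: break up sequences longer than two with new variables.
            head = var
            while len(symbols) > 2:
                counter += 1
                fresh = f"X{counter}"
                result[head].append((symbols[0], fresh))
                result[fresh] = []
                head, symbols = fresh, symbols[1:]
            result[head].append(tuple(symbols))
    return result


# The slide's production A → BcDE; the rules for B, D, E are placeholders
# added here only so that B, D, E count as variables.
example = {
    "A": [("B", "c", "D", "E")],
    "B": [("b",)],
    "D": [("d",)],
    "E": [("e",)],
}
for var, bodies in to_cnf(example).items():
    print(var, "→", " | ".join(" ".join(b) for b in bodies))
```

For A → BcDE this produces A → B X1, X1 → T_c X2, X2 → D E and T_c → c, mirroring the slide up to the naming of the new variables.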
Exercise
• Convert this CFG into Chomsky Normal Form:

S → ε | ADDA
A → a
C → c
D → bCb
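Algorithm 2 for testing membership

S → AB | BC
A → BA | a
B → CC | b
C → AB | a

x = baaba

Each cell lists the variables that derive the substring of the given length starting at that column:

length 5:  SAC
length 4:  –    SAC
length 3:  –    B    B
length 2:  SA   B    SC   SA
length 1:  B    AC   AC   B    AC
x:         b    a    a    b    a

Idea: We generate each substring of x bottom up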

Parse tree reconstruction

(Same grammar and table as on the previous slide, for x = baaba.)

Tracing back the derivations, we obtain the parse tree

Cocke-Younger-Kasami algorithm

Input: Grammar G in CNF, string x = x1…xk

Table cells: the last (bottom) row holds cells 11, 22, …, kk above x1, x2, …, xk; the next row holds 12, 23, …; the top cell is 1k.

For cells in last row
  If there is a production A → xi
    Put A in table cell ii
For cells st in other rows
  If there is a production A → BC where B is in cell sj and C is in cell (j+1)t, for some s ≤ j < t
    Put A in cell st

Cell ij remembers every variable that can derive the substring xi…xj
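The table-filling procedure can be sketched in Python as below. This is an illustrative sketch, not the course's code: `cyk` is a hypothetical name, the grammar is a dict from variables to lists of bodies written as strings (one terminal, or two one-character variables), indices are 0-based, and x is assumed to be nonempty.

```python
def cyk(grammar, x):
    """Fill the CYK table for a CNF grammar and a nonempty string x.

    table[s][t] is the set of variables deriving x[s..t] (0-indexed, inclusive).
    """
    k = len(x)
    table = [[set() for _ in range(k)] for _ in range(k)]

    # Last row: substrings of length 1, using productions A -> a.
    for i, ch in enumerate(x):
        for var, bodies in grammar.items():
            if ch in bodies:
                table[i][i].add(var)

    # Other rows: longer substrings, shortest first, using productions A -> BC.
    for length in range(2, k + 1):
        for s in range(k - length + 1):
            t = s + length - 1
            for j in range(s, t):  # split into x[s..j] and x[j+1..t]
                for var, bodies in grammar.items():
                    for body in bodies:
                        if (
                            len(body) == 2
                            and body[0] in table[s][j]
                            and body[1] in table[j + 1][t]
                        ):
                            table[s][t].add(var)
    return table


# The grammar and string from the slides.
example = {
    "S": ["AB", "BC"],
    "A": ["BA", "a"],
    "B": ["CC", "b"],
    "C": ["AB", "a"],
}
table = cyk(example, "baaba")
print(sorted(table[0][4]))  # ['A', 'C', 'S']
print("S" in table[0][4])   # True, so baaba is in the language
```

The string is in the language exactly when the start variable appears in the top cell, here `table[0][4]`. Recording which split point and production put each variable into a cell would be enough to trace back a parse tree.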
