Você está na página 1de 21

Models of Language

Generation:
Grammars

1
11. Ambiguity of Context-free Grammars
Ambiguity in a language occurs either when a symbol or an expression has more than one meaning (e.g., story), or
when an expression can be (grammatically) parsed in two different ways. The former is called lexical (or semantic)
ambiguity, and the later syntactic (or structural) ambiguity. For example, in natural language, the sentence “A man
entered the room with a picture” can be interpreted (i.e., parsed) into two different grammatical structures as follows.

man entered room


A with picture
the a

man entered room


A
with picture the
a
This sentence is syntactically ambiguous. With no further information, it is impossible to know which way the sentence
should be translated. In formal language, given a grammar G and a sentence x (i.e., a string, in the formal language jargon),
parsing shows how x can be derived by the grammar. If x can be derived in two different ways, grammar G is ambiguous.
Parsing is one of the main functions of the compiler of a programming language. In this chapter we will study syntactic
ambiguity.

2
Ambiguity

11.1 Parse tree 290


11.2 Parse Tree and Ambiguity 293
11.3 Eliminating Ambiguity of an ambiguous CFG 295
Using parenthesis, Fixing the order of rule applications
Eliminating redundant rules
Setting up precedence and associativity
Rumination 305
Exercises 306

Dear Dad & Dear Son


Break Time
Dear Dad,
$chool i$ really great. I am making lot$ of friend$ and $tudying very hard. With all my $tuff, I $imply can`t
think of anything I need, $o if you would like, you can ju$t $end me a card, a$ I would love to hear from you.
Love,
Your $on

The Reply:
Dear Son,
I kNOw that astroNOmy, ecoNOmics, and oceaNOgraphy are eNOugh to keep even an hoNOr student busy.
Do NOt forget that the pursuit of kNOwledge is a NOble task, and you can never study eNOugh.
Love,
Dad
- Adrea -

3
Ambiguity

11.1 Parse Tree


In the formal language, the syntax (i.e., structure) of a string generated by a
grammar depends on the rules applied as well as their order of application. Syntax
provides critical information for the compiler to translate the string (i.e., a program)
into an object code. Thus, if a string can be derived in two different ways, it is
impossible to give a unique translation.
A parse tree is an efficient data structure to represent the syntax of a string derived
by the grammar and to translate it by the compiler. For example, figure (b) below
shows a parse tree for string p∨q∧r generated by grammar G in figure (a).
S
S ∨ S
A S ∧ S
G: S → S∨S | S∧S | ∼ S | A p
A A
A →p | q | r
q r
(a) (b)

4
Parse Tree Ambiguity

Given a derivation (i.e., sequence of rules applied) for a string w, the parse tree for
w with respect to the derivation is constructed as follows. First put the root node with
the start symbol S. Then, for each leaf node on the current tree with a nonterminal
label, say A, recursively expand the tree as follows:
Suppose that A → β is a rule applied next to derive w. For each symbol X
appearing in β , in the order from left to right, a child node with label X is add to A.
This procedure repeats until the tree has no leaf nodes (labeled with a nonterminal
symbol) to expand. Reading all the leaf nodes left to right on the final tree should
give the string w. This string of terminal symbols is called the yield of the parse tree.
S
S ∨ S
A S ∧ S
G: S → S∨S | S∧S | ∼ S | A p
A A
A →p | q | r
q r
(a) (b)

5
parse tree Ambiguity

In general, given a source code in programming environments, the compiler


constructs a parse tree of the code and then traversing the tree bottom up, left to
right, generates an object code (machine language program). For example, suppose
that for the string p∨q∧r the compiler has generated the parse tree in figure (b)
below.
The compiler generates machine language instructions that will access the values
of variables q and r, compute q∧r and store the result (usually in a register), access
the value of p, and finally execute the OR operation with the stored result of q∧r.
Because of the way of traversing the tree, bottom up, left to right, to generate the
object code, the order of logical operations depends on the tree. For the example,
the compiler generates an object code that will execute the logical expression p∨q∧r
in the order of p∨( q∧r ). S
S ∨ S
A S ∧ S
G: S → S∨S | S∧S | ∼ S | A
p
A A
A →p | q | r
q r
(a) (b)
6
Ambiguity
11.2 Parse Tree and Ambiguity
The two parse trees in figures (b) and (c) below show two parse trees yielding the
same string p∨q∧r. In other words, it can be derived by grammar G in figure (a) in
two ways. Consequently, the two parse trees imply that the expression p∨q∧r can be
evaluated in two different ways, i.e., p∨(q∧r) and (p∨)q∧r. This implies that for
grammar G, the operator precedence between the two logical operations ∨ (OR) and
∧ (AND) is ambiguous.

S S
S S S S
∨ ∧
A S ∨ S A
S ∧S
G: S → S∨S | S∧S | ∼ S | A p
A →p | q | r A A r
A A
q r p q
(a) (b) p∨(q∧r) (c) (p∨q)∧r

7
Parse Tree and Ambiguity Ambiguity

As we saw in the previous example, the existence of two parse trees yielding the
same string is a problematic property, called ambiguity, of a CFG that should be
eliminated. In real application, we cannot expect the correct result from a program
written in the language of an ambiguous grammar. Before we discuss how to
eliminate ambiguity from a CFG, we need a formal definition of it.
Definition (Ambiguity): A CFG G is ambiguous if there is a string x ∈L(G) for
which there are two parse trees yielding x.
Unfortunately, it is an unsolvable problem to decide whether an arbitrary CFG is
ambiguous or not. Also, there is no algorithm available that given an ambiguous
CFG, converts it to an unambiguous grammar. However, for a certain restricted
construct, it is possible to solve the problems. In this section we will present several
techniques with some examples.

8
Ambiguity

11.3 Eliminating Ambiguity of a CFG


(1) Binding with parenthesis.
Example: We know that the CFG G1 below is ambiguous because there are two
parse trees yielding the same string p∨q∧r. The ambiguity occurs because it can
generate the same string by applying S → S∨S followed by S → S∧S, or vice versa,
as shown, respectively, in figure (a) and figure (b).

S S
S S S
∨ S
A ∧
G1: S → S∨S | S∧S | ∼ S | A S ∧S S ∨ S A
p
A →p | q | r r
A A A A
q r p q
(a) (b
)

9
Eliminating Ambiguity Ambiguity

This ambiguity can be eliminated by parenthesizing the right side of those two
rules as shown in G2 below. The parentheses make the yields of the two parse trees
different as shown in figures (a) and (b).

Ambiguous G1: S → S∨S | S∧S | ∼ S | A A →p | q | r


Unambiguous G2: S → (S∨S) | (S∧S) | ∼ S | A A →p | q | r

S S
( ) )
S S ( S
∨ S
A ( S ∧S ) ( S S ) ∧ A

p
r
A A A A
q r p q
(a): (p ∨(q ∧r)) (b): ((p ∨q ) ∧r)

10
Ambiguity
Eliminating Ambiguity

The parenthesizing technique is simple, but has a serious drawback, because we


are altering the language by adding new terminal symbols, i.e., the parentheses.
However, this is a popular technique used in programming languages. Instead of the
parentheses they use other notations, for example, in Pascal “begin” and “end,” and
in C and C++, the braces ‘{’ and ‘}’.
(2) Fixing the order of applying rules.
Example 1. The language generated by CFG G3 below is {bicbj | i, j ≥ 0}. This
grammar is ambiguous because, for example, the string bcb can be derived by
generating the left side b first then the right side b, or vice versa, as shown below.

S S
b b
S S
G3 : S → bS | Sb | c
S b b
S
c c

11
Ambiguity
Eliminating Ambiguity

We can simply modify the grammar G3 to G4 as shown below such that left side
b’s, if any, are always generated first. Figure (b) is the only parse tree for string
bcb. Grammar G4 is unambiguous.

Ambiguous G3 : S → bS | Sb | c
Unambiguous G4 : S → bS | A A → Ab | c

S
b
S S
S
b b
S S A
S b b b
S A
c c c
(a) (b)

12
Ambiguity
Eliminating Ambiguity

Example 2. Using the technique of fixing the order of derivation, the ambiguous
CFG G1 that we have examined can be converted to an unambiguous grammar G5
shown in figure (a). Notice that this grammar generates the operators left to right in
the order they appear in the string.
S
Ambiguous:
∼ S
G1 : S → S∨S | S∧S | ∼ S | A
A ∨ S
A →p | q | r
A ∧ S
Unambiguous: p
q A ∨ S
G5 : S → A∨S | A∧S | ∼ S | A
q A
A →p | q | r
p
(a) (b): ∼ p∨q∧q∨p

13
Eliminating Ambiguity Ambiguity

(3) Eliminating redundant rules


The CFG G6 below is ambiguous because it can generate ab either by B or D. We
can simply delete one of the two and make the grammar unambiguous (see G7).

Ambiguous G6 : S → BD B → abb D → abd


Unambiguous G7 : S → BD B → abb D →d

The CFG G8 below is ambiguous because it can generate ε in two ways.


Applying the technique for minimizing the number of ε -production rules, we can
convert it to an unambiguous grammar G9.

Ambiguous G8 : S → BD B → bBcε D → dDeε


Unambiguous G9 : S → BDε B → bBcbc
D → dDede

14
Ambiguity
Eliminating Ambiguity

(4) Implementing operator precedence and associativity


Operator precedence and associativity are important rules for evaluating
mathematical expressions. In programming languages, the rules are defined by the
grammar. As we know, multiplication (*) and division (/) are given higher precedence
than addition (+) and subtraction (-). The assignment operator (=) is given the lowest.
Operator = is right associative, and all the others are left associative. According to this
order of precedence and associativity, the mathematical expression in figure (a) will
be evaluated as shown in figure (b).

a = ((b + (c * d)) – (e / f))


a=b+c*d–e/f (5) (3) (1) (4) (2)

(a) (b)

15
Ambiguity
Eliminating Ambiguity

Example. Assume that the logical operators ∼ , ∧, and ∨are right associative with
precedence given in that order (i.e., ∼ is at the top followed by ∧, and ∨at the bottom).
The ambiguous CFG G1 (repeated below) can be modified into an unambiguous G10
by implementing the precedence and associativity into the production rules.
Notice that every OR (∨) operator in the string should be generated by S → D∨S
before generating others. Then AND (∧) operators, if any, are generated by D → C∧D,
and finally NOT (∼ ) operators. Also notice that with D, there is no way to derive ∨,
and with C neither ∨nor ∧can be derived.
S
Ambiguous D ∨ S
G1: S → S∨SS∧S∼ SA ∨ S
A → pqr C D D
~ C C C ∧ D
Unambiguous
G10 : S → D∨SD D → C∧DC A A C
A
C → ∼ CA A → pqr r A
p q
(a) (b) p
16
Eliminating Ambiguity Ambiguity

This fixed order of rule applications facilitates the compiler generating an object
code to evaluate the string (i.e., a logical expression) according to the operator
precedence and associativity.
For example, because NOT (∼ ) operators are generated last, they appear on a
subtree of the parent node of a leaf node labeled with ∧or ∨. Hence, the compiler,
traversing the tree bottom up, left to right, generates the instructions which will
execute the NOT (∼ ) operators before the instructions executing other operators.
Similarly, we can see how the compiler generates the instructions to execute AND
(∧) operators before OR (∨) operators. S
D ∨ S
G10 : S → D∨SD D → C∧DC ∨ S
C D D
C → ∼ CA A → pqr
~ C C C ∧ D

A A C
∼ p ∨q ∨r ∧p ⇒ ((∼ p) ∨(q ∨(r ∧p))) A
(1) (4) (3) (2) r A
p q
(a) (b) p
17
Eliminating Ambiguity Ambiguity

Now, for the associativity, notice that identical operators are generated left to
right in the order they appear in the string. For example, all OR (∨) operators are
derived by applying S → D∨S recursively. Thus, the compiler, traversing the parse
tree bottom up, left to right, will generate an object code to evaluate the logic
expression following the order of right associativity.
If we want to implement left associativity for an operator used in the language,
we can simply reverse the right side of the rule which derives it. For example, we
use S → S∨D instead to make operator ∨left associative.
S
D ∨ S
∨ S
G10 : S → D∨SD D → C∧DC C D D
C → ∼ CA A → pqr ~ C C C ∧ D

A A C
∼ p ∨q ∨r ∧p ⇒ ((∼ p) ∨(q ∨(r ∧p))) A
(1) (4) (3) (2) p q r A
(a) (b) p

18
Ambiguity

Rumination (1): ambiguity


We learned that for a simple CFG, it is possible to analyze it to decide whether it is ambiguous or not, and if it is ambiguous,
convert it to an unambiguous one. However, as we mentioned before, most of the problems concerning ambiguity of CFG’s in
general are very difficult or even unsolvable. Here are some interesting facts.

• There are some languages that can only be generated by ambiguous context-free grammars. Such languages are call
inherently ambiguous. The following language L is an example.
L = {anbncm | m, n ≥ 1 } ∪ {anbmcm | m, n ≥ 1}
• It is an unsolvable problem to tell whether an arbitrary CFG is ambiguous or not.
• It is an unsolvable problem to convert an arbitrary ambiguous CFG, whose language is not inherently ambiguous, to an
unambiguous CFG.

Today’s Quote Break Time

Be slow in choosing a friend, slower in changing.


- Benjamin Franklin -

19
Ambiguity
Exercises

11.1 Show that each of the following CFG’s is ambiguous, and convert it to an unambiguous CFG.
(a) S → Sa | A | a A → a | bA
(b) S → AB | ε A→ a|ε B→ b|ε

11.2 (a) Show that the following CFG is ambiguous.

G: S → S + S | S * S | T T→ a|b

(b) In the CFG G above, let + and * be the addition and multiplication operators, respectively, and a and b be integer
variables. Convert G to a CFG G’ that satisfies the following conditions.
(i) L(G) = L(G’)
(ii) Operator * has higher precedence than operator +.
(iii) Both operators are left associative.
You should also present a parse tree showing that your grammar meets the required order of operator precedence and
associativity.

20
Exercises Ambiguity

The following problem is concerned with the ambiguity of the if-statement, which appears often in the text. Question (a)
is easy, but question (b) is challenging.

11.3 The syntax flow graph below defines a simplified <If-statement> of the Pascal programming language. Following
the convention, symbols in a circle or oval are terminals, and the words in rectangles correspond to nonterminal symbols.
(a) Transform the flow graph to a CFG G and show that G is ambiguous.
(b) Convert grammar G to an unambiguous CFG and explain in detail how you got your answer.

<IF-statement>

IF ( <bool> ) then <statement> else <statement>

a-b
<statement> <bool>
c
<IF-statement>

d
a+b

21

Você também pode gostar