Grammars
o What is a Grammar?
o Chomsky Classification
o Ambiguous Grammars
What is a Grammar?
The grammar (G) of a language can be formally defined as a 4-tuple G = (N, T, S, P),
where
N is the finite set of non-terminal symbols,
T is the finite set of terminal symbols,
S is the starting symbol, which must be a member of the set N, and
P is the finite set of productions that recursively define the language.
The starting symbol is the unique non-terminal symbol that is used as a starting point in
generating all strings of the language. A production is simply a rule that defines a string
transformation. It has the general form:
α→β
Any occurrence of the string α in the string to be transformed can be replaced by the
string β.
Example:
E → E A E | ( E ) | id
A → + | *
id → a | b | c
Here
N = {E, A, id}
T = {+, *, (, ), a, b, c}
S=E
Let us derive the string a + (b*c):
E
E A E (using E → E A E)
id A E (using E → id)
a A E (using id → a)
a + E (using A → +)
a + (E) (using E → ( E ))
a + (E A E) (using E → E A E)
a + (id A E) (using E → id)
a + (b A E) (using id → b)
a + (b * E) (using A → *)
a + (b * id) (using E → id)
a + (b * c) (using id → c)
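The derivation above can be replayed mechanically. The following is a minimal sketch in Python, assuming the grammar is stored as a dictionary from each non-terminal to its right-hand sides; the names PRODUCTIONS, CHOICES, leftmost_step and derive are illustrative and not part of the original notes. At each step it replaces the leftmost non-terminal, which is exactly what the derivation above does.

```python
# Productions of the example grammar, keyed by non-terminal.
# Each right-hand side is a list of symbols.
PRODUCTIONS = {
    "E": [["E", "A", "E"], ["(", "E", ")"], ["id"]],
    "A": [["+"], ["*"]],
    "id": [["a"], ["b"], ["c"]],
}


def leftmost_step(form, rhs):
    """Replace the leftmost non-terminal in `form` with `rhs`."""
    for i, symbol in enumerate(form):
        if symbol in PRODUCTIONS:
            if rhs not in PRODUCTIONS[symbol]:
                raise ValueError(f"{symbol} -> {' '.join(rhs)} is not a production")
            return form[:i] + rhs + form[i + 1:]
    raise ValueError("no non-terminal left to expand")


# Right-hand side chosen at each step of the derivation of a + (b * c).
CHOICES = [
    ["E", "A", "E"], ["id"], ["a"], ["+"], ["(", "E", ")"],
    ["E", "A", "E"], ["id"], ["b"], ["*"], ["id"], ["c"],
]


def derive(start, choices):
    """Return the list of sentential forms produced by the given choices."""
    form = [start]
    steps = [form]
    for rhs in choices:
        form = leftmost_step(form, rhs)
        steps.append(form)
    return steps


for step in derive("E", CHOICES):
    print(" ".join(step))
```

Each printed line is one sentential form; the last one contains only terminal symbols.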
A sentential form is any string that can be derived from the starting symbol, e.g. E A E,
a A E, and a + (b * E).
A sentence is a sentential form that does not contain any non-terminal symbol; it
contains only terminal symbols and cannot be expanded any further, e.g. a + (b * c).
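The distinction can be expressed as a one-line test. The sketch below assumes the sets N and T of the example grammar; the function name is_sentence is illustrative. It only checks the symbols of a form that is already known to be derivable; it does not check derivability itself.

```python
# Symbol sets of the example grammar.
NON_TERMINALS = {"E", "A", "id"}
TERMINALS = {"+", "*", "(", ")", "a", "b", "c"}


def is_sentence(form):
    """A sentence is a sentential form made of terminal symbols only."""
    return all(symbol in TERMINALS for symbol in form)


print(is_sentence(["a", "A", "E"]))
print(is_sentence(["a", "+", "(", "b", "*", "c", ")"]))
```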
Chomsky Classification of Grammars
Clearly, the major part of a grammar definition is P – the set of productions. It is
therefore very important to examine the possible structure of these productions.
Suppose U is the set of all terminal and non-terminal symbols of a language; that is, U =
N ∪ T. The notation U+ denotes the positive closure of the set U: the set of all
non-empty strings that can be formed by concatenating members of U. U*, on the
other hand, denotes the closure of U; that is, the set U+ ∪ {ε}, where ε is the empty string.
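Both closures are infinite sets, but their shortest members can be listed. The sketch below assumes a small illustrative set U = {E, +, a} and a length bound of 2; the function name strings_up_to is hypothetical.

```python
from itertools import product


def strings_up_to(symbols, max_len, include_empty):
    """List the strings over `symbols` of length <= max_len, shortest first."""
    start = 0 if include_empty else 1
    result = []
    for n in range(start, max_len + 1):
        for combo in product(symbols, repeat=n):
            result.append("".join(combo))
    return result


U = ["E", "+", "a"]  # a small illustrative symbol set
u_plus = strings_up_to(U, 2, include_empty=False)  # part of U+
u_star = strings_up_to(U, 2, include_empty=True)   # part of U*

print(len(u_plus), len(u_star))
print(u_star[:5])
```

The only difference between the two lists is the empty string, which appears in the U* list and not in the U+ list.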
Avram Noam Chomsky classified grammars into the following four types:
o Type 0 grammar (free, or unrestricted, grammar)
o Type 1 grammar (context-sensitive grammar)
o Type 2 grammar (context-free grammar)
o Type 3 grammar (finite, finite-state or regular grammar)
Type 0 Grammar:

A grammar in which each production has the form:


α→β
where
α is a member of U+ (having at least one non-terminal symbol)
β is a member of U*
These grammars are also called free grammars. They have little relevance to
today’s programming languages because they are too general: the form of the
productions must be restricted before writing a compiler becomes a feasible
task.
Type 1 Grammar:
A grammar in which each production has the form:
αAβ→αγβ
where
α and β are members of U*
γ is a member of U+ (that is, γ is non-empty)
A is a single non-terminal symbol
These are the context-sensitive grammars. In the production above, A is
transformed to γ only when it occurs in the context of being preceded by α and
followed by β.
Most programming languages have aspects that can only be described by type 1
grammars. In practice, however, the use of a type 1 grammar is conventionally
avoided by augmenting the syntax description of the language, which is expressed
in a form that is not context sensitive, with additional rules, usually expressed in
English. For example, the scope rules of Pascal are not included in the formal
syntax definition – they are expressed in informal English.
Type 2 Grammar:

A grammar in which each production has the form:


A→γ
where
A is a single non-terminal symbol
γ is a member of U*
Such grammars are also referred to as context-free grammars, since A can always be
transformed into γ regardless of its context. This type of grammar is of
immense importance in programming language design. It corresponds directly to
the BNF notation, where each production has a single non-terminal symbol on its
left-hand side, and so any grammar that is expressible in BNF must be a
context-free grammar. Hence, the syntax of languages such as Pascal and ALGOL 60 is
defined by context-free (or type 2) grammars.
Type 3 Grammar:

A grammar in which each production has the form:


A → a or A → a B
where
A and B are non-terminal symbols
a is a terminal symbol
These grammars are also called finite, finite-state or regular grammars. This type
of grammar is too restrictive for general-purpose programming languages, but it
does find application in the design of some of the structures used as components
of most programming languages. Structures such as identifiers, numbers and so
on can often be described using a regular grammar.
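As an illustration, the sketch below assumes a hypothetical type 3 grammar for identifiers (a letter followed by any number of letters or digits) and simulates it as a finite-state recognizer. The grammar, the table RULES and the function is_identifier are assumptions made for this example; they do not come from any particular language definition.

```python
import string

LETTERS = set(string.ascii_letters)
DIGITS = set(string.digits)

# Hypothetical type 3 grammar for identifiers, in the form A -> a | a B:
#   I -> letter | letter R
#   R -> letter | digit | letter R | digit R
# Each entry is (allowed terminals, next non-terminal).
# A next non-terminal of None stands for a production A -> a, which ends the string.
RULES = {
    "I": [(LETTERS, None), (LETTERS, "R")],
    "R": [(LETTERS | DIGITS, None), (LETTERS | DIGITS, "R")],
}


def is_identifier(text):
    """Simulate the grammar as a finite-state recognizer."""
    states = {"I"}    # non-terminals that may still need expanding
    finished = False  # True if some derivation ends exactly at this point
    for ch in text:
        next_states, finished = set(), False
        for state in states:
            for allowed, target in RULES[state]:
                if ch in allowed:
                    if target is None:
                        finished = True
                    else:
                        next_states.add(target)
        states = next_states
    return finished


print(is_identifier("x1"), is_identifier("1x"))
```

The recognizer needs no stack and no backtracking, which is why structures of this kind are usually handled by the lexical analyser rather than the parser.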

The hierarchy of grammars is represented diagrammatically in the figure below. This figure
shows the inclusive nature of the hierarchy so that, for example, all type 3 languages are
also type 1 languages. In progressing from type 0 to type 3 grammars, language
complexity, and hence the complexity of recognizers, parsers or compilers, decreases.

[Figure: Hierarchy of Chomsky grammars. Nested regions, from innermost to outermost: Type 3, Type 2, Type 1, Type 0. Complexity decreases from Type 0 to Type 3.]

Note:
i. Type 3 languages are recognized by Finite State Automata
ii. Type 2 languages are recognized by Push-Down Automata
iii. Type 1 languages are recognized by Linear-Bounded Automata
iv. Type 0 languages are recognized by Turing Machines
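The four production forms can be checked mechanically. The sketch below returns the most restrictive type whose form every production of a grammar satisfies. It assumes that productions are given as (left-hand side, right-hand side) pairs of tuples, that any symbol not listed as a non-terminal is a terminal, and that γ in a type 1 production is non-empty (the usual convention). All function names are illustrative.

```python
def is_type3(lhs, rhs, nts):
    """A -> a  or  A -> a B."""
    if len(lhs) != 1 or lhs[0] not in nts:
        return False
    if len(rhs) == 1:
        return rhs[0] not in nts
    return len(rhs) == 2 and rhs[0] not in nts and rhs[1] in nts


def is_type2(lhs, rhs, nts):
    """A -> gamma, with a single non-terminal on the left."""
    return len(lhs) == 1 and lhs[0] in nts


def is_type1(lhs, rhs, nts):
    """alpha A beta -> alpha gamma beta, gamma assumed non-empty."""
    if len(rhs) < len(lhs):
        return False  # gamma would have to be empty
    for i, symbol in enumerate(lhs):
        if symbol in nts:
            alpha, beta = lhs[:i], lhs[i + 1:]
            if rhs[:len(alpha)] == alpha and rhs[len(rhs) - len(beta):] == beta:
                return True
    return False


def is_type0(lhs, rhs, nts):
    """Left-hand side contains at least one non-terminal."""
    return any(symbol in nts for symbol in lhs)


def classify(productions, nts):
    """Return the highest type number whose form all productions satisfy."""
    for level, check in ((3, is_type3), (2, is_type2), (1, is_type1), (0, is_type0)):
        if all(check(lhs, rhs, nts) for lhs, rhs in productions):
            return level
    raise ValueError("some left-hand side has no non-terminal symbol")


# The expression grammar from the first section.
EXPRESSION = [
    (("E",), ("E", "A", "E")), (("E",), ("(", "E", ")")), (("E",), ("id",)),
    (("A",), ("+",)), (("A",), ("*",)),
    (("id",), ("a",)), (("id",), ("b",)), (("id",), ("c",)),
]
print(classify(EXPRESSION, {"E", "A", "id"}))
```

For the expression grammar every left-hand side is a single non-terminal, but E → E A E does not fit the type 3 form, so the grammar is classified as type 2.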
Ambiguous Grammars
A grammar that produces more than one leftmost derivation or more than one
rightmost derivation for the same sentence is said to be ambiguous.

For example, let us consider the grammar:

E → E + E | E * E | id
id → a | b | c

Here
N = {E, id}
T = {+, *, a, b, c}
S=E

The sentence a + b * c has two distinct leftmost derivations:


E
E + E (using E → E + E)
id + E (using E → id)
a + E (using id → a)
a + E * E (using E → E * E)
a + id * E (using E → id)
a + b * E (using id → b)
a + b * id (using E → id)
a + b * c (using id → c)
and
E
E * E (using E → E * E)
E + E * E (using E → E + E)
id + E * E (using E → id)
a + E * E (using id → a)
a + id * E (using E → id)
a + b * E (using id → b)
a + b * id (using E → id)
a + b * c (using id → c)

To eliminate ambiguity we change the grammar to:

E → E + T | T
T → T * id | id
id → a | b | c

Here
N = {E, T, id}
T = {+, *, a, b, c} (the set of terminals, not to be confused with the non-terminal T)
S = E

Now every sentence has a unique leftmost derivation and a unique rightmost derivation.


For example, we derive the sentence a + b * c using a leftmost derivation:
E
E + T (using E → E + T)
T + T (using E → T)
id + T (using T → id)
a + T (using id → a)
a + T * id (using T → T * id)
a + id * id (using T → id)
a + b * id (using id → b)
a + b * c (using id → c)

No other leftmost derivation of this sentence exists.
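The two expression grammars can be compared by counting parse trees, since each parse tree corresponds to exactly one leftmost derivation. The following is a minimal sketch, assuming a grammar without ε-productions and without cycles of unit productions (both expression grammars satisfy this); the function count_parse_trees and the dictionary layout are illustrative choices.

```python
from functools import lru_cache


def count_parse_trees(grammar, start, tokens):
    """Count the parse trees of `tokens`.

    Assumes no ε-productions and no cycles of unit productions,
    so every symbol covers at least one token and the recursion ends.
    """
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count_symbol(symbol, i, j):
        if symbol not in grammar:  # terminal: must match exactly one token
            return 1 if j == i + 1 and tokens[i] == symbol else 0
        return sum(count_seq(rhs, i, j) for rhs in grammar[symbol])

    @lru_cache(maxsize=None)
    def count_seq(rhs, i, j):
        if not rhs:
            return 1 if i == j else 0
        if len(rhs) == 1:
            return count_symbol(rhs[0], i, j)
        head, rest = rhs[0], rhs[1:]
        # Try every split point; leave at least one token per remaining symbol.
        return sum(
            count_symbol(head, i, k) * count_seq(rest, k, j)
            for k in range(i + 1, j - len(rest) + 1)
        )

    return count_symbol(start, 0, len(tokens))


AMBIGUOUS = {
    "E": (("E", "+", "E"), ("E", "*", "E"), ("id",)),
    "id": (("a",), ("b",), ("c",)),
}
UNAMBIGUOUS = {
    "E": (("E", "+", "T"), ("T",)),
    "T": (("T", "*", "id"), ("id",)),
    "id": (("a",), ("b",), ("c",)),
}
SENTENCE = ["a", "+", "b", "*", "c"]

print(count_parse_trees(AMBIGUOUS, "E", SENTENCE))    # 2
print(count_parse_trees(UNAMBIGUOUS, "E", SENTENCE))  # 1
```

A count greater than one for any sentence shows that a grammar is ambiguous. A count of one for a single sentence is only evidence for that sentence, not a proof that the whole grammar is unambiguous.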


As a second example, let us consider the grammar:

S → aS | Sa | ε
Here
N = {S}
T = {a}
S=S

The sentence aa has four distinct leftmost derivations; two of them are:


S
Sa (using S → Sa)
Saa (using S → Sa)
εaa (using S → ε)
aa

and
S
Sa (using S → Sa)
aSa (using S → aS)
aεa (using S → ε)
aa

To eliminate ambiguity we change the grammar to:

S → aS | ε
Here
N = {S}
T = {a}
S=S
Now every sentence has a unique leftmost derivation and a unique rightmost derivation.
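Because this grammar contains an ε-production, its leftmost derivations are easiest to examine by enumerating them exhaustively. The sketch below assumes that every production other than S → ε adds at least one terminal, so a sentential form with more terminals than the target sentence can be abandoned; the function leftmost_derivations and the grammar dictionaries are illustrative. For the sentence aa it finds four leftmost derivations in the original grammar (including the two shown above) and one in the revised grammar.

```python
def leftmost_derivations(grammar, start, target):
    """Enumerate all leftmost derivations of the string `target`.

    Assumes every non-ε production adds at least one terminal, so forms
    holding more terminals than `target` can never lead to it.
    """
    results = []

    def expand(form, history):
        # Locate the leftmost non-terminal, if any.
        index = next((i for i, s in enumerate(form) if s in grammar), None)
        if index is None:
            if "".join(form) == target:
                results.append(history)
            return
        terminal_count = sum(1 for s in form if s not in grammar)
        if terminal_count > len(target):
            return  # already too long: prune this branch
        for rhs in grammar[form[index]]:
            new_form = form[:index] + rhs + form[index + 1:]
            expand(new_form, history + [new_form])

    expand((start,), [(start,)])
    return results


ORIGINAL = {"S": [("a", "S"), ("S", "a"), ()]}  # S -> aS | Sa | ε
REVISED = {"S": [("a", "S"), ()]}               # S -> aS | ε

for derivation in leftmost_derivations(ORIGINAL, "S", "aa"):
    print(" => ".join("".join(form) or "ε" for form in derivation))

print(len(leftmost_derivations(REVISED, "S", "aa")))
```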