Grammars
What is a Grammar?
Chomsky Classification
Ambiguous Grammars
What is a Grammar?
The grammar (G) of a language can be formally defined as a 4-tuple G = (N, T, S, P),
where
N is the finite set of non-terminal symbols,
T is the finite set of terminal symbols,
S is the starting symbol, which must be a member of the set N, and
P is the finite set of productions that recursively define the language.
The starting symbol is the unique non-terminal symbol that is used as a starting point in
generating all strings of the language. A production is simply a rule that defines a string
transformation. It has the general form:
α→β
Any occurrence of the string α in the string to be transformed can be replaced by the
string β.
Example:
E → E A E | ( E ) | id
A → + | *
id → a | b | c
Here
N = {E, A, id}
T = {+, *, (, ), a, b, c}
S=E
Let us derive the string a + (b*c):
E
E A E (using E → E A E)
id A E (using E → id)
a A E (using id → a)
a + E (using A → +)
a + (E) (using E → ( E ))
a + (E A E) (using E → E A E)
a + (id A E) (using E → id)
a + (b A E) (using id → b)
a + (b * E ) (using A → * )
a + (b * id) (using E → id)
a + (b * c) (using id → c)
A sentential form is any string that can be derived from the starting symbol, e.g. E A E,
a A E, and a + (b * E).
A sentence is a sentential form that does not contain any non-terminal symbol; it
contains only terminal symbols and cannot be expanded any further, e.g. a + (b * c).
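The derivation above can be replayed mechanically. The following is a minimal Python sketch, not a definitive implementation: it stores the example grammar as the 4-tuple (N, T, S, P) and, at each step, replaces the leftmost non-terminal with a chosen right-hand side. The names leftmost_step, is_sentence, derive and CHOICES are illustrative assumptions, not part of the original notes.

```python
# The example grammar as a 4-tuple (N, T, S, P).
N = {"E", "A", "id"}
T = {"+", "*", "(", ")", "a", "b", "c"}
S = "E"
P = {
    "E": [["E", "A", "E"], ["(", "E", ")"], ["id"]],
    "A": [["+"], ["*"]],
    "id": [["a"], ["b"], ["c"]],
}


def leftmost_step(form, rhs):
    """Replace the leftmost non-terminal in `form` with `rhs`."""
    for i, sym in enumerate(form):
        if sym in N:
            if rhs not in P[sym]:
                raise ValueError(f"{sym} -> {' '.join(rhs)} is not a production")
            return form[:i] + rhs + form[i + 1:]
    raise ValueError("form contains no non-terminal; it is already a sentence")


def is_sentence(form):
    """A sentence is a sentential form made of terminal symbols only."""
    return all(sym in T for sym in form)


def derive(choices):
    """Return every sentential form visited, starting from S."""
    form = [S]
    forms = [form]
    for rhs in choices:
        form = leftmost_step(form, rhs)
        forms.append(form)
    return forms


# Right-hand sides chosen at each step of the derivation of a + (b * c).
CHOICES = [
    ["E", "A", "E"], ["id"], ["a"], ["+"], ["(", "E", ")"],
    ["E", "A", "E"], ["id"], ["b"], ["*"], ["id"], ["c"],
]

if __name__ == "__main__":
    for form in derive(CHOICES):
        print(" ".join(form), "(sentence)" if is_sentence(form) else "")
```

Every list returned by derive is a sentential form; only the last one passes is_sentence.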
Chomsky Classification of Grammars
Clearly, the major part of a grammar definition is P – the set of productions. It is
therefore very important to examine the possible structure of these productions.
Suppose U is the set of all terminal and non-terminal symbols of a language; that is, U =
N ∪ T. The notation U+ is used to denote the positive closure of the set U, the set of all
non-empty strings that can be formed by the concatenation of members of U. U*, on the
other hand, denotes the closure of U; that is, the set U+ ∪ {ε}, where ε is the empty string.
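Both closures are infinite sets, so they can only be enumerated up to a length bound. As a rough illustration, the Python sketch below lists the members of U+ and U* of length at most 2 for a small set U = {a, S}. That set and the helper name strings_up_to are assumptions chosen for the example.

```python
from itertools import product


def strings_up_to(U, max_len, include_empty):
    """Strings over U of length <= max_len.

    With include_empty=True this is a finite slice of U*;
    with include_empty=False it is a finite slice of U+.
    """
    start = 0 if include_empty else 1
    result = []
    for n in range(start, max_len + 1):
        for combo in product(sorted(U), repeat=n):
            result.append("".join(combo))
    return result


U = {"a", "S"}  # hypothetical symbol set, used only for illustration
u_plus = strings_up_to(U, 2, include_empty=False)
u_star = strings_up_to(U, 2, include_empty=True)

if __name__ == "__main__":
    print(u_plus)
    print(u_star)
```

The only difference between the two lists is the empty string, written ε in the text and "" in the code.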
Avram Noam Chomsky classified grammars into the following four types:
o Type 0 grammar (free grammar)
o Type 1 grammar (context-sensitive grammar)
o Type 2 grammar (context-free grammar)
o Type 3 grammar (finite, finite-state or regular grammar)
Type 0 Grammar:
These grammars are also called free (or unrestricted) grammars. They have little
relevance to today’s programming languages because they are too general: the
form of the productions must be restricted before the writing of a compiler
becomes a feasible task.
Type 1 Grammar:
A grammar in which each production has the form:
αAβ→αγβ
where
α and β are members of U*
γ is a member of U+ (that is, γ is non-empty)
A is a single non-terminal symbol
These are the context-sensitive grammars. In the production above, A is
transformed to γ only when it occurs in the context of being preceded by α and
followed by β.
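To make the role of the context concrete, here is a small Python sketch of applying one production of the form αAβ → αγβ to a sentential form. The function name apply_context_sensitive and the symbols in the usage lines are hypothetical, chosen only for illustration.

```python
def apply_context_sensitive(form, alpha, A, beta, gamma):
    """Rewrite the first occurrence of alpha A beta as alpha gamma beta.

    `form`, `alpha`, `beta` and `gamma` are lists of symbols; `A` is a single
    non-terminal. The form is returned unchanged when A never occurs between
    alpha and beta.
    """
    pattern = alpha + [A] + beta
    for i in range(len(form) - len(pattern) + 1):
        if form[i:i + len(pattern)] == pattern:
            return form[:i] + alpha + gamma + beta + form[i + len(pattern):]
    return form


# Hypothetical production: a B c -> a b b c (alpha = a, A = B, beta = c, gamma = b b).
in_context = apply_context_sensitive(["a", "B", "c"], ["a"], "B", ["c"], ["b", "b"])
out_of_context = apply_context_sensitive(["c", "B", "c"], ["a"], "B", ["c"], ["b", "b"])

if __name__ == "__main__":
    print(in_context)      # B was preceded by a and followed by c, so it is rewritten
    print(out_of_context)  # B is not preceded by a, so nothing changes
```

A context-free production is the special case in which both alpha and beta are empty lists.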
Most programming languages have aspects that can only be described by type 1
grammars. In practice, however, the use of a type 1 grammar is conventionally
avoided by augmenting the syntax description of the language, which is expressed
in a form that is not context sensitive, with additional rules, usually expressed in
English. For example, the scope rules of Pascal are not included in the formal
syntax definition – they are expressed in informal English.
Type 2 Grammar:
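These are the context-free grammars: each production has the form A → γ, where
A is a single non-terminal symbol and γ is a member of U*, so A can be replaced
by γ regardless of the context in which it occurs. Languages such as Pascal and
ALGOL 60 are defined as context-free (or type 2) languages.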
Type 3 Grammar:
[Figure: complexity decreases from type 0 to type 3]
Note:
i. Type 3 grammars can be represented by finite-state automata
ii. Type 2 grammars can be represented by push-down automata
iii. Type 1 grammars can be represented by linear-bounded automata
iv. Type 0 grammars can be represented by Turing machines
Ambiguous Grammars
A grammar that produces more than one leftmost derivation or more than one
rightmost derivation for the same sentence is said to be ambiguous.
E → E + E | E * E | id
id → a | b | c
Here
N = {E, id}
T = {+, *, a, b, c}
S=E
a + id * E (using E → id)
a + b * E (using id → b)
a + b * id (using E → id)
a + b * c (using id → c)
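Each parse tree corresponds to exactly one leftmost derivation, so counting parse trees is one way to test a sentence for ambiguity. The Python sketch below does this for the grammar E → E + E | E * E | id only. It is specialised to that grammar rather than being a general parser, and the names count_parse_trees, IDS and OPS are assumptions made for the example.

```python
from functools import lru_cache

IDS = {"a", "b", "c"}  # strings derivable from id
OPS = {"+", "*"}       # operators appearing in E -> E + E | E * E


def count_parse_trees(sentence):
    """Count parse trees of `sentence` under E -> E + E | E * E | id."""
    tokens = tuple(ch for ch in sentence if not ch.isspace())

    @lru_cache(maxsize=None)
    def count_E(i, j):
        # Number of distinct parse trees with root E spanning tokens[i:j].
        if j - i == 1:
            return 1 if tokens[i] in IDS else 0
        total = 0
        # Try every operator position k as the top-level E op E split.
        for k in range(i + 1, j - 1):
            if tokens[k] in OPS:
                total += count_E(i, k) * count_E(k + 1, j)
        return total

    return count_E(0, len(tokens))


if __name__ == "__main__":
    print(count_parse_trees("a + b * c"))
```

For a + b * c the count is 2: one tree splits at + and groups b * c together, the other splits at * and groups a + b together. A count greater than 1 for any sentence shows that the grammar is ambiguous.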
E→E+T|T
T → T * id | id
id → a | b | c
Here
N = {E, T, id}
T = {+, *, a, b, c}
S=E
S → aS | Sa | ε
Here
N = {S}
T = {a}
S=S
and
S
Sa (using S → Sa)
aSa (using S → aS)
aεa (using S → ε)
aa
S → aS | ε
Here
N = {S}
T = {a}
S=S
Now every sentence has a unique derivation, whether leftmost or rightmost.
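The difference between the two grammars can be checked by brute force. The following Python sketch counts the leftmost derivations of a target string by exploring every choice of production. It assumes, as is true of both grammars here, that every non-ε production adds at least one terminal, which is what guarantees that the search terminates. The names count_leftmost_derivations, AMBIGUOUS and UNAMBIGUOUS are illustrative.

```python
def count_leftmost_derivations(productions, start, target):
    """Count leftmost derivations of `target` from `start`.

    `productions` maps each non-terminal to a list of right-hand sides
    (tuples of symbols; the empty tuple stands for ε). Terminals are
    single characters. Assumes every non-ε production adds a terminal,
    otherwise the search may not terminate.
    """
    count = 0
    stack = [(start,)]
    while stack:
        form = stack.pop()
        n_terminals = sum(1 for s in form if s not in productions)
        if n_terminals > len(target):
            continue  # already longer than the target; abandon this branch
        idx = next((i for i, s in enumerate(form) if s in productions), None)
        if idx is None:
            # No non-terminal left: the form is a sentence.
            if "".join(form) == target:
                count += 1
            continue
        for rhs in productions[form[idx]]:
            stack.append(form[:idx] + tuple(rhs) + form[idx + 1:])
    return count


AMBIGUOUS = {"S": [("a", "S"), ("S", "a"), ()]}  # S -> aS | Sa | ε
UNAMBIGUOUS = {"S": [("a", "S"), ()]}            # S -> aS | ε

if __name__ == "__main__":
    print(count_leftmost_derivations(AMBIGUOUS, "S", "aa"))
    print(count_leftmost_derivations(UNAMBIGUOUS, "S", "aa"))
```

Under the first grammar the sentence aa has four leftmost derivations, because each of the two a's can be produced by either S → aS or S → Sa; under the second grammar it has exactly one.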