
Grammars

Definitions
Grammars
Backus-Naur Form
Derivation
  terminology
  trees
Grammars and ambiguity
Simple example
Grammar hierarchies
Syntax graphs
Recursive descent parsing

(2.1)

Definitions

(2.2)

Syntax
  the form or structure of the expressions, statements, and program units
Semantics
  the meaning of the expressions, statements, and program units
Sentence
  a string of characters over some alphabet
Language
  a set of sentences
Lexeme
  the lowest-level syntactic unit of a language
  e.g., :=, {, while
Token
  a category of lexemes (e.g., identifier)
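
The lexeme/token distinction can be sketched in C as a function that maps a lexeme (a string) to its token category. This is a minimal sketch: the `TokenKind` names, the `classify` function, and the small keyword, operator, and punctuation tables are assumptions made for illustration, not part of any particular language definition.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token categories for illustration. */
typedef enum {
    TOK_KEYWORD, TOK_IDENTIFIER, TOK_NUMBER,
    TOK_OPERATOR, TOK_PUNCTUATION, TOK_UNKNOWN
} TokenKind;

/* Returns 1 if lexeme equals one of the n strings in list. */
static int in_list(const char *lexeme, const char *const *list, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        if (strcmp(lexeme, list[i]) == 0)
            return 1;
    return 0;
}

/* Returns 1 if every character of s satisfies the ctype-style predicate. */
static int all_chars(const char *s, int (*pred)(int))
{
    for (; *s != '\0'; s++)
        if (!pred((unsigned char)*s))
            return 0;
    return 1;
}

/* Maps one lexeme to the token (category) it belongs to. */
TokenKind classify(const char *lexeme)
{
    /* Assumed example tables; a real language defines its own. */
    static const char *const keywords[] = { "if", "then", "else", "while" };
    static const char *const operators[] = { ":=", "=", "+", "-", "*", "/" };
    static const char *const punctuation[] = { "{", "}", "(", ")", ";", "," };

    assert(lexeme != NULL);
    if (in_list(lexeme, keywords, sizeof keywords / sizeof keywords[0]))
        return TOK_KEYWORD;
    if (in_list(lexeme, operators, sizeof operators / sizeof operators[0]))
        return TOK_OPERATOR;
    if (in_list(lexeme, punctuation, sizeof punctuation / sizeof punctuation[0]))
        return TOK_PUNCTUATION;
    if (isalpha((unsigned char)lexeme[0]) && all_chars(lexeme, isalnum))
        return TOK_IDENTIFIER;
    if (isdigit((unsigned char)lexeme[0]) && all_chars(lexeme, isdigit))
        return TOK_NUMBER;
    return TOK_UNKNOWN;
}
```

Many different lexemes (count, total, x1) fall into the single token category identifier, which is why the parser works with tokens rather than lexemes.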

Grammars

(2.3)

Can serve as generators or recognizers
  recognizers are used in compilers
  we'll study grammars as generators
Contain 4 components
  terminal symbols
    atomic components of statements in the language
    appear in source programs
    identifiers, operators, punctuation, keywords
  nonterminal symbols
    intermediate elements in producing terminal symbols
    never appear in source programs
  start (or goal) symbol
    a special nonterminal which is the starting symbol for producing statements

Grammars (continued)

(2.4)

4 components (continued)
  productions
    rules for transforming nonterminal symbols into terminals or other nonterminals
    nonterminal ::= terminals and/or nonterminals
    each has a left-hand side (LHS) and a right-hand side (RHS)
    every nonterminal must appear on the LHS of at least one production
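
One way to picture productions is as plain data: an LHS nonterminal plus a list of RHS symbols. The sketch below is an illustration under assumptions (the `Production` struct, `MAX_RHS`, and the two tiny example grammars are hypothetical). It checks the last rule above for every nonterminal that is used on some RHS.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_RHS 6   /* assumed upper bound on RHS length for this sketch */

typedef struct {
    const char *lhs;            /* a single nonterminal, e.g. "<stmt>" */
    const char *rhs[MAX_RHS];   /* symbols of one alternative, NULL-terminated */
} Production;

/* By the BNF convention, nonterminals are written <name>. */
static int is_nonterminal(const char *sym)
{
    size_t len = strlen(sym);
    return len > 2 && sym[0] == '<' && sym[len - 1] == '>';
}

/* Returns 1 if nt is the LHS of at least one production in g. */
static int has_production(const Production *g, size_t n, const char *nt)
{
    size_t i;
    for (i = 0; i < n; i++)
        if (strcmp(g[i].lhs, nt) == 0)
            return 1;
    return 0;
}

/* Returns 1 if every nonterminal used on an RHS also appears as an LHS. */
int all_nonterminals_defined(const Production *g, size_t n)
{
    size_t i, j;
    assert(g != NULL);
    for (i = 0; i < n; i++)
        for (j = 0; j < MAX_RHS && g[i].rhs[j] != NULL; j++)
            if (is_nonterminal(g[i].rhs[j]) &&
                !has_production(g, n, g[i].rhs[j]))
                return 0;
    return 1;
}

/* <stmt> ::= <var> = <expr>   <var> ::= a   <expr> ::= <var> */
const Production complete_grammar[] = {
    { "<stmt>", { "<var>", "=", "<expr>", NULL } },
    { "<var>",  { "a", NULL } },
    { "<expr>", { "<var>", NULL } },
};

/* The same grammar with the <expr> production missing. */
const Production incomplete_grammar[] = {
    { "<stmt>", { "<var>", "=", "<expr>", NULL } },
    { "<var>",  { "a", NULL } },
};
```

In the incomplete grammar, <expr> can never be rewritten, so no derivation that reaches it can end in a sentence.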

(2.5)

Grammars (continued)
4 categories of grammars
  regular
    good for identifiers, parameter lists, subscripts
  context free
    LHS of production is a single nonterminal
  context sensitive
  recursively enumerable
Regular and context free grammars are enough for PLs

Backus-Naur Form (BNF)

(2.6)

Used to describe the syntax of PLs; first used for Algol-60
Nonterminals are enclosed in <...>
  <expression>, <identifier>
Alternatives indicated by |
  <digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Options (0 or 1 occurrences) indicated by [...]
  <stmt> ::= if <cond> then <stmt> [ else <stmt> ]
  note recursion
Repetition (0 or more occurrences) indicated by {...}
  <unsigned> ::= <digit> {<digit>}
Derivation
  repeated application of rules, starting with the start symbol and ending with a sentence

BNF (continued)

(2.7)

Example grammar and derivation

<program> -> <stmts>
<stmts>   -> <stmt> | <stmt> ; <stmts>
<stmt>    -> <var> = <expr>
<var>     -> a | b | c | d
<expr>    -> <term> + <term> | <term> - <term>
<term>    -> <var> | const

<program> => <stmts>
          => <stmt>
          => <var> = <expr>
          => a = <expr>
          => a = <term> + <term>
          => a = <var> + <term>
          => a = b + <term>
          => a = b + const

Derivation Terminology

(2.8)

Every string of symbols in the derivation is a sentential form
A sentence is a sentential form that has only terminal symbols
A leftmost derivation is one in which the leftmost nonterminal in each sentential form is the one that is expanded
  similarly for rightmost derivation
A derivation may be neither leftmost nor rightmost
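
A leftmost derivation can be replayed mechanically as string rewriting. The sketch below is an illustration, not a parser: `expand_leftmost` and `derive_example` are hypothetical helper names, and the list of steps simply hard-codes the production choices from the example derivation of a = b + const.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Replaces the leftmost nonterminal in form with rhs, provided that
   nonterminal is lhs. Returns 1 on success; returns 0 if there is no
   nonterminal, if the leftmost one differs from lhs, or if the result
   would not fit in cap bytes. */
int expand_leftmost(char *form, size_t cap, const char *lhs, const char *rhs)
{
    char *start;
    size_t lhs_len, rhs_len, tail_len;

    assert(form != NULL && lhs != NULL && rhs != NULL);
    start = strchr(form, '<');          /* leftmost nonterminal, if any */
    lhs_len = strlen(lhs);
    rhs_len = strlen(rhs);
    if (start == NULL || strncmp(start, lhs, lhs_len) != 0)
        return 0;
    tail_len = strlen(start + lhs_len);
    if ((size_t)(start - form) + rhs_len + tail_len + 1 > cap)
        return 0;
    /* Shift the tail (including the terminating NUL), then copy in rhs. */
    memmove(start + rhs_len, start + lhs_len, tail_len + 1);
    memcpy(start, rhs, rhs_len);
    return 1;
}

/* Replays the eight steps of the example derivation, printing each
   sentential form. Returns the final sentence, or NULL if a step fails. */
const char *derive_example(void)
{
    static const char *const steps[][2] = {
        { "<program>", "<stmts>" },
        { "<stmts>",   "<stmt>" },
        { "<stmt>",    "<var> = <expr>" },
        { "<var>",     "a" },
        { "<expr>",    "<term> + <term>" },
        { "<term>",    "<var>" },
        { "<var>",     "b" },
        { "<term>",    "const" },
    };
    static char form[64];
    size_t i;

    strcpy(form, "<program>");
    for (i = 0; i < sizeof steps / sizeof steps[0]; i++) {
        if (!expand_leftmost(form, sizeof form, steps[i][0], steps[i][1]))
            return NULL;
        printf("=> %s\n", form);
    }
    return form;
}
```

Every intermediate string printed is a sentential form; only the last one, which contains no nonterminals, is a sentence.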

(2.9)

Derivation Trees
A derivation tree is the tree resulting from applying productions to rewrite the start symbol
  a parse tree is the same tree, built starting from the terminals and working back to the start symbol

<program>
  <stmts>
    <stmt>
      <var>
        a
      =
      <expr>
        <term>
        +
        <term>

(2.10)

Grammars and Ambiguity

A grammar is ambiguous iff it generates a sentential form that has two or more distinct parse trees
An ambiguous expression grammar:
  <expr> -> <expr> <op> <expr> | const
  <op> -> / | -

Two parse trees for const <op> const <op> const:

<expr>
  <expr>
    <expr>
      const
    <op>
    <expr>
      const
  <op>
  <expr>
    const

<expr>
  <expr>
    const
  <op>
  <expr>
    <expr>
      const
    <op>
    <expr>
      const
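
Why the two trees matter can be seen by evaluating them. The sketch below builds both shapes by hand for the sentence 8 - 4 / 2 (the concrete values and the `Node` struct are assumptions chosen for illustration) and shows that they compute different results.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Node {
    char op;                    /* '-' or '/', or 0 for a const leaf */
    int value;                  /* used only by leaves */
    const struct Node *left;
    const struct Node *right;
} Node;

/* Evaluates a parse tree bottom-up. Assumes divisors are nonzero. */
int eval(const Node *n)
{
    assert(n != NULL);
    if (n->op == 0)
        return n->value;
    if (n->op == '-')
        return eval(n->left) - eval(n->right);
    return eval(n->left) / eval(n->right);
}

static const Node c8 = { 0, 8, NULL, NULL };
static const Node c4 = { 0, 4, NULL, NULL };
static const Node c2 = { 0, 2, NULL, NULL };

/* First tree shape: the left operand is itself <expr> <op> <expr>,
   so the sentence groups as (8 - 4) / 2. */
static const Node left_sub = { '-', 0, &c8, &c4 };
const Node tree_left = { '/', 0, &left_sub, &c2 };

/* Second tree shape: the right operand is <expr> <op> <expr>,
   so the sentence groups as 8 - (4 / 2). */
static const Node right_div = { '/', 0, &c4, &c2 };
const Node tree_right = { '-', 0, &c8, &right_div };
```

The first tree evaluates to 2 and the second to 6, so a compiler working from this grammar has no single correct meaning to generate code for.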

Grammars and Ambiguity (continued)

(2.11)

We must have unambiguous grammars so the compiler can produce correct code
  because the parse tree provides the precedence and associativity of operators
Left recursive grammars produce left associativity
Right recursive grammars produce right associativity
An unambiguous expression grammar:
  <expr> -> <expr> - <term> | <term>
  <term> -> <term> / const | const

Parse tree (partial) for const - const / const:

<expr>
  <expr>
    <term>
  -
  <term>
    <term>
    /
    const
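
The link between recursion direction and associativity can be sketched by folding a list of operands joined by "-". The function names and the sample values 8, 4, 2 are assumptions for illustration; the right-recursive rule in the second comment is a hypothetical variant, not a rule from the grammar above.

```c
#include <assert.h>
#include <stddef.h>

/* Left-recursive rule <expr> -> <expr> - <term> | <term>:
   the tree nests to the left, so the operands group as ((a - b) - c). */
int eval_left_assoc(const int *terms, int n)
{
    int i, result;
    assert(terms != NULL && n >= 1);
    result = terms[0];
    for (i = 1; i < n; i++)
        result -= terms[i];
    return result;
}

/* Hypothetical right-recursive rule <expr> -> <term> - <expr> | <term>:
   the tree nests to the right, so the operands group as (a - (b - c)). */
int eval_right_assoc(const int *terms, int n)
{
    assert(terms != NULL && n >= 1);
    if (n == 1)
        return terms[0];
    return terms[0] - eval_right_assoc(terms + 1, n - 1);
}

/* Operands of the sentence 8 - 4 - 2. */
const int sample_terms[3] = { 8, 4, 2 };
```

For 8 - 4 - 2 the left-associative reading gives 2, while the right-associative reading gives 6; the usual meaning of subtraction is the left-associative one.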

Grammars and Ambiguity (continued)

(2.12)

One famous ambiguity is the dangling else
  <stmt> ::= if <cond> then <stmt> [else <stmt>]
This can derive
  if X > 9
    then if B = 4
      then X := 5
      else X := 0
  the grammar allows the else to pair with either then

Grammars and Ambiguity (continued)

(2.13)

Can solve syntactically by adding nonterminals & productions
  <stmt> ::= <matched> | <unmatched>
  <matched> ::= if <cond> then <matched> else <matched>
              | any non-if statement
  <unmatched> ::= if <cond> then <stmt>
                | if <cond> then <matched> else <unmatched>
Can also solve semantically
  elses are associated with the immediately preceding unmatched then
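
The "immediately preceding unmatched then" rule falls out naturally when a parser consumes an else as soon as it sees one. The sketch below uses an assumed one-character encoding ('i' for "if <cond> then", 'e' for "else", 's' for any simple statement); `Parser`, `parse_stmt`, and `else_owner` are hypothetical names.

```c
#include <assert.h>
#include <string.h>

#define MAX_TOKENS 32   /* assumed input limit for this sketch */

typedef struct {
    const char *src;         /* 'i' = if <cond> then, 'e' = else, 's' = simple stmt */
    int pos;                 /* index of the next unread token */
    int owner[MAX_TOKENS];   /* owner[k] = index of the 'i' that the else at k joins */
} Parser;

/* <stmt> ::= s | i <stmt> [ e <stmt> ]   Returns 1 on success, 0 on error. */
static int parse_stmt(Parser *p)
{
    if (p->src[p->pos] == 's') {
        p->pos++;
        return 1;
    }
    if (p->src[p->pos] == 'i') {
        int if_index = p->pos++;
        if (!parse_stmt(p))
            return 0;
        if (p->src[p->pos] == 'e') {
            /* The innermost if still being parsed sees the else first. */
            p->owner[p->pos++] = if_index;
            return parse_stmt(p);
        }
        return 1;
    }
    return 0;
}

/* Returns the index of the 'i' that owns the else at else_index,
   or -1 if the input does not parse or that position is not an else. */
int else_owner(const char *src, int else_index)
{
    Parser p;
    int i;
    size_t len = strlen(src);

    assert(len < MAX_TOKENS);
    assert(else_index >= 0 && (size_t)else_index < len);
    p.src = src;
    p.pos = 0;
    for (i = 0; i < MAX_TOKENS; i++)
        p.owner[i] = -1;
    if (!parse_stmt(&p) || (size_t)p.pos != len)
        return -1;
    return p.owner[else_index];
}
```

The statement if X > 9 then if B = 4 then X := 5 else X := 0 is encoded as "iises"; the else at index 3 is assigned to the inner if at index 1.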

Grammar Hierarchies

(2.14)

BNF (and equivalent notations such as syntax graphs) can describe context free grammars
  nonterminals appear alone on the LHS of productions
But there is a whole hierarchy of grammar types
  recursively enumerable
  context sensitive
  context free
  regular
Context free grammars can describe the essential features of all current PLs
Regular grammars are good for identifiers, parameter lists, etc.

Simple Grammar Example

(2.15)

Consider the following unambiguous grammar for expressions
  <expr> ::= [<expr> <addop>] <term>
  <term> ::= [<term> <mulop>] <factor>
  <factor> ::= (<expr>) | <digit>
  <addop> ::= + | -
  <mulop> ::= * | /
  <digit> ::= 0 | ... | 9
This grammar is left recursive and generates expressions that are left associative
Changing the <factor> production produces right associative exponentiation
  <factor> ::= <expon> [ ** <factor> ]
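
The right-recursive exponentiation rule can be sketched as a function that calls itself for the operand after "**". Since <expon> is not defined here, the sketch assumes it is a single digit; `int_pow`, `parse_factor`, and `eval_power` are hypothetical names.

```c
#include <assert.h>
#include <ctype.h>

/* Integer power by repeated multiplication (assumes exp >= 0). */
static long int_pow(long base, long exp)
{
    long result = 1;
    while (exp-- > 0)
        result *= base;
    return result;
}

/* <factor> ::= <expon> [ ** <factor> ]
   Assumption for this sketch: <expon> is a single digit. */
static long parse_factor(const char **p)
{
    long base;

    assert(isdigit((unsigned char)**p));
    base = *(*p)++ - '0';                 /* consume the digit */
    if ((*p)[0] == '*' && (*p)[1] == '*') {
        *p += 2;                          /* consume "**" */
        /* Right recursion: everything to the right is parsed first. */
        return int_pow(base, parse_factor(p));
    }
    return base;
}

/* Evaluates a string such as "2**3**2". */
long eval_power(const char *text)
{
    const char *p = text;
    return parse_factor(&p);
}
```

Because the recursion is on the right, 2**3**2 is read as 2**(3**2) = 512; a left-associative reading would give (2**3)**2 = 64.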

(2.16)

Syntax Graphs
Are equivalent to CFGs
  put the terminals in circles or ellipses and put the nonterminals in rectangles
  connect them with lines with arrowheads
Lines and arrows indicate how constructs are built

[Figure: syntax graph built from type_identifier, "(", identifier, ",", constant, "..", constant]

Recursive Descent Parsing


Parsing is the process of tracing or constructing a parse tree for a given input string
Parsers usually do not analyze lexemes
  done by a lexical analyzer, which is called by the parser

(2.17)

Recursive Descent Parsing (continued)

(2.18)

A recursive descent parser traces out a parse tree in top-down order
  it is a top-down parser
Each nonterminal in the grammar has a subprogram associated with it
  the subprogram parses all sentential forms that the nonterminal can generate
The recursive descent parsing subprograms are built directly from the grammar rules
Recursive descent parsers, like other top-down parsers, cannot be built from left-recursive grammars

Recursive Descent Parsing (continued)

(2.19)

Example
For the grammar:
<term> -> <factor> {(* | /) <factor>}
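/* Assumed to be defined elsewhere: the current token code, the codes for
   the * and / tokens, the lexical analyzer, and the parser for <factor>. */
extern int next_token;
extern const int ast_code, slash_code;
void lexical(void);
void factor(void);

/* Parses <term> -> <factor> {(* | /) <factor>} */
void term(void)
{
    factor();                  /* parse the first factor */
    while (next_token == ast_code || next_token == slash_code) {
        lexical();             /* get the next token */
        factor();              /* parse the next factor */
    }
}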
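
The same pattern extends to a whole expression grammar. The sketch below is a self-contained variant under assumptions: each character is treated as one token (so no separate lexical analyzer is needed), the left-recursive rules of the simple expression grammar are rewritten with {...} repetition so recursive descent applies, and each subprogram returns the value of what it parsed. The `cursor` variable and `evaluate` function are hypothetical names, and syntax errors are reported crudely with assert.

```c
#include <assert.h>
#include <ctype.h>

static const char *cursor;   /* next unread character; one character = one token */

static int expr(void);

/* <factor> -> ( <expr> ) | <digit> */
static int factor(void)
{
    int value;

    if (*cursor == '(') {
        cursor++;                         /* consume '(' */
        value = expr();
        assert(*cursor == ')');
        cursor++;                         /* consume ')' */
        return value;
    }
    assert(isdigit((unsigned char)*cursor));
    return *cursor++ - '0';
}

/* <term> -> <factor> {(* | /) <factor>}   (assumes nonzero divisors) */
static int term(void)
{
    int value = factor();                 /* parse the first factor */

    while (*cursor == '*' || *cursor == '/') {
        char op = *cursor++;              /* get the operator token */
        int rhs = factor();               /* parse the next factor */
        value = (op == '*') ? value * rhs : value / rhs;
    }
    return value;
}

/* <expr> -> <term> {(+ | -) <term>} */
static int expr(void)
{
    int value = term();

    while (*cursor == '+' || *cursor == '-') {
        char op = *cursor++;
        int rhs = term();
        value = (op == '+') ? value + rhs : value - rhs;
    }
    return value;
}

/* Parses and evaluates a complete expression with no spaces. */
int evaluate(const char *text)
{
    int value;

    cursor = text;
    value = expr();
    assert(*cursor == '\0');              /* all input must be consumed */
    return value;
}
```

The while loops accumulate from left to right, so the rewritten grammar keeps the left associativity of the original left-recursive rules, and calling term from expr gives * and / higher precedence than + and -.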