
Specification of Tokens

• Regular expressions are an important notation for specifying patterns.
• Operations on languages
• Regular expressions
• Regular definitions

Winter 2007 SEG2101 Chapter 8 1


Operations on Languages



Regular Expressions
• A regular expression is a compact notation for describing a set of strings.
• In Pascal, an identifier is a letter followed by zero or more letters or digits:
  letter(letter|digit)*
  - |: or
  - *: zero or more instances of
• Writing a for letter and d for digit: a(a|d)*
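As an illustration, the identifier pattern letter(letter|digit)* can be tried out with Python's `re` module. This is a minimal sketch: the function name `is_identifier` is hypothetical, and it assumes that "letter" means an ASCII letter and "digit" an ASCII digit.

```python
import re

# letter(letter|digit)* written as a Python regular expression.
# Assumption: letter = [A-Za-z], digit = [0-9].
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9]*")


def is_identifier(text: str) -> bool:
    """Return True when the whole string matches letter(letter|digit)*."""
    return IDENTIFIER.fullmatch(text) is not None


print(is_identifier("count1"))  # True
print(is_identifier("1count"))  # False
```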
Rules
• ε is a regular expression that denotes {ε}, the set containing the empty string.
• If a is a symbol in Σ, then a is a regular expression that denotes {a}, the set containing the string a.
• Suppose r and s are regular expressions denoting the languages L(r) and L(s). Then:
  - (r)|(s) is a regular expression denoting L(r) ∪ L(s).
  - (r)(s) is a regular expression denoting L(r)L(s).
  - (r)* is a regular expression denoting (L(r))*.
  - (r) is a regular expression denoting L(r).

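The rules can be read as a recursive definition of L(r). The sketch below follows them case by case and enumerates the strings of L(r) up to a length bound, since L(r) is infinite as soon as a star is involved. The tuple encoding of expressions and the name `language` are assumptions made for this illustration only.

```python
def language(r, max_len):
    """Return the strings of length <= max_len in L(r).

    Expressions are nested tuples (a hypothetical encoding):
    ("eps",), ("sym", a), ("alt", r, s), ("cat", r, s), ("star", r).
    """
    kind = r[0]
    if kind == "eps":                      # epsilon denotes {epsilon}
        return {""}
    if kind == "sym":                      # a denotes {a}
        return {r[1]} if max_len >= 1 else set()
    if kind == "alt":                      # (r)|(s) denotes L(r) U L(s)
        return language(r[1], max_len) | language(r[2], max_len)
    if kind == "cat":                      # (r)(s) denotes L(r)L(s)
        left = language(r[1], max_len)
        right = language(r[2], max_len)
        return {x + y for x in left for y in right if len(x + y) <= max_len}
    if kind == "star":                     # (r)* denotes (L(r))*
        base = language(r[1], max_len)
        result, frontier = {""}, {""}
        while frontier:                    # append one more factor per round
            frontier = {x + y for x in frontier for y in base
                        if len(x + y) <= max_len} - result
            result |= frontier
        return result
    raise ValueError(f"unknown expression kind: {kind}")


# a(a|d)* restricted to strings of length at most 2
ident = ("cat", ("sym", "a"), ("star", ("alt", ("sym", "a"), ("sym", "d"))))
print(sorted(language(ident, 2)))  # ['a', 'aa', 'ad']
```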


Precedence Conventions
• The unary operator * has the highest precedence and is left associative.
• Concatenation has the second highest precedence and is left associative.
• | has the lowest precedence and is left associative.
• (a)|((b)*(c)) ≡ a|b*c

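Python's `re` module uses the same precedence order for *, concatenation, and |, so the equivalence can be checked by brute force on short strings. This is only a sketch: the helper `matches` and the bound of 4 characters are choices made for this illustration, and agreement on short strings is a sanity check, not a proof.

```python
import re
from itertools import product

explicit = re.compile(r"(a)|((b)*(c))")   # fully parenthesized
implicit = re.compile(r"a|b*c")           # relies on precedence conventions
regrouped = re.compile(r"(a|b)*c")        # a different grouping, for contrast


def matches(pattern, max_len=4, alphabet="abc"):
    """Collect every string over the alphabet, up to max_len, that fully matches."""
    found = set()
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            text = "".join(letters)
            if pattern.fullmatch(text):
                found.add(text)
    return found


print(matches(explicit) == matches(implicit))   # True
print(matches(regrouped) == matches(implicit))  # False
```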


Examples of Regular Expressions



Properties of Regular Expressions



Regular Definitions
• If Σ is an alphabet of basic symbols, then a regular definition is a sequence of definitions of the form:
  d₁ → r₁
  d₂ → r₂
  ...
  dₙ → rₙ
• where each dᵢ is a distinct name, and each rᵢ is a regular expression over the symbols in Σ ∪ {d₁, d₂, …, dᵢ₋₁}, i.e., the basic symbols and the previously defined names.
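Because each definition may use only the basic symbols and earlier names, a regular definition can be turned into a single regular expression by substituting names in order. The sketch below illustrates this under two assumptions: names are referenced as `{name}` (a notation borrowed from lex-style tools, not from these slides), and the function name `expand` is hypothetical.

```python
import re

# An ordered regular definition: each body may refer only to earlier names.
definitions = [
    ("letter", r"[A-Za-z]"),
    ("digit", r"[0-9]"),
    ("id", r"{letter}({letter}|{digit})*"),
]


def expand(defs):
    """Substitute earlier names into later bodies; return name -> plain regex."""
    expanded = {}
    for name, body in defs:
        for prev, regex in expanded.items():
            body = body.replace("{" + prev + "}", "(?:" + regex + ")")
        # Any {name} left over refers to something not defined earlier.
        leftover = re.search(r"\{[A-Za-z_]\w*\}", body)
        if leftover:
            raise ValueError(f"{name} uses undefined name {leftover.group()}")
        expanded[name] = body
    return expanded


patterns = expand(definitions)
print(re.fullmatch(patterns["id"], "x9") is not None)  # True
print(re.fullmatch(patterns["id"], "9x") is not None)  # False
```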
Examples of Regular Definitions

Example 3.5. Unsigned numbers



Recognition of Tokens
• A grammar for branching statements
  stmt → if expr then stmt
       | if expr then stmt else stmt
       | ε
  expr → term relop term
       | term
  term → id
       | number
Example
• Patterns for tokens in the grammar
  digit  → [0-9]
  digits → digit+
  number → digits (. digits)? (E [+-]? digits)?
  letter → [A-Za-z]
  id     → letter (letter | digit)*
  if     → if
  then   → then
  else   → else
  relop  → < | > | <= | >= | = | <>
  ws     → (blank | tab | newline)+
Tokens, their patterns, and attribute values

  Lexemes      Token name   Attribute value
  Any ws       -            -
  if           if           -
  then         then         -
  else         else         -
  Any id       id           Pointer to table entry
  Any number   number       Pointer to table entry
  <            relop        LT
  <=           relop        LE
  =            relop        EQ
  <>           relop        NE
  >            relop        GT
  >=           relop        GE
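The token table can be exercised with a small regex-driven scanner. This is a sketch, not the lexer the course builds: the names `tokenize`, `PATTERNS`, and `RELOP_ATTR` are hypothetical, "letter" is assumed to be an ASCII letter, and the "pointer to table entry" is modelled as an index into a Python list.

```python
import re

# Patterns in priority order; keywords are recognized as ids, then reclassified.
PATTERNS = [
    ("ws", r"[ \t\n]+"),
    ("number", r"[0-9]+(?:\.[0-9]+)?(?:E[+-]?[0-9]+)?"),
    ("id", r"[A-Za-z][A-Za-z0-9]*"),
    ("relop", r"<=|>=|<>|<|>|="),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in PATTERNS))
KEYWORDS = {"if", "then", "else"}
RELOP_ATTR = {"<": "LT", "<=": "LE", "=": "EQ", "<>": "NE", ">": "GT", ">=": "GE"}


def tokenize(source):
    """Return (tokens, table); each token is a (token name, attribute) pair."""
    table = []    # stand-in symbol table; an index plays the role of a pointer
    tokens = []
    pos = 0
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise ValueError(f"unexpected character {source[pos]!r} at {pos}")
        kind, lexeme = m.lastgroup, m.group()
        pos = m.end()
        if kind == "ws":
            continue                            # whitespace yields no token
        if kind == "relop":
            tokens.append(("relop", RELOP_ATTR[lexeme]))
        elif kind == "id" and lexeme in KEYWORDS:
            tokens.append((lexeme, None))       # keywords carry no attribute
        else:
            if lexeme not in table:
                table.append(lexeme)
            tokens.append((kind, table.index(lexeme)))
    return tokens, table


tokens, table = tokenize("if x1 <= 3.14E-5 then y")
print(tokens)
print(table)
```

The alternatives are tried in the order listed, and `<=` is placed before `<` inside the relop pattern so that the longer lexeme wins.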
Example
C=a+b*5
  C   <id, pointer to symbol table entry>
  =   <relop, EQ>
  a   <id, pointer to symbol table entry>
  +   <add_op, ->
  b   <id, pointer to symbol table entry>
  *   <multi_op, ->
  5   <num, pointer to symbol table entry>
Transition Diagrams
• Nodes: states, conditions that could occur during the process of scanning the input looking for a lexeme that matches one of several patterns
• Edges: directed from state to state
  - Labeled by a symbol or set of symbols
• Deterministic: there’s never more than one edge out of a given state with a given symbol among its labels
• Certain states are accepting or final: a lexeme has been found
  - Double circle
  - If it’s necessary to retract the forward pointer, we shall additionally place a * near that accepting state
• Start state, or initial state, is indicated by an edge, labeled “start”, entering from nowhere
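Example (part of the relop diagram; start state 0, * marks an accepting state that retracts the forward pointer):
  State  Input  Next  Action
  0      <      1
  1      =      2     return(relop, LE)
  1      >      3     return(relop, NE)
  1      other  4*    return(relop, LT)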
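Example Transition Diagram for relop
Start state 0; * marks an accepting state that retracts the forward pointer.
  State  Input  Next  Action
  0      <      1
  0      =      5     return(relop, EQ)
  0      >      6
  1      =      2     return(relop, LE)
  1      >      3     return(relop, NE)
  1      other  4*    return(relop, LT)
  6      =      7     return(relop, GE)
  6      other  8*    return(relop, GT)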
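A transition diagram like the one for relop translates almost mechanically into code: one branch per state, and a retracting accepting state simply does not consume the character that led into it. The function below is a sketch of that idea; the name `get_relop` and the convention of returning the new input position are assumptions made for this illustration.

```python
def get_relop(text, pos=0):
    """Simulate the relop diagram starting at text[pos].

    Returns ((token, attribute), next_pos), or None if no relop starts here.
    """
    state = 0
    while True:
        c = text[pos] if pos < len(text) else ""   # "" stands for end of input
        if state == 0:
            if c == "<":
                state = 1
            elif c == "=":
                return ("relop", "EQ"), pos + 1    # state 5
            elif c == ">":
                state = 6
            else:
                return None                        # not a relational operator
        elif state == 1:
            if c == "=":
                return ("relop", "LE"), pos + 1    # state 2
            if c == ">":
                return ("relop", "NE"), pos + 1    # state 3
            return ("relop", "LT"), pos            # state 4*: retract, c not consumed
        elif state == 6:
            if c == "=":
                return ("relop", "GE"), pos + 1    # state 7
            return ("relop", "GT"), pos            # state 8*: retract, c not consumed
        pos += 1                                   # consume c and continue


print(get_relop("<=x"))  # (('relop', 'LE'), 2)
print(get_relop("<x"))   # (('relop', 'LT'), 1)
```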
Recognition of Reserved Words and Identifiers
• Problem: keywords look like identifiers
• Two possible solutions:
  - Install the reserved words in the symbol table initially
  - Create separate transition diagrams for each keyword
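Examples for Identifiers and Keywords
Start state 9; * marks an accepting state that retracts the forward pointer.
  State  Input            Next  Action
  9      letter           10
  10     letter or digit  10
  10     other            11*   return(getToken(), installID())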
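The first approach, preloading the reserved words into the symbol table, can be sketched as follows. The Python names `install_id`, `get_token`, and `scan_id` mirror installID() and getToken() from the diagram but are otherwise hypothetical, as is the choice of a dictionary keyed by lexeme as the symbol table; "letter" is assumed to be an ASCII letter.

```python
# Symbol table preloaded with the reserved words: lexeme -> token name.
symbol_table = {"if": "if", "then": "then", "else": "else"}


def install_id(lexeme):
    """Enter the lexeme as an id unless it is already present; return its key."""
    symbol_table.setdefault(lexeme, "id")
    return lexeme          # the key stands in for a pointer to the table entry


def get_token(lexeme):
    """Return the token name recorded for the lexeme (keyword name or id)."""
    return symbol_table[lexeme]


def scan_id(text, pos=0):
    """Simulate states 9, 10, 11 of the identifier diagram from text[pos]."""
    def is_letter(c):
        return c.isascii() and c.isalpha()

    def is_digit(c):
        return c.isascii() and c.isdigit()

    if pos >= len(text) or not is_letter(text[pos]):
        return None                               # state 9: no leading letter
    end = pos + 1                                 # state 10
    while end < len(text) and (is_letter(text[end]) or is_digit(text[end])):
        end += 1
    lexeme = text[pos:end]                        # state 11*: "other" is not consumed
    entry = install_id(lexeme)
    return (get_token(lexeme), entry), end


print(scan_id("then x"))  # (('then', 'then'), 4)
print(scan_id("x1<"))     # (('id', 'x1'), 2)
```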
Completion of the Running Example: Unsigned Numbers

Example lexemes: 3.14E-5, 3.14, 314
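A transition diagram for unsigned numbers can also be driven from a table instead of hand-written branches. The sketch below uses the state numbers 12 to 21 that appear in the combined diagram of these slides; the names `TRANSITIONS`, `char_class`, and `scan_number` are hypothetical. Like the diagram, it simply fails on input such as `3.` or `3E`, rather than falling back to a shorter lexeme.

```python
# (state, character class) -> next state
TRANSITIONS = {
    (12, "digit"): 13,
    (13, "digit"): 13, (13, "."): 14, (13, "E"): 16,
    (14, "digit"): 15,
    (15, "digit"): 15, (15, "E"): 16,
    (16, "sign"): 17, (16, "digit"): 18,
    (17, "digit"): 18,
    (18, "digit"): 18,
}
# States with an "other" edge into a retracting accepting state:
# 13 -> 20*, 15 -> 21*, 18 -> 19*
ACCEPT_ON_OTHER = {13, 15, 18}


def char_class(c):
    """Classify one character ("" means end of input)."""
    if c.isascii() and c.isdigit():
        return "digit"
    if c in (".", "E"):
        return c
    if c in ("+", "-"):
        return "sign"
    return "other"


def scan_number(text, pos=0):
    """Return (lexeme, next_pos) for an unsigned number at text[pos], else None."""
    state, start = 12, pos
    while True:
        c = text[pos] if pos < len(text) else ""
        nxt = TRANSITIONS.get((state, char_class(c)))
        if nxt is None:                      # no labeled edge: treat c as "other"
            if state in ACCEPT_ON_OTHER:
                return text[start:pos], pos  # retract: c is not consumed
            return None
        state, pos = nxt, pos + 1


for sample in ("3.14E-5;", "3.14 ", "314"):
    print(scan_number(sample))
```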
Transition Diagram for Whitespace
Start state 22; * marks an accepting state that retracts the forward pointer.
  State  Input  Next
  22     delim  23
  23     delim  23
  23     other  24*

delim → blank | tab | newline


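Transition Diagram
Combined diagram for relop (states 0 to 8), identifiers (states 9 to 11), unsigned numbers (states 12 to 21), and whitespace (states 22 to 24). The unsigned-number part, with start state 12 (* marks an accepting state that retracts the forward pointer):
  State  Input   Next
  12     digit   13
  13     digit   13
  13     .       14
  13     E       16
  13     other   20*
  14     digit   15
  15     digit   15
  15     E       16
  15     other   21*
  16     + or -  17
  16     digit   18
  17     digit   18
  18     digit   18
  18     other   19*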
C code to find next start state
C Code for Lexical analyzers
Finite Automata
• Finite automata are recognizers
  - They simply say “yes” or “no” about each input string
• Two kinds:
  - Nondeterministic finite automata (NFA)
    • No restrictions on the labels of the edges
  - Deterministic finite automata (DFA)
    • For each state, and for each symbol, there’s exactly one edge with that symbol leaving that state
Nondeterministic Finite Automata
• An NFA consists of:
  - A finite set of states S
  - A set of input symbols Σ, the input alphabet
  - A transition function that gives, for each state and for each symbol in Σ ∪ {ε}, a set of next states
  - A state s0 from S (the start state or initial state)
  - A set of states F, a subset of S (the accepting states, or final states)
• An NFA can be represented by a transition graph
• There’s an edge labeled a from state s to state t iff t is one of the next states for state s and input a
• It’s similar to a transition diagram, except:
  - The same symbol can label edges from one state to several different states
  - An edge may be labeled by ε, in addition to symbols from the input alphabet
An Example NFA: (a|b)*abb

Transition Graph (start state 0, accepting state 3)
  0 --a--> 0
  0 --b--> 0
  0 --a--> 1
  1 --b--> 2
  2 --b--> 3

Transition Table
  State  a       b    ε
  0      {0, 1}  {0}  ∅
  1      ∅       {2}  ∅
  2      ∅       {3}  ∅
  3      ∅       ∅    ∅
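An NFA can be simulated by tracking the set of states it could be in after each input symbol. The sketch below encodes the transition table for (a|b)*abb as a dictionary; the names `NFA` and `nfa_accepts` are hypothetical. It relies on this particular NFA having no ε edges, so no ε-closure step is needed here.

```python
# Transition table for (a|b)*abb: state -> symbol -> set of next states.
# Missing entries stand for the empty set.
NFA = {
    0: {"a": {0, 1}, "b": {0}},
    1: {"b": {2}},
    2: {"b": {3}},
    3: {},
}
START = 0
ACCEPTING = {3}


def nfa_accepts(text):
    """Return True when some path labeled by text ends in an accepting state."""
    current = {START}
    for c in text:
        current = {t for s in current for t in NFA[s].get(c, set())}
    return bool(current & ACCEPTING)


print(nfa_accepts("aabb"))  # True
print(nfa_accepts("abba"))  # False
```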
Example NFA: aa*|bb*
Start state 0; accepting states 3 and 4.
  0 --ε--> 1 --a--> 3,  3 --a--> 3
  0 --ε--> 2 --b--> 4,  4 --b--> 4
Deterministic Finite Automata
• A DFA is a special case of an NFA where:
  - There are no moves on input ε
  - For each state s and input symbol a, there’s exactly one edge out of s labeled a
• Every regular expression and every NFA can be converted to a DFA accepting the same language
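One standard way to carry out the NFA-to-DFA conversion is the subset construction, in which each DFA state is a set of NFA states. The slides do not spell the procedure out at this point, so the following is a sketch under assumed conventions: `moves` maps (state, symbol) to a set of states, `eps` maps a state to its ε-successors, and the function names are hypothetical.

```python
def epsilon_closure(states, eps):
    """All NFA states reachable from `states` using only epsilon edges."""
    stack, closure = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in eps.get(s, ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)


def subset_construction(start, accepting, moves, eps, alphabet):
    """Return (dfa_start, dfa_accepting, dfa_table) for the given NFA."""
    d_start = epsilon_closure({start}, eps)
    table, worklist = {}, [d_start]
    while worklist:
        current = worklist.pop()
        if current in table:
            continue
        table[current] = {}
        for a in alphabet:
            target = set()
            for s in current:
                target |= moves.get((s, a), set())
            nxt = epsilon_closure(target, eps)
            table[current][a] = nxt        # may be the empty (dead) state
            if nxt not in table:
                worklist.append(nxt)
    d_accepting = {d for d in table if d & accepting}
    return d_start, d_accepting, table


# NFA for (a|b)*abb, which has no epsilon edges
moves = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}, (2, "b"): {3}}
d_start, d_accepting, table = subset_construction(0, {3}, moves, {}, "ab")
print(len(table))                          # 4
print(sorted(map(sorted, d_accepting)))    # [[0, 3]]
```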
Example DFA accepting (a|b)*abb
Start state 0; accepting state 3.
  State  a  b
  0      1  0
  1      1  2
  2      1  3
  3      1  0
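Simulating a DFA needs only one current state, because each state has exactly one edge per symbol. The sketch below encodes the DFA for (a|b)*abb as a nested dictionary; the names `DFA` and `dfa_accepts` are hypothetical, and characters outside {a, b} are rejected outright as a design choice of this example.

```python
# DFA for (a|b)*abb: state -> symbol -> next state.
DFA = {
    0: {"a": 1, "b": 0},
    1: {"a": 1, "b": 2},
    2: {"a": 1, "b": 3},
    3: {"a": 1, "b": 0},
}
START = 0
ACCEPTING = {3}


def dfa_accepts(text):
    """Follow exactly one edge per input symbol; accept if we end in state 3."""
    state = START
    for c in text:
        if c not in DFA[state]:
            return False           # symbol outside the alphabet {a, b}
        state = DFA[state][c]
    return state in ACCEPTING


print(dfa_accepts("ababb"))  # True
print(dfa_accepts("abab"))   # False
```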
Construction of an NFA from a Regular Expression
(Thompson’s algorithm)
• Basis:
  - For expression ε, construct the NFA
    start → i --ε--> f
  - For subexpression a in Σ, construct the NFA
    start → i --a--> f
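NFA for the concatenation of two regular expressions, N(s)N(t)
  start → i [N(s)] [N(t)] f
  The accepting state of N(s) is merged with the start state of N(t).
Example: abb
  0 --a--> 1 --b--> 2 --b--> 3
  Start state 0; accepting state 3.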
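NFA for the union of two regular expressions, r = s|t
  New start state i and new accepting state f:
  i --ε--> start of N(s),  accepting state of N(s) --ε--> f
  i --ε--> start of N(t),  accepting state of N(t) --ε--> f
Example: a|b
  0 --ε--> 1 --a--> 2 --ε--> 5
  0 --ε--> 3 --b--> 4 --ε--> 5
  Start state 0; accepting state 5.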
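NFA for the closure of a regular expression, N(s)*
  New start state i and new accepting state f:
  i --ε--> start of N(s),  accepting state of N(s) --ε--> f
  i --ε--> f  (zero occurrences)
  accepting state of N(s) --ε--> start of N(s)  (repetition)
Example: (a|b)*
  0 --ε--> 1,  0 --ε--> 7
  1 --ε--> 2 --a--> 3 --ε--> 6
  1 --ε--> 4 --b--> 5 --ε--> 6
  6 --ε--> 1,  6 --ε--> 7
  Start state 0; accepting state 7.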
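The basis, concatenation, union, and closure steps can be combined into a small builder. This is a sketch under stated assumptions: the class name `Thompson` and its method names are hypothetical, an NFA fragment is represented as a (start, accepting) pair of state numbers, ε is encoded as the label `None`, and the state numbers it produces will not match those in the figures.

```python
from itertools import count


class Thompson:
    """Builds NFA fragments (start, accepting) over a shared edge list."""

    def __init__(self):
        self._ids = count()
        self.edges = []                    # (source, label, target); None = epsilon

    def _new(self):
        return next(self._ids)

    def symbol(self, a):
        """Basis: i --a--> f (pass a=None for the epsilon expression)."""
        i, f = self._new(), self._new()
        self.edges.append((i, a, f))
        return i, f

    def concat(self, s, t):
        """Merge the accepting state of N(s) with the start state of N(t)."""
        self.edges = [(s[1] if src == t[0] else src, lab,
                       s[1] if dst == t[0] else dst)
                      for src, lab, dst in self.edges]
        return s[0], t[1]

    def union(self, s, t):
        i, f = self._new(), self._new()
        self.edges += [(i, None, s[0]), (i, None, t[0]),
                       (s[1], None, f), (t[1], None, f)]
        return i, f

    def star(self, s):
        i, f = self._new(), self._new()
        self.edges += [(i, None, s[0]), (s[1], None, f),
                       (i, None, f), (s[1], None, s[0])]
        return i, f

    def _closure(self, states):
        stack, seen = list(states), set(states)
        while stack:
            s = stack.pop()
            for src, lab, dst in self.edges:
                if src == s and lab is None and dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
        return seen

    def accepts(self, nfa, text):
        """Simulate the NFA with epsilon-closures after every step."""
        start, final = nfa
        current = self._closure({start})
        for c in text:
            step = {dst for src, lab, dst in self.edges
                    if src in current and lab == c}
            current = self._closure(step)
        return final in current


b = Thompson()
star = b.star(b.union(b.symbol("a"), b.symbol("b")))            # (a|b)*
nfa = b.concat(b.concat(b.concat(star, b.symbol("a")),
                        b.symbol("b")), b.symbol("b"))          # (a|b)*abb
print(b.accepts(nfa, "aababb"))  # True
print(b.accepts(nfa, "abba"))    # False
```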


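NFA for (a|b)*abb#
  0 --ε--> 1,  0 --ε--> 7
  1 --ε--> 2 --a--> 3 --ε--> 6
  1 --ε--> 4 --b--> 5 --ε--> 6
  6 --ε--> 1,  6 --ε--> 7
  7 --a--> 8 --b--> 9 --b--> 10 --#--> 11
  Start state 0; accepting state 11.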
