expressions
SO FAR
TODAY
Formal definition of languages in terms of strings
LANGUAGES: alphabets, strings, and languages
LANGUAGE
How can we define what a (formal) language is?
LANGUAGE
We define a language to be a set of strings over an alphabet Σ
An alphabet is a set of symbols, e.g., { a, b, c, ..., z }
A string over an alphabet Σ is a sequence of symbols from the alphabet
EXERCISE, LANGUAGE
How can we define the alphabet and language for
2) written English?
STRINGS
For $a_1, a_2, \ldots, a_n \in \Sigma$, the sequence $a_1 a_2 \ldots a_n$ is a string over $\Sigma$
The empty string is written λ
CONCATENATION
For $u = a_1 a_2 \ldots a_n$ and $v = b_1 b_2 \ldots b_m$, what is the concatenation of u and v, written uv?
PREFIX AND SUFFIX
For a string w = u v
u is a prefix
v is a suffix
SUBSTRING
For a string w = u1 v u2
v is a substring
Prefixes and suffixes are special cases of substrings
What are the substrings of abbab?
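A minimal Python sketch of the prefix, suffix, and substring definitions, assuming strings are modeled as Python `str` values and λ as the empty string `""`. Each function enumerates every split w = uv (or w = u1 v u2) by slicing; the function names are illustrative, not from the slides.

```python
def prefixes(w: str) -> set[str]:
    # u is a prefix of w when w = u v for some v: cut w at every position 0..|w|
    return {w[:i] for i in range(len(w) + 1)}


def suffixes(w: str) -> set[str]:
    # v is a suffix of w when w = u v for some u
    return {w[i:] for i in range(len(w) + 1)}


def substrings(w: str) -> set[str]:
    # v is a substring of w when w = u1 v u2: pick a start i and an end j >= i
    return {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}


if __name__ == "__main__":
    print(sorted(substrings("abbab"), key=lambda s: (len(s), s)))
```

Since the sketch returns sets, repeated occurrences (ab appears twice in abbab) are counted once, and both λ and abbab itself are included.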
LENGTH
For a string $w = a_1 a_2 \ldots a_n$, the length is n, written $|w| = n$
|λ| = 0
|abbab| = 5
PROOF BY INDUCTION
Induction over natural numbers
To show that a property P holds for all natural numbers, ∀n ∈ ℕ . P(n), show that P(0) holds (base case) and that P(n) implies P(n + 1) for every n ∈ ℕ (inductive step)
EXERCISE, LENGTH OF CONCATENATION
What is |u v| ?
LENGTH OF CONCATENATION
Theorem: |u v| = |u| + |v|
Proof: By induction on the length of v.
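One way the induction could be written out, as a sketch. It assumes the recursive view of length $|\lambda| = 0$ and $|wa| = |w| + 1$ for $w \in \Sigma^*$, $a \in \Sigma$ (which follows from counting symbols), and it uses associativity of concatenation, $u(wa) = (uw)a$.

```latex
\begin{align*}
\text{Base case } (v = \lambda):\quad |u\lambda| &= |u| = |u| + 0 = |u| + |\lambda| \\
\text{Step } (v = wa,\ a \in \Sigma):\quad |u(wa)| &= |(uw)a| = |uw| + 1 \\
&= (|u| + |w|) + 1 \quad \text{(induction hypothesis for } w\text{)} \\
&= |u| + (|w| + 1) = |u| + |wa|
\end{align*}
```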
REVERSE
For a string $w = a_1 a_2 \ldots a_n$, what is the reverse $w^R$ of w?
What is a palindrome?
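A minimal Python sketch, assuming strings are `str` values and λ is `""`. The reverse follows one common recursive definition, $\lambda^R = \lambda$ and $(av)^R = v^R a$ for $a \in \Sigma$ (assumed here, since the slide only poses the question), and a palindrome is taken to be a string equal to its own reverse.

```python
def reverse(w: str) -> str:
    # λ^R = λ;  (a v)^R = v^R a, where a is the first symbol of w
    if w == "":
        return ""
    return reverse(w[1:]) + w[0]


def is_palindrome(w: str) -> bool:
    # a palindrome reads the same in both directions: w = w^R
    return w == reverse(w)


print(reverse("abbab"))        # babba
print(is_palindrome("abba"))   # True
```

In everyday Python the slice `w[::-1]` gives the same result; the recursive version is shown only because it mirrors the definition.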
REPETITION
Let $w^n$ be w repeated n times, $w w \ldots w$
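A short recursive sketch of repetition, assuming the definition $w^0 = \lambda$ and $w^{n+1} = w\,w^n$ (the same pattern these slides use for $L^n$ on languages). The name `power` is hypothetical.

```python
def power(w: str, n: int) -> str:
    # w^0 = λ (the empty string);  w^(n+1) = w w^n
    if n == 0:
        return ""
    return w + power(w, n - 1)


print(power("ab", 3))   # ababab
```

Python's `w * n` computes the same string; note that $w^0 = \lambda$ for every w.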
$\Sigma^N$, STRINGS OF LENGTH N
Let $\Sigma^n$ be the set of strings of length n over $\Sigma$
For $\Sigma = \{a, b\}$:
$\Sigma^0 = \{ \lambda \}$
$\Sigma^1 = \{ a, b \}$
$\Sigma^2 = \{ aa, ab, ba, bb \}$
Σ*, KLEENE CLOSURE
Σ* is the set of all strings over Σ
{a,b}* = { λ, a, b, aa, bb, ab, ba, aaa, bbb, ... }
Σ*, KLEENE CLOSURE
We have that $\Sigma^* = \Sigma^0 \cup \Sigma^1 \cup \cdots$, where
$\Sigma^0 = \{ \lambda \}$
$\Sigma^{n+1} = \{ xy : x \in \Sigma, y \in \Sigma^n \}$
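A Python sketch that builds $\Sigma^n$ directly from the recursive definition, assuming each symbol is a one-character `str` and λ is `""`. Because $\Sigma^*$ is infinite, the hypothetical helper `kleene_up_to` only builds the union up to a chosen length.

```python
from itertools import chain


def sigma_n(sigma: set[str], n: int) -> set[str]:
    # Σ^0 = {λ};  Σ^(n+1) = { x y : x ∈ Σ, y ∈ Σ^n }
    if n == 0:
        return {""}
    shorter = sigma_n(sigma, n - 1)
    return {x + y for x in sigma for y in shorter}


def kleene_up_to(sigma: set[str], max_len: int) -> set[str]:
    # Σ* = Σ^0 ∪ Σ^1 ∪ ... is infinite, so stop after Σ^max_len
    return set(chain.from_iterable(sigma_n(sigma, k) for k in range(max_len + 1)))


print(sorted(sigma_n({"a", "b"}, 2)))    # ['aa', 'ab', 'ba', 'bb']
print(len(kleene_up_to({"a", "b"}, 3)))  # 15
```

The one-character assumption matters: with multi-character symbols such as `"a"` and `"aa"`, different sequences could concatenate to the same Python string.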
$\Sigma^+$, POSITIVE CLOSURE
Let $\Sigma^+ = \Sigma^1 \cup \Sigma^2 \cup \cdots$
EXERCISE
For $\Sigma = \{a, b\}$, what is the cardinality of $\Sigma^3$?
In general, what is the cardinality of $\Sigma^n$?
EXERCISE
Prove that $|\Sigma^n| = |\Sigma|^n$
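A sketch of one possible proof, by induction on n using the recursive definition of $\Sigma^{n+1}$ and assuming $\Sigma$ is finite. The key step is that a string of length $n+1$ splits uniquely into its first symbol $x$ and the remaining string $y$, so the pairs $(x, y)$ and the strings $xy$ are in one-to-one correspondence.

```latex
\begin{align*}
|\Sigma^0| &= |\{\lambda\}| = 1 = |\Sigma|^0 \\
|\Sigma^{n+1}| &= |\{\, xy : x \in \Sigma,\ y \in \Sigma^n \,\}| \\
&= |\Sigma \times \Sigma^n| \quad \text{(the map } (x, y) \mapsto xy \text{ is one-to-one)} \\
&= |\Sigma| \cdot |\Sigma^n| = |\Sigma| \cdot |\Sigma|^n = |\Sigma|^{n+1}
\end{align*}
```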
LANGUAGES
LANGUAGE
A language L is a set of strings over an alphabet Σ
A language L is a subset of Σ*
For Σ = { a, b }, Σ* = { λ, a, b, aa, ab, ba, bb, aaa, aab, ... }
EXERCISE
What is P(Σ*), the power set of Σ*?
SET OPERATIONS ON LANGUAGES
Since languages are sets, the standard set operations apply.
For L1 = {a, b, aaa} and L2 = {bb, ab},
what is
L1 ∪ L2
L1 ∩ L2
L1 ∖ L2
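Python's built-in `set` type supports these operations directly, so a finite language can be modeled as a set of `str` values (an assumption of this sketch):

```python
L1 = {"a", "b", "aaa"}
L2 = {"bb", "ab"}

union = L1 | L2          # L1 ∪ L2
intersection = L1 & L2   # L1 ∩ L2
difference = L1 - L2     # L1 ∖ L2

print(sorted(union), sorted(intersection), sorted(difference))
```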
REVERSAL AND CONCATENATION
Reversal and concatenation carry over from strings in the natural way
Reversal, $L^R = \{ w^R : w \in L \}$
Concatenation, $L_1 L_2 = \{ uv : u \in L_1, v \in L_2 \}$
$\{ ab, aab, baba \}^R$
$\{ a^n b^n : n \ge 0 \}$
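A Python sketch for finite languages, assuming languages are sets of `str` and the usual definitions $L^R = \{ w^R : w \in L \}$ and $L_1 L_2 = \{ uv : u \in L_1, v \in L_2 \}$. The infinite language $\{a^n b^n : n \ge 0\}$ is cut down to its members with $n \le 2$; the helper names are hypothetical.

```python
def reverse_lang(lang: set[str]) -> set[str]:
    # L^R = { w^R : w ∈ L }; w[::-1] reverses a Python string
    return {w[::-1] for w in lang}


def concat(l1: set[str], l2: set[str]) -> set[str]:
    # L1 L2 = { u v : u ∈ L1, v ∈ L2 }
    return {u + v for u in l1 for v in l2}


print(sorted(reverse_lang({"ab", "aab", "baba"})))  # ['abab', 'ba', 'baa']

# {a^n b^n : n >= 0} is infinite; keep only n <= 2 as a finite sample
anbn = {"a" * n + "b" * n for n in range(3)}
print(sorted(reverse_lang(anbn), key=len))          # ['', 'ba', 'bbaa']
```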
REPETITION
With concatenation of languages defined, we can define repetition
$L^0 = \{ \lambda \}$
$L^{n+1} = \{ uv : u \in L, v \in L^n \}$
For $L = \{ a^n b^n : n \ge 0 \}$,
what is $L^2$?
what is $L^0$?
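A Python sketch of $L^n$ following the recursive definition, assuming finite languages are sets of `str`. Because $\{a^n b^n : n \ge 0\}$ is infinite, the example keeps only its members with $n \le 2$, so the result is just the part of $L^2$ built from those members; the name `lang_power` is hypothetical.

```python
def lang_power(lang: set[str], n: int) -> set[str]:
    # L^0 = {λ} whatever L is;  L^(n+1) = { u v : u ∈ L, v ∈ L^n }
    if n == 0:
        return {""}
    rest = lang_power(lang, n - 1)
    return {u + v for u in lang for v in rest}


# finite sample of {a^n b^n : n >= 0}
anbn = {"a" * n + "b" * n for n in range(3)}   # {"", "ab", "aabb"}
print(sorted(lang_power(anbn, 2), key=lambda s: (len(s), s)))
print(lang_power(anbn, 0))                     # {''}
```

Note that $L^0 = \{\lambda\}$ even for $L = \emptyset$, while $\emptyset^1 = \emptyset$.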
CLOSURES
With repetition we can define Kleene closure and positive closure for languages
$L^* = L^0 \cup L^1 \cup \cdots$
$L^+ = L^1 \cup L^2 \cup \cdots$
What is L* in words?
If L* = L we say that L is Kleene closed
Is C Kleene closed?
SUMMARY
An alphabet, Σ, is a set of symbols
A string is a sequence of symbols
concatenation, reverse, length, substring, prefix, suffix, repetition
Kleene closure $\Sigma^*$ and positive closure $\Sigma^+$
WHY IS THIS USEFUL?
Broad definition: any set of strings over an alphabet is a language
REGULAR EXPRESSIONS
∅, λ, and any α ∈ Σ are primitive regular expressions
If $r_1$ and $r_2$ are regular expressions, then so are
$r_1 + r_2$,
$r_1 r_2$,
$r_1^*$, and
$(r_1)$
EXERCISE
Is (a + bc)*(c+λ) a regular expression?
Is (a + b +) a regular expression?
INTUITIVE MEANING
Each regular expression over Σ defines a language over Σ
think in terms of matching
EXAMPLE
What is the language defined by a + b?
LANGUAGE DEFINED BY REGULAR EXPRESSIONS
How can we define the language of a regular expression more formally?
Can we build a recursive function, L(r), that defines the language of a regular expression r?
Remember:
a language is a set of strings
we have defined operations on languages: union, concatenation, and Kleene star
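A sketch of such a recursive function in Python, with one case per clause of the definition of regular expressions. The AST class names are hypothetical, parentheses need no node of their own because the tree already records grouping, and since $L(r^*)$ is usually infinite, `lang` only returns the strings of length at most `max_len`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Empty:        # ∅
    pass


@dataclass(frozen=True)
class Lam:          # λ
    pass


@dataclass(frozen=True)
class Sym:          # a single symbol α ∈ Σ
    a: str


@dataclass(frozen=True)
class Alt:          # r1 + r2 (union)
    r1: object
    r2: object


@dataclass(frozen=True)
class Cat:          # r1 r2 (concatenation)
    r1: object
    r2: object


@dataclass(frozen=True)
class Star:         # r* (Kleene star)
    r: object


def lang(r: object, max_len: int) -> set[str]:
    """L(r), truncated to strings of length <= max_len so the result is finite."""
    if isinstance(r, Empty):
        return set()                                       # L(∅) = ∅
    if isinstance(r, Lam):
        return {""}                                        # L(λ) = {λ}
    if isinstance(r, Sym):
        return {r.a} if len(r.a) <= max_len else set()    # L(α) = {α}
    if isinstance(r, Alt):
        return lang(r.r1, max_len) | lang(r.r2, max_len)   # L(r1 + r2) = L(r1) ∪ L(r2)
    if isinstance(r, Cat):
        return {u + v                                      # L(r1 r2) = L(r1) L(r2)
                for u in lang(r.r1, max_len)
                for v in lang(r.r2, max_len)
                if len(u) + len(v) <= max_len}
    if isinstance(r, Star):
        base = lang(r.r, max_len)                          # L(r*) = L(r)^0 ∪ L(r)^1 ∪ ...
        result, frontier = {""}, {""}
        while frontier:
            # extend only the strings that were new in the previous round
            frontier = {u + v for u in frontier for v in base
                        if len(u) + len(v) <= max_len} - result
            result |= frontier
        return result
    raise TypeError(f"not a regular expression: {r!r}")


# (a + b)a*, truncated to length 3
r = Cat(Alt(Sym("a"), Sym("b")), Star(Sym("a")))
print(sorted(lang(r, 3), key=lambda s: (len(s), s)))  # ['a', 'b', 'aa', 'ba', 'aaa', 'baa']
```

The truncation is safe for concatenation and star: any string of length at most `max_len` can only be built from pieces that are themselves no longer than `max_len`.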
EXAMPLE
What is L((a + b)a*)?
EXERCISE
What is the language defined by (a+b)*(a+bb)?
ON PRECEDENCE
What is the language defined by (a + b)a?
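Under the usual convention (assumed here, and also the one Python's `re` module follows), star binds tightest, then concatenation, then union, so a + ba means a + (ba) rather than (a + b)a. A small check with `re.fullmatch`, where the formal union "+" is written "|"; `lang_of` is a hypothetical helper.

```python
import re


def lang_of(pattern: str, words: list[str]) -> list[str]:
    # keep the words that the whole pattern matches (membership w ∈ L(r))
    return [w for w in words if re.fullmatch(pattern, w)]


words = ["a", "aa", "ba", "b", "bb"]
print(lang_of(r"(a|b)a", words))  # ['aa', 'ba']
print(lang_of(r"a|ba", words))    # ['a', 'ba']
```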
EXERCISE
What is the language defined by (aa)*(bb)*b?
EXAMPLE
Create a regular expression over Σ = { 0, 1 } that defines the language of all strings with at least two consecutive 0s
001 ∈ L
010 ∉ L
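One candidate answer is (0+1)*00(0+1)*. A Python sketch that checks it with the `re` module (writing "+" as "|") and compares it by brute force with the plain-English definition on every string up to a fixed length. This is a bounded check, not a proof; the names are hypothetical.

```python
import re
from itertools import product

# candidate answer (0+1)*00(0+1)*, with "|" for union as Python's re requires
AT_LEAST_TWO_ZEROS = re.compile(r"(0|1)*00(0|1)*")


def matches_definition_up_to(max_len: int) -> bool:
    # compare the regex with "contains 00" on every string over {0, 1} up to max_len
    for n in range(max_len + 1):
        for symbols in product("01", repeat=n):
            w = "".join(symbols)
            if (AT_LEAST_TWO_ZEROS.fullmatch(w) is not None) != ("00" in w):
                return False
    return True


print(AT_LEAST_TWO_ZEROS.fullmatch("001") is not None)  # True:  001 ∈ L
print(AT_LEAST_TWO_ZEROS.fullmatch("010") is not None)  # False: 010 ∉ L
print(matches_definition_up_to(10))                     # True
```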
EXERCISE
Construct a regular expression over { 0, 1 } for the language of strings with no two consecutive 0s.
010 ∈ L
001 ∉ L
EQUIVALENCE OF REGULAR EXPRESSIONS
Two regular expressions are equivalent if they define the same language
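Equivalence cannot be established by testing finitely many strings, but a brute-force search can find a counterexample when two expressions differ. A minimal sketch using Python's `re` module (with "|" for the formal "+"); `counterexample` is a hypothetical name, and a result of `None` is only evidence of equivalence up to `max_len`.

```python
import re
from itertools import product
from typing import Optional


def counterexample(p: str, q: str, alphabet: str, max_len: int) -> Optional[str]:
    # return the first string (by length) that exactly one of the patterns fully matches
    rp, rq = re.compile(p), re.compile(q)
    for n in range(max_len + 1):
        for symbols in product(alphabet, repeat=n):
            w = "".join(symbols)
            if (rp.fullmatch(w) is None) != (rq.fullmatch(w) is None):
                return w
    return None


# (a+b)* versus (a*b*)*, then (a+b)* versus a*b*
print(counterexample(r"(a|b)*", r"(a*b*)*", "ab", 8))  # None
print(counterexample(r"(a|b)*", r"a*b*", "ab", 8))     # ba
```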
Example uses
Lexical analysis: tokenization preceding parsing
Text search: grep/egrep (Unix)
Search for
gr(a|e)y
^[-+]?[0-9]*\.?[0-9]+$
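The two search patterns can be tried out with Python's `re` module, which accepts both as written. The first uses alternation, which grep supports in its extended syntax (egrep, or `grep -E`). The variable names are illustrative.

```python
import re

# gr(a|e)y: "gr", then "a" or "e", then "y"
spelling = re.compile(r"gr(a|e)y")
# optional sign, optional digits, optional decimal point, then at least one digit
number = re.compile(r"^[-+]?[0-9]*\.?[0-9]+$")

for text in ["gray", "grey", "groy"]:
    print(text, spelling.search(text) is not None)
for text in ["42", "-3.14", ".5", "1.", "+"]:
    print(text, number.search(text) is not None)
```

The number pattern rejects `1.` and `+`, because it must end in `[0-9]+`, that is, at least one digit.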
REGULAR EXPRESSIONS IN COMPILERS
• The programmer creates a program
• The lexer splits the program text into a stream of tokens and removes white space
• Literals: 1, 1.32, “Hello World!” …
• Keywords: if, while, …
• Variables: c, y, counter, …
• The token stream is passed to the parser, which creates a parse tree used by the next step of the compiler; this simplifies the parser, as it can work on tokens rather than on characters.
PARTS OF EXAMPLE PASCAL LEXER
white_space [ \t]*
digit [0-9]
alpha [A-Za-z_]
alpha_num ({alpha}|{digit})
hex_digit [0-9A-F]
identifier {alpha}{alpha_num}*
unsigned_integer {digit}+
hex_integer \${hex_digit}{hex_digit}*
exponent e[+-]?{digit}+
i {unsigned_integer}
real ({i}\.{i}?|{i}?\.{i}){exponent}?
string \'([^'\n]|\'\')+\'
%%
and return(AND);
array return(ARRAY);
begin return(_BEGIN);
EXAMPLE TOKENIZATION
Consider the following PASCAL program
Program Lesson1_Program1;
Begin
Write('Hello World');
Readln;
End.
Which would produce the following token stream
Note that the tokens are represented by integers, and tokens like IDENTIFIER and STRING carry the actual string representing the token.
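A minimal Python sketch of how a lexer could tokenize the program above, using the `re` module. It is not the Pascal lexer shown earlier: the token names are hypothetical, tokens are (name, lexeme) pairs rather than integers, and only the patterns this program needs are included.

```python
import re

# Hypothetical token names; keywords are recognized as identifiers first, then reclassified
KEYWORDS = {"program", "begin", "end"}
TOKEN_SPEC = [
    ("WHITE_SPACE", r"[ \t\n]+"),
    ("IDENTIFIER", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("STRING", r"'(?:[^'\n]|'')+'"),   # '' inside a string is an escaped quote
    ("SEMICOLON", r";"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("DOT", r"\."),
]
MASTER = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))


def tokenize(text: str) -> list[tuple[str, str]]:
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        kind, lexeme = m.lastgroup, m.group()
        pos = m.end()
        if kind == "WHITE_SPACE":
            continue                      # the lexer drops white space
        if kind == "IDENTIFIER" and lexeme.lower() in KEYWORDS:
            kind = lexeme.upper()         # Pascal keywords are case-insensitive
        tokens.append((kind, lexeme))
    return tokens


PROGRAM_TEXT = """Program Lesson1_Program1;
Begin
  Write('Hello World');
  Readln;
End."""

for token in tokenize(PROGRAM_TEXT):
    print(token)
```

`Write` and `Readln` come out as identifiers, since in Pascal they are standard procedures rather than reserved words.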
REGULAR LANGUAGES
Topic for the next few lectures
Ways of defining regular languages:
Regular Expressions (RE)
Regular grammars
REGULAR LANGUAGES
[Diagram relating Regular Expression, DFA, NFA, and Regular Grammar to Regular Language]
DO THE EXERCISES!
Exercise material is on the homepage
The exercises are similar to what will be on the exam