
DVA337 HT17 - LECTURE 4 Languages and regular expressions

1
SO FAR

2
TODAY
Formal definition of languages
 in terms of strings

Operations on strings and languages


Definition of regular expressions
Meaning of regular expressions in terms of languages
Outlook: practical use of regular expressions

3
LANGUAGES alphabets, strings, and languages

4
LANGUAGE
How can we define what a (formal) language is?

5
LANGUAGE
We define a language to be a set of strings over an alphabet Σ
An alphabet is a set of symbols, e.g., { a, b, c, ..., z }
A string over an alphabet Σ is a sequence of symbols from the alphabet

What is the alphabet for the language


 L = { apple, pear, 1911 }
 L = { x : x is a binary string }
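For a finite language, the smallest alphabet can be read off the strings themselves. A minimal sketch, assuming strings are modelled as Python `str` values and symbols as single characters; the helper name `alphabet_of` is hypothetical:

```python
def alphabet_of(language):
    """Return the smallest alphabet whose symbols cover every string in a finite language."""
    return {symbol for word in language for symbol in word}

L = {"apple", "pear", "1911"}
sigma = alphabet_of(L)  # the symbols a, p, l, e, r, 1, 9
```

Any superset of this set is also a valid alphabet for L, so the answer to the question is not unique.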

6
EXERCISE, LANGUAGE
How can we define the alphabet and language for

1) the programming language C?

2) written English?

7
STRINGS
8
STRINGS
For a_1, a_2, ..., a_n ∈ Σ the sequence a_1 a_2 ... a_n is a string over Σ
The empty string is written λ

What are the strings over Σ = { a, b }?

Let u, v, w denote strings

9
CONCATENATION
For u = a_1 a_2 ... a_n and v = b_1 b_2 ... b_m, what is the concatenation of u and v, written uv?

10
PREFIX AND SUFFIX
For a string w = u v
 u is a prefix
 v is a suffix

All prefixes and suffixes for abbab?
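The definition w = u v puts no restriction on u or v, so both λ and w itself count as prefixes and suffixes. A small sketch under that reading, with strings as Python `str` and λ as `""` (the function names are illustrative):

```python
def prefixes(w):
    """All u such that w = u v for some v, shortest first."""
    return [w[:i] for i in range(len(w) + 1)]

def suffixes(w):
    """All v such that w = u v for some u, longest first."""
    return [w[i:] for i in range(len(w) + 1)]

print(prefixes("abbab"))
print(suffixes("abbab"))
```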

11
SUBSTRING
For a string w = u_1 v u_2
 v is a substring
 prefix and suffix are special cases of substring

Substrings of abbab?
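Substrings can be enumerated by choosing where v starts and where it ends. A sketch assuming λ counts as a substring (u_1 and u_2 may take up the whole string); `substrings` is a hypothetical helper:

```python
def substrings(w):
    """All v such that w = u1 v u2, collected in a set to drop duplicates."""
    return {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}

subs = substrings("abbab")
print(sorted(subs, key=lambda s: (len(s), s)))
```

A set is used because the same substring (for example ab) can occur at several positions.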

12
LENGTH
For string w = a_1 a_2 ... a_n, the length is n, |w| = n
 |λ| = 0
 |abbab| = 5

How can we define length recursively?
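One possible recursive definition is |λ| = 0 and |a w| = 1 + |w| for a ∈ Σ. The sketch below mirrors that definition directly; it is one choice among several (peeling the last symbol works equally well):

```python
def length(w):
    """Recursive length: |λ| = 0, |a w| = 1 + |w|."""
    if w == "":
        return 0
    return 1 + length(w[1:])

print(length("abbab"))
```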

13
PROOF BY INDUCTION
Induction over natural numbers

To show that a property P holds for all natural numbers, ∀n ∈ ℕ . P(n), show

A base case, e.g., P(0)


An inductive step, ∀n ∈ ℕ . P(n) → P(n+1)

Why can we conclude ∀n ∈ ℕ . P(n) from this?

14
EXERCISE, LENGTH OF CONCATENATION
What is |u v| ?

Can we prove it?

15
LENGTH OF CONCATENATION
Theorem: |u v| = |u| + |v|
Proof: By induction on the length of v.
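A sketch of how the induction could go, assuming length is defined recursively by |λ| = 0 and |w a| = |w| + 1 for a ∈ Σ (an assumed definition; the slides pose it as a question rather than stating it):

```latex
\textbf{Base case}, $v = \lambda$:
\[ |u\lambda| = |u| = |u| + 0 = |u| + |\lambda| \]
\textbf{Inductive step}, $v = wa$ with $a \in \Sigma$, assuming $|uw| = |u| + |w|$:
\begin{align*}
|u(wa)| &= |(uw)a|        && \text{associativity of concatenation} \\
        &= |uw| + 1       && \text{definition of length} \\
        &= |u| + |w| + 1  && \text{induction hypothesis} \\
        &= |u| + |wa|     && \text{definition of length}
\end{align*}
```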

16
REVERSE
For a string w = a_1 a_2 ... a_n what is the reverse w^R of w?

What is a palindrome?
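A possible answer in code, assuming the recursive definition λ^R = λ and (a w)^R = w^R a, and taking a palindrome to be a string equal to its own reverse:

```python
def reverse(w):
    """Recursive reverse: λ^R = λ, (a w)^R = w^R a."""
    if w == "":
        return ""
    return reverse(w[1:]) + w[0]

def is_palindrome(w):
    """A palindrome reads the same in both directions: w = w^R."""
    return w == reverse(w)

print(reverse("abbab"), is_palindrome("abba"))
```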

17
REPETITION
Let w^n be w repeated n times, w w ... w

Can you write a recursive definition of w^n?
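One candidate definition is w^0 = λ and w^(n+1) = w w^n. A direct transcription, with the function name `power` chosen for illustration:

```python
def power(w, n):
    """Recursive repetition: w^0 = λ, w^(n+1) = w w^n."""
    if n == 0:
        return ""
    return w + power(w, n - 1)

print(power("ab", 3))
```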

18
Σ^n, STRINGS OF LENGTH n
Let Σ^n be the set of strings of length n over Σ
For Σ = {a, b}
 Σ^0 = { λ }
 Σ^1 = { a, b }
 Σ^2 = { aa, ab, ba, bb }

How can we define Σ^n?

19
Σ*, KLEENE CLOSURE
Σ* is the set of all strings over Σ
{a,b}* = { λ, a, b, aa, bb, ab, ba, aaa, bbb, ... }

How can we define Σ*?

20
Σ*, KLEENE CLOSURE
We have that Σ* = Σ^0 ∪ Σ^1 ∪ ..., where
 Σ^0 = { λ }
 Σ^(n+1) = { x y : x ∈ Σ, y ∈ Σ^n }

Can we use this to define Σ*?


 as a fixpoint to F(S) ⊆ S for some F?
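The recursive definition of Σ^n translates directly into code, and Σ* can then be enumerated level by level. A sketch assuming a finite, non-empty alphabet of single-character symbols; since Σ* is infinite, it is exposed as a generator and only a finite slice is ever taken:

```python
from itertools import count, islice

def strings_of_length(sigma, n):
    """Σ^0 = {λ}, Σ^(n+1) = { x y : x ∈ Σ, y ∈ Σ^n }."""
    if n == 0:
        return {""}
    return {x + y for x in sigma for y in strings_of_length(sigma, n - 1)}

def kleene_star(sigma):
    """Yield Σ^0, Σ^1, Σ^2, ... in order; never terminates on its own."""
    for n in count():
        yield from sorted(strings_of_length(sigma, n))

first_seven = list(islice(kleene_star({"a", "b"}), 7))
print(first_seven)
```

With an empty alphabet the generator yields λ and then loops without producing anything further, so slices longer than one element should be avoided in that case.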

21
Σ+, POSITIVE CLOSURE
Let Σ+ = Σ^1 ∪ Σ^2 ∪ ...

How can we define the positive closure?

22
EXERCISE
For Σ = {a, b} what is the cardinality of Σ^3?
In general, what is the cardinality of Σ^n?

For Σ as below, give Σ* and Σ+

 Σ = { 0, 1 }
 Σ = { a }
 Σ = { }

23
EXERCISE
Prove that |Σ^n| = |Σ|^n

24
LANGUAGES
25
LANGUAGE
A language L is a set of strings over an alphabet Σ
A language L is a subset of Σ*
For
 Σ = { a, b }
 Σ* = { λ, a, b, aa, ab, ba, bb, aaa, aab, ... }

Examples of languages over Σ?

26
EXERCISE
What is P(Σ*)?

27
SET OPERATIONS ON LANGUAGES
Since languages are sets, the standard set operations apply.
For L1 = {a, b, aaa} and L2 = {bb, ab},
what is
 L1 ∪ L2
 L1 ∩ L2
 L1 ∖ L2

What is the complement, L^∁, of a language L?
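With finite languages modelled as Python sets, the set operations are built in. The complement is taken relative to Σ*, which is infinite, so the sketch below only computes it up to a length bound k; the bound, the alphabet {a, b}, and the helper names are assumptions made for illustration:

```python
from itertools import product

L1 = {"a", "b", "aaa"}
L2 = {"bb", "ab"}

union = L1 | L2
intersection = L1 & L2
difference = L1 - L2

def strings_up_to(sigma, k):
    """All strings over sigma of length at most k."""
    return {"".join(p) for n in range(k + 1) for p in product(sorted(sigma), repeat=n)}

def complement_up_to(language, sigma, k):
    """The part of Σ* \\ L that consists of strings of length at most k."""
    return strings_up_to(sigma, k) - language

print(union, intersection, difference)
print(complement_up_to(L1, {"a", "b"}, 2))
```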

28
REVERSAL AND CONCATENATION
Reversal and concatenation carry over from strings in the natural way

Reversal, L^R = { w^R : w ∈ L }
 { ab, aab, baba }^R
 { a^n b^n : n ≥ 0 }^R

Concatenation, L1L2 = { u v : u ∈ L1, v ∈ L2 }


 { ab, aab, baba }{b,aa}
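Both operations lift element by element from strings to sets of strings. A sketch for finite languages, with hypothetical helper names:

```python
def reverse_language(language):
    """L^R = { w^R : w ∈ L }."""
    return {w[::-1] for w in language}

def concat(l1, l2):
    """L1 L2 = { u v : u ∈ L1, v ∈ L2 }."""
    return {u + v for u in l1 for v in l2}

print(reverse_language({"ab", "aab", "baba"}))
print(concat({"ab", "aab", "baba"}, {"b", "aa"}))
```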

29
REPETITION
With concatenation of languages defined, we can define repetition

L^0 = { λ }
L^(n+1) = { u v : u ∈ L, v ∈ L^n }

For L = { a^n b^n : n ≥ 0 }
 what is L^2?
 what is L^0?
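The recursive definition can be tried out on a finite language. Since { a^n b^n : n ≥ 0 } is infinite, the sketch below uses only its members with n ≤ 2 as a stand-in; that truncation is an assumption made for the example and does not give the full L^2:

```python
def language_power(language, n):
    """L^0 = {λ}, L^(n+1) = { u v : u ∈ L, v ∈ L^n }."""
    if n == 0:
        return {""}
    return {u + v for u in language for v in language_power(language, n - 1)}

sample = {"", "ab", "aabb"}  # a^n b^n for n = 0, 1, 2 only
print(language_power(sample, 0))
print(sorted(language_power(sample, 2), key=len))
```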

30
CLOSURES
With repetition we can define Kleene closure and positive closure for languages

L* = L^0 ∪ L^1 ∪ ...
L+ = L^1 ∪ L^2 ∪ ...

What is L* in words?
If L* = L we say that L is Kleene closed
 Is C Kleene closed?
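L* is infinite for any L containing a non-empty string, but the part of it up to a given length can be computed by repeatedly appending strings from L until nothing new appears. A sketch for finite L; the length bound k and the name `star_up_to` are assumptions for illustration:

```python
def star_up_to(language, k):
    """All strings of length at most k in L*, for a finite language L."""
    result = {""}        # L^0 = {λ} is always included
    frontier = {""}
    while frontier:
        # Extend the newest strings by one more string from L, keeping only unseen ones.
        frontier = {u + v for u in frontier for v in language if len(u + v) <= k} - result
        result |= frontier
    return result

print(sorted(star_up_to({"ab"}, 4), key=len))
```

Comparing `star_up_to(L, k)` with the strings of L up to length k gives a bounded check of whether L might be Kleene closed; it can refute the property but not prove it.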

31
SUMMARY
An alphabet, Σ, is a set of symbols
A string is a sequence of symbols
 concatenation, reverse, length, substring, prefix, suffix, repetition
 Kleene closure Σ*, and positive closure Σ+

A language over Σ is a set of strings; a subset of Σ*


 union, intersection, difference, complement
 reverse, concatenation, repetition
 Kleene closure L*, and positive closure L+ (cf. Σ* and Σ+)

32
WHY IS THIS USEFUL?
Broad definition: any set of strings on an alphabet is a language

Methods of defining languages


 grammars

Methods of deciding membership in languages


 How to answer the question of whether a given string is in a given language
 Can membership always be decided?

33
REGULAR EXPRESSIONS
34
REGULAR EXPRESSIONS
∅, λ, and any α ∈ Σ are primitive regular expressions
If r1 and r2 are regular expressions, then so are
r1 + r2,
r1r2,
r1*, and
(r1)

35
EXERCISE
Is (a + bc)*(c+λ) a regular expression?

Is (a + b +) a regular expression?

36
INTUITIVE MEANING
Each regular expression over Σ defines a language over Σ
 think in terms of matching

∅, λ, and any α ∈ Σ are primitive regular expressions


If r1 and r2 are regular expressions, then so are
r1 + r2,
r1r2,
r1*, and
(r1)

37
EXAMPLE
What is the language defined by a + b?

What is the language defined by (ab)*?

Exercise, what is the language defined by (a + bc)*(c+λ)?

38
LANGUAGE DEFINED BY REGULAR EXPRESSIONS
How can we define the language of a regular expression more formally?
Can we build a recursive function, L(r), that defines the language of a regular expression r?

Remember
 a language is a set of strings
 we have defined operations on languages:
union, concatenation, Kleene star
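One way such a recursive function could look, with one case per form of regular expression: L(∅) = ∅, L(λ) = {λ}, L(a) = {a}, L(r1 + r2) = L(r1) ∪ L(r2), L(r1 r2) = L(r1) L(r2), L(r1*) = L(r1)*. The sketch below represents expressions as small data classes (the class names are hypothetical) and, because L(r) may be infinite, computes only the strings of length at most k:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Empty:            # ∅
    pass

@dataclass(frozen=True)
class Lam:              # λ
    pass

@dataclass(frozen=True)
class Sym:              # a single symbol from Σ
    char: str

@dataclass(frozen=True)
class Union:            # r1 + r2
    left: object
    right: object

@dataclass(frozen=True)
class Concat:           # r1 r2
    left: object
    right: object

@dataclass(frozen=True)
class Star:             # r*
    inner: object

def lang(r, k):
    """Strings of length at most k in L(r), by recursion on the structure of r."""
    if isinstance(r, Empty):
        return set()
    if isinstance(r, Lam):
        return {""}
    if isinstance(r, Sym):
        return {r.char} if k >= 1 else set()
    if isinstance(r, Union):
        return lang(r.left, k) | lang(r.right, k)
    if isinstance(r, Concat):
        return {u + v for u in lang(r.left, k) for v in lang(r.right, k) if len(u + v) <= k}
    if isinstance(r, Star):
        base, result, frontier = lang(r.inner, k), {""}, {""}
        while frontier:  # keep appending until no new string fits within the bound
            frontier = {u + v for u in frontier for v in base if len(u + v) <= k} - result
            result |= frontier
        return result
    raise TypeError(f"not a regular expression: {r!r}")

# (a + b)a*
example = Concat(Union(Sym("a"), Sym("b")), Star(Sym("a")))
print(sorted(lang(example, 3), key=lambda s: (len(s), s)))
```

Parentheses need no case of their own here, since grouping is already expressed by the nesting of the objects.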

39
EXAMPLE
What is L((a + b)a*)?

40
EXERCISE
What is the language defined by (a+b)*(a+bb)?

41
ON PRECEDENCE
What is the language defined by (a + b)a?

What is the language defined by a + (ba)?

Which one is a + ba?
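Under the usual convention (star binds tightest, then concatenation, then union), a + ba reads as a + (ba). Python's `re` module follows the same precedence, writing union as `|`, so the three expressions can be compared on a few candidate strings. A small sketch; the candidate list is arbitrary:

```python
import re

unparenthesized = re.compile(r"a|ba")    # a + ba
grouped_left = re.compile(r"(a|b)a")     # (a + b)a
grouped_right = re.compile(r"a|(ba)")    # a + (ba)

def matched(pattern, candidates):
    """The candidates that the pattern matches in full."""
    return {w for w in candidates if pattern.fullmatch(w)}

candidates = ["a", "b", "aa", "ab", "ba", "bb"]
print(matched(unparenthesized, candidates))
print(matched(grouped_left, candidates))
print(matched(grouped_right, candidates))
```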

42
EXERCISE
What is the language defined by (aa)*(bb)*b?

43
EXAMPLE
Create a regular expression over Σ = { 0, 1 } that defines the language where all strings have at least two consecutive 0s
 001 ∈ L
 010 ∉ L
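One candidate is (0+1)*00(0+1)*: anything, then 00, then anything. The sketch below writes it in Python's `re` syntax (where the union 0+1 becomes the character class `[01]`) and compares it against the plain-language specification on all strings up to a chosen length; a bounded check like this can expose a wrong expression but is not a proof:

```python
import re
from itertools import product

# (0+1)*00(0+1)* in Python regex syntax
AT_LEAST_TWO_ZEROS = re.compile(r"[01]*00[01]*")

def in_language(w):
    return AT_LEAST_TWO_ZEROS.fullmatch(w) is not None

def agrees_with_spec(max_len):
    """Compare the expression with the test '00 occurs in w' on every string up to max_len."""
    for n in range(max_len + 1):
        for symbols in product("01", repeat=n):
            w = "".join(symbols)
            if in_language(w) != ("00" in w):
                return False
    return True

print(in_language("001"), in_language("010"), agrees_with_spec(8))
```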

44
EXERCISE
Construct the regular expression over { 0, 1 } where no string has two consecutive 0s.
 010 ∈ L
 001 ∉ L

45
EQUIVALENCE OF REGULAR EXPRESSIONS
Two regular expressions are equivalent if they define the same language

L = { all strings over {0, 1} without two consecutive 0s }


r1 = (1+01)*(0+λ)
r2 = (1+011*)*(0+λ)+1*(0+λ)

Since L = L(r1) = L(r2) we have that r1 and r2 are equivalent.

Can we prove that L(r1) = L(r2) in some way?
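Short of a proof, the two expressions can at least be compared by brute force on all strings up to some length. The sketch below transcribes r1 and r2 into Python's `re` syntax (`|` for +, `0?` for 0+λ) and searches for a string on which they disagree; finding none is evidence, not a proof, of equivalence:

```python
import re
from itertools import product

R1 = re.compile(r"(1|01)*0?")             # (1+01)*(0+λ)
R2 = re.compile(r"(1|011*)*0?|1*0?")      # (1+011*)*(0+λ)+1*(0+λ)

def first_disagreement(r_a, r_b, max_len, sigma="01"):
    """Return a string matched by exactly one of the patterns, or None if there is none up to max_len."""
    for n in range(max_len + 1):
        for symbols in product(sigma, repeat=n):
            w = "".join(symbols)
            if (r_a.fullmatch(w) is None) != (r_b.fullmatch(w) is None):
                return w
    return None

print(first_disagreement(R1, R2, 10))
```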


46
REGULAR EXPRESSIONS IN REALITY
Slightly richer alphabet and language than what we saw here, e.g.,
 quantifiers: *, +, ?, {m}, {m,}, {m,n}, …
 atoms: char, [chars], ., ^, $, \char

Example uses
 Lexical analysis - tokenization preceding parsing
 Text search – grep/egrep (unix)

Search for
 gr(a|e)y
 ^[-+]?[0-9]*\.?[0-9]+$

47
REGULAR EXPRESSIONS IN COMPILERS
• The programmer creates a program
• The lexer splits the program text into a stream of tokens and removes white space
• Literals: 1, 1.32, “Hello World!” …
• Keywords: if, while, …
• Variables: c, y, counter, …

• The token stream is passed to the parser, which creates a parse tree that is used by the next step of the compiler. This simplifies the parser, as it can work on tokens rather than on characters.

Text → Lexer → Tokens → Parser → … → Binary

48
PARTS OF EXAMPLE PASCAL LEXER
white_space [ \t]*

digit [0-9]

alpha [A-Za-z_]

alpha_num ({alpha}|{digit})

hex_digit [0-9A-F]

identifier {alpha}{alpha_num}*

unsigned_integer {digit}+

hex_integer ${hex_digit}{hex_digit}*

exponent e[+-]?{digit}+

i {unsigned_integer}

real ({i}\.{i}?|{i}?\.{i}){exponent}?

string \'([^'\n]|\'\')+\'

and return(AND);

array return(ARRAY);

begin return(_BEGIN);

49
EXAMPLE TOKENIZATION
Consider the following PASCAL program
Program Lesson1_Program1;
Begin
Write('Hello World');
Readln;
End.
Which would produce the following token stream

PROGRAM IDENTIFIER ; BEGIN IDENTIFIER ( STRING ) ; IDENTIFIER ; END .

Note that tokens are represented by integers, and that tokens like IDENTIFIER and STRING also carry the actual string they were matched from.
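The tokenization can be imitated with a few regular expressions. This is a minimal sketch in Python rather than a lex specification: the token names follow the slides, but the keyword set, the punctuation class, and the use of names instead of integers for token kinds are simplifying assumptions covering only this example program:

```python
import re

KEYWORDS = {"program", "begin", "end"}   # illustrative subset; Pascal keywords are case-insensitive

TOKEN_SPEC = [
    ("STRING", r"'(?:[^'\n]|'')+'"),
    ("IDENTIFIER", r"[A-Za-z_][A-Za-z_0-9]*"),
    ("PUNCT", r"[();.]"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(text):
    """Split text into (kind, lexeme) pairs, dropping white space."""
    tokens, pos = [], 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        pos = m.end()
        kind, lexeme = m.lastgroup, m.group()
        if kind == "SKIP":
            continue
        if kind == "IDENTIFIER" and lexeme.lower() in KEYWORDS:
            tokens.append((lexeme.upper(), None))   # keywords carry no extra data
        elif kind == "PUNCT":
            tokens.append((lexeme, None))
        else:
            tokens.append((kind, lexeme))           # IDENTIFIER and STRING keep their text
    return tokens

PROGRAM_TEXT = """Program Lesson1_Program1;
Begin
Write('Hello World');
Readln;
End.
"""
kinds = [kind for kind, _ in tokenize(PROGRAM_TEXT)]
print(" ".join(kinds))
```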

50
REGULAR LANGUAGES
Topic for the next few lectures
Ways of defining regular languages
 Regular Expressions (RE)
 Regular grammars

Ways of deciding membership in regular languages


 DFA and NFA

Equivalence of the approaches


 DFA ⇔ NFA ⇔ RE

51
REGULAR LANGUAGES
[Figure: Regular Language in the center, connected to Regular Expression, DFA, NFA, and Regular Grammar]

52
DO THE EXERCISES!
Exercise material on the homepage
 exercises similar to what will be on the exam

If you get stuck


 ask a friend
 ask me

If several of you have issues with the same exercise, we'll add it to a lecture.


53
