
(Regular Expression)

PERL

AGENDA

Introduction
How pattern matching works
Pattern Matching Operators
Special Characters in Pattern Matching
Pattern matching options
Pattern Substitution
Translation


Introduction

A pattern (or expression) is a sequence of characters to be searched for in a character string. In Perl, patterns are normally enclosed in slash characters. Ex - /def/


Regular Expressions Basic Operations

Matching (m/PATTERN/) - This operator returns true if PATTERN is found.

Substitution (s/PATTERN/REPLACEMENT/) - This operator replaces the substring matched by PATTERN with REPLACEMENT.

Translation (tr/CHARACTERS/REPLACEMENTS/) - This operator replaces the characters specified by CHARACTERS with the corresponding characters in REPLACEMENTS.


The Matching Operator (m//)

The matching operator (m//) is used to find patterns in strings. One of its more common uses is to look for a specific string inside a data file. By default, the matching operator searches the $_ variable; to search another variable, bind it with =~. The standard delimiters are slashes, as in m/ /. Other delimiters can be used, for example m!!, which is convenient when the pattern is a path that contains slashes.
Refer to reg.pl
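
A minimal sketch of the two delimiter styles described above. The sample strings and variable names are made up for illustration and are not taken from reg.pl.

```perl
use strict;
use warnings;

# With no binding operator, m// tests the contents of $_.
$_ = "error: disk full";
my $found_default = m/disk/ ? 1 : 0;

# With ! as the delimiter, the slashes in a path need no escaping.
my $path = "/usr/local/bin/perl";
my $found_path = ($path =~ m!/usr/local!) ? 1 : 0;

print "default: $found_default, path: $found_path\n";
```

Written with slashes, the second pattern would have to be m/\/usr\/local/, which is harder to read.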


The Match operators (=~ , !~)

Perl defines special operators that test whether a particular pattern appears in a character string.

The =~ operator tests whether a pattern is matched.
Ex - $result = $var =~ /abc/;
In this example, the value stored in the scalar variable $var is searched for the pattern abc. If abc is found, $result is assigned a true value (1); otherwise, $result is assigned a false value (the empty string, which counts as 0 in numeric context).

Continued
The !~ operator is similar to =~, except that it checks whether a pattern is not matched. Because =~ and !~ produce either true or false as their result, these operators are ideally suited for use in conditional expressions.

Refer to reg1.pl, reg5.pl
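
As a rough illustration of using =~ and !~ in conditions (the value of $var is an arbitrary sample, not taken from the referenced scripts):

```perl
use strict;
use warnings;

my $var = "xyzabc123";

# =~ is true when the pattern is found; !~ is true when it is not found.
my $has_abc   = ($var =~ /abc/) ? "yes" : "no";
my $lacks_def = ($var !~ /def/) ? "yes" : "no";

if ($var =~ /abc/) {
    print "abc found\n";
}
print "has abc: $has_abc, lacks def: $lacks_def\n";
```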


How to create Patterns


The following building blocks are used to create patterns:

Variable Interpolation
Character Sequences
Alternation
Character Classes
Symbolic Character Classes
Anchors
Quantifiers
Pattern Memory
Word Boundaries
Quoting Meta-Characters


Variable Interpolation
Any variable in a pattern is interpolated first, and the resulting pattern is then evaluated as a regular expression.

Ex -
$needToFind = "bbb";
$_ = "AAA bbb AAA";
print "Found bbb\n" if m/$needToFind/;


Character Sequences
A sequence of characters matches the identical sequence in the searched string. Ex - /def/ matches def, but not efd or dfe.

$_ = "dfenation";
print "Found\n" if (/def/);   # prints nothing: "dfenation" contains dfe, not def


Alternation (|)
The alternation meta-character (|) lets us match more than one possible string. Ex - m/a|b/; matches if either the character "a" or the character "b" is in the searched string. Alternation also works with sequences of more than one character. Ex - m/dog|cat/; matches if either of the strings "dog" or "cat" is in the searched string.
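
A small sketch of the dog|cat alternation, applied to a made-up list of strings:

```perl
use strict;
use warnings;

my @candidates = ("hotdog stand", "catalog", "bird");

# grep keeps every string in which either alternative occurs.
my @matched = grep { m/dog|cat/ } @candidates;

print "matched: @matched\n";
```

Note that the alternatives match anywhere in the string, so "catalog" qualifies because it contains cat.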

Character Classes []
Square brackets are used to create character classes. A character class matches any one character from a specified set. The character class [0123456789] defines the class of decimal digits. [0-9a-f] defines the class of hexadecimal digits.
Refer to reg2
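
The hexadecimal class can be tried out as follows. This is only a sketch: the word list is invented, and the pattern also uses the ^ and $ anchors and the + quantifier so that every character of the string has to belong to the class.

```perl
use strict;
use warnings;

my @words = ("cafe", "face9", "zebra");

# Keep only the strings made up entirely of characters from [0-9a-f].
my @hex_like = grep { /^[0-9a-f]+$/ } @words;

print "hex-like: @hex_like\n";
```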


Symbolic Character Classes


Several character classes are used so frequently that they have a symbolic representation. The period (.) meta-character stands for a special character class that matches any character except the newline.
Refer to reg3.pl


Symbolic Character Classes


\w - This symbol matches any alphanumeric character or the underscore character. For ASCII text it is equivalent to the character class [a-zA-Z0-9_].
\W - This symbol matches every character that the \w symbol does not. In other words, it is the complement of \w. It is equivalent to [^a-zA-Z0-9_].


Continued
\s - This symbol matches any whitespace character, such as a space, tab, newline, carriage return, or form feed. For ASCII text it is roughly equivalent to [ \t\n\r\f].
\S - This symbol matches any non-whitespace character. It is the complement of \s.
\d - This symbol matches any digit. For ASCII text it is equivalent to [0-9].
\D - This symbol matches any non-digit character. It is equivalent to [^0-9].
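
A short sketch of the symbolic classes in use. The identifiers are hypothetical sample data; each test records 1 for a match and 0 otherwise.

```perl
use strict;
use warnings;

my $id = "user_42";

my $all_word_chars = ($id =~ /^\w+$/) ? 1 : 0;   # only letters, digits, underscore
my $has_digit      = ($id =~ /\d/)    ? 1 : 0;   # contains at least one digit
my $has_space      = ($id =~ /\s/)    ? 1 : 0;   # contains no whitespace
my $has_non_word   = ("user-42" =~ /\W/) ? 1 : 0; # the hyphen is not a \w character

print "$all_word_chars $has_digit $has_space $has_non_word\n";
```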


Anchors (^ , $)
The caret (^) and the dollar sign ($) are used to anchor a pattern to the beginning and the end of the searched string. The caret is always the first character in the pattern when used as an anchor. For example, m/^one/; matches only if the searched string starts with the character sequence one. The dollar sign is always the last character in the pattern when used as an anchor. For example, m/end$/; matches only if the searched string ends with the character sequence end.
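
The two anchor examples can be checked with a sketch like this one (the strings are made up for illustration):

```perl
use strict;
use warnings;

my @lines = ("one more time", "the end", "someone");

my @starts_with_one = grep { m/^one/ } @lines;   # anchored at the beginning
my @ends_with_end   = grep { m/end$/ } @lines;   # anchored at the end

print "starts: @starts_with_one\n";
print "ends:   @ends_with_end\n";
```

"someone" contains one but does not start with it, so the ^ anchor rejects it.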


Quantifiers
Perl provides several different quantifiers that let us specify how many times a given component must be present before the match is true. They are used when you don't know in advance how many characters need to be matched.


Continued
* - The component must be present zero or more times. Ex - ab*c matches ac, abc, abbc but does not match abb, bbc.
+ - The component must be present one or more times. Ex - ab+c matches abc, abbc, abbbc but does not match ac, abb.
Refer to reg4.pl


Continued
? - The component must be present zero or one times. Ex - ab?c matches ac, abc but does not match abbc, abb.
{n} - The component must be present exactly n times. Ex - ab{2}c matches only abbc.


Continued
{n,} - The component must be present at least n times. Ex - ab{2,}c matches abbc, abbbc but does not match abc.
{n,m} - The component must be present at least n times and no more than m times. Ex - ab{2,3}c matches abbc, abbbc.
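
The quantifier examples can be verified with a small table-driven sketch. The hash %accepts and its key names are hypothetical helpers; each pattern is anchored with ^ and $ so the whole string has to match.

```perl
use strict;
use warnings;

my @samples = qw(ac abc abbc abbbc);
my %accepts;

for my $s (@samples) {
    $accepts{star}{$s}  = ($s =~ /^ab*c$/)     ? 1 : 0;   # zero or more b
    $accepts{plus}{$s}  = ($s =~ /^ab+c$/)     ? 1 : 0;   # one or more b
    $accepts{opt}{$s}   = ($s =~ /^ab?c$/)     ? 1 : 0;   # zero or one b
    $accepts{two}{$s}   = ($s =~ /^ab{2}c$/)   ? 1 : 0;   # exactly two b
    $accepts{range}{$s} = ($s =~ /^ab{2,3}c$/) ? 1 : 0;   # two or three b
}

# Print, for each quantifier, the samples it accepts.
for my $name (sort keys %accepts) {
    my @ok = grep { $accepts{$name}{$_} } @samples;
    print "$name: @ok\n";
}
```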


Pattern Memory/ Back-references


Parentheses are used to store matched values into buffers for later recall. This is called pattern memory or back-references. Ex - if m/(fish|fowl)/; is used to match a string and a match is found, the variable $1 will hold either fish or fowl, depending on which sequence was matched.


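Continued
Use parentheses for sub-patterns.
Sub-pattern matches are saved in $1, $2, $3, ...
$1, $2, $3 are called back-references (inside the pattern itself they are written \1, \2, \3).
If a match fails, $1, $2, $3 keep the values from the last successful match, so check the result of the match before using them.
If a sub-pattern matches several times (for example under a quantifier), its back-reference holds the last matched value.
Refer to reg6.pl, reg7.pl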
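
A minimal sketch of pattern memory. The date string and variable names are made up for illustration and do not come from reg6.pl or reg7.pl.

```perl
use strict;
use warnings;

my $date = "2024-03-15";   # arbitrary sample value
my ($year, $month) = ("", "");

# Copy $1 and $2 only after a successful match.
if ($date =~ /(\d{4})-(\d{2})/) {
    ($year, $month) = ($1, $2);
}

# A quantified group remembers only its last repetition.
my $last_letter = "";
if ("abc" =~ /(\w)+/) {
    $last_letter = $1;
}

# \1 inside the pattern refers back to what the first group matched.
my $doubled = "";
if ("hello" =~ /(\w)\1/) {
    $doubled = $1;
}

print "$year $month $last_letter $doubled\n";
```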


Special Variables in Pattern Memory


Perl also has a few special variables to help you know what matched and what did not.
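
The slide's list of variables is not reproduced in this text. Assuming it refers to Perl's standard match variables $& (the matched text), $` (the text before the match) and $' (the text after the match), a sketch of their use could look like this:

```perl
use strict;
use warnings;

my $text = "AAA bbb CCC";
my ($before, $matched, $after) = ("", "", "");

if ($text =~ /bbb/) {
    # $` is the text before the match, $& the match itself, $' the text after it.
    ($before, $matched, $after) = ($`, $&, $');
}

print "before=[$before] matched=[$matched] after=[$after]\n";
```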


Word Boundaries
The word-boundary pattern anchors, \b and \B, specify whether a matched pattern must be on a word boundary or inside a word. The \b pattern anchor specifies that the pattern must be on a word boundary. Ex - /\bdef/ matches only if def is at the beginning of a word.


Continued
/def\b/ matches def and abcdef, but not defghi.
/\bdef\b/ matches only the word def, not abcdef or defghi.
The \B pattern anchor is the opposite of \b. \B matches only if the pattern is contained in a word.


Continued
Ex - /\Bdef/ matches abcdef, but not def.
/def\B/ matches defghi, but not def.
/\Bdef\B/ matches cdefg or abcdefghi, but not def, defghi, or abcdef.
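
The word-boundary rules can be checked against the same sample words with a short sketch (the array names are arbitrary):

```perl
use strict;
use warnings;

my @words = qw(def abcdef defghi abcdefghi);

my @word_start = grep { /\bdef/ }    @words;   # def begins a word
my @whole_word = grep { /\bdef\b/ }  @words;   # def is the whole word
my @inside     = grep { /\Bdef\B/ }  @words;   # def is strictly inside a word

print "start:  @word_start\n";
print "whole:  @whole_word\n";
print "inside: @inside\n";
```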


Quoting Characters
Another way to tell Perl that a special character is to be treated as a normal character is to precede it with the \Q escape sequence. When the Perl interpreter sees \Q, every character following the \Q is treated as a normal character until \E is seen.
Ex - /\Q^ab*/ matches any occurrence of the string ^ab*
/\Q^ab\E*/ matches ^a followed by zero or more occurrences of b
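
\Q is often combined with variable interpolation, so that the contents of a variable are matched literally. A sketch under that assumption, with made-up test strings:

```perl
use strict;
use warnings;

my $literal = "^ab*";

# Quoted: the pattern looks for the four literal characters ^ a b *.
my $quoted_hit  = ("x^ab*y" =~ /\Q$literal\E/) ? 1 : 0;
my $quoted_miss = ("abbb"   =~ /\Q$literal\E/) ? 1 : 0;

# Unquoted: ^ is an anchor and * a quantifier, so "abbb" matches.
my $unquoted = ("abbb" =~ /$literal/) ? 1 : 0;

# \E ends the quoting, so the trailing * is a quantifier again.
my $partial = ("^abbb" =~ /\Q^ab\E*/) ? 1 : 0;

print "$quoted_hit $quoted_miss $unquoted $partial\n";
```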


Escape Sequences for Special Characters


If we want our pattern to include a character that is normally treated as a special character, we precede the character with a backslash (\).

Ex - The pattern /\*+/ matches one or more occurrences of * in a string. To include a backslash in a pattern, we have to specify two backslashes: /\\+/ matches one or more backslashes.
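
Both escaped patterns can be tried on sample strings. This is only a sketch; the strings are invented, and parentheses are added around each pattern to capture what it matched.

```perl
use strict;
use warnings;

# \* is a literal asterisk, so \*+ matches a run of one or more asterisks.
my ($stars) = "a***b" =~ /(\*+)/;

# In single quotes each \\ is one backslash, so this string is dir\\file.
my $path = 'dir\\\\file';
my ($slashes) = $path =~ /(\\+)/;

print "stars: $stars, backslashes matched: ", length($slashes), "\n";
```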


Matching Any Letter or Number


As shown earlier, the pattern /a[0123456789]c/ matches a, followed by any digit, followed by c. There is another way of writing this: the pattern /a[0-9]c/ matches a0c, a1c, a2c, and so on up to a9c. Similarly, the range [a-z] matches any lowercase letter, and the range [A-Z] matches any uppercase letter. The pattern /[A-Z][A-Z]/ matches any two uppercase letters. To match any uppercase letter, lowercase letter, or digit, we can use the following range: /[0-9a-zA-Z]/


Substitution
The substitution operator is s///, written as s/pattern/replacement/.

The Perl interpreter searches for the pattern specified by the placeholder pattern. If it finds pattern, it replaces it with the string represented by the placeholder replacement.

The optional i, g, and o switches apply to substitution too.

The pattern to be replaced goes between the first and second delimiters.

The replacement pattern goes between the second and third delimiters.

There should be a variable on the left of =~, because the substitution modifies it.

Continued
$house = "henhouse";
$house =~ s/hen/dog/;
Now $house contains doghouse.

Refer to reg11.pl, reg12.pl, reg15, reg13.pl, reg14.pl


Options for the Substitution Operator


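Options  Description
i        Ignore case when matching the pattern.
g        Replace all occurrences of the pattern, not just the first.
o        Compile the pattern only once, even if it contains variables.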
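
The effect of the g and i switches named for substitution can be sketched as follows. The sample text is made up; the (my $copy = $text) =~ s/// idiom edits a copy so that $text stays unchanged.

```perl
use strict;
use warnings;

my $text = "hen Hen hen";

# No switches: only the first case-sensitive match is replaced.
(my $first_only = $text) =~ s/hen/dog/;

# g: every case-sensitive match is replaced.
(my $global = $text) =~ s/hen/dog/g;

# g and i together: every match is replaced regardless of case.
# s/// returns the number of substitutions it made.
my $count = (my $global_nocase = $text) =~ s/hen/dog/gi;

print "$first_only | $global | $global_nocase ($count)\n";
```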


Translation

The translation operator (tr///) is used to change individual characters, by default in the $_ variable. It requires two operands, like this:
tr/a/z/;
This statement translates all occurrences of a into z.
Refer to reg9.pl, reg10.pl


Translation Options

Options  Description
c        This option complements the match character list. In other words, the translation is done for every character that does not match the character list.
d        This option deletes any character in the match list that does not have a corresponding character in the replacement list.
s        This option reduces repeated instances of matched characters to a single instance of that character.
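
A short sketch of tr with and without its options. The sample string is invented, and =~ is used to apply tr to a copy of a named variable instead of $_.

```perl
use strict;
use warnings;

my $text = "banana  bread";   # note the two spaces

# Plain translation: every a becomes z.
(my $swapped = $text) =~ tr/a/z/;

# d: a has no replacement character, so it is deleted.
(my $deleted = $text) =~ tr/a//d;

# s: runs of the matched character (space) shrink to a single one.
(my $squeezed = $text) =~ tr/ //s;

# c: every character that is NOT a lowercase letter becomes an underscore.
(my $complemented = $text) =~ tr/a-z/_/c;

# With an empty replacement list and no options, tr only counts matches.
my $a_count = ($text =~ tr/a//);

print "$swapped\n$deleted\n$squeezed\n$complemented\n$a_count\n";
```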


