What is Perl?
Practical Extraction and Report Language
Perl is a portable scripting language.
No compiling is needed.
Runs on Windows, UNIX, Linux and Cygwin.
Fast and easy text processing capability.
1989
October 18 Perl 3.0 is released under the GNU General Public License
1991
March 21 Perl 4.0 is released under the GPL and the new Perl Artistic License
Now
Perl 6
Learning Perl
By Randal L. Schwartz et al. Published by O'Reilly
Programming Perl
By Larry Wall, Tom Christiansen and Jon Orwant Published by O'Reilly
Web Site
http://safari.oreilly.com
Run it as follows:
Numerical Literals
Integer
Floating Point
Scientific Notation
Underscores instead of commas for long numbers
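The literal values that went with this list were lost from the slide. A minimal sketch of each kind of numeric literal follows; the variable names and values are illustrative assumptions, not the original examples.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative values only; the slide's own examples were not preserved.
my $integer    = 12345;          # integer
my $float      = 12345.67;       # floating point
my $scientific = 6.02e23;        # scientific notation
my $tiny       = 1.6e-19;        # scientific notation, negative exponent
my $readable   = 4_294_967_296;  # underscores instead of commas

print "$integer $float $readable\n";
```

The underscores are purely visual: Perl ignores them when it reads the literal.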
String Literals
"There is more than one way to do it!"
'Just don\'t create a file called -rf.'
"Beauty?\nWhat's that?\n"
"Real programmers can write assembly in any language."
Types of Variables
Types of variables:
Scalar variables : $a, $b, $c
Array variables : @array
Hash variables : %hash
File handles : STDIN, STDOUT, STDERR
$a = 5;
$a = "perl";
$language = "Perl";
if ($language == "Perl") # numeric comparison: wrong operator for strings
if ($language eq "Perl") # string comparison: correct
Comparison                  Numeric   String
Greater than                >         gt
Greater than or equal to    >=        ge
Less than                   <         lt
Less than or equal to       <=        le
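The difference between the numeric and string operators is easiest to see by running both on the same values. The sketch below assumes nothing beyond core Perl; the sample values and variable names are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $language = "Perl";

# eq compares the operands as strings.
my $string_equal = ($language eq "Perl") ? 1 : 0;

# == compares the operands as numbers. Non-numeric strings are treated
# as 0, so two different words compare as equal (and warn under -w).
my $numeric_equal_other;
{
    no warnings 'numeric';
    $numeric_equal_other = ($language == "Java") ? 1 : 0;
}

# The same pair of values can order differently as numbers and as strings.
my $numeric_greater = (10 > 9)        ? 1 : 0;   # 10 is greater than 9
my $string_less     = ("10" lt "9")   ? 1 : 0;   # "1" sorts before "9"

print "eq on same string: $string_equal\n";
print "== on different strings: $numeric_equal_other\n";
print "10 > 9: $numeric_greater, '10' lt '9': $string_less\n";
```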
String Functions
Convert to upper case
$name = uc($name);
Convert only the first letter to upper case
$name = ucfirst($name);
Variable Interpolation
Perl looks for variables inside double-quoted strings and replaces them with their values
Character Interpolation
List of character escapes that are recognized
\n newline
\t tab
\r carriage return
Common Example :
# a number prints 22 looks like a string, but ... # will print 40!
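The code that these comments annotated did not survive extraction. The following sketch shows the same two ideas, interpolation in double quotes and automatic string-to-number conversion; the variable names and values are assumptions chosen to reproduce the 22 and 40 mentioned in the comments.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $name = "Larry";
my $double = "Hello, $name!";   # double quotes interpolate the variable
my $single = 'Hello, $name!';   # single quotes keep the text literally

my $number = 2;      # a number
my $text   = "20";   # looks like a string, but ...

my $sum_number = $number + 20;  # 22
my $sum_text   = $text + 20;    # ... Perl converts it: 40

print "$double\n$single\n$sum_number\n$sum_text\n";
```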
If ... else ... Statements
Unless ... else Statements
While Loop
Until Loop
For Loops
if...else statement
if...else statement - use this statement to execute some code if a condition is true and other code if the condition is false
if (condition) { statements } else { statements }
unless ($weather eq "Rain") {
    print "Dress as you wish!\n";
} else {
    print "Umbrella!\n";
}
While Loop
Example :
Until Loop
The until loop evaluates an expression repeatedly until it becomes true
For Loops
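The slide examples for the three loop forms were not preserved. As a rough sketch, the same count from 0 to 2 can be written with each of them; the counter names are arbitrary.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my (@while_out, @until_out, @for_out);

# while: repeat as long as the condition is true
my $i = 0;
while ($i < 3) {
    push @while_out, $i;
    $i++;
}

# until: repeat as long as the condition is false
my $j = 0;
until ($j >= 3) {
    push @until_out, $j;
    $j++;
}

# for: initialisation, condition and increment in one header
for (my $k = 0; $k < 3; $k++) {
    push @for_out, $k;
}

print "while: @while_out\n";
print "until: @until_out\n";
print "for:   @for_out\n";
```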
Exercise
Use a loop structure and code a program that
#!/usr/bin/perl
for ($i=0, $j=0; $i<100; $i++) {
    if ($j==3) { $chain.="B"; $j=0; }
    else { $chain.="A"; $j++; }
    print "$chain\n";
}
Exercise
for ($i=0; $i<100; $i++) {
    $v = rand 100;
    #print "Patient $i $v\n";
    printf "Patient %d %.2f\n\n", $i, $v;
    # %s : strings
    # %d : integer
    # %f : floating point
}
Arrays
Array variable is denoted by the @ symbol
@array = ("Larry", "Curly", "Moe");
To access the whole array, use the whole array
Array indexes start at 0!
To access one element of the array : use $
Why? Because every element in the array is scalar
print "$array[0]\n"; # prints : Larry
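To find the index of the last element of the array
print $#array; # prints 2 in the previous example
Note another way to find the number of elements in an array is to assign the array to a scalar: $count = @array;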
# DEFINE AN ARRAY
@coins = ("Quarter","Dime","Nickel");
# PRINT THE ARRAY
print "@coins";
print "<br />";
print @coins;
There is no specific slice() function for slicing up elements of an array. Instead, Perl allows us to create a new array with elements of another array using array indexing.
myrangefriend.pl:
#!/usr/bin/perl
print "content-type: text/html \n\n"; #HTTP HEADER
# SEQUENTIAL ARRAY
@nums = (1..200);
@slicenums = @nums[10..20,50..60,190..200];
print "@slicenums";
Output of myrangefriend.pl:
11 12 13 14 15 16 17 18 19 20 21 51 52 53 54 55 56 57 58 59 60 61 191 192 193 194 195 196 197 198 199 200
Sorting Arrays
Perl has a built-in sort function
Two ways to sort:
Default : sorts in standard string comparison order
sort LIST
Usersub: create your own subroutine that returns an integer less than, equal to or greater than 0
sort USERSUB LIST
Array Operations
push(@ARRAY, LIST)
add the LIST to the end of the @ARRAY
pop(@ARRAY)
remove and return the last element of @ARRAY
unshift(@ARRAY, LIST)
add the LIST to the front of the @ARRAY
Check 04_arrayOps.pl
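The file 04_arrayOps.pl is not reproduced here. A stand-in sketch of the four basic operations, using made-up element values, could look like this:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @stooges = ("Larry", "Curly");

push(@stooges, "Moe");          # add to the end:   Larry Curly Moe
unshift(@stooges, "Shemp");     # add to the front: Shemp Larry Curly Moe

my $last  = pop(@stooges);      # remove and return the last element
my $first = shift(@stooges);    # remove and return the first element

print "removed from end: $last\n";
print "removed from front: $first\n";
print "remaining: @stooges\n";
```

push and pop work on the end of the array, unshift and shift on the front, so any pair of them can be combined to build a stack or a queue.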
Arrays
If you want to assign the first value of an array into a scalar, the script would be :
($result)=@array;
To assign the first two elements of an array into scalar values :
($result1,$result2)=@array;
To get the number of elements in an array, assign the array to a scalar variable :
$result=@array;
Arrays
To get the last index number in an array, add in a $# and it will provide :
$result = $#array;
To get the number of elements, the amount will have to be adjusted by one :
$result = $#array+1;
To copy one array to a new (second) array :
@array2 = @array1;
To add a new value to the beginning of an array, the UNSHIFT command is used :
unshift(@array, $newelement);
Arrays
To add a new value to the end of an array there are two options :
push(@array, $newelement);
@array=(@array, $newelement);
Next is combining two arrays into a new array :
@newarray=(@firstarray,@secondarray);
To remove the first value of an array the SHIFT command is used :
shift(@array);
You can store that removed value into a scalar at the same time too :
$result=shift(@array);
Arrays
To remove the last element of an array :
pop(@array);
To remove the last element of an array and store it in a scalar :
$result=pop(@array);
To replace a specific element in an array :
$array[number]=$newelement;
# sort lexically
@articles = sort @files;
# same thing, but with explicit sort routine
@articles = sort {$a cmp $b} @files;
# same thing in reversed order
@articles = sort {$b cmp $a} @files;
# sort numerically ascending
@articles = sort {$a <=> $b} @files;
# sort numerically descending
@articles = sort {$b <=> $a} @files;
# sort using explicit subroutine name
sub byage {
    $age{$a} <=> $age{$b}; # presuming integers
}
@sortedclass = sort byage @class;
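Why the numeric comparison matters becomes clear when the same list is sorted both ways. This is a small sketch with an arbitrary list of numbers; the variable names are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @numbers = (10, 9, 100, 1);

# Default sort compares the values as strings, so "100" comes before "9".
my @lexical = sort @numbers;

# The <=> operator compares the values as numbers.
my @ascending  = sort { $a <=> $b } @numbers;
my @descending = sort { $b <=> $a } @numbers;

print "lexical:    @lexical\n";
print "ascending:  @ascending\n";
print "descending: @descending\n";
```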
Manipulating Arrays
Split a string into words and put into an array
Split into characters
@stooge = split(//, "curly"); # array @stooge has 5 elements: c, u, r, l, y
Split cont..
Split on any character
@array = split(/:/, "10:20:30:40"); # array has 4 elements : 10, 20, 30, 40
Split on Multiple White Space
@array = split(/\s+/, "this is a test"); # array has 4 elements : this, is, a, test
Arrays to Strings
Array to delimiter-separated string
@array = ("Larry", "Curly", "Moe");
$string = join(";", @array); # string = Larry;Curly;Moe
Array of characters to string
@stooge = ("c", "u", "r", "l", "y");
$string = join("", @stooge); # string = curly
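split and join are inverses of each other, which the following sketch tries to make concrete. The input strings are made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# String to array: split on a single delimiter character.
my $record = "10:20:30:40";
my @fields = split(/:/, $record);

# Array back to string, this time with a different delimiter.
my $joined = join("-", @fields);

# Runs of spaces and tabs count as one separator with \s+.
my @words = split(/\s+/, "this  is \t a   test");

print scalar(@fields), " fields, rejoined as $joined\n";
print scalar(@words), " words: @words\n";
```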
To remove the last element of the array (LIFO)
$element = pop @array;
print $element; # prints Shemp
# @array now has the original elements (Larry, Curly, Moe)
Associative Arrays
In an associative array, each ID key is associated with a value. When storing data about specific named values, a numerical array is not always the best way to do it. With associative arrays we can use the names as keys and assign values to them.
$ages{'Raja'} = "32";
Multidimensional Arrays
A Perl array is a data type that allows you to
Two-dimensional Arrays
@shop = (
    ["rose", 1.25, 15],
    ["daisy", 0.75, 25],
    ["orchid", 1.15, 7]
);
Three-dimensional Arrays
@shop = (
    [ ["rose", 1.25, 15], ["daisy", 0.75, 25], ["orchid", 1.15, 7] ],
    [ ["rose", 1.25, 15], ["daisy", 0.75, 25], ["orchid", 1.15, 7] ],
    [ ["rose", 1.25, 15], ["daisy", 0.75, 25], ["orchid", 1.15, 7] ]
);
@tab=(["Monday","Tuesday"], ["Morning","Afternoon","Evening"]);
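In Perl a two-dimensional array is an array of array references, and elements are reached with two subscripts. A minimal sketch using the flower-shop data; the column meanings (name, price, stock) are an assumption about what the three values represent.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Each row is an anonymous array: name, price, stock (assumed meaning).
my @shop = (
    ["rose",   1.25, 15],
    ["daisy",  0.75, 25],
    ["orchid", 1.15,  7],
);

# Row 1, column 0
my $second_name = $shop[1][0];

# Walk every row and unpack its columns.
my $total_stock = 0;
for my $row (@shop) {
    my ($name, $price, $stock) = @$row;
    printf "%-7s %.2f %d\n", $name, $price, $stock;
    $total_stock += $stock;
}

print "second flower: $second_name, total stock: $total_stock\n";
```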
Chop
The chop function is used to "chop off" the last character of a string
Chomp
the chomp command on that last line instead.
$who_are_you = "you\n";
chomp ($who_are_you);
print "You are $who_are_you!";
Length
The length function simply gives you back the number of characters in a string
Substring
The substring function (substr) is a way to get a portion of a string
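The examples for length and substr were lost from these slides. A short sketch with an arbitrary sample string shows how the two are typically called:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $string = "Practical Extraction";

my $len    = length($string);        # number of characters
my $first  = substr($string, 0, 9);  # 9 characters starting at offset 0
my $rest   = substr($string, 10);    # from offset 10 to the end
my $ending = substr($string, -3);    # negative offset counts from the end

print "$len\n$first\n$rest\n$ending\n";
```

Offsets start at 0, just like array indexes.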
# reverse word order
$string = 'Yoda said, "can you see this?"';
@allwords = split(" ", $string);
$revwords = join(" ", reverse @allwords);
print $revwords, "\n";
Output:
this?" see you "can said, Yoda
$gnirts = reverse($string);
@sdrow = reverse(@words);
Paragraphs
Hashes
Hashes are complex list data, like arrays except they link a key to a value. To define a hash, we use the percent (%) symbol before the name.
%coins = ("Quarter", 25, "Dime", 10, "Nickel", 5);
print %coins;
Output (the order of the pairs in a hash is not guaranteed):
Nickel5Dime10Quarter25
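Printing a whole hash is rarely useful, since the pairs run together in no fixed order. A sketch of the more common pattern, looking up one key and iterating over sorted keys, using the same coin data:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %coins = ("Quarter" => 25, "Dime" => 10, "Nickel" => 5);

# Look up a single value by its key.
my $dime = $coins{"Dime"};

# Sort the keys to get a predictable order.
my @lines;
for my $name (sort keys %coins) {
    push @lines, "$name = $coins{$name}";
}

print "A Dime is worth $dime cents\n";
print "$_\n" for @lines;
```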
Special Variables
Global Scalar Special Variables.
$@ ($EVAL_ERROR) The Perl syntax error message from the last eval command.
$$ ($PROCESS_ID or $PID) The pid of the Perl process running this script.
$< ($REAL_USER_ID or $UID) The real user ID (uid) of this process.
argument
A piece of data supplied to a program, subroutine, function, or method to tell it what it's supposed to do. Also called a "parameter".
Subroutines
A subroutine is a named block of code, also called a function.
Separate from the main part of the program
Usually put subroutines at end of file
Subroutines can take arguments and return values
print(), chomp(), chop() are built-in subroutines
Defining a subroutine
sub header {
    print "-" x 79, "\n";
    print "December Sales Report\n";
    print "-" x 79, "\n";
}
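The header subroutine above takes no arguments. As a sketch of how arguments, default values and return values fit together, here is a variant that reads its parameters from @_; the name make_header and its parameters are hypothetical, not part of the original slides.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical variant of header(): returns the text instead of printing
# it, and takes the title and an optional line width as arguments.
sub make_header {
    my ($title, $width) = @_;                # arguments arrive in @_
    $width = 79 unless defined $width;       # default value
    my $rule = "-" x $width;
    return "$rule\n$title\n$rule\n";
}

print make_header("December Sales Report");  # uses the default width
print make_header("Summary", 20);            # explicit width
```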
command-line arguments
./argv.pl 1 2 3 4
perl argv.pl 1 2 3 4
thanks, you gave me 4 command-line arguments: 1 2 3 4
%f: a floating point number in decimal notation
%e: a floating point number in scientific notation
@ARGV array
Command-line arguments are stored in an array named @ARGV.
$ARGV[0] contains the first argument, $ARGV[1] contains the second, and so on.
The number of arguments on the command line is $#ARGV + 1.
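The source of argv.pl is not shown on the slides. A plausible sketch that would produce the output quoted earlier is given below; the helper name describe_args is an assumption introduced so the logic can be called directly.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: builds the message for any list of arguments.
sub describe_args {
    my @args  = @_;
    my $count = $#args + 1;    # last index plus one = number of elements
    return "thanks, you gave me $count command-line arguments: @args\n";
}

# @ARGV holds whatever followed the script name on the command line.
print describe_args(@ARGV);
```

Run as perl argv.pl 1 2 3 4, this prints the count 4 followed by the four values.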
argument
Reading command line arguments from perl
Passing arguments
Accessing Function Parameters
Setting Default Values for Function Parameters
file
A named collection of data, usually stored on disk in a directory in a filesystem. Roughly like a document, if you're into office metaphors. In modern filesystems, you can actually give a file more than one name. Some files have special properties, like directories and devices.
Open files / directories
open(INFILE,"myfile"): reading
open(OUTFILE,">myfile"): writing
open(OUTFILE,">>myfile"): appending
open(INFILE,"someprogram |"): reading from program
open(OUTFILE,"| someprogram"): writing to program
opendir(DIR,"mydirectory"): open directory
Operations on an open file handle
$a = <INFILE>: read a line from INFILE into $a
@a = <INFILE>: read all lines from INFILE into @a
$a = readdir(DIR): read a filename from DIR into $a
@a = readdir(DIR): read all filenames from DIR into @a
read(INFILE,$a,$length): read $length characters from INFILE into $a
print OUTFILE "text": write some text in OUTFILE
Close files / directories
close(FILE): close a file
closedir(DIR): close a directory
Other
binmode(HANDLE): change file mode from text to binary
unlink("myfile"): delete file myfile
rename("file1","file2"): change name of file file1 to file2
mkdir("mydir"): create directory mydir
rmdir("mydir"): delete directory mydir
chdir("mydir"): change the current directory to mydir
system("command"): execute command command
die("message"): exit program with message message
warn("message"): warn user about problem message
Example
open(INFILE,"myfile") or die("cannot open myfile!");
About $_
Holds the content of the current variable
Examples:
while(<INFILE>) # $_ contains the current line read
foreach (@array) # $_ contains the current element in @array
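The calls listed above can be combined into a small write-then-read round trip. This sketch deliberately uses the three-argument form of open with lexical file handles rather than the two-argument bareword form shown on the slide, since it avoids surprises with unusual file names; the file name example_output.txt is a placeholder.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $path = "example_output.txt";   # placeholder file name

# Write two lines to the file.
open(my $out, '>', $path) or die "cannot open $path for writing: $!";
print $out "first line\n";
print $out "second line\n";
close($out);

# Read the file back line by line.
my @lines;
open(my $in, '<', $path) or die "cannot open $path for reading: $!";
while (my $line = <$in>) {
    chomp $line;                   # strip the trailing newline
    push @lines, $line;
}
close($in);

unlink($path);                     # delete the file again

print "read: $_\n" for @lines;
```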
File Handlers
Opening a File:
open (SRC, "my_file.txt");
Reading from a File
$line = <SRC>; # reads up to a newline character
Closing a File
close (SRC);
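The <> operator reads from the file(s) named on the command line, or from STDIN if none are given:
$line = <>;
If the program is run as ./prog.pl file.txt, this will automatically open file.txt and read the first line.
If the program is run as ./prog.pl file1.txt file2.txt, this will first read in file1.txt and then file2.txt ... you will not know when one ends and the other begins.
If the program is run without arguments, it will wait for you to enter text at the prompt, and will continue until you enter the EOF character (CTRL-D in UNIX).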
three stooges
#!/usr/local/bin/perl
%stooges = (larry => 1, moe => 1, curly => 1);
print "Enter your name: ? ";
$name = <STDIN>;
chomp $name;
if ($stooges{ lc($name) }) {
    print "You are one of the Three Stooges!!\n";
} else {
    print "Sorry, you are not a Stooge!!\n";
}
And
File2.txt a b c
Write a program that takes the two files as arguments and outputs a third file that looks like:
File3.txt 1 a 2 b 3
Chomp : function that removes the newline character from the end of a string.
$line = "this is the first line of text\n";
chomp $line; # removes the new line character
print $line; # prints "this is the first line of text" without returning
Chop : function that chops off the last character of a string.
$line = "this is the first line of text";
chop $line;
print $line; # prints "this is the first line of tex"
pattern
A template used in pattern matching.
interpretation as a regular expression.
pattern matching
Taking a pattern, usually a regular expression, and trying the pattern various ways on a string to see whether there's any way to make it fit.
Pattern Matching
Introduction
Copying and Substituting Simultaneously
Matching Letters
Matching Words
Commenting Regular Expressions
Finding the Nth Occurrence of a Match
Matching Multiple Lines
Reading Records with a Pattern Separator
Extracting a Range of Lines
Matching Shell Globs as Regular Expressions
Speeding Up Interpolated Matches
Testing for a Valid Pattern
Honoring Locale Settings in Regular Expressions
Approximate Matching
Matching from Where the Last Pattern Left Off
Greedy and Non-Greedy Matches
Detecting Duplicate Words
Expressing AND, OR, and NOT in a Single Pattern
Matching Multiple-Byte Characters
Matching a Valid Mail Address
Matching Abbreviations
Program: urlify
Program: tcgrep
Regular Expression Grabbag
Introduction
match( $string, $pattern );
$meadow !~ m/sheep/;
$meadow =~ s/old/new/;
immediately followed by a "v", then by an "i", then by an "n", and then finally by an "e".
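Perl builds matching into the language through the =~ and !~ operators rather than through function calls. A minimal sketch of the three forms, with a made-up sample sentence:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $meadow = "three sheep in the old meadow";

# =~ with m// is true when the pattern matches somewhere in the string.
my $has_sheep = ($meadow =~ m/sheep/) ? 1 : 0;

# !~ is the negated form: true when the pattern does NOT match.
my $no_goats = ($meadow !~ m/goat/) ? 1 : 0;

# =~ with s/// replaces the first match in place.
$meadow =~ s/old/new/;

print "has sheep: $has_sheep, no goats: $no_goats\n";
print "$meadow\n";
```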
three aspects
greed
eagerness
Backtracking
Greed
It is the principle that if a quantifier (like *) can match a varying number of times, it will prefer to match as long a substring as it can.
Eagerness is the notion that the leftmost match wins. The engine is very eager to return you a match as quickly as possible, sometimes even before you are expecting it. Consider the match "Fred" =~ /x*/. Formally, /x*/ means zero or more x's, and in this case, zero sufficed for the eager matcher.
"Fred" =~ /x*/
eagerness
$string = "good food";
$string =~ s/o*/e/;
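Which of these is the result?
good food
geod food
geed food
geed feed
ged food
ged fed
egood food
The answer is the last one, egood food: the earliest point at which zero or more o's can match is right at the beginning of the string.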
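The effect of eagerness can be checked by running the substitution with slightly different quantifiers. The sketch copies the string before each substitution so the three variants can be compared side by side; the variable names are arbitrary.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $string = "good food";

# o* may match zero o's, and the leftmost match wins: the empty
# match before "g" is replaced.
(my $star = $string) =~ s/o*/e/;

# o+ needs at least one o, so the first run of o's is replaced.
(my $plus = $string) =~ s/o+/e/;

# With /g every run of o's is replaced.
(my $global = $string) =~ s/o+/e/g;

print "$star\n$plus\n$global\n";
```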
Pattern-Matching Modifiers
non-overlapping matches
Special Variables
$string = "And little lambs eat ivy";
$string =~ /l[^s]*s/;
print "($`) ($&) ($')\n";
Output:
(And ) (little lambs) ( eat ivy)
Instead of:
$dst = $src;
$dst =~ s/this/that/;
use:
($dst = $src) =~ s/this/that/;
/[A-Za-z'-]+/ # as many letters, apostrophes, and hyphens
/\b([A-Za-z]+)\b/
    if (++$count == $WANT) {
        print "The third fish is a $1 one.\n";
        # Warning: don't `last' out of this loop
    }
}
The third fish is a red one.
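The beginning of this loop was cut off on the slide. A self-contained sketch of the same counting technique follows; the input sentence and the pattern are assumptions chosen to reproduce the quoted output, and the result is stored in a variable instead of leaving the loop early.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed input and pattern; only the loop body appears on the slide.
my $text  = "One fish two fish red fish blue fish";
my $WANT  = 3;
my $count = 0;
my $found;

# In scalar context, /g continues from where the previous match ended,
# so each pass through the loop sees the next occurrence.
while ($text =~ /(\w+)\s+fish\b/gi) {
    if (++$count == $WANT) {
        $found = $1;    # remember the match and let the loop finish
    }
}

print "The third fish is a $found one.\n";
```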
the special characters . (any character but newline), ^ (start of string), and $ (end of string) don't seem to work for you. This might happen if you're reading in multiline records or the whole file at once.
ls *.c
Lists all the files in the current directory that are postfixed '.c'
ls *.txt
Lists all the files in the current directory that are postfixed '.txt'
characters:
Literals
Metacharacters
Match more than just characters
Match line position
^ start of a line
$ end of a line
Now suppose we want only the words that end in 'ing' rather than just contain 'ing'. How would we change our regular expressions to accomplish this:
Previous Regular Expression: $word =~ m/ing/
New Regular Expression: $word =~ m/ing$/
Now suppose we want the words in our text that do not end in 'ing'. How would we change our regular expressions to accomplish this:
Previous Regular Expression: $word =~ m/ing$/
New Regular Expression: !($word =~ m/(ing)$/)
Matching Interrogations
Literal Metacharacters
Suppose that you actually want to look for all strings
space but by tabs, returns, etc... Let's modify our split function to incorporate multiple white space
#!/usr/local/bin/perl
while(<>) {
    chomp;
    @words = split /\s+/, $_;
    foreach $word (@words) {
        if ($word =~ m/ing$/) { print "$word\n"; }
    }
}
'word' : \b
Examples:
/Jeff\b/ Match Jeff but not Jefferson
/Carol\b/ Match Carol but not Caroline
/Rollin\b/ Match Rollin but not Rolling
/\bform/ Match form or formation but not Information
/\bform\b/ Match form but neither information nor formation
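The claims in this table are easy to verify by testing each pattern against the words it mentions. The sketch below records the results in a hash; the hash name and its keys are just labels for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %result;

# \b after the name: the name must end there.
$result{"Jeff"}      = ("Jeff"      =~ /Jeff\b/) ? 1 : 0;
$result{"Jefferson"} = ("Jefferson" =~ /Jeff\b/) ? 1 : 0;

# \b before the word: the word must start there.
$result{"start:formation"}   = ("formation"   =~ /\bform/) ? 1 : 0;
$result{"start:information"} = ("information" =~ /\bform/) ? 1 : 0;

# \b on both sides: the whole word only.
$result{"whole:form"}      = ("form"      =~ /\bform\b/) ? 1 : 0;
$result{"whole:formation"} = ("formation" =~ /\bform\b/) ? 1 : 0;

for my $case (sort keys %result) {
    print "$case: ", ($result{$case} ? "match" : "no match"), "\n";
}
```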
DOT Metacharacter
The DOT Metacharacter, '.' symbolizes any character
/.oat/ Would possibly return : boat, coat, goat
Note: remember '.*' usually means a bunch of anything,
PIPE Metacharacter
The PIPE Metacharacter is used for alternation
/Bridget (Thomson|McInnes)/ Match Bridget Thomson or Bridget McInnes, but the match never spans Bridget Thomson McInnes as a whole
/B|bridget/ Match B or bridget
/^(B|b)ridget/ Match Bridget or bridget at the beginning of a line
NOT
only get all words that end in 'ing' but also 'ed'. How would we change our regular expressions to accomplish this:
Previous Regular Expression: $word =~ m/ing$/
New Regular Expression: $word =~ m/(ing|ed)$/
The ? Metacharacter
The metacharacter, ?, indicates that the character immediately preceding it occurs zero or one time
/worl?ds/ Matches words or worlds
/m?ethane/ Matches ethane or methane
The * Metacharacter
The metacharacter, *, indicates that the character immediately preceding it occurs zero or more times
/ab*c/
Matches any string that contains an a, optionally followed by a sequence of b's, and then a c.
Now suppose we want to create a list of all the words in our text that end in 'ing' or 'ings'. How would we change our regular expressions to accomplish this:
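One way to answer the question is to make the final s optional with ?. The sketch below tries the ? and * quantifiers on small word lists, anchoring the first two patterns with ^ and $ so that only whole words count; the word lists are made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# ? : the preceding character may appear zero or one time.
my @optional_l = grep { /^worl?ds$/ } qw(words worlds worllds);

# * : the preceding character may appear zero or more times.
my @any_b = grep { /^ab*c$/ } qw(ac abc abbbc adc);

# Words ending in 'ing' or 'ings': make the trailing s optional.
my @ing_words = grep { /ings?$/ } qw(sing sings singer ringing);

print "worl?ds matches: @optional_l\n";
print "ab*c matches:    @any_b\n";
print "ings? matches:   @ing_words\n";
```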
Exercise
For each of the strings (a)--(e), say which of the patterns (i)--(xii) it matches. Where there is a match, what would be the values of $MATCH, $1, $2, etc.?
Strings:
(a) the quick brown fox jumped over the lazy dog
(b) The Sea! The Sea!
(c) (.+)\s*\1
(d) 9780471975632
(e) C:\DOS\PATH\NAME
Patterns:
(i) /[a-z]/
(ii) /(\W+)/
(iii) /\W*/
(iv) /^\w+$/
(v) /[^\w+$]/
(vi) /\d/
(vii) /(.+)\s*\1/
(viii) /((.+)\s*\1)/
(ix) /(.+)\s*((\1))/
(x) /\DOS/
(xi) /\\DOS/
(xii) /\\\DOS/
Exercise (answers)
1,2,3 1,2,3,5 1,2,3,5 4 1,2,3,5 3,4 2, 2 5 5 5
Modifying Text
Match
Up to this point, we have seen attempts to match a given regular expression
Example : $variable =~ m/regex/
Substitution
Takes match one step further : if there is a match, then replace it with the given string
Example : $variable =~ s/regex/replacement/
Substitution Example
Suppose when we find all our words that end in 'ing' we
$target =~ /(\d+)/ (the values below assume $target is "I have 25 apples")
$` => "I have " A copy of the target text before the match
$' => " apples" A copy of the target text after the match
$1, $2, $3, etc. $1 = 25 The text matched by the 1st, 2nd, etc., set of parentheses.
Note : $0 is not included here
$+
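These variables can be inspected directly after a successful match. In the sketch below the target string is an assumption reconstructed from the values shown on the slide; the variables are copied immediately because the next successful match would overwrite them.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $target = "I have 25 apples";   # assumed sample text

my ($before, $matched, $after, $group);
if ($target =~ /(\d+)/) {
    # Copy right away: later matches reset these variables.
    ($before, $matched, $after) = ($`, $&, $');
    $group = $1;
}

print "before: [$before]\n";
print "match:  [$matched]\n";
print "after:  [$after]\n";
print "group:  [$group]\n";
```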
end in 'ing' without splitting our line of text into an array of words
#!/usr/local/bin/perl -w
while(<>) {
    chomp $_;
    # while (not if) so that every match on the line is printed
    while ($_ =~ /([A-Za-z]*ing\b)/g) {
        print "$&\n";
    }
}
Example
#!/usr/local/bin/perl
$exp = <STDIN>;
chomp $exp;
if ($exp =~ /^([A-Za-z+\s]*)\bcrave\b([\sA-Za-z]+)/) {
    print "$1\n";
    print "$2\n";
}
Run Program with string : I crave to rule the world!
Results:
I
 to rule the world
Example
#!/usr/local/bin/perl
$exp = <STDIN>;
chomp $exp;
if ($exp =~ /\bcrave\b/) {
    print "$`\n";
    print "$&\n";
    print "$'\n";
}
Run Program with string : I crave to rule the world!
Results:
I
crave
 to rule the world!