
INTRODUCTION TO PERL

Prepared with the help of the internet, Google and countless contributors to this PPT.
Thanks to all of them.

Presented By
Viswanatha Yarasi
viswanatha_yarasi@satyam.com
(9849743894)
PERL Programming
• Objectives
– Introduction to PERL Programming.
– Where to get perl.
– Writing a perl script
– How to execute the perl interpreter.
– Variables
• Scalars
• Arrays
• Hashes
Objectives Contd.
– Using strict.
– Built in help – perldoc.
– Conditional and looping statements
– Built in functions.
– Regular expressions.
– File/Directory handling.
– Input/Output
– Functions
What Is Perl
– Practical Extraction and Report Language
– Pathologically Eclectic Rubbish Lister?
– Perl is a High-level Scripting language
– Released in 1987 by Larry Wall
– Faster than sh or csh, slower than C
– More powerful than C, and easier to use
– No need for sed, awk, tr, wc, cut, …
Larry Wall
Larry Wall Quotes

• “Even if you aren't in doubt, consider the mental


welfare of the person who has to maintain the
code after you, and who will probably put parens
in the wrong place. “
• "It is easier to port a shell than a shell script."
• “The three chief virtues of a programmer are:
Laziness, Impatience and Hubris”
• “And don't tell me there isn't one bit of difference
between null and space, because that's exactly
how much difference there is. :-) “
What Is Perl (Contd.)
– Compiles at run-time
– Available for Unix, PC, Mac
– Best Regular Expressions on Earth
– Originally designed to address the shortcomings of Unix scripts.
– The Swiss Army chainsaw of scripting
languages
– Open source and free.
What Is Perl (Contd.)
– Thousands of modules available from
http://www.cpan.org
– “More than one way to skin a cat”
– Can be used in conjunction with other
languages like C,Python, Java etc.
– Can be difficult to read.
What Is Perl Good For?
– Quick scripts, complex scripts
– Parsing & restructuring data files
– CGI-BIN scripts
– Biotechnology
– Prototypes
– System Administration
– QA
– High-level programming
• Networking libraries
• Graphics libraries
• Database interface libraries
Strengths of Perl
• Text processing (pattern matching)

• List processing

• Database access

• System language
What Is Perl Bad For?
– Compute-intensive applications (use C)
– Hardware interfacing (device drivers…)
Where To Get Perl
– Latest release is 5.8.5
– Most used is 5.6.1
– Download from http://www.cpan.org/ports/
for the OS being used
– Easy installation of Perl on Windows from
• http://www.cygwin.com
– For Linux/Windows/Solaris
• http://www.activestate.com/Products/ActivePerl/
• http://www.perl.com/download.csp
Perl is reasonably well
documented!
• Programming Perl
– Wall & Schwartz; O'Reilly/Nutshell
– the “camel book”
• Programming Perl
– Wall, Christiansen & Schwartz; O'Reilly
– the other camel book
• www-cgi.cs.cmu/cgi-bin/perl-man
– html-based manual
Perl is an interpreted language
• Program is text file
• Perl loads it, compiles into internal form
• Executes the intermediate code
Perl scripts
– Writing a perl script
#!/usr/bin/perl -w
Statements(;)
Comments(#)

– Editing a perl script


vi script1.pl
#!/usr/bin/perl -w
#print some text
print "Hello World\n";
Command Line Options
• From the shell: /usr/bin/perl [optional
parameters]
• As a shell script: #!/usr/bin/perl
[optional parameters]
• -c - check syntax without doing execution
• -d - run the Perl debugger
• -e - specify one line of script (like sed)
• -v - print minimal version information
• -V - print verbose version information
• -w - prints VERY useful syntax and
runtime warnings; everybody should make
a habit of testing their scripts with this!
Perl scripts (Contd.)
• Running a perl script
./script1.pl
• Make your file executable (chmod u+x)!!
• chmod u+x script1.pl
or
• chmod 755 script1.pl
Hello World
$a = 123;
$b = 55;
$c = $a + $b;
$d = "kakukk\n";
$d = 'kakukk\n' if $c == 178;
if( $d eq "kakukk\n" ){
print "Hello World!\n";
}else{
print "This is not a good day!\n";
}
OUTPUT:
This is not a good day!
Variables
• Variables are containers to hold values
• Values can change within a script
• Types
– Scalars – single pieces of information
– Arrays – lists of information
– Hashes – 'look-up' table of information
– Special variables, like $_ or $/
Scalars
• Contain single pieces of info
• Naming conventions-
• Preceded by '$'
• No spaces (use '_')
• Usually lowercase
• e.g. $test_scalar
• Store various types including strings and
numbers
Scalars
– Scalar values
$days         # the simple scalar value "days"
$days[28]     # the 29th element of array @days
$days{'Feb'}  # the 'Feb' value from hash %days
$#days        # the last index of array @days
Scalars
– Scalar values
$abc = 12345;
$abc = 12345.67;
$abc = 0xffff; # hexadecimal
$abc = 0377; # octal
$abc = 'a simple string\n';
$abc = "a string with a newline\n";
Scalars
$a = 123;
$b = 55;
$c = $a + $b;
$d = "kakukk\n";
$d = 'kakukk\n' if $c == 178;
if( $d eq "kakukk\n" ){
print "Hello World!\n";
}else{
print "This is not a good day!\n";
}

• Variables start with $
• Because of the $ prefix, variable names cannot clash with reserved words
• $else $if $while are good variables
Scalars
$a = 123;
$b = 55;
$c = $a + $b;
$d = "kakukk\n";
$d = 'kakukk\n' if $c == 178;
if( $d eq "kakukk\n" ){
print "Hello World!\n";
}else{
print "This is not a good day!\n";
}

• "kakukk\n" is an interpolated string
• 'kakukk\n' is NOT interpolated, 8 characters
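The difference between the two quoting styles can be checked with length. This is a minimal sketch; the variable names are made up for the illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $double = "kakukk\n";   # \n is interpolated into one newline character
my $single = 'kakukk\n';   # backslash and n stay as two literal characters

my $len_double = length($double);
my $len_single = length($single);

print "double-quoted length: $len_double\n";   # 7
print "single-quoted length: $len_single\n";   # 8
```

This is why the eq test in the Hello World script fails once $d holds the single-quoted version.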
Scalars
$n = 1234;         # decimal integer
$n = 0b1110011;    # binary integer
$n = 01234;        # octal integer
$n = 0x1234;       # hexadecimal integer
$n = 12.34e-56;    # exponential notation
$n = "-12.34e56";  # number specified as a string
$n = "1234";       # number specified as a string
Scalars Example
./script2.pl
• $bit_of_text assigned value with '='
#!/usr/bin/perl -w
#create variable with value of 'Hello World'
$bit_of_text = "Hello World\n";
print $bit_of_text;
Simple Arithmetic
#!/usr/bin/perl -w
$number1 = 6;
$number2 = 3;
$add = $number1 + $number2;
print "Addition answer is $add\n";
$subtract = $number1 - $number2;
print "Subtraction answer is $subtract\n";
Simple Arithmetic
$multiply = $number1 * $number2;
print "Multiplication answer is $multiply\n";
$divide = $number1/$number2;
print "Division answer is $divide\n";
$modulus = $number1 % $number2;
print "Modulus answer is $modulus\n";
$exponent = $number1 ** $number2;
print "Exponent is $exponent\n";
Simple Arithmetic
• Incrementing
• Numbers can be incremented with '++'
$number1 = 3;
$number1++;
print "$number1\n"; # prints 4
Simple Arithmetic
• Decrementing
• Numbers can be decremented with '--'
$number1 = 3;
$number1--;
print "$number1\n"; # prints 2
Expression (1)
• Expressions work just as in other programming languages
• + - / * arithmetic operators
• . string concatenation
• == equality numeric comparison
• != non equal, <=, <, >, >= numeric
comparison
• eq, ne, lt, le, gt, ge string
comparison
Expression (2)
• Precedence as usual
• Use ( and ) to group sub-expressions
• condition ? true-exp:false-exp
• , comma operator
• = assignment operator
• op= assignment operators: +=, -=, *=, /=, .=
• =~ binds a pattern match or substitution to a variable (it is not an assignment)
Operating on Assignments
• Sometimes you see code like:
– chop ($name = $_);
• Perl is following these rules:
– process parenthesized stuff first
– the "value" of an assignment statement is the thing on
the left hand side.
• Another example you might see:
– …read into the $email variable…
($user = $email) =~ s/\@sedona\.intel\.com//;
– The @ is escaped so Perl does not try to interpolate an array named @sedona
– This only modifies the variable $user
• first an assignment
• then a substitution
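A small runnable sketch of the assign-then-modify idiom. The address is the one from the slide; the check that $email stays untouched is added for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $email = 'pab@sedona.intel.com';

# The parentheses make the assignment happen first; the substitution
# is then applied to $user (the left-hand side), not to $email.
(my $user = $email) =~ s/\@sedona\.intel\.com//;

print "user:  $user\n";    # pab
print "email: $email\n";   # pab@sedona.intel.com (unchanged)
```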
A note on equality
• When we use control structures, we generally compare
one thing to another.
• What we are looking for in generalised terms is "do X if A=B" or "do Y if A=C".
• When comparing scalars you can compare them in a
numerical context or a string context
• Equals:
$integer == 1
$string eq "perl"
• Not equals:
$integer != 1
$string ne "perl"
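Choosing the wrong context gives surprising answers. The sketch below compares the same two scalars both ways; the values are chosen for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $x = "1.0";
my $y = "1";

# Numeric context: both strings are converted to the number 1
my $num_equal = ($x == $y) ? 1 : 0;

# String context: "1.0" and "1" are different sequences of characters
my $str_equal = ($x eq $y) ? 1 : 0;

print "numerically equal: $num_equal\n";   # 1
print "equal as strings:  $str_equal\n";   # 0
```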
Reading Input
• Read input with the line-input operator <STDIN>
print "Enter a string\n";
$a = <STDIN>; # read the next line (including its newline)
print "You entered:$a\n";

while(defined($_ = <STDIN>)) {
    chomp($_);
    # other operations with $_ here
}
IF/ELSE
• A control expression: IF the condition is true, one statement block is executed, ELSE a different statement block is executed (ifelse.pl).
if (control_expression is TRUE) {
do this;
and this;
}
else {
do that;
and that;
}
ELSIF
• if/else is great for yes/no decisions. If you want to test multiple conditions you can combine else and if to make 'elsif' (elsif.pl).

if (condition 1 is TRUE) {
do this;
}
elsif (condition 2 is TRUE) {
do that;
}
elsif (condition 3 is TRUE) {
do the other;
}
else { #all tests are failed
do whatever;
}
WHILE
• Let's say you want to do a series of actions whilst a certain condition is true (while.pl):

while (expression is true) {


do this;
do that;
do the other; #until no longer true
}
FOR
• The statement people remember from BASIC (or C!)
• An initial expression is set up ($init_val = 0), then a test expression is evaluated before each iteration ($init_val < 10). Finally the variable is updated after each loop iteration ($init_val++).

for ($init_val = 0; $init_val < 10; $init_val++) {
    print "$init_val\n";
}
Other Control Structures
• unless/else - like if/else, but the block runs unless the condition is true, i.e. when it is false.
• do/while and do/until - "does" a statement block "while" a condition is true, or "does" a statement block "until" a condition becomes true. The block always runs at least once.
• last - allows you to get out of a loop early, e.g. instead of the loop finishing when the loop conditions are met, a loop can end when conditions internal to the loop are met. See also "next" and "redo", and read up on "labelled blocks" for more info.
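The structures listed above can be tried out together. This is a minimal sketch with made-up variable names.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @seen;
for my $n (1 .. 10) {
    next if $n % 2;     # skip odd numbers, go to the next iteration
    last if $n > 6;     # leave the loop early
    push @seen, $n;
}
print "kept: @seen\n";  # kept: 2 4 6

my $count = 0;
do { $count++ } until $count >= 3;   # the block runs at least once
print "count: $count\n";             # count: 3

# unless runs its block only when the condition is false
unless ($count < 3) {
    print "count reached 3\n";
}
```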
Arrays
• Using an array as a whole
– @a = ( 1, 2, 3, );
• Using array element
– $a[0] is scalar 1
– (unless $[ is assigned different from 0)
• Arrays are one dimensional
– You will learn later how to emulate multi-dim arrays
– Arrays are integer-indexed
Arrays
@foo = (1,2,3);
print $foo[1,2],"\n",@foo[1,2],"\n";

OUTPUT:
3
23

$foo[1,2] is the same as $foo[2] (comma operator)
@foo[1,2] is an array slice, the same as ($foo[1],$foo[2])
You can also use @foo[0..2]


Arrays
• List of information
• Each member of list is an element
• Assign values using '=' and list ()
@test_array = (8,"hello",5.6,$something);
• Refer to elements as
$test_array[element number]
Arrays
$test_array[1] is "hello"
• Arrays start at 0!
$test_array[0] is 8
• Take care with variable naming!
$test_array[0]
is unrelated to.....
$test_array
Assign to Array Slice
@a = ( 1,2,3,4,5,6);
print @a,"\n";
@a[0..2] = (3,2,1,0);
print @a,"\n";
@a[0..2] = (0,0);
print @a,"\n";
OUTPUT:
123456
321456
00456

You can assign values to a slice of an array.
Extra values are ignored.
If there are fewer values than slice elements, the array does NOT get shorter: the leftover element ($a[2] here) is set to undef, which prints as an empty string.
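A quick check of the short-slice case: the array keeps its size and the unmatched slot becomes undef. The helper variables are added for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @a = (1, 2, 3, 4, 5, 6);
@a[0 .. 2] = (0, 0);     # only two values for three slice elements

my $size          = scalar @a;                # number of elements
my $third_defined = defined $a[2] ? 1 : 0;    # is the leftover slot defined?

print "size: $size\n";                           # 6
print "third element defined: $third_defined\n"; # 0
```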
FOREACH
• Takes a list of values and assigns each one in turn to a scalar variable, executing a block of code for each value (foreach.pl).
foreach $element (@list) {
do this;
do that;
do the_other; #until no more $element's
}
FOREACH(2)
• There's a few things missing from this code
snippet compared to the previous one
(foreach2.pl).
• No specification of $element this time. And yet it
still works! Why?
foreach (@list) {
do this;
do that;
do the_other; #until no more implied $_'s
}
FOREACH(3)
• There is an implied scalar on the previous slide - $_
• The $_ is a special scalar variable - almost like a scratchpad - it's a container for information (foreach3.pl). Notice it works both for the foreach AND the print statement.
• Perl knows that if you use foreach (@list) it is going to assign each element to a scalar - so it will use $_ by default.

foreach $_ (@list) {
    do this;
    do that;
    do the_other; #until no more $_'s
}
FOREACH(4)
%hash = (Gabor => 123, Peter => 78, Adam => 10);

# returns an unordered list of the keys
@list_of_keys = keys %hash;

# returns an unordered list of the values
# useful only if you don't care about the keys
@list_of_values = values %hash;

# use it like this
foreach $key (sort keys %hash) {
    print "$key $hash{$key}\n";
}
FOREACH(5)
# iterates through the hash
# do not change the keys here !!
while (($key, $value) = each(%hash)) {
    print "$key $value\n";
}

#EXAMPLE
%h = (Gabor => 123, Peter => 78, Adam => 10);
foreach $key (sort keys %h) {
    print "$key $h{$key}\n";
}

Output:
Adam 10
Gabor 123
Peter 78
Array Functions
• pop – remove from right hand side
• push – add to right hand side
• shift – remove from left hand side
• unshift – add to left hand side
Array Functions
• Script4 - pop and push
#create an array
@an_array = (8,54,78,2,5,6,4);
  value: 8 54 78  2  5  6  4
  index: 0  1  2  3  4  5  6

• POP into variable (variable=4)
#pop the last value
$pop_test = pop (@an_array);
  value: 8 54 78  2  5  6
  index: 0  1  2  3  4  5

• PUSH variable back onto array
#push $pop_test back on
push (@an_array,$pop_test);
  value: 8 54 78  2  5  6  4
  index: 0  1  2  3  4  5  6
Array Functions
• Script4 - shift and unshift
#create an array
@an_array = (8,54,78,2,5,6,4);
  value: 8 54 78  2  5  6  4
  index: 0  1  2  3  4  5  6

• SHIFT into variable (variable=8)
#shift the first value
$shift_test = shift (@an_array);
  value: 54 78  2  5  6  4
  index:  0  1  2  3  4  5

• UNSHIFT variable back onto array
#unshift $shift_test back on
unshift (@an_array,$shift_test);
  value: 8 54 78  2  5  6  4
  index: 0  1  2  3  4  5  6
Hashes
• Look-up table of 'Keys' with associated 'Values'
• e.g. Hash called 'car'
  KEY     VALUE
  COLOUR  BLUE
  SIZE    BIG
  SPEED   REALLY FAST
Hashes
• Keys are arbitrary scalars
• Preceded by %, e.g. %car
• Use keys to retrieve values:
$test_hash{key}=value
$car{colour}="blue";
What is hash
• Hash is associative array
• Index can be anything not only number
• %hash = (1, 2, "b", 4); has 2 elements
– $hash{1}=2 and $hash{"b"}=4, but it can also be written as
• %hash = ( 1 => 2, "b" => 4 );
Array Slices from hash
%foo = (1=>2,3=>4,'apple'=>'peach');
print @foo{1,3},"\n";

OUTPUT:
24

@foo{1,3} is a hash slice, the same as ($foo{1},$foo{3})
It starts with @ and not %
Hash Tables & Key Lookups
• Arrays are integer-indexed.
• Hashes are string-indexed.
• When you don't know what your data is, it's
probably a string.
• Array lookups are quick (multiply, add)
– INDEX VALUE
0 Banana----
1 Chips-----
2 Salsa-----
3 Wheelchair
– 4 records, 10 chars each = 40 bytes
– lookup: byte = (index * 10) + char;
Hash Tables & Key Lookups
• Same lookup time for index 1 or 100000
• But what if the index you want to use is a
string like someone's name?
– storing names in an array and "searching" is
not a good solution
• A really good solution is a Hash Table.
Hash Tables & Key Lookups
• Terminology
– The thing you look up with is a key.
– The thing you get back is a value.
– Together they're a key/value pair.
– Sometimes we use hash tables on numeric-
looking things…
• Social Security numbers 555-61-6542
• Credit Card numbers
• Phone Numbers
Hash Tables & Key Lookups
• Example Hash Table
– KEY VALUE
602-917-1111 Fred Flintstone
602-822-2222 Barney Rubble
520-779-5555 George Jetson
• Look up by phone number only now.
• To look up by name, need a separate
hash table with names as keys.
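One way to build that second table is to invert the first with reverse. This is a sketch that assumes every name appears only once; with duplicate names, later entries would silently overwrite earlier ones. The hash name %by_name is made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %phonebook = (
    "602-917-1111" => "Fred Flintstone",
    "602-822-2222" => "Barney Rubble",
    "520-779-5555" => "George Jetson",
);

# reverse flips the flattened key/value list, so values become keys
my %by_name = reverse %phonebook;

my $number = $by_name{"Barney Rubble"};
print "Barney Rubble: $number\n";   # 602-822-2222
```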
Hashes Example
#Hash example - codon translation
use strict;
my %translate;
$translate{'atg'} = 'M';
$translate{'taa'} = '*';
$translate{'ctt'} = 'L';
print "$translate{'atg'}\n";
Hashes Examples
• Building Hash Tables
– my(%phonebook) = (
"602-917-1111" => "Fred Flintstone",
"602-822-2222" => "Barney Rubble",
"520-779-5555" => "George Jetson",
);
– Give me a list of all the keys
– @phonenums = keys(%phonebook);
• Look up a phone number
– $person = $phonebook{"602-822-2222"};
Hashes Examples
• Adding to a Hash
– $phonebook{"602-102-2001"} = "Daffy Duck";
• Deleting from a Hash
– delete $phonebook{"602-102-2001"};
• Checking for existence of an element
– if (exists $phonebook{"602-102-2001"})
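The three operations can be seen together in a short sketch. The $before and $after variables are added only to record the result of exists.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %phonebook = ("602-822-2222" => "Barney Rubble");

# Adding to the hash
$phonebook{"602-102-2001"} = "Daffy Duck";
my $before = exists $phonebook{"602-102-2001"} ? "yes" : "no";

# Deleting from the hash removes the key as well as the value
delete $phonebook{"602-102-2001"};
my $after = exists $phonebook{"602-102-2001"} ? "yes" : "no";

print "exists before delete: $before\n";   # yes
print "exists after delete:  $after\n";    # no
```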

Hash Functions
• keys
e.g. @codons = keys(%translate)
• values
e.g. @aas = values(%translate)
Printing Hash Contents
my(%phonebook) = (
"602-917-1111" => "Fred Flintstone",
"602-822-2222" => "Barney Rubble",
"520-779-5555" => "George Jetson",
);

while (($phonenum, $name) = each(%phonebook)) {
print "phonenum:$phonenum, name:$name\n";
}
print "\n";
Strings
• Interpolated and non-interpolated
strings
$a = 'apple';
print "$a\n";
print '$a\n';
OUTPUT:
apple
$a\n
Multi-line strings
$a = 'zero';
$b = <<END;
this is ba: $a
END
print $b;
$c = <<'END';
this is ca: $a
END
print $c;

OUTPUT:
this is ba: zero
this is ca: $a

Play with the interpolated strings putting
expressions into it and experience what is
interpolated and what is not!
Simple string handling operators
• Concatenate strings:
$a = "apple" . "peach";
• Automatic conversion
$a .= 555;

OUTPUT:
applepeach555
String Concatenation
• Strings can be concatenated with '.'
$string1 = "This";
$string2 = " is";
$string3 = " easy";
$string4 = " so far";
print $string1.$string2.$string3.$string4;
# prints This is easy so far
Changing Case on Strings
• Applications
– when comparing two strings, compare case-
insensitively
• force the case, then compare the strings.
– keyword recognition in configuration files
– usernames, email addrs, …
• wrong: if ($email eq "pab\@sedona.intel.com")
• better: $email =~ tr/A-Z/a-z/;
if ($email eq "pab\@sedona.intel.com")
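The same comparison can be written with lc, which returns a lowercased copy and leaves the original string as the user typed it. This is a sketch, and the mixed-case input is invented for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $typed = 'PAB@Sedona.Intel.COM';    # what the user entered
my $known = 'pab@sedona.intel.com';    # what the program expects

# Force both sides to lower case, then compare as strings
my $match = (lc($typed) eq lc($known)) ? 1 : 0;

print "match: $match\n";     # 1
print "typed: $typed\n";     # original case is preserved
```

Unlike tr/A-Z/a-z/ applied to $email, this does not modify the input, which helps when the original case has to be shown back to the user.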
Changing Case on Strings
• Well written programs observe this rule:
– If humans might try it,
your program ought to understand it.
• ignore case where it should be ignored
• respect case where it should be respected
– output to the user
– rewriting config files
Don't program dangerously!

• $variable
• @variable
• %variable

• Are three different variables!


• And you can still have a subroutine with the same name.
Use strict
'use strict;'
• Forces you to declare variables before
using them
• Good for when scripts get bigger
• Declarations start with 'my'
e.g. my %translate;
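A sketch of what strict buys you. The string eval is used here only so the rejected code can be shown without stopping the script; $undeclared is a deliberately undeclared, made-up variable.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Declared with my: accepted under strict
my %translate = (atg => 'M');
my $codon     = 'atg';
print "$codon translates to $translate{$codon}\n";

# Not declared: strict refuses to compile this code.
# The string eval inherits 'use strict' from the enclosing file.
my $bad   = eval q{ $undeclared = 1; 1 };
my $error = $@;

print "strict rejected the code:\n$error" unless defined $bad;
```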
local or my?
$my = 'global';
$local = 'global';
&subi;
&bubi;
sub subi {
    my $my ='my';
    local $local = 'local';
    &bubi;
}
sub bubi {
    print "$my\n$local\n";
}
OUTPUT:
global
local
global
global

my is really local.
local is actually global, but saves old value.
Calling Subroutines
sub name { command(s) }

• Arguments are put into the global array @_
• You can access them as $_[$i] or with $v = shift
• Return value is the return expression or just the last expression
• Local variables are created using keyword my or local

print printMsg("\nHello world passed");


sub printMsg {
my ($my_var);
local ($local_var);
$my_var = 1;
print "\nIn sub printMsg, \$my_var:$my_var\n";
print "\nInput arguments to the sub, printMsg:@_\n";
print "Input arguments through implicit array ". $_[0],$_[1],$_[2] . "\n";
return "Hello world received from sub, printMsg";
}

OUTPUT:

In sub printMsg, $my_var:1

Input arguments to the sub, printMsg:
Hello world passed
Input arguments through implicit array
Hello world passed
Hello world received from sub, printMsg
Optimizing Perl - Some Tips
• Perl is fast, but not that fast...
• Still need to take care with apparently simple things in 'hot' code
• Function/method calls have significant overheads per call.
• Copying data also isn't cheap, especially long strings (allocate and copy)
• Perl compiles to 'op codes' then executes them in a loop...
• The more ops, the slower the code (all else being roughly equal).
• Try to do more with fewer ops. Especially if you can move loops into ops.
• Key techniques include:
• Caching at many levels, from common sub-expression elimination to web
caching
• Functional programming: @result = map { … } grep { … } @data;
• But don't get carried away... only optimize hot code, and only if
needed
• Don't optimize for performance at the cost of maintenance. Learn perl
idioms.
• Beware "Compulsive Tuning Disorder" - Gaja Krishna Vaidyanatha
• And remember that "Premature optimization is the root of all evil" - Donald
Knuth
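The map/grep idiom mentioned above moves the loop into two built-in ops. This is a small illustration with made-up data, not a benchmark.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @data = (1 .. 10);

# grep keeps the even numbers, map squares each survivor;
# no explicit loop or temporary array is needed
my @result = map { $_ * $_ } grep { $_ % 2 == 0 } @data;

print "@result\n";   # 4 16 36 64 100
```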
Built In Functions
• SPLIT
• JOIN
• LENGTH
• SUBSTR
• UC/LC
• S///
• REVERSE and TR
• TR/REVERSE
SPLIT
• split can take a scalar and chop it into bits; each individual bit then ends up in an array. The "recognition sequence" is user-defined but not retained (split.pl).

$dna_strand = "AGCTATCGATGCTTTAAACGGCTATCGAGTTTTTTTT";
print "My DNA strand is: $dna_strand\n";
print "If we split this using TTTAAA we get the following fragments:\n";
@dna_fragments = split(/TTTAAA/,$dna_strand);
foreach $fragment (@dna_fragments) {
    print "$fragment\n";
}
JOIN
• join is the conceptual opposite of split. Let's think of it in terms of a DNA ligation with a linker sequence (join.pl):

my ($ligated_fragments);
my (@dna_fragments);
@dna_fragments=("AGGCTT", "AGCCCAAATT", "AGCCCCATTA");
$ligated_fragments = join ("aaattt", @dna_fragments);
print "The fragments have been ligated with an aaattt linker:\n";
print "$ligated_fragments\n";
LENGTH
• length - finds the length of a scalar (or a bit of DNA!)
(length.pl).

#!/usr/bin/perl -w
use strict;
my ($genome, $genome_length);
$genome = "AGATCATCGATCGATCGATCAGCATTCAGCTACTAGCTAGCTGGGGGGATCATCTATC";
$genome_length = length($genome);
print "My genome sequence is:\n$genome\nand is $genome_length bases long\n";
SUBSTR
• substr extracts a specified part of a scalar (substr.pl).
• substr($scalar, $start_position, $length)

#!/usr/bin/perl -w
use strict;
my ($dna_sequence, $substring);
$dna_sequence = "AGCTATACGACTAGTCTGATCGATCATCGATGCTGA";
$substring = substr ($dna_sequence, 0, 5);
print "The first 5 bases of $dna_sequence are:\n$substring\n";
UC/LC
• uc (uppercase) and lc (lowercase) simply change the
case of a scalar (uclc.pl).

#!/usr/bin/perl -w
use strict;
my ($mixed_case, $uppercase, $lowercase);
$mixed_case = "AgCtAAGggGTCaCAcAAAAaCCCcATTTgcCC";
$uppercase = uc ($mixed_case);
$lowercase = lc ($mixed_case);
print "From $mixed_case we get:\n";
print "UPPERCASE: $uppercase\n";
print "lowercase: $lowercase\n";
S/// - SUBSTITUTE
• This is proper Perl :-)
• The obvious difference between DNA and RNA
is the replacement of T with U.
• Let's mimic the transcription of DNA to RNA with our newfound Perl skills.
• We can use the substitution operator 's'.
• This can convert one element in a scalar to
another element.
• This takes the form s/[one thing]/[for another
thing]/
• Let's see it in action (transcription.pl).
S/// - SUBSTITUTE (2)
#!/usr/bin/perl -w
use strict;
my ($dna_molecule, $rna_molecule);
$dna_molecule = "AGCTATCGATGCTTTCGATCACCGGCTATCGAGTTTTTTTT";
print "My DNA molecule is $dna_molecule\n";
$rna_molecule = $dna_molecule;
$rna_molecule =~ s/T/U/g;
print "My RNA molecule is $rna_molecule\n";
exit();
=~
• What is that crazy =~ sign?
• This is called the "=~ operator".
• Allows you to specify the target of a pattern matching
operation (FYI the /[whatever]/ bit is a "matching
operator").
• By default matching operators act on $_, i.e. if you just saw s/T/U/g; in a program on its own it is acting on $_.
• We have $rna_molecule =~ s/T/U/g; - which means perform the s/T/U/g on $rna_molecule. We have redirected the matching operator from $_ to $rna_molecule.
• If you want the original scalar to remain unchanged but still need an altered version, copy it to another scalar first and modify the copy.
REVERSE and TR
• So substitution allows you to change one thing into
another. This is great – we could use the same
technique to get the complement of a DNA strand!
• All we have to do is change all the A's to T's, all the G's
to C's, all the T's to A's and all the C's to G's.
• Then if we reverse it we get the reverse complement! Or
do we? See wrong_revcom.pl.
• I guess the game is given away in the filename that
there's something up with this.
• Look closely.
• Think about what each line is going to do to the scalar
$DNA.
• Tell me why the code is wrong.
REVERSE and TR (2)
#!/usr/bin/perl -w

$DNA = "AAAAGGGGCCCCTTTAGCTAGCT";
$DNA_UNTOUCHED = $DNA;
print "After no substitutions: DNA is : $DNA\n";
#substitute all the A's to T's
$DNA =~ s/A/T/g;
print "After A-T substitution: DNA is : $DNA\n";
#substitute all the G's to C's
$DNA =~ s/G/C/g;
print "After G-C substitution: DNA is : $DNA\n";
#substitute all the C's to G's
$DNA =~ s/C/G/g;
print "After C-G substitution: DNA is : $DNA\n";
#substitute all the T's to A's
$DNA =~ s/T/A/g;
print "After T-A substitution: DNA is : $DNA\n";
$DNA = reverse ($DNA);
print "$DNA_UNTOUCHED reverse complemented is:\n$DNA\n";
REVERSE and TR (3)
The answer

• You can't use sequential substitutions! Each later substitution also rewrites the characters produced by the earlier ones.
• WATCH YOUR PERL SYNTAX vs YOUR INTERNAL LOGIC! If your thinking is wrong, even if your Perl is correct, your output will be the result of your flawed logic, i.e. WRONG!
• Ideally we want to make all our substitutions in one statement that understands our needs.
• Come forth the tr operator.
• tr is like s, but better for tasks like this
• tr/ABCD/dcba/ would make AABBCCDD into ddccbbaa.
• Don't believe me?
• Look at revcomp.pl:
TR/REVERSE
#!/usr/bin/perl -w
use strict;
my ($DNA, $DNA_UNTOUCHED);
$DNA = "AAAAGGGGCCCCTTTAGCTAGCT";
$DNA_UNTOUCHED = $DNA;
$DNA =~ tr/AGCT/TCGA/;
$DNA = reverse ($DNA);
print "$DNA_UNTOUCHED has a reverse complement of:\n$DNA\n";
exit ();
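tr also returns the number of characters it matched, which makes it handy for counting. The sketch below reuses the sequence from the slide; the GC count is an extra illustration that is not part of revcomp.pl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $dna = "AAAAGGGGCCCCTTTAGCTAGCT";

# Reverse a copy, then complement it in a single tr pass
(my $revcomp = reverse $dna) =~ tr/AGCT/TCGA/;

# With an empty replacement list tr changes nothing and just counts
my $gc = ($dna =~ tr/GC//);

print "reverse complement: $revcomp\n";   # AGCTAGCTAAAGGGGCCCCTTTT
print "G+C bases: $gc\n";                 # 12
```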
File Handling
• open(FILEHANDLE,"filename")
• close FILEHANDLE
• <FILEHANDLE> reads a record from the file
• print FILEHANDLE expressionlist
• read, write, seek, truncate,
flock, binmode
File Opening and Closing
open(FH,"file.txt");   #to read
open(FH,">file.txt");  #to write a new file
open(FH,">>f.txt");    #to append
open(FH,"+<f.txt");    #read/write
open(FH,"+>f.txt");    #read/write but first truncate
Returns a non-zero value on success, or undef on error.

close FH; #closes the file
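Because open returns undef on failure, it is worth checking the result. The sketch below uses the three-argument form of open with lexical filehandles, a common modern style rather than the form shown on the slide, and a temporary file from the core File::Temp module so that it is self-contained.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Scratch file for the demonstration; removed when the script exits
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close $tmp_fh;

# '>' creates or truncates the file for writing
open(my $out, '>', $path) or die "Cannot write $path: $!";
print $out "first line\n";
close $out;

# '>>' appends to the end of the file
open(my $app, '>>', $path) or die "Cannot append to $path: $!";
print $app "second line\n";
close $app;

# '<' opens for reading
open(my $in, '<', $path) or die "Cannot read $path: $!";
my @lines = <$in>;
close $in;

my $line_count = scalar @lines;
print "read $line_count lines\n";   # read 2 lines
```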


Reading Records From File
• $/ specifies the record separator, \n by
default
• <FH> reads a line (bytes until the next
record separator)
• @a = <FH>; # gives all the lines
• $/=undef; $file=<FH>; reads the
whole file
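The effect of $/ can be tried without a real file by opening a filehandle on a string (supported since Perl 5.8). This is a sketch, and using local keeps the change to $/ inside the block.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "line one\nline two\nline three\n";

# In-memory filehandle: reads come from $text instead of a file
open(my $fh, '<', \$text) or die "Cannot open in-memory file: $!";

my $first = <$fh>;     # one record, up to and including "\n"

my $rest;
{
    local $/;          # undef inside this block: no record separator
    $rest = <$fh>;     # everything that remains, in one read
}
close $fh;

print "first: $first";
print "rest:\n$rest";
```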
Reading Into A Loop
open(FH,"test.pl");

# Returns undef when end of file


while(<FH>){
# By default the buffer is read into $_
print $_;
}
close FH;
Printing To A File
• print FILEHANDLE expression list
– print STDERR "error output"
– print STDOUT "just output"
– print "to STDOUT default"
– print FH "a line into a file\n"
• STDERR and STDOUT are predefined filehandles
Removing New Line
open(FH,"test.pl");
# <FH> reads the whole line including the new line at the end of the line.
while(<FH>){
# chomp chops off the new line safely if new line exists.
chomp;
print "$_\n";
}
close FH;
Truncate, Seek, Flock
$LOCK_SH = 1; # shared lock
$LOCK_EX = 2; # exclusive lock
$LOCK_NB = 4; # non blocking lock
$LOCK_UN = 8; # unlock
open(FH,"+<counter.txt") or open(FH,">counter.txt");
flock FH,$LOCK_EX;
seek FH,0,0;# seek to the start of the file
$counter = <FH>;
$counter++;# change here to --
seek FH,0,0;# seek to the start of the file again
truncate FH,0;# comment this out when decrementing
print FH $counter;
flock FH,$LOCK_UN;
close FH;
print $counter;
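The numeric lock values above are the usual ones, but the core Fcntl module exports them by name, which avoids hard-coding. A minimal sketch of the same counter update, assuming a temporary file stands in for counter.txt and starts at 41:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl qw(:flock SEEK_SET);
use File::Temp qw(tempfile);

# tempfile returns a handle already opened for reading and writing
my ($fh, $path) = tempfile(UNLINK => 1);

flock($fh, LOCK_EX) or die "Cannot lock $path: $!";
print $fh "41\n";                      # pretend this is the stored counter

seek($fh, 0, SEEK_SET) or die "Cannot seek: $!";
my $counter = <$fh>;
chomp $counter;
$counter++;

seek($fh, 0, SEEK_SET) or die "Cannot seek: $!";
truncate($fh, 0) or die "Cannot truncate: $!";
print $fh "$counter\n";

flock($fh, LOCK_UN);
close $fh;

print "counter is now $counter\n";     # counter is now 42
```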
Handling Directories
opendir(DIR,"dirname");
readdir(DIR);
closedir DIR;

• You cannot open a directory for writing


Example: Getting the list of the files
opendir(DIR,'.');
@filesanddirs = readdir(DIR);
closedir DIR;
@files = ();
for ( @filesanddirs ){
push @files, $_ if -f $_;
}
for(@files){
print "$_\n";
}
Example: Getting all files
recursively
$startDirectory = '.';
opendir(DIR,$startDirectory);
@files = readdir(DIR);
closedir(DIR);
@dirs = grep( (-d "$startDirectory/$_") && /^[^.]/ ,@files);

while( $#dirs > -1 ){


$cdir = pop @dirs;

opendir(DIR,"$startDirectory/$cdir");
@f = readdir(DIR);
closedir(DIR);

@d = grep( (-d "$startDirectory/$cdir/$_") && /^[^.]/,@f);


for( @d ){
push @dirs, "$cdir/$_";
}
for( @f ){
push @files, "$startDirectory/$cdir/$_";
}
}
Implied Variables
while ($a = <FOO>) { # Read FH
    $a =~ s/blah/quux/; print $a;
}

$_ is implied for reads from FH's, regexp, and print

while (<FOO>) { # $_ instead of $a
    s/blah/quux/; print;
}

Subs return value of last operation if no return (see sorting)
Implied variables decrease clutter, but increase complexity: use with care!
Global Special Variables
$| If set to non-zero, forces a flush after every write (turns off output buffering)
$_ The default input and pattern-searching space
$. Input line number from files which are being read
@_ Arguments passed to the function
$$ $PID $PROCESS_ID
Process ID (read only)
$< $UID $REAL_USER_ID
Real user ID of the process
$> $EUID $EFFECTIVE_USER_ID
Effective user id
$( $GID $REAL_GROUP_ID
The real group id of the process
$) $EGID $EFFECTIVE_GROUP_ID
The effective group id of the process
Global Special Variables (2)
@array = qw(hello world have a nice day);

# $_ is default input and pattern-searching space


foreach $_ (@array) {
print "Printing default input, \$_:" . $_ . "\n";

if ($_ =~ "hello") {
print "\nhello found\n\n";
}
}

foreach ('hickory','dickory','doc') {
print;
}

print "\n\$\$:Process ID:" . $$ . "\n"; # process ID


Global Special Variables (3)
while (($var_name, $var_value) = each(%ENV)) {
print "var_name:$var_name, var_value:$var_value\n";
}

foreach (@INC) { # Include path


print "Include path:$_\n";
}

print "\nPrinting file name __FILE__ :" . __FILE__ . "\n";
print "\nPrinting line number which is useful in debugging __LINE__ :" . __LINE__ . "\n";
Global Special Variables (4)
open(MY_FILE,"myperlbits.txt");
while( <MY_FILE> ) # print the line number from $.
{
print $., $_, "\n";
}
close (MY_FILE);

&printMsg("Hello world passed");


sub printMsg { # input arguments from @_
print "\nInput arguments to the function, printMsg:@_\n";
}
Global Special Variables (5)
$a = "print \"1\\n\";\nwhat is this?";
eval $a;
print $a,"\n",$@;
print "but we run fine\n";
$a = "print \"1\\n\";";
eval $a;
print $a,"\n",$@;

OUTPUT:
print "1\n";
what is this?
syntax error at (eval 1) line 3, at EOF
but we run fine
1
print "1\n";
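eval also has a block form, which is compiled together with the rest of the script and is the usual way to trap run-time errors. A short sketch, with the error message invented for the example:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A failing block: die sets $@ and eval returns undef
my $result = eval { die "something went wrong\n"; 1 };
my $error  = $@;
print "caught: $error" unless defined $result;

# A succeeding block: eval returns the last value and $@ is empty
my $ok        = eval { 2 + 2 };
my $no_error  = ($@ eq '') ? 1 : 0;
print "result: $ok\n";                 # result: 4
print "but we run fine\n";
```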
