Introduction To Programming in Perl

Introduction to programming in Perl WS 2006/07: Bioinformatics I
Nicodème Paul
Nicodeme.paul@unibas.ch
http://www2.biozentrum.unibas.ch/personal/schwede/Teaching/BixI-WS0607/frame.htm
01-11-06 1
What is programming?
Programming is breaking a task into small steps (divide and conquer).
Sum: 15 + 25 + 11 = ?
15 + 25 + 11
40 + 11
51
Programs are written in a programming language such as:
Fortran, Pascal, C, C++, Java, Perl, Python, ...
Program translator
[Figure: a program is translated into machine code (0101011) by a compiler or an interpreter and then runs on the computer's processor and memory.]
What is Perl?
Perl: Practical Extraction and Report Language, by Larry Wall (1987)
• Text-processing language
• Glue language
• Very high level language
• perl (lowercase) is the compiler/interpreter program for the language
Why do we use Perl?
• Simplicity
• Rapid prototyping
• Portability
• Widely used in Bioinformatics
A first example
#!/usr/bin/perl # shebang line
# Pragmas
use strict; # Restrict unsafe constructs
use warnings; # Provide helpful diagnostics
# Assign 15 to $number1
my $number1 = 15;
my $number2 = 25;
my $number3 = 11;
$number1 = $number1 + $number2; # $number1 contains 40

$number1 = $number1 + $number3; # $number1 contains 51
print "My result is : $number1\n"; # Print the result on the terminal
Scalar Data Type
• $answer = 36; # an integer
• $pi = 3.14159265; # a real number
• $avocados = 6.02e23; # scientific notation
• $language = "Perl"; # a string
• $sign1 = "$language is nice"; # string with interpolation: Perl is nice
• $sign2 = '$language is nice'; # string without interpolation: $language is nice
Scalar = singular variable ($ resembles S, for scalar)
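The difference between the two quoting styles can be checked with a short script. This is only a minimal sketch; the variable names $double and $single are illustrative and do not come from the slides.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $language = "Perl";

# Double quotes interpolate variables into the string.
my $double = "$language is nice";

# Single quotes keep the text exactly as written.
my $single = '$language is nice';

print "$double\n";
print "$single\n";
```

The first print shows the value of $language inside the sentence, the second shows the variable name itself.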
Scalar Binary Operators

$u = 17; $v = 3; $s = "Perl";
Name            Example    Result
Addition        $u + $v    17 + 3 = 20
Subtraction     $u - $v    17 - 3 = 14
Multiplication  $u * $v    17 * 3 = 51
Division        $u / $v    17 / 3 = 5.66666666666667
Modulus         $u % $v    17 % 3 = 2
Exponentiation  $u ** $v   17 ** 3 = 4913
Concatenation   $s . $s    "Perl" . "Perl" = "PerlPerl"
Repetition      $s x 3     "Perl" x 3 = "PerlPerlPerl"
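The operators in the table can be tried out directly. The sketch below reuses $u, $v and $s from the slide; the other variable names are illustrative assumptions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my ($u, $v, $s) = (17, 3, "Perl");

my $quotient  = $u / $v;        # division yields a floating-point number
my $whole     = int($u / $v);   # integer part of the division
my $remainder = $u % $v;        # what is left over after integer division
my $power     = $u ** $v;       # 17 * 17 * 17
my $joined    = $s . $s;        # concatenation of two strings
my $repeated  = $s x 3;         # the string repeated three times

print "$u / $v = $quotient (integer part $whole, remainder $remainder)\n";
print "$u ** $v = $power\n";
print "$joined $repeated\n";
```

Note that / never truncates in Perl; int() is needed to obtain the integer part.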
Scalar Unary Operators
Numbers       Strings
abs(expr)     uc(expr)
sqrt(expr)    lc(expr)
exit(expr)    chop(variable)
exp(expr)     chomp(variable)
int(expr)     reverse(expr)
log(expr)     length(expr)
→ perldoc -f function_name
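A few of these functions are easy to confuse, in particular chop and chomp. The following sketch shows them side by side; the sample values and variable names are assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $with_newline = "ATTGTC\n";
chomp($with_newline);                 # removes the trailing newline only

my $sequence = "ATTGTC";
chomp($sequence);                     # no newline at the end, so nothing changes
chop($sequence);                      # removes the last character, whatever it is

my $length   = length($with_newline);          # number of characters
my $reversed = scalar reverse($with_newline);  # scalar context reverses the string
my $lower    = lc($with_newline);              # lower-case copy
my $root     = sqrt(16);
my $whole    = int(-2.55);                     # int truncates toward zero

print "$with_newline $sequence $length $reversed $lower $root $whole\n";
```

chomp is the safe choice when reading lines from a file or from the keyboard, because it leaves strings without a trailing newline untouched.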
Context
$u = "12" + 5;
→ 17
$u = "12john" + 5;
→ 17
$u = "john12" + 5;
→ 5
use warnings;
$u = "john12" + 5;
→ Argument "john12" isn't numeric in addition (+) at line 3
→ 5
$u = "12" + 5;
→ 17
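The operator decides whether a scalar is treated as a number or as a string. The sketch below contrasts + with the concatenation operator and also shows an array used in scalar context; the variable names are illustrative, and the warning is switched off locally only to keep the output clean.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $sum    = "12" + 5;     # + asks for numbers: the string "12" becomes 12
my $concat = "12" . 5;     # . asks for strings: the number 5 becomes "5"

my $partial;
{
    no warnings 'numeric';       # silence "isn't numeric" inside this block
    $partial = "12john" + 5;     # only the leading digits are used
}

my @seqs  = ("ATTGTC", "GGTCCT");
my $count = scalar(@seqs);       # an array in scalar context gives its size

print "$sum $concat $partial $count\n";
```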
Array data type
Index  Value
0      35
1      12.4
2      "bye\n"
3      1.7e23
4      'Hi'
$data[0] = 35; $data[1] = 12.4; $data[2] = "bye\n"; $data[3] = 1.7e23; $data[4] = 'Hi';
@data = (35, 12.4, "bye\n", 1.7e23, 'Hi');
Array = plural variable (@ resembles a, for array)
Array operators
@let = ("J", "P", "S", "D", "C");   (every row below starts from this original @let)
Name     Example                  Result
pop      $r = pop(@let)           $r = "C", @let = ("J","P","S","D")
push     push(@let, "G")          @let = ("J","P","S","D","C","G")
shift    $r = shift(@let)         $r = "J", @let = ("P","S","D","C")
unshift  unshift(@let, "G")       @let = ("G","J","P","S","D","C")
splice   @a = splice(@let, 1, 2)  @a = ("P","S"), @let = ("J","D","C")
join     $r = join(':', @let)     $r = "J:P:S:D:C"
scalar   $r = scalar(@let)        $r = 5
reverse  @a = reverse(@let)       @a = ("C","D","S","P","J")
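In the table every row starts from the original @let. In a real program the operators change the same array one after the other, as in this sketch; the element "X" and the variable names are illustrative assumptions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @let = ("J", "P", "S", "D", "C");

my $last = pop(@let);               # takes "C" off the end
push(@let, "G");                    # appends "G" at the end
my $first = shift(@let);            # takes "J" off the front
unshift(@let, "X");                 # puts "X" at the front
my @removed = splice(@let, 1, 2);   # removes 2 elements starting at index 1

my $joined = join(":", @let);       # remaining elements as one string
my $size   = scalar(@let);          # number of remaining elements

print "removed: @removed\n";
print "left: $joined ($size elements)\n";
```

Following the comments step by step, @let goes from (J, P, S, D, C) to (X, D, G).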
Search for a name
#!/usr/bin/perl
use strict;
use warnings;
my @names = ("John", "Peter", "Simon", "Dave", "Chris");
my $offset = int(rand(scalar(@names))); # random index in 0..4; int truncates, e.g. int(2.55) = 2
if ($names[$offset] eq "Simon") { # block start for the if statement
    print "Simon is found\n";
    print "Success!\n";
} # block end for the if statement
else { # block start for the else statement
    print "Simon is not found\n";
    print "Failed!\n";
} # block end for the else statement
Comparison operators
Comparison             Numeric  String  Return value
Equal                  ==       eq      1 if $a is equal to $b, otherwise ""
Not equal              !=       ne      1 if $a is not equal to $b, otherwise ""
Less than              <        lt      1 if $a is less than $b, otherwise ""
Greater than           >        gt      1 if $a is greater than $b, otherwise ""
Less than or equal     <=       le      1 if $a is not greater than $b, otherwise ""
Greater than or equal  >=       ge      1 if $a is not less than $b, otherwise ""
Comparison             <=>      cmp     0 if $a and $b are equal, 1 if $a is greater, -1 if $b is greater
"" is the empty string
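Choosing the numeric or the string operator changes the result for the same data. The sketch below sorts one list both ways; the sample numbers and variable names are assumptions for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @numbers = (10, 9, 100, 2);

# <=> compares as numbers, cmp compares character by character.
my @numeric = sort { $a <=> $b } @numbers;
my @string  = sort { $a cmp $b } @numbers;

my $num_equal = ("1.0" == 1);     # equal as numbers
my $str_equal = ("1.0" eq "1");   # different as strings

my $num_order = (2 <=> 10);       # 2 is the smaller number
my $str_order = ("2" cmp "10");   # "2" sorts after "1..." as a string

print "numeric sort: @numeric\n";
print "string sort:  @string\n";
```

Inside a sort block, $a and $b are the two elements currently being compared, which is why they do not need to be declared with my.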
What is true or false?
• Any number is true except for 0.
• Any string is true except for "" and "0".
• Any other value is first converted to a string or a number and is true if the converted value is true (an undefined value converts to "" and is therefore false).
• Anything that is not true is false.
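These rules have a few surprising consequences for strings that look like zero. The sketch below tests some sample values chosen for illustration; the list and the variable names are not from the slides.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The strings "0.0", "00" and " " are not "" or "0", so they count as true.
my @values = (0, 1, "", "0", "0.0", "00", " ", "abc");
my @truth;

for my $value (@values) {
    my $result = $value ? "true" : "false";
    push(@truth, $result);
    print "'$value' is $result\n";
}
```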
Logical operators
Example     Name  Result
$a && $b    AND   $a if $a is false, $b otherwise
$a || $b    OR    $a if $a is true, $b otherwise
! $a        NOT   True if $a is not true, false otherwise
$a and $b   AND   $a if $a is false, $b otherwise
$a or $b    OR    $a if $a is true, $b otherwise
not $a      NOT   True if $a is not true, false otherwise
$a xor $b   XOR   True if exactly one of $a and $b is true, false otherwise
Pay attention to the precedence rules:
$xyz = $x || $y || $z is not the same as $xyz = $x or $y or $z
The assignment = binds more tightly than or, so the second form is parsed as ($xyz = $x) or $y or $z.
Use parentheses!
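The precedence difference can be observed directly. In this sketch the values of $x, $y and $z are illustrative assumptions, and the 'void' warning is disabled locally because Perl would otherwise point out that part of the second statement has no effect.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my ($x, $y, $z) = (0, "", "fallback");

# || binds tighter than =, so the first true value is assigned.
my $with_pipes = $x || $y || $z;

my $with_or;
{
    no warnings 'void';
    # or binds looser than =, so this is ($with_or = $x) or $y or $z.
    $with_or = $x or $y or $z;
}

# Parentheses make the intended grouping explicit.
my $with_parens = ($x or $y or $z);

print "||: $with_pipes, or: $with_or, parentheses: $with_parens\n";
```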
Conditional statements
• Simple
Statement if (Expression);
• Compound
if (Expression) Block
if (Expression) Block else Block
if (Expression) Block elsif (Expression) Block else Block
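The simple and compound forms listed above can be sketched as follows. The subroutine classify is a hypothetical helper written only for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Compound form: if / elsif / else, each with its own block.
sub classify {
    my ($value) = @_;
    if ($value < 0) {
        return "negative";
    }
    elsif ($value == 0) {
        return "zero";
    }
    else {
        return "positive";
    }
}

my $label = classify(-3);

# Simple form: one statement followed by the condition.
print "Warning: the value is negative\n" if ($label eq "negative");
```

The simple form reads well for short one-line actions; the compound form is needed as soon as several statements depend on the condition.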
Search for a name

#!/usr/bin/perl
use strict;
use warnings;
my @names = ("John", "Peter", "Simon", "Dave", "Chris");
my $offset = int(rand(scalar(@names)));
my $count = 1;
while ($names[$offset] ne "Simon") { # block start for the while statement
    $offset = int(rand(scalar(@names)));
    $count = $count + 1;
} # block end for the while statement
print "Simon is found after $count trials\n";
Check for a name
#!/usr/bin/perl
use strict;
use warnings;
my @names = ("John", "Peter", "Simon", "Dave", "Chris");
for (my $i = 0; $i < scalar(@names); $i = $i + 1) { # block start for the for loop
    if ($names[$i] eq "Simon") {
        print "Simon is found\n";
    }
} # block end for the for loop
Check for a name

#!/usr/bin/perl
use strict;
use warnings;
my @names = ("John", "Peter", "Simon", "Dave", "Chris");
for (my $i = 0; $i < scalar(@names); $i = $i + 1) { # block start for the for loop
    if ($names[$i] eq "Simon") {
        print "Simon is found\n";
        last; # jump outside of the loop
    }
} # block end for the for loop
Loop statements
• Simple
Statement while (Expression);
• Compound
while (Expression) Block
for (Initialization; Expression; Incrementing) Block
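Each of the three loop forms can be sketched with a small computation. The numbers reuse the sum from the beginning of the lecture; the variable names are illustrative assumptions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# for loop: add up the elements of an array.
my @values = (15, 25, 11);
my $sum = 0;
for (my $i = 0; $i < scalar(@values); $i = $i + 1) {
    $sum = $sum + $values[$i];
}

# Simple form: one statement repeated while the condition holds.
my $n = 1;
$n = $n * 2 while ($n < 100);

# Compound while loop: count down from 3.
my $countdown = 3;
my @seen;
while ($countdown > 0) {
    push(@seen, $countdown);
    $countdown = $countdown - 1;
}

print "sum = $sum, n = $n, countdown = @seen\n";
```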
Hashes
%names
Key    Value
John   5
Peter  3
Simon  11
Dave   1
Chris  4
$names{"John"} = 5; $names{"Peter"} = 3; $names{"Simon"} = 11;
$names{"Dave"} = 1; $names{"Chris"} = 4;
Hash = key/value variable, prefixed with %
Check for a name
#!/usr/bin/perl
use strict;
use warnings;
my %names = ("John", 5, "Peter", 3, "Simon", 11, "Dave", 1, "Chris", 4);
my $key = "Simon";
if (exists $names{$key}) { # exists returns true if the key is in %names, otherwise false
    print "$key is found, his value is : $names{$key}\n";
}
else {
    print "$key is not found\n";
}
Check for a name

#!/usr/bin/perl
use strict;
use warnings;
my %names = (
    "John" => 5,
    "Peter" => 3,
    "Simon" => 11,
    "Dave" => 0,
    "Chris" => 4
);
my $key = "Simon";
if (exists $names{$key}) {
    print "$key is found, his value is : $names{$key}\n";
}
else {
    print "$key is not found\n";
}
Hash operators
Name    Example             Result
exists  exists $hash{$key}  Returns true if $key is in %hash, otherwise it returns false
delete  delete $hash{$key}  Deletes $key => $hash{$key} from %hash
each    each %hash          Steps through a hash one key/value pair at a time
keys    keys %hash          Returns a list consisting of all the keys of %hash
values  values %hash        Returns a list consisting of all the values of %hash
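The hash operators can be combined as in the sketch below, which reuses the %names hash from the previous example, including the value 0 for Dave. The other variable names are illustrative assumptions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %names = ("John" => 5, "Peter" => 3, "Simon" => 11, "Dave" => 0, "Chris" => 4);

delete $names{"Chris"};                 # removes the pair Chris => 4

my @sorted_keys = sort(keys %names);    # keys come back in no particular order

my $total = 0;
for my $value (values %names) {
    $total = $total + $value;
}

# exists tests for the key; the value itself may still be false.
my $dave_exists = exists $names{"Dave"} ? 1 : 0;
my $dave_true   = $names{"Dave"}        ? 1 : 0;

# each returns one key/value pair per call until the hash is exhausted.
while (my ($key, $value) = each %names) {
    print "$key => $value\n";
}
print "keys: @sorted_keys, total: $total\n";
```

This is why exists, and not the value, should be used to check whether a key is present.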
Getting user input

#!/usr/bin/perl
use strict;
use warnings;
my $line;
print "Type something : ";
while ($line = <STDIN>) { # STDIN : Standard Input
    if ($line eq "\n") {
        print "That was just a blank line\n";
    }
    else {
        print "Input : $line";
    }
    print "Type something : ";
}
→ Ctrl-C to exit
Reading from a file
#!/usr/bin/perl
use strict;
use warnings;
print "Enter the filename: ";
my $filename = <STDIN>; # Read Standard Input for a filename
chomp($filename); # Remove the end of line character
if (! (-e $filename)) { # Test whether the file exists
    print "File not found\n";
    exit 1;
}
open(IN, $filename) || die "Could not open $filename\n";
my @names = <IN>; # Store the content of the file in an array
close(IN);
print @names;
Input file
John
Peter
Simon
Dave
Chris
Reading from a file

#!/usr/bin/perl
use strict;
use warnings;
print "Enter the input file name : ";
my $filename = <STDIN>; # Read Standard Input for a filename
chomp($filename); # Remove the end of line character
if (! (-e $filename)) {
    print "File not found\n";
    exit 1;
}
open(IN, $filename) || die "Could not open $filename\n";
my %names = ();
my ($key, $value, $line);
while ($line = <IN>) {
    chomp($line);
    ($key, $value) = split(/\t/, $line); # key and value are separated by a tab
    $names{$key} = $value;
}
close(IN);
$, = " "; # It contains the separator for the print statement
print %names, "\n";
Input file : data.txt (name and value separated by a tab)
John 5
Peter 3
Simon 11
Dave 1
Chris 4
Input and output functions
Name   Syntax                   Result
open   open FILEHANDLE, EXPR    Opens the file named by EXPR, to be referred to using FILEHANDLE
close  close FILEHANDLE         Closes the file associated with FILEHANDLE
print  print [FILEHANDLE] LIST  Prints each element of LIST to FILEHANDLE
Testing files
Example       Name       Result
-e $filename  Exists     True if the file named in $filename exists, otherwise false
-r $filename  Readable   True if the file named in $filename is readable, otherwise false
-w $filename  Writable   True if the file named in $filename is writable, otherwise false
-d $filename  Directory  True if the file named in $filename is a directory, otherwise false
-f $filename  File       True if the file named in $filename is a regular file, otherwise false
-T $filename  Text file  True if the file named in $filename is a text file, otherwise false
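The file tests can be tried without touching any existing data by creating a temporary file first. This sketch uses the core module File::Temp, which is not part of the slides; the file content and the name no_such_file.txt are illustrative assumptions.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile tempdir);

# Create a scratch directory and a file inside it; both are removed at exit.
my $dir = tempdir(CLEANUP => 1);
my ($fh, $filename) = tempfile(DIR => $dir);
print $fh "John\t5\n";
close($fh);

my $exists   = (-e $filename) ? 1 : 0;   # the file is there
my $is_file  = (-f $filename) ? 1 : 0;   # and it is a regular file
my $readable = (-r $filename) ? 1 : 0;   # that we are allowed to read
my $is_dir   = (-d $dir)      ? 1 : 0;   # the scratch directory is a directory
my $missing  = (-e "$dir/no_such_file.txt") ? 1 : 0;   # this one does not exist

print "exists=$exists file=$is_file readable=$readable dir=$is_dir missing=$missing\n";
```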
Regular expressions
#!/usr/bin/perl
use strict;
use warnings;
my $filename = "data.txt";
my $line;
my %data = ();
my $key;
open(IN, $filename) || die "Could not open $filename\n";
while ($line = <IN>) {
    chomp($line);
    if ($line =~ /^>/) { # check for ids using pattern matching
        $key = $line;
    }
    else {
        $data{$key} = $line;
    }
}
close(IN);
my @ids = keys %data;
my @sequences = values %data;
$, = " ";
print @ids, "\n", @sequences, "\n";
Input file : data.txt
>id1
ATTGTC
>id2
GGTCCT
>id3
TATGAAA
>id4
GTGTATA
Regular expressions
EXPR =~ m/PATTERN/
m// operator (matching): searches the string in the scalar EXPR (or $_) for PATTERN. In scalar context the operator returns true (1) if successful, false ("") otherwise. In list context m// returns a list of the substrings matched by any capturing parentheses in PATTERN. PATTERN undergoes double-quote interpolation.
Example: $line = ">id1"; $line =~ /^>/ is true
VAR =~ s/PATTERN/REPLACEMENT/
s/// operator (substitution): searches the string in the scalar variable VAR (or $_) for PATTERN and, if found, replaces the matched substring with the REPLACEMENT text. In scalar and list context s/// returns the number of times it succeeded. Both PATTERN and REPLACEMENT undergo double-quote interpolation.
Example: $line = ">id1"; $line =~ s/>//; leaves "id1" in $line
VAR =~ tr/SEARCHLIST/REPLACEMENTLIST/
tr/// operator (transliteration): scans the string in the scalar variable VAR (or $_) character by character and replaces each occurrence of a character found in SEARCHLIST with the corresponding character in REPLACEMENTLIST. In scalar and list context tr/// returns the number of characters replaced or deleted. SEARCHLIST is NOT a regular expression, and SEARCHLIST and REPLACEMENTLIST undergo only partial double-quote interpolation: backslash sequences are processed, but variables are not interpolated.
Example: $line = "id1"; $line =~ tr/a-z/A-Z/; leaves "ID1" in $line
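The return values described above can be inspected in a short script. This is a sketch under assumptions: the header line ">id1 sample" and all variable names are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $line = ">id1 sample";

# m// in scalar context: true or false.
my $is_header = ($line =~ /^>/) ? 1 : 0;

# m// in list context: the text captured by the parentheses.
my ($id) = $line =~ /^>(\S+)/;

# s/// on a copy, so that $line itself stays unchanged.
(my $clean = $line) =~ s/>//;

# tr/// returns how many characters it translated.
my $seq = "attgtc";
my $changed = ($seq =~ tr/a-z/A-Z/);

# With an empty replacement list tr/// only counts, it does not modify.
my $count_t = ($seq =~ tr/T//);

print "$is_header $id '$clean' $seq $changed $count_t\n";
```

Counting characters with tr/// is a common shortcut, for example to count how often one base occurs in a sequence.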
Regular expressions
#!/usr/bin/perl
use strict;
use warnings;
my $filename = "data.txt";
my $line;
my %data = ();
my $key;
open(IN, $filename) || die "Could not open $filename\n";
while ($line = <IN>) {
    chomp($line);
    if ($line =~ /^>/) { # check for ids using pattern matching
        $line =~ s/>//; # substitute > by nothing in id
        $line =~ tr/a-z/A-Z/; # translate lower case to upper case
        $key = $line;
    }
    else {
        $data{$key} = $line;
    }
}
close(IN);
my @ids = keys %data;
my @sequences = values %data;
$, = " ";
print @ids, "\n", @sequences, "\n";
Input file : data.txt
>id1
ATTGTC
>id2
GGTCCT
>id3
TATGAAA
>id4
GTGTATA
Regular expressions
Symbol   Meaning
\...     Used to escape metacharacters (including itself) or to make the next character a metacharacter (like \s, \w, \n)
...|...  Alternation (match one or the other)
(...)    Grouping (treat as a unit)
[...]    Character class (match one character from a set)
^        True at the beginning of the string (or sometimes after any newline)
$        True at the end of the string (or sometimes before any newline)
.        Match any one character (except newline, normally)
Example: $seq =~ /AAA$/
Regular expressions
Quantifier Meaning
* Match 0 or more times (maximal)
+ Match 1 or more times (maximal)
? Match 0 or 1 time (maximal)
{COUNT} Match exactly COUNT times
{MIN,} Match at least MIN times (maximal)
{MIN,MAX} Match at least MIN times but not more than MAX times (maximal)
*? Match 0 or more times (minimal)
+? Match 1 or more times (minimal)
?? Match 0 or 1 time (minimal)
{MIN,}? Match at least MIN times (minimal)
{MIN,MAX}? Match at least MIN times but not more than MAX times (minimal)
Example: $seq = "TATGAAA"; both $seq =~ /.*AAA$/ and $seq =~ /.*A{3}$/ match
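The difference between maximal and minimal quantifiers shows up in what the pattern captures. The sketch below reuses the sequence from the slide; the capturing parentheses and variable names are illustrative additions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $seq = "TATGAAA";

# Maximal: .* takes as much as possible while still leaving one A to match.
my ($greedy) = $seq =~ /(.*)A/;

# Minimal: .*? takes as little as possible before the first A.
my ($minimal) = $seq =~ /(.*?)A/;

# A{3}$ requires exactly three A characters at the end of the string.
my $ends_with_aaa = ($seq =~ /A{3}$/) ? 1 : 0;

print "maximal: $greedy, minimal: $minimal, ends with AAA: $ends_with_aaa\n";
```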
Regular expressions
Symbol  Meaning                Character class
\d      Digit                  [0-9]
\D      Nondigit               [^0-9]
\s      Whitespace             [ \t\n\r\f]
\S      Nonwhitespace          [^ \t\n\r\f]
\w      Word character         [a-zA-Z0-9_]
\W      Non-(word character)   [^a-zA-Z0-9_]
Example: $id = "id2"; $id =~ /id\d+$/ matches
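Character classes are handy for validating identifiers and for splitting fields. In this sketch the list of candidate ids and the sample line are assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Keep only ids that consist of "id" followed by one or more digits.
my @ids   = ("id2", "id10", "idx", "id 3");
my @valid = grep { /^id\d+$/ } @ids;

# Split a line on any run of whitespace (spaces or tabs).
my $fields = "John \t 5";
my @parts  = split(/\s+/, $fields);

# \w+ matches a run of word characters at the start of the string.
my ($word) = "Simon_11 rest" =~ /^(\w+)/;

print "valid ids: @valid\n";
print "fields: @parts\n";
print "word: $word\n";
```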
Subroutines or functions
#!/usr/bin/perl
use strict;
use warnings;
my $filename1 = "data1.txt";
my $filename2 = "data2.txt";
my %data1 = get_data($filename1); # subroutine call
my %data2 = get_data($filename2); # subroutine call
$, = " ";
print keys %data1, "\n", values %data1, "\n";
print keys %data2, "\n", values %data2, "\n";
sub get_data {
    my $filename = shift(@_); # first argument of the call
    my $key;
    my %tmp = ();
    open(IN, $filename) || die "Could not open $filename\n";
    while (my $line = <IN>) {
        chomp($line);
        if ($line =~ /^>/) {
            $line =~ s/>//;
            $line =~ tr/a-z/A-Z/;
            $key = $line;
        }
        else {
            $tmp{$key} = $line;
        }
    }
    close(IN);
    return %tmp;
}
Input file : data1.txt
>id1
ATTGTC
>id2
GGTCCT
>id3
TATGAAA
>id4
GTGTATA
Input file : data2.txt
>id5
ATAAAAA
>id6
GGAATTT
>id7
TATGATT
>id8
GTGTAAT
Packages
Script:
#!/usr/bin/perl
use strict;
use warnings;
use lib "."; # needed on Perl 5.26 and later when MyTools.pm is in the current directory
use MyTools;
my $filename1 = "data1.txt";
my $filename2 = "data2.txt";
my %data1 = MyTools::get_data($filename1);
my %data2 = MyTools::get_data($filename2);
$, = " "; # set the print separator
print keys %data1, "\n", values %data1, "\n";
print keys %data2, "\n", values %data2, "\n";
Module file MyTools.pm:
package MyTools;
sub get_data {
    my $filename = shift(@_);
    my $key;
    my %tmp = ();
    open(IN, $filename) || die "Could not open $filename\n";
    while (my $line = <IN>) {
        chomp($line);
        if ($line =~ /^>/) {
            $line =~ s/>//;
            $key = $line;
        }
        else {
            $tmp{$key} = $line;
        }
    }
    close(IN);
    return %tmp;
}
1; # this should be your last line
Comprehensive Perl Archive Network (CPAN): http://www.cpan.org/
References
• Recommended Books
– Beginner
» "Learning Perl", 4th Edition by Randal Schwartz, Tom Phoenix & brian d foy
» "Beginning Perl for Bioinformatics", 1st Edition by James Tisdall
» "Developing Bioinformatics Computer Skills", 1st Edition by Cynthia Gibas & Per Jambeck
