
Introduction to programming in Perl WS 2006/07: Bioinformatics I

Introduction to programming in Perl

Nicodème Paul

Nicodeme.paul@unibas.ch

http://www2.biozentrum.unibas.ch/personal/schwede/Teaching/BixI-WS0607/frame.htm

01-11-06 1

What is programming ?

Programming is breaking a task into small steps (divide and conquer).

Sum : 15 + 25 + 11 ?

15 + 25 + 11

40 + 11

51

Programs are written in a programming language such as:

Fortran, Pascal, C, C++, Java, Perl, Python, ...

Program translator

[Diagram: Program → Compiler or Interpreter → machine code (0101011) → Computer (Processor, Memory)]

What is Perl ?

Perl: Practical Extraction and Report Language, by Larry Wall (1987)

• Text-processing language

• Glue language

• Very high level language

• perl is the language compiler/interpreter program

Why do we use Perl?

• Simplicity

• Rapid prototyping

• Portability

• Widely used in Bioinformatics


A first example

#!/usr/bin/perl
# The line above is the shebang line; keep it free of trailing comments

# Pragmas
use strict; # Restrict unsafe constructs
use warnings; # Provide helpful diagnostics

# Assign 15 to $number1
my $number1 = 15;

# Assign 25 to $number2
my $number2 = 25;

# Assign 11 to $number3
my $number3 = 11;

$number1 = $number1 + $number2; # $number1 contains 40

$number1 = $number1 + $number3; # $number1 contains 51
print "My result is : $number1\n"; # Print the result on the terminal

Scalar Data Type

• $answer = 36; # an integer
• $pi = 3.14159265; # a real number
• $avocados = 6.02e23; # scientific notation
• $language = "Perl"; # a string
• $sign1 = "$language is nice"; # string with interpolation
• $sign2 = '$language is nice'; # string without interpolation

Scalar = singular variable ($ looks like S)
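
The difference between the two quoting styles can be checked with a minimal sketch like the following, which reuses the variable names from the slide:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $language = "Perl";
my $sign1 = "$language is nice";   # double quotes: $language is replaced by its value
my $sign2 = '$language is nice';   # single quotes: the text is kept literally

print "$sign1\n";   # prints: Perl is nice
print "$sign2\n";   # prints: $language is nice
```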

Scalar Binary Operators


$u = 17; $v = 3; $s = "Perl";

Name            Example    Result
Addition        $u + $v    17 + 3 = 20
Subtraction     $u - $v    17 - 3 = 14
Multiplication  $u * $v    17 * 3 = 51
Division        $u / $v    17 / 3 = 5.66666666666667
Modulus         $u % $v    17 % 3 = 2
Exponentiation  $u ** $v   17 ** 3 = 4913
Concatenation   $s . $s    "Perl" . "Perl" = "PerlPerl"
Repetition      $s x 3     "Perl" x 3 = "PerlPerlPerl"

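
As a quick check of the table, here is a small sketch that evaluates some of the operators with the same values. The result variable names are made up for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my ($u, $v, $s) = (17, 3, "Perl");

my $sum       = $u + $v;        # 20
my $remainder = $u % $v;        # 2
my $power     = $u ** $v;       # 4913
my $whole     = int($u / $v);   # 5: int() drops the fractional part of 5.666...
my $double    = $s . $s;        # "PerlPerl"
my $triple    = $s x 3;         # "PerlPerlPerl"

print "$sum $remainder $power $whole $double $triple\n";
```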
Scalar Unary Operators

Numbers       Strings
abs(expr)     uc(expr)
sqrt(expr)    lc(expr)
exit(expr)    chop(variable)
exp(expr)     chomp(variable)
int(expr)     reverse(expr)
log(expr)     length(expr)

→ perldoc -f function_name
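
A few of these functions in action, as a minimal sketch (the sample string and variable names are illustrative only). Note that chomp and chop modify their argument in place, and that reverse reverses the characters of a string only in scalar context.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $line = "acgt\n";
chomp($line);                        # removes only a trailing newline: "acgt"

my $upper = uc($line);               # "ACGT"
my $len   = length($line);           # 4
my $rev   = scalar reverse($line);   # scalar context reverses the characters: "tgca"

my $copy = $line;
chop($copy);                         # removes the last character, whatever it is: "acg"

my $root  = sqrt(16);                # 4
my $whole = int(2.55);               # 2

print "$line $upper $len $rev $copy $root $whole\n";
```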

Context

$u = "12" + 5;
→ 17

$u = "12john" + 5;
→ 17

$u = "john12" + 5;
→ 5

use warnings;

$u = "john12" + 5;
→ Argument "john12" isn't numeric in addition (+) at line 3
→ 5

$u = "12" + 5;
→ 17

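
The same conversions can be reproduced in a script that runs under strict and warnings. This is a sketch only: the "numeric" warning category is switched off inside the block purely so the demonstration runs quietly, and the variable names are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $sum  = "12" + 5;    # numeric context: the string "12" is used as the number 12
my $text = "12" . 5;    # string context: the number 5 is used as the string "5"

my ($partial, $none);
{
    no warnings 'numeric';     # silence the "isn't numeric" warning for this demonstration only
    $partial = "12john" + 5;   # the leading digits are used: 12 + 5
    $none    = "john12" + 5;   # no leading digits, so the string counts as 0: 0 + 5
}

print "$sum $text $partial $none\n";   # prints: 17 125 17 5
```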
Array data type

Index  Value
0      35
1      12.4
2      "bye\n"
3      1.7e23
4      'Hi'

$data[0] = 35; $data[1] = 12.4; $data[2] = "bye\n"; $data[3] = 1.7e23; $data[4] = 'Hi';

@data = (35, 12.4, "bye\n", 1.7e23, 'Hi');

Array = plural variable (@ looks like a)

Array operators

@let = ("J", "P", "S", "D", "C");

Each example starts from the original @let.

Name     Example                   Result
pop      $r = pop(@let)            $r = "C"; @let = ("J","P","S","D")
push     push(@let, "G")           @let = ("J","P","S","D","C","G")
shift    $r = shift(@let)          $r = "J"; @let = ("P","S","D","C")
unshift  unshift(@let, "G")        @let = ("G","J","P","S","D","C")
splice   @a = splice(@let, 1, 2)   @a = ("P","S"); @let = ("J","D","C")
join     $r = join(':', @let)      $r = "J:P:S:D:C"
scalar   $r = scalar(@let)         $r = 5
reverse  @a = reverse(@let)        @a = ("C","D","S","P","J")

→ perldoc -f function_name

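
Unlike the table, where each example starts from the original array, the sketch below applies the functions one after another to the same array, so the comments track the array as it changes. The variable names other than @let are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @let = ("J", "P", "S", "D", "C");

my $last = pop(@let);               # $last = "C",  @let = J P S D
push(@let, "G");                    #               @let = J P S D G
my $first = shift(@let);            # $first = "J", @let = P S D G
unshift(@let, "A");                 #               @let = A P S D G
my @removed = splice(@let, 1, 2);   # @removed = P S, @let = A D G

my $joined    = join(':', @let);    # "A:D:G"
my $count     = scalar(@let);       # 3
my @backwards = reverse(@let);      # G D A

print "$joined ($count elements)\n";
```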
Search for a name

#!/usr/bin/perl

use strict;
use warnings;

my @names = ("John", "Peter", "Simon", "Dave", "Chris");

my $offset = int(rand(scalar(@names))); # random index in [0, ..., 4]; int truncates, e.g. int(2.55) = 2

if ($names[$offset] eq "Simon") { # block start for the if statement
    print "Simon is found\n";
    print "Success!\n";
} # block end for the if statement
else { # block start for the else statement
    print "Simon is not found\n";
    print "Failed!\n";
} # block end for the else statement

Comparison operators

Comparison             Numeric  String  Return value
Equal                  ==       eq      1 if $a is equal to $b, otherwise ""
Not equal              !=       ne      1 if $a is not equal to $b, otherwise ""
Less than              <        lt      1 if $a is less than $b, otherwise ""
Greater than           >        gt      1 if $a is greater than $b, otherwise ""
Less than or equal     <=       le      1 if $a is not greater than $b, otherwise ""
Greater than or equal  >=       ge      1 if $a is not less than $b, otherwise ""
Comparison             <=>      cmp     0 if $a and $b are equal, 1 if $a is greater, -1 if $b is greater

"" is the empty string

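
Choosing the numeric or the string operator changes the outcome. The following sketch (the values and variable names are picked for illustration) compares the same data both ways:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $num_eq = (10 == 10.0)     ? "equal" : "different";   # equal: same number
my $str_eq = ("10" eq "10.0") ? "equal" : "different";   # different: not the same characters

my $num_lt = (9 < 10)      ? "yes" : "no";   # yes: 9 is the smaller number
my $str_lt = ("9" lt "10") ? "yes" : "no";   # no: the character "9" sorts after "1"

my @by_number = sort { $a <=> $b } (10, 9, 100);   # 9 10 100
my @by_string = sort { $a cmp $b } (10, 9, 100);   # 10 100 9

print "@by_number | @by_string\n";
```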
What is true or false?

• Any number is true except for 0.

• Any string is true except for "" and "0".

• Anything else that converts to a true string or a true number is true.

• Anything that is not true is false.
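
The rules above can be tried on a handful of values. In this sketch the list of candidates is chosen for illustration; note that the strings "0.0" and "00" are true, because only the exact string "0" is false.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @candidates = (0, 1, "", "0", "0.0", "00", " ", "abc");
my @truth;

for my $value (@candidates) {
    my $verdict = $value ? "true" : "false";
    push(@truth, $verdict);
    print "'$value' is $verdict\n";
}
# @truth is: false true false false true true true true
```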

Logical operators

Example     Name  Result
$a && $b    AND   $a if $a is false, $b otherwise
$a || $b    OR    $a if $a is true, $b otherwise
! $a        NOT   True if $a is not true, false otherwise
$a and $b   AND   $a if $a is false, $b otherwise
$a or $b    OR    $a if $a is true, $b otherwise
not $a      NOT   True if $a is not true, false otherwise
$a xor $b   XOR   True if exactly one of $a and $b is true, false otherwise

Pay attention to the precedence rules:

$xyz = $x || $y || $z is not the same as $xyz = $x or $y or $z

Use parentheses!

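
The precedence difference is easiest to see with concrete values. In this sketch (values and variable names are illustrative), "or" binds more loosely than "=", so the assignment happens before the alternatives are considered. The "void" warning category is disabled around that line only so the demonstration runs quietly.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my ($x, $y, $z) = (0, "", "fallback");

my $with_bars = $x || $y || $z;      # || binds tighter than =, so this is "fallback"

my $with_or;
{
    no warnings 'void';              # perl may flag the next line as a useless use in void context
    $with_or = $x or $y or $z;       # parsed as ($with_or = $x) or $y or $z, so $with_or is 0
}

my $with_parens = ($x or $y or $z);  # parentheses restore the intended meaning: "fallback"

my $one_true  = (1 xor 0) ? "true" : "false";   # true: exactly one operand is true
my $both_true = (1 xor 1) ? "true" : "false";   # false: both operands are true

print "$with_bars $with_or $with_parens $one_true $both_true\n";
```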
Conditional statements

• Simple

Statement if (Expression);

• Compound

if (Expression) Block

if (Expression) Block else Block

if (Expression) Block elsif (Expression) Block else Block
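
A minimal sketch of both forms follows; the variable names and thresholds are invented for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $length = 7;
my $label;

# Compound form: the first block whose expression is true is executed
if ($length < 5) {
    $label = "short";
}
elsif ($length < 10) {
    $label = "medium";
}
else {
    $label = "long";
}

# Simple form: one statement followed by its condition
my $note = "even length";
$note = "odd length" if ($length % 2 == 1);

print "$label, $note\n";   # prints: medium, odd length
```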


Search for a name


#!/usr/bin/perl

use strict;
use warnings;

my @names = ("John", "Peter", "Simon", "Dave", "Chris");

my $offset = int(rand(scalar(@names)));
my $count = 1;

while ($names[$offset] ne "Simon") { # block start for the while statement
    $offset = int(rand(scalar(@names)));
    $count = $count + 1;
} # block end for the while statement

print "Simon is found after $count trials\n";

Check for a name

#!/usr/bin/perl

use strict;
use warnings;

my @names = ("John", "Peter", "Simon", "Dave", "Chris");

for (my $i = 0; $i < scalar(@names); $i = $i + 1) { # block start for the for loop
    if ($names[$i] eq "Simon") {
        print "Simon is found\n";
    }
} # block end for the for loop

Check for a name


#!/usr/bin/perl

use strict;
use warnings;

my @names = ("John", "Peter", "Simon", "Dave", "Chris");

for (my $i = 0; $i < scalar(@names); $i = $i + 1) { # block start for the for loop
    if ($names[$i] eq "Simon") {
        print "Simon is found\n";
        last; # jump outside of the loop
    }
} # block end for the for loop

Loop statements

• Simple

Statement while (Expression);

• Compound

while (Expression) Block

for (Initialization; Expression; Incrementing) Block
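
Both loop forms in a short sketch; the numbers and variable names are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Simple form: the statement is repeated as long as the expression is true
my $n = 1;
$n = $n * 2 while ($n < 100);    # 1, 2, 4, ..., 64, 128: stops at the first value >= 100

# Compound form: sum the integers from 1 to 10
my $sum = 0;
for (my $i = 1; $i <= 10; $i = $i + 1) {
    $sum = $sum + $i;
}

print "$n $sum\n";   # prints: 128 55
```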


Hashes

%names

Key    Value
John   5
Peter  3
Simon  11
Dave   1
Chris  4

$names{"John"} = 5; $names{"Peter"} = 3; $names{"Simon"} = 11;

$names{"Dave"} = 1; $names{"Chris"} = 4;

Hash = key/value pairs (% for key/value)

Check for a name

#!/usr/bin/perl

use strict;
use warnings;

my %names = ("John", 5, "Peter", 3, "Simon", 11, "Dave", 1, "Chris", 4);

my $key = "Simon";

if (exists $names{$key}) { # exists returns true if the key is in %names, otherwise false
    print "$key is found, his value is : $names{$key}\n";
}
else {
    print "$key is not found\n";
}

Check for a name


#!/usr/bin/perl

use strict;
use warnings;

my %names = (
    "John" => 5,
    "Peter" => 3,
    "Simon" => 11,
    "Dave" => 0,
    "Chris" => 4
);
my $key = "Simon";

if (exists $names{$key}) {
    print "$key is found, his value is : $names{$key}\n";
}
else {
    print "$key is not found\n";
}

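
One reason to test with exists rather than with the value itself: a key can be present with a false value, such as "Dave" => 0 above. The sketch below illustrates this; the key "Zoe" and the result variable names are made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %names = ("John" => 5, "Dave" => 0);

my $dave_exists = (exists $names{"Dave"}) ? "yes" : "no";   # yes: the key is in the hash
my $dave_true   = ($names{"Dave"})        ? "yes" : "no";   # no: the stored value 0 is false
my $zoe_exists  = (exists $names{"Zoe"})  ? "yes" : "no";   # no: the key was never stored

print "$dave_exists $dave_true $zoe_exists\n";   # prints: yes no no
```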
Hash operators

Name    Example              Result
exists  exists $hash{$key}   Returns true if $key is in %hash, otherwise it returns false
delete  delete $hash{$key}   Deletes $key => $hash{$key} from %hash
each    each %hash           Steps through a hash one key/value pair at a time
keys    keys %hash           Returns a list consisting of all the keys of %hash
values  values %hash         Returns a list consisting of all the values of %hash

→ perldoc -f function_name
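
A short sketch of these functions on a small hash. The order in which keys, values and each return entries is not defined, so the keys are sorted here before use; the variable names are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %names = ("John" => 5, "Peter" => 3, "Simon" => 11);

delete $names{"Peter"};            # removes the pair Peter => 3

my @keys = sort(keys %names);      # John Simon (sorted, since hash order is not defined)

my $total = 0;
for my $value (values %names) {    # 5 and 11, in no particular order
    $total = $total + $value;
}

while (my ($key, $value) = each %names) {   # one key/value pair per iteration
    print "$key => $value\n";
}

print "@keys, total $total\n";     # prints: John Simon, total 16
```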

Getting user input


#!/usr/bin/perl

use strict;
use warnings;

my $line;
print "Type something : ";

while ($line = <STDIN>) { # STDIN : Standard Input
    if ($line eq "\n") {
        print "That was just a blank line\n";
    }
    else {
        print "Input : $line";
    }
    print "Type something : ";
}

→ Ctrl-C to exit

Reading from a file

#!/usr/bin/perl

use strict;
use warnings;

print "Enter the filename: ";

my $filename = <STDIN>; # Read Standard Input for a filename
chomp($filename); # Remove the end of line character

if (! (-e $filename)) { # Test whether the file exists
    print "File not found\n";
    exit 1;
}

open(IN, $filename) || die "Could not open $filename\n";

my @names = <IN>; # Store the content of the file in an array
close(IN);

print @names;

Input file:
John
Peter
Simon
Dave
Chris

Reading from a file


#!/usr/bin/perl

use strict;
use warnings;

print "Enter the input file name : ";

my $filename = <STDIN>; # Read Standard Input for a filename
chomp($filename); # Remove the end of line character

if (! (-e $filename)) {
    print "File not found\n";
    exit 1;
}
my %names = ();
my ($key, $value);
my $line;
open(IN, $filename) || die "Could not open $filename\n";
while ($line = <IN>) {
    chomp($line);
    ($key, $value) = split(/\t/, $line); # Split the line at the tab character
    $names{$key} = $value;
}
close(IN);
$, = " "; # It contains the separator for the print statement
print %names, "\n";

Input file : data.txt (name and value separated by a tab)
John 5
Peter 3
Simon 11
Dave 1
Chris 4

Input and output functions

Name   Example                  Result
open   open FILEHANDLE, EXPR    Opens a file to be referred to using FILEHANDLE
close  close FILEHANDLE         Closes the file associated with FILEHANDLE
print  print [FILEHANDLE] LIST  Prints each element of LIST to FILEHANDLE

→ perldoc -f function_name
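
The three functions together, as a sketch that writes a small file and reads it back. The file name names_out.txt is hypothetical and the file is deleted at the end. The example uses the three-argument form of open, which states the mode (">" for writing, "<" for reading) separately from the file name, and adds $! to the error message to report the system's reason for a failure.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $filename = "names_out.txt";   # hypothetical file name, created in the current directory

# Write two tab-separated records; note there is no comma after the filehandle
open(OUT, ">", $filename) || die "Could not open $filename for writing: $!\n";
print OUT "John\t5\n";
print OUT "Peter\t3\n";
close(OUT);

# Read the file back into an array, one line per element
open(IN, "<", $filename) || die "Could not open $filename for reading: $!\n";
my @lines = <IN>;
close(IN);

print @lines;

unlink($filename);                # remove the demonstration file again
```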

Testing files

Example       Name       Result
-e $filename  Exists     True if the file named in $filename exists, otherwise false
-r $filename  Readable   True if the file named in $filename is readable, otherwise false
-w $filename  Writable   True if the file named in $filename is writable, otherwise false
-d $filename  Directory  True if the file named in $filename is a directory, otherwise false
-f $filename  File       True if the file named in $filename is a regular file, otherwise false
-T $filename  Text File  True if the file named in $filename is a text file, otherwise false

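
A small sketch of the file tests: it creates a file, tests it, and removes it again. The file name test_me.txt is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $filename = "test_me.txt";   # hypothetical file name, created in the current directory

open(OUT, ">", $filename) || die "Could not open $filename: $!\n";
print OUT "hello\n";
close(OUT);

my $is_file     = (-f $filename) ? "yes" : "no";   # yes: a regular file
my $is_readable = (-r $filename) ? "yes" : "no";   # yes: we just created it
my $is_dir      = (-d $filename) ? "yes" : "no";   # no: it is not a directory
my $dot_is_dir  = (-d ".")       ? "yes" : "no";   # yes: "." is the current directory

unlink($filename);                                  # delete the file
my $still_there = (-e $filename) ? "yes" : "no";   # no: it no longer exists

print "$is_file $is_readable $is_dir $dot_is_dir $still_there\n";
```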
Regular expressions

#!/usr/bin/perl

use strict;
use warnings;

my $filename = "data.txt";
my $line;
my %data = ();
my $key;

open(IN, $filename) || die "Could not open $filename\n";
while ($line = <IN>) {
    chomp($line);
    if ($line =~ /^>/) { # check for ids using pattern matching
        $key = $line;
    }
    else {
        $data{$key} = $line;
    }
}
close(IN);
my @ids = keys %data;
my @sequences = values %data;
$, = " ";
print @ids, "\n", @sequences, "\n";

Input file : data.txt
>id1
ATTGTC
>id2
GGTCCT
>id3
TATGAAA
>id4
GTGTATA

Regular expressions

EXPR =~ m/PATTERN/
m// operator (matching): searches the string in the scalar EXPR (or $_) for
PATTERN; in scalar context the operator returns true (1) if successful, false ("")
otherwise; in list context m// returns a list of the substrings matched by any
capturing parentheses in PATTERN; PATTERN undergoes double-quote
interpolation.
$line = ">id1" => $line =~ /^>/

VAR =~ s/PATTERN/REPLACEMENT/
s/// operator (substitution): searches the string in the scalar variable VAR (or $_) for
PATTERN and, if found, replaces the matched substring with the
REPLACEMENT text; in scalar and list context s/// returns the number of times it
succeeded; both PATTERN and REPLACEMENT undergo double-quote
interpolation.
$line = ">id1" => $line =~ s/>//

VAR =~ tr/SEARCHLIST/REPLACEMENTLIST/
tr/// operator (transliteration): scans the string in the scalar variable VAR (or $_),
character by character, and replaces each occurrence of a character found in
SEARCHLIST with the corresponding character in REPLACEMENTLIST; in scalar
and list context tr/// returns the number of characters replaced or deleted;
SEARCHLIST is NOT a regular expression, and neither SEARCHLIST nor
REPLACEMENTLIST undergoes full double-quote interpolation (backslash
sequences are processed, but variables are not interpolated).
$line = "id1" => $line =~ tr/a-z/A-Z/

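
The return values described above can be observed directly. This sketch applies the three operators in turn to the string from the examples; the result variable names and the capturing pattern are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $line = ">id1";

my $is_header = ($line =~ /^>/) ? "yes" : "no";   # scalar context: true, so "yes"
my ($number)  = $line =~ /^>id(\d+)$/;            # list context: the captured substring "1"

my $substituted = ($line =~ s/>//);               # 1 substitution made, $line is now "id1"
my $translated  = ($line =~ tr/a-z/A-Z/);         # 2 characters replaced, $line is now "ID1"

print "$is_header $number $substituted $translated $line\n";   # prints: yes 1 1 2 ID1
```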
Regular expressions

#!/usr/bin/perl

use strict;
use warnings;

my $filename = "data.txt";
my $line;
my %data = ();
my $key;

open(IN, $filename) || die "Could not open $filename\n";
while ($line = <IN>) {
    chomp($line);
    if ($line =~ /^>/) { # check for ids using pattern matching
        $line =~ s/>//; # substitute > by nothing in id
        $line =~ tr/a-z/A-Z/; # translate lower case to upper case
        $key = $line;
    }
    else {
        $data{$key} = $line;
    }
}
close(IN);
my @ids = keys %data;
my @sequences = values %data;
$, = " ";
print @ids, "\n", @sequences, "\n";

Input file : data.txt
>id1
ATTGTC
>id2
GGTCCT
>id3
TATGAAA
>id4
GTGTATA

Regular expressions

Symbol   Meaning
\...     Used to escape metacharacters (including itself) or to make the next character a metacharacter (like \s, \w, \n)
...|...  Alternation (match one or the other)
(...)    Grouping (treat as a unit)
[...]    Character class (match one character from a set)
^        True at the beginning of the string (or sometimes after any newline)
$        True at the end of the string (or sometimes before any newline)
.        Match any one character (except newline, normally)

$seq =~ /AAA$/

Regular expressions

Quantifier  Meaning
*           Match 0 or more times (maximal)
+           Match 1 or more times (maximal)
?           Match 0 or 1 time (maximal)
{COUNT}     Match exactly COUNT times
{MIN,}      Match at least MIN times (maximal)
{MIN,MAX}   Match at least MIN times but not more than MAX times (maximal)
*?          Match 0 or more times (minimal)
+?          Match 1 or more times (minimal)
??          Match 0 or 1 time (minimal)
{MIN,}?     Match at least MIN times (minimal)
{MIN,MAX}?  Match at least MIN times but not more than MAX times (minimal)

$seq = "TATGAAA"    $seq =~ /.*AAA$/    $seq =~ /.*A{3}$/
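
The difference between maximal and minimal matching shows up in what a capture group returns. A sketch with the sequence from the example (the patterns and variable names are illustrative):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $seq = "TATGAAA";

my ($maximal) = $seq =~ /(T.*A)/;    # .* takes as much as possible: "TATGAAA"
my ($minimal) = $seq =~ /(T.*?A)/;   # .*? takes as little as possible: "TA"
my ($run)     = $seq =~ /(A{2,})/;   # first run of at least two A's: "AAA"

my $ends_with_aaa = ($seq =~ /A{3}$/) ? "yes" : "no";   # yes

print "$maximal $minimal $run $ends_with_aaa\n";   # prints: TATGAAA TA AAA yes
```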

Regular expressions

Symbol  Meaning               Character class
\d      Digit                 [0-9]
\D      Nondigit              [^0-9]
\s      Whitespace            [ \t\n\r\f]
\S      Nonwhitespace         [^ \t\n\r\f]
\w      Word character        [a-zA-Z0-9_]
\W      Non-(word character)  [^a-zA-Z0-9_]

$id = "id2"    $id =~ /id\d+$/

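
Character classes combine naturally with capturing parentheses. The sketch below checks an id and pulls a name and a number out of a tab-separated record; the record string and variable names are invented for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $id = "id2";
my $id_ok = ($id =~ /^id\d+$/) ? "yes" : "no";   # yes: "id" followed only by digits

my $record = "John\t5";
# \w+ matches the name, \s+ the separating whitespace (here a tab), \d+ the number
my ($name, $value) = $record =~ /^(\w+)\s+(\d+)$/;

print "$id_ok $name $value\n";   # prints: yes John 5
```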
Subroutines or functions

#!/usr/bin/perl
use strict;
use warnings;

my $filename1 = "data1.txt";
my $filename2 = "data2.txt";
my %data1 = get_data($filename1); # subroutine call
my %data2 = get_data($filename2); # subroutine call
$, = " ";
print keys %data1, "\n", values %data1, "\n";
print keys %data2, "\n", values %data2, "\n";

sub get_data {
    my $filename = shift(@_);
    my $key;
    my %tmp = ();
    open(IN, $filename) || die "Could not open $filename\n";
    while (my $line = <IN>) {
        chomp($line);
        if ($line =~ /^>/) {
            $line =~ s/>//;
            $line =~ tr/a-z/A-Z/;
            $key = $line;
        }
        else {
            $tmp{$key} = $line;
        }
    }
    close(IN);
    return %tmp;
}

Input file : data1.txt
>id1
ATTGTC
>id2
GGTCCT
>id3
TATGAAA
>id4
GTGTATA

Input file : data2.txt
>id5
ATAAAAA
>id6
GGAATTT
>id7
TATGATT
>id8
GTGTAAT

Packages

Script:

#!/usr/bin/perl

use strict;
use warnings;
use MyTools;

my $filename1 = "data1.txt";
my $filename2 = "data2.txt";

my %data1 = MyTools::get_data($filename1);
my %data2 = MyTools::get_data($filename2);

$, = " "; # set the print separator

print keys %data1, "\n", values %data1, "\n";
print keys %data2, "\n", values %data2, "\n";

Module file MyTools.pm:

package MyTools;

sub get_data {
    my $filename = shift(@_);
    my $key;
    my %tmp = ();
    open(IN, $filename) || die "Could not open $filename\n";
    while (my $line = <IN>) {
        chomp($line);
        if ($line =~ /^>/) {
            $line =~ s/>//;
            $key = $line;
        }
        else {
            $tmp{$key} = $line;
        }
    }
    close(IN);
    return %tmp;
}
1; # this should be your last line

Comprehensive Perl Archive Network (CPAN) http://www.cpan.org/

References

• Recommended Books
– Beginner
» "Learning Perl", 4th Edition by Randal Schwartz, Tom Phoenix & Brian D Foy
» "Beginning Perl for Bioinformatics", 1st Edition by James Tisdall
» "Developing Bioinformatics Computer Skills", 1st Edition by Cynthia Gibas & Per Jambeck