SlideShare ist ein Scribd-Unternehmen logo
1 von 77
Downloaden Sie, um offline zu lesen
Institute of Bioinformatics
Johannes Kepler University Linz




                                  Perl
               A Short Introduction for Bioinformaticians
What is Perl?


           Perl is a simple and easy-to-use programming language
           Perl is an interpreted (scripting) language
           Perl is (almost) platform-independent
           Perl is free of charge
           Perl is a common standard in bioinformatics, language pro-
           cessing, and Web programming




Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians     2
Perl’s Advantages


           Platform-independent
           Free of charge
           Only minor hardware and software requirements
           Powerful elements for string processing (regular expressions)
           and hash tables allow concise algorithms
           Quick and easy solutions are possible




Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians        3
Perl’s Disadvantages


           Richness and compactness of language facilitates difficult-to-
           read programs
           No stand-alone programs without additional software
           Slower than compiled code
           Perl is sometimes wasteful with memory
           Perl’s built-in strings, lists, and hash tables sometimes hide po-
           tential performance problems
           Therefore, Perl cannot handle as large problems as some other
           programming languages


Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians             4
Use Perl for. . .


           Small rapid prototyping solutions
           Applications that require nifty string processing
           Small to medium-sized problems
           Applications with moderate performance and memory require-
           ments




Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians     5
Do Not Use Perl for. . .


           Large software projects (performance, stability, quality, main-
           tainability)
           Applications in which performance and memory consumption
           are most important factors




Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians          6
Why Perl in Bioinformatics


           String processing capabilities and hash tables
           Easy, also for biologists and other people outside computer sci-
           ence




Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians           7
What Else is Perl Used For?


           Very common in Web programming (offers very good database
           and networking integration)
           Perl can also serve as a more powerful replacement of UNIX
           shell scripts or DOS/Windows batch files
           In this introduction, we concentrate on standard elements that
           will be necessary for bioinformatics applications




Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians         8
Setting Up Your Perl System


           What you need: (1) Perl software, (2) text editor
           On UNIX/Linux systems, probably everything is pre-installed
           Windows/MacOS
                  Get ActivePerl and install it:
                  http://www.activestate.com/Products/
                  ActivePerl/
                  Choose your favorite text editor (e.g. UltraEdit, TextPad,
                  XEmacs)
                  Make sure that perl is in your default search path



Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians            9
Documentation


           Online:
           http://www.perl.org/docs.html
           http://perldoc.perl.org/perlintro.pdf
           http://perldoc.perl.org/index-tutorials.html
           http://perldoc.perl.org/index-functions.html
           Program perldoc that is part of every Perl system




Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians   10
First Steps


           Comments start with #
           For compatibility with UNIX/Linux systems, it is advisable to start the
           program with the so-called She’Bang line:
                  #!/usr/bin/perl
           For better checking, you are advised to add the following two lines to
           your program (right after the She’Bang line):
                  use strict;
                  use warnings;
           Statements need to be closed with a semicolon, whitespaces are
           irrelevant
           In the simplest case, terminal output is issued with the print com-
           mand
Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians                  11
Data — Variables


           There are three basic types of variables in Perl. Unlike other pro-
           gramming languages, the type is identified with a special character
           that prefixes the variable name:
           Scalars: prefixed with “$” e.g. $num
           Arrays: prefixed with “@” e.g. @list
           Hashes: prefixed with ‘%” e.g. %hashtable
           If using use strict;, variables have to be declared before they are
           used. This is done with my.
           The most basic operator is the assignment operator =. It copies the
           content of one variable into the other (possibly with a conversion).



Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians               12
Scalars


           Among scalars, there is no specific kind of type checking!
           Scalars can have three different kinds of meanings:
            1. Numbers
            2. Strings
            3. References




Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians    13
Numbers


           There is no explicit distinction between integer and floating
           point numbers, this is handled implicitly
           Examples:
                  $num = 3;
                  $pi = 3.141592654;
                  $mio = 1.e6;


           Arithmetic operators: +, −, ∗, /, %, may be used in conjunction
           with assignments, i.e. + =, − =, ∗ =, / =, % =
           Increment/decrement operators: ++, −−

Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians          14
String Constants


           Double quotes or single quotes may be used around strings:
                  ’Hello world’
                  "Hello world"
           The difference is that strings with single quotes are taken lit-
           erally. Strings with double quotes are interpreted, i.e. variable
           names and special characters are translated first
           Common operator: concatenation operator ., may also be
           used in conjunction with assignments .=




Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians            15
Arrays


           Arrays need not have a predefined size
           Assignment operators work on individual elements (scalars) or with
           whole lists, e.g.
                  @list = ("Zicke", "Zacke", 123);
           The index starts with 0
           Memory allocation is done on demand; Be aware: when you first
           create the fifth element, the whole list from the first to the fifth element
           is allocated (i.e. elements 0. . . 4)
           Accessing single elements:
                  $elem = $list[2];
           Accessing multiple elements is also possible, e.g. @list[0,1],
           @list[1..3]
Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians                    16
Arrays (cont’d)


           Assignments also work in the following way:
                  ($firstelem, @remaining) = @list;
           Special operators:
                  Assignment to scalar gives number of elements, e.g.
                    $no = @list;
                  Index of last element:
                    $index = $#list;
                    $elem = $list[$#list];
           Note that lists are always flattened, e.g.
                  @list = ("a", "b", ("c", "d"));
           is the same as
                  @list = ("a", "b", "c", "d");
Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians     17
Arrays: Stacks and Queues


           Special functions for adding/removing elements from arrays:
                  push appends (a) new element(s) at the end of the array, e.g.
                    push(@list, "tralala");
                    push(@list, ("hui", 1));
                  pop returns the last element and removes it from the list, e.g.
                    $elem = pop(@list);
                  shift returns the first element and removes it from the
                    $elem = shift(@list);
                  unshift inserts (a) new element(s) at the beginning of an array
                    unshift(@list, "tralala");
                    unshift(@list, ("hui", 1));
           “Killing” an array: @list = ();

Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians                 18
Arrays: The splice Function


  The splice function allows to remove a part of an array or to replace it
  by another list.
      Example with four arguments:
                  my @list = ("u", "v", "w", "x", "y", "z", "0", "1", "2");
                  splice(@list, 3, 4, ("a", "b"));

           removes the four elements from no. 3 on (i.e. elements [3..6]) and
           replaces them by two elements, "a" and "b". Finally, @list is (“u”,
           “v”, “w”, “a”, “b”, “1”, “2”).
           Example with three arguments:
                  my @list = ("u", "v", "w", "x", "y", "z", "0", "1", "2");
                  splice(@list, 3, 4);

           removes the four elements from no. 3 onwards. Finally, @list is (“u”,
           “v”, “w”, “1”, “2”).


Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians                19
Arrays: The splice Function (cont’d)


           Example with two arguments:
                  my @list = ("u", "v", "w", "x", "y", "z", "0", "1", "2");
                  splice(@list, 3);

           removes all elements from no. 3 onwards. Finally, @list is (“u”, “v”,
           “w”).
           A negative offset −i as second argument means to start from the i-th
           to the last element.
           See also
           http://perldoc.perl.org/functions/splice.html




Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians                20
Hashes


           Like arrays, hashes are collections of scalars, however, with the dif-
           ference that they are not ordered; individual elements can be ac-
           cessed by arbitrary scalars, so-called keys
           Assignment operators work on individual elements (scalars), e.g.,
                  $color{"apple"} = "red";
           or with whole lists, e.g.
                  %color = ("apple", "red", "banana",
                  "yellow");
           which is equivalent to the more readable
                  %color = (apple => "red", banana =>
                  "yellow");


Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians                 21
Hashes (cont’d)


           Memory allocation is done on demand
           Special functions:
                  List of keys:
                     @keylist = keys %color;
                  List of values:
                     @vallist = values %color;
                  Deleting a hash entry:
                    delete $color{’apple’};
                  Checking whether a hash entry exists:
                    exists $color{’apple’}




Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians   22
Control Structures: if and unless


           Single if:
                  if ( expression )
                  {
                    ...
                  }
           unless (expression negated):
                  unless ( expression )
                  {
                    ...
                  }




Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians   23
Control Structures: if and unless (cont’d)


           Note that the curly braces are obligatory; conditional statements,
           however, can also be written in single lines:
                  ...           if ( expression );
                  ...           unless ( expression );




Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians             24
Control Structures: if/else


           if/else
                  if ( expression )
                  {
                    ...
                  }
                  else
                  {
                    ...
                  }




Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians   25
Control Structures: if/elsif/else


           if/elsif/else:
                  if ( expression1 )
                  {
                    ...
                  }
                  elsif ( expression2 )
                  {
                    ...
                  }
                  else
                  {
                    ...
                  }
Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians   26
Control Structures: while and until


           while:
                  while ( expression )
                  {
                    ...
                  }
           until (expression negated):
                  until ( expression )
                  {
                    ...
                  }




Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians   27
Control Structures: while and until
  (cont’d)

           Note that the curly braces are obligatory; while and until loops, how-
           ever, can also be written in single lines:
                  ...           while ( expression );
                  ...           until ( expression );




Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians                 28
Control Structures: for and foreach (1/3)


           for (the same as in C):
                  for ( init ; condition ; increment )
                  {
                    ...
                  }
           foreach:
                  foreach $variable (@list)
                  {
                    ...
                  }


Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians   29
Control Structures: for and foreach (2/3)


           foreach with fixed list:
                  foreach $key ("tralala", 3, 2, 1, 0)
                  {
                    ...
                  }
           Simple repetions:
                  foreach (1..15)
                  {
                    ...
                  }


Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians   30
Control Structures: for and foreach (3/3)


           Note that the curly braces are obligatory; foreach loops, how-
           ever, can also be written in single lines:
                  ...             foreach $variable (@list));
           Not allowed for for loops!




Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians         31
Control Structures: Miscellaneous


           next stops execution of loop body and moves to the condition again
           (like continue in C)
           last stops execution of loop (like break in C)
           redo stops execution of loop body and executes the loop body again
           without checking the condition again
           The previous three commands apply to the innermost loop by default;
           this can be altered using labels
           A goto statement is available as well (deprecated!), also next and
           last can be abused in similar ways
           A continue block can be added to while, until and foreach loops



Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians              32
Expressions


           Logical operators: &&, ||, !
           Numeric comparison: ==, !=, <, >, <=, >=
           String comparison: eq, ne, lt, gt, le, ge

  Truth and falsehood:
           0, ’0’ , ”, () and undef are interpreted as false if they are the
           result of the evaluation of an expression, all other values are
           interpreted as true




Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians            33
Special Variables


  $_: default input; many (if not most) built-in Perl functions use this vari-
      able as default input if no argument is specified; also used as default
      variable in foreach loops
  @_: list of input arguments in sub-routines (see later)
  @ARGV: list of command line arguments; NOTE: unlike in C, $ARGV[0]
     is the first command line argument, not the program name
  $0: the name of the program being executed (like argv[0] in C)
  $1, $2, . . . : pattern matches (see later)

  Note that there are a whole lot more special variables, but they are not
  important for us at this point.


Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians              34
Control Structures: sub-routines


           Subroutines are simply written as follows:
                  sub name
                  {
                    ...
                  }
           Recursions are allowed
           Note that there is no typechecking, not even the number of argu-
           ments needs to be fixed
           Arguments are passed through the special list @_
           Return values are optional and need to be passed with return



Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians           35
Control Structures: sub-routines (cont’d)


           Calls of sub-routines can be prefixed with &
           Arguments need to be separated by commas and can (but
           need not!) be embraced with parentheses; however, the use
           of parentheses is highly recommended
           Like in C, Perl uses call by value




Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians    36
Control Structures: sub-routines example


           #!/usr/bin/perl -w
           use strict;

           sub divide
           {
             my ($enumerator,$denominator) = @_;
             return $enumerator/$denominator;
           }

           ...
           my $quotient = &divide(4, 2);
           ...


Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians   37
Variable Scoping


           Declaring/using a variable in the main program outside any block cre-
           ates a global variable
           Declarations inside a sub-routine create a local variable
           Declaring/using a variable in a block creates a temporary variable
           Use use strict to enforce declarations
           Try to use temporary/local variables where possible and avoid the
           use of global variables where possible
           Local variables are destroyed/de-allocated as soon as the execution
           of their scope (block/sub-routine) is finished (with the only exception
           if there are references to this local variable)



Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians                 38
Contexts


  In Perl, contexts determine how certain expressions are interpreted
           Scalar contexts:
                  Numerical context
                  String context
                  Boolean context
           List context




Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians     39
References


           A reference is a scalar that “points” to a certain variable (scalar, list
           or hash)
           A reference is created by prefixing a variable with a backslash “”,
           e.g.
                  $ref_to_scalar = $scalar;
                  $ref_to_array = @array;
                  $ref_to_hash = %hash;
           De-referencing, i.e. getting back the value, is done by prefixing the
           reference with the appropriate prefix, e.g.
                  $scalar = ${$ref_to_scalar};
                  @array = @{$ref_to_array};
                  %hash = %{$ref_to_hash};

Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians                    40
References (cont’d)


           The curly braces may also be omitted when de-referencing
           References are useful to efficiently pass large arguments to sub-
           routines
           Thereby, call by reference is realized
           Accessing items in de-referenced arrays and hashes
           may be clumsy,    therefore ${$ref_to_array}[2] and
           ${$ref_to_hash}{"key"}     may  also  be   written as
           $ref_to_array->[2] and $ref_to_hash->{"key"}




Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians           41
References (cont’d)


           The curly braces may also be omitted when de-referencing
           References are useful to efficiently pass large arguments to sub-
           routines
           Thereby, call by reference is realized
           Accessing items in de-referenced arrays and hashes
           may be clumsy,    therefore ${$ref_to_array}[2] and
           ${$ref_to_hash}{"key"}     may  also  be   written as
           $ref_to_array->[2] and $ref_to_hash->{"key"}




Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians           42
Matrices


           Brackets create a reference to an anonymous array; by this trick,
           matrices can be realized easily, e.g.
                  $array[0] = [1, 2, 3];
                  $array[1] = [4, 5, 6];
                  $array[2] = [7, 8, 9];
                  print "$array[1][2]"; # prints 6
           More easily:
                  @array = ([1, 2, 3], [4, 5, 6], [7, 8, 9]);
           This also works without a one-for-all assignment, only by assigning
           values to individual elements




Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians              43
Hashes of Hashes


           Curly braces create a reference to an anonymous hash; by this trick,
           hashes of hashes can be realized relatively easily, e.g.
                  %hash = (j => {a=>1, b=>2, c=>3},
                           k => {d=>6, e=>5, f=>10});
           This also works without a one-for-all assignment, only by assigning
           values to individual elements, e.g.
                  $hash{j}{a} = 1;
                  $hash{k}{e} = 5;
           Note that, in such a case, $hash{j} is not a hash, but a reference;
           therefore, something like keys $hash{j} will not work. To get the
           keys of the entries in the second level, one has to use something like
                  keys %{$hash{j}}

Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians                 44
A Note on the Lifetime of Local Variables


           Usually, local variables are destroyed/de-allocated as soon as
           the execution of their scope (block/sub-routine) is finished
           This means that references to local variables would point to
           nirvana after their scope’s execution is finished
           That is not how it works; instead, Perl uses a reference counter
           for each variable. A local variable remains existing until there
           is no reference pointing to it anymore.




Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians           45
A Note on the Lifetime of Local Variables
  (cont’d)

  Consider the following example:
           my @matrix;
           for(my $i = 0; $i < 10; $i++)
           {
              my @line = (0..9);
              $matrix[$i] = @line;
           }

  The full 10 × 10 matrix can be used safely, even if the array’s lines
  are lists created as local variables.




Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians       46
File/Console IO (1/5)


           Similar to other programming languages, Perl uses file handles
           The following file handles are open and usable by default: STDIN,
           STDOUT, STDERR
           The open function is used to open a file handle:
                  open(INPUT, "< $filename1"); # open for reading
                  open(OUTPUT, "> $filename2"); # write new content
                  open(LOGFILE, ">> $filename2"); # append
           open returns 0 on failure and a non-zero value on success




Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians           47
File/Console IO (2/5)


           File handles are closed with close, e.g.
                  close(INPUT);
                  close(OUTPUT);
                  close(LOGFILE);
           close returns 0 on failure and a non-zero value on success
           open and close can also be used to handle pipes, e.g.
                  open(INPUT, "ls -al |"); # get directory listing
                  open(OUTPUT, "| more"); # list output by page




Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians     48
File/Console IO (3/5)


           The print function can be used to write to a file, e.g.
                  print OUTPUT "Tralala";
           Note the missing comma after the file handle!
           In scalar context, the input operator <> reads one line from the spec-
           ified file handle, e.g.
                  $line = <INPUT>;
           reads one line from the file handle INPUT, whereas
                  $line = <STDIN>;
           reads one line from the console input.
           Note that, in the above examples, $line still contains the trailing
           newline character; it can be removed with the chomp function, e.g.
                  chomp $line;
Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians                 49
File/Console IO (4/5)


           In list context, the input operator <> reads the whole file at once, e.g.
                  @whole = <INPUT>;
           reads the whole file INPUT line by line, where each line is one item
           in the list @whole is one line
           All lines in @whole still contain the the trailing newline characters;
           they can be removed with the chomp function, e.g.
                  chomp @whole;
           removes the trailing newline character from all lines in @whole
           This way of reading a file may be very comfortable. Note, however,
           that it requires reading the whole file into the memory at once, which
           may be infeasible for larger files


Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians                   50
File/Console IO (5/5)


           If the input operator <> is used without specifying a variable, the next
           line is placed in the default variable $_
           Of course, $_ still contains the trailing newline character; it can be
           removed with the chomp function, this time without any arguments
           (because chomp takes $_ by default anyway)
           Example of a program fragment that reads input from the console
           input line by line:
                  while (<STDIN>)
                  {
                    ... # $_ contains last input line with newline
                    chomp;
                    ... # $_ contains last input line without newline
                  }

Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians                   51
Regular Expressions: Basics (1/3)


           Regular expressions are an amazingly powerful tool for string pattern
           matching and replacement
           Regular expressions are usual in some other software products (par-
           ticularly in the UNIX world), but Perl offers one of the most advanced
           and powerful variants
           Simple regular expressions are embraced with slashes
           Regular expression are most often used in conjunction with the two
           operators =˜ and !˜; the former checks whether a pattern is found
           and the latter checks whether the pattern is not present, e.g.
                  (’Hello world’ =˜ /Hell/) # evaluates to true
                  (’Hello world’ !˜ /Hell/) # evaluates to false


Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians                 52
Regular Expressions: Basics (2/3)


           The examples on the previous slide show how to search for
           certain constant string parts (“Hell” in these examples)
           The search pattern can also (partly) be a variable
           Search patterns, however, need not be string constants; there
           is a host of meta-characters for constructing more advanced
           searches: {}[]()ˆ$.|*+?
           Using meta-characters as ordinary search expressions re-
           quires prefixing them with a backslash




Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians        53
Regular Expressions: Basics (3/3)


           A regular expression can also be written synonymously as the
           search operator m//, e.g. /world/ and m/world/ mean the
           same thing. The m// operator has the advantage that other
           characters for embracing the regular expression can also be
           used (instead of only slashes), e.g. m!!, m{} (note the braces!)




Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians           54
Regular Expressions: Matching
  Beginnings and Ends

           The ˆ meta-character is used to require that the string begins with
           the search pattern, e.g.
                  (’Hello world’ =˜ /ˆHell/) # matches
                  (’Hello world’ =˜ /ˆworld/) # does not match
           The $ meta-character is used to require that the string ends with the
           search pattern, e.g.
                  (’Hello world’ =˜ /Hell$/) # does not match
                  (’Hello world’ =˜ /rld$/) # matches
           The two meta-characters ˆ and $ can also be used together




Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians                55
Regular Expressions: Character Classes


           Character classes can be defined between brackets; as a simple ex-
           ample, /[bcr]at/ matches “bat”, “cat” and “rat”
           Character classes may also be ranges in the present character cod-
           ing (e.g. ASCII), e.g. /index[0-2]/ matches “index0”, “index1” and
           “index2”; /[0-9a-fA-F]/ matches a hexadecimal digit
           Using the meta-character ˆ in the first place of a character class
           means negation, e.g. /[ˆ0-9]/ matches all non-numerical charac-
           ters




Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians             56
Regular Expressions: Predefined
  Character Classes

           d: numerical character, short for [0-9]
           D: non-numerical character, short for [ˆ0-9], equivalently [ˆd]
           s: whitespace character, short for [ trnf]
           S: non-whitespace character, short for [ˆ trnf], equivalently
           [ˆs]
           w: word character (alphanumeric or _), short for [0-9a-zA-Z_]
           W: non-word character, short for [ˆ0-9a-zA-Z_], equivalently [ˆw]
           The period . matches every character except the newline character n
           b: matches a boundary between a word and a non-word character or be-
           tween a non-word and word character; this is useful for checking whether a
           match occurs at the beginning or end of a word



Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians                     57
Regular Expressions: Variants and
  Grouping

           The stroke character | is used as the so-called alternation meta-
           character, e.g. /dog|cat|rat/ matches “dog”, “cat” and “rat”, and
           so does /dog|[cr]at/
           Parentheses that serve as so-called grouping meta-characters can
           be used for grouping alternatives in parts of the search pattern, e.g.
           /house(dog|cat)/ matches “housedog” and “housecat”
           Alternatives may also be empty, e.g. /house(cat|)/ matches
           “housecat” and “house”
           Groupings may also be nested, e.g. /house(cat(s|)|)/ matches
           “housecats”, “housecat” and “house”




Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians                 58
Regular Expressions: Quantifiers


           So far, we were only able to search only for patterns with a relatively
           fixed structure, but we were not able to deal with repetitions in a flex-
           ible way; that is what quantifiers are good for
           The following quantifiers are available:
           ? match 1 or 0 times
           * match any number of times
           + match at least once
           {n,m} match at least n and at most m times
           {n,} match at least n times
           {n} match exactly n times



Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians                  59
Regular Expressions: Quantifiers (cont’d)


  Examples:
           /0x[0-9a-fA-F]+/ matches hexadecimal numbers
           /[-+]?d+/ matches integer numbers
           /[-+]?d+.d+/ matches numbers
           /d{2}.d{2}.d{4}/ matches dates in DD.MM.YYYY format




Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians   60
Regular Expressions: Extracting Matches
  (1/3)

           Parentheses also serve for a different purpose: they allow extracting
           the relevant part of the string that matched; for that purpose, the
           special variables $1, $2, etc. are employed
           More specifically, Perl seeks the first match in the string and puts
           those parts into the special variables $1, $2, etc. that match
           Example: after evaluating
                  (’Hello you!’                             =˜ /Hello (world|you.)/)
           $1 has the value “you!”
           Another example: after evaluating
                  (’AGCTTATATGCATATATAT’ =˜ /T(.T|.A)T(.T|A)/)
           $1 has the value “TA” and $2 has the value “AT”

Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians                    61
Regular Expressions: Extracting Matches
  (2/3)

           Assigning an evaluation of a regular expression to a list stores the
           matching parts in the list (i.e. the regular expression is evaluated in
           list context)
           Example: after
                  @list = (’Hello you!’                               =˜ /Hello (world|you.)/)
           @list has the value (“you!”) and after
                  @list = (’AGCTTATATGCATATATAT’ =˜ /T(.T|.A)T(.T|A)/)
           @list has the value (“TA”, “AT”)




Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians                              62
Regular Expressions: Extracting Matches
  (3/3)

           Note that the extraction of matches discussed until now only con-
           cerns the first match of the regular expression in the string — ex-
           tracting all matches is a different story (see later)!
           Example: after
                  $string = ’red = 0xFF0000, blue = 0x0000FF’;
                  @list = ($string =˜ /0x([0-9a-fA-F]{6})/)
           @list will have the value (“FF0000”), but there is presently no way
           to access the second match
           Further note that quantifiers are greedy in the sense that they try to
           match as many items as possible; example: after
              (’<B>My homepage</B>’ =˜ /<(.*)>/)
           $1 has the value “B>My homepage</B”
Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians                63
Regular Expressions: Quantifiers
  Revisited

           Suffixing quantifiers by “?” makes them non-greedy
           ?? match 1 or 0 times, try 0 first
           *? match any number of times, but as few times as possible
           +? match at least once, but as few times as possible
           {n,m}? match at least n and at most m times, but as few times as
              possible
           {n,}? match at least n times, but as few times as possible
           {n}? is allowed, but obviously it has the same meaning as {n}
           Example: after
              (’<B>My homepage</B>’ =˜ /<(.*?)>/)
           $1 has the value “B”

Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians           64
Regular Expressions: Modifiers


           A modifier can be appended to a regular expression to alter its inter-
           pretation; the following are the most important modifiers:
           i case-insensitive matching
           m treat string as collection of individual lines; interpret n as newline
              character, ˆ and $ match individual lines
           s treat string as one line; . meta-character also matches n
           x whitespaces and comments inside regular expression are not in-
              terpreted
           g allow extraction of all occurrences (see later)
           Example: /a/i matches all strings containing ‘a’ or ‘A’



Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians                    65
Regular Expressions: Extracting All
  Matches (1/3)

           If the g modifier is specified, Perl internally keeps track of the string
           position
           If a regular expression is applied to the same string again, the search
           starts at that position where the last search stopped
           This can be repeated until the last search has been found
           Before the regular expression is evaluated first, the string position is
           undef
           If the string is changed between the regular expression is applied,
           the position is also set to undef
           In scalar context, a regular expression with g modifier returns false if
           the pattern has not been found and true if it has been found

Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians                  66
Regular Expressions: Extracting All
  Matches (2/3)

           Example:
                  while ($string =˜ /0x([0-9a-fA-F]+)/g)
                  {
                    print "$1n";
                  }
           This loop extracts all hecadecimal numbers from $string and prints
           them line by line without the “0x” prefix
           For special purposes, the actual position can be determined with the
           function pos; however, note that, after each occurrence, the position
           points to the next character after the previous match (if counting of
           characters starts at 0)!



Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians                67
Regular Expressions: Extracting All
  Matches (3/3)

           In list context, the g modifier can be used to extract all matches and
           store them in a list
           If there are no groupings, all matches of the whole regular expression
           are placed in the list
           If there are nested groupings, all matches of all groupings are placed
           in the list in the order in which they are matched, where matches of
           outer groups precede matches of inner groups
           Example: after
                  @list = ("AGC GAT TGA GAG" =˜ /(G(A(T|G)))/g);
           @list has the value (“GAT”, “AT”, “T”, ‘GAG”, “AG”, “G”)



Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians                 68
Regular Expressions: Replacements


           Regular expressions also facilitate powerful replacement
           mechanisms; this is accomplished with the s/// operator
           Example: after
                  $sequence = "AGCGTAGTATAGAG";
                  $sequence =˜ s/T/U/;
           $sequence has the value “AGCGUAGTATAGAG”
           The s/// operator returns the number of replacements




Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians   69
Regular Expressions: Replacements
  (cont’d)

           Modifiers as introduced above work analogously; not surprisingly, the
           g modifiers allows to replace all occurrences of the search string, e.g.
           after
                  $sequence = "AGCGTAGTATAGAG";
                  $sequence =˜ s/T/U/g;
           $sequence has the value “AGCGUAGUAUAGAG”
           The special variables $1, $2, $3, etc. allow very tricky replacements,
           e.g. with
                  s/(d{2}).(d{2}).(d{4})/$3-$2-$1/g
           one can convert all dates from DD.MM.YYYY format to YYYY-MM-
           DD format


Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians                  70
Regular Expressions: Transliteral
  Replacements

           The operators tr/// and y/// (both are equivalent) are available
           to perform so-called transliteral replacements, i.e. the translation of
           single characters according to a replacement list
           Example: tr/AB/BC/, at once, replaces all “A”s with “B”s and all
           “B”s with “C”s
           Note that this functionality cannot be realized easily with the replace-
           ment operator s///
           It is also possible to specify ranges, e.g. tr/a-f/0-5/




Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians                   71
Regular Expressions: The split
  Function

           The highly useful split function can be used to split a string into
           parts that are separated by certain characters or patterns; it returns
           a list of split strings
           Example: after
                  @fragments = split(/TGA/, "GCATGACGATGATATA");
           @fragments has the value (“GCA”, “CGA”, “TATA”)
           As obvious from the above example, the first argument is a regular
           expression at which the string is split; note that the split pattern is
           omitted in the split list
           Nor surprisingly, there is no restriction to fixed search patterns, e.g.
                  split(/s+/, $string);
           splits $string into single words
Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians                  72
Regular Expressions: The split
  Function (cont’d)

           The split function has an optional third argument with which the max-
           imal number of splits can be controlled
           Example: after
                  @fragments = split(/TGA/, "GCATGACGATGATATA", 2);
           @fragments has the value (“GCA”, “CGATGATATA”)
           The join function is the converse function, i.e. it assembles a list
           of strings into one large string, where a separating character can be
           inserted; e.g. after
                  @list = ("tri", "tra", "tralala");
                  $string = join(’:’, @list);
           $string has the value “tri:tra:tralala”.


Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians                73
Useful Functions Not Previously
  Mentioned

  abs, atan2, cos, exp, log, sin, sqrt: usual mathematical functions
  int: converts a floating point number to an integer number (truncates!)
  printf: allows more flexibility for formatted output than print
  length: get the length of a string
  reverse: reverse a string or array
  sort: sort an array
  system: run external program
  time: get system time (mostly the number of seconds since Jan 1, 1970)
  localtime: convert system time into the actual local time



Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians        74
Final Remarks


           The present slides are only an overview tutorial that concen-
           trates on elements of Perl that are useful for small bioinformat-
           ics applications
           Perl actually offers a lot more possibilities
           For more details, discover the world of Perl at
           http://www.perl.org/
           Theory is good, but not sufficient — a programming language
           can only be learned by experience




Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians            75
References and Further Reading


     1. Robert Kirrily: perlintro — a brief introduction and overview of Perl
        http://perldoc.perl.org/perlintro.html
     2. perlsyn — Perl syntax (author unknown)
        http://perldoc.perl.org/perlsyn.html
     3. Mark Kvale: perlretut — Perl regular expressions tutorial
        http://perldoc.perl.org/perlretut.html
     4. Ian Truskett: perlreref — Perl regular expressions reference
        http://perldoc.perl.org/perlreref.html
     5. Perl Functions A–Z (multiple unknown authors)
        http://perldoc.perl.org/index-functions.html



Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians             76
References and Further Reading (cont’d)


     1. Rex A. Dwyer: Genomic Perl: From Bioinformatics Basics to
        Working Code. Cambridge University Press, 2003.
     2. Martin Kästner: Perl fürs Web. Galileo Press, Bonn, 2003.




Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians   77

Weitere ähnliche Inhalte

Was ist angesagt?

Language-agnostic data analysis workflows and reproducible research
Language-agnostic data analysis workflows and reproducible researchLanguage-agnostic data analysis workflows and reproducible research
Language-agnostic data analysis workflows and reproducible researchAndrew Lowe
 
Python scripting kick off
Python scripting kick offPython scripting kick off
Python scripting kick offAndrea Gangemi
 
Learn Python The Hard Way Presentation
Learn Python The Hard Way PresentationLearn Python The Hard Way Presentation
Learn Python The Hard Way PresentationAmira ElSharkawy
 
Python Advanced – Building on the foundation
Python Advanced – Building on the foundationPython Advanced – Building on the foundation
Python Advanced – Building on the foundationKevlin Henney
 
Python for Linux System Administration
Python for Linux System AdministrationPython for Linux System Administration
Python for Linux System Administrationvceder
 
1. python programming
1. python programming1. python programming
1. python programmingsreeLekha51
 
Functions and modules in python
Functions and modules in pythonFunctions and modules in python
Functions and modules in pythonKarin Lagesen
 
Python Interview Questions For Freshers
Python Interview Questions For FreshersPython Interview Questions For Freshers
Python Interview Questions For Fresherszynofustechnology
 
Programming in Computational Biology
Programming in Computational BiologyProgramming in Computational Biology
Programming in Computational BiologyAtreyiB
 
Introduction to the basics of Python programming (part 3)
Introduction to the basics of Python programming (part 3)Introduction to the basics of Python programming (part 3)
Introduction to the basics of Python programming (part 3)Pedro Rodrigues
 
Python Programming - IV. Program Components (Functions, Classes, Modules, Pac...
Python Programming - IV. Program Components (Functions, Classes, Modules, Pac...Python Programming - IV. Program Components (Functions, Classes, Modules, Pac...
Python Programming - IV. Program Components (Functions, Classes, Modules, Pac...Ranel Padon
 

Was ist angesagt? (19)

Python Session - 3
Python Session - 3Python Session - 3
Python Session - 3
 
Python basics
Python basicsPython basics
Python basics
 
Language-agnostic data analysis workflows and reproducible research
Language-agnostic data analysis workflows and reproducible researchLanguage-agnostic data analysis workflows and reproducible research
Language-agnostic data analysis workflows and reproducible research
 
Python scripting kick off
Python scripting kick offPython scripting kick off
Python scripting kick off
 
Learn Python The Hard Way Presentation
Learn Python The Hard Way PresentationLearn Python The Hard Way Presentation
Learn Python The Hard Way Presentation
 
Python revision tour i
Python revision tour iPython revision tour i
Python revision tour i
 
Python Advanced – Building on the foundation
Python Advanced – Building on the foundationPython Advanced – Building on the foundation
Python Advanced – Building on the foundation
 
python.ppt
python.pptpython.ppt
python.ppt
 
C language updated
C language updatedC language updated
C language updated
 
Python for Linux System Administration
Python for Linux System AdministrationPython for Linux System Administration
Python for Linux System Administration
 
Python and You Series
Python and You SeriesPython and You Series
Python and You Series
 
Functions in python
Functions in pythonFunctions in python
Functions in python
 
Python Presentation
Python PresentationPython Presentation
Python Presentation
 
1. python programming
1. python programming1. python programming
1. python programming
 
Functions and modules in python
Functions and modules in pythonFunctions and modules in python
Functions and modules in python
 
Python Interview Questions For Freshers
Python Interview Questions For FreshersPython Interview Questions For Freshers
Python Interview Questions For Freshers
 
Programming in Computational Biology
Programming in Computational BiologyProgramming in Computational Biology
Programming in Computational Biology
 
Introduction to the basics of Python programming (part 3)
Introduction to the basics of Python programming (part 3)Introduction to the basics of Python programming (part 3)
Introduction to the basics of Python programming (part 3)
 
Python Programming - IV. Program Components (Functions, Classes, Modules, Pac...
Python Programming - IV. Program Components (Functions, Classes, Modules, Pac...Python Programming - IV. Program Components (Functions, Classes, Modules, Pac...
Python Programming - IV. Program Components (Functions, Classes, Modules, Pac...
 

Ähnlich wie Perl-Tutorial

Ähnlich wie Perl-Tutorial (20)

Perl Basics with Examples
Perl Basics with ExamplesPerl Basics with Examples
Perl Basics with Examples
 
Perl Reference.ppt
Perl Reference.pptPerl Reference.ppt
Perl Reference.ppt
 
presentation_intro_to_python
presentation_intro_to_pythonpresentation_intro_to_python
presentation_intro_to_python
 
presentation_intro_to_python_1462930390_181219.ppt
presentation_intro_to_python_1462930390_181219.pptpresentation_intro_to_python_1462930390_181219.ppt
presentation_intro_to_python_1462930390_181219.ppt
 
Theperlreview
TheperlreviewTheperlreview
Theperlreview
 
Python
PythonPython
Python
 
Perl
PerlPerl
Perl
 
Perl programming language
Perl programming languagePerl programming language
Perl programming language
 
CPPDS Slide.pdf
CPPDS Slide.pdfCPPDS Slide.pdf
CPPDS Slide.pdf
 
python and perl
python and perlpython and perl
python and perl
 
Basic of Python- Hands on Session
Basic of Python- Hands on SessionBasic of Python- Hands on Session
Basic of Python- Hands on Session
 
The Bund language
The Bund languageThe Bund language
The Bund language
 
Bioinformatics v2014 wim_vancriekinge
Bioinformatics v2014 wim_vancriekingeBioinformatics v2014 wim_vancriekinge
Bioinformatics v2014 wim_vancriekinge
 
Mastering Python lesson 5a_lists_list_operations
Mastering Python lesson 5a_lists_list_operationsMastering Python lesson 5a_lists_list_operations
Mastering Python lesson 5a_lists_list_operations
 
Python 3.x quick syntax guide
Python 3.x quick syntax guidePython 3.x quick syntax guide
Python 3.x quick syntax guide
 
PYTHON 101.pptx
PYTHON 101.pptxPYTHON 101.pptx
PYTHON 101.pptx
 
What we can learn from Rebol?
What we can learn from Rebol?What we can learn from Rebol?
What we can learn from Rebol?
 
Erlang Message Passing Concurrency, For The Win
Erlang  Message  Passing  Concurrency,  For  The  WinErlang  Message  Passing  Concurrency,  For  The  Win
Erlang Message Passing Concurrency, For The Win
 
Ch-8.pdf
Ch-8.pdfCh-8.pdf
Ch-8.pdf
 
Erlang, an overview
Erlang, an overviewErlang, an overview
Erlang, an overview
 

Mehr von tutorialsruby

&lt;img src="../i/r_14.png" />
&lt;img src="../i/r_14.png" />&lt;img src="../i/r_14.png" />
&lt;img src="../i/r_14.png" />tutorialsruby
 
TopStyle Help &amp; &lt;b>Tutorial&lt;/b>
TopStyle Help &amp; &lt;b>Tutorial&lt;/b>TopStyle Help &amp; &lt;b>Tutorial&lt;/b>
TopStyle Help &amp; &lt;b>Tutorial&lt;/b>tutorialsruby
 
The Art Institute of Atlanta IMD 210 Fundamentals of Scripting &lt;b>...&lt;/b>
The Art Institute of Atlanta IMD 210 Fundamentals of Scripting &lt;b>...&lt;/b>The Art Institute of Atlanta IMD 210 Fundamentals of Scripting &lt;b>...&lt;/b>
The Art Institute of Atlanta IMD 210 Fundamentals of Scripting &lt;b>...&lt;/b>tutorialsruby
 
&lt;img src="../i/r_14.png" />
&lt;img src="../i/r_14.png" />&lt;img src="../i/r_14.png" />
&lt;img src="../i/r_14.png" />tutorialsruby
 
&lt;img src="../i/r_14.png" />
&lt;img src="../i/r_14.png" />&lt;img src="../i/r_14.png" />
&lt;img src="../i/r_14.png" />tutorialsruby
 
Standardization and Knowledge Transfer – INS0
Standardization and Knowledge Transfer – INS0Standardization and Knowledge Transfer – INS0
Standardization and Knowledge Transfer – INS0tutorialsruby
 
0047ecaa6ea3e9ac0a13a2fe96f4de3bfd515c88f5d90c1fae79b956363d7f02c7fa060269
0047ecaa6ea3e9ac0a13a2fe96f4de3bfd515c88f5d90c1fae79b956363d7f02c7fa0602690047ecaa6ea3e9ac0a13a2fe96f4de3bfd515c88f5d90c1fae79b956363d7f02c7fa060269
0047ecaa6ea3e9ac0a13a2fe96f4de3bfd515c88f5d90c1fae79b956363d7f02c7fa060269tutorialsruby
 
0047ecaa6ea3e9ac0a13a2fe96f4de3bfd515c88f5d90c1fae79b956363d7f02c7fa060269
0047ecaa6ea3e9ac0a13a2fe96f4de3bfd515c88f5d90c1fae79b956363d7f02c7fa0602690047ecaa6ea3e9ac0a13a2fe96f4de3bfd515c88f5d90c1fae79b956363d7f02c7fa060269
0047ecaa6ea3e9ac0a13a2fe96f4de3bfd515c88f5d90c1fae79b956363d7f02c7fa060269tutorialsruby
 
BloggingWithStyle_2008
BloggingWithStyle_2008BloggingWithStyle_2008
BloggingWithStyle_2008tutorialsruby
 
BloggingWithStyle_2008
BloggingWithStyle_2008BloggingWithStyle_2008
BloggingWithStyle_2008tutorialsruby
 
cascadingstylesheets
cascadingstylesheetscascadingstylesheets
cascadingstylesheetstutorialsruby
 
cascadingstylesheets
cascadingstylesheetscascadingstylesheets
cascadingstylesheetstutorialsruby
 

Mehr von tutorialsruby (20)

&lt;img src="../i/r_14.png" />
&lt;img src="../i/r_14.png" />&lt;img src="../i/r_14.png" />
&lt;img src="../i/r_14.png" />
 
TopStyle Help &amp; &lt;b>Tutorial&lt;/b>
TopStyle Help &amp; &lt;b>Tutorial&lt;/b>TopStyle Help &amp; &lt;b>Tutorial&lt;/b>
TopStyle Help &amp; &lt;b>Tutorial&lt;/b>
 
The Art Institute of Atlanta IMD 210 Fundamentals of Scripting &lt;b>...&lt;/b>
The Art Institute of Atlanta IMD 210 Fundamentals of Scripting &lt;b>...&lt;/b>The Art Institute of Atlanta IMD 210 Fundamentals of Scripting &lt;b>...&lt;/b>
The Art Institute of Atlanta IMD 210 Fundamentals of Scripting &lt;b>...&lt;/b>
 
&lt;img src="../i/r_14.png" />
&lt;img src="../i/r_14.png" />&lt;img src="../i/r_14.png" />
&lt;img src="../i/r_14.png" />
 
&lt;img src="../i/r_14.png" />
&lt;img src="../i/r_14.png" />&lt;img src="../i/r_14.png" />
&lt;img src="../i/r_14.png" />
 
Standardization and Knowledge Transfer – INS0
Standardization and Knowledge Transfer – INS0Standardization and Knowledge Transfer – INS0
Standardization and Knowledge Transfer – INS0
 
xhtml_basics
xhtml_basicsxhtml_basics
xhtml_basics
 
xhtml_basics
xhtml_basicsxhtml_basics
xhtml_basics
 
xhtml-documentation
xhtml-documentationxhtml-documentation
xhtml-documentation
 
xhtml-documentation
xhtml-documentationxhtml-documentation
xhtml-documentation
 
CSS
CSSCSS
CSS
 
CSS
CSSCSS
CSS
 
0047ecaa6ea3e9ac0a13a2fe96f4de3bfd515c88f5d90c1fae79b956363d7f02c7fa060269
0047ecaa6ea3e9ac0a13a2fe96f4de3bfd515c88f5d90c1fae79b956363d7f02c7fa0602690047ecaa6ea3e9ac0a13a2fe96f4de3bfd515c88f5d90c1fae79b956363d7f02c7fa060269
0047ecaa6ea3e9ac0a13a2fe96f4de3bfd515c88f5d90c1fae79b956363d7f02c7fa060269
 
0047ecaa6ea3e9ac0a13a2fe96f4de3bfd515c88f5d90c1fae79b956363d7f02c7fa060269
0047ecaa6ea3e9ac0a13a2fe96f4de3bfd515c88f5d90c1fae79b956363d7f02c7fa0602690047ecaa6ea3e9ac0a13a2fe96f4de3bfd515c88f5d90c1fae79b956363d7f02c7fa060269
0047ecaa6ea3e9ac0a13a2fe96f4de3bfd515c88f5d90c1fae79b956363d7f02c7fa060269
 
HowTo_CSS
HowTo_CSSHowTo_CSS
HowTo_CSS
 
HowTo_CSS
HowTo_CSSHowTo_CSS
HowTo_CSS
 
BloggingWithStyle_2008
BloggingWithStyle_2008BloggingWithStyle_2008
BloggingWithStyle_2008
 
BloggingWithStyle_2008
BloggingWithStyle_2008BloggingWithStyle_2008
BloggingWithStyle_2008
 
cascadingstylesheets
cascadingstylesheetscascadingstylesheets
cascadingstylesheets
 
cascadingstylesheets
cascadingstylesheetscascadingstylesheets
cascadingstylesheets
 

Kürzlich hochgeladen

Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 

Kürzlich hochgeladen (20)

Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 

Perl-Tutorial

  • 1. Institute of Bioinformatics Johannes Kepler University Linz Perl A Short Introduction for Bioinformaticians
  • 2. What is Perl? Perl is a simple and easy-to-use programming language Perl is an interpreted (scripting) language Perl is (almost) platform-independent Perl is free of charge Perl is a common standard in bioinformatics, language pro- cessing, and Web programming Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians 2
  • 3. Perl’s Advantages Platform-independent Free of charge Only minor hardware and software requirements Powerful elements for string processing (regular expressions) and hash tables allow concise algorithms Quick and easy solutions are possible Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians 3
  • 4. Perl’s Disadvantages Richness and compactness of language facilitates difficult-to- read programs No stand-alone programs without additional software Slower than compiled code Perl is sometimes wasteful with memory Perl’s built-in strings, lists, and hash tables sometimes hide po- tential performance problems Therefore, Perl cannot handle as large problems as some other programming languages Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians 4
  • 5. Use Perl for. . . Small rapid prototyping solutions Applications that require nifty string processing Small to medium-sized problems Applications with moderate performance and memory require- ments Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians 5
  • 6. Do Not Use Perl for. . . Large software projects (performance, stability, quality, main- tainability) Applications in which performance and memory consumption are most important factors Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians 6
  • 7. Why Perl in Bioinformatics String processing capabilities and hash tables Easy, also for biologists and other people outside computer sci- ence Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians 7
  • 8. What Else is Perl Used For? Very common in Web programming (offers very good database and networking integration) Perl can also serve as a more powerful replacement of UNIX shell scripts or DOS/Windows batch files In this introduction, we concentrate on standard elements that will be necessary for bioinformatics applications Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians 8
  • 9. Setting Up Your Perl System What you need: (1) Perl software, (2) text editor On UNIX/Linux systems, probably everything is pre-installed Windows/MacOS Get ActivePerl and install it: http://www.activestate.com/Products/ ActivePerl/ Choose your favorite text editor (e.g. UltraEdit, TextPad, XEmacs) Make sure that perl is in your default search path Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians 9
  • 10. Documentation Online: http://www.perl.org/docs.html http://perldoc.perl.org/perlintro.pdf http://perldoc.perl.org/index-tutorials.html http://perldoc.perl.org/index-functions.html Program perldoc that is part of every Perl system Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians 10
  • 11. First Steps Comments start with # For compatibility with UNIX/Linux systems, it is advisable to start the program with the so-called She’Bang line: #!/usr/bin/perl For better checking, you are advised to add the following two lines to your program (right after the She’Bang line): use strict; use warnings; Statements need to be closed with a semicolon, whitespaces are irrelevant In the simplest case, terminal output is issued with the print com- mand Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians 11
  • 12. Data — Variables There are three basic types of variables in Perl. Unlike other pro- gramming languages, the type is identified with a special character that prefixes the variable name: Scalars: prefixed with “$” e.g. $num Arrays: prefixed with “@” e.g. @list Hashes: prefixed with ‘%” e.g. %hashtable If using use strict;, variables have to be declared before they are used. This is done with my. The most basic operator is the assignment operator =. It copies the content of one variable into the other (possibly with a conversion). Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians 12
  • 13. Scalars Among scalars, there is no specific kind of type checking! Scalars can have three different kinds of meanings: 1. Numbers 2. Strings 3. References Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians 13
  • 14. Numbers There is no explicit distinction between integer and floating point numbers, this is handled implicitly Examples: $num = 3; $pi = 3.141592654; $mio = 1.e6; Arithmetic operators: +, −, ∗, /, %, may be used in conjunction with assignments, i.e. + =, − =, ∗ =, / =, % = Increment/decrement operators: ++, −− Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians 14
  • 15. String Constants Double quotes or single quotes may be used around strings: ’Hello world’ "Hello world" The difference is that strings with single quotes are taken lit- erally. Strings with double quotes are interpreted, i.e. variable names and special characters are translated first Common operator: concatenation operator ., may also be used in conjunction with assignments .= Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians 15
  • 16. Arrays Arrays need not have a predefined size Assignment operators work on individual elements (scalars) or with whole lists, e.g. @list = ("Zicke", "Zacke", 123); The index starts with 0 Memory allocation is done on demand; Be aware: when you first create the fifth element, the whole list from the first to the fifth element is allocated (i.e. elements 0. . . 4) Accessing single elements: $elem = $list[2]; Accessing multiple elements is also possible, e.g. @list[0,1], @list[1..3] Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians 16
  • 17. Arrays (cont’d) Assignments also work in the following way: ($firstelem, @remaining) = @list; Special operators: Assignment to scalar gives number of elements, e.g. $no = @list; Index of last element: $index = $#list; $elem = $list[$#list]; Note that lists are always flattened, e.g. @list = ("a", "b", ("c", "d")); is the same as @list = ("a", "b", "c", "d"); Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians 17
  • 18. Arrays: Stacks and Queues Special functions for adding/removing elements from arrays: push appends (a) new element(s) at the end of the array, e.g. push(@list, "tralala"); push(@list, ("hui", 1)); pop returns the last element and removes it from the list, e.g. $elem = pop(@list); shift returns the first element and removes it from the $elem = shift(@list); unshift inserts (a) new element(s) at the beginning of an array unshift(@list, "tralala"); unshift(@list, ("hui", 1)); “Killing” an array: @list = (); Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians 18
  • 19. Arrays: The splice Function The splice function allows to remove a part of an array or to replace it by another list. Example with four arguments: my @list = ("u", "v", "w", "x", "y", "z", "0", "1", "2"); splice(@list, 3, 4, ("a", "b")); removes the four elements from no. 3 on (i.e. elements [3..6]) and replaces them by two elements, "a" and "b". Finally, @list is (“u”, “v”, “w”, “a”, “b”, “1”, “2”). Example with three arguments: my @list = ("u", "v", "w", "x", "y", "z", "0", "1", "2"); splice(@list, 3, 4); removes the four elements from no. 3 onwards. Finally, @list is (“u”, “v”, “w”, “1”, “2”). Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians 19
  • 20. Arrays: The splice Function (cont’d) Example with two arguments: my @list = ("u", "v", "w", "x", "y", "z", "0", "1", "2"); splice(@list, 3); removes all elements from no. 3 onwards. Finally, @list is (“u”, “v”, “w”). A negative offset −i as second argument means to start from the i-th to the last element. See also http://perldoc.perl.org/functions/splice.html Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians 20
  • 21. Hashes Like arrays, hashes are collections of scalars, however, with the dif- ference that they are not ordered; individual elements can be ac- cessed by arbitrary scalars, so-called keys Assignment operators work on individual elements (scalars), e.g., $color{"apple"} = "red"; or with whole lists, e.g. %color = ("apple", "red", "banana", "yellow"); which is equivalent to the more readable %color = (apple => "red", banana => "yellow"); Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians 21
  • 22. Hashes (cont’d) Memory allocation is done on demand Special functions: List of keys: @keylist = keys %color; List of values: @vallist = values %color; Deleting a hash entry: delete $color{’apple’}; Checking whether a hash entry exists: exists $color{’apple’} Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians 22
  • 23. Control Structures: if and unless Single if: if ( expression ) { ... } unless (expression negated): unless ( expression ) { ... } Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians 23
  • 24. Control Structures: if and unless (cont’d) Note that the curly braces are obligatory; conditional statements, however, can also be written in single lines: ... if ( expression ); ... unless ( expression ); Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians 24
  • 25. Control Structures: if/else if/else if ( expression ) { ... } else { ... } Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians 25
  • 26. Control Structures: if/elsif/else if/elsif/else: if ( expression1 ) { ... } elsif ( expression2 ) { ... } else { ... } Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians 26
  • 27. Control Structures: while and until while: while ( expression ) { ... } until (expression negated): until ( expression ) { ... } Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians 27
  • 28. Control Structures: while and until (cont’d) Note that the curly braces are obligatory; while and until loops, how- ever, can also be written in single lines: ... while ( expression ); ... until ( expression ); Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians 28
  • 29. Control Structures: for and foreach (1/3) for (the same as in C): for ( init ; condition ; increment ) { ... } foreach: foreach $variable (@list) { ... } Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians 29
  • 30. Control Structures: for and foreach (2/3) foreach with fixed list: foreach $key ("tralala", 3, 2, 1, 0) { ... } Simple repetions: foreach (1..15) { ... } Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians 30
  • 31. Control Structures: for and foreach (3/3) Note that the curly braces are obligatory; foreach loops, how- ever, can also be written in single lines: ... foreach $variable (@list)); Not allowed for for loops! Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians 31
  • 32. Control Structures: Miscellaneous next stops execution of loop body and moves to the condition again (like continue in C) last stops execution of loop (like break in C) redo stops execution of loop body and executes the loop body again without checking the condition again The previous three commands apply to the innermost loop by default; this can be altered using labels A goto statement is available as well (deprecated!), also next and last can be abused in similar ways A continue block can be added to while, until and foreach loops Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians 32
  • 33. Expressions Logical operators: &&, ||, ! Numeric comparison: ==, !=, <, >, <=, >= String comparison: eq, ne, lt, gt, le, ge Truth and falsehood: 0, ’0’ , ”, () and undef are interpreted as false if they are the result of the evaluation of an expression, all other values are interpreted as true Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians 33
  • 34. Special Variables $_: default input; many (if not most) built-in Perl functions use this vari- able as default input if no argument is specified; also used as default variable in foreach loops @_: list of input arguments in sub-routines (see later) @ARGV: list of command line arguments; NOTE: unlike in C, $ARGV[0] is the first command line argument, not the program name $0: the name of the program being executed (like argv[0] in C) $1, $2, . . . : pattern matches (see later) Note that there are a whole lot more special variables, but they are not important for us at this point. Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians 34
  • 35. Control Structures: sub-routines Subroutines are simply written as follows: sub name { ... } Recursions are allowed Note that there is no typechecking, not even the number of argu- ments needs to be fixed Arguments are passed through the special list @_ Return values are optional and need to be passed with return Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians 35
  • 36. Control Structures: sub-routines (cont’d) Calls of sub-routines can be prefixed with & Arguments need to be separated by commas and can (but need not!) be embraced with parentheses; however, the use of parentheses is highly recommended Like in C, Perl uses call by value Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians 36
  • 37. Control Structures: sub-routines example #!/usr/bin/perl -w use strict; sub divide { my ($enumerator,$denominator) = @_; return $enumerator/$denominator; } ... my $quotient = &divide(4, 2); ... Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians 37
  • 38. Variable Scoping Declaring/using a variable in the main program outside any block cre- ates a global variable Declarations inside a sub-routine create a local variable Declaring/using a variable in a block creates a temporary variable Use use strict to enforce declarations Try to use temporary/local variables where possible and avoid the use of global variables where possible Local variables are destroyed/de-allocated as soon as the execution of their scope (block/sub-routine) is finished (with the only exception if there are references to this local variable) Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians 38
  • 39. Contexts In Perl, contexts determine how certain expressions are interpreted Scalar contexts: Numerical context String context Boolean context List context Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians 39
  • 40. References A reference is a scalar that “points” to a certain variable (scalar, list or hash) A reference is created by prefixing a variable with a backslash “”, e.g. $ref_to_scalar = $scalar; $ref_to_array = @array; $ref_to_hash = %hash; De-referencing, i.e. getting back the value, is done by prefixing the reference with the appropriate prefix, e.g. $scalar = ${$ref_to_scalar}; @array = @{$ref_to_array}; %hash = %{$ref_to_hash}; Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians 40
  • 41. References (cont’d) The curly braces may also be omitted when de-referencing References are useful to efficiently pass large arguments to sub- routines Thereby, call by reference is realized Accessing items in de-referenced arrays and hashes may be clumsy, therefore ${$ref_to_array}[2] and ${$ref_to_hash}{"key"} may also be written as $ref_to_array->[2] and $ref_to_hash->{"key"} Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians 41
  • 42. References (cont’d) The curly braces may also be omitted when de-referencing References are useful to efficiently pass large arguments to sub- routines Thereby, call by reference is realized Accessing items in de-referenced arrays and hashes may be clumsy, therefore ${$ref_to_array}[2] and ${$ref_to_hash}{"key"} may also be written as $ref_to_array->[2] and $ref_to_hash->{"key"} Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians 42
  • 43. Matrices Brackets create a reference to an anonymous array; by this trick, matrices can be realized easily, e.g. $array[0] = [1, 2, 3]; $array[1] = [4, 5, 6]; $array[2] = [7, 8, 9]; print "$array[1][2]"; # prints 6 More easily: @array = ([1, 2, 3], [4, 5, 6], [7, 8, 9]); This also works without a one-for-all assignment, only by assigning values to individual elements Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians 43
  • 44. Hashes of Hashes Curly braces create a reference to an anonymous hash; by this trick, hashes of hashes can be realized relatively easily, e.g. %hash = (j => {a=>1, b=>2, c=>3}, k => {d=>6, e=>5, f=>10}); This also works without a one-for-all assignment, only by assigning values to individual elements, e.g. $hash{j}{a} = 1; $hash{k}{e} = 5; Note that, in such a case, $hash{j} is not a hash, but a reference; therefore, something like keys $hash{j} will not work. To get the keys of the entries in the second level, one has to use something like keys %{$hash{j}} Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians 44
  • 45. A Note on the Lifetime of Local Variables Usually, local variables are destroyed/de-allocated as soon as the execution of their scope (block/sub-routine) is finished This means that references to local variables would point to nirvana after their scope’s execution is finished That is not how it works; instead, Perl uses a reference counter for each variable. A local variable remains existing until there is no reference pointing to it anymore. Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians 45
  • 46. A Note on the Lifetime of Local Variables (cont’d) Consider the following example: my @matrix; for(my $i = 0; $i < 10; $i++) { my @line = (0..9); $matrix[$i] = @line; } The full 10 × 10 matrix can be used safely, even if the array’s lines are lists created as local variables. Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians 46
  • 47. File/Console IO (1/5) Similar to other programming languages, Perl uses file handles The following file handles are open and usable by default: STDIN, STDOUT, STDERR The open function is used to open a file handle: open(INPUT, "< $filename1"); # open for reading open(OUTPUT, "> $filename2"); # write new content open(LOGFILE, ">> $filename2"); # append open returns 0 on failure and a non-zero value on success Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians 47
  • 48. File/Console IO (2/5) File handles are closed with close, e.g. close(INPUT); close(OUTPUT); close(LOGFILE); close returns 0 on failure and a non-zero value on success open and close can also be used to handle pipes, e.g. open(INPUT, "ls -al |"); # get directory listing open(OUTPUT, "| more"); # list output by page Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians 48
  • 49. File/Console IO (3/5) The print function can be used to write to a file, e.g. print OUTPUT "Tralala"; Note the missing comma after the file handle! In scalar context, the input operator <> reads one line from the spec- ified file handle, e.g. $line = <INPUT>; reads one line from the file handle INPUT, whereas $line = <STDIN>; reads one line from the console input. Note that, in the above examples, $line still contains the trailing newline character; it can be removed with the chomp function, e.g. chomp $line; Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians 49
  • 50. File/Console IO (4/5) In list context, the input operator <> reads the whole file at once, e.g. @whole = <INPUT>; reads the whole file INPUT line by line, where each line is one item in the list @whole is one line All lines in @whole still contain the the trailing newline characters; they can be removed with the chomp function, e.g. chomp @whole; removes the trailing newline character from all lines in @whole This way of reading a file may be very comfortable. Note, however, that it requires reading the whole file into the memory at once, which may be infeasible for larger files Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians 50
  • 51. File/Console IO (5/5) If the input operator <> is used without specifying a variable, the next line is placed in the default variable $_ Of course, $_ still contains the trailing newline character; it can be removed with the chomp function, this time without any arguments (because chomp takes $_ by default anyway) Example of a program fragment that reads input from the console input line by line: while (<STDIN>) { ... # $_ contains last input line with newline chomp; ... # $_ contains last input line without newline } Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians 51
  • 52. Regular Expressions: Basics (1/3) Regular expressions are an amazingly powerful tool for string pattern matching and replacement Regular expressions are usual in some other software products (par- ticularly in the UNIX world), but Perl offers one of the most advanced and powerful variants Simple regular expressions are embraced with slashes Regular expression are most often used in conjunction with the two operators =˜ and !˜; the former checks whether a pattern is found and the latter checks whether the pattern is not present, e.g. (’Hello world’ =˜ /Hell/) # evaluates to true (’Hello world’ !˜ /Hell/) # evaluates to false Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians 52
  • 53. Regular Expressions: Basics (2/3) The examples on the previous slide show how to search for certain constant string parts (“Hell” in these examples) The search pattern can also (partly) be a variable Search patterns, however, need not be string constants; there is a host of meta-characters for constructing more advanced searches: {}[]()ˆ$.|*+? Using meta-characters as ordinary search expressions re- quires prefixing them with a backslash Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians 53
  • 54. Regular Expressions: Basics (3/3) A regular expression can also be written synonymously as the search operator m//, e.g. /world/ and m/world/ mean the same thing. The m// operator has the advantage that other characters for embracing the regular expression can also be used (instead of only slashes), e.g. m!!, m{} (note the braces!) Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians 54
  • 55. Regular Expressions: Matching Beginnings and Ends The ˆ meta-character is used to require that the string begins with the search pattern, e.g. (’Hello world’ =˜ /ˆHell/) # matches (’Hello world’ =˜ /ˆworld/) # does not match The $ meta-character is used to require that the string ends with the search pattern, e.g. (’Hello world’ =˜ /Hell$/) # does not match (’Hello world’ =˜ /rld$/) # matches The two meta-characters ˆ and $ can also be used together Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians 55
  • 56. Regular Expressions: Character Classes Character classes can be defined between brackets; as a simple ex- ample, /[bcr]at/ matches “bat”, “cat” and “rat” Character classes may also be ranges in the present character cod- ing (e.g. ASCII), e.g. /index[0-2]/ matches “index0”, “index1” and “index2”; /[0-9a-fA-F]/ matches a hexadecimal digit Using the meta-character ˆ in the first place of a character class means negation, e.g. /[ˆ0-9]/ matches all non-numerical charac- ters Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians 56
  • 57. Regular Expressions: Predefined Character Classes d: numerical character, short for [0-9] D: non-numerical character, short for [ˆ0-9], equivalently [ˆd] s: whitespace character, short for [ trnf] S: non-whitespace character, short for [ˆ trnf], equivalently [ˆs] w: word character (alphanumeric or _), short for [0-9a-zA-Z_] W: non-word character, short for [ˆ0-9a-zA-Z_], equivalently [ˆw] The period . matches every character except the newline character n b: matches a boundary between a word and a non-word character or be- tween a non-word and word character; this is useful for checking whether a match occurs at the beginning or end of a word Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians 57
  • 58. Regular Expressions: Variants and Grouping The stroke character | is used as the so-called alternation meta- character, e.g. /dog|cat|rat/ matches “dog”, “cat” and “rat”, and so does /dog|[cr]at/ Parentheses that serve as so-called grouping meta-characters can be used for grouping alternatives in parts of the search pattern, e.g. /house(dog|cat)/ matches “housedog” and “housecat” Alternatives may also be empty, e.g. /house(cat|)/ matches “housecat” and “house” Groupings may also be nested, e.g. /house(cat(s|)|)/ matches “housecats”, “housecat” and “house” Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians 58
  • 59. Regular Expressions: Quantifiers So far, we were only able to search only for patterns with a relatively fixed structure, but we were not able to deal with repetitions in a flex- ible way; that is what quantifiers are good for The following quantifiers are available: ? match 1 or 0 times * match any number of times + match at least once {n,m} match at least n and at most m times {n,} match at least n times {n} match exactly n times Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians 59
  • 60. Regular Expressions: Quantifiers (cont’d) Examples: /0x[0-9a-fA-F]+/ matches hexadecimal numbers /[-+]?d+/ matches integer numbers /[-+]?d+.d+/ matches numbers /d{2}.d{2}.d{4}/ matches dates in DD.MM.YYYY format Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians 60
  • 61. Regular Expressions: Extracting Matches (1/3) Parentheses also serve for a different purpose: they allow extracting the relevant part of the string that matched; for that purpose, the special variables $1, $2, etc. are employed More specifically, Perl seeks the first match in the string and puts those parts into the special variables $1, $2, etc. that match Example: after evaluating (’Hello you!’ =˜ /Hello (world|you.)/) $1 has the value “you!” Another example: after evaluating (’AGCTTATATGCATATATAT’ =˜ /T(.T|.A)T(.T|A)/) $1 has the value “TA” and $2 has the value “AT” Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians 61
  • 62. Regular Expressions: Extracting Matches (2/3) Assigning an evaluation of a regular expression to a list stores the matching parts in the list (i.e. the regular expression is evaluated in list context) Example: after @list = (’Hello you!’ =˜ /Hello (world|you.)/) @list has the value (“you!”) and after @list = (’AGCTTATATGCATATATAT’ =˜ /T(.T|.A)T(.T|A)/) @list has the value (“TA”, “AT”) Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians 62
  • 63. Regular Expressions: Extracting Matches (3/3) Note that the extraction of matches discussed until now only con- cerns the first match of the regular expression in the string — ex- tracting all matches is a different story (see later)! Example: after $string = ’red = 0xFF0000, blue = 0x0000FF’; @list = ($string =˜ /0x([0-9a-fA-F]{6})/) @list will have the value (“FF0000”), but there is presently no way to access the second match Further note that quantifiers are greedy in the sense that they try to match as many items as possible; example: after (’<B>My homepage</B>’ =˜ /<(.*)>/) $1 has the value “B>My homepage</B” Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians 63
  • 64. Regular Expressions: Quantifiers Revisited Suffixing quantifiers by “?” makes them non-greedy ?? match 1 or 0 times, try 0 first *? match any number of times, but as few times as possible +? match at least once, but as few times as possible {n,m}? match at least n and at most m times, but as few times as possible {n,}? match at least n times, but as few times as possible {n}? is allowed, but obviously it has the same meaning as {n} Example: after (’<B>My homepage</B>’ =˜ /<(.*?)>/) $1 has the value “B” Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians 64
  • 65. Regular Expressions: Modifiers A modifier can be appended to a regular expression to alter its inter- pretation; the following are the most important modifiers: i case-insensitive matching m treat string as collection of individual lines; interpret n as newline character, ˆ and $ match individual lines s treat string as one line; . meta-character also matches n x whitespaces and comments inside regular expression are not in- terpreted g allow extraction of all occurrences (see later) Example: /a/i matches all strings containing ‘a’ or ‘A’ Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians 65
  • 66. Regular Expressions: Extracting All Matches (1/3) If the g modifier is specified, Perl internally keeps track of the string position If a regular expression is applied to the same string again, the search starts at that position where the last search stopped This can be repeated until the last search has been found Before the regular expression is evaluated first, the string position is undef If the string is changed between the regular expression is applied, the position is also set to undef In scalar context, a regular expression with g modifier returns false if the pattern has not been found and true if it has been found Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians 66
  • 67. Regular Expressions: Extracting All Matches (2/3) Example: while ($string =˜ /0x([0-9a-fA-F]+)/g) { print "$1n"; } This loop extracts all hecadecimal numbers from $string and prints them line by line without the “0x” prefix For special purposes, the actual position can be determined with the function pos; however, note that, after each occurrence, the position points to the next character after the previous match (if counting of characters starts at 0)! Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians 67
  • 68. Regular Expressions: Extracting All Matches (3/3) In list context, the g modifier can be used to extract all matches and store them in a list If there are no groupings, all matches of the whole regular expression are placed in the list If there are nested groupings, all matches of all groupings are placed in the list in the order in which they are matched, where matches of outer groups precede matches of inner groups Example: after @list = ("AGC GAT TGA GAG" =˜ /(G(A(T|G)))/g); @list has the value (“GAT”, “AT”, “T”, ‘GAG”, “AG”, “G”) Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians 68
  • 69. Regular Expressions: Replacements Regular expressions also facilitate powerful replacement mechanisms; this is accomplished with the s/// operator Example: after $sequence = "AGCGTAGTATAGAG"; $sequence =˜ s/T/U/; $sequence has the value “AGCGUAGTATAGAG” The s/// operator returns the number of replacements Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians 69
  • 70. Regular Expressions: Replacements (cont’d) Modifiers as introduced above work analogously; not surprisingly, the g modifiers allows to replace all occurrences of the search string, e.g. after $sequence = "AGCGTAGTATAGAG"; $sequence =˜ s/T/U/g; $sequence has the value “AGCGUAGUAUAGAG” The special variables $1, $2, $3, etc. allow very tricky replacements, e.g. with s/(d{2}).(d{2}).(d{4})/$3-$2-$1/g one can convert all dates from DD.MM.YYYY format to YYYY-MM- DD format Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians 70
  • 71. Regular Expressions: Transliteral Replacements The operators tr/// and y/// (both are equivalent) are available to perform so-called transliteral replacements, i.e. the translation of single characters according to a replacement list Example: tr/AB/BC/, at once, replaces all “A”s with “B”s and all “B”s with “C”s Note that this functionality cannot be realized easily with the replace- ment operator s/// It is also possible to specify ranges, e.g. tr/a-f/0-5/ Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians 71
  • 72. Regular Expressions: The split Function The highly useful split function can be used to split a string into parts that are separated by certain characters or patterns; it returns a list of split strings Example: after @fragments = split(/TGA/, "GCATGACGATGATATA"); @fragments has the value (“GCA”, “CGA”, “TATA”) As obvious from the above example, the first argument is a regular expression at which the string is split; note that the split pattern is omitted in the split list Nor surprisingly, there is no restriction to fixed search patterns, e.g. split(/s+/, $string); splits $string into single words Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians 72
  • 73. Regular Expressions: The split Function (cont’d) The split function has an optional third argument with which the max- imal number of splits can be controlled Example: after @fragments = split(/TGA/, "GCATGACGATGATATA", 2); @fragments has the value (“GCA”, “CGATGATATA”) The join function is the converse function, i.e. it assembles a list of strings into one large string, where a separating character can be inserted; e.g. after @list = ("tri", "tra", "tralala"); $string = join(’:’, @list); $string has the value “tri:tra:tralala”. Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians 73
  • 74. Useful Functions Not Previously Mentioned abs, atan2, cos, exp, log, sin, sqrt: usual mathematical functions int: converts a floating point number to an integer number (truncates!) printf: allows more flexibility for formatted output than print length: get the length of a string reverse: reverse a string or array sort: sort an array system: run external program time: get system time (mostly the number of seconds since Jan 1, 1970) localtime: convert system time into the actual local time Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians 74
  • 75. Final Remarks The present slides are only an overview tutorial that concen- trates on elements of Perl that are useful for small bioinformat- ics applications Perl actually offers a lot more possibilities For more details, discover the world of Perl at http://www.perl.org/ Theory is good, but not sufficient — a programming language can only be learned by experience Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians 75
  • 76. References and Further Reading 1. Robert Kirrily: perlintro — a brief introduction and overview of Perl http://perldoc.perl.org/perlintro.html 2. perlsyn — Perl syntax (author unknown) http://perldoc.perl.org/perlsyn.html 3. Mark Kvale: perlretut — Perl regular expressions tutorial http://perldoc.perl.org/perlretut.html 4. Ian Truskett: perlreref — Perl regular expressions reference http://perldoc.perl.org/perlreref.html 5. Perl Functions A–Z (multiple unknown authors) http://perldoc.perl.org/index-functions.html Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians 76
  • 77. References and Further Reading (cont’d) 1. Rex A. Dwyer: Genomic Perl: From Bioinformatics Basics to Working Code. Cambridge University Press, 2003. 2. Martin Kästner: Perl fürs Web. Galileo Press, Bonn, 2003. Ulrich Bodenhofer. Perl: A Short Introduction for Bioinformaticians 77