Introduction to Perl/BioPerl
Presented by: Jennifer Dommer, Vivek Gopalan
Bioinformatics Developer
Bioinformatics and Computational Biosciences Branch
National Institute of Allergy and Infectious Diseases
Office of Cyber Infrastructure and Computational Biology
Who Are We?
•  Bioinformatics and Computational Biosciences Branch (BCBB)
•  NIH/NIAID/OD/OSMO/OCICB/BCBB
•  group of 28 people
•  http://bioinformatics.niaid.nih.gov
•  scienceApps@niaid.nih.gov
Outline
•  Introduction
•  Perl programming principles
o Variables
o Flow controls/Loops
o File manipulation
o Regular expressions
•  BioPerl
o Introduction
o SeqIO
o SearchIO
Introduction
•  PERL – Practical Extraction and Report Language
•  An interpreted programming language created in
1987 by Larry Wall
•  Good at processing and transforming plain text,
like GenBank or PDB files
•  Not strongly typed – variables don't require a type and need not be declared in advance, unlike in C- or Java-like languages
•  Extensible – has a large and active user base that is constantly adding new libraries
•  Portable – runs on Windows, Mac, and Linux/Unix
Introduction
"Perl is remarkably good for slicing, dicing, twisting, wringing, smoothing,
summarizing and otherwise mangling text. Although the biological sciences
do involve a good deal of numeric analysis now, most of the primary data is
still text: clone names, annotations, comments, bibliographic references.
Even DNA sequences are textlike. Interconverting incompatible data
formats is a matter of text mangling combined with some creative
guesswork. Perl's powerful regular expression matching and string
manipulation operators simplify this job in a way that isn't equalled by any
other modern language."
Getting Perl
•  Latest version – 5.12.3
•  http://www.perl.org/
Getting Help
•  perl -v
•  Perl manual pages
•  Books and Documentation:
•  http://www.perl.org/docs.html
•  The O’Reilly Books: Programming Perl, etc.
•  http://www.cpan.org
•  http://perldoc.perl.org/perlintro.html
•  BCBB – for help writing your custom scripts
perldoc perl
perldoc perlintro
"Hello world" script
•  hello_world.pl file
#!/usr/bin/perl
# This is a comment
print "Hello world\n";
•  The shebang line must be the first line. It tells the computer where to find Perl.
•  print is a Perl function name
•  Double quotes are used for strings
•  A semicolon must end every statement in Perl
•  Run hello_world.pl
>perl hello_world.pl
Hello world
•  Or run the same code as a one-liner
>perl -e 'print "Hello world\n";'
Hello world
Basic Programming Concepts
•  Variables
o  Scalars
o  Arrays
o  Hashes
•  Flow Control
o  if/else
o  unless
•  Loops
o  for
o  foreach
o  while
o  until
•  Files
•  Regexes
Variables
•  In computer programming, a variable is a symbolic name used to refer to a value – Wikipedia
•  Variable names can contain letters, digits, and _, but cannot begin with a digit
o  Examples
  $x = 4;
  $y = 1.0;
  $name = 'Bob';
  $seq = "ACTGTTGTAAGC";
•  Perl treats integers and floating point numbers alike as numbers, so $x and $y can be used together in an equation.
•  Strings are indicated by either single or double quotes.
Perl Variables
•  Scalar
•  Array
•  Hash
Variables - Scalar
•  Can store a single string, or number
•  Begins with a $
•  Single or double quotes for strings; only double quotes interpolate variables
  my $x = 4;
  my $name = 'Bob';
  my $seq = "ACTGTTGTAAGC";
  print "My name is $name.";
  #prints My name is Bob.
Scalar Operators
Source: http://perldoc.perl.org/perlintro.html
Arithmetic
+    addition
-    subtraction
*    multiplication
/    division
++   increment (by one)
--   decrement (by one)
+=   increment (by value)
-=   decrement (by value)
Numeric Comparison
==   equality
!=   inequality
<    less than
>    greater than
<=   less than or equal
>=   greater than or equal
String Comparison
eq   equality
ne   inequality
lt   less than
gt   greater than
le   less than or equal
ge   greater than or equal
Boolean Logic
&&   and
||   or
!    not
Miscellaneous
=    assignment
.    string concatenation
..   range operator
Common Scalar Functions
Function Name   Description
length          Length of the scalar value
lc              Lower case value
uc              Upper case value
reverse         Returns the string in the opposite order (in scalar context)
substr          Returns a substring
chomp           Removes the trailing newline (\n) character
chop            Removes the last character
defined         Checks whether the scalar has a defined value
split           Splits a scalar into a list
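The functions in the table above can be tried on a short sequence. This is a minimal sketch rather than part of the original slides; the sequence is borrowed from the Variables slide, and the other variable names are chosen only for illustration.

```perl
use strict;
use warnings;

my $seq = "ACTGTTGTAAGC";             # sequence from the Variables slide

my $len      = length($seq);          # number of characters
my $lower    = lc($seq);              # lower-case copy
my $codon    = substr($seq, 0, 3);    # offset 0, length 3
my $reversed = reverse($seq);         # scalar assignment, so the string is reversed

my $line = "ACTG\n";
chomp($line);                         # removes the trailing newline only
my $chopped = $line;
chop($chopped);                       # removes the last character, whatever it is

my $unset;                            # declared but never assigned
my $is_defined = defined($unset) ? "yes" : "no";

print "length=$len lower=$lower codon=$codon reversed=$reversed\n";
print "chomped=$line chopped=$chopped defined=$is_defined\n";
```

Note that chomp and chop change the variable in place, whereas lc, uc, substr and reverse return a new value and leave the original untouched.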
Common Scalar Functions Examples
my $string = "This string has a newline.\n";
chomp $string;
print $string;
#prints "This string has a newline."
my @array = split(" ", $string);
#@array is now ("This", "string", "has", "a", "newline.")
Array
•  Stores a list of scalar values (strings or numbers)
•  Zero based index
Index:  0      1         2      3        4
Value:  Vivek  Jennifer  Jason  Darrell  Qina
Variables - Array
•  Begins with @
•  Use parentheses () for creating
•  Use the $ and square brackets [] for retrieving a single element in the array
my @grades = (75, 80, 35);
my @mixnmatch = (5, "A", 4.5);
my @names = ("Bob", "Vivek", "Jane");
# zero-based index
my $first_name = $names[0];
# $#names is the index of the last element, so this retrieves the last item
my $last_name = $names[$#names];
Common Array Functions
Function Name   Description
scalar          Size of the array
push            Adds a value to the end of an array
pop             Removes the last element from an array
shift           Removes the first element from an array
unshift         Adds a value to the beginning of an array
join            Joins the array elements into a single string
splice          Removes or replaces a specified range of elements in an array
grep            Returns the array elements that match a condition
sort            Orders array elements
push/pop
@names = Tim Molly Betty Chris
push(@names, "Charles");
@names = Tim Molly Betty Chris Charles
pop(@names);
@names = Tim Molly Betty Chris
shift/unshift
@names = Tim Molly Betty Chris
unshift(@names, "Charles");
@names = Charles Tim Molly Betty Chris
shift(@names);
@names = Tim Molly Betty Chris
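The remaining functions from the Common Array Functions table (scalar, join, sort, grep, splice) are not demonstrated on the slides. The following is a small sketch using the same four names; the variable names are illustrative only.

```perl
use strict;
use warnings;

my @names = ("Tim", "Molly", "Betty", "Chris");

my $count  = scalar(@names);          # number of elements
my $joined = join(", ", @names);      # one string, elements separated by ", "
my @sorted = sort @names;             # default sort is alphabetical (string order)
my @with_t = grep { /t/i } @names;    # keep names containing "t" or "T"

# splice(ARRAY, OFFSET, LENGTH) removes LENGTH elements starting at OFFSET
# and returns them; @names itself is shortened.
my @removed = splice(@names, 1, 2);

print "count:   $count\n";
print "joined:  $joined\n";
print "sorted:  @sorted\n";
print "with t:  @with_t\n";
print "removed: @removed\n";
print "left:    @names\n";
```

Only splice modifies the array here; join, sort and grep return new values and leave @names unchanged.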
Variables - Hashes
KEYS VALUES
Title Programming Perl, 3rd Edition
Publisher O’Reilly Media
ISBN 978-0-596-00027-1
Variables - Hash
•  Stores data using key, value pairs
•  Indicated with %
•  Use parentheses () for creating
•  Use the $ and curly braces {} for setting or retrieving a single element from the hash
my %book_info = (
title =>"Perl for bioinformatics",
author => "James Tisdall",
pages => 270,
price => 40
);
$book_info{"author"};
#returns "James Tisdall"
Variables - Hash
•  Retrieving a single value or all the keys/values
•  NOTE: Keys and values are unordered
my $book_title = $book_info{"title"};
#refers to "Perl for bioinformatics"
my @book_attributes = keys %book_info;
my @book_attribute_values = values %book_info;
Common Hash Functions
Function Name Description
keys            Returns a list of the keys
values          Returns a list of the values
reverse         Swaps keys and values in the hash
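A short sketch of these hash functions on the %book_info hash from the earlier slide. The exists and delete functions are not covered on the slides; they are standard Perl built-ins added here as an assumption about what a reader is likely to need next.

```perl
use strict;
use warnings;

my %book_info = (
    title  => "Perl for bioinformatics",
    author => "James Tisdall",
    pages  => 270,
    price  => 40,
);

# keys come back in no particular order, so sort them for stable output
my @sorted_keys = sort keys %book_info;

# reverse swaps keys and values; this is only safe when the values are unique
my %by_value    = reverse %book_info;
my $key_for_270 = $by_value{270};

# exists tests for a key, delete removes a key/value pair
my $had_price = exists $book_info{price} ? 1 : 0;
delete $book_info{price};
my $has_price = exists $book_info{price} ? 1 : 0;

print "keys: @sorted_keys\n";
print "270 belongs to key: $key_for_270\n";
print "price present before delete: $had_price, after: $has_price\n";
```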
Variables summary
# A. Scalar variable
my $first_name = "vivek";
my $last_name = "gopalan";
# B. Array variable
# use parentheses and the @ symbol for assignment
my @personal_info = ("vivek", $last_name);
# use square brackets and the integer index to access an entry
my $fname = $personal_info[0];
# C. Hash variable
# use parentheses (similar to array) and the % symbol for assignment
my %personal_info = (
    first_name => "vivek",
    last_name => "gopalan"
);
# use curly braces to access a single entry
my $fname1 = $personal_info{first_name};
Tutorial 1
•  Create a variable with the following sequence:
ILE GLY GLY ASN ALA GLN ALA THR ALA ALA ASN SER ILE
ALA LEU GLY SER GLY ALA THR THR
•  print in lowercase
•  split into an array
•  print the array
•  print the first value in the array
•  shift the first value off the array and store it in a variable
•  print the variable and the array
•  push the variable onto the end of the array
•  print the array
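One possible way to work through the Tutorial 1 steps is sketched below. It is not the presenters' solution, and the variable names are arbitrary.

```perl
use strict;
use warnings;

# Create a variable with the sequence
my $sequence = "ILE GLY GLY ASN ALA GLN ALA THR ALA ALA ASN SER ILE "
             . "ALA LEU GLY SER GLY ALA THR THR";

# Print in lowercase
print lc($sequence), "\n";

# Split into an array and print it (an array inside double quotes
# is printed with spaces between the elements)
my @residues = split(" ", $sequence);
print "@residues\n";

# Print the first value in the array
print "$residues[0]\n";

# Shift the first value off the array and store it in a variable
my $first = shift(@residues);
print "$first\n";
print "@residues\n";

# Push the variable onto the end of the array and print again
push(@residues, $first);
print "@residues\n";
```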
Flow Controls
•  if/elsif/else
•  unless
  my $x = 4;
  if ($x > 4) {
  print "I am greater than 4";
  }elsif ($x == 4) {
  print "I am equal to 4";
  }else {
  print "I am less than 4";
  }
  unless($x > 4) {
  print "I am not greater than 4";
  }
Post-condition
# the traditional way
if ($x == 4) {
    print "I am 4.";
}
# the first line below is equivalent to the if statement above,
# but this form can only be used with a one-line action
print "I am 4." if ( $x == 4 );
print "I am not 4." unless ( $x == 4 );
Loops
•  for
•  foreach
  for ( my $x = 0; $x < 4; $x++ ) {
      print "$x\n";
  }
  my @names = ("Bob", "Vivek", "Jane");
  foreach my $name (@names) {
      print "My name is $name.\n";
  }
  #prints:
  #My name is Bob.
  #My name is Vivek.
  #My name is Jane.
Hashes with foreach
my %book_info = (
    title => "Perl for Bioinformatics",
    author => "James Tisdall");
  foreach my $key (keys %book_info) {
      print "$key : $book_info{$key}\n";
  }
  #prints (the order of the keys is not guaranteed):
  #title : Perl for Bioinformatics
  #author : James Tisdall
Loops - continued
•  while
•  until
  my $x = 0;
  while ($x < 4) {
      print "$x\n";
      $x++;
  }
  $x = 0;
  until ($x >= 4) {
      print "$x\n";
      $x++;
  }
Tutorial 2
•  iterate through the array
•  print everything unless ILE
•  use a hash to count how many times each AA
occurs
•  iterate through the hash
•  print the counts
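A possible sketch for Tutorial 2, assuming the array is the one built from the Tutorial 1 sequence. The names @residues and %count are illustrative, not taken from the slides.

```perl
use strict;
use warnings;

my @residues = split(" ",
    "ILE GLY GLY ASN ALA GLN ALA THR ALA ALA ASN SER ILE "
  . "ALA LEU GLY SER GLY ALA THR THR");

my %count;    # amino acid name => number of occurrences

# Iterate through the array: print everything unless it is ILE,
# and count every amino acid (including ILE) in the hash
foreach my $aa (@residues) {
    print "$aa\n" unless ($aa eq "ILE");
    $count{$aa}++;
}

# Iterate through the hash and print the counts; the keys are sorted
# because a hash has no inherent order
foreach my $aa (sort keys %count) {
    print "$aa : $count{$aa}\n";
}
```

Incrementing $count{$aa} works even the first time a key is seen, because Perl treats the missing value as 0 for the ++ operator.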
Files
•  Existence
o  if(-e $file)
•  Open
o  Read - open(FILE, "< $file");
o  New - open(FILE, "> $file");
o  Append - open(FILE, ">> $file");
•  Read
o  while(<FILE>)
•  Write
o  print FILE $string;
•  Close
o  close(FILE)
Directory
•  Existence
o  if(-d $directory)
•  Open
o  opendir(DIR, "$directory")
•  Read
o  readdir(DIR)
•  Close
o  closedir(DIR)
•  Create
o  mkdir($directory) unless (-d $directory)
# A. Reading file
# create a variable that can tell the program where to find your data
my $file = "/User/Vivek/Documents/perlTutorials/myFile.txt";
# Check if file exists and read through it
if(-e $file){
    open(FILE, "<$file") or die "cannot open file $file: $!";
    while(<FILE>){
        chomp;
        my $line = $_;
        #do something useful here
    }
    close(FILE);
}
# B. Reading directory
my $directory = "/User/Vivek";
if(-d $directory){
    opendir(DIR, $directory) or die "cannot open directory $directory: $!";
    my @files = readdir(DIR);
    closedir(DIR);
    print "$_\n" foreach @files;
}
Notice the special variable $_. When it is used here, it holds the line that was just read from the file.
The array @files will hold the name of every entry in the directory (including . and ..).
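The slides use the two-argument form of open with bareword filehandles such as FILE. A commonly recommended alternative, shown here as a sketch rather than as the presenters' method, is the three-argument form with a lexical filehandle, which keeps the mode separate from the file name. The file name and its contents are hypothetical, and File::Temp is used only so the example can run anywhere without touching real data.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);    # scratch directory, removed at exit
my $file = "$dir/myFile.txt";        # hypothetical file name

# New: ">" creates the file (or overwrites an existing one)
open(my $out, ">", $file) or die "cannot write $file: $!";
print $out "first line\n";
close($out);

# Append: ">>" adds to the end of the file
open($out, ">>", $file) or die "cannot append to $file: $!";
print $out "second line\n";
close($out);

# Read: "<" opens for reading; each pass of the loop reads one line
my @lines;
open(my $in, "<", $file) or die "cannot read $file: $!";
while (my $line = <$in>) {
    chomp $line;
    push(@lines, $line);
}
close($in);

print "$_\n" foreach @lines;
```

Including $! in the die message reports the operating system's reason for the failure, such as a missing file or a permissions problem.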
Regular Expressions (REGEX)
•  "A regular expression ... is a set of pattern matching rules encoded in a string according to certain syntax rules." - Wikipedia
•  Fast and efficient for "fuzzy" matches
•  Example - Find all sequences from human
o  $seq_name =~ /(human|Homo sapiens)/i;
Beginning Perl for Bioinformatics - James Tisdall
Simple Examples
my $protein = "MET SER ASN ASN THR SER";
$protein =~ s/SER/THR/g;
print $protein;
#prints "MET THR ASN ASN THR THR";
$protein =~ m/asn/i;
#will match ASN
Regular Expressions (REGEX)
Symbol      Meaning
.           Match any one character (except newline)
^           Match at beginning of string
$           Match at end of string
\n          Match the newline
\t          Match a tab
\s          Match any whitespace character
\w          Match any word character (alphanumeric plus "_")
\W          Match any non-word character
\d          Match any digit character
[A-Za-z]    Match any letter
[0-9]       Same as \d
my $string = "See also xyz";
$string =~ /See also ./;
#matches "See also x"
$string =~ /^./;
#matches "S"
$string =~ /.$/;
#matches "z"
$string =~ /\w\s\w/;
#matches "e a"
Regular Expressions (REGEX)
Quantifier   Meaning
*            Match 0 or more times
+            Match 1 or more times
?            Match 0 or 1 times
*?           Match 0 or more times (minimal)
+?           Match 1 or more times (minimal)
??           Match 0 or 1 time (minimal)
{COUNT}      Match exactly COUNT times
{MIN,}       Match at least MIN times (maximal)
{MIN,MAX}    Match at least MIN but not more than MAX times (maximal)
my $string = "See also xyz";
$string =~ /See also .*/;
#matches "See also xyz"
$string =~ /^.*/;
#matches "See also xyz"
$string =~ /.?$/;
#matches "z"
$string =~ /\w+\s+\w+/;
#matches "See also"
REGEX Examples
my $string = ">ref|XP_001882498.1| retrovirus-related pol polyprotein " .
    "[Laccaria bicolor S238N-H82]";
$string =~ /\s.*virus/;
#will match " retrovirus" (including the leading space)
$string =~ /XP_\d+/;
#will match "XP_001882498"
$string =~ /XP_\d/;
#will match "XP_0"
$string =~ /\[.*\]$/;
#will match "[Laccaria bicolor S238N-H82]"
$string =~ /^.*\|/;
#will match ">ref|XP_001882498.1|"
$string =~ /^.*?\|/;
#will match ">ref|"
$string =~ s/\|/:/g;
#string becomes ">ref:XP_001882498.1: retrovirus-related pol polyprotein
#[Laccaria bicolor S238N-H82]"
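The examples above say what each pattern matches, but a script usually needs to get the matched text back. One way to do that, shown here as a sketch that goes slightly beyond the slides, is to wrap parts of the pattern in parentheses; Perl then stores each captured part in $1, $2, and so on. The variable names $accession and $organism are illustrative.

```perl
use strict;
use warnings;

my $string = ">ref|XP_001882498.1| retrovirus-related pol polyprotein "
           . "[Laccaria bicolor S238N-H82]";

my ($accession, $organism);

# \|(XP_\d+\.\d+)\|   captures the accession between the two pipes
# \[(.*)\]$           captures the text inside the final square brackets
if ($string =~ /\|(XP_\d+\.\d+)\|.*\[(.*)\]$/) {
    $accession = $1;
    $organism  = $2;
    print "Accession: $accession\n";
    print "Organism:  $organism\n";
}
else {
    print "The header line did not match the expected pattern\n";
}
```

Checking the match inside an if matters because $1 and $2 are only meaningful after a successful match.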
Tutorial 3
•  open the file example.fa
•  read through the file
•  print the id lines for the human sequences
(NOTE: the ids will start with HS)
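A sketch of one way to approach Tutorial 3. The contents of example.fa are not shown in the slides, so the FASTA records below are hypothetical stand-ins, read through an in-memory filehandle so the example runs without the real file.

```perl
use strict;
use warnings;

# Hypothetical stand-in for example.fa; the ids and sequences are invented.
my $fasta = <<'END_FASTA';
>HS001 hypothetical human sequence
ACTGTTGTAAGC
>MM001 hypothetical mouse sequence
GGCATTACGA
>HS002 another hypothetical human sequence
TTGACCA
END_FASTA

# Opening a reference to a string gives an in-memory filehandle.
# To read the real file, use "example.fa" in place of \$fasta.
open(my $in, "<", \$fasta) or die "cannot open FASTA data: $!";

my @human_ids;
while (my $line = <$in>) {
    chomp $line;
    # id lines start with ">", and human ids start with HS right after it
    if ($line =~ /^>HS/) {
        push(@human_ids, $line);
        print "$line\n";
    }
}
close($in);
```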
Summary of Basics
•  Variables
o  Scalar
o  Array
o  Hash
•  Flow Control
o  if/else
o  unless
•  Loops
o  for
o  foreach
o  while
o  until
•  Files
•  Regexes
Basic BioPerl
•  GenBank file manipulation using Bio::SeqIO
o  Fetch from NCBI
o  Select a subsequence
o  Print to a FASTA file
•  Analyzing BLAST results using Bio::SearchIO
o  Retrieve hits with greater than 75% identity and length greater than 50
BioPerl
•  BioPerl is a collection of Perl libraries for analyzing
biological data.
•  Sequence Analysis, Phylogenetic Analysis,
Protein Structure Analysis, etc.
•  Installation instructions can be found at
www.bioperl.org
•  It is NOT a separate programming language.
Getting BioPerl
•  Installation instructions can be found
at www.bioperl.org
•  Current version 1.6.1
•  Documentation:
o  http://search.cpan.org/~cjfields/BioPerl/
o  http://doc.bioperl.org/
o  use perldoc
BioPerl Notes
•  All of the BioPerl libraries begin with "Bio::"
•  The libraries are grouped by function
•  Align, Phylogeny, DB, Seq, Search, Structure,
etc.
•  All of the parsing libraries end in "IO"
Hello GenBank
#!/usr/bin/perl
use strict;
use warnings;
# Import the BioPerl library
use Bio::DB::GenBank;
# create GenBank download handle
my $gb = Bio::DB::GenBank->new();
# this returns a Seq object via internet connection to GenBank:
my $seq = $gb->get_Seq_by_acc('AF303112');
print "ID: ". $seq->display_id(). "\nSEQ: ". $seq->seq()."\n";
File Handling with Perl
•  Existence
o  if(-e $file)
•  Open
o  Read - open(FILE, "< $file");
o  New - open(FILE, "> $file");
o  Append - open(FILE, ">> $file");
•  Read
o  while(<FILE>)
•  Write
o  print FILE $string;
•  Close
o  close(FILE)
Files With BioPerl
•  Open
o  Read - my $seq_in = Bio::SeqIO->new(
-file => "<$infile",
-format => 'Genbank');
o  New - my $seq_out = Bio::SeqIO->new(
-file => ">$outfile",
-format => 'Genbank');
o  Append - my $seq_out = Bio::SeqIO->new(
-file => ">>$outfile",
-format => 'Genbank');
•  Read
o  while (my $inseq = $seq_in->next_seq())
•  Write
o  $seq_out->write_seq($inseq);
•  NOTE: the -file value must be in double quotes so that $infile or $outfile is interpolated
#!/usr/bin/perl
use strict;
use warnings;
##--------- Divide GB File Based on Species ---------##
use lib "/Users/afniuser/Downloads/BioPerl-1.6.1";
use Bio::SeqIO;
my $infile = "myGenbankFile.gb";
my $inseq = Bio::SeqIO->new(-file => "<$infile", -format => 'Genbank');
# Create the two output files.
my $humanFile = Bio::SeqIO->new(-file => '>human.gb', -format => 'Genbank');
my $otherFile = Bio::SeqIO->new(-file => '>other.gb', -format => 'Genbank');
while(my $seqin = $inseq->next_seq()){
    #here we make use of the Bio::Seq object's species attribute, which
    #returns a Bio::Species object, which has a binomial attribute that
    #holds the species name of the source of the sequence
    # Use a REGEX to decide which file to write the sequence to.
    if($seqin->species()->binomial() =~ m/Homo sapiens/){
        $humanFile->write_seq($seqin);
    }else{
        $otherFile->write_seq($seqin);
    }
}
Bio::SearchIO
•  These objects represent the three components of a BLAST or FASTA pairwise database search result
o  Result - a container object for a given query sequence; there will be a Result for every query sequence in a database search
o  Hit - a container object for each identified sequence found to be similar to the query sequence; it contains HSPs
o  HSP - represents the alignment of the query and hit sequence. For BLAST there can be multiple HSPs, while there is only a single one for FASTA results. The HSP object will contain references to the query and subject alignment start and end.
•  Hierarchy: Result > Hit > HSP
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SearchIO;
my $in = Bio::SearchIO->new(-format => 'blast',
                            -file => 'report.bls');
# We need to look at all of the results, hits, and hsps, so we'll
# use nested while loops.
while(my $result = $in->next_result()){
    ## $result is a Bio::Search::Result::ResultI compliant object
    while(my $hit = $result->next_hit()){
        ## $hit is a Bio::Search::Hit::HitI compliant object
        while(my $hsp = $hit->next_hsp()){
            ## $hsp is a Bio::Search::HSP::HSPI compliant object
            if($hsp->length('total') > 50 && $hsp->percent_identity() >= 75){
                print "Query = ". $result->query_name().
                    ", Hit = ". $hit->name().
                    ", Length = ". $hsp->length('total').
                    ", Percent_id = ". $hsp->percent_identity()."\n";
            }
        }
    }
}
Tutorial 4
•  open the fasta file
•  create two output files in genbank format, one for
human, one for other
•  if the sequence ids start with HS, print to the
human file
•  if the id doesn't start with HS, print to the other
file
Summary
•  Perl
o Variables
o Flow Control
o Loops
o Files
o Regular Expressions
•  BioPerl
o SeqIO
o SearchIO
Contact Us
scienceApps@niaid.nih.gov
http://bioinformatics.niaid.nih.gov

  • 4. Introduction •  PERL – Practical Extraction and Report Language •  An interpreted programming language created in 1987 by Larry Wall •  Good at processing and transforming plain text, like GenBank or PDB files •  Not strongly typed – Variables don’t require a type and are not required to be declared in advance (unless you enable use strict). You can’t do this in C- or Java-like languages. •  Extensible – currently has a large and active user base who are constantly adding new functional libraries •  Portable – runs on Windows, Mac, and Linux/Unix
  • 5. Introduction "Perl is remarkably good for slicing, dicing, twisting, wringing, smoothing, summarizing and otherwise mangling text. Although the biological sciences do involve a good deal of numeric analysis now, most of the primary data is still text: clone names, annotations, comments, bibliographic references. Even DNA sequences are textlike. Interconverting incompatible data formats is a matter of text mangling combined with some creative guesswork. Perl's powerful regular expression matching and string manipulation operators simplify this job in a way that isn't equalled by any other modern language."
  • 6. Getting Perl •  Latest version – 5.12.3 •  http://www.perl.org/
  • 7. Getting Help •  perl -v •  Perl manual pages: perldoc perl, perldoc perlintro •  Books and Documentation: •  http://www.perl.org/docs.html •  The O’Reilly Books: Programming Perl, etc. •  http://www.cpan.org •  http://perldoc.perl.org/perlintro.html •  BCBB – for help writing your custom scripts
  • 8. "Hello world" script
      •  hello_world.pl file
           #!/usr/bin/perl
           # This is a comment
           print "Hello world\n";
      •  The shebang line must be the first line. It tells the computer where to find Perl.
      •  print is a Perl function name
      •  Double quotes are used for Strings
      •  The semi-colon must be present at the end of every command in Perl
  • 9. "Hello world" script
      •  hello_world.pl file
           #!/usr/bin/perl
           # This is a comment
           print "Hello world\n";
      •  Run hello_world.pl
           >perl hello_world.pl
           Hello world
      •  The shebang line must be the first line. It tells the computer where to find Perl.
      •  print is a Perl function name
      •  Double quotes are used for Strings
      •  The semi-colon must be present at the end of every command in Perl
  • 10. "Hello world" script
      •  hello_world.pl file
           #!/usr/bin/perl
           # This is a comment
           print "Hello world\n";
      •  Run hello_world.pl
           >perl hello_world.pl
           Hello world
           >perl -e 'print "Hello world\n";'
           Hello world
      •  The shebang line must be the first line. It tells the computer where to find Perl.
      •  print is a Perl function name
      •  Double quotes are used for Strings
      •  The semi-colon must be present at the end of every command in Perl
  • 11. Basic Programming Concepts •  Variables •  Scalars •  Arrays •  Hashes •  Flow Control •  if/else •  unless •  Loops •  for •  foreach •  while •  until •  Files •  Regexes
  • 12. Variables
      •  In computer programming, a variable is a symbolic name used to refer to a value – Wikipedia
      •  Variable names can contain letters, digits, and _, but cannot begin with a digit
      •  Examples
           $x = 4;
           $y = 1.0;
           $name = 'Bob';
           $seq = "ACTGTTGTAAGC";
      •  Perl will treat integers and floating point numbers as numbers, so $x and $y can be used together in an equation.
      •  Strings are indicated by either single or double quotes.
  • 14. Variables - Scalar
      •  Can store a single string, or number
      •  Begins with a $
      •  Single or double quotes for strings
           my $x = 4;
           my $name = 'Bob';
           my $seq = "ACTGTTGTAAGC";
           print "My name is $name."; #prints My name is Bob.
  • 15. Scalar Operators (http://perldoc.perl.org/perlintro.html)
      •  Arithmetic
           +    addition
           -    subtraction
           *    multiplication
           /    division
           ++   increment (by one)
           --   decrement (by one)
           +=   increment (by value)
           -=   decrement (by value)
      •  Numeric Comparison
           ==   equality
           !=   inequality
           <    less than
           >    greater than
           <=   less than or equal
           >=   greater than or equal
      •  String Comparison
           eq   equality
           ne   inequality
           lt   less than
           gt   greater than
           le   less than or equal
           ge   greater than or equal
      •  Boolean Logic
           &&   and
           ||   or
           !    not
      •  Miscellaneous
           =    assignment
           .    string concatenation
           ..   range operator
  • 16. Common Scalar Functions
           Function Name   Description
           length          Length of the scalar value
           lc              Lower case value
           uc              Upper case value
           reverse         Returns the value in the opposite order
           substr          Returns a substring
           chomp           Removes the trailing newline (\n) character, if present
           chop            Removes the last character
           defined         Checks whether the scalar has a defined value
           split           Splits scalar into array
  • 17. Common Scalar Functions Examples
           my $string = "This string has a newline.\n";
           chomp $string;
           print $string; #prints "This string has a newline."
           my @array = split(" ", $string);
           #array looks like ["This", "string", "has", "a", "newline."]
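The remaining functions from the table can be tried the same way. The following is a minimal sketch; the helper names `reverse_seq` and `prefix` are made up for illustration and are not part of the original slides. One point worth noting is that `reverse` reverses a list when called in list context, so `scalar` is used here to get the characters of a single string reversed.

```perl
use strict;
use warnings;

# Hypothetical helper: reverse the characters of a sequence string.
# scalar() forces scalar context so the string itself is reversed.
sub reverse_seq {
    my ($seq) = @_;
    return scalar reverse $seq;
}

# Hypothetical helper: the first $n characters of a string, via substr.
sub prefix {
    my ($seq, $n) = @_;
    return substr($seq, 0, $n);   # offset 0, length $n
}

my $seq = "ACTGTTGTAAGC\n";
chomp $seq;                       # remove the trailing newline
print length($seq), "\n";         # 12
print lc($seq), "\n";             # actgttgtaagc
print reverse_seq($seq), "\n";    # CGAATGTTGTCA
print prefix($seq, 3), "\n";      # ACT
```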
  • 18. Array
      •  Stores a list of scalar values (strings or numbers)
      •  Zero based index
           Index:  0      1         2      3        4
           Value:  Vivek  Jennifer  Jason  Darrell  Qina
  • 19. Variables - Array
      •  Begins with @
      •  Use the () brackets for creating
      •  Use the $ and [] brackets for retrieving a single element in the array
           my @grades = (75, 80, 35);
           my @mixnmatch = (5, "A", 4.5);
           my @names = ("Bob", "Vivek", "Jane");
           # zero-based index
           my $first_name = $names[0];
           # $#names is a special variable holding the index of the last item in the array
           my $last_name = $names[$#names];
  • 20. Common Array Functions
           Function Name   Description
           scalar          Size of the array
           push            Add value to end of an array
           pop             Removes the last element from an array
           shift           Removes the first element from an array
           unshift         Add value to the beginning of an array
           join            Convert array to scalar
           splice          Removes or replaces specified range of elements from array
           grep            Search array elements
           sort            Orders array elements
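A short sketch of the functions in this table that the slides do not illustrate elsewhere (`scalar`, `join`, `splice`, `grep`, `sort`). The helper `long_names_sorted` is a hypothetical name used only for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: keep names longer than $min characters,
# then return them in alphabetical order.
sub long_names_sorted {
    my ($min, @names) = @_;
    my @long   = grep { length($_) > $min } @names;   # search/filter
    my @sorted = sort @long;                           # default string sort
    return @sorted;
}

my @names = ("Tim", "Molly", "Betty", "Chris");
print scalar(@names), "\n";               # 4
print join(", ", sort @names), "\n";      # Betty, Chris, Molly, Tim

# splice(ARRAY, OFFSET, LENGTH) removes and returns a range of elements
my @removed = splice(@names, 1, 2);
print "@removed | @names\n";              # Molly Betty | Tim Chris

print join(" ", long_names_sorted(3, "Tim", "Molly", "Betty", "Chris")), "\n";
# Betty Chris Molly
```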
  • 21. push/pop
           @names = Tim Molly Betty Chris
           push(@names, "Charles");
           @names = Tim Molly Betty Chris Charles
           pop(@names);
           @names = Tim Molly Betty Chris
  • 22. shift/unshift
           @names = Tim Molly Betty Chris
           unshift(@names, "Charles");
           @names = Charles Tim Molly Betty Chris
           shift(@names);
           @names = Tim Molly Betty Chris
  • 23. Variables - Hashes
           KEYS        VALUES
           Title       Programming Perl, 3rd Edition
           Publisher   O’Reilly Media
           ISBN        978-0-596-00027-1
  • 24. Variables - Hash
      •  Stores data using key, value pairs
      •  Indicated with %
      •  Use the () brackets for creating
      •  Use the $ and {} brackets for setting or retrieving a single element from the hash
           my %book_info = (
               title  => "Perl for bioinformatics",
               author => "James Tisdall",
               pages  => 270,
               price  => 40
           );
           $book_info{"author"}; #returns "James Tisdall"
  • 25. Variables - Hash
      •  Retrieving single value or all the keys/values
      •  NOTE: Keys and values are unordered
           my $book_title = $book_info{"title"}; #refers to "Perl for bioinformatics"
           my @book_attributes = keys %book_info;
           my @book_attribute_values = values %book_info;
  • 26. Common Hash Functions
           Function Name   Description
           keys            Returns array of keys
           values          Returns array of values
           reverse         Swaps keys and values, so each value becomes a key
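As a small illustration of `reverse` on a hash, the sketch below inverts a lookup table. The helper name `invert_hash` and the sample data are assumptions made for this example. If two keys share the same value, only one of them survives the inversion, so this is only safe when the values are unique.

```perl
use strict;
use warnings;

# Hypothetical helper: swap keys and values of a hash.
# Assumes the values are unique; duplicates would overwrite each other.
sub invert_hash {
    my (%hash) = @_;
    my %inverted = reverse %hash;
    return %inverted;
}

my %three_to_one = (ALA => "A", GLY => "G", SER => "S");
my %one_to_three = invert_hash(%three_to_one);

# keys are unordered, so sort them for a stable printout
foreach my $letter (sort keys %one_to_three) {
    print "$letter : $one_to_three{$letter}\n";
}
```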
  • 27. Variables summary
           # A. Scalar variable
           my $first_name = "vivek";
           my $last_name = "gopalan";
           # B. Array variable
           # use 'circular' bracket and @ symbol for assignment
           my @personal_info = ("vivek", $last_name);
           # use 'square' bracket and the integer index to access an entry
           my $fname = $personal_info[0];
           # C. Hash variable
           # use 'circular' brackets (similar to array) and % symbol for assignment
           my %personal_info = (
               first_name => "vivek",
               last_name  => "gopalan"
           );
           # use 'curly' brackets to access a single entry
           my $fname1 = $personal_info{first_name};
  • 28. Tutorial 1 •  Create a variable with the following sequence: ILE GLY GLY ASN ALA GLN ALA THR ALA ALA ASN SER ILE ALA LEU GLY SER GLY ALA THR THR •  print in lowercase •  split into an array •  print the array •  print the first value in the array •  shift the first value off the array and store it in a variable •  print the variable and the array •  push the variable onto the end of the array •  print the array
  • 29. Basic Programming Concepts •  Variables •  Scalars •  Arrays •  Hashes •  Flow Control •  if/else •  unless •  Loops •  for •  foreach •  while •  until •  Files •  Regexes
  • 30. Flow Controls
      •  if/elsif/else
           $x = 4;
           if ($x > 4) {
               print "I am greater than 4";
           } elsif ($x == 4) {
               print "I am equal to 4";
           } else {
               print "I am less than 4";
           }
      •  unless
           unless ($x > 4) {
               print "I am not greater than 4";
           }
  • 31. Post-condition
           # the traditional way
           if ($x == 4) {
               print "I am 4.";
           }
           # this line is equivalent to the if statement above, but you can
           # only use it if you have a one line action
           print "I am 4." if ( $x == 4 );
           print "I am not 4." unless ( $x == 4 );
  • 32. Basic Programming Concepts •  Variables •  Scalars •  Arrays •  Hashes •  Flow Control •  if/else •  unless •  Loops •  for •  foreach •  while •  until •  Files •  Regexes
  • 33. Loops
      •  for
           for ( my $x = 0; $x < 4 ; $x++ ) {
               print "$x\n";
           }
      •  foreach
           my @names = ("Bob", "Vivek", "Jane");
           foreach my $name (@names) {
               print "My name is $name.\n";
           }
           #prints:
           #My name is Bob.
           #My name is Vivek.
           #My name is Jane.
  • 34. Hashes with foreach
           my %book_info = (
               title  => "Perl for Bioinformatics",
               author => "James Tisdall");
           foreach my $key (keys %book_info) {
               print "$key : $book_info{$key}\n";
           }
           #prints (key order is not guaranteed):
           #title : Perl for Bioinformatics
           #author : James Tisdall
  • 35. Loops - continued
      •  while
           my $x = 0;
           while ($x < 4) {
               print "$x\n";
               $x++;
           }
      •  until
           $x = 0;   # reset the counter
           until ($x >= 4) {
               print "$x\n";
               $x++;
           }
  • 36. Tutorial 2 •  iterate through the array •  print everything unless ILE •  use a hash to count how many times each AA occurs •  iterate through the hash •  print the counts
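One possible approach to this tutorial is sketched below, using the sequence from Tutorial 1. It is only one of several reasonable solutions, and the helper name `count_residues` is made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: count how many times each residue occurs.
sub count_residues {
    my (@residues) = @_;
    my %count;
    foreach my $aa (@residues) {
        $count{$aa}++;            # a missing key starts at 0 automatically
    }
    return %count;
}

my $seq = "ILE GLY GLY ASN ALA GLN ALA THR ALA ALA ASN SER ILE ALA LEU GLY SER GLY ALA THR THR";
my @residues = split(" ", $seq);

# print everything unless ILE
foreach my $aa (@residues) {
    print "$aa\n" unless $aa eq "ILE";
}

# iterate through the hash and print the counts
my %count = count_residues(@residues);
foreach my $aa (sort keys %count) {
    print "$aa : $count{$aa}\n";
}
```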
  • 37. Basic Programming Concepts •  Variables •  Scalars •  Arrays •  Hashes •  Flow Control •  if/else •  unless •  Loops •  for •  foreach •  while •  until •  Files •  Regexes
  • 38. Files •  Existence o  if(-e $file) •  Open o  Read - open(FILE, "< $file"); o  New - open(FILE, "> $file"); o  Append - open(FILE, ">> $file"); •  Read o  while(<FILE>) •  Write o  print FILE $string; •  Close o  close(FILE)
  • 39. Directory •  Existence o  if(-d $directory) •  Open o  opendir(DIR, "$directory") •  Read o  readdir(DIR) •  Close o  closedir(DIR) •  Create o  mkdir($directory) unless (-d $directory)
  • 40. Reading a file and a directory
           # A. Reading file
           # create a variable that can tell the program where to find your data
           my $file = "/User/Vivek/Documents/perlTutorials/myFile.txt";
           # Check if file exists and read through it
           if (-e $file) {
               open(FILE, "<$file") or die "cannot open file";
               while (<FILE>) {
                   chomp;
                   my $line = $_;
                   #do something useful here
               }
               close(FILE);
           }
           # B. Reading directory
           my $directory = "/User/Vivek";
           if (-d $directory) {
               opendir(DIR, $directory) or die "cannot open directory";
               my @files = readdir(DIR);
               closedir(DIR);
               print @files;
           }
      •  Notice the special variable $_. When it is used here, it holds the line that was just read from the file.
      •  The array @files will hold the name of every file in the directory.
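The same read loop can also be written with a lexical filehandle and the three-argument form of `open`, a style commonly recommended in the Perl documentation. The sketch below is an alternative to the slide's code, not a replacement for it; the helper name `read_lines` is hypothetical, and a temporary file (created with the core `File::Temp` module) stands in for real data so the example runs anywhere.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: return every line of a file with newlines removed.
sub read_lines {
    my ($path) = @_;
    open(my $fh, "<", $path) or die "cannot open $path: $!";
    my @lines;
    while (my $line = <$fh>) {
        chomp $line;
        push(@lines, $line);
    }
    close($fh);
    return @lines;
}

# Write a small temporary file, then read it back.
my ($out, $tmpfile) = tempfile(UNLINK => 1);
print $out "first line\nsecond line\n";
close($out);

my @lines = read_lines($tmpfile);
print scalar(@lines), " lines read\n";
print "$lines[0]\n";
```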
  • 41. Basic Programming Concepts •  Variables •  Scalars •  Arrays •  Hashes •  Flow Control •  if/else •  unless •  Loops •  for •  foreach •  while •  until •  Files •  Regexes
  • 42. Regular Expressions (REGEX) •  "A regular expression ... is a set of pattern matching rules encoded in a string according to certain syntax rules." -wikipedia •  Fast and efficient for "Fuzzy" matches •  Example - Find all sequences from human o $seq_name =~ /(human|Homo sapiens)/i;
  • 43. Beginning Perl for Bioinformatics - James Tisdall
  • 44. Simple Examples
           my $protein = "MET SER ASN ASN THR SER";
           $protein =~ s/SER/THR/g;
           print $protein; #prints "MET THR ASN ASN THR THR"
           $protein =~ m/asn/i; #will match ASN
  • 45. Regular Expressions (REGEX)
           Symbol     Meaning
           .          Match any one character (except newline)
           ^          Match at beginning of string
           $          Match at end of string
           \n         Match the newline
           \t         Match a tab
           \s         Match any whitespace character
           \w         Match any word character (alphanumeric plus "_")
           \W         Match any non-word character
           \d         Match any digit character
           [A-Za-z]   Match any letter
           [0-9]      same as \d

           my $string = "See also xyz";
           $string =~ /See also ./; #matches "See also x"
           $string =~ /^./;         #matches "S"
           $string =~ /.$/;         #matches "z"
           $string =~ /\w\s\w/;     #matches "e a"
  • 46. Regular Expressions (REGEX)
           Quantifier   Meaning
           *            Match 0 or more times
           +            Match at least once
           ?            Match 0 or 1 times
           *?           Match 0 or more times (minimal)
           +?           Match 1 or more times (minimal)
           ??           Match 0 or 1 time (minimal)
           {COUNT}      Match exactly COUNT times
           {MIN,}       Match at least MIN times (maximal)
           {MIN,MAX}    Match at least MIN but not more than MAX times (maximal)

           my $string = "See also xyz";
           $string =~ /See also .*/; #matches "See also xyz"
           $string =~ /^.*/;         #matches "See also xyz"
           $string =~ /.?$/;         #matches "z"
           $string =~ /\w+\s+\w+/;   #matches "See also"
  • 47. REGEX Examples
           my $string = ">ref|XP_001882498.1| retrovirus-related pol polyprotein [Laccaria bicolor S238N-H82]";
           $string =~ /\s.*virus/; #will match " retrovirus" (including the leading space)
           $string =~ /XP_\d+/;    #will match "XP_001882498"
           $string =~ /XP_\d/;     #will match "XP_0"
           $string =~ /\[.*\]$/;   #will match "[Laccaria bicolor S238N-H82]"
           $string =~ /^.*\|/;     #will match ">ref|XP_001882498.1|"
           $string =~ /^.*?\|/;    #will match ">ref|"
           $string =~ s/\|/:/g;    #string becomes ">ref:XP_001882498.1: retrovirus-related pol polyprotein [Laccaria bicolor S238N-H82]"
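The patterns above only test whether a match exists. To pull the matched pieces out, parentheses can be used as capture groups, whose contents Perl stores in `$1`, `$2`, and so on. The sketch below goes one small step beyond the slides; the helper name `parse_header` is hypothetical, and the pattern assumes a header shaped like the example string (a database tag, an accession with a version number, and an organism in square brackets at the end).

```perl
use strict;
use warnings;

# Hypothetical helper: extract the accession and organism from a header
# shaped like ">ref|ACCESSION.VERSION| description [Organism]".
# Returns an empty list if the header does not match.
sub parse_header {
    my ($header) = @_;
    if ($header =~ /^>\w+\|(\w+\.\d+)\|.*\[(.*)\]$/) {
        return ($1, $2);   # $1 = accession, $2 = organism
    }
    return;
}

my $string = ">ref|XP_001882498.1| retrovirus-related pol polyprotein [Laccaria bicolor S238N-H82]";
my ($accession, $organism) = parse_header($string);
print "$accession\n";   # XP_001882498.1
print "$organism\n";    # Laccaria bicolor S238N-H82
```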
  • 48. Tutorial 3 •  open the file example.fa •  read through the file •  print the id lines for the human sequences (NOTE: the ids will start with HS)
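A sketch of the matching step for this tutorial. It works on a list of lines so that it runs without example.fa; in the real exercise the lines would come from a `while` loop over the opened file, as on the Files slide. The helper name `human_id_lines` and the sample records are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only FASTA id lines whose id starts with HS.
# FASTA id lines begin with ">", so the pattern is anchored at the start.
sub human_id_lines {
    my (@lines) = @_;
    my @human = grep { /^>HS/ } @lines;
    return @human;
}

# Made-up sample records standing in for the contents of example.fa
my @lines = (
    ">HS0001 sample human sequence",
    "ACTGTTGTAAGC",
    ">MM0001 sample non-human sequence",
    "TTGACCGTAAGC",
);

foreach my $id_line (human_id_lines(@lines)) {
    print "$id_line\n";
}
```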
  • 49. Summary of Basics •  Variables •  Scalar •  Array •  Hash •  Flow Control •  if/else •  unless •  Loops •  for •  foreach •  while •  until •  Files •  Regexes
  • 50. Basic BioPerl •  GenBank file manipulation using Bio::SeqIO o  Fetch from NCBI o  Select a subsequence o  Print to a FASTA file •  Analyzing BLAST results using Bio::SearchIO o  Retrieve hits with greater than 75% identity and length greater than 50
  • 51. BioPerl •  BioPerl is a collection of Perl libraries for analyzing biological data. •  Sequence Analysis, Phylogenetic Analysis, Protein Structure Analysis, etc. •  Installation instructions can be found at www.bioperl.org •  It is NOT a separate programming language.
  • 52. Getting BioPerl •  Installation instructions can be found at www.bioperl.org •  Current version 1.6.1 •  Documentation: o  http://search.cpan.org/~cjfields/BioPerl/ o  http://doc.bioperl.org/ o  use perldoc
  • 53. BioPerl Notes •  All of the BioPerl libraries begin with "Bio::" •  The libraries are grouped by function •  Align, Phylogeny, DB, Seq, Search, Structure, etc. •  All of the parsing libraries end in "IO"
  • 54. Hello GenBank
           #!/usr/bin/perl
           use strict;
           use warnings;
           # Import the Bioperl Library
           use Bio::DB::GenBank;
           #create GenBank download handle
           my $gb = new Bio::DB::GenBank;
           # this returns a Seq object via internet connection to GenBank:
           my $seq = $gb->get_Seq_by_acc('AF303112');
           print "ID: ". $seq->display_id(). "\nSEQ: ". $seq->seq()."\n";
  • 55. File Handling with Perl •  Existence o  if(-e $file) •  Open o  Read - open(FILE, "< $file"); o  New - open(FILE, "> $file"); o  Append - open(FILE, ">> $file"); •  Read o  while(<FILE>) •  Write o  print FILE $string; •  Close o  close(FILE)
  • 56. Files With BioPerl
      •  Open (double quotes are needed so that $infile and $outfile are interpolated)
           Read   - my $seq_in  = Bio::SeqIO->new( -file => "<$infile",   -format => 'Genbank');
           New    - my $seq_out = Bio::SeqIO->new( -file => ">$outfile",  -format => 'Genbank');
           Append - my $seq_out = Bio::SeqIO->new( -file => ">>$outfile", -format => 'Genbank');
      •  Read
           while (my $inseq = $seq_in->next_seq())
      •  Write
           $seq_out->write_seq($inseq);
  • 57. Divide a GenBank file based on species
           #!/usr/bin/perl
           use strict;
           use warnings;
           ##--------- Divide GB File Based on Species ---------##
           use lib "/Users/afniuser/Downloads/BioPerl-1.6.1";
           use Bio::SeqIO;
           my $infile = "myGenbankFile.gb";
           my $inseq = Bio::SeqIO->new(-file => "<$infile", -format => 'Genbank');
           # Create the two output files.
           my $humanFile = Bio::SeqIO->new(-file => '>human.gb', -format => 'Genbank');
           my $otherFile = Bio::SeqIO->new(-file => '>other.gb', -format => 'Genbank');
           while (my $seqin = $inseq->next_seq()) {
               #here we make use of the Bio::Seq object's species attribute, which
               #returns a Bio::Species object, which has a binomial attribute that
               #holds the species name of the source of the sequence
               # Use a REGEX to decide which file to write the sequence to.
               if ($seqin->species()->binomial() =~ m/Homo sapiens/) {
                   $humanFile->write_seq($seqin);
               } else {
                   $otherFile->write_seq($seqin);
               }
           }
  • 58. Bio::SearchIO
      •  These objects represent the three components of a BLAST or FASTA pairwise database search result:
           o  Result - a container object for a given query sequence. There will be a Result for every query sequence in a database search.
           o  Hit - a container object for each identified sequence found to be similar to the query sequence. It contains HSPs.
           o  HSP - represents the alignment of the query and hit sequence. For BLAST there can be multiple HSPs, while there is only a single one for FASTA results. The HSP object will contain references to the query and subject alignment start and end.
      •  Nesting: Result > Hit > HSP
  • 59. Filtering BLAST results with Bio::SearchIO
      •  We need to look at all of the results, hits, and hsps, so we’ll use nested while loops.
           #!/usr/bin/perl
           use strict;
           use warnings;
           use Bio::SearchIO;
           my $in = new Bio::SearchIO(-format => 'blast', -file => 'report.bls');
           while (my $result = $in->next_result()) {
               ## $result is a Bio::Search::Result::ResultI compliant object
               while (my $hit = $result->next_hit()) {
                   ## $hit is a Bio::Search::Hit::HitI compliant object
                   while (my $hsp = $hit->next_hsp()) {
                       ## $hsp is a Bio::Search::HSP::HSPI compliant object
                       if ($hsp->length('total') > 50 && $hsp->percent_identity() >= 75) {
                           print "Query = ". $result->query_name().
                                 ", Hit = ". $hit->name().
                                 ", Length = ". $hsp->length('total').
                                 ", Percent_id = ". $hsp->percent_identity()."\n";
                       }
                   }
               }
           }
  • 60. Tutorial 4 •  open the fasta file •  create two output files in genbank format, one for human, one for other •  if the sequence ids start with HS, print to the human file •  if the id doesn't start with HS, print to the other file