SlideShare ist ein Scribd-Unternehmen logo
1 von 29
Downloaden Sie, um offline zu lesen
Parsing a File with Perl

Regexp, substr and oneliners




 Paolo Marcatili - Programmazione 09-10
Agenda

Today we will see how to
> Extract information from a file
> Substr and regexp

We already know how to use:
> Scalar variables $ and arrays @
> If, for, while, open, print, close…

                                                  2
         Paolo Marcatili - Programmazione 09-10
Task Today




Paolo Marcatili - Programmazione 09-10
Protein Structures

1st task:
> Open a PDB file
> Operate a symmetry transformation
> Extract data from file header




                                                 4
        Paolo Marcatili - Programmazione 09-10
Zinc Finger

2nd task:
> Open a fasta file
> Find all occurencies of Zinc Fingers

  (homework?)




                                                  5
         Paolo Marcatili - Programmazione 09-10
Parsing




Paolo Marcatili - Programmazione 09-10
Rationale

Biological data -> human readable files

If you can read it, Perl can read it as well
*BUT*
It can be tricky




                                                  7
         Paolo Marcatili - Programmazione 09-10
Parsing flow-chart

Open the file
For each line{
     look for “grammar”
     and store data
}
Close file
Use data




                                                    8
           Paolo Marcatili - Programmazione 09-10
Substr




Paolo Marcatili - Programmazione 09-10
Substr

substr($data, start, length)
returns a substring from the expression supplied as first
   argument.




                                                           10
Substr

substr($data, start, length)
        ^          ^        ^
    your string     |        |
              start from 0 |
                      you can omit this
                (you will extract up to the end of string)




                                                             11
Substr

substr($data, start, length)
Examples:

my $data=“il mattino ha l’oro in bocca”;
print substr($data,0) . “n”; #prints all string
print substr($data,3,5) . “n”; #prints matti
print substr($data,25) . “n”; #prints bocca
print substr($data,-5) . “n”; #prints bocca



                                                   12
Pdb rotation




Paolo Marcatili - Programmazione 09-10
PDB

ATOM    4   O   ASP L   1   43.716 -12.235   68.502   1.00 70.05   O
ATOM    5   N   ILE L   2   44.679 -10.569   69.673   1.00 48.19   N
…




COLUMNS        DATA TYPE     FIELD        DEFINITION
-------------------------------------------------------------------------------------
 1 - 6         Record name   "ATOM "
 7 - 11        Integer       serial       Atom serial number.
13 - 16        Atom          name         Atom name.
17             Character     altLoc       Alternate location indicator.
18 - 20        Residue name resName       Residue name.
22             Character     chainID      Chain identifier.
23 - 26        Integer       resSeq       Residue sequence number.
27             AChar         iCode        Code for insertion of residues.
31 - 38        Real(8.3)     x            Orthogonal coordinates for X in Angstroms
39 - 46        Real(8.3)     y            Orthogonal coordinates for Y in Angstroms
47 - 54        Real(8.3)     z            Orthogonal coordinates for Z in Angstroms
55 - 80        Bla Bla Bla (not useful for our purposes)



                                                                                14
Rotation
X->Z
Y->X   ===> rotation of 120° around u=(1,1,1)
Z->Y


         Y




                                X
                                                15
Rotation
#! /usr/bin/perl -w

use strict;
open(IG, "<IG.pdb") || die "cannot open IG.pdb:$!";
   open(IGR, ">IG_rotated.pdb") || die "cannot open IG_rotated.pdb:$!";
   while (my $line=<IG>){
     if (substr($line,0,4) eq "ATOM"){
         my $X= substr($line,30,8);
         my $Y= substr($line,38,8);
         my $Z= substr($line,46,8);
         print IGR substr($line,0,30).$Z.$X.$Y.substr($line,54);
     }
     else{
         print IGR $line;
     }
}
close IG;
close IGR;




                                                                          16
RegExp




Paolo Marcatili - Programmazione 09-10
Regular Expressions

  PDB have a “fixed” structures.

What if we want to do something like
“check for a valid email address”…




                                       18
Regular Expressions

     PDB have a “fixed” structures.

What if we want to do something like
“check for a valid email address”…
1. There must be some letters or numbers
2. There must be a @
3. Other letters
4. .something
paolo.marcatili@gmail.com is good

paolo.marcatili@.com is not good


                                           19
Regular Expressions

$line =~ m/^[a-z |1-9| .| _]+@[^.]+.[a-z]{2,}$/

WHAAAT???

This means:
Check if $line has some chars at the beginning, then @, then
some non-points, then a point, then at least two letters

….
Ok, let’s start from something simpler :)




                                                               20
Regular Expressions

$line =~ m/^[a-z |1-9| .| _]+@[^.]+.[a-z]{2,}$/

WHAAAT???

This means:
Check if $line has some chars at the beginning, then @, then
some non-points, then a point, then at least two letters

….
Ok, let’s start from something simpler :)




                                                               21
Regular Expressions
$line =~ m/^ATOM/
Line starts with ATOM

$line =~ m/^ATOMs+/
Line starts with ATOM, then there are some spaces

$line =~ m/^ATOMs+[-|0-9]+/
Line starts with ATOM, then there are some spaces, then there are some
       digits or -
$line =~ m/^ATOMs+-?[0-9]+/
Line starts with ATOM, then there are some spaces, then there can be a
       minus, then some digits




                                                                         22
Regular Expressions




                      23
PDB Header

We want to find %id for L and H chain




                                       24
PDB Header

We want to find %id for L and H chain


$pidL= $1 if ($line=~m/REMARK SUMMARY-ID_GLOB_L:([.|0-9])/);
$pidH= $1 if ($line=~m/REMARK SUMMARY-ID_GLOB_H:([.|0-9])/);

ONELINER!!


cat IG.pdb | perl -ne ‘print “$1n”
if ($_=~m/^REMARK SUMMARY-ID_GLOB_([LH]:[.|0-9]+)/);’




                                                                25
Zinc Finger




Paolo Marcatili - Programmazione 09-10
Zinc Finger

A zinc finger is a large superfamily of protein
   domains that can bind to DNA.

A zinc finger consists of two antiparallel β
   strands, and an α helix.
The zinc ion is crucial for the stability of this
   domain type - in the absence of the metal
   ion the domain unfolds as it is too small to
   have a hydrophobic core.
The consensus sequence of a single finger is:

C-X{2-4}-C-X{3}-[LIVMFYWC]-X{8}-H-X{3}-H


                                                    27
Homework

  Find all occurencies of ZF motif in
  zincfinger.fasta

Put them in file ZF_motif.fasta

e.g.
weofjpihouwefghoicacvgnfglapglhtylhyuiui




                                           28
Homework

  Find all occurencies of ZF motif in
  zincfinger.fasta

Put them in file ZF_motif.fasta

e.g.
Weofjpihouwefghoicacvgnfglapglifhtylhyuiui

cacvgnfglapglifhtylh

                                             29

Weitere ähnliche Inhalte

Was ist angesagt?

Php radomize
Php radomizePhp radomize
Php radomizedo_aki
 
OPL best practices - Doing more with less easier
OPL best practices - Doing more with less easierOPL best practices - Doing more with less easier
OPL best practices - Doing more with less easierAlex Fleischer
 
Advanced perl finer points ,pack&amp;unpack,eval,files
Advanced perl   finer points ,pack&amp;unpack,eval,filesAdvanced perl   finer points ,pack&amp;unpack,eval,files
Advanced perl finer points ,pack&amp;unpack,eval,filesShankar D
 
Introduction to ad-3.4, an automatic differentiation library in Haskell
Introduction to ad-3.4, an automatic differentiation library in HaskellIntroduction to ad-3.4, an automatic differentiation library in Haskell
Introduction to ad-3.4, an automatic differentiation library in Haskellnebuta
 
3 1. preprocessor, math, stdlib
3 1. preprocessor, math, stdlib3 1. preprocessor, math, stdlib
3 1. preprocessor, math, stdlib웅식 전
 
Hacking Parse.y with ujihisa
Hacking Parse.y with ujihisaHacking Parse.y with ujihisa
Hacking Parse.y with ujihisaujihisa
 
Learn c++ (functions) with nauman ur rehman
Learn  c++ (functions) with nauman ur rehmanLearn  c++ (functions) with nauman ur rehman
Learn c++ (functions) with nauman ur rehmanNauman Rehman
 
C++の話(本当にあった怖い話)
C++の話(本当にあった怖い話)C++の話(本当にあった怖い話)
C++の話(本当にあった怖い話)Yuki Tamura
 
Diving into byte code optimization in python
Diving into byte code optimization in python Diving into byte code optimization in python
Diving into byte code optimization in python Chetan Giridhar
 
React Js Training In Bangalore | ES6 Concepts in Depth
React Js Training   In Bangalore | ES6  Concepts in DepthReact Js Training   In Bangalore | ES6  Concepts in Depth
React Js Training In Bangalore | ES6 Concepts in DepthSiva Vadlamudi
 
Swift the implicit parts
Swift the implicit partsSwift the implicit parts
Swift the implicit partsMaxim Zaks
 
Notes for C Programming for MCA, BCA, B. Tech CSE, ECE and MSC (CS) 3 of 5 b...
Notes for C Programming for MCA, BCA, B. Tech CSE, ECE and MSC (CS) 3 of 5  b...Notes for C Programming for MCA, BCA, B. Tech CSE, ECE and MSC (CS) 3 of 5  b...
Notes for C Programming for MCA, BCA, B. Tech CSE, ECE and MSC (CS) 3 of 5 b...ssuserd6b1fd
 
Notes for C Programming for MCA, BCA, B. Tech CSE, ECE and MSC (CS) 2 of 5 by...
Notes for C Programming for MCA, BCA, B. Tech CSE, ECE and MSC (CS) 2 of 5 by...Notes for C Programming for MCA, BCA, B. Tech CSE, ECE and MSC (CS) 2 of 5 by...
Notes for C Programming for MCA, BCA, B. Tech CSE, ECE and MSC (CS) 2 of 5 by...ssuserd6b1fd
 
ekb.py - Python VS ...
ekb.py - Python VS ...ekb.py - Python VS ...
ekb.py - Python VS ...it-people
 
NTUSTxTDOH 資訊安全基礎工作坊 基礎逆向教育訓練
NTUSTxTDOH 資訊安全基礎工作坊 基礎逆向教育訓練NTUSTxTDOH 資訊安全基礎工作坊 基礎逆向教育訓練
NTUSTxTDOH 資訊安全基礎工作坊 基礎逆向教育訓練Sheng-Hao Ma
 

Was ist angesagt? (19)

Php radomize
Php radomizePhp radomize
Php radomize
 
The most exciting features of PHP 7.1
The most exciting features of PHP 7.1The most exciting features of PHP 7.1
The most exciting features of PHP 7.1
 
OPL best practices - Doing more with less easier
OPL best practices - Doing more with less easierOPL best practices - Doing more with less easier
OPL best practices - Doing more with less easier
 
Advanced perl finer points ,pack&amp;unpack,eval,files
Advanced perl   finer points ,pack&amp;unpack,eval,filesAdvanced perl   finer points ,pack&amp;unpack,eval,files
Advanced perl finer points ,pack&amp;unpack,eval,files
 
Introduction to ad-3.4, an automatic differentiation library in Haskell
Introduction to ad-3.4, an automatic differentiation library in HaskellIntroduction to ad-3.4, an automatic differentiation library in Haskell
Introduction to ad-3.4, an automatic differentiation library in Haskell
 
Unit vii wp ppt
Unit vii wp pptUnit vii wp ppt
Unit vii wp ppt
 
3 1. preprocessor, math, stdlib
3 1. preprocessor, math, stdlib3 1. preprocessor, math, stdlib
3 1. preprocessor, math, stdlib
 
Hacking Parse.y with ujihisa
Hacking Parse.y with ujihisaHacking Parse.y with ujihisa
Hacking Parse.y with ujihisa
 
Learn c++ (functions) with nauman ur rehman
Learn  c++ (functions) with nauman ur rehmanLearn  c++ (functions) with nauman ur rehman
Learn c++ (functions) with nauman ur rehman
 
C++の話(本当にあった怖い話)
C++の話(本当にあった怖い話)C++の話(本当にあった怖い話)
C++の話(本当にあった怖い話)
 
Diving into byte code optimization in python
Diving into byte code optimization in python Diving into byte code optimization in python
Diving into byte code optimization in python
 
React Js Training In Bangalore | ES6 Concepts in Depth
React Js Training   In Bangalore | ES6  Concepts in DepthReact Js Training   In Bangalore | ES6  Concepts in Depth
React Js Training In Bangalore | ES6 Concepts in Depth
 
Swift the implicit parts
Swift the implicit partsSwift the implicit parts
Swift the implicit parts
 
Notes for C Programming for MCA, BCA, B. Tech CSE, ECE and MSC (CS) 3 of 5 b...
Notes for C Programming for MCA, BCA, B. Tech CSE, ECE and MSC (CS) 3 of 5  b...Notes for C Programming for MCA, BCA, B. Tech CSE, ECE and MSC (CS) 3 of 5  b...
Notes for C Programming for MCA, BCA, B. Tech CSE, ECE and MSC (CS) 3 of 5 b...
 
Notes for C Programming for MCA, BCA, B. Tech CSE, ECE and MSC (CS) 2 of 5 by...
Notes for C Programming for MCA, BCA, B. Tech CSE, ECE and MSC (CS) 2 of 5 by...Notes for C Programming for MCA, BCA, B. Tech CSE, ECE and MSC (CS) 2 of 5 by...
Notes for C Programming for MCA, BCA, B. Tech CSE, ECE and MSC (CS) 2 of 5 by...
 
Auxiliary
AuxiliaryAuxiliary
Auxiliary
 
ekb.py - Python VS ...
ekb.py - Python VS ...ekb.py - Python VS ...
ekb.py - Python VS ...
 
Data Types Master
Data Types MasterData Types Master
Data Types Master
 
NTUSTxTDOH 資訊安全基礎工作坊 基礎逆向教育訓練
NTUSTxTDOH 資訊安全基礎工作坊 基礎逆向教育訓練NTUSTxTDOH 資訊安全基礎工作坊 基礎逆向教育訓練
NTUSTxTDOH 資訊安全基礎工作坊 基礎逆向教育訓練
 

Ähnlich wie Regexp Master

Perl6 a whistle stop tour
Perl6 a whistle stop tourPerl6 a whistle stop tour
Perl6 a whistle stop tourSimon Proctor
 
Perl6 a whistle stop tour
Perl6 a whistle stop tourPerl6 a whistle stop tour
Perl6 a whistle stop tourSimon Proctor
 
PERL for QA - Important Commands and applications
PERL for QA - Important Commands and applicationsPERL for QA - Important Commands and applications
PERL for QA - Important Commands and applicationsSunil Kumar Gunasekaran
 
Wheels we didn't re-invent: Perl's Utility Modules
Wheels we didn't re-invent: Perl's Utility ModulesWheels we didn't re-invent: Perl's Utility Modules
Wheels we didn't re-invent: Perl's Utility ModulesWorkhorse Computing
 
Slaying the Dragon: Implementing a Programming Language in Ruby
Slaying the Dragon: Implementing a Programming Language in RubySlaying the Dragon: Implementing a Programming Language in Ruby
Slaying the Dragon: Implementing a Programming Language in RubyJason Yeo Jie Shun
 
Bioinformatica 06-10-2011-p2 introduction
Bioinformatica 06-10-2011-p2 introductionBioinformatica 06-10-2011-p2 introduction
Bioinformatica 06-10-2011-p2 introductionProf. Wim Van Criekinge
 
Hidden treasures of Ruby
Hidden treasures of RubyHidden treasures of Ruby
Hidden treasures of RubyTom Crinson
 
Real World Haskell: Lecture 7
Real World Haskell: Lecture 7Real World Haskell: Lecture 7
Real World Haskell: Lecture 7Bryan O'Sullivan
 
Introduction to Perl
Introduction to PerlIntroduction to Perl
Introduction to PerlSway Wang
 
Instruction Set Architecture: MIPS
Instruction Set Architecture: MIPSInstruction Set Architecture: MIPS
Instruction Set Architecture: MIPSPrasenjit Dey
 
Overloading Perl OPs using XS
Overloading Perl OPs using XSOverloading Perl OPs using XS
Overloading Perl OPs using XSℕicolas ℝ.
 
Old Oracle Versions
Old Oracle VersionsOld Oracle Versions
Old Oracle VersionsJeffrey Kemp
 
Scala + WattzOn, sitting in a tree....
Scala + WattzOn, sitting in a tree....Scala + WattzOn, sitting in a tree....
Scala + WattzOn, sitting in a tree....Raffi Krikorian
 
Perl6 for-beginners
Perl6 for-beginnersPerl6 for-beginners
Perl6 for-beginnersJens Rehsack
 

Ähnlich wie Regexp Master (20)

Regexp master 2011
Regexp master 2011Regexp master 2011
Regexp master 2011
 
Perl6 a whistle stop tour
Perl6 a whistle stop tourPerl6 a whistle stop tour
Perl6 a whistle stop tour
 
Perl6 a whistle stop tour
Perl6 a whistle stop tourPerl6 a whistle stop tour
Perl6 a whistle stop tour
 
PERL for QA - Important Commands and applications
PERL for QA - Important Commands and applicationsPERL for QA - Important Commands and applications
PERL for QA - Important Commands and applications
 
Mips1
Mips1Mips1
Mips1
 
Apache pig
Apache pigApache pig
Apache pig
 
Wheels we didn't re-invent: Perl's Utility Modules
Wheels we didn't re-invent: Perl's Utility ModulesWheels we didn't re-invent: Perl's Utility Modules
Wheels we didn't re-invent: Perl's Utility Modules
 
Slaying the Dragon: Implementing a Programming Language in Ruby
Slaying the Dragon: Implementing a Programming Language in RubySlaying the Dragon: Implementing a Programming Language in Ruby
Slaying the Dragon: Implementing a Programming Language in Ruby
 
Awk essentials
Awk essentialsAwk essentials
Awk essentials
 
Bioinformatica 06-10-2011-p2 introduction
Bioinformatica 06-10-2011-p2 introductionBioinformatica 06-10-2011-p2 introduction
Bioinformatica 06-10-2011-p2 introduction
 
Hidden treasures of Ruby
Hidden treasures of RubyHidden treasures of Ruby
Hidden treasures of Ruby
 
Real World Haskell: Lecture 7
Real World Haskell: Lecture 7Real World Haskell: Lecture 7
Real World Haskell: Lecture 7
 
Hashes Master
Hashes MasterHashes Master
Hashes Master
 
Introduction to Perl
Introduction to PerlIntroduction to Perl
Introduction to Perl
 
Instruction Set Architecture: MIPS
Instruction Set Architecture: MIPSInstruction Set Architecture: MIPS
Instruction Set Architecture: MIPS
 
Mips
MipsMips
Mips
 
Overloading Perl OPs using XS
Overloading Perl OPs using XSOverloading Perl OPs using XS
Overloading Perl OPs using XS
 
Old Oracle Versions
Old Oracle VersionsOld Oracle Versions
Old Oracle Versions
 
Scala + WattzOn, sitting in a tree....
Scala + WattzOn, sitting in a tree....Scala + WattzOn, sitting in a tree....
Scala + WattzOn, sitting in a tree....
 
Perl6 for-beginners
Perl6 for-beginnersPerl6 for-beginners
Perl6 for-beginners
 

Mehr von Paolo Marcatili

Mehr von Paolo Marcatili (6)

Master perl io_2011
Master perl io_2011Master perl io_2011
Master perl io_2011
 
Master datatypes 2011
Master datatypes 2011Master datatypes 2011
Master datatypes 2011
 
Master unix 2011
Master unix 2011Master unix 2011
Master unix 2011
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
 
Perl Io Master
Perl Io MasterPerl Io Master
Perl Io Master
 
Unix Master
Unix MasterUnix Master
Unix Master
 

Kürzlich hochgeladen

Call Girls in Kalyan Vihar Delhi 💯 Call Us 🔝8264348440🔝
Call Girls in Kalyan Vihar Delhi 💯 Call Us 🔝8264348440🔝Call Girls in Kalyan Vihar Delhi 💯 Call Us 🔝8264348440🔝
Call Girls in Kalyan Vihar Delhi 💯 Call Us 🔝8264348440🔝soniya singh
 
文凭办理《原版美国USU学位证书》犹他州立大学毕业证制作成绩单修改
文凭办理《原版美国USU学位证书》犹他州立大学毕业证制作成绩单修改文凭办理《原版美国USU学位证书》犹他州立大学毕业证制作成绩单修改
文凭办理《原版美国USU学位证书》犹他州立大学毕业证制作成绩单修改atducpo
 
Breath, Brain & Beyond_A Holistic Approach to Peak Performance.pdf
Breath, Brain & Beyond_A Holistic Approach to Peak Performance.pdfBreath, Brain & Beyond_A Holistic Approach to Peak Performance.pdf
Breath, Brain & Beyond_A Holistic Approach to Peak Performance.pdfJess Walker
 
办理国外毕业证学位证《原版美国montana文凭》蒙大拿州立大学毕业证制作成绩单修改
办理国外毕业证学位证《原版美国montana文凭》蒙大拿州立大学毕业证制作成绩单修改办理国外毕业证学位证《原版美国montana文凭》蒙大拿州立大学毕业证制作成绩单修改
办理国外毕业证学位证《原版美国montana文凭》蒙大拿州立大学毕业证制作成绩单修改atducpo
 
Lucknow 💋 High Class Call Girls Lucknow 10k @ I'm VIP Independent Escorts Gir...
Lucknow 💋 High Class Call Girls Lucknow 10k @ I'm VIP Independent Escorts Gir...Lucknow 💋 High Class Call Girls Lucknow 10k @ I'm VIP Independent Escorts Gir...
Lucknow 💋 High Class Call Girls Lucknow 10k @ I'm VIP Independent Escorts Gir...anilsa9823
 
Lilac Illustrated Social Psychology Presentation.pptx
Lilac Illustrated Social Psychology Presentation.pptxLilac Illustrated Social Psychology Presentation.pptx
Lilac Illustrated Social Psychology Presentation.pptxABMWeaklings
 
CALL ON ➥8923113531 🔝Call Girls Rajajipuram Lucknow best sexual service
CALL ON ➥8923113531 🔝Call Girls Rajajipuram Lucknow best sexual serviceCALL ON ➥8923113531 🔝Call Girls Rajajipuram Lucknow best sexual service
CALL ON ➥8923113531 🔝Call Girls Rajajipuram Lucknow best sexual serviceanilsa9823
 
《塔夫斯大学毕业证成绩单购买》做Tufts文凭毕业证成绩单/伪造美国假文凭假毕业证书图片Q微信741003700《塔夫斯大学毕业证购买》《Tufts毕业文...
《塔夫斯大学毕业证成绩单购买》做Tufts文凭毕业证成绩单/伪造美国假文凭假毕业证书图片Q微信741003700《塔夫斯大学毕业证购买》《Tufts毕业文...《塔夫斯大学毕业证成绩单购买》做Tufts文凭毕业证成绩单/伪造美国假文凭假毕业证书图片Q微信741003700《塔夫斯大学毕业证购买》《Tufts毕业文...
《塔夫斯大学毕业证成绩单购买》做Tufts文凭毕业证成绩单/伪造美国假文凭假毕业证书图片Q微信741003700《塔夫斯大学毕业证购买》《Tufts毕业文...ur8mqw8e
 
LC_YouSaidYes_NewBelieverBookletDone.pdf
LC_YouSaidYes_NewBelieverBookletDone.pdfLC_YouSaidYes_NewBelieverBookletDone.pdf
LC_YouSaidYes_NewBelieverBookletDone.pdfpastor83
 
The Selfspace Journal Preview by Mindbrush
The Selfspace Journal Preview by MindbrushThe Selfspace Journal Preview by Mindbrush
The Selfspace Journal Preview by MindbrushShivain97
 
Independent Escorts in Lucknow (Adult Only) 👩🏽‍❤️‍💋‍👩🏼 8923113531 ♛ Escort S...
Independent Escorts in Lucknow  (Adult Only) 👩🏽‍❤️‍💋‍👩🏼 8923113531 ♛ Escort S...Independent Escorts in Lucknow  (Adult Only) 👩🏽‍❤️‍💋‍👩🏼 8923113531 ♛ Escort S...
Independent Escorts in Lucknow (Adult Only) 👩🏽‍❤️‍💋‍👩🏼 8923113531 ♛ Escort S...gurkirankumar98700
 
CALL ON ➥8923113531 🔝Call Girls Mahanagar Lucknow best sexual service
CALL ON ➥8923113531 🔝Call Girls Mahanagar Lucknow best sexual serviceCALL ON ➥8923113531 🔝Call Girls Mahanagar Lucknow best sexual service
CALL ON ➥8923113531 🔝Call Girls Mahanagar Lucknow best sexual serviceanilsa9823
 
Reinventing Corporate Philanthropy_ Strategies for Meaningful Impact by Leko ...
Reinventing Corporate Philanthropy_ Strategies for Meaningful Impact by Leko ...Reinventing Corporate Philanthropy_ Strategies for Meaningful Impact by Leko ...
Reinventing Corporate Philanthropy_ Strategies for Meaningful Impact by Leko ...Leko Durda
 
9892124323, Call Girls in mumbai, Vashi Call Girls , Kurla Call girls
9892124323, Call Girls in mumbai, Vashi Call Girls , Kurla Call girls9892124323, Call Girls in mumbai, Vashi Call Girls , Kurla Call girls
9892124323, Call Girls in mumbai, Vashi Call Girls , Kurla Call girlsPooja Nehwal
 
$ Love Spells^ 💎 (310) 882-6330 in West Virginia, WV | Psychic Reading Best B...
$ Love Spells^ 💎 (310) 882-6330 in West Virginia, WV | Psychic Reading Best B...$ Love Spells^ 💎 (310) 882-6330 in West Virginia, WV | Psychic Reading Best B...
$ Love Spells^ 💎 (310) 882-6330 in West Virginia, WV | Psychic Reading Best B...PsychicRuben LoveSpells
 
8377087607 Full Enjoy @24/7-CLEAN-Call Girls In Chhatarpur,
8377087607 Full Enjoy @24/7-CLEAN-Call Girls In Chhatarpur,8377087607 Full Enjoy @24/7-CLEAN-Call Girls In Chhatarpur,
8377087607 Full Enjoy @24/7-CLEAN-Call Girls In Chhatarpur,dollysharma2066
 
Top Rated Pune Call Girls Tingre Nagar ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Tingre Nagar ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Tingre Nagar ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Tingre Nagar ⟟ 6297143586 ⟟ Call Me For Genuine Se...Call Girls in Nagpur High Profile
 
Call Girls In Andheri East Call US Pooja📞 9892124323 Book Hot And
Call Girls In Andheri East Call US Pooja📞 9892124323 Book Hot AndCall Girls In Andheri East Call US Pooja📞 9892124323 Book Hot And
Call Girls In Andheri East Call US Pooja📞 9892124323 Book Hot AndPooja Nehwal
 
Call Girls Anjuna beach Mariott Resort ₰8588052666
Call Girls Anjuna beach Mariott Resort ₰8588052666Call Girls Anjuna beach Mariott Resort ₰8588052666
Call Girls Anjuna beach Mariott Resort ₰8588052666nishakur201
 

Kürzlich hochgeladen (20)

Call Girls in Kalyan Vihar Delhi 💯 Call Us 🔝8264348440🔝
Call Girls in Kalyan Vihar Delhi 💯 Call Us 🔝8264348440🔝Call Girls in Kalyan Vihar Delhi 💯 Call Us 🔝8264348440🔝
Call Girls in Kalyan Vihar Delhi 💯 Call Us 🔝8264348440🔝
 
文凭办理《原版美国USU学位证书》犹他州立大学毕业证制作成绩单修改
文凭办理《原版美国USU学位证书》犹他州立大学毕业证制作成绩单修改文凭办理《原版美国USU学位证书》犹他州立大学毕业证制作成绩单修改
文凭办理《原版美国USU学位证书》犹他州立大学毕业证制作成绩单修改
 
Breath, Brain & Beyond_A Holistic Approach to Peak Performance.pdf
Breath, Brain & Beyond_A Holistic Approach to Peak Performance.pdfBreath, Brain & Beyond_A Holistic Approach to Peak Performance.pdf
Breath, Brain & Beyond_A Holistic Approach to Peak Performance.pdf
 
办理国外毕业证学位证《原版美国montana文凭》蒙大拿州立大学毕业证制作成绩单修改
办理国外毕业证学位证《原版美国montana文凭》蒙大拿州立大学毕业证制作成绩单修改办理国外毕业证学位证《原版美国montana文凭》蒙大拿州立大学毕业证制作成绩单修改
办理国外毕业证学位证《原版美国montana文凭》蒙大拿州立大学毕业证制作成绩单修改
 
Lucknow 💋 High Class Call Girls Lucknow 10k @ I'm VIP Independent Escorts Gir...
Lucknow 💋 High Class Call Girls Lucknow 10k @ I'm VIP Independent Escorts Gir...Lucknow 💋 High Class Call Girls Lucknow 10k @ I'm VIP Independent Escorts Gir...
Lucknow 💋 High Class Call Girls Lucknow 10k @ I'm VIP Independent Escorts Gir...
 
Lilac Illustrated Social Psychology Presentation.pptx
Lilac Illustrated Social Psychology Presentation.pptxLilac Illustrated Social Psychology Presentation.pptx
Lilac Illustrated Social Psychology Presentation.pptx
 
CALL ON ➥8923113531 🔝Call Girls Rajajipuram Lucknow best sexual service
CALL ON ➥8923113531 🔝Call Girls Rajajipuram Lucknow best sexual serviceCALL ON ➥8923113531 🔝Call Girls Rajajipuram Lucknow best sexual service
CALL ON ➥8923113531 🔝Call Girls Rajajipuram Lucknow best sexual service
 
《塔夫斯大学毕业证成绩单购买》做Tufts文凭毕业证成绩单/伪造美国假文凭假毕业证书图片Q微信741003700《塔夫斯大学毕业证购买》《Tufts毕业文...
《塔夫斯大学毕业证成绩单购买》做Tufts文凭毕业证成绩单/伪造美国假文凭假毕业证书图片Q微信741003700《塔夫斯大学毕业证购买》《Tufts毕业文...《塔夫斯大学毕业证成绩单购买》做Tufts文凭毕业证成绩单/伪造美国假文凭假毕业证书图片Q微信741003700《塔夫斯大学毕业证购买》《Tufts毕业文...
《塔夫斯大学毕业证成绩单购买》做Tufts文凭毕业证成绩单/伪造美国假文凭假毕业证书图片Q微信741003700《塔夫斯大学毕业证购买》《Tufts毕业文...
 
LC_YouSaidYes_NewBelieverBookletDone.pdf
LC_YouSaidYes_NewBelieverBookletDone.pdfLC_YouSaidYes_NewBelieverBookletDone.pdf
LC_YouSaidYes_NewBelieverBookletDone.pdf
 
The Selfspace Journal Preview by Mindbrush
The Selfspace Journal Preview by MindbrushThe Selfspace Journal Preview by Mindbrush
The Selfspace Journal Preview by Mindbrush
 
escort service sasti (*~Call Girls in Paschim Vihar Metro❤️9953056974
escort service  sasti (*~Call Girls in Paschim Vihar Metro❤️9953056974escort service  sasti (*~Call Girls in Paschim Vihar Metro❤️9953056974
escort service sasti (*~Call Girls in Paschim Vihar Metro❤️9953056974
 
Independent Escorts in Lucknow (Adult Only) 👩🏽‍❤️‍💋‍👩🏼 8923113531 ♛ Escort S...
Independent Escorts in Lucknow  (Adult Only) 👩🏽‍❤️‍💋‍👩🏼 8923113531 ♛ Escort S...Independent Escorts in Lucknow  (Adult Only) 👩🏽‍❤️‍💋‍👩🏼 8923113531 ♛ Escort S...
Independent Escorts in Lucknow (Adult Only) 👩🏽‍❤️‍💋‍👩🏼 8923113531 ♛ Escort S...
 
CALL ON ➥8923113531 🔝Call Girls Mahanagar Lucknow best sexual service
CALL ON ➥8923113531 🔝Call Girls Mahanagar Lucknow best sexual serviceCALL ON ➥8923113531 🔝Call Girls Mahanagar Lucknow best sexual service
CALL ON ➥8923113531 🔝Call Girls Mahanagar Lucknow best sexual service
 
Reinventing Corporate Philanthropy_ Strategies for Meaningful Impact by Leko ...
Reinventing Corporate Philanthropy_ Strategies for Meaningful Impact by Leko ...Reinventing Corporate Philanthropy_ Strategies for Meaningful Impact by Leko ...
Reinventing Corporate Philanthropy_ Strategies for Meaningful Impact by Leko ...
 
9892124323, Call Girls in mumbai, Vashi Call Girls , Kurla Call girls
9892124323, Call Girls in mumbai, Vashi Call Girls , Kurla Call girls9892124323, Call Girls in mumbai, Vashi Call Girls , Kurla Call girls
9892124323, Call Girls in mumbai, Vashi Call Girls , Kurla Call girls
 
$ Love Spells^ 💎 (310) 882-6330 in West Virginia, WV | Psychic Reading Best B...
$ Love Spells^ 💎 (310) 882-6330 in West Virginia, WV | Psychic Reading Best B...$ Love Spells^ 💎 (310) 882-6330 in West Virginia, WV | Psychic Reading Best B...
$ Love Spells^ 💎 (310) 882-6330 in West Virginia, WV | Psychic Reading Best B...
 
8377087607 Full Enjoy @24/7-CLEAN-Call Girls In Chhatarpur,
8377087607 Full Enjoy @24/7-CLEAN-Call Girls In Chhatarpur,8377087607 Full Enjoy @24/7-CLEAN-Call Girls In Chhatarpur,
8377087607 Full Enjoy @24/7-CLEAN-Call Girls In Chhatarpur,
 
Top Rated Pune Call Girls Tingre Nagar ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Tingre Nagar ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Tingre Nagar ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Tingre Nagar ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 
Call Girls In Andheri East Call US Pooja📞 9892124323 Book Hot And
Call Girls In Andheri East Call US Pooja📞 9892124323 Book Hot AndCall Girls In Andheri East Call US Pooja📞 9892124323 Book Hot And
Call Girls In Andheri East Call US Pooja📞 9892124323 Book Hot And
 
Call Girls Anjuna beach Mariott Resort ₰8588052666
Call Girls Anjuna beach Mariott Resort ₰8588052666Call Girls Anjuna beach Mariott Resort ₰8588052666
Call Girls Anjuna beach Mariott Resort ₰8588052666
 

Regexp Master

  • 1. Parsing a File with Perl Regexp, substr and oneliners Paolo Marcatili - Programmazione 09-10
  • 2. Agenda Today we will see how to > Extract information from a file > Substr and regexp We already know how to use: > Scalar variables $ and arrays @ > If, for, while, open, print, close… 2 Paolo Marcatili - Programmazione 09-10
  • 3. Task Today Paolo Marcatili - Programmazione 09-10
  • 4. Protein Structures 1st task: > Open a PDB file > Operate a symmetry transformation > Extract data from file header 4 Paolo Marcatili - Programmazione 09-10
  • 5. Zinc Finger 2nd task: > Open a fasta file > Find all occurencies of Zinc Fingers (homework?) 5 Paolo Marcatili - Programmazione 09-10
  • 6. Parsing Paolo Marcatili - Programmazione 09-10
  • 7. Rationale Biological data -> human readable files If you can read it, Perl can read it as well *BUT* It can be tricky 7 Paolo Marcatili - Programmazione 09-10
  • 8. Parsing flow-chart Open the file For each line{ look for “grammar” and store data } Close file Use data 8 Paolo Marcatili - Programmazione 09-10
  • 9. Substr Paolo Marcatili - Programmazione 09-10
  • 10. Substr substr($data, start, length) returns a substring from the expression supplied as first argument. 10
  • 11. Substr substr($data, start, length) ^ ^ ^ your string | | start from 0 | you can omit this (you will extract up to the end of string) 11
  • 12. Substr substr($data, start, length) Examples: my $data=“il mattino ha l’oro in bocca”; print substr($data,0) . “n”; #prints all string print substr($data,3,5) . “n”; #prints matti print substr($data,25) . “n”; #prints bocca print substr($data,-5) . “n”; #prints bocca 12
  • 13. Pdb rotation Paolo Marcatili - Programmazione 09-10
  • 14. PDB ATOM 4 O ASP L 1 43.716 -12.235 68.502 1.00 70.05 O ATOM 5 N ILE L 2 44.679 -10.569 69.673 1.00 48.19 N … COLUMNS DATA TYPE FIELD DEFINITION ------------------------------------------------------------------------------------- 1 - 6 Record name "ATOM " 7 - 11 Integer serial Atom serial number. 13 - 16 Atom name Atom name. 17 Character altLoc Alternate location indicator. 18 - 20 Residue name resName Residue name. 22 Character chainID Chain identifier. 23 - 26 Integer resSeq Residue sequence number. 27 AChar iCode Code for insertion of residues. 31 - 38 Real(8.3) x Orthogonal coordinates for X in Angstroms 39 - 46 Real(8.3) y Orthogonal coordinates for Y in Angstroms 47 - 54 Real(8.3) z Orthogonal coordinates for Z in Angstroms 55 - 80 Bla Bla Bla (not useful for our purposes) 14
  • 15. Rotation X->Z Y->X ===> rotation of 120° around u=(1,1,1) Z->Y Y X 15
  • 16. Rotation #! /usr/bin/perl -w use strict; open(IG, "<IG.pdb") || die "cannot open IG.pdb:$!"; open(IGR, ">IG_rotated.pdb") || die "cannot open IG_rotated.pdb:$!"; while (my $line=<IG>){ if (substr($line,0,4) eq "ATOM"){ my $X= substr($line,30,8); my $Y= substr($line,38,8); my $Z= substr($line,46,8); print IGR substr($line,0,30).$Z.$X.$Y.substr($line,54); } else{ print IGR $line; } } close IG; close IGR; 16
  • 17. RegExp Paolo Marcatili - Programmazione 09-10
  • 18. Regular Expressions PDB have a “fixed” structures. What if we want to do something like “check for a valid email address”… 18
  • 19. Regular Expressions PDB have a “fixed” structures. What if we want to do something like “check for a valid email address”… 1. There must be some letters or numbers 2. There must be a @ 3. Other letters 4. .something paolo.marcatili@gmail.com is good paolo.marcatili@.com is not good 19
  • 20. Regular Expressions $line =~ m/^[a-z |1-9| .| _]+@[^.]+.[a-z]{2,}$/ WHAAAT??? This means: Check if $line has some chars at the beginning, then @, then some non-points, then a point, then at least two letters …. Ok, let’s start from something simpler :) 20
  • 21. Regular Expressions $line =~ m/^[a-z |1-9| .| _]+@[^.]+.[a-z]{2,}$/ WHAAAT??? This means: Check if $line has some chars at the beginning, then @, then some non-points, then a point, then at least two letters …. Ok, let’s start from something simpler :) 21
  • 22. Regular Expressions $line =~ m/^ATOM/ Line starts with ATOM $line =~ m/^ATOMs+/ Line starts with ATOM, then there are some spaces $line =~ m/^ATOMs+[-|0-9]+/ Line starts with ATOM, then there are some spaces, then there are some digits or - $line =~ m/^ATOMs+-?[0-9]+/ Line starts with ATOM, then there are some spaces, then there can be a minus, then some digits 22
  • 24. PDB Header We want to find %id for L and H chain 24
  • 25. PDB Header We want to find %id for L and H chain $pidL= $1 if ($line=~m/REMARK SUMMARY-ID_GLOB_L:([.|0-9])/); $pidH= $1 if ($line=~m/REMARK SUMMARY-ID_GLOB_H:([.|0-9])/); ONELINER!! cat IG.pdb | perl -ne ‘print “$1n” if ($_=~m/^REMARK SUMMARY-ID_GLOB_([LH]:[.|0-9]+)/);’ 25
  • 26. Zinc Finger Paolo Marcatili - Programmazione 09-10
  • 27. Zinc Finger A zinc finger is a large superfamily of protein domains that can bind to DNA. A zinc finger consists of two antiparallel β strands, and an α helix. The zinc ion is crucial for the stability of this domain type - in the absence of the metal ion the domain unfolds as it is too small to have a hydrophobic core. The consensus sequence of a single finger is: C-X{2-4}-C-X{3}-[LIVMFYWC]-X{8}-H-X{3}-H 27
  • 28. Homework Find all occurencies of ZF motif in zincfinger.fasta Put them in file ZF_motif.fasta e.g. weofjpihouwefghoicacvgnfglapglhtylhyuiui 28
  • 29. Homework Find all occurencies of ZF motif in zincfinger.fasta Put them in file ZF_motif.fasta e.g. Weofjpihouwefghoicacvgnfglapglifhtylhyuiui cacvgnfglapglifhtylh 29