Protein function and bioinformatics

Outline of talk
● Why do we need bioinformatics?
● What tools do we need?
● Case study: the Methanococcoides burtonii genome

Neil Saunders
76-455
n.saunders@uq.edu.au
www.uq.edu.au/~uqnsaun1/

Why do we need bioinformatics?
● Rapid increase in data due to genomics
● Too much data to characterise genes/proteins individually
● Bioinformatics = “smart use” of information
● Ideally, computational and experimental biology are partners

The ideal computational/wet-lab cycle

[Figure: cycle diagram linking biological system, biological objects, computational objects, analyses, biological inferences and experiments]

Bioinformatics is about helping biologists solve problems

Introduction to genomics
● Genomes OnLine Database (GOLD): www.genomesonline.org

  Project category          Number of projects
  Published/complete               413
  Bacteria in progress             977
  Eukarya in progress              629
  Archaea in progress               57
  Metagenomes                       56

10-50% of genes in a new genome may have no known function

Computational skills for genomics

"So what new skills will postdocs need to ensure that they don't become science relics? The answer is math, statistics, and knowledge of a scripting language for computers."

The Scientist, "Bioinformatics Knowledge Vital to Careers"
Volume 16 | Issue 17 | 53 | Sep. 2, 2002
www.the-scientist.com

Using WWW resources
● The best web resources provide:
  - useful tools for analysis
  - integrated data from many sources

Good examples
● InterPro database        http://www.ebi.ac.uk/interpro/
● Expasy                   http://au.expasy.org
● UniProt                  http://www.uniprot.org/
● CBS Prediction servers   http://www.cbs.dtu.dk/services/
● IMG Database             http://img.jgi.doe.gov/

But...
● Web services are no good for genome-scale analyses
● They usually limit the amount of input data (with good reason)

Nucleic Acids Research publishes annual database and web server issues: http://nar.oxfordjournals.org/

Computational infrastructure for genomics

[Figure: biological objects (genome, assembly, gene sequence, protein sequence, protein structure, pathway) become computational objects, which support limitless analyses (sequence analysis, regulatory motifs, structural modelling, phylogeny, comparative genomics, pathway reconstruction)]

Key points
● Appropriate hardware: workstation v. cluster
● Linux Linux Linux!
● Freely available, open-source software is all you need
● Toolkits and libraries (e.g. BioPerl) to build your own solutions
● Philosophy of “many small tools plus glue”, with a scripting language as the glue
● Website and database skills for sharing data and tools

BioPerl: a life sciences computational toolkit
● Website: http://www.bioperl.org
● A collection of Perl modules for biology
● Handles many common tasks in sequence/structure analysis, e.g.
  - read/write various sequence formats
  - run BLAST and parse the output
  - read/write/analyse sequence alignments
  - access local or remote databases
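BioPerl handles format input and output with Bio::SeqIO. As a language-neutral illustration of the first task above (reading and writing a sequence format), here is a minimal Python sketch of a FASTA reader and writer. It is not BioPerl's API: `read_fasta` and `write_fasta` are hypothetical names, and for real work a maintained library (Bio::SeqIO in BioPerl, or Bio.SeqIO in Biopython) covers far more formats and edge cases.

```python
import io
from typing import Iterable, Iterator, List, TextIO, Tuple

Record = Tuple[str, str]  # (header line without '>', sequence)


def read_fasta(handle: TextIO) -> Iterator[Record]:
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header = None
    chunks: List[str] = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # wrapped sequence lines; text before the first header is ignored
    if header is not None:
        yield header, "".join(chunks)


def write_fasta(records: Iterable[Record], handle: TextIO, width: int = 60) -> None:
    """Write (header, sequence) pairs as FASTA, wrapping sequence lines at `width`."""
    for header, seq in records:
        handle.write(f">{header}\n")
        for start in range(0, len(seq), width):
            handle.write(seq[start:start + width] + "\n")


# Usage: round-trip two small records through an in-memory buffer.
records = list(read_fasta(io.StringIO(">seq1 example\nMKV\nLLA\n>seq2\nACGT\n")))
out = io.StringIO()
write_fasta(records, out, width=4)
```

A generator keeps memory use flat even for a whole proteome, which matters for the genome-scale work the deck is about.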

Annotation (or not) using BLAST
BLAST: Basic Local Alignment Search Tool
● Useful for finding similar sequences quickly
● Not very sensitive: less useful for weakly similar sequences
● Not much good at all for annotation

Why not?
● “Hypothetical”: the database sequence is unique
● “Conserved hypothetical”: several hits but no known function
● Multi-domain proteins: the best hit may share only one domain with the query
● The BLAST database contains incorrect annotations
● Annotation is at the whim of whoever deposited the sequence

Classic example: IMPDH (IMP dehydrogenase)
Wu et al. (2003) Comp. Biol. Chem. 27: 37-47
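A common shortcut is to copy the annotation of the best BLAST hit, which is exactly what the points above warn against. The sketch below shows that step so its weaknesses are visible. It parses BLAST+ tabular output (`-outfmt 6`, whose 12 default columns are listed in `FIELDS`) and keeps the highest-scoring hit per query. The E-value cut-off of 1e-5 is an illustrative assumption, and `best_hits` is a hypothetical helper, not part of BLAST or BioPerl.

```python
import csv
import io
from typing import Dict, TextIO

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(handle: TextIO, max_evalue: float = 1e-5) -> Dict[str, dict]:
    """Return the highest-bitscore hit per query among hits passing the E-value cut-off."""
    best: Dict[str, dict] = {}
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) < len(FIELDS) or row[0].startswith("#"):
            continue  # skip blank, short and comment lines
        hit = dict(zip(FIELDS, row))
        hit["evalue"] = float(hit["evalue"])
        hit["bitscore"] = float(hit["bitscore"])
        if hit["evalue"] > max_evalue:
            continue
        current = best.get(hit["qseqid"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["qseqid"]] = hit
    return best


# Usage with a small, made-up tabular report.
report = (
    "q1\ts1\t45.0\t100\t50\t2\t1\t100\t5\t104\t1e-20\t80.5\n"
    "q1\ts2\t60.0\t90\t30\t1\t1\t90\t1\t90\t1e-30\t110.0\n"
    "q2\ts3\t25.0\t40\t30\t0\t1\t40\t1\t40\t0.5\t20.1\n"
)
hits = best_hits(io.StringIO(report))
```

A query with no hit under the cut-off (q2 here) simply gets no annotation. For the rest, the transferred label is only as reliable as whatever the subject's depositor wrote, and a single best hit says nothing about which domain matched.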

A better annotation tool: InterProScan
● IPRScan (InterProScan) is a tool to search the InterPro database
● It uses sequence signature profiles, which are more sensitive than BLAST
● It integrates the search results from multiple member databases
● A good first step to characterise a new sequence
● Available as a standalone package that runs on clusters
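Because InterProScan merges matches from several member databases, a useful first pass over its output is to collapse the per-signature lines into InterPro entries per protein. The sketch below assumes the tab-separated format written by InterProScan 5. In that format, column 1 is the protein accession and columns 12 and 13 hold the InterPro accession and description, which are absent or '-' when a signature is not integrated into InterPro. The older iprscan release mentioned on the slide may lay out its output differently. `interpro_entries` is a hypothetical helper.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Set, TextIO, Tuple


def interpro_entries(handle: TextIO) -> Dict[str, Set[Tuple[str, str]]]:
    """Map each protein accession to the set of (InterPro accession, description) pairs it hit."""
    entries: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) < 13:
            continue  # no InterPro columns: signature not integrated into InterPro
        protein, ipr_acc, ipr_desc = row[0], row[11], row[12]
        if ipr_acc not in ("", "-"):
            entries[protein].add((ipr_acc, ipr_desc))
    return dict(entries)


# Usage with made-up lines: two member databases hit the same InterPro entry for P1.
tsv = (
    "P1\tmd5\t120\tPfam\tSIG1\tsig one\t5\t70\t1e-20\tT\t01-01-2024\tIPR_A\tdomain A\n"
    "P1\tmd5\t120\tSMART\tSIG2\tsig two\t8\t68\t1e-10\tT\t01-01-2024\tIPR_A\tdomain A\n"
    "P2\tmd5\t90\tPfam\tSIG3\tsig three\t1\t50\t1e-5\tT\t01-01-2024\n"
)
summary = interpro_entries(io.StringIO(tsv))
```

Two independent signatures agreeing on one InterPro entry (as for P1) is the kind of integrated evidence a single BLAST hit cannot give.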

Structure prediction: threading and modelling
● The structure of a protein often explains how it functions
● However, structure determination is laborious, difficult and time-consuming
● Modelling can be useful when the sequence is similar to that of a known structure

Threading: fit the query sequence to a database of folds
Homology modelling: assume similar sequence = similar structure
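Deciding whether homology modelling is worth trying usually starts with how similar the query is to the best structural template. The sketch below computes percent identity from a pairwise alignment (gapped strings of equal length) and uses it to choose between the two approaches above. Identity definitions vary, and this one counts only ungapped columns. The 30% cut-off in `suggest_method` is an illustrative assumption rather than a fixed rule, and both function names are hypothetical.

```python
def percent_identity(aligned_query: str, aligned_template: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap ('-')."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_query.upper(), aligned_template.upper())
               if a != "-" and b != "-"]
    if not columns:
        return 0.0
    identical = sum(1 for a, b in columns if a == b)
    return 100.0 * identical / len(columns)


def suggest_method(identity: float, threshold: float = 30.0) -> str:
    """Illustrative decision rule: model by homology at or above `threshold`, otherwise thread."""
    return "homology modelling" if identity >= threshold else "threading"


# Usage: one gapped column is ignored; 4 of the remaining 5 columns are identical.
identity = percent_identity("MKV-LA", "MRVQLA")
```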

Some modelling tools and databases
● SwissModel:  http://swissmodel.expasy.org/
● MODELLER:    http://www.salilab.org/modeller/
● PROSPECT:    http://compbio.ornl.gov/structure/prospect2/
● ModBase:     http://modbase.compbio.ucsf.edu/

Introduction to M. burtonii

[Images: M. burtonii; Ace Lake, Vestfold Hills; the Archaea]

Methanococcoides burtonii
● Isolated from Ace Lake, Antarctica (1-2 °C)
● Grows optimally at 23 °C
● Is an archaeon
● Is a psychrophilic methanogen

The M. burtonii genome

What features of this genome are related to cold adaptation?

Discovery of CSP-like proteins in M. burtonii
● CSP = cold shock protein
● Expressed in bacteria at low temperature
● Functions as an RNA chaperone to facilitate transcription at low temperature
● Present in some Archaea, including M. frigidum, but not M. burtonii

Discovery of CSP-like proteins in M. burtonii

[Figure: protein sequences threaded against cold shock domain (CSD) folds with PROSPECT, then modelled with MODELLER; panels show d1sro__ and the M. burtonii YP_564958 structural model]

● Both proteins are expressed (proteomics)
● Located in a putative exosome/proteasome superoperon
● This is consistent with their proposed function

Integrating information: structural RNA study

[Figure: tRNA GC content (%) of stems and of all bases plotted against optimum growth temperature (OGT, °C)]

Is tRNA GC content related to OGT?
● tRNAScan used to find tRNAs in genomes
● GC content calculated using Perl scripts

Dihydrouridine in M. burtonii
● tRNA contains > 1 hU per tRNA
● Maintains flexibility at low temperature
● DUS (dihydrouridine synthase) gene identified using iprscan
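The GC comparison above needs two numbers per tRNA: GC content over all bases and over stem (base-paired) positions only. The original analysis used Perl scripts. The Python sketch below is a stand-in that assumes each tRNA comes with a secondary-structure string of the same length. In that string, paired positions are assumed to be marked by '>' and '<' (as in tRNAscan-SE's structure line) or by '(' and ')'. The helper names are hypothetical.

```python
from typing import Iterable, Tuple

PAIRED = set("<>()")  # characters assumed to mark base-paired (stem) positions


def gc_percent(bases: Iterable[str]) -> float:
    """Percentage of G or C among the given bases (0.0 for an empty input)."""
    bases = [b.upper() for b in bases]
    if not bases:
        return 0.0
    return 100.0 * sum(b in ("G", "C") for b in bases) / len(bases)


def stem_and_total_gc(seq: str, structure: str) -> Tuple[float, float]:
    """Return (GC% of stem positions, GC% of all positions) for one tRNA."""
    if len(seq) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    stem_bases = [b for b, s in zip(seq, structure) if s in PAIRED]
    return gc_percent(stem_bases), gc_percent(seq)


# Usage with a toy hairpin: the stem is all G/C, the loop is all A.
stems, total = stem_and_total_gc("GGCAAAGCC", "(((...)))")
```

Averaging these two values over each genome's tRNAs would give one point per organism for each series in the figure.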

Pyrrolysine: a problem for bioinformatics
● Proteomics used to identify expressed proteins
● One is trimethylamine methyltransferase (TMA-MT)
● It shows post-translational modification
● It also maps to 2 ORFs in the genome sequence

● The ORFs are actually one gene with a read-through UAG codon
● Pyrrolysine is incorporated at the UAG
● This is the 22nd genetically encoded amino acid
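The pyrrolysine case shows why a fixed genetic code misleads gene callers: treating UAG (TAG in DNA) as a stop splits one gene into two ORFs. The toy translator below makes the difference explicit. It uses the standard genetic code plus a hypothetical `read_through_tag` switch that emits 'O', the one-letter code for pyrrolysine, at TAG. In the cell, Pyl insertion depends on a dedicated tRNA and aminoacyl-tRNA synthetase rather than a global switch, so this is an illustration only.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons enumerated in TCAG order, first base slowest.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str, read_through_tag: bool = False) -> str:
    """Translate frame 0 up to the first stop; optionally read TAG as pyrrolysine ('O')."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        codon = dna[i:i + 3]
        if read_through_tag and codon == "TAG":
            protein.append("O")  # pyrrolysine instead of stop
            continue
        aa = CODE.get(codon, "X")  # X for codons containing ambiguous bases
        if aa == "*":
            break  # any other stop codon ends translation
        protein.append(aa)
    return "".join(protein)


# Usage: the same ORF translated with and without UAG read-through.
orf = "ATGAAATAGTGGTAA"
standard = translate(orf)
with_pyl = translate(orf, read_through_tag=True)
```

With the standard code the product stops after two residues, which is how a single TMA-MT gene can appear as two ORFs in an automated annotation.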

Statistical analysis of protein properties
Archaea: 27 organisms, 62 338 ORFs
Bacteria: 52 organisms, 165 192 ORFs
→ amino acid frequency (BioPerl)
→ data matrix: organisms (rows) × composition (columns)
→ PCA: principal components (R stats package)
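The first two steps of that workflow (amino acid frequencies, then an organisms × composition matrix) can be sketched in a few lines of Python. The original used BioPerl. Here each organism's composition is pooled over all residues of all its proteins, which is an assumption (averaging per-protein frequencies is another reasonable choice), and non-standard letters such as X are skipped. The function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, used as column order


def composition(proteins: List[str]) -> List[float]:
    """Percentage of each standard amino acid, pooled over all of an organism's proteins."""
    counts: Counter = Counter()
    for seq in proteins:
        counts.update(aa for aa in seq.upper() if aa in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [100.0 * counts[aa] / total for aa in AMINO_ACIDS]


def composition_matrix(proteomes: Dict[str, List[str]]) -> Tuple[List[str], List[List[float]]]:
    """Return (organism names, matrix): one row per organism, one column per amino acid."""
    names = sorted(proteomes)
    return names, [composition(proteomes[name]) for name in names]


# Usage with two toy "proteomes"; X is not a standard residue and is skipped.
names, matrix = composition_matrix({"orgB": ["AAC"], "orgA": ["MK", "KX"]})
```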

Principal components analysis of composition
● 2 components explain most of the variation in amino acid composition
● PC1 correlates with genome GC content
● PC2 correlates with optimum growth temperature
● The psychrophilic archaea are distinguished by their PC2 scores
● Their proteins contain more Gln, Ser, Thr, His and Asp, and less Leu, Trp and Glu
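The PCA step itself is small. The sketch below uses NumPy's SVD in place of R's `prcomp`; like `prcomp` with its default settings, it centres each column but does not scale it. It returns each organism's scores on the leading components and the fraction of variance each component explains, the quantity behind the claim that 2 components explain most of the variation. Correlating a column of scores with GC content or growth temperature (for example with `numpy.corrcoef`) would then test the PC1 and PC2 associations. The function name `pca` is hypothetical.

```python
import numpy as np


def pca(matrix, n_components: int = 2):
    """PCA by SVD of the column-centred matrix.

    Returns (scores, explained): scores has one row per organism and one column
    per component; explained is the fraction of total variance per component.
    """
    X = np.asarray(matrix, dtype=float)
    X = X - X.mean(axis=0)  # centre each composition column (no scaling)
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]  # projections onto the PCs
    explained = S**2 / np.sum(S**2)
    return scores, explained[:n_components]


# Toy data: the second column is exactly twice the first, so one PC carries all variance.
scores, explained = pca([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]])
```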

Conclusions
● Computational biology and bioinformatics are essential to modern biology
● Many tools are available to annotate proteins, both web-based and standalone
● Without experiments, bioinformatics yields only predictions
● Data integration is our biggest problem

www.uq.edu.au/~uqnsaun1/
