Comparative genomics
in eukaryotes
Gene family analysis



  Klaas Vandepoele, PhD


Professor Ghent University
Comparative & Integrative Genomics
VIB – Ghent University, Belgium


                 http://www.bits.vib.be
Workflow




Applications of clustering the proteome(s)
       Gene families form the basis for the evolutionary
        (or phylogenetic) analysis of
          Detection of orthologs and paralogs
          Gene duplication, family expansions,
           pseudogene formation and gene loss
          Species taxonomies
          Horizontal Gene Transfer (HGT)
          Evolution of gene structure
             • Introns
             • Protein domain organisation &
               (re)arrangements
          Base composition and codon usage

I. Structural annotation: genome-wide versus family-wise
       Rationale for family-wise annotation
           Every gene has its own (sequence) characteristics,
            and different genes evolve at different rates.
            Using these family-specific characteristics to
            determine homologous gene models improves the
            overall quality of the structural annotation.
       Properties:
           Slow, nearly manual procedure
           High-quality gene models that can reveal novel
            biological findings

Workflow of the family-wise annotation procedure

[Figure: flowchart of the family-wise annotation procedure. Box labels as
 extracted: collecting experimental representatives; MSA of experimental
 representatives; HMMbuild; family HMM profile; EST/cDNA; BLAST; Species X
 proteome; protein motifs; ab initio gene prediction; correction of gene
 model; putative homologs; HMMsearch; classification using phylogenetic
 trees; detailed characterization. HMMER: http://hmmer.janelia.org/]

Experimental representatives

[Figure: characterization of the experimental representatives using
 InterProScan, the PFAM HMM logo, and a ClustalW + Jalview alignment]

BLAST / HMMsearch


    1. Use multiple sequence
       alignment to create HMM profile
    2. Use HMM profile to search for
       similar proteins
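
The two steps above map onto the HMMER command-line tools hmmbuild and hmmsearch. The following is a minimal sketch that only assembles the two command lines; the file names and the E-value cutoff are placeholders chosen for illustration, not values from the slides.

```python
import shlex


def hmmer_commands(msa_path, hmm_path, proteome_path, hits_path, evalue=1e-5):
    """Return the hmmbuild and hmmsearch command lines as argument lists.

    All paths and the E-value cutoff are illustrative assumptions.
    """
    # Step 1: build a profile HMM from the multiple sequence alignment.
    build = ["hmmbuild", hmm_path, msa_path]
    # Step 2: search the proteome with the profile; --tblout writes a
    # per-target table, -E sets the reporting E-value threshold.
    search = ["hmmsearch", "-E", str(evalue), "--tblout", hits_path,
              hmm_path, proteome_path]
    return build, search


build, search = hmmer_commands("cyclins.sto", "cyclins.hmm",
                               "speciesX_proteome.fasta", "hits.tbl")
print(shlex.join(build))
print(shlex.join(search))
# To execute them (requires HMMER 3 on the PATH):
#   import subprocess
#   subprocess.run(build, check=True)
#   subprocess.run(search, check=True)
```

Building the commands as lists rather than shell strings avoids quoting problems when file names contain spaces.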




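Representatives + putative homologs

[Figure: multiple sequence alignment of representatives and putative
 homologs, viewed in the BioEdit Sequence Editor]

The suffix "finalcds" indicates a gene model that was corrected relative to
the original gene model generated by the ab initio gene prediction.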


             Multiple sequence alignments assist in the detection and
              correction of errors in the structural annotation (missed exon)
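
One way to make the "missed exon" check concrete is to scan each aligned sequence for a long internal gap run that the other family members span with residues. This is a rough sketch under that assumption; the function name, the toy alignment and the 10-column threshold are hypothetical, and a flagged region is only a candidate that still needs manual inspection.

```python
def candidate_missed_exons(alignment, min_len=10):
    """Flag long internal gap runs in an aligned sequence set.

    alignment: dict mapping sequence name -> aligned sequence ('-' = gap).
    Returns tuples (name, start, end, support), where [start, end) is the
    gap run in alignment columns and support is the number of other
    sequences that have no gap anywhere in that region.
    """
    hits = []
    for name, seq in alignment.items():
        residues = [i for i, c in enumerate(seq) if c != "-"]
        if not residues:
            continue
        start = None
        # Only look between the first and last residue, so leading and
        # trailing gaps (partial sequences) are ignored.
        for i in range(residues[0], residues[-1] + 1):
            if seq[i] == "-":
                if start is None:
                    start = i
            elif start is not None:
                if i - start >= min_len:
                    support = sum(
                        1 for other, s in alignment.items()
                        if other != name and "-" not in s[start:i]
                    )
                    hits.append((name, start, i, support))
                start = None
    return hits


# Toy alignment (hypothetical sequences): predX lacks 12 columns that both
# representatives cover.
alignment = {
    "repA":  "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ",
    "repB":  "MKTAYLAKQRQISFVKAHFSRQLEDRLGLIEVQ",
    "predX": "MKTAYIAKQ------------QLEERLGLIEVQ",
}
print(candidate_missed_exons(alignment))
```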
Representatives + putative homologs

[Figure: multiple sequence alignment of representatives and putative homologs]


             Multiple sequence alignments assist in the detection of errors
              in the structural annotation (false first exon)
Examples of family-specific protein motifs




        B-type cyclins have HxKF signature
        Cyclin destruction boxes (B1-type cyclin R-[AV]LGDIGN)
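
Motifs written in this shorthand can be turned into regular expressions and searched in each candidate protein. The sketch below assumes that "x" means any residue and that "R-[AV]LGDIGN" reads as R, then A or V, then LGDIGN; the test sequence is made up for illustration.

```python
import re

# Regular expressions derived from the motif shorthand (assumed reading:
# x = any residue, [AV] = A or V, the hyphen is only a separator).
MOTIFS = {
    "HxKF signature": re.compile(r"H.KF"),
    "B1-type destruction box": re.compile(r"R[AV]LGDIGN"),
}


def find_motifs(sequence):
    """Return motif name -> list of 0-based start positions in sequence."""
    return {
        name: [m.start() for m in pattern.finditer(sequence)]
        for name, pattern in MOTIFS.items()
    }


# Hypothetical toy sequence, not a real cyclin.
toy = "MSRALGDIGNKKHAKFLE"
print(find_motifs(toy))
```

An empty list for a motif is the interesting case: a family member that lacks an expected motif is a candidate for closer inspection.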

Examples of family-specific protein motifs

[Figure legend: Arabidopsis, Rice]




                      D-type cyclins contain LxCxE Rb-binding motif
                      Low conservation of phylogenetic signal at primary sequence level
                      General rules are rarely general: exceptions (i.e. missing protein
                       motifs) are frequent and might indicate functional divergence
Classification using phylogenetic tree construction

[Figure: phylogenetic tree of the cyclin gene family with subclass annotations]
         • A- and B-type cyclins are mitotic cyclins
         • D-type cyclins are G1-specific
         • H-type cyclins regulate the activity of CDK-activating kinases

         • The complexity of the cyclin gene family appears to be higher in plants than in
         mammals
         • Whether there is functional redundancy within A- and B-type cyclins or different
         regulation (and expression) of some cyclin subclasses remains to be analyzed
Unraveling functional divergence using large-scale expression compendia

[Figure: expression heat map with genes as rows and plant tissues as columns]

[Figure: expression of A-type, B-type and D-type cyclin genes (rows) across
 plant tissues (columns); source: Genevestigator]

II. Orthology & paralogy

        A major goal of sequence analysis is evolutionary
         reconstruction. It is critical to distinguish between two
         principal types of homologous relationships, which differ
         in their evolutionary history and functional implications.

        Orthologs, defined as homologous genes evolved
         through speciation (~evolutionary counterparts derived
         from a single ancestral gene in the last common ancestor
         of the given two species)

        Paralogs, which are homologous genes evolved through
         duplication within the same (perhaps ancestral) genome.

        These definitions were first introduced by Fitch (1970)

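Orthology & paralogy inference

[Figure: organism phylogeny (species tree) of species A, B and C next to two
 gene phylogenies, a) and b), built from genes a1, a2, b1, b2, c1 and c2.
 Speciation and gene duplication events are marked, and the resulting
 inparalogs and outparalogs are labeled.]
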
In- and outparalogy

[Figure from Sonnhammer & Koonin: Orthology, paralogy and proposed classification for paralog subtypes]

Tree reconciliation

        The automatic detection of speciation and duplication
         events using a species tree and gene family tree
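
A minimal sketch of one common reconciliation scheme, LCA mapping, is given below: each gene-tree node is mapped to the smallest species-tree clade containing all its species, and a node is called a duplication when it maps to the same clade as one of its children. The nested-tuple tree encoding, the gene-to-species naming rule and the example tree are assumptions made for illustration only.

```python
def clades(tree):
    """Return the leaf sets (one frozenset per node) of a nested-tuple tree."""
    if isinstance(tree, str):
        return {frozenset([tree])}
    result, leaves = set(), frozenset()
    for child in tree:
        sub = clades(child)
        result |= sub
        leaves |= max(sub, key=len)  # largest set = the child's own clade
    result.add(leaves)
    return result


def reconcile(gene_tree, species_tree, species_of):
    """Label every internal gene-tree node as 'speciation' or 'duplication'.

    Returns a list of (sorted gene names under the node, label) in
    post-order. species_of maps a gene name to its species name.
    """
    species_clades = clades(species_tree)
    events = []

    def lca(species):
        # Smallest species-tree clade that contains all given species.
        return min((c for c in species_clades if species <= c), key=len)

    def visit(node):
        if isinstance(node, str):
            return lca(frozenset([species_of(node)])), [node]
        mapped_children, genes = [], []
        for child in node:
            mapped, below = visit(child)
            mapped_children.append(mapped)
            genes += below
        mapped = lca(frozenset().union(*mapped_children))
        # Duplication: the node maps to the same clade as one of its children.
        label = "duplication" if mapped in mapped_children else "speciation"
        events.append((tuple(sorted(genes)), label))
        return mapped, genes

    visit(gene_tree)
    return events


# Hypothetical example: species tree ((A,B),C) and a gene tree in which a
# duplication precedes both speciations. Gene "a1" belongs to species "A".
species_tree = (("A", "B"), "C")
gene_tree = ((("a1", "b1"), "c1"), (("a2", "b2"), "c2"))
events = reconcile(gene_tree, species_tree, lambda gene: gene[0].upper())
for genes, label in events:
    print(label, genes)
```

In this example only the root is a duplication, so genes on opposite sides of the root (for instance a1 and a2) are paralogs, while genes separated by a speciation node (for instance a1 and b1) are orthologs.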




III. Types of proteome analysis




The evolution of multi-domain proteins




Interpreting the output of an all-against-all similarity search




     Metrics for sequence similarity:
     • E-value, bit score or percent identity
     • Alignment coverage
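
As an illustration of how these metrics combine, the sketch below filters hits in BLAST tabular format (the default 12 columns of -outfmt 6) on both E-value and query coverage. The thresholds, the example hits and the protein lengths are made-up assumptions; query lengths have to be supplied separately because the default table does not include them.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    query: str
    subject: str
    pident: float
    qstart: int
    qend: int
    evalue: float
    bitscore: float


def parse_hit(line):
    """Parse one line of BLAST tabular output (default 12 columns)."""
    f = line.rstrip("\n").split("\t")
    return Hit(f[0], f[1], float(f[2]), int(f[6]), int(f[7]),
               float(f[10]), float(f[11]))


def query_coverage(hit, query_length):
    """Fraction of the query covered by the aligned region."""
    return (hit.qend - hit.qstart + 1) / query_length


def keep(hit, query_length, max_evalue=1e-5, min_coverage=0.5):
    """Accept a hit only if it passes both the E-value and coverage cutoffs."""
    return (hit.evalue <= max_evalue
            and query_coverage(hit, query_length) >= min_coverage)


# Hypothetical hits: p3 has high identity and a good E-value but aligns to
# only 30 of the 220 query residues (for example a single shared domain).
lines = [
    "p1\tp2\t62.5\t200\t75\t0\t1\t200\t5\t204\t1e-80\t250",
    "p1\tp3\t90.0\t30\t3\t0\t10\t39\t1\t30\t1e-8\t45",
]
lengths = {"p1": 220}
kept = [h.subject for h in map(parse_hit, lines) if keep(h, lengths[h.query])]
print(kept)
```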
Clustering of similar sequences




             Proteins = vertices ~ nodes
        Sequence similarity relationship = edges
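
With proteins as nodes and similarity relationships as edges, the simplest possible clustering is to take the connected components of the graph (single linkage). The sketch below shows only that baseline; it is not one of the more advanced clustering methods, and the edge list is a made-up example.

```python
from collections import defaultdict


def connected_components(edges):
    """Group proteins into clusters: one cluster per connected component.

    edges: iterable of (protein_a, protein_b) pairs that passed the
    similarity cutoffs. Proteins without any edge do not appear.
    """
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)

    seen, clusters = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        # Depth-first traversal collecting everything reachable from start.
        stack, members = [start], set()
        while stack:
            node = stack.pop()
            if node in members:
                continue
            members.add(node)
            stack.extend(graph[node] - members)
        seen |= members
        clusters.append(sorted(members))
    return clusters


# Hypothetical similarity edges between five proteins.
edges = [("p1", "p2"), ("p2", "p3"), ("p4", "p5")]
print(connected_components(edges))
```

Note that p1 and p3 end up in the same cluster without being directly similar; a single spurious edge, such as one caused by a shared domain in multi-domain proteins, can therefore merge unrelated families.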
Advanced methods for protein (orthology) clustering
        Sequence similarity-based
            COG (RBH)         [Tatusov 1997]
            InParanoid        [Remm et al., 2001]
            Tribe-MCL         [Van Dongen 2000]
            OrthoMCL          [Li et al., 2003]

        Phylogenetic tree-based
            PhylomeDB         [Huerta-Cepas et al., 2007]
            Ensembl Compara   [Vilella et al., 2008]
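
The "RBH" tag in the list above stands for reciprocal best hit. A plain pairwise version can be sketched as follows; this is only the basic two-species criterion, not the full procedure of COG, InParanoid or OrthoMCL, and the hit lists with their bit scores are invented for illustration.

```python
def best_hits(hits):
    """hits: iterable of (query, subject, bitscore). Return query -> best subject."""
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical searches between species A and species B.
a_vs_b = [("a1", "b1", 300.0), ("a1", "b2", 120.0), ("a2", "b1", 200.0)]
b_vs_a = [("b1", "a1", 295.0), ("b2", "a1", 118.0)]
print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

Here a2 also hits b1 best, but b1 prefers a1, so only the pair (a1, b1) is reported. This illustrates why plain RBH misses co-orthologs that arose from lineage-specific duplications.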


Overview of methodologies

[Figure (Gabaldon, 2008): overview of orthology inference methodologies,
 labeled BBH, Inparanoid, COG, species overlap and reconciliation]
IV. Resources




Resources (bis)

        Ensembl (Vertebrates)
        Ensembl Genomes (Metazoa, Protists,
         Fungi, Plants & Bacteria)

        OrthoMCL-DB 5 (150 genomes)
        YGOB (>15 Fungi)




Hands-on

        Goal: identify and characterize gene family
         members encoding talin 2 (TLN2)

         1.   Select query gene
         2.   Retrieve homologs/orthologs
         3.   Create multiple sequence alignment
         4.   Identify conserved positions
         5.   Create phylogenetic tree and identify
              orthologous/paralogous genes
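
For step 4, conserved positions can be read off a multiple sequence alignment column by column. The sketch below reports columns in which a single residue reaches a chosen fraction of the sequences; the aligned toy sequences are hypothetical and are not talin sequences.

```python
from collections import Counter


def conserved_columns(sequences, min_fraction=1.0):
    """Return (column index, residue) for columns dominated by one residue.

    sequences: aligned sequences of equal length ('-' = gap).
    min_fraction: minimum share of sequences carrying the same residue.
    """
    conserved = []
    for index, column in enumerate(zip(*sequences)):
        residue, count = Counter(column).most_common(1)[0]
        if residue != "-" and count / len(column) >= min_fraction:
            conserved.append((index, residue))
    return conserved


# Hypothetical three-sequence alignment.
msa = ["MKLV-E", "MKIVAE", "MRLVAE"]
print(conserved_columns(msa))
print(conserved_columns(msa, min_fraction=0.6))
```

Lowering min_fraction tolerates a few divergent or gapped sequences, which is usually needed once the alignment contains more than a handful of homologs.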


