Dissecting plant genomes with the PLAZA comparative genomics platform.
Van Bel M, Proost S, Wischnitzki E, Movahedi S, Scheerlinck C, Van de Peer Y, Vandepoele K.
Plant Physiol. 2012 Feb;158(2):590-600.
With the arrival of low-cost, next-generation sequencing, a multitude of new plant genomes are being publicly released, providing unseen opportunities and challenges for comparative genomics studies. Here, we present PLAZA 2.5, a user-friendly online research environment to explore genomic information from different plants. This new release features updates to previous genome annotations and a substantial number of newly available plant genomes as well as various new interactive tools and visualizations. Currently, PLAZA hosts 25 organisms covering a broad taxonomic range, including 13 eudicots, five monocots, one lycopod, one moss, and five algae. The available data consist of structural and functional gene annotations, homologous gene families, multiple sequence alignments, phylogenetic trees, and colinear regions within and between species. A new Integrative Orthology Viewer, combining information from different orthology prediction methodologies, was developed to efficiently investigate complex orthology relationships. Cross-species expression analysis revealed that the integration of complementary data types extended the scope of complex orthology relationships, especially between more distantly related species. Finally, based on phylogenetic profiling, we propose a set of core gene families within the green plant lineage that will be instrumental to assess the gene space of draft or newly sequenced plant genomes during the assembly or annotation phase.
1. Dissecting plant genomes with the PLAZA 2.5 comparative genomics platform
Integrating sequence orthology with expression data to
predict functional homologs across plant species
Klaas Vandepoele
PLANT GENOMES & BIOTECHNOLOGY: FROM GENES TO
NETWORKS (CSHL, 1 December 2011)
Comparative & Integrative Genomics
VIB – Ghent University, Belgium
2. Genome sequencing in different plant clades
[Figure: species tree of the plant genomes included in PLAZA releases 1.0, 2.0, and 2.5; the collection grows from 9 genomes to 25 genomes]
Green algae
• Chlorophyceae: C. reinhardtii, V. carteri
• Prasinophyceae: O. lucimarinus, O. tauri, Micromonas
Mosses: P. patens
Club-mosses: S. moellendorffii
Angiosperms
• Monocots: O. sativa japonica, O. sativa indica, S. bicolor, Z. mays, B. distachyon
• Eudicots: V. vinifera
• Rosids: L. japonicus, M. truncatula, G. max, P. trichocarpa, M. esculenta, R. communis, F. vesca, C. papaya, M. domestica, T. cacao, A. thaliana, A. lyrata
• Basal Eudicots, Asterids: clade labels shown without listed species
3. Exploiting cross-species genome information
Centralized infrastructure
Detailed gene catalog per species
Structural annotation (gene models, UTRs)
Functional annotation (experimental, sequence-based)
Intuitive & advanced data mining tools for non-expert users
• Gene function
• Genome organization
• Pathway evolution
• Data manipulation
Computational resources
4. PLAZA, a resource for plant comparative genomics
http://bioinformatics.psb.ugent.be/plaza/
5. Gene family analysis
Genome analysis
20 tools available!
More information? Check Help – Documentation
• Data content & Construction
• Tutorial & FAQ
Proost, Van Bel, … & Vandepoele, Plant Cell 2009
7. Comparative sequence analysis
Homology = shared common ancestral origin
Inferred based on
sequence similarity (BLAST)
similar (multi-)domain composition & organization
So sequence similarity means homology? No, it depends!
[Figure: from sequence sources to clustering methods]
Sequence sources: JGI, TAIR, EMBL
All-against-all sequence similarity search (BLAST)
Clustering methods: BLASTCLUST, Tribe-MCL, Inparanoid, OrthoMCL, C/KOG
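As a rough illustration of how all-against-all similarity hits can be turned into candidate gene families, the sketch below applies plain single-linkage clustering with a union-find structure. This is not the procedure of Tribe-MCL, OrthoMCL, or any other tool named on the slide; the function name `cluster_hits`, the E-value cutoff, and the protein identifiers are assumptions made for the example.

```python
from collections import defaultdict


def cluster_hits(hits, max_evalue=1e-5):
    """Group proteins into candidate families by single linkage.

    hits: iterable of (query, subject, evalue) tuples, e.g. parsed from
    tabular BLAST output. The cutoff is an illustrative assumption.
    """
    parent = {}

    def find(x):
        # Register unseen proteins as their own root, then walk to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for query, subject, evalue in hits:
        root_q, root_s = find(query), find(subject)
        # Merge the two groups only when the hit is significant.
        if evalue <= max_evalue and root_q != root_s:
            parent[root_q] = root_s

    families = defaultdict(set)
    for protein in parent:
        families[find(protein)].add(protein)
    return sorted(sorted(members) for members in families.values())


# Hypothetical protein identifiers and E-values.
hits = [
    ("a1", "b1", 1e-50),
    ("b1", "c1", 1e-30),
    ("a2", "b2", 1e-20),
    ("a1", "a2", 0.5),  # weak hit, ignored by the cutoff
]
families = cluster_hits(hits)
print(families)
```

Single linkage merges any two proteins connected by a chain of significant hits, so one shared domain can fuse otherwise unrelated families. That is one reason why sequence similarity alone does not settle homology, and why graph-based clustering methods are commonly preferred for this step.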
8. Gene family similarity heatmap, multiple sequence alignment & phylogenetic trees
>780K proteins from 25 species
22K multi-species gene families covering 83% of the total proteome
18K trees incl. 420K annotated tree nodes
17. Workbench data import
Create a custom gene set (~experiment) using gene identifiers or
BLAST
External/internal gene IDs (e.g. AN3, AT5G28640, GRMZM2G180246_T01)
BLAST interface can be used to map sequence data from a non-model
species to a reference species present in PLAZA
A toolbox is available to analyze user-defined gene sets
[Figure: PLAZA Workbench data flow]
Input gene sets: microarray transcript profiling, EST sequencing, genes reported in Suppl. data
Workbench tools: WGMapping, Gene Families, Functional annotations, GO enrichment, Sequence retrieval, Tandem/block duplicates, Orthologs, Export data…
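To make the GO enrichment step for a user-defined gene set more concrete, here is a minimal sketch based on a one-sided hypergeometric test with Bonferroni correction. The slides do not state which statistic the Workbench uses, so the test, the correction, the function names, and the toy gene and term labels are all assumptions.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K carry the term."""
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def go_enrichment(gene_set, annotations, alpha=0.05):
    """Return (term, hits_in_set, hits_in_background, adjusted_p) tuples.

    annotations: {gene: set of terms}; its keys define the background.
    """
    background = set(annotations)
    selected = set(gene_set) & background
    terms = {term for gene_terms in annotations.values() for term in gene_terms}
    results = []
    for term in sorted(terms):
        K = sum(1 for gene in background if term in annotations[gene])
        k = sum(1 for gene in selected if term in annotations[gene])
        p = hypergeom_pvalue(k, len(selected), K, len(background))
        p_adj = min(1.0, p * len(terms))  # Bonferroni correction
        if k > 0 and p_adj <= alpha:
            results.append((term, k, K, p_adj))
    return results


# Toy background of 20 hypothetical genes; term labels are placeholders.
annotations = {f"g{i}": set() for i in range(20)}
for i in range(4):
    annotations[f"g{i}"].add("term_A")
for i in range(10):
    annotations[f"g{i}"].add("term_B")

enriched = go_enrichment(["g0", "g1", "g2", "g3"], annotations)
print(enriched)
```

In this toy case only `term_A` passes the corrected threshold: all four selected genes carry it and no other background gene does, whereas `term_B` is too common in the background to stand out.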
19. Detection of orthologous plant genes
Meaning…
Orthology = genes in different species derived from a common ancestor through speciation
Functionally conserved homologs = genes in different species having similar functions
Due to gene duplication events, complex many-to-many gene orthology is frequently observed
Functional homologs in different species share …
• similar expression?
• regulation?
• protein-protein interactions?
21. Integrative Orthology Viewer - an ensemble of
different gene orthology prediction approaches
• Tree-based orthologs (TROG) inferred using tree reconciliation
• Orthologous gene families (ORTHO) inferred using OrthoMCL
• Anchor points refer to gene-based colinearity between species
• Best hit families (BHIF) inferred from BLAST hits including inparalogs
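The ensemble idea behind the four evidence types above can be sketched as a simple vote: for each candidate gene pair, record which methods predict it and rank pairs by the number of supporting methods. This is an illustrative assumption about how such evidence could be combined, not a description of how the viewer is implemented; the function name and gene identifiers are hypothetical.

```python
from collections import defaultdict


def integrate_orthology(predictions):
    """Rank gene pairs by how many prediction methods support them.

    predictions: {method_name: iterable of (query_gene, target_gene)}.
    Returns a list of ((query, target), supporting_methods) sorted by
    decreasing support, then by gene pair.
    """
    support = defaultdict(set)
    for method, pairs in predictions.items():
        for pair in pairs:
            support[pair].add(method)
    return sorted(support.items(), key=lambda item: (-len(item[1]), item[0]))


# Hypothetical predictions for one query gene against another species.
predictions = {
    "TROG": {("q1", "t1")},
    "ORTHO": {("q1", "t1"), ("q1", "t2")},
    "anchor_point": {("q1", "t1")},
    "BHIF": {("q1", "t1"), ("q1", "t2"), ("q1", "t3")},
}
ranked = integrate_orthology(predictions)
for pair, methods in ranked:
    print(pair, len(methods), sorted(methods))
```

Keeping the set of supporting methods, rather than only a count, lets a user see whether a candidate is backed by independent evidence types (for example tree-based plus colinearity) or only by similarity-based methods.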
22. How to evaluate sequence-based orthology
methods?
Cross-species analysis of orthologs using Expression
Context Conservation (ECC)
Expression context conservation
quantifies shared orthologs in
coexpression networks
ECC score = 0.088 (16 shared orthologs / 182 in both coexpression clusters)
P-value (conserved) < 0.001
Movahedi, Van de Peer & Vandepoele, Plant Physiology 2011
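The score on this slide can be reproduced with a short calculation. The sketch below reads "182 in both coexpression clusters" as the size of the union of the two clusters after mapping genes to shared orthologous groups; that reading, the function name, and the synthetic group identifiers are assumptions. The significance test behind the reported P-value is not reproduced here.

```python
def ecc_score(cluster_a, cluster_b):
    """Fraction of orthologous groups shared by two coexpression clusters.

    Both clusters are sets of orthologous-group identifiers, so genes from
    different species are comparable. Returns 0.0 for two empty clusters.
    """
    shared = cluster_a & cluster_b
    combined = cluster_a | cluster_b
    return len(shared) / len(combined) if combined else 0.0


# Synthetic clusters sized to match the slide: 16 shared, 182 in total.
cluster_a = set(range(0, 100))
cluster_b = set(range(84, 182))
score = ecc_score(cluster_a, cluster_b)
print(round(score, 3))  # 0.088
```

A raw score like this depends on cluster size and network connectivity, which is why a significance estimate is needed before calling a gene pair conserved.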
24. Conclusions
PLAZA 2.5 provides a versatile toolbox for plant genomics
Expression Context Conservation provides a valuable
approach to study orthologs and predict functional
homologs across species
The integration of complementary data types extends the
scope of complex orthology relationships
25. Acknowledgments
• PLAZA – plant comparative genomics
Michiel Van Bel
Sebastian Proost
Yves Van de Peer
http://bioinformatics.psb.ugent.be/plaza/
Evolutionary analysis of expression networks
Sara Movahedi
Plant Physiology 2011 paper
Editor's Notes
23 plant genomes: 11 dicots, 5 monocots
pico-PLAZA: 10 green algae
Intuitive & complete view of gene orthology
Method to quantify expression conservation across species by comparing coexpression networks of orthologous genes. These quantifications are robust with respect to moderate modifications in the underlying expression data set, and the method corrects for network connectivity or tissue-specific expression when determining significance levels.