PLAZA 3.0 - an access point for plant comparative genomics
1. PLAZA 3.0: an access point for plant comparative genomics
Klaas Vandepoele
29 September 2014
VIB – Ghent University, Belgium
2. Overview
Plant genomes: status & challenges
Comparative genomics using PLAZA: concepts & tools
What’s new in PLAZA 3.0?
3. Plant genome sequencing is booming
New and faster sequencing technologies
Generating a draft genome sequence has become cheap
The number of published plant genomes grows exponentially
Unlocking biological information is the real challenge
Michael & Jackson, 2013
4. Genome annotation
Structural annotation shows where genes are
Describes their intron-exon gene structure
Functional annotation tells you what genes do
Can be downloaded along with the genome sequence
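Structural annotation is often distributed as a GFF3 file next to the genome sequence. The sketch below shows how the intron-exon structure of one transcript can be recovered from such a file. It assumes exon features carry a `Parent=` attribute naming their transcript; the helper names, transcript ID and coordinates are hypothetical toy data.

```python
from typing import Iterable, List, Tuple

# 1-based, inclusive coordinates, as in GFF3 columns 4 and 5
Interval = Tuple[int, int]


def parse_exons(gff_lines: Iterable[str], transcript_id: str) -> List[Interval]:
    """Collect exon coordinates whose Parent attribute lists transcript_id."""
    exons = []
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        if transcript_id in attrs.get("Parent", "").split(","):
            exons.append((int(cols[3]), int(cols[4])))
    return exons


def introns_from_exons(exons: List[Interval]) -> List[Interval]:
    """Introns are the gaps between consecutive (sorted) exons."""
    ordered = sorted(exons)
    introns = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start > prev_end + 1:
            introns.append((prev_end + 1, next_start - 1))
    return introns


# Hypothetical toy transcript with three exons
toy_gff = [
    "##gff-version 3",
    "chr1\ttoy\texon\t100\t200\t.\t+\t.\tID=e1;Parent=gene1.1",
    "chr1\ttoy\texon\t301\t450\t.\t+\t.\tID=e2;Parent=gene1.1",
    "chr1\ttoy\texon\t600\t700\t.\t+\t.\tID=e3;Parent=gene1.1",
]
exons = parse_exons(toy_gff, "gene1.1")
introns = introns_from_exons(exons)
print(introns)  # [(201, 300), (451, 599)]
```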
5. Comparative genomics
Comparative genomics is a powerful tool that allows us:
to link genomic changes to environmental adaptation
to transfer knowledge from model species to other plants
to trace structural changes within a genome through time
6. Comparative genomics has a steep learning curve
A thorough knowledge of data processing tools is required
Computer clusters and high-memory machines are used
New visualizations and methods are necessary to explore genomic features across multiple species
Limited access to high-quality comparative genomics information
17. Comparative sequence analysis
Homology = shared common ancestry
Inferred from
sequence similarity (BLAST)
similar (multi-)domain composition & organization
[Figure: sequence data providers (TAIR, JGI, EMBL); all-against-all sequence similarity search (BLAST); clustering methods (BLASTCLUST, Tribe-MCL, Inparanoid, OrthoMCL, C/KOG)]
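The slide lists several tools that turn all-against-all BLAST hits into gene families. As a minimal sketch, the code below uses plain single-linkage clustering (the idea behind BLASTCLUST), which is simpler than the Markov clustering used by Tribe-MCL or OrthoMCL and is not the PLAZA pipeline itself. The e-value cutoff and the protein IDs are assumptions chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def single_linkage_families(
    hits: Iterable[Tuple[str, str, float]], max_evalue: float = 1e-5
) -> List[Set[str]]:
    """Group proteins connected by any significant BLAST hit (single linkage)."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for query, subject, evalue in hits:
        root_q, root_s = find(query), find(subject)  # also registers both proteins
        if evalue <= max_evalue and root_q != root_s:
            parent[root_s] = root_q  # merge the two clusters

    families: Dict[str, Set[str]] = defaultdict(set)
    for protein in list(parent):
        families[find(protein)].add(protein)
    # Largest families first; ties broken alphabetically for stable output
    return sorted(families.values(), key=lambda fam: (-len(fam), sorted(fam)))


# Hypothetical all-against-all hits: (query, subject, e-value)
toy_hits = [
    ("A1", "B1", 1e-50),
    ("B1", "C1", 1e-20),
    ("A2", "B2", 1e-30),
    ("A3", "B3", 0.5),  # not significant: A3 and B3 stay singletons
]
families = single_linkage_families(toy_hits)
```

Single linkage is fast but can chain unrelated proteins through multi-domain bridges, which is one reason graph-based methods such as Tribe-MCL are preferred for plant proteomes.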
18. Gene families, multiple sequence alignments & phylogenetic trees
26K multi-gene families covering 90% of the total proteome
>1M proteins from 31 species
17K trees incl. 580K annotated tree nodes
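The "families covering 90% of the proteome" figure is a simple ratio. A minimal sketch of how such a number can be computed, assuming every protein is assigned to exactly one family (singletons get their own family ID) and that a multi-gene family has at least two members; the toy assignment is hypothetical.

```python
from collections import Counter
from typing import Dict, Tuple


def family_summary(gene_to_family: Dict[str, str]) -> Tuple[int, float]:
    """Return (number of multi-gene families, fraction of genes inside them)."""
    if not gene_to_family:
        return 0, 0.0
    sizes = Counter(gene_to_family.values())
    multi = {fam for fam, size in sizes.items() if size >= 2}
    covered = sum(size for fam, size in sizes.items() if fam in multi)
    return len(multi), covered / len(gene_to_family)


# Hypothetical toy assignment: singletons get their own family ID
toy = {"g1": "F1", "g2": "F1", "g3": "F1", "g4": "F2", "g5": "F2",
       "g6": "F2", "g7": "F2", "g8": "S1", "g9": "S2", "g10": "S3"}
n_multi, coverage = family_summary(toy)
print(n_multi, coverage)  # 2 0.7
```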
20. Integrative Orthology Viewer
• Tree-based orthologs (TROG) inferred using tree reconciliation
• Orthologous gene families (ORTHO) inferred using OrthoMCL
• Anchor points: homologous gene pairs located in colinear regions between species
• Best-hit families (BHIF) inferred from BLAST hits, including inparalogs
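The viewer integrates four independent orthology predictions, so a gene pair found by several methods is better supported. A minimal sketch of counting that support, assuming each method's output is available as a set of gene pairs; the method labels follow the slide, while the gene IDs and the two-method threshold are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]


def orthology_support(predictions: Dict[str, Set[Pair]]) -> Dict[Pair, List[str]]:
    """Map each unordered gene pair to the methods that predict it."""
    support: Dict[Pair, List[str]] = defaultdict(list)
    for method in sorted(predictions):
        for a, b in predictions[method]:
            pair = (a, b) if a <= b else (b, a)  # orientation-independent key
            if method not in support[pair]:
                support[pair].append(method)
    return dict(support)


def consensus_orthologs(predictions: Dict[str, Set[Pair]], min_methods: int = 2) -> Set[Pair]:
    """Keep pairs supported by at least min_methods methods."""
    return {pair for pair, methods in orthology_support(predictions).items()
            if len(methods) >= min_methods}


toy_predictions = {
    "TROG": {("geneA1", "geneB1"), ("geneA2", "geneB2")},
    "ORTHO": {("geneB1", "geneA1")},
    "BHIF": {("geneA1", "geneB1"), ("geneA3", "geneB3")},
    "anchor": set(),
}
support = orthology_support(toy_predictions)
consensus = consensus_orthologs(toy_predictions)
```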
21. Gene colinearity & genome organization
Gene Homology Matrix (GHM)
i-ADHoRe 3.0
• Represent chromosomes as sorted gene lists
• Identify all homologous gene pairs between chromosomes (all-against-all BLASTP)
• Score pairs of homologues in a matrix
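The three steps for building a Gene Homology Matrix can be sketched as follows: genes become row and column indices, homologous pairs become filled cells, and colinear segments show up as diagonals. This is a simplified illustration, not the i-ADHoRe 3.0 algorithm, which tolerates gaps and evaluates statistical significance; here only uninterrupted diagonals are reported, and `min_length` and the toy gene lists are assumptions.

```python
from typing import Iterable, List, Sequence, Set, Tuple

Cell = Tuple[int, int]


def gene_homology_matrix(genes_x: Sequence[str], genes_y: Sequence[str],
                         homolog_pairs: Iterable[Tuple[str, str]]) -> Set[Cell]:
    """Filled GHM cells (i, j): gene i on chromosome X is homologous to gene j on Y."""
    pos_x = {g: i for i, g in enumerate(genes_x)}
    pos_y = {g: j for j, g in enumerate(genes_y)}
    cells: Set[Cell] = set()
    for a, b in homolog_pairs:
        # BLAST pairs are unordered, so try both orientations
        if a in pos_x and b in pos_y:
            cells.add((pos_x[a], pos_y[b]))
        if b in pos_x and a in pos_y:
            cells.add((pos_x[b], pos_y[a]))
    return cells


def diagonal_runs(cells: Set[Cell], min_length: int = 3) -> List[List[Cell]]:
    """Find uninterrupted diagonals (same or inverted gene order) in the GHM."""
    runs = []
    for i, j in sorted(cells):
        for step in (1, -1):  # +1: same orientation, -1: inverted
            if (i - 1, j - step) in cells:
                continue  # (i, j) is not the first cell of this diagonal
            run = [(i, j)]
            while (run[-1][0] + 1, run[-1][1] + step) in cells:
                run.append((run[-1][0] + 1, run[-1][1] + step))
            if len(run) >= min_length:
                runs.append(run)
    return runs


# Hypothetical chromosomes as sorted gene lists
chrom_x = ["x1", "x2", "x3", "x4", "x5"]
chrom_y = ["y1", "y2", "y3", "y4", "y5"]
pairs = [("x1", "y2"), ("x2", "y3"), ("x3", "y4"), ("x5", "y1")]
ghm = gene_homology_matrix(chrom_x, chrom_y, pairs)
runs = diagonal_runs(ghm)
```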
24. PLAZA Workbench
Create a custom gene set (e.g. from an experiment) using gene identifiers or BLAST
External/internal gene IDs (e.g. AN3, AT5G28640, GRMZM2G180246_T01)
The BLAST interface can be used to map sequence data from a non-model species to a reference species present in PLAZA
A toolbox is available to analyze user-defined gene sets
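Mapping sequences from a non-model species to a reference species usually comes down to keeping the best BLAST hit per query. A minimal sketch, assuming BLAST+ tabular output (`-outfmt 6`, default columns, so e-value is column 11 and bit score column 12); the e-value cutoff, contig names and reference IDs are hypothetical, and this is not a description of how the PLAZA Workbench performs the mapping internally.

```python
from typing import Dict, Iterable, Tuple


def best_hits(blast_tab_lines: Iterable[str],
              max_evalue: float = 1e-5) -> Dict[str, Tuple[str, float]]:
    """Map each query to its highest-scoring subject in BLAST tabular output."""
    best: Dict[str, Tuple[str, float]] = {}
    for line in blast_tab_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        query, subject = cols[0], cols[1]
        evalue, bitscore = float(cols[10]), float(cols[11])  # default column order
        if evalue > max_evalue:
            continue  # too weak to count as a mapping
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


# Hypothetical hits of non-model contigs against reference proteins
toy_lines = [
    "contig_1\trefA\t80.0\t200\t40\t0\t1\t200\t1\t200\t1e-80\t300.0",
    "contig_1\trefB\t60.0\t180\t72\t0\t1\t180\t5\t184\t1e-40\t150.0",
    "contig_2\trefC\t30.0\t50\t35\t0\t1\t50\t1\t50\t0.01\t35.0",
]
mapping = best_hits(toy_lines)
```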
[Figure: gene sets from microarray transcript profiling, EST / RNA-sequencing or genes reported in supplementary data are loaded into the PLAZA Workbench, which offers WGMapping, functional annotations, gene families, GO enrichment, tandem/block duplicates, sequence retrieval, iOrthologs and data export]
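GO enrichment asks whether a GO term occurs more often in a user gene set than expected from the background. A common choice is the one-sided hypergeometric test, sketched below; the slide does not state which statistic PLAZA uses, so treat this as an assumption. No multiple-testing correction is applied, and the term names and genes are hypothetical.

```python
from math import comb
from typing import Dict, Set, Tuple


def hypergeom_sf(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) when drawing n genes from N, of which K carry the term."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def go_enrichment(gene_set: Set[str], annotations: Dict[str, Set[str]],
                  background: Set[str]) -> Dict[str, Tuple[int, float]]:
    """Return term -> (genes in set with term, uncorrected p-value)."""
    genes = gene_set & background
    N, n = len(background), len(genes)
    term_bg: Dict[str, int] = {}
    term_set: Dict[str, int] = {}
    for gene in background:
        for term in annotations.get(gene, set()):
            term_bg[term] = term_bg.get(term, 0) + 1
            if gene in genes:
                term_set[term] = term_set.get(term, 0) + 1
    return {term: (k, hypergeom_sf(k, n, term_bg[term], N))
            for term, k in term_set.items()}


# Hypothetical background of 10 genes; term_A on g1..g5, term_B on g6..g10
background = {f"g{i}" for i in range(1, 11)}
annotations = {g: {"term_A"} for g in ["g1", "g2", "g3", "g4", "g5"]}
annotations.update({g: {"term_B"} for g in ["g6", "g7", "g8", "g9", "g10"]})
result = go_enrichment({"g1", "g2", "g3"}, annotations, background)
```

With all three selected genes carrying `term_A`, the p-value is C(5,3)/C(10,3) = 10/120.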
27. What’s new in PLAZA 3.0?
New genomes
Dicots (13)
• Gossypium raimondii (cotton), Eucalyptus grandis (eucalyptus), Solanum lycopersicum (tomato), Solanum tuberosum (potato), Beta vulgaris (sugar beet), Prunus persica (peach), Citrus sinensis (sweet orange), Cucumis melo (melon), Citrullus lanatus (watermelon)
• Capsella rubella, Brassica rapa and Thellungiella parvula
• Amborella trichopoda
Monocots (3)
• Musa acuminata (banana), Setaria italica (foxtail millet) and Hordeum vulgare (barley)
28. What’s new in PLAZA 3.0?
Gene function information
Free-text gene descriptions
• Primary data provider + UniProt
• AnnoMine* text-mining
Protein domains
• InterPro
Structured functional annotations
• Gene Ontology
• MapMan
• PlnTFDB and PlantTFDB
* Sofie Van Landeghem
29. Extended GO projection
Orthology-based
Homology-based
Transfer of experimentally confirmed GO information to orthologs and homologs
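The projection idea can be sketched as: keep only experimentally supported GO terms, transfer them to orthologs first, and fall back to homologs for terms not already transferred by orthology. The evidence-code filter (the classic GO experimental codes EXP, IDA, IPI, IMP, IGI, IEP), the precedence rule and all gene and term names are assumptions for illustration; the slide does not spell out PLAZA's exact rules.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Classic GO experimental evidence codes (assumed filter; PLAZA's rules may differ)
EXPERIMENTAL_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


def project_go(annotations: Dict[str, Set[Tuple[str, str]]],
               orthologs: Dict[str, Iterable[str]],
               homologs: Dict[str, Iterable[str]]) -> Dict[str, Dict[str, Set[str]]]:
    """Transfer experimental terms; orthology takes precedence over homology."""
    by_orthology: Dict[str, Set[str]] = defaultdict(set)
    by_homology: Dict[str, Set[str]] = defaultdict(set)
    for gene, terms in annotations.items():
        experimental = {term for term, code in terms if code in EXPERIMENTAL_CODES}
        if not experimental:
            continue
        for target in orthologs.get(gene, ()):
            by_orthology[target] |= experimental
        for target in homologs.get(gene, ()):
            by_homology[target] |= experimental

    projected = {}
    for target in set(by_orthology) | set(by_homology):
        own = {term for term, _ in annotations.get(target, set())}  # not re-projected
        via_ortho = by_orthology.get(target, set()) - own
        via_homo = by_homology.get(target, set()) - own - via_ortho
        if via_ortho or via_homo:
            projected[target] = {"orthology": via_ortho, "homology": via_homo}
    return projected


toy_annotations = {
    "At_g1": {("term_X", "IDA"), ("term_Y", "IEA")},  # IEA is electronic: not transferred
    "At_g2": {("term_Z", "IMP")},
}
toy_orthologs = {"At_g1": ["Sl_g1"]}
toy_homologs = {"At_g1": ["Sl_g1", "Sl_g2"], "At_g2": ["Sl_g1"]}
result = project_go(toy_annotations, toy_orthologs, toy_homologs)
```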
30. Coverage of gene function information
Gene Ontology (Biological Process)
Gene descriptions
blue = primary GO; green = GO projection (orthology + homology)
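The coverage chart compares two percentages per species: genes with at least one primary GO annotation, and genes annotated after adding the projected terms. A minimal sketch of that arithmetic, with hypothetical genes and terms:

```python
from typing import Dict, Iterable, Set, Tuple


def annotation_coverage(genes: Iterable[str],
                        primary: Dict[str, Set[str]],
                        projected: Dict[str, Set[str]]) -> Tuple[float, float]:
    """Fraction of genes with >=1 primary term, and with >=1 primary or projected term."""
    gene_list = list(genes)
    if not gene_list:
        return 0.0, 0.0
    with_primary = sum(1 for g in gene_list if primary.get(g))
    with_any = sum(1 for g in gene_list if primary.get(g) or projected.get(g))
    return with_primary / len(gene_list), with_any / len(gene_list)


toy_genes = ["g1", "g2", "g3", "g4"]
toy_primary = {"g1": {"term_A"}, "g2": set()}
toy_projected = {"g2": {"term_A"}, "g3": {"term_B"}}
primary_cov, total_cov = annotation_coverage(toy_genes, toy_primary, toy_projected)
print(primary_cov, total_cov)  # 0.25 0.75
```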
31. Conclusions
PLAZA 3.0 provides a versatile toolbox for plant genomics
Integration of complementary data sources describing gene functions
Improved algorithms to transfer functional annotation from well-characterized plant genomes to other species
Technical improvements
database design
comparative genomics tools
speed
visualizations
32. Acknowledgments
• Plant comparative genomics
Sebastian Proost
Michiel Van Bel
Dries Vaneechoutte
Yves Van de Peer
Dirk Inzé
plaza_genomics
http://bioinformatics.psb.ugent.be/plaza/