Presentation of Eugeni Belda (LABGeM-Genoscope) at the Biocuration 2012 conference (Georgetown University, Washington DC): From bacterial genome annotation to metabolic pathway curation
Biocuration2012 Eugeni Belda
1. Eugenio Belda
Laboratory of Bioinformatic Analysis in Genomic and Metabolism (LABGeM team)
CEA/DSV/IG/Genoscope & CNRS UMR8030
2. Introduction
Advances in sequencing technologies have allowed an exponential accumulation
of complete genome sequences in public databases in recent years.
However, a wide gap exists between rapid advances in genome sequencing and
slow progress in the characterization of new protein functions:
4712 enzymatic activities (EC number): 25% of orphan reactions
12273 protein families (Pfam): 26% of unknown functions
Genoscope (French National Sequencing Center) has
as one fundamental research objective the extension of in
silico sequence annotations with experimental
characterization of new enzymatic functions (Metabolic
Genomics).
Lab. of Genomics & Biochemistry of Metabolism (LGBM)
Lab. of Organic Chemistry and Biocatalysis (LCOB)
Lab. for enzymatic cloning and screening (LCAB)
Lab. of Bioinformatic Analysis in Genomic and Metabolism (LABGeM)
3. Three MicroScope components
Process Management
Primary Databank Update; Syntactic Annotations; Functional / relational Analyses
> 25 methods integrated in a workflow (JBPM job management system)
Database Release History
=> full automatisation:
• genome annotation
• primary data up-to-date
Data Management
PkGDB; MicroCyc
Primary Databanks; Internal Genomic Objects; Computational results; Pathway Genome DataBases
MaGe Web Interface
Login; Tutorial; Keyword search; Blast and Pattern; Phylogenetic Profile; Fusion / Fission; Tandem duplications; Minimal Gene Set; RGPfinder; SNPs / InDels; Data Export
Visualization: Genome overview; Genome browser; Synteny maps; Artemis; KEGG; MicroCyc; CGView; LinePlot
Synton display; Gene editor; Gene card; Metabolic Profile; Pathway / Synteny
Vallenet D. et al. «MicroScope - a platform for microbial genome annotation and comparative genomics» Database 2009
Vallenet D, et al. «MaGe - a microbial genome annotation system supported by synteny results» Nucleic Acids Research 2006
4. Database Management
Relational DataBase PkGDB
(Prokaryotic Genome DataBase)
EC / reaction
correspondence
• Experimentally elucidated metabolic pathways
• 1800 pathways from 2216 organisms (P. Karp, SRI, USA)
Pathway Tools
A metabolic database is built for each annotated microbial genome
PGDB = Pathway/Genome Database (orgname_Cyc)
http://www.genoscope.cns.fr/agc/microcyc
Today: 1233 organisms (of which 676 public genomes)
Mapping on the PkGDB
KEGG metabolic
maps
(http://www.kegg.jp/)
5. MicroScope Web site
More than 30 tools are made available to the community
«guest» access
Since 2005, more than 50,000 expert annotations per year
> 1,000 users, 300 active
www.genoscope.cns.fr/agc/microscope
6. Curation of metabolic data in Microscope
CanOE (Candidate genes for Orphan Enzymes): Method for the automatic integration
of genomic and metabolic contexts that assists expert functional annotation, especially
in the case of orphan enzymes. Based on the concept of the metabolon (“close” genes in
the genome sequence associated with “close” metabolic reactions):
Boyer et al.; Bioinformatics 2005; Dec 1;21(23):4209-15.
Diagram: genes on the genome (with gene gaps) are linked through functional annotations to reactions and compounds in the metabolic network (with reaction gaps and orphan reactions)
The method provides candidate genes for global/local orphan enzymatic activities
that are located in the “gaps” of metabolons
https://www.genoscope.cns.fr/agc/microscope/metabolism/canoe.php
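The gap idea above can be illustrated with a minimal sketch. This is not the published CanOE algorithm: it assumes a linear gene order and a linear reaction chain, and the function name, gene names and reaction names are all hypothetical.

```python
def find_gap_candidates(gene_order, gene_to_reaction, reaction_chain):
    """Toy illustration: pair "gene gaps" with "reaction gaps".

    gene_order: genes in chromosomal order.
    gene_to_reaction: known gene -> reaction annotations.
    reaction_chain: reactions in pathway order (a linear chain is assumed).
    """
    annotated_reactions = set(gene_to_reaction.values())
    chain = set(reaction_chain)

    # Reaction gap: no gene assigned, but both neighbouring reactions have one.
    reaction_gaps = [
        rxn
        for i, rxn in enumerate(reaction_chain)
        if rxn not in annotated_reactions
        and 0 < i < len(reaction_chain) - 1
        and reaction_chain[i - 1] in annotated_reactions
        and reaction_chain[i + 1] in annotated_reactions
    ]

    def in_metabolon(gene):
        # True when the gene is annotated with a reaction of this chain.
        return gene_to_reaction.get(gene) in chain

    # Gene gap: unannotated gene flanked by two genes of the metabolon.
    gene_gaps = [
        gene
        for i, gene in enumerate(gene_order)
        if gene not in gene_to_reaction
        and 0 < i < len(gene_order) - 1
        and in_metabolon(gene_order[i - 1])
        and in_metabolon(gene_order[i + 1])
    ]

    # Every gene gap is a candidate for every reaction gap.
    return [(gene, rxn) for gene in gene_gaps for rxn in reaction_gaps]


candidates = find_gap_candidates(
    gene_order=["g1", "g2", "g3", "g4"],
    gene_to_reaction={"g1": "R1", "g3": "R3"},
    reaction_chain=["R1", "R2", "R3"],
)
```

In this toy case the unannotated gene g2 sits between g1 and g3, and the reaction R2 sits between R1 and R3, so g2 is proposed as a candidate for R2.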
7. Curation of metabolic data in Microscope
CanOE (Candidate genes for Orphan Enzymes)
Example: Allantoin degradation metabolon in E. coli K12
2.1.3.5 is a global orphan reaction (not associated with any gene in any
organism)
Three candidate genes for the EC:2.1.3.5 reaction
None shares any significant similarity with known carbamoyltransferases
Protein expression and biochemical assays under way
Smith AAT, Belda E., Viari A., Médigue C., and Vallenet D. “The CanOE strategy: integrating genomic and metabolic contexts across multiple
prokaryote genomes to find candidate genes for orphan enzymes” (PLoS Computational Biology, in revision)
8. Curation of metabolic data in Microscope
GPR curation interface: In the context of network reconstruction, the
definition of Gene-Protein-Reaction associations (genes encoding the
enzymes/complexes/isozymes that catalyze a particular metabolic reaction) is essential:
Thiele & Palsson; Nat Protoc. 2010;5(1):93-121
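One common way to encode a Gene-Protein-Reaction association is as a list of alternatives (isozymes), each alternative being the set of genes needed for one enzyme or complex. The sketch below assumes that encoding; it is not the Microscope data model, and the gene names are hypothetical.

```python
def reaction_is_supported(gpr, present_genes):
    """Return True if at least one isozyme alternative is fully encoded.

    gpr: list of alternatives (isozymes, read as OR); each alternative is a
         set of genes whose products form one enzyme or complex (read as AND).
    present_genes: genes found in the genome of interest.
    """
    present = set(present_genes)
    return any(set(alternative) <= present for alternative in gpr)


# (geneA AND geneB) OR geneC: a two-subunit complex with a single-gene isozyme.
gpr = [{"geneA", "geneB"}, {"geneC"}]
```

With this rule, a genome holding only geneA does not support the reaction, whereas a genome holding geneC alone, or both geneA and geneB, does.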
9. Curation of metabolic data in Microscope
GPR curation interface: The gene curation interface of Microscope allows the
validation of Gene-Reaction associations based on curated gene annotations. Two
reference reaction resources are available: MetaCyc (functional) and RHEA (under
development).
Screenshot annotations: automatic retrieval of MetaCyc/Rhea reactions based on EC number (example: 4.1.3.27, 2.4.2.18); keyword search
10. Curation of metabolic data in Microscope
Pathway validation interface: Validation/curation of automatically projected MetaCyc
pathways based on Gene-Reaction associations:
11. Microme project: www.microme.eu
A Knowledge-Based Bioinformatics Framework
for Microbial Pathway Genomics
Purpose: develop bioinformatics infrastructures,
together with a projection and curation process, in
order to generate:
- complete metabolic pathways from genome annotations
- whole-cell metabolic models from pathway assemblies
Experimental validation of metabolic models
using growth phenotype data (i.e., BIOLOG
experiments) generated within the project for a
subset of selected species.
Analytical tools are integrated for comparative
and phylogenetic analysis based on projected
pathways and metabolic models
Partners: AMAbiotics; Centro Nacional de Biotecnología; CEA-Genoscope; European Bioinformatics Institute; Center for Research and Technology Hellas; German Collection of Microorganisms and Cell Cultures; ISTHMUS; Spanish National Cancer Centre; Molecular Networks; Tel-Aviv University; Université Libre de Bruxelles; Swiss Institute of Bioinformatics; Wageningen University; Wellcome Trust Sanger Institute
12. Microme WP2: Objectives
Provide EU with a curated microbial metabolic resource
Implement a unique cyclic and collaborative curation process for metabolic data
Unification of existing metabolic resources:
Pivot resources: ChEBI (chemical compounds) and Rhea (chemical reactions)
Cross-references to external resources (compounds, reactions, pathways):
KEGG, MetaCyc, metabolic models
Alcantara R., Axelsen K.B., Morgat A., Belda E., Coudert E., Bridge A., Cao H., de Matos P., Ennis M., Turner S., Owen G., Bougueleret L., Xenarios I., and Steinbeck C. (2012) Rhea - a manually curated resource of biochemical reactions. Nucleic Acids Research. 40, D754-D760, Database issue.
MicroScope and Microme
Use MicroScope as reference resource of curated GPR (Gene Protein Reaction)
associations for microbial genomes included in Microme project
Development of novel interfaces for GPR curation in Microscope environment. Retrieval
of METACYC and RHEA reactions for a particular gene object from EC number annotations
13. MicroScope and Microme
Development of web-services to provide Microme partners with curated Gene-
Reaction associations from Microscope platform
Diagram labels: Curation tool; PkGDB; Reconstruction (each night); microcyc; Web-services
14. Test-case: Bacillus subtilis 168 re-annotation
Second most intensively studied bacterium after Escherichia coli, and a model
organism for Gram-positive bacteria
Genome sequenced in 1997: 4.214 megabases, 4000 CDSs
Nature 1997 Nov 20;390(6657):249-56
Re-sequencing and first re-annotation of the genome in 2009
Microbiology (2009), 155, 1758-1775
Re-annotation of the genome in the context of the Microme project, with special
focus on the curation of Gene-Reaction associations using the Microscope metabolic
tools and curation interface. Collaborative work: LABGeM (CEA), SIB, AMAbiotics
(Antoine Danchin)
15. Test-case: Bacillus subtilis 168 re-annotation
Starting data for curation of Gene-Reaction associations
Pie chart of CDS categories:
Predicted MetaCyc reaction (909 CDSs):
- BBH relationship with E. coli CDSs: 531 CDSs
- No BBH relationship with E. coli CDSs: 378 CDSs
No predicted MetaCyc reaction (two slices, 310 CDSs and 508 CDSs):
- "Enzymes" in Product type annotation
- "Putative enzymes" in Product type annotation
16. Test-case: Bacillus subtilis 168 re-annotation
From the 909 CDS with predicted reaction
531 with BBH in E. coli:
- 416 with same GPR in B. subtilis and E. coli (EcoCyc): automatic validation of Gene-Reaction associations
- 115 CDS with different GPR in B. subtilis and E. coli (EcoCyc)
378 without BBH in E. coli:
- 254 with GPR predicted from the curated EC number
- 124 with GPR predicted from “product” annotation
310 CDS with “enzyme” annotation and without predicted reaction
508 CDS with “enzyme” annotation and without predicted reaction: Filter by
Catalytic activity field in SwissProt annotations (41 CDSs)
Manual curation of Gene-Reaction associations in Microscope environment:
- Sequence similarity profiles
- Genomic context conservation
- Integration of genomic and metabolic context (CanOE strategy)
- Co-evolution patterns of functionally related genes
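The split of CDSs between automatic validation and manual curation can be read as a small decision rule. The sketch below is one possible reading, not the Microscope implementation: it assumes that only CDSs with an E. coli BBH and an identical GPR in EcoCyc are validated automatically, and the class and field names are hypothetical. The counts are those given on the slide.

```python
from dataclasses import dataclass


@dataclass
class Cds:
    has_predicted_reaction: bool
    has_ecoli_bbh: bool = False
    same_gpr_as_ecocyc: bool = False


def triage(cds):
    """Assumed rule: automatic validation needs a BBH with an identical GPR."""
    if cds.has_predicted_reaction and cds.has_ecoli_bbh and cds.same_gpr_as_ecocyc:
        return "automatic validation"
    return "manual curation"


# Counts as given on the slide.
bbh_same_gpr, bbh_different_gpr = 416, 115
no_bbh_from_ec, no_bbh_from_product = 254, 124

with_bbh = bbh_same_gpr + bbh_different_gpr
without_bbh = no_bbh_from_ec + no_bbh_from_product
with_predicted_reaction = with_bbh + without_bbh
```

The sums reproduce the totals quoted above: 531 CDSs with a BBH, 378 without, and 909 with a predicted reaction.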
17. Test-case: Bacillus subtilis 168 re-annotation
Problems associated with automatic predictions of Gene-Reaction
associations. Example: generic EC number definition associated with
multiple specific reaction instances in MetaCyc
17 predicted reactions based on EC:1.2.1.3 annotation. Problems in terms of
modelling purposes
No experimental evidence of activity; generic product annotation
Without experimental evidence of specific substrates, only the generic
reaction has been validated
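The curation choice described here (keep specific reaction instances only when a substrate is experimentally supported, otherwise keep the generic reaction) can be sketched as follows. The reaction identifiers and substrate names are placeholders, not MetaCyc identifiers, and the function is hypothetical.

```python
def select_reactions(candidates, evidenced_substrates):
    """Choose which reactions predicted from one EC number to validate.

    candidates: list of (reaction_id, substrate) pairs; substrate is None
                for the generic reaction of the EC class.
    evidenced_substrates: substrates with experimental evidence for this gene.
    """
    specific = [
        reaction_id
        for reaction_id, substrate in candidates
        if substrate is not None and substrate in evidenced_substrates
    ]
    if specific:
        return specific
    # No substrate-level evidence: fall back to the generic reaction only.
    return [reaction_id for reaction_id, substrate in candidates if substrate is None]


# Placeholder candidates for a generic aldehyde dehydrogenase annotation.
candidates_1213 = [
    ("GENERIC-ALDEHYDE-DH", None),
    ("SPECIFIC-RXN-1", "acetaldehyde"),
    ("SPECIFIC-RXN-2", "propanal"),
]
```

Called with an empty evidence set, the sketch returns only the generic reaction, which mirrors the decision on the slide; with evidence for one substrate it returns the matching specific reaction.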
18. Test-case: Bacillus subtilis 168 re-annotation
Stats of curation of Gene-Reaction associations in Microscope
Bar chart (axis 0 to 2000): initial Gene-Reaction predictions (Pathway Tools) vs. current Gene-Reaction associations (manually curated in parentheses)
Nº reactions: 1022 initial; 985 (388) current
Nº CDS: 901 initial; 1006 (517) current
Nº Gene-Reaction associations: 1549 initial; 1406 (715) current
105 CDS without automatically predicted reaction in initial projections
147 new reactions added (not originally predicted)
184 originally predicted reactions removed
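The figures on this slide are mutually consistent if the larger reaction count and the smaller CDS count are read as the initial Pathway Tools predictions. A short check, under that assumed reading:

```python
# Reactions: initial predictions, minus removed, plus newly added.
initial_reactions = 1022
removed_reactions = 184
added_reactions = 147
current_reactions = initial_reactions - removed_reactions + added_reactions

# CDSs: initial predictions plus CDSs that had no predicted reaction at first.
initial_cds = 901
cds_without_initial_prediction = 105
current_cds = initial_cds + cds_without_initial_prediction
```

Both results match the current values reported on the slide (985 reactions and 1006 CDSs).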
19. Test-case: Bacillus subtilis 168 re-annotation
17 possible updates of SwissProt annotations and 6 possible new EC numbers: reported to SwissProt/IUBMB curators
13 possible new metabolic pathways/pathway variants not present in MetaCyc
New pathway variants:
- Biotin biosynthesis pathway variant
- Lipoate biosynthesis pathway variant
- Myoinositol catabolism pathway variant
- Rhamnogalacturonan type I degradation pathway variant
- Acetoin dehydrogenase pathway variant
- Methionine salvage pathway variant
- Aerobic respiration pathway variants
New metabolic pathways:
- Aromatic polyketide biosynthesis pathway
- 2-methylthio-N6-threocarbamoyladenosine biosynthesis
- Bacilysocin biosynthesis
- Archaeal-type ether lipid biosynthesis
- Bacillaene biosynthesis pathway
- Methionine-Cysteine interconversion
20. Test-case: Bacillus subtilis 168 re-annotation
Biotin biosynthesis pathway variant: Update of DAP aminotransferase pathway variant
(EC:2.6.1.62)
KEGG pathway (map00780) and MetaCyc pathway (PWY-5005): S-adenosyl-L-methionine as amino group donor
Bacillus subtilis BioA enzyme: L-lysine instead of S-adenosyl-L-methionine as amino group donor
21. Test-case: Bacillus subtilis 168 re-annotation
Biotin biosynthesis pathway variant: Link with fatty acid metabolism. Improvement of
genome-scale metabolic models
iBsu1103: Most up-to-date B. subtilis 168 metabolic model (SEED
methodology; 1437 reactions, 1103 genes). Henry CS, Zinner JF, Cohoon MP, Stevens RL.
Genome Biol. 2009;10(6):R69
Diagram labels: dead-end metabolite; EX_pimelate; auxotrophic for biotin biosynthesis; EX_biotin; not included in biomass equation
Bar chart: FBA simulations with the iBsu1103 model, biomass production rate (axis 0.00 to 140.00). Conditions: iBsu1103; iBsu1103 with biotin in biomass; iBsu1103 with external influx of pimelate; iBsu1103 with external influx of biotin. Bar values shown: 122.97 (three bars) and 0.00 (one bar)
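A dead-end metabolite, as flagged in this slide, is one that the network can consume but never produce (or produce but never consume). A minimal detection sketch follows; it treats every reaction as irreversible and uses a toy three-reaction network rather than iBsu1103, so the reaction names are illustrative only.

```python
def dead_end_metabolites(reactions):
    """Find metabolites that are only consumed or only produced.

    reactions: dict mapping reaction id -> (substrates, products);
               all reactions are treated as irreversible in this sketch.
    """
    consumed, produced = set(), set()
    for substrates, products in reactions.values():
        consumed.update(substrates)
        produced.update(products)
    return {
        "never_produced": consumed - produced,
        "never_consumed": produced - consumed,
    }


# Toy network: pimelate feeds biotin synthesis but nothing supplies it.
toy_network = {
    "biotin_synthesis": (["pimelate"], ["biotin"]),
    "biomass": (["biotin"], []),
}
gaps_before = dead_end_metabolites(toy_network)

# Adding an exchange (influx) reaction for pimelate closes the gap.
toy_network_with_influx = dict(toy_network, EX_pimelate=([], ["pimelate"]))
gaps_after = dead_end_metabolites(toy_network_with_influx)
```

In the toy network pimelate is reported as never produced until the exchange reaction is added, which is the same kind of gap that an external influx or a newly curated producing reaction would close in a real model.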
22. Test-case: Bacillus subtilis 168 re-annotation
BioI enzyme of B. subtilis 168: cytochrome
P450 protein that catalyzes the oxidative
cleavage of acyl-ACP/free fatty acid molecules
generated in the context of fatty acid
biosynthesis yielding pimeloyl-ACP as primary
product.
Diagram: fatty acids metabolism provides an acyl-ACP or a fatty acid; BioI (BSU30190) yields pimeloyl-ACP; BioF (BSU30220) step with L-alanine + H+ and CO2 + holo-ACP
23. Future work
Extension of the reference set of Microme species to:
Acinetobacter sp. ADP1
Pseudomonas putida KT2440
Bacillus subtilis 168
Second version of Gene-Reaction curation interface in Microscope
environment:
Curation of protein complexes / Isozyme sets
Management of Rhea reactions in addition to MetaCyc reactions
Definition of strategies for vertical annotation and propagation of curated
GPR across multiple microbial genomes
Use UniPathway as reference resource of metabolic pathways in Microscope;
Species-specific pathway representations based on Pathway modules
combination (http://www.unipathway.org)
24. Contributions
Claudine Médigue (Group Leader)
David Vallenet (Researcher)
Damien Monrico (Engineer)
François Lefèvre (Engineer)
Alexander T. Smith (PhD)
Eugeni Belda (Post doc)
IT team Claude Scarpelli
Ludovic Fleury
External partners
Anne Morgat Antoine Danchin
Funding
EU Framework Programme 7 Collaborative
Project. Grant Agreement Number 222886-2