Integrated approaches for the discovery of novel enzymatic activities
Guillaume REBOUL, Maria SOROKINA, Jonathan MERCIER, Karine BASTARD, Mark STAM, David VALLENET, Claudine MEDIGUE – CEA, Genoscope, LABGeM
[Figure: Orphan and non-orphan EC number distribution across superkingdoms]
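An EC number is an orphan when no protein sequence is known to carry that activity (Sorokina et al., 2014). The short Python sketch below shows one way to tally an orphan/non-orphan split per superkingdom, like the one in the figure "Orphan and non-orphan EC number distribution across superkingdoms". It is only an illustration and rests on several assumptions:

* Both inputs are hypothetical: the EC numbers reported per superkingdom, and (EC number, superkingdom) pairs for annotated sequences.
* The example data are made up.
* An EC number counts as orphan in a superkingdom when no sequence from that superkingdom carries it. This may differ from the definition used for the figure.

```python
from collections import defaultdict


def orphan_distribution(reported, annotated):
    """Count orphan and non-orphan EC numbers per superkingdom.

    reported:  dict superkingdom -> set of EC numbers whose activity has been
               reported there (hypothetical input).
    annotated: iterable of (ec_number, superkingdom) pairs, one per protein
               sequence annotated with that EC number (hypothetical input).
    Assumed definition: an EC number is orphan in a superkingdom when no
    sequence from that superkingdom carries it.
    """
    with_sequence = defaultdict(set)
    for ec, kingdom in annotated:
        with_sequence[kingdom].add(ec)

    counts = {}
    for kingdom, ecs in reported.items():
        orphans = ecs - with_sequence[kingdom]
        counts[kingdom] = {"orphan": len(orphans), "non_orphan": len(ecs) - len(orphans)}
    return counts


# Made-up toy data, not real counts.
reported = {
    "Bacteria": {"1.1.1.1", "2.7.1.1", "4.1.1.1"},
    "Archaea": {"1.1.1.1", "4.1.1.1"},
}
annotated = [("1.1.1.1", "Bacteria"), ("2.7.1.1", "Bacteria"), ("1.1.1.1", "Archaea")]
print(orphan_distribution(reported, annotated))
```

The global orphan fraction quoted in the text (~22%) would be the same ratio computed over all EC numbers, regardless of superkingdom.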
[Figure: Enzyme Activity Discovery workflow. Components: annotation rules (functional annotation rules from UniRule: HAMAP + PIRSF + RuleBase), pathway rules (consistency against biological processes), knowledge base, rule engine, biological facts.]
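// Example rules in Drools (DRL) syntax.
// Fact, Organism and Pathway are assumed to be Java classes imported into
// this .drl file (their package is not given here).

// Flag a fact as "missing" when it is required, absent and not avoided.
rule "Missing state"
    no-loop true    // stops the modify below from re-activating this same rule
    when
        $fact: Fact( present == "no", require == "yes", avoid == "no" )
    then
        modify( $fact ) { setState( "missing" ) }
end

// Every pathway attached to an organism becomes a required fact.
rule "require Pathway"
    when
        $org: Organism()
        $path: Pathway( org == $org )
    then
        Fact fact = new Fact( $org, $path );
        fact.setRequire( "yes" );
        insert( fact );
end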
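The Python sketch below replays the two rules by hand to show how they interact. "require Pathway" creates one required fact per organism/pathway pair. "Missing state" then flags every required fact that is absent and not avoided.

This is not the Drools engine or GROOLS. The classes, the naive fixed-point loop and the `observed` set that fills in `present` are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Organism:
    name: str


@dataclass(frozen=True)
class Pathway:
    name: str
    org: Organism


@dataclass
class Fact:
    org: Organism
    path: Pathway
    present: str = "no"
    require: str = "no"
    avoid: str = "no"
    state: Optional[str] = None


def run_rules(organisms, pathways, observed):
    """Apply both rules repeatedly until no fact changes (naive fixed point).

    observed: set of (organism name, pathway name) pairs assumed present; a
    hypothetical stand-in for whatever other rules set Fact.present.
    """
    facts = {}
    changed = True
    while changed:
        changed = False
        # rule "require Pathway": one required fact per organism/pathway pair
        for org in organisms:
            for path in pathways:
                key = (org.name, path.name)
                if path.org == org and key not in facts:
                    present = "yes" if key in observed else "no"
                    facts[key] = Fact(org, path, present=present, require="yes")
                    changed = True
        # rule "Missing state": required, absent and not avoided -> "missing"
        for fact in facts.values():
            if fact.state != "missing" and (fact.present, fact.require, fact.avoid) == ("no", "yes", "no"):
                fact.state = "missing"
                changed = True
    return facts


ecoli = Organism("E. coli")
glycolysis = Pathway("glycolysis", ecoli)
histidine = Pathway("histidine biosynthesis", ecoli)
facts = run_rules([ecoli], [glycolysis, histidine], observed={("E. coli", "glycolysis")})
for (org_name, path_name), fact in facts.items():
    print(org_name, "/", path_name, "->", fact.state)
```

The guard `fact.state != "missing"` plays the same role as `no-loop` in the DRL version: without it, the loop would never settle.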
The “Novel Enzymatic Activities” group
• Group of the LABGeM: Laboratory of Bioinformatics Analyses for Genomics and Metabolism
• Part of the CEA (French Alternative Energies and Atomic Energy Commission)
• 3 Researchers
• 2 PhD students
• 1 sandwich placement Master Student
• 1 undergraduate student
• 1 post-doctoral placement available
[Figure: Own database, NEADB: a pool of information able to pull up novel enzymatic activities, drawing on protein or domain families, protein annotations and sequences, literature, and enzymatic reactions.]
[Figure: Enzyme activity discovery strategy applied to a Pfam family of unknown function (DUF 846) with experimental evidence and one available structure.
Step 1, Define one reaction: representative enzymes of the family + multiple alignment; one generic reaction, A + B <-> A' + B'.
Step 2, Selection & screening: enzymatic screening + statistical analysis; family partitioning; potential metabolites; 17 substrates; new reactions.
Step 3, Promiscuity & specificity: modeling of compounds in active sites + full family.
Step 4, Metabolic role: biochemical validation + genomic context; new metabolic functions and associated pathways.
Summary: iso-functional groups, multi-functional groups, non-"activity" groups; motifs and key residues for function assignment.]
[Figure: ASMC pipeline. Set of sequences -> BLAST PDB -> homology modeling (MODELLER) -> 3D models -> cavity detection (FPOCKET) -> 3D active sites -> structural alignment (MULTALIGN) -> hierarchical clustering (WEKA). Specificity-determining residues are determined by a log-likelihood analysis.]
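The pipeline states that specificity-determining residues are found by a log-likelihood analysis, but it gives no formula. As an illustration only, the Python sketch below scores each aligned active-site position with the G statistic. This is a likelihood-ratio test of independence between residue identity and cluster membership. ASMC's actual scoring may differ, and the toy alignment, cluster labels and function names are invented for the example.

```python
import math
from collections import Counter


def g_statistic(column, groups):
    """Likelihood-ratio (G) statistic for residue/group association at one position.

    column: residues at one aligned active-site position, one per sequence.
    groups: cluster label of each sequence, in the same order as column.
    G = 2 * sum(O * ln(O / E)), where E is the count expected if residue
    identity were independent of the cluster.
    """
    n = len(column)
    joint = Counter(zip(column, groups))
    residue_totals = Counter(column)
    group_totals = Counter(groups)
    g = 0.0
    for (res, grp), observed in joint.items():
        expected = residue_totals[res] * group_totals[grp] / n
        g += observed * math.log(observed / expected)
    return 2.0 * g


def rank_positions(alignment, groups):
    """Rank alignment columns by G, most cluster-specific first."""
    scores = []
    for i in range(len(alignment[0])):
        column = [seq[i] for seq in alignment]
        scores.append((i, g_statistic(column, groups)))
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Toy active-site "alignment": 4 sequences, 3 positions, two clusters.
alignment = ["HDK", "HDR", "HEK", "HER"]
groups = ["A", "A", "B", "B"]
print(rank_positions(alignment, groups))
```

In this toy case, position 1 ranks first because it carries D in one cluster and E in the other. Two positions score zero:

* position 0, which is fully conserved;
* position 2, which varies within both clusters.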
Next-generation sequencing technology has dramatically increased the number of sequences available in
public databases. At the same time, many enzymatic activities (~22%) are orphans, i.e. no protein sequence
is known to carry them (Sorokina et al., 2014). The large amount of available protein sequences is an
opportunity to discover enzymes associated with new reactions. We present here an integrated bioinformatics
approach to fill this gap in metabolic knowledge and to propose new activity/protein associations for
experimental validation. With this objective, the “Novel Enzymatic Activities” group of the LABGeM team is
developing several methods. The CanOE method combines genomic and metabolic contexts to predict candidate
genes for orphan enzymes (Smith et al., 2012). This approach is currently being extended to detect
conserved chemical transformation motifs in metabolism (Sorokina et al., submitted).
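CanOE combines genomic context and metabolic context to propose candidate genes for orphan enzymes (Smith et al., 2012). Genomic context means genes that are neighbors on the chromosome. Metabolic context means reactions that are neighbors in the metabolic network.

The toy Python sketch below illustrates only that core idea, for a single genome. It looks for genes with no known reaction whose chromosomal neighbors catalyze reactions that share a metabolite with the orphan reaction. The data structures, window size and scoring are assumptions. CanOE itself searches for conserved gene/reaction modules across many genomes, and this is not its algorithm.

```python
def candidate_genes(genome, gene_reactions, reaction_metabolites, orphan, window=2):
    """Rank genes of unknown function as candidates for an orphan reaction.

    genome: gene names in chromosomal order (one replicon, hypothetical input).
    gene_reactions: dict gene -> set of reactions it is annotated to catalyze.
    reaction_metabolites: dict reaction -> set of metabolites it involves.
    orphan: the orphan reaction (in the network, but no gene carries it).
    A gene scores one point per neighbor within `window` positions that
    catalyzes a reaction sharing a metabolite with the orphan reaction.
    """
    orphan_mets = reaction_metabolites[orphan]
    # metabolic context: reactions linked to the orphan through a shared metabolite
    adjacent = {r for r, mets in reaction_metabolites.items()
                if r != orphan and mets & orphan_mets}
    scores = {}
    for i, gene in enumerate(genome):
        if gene_reactions.get(gene):
            continue  # only genes with no annotated reaction are candidates
        # genomic context: genes at most `window` positions away
        neighbors = genome[max(0, i - window):i] + genome[i + 1:i + 1 + window]
        score = sum(1 for nb in neighbors if gene_reactions.get(nb, set()) & adjacent)
        if score:
            scores[gene] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


genome = ["g1", "g2", "g3", "g4", "g5", "g6"]
gene_reactions = {"g1": {"R1"}, "g2": set(), "g3": {"R3"}, "g5": {"R5"}}
reaction_metabolites = {
    "R1": {"A", "B"},        # A -> B
    "R_orphan": {"B", "C"},  # B -> C, no known gene
    "R3": {"C", "D"},        # C -> D
    "R5": {"X", "Y"},        # unrelated reaction
}
print(candidate_genes(genome, gene_reactions, reaction_metabolites, "R_orphan"))
```

In this toy genome, g2 sits between the genes for R1 and R3, the two reactions flanking the orphan one, so it ranks first. A real implementation would at least ignore ubiquitous metabolites such as water or ATP when building the metabolic context. Otherwise almost every reaction becomes adjacent to the orphan one.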
From a structural point of view, the ASMC (Active Site Modeling and Clustering) method
finds and compares active-site pockets to classify the enzymes of a family and detects
residues that are important for substrate specificity (de Melo-Minardi et al., 2010).
These methods were successfully applied to elucidate the enzymatic diversity
of a protein family of unknown function (Bastard et al., 2014). Their
results, combined with existing knowledge, must be unified in a
database that supports strategies for selecting
enzyme families of interest.
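ASMC groups the members of a family by the similarity of their active sites. In the pipeline, this step is a hierarchical clustering run with WEKA. The Python sketch below shows the same idea with a plain average-linkage agglomerative clustering.

It makes two simplifying assumptions:

* each active site is reduced to a string of aligned residues;
* the distance between two sites is the fraction of mismatched positions.

The actual method works on structurally aligned 3D pockets, and the enzyme names and residues are made up.

```python
from itertools import combinations


def mismatch_distance(a, b):
    """Fraction of aligned active-site positions whose residues differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def average_linkage(sites, n_clusters):
    """Agglomerative clustering with average linkage, stopped at n_clusters.

    sites: dict enzyme name -> aligned active-site residues (a toy stand-in
    for structurally aligned 3D active sites).
    """
    clusters = [[name] for name in sites]

    def linkage(c1, c2):
        # mean pairwise distance between members of the two clusters
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(mismatch_distance(sites[a], sites[b]) for a, b in pairs) / len(pairs)

    while len(clusters) > n_clusters:
        # merge the closest pair of clusters (i < j, so deleting j keeps i valid)
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


sites = {"enz1": "HDKY", "enz2": "HDKF", "enz3": "QERY", "enz4": "QERW"}
print(average_linkage(sites, 2))
```

Average linkage is only one choice. Single or complete linkage can produce different groupings on real data.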
This work is supported by genomic and metabolic network data from
MicroScope, a platform for microbial genome analyses
(Vallenet et al., 2013).
Exploration of archaeal enzyme activities:
ARCHAEOACTOME research project
Literature references
Bastard, K. et al. Revealing the hidden functional diversity of an enzyme family. Nat. Chem. Biol. 10, 42–9 (2014).
de Melo-Minardi, R. C., Bastard, K. & Artiguenave, F. Identification of subfamily-specific sites based on active sites modeling and clustering. Bioinformatics 26, 3075–82 (2010).
Smith, A. A. T., Belda, E., Viari, A., Médigue, C. & Vallenet, D. The CanOE strategy: Integrating genomic and metabolic contexts across multiple prokaryote genomes to find candidate genes for orphan enzymes. PLoS Comput. Biol. 8, (2012).
Sorokina, M., Stam, M., Médigue, C., Lespinet, O. & Vallenet, D. Profiling the orphan enzymes. Biol. Direct 9, 10 (2014).
Vallenet, D. et al. MicroScope--an integrated microbial resource for the curation and comparative analysis of genomic and metabolic data. Nucleic Acids Res. 41, D636–47 (2013).
Sorokina, M. et al. A novel metabolic network representation for the discovery of conserved modules of chemical transformations. Submitted.
Mercier, J. & Vallenet, D. GROOLS: Reactive Graph Reasoning for Genome Annotation. RuleML 2015 Conference.
[Figure: Active Sites Classification (ASMC). ASMC method: classification of a family into groups of similar active sites.]
[Figure: The CanOE strategy.]
[Figure: Reaction Molecular Signature (RMS) Network. Reactions sharing the same RMS; reduction of the reaction network into an RMS network.]
[Figure: The dynamics of enzyme discovery.]
[Figure: MicroScope, from genomes to biological systems. Microbial genome analysis (>3,900 genomes, 1-10 Mb) and metabolic network.]
[Figure: NEA team workbench. Data integration, orphan enzymes, GROOLS, structural analysis, metabolic network, MicroScope.]
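The RMS panel reduces the reaction network by merging reactions that share the same Reaction Molecular Signature. The Python sketch below shows only this graph-reduction step. Each RMS becomes one node, and two RMS nodes are connected if any of their reactions are.

It rests on the following assumptions:

* Each reaction has already been assigned a signature. Computing an RMS from chemical structures is not shown, and the signature values are placeholders.
* Two reactions are linked when they share a metabolite.

```python
from collections import defaultdict
from itertools import combinations


def reduce_to_rms_network(reaction_rms, reaction_metabolites):
    """Collapse a reaction network into an RMS network.

    reaction_rms: dict reaction -> its signature (hypothetical precomputed labels).
    reaction_metabolites: dict reaction -> set of metabolites.
    Returns (groups, edges): the reactions grouped per RMS, and the pairs of
    RMS nodes linked because a reaction of one shares a metabolite with a
    reaction of the other.
    """
    groups = defaultdict(set)
    for reaction, rms in reaction_rms.items():
        groups[rms].add(reaction)

    edges = set()
    for r1, r2 in combinations(sorted(reaction_rms), 2):
        if reaction_metabolites[r1] & reaction_metabolites[r2]:
            s1, s2 = reaction_rms[r1], reaction_rms[r2]
            if s1 != s2:  # reactions inside one RMS node give no edge
                edges.add(tuple(sorted((s1, s2))))
    return dict(groups), edges


reaction_rms = {"R1": "rms_a", "R2": "rms_a", "R3": "rms_b", "R4": "rms_c"}
reaction_metabolites = {
    "R1": {"M1", "M2"},
    "R2": {"M3", "M4"},
    "R3": {"M2", "M5"},
    "R4": {"M4", "M6"},
}
groups, edges = reduce_to_rms_network(reaction_rms, reaction_metabolites)
print(groups)
print(sorted(edges))
```

In this example, R1 and R2 share rms_a and collapse into one node. The four-reaction network therefore becomes a three-node RMS network with two edges.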