Pipeline for automated structure-based classification in the ChEBI ontology
Janna Hastings
Coordinator, Cheminformatics and Metabolism
www.ebi.ac.uk/chebi
ACS Symposium on Chemical Ontologies, Taxonomies and Schemas. Dallas, 16 March 2014
Chemical Entities of Biological Interest
• Freely available online, available for download in full
• Low molecular weight, i.e. no proteins
• Definitions, relationships, hierarchy
• E.g. metabolites, drugs, pesticides
• 38,215 entries in the last release
What does ChEBI provide?
• Chemical structures and visualisations
• Names and synonyms: caffeine; 1,3,7-trimethylxanthine; methyltheobromine
• Chemical data: Formula: C8H10N4O2; Charge: 0; Mass: 194.19
• Ontology classifications: metabolite; CNS stimulant; trimethylxanthines
• Links to more information: MSDchem: CFF; KEGG DRUG: D00528; PubMed citations
• Chemical Informatics:
InChI=1/C8H10N4O2/c1-10-4-9-6-5(10)7(13)12(3)8(14)11(6)2/h4H,1-3H3
SMILES CN1C(=O)N(C)c2ncn(C)c2C1=O
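The chemical data shown for caffeine can be cross-checked from the formula alone. The following is a minimal sketch, not part of ChEBI: the helper names `parse_formula` and `average_mass` are hypothetical, the atomic weights are rounded standard values, and the parser only handles simple formulae without brackets or charges.

```python
import re

# Rounded standard atomic weights; only the elements needed for this example.
ATOMIC_WEIGHTS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}


def parse_formula(formula):
    """Return {element: count} for a simple formula such as 'C8H10N4O2'."""
    counts = {}
    for element, digits in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(digits) if digits else 1)
    return counts


def average_mass(formula):
    """Sum atomic weights over all atoms in the formula."""
    return sum(ATOMIC_WEIGHTS[el] * n for el, n in parse_formula(formula).items())


caffeine_counts = parse_formula("C8H10N4O2")
caffeine_mass = round(average_mass("C8H10N4O2"), 2)
print(caffeine_counts, caffeine_mass)  # mass agrees with the 194.19 on the slide
```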
Example ChEBI entry page
Example entry page (continued)
Example entry page (continued)
Structure-based classification in ChEBI
Challenges with manual classification
• May be incomplete
• May be inconsistent
• Difficult to maintain (even with extensive use of
computationally expensive automatic validations)
• Blocks automatic loading of otherwise high-quality externally annotated chemical data into ChEBI (as no classification is available)
SOCO (SMARTS, OWL)
Leonid Chepelev, Michel Dumontier, collaborators
• Given a training set of classified molecules, examine
structures for consensus features across all (using
fragmentation and feature detection)
• Capture features hierarchically
• Use OWL to classify
Chepelev et al. BMC Bioinformatics 2012 13:3 doi:10.1186/1471-2105-13-3
Limitations of SOCO
• No support for negation
• Only “min” (at least) counting supported, not max or
exact. Thus, dicarboxylic acid is_a monocarboxylic acid
(Every two-legged human is also a one-legged human in the sense
that they have at least one leg…)
• SMARTS is powerful – but not very human-readable.
ChEBI is for human biologist and chemist consumption.
E.g. SMARTS for the class of aliphatic amines:
[$([NH2][CX4]),$([NH]([CX4])[CX4]),$([NX3]([CX4])([CX4])[CX4])]
Can we do better at making definitions accessible?
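The counting limitation above can be illustrated with a toy model. This is a sketch under stated assumptions, not SOCO itself: molecules are reduced to hypothetical feature counts, and `at_least` and `exactly` are illustrative helpers.

```python
# Hypothetical feature counts: number of carboxy groups per molecule.
acetic_acid = {"carboxy_group": 1}
oxalic_acid = {"carboxy_group": 2}


def at_least(feature, n):
    """'min' cardinality: the only kind of counting available in SOCO."""
    return lambda mol: mol.get(feature, 0) >= n


def exactly(feature, n):
    """'exact' cardinality: needed to keep the two classes apart."""
    return lambda mol: mol.get(feature, 0) == n


min_only = {
    "monocarboxylic acid": at_least("carboxy_group", 1),
    "dicarboxylic acid": at_least("carboxy_group", 2),
}
exact = {
    "monocarboxylic acid": exactly("carboxy_group", 1),
    "dicarboxylic acid": exactly("carboxy_group", 2),
}


def classify(mol, definitions):
    """Return the sorted names of all classes whose definition the molecule satisfies."""
    return sorted(name for name, test in definitions.items() if test(mol))


min_result = classify(oxalic_acid, min_only)
exact_result = classify(oxalic_acid, exact)
print(min_result)    # the diacid also lands in the mono class
print(exact_result)  # only the dicarboxylic class matches
```

With "min" counting alone, every molecule that satisfies the dicarboxylic definition also satisfies the monocarboxylic one, which is the unwanted is_a relation described in the bullet.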
A new pipeline for automated structure-based ontology classification in ChEBI
• Inputs: Definitions (OWL) and ChEBI structures
• OWL Parser => logical cheminformatics definitions
• Matching: novel structure => candidate classes
• Ranking: best classes; save is_a relations
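The matching and ranking stages can be sketched as follows. The slides do not say how ranking is done, so this sketch assumes one plausible rule: keep the most specific matched classes, i.e. those that are not an ancestor of another match. The class table, the feature dictionary and all function names are hypothetical.

```python
# Hypothetical class table: name -> (parent class, predicate over a feature dict).
CLASSES = {
    "molecular entity": (None, lambda f: True),
    "organic molecular entity": ("molecular entity", lambda f: f["carbon_atoms"] > 0),
    "monocyclic compound": ("molecular entity", lambda f: f["cycles"] == 1),
    "organic ion": (
        "organic molecular entity",
        lambda f: f["carbon_atoms"] > 0 and f["charge"] != 0,
    ),
}


def ancestors(name):
    """Yield every ancestor of a class by following parent links."""
    parent = CLASSES[name][0]
    while parent is not None:
        yield parent
        parent = CLASSES[parent][0]


def match(features):
    """Matching: all classes whose definition the structure satisfies."""
    return [name for name, (_, pred) in CLASSES.items() if pred(features)]


def rank(candidates):
    """Ranking (assumed rule): drop any candidate that is an ancestor of another."""
    covered = {a for c in candidates for a in ancestors(c)}
    return sorted(c for c in candidates if c not in covered)


# Hypothetical features for a novel structure: a small acyclic organic anion.
novel = {"carbon_atoms": 2, "charge": -1, "cycles": 0}
candidates = match(novel)
best = rank(candidates)
is_a_relations = [("novel structure", "is_a", cls) for cls in best]
print(candidates, best, is_a_relations)
```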
Human-readable definitions, mapped to structures in ChEBI knowledgebase
thiadiazoles: molecular_entity and has_part some ( 1,2,3-thiadiazole or 1,2,4-thiadiazole or 1,2,5-thiadiazole or 1,3,4-thiadiazole )
diterpenoid: organic_molecular_entity and has_part exactly 2 terpenoid
organic ion: organic_molecular_entity and ( has_charge some int[>0] or has_charge some int[<0] )
monocyclic compound: molecular_entity and has_cycles value "1"^^int
• Logical operators
• Counts (min, max and exact)
• Properties
• Parts
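The four definitions above combine logical operators, counts, properties and parts. A minimal sketch of how such definitions might be evaluated is given below; it does not use the OWL API or the Chemistry Development Kit. Structures are assumed to be pre-reduced to a hypothetical dictionary of parts, charge and cycle count, and all helper names are illustrative.

```python
def all_of(*preds):
    """Logical 'and' over predicates."""
    return lambda m: all(p(m) for p in preds)


def any_of(*preds):
    """Logical 'or' over predicates."""
    return lambda m: any(p(m) for p in preds)


def has_part_some(*part_names):
    """'has_part some (A or B ...)': at least one of the named parts is present."""
    return lambda m: any(m["parts"].get(p, 0) >= 1 for p in part_names)


def has_part_exactly(n, part_name):
    """'has_part exactly n X': the part occurs exactly n times."""
    return lambda m: m["parts"].get(part_name, 0) == n


molecular_entity = lambda m: True
organic_molecular_entity = lambda m: m["organic"]

DEFINITIONS = {
    "thiadiazoles": all_of(
        molecular_entity,
        has_part_some("1,2,3-thiadiazole", "1,2,4-thiadiazole",
                      "1,2,5-thiadiazole", "1,3,4-thiadiazole"),
    ),
    "diterpenoid": all_of(organic_molecular_entity, has_part_exactly(2, "terpenoid")),
    "organic ion": all_of(
        organic_molecular_entity,
        any_of(lambda m: m["charge"] > 0, lambda m: m["charge"] < 0),
    ),
    "monocyclic compound": all_of(molecular_entity, lambda m: m["cycles"] == 1),
}

# Hypothetical structure: neutral, organic, one ring, containing a 1,3,4-thiadiazole.
example = {"organic": True, "parts": {"1,3,4-thiadiazole": 1}, "charge": 0, "cycles": 1}
matched = sorted(name for name, test in DEFINITIONS.items() if test(example))
print(matched)
```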
Planned integration into ChEBI tools
• ChEBI internal data loader and bulk submissions
• ChEBI online submission tool
Pre-population of matched classes
Acknowledgements – Thanks!
ChEBI team:
Christoph Steinbeck
Gareth Owen
Adriano Dekker
Namrata Kale
Steve Turner
Venkatesh Muthukrishnan
Collaborators:
Colin Batchelor, RSC
Lian Duan, ETH
Leonid Chepelev, Ottawa
Michel Dumontier, Stanford
Despoina Magka, Oxford
Ilinca Tudose and John May, EBI
Funding:
BBSRC “Continued development of ChEBI towards better usability for the systems biology and metabolic modelling communities” BB/K019783/1
Questions?
Thank you for listening!
chebi-help@ebi.ac.uk

Editor's notes

  1. ChEBI is a database and ontology of chemical entities of biological interest. As of October 2013, it contains more than 35,000 entries, organised into a structure-based and role-based classification hierarchy. Each entry is extensively annotated with a name, definition and synonyms, other metadata such as cross-references, and chemical structure information where appropriate. In addition to the classification hierarchy, the ontology also contains diverse chemical and ontological relationships. While ChEBI is primarily manually maintained, recent developments have focused on improvements in curation through partial automation of common tasks. We will describe a pipeline we have developed for structure-based classification of chemicals into the ChEBI structural classification. The pipeline connects class-level structural knowledge encoded in Web Ontology Language (OWL) axioms as an extension to the ontology, and structural information specified in standard MOLfiles. We make use of the Chemistry Development Kit, the OWL API and the OWLTools library. Harnessing the pipeline, we are able to suggest the best structural classes for the classification of novel structures within the ChEBI ontology.