Chemistry counting across databases
Chemistry totals counting in UniChem
1 Centre for Discovery Brain Sciences, University of Edinburgh, Edinburgh, UK. 2 (currently) TW2Informatics Ltd, Göteborg, Sweden, cdsouthan@gmail.com
Assessing chemistry <> proteins <> papers
connectivity between ELIXIR resources
Introduction
As we know, the utility of ELIXIR is largely determined by connectivity and
interoperability. This can be expressed in different ways including the ability to
computationally query across the same entities between resources and the simple
provision of cross-pointers as live URLs for users to manually navigate between entity
records from different databases.
So how is ELIXIR doing in this respect? This has been addressed in a blog post
https://cdsouthan.blogspot.com/2018/08/an-initial-look-at-elixir-chemistry.html that
assesses chemistry <> protein <> papers connectivity (C-P-P). The post should be consulted
for details since only an outline can be presented in this poster. The starting point was
our own UK ELIXIR resource of the IUPHAR/BPS Guide to PHARMACOLOGY
(GtoPdb) that includes C-P-P capture (see poster by Harding et al. and
http://www.guidetopharmacology.org/). We offer users outlinks and intersects of our
proteins via UniProt cross-references and update our chemistry in PubChem and
UniChem. However, entity overlaps with other ELIXIR resources offer crucial
complementarity for users. The resources compared for curated C-P-P are GtoPdb, ChEMBL,
ChEBI, PDBe and, most recently, BRENDA (with the exception of ChEBI, which auto-maps C-P).
From the pre-computed chemistry intersects that UniChem generates at each release
one can plot informative comparative overlaps. The blog-post shows all five of these
but the example for GtoPdb is shown above. The pattern of overlaps has been
described in our NAR paper (PMID: 29149325). Note that the overlap is highest for PubChem
because we are a submitting source, although there are minor chemistry rule differences.
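The kind of comparative overlap described above can be sketched in a few lines. This is a minimal illustration only, assuming each source has already been reduced to a set of InChIKeys; the function name `overlap_percentages` and the placeholder keys are invented here and are not taken from UniChem or the poster data.

```python
def overlap_percentages(sources, focus):
    """Return, for each other source, the % of the focus source's keys it shares.

    `sources` maps a source name to a set of structure keys (e.g. InChIKeys).
    """
    focus_keys = sources[focus]
    if not focus_keys:
        return {}
    return {
        name: 100.0 * len(focus_keys & keys) / len(focus_keys)
        for name, keys in sources.items()
        if name != focus
    }


# Placeholder keys for illustration only (not real InChIKeys or real counts).
toy_sources = {
    "GtoPdb": {"K1", "K2", "K3", "K4"},
    "PubChem": {"K1", "K2", "K3", "K5"},
    "ChEMBL": {"K1", "K2"},
    "ChEBI": {"K4"},
}
overlaps = overlap_percentages(toy_sources, "GtoPdb")
for name, pct in sorted(overlaps.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {pct:.1f}% of GtoPdb")
```

Expressing each overlap as a percentage of the focus source keeps sources of very different sizes comparable on one plot.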
Protein intersects
The easiest way to intersect proteins is via the UniProt cross-references, although
these are not available for ChEBI. The Venn diagram above shows selections of
Human Swiss-Prot x-refs for the other four sources. Some of the divergence is
explicable (e.g. the three sources other than PDBe do not curate PDB proteins that have no reported
chemical interactions). Note also that the mappings are not all for small molecules (e.g.
the ChEMBL and GtoPdb x-refs include antibody and large peptide interactions).
Unique or 2-way overlaps can be cross-curation opportunities to increase coverage.
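As a rough sketch of how the Venn regions could be counted, the snippet below assigns each accession to the exact combination of sources that cross-reference it. It assumes the UniProt x-refs for each source have already been downloaded as sets of accessions; the helper name `venn_regions` and the toy accessions are hypothetical and carry no real data.

```python
from collections import Counter


def venn_regions(sources):
    """Count identifiers in each exclusive Venn region.

    `sources` maps a source name to a set of identifiers (e.g. UniProt
    accessions). The result is keyed by the frozenset of source names that
    share an identifier, so every identifier is counted exactly once.
    """
    membership = {}
    for name, ids in sources.items():
        for acc in ids:
            membership.setdefault(acc, set()).add(name)
    return Counter(frozenset(names) for names in membership.values())


# Invented accessions for illustration only.
toy_xrefs = {
    "GtoPdb": {"P00001", "P00002", "P00003"},
    "ChEMBL": {"P00002", "P00003", "P00004"},
    "PDBe": {"P00003", "P00005"},
    "BRENDA": {"P00003", "P00004", "P00006"},
}
regions = venn_regions(toy_xrefs)
for names, count in sorted(regions.items(), key=lambda kv: (-len(kv[0]), sorted(kv[0]))):
    print(" & ".join(sorted(names)), count)
```

Regions with one or two source names correspond to the unique and 2-way overlaps that are suggested above as cross-curation opportunities.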
Christopher Southan1,2
Publication intersects in European PubMed Central (EPMC)
For curated C-P-P resources it is useful to compare which papers have been selected
for chemistry extraction (even though it is more difficult to discern “why”). In EPMC the
Data Links and Data Citations queries (HAS_CHEMBL:y) and (HAS_PDB:y) worked
cleanly. However, there was some ambiguity for (HAS_CHEBI:y). It turns out,
unfortunately, these are papers where there is a term match to ChEBI entries but not
papers that they curated to extract their chemical entries from. Neither GtoPdb nor
BRENDA is a current data link (GtoPdb intends to address this but, in the interim, lists of
papers they have curated chemistry from can be obtained via PubMed > PubMed). The
curation selectivity underlying the capture divergence is worthy of further investigation.
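The queries above can also be combined programmatically. The following is a minimal sketch, assuming the Europe PMC REST search endpoint and its `query`, `format`, `resultType` and `pageSize` parameters, and assuming the JSON response carries a top-level `hitCount` field; the helper names are invented here, and no request is actually sent.

```python
from urllib.parse import urlencode

# Assumed base URL of the Europe PMC REST search service.
EPMC_SEARCH = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def epmc_search_url(*flags, page_size=1):
    """Build a search URL that ANDs together flags such as HAS_CHEMBL."""
    query = " AND ".join(f"({flag}:y)" for flag in flags)
    params = {
        "query": query,
        "format": "json",
        "resultType": "idlist",
        "pageSize": page_size,
    }
    return f"{EPMC_SEARCH}?{urlencode(params)}"


def hit_count(response_json):
    """Read the number of matching papers from a parsed JSON response."""
    return int(response_json["hitCount"])


# Papers linked to both ChEMBL and PDB (URL only, nothing is fetched here).
url = epmc_search_url("HAS_CHEMBL", "HAS_PDB")
print(url)
```

Fetching the URL and passing the parsed JSON to `hit_count` would give the size of a publication intersect, subject to the (HAS_CHEBI:y) caveat noted above.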
Chemistry intersects in PubChem
PubChem offers powerful “slice ‘n dice” options to compare 600+ sources. Of our five,
BRENDA and PDBe are not submitters but we can use the NCBI Structure (ligands
extracted from PDB) to substitute for the latter (n.b. 4-way Venn intersects are difficult
from the interface so only a 3-way is shown). Reasons for the wide divergence of
ELIXIR chemistry seen above can be partially but not entirely explained (see blog-post).
Conclusions
• This intra-ELIXIR comparative analysis was more difficult than it should have been
• One reason is that these databases have independently diverged over decades into
their utility niches with little (pre-ELIXIR) consideration of interoperability
• The exercise turned out to be peculiarly “gapped” in that it was not possible to do
standardized C-P-P x-mappings between all five; there was always at least one
odd-man-out
• Some of this could be easily addressed, for example by getting PMIDs indexed in EPMC
for the papers from which GtoPdb, ChEBI and BRENDA curated/extracted C-P
• Another enhancement would be to harmonise chemistry submissions to both
UniChem and PubChem (e.g. for PDBe ligands and BRENDA compounds)
• The 37% unique chemistry in BRENDA may represent valuable capture but this
needs to be checked
• More technical dialogue between ELIXIR resources with entities-in-common would
be valuable (e.g. to cogitate on causes of divergent capture, pragmatic
interoperability assessments, collaborative curation and future RDF cross-testing)
• The C-P-P is extendable (e.g. for the new ELIXIR 3D-BioInfo initiative)
• While ELIXIR Training is progressing and resources have good Help and FAQ these
results indicate an unmet need for “comparative exploitation guides” even for just
C-P-P. For example, users need to know not only “what's in one but not t’other and
why?” but also “which permutations of these five, and/or others, should I use for
what?” (for chemistry see PMID: 29451740)
The EBI UniChem database provides
chemical structure cross-indexing between
39 sources that include the five compared
here. For comparison PubChem,
SureChEMBL (patents) and the Human
Metabolome Database (HMDB) are shown on the right.
Counts refer to InChIKeys. The % unique
are for that source from the 128 million in the
11 Nov release that includes PubChem
(some are slightly different from the August
blog-post). This unique content is significant
for BRENDA, HMDB and PDBe.
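The "% unique" figure can be illustrated with a short sketch. It assumes a cross-index that maps each InChIKey to the set of sources holding it, which is a simplification of what UniChem provides; the function name `percent_unique` and the placeholder keys are invented for illustration and are not poster data.

```python
def percent_unique(index, source):
    """Return the % of a source's structures that no other source holds.

    `index` maps a structure key (e.g. an InChIKey) to the set of source
    names that contain it.
    """
    total = sum(1 for srcs in index.values() if source in srcs)
    if total == 0:
        return 0.0
    unique = sum(1 for srcs in index.values() if srcs == {source})
    return 100.0 * unique / total


# Placeholder keys for illustration only.
toy_index = {
    "KEY-A": {"BRENDA"},
    "KEY-B": {"BRENDA", "PubChem"},
    "KEY-C": {"BRENDA", "ChEBI"},
    "KEY-D": {"PubChem"},
}
for name in ("BRENDA", "PubChem", "ChEBI"):
    print(f"{name}: {percent_unique(toy_index, name):.1f}% unique")
```

Because uniqueness is judged against every other source in the index, adding or removing a large source such as PubChem changes the percentages for all the others.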

Looking at chemistry - protein - papers connectivity in ELIXIR
