Wikidata and Scholia as hub linking metabolite knowledge

We discuss Wikidata here, a young sister project of Wikipedia with one big difference: it is a machine-readable database, which makes it far more useful for the interoperability of molecular databases in systems biology [1,2]. Thanks to the Wikidata:WikiProject Chemistry community, the amount of information about chemical compounds is growing: Wikidata currently has over 150 thousand chemical compounds, of which more than 95% have an InChIKey, and it holds more than 70 thousand CAS registry numbers. Ongoing work by this WikiProject includes capturing the chemical classes and chemical compounds described in the various Wikipedias as machine-readable data. We recently reported on ongoing efforts to use Wikidata and Scholia for chemical knowledge [3].
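
To make "machine readable" concrete, the following is a minimal sketch of how the external identifiers of one compound could be requested from the Wikidata Query Service, starting from its InChIKey. The property IDs are the Wikidata properties for InChIKey, CAS Registry Number, ChEBI ID, HMDB ID, PubChem CID, and KEGG ID; the function names, the User-Agent string, and the choice of pyruvic acid as the example are assumptions made for illustration and are not taken from the talk.

```python
import json
import re
import urllib.parse
import urllib.request

# Public SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Wikidata properties holding external identifiers (result name -> property ID).
ID_PROPERTIES = {
    "cas": "P231",
    "chebi": "P683",
    "hmdb": "P2057",
    "pubchem_cid": "P662",
    "kegg": "P665",
}

# Standard InChIKey layout: 14 letters, 10 letters, 1 letter, hyphen-separated.
INCHIKEY_PATTERN = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


def build_identifier_query(inchikey: str) -> str:
    """Build a SPARQL query listing external identifiers for one InChIKey (P235)."""
    if not INCHIKEY_PATTERN.match(inchikey):
        raise ValueError(f"not a well-formed InChIKey: {inchikey!r}")
    select_vars = " ".join(f"?{name}" for name in ID_PROPERTIES)
    # Each identifier is OPTIONAL so compounds lacking one are still returned.
    optionals = "\n".join(
        f"  OPTIONAL {{ ?compound wdt:{pid} ?{name} . }}"
        for name, pid in ID_PROPERTIES.items()
    )
    return (
        f"SELECT ?compound {select_vars} WHERE {{\n"
        f'  ?compound wdt:P235 "{inchikey}" .\n'
        f"{optionals}\n"
        "}"
    )


def run_query(query: str) -> list:
    """Send the query to the endpoint and return the JSON result bindings.

    Requires network access; the User-Agent value is a placeholder.
    """
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode(
        {"query": query, "format": "json"}
    )
    request = urllib.request.Request(
        url, headers={"User-Agent": "metabolite-id-sketch/0.1 (example)"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["results"]["bindings"]


if __name__ == "__main__":
    # InChIKey of pyruvic acid, used here only as an example input.
    print(build_identifier_query("LCTONWCANYUPML-UHFFFAOYSA-N"))
```

Calling `run_query(build_identifier_query(...))` would return one binding per matching Wikidata item; the same query text can also be pasted into the query service web interface.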

  1. Wikidata and Scholia as hub linking metabolite knowledge Egon Willighagen ORCID:0000-0001-7542-0286 chem-bla-ics.blogspot.com T: @egonwillighagen M: @egonw@scholar.social BeNeLux Metabolomics Days #NMCDays Rotterdam, 2018-08-19 CC-BY 4.0 Int. (except slides with )
  2. Acknowledgements ● Finn Nielsen (Scholia inventor) ● Denise Slenter (BiGCaT PhD candidate) ● Others – Various Maastricht University research groups – EPA CompTox Dashboard: Tony Williams – MetaboLights team: Reza Salek and Chandu Venkata – ChEBI team: Christoph Steinbeck (now Jena), Gareth Owen – PubChem, WikiGenomes teams: Evan Bolton, Gang Fu, Sebastian Burgstaller, Andra Waagmeester (Micelio) – SPLASH team: Gert Wohlgemuth, Sajjan Singh Mehta – Wikidata & WikiCite teams: Daniel Mietchen, Dario Taraborelli – Wikidata:WikiProject Chemistry – Reactome: Robin Haw, Henning Hermjakob
  3. One ID in the pathway, many in the popup
  4. Databases & identifiers ● HMDB: Human Metabolome Database ● ChEBI: Chemical Entities of Biological Interest ● ChemSpider, PubChem ● CAS: Chemical Abstracts Service ● InChI: International Chemical Identifier – UniChem, ...
  5. So, what IDs are used in WikiPathways? Curated subset 2012, 2015, 2017 + Reactome
  6. Expression chemistry in metabolic pathways CHEBI:15361 (Pyruvate) -> Ce:CHEBI:32816 (conjugate) -> Ck:C00022 -> [WP2456 HIF1A and PPARG regulation of glycolysis, WP2453 TCA Cycle and PDHc] CHEBI:15361 CHEBI:32816 Brenninkmeijer, CYA, et al. "Scientific Lenses over Linked Data: An approach to support task specific views of the data. A vision." Proceedings of 2nd International Workshop on Linked Science. 2012.
  7. What if we had more metabolite info? ● more CAS registry numbers? ● more physchem properties? ● more literature references?
  8. Wikidata Mietchen, D. et al. Enabling open science: Wikidata for research (Wiki4R). Research Ideas and Outcomes 1, e7573+ (2015)
  9. Wikidata: external (database) identifiers
  10. QuickStatements: scriptable adding of content Spjuth, O. et al., 2007, BMC Bioinformatics; Willighagen, E. et al., 2017, J. Cheminformatics
  11. QuickStatements: scriptable adding of content, here, the SPLASH. QuickStatements: Magnus Manske, WT Sanger Institute in Cambridge; SPLASH: G. Wohlgemuth, 10.1038/nbt.3689
  12. More identifiers: EPA CompTox Dashboard, LIPID MAPS, PDB ligand identifiers
  13. Scholia: visualizing data doi:10.1007/978-3-319-70407-4_36
  14. Scholia: compounds
  15. Scholia: visualizing data (compound classes) doi:10.1007/978-3-319-70407-4_36
  16. Scholia: physical-chemical properties
  17. Scholia: keeping up with literature
  18. Wikidata and Scholia as hub linking metabolite knowledge
  19. Conclusions ● Wikidata + WikiCite + Scholia – Largest open data set that links CAS numbers to chemical structures – Literature is central – Integrates ontological relations with data – Large community around it ● Powerful query service ● FAIR by design
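
The "scriptable adding of content" on the QuickStatements slides can be illustrated with a small sketch that writes commands in the tab-separated QuickStatements V1 text format. It assumes string-valued identifier statements only; the helper name `quickstatement` is hypothetical, the target item is the Wikidata Sandbox (Q4115189) rather than a real compound item, and the CAS number shown is that of pyruvic acid, chosen purely as an example. This is not the script used for the SPLASH upload shown in the slides.

```python
import re


def quickstatement(item: str, prop: str, value: str, stated_in: str = "") -> str:
    """Format one QuickStatements V1 command for a string-valued statement.

    item      -- Wikidata item ID, e.g. "Q4115189" (the Wikidata Sandbox)
    prop      -- Wikidata property ID, e.g. "P231" (CAS Registry Number)
    value     -- the identifier; it is wrapped in double quotes as a string value
    stated_in -- optional item ID of the source, emitted as an S248 reference
    """
    if not re.fullmatch(r"Q\d+", item):
        raise ValueError(f"not an item ID: {item!r}")
    if not re.fullmatch(r"P\d+", prop):
        raise ValueError(f"not a property ID: {prop!r}")
    if '"' in value or "\t" in value:
        raise ValueError("value must not contain quotes or tabs")
    fields = [item, prop, f'"{value}"']
    if stated_in:
        if not re.fullmatch(r"Q\d+", stated_in):
            raise ValueError(f"not an item ID: {stated_in!r}")
        # In the V1 format, reference properties use an S prefix instead of P.
        fields += ["S248", stated_in]
    return "\t".join(fields)


if __name__ == "__main__":
    # Example rows: (item, property, identifier value).
    rows = [("Q4115189", "P231", "127-17-3")]
    for item, prop, value in rows:
        print(quickstatement(item, prop, value))
```

The printed lines can be pasted into the QuickStatements tool, which then applies them as edits; generating them from a script is what makes bulk identifier additions repeatable and reviewable before upload.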