GraphConnect Europe 2016 - Building a Repository of Biomedical Ontologies with Neo4j - Simon Jupp
1. Building a repository of biomedical
ontologies with Neo4j
Simon Jupp jupp@ebi.ac.uk, @simonjupp
Samples, Phenotypes and Ontologies Team
European Bioinformatics Institute
Cambridge, UK.
8. The ontology of color blindness
HP:0011518 (Dichromacy)
HP:0011518 (Eye)
HP:0000551 (Abnormality of color vision)
HP:0007641 (Dyschromatopsia)
Relations shown: Is-a, Is-a, Disease-location
synonym: “Colorblindness”
definition: “A form of colorblindness in which only two of the three fundamental colors can be distinguished due to a lack of one of the retinal cone pigments.”
9. Ontologies for life sciences
Coverage from genotype to phenotype: Sequence, Proteins, Gene products, Transcript, Pathways, Cell type, BRENDA tissue / enzyme source, Development, Anatomy, Phenotype, Plasmodium life cycle, eVOC (Expressed Sequence Annotation for Humans)
- Sequence types and features
- Genetic Context
- Molecule role
- Molecular Function
- Biological process
- Cellular component
- Protein covalent bond
- Protein domain
- UniProt taxonomy
- Pathway ontology
- Event (INOH pathway ontology)
- Systems Biology
- Protein-protein interaction
- Arabidopsis development
- Cereal plant development
- Plant growth and developmental stage
- C. elegans development
- Drosophila development (FBdv)
- Human developmental anatomy, abstract version
- Human developmental anatomy, timed version
- Mosquito gross anatomy
- Mouse adult gross anatomy
- Mouse gross anatomy and development
- C. elegans gross anatomy
- Arabidopsis gross anatomy
- Cereal plant gross anatomy
- Drosophila gross anatomy
- Dictyostelium discoideum anatomy
- Fungal gross anatomy (FAO)
- Plant structure
- Maize gross anatomy
- Medaka fish anatomy and development
- Zebrafish anatomy and development
- NCI Thesaurus
- Mouse pathology
- Human disease
- Cereal plant trait
- PATO (attribute and value)
- Mammalian phenotype
- Human phenotype
- Habronattus courtship
- Loggerhead nesting
- Animal natural history and life history
10. Ontology Lookup Service
• Ontology search engine (Solr)
• Graph database of terms (Neo4j)
• Powerful RESTful API (built with Spring Data Neo4j and Spring Data REST)
• Open source project
• Generic infrastructure (can load any ontology represented in OWL)
https://github.com/EBISPOT/OLS
Repository of over 140 biomedical ontologies (4.5 million terms, 11 million relations)
http://www.ebi.ac.uk/ols/beta
11. Web Ontology Language – (OWL)
• W3C standard vocabulary for describing
ontologies
• Powerful knowledge representation
However
• OWL ontologies aren’t graphs, but…
… can be represented as an RDF graph
… people want to use them as graphs
• Plenty of RDF databases around
• But incomplete w.r.t. OWL semantics
• SPARQL is an acquired taste
12. OWL to Neo4j schema
• Each node is labelled with one of {Class, Property, Individuals} and with its ontology name
• All OWL annotations become node properties (labels, ids, descriptions etc.)
• Superclass axioms (named classes and simple existential restrictions) become edges in Neo4j
• E.g. in OWL: “heart” subClassOf (part-of some “cardiovascular system”)
  in Neo4j: “heart” part-of “cardiovascular system”
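The mapping on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tuple encoding of axioms, the function name `axioms_to_edges` and the named superclass "organ" are hypothetical and are not taken from the OLS code base.

```python
# Toy encoding of OWL subclass axioms (hypothetical, for illustration only):
#   ("heart", "organ")                                       named superclass
#   ("heart", ("some", "part_of", "cardiovascular system"))  simple existential

def axioms_to_edges(axioms):
    """Turn subclass axioms into (source, relation, target) edges."""
    edges = []
    for subclass, superclass in axioms:
        if isinstance(superclass, str):
            # Named superclass: keep it as a plain subclass edge.
            edges.append((subclass, "SUBCLASSOF", superclass))
        elif (len(superclass) == 3 and superclass[0] == "some"
              and isinstance(superclass[2], str)):
            # Simple existential "R some C": collapse it to a direct R edge.
            _, relation, filler = superclass
            edges.append((subclass, relation, filler))
        # Anything more complex is skipped in this simplified view.
    return edges


axioms = [
    ("heart", "organ"),  # assumed example axiom
    ("heart", ("some", "part_of", "cardiovascular system")),
]
edges = axioms_to_edges(axioms)
```

Collapsing the existential into a direct edge loses OWL semantics, which is the trade-off the slide accepts in exchange for a graph that is easy to query.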
13. What are the subtypes of “colorblindness”?
MATCH (n:Class {obo_id: 'HP:0007641'})<-[r*]-(types:Class)
RETURN n, r, types
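To make clear what the variable-length pattern `<-[r*]-` collects, here is a minimal Python sketch of the same traversal over an in-memory edge list. The edge list is a toy hierarchy assumed from the color blindness slide, not an extract of the real ontology, and the function name `descendants` is hypothetical.

```python
from collections import deque

# Toy edges as (child, relation, parent); assumed for illustration.
EDGES = [
    ("HP:0011518", "SUBCLASSOF", "HP:0007641"),  # Dichromacy -> Dyschromatopsia
    ("HP:0007641", "SUBCLASSOF", "HP:0000551"),  # Dyschromatopsia -> Abnormality of color vision
]


def descendants(term, edges):
    """Collect every node that reaches `term` over one or more incoming edges."""
    incoming = {}
    for child, _relation, parent in edges:
        incoming.setdefault(parent, []).append(child)
    seen = set()
    queue = deque([term])
    while queue:
        for child in incoming.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


subtypes = descendants("HP:0007641", EDGES)
```

Like the Cypher pattern, the sketch ignores the relationship type; restricting the traversal to subclass edges would mean filtering on the relation.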
14. What parts of the eye are related to diseases?
MATCH (eye:Class {obo_id: 'UBERON:0000970'})
      <-[r:Related {label: "part_of"}]-(eye_part:Class)
      <-[r1:Related {label: "has_disease_location"}]-(disease:Class)
RETURN eye, r, r1, eye_part, disease
15. Finding common ancestors via shortest path
MATCH p = shortestPath( (a:Class)-[r:SUBCLASSOF*]-(b:Class) )
RETURN nodes(p)
What is the common taxonomic
superfamily of Gibbons and Chimpanzees?
(or Hylobatidae and Pan troglodytes!)
https://commons.wikimedia.org/wiki/File:Hylobates_lar_pair_of_white_and_black_01.jpg
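The idea behind the shortest-path query can be sketched in plain Python: ignore edge direction, run a breadth-first search between the two classes, and read the common ancestor off the path. The taxonomy fragment below is a simplified assumption (intermediate ranks are omitted), and `shortest_path` is a hypothetical helper, not part of OLS.

```python
from collections import deque

# Simplified, assumed taxonomy fragment as child -> parent.
SUBCLASS_OF = {
    "Pan troglodytes": "Pan",
    "Pan": "Hominidae",
    "Hominidae": "Hominoidea",
    "Hylobatidae": "Hominoidea",
}


def shortest_path(start, goal, subclass_of):
    """Breadth-first search over subclass edges, ignoring their direction."""
    neighbours = {}
    for child, parent in subclass_of.items():
        neighbours.setdefault(child, set()).add(parent)
        neighbours.setdefault(parent, set()).add(child)
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk back through the predecessors to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in sorted(neighbours.get(node, ())):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None


path = shortest_path("Hylobatidae", "Pan troglodytes", SUBCLASS_OF)
```

In a tree-shaped hierarchy the undirected shortest path between two classes passes through their lowest common ancestor, which in this toy fragment is Hominoidea.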
16. OLS visualisations
• Partonomy for heart from the UBERON anatomy ontology
MATCH path = (n:Class)-[r:SUBCLASSOF|PartOf*]->(ancestor)
RETURN path
17. REST API (Spring Data REST + Neo4j)
• Crawlable API - Hypermedia driven (HAL)
• Get ontology and term metadata
• /ontologies
• /ontologies/{name}
• /ontologies/{name}/terms
• /ontologies/{name}/terms/{termid}
• Get related terms and navigate ontology structure
• /ontologies/{name}/terms/{termid}/parent
• /ontologies/{name}/terms/{termid}/children
• /ontologies/{name}/terms/{termid}/descendants
• /ontologies/{name}/terms/{termid}/ancestors
• /ontologies/{name}/terms/{termid}/{relation} e.g. part_of
http://www.ebi.ac.uk/ols/beta/api
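As a rough illustration of how the URL templates above compose, the following Python sketch builds endpoint URLs from the API root on the slide. The helper name `term_url`, the ontology name "hp", and the use of a percent-encoded short id as `{termid}` are assumptions made for the example; the deployed API may expect a different identifier form.

```python
from urllib.parse import quote

BASE = "http://www.ebi.ac.uk/ols/beta/api"  # API root from the slide


def term_url(ontology, term_id=None, relation=None):
    """Build a URL following the /ontologies/{name}/terms/{termid}/{relation} pattern."""
    parts = [BASE, "ontologies", quote(ontology, safe="")]
    if term_id is not None:
        # Percent-encode the identifier so characters such as ':' are path-safe.
        parts += ["terms", quote(term_id, safe="")]
        if relation is not None:
            parts.append(quote(relation, safe=""))
    elif relation is not None:
        raise ValueError("a relation requires a term id")
    return "/".join(parts)


children = term_url("hp", "HP:0007641", "children")
```

Because the API is hypermedia driven, a client would normally follow the links returned in each response rather than build URLs by hand; the helper only shows how the templates fit together.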
18. Building the index
• We check all of the more than 140 external ontology files nightly for changes
• We have a master build index
• When an ontology is updated, we remove the old version and reload it using the Neo4j BatchInserter (potentially fragile)
• We push the master index to various production data centers
• Provides load balancing
Nightly crawl of all >140 registered ontologies
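The slide does not say how changes are detected. One common approach, shown here purely as an assumption, is to compare a content digest of each downloaded file with the digest recorded at the previous build. The function name `changed_ontologies` and the in-memory dictionaries are hypothetical.

```python
import hashlib


def changed_ontologies(previous_digests, current_files):
    """Return the names whose content digest differs from the recorded one."""
    changed = []
    for name, content in current_files.items():
        digest = hashlib.sha256(content).hexdigest()
        # A missing entry means the ontology is new and must be loaded too.
        if previous_digests.get(name) != digest:
            changed.append(name)
    return sorted(changed)


previous = {"hp": hashlib.sha256(b"version 1").hexdigest()}
current = {"hp": b"version 2", "uberon": b"version 1"}
to_reload = changed_ontologies(previous, current)
```

Only the ontologies in `to_reload` would then be removed from the master index and reloaded, leaving the rest untouched.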
19. Conclusion
• We’ve built a scalable repository of biomedical ontologies
with Neo4j
• Generic OWL indexer (simplified OWL)
• Powerful REST API built with Spring
• Acts as standalone OWL ontology server
• Now being deployed externally
• Beta ~2000 users / 10 Million requests per month
• Would like to discuss
• Batch Inserter
• Migrating to Spring Data Neo4j 4
20. Acknowledgements
• Samples, Phenotypes and Ontologies Team - Tony Burdett, James Malone, Dani Welter, Catherine Leroy, Sira Sarntivijai, Ilinca Tudose, Helen Parkinson
• Matt Pearce – Flax (BioSOLR project)
• Michal Bachman and GraphAware team (Neo4j training)
• Funding
• European Molecular Biology Laboratory (EMBL)
• European Union projects: DIACHRON, BioMedBridges and
CORBEL