NCBO BioPortal SPARQL Endpoint - The Quad Economy of a Semantic Web Ontology Repository
Trish Whetzel, Manuel Salvadores, Paul R. Alexander, Mark A. Musen, Natalya F. Noy
Stanford University, Stanford, CA
Acknowledgements
http://bioportal.bioontology.org http://alphasparql.bioontology.org
The National Center for Biomedical Ontology is one of the National Centers for
Biomedical Computing supported by the NHGRI, the NHLBI, and the NIH Common
Fund under grant U54-HG004028.
Contact
For more information on the NCBO, visit http://www.bioontology.org
or email support@bioontology.org
Abstract
The NCBO Web services provide a common output (XML/JSON) for ontology content regardless of the
ontology representation format (OWL, OBO, Protégé frames, RRF); however, there is no single uniform
storage for the ontologies and their metadata. As the amount of information and the number of hits to the Web
services increase, a more scalable solution is needed. To address these issues, we analyzed the use of
a quad store, since quad stores easily scale to millions of triples and provide SPARQL query access to
the ontologies. Currently each ontology in BioPortal includes the materialization of all owl:imports. Thus, if
a small ontology imports a large ontology then the former becomes a large ontology. Taking into account
that BioPortal stores multiple versions of an ontology, the problem is reproduced for every version. Our
hypothesis was that we could optimize the number of quads in the system using a more granular model
where owl:imports are not materialized and every ontology graph contains its own RDF triples without the
triples from the owl:imports ontologies. One of the questions to be answered is the optimization ratio, in
number of triples, when using an ontology-per-graph model versus a closure-materialized model. Of the
149 OWL ontologies reviewed, there are 299 ontologies in the import closure (i.e., if we follow all the
owl:imports links from the 149 ontologies, we will create a set of 299 ontologies). These 299 OWL
ontologies contain 303 owl:imports; the materialized import closure is a set of 495 owl:imports. We also
reviewed the number of re-used triples. Ontologies with no imports gather 5.4M triples in the system;
ontologies with one import, 1.7M; ontologies with 2-9 imports, 0.5M triples; and ontologies with more than 10 imports,
2.1M. To conclude, our analysis shows that while ontology reuse is still far from being the norm, effective
reuse is a goal worth pursuing and the level of reuse can have significant implications for the scalability of
ontology storage systems.
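
The comparison between the two storage models can be illustrated with a small sketch. It is not the analysis code behind the figures above: the ontology names, triple counts, and import links are invented toy data, and the helper names `import_closure` and `quad_counts` are hypothetical. It only shows how the import closure is followed and why materializing it multiplies the number of stored triples.

```python
from typing import Dict, Set, Tuple


def import_closure(ontology: str, imports: Dict[str, Set[str]]) -> Set[str]:
    """Return every ontology reachable from `ontology` by following
    owl:imports links transitively (the ontology itself is excluded)."""
    seen: Set[str] = set()
    stack = list(imports.get(ontology, ()))
    while stack:
        current = stack.pop()
        if current in seen or current == ontology:
            continue  # already visited, or an import cycle back to the start
        seen.add(current)
        stack.extend(imports.get(current, ()))
    return seen


def quad_counts(own_triples: Dict[str, int],
                imports: Dict[str, Set[str]]) -> Tuple[int, int]:
    """Return (ontology-per-graph total, closure-materialized total)."""
    # Ontology-per-graph: every graph holds only its own triples.
    per_graph = sum(own_triples.values())
    # Closure-materialized: every graph also holds a copy of the triples
    # of each ontology in its import closure.
    materialized = sum(
        own_triples[name]
        + sum(own_triples[dep] for dep in import_closure(name, imports))
        for name in own_triples
    )
    return per_graph, materialized


# Toy data (hypothetical): a small ontology imports a mid-sized one,
# which in turn imports a large one.
own_triples = {"small": 100, "mid": 1_000, "large": 50_000}
imports = {"small": {"mid"}, "mid": {"large"}, "large": set()}

per_graph, materialized = quad_counts(own_triples, imports)
print(per_graph, materialized, round(materialized / per_graph, 2))
```

In this toy case the small ontology ends up carrying all the triples of the large one once the closure is materialized, which is the effect the abstract describes; with several stored versions per ontology the duplication would repeat for each version.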
BioPortal SPARQL Endpoint Features
• Open library of biomedical ontologies
• Each ontology is materialized in a single graph to facilitate query articulation
• Ontology content is synchronized daily with BioPortal
• Only the latest version of each ontology can be accessed, but metadata for all versions is
available
• Access control for SPARQL named graphs to restrict access to private and licensed ontologies
based on the BioPortal user API Key
• rdfs:subPropertyOf reasoning for preferred name, synonyms, and definitions allows queries to
bind top-level predicates of the property hierarchy to query consistently across ontologies using
the graph "globals"
• UMLS ontologies can be generated at the CUI or CODE level
• To ensure fair usage of the triple store, some queries are not permitted, for example SELECT *
WHERE { ?s ?p ?o }
Sample Code
Examples[1] are provided for the following platforms/languages:
- Java:
* Java with no third-party libraries (SimpleTest.java)
* Java with JenaARQ (JenaARQTest.java)
* Java with OpenRDF [*] (OpenRDFAlibabaTest.java)
- Python:
* Python with no third-party libraries (sparql1.py)
* Python with SPARQLWrapper[2] (sparql2.py)
- JavaScript:
* JavaScript with the SPARQLClient[3] lib (index.html)
* JavaScript with node.js (node_test.js)
- Perl using sparql.pm from [4] (test.pl)
- TODO
* Ruby, C#, Scala
[1] https://github.com/ncbo/sparql-code-examples
[2] http://sparql-wrapper.sourceforge.net
[3] http://thefigtrees.net/lee/sw/sparql.js (slightly modified to allow API keys)
[4] https://github.com/swh/Perl-SPARQL-client-library (slightly modified to allow API keys)
[*] The jar file alibaba-repository-sparql-2.0-beta9-patched.jar has been patched to allow API keys
and GET HTTP requests.
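
As an illustration of the "no third-party libraries" approach, the following is a minimal Python sketch of how a GET request carrying an API key might be assembled. It is not a copy of sparql1.py. The endpoint path `/sparql/`, the query-parameter name `apikey`, the graph URI, and the helper name `build_request` are assumptions made for this sketch; consult the example repository [1] for the actual conventions. The query is deliberately bounded with a graph, a specific predicate, and a LIMIT.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint path; only the host name appears on this poster.
ENDPOINT = "http://alphasparql.bioontology.org/sparql/"

# Hypothetical graph URI, used only to keep the query bounded.
QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?term ?label
WHERE {
  GRAPH <http://example.org/hypothetical-ontology-graph> {
    ?term rdfs:label ?label .
  }
}
LIMIT 10
"""


def build_request(query: str, api_key: str, endpoint: str = ENDPOINT) -> Request:
    """Build (but do not send) an HTTP GET request for a SPARQL query.

    Assumption: the API key travels as an `apikey` query parameter.
    """
    params = urlencode({"query": query, "apikey": api_key})
    return Request(
        f"{endpoint}?{params}",
        # Standard media type for SPARQL SELECT results in JSON.
        headers={"Accept": "application/sparql-results+json"},
    )


request = build_request(QUERY, "YOUR-API-KEY")
print(request.get_method(), request.full_url[:60])
```

Sending the request would be a matter of passing it to `urllib.request.urlopen` and decoding the JSON body; that step is left out here so the sketch runs without network access or a valid key.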
Example Queries
http://alphasparql.bioontology.org/examples