The KUPKB integrates thousands of kidney and urinary pathway studies into an RDF knowledge base, using ontologies to provide the schema and annotation. The iKUP browser exposes this knowledge through a simple web interface, letting biologists survey biological publications and generate hypotheses more easily than with traditional literature searches. The tools and APIs used make it possible to build such applications at relatively low cost.
Sharing, Connecting and Exposing Kidney and Urinary Knowledge using RDF and OWL
1. KUPKB: Sharing, Connecting and Exposing Kidney and Urinary Knowledge using RDF and OWL
www.kupkb.org
Julie Klein & Simon Jupp
Bio-health informatics group
University of Manchester
2. The problem domain
Thousands of studies have been conducted by the kidney research community
On different species
human mouse
On different materials
urine tissue cell
On different biological levels
gene protein
Large diversity: integration of the knowledge is complex
3. Where does the data go?
Bespoke kidney laboratory databases
Research Papers
Generalist databases
Scattered, hidden in figures, coming in different formats
Most of the data is lost!
4. The Kidney and Urinary Pathway Knowledge Base:
SHARE AND CONNECT
The iKUP Browser:
EXPOSE
www.kupkb.org
5. Structure
Populous
Experimental data
KUP Ontology
(schema)
RightField
RDF triple store
iKUP Browser
KUP Knowledge Base
6. Ontologies provide the schema
What has been observed, where and when?
Mouse anatomy ontology
Experimental factors
Gene Ontology
Animal model
Cell type ontology
Disease ontology
We needed to connect these reference ontologies.
Creation of a specialized Kidney and Urinary Pathway Ontology (KUPO)
http://www.e-lico.org/public/kupo/
7. Ontologies by stealth
The domain experts are the experts, so get them to build it
Biological processes (GO)
Cells (CTO)
Anatomy (MAO)
Spreadsheet
OPPL Scripts
Ontology
Populous generates simple Excel-based templates
http://www.e-lico.eu/populous.html
8. Describing/Collecting experimental data
Gathering good meta-data AND data, again by stealth, using RightField
Content of the meta-data cells is constrained to the relevant set of KUPO terms
http://www.sysmo-db.org/rightfield
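The idea of constraining meta-data cells to ontology terms can be sketched in a few lines of Python. This is only an illustration, not RightField itself: the field names and the `validate_row` helper are hypothetical, and the allowed values are simply the species, materials and biological levels listed on slide 2 rather than real KUPO terms.

```python
# Hypothetical controlled vocabulary: in RightField the allowed values
# would come from the relevant KUPO terms, not from a hard-coded dict.
ALLOWED_TERMS = {
    "species": {"human", "mouse"},
    "material": {"urine", "tissue", "cell"},
    "level": {"gene", "protein"},
}


def validate_row(row):
    """Return (field, value) pairs whose value is not an allowed term.

    A missing field is reported with an empty string as its value.
    """
    errors = []
    for field, allowed in ALLOWED_TERMS.items():
        raw = row.get(field, "")
        # Compare case-insensitively and ignore surrounding whitespace.
        if raw.strip().lower() not in allowed:
            errors.append((field, raw))
    return errors


good_row = {"species": "Mouse", "material": "urine", "level": "protein"}
bad_row = {"species": "rat", "material": "urine"}
```

Here `validate_row(good_row)` reports no problems, while `validate_row(bad_row)` flags the unknown species and the missing biological level. Rejecting free text at entry time is what keeps the annotations homogeneous later on.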
10. Mashing it all together
Kidney and Urinary Pathway Ontology: ~1800 classes (~40,000 after imports closure)
Experimental data: 220 KUP experiments integrated
OWL reasoning
RDF triple store
~35M triples
KUP Knowledge Base
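A toy illustration of why reasoning is applied before loading the triple store: if class membership is propagated up the subclass hierarchy, a query for a general class also finds data annotated with a more specific one. This is a minimal pure-Python sketch, not the KUPKB pipeline (which uses an OWL reasoner and a real RDF store); all identifiers such as `kupo:KidneyCell` and `ex:sample1` are made up for the example.

```python
SUBCLASS_OF = "rdfs:subClassOf"
TYPE = "rdf:type"

# Hypothetical triples: a small class hierarchy plus two annotated samples.
TRIPLES = {
    ("kupo:Podocyte", SUBCLASS_OF, "kupo:KidneyCell"),
    ("kupo:KidneyCell", SUBCLASS_OF, "kupo:Cell"),
    ("ex:sample1", TYPE, "kupo:Podocyte"),
    ("ex:sample2", TYPE, "kupo:KidneyCell"),
}


def infer_types(triples):
    """Return the triples plus rdf:type triples implied by subclassing."""
    parents = {}
    for s, p, o in triples:
        if p == SUBCLASS_OF:
            parents.setdefault(s, set()).add(o)

    def ancestors(cls, seen=None):
        # Collect every superclass reachable from cls (transitive closure).
        seen = set() if seen is None else seen
        for parent in parents.get(cls, ()):
            if parent not in seen:
                seen.add(parent)
                ancestors(parent, seen)
        return seen

    inferred = set(triples)
    for s, p, o in triples:
        if p == TYPE:
            for ancestor in ancestors(o):
                inferred.add((s, TYPE, ancestor))
    return inferred


def match(triples, s=None, p=None, o=None):
    """Return sorted triples matching the pattern; None is a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


asserted_hits = match(TRIPLES, p=TYPE, o="kupo:KidneyCell")
inferred_hits = match(infer_types(TRIPLES), p=TYPE, o="kupo:KidneyCell")
```

Without inference only `ex:sample2` matches the kidney-cell query; after inference `ex:sample1` is returned as well, because its more specific class is a subclass of the queried one.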
11. SPARQLing results
Make it all RDF/OWL and expose a SPARQL endpoint…
…then we are done, right?
We can now ask queries that span several databases
We can exploit OWL semantics for intelligent answers
BUT!
An easy-to-use application…
…this is what the biologist really wants
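To make the point concrete, this is roughly what a biologist would have to write against a raw SPARQL endpoint. The sketch only builds the HTTP request and does not send it. The endpoint URL, the `ex:` namespace and the property names are assumptions made up for illustration, not the actual KUPKB schema; only the `query` parameter and the results media type come from the SPARQL protocol.

```python
from urllib.parse import parse_qs, urlencode, urlparse
from urllib.request import Request

# Hypothetical endpoint: the slides do not give the real endpoint URL.
ENDPOINT = "http://example.org/sparql"

# Hypothetical vocabulary: ex:measuredGene and ex:material are invented.
QUERY = """
PREFIX ex: <http://example.org/kup#>
SELECT ?experiment ?material WHERE {
  ?experiment ex:measuredGene ex:CALR ;
              ex:material ?material .
}
"""


def build_request(endpoint, query):
    """Build a SPARQL protocol GET request asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_request(ENDPOINT, QUERY)
# urllib.request.urlopen(request) would send it; omitted here because
# the endpoint above is a placeholder.
```

Even this small query requires knowing the prefixes, the property names and the graph shape, which is the argument for hiding the endpoint behind a browser such as iKUP.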
14. Doing some biology
1. A biological question
Can calreticulin be associated with the development of human kidney disease?
2. No answer with classical tools
Searches in PubMed and Google do not return any relevant result!
3. Querying the KUPKB
4. Validation in the wet-lab
KUPKB in silico result confirmed.
5. Publish an innovative result
Accepted for publication in the FASEB J!
16. Reusing and Building
Ontologies provide the schema: Kidney and Urinary Pathway Ontology; tool to facilitate building of the ontology
Experimental data: annotations, homogenization; tool to facilitate data annotation
OWL reasoning
RDF triple store
iKUP Browser
KUP Knowledge Base
17. What next
User study and evaluation experiments ongoing with Manchester Web Ergonomics Lab
Application to other biological domains
Change the domain model in the ontologies and we can construct any organ knowledge base in this way
Already interest in gut, liver, heart and metabolic diseases
18. Acknowledgments
• Simon Jupp
• Stuart Owen, Matthew Horridge, Katy Wolstencroft and Carole Goble @ University of Manchester for RightField
• Joost Schanstra, Panagiotis Moulos, Jean-Loup Bascands @ Renal Fibrosis Lab, Toulouse, France
• Aristidis Charonis, Bénédicte Buffin-Meyer, Myriem Fernandez for the CALR example
• e-LICO FP7 project and EuroKUP
• Robert Stevens, ontology development, University of Manchester
Open Source License: GNU Lesser General Public License
Code: http://code.google.com/p/kupkb-dev/
20. Some rough stats…
• 195 KUP experiments integrated
• KUPKB RDF store ~35M triples
• KUP Ontology ~1800 classes, ~40,000 after imports closure
Architecture
• Sesame and BigOWLIM for the RDF store
• Web site developed with Google Web Toolkit
• OWL API and HermiT reasoner for classification and faceted browsing
21. Summary
The KUPKB RDF store is a mashup of biological knowledge relating to the KUP domain
Ontologies provide the schema and a consistent data annotation mechanism
We expose this knowledge base through a simple web interface that real biologists can use, the iKUP
iKUP and KUPKB provide a faster mechanism for the biologist to survey the data in biological publications and help the hypothesis generation process.
It is a testament to the tools and APIs that such applications are now being delivered at relatively low cost
Editor's notes
Renal physiology; human urinary protein map; renal pathophysiology; biomarker discovery