4. What is an ontology?
• Ontology = an explicit specification of a conceptualization (Gruber 1993)
• In practice: controlled vocabularies
– Disambiguation (e.g. Bank, Running)
– Language/species independence
• Very useful in biology, where terms form complex hierarchies
5. Ontologies in the Biological Domain
• OBO Foundry: the Open Biological and Biomedical Ontologies
• Member ontologies follow a common set of principles
• List of ontologies at http://www.obofoundry.org
• OBO is also a data format (file extension .obo)
6. SideTrack – The Gene Ontology
• The mother of bio-ontologies: the GO
– Oldest bio-ontology
– Many practical applications:
• Cross-species studies
• Term abundance (enrichment) studies
• GO is an OBO ontology
8. SideTrack – The Gene Ontology
• Relationships between terms:
– Subsumption: is_a
– Partonomic: part_of
• These relationships are transitive
• Terms form a DAG (directed acyclic graph): a term can have more than one parent
• Implicit relationships can be inferred (e.g. if A is_a B and B is_a C, then A is_a C)
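A minimal Python sketch of that inference, assuming the is_a edges are stored as a dict from each term to its direct parents. The term names are hypothetical and are not real GO identifiers. Because is_a is transitive and the graph is acyclic, collecting everything reachable from a term yields all of its inferred ancestors.

```python
from typing import Dict, List, Set

# Hypothetical is_a edges: child term -> direct parent terms.
# "mitochondrial_membrane" has two parents, so this is a DAG, not a tree.
IS_A: Dict[str, List[str]] = {
    "mitochondrial_membrane": ["organelle_membrane", "mitochondrial_part"],
    "organelle_membrane": ["membrane"],
    "mitochondrial_part": ["cellular_component"],
    "membrane": ["cellular_component"],
}


def ancestors(term: str, edges: Dict[str, List[str]]) -> Set[str]:
    """Return every term reachable via is_a, i.e. the transitive closure."""
    seen: Set[str] = set()
    stack = list(edges.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:  # shared ancestors are visited only once
            seen.add(parent)
            stack.extend(edges.get(parent, []))
    return seen


if __name__ == "__main__":
    # Only direct edges are stated; "membrane" and "cellular_component" are inferred.
    print(sorted(ancestors("mitochondrial_membrane", IS_A)))
```

The same traversal works for part_of, since it is transitive as well.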
16. Semantic Technologies
• Representation of triples
– All data expressed in RDF (Resource Description Framework)
– Original serialization format: RDF/XML
– Several equivalent syntaxes: Turtle (Terse RDF Triple Language, .ttl) is the most human-readable
22. IRIs and Literals
• Terms can be IRIs, literals or blank nodes
• IRI = Internationalized Resource Identifier
• A globally unique identifier: a generalized URI that may also contain Unicode characters
– Example: <http://bioinformatics.be/terms#martijn>
– An IRI is not required to resolve to an actual web resource
– Today, Linked Open Data initiatives ask you to use resolvable URIs: http://linkeddata.org
– Unique identifiers can be registered at http://identifiers.org
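Prefixed names such as b4x:martijn or rdfs:Class are shorthand for full IRIs: the prefix is replaced by its namespace IRI. A minimal sketch follows; the rdf, rdfs, xsd and foaf namespaces are the standard ones, while mapping b4x to http://bioinformatics.be/terms# is an assumption based on the example IRI above. The helpers expand and compact are illustrative, not a library API.

```python
from typing import Dict

# Standard namespaces, plus an assumed b4x namespace taken from the example IRI.
PREFIXES: Dict[str, str] = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "b4x": "http://bioinformatics.be/terms#",  # assumption
}


def expand(curie: str, prefixes: Dict[str, str] = PREFIXES) -> str:
    """Turn a prefixed name such as 'b4x:martijn' into a full IRI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown prefix in {curie!r}")
    return prefixes[prefix] + local


def compact(iri: str, prefixes: Dict[str, str] = PREFIXES) -> str:
    """Shorten a full IRI if a known namespace matches, else keep <...> form."""
    for prefix, namespace in prefixes.items():
        if iri.startswith(namespace):
            return f"{prefix}:{iri[len(namespace):]}"
    return f"<{iri}>"


if __name__ == "__main__":
    print(expand("b4x:martijn"))
    print(compact("http://www.w3.org/2000/01/rdf-schema#label"))
```

The PREFIX lines at the top of the SPARQL queries in this deck declare exactly this kind of mapping.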
23. Introduction
• Literals can be typed, using datatypes from the XSD namespace:
– E.g. "This is a string example"^^xsd:string
– E.g. "5"^^xsd:integer
• IRIs are used for entities and attributes (properties)
• Literals are used for attribute values that aren't entities
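The datatype decides how the quoted text is interpreted: "5"^^xsd:integer is the number 5, while "5"^^xsd:string is a piece of text. A minimal Python sketch of that idea; the Literal class and the converter table below are hypothetical helpers (not rdflib or any other library) and cover only a small, assumed subset of XSD types.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Any, Callable, Dict

XSD = "http://www.w3.org/2001/XMLSchema#"

# Assumed subset of XSD datatypes; real RDF libraries support many more
# and validate lexical forms more strictly.
CONVERTERS: Dict[str, Callable[[str], Any]] = {
    XSD + "string": str,
    XSD + "integer": int,
    XSD + "decimal": Decimal,
    XSD + "boolean": lambda text: text in ("true", "1"),
}


@dataclass(frozen=True)
class Literal:
    lexical: str                    # the text between the quotes, e.g. "5"
    datatype: str = XSD + "string"  # full IRI of the datatype

    def to_python(self) -> Any:
        """Interpret the lexical form according to the datatype."""
        if self.datatype not in CONVERTERS:
            raise ValueError(f"unsupported datatype: {self.datatype}")
        return CONVERTERS[self.datatype](self.lexical)

    def to_turtle(self) -> str:
        """Write the literal in Turtle syntax, escaping backslashes and quotes."""
        escaped = self.lexical.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"^^<{self.datatype}>'


if __name__ == "__main__":
    print(repr(Literal("5", XSD + "integer").to_python()))  # the integer 5
    print(repr(Literal("5").to_python()))                   # the string '5'
```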
30. Graphs
• Triples are the building blocks of graphs
• Combining sets of triples allows the construction of arbitrarily complex graphs
[Figure: node b4x:martijn linked to node b4x:karmeliet by the edge b4x:has_favorite_beer]
31. Add meaning!
• Reuse terms from existing, well-defined vocabularies and ontologies (foaf, dc, go, so)
• Describe new terms in an ontology
• Ontology terms contain
– A crisp human-readable definition
– Some machine-readable facts
32. Metadata
• Ontologies are also described in RDF
– RDFS: RDF Schema
– OWL: Web Ontology Language
– Also expressed in RDF
• For clarity, file extension can be .rdfs or .owl
37. Semantic Technologies
• Inference
– Derive new triples from a dataset using the knowledge in its metadata (e.g. RDFS, OWL)
• Types of inference engines
– RDFS inference
• RDFS entailment regime
– OWL inference
• Under active research
• Engines exist for specific subsets of OWL (OWL-DL)
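To show what an RDFS inference engine does, here is a minimal forward-chaining sketch in Python. It implements only two rules from the RDFS entailment regime (rdfs9: types propagate up subclasses; rdfs11: rdfs:subClassOf is transitive) and keeps applying them until nothing new is derived. Real engines implement the full rule set; the ex: data is hypothetical.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]
TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"


def rdfs_closure(triples: Set[Triple]) -> Set[Triple]:
    """Apply two RDFS entailment rules until no new triple appears:
    rdfs11: (A subClassOf B) and (B subClassOf C) => (A subClassOf C)
    rdfs9:  (x type A) and (A subClassOf B)       => (x type B)
    """
    inferred = set(triples)
    while True:
        sub = {(s, o) for s, p, o in inferred if p == SUBCLASS}
        new: Set[Triple] = set()
        for a, b in sub:  # rdfs11
            for b2, c in sub:
                if b == b2:
                    new.add((a, SUBCLASS, c))
        for x, p, a in inferred:  # rdfs9
            if p == TYPE:
                new.update((x, TYPE, b) for a2, b in sub if a2 == a)
        if new <= inferred:  # fixpoint reached: nothing new was derived
            return inferred
        inferred |= new


# Hypothetical data: only the direct facts are stated.
EXAMPLE: Set[Triple] = {
    ("ex:donald", TYPE, "ex:Duck"),
    ("ex:Duck", SUBCLASS, "ex:Bird"),
    ("ex:Bird", SUBCLASS, "ex:Animal"),
}

if __name__ == "__main__":
    for triple in sorted(rdfs_closure(EXAMPLE) - EXAMPLE):
        print(triple)
```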
41. DuckTyping
• Be careful with inference!
Example: you want to express that people can have a length
b4x:length a rdf:Property;
rdfs:domain foaf:Person;
rdfs:range xsd:integer.
42. DuckTyping
• Problem:
ex:VW_Transporter b4x:length "600"^^xsd:integer.
• An RDFS reasoner would infer that ex:VW_Transporter is a foaf:Person!
• This is called DuckTyping
If it looks like a duck, swims like a duck, and
quacks like a duck, then it probably is a duck
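The problem comes from RDFS rule rdfs2: if a property has an rdfs:domain, every subject that uses the property is inferred to be of that class. rdfs:domain is not a constraint that rejects "wrong" data. A minimal Python sketch of just that rule, applied to the schema and the data triple from these slides (the literal is kept as a plain string for simplicity):

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]
TYPE, DOMAIN = "rdf:type", "rdfs:domain"


def apply_domain_rule(triples: Set[Triple]) -> Set[Triple]:
    """RDFS rule rdfs2: (P domain C) and (x P y) => (x type C).
    Nothing is rejected; the rule only adds rdf:type triples."""
    domains = {(s, o) for s, p, o in triples if p == DOMAIN}
    inferred = set(triples)
    for s, p, _ in triples:
        inferred.update((s, TYPE, cls) for prop, cls in domains if prop == p)
    return inferred


# Schema from the slide, plus the problematic data triple.
DATA: Set[Triple] = {
    ("b4x:length", TYPE, "rdf:Property"),
    ("b4x:length", DOMAIN, "foaf:Person"),
    ("b4x:length", "rdfs:range", "xsd:integer"),
    ("ex:VW_Transporter", "b4x:length", '"600"^^xsd:integer'),
}

if __name__ == "__main__":
    for triple in sorted(apply_domain_rule(DATA) - DATA):
        print(triple)  # ('ex:VW_Transporter', 'rdf:type', 'foaf:Person')
```

The reasoner is not buggy here: the rdfs:domain statement itself says that anything with a b4x:length is a foaf:Person.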
45. Storing RDF
• As an RDF file for download
• In a Triplestore
– Database optimised for storing and querying triples
– Examples: Blazegraph, Fuseki, Sesame (now Eclipse RDF4J)
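Much of what a triplestore adds over a plain RDF file is indexing, so that a pattern with a fixed subject, predicate or object does not need a full scan. A toy in-memory sketch in Python; TinyTripleStore is a hypothetical class, and real triplestores persist data on disk and keep several combined indexes (e.g. SPO, POS, OSP) instead of one per position.

```python
from typing import Dict, Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]


class TinyTripleStore:
    """Toy in-memory triple store with one index per triple position."""

    def __init__(self) -> None:
        self.triples: Set[Triple] = set()
        # Index 0 is keyed by subject, 1 by predicate, 2 by object.
        self.indexes: Tuple[Dict[str, Set[Triple]], ...] = ({}, {}, {})

    def add(self, s: str, p: str, o: str) -> None:
        triple = (s, p, o)
        self.triples.add(triple)
        for index, key in zip(self.indexes, triple):
            index.setdefault(key, set()).add(triple)

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> Iterator[Triple]:
        """Yield triples matching the pattern; None acts as a wildcard."""
        pattern = (s, p, o)
        candidates = [index.get(key, set())
                      for index, key in zip(self.indexes, pattern) if key is not None]
        # Scan the smallest candidate set, then check the remaining positions.
        pool = min(candidates, key=len) if candidates else self.triples
        for triple in pool:
            if all(key is None or key == value for key, value in zip(pattern, triple)):
                yield triple


if __name__ == "__main__":
    store = TinyTripleStore()
    store.add("b4x:martijn", "b4x:has_favorite_beer", "b4x:karmeliet")
    store.add("b4x:martijn", "foaf:name", '"Martijn"')
    print(list(store.match(p="b4x:has_favorite_beer")))
```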
46. Semantic Technologies
• Querying over RDF data: SPARQL
• Cool features:
– Federated querying (SERVICE keyword): data and computing resources stay distributed over several endpoints
– SPARQL 1.1 Update: modify data
• SPARQL endpoints: SPARQL over HTTP
47. SPARQL Query Syntax
• First example:
SELECT ?subject ?predicate ?object WHERE {
?subject ?predicate ?object.
}
(Generally not a good idea, as it pulls down the whole dataset)
• SELECT lists the variables to bind; the WHERE clause is the graph pattern to match
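How variable binding and graph matching fit together can be sketched in a few lines of Python. This is a naive evaluator for a basic graph pattern (a list of triple patterns, where terms starting with "?" are variables), not how production engines work; the data is hypothetical and "a" is written out as rdf:type.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def match_pattern(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`, or return None.
    Terms starting with '?' are variables; anything else must match exactly."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if result.get(term, value) != value:  # already bound to something else
                return None
            result[term] = value
        elif term != value:
            return None
    return result


def evaluate_bgp(patterns: List[Triple], data: Iterable[Triple]) -> List[Binding]:
    """All patterns must match, and a variable shared between patterns
    must get the same value in each of them (an implicit join)."""
    triples = list(data)
    solutions: List[Binding] = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for binding in solutions
            for triple in triples
            if (extended := match_pattern(pattern, triple, binding)) is not None
        ]
    return solutions


# Hypothetical data; in SPARQL, "a" is shorthand for rdf:type.
DATA: Set[Triple] = {
    ("ex:Duck", "rdf:type", "rdfs:Class"),
    ("ex:Duck", "rdfs:label", '"Duck"'),
    ("ex:Goose", "rdf:type", "rdfs:Class"),
}

if __name__ == "__main__":
    print(len(evaluate_bgp([("?subject", "?predicate", "?object")], DATA)))
    print(evaluate_bgp([("?class", "rdf:type", "rdfs:Class"),
                        ("?class", "rdfs:label", "?label")], DATA))
```

The first call mirrors the SELECT ?subject ?predicate ?object query and returns one solution per triple; the second mirrors the classes-with-label query, where ex:Goose drops out because it has no label.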
51. SPARQL Query Syntax
• Find all classes:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?class ?label WHERE {
?class a rdfs:Class.
?class rdfs:label ?label.
}
(This will only retrieve classes that have a label)
52. SPARQL Query Syntax
• Find all classes:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?class ?label WHERE {
?class a rdfs:Class.
OPTIONAL {
?class rdfs:label ?label.
}
}
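OPTIONAL behaves like a left outer join: every class is kept, and ?label stays unbound when a class has no label. A minimal Python sketch of the OPTIONAL query above over hypothetical data, using None for an unbound variable; the function name is illustrative only.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical data: ex:Goose is a class without a label.
DATA: Set[Triple] = {
    ("ex:Duck", "rdf:type", "rdfs:Class"),
    ("ex:Duck", "rdfs:label", '"Duck"'),
    ("ex:Goose", "rdf:type", "rdfs:Class"),
}


def classes_with_optional_label(data: Set[Triple]) -> List[Dict[str, Optional[str]]]:
    """Every rdfs:Class, plus its label(s) if present; None marks an unbound ?label."""
    rows: List[Dict[str, Optional[str]]] = []
    classes = sorted(s for s, p, o in data if p == "rdf:type" and o == "rdfs:Class")
    for cls in classes:
        labels = sorted(o for s, p, o in data if s == cls and p == "rdfs:label")
        if labels:
            rows.extend({"class": cls, "label": label} for label in labels)
        else:
            rows.append({"class": cls, "label": None})  # kept, unlike a plain join
    return rows


if __name__ == "__main__":
    for row in classes_with_optional_label(DATA):
        print(row)
```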
53. SPARQL Query Syntax
• Find all classes whose label contains "duck":
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?class ?label WHERE {
?class a rdfs:Class.
?class rdfs:label ?label.
FILTER( CONTAINS( str(?label), "duck" ) )
}
54. SPARQL Query Syntax
• Make it case-insensitive:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?class ?label WHERE {
?class a rdfs:Class.
?class rdfs:label ?label.
FILTER( CONTAINS( UCASE(str(?label)), "DUCK" ) )
}
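The difference between the two FILTER variants above, sketched in Python with hypothetical labels: CONTAINS is a case-sensitive substring test, and UCASE only upper-cases the label, so the search string has to be written in upper case. SPARQL also offers REGEX(str(?label), "duck", "i") as another case-insensitive option.

```python
from typing import List, Tuple


def contains(label: str, needle: str) -> bool:
    """Like CONTAINS(str(?label), "duck"): a case-sensitive substring test."""
    return needle in label


def contains_ci(label: str, needle: str) -> bool:
    """Like CONTAINS(UCASE(str(?label)), "DUCK"): only the label is upper-cased,
    so the needle must already be in upper case."""
    return needle in label.upper()


# Hypothetical (class, label) pairs.
LABELS: List[Tuple[str, str]] = [
    ("ex:Duck", "Duck"),
    ("ex:Mallard", "mallard duck"),
    ("ex:Goose", "Goose"),
]

if __name__ == "__main__":
    print([cls for cls, label in LABELS if contains(label, "duck")])     # ['ex:Mallard']
    print([cls for cls, label in LABELS if contains_ci(label, "DUCK")])  # ['ex:Duck', 'ex:Mallard']
```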
55. SPARQL Query Syntax
• Search in a specific graph (FROM sets the default graph):
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?class ?label
FROM <http://example.org/animals>
WHERE {
?class a rdfs:Class.
?class rdfs:label ?label.
FILTER( CONTAINS( UCASE(str(?label)), "DUCK" ) )
}
56. SPARQL Query Syntax
• Search in a specific named graph (GRAPH clause):
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?class ?label WHERE {
GRAPH <http://example.org/animals> {
?class a rdfs:Class.
?class rdfs:label ?label.
FILTER( CONTAINS( UCASE(str(?label)), "DUCK" ) )
}
}
57. SPARQL Query Syntax
• Can also search for the graphs that contain matches:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?g WHERE {
GRAPH ?g {
?class a rdfs:Class.
?class rdfs:label ?label.
FILTER( CONTAINS( UCASE(str(?label)), "DUCK" ) )
}
}
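A Python sketch of how named graphs are searched, assuming a dataset held as a dict from graph IRI to the triples in that graph (graphs, classes and labels are hypothetical). Looking only at the entry for http://example.org/animals mirrors the FROM and GRAPH <...> queries; looping over all entries and collecting the graph IRIs mirrors GRAPH ?g. Each graph is reported once, as with SELECT DISTINCT ?g.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical dataset: graph IRI -> triples stored in that named graph.
DATASET: Dict[str, Set[Triple]] = {
    "http://example.org/animals": {
        ("ex:Duck", "rdf:type", "rdfs:Class"),
        ("ex:Duck", "rdfs:label", "Duck"),
    },
    "http://example.org/toys": {
        ("ex:RubberDuck", "rdf:type", "rdfs:Class"),
        ("ex:RubberDuck", "rdfs:label", "Rubber duck"),
    },
    "http://example.org/vehicles": {
        ("ex:Truck", "rdf:type", "rdfs:Class"),
        ("ex:Truck", "rdfs:label", "Truck"),
    },
}


def graphs_with_duck_classes(dataset: Dict[str, Set[Triple]]) -> List[str]:
    """Evaluate the pattern inside each named graph separately and return
    every graph IRI in which it matches (like GRAPH ?g { ... })."""
    hits = []
    for graph_iri, triples in dataset.items():
        classes = {s for s, p, o in triples if p == "rdf:type" and o == "rdfs:Class"}
        if any(s in classes and p == "rdfs:label" and "DUCK" in o.upper()
               for s, p, o in triples):
            hits.append(graph_iri)
    return sorted(hits)


if __name__ == "__main__":
    print(graphs_with_duck_classes(DATASET))
```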
59. Take home Summary
• Basic data element = a triple
– A mini sentence
– Contains three terms: subject, predicate, object
• Example:
<http://xmpl/entities#martijn>
<http://xmpl/relations#has_favorite_beer>
<http://xmpl/entities#karmeliet>.
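Written on a single line, the example triple is valid N-Triples, the simplest RDF syntax and a subset of Turtle: each IRI in angle brackets, terms separated by spaces, and a " ." at the end of every line. A minimal Python serializer for IRI-only triples; literals and blank nodes are deliberately left out of this sketch.

```python
from typing import Iterable, Tuple

Triple = Tuple[str, str, str]


def to_ntriples(triples: Iterable[Triple]) -> str:
    """Serialize IRI-only triples as N-Triples: one triple per line,
    each IRI in angle brackets, each line ending with ' .'."""
    return "".join(f"<{s}> <{p}> <{o}> .\n" for s, p, o in triples)


# The example triple from the summary slide.
EXAMPLE = [(
    "http://xmpl/entities#martijn",
    "http://xmpl/relations#has_favorite_beer",
    "http://xmpl/entities#karmeliet",
)]

if __name__ == "__main__":
    print(to_ntriples(EXAMPLE), end="")
```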
64. Interoperability between OBO and
Semantic Technologies
• Originated from two separate academic worlds
• Computational applications of OBO: mainly consistency checking and over-representation analysis
• Semantic Technologies: much broader toolset
• Interoperability?
– Direct offering in both formats
– Automated mapping
65. Where to find ontologies
• OBO Foundry
• BioPortal (NCBO)
• BioGateway
• Bio2RDF
66. Where to find RDF data
• Search the web for "SPARQL endpoint"
– e.g. EBI databases
• Non biological: DBpedia
67. What about Tim Berners-Lee's Semantic Web vision?
• We’re not there yet, but for bio data we’re
getting quite close
– The explicitome
– Crowd sourcing
– Nanopublications
79. Running SPARQL
• From a web interface
• Over HTTP
– HTTP GET
– HTTP POST: for larger query strings
– The Accept header determines the response format (JSON, XML, HTML)
http://…/sparql?default-graph-uri=http://graphName&query=URLENCODEDQUERYSTRING (parameter values URL-encoded)
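A sketch of the GET variant using only Python's standard library. The endpoint URL and graph IRI are placeholders, not real services, and the request is built but not sent. urlencode takes care of the URL encoding, and the Accept header asks for the standard SPARQL JSON results format (application/sparql-results+json).

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder endpoint: replace with a real SPARQL endpoint.
ENDPOINT = "http://example.org/sparql"

QUERY = """PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?class ?label WHERE {
  ?class a rdfs:Class.
  ?class rdfs:label ?label.
} LIMIT 10"""


def build_get_request(endpoint: str, query: str, default_graph: str) -> Request:
    """Build (but do not send) a SPARQL protocol GET request.
    urlencode percent-encodes both the graph IRI and the query text."""
    params = urlencode({"default-graph-uri": default_graph, "query": query})
    return Request(
        f"{endpoint}?{params}",
        headers={"Accept": "application/sparql-results+json"},  # ask for JSON results
        method="GET",
    )


request = build_get_request(ENDPOINT, QUERY, "http://example.org/animals")

if __name__ == "__main__":
    print(request.full_url)
```

Passing request to urllib.request.urlopen would send it. For long queries, the SPARQL 1.1 Protocol also allows a POST, with the query either in a form-encoded body or as an application/sparql-query body.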