Distributed Query Processing for Federated RDF Data Management - Olaf Goerlitz
PhD defense talk about SPLENDID, a state-of-the-art implementation for efficient distributed SPARQL query processing on Linked Data using SPARQL endpoints and voiD descriptions.
aRangodb, a package for using ArangoDB with R - GraphRM
Talk language: Italian.
Description:
In this talk we will discuss how to integrate and use ArangoDB, a multi-model database with native graph support, with R. We will then present aRangodb, the package we developed to provide a simpler and more intuitive interface to the database. During the talk we will show how the package can be used for data science through some concrete case studies.
Speaker:
Gabriele Galatolo - Data Scientist - Kode srl
The seminar presents the emerging topic of the Web of Data within the Semantic Web. It examines the difficulties encountered in accessing the enormous amount of information currently available on the Web, and the advantages of an approach based on the interactive construction of queries.
Transforming Your Data with GraphDB: GraphDB Fundamentals, Jan 2018 - Ontotext
These are slides from a live webinar held in January 2018.
GraphDB™ Fundamentals builds the basis for working with graph databases that utilize the W3C standards, and particularly GraphDB™. In this webinar, we demonstrated how to install and set up GraphDB™ 8.4 and how to create your first RDF dataset. We also showed how to quickly integrate complex and highly interconnected data using RDF and SPARQL, and much more.
With the help of GraphDB™, you can start managing your data assets smartly, visually represent your data model and gain insights from it.
Property graph vs. RDF Triplestore comparison in 2020 - Ontotext
This presentation ranges from an introduction to what graph databases are, to a table comparing RDF and property graphs, plus two diagrams presenting the market circa 2020.
[Conference] Cognitive Graph Analytics on Company Data and News - Ontotext
Ontotext introduced their cognitive analytics platform that performs cognitive graph analytics on company data and news. The platform builds large knowledge graphs by integrating data from multiple sources and uses text mining to link news articles to entities in the knowledge graph. It provides functionality for node ranking, similarity analysis and data cleaning to consolidate and reconcile company records across datasets. The platform was demonstrated through a knowledge graph containing over 2 billion facts built by integrating datasets like DBpedia, Geonames, and news article metadata.
Reasoning with Big Knowledge Graphs: Choices, Pitfalls and Proven Recipes - Ontotext
This presentation will provide a brief introduction to logical reasoning and an overview of the most popular semantic schema and ontology languages: RDFS and the profiles of OWL 2.
While automatic reasoning has always inspired the imagination, numerous projects have failed to deliver on their promises. The typical pitfalls related to ontologies and symbolic reasoning fall into three categories:
- Over-engineered ontologies. The selected ontology language and modeling patterns can be too expressive. This can make the results of inference hard to understand and verify, which in turn makes the KG hard to evolve and maintain. It can also impose performance penalties far greater than the benefits.
- Inappropriate reasoning support. There are many inference algorithms and implementation approaches that work well with taxonomies and conceptual models of a few thousand concepts, but cannot cope with KGs of millions of entities.
- Inappropriate data layer architecture. One such example is reasoning over virtual KGs, which is often infeasible.
Analytics on Big Knowledge Graphs Deliver Entity Awareness and Help Data Linking - Ontotext
A presentation by Ontotext's CEO Atanas Kiryakov, given during Semantics 2018, an annual conference that brings together researchers and professionals from all over the world to share knowledge and expertise on semantic computing.
1) The document compares different methods for representing statement-level metadata in RDF, including RDF reification, singleton properties, and RDF*.
2) It benchmarks the storage size and query execution time of representing biomedical data using each method in the Stardog triplestore.
3) The results show that RDF* requires fewer triples but the database size is larger, and it outperforms the other methods for complex queries.
This document summarizes an introductory webinar on building an enterprise knowledge graph from RDF data using TigerGraph. It introduces RDF and knowledge graphs, demonstrates loading DBpedia data into a TigerGraph graph database using a universal schema, and provides examples of queries to extract information from the graph such as related people, publishers by location, and related topics for a given predicate. The webinar encourages attendees to learn more about graph databases and TigerGraph through additional resources and future webinar episodes.
MongoDB and Spring - Two leaves of a same tree - MongoDB
Enterprise systems evolve at a tremendous pace these days: all sorts of new frameworks, databases, operating systems, deployment strategies and infrastructures emerge to adjust to ever-growing business demands.
The integration between the Spring Framework and MongoDB tends to be somewhat unknown. This presentation covers the different projects that compose the Spring ecosystem (Spring Data, Spring Boot, Spring IO, etc.) and how to move from pure Java projects to massive enterprise systems that require these systems to interact.
This document summarizes an internship at Lama Capital Management focused on programming financial strategies. The internship involved:
1) Using JavaScript to plot return-vs-frequency curves from calculations in Python, linking the frontend and backend of websites.
2) Scraping historical data from websites using Python scripts to develop trading strategies and test them on daily data over 15 days.
3) Implementing strategies like ACD and RSI in Python, including defining triggers and optimizing parameters like ATR, entry/exit points, and profit booking levels to maximize win rates.
4) Programming in Python to retrieve real-time market data, generate buy/sell signals, and place live trades through APIs.
How Google is using linked data today and vision for tomorrow - Vasu Jain
In this presentation, I will discuss how modern search engines, such as Google, make use of Linked Data spread in Web pages for displaying Rich Snippets. I will also present an example of the technology and analyze its current uptake.
I then sketch some ideas on how Rich Snippets could be extended in the future, in particular for multimedia documents.
Original paper:
http://scholar.google.com/citations?view_op=view_citation&hl=en&user=K3TsGbgAAAAJ&authuser=1&citation_for_view=K3TsGbgAAAAJ:u-x6o8ySG0sC
Another Presentation by Author: https://docs.google.com/present/view?id=dgdcn6h3_185g8w2bdgv&pli=1
Multiplatform Solution for Graph Datasources - Stratio
One of the top banks in Europe needed a system providing better performance, scaling almost linearly with the increase in the information to be analyzed, and allowing the processes currently executed on the host to be moved to a Big Data infrastructure. Over the course of a year we worked on a system that provides greater agility, flexibility and simplicity for users viewing information when profiling, and that can now analyze the structure of profile data. It is a powerful way to run online queries against a graph database, integrated with Apache Spark and different graph libraries. Essentially, we obtain all the necessary information through Cypher queries sent to a Neo4j database.
Using the latest Big Data technologies, such as Spark DataFrames, HDFS, Stratio Intelligence and Stratio Crossdata, we have developed a solution able to obtain critical information from multiple datasources, such as text files or graph databases.
While the adoption of machine learning and deep learning techniques continues to grow, many organizations find it difficult to actually deploy these sophisticated models into production. It is common to see data scientists build powerful models, yet these models are not deployed because of the complexity of the technology used or a lack of understanding of the process of pushing models into production.
As part of this talk, I will review several deployment design patterns for both real-time and batch use cases. I'll show how these models can be deployed as scalable, distributed deployments within the cloud, scaled across Hadoop clusters, served as APIs, and embedded within streaming analytics pipelines. I will also touch on topics related to security, end-to-end governance, pitfalls, challenges, and useful tools across a variety of platforms. This presentation will involve demos and sample code for the deployment design patterns.
Generating Executable Mappings from RDF Data Cube Data Structure Definitions - Christophe Debruyne
Data processing is increasingly the subject of various internal and external regulations, such as GDPR, which has recently come into effect. Instead of assuming that such processes avail of data sources (such as files and relational databases), we approach the problem in a more abstract manner and view these processes as taking datasets as input. These datasets are then created by pulling data from various data sources. Taking a W3C Recommendation for prescribing the structure of and for describing datasets, we investigate an extension of that vocabulary for the generation of executable R2RML mappings. This results in a top-down approach where one prescribes the dataset to be used by a data process and where to find the data, and where that prescription is subsequently used to retrieve the data for the creation of the dataset "just in time". We argue that this approach to the generation of an R2RML mapping from a dataset description is the first step towards policy-aware mappings, where the generation takes into account regulations to generate mappings that are compliant. In this paper, we describe how one can obtain an R2RML mapping from a data structure definition in a declarative manner using SPARQL CONSTRUCT queries, and demonstrate it using a running example. Some of the more technical aspects are also described.
Reference: Christophe Debruyne, Dave Lewis, Declan O'Sullivan: Generating Executable Mappings from RDF Data Cube Data Structure Definitions. OTM Conferences (2) 2018: 333-350
Guest lecture at the Syracuse University School of Information Studies eScience Librarianship Lecture Series (08 Dec 2011).
Description: It’s your government, is it your data? New approaches to building interlinked catalogs of government-produced data. Dr. John S. Erickson, Director of Web Science Operations for the Tetherless World Constellation at Rensselaer Polytechnic Institute will present technical methods being developed to manage the delivery of large-scale open government data projects based on semantic web and linked data best practices.
Data Day Seattle 2017: Scaling Data Science at Stitch Fix - Stefan Krawczyk
At Stitch Fix we have a lot of Data Scientists: around eighty at last count. One reason why I think we have so many is that we do things differently. To get their work done, Data Scientists have access to whatever resources they need (within reason), because they're end-to-end responsible for their work; they collaborate with their business partners on objectives and then prototype, iterate, productionize, monitor and debug everything and anything required to get the desired output. They're full data-stack data scientists!
The teams in the organization do a variety of different tasks:
- Clothing recommendations for clients.
- Clothes reordering recommendations.
- Time series analysis & forecasting of inventory, client segments, etc.
- Warehouse worker path routing.
- NLP.
… and more!
They're also quite prolific at what they do -- we are approaching 4500 job definitions at last count. So one might be wondering: how have we enabled them to get their jobs done without getting in the way of each other?
This is where the Data Platform team comes into play. With the goal of lowering the cognitive overhead and engineering effort required on the part of the Data Scientist, the Data Platform team tries to provide abstractions and infrastructure to help the Data Scientists. The relationship is a collaborative partnership, where the Data Scientist is free to make their own decisions and thus choose the way they do their work, and the onus then falls on the Data Platform team to convince Data Scientists to use their tools; the easiest way to do that is by designing the tools well.
In regard to scaling Data Science, the Data Platform team has helped establish some patterns and infrastructure that help alleviate contention. Contention on:
- Access to Data
- Access to Compute Resources:
  - Ad-hoc compute (think prototype, iterate, workspace)
  - Production compute (think where things are executed once they're needed regularly)
For the talk (and this post) I focused only on how we reduced contention on Access to Data and Access to Ad-hoc Compute to enable Data Science to scale at Stitch Fix. With that, I invite you to take a look through the slides.
Boost your data analytics with open data and public news content - Ontotext
Get guidance through the gigantic sea of freely available Open Data and learn how it can empower your analysis of any kind of source.
This webinar is a live demo of news and data analytics, based on rich links within big knowledge graphs. It will show you how to:
Build ranking reports (e.g. for people and organisations)
View topics linked implicitly (e.g. daughter companies, key personnel, products …)
Draw trend lines
Extend your analytics with additional data sources
This document outlines an intro to JavaScript fundamentals course, including:
- An overview of the instructor and TAs
- A description of the agenda which includes learning key JavaScript concepts, assignments, and an answer key
- Explanations of how the web works, client/server relationships, and an example using Facebook
- The history and modern use of JavaScript
- Demonstrations of JavaScript fundamentals like variables, functions, if/else statements, comparing values, and using parameters
- Encouragement to use online resources like Google and Repl.it for hands-on practice
- Information on continuing learning opportunities from Thinkful
These are our contributions to Data Science projects, as developed in our startup. They are part of partner trainings and of in-house design, development and testing of course material and concepts in Data Science and Engineering. It covers data ingestion, data wrangling, feature engineering, data analysis, data storage, data extraction, querying data, and formatting and visualizing data for various dashboards. Data is prepared for accurate ML model predictions and Generative AI apps.
This is our project work at our startup for Data Science. This is part of our internal training and focused on data management for AI, ML and Generative AI apps
This document outlines an intro to JavaScript fundamentals course, including:
- An overview of the instructor and TAs
- Learning key JavaScript concepts like variables, functions, if/else statements
- Examples of how the web works with clients and servers
- A brief history of JavaScript and how it has evolved
- Using Repl.it to do hands-on coding challenges
Matt Archer - How To Regression Test A Billion Rows Of Financial Data Every Sprint - TEST Huddle
EuroSTAR Software Testing Conference 2012 presentation on How To Regression Test A Billion Rows Of Financial Data Every Sprint by Matt Archer.
See more at: http://conference.eurostarsoftwaretesting.com/past-presentations/
This document discusses creating a knowledge graph for Irish history as part of the Beyond 2022 project. It will include digitized records from core partners documenting seven centuries of Irish history. Entities like people, places, and organizations will be extracted from source documents and related in a knowledge graph using semantic web technologies. An ontology was created to provide historical context and meaning to the relationships between entities in Irish history. Tools will be developed to explore and search the knowledge graph to advance historical research.
This document presents an interest-based approach for propagating RDF updates between a source dataset and local replicas. The traditional approach of fully synchronizing all changes is not scalable. The proposed approach uses SPARQL queries to define interests, and only propagates changes that match the interests to the replicas. This cuts down the size of updates significantly. Experimental results show the interesting changes were 0.38-4.38% of removed triples and 0.34-1.81% of added triples, reducing overhead of synchronization.
Profiling User Interests on the Social Semantic Web - Fabrizio Orlandi
Fabrizio Orlandi's PhD Viva @Insight NUI Galway (ex-DERI) - 31/03/2014.
Supervisors: Alexandre Passant and John G. Breslin.
Examiners: Fabien Gandon and Stefan Decker
Multi-Source Provenance-Aware User Interest Profiling on the Social Semantic Web - Fabrizio Orlandi
This document discusses improving user interest profiling techniques by leveraging linked data, the provenance of data, and the social semantic web. It aims to address challenges like information isolation across social media sites and the lack of provenance on the web of data. Key research questions focus on how to extract and aggregate user information from social media following linked data principles, the role of provenance for user profiling, and how to use the web of data and semantic technologies to enrich profiles. The work aims to represent user profiles interoperably and adapt profiling algorithms to different social media and data origins.
Semantic user profiling and personalised filtering of the Twitter stream - Fabrizio Orlandi
Presentation at Kno.e.sis - Feb 2012.
The presentation describes my current PhD research at DERI and the work done in 5 weeks during a collaboration with Kno.e.sis, with Pavan Kapanipathi, Prof. Amit Sheth, Prof. T. K. Prasad and the rest of the group.
- video: http://youtu.be/MmF5HxIVUwA
Semantic Representation of Provenance in Wikipedia - Fabrizio Orlandi
This document discusses representing provenance information from Wikipedia articles using semantic web technologies. The authors present a semantic model based on SIOC and the W7 model to represent provenance using RDF triples. They describe extracting provenance data from Wikipedia revisions and applying their model to over 166 articles in the "Semantic Web" category. An application was created to access and expose the provenance data, allowing statistics about article edits to be viewed on Wikipedia pages and as linked open data. Future work could include refining the provenance model and improving the performance of the application.
Semantic search on heterogeneous wiki systems - Wikimania 2010 - Fabrizio Orlandi
1) The document proposes using Linked Data principles and extending the SIOC ontology to semantically interconnect heterogeneous wiki systems and enable semantic search across them.
2) Key wiki features like categorization, tagging, discussions, and versioning are modeled in the extended SIOC ontology.
3) Plugins are developed for MediaWiki and DokuWiki to export semantic data using the extended SIOC model, allowing semantic queries across wiki platforms.
Semantic Search on Heterogeneous Wiki Systems - WikiSym 2010 - Fabrizio Orlandi
This document discusses enabling semantic search across heterogeneous wiki systems by extending the Semantically Interlinked Online Communities (SIOC) ontology to model relevant wiki features. It proposes modeling multi-authoring, categories, tagging, discussions, backlinks, and page versioning in SIOC. It also describes a MediaWiki exporter that generates RDF using the extended SIOC model to expose wiki data and link wiki pages following Linked Data practices.
Semantic Search on Heterogeneous Wiki Systems - poster - Fabrizio Orlandi
This document describes a system for enabling semantic search across heterogeneous wiki systems using Semantic Web technologies. The key contributions are:
1) Developing a common RDF model for representing wiki structure and contributions to encompass previous models.
2) Extracting semantic data from different wiki engines and loading it into a Sesame RDF store, totaling around 45,500 triples.
3) Building an application with a simple interface that allows semantic searching and browsing across linked wikis in less than 3 seconds.
Semantic Search on Heterogeneous Wiki Systems - Short - Fabrizio Orlandi
1) The document discusses a system to enable semantic search across heterogeneous wiki systems using Semantic Web technologies.
2) Key aspects of the system include a common semantic model based on the SIOC ontology to represent wiki structure and contributions, data extractors to translate wiki data to RDF, and an application with a user interface to enable semantic search and browsing across different interlinked wikis.
3) The system was able to semantically search and link information across 5 different wiki sites containing over 3000 articles and 700 users.
Enabling cross-wikis integration by extending the SIOC ontology - Fabrizio Orlandi
This document discusses enabling cross-wiki integration by extending the SIOC (Semantically-Interlinked Online Communities) ontology. It presents an approach to represent wiki structures and social interactions in a unified way using SIOC. An exporter was developed to translate MediaWiki pages into SIOC data following Linked Data principles. Querying this integrated data across wikis and other social platforms was demonstrated. Further work is needed to develop exporters for other wiki platforms and improve modeling of wiki page content and versioning systems.
Global Situational Awareness of A.I. and where it's headed - vikram sood
You can see the future first in San Francisco.
Over the past year, the talk of the town has shifted from $10 billion compute clusters to $100 billion clusters to trillion-dollar clusters. Every six months another zero is added to the boardroom plans. Behind the scenes, there’s a fierce scramble to secure every power contract still available for the rest of the decade, every voltage transformer that can possibly be procured. American big business is gearing up to pour trillions of dollars into a long-unseen mobilization of American industrial might. By the end of the decade, American electricity production will have grown tens of percent; from the shale fields of Pennsylvania to the solar farms of Nevada, hundreds of millions of GPUs will hum.
The AGI race has begun. We are building machines that can think and reason. By 2025/26, these machines will outpace college graduates. By the end of the decade, they will be smarter than you or I; we will have superintelligence, in the true sense of the word. Along the way, national security forces not seen in half a century will be unleashed, and before long, The Project will be on. If we're lucky, we'll be in an all-out race with the CCP; if we're unlucky, an all-out war.
Everyone is now talking about AI, but few have the faintest glimmer of what is about to hit them. Nvidia analysts still think 2024 might be close to the peak. Mainstream pundits are stuck on the wilful blindness of “it’s just predicting the next word”. They see only hype and business-as-usual; at most they entertain another internet-scale technological change.
Before long, the world will wake up. But right now, there are perhaps a few hundred people, most of them in San Francisco and the AI labs, that have situational awareness. Through whatever peculiar forces of fate, I have found myself amongst them. A few years ago, these people were derided as crazy—but they trusted the trendlines, which allowed them to correctly predict the AI advances of the past few years. Whether these people are also right about the next few years remains to be seen. But these are very smart people—the smartest people I have ever met—and they are the ones building this technology. Perhaps they will be an odd footnote in history, or perhaps they will go down in history like Szilard and Oppenheimer and Teller. If they are seeing the future even close to correctly, we are in for a wild ride.
Let me tell you what we see.
Analysis insight about a Flyball dog competition team's performance - roli9797
Insights from my analysis of a Flyball dog competition team's performance over the last year. Find more: https://github.com/rolandnagy-ds/flyball_race_analysis/tree/main
Learn SQL from basic queries to advanced queries - manishkhaire30
Dive into the world of data analysis with our comprehensive guide on mastering SQL! This presentation offers a practical approach to learning SQL, focusing on real-world applications and hands-on practice. Whether you're a beginner or looking to sharpen your skills, this guide provides the tools you need to extract, analyze, and interpret data effectively.
Key Highlights:
Foundations of SQL: Understand the basics of SQL, including data retrieval, filtering, and aggregation.
Advanced Queries: Learn to craft complex queries to uncover deep insights from your data.
Data Trends and Patterns: Discover how to identify and interpret trends and patterns in your datasets.
Practical Examples: Follow step-by-step examples to apply SQL techniques in real-world scenarios.
Actionable Insights: Gain the skills to derive actionable insights that drive informed decision-making.
Join us on this journey to enhance your data analysis capabilities and unlock the full potential of SQL. Perfect for data enthusiasts, analysts, and anyone eager to harness the power of data!
#DataAnalysis #SQL #LearningSQL #DataInsights #DataScience #Analytics
The Building Blocks of QuestDB, a Time Series Database - javier ramirez
Talk delivered at Valencia Codes Meetup 2024-06.
Traditionally, databases have treated timestamps just as another data type. However, when performing real-time analytics, timestamps should be first-class citizens, and we need rich time semantics to get the most out of our data. We also need to deal with ever-growing datasets while staying performant, which is as fun as it sounds.
It is no wonder time-series databases are now more popular than ever before. Join me in this session to learn about the internal architecture and building blocks of QuestDB, an open source time-series database designed for speed. We will also review some of the changes we have gone through over the past two years to deal with late and unordered data, non-blocking writes, read-replicas, or faster batch ingestion.
Enhanced Enterprise Intelligence with your personal AI Data Copilot - GetInData
Recently we have observed the rise of open-source Large Language Models (LLMs) that are community-driven or developed by the AI market leaders, such as Meta (Llama3), Databricks (DBRX) and Snowflake (Arctic). On the other hand, there is a growth in interest in specialized, carefully fine-tuned yet relatively small models that can efficiently assist programmers in day-to-day tasks. Finally, Retrieval-Augmented Generation (RAG) architectures have gained a lot of traction as the preferred approach for LLMs context and prompt augmentation for building conversational SQL data copilots, code copilots and chatbots.
In this presentation, we will show how we built upon these three concepts a robust Data Copilot that can help to democratize access to company data assets and boost performance of everyone working with data platforms.
Why do we need yet another (open-source) Copilot?
How can we build one?
Architecture and evaluation
3. Knowledge Graphs - Example
Image source: https://aws.amazon.com/neptune/
When did this occur? What is the time span? (Valid time)
4. Knowledge Graphs - Example
When did this occur? What is the time span? (Valid time)
What's the confidence of this fact? (Certainty)
5. Knowledge Graphs - Example
When did this occur? What is the time span? (Valid time)
When were these facts created? What's their time validity? (Transaction time)
What's the confidence of this fact? (Certainty)
6. Knowledge Graphs - Example
When did this occur? What is the time span? (Valid time)
When were these facts created? What's their time validity? (Transaction time)
What's the confidence of this fact? (Certainty)
Where does this data come from? (Provenance)
7. Popular Use Cases for Contextual Metadata
● Temporal aspects of facts are usually not reflected in KGs (When are specific statements - triples - valid?)
● Facts extracted from heterogeneous data sources hold different degrees of certainty, depending on the source or the extraction/generation process
● Missing efficient solutions for managing the dynamics (the evolution) of KGs (When were specific statements added/updated?)
● Need for data provenance: what's the origin of the data?
8. Data Provenance with PROV-O
Provenance (W3C definition¹):
“Provenance of a resource is a record that describes entities and processes involved in producing and delivering or otherwise influencing that resource. Provenance provides a critical foundation for assessing authenticity, enabling trust, and allowing reproducibility. Provenance assertions are a form of contextual metadata and can themselves become important records with their own provenance.”
PROV-O: W3C ontology (OWL) based on the core PROV data model
http://www.w3.org/TR/prov-o/
¹ https://www.w3.org/2005/Incubator/prov/wiki/What_Is_Provenance
11. Example of Statement-Level Metadata
Subject | Predicate | Object | Starts | Ends
Cristiano Ronaldo | team | Real Madrid | 1 July 2009 | 10 July 2018
Cristiano Ronaldo | team | Juventus | 11 July 2018 |
How to represent this in a graph? This is the problem of n-ary (not binary) relations...
12. RDF graphs vs. Property graphs
RDF Graphs
● Formally defined data model
● Various well-defined serialization formats
● Well-defined query language with a formal semantics
● Natural support for globally unique identifiers
● Semantics of data can be made explicit in the data itself
● W3C recommendations (standards!)
● High usage complexity
Labeled-Property Graphs (e.g. Neo4j)
● Easy to manage statement-level metadata
● Efficient graph traversals
● Fast and scalable implementations
● No open standards defined
● Different proprietary implementations and query languages
● Good adoption in enterprise
13. RDF graphs vs. Property graphs
RDF Graphs
● Vertices: every statement produces two vertices in the graph; some are uniquely identified by URIs (Resources), some are property values (e.g. Literals)
● Edges: every statement produces an edge, uniquely identified by a URI
● Vertices and edges have NO internal structure
Labeled-Property Graphs (e.g. Neo4j)
● Vertices: unique id + set of key-value pairs
● Edges: unique id + set of key-value pairs
● Vertices and edges have internal structure
14. RDF graphs vs. Property graphs
Query: Who likes a person named "Ann"?
SPARQL:
SELECT ?who
WHERE {
  ?who :likes ?a .
  ?a rdf:type :Person .
  ?a :name ?aName .
  FILTER regex(?aName, "Ann")
}
Cypher (Neo4j):
MATCH (who)-[:LIKES]->(a:Person)
WHERE a.name CONTAINS 'Ann'
RETURN who
15. Statement-Level Metadata with Property Graphs
Subject | Predicate | Object | Starts | Ends
Cristiano_Ronaldo | team | Real_Madrid | 1 July 2009 | 10 July 2018
In a property graph, the dates become key-value properties on the edge itself:
(Cristiano Ronaldo) -[team { starts: 2009-07-01, ends: 2018-07-10 }]-> (Real Madrid)
16. Modelling (1) - RDF Reification
Subject | Predicate | Object | Starts | Ends
Cristiano_Ronaldo | team | Real_Madrid | 1 July 2009 | 10 July 2018
Reified form (the statement node Stmt1 describes the original triple):
Subject | Predicate | Object
Cristiano_Ronaldo | team | Real_Madrid
Stmt1 | type | Statement
Stmt1 | subject | Cristiano_Ronaldo
Stmt1 | predicate | team
Stmt1 | object | Real_Madrid
Stmt1 | starts | 2009-07-01
Stmt1 | ends | 2018-07-10
17. Modelling (1) - RDF Reification
Subject | Predicate | Object | Starts | Ends
Cristiano_Ronaldo | team | Real_Madrid | 1 July 2009 | 10 July 2018
Subject | Predicate | Object
Cristiano_Ronaldo | team | Real_Madrid
Stmt1 | type | Statement
Stmt1 | subject | Cristiano_Ronaldo
Stmt1 | predicate | team
Stmt1 | object | Real_Madrid
Stmt1 | starts | 2009-07-01
Stmt1 | ends | 2018-07-10
Pros:
1. Easy to understand
Cons:
1. Not scalable => takes 4N extra triples to represent N statements
2. No formal semantics defined
3. Discouraged in LOD!
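In Turtle, the reified version above looks as follows (a minimal sketch: the ex: namespace is illustrative, while rdf:Statement, rdf:subject, rdf:predicate and rdf:object are the standard reification vocabulary):

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:  <http://example.org/> .  # illustrative namespace

# The original triple stays in the data...
ex:Cristiano_Ronaldo ex:team ex:Real_Madrid .

# ...and a statement node reifies it so that metadata can attach to it
ex:Stmt1 a rdf:Statement ;
         rdf:subject   ex:Cristiano_Ronaldo ;
         rdf:predicate ex:team ;
         rdf:object    ex:Real_Madrid ;
         ex:starts     "2009-07-01"^^xsd:date ;
         ex:ends       "2018-07-10"^^xsd:date .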
18. Modelling (2) - Singleton Property
Vinh Nguyen, Olivier Bodenreider, and Amit Sheth. "Don't like RDF reification?: making statements about statements using singleton property." In Proceedings of the 23rd International Conference on World Wide Web, ACM, 2014.
Subject | Predicate | Object | Starts | Ends
Cristiano_Ronaldo | team | Real_Madrid | 1 July 2009 | 10 July 2018
Subject | Predicate | Object
Cristiano_Ronaldo | team#1 | Real_Madrid
team#1 | singletonPropertyOf | team
team#1 | starts | 2009-07-01
team#1 | ends | 2018-07-10
19. Modelling (2) - Singleton Property
Subject | Predicate | Object | Starts | Ends
Cristiano_Ronaldo | team | Real_Madrid | 1 July 2009 | 10 July 2018
Subject | Predicate | Object
Cristiano_Ronaldo | team#1 | Real_Madrid
team#1 | singletonPropertyOf | team
team#1 | starts | 2009-07-01
team#1 | ends | 2018-07-10
Pros:
1. More scalable => only 1 extra triple
Cons:
1. Less intuitive
2. Large number of unique predicates
3. Requires verbose constructs in queries
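A minimal Turtle sketch of the same data under the singleton-property pattern (the ex: namespace is illustrative, and the singleton predicate is written team_1 here because "#" cannot appear in a Turtle local name):

@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:  <http://example.org/> .  # illustrative namespace

# team_1 is a fresh predicate used for exactly this one statement...
ex:Cristiano_Ronaldo ex:team_1 ex:Real_Madrid .

# ...linked back to the generic predicate and carrying the metadata
ex:team_1 ex:singletonPropertyOf ex:team ;
          ex:starts "2009-07-01"^^xsd:date ;
          ex:ends   "2018-07-10"^^xsd:date .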
20. Modelling (3) - RDF* and SPARQL*
Subject | Predicate | Object | Starts | Ends
Cristiano_Ronaldo | team | Real_Madrid | 1 July 2009 | 10 July 2018
RDF extension for nested triples:
<< :Cristiano_Ronaldo :team :Real_Madrid >>
    :starts "2009-07-01" ;
    :ends "2018-07-10" .
SPARQL extension with nested triple patterns:
SELECT ?player WHERE {
  << ?player :team :Real_Madrid >> :starts ?date .
  FILTER (?date >= "2009-07-01")
}
21. Modelling (3) - RDF* and SPARQL*
Subject | Predicate | Object | Starts | Ends
Cristiano_Ronaldo | team | Real_Madrid | 1 July 2009 | 10 July 2018
1. Purely syntactic “sugar” on top of standard RDF and SPARQL
   a. Can be parsed directly into standard RDF and SPARQL
   b. Can be implemented easily by a small wrapper on top of any existing RDF store (DBMS)
2. A logical model in its own right, with the possibility of a dedicated physical schema
   a. Extension of the RDF data model and of SPARQL to capture the notion of nested triples
   b. Supported by some of the most popular triplestores (e.g. Jena, Blazegraph)
O. Hartig: “Foundations of RDF* and SPARQL* - An Alternative Approach to Statement-Level Metadata in RDF.” In Proc. of the 11th Alberto Mendelzon International Workshop on Foundations of Data Management (AMW), 2017.
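For contrast, here is a sketch of the same query written over standard reification (prefixes as in the earlier examples; the example.org namespace is illustrative). This is the verbosity that the nested-triple syntax removes:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX :    <http://example.org/>

SELECT ?player WHERE {
  # One reified statement node per annotated triple
  ?st rdf:type      rdf:Statement ;
      rdf:subject   ?player ;
      rdf:predicate :team ;
      rdf:object    :Real_Madrid ;
      :starts       ?date .
  FILTER (?date >= "2009-07-01"^^xsd:date)
}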
22. Modelling (3) - RDF* and SPARQL*
A recent effort and solution, receiving wider attention and support. Since 2020, part of the W3C “RDF dev community group”: https://w3c.github.io/rdf-star/
Now you can also test it live on Yago (https://yago-knowledge.org)
Try --> https://bit.ly/2V4ARXL
23. Modelling (4) - Named Graphs (Quads)
Carroll, Jeremy J., et al. "Named graphs, provenance and trust." Proceedings of the 14th International Conference on World Wide Web. ACM, 2005.
Subject | Predicate | Object | Starts | Ends
Cristiano_Ronaldo | team | Real_Madrid | 1 July 2009 | 10 July 2018
Subject | Predicate | Object | NG
Cristiano_Ronaldo | team | Real_Madrid | graph_1
graph_1 | starts | 2009-07-01 | graph_X
graph_1 | ends | 2018-07-10 | graph_X
24. Modelling (4) - Named Graphs (Quads)
Subject | Predicate | Object | Starts | Ends
Cristiano_Ronaldo | team | Real_Madrid | 1 July 2009 | 10 July 2018
Subject | Predicate | Object | NG
Cristiano_Ronaldo | team | Real_Madrid | graph_1
graph_1 | starts | 2009-07-01 | graph_X
graph_1 | ends | 2018-07-10 | graph_X
Pros:
1. Intuitive - creates N named graphs for N sources
2. Attaches metadata to a whole set of triples
3. RDF and SPARQL standards: https://www.w3.org/TR/sparql11-query/#specifyingDataset
Cons:
1. Restricts usage of named graphs to provenance only
2. Requires verbose constructs in queries
A possible specification is N-Quads, which extends N-Triples with an optional context value at the fourth position: http://www.w3.org/TR/n-quads/ (W3C Recommendation)
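A minimal N-Quads sketch of the quad table above (IRIs under example.org are illustrative); the fourth element names the graph each triple belongs to:

<http://example.org/Cristiano_Ronaldo> <http://example.org/team> <http://example.org/Real_Madrid> <http://example.org/graph_1> .
<http://example.org/graph_1> <http://example.org/starts> "2009-07-01" <http://example.org/graph_X> .
<http://example.org/graph_1> <http://example.org/ends> "2018-07-10" <http://example.org/graph_X> .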
25. Data Provenance with PROV-O - Example
Expressing statements about statements using Named Graphs and PROV-O:
:graphName prov:wasAttributedTo :Fabrizio
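Spelled out as a minimal TriG sketch (the default namespace is illustrative; prov:wasAttributedTo comes from PROV-O):

@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix :     <http://example.org/> .  # illustrative namespace

# The fact lives inside a named graph...
:graphName {
    :Cristiano_Ronaldo :team :Real_Madrid .
}

# ...and the graph IRI itself carries the provenance metadata
:graphName prov:wasAttributedTo :Fabrizio .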
28. Modelling (5) - Qualifiers in Wikidata
wd:Cristiano_Ronaldo --p:member_of_sports_team--> wds:Statement --ps:member_of_sports_team--> wd:Real_Madrid
wds:Statement --pq:start_time--> 2009-07-01
wds:Statement --pq:end_time--> 2018-07-10
(wd:Cristiano_Ronaldo --wdt:member_of_sports_team--> wd:Real_Madrid is the direct, unqualified triple.)
The prefix p: points not to the object but to a statement node; this node is then the subject of other triples. The prefix ps: within the statement node retrieves the object. The prefix pq: within the statement node retrieves the qualifier information.
PREFIX p: <http://www.wikidata.org/prop/>
PREFIX ps: <http://www.wikidata.org/prop/statement/>
PREFIX pq: <http://www.wikidata.org/prop/qualifier/>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
(see: https://en.wikibooks.org/wiki/SPARQL/WIKIDATA_Qualifiers,_References_and_Ranks)
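A short SPARQL sketch following this pattern, as it could be run against the Wikidata endpoint (assuming the usual Wikidata identifiers: Q11571 for Cristiano Ronaldo, P54 for "member of sports team", and P580/P582 for the start/end time qualifiers):

PREFIX wd:  <http://www.wikidata.org/entity/>
PREFIX p:   <http://www.wikidata.org/prop/>
PREFIX ps:  <http://www.wikidata.org/prop/statement/>
PREFIX pq:  <http://www.wikidata.org/prop/qualifier/>

SELECT ?team ?start ?end WHERE {
  wd:Q11571 p:P54 ?stmt .            # p: leads to the statement node
  ?stmt ps:P54 ?team .               # ps: retrieves the object (the team)
  OPTIONAL { ?stmt pq:P580 ?start }  # qualifier: start time
  OPTIONAL { ?stmt pq:P582 ?end }    # qualifier: end time
}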
31. Summary - Statement-Level Metadata in RDF
1) Standard Reification
2) Singleton Property
3) RDF* / SPARQL*
4) Named Graphs (Quads)
5) Wikidata Qualifiers
32. Research in our group…
How can we effectively represent and manage temporal dynamics and uncertainty of facts in knowledge graphs?
Current activities:
● Model and characterise facts in KGs according to temporal and uncertainty aspects
● Develop solutions for real-time processing, update and propagation of changes in KGs
● Evaluate the developed solutions, applying them to different use cases
33. Research in our group…
- RDF* Observatory: benchmarking RDF*/SPARQL* engines: https://github.com/dgraux/RDFStarObservatory
- A real-time dashboard for Wikidata edits
- Summarising and verbalising the evolution of KGs with Formal Concept Analysis
- A scalable and efficient storage layer for temporal KGs
34. Some Industrial Use-Cases
1) Finance (temporal aspects): data about companies, their shares and markets is complex, available and very time-dependent. → See the “Thomson Reuters” and “Bloomberg” KGs
2) Law / Court Cases (uncertainty): legal search and Q&A systems on large corpora of court cases need the uncertainty dimension for their different information extraction systems. → See “Wolters Kluwer’s KG” and Google’s “Knowledge Vault”
3) News & Social Media (dynamics): very time-dependent and uncertain data which needs an efficient management solution for its dynamics. → See the “GDELT” Global Knowledge Graph project