Many Linked Data datasets model elements in their domains in the form of lists: a countable number of ordered resources.
When publishing these lists in RDF, an important concern is making them easy to consume.
Therefore, a well-known recommendation is to find an existing list modelling solution, and reuse it.
However, a specific domain model can be implemented in different ways and vocabularies may provide alternative solutions.
In this paper, we argue that a wrong decision could have a significant impact in terms of performance and, ultimately, the availability of the data.
We take the case of RDF Lists and make the hypothesis that the efficiency of retrieving sequential linked data depends primarily on how they are modelled (triple-store invariance hypothesis).
To demonstrate this, we survey different solutions for modelling sequences in RDF, and propose a pragmatic approach for assessing their impact on data availability.
Finally, we derive good (and bad) practices on how to publish lists as linked open data.
By doing this, we sketch the foundations of an empirical, task-oriented methodology for benchmarking linked data modelling solutions.
Modelling and Querying Lists in RDF. A Pragmatic Study
1. 1 Knowledge Representation & Reasoning, Computer Science Department
MODELLING AND QUERYING LISTS IN RDF: A PRAGMATIC STUDY
Enrico Daga, The Open University
Albert Meroño-Peñuela, Vrije Universiteit Amsterdam
@albertmeronyo
Enrico Motta, The Open University
QuWeDa 2019: 3rd Workshop on Querying and Benchmarking the Web of Data
ISWC, 26 October, Auckland
2. 2
LOD publishing should make data easy to consume
Modelling choices are often left to subjective judgement
These practices and their reuse are key to query performance
Lists are everywhere! Co-authors, timelines, media, recipes, etc.
MOTIVATION
3. 3
And in MIDI
MOTIVATION
So what do we know about the performance of RDF List solutions?
…
[144, 60, 100]
[128, 60, 64]
…
[Pic of music editing software]
5. 5
What RDF list models are common in LOD? What is their impact on
performance when retrieving them? Can we identify patterns enabling
sustainability?
C1: Survey of common list modelling practices in RDF
C2: A comparison of these practices when queried on common triplestores,
across various list sizes and operations
RESEARCH QUESTIONS & CONTRIBUTIONS
6. 6
CQ1. Full list lookup: What is the ordered content of the list?
CQ2. N-th Lookup: Which is the n-th item in the list?
CQ3. Ordered Range: What are the n…m items in the list?
Aimed at supporting the LOD publishing use case
Do not deal with list management (edit, merge, split, etc.)
Focus on minimal and atomic operations related to list ordered access
REQUIREMENTS (OPERATIONS)
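The three operations above can be illustrated with a minimal in-memory sketch. It assumes a number-based list in which each item carries an integer index; the `ex:` terms (`ex:item`, `ex:index`, `ex:value`) and the function names are hypothetical and do not come from the study, and plain Python tuples stand in for triples in a store.

```python
# Hypothetical number-based list: every "ex:" term below is illustrative only.
TRIPLES = [
    ("ex:list", "ex:item", "ex:i1"), ("ex:i1", "ex:index", 1), ("ex:i1", "ex:value", "a"),
    ("ex:list", "ex:item", "ex:i2"), ("ex:i2", "ex:index", 2), ("ex:i2", "ex:value", "b"),
    ("ex:list", "ex:item", "ex:i3"), ("ex:i3", "ex:index", 3), ("ex:i3", "ex:value", "c"),
    ("ex:list", "ex:item", "ex:i4"), ("ex:i4", "ex:index", 4), ("ex:i4", "ex:value", "d"),
]


def indexed_values(triples, list_id):
    """Return (index, value) pairs for the members of list_id, in no particular order."""
    members = {o for s, p, o in triples if s == list_id and p == "ex:item"}
    index = {s: o for s, p, o in triples if p == "ex:index" and s in members}
    value = {s: o for s, p, o in triples if p == "ex:value" and s in members}
    return [(index[m], value[m]) for m in members]


def full_list(triples, list_id):
    """CQ1: the ordered content of the list."""
    return [v for _, v in sorted(indexed_values(triples, list_id))]


def nth_item(triples, list_id, n):
    """CQ2: the n-th item (1-based), or None if there is no such position."""
    for i, v in indexed_values(triples, list_id):
        if i == n:
            return v
    return None


def ordered_range(triples, list_id, n, m):
    """CQ3: the items at positions n..m (inclusive), in order."""
    return [v for i, v in sorted(indexed_values(triples, list_id)) if n <= i <= m]
```

For example, `ordered_range(TRIPLES, "ex:list", 2, 3)` selects the second and third items. The sketch only pins down what each competency question asks for; it says nothing about how a triplestore would evaluate the corresponding query.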
7. 7
Surveyed from:
W3C standards
The Ontology Design Patterns portal
List choices in RDF datasets from ISWC resource track papers
Linked Open Vocabularies (LOV)
LOD Laundromat/LOD-a-lot file
Findings: RDF Sequences, RDF Lists, URI-based Lists, Number-based
Lists, Timestamp-based Lists, Sequence Ontology Pattern
LIST PATTERNS
8. 8
RDF SEQUENCE AND RDF LIST
[SEQ]
[LIST]
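As a rough illustration of how the two standard patterns differ structurally, the sketch below encodes the same values once as an rdf:Seq (one `rdf:_n` membership property per position) and once as an rdf:List (a chain of `rdf:first`/`rdf:rest` cells ending in `rdf:nil`), then looks up the n-th item in each. The node names (`ex:seq`, `_:c0`, ...) and the lookup counter are assumptions made for this example; a Python dictionary stands in for a store index, so the counts give an intuition only and are not measurements from the study.

```python
def as_rdf_seq(seq_id, values):
    """Encode values as an rdf:Seq: one rdf:_n membership triple per position."""
    triples = [(seq_id, "rdf:type", "rdf:Seq")]
    triples += [(seq_id, f"rdf:_{i}", v) for i, v in enumerate(values, start=1)]
    return triples


def as_rdf_list(prefix, values):
    """Encode values as a chain of rdf:first/rdf:rest cells ending in rdf:nil."""
    triples = []
    for i, v in enumerate(values):
        cell = f"{prefix}{i}"
        nxt = f"{prefix}{i + 1}" if i + 1 < len(values) else "rdf:nil"
        triples.append((cell, "rdf:first", v))
        triples.append((cell, "rdf:rest", nxt))
    return triples


def index_by_subject_predicate(triples):
    """Toy (subject, predicate) -> object index; stands in for a store index."""
    return {(s, p): o for s, p, o in triples}


def nth_in_seq(sp, seq_id, n):
    """n-th item of an rdf:Seq: a single keyed lookup on rdf:_n."""
    return sp[(seq_id, f"rdf:_{n}")], 1  # (value, number of lookups)


def nth_in_list(sp, head, n):
    """n-th item of an rdf:List: follow rdf:rest n-1 times, then read rdf:first."""
    cell, lookups = head, 0
    for _ in range(n - 1):
        cell = sp[(cell, "rdf:rest")]
        lookups += 1
    return sp[(cell, "rdf:first")], lookups + 1


values = ["a", "b", "c", "d"]
seq_index = index_by_subject_predicate(as_rdf_seq("ex:seq", values))
list_index = index_by_subject_predicate(as_rdf_list("_:c", values))
```

In this toy setting, `nth_in_seq(seq_index, "ex:seq", 3)` needs one lookup, while `nth_in_list(list_index, "_:c0", 3)` needs three, because the linked structure has to be traversed from its head.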
16. 16
Coherent behavior among triplestores (model > optimization)
rdf:List elegant but poor performance (Fuseki timeout)
SOP scales better than rdf:List, yet is less efficient than property-based
lists
rdf:Seq and property-based [NUM], [TIME], [URI] perform best
> We hypothesize this is mostly due to P and S-O database indexes, respectively
Virtuoso’s management of OFFSET, LIMIT on [NUM], [TIME]
rdf:Seq is a good trade-off but strictly for open lists
> Indices rdf:_N do not guarantee random access
> Update in SPARQL 1.2 spec?
OBSERVATIONS
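As background for the rdf:_N remark: in an rdf:Seq the position is part of the membership property IRI (`rdf:_1`, `rdf:_2`, ...), so ordering or slicing the members means recovering the integer from the IRI string first. The sketch below shows that step in plain Python; the helper names and the `ex:seq` subject are hypothetical, and this is not a reconstruction of the queries used in the experiments.

```python
import re

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Container membership properties are rdf:_n, with n a positive integer
# written without leading zeros.
MEMBERSHIP = re.compile(re.escape(RDF_NS) + r"_([1-9][0-9]*)$")


def member_position(predicate_iri):
    """Return the integer n of an rdf:_n property IRI, or None for other predicates."""
    match = MEMBERSHIP.match(predicate_iri)
    return int(match.group(1)) if match else None


def ordered_members(triples, seq_iri):
    """Return the members of seq_iri sorted by numeric position, not by IRI string."""
    positioned = []
    for s, p, o in triples:
        pos = member_position(p)
        if s == seq_iri and pos is not None:
            positioned.append((pos, o))
    return [o for _, o in sorted(positioned, key=lambda pair: pair[0])]


# Hypothetical data: note that "_10" sorts before "_2" as a plain string.
SEQ_TRIPLES = [
    ("ex:seq", RDF_NS + "type", RDF_NS + "Seq"),
    ("ex:seq", RDF_NS + "_10", "j"),
    ("ex:seq", RDF_NS + "_2", "b"),
    ("ex:seq", RDF_NS + "_1", "a"),
]
```

Sorting the property IRIs as strings would place `rdf:_10` before `rdf:_2`, which is why the numeric position is extracted before sorting.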
17. 17
Lists are important! But how to assess the impact of their models?
6 common list patterns in RDF and their performance comparison
2 model families: link-based lists, property-based lists
For our CQs, inelegant literal-indexed (property-based) lists outperform link-based lists
Limitations/future work:
Limited set of list operations (rdf:List could win in, e.g., addition)
No triplestore optimization
Apply methodology to other data structures
CONCLUSIONS
18. 18
Questions, comments, suggestions
most welcome
@enridaga
@albertmeronyo
https://github.com/MIDI-LD/List.MID
THANK YOU
19. 19
Motivation
Related Work
Requirements
List Patterns
Queries
Performance experiments
Conclusions
OUTLINE