Modelling and Querying Lists in RDF. A Pragmatic Study

1 Knowledge Representation & Reasoning, Computer Science Department
MODELLING AND QUERYING LISTS
IN RDF: A PRAGMATIC STUDY
Enrico Daga, The Open University
Albert Meroño-Peñuela, Vrije Universiteit Amsterdam
@albertmeronyo
Enrico Motta, The Open University
QuWeDa 2019: 3rd Workshop on Querying and Benchmarking the
Web of Data
ISWC, 26 October, Auckland

2
• LOD publishing should make data easy to consume
• Modelling decisions are often left to subjective choice
• These practices and their reuse are key to query performance
• Lists are everywhere! Co-authors, timelines, media, recipes, etc.
MOTIVATION

3
• And in MIDI
MOTIVATION
• So what do we know about the performance of RDF list solutions?
…
[ 144, 60, 100]
[ 128, 60, 64 ]
…
[Pic of music editing software]

4
• Modelling of RDF lists
> RDF(S) container classes (rdf:Bag, rdf:Alt, rdf:Seq)
> Closed collections (rdf:List: rdf:first, rdf:rest, rdf:nil)
> JSON-LD/Turtle syntaxes: "@list": [ "joe", "bob", "jaybee" ] and
:a :b ( "bob" "alice" "carol" ) .
> Ontology Design Patterns: Sequence OP, Collections Ontology
• Benchmark datasets and queries
> BSBM, LUBM, SP2Bench, DBpedia SPARQL Benchmark, WatDiv
> LSQ
> IGUANA, LDBC
RELATED WORK
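
As a rough illustration of the closed-collection model listed above: the Turtle shorthand ( "bob" "alice" "carol" ) stands for a chain of rdf:first/rdf:rest cells that ends in rdf:nil. The sketch below expands a Python list into such triples using plain tuples. No RDF library is involved, and the blank-node labels (_:c0, _:c1, ...) and the function name expand_collection are invented for illustration.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def expand_collection(items):
    """Return (head, triples) for a closed rdf:List holding `items` in order."""
    if not items:
        # The empty collection is just rdf:nil, with no cells at all.
        return RDF + "nil", []
    triples = []
    for i, item in enumerate(items):
        cell = f"_:c{i}"  # hypothetical blank-node label
        nxt = f"_:c{i + 1}" if i + 1 < len(items) else RDF + "nil"
        triples.append((cell, RDF + "first", item))
        triples.append((cell, RDF + "rest", nxt))
    return "_:c0", triples


head, triples = expand_collection(["bob", "alice", "carol"])
for triple in triples:
    print(triple)
```

Each item costs two triples, and the only way to reach the last item is through every cell before it, which is what makes the list "closed" and strictly ordered.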

5
What RDF list models are common in LOD? What is their impact on
performance when retrieving them? Can we identify patterns enabling
sustainability?
C1: Survey of common list modelling practices in RDF
C2: Comparison of these practices when queried on common triplestores,
across list sizes and operations
RESEARCH QUESTIONS & CONTRIBUTIONS

6
CQ1. Full list lookup: What is the ordered content of the list?
CQ2. N-th Lookup: Which is the n-th item in the list?
CQ3. Ordered Range: What are the n…m items in the list?
Aimed at supporting the LOD publishing use case
Does not deal with list management (edit, merge, split, etc.)
Focus on minimal, atomic operations for ordered list access
REQUIREMENTS (OPERATIONS)
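
To make the three competency questions concrete, here is a minimal sketch of them as operations on an ordinary in-memory Python list. The function names and the 0-based indexing are assumptions made for this example; the range operation is written as "skip n, take m - n + 1" so that it lines up with an OFFSET/LIMIT formulation in SPARQL.

```python
def full_list(items):
    """CQ1: the ordered content of the list."""
    return list(items)


def nth(items, n):
    """CQ2: the n-th item (0-based here, which is an assumption)."""
    return items[n]


def ordered_range(items, n, m):
    """CQ3: items n..m inclusive, i.e. skip n items, then take m - n + 1."""
    offset, limit = n, m - n + 1
    return items[offset:offset + limit]


events = ["e0", "e1", "e2", "e3", "e4"]
print(full_list(events))            # ['e0', 'e1', 'e2', 'e3', 'e4']
print(nth(events, 2))               # e2
print(ordered_range(events, 1, 3))  # ['e1', 'e2', 'e3']
```

On a native list these are trivial; the question studied here is how costly the same three operations become once the list is encoded as RDF triples.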

7
Surveyed from:
• W3C standards
• The Ontology Design Patterns portal
• List choices in RDF datasets from ISWC resource track papers
• Linked Open Vocabularies (LOV)
• LOD Laundromat/LOD-a-lot file
Findings: RDF Sequences, RDF Lists, URI-based Lists, Number-based Lists,
Timestamp-based Lists, Sequence Ontology Pattern
LIST PATTERNS

RDF SEQUENCE AND RDF LIST
[SEQ]
[LIST]

URI, NUMBER, TIMESTAMP IMPLICIT ORDERING
[URI]
[NUM] [TIME]

SEQUENCE ONTOLOGY PATTERN
[SOP]
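
Since the pattern diagrams do not survive in text form, the following sketch shows how three of the patterns might lay out the same short event list as triples. It is an approximation, not the dataset's actual shape: the property names (midi:hasEvents, midi:hasEvent, midi:absoluteTick, sequence:precedes, sequence:follows) are borrowed from the queries later in the deck, while the example.org URIs, the tick values, and the function names are hypothetical.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX = "http://example.org/"  # hypothetical namespace


def seq_triples(track, events):
    """[SEQ]: membership properties rdf:_1, rdf:_2, ... carry the order."""
    seq = "_:seq"  # hypothetical blank node for the container
    out = [(track, "midi:hasEvents", seq)]
    out += [(seq, f"{RDF}_{i}", e) for i, e in enumerate(events, start=1)]
    return out


def num_triples(track, events, ticks):
    """[NUM]: each event carries a numeric literal that defines its position."""
    out = []
    for event, tick in zip(events, ticks):
        out += [(track, "midi:hasEvent", event),
                (event, "midi:absoluteTick", tick)]
    return out


def sop_triples(track, events):
    """[SOP]: each event is linked to its neighbour in both directions."""
    out = [(track, "midi:hasEvent", e) for e in events]
    for prev, nxt in zip(events, events[1:]):
        out += [(prev, "sequence:precedes", nxt),
                (nxt, "sequence:follows", prev)]
    return out


track = EX + "track0"
events = [EX + f"event{i}" for i in range(3)]
for name, triples in [("SEQ", seq_triples(track, events)),
                      ("NUM", num_triples(track, events, [0, 120, 240])),
                      ("SOP", sop_triples(track, events))]:
    print(name, len(triples), "triples")
```

The split is between patterns where order lives in links between nodes ([SOP], and rdf:List) and patterns where order lives in a property or value attached to each item ([SEQ], [NUM], [TIME], [URI]).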

11
[SEQ] WHERE { :list a midi:Track ; midi:hasEvents [ ?seq ?event ] .
  BIND (xsd:integer(SUBSTR(str(?seq), 45)) AS ?index)
} ORDER BY ?index
→ OFFSET <N> LIMIT <M-N+1>
[LIST] SELECT ?event (COUNT(?step) AS ?index) WHERE {
  :list a midi:Track ; midi:hasEvents ?events . ?events rdf:rest* ?step .
  ?step rdf:rest* ?elt . ?elt rdf:first ?event .
} GROUP BY ?event ORDER BY ?index
→ rdf:rest{N}, /…{N}…/
[URI] WHERE { [] a midi:Track ; midi:hasEvent ?event .
  BIND (xsd:integer(SUBSTR(str(?event), 77)) AS ?id)
} ORDER BY ?id
→ OFFSET…
[NUM/TIME] WHERE { [] a midi:Track ; midi:hasEvent ?event .
  ?event midi:absoluteTick ?tick .
} ORDER BY ?tick
[SOP] WHERE { [] a midi:Track ; midi:hasEvent ?event .
  ?event sequence:precedes? ?next_event . ?next_event sequence:follows? ?event .
  BIND (xsd:integer(SUBSTR(str(?event), 77)) AS ?id)
} ORDER BY ?id
FORMALIZATION
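
The sketch below mimics, in plain Python over tuples, the ordering logic behind the [SEQ] and [LIST] queries. It is not how a triplestore evaluates them. For [SEQ], the index is cut out of the rdf:_N property URI and cast to an integer, as SUBSTR(str(?seq), 45) with xsd:integer does; the offset 45 works because the prefix up to and including the underscore is 44 characters long and SPARQL's SUBSTR is 1-based. For [LIST], the sketch simply walks rdf:rest from the head, which is a simplification of the property-path counting used in the query. Function names and sample data are invented for illustration.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def seq_order(triples, seq):
    """[SEQ]: parse N out of rdf:_N, cast to int, and sort by it."""
    prefix = RDF + "_"  # 44 characters, so the digits start at position 45
    indexed = [(int(p[len(prefix):]), o)
               for s, p, o in triples
               if s == seq and p.startswith(prefix)]
    return [o for _, o in sorted(indexed)]


def list_order(triples, head):
    """[LIST]: follow rdf:rest from the head cell until rdf:nil."""
    first = {s: o for s, p, o in triples if p == RDF + "first"}
    rest = {s: o for s, p, o in triples if p == RDF + "rest"}
    out, cell = [], head
    while cell != RDF + "nil":
        out.append(first[cell])
        cell = rest[cell]
    return out


seq_data = [("_:s", RDF + "_2", "b"),
            ("_:s", RDF + "_10", "c"),
            ("_:s", RDF + "_1", "a")]
list_data = [("_:c0", RDF + "first", "a"), ("_:c0", RDF + "rest", "_:c1"),
             ("_:c1", RDF + "first", "b"), ("_:c1", RDF + "rest", RDF + "nil")]

print(seq_order(seq_data, "_:s"))      # ['a', 'b', 'c']
print(list_order(list_data, "_:c0"))   # ['a', 'b']
```

The integer cast matters: sorting the raw suffixes as strings would place "10" before "2". The linked walk, by contrast, needs no parsing but must touch every cell before the one it wants.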

12
• Dataset: MIDI Linked Data Cloud, 300K MIDIs in RDF [Meroño-Peñuela et al., ISWC 2017]
• Benchmark: List.MID (come see the ISWC resource paper!)
• Size dimension: lists of 1k, 30k, 60k, 90k, 120k elements
• Pattern dimension: list patterns
• Operations: SPARQL queries for full list, n-th element, and n…m range
• Triplestores: Virtuoso V7, Blazegraph 2.1.5, Fuseki v3 TDB, Fuseki v3 Memory
EVALUATION

RESULTS CQ1 (FULL LIST)

RESULTS CQ2 (N-TH ELEMENT)

RESULTS CQ3 (N…M RANGE)

16
• Consistent behavior across triplestores (the model matters more than optimization)
• rdf:List is elegant but performs poorly (Fuseki times out)
• SOP scales better than rdf:List, yet is less efficient than property-based lists
• rdf:Seq and property-based [NUM], [TIME], [URI] perform best
> We hypothesize this is mostly due to the P and S-O database indexes, resp.
• Virtuoso’s management of OFFSET, LIMIT on [NUM], [TIME]
• rdf:Seq is a good trade-off but strictly for open lists
> Indices rdf:_N do not guarantee random access
> Update in SPARQL 1.2 spec?
OBSERVATIONS

17
Lists are important! But how to assess the impact of their models?
• 6 common list patterns in RDF and their performance comparison
• 2 model families: link-based lists and property-based lists
• For our CQs, inelegant literals outperform link-based lists
Limitations/future work:
• Limited set of list operations (rdf:List could win in, e.g., addition)
• No triplestore optimization
• Apply the methodology to other data structures
CONCLUSIONS

18
Questions, comments, suggestions
most welcome
@enridaga
@albertmeronyo
https://github.com/MIDI-LD/List.MID
THANK YOU

19
• Motivation
• Related Work
• Requirements
• List Patterns
• Queries
• Performance experiments
• Conclusions
OUTLINE
