1. Benchmarking Versioning Systems for Big Linked Data
Irini Fundulaki
Institute of Computer Science - FORTH
Greece
4th Graph-TA
Barcelona, Spain, March 4, 2016
Irini Fundulaki (FORTH) HOBBIT March 4, 2016 1 / 5
2. Versioning Benchmarks
√ "Versioning is the creation and management of multiple releases of a product,
  all of which have the same general function but are improved, upgraded or
  customized."
√ For data, versioning refers to the ability to store and retrieve different
  versions of an evolving dataset.
√ A versioning benchmark should test how different systems behave with
  respect to
    the space required by the multiversion repository, and
    the efficiency of retrieving different versions and answering cross-snapshot queries (queries that span more than one version).
3. Versioning Approaches
√ Full materialization
    Each version is stored in its entirety in the system.
√ Delta-based approach
    Only the differences (changes) between versions are stored.
√ Timestamped tuples
    Each tuple carries timestamps that indicate when it was added and/or deleted (as in standard databases).
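The three approaches can be contrasted with a minimal Python sketch. It is not taken from any of the systems discussed in the talk: the triples, the function names and the in-memory representation (sets of tuples, integer version numbers used as timestamps) are assumptions made purely for illustration.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Three toy versions of an evolving dataset (hypothetical data).
v0: Set[Triple] = {("ex:a", "rdfs:label", "A")}
v1: Set[Triple] = v0 | {("ex:b", "rdfs:label", "B")}
v2: Set[Triple] = v1 - {("ex:a", "rdfs:label", "A")}
versions: List[Set[Triple]] = [v0, v1, v2]


def full_materialization(versions):
    """Store every version in its entirety."""
    return [set(v) for v in versions]


def delta_based(versions):
    """Store the first version plus one (added, deleted) pair per later version."""
    deltas = [(new - old, old - new) for old, new in zip(versions, versions[1:])]
    return set(versions[0]), deltas


def rebuild_from_deltas(base, deltas, i):
    """Reconstruct version i by replaying the first i deltas on the base version."""
    current = set(base)
    for added, deleted in deltas[:i]:
        current = (current - deleted) | added
    return current


def timestamped(versions):
    """Map each triple to a list of [added_at, deleted_at] spans.

    deleted_at is None while the triple is still present.
    """
    spans: Dict[Triple, List[List[Optional[int]]]] = {}
    previous: Set[Triple] = set()
    for i, current in enumerate(versions):
        for t in current - previous:
            spans.setdefault(t, []).append([i, None])
        for t in previous - current:
            spans[t][-1][1] = i  # close the span that is currently open
        previous = current
    return spans


def version_at(spans, i):
    """Retrieve version i from the timestamped representation."""
    return {
        t
        for t, periods in spans.items()
        if any(a <= i and (d is None or i < d) for a, d in periods)
    }


full = full_materialization(versions)
base, deltas = delta_based(versions)
spans = timestamped(versions)
```

In this sketch, full materialization makes retrieving a version trivial but repeats unchanged triples, whereas the delta-based and timestamped layouts store less and need extra work (`rebuild_from_deltas`, `version_at`) to answer the same request.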
4. Linked Data stores with versioning capabilities
Version Control for RDF Triple Stores, S. Cassidy and J. Ballantine, @ICSOFT 2007.
x-RDF-3X: Fast Querying, High Update Rates, T. Neumann and G. Weikum, @PVLDB 3(1), 2010.
A Version Management Framework for RDF Triple Stores, D.-H. Im, S.-W. Lee and H.-J. Kim, @Int'l Journal of Software Engineering and Knowledge Engineering, 2011.
R&Wbase: Git for triples, M. Vander Sande, P. Colpaert, R. Verborgh, S. Coppens, E. Mannens and R. Van de Walle, @LDOW 2013.
R43ples: Revisions for Triples, M. Graube, S. Hensel and L. Urbas, @LDQ 2014.
TailR: a platform for preserving history on the web of data, P. Meinhardt, M. Knuth and H. Sack, @SEMANTICS 2015.
5. Linked Data stores with versioning capabilities
Yet there is a complete lack of versioning benchmarks for these systems!
6. Versioning Benchmark @ HOBBIT
√ Design a version generator based on real changes observed in evolving datasets:
    analyze evolving datasets widely used in various domains to identify the most frequent simple and complex changes
    define change templates that produce the versions, thereby mimicking real-world changes
√ Design cross-snapshot queries to assess how well a system answers queries over multiple versions
√ Employ standard metrics for assessing the performance of versioning systems:
    space required for storing the versions
    time required to execute the cross-snapshot queries
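The plan above can be illustrated with a small Python sketch. It is not the HOBBIT generator: the two change templates, their weights, the cross-snapshot query and all function names are hypothetical, and the space metric simply counts stored triples under full materialization. In a real benchmark the templates and their frequencies would come from the analysis of evolving datasets.

```python
import random
import time
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def add_label(data, rng, step):
    """Template change (hypothetical): add a label for a fresh resource."""
    return data | {(f"ex:r{step}", "rdfs:label", f"resource {step}")}


def delete_random_triple(data, rng, step):
    """Template change (hypothetical): delete one existing triple, if any."""
    if not data:
        return set(data)
    victim = rng.choice(sorted(data))  # sorted for reproducibility
    return data - {victim}


def generate_versions(seed_data, templates, weights, n_versions,
                      changes_per_version, seed=0):
    """Produce n_versions snapshots by repeatedly applying weighted templates."""
    rng = random.Random(seed)
    versions = [set(seed_data)]
    step = 0
    for _ in range(n_versions - 1):
        current = set(versions[-1])
        for _ in range(changes_per_version):
            template = rng.choices(templates, weights=weights, k=1)[0]
            current = template(current, rng, step)
            step += 1
        versions.append(current)
    return versions


def added_between(versions, i, j):
    """Cross-snapshot query: triples present in version j but not in version i."""
    return versions[j] - versions[i]


def stored_triples(versions):
    """Space metric, assuming full materialization: total triples stored."""
    return sum(len(v) for v in versions)


def timed(query, *args):
    """Time metric: run a query and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = query(*args)
    return result, time.perf_counter() - start


seed_data: Set[Triple] = {("ex:seed", "rdfs:label", "seed")}
versions = generate_versions(
    seed_data,
    templates=[add_label, delete_random_triple],
    weights=[0.8, 0.2],  # assumed frequencies, not measured ones
    n_versions=5,
    changes_per_version=10,
)
result, seconds = timed(added_between, versions, 0, 4)
print(stored_triples(versions), len(result), seconds)
```

Fixing the random seed keeps the generated versions reproducible, so different systems can be measured on identical data.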
7. Preliminary Results: Datasets & Changes
Datasets
Dataset                                  #triples per version
Gene Ontology (GO)                       200K
Ontology of Genes and Genomes (OGG)      1.2M
Medical Subject Headings (MSH)           1.6M
Foundational Model of Anatomy (FMA)      1.6M
DBpedia                                  60M
BioModels                                10M
Atlas RDF Ontology (ATLAS)               440M
Changes
√ Schema level
    addition, deletion and modification of classes, properties, constraints, etc.
√ Instance level
    addition, deletion and modification of instances, comments, labels, etc.
8. Most prominent changes
ATLAS RDF Ontology (ATLAS), Gene Ontology (GO)
    addition, deletion of comments
    addition, deletion of labels
    addition, deletion of property instances
    addition, deletion of type information for instances
Medical Subject Headings (MSH)
    addition, deletion of hierarchies
    addition of property instances
    addition, deletion of type information for instances
DBpedia
    addition, deletion of labels
    addition of property instances
    addition of type information for instances
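Change categories like the ones listed above could be counted by diffing two consecutive versions and grouping the added and deleted triples by predicate. The Python sketch below is one possible way to do this and is not the analysis procedure used for these results: the predicate-to-category mapping, the example triples and the function names are assumptions, and a modification shows up as one deletion plus one addition.

```python
from collections import Counter
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def change_kind(triple):
    """Map a triple to a change category via its predicate (assumed mapping)."""
    _, predicate, _ = triple
    if predicate == "rdfs:label":
        return "label"
    if predicate == "rdfs:comment":
        return "comment"
    if predicate == "rdf:type":
        return "type information"
    if predicate in ("rdfs:subClassOf", "rdfs:subPropertyOf"):
        return "hierarchy"
    return "property instance"


def count_changes(old, new):
    """Count additions and deletions per category between two versions."""
    counts = Counter()
    for t in new - old:
        counts[("addition", change_kind(t))] += 1
    for t in old - new:
        counts[("deletion", change_kind(t))] += 1
    return counts


# Two hypothetical consecutive versions.
old: Set[Triple] = {
    ("ex:a", "rdfs:label", "A"),
    ("ex:a", "rdf:type", "ex:Gene"),
}
new: Set[Triple] = {
    ("ex:a", "rdfs:label", "A"),
    ("ex:a", "rdfs:comment", "note"),
    ("ex:b", "rdfs:subClassOf", "ex:a"),
}

counts = count_changes(old, new)
for (operation, kind), n in sorted(counts.items()):
    print(operation, kind, n)
```

Sorting such counts by frequency for each dataset would give a ranking in the spirit of the lists above.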