An Analytical Approach for Query Optimization Based on Hypergraph
Sangeeta Sen, NIT Durgapur, Email: sangeetaaec@gmail.com
Anisha Agrawal, NIT Durgapur, Email: anishaagrawal23@gmail.com
Ankit Rathi, NIT Durgapur, Email: ankitrathi94@it.nitdgp.ac.in
Animesh Dutta, NIT Durgapur, Email: animesh.dutta@it.nitdgp.ac.in
Biswanath Dutta, ISI Bangalore, Email: bisu@drtc.isibang.ac.in
Abstract—In the Semantic Web, the Resource Description Framework (RDF) plays an important role in storing data. The growth of RDF data poses a challenge for data management and for evaluating SPARQL queries in optimized time. In this paper we propose a hypergraph based data management system and a SPARQL query optimization technique. We use the concept of a hypergraph, a generalization of a graph in which an edge can connect more than two vertices. We propose algorithms for storing RDF data as a hypergraph and for querying it. We compare the performance of our algorithms with 3 other systems (RDF-3X, Apache Jena and AllegroGraph) on SP^2Bench, a SPARQL performance benchmark, using its datasets and SPARQL queries.
I. INTRODUCTION
The Resource Description Framework1 (RDF) is a core data model for storing information on the Semantic Web [2]. RDF data is stored as triples (subject or resource, predicate or property, and object or property value). With the growth of RDF data, optimized processing of SPARQL2 queries over RDF datasets has become a challenging issue. SPARQL is a graph based query language. The main task of SPARQL is to match patterns, each consisting of a subject, a predicate and an object, against the underlying RDF graph.
In this work, we propose a hypergraph [6], [16] based query optimization and data management technique that needs less computation time than the existing techniques discussed in RDF-3X [1], AllegroGraph [13] and Jena [14]. A hypergraph is a generalization of a graph in which an edge, called a hyperedge, can connect more than two vertices.
To illustrate the novelty of our proposed work, we provide an experimental study that contrasts the performance of 4 systems (including ours) on the SP^2Bench benchmark [9] with triple datasets of different sizes and SPARQL queries. The main contributions of this paper are:
• Hypergraph based database of RDF data: All subjects and objects connected with a particular predicate reside under a particular hyperedge (the predicate itself), so searching for RDF triples becomes easy to execute.
• Query path based query processing: To process a query, the model first generates a query path according to the rearranged query predicates. The query is then processed in the hypergraph along this path.
1www.w3.org/RDF/
2www.w3.org/2001/sw/wiki/SPARQL
The rest of the paper is organized as follows. Section II highlights the current state of the art. Section III describes the problem of SPARQL query optimization and Section IV presents the system architecture. In Section V we describe the approach and define the terms used throughout the paper. In Section VI, we discuss a method for solving the query optimization problem on RDF graphs. In Section VII, we discuss the results of the proposed method. Section VIII concludes the work and mentions our future work.
II. STATE-OF-THE-ART
There has been a lot of research work in the field of
RDF data storage and SPARQL query optimization. There
are many approaches to SPARQL query processing: GRIN Index [12], PIG [11], and gStore [18] focus on reducing the search space of the graph traversal algorithm using a graph index. RG-Index [7] also uses a graph index but focuses on reducing the joins in a relation based RDF store. The paper [8] presents an efficient RDF query engine for evaluating SPARQL queries; it employs an inverted index structure for indexing the RDF triples and develops a tree shaped optimization algorithm that transforms a SPARQL query graph into an optimal query plan by effectively reducing the search
space. In paper [15], the authors present TripleBit, a fast
and compact system for storing and accessing RDF data.
The compact design of TripleBit reduces both the size of
stored data and size of its indexes. They develop a query
processor which dynamically generates an optimal execution
ordering for join queries, leading to fast query execution and
effective reduction of intermediate results. The system SQUIN, in
paper [4], implements a novel query execution approach. The
main idea is to integrate the traversal of data links into the
construction of result. The authors of paper [5] introduce
a scalable RDF data management system. They introduce
techniques for partitioning the data across nodes in a manner
that helps to accelerate query processing. The authors of paper [17] introduce Trinity.RDF, a distributed, memory based
graph engine for web scale data. The paper also introduces
a new cost model, novel cardinality estimation technique and
optimization algorithm for distributed query plan generation. A
new join ordering algorithm that performs a SPARQL tailored
query simplification is introduced in paper [3]. That paper also presents a novel RDF statistical synopsis that accurately estimates cardinalities for SPARQL queries, and provides a dynamic programming approach that decomposes the SPARQL graph into disjoint star shaped sub-queries.
978-1-4799-7961-5/15/$31.00 © 2015 IEEE
In the current work, we propose a hypergraph based query optimization technique, as discussed in the following sections.
III. PROBLEM SCENARIO
SPARQL, a query language, was developed to solve the problem of querying RDF data. SPARQL became a W3C recommendation in 2008, but query optimization remains a big challenge [7], [8]. To explain the problems we take an exemplary RDF graph, shown in figure 1.
Fig. 1: RDF Graph
Figure 1 shows vertices (subjects and objects) connected by predicate edges, directed from subject to object. Here a4, a5, a9 are subjects; a1, a2, a3, a7, a10 are objects; and a6, a8 act as both subjects and objects. The edges playsfor, born, position, etc. are the predicates of the graph. A SPARQL query contains match patterns as conditions, and these match patterns must be matched against the RDF database. Query 1 contains one match pattern, extracting information about a player's birth place.
query1: Q^p_1 ⇒ {?player born ?region}.
Although this query has only a single match pattern, evaluation becomes challenging when the RDF graph is large, because finding the particular edge labeled with the query predicate in the graph is complex and time consuming. As the number of match patterns increases, the complexity and execution time increase as well. For instance, the following query 2 contains 5 match patterns.
query2: Q^p_1 ⇒ {?player playsfor ?club},
Q^p_2 ⇒ {?player type a2},
Q^p_3 ⇒ {?player born ?region},
Q^p_4 ⇒ {?club region ?region},
Q^p_5 ⇒ {?region population ?pop}.
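For illustration, such match patterns can be represented programmatically as (subject, predicate, object) tuples, with a leading '?' marking a variable. The tuple encoding below is our own sketch, not notation from the paper:

```python
# The five match patterns of query2 as (subject, predicate, object)
# tuples; strings starting with '?' are variables, the rest constants.
query2 = [
    ("?player", "playsfor", "?club"),
    ("?player", "type", "a2"),
    ("?player", "born", "?region"),
    ("?club", "region", "?region"),
    ("?region", "population", "?pop"),
]

# The only constant among the subjects and objects is a2; every other
# position must be bound by matching against the RDF graph.
constants = [t for s, _, o in query2 for t in (s, o)
             if not t.startswith("?")]
print(constants)  # ['a2']
```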
In the current work our aim is to match these patterns against the database in optimized time. In our earlier work [10] we proposed the idea of using a hypergraph on top of the RDF graph. More specifically, we applied the concept of an undirected hypergraph, placing the subjects and objects of a particular predicate into a single hyperedge. That paper provides algorithms for achieving SPARQL query optimization. In the current work we implement the algorithms with some refinements to improve the performance of the system.
IV. SYSTEM ARCHITECTURE
Figure 2 presents the architecture of our proposed system.
It consists of 3 layers: (1) user interface layer; (2) query
processing layer; and (3) storage layer.
1) User Interface Layer: In this layer the user invokes queries. After executing a query, our system returns the answer to the user.
2) Query Processing Layer: This layer contains 3 sub-components.
Query parser, for extracting the match patterns from the query.
Query optimizer, for rearranging the match patterns according to the size of the hyperedges.
Query processor, for executing the query against the stored information.
3) Storage Layer: This layer contains 2 sub-components.
RDF data, represented as a graph of triples.
Indexer and storage, for building the index by taking the subjects and objects of a particular predicate as a single unit and storing the data as key-value pairs (predicate as key, subject-object pairs as value).
Fig. 2: System Architecture
V. APPROACH DEFINITION
In this section we provide formal definitions of the basic terms used in this paper.
Definition 1 (RDF graph): An RDF graph is a graph G = (V, E), where V = {v | v ∈ S ∪ O} and E = {e1, e2, ...} with each e = {u, v}, u, v ∈ V. There are two labeling functions, l_e and l_v: l_e is the edge-labeling function, with l_e(S, O) = P, and l_v is the node-labeling function, with l_v(v_t) = t, where t ∈ (S ∪ O). Here S = Subject (URI ∪ BLANK), P = Predicate (URI) and O = Object (URI ∪ BLANK ∪ LITERAL).
Definition 2 (Hypergraph): A hypergraph is a pair HG = (V, E), where V = {v1, v2, ..., vn} with V = {v | v ∈ S ∪ O ∪ P}, and E = {e1, e2, ..., en}, where each edge e ∈ E is a non-empty subset of V and the edges cover V, i.e. ∪_{i=1}^{n} e_i = V.
Definition 3 (Proposed hypergraph): A proposed hypergraph is a triple Hy(G) = (V, E, l_ev), where V = {v1, v2, ..., vn} with v ∈ (S ∪ O ∪ P), E = {e1, e2, ..., en} with each e a set of vertices {v_i, v_j, ...}, and l_ev is the node-labeling function, with l_ev(v) = X, where X ∈ {subject, object, predicate}.
Definition 4 (SPARQL query): A SPARQL query Q^R mainly contains <Q^q, Q^s, Q^p>, where Q^q is the query form and Q^p is the match pattern; if ?x ∈ var(Q^q) then ?x ∈ var(Q^p). Q^s represents constraints over the query, such as FILTER and OPTIONAL.
VI. HYPERGRAPH CREATION AND QUERY PROCESSING
In this section, we describe our proposed approach. The approach consists of 3 steps.
Step 1: Transformation of the RDF graph into a hypergraph: In this step we convert the RDF graph into a hypergraph by collecting the subjects and objects connected with each distinct predicate.
Step 2: Query parsing and rearrangement of the SPARQL query according to predicate size: In this step we rearrange the match patterns of a given query according to hyperedge size and build a query path.
Step 3: Query processing: In this step we process and execute the query based on the query rearranged in Step 2. This step involves 3 sub-steps:
1) Loop over the match patterns;
2) process(t, res, d, tempdict);
3) FindVariables(p, tempdict, d, t).
The advantage of the proposed approach is that we eliminate the cost of joining the tables of a relational database, since here all the subjects and objects connected with a particular predicate are grouped together.
A. Transformation of RDF graph into a hypergraph
To transform the RDF graph into a hypergraph, we deploy
algorithm 1.
Algorithm 1 Creating a hypergraph, using a dictionary to store triples
Input: G: graph containing triples
Output: data: dictionary storing subject-object pairs for each predicate
1: data ← ∅ , arr ← ∅
2: for all Gj where 0≤j≤G.length-1 do
3: arr ← arr ∪ Gj {predicate}
4: j ← j + 1
5: end for
6: remove duplicate entries from arr
7: for all arri where 0≤i≤arr.length-1 do
8: for all Gj where 0≤j≤G.length-1 do
9: if arri == Gj {predicate} then
10: data[arri] ← data[arri] ∪ Gj {subject} ∪ Gj {object}
11: end if
12: j ← j + 1
13: end for
14: i ← i + 1
15: end for
As shown above, we store the subject-object pairs for each predicate in a dictionary. Here dictionary refers to a Python dictionary (a.k.a. mapping)3. A dictionary is an unordered set of key: value pairs, with the requirement that the keys are unique (within one dictionary). Unlike sequences, which are indexed by a range of numbers, dictionaries are indexed by keys, which can be of any immutable type; strings and numbers can always be keys. Figure 3 is the pictorial representation of the dictionary as a hypergraph generated from the RDF graph of figure 1. In figure 3, each predicate is a hyperedge that connects the subjects and objects associated with it.
3https://docs.python.org/2/tutorial/datastructures.html#dictionaries
Fig. 3: Hypergraph
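A minimal Python sketch of Algorithm 1, assuming the input graph is given as a list of (subject, predicate, object) triples. The labels below are hypothetical, in the spirit of figure 1, and a defaultdict replaces the explicit duplicate-removal pass of the pseudocode:

```python
from collections import defaultdict

def create_hypergraph(triples):
    """Sketch of Algorithm 1: group every subject-object pair under
    its predicate, so each dictionary key acts as one hyperedge.
    Pairs are stored flattened as [s1, o1, s2, o2, ...], matching the
    size computation data[p].length/2 used later in Algorithm 2."""
    data = defaultdict(list)
    for s, p, o in triples:
        data[p].extend([s, o])
    return dict(data)

triples = [
    ("a4", "playsfor", "a6"),  # hypothetical triples
    ("a5", "playsfor", "a6"),
    ("a4", "born", "a8"),
]
data = create_hypergraph(triples)
print(data["playsfor"])  # ['a4', 'a6', 'a5', 'a6']
```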
B. Query parsing and rearrangement of SPARQL query according to predicate size
When a SPARQL query is submitted to the system, we parse it, arrange its match patterns in ascending order of hyperedge size, and use the sorted order to build a query path. To rearrange the predicate list for query execution we deploy algorithm 2.
Algorithm 2 Rearranging match patterns in the query according to hyperedge size
Input: data; q: list storing the match patterns of the input query.
Output: sizeEdge: dictionary storing the size of each hyperedge; predicatelist: list storing the match patterns and predicates rearranged according to hyperedge size.
1: sizeEdge ← ∅, predicate ← ∅, predicatelist ← ∅, line ← ∅, pred ← ∅
2: for all predicate in data.keys do
3: sizeEdge[predicate] ← data[predicate].length/2
4: end for
5: for each line in q do
6: pred ← line {predicate}
7: predicatelist ← predicatelist ∪ {line, pred, sizeEdge[pred]}
8: end for
9: sort predicatelist based on sizeEdge[pred]
10: return sizeEdge, predicatelist
Here we take the input query as a list q. We store the size, i.e. the number of subject-object pairs, of each predicate in the dictionary sizeEdge. From the query q, we extract the predicates, store each predicate along with its match pattern and size in predicatelist, and sort the list by predicate size.
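Algorithm 2 can be sketched in Python as follows (match patterns are assumed to be encoded as (subject, predicate, object) tuples; this encoding is our own illustration):

```python
def rearrange(data, q):
    """Sketch of Algorithm 2: compute each hyperedge's size (number
    of subject-object pairs stored under the predicate) and sort the
    query's match patterns in ascending order of that size."""
    size_edge = {p: len(pairs) // 2 for p, pairs in data.items()}
    predicatelist = [(line, line[1], size_edge[line[1]]) for line in q]
    predicatelist.sort(key=lambda entry: entry[2])
    return size_edge, predicatelist

data = {"type": ["a4", "a2", "a5", "a2", "a9", "a2"],  # 3 pairs
        "born": ["a4", "a8"]}                           # 1 pair
q = [("?player", "type", "a2"), ("?player", "born", "?region")]
size_edge, plist = rearrange(data, q)
print([p for _line, p, _size in plist])  # ['born', 'type']
```

Processing the smallest hyperedge first keeps the intermediate answer sets small, which is the point of the rearrangement.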
C. Query processing
Using the rearranged SPARQL query, we extract the required subjects and objects from the data dictionary. We achieve this using the steps below.
1) Loop over the match patterns: To loop over the match patterns of the rearranged query, we deploy algorithm 3. For each match pattern, 3 cases are possible: (1) the string at the subject position is a constant and the string at the object position is a variable; (2) the string at the subject position is a variable and the string at the object position is a constant; (3) the strings at both the subject and object positions are variables. For the first two cases we use the function process(t, res, d, tempdict), where res stores the answers of the variable in the current triple. The third case has three further subcases: (1) both the subject and the object have been encountered in match patterns already processed in the query; (2) either the subject or the object has been encountered before; (3) neither the subject nor the object has been encountered in the triples already processed. For the first two subcases, we store in p all the variables associated with the subject (and/or object) in the match patterns already processed, and the answers of these variables in tempdict. Then we use the function FindVariables(p, tempdict, d, t). We find the answers common to the answers of the subject (and/or object) variable in the current match pattern and the answers of the same variable previously encountered, and modify the answers of the variables in p accordingly. We store the final answers of all the variables in d. For the last subcase, we simply find the subjects and objects of the predicate of the current match pattern and store them in d.
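The case analysis above can be sketched as a small dispatch function (a sketch with our own names; seen stands for the set of variables already bound in d by earlier match patterns):

```python
def classify(pattern, seen):
    """Decide which case of the loop a match pattern falls into.
    Cases 1 and 2 (a constant at the subject or object position) are
    handled by process(); case 3 splits on how many of the two
    variables were already encountered in earlier patterns."""
    s, _, o = pattern
    if not s.startswith("?") or not o.startswith("?"):
        return "constant"                       # cases 1 and 2
    bound = (s in seen) + (o in seen)
    return {2: "both seen", 1: "one seen", 0: "none seen"}[bound]

print(classify(("?player", "born", "?region"), {"?player"}))  # one seen
```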
2) process(t, res, d, tempdict): This method handles the cases in which either the subject position string or the object position string is a variable. Algorithm 4 describes the function process. If the subject (or object) variable is not present in d (i.e. it has not been encountered before), then we simply store in d all the subjects associated with the object position string through the predicate of the current match pattern. Otherwise, we store in p all the variables associated with the subject (or object) through predicates in the match patterns already processed, and the answers of these variables in tempdict. We then use the function FindVariables(p, tempdict, d, t), find the answers common to the answers of the subject (or object) variable in the current match pattern and the answers of the same variable previously encountered, and modify the answers of the variables in p accordingly.
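Stripped of the tempdict and FindVariables bookkeeping for dependent variables, the core of process() is an intersection of the new candidate answers with any previous answers for the same variable (a simplified sketch, not the full Algorithm 4):

```python
def process(t, res, d):
    """If variable t is new, bind it to res; otherwise keep only the
    answers that also appeared for t in earlier match patterns."""
    if t not in d:
        d[t] = list(res)
    else:
        previous = set(d[t])
        d[t] = [a for a in res if a in previous]

d = {}
process("?player", ["a4", "a5"], d)  # first pattern binding ?player
process("?player", ["a5", "a9"], d)  # second pattern narrows it
print(d["?player"])  # ['a5']
```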
3) FindVariables(p, tempdict, d, t): In this step we find the variables whose answers will be affected by the current match pattern and will undergo modification. Using algorithm 5, we find the variables associated (through match patterns already processed in the current query) with the variables stored in p. Here t stores the subject (or object) of the current match pattern being processed, and curvar stores the variables of the current triple.
VII. EVALUATION AND RESULT
In this section, we evaluate the performance of our algorithms, both in terms of dataset loading time and query execution time. For this purpose, we use the SP^2Bench benchmark datasets [9] and apply the following 3 SPARQL queries. SP^2Bench is set in the DBLP scenario and comprises both a data generator for creating arbitrarily large DBLP-like documents and a set of carefully designed benchmark queries. The datasets are provided in N3 format. RDFLib-3.2.0, a Python library, is used to implement our algorithms.
query3: SELECT ?article WHERE { ?article <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://localhost/vocabulary/bench/Article> }
query4: SELECT ?yr WHERE { ?journal <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://localhost/vocabulary/bench/Journal> . ?journal <http://purl.org/dc/terms/issued> ?yr }
query5: SELECT ?inproc ?booktitle ?title WHERE { ?inproc <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://localhost/vocabulary/bench/Inproceedings> . ?inproc <http://purl.org/dc/elements/1.1/creator> ?author . ?inproc <http://localhost/vocabulary/bench/booktitle> ?booktitle . ?inproc <http://purl.org/dc/elements/1.1/title> ?title }
Our proposed algorithms, as discussed above, are implemented in Python and compared with apache-jena-2.12.1, rdf3x-0.3.7 and AllegroGraph WebView 4.14.1. Jena is implemented in Java and RDF-3X in C++. All tests are run on a laptop with an Intel(R) Core(TM)2 Duo CPU P8600 @ 2.40GHz, 4GB RAM and a 465GB 5400RPM hard disk, running Fedora 20. We use NetBeans IDE 8.0.2 for running Jena, RDF-3X and our system.
Algorithm 3 Loop over the match patterns
Input: data, predicatelist
Output: d: Dictionary storing answers for all variables in Query
1: d ← ∅
2: for c = 0 to predicatelist.length-1 do
3: subobj ← fetch and store subject object pairs of cth predicate in predicatelist
from data dictionary
4: res ← ∅ , tempdict ← ∅
5: x ← subject position string in cth match pattern, y← object position string in
cth match pattern
6: if subject position string does not start with ’?’ then
7: res ← objects from subobj associated with subject position string
8: process(y, res, d, tempdict)
9: else if object position string does not start with ’?’ then
10: res ← subjects from subobj associated with object position string
11: process(x, res, d, tempdict)
12: else
13: if key x and key y are in d then
14: psub ← d[x], pobj ← d[y], d[x] ← ∅ , d[y] ← ∅ , p← ∅
15: p ← list of variables associated with x and y through predicates in match
pattern already processed in the query
16: FindVariables(p, tempdict, d, t)
17: for all subobji where 0≤i≤subobj.length-1 do
18: if subobji is in psub and subobji+1 is in pobj then
19: ind1 ← index of subobji in psub, ind2 ← index of subobji+1
in pobj
20: if ind1 == ind2 then
21: d[x] ← d[x] ∪subobji
22: d[y] ← d[y] ∪subobji+1
23: var ← ∅
24: for all var in tempdict.keys do
25: d[var] ← d[var] ∪tempdict[var][ind1]
26: end for
27: end if
28: end if
29: i ← i + 2
30: end for
31: else if key x is in d or key y is in d then
32: i ← 0, counter ← 1
33: if y in d then
34: swap x and y
35: i ← 1, counter ← -1
36: end if
37: psub ← d[x], d[x] ← ∅, d[y] ← ∅
38: p ← list of variables associated with x through predicates in match
patterns already processed in current query
39: FindVariables (p, tempdict, d, t)
40: while i < subobj.length do
41: if subobji in psub then
42: d[x] ← d[x] ∪subobji
43: d[y] ← d[y] ∪subobji+counter
44: ind ← index of subobji in psub
45: var ← ∅
46: for all var in tempdict.keys do
47: d[var] ← d[var] ∪tempdict[var][ind]
48: end for
49: end if
50: i ← i+2
51: end while
52: else
53: d[x] ← all subjects from subobj
54: d[y] ← all objects from subobj
55: end if
56: end if
57: c ← c+1
58: end for
59: return d
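In the simplest subcase of Algorithm 3 (steps 53-54), where neither variable has been seen before, the flattened pair list is simply split into subjects and objects. A sketch, assuming the [s, o, s, o, ...] layout produced by the hypergraph builder:

```python
def bind_fresh(subobj):
    """Neither variable seen before: subjects are the even-indexed
    entries of the flattened [s, o, s, o, ...] list, objects the odd."""
    return subobj[0::2], subobj[1::2]

subs, objs = bind_fresh(["a4", "a6", "a5", "a6"])
print(subs, objs)  # ['a4', 'a5'] ['a6', 'a6']
```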
A. Dataset loading and hypergraph creation time analysis
In this section we present statistics on the dataset loading time and hypergraph creation time of our proposed system, using the line diagrams shown in figures 4(a) and 4(b) respectively. Figure 4(a) shows the loading time for datasets of various sizes. Our system takes 0.23 sec. to load 691 triples, and this increases to 0.44 sec. for a dataset of 1285 triples. As we increase the number of triples (the dataset size), the load time also gradually increases. For 32330 triples, the total load time is 11.12
Fig. 4: (a) Loading time of dataset; (b) hypergraph creation time from dataset
Algorithm 4 process(t, res, d, tempdict)
Input: t: variable in current match pattern
res: storing answer of variable in current match pattern
Output: tempdict , d
1: if key t is not in d then
2: d[t] ← res
3: else
4: new ← d[t]
5: d[t]← ∅, p ← ∅
6: p ← list of variables associated with t through predicates in match patterns
already processed in current query
7: FindVariables(p, tempdict, d, t)
8: for all resi where 0≤i≤res.length-1 do
9: if resi is present in new then
10: d[t] ← d[t] ∪ resi
11: ind ← index of resi in new
12: var ← ∅
13: for all var in tempdict.keys do
14: d[var] ← d[var] ∪ tempdict[var][ind]
15: end for
16: end if
17: i ← i + 1
18: end for
19: end if
Algorithm 5 FindVariables(p, tempdict, d, t)
Input: p: list of variables associated with the subject or object in the current match pattern being processed; d
Output: tempdict
1: curvar ← ∅
2: curvar ← curvar ∪ t
3: for all pi where 0≤i≤p.length-1 do
4: if pi not in tempdict and pi not in curvar then
5: tempdict[pi] ← d[pi]
6: d[pi] ← ∅
7: if pi has any variable associated with it through predicates in match patterns
already processed in current query then
8: p ← p ∪ variables associated with pi
9: end if
10: end if
11: i ← i + 1
12: end for
sec. We use RDFLib to load the datasets into our program. Figure 4(b) shows the hypergraph creation time for datasets of various sizes. As can be seen from figure 4(b), the hypergraph creation time also increases gradually with the number of triples: for 691 triples, the system takes 0.20 sec., for 1285 triples 0.52 sec., and for 2006 triples 0.57 sec. We run this experiment with up to 16040 triples in total; for 16040 triples, the hypergraph creation time is 12.21 sec.
B. Query execution time analysis
In this section we present the query execution times, using bar diagrams that compare our proposed system with the three external systems apache-jena-2.12.1, rdf3x-0.3.7 and AllegroGraph WebView 4.14.1. Figure 5(a) shows the comparison of query evaluation time between our system and the other three systems for a dataset containing 691 triples. We run the SPARQL queries listed at the beginning of this section. As figure 5(a) shows, our system takes less time to run the three queries than the other 3 systems: 0.00048s for query3, 0.00040s for query4 and 0.00093s for query5. Figure 5(b) shows the comparison of query evaluation for a dataset containing 1285 triples; here our system takes 0.00043s for query3, 0.00058s for query4 and 0.00099s for query5. In a similar manner, we increase the dataset size and run the above queries. As can be seen from figures 5(a)-5(d), our proposed system takes less time to execute the three queries than the other systems, with the following exceptions. For the dataset containing 8023 triples, rdf-3x takes less time for query4, and both rdf-3x and AllegroGraph take less time for query5 (figure 5(e)). Also, for the dataset containing 16040 triples, rdf-3x takes less time than our system for query4 and query5 (figure 5(f)).
VIII. CONCLUSION
In this paper we provide a framework for hypergraph based SPARQL query optimization over RDF data. We focus on storing RDF data as a hypergraph and then processing SPARQL queries on the hypergraph. Based on the performance analysis of query processing time against other existing systems, we state that our system yields better results in less time.
Fig. 5: Query processing time (datasets of 691 to 16040 triples): (a) 691, (b) 1285, (c) 2006, (d) 4021, (e) 8023, (f) 16040 triples
In the present work, we have considered simple queries. As a continuation of the current work, the presented approach can be further extended to more complex queries.
REFERENCES
[1] Thomas Neumann and Gerhard Weikum. The RDF-3X engine for scalable management of RDF data. VLDB J., 19(1):91–113, 2010.
[2] Tim Berners-Lee, James Hendler, and Ora Lassila. The semantic web.
2001.
[3] Andrey Gubichev and Thomas Neumann. Exploiting the query structure
for efficient join ordering in sparql queries. In EDBT’14, pages 439–
450, 2014.
[4] Olaf Hartig. Squin: A traversal based query execution system for the
web of linked data. 2013.
[5] Jiewen Huang, Daniel J. Abadi, and Kun Ren. Scalable sparql querying
of large rdf graphs. PVLDB, 4(11):1123–1134, 2011.
[6] Dong-Hyuk Im, Nansu Zong, Eung-Hee Kim, Seokchan Yun, and
Hong-Gee Kim. A hypergraph-based storage policy for rdf version
management system. 2012.
[7] Kisung Kim, Bongki Moon, and Hyoung-Joo Kim. Rg-index: An rdf
graph index for efficient sparql query processing. 2014.
[8] Chang Liu, Haofen Wang, Yong Yu, and Linhao Xu. Towards efficient
sparql query processing on rdf data, 2010.
[9] Michael Schmidt, Thomas Hornung, Georg Lausen, and Christoph Pinkel. SP^2Bench: A SPARQL performance benchmark.
[10] Sangeeta Sen, Moumita Ghosh, Animesh Dutta, and Biswanath Dutta. Hypergraph based query optimization. January 2015.
[11] Thanh Tran and Günter Ladwig. Structure index for RDF data. 2010.
[12] Octavian Udrea, Andrea Pugliese, and V. S. Subrahmanian. GRIN: A graph based RDF index. In AAAI, pages 1465–1470. AAAI Press, 2007.
[13] W3C. Allegrograph rdfstore web 3.0’s database, September 2009.
[14] Kevin Wilkinson, Craig Sayers, Harumi Kuno, and Dave Reynolds.
Efficient RDF storage and retrieval in Jena2. In Proc. First International
Workshop on Semantic Web and Databases, 2003.
[15] Pingpeng Yuan, Pu Liu, Buwen Wu, Hai Jin, Wenya Zhang, and Ling
Liu. Triplebit: a fast and compact system for large scale rdf data.
PVLDB, 6(7):517–528, 2013.
[16] Pingpeng Yuan, Wenya Zhang, Hai Jin, and Buwen Wu. Statement
hypergraph as partitioning model for rdf data processing. pages 138–
145, 2012.
[17] Kai Zeng, Jiacheng Yang, Haixun Wang, Bin Shao, and Zhongyuan
Wang. A distributed graph engine for web scale rdf data. In
Proceedings of the 39th international conference on Very Large Data
Bases, PVLDB’13, pages 265–276. VLDB Endowment, 2013.
[18] Lei Zou, Jinghui Mo, Lei Chen, M. Tamer Özsu, and Dongyan Zhao. gStore: Answering SPARQL queries via subgraph matching. PVLDB, 4(8):482–493, 2011.