Semantic Data Management in
Graph Databases
-Tutorial at ESWC 2013-
Maria-Esther Vidal, Edna Ruckhaus, Maribel Acosta, Cosmin Basca
USB
Graphs …
• Network of friends in a high school.
• Annotation graph representing relations between clinical trials.
• Air traffic between US cities.
• A fragment of Facebook.
• Relationships between comments about a topic on Twitter.
Tasks to be Solved …
A significant increase of graph data in the form of social & biological information.
• Patterns of connections between people, to understand the functioning of society.
• Topological properties of graphs can be used to identify patterns that reveal phenomena and anomalies, and potentially lead to a discovery.
Tasks to be Solved … (2)
The importance of the information relies on the relations as much as, or more than, on the entities.
• Relatedness between two nodes.
• Best matching between two sets of sub-graphs.
• Keyword queries.
• Proximity patterns.
• Frequent sub-graphs.
• Graph ranking.
[Anderson et al. 2012]
Basic Graph Operations
Basic Graph operations that need to be efficiently
performed:
 Graph traversal.
 Measuring path length.
 Finding the most interesting central node.
NoSQL-based Approaches
These approaches can handle unstructured, unpredictable or messy data.
• Wide-column stores: BigTable model of Google; Cassandra.
• Document stores: semi-structured data; MongoDB.
• Key-value stores: Berkeley DB.
• Graph databases: graph-oriented data.
Agenda
1 Basic Concepts & Background (25 min.)
2 The Graph Data Management Paradigm (5 min.)
3 Existing Graph Database Engines (45 min.)
Coffee Break (30 min.)
4 Existing RDF Graph Engines (50 min.)
5 Hands-on session (25 min.)
6 Questions & Discussion (5 min.)
7 Summary & Closing (10 min.)
1 BASIC CONCEPTS & BACKGROUND
Abstract Data Type Graph
G=(V, E, Σ, L) is a graph:
 V is a finite set of nodes or vertices,
e.g., V={Term, forOffice, Organization,…}
 E is a set of edges representing binary relationships between elements in V,
e.g., E={(forOffice,Term), (forOffice,Organization), (Office,Organization),…}
 Σ is a set of labels,
e.g., Σ={domain, range, sc, type, …}
 L is a function: V × V → Σ,
e.g., L={((forOffice,Term),domain), ((forOffice,Organization),range),… }
Abstract Data Type Multi-Graph
G=(V, E, Σ, L) is a multi-graph:
 V is a finite set of nodes or vertices,
e.g., V={Term, forOffice, Organization,…}
 E is a set of edges representing binary relationships between elements in V,
e.g., E={(forOffice,Term), (forOffice,Organization), (Office,Organization),…}
 Σ is a set of labels,
e.g., Σ={domain, range, sc, type, …}
 L is a function: V × V → PowerSet(Σ),
e.g., L={((forOffice,Term),{domain}), ((forOffice,Organization),{range}), ((_id0,AZ),{forOffice, forOrganization}),… }
Basic Operations
Given a graph G, the following are operations over G:
 AddNode(G,x): adds node x to the graph G.
 DeleteNode(G,x): deletes the node x from graph G.
 Adjacent(G,x,y): tests if there is an edge from x to y.
 Neighbors(G,x): nodes y s.t. there is an edge from x to y.
 AdjacentEdges(G,x,y): set of labels of edges from x to y.
 Add(G,x,y,l): adds an edge between x and y with label l.
 Delete(G,x,y,l): deletes an edge between x and y with label l.
 Reach(G,x,y): tests if there is a path from x to y.
 Path(G,x,y): a (shortest) path from x to y.
 2-hop(G,x): set of nodes y s.t. there is a path of length 2 from x to y, or from y to x.
 n-hop(G,x): set of nodes y s.t. there is a path of length n from x to y, or from y to x.
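A minimal Java sketch of some of these operations, assuming nodes and labels are plain strings and the multi-graph is kept as a map from each node to its outgoing neighbors and their label sets. The class name LabeledMultiGraph and its method names are illustrative only; they do not come from any of the engines discussed later. Path and n-hop are left out to keep the sketch short.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative labeled multi-graph: node -> (neighbor -> set of edge labels),
// i.e. a direct encoding of L : V x V -> PowerSet(Sigma).
class LabeledMultiGraph {
    private final Map<String, Map<String, Set<String>>> out = new HashMap<>();

    void addNode(String x) {
        out.putIfAbsent(x, new HashMap<>());
    }

    void deleteNode(String x) {
        out.remove(x);
        // Also drop every edge that pointed to x.
        for (Map<String, Set<String>> targets : out.values()) {
            targets.remove(x);
        }
    }

    void add(String x, String y, String label) {
        addNode(x);
        addNode(y);
        out.get(x).computeIfAbsent(y, k -> new HashSet<>()).add(label);
    }

    void delete(String x, String y, String label) {
        Map<String, Set<String>> targets = out.get(x);
        if (targets == null || !targets.containsKey(y)) {
            return;
        }
        Set<String> labels = targets.get(y);
        labels.remove(label);
        if (labels.isEmpty()) {
            targets.remove(y); // no labels left means no edge left
        }
    }

    boolean adjacent(String x, String y) {
        return out.containsKey(x) && out.get(x).containsKey(y);
    }

    Set<String> neighbors(String x) {
        return out.containsKey(x) ? new HashSet<>(out.get(x).keySet()) : new HashSet<>();
    }

    Set<String> adjacentEdges(String x, String y) {
        return adjacent(x, y) ? new HashSet<>(out.get(x).get(y)) : new HashSet<>();
    }

    // Reach(G,x,y): breadth-first search along outgoing edges.
    // A node is considered to reach itself (path of length 0).
    boolean reach(String x, String y) {
        Deque<String> queue = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        queue.add(x);
        seen.add(x);
        while (!queue.isEmpty()) {
            String current = queue.poll();
            if (current.equals(y)) {
                return true;
            }
            for (String next : neighbors(current)) {
                if (seen.add(next)) {
                    queue.add(next);
                }
            }
        }
        return false;
    }
}
```

With this layout Adjacent and AdjacentEdges are hash lookups, while DeleteNode has to scan every node to remove incoming edges; other layouts trade these costs differently.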
Implementation of Graphs
• Adjacency List: For each node, a list of neighbors. If the graph is directed, the adjacency list of i contains only the outgoing nodes of i. Cheaper for obtaining the neighbors of a node. Not suitable for checking if there is an edge between two nodes.
• Incidence List: Vertices and edges are stored as records or objects. Each vertex stores incident edges. Each edge stores incident nodes.
• Adjacency Matrix: Bidimensional representation of the graph. Rows represent source vertices. Columns represent destination vertices. Each entry with 1 represents that there is an edge from the source node to the destination node.
• Incidence Matrix: Bidimensional representation of the graph. Rows represent vertices. Columns represent edges. An entry of 1 represents that the source vertex is incident to the edge.
• Compressed Adjacency Matrix: Differential encoding between two consecutive nodes.
[Sakr and Pardede 2012]
Adjacency List
[Figure: example graph with nodes V1 to V4 and edges labeled L1, L2 and L3.]
V1:
V2: (V1,{L2}) (V3,{L3})
V3:
V4: (V1,{L1})
Properties:
 Storage: O(|V|+|E|+|L|)
 Adjacent(G,x,y): O(|E|)
 Neighbors(G,x): O(|E|)
 AdjacentEdges(G,x,y): O(|E|)
 Add(G,x,y,l): O(|E|)
 Delete(G,x,y,l): O(|E|)
Incidence List
[Figure: example graph with nodes V1 to V4 and edges labeled L1, L2 and L3.]
Vertex records:
V1: (destination,L2) (destination,L1)
V2: (source,L2) (source,L3)
V3: (destination,L3)
V4: (source,L1)
Edge records:
L1: (V4,V1)
L2: (V2,V1)
L3: (V2,V3)
Properties:
 Storage: O(|V|+|E|+|L|)
 Adjacent(G,x,y): O(|E|)
 Neighbors(G,x): O(|E|)
 AdjacentEdges(G,x,y): O(|E|)
 Add(G,x,y,l): O(|E|)
 Delete(G,x,y,l): O(|E|)
Adjacency Matrix
[Figure: example graph with nodes V1 to V4 and edges labeled L1, L2 and L3.]
     V1    V2    V3    V4
V1   -     -     -     -
V2   {L2}  -     {L3}  -
V3   -     -     -     -
V4   {L1}  -     -     -
Properties:
 Storage: O(|V|²)
 Adjacent(G,x,y): O(1)
 Neighbors(G,x): O(|V|)
 AdjacentEdges(G,x,y): O(|E|)
 Add(G,x,y,l): O(|E|)
 Delete(G,x,y,l): O(|E|)
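As a rough illustration of why Adjacent is a constant-time lookup and Neighbors is a row scan, here is a small Java sketch of an adjacency matrix whose cells hold label sets. The class AdjacencyMatrixGraph and its layout (a row-major list of cells, with null meaning "no edge") are assumptions made for this example, not something prescribed by the slides.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative adjacency matrix: rows are source vertices, columns are destinations,
// and a non-null cell holds the set of labels of the edges between them.
class AdjacencyMatrixGraph {
    private final String[] names;
    private final Map<String, Integer> index = new HashMap<>();
    private final List<Set<String>> cells = new ArrayList<>(); // row-major, |V| x |V|

    AdjacencyMatrixGraph(String... nodes) {
        names = nodes.clone();
        for (int i = 0; i < names.length; i++) {
            index.put(names[i], i);
        }
        for (int i = 0; i < names.length * names.length; i++) {
            cells.add(null); // null entry: no edge
        }
    }

    private int cell(String x, String y) {
        return index.get(x) * names.length + index.get(y);
    }

    void add(String x, String y, String label) {
        int c = cell(x, y);
        if (cells.get(c) == null) {
            cells.set(c, new TreeSet<>());
        }
        cells.get(c).add(label);
    }

    // One array access: O(1).
    boolean adjacent(String x, String y) {
        return cells.get(cell(x, y)) != null;
    }

    // Scans the whole row of x: O(|V|).
    List<String> neighbors(String x) {
        List<String> result = new ArrayList<>();
        for (String y : names) {
            if (adjacent(x, y)) {
                result.add(y);
            }
        }
        return result;
    }
}
```

The matrix always allocates |V| × |V| cells, which is the O(|V|²) storage cost listed above, regardless of how few edges the graph has.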
Incidence Matrix
[Figure: example graph with nodes V1 to V4 and edges labeled L1, L2 and L3.]
     L1           L2           L3
V1   destination  destination  -
V2   -            source       source
V3   -            -            destination
V4   source       -            -
Properties:
 Storage: O(|V|x|E|)
 Adjacent(G,x,y): O(|E|)
 Neighbors(G,x): O(|V|x|E|)
 AdjacentEdges(G,x,y): O(|E|)
 Add(G,x,y,l): O(|V|)
 Delete(G,x,y,l): O(|V|)
Compressed Adjacency List
Gap Encoding
The ordered adjacency list
$A(x) = (a_1, a_2, a_3, \ldots, a_n)$
is encoded as
$(v(a_1 - x),\; a_2 - a_1 - 1,\; a_3 - a_2 - 1,\; \ldots,\; a_n - a_{n-1} - 1)$
where:
$$v(x) = \begin{cases} 2x & \text{if } x \ge 0 \\ 2|x| - 1 & \text{if } x < 0 \end{cases}$$
Original adjacency list:
Node  Neighbors
1     20,23,24,25,27
2     30,32,33,34
3     40,42,45,46,47,48
…..
Compressed adjacency list:
Node  Successors
1     38,2,0,0,1
2     56,1,0,0
3     74,1,2,0,0,0
…..
Properties:
 Storage: O(|V|+|E|+|L|)
 Adjacent(G,x,y): O(|E|)
 Neighbors(G,x): O(|E|)
 AdjacentEdges(G,x,y): O(|E|)
 Add(G,x,y,l): O(|E|)
 Delete(G,x,y,l): O(|E|)
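The gap encoding can be sketched in a few lines of Java. The class GapEncoding and its method names are made up for this illustration, and the sketch assumes each neighbor list is strictly increasing, so every gap after the first is non-negative. Only the first entry can be negative (a neighbor with a smaller identifier than the node itself), which is why it alone goes through v.

```java
// Illustrative gap encoding of one sorted adjacency list.
class GapEncoding {
    // v(x) = 2x if x >= 0, 2|x| - 1 otherwise: folds signed values into non-negative ones.
    static long v(long x) {
        return x >= 0 ? 2 * x : 2 * Math.abs(x) - 1;
    }

    // Inverse of v: even codes are non-negative values, odd codes are negative values.
    static long vInverse(long code) {
        return code % 2 == 0 ? code / 2 : -(code + 1) / 2;
    }

    static long[] encode(long node, long[] sortedNeighbors) {
        long[] code = new long[sortedNeighbors.length];
        for (int i = 0; i < code.length; i++) {
            code[i] = (i == 0)
                    ? v(sortedNeighbors[0] - node)
                    : sortedNeighbors[i] - sortedNeighbors[i - 1] - 1;
        }
        return code;
    }

    static long[] decode(long node, long[] code) {
        long[] neighbors = new long[code.length];
        for (int i = 0; i < code.length; i++) {
            neighbors[i] = (i == 0)
                    ? node + vInverse(code[0])
                    : neighbors[i - 1] + code[i] + 1;
        }
        return neighbors;
    }
}
```

For node 1 with neighbors 20,23,24,25,27 this yields 38,2,0,0,1, matching the compressed list on the slide; the small numbers that result are what a later variable-length integer code would exploit.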
Traversal Search
• Breadth First Search: Expands shallowest unexpanded nodes first. Search is complete.
• Depth First Search: Expands deepest unexpanded nodes first. Search may not be complete: it may fail in loops.
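Both strategies can be sketched in Java over a plain adjacency map. The class Traversals and its method names are illustrative assumptions. The visited set is what keeps the depth-first variant from looping forever on cyclic graphs, which is the failure mode mentioned above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative breadth-first and depth-first traversals returning the visit order.
class Traversals {
    static List<Integer> bfs(Map<Integer, List<Integer>> adj, int start) {
        List<Integer> order = new ArrayList<>();
        Set<Integer> visited = new HashSet<>();
        Deque<Integer> queue = new ArrayDeque<>();
        visited.add(start);
        queue.add(start);
        while (!queue.isEmpty()) {
            int node = queue.poll(); // FIFO: shallowest unexpanded node first
            order.add(node);
            for (int next : adj.getOrDefault(node, Collections.emptyList())) {
                if (visited.add(next)) {
                    queue.add(next);
                }
            }
        }
        return order;
    }

    static List<Integer> dfs(Map<Integer, List<Integer>> adj, int start) {
        List<Integer> order = new ArrayList<>();
        Set<Integer> visited = new HashSet<>();
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(start);
        while (!stack.isEmpty()) {
            int node = stack.pop(); // LIFO: deepest unexpanded node first
            if (!visited.add(node)) {
                continue; // already expanded; this check is what breaks cycles
            }
            order.add(node);
            List<Integer> children = adj.getOrDefault(node, Collections.emptyList());
            // Push in reverse so the first child is expanded first.
            for (int i = children.size() - 1; i >= 0; i--) {
                stack.push(children.get(i));
            }
        }
        return order;
    }
}
```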
Breadth First Search
[Figure: example graph showing the order in which breadth-first search visits nodes. Notation: starting node; first-, second- and third-level visited nodes.]
Depth First Search
[Figure: example graph showing the order in which depth-first search visits nodes, with the same notation.]
2 THE GRAPH DATA MANAGEMENT PARADIGM
The Graph Data Management Paradigm
Graph database models:
 Data models for schema and instances are graph models or generalizations of them.
 Data manipulation is expressed as graph-based operations.
Survey of Graph Database Models
A graph data model represents data and schema as:
 Graphs, or
 More generally, extensions of the notion of graph: hypergraphs, digraphs.
[Renzo and Gutiérrez 2008]
Survey of Graph Database Models
Types of relationship supported by graph data models:
• Attributes: properties, mono- or multi-valued.
• Entities: groups of real-world objects.
• Neighborhood relations: structures to represent neighborhoods of an entity.
• Standard abstractions: part-of, composed-by, n-ary associations.
• Derivation and inheritance: subclasses and superclasses, relations of instantiations.
• Nested relations: recursively specified relations.
[Renzo and Gutiérrez 2008]
Representative Approaches
• No Index: Only works for "toy" graphs. Vertices are on the scale of 1K.
• Node/Edge Index: Creates indices on different nodes/edges. Examples: Hexastore, RDF-3X, BitMat, Neo4j.
• Frequent Subgraph Index: Indices on frequently queried subgraphs.
• Reachability and Neighborhood Indices: 2-hop labeling schemas for index structures. Every two nodes within distance L.
3 GRAPH DATABASE ENGINES
Graph Databases Advantages
 Natural modeling of highly connected data.
 Special graph storage structure.
 Efficient schemaless graph algorithms.
 Support for query languages.
 Operators to query the graph structure.
Graph Databases
http://www.neo4j.org
http://www.hypergraphdb.org
http://www.sparsity-technologies.com/dex.php
Existing General Graph Database Engines
• DEX: Java library for management of persistent and temporary graphs. Implementation relies on bitmaps and secondary structures (B+-tree).
• HyperGraphDB: Implements the hypergraph data model.
• Neo4j: Network-oriented model where relations are first-class objects. Native disk-based storage manager for graphs. Framework for graph traversal.
Most graph databases implement an API instead of a query language.
DEX (1)
 Labeled attribute multi-graph.
 All node and edge objects belong to a type.
 Edges have a direction:
 Tail: corresponds to the source of the edge.
 Head: corresponds to the destination of the edge.
 Node and edge objects can have one or more attributes.
 There are no restrictions on the number of edges between two nodes.
 Loops are allowed.
 Attributes are single valued.
[Martínez-Basan et al. 2007]
http://www.sparsity-technologies.com/dex_tutorials
DEX (2)
 Attributes can have different indexes:
 Basic attributes.
 Indexed attributes.
 Unique attributes.
 Edges can index neighborhoods.
 Neighborhood index.
 The persistent database of DEX is a single file.
 DEX can manage very large graphs.
[Martínez-Basan et al. 2007]
http://www.sparsity-technologies.com/dex_tutorials
DEX Architecture
[Martínez-Basan et al. 2007]
http://www.sparsity-technologies.com/dex_tutorials
DEX Internal Representation (1)
DEX approach: Map + Bitmaps → Links
Link: bidirectional association between values and OIDs
 Given a value → a set of OIDs
 Given an OID → the value
[Martínez-Basan et al. 2007]
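The idea of a link can be sketched in Java as two maps kept in sync: a sorted map from each value to a bitmap of OIDs, and a map from each OID back to its value. This is only an illustration of the concept and not the DEX implementation: the class Link is hypothetical, java.util.TreeMap stands in for the B+-tree, and java.util.BitSet stands in for DEX's compressed bitmaps.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical "link": a bidirectional association between attribute values and OIDs.
class Link<V extends Comparable<V>> {
    private final TreeMap<V, BitSet> valueToOids = new TreeMap<>(); // stand-in for the B+-tree map
    private final Map<Integer, V> oidToValue = new HashMap<>();

    // Associates the object identified by oid with value, replacing any previous value.
    void put(int oid, V value) {
        V old = oidToValue.put(oid, value);
        if (old != null) {
            BitSet bits = valueToOids.get(old);
            bits.clear(oid);
            if (bits.isEmpty()) {
                valueToOids.remove(old);
            }
        }
        valueToOids.computeIfAbsent(value, k -> new BitSet()).set(oid);
    }

    // Given a value, returns the set of OIDs (as a copy of the bitmap).
    BitSet oidsOf(V value) {
        BitSet bits = valueToOids.get(value);
        return bits == null ? new BitSet() : (BitSet) bits.clone();
    }

    // Given an OID, returns its value, or null if the OID has none.
    V valueOf(int oid) {
        return oidToValue.get(oid);
    }
}
```

Because the OID sets are bitmaps, combining two selections (for example, two attribute conditions) reduces to a bitwise AND or OR of the corresponding bitmaps.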
DEX Internal Representation (2)
 A DEX graph is a combination of bitmaps:
 Bitmap for each node or edge type.
 One link for each attribute.
 Two links for each type: outgoing and ingoing edges.
 Maps are B+-trees.
 A compressed UTF-8 storage for Unicode strings.
[Martínez-Basan et al. 2007]
DEX API
Operations of the Abstract Data Type Graph:
 Graph construction.
 Definition of node and edge types.
 Creation of node and edge objects.
 Definition and use of attributes.
 Query nodes and edges.
 Management of edges and nodes.
http://www.sparsity-technologies.com/dex_tutorials
DEX API (2)
Graph construction
DexConfig cfg = new DexConfig();
cfg.setCacheMaxSize(2048); // 2 GB
cfg.setLogFile("PUBLICATIONSDex.log");
Dex dex = new Dex(cfg);
Database db = dex.create("Publications.dex", "PUBLICATIONSDex");
Session sess = db.newSession();
// Operations on the graph database will be performed on the object graph
Graph graph = sess.getGraph();
……
sess.close();
db.close();
dex.close();
Example
[Figure: publication graph in which the authors Peter Smith, Juan Perez and John Smith are linked to Paper1 to Paper5 by Wrote edges, papers are linked to each other by Cites edges, and papers are linked to the conferences ESWC and ISWC by Conference edges.]
DEX API (3)
Adding nodes to the graph
int authorTypeId = graph.newNodeType("AUTHOR");
int paperTypeId = graph.newNodeType("PAPER");
int conferenceTypeId = graph.newNodeType("CONFERENCE");
long author1 = graph.newNode(authorTypeId);
long author2 = graph.newNode(authorTypeId);
long author3 = graph.newNode(authorTypeId);
long paper1 = graph.newNode(paperTypeId);
long paper2 = graph.newNode(paperTypeId);
long paper3 = graph.newNode(paperTypeId);
long paper4 = graph.newNode(paperTypeId);
long paper5 = graph.newNode(paperTypeId);
long conference1 = graph.newNode(conferenceTypeId);
long conference2 = graph.newNode(conferenceTypeId);
DEX API (4)
Adding edges to the graph
int wroteTypeId = graph.newEdgeType("WROTE", true, true);
int isWrittenTypeId = graph.newEdgeType("Is-Written", true, true);
int citeTypeId = graph.newEdgeType("CITE", true, true);
int isCitedByTypeId = graph.newEdgeType("IsCitedBy", true, true);
// distinct from conferenceTypeId, which holds the CONFERENCE node type
int isConferenceTypeId = graph.newEdgeType("IsConference", true, true);
long wrote1 = graph.newEdge(wroteTypeId, author1, paper1);
long wrote2 = graph.newEdge(wroteTypeId, author1, paper3);
long wrote3 = graph.newEdge(wroteTypeId, author1, paper5);
long wrote4 = graph.newEdge(wroteTypeId, author2, paper3);
long wrote5 = graph.newEdge(wroteTypeId, author2, paper4);
long wrote6 = graph.newEdge(wroteTypeId, author3, paper2);
long wrote7 = graph.newEdge(wroteTypeId, author3, paper4);
DEX API (5)
Indexing
 Basic attributes: There is no index associated with the attribute.
 Indexed attributes: There is an index, automatically maintained by the system, associated with the attribute.
 Unique attributes: The same as indexed attributes but with an added integrity restriction: two different objects cannot have the same value, with the exception of the null value.
 DEX operations: Accessing the graph through an attribute will automatically use the defined index, significantly improving the performance of the operation.
 A specific index can also be defined to improve certain navigational operations, for example, Neighborhood.
 The AttributeKind enum class includes basic, indexed and unique attributes.
DEX API (6)
Adding attributes to nodes of the graph
int nameAttrId = graph.newAttribute(authorTypeId, "Name", DataType.String, AttributeKind.Indexed);
int conferenceNameAttrId = graph.newAttribute(conferenceTypeId, "ConferenceName", DataType.String, AttributeKind.Indexed);
Value v = new Value();
graph.setAttribute(author1, nameAttrId, v.setString("Peter Smith"));
graph.setAttribute(author2, nameAttrId, v.setString("Juan Perez"));
graph.setAttribute(author3, nameAttrId, v.setString("John Smith"));
graph.setAttribute(conference1, conferenceNameAttrId, v.setString("ESWC"));
graph.setAttribute(conference2, conferenceNameAttrId, v.setString("ISWC"));
DEX API (7)
Searching in the graph
Value v = new Value();
// retrieve all "AUTHOR" node objects
Objects authorObjs1 = graph.select(authorTypeId);
...
// retrieve Peter Smith from the graph, which is an "AUTHOR" node
Objects authorObjs2 = graph.select(nameAttrId, Condition.Equal, v.setString("Peter Smith"));
...
// retrieve all "AUTHOR" node objects having "Smith" in the name.
// It would retrieve Peter Smith and John Smith
Objects authorObjs3 = graph.select(nameAttrId, Condition.Like, v.setString("Smith"));
...
authorObjs1.close();
authorObjs2.close();
authorObjs3.close();
DEX API (8)
Searching in the graph
Graph graph = sess.getGraph();
Value v = new Value();
...
// retrieve all "AUTHOR" node objects having a value for the "Name" attribute
// satisfying the "^J[^]*n$" regular expression
Objects authorObjs = graph.select(nameAttrId, Condition.RegExp, v.setString("^J[^]*n$"));
...
authorObjs.close();
DEX API (9)
Searching in the graph
Value v = new Value();
Graph graph = sess.getGraph();
sess.begin();
int authorTypeId = graph.findType("AUTHOR");
int nameAttrId = graph.findAttribute(authorTypeId, "Name");
v.setString("Peter Smith");
long peterSmith = graph.findObject(nameAttrId, v);
sess.commit();
DEX API (10)
Traversing the graph
 The navigation direction can be restricted through the edges.
 The EdgesDirection enum class:
 Outgoing: Only out-going edges, starting from the source node, will be valid for the navigation.
 Ingoing: Only in-going edges will be valid for the navigation.
 Any: Both out-going and in-going edges will be valid for the navigation.
DEX API (11)
Traversing the graph
Explode-based methods visit the edges of a given node.
Neighbor-based methods visit the neighbor nodes of a given node identifier.
Graph graph = sess.getGraph();
...
Objects authorObjs1 = graph.select(authorTypeId);
...
// retrieve Peter Smith from the graph, which is an "AUTHOR" node
Objects node1 = graph.select(nameAttrId, Condition.Equal, v.setString("Peter Smith"));
Objects node2 = graph.select(nameAttrId, Condition.Like, v.setString("Paper"));
int wroteTypeId = graph.findType("WROTE");
int citeTypeId = graph.findType("CITE");
...
// retrieve all in-coming WROTE edges
Objects edges = graph.explode(node1, wroteTypeId, EdgesDirection.Ingoing);
...
// retrieve all nodes through CITE edges
Objects cites = graph.neighbors(node2, citeTypeId, EdgesDirection.Any);
...
edges.close();
cites.close();
DEX API (12)
Traversing the graph
Graph graph = sess.getGraph();
...
Objects node2 = graph.select(nameAttrId, Condition.Like, v.setString("Paper"));
int citeTypeId = graph.findType("CITE");
// 1-hop cites
Objects cites1 = graph.neighbors(node2, citeTypeId, EdgesDirection.Any);
// cites of cites (2-hop)
Objects cites2 = graph.neighbors(cites1, citeTypeId, EdgesDirection.Any);
…..
cites1.close();
cites2.close();
DEX API Summary
Operation: Description
CreateGraph: Creates an empty graph
DropGraph: Removes a graph
RegisterType: Registers a new node or edge type
GetTypes: Returns the list of all node or edge types
FindType: Returns the id of a node or edge type
NewNode: Creates a new empty node of a given node type
AddNode: Copies a node
AddEdge: Adds a new edge of some edge type
DropObject: Removes a node or an edge
Scan: Returns all nodes or edges of a given type
Select: Returns the nodes or edges of a given type which have an attribute that evaluates true to a comparison operator over a value
Related: Returns all neighbors of a node following edges that are members of an edge type
Neighbors: Returns all neighbors
HyperGraphDB (1)
 Represents graph and hypergraph structures, a mathematical generalization of a graph, where several edges correspond to a hyperedge.
 Hyperedges connect an arbitrary set of nodes (n-ary relations) and can point to other hyperedges (higher-order relations).
 This representation model saves space.
[Figure: two drawings relating Researcher:Bob and Researcher:Mary to Area:Database, Area:AI and Area:SW.]
[Iordanov 2010]
http://www.kobrix.com/index.jsp
HyperGraphDB(2)
 Edges are represented as ordered sets (directed graphs with hyperedges).
 Nodes and edges are unified into something called an atom, which is a typed tuple.
 The elements of a tuple are known as the Target Set.
 The size of a Target Set is the arity of a tuple.
 If a tuple x has 0 arity, it’s a node, otherwise it’s a link (or edge).
 The set of atoms pointing to a tuple is called the incidence
set of the tuple.
 Every atom has a Handle (unique identifier).
 Each entity has a type and value.
 A type is an atom conforming to a special interface, values are
data managed and stored by a type atom.
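The atom model can be illustrated with a small Java sketch. This is not the HyperGraphDB API: the classes AtomStore and Atom are hypothetical, handles are plain UUIDs, and types are omitted. It only shows how arity separates nodes from links and how incidence sets are maintained as the inverse of target sets.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

// Hypothetical in-memory atom store illustrating handles, target sets and incidence sets.
class AtomStore {
    static final class Atom {
        final UUID handle = UUID.randomUUID(); // unique identifier of the atom
        final Object value;
        final List<UUID> targetSet;            // ordered targets; empty for a node

        Atom(Object value, List<UUID> targetSet) {
            this.value = value;
            this.targetSet = targetSet;
        }

        int arity() {
            return targetSet.size();
        }

        boolean isNode() {
            return arity() == 0; // arity 0: node; otherwise: link
        }
    }

    private final Map<UUID, Atom> atoms = new HashMap<>();
    private final Map<UUID, Set<UUID>> incidence = new HashMap<>();

    // Adds a node (no targets) or a link (one or more targets) and returns its handle.
    UUID add(Object value, UUID... targets) {
        Atom atom = new Atom(value, Arrays.asList(targets.clone()));
        atoms.put(atom.handle, atom);
        for (UUID target : targets) {
            incidence.computeIfAbsent(target, k -> new LinkedHashSet<>()).add(atom.handle);
        }
        return atom.handle;
    }

    Atom get(UUID handle) {
        return atoms.get(handle);
    }

    // The set of atoms (links) that point to the given atom.
    Set<UUID> incidenceSet(UUID handle) {
        return incidence.getOrDefault(handle, Collections.emptySet());
    }
}
```

Since a link is itself an atom with a handle, another link can list it among its targets, which is how higher-order relations arise in this model.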
[Iordanov 2010]
http://www.kobrix.com/index.jsp
HyperGraphDB Storage
 Data is stored in the form of key-value pairs.
 Berkeley DB is used for storage of key-value structures.
 Provides three access methods for data:
 Hashes: Linear hash.
 B-Trees: Data stored in leaves of a balanced tree.
 Recno: Assigns a logical record identifier to each pair (indexed for direct access).
 Based on a key-value layer with duplicate support.
 Handles are custom generated UUIDs, ints, longs, etc.
 Several key-value stores: some are core, some are user-defined indices.
HyperGraphDB Operations And Indexing
Essential graph queries:
 Node/Edge adjacency.
 Pattern matching.
 Graph traversal.
Indexing by:
 Properties.
 Targets.
 User defined.
Predefined indexers:
 ByPartIndexer: Indexes atoms with compound types along a given dimension.
 ByTargetIndexer: Indexes atoms by a specific target (a position in the target tuple).
 CompositeIndexer: Combines any two indexers into a single one. This essentially allows precomputing and maintaining an arbitrary join.
 LinkIndexer: Indexes atoms by their full target tuple.
 TargetToTargetIndexer: Given a link, indexes one of its targets by another.
Example
[Figure: the publication graph of authors, papers and conferences, connected by Wrote, Cites and Conference edges.]
HyperGraphDB API (1)
Adding nodes to the graph
HyperGraph graph = new HyperGraph(databaseLocation);
AUTHOR author1 = new AUTHOR("Peter Smith");
AUTHOR author2 = new AUTHOR("Juan Perez");
AUTHOR author3 = new AUTHOR("John Smith");
PAPER paper1 = new PAPER("Paper1");
PAPER paper2 = new PAPER("Paper2");
PAPER paper3 = new PAPER("Paper3");
PAPER paper4 = new PAPER("Paper4");
PAPER paper5 = new PAPER("Paper5");
CONFERENCE conference1 = new CONFERENCE("ESWC");
CONFERENCE conference2 = new CONFERENCE("ISWC");
HGHandle authorHandle1 = graph.add(author1);
HGHandle authorHandle2 = graph.add(author2);
…
HGHandle conferenceHandle2 = graph.add(conference2);
…
graph.close();
HyperGraphDB API (2)
Adding edges to the graph
HyperGraph graph = new HyperGraph(databaseLocation);
AUTHOR author1 = new AUTHOR("Peter Smith");
AUTHOR author2 = new AUTHOR("Juan Perez");
AUTHOR author3 = new AUTHOR("John Smith");
PAPER paper1 = new PAPER("Paper1");
HGHandle paperHandle1 = graph.add(paper1);
HGHandle yearHandle = graph.add(2013);
// link carrying the value "publication year" between the paper and the year
HGValueLink link1 = new HGValueLink("publication year", paperHandle1, yearHandle);
HGHandle linkHandle1 = graph.add(link1);
graph.close();
HyperGraphDB API (3)
Searching node properties
HGQueryCondition condition = new And(
    new AtomTypeCondition(AUTHOR.class),
    new AtomPartCondition(new String[]{"name"}, "Peter Smith", ComparisonOperator.EQ));
HGSearchResult<HGHandle> rs = graph.find(condition);
while (rs.hasNext()) {
    HGHandle current = rs.next();
    AUTHOR author = graph.get(current);
    System.out.println(author.getBirthDate());
}
rs.close(); // release the result set
HyperGraphDB API (4)
Searching node properties
List<AUTHOR> authors = hg.getAll(graph, hg.and(hg.type(AUTHOR.class), hg.eq("name", "Peter Smith")));
for (AUTHOR author : authors) {
    System.out.println(author.getBirthDate());
}
HyperGraphDB API (5)
Searching node properties
HGBreadthFirstTraversal traverses the graph breadth-first, given a startAtom and an adjListGenerator.
DefaultALGenerator algen = new DefaultALGenerator(graph,
    hg.type(CitedBy.class),
    hg.and(hg.type(PAPER.class),
        hg.eq("track", "Semantic Data Management")),
    true, false, false);
HGTraversal traversal = new HGBreadthFirstTraversal(startingPaper, algen);
// startingPaper is the HGHandle of the start atom
PAPER currentPaper = graph.get(startingPaper);
while (traversal.hasNext()) {
    Pair<HGHandle, HGHandle> next = traversal.next();
    PAPER nextPaper = graph.get(next.getSecond());
    System.out.println("Paper " + currentPaper + " quotes " + nextPaper);
    currentPaper = nextPaper;
}
HyperGraphDB API (7)
Searching node properties
SimpleALGenerator produces all atoms linked to a given atom.
AUTHOR author1 = new AUTHOR("Peter Smith");
HGHandle authorHandle1 = graph.add(author1); // the traversal starts from an atom handle
HGDepthFirstTraversal traversal = new HGDepthFirstTraversal(
    authorHandle1,
    new SimpleALGenerator(graph));
while (traversal.hasNext()) {
    Pair<HGHandle, HGHandle> current = traversal.next();
    HGLink l = (HGLink) graph.get(current.getFirst());
    Object atom = graph.get(current.getSecond());
    System.out.println("Visiting atom " + atom + " pointed to by " + l);
}
HyperGraphDB API Summary
Class/Operation: Description
HGEnvironment: Creates a hypergraph from a file
Add: Adds a new object to the hypergraph
Update: Updates an object in the hypergraph
Remove: Removes an object from the hypergraph
HGHandle: A reference to a hypergraph atom
HGPersistentHandle: An HGHandle that survives system downtime
HGValueLink: Adds a new hyperedge
get: Returns the object referred to by a given HGHandle
HGQueryCondition: Returns the nodes or edges of a given type which have an attribute that evaluates true to a comparison operator over a value
HGQuery.hg.getAll: Returns all the objects that meet a condition
Neo4J
 Labeled attribute multigraph.
 Nodes and edges can have properties.
 There are no restrictions on the number of edges between two nodes.
 Loops are allowed.
 Attributes are single valued.
 Different types of indexes: Nodes & relationships.
 Different types of traversal strategies.
 API for Java, Python.
[Robinson et al. 2013]
http://neo4j.org/
Neo4J Architecture
[Robinson et al. 2013]
http://neo4j.org/
Neo4J Logical View
[Robinson et al. 2013]
http://neo4j.org/
Example
[Figure: the publication graph of authors, papers and conferences, connected by Wrote, Cites and Conference edges.]
Neo4J API (1)
Creating a graph and adding nodes
GraphDatabaseService db = new GraphDatabaseFactory().newEmbeddedDatabase(DB_PATH);
Node author1 = db.createNode();
Node author2 = db.createNode();
Node author3 = db.createNode();
Node paper1 = db.createNode();
Node paper2 = db.createNode();
Node paper3 = db.createNode();
Node paper4 = db.createNode();
Node paper5 = db.createNode();
Node conference1 = db.createNode();
Node conference2= db.createNode();
Neo4J API (2)
Adding attributes to nodes
author1.setProperty("firstname", "Peter");
author1.setProperty("lastname", "Smith");
author2.setProperty("firstname", "Juan");
author2.setProperty("lastname", "Perez");
author3.setProperty("firstname", "John");
author3.setProperty("lastname", "Smith");
conference1.setProperty("name", "ESWC");
conference2.setProperty("name", "ISWC");
Neo4J API (3)
Adding relationships
Relationship rel1 = author1.createRelationshipTo(paper1, DynamicRelationshipType.withName("WROTE"));
Relationship rel2 = author1.createRelationshipTo(paper5, DynamicRelationshipType.withName("WROTE"));
Relationship rel3 = author1.createRelationshipTo(paper3, DynamicRelationshipType.withName("WROTE"));
…
Relationship rel8 = paper1.createRelationshipTo(paper2, DynamicRelationshipType.withName("CITES"));
Relationship rel9 = paper3.createRelationshipTo(paper1, DynamicRelationshipType.withName("CITES"));
Relationship rel10 = paper3.createRelationshipTo(paper4, DynamicRelationshipType.withName("CITES"));
Relationship rel11 = paper4.createRelationshipTo(paper2, DynamicRelationshipType.withName("CITES"));
Relationship rel12 = paper1.createRelationshipTo(conference1, DynamicRelationshipType.withName("Conference"));
Relationship rel13 = paper2.createRelationshipTo(conference2, DynamicRelationshipType.withName("Conference"));
Relationship rel14 = paper3.createRelationshipTo(conference1, DynamicRelationshipType.withName("Conference"));
Relationship rel15 = paper4.createRelationshipTo(conference1, DynamicRelationshipType.withName("Conference"));
Relationship rel16 = paper5.createRelationshipTo(conference2, DynamicRelationshipType.withName("Conference"));
Neo4J API (4)
Creating indexes
IndexManager index = db.index();
Index<Node> authorIndex = index.forNodes("authors"); // "authors" is the index name
Index<Node> paperIndex = index.forNodes("papers");
Index<Node> conferenceIndex = index.forNodes("conferences");
RelationshipIndex wroteIndex = index.forRelationships("write");
RelationshipIndex citeIndex = index.forRelationships("cite");
RelationshipIndex conferenceRelIndex = index.forRelationships("conferenceRel");
Neo4J API (5)
Indexing nodes
author1.setProperty("firstname", "Peter");
author1.setProperty("lastname", "Smith");
authorIndex.add(author1, "lastname", author1.getProperty("lastname"));
author2.setProperty("firstname", "Juan");
author2.setProperty("lastname", "Perez");
authorIndex.add(author2, "lastname", author2.getProperty("lastname"));
author3.setProperty("firstname", "John");
author3.setProperty("lastname", "Smith");
authorIndex.add(author3, "lastname", author3.getProperty("lastname"));
conference1.setProperty("name", "ESWC");
conferenceIndex.add(conference1, "name", conference1.getProperty("name"));
conference2.setProperty("name", "ISWC");
conferenceIndex.add(conference2, "name", conference2.getProperty("name"));
Neo4J: Cypher Query Language (1)
 START:
 Specifies one or more starting points in the graph.
 Starting points can be defined as index lookups or just by
an element ID.
 MATCH:
 Pattern matching to match based on the starting point(s)
 WHERE:
 Filtering criteria.
 RETURN:
 What is projected out from the evaluation of the query.
Neo4J: Cypher Query Language (2)
Query: Is Peter Smith connected to John Smith?
START author=node:authors(
'firstname:"Peter" AND lastname:"Smith"')
MATCH (author)-[R*]->(m)
RETURN count(R)
Neo4J: Cypher Query Language (2)
Query: Papers written by Peter Smith.
START author=node:authors(
'firstname:"Peter" AND lastname:"Smith"')
MATCH (author)-[:WROTE]->(papers)
RETURN papers
Neo4J: Cypher Query Language (3)
Query: Papers cited by a paper written by Peter
Smith.
77
START author=node:authors(
'firstname:"Peter" AND lastname:"Smith"')
MATCH (author)-[:WROTE]->()-[:CITES]->(papers)
RETURN papers
Neo4J: Cypher Query Language (3)
Query: Papers cited by a paper written by Peter
Smith that have at most 20 cites.
78
START author=node:authors(
'firstname:"Peter" AND lastname:"Smith"')
MATCH (author)-[:WROTE]->()-[:CITES]->(papers)<-[:CITES]-(citing)
WITH papers, count(citing) AS cites
WHERE cites < 21
RETURN papers
Neo4J: Cypher Query Language (4)
Query: Papers cited by a paper written by Peter
Smith or cited by papers cited by a paper written
by Peter Smith.
79
START author=node:authors(
'firstname:"Peter" AND lastname:"Smith"')
MATCH (author)-[:WROTE]->()-[:CITES*1..2]->(papers)
RETURN papers
Neo4J: Cypher Query Language (5)
Query: Number of papers cited by a paper written
by Peter Smith or cited by papers cited by a paper
written by Peter Smith.
80
START author=node:authors(
'firstname:"Peter" AND lastname:"Smith"')
MATCH (author)-[:WROTE]->()-[:CITES*1..2]->(papers)
RETURN count(papers)
Neo4J: Cypher Query Language (6)
Query: Number of papers cited by a paper written
by Peter Smith or cited by papers cited by a paper
written by Peter Smith, and have been published in
ESWC.
81
START author=node:authors(
'firstname:"Peter" AND lastname:"Smith"')
MATCH (author)-[:WROTE]->()-[:CITES*1..2]->(papers)
WHERE papers.conference! = 'ESWC'
RETURN count(papers)
The exclamation mark in papers.conference! ensures that papers
without the conference attribute are not included in the output.
Neo4J: Cypher Query Language (7)
Query: Number of papers cited by a paper written
by Peter Smith or cited by papers cited by a paper
written by Peter Smith, have been published in
ESWC and have at most 4 co-authors .
82
START author=node:authors(
'firstname:"Peter" AND lastname:"Smith"')
MATCH (author)-[:WROTE]->()-[:CITES*1..2]->(papers)-[:`Was-Written`]->(authorFinal)
WHERE papers.conference! = 'ESWC'
WITH author, count(authorFinal) AS authorFinalCount
WHERE authorFinalCount < 4
RETURN author, authorFinalCount
The exclamation mark in papers.conference! ensures that papers
without the conference attribute are not included in the output.
Neo4J API-Cypher
Query: Peter Smith Neighborhood
83
static Neo4j TestGraph;
// Look up the start node through the automatic node index, then follow outgoing edges.
String Query = "start n=node:node_auto_index(idNode='PeterSmith') " +
"match n-[r]->m return m.idNode";
ExecutionResult result = TestGraph.query(Query);
boolean res = false;
// Print every column of every result row.
for (Map<String, Object> row : result) {
for (Map.Entry<String, Object> column : row.entrySet()) {
print(column.getValue().toString());
res = true;
}
}
if (!res) print("No result");
Operation                   Description
GraphDatabaseFactory        Creates an empty graph
registerShutdownHook        Removes a graph
createNode                  Creates a new empty node of a given node type
AddNode                     Copies a node
createRelationshipTo        Adds a new edge of some edge type
forNodes                    Index for a node
IndexManager                Index manager
forRelationships            Index for a relationship
add                         Adds an object to an index
TraversalDescription        Different types of strategies to traverse a graph
Traversal Branch Selector   preorderDepthFirst, postorderDepthFirst, preorderBreadthFirst, postorderBreadthFirst
Neo4J API Summary
84
COFFEE BREAK
85
RDF GRAPH ENGINES
86
4
RDF-Based Graph Database Engines
87
Native storage: Hexastore, RDF3X, BitMat
DBMS-backed storage: Hexastore*, BHyper
In the beginnings …
88
… there were dinosaurs
In the beginnings …
89
• Traditionally:
– Use relational DBMSs to store data
– Early approach: one large SPO table
(physical)
• Pros: simple schema, fast updates (when no index is
present), fast single-triple-pattern queries
• Cons: not suitable for multi-join queries, which result in
multiple self-joins
In the beginnings …
90
• Traditionally:
– Use relational DBMSs to store data
– Further approaches: property tables
(multiple flavors)
• Pros: fast retrieval of subject/object pairs for
given predicates
• Cons: queries with an unbound predicate are hard to
answer, and the schema is harder to maintain (a table for
each known predicate)
RDF-tailored management approaches
91
• Same conceptual approach: one large SPO table (but
with an RDF-specific physical organization):
– Triples are indexed in all 3! = 6 possible permutations:
SPO, SOP, PSO, POS, OSP, OPS
– Logical extension of the vertical partitioning proposed
in [Abadi et al 2008] (the PSO index = generalization
of the SO table)
– Each index (e.g. SPO) stores partial triple pattern
cardinalities
– Resulting in 6 tree-based indexes, each 3 levels deep
• First proposed in Hexastore
[Weiss et al. 2008]
Hexastore / Conceptual Model
92
• 2 main operations (input = tp: a partially bound triple pattern, e.g.
<bound_subject> <some_predicate> ?o):
• cardinality = getCardinality(tp)
• term_set = getSet(tp, set)
• The resulting term_set is sorted
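The two operations can be illustrated with a small in-memory sketch. It is not the Hexastore implementation: the class name, the string-named orders, and the nested TreeMap layout are assumptions made for illustration, and the integer term IDs are hypothetical. Each triple is inserted into all six orderings, and a pattern with two bound terms is answered from the ordering whose first two levels match the bound positions.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

/** In-memory sketch of sextuple indexing; names are illustrative, not Hexastore's API. */
final class SextupleIndexSketch {
    private static final String[] ORDERS = {"SPO", "SOP", "PSO", "POS", "OSP", "OPS"};

    // order name -> first term -> second term -> sorted set of third terms
    private final Map<String, TreeMap<Integer, TreeMap<Integer, TreeSet<Integer>>>> indexes =
            new HashMap<>();

    SextupleIndexSketch() {
        for (String order : ORDERS) {
            indexes.put(order, new TreeMap<>());
        }
    }

    /** Inserts one encoded triple into all six indexes. */
    void add(int s, int p, int o) {
        for (String order : ORDERS) {
            int[] key = new int[3];
            for (int i = 0; i < 3; i++) {
                char c = order.charAt(i);
                key[i] = (c == 'S') ? s : (c == 'P') ? p : o;
            }
            indexes.get(order)
                    .computeIfAbsent(key[0], k -> new TreeMap<>())
                    .computeIfAbsent(key[1], k -> new TreeSet<>())
                    .add(key[2]);
        }
    }

    /** getSet for a pattern with two bound terms, e.g. order "SPO" answers <s> <p> ?o. */
    SortedSet<Integer> getSet(String order, int first, int second) {
        TreeMap<Integer, TreeSet<Integer>> level2 = indexes.get(order).get(first);
        if (level2 == null || !level2.containsKey(second)) {
            return Collections.<Integer>emptySortedSet();
        }
        return Collections.unmodifiableSortedSet(level2.get(second));
    }

    /** getCardinality for the same kind of pattern (derived here, stored in a real index). */
    int getCardinality(String order, int first, int second) {
        return getSet(order, first, second).size();
    }

    public static void main(String[] args) {
        SextupleIndexSketch index = new SextupleIndexSketch();
        index.add(1, 12, 18);
        index.add(3, 12, 18);
        index.add(3, 12, 19);
        System.out.println(index.getSet("SPO", 3, 12));   // objects of <3> <12> ?o
        System.out.println(index.getSet("POS", 12, 18));  // subjects of ?s <12> <18>
        System.out.println(index.getCardinality("SPO", 3, 12));
    }
}
```

Because every level is kept sorted, two term sets returned by getSet can be intersected with a single merge pass, which is the property that makes merge-joins attractive in this design.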
Hexastore / Conceptual Model
93
• Pros:
• The optimal index permutation & level is used (e.g. <bound_s> ?p ?o : SPO, level 1)
• Ability to make use of fast merge-joins
• Cons:
• Higher upkeep cost: a theoretical upper bound of 5x space increase over the original
(encoded) data
94
[Figure: publication graph. The authors Peter Smith, Juan Perez and John Smith are linked to Paper1–Paper5 by Wrote edges; papers are linked to each other by Cites edges and to the conferences ESWC and ISWC by Conference edges.]
An Example (Publications):
logical representation
RDF Subgraph for the Publication Graph
95
[Figure: RDF subgraph with the paper resources conference/eswc/2012/paper/03, conference/iswc/2012/paper/05 and conference/iswc/2012/paper/01, the resource person/peter-smith, the literal 2012, and the conference resources conference/eswc/ and conference/iswc/ with acronyms ESWC and ISWC and type semanticweb/ontology/ConferenceEvent. Edge labels: ontoware/ontology#author, ontoware/ontology#year, ontoware/ontology#biblioReference, semanticweb/ontology#isPartOf, semanticweb/ontology#hasAcronym, w3c/rdf-syntax#type.]
An Example (Publications):
first step, encode strings
96
Hexastore Mapping Dictionary
ID   Resource
20   http://data.semanticweb.org/person/john-smith
19   http://data.semanticweb.org/person/juan-perez
18   http://data.semanticweb.org/person/peter-smith
17   http://data.semanticweb.org/ns/swc/ontology#hasAcronym
16   http://www.w3.org/1999/02/22-rdf-syntax-ns#type
15   "2012"
14   http://swrc.ontoware.org/ontology#biblioReference
13   http://swrc.ontoware.org/ontology#year
12   http://swrc.ontoware.org/ontology#author
11   http://data.semanticweb.org/ns/swc/ontology#isPartOf
10   http://data.semanticweb.org/ns/swc/ontology#ConferenceEvent
9    http://data.semanticweb.org/conference/eswc/2012
8    http://data.semanticweb.org/conference/iswc/2012
7    "ISWC2012"
6    "ESWC2012"
5    http://data.semanticweb.org/conference/iswc/2012/paper/05
4    http://data.semanticweb.org/conference/eswc/2012/paper/04
3    http://data.semanticweb.org/conference/eswc/2012/paper/03
2    http://data.semanticweb.org/conference/iswc/2012/paper/02
1    http://data.semanticweb.org/conference/eswc/2012/paper/01
• Next, index original encoded RDF file …
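The string-encoding step can be sketched as a two-way dictionary. This is a minimal illustration, assuming IDs are handed out in order of first appearance starting at 1; the class and method names are hypothetical and do not come from Hexastore.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Two-way mapping between RDF terms and integer IDs (illustrative sketch). */
final class TermDictionarySketch {
    private final Map<String, Integer> idOf = new HashMap<>();
    private final List<String> termOf = new ArrayList<>();

    /** Returns the ID of a term, assigning the next free ID on first sight. */
    int encode(String term) {
        Integer id = idOf.get(term);
        if (id == null) {
            termOf.add(term);
            id = termOf.size();   // IDs start at 1
            idOf.put(term, id);
        }
        return id;
    }

    /** Returns the term that was assigned the given ID. */
    String decode(int id) {
        return termOf.get(id - 1);
    }

    /** Encodes one triple of strings into three integer IDs. */
    int[] encodeTriple(String s, String p, String o) {
        return new int[] {encode(s), encode(p), encode(o)};
    }

    public static void main(String[] args) {
        TermDictionarySketch dict = new TermDictionarySketch();
        int[] triple = dict.encodeTriple(
                "http://data.semanticweb.org/conference/eswc/2012/paper/01",
                "http://swrc.ontoware.org/ontology#year",
                "\"2012\"");
        System.out.println(triple[0] + " " + triple[1] + " " + triple[2]);
        System.out.println(dict.decode(triple[1]));
    }
}
```

The indexes then store only the integer triples, and results are decoded back to strings at the very end of query evaluation.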
Hexastore / Physical implementation.
Native storage
98
[Figure: the three levels (S, P, O) of the SPO index over the encoded publication triples, with a cardinality stored next to each entry]
• Uses vector storage [Weiss et al. 2009]
• Each index level:
– partially sorted vector (on disk, one file per level)
– each record: <rdf term id, cardinality, pointer to set>
– the first level gets special treatment:
• fully sorted vector: search = binary search, O(log(N))
• Hint: using an additional hash can circumvent the binary search
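A first-level lookup along these lines might look as follows. The sketch keeps the records in parallel arrays and adds an optional hash map as the shortcut mentioned in the hint; the field names, the array layout and the sample cardinalities and offsets are assumptions, not the on-disk format of Hexastore.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Sketch of a fully sorted first index level with records <term id, cardinality, pointer>. */
final class SortedLevelSketch {
    private final int[] termIds;        // sorted ascending
    private final int[] cardinalities;  // cardinalities[i] belongs to termIds[i]
    private final long[] pointers;      // hypothetical offsets of the next-level sets
    private final Map<Integer, Integer> positionOf = new HashMap<>();

    SortedLevelSketch(int[] sortedTermIds, int[] cardinalities, long[] pointers) {
        this.termIds = sortedTermIds.clone();
        this.cardinalities = cardinalities.clone();
        this.pointers = pointers.clone();
        for (int i = 0; i < this.termIds.length; i++) {
            positionOf.put(this.termIds[i], i);
        }
    }

    /** O(log N) lookup on the sorted vector; returns -1 if the term is absent. */
    int findByBinarySearch(int termId) {
        int pos = Arrays.binarySearch(termIds, termId);
        return pos >= 0 ? pos : -1;
    }

    /** Expected O(1) lookup through the additional hash; returns -1 if the term is absent. */
    int findByHash(int termId) {
        return positionOf.getOrDefault(termId, -1);
    }

    int cardinalityAt(int pos) {
        return cardinalities[pos];
    }

    long pointerAt(int pos) {
        return pointers[pos];
    }

    public static void main(String[] args) {
        SortedLevelSketch level = new SortedLevelSketch(
                new int[] {1, 2, 3, 4, 5, 8, 9},
                new int[] {4, 3, 4, 4, 3, 2, 2},
                new long[] {0, 4, 7, 11, 15, 18, 20});
        int pos = level.findByBinarySearch(5);
        System.out.println(pos + " " + level.cardinalityAt(pos) + " " + level.pointerAt(pos));
    }
}
```

The hash trades extra memory for skipping the binary search; both lookups return the same position, so the rest of the level can stay unchanged.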
Hexastore / Physical implementation.
Native storage
99
• Pros:
– Optimized for read-intensive /
static workloads
– Optimal / fast triple pattern
matching
– No comparisons when scanning
since each (partial) cardinality is
known !
• Cons:
– Not suitable for write intensive
workloads (worst case can result
in rewrite of entire index!)
– Hypothesis (not yet tested): SSD
media may be adequate…
Hexastore / Physical implementation.
DBMS (managed) storage
100
• Builds on top of sorted
indexes:
– B+Tree, LSM Tree, etc
• Each index level:
– Same sorted index structure
(but not necessarily)
– each record:
<rdf term id list, cardinality>
– Search cost = as provided
by the managed index
structure
[Figure: the S, SP and SPO levels of the index over the encoded publication triples, stored as keyed records of the form <rdf term id list, cardinality>]
Hexastore / Physical implementation.
DBMS (managed) storage
101
• Pros:
– Flexible design: each index can
be configured for a different
physical data-structure:
• i.e. PSO=B+Tree, OSP=LSM Tree
– Separation of concerns (the
underlying DBMS manages and
optimizes for access patterns,
cache, etc…)
• Cons:
– Less control over fine-grained
data storage (up to the level
permitted by the DBMS API)
– Usually less potential for native
optimization
RDF3x (1)
• RISC-style operators.
• Implements the traditional Optimize-then-Execute
paradigm to identify left-linear plans of star-shaped
sub-queries of basic triple patterns.
• Makes use of secondary memory structures to locally
store RDF data and reduce the overhead of I/O
operations.
• Registers aggregated counts (numbers of occurrences),
and is able to detect whether a query will return an empty
answer.
102
[Neumann and Weikum 2008]
RDF3x (2)
• Triple table: each row represents a triple.
• Mapping dictionary, replacing all the literal
strings by an id; two dictionary indices.
• Compressed clustered B+-tree to index all triples:
– Six permutations of subject, predicate and object;
each permutation in a different index.
– Triples are sorted lexicographically in each B+-tree.
103
[Neumann and Weikum 2008]
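The lexicographic ordering per permutation can be sketched as follows. This only shows the sort orders on a small in-memory list; it does not model the compressed B+-trees of RDF-3X, and the class name, the method name and the sample triple IDs are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Sorts encoded triples {s, p, o} lexicographically under a given permutation. */
final class PermutationSortSketch {
    /** Returns a sorted copy; order is one of SPO, SOP, PSO, POS, OSP, OPS. */
    static List<int[]> sortedBy(String order, List<int[]> triples) {
        // pos[i] is the position inside {s, p, o} of the i-th sort key.
        int[] pos = new int[3];
        for (int i = 0; i < 3; i++) {
            pos[i] = "SPO".indexOf(order.charAt(i));
        }
        List<int[]> copy = new ArrayList<>(triples);
        copy.sort(Comparator.<int[]>comparingInt(t -> t[pos[0]])
                .thenComparingInt(t -> t[pos[1]])
                .thenComparingInt(t -> t[pos[2]]));
        return copy;
    }

    public static void main(String[] args) {
        List<int[]> triples = Arrays.asList(
                new int[] {13, 3, 16},
                new int[] {14, 3, 17},
                new int[] {11, 4, 19},
                new int[] {11, 3, 16});
        for (String order : new String[] {"SPO", "POS"}) {
            System.out.println(order + ":");
            for (int[] t : sortedBy(order, triples)) {
                System.out.println("  " + Arrays.toString(t));
            }
        }
    }
}
```

In a sorted order, neighbouring triples tend to share a prefix, which is what makes storing only the differences between consecutive triples an effective compression strategy.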
RDF3x (3)
104
Mapping dictionary
Triple Table
105
RDF3x (4)
[Figure from Neumann et al 2010]
• Compressed B+-trees.
• One B+-tree per pattern.
• Structure is updatable.
106
RDF3x (5)
[Figure from Lei Zou, http://hotdb.info/slides/RDF-talk.pdf]
RDF3x (6)
107
RDF3x: Representation of the
Publication Graph
110
OID Resource
0 http://www.w3.org/1999/02/22-rdf-syntax-ns#type
1 http://data.semanticweb.org/ns/swc/ontology#hasAcronym
2 http://data.semanticweb.org/ns/swc/ontology#isPartOf
3 http://swrc.ontoware.org/ontology#author
4 http://swrc.ontoware.org/ontology#year
5 http://swrc.ontoware.org/ontology#biblioReference
6 http://data.semanticweb.org/conference/iswc/2012
7 http://data.semanticweb.org/ns/swc/ontology#ConferenceEvent
8 ISWC2012
9 http://data.semanticweb.org/conference/eswc/2012
10 ESWC2012
11 http://data.semanticweb.org/conference/iswc/2012/paper/05
12 http://data.semanticweb.org/conference/iswc/2012/paper/02
13 http://data.semanticweb.org/conference/eswc/2012/paper/01
14 http://data.semanticweb.org/conference/eswc/2012/paper/03
15 http://data.semanticweb.org/conference/eswc/2012/paper/04
16 http://data.semanticweb.org/author/peter-smith
17 http://data.semanticweb.org/author/juan-perez
18 http://data.semanticweb.org/author/john-smith
19 2012
S P O
6 0 7
9 0 7
6 0 7
9 1 8
11 1 10
12 2 6
13 2 6
14 2 9
15 2 9
11 3 16
12 3 18
13 3 16
14 3 16
14 3 17
15 3 17
15 3 18
11 4 19
12 4 19
13 4 19
14 4 19
15 4 19
13 5 12
14 5 13
14 5 15
15 5 12
RDF3x Mapping Dictionary
Triple Table
BitMat (1)
• 3D bit-cube representation whose dimensions
represent subjects, predicates and objects.
• Each cell of the cube is 0 or 1 and indicates whether the RDF
triple formed by that combination of S, P, O is present.
• The 3D bit-cube is sliced along a dimension to obtain 2D
matrices.
• A gap compression scheme is applied to each bit-row.
• Basic operations over BitMats are defined to join basic triple
patterns.
111
[Atre et al. 2010]
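Gap compression of a single bit-row can be sketched as a run-length encoding: the value of the first bit, followed by the lengths of the alternating runs. The sketch assumes this layout for illustration; the exact on-disk format used by BitMat may differ, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Run-length ("gap") encoding of one bit-row: [first bit, run, run, ...]. */
final class GapCompressionSketch {
    static int[] encode(boolean[] row) {
        if (row.length == 0) {
            return new int[] {0};
        }
        List<Integer> out = new ArrayList<>();
        out.add(row[0] ? 1 : 0);
        int run = 1;
        for (int i = 1; i < row.length; i++) {
            if (row[i] == row[i - 1]) {
                run++;            // still inside the current run
            } else {
                out.add(run);     // the run ended, start a new one
                run = 1;
            }
        }
        out.add(run);
        int[] result = new int[out.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = out.get(i);
        }
        return result;
    }

    static boolean[] decode(int[] encoded) {
        int length = 0;
        for (int i = 1; i < encoded.length; i++) {
            length += encoded[i];
        }
        boolean[] row = new boolean[length];
        boolean bit = encoded[0] == 1;
        int pos = 0;
        for (int i = 1; i < encoded.length; i++) {
            for (int j = 0; j < encoded[i]; j++) {
                row[pos++] = bit;
            }
            bit = !bit;           // runs alternate between 0 and 1
        }
        return row;
    }

    public static void main(String[] args) {
        boolean[] row = {false, false, true, true, false, false, false};
        System.out.println(Arrays.toString(encode(row)));
    }
}
```

Rows of a sparse bit matrix consist mostly of long runs of zeros, so the encoded form is typically much shorter than the row itself.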
112
[Figure: a triple table mapped to a 3D bit-cube and then to S-O and O-S BitMats for each predicate P]
BitMat (2)
113
BitMat (3) [Figure from Atre and Hendler 2009]
BitMat: Representation of the
Publication Graph (1)
116
OID Resource
1 http://data.semanticweb.org/conference/eswc/2012
2 http://data.semanticweb.org/conference/eswc/2012/paper/01
3 http://data.semanticweb.org/conference/eswc/2012/paper/04
4 http://data.semanticweb.org/conference/iswc/2012
5 http://data.semanticweb.org/conference/iswc/2012/paper/02
6 http://data.semanticweb.org/conference/iswc/2012/paper/03
7 http://data.semanticweb.org/conference/eswc/2012/paper/05
S P O
4 6 8
4 1 12
1 6 8
1 1 7
7 2 4
5 2 4
2 2 1
6 2 1
3 2 1
2 3 11
6 3 11
6 3 10
3 3 10
3 3 9
7 3 11
5 3 9
2 5 6
5 5 6
6 5 6
3 5 6
7 5 6
sID Table
3D Cube BitMat
BitMat: Representation of the
Publication Graph (2)
117
pID Table
OID Resource
1 http://data.semanticweb.org/ns/swc/ontology#hasAcronym
2 http://data.semanticweb.org/ns/swc/ontology#isPartOf
3 http://swrc.ontoware.org/ontology#author
4 http://swrc.ontoware.org/ontology#biblioReference
5 http://swrc.ontoware.org/ontology#year
6 http://www.w3.org/1999/02/22-rdf-syntax-ns#type
BitMat: Representation of the
Publication Graph (3)
118
oID Table
OID Resource
1 http://www.w3.org/1999/02/22-rdf-syntax-ns#type
2 http://data.semanticweb.org/ns/swc/ontology#hasAcronym
3 http://data.semanticweb.org/ns/swc/ontology#isPartOf
4 http://swrc.ontoware.org/ontology#author
5 http://swrc.ontoware.org/ontology#year
6 http://swrc.ontoware.org/ontology#biblioReference
7 http://data.semanticweb.org/conference/iswc/2012
8 http://data.semanticweb.org/ns/swc/ontology#ConferenceEvent
9 ISWC2012
10 http://data.semanticweb.org/conference/eswc/2012
11 ESWC2012
12 http://data.semanticweb.org/conference/iswc/2012/paper/05
BHyper (1)
• A hypergraph-based structure is used to represent RDF documents.
• Mapping dictionary, replacing all the literal strings by an id; two
dictionary indices, indexed with a hash index.
• Encodings of RDF resources are implemented as nodes, and sets of
triple patterns correspond to hyperedges.
• A role function denotes the role played by a resource in a
hyperedge. Roles are represented as labels in the hyperedges.
• Indices are used to index the hyperedges and the different roles
played by the resources in the hyperedges:
– B+-trees are used to index hyperedges.
• BHyper is built on top of Tokyo Cabinet 1.4.45 [1].
119
[1] http://sourceforge.net/projects/tokyocabinet/
[Vidal and Martínez 2011]
BHyper(2)
120
[Vidal and Martínez 2011]
A hyper-edge can relate multiple edges.
BHyper(3)
121
[Vidal and Martínez 2011]
HANDS-ON SESSION
122
5
Experimental Set-Up
The experiments presented in this section were executed on a
machine with the following characteristics:
• Model: Sun Fire x4100
• Number of processors: 2
• Processor specification: Dual-Core AMD Opteron™ Processor 2218, 64 bits
• RAM: 16 GB
• Disk capacity: 2 disks, 135 GB each
• Network interface: 2 + ALOM
• Operating system: CentOS 5.5, 64 bits
• RDF3x version: 0.3.7
• DEX version: 4.8 Very Large Databases
• HyperGraphDB version: 1.2
• Neo4J version: 1.8.2
• Timeout: 3,600 secs. (reported in the portal as -2)
123
124
Graph Name   #Nodes    #Edges      Description
DSJC1000.1   1,000     49,629      Graph density: 0.1 [Johnson91]
DSJC1000.5   1,000     249,826     Graph density: 0.5 [Johnson91]
DSJC1000.9   1,000     449,449     Graph density: 0.9 [Johnson91]
SSCA2-17     131,072   3,907,909   GTgraph (1)
[Johnson91] Johnson, D., Aragon, C., McGeoch, L., and Schevon, C. Optimization by simulated
annealing: an experimental evaluation; part II, graph coloring and number partitioning. Operations
Research 39, 3 (1991), 378–406.
GTgraph: a suite of three synthetic graph generators:
(1) SSCA2: generates graphs used as input instances for the DARPA High Productivity Computing
Systems SSCA#2 graph theory benchmark.
http://www.cse.psu.edu/~madduri/software/GTgraph/index.html
Benchmark Graphs
125
Task          Description
adjacentXP    Given a node X and an edge label P, find adjacent nodes Y.
adjacentX     Given a node X, find adjacent nodes Y.
edgeBetween   Given two nodes X and Y, find the labels of the edges between X and Y.
2-hopX        Given a node X, find the 2-hop Y nodes.
3-hopX        Given a node X, find the 3-hop Y nodes.
4-hopX        Given a node X, find the 4-hop Y nodes.
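The k-hop tasks can be read as k rounds of frontier expansion. The following is a minimal sketch over an in-memory adjacency map, assuming a directed graph and ignoring edge labels; the class name, method name and node IDs are illustrative, and the sketch returns distinct nodes, whereas a query language may return one result per path.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Computes the set of nodes reachable from a start node by walks of exactly k edges. */
final class KHopSketch {
    static Set<Integer> kHop(Map<Integer, List<Integer>> adjacency, int start, int k) {
        Set<Integer> frontier = new HashSet<>();
        frontier.add(start);
        for (int hop = 0; hop < k; hop++) {
            Set<Integer> next = new HashSet<>();
            for (int node : frontier) {
                next.addAll(adjacency.getOrDefault(node, Collections.emptyList()));
            }
            frontier = next;   // nodes reached after hop + 1 edges
        }
        return frontier;
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> adjacency = new HashMap<>();
        adjacency.put(6, Arrays.asList(7, 8));
        adjacency.put(7, Arrays.asList(9));
        adjacency.put(8, Arrays.asList(9, 10));
        adjacency.put(9, Arrays.asList(6));
        for (int k = 1; k <= 4; k++) {
            System.out.println(k + "-hop from 6: " + kHop(adjacency, 6, k));
        }
    }
}
```

In a dense graph each round can multiply the number of paths by the average out-degree, so the work grows quickly with k.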
Benchmark Tasks
The benchmark tasks evaluate the engine performance while executing
the following graph operations:
126
Task SPARQL Query
adjacentXP SELECT ?y
WHERE {
<http://graphdatabase.ldc.usb.ve/resource/6> <http://graphdatabase.ldc.usb.ve/resource/pr> ?y }
adjacentX SELECT ?y
WHERE {
<http://graphdatabase.ldc.usb.ve/resource/6> ?p ?y }
edgeBetween SELECT ?p
WHERE {
<http://graphdatabase.ldc.usb.ve/resource/6> ?p <http://graphdatabase.ldc.usb.ve/resource/8> }
2-hopX SELECT ?z
WHERE {
<http://graphdatabase.ldc.usb.ve/resource/6> ?p1 ?y1 .
?y1 ?p2 ?z }
3-hopX SELECT ?z
WHERE {
<http://graphdatabase.ldc.usb.ve/resource/6> ?p1 ?y1 .
?y1 ?p2 ?y2 .
?y2 ?p3 ?z }
4-hopX SELECT ?z
WHERE {
<http://graphdatabase.ldc.usb.ve/resource/6> ?p1 ?y1 .
?y1 ?p2 ?y2 .
?y2 ?p3 ?y3 .
?y3 ?p4 ?z }
Benchmark Tasks: RDF Engines
The benchmark tasks were implemented in the RDF engines with the
following SPARQL queries:
Graph Mining Tasks
These tasks evaluate the engine performance
while executing the following graph mining tasks:
• Graph summarization.
• Dense subgraph.
• Shortest path.
127
[Figure: an original graph over the nodes 0–9 and its summarized graph, with the node groups {4, 5, 6, 7} and {1, 2, 3}]
Corrections: Add(8,1), Add(9,3), Add(0,3), Del(4,1)
Cost = 1 (superedge) + 4 (corrections) = 5
Graph Summarization
[Navlakha et al. 2008]
128
[Figure: weighted graph with the nodes A–H and edge weights between 1 and 3]
[Khuller et al. 2010]
Dense Subgraph
129
[Figure: weighted graph with the nodes A–H and edge weights between 1 and 3]
Given a subset of nodes E', compute the densest subgraph containing
them, i.e., the density of E' is maximized:
$$\mathrm{density}(E') = \frac{\sum_{(a,b) \in E'} \mathrm{weight}(a,b)}{|E'|}$$
[Khuller et al. 2010]
Dense Subgraph
130
Shortest Path
• Given two nodes, find the shortest path
between them.
• Implemented using depth-first iterative deepening
(DFID).
• Ten node pairs are taken at random from each
graph, and then the Shortest Path algorithm is
executed on each pair of nodes.
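Depth-first iterative deepening can be sketched as a depth-limited DFS that is repeated with a growing limit; the first limit at which the target is found is the length of a shortest path. This is an illustrative sketch for unweighted directed graphs, not the code used in the experiments; the class name, the maxDepth parameter and the sample graph are assumptions.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Shortest path length (in edges) by depth-first iterative deepening. */
final class IterativeDeepeningSketch {
    /** Returns the number of edges on a shortest path, or -1 if none exists within maxDepth. */
    static int shortestPathLength(Map<Integer, List<Integer>> adjacency,
                                  int source, int target, int maxDepth) {
        for (int limit = 0; limit <= maxDepth; limit++) {
            if (depthLimited(adjacency, source, target, limit, new HashSet<>())) {
                return limit;
            }
        }
        return -1;
    }

    /** Depth-limited DFS that never revisits a node already on the current path. */
    private static boolean depthLimited(Map<Integer, List<Integer>> adjacency, int node,
                                        int target, int limit, Set<Integer> onPath) {
        if (node == target) {
            return true;
        }
        if (limit == 0) {
            return false;
        }
        onPath.add(node);
        boolean found = false;
        for (int next : adjacency.getOrDefault(node, Collections.emptyList())) {
            if (!onPath.contains(next)
                    && depthLimited(adjacency, next, target, limit - 1, onPath)) {
                found = true;
                break;
            }
        }
        onPath.remove(node);
        return found;
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> adjacency = new HashMap<>();
        adjacency.put(1, Arrays.asList(2, 3));
        adjacency.put(2, Arrays.asList(4));
        adjacency.put(3, Arrays.asList(4));
        adjacency.put(4, Arrays.asList(5));
        System.out.println(shortestPathLength(adjacency, 1, 5, 10));
    }
}
```

Iterative deepening keeps the memory footprint of depth-first search, at the price of re-exploring shallow levels for every new limit.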
131
Experimental Results
http://graphium.ldc.usb.ve/
Collaborators: José Sánchez, Valeria Pestana, José Piñeiro, Domingo De Abreu,
Jonathan Queipo, Alejandro Flores, Guillermo Palma
USB
132
133
Loading Time: RDF Engines
Graph Name   #Nodes    #Edges      RDF3x Loading Time   Hexastore Loading Time
DSJC1000.1   1,000     49,629      2.036s               6.174s
DSJC1000.5   1,000     249,826     5.594s               30.809s
DSJC1000.9   1,000     449,449     8.992s               30.809s
SSCA2-17     131,072   3,907,909   44.775s              2m43.814s
134
136
Benchmark Tasks: Hexastore
[Figure: benchmark task results for Hexastore, per task (including 2-hopX, 3-hopX and 4-hopX) and per graph (DSJC1000.1, DSJC1000.5, DSJC1000.9, SSCA2-17), on a logarithmic axis]
K-hop seems to grow
exponentially
137
Benchmark Tasks: RDF3x
[Figure: benchmark task results for RDF3x, per task (including 2-hopX, 3-hopX and 4-hopX) and per graph (DSJC1000.1, DSJC1000.5, DSJC1000.9, SSCA2-17), on a logarithmic axis]
K-hop seems to grow
exponentially
138
Loading Time: Graph Database
Engines
Graph Name   #Nodes    #Edges      DEX Loading Time   Neo4J Loading Time   HyperGraphDB Loading Time
DSJC1000.1   1,000     49,629      10.031s            19.884s              159.665s
DSJC1000.5   1,000     249,826     41.677s            113.174s             812.135s
DSJC1000.9   1,000     449,449     76.389s            321.299s             1,557.574s
SSCA2-17     131,072   3,907,909   455.293s           359.082s             7,148.301s
139
Benchmark Tasks: Neo4J
[Figure: benchmark task results for Neo4J, per task (including 2-hopX, 3-hopX and 4-hopX) and per graph (DSJC1000.1, DSJC1000.5, DSJC1000.9, SSCA2-17), on a logarithmic axis]
K-hop seems to grow
exponentially
140
Benchmark Tasks: DEX
K-hop seems to grow
exponentially
141
Benchmark Tasks: HyperGraphDB
K-hop seems to grow
exponentially
147
Graph Mining Tasks: DEX
Graph Mining Tasks: Neo4J
148
[Figure: graph mining task results for Neo4J (graph summarization, dense subgraph, shortest path) per graph (DSJC1000.1, DSJC1000.5, DSJC1000.9, SSCA2-17), on a logarithmic axis]
Graph Mining Tasks: HyperGraphDB
149
[Figure: graph mining task results for HyperGraphDB (graph summarization, dense subgraph, shortest path) per graph (DSJC1000.1, DSJC1000.5, DSJC1000.9, SSCA2-17), on a logarithmic axis]
QUESTIONS & DISCUSSION
150
6
Graph Data Storing Features
151
Graph Database   Main Memory   Persistent Memory   Backend Storage   Indexing
DEX X X X
HyperGraphDB [1] X X X X
Neo4j X X X
Hexastore [2] X X X X
RDF3x X X
BitMat X X
BHyper [2] X X X X
[1] Backend storage: Berkeley DB
[2] Backend storage: Tokyo Cabinet
Graph Data Management Features
152
Columns: Graph Database; Graph API; Query Language; Data Visualization; Data Definition; Data Manipulation; Standard Compliance
DEX X X X X
HyperGraphDB X X X
Neo4j X X X X X
Hexastore X X X X
RDF3x X
BitMat X X X
BHyper X X X
Adjacency Queries: node/edge adjacency; k-neighborhood of a node.
Reachability Queries: test whether two nodes are connected by a path; shortest path.
Pattern Matching Queries: find all sub-graphs of a data graph.
Summarization Queries: aggregate the results of a graph-based query.
153
Graph-Based Operations
154
Columns: Graph Database; Node/Edge adjacency; K-neighborhood; Fixed-length paths; Regular Simple Paths; Shortest Path; Pattern Matching; Summarization Queries
DEX X X X X X X
HyperGraphDB X X X
Neo4j X X X X X X
Hexastore X X
RDF3x X X
BitMat X X
BHyper X X
Graph-Based Operations
SUMMARY & CLOSING
155
7
Data Storing Features:
• Engines implement a wide variety of
structures to efficiently represent large graphs.
• RDF engines implement special-purpose
structures to efficiently store RDF graphs.
156
Conclusions (1)
Data Management Features:
• General-purpose graph database engines
provide APIs to manage and query data.
• RDF engines support general data management
operations.
157
Conclusions (2)
Graph-Based Operations Support:
• General-purpose graph database engines provide API
methods to implement a wide variety of graph-based
operations.
– Only a few offer query languages to express pattern
matching.
• RDF engines are not customized for basic graph-based
operations; however,
– they efficiently implement the pattern-matching-based
language SPARQL.
158
Conclusions (3)
• Extension of existing engines to support specific
tasks of graph traversal and mining.
• Definition of benchmarks to study the performance
and quality of existing graph database engines, e.g.,
– graph traversal, link prediction, pattern discovery,
graph mining.
• Empirical evaluation of the performance and quality
of existing graph database engines.
159
Future Directions
P. Anderson, A. Thor, J. Benik, L. Raschid, and M.-E. Vidal. Pang: finding patterns in
annotation graphs. In SIGMOD Conference, pages 677–680, 2012.
R. Angles and C. Gutiérrez. Survey of graph database models. ACM Comput. Surv.,
40(1), 2008.
M. Atre, V. Chaoji, M. Zaki, and J. Hendler. Matrix Bit loaded: a scalable lightweight join
query processor for RDF data. In International Conference on World Wide Web,
Raleigh, USA, 2010. ACM.
B. Iordanov. HyperGraphDB: A generalized graph database. In WAIM Workshops, pages 25–36,
2010.
N. Martínez-Bazan, V. Muntés-Mulero, S. Gómez-Villamor, J. Nin, M.-A. Sánchez-Martínez,
and J.-L. Larriba-Pey. Dex: high-performance exploration on large graphs
for information retrieval. In CIKM, pages 573–582, 2007.
T. Neumann and G. Weikum. RDF-3X: a RISC-style engine for RDF. Proc. VLDB, 1(1),
2008.
160
References
M. K. Nguyen, C. Basca, and A. Bernstein. B+hash tree: optimizing query execution
times for on-disk semantic web data structures. In Proceedings of the 6th
International Workshop on Scalable Semantic Web Knowledge Base Systems
(SSWS2010), 2010.
I. Robinson, J. Webber, and E. Eifrem. Graph Databases. O'Reilly Media, Incorporated,
2013.
S. Sakr and E. Pardede, editors. Graph Data Management: Techniques and Applications.
IGI Global, 2011.
D. Shimpi and S. Chaudhari. An overview of graph databases. In IJCA Proceedings on
International Conference on Recent Trends in Information Technology and
Computer Science 2012, 2013.
M.-E. Vidal, A. Martínez, E. Ruckhaus, T. Lampo, and J. Sierra. On the efficiency of
querying and storing RDF documents. In Graph Data Management, pages 354–385.
2011.
C. Weiss, P. Karras, and A. Bernstein. Hexastore: Sextuple indexing for semantic web
data management. Proc. of the 34th Intl Conf. on Very Large Data Bases (VLDB),
2008.
161
References
C. Weiss and A. Bernstein. On-disk storage techniques for Semantic Web data: Are B-Trees
always the optimal solution? In Proceedings of the 5th International Workshop on
Scalable Semantic Web Knowledge Base Systems, October 2009.
162
References

Semantic Data Management in Graph Databases

  • 1. Semantic Data Management in Graph Databases. Tutorial at ESWC 2013. Maria-Esther Vidal, Edna Ruckhaus, Maribel Acosta, Cosmin Basca. USB
  • 2. Graphs … Network of friends in a high school; annotation graph representing relations between clinical trials; air traffic between US cities; a fragment of Facebook; relationships between comments about a topic in Twitter.
  • 3. A significant increase of graph data in the form of social and biological information. Tasks to be Solved … Patterns of connections between people, to understand the functioning of society. Topological properties of graphs can be used to identify patterns that reveal phenomena and anomalies, and that can potentially lead to a discovery.
  • 4. Tasks to be Solved … (2) The importance of the information lies in the relations as much as, or more than, in the entities. [Anderson et al. 2012]
      - Relatedness between two nodes.
      - Best matching between two sets of sub-graphs.
      - Keyword queries.
      - Proximity patterns.
      - Frequent sub-graphs.
      - Graph ranking.
  • 5. Basic Graph Operations. Basic graph operations that need to be performed efficiently:
      - Graph traversal.
      - Measuring path length.
      - Finding the most interesting central node.
  • 6. NoSQL-based Approaches. These approaches can handle unstructured, unpredictable or messy data.
      - Wide-column stores: BigTable model of Google; Cassandra.
      - Document stores: semi-structured data; MongoDB.
      - Key-value stores: Berkeley DB.
      - Graph databases: graph-oriented data.
  • 7. Agenda
      1. Basic Concepts & Background (25 min.)
      2. The Graph Data Management Paradigm (5 min.)
      3. Existing Graph Database Engines (45 min.)
         Coffee Break (30 min.)
      4. Existing RDF Graph Engines (50 min.)
      5. Hands-on session (25 min.)
      6. Questions & Discussion (5 min.)
      7. Summary & Closing (10 min.)
  • 9. Abstract Data Type Graph. G = (V, E, Σ, L) is a graph:
      - V is a finite set of nodes or vertices, e.g., V = {Term, forOffice, Organization, …}
      - E is a set of edges representing binary relationships between elements in V, e.g., E = {(forOffice, Term), (forOffice, Organization), (Office, Organization), …}
      - Σ is a set of labels, e.g., Σ = {domain, range, sc, type, …}
      - L is a function L: V × V → Σ, e.g., L = {((forOffice, Term), domain), ((forOffice, Organization), range), …}
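The definition above can be sketched in Python as a small container holding the four components. This is only an illustration under stated assumptions: the class name `LabeledGraph`, its field names, and the `add_edge` helper are hypothetical and do not come from the tutorial.

```python
from dataclasses import dataclass, field


@dataclass
class LabeledGraph:
    """Hypothetical container for G = (V, E, Sigma, L)."""
    nodes: set = field(default_factory=set)       # V
    edges: set = field(default_factory=set)       # E, a subset of V x V
    labels: set = field(default_factory=set)      # Sigma
    label_of: dict = field(default_factory=dict)  # L: (x, y) -> one label

    def add_edge(self, x, y, label):
        # Keep the four components consistent with each other.
        self.nodes.update((x, y))
        self.edges.add((x, y))
        self.labels.add(label)
        self.label_of[(x, y)] = label  # a plain graph keeps one label per pair


# The example values from the slide.
g = LabeledGraph()
g.add_edge("forOffice", "Term", "domain")
g.add_edge("forOffice", "Organization", "range")
```

Because `label_of` maps each node pair to a single label, adding a second label for the same pair overwrites the first.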
  • 10. Abstract Data Type Multi-Graph. G = (V, E, Σ, L) is a multi-graph:
      - V is a finite set of nodes or vertices, e.g., V = {Term, forOffice, Organization, …}
      - E is a set of edges representing binary relationships between elements in V, e.g., E = {(forOffice, Term), (forOffice, Organization), (Office, Organization), …}
      - Σ is a set of labels, e.g., Σ = {domain, range, sc, type, …}
      - L is a function L: V × V → PowerSet(Σ), e.g., L = {((forOffice, Term), {domain}), ((forOffice, Organization), {range}), ((_id0, AZ), {forOffice, forOrganization}), …}
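A minimal sketch of the multi-graph variant, where L maps a node pair to a set of labels rather than to a single label. The class and method names (`MultiGraph`, `add`, `adjacent_edges`) are hypothetical, chosen for illustration only.

```python
from collections import defaultdict


class MultiGraph:
    """Hypothetical multi-graph: L maps (x, y) to a subset of Sigma."""

    def __init__(self):
        self.nodes = set()
        self.label_sets = defaultdict(set)  # L: (x, y) -> set of labels

    def add(self, x, y, label):
        self.nodes.update((x, y))
        self.label_sets[(x, y)].add(label)

    def adjacent_edges(self, x, y):
        # Use .get so that a lookup does not create an empty entry.
        return set(self.label_sets.get((x, y), ()))


# The example from the slide: two parallel edges from _id0 to AZ.
mg = MultiGraph()
mg.add("_id0", "AZ", "forOffice")
mg.add("_id0", "AZ", "forOrganization")
mg.add("forOffice", "Term", "domain")
```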
  • 11. Basic Operations. Given a graph G, the following are operations over G:
      - AddNode(G,x): adds node x to the graph G.
      - DeleteNode(G,x): deletes the node x from graph G.
      - Adjacent(G,x,y): tests if there is an edge from x to y.
      - Neighbors(G,x): nodes y s.t. there is an edge from x to y.
      - AdjacentEdges(G,x,y): set of labels of edges from x to y.
      - Add(G,x,y,l): adds an edge between x and y with label l.
      - Delete(G,x,y,l): deletes an edge between x and y with label l.
      - Reach(G,x,y): tests if there is a path from x to y.
      - Path(G,x,y): a (shortest) path from x to y.
      - 2-hop(G,x): set of nodes y s.t. there is a path of length 2 from x to y, or from y to x.
      - n-hop(G,x): set of nodes y s.t. there is a path of length n from x to y, or from y to x.
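The path-based operations above (Reach, Path, n-hop) can be sketched with breadth-first search over a plain dictionary of successor lists. The function names, the dictionary representation, and the sample graph are assumptions made for this illustration. "Path of length n" is read here as a walk of exactly n edges, which may revisit nodes.

```python
from collections import deque


def path(adj, x, y):
    """Return a shortest path from x to y as a list of nodes, or None."""
    parent = {x: None}
    queue = deque([x])
    while queue:
        node = queue.popleft()
        if node == y:
            result = []
            while node is not None:  # walk back to x through parent links
                result.append(node)
                node = parent[node]
            return result[::-1]
        for nxt in adj.get(node, ()):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


def reach(adj, x, y):
    """Test whether some path leads from x to y."""
    return path(adj, x, y) is not None


def n_hop(adj, x, n):
    """Nodes y with a walk of exactly n edges from x to y, or from y to x."""
    reverse = {}
    for src, targets in adj.items():
        for dst in targets:
            reverse.setdefault(dst, set()).add(src)
    result = set()
    for direction in (adj, reverse):
        frontier = {x}
        for _ in range(n):  # expand the frontier one edge at a time
            frontier = {nxt for node in frontier
                        for nxt in direction.get(node, ())}
        result |= frontier
    return result


# Hypothetical sample graph: e -> a -> b -> c -> d
adj = {"a": ["b"], "b": ["c"], "c": ["d"], "e": ["a"]}
```

With this sample graph, `n_hop(adj, "b", 2)` collects `d` (two edges forward) and `e` (two edges backward).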
  • 12. Implementation of Graphs
      - Adjacency List: for each node, a list of neighbors. If the graph is directed, the adjacency list of node i contains only the nodes reached by outgoing edges of i. Cheaper for obtaining the neighbors of a node. Not suitable for checking if there is an edge between two nodes.
      - Incidence List: vertices and edges are stored as records or objects. Each vertex stores its incident edges. Each edge stores its incident nodes.
      - Adjacency Matrix: bidimensional representation of the graph. Rows represent source vertices. Columns represent destination vertices. Each entry with 1 represents that there is an edge from the source node to the destination node.
      - Incidence Matrix: bidimensional representation of the graph. Rows represent vertices. Columns represent edges. An entry of 1 represents that the source vertex is incident to the edge.
      - Compressed Adjacency Matrix: differential encoding between two consecutive nodes [Sakr and Pardede 2012].
  • 13. Adjacency List. Example (figure): a directed graph over V1, V2, V3, V4 with labeled edges L1, L2, L3. Adjacency lists: V2: (V1,{L2}), (V3,{L3}); V4: (V1,{L1}); V1 and V3 have empty lists. Properties:
      - Storage: O(|V|+|E|+|L|)
      - Adjacent(G,x,y): O(|E|)
      - Neighbors(G,x): O(|E|)
      - AdjacentEdges(G,x,y): O(|E|)
      - Add(G,x,y,l): O(|E|)
      - Delete(G,x,y,l): O(|E|)
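A minimal Python sketch of an adjacency list that stores, per node, a list of (neighbor, label set) pairs, as in the slide's figure. The class and method names are hypothetical. The example edges (V2 to V1 with L2, V2 to V3 with L3, V4 to V1 with L1) are an assumed reading of the figure. Each operation scans one node's list, so its cost is bounded by the O(|E|) figures given above.

```python
class AdjacencyList:
    """Hypothetical adjacency list: node -> [(neighbor, {labels})]."""

    def __init__(self, nodes):
        self.lists = {v: [] for v in nodes}

    def _entry(self, x, y):
        # Linear scan of x's list; this is why Adjacent is not constant time.
        for entry in self.lists[x]:
            if entry[0] == y:
                return entry
        return None

    def adjacent(self, x, y):
        return self._entry(x, y) is not None

    def neighbors(self, x):
        return [y for y, _ in self.lists[x]]

    def adjacent_edges(self, x, y):
        entry = self._entry(x, y)
        return set(entry[1]) if entry else set()

    def add(self, x, y, label):
        entry = self._entry(x, y)
        if entry is None:
            self.lists[x].append((y, {label}))
        else:
            entry[1].add(label)

    def delete(self, x, y, label):
        entry = self._entry(x, y)
        if entry is None:
            return
        entry[1].discard(label)
        if not entry[1]:  # last label removed: drop the neighbor entry
            self.lists[x].remove(entry)


# Assumed reading of the slide's example graph.
al = AdjacencyList(["V1", "V2", "V3", "V4"])
al.add("V2", "V1", "L2")
al.add("V2", "V3", "L3")
al.add("V4", "V1", "L1")
```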
  • 15. Incidence List. Example (figure): the same graph over V1, V2, V3, V4. Vertex records: V1: (destination,L2), (destination,L1); V2: (source,L2), (source,L3); V3: (destination,L3); V4: (source,L1). Edge records: L1: (V4,V1); L2: (V2,V1); L3: (V2,V3). Properties:
      - Storage: O(|V|+|E|+|L|)
      - Adjacent(G,x,y): O(|E|)
      - Neighbors(G,x): O(|E|)
      - AdjacentEdges(G,x,y): O(|E|)
      - Add(G,x,y,l): O(|E|)
      - Delete(G,x,y,l): O(|E|)
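The incidence list can be sketched as two record tables: one per vertex listing its incident edges with their role, and one per edge listing its endpoints. The names below are hypothetical. The sketch assumes that each label identifies exactly one edge, as L1, L2 and L3 do in the slide's figure. The edge assignments are an assumed reading of that figure.

```python
class IncidenceList:
    """Hypothetical incidence list with vertex records and edge records."""

    def __init__(self):
        self.vertex_records = {}  # node -> [(role, edge_label)]
        self.edge_records = {}    # edge_label -> (source, destination)

    def add(self, x, y, label):
        # Assumption: `label` is unique per edge.
        self.edge_records[label] = (x, y)
        self.vertex_records.setdefault(x, []).append(("source", label))
        self.vertex_records.setdefault(y, []).append(("destination", label))

    def neighbors(self, x):
        # Follow each outgoing edge record to its destination.
        return [self.edge_records[label][1]
                for role, label in self.vertex_records.get(x, [])
                if role == "source"]

    def adjacent_edges(self, x, y):
        return {label
                for role, label in self.vertex_records.get(x, [])
                if role == "source" and self.edge_records[label][1] == y}

    def adjacent(self, x, y):
        return bool(self.adjacent_edges(x, y))


# Assumed reading of the slide's example graph.
il = IncidenceList()
il.add("V4", "V1", "L1")
il.add("V2", "V1", "L2")
il.add("V2", "V3", "L3")
```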
  • 17. Adjacency Matrix. Properties: Storage: O(|V|^2); Adjacent(G,x,y): O(1); Neighbors(G,x): O(|V|); AdjacentEdges(G,x,y): O(|E|); Add(G,x,y,l): O(|E|); Delete(G,x,y,l): O(|E|). [Figure: adjacency matrix with rows and columns V1 to V4; the non-null cells hold the label sets {L1}, {L2}, {L3}.]
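The trade-off between the adjacency list and the adjacency matrix can be sketched in a few lines of Java. This is a minimal illustration, not code from the tutorial: the class names `AdjacencyListGraph` and `AdjacencyMatrixGraph` are hypothetical, nodes are assumed to be numbered 0..n-1, and edge labels are left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Directed graph over nodes 0..n-1 stored as adjacency lists (hypothetical name). */
final class AdjacencyListGraph {
    private final List<List<Integer>> out = new ArrayList<>();

    AdjacencyListGraph(int nodes) {
        for (int i = 0; i < nodes; i++) {
            out.add(new ArrayList<>());
        }
    }

    void addEdge(int source, int destination) {
        out.get(source).add(destination);
    }

    /** Neighbors are read directly from the list of the node. */
    List<Integer> neighbors(int node) {
        return out.get(node);
    }

    /** The edge check has to scan the list of the source node. */
    boolean adjacent(int source, int destination) {
        return out.get(source).contains(destination);
    }
}

/** The same graph as a |V| x |V| boolean matrix (hypothetical name). */
final class AdjacencyMatrixGraph {
    private final boolean[][] cell;

    AdjacencyMatrixGraph(int nodes) {
        cell = new boolean[nodes][nodes];
    }

    void addEdge(int source, int destination) {
        cell[source][destination] = true;
    }

    /** The edge check is a single array access. */
    boolean adjacent(int source, int destination) {
        return cell[source][destination];
    }

    /** Neighbors require scanning a whole row of |V| cells. */
    List<Integer> neighbors(int node) {
        List<Integer> result = new ArrayList<>();
        for (int d = 0; d < cell[node].length; d++) {
            if (cell[node][d]) {
                result.add(d);
            }
        }
        return result;
    }
}
```

With the edges (V4,V1), (V2,V1), (V2,V3) from the incidence-list slide, numbered from 0, both classes report the neighbors of V2 as V1 and V3; only the cost profile differs.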
  • 19. Incidence Matrix. Properties: Storage: O(|V|x|E|); Adjacent(G,x,y): O(|E|); Neighbors(G,x): O(|V|x|E|); AdjacentEdges(G,x,y): O(|E|); Add(G,x,y,l): O(|V|); Delete(G,x,y,l): O(|V|). [Figure: incidence matrix with rows V1 to V4 and columns L1 to L3; each non-null cell marks the vertex as source or destination of the edge.]
  • 21. Compressed Adjacency List: Gap Encoding. The ordered adjacency list $A(x) = (a_1, a_2, a_3, \ldots, a_n)$ of node $x$ is encoded as $(v(a_1 - x),\ a_2 - a_1 - 1,\ a_3 - a_2 - 1,\ \ldots,\ a_n - a_{n-1} - 1)$, where $v(x) = 2x$ if $x \ge 0$ and $v(x) = 2|x| - 1$ if $x < 0$. Original adjacency list (node: neighbors): 1: 20, 23, 24, 25, 27; 2: 30, 32, 33, 34; 3: 40, 42, 45, 46, 47, 48; ... Compressed adjacency list (node: successors): 1: 38, 2, 0, 0, 1; 2: 56, 1, 0, 0; 3: 74, 1, 2, 0, 0, 0; ... Properties: Storage: O(|V|+|E|+|L|); Adjacent(G,x,y): O(|E|); Neighbors(G,x): O(|E|); AdjacentEdges(G,x,y): O(|E|); Add(G,x,y,l): O(|E|); Delete(G,x,y,l): O(|E|).
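The gap encoding on this slide can be sketched as follows. The class and method names (`GapEncoding`, `encode`, `decode`, `vInverse`) are assumptions made for illustration; the sketch also assumes that the neighbor array is sorted and free of duplicates, and it stores the gaps as plain integers rather than in a compact bit-level code.

```java
/** Gap encoding of a sorted adjacency list (hypothetical helper class). */
final class GapEncoding {
    /** Maps a possibly negative first gap to a non-negative integer. */
    static int v(int x) {
        return x >= 0 ? 2 * x : 2 * Math.abs(x) - 1;
    }

    /** Inverse of v: even codes are non-negative gaps, odd codes are negative gaps. */
    static int vInverse(int code) {
        return code % 2 == 0 ? code / 2 : -(code + 1) / 2;
    }

    /** Encodes the sorted neighbors of a node as (v(a1 - x), a2 - a1 - 1, ...). */
    static int[] encode(int node, int[] sortedNeighbors) {
        int[] gaps = new int[sortedNeighbors.length];
        for (int i = 0; i < sortedNeighbors.length; i++) {
            gaps[i] = (i == 0)
                    ? v(sortedNeighbors[0] - node)
                    : sortedNeighbors[i] - sortedNeighbors[i - 1] - 1;
        }
        return gaps;
    }

    /** Rebuilds the neighbor list from the gaps. */
    static int[] decode(int node, int[] gaps) {
        int[] neighbors = new int[gaps.length];
        for (int i = 0; i < gaps.length; i++) {
            neighbors[i] = (i == 0)
                    ? node + vInverse(gaps[0])
                    : neighbors[i - 1] + gaps[i] + 1;
        }
        return neighbors;
    }
}
```

For node 1 with neighbors 20, 23, 24, 25, 27 the first gap is v(19) = 38 and the remaining gaps are 2, 0, 0, 1, which matches the compressed list on the slide. The small values are what a variable-length integer code would then store in few bits.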
  • 22. Traversal Search. Breadth First Search: expands the shallowest unexpanded nodes first. Search is complete. Depth First Search: expands the deepest unexpanded nodes first. Search may not be complete: it may fail in loops.
  • 23. Breadth First Search. [Figure: example graph with nodes 0 to 8. Notation: starting node, first-level visited nodes, second-level visited nodes, third-level visited nodes.]
  • 24. Depth First Search. [Figure: example graph with nodes 0 to 8. Notation: starting node, first-level visited nodes, second-level visited nodes, third-level visited nodes.]
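Both traversals can be sketched over an adjacency-list map. The names below (`Traversals`, `breadthFirst`, `depthFirst`) are illustrative assumptions. The visited set is an addition that is not on the slides: it is the usual way to keep depth-first search from failing in loops, at the cost of remembering every visited node.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Breadth-first and depth-first traversal over an adjacency-list map (hypothetical class). */
final class Traversals {
    /** Visits nodes level by level, shallowest first. */
    static List<Integer> breadthFirst(Map<Integer, List<Integer>> graph, int start) {
        List<Integer> order = new ArrayList<>();
        Set<Integer> visited = new HashSet<>();
        Deque<Integer> queue = new ArrayDeque<>();
        visited.add(start);
        queue.add(start);
        while (!queue.isEmpty()) {
            int node = queue.remove();
            order.add(node);
            for (int next : graph.getOrDefault(node, Collections.emptyList())) {
                if (visited.add(next)) {   // add() returns false for already visited nodes
                    queue.add(next);
                }
            }
        }
        return order;
    }

    /** Visits nodes by following each path as deep as possible before backtracking. */
    static List<Integer> depthFirst(Map<Integer, List<Integer>> graph, int start) {
        List<Integer> order = new ArrayList<>();
        visit(graph, start, new HashSet<>(), order);
        return order;
    }

    private static void visit(Map<Integer, List<Integer>> graph, int node,
                              Set<Integer> visited, List<Integer> order) {
        if (!visited.add(node)) {
            return;                        // already seen: this is what breaks cycles
        }
        order.add(node);
        for (int next : graph.getOrDefault(node, Collections.emptyList())) {
            visit(graph, next, visited, order);
        }
    }
}
```

On a graph with a cycle, both methods terminate and each node appears exactly once in the returned order.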
  • 25. THE GRAPH DATA MANAGEMENT PARADIGM (agenda item 2)
  • 26. The Graph Data Management Paradigm. Graph database models: Data models for schema and instances are graph models or generalizations of them. Data manipulation is expressed as graph-based operations.
  • 27. Survey of Graph Database Models. A graph data model represents data and schema as graphs or, more generally, as generalizations of the notion of graph: hypergraphs, digraphs. [Renzo and Gutiérrez 2008]
  • 28. Survey of Graph Database Models. Types of relationship supported by graph data models: Attributes: properties, mono- or multi-valued. Entities: groups of real-world objects. Neighborhood relations: structures to represent neighborhoods of an entity. Standard abstractions: part-of, composed-by, n-ary associations. Derivation and inheritance: subclasses and superclasses, relations of instantiations. Nested relations: recursively specified relations. [Renzo and Gutiérrez 2008]
  • 29. Representative Approaches. No Index: only works for “toy” graphs, with vertices on the scale of 1K. Node/Edge Index: indexes built on nodes and edges (Hexastore, RDF-3X, BitMat, Neo4j). Frequent Subgraph Index: indices on frequently queried subgraphs. Reachability and Neighborhood Indices: 2-hop labeling schemes for index structures; every two nodes within distance L.
  • 31. Graph Databases Advantages. Natural modeling of highly connected data. Special graph storage structure. Efficient schemaless graph algorithms. Support for query languages. Operators to query the graph structure.
  • 33. Existing General Graph Database Engines. DEX: Java library for management of persistent and temporary graphs. Implementation relies on bitmaps and secondary structures (B+-trees). HyperGraphDB: implements the hypergraph data model. Neo4j: network-oriented model where relations are first-class objects. Native disk-based storage manager for graphs. Framework for graph traversal. Most graph databases implement an API instead of a query language.
  • 34. DEX (1). Labeled attribute multigraph. All node and edge objects belong to a type. Edges have a direction: the tail is the source of the edge and the head is its destination. Node and edge objects can have one or more attributes. There are no restrictions on the number of edges between two nodes. Loops are allowed. Attributes are single valued. [Martínez-Basan et al. 2007] http://www.sparsity-technologies.com/dex_tutorials
  • 35. DEX (2). Attributes can have different indexes: basic attributes, indexed attributes, unique attributes. Edges can index neighborhoods: neighborhood index. The persistent database of DEX is a single file. DEX can manage very large graphs. [Martínez-Basan et al. 2007] http://www.sparsity-technologies.com/dex_tutorials
  • 36. DEX Architecture 36 [Martínez-Basan et al. 2007] http://www.sparsity-technologies.com/dex_tutorials
  • 37. DEX Internal Representation (1). DEX approach: Map + Bitmaps → Links. Link: bidirectional association between values and OIDs. Given a value → a set of OIDs. Given an OID → the value. [Martínez-Basan et al. 2007] http://www.sparsity-technologies.com/dex_tutorials
  • 38. DEX Internal Representation (2). A DEX graph is a combination of bitmaps: one bitmap for each node or edge type; one link for each attribute; two links for each type, for out-going and in-going edges. Maps are B+-trees. A compressed UTF-8 storage is used for UNICODE strings. [Martínez-Basan et al. 2007] http://www.sparsity-technologies.com/dex_tutorials
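The link idea (value to set of OIDs, OID to value) can be illustrated with standard Java collections. This is only a sketch of the concept, not DEX's implementation: `AttributeLink` is a hypothetical class, `java.util.BitSet` stands in for DEX's compressed bitmaps, and `java.util.TreeMap` (a red-black tree) stands in for the B+-tree map.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Bidirectional association between attribute values and object identifiers (sketch). */
final class AttributeLink {
    // value -> bitmap of OIDs holding that value (sorted map as a stand-in for a B+-tree)
    private final TreeMap<String, BitSet> oidsByValue = new TreeMap<>();
    // OID -> value
    private final Map<Integer, String> valueByOid = new HashMap<>();

    /** Sets the value of an object, removing it from the bitmap of its previous value. */
    void set(int oid, String value) {
        String old = valueByOid.put(oid, value);
        if (old != null) {
            BitSet oldBits = oidsByValue.get(old);
            oldBits.clear(oid);
            if (oldBits.isEmpty()) {
                oidsByValue.remove(old);
            }
        }
        oidsByValue.computeIfAbsent(value, k -> new BitSet()).set(oid);
    }

    /** Given a value, returns a copy of the set of OIDs as a bitmap. */
    BitSet oids(String value) {
        BitSet bits = oidsByValue.get(value);
        return bits == null ? new BitSet() : (BitSet) bits.clone();
    }

    /** Given an OID, returns its value, or null if none is set. */
    String value(int oid) {
        return valueByOid.get(oid);
    }
}
```

Because the OID sets are bitmaps, combining conditions reduces to bitwise operations; for example, `BitSet.and` on the bitmap of a node type and the bitmap returned by `oids(...)` yields the objects that satisfy both.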
  • 39. DEX API. Operations of the Abstract Data Type Graph: graph construction; definition of node and edge types; creation of node and edge objects; definition and use of attributes; query nodes and edges; management of edges and nodes. http://www.sparsity-technologies.com/dex_tutorials
  • 40. DEX API (2): Graph construction. Operations on the graph database are performed on the object graph.
      DexConfig cfg = new DexConfig();
      cfg.setCacheMaxSize(2048); // 2 GB
      cfg.setLogFile("PUBLICATIONSDex.log");
      Dex dex = new Dex(cfg);
      Database db = dex.create("Publications.dex", "PUBLICATIONSDex");
      Session sess = db.newSession();
      Graph graph = sess.getGraph();
      // ...
      sess.close();
      db.close();
      dex.close();
  • 41. Example. [Figure: publication graph with the authors Peter Smith, Juan Perez, and John Smith, the papers Paper1 to Paper5, and the conferences ESWC and ISWC, connected by Wrote, Cites, and Conference edges.]
  • 42. DEX API (3): Adding nodes to the graph.
      int authorTypeId = graph.newNodeType("AUTHOR");
      int paperTypeId = graph.newNodeType("PAPER");
      int conferenceTypeId = graph.newNodeType("CONFERENCE");
      long author1 = graph.newNode(authorTypeId);
      long author2 = graph.newNode(authorTypeId);
      long author3 = graph.newNode(authorTypeId);
      long paper1 = graph.newNode(paperTypeId);
      long paper2 = graph.newNode(paperTypeId);
      long paper3 = graph.newNode(paperTypeId);
      long paper4 = graph.newNode(paperTypeId);
      long paper5 = graph.newNode(paperTypeId);
      long conference1 = graph.newNode(conferenceTypeId);
      long conference2 = graph.newNode(conferenceTypeId);
  • 43. DEX API (4): Adding edges to the graph.
      int wroteTypeId = graph.newEdgeType("WROTE", true, true);
      int isWrittenTypeId = graph.newEdgeType("Is-Written", true, true);
      int citeTypeId = graph.newEdgeType("CITE", true, true);
      int isCitedByTypeId = graph.newEdgeType("IsCitedBy", true, true);
      // renamed from conferenceTypeId, which already names the CONFERENCE node type
      int isConferenceTypeId = graph.newEdgeType("IsConference", true, true);
      long wrote1 = graph.newEdge(wroteTypeId, author1, paper1);
      long wrote2 = graph.newEdge(wroteTypeId, author1, paper3);
      long wrote3 = graph.newEdge(wroteTypeId, author1, paper5);
      long wrote4 = graph.newEdge(wroteTypeId, author2, paper3);
      long wrote5 = graph.newEdge(wroteTypeId, author2, paper4);
      long wrote6 = graph.newEdge(wroteTypeId, author3, paper2);
      long wrote7 = graph.newEdge(wroteTypeId, author3, paper4);
  • 44. DEX API (5): Indexing. Basic attributes: there is no index associated with the attribute. Indexed attributes: there is an index, automatically maintained by the system, associated with the attribute. Unique attributes: the same as indexed attributes but with an added integrity restriction: two different objects cannot have the same value, with the exception of the null value. DEX operations: accessing the graph through an attribute automatically uses the defined index, significantly improving the performance of the operation. A specific index can also be defined to improve certain navigational operations, for example, Neighborhood. The AttributeKind enum class includes basic, indexed and unique attributes.
  • 45. DEX API (6): Adding attributes to nodes of the graph.
      int nameAttrId = graph.newAttribute(authorTypeId, "Name", DataType.String, AttributeKind.Indexed);
      int conferenceNameAttrId = graph.newAttribute(conferenceTypeId, "ConferenceName", DataType.String, AttributeKind.Indexed);
      Value v = new Value();
      graph.setAttribute(author1, nameAttrId, v.setString("Peter Smith"));
      graph.setAttribute(author2, nameAttrId, v.setString("Juan Perez"));
      graph.setAttribute(author3, nameAttrId, v.setString("John Smith"));
      graph.setAttribute(conference1, conferenceNameAttrId, v.setString("ESWC"));
      graph.setAttribute(conference2, conferenceNameAttrId, v.setString("ISWC"));
  • 46. DEX API (7): Searching in the graph.
      Value v = new Value();
      // retrieve all "AUTHOR" node objects
      Objects authorObjs1 = graph.select(authorTypeId);
      // ...
      // retrieve Peter Smith from the graph, which is an "AUTHOR" node
      Objects authorObjs2 = graph.select(nameAttrId, Condition.Equal, v.setString("Peter Smith"));
      // ...
      // retrieve all "AUTHOR" node objects having "Smith" in the name;
      // it would retrieve Peter Smith and John Smith
      Objects authorObjs3 = graph.select(nameAttrId, Condition.Like, v.setString("Smith"));
      // ...
      authorObjs1.close();
      authorObjs2.close();
      authorObjs3.close();
  • 47. DEX API (8): Searching in the graph.
      Graph graph = sess.getGraph();
      Value v = new Value();
      // ...
      // retrieve all "AUTHOR" node objects having a value for the name attribute
      // satisfying the '^J[^]*n$' regular expression
      Objects authorObjs = graph.select(nameAttrId, Condition.RegExp, v.setString("^J[^]*n$"));
      // ...
      authorObjs.close();
  • 48. DEX API (9): Searching in the graph.
      Value v = new Value();
      Graph graph = sess.getGraph();
      sess.begin();
      int authorTypeId = graph.findType("AUTHOR");
      // the attribute was created under the name "Name"
      int nameAttrId = graph.findAttribute(authorTypeId, "Name");
      v.setString("Peter Smith");
      long peterSmith = graph.findObject(nameAttrId, v);
      sess.commit();
  • 49. DEX API (10): Traversing the graph. The navigation direction can be restricted through the edges. The EdgesDirection enum class: Outgoing: only out-going edges, starting from the source node, are valid for the navigation. Ingoing: only in-going edges are valid for the navigation. Any: both out-going and in-going edges are valid for the navigation.
  • 50. DEX API (11): Traversing the graph. Explode-based methods visit the edges of a given node. Neighbor-based methods visit the neighbor nodes of a given node identifier.
      Graph graph = sess.getGraph();
      // ...
      Objects authorObjs1 = graph.select(authorTypeId);
      // ...
      // retrieve Peter Smith from the graph, which is an "AUTHOR" node
      Objects node1 = graph.select(nameAttrId, Condition.Equal, v.setString("Peter Smith"));
      Objects node2 = graph.select(nameAttrId, Condition.Like, v.setString("Paper"));
      int wroteTypeId = graph.findType("WROTE");
      int citeTypeId = graph.findType("CITE");
      // ...
      // retrieve all in-coming WROTE edges
      Objects edges = graph.explode(node1, wroteTypeId, EdgesDirection.Ingoing);
      // ...
      // retrieve all nodes through CITE edges
      Objects cites = graph.neighbors(node2, citeTypeId, EdgesDirection.Any);
      // ...
      edges.close();
      cites.close();
  • 51. DEX API (12): Traversing the graph.
      Graph graph = sess.getGraph();
      // ...
      Objects node2 = graph.select(nameAttrId, Condition.Like, v.setString("Paper"));
      int citeTypeId = graph.findType("CITE");
      // 1-hop cites
      Objects cites1 = graph.neighbors(node2, citeTypeId, EdgesDirection.Any);
      // cites of cites (2-hop)
      Objects cites2 = graph.neighbors(cites1, citeTypeId, EdgesDirection.Any);
      // ...
      cites1.close();
      cites2.close();
  • 52. DEX API Summary (operation: description).
      CreateGraph: Creates an empty graph.
      DropGraph: Removes a graph.
      RegisterType: Registers a new node or edge type.
      GetTypes: Returns the list of all node or edge types.
      FindType: Returns the id of a node or edge type.
      NewNode: Creates a new empty node of a given node type.
      AddNode: Copies a node.
      AddEdge: Adds a new edge of some edge type.
      DropObject: Removes a node or an edge.
      Scan: Returns all nodes or edges of a given type.
      Select: Returns the nodes or edges of a given type which have an attribute that evaluates true to a comparison operator over a value.
      Related: Returns all neighbors of a node following edges that are members of an edge type.
      Neighbors: Returns all neighbors.
  • 53. HyperGraphDB (1). Represents graph and hypergraph structures; a hypergraph is a mathematical generalization of a graph, where several edges correspond to a hyperedge. Hyperedges connect an arbitrary set of nodes (n-ary relations) and can point to other hyperedges (higher-order relations). This representation model saves space. [Figure: the researchers Bob and Mary and the areas Database, AI, and SW, shown in two alternative representations.] [Iordanov 2010] http://www.kobrix.com/index.jsp
  • 54. HyperGraphDB (2). Edges are represented as ordered sets (directed graphs with hyperedges). Nodes and edges are unified into something called an atom, which is a typed tuple. The elements of a tuple are known as the Target Set. The size of a Target Set is the arity of a tuple. If a tuple x has arity 0, it is a node; otherwise it is a link (or edge). The set of atoms pointing to a tuple is called the incidence set of the tuple. Every atom has a Handle (unique identifier). Each entity has a type and a value. A type is an atom conforming to a special interface; values are data managed and stored by a type atom. [Iordanov 2010] http://www.kobrix.com/index.jsp
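The atom model described above can be sketched with plain Java collections. This is an illustration of the concepts only and does not use the HyperGraphDB API: `AtomStore` and `Atom` are hypothetical names, handles are assumed to be random UUIDs, and the type system is left out, so each atom carries only a value and a target tuple.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

/** In-memory sketch of atoms, target sets, and incidence sets (hypothetical class). */
final class AtomStore {
    /** An atom is a value plus an ordered target tuple; arity 0 means node. */
    static final class Atom {
        final Object value;
        final List<UUID> targets;

        Atom(Object value, List<UUID> targets) {
            this.value = value;
            this.targets = Collections.unmodifiableList(targets);
        }

        int arity() {
            return targets.size();
        }

        boolean isLink() {
            return arity() > 0;
        }
    }

    private final Map<UUID, Atom> atoms = new HashMap<>();
    private final Map<UUID, Set<UUID>> incidence = new HashMap<>();

    /** Adds an atom and returns its handle; targets may be nodes or other links. */
    UUID add(Object value, UUID... targets) {
        UUID handle = UUID.randomUUID();
        atoms.put(handle, new Atom(value, new ArrayList<>(Arrays.asList(targets))));
        for (UUID target : targets) {
            incidence.computeIfAbsent(target, k -> new LinkedHashSet<>()).add(handle);
        }
        return handle;
    }

    Atom get(UUID handle) {
        return atoms.get(handle);
    }

    /** Handles of all atoms whose target tuple contains the given atom. */
    Set<UUID> incidenceSet(UUID handle) {
        return incidence.getOrDefault(handle, Collections.emptySet());
    }
}
```

A single link with three targets (for example one researcher and two areas) replaces several binary edges, and a link can itself be the target of another link, which is how higher-order relations are expressed.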
  • 55. HyperGraphDB Storage. Data is stored in the form of key-value pairs. Berkeley DB is used for storage of key-value structures; it provides three access methods for data: Hashes: linear hash. B-Trees: data stored in the leaves of a balanced tree. Recno: assigns a logical record identifier to each pair (indexed for direct access). Based on a key-value layer with duplicate support. Handles are custom generated UUIDs, ints, longs, etc. Several key-value stores: some are core, some are user-defined indices. http://www.kobrix.com/index.jsp
  • 56. HyperGraphDB Operations and Indexing. Essential graph queries: node/edge adjacency, pattern matching, graph traversal. Indexing by: properties, targets, user defined. Predefined indexers (indexer: description):
      ByPartIndexer: Indexes atoms with compound types along a given dimension.
      ByTargetIndexer: Indexes atoms by a specific target (a position in the target tuple).
      CompositeIndexer: Combines any two indexers into a single one. This essentially allows an arbitrary join to be precomputed and maintained.
      LinkIndexer: Indexes atoms by their full target tuple.
      TargetToTargetIndexer: Given a link, indexes one of its targets by another.
    http://www.kobrix.com/index.jsp
  • 57. Example. [Figure: publication graph with the authors Peter Smith, Juan Perez, and John Smith, the papers Paper1 to Paper5, and the conferences ESWC and ISWC, connected by Wrote, Cites, and Conference edges.]
  • 58. HyperGraphDB API (1): Adding nodes to the graph.
      HyperGraph graph = new HyperGraph(databaseLocation);
      AUTHOR author1 = new AUTHOR("Peter Smith");
      AUTHOR author2 = new AUTHOR("Juan Perez");
      AUTHOR author3 = new AUTHOR("John Smith");
      PAPER paper1 = new PAPER("Paper1");
      PAPER paper2 = new PAPER("Paper2");
      PAPER paper3 = new PAPER("Paper3");
      PAPER paper4 = new PAPER("Paper4");
      PAPER paper5 = new PAPER("Paper5");
      CONFERENCE conference1 = new CONFERENCE("ESWC");
      CONFERENCE conference2 = new CONFERENCE("ISWC");
      HGHandle authorHandle1 = graph.add(author1);
      HGHandle authorHandle2 = graph.add(author2);
      // ...
      HGHandle conferenceHandle2 = graph.add(conference2);
      // ...
      graph.close();
  • 59. HyperGraphDB API (2): Adding edges to the graph.
      HyperGraph graph = new HyperGraph(databaseLocation);
      AUTHOR author1 = new AUTHOR("Peter Smith");
      AUTHOR author2 = new AUTHOR("Juan Perez");
      AUTHOR author3 = new AUTHOR("John Smith");
      PAPER paper1 = new PAPER("Paper1");
      // the handle refers to the paper, so it is named accordingly
      HGHandle paperHandle1 = graph.add(paper1);
      HGHandle yearHandle = graph.add(2013);
      HGValueLink link1 = new HGValueLink("publication year", paperHandle1, yearHandle);
      HGHandle linkHandle1 = graph.add(link1);
      graph.close();
  • 60. HyperGraphDB API (3): Searching node properties.
      HGQueryCondition condition = new And(
          new AtomTypeCondition(AUTHOR.class),
          new AtomPartCondition(new String[]{"name"}, "Peter Smith", ComparisonOperator.EQ));
      HGSearchResult<HGHandle> rs = graph.find(condition);
      while (rs.hasNext()) {
          HGHandle current = rs.next();
          AUTHOR author = graph.get(current);
          System.out.println(author.getBirthDate());
      }
      rs.close();
  • 61. HyperGraphDB API (4): Searching node properties.
      List<AUTHOR> authors = hg.getAll(graph,
          hg.and(hg.type(AUTHOR.class), hg.eq("author", "Peter Smith")));
      for (AUTHOR a : authors) {
          System.out.println(a.getBirthDate());
      }
  • 62. HyperGraphDB API (5): Searching node properties. HGBreadthFirstTraversal traverses a graph breadth first, given a start atom and an adjacency list generator.
      // startingPaper is the HGHandle of the paper where the traversal begins
      DefaultALGenerator algen = new DefaultALGenerator(graph,
          hg.type(CitedBy.class),
          hg.and(hg.type(PAPER.class), hg.eq("track", "Semantic Data Management")),
          true, false, false);
      HGTraversal traversal = new HGBreadthFirstTraversal(startingPaper, algen);
      PAPER currentPaper = graph.get(startingPaper);
      while (traversal.hasNext()) {
          Pair<HGHandle, HGHandle> next = traversal.next();
          PAPER nextPaper = graph.get(next.getSecond());
          System.out.println("Paper " + currentPaper + " quotes " + nextPaper);
          currentPaper = nextPaper;
      }
  • 63. HyperGraphDB API (7): Searching node properties. SimpleALGenerator produces all atoms linked to a given atom.
      AUTHOR author1 = new AUTHOR("Peter Smith");
      // the traversal starts from a handle, so the atom is added to the graph first
      HGHandle authorHandle1 = graph.add(author1);
      HGDepthFirstTraversal traversal = new HGDepthFirstTraversal(
          authorHandle1, new SimpleALGenerator(graph));
      while (traversal.hasNext()) {
          Pair<HGHandle, HGHandle> current = traversal.next();
          HGLink l = (HGLink) graph.get(current.getFirst());
          Object atom = graph.get(current.getSecond());
          System.out.println("Visiting atom " + atom + " pointed to by " + l);
      }
  • 64. HyperGraphDB API Summary (class/operation: description).
      HGEnvironment: Creates a hypergraph from a file.
      Add: Adds a new object to the hypergraph.
      Update: Updates an object in the hypergraph.
      Remove: Removes an object from the hypergraph.
      HGHandle: A reference to a hypergraph atom.
      HGPersistentHandle: An HGHandle that survives system downtime.
      HGValueLink: Adds a new hyperedge.
      get: Returns the object referred to by a given HGHandle.
      HGQueryCondition: Returns the nodes or edges of a given type which have an attribute that evaluates true to a comparison operator over a value.
      HGQuery.hg.getAll: Returns all the objects that meet a condition.
  • 65. Neo4J. Labeled attribute multigraph. Nodes and edges can have properties. There are no restrictions on the number of edges between two nodes. Loops are allowed. Attributes are single valued. Different types of indexes: nodes and relationships. Different types of traversal strategies. API for Java, Python. [Robinson et al. 2013] http://neo4j.org/
  • 66. Neo4J Architecture 66 [Robinson et al. 2013] http://neo4j.org/
  • 67. Neo4J Logical View 67 [Robinson et al. 2013] http://neo4j.org/
  • 68. Example. [Figure: publication graph with the authors Peter Smith, Juan Perez, and John Smith, the papers Paper1 to Paper5, and the conferences ESWC and ISWC, connected by Wrote, Cites, and Conference edges.]
  • 69. Neo4J API (1): Creating a graph and adding nodes.
      GraphDatabaseService db = new GraphDatabaseFactory().newEmbeddedDatabase(DB_PATH);
      Node author1 = db.createNode();
      Node author2 = db.createNode();
      Node author3 = db.createNode();
      Node paper1 = db.createNode();
      Node paper2 = db.createNode();
      Node paper3 = db.createNode();
      Node paper4 = db.createNode();
      Node paper5 = db.createNode();
      Node conference1 = db.createNode();
      Node conference2 = db.createNode();
  • 70. Neo4J API (2): Adding attributes to nodes.
      author1.setProperty("firstname", "Peter");
      author1.setProperty("lastname", "Smith");
      author2.setProperty("firstname", "Juan");
      author2.setProperty("lastname", "Perez");
      author3.setProperty("firstname", "John");
      author3.setProperty("lastname", "Smith");
      conference1.setProperty("name", "ESWC");
      conference2.setProperty("name", "ISWC");
  • 71. Neo4J API (3): Adding relationships.
      // DynamicRelationshipType exposes the factory method withName(String)
      Relationship rel1 = author1.createRelationshipTo(paper1, DynamicRelationshipType.withName("WROTE"));
      Relationship rel2 = author1.createRelationshipTo(paper5, DynamicRelationshipType.withName("WROTE"));
      Relationship rel3 = author1.createRelationshipTo(paper3, DynamicRelationshipType.withName("WROTE"));
      // ...
      Relationship rel8 = paper1.createRelationshipTo(paper2, DynamicRelationshipType.withName("CITES"));
      Relationship rel9 = paper3.createRelationshipTo(paper1, DynamicRelationshipType.withName("CITES"));
      Relationship rel10 = paper3.createRelationshipTo(paper4, DynamicRelationshipType.withName("CITES"));
      Relationship rel11 = paper4.createRelationshipTo(paper2, DynamicRelationshipType.withName("CITES"));
      Relationship rel12 = paper1.createRelationshipTo(conference1, DynamicRelationshipType.withName("Conference"));
      Relationship rel13 = paper2.createRelationshipTo(conference2, DynamicRelationshipType.withName("Conference"));
      Relationship rel14 = paper3.createRelationshipTo(conference1, DynamicRelationshipType.withName("Conference"));
      Relationship rel15 = paper4.createRelationshipTo(conference1, DynamicRelationshipType.withName("Conference"));
      Relationship rel16 = paper5.createRelationshipTo(conference2, DynamicRelationshipType.withName("Conference"));
  • 72. Neo4J API (4): Creating indexes.
      IndexManager index = db.index();
      Index<Node> authorIndex = index.forNodes("authors"); // "authors" is the index name
      Index<Node> paperIndex = index.forNodes("papers");
      Index<Node> conferenceIndex = index.forNodes("conferences");
      RelationshipIndex wroteIndex = index.forRelationships("write");
      RelationshipIndex citeIndex = index.forRelationships("cite");
      RelationshipIndex conferenceRelIndex = index.forRelationships("conferenceRel");
  • 74. Neo4J: Cypher Query Language (1). START: specifies one or more starting points in the graph; starting points can be defined as index lookups or just by an element ID. MATCH: the pattern to match, anchored at the starting point(s). WHERE: filtering criteria. RETURN: what is projected out from the evaluation of the query.
  • 75. Neo4J: Cypher Query Language (2). Query: Is Peter Smith connected to John Smith?
      START author=node:authors('firstname:"Peter" AND lastname:"Smith"')
      MATCH (author)-[R*]->(m)
      RETURN count(R)
  • 76. Neo4J: Cypher Query Language (2). Query: Papers written by Peter Smith.
      START author=node:authors('firstname:"Peter" AND lastname:"Smith"')
      MATCH (author)-[:WROTE]->(papers)
      RETURN papers
  • 77. Neo4J: Cypher Query Language (3). Query: Papers cited by a paper written by Peter Smith.
      START author=node:authors('firstname:"Peter" AND lastname:"Smith"')
      MATCH (author)-[:WROTE]->()-[:CITES]->(papers)
      RETURN papers
  • 78. Neo4J: Cypher Query Language (3). Query: Papers cited by a paper written by Peter Smith that have at most 20 cites.
      START author=node:authors('firstname:"Peter" AND lastname:"Smith"')
      MATCH (author)-[:WROTE]->()-[:CITES]->(papers)<-[:CITES]-(citing)
      WITH papers, count(citing) AS cites
      WHERE cites < 21
      RETURN papers
  • 79. Neo4J: Cypher Query Language (4). Query: Papers cited by a paper written by Peter Smith or cited by papers cited by a paper written by Peter Smith.
      START author=node:authors('firstname:"Peter" AND lastname:"Smith"')
      MATCH (author)-[:WROTE]->()-[:CITES*1..2]->(papers)
      RETURN papers
  • 80. Neo4J: Cypher Query Language (5). Query: Number of papers cited by a paper written by Peter Smith or cited by papers cited by a paper written by Peter Smith.
      START author=node:authors('firstname:"Peter" AND lastname:"Smith"')
      MATCH (author)-[:WROTE]->()-[:CITES*1..2]->(papers)
      RETURN count(papers)
  • 81. Neo4J: Cypher Query Language (6). Query: Number of papers cited by a paper written by Peter Smith or cited by papers cited by a paper written by Peter Smith, and that have been published in ESWC. The exclamation mark in papers.conference! ensures that papers without the conference attribute are not included in the output.
      START author=node:authors('firstname:"Peter" AND lastname:"Smith"')
      MATCH (author)-[:WROTE]->()-[:CITES*1..2]->(papers)
      WHERE papers.conference! = 'ESWC'
      RETURN count(papers)
  • 82. Neo4J: Cypher Query Language (7). Query: Number of papers cited by a paper written by Peter Smith or cited by papers cited by a paper written by Peter Smith, that have been published in ESWC and have at most 4 co-authors. The exclamation mark in papers.conference! ensures that papers without the conference attribute are not included in the output.
      START author=node:authors('firstname:"Peter" AND lastname:"Smith"')
      MATCH (author)-[:WROTE]->()-[:CITES*1..2]->(papers)-[:`Was-Written`]->(authorFinal)
      WHERE papers.conference! = 'ESWC'
      WITH author, count(authorFinal) AS authorFinalCount
      WHERE authorFinalCount < 4
      RETURN author, authorFinalCount
  • 83. Neo4J API-Cypher. Query: Peter Smith neighborhood.
      static Neo4j TestGraph;
      String Query = "start n=node:node_auto_index(idNode='PeterSmith') "
                   + "match n-[r]->m return m.idNode";
      ExecutionResult result = TestGraph.query(Query);
      boolean res = false;
      for (Map<String, Object> row : result) {
          for (Map.Entry<String, Object> column : row.entrySet()) {
              print(column.getValue().toString());
              res = true;
          }
      }
      if (!res) print(" No result");
  • 84. Neo4J API Summary (operation: description).
      GraphDatabaseFactory: Creates an empty graph.
      registerShutdownHook: Removes a graph.
      createNode: Creates a new empty node of a given node type.
      AddNode: Copies a node.
      createRelationshipTo: Adds a new edge of some edge type.
      forNodes: Index for a node.
      IndexManager: Index manager.
      forRelationships: Index for a relationship.
      add: Adds an object to an index.
      TraversalDescription: Different types of strategies to traverse a graph.
      Traversal Branch Selector: preorderDepthFirst, postorderDepthFirst, preorderBreadthFirst, postorderBreadthFirst.
  • 87. RDF-Based Graph Database Engines. Native storage: Hexastore, RDF-3X, BitMat. DBMS-backed storage: Hexastore*, BHyper.
  • 88. In the beginnings … 88 … there were dinosaurs
  • 89. In the beginnings … Traditionally: use relational DBMSs to store data. Early approach: one large SPO table (physical). Pros: simple schema, fast updates (when no index is present), fast single triple pattern queries. Cons: not suitable for multi-join queries, which result in multiple self-joins.
  • 90. In the beginnings … Traditionally: use relational DBMSs to store data. Further approaches: property tables (multiple flavors). Pros: fast retrieval of subject/object pairs for given predicates. Cons: not easy to solve queries where the predicate is unbound, and the schema is harder to maintain (a table for each known predicate).
  • 91. RDF-tailored management approaches. Same conceptual approach: one large SPO table, but with an RDF-specific physical organization. Triples are indexed in all 3! = 6 possible permutations: SPO, SOP, PSO, POS, OSP, OPS. This is a logical extension of the vertical partitioning proposed in [Abadi et al 2008] (the PSO index is a generalization of the SO table). Each index (e.g. SPO) stores partial triple pattern cardinalities. The result is 6 tree-based indexes, each 3 levels deep. First proposed in Hexastore [Weiss et al. 2008].
  • 92. Hexastore / Conceptual Model. 2 main operations (input = tp: a partially bound triple pattern, e.g. <bound_subject> <some predicate> ?o): cardinality = getCardinality(tp); term_set = getSet(tp, set). The resulting term_set is sorted.
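The six-permutation idea and the two operations can be sketched in memory as follows. This is an assumption-laden illustration rather than the Hexastore implementation: `SixIndexStore` and its method signatures are hypothetical, nested `TreeMap`/`TreeSet` structures stand in for the three-level indexes, terms are assumed to be already encoded as integers, and cardinalities are computed on demand instead of being stored with each index entry.

```java
import java.util.Collections;
import java.util.EnumMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

/** Triples indexed in all six orders; each index is three levels deep (sketch). */
final class SixIndexStore {
    enum Order { SPO, SOP, PSO, POS, OSP, OPS }

    private final Map<Order, TreeMap<Integer, TreeMap<Integer, TreeSet<Integer>>>> indexes =
            new EnumMap<>(Order.class);

    SixIndexStore() {
        for (Order order : Order.values()) {
            indexes.put(order, new TreeMap<>());
        }
    }

    /** Inserts one encoded triple into every permutation index. */
    void add(int s, int p, int o) {
        for (Order order : Order.values()) {
            int[] k = permute(order, s, p, o);
            indexes.get(order)
                   .computeIfAbsent(k[0], x -> new TreeMap<>())
                   .computeIfAbsent(k[1], x -> new TreeSet<>())
                   .add(k[2]);
        }
    }

    /** Reorders (s, p, o) according to the letters of the index name. */
    private static int[] permute(Order order, int s, int p, int o) {
        int[] k = new int[3];
        String name = order.name();
        for (int i = 0; i < 3; i++) {
            char c = name.charAt(i);
            k[i] = c == 'S' ? s : (c == 'P' ? p : o);
        }
        return k;
    }

    /** Sorted third-level terms for a pattern with the first two positions bound. */
    SortedSet<Integer> getSet(Order order, int first, int second) {
        TreeMap<Integer, TreeSet<Integer>> level2 = indexes.get(order).get(first);
        TreeSet<Integer> set = level2 == null ? null : level2.get(second);
        return set == null ? Collections.emptySortedSet()
                           : Collections.unmodifiableSortedSet(set);
    }

    /** Number of triples matching a pattern with only the first position bound. */
    int getCardinality(Order order, int first) {
        TreeMap<Integer, TreeSet<Integer>> level2 = indexes.get(order).get(first);
        if (level2 == null) {
            return 0;
        }
        int count = 0;
        for (TreeSet<Integer> set : level2.values()) {
            count += set.size();
        }
        return count;
    }
}
```

A pattern such as `?s <author> <peter-smith>` is answered from the POS index with both leading positions bound, and the answer comes back already sorted, which is what later makes merge-joins possible.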
  • 93. Hexastore / Conceptual Model. Pros: the optimal index permutation and level is used (e.g. <bound_s> ?p ?o: SPO level 1); ability to make use of fast merge-joins. Cons: higher upkeep cost: a theoretical upper bound of 5x space increase over the original (encoded) data.
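The merge-join mentioned among the advantages relies on both inputs being sorted, so a single linear pass finds the common terms. A minimal sketch, assuming the inputs are sorted, duplicate-free lists of encoded term ids (the class name `MergeJoin` is hypothetical):

```java
import java.util.ArrayList;
import java.util.List;

/** Merge-join (intersection) of two sorted lists of term ids (sketch). */
final class MergeJoin {
    static List<Integer> intersect(List<Integer> left, List<Integer> right) {
        List<Integer> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < left.size() && j < right.size()) {
            int cmp = Integer.compare(left.get(i), right.get(j));
            if (cmp == 0) {          // same term on both sides: part of the join result
                out.add(left.get(i));
                i++;
                j++;
            } else if (cmp < 0) {    // advance the side holding the smaller term
                i++;
            } else {
                j++;
            }
        }
        return out;
    }
}
```

For example, joining the sorted papers of one author with the sorted papers of another yields their co-authored papers in time linear in the two list lengths, without building a hash table.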
  • 94. An Example (Publications): logical representation. [Figure: publication graph with the authors Peter Smith, Juan Perez, and John Smith, the papers Paper1 to Paper5, and the conferences ESWC and ISWC, connected by Wrote, Cites, and Conference edges.]
  • 95. RDF Subgraph for the Publication Graph. [Figure: RDF subgraph whose resources include conference/eswc/2012/paper/03, conference/iswc/2012/paper/05, conference/iswc/2012/paper/01, person/peter-smith, conference/eswc/, conference/iswc/, and semanticweb/ontology/ConferenceEvent; the literals are 2012, ESWC, and ISWC; the predicates are ontoware/ontology#author, ontoware/ontology#year, ontoware/ontology#biblioReference, semanticweb/ontology#isPartOf, semanticweb/ontology#hasAcronym, and w3c/rdf-syntax#type.]
  • 96. An Example (Publications): first step, encode strings. Hexastore mapping dictionary (id: term):
      1: http://data.semanticweb.org/conference/eswc/2012/paper/01
      2: http://data.semanticweb.org/conference/iswc/2012/paper/02
      3: http://data.semanticweb.org/conference/eswc/2012/paper/03
      4: http://data.semanticweb.org/conference/eswc/2012/paper/04
      5: http://data.semanticweb.org/conference/iswc/2012/paper/05
      6: "ESWC2012"
      7: "ISWC2012"
      8: http://data.semanticweb.org/conference/iswc/2012
      9: http://data.semanticweb.org/conference/eswc/2012
      10: http://data.semanticweb.org/ns/swc/ontology#ConferenceEvent
      11: http://data.semanticweb.org/ns/swc/ontology#isPartOf
      12: http://swrc.ontoware.org/ontology#author
      13: http://swrc.ontoware.org/ontology#year
      14: http://swrc.ontoware.org/ontology#biblioReference
      15: "2012"
      16: http://www.w3.org/1999/02/22-rdf-syntax-ns#type
      17: http://data.semanticweb.org/ns/swc/ontology#hasAcronym
      18: http://data.semanticweb.org/person/peter-smith
      19: http://data.semanticweb.org/person/juan-perez
      20: http://data.semanticweb.org/person/john-smith
  • 97. An Example (Publications): first step, encode strings (same Hexastore mapping dictionary as on the previous slide). Next, index the original encoded RDF file …
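The encoding step can be sketched as a two-way dictionary between RDF terms and integer ids. The class `TermDictionary` is a hypothetical illustration; it assigns ids in first-seen order starting at 1, which is an assumption and will not necessarily reproduce the exact ids shown in the mapping dictionary of the slides.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Two-way mapping between RDF terms (URIs or literals) and integer ids (sketch). */
final class TermDictionary {
    private final Map<String, Integer> idByTerm = new HashMap<>();
    private final List<String> termById = new ArrayList<>();

    /** Returns the id of a term, assigning the next free id on first use. */
    int encode(String term) {
        Integer id = idByTerm.get(term);
        if (id == null) {
            termById.add(term);
            id = termById.size();      // ids start at 1
            idByTerm.put(term, id);
        }
        return id;
    }

    /** Returns the term for an id previously returned by encode. */
    String decode(int id) {
        return termById.get(id - 1);
    }

    /** Encodes a whole triple into three ids. */
    int[] encodeTriple(String subject, String predicate, String object) {
        return new int[] {encode(subject), encode(predicate), encode(object)};
    }
}
```

After encoding, every index stores fixed-size integers instead of long strings, and each distinct term is stored once in the dictionary regardless of how many triples mention it.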
  • 98. Hexastore / Physical implementation. Native storage
    [Figure: S, P and O index levels over the encoded example triples]
    • Uses vector storage [Weiss et al 2009]
    • Each index level:
      – partially sorted vector (on disk, one file per level)
      – each record: <rdf term id, cardinality, pointer to set>
      – the first level gets special treatment:
        • fully sorted vector: search = binary search, O(log(N))
        • Hint: an additional hash index can circumvent the binary search
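
A minimal sketch of the first-level lookup described above, assuming an in-memory list in place of the on-disk vector. The record values (cardinalities and pointers) are made up for illustration, and `lookup` and `hash_index` are hypothetical names.

```python
from bisect import bisect_left

# Fully sorted first-level vector; each record is
# (rdf term id, cardinality, pointer to the next-level set).
first_level = [
    (1, 4, 0), (2, 3, 4), (3, 4, 7), (4, 4, 11),
    (5, 3, 15), (8, 2, 18), (9, 2, 20),
]
keys = [record[0] for record in first_level]


def lookup(term_id):
    """Binary search over the sorted vector: O(log N) comparisons."""
    i = bisect_left(keys, term_id)
    if i < len(keys) and keys[i] == term_id:
        return first_level[i]
    return None


# The hint from the slide: an extra hash index gives expected O(1) access
# and avoids the binary search altogether, at the cost of more memory.
hash_index = {record[0]: record for record in first_level}
```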
  • 99. Hexastore / Physical implementation. Native storage
    • Pros:
      – Optimized for read-intensive / static workloads
      – Optimal / fast triple pattern matching
      – No comparisons when scanning, since each (partial) cardinality is known
    • Cons:
      – Not suitable for write-intensive workloads (in the worst case an update can force a rewrite of the entire index)
      – Hypothesis (not yet tested): SSD media may be adequate…
  • 100. Hexastore / Physical implementation. DBMS (managed) storage
    [Figure: S, SP and SPO index levels over the encoded example triples]
    • Builds on top of sorted indexes:
      – B+Tree, LSM Tree, etc.
    • Each index level:
      – same sorted index structure (but not necessarily)
      – each record: <rdf term id list, cardinality>
      – search cost = as provided by the managed index structure
  • 101. Hexastore / Physical implementation. DBMS (managed) storage
    • Pros:
      – Flexible design: each index can be configured with a different physical data structure:
        • e.g. PSO = B+Tree, OSP = LSM Tree
      – Separation of concerns (the underlying DBMS manages and optimizes access patterns, caching, etc.)
    • Cons:
      – Less control over fine-grained data storage (only up to the level permitted by the DBMS API)
      – Usually less potential for native optimization
  • 102. RDF3x (1) [Neumann and Weikum 2008]
    • RISC-style operators.
    • Implements the traditional Optimize-then-Execute paradigm to identify left-linear plans of star-shaped sub-queries of basic triple patterns.
    • Uses secondary-memory structures to store RDF data locally and reduce the overhead of I/O operations.
    • Registers aggregated counts (numbers of occurrences), and can detect whether a query will return an empty answer.
  • 103. RDF3x (2) [Neumann and Weikum 2008]
    • Triple table: each row represents a triple.
    • Mapping dictionary: replaces every literal string by an id; two dictionary indices.
    • Compressed clustered B+-trees index all triples:
      – six permutations of subject, predicate and object; each permutation in a different index.
      – triples are sorted lexicographically in each B+-tree.
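
The six-permutation layout can be sketched as below. Sorted Python lists stand in for the compressed clustered B+-trees, which is a simplification made here for illustration; the three encoded triples are sample values and `prefix_scan` is a hypothetical helper.

```python
from bisect import bisect_left
from itertools import permutations

# Encoded (subject, predicate, object) triples.
triples = [(13, 3, 16), (11, 4, 19), (12, 2, 6)]

# One index per permutation of S, P, O; each keeps its triples in
# lexicographic order of the permuted components.
indexes = {}
for order in permutations(range(3)):
    name = "".join("SPO"[i] for i in order)
    indexes[name] = sorted(tuple(t[i] for i in order) for t in triples)


def prefix_scan(index_name, first):
    """Return all entries of an index whose first component equals `first`."""
    index = indexes[index_name]
    start = bisect_left(index, (first,))
    result = []
    for entry in index[start:]:
        if entry[0] != first:
            break
        result.append(entry)
    return result
```

With every ordering available, a triple pattern with any combination of bound positions can be answered by a range scan over one index, for example `prefix_scan("POS", 3)` for all triples with predicate 3.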
  • 105. RDF3x (4) [Figure from Neumann et al 2010]
    • Compressed B+-trees.
    • One B+-tree per pattern.
    • Structure is updatable.
  • 106. RDF3x (5) [Figure from Lei Zou, http://hotdb.info/slides/RDF-talk.pdf]
  • 108. Example
    [Figure: publication graph. Nodes: Paper1 to Paper5; authors Peter Smith, Juan Perez and John Smith; conferences ESWC and ISWC. Edge labels: Wrote, Cites, Conference.]
  • 109. RDF Subgraph for the Publication Graph
    [Figure: RDF subgraph linking the papers conference/eswc/2012/paper/03, conference/iswc/2012/paper/05 and conference/iswc/2012/paper/01, the person person/peter-smith, the conferences conference/eswc/ and conference/iswc/ (typed ConferenceEvent, with acronyms ESWC and ISWC) and the literal 2012, through the properties author, year, isPartOf, hasAcronym, rdf-syntax#type and biblioReference.]
  • 110. RDF3x: Representation of the Publication Graph
    RDF3x mapping dictionary:
      OID  Resource
      0    http://www.w3.org/1999/02/22-rdf-syntax-ns#type
      1    http://data.semanticweb.org/ns/swc/ontology#hasAcronym
      2    http://data.semanticweb.org/ns/swc/ontology#isPartOf
      3    http://swrc.ontoware.org/ontology#author
      4    http://swrc.ontoware.org/ontology#year
      5    http://swrc.ontoware.org/ontology#biblioReference
      6    http://data.semanticweb.org/conference/iswc/2012
      7    http://data.semanticweb.org/ns/swc/ontology#ConferenceEvent
      8    ISWC2012
      9    http://data.semanticweb.org/conference/eswc/2012
      10   ESWC2012
      11   http://data.semanticweb.org/conference/iswc/2012/paper/05
      12   http://data.semanticweb.org/conference/iswc/2012/paper/02
      13   http://data.semanticweb.org/conference/eswc/2012/paper/01
      14   http://data.semanticweb.org/conference/eswc/2012/paper/03
      15   http://data.semanticweb.org/conference/eswc/2012/paper/04
      16   http://data.semanticweb.org/author/peter-smith
      17   http://data.semanticweb.org/author/juan-perez
      18   http://data.semanticweb.org/author/john-smith
      19   2012
    Triple table (S P O):
      6 0 7; 9 0 7; 6 0 7; 9 1 8; 11 1 10;
      12 2 6; 13 2 6; 14 2 9; 15 2 9;
      11 3 16; 12 3 18; 13 3 16; 14 3 16; 14 3 17; 15 3 17; 15 3 18;
      11 4 19; 12 4 19; 13 4 19; 14 4 19; 15 4 19;
      13 5 12; 14 5 13; 14 5 15; 15 5 12
  • 111. BitMat (1) [Atre et al. 2010]
    • 3D bit-cube representation in which the three dimensions represent subjects, predicates and objects.
    • Each cell of the cube is 0 or 1, and indicates whether the RDF triple formed by that combination of S, P and O is present.
    • The 3D bit-cube is sliced along a dimension to obtain 2D matrices.
    • A gap compression scheme is applied to each bit-row.
    • Basic operations over BitMats are defined to join basic triple patterns.
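
The slide does not spell out the gap compression scheme. The sketch below assumes a run-length style encoding, in which a bit-row is stored as its first bit followed by the lengths of the alternating runs of equal bits; `gap_encode` and `gap_decode` are hypothetical names, and the actual BitMat encoding may differ in detail.

```python
def gap_encode(bits):
    """Encode a bit-row as (first bit, lengths of alternating runs)."""
    if not bits:
        return (None, [])
    runs = []
    count = 1
    for previous, current in zip(bits, bits[1:]):
        if current == previous:
            count += 1
        else:
            runs.append(count)
            count = 1
    runs.append(count)
    return (bits[0], runs)


def gap_decode(first, runs):
    """Rebuild the bit-row from its gap-compressed form."""
    bits = []
    bit = first
    for length in runs:
        bits.extend([bit] * length)
        bit = 1 - bit  # runs alternate between 0s and 1s
    return bits


row = [0, 0, 1, 1, 0, 0, 0]
encoded_row = gap_encode(row)
```

Rows of an RDF bit-matrix are typically very sparse, so long runs of zeros collapse into single integers.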
  • 112. BitMat (2)
    [Figure: the triple table, the 3D-cube BitMat, and the S-O and O-S BitMats for each P]
  • 116. BitMat: Representation of the Publication Graph (1)
    sID table:
      OID  Resource
      1    http://data.semanticweb.org/conference/eswc/2012
      2    http://data.semanticweb.org/conference/eswc/2012/paper/01
      3    http://data.semanticweb.org/conference/eswc/2012/paper/04
      4    http://data.semanticweb.org/conference/iswc/2012
      5    http://data.semanticweb.org/conference/iswc/2012/paper/02
      6    http://data.semanticweb.org/conference/iswc/2012/paper/03
      7    http://data.semanticweb.org/conference/eswc/2012/paper/05
    Triple table (S P O) for the 3D-cube BitMat: 4 6 8 4 1 1 2 1 6 8 1 1 7 7 2 4 5 2 4 2 2 1 6 2 1 3 2 1 2 3 1 1 6 3 1 1 6 3 1 0 3 3 1 0 3 3 9 7 3 1 1 5 3 9 2 5 6 5 5 6 6 5 6 3 5 6 7 5 6
  • 117. BitMat: Representation of the Publication Graph (2)
    [Figure: the S P O triple table and 3D-cube BitMat, as on slide 116]
    pID table:
      OID  Resource
      1    http://data.semanticweb.org/ns/swc/ontology#hasAcronym
      2    http://data.semanticweb.org/ns/swc/ontology#isPartOf
      3    http://swrc.ontoware.org/ontology#author
      4    http://swrc.ontoware.org/ontology#biblioReference
      5    http://swrc.ontoware.org/ontology#year
      6    http://www.w3.org/1999/02/22-rdf-syntax-ns#type
  • 118. BitMat: Representation of the Publication Graph (3)
    [Figure: the S P O triple table and 3D-cube BitMat, as on slide 116]
    oID table:
      OID  Resource
      1    http://www.w3.org/1999/02/22-rdf-syntax-ns#type
      2    http://data.semanticweb.org/ns/swc/ontology#hasAcronym
      3    http://data.semanticweb.org/ns/swc/ontology#isPartOf
      4    http://swrc.ontoware.org/ontology#author
      5    http://swrc.ontoware.org/ontology#year
      6    http://swrc.ontoware.org/ontology#biblioReference
      7    http://data.semanticweb.org/conference/iswc/2012
      8    http://data.semanticweb.org/ns/swc/ontology#ConferenceEvent
      9    ISWC2012
      10   http://data.semanticweb.org/conference/eswc/2012
      11   ESWC2012
      12   http://data.semanticweb.org/conference/iswc/2012/paper/05
  • 119. BHyper (1) [Vidal and Martínez 2011]
    • A hypergraph-based structure is used to represent RDF documents.
    • Mapping dictionary: replaces every literal string by an id; two dictionary indices, implemented with a hash index.
    • Encodings of RDF resources are implemented as nodes, and sets of triple patterns correspond to hyperedges.
    • A role function denotes the role played by a resource in a hyperedge. Roles are represented as labels in the hyperedges.
    • Indices are used to index the hyperedges and the different roles played by the resources in the hyperedges:
      – B+-trees are used to index hyperedges.
    • BHyper is built on top of Tokyo Cabinet 1.4.45 (http://sourceforge.net/projects/tokyocabinet/).
  • 120. BHyper (2) [Vidal and Martínez 2011]
    A hyperedge can relate multiple edges.
  • 123. Experimental Set-Up
    The experiments presented in this section were executed on a machine with the following characteristics:
    • Model: Sun Fire x4100
    • Number of processors: 2
    • Processor specification: Dual-Core AMD Opteron™ Processor 2218, 64 bits
    • RAM: 16 GB
    • Disk capacity: 2 disks, 135 GB each
    • Network interface: 2 + ALOM
    • Operating system: CentOS 5.5, 64 bits
    • RDF3x version: 0.3.7
    • DEX version: 4.8 Very Large Databases
    • HyperGraphDB version: 1.2
    • Neo4J version: 1.8.2
    • Timeout: 3,600 secs. (reported in the portal as -2)
  • 124. Benchmark Graphs
      Graph Name   #Nodes    #Edges     Description
      DSJC1000.1   1,000     49,629     Graph density: 0.1 [Johnson91]
      DSJC1000.5   1,000     249,826    Graph density: 0.5 [Johnson91]
      DSJC1000.9   1,000     449,449    Graph density: 0.9 [Johnson91]
      SSCA2-17     131,072   3,907,909  GTgraph (1)
    [Johnson91] Johnson, D., Aragon, C., McGeoch, L., and Schevon, C. Optimization by simulated annealing: an experimental evaluation; part II, graph coloring and number partitioning. Operations Research 39, 3 (1991), 378–406.
    GTgraph: a suite of three synthetic graph generators. (1) SSCA2 generates graphs used as input instances for the DARPA High Productivity Computing Systems SSCA#2 graph theory benchmark. http://www.cse.psu.edu/~madduri/software/GTgraph/index.html
  • 125. Benchmark Tasks
    The benchmark tasks evaluate the engine performance while executing the following graph operations:
      Task         Description
      adjacentXP   Given a node X and an edge label P, find adjacent nodes Y.
      adjacentX    Given a node X, find adjacent nodes Y.
      edgeBetween  Given two nodes X and Y, find the labels of the edges between X and Y.
      2-hopX       Given a node X, find the 2-hop Y nodes.
      3-hopX       Given a node X, find the 3-hop Y nodes.
      4-hopX       Given a node X, find the 4-hop Y nodes.
  • 126. Benchmark Tasks: RDF Engines
    The benchmark tasks were implemented in the RDF engines with the following SPARQL queries:
      adjacentXP:  Select ?y where { <http://graphdatabase.ldc.usb.ve/resource/6> <http://graphdatabase.ldc.usb.ve/resource/pr> ?y }
      adjacentX:   Select ?y where { <http://graphdatabase.ldc.usb.ve/resource/6> ?p ?y }
      edgeBetween: Select ?p where { <http://graphdatabase.ldc.usb.ve/resource/6> ?p <http://graphdatabase.ldc.usb.ve/resource/8> }
      2-hopX:      Select ?z where { <http://graphdatabase.ldc.usb.ve/resource/6> ?p1 ?y1. ?y1 ?p2 ?z }
      3-hopX:      Select ?z where { <http://graphdatabase.ldc.usb.ve/resource/6> ?p1 ?y1. ?y1 ?p2 ?y2. ?y2 ?p3 ?z }
      4-hopX:      Select ?z where { <http://graphdatabase.ldc.usb.ve/resource/6> ?p1 ?y1. ?y1 ?p2 ?y2. ?y2 ?p3 ?y3. ?y3 ?p4 ?z }
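
For comparison with the SPARQL formulation, the same tasks can be sketched over a plain adjacency structure. This is an illustrative sketch, not the code used in the experiments: the toy graph, the integer node ids and the function names are assumptions, and results are returned as sets, whereas the SPARQL queries return one solution per matching path.

```python
# Toy labelled graph: node -> list of (edge label, target node).
graph = {
    6: [("pr", 8), ("pr", 7)],
    7: [("pr", 9), ("pr", 6)],
    8: [("pr", 9)],
}


def adjacent_xp(graph, x, p):
    """adjacentXP: nodes reachable from x over one edge labelled p."""
    return {y for label, y in graph.get(x, []) if label == p}


def edge_between(graph, x, y):
    """edgeBetween: labels of the edges going from x to y."""
    return {label for label, target in graph.get(x, []) if target == y}


def k_hop(graph, x, k):
    """k-hopX: nodes at the end of a directed walk of exactly k edges from x."""
    frontier = {x}
    for _ in range(k):
        frontier = {y for node in frontier for _label, y in graph.get(node, [])}
    return frontier
```

Each extra hop expands every node of the current frontier, so the work per hop is roughly multiplied by the average out-degree of the graph.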
  • 127. Graph Mining Tasks
    These tasks evaluate the engine performance while executing the following graph mining tasks:
    • Graph summarization.
    • Dense subgraph.
    • Shortest path.
  • 128. Graph Summarization [Navlakha et al. 2008]
    [Figure: original graph over nodes 0 to 9 and its summarized graph]
    Corrections: Add(8,1), Add(9,3), Add(0,3), Del(4,1)
    Cost = 1 (superedge) + 4 (corrections) = 5
  • 130. Dense Subgraph [Khuller et al. 2010]
    [Figure: weighted graph over nodes A to H]
    Given a subset of nodes E', compute the densest subgraph containing them, i.e., the density of E' is maximized:
    $$\mathrm{density}(E') = \frac{\sum_{(a,b) \in E'} \mathrm{weight}(a,b)}{|E'|}$$
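
A minimal sketch of the density measure. It assumes that the sum ranges over the edges whose two endpoints both lie in the node subset, and that the denominator is the number of nodes in the subset; the edge weights and the function name `density` are made up for illustration, and this computes the density of one candidate subgraph rather than searching for the densest one.

```python
def density(nodes, weighted_edges):
    """Total weight of the edges induced by `nodes`, divided by len(nodes)."""
    nodes = set(nodes)
    if not nodes:
        raise ValueError("density is undefined for an empty node set")
    total = sum(
        weight
        for (a, b), weight in weighted_edges.items()
        if a in nodes and b in nodes
    )
    return total / len(nodes)


# Hypothetical weighted, undirected edges (each edge listed once).
weighted_edges = {
    ("A", "B"): 1,
    ("B", "C"): 2,
    ("A", "C"): 3,
    ("C", "D"): 1,
}
```

For example, `density({"A", "B", "C"}, weighted_edges)` sums the three edges among A, B and C and divides by three nodes.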
  • 131. Shortest Path
    • Given two nodes, find the shortest path between them.
    • Implemented using Depth-First Iterative Deepening (DFID).
    • Ten node pairs are taken at random from each graph, and the shortest path algorithm is then executed on each pair of nodes.
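
The iterative deepening strategy can be sketched as follows. This is an illustrative version for unweighted, directed graphs stored as adjacency lists, not the code used in the experiments; the function names and the toy graph are assumptions.

```python
def depth_limited(graph, node, goal, limit, path):
    """Depth-first search for `goal` using at most `limit` more edges."""
    if node == goal:
        return path
    if limit == 0:
        return None
    for successor in graph.get(node, []):
        if successor not in path:  # keep the path simple (no repeated nodes)
            found = depth_limited(graph, successor, goal, limit - 1, path + [successor])
            if found is not None:
                return found
    return None


def dfid_shortest_path(graph, start, goal):
    """Run depth-limited searches with limits 0, 1, 2, ... until goal is found."""
    nodes = set(graph) | {y for targets in graph.values() for y in targets} | {start}
    # A simple path visits each node at most once, so it has < len(nodes) edges.
    for limit in range(len(nodes)):
        found = depth_limited(graph, start, goal, limit, [start])
        if found is not None:
            return found
    return None


graph = {1: [2, 3], 2: [4], 3: [4], 4: [5], 5: []}
```

Because the depth limit grows one edge at a time, the first path found uses the fewest edges, while memory use stays proportional to the current depth rather than to the size of the frontier.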
  • 133. Loading Time: RDF Engines
      Graph Name   #Nodes    #Edges     RDF3x Loading Time   Hexastore Loading Time
      DSJC1000.1   1,000     49,629     2.036s               6.174s
      DSJC1000.5   1,000     249,826    5.594s               30.809s
      DSJC1000.9   1,000     449,449    8.992s               30.809s
      SSCA2-17     131,072   3,907,909  44.775s              2m43.814s
  • 136. Benchmark Tasks: Hexastore
    [Figure: benchmark task results on Hexastore for the four benchmark graphs]
    K-hop seems to grow exponentially.
  • 137. Benchmark Tasks: RDF3x
    [Figure: benchmark task results on RDF3x for the four benchmark graphs]
    K-hop seems to grow exponentially.
  • 138. Loading Time: Graph Database Engines
      Graph Name   #Nodes    #Edges     DEX Loading Time   Neo4J Loading Time   HyperGraphDB Loading Time
      DSJC1000.1   1,000     49,629     10.031s            19.884s              159.665s
      DSJC1000.5   1,000     249,826    41.677s            113.174s             812.135s
      DSJC1000.9   1,000     449,449    76.389s            321.299s             1,557.574s
      SSCA2-17     131,072   3,907,909  455.293s           359.082s             7,148.301s
  • 139. Benchmark Tasks: Neo4J
    [Figure: benchmark task results on Neo4J for the four benchmark graphs]
    K-hop seems to grow exponentially.
  • 140. Benchmark Tasks: DEX
    K-hop seems to grow exponentially.
  • 141. Benchmark Tasks: HyperGraphDB
    K-hop seems to grow exponentially.
  • 148. Graph Mining Tasks: Neo4J
    [Figure: graph mining task results on Neo4J for the four benchmark graphs]
  • 149. Graph Mining Tasks: HyperGraphDB
    [Figure: graph mining task results on HyperGraphDB for the four benchmark graphs]
  • 151. Graph Data Storing Features
    Columns: Main Memory, Persistent Memory, Backend Storage, Indexing
      DEX: X X X
      HyperGraphDB (1): X X X X
      Neo4j: X X X
      Hexastore (2): X X X X
      RDF3x: X X
      BitMat: X X
      BHyper (2): X X X X
    (1) Backend storage: Berkeley DB
    (2) Backend storage: Tokyo Cabinet
  • 152. Graph Data Management Features
    Columns: Graph API, Query Language, Data Visualization, Data Definition, Data Manipulation, Standard Compliance
      DEX: X X X X
      HyperGraphDB: X X X
      Neo4j: X X X X X
      Hexastore: X X X X
      RDF3x: X
      BitMat: X X X
      BHyper: X X X
  • 153. Graph-Based Operations
    • Adjacency queries: node/edge adjacency; k-neighborhood of a node.
    • Reachability queries: test whether two nodes are connected by a path; shortest path.
    • Pattern matching queries: find all sub-graphs of a data graph.
    • Summarization queries: aggregate the results of a graph-based query.
  • 154. Graph-Based Operations
    Columns: Node/Edge Adjacency, K-neighborhood, Fixed-length Paths, Regular Simple Paths, Shortest Path, Pattern Matching, Summarization Queries
      DEX: X X X X X X
      HyperGraphDB: X X X
      Neo4j: X X X X X X
      Hexastore: X X
      RDF3x: X X
      BitMat: X X
      BHyper: X X
  • 156. Conclusions (1)
    Data Storing Features:
    • Engines implement a wide variety of structures to efficiently represent large graphs.
    • RDF engines implement special-purpose structures to efficiently store RDF graphs.
  • 157. Conclusions (2)
    Data Management Features:
    • General-purpose graph database engines provide APIs to manage and query data.
    • RDF engines support general data management operations.
  • 158. Conclusions (3)
    Graph-Based Operations Support:
    • General-purpose graph database engines provide API methods to implement a wide variety of graph-based operations.
      – Only a few offer query languages to express pattern matching.
    • RDF engines are not customized for basic graph-based operations; however,
      – they efficiently implement SPARQL, a language based on pattern matching.
  • 159. Future Directions
    • Extension of existing engines to support specific tasks of graph traversal and mining.
    • Definition of benchmarks to study the performance and quality of existing graph database engines, e.g.,
      – graph traversal, link prediction, pattern discovery, graph mining.
    • Empirical evaluation of the performance and quality of existing graph database engines.
  • 160. References
    P. Anderson, A. Thor, J. Benik, L. Raschid, and M.-E. Vidal. Pang: finding patterns in annotation graphs. In SIGMOD Conference, pages 677–680, 2012.
    R. Angles and C. Gutiérrez. Survey of graph database models. ACM Comput. Surv., 40(1), 2008.
    M. Atre, V. Chaoji, M. Zaki, and J. Hendler. Matrix Bit loaded: a scalable lightweight join query processor for RDF data. In International Conference on World Wide Web, Raleigh, USA, 2010. ACM.
    B. Iordanov. Hypergraphdb: A generalized graph database. In WAIM Workshops, pages 25–36, 2010.
    N. Martínez-Bazan, V. Muntés-Mulero, S. Gómez-Villamor, J. Nin, M.-A. Sánchez-Martínez, and J.-L. Larriba-Pey. Dex: high-performance exploration on large graphs for information retrieval. In CIKM, pages 573–582, 2007.
    T. Neumann and G. Weikum. RDF-3X: a RISC-style engine for RDF. Proc. VLDB, 1(1), 2008.
  • 161. References
    M. K. Nguyen, C. Basca, and A. Bernstein. B+hash tree: optimizing query execution times for on-disk semantic web data structures. In Proceedings of the 6th International Workshop on Scalable Semantic Web Knowledge Base Systems (SSWS2010), 2010.
    I. Robinson, J. Webber, and E. Eifrem. Graph Databases. O'Reilly Media, Incorporated, 2013.
    S. Sakr and E. Pardede, editors. Graph Data Management: Techniques and Applications. IGI Global, 2011.
    D. Shimpi and S. Chaudhari. An overview of graph databases. In IJCA Proceedings on International Conference on Recent Trends in Information Technology and Computer Science 2012, 2013.
    M.-E. Vidal, A. Martínez, E. Ruckhaus, T. Lampo, and J. Sierra. On the efficiency of querying and storing rdf documents. In Graph Data Management, pages 354–385. 2011.
    C. Weiss, P. Karras, and A. Bernstein. Hexastore: Sextuple indexing for semantic web data management. Proc. of the 34th Intl Conf. on Very Large Data Bases (VLDB), 2008.
  • 162. References
    C. Weiss and A. Bernstein. On-disk storage techniques for Semantic Web data - Are B-Trees always the optimal solution? Proceedings of the 5th International Workshop on Scalable Semantic Web Knowledge Base Systems, October 2009.