Graph databases use graph structures with nodes, edges and properties to represent and store information. General graph databases, which can store any graph, are distinct from specialized graph databases such as triple stores and network databases. Graph database vendors include Neo4j, InfiniteGraph and OrientDB. Graph databases are well suited to applications built around relationships, such as recommendations, social networks, knowledge graphs and location-based services.
1. Graph databases, the Web of Data
storage engines
Pere Urbón Bayes
Senior Software Engineer
Independent
purbon@purbon.com
purbon.com
in/purbon
February of 2010
@purbon
2. Graph databases, the Web of Data
storage engines
● We are going to talk about
– Graph databases, facts and definitions.
– Graph database vendors.
– Use cases and applications, graph theory.
The web of data storage engines - DataDevRoom - Fosdem 2011 2
3. Graph databases, the Web of Data
storage engines
“A graph database is a database that uses graph
structures with nodes, edges, and properties to
represent and store information.
General graph databases that can store any
graph are distinct from specialized graph
databases such as triple stores and network
databases.”
Wikipedia
4. Graph Database
Property graph
● Abstractions
– Nodes
– Relationships
– Properties on both.
Example: John Smith liked http://www.example.com at 01/10/11
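The example sentence above can be written down as a tiny property graph. This is a minimal sketch in plain Python, not the API of any particular graph database; the class names, identifiers (`person:1`, `page:1`) and property keys are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A vertex with free-form key/value properties."""
    node_id: str
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    """A directed, typed edge that carries its own properties."""
    start: Node
    rel_type: str
    end: Node
    properties: dict = field(default_factory=dict)


# Hypothetical identifiers and property names, chosen for illustration.
john = Node("person:1", {"name": "John Smith"})
page = Node("page:1", {"url": "http://www.example.com"})
liked = Relationship(john, "LIKED", page, {"at": "01/10/11"})


def describe(rel: Relationship) -> str:
    """Render a relationship back into the sentence form used on the slide."""
    return (f"{rel.start.properties['name']} {rel.rel_type.lower()} "
            f"{rel.end.properties['url']} at {rel.properties['at']}")


print(describe(liked))
```

Note that the timestamp lives on the relationship, not on either node: that is the "properties on both" part of the model.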
5. Graph databases
Facts
[Figure: connectivity of data by decade (1990's, 2010's, 2020's). Labels, as extracted from top to bottom: everything connected, RDF ontologies, linked data, tagging, blogs, folksonomies, social networks, text files.]
6. Graph databases
Facts
[Figure: "Size of" chart by decade (1990's, 2010's, 2020's).]
http://www.guardian.co.uk/business/2009/may/18/digital-content-expansion
7. Graph databases
Facts
[Figure: performance by kind of data. Labels, as extracted: lists, graph-like structures, semantic web, semantic reasoning, linked data, unstructured. Annotation: performance slowdown.]
8. Graph databases
Performance comparison
Query            RDBMS      OIM        GraphDB
Q1: count        20.38      17.35      0
Q2: projection   17.34      43.7       33.19
Q3: scan         32.76      174.64     3.14
Q4: values       12.28      20.77      0.01
Q5: select       7.34       5.43       0.84
Q6: hubs         > 3 hours  > 3 hours  624.68

                 RDBMS      OIM        GraphDB
data             27.36 GB   54 GB      9.69 GB
overhead         10.9       21.51      3.86
load             52891 s    17543 s    95579 s
9. Graph databases
Vendors
● Neo4j (neo4j.org)
– Embedded, disk-based, fully transactional Java persistence engine that stores data structured in graphs rather than in tables.
– Dual-licensed: AGPL and commercial.
– High availability, scalability, concurrency, etc.
10. Graph databases
Vendors
● InfiniteGraph
– A distributed, scalable, high-performance commercial graph database written in Java, backed by the experience of Objectivity Inc.
– More info: http://www.infinitegraph.com/
11. Graph databases
Vendors
● OrientDB
– An embedded, pure-Java, fast, transactional, scalable document-graph storage engine.
– Schema-free, ACID, support for SQL and JSON.
– Apache License 2.0
– More info: http://www.orientechnologies.com/
12. Graph databases
More Vendors
● Dex: The high performance graph database.
● HyperGraphDB: An AI and semantic web graph database.
● InfoGrid: The Internet graph database.
● Sones: SaaS .NET graph database.
● AllegroGraph: The semantic graph database.
● VertexDB: High performance database server.
13. Graph Theory
analytics
● Clustering (Communities)
● Social connections
● Hubs
● Graph mining
● Centrality measures
● Task planning
● Scheduling
● Process assignment
● Routing
● Logistics
● League planning
14. Graph Theory
Applications
● Pattern Recognition
● Dependency analysis
● Impact analysis
● Network flow
– Traffic analysis and optimization
– Delivery optimization
● Optimization of tasks
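Dependency and impact analysis reduce to reachability on a directed graph: if a component changes, everything that depends on it, directly or transitively, is potentially affected. A minimal sketch, assuming dependencies are given as `(component, dependency)` pairs; the component names are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical (component, dependency) pairs, for illustration only.
depends_on = [("web", "api"), ("api", "db"), ("api", "cache"),
              ("reports", "db"), ("cache", "config")]


def impacted_by(changed, depends_on):
    """Return every component that transitively depends on `changed`."""
    # Invert the edges: for each dependency, who depends on it?
    dependents = defaultdict(set)
    for component, dependency in depends_on:
        dependents[dependency].add(component)

    seen = set()
    queue = deque([changed])
    while queue:  # breadth-first walk over the inverted edges
        current = queue.popleft()
        for dependent in dependents[current]:
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen


print(sorted(impacted_by("db", depends_on)))
```

In a relational store the same question needs a recursive query or repeated self-joins; in a graph model it is a single traversal.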
15. Graph Like
Applications
● Recommendations
– Heuristics (PageRank)
– Local
● Shortest Paths
● Hammock Functions
● Walks
● Search algorithms
● Shooting stars
● K-nearest neighbours
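As an illustration of the shortest-paths item, here is a breadth-first search over an unweighted graph stored as an adjacency list. It is a sketch under that assumption only: weighted graphs would need Dijkstra's algorithm or similar, and the example `graph` is invented.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Return a shortest path (fewest hops) as a list of nodes, or None."""
    if start == goal:
        return [start]
    previous = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour in previous:
                continue
            previous[neighbour] = node
            if neighbour == goal:
                # Walk the predecessor chain back to the start.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None


# Hypothetical directed graph, for illustration only.
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"], "e": []}
print(shortest_path(graph, "a", "e"))
```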
16. Graph Like
Applications
● Location based services
● Hubs
● Spatial databases
● Logical (multi-)index construction
17. Web
Trending Topics
● Semantic web
– RDF (OWL) Store
– RDF-Sail
– SPARQL
● Linked data (Open Data)
● Link analysis
● Structure mining
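The idea behind an RDF store queried with SPARQL can be sketched without any library: data is a set of (subject, predicate, object) triples, and a query is a pattern in which some positions are left open. The code below is a toy in plain Python, not an RDF or SPARQL implementation; the triples and the `match` helper are assumptions for illustration.

```python
# Hypothetical (subject, predicate, object) triples, for illustration only.
triples = {
    ("john", "likes", "example.com"),
    ("john", "knows", "mary"),
    ("mary", "likes", "example.com"),
    ("mary", "likes", "fosdem.org"),
}


def match(triples, subject=None, predicate=None, obj=None):
    """Return triples matching the pattern; None acts as a wildcard,
    playing the role a variable plays in a SPARQL triple pattern."""
    return sorted(
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


# "Who likes example.com?"
for s, _, _ in match(triples, predicate="likes", obj="example.com"):
    print(s)
```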
18. Graph databases
Performance
HPC Scalable Graph Analysis Benchmark IWGD 2010
Scale 15
Kernel       DEX       Neo4j     Jena      HyperGraphDB
Load (s)     7,44      697       141       +24h
Scan (s)     0,0010    2,71      0,689
2-Hops (s)   0,0120    0,0260    0,443
BC (s)       14,8      8,24      138
Size (MB)    30        17        207

Scale 20
Kernel       DEX       Neo4j     Jena      HyperGraphDB
Load (s)     317       32.094    4.560     +24h
Scan (s)     0,005     751       18,6
2-Hops (s)   0,033     0,0230    0,4580
BC (s)       617       7027      59512
Size (MB)    893       539       6656
19. Graph databases
XI FOSDEM Dinner
Interested in graph databases and NoSQL, and attending this year's FOSDEM?
Meeting point:
20:00
In front of Le Roy d'Espagne
Grand Place 1
Brussels
20. Graph databases
Moviepilot is hiring
Interested in movies, data analytics, Ruby, Git and open source? Join us!
Moviepilot is a leading provider of discovery services for movies and TV series, based in Berlin.
Interested? Talk to @jannis or @purbon.
21. Graph databases, the Web of Data
storage engines
Questions?
Pere Urbón Bayes
Senior Software Engineer
Independent
purbon@purbon.com
February of 2010