In this presentation we discuss how graph analysis can add value to your data and how to use open source tools like Titan and Faunus to build scalable graph processing systems.
This presentation gives an update on the development status of Titan and Faunus with a preview of what is to come.
12. Graph
[Figure: property graph of Roman mythology with callouts "Vertex" and "Property". Vertices (name, type): Neptune (god), Alcmene (god), Saturn (titan), Jupiter (god), Hercules (demigod), Pluto (god), Cerberus (monster).]
13. Graph
[Figure: the graph from slide 12 with edges added and callouts "Edge", "Edge Type", and "Property". Edge labels: brother (two edges), father (two edges), mother, pet, and battled (with property time: 12).]
14. Path
[Figure: the graph from slide 12 with its edges, illustrating a path.]
15. Degree
[Figure: the graph from slide 12 with its edges, illustrating vertex degree.]
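The terms introduced on slides 12 to 15 (vertex, edge, property, path, degree) can be made concrete with a small in-memory model. This is a minimal Python sketch, not Titan's API. The vertex types follow slide 12; the edge endpoints are an assumption taken from the usual "Graph of the Gods" example, because the slide text preserves only the edge labels.

```python
from collections import defaultdict

# Vertices: name -> properties (types as listed on slide 12).
vertices = {
    "Saturn": {"type": "titan"},
    "Jupiter": {"type": "god"},
    "Neptune": {"type": "god"},
    "Pluto": {"type": "god"},
    "Alcmene": {"type": "god"},
    "Hercules": {"type": "demigod"},
    "Cerberus": {"type": "monster"},
}

# Edges: (source, label, target, properties). Endpoints are ASSUMED.
edges = [
    ("Hercules", "father", "Jupiter", {}),
    ("Hercules", "mother", "Alcmene", {}),
    ("Hercules", "battled", "Cerberus", {"time": 12}),
    ("Jupiter", "father", "Saturn", {}),
    ("Jupiter", "brother", "Neptune", {}),
    ("Jupiter", "brother", "Pluto", {}),
    ("Pluto", "pet", "Cerberus", {}),
]


def degree(name):
    """Number of edges incident to a vertex (incoming plus outgoing)."""
    return sum((src == name) + (dst == name) for src, _, dst, _ in edges)


def find_path(start, goal):
    """Breadth-first search along outgoing edges.

    Returns the list of vertex names on a shortest path, or None.
    """
    out = defaultdict(list)
    for src, _, dst, _ in edges:
        out[src].append(dst)
    frontier, seen = [[start]], {start}
    while frontier:
        path = frontier.pop(0)
        if path[-1] == goal:
            return path
        for nxt in out[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(path + [nxt])
    return None


print(degree("Jupiter"))
print(find_path("Hercules", "Pluto"))
```

Under the assumed endpoints, Jupiter has degree 4 and the path from Hercules to Pluto runs through Jupiter.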
16. Aurelius Graph Cluster
Apache 2
TITAN: Stores a massive-scale property graph allowing real-time traversals and updates
FAUNUS: Batch processing of large graphs with Hadoop
FULGORA: Runs global graph algorithms on large, compressed, in-memory graphs
[Figure: arrows between the three systems labeled "Map/Reduce", "Load", "Bulk Load", and "Analysis results back into Titan".]
18. Titan Features
Numerous Concurrent Users
Many Short Transactions (read/write)
Real-time Traversals (OLTP)
High Availability
Dynamic Scalability
Variable Consistency Model (ACID or eventual consistency)
Real-time Big Graph Data
20. $ ./titan-0.2.0/bin/gremlin.sh
         \,,,/
         (o o)
-----oOOo-(_)-oOOo-----
gremlin> g = TitanFactory.open('/tmp/titan')
==>titangraph[local:/tmp/titan]
gremlin> v = g.V('name','Hercules').next()
==>v[4]
gremlin> v.out('father').out('brother').name
21. [Figure: the graph from slide 12 with its edges, shown alongside the traversal below.]
gremlin> v.out('father').out('brother').name
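To show what the two-step traversal computes, here is a minimal Python sketch of the same idea over a plain adjacency map. It is not Gremlin or Titan code; the helper `out` and the edge endpoints are assumptions for illustration (the endpoints follow the usual "Graph of the Gods" example).

```python
# Outgoing adjacency: vertex name -> list of (edge label, neighbor name).
# Endpoints are ASSUMED; the slides preserve only the edge labels.
OUT_EDGES = {
    "Hercules": [("father", "Jupiter"), ("mother", "Alcmene"),
                 ("battled", "Cerberus")],
    "Jupiter": [("father", "Saturn"), ("brother", "Neptune"),
                ("brother", "Pluto")],
    "Pluto": [("pet", "Cerberus")],
}


def out(names, label):
    """One traversal step: follow outgoing edges that carry the given label."""
    return [dst
            for name in names
            for (lbl, dst) in OUT_EDGES.get(name, [])
            if lbl == label]


# Similar in spirit to v.out('father').out('brother').name starting at Hercules.
uncles = out(out(["Hercules"], "father"), "brother")
print(uncles)
```

Each step maps a list of vertices to the list of their neighbors along one edge label, so steps compose by nesting. With the assumed edges, the result is Neptune and Pluto.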
22. Vertex-Centric Indices
Sort and index edges per vertex by primary key
Primary key can be composite
Enables efficient focused traversals
Only retrieve edges that matter
Uses push down predicates for quick, index-driven retrieval
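The idea behind a vertex-centric index can be sketched in a few lines: keep each vertex's edges sorted by a composite key, then answer a label-plus-range query with two binary searches instead of scanning every edge. The Python below is an illustrative sketch only. The class `VertexEdges`, the key layout `(label, time)`, and the target names are assumptions, not Titan's storage format.

```python
from bisect import bisect_left, bisect_right


class VertexEdges:
    """Edges of one vertex, kept sorted by the composite key (label, time)."""

    def __init__(self):
        self._keys = []     # sorted list of (label, time)
        self._targets = []  # neighbor stored at the same position as its key

    def add(self, label, time, target):
        key = (label, time)
        pos = bisect_right(self._keys, key)
        self._keys.insert(pos, key)
        self._targets.insert(pos, target)

    def query(self, label, time_from, time_to):
        """Targets of `label` edges with time_from <= time <= time_to."""
        lo = bisect_left(self._keys, (label, time_from))
        hi = bisect_right(self._keys, (label, time_to))
        return self._targets[lo:hi]


v = VertexEdges()
# Hypothetical targets; the times mirror the battled edges on slide 23.
for t, target in [(5, "m5"), (1, "m1"), (9, "m9"), (3, "m3")]:
    v.add("battled", t, target)
v.add("father", 0, "Jupiter")

print(v.query("battled", 2, 6))
```

Because the label is the first component of the key, all edges of one label are contiguous, and the time bounds narrow the slice further. Only the matching edges are touched, which is the "only retrieve edges that matter" point from the slide.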
23. [Figure: vertex v with its incident edges: four battled edges (time: 1, 3, 5, 9), mother, father, and two fought edges.]
v.query()
24. [Figure: vertex v with the edges remaining after the direction filter: four battled edges (time: 1, 3, 5, 9), mother, and father.]
v.query()
  .direction(OUT)
31. Graph Indexing
Vertex and Edge indexing
Pluggable index provider
ElasticSearch
Lucene
Full-text search
Numeric range search
Geographic search
32. [Figure: the graph from slide 12 with additional properties. Vertices carry age (5200, 3300, 4800, 5900, 4900) and title ("God of the earth and ocean", "God of the heaven and skies", "Divine hero", "God of the underworld", "Ugly beast of the underworld"). The battled edge carries time: 12 and location: (38.071, 23.745).]
33. [Figure: the graph from slide 32, shown with a numeric range query.]
g.query().has('age',Cmp.GREATER_THAN,5000).vertices()
34. [Figure: the graph from slide 32, shown with a full-text query.]
g.query().has('title',Txt.CONTAINS,'god').vertices()
35. [Figure: the graph from slide 32, shown with a combined numeric range and full-text query.]
g.query().has('age',Cmp.GREATER_THAN,5000)
  .has('title',Txt.CONTAINS,'god').vertices()
36. [Figure: the graph from slide 32, shown with a geographic query on edges.]
g.query().has('location',Geo.WITHIN,
  Geoshape.circle(38,23,100)).edges()
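The three kinds of index-backed predicate on slides 33 to 36 (numeric range, full-text, geographic) can be illustrated with plain Python predicates. This sketch shows only what each predicate means. It scans a list linearly and does not model how Elasticsearch or Lucene index the data. Two things are assumptions: the pairing of ages and titles with vertices, which the slide text does not preserve, and the reading of the circle radius as kilometres.

```python
import math

# Property-to-vertex pairing is ASSUMED for illustration.
VERTICES = [
    {"name": "Neptune", "age": 5200, "title": "God of the earth and ocean"},
    {"name": "Saturn", "age": 5900},
    {"name": "Jupiter", "age": 4800, "title": "God of the heaven and skies"},
    {"name": "Pluto", "age": 4900, "title": "God of the underworld"},
    {"name": "Alcmene", "age": 3300},
    {"name": "Hercules", "title": "Divine hero"},
    {"name": "Cerberus", "title": "Ugly beast of the underworld"},
]
EDGES = [{"label": "battled", "time": 12, "location": (38.071, 23.745)}]


def greater_than(key, bound):
    """Numeric range predicate: the property exists and exceeds bound."""
    return lambda element: key in element and element[key] > bound


def text_contains(key, word):
    """Full-text predicate: word equals one token of the text, ignoring case."""
    return lambda element: word.lower() in element.get(key, "").lower().split()


def within_circle(key, lat, lon, radius_km):
    """Geo predicate: point within radius_km of (lat, lon), haversine distance."""
    def check(element):
        if key not in element:
            return False
        plat, plon = element[key]
        dlat = math.radians(plat - lat)
        dlon = math.radians(plon - lon)
        a = (math.sin(dlat / 2) ** 2
             + math.cos(math.radians(lat)) * math.cos(math.radians(plat))
             * math.sin(dlon / 2) ** 2)
        return 2 * 6371.0 * math.asin(math.sqrt(a)) <= radius_km
    return check


def query(elements, *predicates):
    """Keep the elements that satisfy every predicate."""
    return [e for e in elements if all(p(e) for p in predicates)]


old = query(VERTICES, greater_than("age", 5000))
gods = query(VERTICES, text_contains("title", "god"))
old_gods = query(VERTICES, greater_than("age", 5000),
                 text_contains("title", "god"))
nearby = query(EDGES, within_circle("location", 38, 23, 100))
print([v["name"] for v in old_gods], len(nearby))
```

Chaining predicates narrows the result, as on slide 35: under the assumed pairing, only Neptune is both older than 5000 and titled a god. The battled edge lies roughly 66 km from (38, 23), so it falls inside the 100 km circle.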
42. What’s New
Faunus 0.1 released
Bulk Import / Export for Titan
loaded graph into Titan
loading derivations into Titan
RDF support
Many optimizations
vertex compression
43. Faunus Setup
$ bin/gremlin.sh
         \,,,/
         (o o)
-----oOOo-(_)-oOOo-----
gremlin> g = FaunusFactory.open('bin/titan-hbase.properties')
==>faunusgraph[titanhbaseinputformat]
gremlin> g.getProperties()
==>faunus.graph.input.format=com.thinkaurelius.faunus.formats.titan.hbase.TitanHBaseInputFormat
==>faunus.graph.output.format=org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat
==>faunus.sideeffect.output.format=org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
==>faunus.output.location=dbpedia
==>faunus.output.location.overwrite=true
gremlin> g._()
12/11/09 15:17:45 INFO mapreduce.FaunusCompiler: Compiled to 1 MapReduce job(s)
12/11/09 15:17:45 INFO mapreduce.FaunusCompiler: Executing job 1 out of 1: MapSequence[com.thinkaurelius.faunus.mapreduce.transform.IdentityMap.Map]
12/11/09 15:17:50 INFO mapred.JobClient: Running job: job_201211081058_0003
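The log above shows the identity step being compiled into a single map-only job. The shape of such a job can be sketched in-process with plain Python. Nothing here uses Hadoop or Faunus; `run_job`, the record layout, and the label-counting job are hypothetical and serve only to illustrate map and reduce phases over vertex records.

```python
from collections import defaultdict

# Hypothetical vertex records: (vertex id, labels of outgoing edges).
records = [
    (1, ["father", "mother", "battled"]),
    (2, ["father", "brother", "brother"]),
    (3, ["pet"]),
]


def run_job(records, mapper, reducer=None):
    """Run one map phase and an optional reduce phase in-process."""
    mapped = [pair for rec in records for pair in mapper(rec)]
    if reducer is None:
        return mapped  # map-only job
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    return [reducer(key, values) for key, values in sorted(groups.items())]


def identity_map(record):
    """Emit every vertex record unchanged, as an identity step would."""
    yield record


def label_map(record):
    """Emit (label, 1) for every outgoing edge of the vertex."""
    _, labels = record
    for label in labels:
        yield (label, 1)


def sum_reduce(key, values):
    """Sum the counts collected for one label."""
    return (key, sum(values))


print(run_job(records, identity_map) == records)
print(run_job(records, label_map, sum_reduce))
```

The identity job needs no reduce phase, which matches the single map job in the log. The second job adds a reduce phase to count edges per label, a simple example of a global statistic computed in batch.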
44. Build a Knowledge Graph
Based on DBpedia
Graph version of Wikipedia
~290 million edges (~1B triples)
1. Bulk load RDF into Faunus
   6 m1.xlarge
2. Convert to property graph
3. Bulk load into Titan
   3 m1.xlarge with Cassandra
4. OLTP+OLAP
Total Time: ~2 hours
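Step 2, converting RDF to a property graph, can be sketched as follows. This shows one common mapping and is not necessarily the one Faunus applies: triples whose object is a literal become vertex properties, and triples whose object is a resource become edges. The sample triples, the `ex:` prefix, and the function name are hypothetical.

```python
def to_property_graph(triples):
    """Convert (subject, predicate, (kind, value)) triples to a property graph.

    kind is "literal" or "resource". Literals become properties on the
    subject vertex; resources become edges from subject to object.
    """
    vertices, edges = {}, []
    for subject, predicate, (kind, value) in triples:
        props = vertices.setdefault(subject, {})
        if kind == "literal":
            props[predicate] = value
        else:
            vertices.setdefault(value, {})
            edges.append((subject, predicate, value))
    return vertices, edges


# Hypothetical sample data.
triples = [
    ("ex:Hercules", "name", ("literal", "Hercules")),
    ("ex:Hercules", "type", ("literal", "demigod")),
    ("ex:Hercules", "father", ("resource", "ex:Jupiter")),
    ("ex:Jupiter", "name", ("literal", "Jupiter")),
]

vertices, edges = to_property_graph(triples)
print(len(triples), "triples ->", len(vertices), "vertices,", len(edges), "edges")
```

Under a mapping like this, the edge count is smaller than the triple count because literal-valued triples are folded into vertex properties. That would be consistent with the slide's ~1B triples yielding ~290 million edges, though the slide does not state the mapping used.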
47. Aurelius Graph Cluster
Contact: aureliusgraphs@googlegroups.com