3. TinkerPop and GraphFrames provide Complementary Approaches for Graph Analytics
[Figure labels: DataSet, Catalyst, GraphFrames]
#EUeco3
4. Graphs are Vertices and Edges
Vertices are things and edges represent their relations to one another
5. Graphs are Vertices and Edges
[Figure: four vertices, one per ship]
• Registry: USS Enterprise (NCC-1701), Class: Constitution class, Service: 2245–2285 (40 Years)
• Registry: USS Enterprise (NCC-1701-A), Class: Enterprise class, Service: 2286–2293 (7 Years)
• Registry: USS Enterprise (NCC-1701-C), Class: Ambassador, Service: 2332–2344 (12 Years)
• Registry: USS Enterprise (NCC-1701-D), Class: Galaxy, Service: 2363–2371 (8 Years)
6. Graphs are Vertices and Edges
[Figure: the same four ship vertices. The Registry, Class, and Service fields of each are called out as Vertex Properties.]
7. Graphs are Vertices and Edges
[Figure: the same four ship vertices, now connected by three edges, each reading "succeeded by".]
8. Graphs are Vertices and Edges
[Figure: the same graph. Each connection is called out as an Edge, and its "succeeded by" text as the Edge Label.]
9. Graphs are Vertices and Edges
[Figure: the same graph. Each ship vertex now carries the Vertex Label "Ship".]
10. Graphs are Vertices and Edges
[Figure: the same graph plus two new vertices with the Vertex Label "Crew": (Position: Captain, Name: Kirk) and (Position: Captain, Name: Picard).]
11. Graphs are Vertices and Edges
[Figure: the same graph. Four "served on" edges now connect the two Crew vertices to the Ship vertices.]
12. Graphs are Vertices and Edges
[Figure: the complete graph: four Ship vertices, two Crew vertices, three "succeeded by" edges, and four "served on" edges.]
But why do I want this?
13. Graphs let us ask questions about our data based on its relations
• What Captain served after Kirk?
• What Ship was two after the NCC-1701?
14. Traversals involve following paths through the Graph
[Figure: the complete Ship and Crew graph, with its "succeeded by" and "served on" edges.]
15. What Captain was After Kirk?
[Figure: the relevant subgraph. Kirk (Crew) served on USS Enterprise (NCC-1701-A) (Ship), which was succeeded by USS Enterprise (NCC-1701-C) (Ship), on which Picard (Crew) served.]
16. What Ship was two after the NCC-1701?
[Figure: the relevant subgraph. USS Enterprise (NCC-1701) was succeeded by USS Enterprise (NCC-1701-A), which was succeeded by USS Enterprise (NCC-1701-C).]
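The two questions above can be sketched as traversals over a toy property graph. This is a plain-Python illustration, not TinkerPop or Gremlin code. The helper names `out` and `in_` are hypothetical, and the assignment of "served on" edges (Kirk to NCC-1701 and NCC-1701-A, Picard to NCC-1701-C and NCC-1701-D) is an assumption read off the slides' figures.

```python
# Toy property graph mirroring the slides' figure (plain Python, not TinkerPop).
vertices = {
    "NCC-1701":   {"label": "Ship", "class": "Constitution class"},
    "NCC-1701-A": {"label": "Ship", "class": "Enterprise class"},
    "NCC-1701-C": {"label": "Ship", "class": "Ambassador"},
    "NCC-1701-D": {"label": "Ship", "class": "Galaxy"},
    "Kirk":   {"label": "Crew", "position": "Captain"},
    "Picard": {"label": "Crew", "position": "Captain"},
}

# Each edge is (source id, edge label, destination id).
# Which captain served on which ship is an assumption based on the figures.
edges = [
    ("NCC-1701", "succeeded by", "NCC-1701-A"),
    ("NCC-1701-A", "succeeded by", "NCC-1701-C"),
    ("NCC-1701-C", "succeeded by", "NCC-1701-D"),
    ("Kirk", "served on", "NCC-1701"),
    ("Kirk", "served on", "NCC-1701-A"),
    ("Picard", "served on", "NCC-1701-C"),
    ("Picard", "served on", "NCC-1701-D"),
]

def out(ids, label):
    """Follow outgoing edges with the given label from every id in ids."""
    return [dst for v in ids for (src, lbl, dst) in edges if src == v and lbl == label]

def in_(ids, label):
    """Follow incoming edges with the given label back to their sources."""
    return [src for v in ids for (src, lbl, dst) in edges if dst == v and lbl == label]

# What Ship was two after the NCC-1701? Two "succeeded by" hops.
two_after = out(out(["NCC-1701"], "succeeded by"), "succeeded by")

# What Captain served after Kirk? Kirk -> his ships -> next ship -> its crew.
after_kirk = set(in_(out(out(["Kirk"], "served on"), "succeeded by"), "served on"))
after_kirk.discard("Kirk")  # Kirk himself served on his own successor ship
```

Each question becomes a chain of hops along labeled edges, which is the shape a fluent traversal language expresses directly.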
17. TinkerPop is a Powerful and Flexible Graph Framework
• Server, Language, Connectors
• Graph Framework for OLAP and OLTP
• Node Centric Representations
• Fluent API (Gremlin)
• Fully Self Contained Framework
24. Vertex Stored in a PairRDD
Id -> StarVertex(Edge and Property Information)
[Figure: vertices 1, A, C, D]
Star Vertex: adjacency list representation (just the Id of each connected vertex)
1: "A", "Kirk"
A: "C", "Kirk"
C: "D", "Picard"
D: "Picard"
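As a rough illustration of the adjacency list above, the sketch below models each record as an (id, star vertex) pair in plain Python. The `StarVertex` dataclass here is a hypothetical stand-in written for this example, not TinkerPop's own class, and the `neighbors` helper is likewise an assumption.

```python
from dataclasses import dataclass, field

@dataclass
class StarVertex:
    """Hypothetical stand-in: a vertex, its properties, and the ids it touches."""
    vertex_id: str
    properties: dict = field(default_factory=dict)
    adjacent_ids: list = field(default_factory=list)

# (id, StarVertex) pairs, analogous to the records of a PairRDD.
pair_records = [
    ("1", StarVertex("1", {"label": "Ship"}, ["A", "Kirk"])),
    ("A", StarVertex("A", {"label": "Ship"}, ["C", "Kirk"])),
    ("C", StarVertex("C", {"label": "Ship"}, ["D", "Picard"])),
    ("D", StarVertex("D", {"label": "Ship"}, ["Picard"])),
]

def neighbors(records, vertex_id):
    """Look up a vertex by key and return only the ids of connected vertices."""
    lookup = dict(records)
    return lookup[vertex_id].adjacent_ids

# A neighbor is known only by id; its properties live in its own record.
neighbors_of_a = neighbors(pair_records, "A")
```

Because a record holds only the ids of its neighbors, reading a neighbor's properties means reaching that neighbor's own record, which is what the message passing on the following slides is for.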
25. Vertex Program Runs, Initializing a Traverser for Every Vertex
[Figure: vertices 1, A, C, D]
SparkMemory - Accumulator - Used for GlobalState
26. Then we cycle through a Message Passing Algorithm
[Figure: vertices 1, A, C, D, shown three times]
SparkMemory - Accumulator - Used for GlobalState
27. Then we cycle through a Message Passing Algorithm
[Figure: vertices 1, A, C, D, shown three times]
SparkMemory - Accumulator - Used for GlobalState
Passes messages from one Vertex to another with a join
28. Then we cycle through a Message Passing Algorithm
[Figure: vertices 1, A, C, D, shown three times]
SparkMemory - Accumulator - Used for GlobalState
Repeat
29. Then we cycle through a Message Passing Algorithm
[Figure: vertices 1, A, C, D, shown three times]
SparkMemory - Accumulator - Used for GlobalState
All Traversers Halt or Program Terminates: Result!
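The cycle described on these slides (initialize a traverser per vertex, pass messages, repeat until every traverser halts) can be simulated in a few lines. This is a minimal plain-Python sketch, not TinkerPop's Spark implementation. The function name `run_traversal`, the `memory` dictionary standing in for SparkMemory, and the edge list 1 -> A -> C -> D are all assumptions made for illustration.

```python
from collections import defaultdict

# Out-edges of the four example vertices ("succeeded by" only), an assumption
# based on the earlier figure: 1 -> A -> C -> D.
out_edges = {"1": ["A"], "A": ["C"], "C": ["D"], "D": []}

def run_traversal(out_edges, hops):
    """Move one traverser per vertex up to `hops` steps along out-edges."""
    # Initialization: every vertex starts with a single traverser.
    traversers = {v: 1 for v in out_edges}
    memory = {"iterations": 0}  # stand-in for global state (SparkMemory)

    for _ in range(hops):
        # Each vertex emits one message per out-edge for each traverser it holds.
        inbox = defaultdict(int)
        for vertex, count in traversers.items():
            for dst in out_edges[vertex]:
                inbox[dst] += count
        memory["iterations"] += 1
        # Delivering messages to their destination vertices is the step
        # the slides describe as a join.
        traversers = dict(inbox)
        if not traversers:  # all traversers halted: nothing left to move
            break
    return traversers, memory

two_hops = run_traversal(out_edges, 2)
until_halt = run_traversal(out_edges, 10)
```

With two hops the surviving traversers sit on C and D, the vertices reachable in exactly two steps. With a larger hop budget the loop stops on its own after four iterations, once no traverser has anywhere left to go.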
31. TinkerPop Spark OLAP Pros/Cons
Pros
• Every message pass requires only a single shuffle
• Edges and edge properties are accessible without a step
• Very flexible; many provider-specific shortcuts are possible
• Internal properties can be any Java type
• All in one; the server is already set up for multiple clients
Cons
• Limited ability to connect to external sources or other Spark applications
• Flexibility of the framework allows many platform-specific shortcuts to be added
• Generality makes some optimizations difficult
• Edges are co-partitioned with vertices, so high-degree nodes can cause memory issues
32. GraphFrames Background
• Third Party Package
• https://graphframes.github.io/
• Integrates with Dataset/DataFrame in Spark
• Relational under the hood
34. GraphFrames are built of two DataFrames
Vertex DataFrame
id      job              species
Geordi  Chief Engineer   Human
Data    Science Officer  Android
Edge DataFrame
src     dst   relationship
Geordi  Data  Friend
[Figure: Geordi and Data connected by an edge reading "Friend"]
35. GraphFrames are built of two DataFrames
[Figure: the same Vertex and Edge DataFrames]
Column values can only be Spark types
36. GraphFrames are built of two DataFrames
[Figure: the same Vertex and Edge DataFrames]
No built-in labels
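To make the two-table layout concrete, here is a plain-Python sketch that uses lists of dictionaries in place of Spark DataFrames. In GraphFrames itself these would be two DataFrames passed to the `GraphFrame` constructor. The variable names below are assumptions for illustration only.

```python
from collections import Counter

# Vertex table: one row per vertex; "id" is the key the edge table refers to.
vertices = [
    {"id": "Geordi", "job": "Chief Engineer", "species": "Human"},
    {"id": "Data", "job": "Science Officer", "species": "Android"},
]

# Edge table: "src" and "dst" hold vertex ids; other columns are edge attributes.
edges = [
    {"src": "Geordi", "dst": "Data", "relationship": "Friend"},
]

# There is no built-in label, so a "label" is just an ordinary column to filter on.
friend_edges = [e for e in edges if e["relationship"] == "Friend"]

# A basic count is a plain group-by over one table, with no traversal involved.
out_degree = Counter(e["src"] for e in edges)
species_count = Counter(v["species"] for v in vertices)
```

Because both tables are ordinary relational data, counts and filters like these need nothing graph-specific, which fits the later claim that GraphFrames is fast on basic counts.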
37. Catalyst Optimizes any Requests
• Simple requests using the DataFrame API don't do anything special
• Some methods fall back to GraphX (RDD based)
• Others use pure DataFrame methods
41. GraphFrames Motif Matching
Motif: (a)-[e]->(b)
[Figure: the GraphFrame's vertices (V) and edges (E) are combined in two joins]
• (a): Vertices as a UDT "A". Result: A: <VertexRow>
• [e]: Edges as UDT "E". Join with edges where A.id = E.src. Result: A: <VertexRow>, E: <EdgeRow>
• (b): Vertices as UDT "B". Join with edges where E.dst = B.id. Result: A: <VertexRow>, E: <EdgeRow>, B: <VertexRow>
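The motif (a)-[e]->(b) can be read as two joins, as the slide's join conditions suggest. The sketch below evaluates it with nested loops over plain Python lists. It is an illustration of the join logic only, not how GraphFrames or Catalyst executes it, and the function name `find_a_e_b` is hypothetical.

```python
vertices = [
    {"id": "Geordi", "job": "Chief Engineer", "species": "Human"},
    {"id": "Data", "job": "Science Officer", "species": "Android"},
]
edges = [{"src": "Geordi", "dst": "Data", "relationship": "Friend"}]

def find_a_e_b(vertices, edges):
    """Evaluate the motif (a)-[e]->(b) as two nested-loop joins."""
    # Join 1: vertices as "a" with edges as "e" where a.id = e.src.
    a_e = [{"a": a, "e": e} for a in vertices for e in edges if a["id"] == e["src"]]
    # Join 2: the partial rows with vertices as "b" where e.dst = b.id.
    return [
        {**row, "b": b}
        for row in a_e
        for b in vertices
        if row["e"]["dst"] == b["id"]
    ]

matches = find_a_e_b(vertices, edges)
```

Each additional hop in a motif repeats this pattern, adding two more joins, so long paths grow expensive quickly under this model.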
42. GraphFrames Motif Matching
[Figure: the same two-join plan for the motif (a)-[e]->(b)]
That's so many joins!
48. All of the normal optimizations happen within this Framework
[Figure: the Vertex, Edge, Vertex join pipeline, producing A: <VertexRow>, then A: <VertexRow>, E: <EdgeRow>, then A: <VertexRow>, E: <EdgeRow>, B: <VertexRow>. Each of the two joins is marked "Broadcast?".]
51. GraphFrames Pros/Cons
Pros
• Much faster on basic counts
• Powerful optimizations + CodeGen
• Easy to connect to other sources
Cons
• Slower on complex traversals (2 joins per hop)
• Relational model is not as flexible
53. Choose TinkerPop OLAP For Long Paths
• More complicated queries
• Traversals that require many hops
• g.V().out().out().out().out()
• Avoid for simple counts and aggregations
• Avoid if you have very high degree Vertices
54. Choose GraphFrames for Interoperability and Short Paths
• General Edge/Vertex stats: groupCount, min, max
• Connecting to other sources
• Short paths
• High Degree Vertices
• Avoid for long path algorithms
55. Choosing the Right Framework
[Figure labels: Gremlin on GraphFrames; OLTP backed by DSE Graph; Built in Spark; We write it!; Search Built In!; Advanced Security]
56. Thanks for Listening
DataStax Academy Graph Course
https://academy.datastax.com/resources/ds330-datastax-enterprise-graph
Try out DataStax Enterprise!
https://academy.datastax.com/quick-downloads
Apache TinkerPop
http://tinkerpop.apache.org/
GraphFrames Link
https://graphframes.github.io/