Presentation of the Gradoop Framework at the Flink & Neo4j Meetup in Berlin (http://www.meetup.com/graphdb-berlin/events/228576494/). The talk covers the extended property graph model, its operators and how they are implemented on top of Apache Flink. It also includes benchmark results on scalability and a demo involving Neo4j, Flink and Gradoop (see www.gradoop.com).
Gradoop: Scalable Graph Analytics with Apache Flink @ Flink & Neo4j Meetup Berlin
1. GRADOOP: Scalable Graph Analytics
with Apache Flink
Martin Junghanns @kc1s
Apache Flink and Neo4j Meetup Berlin
2. About the speaker and the team
André, PhD Student
Martin, PhD Student
Kevin, M.Sc. Student
Niklas, M.Sc. Student
Prof. Dr. Erhard Rahm, Database Chair
19. „Graphs can be analyzed“
Assuming a social network
• Heterogeneous data
1. Determine subgraph
• Apply graph transformation
2. Find communities
• Handle collections of graphs
3. Filter communities
• Aggregation, Selection
4. Find common subgraph
• Apply dedicated algorithm
24. „And let's not forget …“
26. „A framework and research platform for efficient,
distributed and domain independent management
and analytics of heterogeneous graph data.“
33. Extended Property Graph Model (EPGM)
38. Extended Property Graph Model
• Vertices and directed Edges
• Logical Graphs
• Identifiers
• Type Labels
• Properties
[Figure: example graph with five vertices (Person Alice, born 1984; Band Metallica, founded 1981; Person Bob; Person Eve; Band AC/DC, founded 1973), four likes edges with since properties (2014, 2013, 2015, 2014), one knows edge, and two logical graphs: 1|Community|interest:Heavy Metal and 2|Community|interest:Hard Rock]
98. Flink DataSet API
• DataSet := Distributed Collection of Data Objects
• Transformation := Operation on DataSets
• Flink Program := Composition of Transformations
[Figure: DataSets flow through Transformations into new DataSets; the whole dataflow forms a Flink Program]
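The three definitions above can be mimicked locally without a cluster. The following is only an analogy in plain Java and does not use the Flink API: a `List` stands in for a DataSet, a `Function` for a Transformation, and `andThen` for composing them into a program. The class name `DataSetSketch` and both transformations are made up for illustration.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Local stand-in for the slide's terms; this is NOT the Flink API.
// "DataSet"        -> a java.util.List
// "Transformation" -> a Function from one list to another
// "Program"        -> transformations chained with andThen
final class DataSetSketch {

  // Transformation 1: keep only the even numbers.
  static final Function<List<Integer>, List<Integer>> KEEP_EVEN =
      data -> data.stream().filter(x -> x % 2 == 0).collect(Collectors.toList());

  // Transformation 2: square every element.
  static final Function<List<Integer>, List<Integer>> SQUARE =
      data -> data.stream().map(x -> x * x).collect(Collectors.toList());

  // Program: the composition of both transformations.
  static final Function<List<Integer>, List<Integer>> PROGRAM = KEEP_EVEN.andThen(SQUARE);

  static List<Integer> run(List<Integer> input) {
    return PROGRAM.apply(input);
  }
}
```

Unlike this sketch, a Flink DataSet is partitioned across machines and its transformations are executed in parallel; only the compositional structure carries over.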
113. Graph Representation
[Figure: the example graph with its two logical graphs, mapped to the three data sets below]
DataSet<EPGMGraphHead>
Id  Label      Properties
1   Community  {interest:Heavy Metal}
2   Community  {interest:Hard Rock}
DataSet<EPGMVertex>
Id  Label   Properties                     Graphs
1   Person  {name:Alice, born:1984}        {1}
2   Band    {name:Metallica,founded:1981}  {1}
3   Person  {name:Bob}                     {1,2}
4   Band    {name:AC/DC,founded:1973}      {2}
5   Person  {name:Eve}                     {2}
DataSet<EPGMEdge>
Id  Label  Source  Target  Properties    Graphs
1   likes  1       2       {since:2014}  {1}
2   likes  3       2       {since:2013}  {1}
3   likes  3       4       {since:2015}  {2}
4   knows  3       5       {}            {2}
5   likes  5       4       {since:2014}  {2}
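To make the role of the Graphs column concrete, here is a minimal in-memory sketch of the three tables in plain Java. The record names and the helper methods `vertexIdsOf` and `edgeIdsOf` are hypothetical and are not Gradoop classes; the data is copied from the tables above.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical in-memory stand-ins for the three data sets on the slide;
// the record names are illustrative and not Gradoop's classes.
final class EpgmSketch {
  record GraphHead(long id, String label, Map<String, Object> properties) {}
  record Vertex(long id, String label, Map<String, Object> properties, Set<Long> graphs) {}
  record Edge(long id, String label, long source, long target,
              Map<String, Object> properties, Set<Long> graphs) {}

  static final List<GraphHead> GRAPH_HEADS = List.of(
      new GraphHead(1, "Community", Map.of("interest", "Heavy Metal")),
      new GraphHead(2, "Community", Map.of("interest", "Hard Rock")));

  static final List<Vertex> VERTICES = List.of(
      new Vertex(1, "Person", Map.of("name", "Alice", "born", 1984), Set.of(1L)),
      new Vertex(2, "Band", Map.of("name", "Metallica", "founded", 1981), Set.of(1L)),
      new Vertex(3, "Person", Map.of("name", "Bob"), Set.of(1L, 2L)),
      new Vertex(4, "Band", Map.of("name", "AC/DC", "founded", 1973), Set.of(2L)),
      new Vertex(5, "Person", Map.of("name", "Eve"), Set.of(2L)));

  static final List<Edge> EDGES = List.of(
      new Edge(1, "likes", 1, 2, Map.of("since", 2014), Set.of(1L)),
      new Edge(2, "likes", 3, 2, Map.of("since", 2013), Set.of(1L)),
      new Edge(3, "likes", 3, 4, Map.of("since", 2015), Set.of(2L)),
      new Edge(4, "knows", 3, 5, Map.of(), Set.of(2L)),
      new Edge(5, "likes", 5, 4, Map.of("since", 2014), Set.of(2L)));

  // Ids of all vertices whose Graphs set contains the given logical graph id.
  static List<Long> vertexIdsOf(long graphId) {
    return VERTICES.stream()
        .filter(v -> v.graphs().contains(graphId))
        .map(Vertex::id)
        .collect(Collectors.toList());
  }

  // Ids of all edges whose Graphs set contains the given logical graph id.
  static List<Long> edgeIdsOf(long graphId) {
    return EDGES.stream()
        .filter(e -> e.graphs().contains(graphId))
        .map(Edge::id)
        .collect(Collectors.toList());
  }
}
```

Because membership is stored on each vertex and edge, an element such as Bob (vertex 3) can belong to several logical graphs without being duplicated.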
119. Operator Implementation
[Figure: the example graph with logical graphs 1 (interest:Heavy Metal) and 2 (interest:Hard Rock), to which the Exclusion operator is applied]
Exclusion
// input: firstGraph (G[1]), secondGraph (G[2])
// id of the second graph, broadcast to the filters below
DataSet<GradoopId> graphId = secondGraph.getGraphHead()
  .map(new Id<G>());

// keep the vertices of the first graph that are not contained in the second graph
DataSet<V> newVertices = firstGraph.getVertices()
  .filter(new NotInGraphBroadCast<V>())
  .withBroadcastSet(graphId, GRAPH_ID);

// keep the edges that are not contained in the second graph
// and whose source and target vertices both remain
DataSet<E> newEdges = firstGraph.getEdges()
  .filter(new NotInGraphBroadCast<E>())
  .withBroadcastSet(graphId, GRAPH_ID)
  .join(newVertices)
  .where(new SourceId<E>()).equalTo(new Id<V>())
  .with(new LeftSide<E, V>())
  .join(newVertices)
  .where(new TargetId<E>()).equalTo(new Id<V>())
  .with(new LeftSide<E, V>());
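The logic of the operator can be followed on the example data without Flink. The sketch below is a plain-Java approximation, not the Gradoop implementation: the records, the `exclude` method and the `example` helper are hypothetical, the broadcast set is replaced by an ordinary parameter, and the two joins are replaced by a lookup in the set of remaining vertex ids.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// In-memory approximation of the Exclusion operator; names are illustrative.
final class ExclusionSketch {
  record Vertex(long id, Set<Long> graphs) {}
  record Edge(long id, long source, long target, Set<Long> graphs) {}
  record Result(List<Vertex> vertices, List<Edge> edges) {}

  static Result exclude(List<Vertex> firstVertices, List<Edge> firstEdges, long secondGraphId) {
    // Keep vertices of the first graph that are not members of the second graph.
    List<Vertex> newVertices = firstVertices.stream()
        .filter(v -> !v.graphs().contains(secondGraphId))
        .collect(Collectors.toList());

    Set<Long> remainingIds = newVertices.stream()
        .map(Vertex::id)
        .collect(Collectors.toSet());

    // Keep edges that are not members of the second graph and whose
    // source and target both survived (stands in for the two joins).
    List<Edge> newEdges = firstEdges.stream()
        .filter(e -> !e.graphs().contains(secondGraphId))
        .filter(e -> remainingIds.contains(e.source()) && remainingIds.contains(e.target()))
        .collect(Collectors.toList());

    return new Result(newVertices, newEdges);
  }

  // Vertices and edges of logical graph 1 from the example, excluded by graph 2.
  static Result example() {
    List<Vertex> vertices = List.of(
        new Vertex(1, Set.of(1L)),
        new Vertex(2, Set.of(1L)),
        new Vertex(3, Set.of(1L, 2L)));
    List<Edge> edges = List.of(
        new Edge(1, 1, 2, Set.of(1L)),
        new Edge(2, 3, 2, Set.of(1L)));
    return exclude(vertices, edges, 2L);
  }
}
```

On the example, Bob (vertex 3) is dropped because he is also in graph 2, and edge 2 is dropped with him because its source no longer exists.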
128. Operator Implementation – Exclusion
graphId = secondGraph.getGraphHead()
  Id  Label      Properties
  2   Community  {interest:Hard Rock}
.map(new Id<G>());
  Id
  2
newVertices = firstGraph.getVertices()
  Id  Label   Properties                     Graphs
  1   Person  {name:Alice}                   {1}
  2   Band    {name:Metallica,founded:1981}  {1}
  3   Person  {name:Bob}                     {1,2}
.filter(new NotInGraphBroadCast<V>())
.withBroadcastSet(graphId, GRAPH_ID);
  Id  Label   Properties                     Graphs
  1   Person  {name:Alice}                   {1}
  2   Band    {name:Metallica,founded:1981}  {1}
157. Social Network Benchmark
1. Extract subgraph containing only Persons and knows relations
2. Transform Persons to necessary information
3. Find communities using Label Propagation
4. Aggregate vertex count for each community
5. Select communities with more than 50K users
6. Combine large communities into a single graph
7. Group graph by Persons' location and gender
8. Aggregate vertex and edge count of grouped graph
http://www.ldbcouncil.org/
https://git.io/vgozj
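Steps 4 and 5 of the benchmark workflow (count vertices per community, then select the large ones) can be illustrated with a small plain-Java sketch. It does not use Gradoop or Flink. It assumes that label propagation has already produced a community label per vertex id, and the class and method names are hypothetical. The threshold is a parameter, so the 50K from the slide can be replaced by a small value for toy data.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Illustrative only: aggregation and selection over community labels.
final class CommunitySelectionSketch {

  // Step 4: count the vertices of each community.
  // Input: vertex id -> community label (assumed output of label propagation).
  static Map<String, Long> vertexCounts(Map<Long, String> communityOfVertex) {
    return communityOfVertex.values().stream()
        .collect(Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
  }

  // Step 5: select the communities with more than minSize vertices.
  static List<String> largeCommunities(Map<String, Long> counts, long minSize) {
    return counts.entrySet().stream()
        .filter(e -> e.getValue() > minSize)
        .map(Map.Entry::getKey)
        .sorted()
        .collect(Collectors.toList());
  }
}
```

For example, with vertices 1 and 2 in community "a" and vertex 3 in community "b", a threshold of 1 selects only "a".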