GRADOOP: Scalable Graph Analytics with Apache Flink
Martin Junghanns @kc1s
Apache Flink and Neo4j Meetup Berlin
About the speaker and the team
André (PhD Student)
Martin (PhD Student)
Kevin (M.Sc. Student)
Niklas (M.Sc. Student)
Prof. Dr. Erhard Rahm (Database Chair)
Motivation
"Graphs are everywhere"
Graph = (Vertices, Edges)
Graph = (Users, Followers), e.g. the users Alice, Bob, Eve, Dave, Carol, Mallory, Peggy and Trent
Graph = (Users, Friendships)
"Graphs are heterogeneous"
Graph = (Users ∪ Bands, Friendships ∪ Likes), e.g. the users Alice, Bob, Dave, Carol, Mallory and Peggy and the bands AC/DC and Metallica
"Graphs can be analyzed"
Graph = (Users ∪ Bands, Friendships ∪ Likes)
[Figure: the same graph with a computed numeric value attached to each vertex]
"Graphs can be analyzed"
Assuming a social network (heterogeneous data):
1. Determine subgraph: apply graph transformation
2. Find communities: handle collections of graphs
3. Filter communities: aggregation, selection
4. Find common subgraph: apply dedicated algorithm
"And let's not forget …"
"… Graphs are large"
GRADOOP: "A framework and research platform for efficient, distributed and domain-independent management and analytics of heterogeneous graph data."
High Level Architecture (layers from bottom to top)
• HDFS/YARN Cluster
• Apache HBase: Distributed Graph Store
• Apache Flink: Distributed Operator Execution
• Apache Flink: Operator Implementation
• Extended Property Graph Model (EPGM)
• Graph Analytical Language (GrALa)
Implementation: Java 7, 25K (33K) LOC, GPLv3
Apache Flink: GRADOOP as a third-party library
• Deployment: Local, Cluster (e.g. YARN), Cloud (e.g. EC2)
• Runtime: Streaming Dataflow Runtime
• APIs: DataSet (batch) and DataStream (stream)
• Libraries and integrations: HadoopMR, Table, Gelly, ML, Zeppelin, Cascading, MRQL, Dataflow, Storm, SAMOA, and the third-party library GRADOOP
• Data Storage (e.g. Files, HDFS, S3, JDBC, Kafka, …)
Extended Property Graph Model (EPGM)
Extended Property Graph Model
• Vertices and directed Edges
• Logical Graphs
• Identifiers
• Type Labels
• Properties
Example:
Logical graphs: 1|Community|interest:Heavy Metal (vertices 1, 2, 3; edges 1, 2) and 2|Community|interest:Hard Rock (vertices 3, 4, 5; edges 3, 4, 5)
Vertices: 1 Person {name: Alice, born: 1984}, 2 Band {name: Metallica, founded: 1981}, 3 Person {name: Bob}, 4 Band {name: AC/DC, founded: 1973}, 5 Person {name: Eve}
Edges: 1 likes (1 → 2) {since: 2014}, 2 likes (3 → 2) {since: 2013}, 3 likes (3 → 4) {since: 2015}, 4 knows (3 → 5), 5 likes (5 → 4) {since: 2014}
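To make the model concrete, here is a minimal in-memory sketch of the example above in plain Java. The records `GraphHead`, `Vertex` and `Edge`, the `long` ids and the helper methods are hypothetical illustrations, not Gradoop's classes. What it shows is that every vertex and edge stores the ids of the logical graphs it belongs to, so a logical graph is recovered by filtering on that set.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical plain-Java model of the EPGM example (not Gradoop's classes).
final class EpgmExample {
    record GraphHead(long id, String label, Map<String, Object> props) {}
    record Vertex(long id, String label, Map<String, Object> props, Set<Long> graphs) {}
    record Edge(long id, String label, long source, long target,
                Map<String, Object> props, Set<Long> graphs) {}

    static final List<GraphHead> HEADS = List.of(
        new GraphHead(1, "Community", Map.of("interest", "Heavy Metal")),
        new GraphHead(2, "Community", Map.of("interest", "Hard Rock")));

    // Each element lists the ids of the logical graphs that contain it.
    static final List<Vertex> VERTICES = List.of(
        new Vertex(1, "Person", Map.of("name", "Alice", "born", 1984), Set.of(1L)),
        new Vertex(2, "Band", Map.of("name", "Metallica", "founded", 1981), Set.of(1L)),
        new Vertex(3, "Person", Map.of("name", "Bob"), Set.of(1L, 2L)),
        new Vertex(4, "Band", Map.of("name", "AC/DC", "founded", 1973), Set.of(2L)),
        new Vertex(5, "Person", Map.of("name", "Eve"), Set.of(2L)));

    static final List<Edge> EDGES = List.of(
        new Edge(1, "likes", 1, 2, Map.of("since", 2014), Set.of(1L)),
        new Edge(2, "likes", 3, 2, Map.of("since", 2013), Set.of(1L)),
        new Edge(3, "likes", 3, 4, Map.of("since", 2015), Set.of(2L)),
        new Edge(4, "knows", 3, 5, Map.of(), Set.of(2L)),
        new Edge(5, "likes", 5, 4, Map.of("since", 2014), Set.of(2L)));

    // Ids of the vertices that belong to the given logical graph.
    static Set<Long> verticesInGraph(long graphId) {
        return VERTICES.stream()
            .filter(v -> v.graphs().contains(graphId))
            .map(Vertex::id)
            .collect(Collectors.toSet());
    }

    // Ids of the edges that belong to the given logical graph.
    static Set<Long> edgesInGraph(long graphId) {
        return EDGES.stream()
            .filter(e -> e.graphs().contains(graphId))
            .map(Edge::id)
            .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        for (GraphHead g : HEADS) {
            System.out.println(g.id() + "|" + g.label() + "|" + g.props()
                + " vertices=" + verticesInGraph(g.id()) + " edges=" + edgesInGraph(g.id()));
        }
    }
}
```

With this encoding, Bob (vertex 3) belongs to both communities without being copied.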
EPGM Operators
Basic Binary Operators
Each operator takes two logical graphs (here 1 and 2) and returns a new logical graph (here 3):
• Combination: the union of the vertex and edge sets
• Overlap: the intersection of the vertex and edge sets
• Exclusion: the vertices and edges of the first graph that are not contained in the second
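A minimal set-based sketch of the three operators, assuming a logical graph is reduced to a set of vertex ids plus a set of edges (hypothetical `Graph` and `Edge` records, not Gradoop's API). Beyond plain set semantics, the exclusion also drops edges whose source or target vertex was removed, so the result stays a valid graph. The data mirrors graphs 1 and 2 of the EPGM example.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical set-based sketch of Combination, Overlap and Exclusion.
final class BinaryOperators {
    record Edge(long id, long source, long target) {}
    record Graph(Set<Long> vertices, Set<Edge> edges) {}

    // Combination: union of vertices and of edges.
    static Graph combine(Graph a, Graph b) {
        Set<Long> v = new HashSet<>(a.vertices());
        v.addAll(b.vertices());
        Set<Edge> e = new HashSet<>(a.edges());
        e.addAll(b.edges());
        return new Graph(v, e);
    }

    // Overlap: intersection of vertices and of edges.
    static Graph overlap(Graph a, Graph b) {
        Set<Long> v = new HashSet<>(a.vertices());
        v.retainAll(b.vertices());
        Set<Edge> e = new HashSet<>(a.edges());
        e.retainAll(b.edges());
        return new Graph(v, e);
    }

    // Exclusion: elements of a that are not in b; edges also need both endpoints to survive.
    static Graph exclude(Graph a, Graph b) {
        Set<Long> v = new HashSet<>(a.vertices());
        v.removeAll(b.vertices());
        Set<Edge> e = a.edges().stream()
            .filter(x -> !b.edges().contains(x))
            .filter(x -> v.contains(x.source()) && v.contains(x.target()))
            .collect(Collectors.toSet());
        return new Graph(v, e);
    }

    public static void main(String[] args) {
        Graph g1 = new Graph(Set.of(1L, 2L, 3L), Set.of(new Edge(1, 1, 2), new Edge(2, 3, 2)));
        Graph g2 = new Graph(Set.of(3L, 4L, 5L),
            Set.of(new Edge(3, 3, 4), new Edge(4, 3, 5), new Edge(5, 5, 4)));
        System.out.println("combination: " + combine(g1, g2));
        System.out.println("overlap:     " + overlap(g1, g2));
        System.out.println("exclusion:   " + exclude(g1, g2));
    }
}
```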
Graph Aggregation
A user-defined aggregate function (UDF) computes a value from a logical graph and stores it as a property of the graph head:
• Counting the vertices turns graph 3 into 3 | vertexCount: 5
• Summing the revenues and subtracting the expenses over the vertex properties revenue:7000, expense:1000 and expense:1000 turns graph 3 into 3 | profit: 5000
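A minimal sketch of graph aggregation, assuming a graph is reduced to its head properties and a list of vertex property maps, and that an aggregate UDF is a plain `Function` whose result is written to the head under a chosen key. All names are hypothetical. It reproduces both examples: `vertexCount: 5` and `profit: 5000` (7000 - 1000 - 1000).

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: an aggregate UDF maps a graph to a value stored on its graph head.
final class GraphAggregation {
    record Graph(long id, Map<String, Object> headProps, List<Map<String, Long>> vertexProps) {}

    // Returns a copy of g whose head carries udf(g) under the given key.
    static Graph aggregate(Graph g, String key, Function<Graph, Long> udf) {
        Map<String, Object> head = new HashMap<>(g.headProps());
        head.put(key, udf.apply(g));
        return new Graph(g.id(), head, g.vertexProps());
    }

    // UDF 1: number of vertices.
    static long vertexCount(Graph g) {
        return g.vertexProps().size();
    }

    // UDF 2: sum of all revenues minus sum of all expenses.
    static long profit(Graph g) {
        long total = 0;
        for (Map<String, Long> p : g.vertexProps()) {
            total += p.getOrDefault("revenue", 0L) - p.getOrDefault("expense", 0L);
        }
        return total;
    }

    // Graph 3 with five vertices; two of them carry no revenue/expense property.
    static Graph example() {
        return new Graph(3, Map.of(), List.of(
            Map.of("revenue", 7000L), Map.of("expense", 1000L), Map.of("expense", 1000L),
            Map.of(), Map.of()));
    }

    public static void main(String[] args) {
        System.out.println(aggregate(example(), "vertexCount", GraphAggregation::vertexCount).headProps());
        System.out.println(aggregate(example(), "profit", GraphAggregation::profit).headProps());
    }
}
```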
Graph Transformation
UDFs rewrite the graph head, vertices and edges while the graph structure stays unchanged:
• Graph head: 3 | vertexCount: 5 becomes 3 | Community | vCount: 5
• Vertices: name:Alice becomes f_name:Alice, matching the existing f_name:Bob
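A minimal sketch of a graph transformation, assuming hypothetical `Element` and `Graph` records and two UDFs, one for the graph head and one applied to every vertex (edges are omitted for brevity). Ids are never touched, which is why the structure is preserved. It reproduces the example: the head gets the label `Community` and `vertexCount` is renamed to `vCount`, while the vertex key `name` becomes `f_name`.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

// Hypothetical sketch of a transformation with separate head and vertex UDFs.
final class GraphTransformation {
    record Element(long id, String label, Map<String, Object> props) {}
    record Graph(Element head, List<Element> vertices) {}

    static Graph transform(Graph g, UnaryOperator<Element> headUdf, UnaryOperator<Element> vertexUdf) {
        List<Element> vs = g.vertices().stream().map(vertexUdf).collect(Collectors.toList());
        return new Graph(headUdf.apply(g.head()), vs);
    }

    // Returns a copy of props in which key 'from' is renamed to 'to' (if present).
    static Map<String, Object> rename(Map<String, Object> props, String from, String to) {
        Map<String, Object> copy = new HashMap<>(props);
        if (copy.containsKey(from)) {
            copy.put(to, copy.remove(from));
        }
        return copy;
    }

    static Graph example() {
        // Labels other than "Community" are not shown on the slide, so they are left empty here.
        Graph input = new Graph(new Element(3, "", Map.of("vertexCount", 5)),
            List.of(new Element(1, "", Map.of("name", "Alice")),
                    new Element(2, "", Map.of("f_name", "Bob"))));
        return transform(input,
            h -> new Element(h.id(), "Community", rename(h.props(), "vertexCount", "vCount")),
            v -> new Element(v.id(), v.label(), rename(v.props(), "name", "f_name")));
    }

    public static void main(String[] args) {
        System.out.println(example());
    }
}
```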
Subgraph Extraction
UDF predicates on vertices, on edges, or on both decide which elements of graph 3 (vertices 1 to 5) form the resulting subgraph; the figure shows one result for each of the three variants.
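A minimal sketch of subgraph extraction, assuming hypothetical `Vertex`/`Edge` records and two `Predicate` UDFs. A vertex is kept if it satisfies the vertex predicate; an edge is kept only if it satisfies the edge predicate and both of its endpoints were kept, so the result is again a valid graph. The example keeps only the `Person` vertices and `knows` edges of the EPGM example.

```java
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical sketch of subgraph extraction with vertex and edge predicates.
final class SubgraphExtraction {
    record Vertex(long id, String label) {}
    record Edge(long id, String label, long source, long target) {}
    record Graph(List<Vertex> vertices, List<Edge> edges) {}

    static Graph subgraph(Graph g, Predicate<Vertex> vertexUdf, Predicate<Edge> edgeUdf) {
        List<Vertex> vs = g.vertices().stream().filter(vertexUdf).collect(Collectors.toList());
        Set<Long> kept = vs.stream().map(Vertex::id).collect(Collectors.toSet());
        List<Edge> es = g.edges().stream()
            .filter(edgeUdf)
            // drop edges that would dangle because an endpoint was filtered out
            .filter(e -> kept.contains(e.source()) && kept.contains(e.target()))
            .collect(Collectors.toList());
        return new Graph(vs, es);
    }

    static final Graph EXAMPLE = new Graph(
        List.of(new Vertex(1, "Person"), new Vertex(2, "Band"), new Vertex(3, "Person"),
                new Vertex(4, "Band"), new Vertex(5, "Person")),
        List.of(new Edge(1, "likes", 1, 2), new Edge(2, "likes", 3, 2), new Edge(3, "likes", 3, 4),
                new Edge(4, "knows", 3, 5), new Edge(5, "likes", 5, 4)));

    public static void main(String[] args) {
        System.out.println(subgraph(EXAMPLE,
            v -> v.label().equals("Person"), e -> e.label().equals("knows")));
    }
}
```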
Graph Pattern Matching
A pattern is matched against graph 3; every match becomes a new logical graph (here 4 and 5), so the result is a Graph Collection.
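A minimal sketch of pattern matching for one fixed pattern, a `Person` vertex with a `likes` edge to a `Band` vertex, assuming hypothetical records and a naive scan over all edges. It does not reproduce Gradoop's matching operator or pattern language; it only shows why the result is a collection: every match is returned as a small graph of its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical sketch: naive matching of (Person)-[likes]->(Band).
final class PatternMatching {
    record Vertex(long id, String label) {}
    record Edge(long id, String label, long source, long target) {}
    // One match = one new logical graph with its own vertices and edges.
    record Match(List<Vertex> vertices, List<Edge> edges) {}

    static List<Match> matchPersonLikesBand(List<Vertex> vertices, List<Edge> edges) {
        Map<Long, Vertex> byId = vertices.stream()
            .collect(Collectors.toMap(Vertex::id, Function.identity()));
        List<Match> collection = new ArrayList<>();
        for (Edge e : edges) {
            Vertex s = byId.get(e.source());
            Vertex t = byId.get(e.target());
            if (e.label().equals("likes") && s.label().equals("Person") && t.label().equals("Band")) {
                collection.add(new Match(List.of(s, t), List.of(e)));
            }
        }
        return collection;
    }

    static final List<Vertex> VERTICES = List.of(new Vertex(1, "Person"), new Vertex(2, "Band"),
        new Vertex(3, "Person"), new Vertex(4, "Band"), new Vertex(5, "Person"));
    static final List<Edge> EDGES = List.of(new Edge(1, "likes", 1, 2), new Edge(2, "likes", 3, 2),
        new Edge(3, "likes", 3, 4), new Edge(4, "knows", 3, 5), new Edge(5, "likes", 5, 4));

    public static void main(String[] args) {
        matchPersonLikesBand(VERTICES, EDGES).forEach(System.out::println);
    }
}
```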
Graph Grouping
Vertices of graph 3 that share the same values for the chosen grouping keys are merged into a super-vertex, and the edges between groups are merged into super-edges, producing the summary graph 4. With "+Aggregate", every super-vertex and super-edge additionally carries aggregates over its group, e.g. count:2 or max(a) over the property a.
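A minimal sketch of grouping with `count` as the only aggregate, assuming (hypothetically) that the grouping key is the vertex label. Vertices with the same key collapse into one super-vertex; edges collapse into super-edges keyed by source group, edge label and target group. Applied to the EPGM example, this yields a `Person` and a `Band` super-vertex.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical sketch: group by vertex label and count vertices and edges per group.
final class GraphGrouping {
    record Vertex(long id, String label) {}
    record Edge(long id, String label, long source, long target) {}

    // super-vertex key (label) -> number of grouped vertices
    static Map<String, Long> groupVertices(List<Vertex> vs) {
        return vs.stream().collect(
            Collectors.groupingBy(Vertex::label, TreeMap::new, Collectors.counting()));
    }

    // super-edge key "sourceGroup-edgeLabel->targetGroup" -> number of grouped edges
    static Map<String, Long> groupEdges(List<Vertex> vs, List<Edge> es) {
        Map<Long, String> group = vs.stream().collect(Collectors.toMap(Vertex::id, Vertex::label));
        return es.stream().collect(Collectors.groupingBy(
            e -> group.get(e.source()) + "-" + e.label() + "->" + group.get(e.target()),
            TreeMap::new, Collectors.counting()));
    }

    static final List<Vertex> VERTICES = List.of(new Vertex(1, "Person"), new Vertex(2, "Band"),
        new Vertex(3, "Person"), new Vertex(4, "Band"), new Vertex(5, "Person"));
    static final List<Edge> EDGES = List.of(new Edge(1, "likes", 1, 2), new Edge(2, "likes", 3, 2),
        new Edge(3, "likes", 3, 4), new Edge(4, "knows", 3, 5), new Edge(5, "likes", 5, 4));

    public static void main(String[] args) {
        System.out.println(groupVertices(VERTICES));
        System.out.println(groupEdges(VERTICES, EDGES));
    }
}
```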
Apply (e.g. Aggregation)
Apply runs a unary graph operator on every graph of a collection. Aggregating the profit (revenues minus expenses) of each graph in the collection of graphs 1, 2 and 3 yields:
• 1 | profit: 5000 (revenue:7000, expense:1000, expense:1000)
• 2 | profit: -1000 (revenue:2000, expense:3000)
• 3 | profit: 3000 (revenue:4000, expense:1000)
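A minimal sketch of `Apply`, assuming each graph is reduced to the property maps of its vertices and that the applied operator is the profit aggregation (hypothetical records and names). The point is that the same unary operator runs independently on every graph of the collection; here the results are simply collected per graph id.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.ToLongFunction;

// Hypothetical sketch: apply one graph-level aggregation to every graph of a collection.
final class ApplyOperator {
    record Graph(long id, List<Map<String, Long>> vertexProps) {}

    // Profit of one graph: revenues minus expenses over its vertices.
    static long profit(Graph g) {
        return g.vertexProps().stream()
            .mapToLong(p -> p.getOrDefault("revenue", 0L) - p.getOrDefault("expense", 0L))
            .sum();
    }

    // Apply: run op on each graph independently; result maps graph id -> aggregate.
    static Map<Long, Long> apply(List<Graph> collection, ToLongFunction<Graph> op) {
        Map<Long, Long> heads = new TreeMap<>();
        for (Graph g : collection) {
            heads.put(g.id(), op.applyAsLong(g));
        }
        return heads;
    }

    static final List<Graph> COLLECTION = List.of(
        new Graph(1, List.of(Map.of("revenue", 7000L), Map.of("expense", 1000L), Map.of("expense", 1000L))),
        new Graph(2, List.of(Map.of("revenue", 2000L), Map.of("expense", 3000L))),
        new Graph(3, List.of(Map.of("revenue", 4000L), Map.of("expense", 1000L))));

    public static void main(String[] args) {
        System.out.println(apply(COLLECTION, ApplyOperator::profit));
    }
}
```

Because each graph is processed on its own, the operator parallelizes naturally over the graphs of a collection.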
Selection
A UDF predicate on the graph heads, here profit > 0, selects graphs from a collection: graphs 1 (profit: 5000) and 3 (profit: 3000) are kept, graph 2 (profit: -1000) is removed together with its elements.
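A minimal sketch of `Selection`, assuming each graph head already carries the profit from the Apply example (hypothetical `GraphHead` record). The UDF is the predicate `profit > 0`; the sketch returns the ids of the graphs that survive.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical sketch: keep only the graphs whose head satisfies the predicate.
final class GraphSelection {
    record GraphHead(long id, long profit) {}

    static List<Long> select(List<GraphHead> collection, Predicate<GraphHead> udf) {
        return collection.stream().filter(udf).map(GraphHead::id).collect(Collectors.toList());
    }

    static final List<GraphHead> HEADS = List.of(
        new GraphHead(1, 5000), new GraphHead(2, -1000), new GraphHead(3, 3000));

    public static void main(String[] args) {
        System.out.println(select(HEADS, h -> h.profit() > 0));
    }
}
```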
Call (e.g. Clustering)
An algorithm is called on logical graph 1 and returns a graph collection, here the clusters 2, 3 and 4.
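The slide does not name the clustering algorithm. As a clearly swapped-in stand-in, the sketch below computes weakly connected components with union-find; like the call on the slide, it turns one graph into a collection of vertex groups, one per component. Real community detection (e.g. label propagation) would split densely connected graphs further. Vertex ids are assumed to be `0..n-1`, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical stand-in for clustering: weakly connected components via union-find.
final class ComponentClustering {
    private static int find(int[] parent, int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]]; // path halving keeps the trees shallow
            x = parent[x];
        }
        return x;
    }

    // Returns the vertex sets of the weakly connected components of an n-vertex graph.
    static List<TreeSet<Integer>> components(int n, int[][] edges) {
        int[] parent = new int[n];
        for (int i = 0; i < n; i++) {
            parent[i] = i;
        }
        for (int[] e : edges) {
            int a = find(parent, e[0]);
            int b = find(parent, e[1]);
            if (a != b) {
                parent[a] = b; // merge the two components
            }
        }
        Map<Integer, TreeSet<Integer>> groups = new TreeMap<>();
        for (int v = 0; v < n; v++) {
            groups.computeIfAbsent(find(parent, v), k -> new TreeSet<>()).add(v);
        }
        return new ArrayList<>(groups.values());
    }

    public static void main(String[] args) {
        System.out.println(components(6, new int[][] {{0, 1}, {1, 2}, {3, 4}}));
    }
}
```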
Call (e.g. PageRank)
An algorithm is called on logical graph 1 and returns a new logical graph 2 in which every vertex carries its rank as a property (e.g. rank:5.12, rank:2.47, rank:0.11).
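A minimal power-iteration sketch of PageRank, assuming a damping factor of 0.85, a fixed number of iterations, and that vertices without outgoing edges spread their rank uniformly. These are common textbook choices, not necessarily those of the implementation called on the slide, whose values (e.g. rank:5.12) are evidently not normalized to sum to 1. Vertex ids are assumed to be `0..n-1`.

```java
import java.util.Arrays;

// Hypothetical sketch: normalized PageRank by power iteration.
final class PageRank {
    // edges[i] = {source, target}; returns one rank per vertex, summing to 1.
    static double[] pageRank(int n, int[][] edges, double damping, int iterations) {
        int[] outDegree = new int[n];
        for (int[] e : edges) {
            outDegree[e[0]]++;
        }
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n);
        for (int it = 0; it < iterations; it++) {
            double dangling = 0;
            for (int v = 0; v < n; v++) {
                if (outDegree[v] == 0) {
                    dangling += rank[v];
                }
            }
            double[] next = new double[n];
            // teleport share plus the rank of dangling vertices, spread uniformly
            Arrays.fill(next, (1 - damping) / n + damping * dangling / n);
            for (int[] e : edges) {
                next[e[1]] += damping * rank[e[0]] / outDegree[e[0]];
            }
            rank = next;
        }
        return rank;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(pageRank(4, new int[][] {{1, 0}, {2, 0}, {3, 0}}, 0.85, 50)));
    }
}
```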
EPGM Operators Overview
• Unary operators on a LogicalGraph: Aggregation, Pattern Matching, Transformation, Grouping, Subgraph, Call
• Unary operators on a GraphCollection: Selection, Distinct, Sort, Limit, Apply, Reduce, Call
• Binary operators on LogicalGraphs: Combination, Overlap, Exclusion, Equality
• Binary operators on GraphCollections: Union, Intersection, Difference, Equality
• Algorithms: Flink Gelly Library, BTG Extraction, Frequent Subgraphs, Adaptive Partitioning
EPGM on Apache Flink
Flink DataSet API
• DataSet := Distributed Collection of Data Objects
• Transformation := Operation on DataSets
• Flink Program := Composition of Transformations
[Figure: Transformations consume DataSets and produce new DataSets; the composed dataflow forms a Flink Program, and each DataSet is distributed across parallel partitions]
Graph Representation
• EPGMGraphHead (POJO): Id, Label, Properties → DataSet<EPGMGraphHead>
• EPGMVertex (POJO): Id, Label, Properties, Graphs → DataSet<EPGMVertex>
• EPGMEdge (POJO): Id, Label, Properties, SourceId, TargetId, Graphs → DataSet<EPGMEdge>
Field types:
• Id: GradoopId := UUID (128-bit)
• Label: String
• Properties: PropertyList := List<Property>, Property := (String, PropertyValue), PropertyValue := byte[]
• Graphs: GradoopIdSet := Set<GradoopId>
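The slide defines `GradoopId` as a 128-bit UUID and `PropertyValue` as a `byte[]`. The sketch below illustrates that idea with `java.util.UUID` and a hypothetical encoding: one type-tag byte followed by the payload, with only `Integer` and `String` supported. Gradoop's actual serialization format is not reproduced here.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Hypothetical sketch of a byte[]-backed property value and a UUID-based identifier.
final class PropertyValueSketch {
    static final byte TYPE_INT = 1;    // hypothetical type tags
    static final byte TYPE_STRING = 2;

    // Serializes a value as [type tag][payload].
    static byte[] encode(Object value) {
        if (value instanceof Integer i) {
            return ByteBuffer.allocate(5).put(TYPE_INT).putInt(i).array();
        }
        if (value instanceof String s) {
            byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
            return ByteBuffer.allocate(1 + utf8.length).put(TYPE_STRING).put(utf8).array();
        }
        throw new IllegalArgumentException("unsupported type: " + value.getClass());
    }

    // Reads the type tag and decodes the payload accordingly.
    static Object decode(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        byte type = buf.get();
        if (type == TYPE_INT) {
            return buf.getInt();
        }
        if (type == TYPE_STRING) {
            return new String(bytes, 1, bytes.length - 1, StandardCharsets.UTF_8);
        }
        throw new IllegalArgumentException("unknown type tag: " + type);
    }

    // A GradoopId-like identifier: a random 128-bit UUID.
    static UUID newId() {
        return UUID.randomUUID();
    }

    public static void main(String[] args) {
        System.out.println(newId() + " born=" + decode(encode(1984)) + " name=" + decode(encode("Alice")));
    }
}
```

Storing values as raw bytes keeps the POJOs compact and uniform for Flink's serializers, at the cost of decoding when a property is read.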
Graph Representation: the example as DataSets

DataSet<EPGMGraphHead>
Id  Label      Properties
1   Community  {interest:Heavy Metal}
2   Community  {interest:Hard Rock}

DataSet<EPGMVertex>
Id  Label   Properties                      Graphs
1   Person  {name:Alice, born:1984}         {1}
2   Band    {name:Metallica,founded:1981}   {1}
3   Person  {name:Bob}                      {1,2}
4   Band    {name:AC/DC,founded:1973}       {2}
5   Person  {name:Eve}                      {2}

DataSet<EPGMEdge>
Id  Label  Source  Target  Properties    Graphs
1   likes  1       2       {since:2014}  {1}
2   likes  3       2       {since:2013}  {1}
3   likes  3       4       {since:2015}  {2}
4   knows  3       5       {}            {2}
5   likes  5       4       {since:2014}  {2}
Flink DataSet Transformations
SQL-like Transformations
• filter
• project
• cross
• union
• distinct
• first-N (limit)
• groupBy
• aggregate
• join
• leftOuterJoin
• rightOuterJoin
• fullOuterJoin
Hadoop-like Transformations
• map
• flatMap
• mapPartition
• reduce
• reduceGroup
• coGroup
Special Flink Operations
• iterate
• iterateDelta
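As a hedged analogy for the SQL-like transformations, the sketch below expresses `filter`, `join` and `groupBy` + `aggregate` (count) with `java.util.stream` over the example's vertices and edges: it counts the `likes` edges per band. Unlike a Flink `DataSet` program, it runs in a single JVM and is not distributed; the records and the method name are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical single-JVM analogy of a filter -> join -> groupBy/aggregate dataflow.
final class DataSetAnalogy {
    record Vertex(long id, String label, String name) {}
    record Edge(long id, String label, long source, long target) {}

    static final List<Vertex> VERTICES = List.of(
        new Vertex(1, "Person", "Alice"), new Vertex(2, "Band", "Metallica"),
        new Vertex(3, "Person", "Bob"), new Vertex(4, "Band", "AC/DC"), new Vertex(5, "Person", "Eve"));
    static final List<Edge> EDGES = List.of(
        new Edge(1, "likes", 1, 2), new Edge(2, "likes", 3, 2), new Edge(3, "likes", 3, 4),
        new Edge(4, "knows", 3, 5), new Edge(5, "likes", 5, 4));

    static Map<String, Long> likesPerBand() {
        // build side of a hash join: vertex id -> vertex
        Map<Long, Vertex> byId = VERTICES.stream()
            .collect(Collectors.toMap(Vertex::id, Function.identity()));
        return EDGES.stream()
            .filter(e -> e.label().equals("likes"))   // filter
            .map(e -> byId.get(e.target()))           // join on edge.target == vertex.id
            .collect(Collectors.groupingBy(           // groupBy band name + aggregate(count)
                Vertex::name, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(likesPerBand());
    }
}
```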
Operator Implementation
Exclusion of logical graph 2 from logical graph 1 (EPGM example):

// input: firstGraph (G[1]), secondGraph (G[2])
// G, V, E: graph head, vertex and edge types; GRAPH_ID: name of the broadcast set
// ids of the graph heads of the second graph
DataSet<GradoopId> graphId = secondGraph.getGraphHead()
  .map(new Id<G>());

// vertices of the first graph that are not contained in the second graph
DataSet<V> newVertices = firstGraph.getVertices()
  .filter(new NotInGraphBroadCast<V>())
  .withBroadcastSet(graphId, GRAPH_ID);

// edges of the first graph that are not contained in the second graph
// and whose source and target vertices both survived
DataSet<E> newEdges = firstGraph.getEdges()
  .filter(new NotInGraphBroadCast<E>())
  .withBroadcastSet(graphId, GRAPH_ID)
  .join(newVertices)
  .where(new SourceId<E>()).equalTo(new Id<V>())
  .with(new LeftSide<E, V>())
  .join(newVertices)
  .where(new TargetId<E>()).equalTo(new Id<V>())
  .with(new LeftSide<E, V>());
Operator Implementation – Exclusion
Step 1: graphId = secondGraph.getGraphHead().map(new Id<G>());
Input (graph head of the second graph):
Id  Label      Properties
2   Community  {interest:Hard Rock}
Output:
Id
2

Step 2: newVertices = firstGraph.getVertices().filter(new NotInGraphBroadCast<V>()).withBroadcastSet(graphId, GRAPH_ID);
Input (vertices of the first graph):
Id  Label   Properties                      Graphs
1   Person  {name:Alice}                    {1}
2   Band    {name:Metallica,founded:1981}   {1}
3   Person  {name:Bob}                      {1,2}
Output (vertex 3 also belongs to graph 2 and is filtered out):
Id  Label   Properties                      Graphs
1   Person  {name:Alice}                    {1}
2   Band    {name:Metallica,founded:1981}   {1}
Operator Implementation – Exclusion (continued)
Step 3: newEdges = firstGraph.getEdges()
Input (edges of the first graph):
Id  Label  Source  Target  Properties    Graphs
1   likes  1       2       {since:2014}  {1}
2   likes  3       2       {since:2013}  {1}

.filter(new NotInGraphBroadCast<E>()).withBroadcastSet(graphId, GRAPH_ID)
Both edges remain, since neither belongs to graph 2.

.join(newVertices).where(new SourceId<E>()).equalTo(new Id<V>())
Id  Label  Source  Target  …  Id  Label   …
1   likes  1       2       …  1   Person  …
Edge 2 is dropped because its source, vertex 3, is not in newVertices.

.with(new LeftSide<E, V>())
Id  Label  Source  Target  …
1   likes  1       2       …

.join(newVertices).where(new TargetId<E>()).equalTo(new Id<V>())
Id  Label  Source  Target  …  Id  Label  …
1   likes  1       2       …  2   Band   …

.with(new LeftSide<E, V>());
Result:
Id  Label  Source  Target  …
1   likes  1       2       …
GrALa API
class LogicalGraph<G extends EPGMGraphHead,
V extends EPGMVertex,
E extends EPGMEdge> {
fromCollections(...) : LogicalGraph<G, V, E>
fromDataSets(...) : LogicalGraph<G, V, E>
fromGellyGraph(...) : LogicalGraph<G, V, E>
getGraphHead() : DataSet<G>
getVertices() : DataSet<V>
getEdges() : DataSet<E>
aggregate(...) : LogicalGraph<G, V, E>
match(...) : GraphCollection<G, V, E>
groupBy(...) : LogicalGraph<G, V, E>
subgraph(...) : LogicalGraph<G, V, E>
combine(...) : LogicalGraph<G, V, E>
// ...
}
class GraphCollection<G extends EPGMGraphHead,
V extends EPGMVertex,
E extends EPGMEdge> {
fromCollections(...) : GraphCollection<G, V, E>
fromDataSets(...) : GraphCollection<G, V, E>
getGraphHeads() : DataSet<G>
getVertices() : DataSet<V>
getEdges() : DataSet<E>
select(...) : GraphCollection<G, V, E>
distinct( ) : GraphCollection<G, V, E>
sortBy(...) : GraphCollection<G, V, E>
union(...) : GraphCollection<G, V, E>
difference(...) : GraphCollection<G, V, E>
// ...
}
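The listing shows that most operators return a new LogicalGraph or GraphCollection, so calls can be chained. A minimal in-memory sketch of that idea for `subgraph` and a vertex-count aggregate follows. `MiniLogicalGraph` and its members are hypothetical, and the rule that an edge needs both endpoints in the result is an assumption of this sketch, not a statement about Gradoop's `subgraph`.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical in-memory stand-in for LogicalGraph; not Gradoop's API.
final class MiniLogicalGraph {
    static final class Vertex {
        final long id;
        final String label;
        final Map<String, Object> properties;

        Vertex(long id, String label, Map<String, Object> properties) {
            this.id = id;
            this.label = label;
            this.properties = properties;
        }
    }

    static final class Edge {
        final long id;
        final String label;
        final long source;
        final long target;

        Edge(long id, String label, long source, long target) {
            this.id = id;
            this.label = label;
            this.source = source;
            this.target = target;
        }
    }

    final List<Vertex> vertices;
    final List<Edge> edges;

    MiniLogicalGraph(List<Vertex> vertices, List<Edge> edges) {
        this.vertices = vertices;
        this.edges = edges;
    }

    // Returns a new graph and leaves this one unchanged. An edge survives only
    // if it matches edgeFilter and both endpoints survived vertexFilter.
    MiniLogicalGraph subgraph(Predicate<Vertex> vertexFilter, Predicate<Edge> edgeFilter) {
        List<Vertex> keptVertices = vertices.stream().filter(vertexFilter).collect(Collectors.toList());
        Set<Long> keptIds = keptVertices.stream().map(v -> v.id).collect(Collectors.toSet());
        List<Edge> keptEdges = edges.stream()
            .filter(edgeFilter)
            .filter(e -> keptIds.contains(e.source) && keptIds.contains(e.target))
            .collect(Collectors.toList());
        return new MiniLogicalGraph(keptVertices, keptEdges);
    }

    // A simple aggregate over the graph: its vertex count.
    long vertexCount() {
        return vertices.size();
    }
}
```

For example, applying `subgraph(v -> v.label.equals("Person"), e -> e.label.equals("knows"))` to a graph with Alice, Bob and AC/DC keeps the two Persons and their `knows` edge, while the original graph still has three vertices.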
GrALa API
class EPGMDatabase<G extends EPGMGraphHead,
                   V extends EPGMVertex,
                   E extends EPGMEdge> {
  fromCollections(...) : EPGMDatabase<G, V, E>
  fromDataSets(...) : EPGMDatabase<G, V, E>
  fromHBase(...) : EPGMDatabase<G, V, E>
  fromJSON(...) : EPGMDatabase<G, V, E>
  fromExternalGraph(...) : EPGMDatabase<G, V, E>
  writeAsJSON(...) : void
  writeToHBase(...) : void
  getDatabaseGraph() : LogicalGraph<G, V, E>
  getGraphById(...) : LogicalGraph<G, V, E>
  getGraphsById(...) : GraphCollection<G, V, E>
  // ...
}
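In the edge tables of the GrALa example, each element carries the ids of the logical graphs that contain it (the Graphs column). Assuming EPGMDatabase resolves graphs through this membership, `getGraphById` and `getGraphsById` can be sketched as simple filters. `MiniEpgmDatabase`, `Element` and `Graph` are hypothetical; edges omit source and target, and unlike a real GraphCollection, the result of `getGraphsById` here merges the selected graphs and keeps no graph heads.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical in-memory stand-in; not Gradoop's EPGMDatabase.
final class MiniEpgmDatabase {
    // A vertex or edge that knows which logical graphs contain it.
    static final class Element {
        final long id;
        final String label;
        final Set<Long> graphIds;

        Element(long id, String label, Set<Long> graphIds) {
            this.id = id;
            this.label = label;
            this.graphIds = graphIds;
        }
    }

    // A graph here is just the vertices and edges selected from the database.
    static final class Graph {
        final List<Element> vertices;
        final List<Element> edges;

        Graph(List<Element> vertices, List<Element> edges) {
            this.vertices = vertices;
            this.edges = edges;
        }
    }

    private final List<Element> vertices;
    private final List<Element> edges;

    MiniEpgmDatabase(List<Element> vertices, List<Element> edges) {
        this.vertices = vertices;
        this.edges = edges;
    }

    // The database graph contains every vertex and edge.
    Graph getDatabaseGraph() {
        return new Graph(vertices, edges);
    }

    // Elements belonging to at least one requested graph; an element shared
    // by several requested graphs appears only once.
    Graph getGraphsById(Set<Long> ids) {
        return new Graph(select(vertices, ids), select(edges, ids));
    }

    Graph getGraphById(long id) {
        return getGraphsById(Set.of(id));
    }

    private static List<Element> select(List<Element> elements, Set<Long> ids) {
        return elements.stream()
            .filter(e -> e.graphIds.stream().anyMatch(ids::contains))
            .collect(Collectors.toList());
    }
}
```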
Performance
Social Network Benchmark
1. Extract the subgraph containing only Persons and knows relations
2. Transform Persons, keeping only the necessary information
3. Find communities using Label Propagation
4. Aggregate the vertex count of each community
5. Select communities with more than 50K users
6. Combine the large communities into a single graph
7. Group the graph by the Persons' location and gender
8. Aggregate the vertex and edge counts of the grouped graph
http://www.ldbcouncil.org/
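Steps 3 to 5 can be sketched on a single machine to show what they compute. This is not the benchmark code, which runs on Flink: the synchronous update rule, the smallest-label tie-break and the iteration cap are choices of this sketch (Gelly ships its own LabelPropagation, whose details may differ), the graph is a hypothetical adjacency map, and the 50K threshold becomes a parameter.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Single-machine sketch of steps 3 to 5; not Gradoop or Gelly code.
final class CommunitySketch {
    // Adds an undirected edge (knows relations treated as symmetric here).
    static void addEdge(Map<Integer, Set<Integer>> adj, int a, int b) {
        adj.computeIfAbsent(a, k -> new TreeSet<>()).add(b);
        adj.computeIfAbsent(b, k -> new TreeSet<>()).add(a);
    }

    // Step 3: synchronous label propagation. Every vertex starts with its own id
    // as label and repeatedly adopts the most frequent label among its
    // neighbours; ties go to the smallest label (a choice of this sketch).
    static Map<Integer, Integer> propagate(Map<Integer, Set<Integer>> adj, int maxIterations) {
        Map<Integer, Integer> labels = new TreeMap<>();
        for (Integer v : adj.keySet()) {
            labels.put(v, v);
        }
        for (int i = 0; i < maxIterations; i++) {
            Map<Integer, Integer> next = new TreeMap<>();
            for (Map.Entry<Integer, Set<Integer>> entry : adj.entrySet()) {
                Map<Integer, Integer> frequency = new HashMap<>();
                for (Integer neighbour : entry.getValue()) {
                    frequency.merge(labels.get(neighbour), 1, Integer::sum);
                }
                int best = labels.get(entry.getKey()); // kept if there are no neighbours
                int bestCount = 0;
                for (Map.Entry<Integer, Integer> f : frequency.entrySet()) {
                    int label = f.getKey();
                    int count = f.getValue();
                    if (count > bestCount || (count == bestCount && label < best)) {
                        best = label;
                        bestCount = count;
                    }
                }
                next.put(entry.getKey(), best);
            }
            if (next.equals(labels)) {
                break; // converged: no label changed
            }
            labels = next;
        }
        return labels;
    }

    // Step 4: number of vertices per community label.
    static Map<Integer, Integer> communitySizes(Map<Integer, Integer> labels) {
        Map<Integer, Integer> sizes = new TreeMap<>();
        for (Integer label : labels.values()) {
            sizes.merge(label, 1, Integer::sum);
        }
        return sizes;
    }

    // Step 5: communities with more than `threshold` members.
    static Set<Integer> selectLarge(Map<Integer, Integer> sizes, int threshold) {
        Set<Integer> large = new TreeSet<>();
        for (Map.Entry<Integer, Integer> e : sizes.entrySet()) {
            if (e.getValue() > threshold) {
                large.add(e.getKey());
            }
        }
        return large;
    }
}
```

On a graph made of a 4-clique {1, 2, 3, 4} and a triangle {5, 6, 7}, the labels converge to 1 and 5, the community sizes are 4 and 3, and `selectLarge(sizes, 3)` keeps only community 1.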
https://git.io/vgozj
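Steps 7 and 8 group the Persons by location and gender and count vertices and edges. A minimal sketch of such grouping (graph summarization): every distinct (city, gender) pair becomes one super-vertex with a member count, and every pair of connected groups becomes one super-edge with an edge count. `GroupingSketch`, `Person` and the `city` attribute are hypothetical stand-ins; this is not Gradoop's `groupBy`.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of grouping by vertex attributes; not Gradoop's groupBy.
final class GroupingSketch {
    static final class Person {
        final long id;
        final String city;
        final String gender;

        Person(long id, String city, String gender) {
            this.id = id;
            this.city = city;
            this.gender = gender;
        }
    }

    // Group key made of the grouping attributes, e.g. "Leipzig|female".
    static String key(Person p) {
        return p.city + "|" + p.gender;
    }

    // One super-vertex per group, carrying the number of member Persons.
    static Map<String, Integer> groupVertices(List<Person> persons) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Person p : persons) {
            counts.merge(key(p), 1, Integer::sum);
        }
        return counts;
    }

    // One super-edge per (source group, target group) pair, carrying the
    // number of original edges between them; edges are {sourceId, targetId}.
    static Map<String, Integer> groupEdges(List<Person> persons, long[][] edges) {
        Map<Long, String> groupOf = new HashMap<>();
        for (Person p : persons) {
            groupOf.put(p.id, key(p));
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (long[] e : edges) {
            counts.merge(groupOf.get(e[0]) + " -> " + groupOf.get(e[1]), 1, Integer::sum);
        }
        return counts;
    }
}
```

For example, two female Persons from Leipzig who know each other and both know a male Person from Berlin produce two groups, one super-edge inside the Leipzig group with count 1 and one super-edge from Leipzig to Berlin with count 2. The vertex and edge counts of step 8 are then the sizes of the two maps.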
Social Network Benchmark
Dataset             # Vertices         # Edges  Disk size
Graphalytics.1          61,613       2,026,082     570 MB
Graphalytics.10        260,613      16,600,778     4.5 GB
Graphalytics.100     1,695,613     147,437,275    40.2 GB
Graphalytics.1000   12,775,613   1,363,747,260     372 GB
Graphalytics.10000  90,025,613  10,872,109,028     2.9 TB
• 16 workers, each with an Intel(R) Xeon(R) 2.50GHz CPU, 6 cores (12 threads)
• 48 GB RAM per worker
• 1 Gigabit Ethernet
• Hadoop 2.6.0
• Flink 1.0-SNAPSHOT
• Task slots per worker: 12
• jobmanager.heap.mb: 2048
• taskmanager.heap.mb: 40960
Social Network Benchmark – Runtime
[Figures: runtime in seconds vs. number of workers (1, 2, 4, 8, 16) on Graphalytics.100; speedup vs. number of workers on Graphalytics.100, compared with linear speedup]
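The speedup chart compares the measured speedup with the linear ideal. Assuming the usual definitions, speedup on n workers is the one-worker runtime divided by the n-worker runtime, and parallel efficiency is speedup divided by n (1.0 on the linear curve). `SpeedupSketch` is a small hypothetical helper; the measured runtimes themselves are only shown in the chart.

```java
// Hypothetical helper, not code from the talk.
final class SpeedupSketch {
    // Speedup S(n) = T(1) / T(n): runtime on one worker over runtime on n workers.
    static double speedup(double runtimeOneWorker, double runtimeNWorkers) {
        if (runtimeOneWorker <= 0 || runtimeNWorkers <= 0) {
            throw new IllegalArgumentException("runtimes must be positive");
        }
        return runtimeOneWorker / runtimeNWorkers;
    }

    // Parallel efficiency E(n) = S(n) / n; 1.0 corresponds to the linear curve.
    static double efficiency(double runtimeOneWorker, double runtimeNWorkers, int workers) {
        if (workers <= 0) {
            throw new IllegalArgumentException("workers must be positive");
        }
        return speedup(runtimeOneWorker, runtimeNWorkers) / workers;
    }
}
```

For instance, a job taking 1000 s on one worker and 125 s on 16 workers has a speedup of 8 and an efficiency of 0.5 (illustrative numbers, not measurements from the talk).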
Social Network Benchmark – Speedup
[Figure: runtime in seconds on a logarithmic axis (1 to 10,000 s)]
Demo
https://github.com/s1ck/neo4j-gradoop-demos
Current State and Future Work
Current State – Operator Implementations
Operators, by input (LogicalGraph or GraphCollection) and arity (unary or binary):
• LogicalGraph, unary: Aggregation, Pattern Matching, Transformation, Grouping, Subgraph, Call
• LogicalGraph, binary: Combination, Overlap, Exclusion, Equality
• GraphCollection, unary: Selection, Distinct, Sort, Limit, Apply, Reduce, Call
• GraphCollection, binary: Union, Intersection, Difference, Equality
• Algorithms: Flink Gelly Library, BTG Extraction, Frequent Subgraphs, Adaptive Partitioning
Release History
• 0.0.1 First Prototype (May 2015)
– Hadoop MapReduce and Giraph for operator implementations
– Too much complexity
– Performance loss through serialization in HDFS/HBase
• 0.0.2 Using Flink as the execution layer (June 2015)
– Basic operators
• 0.1 (December 2015)
– System-side identifiers (UUID)
– Improved property handling
– More operator implementations (e.g., Equality, Boolean operators)
– Code refactoring
• 0.2-SNAPSHOT
– Graph Pattern Matching
– Frequent Subgraph Mining
– Memory optimization (96-bit ID, Dictionary Encoding, …)
– Tuple Implementation
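Dictionary encoding, listed among the memory optimizations, stores each distinct string once and lets elements refer to it by a small integer code. The sketch below assumes that labels or property keys are the encoded values and is not Gradoop's implementation; `LabelDictionary` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of dictionary encoding; not Gradoop's implementation.
final class LabelDictionary {
    private final Map<String, Integer> codes = new HashMap<>();
    private final List<String> values = new ArrayList<>();

    // Returns the code already assigned to `value`, or assigns the next free one.
    int encode(String value) {
        Integer code = codes.get(value);
        if (code == null) {
            code = values.size();
            codes.put(value, code);
            values.add(value);
        }
        return code;
    }

    // Maps a code back to its string.
    String decode(int code) {
        return values.get(code);
    }

    // Number of distinct strings stored.
    int size() {
        return values.size();
    }
}
```

Each element then stores an `int` instead of its own copy of a label such as "Person", which pays off when millions of vertices share a handful of labels.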
Contributions to Flink
• FLINK-2411 Add basic graph summarization algorithm
• FLINK-2590 DataSetUtils.zipWithUniqueID creates duplicate Ids
• FLINK-2905 Add intersect method to Graph class
• FLINK-2910 Combine tests for binary graph operators
• FLINK-2941 Implement a neo4j - Flink/Gelly connector
• FLINK-2981 Update README for building docs
• FLINK-3064 Missing size check in GroupReduceOperatorBase leads to NPE
• FLINK-3118 Check if MessageFunction implements ResultTypeQueryable
• FLINK-3122 Generalize value type in LabelPropagation
• FLINK-3272 Generalize vertex value type in ConnectedComponents
• Flink Forward (October 2015)
• Meetup Big Data Usergroup Saxony (December 2015)
• FOSDEM (January 2016)
Contributions Welcome
• Code
– Operator implementations / improvements
– Performance tuning
• People
– Bachelor / Master theses
– Open PhD positions in Leipzig, Germany
• Use Cases and (Big) Data!
Thank you!
www.gradoop.com
http://flink.apache.org
http://neo4j.com
http://ldbcouncil.org
https://github.com/s1ck/neo4j-gradoop-demos
https://github.com/s1ck/flink-neo4j
https://github.com/s1ck/ldbc-flink-import
https://github.com/s1ck/gdl

Gradoop: Scalable Graph Analytics with Apache Flink @ Flink & Neo4j Meetup Berlin

  • 1. GRADOOP: Scalable Graph Analytics with Apache Flink. Martin Junghanns (@kc1s), Apache Flink and Neo4j Meetup Berlin
  • 2. About the speaker and the team: André (PhD student), Martin (PhD student), Kevin (M.Sc. student), Niklas (M.Sc. student); Prof. Dr. Erhard Rahm (Database Chair)
  • 3. Motivation
  • 4. “Graphs are everywhere”: Graph = (Vertices, Edges)
  • 5. “Graphs are everywhere”: Graph = (Users, Followers), e.g. the users Alice, Bob, Eve, Dave, Carol, Mallory, Peggy and Trent
  • 6. “Graphs are everywhere”: Graph = (Users, Friendships), over the same users
  • 7. “Graphs are heterogeneous”: Graph = (Users ∪ Bands, Friendships ∪ Likes), e.g. the users Alice, Bob, Dave, Carol, Mallory and Peggy and the bands AC/DC and Metallica
  • 8-9. “Graphs can be analyzed”: Graph = (Users ∪ Bands, Friendships ∪ Likes). [Figure: the same graph annotated with computed numeric values]
  • 10-23. “Graphs can be analyzed”. Assuming a social network (heterogeneous data): 1. determine a subgraph (apply a graph transformation); 2. find communities (handle collections of graphs); 3. filter communities (aggregation, selection); 4. find a common subgraph (apply a dedicated algorithm).
  • 24. “And let's not forget …”
  • 25. “… graphs are large”
  • 26. GRADOOP: “A framework and research platform for efficient, distributed and domain-independent management and analytics of heterogeneous graph data.”
  • 27-31. High-level architecture, bottom to top: HDFS/YARN cluster; Apache HBase (distributed graph store); Apache Flink (distributed operator execution); operator implementation on Apache Flink; Extended Property Graph Model (EPGM); Graph Analytical Language (GrALa). Java 7, 25K (33K) LOC, GPLv3.
  • 32. The Apache Flink stack, with GRADOOP as a third-party library. [Figure: deployment (local, cluster e.g. YARN, cloud e.g. EC2); Streaming Dataflow Runtime; DataSet (batch) and DataStream (stream) APIs; libraries and layers on top, including Hadoop MR, Table, Gelly, ML, Zeppelin, Cascading, MRQL, Dataflow, Storm, SAMOA and GRADOOP; data storage, e.g. files, HDFS, S3, JDBC, Kafka, …]
  • 33. Extended Property Graph Model (EPGM)
  • 34-38. Extended Property Graph Model: vertices and directed edges; logical graphs; identifiers; type labels; properties. Example: two logical graphs, 1|Community|interest: Heavy Metal and 2|Community|interest: Hard Rock, over the vertices 1 Person {name: Alice, born: 1984}, 2 Band {name: Metallica, founded: 1981}, 3 Person {name: Bob}, 4 Band {name: AC/DC, founded: 1973} and 5 Person {name: Eve}, connected by the edges likes {since: 2014}, likes {since: 2013}, likes {since: 2015}, knows and likes {since: 2014}.
  • 39. EPGM Operators
  • 40-44. Basic binary operators: combination, overlap and exclusion of two logical graphs. [Figure: logical graphs 1 and 2 over vertices 1-5; each operator yields a new logical graph 3]
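
A minimal plain-Java sketch of combination and overlap, assuming a logical graph can be reduced to the sets of vertex and edge ids it contains. `BinaryOps` and `GraphView` are hypothetical names, not Gradoop classes, and the id of the resulting graph (3 in the figure) is not modelled.

```java
import java.util.HashSet;
import java.util.Set;

// Minimal sketch, not Gradoop's implementation: a logical graph is reduced
// to the ids of the vertices and edges it contains.
class BinaryOps {
  record GraphView(Set<Long> vertexIds, Set<Long> edgeIds) {}

  // Combination: every vertex and edge that occurs in at least one of the graphs.
  static GraphView combine(GraphView a, GraphView b) {
    Set<Long> vertices = new HashSet<>(a.vertexIds());
    vertices.addAll(b.vertexIds());
    Set<Long> edges = new HashSet<>(a.edgeIds());
    edges.addAll(b.edgeIds());
    return new GraphView(vertices, edges);
  }

  // Overlap: only the vertices and edges that occur in both graphs.
  static GraphView overlap(GraphView a, GraphView b) {
    Set<Long> vertices = new HashSet<>(a.vertexIds());
    vertices.retainAll(b.vertexIds());
    Set<Long> edges = new HashSet<>(a.edgeIds());
    edges.retainAll(b.edgeIds());
    return new GraphView(vertices, edges);
  }

  public static void main(String[] args) {
    // Hypothetical input modelled on the EPGM example:
    // graph 1 = vertices {1, 2, 3}, graph 2 = vertices {3, 4, 5}.
    GraphView g1 = new GraphView(Set.of(1L, 2L, 3L), Set.of(1L, 2L));
    GraphView g2 = new GraphView(Set.of(3L, 4L, 5L), Set.of(3L, 4L, 5L));
    System.out.println("combination vertices: " + combine(g1, g2).vertexIds());
    System.out.println("overlap vertices: " + overlap(g1, g2).vertexIds());
  }
}
```

Exclusion needs one extra step (dropping edges whose endpoints were removed) and is walked through in the operator implementation slides.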
  • 45-51. Graph aggregation: a user-defined function (UDF) computes a value from a logical graph and stores it as a property of the graph, e.g. vertexCount: 5 for graph 3 with five vertices, or profit: 5000 for the same graph whose elements carry revenue: 7000, expense: 1000 and expense: 1000.
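
The two aggregation UDFs from the figure can be sketched in plain Java. The `Vertex` record and which vertex carries which revenue or expense value are assumptions for illustration; only the results (vertexCount: 5, profit: 7000 - 1000 - 1000 = 5000) come from the slide.

```java
import java.util.List;
import java.util.Map;

class GraphAggregation {
  // Hypothetical simplified vertex: id plus numeric properties.
  record Vertex(long id, Map<String, Long> properties) {}

  // UDF 1: number of vertices in the graph.
  static long vertexCount(List<Vertex> vertices) {
    return vertices.size();
  }

  // UDF 2: sum of all "revenue" values minus sum of all "expense" values.
  static long profit(List<Vertex> vertices) {
    long revenue = 0;
    long expense = 0;
    for (Vertex v : vertices) {
      revenue += v.properties().getOrDefault("revenue", 0L);
      expense += v.properties().getOrDefault("expense", 0L);
    }
    return revenue - expense;
  }

  public static void main(String[] args) {
    // Graph 3: five vertices, three of them with revenue/expense values (assignment assumed).
    List<Vertex> graph3 = List.of(
        new Vertex(1, Map.of("revenue", 7000L)),
        new Vertex(2, Map.of("expense", 1000L)),
        new Vertex(3, Map.of("expense", 1000L)),
        new Vertex(4, Map.of()),
        new Vertex(5, Map.of()));
    // In the EPGM the results would be stored as properties of the graph head.
    System.out.println("vertexCount: " + vertexCount(graph3)); // 5
    System.out.println("profit: " + profit(graph3));           // 5000
  }
}
```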
  • 52-54. Graph transformation: UDFs rewrite the graph head, vertices and edges of a logical graph while the vertex ids stay the same. Example: graph 3 | vertexCount: 5 becomes 3 | Community | vCount: 5, and the vertex property name: Alice becomes f_name: Alice, matching the existing f_name: Bob.
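
A sketch of the transformation in the example, using simplified `Element` and `Graph` records (hypothetical, not Gradoop's EPGM classes): one UDF relabels the graph head as Community and renames vertexCount to vCount, another renames the vertex property name to f_name. The vertex labels in `main` are made up.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

class GraphTransformation {
  // Simplified element: id, label and properties (hypothetical stand-in for EPGM elements).
  record Element(long id, String label, Map<String, Object> properties) {}
  record Graph(Element head, List<Element> vertices) {}

  // Returns a copy of the properties with key `from` renamed to `to`, if present.
  static Map<String, Object> rename(Map<String, Object> properties, String from, String to) {
    Map<String, Object> copy = new HashMap<>(properties);
    if (copy.containsKey(from)) {
      copy.put(to, copy.remove(from));
    }
    return copy;
  }

  // Applies one UDF to the graph head and one to every vertex; the structure
  // is unchanged as long as the UDFs keep the element ids.
  static Graph transform(Graph graph, UnaryOperator<Element> headUdf, UnaryOperator<Element> vertexUdf) {
    return new Graph(headUdf.apply(graph.head()), graph.vertices().stream().map(vertexUdf).toList());
  }

  public static void main(String[] args) {
    Graph input = new Graph(
        new Element(3, "", Map.of("vertexCount", 5)), // unlabeled graph head
        List.of(new Element(1, "Person", Map.of("name", "Alice")),
                new Element(2, "Person", Map.of("f_name", "Bob"))));
    Graph output = transform(input,
        h -> new Element(h.id(), "Community", rename(h.properties(), "vertexCount", "vCount")),
        v -> new Element(v.id(), v.label(), rename(v.properties(), "name", "f_name")));
    System.out.println(output);
  }
}
```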
  • 55-62. Subgraph extraction: UDFs (predicates) on vertices and edges select which elements of a logical graph form a new logical graph. [Figure: three extraction examples applied to graph 3 over vertices 1-5]
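
A plain-Java sketch of subgraph extraction with a vertex UDF and an edge UDF, applied to the running EPGM example. Dropping edges whose endpoints were filtered out keeps the result a valid graph; the record types and method names are hypothetical.

```java
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class SubgraphExtraction {
  record Vertex(long id, String label) {}
  record Edge(long id, String label, long sourceId, long targetId) {}
  record Graph(List<Vertex> vertices, List<Edge> edges) {}

  static Graph subgraph(Graph graph, Predicate<Vertex> vertexUdf, Predicate<Edge> edgeUdf) {
    List<Vertex> vertices = graph.vertices().stream().filter(vertexUdf).toList();
    Set<Long> kept = vertices.stream().map(Vertex::id).collect(Collectors.toSet());
    // An edge survives only if it passes the edge UDF and both endpoints survived.
    List<Edge> edges = graph.edges().stream()
        .filter(edgeUdf)
        .filter(e -> kept.contains(e.sourceId()) && kept.contains(e.targetId()))
        .toList();
    return new Graph(vertices, edges);
  }

  // Running EPGM example (vertex and edge tables from the graph representation slides).
  static Graph example() {
    return new Graph(
        List.of(new Vertex(1, "Person"), new Vertex(2, "Band"), new Vertex(3, "Person"),
                new Vertex(4, "Band"), new Vertex(5, "Person")),
        List.of(new Edge(1, "likes", 1, 2), new Edge(2, "likes", 3, 2), new Edge(3, "likes", 3, 4),
                new Edge(4, "knows", 3, 5), new Edge(5, "likes", 5, 4)));
  }

  public static void main(String[] args) {
    // Persons only, any edge: vertices 1, 3, 5 and the knows edge 4 remain.
    System.out.println(subgraph(example(), v -> v.label().equals("Person"), e -> true));
  }
}
```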
  • 63-67. Graph pattern matching: a pattern is matched against a logical graph, and the result is a graph collection. [Figure: the pattern matches twice in graph 3, yielding logical graphs 4 and 5]
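
Real pattern matching supports arbitrary patterns. The sketch below only matches a single-edge pattern `(:Person)-[:likes]->(:Band)` against the running EPGM example and returns one hypothetical `Match` per result graph; all names are illustrative.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class PatternMatching {
  record Vertex(long id, String label) {}
  record Edge(long id, String label, long sourceId, long targetId) {}
  // One match = one logical graph of the result collection (two vertices, one edge).
  record Match(long sourceId, long edgeId, long targetId) {}

  // Running EPGM example: vertices and edges from the graph representation tables.
  static final List<Vertex> VERTICES = List.of(new Vertex(1, "Person"), new Vertex(2, "Band"),
      new Vertex(3, "Person"), new Vertex(4, "Band"), new Vertex(5, "Person"));
  static final List<Edge> EDGES = List.of(new Edge(1, "likes", 1, 2), new Edge(2, "likes", 3, 2),
      new Edge(3, "likes", 3, 4), new Edge(4, "knows", 3, 5), new Edge(5, "likes", 5, 4));

  // Matches (:sourceLabel)-[:edgeLabel]->(:targetLabel).
  // Assumes every edge endpoint occurs in `vertices`.
  static List<Match> match(List<Vertex> vertices, List<Edge> edges,
                           String sourceLabel, String edgeLabel, String targetLabel) {
    Map<Long, Vertex> byId = vertices.stream()
        .collect(Collectors.toMap(Vertex::id, Function.identity()));
    return edges.stream()
        .filter(e -> e.label().equals(edgeLabel))
        .filter(e -> byId.get(e.sourceId()).label().equals(sourceLabel))
        .filter(e -> byId.get(e.targetId()).label().equals(targetLabel))
        .map(e -> new Match(e.sourceId(), e.id(), e.targetId()))
        .toList();
  }

  public static void main(String[] args) {
    // Edges 1, 2, 3 and 5 match; the knows edge 4 does not.
    match(VERTICES, EDGES, "Person", "likes", "Band").forEach(System.out::println);
  }
}
```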
  • 68-74. Graph grouping: vertices and edges of a logical graph are grouped by key properties into the super-vertices and super-edges of a new logical graph, and aggregate functions summarize each group. [Figure: graph 3, whose elements carry values a:23, a:84, a:42, a:12, a:13 and a:21, is grouped into graph 4 with super-vertices 6 and 7, each annotated with count: 2 and max(a) values such as 42 and 84]
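
A sketch of vertex grouping with count and max(a) as aggregates. The grouping key (the vertex label) and how the figure's a values are assigned to vertices are assumptions; super-edges would be built analogously by grouping edges on the keys of their source and target super-vertices, which the sketch leaves out.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class GraphGrouping {
  record Vertex(long id, String label, long a) {}
  // One super-vertex per distinct key, with count and max(a) aggregates.
  record SuperVertex(String key, long count, long maxA) {}

  static List<SuperVertex> groupVertices(List<Vertex> vertices) {
    Map<String, long[]> acc = new TreeMap<>(); // key -> {count, max(a)}, sorted by key
    for (Vertex v : vertices) {
      long[] agg = acc.computeIfAbsent(v.label(), k -> new long[] {0, Long.MIN_VALUE});
      agg[0]++;
      agg[1] = Math.max(agg[1], v.a());
    }
    return acc.entrySet().stream()
        .map(e -> new SuperVertex(e.getKey(), e.getValue()[0], e.getValue()[1]))
        .toList();
  }

  public static void main(String[] args) {
    // Hypothetical assignment of the values a:23, a:84, a:42, a:12 to two labels.
    List<Vertex> vertices = List.of(new Vertex(1, "Person", 23), new Vertex(2, "Band", 84),
        new Vertex(3, "Person", 42), new Vertex(4, "Band", 12));
    groupVertices(vertices).forEach(System.out::println);
  }
}
```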
  • 75-78. Apply (e.g. aggregation): a unary operator is applied to every logical graph of a collection. Here the profit aggregation runs on graphs 1, 2 and 3 (vertices 0-12) and yields 1 | profit: 5000 (revenue: 7000, expense: 1000, expense: 1000), 2 | profit: -1000 (revenue: 2000, expense: 3000) and 3 | profit: 3000 (revenue: 4000, expense: 1000).
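
Apply runs the same unary operator on every graph of a collection. The sketch reproduces the profit values from the slide; `ApplyAggregation`, the vertex ids and the split of revenue and expense values across vertices are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ApplyAggregation {
  // Hypothetical vertex; revenue and expense are 0 when a vertex carries neither.
  record Vertex(long id, long revenue, long expense) {}
  record Graph(long id, List<Vertex> vertices) {}

  // Runs the profit aggregation on every graph of the collection and returns
  // graph id -> profit (the slide stores it as a graph head property).
  static Map<Long, Long> applyProfit(List<Graph> collection) {
    Map<Long, Long> profits = new LinkedHashMap<>();
    for (Graph g : collection) {
      long profit = g.vertices().stream().mapToLong(v -> v.revenue() - v.expense()).sum();
      profits.put(g.id(), profit);
    }
    return profits;
  }

  static List<Graph> example() {
    return List.of(
        new Graph(1, List.of(new Vertex(0, 7000, 0), new Vertex(1, 0, 1000), new Vertex(2, 0, 1000))),
        new Graph(2, List.of(new Vertex(5, 2000, 0), new Vertex(6, 0, 3000))),
        new Graph(3, List.of(new Vertex(9, 4000, 0), new Vertex(10, 0, 1000))));
  }

  public static void main(String[] args) {
    System.out.println(applyProfit(example()));
  }
}
```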
  • 79-82. Selection: a UDF predicate, here profit > 0, keeps only the matching graphs of a collection. Of graphs 1 (profit: 5000), 2 (profit: -1000) and 3 (profit: 3000), graphs 1 and 3 are selected together with their vertices and edges.
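
Selection keeps the graphs whose graph head satisfies a predicate. This sketch filters graph heads only, with a hypothetical `GraphHead` record; the vertices and edges that belong solely to a dropped graph (here graph 2) would also have to be removed, which is not shown.

```java
import java.util.List;
import java.util.function.Predicate;

class GraphSelection {
  // Hypothetical graph head carrying the profit computed by the previous aggregation.
  record GraphHead(long id, long profit) {}

  static List<GraphHead> select(List<GraphHead> collection, Predicate<GraphHead> udf) {
    return collection.stream().filter(udf).toList();
  }

  public static void main(String[] args) {
    List<GraphHead> collection = List.of(
        new GraphHead(1, 5000), new GraphHead(2, -1000), new GraphHead(3, 3000));
    // UDF from the slide: profit > 0 keeps graphs 1 and 3.
    System.out.println(select(collection, g -> g.profit() > 0));
  }
}
```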
  • 83-86. Call (e.g. clustering): an algorithm is called on logical graph 1 (vertices 0-12) and returns a graph collection with one logical graph per cluster (graphs 2, 3 and 4).
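
The slide does not name the clustering algorithm. As a stand-in, this sketch computes weakly connected components with union-find and returns one vertex set per component, mirroring a call that yields one logical graph per cluster. The edges in `main` are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class CallClustering {
  record Edge(int source, int target) {}

  // Stand-in "clustering": weakly connected components via union-find.
  // Each component corresponds to one logical graph of the returned collection.
  static List<Set<Integer>> components(int vertexCount, List<Edge> edges) {
    int[] parent = new int[vertexCount];
    for (int v = 0; v < vertexCount; v++) {
      parent[v] = v;
    }
    for (Edge e : edges) {
      parent[find(parent, e.source())] = find(parent, e.target()); // union
    }
    Map<Integer, Set<Integer>> byRoot = new TreeMap<>();
    for (int v = 0; v < vertexCount; v++) {
      byRoot.computeIfAbsent(find(parent, v), root -> new TreeSet<>()).add(v);
    }
    return new ArrayList<>(byRoot.values());
  }

  // Finds the representative of v, halving the path on the way.
  static int find(int[] parent, int v) {
    while (parent[v] != v) {
      parent[v] = parent[parent[v]];
      v = parent[v];
    }
    return v;
  }

  public static void main(String[] args) {
    // Hypothetical edges over vertices 0..12 forming three groups.
    List<Edge> edges = List.of(new Edge(0, 1), new Edge(1, 2), new Edge(2, 3), new Edge(3, 4),
        new Edge(5, 6), new Edge(6, 7), new Edge(7, 8),
        new Edge(9, 10), new Edge(10, 11), new Edge(11, 12));
    System.out.println(components(13, edges));
  }
}
```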
  • 87-90. Call (e.g. PageRank): an algorithm is called on logical graph 1 (vertices 0-12) and returns logical graph 2, in which every vertex carries a rank property (values between rank: 0.11 and rank: 5.12 in the example).
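
A minimal power-iteration PageRank sketch in plain Java. The damping factor 0.85, the iteration count and the example graph are assumptions; the ranks here sum to 1, whereas other implementations may scale them differently.

```java
import java.util.Arrays;
import java.util.List;

class PageRankSketch {
  record Edge(int source, int target) {}

  // Power iteration with damping factor d:
  // rank(v) = (1 - d) / n + d * (dangling / n + sum over edges u->v of rank(u) / outDegree(u)),
  // where dangling is the rank held by vertices without outgoing edges.
  static double[] pageRank(int n, List<Edge> edges, double d, int iterations) {
    int[] outDegree = new int[n];
    for (Edge e : edges) {
      outDegree[e.source()]++;
    }
    double[] rank = new double[n];
    Arrays.fill(rank, 1.0 / n);
    for (int it = 0; it < iterations; it++) {
      double dangling = 0;
      for (int v = 0; v < n; v++) {
        if (outDegree[v] == 0) {
          dangling += rank[v];
        }
      }
      double[] next = new double[n];
      Arrays.fill(next, (1 - d) / n + d * dangling / n);
      for (Edge e : edges) {
        next[e.target()] += d * rank[e.source()] / outDegree[e.source()];
      }
      rank = next;
    }
    return rank;
  }

  public static void main(String[] args) {
    // Hypothetical graph: 0 -> 1 -> 2 -> 0, plus 3 -> 0 (vertex 3 has no incoming edges).
    List<Edge> edges = List.of(new Edge(0, 1), new Edge(1, 2), new Edge(2, 0), new Edge(3, 0));
    System.out.println(Arrays.toString(pageRank(4, edges, 0.85, 50)));
  }
}
```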
  • 91-93. EPGM operators overview. Unary operators on a logical graph: aggregation, pattern matching, transformation, grouping, subgraph, call. Binary operators on logical graphs: combination, overlap, exclusion, equality. Unary operators on a graph collection: selection, distinct, sort, limit, apply, reduce, call. Binary operators on graph collections: union, intersection, difference, equality. Algorithms: Flink Gelly library, BTG extraction, frequent subgraphs, adaptive partitioning.
  • 94. EPGM on Apache Flink
  • 95-99. Flink DataSet API: a DataSet is a distributed collection of data objects; a transformation is an operation on DataSets; a Flink program is a composition of transformations. [Figure: a Flink program as a dataflow of DataSets and transformations, with each DataSet distributed across parallel instances]
  • 100-108. Graph representation: each element type is a POJO stored in its own DataSet. EPGMGraphHead (Id, Label, Properties) in DataSet<EPGMGraphHead>; EPGMVertex (Id, Label, Properties, Graphs) in DataSet<EPGMVertex>; EPGMEdge (Id, Label, Properties, SourceId, TargetId, Graphs) in DataSet<EPGMEdge>. Field types: Id is a GradoopId := UUID (128-bit); Label is a String; Properties is a PropertyList := List<Property>, where Property := (String, PropertyValue) and PropertyValue := byte[]; Graphs is a GradoopIdSet := Set<GradoopId>.
  • 109-113. Graph representation of the EPGM example, one DataSet per element type:
    DataSet<EPGMGraphHead>
      Id  Label      Properties
      1   Community  {interest: Heavy Metal}
      2   Community  {interest: Hard Rock}
    DataSet<EPGMVertex>
      Id  Label   Properties                       Graphs
      1   Person  {name: Alice, born: 1984}        {1}
      2   Band    {name: Metallica, founded: 1981} {1}
      3   Person  {name: Bob}                      {1,2}
      4   Band    {name: AC/DC, founded: 1973}     {2}
      5   Person  {name: Eve}                      {2}
    DataSet<EPGMEdge>
      Id  Label  Source  Target  Properties     Graphs
      1   likes  1       2       {since: 2014}  {1}
      2   likes  3       2       {since: 2013}  {1}
      3   likes  3       4       {since: 2015}  {2}
      4   knows  3       5       {}             {2}
      5   likes  5       4       {since: 2014}  {2}
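
The three DataSets can be mirrored with plain Java records and lists, which makes the graph-membership encoding easy to query. The record names are hypothetical stand-ins for EPGMGraphHead, EPGMVertex and EPGMEdge, using `long` ids instead of 128-bit GradoopIds and a `Map` instead of a PropertyList of byte[] values.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

class EpgmExample {
  // Simplified stand-ins for EPGMGraphHead, EPGMVertex and EPGMEdge.
  record GraphHead(long id, String label, Map<String, Object> properties) {}
  record Vertex(long id, String label, Map<String, Object> properties, Set<Long> graphs) {}
  record Edge(long id, String label, long sourceId, long targetId,
              Map<String, Object> properties, Set<Long> graphs) {}

  static final List<GraphHead> GRAPH_HEADS = List.of(
      new GraphHead(1, "Community", Map.of("interest", "Heavy Metal")),
      new GraphHead(2, "Community", Map.of("interest", "Hard Rock")));

  static final List<Vertex> VERTICES = List.of(
      new Vertex(1, "Person", Map.of("name", "Alice", "born", 1984), Set.of(1L)),
      new Vertex(2, "Band", Map.of("name", "Metallica", "founded", 1981), Set.of(1L)),
      new Vertex(3, "Person", Map.of("name", "Bob"), Set.of(1L, 2L)),
      new Vertex(4, "Band", Map.of("name", "AC/DC", "founded", 1973), Set.of(2L)),
      new Vertex(5, "Person", Map.of("name", "Eve"), Set.of(2L)));

  static final List<Edge> EDGES = List.of(
      new Edge(1, "likes", 1, 2, Map.of("since", 2014), Set.of(1L)),
      new Edge(2, "likes", 3, 2, Map.of("since", 2013), Set.of(1L)),
      new Edge(3, "likes", 3, 4, Map.of("since", 2015), Set.of(2L)),
      new Edge(4, "knows", 3, 5, Map.of(), Set.of(2L)),
      new Edge(5, "likes", 5, 4, Map.of("since", 2014), Set.of(2L)));

  // A logical graph is the set of elements whose `graphs` field contains its id.
  static List<Long> vertexIdsOf(long graphId) {
    return VERTICES.stream().filter(v -> v.graphs().contains(graphId)).map(Vertex::id).toList();
  }

  static List<Long> edgeIdsOf(long graphId) {
    return EDGES.stream().filter(e -> e.graphs().contains(graphId)).map(Edge::id).toList();
  }

  public static void main(String[] args) {
    for (GraphHead head : GRAPH_HEADS) {
      System.out.println(head + " vertices=" + vertexIdsOf(head.id()) + " edges=" + edgeIdsOf(head.id()));
    }
  }
}
```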
  • 114-116. Flink DataSet transformations. SQL-like: filter, project, cross, union, distinct, first-N (limit), groupBy, aggregate, join, leftOuterJoin, rightOuterJoin, fullOuterJoin. Hadoop-like: map, flatMap, mapPartition, reduce, reduceGroup, coGroup. Special Flink operations: iterate, iterateDelta.
  • 117-119. Operator implementation: exclusion of graph 2 from graph 1 in the EPGM example, written with the Flink DataSet API:
    // input: firstGraph (G[1]), secondGraph (G[2])
    // id of the graph to exclude
    DataSet<GradoopId> graphId = secondGraph.getGraphHead()
      .map(new Id<G>());

    // vertices of the first graph that are not contained in the second graph
    DataSet<V> newVertices = firstGraph.getVertices()
      .filter(new NotInGraphBroadCast<V>())
      .withBroadcastSet(graphId, GRAPH_ID);

    // such edges, kept only if both their source and target vertex survived
    DataSet<E> newEdges = firstGraph.getEdges()
      .filter(new NotInGraphBroadCast<E>())
      .withBroadcastSet(graphId, GRAPH_ID)
      .join(newVertices)
      .where(new SourceId<E>()).equalTo(new Id<V>())
      .with(new LeftSide<E, V>())
      .join(newVertices)
      .where(new TargetId<E>()).equalTo(new Id<V>())
      .with(new LeftSide<E, V>());
  • 120-128. Exclusion implementation, vertices: graphId = secondGraph.getGraphHead().map(new Id<G>()) maps the graph head 2 | Community | {interest: Hard Rock} to its id, 2. newVertices = firstGraph.getVertices().filter(new NotInGraphBroadCast<V>()).withBroadcastSet(graphId, GRAPH_ID) makes that id available to every parallel filter instance as a broadcast set and keeps only the vertices of graph 1 that are not contained in graph 2: of 1 Person {name: Alice} {1}, 2 Band {name: Metallica, founded: 1981} {1} and 3 Person {name: Bob} {1,2}, vertices 1 and 2 remain.
  • 129. Operator Implementation – Exclusion Apache Flink and Neo4j Meetup Berlin 56
  • 130. Operator Implementation – Exclusion Apache Flink and Neo4j Meetup Berlin 56 newEdges = firstGraph.getEdges()
  • 131. Operator Implementation – Exclusion Apache Flink and Neo4j Meetup Berlin 56 newEdges = firstGraph.getEdges() Id Label Source Target Properties Graphs 1 likes 1 2 {since:2014} {1} 2 likes 3 2 {since:2013} {1}
  • 132. Operator Implementation – Exclusion Apache Flink and Neo4j Meetup Berlin 56 newEdges = firstGraph.getEdges() Id Label Source Target Properties Graphs 1 likes 1 2 {since:2014} {1} 2 likes 3 2 {since:2013} {1} .filter(new NotInGraphBroadCast<E>()) .withBroadcastSet(graphId, GRAPH_ID)
  • 133. Operator Implementation – Exclusion Apache Flink and Neo4j Meetup Berlin 56 newEdges = firstGraph.getEdges() Id Label Source Target Properties Graphs 1 likes 1 2 {since:2014} {1} 2 likes 3 2 {since:2013} {1} Id Label Source Target Properties Graphs 1 likes 1 2 {since:2014} {1} 2 likes 3 2 {since:2013} {1} .filter(new NotInGraphBroadCast<E>()) .withBroadcastSet(graphId, GRAPH_ID)
  • 134. Operator Implementation – Exclusion Apache Flink and Neo4j Meetup Berlin 56 newEdges = firstGraph.getEdges() Id Label Source Target Properties Graphs 1 likes 1 2 {since:2014} {1} 2 likes 3 2 {since:2013} {1} Id Label Source Target Properties Graphs 1 likes 1 2 {since:2014} {1} 2 likes 3 2 {since:2013} {1} .join(newVertices) .where(new SourceId<E>().equalTo(new Id<V>()) .filter(new NotInGraphBroadCast<E>()) .withBroadcastSet(graphId, GRAPH_ID)
  • 135. Operator Implementation – Exclusion Apache Flink and Neo4j Meetup Berlin 56 newEdges = firstGraph.getEdges() Id Label Source Target Properties Graphs 1 likes 1 2 {since:2014} {1} 2 likes 3 2 {since:2013} {1} Id Label Source Target Properties Graphs 1 likes 1 2 {since:2014} {1} 2 likes 3 2 {since:2013} {1} Id Label Source Target … Id Label … 1 likes 1 2 … 1 Person … .join(newVertices) .where(new SourceId<E>().equalTo(new Id<V>()) .filter(new NotInGraphBroadCast<E>()) .withBroadcastSet(graphId, GRAPH_ID)
  • 136. Operator Implementation – Exclusion Apache Flink and Neo4j Meetup Berlin 56 newEdges = firstGraph.getEdges() Id Label Source Target Properties Graphs 1 likes 1 2 {since:2014} {1} 2 likes 3 2 {since:2013} {1} Id Label Source Target Properties Graphs 1 likes 1 2 {since:2014} {1} 2 likes 3 2 {since:2013} {1} Id Label Source Target … Id Label … 1 likes 1 2 … 1 Person … .with(new LeftSide<E, V>()) .join(newVertices) .where(new SourceId<E>().equalTo(new Id<V>()) .filter(new NotInGraphBroadCast<E>()) .withBroadcastSet(graphId, GRAPH_ID)
  • 137. Operator Implementation – Exclusion Apache Flink and Neo4j Meetup Berlin 56 newEdges = firstGraph.getEdges() Id Label Source Target Properties Graphs 1 likes 1 2 {since:2014} {1} 2 likes 3 2 {since:2013} {1} Id Label Source Target Properties Graphs 1 likes 1 2 {since:2014} {1} 2 likes 3 2 {since:2013} {1} Id Label Source Target … Id Label … 1 likes 1 2 … 1 Person … Id Label Source Target … 1 likes 1 2 … .with(new LeftSide<E, V>()) .join(newVertices) .where(new SourceId<E>().equalTo(new Id<V>()) .filter(new NotInGraphBroadCast<E>()) .withBroadcastSet(graphId, GRAPH_ID)
  • 141. Operator Implementation – Exclusion
    newEdges = firstGraph.getEdges()
        .filter(new NotInGraphBroadCast<E>())
        .withBroadcastSet(graphId, GRAPH_ID)
        .join(newVertices)
        .where(new SourceId<E>()).equalTo(new Id<V>())
        .with(new LeftSide<E, V>())
        .join(newVertices)
        .where(new TargetId<E>()).equalTo(new Id<V>())
        .with(new LeftSide<E, V>());
    Edges of the first graph (the filter keeps both):
      Id | Label | Source | Target | Properties   | Graphs
      1  | likes | 1      | 2      | {since:2014} | {1}
      2  | likes | 3      | 2      | {since:2013} | {1}
    Join on the source id: edge 1 is paired with vertex 1 (Person); edge 2 (source 3) has no partner in newVertices and is dropped. LeftSide keeps edge 1.
    Join on the target id: edge 1 is paired with vertex 2 (Band). LeftSide keeps edge 1, which is the only edge in newEdges.
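The Exclusion dataflow above can be mimicked on in-memory lists to check what it returns. This is a minimal plain-Java sketch, not Gradoop code: the `Vertex`/`Edge` records, the `graphs` membership set and the method names are hypothetical stand-ins for the EPGM elements, `NotInGraphBroadCast`, the two joins and `LeftSide`. Vertex 3 (label assumed to be Person) is assumed to belong to both graphs, which is why edge 2 disappears.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class ExclusionSketch {

  // Hypothetical stand-ins for EPGM elements; graphs = ids of the logical graphs containing the element.
  record Vertex(long id, String label, Set<Long> graphs) {}

  record Edge(long id, String label, long source, long target, Set<Long> graphs) {}

  // newVertices: vertices of the first graph that are not part of the second graph.
  static List<Vertex> excludeVertices(List<Vertex> firstVertices, long secondGraphId) {
    return firstVertices.stream()
        .filter(v -> !v.graphs().contains(secondGraphId))
        .collect(Collectors.toList());
  }

  // newEdges: mirrors filter -> join on source id -> join on target id;
  // keeping only the edge after each join plays the role of LeftSide.
  static List<Edge> excludeEdges(List<Edge> firstEdges, List<Vertex> newVertices, long secondGraphId) {
    Set<Long> vertexIds = newVertices.stream().map(Vertex::id).collect(Collectors.toSet());
    return firstEdges.stream()
        .filter(e -> !e.graphs().contains(secondGraphId)) // like NotInGraphBroadCast
        .filter(e -> vertexIds.contains(e.source()))      // like the join on SourceId
        .filter(e -> vertexIds.contains(e.target()))      // like the join on TargetId
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    // Edges and vertices 1, 2 as on the slide; vertex 3 and its membership in graph 2 are assumptions.
    List<Vertex> vertices = List.of(
        new Vertex(1, "Person", Set.of(1L)),
        new Vertex(2, "Band", Set.of(1L)),
        new Vertex(3, "Person", Set.of(1L, 2L)));
    List<Edge> edges = List.of(
        new Edge(1, "likes", 1, 2, Set.of(1L)),
        new Edge(2, "likes", 3, 2, Set.of(1L)));

    List<Vertex> newVertices = excludeVertices(vertices, 2L);
    List<Edge> newEdges = excludeEdges(edges, newVertices, 2L);
    System.out.println(newEdges); // only edge 1 (Person 1 likes Band 2) remains
  }
}
```

Sequential filters are enough here because the sketch only needs membership tests; the Flink version uses joins because `newVertices` is itself a distributed `DataSet`.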
  • 144. GrALa API
      class LogicalGraph<G extends EPGMGraphHead, V extends EPGMVertex, E extends EPGMEdge> {
        fromCollections(...)  : LogicalGraph<G, V, E>
        fromDataSets(...)     : LogicalGraph<G, V, E>
        fromGellyGraph(...)   : LogicalGraph<G, V, E>
        getGraphHead()        : DataSet<G>
        getVertices()         : DataSet<V>
        getEdges()            : DataSet<E>
        aggregate(...)        : LogicalGraph<G, V, E>
        match(...)            : GraphCollection<G, V, E>
        groupBy(...)          : LogicalGraph<G, V, E>
        subgraph(...)         : LogicalGraph<G, V, E>
        combine(...)          : LogicalGraph<G, V, E>
        // ...
      }
      class GraphCollection<G extends EPGMGraphHead, V extends EPGMVertex, E extends EPGMEdge> {
        fromCollections(...)  : GraphCollection<G, V, E>
        fromDataSets(...)     : GraphCollection<G, V, E>
        getGraphHeads()       : DataSet<G>
        getVertices()         : DataSet<V>
        getEdges()            : DataSet<E>
        select(...)           : GraphCollection<G, V, E>
        distinct()            : GraphCollection<G, V, E>
        sortBy(...)           : GraphCollection<G, V, E>
        union(...)            : GraphCollection<G, V, E>
        difference(...)       : GraphCollection<G, V, E>
        // ...
      }
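At the collection level, `union` and `difference` (and intersection, listed later among the operators) can be read as set operations on graph-head ids, with vertices and edges following the graphs they belong to. A minimal sketch under that assumption; `CollectionSetOps` and its methods are hypothetical and not part of GrALa.

```java
import java.util.LinkedHashSet;
import java.util.Set;

class CollectionSetOps {

  // A collection is reduced to its graph-head ids. Union: graphs contained in a or b.
  static Set<Long> union(Set<Long> a, Set<Long> b) {
    Set<Long> result = new LinkedHashSet<>(a);
    result.addAll(b);
    return result;
  }

  // Intersection: graphs contained in both a and b.
  static Set<Long> intersection(Set<Long> a, Set<Long> b) {
    Set<Long> result = new LinkedHashSet<>(a);
    result.retainAll(b);
    return result;
  }

  // Difference: graphs contained in a but not in b.
  static Set<Long> difference(Set<Long> a, Set<Long> b) {
    Set<Long> result = new LinkedHashSet<>(a);
    result.removeAll(b);
    return result;
  }

  // A vertex or edge stays in the result if it belongs to at least one result graph.
  static boolean keepElement(Set<Long> elementGraphs, Set<Long> resultGraphIds) {
    return elementGraphs.stream().anyMatch(resultGraphIds::contains);
  }

  public static void main(String[] args) {
    Set<Long> first = Set.of(1L, 2L);
    Set<Long> second = Set.of(2L, 3L);
    System.out.println(union(first, second));        // contains 1, 2 and 3
    System.out.println(intersection(first, second)); // [2]
    System.out.println(difference(first, second));   // [1]
  }
}
```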
  • 145. GrALa API
      class EPGMDatabase<G extends EPGMGraphHead, V extends EPGMVertex, E extends EPGMEdge> {
        fromCollections(...)    : EPGMDatabase<G, V, E>
        fromDataSets(...)       : EPGMDatabase<G, V, E>
        fromHBase(...)          : EPGMDatabase<G, V, E>
        fromJSON(...)           : EPGMDatabase<G, V, E>
        fromExternalGraph(...)  : EPGMDatabase<G, V, E>
        writeAsJSON(...)        : void
        writeToHBase(...)       : void
        getDatabaseGraph()      : LogicalGraph<G, V, E>
        getGraphById(...)       : LogicalGraph<G, V, E>
        getGraphsById(...)      : GraphCollection<G, V, E>
        // ...
      }
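In the EPGM, every vertex and edge records the ids of the logical graphs it belongs to (the `Graphs` column in the Exclusion tables). Under that model, a lookup like `getGraphById` can be pictured as a membership filter over all elements of the database. The sketch assumes exactly that; `DatabaseSketch`, its records and `graphById` are hypothetical, and Gradoop itself evaluates this with Flink transformations rather than on in-memory lists.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class DatabaseSketch {

  record Vertex(long id, Set<Long> graphs) {}

  record Edge(long id, long source, long target, Set<Long> graphs) {}

  record Graph(long id, List<Vertex> vertices, List<Edge> edges) {}

  // A logical graph = all vertices and edges whose membership set contains its id.
  static Graph graphById(List<Vertex> vertices, List<Edge> edges, long graphId) {
    List<Vertex> vs = vertices.stream()
        .filter(v -> v.graphs().contains(graphId))
        .collect(Collectors.toList());
    List<Edge> es = edges.stream()
        .filter(e -> e.graphs().contains(graphId))
        .collect(Collectors.toList());
    return new Graph(graphId, vs, es);
  }

  public static void main(String[] args) {
    // Hypothetical database with two overlapping logical graphs (ids 1 and 2).
    List<Vertex> vertices = List.of(
        new Vertex(1, Set.of(1L)), new Vertex(2, Set.of(1L, 2L)), new Vertex(3, Set.of(2L)));
    List<Edge> edges = List.of(
        new Edge(1, 1, 2, Set.of(1L)), new Edge(2, 3, 2, Set.of(2L)));
    Graph g2 = graphById(vertices, edges, 2L);
    System.out.println("vertices=" + g2.vertices().size() + ", edges=" + g2.edges().size()); // vertices=2, edges=1
  }
}
```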
  • 147. Performance
  • 157. Social Network Benchmark
    1. Extract the subgraph containing only Persons and knows relations
    2. Transform Persons, keeping only the information needed later
    3. Find communities using Label Propagation
    4. Aggregate the vertex count of each community
    5. Select communities with more than 50K users
    6. Combine the large communities into a single graph
    7. Group the graph by the Persons' location and gender
    8. Aggregate the vertex and edge counts of the grouped graph
    http://www.ldbcouncil.org/
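Steps 3 to 5 can be illustrated on a toy graph. The slides run them on Flink at scale; this is only a single-machine sketch. The choices of asynchronous (in-place) updates in ascending vertex order and of breaking ties towards the larger label are assumptions made to keep the sketch deterministic, not necessarily what the benchmark's implementation does. The adjacency map is hypothetical and must list every vertex with symmetric (undirected) neighbour lists; the size threshold is 2 instead of 50K because the graph is tiny.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class CommunitySketch {

  // Step 3: every vertex starts with its own id as label and repeatedly adopts the
  // most frequent label among its neighbours (ties go to the larger label).
  static Map<Long, Long> labelPropagation(Map<Long, List<Long>> adjacency, int maxIterations) {
    Map<Long, Long> labels = new TreeMap<>();
    for (Long v : adjacency.keySet()) {
      labels.put(v, v);
    }
    for (int i = 0; i < maxIterations; i++) {
      boolean changed = false;
      for (Long v : new ArrayList<>(labels.keySet())) {
        Map<Long, Integer> frequency = new HashMap<>();
        for (Long neighbour : adjacency.get(v)) {
          frequency.merge(labels.get(neighbour), 1, Integer::sum);
        }
        long best = labels.get(v); // isolated vertices keep their label
        int bestCount = 0;
        for (Map.Entry<Long, Integer> entry : frequency.entrySet()) {
          int count = entry.getValue();
          long label = entry.getKey();
          if (count > bestCount || (count == bestCount && label > best)) {
            best = label;
            bestCount = count;
          }
        }
        if (best != labels.get(v)) {
          labels.put(v, best); // in-place update, visible to later vertices in this pass
          changed = true;
        }
      }
      if (!changed) {
        break;
      }
    }
    return labels;
  }

  // Step 4: number of vertices per community label.
  static Map<Long, Long> communitySizes(Map<Long, Long> labels) {
    Map<Long, Long> sizes = new TreeMap<>();
    for (Long label : labels.values()) {
      sizes.merge(label, 1L, Long::sum);
    }
    return sizes;
  }

  // Step 5: labels of communities with more than minSize members.
  static Set<Long> largeCommunities(Map<Long, Long> sizes, long minSize) {
    Set<Long> result = new TreeSet<>();
    sizes.forEach((label, size) -> {
      if (size > minSize) {
        result.add(label);
      }
    });
    return result;
  }

  public static void main(String[] args) {
    // Hypothetical "knows" graph: two triangles {1,2,3}, {4,5,6} and a pair {7,8}.
    Map<Long, List<Long>> knows = Map.of(
        1L, List.of(2L, 3L), 2L, List.of(1L, 3L), 3L, List.of(1L, 2L),
        4L, List.of(5L, 6L), 5L, List.of(4L, 6L), 6L, List.of(4L, 5L),
        7L, List.of(8L), 8L, List.of(7L));
    Map<Long, Long> sizes = communitySizes(labelPropagation(knows, 10));
    System.out.println(sizes);                      // {3=3, 6=3, 8=2}
    System.out.println(largeCommunities(sizes, 2)); // [3, 6]
  }
}
```

The in-place updates matter on the pair {7, 8}: with fully synchronous updates both vertices would swap labels forever, which is why the sketch also caps the number of passes.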
  • 158. Social Network Benchmark
    Steps 1 to 8 as on the previous slide.
    https://git.io/vgozj
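Steps 7 and 8 (grouping by location and gender, then counting vertices and edges) correspond to the `groupBy(...)` operator of the API. The sketch below shows the idea on in-memory data: each (location, gender) pair becomes one summarized vertex with a vertex count, and each pair of groups connected by `knows` edges becomes one summarized edge with an edge count. The records, property names and sample persons are hypothetical, and directed edges are kept as source group to target group.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class GroupingSketch {

  record Person(long id, String location, String gender) {}

  record Knows(long source, long target) {}

  // Key of a summarized vertex.
  record Group(String location, String gender) {}

  // Key of a summarized edge between two groups.
  record GroupEdge(Group source, Group target) {}

  static Group groupOf(Person p) {
    return new Group(p.location(), p.gender());
  }

  // One summarized vertex per (location, gender) with the number of persons in it.
  static Map<Group, Long> vertexCounts(List<Person> persons) {
    Map<Group, Long> counts = new HashMap<>();
    for (Person p : persons) {
      counts.merge(groupOf(p), 1L, Long::sum);
    }
    return counts;
  }

  // One summarized edge per (source group, target group) with the number of knows edges.
  static Map<GroupEdge, Long> edgeCounts(List<Person> persons, List<Knows> knows) {
    Map<Long, Group> groupById = new HashMap<>();
    for (Person p : persons) {
      groupById.put(p.id(), groupOf(p));
    }
    Map<GroupEdge, Long> counts = new HashMap<>();
    for (Knows k : knows) {
      counts.merge(new GroupEdge(groupById.get(k.source()), groupById.get(k.target())), 1L, Long::sum);
    }
    return counts;
  }

  public static void main(String[] args) {
    List<Person> persons = List.of(
        new Person(1, "Leipzig", "female"),
        new Person(2, "Leipzig", "female"),
        new Person(3, "Berlin", "male"));
    List<Knows> knows = List.of(new Knows(1, 2), new Knows(1, 3), new Knows(2, 3));
    // counts: (Leipzig, female) = 2, (Berlin, male) = 1
    System.out.println(vertexCounts(persons));
    // counts: (Leipzig, female) -> (Leipzig, female) = 1, (Leipzig, female) -> (Berlin, male) = 2
    System.out.println(edgeCounts(persons, knows));
  }
}
```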
  • 159. Social Network Benchmark
    Dataset            | # Vertices | # Edges        | Disk size
    Graphalytics.1     | 61,613     | 2,026,082      | 570 MB
    Graphalytics.10    | 260,613    | 16,600,778     | 4.5 GB
    Graphalytics.100   | 1,695,613  | 147,437,275    | 40.2 GB
    Graphalytics.1000  | 12,775,613 | 1,363,747,260  | 372 GB
    Graphalytics.10000 | 90,025,613 | 10,872,109,028 | 2.9 TB
    Cluster setup:
      • 16 machines, each with an Intel(R) Xeon(R) 2.50 GHz CPU, 6 cores (12 threads), and 48 GB RAM
      • 1 Gigabit Ethernet
      • Hadoop 2.6.0, Flink 1.0-SNAPSHOT
      • 12 slots per worker; jobmanager.heap.mb = 2048, taskmanager.heap.mb = 40960
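Dividing the edge count by the vertex count for each row of the table shows that the datasets do not only grow, they also get denser. The helper below is a small sketch for that arithmetic (the class and method names are illustrative); the numbers are the ones from the table.

```java
class DatasetStats {

  // Average number of edges per vertex, |E| / |V|.
  static double edgesPerVertex(long vertices, long edges) {
    return (double) edges / vertices;
  }

  public static void main(String[] args) {
    String[] names = {"Graphalytics.1", "Graphalytics.10", "Graphalytics.100",
        "Graphalytics.1000", "Graphalytics.10000"};
    long[] vertices = {61_613L, 260_613L, 1_695_613L, 12_775_613L, 90_025_613L};
    long[] edges = {2_026_082L, 16_600_778L, 147_437_275L, 1_363_747_260L, 10_872_109_028L};
    // Prints about 32.88, 63.70, 86.95, 106.75 and 120.77 edges per vertex.
    for (int i = 0; i < names.length; i++) {
      System.out.printf("%-18s %8.2f edges per vertex%n", names[i], edgesPerVertex(vertices[i], edges[i]));
    }
  }
}
```

So going from Graphalytics.1 to Graphalytics.10000 multiplies the vertex count by roughly 1,460 but the edge count by roughly 5,370.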
  • 160. Social Network Benchmark – Runtime
    [Figure: runtime [s] vs. number of workers (1, 2, 4, 8, 16) on Graphalytics.100]
  • 161. Social Network Benchmark – Speedup
    [Figure: speedup vs. number of workers (1, 2, 4, 8, 16) on Graphalytics.100, compared with linear speedup]
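The speedup chart compares measured speedup against the linear reference. The usual definitions are speedup S(n) = T(1) / T(n) and parallel efficiency E(n) = S(n) / n, where E(n) = 1 means linear speedup. The sketch below computes both; the runtimes in `main` are hypothetical placeholders, not the measured values from the charts.

```java
class SpeedupSketch {

  // Speedup S(n) = T(1) / T(n): how much faster n workers are than one.
  static double speedup(double runtimeOneWorker, double runtimeNWorkers) {
    return runtimeOneWorker / runtimeNWorkers;
  }

  // Parallel efficiency E(n) = S(n) / n; 1.0 corresponds to the linear reference line.
  static double efficiency(double runtimeOneWorker, double runtimeNWorkers, int workers) {
    return speedup(runtimeOneWorker, runtimeNWorkers) / workers;
  }

  public static void main(String[] args) {
    // Hypothetical runtimes in seconds, NOT the measured values from the slides.
    int[] workers = {1, 2, 4, 8, 16};
    double[] runtimes = {800, 400, 250, 150, 100};
    for (int i = 0; i < workers.length; i++) {
      System.out.printf("n=%2d  S=%5.2f  E=%4.2f%n", workers[i],
          speedup(runtimes[0], runtimes[i]), efficiency(runtimes[0], runtimes[i], workers[i]));
    }
  }
}
```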
  • 162. Social Network Benchmark – Datasets
    [Figure: runtime [s] (logarithmic axis) for the Graphalytics datasets]
  • 163. Demo
    https://github.com/s1ck/neo4j-gradoop-demos
  • 164. Current State and Future Work
  • 165. Current State – Operator Implementations
    Operators
      • LogicalGraph, unary: Aggregation, Pattern Matching, Transformation, Grouping, Subgraph, Call
      • LogicalGraph, binary: Combination, Overlap, Exclusion, Equality
      • GraphCollection, unary: Selection, Distinct, Sort, Limit, Apply, Reduce, Call
      • GraphCollection, binary: Union, Intersection, Difference, Equality
    Algorithms: Flink Gelly Library, BTG Extraction, Frequent Subgraphs, Adaptive Partitioning
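Combination and Overlap, two of the binary LogicalGraph operators listed above, can be sketched with the same graph-membership model as the `Graphs` column on the Exclusion slide. The sketch assumes that every element records the ids of the graphs it belongs to and that an edge's endpoints belong to every graph containing the edge, so the same membership test works for vertices and edges. `BinaryGraphOps` and `Element` are hypothetical names.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class BinaryGraphOps {

  // Hypothetical vertex or edge, reduced to its id and graph-membership set.
  record Element(long id, Set<Long> graphs) {}

  // Combination: elements that belong to the first or the second graph.
  static List<Long> combination(List<Element> elements, long first, long second) {
    return elements.stream()
        .filter(e -> e.graphs().contains(first) || e.graphs().contains(second))
        .map(Element::id)
        .collect(Collectors.toList());
  }

  // Overlap: elements that belong to both graphs.
  static List<Long> overlap(List<Element> elements, long first, long second) {
    return elements.stream()
        .filter(e -> e.graphs().contains(first) && e.graphs().contains(second))
        .map(Element::id)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<Element> vertices = List.of(
        new Element(1, Set.of(1L)), new Element(2, Set.of(1L, 2L)), new Element(3, Set.of(2L)));
    System.out.println(combination(vertices, 1L, 2L)); // [1, 2, 3]
    System.out.println(overlap(vertices, 1L, 2L));     // [2]
  }
}
```

Exclusion is the one binary operator where this simple membership test is not enough for edges, which is why its implementation adds the two joins on source and target ids.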
  • 166. Release History
    • 0.0.1: first prototype (May 2015)
      – Hadoop MapReduce and Giraph for operator implementations
      – Too much complexity
      – Performance loss through serialization in HDFS/HBase
    • 0.0.2: Flink as execution layer (June 2015)
      – Basic operators
    • 0.1 (December 2015)
      – System-side identifiers (UUID)
      – Improved property handling
      – More operator implementations (e.g., Equality, Boolean operators)
      – Code refactoring
    • 0.2-SNAPSHOT
      – Graph Pattern Matching
      – Frequent Subgraph Mining
      – Memory optimization (96-bit IDs, dictionary encoding, …)
      – Tuple implementation
  • 167. Contributions to Flink
    • FLINK-2411 Add basic graph summarization algorithm
    • FLINK-2590 DataSetUtils.zipWithUniqueID creates duplicate Ids
    • FLINK-2905 Add intersect method to Graph class
    • FLINK-2910 Combine tests for binary graph operators
    • FLINK-2941 Implement a neo4j - Flink/Gelly connector
    • FLINK-2981 Update README for building docs
    • FLINK-3064 Missing size check in GroupReduceOperatorBase leads to NPE
    • FLINK-3118 Check if MessageFunction implements ResultTypeQueryable
    • FLINK-3122 Generalize value type in LabelPropagation
    • FLINK-3272 Generalize vertex value type in ConnectedComponents
    • Flink Forward (October 2015)
    • Meetup Big Data Usergroup Saxony (December 2015)
    • FOSDEM (January 2016)
  • 168. Contributions Welcome
    • Code
      – Operator implementations and improvements
      – Performance tuning
    • People
      – Bachelor's and Master's theses
      – Open PhD positions in Leipzig, Germany
    • Use cases and (big) data!
  • 169. Thank you!
    www.gradoop.com
    http://flink.apache.org
    http://neo4j.com
    http://ldbcouncil.org
    https://github.com/s1ck/neo4j-gradoop-demos
    https://github.com/s1ck/flink-neo4j
    https://github.com/s1ck/ldbc-flink-import
    https://github.com/s1ck/gdl