GRADOOP: Scalable Graph
Analytics with Apache Flink
Martin Junghanns
University of Leipzig
About the speaker and the team
 2011 Bachelor of Engineering
 Thesis: Partitioning of Dynamic Graphs
 2014 Master of Science
 Thesis: Graph Database Systems for Business Intelligence
 Now: PhD Student, Database Group, University of Leipzig
 Distributed Systems
 Distributed Graph Data Management
 Graph Theory & Algorithms
 Professional Experience: sones GraphDB, SAP
André, PhD Student
Martin, PhD Student
Kevin, M.Sc. StudentNiklas, M.Sc. Student
Motivation
Graph = (Vertices, Edges)
“Graphs are everywhere”
Graph = (Users, Friendships)
“Graphs are everywhere”
Alice
Bob
Eve
Dave
Carol
Mallory
Peggy
Graph = (Users, Followers)
“Graphs are everywhere”
Alice
Bob
Eve
Dave
Carol
Mallory
Peggy
Trent
Graph = (Cities, Connections)
“Graphs are everywhere”
Leipzig (pop: 544K)
Dresden (pop: 536K)
Berlin (pop: 3.5M)
Hamburg (pop: 1.7M)
Munich (pop: 1.4M)
Chemnitz (pop: 243K)
Nuremberg (pop: 500K)
Cologne (pop: 1M)
“Graphs are large”
• World Wide Web
  • ca. 1 billion websites
• Facebook
  • ca. 1.49 billion active users
  • ca. 340 friends per user
End-to-End Graph Analytics
• Data Integration: integrate data from one or more sources into a dedicated graph storage with a common graph data model
• Graph Analytics: define analytical workflows using an operator algebra
• Representation: present results in a meaningful way
Graph Data Management
                   Graph Database Systems    Graph Processing Systems    Distributed Workflow Systems
Examples           Neo4j, OrientDB           Pregel, Giraph              Flink Gelly, Spark GraphX
Data Model         Rich Graph Models         Generic Graph Models        Generic Graph Models
Focus              Local ACID Operations     Global Graph Operations     Global Data and Graph Operations
Query Language     Yes                       No                          No
Persistency        Yes                       No                          No
Scalability        Vertical                  Horizontal                  Horizontal
Workflows          No                        No                          Yes
Data Integration   No                        No                          No
Graph Analytics    No                        Yes                         Yes
Representation     Yes                       No                          No
What's missing?
An end-to-end framework and research platform
for efficient, distributed and domain-independent
graph data management and analytics.
Gradoop Architecture & Data Model
High Level Architecture
[Figure: layered architecture diagram; labels listed below]
• Workflow Declaration: Visual, GrALa DSL
• Representation
• Workflow Execution: Data Integration, Graph Analytics, Representation
• Flink Operator Execution
• Flink Operator Implementations
• Extended Property Graph Model
• HBase Distributed Graph Store
• HDFS/YARN Cluster
Legend: Data flow, Control flow
Extended Property Graph Model
Graph Operators
Operator             GrALa notation
Binary
  Combination        graph.combine(otherGraph) : Graph
  Overlap            graph.overlap(otherGraph) : Graph
  Exclusion          graph.exclude(otherGraph) : Graph
  Isomorphism        graph.isIsomorphicTo(otherGraph) : Boolean
Unary
  Pattern Matching   graph.match(patternGraph, predicate) : Collection
  Aggregation        graph.aggregate(propertyKey, aggregateFunction) : Graph
  Projection         graph.project(vertexFunction, edgeFunction) : Graph
  Summarization      graph.summarize(vertexGroupKeys, vertexAggregateFunction,
                                     edgeGroupKeys, edgeAggregateFunction) : Graph
Combination
    personGraph = db.G[0].combine(db.G[1]).combine(db.G[2])
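The binary operators can be illustrated with a minimal Python sketch. It assumes plain set semantics (combination as union, overlap as intersection, exclusion as vertex difference that keeps only edges between the remaining vertices). The `Graph` class, its fields, and the sample data are hypothetical stand-ins for illustration, not the Gradoop API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Graph:
    """Hypothetical logical graph: vertex ids plus (source, target) edges."""
    vertices: frozenset
    edges: frozenset

    def combine(self, other):
        # Assumed semantics: union of vertex sets and edge sets.
        return Graph(self.vertices | other.vertices, self.edges | other.edges)

    def overlap(self, other):
        # Assumed semantics: intersection of vertex sets and edge sets.
        return Graph(self.vertices & other.vertices, self.edges & other.edges)

    def exclude(self, other):
        # Assumed semantics: drop the other graph's vertices, then keep
        # only edges whose endpoints both survive.
        kept = self.vertices - other.vertices
        edges = frozenset((s, t) for s, t in self.edges if s in kept and t in kept)
        return Graph(kept, edges)


# Made-up sample data standing in for db.G[0], db.G[1], db.G[2].
g0 = Graph(frozenset({"Alice", "Bob"}), frozenset({("Alice", "Bob")}))
g1 = Graph(frozenset({"Bob", "Carol"}), frozenset({("Bob", "Carol")}))
g2 = Graph(frozenset({"Carol", "Dave"}), frozenset({("Carol", "Dave")}))

person_graph = g0.combine(g1).combine(g2)
shared = g0.overlap(g1)
only_g0 = g0.exclude(g1)
```

Because `combine` here is associative and commutative, chaining it as in the GrALa example gives the same result regardless of order.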
Summarization
    personGraph = db.G[0].combine(db.G[1]).combine(db.G[2])
    vertexGroupingKeys = {:type, "city"}
    edgeGroupingKeys = {:type}
    vertexAggFunc = (Vertex vSum, Set vertices => vSum["count"] = |vertices|)
    edgeAggFunc = (Edge eSum, Set edges => eSum["count"] = |edges|)
    sumGraph = personGraph.summarize(vertexGroupingKeys, vertexAggFunc,
                                     edgeGroupingKeys, edgeAggFunc)
Graph Collection Operators
Operator         GrALa notation
Collection
  Selection      collection.select(predicate) : Collection
  Distinct       collection.distinct() : Collection
  Sort by        collection.sortBy(key, [:asc|:desc]) : Collection
  Top            collection.top(limit) : Collection
  Union          collection.union(otherCollection) : Collection
  Intersection   collection.intersect(otherCollection) : Collection
  Difference     collection.difference(otherCollection) : Collection
Auxiliary
  Apply          collection.apply(unaryGraphOperator) : Collection
  Reduce         collection.reduce(binaryGraphOperator) : Graph
  Call           [graph|collection].callFor[Graph|Collection](
                     algorithm, parameters) : [Graph|Collection]
Selection
    collection = <db.G[0], db.G[1], db.G[2]>
    predicate = (Graph g => |g.V| > 3)
    result = collection.select(predicate)
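As a rough illustration of the collection operators, the following Python sketch models a collection as a list of graphs and shows selection together with the auxiliary apply and reduce operators. The tuple-based graph representation, the helper names, and the sample data are assumptions made for this example only.

```python
from functools import reduce

# Hypothetical graphs as (vertex set, edge set) pairs; data is made up.
g0 = (frozenset({0, 1, 2, 3}), frozenset({(0, 1), (1, 2), (2, 3)}))
g1 = (frozenset({3, 4}), frozenset({(3, 4)}))
g2 = (frozenset({4, 5, 6, 7, 8}), frozenset({(4, 5)}))
collection = [g0, g1, g2]


def select(graphs, predicate):
    """Keep the graphs that satisfy the predicate."""
    return [g for g in graphs if predicate(g)]


def apply(graphs, unary_op):
    """Run a unary graph operator on every graph of the collection."""
    return [unary_op(g) for g in graphs]


def reduce_collection(graphs, binary_op):
    """Fold the collection into one graph with a binary graph operator."""
    return reduce(binary_op, graphs)


def combine(a, b):
    # Assumed combination semantics: union of vertices and edges.
    return (a[0] | b[0], a[1] | b[1])


# Mirrors the GrALa predicate (Graph g => |g.V| > 3).
result = select(collection, lambda g: len(g[0]) > 3)
vertex_counts = apply(collection, lambda g: len(g[0]))
merged = reduce_collection(collection, combine)
```

Selection keeps `g0` and `g2` because only they have more than three vertices.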
Extended Property Graph Model in Flink
Element fields:
  Vertices (𝒱): ID, Label, Properties, Graphs
  Edges (ℰ): ID, Label, Properties, Source Vertex, Target Vertex, Graphs
  Graphs (𝒢): ID, Label, Properties
POJO Representation (Gelly): VertexData, EdgeData and GraphData are POJOs
  𝒱: DataSet<Vertex<ID,VertexData>>
  ℰ: DataSet<Edge<ID,EdgeData>>
  𝒢: DataSet<Subgraph<ID,GraphData>>
Tuple Representation: VertexData, EdgeData and GraphData are Tuples
  𝒱: DataSet<VertexData>
  ℰ: DataSet<EdgeData>
  𝒢: DataSet<GraphData>
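The element fields listed for the Extended Property Graph Model can be mirrored in a small Python sketch. The class names follow the slide (`VertexData`, `EdgeData`, `GraphData`); the Python field names, the `vertices_of` helper, and the sample data are assumptions for illustration and do not reflect the actual Java classes.

```python
from dataclasses import dataclass, field


@dataclass
class GraphData:
    id: int
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class VertexData:
    id: int
    label: str
    properties: dict = field(default_factory=dict)
    graphs: set = field(default_factory=set)  # ids of graphs containing this vertex


@dataclass
class EdgeData:
    id: int
    label: str
    source_vertex: int
    target_vertex: int
    properties: dict = field(default_factory=dict)
    graphs: set = field(default_factory=set)  # ids of graphs containing this edge


def vertices_of(graph_id, vertices):
    """Hypothetical helper: vertices that are members of the given graph."""
    return [v for v in vertices if graph_id in v.graphs]


# Made-up sample data.
community = GraphData(0, "Community", {"interest": "Graphs"})
alice = VertexData(0, "Person", {"name": "Alice", "city": "Leipzig"}, {0})
bob = VertexData(1, "Person", {"name": "Bob", "city": "Dresden"}, {0, 1})
knows = EdgeData(0, "knows", alice.id, bob.id, {}, {0})
```

Storing graph membership on each vertex and edge is what lets one vertex belong to several logical graphs at once, as `bob` does here.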
Summarization in Flink

Input vertices 𝒱:
VID  City
0    L
1    L
2    D
3    D
4    D
5    B

Input edges ℰ:
EID  S  T
0    0  1
1    1  0
2    1  2
3    2  1
4    2  3
5    3  2
6    4  0
7    4  1
8    5  2
9    5  3

groupBy(City) on 𝒱:
L  [0,1]
D  [2,3,4]
B  [5]

reduceGroup + filter + map yields the summarized vertices 𝒱′:
VID  City  Count
0    L     2
2    D     3
5    B     1

and the mapping from each vertex to its group representative:
VID  Rep
0    0
1    0
2    2
3    2
4    2
5    5

join(VID==S) replaces each edge source by its representative:
ID  S  T
0   0  1
1   0  0
2   0  2
3   2  1
4   2  3
5   2  2
6   2  0
7   2  1
8   5  2
9   5  3

join(VID==T) replaces each edge target by its representative:
ID  S  T
0   0  0
1   0  0
2   0  2
3   2  0
4   2  2
5   2  2
6   2  0
7   2  0
8   5  2
9   5  2

groupBy(S,T):
0,0  [0,1]
0,2  [2]
2,0  [3,6,7]
2,2  [4,5]
5,2  [8,9]

reduceGroup + filter + map yields the summarized edges ℰ′:
EID  S  T  Count
0    0  0  2
2    0  2  1
3    2  0  3
4    2  2  2
8    5  2  2
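The dataflow above can be traced with a minimal in-memory Python sketch. It uses the vertex and edge tables from the slide and plain dictionaries in place of Flink DataSets. Choosing the smallest id of each group as its representative is an assumption that happens to match the ids shown on the slide; all variable names are hypothetical.

```python
from collections import defaultdict

# Input tables from the slide: VID -> City and EID -> (S, T).
vertices = {0: "L", 1: "L", 2: "D", 3: "D", 4: "D", 5: "B"}
edges = {0: (0, 1), 1: (1, 0), 2: (1, 2), 3: (2, 1), 4: (2, 3),
         5: (3, 2), 6: (4, 0), 7: (4, 1), 8: (5, 2), 9: (5, 3)}

# groupBy(City)
vertex_groups = defaultdict(list)
for vid in sorted(vertices):
    vertex_groups[vertices[vid]].append(vid)

# reduceGroup: one summarized vertex per group plus a VID -> Rep mapping.
# Assumption: the smallest member id acts as the group representative.
summary_vertices = {}
representative = {}
for city, members in vertex_groups.items():
    rep = min(members)
    summary_vertices[rep] = (city, len(members))
    for vid in members:
        representative[vid] = rep

# join(VID==S) and join(VID==T): rewrite both endpoints to representatives.
rewritten = {eid: (representative[s], representative[t])
             for eid, (s, t) in edges.items()}

# groupBy(S,T)
edge_groups = defaultdict(list)
for eid in sorted(rewritten):
    edge_groups[rewritten[eid]].append(eid)

# reduceGroup: one summarized edge per (S, T) pair, keyed by its smallest EID.
summary_edges = {min(eids): (s, t, len(eids))
                 for (s, t), eids in edge_groups.items()}
```

In a distributed setting the two joins and the two groupings would run as Flink transformations; the sketch only reproduces their results on the sample data.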
Use Case: Graph Business Intelligence
Use Case: Graph Business Intelligence
• Business intelligence is usually based on relational data warehouses
  • Enterprise data is integrated within a dimensional schema
  • Analysis is limited to predefined relationships
  • No support for relationship-oriented data mining
• Graph-based approach
  • Integrate data sources within an instance graph by preserving original relationships between data objects (transactional and master data)
  • Determine subgraphs (business transaction graphs) related to business activities
  • Analyze subgraphs or entire graphs with aggregation queries, mining relationship patterns, etc.
[Figure: dimensional schema with Facts linked to Dim 1, Dim 2 and Dim 3]
Prerequisites: Data Integration
Business Transaction Graphs
[Figure: example graph integrating data from the source systems CIT and ERP]
Vertices:
  Employee (Name: Dave), Employee (Name: Alice), Employee (Name: Bob), Employee (Name: Carol)
  Ticket (Expense: 500)
  SalesQuotation, SalesOrder, PurchaseOrder (2x)
  SalesRevenue (Revenue: 5,000)
  PurchaseInvoice (Expense: 2,000), PurchaseInvoice (Expense: 1,500)
Edge labels: sentBy, createdBy, processedBy, openedFor, basedOn, serves, bills
(1) BTG Extraction
[Figure: extracted business transaction graphs BTG 1, BTG 2, BTG 3, BTG 4, BTG 5, …, BTG n]
    // generate base collection
    btgs = iig.callForCollection(:BusinessTransactionGraphs, {})
(2) Profit Aggregation
    // generate base collection
    btgs = iig.callForCollection(:BusinessTransactionGraphs, {})
    // define profit aggregate function
    aggFunc = (Graph g =>
      g.V.values("Revenue").sum() - g.V.values("Expense").sum()
    )
    // apply aggregate function and store result at new property
    btgs = btgs.apply(Graph g =>
      g.aggregate("Profit", aggFunc)
    )

BTG     ∑ Revenue   ∑ Expenses   Net Profit
BTG 1   5,000       -3,000       2,000
BTG 2   9,000       -3,000       6,000
BTG 3   2,000       -1,500       500
BTG 4   5,000       -7,000       -2,000
BTG 5   10,000      -15,000      -5,000
…       …           …            …
BTG n   8,000       -4,000       4,000
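The profit aggregation can be sketched in Python as follows. Graphs are modeled as dictionaries holding a vertex list and a property map; this representation, the helper names, and the second sample graph are assumptions for illustration. The first sample graph reuses the Revenue and Expense values from the example business transaction graph.

```python
def profit(graph):
    """Sum of all Revenue values minus sum of all Expense values."""
    vertices = graph["vertices"]
    revenue = sum(v.get("Revenue", 0) for v in vertices)
    expense = sum(v.get("Expense", 0) for v in vertices)
    return revenue - expense


def aggregate(graph, property_key, agg_func):
    """Return a copy of the graph with the aggregate stored as a graph property."""
    result = dict(graph)
    result["properties"] = {**graph.get("properties", {}),
                            property_key: agg_func(graph)}
    return result


def apply(graphs, unary_op):
    """Run a unary graph operator on every graph of the collection."""
    return [unary_op(g) for g in graphs]


# Values taken from the example graph on the slides.
btg_a = {"vertices": [
    {"label": "SalesRevenue", "Revenue": 5000},
    {"label": "Ticket", "Expense": 500},
    {"label": "PurchaseInvoice", "Expense": 2000},
    {"label": "PurchaseInvoice", "Expense": 1500},
]}
# Made-up second graph with a net loss.
btg_b = {"vertices": [
    {"label": "SalesRevenue", "Revenue": 2000},
    {"label": "PurchaseInvoice", "Expense": 3500},
]}

btgs = apply([btg_a, btg_b], lambda g: aggregate(g, "Profit", profit))
profits = [g["properties"]["Profit"] for g in btgs]
```

Storing the result as a graph property is what later allows a selection predicate such as `g["Profit"] >= 0` to work on the collection.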
(3) BTG Clustering
    // select profit and loss clusters
    profitBtgs = btgs.select(Graph g => g["Profit"] >= 0)
    lossBtgs = btgs.difference(profitBtgs)
(4) Cluster Characteristic Patterns
[Figure: example pattern highlighted within the BTGs; labels: Ticket, Alice, processedBy, Bob, createdBy, PurchaseOrder]
    // select profit and loss clusters
    profitBtgs = btgs.select(Graph g => g["Profit"] >= 0)
    lossBtgs = btgs.difference(profitBtgs)
    // apply magic
    profitFreqPats = profitBtgs.callForCollection(
      :FrequentSubgraphs, {"Threshold": 0.7}
    )
    lossFreqPats = lossBtgs.callForCollection(
      :FrequentSubgraphs, {"Threshold": 0.7}
    )
    // determine cluster characteristic patterns
    trivialPats = profitFreqPats.intersect(lossFreqPats)
    profitCharPatterns = profitFreqPats.difference(trivialPats)
    lossCharPatterns = lossFreqPats.difference(trivialPats)
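The set logic of the last three statements can be sketched in Python. As a deliberate simplification, the sketch replaces frequent subgraph mining with counting single labeled edges, so `frequent_patterns` is only a hypothetical stand-in for `:FrequentSubgraphs`. The pattern data is invented for illustration.

```python
def frequent_patterns(graphs, threshold):
    """Patterns contained in at least `threshold` (a fraction) of the graphs.

    Simplification: each graph is already given as a set of patterns,
    here single (source label, edge label, target label) triples.
    """
    if not graphs:
        return set()
    counts = {}
    for patterns in graphs:
        for pattern in patterns:
            counts[pattern] = counts.get(pattern, 0) + 1
    return {p for p, c in counts.items() if c / len(graphs) >= threshold}


# Invented sample clusters.
so_carol = ("SalesOrder", "processedBy", "Carol")
po_dave = ("PurchaseOrder", "createdBy", "Dave")
po_bob = ("PurchaseOrder", "createdBy", "Bob")
ticket_alice = ("Ticket", "processedBy", "Alice")

profit_btgs = [{so_carol, po_dave, ticket_alice}, {so_carol, po_dave}, {so_carol, po_dave}]
loss_btgs = [{so_carol, po_bob}, {so_carol, po_bob}]

profit_freq = frequent_patterns(profit_btgs, 0.7)
loss_freq = frequent_patterns(loss_btgs, 0.7)

# Patterns frequent in both clusters say nothing about profit or loss.
trivial = profit_freq & loss_freq
profit_char_patterns = profit_freq - trivial
loss_char_patterns = loss_freq - trivial
```

Removing the shared patterns leaves only those that are frequent in one cluster but not in the other, which is what makes them characteristic.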
Current State & Future Work
Current State
• 0.0.1 First Prototype (May 2015)
  • Hadoop MapReduce and Giraph for operator implementations
  • Too much complexity
  • Performance loss through serialization in HDFS/HBase
• 0.0.2 Using Flink as execution layer (June 2015)
  • Basic operators
• Currently 0.0.3-SNAPSHOT
  • Performance improvements
  • More operator implementations
Operator implementations (0.0.3-SNAPSHOT)
Unary:       Pattern Matching, Aggregation, Projection, Summarization
Binary:      Combination, Overlap, Exclusion, Isomorphism
Collection:  Selection, Distinct, Sort by, Top, Union, Intersection, Difference
Auxiliary:   Apply, Reduce, Call
Algorithms:  LabelPropagation, BTG Extraction, FSM
Future Work
• Operator integration into Gelly
  • Summarization (FLINK-2411)
  • Graph Sampling
  • …
• Graph operations on streams (Flink)
• Graph Partitioning (maybe together with the Gelly people)
• Graph Versioning (Storage)
• Benchmarking
• GrALa Interpreter / Web UI
Benchmarks Sneak Preview
[Figure: Summarization (Vertex and Edge Labels); x-axis: # Worker (1, 2, 4, 8, 16); y-axis: Time [s] (0 to 1400)]
• 16x Intel(R) Xeon(R) CPU E5-2430 v2 @ 2.50GHz (12 Cores), 48 GB RAM
• Hadoop 2.5.2, Flink 0.9.0
  • slots (per node): 12
  • jobmanager.heap.mb: 2048
  • taskmanager.heap.mb: 40960
• Foodbroker Graph (https://github.com/dbs-leipzig/foodbroker)
  • Generates BI process data
  • 858,624,267 Vertices, 4,406,445,007 Edges, 663GB Payload
Web UI Sneak Preview
Contributions welcome
• Code
  • Operator implementations
  • Performance Tuning
  • Storage layout
• Data! and Use Cases
  • We are researchers, we assume ...
  • Getting real data (especially BI data) is nearly impossible
• People
  • Bachelor / Master / PhD Thesis
Thank you for building Flink!
www.gradoop.com
https://github.com/dbs-leipzig/gradoop
http://dbs.uni-leipzig.de/file/GradoopTR.pdf
http://dbs.uni-leipzig.de/file/biiig-vldb2014.pdf

Power of Polyglot SearchPower of Polyglot Search
Power of Polyglot Search
 
Download It
Download ItDownload It
Download It
 

Gradoop: Scalable Graph Analytics with Apache Flink @ Flink Forward 2015

  • 1. GRADOOP: Scalable Graph Analytics with Apache Flink Martin Junghanns University of Leipzig
  • 2. About the speaker and the team
      • 2011 Bachelor of Engineering (Thesis: Partitioning of Dynamic Graphs)
      • 2014 Master of Science (Thesis: Graph Database Systems for Business Intelligence)
      • Now: PhD Student, Database Group, University of Leipzig
          • Distributed Systems
          • Distributed Graph Data Management
          • Graph Theory & Algorithms
      • Professional Experience: sones GraphDB, SAP
      • Team: André, PhD Student; Martin, PhD Student; Kevin, M.Sc. Student; Niklas, M.Sc. Student
  • 4. “Graphs are everywhere”: Graph = (Vertices, Edges)
  • 5. “Graphs are everywhere”: Graph = (Users, Friendships). Users: Alice, Bob, Eve, Dave, Carol, Mallory, Peggy
  • 9. “Graphs are everywhere”: Graph = (Users, Followers). Users: Alice, Bob, Eve, Dave, Carol, Mallory, Peggy, Trent
  • 11. “Graphs are everywhere”: Graph = (Cities, Connections). Cities: Leipzig (pop: 544K), Dresden (pop: 536K), Berlin (pop: 3.5M), Hamburg (pop: 1.7M), Munich (pop: 1.4M), Chemnitz (pop: 243K), Nuremberg (pop: 500K), Cologne (pop: 1M)
  • 12. “Graphs are large”
      • World Wide Web: ca. 1 billion websites
      • Facebook: ca. 1.49 billion active users, ca. 340 friends per user
  • 16. End-to-End Graph Analytics
      • Data Integration: Integrate data from one or more sources into a dedicated graph storage with common graph data model
      • Graph Analytics: Definition of analytical workflows from operator algebra
      • Representation: Result representation in a meaningful way
  • 17. Graph Data Management
                       | Graph Database Systems | Graph Processing Systems | Distributed Workflow Systems
      Examples         | Neo4j, OrientDB        | Pregel, Giraph           | Flink Gelly, Spark GraphX
      Data Model       | Rich Graph Models      | Generic Graph Models     | Generic Graph Models
      Focus            | Local ACID Operations  | Global Graph Operations  | Global Data and Graph Operations
      Query Language   | Yes                    | No                       | No
      Persistency      | Yes                    | No                       | No
      Scalability      | Vertical               | Horizontal               | Horizontal
      Workflows        | No                     | No                       | Yes
      Data Integration | No                     | No                       | No
      Graph Analytics  | No                     | Yes                      | Yes
      Representation   | Yes                    | No                       | No
  • 21. What's missing? An end-to-end framework and research platform for efficient, distributed and domain-independent graph data management and analytics.
  • 24. High Level Architecture (figure; labels only)
      • Phases: Data Integration, Graph Analytics, Representation
      • Workflow Declaration: Visual, GrALa DSL
      • Workflow Execution: Flink Operator Execution
      • Extended Property Graph Model, Flink Operator Implementations
      • HBase Distributed Graph Store
      • HDFS/YARN Cluster
      • Legend: Data flow, Control flow
  • 29. Graph Operators
      Operator         | GrALa notation
      Binary
      Combination      | graph.combine(otherGraph) : Graph
      Overlap          | graph.overlap(otherGraph) : Graph
      Exclusion        | graph.exclude(otherGraph) : Graph
      Isomorphism      | graph.isIsomorphicTo(otherGraph) : Boolean
      Unary
      Pattern Matching | graph.match(patternGraph, predicate) : Collection
      Aggregation      | graph.aggregate(propertyKey, aggregateFunction) : Graph
      Projection       | graph.project(vertexFunction, edgeFunction) : Graph
      Summarization    | graph.summarize(vertexGroupKeys, vertexAggregateFunction, edgeGroupKeys, edgeAggregateFunction) : Graph
  • 30. Combination
      personGraph = db.G[0].combine(db.G[1]).combine(db.G[2])
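The binary operators can be illustrated with a small Python sketch. It is not Gradoop code: the `Graph` class, the helper `graph`, and the example edges are hypothetical, and the set semantics (combination as union, overlap as intersection, exclusion as removal of the other graph's vertices together with dangling edges) are an assumption about what the operators do.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Graph:
    """Toy logical graph: a set of vertex ids and a set of (source, target) edges."""
    vertices: frozenset
    edges: frozenset

    def combine(self, other):
        # Assumed semantics: union of both vertex sets and both edge sets.
        return Graph(self.vertices | other.vertices, self.edges | other.edges)

    def overlap(self, other):
        # Assumed semantics: only vertices and edges contained in both graphs.
        return Graph(self.vertices & other.vertices, self.edges & other.edges)

    def exclude(self, other):
        # Assumed semantics: drop the other graph's vertices, then drop every
        # edge that lost one of its endpoints.
        kept = self.vertices - other.vertices
        edges = frozenset((s, t) for s, t in self.edges if s in kept and t in kept)
        return Graph(kept, edges)


def graph(vertices, edges):
    """Hypothetical convenience constructor."""
    return Graph(frozenset(vertices), frozenset(edges))


# Stand-ins for db.G[0], db.G[1], db.G[2]; the edges are made up.
g0 = graph({"Alice", "Bob", "Eve"},
           {("Alice", "Bob"), ("Bob", "Eve"), ("Alice", "Eve")})
g1 = graph({"Bob", "Carol", "Dave"}, {("Carol", "Dave"), ("Dave", "Bob")})
g2 = graph({"Dave", "Mallory"}, {("Mallory", "Dave")})

person_graph = g0.combine(g1).combine(g2)
```

Chaining `combine` as on the slide yields one graph that contains every vertex and edge of the three inputs.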
  • 33. Summarization
      personGraph = db.G[0].combine(db.G[1]).combine(db.G[2])
      vertexGroupingKeys = {:type, "city"}
      edgeGroupingKeys = {:type}
      vertexAggFunc = (Vertex vSum, Set vertices => vSum["count"] = |vertices|)
      edgeAggFunc = (Edge eSum, Set edges => eSum["count"] = |edges|)
      sumGraph = personGraph.summarize(vertexGroupingKeys, vertexAggFunc, edgeGroupingKeys, edgeAggFunc)
  • 35. Graph Collection Operators
      Operator     | GrALa notation
      Collection
      Selection    | collection.select(predicate) : Collection
      Distinct     | collection.distinct() : Collection
      Sort by      | collection.sortBy(key, [:asc|:desc]) : Collection
      Top          | collection.top(limit) : Collection
      Union        | collection.union(otherCollection) : Collection
      Intersection | collection.intersect(otherCollection) : Collection
      Difference   | collection.difference(otherCollection) : Collection
      Auxiliary
      Apply        | collection.apply(unaryGraphOperator) : Collection
      Reduce       | collection.reduce(binaryGraphOperator) : Graph
      Call         | [graph|collection].callFor[Graph|Collection](algorithm, parameters) : [Graph|Collection]
  • 36. Selection
      collection = <db.G[0],db.G[1],db.G[2]>
      predicate = (Graph g => |g.V| > 3)
      result = collection.select(predicate)
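As a rough Python analogue of the selection example, a graph collection can be modelled as a list and `select` as a filter over it. The dictionary layout and the vertex sets below are hypothetical; only the predicate `|g.V| > 3` comes from the slide.

```python
def select(collection, predicate):
    """Keep the graphs of a collection for which the predicate holds."""
    return [g for g in collection if predicate(g)]


# Stand-ins for db.G[0], db.G[1], db.G[2]; the vertex sets are made up.
g0 = {"id": 0, "V": {"Alice", "Bob", "Eve"}}
g1 = {"id": 1, "V": {"Bob", "Carol", "Dave", "Eve"}}
g2 = {"id": 2, "V": {"Alice", "Carol", "Dave", "Mallory", "Peggy"}}

collection = [g0, g1, g2]

# Counterpart of: predicate = (Graph g => |g.V| > 3)
result = select(collection, lambda g: len(g["V"]) > 3)
```

Only the graphs with more than three vertices (here the second and third) remain in `result`; the input collection is left unchanged.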
  • 40. Extended Property Graph Model in Flink
      • POJO Representation (Gelly)
          • 𝒱: DataSet<Vertex<ID,VertexData>>, VertexData is a POJO
          • ℰ: DataSet<Edge<ID,EdgeData>>, EdgeData is a POJO
          • 𝒢: DataSet<Subgraph<ID,GraphData>>, GraphData is a POJO
      • Tuple Representation
          • 𝒱: DataSet<VertexData>, VertexData is a Tuple
          • ℰ: DataSet<EdgeData>, EdgeData is a Tuple
          • 𝒢: DataSet<GraphData>, GraphData is a Tuple
      • Fields
          • VertexData: ID, Label, Properties, Graphs
          • EdgeData: ID, Label, Properties, Source Vertex, Target Vertex, Graphs
          • GraphData: ID, Label, Properties
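The three record types can be sketched as Python dataclasses to show how graph membership is stored on vertices and edges. This is an illustration only, not the Gradoop or Gelly classes: the field names follow the slide, while the types, the helper `vertices_of`, and the example data are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class GraphData:
    id: int
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class VertexData:
    id: int
    label: str
    properties: dict = field(default_factory=dict)
    graphs: set = field(default_factory=set)  # ids of the logical graphs containing this vertex


@dataclass
class EdgeData:
    id: int
    label: str
    source_vertex: int
    target_vertex: int
    properties: dict = field(default_factory=dict)
    graphs: set = field(default_factory=set)  # ids of the logical graphs containing this edge


def vertices_of(graph_id, vertices):
    """Hypothetical helper: the vertices that belong to one logical graph."""
    return [v for v in vertices if graph_id in v.graphs]


# Made-up example data: Bob belongs to two logical graphs, Alice to one.
community = GraphData(0, "Community", {"topic": "Graphs"})
alice = VertexData(0, "Person", {"name": "Alice"}, {0})
bob = VertexData(1, "Person", {"name": "Bob"}, {0, 1})
knows = EdgeData(0, "knows", source_vertex=0, target_vertex=1, graphs={0})
```

Because membership is a set on each element, one vertex or edge can be part of several overlapping logical graphs without being duplicated.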
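  • 41. Summarization in Flink
      • Input vertices 𝒱 (VID, City): (0, L), (1, L), (2, D), (3, D), (4, D), (5, B)
      • Input edges ℰ (EID, S, T): (0, 0, 1), (1, 1, 0), (2, 1, 2), (3, 2, 1), (4, 2, 3), (5, 3, 2), (6, 4, 0), (7, 4, 1), (8, 5, 2), (9, 5, 3)
      • groupBy(City): L [0,1], D [2,3,4], B [5]
      • reduceGroup + filter + map:
          • Summarized vertices 𝒱′ (VID, City, Count): (0, L, 2), (2, D, 3), (5, B, 1)
          • Vertex-to-representative mapping (VID, Rep): (0, 0), (1, 0), (2, 2), (3, 2), (4, 2), (5, 5)
      • join(VID==S), edges (ID, S, T): (0, 0, 1), (1, 0, 0), (2, 0, 2), (3, 2, 1), (4, 2, 3), (5, 2, 2), (6, 2, 0), (7, 2, 1), (8, 5, 2), (9, 5, 3)
      • join(VID==T), edges (ID, S, T): (0, 0, 0), (1, 0, 0), (2, 0, 2), (3, 2, 0), (4, 2, 2), (5, 2, 2), (6, 2, 0), (7, 2, 0), (8, 5, 2), (9, 5, 2)
      • groupBy(S,T): (0,0) [0,1], (0,2) [2], (2,0) [3,6,7], (2,2) [4,5], (5,2) [8,9]
      • reduceGroup + filter + map: summarized edges ℰ′ (EID, S, T, Count): (0, 0, 0, 2), (2, 0, 2, 1), (3, 2, 0, 3), (4, 2, 2, 2), (8, 5, 2, 2)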
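The data flow on this slide can be replayed in plain Python on the same vertex and edge tables. The sketch below is a single-machine stand-in for the Flink operators, not the Gradoop implementation; choosing the smallest id of each group as its representative is an assumption that matches the ids shown in the example.

```python
from collections import defaultdict

# Input tables from the slide: (VID, City) and (EID, S, T).
vertices = [(0, "L"), (1, "L"), (2, "D"), (3, "D"), (4, "D"), (5, "B")]
edges = [(0, 0, 1), (1, 1, 0), (2, 1, 2), (3, 2, 1), (4, 2, 3),
         (5, 3, 2), (6, 4, 0), (7, 4, 1), (8, 5, 2), (9, 5, 3)]


def summarize(vertices, edges):
    # groupBy(City): collect the vertex ids of each city.
    city_groups = defaultdict(list)
    for vid, city in vertices:
        city_groups[city].append(vid)

    # One summarized vertex per group plus a vertex -> representative mapping.
    # Assumption: the representative is the smallest vertex id in the group.
    summary_vertices = []
    representative = {}
    for city, members in city_groups.items():
        rep = min(members)
        summary_vertices.append((rep, city, len(members)))
        for vid in members:
            representative[vid] = rep

    # join(VID==S) and join(VID==T): replace both endpoints by their
    # representatives, then groupBy(S,T).
    edge_groups = defaultdict(list)
    for eid, source, target in edges:
        key = (representative[source], representative[target])
        edge_groups[key].append(eid)

    # One summarized edge per (S, T) group, carrying the group size as count.
    summary_edges = [(min(eids), s, t, len(eids))
                     for (s, t), eids in edge_groups.items()]
    return sorted(summary_vertices), sorted(summary_edges)


summary_vertices, summary_edges = summarize(vertices, edges)
```

In Flink the two dictionary lookups would be distributed joins against the representative mapping; the grouping logic stays the same.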
  • 42. Use Case: Graph Business Intelligence
  • 44. Use Case: Graph Business Intelligence
      • Business intelligence usually based on relational data warehouses (figure: star schema with Facts, Dim 1, Dim 2, Dim 3)
          • Enterprise data is integrated within dimensional schema
          • Analysis limited to predefined relationships
          • No support for relationship-oriented data mining
      • Graph-based approach
          • Integrate data sources within an instance graph by preserving original relationships between data objects (transactional and master data)
          • Determine subgraphs (business transaction graphs) related to business activities
          • Analyze subgraphs or entire graphs with aggregation queries, mining relationship patterns, etc.
  • 46. Business Transaction Graphs (figure: example graph with the source labels CIT and ERP)
      • Vertices: Employee (Name: Dave), Employee (Name: Alice), Employee (Name: Bob), Employee (Name: Carol), Ticket (Expense: 500), SalesQuotation, SalesOrder, PurchaseOrder (2x), SalesRevenue (Revenue: 5,000), PurchaseInvoice (Expense: 2,000), PurchaseInvoice (Expense: 1,500)
      • Edge labels: sentBy, createdBy (2x), processedBy (3x), openedFor, basedOn, serves (2x), bills (3x)
  • 52. (1) BTG Extraction (figure: extracted graphs BTG 1, BTG 2, BTG 3, BTG 4, BTG 5, …, BTG n)
  • 53. (1) BTG Extraction
      // generate base collection
      btgs = iig.callForCollection(:BusinessTransactionGraphs, {})
  • 54. (2) Profit Aggregation (figure: the example business transaction graph from slide 46; properties relevant for profit: SalesRevenue Revenue: 5,000, Ticket Expense: 500, PurchaseInvoice Expense: 2,000, PurchaseInvoice Expense: 1,500)
  • 56. (2) Profit Aggregation
      BTG   | ∑ Revenue | ∑ Expenses | Net Profit
      BTG 1 | 5,000     | -3,000     | 2,000
      BTG 2 | 9,000     | -3,000     | 6,000
      BTG 3 | 2,000     | -1,500     | 500
      BTG 4 | 5,000     | -7,000     | -2,000
      BTG 5 | 10,000    | -15,000    | -5,000
      …     | …         | …          | …
      BTG n | 8,000     | -4,000     | 4,000
  • 57. (2) Profit Aggregation
      // generate base collection
      btgs = iig.callForCollection( :BusinessTransactionGraphs , {} )
      // define profit aggregate function
      aggFunc = ( Graph g => g.V.values("Revenue").sum() - g.V.values("Expense").sum() )
      // apply aggregate function and store result at new property
      btgs = btgs.apply( Graph g => g.aggregate( "Profit" , aggFunc ) )
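The aggregation step can be mimicked in a few lines of plain Python. This is only a sketch of the semantics, assuming a graph is a dictionary with a vertex list and a graph-level property map; the helper names `sum_values`, `aggregate` and `apply` mirror the GrALa operators but are hypothetical stand-ins, not Gradoop's API. The vertex values are those of the example graph on the slides.

```python
# Vertices of the example graph, with the properties shown on the slides.
EXAMPLE_BTG = {
    "properties": {},
    "vertices": [
        {"label": "Ticket", "Expense": 500},
        {"label": "SalesQuotation"},
        {"label": "SalesOrder"},
        {"label": "PurchaseOrder"},
        {"label": "PurchaseOrder"},
        {"label": "SalesRevenue", "Revenue": 5000},
        {"label": "PurchaseInvoice", "Expense": 2000},
        {"label": "PurchaseInvoice", "Expense": 1500},
    ],
}


def sum_values(graph, key):
    """Sum a property over all vertices that carry it."""
    return sum(v[key] for v in graph["vertices"] if key in v)


def profit(graph):
    """Counterpart of aggFunc: total revenue minus total expense."""
    return sum_values(graph, "Revenue") - sum_values(graph, "Expense")


def aggregate(graph, property_key, agg_func):
    """Return a copy of the graph with the aggregate stored as a graph property."""
    properties = {**graph["properties"], property_key: agg_func(graph)}
    return {**graph, "properties": properties}


def apply(collection, unary_op):
    """Run a unary graph operator on every graph of a collection."""
    return [unary_op(g) for g in collection]


btgs = apply([EXAMPLE_BTG], lambda g: aggregate(g, "Profit", profit))
print(btgs[0]["properties"]["Profit"])  # 5000 - (500 + 2000 + 1500) = 1000
```

Returning a new graph instead of mutating the input keeps the operator side-effect free, which matches the way the GrALa script reassigns `btgs`.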
  • 58. (3) BTG Clustering
      BTG      ∑ Revenue   ∑ Expenses   Net Profit
      BTG 1        5,000       -3,000        2,000
      BTG 2        9,000       -3,000        6,000
      BTG 3        2,000       -1,500          500
      BTG 4        5,000       -7,000       -2,000
      BTG 5       10,000      -15,000       -5,000
      …                …            …            …
      BTG n        8,000       -4,000        4,000
  • 59. (3) BTG Clustering
      // select profit and loss clusters
      profitBtgs = btgs.select( Graph g => g["Profit"] >= 0 )
      lossBtgs = btgs.difference(profitBtgs)
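A small sketch of the two collection operators used here, run on the net profit values from the table on the previous slides. It assumes graphs are identified by an `id` field; the functions `select` and `difference` are hypothetical stand-ins for the GrALa operators of the same name.

```python
def select(collection, predicate):
    """Keep the graphs for which the predicate holds."""
    return [g for g in collection if predicate(g)]


def difference(collection, other):
    """Keep the graphs of `collection` whose id does not occur in `other`."""
    other_ids = {g["id"] for g in other}
    return [g for g in collection if g["id"] not in other_ids]


# Net profit per BTG as listed in the table (BTG n omitted).
NET_PROFIT = {"BTG 1": 2000, "BTG 2": 6000, "BTG 3": 500,
              "BTG 4": -2000, "BTG 5": -5000}
btgs = [{"id": name, "Profit": value} for name, value in NET_PROFIT.items()]

profit_btgs = select(btgs, lambda g: g["Profit"] >= 0)
loss_btgs = difference(btgs, profit_btgs)

print([g["id"] for g in profit_btgs])  # ['BTG 1', 'BTG 2', 'BTG 3']
print([g["id"] for g in loss_btgs])    # ['BTG 4', 'BTG 5']
```

Deriving the loss cluster via `difference` rather than a second `select` guarantees that the two clusters are disjoint and together cover the whole collection.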
  • 60. (4) Cluster Characteristic Patterns. Figure: example graph with regions labelled CIT and ERP. Vertices: Employee (Name: Dave), Employee (Name: Alice), Employee (Name: Bob), Employee (Name: Carol), Ticket (Expense: 500), SalesQuotation, SalesOrder, PurchaseOrder (x2), SalesRevenue (Revenue: 5,000), PurchaseInvoice (Expense: 2,000), PurchaseInvoice (Expense: 1,500). Edge labels: sentBy, createdBy, processedBy, openedFor, basedOn, serves, bills.
  • 62. (4) Cluster Characteristic Patterns
      BTG      ∑ Revenue   ∑ Expenses   Net Profit
      BTG 1        5,000       -3,000        2,000
      BTG 2        9,000       -3,000        6,000
      BTG 3        2,000       -1,500          500
      BTG 4        5,000       -7,000       -2,000
      BTG 5       10,000      -15,000       -5,000
      …                …            …            …
      BTG n        8,000       -4,000        4,000
      Pattern figure elements: Ticket, Alice, processedBy, Bob, createdBy, PurchaseOrder.
  • 63. (4) Cluster Characteristic Patterns
      // select profit and loss clusters
      profitBtgs = btgs.select( Graph g => g["Profit"] >= 0 )
      lossBtgs = btgs.difference(profitBtgs)
      // apply magic
      profitFreqPats = profitBtgs.callForCollection( :FrequentSubgraphs , {"Threshold":0.7} )
      lossFreqPats = lossBtgs.callForCollection( :FrequentSubgraphs , {"Threshold":0.7} )
      // determine cluster characteristic patterns
      trivialPats = profitFreqPats.intersect(lossFreqPats)
      profitCharPatterns = profitFreqPats.difference(trivialPats)
      lossCharPatterns = lossFreqPats.difference(trivialPats)
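The set logic of this step can be sketched without a real frequent subgraph miner. The sketch below deliberately simplifies the "magic": a pattern is just a single labelled edge `(source, edge label, target)`, and a pattern is frequent if it occurs in at least the threshold share of graphs. The function `frequent_patterns` and all example graphs are invented for illustration; they are not results from the talk and not the `:FrequentSubgraphs` algorithm.

```python
def frequent_patterns(collection, threshold):
    """Return the patterns contained in at least `threshold` of the graphs.

    Simplification: each graph is a set of single-edge patterns, so no
    subgraph enumeration or isomorphism test is needed.
    """
    if not collection:
        return set()
    counts = {}
    for graph in collection:
        for pattern in set(graph):
            counts[pattern] = counts.get(pattern, 0) + 1
    return {p for p, c in counts.items() if c / len(collection) >= threshold}


# Made-up single-edge patterns.
ORDER = ("SalesOrder", "basedOn", "SalesQuotation")
PO_BOB = ("PurchaseOrder", "createdBy", "Bob")
TICKET_ALICE = ("Ticket", "processedBy", "Alice")

profit_btgs = [{ORDER, PO_BOB}, {ORDER, PO_BOB}, {ORDER, PO_BOB}]
loss_btgs = [{ORDER, TICKET_ALICE}, {ORDER, TICKET_ALICE, PO_BOB}, {ORDER, TICKET_ALICE}]

profit_freq = frequent_patterns(profit_btgs, 0.7)
loss_freq = frequent_patterns(loss_btgs, 0.7)

# Patterns frequent in both clusters say nothing about profit or loss.
trivial = profit_freq & loss_freq
profit_char = profit_freq - trivial
loss_char = loss_freq - trivial
```

With this invented input, `ORDER` is frequent in both clusters and is therefore discarded as trivial, while `PO_BOB` occurs in only one of three loss graphs and stays characteristic for the profit cluster.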
  • 64. Current State & Future Work
  • 65. Current State
      - 0.0.1 First Prototype (May 2015)
        - Hadoop MapReduce and Giraph for operator implementations
        - Too much complexity
        - Performance loss through serialization in HDFS/HBase
      - 0.0.2 Using Flink as execution layer (June 2015)
        - Basic operators
      - Currently 0.0.3-SNAPSHOT
        - Performance improvements
        - More operator implementations
  • 66. Operator implementations (0.0.3-SNAPSHOT)
      - Unary
        - Pattern Matching, Aggregation, Projection, Summarization
        - Collection: Selection, Distinct, Sort by, Top
      - Binary
        - Combination, Overlap, Exclusion, Isomorphism
        - Collection: Union, Intersection, Difference
      - Algorithms: LabelPropagation, BTG Extraction, FSM
      - Auxiliary: Apply, Reduce, Call
  • 67. Future Work
      - Operator integration into Gelly
        - Summarization (FLINK-2411)
        - Graph Sampling
        - …
      - Graph Operations on streams (Flink)
      - Graph Partitioning (maybe together with the Gelly people)
      - Graph Versioning (Storage)
      - Benchmarking
      - GrALa Interpreter / Web UI
  • 68. Benchmarks Sneak Preview
      Chart: Summarization (Vertex and Edge Labels); y-axis: Time [s] (0 to 1,400), x-axis: # Worker (1, 2, 4, 8, 16).
      - 16x Intel(R) Xeon(R) CPU E5-2430 v2 @ 2.50GHz (12 Cores), 48 GB RAM
      - Hadoop 2.5.2, Flink 0.9.0
        - slots (per node) 12
        - jobmanager.heap.mb 2048
        - taskmanager.heap.mb 40960
      - Foodbroker Graph (https://github.com/dbs-leipzig/foodbroker)
        - Generates BI process data
        - 858,624,267 Vertices, 4,406,445,007 Edges, 663GB Payload
  • 69. Web UI Sneak Preview
  • 70. Contributions welcome
      - Code
        - Operator implementations
        - Performance Tuning
        - Storage layout
      - Data! and Use Cases
        - We are researchers, we assume ...
        - Getting real data (especially BI data) is nearly impossible
      - People
        - Bachelor / Master / PhD Thesis
  • 71. Thank you for building Flink!
      - www.gradoop.com
      - https://github.com/dbs-leipzig/gradoop
      - http://dbs.uni-leipzig.de/file/GradoopTR.pdf
      - http://dbs.uni-leipzig.de/file/biiig-vldb2014.pdf