2. About the speaker and the team
2011 Bachelor of Engineering
Thesis: Partitioning of Dynamic Graphs
2014 Master of Science
Thesis: Graph Database Systems for Business Intelligence
Now: PhD Student, Database Group, University of Leipzig
Distributed Systems
Distributed Graph Data Management
Graph Theory & Algorithms
Professional Experience: sones GraphDB, SAP
André, PhD Student
Martin, PhD Student
Kevin, M.Sc. StudentNiklas, M.Sc. Student
14. End-to-End Graph Analytics
Data Integration Graph Analytics Representation
Integrate data from one or more sources into a dedicated
graph storage with common graph data model
15. End-to-End Graph Analytics
Data Integration Graph Analytics Representation
Integrate data from one or more sources into a dedicated
graph storage with common graph data model
Definition of analytical workflows from operator algebra
16. End-to-End Graph Analytics
Data Integration Graph Analytics Representation
Integrate data from one or more sources into a dedicated
graph storage with common graph data model
Definition of analytical workflows from operator algebra
Result representation in a meaningful way
17. Graph Data Management
Graph Database
Systems
Neo4j, OrientDB
Graph Processing
Systems
Pregel, Giraph
Distributed Workflow
Systems
Flink Gelly, Spark GraphX
Data Model Rich Graph
Models
Generic Graph Models Generic Graph Models
Focus Local ACID
Operations
Global Graph Operations Global Data and Graph
Operations
Query Language Yes No No
Persistency Yes No No
Scalability Vertical Horizontal Horizontal
Workflows No No Yes
Data Integration No No No
Graph Analytics No Yes Yes
Representation Yes No No
18. Graph Data Management
Graph Database
Systems
Neo4j, OrientDB
Graph Processing
Systems
Pregel, Giraph
Distributed Workflow
Systems
Flink Gelly, Spark GraphX
Data Model Rich Graph
Models
Generic Graph Models Generic Graph Models
Focus Local ACID
Operations
Global Graph Operations Global Data and Graph
Operations
Query Language Yes No No
Persistency Yes No No
Scalability Vertical Horizontal Horizontal
Workflows No No Yes
Data Integration No No No
Graph Analytics No Yes Yes
Representation Yes No No
19. Graph Data Management
Graph Database
Systems
Neo4j, OrientDB
Graph Processing
Systems
Pregel, Giraph
Distributed Workflow
Systems
Flink Gelly, Spark GraphX
Data Model Rich Graph
Models
Generic Graph Models Generic Graph Models
Focus Local ACID
Operations
Global Graph Operations Global Data and Graph
Operations
Query Language Yes No No
Persistency Yes No No
Scalability Vertical Horizontal Horizontal
Workflows No No Yes
Data Integration No No No
Graph Analytics No Yes Yes
Representation Yes No No
20. Graph Data Management
Graph Database
Systems
Neo4j, OrientDB
Graph Processing
Systems
Pregel, Giraph
Distributed Workflow
Systems
Flink Gelly, Spark GraphX
Data Model Rich Graph
Models
Generic Graph Models Generic Graph Models
Focus Local ACID
Operations
Global Graph Operations Global Data and Graph
Operations
Query Language Yes No No
Persistency Yes No No
Scalability Vertical Horizontal Horizontal
Workflows No No Yes
Data Integration No No No
Graph Analytics No Yes Yes
Representation Yes No No
21. What‘s missing?
An end-to-end framework and research platform
for efficient, distributed and domain independent
graph data management and analytics.
22. What‘s missing?
An end-to-end framework and research platform
for efficient, distributed and domain independent
graph data management and analytics.
43. Use Case: Graph Business Intelligence
Business intelligence usually based on relational data
warehouses
Enterprise data is integrated within dimensional schema
Analysis limited to predefined relationships
No support for relationship-oriented data mining
Facts
Dim 1
Dim 2
Dim 3
44. Use Case: Graph Business Intelligence
Business intelligence usually based on relational data
warehouses
Enterprise data is integrated within dimensional schema
Analysis limited to predefined relationships
No support for relationship-oriented data mining
Graph-based approach
Integrate data sources within an instance graph by preserving original
relationships between data objects (transactional and master data)
Determine subgraphs (business transaction graphs) related to business
activities
Analyze subgraphs or entire graphs with aggregation queries, mining
relationship patterns, etc.
Facts
Dim 1
Dim 2
Dim 3
65. Current State
0.0.1 First Prototype (May 2015)
Hadoop MapReduce and Giraph for operator implementations
Too much complexity
Performance loss through serialization in HDFS/HBase
0.0.2 Using Flink as execution layer (June 2015)
Basic operators
Currently 0.0.3-SNAPSHOT
Performance improvements
More operator implementations
70. Contributions welcome
Code
Operator implementations
Performance Tuning
Storage layout
Data! and Use Cases
We are researchers, we assume ...
Getting real data (especially BI data) is nearly impossible
People
Bachelor / Master / PhD Thesis
71. Thank you for building Flink!
www.gradoop.com
https://github.com/dbs-leipzig/gradoop
http://dbs.uni-leipzig.de/file/GradoopTR.pdf
http://dbs.uni-leipzig.de/file/biiig-vldb2014.pdf