© Prof. Dr.-Ing. Wolfgang Lehner |
Challenges in the Design of a Graph Database
Benchmark
FOSDEM‘12 – Graph Processing DevRoom
Marcus Paradies
Marcus Paradies | | 1
> Outline
 Motivation
 Challenges
 Thoughts on Graph Data Generation
 Thoughts on Query Workload
 Summary and Outlook
 Discussion
FOSDEM 2012
> Motivation
 Graph databases are gaining momentum
 Enterprise corporations are getting interested
 How to compare the available graph database vendors?
 Main issue: Results from benchmarks are not comparable
 Lack of standardization in the data model and query language
 What are “typical” graph operations?
> Challenges
> Challenge #1: Application Domain
 Graph data is not homogeneous
 Graph data from different domains follows different patterns
 Examples:
 Social Network Analysis (SNA)
 Protein Interaction Analysis
 Recommendation Systems
 Supply Chain Management (Vehicle Routing, CRM)
 Fraud Detection in Financial Systems
 …
Challenge: Find an application domain which represents a graph data pattern
common in many different scenarios.
> Challenge #2: Graph Data Model
What flavours of graph data models are commonly used?
 Directed Graph
 Undirected Graph
 Mixed Graph
 Multi Graph
 (Plain) Property Graph
 (Structured Property Graph)
 Hyper Graph
Challenge: Find a graph data model suited for the majority of use cases
from various domains.
> Challenge #3: Querying Graph Data
 Large variety in graph processing and manipulation languages
 Each graph database vendor implements its own query languages/APIs
 Reason: No standardized graph query language available
Challenge: Find a way to abstract from the zoo of available query languages.
> Challenge #4: Defining the Workload
 The workload to be defined depends on the underlying
query/manipulation language
 Should complex (algorithmic) operations be part of a database benchmark?
 Which algorithms to pick?
 Social Network Analysis → Find communities
 Supply Chain Management → Find maximal flow
 Web of Data → Find pattern matches
 How are concurrent users represented?
 What about transactionality?
> Thoughts on Graph Data Generation
> Graph Data Generation - Patterns
 Understanding graph patterns (characteristics) is crucial for a good graph
data generator
 What are distinguishing characteristics of graphs?
 How can we identify graph patterns on large graphs?
 Three main patterns [1]:
 Power law distributed
 Small diameters
 Community Effects
> Pattern 1 – Power law distributed
 In most real-world graph data sets, node degrees follow a power law distribution
 Examples:
 Internet router graph
 Subsets of the WWW
 Citation Graphs
source: [2]
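One common way for a data generator to produce such a heavy-tailed degree distribution is preferential attachment (a Barabási-Albert-style growth process). The sketch below is only an illustration of that technique, not a generator proposed in the talk; the function names, parameters, and graph sizes are assumptions.

```python
import random
from collections import Counter


def preferential_attachment_edges(num_nodes, edges_per_node, seed=0):
    """Return an undirected edge list grown by preferential attachment.

    Hypothetical helper: each new node links to `edges_per_node` distinct
    existing nodes, chosen with probability proportional to their degree.
    """
    if edges_per_node < 1 or num_nodes <= edges_per_node:
        raise ValueError("need num_nodes > edges_per_node >= 1")
    rng = random.Random(seed)
    edges = []
    # Every node appears in `endpoints` once per incident edge, so a uniform
    # draw from this list is a degree-proportional draw over nodes.
    endpoints = []
    # Seed step: the first new node connects to nodes 0..edges_per_node-1.
    targets = list(range(edges_per_node))
    for new_node in range(edges_per_node, num_nodes):
        for target in targets:
            edges.append((new_node, target))
            endpoints.extend((new_node, target))
        # Pick the distinct targets for the next node to be added.
        chosen = set()
        while len(chosen) < edges_per_node:
            chosen.add(rng.choice(endpoints))
        targets = sorted(chosen)
    return edges


def degree_histogram(edges):
    """Map degree -> number of nodes with that degree."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return Counter(degree.values())


edges = preferential_attachment_edges(num_nodes=2000, edges_per_node=2, seed=42)
histogram = degree_histogram(edges)
```

Plotting `histogram` on log-log axes is a quick visual check: a roughly straight, downward-sloping line suggests a power-law-like tail. A real benchmark generator would also have to reproduce the other patterns (small diameter, community effects), which this sketch does not attempt.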
> Pattern 2 – Small Diameters
 Effective Diameter (eccentricity): Minimum number of hops within which a
fraction (e.g. 90%) of all connected pairs of nodes can reach each other
 Other measures exist as well, but are not applicable to disconnected graphs
 In most use cases, diameter is much smaller than the size of the graph
 Examples:
 97% eccentricity of around 16 for path lengths in the WWW
 Average path length around 6 for Epinions social network
source: [1]
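The definition of the effective diameter can be made concrete with a small, exact computation. The sketch below runs a breadth-first search from every node, collects the hop counts of all connected (ordered) pairs, and returns the smallest hop count that covers the requested fraction. The function names and the toy graph are assumptions for illustration; an all-pairs search like this is only feasible for small graphs, and large graphs would need sampling or an approximation.

```python
import math
from collections import deque


def hop_counts(adjacency, source):
    """Breadth-first search: hop distance from `source` to each reachable node."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance


def effective_diameter(adjacency, fraction=0.9):
    """Smallest hop count h such that at least `fraction` of all connected
    (ordered) node pairs are within h hops of each other."""
    pair_distances = []
    for source in adjacency:
        for target, hops in hop_counts(adjacency, source).items():
            if target != source:
                pair_distances.append(hops)
    if not pair_distances:
        return 0
    pair_distances.sort()
    # Index of the pair that completes the requested fraction.
    index = math.ceil(fraction * len(pair_distances)) - 1
    return pair_distances[max(index, 0)]


# Toy undirected graph: a path 0-1-2-3-4 plus a separate component 5-6.
path_graph = {
    0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3],
    5: [6], 6: [5],
}

print(effective_diameter(path_graph, fraction=0.9))  # 3
print(effective_diameter(path_graph, fraction=1.0))  # 4
```

Because only connected pairs are counted, the measure stays well defined even though the toy graph is disconnected.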
> Pattern 3 – Community Effects
 Community: A set of nodes, where each node in the set is closer to all other
nodes in the community than to nodes outside the community.
 Communities can be found in many real-world graphs, especially social
networks and collaboration networks
 Clustering Coefficient C: A measure that quantifies the “clumpiness” of a
graph
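The slide does not say which definition of the clustering coefficient is meant. As an assumption, the sketch below uses the widely used local clustering coefficient (the fraction of a node's neighbor pairs that are themselves connected) and averages it over all nodes, counting nodes with fewer than two neighbors as 0. The function names and the toy graph are illustrative.

```python
from itertools import combinations


def local_clustering(adjacency, node):
    """Fraction of pairs of neighbors of `node` that are themselves connected."""
    neighbors = adjacency[node]
    k = len(neighbors)
    if k < 2:
        return 0.0
    linked = sum(1 for a, b in combinations(neighbors, 2) if b in adjacency[a])
    return 2.0 * linked / (k * (k - 1))


def average_clustering(adjacency):
    """Mean of the local clustering coefficients over all nodes."""
    if not adjacency:
        return 0.0
    return sum(local_clustering(adjacency, n) for n in adjacency) / len(adjacency)


# Toy undirected graph as node -> set of neighbors:
# a triangle 0-1-2 with a pendant node 3 attached to node 2.
toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}

print(local_clustering(toy, 0))  # 1.0
print(average_clustering(toy))
```

In the toy graph, nodes 0 and 1 sit in a closed triangle (coefficient 1), node 2 has one connected neighbor pair out of three (1/3), and node 3 has a single neighbor (0), giving an average of 7/12.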
> Thoughts on Query Workload
> Query Workload - Operations
 Graph Manipulation Operations
 Add/Update/Remove Nodes from the Graph
 Add/Update/Remove Edges from the Graph
 Add/Update/Remove Edge attributes
 Add/Update/Remove Node attributes
 Graph Query Operations
 Retrieve a selection of nodes matching a given filter expression
 Get the neighbors of a set of nodes (possibly with edge filter constraints)
 Graph Traversals
 Based on basic query operations
 Exploration of neighborhood from a given set of start nodes
 Terminated by the number of steps and/or edge/node filter constraints
 Graph Analytical Operations
 Aggregation operations such as sum, avg, min, max
 Aggregations on node-level and on edge-level
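To make the four operation classes above more tangible, here is a minimal sketch of how they might look as a vendor-neutral interface over a tiny in-memory directed property graph. The class `PropertyGraph`, its method names, and the sample data are all hypothetical; they are not taken from the talk or from any graph database product.

```python
from dataclasses import dataclass, field


@dataclass
class PropertyGraph:
    """Tiny in-memory directed property graph (illustrative only)."""
    nodes: dict = field(default_factory=dict)  # node id -> attribute dict
    edges: list = field(default_factory=list)  # (source, target, attribute dict)

    # Graph manipulation operations
    def add_node(self, node_id, **attrs):
        self.nodes[node_id] = dict(attrs)

    def add_edge(self, source, target, **attrs):
        self.edges.append((source, target, dict(attrs)))

    # Graph query operations
    def select_nodes(self, predicate):
        return {n for n, attrs in self.nodes.items() if predicate(attrs)}

    def neighbors(self, node_ids, edge_filter=lambda attrs: True):
        return {t for s, t, attrs in self.edges
                if s in node_ids and edge_filter(attrs)}

    # Graph traversal, built from the basic query operations and
    # terminated by a step limit and/or an edge filter
    def traverse(self, start_nodes, max_steps, edge_filter=lambda attrs: True):
        visited = set(start_nodes)
        frontier = set(start_nodes)
        for _ in range(max_steps):
            frontier = self.neighbors(frontier, edge_filter) - visited
            if not frontier:
                break
            visited |= frontier
        return visited

    # Graph analytical operation: aggregate a node attribute
    def aggregate(self, node_ids, attribute, func=sum):
        return func(self.nodes[n][attribute] for n in node_ids)


g = PropertyGraph()
for name, age in [("ann", 31), ("bob", 25), ("cy", 40), ("dee", 19)]:
    g.add_node(name, age=age)
g.add_edge("ann", "bob", kind="knows")
g.add_edge("bob", "cy", kind="knows")
g.add_edge("cy", "dee", kind="follows")

reachable = g.traverse({"ann"}, max_steps=2)
oldest = g.aggregate(reachable, "age", max)
```

Describing the workload against an interface of this kind, rather than in a specific query language, is one possible way to keep a benchmark independent of any single vendor; each system under test would then supply its own implementation of the operations.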
> Query Workload - Measures
 Closely related to benchmark capabilities
 Measures from relational benchmarks apply such as
 Average query response time
 Transactions per second (throughput)
 Additional measures for graph traversals
 Traversals per second
 What about distributed scenarios?
 What about concurrent users?
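As a rough sketch of how the basic measures could be collected, the snippet below times a sequence of operations issued by a single client and reports the average response time and the throughput (here: traversals per second). The helper `run_workload` and the stand-in operation `fake_traversal` are assumptions for illustration; the sketch deliberately leaves the open questions about concurrent users and distributed scenarios unanswered.

```python
import time


def run_workload(operations):
    """Run zero-argument callables one after another.

    Returns (average response time in seconds, operations per second).
    """
    latencies = []
    started = time.perf_counter()
    for operation in operations:
        t0 = time.perf_counter()
        operation()
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - started
    average = sum(latencies) / len(latencies) if latencies else 0.0
    throughput = len(latencies) / elapsed if elapsed > 0 else 0.0
    return average, throughput


# Stand-in for a traversal request against a system under test (hypothetical).
def fake_traversal():
    sum(range(10_000))


average_latency, traversals_per_second = run_workload([fake_traversal] * 200)
print(f"avg response time: {average_latency * 1e6:.1f} microseconds")
print(f"throughput: {traversals_per_second:.0f} traversals/second")
```

The absolute numbers depend entirely on the machine and on the operation being timed, so they are only meaningful when compared across systems under the same data set and workload.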
> Summary and Outlook
 Graph data distribution is highly important for a graph database benchmark
 Application domains do have very specific graph characteristics
 A graph database benchmark has to provide abstract and high-level graph
operation descriptions
 Feel free to contact me if you want to contribute:
marcus.paradies@gmail.com
> Discussion
> Theses
 A benchmark based on social network data is nice, but might not be that
representative of large enterprise applications
 Algorithms should NOT be part of a graph database benchmark
 Only support basic operations such as simple lookups and path traversals
 The underlying graph data model should be a simple property graph
 A graph database has to scale in terms of data size as well as number of
concurrent users
 ....
> References
[1] Graph Mining: Laws, Generators, and Algorithms (2006)
[2] http://konect.uni-koblenz.de/
[3] A Discussion on the Design of Graph Database Benchmarks (2010)
FOSDEM 2012

Weitere ähnliche Inhalte

Was ist angesagt?

RDBMS to Graphs
RDBMS to GraphsRDBMS to Graphs
RDBMS to GraphsNeo4j
 
RDBMS to Graphs
RDBMS to GraphsRDBMS to Graphs
RDBMS to GraphsNeo4j
 
Introduction to graph databases GraphDays
Introduction to graph databases  GraphDaysIntroduction to graph databases  GraphDays
Introduction to graph databases GraphDaysNeo4j
 
The Five Graphs of Finance - Philip Rathle and Emil Eifrem @ GraphConnect NY ...
The Five Graphs of Finance - Philip Rathle and Emil Eifrem @ GraphConnect NY ...The Five Graphs of Finance - Philip Rathle and Emil Eifrem @ GraphConnect NY ...
The Five Graphs of Finance - Philip Rathle and Emil Eifrem @ GraphConnect NY ...Neo4j
 
Introducing Neo4j
Introducing Neo4jIntroducing Neo4j
Introducing Neo4jNeo4j
 
An Introduction to Graph: Database, Analytics, and Cloud Services
An Introduction to Graph:  Database, Analytics, and Cloud ServicesAn Introduction to Graph:  Database, Analytics, and Cloud Services
An Introduction to Graph: Database, Analytics, and Cloud ServicesJean Ihm
 
Democratizing Data at Airbnb
Democratizing Data at AirbnbDemocratizing Data at Airbnb
Democratizing Data at AirbnbNeo4j
 
Intro to Neo4j and Graph Databases
Intro to Neo4j and Graph DatabasesIntro to Neo4j and Graph Databases
Intro to Neo4j and Graph DatabasesNeo4j
 
Neo4j the Anti Crime Database
Neo4j the Anti Crime DatabaseNeo4j the Anti Crime Database
Neo4j the Anti Crime DatabaseNeo4j
 
Intro to Neo4j Webinar
Intro to Neo4j WebinarIntro to Neo4j Webinar
Intro to Neo4j WebinarNeo4j
 
GraphTalks - Einführung in Graphdatenbanken
GraphTalks - Einführung in GraphdatenbankenGraphTalks - Einführung in Graphdatenbanken
GraphTalks - Einführung in GraphdatenbankenNeo4j
 
Building a Graph-based Analytics Platform
Building a Graph-based Analytics PlatformBuilding a Graph-based Analytics Platform
Building a Graph-based Analytics PlatformKenny Bastani
 
Neo4j GraphDay - Graphs in the Real World: Tope Use Cases for Graph Databases...
Neo4j GraphDay - Graphs in the Real World: Tope Use Cases for Graph Databases...Neo4j GraphDay - Graphs in the Real World: Tope Use Cases for Graph Databases...
Neo4j GraphDay - Graphs in the Real World: Tope Use Cases for Graph Databases...Neo4j
 
The Future is Big Graphs: A Community View on Graph Processing Systems
The Future is Big Graphs: A Community View on Graph Processing SystemsThe Future is Big Graphs: A Community View on Graph Processing Systems
The Future is Big Graphs: A Community View on Graph Processing SystemsNeo4j
 
Einstieg in Neo4j Graph Data Science
Einstieg in Neo4j Graph Data ScienceEinstieg in Neo4j Graph Data Science
Einstieg in Neo4j Graph Data ScienceNeo4j
 
Neo4j Import Webinar
Neo4j Import WebinarNeo4j Import Webinar
Neo4j Import WebinarNeo4j
 
Graphdatenbank Neo4j: Konzept, Positionierung, Status Region DACH - Bruno Un...
 Graphdatenbank Neo4j: Konzept, Positionierung, Status Region DACH - Bruno Un... Graphdatenbank Neo4j: Konzept, Positionierung, Status Region DACH - Bruno Un...
Graphdatenbank Neo4j: Konzept, Positionierung, Status Region DACH - Bruno Un...Neo4j
 
Neanex - Semantic Construction with Graphs
Neanex - Semantic Construction with GraphsNeanex - Semantic Construction with Graphs
Neanex - Semantic Construction with GraphsNeo4j
 
GraphTalk Berlin - Einführung in Graphdatenbanken
GraphTalk Berlin - Einführung in GraphdatenbankenGraphTalk Berlin - Einführung in Graphdatenbanken
GraphTalk Berlin - Einführung in GraphdatenbankenNeo4j
 

Was ist angesagt? (20)

RDBMS to Graphs
RDBMS to GraphsRDBMS to Graphs
RDBMS to Graphs
 
Graph based data models
Graph based data modelsGraph based data models
Graph based data models
 
RDBMS to Graphs
RDBMS to GraphsRDBMS to Graphs
RDBMS to Graphs
 
Introduction to graph databases GraphDays
Introduction to graph databases  GraphDaysIntroduction to graph databases  GraphDays
Introduction to graph databases GraphDays
 
The Five Graphs of Finance - Philip Rathle and Emil Eifrem @ GraphConnect NY ...
The Five Graphs of Finance - Philip Rathle and Emil Eifrem @ GraphConnect NY ...The Five Graphs of Finance - Philip Rathle and Emil Eifrem @ GraphConnect NY ...
The Five Graphs of Finance - Philip Rathle and Emil Eifrem @ GraphConnect NY ...
 
Introducing Neo4j
Introducing Neo4jIntroducing Neo4j
Introducing Neo4j
 
An Introduction to Graph: Database, Analytics, and Cloud Services
An Introduction to Graph:  Database, Analytics, and Cloud ServicesAn Introduction to Graph:  Database, Analytics, and Cloud Services
An Introduction to Graph: Database, Analytics, and Cloud Services
 
Democratizing Data at Airbnb
Democratizing Data at AirbnbDemocratizing Data at Airbnb
Democratizing Data at Airbnb
 
Intro to Neo4j and Graph Databases
Intro to Neo4j and Graph DatabasesIntro to Neo4j and Graph Databases
Intro to Neo4j and Graph Databases
 
Neo4j the Anti Crime Database
Neo4j the Anti Crime DatabaseNeo4j the Anti Crime Database
Neo4j the Anti Crime Database
 
Intro to Neo4j Webinar
Intro to Neo4j WebinarIntro to Neo4j Webinar
Intro to Neo4j Webinar
 
GraphTalks - Einführung in Graphdatenbanken
GraphTalks - Einführung in GraphdatenbankenGraphTalks - Einführung in Graphdatenbanken
GraphTalks - Einführung in Graphdatenbanken
 
Building a Graph-based Analytics Platform
Building a Graph-based Analytics PlatformBuilding a Graph-based Analytics Platform
Building a Graph-based Analytics Platform
 
Neo4j GraphDay - Graphs in the Real World: Tope Use Cases for Graph Databases...
Neo4j GraphDay - Graphs in the Real World: Tope Use Cases for Graph Databases...Neo4j GraphDay - Graphs in the Real World: Tope Use Cases for Graph Databases...
Neo4j GraphDay - Graphs in the Real World: Tope Use Cases for Graph Databases...
 
The Future is Big Graphs: A Community View on Graph Processing Systems
The Future is Big Graphs: A Community View on Graph Processing SystemsThe Future is Big Graphs: A Community View on Graph Processing Systems
The Future is Big Graphs: A Community View on Graph Processing Systems
 
Einstieg in Neo4j Graph Data Science
Einstieg in Neo4j Graph Data ScienceEinstieg in Neo4j Graph Data Science
Einstieg in Neo4j Graph Data Science
 
Neo4j Import Webinar
Neo4j Import WebinarNeo4j Import Webinar
Neo4j Import Webinar
 
Graphdatenbank Neo4j: Konzept, Positionierung, Status Region DACH - Bruno Un...
 Graphdatenbank Neo4j: Konzept, Positionierung, Status Region DACH - Bruno Un... Graphdatenbank Neo4j: Konzept, Positionierung, Status Region DACH - Bruno Un...
Graphdatenbank Neo4j: Konzept, Positionierung, Status Region DACH - Bruno Un...
 
Neanex - Semantic Construction with Graphs
Neanex - Semantic Construction with GraphsNeanex - Semantic Construction with Graphs
Neanex - Semantic Construction with Graphs
 
GraphTalk Berlin - Einführung in Graphdatenbanken
GraphTalk Berlin - Einführung in GraphdatenbankenGraphTalk Berlin - Einführung in Graphdatenbanken
GraphTalk Berlin - Einführung in Graphdatenbanken
 

Andere mochten auch

A new software tool for large-scale analysis of citation networks
A new software tool for large-scale analysis of citation networksA new software tool for large-scale analysis of citation networks
A new software tool for large-scale analysis of citation networksNees Jan van Eck
 
2013 NodeXL Social Media Network Analysis
2013 NodeXL Social Media Network Analysis2013 NodeXL Social Media Network Analysis
2013 NodeXL Social Media Network AnalysisMarc Smith
 
Apache Giraph: start analyzing graph relationships in your bigdata in 45 minu...
Apache Giraph: start analyzing graph relationships in your bigdata in 45 minu...Apache Giraph: start analyzing graph relationships in your bigdata in 45 minu...
Apache Giraph: start analyzing graph relationships in your bigdata in 45 minu...rhatr
 
HW09 Social network analysis with Hadoop
HW09 Social network analysis with HadoopHW09 Social network analysis with Hadoop
HW09 Social network analysis with HadoopCloudera, Inc.
 
An Introduction to Graph Databases
An Introduction to Graph DatabasesAn Introduction to Graph Databases
An Introduction to Graph DatabasesInfiniteGraph
 
Hadoop or Spark: is it an either-or proposition? By Slim Baltagi
Hadoop or Spark: is it an either-or proposition? By Slim BaltagiHadoop or Spark: is it an either-or proposition? By Slim Baltagi
Hadoop or Spark: is it an either-or proposition? By Slim BaltagiSlim Baltagi
 
Introduction to Graph Databases
Introduction to Graph DatabasesIntroduction to Graph Databases
Introduction to Graph DatabasesMax De Marzi
 
Neo4j - 5 cool graph examples
Neo4j - 5 cool graph examplesNeo4j - 5 cool graph examples
Neo4j - 5 cool graph examplesPeter Neubauer
 
Hadoop and Graph Databases (Neo4j): Winning Combination for Bioanalytics - Jo...
Hadoop and Graph Databases (Neo4j): Winning Combination for Bioanalytics - Jo...Hadoop and Graph Databases (Neo4j): Winning Combination for Bioanalytics - Jo...
Hadoop and Graph Databases (Neo4j): Winning Combination for Bioanalytics - Jo...Neo4j
 

Andere mochten auch (9)

A new software tool for large-scale analysis of citation networks
A new software tool for large-scale analysis of citation networksA new software tool for large-scale analysis of citation networks
A new software tool for large-scale analysis of citation networks
 
2013 NodeXL Social Media Network Analysis
2013 NodeXL Social Media Network Analysis2013 NodeXL Social Media Network Analysis
2013 NodeXL Social Media Network Analysis
 
Apache Giraph: start analyzing graph relationships in your bigdata in 45 minu...
Apache Giraph: start analyzing graph relationships in your bigdata in 45 minu...Apache Giraph: start analyzing graph relationships in your bigdata in 45 minu...
Apache Giraph: start analyzing graph relationships in your bigdata in 45 minu...
 
HW09 Social network analysis with Hadoop
HW09 Social network analysis with HadoopHW09 Social network analysis with Hadoop
HW09 Social network analysis with Hadoop
 
An Introduction to Graph Databases
An Introduction to Graph DatabasesAn Introduction to Graph Databases
An Introduction to Graph Databases
 
Hadoop or Spark: is it an either-or proposition? By Slim Baltagi
Hadoop or Spark: is it an either-or proposition? By Slim BaltagiHadoop or Spark: is it an either-or proposition? By Slim Baltagi
Hadoop or Spark: is it an either-or proposition? By Slim Baltagi
 
Introduction to Graph Databases
Introduction to Graph DatabasesIntroduction to Graph Databases
Introduction to Graph Databases
 
Neo4j - 5 cool graph examples
Neo4j - 5 cool graph examplesNeo4j - 5 cool graph examples
Neo4j - 5 cool graph examples
 
Hadoop and Graph Databases (Neo4j): Winning Combination for Bioanalytics - Jo...
Hadoop and Graph Databases (Neo4j): Winning Combination for Bioanalytics - Jo...Hadoop and Graph Databases (Neo4j): Winning Combination for Bioanalytics - Jo...
Hadoop and Graph Databases (Neo4j): Winning Combination for Bioanalytics - Jo...
 

Ähnlich wie Challenges in the Design of a Graph Database Benchmark

Challenges in the Design of a Graph Database Benchmark
Challenges in the Design of a Graph Database BenchmarkChallenges in the Design of a Graph Database Benchmark
Challenges in the Design of a Graph Database BenchmarkMarcus Paradies
 
Scalable Algorithm Design with MapReduce
Scalable Algorithm Design with MapReduceScalable Algorithm Design with MapReduce
Scalable Algorithm Design with MapReducePietro Michiardi
 
USING FACTORY DESIGN PATTERNS IN MAP REDUCE DESIGN FOR BIG DATA ANALYTICS
USING FACTORY DESIGN PATTERNS IN MAP REDUCE DESIGN FOR BIG DATA ANALYTICSUSING FACTORY DESIGN PATTERNS IN MAP REDUCE DESIGN FOR BIG DATA ANALYTICS
USING FACTORY DESIGN PATTERNS IN MAP REDUCE DESIGN FOR BIG DATA ANALYTICSHCL Technologies
 
A Lightweight MDD Process Applied in Small Projects
A Lightweight MDD Process Applied in Small ProjectsA Lightweight MDD Process Applied in Small Projects
A Lightweight MDD Process Applied in Small ProjectsGabor Guta
 
LARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENT
LARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENTLARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENT
LARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENTijwscjournal
 
Chapter19 rapid application development
Chapter19 rapid application developmentChapter19 rapid application development
Chapter19 rapid application developmentDhani Ahmad
 
Cloud software engineering
Cloud software engineeringCloud software engineering
Cloud software engineeringIan Sommerville
 
Top 3 design patterns in Map Reduce
Top 3 design patterns in Map ReduceTop 3 design patterns in Map Reduce
Top 3 design patterns in Map ReduceEdureka!
 
P209 leithiser-relationaldb-formal-specifications
P209 leithiser-relationaldb-formal-specificationsP209 leithiser-relationaldb-formal-specifications
P209 leithiser-relationaldb-formal-specificationsBob Leithiser
 
MSR 2022 Foundational Contribution Award Talk: Software Analytics: Reflection...
MSR 2022 Foundational Contribution Award Talk: Software Analytics: Reflection...MSR 2022 Foundational Contribution Award Talk: Software Analytics: Reflection...
MSR 2022 Foundational Contribution Award Talk: Software Analytics: Reflection...Tao Xie
 
IRJET- Comparatively Analysis on K-Means++ and Mini Batch K-Means Clustering ...
IRJET- Comparatively Analysis on K-Means++ and Mini Batch K-Means Clustering ...IRJET- Comparatively Analysis on K-Means++ and Mini Batch K-Means Clustering ...
IRJET- Comparatively Analysis on K-Means++ and Mini Batch K-Means Clustering ...IRJET Journal
 
Take your Data Management Practice to the Next Level with Denodo 7
Take your Data Management Practice to the Next Level with Denodo 7Take your Data Management Practice to the Next Level with Denodo 7
Take your Data Management Practice to the Next Level with Denodo 7Denodo
 
Crafted Design - ITAKE 2014
Crafted Design - ITAKE 2014Crafted Design - ITAKE 2014
Crafted Design - ITAKE 2014Sandro Mancuso
 
Crafted Design - GeeCON 2014
Crafted Design - GeeCON 2014Crafted Design - GeeCON 2014
Crafted Design - GeeCON 2014Sandro Mancuso
 
Sawmill - Integrating R and Large Data Clouds
Sawmill - Integrating R and Large Data CloudsSawmill - Integrating R and Large Data Clouds
Sawmill - Integrating R and Large Data CloudsRobert Grossman
 
Sioux Hot-or-Not: Model Driven Software Development (Markus Voelter)
Sioux Hot-or-Not: Model Driven Software Development (Markus Voelter)Sioux Hot-or-Not: Model Driven Software Development (Markus Voelter)
Sioux Hot-or-Not: Model Driven Software Development (Markus Voelter)siouxhotornot
 

Ähnlich wie Challenges in the Design of a Graph Database Benchmark (20)

Challenges in the Design of a Graph Database Benchmark
Challenges in the Design of a Graph Database BenchmarkChallenges in the Design of a Graph Database Benchmark
Challenges in the Design of a Graph Database Benchmark
 
Scalable Algorithm Design with MapReduce
Scalable Algorithm Design with MapReduceScalable Algorithm Design with MapReduce
Scalable Algorithm Design with MapReduce
 
USING FACTORY DESIGN PATTERNS IN MAP REDUCE DESIGN FOR BIG DATA ANALYTICS
USING FACTORY DESIGN PATTERNS IN MAP REDUCE DESIGN FOR BIG DATA ANALYTICSUSING FACTORY DESIGN PATTERNS IN MAP REDUCE DESIGN FOR BIG DATA ANALYTICS
USING FACTORY DESIGN PATTERNS IN MAP REDUCE DESIGN FOR BIG DATA ANALYTICS
 
A Lightweight MDD Process Applied in Small Projects
A Lightweight MDD Process Applied in Small ProjectsA Lightweight MDD Process Applied in Small Projects
A Lightweight MDD Process Applied in Small Projects
 
LARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENT
LARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENTLARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENT
LARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENT
 
Chapter19 rapid application development
Chapter19 rapid application developmentChapter19 rapid application development
Chapter19 rapid application development
 
Cloud software engineering
Cloud software engineeringCloud software engineering
Cloud software engineering
 
Top 3 design patterns in Map Reduce
Top 3 design patterns in Map ReduceTop 3 design patterns in Map Reduce
Top 3 design patterns in Map Reduce
 
P209 leithiser-relationaldb-formal-specifications
P209 leithiser-relationaldb-formal-specificationsP209 leithiser-relationaldb-formal-specifications
P209 leithiser-relationaldb-formal-specifications
 
RAD
RADRAD
RAD
 
Cloud Computing
Cloud ComputingCloud Computing
Cloud Computing
 
MSR 2022 Foundational Contribution Award Talk: Software Analytics: Reflection...
MSR 2022 Foundational Contribution Award Talk: Software Analytics: Reflection...MSR 2022 Foundational Contribution Award Talk: Software Analytics: Reflection...
MSR 2022 Foundational Contribution Award Talk: Software Analytics: Reflection...
 
IRJET- Comparatively Analysis on K-Means++ and Mini Batch K-Means Clustering ...
IRJET- Comparatively Analysis on K-Means++ and Mini Batch K-Means Clustering ...IRJET- Comparatively Analysis on K-Means++ and Mini Batch K-Means Clustering ...
IRJET- Comparatively Analysis on K-Means++ and Mini Batch K-Means Clustering ...
 
Take your Data Management Practice to the Next Level with Denodo 7
Take your Data Management Practice to the Next Level with Denodo 7Take your Data Management Practice to the Next Level with Denodo 7
Take your Data Management Practice to the Next Level with Denodo 7
 
Waterfall model
Waterfall modelWaterfall model
Waterfall model
 
Crafted Design - ITAKE 2014
Crafted Design - ITAKE 2014Crafted Design - ITAKE 2014
Crafted Design - ITAKE 2014
 
What is rad model
What is rad modelWhat is rad model
What is rad model
 
Crafted Design - GeeCON 2014
Crafted Design - GeeCON 2014Crafted Design - GeeCON 2014
Crafted Design - GeeCON 2014
 
Sawmill - Integrating R and Large Data Clouds
Sawmill - Integrating R and Large Data CloudsSawmill - Integrating R and Large Data Clouds
Sawmill - Integrating R and Large Data Clouds
 
Sioux Hot-or-Not: Model Driven Software Development (Markus Voelter)
Sioux Hot-or-Not: Model Driven Software Development (Markus Voelter)Sioux Hot-or-Not: Model Driven Software Development (Markus Voelter)
Sioux Hot-or-Not: Model Driven Software Development (Markus Voelter)
 

Mehr von graphdevroom

An example graph visualization with Processing.js
An example graph visualization with Processing.js An example graph visualization with Processing.js
An example graph visualization with Processing.js graphdevroom
 
Bio4j: A pioneer graph based database for the integration of biological Big D...
Bio4j: A pioneer graph based database for the integration of biological Big D...Bio4j: A pioneer graph based database for the integration of biological Big D...
Bio4j: A pioneer graph based database for the integration of biological Big D...graphdevroom
 
Cypher Query Language
Cypher Query Language Cypher Query Language
Cypher Query Language graphdevroom
 
Ontological Conjunctive Query Answering over large, semi-structured knowledge...
Ontological Conjunctive Query Answering over large, semi-structured knowledge...Ontological Conjunctive Query Answering over large, semi-structured knowledge...
Ontological Conjunctive Query Answering over large, semi-structured knowledge...graphdevroom
 
Works with persistent graphs using OrientDB
Works with persistent graphs using OrientDB Works with persistent graphs using OrientDB
Works with persistent graphs using OrientDB graphdevroom
 
Using Cascalog and Hadoop for rapid graph processing and exploration
Using Cascalog and Hadoop for rapid graph processing and exploration Using Cascalog and Hadoop for rapid graph processing and exploration
Using Cascalog and Hadoop for rapid graph processing and exploration graphdevroom
 

Mehr von graphdevroom (6)

An example graph visualization with Processing.js
An example graph visualization with Processing.js An example graph visualization with Processing.js
An example graph visualization with Processing.js
 
Bio4j: A pioneer graph based database for the integration of biological Big D...
Bio4j: A pioneer graph based database for the integration of biological Big D...Bio4j: A pioneer graph based database for the integration of biological Big D...
Bio4j: A pioneer graph based database for the integration of biological Big D...
 
Cypher Query Language
Cypher Query Language Cypher Query Language
Cypher Query Language
 
Ontological Conjunctive Query Answering over large, semi-structured knowledge...
Ontological Conjunctive Query Answering over large, semi-structured knowledge...Ontological Conjunctive Query Answering over large, semi-structured knowledge...
Ontological Conjunctive Query Answering over large, semi-structured knowledge...
 
Works with persistent graphs using OrientDB
Works with persistent graphs using OrientDB Works with persistent graphs using OrientDB
Works with persistent graphs using OrientDB
 
Using Cascalog and Hadoop for rapid graph processing and exploration
Using Cascalog and Hadoop for rapid graph processing and exploration Using Cascalog and Hadoop for rapid graph processing and exploration
Using Cascalog and Hadoop for rapid graph processing and exploration
 

Kürzlich hochgeladen

Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece

Challenges in the Design of a Graph Database Benchmark

  • 1. © Prof. Dr.-Ing. Wolfgang Lehner | Challenges in the Design of a Graph Database Benchmark | FOSDEM'12, Graph Processing DevRoom | Marcus Paradies
  • 2. Outline (FOSDEM 2012)
       - Motivation
       - Challenges
       - Thoughts on Graph Data Generation
       - Thoughts on Query Workload
       - Summary and Outlook
       - Discussion
  • 3. Motivation
       - Graph databases are gaining momentum
       - Enterprise corporations are getting interested
       - How to compare the available graph database vendors?
       - Main issue: Results from benchmarks are not comparable
       - Lack of standardization in the data model and query language
       - What are "typical" graph operations?
  • 4. Challenges
  • 5. Challenge #1: Application Domain
       - Graph data is not homogeneous
       - Graph data from different domains follows different patterns
       - Examples:
           - Social Network Analysis (SNA)
           - Protein Interaction Analysis
           - Recommendation Systems
           - Supply Chain Management (Vehicle Routing, CRM)
           - Fraud Detection in Financial Systems
           - …
       Challenge: Find an application domain which represents a graph data pattern common in many different scenarios.
  • 6. Challenge #2: Graph Data Model
       What flavours of graph data models are commonly used?
  • 14. Challenge #2: Graph Data Model
       - Directed Graph
       - Undirected Graph
       - Mixed Graph
       - Multi Graph
       - (Plain) Property Graph
       - (Structured Property Graph)
       - Hyper Graph
       Challenge: Find a graph data model suited for the majority of use cases from various domains.
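To make the "(plain) property graph" flavour concrete, here is a minimal Python sketch. The slides do not define the model formally, so this assumes the common reading: a directed multigraph whose nodes and edges both carry key/value properties. All class and method names are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Node:
    node_id: int
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge:
    source: int
    target: int
    label: str
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class PropertyGraph:
    """Assumed model: directed multigraph with properties on nodes and edges."""
    nodes: Dict[int, Node] = field(default_factory=dict)
    edges: List[Edge] = field(default_factory=list)

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = Node(node_id, dict(properties))

    def add_edge(self, source, target, label, **properties):
        # Parallel edges between the same pair are allowed (multigraph).
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must exist before adding an edge")
        self.edges.append(Edge(source, target, label, dict(properties)))

    def out_edges(self, node_id):
        return [e for e in self.edges if e.source == node_id]


g = PropertyGraph()
g.add_node(1, name="Alice")
g.add_node(2, name="Bob")
g.add_edge(1, 2, "knows", since=2010)
g.add_edge(1, 2, "works_with")  # second edge between the same nodes
```

A plain directed graph is the special case with no properties and at most one edge per ordered node pair, which is one reason a simple property graph is a plausible common denominator.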
  • 16. Challenge #3: Querying Graph Data
       - Large variety in graph processing and manipulation languages
       - Each graph database vendor implements its own query languages/APIs
       - Reason: No standardized graph query language available
       Challenge: Find a way to abstract from the zoo of available query languages.
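One possible way to abstract from vendor-specific query languages is to write the benchmark against a small interface and supply one adapter per database. The sketch below is only an illustration of that idea, not something proposed in the slides; `GraphBackend`, `InMemoryBackend` and `two_hop_neighborhood` are hypothetical names.

```python
from abc import ABC, abstractmethod


class GraphBackend(ABC):
    """Hypothetical vendor-neutral interface; each database would supply an adapter."""

    @abstractmethod
    def add_edge(self, source, target):
        ...

    @abstractmethod
    def neighbors(self, node):
        ...


class InMemoryBackend(GraphBackend):
    """Reference adapter backed by a plain dictionary of adjacency sets."""

    def __init__(self):
        self._adjacency = {}

    def add_edge(self, source, target):
        self._adjacency.setdefault(source, set()).add(target)
        self._adjacency.setdefault(target, set())

    def neighbors(self, node):
        return set(self._adjacency.get(node, ()))


def two_hop_neighborhood(backend, start):
    """Benchmark step written only against the abstract interface."""
    first = backend.neighbors(start)
    second = set()
    for node in first:
        second |= backend.neighbors(node)
    return (first | second) - {start}


backend = InMemoryBackend()
for source, target in [(1, 2), (2, 3), (3, 4)]:
    backend.add_edge(source, target)
reachable = two_hop_neighborhood(backend, 1)
```

The trade-off is that such an interface only exercises what every adapter can express, so vendor-specific optimizations in a native query language stay out of the measurement.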
  • 17. Challenge #4: Defining the Workload
       - The workload to be defined depends on the underlying query/manipulation language
       - Should complex (algorithmic) operations be part of a database benchmark?
       - Which algorithms to pick?
           - Social Network Analysis → Find communities
           - Supply Chain Management → Find maximal flow
           - Web of Data → Find pattern matches
       - How are concurrent users represented?
       - What about transactionality?
  • 18. Thoughts on Graph Data Generation
  • 19. Graph Data Generation - Patterns
       - Understanding graph patterns (characteristics) is crucial for a good graph data generator
       - What are distinguishing characteristics of graphs?
       - How can we identify graph patterns on large graphs?
       - Three main patterns [1]:
           - Power law distributed
           - Small diameters
           - Community effects
  • 20. Pattern 1 – Power law distributed
       - Most real-world graph data sets follow a power-law distribution (e.g. in their node degrees)
       - Examples:
           - Internet router graph
           - Subsets of the WWW
           - Citation graphs
       [Figures: example distributions, source: [2]]
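A data generator that should reproduce a heavy-tailed degree distribution could, for instance, grow the graph by preferential attachment. The slides do not name a generator, so the following is a sketch of one well-known option, not the author's method; function names and parameters are hypothetical.

```python
import random
from collections import Counter


def preferential_attachment(num_nodes, edges_per_node, seed=0):
    """Return an undirected edge list grown by preferential attachment."""
    rng = random.Random(seed)
    # Start from a small clique so every early node has nonzero degree.
    edges = [(i, j) for i in range(edges_per_node + 1) for j in range(i)]
    # Each node appears in `endpoints` once per incident edge, so a uniform
    # draw from this list picks a node with probability proportional to degree.
    endpoints = [v for edge in edges for v in edge]
    for new in range(edges_per_node + 1, num_nodes):
        targets = set()
        while len(targets) < edges_per_node:
            targets.add(rng.choice(endpoints))
        for target in sorted(targets):
            edges.append((new, target))
            endpoints.extend((new, target))
    return edges


def degree_histogram(edges):
    """Map each degree value to the number of nodes having that degree."""
    degree = Counter(v for edge in edges for v in edge)
    return Counter(degree.values())


edges = preferential_attachment(2000, 2, seed=42)
hist = degree_histogram(edges)
for k in sorted(hist)[:5]:
    print(k, hist[k])  # many low-degree nodes, few high-degree hubs
```

Plotting `hist` on log-log axes is the usual quick check: an approximately straight line suggests a power-law tail.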
  • 21. Pattern 2 – Small Diameters
       - Effective diameter (eccentricity): the minimum number of hops in which a given fraction (e.g. 90%) of all connected pairs of nodes can reach each other
       - Other measures exist as well, but are not applicable to disconnected graphs
       - In most use cases, the diameter is much smaller than the size of the graph
       - Examples:
           - 97% eccentricity of around 16 for path lengths in the WWW
           - Average path length of around 6 for the Epinions social network
       Source: [1]
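The effective diameter as defined above can be computed exactly on small graphs with one breadth-first search per node. This is a minimal sketch under the assumption that the graph fits in memory as an adjacency dictionary; on large graphs one would sample start nodes or use an approximation instead. The function names are hypothetical.

```python
from collections import Counter, deque


def bfs_distances(adjacency, start):
    """Hop distance from `start` to every node reachable from it."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adjacency.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def effective_diameter(adjacency, fraction=0.9):
    """Smallest hop count h such that at least `fraction` of all connected
    (ordered) node pairs are within h hops of each other."""
    hops = Counter()
    for start in adjacency:
        for node, d in bfs_distances(adjacency, start).items():
            if node != start:
                hops[d] += 1
    total = sum(hops.values())
    if total == 0:
        return 0  # no connected pairs at all
    reached = 0
    for h in sorted(hops):
        reached += hops[h]
        if reached >= fraction * total:
            return h
    return max(hops)


# Undirected path 0 - 1 - 2 - 3 - 4, stored with both directions.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(effective_diameter(path, fraction=0.5))
```

Because only pairs that are actually connected are counted, the measure stays well defined on disconnected graphs, which is the property the slide highlights.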
  • 22. Pattern 3 – Community Effects
       - Community: a set of nodes in which each node is closer to all other nodes in the community than to nodes outside the community
       - Communities can be found in many real-world graphs, especially social networks and collaboration networks
       - Clustering coefficient C: a measure that quantifies the "clumpiness" of a graph
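The slide does not say which clustering coefficient is meant. The sketch below assumes the common local definition for undirected graphs, C_v = 2 * T_v / (k_v * (k_v - 1)), where k_v is the degree of node v and T_v the number of edges among its neighbors, averaged over all nodes. The function names are hypothetical.

```python
def local_clustering(adjacency, node):
    """Fraction of neighbor pairs of `node` that are themselves connected."""
    neighbors = adjacency[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # convention: undefined cases count as zero
    links = sum(
        1
        for u in neighbors
        for v in neighbors
        if u < v and v in adjacency[u]
    )
    return 2.0 * links / (k * (k - 1))


def average_clustering(adjacency):
    """Mean of the local coefficients over all nodes."""
    return sum(local_clustering(adjacency, n) for n in adjacency) / len(adjacency)


# Triangle 0-1-2 with a pendant node 3 attached to node 0 (undirected,
# adjacency stored as sets of neighbors in both directions).
graph = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print(average_clustering(graph))
```

A generator that matches the degree distribution but yields a near-zero average coefficient would miss the community effects described on this slide.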
  • 23. Thoughts on Query Workload
  • 24. Query Workload - Operations
       - Graph Manipulation Operations
           - Add/Update/Remove nodes from the graph
           - Add/Update/Remove edges from the graph
           - Add/Update/Remove edge attributes
           - Add/Update/Remove node attributes
       - Graph Query Operations
           - Retrieve a selection of nodes from a given filter expression
           - Get the neighbors of a set of nodes (possibly with edge filter constraints)
       - Graph Traversals
           - Based on basic query operations
           - Exploration of the neighborhood from a given set of start nodes
           - Terminated by the number of steps and/or edge/node filter constraints
       - Graph Analytical Operations
           - Aggregation operations such as sum, avg, min, max
           - Aggregations on node level and on edge level
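The four operation classes above can be illustrated on a toy in-memory graph. This is a sketch under simplifying assumptions (directed edges, at most one edge per node pair, attributes as dictionaries); `TinyGraph` and its methods are hypothetical and not taken from any graph database API.

```python
from collections import defaultdict


class TinyGraph:
    """Hypothetical in-memory graph used only to illustrate the operation classes."""

    def __init__(self):
        self.node_attrs = {}            # node id -> attribute dict
        self.out = defaultdict(dict)    # source -> {target: edge attribute dict}

    # Graph manipulation operations
    def add_node(self, node, **attrs):
        self.node_attrs[node] = dict(attrs)

    def add_edge(self, source, target, **attrs):
        self.out[source][target] = dict(attrs)

    def remove_node(self, node):
        self.node_attrs.pop(node, None)
        self.out.pop(node, None)
        for targets in self.out.values():
            targets.pop(node, None)

    # Graph query operations
    def select_nodes(self, predicate):
        return {n for n, a in self.node_attrs.items() if predicate(a)}

    def neighbors(self, nodes, edge_filter=lambda attrs: True):
        return {
            t
            for n in nodes
            for t, a in self.out.get(n, {}).items()
            if edge_filter(a)
        }

    # Graph traversal: bounded by a step count and an optional edge filter
    def traverse(self, start_nodes, max_steps, edge_filter=lambda attrs: True):
        visited = set(start_nodes)
        frontier = set(start_nodes)
        for _ in range(max_steps):
            frontier = self.neighbors(frontier, edge_filter) - visited
            if not frontier:
                break
            visited |= frontier
        return visited

    # Graph analytical operation: aggregate one edge attribute
    def aggregate_edges(self, key, func):
        values = [
            a[key]
            for targets in self.out.values()
            for a in targets.values()
            if key in a
        ]
        return func(values)


g = TinyGraph()
for name, age in [("a", 30), ("b", 25), ("c", 41), ("d", 19)]:
    g.add_node(name, age=age)
g.add_edge("a", "b", weight=1.0)
g.add_edge("b", "c", weight=2.5)
g.add_edge("c", "d", weight=0.5)
```

Note that the traversal is built purely from the `neighbors` query, which mirrors the slide's point that traversals are based on basic query operations.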
  • 25. Query Workload - Measures
       - Closely related to benchmark capabilities
       - Measures from relational benchmarks apply, such as
           - Average query response time
           - Transactions per second (throughput)
       - Additional measures for graph traversals
           - Traversals per second
       - What about distributed scenarios?
       - What about concurrent users?
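A single-client harness for the first two measures might look like the sketch below. It only covers average response time and throughput for one sequential user. Concurrency and distribution, which the slide leaves open, are not addressed. `run_benchmark` and `ring_traversal` are hypothetical names, and the traversal is a stand-in for a real database call.

```python
import time


def run_benchmark(operation, inputs):
    """Run `operation` once per input and report simple latency/throughput figures."""
    latencies = []
    started = time.perf_counter()
    for item in inputs:
        t0 = time.perf_counter()
        operation(item)
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - started
    count = len(latencies)
    return {
        "operations": count,
        "avg_response_time_s": sum(latencies) / count if count else 0.0,
        "operations_per_second": count / elapsed if elapsed > 0 else float("inf"),
    }


def ring_traversal(start, size=1000, steps=50):
    """Stand-in for a traversal query: walk `steps` hops around a ring graph."""
    node = start
    for _ in range(steps):
        node = (node + 1) % size
    return node


result = run_benchmark(ring_traversal, range(200))
print(result)
```

When each operation is one traversal, `operations_per_second` corresponds to the "traversals per second" measure; absolute numbers depend on the machine and are not comparable across setups.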
  • 26. Summary and Outlook
       - The graph data distribution is highly important for a graph database benchmark
       - Application domains have very specific graph characteristics
       - A graph database benchmark has to provide abstract and high-level graph operation descriptions
       - Feel free to contact me if you want to contribute: marcus.paradies@gmail.com
  • 27. Discussion
  • 28. Theses
       - A benchmark based on social network data is nice, but might not be that representative for large enterprise applications
       - Algorithms should NOT be part of a graph database benchmark
       - Only support basic operations such as simple lookups and path traversals
       - The underlying graph data model should be a simple property graph
       - A graph database has to scale in terms of data size as well as number of concurrent users
       - ....
  • 29. References
       [1] Graph Mining: Laws, Generators, and Algorithms (2006)
       [2] http://konect.uni-koblenz.de/
       [3] A Discussion on the Design of Graph Database Benchmarks (2010)