Problem
• Lack of benchmarks for generic graph processing platforms
  • Graph500
    • BFS
    • Kronecker graph
  • Several academic studies
    • Specific to graph or RDF databases
    • Ad-hoc setup, difficult to extend
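To make the Graph500 workload concrete, here is a minimal Python sketch of its two ingredients: an R-MAT-style Kronecker edge generator and a BFS over the resulting graph. The initiator probabilities (0.57, 0.19, 0.19, with the remaining 0.05 for the last quadrant) and the edge factor of 16 are taken from the Graph500 specification as assumptions; the function names are hypothetical, and the sketch omits details of the reference generator such as vertex relabeling.

```python
import random
from collections import deque


def kronecker_edges(scale, edge_factor=16, a=0.57, b=0.19, c=0.19, seed=0):
    """Sample edge_factor * 2**scale edges of a Kronecker (R-MAT style) graph."""
    rng = random.Random(seed)
    n = 1 << scale
    edges = []
    for _ in range(edge_factor * n):
        u = v = 0
        for bit in range(scale):
            r = rng.random()
            # Recursively pick one quadrant of the adjacency matrix per bit.
            if r < a:
                pass                      # top-left: neither bit set
            elif r < a + b:
                v |= 1 << bit             # top-right: column bit set
            elif r < a + b + c:
                u |= 1 << bit             # bottom-left: row bit set
            else:
                u |= 1 << bit             # bottom-right: both bits set
                v |= 1 << bit
        edges.append((u, v))
    return n, edges


def bfs_levels(n, edges, source):
    """Return the BFS level of every vertex (-1 if unreachable), treating edges as undirected."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        if u != v:                        # ignore self-loops
            adj[u].append(v)
            adj[v].append(u)
    level = [-1] * n
    level[source] = 0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if level[w] == -1:
                level[w] = level[u] + 1
                queue.append(w)
    return level
```

For example, `n, edges = kronecker_edges(10)` followed by `bfs_levels(n, edges, 0)` runs one BFS traversal on a graph with 1024 vertices and 16384 sampled edges.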
Choke-point analysis
• Choke points are crucial technological challenges that platforms struggle with
• Select benchmark workloads based on real-world scenarios, but ensure they cover the identified choke points
• Examples:
  • Network traffic
  • Access locality
  • Skewed execution
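Skewed execution can be illustrated with a small Python sketch: on a graph with a skewed degree distribution, a naive partitioning leaves one worker with much more work than the others. Here the work per vertex is assumed to be proportional to its degree, and vertices are assigned to workers by `v % num_workers`; both are illustrative assumptions, and `partition_imbalance` is a hypothetical helper, not part of any benchmark.

```python
from collections import Counter


def partition_imbalance(edges, num_workers):
    """Ratio of the busiest worker's load to the mean load (1.0 means perfectly balanced).

    Assumes work per vertex is proportional to its degree and that vertices are
    assigned to workers by hypothetical hash partitioning (vertex % num_workers).
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    load = [0] * num_workers
    for vertex, d in degree.items():
        load[vertex % num_workers] += d
    mean = sum(load) / num_workers
    return max(load) / mean if mean else 1.0
```

On a star with eight leaves split over two workers, the worker holding the hub carries 1.5 times the mean load, whereas an eight-vertex ring is perfectly balanced (ratio 1.0).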
Enhanced LDBC Datagen
• Multiple node degree distributions
  • Previously Facebook-based only
  • Zeta and Geometric now added
• Different structural characteristics
  • Average clustering coefficient
  • Assortativity
• Improved graph generation
  • Generates only the friendship graph
  • MapReduce optimizations
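The degree distributions and structural metrics listed above can be sketched in Python as follows. This is not the Datagen implementation: the truncated zeta sampler, the inverse-transform geometric sampler, and the parameter defaults (`s`, `p`, `max_degree`) are assumptions chosen for illustration. The clustering coefficient treats vertices of degree below 2 as contributing 0, and assortativity is computed as the Pearson correlation of the degrees at the two ends of each edge.

```python
import math
import random
from itertools import combinations


def sample_degrees(kind, n, *, s=2.0, p=0.3, max_degree=1000, seed=0):
    """Draw n vertex degrees from a (hypothetically parameterized) zeta or geometric law."""
    rng = random.Random(seed)
    if kind == "zeta":
        # Truncated zeta: P(k) proportional to k**-s for k = 1..max_degree.
        support = range(1, max_degree + 1)
        weights = [k ** -s for k in support]
        return rng.choices(support, weights=weights, k=n)
    if kind == "geometric":
        # P(k) = (1 - p)**(k - 1) * p for k >= 1, via inverse-transform sampling.
        return [int(math.log(1.0 - rng.random()) / math.log(1.0 - p)) + 1 for _ in range(n)]
    raise ValueError(f"unknown distribution: {kind}")


def build_adjacency(edges):
    """Undirected adjacency sets, ignoring self-loops and duplicate edges."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def average_clustering(adj):
    """Mean local clustering coefficient over all vertices."""
    total = 0.0
    for v, nbrs in adj.items():
        d = len(nbrs)
        if d < 2:
            continue  # local coefficient taken as 0
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2.0 * links / (d * (d - 1))
    return total / len(adj) if adj else 0.0


def degree_assortativity(adj):
    """Pearson correlation of endpoint degrees; each edge is counted in both directions."""
    xs, ys = [], []
    for u, nbrs in adj.items():
        for v in nbrs:
            xs.append(len(adj[u]))
            ys.append(len(adj[v]))
    m = len(xs)
    if m == 0:
        return float("nan")
    mx, my = sum(xs) / m, sum(ys) / m
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / m
    vx = sum((x - mx) ** 2 for x in xs) / m
    vy = sum((y - my) ** 2 for y in ys) / m
    if vx == 0 or vy == 0:
        return float("nan")  # undefined when all degrees are equal
    return cov / math.sqrt(vx * vy)
```

A triangle has average clustering 1.0, while a star has clustering 0.0 and assortativity -1.0, because its high-degree hub connects only to degree-1 leaves.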
Discussion
• How much preprocessing should we allow in the ETL phase? How should we choose a metric that accounts for preprocessing?
• How should we assess the correctness of algorithms that produce approximate results?
• How should the platforms be set up? Should we allow algorithm-specific platform setups, or should we require a single setup for all algorithms?
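For the question on approximate results, one possible approach is sketched below in Python: compare each vertex's value against a reference output within a tolerance, optionally allowing a small fraction of mismatches. The function `outputs_match`, its tolerances, and the `max_mismatch_fraction` parameter are hypothetical choices for illustration, not an agreed benchmark rule.

```python
import math


def outputs_match(reference, candidate, *, rel_tol=1e-4, abs_tol=1e-9, max_mismatch_fraction=0.0):
    """Check a per-vertex numeric output (vertex -> value) against a reference output.

    Hypothetical validation rule: both outputs must cover exactly the same vertices,
    and at most max_mismatch_fraction of the values may fall outside the tolerance.
    """
    if reference.keys() != candidate.keys():
        return False  # a missing or extra vertex is always treated as an error
    if not reference:
        return True
    mismatches = sum(
        1
        for v, ref in reference.items()
        if not math.isclose(ref, candidate[v], rel_tol=rel_tol, abs_tol=abs_tol)
    )
    return mismatches / len(reference) <= max_mismatch_fraction
```

Keeping the tolerance and the allowed mismatch fraction as explicit parameters makes the validation rule itself something that could be reported alongside the results.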
http://graphalytics.ewi.tudelft.nl