PROCESSING LARGE-SCALE GRAPHS WITH GOOGLE(TM) PREGEL 
MICHAEL HACKSTEIN 
FRONT END AND GRAPH SPECIALIST ARANGODB

Processing large-scale graphs with Google(TM) Pregel by MICHAEL HACKSTEIN at Big Data Spain 2014

This talk gives an overview of the complex architecture of the Pregel framework and offers some insight into where potential bottlenecks lie when writing a Pregel algorithm.

  1. Processing Large-Scale Graphs with Google(TM) Pregel
     Michael Hackstein, Front End and Graph Specialist, ArangoDB
  2. Processing large-scale graphs with Google(TM) Pregel
     November 17th
     Michael Hackstein, @mchacki, www.arangodb.com
  3. Michael Hackstein
     - ArangoDB Core Team
       - Web Frontend
       - Graph visualisation
       - Graph features
     - Host of cologne.js
     - Master’s Degree (spec. Databases and Information Systems)
  4–6. Graph Algorithms (built up over three slides)
     - Pattern matching
       - Search through the entire graph
       - Identify similar components
       ⇒ Touch all vertices and their neighbourhoods
     - Traversals
       - Define a specific start point
       - Iteratively explore the graph
       ⇒ History of steps is known
     - Global measurements
       - Compute one value for the graph, based on all its vertices or edges
       - Compute one value for each vertex or edge
       ⇒ Often require a global view on the graph
  7. Pregel
     - A framework to query distributed, directed graphs.
     - Known as “Map-Reduce” for graphs
       - Uses same phases
       - Has several iterations
     - Aims at:
       - Operate all servers at full capacity
       - Reduce network traffic
     - Good at calculations touching all vertices
     - Bad at calculations touching a very small number of vertices
  8–17. Example – Connected Components
     [Figure sequence across ten slides: a graph whose vertices carry the labels 1 to 7, redrawn after each step with updated labels. Legend: active vertex, inactive vertex, forward message, backward message.]
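
The label exchange in the connected-components example can be sketched in a few lines of plain JavaScript. This is a minimal sketch, not the ArangoDB implementation: the function name `connectedComponents`, the edge list, and the choice of the smallest vertex id as component label are assumptions made for illustration, and the graph is not claimed to match the one drawn on the slides. Messages travel both along an edge (forward) and against it (backward), so weakly connected vertices end up with the same label.

```javascript
// Hypothetical helper: label every vertex with the smallest vertex id in its
// weakly connected component, using Pregel-style supersteps.
function connectedComponents(vertexIds, edges) {
  // Forward and backward messages together mean every edge is usable in
  // both directions, so build an undirected neighbour list.
  const neighbours = new Map(vertexIds.map((id) => [id, []]));
  for (const [from, to] of edges) {
    neighbours.get(from).push(to);
    neighbours.get(to).push(from);
  }

  // Every vertex starts with its own id as component label.
  const label = new Map(vertexIds.map((id) => [id, id]));

  // MIN combiner: keep only the smallest message per target vertex.
  const send = (box, target, value) => {
    if (!box.has(target) || value < box.get(target)) box.set(target, value);
  };

  // Superstep 0: all vertices are active and announce their label.
  let inbox = new Map();
  for (const id of vertexIds) {
    for (const n of neighbours.get(id)) send(inbox, n, label.get(id));
  }
  let supersteps = 1;

  // Later supersteps: only vertices with incoming messages do any work.
  // A vertex that learns a smaller label adopts it and tells its neighbours.
  while (inbox.size > 0) {
    const outbox = new Map();
    for (const [id, smallest] of inbox) {
      if (smallest < label.get(id)) {
        label.set(id, smallest);
        for (const n of neighbours.get(id)) send(outbox, n, smallest);
      }
    }
    inbox = outbox;
    supersteps += 1;
  }
  return { label, supersteps };
}

// Illustrative graph with two components: {1, 2, 3, 4} and {5, 6, 7}.
const result = connectedComponents(
  [1, 2, 3, 4, 5, 6, 7],
  [[1, 2], [2, 3], [4, 3], [5, 6], [7, 5]]
);
```

The loop stops as soon as a superstep produces no messages, which is the same condition the deck later gives for termination.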
  18–22. Pregel – Sequence
     [Diagram built up over five slides; no text content in the transcript.]
  23. Worker ≙ Map
     - “Map” a user-defined algorithm over all vertices
     - Output: set of messages to other vertices
     - Available parameters:
       - The current vertex and its outbound edges
       - All incoming messages
       - Global values
     - Allow modifications on the vertex:
       - Attach a result to this vertex and its outgoing edges
       - Delete the vertex and its outgoing edges
       - Deactivate the vertex
  24. Combine ≙ Reduce
     - “Reduce” all generated messages
     - Output: An aggregated message for each vertex.
     - Executed on sender as well as receiver.
     - Available parameters:
       - One new message for a vertex
       - The stored aggregate for this vertex
     - Typical combiners are SUM, MIN or MAX
     - Reduces network traffic
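
To illustrate why a combiner reduces network traffic, here is a minimal sketch of sender-side combining in plain JavaScript. The combiners take the two parameters listed above (one new message, the stored aggregate); the buffer function `combinePerTarget` and the message shape `{ target, data }` are hypothetical names chosen for this example, not part of any framework API.

```javascript
// Combiners with the shape described above: one new message plus the
// aggregate stored so far for the target vertex.
const sum = (message, aggregate) => message + aggregate;
const min = (message, aggregate) => Math.min(message, aggregate);

// Hypothetical sender-side buffer: fold all messages for the same target
// into one aggregate before anything would cross the network.
function combinePerTarget(messages, combiner) {
  const aggregates = new Map();
  for (const { target, data } of messages) {
    aggregates.set(
      target,
      aggregates.has(target) ? combiner(data, aggregates.get(target)) : data
    );
  }
  return aggregates;
}

// Four generated messages, but only two distinct target vertices.
const outgoing = [
  { target: "v1", data: 0.25 },
  { target: "v2", data: 0.5 },
  { target: "v1", data: 0.125 },
  { target: "v1", data: 0.5 },
];
const summed = combinePerTarget(outgoing, sum);
const smallest = combinePerTarget(outgoing, min);
```

Here four generated messages shrink to two aggregates, one per target vertex. Because SUM, MIN and MAX are associative and commutative, the receiver can apply the same function again to aggregates arriving from different senders.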
  25. Activity ≙ Termination
     - Execute several rounds of Map/Reduce
     - Count active vertices and messages
     - Start next round if one of the following is true:
       - At least one vertex is active
       - At least one message is sent
     - Terminate if neither a vertex is active nor messages were sent
     - Store all non-deleted vertices and edges as resulting graph
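
The termination rule can be sketched as a small driver loop. This is an illustrative sketch only: `runSupersteps`, the worker signature `(vertex, messages, send, round)` and the `maxRounds` safety limit are invented for this example, and the rule that an incoming message re-activates a vertex is taken from the original Pregel model rather than from the slides.

```javascript
// Hypothetical driver: run rounds until no vertex is active and no message
// was sent. Returns the number of rounds executed.
function runSupersteps(vertices, worker, maxRounds = 100) {
  let inbox = new Map();
  let round = 0;
  while (round < maxRounds) {
    const outbox = new Map();
    const send = (targetId, data) => {
      if (!outbox.has(targetId)) outbox.set(targetId, []);
      outbox.get(targetId).push(data);
    };
    for (const vertex of vertices) {
      const messages = inbox.get(vertex.id) || [];
      // Assumption: an incoming message wakes up an inactive vertex.
      if (messages.length > 0) vertex.active = true;
      if (vertex.active) worker(vertex, messages, send, round);
    }
    round += 1;
    // Count active vertices and sent messages, as on the slide.
    const activeCount = vertices.filter((v) => v.active).length;
    const messageCount = [...outbox.values()].reduce(
      (count, list) => count + list.length,
      0
    );
    if (activeCount === 0 && messageCount === 0) break;
    inbox = outbox;
  }
  return round;
}

// Usage: pass a token down a chain of three vertices.
const chain = [
  { id: 0, next: 1, active: true, seen: false },
  { id: 1, next: 2, active: true, seen: false },
  { id: 2, next: null, active: true, seen: false },
];
const relay = (vertex, messages, send, round) => {
  if ((round === 0 && vertex.id === 0) || messages.length > 0) {
    vertex.seen = true;
    if (vertex.next !== null) send(vertex.next, "token");
  }
  vertex.active = false; // deactivate; a later message re-activates the vertex
};
const rounds = runSupersteps(chain, relay);
```

In the usage example every vertex deactivates itself in each round, so the run continues only while messages are still in flight and stops one round after the last vertex has received the token.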
  26. Pregel at ArangoDB
     - Started as a side project in free hack time
     - Experimental on operational database
     - Implemented as an alternative to traversals
     - Make use of the flexibility of JavaScript:
       - No strict type system
       - No pre-compilation, on-the-fly queries
       - Native JSON documents
       - Really fast development
  27. Pagerank for Giraph

      public class SimplePageRankComputation extends BasicComputation<
          LongWritable, DoubleWritable, FloatWritable, DoubleWritable> {
        public static final int MAX_SUPERSTEPS = 30;

        @Override
        public void compute(
            Vertex<LongWritable, DoubleWritable, FloatWritable> vertex,
            Iterable<DoubleWritable> messages) throws IOException {
          if (getSuperstep() >= 1) {
            double sum = 0;
            for (DoubleWritable message : messages) {
              sum += message.get();
            }
            DoubleWritable vertexValue =
                new DoubleWritable((0.15f / getTotalNumVertices()) + 0.85f * sum);
            vertex.setValue(vertexValue);
          }
          if (getSuperstep() < MAX_SUPERSTEPS) {
            long edges = vertex.getNumEdges();
            sendMessageToAllEdges(vertex,
                new DoubleWritable(vertex.getValue().get() / edges));
          } else {
            vertex.voteToHalt();
          }
        }

        public static class SimplePageRankWorkerContext extends WorkerContext {
          @Override
          public void preApplication()
              throws InstantiationException, IllegalAccessException { }
          @Override
          public void postApplication() { }
          @Override
          public void preSuperstep() { }
          @Override
          public void postSuperstep() { }
        }

        public static class SimplePageRankMasterCompute extends DefaultMasterCompute {
          @Override
          public void initialize()
              throws InstantiationException, IllegalAccessException {
          }
        }

        public static class SimplePageRankVertexReader extends
            GeneratedVertexReader<LongWritable, DoubleWritable, FloatWritable> {
          @Override
          public boolean nextVertex() {
            return totalRecords > recordsRead;
          }

          @Override
          public Vertex<LongWritable, DoubleWritable, FloatWritable>
          getCurrentVertex() throws IOException {
            Vertex<LongWritable, DoubleWritable, FloatWritable> vertex =
                getConf().createVertex();
            LongWritable vertexId = new LongWritable(
                (inputSplit.getSplitIndex() * totalRecords) + recordsRead);
            DoubleWritable vertexValue = new DoubleWritable(vertexId.get() * 10d);
            long targetVertexId = (vertexId.get() + 1) %
                (inputSplit.getNumSplits() * totalRecords);
            float edgeValue = vertexId.get() * 100f;
            List<Edge<LongWritable, FloatWritable>> edges = Lists.newLinkedList();
            edges.add(EdgeFactory.create(new LongWritable(targetVertexId),
                new FloatWritable(edgeValue)));
            vertex.initialize(vertexId, vertexValue, edges);
            ++recordsRead;
            return vertex;
          }
        }

        public static class SimplePageRankVertexInputFormat extends
            GeneratedVertexInputFormat<LongWritable, DoubleWritable, FloatWritable> {
          @Override
          public VertexReader<LongWritable, DoubleWritable, FloatWritable>
          createVertexReader(InputSplit split, TaskAttemptContext context)
              throws IOException {
            return new SimplePageRankVertexReader();
          }
        }

        public static class SimplePageRankVertexOutputFormat extends
            TextVertexOutputFormat<LongWritable, DoubleWritable, FloatWritable> {
          @Override
          public TextVertexWriter createVertexWriter(TaskAttemptContext context)
              throws IOException, InterruptedException {
            return new SimplePageRankVertexWriter();
          }

          public class SimplePageRankVertexWriter extends TextVertexWriter {
            @Override
            public void writeVertex(
                Vertex<LongWritable, DoubleWritable, FloatWritable> vertex)
                throws IOException, InterruptedException {
              getRecordWriter().write(
                  new Text(vertex.getId().toString()),
                  new Text(vertex.getValue().toString()));
            }
          }
        }
      }
  28. Pagerank for TinkerPop3

      public class PageRankVertexProgram implements VertexProgram<Double> {
        private MessageType.Local messageType =
            MessageType.Local.of(() -> GraphTraversal.<Vertex>of().outE());
        public static final String PAGE_RANK = Graph.Key.hide("gremlin.pageRank");
        public static final String EDGE_COUNT = Graph.Key.hide("gremlin.edgeCount");
        private static final String VERTEX_COUNT =
            "gremlin.pageRankVertexProgram.vertexCount";
        private static final String ALPHA = "gremlin.pageRankVertexProgram.alpha";
        private static final String TOTAL_ITERATIONS =
            "gremlin.pageRankVertexProgram.totalIterations";
        private static final String INCIDENT_TRAVERSAL =
            "gremlin.pageRankVertexProgram.incidentTraversal";
        private double vertexCountAsDouble = 1;
        private double alpha = 0.85d;
        private int totalIterations = 30;
        private static final Set<String> COMPUTE_KEYS =
            new HashSet<>(Arrays.asList(PAGE_RANK, EDGE_COUNT));

        private PageRankVertexProgram() {}

        @Override
        public void loadState(final Configuration configuration) {
          this.vertexCountAsDouble = configuration.getDouble(VERTEX_COUNT, 1.0d);
          this.alpha = configuration.getDouble(ALPHA, 0.85d);
          this.totalIterations = configuration.getInt(TOTAL_ITERATIONS, 30);
          try {
            if (configuration.containsKey(INCIDENT_TRAVERSAL)) {
              final SSupplier<Traversal> traversalSupplier =
                  VertexProgramHelper.deserialize(configuration, INCIDENT_TRAVERSAL);
              VertexProgramHelper.verifyReversibility(traversalSupplier.get());
              this.messageType = MessageType.Local.of((SSupplier) traversalSupplier);
            }
          } catch (final Exception e) {
            throw new IllegalStateException(e.getMessage(), e);
          }
        }

        @Override
        public void storeState(final Configuration configuration) {
          configuration.setProperty(GraphComputer.VERTEX_PROGRAM,
              PageRankVertexProgram.class.getName());
          configuration.setProperty(VERTEX_COUNT, this.vertexCountAsDouble);
          configuration.setProperty(ALPHA, this.alpha);
          configuration.setProperty(TOTAL_ITERATIONS, this.totalIterations);
          try {
            VertexProgramHelper.serialize(this.messageType.getIncidentTraversal(),
                configuration, INCIDENT_TRAVERSAL);
          } catch (final Exception e) {
            throw new IllegalStateException(e.getMessage(), e);
          }
        }

        @Override
        public Set<String> getElementComputeKeys() {
          return COMPUTE_KEYS;
        }

        @Override
        public void setup(final Memory memory) {

        }

        @Override
        public void execute(final Vertex vertex, Messenger<Double> messenger,
            final Memory memory) {
          if (memory.isInitialIteration()) {
            double initialPageRank = 1.0d / this.vertexCountAsDouble;
            double edgeCount = Double.valueOf(
                (Long) this.messageType.edges(vertex).count().next());
            vertex.singleProperty(PAGE_RANK, initialPageRank);
            vertex.singleProperty(EDGE_COUNT, edgeCount);
            messenger.sendMessage(this.messageType, initialPageRank / edgeCount);
          } else {
            double newPageRank = StreamFactory.stream(
                messenger.receiveMessages(this.messageType))
                .reduce(0.0d, (a, b) -> a + b);
            newPageRank = (this.alpha * newPageRank)
                + ((1.0d - this.alpha) / this.vertexCountAsDouble);
            vertex.singleProperty(PAGE_RANK, newPageRank);
            messenger.sendMessage(this.messageType,
                newPageRank / vertex.<Double>property(EDGE_COUNT).orElse(0.0d));
          }
        }

        @Override
        public boolean terminate(final Memory memory) {
          return memory.getIteration() >= this.totalIterations;
        }
      }
  29. Pagerank for ArangoDB

      var pageRank = function (vertex, message, global) {
        var total, rank, edgeCount, send, edge, alpha, sum;
        total = global.vertexCount;
        edgeCount = vertex._outEdges.length;
        alpha = global.alpha;
        sum = 0;
        if (global.step > 0) {
          while (message.hasNext()) {
            sum += message.next().data;
          }
          rank = alpha * sum + (1 - alpha) / total;
        } else {
          rank = 1 / total;
        }
        vertex._setResult(rank);
        if (global.step < global.MAX_STEPS) {
          send = rank / edgeCount;
          while (vertex._outEdges.hasNext()) {
            edge = vertex._outEdges.next();
            message.sendTo(edge._getTarget(), send);
          }
        } else {
          vertex._deactivate();
        }
      };

      var combiner = function (message, oldMessage) {
        return message + oldMessage;
      };

      var Runner = require("org/arangodb/pregelRunner").Runner;
      var runner = new Runner();
      runner.setWorker(pageRank);
      runner.setCombiner(combiner);
      runner.start("myGraph");
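
The rank update used in the worker above, rank = alpha * sum + (1 - alpha) / total, can be tried outside the database. The following is a minimal standalone sketch in plain JavaScript. It does not use the experimental pregelRunner API: the function `pageRank` here, its adjacency-object input and the sample graph are assumptions for illustration, with alpha = 0.85 and 30 steps taken from the values that appear in the Giraph and TinkerPop3 examples.

```javascript
// Standalone sketch of the same update rule on an in-memory graph.
// outEdges maps each vertex id to the ids its outgoing edges point to.
function pageRank(outEdges, alpha = 0.85, maxSteps = 30) {
  const ids = Object.keys(outEdges);
  const total = ids.length;

  // Step 0: every vertex starts with rank 1 / total.
  let rank = Object.fromEntries(ids.map((id) => [id, 1 / total]));

  for (let step = 1; step <= maxSteps; step += 1) {
    // "Messages": each vertex sends rank / edgeCount along every out-edge;
    // the SUM combiner adds them up per target vertex.
    const sum = Object.fromEntries(ids.map((id) => [id, 0]));
    for (const id of ids) {
      const share = rank[id] / outEdges[id].length;
      for (const target of outEdges[id]) sum[target] += share;
    }
    // Worker update, identical in form to the slide's formula.
    rank = Object.fromEntries(
      ids.map((id) => [id, alpha * sum[id] + (1 - alpha) / total])
    );
  }
  return rank;
}

// Illustrative graph: a -> b, a -> c, b -> c, c -> a.
const ranks = pageRank({ a: ["b", "c"], b: ["c"], c: ["a"] });
```

In this sample graph every vertex has at least one outgoing edge, so the ranks keep summing to 1 after each step. Vertex c receives rank from both a and b and therefore ends up ranked above b.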
  30. Thank you
     - Further Questions?
       - Follow me on twitter/github: @mchacki
       - Write me a mail: mchacki@arangodb.com
     - Follow @arangodb on Twitter
     - Join our google group: https://groups.google.com/forum/#!forum/arangodb
     - Visit our blog: https://www.arangodb.com/blog
     - Slides available at https://www.slideshare.net/arangodb
  31. 17th–18th Nov 2014, Madrid (Spain)
