This document discusses processing large-scale graphs with Google's Pregel framework. It gives an overview of Pregel, including its Map-Reduce-like approach with multiple iterations, and walks step by step through an example that computes the connected components of a graph. It also covers graph algorithms that can be implemented with Pregel, such as PageRank, bipartite matching, and shortest paths, and Pregel implementations in systems such as Giraph, TinkerPop and ArangoDB.
2. About
about us
Frank Celler (@fceller) working on the ArangoDB core
Michael Hackstein (@mchacki) started an experimental
implementation of Pregel
about the talk
different kinds of graph algorithms
Pregel example
Pregel mind set aka Framework
more examples
4. Pregel at ArangoDB
Started as a side project in free hack time
Experimental on operational database
Implemented as an alternative to traversals
Make use of the flexibility of JavaScript:
No strict type system
No pre-compilation, on-the-fly queries
Native JSON documents
Really fast development
5. Graph Algorithms
Pattern matching
Search through the entire graph
Identify similar components
⇒ Touch all vertices and their neighbourhoods
Traversals
Define a specific start point
Iteratively explore the graph
⇒ History of steps is known
Global measurements
Compute one value for the graph, based on all its vertices or edges
Compute one value for each vertex or edge
⇒ Often require a global view of the graph
8. Pregel
A framework to query distributed, directed graphs.
Known as “Map-Reduce” for graphs
Uses same phases
Has several iterations
Aims at:
Operate all servers at full capacity
Reduce network traffic
Good at calculations touching all vertices
Bad at calculations touching a very small number of vertices
24. Worker ≙ Map
“Map” a user-defined algorithm over all vertices
Output: set of messages to other vertices
Available parameters:
The current vertex and its outbound edges
All incoming messages
Global values
Allow modifications on the vertex:
Attach a result to this vertex and its outgoing edges
Delete the vertex and its outgoing edges
Deactivate the vertex
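The worker step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the ArangoDB API: the `Vertex` record, the `worker_step` signature, and the `"step"` key in the global values are all hypothetical names. The algorithm is a connected-components style step in which every vertex keeps the smallest vertex id it has seen and forwards it along its outbound edges.

```python
from dataclasses import dataclass


@dataclass
class Vertex:
    """Hypothetical vertex record (not an ArangoDB type)."""
    id: int
    out_edges: list          # ids of the vertices reached by outbound edges
    result: object = None    # value attached to the vertex by the algorithm
    active: bool = True


def worker_step(vertex, incoming, global_values):
    """One "Map" call for one vertex; returns (target_id, message) pairs."""
    if global_values["step"] == 0:
        # First round: every vertex starts with its own id as its label.
        vertex.result = vertex.id
        changed = True
    else:
        # Later rounds: adopt the smallest label received, if it is smaller.
        smallest = min(incoming, default=vertex.result)
        changed = smallest < vertex.result
        if changed:
            vertex.result = smallest
    # Deactivate the vertex; in this sketch new mail would wake it up again.
    vertex.active = False
    if not changed:
        return []
    return [(target, vertex.result) for target in vertex.out_edges]


v = Vertex(id=3, out_edges=[1, 4])
first = worker_step(v, [], {"step": 0})       # announces its own id
second = worker_step(v, [2, 5], {"step": 1})  # adopts the smaller label 2
```

The function only touches the three inputs listed on the slide (the vertex with its outbound edges, the incoming messages, and the global values), which is what allows such a step to run independently on every server.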
25. Combine ≙ Reduce
“Reduce” all generated messages
Output: An aggregated message for each vertex.
Executed on sender as well as receiver.
Available parameters:
One new message for a vertex
The stored aggregate for this vertex
Typical combiners are SUM, MIN or MAX
Reduces network traffic
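A MIN combiner can be sketched as follows. The names `combine_min` and `combine_outbox` are hypothetical, and the convention that a missing stored aggregate is `None` is an assumption of this sketch. The combiner receives one new message and the aggregate stored so far for the target vertex, as described above.

```python
def combine_min(new_message, stored_aggregate):
    """MIN combiner: merge one new message into the stored aggregate."""
    if stored_aggregate is None:  # assumption: None means "nothing stored yet"
        return new_message
    return min(new_message, stored_aggregate)


def combine_outbox(messages, combine):
    """Fold a list of (target_id, message) pairs into one aggregate per target."""
    aggregated = {}
    for target, message in messages:
        aggregated[target] = combine(message, aggregated.get(target))
    return aggregated


# Five outgoing messages addressed to two vertices ...
outbox = [(1, 7), (2, 4), (1, 3), (2, 9), (1, 5)]
# ... shrink to one aggregated message per vertex before they are sent.
combined = combine_outbox(outbox, combine_min)
```

Because MIN is associative and commutative, the receiver can apply the same function again to the aggregates arriving from different senders, which is why the combiner can run on both sides.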
26. Activity ≙ Termination
Execute several rounds of Map/Reduce
Count active vertices and messages
Start next round if one of the following is true:
At least one vertex is active
At least one message is sent
Terminate if neither a vertex is active nor messages were sent
Store all non-deleted vertices and edges as resulting graph
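The round structure and the termination rule can be sketched as a single-process loop. This is a simplified illustration, not a distributed implementation: `run_pregel` and `min_label` are hypothetical names, vertex deletion is left out, every undirected edge is assumed to be stored in both directions, and `max_rounds` is a safety limit added for the sketch.

```python
def run_pregel(graph, compute, combine, max_rounds=100):
    """graph maps vertex id -> list of outbound neighbour ids."""
    results = {v: None for v in graph}
    active = set(graph)   # all vertices start active
    inbox = {}            # vertex id -> aggregated message
    rounds = 0
    # Next round starts if a vertex is active or a message was sent.
    while (active or inbox) and rounds < max_rounds:
        outbox, still_active = {}, set()
        for v in graph:
            if v not in active and v not in inbox:
                continue  # inactive vertex without mail is skipped
            result, messages, stay_active = compute(
                v, graph[v], results[v], inbox.get(v))
            results[v] = result
            if stay_active:
                still_active.add(v)
            for target, message in messages:
                # Combine messages per target as they are produced.
                if target in outbox:
                    outbox[target] = combine(message, outbox[target])
                else:
                    outbox[target] = message
        active, inbox = still_active, outbox
        rounds += 1
    return results, rounds


def min_label(vertex_id, out_edges, current, aggregate):
    """Connected components: keep and forward the smallest id seen."""
    if current is None:
        new = vertex_id                      # first round
    elif aggregate is not None and aggregate < current:
        new = aggregate                      # a smaller label arrived
    else:
        return current, [], False            # nothing changed: stay quiet
    return new, [(t, new) for t in out_edges], False


# Two components: {1, 2, 3} and {4, 5}; edges stored in both directions.
graph = {1: [2], 2: [1, 3], 3: [2], 4: [5], 5: [4]}
labels, rounds = run_pregel(graph, min_label, min)
```

After the loop ends, `labels` maps every vertex to the smallest id in its component, so vertices with equal labels belong to the same component.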