Pregel is a system for large-scale graph processing that addresses the challenges of processing large graphs in a distributed environment. It uses a vertex-centric programming model where computation is expressed as discrete message-passing iterations called supersteps. In each superstep, vertices send messages to each other and can update their own state and outgoing edges. The system scales to graphs with billions of vertices across thousands of machines by partitioning the graph and assigning vertices to different machines. It provides fault tolerance through checkpointing. Pregel has been used to implement various graph algorithms like PageRank, shortest paths, and bipartite matching. Experiments showed it scales linearly with the number of machines.
1. Pregel: A System for Large-Scale
Graph Processing
Written by: Grzegorz Malewicz et al., SIGMOD 2010
Presented by: Abolfazl Asudeh
CSE 6339 – Spring 2013
2. Problem?
Very large graphs are a popular object of analysis:
e.g. social networks, the web, and several other areas
Efficient processing of large graphs is challenging:
poor locality of memory access
very little work per vertex
Distribution over many machines:
the locality issue?
Machine failure?
There was no scalable general-purpose system for
implementing arbitrary graph algorithms over
arbitrary graph representations in a large-scale
distributed environment
2 4/11/2013
3. Want to process a large scale graph?
The options:
Crafting a custom distributed infrastructure
Needs a lot of effort and has to be repeated for every new
algorithm
Relying on an existing distributed platform, e.g.
MapReduce
Must store the entire graph state at every stage, leading to
too much communication between stages
Using a single-computer graph algorithm library: limits the
size of the problems it can handle
Using an existing parallel graph system
These do not address problems like fault tolerance that are
very important in large-scale graph processing
4. How can we solve the problem?
How to assign billions of vertices/edges (the graph) to
thousands or millions of computers (the distributed system)?
5. The high-level organization of Pregel
programs
[Figure: Input → supersteps → all vote to halt → Output]
A sequence of iterations, called supersteps
In each superstep, every vertex invokes a user-defined
function, conceptually in parallel
The function specifies the behavior at a single vertex V
during a single superstep S
It can read messages sent in the previous superstep and
send messages for the next superstep
It can modify the state of V and of its outgoing edges
6. Advantage?
In the vertex-centric approach
users focus on a local action
processing each vertex independently
The synchronicity of the superstep model ensures that Pregel
programs are inherently free of the deadlocks and data races
common in asynchronous systems
7. MODEL OF COMPUTATION
A directed graph is given to Pregel
It runs the computation at each vertex
until all vertices vote to halt
Then it returns the results
8. Vertex State Machine
Algorithm termination is based on every vertex
voting to halt
In superstep 0, every vertex is in the active state
A vertex deactivates itself by voting to halt
It can be reactivated by receiving an (external)
message
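The superstep/halt cycle above can be sketched as a toy single-process simulation in plain C++ (the types and the function name RunMaxPropagation are made up here, not Pregel's real API): each vertex propagates the largest value it has seen and votes to halt once its value stops changing, exactly the state machine described on this slide.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy single-process sketch of the Pregel vertex state machine (not the
// real API): vertices propagate the maximum value seen so far and vote to
// halt when their value stops changing. A halted vertex is reactivated
// only by an incoming message.
struct Vertex {
  int value = 0;
  std::vector<int> out;  // indices of out-neighbors
  bool active = true;    // every vertex starts active (superstep 0)
};

int RunMaxPropagation(std::vector<Vertex>& g) {
  std::vector<std::vector<int>> inbox(g.size()), next(g.size());
  int superstep = 0;
  bool any_active = true;
  while (any_active) {
    any_active = false;
    for (std::size_t v = 0; v < g.size(); ++v) {
      // Skip halted vertices unless a message reactivates them.
      if (!g[v].active && inbox[v].empty()) continue;
      g[v].active = true;
      int old = g[v].value;
      for (int m : inbox[v]) g[v].value = std::max(g[v].value, m);
      if (superstep == 0 || g[v].value != old) {
        for (int t : g[v].out) next[t].push_back(g[v].value);
      } else {
        g[v].active = false;  // vote to halt
      }
      any_active = any_active || g[v].active;
    }
    std::swap(inbox, next);  // messages become visible next superstep
    for (auto& q : next) q.clear();
    ++superstep;
  }
  return superstep;  // number of supersteps executed
}
```

On the chain 0→1→2 with initial values {3, 1, 2}, every vertex ends up holding 3; the run terminates because a vertex that neither changes nor sends votes to halt, and no messages remain in flight.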
10. The C++ API - Message Passing
Messages are guaranteed to be delivered, but not
necessarily in the order they were sent
Each message is delivered exactly once
A vertex can send messages to any vertex whose identifier
it knows, not only to its neighbors
12. The C++ API – other classes
Combiners (not enabled by default)
By subclassing this, the user can specify how to combine
several messages bound for the same vertex into one,
reducing the number of messages sent
Aggregators
Gather global information such as statistical values
(sum, average, min, …)
Can also be used as a global coordinator, e.g. to force the
vertices to run a specific branch of their Compute
function during specific supersteps
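What a combiner buys can be shown with a small standalone sketch (the Message type and CombineBySum are illustrative names, not the paper's Combiner class): messages headed for the same destination vertex are folded into one before crossing the network, here with a sum, as PageRank would use.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Sketch of a sum combiner (illustrative, not the paper's exact C++ API):
// all outgoing messages bound for the same destination vertex are folded
// into a single message before being sent over the network.
using Message = std::pair<int, double>;  // (destination vertex id, value)

std::vector<Message> CombineBySum(const std::vector<Message>& outgoing) {
  std::map<int, double> folded;
  for (const Message& m : outgoing) folded[m.first] += m.second;
  return std::vector<Message>(folded.begin(), folded.end());
}
```

For single-source shortest paths the same idea applies with min instead of +, since only the smallest incoming distance matters to the receiver.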
13. The C++ API
Topology Mutation
Some graph algorithms need to change the graph's
topology
E.g. a clustering algorithm may need to replace a cluster with a
single vertex
Mutations are partially ordered: a vertex is added before its
edges are added, and all edges are removed before their
vertex is removed
User-defined handlers can be supplied to resolve the
remaining conflicts
Input / Output
Pregel has readers/writers for common formats such as text
files and relational databases
Users can implement custom readers/writers for new
input/output formats
14. Implementation
Pregel was designed for the Google cluster
architecture
Each cluster consists of thousands of commodity
PCs organized into racks with high intra-rack
bandwidth
Clusters are interconnected but distributed
geographically
Vertices are assigned to machines by hashing their vertex
ID ( hash(ID) mod N ), so any machine can determine which
worker owns a given vertex without a lookup
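The placement rule can be sketched in a few lines (std::hash stands in for whatever hash function Google actually used, and WorkerFor is a made-up name): because the owning worker is a pure function of the vertex ID, any machine can route a message without consulting a directory.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Sketch of Pregel's default partitioning: the worker that owns a vertex
// is hash(ID) mod N. std::hash is a stand-in for the real hash function.
int WorkerFor(const std::string& vertex_id, int num_workers) {
  return static_cast<int>(std::hash<std::string>{}(vertex_id) % num_workers);
}
```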
15. Implementation
1. The user program is copied to many machines
2. One machine becomes the master
Other machines find the master using a name service
and register themselves with it
The master determines how many partitions the graph
will have
3. The master assigns one or more partitions (why?)
and a portion of the user input to each worker
4. The workers run the Compute function for the active
vertices and send messages asynchronously
There is one thread per partition in each worker
When the superstep finishes, the workers tell the master
how many vertices will be active in the next superstep
16. Fault tolerance
At the end of each superstep:
Workers checkpoint V, E, and incoming messages
The master checkpoints the aggregated values
Failures are detected by “ping” messages from the master
to the workers
The master reassigns the lost partitions to the available
workers, and their state is recovered from the last checkpoint
In confined recovery, only the lost partitions have to be
recomputed, because the results of the other partitions'
computations are already known
Not applicable to randomized (nondeterministic) algorithms
17. Worker implementation
Maintains its assigned partitions of the graph
Keeps message queues for supersteps S and S+1
If a message's destination vertex is on another machine, the
message is buffered, and the buffer is flushed when it fills up
If the user has defined a Combiner, it is applied to the
buffered remote messages before they are sent
18. Master Implementation
Maintains a list of all workers currently known to be
alive, including each worker's ID and address and
the portion of the graph assigned to it
Performs the synchronization and coordinates all
operations
Maintains statistics and runs an HTTP server for user
monitoring
19. Aggregators Implementation
Aggregated values are passed to the master in a tree
structure
Workers send their partial values to aggregator nodes,
which combine them and forward the result to the master
[Figure: Workers → Aggregators → Master]
20. Application – PageRank
class PageRankVertex
    : public Vertex&lt;double, void, double&gt; {
 public:
  virtual void Compute(MessageIterator* msgs) {
    if (superstep() >= 1) {
      // Sum the tentative PageRank sent by the in-neighbors.
      double sum = 0;
      for (; !msgs->Done(); msgs->Next())
        sum += msgs->Value();
      *MutableValue() = 0.15 / NumVertices() + 0.85 * sum;
    }
    if (superstep() < 30) {
      // Distribute this vertex's rank evenly over its out-edges.
      const int64 n = GetOutEdgeIterator().size();
      SendMessageToAllNeighbors(GetValue() / n);
    } else {
      VoteToHalt();  // stop after 30 supersteps
    }
  }
};
21. Application – Shortest Paths
class ShortestPathVertex
    : public Vertex&lt;int, int, int&gt; {
  void Compute(MessageIterator* msgs) {
    // Distance 0 at the source, infinity everywhere else.
    int mindist = IsSource(vertex_id()) ? 0 : INF;
    for (; !msgs->Done(); msgs->Next())
      mindist = min(mindist, msgs->Value());
    if (mindist < GetValue()) {
      // Found a shorter path: record it and relax the out-edges.
      *MutableValue() = mindist;
      OutEdgeIterator iter = GetOutEdgeIterator();
      for (; !iter.Done(); iter.Next())
        SendMessageTo(iter.Target(), mindist + iter.GetValue());
    }
    VoteToHalt();  // reactivated whenever a new message arrives
  }
};
22. Application – Bipartite matching
Problem: find a maximal set of edges in the bipartite graph
such that no two edges share an endpoint
Each cycle consists of four supersteps:
1. Each left vertex not yet matched sends a message to
each of its neighbors to request a match, and then
unconditionally votes to halt
2. Each right vertex not yet matched randomly chooses
one of the messages it receives and sends a message
granting that request
3. Each left vertex not yet matched chooses one of the
grants it receives and sends an acceptance message
4. The right vertex receives the acceptance, records the
match, and votes to halt
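The four phases can be simulated in a few lines of plain C++ (a single-process sketch with made-up names, not Pregel code; for determinism the right vertex here grants the first request it received rather than a random one):

```cpp
#include <cassert>
#include <map>
#include <set>
#include <utility>
#include <vector>

// Single-process sketch of one cycle (four supersteps) of the matching
// algorithm above, run on explicit (left, right) edge pairs.
std::map<int, int> OneMatchingCycle(
    const std::vector<std::pair<int, int>>& edges,
    std::map<int, int> matched) {          // current left -> right matches
  std::set<int> taken;                     // right vertices already matched
  for (const auto& m : matched) taken.insert(m.second);
  // Phase 1: every unmatched left vertex requests all of its neighbors
  // (already-matched right vertices simply ignore the request).
  std::map<int, std::vector<int>> requests;  // right -> requesting lefts
  for (const auto& e : edges)
    if (!matched.count(e.first) && !taken.count(e.second))
      requests[e.second].push_back(e.first);
  // Phase 2: every unmatched right vertex grants exactly one request.
  std::map<int, std::vector<int>> grants;  // left -> granting rights
  for (const auto& r : requests) grants[r.second.front()].push_back(r.first);
  // Phases 3 and 4: each left vertex accepts one grant, and the chosen
  // right vertex records the match.
  for (const auto& g : grants) matched[g.first] = g.second.front();
  return matched;
}
```

On the edges {(1,10), (2,10), (2,11)}, one cycle matches 1–10 and 2–11; any left vertex that ends a cycle unmatched simply retries in the next cycle.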
23. Experiments
300 multicore PCs were used
Only the running time is counted (checkpointing is disabled)
Measure scalability in the number of workers
Measure scalability in the number of vertices
24. Shortest paths runtimes for a binary tree with a
billion vertices (and, thus, a billion minus one edges)
when the number of Pregel workers varies from 50 to
800
25. Shortest paths runtimes for binary trees varying in
size from a billion to 50 billion vertices, now using a
fixed number of 800 worker tasks scheduled on 300
multicore machines.
26. Random graphs that use a log-normal
distribution of outdegrees