Pregel is a system for large-scale graph processing that addresses the challenges of processing large graphs in a distributed environment. It uses a vertex-centric programming model where computation is expressed as discrete message-passing iterations called supersteps. In each superstep, vertices send messages to each other and can update their own state and outgoing edges. The system scales to graphs with billions of vertices across thousands of machines by partitioning the graph and assigning vertices to different machines. It provides fault tolerance through checkpointing. Pregel has been used to implement various graph algorithms like PageRank, shortest paths, and bipartite matching. Experiments showed it scales linearly with the number of machines.
1. Pregel: A System for Large-Scale
Graph Processing
Written by: Grzegorz Malewicz et al., SIGMOD 2010
Presented by: Abolfazl Asudeh
CSE 6339 – Spring 2013
2. Problem?
Very large graphs are a popular object of analysis:
e.g. social networks, the web, and several other areas
Efficient processing of large graphs is challenging:
poor locality of memory access
very little work per vertex
Distribution over many machines:
the locality issue?
Machine failure?
There was no scalable general-purpose system for
implementing arbitrary graph algorithms over
arbitrary graph representations in a large-scale
distributed environment
2 4/11/2013
3. Want to process a large scale graph?
The options:
Crafting a custom distributed infrastructure
Needs a lot of effort and has to be repeated for every new
algorithm
Relying on an existing distributed platform, e.g.
MapReduce
Must store the entire graph state at every stage, leading to
too much communication between stages
Using a single-computer graph algorithm library: limits the
size of the problems it can handle
Using an existing parallel graph system
These do not address problems like fault tolerance that are
very important in large-scale graph processing
4. How can we solve the problem?
How to assign billions of vertices/edges (the graph) to
thousands or millions of computers (the distributed system)?
5. The high-level organization of Pregel
programs
[Figure: Input → supersteps → all vote to halt → Output]
A sequence of iterations, called supersteps
In each superstep, every vertex invokes a user-defined
function, conceptually in parallel
The function specifies the behavior at a single vertex V
during a single superstep S
It can read messages sent in the previous superstep and
send messages for the next superstep
It can modify the state of V and of its outgoing edges
6. Advantage?
In the vertex-centric approach
users focus on a local action
processing each vertex independently
The synchronicity of the superstep model ensures that Pregel
programs are inherently free of the deadlocks and data races
common in asynchronous systems
7. MODEL OF COMPUTATION
A directed graph is given to Pregel
It runs the computation at each vertex
until all vertices vote to halt
Then it returns the results
8. Vertex State Machine
Algorithm termination is based on every vertex
voting to halt
In superstep 0, every vertex is in the active state
A vertex deactivates itself by voting to halt
It can be reactivated by receiving an (external)
message
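The superstep/halt cycle above can be sketched as a toy single-process simulation in plain C++ (the types and the function name RunMaxPropagation are made up here, not Pregel's real API): each vertex propagates the largest value it has seen and votes to halt once its value stops changing, exactly the state machine described on this slide.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy single-process sketch of the Pregel vertex state machine (not the
// real API): vertices propagate the maximum value seen so far and vote to
// halt when their value stops changing. A halted vertex is reactivated
// only by an incoming message.
struct Vertex {
  int value = 0;
  std::vector<int> out;  // indices of out-neighbors
  bool active = true;    // every vertex starts active (superstep 0)
};

int RunMaxPropagation(std::vector<Vertex>& g) {
  std::vector<std::vector<int>> inbox(g.size()), next(g.size());
  int superstep = 0;
  bool any_active = true;
  while (any_active) {
    any_active = false;
    for (std::size_t v = 0; v < g.size(); ++v) {
      // Skip halted vertices unless a message reactivates them.
      if (!g[v].active && inbox[v].empty()) continue;
      g[v].active = true;
      int old = g[v].value;
      for (int m : inbox[v]) g[v].value = std::max(g[v].value, m);
      if (superstep == 0 || g[v].value != old) {
        for (int t : g[v].out) next[t].push_back(g[v].value);
      } else {
        g[v].active = false;  // vote to halt
      }
      any_active = any_active || g[v].active;
    }
    std::swap(inbox, next);  // messages become visible next superstep
    for (auto& q : next) q.clear();
    ++superstep;
  }
  return superstep;  // number of supersteps executed
}
```

On the chain 0→1→2 with initial values {3, 1, 2}, every vertex ends up holding 3; the run terminates because a vertex that neither changes nor sends votes to halt, and no messages remain in flight.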
10. The C++ API - Message Passing
Messages are guaranteed to be delivered, but not
necessarily in the order they were sent
Each message is delivered exactly once
A vertex can send messages to any vertex whose identifier
it knows, not only to its neighbors
12. The C++ API – other classes
Combiners (not enabled by default)
By subclassing this, the user can specify how to combine
several messages bound for the same vertex into one,
reducing the number of messages sent
Aggregators
Gather global information such as statistical values
(sum, average, min, …)
Can also be used as a global coordinator, e.g. to force the
vertices to run a specific branch of their Compute
function during specific supersteps
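What a combiner buys can be shown with a small standalone sketch (the Message type and CombineBySum are illustrative names, not the paper's Combiner class): messages headed for the same destination vertex are folded into one before crossing the network, here with a sum, as PageRank would use.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Sketch of a sum combiner (illustrative, not the paper's exact C++ API):
// all outgoing messages bound for the same destination vertex are folded
// into a single message before being sent over the network.
using Message = std::pair<int, double>;  // (destination vertex id, value)

std::vector<Message> CombineBySum(const std::vector<Message>& outgoing) {
  std::map<int, double> folded;
  for (const Message& m : outgoing) folded[m.first] += m.second;
  return std::vector<Message>(folded.begin(), folded.end());
}
```

For single-source shortest paths the same idea applies with min instead of +, since only the smallest incoming distance matters to the receiver.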
13. The C++ API
Topology Mutation
Some graph algorithms need to change the graph's
topology
E.g. a clustering algorithm may need to replace a cluster with a
single vertex
Mutations are partially ordered: a vertex is added before its
edges are added, and all edges are removed before their
vertex is removed
User-defined handlers can be supplied to resolve the
remaining conflicts
Input / Output
Pregel has readers/writers for common formats such as text
files and relational databases
Users can implement custom readers/writers for new
input/output formats
14. Implementation
Pregel was designed for the Google cluster
architecture
Each cluster consists of thousands of commodity
PCs organized into racks with high intra-rack
bandwidth
Clusters are interconnected but distributed
geographically
Vertices are assigned to machines by hashing their vertex
ID ( hash(ID) mod N ), so any machine can determine which
worker owns a given vertex without a lookup
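The placement rule can be sketched in a few lines (std::hash stands in for whatever hash function Google actually used, and WorkerFor is a made-up name): because the owning worker is a pure function of the vertex ID, any machine can route a message without consulting a directory.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Sketch of Pregel's default partitioning: the worker that owns a vertex
// is hash(ID) mod N. std::hash is a stand-in for the real hash function.
int WorkerFor(const std::string& vertex_id, int num_workers) {
  return static_cast<int>(std::hash<std::string>{}(vertex_id) % num_workers);
}
```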
15. Implementation
1. The user program is copied to many machines
2. One machine becomes the master
Other machines find the master using a name service
and register themselves with it
The master determines how many partitions the graph
will have
3. The master assigns one or more partitions (why?)
and a portion of the user input to each worker
4. The workers run the Compute function for the active
vertices and send messages asynchronously
There is one thread per partition in each worker
When the superstep finishes, the workers tell the master
how many vertices will be active in the next superstep
16. Fault tolerance
At the end of each superstep:
Workers checkpoint V, E, and incoming messages
The master checkpoints the aggregated values
Failures are detected by “ping” messages from the master
to the workers
The master reassigns the lost partitions to the available
workers, and their state is recovered from the last checkpoint
In confined recovery, only the lost partitions have to be
recomputed, because the results of the other partitions'
computations are already known
Not applicable to randomized (nondeterministic) algorithms
17. Worker implementation
Maintains its assigned partitions of the graph
Keeps message queues for supersteps S and S+1
If a message's destination vertex is on another machine, the
message is buffered, and the buffer is flushed when it fills up
If the user has defined a Combiner, it is applied to the
buffered remote messages before they are sent
18. Master Implementation
Maintains a list of all workers currently known to be
alive, including each worker's ID and address and
the portion of the graph assigned to it
Performs the synchronization and coordinates all
operations
Maintains statistics and runs an HTTP server for user
monitoring
19. Aggregators Implementation
Aggregated values are passed to the master in a tree
structure
Workers send their partial values to aggregator nodes,
which combine them and forward the result to the master
[Figure: Workers → Aggregators → Master]
20. Application – PageRank
class PageRankVertex
    : public Vertex&lt;double, void, double&gt; {
 public:
  virtual void Compute(MessageIterator* msgs) {
    if (superstep() >= 1) {
      // Sum the tentative PageRank sent by the in-neighbors.
      double sum = 0;
      for (; !msgs->Done(); msgs->Next())
        sum += msgs->Value();
      *MutableValue() = 0.15 / NumVertices() + 0.85 * sum;
    }
    if (superstep() < 30) {
      // Distribute this vertex's rank evenly over its out-edges.
      const int64 n = GetOutEdgeIterator().size();
      SendMessageToAllNeighbors(GetValue() / n);
    } else {
      VoteToHalt();  // stop after 30 supersteps
    }
  }
};
21. Application – Shortest Paths
class ShortestPathVertex
    : public Vertex&lt;int, int, int&gt; {
  void Compute(MessageIterator* msgs) {
    // Distance 0 at the source, infinity everywhere else.
    int mindist = IsSource(vertex_id()) ? 0 : INF;
    for (; !msgs->Done(); msgs->Next())
      mindist = min(mindist, msgs->Value());
    if (mindist < GetValue()) {
      // Found a shorter path: record it and relax the out-edges.
      *MutableValue() = mindist;
      OutEdgeIterator iter = GetOutEdgeIterator();
      for (; !iter.Done(); iter.Next())
        SendMessageTo(iter.Target(), mindist + iter.GetValue());
    }
    VoteToHalt();  // reactivated whenever a new message arrives
  }
};
22. Application – Bipartite matching
Problem: find a maximal set of edges in the bipartite graph
such that no two edges share an endpoint
Each cycle consists of four supersteps:
1. Each left vertex not yet matched sends a message to
each of its neighbors to request a match, and then
unconditionally votes to halt
2. Each right vertex not yet matched randomly chooses
one of the messages it receives and sends a message
granting that request
3. Each left vertex not yet matched chooses one of the
grants it receives and sends an acceptance message
4. The right vertex receives the acceptance, records the
match, and votes to halt
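The four phases can be simulated in a few lines of plain C++ (a single-process sketch with made-up names, not Pregel code; for determinism the right vertex here grants the first request it received rather than a random one):

```cpp
#include <cassert>
#include <map>
#include <set>
#include <utility>
#include <vector>

// Single-process sketch of one cycle (four supersteps) of the matching
// algorithm above, run on explicit (left, right) edge pairs.
std::map<int, int> OneMatchingCycle(
    const std::vector<std::pair<int, int>>& edges,
    std::map<int, int> matched) {          // current left -> right matches
  std::set<int> taken;                     // right vertices already matched
  for (const auto& m : matched) taken.insert(m.second);
  // Phase 1: every unmatched left vertex requests all of its neighbors
  // (already-matched right vertices simply ignore the request).
  std::map<int, std::vector<int>> requests;  // right -> requesting lefts
  for (const auto& e : edges)
    if (!matched.count(e.first) && !taken.count(e.second))
      requests[e.second].push_back(e.first);
  // Phase 2: every unmatched right vertex grants exactly one request.
  std::map<int, std::vector<int>> grants;  // left -> granting rights
  for (const auto& r : requests) grants[r.second.front()].push_back(r.first);
  // Phases 3 and 4: each left vertex accepts one grant, and the chosen
  // right vertex records the match.
  for (const auto& g : grants) matched[g.first] = g.second.front();
  return matched;
}
```

On the edges {(1,10), (2,10), (2,11)}, one cycle matches 1–10 and 2–11; any left vertex that ends a cycle unmatched simply retries in the next cycle.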
23. Experiments
300 multicore PCs were used
Only the running time is counted (checkpointing is disabled)
Measure scalability in the number of workers
Measure scalability in the number of vertices
24. Shortest paths runtimes for a binary tree with a
billion vertices (and, thus, a billion minus one edges)
when the number of Pregel workers varies from 50 to
800
25. Shortest paths runtimes for binary trees varying in
size from a billion to 50 billion vertices, now using a
fixed number of 800 worker tasks scheduled on 300
multicore machines.
26. Random graphs that use a log-normal
distribution of outdegrees