(Abstract from Strata talk)
http://strataconf.com/strata2014/public/schedule/detail/32137
Graph analytics have applications beyond large web-scale organizations. Many computing problems can be expressed and processed efficiently as graphs, leading to useful insights that drive product and business decisions.
While you can express graph algorithms as SQL queries in Hive or as Hadoop MapReduce programs, an API designed specifically for graph processing makes many iterative graph computations (such as page rank, connected components, label propagation, and graph-based clustering) simpler to express and easier to understand. Apache Giraph provides such a native graph processing API, runs on existing Hadoop infrastructure, and can directly access HDFS and/or Hive tables.
This talk describes our efforts at Facebook to scale Apache Giraph to very large graphs of up to one trillion edges and how we run Apache Giraph in production. We will also talk about several algorithms that we have implemented and their use cases.
3. Apache Giraph
• Inspired by Google’s Pregel but runs on Hadoop
• “Think like a vertex”
• Maximum value vertex example
[Diagram: maximum value example. Vertices split across two processors (values 5, 1, 2) exchange values each superstep until every vertex holds the maximum value, 5.]
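In the vertex-centric API this example is only a few lines. A minimal sketch against Giraph's BasicComputation API (the class name and Writable type choices are mine, not from the talk):

import java.io.IOException;
import org.apache.giraph.graph.BasicComputation;
import org.apache.giraph.graph.Vertex;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;

// Every vertex repeatedly adopts and forwards the largest value it has seen.
public class MaxValueComputation extends BasicComputation<
    LongWritable, LongWritable, NullWritable, LongWritable> {
  @Override
  public void compute(Vertex<LongWritable, LongWritable, NullWritable> vertex,
      Iterable<LongWritable> messages) throws IOException {
    long max = vertex.getValue().get();
    for (LongWritable m : messages) {
      max = Math.max(max, m.get());
    }
    // Propagate only on the first superstep or when our value grew,
    // then vote to halt; an incoming message wakes the vertex up again.
    if (getSuperstep() == 0 || max > vertex.getValue().get()) {
      vertex.setValue(new LongWritable(max));
      sendMessageToAllEdges(vertex, vertex.getValue());
    }
    vertex.voteToHalt();
  }
}

The computation converges when no vertex receives a value larger than its own, so all vertices stay halted.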
5. Apache Giraph data flow
[Diagram: Giraph data flow in three stages.
Loading the graph: the master assigns input splits (Split 0-3) to workers; each worker reads vertices via the input format and sends them to the worker that owns their partition (Part 0-3), building the in-memory graph.
Compute/iterate: in each superstep the workers compute over their partitions and send messages, then send stats to the master, which decides whether to iterate.
Storing the graph: workers write their partitions out through the output format.]
7. Use case: k-means clustering
Cluster input vectors into k clusters
• Assign each input vector to the closest centroid
• Update centroid locations based on assignments
Random centroid location
Assignment to centroid
c0
Update centroids
c0
c2
c0
c2
c2
c0
c2
c1
c1
c1
c1
8. k-means in Giraph
Partitioning the problem
Input vectors → vertices
• Partitioned across machines
Centroids → aggregators
• Shared data across all machines (sketched below)
[Diagram: input vectors partitioned across Worker 0 and Worker 1; centroids c0, c1, c2 shared via aggregators]
Problem solved... right?
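A minimal sketch of this mapping in the Giraph API, simplified to one-dimensional input vectors for brevity. The class name and the aggregator names ("centroid/i", "sum/i", "count/i") are assumptions, not code from the talk; the master side that registers and updates them is sketched under slide 12 below.

import org.apache.giraph.graph.BasicComputation;
import org.apache.giraph.graph.Vertex;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;

// One k-means iteration per superstep: assign this vertex's vector to the
// nearest centroid, then contribute to the centroid update via aggregators.
public class KMeansComputation extends BasicComputation<
    LongWritable, DoubleWritable, NullWritable, NullWritable> {
  public static final int K = 3;  // number of clusters (assumed)

  @Override
  public void compute(Vertex<LongWritable, DoubleWritable, NullWritable> vertex,
      Iterable<NullWritable> messages) {
    double x = vertex.getValue().get();  // the input vector (1-D here)
    // Centroids are shared state, read from aggregators the master filled in.
    int best = 0;
    double bestDist = Double.MAX_VALUE;
    for (int c = 0; c < K; c++) {
      double centroid = this.<DoubleWritable>getAggregatedValue("centroid/" + c).get();
      double d = Math.abs(x - centroid);
      if (d < bestDist) {
        bestDist = d;
        best = c;
      }
    }
    // Contribute to the new centroid the master computes between supersteps.
    aggregate("sum/" + best, new DoubleWritable(x));
    aggregate("count/" + best, new LongWritable(1));
  }
}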
9. Problem 1: Massive dimensions
Cluster Facebook members by friendships?
• 1 billion members (dimensions)
• k clusters
Each worker sends the master a maximum of
• 1B dimensions × 2 bytes (a count fits in 2 bytes since friends are capped at 5k) × k clusters = 2k GB
Master receives up to 2k × (number of workers) GB
• Saturated network link
• OOM
11. Problem 2: Edge cut metric
Clusters should reduce the number of cut edges
Two phases
• Send your cluster id along all out-edges
• Aggregate the edges whose endpoints have different cluster ids
Calculate no more than once an hour?
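A hedged sketch of those two phases as Giraph computations. The class names, the convention of storing the cluster id as the vertex value, and the "cut-edges" aggregator name are assumptions; a LongSumAggregator under that name would have to be registered by the master.

import java.io.IOException;
import org.apache.giraph.graph.BasicComputation;
import org.apache.giraph.graph.Vertex;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;

// Phase 1: every vertex sends its cluster id (stored as the vertex value)
// along all out-edges.
class StartEdgeCutComputation extends BasicComputation<
    LongWritable, LongWritable, NullWritable, LongWritable> {
  @Override
  public void compute(Vertex<LongWritable, LongWritable, NullWritable> vertex,
      Iterable<LongWritable> messages) throws IOException {
    sendMessageToAllEdges(vertex, vertex.getValue());
  }
}

// Phase 2: count received ids that differ from our own cluster id and
// aggregate the total. Each cut edge is counted once per endpoint.
class EndEdgeCutComputation extends BasicComputation<
    LongWritable, LongWritable, NullWritable, LongWritable> {
  @Override
  public void compute(Vertex<LongWritable, LongWritable, NullWritable> vertex,
      Iterable<LongWritable> messages) throws IOException {
    long cut = 0;
    for (LongWritable clusterId : messages) {
      if (clusterId.get() != vertex.getValue().get()) {
        cut++;
      }
    }
    aggregate("cut-edges", new LongWritable(cut));
  }
}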
12. Master computation
Serial computation on master
• Communicates to workers via aggregators
• Added to Giraph by the Stanford GPS team
[Diagram: timeline of master, Worker 0, and Worker 1. K-means supersteps repeat on the workers, with start cut / end cut phases inserted periodically by the master.]
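A hedged sketch of a matching master for the k-means phases above, using Giraph's DefaultMasterCompute and stock aggregators; the aggregator names mirror the earlier vertex sketch and are equally assumed.

import org.apache.giraph.aggregators.DoubleOverwriteAggregator;
import org.apache.giraph.aggregators.DoubleSumAggregator;
import org.apache.giraph.aggregators.LongSumAggregator;
import org.apache.giraph.master.DefaultMasterCompute;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;

// Serial computation on the master: recompute centroids between supersteps
// from the sums/counts the workers aggregated.
public class KMeansMasterCompute extends DefaultMasterCompute {
  @Override
  public void initialize() throws InstantiationException, IllegalAccessException {
    for (int c = 0; c < KMeansComputation.K; c++) {
      // Persistent so centroid values survive across supersteps.
      registerPersistentAggregator("centroid/" + c, DoubleOverwriteAggregator.class);
      registerAggregator("sum/" + c, DoubleSumAggregator.class);
      registerAggregator("count/" + c, LongSumAggregator.class);
    }
  }

  @Override
  public void compute() {
    // Runs serially on the master before each superstep.
    for (int c = 0; c < KMeansComputation.K; c++) {
      long n = this.<LongWritable>getAggregatedValue("count/" + c).get();
      if (n > 0) {
        double sum = this.<DoubleWritable>getAggregatedValue("sum/" + c).get();
        setAggregatedValue("centroid/" + c, new DoubleWritable(sum / n));
      }
    }
  }
}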
13. Problem 3: More phases, more problems
Add a stage to initialize the centroids: seed them with random input vectors, plus a few random friends
Two phases
• Randomly sample input vertices to add their vectors to centroids
• Sampled vertices send messages to a few random neighbors, which add their vectors too
[Diagram: new centroid c3 alongside c0 and c2, seeded from sampled vertices and their friends]
14. Problem 3 (continued)
Cannot easily support different messages or combiners per phase
Vertex compute code getting messy:

if (phase == INITIALIZE_SELF)
  // Randomly add to centroid
else if (phase == INITIALIZE_FRIEND)
  // Add my vector to centroid if a friend selected me
else if (phase == K_MEANS)
  // Do k-means
else if (phase == START_EDGE_CUT)...
15. Composable computation
Decouple vertex from computation
Master sets the computation, combiner classes
Reusable and composable
Computation                            In message        Out message       Combiner
Add random centroid / random friends   Null              Centroid message  N/A
Add to centroid                        Centroid message  Null              N/A
K-means                                Null              Null              N/A
Start edge cut                         Null              Cluster           Cluster combiner
End edge cut                           Cluster           Null              N/A
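If I read the table above correctly, a master along these lines could drive the phases; setComputation and setMessageCombiner are the Giraph MasterCompute hooks for this, and every class name except KMeansComputation and the edge cut computations (sketched earlier) is a placeholder.

import org.apache.giraph.master.DefaultMasterCompute;

// Hedged sketch: the master picks the computation (and, for the edge cut
// phase, the message combiner) that workers run in the next superstep.
public class ClusteringMaster extends DefaultMasterCompute {
  private static final long CUT_INTERVAL = 10;  // assumed cadence

  @Override
  public void compute() {
    long step = getSuperstep();
    if (step == 0) {
      setComputation(AddRandomCentroidComputation.class);  // placeholder
    } else if (step == 1) {
      setComputation(AddToCentroidComputation.class);      // placeholder
    } else if (step % CUT_INTERVAL == 0) {
      setComputation(StartEdgeCutComputation.class);
      setMessageCombiner(ClusterCombiner.class);           // placeholder
    } else if (step % CUT_INTERVAL == 1) {
      setComputation(EndEdgeCutComputation.class);
    } else {
      setComputation(KMeansComputation.class);
    }
  }
}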
16. Composable computation (cont)
Balanced Label Propagation
• Compute candidates to move to partitions
• Probabilistically move vertices
• Continue if halting condition not met (i.e. < n vertices moved?)
17. Composable computation (cont)
Affinity Propagation
• Calculate and send responsibilities
• Calculate and send availabilities
• Update exemplars
• Continue if halting condition not met (i.e. < n vertices changed exemplars?)
18. Faster than Hive?
Application                    Graph Size    CPU Time Speedup   Elapsed Time Speedup
Page rank (single iteration)   400B+ edges   26x                120x
Friends of friends score       71B+ edges    12.5x              48x
24. Balanced label propagation results
* Loosely based on Ugander and Backstrom, "Balanced label propagation for partitioning massive graphs," WSDM '13
25. Avoiding out-of-core
Example: mutual friends calculation between neighbors
1. Send your friends a list of your friends
2. Intersect with your friend list
[Diagram: five vertices A-E exchange friend lists; e.g. D receives A:{C}, C:{A,E}, E:{C}, B:{}, and A receives C:{D}, D:{C}, E:{}]
Sizing the message traffic:
1.23B members (as of 1/2014)
200+ average friends (2011 S1)
8-byte ids (longs)
1.23B members × 200 friends × 200-entry lists × 8 bytes ≈ 394 TB / 100 GB machines
= 3,940 machines (not including the graph)
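A hedged sketch of the two supersteps; a production version would subclass ArrayWritable so Giraph can instantiate the message type when deserializing, and all names here are my own, not the talk's code.

import java.io.IOException;
import java.util.HashSet;
import java.util.Set;
import org.apache.giraph.edge.Edge;
import org.apache.giraph.graph.BasicComputation;
import org.apache.giraph.graph.Vertex;
import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Writable;

// Two-superstep mutual friends count between neighbors.
public class MutualFriendsComputation extends BasicComputation<
    LongWritable, LongWritable, NullWritable, ArrayWritable> {
  @Override
  public void compute(Vertex<LongWritable, LongWritable, NullWritable> vertex,
      Iterable<ArrayWritable> messages) throws IOException {
    if (getSuperstep() == 0) {
      // Superstep 0: send your friend list to every friend.
      LongWritable[] friends = new LongWritable[vertex.getNumEdges()];
      int i = 0;
      for (Edge<LongWritable, NullWritable> e : vertex.getEdges()) {
        friends[i++] = new LongWritable(e.getTargetVertexId().get());
      }
      sendMessageToAllEdges(vertex, new ArrayWritable(LongWritable.class, friends));
    } else {
      // Superstep 1: intersect each received list with your own friend list.
      Set<Long> mine = new HashSet<>();
      for (Edge<LongWritable, NullWritable> e : vertex.getEdges()) {
        mine.add(e.getTargetVertexId().get());
      }
      long mutual = 0;
      for (ArrayWritable msg : messages) {
        for (Writable w : msg.get()) {
          if (mine.contains(((LongWritable) w).get())) {
            mutual++;
          }
        }
      }
      vertex.setValue(new LongWritable(mutual));  // total mutual-friend pairs
      vertex.voteToHalt();
    }
  }
}

Superstep 1 is where the memory estimate above bites: every vertex buffers one 200-entry list per friend, which motivates the superstep splitting on the next slide.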
26. Superstep splitting
Send messages over subsets of source/destination edges per superstep
* Currently manual - future work: automatic!
With two source groups (A, B) and two destination groups (A, B), four supersteps cover every combination:
• Sources: A (on), B (off); Destinations: A (on), B (off)
• Sources: A (on), B (off); Destinations: A (off), B (on)
• Sources: A (off), B (on); Destinations: A (on), B (off)
• Sources: A (off), B (on); Destinations: A (off), B (on)
[Diagram: vertices labeled A and B; in each superstep only the enabled source group sends messages, and only to the enabled destination group]
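A hedged sketch of the manual splitting predicate, assuming vertices fall into groups by id modulo the number of splits; none of this is the talk's actual code.

import org.apache.giraph.edge.Edge;
import org.apache.giraph.graph.BasicComputation;
import org.apache.giraph.graph.Vertex;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;

// With SPLITS groups per side, SPLITS * SPLITS supersteps cover every
// (source group, destination group) pair, bounding per-superstep traffic.
public class SplitMessagingComputation extends BasicComputation<
    LongWritable, LongWritable, NullWritable, LongWritable> {
  private static final int SPLITS = 2;  // groups per side (assumed)

  @Override
  public void compute(Vertex<LongWritable, LongWritable, NullWritable> vertex,
      Iterable<LongWritable> messages) {
    long step = getSuperstep() % (SPLITS * SPLITS);
    int srcGroup = (int) (step / SPLITS);
    int dstGroup = (int) (step % SPLITS);
    // Only the "on" source group sends this superstep...
    if (vertex.getId().get() % SPLITS == srcGroup) {
      for (Edge<LongWritable, NullWritable> e : vertex.getEdges()) {
        // ...and only to the "on" destination group.
        if (e.getTargetVertexId().get() % SPLITS == dstGroup) {
          sendMessage(e.getTargetVertexId(), vertex.getValue());
        }
      }
    }
  }
}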
28. Giraph in Production
Over 1.5 years in production
Over 100 jobs processed a week
30+ applications in our internal application repository
Sample production job - 700B+ edges
Very stable
• Checkpointing disabled (highly loaded HDFS adds instability)
• Retries handle intermittent failures