Graph Coloring Algorithms on Pregel Model using Hadoop
1. Indian Institute of Technology, Patna
Graph Coloring Algorithms on Pregel Model using Hadoop
Supervisor
Dr. Rajiv Misra
Candidate
Nishant M Gandhi
Roll No: 1311CS05
March 29, 2015
2. Contents
• Introduction
• Related Work
• Pregel Graph Coloring Algorithms
◦ Algorithms
◦ Analysis/Result
• Conclusion & Future Work
• References
3. Introduction
• Challenge:
◦ Graph Coloring (Total Vertex Coloring) of large-scale graphs on top
of Hadoop
• Graph Coloring:
◦ G = (V , E) undirected graph
◦ V is set of vertices and E is set of edges
◦ The graph coloring problem is to assign a color to each vertex
such that, for all (i, j) ∈ E, i and j do not get the same color.
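The definition above can be checked mechanically. The following is a minimal sketch, not part of the original work; the function name `is_proper_coloring` and the 4-cycle example are illustrative assumptions.

```python
def is_proper_coloring(edges, color):
    """Return True if no edge (i, j) has both endpoints with the same color."""
    return all(color[i] != color[j] for i, j in edges)


# Hypothetical example graph: the 4-cycle 0-1-2-3-0.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
good = {0: 0, 1: 1, 2: 0, 3: 1}   # alternating colors, proper
bad = {0: 0, 1: 0, 2: 1, 3: 1}    # edge (0, 1) is monochromatic

print(is_proper_coloring(edges, good))
print(is_proper_coloring(edges, bad))
```

The same check is useful as a sanity test on the output of any of the heuristics discussed later.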
4. Introduction: Applications
• Finding substructure in social network [Cha11]
• Frequency Assignment [RPM05]
• Content Delivery Network
• Distributed Resource Directory Service [Ko06]
5. Introduction
• Motivation:
◦ The MapReduce model is not well suited to iterative graph computations
such as Graph Coloring. Pregel is a better fit.
◦ Existing work on Graph Coloring Algorithms on Pregel mostly
demonstrates that Graph Coloring can be implemented on Pregel [SW14].
◦ A careful study of different Graph Coloring Algorithms on Pregel is lacking.
6. Introduction: My Work
• Studies 5 Pregel Graph Coloring Algorithms
◦ Local Maxima First(LMF)
◦ Local Minima-Maxima First(LMMF)
◦ Local Largest Degree First(LLDF)
◦ Local Smallest-Largest Degree First(LSLDF)
◦ Local Incident Degree First(LIDF)
• Apache Giraph, a well-suited open-source Pregel-based platform
[HDA+14], is used to implement the algorithms.
• Evaluated the performance of the Pregel Graph Coloring Algorithms on
large real-world graphs on an 8-node Apache Hadoop cluster.
7. Background
• The minimum number of colors required to properly color a graph is
called the chromatic number of that graph.
• Finding the chromatic number of a graph is a well-known NP-Hard
problem. [GJ79]
• It is hard to approximate the chromatic number within any useful
bound. [FK96]
• If the requirement of using the minimum number of colors is relaxed,
many polynomial-time sequential algorithms exist for the simple graph
coloring problem.
• Maximal Independent Set (MIS) algorithms, which can be easily
parallelized, can be used to solve the graph coloring problem in
parallel.
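As an illustration of a polynomial-time sequential algorithm that gives up on optimality, here is a sketch of the standard first-fit greedy coloring. It is a generic textbook heuristic, not an algorithm from this work; the name `greedy_coloring` and the example graph are assumptions.

```python
def greedy_coloring(adj, order=None):
    """Visit vertices in `order` and give each the smallest color
    not already used by one of its colored neighbors (first fit)."""
    color = {}
    for v in (order if order is not None else sorted(adj)):
        used = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color


# Hypothetical example graph: the 4-cycle 0-1-2-3-0 as an adjacency list.
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
coloring = greedy_coloring(adj)
print(coloring)
```

The number of colors used depends on the visiting order, which is why ordering heuristics (by identifier, by degree, by incident degree) matter.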
8. Related Work: MapReduce
• Problems with MapReduce Graph Algorithms
◦ Iterative MR-Jobs
◦ High I/O
◦ Not intuitive for Graph Algorithms
• No attempts have been made to design Graph Coloring Algorithms
with the MapReduce model
• The Pregel model is more suitable for iterative graph computation
than the MapReduce model on top of Hadoop [QWH12]
9. Related Work: Pregel
• Pregel [MABD10], Graph Processing System
◦ In-memory Computation
◦ Vertex-Centric High-level programming model
◦ Batch oriented processing
◦ Based on Valiant's Bulk Synchronous Parallel Model [Val90]
10. Related Work: Pregel Model
• Graph G = (V, E); the graph is mutable during execution of the algorithm.
• The computation starts simultaneously in all vertices, and
proceeds in discrete rounds.
• The number of rounds that elapse from the beginning of the
algorithm until its end is called the running time of the algorithm.
• Vertices are allowed to perform unbounded local computations.
• Each vertex is in either the Active or the Inactive state. Only active
vertices take part in local computation in each round.
• In each round, each vertex v is allowed to send a message to each
of its neighbors.
• A vertex is allowed to send distinct messages to distinct neighbors.
• The vertices communicate over the edges of E in a synchronous
manner.
11. Related Work: Pregel Model
• Pregel works in iterations called Supersteps
• Program Flow:
For Superstep Si=S1,S2,S3,...,Sn
◦ For each Active Vertex,
Execute Compute:
• Messages are received
• Local computation
• Messages are Sent
• Graph Mutation
• VoteToHalt
◦ Termination:
• All Vertices are in Inactive state
• No Messages are sent
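The program flow above can be sketched as a single-process simulation. This is only an illustration of the superstep loop, not Giraph's API: `run_pregel`, the `compute` signature, and the `max_value` example (the classic "propagate the maximum value" program) are all hypothetical names chosen for this sketch, and graph mutation is left out.

```python
from collections import defaultdict


def run_pregel(adj, values, compute, max_supersteps=100):
    """Run `compute` on every active vertex, one superstep at a time.
    compute(v, value, msgs, step, nbrs) -> (new_value, [(target, msg)], halt)"""
    active = set(adj)            # every vertex starts in the Active state
    inbox = defaultdict(list)    # messages delivered at the start of a superstep
    supersteps = 0
    while supersteps < max_supersteps:
        active |= set(inbox)     # an incoming message reactivates a vertex
        if not active:
            break                # all vertices inactive, no messages in flight
        outbox = defaultdict(list)
        still_active = set()
        for v in active:
            new_value, sent, halt = compute(
                v, values[v], inbox.get(v, []), supersteps, adj[v])
            values[v] = new_value
            for target, msg in sent:
                outbox[target].append(msg)
            if not halt:         # halt=True plays the role of VoteToHalt
                still_active.add(v)
        active, inbox = still_active, outbox
        supersteps += 1
    return values, supersteps


def max_value(v, value, msgs, step, nbrs):
    """Example vertex program: spread the largest value through the graph."""
    new_value = max([value] + msgs)
    if step == 0 or new_value > value:
        return new_value, [(u, new_value) for u in nbrs], True
    return new_value, [], True


# Hypothetical path graph 0-1-2 with initial vertex values.
adj = {0: [1], 1: [0, 2], 2: [1]}
result, steps = run_pregel(adj, {0: 3, 1: 6, 2: 1}, max_value)
print(result, steps)
```

Every vertex votes to halt after each step and is woken only by messages, so the loop ends exactly when both termination conditions hold.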
12. Related Work: Pregel Model
• Vertex
◦ VertexId
◦ VertexValue
• Edge
◦ Target Vertex
◦ Weight
• Vertex State
◦ Active
◦ Inactive
13. Related Work: Distributed Algorithms
• MIS algorithms color the graph by repeatedly finding an
Independent Set
• Randomized Algorithms to find MIS
◦ Luby's MIS algorithm [Lub86]
◦ Jones-Plassmann algorithm [JP93]
◦ Welsh-Powell algorithm [WP67]
◦ E G Boman et al. algorithm [BBCG]
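To make the MIS idea concrete, here is a sequential sketch in the spirit of the random-priority variant of Luby's algorithm: each remaining vertex draws a random number, local maxima join the set, and they are removed together with their neighbors. The function name `luby_mis`, the seed parameter, and the example graph are assumptions, and this is a simplification rather than a faithful reproduction of [Lub86].

```python
import random


def luby_mis(adj, seed=0):
    """Return a maximal independent set using random priorities per round."""
    rng = random.Random(seed)
    remaining = set(adj)
    mis = set()
    while remaining:
        r = {v: rng.random() for v in remaining}
        # A vertex wins if its draw beats every remaining neighbor's draw.
        winners = {v for v in remaining
                   if all(r[v] > r[u] for u in adj[v] if u in remaining)}
        mis |= winners
        removed = set(winners)
        for v in winners:
            removed.update(adj[v])   # neighbors of winners can no longer join
        remaining -= removed
    return mis


# Hypothetical example graph: the 5-cycle 0-1-2-3-4-0.
adj = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 0]}
mis = luby_mis(adj)
print(sorted(mis))
```

Coloring then proceeds by giving the set one color, deleting it from the graph, and repeating on what is left.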
14. Pregel Graph Coloring Algorithms
• Heuristic Approach
• Does not give optimal solutions
• Based on computing Maximal Independent Sets in parallel
• Certain assumptions are made for these algorithms.
◦ Graph is undirected and unweighted
◦ Each vertex has unique identifier
◦ Each vertex has one storage variable and assigned color is stored in
that variable
◦ Instead of color, we assign number to vertices
15. Pregel Graph Coloring Algorithms: Local Maxima First (LMF)
• Simple Heuristic Approach
• Uses only the VertexId of a Vertex
• Among Active Vertices, those whose VertexId is the maximum among
their active neighbors are selected
• Each Superstep generates one MIS and colors it
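The selection rule above can be sketched as a sequential simulation. This is an interpretation of the slide text, not the Giraph implementation: `lmf_coloring` is a hypothetical name, and using the superstep index as the color is an assumption.

```python
def lmf_coloring(adj):
    """Each round, color every active vertex whose id exceeds the ids of
    all its still-active neighbors, then deactivate it."""
    color = {}
    active = set(adj)
    superstep = 0
    while active:
        selected = {v for v in active
                    if all(v > u for u in adj[v] if u in active)}
        for v in selected:
            color[v] = superstep      # assumption: color = superstep index
        active -= selected
        superstep += 1
    return color


# Hypothetical example graph: the path 0-1-2-3.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
lmf_colors = lmf_coloring(adj)
print(lmf_colors)
```

On this path, only one vertex is a local maximum per round, so a graph that is 2-colorable ends up with four colors, which hints at why an id-only heuristic can use many colors.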
17. Pregel Graph Coloring Algorithms: Local Minima-Maxima First (LMMF)
• Improvement over LMF
• Uses only the VertexId of a Vertex
• Among Active Vertices, those whose VertexId is the minimum or the
maximum among their active neighbors are selected
• Each Superstep generates one or two MIS and colors them
differently
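A sequential sketch of this rule follows, under the same caveats as before: it is an interpretation of the slide text rather than the Giraph code, `lmmf_coloring` is a hypothetical name, and the color numbering (two colors reserved per superstep) is an assumption.

```python
def lmmf_coloring(adj):
    """Each round, color local maxima with one color and local minima
    with another, then deactivate both groups."""
    color = {}
    active = set(adj)
    superstep = 0
    while active:
        nbrs = {v: [u for u in adj[v] if u in active] for v in active}
        maxima = {v for v in active if all(v > u for u in nbrs[v])}
        # A vertex with no active neighbors is both; treat it as a maximum.
        minima = {v for v in active if all(v < u for u in nbrs[v])} - maxima
        for v in maxima:
            color[v] = 2 * superstep        # assumption: even colors for maxima
        for v in minima:
            color[v] = 2 * superstep + 1    # assumption: odd colors for minima
        active -= maxima | minima
        superstep += 1
    return color, superstep


# Hypothetical example graph: the path 0-1-2-3.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
lmmf_colors, lmmf_steps = lmmf_coloring(adj)
print(lmmf_colors, lmmf_steps)
```

On this path the sketch finishes in two rounds instead of four while still using four colors, matching the intuition that the variant mainly saves supersteps.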
19. Pregel Graph Coloring Algorithms: Local Largest Degree First (LLDF)
• Better Heuristic than the previous approaches
• Uses the Degree of a Vertex
• Each Superstep generates one MIS and colors it
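A sequential sketch of a degree-based selection rule is given below. The slide does not spell out the details, so several points are assumptions: the priority is the static degree in the input graph, ties are broken by vertex id, and the color is the superstep index. The name `lldf_coloring` is hypothetical.

```python
def lldf_coloring(adj):
    """Each round, color every active vertex whose (degree, id) priority
    exceeds that of all its still-active neighbors."""
    priority = {v: (len(adj[v]), v) for v in adj}   # assumption: id breaks ties
    color = {}
    active = set(adj)
    superstep = 0
    while active:
        selected = {v for v in active
                    if all(priority[v] > priority[u]
                           for u in adj[v] if u in active)}
        for v in selected:
            color[v] = superstep
        active -= selected
        superstep += 1
    return color


# Hypothetical example graph: the path 0-1-2-3.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
lldf_colors = lldf_coloring(adj)
print(lldf_colors)
```

On this small path the degree-based priority needs three colors where an id-only rule needs four.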
21. Pregel Graph Coloring Algorithms: Local Smallest-Largest Degree First (LSLDF)
• Improvement over LLDF
• Uses the Degree of a Vertex
• Each Superstep generates one or two MIS and colors them
differently
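Combining the degree priority with the "both ends" selection gives the following sequential sketch. As before, this is an interpretation rather than the Giraph code: static degree with id tie-breaking, two colors reserved per superstep, and the name `lsldf_coloring` are all assumptions.

```python
def lsldf_coloring(adj):
    """Each round, color vertices with locally largest (degree, id) priority
    with one color and those with locally smallest priority with another."""
    priority = {v: (len(adj[v]), v) for v in adj}   # assumption: id breaks ties
    color = {}
    active = set(adj)
    superstep = 0
    while active:
        nbrs = {v: [u for u in adj[v] if u in active] for v in active}
        largest = {v for v in active
                   if all(priority[v] > priority[u] for u in nbrs[v])}
        smallest = {v for v in active
                    if all(priority[v] < priority[u] for u in nbrs[v])} - largest
        for v in largest:
            color[v] = 2 * superstep
        for v in smallest:
            color[v] = 2 * superstep + 1
        active -= largest | smallest
        superstep += 1
    return color, superstep


# Hypothetical example graph: the path 0-1-2-3.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
lsldf_colors, lsldf_steps = lsldf_coloring(adj)
print(lsldf_colors, lsldf_steps)
```

Here the path is colored in two rounds, since both ends of the priority order make progress at once.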
24. Pregel Graph Coloring Algorithms: Local Incident Degree First (LIDF)
• Dynamic Ordering based Heuristic
• Uses the Incident Degree of a Vertex
• Two Supersteps generate one MIS and color it
27. Experiments: Cluster Configuration
Parameter                       Details
Number of Nodes                 8
RAM for Each Node               2 GB
Hard Disk for Each Node         100 GB
Operating System for Each Node  Ubuntu Desktop 14.04 (Linux 3.13.0-24-generic)
Hadoop Version                  1.2.1 MR1
Pregel-like System Name         Apache Giraph
Pregel-like System Version      1.2.0
Configured Workers              4 per node
Table: Hadoop Cluster Configuration Details
28. Experiments: Dataset
Dataset             |V|        |E|
Internet-Topology   1,696,415  11,095,298  35,455
Youtube             1,138,499  2,990,443   28,754
Texas Road Network  1,379,917  1,921,660   12
Flickr              1,715,255  22,613,981  27,236
Table: Real World Datasets from Stanford Network Analysis Platform
29. Experiments & Result: Performance on Color
Colors Used  Internet-Topology  Youtube  Texas Road Network  Flickr
LMF          1586               704      344                 4303
LMMF         1587               705      345                 4303
LLDF         484                261      123                 1667
LSLDF        478                267      139                 1653
LIDF         648                283      19                  3133
Table: Colors Used by Different Graph Coloring Algorithms on Different
Datasets
• LLDF and LSLDF use fewer colors than the others on three of the four
datasets, and their results are very close to each other.
• LMF and LMMF perform almost identically and worse than the others.
• LIDF performs better than LMF and LMMF and worse than LLDF and LSLDF,
except on the Texas Road Network, where it uses the fewest colors (19).
30. Experiments & Result: Run Time (second)
Run Time  Internet-Topology  Youtube  Texas Road Network  Flickr
LMF       2700               407      66                  648122
LMMF      2460               233      49                  218556
LLDF      1783               350      47                  217031
LSLDF     1380               94       44                  2113
LIDF      2597               343      51                  1080588
Table: Time (in seconds) taken by Different Graph Coloring Algorithms
31. Experiments & Result: Supersteps
Supersteps  Internet-Topology  Youtube  Texas Road Network  Flickr
LMF         1587               705      345                 4304
LMMF        794                353      173                 2153
LLDF        485                262      124                 1667
LSLDF       241                135      120                 827
LIDF        1293               567      39                  6267
Table: Supersteps taken by Different Graph Coloring Algorithms
32. Conclusion
• Effective Graph Coloring is possible using various Heuristics with
Pregel on Hadoop
• Among the algorithms presented, LLDF performs best on the metric
of Colors used in most cases of social Network Graphs.
• LSLDF comes out as the overall best performer in terms of time and
Colors used.
• LMF & LMMF are not good approaches to color graphs in general.
• LIDF performs best on sparse graphs but takes more time than the
others.
33. Future Work
• Graph coloring algorithms on Pregel with performance guarantees
• Custom graph partitioning for performance tuning
35. References (1)
[BBCG] Erik G Boman, Doruk Bozdağ, Umit Catalyurek, and Gebremedhin, A scalable parallel graph coloring algorithm for distributed memory computers, Euro-Par 2005 Parallel Processing, Springer, pp. 241–251.
[Cha11] David Chalupa, On the ability of graph coloring heuristics to find substructures in social networks, Information Sciences and Technologies, Bulletin of ACM Slovakia 3 (2011), no. 2, 51–54.
[FK96] Uriel Feige and Joe Kilian, Zero knowledge and the chromatic number, Computational Complexity, 1996. Proceedings., Eleventh Annual IEEE Conference on, IEEE, 1996, pp. 278–287.
[GJ79] M R Garey and D S Johnson, Computers and intractability, Freeman (1979).
[HDA+14] Minyang Han, Khuzaima Daudjee, Khaled Ammar, M Tamer Ozsu, Xingfang Wang, and Tianqi Jin, An experimental comparison of pregel-like graph processing systems, Proceedings of the VLDB Endowment 7 (2014), no. 12, 1047–1058.
36. References (2)
[JP93] Mark T Jones and Paul E Plassmann, A parallel graph coloring heuristic, SIAM Journal on Scientific Computing 14 (1993), no. 3, 654–669.
[Ko06] Bong Jun Ko, Distributed, self-organizing replica placement in large scale networks, Columbia University, 2006.
[Lub86] Michael Luby, A simple parallel algorithm for the maximal independent set problem, SIAM Journal on Computing 15 (1986), no. 4, 1036–1053.
[MABD10] Grzegorz Malewicz, Matthew H Austern, Aart JC Bik, and Dehnert, Pregel: a system for large-scale graph processing, Proceedings of the 2010 ACM SIGMOD International Conference on Management of data, ACM, 2010, pp. 135–146.
[QWH12] Louise Quick, Paul Wilkinson, and David Hardcastle, Using pregel-like large scale graph processing frameworks for social network analysis, Proceedings of the 2012 International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2012), IEEE Computer Society, 2012, pp. 457–463.
37. References (3)
[RPM05] Janne Riihijärvi, Marina Petrova, and Petri Mähönen, Frequency allocation for wlans using graph colouring techniques, WONS, vol. 5, 2005, pp. 216–222.
[SW14] Semih Salihoglu and Jennifer Widom, Optimizing graph algorithms on pregel-like systems.
[Val90] Leslie G Valiant, A bridging model for parallel computation, Communications of the ACM 33 (1990), no. 8, 103–111.
[WP67] Dominic JA Welsh and Martin B Powell, An upper bound for the chromatic number of a graph and its application to timetabling problems, The Computer Journal 10 (1967), no. 1, 85–86.
38. Thank You