Large Scale Graph Processing
Deepankar Patra
IIT Madras
Goal
Run graph algorithms (e.g. shortest path, connected components, finding the diameter) on huge graphs (a terabyte or more in size).
Graph
[Figure: a graph with a node (vertex) and an edge labelled]
Example Graph Algorithm
● Shortest Path Algorithm
[Figure: a path from a source vertex to a destination vertex]
Why?
Many machine learning algorithms require graph computations, and in the real world their input graphs are too large to fit on one machine.
Real World? 
Big Graphs:
● Social Networks
● Biological Networks
● Mobile Call Networks
● Citation Networks
● World Wide Web
● Geographic Pathways
● Customer-merchant graphs (Amazon, eBay)
Facebook Friends Graph
Src: http://wisonets.files.wordpress.com/2012/09/facebook-mutual-friends2.png
Machine Learning Algorithms?
● Recommendation
● PageRank
● Web search
● Cyber security
● Fraud detection
● Clustering
● Shortest Path Calculation
Graph Algorithms Typically Involve
● Performing computations at each
node based on node features, edge
features, and local link structure.
● Propagating computations:
“traversing” the graph
Example
Src: http://www.slideshare.net/WeiruDai
Why not MapReduce?
● Represent graphs as adjacency lists
● Perform local computations in mapper
● Pass along partial results via
outlinks, keyed by destination node
● Perform aggregation in reducer on
inlinks to a node
● Iterate until convergence, controlled by an external “driver” (each iteration is a separate MapReduce job)
● Don’t forget to pass the graph structure between iterations, which adds overhead to every pass
Why not Spark?
● Spark provides the GraphX library for graph computation (machine learning lives in the separate MLlib library).
● But Spark itself is a general-purpose engine; it is not designed specifically for graph algorithms.
● So optimizations that apply only to graphs are not available.
PREGEL, Google, 2010
● Basic idea: “think like a vertex”
● Based on the Bulk Synchronous Parallel (BSP) model
● Provides scalability
● Provides fault tolerance
● Provides flexibility to express
arbitrary graph algorithms
How does it work?
● Master/Worker architecture
● Each worker is assigned a subset of
a directed graph’s vertices
● Vertex-centric model. Each vertex has:
  ● An arbitrary “value” that can be read and set
  ● A list of messages sent to it
  ● A list of outgoing edges (edges have a value too)
  ● A binary state (active/inactive)
Graph Partitioning
[Figure: the graph's vertices divided among Worker 1, Worker 2, and Worker 3]
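One simple way to assign vertices to workers is to hash the vertex ID modulo the number of workers, which is the default partitioning described in the Pregel paper. A minimal sketch, assuming integer vertex IDs; the names `WorkerFor` and `Partition` are hypothetical and not part of any Pregel API.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical helper: maps a vertex ID to a worker index in
// [0, num_workers). Assumes num_workers > 0.
std::size_t WorkerFor(std::int64_t vertex_id, std::size_t num_workers) {
  return std::hash<std::int64_t>()(vertex_id) % num_workers;
}

// Groups vertex IDs by the worker that owns them.
std::vector<std::vector<std::int64_t> > Partition(
    const std::vector<std::int64_t>& vertex_ids, std::size_t num_workers) {
  std::vector<std::vector<std::int64_t> > parts(num_workers);
  for (std::size_t i = 0; i < vertex_ids.size(); ++i) {
    parts[WorkerFor(vertex_ids[i], num_workers)].push_back(vertex_ids[i]);
  }
  return parts;
}
```

Because the owner of a vertex is a pure function of its ID, any worker can work out where to send a message without a lookup table. The price is that hashing ignores the link structure, so neighboring vertices often land on different workers.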
Pregel execution model
The master initiates synchronous iterations (each called a “superstep”). In every superstep:
● Each worker executes the user function on all of its vertices, independently of the other workers
● A vertex receives the messages sent to it in the previous superstep
● A vertex can modify its value, modify the values of its outgoing edges, and change the topology of the graph (add or remove vertices or edges)
● A vertex can send messages to other vertices, to be received in the next superstep
● A vertex can “vote to halt”, which makes it inactive
● Execution stops when all vertices have voted to halt and no messages are in transit
● A vote to halt is overridden by a non-empty message queue: an incoming message reactivates the vertex
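The superstep loop above can be sketched in a single process, using single-source shortest paths as the vertex program. This is an illustrative simulation, not Pregel's API: `Vertex`, `Edge`, and `RunShortestPaths` are hypothetical names, there is one "worker", and the barrier is just the point where the next inboxes are swapped in.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

const double kInf = std::numeric_limits<double>::infinity();

struct Edge {
  std::size_t target;
  double weight;
};

struct Vertex {
  double value = kInf;          // best known distance from the source
  std::vector<Edge> out_edges;  // outgoing edges with their values
  std::vector<double> inbox;    // messages delivered for this superstep
  bool active = true;           // false once the vertex has voted to halt
};

// Runs supersteps until every vertex has halted and no messages are pending.
// Returns the number of supersteps in which at least one vertex ran.
int RunShortestPaths(std::vector<Vertex>& graph, std::size_t source) {
  int superstep = 0;
  while (true) {
    std::vector<std::vector<double> > next_inbox(graph.size());
    bool any_work = false;
    for (std::size_t v = 0; v < graph.size(); ++v) {
      Vertex& vertex = graph[v];
      // A pending message reactivates a vertex that voted to halt.
      if (!vertex.inbox.empty()) vertex.active = true;
      if (!vertex.active) continue;
      any_work = true;

      // Vertex program: take the minimum incoming candidate distance.
      double best = (v == source) ? 0.0 : kInf;
      for (std::size_t i = 0; i < vertex.inbox.size(); ++i) {
        best = std::min(best, vertex.inbox[i]);
      }
      if (best < vertex.value) {
        vertex.value = best;
        // Messages sent now are only visible in the next superstep.
        for (std::size_t i = 0; i < vertex.out_edges.size(); ++i) {
          const Edge& e = vertex.out_edges[i];
          next_inbox[e.target].push_back(best + e.weight);
        }
      }
      vertex.active = false;  // vote to halt
    }
    if (!any_work) break;
    // Barrier: deliver this superstep's messages for the next one.
    for (std::size_t v = 0; v < graph.size(); ++v) {
      graph[v].inbox = std::move(next_inbox[v]);
    }
    ++superstep;
  }
  return superstep;
}
```

Every vertex votes to halt at the end of each superstep and is woken only by a new message, so the loop ends once no shorter distance is being propagated anywhere.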
Pregel Graph Processing
Page Rank
PageRank is a link analysis
algorithm that is used to determine
the importance of a document based on
the number of references to it and
the importance of the source
documents themselves.
Page Rank
A = A given page
T1 ... Tn = Pages that point to page A (citations)
d = Damping factor between 0 and 1 (usually set to 0.85)
C(T) = number of links going out of T
PR(A) = the PageRank of page A
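The formula these symbols belong to does not survive in the slide text. Assuming it is the standard PageRank formula from the original Brin and Page paper, which matches the definitions above and the update `0.15 + 0.85 * sum` used in the vertex program, it reads:

```latex
PR(A) = (1 - d) + d \left( \frac{PR(T_1)}{C(T_1)} + \cdots + \frac{PR(T_n)}{C(T_n)} \right)
```

With d = 0.85, a page with a single in-link from a page T that has PR(T) = 1 and C(T) = 2 gets PR(A) = 0.15 + 0.85 × 0.5 = 0.575.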
Page Rank
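class PageRankVertex
    : public Vertex<double, void, double> {
 public:
  // Called once per superstep for every active vertex.
  virtual void Compute(MessageIterator* msgs) {
    if (superstep() >= 1) {
      // Sum the rank contributions sent by in-neighbors.
      double sum = 0;
      for (; !msgs->Done(); msgs->Next())
        sum += msgs->Value();
      // (1 - d) + d * sum, with damping factor d = 0.85.
      *MutableValue() = 0.15 + 0.85 * sum;
    }
    if (superstep() < 30) {
      // Split this vertex's rank evenly across its out-edges.
      const int64 n = GetOutEdgeIterator().size();
      SendMessageToAllNeighbors(GetValue() / n);
    } else {
      // Stop after a fixed 30 supersteps.
      VoteToHalt();
    }
  }
};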
Open Source
Pregel was published as a research paper; Google did not release an open source implementation.
As a result, many open source systems appeared that implement the basic Pregel model and keep improving on it. The two most notable are:
a) Apache Giraph, started at Yahoo! and later developed and used heavily by Facebook
b) CMU's GraphLab (which has since become a company of its own)
One Example: GraphLab
● GraphLab is currently the best of these
● GraphLab changed the partitioning strategy to reduce the network overhead of message transfer among workers
● GraphLab has a rich library of machine learning algorithms, and it is growing
Reference
● Pregel: A System for Large-Scale Graph
Processing
● PowerGraph: Distributed Graph-Parallel
Computation on Natural Graphs
● GraphX: A Resilient Distributed Graph
System on Spark
● giraph.apache.org
● graphlab.org
