SlideShare ist ein Scribd-Unternehmen logo
1 von 59
Downloaden Sie, um offline zu lesen
Pregel: A System for Large-Scale Graph Processing
Amir H. Payberah
amir@sics.se
Amirkabir University of Technology
(Tehran Polytechnic)
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 1 / 40
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 2 / 40
Introduction
Graphs provide a flexible abstraction for describing relationships be-
tween discrete objects.
Many problems can be modeled by graphs and solved with appro-
priate graph algorithms.
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 3 / 40
Large Graph
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 4 / 40
Large-Scale Graph Processing
Large graphs need large-scale processing.
A large graph either cannot fit into memory of single computer or
it fits with huge cost.
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 5 / 40
Question
Can we use platforms like MapReduce or Spark, which are based on data-parallel
model, for large-scale graph proceeding?
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 6 / 40
Data-Parallel Model for Large-Scale Graph Processing
The platforms that have worked well for developing parallel applica-
tions are not necessarily effective for large-scale graph problems.
Why?
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 7 / 40
Graph Algorithms Characteristics (1/2)
Unstructured problems
• Difficult to extract parallelism based on partitioning of the data: the
irregular structure of graphs.
• Limited scalability: unbalanced computational loads resulting from
poorly partitioned data.
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 8 / 40
Graph Algorithms Characteristics (1/2)
Unstructured problems
• Difficult to extract parallelism based on partitioning of the data: the
irregular structure of graphs.
• Limited scalability: unbalanced computational loads resulting from
poorly partitioned data.
Data-driven computations
• Difficult to express parallelism based on partitioning of computation:
the structure of computations in the algorithm is not known a priori.
• The computations are dictated by nodes and links of the graph.
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 8 / 40
Graph Algorithms Characteristics (2/2)
Poor data locality
• The computations and data access patterns do not have much local-
ity: the irregular structure of graphs.
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 9 / 40
Graph Algorithms Characteristics (2/2)
Poor data locality
• The computations and data access patterns do not have much local-
ity: the irregular structure of graphs.
High data access to computation ratio
• Graph algorithms are often based on exploring the structure of a
graph to perform computations on the graph data.
• Runtime can be dominated by waiting memory fetches: low locality.
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 9 / 40
Proposed Solution
Graph-Parallel Processing
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 10 / 40
Proposed Solution
Graph-Parallel Processing
Computation typically depends on the neighbors.
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 10 / 40
Graph-Parallel Processing
Restricts the types of computation.
New techniques to partition and distribute graphs.
Exploit graph structure.
Executes graph algorithms orders-of-magnitude faster than more
general data-parallel systems.
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 11 / 40
Data-Parallel vs. Graph-Parallel Computation
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 12 / 40
Data-Parallel vs. Graph-Parallel Computation
Data-parallel computation
• Record-centric view of data.
• Parallelism: processing independent data on separate resources.
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 13 / 40
Data-Parallel vs. Graph-Parallel Computation
Data-parallel computation
• Record-centric view of data.
• Parallelism: processing independent data on separate resources.
Graph-parallel computation
• Vertex-centric view of graphs.
• Parallelism: partitioning graph (dependent) data across processing
resources, and resolving dependencies (along edges) through
iterative computation and communication.
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 13 / 40
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 14 / 40
Seven Bridges of K¨onigsberg
Finding a walk through the city that would cross each bridge once
and only once.
Euler proved that the problem has no solution.
Map of K¨onigsberg in Euler’s time, highlighting the river Pregel and the bridges.
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 15 / 40
Pregel
Large-scale graph-parallel processing platform developed at Google.
Inspired by bulk synchronous parallel (BSP) model.
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 16 / 40
Bulk Synchronous Parallel (1/2)
It is a parallel programming model.
The model consists of:
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 17 / 40
Bulk Synchronous Parallel (1/2)
It is a parallel programming model.
The model consists of:
• A set of processor-memory pairs.
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 17 / 40
Bulk Synchronous Parallel (1/2)
It is a parallel programming model.
The model consists of:
• A set of processor-memory pairs.
• A communications network that delivers messages in a point-
to-point manner.
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 17 / 40
Bulk Synchronous Parallel (1/2)
It is a parallel programming model.
The model consists of:
• A set of processor-memory pairs.
• A communications network that delivers messages in a point-
to-point manner.
• A mechanism for the efficient barrier synchronization for all or
a subset of the processes.
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 17 / 40
Bulk Synchronous Parallel (1/2)
It is a parallel programming model.
The model consists of:
• A set of processor-memory pairs.
• A communications network that delivers messages in a point-
to-point manner.
• A mechanism for the efficient barrier synchronization for all or
a subset of the processes.
• There are no special combining, replicating, or broadcasting fa-
cilities.
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 17 / 40
Bulk Synchronous Parallel (2/2)
All vertices update in parallel (at the same time).
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 18 / 40
Vertex-Centric Programs
Think as a vertex.
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 19 / 40
Vertex-Centric Programs
Think as a vertex.
Each vertex computes individually its value: in parallel
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 19 / 40
Vertex-Centric Programs
Think as a vertex.
Each vertex computes individually its value: in parallel
Each vertex can see its local context, and updates its value accord-
ingly.
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 19 / 40
Data Model
A directed graph that stores the program state, e.g., the current
value.
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 20 / 40
Execution Model (1/3)
Applications run in sequence of iterations: supersteps
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 21 / 40
Execution Model (1/3)
Applications run in sequence of iterations: supersteps
During a superstep, user-defined functions for each vertex is invoked
(method Compute()): in parallel
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 21 / 40
Execution Model (1/3)
Applications run in sequence of iterations: supersteps
During a superstep, user-defined functions for each vertex is invoked
(method Compute()): in parallel
A vertex in superstep S can:
• reads messages sent to it in superstep S-1.
• sends messages to other vertices: receiving at superstep S+1.
• modifies its state.
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 21 / 40
Execution Model (1/3)
Applications run in sequence of iterations: supersteps
During a superstep, user-defined functions for each vertex is invoked
(method Compute()): in parallel
A vertex in superstep S can:
• reads messages sent to it in superstep S-1.
• sends messages to other vertices: receiving at superstep S+1.
• modifies its state.
Vertices communicate directly with one another by sending mes-
sages.
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 21 / 40
Execution Model (2/3)
Superstep 0: all vertices are in the active state.
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 22 / 40
Execution Model (2/3)
Superstep 0: all vertices are in the active state.
A vertex deactivates itself by voting to halt: no further work to do.
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 22 / 40
Execution Model (2/3)
Superstep 0: all vertices are in the active state.
A vertex deactivates itself by voting to halt: no further work to do.
A halted vertex can be active if it receives a message.
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 22 / 40
Execution Model (2/3)
Superstep 0: all vertices are in the active state.
A vertex deactivates itself by voting to halt: no further work to do.
A halted vertex can be active if it receives a message.
The whole algorithm terminates when:
• All vertices are simultaneously inactive.
• There are no messages in transit.
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 22 / 40
Execution Model (3/3)
Aggregation: a mechanism for global communication, monitoring,
and data.
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 23 / 40
Execution Model (3/3)
Aggregation: a mechanism for global communication, monitoring,
and data.
Runs after each superstep.
Each vertex can provide a value to an aggregator in superstep S.
The system combines those values and the resulting value is made
available to all vertices in superstep S + 1.
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 23 / 40
Execution Model (3/3)
Aggregation: a mechanism for global communication, monitoring,
and data.
Runs after each superstep.
Each vertex can provide a value to an aggregator in superstep S.
The system combines those values and the resulting value is made
available to all vertices in superstep S + 1.
A number of predefined aggregators, e.g., min, max, sum.
Aggregation operators should be commutative and associative.
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 23 / 40
Example: Max Value (1/4)
i_val := val
for each message m
if m > val then val := m
if i_val == val then
vote_to_halt
else
for each neighbor v
send_message(v, val)
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 24 / 40
Example: Max Value (2/4)
i_val := val
for each message m
if m > val then val := m
if i_val == val then
vote_to_halt
else
for each neighbor v
send_message(v, val)
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 25 / 40
Example: Max Value (3/4)
i_val := val
for each message m
if m > val then val := m
if i_val == val then
vote_to_halt
else
for each neighbor v
send_message(v, val)
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 26 / 40
Example: Max Value (4/4)
i_val := val
for each message m
if m > val then val := m
if i_val == val then
vote_to_halt
else
for each neighbor v
send_message(v, val)
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 27 / 40
Example: PageRank
Update ranks in parallel.
Iterate until convergence.
R[i] = 0.15 +
j∈Nbrs(i)
wjiR[j]
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 28 / 40
Example: PageRank
Pregel_PageRank(i, messages):
// receive all the messages
total = 0
foreach(msg in messages):
total = total + msg
// update the rank of this vertex
R[i] = 0.15 + total
// send new messages to neighbors
foreach(j in out_neighbors[i]):
sendmsg(R[i] * wij) to vertex j
R[i] = 0.15 +
j∈Nbrs(i)
wjiR[j]
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 29 / 40
Partitioning the Graph
The pregel library divides a graph into a number of partitions.
Each consisting of a set of vertices and all of those vertices’ outgoing
edges.
Vertices are assigned to partitions based on their vertex-ID (e.g.,
hash(ID)).
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 30 / 40
Implementation (1/4)
Master-worker model.
User programs are copied on machines.
One copy becomes the master.
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 31 / 40
Implementation (2/4)
The master is responsible for
• Coordinating workers activity.
• Determining the number of partitions.
Each worker is responsible for
• Maintaining the state of its partitions.
• Executing the user’s Compute() method on its vertices.
• Managing messages to and from other workers.
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 32 / 40
Implementation (3/4)
The master assigns one or more partitions to each worker.
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 33 / 40
Implementation (3/4)
The master assigns one or more partitions to each worker.
The master assigns a portion of user input to each worker.
• Set of records containing an arbitrary number of vertices and edges.
• If a worker loads a vertex that belongs to that worker’s partitions,
the appropriate data structures are immediately updated.
• Otherwise the worker enqueues a message to the remote peer that
owns the vertex.
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 33 / 40
Implementation (4/4)
After the input has finished loading, all vertices are marked as active.
The master instructs each worker to perform a superstep.
After the computation halts, the master may instruct each worker
to save its portion of the graph.
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 34 / 40
Combiner
Sending a message between workers incurs some overhead: use com-
biner.
This can be reduced in some cases: sometimes vertices only care
about a summary value for the messages it is sent (e.g., min, max,
sum, avg).
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 35 / 40
Fault Tolerance (1/2)
Fault tolerance is achieved through checkpointing.
At start of each superstep, master tells workers to save their state:
• Vertex values, edge values, incoming messages
• Saved to persistent storage
Master saves aggregator values (if any).
This is not necessarily done at every superstep: costly
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 36 / 40
Fault Tolerance (2/2)
When master detects one or more worker failures:
• All workers revert to last checkpoint.
• Continue from there.
• That is a lot of repeated work.
• At least it is better than redoing the whole job.
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 37 / 40
Pregel Limitations
Inefficient if different regions of the graph converge at different
speed.
Can suffer if one task is more expensive than the others.
Runtime of each phase is determined by the slowest machine.
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 38 / 40
Pregel Summary
Bulk Synchronous Parallel model
Vertex-centric
Superstep: sequence of iterations
Master-worker model
Communication: message passing
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 39 / 40
Questions?
Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 40 / 40

Weitere ähnliche Inhalte

Was ist angesagt?

Introduction to Apache Kudu
Introduction to Apache KuduIntroduction to Apache Kudu
Introduction to Apache KuduJeff Holoman
 
Apache Spark At Scale in the Cloud
Apache Spark At Scale in the CloudApache Spark At Scale in the Cloud
Apache Spark At Scale in the CloudDatabricks
 
Hive User Meeting August 2009 Facebook
Hive User Meeting August 2009 FacebookHive User Meeting August 2009 Facebook
Hive User Meeting August 2009 Facebookragho
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsAlluxio, Inc.
 
Apache Spark GraphX & GraphFrame Synthetic ID Fraud Use Case
Apache Spark GraphX & GraphFrame Synthetic ID Fraud Use CaseApache Spark GraphX & GraphFrame Synthetic ID Fraud Use Case
Apache Spark GraphX & GraphFrame Synthetic ID Fraud Use CaseMo Patel
 
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Databricks
 
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...Databricks
 
Hands on MapR -- Viadea
Hands on MapR -- ViadeaHands on MapR -- Viadea
Hands on MapR -- Viadeaviadea
 
Apache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingApache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingDataWorks Summit
 
GraphFrames: Graph Queries In Spark SQL
GraphFrames: Graph Queries In Spark SQLGraphFrames: Graph Queries In Spark SQL
GraphFrames: Graph Queries In Spark SQLSpark Summit
 
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...Databricks
 
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...HostedbyConfluent
 
HDFSネームノードのHAについて #hcj13w
HDFSネームノードのHAについて #hcj13wHDFSネームノードのHAについて #hcj13w
HDFSネームノードのHAについて #hcj13wCloudera Japan
 
Introduction to Spark Internals
Introduction to Spark InternalsIntroduction to Spark Internals
Introduction to Spark InternalsPietro Michiardi
 
Using Apache Arrow, Calcite, and Parquet to Build a Relational Cache
Using Apache Arrow, Calcite, and Parquet to Build a Relational CacheUsing Apache Arrow, Calcite, and Parquet to Build a Relational Cache
Using Apache Arrow, Calcite, and Parquet to Build a Relational CacheDremio Corporation
 
Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)Ryan Blue
 
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang WangApache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang WangDatabricks
 
Top 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applicationsTop 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applicationshadooparchbook
 
Apache Spark in Depth: Core Concepts, Architecture & Internals
Apache Spark in Depth: Core Concepts, Architecture & InternalsApache Spark in Depth: Core Concepts, Architecture & Internals
Apache Spark in Depth: Core Concepts, Architecture & InternalsAnton Kirillov
 

Was ist angesagt? (20)

Introduction to Apache Kudu
Introduction to Apache KuduIntroduction to Apache Kudu
Introduction to Apache Kudu
 
Apache Spark At Scale in the Cloud
Apache Spark At Scale in the CloudApache Spark At Scale in the Cloud
Apache Spark At Scale in the Cloud
 
Apache Spark Architecture
Apache Spark ArchitectureApache Spark Architecture
Apache Spark Architecture
 
Hive User Meeting August 2009 Facebook
Hive User Meeting August 2009 FacebookHive User Meeting August 2009 Facebook
Hive User Meeting August 2009 Facebook
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic Datasets
 
Apache Spark GraphX & GraphFrame Synthetic ID Fraud Use Case
Apache Spark GraphX & GraphFrame Synthetic ID Fraud Use CaseApache Spark GraphX & GraphFrame Synthetic ID Fraud Use Case
Apache Spark GraphX & GraphFrame Synthetic ID Fraud Use Case
 
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
 
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
 
Hands on MapR -- Viadea
Hands on MapR -- ViadeaHands on MapR -- Viadea
Hands on MapR -- Viadea
 
Apache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingApache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data Processing
 
GraphFrames: Graph Queries In Spark SQL
GraphFrames: Graph Queries In Spark SQLGraphFrames: Graph Queries In Spark SQL
GraphFrames: Graph Queries In Spark SQL
 
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...
 
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
 
HDFSネームノードのHAについて #hcj13w
HDFSネームノードのHAについて #hcj13wHDFSネームノードのHAについて #hcj13w
HDFSネームノードのHAについて #hcj13w
 
Introduction to Spark Internals
Introduction to Spark InternalsIntroduction to Spark Internals
Introduction to Spark Internals
 
Using Apache Arrow, Calcite, and Parquet to Build a Relational Cache
Using Apache Arrow, Calcite, and Parquet to Build a Relational CacheUsing Apache Arrow, Calcite, and Parquet to Build a Relational Cache
Using Apache Arrow, Calcite, and Parquet to Build a Relational Cache
 
Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)
 
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang WangApache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
 
Top 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applicationsTop 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applications
 
Apache Spark in Depth: Core Concepts, Architecture & Internals
Apache Spark in Depth: Core Concepts, Architecture & InternalsApache Spark in Depth: Core Concepts, Architecture & Internals
Apache Spark in Depth: Core Concepts, Architecture & Internals
 

Andere mochten auch

Spark Stream and SEEP
Spark Stream and SEEPSpark Stream and SEEP
Spark Stream and SEEPAmir Payberah
 
Introduction to stream processing
Introduction to stream processingIntroduction to stream processing
Introduction to stream processingAmir Payberah
 
Process Management - Part3
Process Management - Part3Process Management - Part3
Process Management - Part3Amir Payberah
 
Linux Module Programming
Linux Module ProgrammingLinux Module Programming
Linux Module ProgrammingAmir Payberah
 
Graph processing - Graphlab
Graph processing - GraphlabGraph processing - Graphlab
Graph processing - GraphlabAmir Payberah
 
MegaStore and Spanner
MegaStore and SpannerMegaStore and Spanner
MegaStore and SpannerAmir Payberah
 
Process Management - Part1
Process Management - Part1Process Management - Part1
Process Management - Part1Amir Payberah
 
Introduction to Operating Systems - Part3
Introduction to Operating Systems - Part3Introduction to Operating Systems - Part3
Introduction to Operating Systems - Part3Amir Payberah
 
Process Management - Part2
Process Management - Part2Process Management - Part2
Process Management - Part2Amir Payberah
 
Introduction to Operating Systems - Part2
Introduction to Operating Systems - Part2Introduction to Operating Systems - Part2
Introduction to Operating Systems - Part2Amir Payberah
 
CPU Scheduling - Part2
CPU Scheduling - Part2CPU Scheduling - Part2
CPU Scheduling - Part2Amir Payberah
 

Andere mochten auch (20)

Spark Stream and SEEP
Spark Stream and SEEPSpark Stream and SEEP
Spark Stream and SEEP
 
MapReduce
MapReduceMapReduce
MapReduce
 
Introduction to stream processing
Introduction to stream processingIntroduction to stream processing
Introduction to stream processing
 
Process Management - Part3
Process Management - Part3Process Management - Part3
Process Management - Part3
 
Linux Module Programming
Linux Module ProgrammingLinux Module Programming
Linux Module Programming
 
Graph processing - Graphlab
Graph processing - GraphlabGraph processing - Graphlab
Graph processing - Graphlab
 
MegaStore and Spanner
MegaStore and SpannerMegaStore and Spanner
MegaStore and Spanner
 
Process Management - Part1
Process Management - Part1Process Management - Part1
Process Management - Part1
 
Threads
ThreadsThreads
Threads
 
Introduction to Operating Systems - Part3
Introduction to Operating Systems - Part3Introduction to Operating Systems - Part3
Introduction to Operating Systems - Part3
 
Mesos and YARN
Mesos and YARNMesos and YARN
Mesos and YARN
 
Main Memory - Part2
Main Memory - Part2Main Memory - Part2
Main Memory - Part2
 
IO Systems
IO SystemsIO Systems
IO Systems
 
Security
SecuritySecurity
Security
 
Cloud Computing
Cloud ComputingCloud Computing
Cloud Computing
 
Protection
ProtectionProtection
Protection
 
Process Management - Part2
Process Management - Part2Process Management - Part2
Process Management - Part2
 
Introduction to Operating Systems - Part2
Introduction to Operating Systems - Part2Introduction to Operating Systems - Part2
Introduction to Operating Systems - Part2
 
Storage
StorageStorage
Storage
 
CPU Scheduling - Part2
CPU Scheduling - Part2CPU Scheduling - Part2
CPU Scheduling - Part2
 

Ähnlich wie Graph processing - Pregel

Graph processing - Powergraph and GraphX
Graph processing - Powergraph and GraphXGraph processing - Powergraph and GraphX
Graph processing - Powergraph and GraphXAmir Payberah
 
Process Synchronization - Part1
Process Synchronization -  Part1Process Synchronization -  Part1
Process Synchronization - Part1Amir Payberah
 
Process Synchronization - Part2
Process Synchronization -  Part2Process Synchronization -  Part2
Process Synchronization - Part2Amir Payberah
 
Introduction to Operating Systems - Part1
Introduction to Operating Systems - Part1Introduction to Operating Systems - Part1
Introduction to Operating Systems - Part1Amir Payberah
 
An Experiment-Driven Performance Model of Stream Processing Operators in Fog ...
An Experiment-Driven Performance Model of Stream Processing Operators in Fog ...An Experiment-Driven Performance Model of Stream Processing Operators in Fog ...
An Experiment-Driven Performance Model of Stream Processing Operators in Fog ...FogGuru MSCA Project
 
Virtual Memory - Part2
Virtual Memory - Part2Virtual Memory - Part2
Virtual Memory - Part2Amir Payberah
 
Reliability Study of wireless corba using Petri net and end to end instanteno...
Reliability Study of wireless corba using Petri net and end to end instanteno...Reliability Study of wireless corba using Petri net and end to end instanteno...
Reliability Study of wireless corba using Petri net and end to end instanteno...Ahmed Koriem
 
Introduction to cloud computing and big data - part1
Introduction to cloud computing and big data - part1Introduction to cloud computing and big data - part1
Introduction to cloud computing and big data - part1Amir Payberah
 
Boetticher Presentation Promise 2008v2
Boetticher Presentation Promise 2008v2Boetticher Presentation Promise 2008v2
Boetticher Presentation Promise 2008v2gregoryg
 
MHH_20Feb_2012111111111111111111111111111.ppt
MHH_20Feb_2012111111111111111111111111111.pptMHH_20Feb_2012111111111111111111111111111.ppt
MHH_20Feb_2012111111111111111111111111111.pptBiHongPhc
 
Introduction to cloud computing and big data - part2
Introduction to cloud computing and big data - part2Introduction to cloud computing and big data - part2
Introduction to cloud computing and big data - part2Amir Payberah
 
Raminder kaur presentation_two
Raminder kaur presentation_twoRaminder kaur presentation_two
Raminder kaur presentation_tworamikaurraminder
 
HyperPrompt:Prompt-based Task-Conditioning of Transformerspdf
HyperPrompt:Prompt-based Task-Conditioning of TransformerspdfHyperPrompt:Prompt-based Task-Conditioning of Transformerspdf
HyperPrompt:Prompt-based Task-Conditioning of TransformerspdfPo-Chuan Chen
 
Virtual Memory - Part1
Virtual Memory - Part1Virtual Memory - Part1
Virtual Memory - Part1Amir Payberah
 
PLA Minimization -Testing
PLA Minimization -TestingPLA Minimization -Testing
PLA Minimization -TestingDr.YNM
 

Ähnlich wie Graph processing - Pregel (20)

Graph processing - Powergraph and GraphX
Graph processing - Powergraph and GraphXGraph processing - Powergraph and GraphX
Graph processing - Powergraph and GraphX
 
Process Synchronization - Part1
Process Synchronization -  Part1Process Synchronization -  Part1
Process Synchronization - Part1
 
Process Synchronization - Part2
Process Synchronization -  Part2Process Synchronization -  Part2
Process Synchronization - Part2
 
Scala
ScalaScala
Scala
 
Chubby
ChubbyChubby
Chubby
 
Introduction to Operating Systems - Part1
Introduction to Operating Systems - Part1Introduction to Operating Systems - Part1
Introduction to Operating Systems - Part1
 
An Experiment-Driven Performance Model of Stream Processing Operators in Fog ...
An Experiment-Driven Performance Model of Stream Processing Operators in Fog ...An Experiment-Driven Performance Model of Stream Processing Operators in Fog ...
An Experiment-Driven Performance Model of Stream Processing Operators in Fog ...
 
Virtual Memory - Part2
Virtual Memory - Part2Virtual Memory - Part2
Virtual Memory - Part2
 
Main Memory - Part1
Main Memory - Part1Main Memory - Part1
Main Memory - Part1
 
Reliability Study of wireless corba using Petri net and end to end instanteno...
Reliability Study of wireless corba using Petri net and end to end instanteno...Reliability Study of wireless corba using Petri net and end to end instanteno...
Reliability Study of wireless corba using Petri net and end to end instanteno...
 
Introduction to cloud computing and big data - part1
Introduction to cloud computing and big data - part1Introduction to cloud computing and big data - part1
Introduction to cloud computing and big data - part1
 
Boetticher Presentation Promise 2008v2
Boetticher Presentation Promise 2008v2Boetticher Presentation Promise 2008v2
Boetticher Presentation Promise 2008v2
 
MHH_20Feb_2012111111111111111111111111111.ppt
MHH_20Feb_2012111111111111111111111111111.pptMHH_20Feb_2012111111111111111111111111111.ppt
MHH_20Feb_2012111111111111111111111111111.ppt
 
Introduction to cloud computing and big data - part2
Introduction to cloud computing and big data - part2Introduction to cloud computing and big data - part2
Introduction to cloud computing and big data - part2
 
Deadlocks
DeadlocksDeadlocks
Deadlocks
 
Raminder kaur presentation_two
Raminder kaur presentation_twoRaminder kaur presentation_two
Raminder kaur presentation_two
 
HyperPrompt:Prompt-based Task-Conditioning of Transformerspdf
HyperPrompt:Prompt-based Task-Conditioning of TransformerspdfHyperPrompt:Prompt-based Task-Conditioning of Transformerspdf
HyperPrompt:Prompt-based Task-Conditioning of Transformerspdf
 
Virtual Memory - Part1
Virtual Memory - Part1Virtual Memory - Part1
Virtual Memory - Part1
 
Transformer and BERT
Transformer and BERTTransformer and BERT
Transformer and BERT
 
PLA Minimization -Testing
PLA Minimization -TestingPLA Minimization -Testing
PLA Minimization -Testing
 

Mehr von Amir Payberah

P2P Content Distribution Network
P2P Content Distribution NetworkP2P Content Distribution Network
P2P Content Distribution NetworkAmir Payberah
 
Data Intensive Computing Frameworks
Data Intensive Computing FrameworksData Intensive Computing Frameworks
Data Intensive Computing FrameworksAmir Payberah
 
The Stratosphere Big Data Analytics Platform
The Stratosphere Big Data Analytics PlatformThe Stratosphere Big Data Analytics Platform
The Stratosphere Big Data Analytics PlatformAmir Payberah
 
The Spark Big Data Analytics Platform
The Spark Big Data Analytics PlatformThe Spark Big Data Analytics Platform
The Spark Big Data Analytics PlatformAmir Payberah
 
File System Implementation - Part2
File System Implementation - Part2File System Implementation - Part2
File System Implementation - Part2Amir Payberah
 
File System Implementation - Part1
File System Implementation - Part1File System Implementation - Part1
File System Implementation - Part1Amir Payberah
 
File System Interface
File System InterfaceFile System Interface
File System InterfaceAmir Payberah
 
CPU Scheduling - Part1
CPU Scheduling - Part1CPU Scheduling - Part1
CPU Scheduling - Part1Amir Payberah
 

Mehr von Amir Payberah (8)

P2P Content Distribution Network
P2P Content Distribution NetworkP2P Content Distribution Network
P2P Content Distribution Network
 
Data Intensive Computing Frameworks
Data Intensive Computing FrameworksData Intensive Computing Frameworks
Data Intensive Computing Frameworks
 
The Stratosphere Big Data Analytics Platform
The Stratosphere Big Data Analytics PlatformThe Stratosphere Big Data Analytics Platform
The Stratosphere Big Data Analytics Platform
 
The Spark Big Data Analytics Platform
The Spark Big Data Analytics PlatformThe Spark Big Data Analytics Platform
The Spark Big Data Analytics Platform
 
File System Implementation - Part2
File System Implementation - Part2File System Implementation - Part2
File System Implementation - Part2
 
File System Implementation - Part1
File System Implementation - Part1File System Implementation - Part1
File System Implementation - Part1
 
File System Interface
File System InterfaceFile System Interface
File System Interface
 
CPU Scheduling - Part1
CPU Scheduling - Part1CPU Scheduling - Part1
CPU Scheduling - Part1
 

Kürzlich hochgeladen

How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 

Kürzlich hochgeladen (20)

How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 

Graph processing - Pregel

  • 1. Pregel: A System for Large-Scale Graph Processing Amir H. Payberah amir@sics.se Amirkabir University of Technology (Tehran Polytechnic) Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 1 / 40
  • 2. Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 2 / 40
  • 3. Introduction Graphs provide a flexible abstraction for describing relationships be- tween discrete objects. Many problems can be modeled by graphs and solved with appro- priate graph algorithms. Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 3 / 40
  • 4. Large Graph Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 4 / 40
  • 5. Large-Scale Graph Processing Large graphs need large-scale processing. A large graph either cannot fit into memory of single computer or it fits with huge cost. Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 5 / 40
  • 6. Question Can we use platforms like MapReduce or Spark, which are based on data-parallel model, for large-scale graph proceeding? Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 6 / 40
  • 7. Data-Parallel Model for Large-Scale Graph Processing The platforms that have worked well for developing parallel applica- tions are not necessarily effective for large-scale graph problems. Why? Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 7 / 40
  • 8. Graph Algorithms Characteristics (1/2) Unstructured problems • Difficult to extract parallelism based on partitioning of the data: the irregular structure of graphs. • Limited scalability: unbalanced computational loads resulting from poorly partitioned data. Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 8 / 40
  • 9. Graph Algorithms Characteristics (1/2) Unstructured problems • Difficult to extract parallelism based on partitioning of the data: the irregular structure of graphs. • Limited scalability: unbalanced computational loads resulting from poorly partitioned data. Data-driven computations • Difficult to express parallelism based on partitioning of computation: the structure of computations in the algorithm is not known a priori. • The computations are dictated by nodes and links of the graph. Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 8 / 40
  • 10. Graph Algorithms Characteristics (2/2) Poor data locality • The computations and data access patterns do not have much local- ity: the irregular structure of graphs. Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 9 / 40
  • 11. Graph Algorithms Characteristics (2/2) Poor data locality • The computations and data access patterns do not have much local- ity: the irregular structure of graphs. High data access to computation ratio • Graph algorithms are often based on exploring the structure of a graph to perform computations on the graph data. • Runtime can be dominated by waiting memory fetches: low locality. Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 9 / 40
  • 12. Proposed Solution Graph-Parallel Processing Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 10 / 40
  • 13. Proposed Solution Graph-Parallel Processing Computation typically depends on the neighbors. Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 10 / 40
  • 14. Graph-Parallel Processing Restricts the types of computation. New techniques to partition and distribute graphs. Exploit graph structure. Executes graph algorithms orders-of-magnitude faster than more general data-parallel systems. Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 11 / 40
  • 15. Data-Parallel vs. Graph-Parallel Computation Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 12 / 40
  • 16. Data-Parallel vs. Graph-Parallel Computation Data-parallel computation • Record-centric view of data. • Parallelism: processing independent data on separate resources. Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 13 / 40
  • 17. Data-Parallel vs. Graph-Parallel Computation Data-parallel computation • Record-centric view of data. • Parallelism: processing independent data on separate resources. Graph-parallel computation • Vertex-centric view of graphs. • Parallelism: partitioning graph (dependent) data across processing resources, and resolving dependencies (along edges) through iterative computation and communication. Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 13 / 40
  • 18. Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 14 / 40
  • 19. Seven Bridges of K¨onigsberg Finding a walk through the city that would cross each bridge once and only once. Euler proved that the problem has no solution. Map of K¨onigsberg in Euler’s time, highlighting the river Pregel and the bridges. Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 15 / 40
  • 20. Pregel Large-scale graph-parallel processing platform developed at Google. Inspired by bulk synchronous parallel (BSP) model. Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 16 / 40
  • 21. Bulk Synchronous Parallel (1/2) It is a parallel programming model. The model consists of: Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 17 / 40
  • 22. Bulk Synchronous Parallel (1/2) It is a parallel programming model. The model consists of: • A set of processor-memory pairs. Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 17 / 40
  • 23. Bulk Synchronous Parallel (1/2) It is a parallel programming model. The model consists of: • A set of processor-memory pairs. • A communications network that delivers messages in a point- to-point manner. Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 17 / 40
  • 24. Bulk Synchronous Parallel (1/2) It is a parallel programming model. The model consists of: • A set of processor-memory pairs. • A communications network that delivers messages in a point- to-point manner. • A mechanism for the efficient barrier synchronization for all or a subset of the processes. Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 17 / 40
  • 25. Bulk Synchronous Parallel (1/2) It is a parallel programming model. The model consists of: • A set of processor-memory pairs. • A communications network that delivers messages in a point- to-point manner. • A mechanism for the efficient barrier synchronization for all or a subset of the processes. • There are no special combining, replicating, or broadcasting fa- cilities. Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 17 / 40
  • 26. Bulk Synchronous Parallel (2/2) All vertices update in parallel (at the same time). Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 18 / 40
  • 27. Vertex-Centric Programs Think as a vertex. Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 19 / 40
  • 28. Vertex-Centric Programs Think as a vertex. Each vertex computes individually its value: in parallel Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 19 / 40
  • 29. Vertex-Centric Programs Think as a vertex. Each vertex computes individually its value: in parallel Each vertex can see its local context, and updates its value accord- ingly. Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 19 / 40
  • 30. Data Model A directed graph that stores the program state, e.g., the current value. Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 20 / 40
  • 31. Execution Model (1/3) Applications run in sequence of iterations: supersteps Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 21 / 40
  • 32. Execution Model (1/3) Applications run in sequence of iterations: supersteps During a superstep, user-defined functions for each vertex is invoked (method Compute()): in parallel Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 21 / 40
  • 33. Execution Model (1/3) Applications run in sequence of iterations: supersteps During a superstep, user-defined functions for each vertex is invoked (method Compute()): in parallel A vertex in superstep S can: • reads messages sent to it in superstep S-1. • sends messages to other vertices: receiving at superstep S+1. • modifies its state. Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 21 / 40
  • 34. Execution Model (1/3) Applications run in sequence of iterations: supersteps During a superstep, user-defined functions for each vertex is invoked (method Compute()): in parallel A vertex in superstep S can: • reads messages sent to it in superstep S-1. • sends messages to other vertices: receiving at superstep S+1. • modifies its state. Vertices communicate directly with one another by sending mes- sages. Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 21 / 40
  • 35. Execution Model (2/3) Superstep 0: all vertices are in the active state. Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 22 / 40
  • 36. Execution Model (2/3) Superstep 0: all vertices are in the active state. A vertex deactivates itself by voting to halt: no further work to do. Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 22 / 40
  • 37. Execution Model (2/3) Superstep 0: all vertices are in the active state. A vertex deactivates itself by voting to halt: no further work to do. A halted vertex can be active if it receives a message. Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 22 / 40
  • 38. Execution Model (2/3) Superstep 0: all vertices are in the active state. A vertex deactivates itself by voting to halt: no further work to do. A halted vertex can be active if it receives a message. The whole algorithm terminates when: • All vertices are simultaneously inactive. • There are no messages in transit. Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 22 / 40
  • 39. Execution Model (3/3) Aggregation: a mechanism for global communication, monitoring, and data. Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 23 / 40
  • 40. Execution Model (3/3) Aggregation: a mechanism for global communication, monitoring, and data. Runs after each superstep. Each vertex can provide a value to an aggregator in superstep S. The system combines those values and the resulting value is made available to all vertices in superstep S + 1. Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 23 / 40
  • 41. Execution Model (3/3) Aggregation: a mechanism for global communication, monitoring, and data. Runs after each superstep. Each vertex can provide a value to an aggregator in superstep S. The system combines those values and the resulting value is made available to all vertices in superstep S + 1. A number of predefined aggregators, e.g., min, max, sum. Aggregation operators should be commutative and associative. Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 23 / 40
  • 42. Example: Max Value (1/4) i_val := val for each message m if m > val then val := m if i_val == val then vote_to_halt else for each neighbor v send_message(v, val) Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 24 / 40
  • 43. Example: Max Value (2/4) i_val := val for each message m if m > val then val := m if i_val == val then vote_to_halt else for each neighbor v send_message(v, val) Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 25 / 40
  • 44. Example: Max Value (3/4) i_val := val for each message m if m > val then val := m if i_val == val then vote_to_halt else for each neighbor v send_message(v, val) Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 26 / 40
  • 45. Example: Max Value (4/4) i_val := val for each message m if m > val then val := m if i_val == val then vote_to_halt else for each neighbor v send_message(v, val) Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 27 / 40
  • 46. Example: PageRank Update ranks in parallel. Iterate until convergence. R[i] = 0.15 + j∈Nbrs(i) wjiR[j] Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 28 / 40
  • 47. Example: PageRank Pregel_PageRank(i, messages): // receive all the messages total = 0 foreach(msg in messages): total = total + msg // update the rank of this vertex R[i] = 0.15 + total // send new messages to neighbors foreach(j in out_neighbors[i]): sendmsg(R[i] * wij) to vertex j R[i] = 0.15 + j∈Nbrs(i) wjiR[j] Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 29 / 40
  • 48. Partitioning the Graph The pregel library divides a graph into a number of partitions. Each consisting of a set of vertices and all of those vertices’ outgoing edges. Vertices are assigned to partitions based on their vertex-ID (e.g., hash(ID)). Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 30 / 40
  • 49. Implementation (1/4) Master-worker model. User programs are copied on machines. One copy becomes the master. Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 31 / 40
  • 50. Implementation (2/4) The master is responsible for • Coordinating workers activity. • Determining the number of partitions. Each worker is responsible for • Maintaining the state of its partitions. • Executing the user’s Compute() method on its vertices. • Managing messages to and from other workers. Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 32 / 40
  • 51. Implementation (3/4) The master assigns one or more partitions to each worker. Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 33 / 40
  • 52. Implementation (3/4) The master assigns one or more partitions to each worker. The master assigns a portion of user input to each worker. • Set of records containing an arbitrary number of vertices and edges. • If a worker loads a vertex that belongs to that worker’s partitions, the appropriate data structures are immediately updated. • Otherwise the worker enqueues a message to the remote peer that owns the vertex. Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 33 / 40
  • 53. Implementation (4/4) After the input has finished loading, all vertices are marked as active. The master instructs each worker to perform a superstep. After the computation halts, the master may instruct each worker to save its portion of the graph. Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 34 / 40
  • 54. Combiner Sending a message between workers incurs some overhead: use com- biner. This can be reduced in some cases: sometimes vertices only care about a summary value for the messages it is sent (e.g., min, max, sum, avg). Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 35 / 40
  • 55. Fault Tolerance (1/2) Fault tolerance is achieved through checkpointing. At start of each superstep, master tells workers to save their state: • Vertex values, edge values, incoming messages • Saved to persistent storage Master saves aggregator values (if any). This is not necessarily done at every superstep: costly Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 36 / 40
  • 56. Fault Tolerance (2/2) When master detects one or more worker failures: • All workers revert to last checkpoint. • Continue from there. • That is a lot of repeated work. • At least it is better than redoing the whole job. Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 37 / 40
  • 57. Pregel Limitations Inefficient if different regions of the graph converge at different speed. Can suffer if one task is more expensive than the others. Runtime of each phase is determined by the slowest machine. Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 38 / 40
  • 58. Pregel Summary Bulk Synchronous Parallel model Vertex-centric Superstep: sequence of iterations Master-worker model Communication: message passing Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 39 / 40
  • 59. Questions? Amir H. Payberah (Tehran Polytechnic) Pregel 1393/9/3 40 / 40