SlideShare ist ein Scribd-Unternehmen logo
1 von 88
Downloaden Sie, um offline zu lesen
PowerGraph and GraphX
Amir H. Payberah
amir@sics.se
Amirkabir University of Technology
(Tehran Polytechnic)
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 1 / 61
Reminder
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 2 / 61
Data-Parallel vs. Graph-Parallel Computation
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 3 / 61
Pregel
Vertex-centric
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 4 / 61
Pregel
Vertex-centric
Bulk Synchronous Parallel (BSP)
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 4 / 61
Pregel
Vertex-centric
Bulk Synchronous Parallel (BSP)
Runs in sequence of iterations (supersteps)
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 4 / 61
Pregel
Vertex-centric
Bulk Synchronous Parallel (BSP)
Runs in sequence of iterations (supersteps)
A vertex in superstep S can:
• reads messages sent to it in superstep S-1.
• sends messages to other vertices: receiving at superstep S+1.
• modifies its state.
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 4 / 61
Pregel Limitations
Inefficient if different regions of the graph converge at different
speed.
Can suffer if one task is more expensive than the others.
Runtime of each phase is determined by the slowest machine.
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 5 / 61
GraphLab
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 6 / 61
GraphLab
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 7 / 61
GraphLab Limitations
Poor performance on Natural graphs.
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 8 / 61
Natural Graphs
Graphs derived from natural phenomena.
Skewed Power-Law degree distribution.
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 9 / 61
Natural Graphs Challenges
Traditional graph-partitioning algorithms (edge-cut algorithms) per-
form poorly on Power-Law Graphs.
Challenges of high-degree vertices.
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 10 / 61
Proposed Solution
Vertex-Cut Partitioning
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 11 / 61
Proposed Solution
Vertex-Cut Partitioning
Edge-cut Vertex-cut
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 11 / 61
Edge-cut vs. Vertex-cut Partitioning
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 12 / 61
Edge-cut vs. Vertex-cut Partitioning
Edge-cut Vertex-cut
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 13 / 61
PowerGraph
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 14 / 61
PowerGraph
Vertex-cut partitioning of graphs.
Factorizes the GraphLab update function into the Gather, Apply and
Scatter phases (GAS).
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 15 / 61
Gather-Apply-Scatter Programming Model
Gather
• Accumulate information about neighborhood
through a generalized sum.
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 16 / 61
Gather-Apply-Scatter Programming Model
Gather
• Accumulate information about neighborhood
through a generalized sum.
Apply
• Apply the accumulated value to center vertex.
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 16 / 61
Gather-Apply-Scatter Programming Model
Gather
• Accumulate information about neighborhood
through a generalized sum.
Apply
• Apply the accumulated value to center vertex.
Scatter
• Update adjacent edges and vertices.
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 16 / 61
Data Model
A directed graph that stores the program state, called data graph.
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 17 / 61
Execution Model (1/2)
Vertex-centric programming: implementing the GASVertexProgram
interface (vertex-program for short).
Similar to Comput in Pregel, and update function in GraphLab.
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 18 / 61
Execution Model (2/2)
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 19 / 61
Execution Model (2/2)
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 19 / 61
PageRank - Pregel
Pregel_PageRank(i, messages):
// receive all the messages
total = 0
foreach(msg in messages):
total = total + msg
// update the rank of this vertex
R[i] = 0.15 + total
// send new messages to neighbors
foreach(j in out_neighbors[i]):
sendmsg(R[i] * wij) to vertex j
R[i] = 0.15 +
j∈Nbrs(i)
wjiR[j]
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 20 / 61
PageRank - GraphLab
GraphLab_PageRank(i)
// compute sum over neighbors
total = 0
foreach(j in in_neighbors(i)):
total = total + R[j] * wji
// update the PageRank
R[i] = 0.15 + total
// trigger neighbors to run again
foreach(j in out_neighbors(i)):
signal vertex-program on j
R[i] = 0.15 +
j∈Nbrs(i)
wjiR[j]
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 21 / 61
PageRank - PowerGraph
PowerGraph_PageRank(i):
Gather(j -> i):
return wji * R[j]
sum(a, b):
return a + b
// total: Gather and sum
Apply(i, total):
R[i] = 0.15 + total
Scatter(i -> j):
if R[i] changed then activate(j)
R[i] = 0.15 +
j∈Nbrs(i)
wjiR[j]
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 22 / 61
Scheduling (1/5)
PowerGraph inherits the dynamic scheduling of GraphLab.
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 23 / 61
Scheduling (2/5)
Initially all vertices are active.
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 24 / 61
Scheduling (2/5)
Initially all vertices are active.
PowerGraph executes the vertex-program on the active vertices until
none remain.
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 24 / 61
Scheduling (2/5)
Initially all vertices are active.
PowerGraph executes the vertex-program on the active vertices until
none remain.
The order of executing activated vertices is up to the PowerGraph
execution engine.
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 24 / 61
Scheduling (2/5)
Initially all vertices are active.
PowerGraph executes the vertex-program on the active vertices until
none remain.
The order of executing activated vertices is up to the PowerGraph
execution engine.
Once a vertex-program completes the scatter phase it becomes in-
active until it is reactivated.
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 24 / 61
Scheduling (2/5)
Initially all vertices are active.
PowerGraph executes the vertex-program on the active vertices until
none remain.
The order of executing activated vertices is up to the PowerGraph
execution engine.
Once a vertex-program completes the scatter phase it becomes in-
active until it is reactivated.
Vertices can activate themselves and neighboring vertices.
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 24 / 61
Scheduling (3/5)
PowerGraph can execute both synchronously and asynchronously.
• Bulk synchronous execution
• Asynchronous execution
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 25 / 61
Scheduling - Bulk Synchronous Execution (4/5)
Similar to Pregel.
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 26 / 61
Scheduling - Bulk Synchronous Execution (4/5)
Similar to Pregel.
Minor-step: executing the gather, apply, and scatter in order.
• Runs synchronously on all active vertices with a barrier at the end.
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 26 / 61
Scheduling - Bulk Synchronous Execution (4/5)
Similar to Pregel.
Minor-step: executing the gather, apply, and scatter in order.
• Runs synchronously on all active vertices with a barrier at the end.
Super-step: a complete series of GAS minor-steps.
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 26 / 61
Scheduling - Bulk Synchronous Execution (4/5)
Similar to Pregel.
Minor-step: executing the gather, apply, and scatter in order.
• Runs synchronously on all active vertices with a barrier at the end.
Super-step: a complete series of GAS minor-steps.
Changes made to the vertex/edge data are committed at the end of
each minor-step and are visible in the subsequent minor-steps.
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 26 / 61
Scheduling - Asynchronous Execution (5/5)
Changes made to the vertex/edge data during the apply and scatter
functions are immediately committed to the graph.
• Visible to subsequent computation on neighboring vertices.
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 27 / 61
Scheduling - Asynchronous Execution (5/5)
Changes made to the vertex/edge data during the apply and scatter
functions are immediately committed to the graph.
• Visible to subsequent computation on neighboring vertices.
Serializability: prevents adjacent vertex-programs from running con-
currently using a fine-grained locking protocol.
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 27 / 61
Scheduling - Asynchronous Execution (5/5)
Changes made to the vertex/edge data during the apply and scatter
functions are immediately committed to the graph.
• Visible to subsequent computation on neighboring vertices.
Serializability: prevents adjacent vertex-programs from running con-
currently using a fine-grained locking protocol.
• Dining philosophers problem, where each vertex is a philosopher, and
each edge is a fork.
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 27 / 61
Scheduling - Asynchronous Execution (5/5)
Changes made to the vertex/edge data during the apply and scatter
functions are immediately committed to the graph.
• Visible to subsequent computation on neighboring vertices.
Serializability: prevents adjacent vertex-programs from running con-
currently using a fine-grained locking protocol.
• Dining philosophers problem, where each vertex is a philosopher, and
each edge is a fork.
• GraphLab implements Dijkstras solution, where forks are acquired
sequentially according to a total ordering.
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 27 / 61
Scheduling - Asynchronous Execution (5/5)
Changes made to the vertex/edge data during the apply and scatter
functions are immediately committed to the graph.
• Visible to subsequent computation on neighboring vertices.
Serializability: prevents adjacent vertex-programs from running con-
currently using a fine-grained locking protocol.
• Dining philosophers problem, where each vertex is a philosopher, and
each edge is a fork.
• GraphLab implements Dijkstras solution, where forks are acquired
sequentially according to a total ordering.
• PowerGraph implements Chandy-Misra solution, which acquires all
forks simultaneously.
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 27 / 61
Delta Caching (1/2)
Changes in a few of its neighbors → triggering a vertex-program
The gather operation is invoked on all neighbors: wasting compu-
tation cycles
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 28 / 61
Delta Caching (1/2)
Changes in a few of its neighbors → triggering a vertex-program
The gather operation is invoked on all neighbors: wasting compu-
tation cycles
Maintaining a cache of the accumulator av from the previous gather
phase for each vertex.
The scatter can return an additional ∆a, which is added to the
cached accumulator av.
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 28 / 61
Delta Caching (2/2)
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 29 / 61
Delta Caching (2/2)
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 29 / 61
Example: PageRank (Delta-Caching)
PowerGraph_PageRank(i):
Gather(j -> i):
return wji * R[j]
sum(a, b):
return a + b
// total: Gather and sum
Apply(i, total):
new = 0.15 + total
R[i].delta = new - R[i]
R[i] = new
Scatter(i -> j):
if R[i] changed then activate(j)
return wij * R[i].delta
R[i] = 0.15 +
j∈Nbrs(i)
wjiR[j]
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 30 / 61
Graph Partitioning
Vertex-cut partitioning.
Evenly assign edges to machines.
• Minimize machines spanned by each vertex.
Two proposed solutions:
• Random edge placement.
• Greedy edge placement.
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 31 / 61
Random Vertex-Cuts
Randomly assign edges to machines.
Completely parallel and easy to distribute.
High replication factor.
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 32 / 61
Greedy Vertex-Cuts (1/2)
A(v): set of machines that contain adjacent edges of v.
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 33 / 61
Greedy Vertex-Cuts (1/2)
A(v): set of machines that contain adjacent edges of v.
Case 1: If A(u) and A(v) intersect, then the edge should be assigned
to a machine in the intersection.
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 33 / 61
Greedy Vertex-Cuts (1/2)
A(v): set of machines that contain adjacent edges of v.
Case 1: If A(u) and A(v) intersect, then the edge should be assigned
to a machine in the intersection.
Case 2: If A(u) and A(v) are not empty and do not intersect, then
the edge should be assigned to one of the machines from the vertex
with the most unassigned edges.
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 33 / 61
Greedy Vertex-Cuts (1/2)
A(v): set of machines that contain adjacent edges of v.
Case 1: If A(u) and A(v) intersect, then the edge should be assigned
to a machine in the intersection.
Case 2: If A(u) and A(v) are not empty and do not intersect, then
the edge should be assigned to one of the machines from the vertex
with the most unassigned edges.
Case 3: If only one of the two vertices has been assigned, then
choose a machine from the assigned vertex.
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 33 / 61
Greedy Vertex-Cuts (1/2)
A(v): set of machines that contain adjacent edges of v.
Case 1: If A(u) and A(v) intersect, then the edge should be assigned
to a machine in the intersection.
Case 2: If A(u) and A(v) are not empty and do not intersect, then
the edge should be assigned to one of the machines from the vertex
with the most unassigned edges.
Case 3: If only one of the two vertices has been assigned, then
choose a machine from the assigned vertex.
Case 4: If neither vertex has been assigned, then assign the edge to
the least loaded machine.
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 33 / 61
Greedy Vertex-Cuts (2/2)
Coordinated edge placement:
• Requires coordination to place each edge
• Slower, but higher quality cuts
Oblivious edge placement:
• Approx. greedy objective without coordination
• Faster, but lower quality cuts
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 34 / 61
PowerGraph Summary
Gather-Apply-Scatter programming model
Synchronous and Asynchronous models
Vertex-cut graph partitioning
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 35 / 61
Any limitations?
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 36 / 61
Data-Parallel vs. Graph-Parallel Computation
Graph-parallel computation: restricting the types of computation to
achieve performance.
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 37 / 61
Data-Parallel vs. Graph-Parallel Computation
Graph-parallel computation: restricting the types of computation to
achieve performance.
But, the same restrictions make it difficult and inefficient to express
many stages in a typical graph-analytics pipeline.
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 37 / 61
Data-Parallel and Graph-Parallel Pipeline
Moving between table and graph views of the same physical data.
Inefficient: extensive data movement and duplication across the net-
work and file system.
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 38 / 61
GraphX vs. Data-Parallel/Graph-Parallel Systems
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 39 / 61
GraphX vs. Data-Parallel/Graph-Parallel Systems
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 39 / 61
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 40 / 61
GraphX
New API that blurs the distinction between Tables and Graphs.
New system that unifies Data-Parallel and Graph-Parallel systems.
It is implemented on top of Spark.
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 41 / 61
Unifying Data-Parallel and Graph-Parallel Analytics
Tables and Graphs are composable views of the same physical data.
Each view has its own operators that exploit the semantics of the
view to achieve efficient execution.
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 42 / 61
Data Model
Property Graph: represented using two Spark RDDs:
• Edge collection: VertexRDD
• Vertex collection: EdgeRDD
// VD: the type of the vertex attribute
// ED: the type of the edge attribute
class Graph[VD, ED] {
val vertices: VertexRDD[VD]
val edges: EdgeRDD[ED]
}
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 43 / 61
Primitive Data Types
// Vertex collection
class VertexRDD[VD] extends RDD[(VertexId, VD)]
// Edge collection
class EdgeRDD[ED] extends RDD[Edge[ED]]
case class Edge[ED](srcId: VertexId = 0, dstId: VertexId = 0,
attr: ED = null.asInstanceOf[ED])
// Edge Triple
class EdgeTriplet[VD, ED] extends Edge[ED]
EdgeTriplet represents an edge along with the vertex attributes of
its neighboring vertices.
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 44 / 61
Example (1/3)
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 45 / 61
Example (2/3)
val sc: SparkContext
// Create an RDD for the vertices
val users: VertexRDD[(String, String)] = sc.parallelize(
Array((3L, ("rxin", "student")), (7L, ("jgonzal", "postdoc")),
(5L, ("franklin", "prof")), (2L, ("istoica", "prof"))))
// Create an RDD for edges
val relationships: EdgeRDD[String] = sc.parallelize(
Array(Edge(3L, 7L, "collab"), Edge(5L, 3L, "advisor"),
Edge(2L, 5L, "colleague"), Edge(5L, 7L, "pi")))
// Define a default user in case there are relationship with missing user
val defaultUser = ("John Doe", "Missing")
// Build the initial Graph
val userGraph: Graph[(String, String), String] =
Graph(users, relationships, defaultUser)
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 46 / 61
Example (3/3)
// Constructed from above
val userGraph: Graph[(String, String), String]
// Count all users which are postdocs
userGraph.vertices.filter((id, (name, pos)) => pos == "postdoc").count
// Count all the edges where src > dst
userGraph.edges.filter(e => e.srcId > e.dstId).count
// Use the triplets view to create an RDD of facts
val facts: RDD[String] = graph.triplets.map(triplet =>
triplet.srcAttr._1 + " is the " +
triplet.attr + " of " + triplet.dstAttr._1)
// Remove missing vertices as well as the edges to connected to them
val validGraph = graph.subgraph(vpred = (id, attr) => attr._2 != "Missing")
facts.collect.foreach(println(_))
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 47 / 61
Property Operators (1/2)
class Graph[VD, ED] {
def mapVertices[VD2](map: (VertexId, VD) => VD2): Graph[VD2, ED]
def mapEdges[ED2](map: Edge[ED] => ED2): Graph[VD, ED2]
def mapTriplets[ED2](map: EdgeTriplet[VD, ED] => ED2): Graph[VD, ED2]
}
They yield new graphs with the vertex or edge properties modified
by the map function.
The graph structure is unaffected.
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 48 / 61
Property Operators (2/2)
val newGraph = graph.mapVertices((id, attr) => mapUdf(id, attr))
val newVertices = graph.vertices.map((id, attr) => (id, mapUdf(id, attr)))
val newGraph = Graph(newVertices, graph.edges)
Both are logically equivalent, but the second one does not preserve
the structural indices and would not benefit from the GraphX system
optimizations.
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 49 / 61
Map Reduce Triplets
Map-Reduce for each vertex
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 50 / 61
Map Reduce Triplets
Map-Reduce for each vertex
// what is the age of the oldest follower for each user?
val oldestFollowerAge = graph.mapReduceTriplets(
e => (e.dstAttr, e.srcAttr), // Map
(a, b) => max(a, b) // Reduce
).vertices
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 50 / 61
Structural Operators
class Graph[VD, ED] {
// returns a new graph with all the edge directions reversed
def reverse: Graph[VD, ED]
// returns the graph containing only the vertices and edges that satisfy
// the vertex predicate
def subgraph(epred: EdgeTriplet[VD,ED] => Boolean,
vpred: (VertexId, VD) => Boolean): Graph[VD, ED]
// a subgraph by returning a graph that contains the vertices and edges
// that are also found in the input graph
def mask[VD2, ED2](other: Graph[VD2, ED2]): Graph[VD, ED]
}
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 51 / 61
Structural Operators Example
// Build the initial Graph
val graph = Graph(users, relationships, defaultUser)
// Run Connected Components
val ccGraph = graph.connectedComponents()
// Remove missing vertices as well as the edges to connected to them
val validGraph = graph.subgraph(vpred = (id, attr) => attr._2 != "Missing")
// Restrict the answer to the valid subgraph
val validCCGraph = ccGraph.mask(validGraph)
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 52 / 61
Join Operators
To join data from external collections (RDDs) with graphs.
class Graph[VD, ED] {
// joins the vertices with the input RDD and returns a new graph
// by applying the map function to the result of the joined vertices
def joinVertices[U](table: RDD[(VertexId, U)])
(map: (VertexId, VD, U) => VD): Graph[VD, ED]
// similarly to joinVertices, but the map function is applied to
// all vertices and can change the vertex property type
def outerJoinVertices[U, VD2](table: RDD[(VertexId, U)])
(map: (VertexId, VD, Option[U]) => VD2): Graph[VD2, ED]
}
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 53 / 61
Graph Builders
// load a graph from a list of edges on disk
object GraphLoader {
def edgeListFile(
sc: SparkContext,
path: String,
canonicalOrientation: Boolean = false,
minEdgePartitions: Int = 1)
: Graph[Int, Int]
}
// graph file
# This is a comment
2 1
4 1
1 2
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 54 / 61
GraphX and Spark
GraphX is implemented on top of Spark
In-memory caching
Lineage-based fault tolerance
Programmable partitioning
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 55 / 61
Distributed Graph Representation (1/2)
Representing graphs using two RDDs: edge-collection and vertex-
collection
Vertex-cut partitioning (like PowerGraph)
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 56 / 61
Distributed Graph Representation (2/2)
Each vertex partition contains a bitmask and routing table.
Routing table: a logical map from a vertex id to the set of edge
partitions that contains adjacent edges.
Bitmask: enables the set intersection and filtering.
• Vertices bitmasks are updated after each operation (e.g., mapRe-
duceTriplets).
• Vertices hidden by the bitmask do not participate in the graph oper-
ations.
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 57 / 61
Summary
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 58 / 61
Summary
PowerGraph
• GAS programming model
• Vertex-cut partitioning
GraphX
• Unifying data-parallel and graph-parallel analytics
• Vertex-cut partitioning
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 59 / 61
Questions?
Acknowledgements
Some pictures were derived from the Spark web site
(http://spark.apache.org/).
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 60 / 61
Questions?
Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 61 / 61

Weitere ähnliche Inhalte

Was ist angesagt?

Data Storage Tips for Optimal Spark Performance-(Vida Ha, Databricks)
Data Storage Tips for Optimal Spark Performance-(Vida Ha, Databricks)Data Storage Tips for Optimal Spark Performance-(Vida Ha, Databricks)
Data Storage Tips for Optimal Spark Performance-(Vida Ha, Databricks)Spark Summit
 
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Databricks
 
How Adobe Does 2 Million Records Per Second Using Apache Spark!
How Adobe Does 2 Million Records Per Second Using Apache Spark!How Adobe Does 2 Million Records Per Second Using Apache Spark!
How Adobe Does 2 Million Records Per Second Using Apache Spark!Databricks
 
Spark overview
Spark overviewSpark overview
Spark overviewLisa Hua
 
Real Time Data Processing using Spark Streaming | Data Day Texas 2015
Real Time Data Processing using Spark Streaming | Data Day Texas 2015Real Time Data Processing using Spark Streaming | Data Day Texas 2015
Real Time Data Processing using Spark Streaming | Data Day Texas 2015Cloudera, Inc.
 
Deep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache SparkDeep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache SparkDatabricks
 
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage SchemesScaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage SchemesDataWorks Summit/Hadoop Summit
 
Tuning and Debugging in Apache Spark
Tuning and Debugging in Apache SparkTuning and Debugging in Apache Spark
Tuning and Debugging in Apache SparkPatrick Wendell
 
Yarn by default (Spark on YARN)
Yarn by default (Spark on YARN)Yarn by default (Spark on YARN)
Yarn by default (Spark on YARN)Ferran Galí Reniu
 
A Deep Dive into Query Execution Engine of Spark SQL
A Deep Dive into Query Execution Engine of Spark SQLA Deep Dive into Query Execution Engine of Spark SQL
A Deep Dive into Query Execution Engine of Spark SQLDatabricks
 
YARN Ready: Integrating to YARN with Tez
YARN Ready: Integrating to YARN with Tez YARN Ready: Integrating to YARN with Tez
YARN Ready: Integrating to YARN with Tez Hortonworks
 
Performant Streaming in Production: Preventing Common Pitfalls when Productio...
Performant Streaming in Production: Preventing Common Pitfalls when Productio...Performant Streaming in Production: Preventing Common Pitfalls when Productio...
Performant Streaming in Production: Preventing Common Pitfalls when Productio...Databricks
 
Hadoop Summit 2015: Performance Optimization at Scale, Lessons Learned at Twi...
Hadoop Summit 2015: Performance Optimization at Scale, Lessons Learned at Twi...Hadoop Summit 2015: Performance Optimization at Scale, Lessons Learned at Twi...
Hadoop Summit 2015: Performance Optimization at Scale, Lessons Learned at Twi...Alex Levenson
 
Memory Management in Apache Spark
Memory Management in Apache SparkMemory Management in Apache Spark
Memory Management in Apache SparkDatabricks
 
Dynamic Allocation in Spark
Dynamic Allocation in SparkDynamic Allocation in Spark
Dynamic Allocation in SparkDatabricks
 
Apache phoenix: Past, Present and Future of SQL over HBAse
Apache phoenix: Past, Present and Future of SQL over HBAseApache phoenix: Past, Present and Future of SQL over HBAse
Apache phoenix: Past, Present and Future of SQL over HBAseenissoz
 
Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)Ryan Blue
 
Scaling Apache Spark at Facebook
Scaling Apache Spark at FacebookScaling Apache Spark at Facebook
Scaling Apache Spark at FacebookDatabricks
 

Was ist angesagt? (20)

Data Storage Tips for Optimal Spark Performance-(Vida Ha, Databricks)
Data Storage Tips for Optimal Spark Performance-(Vida Ha, Databricks)Data Storage Tips for Optimal Spark Performance-(Vida Ha, Databricks)
Data Storage Tips for Optimal Spark Performance-(Vida Ha, Databricks)
 
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
 
How Adobe Does 2 Million Records Per Second Using Apache Spark!
How Adobe Does 2 Million Records Per Second Using Apache Spark!How Adobe Does 2 Million Records Per Second Using Apache Spark!
How Adobe Does 2 Million Records Per Second Using Apache Spark!
 
Spark overview
Spark overviewSpark overview
Spark overview
 
Real Time Data Processing using Spark Streaming | Data Day Texas 2015
Real Time Data Processing using Spark Streaming | Data Day Texas 2015Real Time Data Processing using Spark Streaming | Data Day Texas 2015
Real Time Data Processing using Spark Streaming | Data Day Texas 2015
 
Deep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache SparkDeep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache Spark
 
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage SchemesScaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
 
Tuning and Debugging in Apache Spark
Tuning and Debugging in Apache SparkTuning and Debugging in Apache Spark
Tuning and Debugging in Apache Spark
 
Yarn by default (Spark on YARN)
Yarn by default (Spark on YARN)Yarn by default (Spark on YARN)
Yarn by default (Spark on YARN)
 
A Deep Dive into Query Execution Engine of Spark SQL
A Deep Dive into Query Execution Engine of Spark SQLA Deep Dive into Query Execution Engine of Spark SQL
A Deep Dive into Query Execution Engine of Spark SQL
 
YARN Ready: Integrating to YARN with Tez
YARN Ready: Integrating to YARN with Tez YARN Ready: Integrating to YARN with Tez
YARN Ready: Integrating to YARN with Tez
 
Performant Streaming in Production: Preventing Common Pitfalls when Productio...
Performant Streaming in Production: Preventing Common Pitfalls when Productio...Performant Streaming in Production: Preventing Common Pitfalls when Productio...
Performant Streaming in Production: Preventing Common Pitfalls when Productio...
 
Hadoop Summit 2015: Performance Optimization at Scale, Lessons Learned at Twi...
Hadoop Summit 2015: Performance Optimization at Scale, Lessons Learned at Twi...Hadoop Summit 2015: Performance Optimization at Scale, Lessons Learned at Twi...
Hadoop Summit 2015: Performance Optimization at Scale, Lessons Learned at Twi...
 
Memory Management in Apache Spark
Memory Management in Apache SparkMemory Management in Apache Spark
Memory Management in Apache Spark
 
Dynamic Allocation in Spark
Dynamic Allocation in SparkDynamic Allocation in Spark
Dynamic Allocation in Spark
 
Spark graphx
Spark graphxSpark graphx
Spark graphx
 
Apache phoenix: Past, Present and Future of SQL over HBAse
Apache phoenix: Past, Present and Future of SQL over HBAseApache phoenix: Past, Present and Future of SQL over HBAse
Apache phoenix: Past, Present and Future of SQL over HBAse
 
Spark
SparkSpark
Spark
 
Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)
 
Scaling Apache Spark at Facebook
Scaling Apache Spark at FacebookScaling Apache Spark at Facebook
Scaling Apache Spark at Facebook
 

Andere mochten auch

Graph processing - Graphlab
Graph processing - GraphlabGraph processing - Graphlab
Graph processing - GraphlabAmir Payberah
 
An excursion into Graph Analytics with Apache Spark GraphX
An excursion into Graph Analytics with Apache Spark GraphXAn excursion into Graph Analytics with Apache Spark GraphX
An excursion into Graph Analytics with Apache Spark GraphXKrishna Sankar
 
GraphX: Graph Analytics in Apache Spark (AMPCamp 5, 2014-11-20)
GraphX: Graph Analytics in Apache Spark (AMPCamp 5, 2014-11-20)GraphX: Graph Analytics in Apache Spark (AMPCamp 5, 2014-11-20)
GraphX: Graph Analytics in Apache Spark (AMPCamp 5, 2014-11-20)Ankur Dave
 
Machine Learning and GraphX
Machine Learning and GraphXMachine Learning and GraphX
Machine Learning and GraphXAndy Petrella
 
Graph technology meetup slides
Graph technology meetup slidesGraph technology meetup slides
Graph technology meetup slidesSean Mulvehill
 
Real-time Stream Processing with Apache Flink
Real-time Stream Processing with Apache FlinkReal-time Stream Processing with Apache Flink
Real-time Stream Processing with Apache FlinkDataWorks Summit
 
GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013
GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013
GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013Amazon Web Services
 
Graphs are everywhere! Distributed graph computing with Spark GraphX
Graphs are everywhere! Distributed graph computing with Spark GraphXGraphs are everywhere! Distributed graph computing with Spark GraphX
Graphs are everywhere! Distributed graph computing with Spark GraphXAndrea Iacono
 
Spark Concepts - Spark SQL, Graphx, Streaming
Spark Concepts - Spark SQL, Graphx, StreamingSpark Concepts - Spark SQL, Graphx, Streaming
Spark Concepts - Spark SQL, Graphx, StreamingPetr Zapletal
 
Martin Junghans – Gradoop: Scalable Graph Analytics with Apache Flink
Martin Junghans – Gradoop: Scalable Graph Analytics with Apache FlinkMartin Junghans – Gradoop: Scalable Graph Analytics with Apache Flink
Martin Junghans – Gradoop: Scalable Graph Analytics with Apache FlinkFlink Forward
 
Graph Analytics in Spark
Graph Analytics in SparkGraph Analytics in Spark
Graph Analytics in SparkPaco Nathan
 
Graph processing - Pregel
Graph processing - PregelGraph processing - Pregel
Graph processing - PregelAmir Payberah
 
Spark Stream and SEEP
Spark Stream and SEEPSpark Stream and SEEP
Spark Stream and SEEPAmir Payberah
 
Introduction to stream processing
Introduction to stream processingIntroduction to stream processing
Introduction to stream processingAmir Payberah
 
Linux Module Programming
Linux Module ProgrammingLinux Module Programming
Linux Module ProgrammingAmir Payberah
 
Process Management - Part3
Process Management - Part3Process Management - Part3
Process Management - Part3Amir Payberah
 
Process Management - Part1
Process Management - Part1Process Management - Part1
Process Management - Part1Amir Payberah
 
Introduction to Operating Systems - Part3
Introduction to Operating Systems - Part3Introduction to Operating Systems - Part3
Introduction to Operating Systems - Part3Amir Payberah
 

Andere mochten auch (20)

Graph processing - Graphlab
Graph processing - GraphlabGraph processing - Graphlab
Graph processing - Graphlab
 
An excursion into Graph Analytics with Apache Spark GraphX
An excursion into Graph Analytics with Apache Spark GraphXAn excursion into Graph Analytics with Apache Spark GraphX
An excursion into Graph Analytics with Apache Spark GraphX
 
GraphX: Graph Analytics in Apache Spark (AMPCamp 5, 2014-11-20)
GraphX: Graph Analytics in Apache Spark (AMPCamp 5, 2014-11-20)GraphX: Graph Analytics in Apache Spark (AMPCamp 5, 2014-11-20)
GraphX: Graph Analytics in Apache Spark (AMPCamp 5, 2014-11-20)
 
Machine Learning and GraphX
Machine Learning and GraphXMachine Learning and GraphX
Machine Learning and GraphX
 
Graph technology meetup slides
Graph technology meetup slidesGraph technology meetup slides
Graph technology meetup slides
 
Real-time Stream Processing with Apache Flink
Real-time Stream Processing with Apache FlinkReal-time Stream Processing with Apache Flink
Real-time Stream Processing with Apache Flink
 
GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013
GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013
GraphLab: Large-Scale Machine Learning on Graphs (BDT204) | AWS re:Invent 2013
 
Graphs are everywhere! Distributed graph computing with Spark GraphX
Graphs are everywhere! Distributed graph computing with Spark GraphXGraphs are everywhere! Distributed graph computing with Spark GraphX
Graphs are everywhere! Distributed graph computing with Spark GraphX
 
Spark Concepts - Spark SQL, Graphx, Streaming
Spark Concepts - Spark SQL, Graphx, StreamingSpark Concepts - Spark SQL, Graphx, Streaming
Spark Concepts - Spark SQL, Graphx, Streaming
 
Martin Junghans – Gradoop: Scalable Graph Analytics with Apache Flink
Martin Junghans – Gradoop: Scalable Graph Analytics with Apache FlinkMartin Junghans – Gradoop: Scalable Graph Analytics with Apache Flink
Martin Junghans – Gradoop: Scalable Graph Analytics with Apache Flink
 
Graph Analytics in Spark
Graph Analytics in SparkGraph Analytics in Spark
Graph Analytics in Spark
 
Graph processing - Pregel
Graph processing - PregelGraph processing - Pregel
Graph processing - Pregel
 
Spark Stream and SEEP
Spark Stream and SEEPSpark Stream and SEEP
Spark Stream and SEEP
 
MapReduce
MapReduceMapReduce
MapReduce
 
Introduction to stream processing
Introduction to stream processingIntroduction to stream processing
Introduction to stream processing
 
Linux Module Programming
Linux Module ProgrammingLinux Module Programming
Linux Module Programming
 
Process Management - Part3
Process Management - Part3Process Management - Part3
Process Management - Part3
 
Process Management - Part1
Process Management - Part1Process Management - Part1
Process Management - Part1
 
Threads
ThreadsThreads
Threads
 
Introduction to Operating Systems - Part3
Introduction to Operating Systems - Part3Introduction to Operating Systems - Part3
Introduction to Operating Systems - Part3
 

Ähnlich wie Graph processing - Powergraph and GraphX

Process Synchronization - Part1
Process Synchronization -  Part1Process Synchronization -  Part1
Process Synchronization - Part1Amir Payberah
 
Process Synchronization - Part2
Process Synchronization -  Part2Process Synchronization -  Part2
Process Synchronization - Part2Amir Payberah
 
Comparative analysis of FACTS controllers by tuning employing GA and PSO
Comparative analysis of FACTS controllers by tuning employing GA and PSOComparative analysis of FACTS controllers by tuning employing GA and PSO
Comparative analysis of FACTS controllers by tuning employing GA and PSOINFOGAIN PUBLICATION
 
Numerical simulation of flow modeling in ducted axial fan using simpson’s 13r...
Numerical simulation of flow modeling in ducted axial fan using simpson’s 13r...Numerical simulation of flow modeling in ducted axial fan using simpson’s 13r...
Numerical simulation of flow modeling in ducted axial fan using simpson’s 13r...iaemedu
 
Raminder kaur presentation_two
Raminder kaur presentation_twoRaminder kaur presentation_two
Raminder kaur presentation_tworamikaurraminder
 
saad faim paper3
saad faim paper3saad faim paper3
saad faim paper3Saad Farooq
 
Process Management - Part2
Process Management - Part2Process Management - Part2
Process Management - Part2Amir Payberah
 
detail of flowchart and algorithm that are used in programmingpdf
detail of flowchart and algorithm that are used in programmingpdfdetail of flowchart and algorithm that are used in programmingpdf
detail of flowchart and algorithm that are used in programmingpdfssuserf86fba
 
Algorithm Flowchart Manual ALGORITHM FLOWCHART MANUAL For STUDENTS
Algorithm   Flowchart Manual ALGORITHM   FLOWCHART MANUAL For STUDENTSAlgorithm   Flowchart Manual ALGORITHM   FLOWCHART MANUAL For STUDENTS
Algorithm Flowchart Manual ALGORITHM FLOWCHART MANUAL For STUDENTSAlicia Edwards
 
Acceleration of the Longwave Rapid Radiative Transfer Module using GPGPU
Acceleration of the Longwave Rapid Radiative Transfer Module using GPGPUAcceleration of the Longwave Rapid Radiative Transfer Module using GPGPU
Acceleration of the Longwave Rapid Radiative Transfer Module using GPGPUMahesh Khadatare
 
Improving the Hydraulic Efficiency of Centrifugal Pumps through Computational...
Improving the Hydraulic Efficiency of Centrifugal Pumps through Computational...Improving the Hydraulic Efficiency of Centrifugal Pumps through Computational...
Improving the Hydraulic Efficiency of Centrifugal Pumps through Computational...IJERA Editor
 
GRAPH MATCHING ALGORITHM FOR TASK ASSIGNMENT PROBLEM
GRAPH MATCHING ALGORITHM FOR TASK ASSIGNMENT PROBLEMGRAPH MATCHING ALGORITHM FOR TASK ASSIGNMENT PROBLEM
GRAPH MATCHING ALGORITHM FOR TASK ASSIGNMENT PROBLEMIJCSEA Journal
 
AIROPT: A Multi-Objective Evolutionary Algorithm based Aerodynamic Shape Opti...
AIROPT: A Multi-Objective Evolutionary Algorithm based Aerodynamic Shape Opti...AIROPT: A Multi-Objective Evolutionary Algorithm based Aerodynamic Shape Opti...
AIROPT: A Multi-Objective Evolutionary Algorithm based Aerodynamic Shape Opti...Abhishek Jain
 
Analysis Of Transmission Line Using MATLAB Software
Analysis Of Transmission Line Using MATLAB SoftwareAnalysis Of Transmission Line Using MATLAB Software
Analysis Of Transmission Line Using MATLAB SoftwareAllison Thompson
 
A Novel Technique in Software Engineering for Building Scalable Large Paralle...
A Novel Technique in Software Engineering for Building Scalable Large Paralle...A Novel Technique in Software Engineering for Building Scalable Large Paralle...
A Novel Technique in Software Engineering for Building Scalable Large Paralle...Eswar Publications
 
IRJET- Structural Analysis of Transmission Tower: State of Art
IRJET-  	  Structural Analysis of Transmission Tower: State of ArtIRJET-  	  Structural Analysis of Transmission Tower: State of Art
IRJET- Structural Analysis of Transmission Tower: State of ArtIRJET Journal
 
Estimation of Base Drag On Supersonic Cruise Missile
Estimation of Base Drag On Supersonic Cruise MissileEstimation of Base Drag On Supersonic Cruise Missile
Estimation of Base Drag On Supersonic Cruise MissileIRJET Journal
 

Ähnlich wie Graph processing - Powergraph and GraphX (20)

Scala
ScalaScala
Scala
 
Process Synchronization - Part1
Process Synchronization -  Part1Process Synchronization -  Part1
Process Synchronization - Part1
 
Process Synchronization - Part2
Process Synchronization -  Part2Process Synchronization -  Part2
Process Synchronization - Part2
 
Comparative analysis of FACTS controllers by tuning employing GA and PSO
Comparative analysis of FACTS controllers by tuning employing GA and PSOComparative analysis of FACTS controllers by tuning employing GA and PSO
Comparative analysis of FACTS controllers by tuning employing GA and PSO
 
Numerical simulation of flow modeling in ducted axial fan using simpson’s 13r...
Numerical simulation of flow modeling in ducted axial fan using simpson’s 13r...Numerical simulation of flow modeling in ducted axial fan using simpson’s 13r...
Numerical simulation of flow modeling in ducted axial fan using simpson’s 13r...
 
Raminder kaur presentation_two
Raminder kaur presentation_twoRaminder kaur presentation_two
Raminder kaur presentation_two
 
saad faim paper3
saad faim paper3saad faim paper3
saad faim paper3
 
Process Management - Part2
Process Management - Part2Process Management - Part2
Process Management - Part2
 
detail of flowchart and algorithm that are used in programmingpdf
detail of flowchart and algorithm that are used in programmingpdfdetail of flowchart and algorithm that are used in programmingpdf
detail of flowchart and algorithm that are used in programmingpdf
 
Algorithm Flowchart Manual ALGORITHM FLOWCHART MANUAL For STUDENTS
Algorithm   Flowchart Manual ALGORITHM   FLOWCHART MANUAL For STUDENTSAlgorithm   Flowchart Manual ALGORITHM   FLOWCHART MANUAL For STUDENTS
Algorithm Flowchart Manual ALGORITHM FLOWCHART MANUAL For STUDENTS
 
Algorithm manual
Algorithm manualAlgorithm manual
Algorithm manual
 
Acceleration of the Longwave Rapid Radiative Transfer Module using GPGPU
Acceleration of the Longwave Rapid Radiative Transfer Module using GPGPUAcceleration of the Longwave Rapid Radiative Transfer Module using GPGPU
Acceleration of the Longwave Rapid Radiative Transfer Module using GPGPU
 
Improving the Hydraulic Efficiency of Centrifugal Pumps through Computational...
Improving the Hydraulic Efficiency of Centrifugal Pumps through Computational...Improving the Hydraulic Efficiency of Centrifugal Pumps through Computational...
Improving the Hydraulic Efficiency of Centrifugal Pumps through Computational...
 
GRAPH MATCHING ALGORITHM FOR TASK ASSIGNMENT PROBLEM
GRAPH MATCHING ALGORITHM FOR TASK ASSIGNMENT PROBLEMGRAPH MATCHING ALGORITHM FOR TASK ASSIGNMENT PROBLEM
GRAPH MATCHING ALGORITHM FOR TASK ASSIGNMENT PROBLEM
 
Co&al lecture-07
Co&al lecture-07Co&al lecture-07
Co&al lecture-07
 
AIROPT: A Multi-Objective Evolutionary Algorithm based Aerodynamic Shape Opti...
AIROPT: A Multi-Objective Evolutionary Algorithm based Aerodynamic Shape Opti...AIROPT: A Multi-Objective Evolutionary Algorithm based Aerodynamic Shape Opti...
AIROPT: A Multi-Objective Evolutionary Algorithm based Aerodynamic Shape Opti...
 
Analysis Of Transmission Line Using MATLAB Software
Analysis Of Transmission Line Using MATLAB SoftwareAnalysis Of Transmission Line Using MATLAB Software
Analysis Of Transmission Line Using MATLAB Software
 
A Novel Technique in Software Engineering for Building Scalable Large Paralle...
A Novel Technique in Software Engineering for Building Scalable Large Paralle...A Novel Technique in Software Engineering for Building Scalable Large Paralle...
A Novel Technique in Software Engineering for Building Scalable Large Paralle...
 
IRJET- Structural Analysis of Transmission Tower: State of Art
IRJET-  	  Structural Analysis of Transmission Tower: State of ArtIRJET-  	  Structural Analysis of Transmission Tower: State of Art
IRJET- Structural Analysis of Transmission Tower: State of Art
 
Estimation of Base Drag On Supersonic Cruise Missile
Estimation of Base Drag On Supersonic Cruise MissileEstimation of Base Drag On Supersonic Cruise Missile
Estimation of Base Drag On Supersonic Cruise Missile
 

Mehr von Amir Payberah

P2P Content Distribution Network
P2P Content Distribution NetworkP2P Content Distribution Network
P2P Content Distribution NetworkAmir Payberah
 
Data Intensive Computing Frameworks
Data Intensive Computing FrameworksData Intensive Computing Frameworks
Data Intensive Computing FrameworksAmir Payberah
 
The Stratosphere Big Data Analytics Platform
The Stratosphere Big Data Analytics PlatformThe Stratosphere Big Data Analytics Platform
The Stratosphere Big Data Analytics PlatformAmir Payberah
 
The Spark Big Data Analytics Platform
The Spark Big Data Analytics PlatformThe Spark Big Data Analytics Platform
The Spark Big Data Analytics PlatformAmir Payberah
 
File System Implementation - Part2
File System Implementation - Part2File System Implementation - Part2
File System Implementation - Part2Amir Payberah
 
File System Implementation - Part1
File System Implementation - Part1File System Implementation - Part1
File System Implementation - Part1Amir Payberah
 
File System Interface
File System InterfaceFile System Interface
File System InterfaceAmir Payberah
 
Virtual Memory - Part2
Virtual Memory - Part2Virtual Memory - Part2
Virtual Memory - Part2Amir Payberah
 
Virtual Memory - Part1
Virtual Memory - Part1Virtual Memory - Part1
Virtual Memory - Part1Amir Payberah
 
CPU Scheduling - Part2
CPU Scheduling - Part2CPU Scheduling - Part2
CPU Scheduling - Part2Amir Payberah
 
CPU Scheduling - Part1
CPU Scheduling - Part1CPU Scheduling - Part1
CPU Scheduling - Part1Amir Payberah
 
Introduction to Operating Systems - Part2
Introduction to Operating Systems - Part2Introduction to Operating Systems - Part2
Introduction to Operating Systems - Part2Amir Payberah
 

Mehr von Amir Payberah (20)

P2P Content Distribution Network
P2P Content Distribution NetworkP2P Content Distribution Network
P2P Content Distribution Network
 
Cloud Computing
Cloud ComputingCloud Computing
Cloud Computing
 
Data Intensive Computing Frameworks
Data Intensive Computing FrameworksData Intensive Computing Frameworks
Data Intensive Computing Frameworks
 
The Stratosphere Big Data Analytics Platform
The Stratosphere Big Data Analytics PlatformThe Stratosphere Big Data Analytics Platform
The Stratosphere Big Data Analytics Platform
 
The Spark Big Data Analytics Platform
The Spark Big Data Analytics PlatformThe Spark Big Data Analytics Platform
The Spark Big Data Analytics Platform
 
Security
SecuritySecurity
Security
 
Protection
ProtectionProtection
Protection
 
IO Systems
IO SystemsIO Systems
IO Systems
 
File System Implementation - Part2
File System Implementation - Part2File System Implementation - Part2
File System Implementation - Part2
 
File System Implementation - Part1
File System Implementation - Part1File System Implementation - Part1
File System Implementation - Part1
 
File System Interface
File System InterfaceFile System Interface
File System Interface
 
Storage
StorageStorage
Storage
 
Virtual Memory - Part2
Virtual Memory - Part2Virtual Memory - Part2
Virtual Memory - Part2
 
Virtual Memory - Part1
Virtual Memory - Part1Virtual Memory - Part1
Virtual Memory - Part1
 
Main Memory - Part2
Main Memory - Part2Main Memory - Part2
Main Memory - Part2
 
Main Memory - Part1
Main Memory - Part1Main Memory - Part1
Main Memory - Part1
 
Deadlocks
DeadlocksDeadlocks
Deadlocks
 
CPU Scheduling - Part2
CPU Scheduling - Part2CPU Scheduling - Part2
CPU Scheduling - Part2
 
CPU Scheduling - Part1
CPU Scheduling - Part1CPU Scheduling - Part1
CPU Scheduling - Part1
 
Introduction to Operating Systems - Part2
Introduction to Operating Systems - Part2Introduction to Operating Systems - Part2
Introduction to Operating Systems - Part2
 

Kürzlich hochgeladen

New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 

Kürzlich hochgeladen (20)

New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 

Graph processing - Powergraph and GraphX

  • 1. PowerGraph and GraphX Amir H. Payberah amir@sics.se Amirkabir University of Technology (Tehran Polytechnic) Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 1 / 61
  • 2. Reminder Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 2 / 61
  • 3. Data-Parallel vs. Graph-Parallel Computation Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 3 / 61
  • 4. Pregel Vertex-centric Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 4 / 61
  • 5. Pregel Vertex-centric Bulk Synchronous Parallel (BSP) Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 4 / 61
  • 6. Pregel Vertex-centric Bulk Synchronous Parallel (BSP) Runs in sequence of iterations (supersteps) Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 4 / 61
  • 7. Pregel Vertex-centric Bulk Synchronous Parallel (BSP) Runs in sequence of iterations (supersteps) A vertex in superstep S can: • reads messages sent to it in superstep S-1. • sends messages to other vertices: receiving at superstep S+1. • modifies its state. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 4 / 61
  • 8. Pregel Limitations Inefficient if different regions of the graph converge at different speed. Can suffer if one task is more expensive than the others. Runtime of each phase is determined by the slowest machine. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 5 / 61
  • 9. GraphLab Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 6 / 61
  • 10. GraphLab Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 7 / 61
  • 11. GraphLab Limitations Poor performance on Natural graphs. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 8 / 61
  • 12. Natural Graphs Graphs derived from natural phenomena. Skewed Power-Law degree distribution. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 9 / 61
  • 13. Natural Graphs Challenges Traditional graph-partitioning algorithms (edge-cut algorithms) per- form poorly on Power-Law Graphs. Challenges of high-degree vertices. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 10 / 61
  • 14. Proposed Solution Vertex-Cut Partitioning Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 11 / 61
  • 15. Proposed Solution Vertex-Cut Partitioning Edge-cut Vertex-cut Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 11 / 61
  • 16. Edge-cut vs. Vertex-cut Partitioning Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 12 / 61
  • 17. Edge-cut vs. Vertex-cut Partitioning Edge-cut Vertex-cut Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 13 / 61
  • 18. PowerGraph Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 14 / 61
  • 19. PowerGraph Vertex-cut partitioning of graphs. Factorizes the GraphLab update function into the Gather, Apply and Scatter phases (GAS). Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 15 / 61
  • 20. Gather-Apply-Scatter Programming Model Gather • Accumulate information about neighborhood through a generalized sum. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 16 / 61
  • 21. Gather-Apply-Scatter Programming Model Gather • Accumulate information about neighborhood through a generalized sum. Apply • Apply the accumulated value to center vertex. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 16 / 61
  • 22. Gather-Apply-Scatter Programming Model Gather • Accumulate information about neighborhood through a generalized sum. Apply • Apply the accumulated value to center vertex. Scatter • Update adjacent edges and vertices. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 16 / 61
  • 23. Data Model A directed graph that stores the program state, called data graph. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 17 / 61
  • 24. Execution Model (1/2) Vertex-centric programming: implementing the GASVertexProgram interface (vertex-program for short). Similar to Comput in Pregel, and update function in GraphLab. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 18 / 61
  • 25. Execution Model (2/2) Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 19 / 61
  • 26. Execution Model (2/2) Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 19 / 61
  • 27. PageRank - Pregel Pregel_PageRank(i, messages): // receive all the messages total = 0 foreach(msg in messages): total = total + msg // update the rank of this vertex R[i] = 0.15 + total // send new messages to neighbors foreach(j in out_neighbors[i]): sendmsg(R[i] * wij) to vertex j R[i] = 0.15 + j∈Nbrs(i) wjiR[j] Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 20 / 61
  • 28. PageRank - GraphLab GraphLab_PageRank(i) // compute sum over neighbors total = 0 foreach(j in in_neighbors(i)): total = total + R[j] * wji // update the PageRank R[i] = 0.15 + total // trigger neighbors to run again foreach(j in out_neighbors(i)): signal vertex-program on j R[i] = 0.15 + j∈Nbrs(i) wjiR[j] Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 21 / 61
  • 29. PageRank - PowerGraph PowerGraph_PageRank(i): Gather(j -> i): return wji * R[j] sum(a, b): return a + b // total: Gather and sum Apply(i, total): R[i] = 0.15 + total Scatter(i -> j): if R[i] changed then activate(j) R[i] = 0.15 + j∈Nbrs(i) wjiR[j] Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 22 / 61
  • 30. Scheduling (1/5) PowerGraph inherits the dynamic scheduling of GraphLab. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 23 / 61
  • 31. Scheduling (2/5) Initially all vertices are active. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 24 / 61
  • 32. Scheduling (2/5) Initially all vertices are active. PowerGraph executes the vertex-program on the active vertices until none remain. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 24 / 61
  • 33. Scheduling (2/5) Initially all vertices are active. PowerGraph executes the vertex-program on the active vertices until none remain. The order of executing activated vertices is up to the PowerGraph execution engine. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 24 / 61
  • 34. Scheduling (2/5) Initially all vertices are active. PowerGraph executes the vertex-program on the active vertices until none remain. The order of executing activated vertices is up to the PowerGraph execution engine. Once a vertex-program completes the scatter phase it becomes in- active until it is reactivated. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 24 / 61
  • 35. Scheduling (2/5) Initially all vertices are active. PowerGraph executes the vertex-program on the active vertices until none remain. The order of executing activated vertices is up to the PowerGraph execution engine. Once a vertex-program completes the scatter phase it becomes in- active until it is reactivated. Vertices can activate themselves and neighboring vertices. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 24 / 61
  • 36. Scheduling (3/5) PowerGraph can execute both synchronously and asynchronously. • Bulk synchronous execution • Asynchronous execution Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 25 / 61
  • 37. Scheduling - Bulk Synchronous Execution (4/5) Similar to Pregel. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 26 / 61
  • 38. Scheduling - Bulk Synchronous Execution (4/5) Similar to Pregel. Minor-step: executing the gather, apply, and scatter in order. • Runs synchronously on all active vertices with a barrier at the end. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 26 / 61
  • 39. Scheduling - Bulk Synchronous Execution (4/5) Similar to Pregel. Minor-step: executing the gather, apply, and scatter in order. • Runs synchronously on all active vertices with a barrier at the end. Super-step: a complete series of GAS minor-steps. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 26 / 61
  • 40. Scheduling - Bulk Synchronous Execution (4/5) Similar to Pregel. Minor-step: executing the gather, apply, and scatter in order. • Runs synchronously on all active vertices with a barrier at the end. Super-step: a complete series of GAS minor-steps. Changes made to the vertex/edge data are committed at the end of each minor-step and are visible in the subsequent minor-steps. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 26 / 61
  • 41. Scheduling - Asynchronous Execution (5/5) Changes made to the vertex/edge data during the apply and scatter functions are immediately committed to the graph. • Visible to subsequent computation on neighboring vertices. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 27 / 61
  • 42. Scheduling - Asynchronous Execution (5/5) Changes made to the vertex/edge data during the apply and scatter functions are immediately committed to the graph. • Visible to subsequent computation on neighboring vertices. Serializability: prevents adjacent vertex-programs from running con- currently using a fine-grained locking protocol. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 27 / 61
  • 43. Scheduling - Asynchronous Execution (5/5) Changes made to the vertex/edge data during the apply and scatter functions are immediately committed to the graph. • Visible to subsequent computation on neighboring vertices. Serializability: prevents adjacent vertex-programs from running con- currently using a fine-grained locking protocol. • Dining philosophers problem, where each vertex is a philosopher, and each edge is a fork. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 27 / 61
  • 44. Scheduling - Asynchronous Execution (5/5) Changes made to the vertex/edge data during the apply and scatter functions are immediately committed to the graph. • Visible to subsequent computation on neighboring vertices. Serializability: prevents adjacent vertex-programs from running con- currently using a fine-grained locking protocol. • Dining philosophers problem, where each vertex is a philosopher, and each edge is a fork. • GraphLab implements Dijkstras solution, where forks are acquired sequentially according to a total ordering. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 27 / 61
  • 45. Scheduling - Asynchronous Execution (5/5) Changes made to the vertex/edge data during the apply and scatter functions are immediately committed to the graph. • Visible to subsequent computation on neighboring vertices. Serializability: prevents adjacent vertex-programs from running con- currently using a fine-grained locking protocol. • Dining philosophers problem, where each vertex is a philosopher, and each edge is a fork. • GraphLab implements Dijkstras solution, where forks are acquired sequentially according to a total ordering. • PowerGraph implements Chandy-Misra solution, which acquires all forks simultaneously. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 27 / 61
  • 46. Delta Caching (1/2) Changes in a few of its neighbors → triggering a vertex-program The gather operation is invoked on all neighbors: wasting compu- tation cycles Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 28 / 61
  • 47. Delta Caching (1/2) Changes in a few of its neighbors → triggering a vertex-program The gather operation is invoked on all neighbors: wasting compu- tation cycles Maintaining a cache of the accumulator av from the previous gather phase for each vertex. The scatter can return an additional ∆a, which is added to the cached accumulator av. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 28 / 61
  • 48. Delta Caching (2/2) Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 29 / 61
  • 49. Delta Caching (2/2) Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 29 / 61
  • 50. Example: PageRank (Delta-Caching) PowerGraph_PageRank(i): Gather(j -> i): return wji * R[j] sum(a, b): return a + b // total: Gather and sum Apply(i, total): new = 0.15 + total R[i].delta = new - R[i] R[i] = new Scatter(i -> j): if R[i] changed then activate(j) return wij * R[i].delta R[i] = 0.15 + j∈Nbrs(i) wjiR[j] Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 30 / 61
  • 51. Graph Partitioning Vertex-cut partitioning. Evenly assign edges to machines. • Minimize machines spanned by each vertex. Two proposed solutions: • Random edge placement. • Greedy edge placement. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 31 / 61
  • 52. Random Vertex-Cuts Randomly assign edges to machines. Completely parallel and easy to distribute. High replication factor. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 32 / 61
  • 53. Greedy Vertex-Cuts (1/2) A(v): set of machines that contain adjacent edges of v. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 33 / 61
  • 54. Greedy Vertex-Cuts (1/2) A(v): set of machines that contain adjacent edges of v. Case 1: If A(u) and A(v) intersect, then the edge should be assigned to a machine in the intersection. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 33 / 61
  • 55. Greedy Vertex-Cuts (1/2) A(v): set of machines that contain adjacent edges of v. Case 1: If A(u) and A(v) intersect, then the edge should be assigned to a machine in the intersection. Case 2: If A(u) and A(v) are not empty and do not intersect, then the edge should be assigned to one of the machines from the vertex with the most unassigned edges. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 33 / 61
  • 56. Greedy Vertex-Cuts (1/2) A(v): set of machines that contain adjacent edges of v. Case 1: If A(u) and A(v) intersect, then the edge should be assigned to a machine in the intersection. Case 2: If A(u) and A(v) are not empty and do not intersect, then the edge should be assigned to one of the machines from the vertex with the most unassigned edges. Case 3: If only one of the two vertices has been assigned, then choose a machine from the assigned vertex. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 33 / 61
  • 57. Greedy Vertex-Cuts (1/2) A(v): set of machines that contain adjacent edges of v. Case 1: If A(u) and A(v) intersect, then the edge should be assigned to a machine in the intersection. Case 2: If A(u) and A(v) are not empty and do not intersect, then the edge should be assigned to one of the machines from the vertex with the most unassigned edges. Case 3: If only one of the two vertices has been assigned, then choose a machine from the assigned vertex. Case 4: If neither vertex has been assigned, then assign the edge to the least loaded machine. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 33 / 61
  • 58. Greedy Vertex-Cuts (2/2) Coordinated edge placement: • Requires coordination to place each edge • Slower, but higher quality cuts Oblivious edge placement: • Approx. greedy objective without coordination • Faster, but lower quality cuts Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 34 / 61
  • 59. PowerGraph Summary Gather-Apply-Scatter programming model Synchronous and Asynchronous models Vertex-cut graph partitioning Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 35 / 61
  • 60. Any limitations? Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 36 / 61
  • 61. Data-Parallel vs. Graph-Parallel Computation Graph-parallel computation: restricting the types of computation to achieve performance. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 37 / 61
  • 62. Data-Parallel vs. Graph-Parallel Computation Graph-parallel computation: restricting the types of computation to achieve performance. But, the same restrictions make it difficult and inefficient to express many stages in a typical graph-analytics pipeline. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 37 / 61
  • 63. Data-Parallel and Graph-Parallel Pipeline Moving between table and graph views of the same physical data. Inefficient: extensive data movement and duplication across the net- work and file system. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 38 / 61
  • 64. GraphX vs. Data-Parallel/Graph-Parallel Systems Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 39 / 61
  • 65. GraphX vs. Data-Parallel/Graph-Parallel Systems Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 39 / 61
  • 66. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 40 / 61
  • 67. GraphX New API that blurs the distinction between Tables and Graphs. New system that unifies Data-Parallel and Graph-Parallel systems. It is implemented on top of Spark. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 41 / 61
  • 68. Unifying Data-Parallel and Graph-Parallel Analytics Tables and Graphs are composable views of the same physical data. Each view has its own operators that exploit the semantics of the view to achieve efficient execution. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 42 / 61
  • 69. Data Model Property Graph: represented using two Spark RDDs: • Edge collection: VertexRDD • Vertex collection: EdgeRDD // VD: the type of the vertex attribute // ED: the type of the edge attribute class Graph[VD, ED] { val vertices: VertexRDD[VD] val edges: EdgeRDD[ED] } Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 43 / 61
  • 70. Primitive Data Types // Vertex collection class VertexRDD[VD] extends RDD[(VertexId, VD)] // Edge collection class EdgeRDD[ED] extends RDD[Edge[ED]] case class Edge[ED](srcId: VertexId = 0, dstId: VertexId = 0, attr: ED = null.asInstanceOf[ED]) // Edge Triple class EdgeTriplet[VD, ED] extends Edge[ED] EdgeTriplet represents an edge along with the vertex attributes of its neighboring vertices. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 44 / 61
  • 71. Example (1/3) Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 45 / 61
  • 72. Example (2/3) val sc: SparkContext // Create an RDD for the vertices val users: VertexRDD[(String, String)] = sc.parallelize( Array((3L, ("rxin", "student")), (7L, ("jgonzal", "postdoc")), (5L, ("franklin", "prof")), (2L, ("istoica", "prof")))) // Create an RDD for edges val relationships: EdgeRDD[String] = sc.parallelize( Array(Edge(3L, 7L, "collab"), Edge(5L, 3L, "advisor"), Edge(2L, 5L, "colleague"), Edge(5L, 7L, "pi"))) // Define a default user in case there are relationship with missing user val defaultUser = ("John Doe", "Missing") // Build the initial Graph val userGraph: Graph[(String, String), String] = Graph(users, relationships, defaultUser) Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 46 / 61
  • 73. Example (3/3) // Constructed from above val userGraph: Graph[(String, String), String] // Count all users which are postdocs userGraph.vertices.filter((id, (name, pos)) => pos == "postdoc").count // Count all the edges where src > dst userGraph.edges.filter(e => e.srcId > e.dstId).count // Use the triplets view to create an RDD of facts val facts: RDD[String] = graph.triplets.map(triplet => triplet.srcAttr._1 + " is the " + triplet.attr + " of " + triplet.dstAttr._1) // Remove missing vertices as well as the edges to connected to them val validGraph = graph.subgraph(vpred = (id, attr) => attr._2 != "Missing") facts.collect.foreach(println(_)) Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 47 / 61
  • 74. Property Operators (1/2) class Graph[VD, ED] { def mapVertices[VD2](map: (VertexId, VD) => VD2): Graph[VD2, ED] def mapEdges[ED2](map: Edge[ED] => ED2): Graph[VD, ED2] def mapTriplets[ED2](map: EdgeTriplet[VD, ED] => ED2): Graph[VD, ED2] } They yield new graphs with the vertex or edge properties modified by the map function. The graph structure is unaffected. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 48 / 61
  • 75. Property Operators (2/2) val newGraph = graph.mapVertices((id, attr) => mapUdf(id, attr)) val newVertices = graph.vertices.map((id, attr) => (id, mapUdf(id, attr))) val newGraph = Graph(newVertices, graph.edges) Both are logically equivalent, but the second one does not preserve the structural indices and would not benefit from the GraphX system optimizations. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 49 / 61
  • 76. Map Reduce Triplets Map-Reduce for each vertex Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 50 / 61
  • 77. Map Reduce Triplets Map-Reduce for each vertex // what is the age of the oldest follower for each user? val oldestFollowerAge = graph.mapReduceTriplets( e => (e.dstAttr, e.srcAttr), // Map (a, b) => max(a, b) // Reduce ).vertices Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 50 / 61
  • 78. Structural Operators class Graph[VD, ED] { // returns a new graph with all the edge directions reversed def reverse: Graph[VD, ED] // returns the graph containing only the vertices and edges that satisfy // the vertex predicate def subgraph(epred: EdgeTriplet[VD,ED] => Boolean, vpred: (VertexId, VD) => Boolean): Graph[VD, ED] // a subgraph by returning a graph that contains the vertices and edges // that are also found in the input graph def mask[VD2, ED2](other: Graph[VD2, ED2]): Graph[VD, ED] } Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 51 / 61
  • 79. Structural Operators Example // Build the initial Graph val graph = Graph(users, relationships, defaultUser) // Run Connected Components val ccGraph = graph.connectedComponents() // Remove missing vertices as well as the edges to connected to them val validGraph = graph.subgraph(vpred = (id, attr) => attr._2 != "Missing") // Restrict the answer to the valid subgraph val validCCGraph = ccGraph.mask(validGraph) Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 52 / 61
  • 80. Join Operators To join data from external collections (RDDs) with graphs. class Graph[VD, ED] { // joins the vertices with the input RDD and returns a new graph // by applying the map function to the result of the joined vertices def joinVertices[U](table: RDD[(VertexId, U)]) (map: (VertexId, VD, U) => VD): Graph[VD, ED] // similarly to joinVertices, but the map function is applied to // all vertices and can change the vertex property type def outerJoinVertices[U, VD2](table: RDD[(VertexId, U)]) (map: (VertexId, VD, Option[U]) => VD2): Graph[VD2, ED] } Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 53 / 61
  • 81. Graph Builders // load a graph from a list of edges on disk object GraphLoader { def edgeListFile( sc: SparkContext, path: String, canonicalOrientation: Boolean = false, minEdgePartitions: Int = 1) : Graph[Int, Int] } // graph file # This is a comment 2 1 4 1 1 2 Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 54 / 61
  • 82. GraphX and Spark GraphX is implemented on top of Spark In-memory caching Lineage-based fault tolerance Programmable partitioning Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 55 / 61
  • 83. Distributed Graph Representation (1/2) Representing graphs using two RDDs: edge-collection and vertex- collection Vertex-cut partitioning (like PowerGraph) Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 56 / 61
  • 84. Distributed Graph Representation (2/2) Each vertex partition contains a bitmask and routing table. Routing table: a logical map from a vertex id to the set of edge partitions that contains adjacent edges. Bitmask: enables the set intersection and filtering. • Vertices bitmasks are updated after each operation (e.g., mapRe- duceTriplets). • Vertices hidden by the bitmask do not participate in the graph oper- ations. Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 57 / 61
  • 85. Summary Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 58 / 61
  • 86. Summary PowerGraph • GAS programming model • Vertex-cut partitioning GraphX • Unifying data-parallel and graph-parallel analytics • Vertex-cut partitioning Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 59 / 61
  • 87. Questions? Acknowledgements Some pictures were derived from the Spark web site (http://spark.apache.org/). Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 60 / 61
  • 88. Questions? Amir H. Payberah (Tehran Polytechnic) PowerGraph and GraphX 1393/9/10 61 / 61