András Németh, CRUNCH, Budapest, 20th October 2017
Scalable Distributed Graph Algorithms on Apache Spark
Why scalable graph algorithms?
© Lynx Analytics
Graphs are all around us …
• Citations
• Social graphs
• Internet
• Transportation networks
• Protein structure
• Money transfers
• Viral infection patterns
• Electronic circuits
• Telecommunication networks
• Knowledge representations (e.g. Google’s Knowledge Graph)
• Neural networks (artificial and natural)
… and they are full of hidden secrets
Looked at closely enough, graphs can:
• Predict churn based on embeddedness in the call graph
• Figure out demographics based on social relationships and communities
• Find fraudsters in a bank’s transaction network
• Help find influencers and design viral campaigns
• Identify which bus routes are unnecessary and which ones need more capacity
But they are large!
• Telco call graph: hundreds of millions of vertices and billions of edges
• Google Knowledge Graph: 70 billion edges
• Internet: tens of billions of vertices and hundreds of billions of edges
• Brain: a hundred billion vertices and a hundred trillion edges
Apache Spark – horizontal scaling to the rescue
What is Apache Spark?
Apache Spark is the world’s trendiest scalable distributed data processing engine.
• It takes care of the plumbing needed to run distributed algorithms on huge clusters:
  • breaking work down into tasks
  • scheduling tasks on workers
  • distributing input/output data and processing code
  • access to distributed file systems and standard file formats
  • error recovery
  • etc., etc.
• Elegant, high-level yet powerful API
  • Scala, Python and R
  • Higher-level API add-ons: SQL, machine learning, graph processing
But graph algorithms are hard to parallelize
• Distributed computation works by splitting input data into manageably sized partitions
• Graph algorithms are all about checking and modifying the state of neighboring vertices
• An ideal partitioning would not cut through edges
• Too bad that this is absolutely impossible for 99% of graphs
• Methods exist to minimize edge cuts, but even one cut edge implies information exchange among partitions, which is very expensive
The Pregel Model
Pregel model – definition
Based on Google’s “Pregel: A System for Large-Scale Graph Processing”, Pregel is an algorithmic framework to manage (if not solve) the above difficulties.
A Pregel algorithm is a repetition of the following steps:
1. Some vertex-local computation (also using messages received – see next point)
2. Sending messages to neighboring vertices
Pregel example – shortest paths from multiple sources
1. All vertices start with an initial path length estimate of infinity, except sources, which start with 0
2. Vertices send their current length estimate to all neighbors
3. All vertices update their estimate based on their current value and the values coming from neighbors
4. Iterate 2 and 3 until convergence, or for N iterations if we are only interested in paths of length at most N
If vertices remember which neighbor produced the minimum in step 3 above, then paths can be reconstructed.
Also easy to extend to cases with different edge “lengths” and initial “starting costs”.
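The four steps above can be sketched as a single-machine simulation in plain Scala, with an in-memory adjacency map standing in for the distributed data; the graph representation and all names here are illustrative, not from the talk:

```scala
object MultiSourceShortestPaths {
  type ID = Int

  def shortestPaths(
      neighbors: Map[ID, Seq[ID]],  // undirected graph: every edge listed both ways
      sources: Set[ID],
      maxIterations: Int = 100): Map[ID, Double] = {
    // Step 1: sources start at 0, everyone else at infinity.
    var state: Map[ID, Double] = neighbors.keys
      .map(v => v -> (if (sources(v)) 0.0 else Double.PositiveInfinity)).toMap
    var changed = true
    var iter = 0
    while (changed && iter < maxIterations) {
      // Step 2: every vertex sends (its estimate + 1) to all its neighbors.
      val messages: Map[ID, Seq[Double]] = neighbors.toSeq
        .flatMap { case (v, ns) => ns.map(n => n -> (state(v) + 1.0)) }
        .groupBy(_._1).mapValues(_.map(_._2)).toMap
      // Step 3: new estimate = min(old estimate, incoming messages).
      val next = state.map { case (v, d) =>
        v -> messages.getOrElse(v, Seq.empty).foldLeft(d)(math.min)
      }
      // Step 4: iterate until nothing changes.
      changed = next != state
      state = next
      iter += 1
    }
    state
  }
}
```

Unit edge lengths are assumed; supporting per-edge “lengths” only changes the `state(v) + 1.0` term in step 2.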
Pregel example – PageRank
1. All vertices start with an initial PageRank estimate (say 1 for all)
2. All vertices send their current PageRank estimate to their out-neighbors
3. Based on the incoming PageRank estimates, all vertices recompute their own PageRank estimate
4. Repeat 2 and 3 until convergence – or until we get bored
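The same iteration can be sketched on a single machine in plain Scala; the damping factor 0.85 and all names are illustrative defaults, not taken from the talk:

```scala
object SimplePageRank {
  type ID = Int

  def pagerank(
      outNeighbors: Map[ID, Seq[ID]],
      iterations: Int,
      damping: Double = 0.85): Map[ID, Double] = {
    // Step 1: every vertex starts with rank 1.
    var rank: Map[ID, Double] = outNeighbors.keys.map(_ -> 1.0).toMap
    for (_ <- 1 to iterations) {
      // Step 2: each vertex sends rank / out-degree to its out-neighbors.
      val contributions: Map[ID, Double] = outNeighbors.toSeq
        .flatMap { case (v, outs) => outs.map(n => n -> rank(v) / outs.size) }
        .groupBy(_._1).mapValues(_.map(_._2).sum).toMap
      // Step 3: recompute each rank from the incoming contributions.
      rank = rank.map { case (v, _) =>
        v -> ((1 - damping) + damping * contributions.getOrElse(v, 0.0))
      }
    }
    rank
  }
}
```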
Pregel on Spark

// Contains the actual (vertex id, vertex state) pairs.
var vertexStates: RDD[(ID, VertexState)] = … code to initialize vertex states …

while (… ! halting condition …) {
  // Returns an iterator of the (target vertex id, message) pairs sent by a given vertex.
  def messageGenerator(
      sourceId: ID,
      sourceState: VertexState,
      neighbors: Iterable[ID]): Iterator[(ID, Message)] = { … }
  val messages: RDD[(ID, Message)] = vertexStates.join(edgesBySource.groupByKey).flatMap {
    case (id, (state, neighbors)) => messageGenerator(id, state, neighbors)
  }

  // Returns the new state given the old state and the incoming messages.
  def newState(originalState: VertexState, messages: Iterable[Message]): VertexState = { … }
  vertexStates = vertexStates.join(messages.groupByKey).mapValues {
    case (originalState, messages) => newState(originalState, messages)
  }
}
Pregel on Spark
Conceptually it’s super easy to represent a Pregel algorithm as a Spark program. There are some details to watch out for, though:
• Lots of joins – they’d better be fast
• Partitioning has to be controlled closely
  • Same partitioning for states throughout the algorithm
  • That partitioning has to be “enough” for the number of messages, not just the number of states
• Potential hotspotting if a vertex generates or receives too many messages
Fast joins – sorted RDDs
• Built-in Spark join:
  • Repartition both datasets by the hash of the join keys
  • Move corresponding partition pairs to the same machine
  • Join a single partition by collecting its key–value pairs in a map
  • This is somewhat slow and memory intensive
• Merge joins:
  • much faster
  • constant memory overhead
  • Require both RDDs to be sorted by key within partitions
  • This is done via an RDD subclass, SortedRDD, developed at Lynx
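The core of such a merge join can be sketched in a few lines of plain Scala; this is an illustrative stand-in for what SortedRDD does within one partition pair, not the actual implementation. Keys are assumed unique on each side (as with vertex states), and both inputs already sorted by key:

```scala
// Walks both sorted sequences in one linear pass with constant extra
// memory -- no hash map of a whole partition is ever built.
def mergeJoin[K: Ordering, A, B](
    left: Seq[(K, A)], right: Seq[(K, B)]): Seq[(K, (A, B))] = {
  val ord = implicitly[Ordering[K]]
  val result = Seq.newBuilder[(K, (A, B))]
  var i = 0
  var j = 0
  while (i < left.size && j < right.size) {
    val c = ord.compare(left(i)._1, right(j)._1)
    if (c == 0) {
      // Matching keys: emit the joined pair and advance both sides.
      result += ((left(i)._1, (left(i)._2, right(j)._2)))
      i += 1; j += 1
    } else if (c < 0) i += 1  // left key is smaller: no match, skip it
    else j += 1               // right key is smaller: no match, skip it
  }
  result.result()
}
```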
Sorted joins – results (benchmark chart)
Hotspots – what & why
• Hotspotting means that the partitioning of the work fails
• It causes serious performance hits even if the total amount of work is manageable
• Large partitions can even cause OOM errors
• Large-degree vertices are notorious for causing hotspots in graph algorithms
• A very typical problem with large, scale-free (in other words, realistic) graphs
Hotspots – how to deal with them?
Partition work based on edges, not vertices!
E.g. instead of using our original message generator on all vertices:
def messageGenerator(sourceId: ID,
                     sourceState: VertexState,
                     neighbors: Iterable[ID])
use something like this on all edges:
def messageGenerator(sourceId: ID,
                     destinationId: ID,
                     sourceState: VertexState)
This way we never have to collect all edges of a single vertex!
A similar trick can be applied on the destination side:
• Incoming messages can be pre-aggregated
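The edge-based variant can be sketched in plain Scala; the flat in-memory representation and the distance-propagation message in the test are illustrative:

```scala
// Each edge is processed independently, so a high-degree vertex's edges can
// live in different partitions and its neighbor list never has to be
// collected in one place.
def edgeMessages[S, M](
    edges: Seq[(Int, Int)],               // (src, dst) pairs
    states: Map[Int, S],                  // vertex id -> vertex state
    messageGenerator: (Int, Int, S) => M  // per-edge message generator
): Seq[(Int, M)] =
  edges.map { case (src, dst) => dst -> messageGenerator(src, dst, states(src)) }
```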
Hotspots – join problems
How exactly do you collect, say, source states on all edges? Easy!
val edges: RDD[(ID, ID)] // Edges represented as (src, dst) ids.
val edgesWithStates: RDD[(ID, ID, VertexState)] =
  edges.groupByKey().join(vertexStates).flatMap {
    case (src, (dsts, vertexState)) => dsts.map(dst => (src, dst, vertexState))
  }
Wait a second! That groupByKey in itself can create a hotspot!
This does exactly what we pledged not to do: it collects all edges of a vertex into a single partition…
Hybrid lookup – the task
The technique we use to solve this problem is what we call a hybrid lookup.
Problem statement
We are given two RDDs over the same keyspace:
val hybrid: RDD[(K, V1)]
val lookupTable: RDD[(K, V2)]
In lookupTable we know that all keys are unique, but hybrid may contain the same key many, many times. The task is to look up all keys of hybrid in lookupTable and return:
val result: RDD[(K, (V1, V2))]
Hybrid lookup – implementation
1. Split hybrid into two sets:
  • only the really large keys (hybridLarges)
  • the rest of the keys (hybridSmalls)
2. For the small keys use a standard, join-based lookup (this includes repartitioning hybridSmalls by key)
3. Send the lookup values for all large keys to all partitions of hybridLarges and use that map to perform the lookup in place (no repartitioning of hybridLarges!)
4. Take the union of the results from 2 and 3 above
The use of hybrid joins and the techniques explained above resolved lots of performance instability and Spark crash issues in LynxKite.
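The four steps can be sketched on a single machine in plain Scala; the in-memory stand-ins for the RDDs and the degree threshold parameter are illustrative, not from the talk:

```scala
def hybridLookup[K, V1, V2](
    hybrid: Seq[(K, V1)],
    lookupTable: Map[K, V2],  // keys unique
    largeKeyThreshold: Int): Seq[(K, (V1, V2))] = {
  // 1. Split: keys occurring at least `largeKeyThreshold` times are "large".
  val counts = hybrid.groupBy(_._1).mapValues(_.size).toMap
  val (larges, smalls) =
    hybrid.partition { case (k, _) => counts(k) >= largeKeyThreshold }
  // 2. Small keys: an ordinary join (a map lookup stands in here for the
  //    repartition-and-join Spark would do).
  val smallResult = smalls.flatMap { case (k, v1) =>
    lookupTable.get(k).map(v2 => (k, (v1, v2))) }
  // 3. Large keys: "broadcast" the few large-key lookup values everywhere
  //    and look up in place, with no repartitioning of the large side.
  val largeValues: Map[K, V2] =
    lookupTable.filter { case (k, _) => counts.getOrElse(k, 0) >= largeKeyThreshold }
  val largeResult = larges.flatMap { case (k, v1) =>
    largeValues.get(k).map(v2 => (k, (v1, v2))) }
  // 4. Union of the two results.
  smallResult ++ largeResult
}
```

Broadcasting in step 3 is safe because only a handful of keys can be "large", so the broadcast map stays small no matter how skewed `hybrid` is.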
Monte Carlo for parallelization
Yet another Pregel-compatible algorithm – connected components
1. All vertices use their own ids as their starting state
2. Every vertex sends its current state to its neighbors
3. States are updated to the minimum of the current state and the received messages
4. Repeat 2 and 3 until convergence
Notice that on termination each node’s state will be the lowest id in its connected component. Exactly what we needed to differentiate components!
Great!
Or is it? We may have tons of iterations!
Randomness to the rescue – connected components, take 2
1. Let’s party! Each node organizes a party with probability ½. All neighbors are invited!
2. Non-organizers choose a party to attend (social pariahs start their own one-person party)
3. We create a new graph of the parties
4. We recurse on the new party graph until we run out of edges
This algorithm is expected to finish in O(log N) iterations.
(Based on an algorithm from “A Model of Computation for MapReduce” by Karloff et al.)
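The contraction can be sketched as a single-machine recursion in plain Scala; this is an illustrative sketch (fixed random seed, naive in-memory representation), not the LynxKite implementation:

```scala
object RandomContraction {
  import scala.util.Random

  // Returns a map from vertex id to a component label; two vertices get the
  // same label iff they are in the same connected component.
  def components(
      vertices: Set[Int],
      edges: Set[(Int, Int)],
      rnd: Random = new Random(0)): Map[Int, Int] = {
    // Out of edges: every remaining vertex is its own component.
    if (edges.isEmpty) return vertices.map(v => v -> v).toMap
    // 1. Each vertex organizes a party with probability 1/2.
    val organizes = vertices.map(v => v -> rnd.nextBoolean()).toMap
    val neighbors = edges.toSeq
      .flatMap { case (a, b) => Seq(a -> b, b -> a) }
      .groupBy(_._1).mapValues(_.map(_._2)).toMap
    // 2. Organizers attend their own party; others pick an organizing
    //    neighbor, or start a one-person party if none organizes.
    val party: Map[Int, Int] = vertices.map { v =>
      if (organizes(v)) v -> v
      else v -> neighbors.getOrElse(v, Seq.empty).find(organizes).getOrElse(v)
    }.toMap
    // 3. Build the contracted party graph, dropping self-loops.
    val partyVertices = party.values.toSet
    val partyEdges = edges
      .map { case (a, b) => (party(a), party(b)) }
      .filter { case (a, b) => a != b }
    // 4. Recurse on the party graph, then pull labels back to the vertices.
    val partyComponents = components(partyVertices, partyEdges, rnd)
    vertices.map(v => v -> partyComponents(party(v))).toMap
  }
}
```

Since a vertex's party is always itself or a neighbor, the contraction never merges distinct components, and a connected component stays connected until it shrinks to a single party.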
Small performance trick: switch to a single machine when the graph gets small.
Connected component search – runtimes (benchmark chart)
Thank you!
Weitere ähnliche Inhalte

Was ist angesagt?

Automatic cluster formation and assigning address for wireless sensor net
Automatic cluster formation and assigning address for wireless sensor netAutomatic cluster formation and assigning address for wireless sensor net
Automatic cluster formation and assigning address for wireless sensor net
IAEME Publication
 
Convergence of desynchronization primitives in wireless sensor networks a sto...
Convergence of desynchronization primitives in wireless sensor networks a sto...Convergence of desynchronization primitives in wireless sensor networks a sto...
Convergence of desynchronization primitives in wireless sensor networks a sto...
ieeeprojectsbangalore
 
Info mimi-hop-by-hop authentication-copy
Info mimi-hop-by-hop authentication-copyInfo mimi-hop-by-hop authentication-copy
Info mimi-hop-by-hop authentication-copy
Selva Raj
 
Info mimi-hop-by-hop authentication
Info mimi-hop-by-hop authenticationInfo mimi-hop-by-hop authentication
Info mimi-hop-by-hop authentication
Selva Raj
 
On the-optimal-number-of-smart-dust-particles
On the-optimal-number-of-smart-dust-particlesOn the-optimal-number-of-smart-dust-particles
On the-optimal-number-of-smart-dust-particles
Cemal Ardil
 
Canopy kmeans
Canopy kmeansCanopy kmeans
Canopy kmeans
nagwww
 

Was ist angesagt? (19)

Unit 3
Unit 3Unit 3
Unit 3
 
Unit 2
Unit  2Unit  2
Unit 2
 
Cryptography using probability
Cryptography using probabilityCryptography using probability
Cryptography using probability
 
Vsm lsi
Vsm lsiVsm lsi
Vsm lsi
 
Ach away cluster heads
Ach away cluster headsAch away cluster heads
Ach away cluster heads
 
Eryk_Kulikowski_a4
Eryk_Kulikowski_a4Eryk_Kulikowski_a4
Eryk_Kulikowski_a4
 
Automatic cluster formation and assigning address for wireless sensor net
Automatic cluster formation and assigning address for wireless sensor netAutomatic cluster formation and assigning address for wireless sensor net
Automatic cluster formation and assigning address for wireless sensor net
 
PREDICTION OF MALICIOUS OBJECTS IN COMPUTER NETWORK AND DEFENSE
PREDICTION OF MALICIOUS OBJECTS IN COMPUTER NETWORK AND DEFENSEPREDICTION OF MALICIOUS OBJECTS IN COMPUTER NETWORK AND DEFENSE
PREDICTION OF MALICIOUS OBJECTS IN COMPUTER NETWORK AND DEFENSE
 
Lab Seminar 2009 12 01 Message Drop Reduction And Movement
Lab Seminar 2009 12 01  Message Drop Reduction And MovementLab Seminar 2009 12 01  Message Drop Reduction And Movement
Lab Seminar 2009 12 01 Message Drop Reduction And Movement
 
Quantum cryptography for secured communication networks
Quantum cryptography for secured communication networksQuantum cryptography for secured communication networks
Quantum cryptography for secured communication networks
 
Convergence of desynchronization primitives in wireless sensor networks a sto...
Convergence of desynchronization primitives in wireless sensor networks a sto...Convergence of desynchronization primitives in wireless sensor networks a sto...
Convergence of desynchronization primitives in wireless sensor networks a sto...
 
Info mimi-hop-by-hop authentication-copy
Info mimi-hop-by-hop authentication-copyInfo mimi-hop-by-hop authentication-copy
Info mimi-hop-by-hop authentication-copy
 
Info mimi-hop-by-hop authentication
Info mimi-hop-by-hop authenticationInfo mimi-hop-by-hop authentication
Info mimi-hop-by-hop authentication
 
Ch11
Ch11Ch11
Ch11
 
Compression and information leakage of plaintext
Compression and information leakage of plaintextCompression and information leakage of plaintext
Compression and information leakage of plaintext
 
On the-optimal-number-of-smart-dust-particles
On the-optimal-number-of-smart-dust-particlesOn the-optimal-number-of-smart-dust-particles
On the-optimal-number-of-smart-dust-particles
 
Canopy kmeans
Canopy kmeansCanopy kmeans
Canopy kmeans
 
Canopy k-means using Hadoop
Canopy k-means using HadoopCanopy k-means using Hadoop
Canopy k-means using Hadoop
 
Digital signatures
Digital signaturesDigital signatures
Digital signatures
 

Ähnlich wie Scalable Distributed Graph Algorithms on Apache Spark

cis97003
cis97003cis97003
cis97003
perfj
 
Unit-3-Part-1 [Autosaved].ppt
Unit-3-Part-1 [Autosaved].pptUnit-3-Part-1 [Autosaved].ppt
Unit-3-Part-1 [Autosaved].ppt
Ramya Nellutla
 
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
ijceronline
 
cis97007
cis97007cis97007
cis97007
perfj
 

Ähnlich wie Scalable Distributed Graph Algorithms on Apache Spark (20)

cis97003
cis97003cis97003
cis97003
 
Description Of A Graph
Description Of A GraphDescription Of A Graph
Description Of A Graph
 
Mathematics Research Paper - Mathematics of Computer Networking - Final Draft
Mathematics Research Paper - Mathematics of Computer Networking - Final DraftMathematics Research Paper - Mathematics of Computer Networking - Final Draft
Mathematics Research Paper - Mathematics of Computer Networking - Final Draft
 
DCCN Network Layer congestion control TCP
DCCN Network Layer congestion control TCPDCCN Network Layer congestion control TCP
DCCN Network Layer congestion control TCP
 
Unit-3-Part-1 [Autosaved].ppt
Unit-3-Part-1 [Autosaved].pptUnit-3-Part-1 [Autosaved].ppt
Unit-3-Part-1 [Autosaved].ppt
 
Ijebea14 272
Ijebea14 272Ijebea14 272
Ijebea14 272
 
Query optimization for_sensor_networks
Query optimization for_sensor_networksQuery optimization for_sensor_networks
Query optimization for_sensor_networks
 
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
 
Social Network Analysis and Visualization
Social Network Analysis and VisualizationSocial Network Analysis and Visualization
Social Network Analysis and Visualization
 
Fakhre alam
Fakhre alamFakhre alam
Fakhre alam
 
Implementation of hybrid data collection (mobile element and hierarchical clu...
Implementation of hybrid data collection (mobile element and hierarchical clu...Implementation of hybrid data collection (mobile element and hierarchical clu...
Implementation of hybrid data collection (mobile element and hierarchical clu...
 
A Survey on Rendezvous Based Techniques for Power Conservation in Wireless Se...
A Survey on Rendezvous Based Techniques for Power Conservation in Wireless Se...A Survey on Rendezvous Based Techniques for Power Conservation in Wireless Se...
A Survey on Rendezvous Based Techniques for Power Conservation in Wireless Se...
 
A Survey on Rendezvous Based Techniques for Power Conservation in Wireless Se...
A Survey on Rendezvous Based Techniques for Power Conservation in Wireless Se...A Survey on Rendezvous Based Techniques for Power Conservation in Wireless Se...
A Survey on Rendezvous Based Techniques for Power Conservation in Wireless Se...
 
cis97007
cis97007cis97007
cis97007
 
Implementing a Distributed Hash Table with Scala and Akka
Implementing a Distributed Hash Table with Scala and AkkaImplementing a Distributed Hash Table with Scala and Akka
Implementing a Distributed Hash Table with Scala and Akka
 
A Parallel Algorithm Template for Updating Single-Source Shortest Paths in La...
A Parallel Algorithm Template for Updating Single-Source Shortest Paths in La...A Parallel Algorithm Template for Updating Single-Source Shortest Paths in La...
A Parallel Algorithm Template for Updating Single-Source Shortest Paths in La...
 
Implementation of Spanning Tree Protocol using ns-3
Implementation of Spanning Tree Protocol using ns-3Implementation of Spanning Tree Protocol using ns-3
Implementation of Spanning Tree Protocol using ns-3
 
Network layer new
Network layer newNetwork layer new
Network layer new
 
Comparison of BFS and Prim's Algorithm when used in MANETs Routing
Comparison of BFS and Prim's Algorithm when used in MANETs RoutingComparison of BFS and Prim's Algorithm when used in MANETs Routing
Comparison of BFS and Prim's Algorithm when used in MANETs Routing
 
Network Layer
Network LayerNetwork Layer
Network Layer
 

Kürzlich hochgeladen

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 

Kürzlich hochgeladen (20)

Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 

Scalable Distributed Graph Algorithms on Apache Spark

  • 1. András Németh, CRUNCH, Budapest, 20th October, 2017 Scalable Distributed Graph Algorithms on Apache Spark
  • 2. Why scalable graph algorithms?
  • 3. © Lynx Analytics Graphs are all around us … Citations 3 Social graphs Internet Transportatio n networks Protein structure Money transfers Viral infection patterns Electronic circuits Telecommunication networks Knowledge representations (i.e. Google’s Knowledge Graph) Neural networks (artificial and natural)
  • 4. © Lynx Analytics … and they are full of hidden secrets 4 Looking close enough, they can: • Predict churn based on embeddedness in the call graph • Figure out demographic based on social relationships and communities • Find fraudsters in a bank’s transaction network • Help find influencers and design viral campaigns • Identify which bus routes are unnecessary and which ones need more capacity
  • 5. © Lynx Analytics But they are large! 5 Telco call graph hundreds of millions of vertices and billions of edges Google Knowledge Graph 70 billion edges Internet tens of billions of vertices and hundreds of billions of edges Brain hundred billion vertices and hundred trillion edges
  • 6. Apache Spark – horizontal scaling to the rescue
  • 7. © Lynx Analytics What is Apache Spark? 7 Apache Spark is the world’s most trendy scalable distributed data processing engine. • It takes care of the plumbing to run distributed algorithms on huge cluters • break down work to tasks • scheduling of tasks on workers • distribution of input/output data and processing code • distributed FS and standard file format access • error recovery • etc, etc • Elegant, high level yet powerful API • Scala, Python and R • Higher level API add ons: SQL, machine learning, graph processing
  • 8. © Lynx Analytics But graph algorithms are hard to parallelize 8 • Distributed computation works by splitting input data into manageable sized partitions • Graph algorithms are all about checking and modifying state of neighborings • Ideal partitioning would not cut through edges • Too bad that this is absolutely impossible for 99% of graph • Methods exists to minimizes edge cuts, but even one cut edge implies information exchange among partitions, which is very expensive
  • 10. © Lynx Analytics Pregel model - definition 10 Based on Google’s “Pregel: A System for Large-Scale Graph Processing”, Pregel is an algorthmic framework to manage (if not solve) the above difficulties. A Pregel algorithm is a repetition of the following steps: 1. Some vertex local computation (using also messages received – see next point) 2. Sending messages to neighboring vertices
  • 11. © Lynx Analytics Pregel example – shortest paths from multiple sources 11 1. All vertices start with an initial path length estimate of infinity, except sources start with 0 2. Vertices send their current length estimate to all neighbors 3. All vertices update their estimate based on their current value and values coming from neighbors 4. Iterate 2 and 3 until convergence or after N iterations if we are only interested in paths of length at most N If vertices remember which neighbor produced the minimum in step 3 above then paths can be reconstructed. Also easy to extend to cases with different edge “lengths” and initial “starting costs”.
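The iteration above can be sketched on a single machine in plain Scala. This is only an illustration of the message-passing steps — the graph, the unit edge lengths and all names here are made up; in LynxKite the same logic runs on distributed RDDs:

```scala
// Pregel-style multi-source shortest paths, simulated with in-memory maps.
def shortestPaths(
    edges: Seq[(Int, Int)],  // directed edges (src, dst), all of length 1
    sources: Set[Int],
    maxIterations: Int): Map[Int, Double] = {
  val vertices = edges.flatMap { case (s, d) => Seq(s, d) }.toSet ++ sources
  // Step 1: sources start at 0, everyone else at infinity.
  var state: Map[Int, Double] =
    vertices.map(v => v -> (if (sources(v)) 0.0 else Double.PositiveInfinity)).toMap
  for (_ <- 1 to maxIterations) {
    // Step 2: every vertex sends (current estimate + edge length) to its neighbors.
    val messages: Seq[(Int, Double)] =
      edges.map { case (src, dst) => dst -> (state(src) + 1.0) }
    // Step 3: every vertex keeps the minimum of its estimate and the incoming ones.
    val incomingMin: Map[Int, Double] =
      messages.groupBy(_._1).map { case (v, ms) => v -> ms.map(_._2).min }
    state = state.map { case (v, d) => v -> math.min(d, incomingMin.getOrElse(v, d)) }
  }
  state
}
```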
  • 18. © Lynx Analytics Pregel example - pagerank 18 1. All vertices start with an initial pagerank estimate (say, 1 for all) 2. All vertices send their current pagerank estimate to their out-neighbors 3. Based on the incoming pagerank estimates, all vertices recompute their pagerank estimate 4. Repeat 2 and 3 until convergence or until getting bored
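The same steps can be sketched in plain Scala on a single machine. Note that the 0.85 damping factor is an assumption of this sketch (the slide omits it); all names are illustrative:

```scala
// Pregel-style PageRank, simulated with in-memory maps.
def pagerank(edges: Seq[(Int, Int)], iterations: Int): Map[Int, Double] = {
  val vertices = edges.flatMap { case (s, d) => Seq(s, d) }.toSet
  val outDegree: Map[Int, Int] = edges.groupBy(_._1).map { case (v, es) => v -> es.size }
  // Step 1: every vertex starts with the same initial estimate.
  var rank: Map[Int, Double] = vertices.map(_ -> 1.0).toMap
  for (_ <- 1 to iterations) {
    // Step 2: send the current estimate, split evenly among the out-neighbors.
    val messages = edges.map { case (src, dst) => dst -> rank(src) / outDegree(src) }
    val incoming = messages.groupBy(_._1).map { case (v, ms) => v -> ms.map(_._2).sum }
    // Step 3: recompute the estimate from the incoming messages
    // (using the usual 0.85 damping, an assumption not on the slide).
    rank = vertices.map(v => v -> (0.15 + 0.85 * incoming.getOrElse(v, 0.0))).toMap
  }
  rank
}
```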
  • 19. © Lynx Analytics Pregel on Spark 19

// Contains the actual (vertex id, vertex state) pairs.
var vertexStates: RDD[(ID, VertexState)] = ...
// ... code to initialize vertex states ...
while (... ! halting condition ...) {
  // Returns an iterator of the (target vertex id, message) pairs sent by a given vertex.
  def messageGenerator(
      sourceId: ID,
      sourceState: VertexState,
      neighbors: Iterable[ID]): Iterator[(ID, Message)] = { ... }
  val messages: RDD[(ID, Message)] =
    vertexStates.join(edgesBySource.groupByKey).flatMap {
      case (id, (state, neighbors)) => messageGenerator(id, state, neighbors)
    }
  // Returns the new state given the old state and the incoming messages.
  def newState(originalState: VertexState, messages: Iterable[Message]): VertexState = { ... }
  vertexStates = vertexStates.join(messages.groupByKey).mapValues {
    case (originalState, messages) => newState(originalState, messages)
  }
}
  • 20. © Lynx Analytics Pregel on Spark 20 Conceptually it’s super easy to represent a Pregel algorithm as a Spark program. There are some details to watch out for, though: • Lots of joins – they’d better be fast • Partitioning has to be controlled closely • The same partitioning must be used for the states throughout the algorithm • That partitioning must be “enough” for the number of messages, not just the number of states • Potential hotspotting if a vertex generates or receives too many messages
  • 21. © Lynx Analytics Fast joins – sorted RDDs 21 • Built-in Spark join: • Repartition both datasets by the hash of the join keys • Move corresponding partition pairs to the same machine • Join a single partition pair by collecting its key-value pairs in a map • This is somewhat slow and memory intensive • Merge joins: • much faster • constant memory overhead • Require both RDDs to be sorted by key within partitions • This is done via SortedRDD, an RDD subclass developed at Lynx
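The merge-join idea can be illustrated on two plain sorted sequences (this is only a sketch of the principle, not the SortedRDD implementation; it also assumes unique keys on both sides):

```scala
// Linear-pass merge join of two key-sorted sequences with constant extra
// memory, instead of building a hash map of one side.
def mergeJoin[K, V1, V2](left: IndexedSeq[(K, V1)], right: IndexedSeq[(K, V2)])(
    implicit ord: Ordering[K]): Seq[(K, (V1, V2))] = {
  val result = Seq.newBuilder[(K, (V1, V2))]
  var i = 0
  var j = 0
  while (i < left.size && j < right.size) {
    val c = ord.compare(left(i)._1, right(j)._1)
    if (c < 0) i += 1        // left key too small, advance left
    else if (c > 0) j += 1   // right key too small, advance right
    else {                   // matching keys: emit the joined pair
      result += ((left(i)._1, (left(i)._2, right(j)._2)))
      i += 1; j += 1         // unique keys assumed on both sides
    }
  }
  result.result()
}
```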
  • 23. © Lynx Analytics Hotspots – what & why 23 • Hotspotting means that the partitioning of the work fails • It causes serious performance hits even if the total amount of work is manageable • Large partitions can even cause OOM errors • Large-degree vertices are notorious for causing hotspots in graph algorithms • A very typical problem with large, scale-free (in other words, realistic :) ) graphs
  • 24. © Lynx Analytics Hotspots – how to deal with them? 24 Partition work based on edges, not vertices! E.g. instead of using our original message generator: def messageGenerator(sourceId: ID, sourceState: VertexState, neighbors: Iterable[ID]) on all vertices, use something like this on all edges: def messageGenerator(sourceId: ID, destinationId: ID, sourceState: VertexState) This way we never have to collect all edges of a single vertex! Similar tricks can be applied to destination vertices: • Incoming messages can be pre-aggregated
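The edge-based variant with pre-aggregation can be sketched like this (a single-machine illustration with made-up names, not the LynxKite API). Messages are generated per edge and combined pairwise, reduceByKey style, so no step ever has to hold all edges or all messages of one high-degree vertex at once:

```scala
// Per-edge message generation with associative pre-aggregation of the
// incoming messages of each destination vertex.
def aggregateMessages[M](
    edges: Seq[(Int, Int)],
    state: Map[Int, M],
    messageForEdge: (Int, Int, M) => M,  // (src, dst, srcState) => message
    combine: (M, M) => M                 // associative, so it can be applied pairwise
): Map[Int, M] =
  edges
    .map { case (src, dst) => dst -> messageForEdge(src, dst, state(src)) }
    .groupBy(_._1)
    .map { case (dst, msgs) => dst -> msgs.map(_._2).reduce(combine) }
```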
  • 25. © Lynx Analytics Hotspots – join problems 25 How exactly do you collect, say, the source states on all edges? Easy!

val edges: RDD[(ID, ID)]  // Edges represented as (src, dst) ids.
val edgesWithStates: RDD[(ID, ID, VertexState)] =
  edges.groupByKey().join(vertexStates).flatMap {
    case (src, (dsts, vertexState)) =>
      dsts.map(dst => (src, dst, vertexState))
  }

Wait a second! That groupByKey in itself can create a hotspot! This does exactly what we pledged not to do: it collects all edges of a vertex into a single partition…
  • 26. © Lynx Analytics Hybrid lookup – the task 26 The technique we use to solve this problem is what we call a hybrid lookup. Problem statement We are given two RDDs, both with the same keyspace: val hybrid: RDD[(K, V1)] val lookupTable: RDD[(K, V2)] In lookupTable we know that all keys are unique, but hybrid might have the same key many times. The task is to look up in lookupTable all keys in hybrid and return: val result: RDD[(K, (V1, V2))]
  • 27. © Lynx Analytics Hybrid lookup – implementation 27 1. Split hybrid into two sets: • only the really large keys (hybridLarges) • the rest of the keys (hybridSmalls) 2. For the small keys use the standard, join-based lookup (this includes repartitioning hybridSmalls by key) 3. Send the lookup values for all large keys to all partitions of hybridLarges and use that map to perform the lookup (no repartitioning of hybridLarges!) 4. Take the union of the results from 2 and 3 above The use of hybrid lookups and the techniques explained above resolved lots of performance instability and Spark crash issues in LynxKite.
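The four steps above can be sketched on plain collections (the threshold value and all names are illustrative; in the real implementation both sides are RDDs and step 3 ships a small map to every partition):

```scala
// Single-machine sketch of a hybrid lookup.
def hybridLookup[K, V1, V2](
    hybrid: Seq[(K, V1)],
    lookupTable: Map[K, V2],   // stands in for the unique-keyed RDD
    largeKeyThreshold: Int): Seq[(K, (V1, V2))] = {
  val counts = hybrid.groupBy(_._1).map { case (k, vs) => k -> vs.size }
  // 1. Split the keys of `hybrid` by frequency.
  val largeKeys = counts.filter(_._2 > largeKeyThreshold).keySet
  // 2. Small keys: ordinary join-based lookup.
  val smalls = hybrid.filterNot { case (k, _) => largeKeys(k) }
  val smallResult = smalls.flatMap { case (k, v) => lookupTable.get(k).map(v2 => k -> (v, v2)) }
  // 3. Large keys: look them up via the (small) map of large-key values,
  //    which could be shipped to every partition as-is.
  val largeValues: Map[K, V2] = lookupTable.filter { case (k, _) => largeKeys(k) }
  val largeResult = hybrid.collect { case (k, v) if largeKeys(k) => k -> (v, largeValues(k)) }
  // 4. Union of the two results.
  smallResult ++ largeResult
}
```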
  • 28. Monte Carlo for parallelization
  • 29. © Lynx Analytics Yet another Pregel compatible algorithm – connected components 29 1. All vertices use their ids as their starting state 2. Every vertex sends its current state to its neighbors 3. States are updated to the minimum of the current state and the received messages 4. Repeat 2 and 3 until convergence Notice that on termination each node’s state will be the lowest id in its connected component. Exactly what we needed to differentiate components! Great!
  • 30. © Lynx Analytics Yet another Pregel compatible algorithm – connected components 30 (Same algorithm as on the previous slide.) Great! Or is it? We may have tons of iterations!
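A single-machine sketch of the min-propagation idea, treating edges as undirected (names are illustrative; the number of iterations it needs is the diameter of the graph, which is exactly the problem the slide points out):

```scala
// Connected components by propagating the minimum vertex id.
def connectedComponents(edges: Seq[(Int, Int)], vertices: Set[Int]): Map[Int, Int] = {
  val undirected = edges ++ edges.map { case (a, b) => (b, a) }
  // 1. Every vertex starts with its own id as its label.
  var label: Map[Int, Int] = vertices.map(v => v -> v).toMap
  var changed = true
  while (changed) {
    // 2-3. Take the minimum of the own label and the neighbors' labels.
    val incoming = undirected
      .map { case (src, dst) => dst -> label(src) }
      .groupBy(_._1)
      .map { case (v, ls) => v -> ls.map(_._2).min }
    val next = label.map { case (v, l) => v -> math.min(l, incoming.getOrElse(v, l)) }
    changed = next != label  // 4. Iterate until convergence.
    label = next
  }
  label
}
```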
  • 31. © Lynx Analytics Randomness to the rescue – connected components take 2 31 1. Let’s party! Each node organizes a party with ½ probability. All neighbors are invited! 2. Non-organizers choose a party to attend (social pariahs start their own one-person party) 3. We create a new graph of parties 4. We recurse on the new party graph until we run out of edges This algorithm is expected to finish in O(logN) iterations. (Based on an algorithm from "A Model of Computation for MapReduce" by Karloff et al.)
  • 32. © Lynx Analytics Randomness to the rescue – connected components take 2 32 (Same algorithm as on the previous slide.) Small performance trick: switch to a single machine when the graph gets small.
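The party-based contraction can be sketched in plain Scala (a single-machine illustration with made-up names; each recursion level would be one distributed round):

```scala
import scala.util.Random

// Randomized "party" contraction for connected components.
def partyComponents(
    vertices: Set[Int],
    edges: Seq[(Int, Int)],
    rnd: Random): Map[Int, Int] = {
  if (edges.isEmpty) return vertices.map(v => v -> v).toMap
  // 1. Every vertex organizes a party with probability 1/2.
  val organizer: Map[Int, Boolean] = vertices.map(v => v -> rnd.nextBoolean()).toMap
  val undirected = edges ++ edges.map { case (a, b) => (b, a) }
  // 2. Non-organizers join an organizing neighbor's party, if they have one;
  //    pariahs start their own one-person party.
  val party: Map[Int, Int] = vertices.map { v =>
    if (organizer(v)) v -> v
    else {
      val hosts = undirected.collect { case (`v`, u) if organizer(u) => u }
      v -> (if (hosts.nonEmpty) hosts.head else v)
    }
  }.toMap
  // 3. Build the graph of parties, dropping internal and duplicate edges.
  val partyEdges = edges
    .map { case (a, b) => (party(a), party(b)) }
    .filter { case (a, b) => a != b }
    .distinct
  // 4. Recurse on the contracted graph, then map the labels back.
  val sub = partyComponents(party.values.toSet, partyEdges, rnd)
  vertices.map(v => v -> sub(party(v))).toMap
}
```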