Scaling Apache Giraph
Nitay Joffe, Data Infrastructure Engineer
nitay@apache.org
@nitayj
September 10, 2013
Agenda
1 Background
2 Scaling
3 Results
4 Questions
Background
What is Giraph?
• Apache open source graph computation engine based on Google’s Pregel.
• Support for Hadoop, Hive, HBase, and Accumulo.
• BSP model with a simple “think like a vertex” API.
• Combiners, Aggregators, Mutability, and more.
• Configurable Graph<I,V,E,M>, where each type implements Writable:
– I: Vertex ID
– V: Vertex Value
– E: Edge Value
– M: Message data
What is Giraph NOT?
• A Graph database. See Neo4J.
• A completely asynchronous generic MPI system.
• A slow tool.
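As a rough illustration of what "implements Writable" means for the I, V, E, and M types, here is a minimal sketch. To stay self-contained it declares a local stand-in interface, WritableLike, that mirrors the two methods of Hadoop's Writable; RankValue and its fields are hypothetical and do not come from the talk.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Local stand-in mirroring the two methods of Hadoop's Writable interface,
// declared here only so the sketch compiles without Hadoop on the classpath.
interface WritableLike {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical vertex value (the V in Graph<I,V,E,M>): a rank plus a label id.
class RankValue implements WritableLike {
    double rank;
    int labelId;

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeDouble(rank);
        out.writeInt(labelId);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // Fields must be read back in the same order they were written.
        rank = in.readDouble();
        labelId = in.readInt();
    }

    // Round-trips a value through bytes, as happens when it is sent or stored.
    static RankValue roundTrip(RankValue value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            value.write(new DataOutputStream(bytes));
            RankValue copy = new RankValue();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In a real job the type would implement Hadoop's own interface instead of the stand-in.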
Why not Hive?
[Figure: MapReduce dataflow. Input 0 and Input 1 pass through the input format to map tasks, intermediate files, and reduce tasks, then through the output format to Output 0 and Output 1. Iterate!]
• Too much disk. Limited in-memory caching.
• Each iteration becomes a MapReduce job!
Giraph components
Master – Application coordinator
• Synchronizes supersteps
• Assigns partitions to workers before superstep begins
Workers – Computation & messaging
• Handle I/O – reading and writing the graph
• Computation/messaging of assigned partitions
ZooKeeper
• Maintains global application state
Giraph Dataflow
[Figure: three phases involving the Master, Worker 0, and Worker 1.]
1 Loading the graph: workers read input splits (Split 0, Split 1, ...) through the input format, then load / send the graph.
2 Compute/Iterate: workers hold the in-memory graph as partitions (Part 0 to Part 3), compute / send messages, then send stats to the Master and iterate.
3 Storing the graph: workers write their partitions (Part 0 to Part 3) through the output format.
Giraph Job Lifetime
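[Figure, Vertex Lifecycle: an Active vertex becomes Inactive on Vote to Halt; an Inactive vertex becomes Active on Received Message.]
[Figure, job flow: Input → Compute Superstep → All Vertices Halted? If No, run another superstep. If Yes → Master halted? If No, run another superstep. If Yes → Output.]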
Simple Example – Compute the maximum value
[Figure: vertex values (1, 2, and 5) held on Processor 1 and Processor 2, shown over time as the maximum value 5 spreads across the vertices.]
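The maximum-value example can be sketched as a small single-process simulation of supersteps. This is only an illustration of the vertex-centric model, not Giraph's API: the class and method names are hypothetical, and messages are delivered through in-memory lists rather than over the network.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MaxValueSketch {
    // Runs synchronous supersteps until every vertex has voted to halt.
    // initial[i] is vertex i's starting value; neighbors[i] lists the vertices i sends to.
    static int[] propagateMax(int[] initial, int[][] neighbors) {
        int n = initial.length;
        int[] values = initial.clone();
        boolean[] active = new boolean[n];
        Arrays.fill(active, true); // every vertex is active in the first superstep

        while (true) {
            // Compute phase: each active vertex sends its current value to its neighbors.
            List<List<Integer>> inbox = new ArrayList<>();
            for (int i = 0; i < n; i++) inbox.add(new ArrayList<>());
            boolean anyMessage = false;
            for (int v = 0; v < n; v++) {
                if (!active[v]) continue;
                for (int target : neighbors[v]) {
                    inbox.get(target).add(values[v]);
                    anyMessage = true;
                }
            }
            if (!anyMessage) return values; // nothing in flight: the job is done

            // Barrier, then delivery: a vertex stays (or becomes) active only if a
            // received message raised its value; otherwise it votes to halt.
            for (int v = 0; v < n; v++) {
                int best = values[v];
                for (int message : inbox.get(v)) best = Math.max(best, message);
                active[v] = best > values[v];
                values[v] = best;
            }
        }
    }
}
```

For a four-vertex chain with starting values 1, 2, 5, 1, every vertex ends up holding 5 after a few supersteps.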
Connected Components
e.g. Finding Communities
PageRank – ranking websites
• Mahout (Hadoop): 854 lines
• Giraph: < 30 lines
• Send neighbors an equal fraction of your page rank
• New page rank = 0.15 / (# of vertices) + 0.85 * (sum of messages)
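The two PageRank bullets translate almost directly into code. The sketch below is a plain in-memory version under stated assumptions: the class and method names are hypothetical, the graph is an adjacency array, and vertices with no outgoing edges simply send nothing. It is not the Giraph implementation the slide refers to.

```java
class PageRankSketch {
    // One vertex update, as stated on the slide:
    // new rank = 0.15 / (# of vertices) + 0.85 * (sum of incoming messages).
    static double newRank(double messageSum, long vertexCount) {
        return 0.15 / vertexCount + 0.85 * messageSum;
    }

    // One synchronous superstep. outEdges[v] lists the targets of v's outgoing edges.
    static double[] superstep(double[] ranks, int[][] outEdges) {
        int n = ranks.length;
        double[] messageSums = new double[n];
        for (int v = 0; v < n; v++) {
            if (outEdges[v].length == 0) continue; // assumption: dangling vertices send nothing
            // Send each neighbor an equal fraction of this vertex's rank.
            double share = ranks[v] / outEdges[v].length;
            for (int target : outEdges[v]) messageSums[target] += share;
        }
        double[] next = new double[n];
        for (int v = 0; v < n; v++) next[v] = newRank(messageSums[v], n);
        return next;
    }
}
```

On a three-vertex cycle with equal starting ranks of 1/3, one superstep leaves every rank at 1/3, which is a quick sanity check of the formula.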
Scaling
Problem: Worker Crash.
[Figure: checkpoint timeline.]
• Superstep i (no checkpoint) → Superstep i+1 (checkpoint) → Superstep i+2 (no checkpoint) → Worker failure!
• Restart: Superstep i+1 (checkpoint) → Superstep i+2 (no checkpoint) → Superstep i+3 (checkpoint) → Worker failure after checkpoint complete!
• Restart: Superstep i+3 (no checkpoint) → Application Complete…
Solution: Checkpointing.
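The restart rule implied by the checkpoint timeline can be sketched as follows: after a worker failure, resume from the most recent superstep whose checkpoint completed. The class, the modulo-based schedule, and the -1 "restart from input" convention are assumptions made for this illustration, not Giraph's actual implementation.

```java
import java.util.TreeSet;

class CheckpointSketch {
    private final TreeSet<Long> completed = new TreeSet<>();
    private final long frequency;

    CheckpointSketch(long frequency) {
        if (frequency <= 0) throw new IllegalArgumentException("frequency must be positive");
        this.frequency = frequency;
    }

    // Assumed schedule: checkpoint every `frequency` supersteps.
    boolean shouldCheckpoint(long superstep) {
        return superstep % frequency == 0;
    }

    // Recorded only once the checkpoint has been fully written.
    void checkpointCompleted(long superstep) {
        completed.add(superstep);
    }

    // Superstep to resume from after a failure, or -1 to reload from the original input.
    long restartFrom(long failedSuperstep) {
        Long latest = completed.floor(failedSuperstep);
        return latest == null ? -1 : latest;
    }
}
```

A failure that happens after a superstep's checkpoint has completed resumes at that same superstep, which matches the second failure in the timeline.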
Problem: Master Crash.
Solution: ZooKeeper Master Queue.
[Figure: before failure of active master 0, Master 0 is “Active”, Master 1 and Master 2 are “Spare”, and the active master state is kept in ZooKeeper. After failure of active master 0, Master 1 is “Active” and Master 2 remains “Spare”.]
Problem: Primitive Collections.
• Graphs often parameterized with { }
• Boxing/unboxing. Objects have internal overhead.
Solution: Use fastutil, e.g. Long2DoubleOpenHashMap.
fastutil extends the Java™ Collections Framework by providing type-specific maps, sets, lists and queues with a small memory footprint and fast access and insertion.
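To show why type-specific collections avoid boxing, here is a minimal standard-library sketch of a long-to-double open-addressing map. It is not fastutil's code or API; it only illustrates the idea behind classes such as Long2DoubleOpenHashMap, where keys and values live in primitive arrays rather than in Long and Double objects. All names in it are hypothetical.

```java
class LongDoubleMapSketch {
    // Parallel primitive arrays: no Long or Double objects, no per-entry node objects.
    private long[] keys = new long[16];
    private double[] values = new double[16];
    private boolean[] used = new boolean[16];
    private int size;

    void put(long key, double value) {
        if ((size + 1) * 2 > keys.length) grow(); // keep the table at most half full
        int slot = findSlot(keys, used, key);
        if (!used[slot]) {
            used[slot] = true;
            keys[slot] = key;
            size++;
        }
        values[slot] = value;
    }

    double getOrDefault(long key, double fallback) {
        int slot = findSlot(keys, used, key);
        return used[slot] ? values[slot] : fallback;
    }

    int size() {
        return size;
    }

    // Linear probing: returns the slot holding `key`, or the first free slot for it.
    private static int findSlot(long[] keys, boolean[] used, long key) {
        int mask = keys.length - 1; // capacity is always a power of two
        int slot = (int) mix(key) & mask;
        while (used[slot] && keys[slot] != key) slot = (slot + 1) & mask;
        return slot;
    }

    private static long mix(long key) {
        long h = key * 0x9E3779B97F4A7C15L;
        return h ^ (h >>> 32);
    }

    private void grow() {
        long[] oldKeys = keys;
        double[] oldValues = values;
        boolean[] oldUsed = used;
        keys = new long[oldKeys.length * 2];
        values = new double[keys.length];
        used = new boolean[keys.length];
        for (int i = 0; i < oldKeys.length; i++) {
            if (!oldUsed[i]) continue;
            int slot = findSlot(keys, used, oldKeys[i]);
            used[slot] = true;
            keys[slot] = oldKeys[i];
            values[slot] = oldValues[i];
        }
    }
}
```

A production job would use the fastutil class directly; the sketch omits removal, iteration, and tuning.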
[Figures: example graphs with primitive values. Single Source Shortest Path (vertices 1 to 5, weighted edges); Network Flow (vertices s and t, weighted edges); Count In-Degree (vertices 1 to 5).]
Problem: Too many objects.
Lots of time spent in GC.
Graph: 1B Vertices, 200B Edges, 200 Workers.
• 1B Edges per Worker. 1 object per edge value.
– List<Edge<I, E>> ~ 10B objects
• 5M Vertices per Worker. 10 objects per vertex value.
– Map<I, Vertex<I, V, E>> ~ 50M objects
• 1 Message per Edge. 10 objects per message data.
– Map<I, List<M>> ~ 10B objects
• Objects used ~= O(E*e + V*v + M*m) => O(E*e)
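The per-worker figures follow from simple division and multiplication, reproduced below as a worked check. The class and method names are hypothetical. The roughly 10B objects quoted for the edge list is taken from the slide as given and is not rederived here.

```java
class ObjectCountSketch {
    static final long VERTICES = 1_000_000_000L;  // 1B vertices
    static final long EDGES = 200_000_000_000L;   // 200B edges
    static final long WORKERS = 200L;

    // 200B edges / 200 workers = 1B edges per worker.
    static long edgesPerWorker() {
        return EDGES / WORKERS;
    }

    // 1B vertices / 200 workers = 5M vertices per worker.
    static long verticesPerWorker() {
        return VERTICES / WORKERS;
    }

    // 5M vertices * 10 objects per vertex value = 50M objects.
    static long vertexObjectsPerWorker(long objectsPerVertexValue) {
        return verticesPerWorker() * objectsPerVertexValue;
    }

    // One message per edge: 1B messages * 10 objects per message = 10B objects.
    static long messageObjectsPerWorker(long objectsPerMessage) {
        return edgesPerWorker() * objectsPerMessage;
    }
}
```

Because edges and messages outnumber vertices by a factor of 200 here, the edge and message terms dominate the total, which is the point of the final bullet.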
Label Propagation
e.g. Who’s sleeping?
[Figure: graph on vertices 1 to 5 with the labels Boring, Amazing, and Confusing, weighted edges, and the question “Q: What did he think?”]
Problem: Too many objects.
Lots of time spent in GC.
Solution: byte[]
• Serialize messages, edges, and vertices.
• Iterable interface with representative object.
[Figure: each next() call reads the next item from the serialized input into the same representative object.]
Objects per worker ~= O(V)
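The byte[] approach can be sketched like this: all edges of a vertex are serialized into one array, and iteration refills a single reusable object instead of allocating one per edge. EdgeView and ByteArrayEdges are hypothetical names for this illustration; Giraph's own classes differ.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical edge type: target vertex id plus a weight.
class EdgeView {
    long targetId;
    double weight;
}

// Stores all edges of one vertex in a single byte[] instead of one object per edge.
class ByteArrayEdges implements Iterable<EdgeView> {
    private static final int BYTES_PER_EDGE = Long.BYTES + Double.BYTES;
    private final byte[] data;

    ByteArrayEdges(long[] targets, double[] weights) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream(targets.length * BYTES_PER_EDGE);
        try {
            DataOutputStream out = new DataOutputStream(bytes);
            for (int i = 0; i < targets.length; i++) {
                out.writeLong(targets[i]);
                out.writeDouble(weights[i]);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        data = bytes.toByteArray();
    }

    int size() {
        return data.length / BYTES_PER_EDGE;
    }

    @Override
    public Iterator<EdgeView> iterator() {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        EdgeView representative = new EdgeView(); // the only edge object allocated per iteration
        return new Iterator<EdgeView>() {
            private int remaining = size();

            @Override
            public boolean hasNext() {
                return remaining > 0;
            }

            @Override
            public EdgeView next() {
                if (remaining == 0) throw new NoSuchElementException();
                try {
                    representative.targetId = in.readLong();
                    representative.weight = in.readDouble();
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
                remaining--;
                return representative; // same instance every call; callers must not retain it
            }
        };
    }
}
```

The trade-off is that callers must copy any edge they want to keep, since the representative object is overwritten on the next call.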
Problem: Serialization of byte[]
• DataInput? Kryo? Custom?
Solution: Unsafe
• Dangerous. No formal API. Volatile. Non-portable (Oracle JVM only).
• AWESOME. As fast as it gets.
• True native. Essentially C: *(long*)(data+offset);
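The C expression above reads eight bytes at an arbitrary offset as one native long. The sketch below shows the same access pattern using the supported VarHandle API (Java 9 and later) rather than sun.misc.Unsafe, which the slide names. This is a deliberate substitution: unlike Unsafe, VarHandle access is bounds-checked, so it illustrates the idea without claiming to match the talk's implementation or its performance. The class and method names are hypothetical.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

class NativeLongAccess {
    // Views a byte[] as longs in the platform's native byte order.
    private static final VarHandle LONGS =
        MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.nativeOrder());

    // Rough Java analogue of the C expression *(long*)(data + offset).
    static long readLong(byte[] data, int offset) {
        return (long) LONGS.get(data, offset);
    }

    // Writes all eight bytes of `value` starting at `offset`.
    static void writeLong(byte[] data, int offset, long value) {
        LONGS.set(data, offset, value);
    }
}
```

Plain get and set through this handle also work at offsets that are not multiples of eight, which matters when fields of different sizes are packed back to back.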
Problem: Large Aggregations.
Solution: Sharded Aggregators.
[Figures: five Workers and one Master at each step.]
• Workers own aggregators.
• Aggregator owners communicate with Master.
• Aggregator owners distribute values.
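The sharded-aggregator steps can be sketched as a single-process simulation: each aggregator is owned by one worker, owners reduce the partial values, and the reduced values are then collected. The hash-based ownership rule, the use of summation, and all names below are assumptions made for illustration; the slides do not specify how owners are chosen.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ShardedAggregatorsSketch {
    // Hypothetical ownership rule: hash the aggregator name onto a worker index.
    static int ownerOf(String aggregator, int workerCount) {
        return Math.floorMod(aggregator.hashCode(), workerCount);
    }

    // partials.get(w) holds worker w's local partial sums, keyed by aggregator name.
    // Returns, per worker, the final values of the aggregators that worker owns.
    static List<Map<String, Double>> reduceAtOwners(List<Map<String, Double>> partials) {
        int workers = partials.size();
        List<Map<String, Double>> owned = new ArrayList<>();
        for (int w = 0; w < workers; w++) owned.add(new HashMap<>());
        for (Map<String, Double> local : partials) {
            for (Map.Entry<String, Double> entry : local.entrySet()) {
                // Each partial goes to its owner, which folds it into the running sum.
                owned.get(ownerOf(entry.getKey(), workers))
                     .merge(entry.getKey(), entry.getValue(), Double::sum);
            }
        }
        return owned;
    }

    // What the master, and afterwards every worker, ends up seeing:
    // the union of the owners' final values.
    static Map<String, Double> collect(List<Map<String, Double>> owned) {
        Map<String, Double> all = new HashMap<>();
        for (Map<String, Double> shard : owned) all.putAll(shard);
        return all;
    }
}
```

The design point is that reduction work is spread across the owners, so no single node has to process every partial value.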
K-Means Clustering
e.g. Similar Emails
Problem: Network Wait.
• RPC doesn’t fit model.
• Synchronous calls no good.
Solution: Netty
Tune queue sizes & threads
[Figure: two superstep timelines between barriers, each showing begin superstep, compute, network, end compute, wait, and end superstep; the second also marks the time to first message.]
Results
Scalability Graphs
[Figure: Increasing Workers. Iteration time (sec) vs. workers (50 to 300). 2B Vertices, 200B Edges, 20 Compute Threads.]
[Figure: Increasing Data Size. Iteration time (sec) vs. edges (1E+09 to 1.01E+11). 50 Workers, 20 Compute Threads.]
Lessons Learned
• Coordinating is a zoo. Be resilient with ZooKeeper.
• Efficient networking is hard. Let Netty help.
• Primitive collections, primitive performance. Use fastutil.
• byte[] is simple yet powerful.
• Being Unsafe can be a good thing.
• Have a graph? Use Giraph.
What’s the final result?
Comparison with Hive:
• 20x CPU speedup
• 100x Elapsed time speedup. 15 hours => 9 minutes.
Computations on entire Facebook graph no longer “weekend jobs”.
Now they’re coffee breaks.
Questions?
Problem: Measurements.
• Need tools to gain visibility into the system.
• Problems with connecting to Hadoop sub-processes.
Solution: Do it all.
• YourKit – see YourKitProfiler
• jmap – see JMapHistoDumper
• VisualVM – with jstatd & SSH SOCKS proxy
• Yammer Metrics
• Hadoop Counters
• Logging & GC prints
Problem: Mutations
• Synchronization.
• Load balancing.
Solution: Reshuffle resources
• Mutations handled at barrier between supersteps.
• Master rebalances vertex assignments to optimize distribution.
• Handle mutations in batches.
• Avoid if using byte[].
• Favor algorithms which don’t mutate graph.
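Handling mutations in batches at the barrier can be sketched as follows: requests made during compute are only recorded, and the graph changes when the barrier is reached. The class and method names are hypothetical, and applying removals before additions is a choice made for this sketch rather than something the slides state.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class MutationBatchSketch {
    private final Map<Long, Set<Long>> adjacency = new HashMap<>();
    private final List<long[]> pendingAdds = new ArrayList<>();
    private final List<long[]> pendingRemoves = new ArrayList<>();

    // Called during compute: only records the request, the graph is unchanged.
    void requestAddEdge(long source, long target) {
        pendingAdds.add(new long[] {source, target});
    }

    void requestRemoveEdge(long source, long target) {
        pendingRemoves.add(new long[] {source, target});
    }

    // Called once at the barrier between supersteps: applies the whole batch.
    void applyAtBarrier() {
        for (long[] edge : pendingRemoves) {
            Set<Long> targets = adjacency.get(edge[0]);
            if (targets != null) targets.remove(edge[1]);
        }
        for (long[] edge : pendingAdds) {
            adjacency.computeIfAbsent(edge[0], k -> new HashSet<>()).add(edge[1]);
        }
        pendingRemoves.clear();
        pendingAdds.clear();
    }

    boolean hasEdge(long source, long target) {
        Set<Long> targets = adjacency.get(source);
        return targets != null && targets.contains(target);
    }
}
```

Because every vertex sees the same graph for the whole superstep, no locking is needed during compute.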

2013.09.10 Giraph at London Hadoop Users Group

  • 1. Scaling Apache Giraph. Nitay Joffe, Data Infrastructure Engineer, nitay@apache.org, @nitayj. September 10, 2013.
  • 2. Agenda: (1) Background, (2) Scaling, (3) Results, (4) Questions.
  • 4. What is Giraph?
      – Apache open source graph computation engine based on Google’s Pregel.
      – Support for Hadoop, Hive, HBase, and Accumulo.
      – BSP model with a simple “think like a vertex” API.
      – Combiners, Aggregators, Mutability, and more.
      – Configurable Graph<I,V,E,M>, where I is the vertex ID, V the vertex value, E the edge value, and M the message data (slide annotation: “implements Writable”).
    What is Giraph NOT?
      – A graph database. See Neo4J.
      – A completely asynchronous generic MPI system.
      – A slow tool.
  • 5. Why not Hive? [Diagram: Input 0 and Input 1 pass through the input format, map tasks, intermediate files, reduce tasks, and the output format to produce Output 0 and Output 1; then iterate.]
      – Too much disk. Limited in-memory caching.
      – Each iteration becomes a MapReduce job!
  • 6. Giraph components
      – Master (application coordinator): synchronizes supersteps; assigns partitions to workers before superstep begins.
      – Workers (computation & messaging): handle I/O, reading and writing the graph; computation/messaging of assigned partitions.
      – ZooKeeper: maintains global application state.
  • 7. Giraph Dataflow [diagram in three phases, with a master and Workers 0 and 1]
      – 1. Loading the graph: workers read the input splits through the input format, then load and send the graph.
      – 2. Compute/Iterate: workers compute and send messages over the in-memory graph (Part 0 to Part 3), send stats to the master, and iterate.
      – 3. Storing the graph: workers write Part 0 to Part 3 through the output format.
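  • 8. Giraph Job Lifetime [flowchart]: Input, then Compute Superstep. All vertices halted? If no, compute another superstep. If yes: master halted? If no, compute another superstep; if yes, Output.
    Vertex Lifecycle: an Active vertex becomes Inactive when it votes to halt, and becomes Active again when it receives a message.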
  • 9. Simple Example – Compute the maximum value. [Diagram: vertex values on Processor 1 and Processor 2 converge to the maximum, 5, over time.] Application example: Connected Components, e.g. finding communities.
  • 10. PageRank – ranking websites. Mahout (Hadoop): 854 lines. Giraph: < 30 lines.
      – Send neighbors an equal fraction of your page rank.
      – New page rank = 0.15 / (# of vertices) + 0.85 * (messages sum).
  • 12. Problem: Worker Crash. Solution: Checkpointing. [Timeline: superstep i (no checkpoint), superstep i+1 (checkpoint), superstep i+2 (no checkpoint), worker failure! Then superstep i+1 (checkpoint), superstep i+2 (no checkpoint), superstep i+3 (checkpoint), worker failure after checkpoint complete! Then superstep i+3 (no checkpoint), and so on until the application completes.]
  • 13. Problem: Master Crash. Solution: ZooKeeper Master Queue. [Diagram: before failure of active master 0, ZooKeeper holds the active master state for “Active” Master 0, with “Spare” Master 1 and “Spare” Master 2 waiting. After failure of active master 0, Master 1 is “Active” and Master 2 remains “Spare”.]
  • 14. Problem: Primitive Collections.
      – Graphs often parameterized with { }
      – Boxing/unboxing. Objects have internal overhead.
    Solution: Use fastutil, e.g. Long2DoubleOpenHashMap. fastutil extends the Java™ Collections Framework by providing type-specific maps, sets, lists and queues with a small memory footprint and fast access and insertion.
    Application examples [diagrams]: Single Source Shortest Path, Network Flow, Count In-Degree.
  • 15. Problem: Too many objects. Lots of time spent in GC. Graph: 1B Vertices, 200B Edges, 200 Workers.
      – 1B Edges per Worker. 1 object per edge value. List<Edge<I, E>> ~ 10B objects.
      – 5M Vertices per Worker. 10 objects per vertex value. Map<I, Vertex<I, V, E>> ~ 50M objects.
      – 1 Message per Edge. 10 objects per message data. Map<I, List<M>> ~ 10B objects.
      – Objects used ~= O(E*e + V*v + M*m) => O(E*e)
    Application example [diagram]: Label Propagation, e.g. “Who’s sleeping?”, with the labels Boring, Amazing, and Confusing and the question “What did he think?”
  • 16. Problem: Too many objects. Lots of time spent in GC. Solution: byte[]
      – Serialize messages, edges, and vertices.
      – Iterable interface with representative object. [Diagram: Input, next(), repeated three times.]
    Objects per worker ~= O(V)
  • 17. Problem: Serialization of byte[]
      – DataInput? Kryo? Custom?
    Solution: Unsafe
      – Dangerous. No formal API. Volatile. Non-portable (Oracle JVM only).
      – AWESOME. As fast as it gets.
      – True native. Essentially C: *(long*)(data+offset);
  • 18. Problem: Large Aggregations. Solution: Sharded Aggregators. [Diagram of five workers and a master in three steps: workers own aggregators; aggregator owners communicate with the master; aggregator owners distribute values.] Application example: K-Means Clustering, e.g. similar emails.
  • 19. Problem: Network Wait.
      – RPC doesn’t fit model.
      – Synchronous calls no good.
    Solution: Netty. Tune queue sizes & threads. [Two timelines between barriers. First: begin superstep, compute, end compute, network, wait, end superstep. Second: begin superstep, compute with network starting after the time to first message, end compute, wait, end superstep.]
  • 21. Scalability Graphs. [Plot “Increasing Workers”: iteration time (sec) against workers (50 to 300) for 2B vertices, 200B edges, 20 compute threads. Plot “Increasing Data Size”: iteration time (sec) against edges (1E+09 to 1.01E+11) for 50 workers, 20 compute threads.]
  • 22. Lessons Learned
      – Coordinating is a zoo. Be resilient with ZooKeeper.
      – Efficient networking is hard. Let Netty help.
      – Primitive collections, primitive performance. Use fastutil.
      – byte[] is simple yet powerful.
      – Being Unsafe can be a good thing.
      – Have a graph? Use Giraph.
  • 23. What’s the final result? Comparison with Hive:
      – 20x CPU speedup.
      – 100x elapsed time speedup: 15 hours => 9 minutes.
    Computations on entire Facebook graph no longer “weekend jobs”. Now they’re coffee breaks.
  • 25. Problem: Measurements.
      – Need tools to gain visibility into the system.
      – Problems with connecting to Hadoop sub-processes.
    Solution: Do it all.
      – YourKit – see YourKitProfiler
      – jmap – see JMapHistoDumper
      – VisualVM – with jstatd & ssh socks proxy
      – Yammer Metrics
      – Hadoop Counters
      – Logging & GC prints
  • 26. Problem: Mutations
      – Synchronization.
      – Load balancing.
    Solution: Reshuffle resources
      – Mutations handled at barrier between supersteps.
      – Master rebalances vertex assignments to optimize distribution.
      – Handle mutations in batches.
      – Avoid if using byte[].
      – Favor algorithms which don’t mutate graph.
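
The PageRank rule on slide 10 (send neighbors an equal fraction of your page rank, then set the new rank to 0.15 / (# of vertices) + 0.85 * (messages sum)) can be sketched in plain Java. This is only an illustration under stated assumptions: it does not use the Giraph API, and the class name PageRankSketch, the adjacency-array layout, and the three-vertex graph are all hypothetical.

```java
import java.util.Arrays;

/** Plain-Java sketch of the slide-10 update rule; not the Giraph API. */
final class PageRankSketch {

    /** Runs one superstep and returns the new rank of every vertex. */
    static double[] superstep(int[][] outEdges, double[] rank) {
        int n = rank.length;
        double[] messageSum = new double[n];
        // "Send neighbors an equal fraction of your page rank."
        for (int v = 0; v < n; v++) {
            for (int target : outEdges[v]) {
                messageSum[target] += rank[v] / outEdges[v].length;
            }
        }
        double[] next = new double[n];
        // "New page rank = 0.15 / (# of vertices) + 0.85 * (messages sum)."
        for (int v = 0; v < n; v++) {
            next[v] = 0.15 / n + 0.85 * messageSum[v];
        }
        return next;
    }

    public static void main(String[] args) {
        // Hypothetical graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0.
        int[][] outEdges = { {1, 2}, {2}, {0} };
        double[] rank = new double[3];
        Arrays.fill(rank, 1.0 / 3);
        for (int i = 0; i < 20; i++) {
            rank = superstep(outEdges, rank);
        }
        System.out.println(Arrays.toString(rank));
    }
}
```

In this sketch every vertex has at least one outgoing edge, so the ranks keep summing to 1 after each superstep. Vertices without outgoing edges would need separate handling, which the slide does not cover.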

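The maximum-value example on slide 9, together with the vertex lifecycle on slide 8 (vote to halt, wake up on a received message), can be sketched as a small superstep loop. This is a simplified plain-Java illustration, not the code checked into Giraph: the class name MaxValueSketch and the graph layout are hypothetical, and it assumes every vertex votes to halt at the end of each superstep. It also uses boxed Long messages for brevity, which is exactly what the primitive-collections slide advises against at scale.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java sketch of the max-value example; not the Giraph API. */
final class MaxValueSketch {

    /** Runs supersteps until no vertex is active; returns the number of supersteps. */
    static int run(int[][] outEdges, long[] value) {
        int n = value.length;
        List<List<Long>> inbox = emptyBoxes(n);
        int superstep = 0;
        while (true) {
            List<List<Long>> outbox = emptyBoxes(n);
            boolean anyMessageSent = false;
            for (int v = 0; v < n; v++) {
                // A halted vertex becomes active again only when it receives a message.
                boolean active = superstep == 0 || !inbox.get(v).isEmpty();
                if (!active) {
                    continue;
                }
                boolean changed = superstep == 0;
                for (long message : inbox.get(v)) {
                    if (message > value[v]) {
                        value[v] = message;
                        changed = true;
                    }
                }
                if (changed) {
                    for (int target : outEdges[v]) {
                        outbox.get(target).add(value[v]);
                        anyMessageSent = true;
                    }
                }
                // Sending nothing further here stands in for "vote to halt".
            }
            superstep++;
            if (!anyMessageSent) {
                return superstep; // all vertices halted, no messages in flight
            }
            inbox = outbox;
        }
    }

    private static List<List<Long>> emptyBoxes(int n) {
        List<List<Long>> boxes = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            boxes.add(new ArrayList<>());
        }
        return boxes;
    }
}
```

Calling `MaxValueSketch.run(new int[][] { {1}, {2}, {0} }, values)` on a three-vertex ring overwrites every entry of `values` with the largest initial value, because the ring is strongly connected.
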
Editor’s notes

  1. No internal FB repo. Everyone committer. A global epoch followed by a global barrier where components do concurrent computation and send messages. Graphs are sparse.
  2. Giraph is a map-only job
  3. Code is real, checked into Giraph. All vertices find the maximum value in a strongly connected graph.
  4. One active master, with spare masters taking over in the event of an active master failure. All active master state is stored in ZooKeeper so that a spare master can immediately step in when an active master fails. “Active” master implemented as a queue in ZooKeeper. A single worker failure causes the superstep to fail. Application reverts to the last committed superstep automatically. Master detects worker failure during any superstep with a ZooKeeper “health” znode. Master chooses the last committed superstep and sends a command through ZooKeeper for all workers to restart from that superstep.
  5. One active master, with spare masters taking over in the event of an active master failure. All active master state is stored in ZooKeeper so that a spare master can immediately step in when an active master fails. “Active” master implemented as a queue in ZooKeeper.
  6. Primitive collections are primitive. Lots of boxing / unboxing of types. Object and reference for each instance.
  7. Also other implementations like Map<I,E> for edges, which use more space but are better for lots of mutations. Realistically for FB-sized graphs, need even bigger. Edges are not uniform in reality; some vertices are much larger.
  8. Dangerous, non-portable, volatile. Oracle JVM only. No formal API. Allocate non-GC memory. Inherit from String (final class). Direct access memory (C pointer casts).
  9. Cluster open source projects. Histograms. Job metrics.
  10. Start sending messages early and send with computation. Tune message buffer sizes to reduce wait time.
  11. First things first: what’s going on with the system? Want debugger, but don’t have one. Use YourKit’s API to create granular snapshots within application. JMap binding errors – spawn from within process.
  12. With byte[] any mutation requires full deserialization / re-serialization.
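
The byte[] approach from slide 16 and the mutation cost described in note 12 can be illustrated with a small edge store. This is a minimal sketch under stated assumptions, not Giraph's actual classes: the name ByteArrayEdgesSketch, the fixed long/double edge layout, and the use of java.nio.ByteBuffer (rather than Unsafe) are choices made for this example only.

```java
import java.nio.ByteBuffer;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Plain-Java sketch: all edges of a vertex live in one byte[]. */
final class ByteArrayEdgesSketch implements Iterable<ByteArrayEdgesSketch.Edge> {

    /** Mutable representative edge, reused on every next() call. */
    static final class Edge {
        long targetId;
        double value;
    }

    private static final int EDGE_BYTES = Long.BYTES + Double.BYTES;
    private byte[] data = new byte[0];

    int size() {
        return data.length / EDGE_BYTES;
    }

    /** Appends one serialized edge by copying into a larger array. */
    void add(long targetId, double value) {
        ByteBuffer buf = ByteBuffer.allocate(data.length + EDGE_BYTES);
        buf.put(data).putLong(targetId).putDouble(value);
        data = buf.array();
    }

    /** Removal deserializes every edge and re-serializes the survivors. */
    void remove(long targetId) {
        ByteBuffer out = ByteBuffer.allocate(data.length);
        for (Edge e : this) {
            if (e.targetId != targetId) {
                out.putLong(e.targetId).putDouble(e.value);
            }
        }
        byte[] rewritten = new byte[out.position()];
        out.flip();
        out.get(rewritten);
        data = rewritten;
    }

    @Override
    public Iterator<Edge> iterator() {
        final ByteBuffer in = ByteBuffer.wrap(data);
        final Edge representative = new Edge();
        return new Iterator<Edge>() {
            @Override
            public boolean hasNext() {
                return in.hasRemaining();
            }

            @Override
            public Edge next() {
                if (!in.hasRemaining()) {
                    throw new NoSuchElementException();
                }
                // Overwrite the single representative object instead of allocating.
                representative.targetId = in.getLong();
                representative.value = in.getDouble();
                return representative;
            }
        };
    }
}
```

Iteration allocates one Edge per pass rather than one per edge, which is the point of the representative object. Callers must copy any field they want to keep before calling next() again, and remove() shows why mutation-heavy algorithms fit this layout poorly.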