Scalable Deep Learning in Baidu
Weide Zhang, Kyle Tsai, Jiang Wang
Baidu USDC
Background
• Spark
– Batch/streaming processing, Spark SQL, MLlib
• Deep learning has many use cases at Baidu and has shown significant quality improvements
– Image retrieval ranking
– Ads CTR prediction
– Machine translation
– Speech recognition
• Goal: make distributed deep learning usable from Spark
Typical Training Data in Baidu
• Image recognition: 100 million samples
• OCR: 100 million
• Speech: 10 billion
• CTR: 100 billion
• The volume has grown every year since 2012
Baidu Spark One
Paddle
• Parallel Asynchronous Distributed Deep Learning Library
– Supports distributed parameter servers with synchronous/asynchronous parameter updates
– Supports multi-GPU / CPU training
– Supports sparse models
– Easy-to-understand API for users to add new layers
– Rich features for deep learning use cases, especially NLP
Deep learning options comparison
|                            | Caffe  | TensorFlow          | Torch               | Paddle        |
|----------------------------|--------|---------------------|---------------------|---------------|
| Distributed Training       | Yes    | Yes                 | No                  | Yes           |
| Communication Cost         | Medium | High                | N/A                 | Medium to Low |
| Easy to customize and code | Yes    | More learning curve | More learning curve | Yes           |
| Sparse Model Support       | No     | Yes                 | Yes                 | Yes           |
| Area of Focus              | Vision | All                 | All                 | All           |
| Integration with Spark     | Yes    | No                  | No                  | Yes           |
High Level Goals
• Implement Spark ML abstractions so users can train deep learning models with minimal code change
• Leverage Paddle's native training and parameter-server mechanisms, scheduled as Spark deep learning jobs
• Handle multi-tenancy and heterogeneity
• Parallelize hyperparameter selection (see the sketch after this list)
• Support both batch and streaming learning
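To make the hyperparameter goal concrete, here is a minimal, hypothetical sketch of parallelizing a sweep with Spark; the learning-rate grid and train_and_score are illustrative stand-ins, not Baidu's actual tooling:

```python
# Hypothetical sketch: fan candidate hyperparameters out as an RDD and
# train/evaluate one configuration per Spark task in parallel.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hyperparam-sweep").getOrCreate()
sc = spark.sparkContext

candidate_lrs = [0.1, 0.01, 0.001]            # illustrative search space

def train_and_score(lr):
    # Stand-in for launching a full Paddle training job with this
    # configuration and returning its validation score.
    return lr, -(lr - 0.01) ** 2              # dummy score for the sketch

best_lr, best_score = (sc.parallelize(candidate_lrs)
                         .map(train_and_score)
                         .max(key=lambda t: t[1]))
print("best learning rate:", best_lr)
```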
Paddle on Spark
[Architecture diagram: each component's responsibility]
• Spark: training environment
• Yarn: resource management
• Parameter Server: training communication
• Paddle: deep learning algorithm
Training Data Flow
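The original slide is a diagram; per the speaker notes, the flow is: wrap the raw training data in a DataFrame, repartition to match the number of Paddle trainers, and push each partition into a trainer over RPC. A minimal sketch of that shape, where send_to_trainer is a hypothetical stand-in for the real RPC client:

```python
# Sketch of the training data flow (assumed shape, not Baidu's actual code).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("paddle-data-flow").getOrCreate()

# Toy data; in production the source would be a DB table or an HDFS folder.
df = spark.createDataFrame([(i, float(i) * 0.5) for i in range(100)],
                           ["id", "feature"])
NUM_TRAINERS = 8

def send_to_trainer(row):
    """Hypothetical stand-in for the RPC push into a Paddle trainer."""
    pass

def push_partition(rows):
    # One Spark task feeds one co-scheduled Paddle trainer.
    for row in rows:
        send_to_trainer(row)

df.repartition(NUM_TRAINERS).foreachPartition(push_partition)
```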
System Architecture
Spark ML’s Abstraction
• Train
• Predict
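For reference, this is the stock Spark ML contract the platform plugs into: an Estimator is fit on a DataFrame to produce a Model (a Transformer), which then predicts. A toy example using the standard pyspark.ml API, not the Paddle integration:

```python
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.linalg import Vectors
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("ml-abstraction").getOrCreate()
train_df = spark.createDataFrame(
    [(Vectors.dense([0.0, 1.1]), 0.0),
     (Vectors.dense([2.0, 1.0]), 1.0)],
    ["features", "label"])

lr = LogisticRegression(maxIter=10)      # Train: Estimator.fit(...)
model = lr.fit(train_df)                 # ...returns a Model (a Transformer)
model.transform(train_df).select("prediction").show()   # Predict
```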
Simple Parameter Is Not Enough
[Diagram: an image-classification CNN labeling images as "Bird" or "Cat", built from convolution, pooling, and fully connected layers feeding a cost layer; the parameter for a CNN must describe this entire topology, not a single value]
Use Paddle As Estimator
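A hedged sketch of what "Paddle as an Estimator" could look like; PaddleEstimator, PaddleModel, and launch_paddle_trainers are illustrative names, not Baidu's actual API:

```python
from pyspark.ml import Estimator, Model

def launch_paddle_trainers(dataset, network_conf):
    """Stand-in for scheduling Paddle trainers / parameter servers
    and returning the trained weights."""
    return b"trained-weights"

class PaddleModel(Model):
    def __init__(self, weights):
        super().__init__()
        self.weights = weights

    def _transform(self, dataset):
        # Placeholder: real code would run Paddle inference per partition
        # and append a prediction column.
        return dataset

class PaddleEstimator(Estimator):
    def __init__(self, network_conf, num_trainers=4):
        super().__init__()
        self.network_conf = network_conf   # code-level network config
        self.num_trainers = num_trainers

    def _fit(self, dataset):
        # Repartition to match the trainers, then hand off to Paddle.
        parts = dataset.repartition(self.num_trainers)
        weights = launch_paddle_trainers(parts, self.network_conf)
        return PaddleModel(weights)
```

With this shape, PaddleEstimator(conf).fit(train_df) follows the same fit/transform contract as any Spark ML estimator.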
Code your Configuration
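A minimal sketch of code-level configuration in the spirit of Paddle's legacy Python config helpers (trainer_config_helpers); the helper names follow the historical v1 API and may differ across Paddle versions:

```python
# Network defined in code rather than in a hand-written prototxt file.
from paddle.trainer_config_helpers import *

settings(batch_size=128, learning_rate=1e-3)

img = data_layer(name="image", size=28 * 28)
hidden = fc_layer(input=img, size=200, act=ReluActivation())
prediction = fc_layer(input=hidden, size=10, act=SoftmaxActivation())
cost = classification_cost(input=prediction,
                           label=data_layer(name="label", size=10))
outputs(cost)
```

Because the configuration is ordinary Python, tuning rules and sanity checks become plain code, which is the point of the "Design decisions" slide below.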
Example of a Caffe prototxt
Design decisions
• Spark ML-compatible API
– Being compatible with Spark matters more than being implemented inside Spark
• Code-level configuration
– Easy and flexible
– Manual configuration is error-prone
PADDLE: Scalable Deep Learning Platform at Baidu
Sharded Parameter Server
• One parameter server and one trainer are co-located on each machine.
• Parameters are sharded across machines, not replicated.
• All-to-all communication.
• Our environment:
– 4 GPUs per machine
– 4-10 machines
– all machines on one switch
– a reliable data center
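An illustrative sketch of the sharding idea (not Paddle's code): each machine's parameter-server shard owns a disjoint slice of the global parameter vector, and every trainer exchanges only the matching slice with each shard, all-to-all:

```python
import numpy as np

NUM_MACHINES = 4
PARAM_DIM = 1_000_000

def shard_bounds(shard_id, dim=PARAM_DIM, n=NUM_MACHINES):
    """Half-open [lo, hi) slice of the parameter vector owned by one shard."""
    per = (dim + n - 1) // n
    return shard_id * per, min((shard_id + 1) * per, dim)

# Each machine hosts only its own slice: sharded, not replicated.
shards = {s: np.zeros(shard_bounds(s)[1] - shard_bounds(s)[0])
          for s in range(NUM_MACHINES)}

def push_gradient(full_grad, lr=0.01):
    # All-to-all: a trainer sends each shard only the slice it owns.
    for s in range(NUM_MACHINES):
        lo, hi = shard_bounds(s)
        shards[s] -= lr * full_grad[lo:hi]   # SGD step on the owning server
```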
GPU Ring Synchronization
• Each parameter only needs to cross the slow connection twice:
– once for reduce
– once for scatter
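A minimal in-process simulation of ring synchronization (reduce-scatter followed by all-gather) to show why each parameter crosses a slow link exactly twice; the real system runs this over GPU interconnects:

```python
import numpy as np

def ring_allreduce(parts):
    """parts[g][c] is chunk c held by GPU g. Afterwards every GPU holds
    the sum over all GPUs for every chunk. Each chunk crosses each ring
    link twice: once in reduce-scatter, once in all-gather."""
    n = len(parts)
    # Reduce-scatter: each GPU forwards one chunk to its right neighbour,
    # which accumulates it; snapshot sends to mimic simultaneous transfers.
    for step in range(n - 1):
        sends = [(g, (g - step) % n, parts[g][(g - step) % n].copy())
                 for g in range(n)]
        for g, c, data in sends:
            parts[(g + 1) % n][c] += data
    # Now GPU g owns the fully reduced chunk (g + 1) % n.
    # All-gather: circulate the reduced chunks around the ring.
    for step in range(n - 1):
        sends = [(g, (g + 1 - step) % n, parts[g][(g + 1 - step) % n].copy())
                 for g in range(n)]
        for g, c, data in sends:
            parts[(g + 1) % n][c] = data
    return parts

# 3 GPUs, 3 chunks each; GPU g starts with every chunk equal to g + 1.
grads = [[np.full(2, g + 1.0) for _ in range(3)] for g in range(3)]
out = ring_allreduce(grads)   # every chunk now equals 1 + 2 + 3 = 6
```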
ImageNet Scale Experiments
[Chart: time per 100 batches (s) vs. number of machines (1-5), comparing TCP and RDMA]
• AlexNet on ImageNet
• batch size = 64
• 1 machine has 4 K10 GPUs
Sparse Training
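An illustrative sketch of the sparse-training idea (not Paddle code): with a roughly 1.45M-dimensional sparse input, each batch activates only a handful of feature IDs, so the trainer pulls, updates, and pushes only those embedding rows instead of the full dense matrix:

```python
import numpy as np

VOCAB, EMB = 1_451_594, 128          # matches the experiment below
# In the distributed system this matrix lives sharded on parameter servers.
embedding = np.zeros((VOCAB, EMB), dtype=np.float32)

def sparse_sgd_step(active_ids, row_grads, lr=0.1):
    """Update (and, when distributed, communicate) only the active rows."""
    for i, fid in enumerate(active_ids):
        embedding[fid] -= lr * row_grads[i]

batch_ids = np.array([7, 42, 1_000_003])   # feature IDs seen in this batch
sparse_sgd_step(batch_ids, np.ones((len(batch_ids), EMB), dtype=np.float32))
```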
Sparse Training Experiment
[Chart: time per 100 batches (s) vs. number of nodes (1, 2, 4, 8, 16), comparing non-sparse and sparse training]
• 1,451,594-dimensional sparse feature
• embedded to 128d, 96d, 96d, and 128d
• a ranking cost on top
• batch size = 128
Flexible and Efficient RNN Implementation
RNN Performance Comparison with TensorFlow
[Chart: time per batch (ms) vs. RNN hidden size (200, 650, 1500), comparing PADDLE and TensorFlow]
• machine translation
• batch size = 64
• embedding size = hidden_size
• dictionary size = 10000
Distributed Training Performance: Character Neural Machine Translation
• 8 machines, each with 4 K40 GPUs
• 9 RNN encoder layers, 7 RNN decoder layers
• word embedding size: 256, RNN size: 512
• batch size: 25,000 characters
• Speed:
– attention: 25 minutes / 100 batches
– encoder-decoder: 9 minutes / 100 batches
THANK YOU.
zhangweide/kyletsai/wangjiang03@baidu.com
Editor's Notes

  1. Pros: distributed; comprehensive algorithm support; fast; strong NLP support. Cons: difficult to use; training resource management; wasteful data copying; reliability; also includes manual model serving.
  2. Spark takes care of everything outside of the training algorithm, including data preparation (data may also be preprocessed by Spark, and the model served in the same app). Yarn takes care of resource management, to support hyperparameter training and multi-tenancy. Yarn enables job-level heterogeneous computing and provides better resource utilization.
  3. We wrap the raw training data inside a DataFrame. The source could be a DB table or an HDFS folder. We partition based on the number of Paddle trainers and push the training input data through an RPC interface into the Paddle trainers.
  4. Our work took some inspiration from Caffe on Spark, but we made a different design choice: leave the deep learning platform as a black box to preserve its training performance, and use Spark/Yarn to make engineers' lives easier. We keep deep-learning training scheduling intact inside Paddle, because scheduling during training is a heavily tuned result that requires deep-learning background (trainer speeds must stay equal)… This shows an ideal case of a training run that co-locates trainers and executors in a 1:1 ratio. With Yarn resource management, we can also have a Spark executor running on a CPU machine and push training input to a trainer on a GPU machine.
  5. API design starts here: encapsulate Paddle inside the Spark ML API to add deep-learning support to Spark ML. Our work demonstrates how a deep-learning platform can fit into Spark ML seamlessly.
  6. The current Spark ML parameter abstraction is not enough for deep learning, so we extend it with a neural-network parameter to support deep learning.
  7. Hyperparameter tuning is easier to achieve here through code-level configuration. You can code your own tuning rules and have Spark take care of the rest. Code-level config can also do sanity checks, such as catching misspellings and correcting unreasonable configs.
  8. The number of trainers can be dynamically increased or decreased during training; resources can be preempted; heterogeneous resources can be mixed, with CPUs and GPUs used together. The serving model service can be deployed through Yarn automatically after training.