PADDLE PADDLE
COMPLETE SOLUTION FOR BUSINESS
DOCUMENTATION
▸ GitHub repo:
  ▸ https://github.com/PaddlePaddle/Paddle
▸ DL 101 Book:
  ▸ book.paddlepaddle.org
  ▸ https://github.com/PaddlePaddle/book
A REAL REQUEST FOR AI
▸ How to control TV sets via voice?
▸ AI Hub
  ▸ No. An Alexa in each room?
▸ AI API
  ▸ No. Business owners don’t want user behavior data to go to AI tech providers.
▸ AI on Cloud
  ▸ No. GPU instances are too expensive.
▸ AI on on-premise clusters
  ▸ Yes.
CLOUD AND ON-PREMISE CLUSTERS
[Figure: where companies run their workloads]
▸ Big companies
  ▸ Internet: on-premises cluster
  ▸ traditional: on-premises cluster
▸ Small companies
  ▸ Internet: cloud
  ▸ traditional: on-premises cluster
THE SOLUTION - GENERAL PURPOSE CLUSTERS
[Figure: one general-purpose cluster running the whole business]
▸ Hardware: GPU servers, multi-GPU servers, CPU servers, …
▸ Kubernetes: a distributed operating system
▸ Components in the figure: Paddle, Spark, speech model trainer, speech API server, fluentd, nginx, log, Kafka, online data process, offline data process, Hadoop HDFS, labeled data, model
▸ Internet clients: Web browser, mobile apps, IoT devices
CHALLENGES - GENERAL PURPOSE CLUSTERS
▸ group replicas of processes into jobs
  ▸ Web services, data processing pipelines, machine learning jobs.
▸ service isolation and multi-user support
  ▸ online experiments require a real log data stream, so we run production jobs and experimental jobs on the same cluster.
▸ priority-based scheduling
  ▸ a high-priority (production) job can preempt low-priority (experiment) jobs.
▸ make full use of hardware
  ▸ e.g., schedule processes of a Hadoop job that requires network and disk bandwidth and processes of a deep learning job that requires GPUs on the same node.
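
The priority-based scheduling bullet can be illustrated with a minimal sketch. This is not Kubernetes code or PaddlePaddle code; the `Job` and `Node` classes, the GPU-count resource model, and the "larger number means higher priority" convention are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    priority: int  # assumed convention: larger value = more important
    gpus: int      # GPUs requested by the job


@dataclass
class Node:
    total_gpus: int
    running: list = field(default_factory=list)

    def free_gpus(self) -> int:
        return self.total_gpus - sum(j.gpus for j in self.running)

    def schedule(self, job: Job) -> list:
        """Place `job` and return the jobs preempted to make room.

        Raises RuntimeError if the job does not fit even after preempting
        every strictly lower-priority job.
        """
        # Candidate victims: strictly lower priority, least important first.
        victims = sorted(
            (j for j in self.running if j.priority < job.priority),
            key=lambda j: j.priority,
        )
        reclaimable = self.free_gpus() + sum(j.gpus for j in victims)
        if reclaimable < job.gpus:
            raise RuntimeError(f"cannot place {job.name}")
        preempted = []
        for victim in victims:
            if self.free_gpus() >= job.gpus:
                break
            self.running.remove(victim)
            preempted.append(victim)
        self.running.append(job)
        return preempted


node = Node(total_gpus=4)
node.schedule(Job("experiment-a", priority=1, gpus=2))
node.schedule(Job("experiment-b", priority=1, gpus=2))
evicted = node.schedule(Job("production", priority=10, gpus=3))
print([j.name for j in evicted])
```

A preempted experiment job only makes sense if it can later resume with fewer or different processes, which is what the fault-tolerance slides address.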
CHALLENGES - FAULT-TOLERANT JOBS
▸ auto-scaling
  ▸ there are often many active users during the daytime, so the cluster kills processes of deep learning jobs and creates more Web service processes.
  ▸ at night, it kills some Web service processes to run more deep learning processes.
▸ fault-recovery
  ▸ a job must tolerate a varying number of processes.
▸ speedup vs. fault-recovery
  ▸ speedup optimizes a job.
  ▸ speedup with fault-tolerance optimizes the business.
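
The day/night shift described above amounts to re-splitting a fixed pool of process slots as Web demand changes. The sketch below is only an illustration: the function name `plan_processes`, the slot model, and the `min_trainers` floor are assumptions, not part of PaddlePaddle or Kubernetes.

```python
def plan_processes(total_slots: int, web_demand: int, min_trainers: int = 1):
    """Split a fixed pool of process slots between Web serving and training.

    Web serving is satisfied first; training gets whatever is left, but
    never fewer than `min_trainers` slots (an assumed floor so the training
    job keeps making progress).
    """
    if min_trainers > total_slots:
        raise ValueError("min_trainers exceeds cluster capacity")
    web = max(0, min(web_demand, total_slots - min_trainers))
    trainers = total_slots - web
    return web, trainers


# Daytime: many active users, so most slots go to Web service processes.
day = plan_processes(total_slots=10, web_demand=8)
# Night: few users, so the freed slots run deep learning processes.
night = plan_processes(total_slots=10, web_demand=2)
print(day, night)
```

Because the number of trainers changes between calls, the training job has to tolerate a varying number of processes, which is the fault-recovery requirement on the same slide.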
A PADDLE PADDLE JOB
[Figure: processes in a PaddlePaddle training job]
▸ parameter server 1: global model shard 1/2
▸ parameter server 2: global model shard 2/2
▸ trainers 1, 2, and 3: each holds local model shards 1/2 and 2/2
▸ trainers and parameter servers exchange gradients/model
▸ master: tasks flow between the master and each trainer
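
The sharded layout in the figure can be sketched in a few lines. This is a single-process toy under stated assumptions, not PaddlePaddle's implementation: the `ParameterServer` class, the `split` and `train_step` helpers, plain SGD as the update rule, and Python lists as the model are all hypothetical.

```python
class ParameterServer:
    """Holds one shard of the global model (a plain list of floats here)."""

    def __init__(self, shard, lr=0.1):
        self.shard = list(shard)
        self.lr = lr

    def push(self, grads):
        # Apply one SGD step using the gradients sent by a trainer.
        self.shard = [w - self.lr * g for w, g in zip(self.shard, grads)]

    def pull(self):
        # Return a copy so trainers cannot mutate the global shard.
        return list(self.shard)


def split(vec, n):
    """Split `vec` into n contiguous shards of near-equal size."""
    k, r = divmod(len(vec), n)
    out, start = [], 0
    for i in range(n):
        end = start + k + (1 if i < r else 0)
        out.append(vec[start:end])
        start = end
    return out


def train_step(servers, gradient_fn, batch):
    """One trainer iteration: pull the model, compute gradients, push them."""
    # 1. Pull every shard to assemble the trainer's local model copy.
    local = [w for s in servers for w in s.pull()]
    # 2. Compute gradients on the local copy for this batch of data.
    grads = gradient_fn(local, batch)
    # 3. Push each gradient slice to the server that owns that shard.
    start = 0
    for s in servers:
        n = len(s.shard)
        s.push(grads[start:start + n])
        start += n


model = [1.0, 2.0, 3.0, 4.0]
servers = [ParameterServer(shard, lr=0.5) for shard in split(model, 2)]
# Toy objective sum(w^2), whose gradient is 2w; the batch is unused here.
train_step(servers, lambda w, batch: [2 * x for x in w], batch=None)
print([s.pull() for s in servers])
```

Since trainers keep only a local copy and the global shards live on the parameter servers, losing a trainer does not lose model state in this sketch.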
AUTO FAULT-RECOVERY
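[Figure: task queues kept by the master of each job]
▸ etcd
▸ master of job A: todo, pending, and done queues holding tasks 1 to 4
▸ master of job B: todo, pending, and done queues holding tasks 1 to 3
▸ task state transitions:
  ▸ created: task enters todo
  ▸ dispatched: todo to pending
  ▸ completed: pending to done
  ▸ timeout: pending back to todo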
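
The todo/pending/done transitions in the figure can be sketched as a small in-memory queue. This is an assumption-laden illustration, not the PaddlePaddle master: the `TaskQueue` class and its method names are hypothetical, and the state lives in memory here, whereas the figure shows etcd alongside the masters.

```python
import time


class TaskQueue:
    def __init__(self, tasks, timeout_s=60.0, clock=time.monotonic):
        self.todo = list(tasks)   # created tasks waiting for a trainer
        self.pending = {}         # dispatched task -> deadline
        self.done = []            # completed tasks
        self.timeout_s = timeout_s
        self.clock = clock        # injectable so the timeout is testable

    def dispatch(self):
        """Hand the next todo task to a trainer, or None if none is waiting."""
        self.requeue_timed_out()
        if not self.todo:
            return None
        task = self.todo.pop(0)
        self.pending[task] = self.clock() + self.timeout_s
        return task

    def complete(self, task):
        # Late reports for tasks that already timed out are ignored.
        if task in self.pending:
            del self.pending[task]
            self.done.append(task)

    def requeue_timed_out(self):
        # A trainer that died never reports back; its task returns to todo.
        now = self.clock()
        expired = [t for t, deadline in self.pending.items() if deadline <= now]
        for t in expired:
            del self.pending[t]
            self.todo.append(t)


now = [0.0]  # fake clock so the example runs instantly
queue = TaskQueue(["task 1", "task 2"], timeout_s=10, clock=lambda: now[0])
first = queue.dispatch()
second = queue.dispatch()
queue.complete(first)   # trainer 1 finishes normally
now[0] = 11.0           # trainer 2 is killed and never reports back
retry = queue.dispatch()
queue.complete(retry)
print(queue.done)
```

With this scheme a killed trainer costs at most one task's worth of repeated work, so the job keeps running while the number of trainers changes.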
KEEP OPEN
▸ Thanks to the Kubernetes community for their expertise in distributed computing and their code review efforts.
▸ We hope to see more traditional industries run their whole business on their own on-premise clusters.
▸ PaddlePaddle will stay open.
▸ We are working on open-sourcing more AI technologies based on PaddlePaddle.
