SlideShare ist ein Scribd-Unternehmen logo
1 von 37
All AI Roads Lead to Distribution
jim_dowling
dotAI Paris 2018
©2018 Logical Clocks AB. All Rights Reserved
2/38
“Methods that scale with computation
are the future of AI”*
- Rich Sutton (Founding Father of Reinforcement Learning)
* https://www.youtube.com/watch?v=EeMCEQa85tw
©2018 Logical Clocks AB. All Rights Reserved
Massive Increase in Compute for AI*
Distributed Systems
3.5 month-doubling time
*https://blog.openai.com/ai-and-compute
3/38
©2018 Logical Clocks AB. All Rights Reserved
Model Parallelism Data Parallelism
GPU2
GPU3
GPU1Layers1-100
Layer100-199
Layer200-299
Training/Test Data
Predictions
GPU0 GPU1 GPU99
Distributed Filesystem
32 files 32 files 32 files
Batch Size
of 3200
One Big Model on 3 GPUs
Copy of the Model
on each GPU
Aggregate
Gradients
New Model
Barrier
4/38
©2018 Logical Clocks AB. All Rights Reserved
CNNs
Distribution
Facebook vs Google
ImageNet
Mini AI Arms Race
5/38
©2018 Logical Clocks AB. All Rights Reserved
ImageNet
[Image from https://www.slideshare.net/xavigiro/image-classification-on-imagenet-d1l4-2017-upc-deep-learning-for-computer-vision ]
6/38
Improved Accuracy for ImageNet
©2018 Logical Clocks AB. All Rights Reserved
Improvements in ImageNet – Accuracy
81.5
82
82.5
83
83.5
84
84.5
GOOGLE-1 (MAY '17) GOOGLE-2 (MARCH '18) FACEBOOK (MAY '18)
Top-1 Accuracy for ImageNet
ImageNet Top-1
AutoML
with RL
AutoML
with GA
Facebook- https://goo.gl/ERpJyr
Google-1: https://goo.gl/EV7Xv1
Google-2:https://goo.gl/eidnyQ
More Compute
1 Bn Images
More Data
8/38
©2018 Logical Clocks AB. All Rights Reserved
Reduced
Generalization
Error
Hyper
Parameter
Optimization
Larger
Training
Datasets
Better
Regularization
Methods
Better
Optimization
Algorithms
Design
Better
Models
Methods for Improving Model Accuracy
Distribution
9/38
Faster Training of ImageNet
Single GPU
[Image from https://www.matroid.com/scaledml/2018/jeff.pdf]
Clustered GPUs
©2018 Logical Clocks AB. All Rights Reserved
Improvements in ImageNet – Training Time
June’17 Oct’17
FB - https://goo.gl/ERpJyr
Google-1: https://goo.gl/EV7Xv1
Google-2:https://goo.gl/eidnyQ
0
10
20
30
40
50
60
FACEBOOK (JUNE '17) GOOGLE-1 (DEC '17) GOOGLE-2 (MARCH '18)
Training Time (mins) ImageNet >75% top-1
Training Time (mins)
256 Nvidia P100s
256 TPUv2s
256 TPUv2s
11/38
©2018 Logical Clocks AB. All Rights Reserved
0
20
40
60
80
100
120
140
160
FACEBOOK (JUNE '17) GOOGLE-1 (DEC '17) GOOGLE-2 (MARCH '18)
1000s Files/Sec Processed
Files Processed (K/s)
ImageNet – Files/Sec Processed
June’17
FB - https://goo.gl/ERpJyr
Google-1: https://goo.gl/EV7Xv1
Google-2:https://goo.gl/eidnyQ
HDFS
~80K/s
12/38
©2018 Logical Clocks AB. All Rights Reserved
0
200
400
600
800
1000
1200
FACEBOOK GOOGLE-1 GOOGLE-2 HOPSFS
Distributed Filesystem Read Throughput
Files Processed (K/s)
Remove Bottleneck and Keep Scaling
HopsFS - https://goo.gl/yFCsGc
Scale Challenge Winner (2017)
Hops
PLUG for Hops
13/38
©2018 Logical Clocks AB. All Rights Reserved
All AI Roads Lead to Distribution
Distributed
Deep Learning
Hyper
Parameter
Optimization
Distributed
Training
Larger
Training
Datasets
Elastic
Model
Serving
Parallel
Experiments
(Commodity)
GPU Clusters
Auto
ML
14/38
THEY WILL
IGNORE YOU, DISTRIBUTION
UNTIL THEY
NEED YOU DISTRIBUTION
©2018 Logical Clocks AB. All Rights Reserved
Top 10 List: Why Big Data/Distribution sucks
1. I don’t know which is scarier: reading Stephen King’s horror
novel IT or reading a file from HDFS in Python
2. MapReduce – how can a programming model with only 2
parts have O(N2) complexity?
3. I managed to install Hadoop – at the cost of my last
relationship…the Spark died!
4. “Jupyter’s down again” - some service you never heard of
stopped it working….
5. I still don’t know which is worse - a dead service or a dead
slow service.
16/38
©2018 Logical Clocks AB. All Rights Reserved
Top 10 List: why Big Data/Distribution sucks
6. How do I debug this thing on my laptop?
7. Enough with the Dockerfiles and YAML files, when can I
start writing *real* code?
8. Isn’t it supposed to be faster when I add a new server?
9. Do I really have to install the library on *all* of the
servers?
10.Distributed? I don’t even do multi-threaded!
17/38
KubeFlow
GPU/TPU Services in the Cloud and On-Premise
•Managed Cloud Platforms
-RiseML
-FloydHub
-Google Cloud ML
-Microsoft Batch AI
-AWS Sagemaker
•On-Premise/Cloud Platforms
-KubeFlow
-Hops
18/38
©2018 Logical Clocks AB. All Rights Reserved
Hops Open-Source Data Science Platform
19/38
©2018 Logical Clocks AB. All Rights Reserved
PySpark
End-to-End ML Pipeline in Python on Hops
20/38
Data
Collection
Experimentation Training Serving
Feature
Extraction
Data
Transformation
& Verification
Test
TfServingTensorFlow
FLIP-O-RAMA with Python and HDFS/Hops
Left Hand Here
Right
Thumb
Here
Data
Collection
Experimentation Training Serving
Feature
Extraction
Data
Transformation
& Verification
Test
import hops.hdfs as hdfs
features = ["Age", "Occupation", "Sex", …, "Country"]
with open(hdfs.project_path() +
"/TestJob/data/census/adult.data", "r") as trainFile:
train_data=pd.read_csv(trainFile, names=features,
sep=r's*,s*', engine='python', na_values="?")
Data Ingestion with Pandas
Right
Thumb
Here
import hops.hdfs as hdfs
features = ["Age", "Occupation", "Sex", …, "Country"]
h = hdfs.get_fs()
with h.open_file(hdfs.project_path() +
"/TestJob/data/census/adult.data", "r") as trainFile:
train_data=pd.read_csv(trainFile, names=features,
sep=r's*,s*', engine='python', na_values="?")
Data Ingestion with Pandas on Hops
Right
Thumb
Here
Google Facets
24/38
Data
Collection
Experimentation Training Serving
Feature
Extraction
Data
Transformation
& Verification
Test
train_data = pd.read_csv(…)
test_data = pd.read_csv(…)
facets.overview(train_data, test_data)
facets.dive(test_data.to_json(orient='records'))
Google Facets
25/38
Big Data Preparation with PySpark
images = spark.readImages(IMAGE_PATH, recursive = True,
numPartitions=10, sampleRatio = 0.1).cache()
tr = (ImageTransformer().setOutputCol(“transformed”)
.resize(height = 200, width = 200)
.crop(0, 0, height = 180, width = 180) )
smallImages = tr.transform(images).select(“transformed”)
dfutil.saveAsTFRecords(smallImages, OUTPUT_DIR)
26/38
Experimentation and Training
Data
Collection
Experimentation Training Serving
Feature
Extraction
Data
Transformation
& Verification
Test
GridSearch with TensorFlow/Spark on Hops
def train(learning_rate, dropout):
[TensorFlow Code here]
args_dict = {'learning_rate': [0.001, 0.005, 0.01],
'dropout': [0.5, 0.6]}
experiment.launch(spark, train, args_dict)
Launch 6 Spark Executors
28/38
Model Architecture Search on TensorFlow/Hops
def train_cifar10(learning_rate, dropout, num_layers):
[TensorFlow Code here]
dict = {'learning_rate': [0.005, 0.00005],
'dropout': [0.01, 0.99], 'num_layers': [1,3]}
experiment.evolutionary_search(spark, train, dict,
direction='max', popsize=10, generations=3, crossover=0.7,
mutation=0.5)
29/38
Launch 10 Spark Executors
©2018 Logical Clocks AB. All Rights Reserved
Differential Evolution in Tensorboard (1/4)
CIFAR-10
30/38
©2018 Logical Clocks AB. All Rights Reserved
Differential Evolution in Tensorboard (2/4)
CIFAR-10
31/38
©2018 Logical Clocks AB. All Rights Reserved
Differential Evolution in Tensorboard (3/4)
CIFAR-10
32/38
©2018 Logical Clocks AB. All Rights Reserved
Differential Evolution in Tensorboard (4/4)
CIFAR-10
33/38
Distributed Training with Horovod
hvd.init()
opt = hvd.DistributedOptimizer(opt)
if hvd.local_rank()==0:
[TensorFlow Code here]
…..
else:
[TensorFlow Code here]
…..
34/38
©2018 Logical Clocks AB. All Rights Reserved
Model Serving: Monitor + Retrain
35/38
Summary
•The future of Deep Learning is Distributed
https://www.oreilly.com/ideas/distributed-tensorflow
•Hops is a new Data Platform with first-class support for
Python / Deep Learning / ML / Data Governance / GPUs
*https://twitter.com/karpathy/status/972701240017633281
“It is starting to look like deep learning workflows of the future
feature autotuned architectures running with autotuned
compute schedules across arbitrary backends.”
Andrej Karpathy - Head of AI @ Tesla
36/38
©2018 Logical Clocks AB. All Rights Reserved
The Team
Jim Dowling, Seif Haridi, Gautier Berthou, Salman Niazi, Mahmoud
Ismail, Theofilos Kakantousis, Ermias Gebremeskel, Antonios
Kouzoupis, Alex Ormenisan, Fabio Buso, Robin Andersson, August
Bonds.
Active:
Alumni:
Vasileios Giannokostas, Johan Svedlund Nordström,Rizvi Hasan, Paul Mälzer, Bram
Leenders, Juan Roca, Misganu Dessalegn, K “Sri” Srijeyanthan, Jude D’Souza, Alberto
Lorente, Andre Moré, Ali Gholami, Davis Jaunzems, Stig Viaene, Hooman Peiro,
Evangelos Savvidis, Steffen Grohsschmiedt, Qi Qi, Gayana Chandrasekara, Nikolaos
Stanogias, Daniel Bali, Ioannis Kerkinos, Peter Buechler, Pushparaj Motamari, Hamid
Afzali, Wasif Malik, Lalith Suresh, Mariano Valles, Ying Lieu, Fanti Machmount Al
Samisti, Braulio Grana, Adam Alpire, Zahin Azher Rashid, ArunaKumari Yedurupaka,
Tobias Johansson , Roberto Bampi, Roshan Sedar.
www.hops.io
@hopshadoop

Weitere ähnliche Inhalte

Ähnlich wie All AI Roads lead to Distribution - Dot AI

Jfokus 2019-dowling-logical-clocks
Jfokus 2019-dowling-logical-clocksJfokus 2019-dowling-logical-clocks
Jfokus 2019-dowling-logical-clocksJim Dowling
 
Berlin buzzwords 2018 TensorFlow on Hops
Berlin buzzwords 2018 TensorFlow on HopsBerlin buzzwords 2018 TensorFlow on Hops
Berlin buzzwords 2018 TensorFlow on HopsJim Dowling
 
Distributed deep learning reference architecture v3.2l
Distributed deep learning reference architecture v3.2lDistributed deep learning reference architecture v3.2l
Distributed deep learning reference architecture v3.2lGanesan Narayanasamy
 
Large Model support and Distribute deep learning
Large Model support and Distribute deep learningLarge Model support and Distribute deep learning
Large Model support and Distribute deep learningGanesan Narayanasamy
 
TensorFlow 16: Building a Data Science Platform
TensorFlow 16: Building a Data Science Platform TensorFlow 16: Building a Data Science Platform
TensorFlow 16: Building a Data Science Platform Seldon
 
Intelligent internet of things with Google Cloud
Intelligent internet of things with Google CloudIntelligent internet of things with Google Cloud
Intelligent internet of things with Google CloudHenrik Hammer Eliassen
 
Northwestern 20181004 v9
Northwestern 20181004 v9Northwestern 20181004 v9
Northwestern 20181004 v9ISSIP
 
Designing the Next Generation Data Lake
Designing the Next Generation Data LakeDesigning the Next Generation Data Lake
Designing the Next Generation Data LakeRobert Chong
 
Scaling TensorFlow with Hops, Global AI Conference Santa Clara
Scaling TensorFlow with Hops, Global AI Conference Santa ClaraScaling TensorFlow with Hops, Global AI Conference Santa Clara
Scaling TensorFlow with Hops, Global AI Conference Santa ClaraJim Dowling
 
Continuous Machine and Deep Learning with Apache Ignite
Continuous Machine and Deep Learning with Apache IgniteContinuous Machine and Deep Learning with Apache Ignite
Continuous Machine and Deep Learning with Apache IgniteDenis Magda
 
Automatic generation of hardware memory architectures for HPC
Automatic generation of hardware memory architectures for HPCAutomatic generation of hardware memory architectures for HPC
Automatic generation of hardware memory architectures for HPCFacultad de Informática UCM
 
Monitoring of GPU Usage with Tensorflow Models Using Prometheus
Monitoring of GPU Usage with Tensorflow Models Using PrometheusMonitoring of GPU Usage with Tensorflow Models Using Prometheus
Monitoring of GPU Usage with Tensorflow Models Using PrometheusDatabricks
 
Scaling graph investigations with Math, GPUs, & Experts
Scaling graph investigations with Math, GPUs, & ExpertsScaling graph investigations with Math, GPUs, & Experts
Scaling graph investigations with Math, GPUs, & Expertsgraphistry
 
Infrastructure and Tooling - Full Stack Deep Learning
Infrastructure and Tooling - Full Stack Deep LearningInfrastructure and Tooling - Full Stack Deep Learning
Infrastructure and Tooling - Full Stack Deep LearningSergey Karayev
 
Webinar: Deep Learning Pipelines Beyond the Learning
Webinar: Deep Learning Pipelines Beyond the LearningWebinar: Deep Learning Pipelines Beyond the Learning
Webinar: Deep Learning Pipelines Beyond the LearningMesosphere Inc.
 
Graph Gurus Episode 37: Modeling for Kaggle COVID-19 Dataset
Graph Gurus Episode 37: Modeling for Kaggle COVID-19 DatasetGraph Gurus Episode 37: Modeling for Kaggle COVID-19 Dataset
Graph Gurus Episode 37: Modeling for Kaggle COVID-19 DatasetTigerGraph
 

Ähnlich wie All AI Roads lead to Distribution - Dot AI (20)

Jfokus 2019-dowling-logical-clocks
Jfokus 2019-dowling-logical-clocksJfokus 2019-dowling-logical-clocks
Jfokus 2019-dowling-logical-clocks
 
PowerAI Deep dive
PowerAI Deep divePowerAI Deep dive
PowerAI Deep dive
 
Berlin buzzwords 2018 TensorFlow on Hops
Berlin buzzwords 2018 TensorFlow on HopsBerlin buzzwords 2018 TensorFlow on Hops
Berlin buzzwords 2018 TensorFlow on Hops
 
OpenPOWER Boot camp in Zurich
OpenPOWER Boot camp in ZurichOpenPOWER Boot camp in Zurich
OpenPOWER Boot camp in Zurich
 
Distributed deep learning reference architecture v3.2l
Distributed deep learning reference architecture v3.2lDistributed deep learning reference architecture v3.2l
Distributed deep learning reference architecture v3.2l
 
Large Model support and Distribute deep learning
Large Model support and Distribute deep learningLarge Model support and Distribute deep learning
Large Model support and Distribute deep learning
 
TensorFlow 16: Building a Data Science Platform
TensorFlow 16: Building a Data Science Platform TensorFlow 16: Building a Data Science Platform
TensorFlow 16: Building a Data Science Platform
 
Intelligent internet of things with Google Cloud
Intelligent internet of things with Google CloudIntelligent internet of things with Google Cloud
Intelligent internet of things with Google Cloud
 
Northwestern 20181004 v9
Northwestern 20181004 v9Northwestern 20181004 v9
Northwestern 20181004 v9
 
Designing the Next Generation Data Lake
Designing the Next Generation Data LakeDesigning the Next Generation Data Lake
Designing the Next Generation Data Lake
 
Scaling TensorFlow with Hops, Global AI Conference Santa Clara
Scaling TensorFlow with Hops, Global AI Conference Santa ClaraScaling TensorFlow with Hops, Global AI Conference Santa Clara
Scaling TensorFlow with Hops, Global AI Conference Santa Clara
 
Continuous Machine and Deep Learning with Apache Ignite
Continuous Machine and Deep Learning with Apache IgniteContinuous Machine and Deep Learning with Apache Ignite
Continuous Machine and Deep Learning with Apache Ignite
 
Automatic generation of hardware memory architectures for HPC
Automatic generation of hardware memory architectures for HPCAutomatic generation of hardware memory architectures for HPC
Automatic generation of hardware memory architectures for HPC
 
Enterprise Data Lakes
Enterprise Data LakesEnterprise Data Lakes
Enterprise Data Lakes
 
OpenPOWER foundation
OpenPOWER foundationOpenPOWER foundation
OpenPOWER foundation
 
Monitoring of GPU Usage with Tensorflow Models Using Prometheus
Monitoring of GPU Usage with Tensorflow Models Using PrometheusMonitoring of GPU Usage with Tensorflow Models Using Prometheus
Monitoring of GPU Usage with Tensorflow Models Using Prometheus
 
Scaling graph investigations with Math, GPUs, & Experts
Scaling graph investigations with Math, GPUs, & ExpertsScaling graph investigations with Math, GPUs, & Experts
Scaling graph investigations with Math, GPUs, & Experts
 
Infrastructure and Tooling - Full Stack Deep Learning
Infrastructure and Tooling - Full Stack Deep LearningInfrastructure and Tooling - Full Stack Deep Learning
Infrastructure and Tooling - Full Stack Deep Learning
 
Webinar: Deep Learning Pipelines Beyond the Learning
Webinar: Deep Learning Pipelines Beyond the LearningWebinar: Deep Learning Pipelines Beyond the Learning
Webinar: Deep Learning Pipelines Beyond the Learning
 
Graph Gurus Episode 37: Modeling for Kaggle COVID-19 Dataset
Graph Gurus Episode 37: Modeling for Kaggle COVID-19 DatasetGraph Gurus Episode 37: Modeling for Kaggle COVID-19 Dataset
Graph Gurus Episode 37: Modeling for Kaggle COVID-19 Dataset
 

Mehr von Jim Dowling

ARVC and flecainide case report[EI] Jim.docx.pdf
ARVC and flecainide case report[EI] Jim.docx.pdfARVC and flecainide case report[EI] Jim.docx.pdf
ARVC and flecainide case report[EI] Jim.docx.pdfJim Dowling
 
PyData Berlin 2023 - Mythical ML Pipeline.pdf
PyData Berlin 2023 - Mythical ML Pipeline.pdfPyData Berlin 2023 - Mythical ML Pipeline.pdf
PyData Berlin 2023 - Mythical ML Pipeline.pdfJim Dowling
 
Serverless ML Workshop with Hopsworks at PyData Seattle
Serverless ML Workshop with Hopsworks at PyData SeattleServerless ML Workshop with Hopsworks at PyData Seattle
Serverless ML Workshop with Hopsworks at PyData SeattleJim Dowling
 
PyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdf
PyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdfPyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdf
PyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdfJim Dowling
 
_Python Ireland Meetup - Serverless ML - Dowling.pdf
_Python Ireland Meetup - Serverless ML - Dowling.pdf_Python Ireland Meetup - Serverless ML - Dowling.pdf
_Python Ireland Meetup - Serverless ML - Dowling.pdfJim Dowling
 
Building Hopsworks, a cloud-native managed feature store for machine learning
Building Hopsworks, a cloud-native managed feature store for machine learning Building Hopsworks, a cloud-native managed feature store for machine learning
Building Hopsworks, a cloud-native managed feature store for machine learning Jim Dowling
 
Real-Time Recommendations with Hopsworks and OpenSearch - MLOps World 2022
Real-Time Recommendations  with Hopsworks and OpenSearch - MLOps World 2022Real-Time Recommendations  with Hopsworks and OpenSearch - MLOps World 2022
Real-Time Recommendations with Hopsworks and OpenSearch - MLOps World 2022Jim Dowling
 
Ml ops and the feature store with hopsworks, DC Data Science Meetup
Ml ops and the feature store with hopsworks, DC Data Science MeetupMl ops and the feature store with hopsworks, DC Data Science Meetup
Ml ops and the feature store with hopsworks, DC Data Science MeetupJim Dowling
 
Hops fs huawei internal conference july 2021
Hops fs huawei internal conference july 2021Hops fs huawei internal conference july 2021
Hops fs huawei internal conference july 2021Jim Dowling
 
Hopsworks MLOps World talk june 21
Hopsworks MLOps World talk june 21Hopsworks MLOps World talk june 21
Hopsworks MLOps World talk june 21Jim Dowling
 
Hopsworks Feature Store 2.0 a new paradigm
Hopsworks Feature Store  2.0   a new paradigmHopsworks Feature Store  2.0   a new paradigm
Hopsworks Feature Store 2.0 a new paradigmJim Dowling
 
Metadata and Provenance for ML Pipelines with Hopsworks
Metadata and Provenance for ML Pipelines with Hopsworks Metadata and Provenance for ML Pipelines with Hopsworks
Metadata and Provenance for ML Pipelines with Hopsworks Jim Dowling
 
GANs for Anti Money Laundering
GANs for Anti Money LaunderingGANs for Anti Money Laundering
GANs for Anti Money LaunderingJim Dowling
 
Berlin buzzwords 2020-feature-store-dowling
Berlin buzzwords 2020-feature-store-dowlingBerlin buzzwords 2020-feature-store-dowling
Berlin buzzwords 2020-feature-store-dowlingJim Dowling
 
Invited Lecture on GPUs and Distributed Deep Learning at Uppsala University
Invited Lecture on GPUs and Distributed Deep Learning at Uppsala UniversityInvited Lecture on GPUs and Distributed Deep Learning at Uppsala University
Invited Lecture on GPUs and Distributed Deep Learning at Uppsala UniversityJim Dowling
 
Hopsworks data engineering melbourne april 2020
Hopsworks   data engineering melbourne april 2020Hopsworks   data engineering melbourne april 2020
Hopsworks data engineering melbourne april 2020Jim Dowling
 
The Bitter Lesson of ML Pipelines
The Bitter Lesson of ML Pipelines The Bitter Lesson of ML Pipelines
The Bitter Lesson of ML Pipelines Jim Dowling
 
Asynchronous Hyperparameter Search with Spark on Hopsworks and Maggy
Asynchronous Hyperparameter Search with Spark on Hopsworks and MaggyAsynchronous Hyperparameter Search with Spark on Hopsworks and Maggy
Asynchronous Hyperparameter Search with Spark on Hopsworks and MaggyJim Dowling
 
Hopsworks at Google AI Huddle, Sunnyvale
Hopsworks at Google AI Huddle, SunnyvaleHopsworks at Google AI Huddle, Sunnyvale
Hopsworks at Google AI Huddle, SunnyvaleJim Dowling
 
Hopsworks in the cloud Berlin Buzzwords 2019
Hopsworks in the cloud Berlin Buzzwords 2019 Hopsworks in the cloud Berlin Buzzwords 2019
Hopsworks in the cloud Berlin Buzzwords 2019 Jim Dowling
 

Mehr von Jim Dowling (20)

ARVC and flecainide case report[EI] Jim.docx.pdf
ARVC and flecainide case report[EI] Jim.docx.pdfARVC and flecainide case report[EI] Jim.docx.pdf
ARVC and flecainide case report[EI] Jim.docx.pdf
 
PyData Berlin 2023 - Mythical ML Pipeline.pdf
PyData Berlin 2023 - Mythical ML Pipeline.pdfPyData Berlin 2023 - Mythical ML Pipeline.pdf
PyData Berlin 2023 - Mythical ML Pipeline.pdf
 
Serverless ML Workshop with Hopsworks at PyData Seattle
Serverless ML Workshop with Hopsworks at PyData SeattleServerless ML Workshop with Hopsworks at PyData Seattle
Serverless ML Workshop with Hopsworks at PyData Seattle
 
PyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdf
PyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdfPyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdf
PyCon Sweden 2022 - Dowling - Serverless ML with Hopsworks.pdf
 
_Python Ireland Meetup - Serverless ML - Dowling.pdf
_Python Ireland Meetup - Serverless ML - Dowling.pdf_Python Ireland Meetup - Serverless ML - Dowling.pdf
_Python Ireland Meetup - Serverless ML - Dowling.pdf
 
Building Hopsworks, a cloud-native managed feature store for machine learning
Building Hopsworks, a cloud-native managed feature store for machine learning Building Hopsworks, a cloud-native managed feature store for machine learning
Building Hopsworks, a cloud-native managed feature store for machine learning
 
Real-Time Recommendations with Hopsworks and OpenSearch - MLOps World 2022
Real-Time Recommendations  with Hopsworks and OpenSearch - MLOps World 2022Real-Time Recommendations  with Hopsworks and OpenSearch - MLOps World 2022
Real-Time Recommendations with Hopsworks and OpenSearch - MLOps World 2022
 
Ml ops and the feature store with hopsworks, DC Data Science Meetup
Ml ops and the feature store with hopsworks, DC Data Science MeetupMl ops and the feature store with hopsworks, DC Data Science Meetup
Ml ops and the feature store with hopsworks, DC Data Science Meetup
 
Hops fs huawei internal conference july 2021
Hops fs huawei internal conference july 2021Hops fs huawei internal conference july 2021
Hops fs huawei internal conference july 2021
 
Hopsworks MLOps World talk june 21
Hopsworks MLOps World talk june 21Hopsworks MLOps World talk june 21
Hopsworks MLOps World talk june 21
 
Hopsworks Feature Store 2.0 a new paradigm
Hopsworks Feature Store  2.0   a new paradigmHopsworks Feature Store  2.0   a new paradigm
Hopsworks Feature Store 2.0 a new paradigm
 
Metadata and Provenance for ML Pipelines with Hopsworks
Metadata and Provenance for ML Pipelines with Hopsworks Metadata and Provenance for ML Pipelines with Hopsworks
Metadata and Provenance for ML Pipelines with Hopsworks
 
GANs for Anti Money Laundering
GANs for Anti Money LaunderingGANs for Anti Money Laundering
GANs for Anti Money Laundering
 
Berlin buzzwords 2020-feature-store-dowling
Berlin buzzwords 2020-feature-store-dowlingBerlin buzzwords 2020-feature-store-dowling
Berlin buzzwords 2020-feature-store-dowling
 
Invited Lecture on GPUs and Distributed Deep Learning at Uppsala University
Invited Lecture on GPUs and Distributed Deep Learning at Uppsala UniversityInvited Lecture on GPUs and Distributed Deep Learning at Uppsala University
Invited Lecture on GPUs and Distributed Deep Learning at Uppsala University
 
Hopsworks data engineering melbourne april 2020
Hopsworks   data engineering melbourne april 2020Hopsworks   data engineering melbourne april 2020
Hopsworks data engineering melbourne april 2020
 
The Bitter Lesson of ML Pipelines
The Bitter Lesson of ML Pipelines The Bitter Lesson of ML Pipelines
The Bitter Lesson of ML Pipelines
 
Asynchronous Hyperparameter Search with Spark on Hopsworks and Maggy
Asynchronous Hyperparameter Search with Spark on Hopsworks and MaggyAsynchronous Hyperparameter Search with Spark on Hopsworks and Maggy
Asynchronous Hyperparameter Search with Spark on Hopsworks and Maggy
 
Hopsworks at Google AI Huddle, Sunnyvale
Hopsworks at Google AI Huddle, SunnyvaleHopsworks at Google AI Huddle, Sunnyvale
Hopsworks at Google AI Huddle, Sunnyvale
 
Hopsworks in the cloud Berlin Buzzwords 2019
Hopsworks in the cloud Berlin Buzzwords 2019 Hopsworks in the cloud Berlin Buzzwords 2019
Hopsworks in the cloud Berlin Buzzwords 2019
 

Kürzlich hochgeladen

GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark WebGDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark WebJames Anderson
 
On Starlink, presented by Geoff Huston at NZNOG 2024
On Starlink, presented by Geoff Huston at NZNOG 2024On Starlink, presented by Geoff Huston at NZNOG 2024
On Starlink, presented by Geoff Huston at NZNOG 2024APNIC
 
Moving Beyond Twitter/X and Facebook - Social Media for local news providers
Moving Beyond Twitter/X and Facebook - Social Media for local news providersMoving Beyond Twitter/X and Facebook - Social Media for local news providers
Moving Beyond Twitter/X and Facebook - Social Media for local news providersDamian Radcliffe
 
VIP Model Call Girls NIBM ( Pune ) Call ON 8005736733 Starting From 5K to 25K...
VIP Model Call Girls NIBM ( Pune ) Call ON 8005736733 Starting From 5K to 25K...VIP Model Call Girls NIBM ( Pune ) Call ON 8005736733 Starting From 5K to 25K...
VIP Model Call Girls NIBM ( Pune ) Call ON 8005736733 Starting From 5K to 25K...SUHANI PANDEY
 
Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...
Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...
Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...Sheetaleventcompany
 
Enjoy Night⚡Call Girls Dlf City Phase 3 Gurgaon >༒8448380779 Escort Service
Enjoy Night⚡Call Girls Dlf City Phase 3 Gurgaon >༒8448380779 Escort ServiceEnjoy Night⚡Call Girls Dlf City Phase 3 Gurgaon >༒8448380779 Escort Service
Enjoy Night⚡Call Girls Dlf City Phase 3 Gurgaon >༒8448380779 Escort ServiceDelhi Call girls
 
Shikrapur - Call Girls in Pune Neha 8005736733 | 100% Gennuine High Class Ind...
Shikrapur - Call Girls in Pune Neha 8005736733 | 100% Gennuine High Class Ind...Shikrapur - Call Girls in Pune Neha 8005736733 | 100% Gennuine High Class Ind...
Shikrapur - Call Girls in Pune Neha 8005736733 | 100% Gennuine High Class Ind...SUHANI PANDEY
 
Call Girls In Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls In Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
All Time Service Available Call Girls Mg Road 👌 ⏭️ 6378878445
All Time Service Available Call Girls Mg Road 👌 ⏭️ 6378878445All Time Service Available Call Girls Mg Road 👌 ⏭️ 6378878445
All Time Service Available Call Girls Mg Road 👌 ⏭️ 6378878445ruhi
 
DDoS In Oceania and the Pacific, presented by Dave Phelan at NZNOG 2024
DDoS In Oceania and the Pacific, presented by Dave Phelan at NZNOG 2024DDoS In Oceania and the Pacific, presented by Dave Phelan at NZNOG 2024
DDoS In Oceania and the Pacific, presented by Dave Phelan at NZNOG 2024APNIC
 
Russian Call Girls Pune (Adult Only) 8005736733 Escort Service 24x7 Cash Pay...
Russian Call Girls Pune  (Adult Only) 8005736733 Escort Service 24x7 Cash Pay...Russian Call Girls Pune  (Adult Only) 8005736733 Escort Service 24x7 Cash Pay...
Russian Call Girls Pune (Adult Only) 8005736733 Escort Service 24x7 Cash Pay...SUHANI PANDEY
 
Al Barsha Night Partner +0567686026 Call Girls Dubai
Al Barsha Night Partner +0567686026 Call Girls  DubaiAl Barsha Night Partner +0567686026 Call Girls  Dubai
Al Barsha Night Partner +0567686026 Call Girls DubaiEscorts Call Girls
 
Call Now ☎ 8264348440 !! Call Girls in Green Park Escort Service Delhi N.C.R.
Call Now ☎ 8264348440 !! Call Girls in Green Park Escort Service Delhi N.C.R.Call Now ☎ 8264348440 !! Call Girls in Green Park Escort Service Delhi N.C.R.
Call Now ☎ 8264348440 !! Call Girls in Green Park Escort Service Delhi N.C.R.soniya singh
 
Pune Airport ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready...
Pune Airport ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready...Pune Airport ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready...
Pune Airport ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready...tanu pandey
 
Call Now ☎ 8264348440 !! Call Girls in Sarai Rohilla Escort Service Delhi N.C.R.
Call Now ☎ 8264348440 !! Call Girls in Sarai Rohilla Escort Service Delhi N.C.R.Call Now ☎ 8264348440 !! Call Girls in Sarai Rohilla Escort Service Delhi N.C.R.
Call Now ☎ 8264348440 !! Call Girls in Sarai Rohilla Escort Service Delhi N.C.R.soniya singh
 
Dubai=Desi Dubai Call Girls O525547819 Outdoor Call Girls Dubai
Dubai=Desi Dubai Call Girls O525547819 Outdoor Call Girls DubaiDubai=Desi Dubai Call Girls O525547819 Outdoor Call Girls Dubai
Dubai=Desi Dubai Call Girls O525547819 Outdoor Call Girls Dubaikojalkojal131
 

Kürzlich hochgeladen (20)

(INDIRA) Call Girl Pune Call Now 8250077686 Pune Escorts 24x7
(INDIRA) Call Girl Pune Call Now 8250077686 Pune Escorts 24x7(INDIRA) Call Girl Pune Call Now 8250077686 Pune Escorts 24x7
(INDIRA) Call Girl Pune Call Now 8250077686 Pune Escorts 24x7
 
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark WebGDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
 
On Starlink, presented by Geoff Huston at NZNOG 2024
On Starlink, presented by Geoff Huston at NZNOG 2024On Starlink, presented by Geoff Huston at NZNOG 2024
On Starlink, presented by Geoff Huston at NZNOG 2024
 
Dwarka Sector 26 Call Girls | Delhi | 9999965857 🫦 Vanshika Verma More Our Se...
Dwarka Sector 26 Call Girls | Delhi | 9999965857 🫦 Vanshika Verma More Our Se...Dwarka Sector 26 Call Girls | Delhi | 9999965857 🫦 Vanshika Verma More Our Se...
Dwarka Sector 26 Call Girls | Delhi | 9999965857 🫦 Vanshika Verma More Our Se...
 
Moving Beyond Twitter/X and Facebook - Social Media for local news providers
Moving Beyond Twitter/X and Facebook - Social Media for local news providersMoving Beyond Twitter/X and Facebook - Social Media for local news providers
Moving Beyond Twitter/X and Facebook - Social Media for local news providers
 
VIP Model Call Girls NIBM ( Pune ) Call ON 8005736733 Starting From 5K to 25K...
VIP Model Call Girls NIBM ( Pune ) Call ON 8005736733 Starting From 5K to 25K...VIP Model Call Girls NIBM ( Pune ) Call ON 8005736733 Starting From 5K to 25K...
VIP Model Call Girls NIBM ( Pune ) Call ON 8005736733 Starting From 5K to 25K...
 
Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...
Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...
Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...
 
Enjoy Night⚡Call Girls Dlf City Phase 3 Gurgaon >༒8448380779 Escort Service
Enjoy Night⚡Call Girls Dlf City Phase 3 Gurgaon >༒8448380779 Escort ServiceEnjoy Night⚡Call Girls Dlf City Phase 3 Gurgaon >༒8448380779 Escort Service
Enjoy Night⚡Call Girls Dlf City Phase 3 Gurgaon >༒8448380779 Escort Service
 
Shikrapur - Call Girls in Pune Neha 8005736733 | 100% Gennuine High Class Ind...
Shikrapur - Call Girls in Pune Neha 8005736733 | 100% Gennuine High Class Ind...Shikrapur - Call Girls in Pune Neha 8005736733 | 100% Gennuine High Class Ind...
Shikrapur - Call Girls in Pune Neha 8005736733 | 100% Gennuine High Class Ind...
 
6.High Profile Call Girls In Punjab +919053900678 Punjab Call GirlHigh Profil...
6.High Profile Call Girls In Punjab +919053900678 Punjab Call GirlHigh Profil...6.High Profile Call Girls In Punjab +919053900678 Punjab Call GirlHigh Profil...
6.High Profile Call Girls In Punjab +919053900678 Punjab Call GirlHigh Profil...
 
VVVIP Call Girls In Connaught Place ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...
VVVIP Call Girls In Connaught Place ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...VVVIP Call Girls In Connaught Place ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...
VVVIP Call Girls In Connaught Place ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...
 
Call Girls In Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls In Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
All Time Service Available Call Girls Mg Road 👌 ⏭️ 6378878445
All Time Service Available Call Girls Mg Road 👌 ⏭️ 6378878445All Time Service Available Call Girls Mg Road 👌 ⏭️ 6378878445
All Time Service Available Call Girls Mg Road 👌 ⏭️ 6378878445
 
DDoS In Oceania and the Pacific, presented by Dave Phelan at NZNOG 2024
DDoS In Oceania and the Pacific, presented by Dave Phelan at NZNOG 2024DDoS In Oceania and the Pacific, presented by Dave Phelan at NZNOG 2024
DDoS In Oceania and the Pacific, presented by Dave Phelan at NZNOG 2024
 
Russian Call Girls Pune (Adult Only) 8005736733 Escort Service 24x7 Cash Pay...
Russian Call Girls Pune  (Adult Only) 8005736733 Escort Service 24x7 Cash Pay...Russian Call Girls Pune  (Adult Only) 8005736733 Escort Service 24x7 Cash Pay...
Russian Call Girls Pune (Adult Only) 8005736733 Escort Service 24x7 Cash Pay...
 
Al Barsha Night Partner +0567686026 Call Girls Dubai
Al Barsha Night Partner +0567686026 Call Girls  DubaiAl Barsha Night Partner +0567686026 Call Girls  Dubai
Al Barsha Night Partner +0567686026 Call Girls Dubai
 
Call Now ☎ 8264348440 !! Call Girls in Green Park Escort Service Delhi N.C.R.
Call Now ☎ 8264348440 !! Call Girls in Green Park Escort Service Delhi N.C.R.Call Now ☎ 8264348440 !! Call Girls in Green Park Escort Service Delhi N.C.R.
Call Now ☎ 8264348440 !! Call Girls in Green Park Escort Service Delhi N.C.R.
 
Pune Airport ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready...
Pune Airport ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready...Pune Airport ( Call Girls ) Pune  6297143586  Hot Model With Sexy Bhabi Ready...
Pune Airport ( Call Girls ) Pune 6297143586 Hot Model With Sexy Bhabi Ready...
 
Call Now ☎ 8264348440 !! Call Girls in Sarai Rohilla Escort Service Delhi N.C.R.
Call Now ☎ 8264348440 !! Call Girls in Sarai Rohilla Escort Service Delhi N.C.R.Call Now ☎ 8264348440 !! Call Girls in Sarai Rohilla Escort Service Delhi N.C.R.
Call Now ☎ 8264348440 !! Call Girls in Sarai Rohilla Escort Service Delhi N.C.R.
 
Dubai=Desi Dubai Call Girls O525547819 Outdoor Call Girls Dubai
Dubai=Desi Dubai Call Girls O525547819 Outdoor Call Girls DubaiDubai=Desi Dubai Call Girls O525547819 Outdoor Call Girls Dubai
Dubai=Desi Dubai Call Girls O525547819 Outdoor Call Girls Dubai
 

All AI Roads lead to Distribution - Dot AI

  • 1. All AI Roads Lead to Distribution jim_dowling dotAI Paris 2018
  • 2. ©2018 Logical Clocks AB. All Rights Reserved 2/38 “Methods that scale with computation are the future of AI”* - Rich Sutton (Founding Father of Reinforcement Learning) * https://www.youtube.com/watch?v=EeMCEQa85tw
  • 3. ©2018 Logical Clocks AB. All Rights Reserved Massive Increase in Compute for AI* Distributed Systems 3.5 month-doubling time *https://blog.openai.com/ai-and-compute 3/38
  • 4. ©2018 Logical Clocks AB. All Rights Reserved Model Parallelism Data Parallelism GPU2 GPU3 GPU1Layers1-100 Layer100-199 Layer200-299 Training/Test Data Predictions GPU0 GPU1 GPU99 Distributed Filesystem 32 files 32 files 32 files Batch Size of 3200 One Big Model on 3 GPUs Copy of the Model on each GPU Aggregate Gradients New Model Barrier 4/38
  • 5. ©2018 Logical Clocks AB. All Rights Reserved CNNs Distribution Facebook vs Google ImageNet Mini AI Arms Race 5/38
  • 6. ©2018 Logical Clocks AB. All Rights Reserved ImageNet [Image from https://www.slideshare.net/xavigiro/image-classification-on-imagenet-d1l4-2017-upc-deep-learning-for-computer-vision ] 6/38
  • 8. ©2018 Logical Clocks AB. All Rights Reserved Improvements in ImageNet – Accuracy 81.5 82 82.5 83 83.5 84 84.5 GOOGLE-1 (MAY '17) GOOGLE-2 (MARCH '18) FACEBOOK (MAY '18) Top-1 Accuracy for ImageNet ImageNet Top-1 AutoML with RL AutoML with GA Facebook- https://goo.gl/ERpJyr Google-1: https://goo.gl/EV7Xv1 Google-2:https://goo.gl/eidnyQ More Compute 1 Bn Images More Data 8/38
  • 9. ©2018 Logical Clocks AB. All Rights Reserved Reduced Generalization Error Hyper Parameter Optimization Larger Training Datasets Better Regularization Methods Better Optimization Algorithms Design Better Models Methods for Improving Model Accuracy Distribution 9/38
  • 10. Faster Training of ImageNet Single GPU [Image from https://www.matroid.com/scaledml/2018/jeff.pdf] Clustered GPUs
  • 11. ©2018 Logical Clocks AB. All Rights Reserved Improvements in ImageNet – Training Time June’17 Oct’17 FB - https://goo.gl/ERpJyr Google-1: https://goo.gl/EV7Xv1 Google-2:https://goo.gl/eidnyQ 0 10 20 30 40 50 60 FACEBOOK (JUNE '17) GOOGLE-1 (DEC '17) GOOGLE-2 (MARCH '18) Training Time (mins) ImageNet >75% top-1 Training Time (mins) 256 Nvidia P100s 256 TPUv2s 256 TPUv2s 11/38
  • 12. ©2018 Logical Clocks AB. All Rights Reserved 0 20 40 60 80 100 120 140 160 FACEBOOK (JUNE '17) GOOGLE-1 (DEC '17) GOOGLE-2 (MARCH '18) 1000s Files/Sec Processed Files Processed (K/s) ImageNet – Files/Sec Processed June’17 FB - https://goo.gl/ERpJyr Google-1: https://goo.gl/EV7Xv1 Google-2:https://goo.gl/eidnyQ HDFS ~80K/s 12/38
  • 13. ©2018 Logical Clocks AB. All Rights Reserved 0 200 400 600 800 1000 1200 FACEBOOK GOOGLE-1 GOOGLE-2 HOPSFS Distributed Filesystem Read Throughput Files Processed (K/s) Remove Bottleneck and Keep Scaling HopsFS - https://goo.gl/yFCsGc Scale Challenge Winner (2017) Hops PLUG for Hops 13/38
  • 14. ©2018 Logical Clocks AB. All Rights Reserved All AI Roads Lead to Distribution Distributed Deep Learning Hyper Parameter Optimization Distributed Training Larger Training Datasets Elastic Model Serving Parallel Experiments (Commodity) GPU Clusters Auto ML 14/38
  • 15. THEY WILL IGNORE YOU, DISTRIBUTION UNTIL THEY NEED YOU DISTRIBUTION
  • 16. ©2018 Logical Clocks AB. All Rights Reserved Top 10 List: Why Big Data/Distribution sucks 1. I don’t know which is scarier: reading Stephen King’s horror novel IT or reading a file from HDFS in Python 2. MapReduce – how can a programming model with only 2 parts have O(N2) complexity? 3. I managed to install Hadoop – at the cost of my last relationship…the Spark died! 4. “Jupyter’s down again” - some service you never heard of stopped it working…. 5. I still don’t know which is worse - a dead service or a dead slow service. 16/38
  • 17. ©2018 Logical Clocks AB. All Rights Reserved Top 10 List: why Big Data/Distribution sucks 6. How do I debug this thing on my laptop? 7. Enough with the Dockerfiles and YAML files, when can I start writing *real* code? 8. Isn’t it supposed to be faster when I add a new server? 9. Do I really have to install the library on *all* of the servers? 10.Distributed? I don’t even do multi-threaded! 17/38
  • 18. KubeFlow GPU/TPU Services in the Cloud and On-Premise •Managed Cloud Platforms -RiseML -FloydHub -Google Cloud ML -Microsoft Batch AI -AWS Sagemaker •On-Premise/Cloud Platforms -KubeFlow -Hops 18/38
  • 19. ©2018 Logical Clocks AB. All Rights Reserved Hops Open-Source Data Science Platform 19/38
  • 20. ©2018 Logical Clocks AB. All Rights Reserved PySpark End-to-End ML Pipeline in Python on Hops 20/38 Data Collection Experimentation Training Serving Feature Extraction Data Transformation & Verification Test TfServingTensorFlow
  • 21. FLIP-O-RAMA with Python and HDFS/Hops Left Hand Here Right Thumb Here Data Collection Experimentation Training Serving Feature Extraction Data Transformation & Verification Test
  • 22. import hops.hdfs as hdfs features = ["Age", "Occupation", "Sex", …, "Country"] with open(hdfs.project_path() + "/TestJob/data/census/adult.data", "r") as trainFile: train_data=pd.read_csv(trainFile, names=features, sep=r's*,s*', engine='python', na_values="?") Data Ingestion with Pandas Right Thumb Here
  • 23. import hops.hdfs as hdfs features = ["Age", "Occupation", "Sex", …, "Country"] h = hdfs.get_fs() with h.open_file(hdfs.project_path() + "/TestJob/data/census/adult.data", "r") as trainFile: train_data=pd.read_csv(trainFile, names=features, sep=r's*,s*', engine='python', na_values="?") Data Ingestion with Pandas on Hops Right Thumb Here
  • 24. Google Facets 24/38 Data Collection Experimentation Training Serving Feature Extraction Data Transformation & Verification Test
  • 25. train_data = pd.read_csv(…) test_data = pd.read_csv(…) facets.overview(train_data, test_data) facets.dive(test_data.to_json(orient='records')) Google Facets 25/38
  • 26. Big Data Preparation with PySpark images = spark.readImages(IMAGE_PATH, recursive = True, numPartitions=10, sampleRatio = 0.1).cache() tr = (ImageTransformer().setOutputCol(“transformed”) .resize(height = 200, width = 200) .crop(0, 0, height = 180, width = 180) ) smallImages = tr.transform(images).select(“transformed”) dfutil.saveAsTFRecords(smallImages, OUTPUT_DIR) 26/38
  • 27. Experimentation and Training Data Collection Experimentation Training Serving Feature Extraction Data Transformation & Verification Test
  • 28. GridSearch with TensorFlow/Spark on Hops def train(learning_rate, dropout): [TensorFlow Code here] args_dict = {'learning_rate': [0.001, 0.005, 0.01], 'dropout': [0.5, 0.6]} experiment.launch(spark, train, args_dict) Launch 6 Spark Executors 28/38
  • 29. Model Architecture Search on TensorFlow/Hops def train_cifar10(learning_rate, dropout, num_layers): [TensorFlow Code here] dict = {'learning_rate': [0.005, 0.00005], 'dropout': [0.01, 0.99], 'num_layers': [1,3]} experiment.evolutionary_search(spark, train, dict, direction='max', popsize=10, generations=3, crossover=0.7, mutation=0.5) 29/38 Launch 10 Spark Executors
  • 30. ©2018 Logical Clocks AB. All Rights Reserved Differential Evolution in Tensorboard (1/4) CIFAR-10 30/38
  • 31. ©2018 Logical Clocks AB. All Rights Reserved Differential Evolution in Tensorboard (2/4) CIFAR-10 31/38
  • 32. ©2018 Logical Clocks AB. All Rights Reserved Differential Evolution in Tensorboard (3/4) CIFAR-10 32/38
  • 33. ©2018 Logical Clocks AB. All Rights Reserved Differential Evolution in Tensorboard (4/4) CIFAR-10 33/38
  • 34. Distributed Training with Horovod hvd.init() opt = hvd.DistributedOptimizer(opt) if hvd.local_rank()==0: [TensorFlow Code here] ….. else: [TensorFlow Code here] ….. 34/38
  • 35. ©2018 Logical Clocks AB. All Rights Reserved Model Serving: Monitor + Retrain 35/38
  • 36. Summary •The future of Deep Learning is Distributed https://www.oreilly.com/ideas/distributed-tensorflow •Hops is a new Data Platform with first-class support for Python / Deep Learning / ML / Data Governance / GPUs *https://twitter.com/karpathy/status/972701240017633281 “It is starting to look like deep learning workflows of the future feature autotuned architectures running with autotuned compute schedules across arbitrary backends.” Andrej Karpathy - Head of AI @ Tesla 36/38
  • 37. ©2018 Logical Clocks AB. All Rights Reserved The Team Jim Dowling, Seif Haridi, Gautier Berthou, Salman Niazi, Mahmoud Ismail, Theofilos Kakantousis, Ermias Gebremeskel, Antonios Kouzoupis, Alex Ormenisan, Fabio Buso, Robin Andersson, August Bonds. Active: Alumni: Vasileios Giannokostas, Johan Svedlund Nordström,Rizvi Hasan, Paul Mälzer, Bram Leenders, Juan Roca, Misganu Dessalegn, K “Sri” Srijeyanthan, Jude D’Souza, Alberto Lorente, Andre Moré, Ali Gholami, Davis Jaunzems, Stig Viaene, Hooman Peiro, Evangelos Savvidis, Steffen Grohsschmiedt, Qi Qi, Gayana Chandrasekara, Nikolaos Stanogias, Daniel Bali, Ioannis Kerkinos, Peter Buechler, Pushparaj Motamari, Hamid Afzali, Wasif Malik, Lalith Suresh, Mariano Valles, Ying Lieu, Fanti Machmount Al Samisti, Braulio Grana, Adam Alpire, Zahin Azher Rashid, ArunaKumari Yedurupaka, Tobias Johansson , Roberto Bampi, Roshan Sedar. www.hops.io @hopshadoop

Hinweis der Redaktion

  1. AI is going distributed. These words come from Rich Sutton, and that is significant because the pre-Deepmind reinforcement learning developed by him and others did not scale with available compute. It was single-agent, single-threaded AI. But that has changed in the last few years as we can see on this chart.
  2. If we take AlexNet as the starting point for the deep learning revolution on hardware accelerators, like GPUs, since then, in the last 5 years, we have seen a 300,000 fold increase in the number of compute operations required to train the state-of-the-art deep learning models, recently with Neural Machine translation by Google and Alpha Go Zero by DeepMind. That is a 3 and a half month doubling time. And if we compare that with Moore’s Law - Moore’s law states that the number of transistors on a chip double every 18 months. When Moore’s Law hit the wall in the mid 2000s, it crossed that chasm by going multi-core. This increase in compute in the last 5 years has not happened because Nvidia doubled the number of cores on their GPUs every 3.5 months. Deep Learning also hit a wall around 2016. We crossed that chasm by going distributed. And yes, AlphaGo would not have beaten the world’s best Go Player if it had not been a distributed system.
  3. DL years are like dog years – so the last 5 years is a half a lifetime. Let’s look at the last 12 months. anthropomorphized by Yann Le Cunn and Jeff Dean
  4. 1 billion images with 1500 hashtags from instagram, trained over several weeks on 336 GPUs.
  5. Better Optimization algorithms –large-batch second-order methods like K-FAC Better Regularization methods – like Dropout, The other 3 – which were the domain of ML experts can now be the domain of AI itself. Throw compute and storage at them and they will do better than us.
  6. The first phase of the Battle begins with who can train DNNs faster
  7. Facebook initiated the hostilities….
  8. A friend tried to explain Paxos to me – it was painful for both of us.
  9. How can it be slower this time? I haven’t changed anything! What do you mean service X have moved? A WHAT caused my system to crash?!? What do you mean the time isn’t the same at all servers? Hey Spark, I just want to collect TBs at the Driver….what’s your problem with that?
  10. KubeFlow – Data Scientists write infrastructure (Dockerfiles, Kubernetes YML files) and ML/DL code
  11. Code over Visual Programming Languages: Python/R ML Frameworks: scikit-learn, SparkML Notebooks: Jupyter DL Frameworks: Keras-TensorFlow/PyTorch
  12. we have this dataset – the census data from the US. The hdfs module we import will just give us a path to a project – not a HDFS path, just a “/Project/dotai” path.
  13. Flickorama
  14. Higher-Level APIs in TensorFlow Estimator, Experiment, and Dataset tensorflow.contrib.learn.python.learn.estimators Tuner.run_experiment(..) Spark can launch many TensorFlow jobs in parallel for Hyperparameter Tuning One experiment per Mapper
  15. Higher-Level APIs in TensorFlow Estimator, Experiment, and Dataset tensorflow.contrib.learn.python.learn.estimators Tuner.run_experiment(..) Spark can launch many TensorFlow jobs in parallel for Hyperparameter Tuning One experiment per Mapper
  16. In other words, the future of deep learning is “distributed”.
  17. HopsFS is a drop-in replacement for HDFS that can scale Hadoop Filesystem (HDFS) to over 1 million ops/s by migrating the NameNode metadata to an external scale-out in-memory database. This talk will introduce recent improvements in HopsFS: storing small files in the database (both in-memory and on SSD disk tables), a new scalable block-reporting protocol, support for erasure-coding with data locality, and work on multi-data center replication. For small files (under 64-128 KB), HopsFS can reduce read latency to under 10ms, while also improving read throughput by 3-4X and write throughput by >15X. Our new block reporting protocol reduces block reporting traffic by up to 99% for large clusters, at the cost of a small increase in metadata. While our solution for erasure-coding is implemented at the block-level preserving data locality. Finally, our ongoing work on geographic replication points a way forward for HDFS in the cloud, providing data-center level high availability without any performance hit. Like Maslo's hierarchy of needs, the AI hierarchy of needs describes the stepwise requirements that organizations must go through to become data-driven and intelligent in their use of data. First, organizations must reliably ingest and manage their data assets with pipelines - Hadoop being the natural data platform for this first level. Only then, can organizations progress to understanding their data in hindsight with analytics tools, scale-out databases, and visualization. From here, organizations need to transition to predictive analytics, labelling data, building traditional machine learning and deep learning models, and managing features and experiments. From here to hyperscale AI companies, organizations need to scale out their neural network training to tens or hundreds of GPUs and automate their machine learning processes. In this talk, we will traverse this AI hierarchy of needs for Hadoop and TensorFlow. We will start at support for GPUs in YARN, and work our way up to distributing TensorFlow (using parameter server (TensorFlowOnSpark) and AllReduce (Horovod) frameworks) across many servers, while a unified feature store based on HDFS, Hive, and Kafka.