Spark Technology Center
Convolutional Neural Networks at Scale in MLlib
Jeremy Nixon
1. Machine Learning Engineer at the Spark Technology Center
2. Contributor to MLlib, dedicated to scalable deep learning.
3. Previously, studied Applied Mathematics to Computer Science and Economics at Harvard
Jeremy Nixon
Large Scale Data Processing
● In-memory compute
● Up to 100x faster than Hadoop MapReduce for in-memory workloads
Improved Usability
● Rich APIs in Scala, Java, Python
● Interactive Shell
Spark’s Machine Learning Library
● Alternating Least Squares
● Lasso
● Ridge Regression
● Logistic Regression
● Decision Trees
● Naive Bayes
● SVMs
● …
MLlib
● Part of Spark
● Integrated Data Analysis
● Scalable
● Python, Scala, Java APIs
MLlib
● Deep Learning benefits from large datasets
● Spark allows for Large Scale Data Analysis
● Compute is Local to Data
● Integrated into organization’s Spark Jobs
● Leverages existing compute cluster
Deep Learning in MLlib
Github Link:
https://github.com/JeremyNixon/sparkdl
Spark Package:
https://spark-packages.org/package/JeremyNixon/sparkdl
Links
1. Framing Deep Learning
2. MLlib Deep Learning API
3. Optimization
4. Performance
5. Future Work
6. Deep Learning Options on Spark
7. Deep Learning Outside of Spark
Structure
1. Structural Assumptions
2. Automated Feature Engineering
3. Learning Representations
4. Applications
Framing Convolutional Neural Networks
Structural Assumptions: Location Invariance
- Convolution is a restriction on the features that can be combined.
- Location Invariance leads to strong accuracy in vision, audio, and language.
colah.github.io
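A minimal sketch of the location-invariance idea, assuming a toy 1D signal and a hand-picked kernel (both hypothetical, not taken from the talk). Sliding one shared kernel over the input makes the response move with the pattern, and a global max pool then gives the same score wherever the pattern sits.

```python
def conv1d(signal, kernel):
    """Valid cross-correlation: slide one shared kernel over the signal."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


def global_max_pool(feature_map):
    """Keep only the strongest response, discarding where it occurred."""
    return max(feature_map)


# Hypothetical pattern detector and two inputs holding the same
# pattern at different positions (the second is shifted right by 2).
kernel = [1.0, 2.0, 1.0]
pattern_left = [0, 0, 1, 2, 1, 0, 0, 0]
pattern_right = [0, 0, 0, 0, 1, 2, 1, 0]

response_left = conv1d(pattern_left, kernel)
response_right = conv1d(pattern_right, kernel)

# The peak moves with the pattern (index 2 versus index 4) ...
peak_left = response_left.index(max(response_left))
peak_right = response_right.index(max(response_right))

# ... but the pooled score is identical for both inputs.
score_left = global_max_pool(response_left)
score_right = global_max_pool(response_right)
```

The convolution alone is shift-equivariant; it is the pooling step in this sketch that turns that into invariance.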
Structural Assumptions: Hierarchical Abstraction
- Pixels - Edges - Shapes - Parts - Objects
- Learn features that are optimized for the data
- Makes transfer learning feasible
Structural Assumptions: Hierarchical Abstraction
- Character - Word - Phrase - Sentence
- Phonemes - Words
- Pixels - Edges - Shapes - Parts - Objects
Structural Assumptions: Composition
1. CNNs - State of the art
a. Object Recognition
b. Object Localization
c. Image Segmentation
d. Image Restoration
e. Music Recommendation
2. RNNs (LSTM) - State of the Art
a. Speech Recognition
b. Question Answering
c. Machine Translation
d. Text Summarization
e. Named Entity Recognition
f. Natural Language Generation
g. Word Sense Disambiguation
h. Image / Video Captioning
i. Sentiment Analysis
Applications
● Computationally Efficient
● Makes Transfer Learning Easy
● Takes advantage of location invariance
Structural Assumptions: Weight Sharing
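The efficiency claim for weight sharing can be made concrete by counting parameters. The sketch below assumes a hypothetical 28x28 single-channel input and compares a 3x3 convolution with 8 filters against a fully connected layer that produces an output of the same size; the layer sizes are illustrative and do not come from the talk.

```python
def dense_params(n_in, n_out):
    """Fully connected layer: one weight per input/output pair, plus biases."""
    return n_in * n_out + n_out


def conv_params(kernel_h, kernel_w, in_channels, out_channels):
    """Convolutional layer: each filter is reused at every spatial position."""
    return kernel_h * kernel_w * in_channels * out_channels + out_channels


# Hypothetical layer sizes: a 28x28x1 image and a valid 3x3 convolution
# with 8 filters, which yields a 26x26x8 output.
n_inputs = 28 * 28 * 1
n_outputs = 26 * 26 * 8

conv_count = conv_params(3, 3, 1, 8)             # 80 parameters
dense_count = dense_params(n_inputs, n_outputs)  # millions of parameters
ratio = dense_count / conv_count
```

Because the same 80 parameters are applied at every position, the convolutional layer's size is independent of the image resolution, which is also part of what makes reusing trained filters on new data practical.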
- Network depth creates an extraordinary range of possible models.
- That flexibility makes large datasets valuable, since more data reduces the variance of a highly flexible model.
Structural Assumptions: Combinatorial Flexibility
Automated Feature Engineering
- Feature hierarchy is too complex to engineer manually
- Works well for compositional structure, overfits elsewhere
Learning Representations
Hidden Layer + Nonlinearity
http://colah.github.io/posts/2014-03-NN-Manifolds-Topology/
Flexibility. High level enough to be efficient. Low level enough to be expressive.
MLlib Flexible Deep Learning API
Modularity enables Logistic Regression, Feedforward Networks.
MLlib Flexible Deep Learning API
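To illustrate what modularity buys, here is a small sketch in plain Python. The class names (`Affine`, `Sigmoid`, `ReLU`, `Sequential`) and all weights are hypothetical and are not the sparkdl or MLlib API; the point is only that the same building blocks yield logistic regression with one affine layer and a feedforward network with two.

```python
import math


class Affine:
    """Linear map plus bias; `weights` holds one row per output unit."""
    def __init__(self, weights, bias):
        self.weights = weights
        self.bias = bias

    def forward(self, x):
        return [sum(w * xi for w, xi in zip(row, x)) + b
                for row, b in zip(self.weights, self.bias)]


class Sigmoid:
    def forward(self, x):
        return [1.0 / (1.0 + math.exp(-v)) for v in x]


class ReLU:
    def forward(self, x):
        return [max(0.0, v) for v in x]


class Sequential:
    """Apply layers in order; each layer only needs a forward method."""
    def __init__(self, *layers):
        self.layers = layers

    def forward(self, x):
        for layer in self.layers:
            x = layer.forward(x)
        return x


# Logistic regression is a single affine layer followed by a sigmoid.
logistic_regression = Sequential(Affine([[0.5, -0.25]], [0.0]), Sigmoid())

# A feedforward network stacks more of the same pieces.
feedforward = Sequential(
    Affine([[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0]), ReLU(),
    Affine([[1.0, 1.0]], [-0.5]), Sigmoid(),
)

lr_out = logistic_regression.forward([2.0, 4.0])
ff_out = feedforward.forward([1.0, 0.0])
```

Only the forward pass is shown; a trainable version would also give each layer a backward method.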
Optimization
Modern optimizers allow for more efficient, stable training.
Momentum cancels noise in the gradient.
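A sketch of the classical momentum update, under the assumption of a made-up gradient stream (a steady signal of +1 with noise alternating between +3 and -3). The learning rate and momentum coefficient are illustrative defaults, not values from the talk.

```python
def momentum_step(w, v, grad, lr=0.1, mu=0.9):
    """Classical momentum: v is a decaying running sum of past gradient steps."""
    v = mu * v - lr * grad
    return w + v, v


# Hypothetical gradient stream: steady signal +1 plus alternating noise.
grads = [1.0 + (3.0 if t % 2 == 0 else -3.0) for t in range(6)]

w, v = 0.0, 0.0
velocities = []
for g in grads:
    w, v = momentum_step(w, v, g)
    velocities.append(v)

# Plain SGD steps for comparison: these flip sign with the noise.
sgd_steps = [-0.1 * g for g in grads]
```

In this toy run every momentum step points in the direction of the steady signal, while the plain SGD steps reverse direction every other iteration; the alternating component partly cancels inside the running sum.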
Optimization
Modern optimizers allow for more efficient, stable training.
RMSProp automatically adapts the learning rate.
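A sketch of the standard RMSProp update for a single parameter, with illustrative hyperparameters (the defaults below are assumptions, not values from the talk). Dividing by a running root-mean-square of the gradient gives parameters with very different gradient scales nearly the same effective step.

```python
import math


def rmsprop_step(w, cache, grad, lr=0.01, decay=0.9, eps=1e-8):
    """RMSProp: scale the step by a running RMS of recent gradients."""
    cache = decay * cache + (1.0 - decay) * grad * grad
    w = w - lr * grad / (math.sqrt(cache) + eps)
    return w, cache


# Two hypothetical parameters whose gradients differ by four orders
# of magnitude.
w_big, cache_big = rmsprop_step(0.0, 0.0, grad=100.0)
w_small, cache_small = rmsprop_step(0.0, 0.0, grad=0.01)

step_big = abs(w_big)
step_small = abs(w_small)
```

With plain gradient descent the two steps would differ by a factor of 10,000; here both come out close to `lr / sqrt(1 - decay)`, which is what "adapting the learning rate" means in practice.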
Parallel implementation of backpropagation:
1. Each worker gets weights from master node.
2. Each worker computes a gradient on its data.
3. Each worker sends gradient to master.
4. Master averages the gradients and updates the weights.
Distributed Optimization
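The four steps above can be sketched on a single machine. This is a simulation under stated assumptions, not the sparkdl implementation: workers are plain Python lists, the model is a one-weight linear regression, and the function names and data are hypothetical.

```python
def local_gradient(weights, partition):
    """Mean-squared-error gradient of a linear model on one worker's data."""
    grad = [0.0] * len(weights)
    for features, label in partition:
        error = sum(w * x for w, x in zip(weights, features)) - label
        for i, x in enumerate(features):
            grad[i] += 2.0 * error * x / len(partition)
    return grad


def master_step(weights, partitions, lr):
    # Steps 1-3: each worker receives the weights and returns a gradient
    # computed only on its own partition.
    grads = [local_gradient(weights, p) for p in partitions]
    # Step 4: the master averages the gradients and updates the weights.
    avg = [sum(g[i] for g in grads) / len(grads) for i in range(len(weights))]
    return [w - lr * a for w, a in zip(weights, avg)]


# Two simulated workers holding equal-sized slices of data from y = 2 * x.
partitions = [
    [([1.0], 2.0), ([2.0], 4.0)],
    [([3.0], 6.0), ([4.0], 8.0)],
]

weights = [0.0]
for _ in range(50):
    weights = master_step(weights, partitions, lr=0.02)
```

Because the partitions here have equal size, the averaged gradient equals the full-batch gradient. On a real cluster the list comprehension would become a map over partitions and the average a reduce back to the driver, and that round trip is the communication cost that limits scaling.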
● Parallel MLP on Spark with 7 nodes ~= Caffe w/GPU (single node).
● Advantages to parallelism diminish with additional nodes due to communication costs.
● Additional workers are valuable up to ~20 workers.
● See https://github.com/avulanov/ann-benchmark for more details
Performance
Github: https://github.com/JeremyNixon/sparkdl
Spark Package: https://spark-packages.org/package/JeremyNixon/sparkdl
Access
1. GPU Acceleration (External)
2. Python API
3. Keras Integration
4. Residual Layers
5. Hardening
6. Regularization
7. Batch Normalization
8. Tensor Support
Future Work
Deep Learning on Spark
1. Major Projects
a. DL4J
b. BigDL
c. Spark-deep-learning
d. Tensorflow-on-Spark
e. SystemML
2. Important Comparisons
3. Minor & Abandoned Projects
a. H2O.ai Deep Water
b. TensorFrames
c. Caffe-on-Spark
d. Scalable-deep-learning
e. MLlib Deep Learning
f. Sparknet
g. DeepDist
DL4J
● Distributed GPU support for all major deep learning architectures
○ CPU / Distributed CPU / Single GPU options exist
○ Supports Convolutional Nets, LSTMs / RNNs, Feedforward Nets, Word2Vec
● Actively Supported and Improved
● APIs in Java, Scala, Python
○ Fairly inelegant API; there’s an option through ScalNet (Keras-like front end)
○ Working towards becoming a Keras Backend
● Backed by Skymind (Committed)
○ ~15 person startup, Adam Gibson + Chris Nicholson
● Modular front end in DL4J
● Backed by linear algebra library ND4J
○ Numerical computing wrapper over BLAS for various backends
● Python API has Keras import / export
● Production with proprietary ‘Skymind Intelligence Layer’
BigDL
● Distributed CPU based library
○ Backed by Intel MKL / multithreading
○ No benchmark out as yet
● Support for most major deep learning architectures
○ Convolutional Networks, RNNs, LSTMs, no Word2Vec / Glove
● Backed by Intel (Committed)
○ Actively Supported / Improved
○ Intel has already acquired Nirvana and partnered with Chainer - strategy here is unclear.
○ Intel doesn’t look to be supporting their own Xeon GPU with BigDL
● Scala and Python API Support
○ API Modeled after Torch
● Support for numeric computing via tensors
Spark-deep-learning
● Databricks’ library focused on model serving, to allow scaled out inference
● ‘Transfer Learning’ (Allows logistic regression layer to be retrained)
● Python API
○ One-liner for integrating Keras model into a pipeline
● Supports Tensorflow models
○ Keras Import for Tensorflow backed Keras Models
● Support for image processing only
● Weakly Supported by Databricks
○ Last commit was a month ago
○ Qualifying lines - “We will implement text processing, audio processing if there is interest”
Caffe / Tensorflow-on-Spark
1. Goal is to scale out Caffe / Tensorflow on heterogeneous GPU / CPU setup
a. Each executor launches a Caffe / TF instance
b. RDMA / Infiniband for distributing compute in TF on Spark, improvement over TF’s ethernet model
2. Goal is to minimize changes to Tensorflow / Caffe code during scaleout
3. Allows for Model / Data parallelism
4. Weakly supported by Yahoo
a. Caffe-on-spark hasn’t seen a commit in 6 months
b. Tensorflow-on-spark gets about 2 minor commits / month
5. Yahoo demonstrated capability on large scale Flickr dataset
6. Visualization with tensorboard
SystemML
● Deep Learning library with single-node GPU support, moving towards distributed GPU support
○ Supports CNNs for Classification, Localization, Segmentation
○ Supports RNNs / LSTM
● Attached to linear algebra focused ML library w/ linear algebra compiler
● Backed by IBM
○ Actively being Improved
● Provides CPU based support for most computer vision tasks
○ Convolutional Networks
● Caffe2DML for caffe integration
● DML API
○ SystemML has Python API for a handful of algorithms, may come out with Python DL API
Important Comparisons
| Framework | Hardware | Supported Models | API |
| --- | --- | --- | --- |
| DL4J | CPU / GPU, Distributed CPU / GPU | CNNs, RNNs, Feedforward Nets, Word2Vec | Java, Scala, Python |
| BigDL | CPU / Distributed CPU | CNNs, RNNs, Feedforward Nets | Scala, Python |
| Spark-Deep-Learning | CPU / Distributed CPU | Vision - CNNs, Feedforward Nets | Python |
| Caffe / Tensorflow on Spark | CPU / GPU, Distributed CPU / GPU | CNNs, RNNs, Feedforward Nets, Word2Vec | Python |
| SystemML Deep Learning | CPU, Towards GPU / Distributed GPU | CNNs, RNNs, Feedforward Nets | DML, Potentially Python |
Important Comparisons
| Framework | Support Strength | Goal | Distinguishing Value |
| --- | --- | --- | --- |
| DL4J | Skymind. Fully focused on package, but still a Startup. | Fully fledged Deep Learning solution from training to production | Comprehensive, Distributed GPU. |
| BigDL | Intel. Fairly strong AI/DL commitment. Has Chainer, Nirvana. | Spark / Hadoop solution, bring DL to the data | Comprehensive |
| Spark-Deep-Learning | Databricks, ambiguous level of commitment | Scaleout solution for TF users | Scaling out with Spark at inference time |
| Caffe / Tensorflow on Spark | Yahoo. Caffe-on-spark looks abandoned, TF-on-Spark better. | Scaling out training on heterogeneous hardware. | Scaling out training with distributed CPU / GPU. |
| SystemML Deep Learning | IBM team. | Deep Learning Training solution | GPU Support, Moving towards Distributed GPU Support. |
Minor & Abandoned Projects
1. H2O.ai Deep Water
a. Integrates other frameworks (TF, MXNet, Caffe) into H2O Platform
b. Only native support is for feedforward networks
2. MXNet Integration
a. Nascent, few commits from Microsoft engineer
3. TensorFrames
a. Focused on hyperparameter tuning, running TF instances in parallel. ~ 2 commits / month
4. Caffe-on-Spark
a. No commits for ~6 months
5. Scalable-deep-learning
a. Only supports feedforward networks / autoencoder, CPU based
6. MLlib Deep Learning
a. Only supports feedforward networks, CPU based
7. Sparknet
a. Abandoned, no commits for 18 months
Deep Learning
Outside of Spark
Thank you for your attention!
Questions?

Weitere ähnliche Inhalte

Was ist angesagt?

Conversational AI with Transformer Models
Conversational AI with Transformer ModelsConversational AI with Transformer Models
Conversational AI with Transformer ModelsDatabricks
 
Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...
Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...
Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...MLconf
 
Distributed Models Over Distributed Data with MLflow, Pyspark, and Pandas
Distributed Models Over Distributed Data with MLflow, Pyspark, and PandasDistributed Models Over Distributed Data with MLflow, Pyspark, and Pandas
Distributed Models Over Distributed Data with MLflow, Pyspark, and PandasDatabricks
 
Cloud Computing Was Built for Web Developers—What Does v2 Look Like for Deep...
 Cloud Computing Was Built for Web Developers—What Does v2 Look Like for Deep... Cloud Computing Was Built for Web Developers—What Does v2 Look Like for Deep...
Cloud Computing Was Built for Web Developers—What Does v2 Look Like for Deep...Databricks
 
Using BigDL on Apache Spark to Improve the MLS Real Estate Search Experience ...
Using BigDL on Apache Spark to Improve the MLS Real Estate Search Experience ...Using BigDL on Apache Spark to Improve the MLS Real Estate Search Experience ...
Using BigDL on Apache Spark to Improve the MLS Real Estate Search Experience ...Databricks
 
Best Practices for Engineering Production-Ready Software with Apache Spark
Best Practices for Engineering Production-Ready Software with Apache SparkBest Practices for Engineering Production-Ready Software with Apache Spark
Best Practices for Engineering Production-Ready Software with Apache SparkDatabricks
 
Tactical Data Science Tips: Python and Spark Together
Tactical Data Science Tips: Python and Spark TogetherTactical Data Science Tips: Python and Spark Together
Tactical Data Science Tips: Python and Spark TogetherDatabricks
 
DeepLearning4J and Spark: Successes and Challenges - François Garillot
DeepLearning4J and Spark: Successes and Challenges - François GarillotDeepLearning4J and Spark: Successes and Challenges - François Garillot
DeepLearning4J and Spark: Successes and Challenges - François GarillotSteve Moore
 
Accelerate Your AI Today
Accelerate Your AI TodayAccelerate Your AI Today
Accelerate Your AI TodayDESMOND YUEN
 
DLoBD: An Emerging Paradigm of Deep Learning Over Big Data Stacks with Dhaba...
 DLoBD: An Emerging Paradigm of Deep Learning Over Big Data Stacks with Dhaba... DLoBD: An Emerging Paradigm of Deep Learning Over Big Data Stacks with Dhaba...
DLoBD: An Emerging Paradigm of Deep Learning Over Big Data Stacks with Dhaba...Databricks
 
Productionizing Machine Learning with Apache Spark, MLflow and ONNX from the ...
Productionizing Machine Learning with Apache Spark, MLflow and ONNX from the ...Productionizing Machine Learning with Apache Spark, MLflow and ONNX from the ...
Productionizing Machine Learning with Apache Spark, MLflow and ONNX from the ...Databricks
 
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & Deep Learning ...
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & Deep Learning ...A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & Deep Learning ...
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & Deep Learning ...Databricks
 
Using Spark Mllib Models in a Production Training and Serving Platform: Exper...
Using Spark Mllib Models in a Production Training and Serving Platform: Exper...Using Spark Mllib Models in a Production Training and Serving Platform: Exper...
Using Spark Mllib Models in a Production Training and Serving Platform: Exper...Databricks
 
Predicting Influence and Communities Using Graph Algorithms
Predicting Influence and Communities Using Graph AlgorithmsPredicting Influence and Communities Using Graph Algorithms
Predicting Influence and Communities Using Graph AlgorithmsDatabricks
 
Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018
Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018
Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018Sri Ambati
 
The Potential of GPU-driven High Performance Data Analytics in Spark
The Potential of GPU-driven High Performance Data Analytics in SparkThe Potential of GPU-driven High Performance Data Analytics in Spark
The Potential of GPU-driven High Performance Data Analytics in SparkSpark Summit
 
Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016
Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016
Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016MLconf
 
AI on Spark for Malware Analysis and Anomalous Threat Detection
AI on Spark for Malware Analysis and Anomalous Threat DetectionAI on Spark for Malware Analysis and Anomalous Threat Detection
AI on Spark for Malware Analysis and Anomalous Threat DetectionDatabricks
 
Convolutional Neural Networks at scale in Spark MLlib
Convolutional Neural Networks at scale in Spark MLlibConvolutional Neural Networks at scale in Spark MLlib
Convolutional Neural Networks at scale in Spark MLlibDataWorks Summit
 
CI/CD for Machine Learning with Daniel Kobran
CI/CD for Machine Learning with Daniel KobranCI/CD for Machine Learning with Daniel Kobran
CI/CD for Machine Learning with Daniel KobranDatabricks
 

Was ist angesagt? (20)

Conversational AI with Transformer Models
Conversational AI with Transformer ModelsConversational AI with Transformer Models
Conversational AI with Transformer Models
 
Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...
Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...
Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...
 
Distributed Models Over Distributed Data with MLflow, Pyspark, and Pandas
Distributed Models Over Distributed Data with MLflow, Pyspark, and PandasDistributed Models Over Distributed Data with MLflow, Pyspark, and Pandas
Distributed Models Over Distributed Data with MLflow, Pyspark, and Pandas
 
Cloud Computing Was Built for Web Developers—What Does v2 Look Like for Deep...
 Cloud Computing Was Built for Web Developers—What Does v2 Look Like for Deep... Cloud Computing Was Built for Web Developers—What Does v2 Look Like for Deep...
Cloud Computing Was Built for Web Developers—What Does v2 Look Like for Deep...
 
Using BigDL on Apache Spark to Improve the MLS Real Estate Search Experience ...
Using BigDL on Apache Spark to Improve the MLS Real Estate Search Experience ...Using BigDL on Apache Spark to Improve the MLS Real Estate Search Experience ...
Using BigDL on Apache Spark to Improve the MLS Real Estate Search Experience ...
 
Best Practices for Engineering Production-Ready Software with Apache Spark
Best Practices for Engineering Production-Ready Software with Apache SparkBest Practices for Engineering Production-Ready Software with Apache Spark
Best Practices for Engineering Production-Ready Software with Apache Spark
 
Tactical Data Science Tips: Python and Spark Together
Tactical Data Science Tips: Python and Spark TogetherTactical Data Science Tips: Python and Spark Together
Tactical Data Science Tips: Python and Spark Together
 
DeepLearning4J and Spark: Successes and Challenges - François Garillot
DeepLearning4J and Spark: Successes and Challenges - François GarillotDeepLearning4J and Spark: Successes and Challenges - François Garillot
DeepLearning4J and Spark: Successes and Challenges - François Garillot
 
Accelerate Your AI Today
Accelerate Your AI TodayAccelerate Your AI Today
Accelerate Your AI Today
 
DLoBD: An Emerging Paradigm of Deep Learning Over Big Data Stacks with Dhaba...
 DLoBD: An Emerging Paradigm of Deep Learning Over Big Data Stacks with Dhaba... DLoBD: An Emerging Paradigm of Deep Learning Over Big Data Stacks with Dhaba...
DLoBD: An Emerging Paradigm of Deep Learning Over Big Data Stacks with Dhaba...
 
Productionizing Machine Learning with Apache Spark, MLflow and ONNX from the ...
Productionizing Machine Learning with Apache Spark, MLflow and ONNX from the ...Productionizing Machine Learning with Apache Spark, MLflow and ONNX from the ...
Productionizing Machine Learning with Apache Spark, MLflow and ONNX from the ...
 
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & Deep Learning ...
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & Deep Learning ...A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & Deep Learning ...
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & Deep Learning ...
 
Using Spark Mllib Models in a Production Training and Serving Platform: Exper...
Using Spark Mllib Models in a Production Training and Serving Platform: Exper...Using Spark Mllib Models in a Production Training and Serving Platform: Exper...
Using Spark Mllib Models in a Production Training and Serving Platform: Exper...
 
Predicting Influence and Communities Using Graph Algorithms
Predicting Influence and Communities Using Graph AlgorithmsPredicting Influence and Communities Using Graph Algorithms
Predicting Influence and Communities Using Graph Algorithms
 
Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018
Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018
Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018
 
The Potential of GPU-driven High Performance Data Analytics in Spark
The Potential of GPU-driven High Performance Data Analytics in SparkThe Potential of GPU-driven High Performance Data Analytics in Spark
The Potential of GPU-driven High Performance Data Analytics in Spark
 
Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016
Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016
Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016
 
AI on Spark for Malware Analysis and Anomalous Threat Detection
AI on Spark for Malware Analysis and Anomalous Threat DetectionAI on Spark for Malware Analysis and Anomalous Threat Detection
AI on Spark for Malware Analysis and Anomalous Threat Detection
 
Convolutional Neural Networks at scale in Spark MLlib
Convolutional Neural Networks at scale in Spark MLlibConvolutional Neural Networks at scale in Spark MLlib
Convolutional Neural Networks at scale in Spark MLlib
 
CI/CD for Machine Learning with Daniel Kobran
CI/CD for Machine Learning with Daniel KobranCI/CD for Machine Learning with Daniel Kobran
CI/CD for Machine Learning with Daniel Kobran
 

Ähnlich wie Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf ATL 2017

Integrating Deep Learning Libraries with Apache Spark
Integrating Deep Learning Libraries with Apache SparkIntegrating Deep Learning Libraries with Apache Spark
Integrating Deep Learning Libraries with Apache SparkDatabricks
 
Deep Learning with Spark and GPUs
Deep Learning with Spark and GPUsDeep Learning with Spark and GPUs
Deep Learning with Spark and GPUsDataWorks Summit
 
Data Analytics and Machine Learning: From Node to Cluster on ARM64
Data Analytics and Machine Learning: From Node to Cluster on ARM64Data Analytics and Machine Learning: From Node to Cluster on ARM64
Data Analytics and Machine Learning: From Node to Cluster on ARM64Ganesh Raju
 
BKK16-404B Data Analytics and Machine Learning- from Node to Cluster
BKK16-404B Data Analytics and Machine Learning- from Node to ClusterBKK16-404B Data Analytics and Machine Learning- from Node to Cluster
BKK16-404B Data Analytics and Machine Learning- from Node to ClusterLinaro
 
BKK16-408B Data Analytics and Machine Learning From Node to Cluster
BKK16-408B Data Analytics and Machine Learning From Node to ClusterBKK16-408B Data Analytics and Machine Learning From Node to Cluster
BKK16-408B Data Analytics and Machine Learning From Node to ClusterLinaro
 
Accelerating TensorFlow with RDMA for high-performance deep learning
Accelerating TensorFlow with RDMA for high-performance deep learningAccelerating TensorFlow with RDMA for high-performance deep learning
Accelerating TensorFlow with RDMA for high-performance deep learningDataWorks Summit
 
Scalable and Distributed DNN Training on Modern HPC Systems
Scalable and Distributed DNN Training on Modern HPC SystemsScalable and Distributed DNN Training on Modern HPC Systems
Scalable and Distributed DNN Training on Modern HPC Systemsinside-BigData.com
 
AI and Spark - IBM Community AI Day
AI and Spark - IBM Community AI DayAI and Spark - IBM Community AI Day
AI and Spark - IBM Community AI DayNick Pentreath
 
Spark summit 2019 infrastructure for deep learning in apache spark 0425
Spark summit 2019 infrastructure for deep learning in apache spark 0425Spark summit 2019 infrastructure for deep learning in apache spark 0425
Spark summit 2019 infrastructure for deep learning in apache spark 0425Wee Hyong Tok
 
Big Data Analytics (ML, DL, AI) hands-on
Big Data Analytics (ML, DL, AI) hands-onBig Data Analytics (ML, DL, AI) hands-on
Big Data Analytics (ML, DL, AI) hands-onDony Riyanto
 
Hadoop Summit 2014 - San Jose - Introduction to Deep Learning on Hadoop
Hadoop Summit 2014 - San Jose - Introduction to Deep Learning on HadoopHadoop Summit 2014 - San Jose - Introduction to Deep Learning on Hadoop
Hadoop Summit 2014 - San Jose - Introduction to Deep Learning on HadoopJosh Patterson
 
Hadoop Summit 2014 Distributed Deep Learning
Hadoop Summit 2014 Distributed Deep LearningHadoop Summit 2014 Distributed Deep Learning
Hadoop Summit 2014 Distributed Deep LearningAdam Gibson
 
Hopsworks at Google AI Huddle, Sunnyvale
Hopsworks at Google AI Huddle, SunnyvaleHopsworks at Google AI Huddle, Sunnyvale
Hopsworks at Google AI Huddle, SunnyvaleJim Dowling
 
Hands on image recognition with scala spark and deep learning4j
Hands on image recognition with scala spark and deep learning4jHands on image recognition with scala spark and deep learning4j
Hands on image recognition with scala spark and deep learning4jGuglielmo Iozzia
 
Running Emerging AI Applications on Big Data Platforms with Ray On Apache Spark
Running Emerging AI Applications on Big Data Platforms with Ray On Apache SparkRunning Emerging AI Applications on Big Data Platforms with Ray On Apache Spark
Running Emerging AI Applications on Big Data Platforms with Ray On Apache SparkDatabricks
 
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...inside-BigData.com
 
Infrastructure for Deep Learning in Apache Spark
Infrastructure for Deep Learning in Apache SparkInfrastructure for Deep Learning in Apache Spark
Infrastructure for Deep Learning in Apache SparkDatabricks
 
Deep Learning Frameworks 2019 | Which Deep Learning Framework To Use | Deep L...
Deep Learning Frameworks 2019 | Which Deep Learning Framework To Use | Deep L...Deep Learning Frameworks 2019 | Which Deep Learning Framework To Use | Deep L...
Deep Learning Frameworks 2019 | Which Deep Learning Framework To Use | Deep L...Simplilearn
 

Ähnlich wie Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf ATL 2017 (20)

Integrating Deep Learning Libraries with Apache Spark
Integrating Deep Learning Libraries with Apache SparkIntegrating Deep Learning Libraries with Apache Spark
Integrating Deep Learning Libraries with Apache Spark
 
Deep Learning with Spark and GPUs
Deep Learning with Spark and GPUsDeep Learning with Spark and GPUs
Deep Learning with Spark and GPUs
 
Data Analytics and Machine Learning: From Node to Cluster on ARM64
Data Analytics and Machine Learning: From Node to Cluster on ARM64Data Analytics and Machine Learning: From Node to Cluster on ARM64
Data Analytics and Machine Learning: From Node to Cluster on ARM64
 
BKK16-404B Data Analytics and Machine Learning- from Node to Cluster
BKK16-404B Data Analytics and Machine Learning- from Node to ClusterBKK16-404B Data Analytics and Machine Learning- from Node to Cluster
BKK16-404B Data Analytics and Machine Learning- from Node to Cluster
 
BKK16-408B Data Analytics and Machine Learning From Node to Cluster
BKK16-408B Data Analytics and Machine Learning From Node to ClusterBKK16-408B Data Analytics and Machine Learning From Node to Cluster
BKK16-408B Data Analytics and Machine Learning From Node to Cluster
 
Accelerating TensorFlow with RDMA for high-performance deep learning
Accelerating TensorFlow with RDMA for high-performance deep learningAccelerating TensorFlow with RDMA for high-performance deep learning
Accelerating TensorFlow with RDMA for high-performance deep learning
 
eScience Cluster Arch. Overview
eScience Cluster Arch. OvervieweScience Cluster Arch. Overview
eScience Cluster Arch. Overview
 
Scalable and Distributed DNN Training on Modern HPC Systems
Scalable and Distributed DNN Training on Modern HPC SystemsScalable and Distributed DNN Training on Modern HPC Systems
Scalable and Distributed DNN Training on Modern HPC Systems
 
AI and Spark - IBM Community AI Day
AI and Spark - IBM Community AI DayAI and Spark - IBM Community AI Day
AI and Spark - IBM Community AI Day
 
Spark summit 2019 infrastructure for deep learning in apache spark 0425
Spark summit 2019 infrastructure for deep learning in apache spark 0425Spark summit 2019 infrastructure for deep learning in apache spark 0425
Spark summit 2019 infrastructure for deep learning in apache spark 0425
 
Big Data Analytics (ML, DL, AI) hands-on
Big Data Analytics (ML, DL, AI) hands-onBig Data Analytics (ML, DL, AI) hands-on
Big Data Analytics (ML, DL, AI) hands-on
 
Hadoop Summit 2014 - San Jose - Introduction to Deep Learning on Hadoop
Hadoop Summit 2014 - San Jose - Introduction to Deep Learning on HadoopHadoop Summit 2014 - San Jose - Introduction to Deep Learning on Hadoop
Hadoop Summit 2014 - San Jose - Introduction to Deep Learning on Hadoop
 
Hadoop Summit 2014 Distributed Deep Learning
Hadoop Summit 2014 Distributed Deep LearningHadoop Summit 2014 Distributed Deep Learning
Hadoop Summit 2014 Distributed Deep Learning
 
Deep Learning on Hadoop
Deep Learning on HadoopDeep Learning on Hadoop
Deep Learning on Hadoop
 
Hopsworks at Google AI Huddle, Sunnyvale
Hopsworks at Google AI Huddle, SunnyvaleHopsworks at Google AI Huddle, Sunnyvale
Hopsworks at Google AI Huddle, Sunnyvale
 
Hands on image recognition with scala spark and deep learning4j
Hands on image recognition with scala spark and deep learning4jHands on image recognition with scala spark and deep learning4j
Hands on image recognition with scala spark and deep learning4j
 
Running Emerging AI Applications on Big Data Platforms with Ray On Apache Spark
Running Emerging AI Applications on Big Data Platforms with Ray On Apache SparkRunning Emerging AI Applications on Big Data Platforms with Ray On Apache Spark
Running Emerging AI Applications on Big Data Platforms with Ray On Apache Spark
 
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
 
Infrastructure for Deep Learning in Apache Spark
Infrastructure for Deep Learning in Apache SparkInfrastructure for Deep Learning in Apache Spark
Infrastructure for Deep Learning in Apache Spark
 
Deep Learning Frameworks 2019 | Which Deep Learning Framework To Use | Deep L...
Deep Learning Frameworks 2019 | Which Deep Learning Framework To Use | Deep L...Deep Learning Frameworks 2019 | Which Deep Learning Framework To Use | Deep L...
Deep Learning Frameworks 2019 | Which Deep Learning Framework To Use | Deep L...
 

Jeremy Nixon, Machine Learning Engineer, Spark Technology Center at MLconf ATL 2017

  • 1. Spark Technology Center Convolutional Neural Networks at Scale in MLlib Jeremy Nixon
  • 2. Spark Technology Center 1. Machine Learning Engineer at the Spark Technology Center 2. Contributor to MLlib, dedicated to scalable deep learning. 3. Previously, studied Applied Mathematics to Computer Science and Economics at Harvard Jeremy Nixon
  • 3. Spark Technology Center Large Scale Data Processing ● In-memory compute ● Up to 100x faster than Hadoop Improved Usability ● Rich APIs in Scala, Java, Python ● Interactive Shell
  • 4. Spark Technology Center Spark’s Machine Learning Library ● Alternating Least Squares ● Lasso ● Ridge Regression ● Logistic Regression ● Decision Trees ● Naive Bayes ● SVMs ● … MLlib
  • 5. Spark Technology Center Part of Spark ● Integrated Data Analysis ● Scalable ● Python, Scala, Java APIs MLlib
  • 6. Spark Technology Center ● Deep Learning benefits from large datasets ● Spark allows for Large Scale Data Analysis ● Compute is Local to Data ● Integrated into organization’s Spark Jobs ● Leverages existing compute cluster Deep Learning in MLlib
  • 7. Spark Technology Center Github Link: https://github.com/JeremyNixon/sparkdl Spark Package: https://spark-packages.org/package/JeremyNixon/sparkdl Links
  • 8. Spark Technology Center 1. Framing Deep Learning 2. MLlib Deep Learning API 3. Optimization 4. Performance 5. Future Work 6. Deep Learning Options on Spark 7. Deep Learning Outside of Spark Structure
  • 9. Spark Technology Center 1. Structural Assumptions 2. Automated Feature Engineering 3. Learning Representations 4. Applications Framing Convolutional Neural Networks
  • 10. Spark Technology Center Structural Assumptions: Location Invariance - Convolution is a restriction on the features that can be combined. - Location Invariance leads to strong accuracy in vision, audio, and language. colah.github.io
  • 12. Spark Technology Center - Pixels - Edges - Shapes - Parts - Objects - Learn features that are optimized for the data - Makes transfer learning feasible Structural Assumptions: Hierarchical Abstraction
  • 13. Spark Technology Center - Character - Word - Phrase - Sentence - Phonemes - Words - Pixels - Edges - Shapes - Parts - Objects Structural Assumptions: Composition
  • 14. Spark Technology Center 1. CNNs - State of the art a. Object Recognition b. Object Localization c. Image Segmentation d. Image Restoration e. Music Recommendation 2. RNNs (LSTM) - State of the Art a. Speech Recognition b. Question Answering c. Machine Translation d. Text Summarization e. Named Entity Recognition f. Natural Language Generation g. Word Sense Disambiguation h. Image / Video Captioning i. Sentiment Analysis Applications
  • 15. Spark Technology Center ● Computationally Efficient ● Makes Transfer Learning Easy ● Takes advantage of location invariance Structural Assumptions: Weight Sharing
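The efficiency claim on the weight-sharing slide can be made concrete with a parameter count. The sketch below is illustrative only: the layer sizes (a 32x32 RGB input, 5x5 filters, 16 output channels) and both helper functions are hypothetical and are not taken from the talk or from sparkdl.

```python
def dense_params(in_h, in_w, in_c, out_units):
    """Weights plus biases for a fully connected layer on a flattened image."""
    return in_h * in_w * in_c * out_units + out_units


def conv_params(kernel_h, kernel_w, in_c, out_c):
    """Weights plus biases for a convolutional layer.

    Each output channel has one small filter that is reused at every
    spatial location, so the count does not depend on the image size.
    """
    return kernel_h * kernel_w * in_c * out_c + out_c


# Hypothetical sizes: 32x32 RGB input, 5x5 filters, 16 output channels.
conv = conv_params(5, 5, 3, 16)

# A dense layer producing the same number of outputs as a same-padded
# 32x32x16 feature map needs one weight per (input, output) pair.
dense = dense_params(32, 32, 3, 32 * 32 * 16)

print(conv, dense, dense // conv)
```

Under these assumed sizes the shared filters need on the order of a thousand parameters, while the unshared dense layer needs tens of millions.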
  • 16. Spark Technology Center - Network depth creates an extraordinary range of possible models. - That flexibility creates value in large datasets to reduce variance. Structural Assumptions: Combinatorial Flexibility
  • 17. Spark Technology Center Automated Feature Engineering - Feature hierarchy is too complex to engineer manually - Works well for compositional structure, overfits elsewhere
  • 19. Spark Technology Center Flexibility. High level enough to be efficient. Low level enough to be expressive. MLlib Flexible Deep Learning API
  • 20. Spark Technology Center Modularity enables Logistic Regression, Feedforward Networks. MLlib Flexible Deep Learning API
  • 21. Spark Technology Center Optimization Modern optimizers allow for more efficient, stable training. Momentum cancels noise in the gradient.
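One way to read "momentum cancels noise in the gradient" is through the classical (heavy-ball) update, in which the velocity is a decaying sum of past gradients. The sketch below assumes that form; the talk does not say which momentum variant sparkdl uses, and the function names and hyperparameters (`lr=0.1`, `beta=0.9`) are illustrative.

```python
def momentum_step(w, v, grad, lr=0.1, beta=0.9):
    """One classical momentum update; returns the new (weight, velocity)."""
    v = beta * v - lr * grad  # velocity: decaying sum of past gradients
    w = w + v
    return w, v


def final_velocity(grads, lr=0.1, beta=0.9):
    """Run momentum over a gradient sequence and return the last velocity."""
    w, v = 0.0, 0.0
    for g in grads:
        w, v = momentum_step(w, v, g, lr, beta)
    return v


# A gradient that always points the same way builds up speed...
consistent = final_velocity([1.0] * 200)

# ...while a gradient that flips sign every step (pure noise) mostly cancels.
alternating = final_velocity([1.0, -1.0] * 100)

print(abs(consistent), abs(alternating))
```

With these settings the consistent signal produces a step close to `lr / (1 - beta)`, while the alternating signal produces a step smaller than a single plain gradient step of size `lr`.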
  • 22. Spark Technology Center Optimization Modern optimizers allow for more efficient, stable training. RMSProp automatically adapts the learning rate.
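The learning-rate adaptation in RMSProp comes from dividing each gradient by a running root-mean-square of its recent values. The following is a minimal sketch of the standard update for one scalar parameter; the function name and hyperparameters are assumptions, not the sparkdl implementation.

```python
import math


def rmsprop_step(w, cache, grad, lr=0.01, decay=0.9, eps=1e-8):
    """One RMSProp update for a scalar parameter; returns (weight, cache)."""
    # Running average of squared gradients.
    cache = decay * cache + (1 - decay) * grad * grad
    # Dividing by the root of the average rescales the step per parameter.
    w = w - lr * grad / (math.sqrt(cache) + eps)
    return w, cache


# Two hypothetical parameters whose gradients differ by four orders of magnitude.
w_big, _ = rmsprop_step(0.0, 0.0, 100.0)
w_small, _ = rmsprop_step(0.0, 0.0, 0.01)

print(w_big, w_small)
```

Both parameters move by nearly the same amount on the first step, so a single global `lr` can serve weights with very different gradient scales.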
  • 23. Spark Technology Center Parallel implementation of backpropagation: 1. Each worker gets weights from master node. 2. Each worker computes a gradient on its data. 3. Each worker sends gradient to master. 4. Master averages the gradients and updates the weights. Distributed Optimization
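The four steps on the distributed optimization slide can be simulated on a single machine. The sketch below stands in plain Python lists for Spark partitions and fits a one-parameter linear model; the data, function names, and learning rate are all hypothetical, and it is not the MLlib code path.

```python
def worker_gradient(w, partition):
    """Step 2: mean-squared-error gradient for y ~ w * x on one worker's data."""
    n = len(partition)
    return sum(2 * (w * x - y) * x for x, y in partition) / n


def train(partitions, w=0.0, lr=0.01, steps=100):
    """Synchronous data-parallel gradient descent with gradient averaging."""
    for _ in range(steps):
        # Steps 1-3: every worker receives the current w, computes a
        # gradient on its own partition, and sends it back to the master.
        grads = [worker_gradient(w, p) for p in partitions]
        # Step 4: the master averages the gradients and updates the weight.
        w -= lr * sum(grads) / len(grads)
    return w


# Hypothetical data generated from y = 3x, split evenly across three workers.
data = [(x, 3.0 * x) for x in [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]]
partitions = [data[0:2], data[2:4], data[4:6]]

w = train(partitions)
print(w)
```

Averaging the per-worker gradients matches the full-batch gradient here because the partitions are the same size; with uneven partitions the average would need to be weighted by partition size.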
  • 24. Spark Technology Center ● Parallel MLP on Spark with 7 nodes ~= Caffe w/GPU (single node). ● Advantages to parallelism diminish with additional nodes due to communication costs. ● Additional workers are valuable up to ~20 workers. ● See https://github.com/avulanov/ann-benchmark for more details Performance
  • 25. Spark Technology Center Github: https://github.com/JeremyNixon/sparkdl Spark Package: https://spark-packages.org/package/JeremyNixon/sparkdl Access
  • 26. Spark Technology Center 1. GPU Acceleration (External) 2. Python API 3. Keras Integration 4. Residual Layers 5. Hardening 6. Regularization 7. Batch Normalization 8. Tensor Support Future Work
  • 27. Deep Learning on Spark 1. Major Projects a. DL4J b. BigDL c. Spark-deep-learning d. Tensorflow-on-Spark e. SystemML 2. Important Comparisons 3. Minor & Abandoned Projects a. H2O.ai DeepWater b. TensorFrames c. Caffe-on-Spark d. Scalable-deep-learning e. MLlib Deep Learning f. Sparknet g. DeepDist
  • 28. ● Distributed GPU support for all major deep learning architectures ○ CPU / Distributed CPU / Single GPU options exist ○ Supports Convolutional Nets, LSTMs / RNNs, Feedforward Nets, Word2Vec ● Actively Supported and Improved ● APIs in Java, Scala, Python ○ Fairly inelegant API; there’s an option through ScalNet (Keras-like front end) ○ Working towards becoming a Keras Backend ● Backed by Skymind (Committed) ○ ~15 person startup, Adam Gibson + Chris Nicholson ● Modular front end in DL4J ● Backed by linear algebra library ND4J ○ Numerical computing wrapper over BLAS for various backends ● Python API has Keras import / export ● Production with proprietary ‘Skymind Intelligence Layer’ DL4J
  • 29. BigDL ● Distributed CPU based library ○ Backed by Intel MKL / multithreading ○ No benchmark out as yet ● Support for most major deep learning architectures ○ Convolutional Networks, RNNs, LSTMs, no Word2Vec / Glove ● Backed by Intel (Committed) ○ Actively Supported / Improved ○ Intel has already acquired Nirvana and partnered with Chainer - strategy here is unclear. ○ Intel doesn’t look to be supporting their own Xeon GPU with BigDL ● Scala and Python API Support ○ API Modeled after Torch ● Support for numeric computing via tensors
  • 30. Spark-deep-learning ● Databricks’ library focused on model serving, to allow scaled out inference ● ‘Transfer Learning’ (Allows logistic regression layer to be retrained) ● Python API ○ One-liner for integrating Keras model into a pipeline ● Supports Tensorflow models ○ Keras Import for Tensorflow backed Keras Models ● Support for image processing only ● Weakly Supported by Databricks ○ Last commit was a month ago ○ Qualifying lines - “We will implement text processing, audio processing if there is interest”
  • 31. 1. Goal is to scale out Caffe / Tensorflow on heterogeneous GPU / CPU setup a. Each executor launches a Caffe / TF instance b. RDMA / InfiniBand for distributing compute in TF on Spark, improvement over TF’s ethernet model 2. Goal is to minimize changes to Tensorflow / Caffe code during scaleout 3. Allows for Model / Data parallelism 4. Weakly supported by Yahoo a. Caffe-on-spark hasn’t seen a commit in 6 months b. Tensorflow-on-spark gets about 2 minor commits / month 5. Yahoo demonstrated capability on large scale Flickr dataset 6. Visualization with tensorboard Caffe / Tensorflow-on-Spark
  • 32. SystemML ● Deep Learning library with single-node GPU support, moving towards distributed GPU support ○ Supports CNNs for Classification, Localization, Segmentation ○ Supports RNNs / LSTM ● Attached to linear algebra focused ML library w/ linear algebra compiler ● Backed by IBM ○ Actively being Improved ● Provides CPU based support for most computer vision tasks ○ Convolutional Networks ● Caffe2DML for caffe integration ● DML API ○ SystemML has Python API for a handful of algorithms, may come out with Python DL API
  • 33. Important Comparisons
      Framework | Hardware | Supported Models | API
      DL4J | CPU / GPU, Distributed CPU / GPU | CNNs, RNNs, Feedforward Nets, Word2Vec | Java, Scala, Python
      BigDL | CPU / Distributed CPU | CNNs, RNNs, Feedforward Nets | Scala, Python
      Spark-Deep-Learning | CPU / Distributed CPU | Vision - CNNs, Feedforward Nets | Python
      Caffe / Tensorflow on Spark | CPU / GPU, Distributed CPU / GPU | CNNs, RNNs, Feedforward Nets, Word2Vec | Python
      SystemML Deep Learning | CPU, Towards GPU / Distributed GPU | CNNs, RNNs, Feedforward Nets | DML, Potentially Python
  • 34. Important Comparisons
      Framework | Support Strength | Goal | Distinguishing Value
      DL4J | Skymind. Fully focused on package, but still a startup. | Fully fledged Deep Learning solution from training to production | Comprehensive, Distributed GPU.
      BigDL | Intel. Fairly strong AI/DL commitment. Has Chainer, Nirvana. | Spark / Hadoop solution, bring DL to the data | Comprehensive
      Spark-Deep-Learning | Databricks, ambiguous level of commitment | Scaleout solution for TF users | Scaling out with Spark at inference time
      Caffe / Tensorflow on Spark | Yahoo. Caffe-on-Spark looks abandoned, TF-on-Spark better. | Scaling out training on heterogeneous hardware. | Scaling out training with distributed CPU / GPU.
      SystemML Deep Learning | IBM team. | Deep Learning Training solution | GPU Support, Moving towards Distributed GPU Support.
  • 35. Minor & Abandoned Projects 1. H2O.ai DeepWater a. Integrates other frameworks (TF, MXNet, Caffe) into H2O Platform b. Only native support is for feedforward networks 2. MXNet Integration a. Nascent, few commits from Microsoft engineer 3. TensorFrames a. Focused on hyperparameter tuning, running TF instances in parallel. ~2 commits / month 4. Caffe-on-Spark a. No commits for ~6 months 5. Scalable-deep-learning a. Only supports feedforward networks / autoencoder, CPU based 6. MLlib Deep Learning a. Only supports feedforward networks, CPU based 7. Sparknet a. Abandoned, no commits for 18 months
  • 38. Spark Technology Center Thank you for your attention! Questions?