© Cloudera, Inc. All rights reserved.
Deep Learning with Cloudera
Thomas W. Dinsmore
Arun Krishnakumar
● Deep Learning: A Proven Technique
● Deep Learning with Cloudera
● How to Move Forward with Deep Learning
● Questions
Deep Learning with Cloudera
Machine Learning: algorithms and methods that extract useful patterns from data.
Machine Learning Categories
● Linear Models
● Categorical Models
● Bayesian Methods
● Decision Trees
● Artificial Neural Networks
● Ensemble Models
● Kernel-Based Methods
● Latent Variable Analysis
● Cluster Analysis
● Association Rules Learning
● Evolutionary Algorithms
● Genetic Algorithms
Machine Learning Categories
● Linear Models
● Categorical Models
● Bayesian Methods
● Decision Trees
● Neural Networks
● Ensemble Models
● Kernel-Based Methods
● Latent Variable Analysis
● Cluster Analysis
● Association Rules Learning
● Evolutionary Algorithms
● Genetic Algorithms
● Deep Learning
Nodes, the “DNA” of neural networks
● Weights (input from other nodes)
● Transfer Function
● Activation Function
● To other nodes
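The node described above can be sketched in a few lines of Python. This is a minimal illustration, not taken from the slides: it assumes the transfer function is a weighted sum and the activation function is a sigmoid, and the `bias` term and all function names are hypothetical additions.

```python
import math

def transfer(inputs, weights, bias=0.0):
    """Transfer function (assumed here to be a weighted sum plus a bias)."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias

def sigmoid(z):
    """Activation function (one common choice; an assumption here)."""
    return 1.0 / (1.0 + math.exp(-z))

def node_output(inputs, weights, bias=0.0, activation=sigmoid):
    """Value this node passes on to other nodes."""
    return activation(transfer(inputs, weights, bias))

# Two incoming values, each scaled by its weight before being combined.
out = node_output([1.0, 2.0], [0.5, -0.25])
print(out)
```

Swapping `sigmoid` for another activation only changes the last step; the weighting and summing stay the same.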
A neural network is “deep” if it has more than one hidden layer
Input Layer → Hidden Layers … → Output Layer
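To make the "more than one hidden layer" definition concrete, here is a small sketch of a feed-forward pass in plain Python. The layer sizes, weights, and helper names (`forward`, `is_deep`) are illustrative assumptions, not part of the original deck.

```python
def relu(z):
    return max(0.0, z)

def identity(z):
    return z

def layer_forward(values, weights, biases, activation):
    """Apply one layer: each row of weights defines one node."""
    return [
        activation(sum(w * v for w, v in zip(row, values)) + b)
        for row, b in zip(weights, biases)
    ]

def forward(inputs, layers):
    """Pass the inputs through every layer in order."""
    values = inputs
    for weights, biases, activation in layers:
        values = layer_forward(values, weights, biases, activation)
    return values

def is_deep(layers):
    """Every layer except the last (the output layer) counts as hidden."""
    return len(layers) - 1 > 1

# Hypothetical network: two hidden layers followed by one output node.
network = [
    ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], relu),    # hidden layer 1
    ([[1.0, -1.0], [1.0, 1.0]], [0.0, 0.0], relu),   # hidden layer 2
    ([[0.5, 0.5]], [0.0], identity),                 # output layer
]

print(is_deep(network))
print(forward([1.0, 2.0], network))
```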
Deep learning: why or why not?
Advantages
● Learns higher-level features
● Detects complex interactions
These, in turn, make DL practical for:
● High-cardinality target variables
● High-dimensional data
● Unlabeled data
Disadvantages
● Technically challenging
● Opacity
● Overfitting
● Computationally intensive
● Deployment challenges
The Deep Learning “Silo”
Data Platform | Deep Learning Platform
• Latency
• Security issues
• Governance issues
• Deployment issues
Bring deep learning to your data (not vice-versa)
Deep Learning with Cloudera: On Premises or in the Cloud
Cloudera Data Science Workbench (GPU, CPU)
• Single-node training
Apache Spark in Cloudera (CDH, CPU)
• Distributed training
• Transfer learning
• Inference
Cloudera Data Science Workbench
Self-service data science for the enterprise
Accelerates data science from development to production with:
● Secure self-service data access
● On-demand compute
● Support for Python, R, and Scala
● Project dependency isolation for multiple library versions
● Workflow automation, version control, collaboration and sharing
A modern data science architecture
● Built on Docker and Kubernetes
● Runs on dedicated gateway nodes
● User sessions run in isolated “engine” containers which:
○ Host Kerberos-authenticated Python/R/Scala runtimes
○ Interact with Spark via YARN client mode (driver runs in the container, workers on CDH)
● Single-cluster only (for now)
[Diagram labels: Cloudera Manager; gateway nodes running CDSW (Master, Engines); CDH nodes (Hive, HDFS, ...)]
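The YARN client mode arrangement mentioned above (driver in the session container, executors on the cluster) could be requested from a Python session roughly as follows. This is a sketch under assumptions: the application name and executor sizing are hypothetical, and `start_session` needs `pyspark` installed plus a reachable, correctly configured cluster, so it is defined here but not called.

```python
def yarn_client_conf(app_name, executor_instances=2, executor_memory="2g"):
    """Build standard Spark properties for a YARN client-mode application."""
    return {
        "spark.app.name": app_name,
        "spark.master": "yarn",               # schedule executors through YARN
        "spark.submit.deployMode": "client",  # keep the driver in this process
        "spark.executor.instances": str(executor_instances),
        "spark.executor.memory": executor_memory,
    }

def start_session(conf):
    """Create a SparkSession from the properties above (requires pyspark)."""
    from pyspark.sql import SparkSession  # imported lazily on purpose
    builder = SparkSession.builder
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()

conf = yarn_client_conf("dl-session-example")  # hypothetical app name
for key in sorted(conf):
    print(key, "=", conf[key])
```

Keeping the configuration in a plain dictionary makes it easy to inspect or log before any cluster resources are requested.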
Accelerated deep learning on-demand with GPUs
“Our data scientists want GPUs, but we can’t find a way to deliver multi-tenancy. If they go to the cloud on their own, it’s expensive and we lose governance.”
● Extend existing CDSW benefits to GPU-optimized deep learning tools
● Schedule & share GPU resources
● Train on GPUs, deploy on CPUs
● Works on-premises or cloud
Multi-tenant GPU support on-premises or cloud
[Diagram labels: Data Science Workbench (GPU, CPU): single-node training; CDH (CPU): distributed training, scoring]
“Spark is becoming a de facto data science foundation.”
-- Gartner, Magic Quadrant for Data Science Platforms
Deep learning on Apache Spark
● Apache Spark is well-established in the enterprise
○ Robust ecosystem
○ Supports many different data sources
○ Large and growing user community
● Run deep learning on existing clusters
○ Transfer learning
○ Inference
● Simplifies integration with other ML tools, pipelines
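Transfer learning, listed above, reuses layers that were trained elsewhere and fits only a small new model on top of them. The sketch below illustrates the idea in plain Python; it is not how any of the frameworks in this deck implement it. `FROZEN_WEIGHTS` is a made-up stand-in for a pretrained network, and the training data is a toy example.

```python
import math

# Hypothetical stand-in for the frozen layers of a pretrained network.
FROZEN_WEIGHTS = [[1.0, -1.0], [-1.0, 1.0]]

def frozen_features(x):
    """Run the input through the frozen layer (ReLU units); never updated."""
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in FROZEN_WEIGHTS]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_head(samples, labels, epochs=200, lr=0.5):
    """Fit only a new logistic-regression head on top of the frozen features."""
    feats = [frozen_features(x) for x in samples]
    weights, bias = [0.0] * len(feats[0]), 0.0
    for _ in range(epochs):
        for f, y in zip(feats, labels):
            err = sigmoid(sum(w * v for w, v in zip(weights, f)) + bias) - y
            weights = [w - lr * err * v for w, v in zip(weights, f)]
            bias -= lr * err
    return weights, bias

def predict(x, weights, bias):
    f = frozen_features(x)
    return sigmoid(sum(w * v for w, v in zip(weights, f)) + bias)

# Toy task: label 1 when the first value exceeds the second.
samples = [[2.0, 0.0], [3.0, 1.0], [0.0, 2.0], [1.0, 3.0]]
labels = [1, 1, 0, 0]
head_w, head_b = train_head(samples, labels)
print(predict([4.0, 1.0], head_w, head_b) > 0.5)
print(predict([1.0, 4.0], head_w, head_b) > 0.5)
```

Because only the small head is trained, this kind of workload is far cheaper than training a full network, which is one reason it suits existing CPU clusters.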
Deep learning in Cloudera with Apache Spark

Spark Packages
• Two packages:
  • CaffeOnSpark
  • TensorFlowOnSpark
• Developed by Yahoo
• Python and Scala APIs
• All DL architectures
• Integrated pipeline

DL4J
• Open source DL library
• Developed by Skymind
• Built on JVMs
• Supports CPUs and GPUs
• Java, Scala, Python APIs
• Training and inference
• Imports models from:
  • TensorFlow
  • Caffe
  • Torch
  • Theano

BigDL
• Deep learning framework
• Developed by Intel
• Supports CPUs only
• Leverages Intel MKL
• Scala, Python APIs
• Imports models from:
  • TensorFlow
  • Caffe
  • Torch
Deep learning with Cloudera
● Train in Cloudera Data Science Workbench
○ Works with all frameworks
○ GPUs on demand
● Deploy in Apache Spark
● Your data remains in place
● Bring deep learning to your data, not the other way around
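One way to picture "deploy in Apache Spark" is to wrap a trained model in a function that scores one partition of records at a time, so the data never leaves the cluster. The sketch below assumes a trivially small logistic model with hypothetical weights; in PySpark such a function could be passed to `RDD.mapPartitions`, but here it is exercised on a plain Python iterator so it runs without a cluster.

```python
import math

def make_partition_scorer(weights, bias):
    """Return a function that scores every row in one partition."""
    def score_partition(rows):
        for row in rows:
            z = sum(w * x for w, x in zip(weights, row)) + bias
            yield row, 1.0 / (1.0 + math.exp(-z))
    return score_partition

# Hypothetical trained parameters; a real model would be loaded from storage.
score_partition = make_partition_scorer([1.0, -1.0], 0.0)

# Local stand-in for one partition of an RDD.
partition = iter([[2.0, 2.0], [5.0, 1.0]])
results = list(score_partition(partition))
print(results)

# On a cluster (assumption, not run here):
#   scored = rdd.mapPartitions(score_partition)
```

Scoring per partition rather than per record lets any model setup cost be paid once per partition.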
How to Move Forward with Deep Learning
● Stay focused on solving business problems
● Choose pilot projects carefully
○ Image, video classification and tagging
○ Object recognition
○ Handwriting recognition
○ Speech recognition
○ Speech translation
○ Text processing
● Organize data flows first
● Embrace open source frameworks
● Leverage transfer learning
● Don’t create new silos
● Use (mostly) mainstream hardware