Deep Learning and Recurrent Neural Networks in the Enterprise
1. Deep Learning and Recurrent Neural
Networks in the Enterprise
StampedeCon
St. Louis 2016
Josh Patterson, Skymind
2. Presenter: Josh Patterson
Past
Research in Swarm Algorithms: Real-time optimization techniques in
mesh sensor networks
TVA / NERC: Smartgrid, Sensor Collection, and Big Data
Cloudera: Principal SA, Working with Fortune 500
Patterson Consulting: Working with Fortune 500 on Big Data, ML
Today
Skymind, Director Field Engineering
josh@skymind.io / @jpatanooga
DL4J Co-creator
Co-author of upcoming O'Reilly book
"Deep Learning: A Practitioner's Approach"
3. Topics
• What is Deep Learning?
• DL4J
• Recurrent Neural Network Applications
5. Defining Deep Learning
• Higher neuron counts than in previous-generation neural networks
• Different and evolved ways to connect layers inside neural networks
• More computing power to train
• Automated feature learning
6. Automated Feature Learning
• Deep Learning can be thought of as workflows for automated feature construction
– From "feature construction" to "feature learning"
• As Yann LeCun says:
– "machines that learn to represent the world"
9. These are the features learned at each neuron in a Restricted Boltzmann Machine (RBM).
These features are passed to higher levels of RBMs to learn more complicated things.
Part of the "7" digit
10. Unreasonable Effectiveness:
Benchmark Records
1. Text-to-speech synthesis (Fan et al., Microsoft, Interspeech 2014)
2. Language identification (Gonzalez-Dominguez et al., Google, Interspeech 2014)
3. Large vocabulary speech recognition (Sak et al., Google, Interspeech 2014)
4. Prosody contour prediction (Fernandez et al., IBM, Interspeech 2014)
5. Medium vocabulary speech recognition (Geiger et al., Interspeech 2014)
6. English to French translation (Sutskever et al., Google, NIPS 2014)
7. Audio onset detection (Marchi et al., ICASSP 2014)
8. Social signal classification (Brueckner & Schuller, ICASSP 2014)
9. Arabic handwriting recognition (Bluche et al., DAS 2014)
10. TIMIT phoneme recognition (Graves et al., ICASSP 2013)
11. Optical character recognition (Breuel et al., ICDAR 2013)
12. Image caption generation (Vinyals et al., Google, 2014)
13. Video to textual description (Donahue et al., 2014)
14. Syntactic parsing for Natural Language Processing (Vinyals et al., Google, 2014)
15. Photo-real talking heads (Soong and Wang, Microsoft, 2014).
11. Four Major Architectures
• Deep Belief Networks
• Convolutional Neural Networks
• Recurrent Neural Networks
• Recursive Neural Networks
12. Quick Usage Guide
• If I have timeseries or audio input
– I should use a Recurrent Neural Network
– Examples: Fraud Detection, Anomaly Detection
• If I have image input
– I should use a Convolutional Neural Network
• If I have video input
– I should use a hybrid Convolutional + Recurrent architecture
14. The More Things Change...
• Deep Learning is still trying to answer the same fundamental questions, such as:
– "Is this image a face?"
• The difference is that Deep Learning makes hard questions easier to answer with better architectures and more computing power
– We do this by matching the correct architecture with the right problem
18. ND4J: The Need for Speed
• JavaCPP (Cython for Java)
– Auto-generates JNI bindings for C++ by parsing classes
– Allows for easy maintenance and deployment of C++ binaries in Java
• CPU backends
– OpenMP (multithreading within native operations)
– OpenBLAS or MKL (BLAS operations)
– SIMD extensions
• GPU backends
– DL4J supports CUDA 7.5 at the moment, and will support 8.0 as soon as it comes out
– Leverages cuDNN as well
19. Prepping Data is Time Consuming
http://www.forbes.com/sites/gilpress/2016/03/23/data-preparation-most-time-consuming-least-enjoyable-data-science-task-survey-says/#633ea7f67f75
21. DataVec
• DataVec is a tool for machine learning ETL (Extract, Transform, Load) operations
– Spark-enabled and focused on supporting DL4J
• Also performs vectorization
– Image, CSV, sequences (timeseries), and more
• Open source, Apache 2.0 licensed
– https://github.com/deeplearning4j/DataVec
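As a rough illustration of what vectorization means for CSV input, the sketch below turns raw text records into numeric feature vectors plus one-hot labels. It uses only the Python standard library and is not the DataVec API; the column layout, the sample records, and the helper name `vectorize_csv` are assumptions made for this example.

```python
import csv
import io

# Hypothetical raw records: two numeric sensor readings and a string label.
RAW_CSV = """voltage,frequency,label
1.02,59.98,normal
0.97,60.01,normal
0.71,58.40,fault
"""

def vectorize_csv(text, label_column="label"):
    """Convert CSV text into (features, one-hot labels, class list)."""
    rows = list(csv.DictReader(io.StringIO(text)))
    # Sorted class names give a stable index for one-hot encoding.
    classes = sorted({row[label_column] for row in rows})
    feature_names = [name for name in rows[0] if name != label_column]
    features = [[float(row[name]) for name in feature_names] for row in rows]
    labels = [[1.0 if row[label_column] == c else 0.0 for c in classes]
              for row in rows]
    return features, labels, classes

features, labels, classes = vectorize_csv(RAW_CSV)
print(classes)
print(features[2], labels[2])
```

Every record becomes a fixed-length list of floats, which is the shape a neural network input layer expects; images and sequences need their own conversions but follow the same idea.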
23. Transactional Data Explosion
• 2,500 exabytes of new information in 2012 with Internet as primary driver
• Digital universe grew by 62% last year to 800K petabytes and will grow to 1.2 "zettabytes" this year
Figure labels: Relational, Transactional (Logs, Sensors), (You)
Source: IDC White Paper, sponsored by EMC. As the Economy Contracts, the Digital Universe Expands. May 2009.
24. NERC Sensor Data Collection
openPDC PMU Data Collection circa 2009
• 120 Sensors
• 30 samples/second
• 4.3B Samples/day
• Housed in Hadoop
25. Sensor Timeseries Classification with RNNs
• Recurrent Neural Networks can model how input changes over time
• Older techniques (mostly) do not retain the time domain
– Hidden Markov Models do, but are more limited
• Key Takeaway:
– For working with timeseries data, RNNs will be more accurate
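To make "model change of input over time" concrete, here is a minimal sketch of a simple (Elman-style) recurrent step in plain Python. The weights are fixed, hand-picked values rather than trained ones, and the function name `rnn_forward` is hypothetical; it is not DL4J code. The point is only that the hidden state carries information from earlier timesteps, so the same readings in a different order produce a different final state.

```python
import math

def rnn_forward(sequence, w_in=0.5, w_rec=0.9, bias=0.0):
    """Run a one-unit recurrent layer over a sequence of scalar inputs.

    h_t = tanh(w_in * x_t + w_rec * h_{t-1} + bias)
    Weights are illustrative constants, not learned parameters.
    """
    h = 0.0  # initial hidden state
    states = []
    for x in sequence:
        h = math.tanh(w_in * x + w_rec * h + bias)
        states.append(h)
    return states

rising = [0.0, 0.5, 1.0]
falling = [1.0, 0.5, 0.0]

final_rising = rnn_forward(rising)[-1]
final_falling = rnn_forward(falling)[-1]

# Same values, different order: the final hidden states differ,
# which an order-insensitive summary (e.g. the mean) cannot capture.
print(final_rising, final_falling)
```

A classifier placed on top of the final hidden state could therefore separate a rising signal from a falling one, even though both contain identical values.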
27. Anomaly Detection
• Model the normal patterns in the data
• Autoencoders give us the ability to look at data they haven't seen before
– Find anomalous patterns in sequences
– Can also use RNNs for pattern classification
• Interesting Industry Applications
– Telecom
– Financial Services
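The first bullet, modeling normal patterns and flagging what deviates from them, can be sketched without a neural network. The example below swaps in a much simpler stand-in for an autoencoder: the "reconstruction" of any window is just the point-by-point average of the normal training windows, and the anomaly score is the mean squared error against it. The data, the threshold rule, and the function names are all assumptions for illustration.

```python
def fit_normal_profile(windows):
    """Average the training windows point by point (stand-in for a trained autoencoder)."""
    length = len(windows[0])
    return [sum(w[i] for w in windows) / len(windows) for i in range(length)]

def reconstruction_error(window, profile):
    """Mean squared error between a window and the learned normal profile."""
    return sum((x - p) ** 2 for x, p in zip(window, profile)) / len(profile)

# Hypothetical "normal" sequences used to learn the profile.
normal_windows = [
    [1.0, 2.0, 3.0, 2.0, 1.0],
    [1.1, 2.1, 2.9, 2.0, 0.9],
    [0.9, 1.9, 3.1, 2.0, 1.1],
]
profile = fit_normal_profile(normal_windows)

# Threshold: assumed here to be 3x the worst error seen on normal data.
threshold = 3 * max(reconstruction_error(w, profile) for w in normal_windows)

def is_anomalous(window):
    """Flag a window whose error against the normal profile exceeds the threshold."""
    return reconstruction_error(window, profile) > threshold

print(is_anomalous([1.0, 2.0, 3.0, 2.1, 1.0]))  # close to the normal shape
print(is_anomalous([1.0, 2.0, 9.0, 2.0, 1.0]))  # spike in the middle
```

With a real autoencoder the profile is replaced by the network's reconstruction of each input, but the scoring and thresholding logic stays the same.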
29. "Google is living a few years in the future and sending the rest of us messages"
-- Doug Cutting in 2013
• However
– Most organizations are not built like Google
• (and Jeff Dean does not work at your company...)
• Anyone building next-gen infrastructure has to consider these things
30. Certified on Two Hadoop Distributions
• Running Spark on Hadoop via YARN gives us
– Sharing of cluster resources between heterogeneous workloads concurrently
– Access to the YARN scheduler capabilities
– Better control of executors in Spark
– Kerberos support for security
• Certified on CDH 5.4
• Certified on HDP 2.4
– [ Coming later this month ]
31. Questions?
Thank you for your time and attention
"Deep Learning: A Practitioner's Approach"
(O'Reilly, October 2016)
32. Running DL4J Workflows on Spark
• DataVec is built to scale out via Spark RDDs
– RDD<LabeledPoint>
– RDD<DataSet>
• DL4J uses the same MultiLayerConfiguration as the single-host version
– Uses SparkDl4jMultiLayer to drive the training on Spark
– Performs parameter averaging
spark-submit --class io.skymind.spark.dl4j.datavec.BasicDataVecExample --master yarn --num-executors 1 --properties-file ./spark_extra.props ./Skymind_spark-1.0-SNAPSHOT.jar
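The parameter averaging mentioned on this slide can be sketched in a few lines. This is a toy, single-process simulation in plain Python, not the SparkDl4jMultiLayer implementation: each simulated worker starts from the same weight, takes gradient steps on its own shard of data for a one-weight linear model, and the driver then averages the resulting weights. The data, learning rate, and function names are assumptions made for the example.

```python
def local_sgd(weight, shard, lr=0.05, steps=20):
    """Fit y = weight * x on one worker's shard with plain gradient descent."""
    for _ in range(steps):
        # Gradient of mean squared error with respect to the weight.
        grad = sum(2 * (weight * x - y) * x for x, y in shard) / len(shard)
        weight -= lr * grad
    return weight

def parameter_averaging(shards, rounds=5, initial_weight=0.0):
    """Each round: broadcast the weight, train locally, average the results."""
    weight = initial_weight
    for _ in range(rounds):
        local_weights = [local_sgd(weight, shard) for shard in shards]
        weight = sum(local_weights) / len(local_weights)
    return weight

# Hypothetical data drawn from y = 3x, split across two simulated workers.
shards = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(3.0, 9.0), (4.0, 12.0)],
]
averaged = parameter_averaging(shards)
print(averaged)
```

In a real cluster each shard would be a partition of an RDD and the "weight" would be the full parameter vector of the network, but the broadcast, local training, and averaging cycle has the same shape.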