Page 1 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Deep Learning on HDP
Dhruv Kumar – dkumar@hortonworks.com
Solutions Engineer, Hortonworks
2015
Version 1.0
Scientists See Promise in Deep-Learning Programs
John Markoff
November 23, 2012
Rich Rashid in Tianjin, October 25, 2012
Impact of deep learning in speech technology
Where is it?
……Facebook’s foray into deep learning sees it following its
competitors Google and Microsoft, which have used the approach to
impressive effect in the past year. Google has hired and acquired
leading talent in the field (see “
10 Breakthrough Technologies 2013: Deep Learning”), and last year
created software that taught itself to recognize cats and other objects
by reviewing stills from YouTube videos. The underlying deep learning
technology was later used to slash the error rate of Google’s voice
recognition services (see “Google’s Virtual Brain Goes to Work”)
….Researchers at Microsoft have used deep learning to build a
system that translates speech from English to Mandarin Chinese in
real time (see “Microsoft Brings Star Trek’s Voice Translator to Life”).
Chinese Web giant Baidu also recently established a Silicon Valley
research lab to work on deep learning.
September 20, 2013
“The word is spreading in all corners of the tech industry that the biggest part of big data, the
unstructured part, possesses learnable patterns that we now have the computing power and
algorithmic leverage to discern…This change marks a true disruption, and there are fortunes to be
made. There are also tremendous social consequences to consider that require as much creativity
and investment as the more immediately lucrative deep learning startups that are popping up all
over…”
“Using an artificial intelligence technique inspired by theories about how the brain
recognizes patterns, technology companies are reporting startling gains in fields as
diverse as computer vision, speech recognition and the identification of promising
new molecules for designing drugs.
The advances have led to widespread enthusiasm among researchers who design
software to perform human activities like seeing, listening and thinking. They offer
the promise of machines that converse with humans and perform tasks like driving
cars and working in factories, raising the specter of automated robots that could
replace human workers.”
Google Trends for “Deep Learning” keyword
Enterprise use cases
Deep Learning
•  One of the many pattern recognition techniques in Data
Science
•  Excels at rich media applications:
•  Image recognition
•  Speech translation
•  Voice recognition
•  Loosely inspired by human brain models
•  Often used synonymously with Artificial Neural Networks and Multi-Layer
Networks
In this workshop
-  Fundamentals of Deep Learning
-  Implementation and Libraries in Real Life
-  Demo!
So, 1. what exactly is deep learning?
And, 2. why is it generally better than other methods on image,
speech and certain other types of data?
The short answers
1. ‘Deep Learning’ means using a neural network
with several layers of nodes between input and output
2. the series of layers between input & output do
feature identification and processing in a series of stages,
just as our brains seem to.
but:
3. multilayer neural networks have been around for
25 years. What’s actually new?
we have always had good algorithms for learning the
weights in networks with 1 hidden layer
but these algorithms are not good at
learning the weights for networks with
more hidden layers
what’s new is: algorithms for training many-layer networks
Longer answers
1.  reminder/quick-explanation of how neural network
weights are learned;
2.  the idea of unsupervised feature learning (why
‘intermediate features’ are important for difficult
classification tasks, and how NNs seem to naturally learn
them)
3.  The ‘breakthrough’ – the simple trick for training Deep
neural networks
Neuron Quick Look
[Figure: a single neuron with three inputs (1.4, -2.5, -0.06), three weights (W1, W2, W3) and an activation function f(x)]
[Figure: the same neuron with the weights set to 2.7, -8.6 and 0.002]
x = (-0.06 × 2.7) + (-2.5 × -8.6) + (1.4 × 0.002) = 21.34
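The weighted sum on this slide can be reproduced in a few lines of Python. This is a minimal sketch: the input/weight pairing follows the equation above, and the sigmoid used for f(x) is an assumption, since the slide does not say which activation function it has in mind.

```python
import math

def sigmoid(x):
    """One common choice for f(x); assumed here, not stated on the slide."""
    return 1.0 / (1.0 + math.exp(-x))

def neuron(inputs, weights, activation):
    """Return the weighted sum x and the neuron output f(x)."""
    x = sum(i * w for i, w in zip(inputs, weights))
    return x, activation(x)

# Input/weight pairs in the order used by the equation on the slide.
inputs = [-0.06, -2.5, 1.4]
weights = [2.7, -8.6, 0.002]

x, output = neuron(inputs, weights, sigmoid)
print(round(x, 2))  # 21.34
```

With a weighted sum this large, a sigmoid would saturate and the output would be very close to 1.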
A dataset
Fields          class
1.4  2.7  1.9   0
3.8  3.4  3.2   0
6.4  2.8  1.7   1
4.1  0.1  0.2   0
etc …

Training the neural network
1. Initialise with random weights
2. Present a training pattern: 1.4, 2.7, 1.9
3. Feed it through to get output: 0.8
4. Compare with target output: target 0, output 0.8, error 0.8
5. Adjust weights based on error
6. Present a training pattern: 6.4, 2.8, 1.7
7. Feed it through to get output: 0.9
8. Compare with target output: target 1, output 0.9, error -0.1
9. Adjust weights based on error
10. And so on ….

Repeat this thousands, maybe millions of times, each time
taking a random training instance, and making slight
weight adjustments
Algorithms for weight adjustment are designed to make
changes that will reduce the error
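The loop described above can be sketched for a single sigmoid unit. This is only an illustration under stated assumptions: the slides do not name the update rule, so the sketch uses the delta rule (a gradient step on squared error), and the learning rate, step count and seed are hypothetical values.

```python
import math
import random

# The four rows shown on the slide: three input fields and a class label.
DATA = [
    ([1.4, 2.7, 1.9], 0),
    ([3.8, 3.4, 3.2], 0),
    ([6.4, 2.8, 1.7], 1),
    ([4.1, 0.1, 0.2], 0),
]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def predict(weights, bias, fields):
    """Feed one pattern through the unit to get its output."""
    return sigmoid(sum(w * v for w, v in zip(weights, fields)) + bias)

def mean_squared_error(weights, bias):
    return sum((predict(weights, bias, f) - t) ** 2 for f, t in DATA) / len(DATA)

def train(steps=20000, learning_rate=0.05, seed=0):
    rng = random.Random(seed)
    # Initialise with random weights.
    weights = [rng.uniform(-0.5, 0.5) for _ in range(3)]
    bias = 0.0
    for _ in range(steps):
        # Present a random training pattern and feed it through.
        fields, target = rng.choice(DATA)
        output = predict(weights, bias, fields)
        # Compare with the target output (same sign convention as the slide).
        error = output - target
        # Adjust the weights slightly in the direction that reduces the error.
        grad = error * output * (1.0 - output)
        weights = [w - learning_rate * grad * v for w, v in zip(weights, fields)]
        bias -= learning_rate * grad
    return weights, bias

initial_weights, initial_bias = train(steps=0)
trained_weights, trained_bias = train()
print(mean_squared_error(initial_weights, initial_bias))
print(mean_squared_error(trained_weights, trained_bias))
```

Calling `train(steps=0)` returns the random starting weights, so the two printed errors show the effect of the many small adjustments.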
The decision boundary perspective…
[Figure sequence: decision boundary at each stage of training]
1. Initial random weights
2. Present a training instance / adjust the weights (repeated for each instance)
3. Eventually ….
In essence
•  weight-learning algorithms for NNs are dumb
•  they work by making thousands and thousands of
tiny adjustments, each making the network do better
at the most recent pattern, but perhaps a little worse
on many others
•  but, by dumb luck, eventually this tends to be good
enough to learn effective classifiers for many real
applications
Some other points
Detail of a standard NN weight learning algorithm – later
If f(x) is non-linear, a network with 1 hidden layer can, in theory, learn
perfectly any classification problem, given enough hidden units. A set of weights exists that can
produce the targets from the inputs. The problem is finding them.
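As a small illustration of "a set of weights exists", here is XOR, a problem no single straight boundary can solve, handled by one hidden layer with hand-picked weights. The weights and the step activation are chosen for this sketch only; nothing here is learned.

```python
def step(x):
    """A simple non-linear activation: 1 if x > 0, else 0."""
    return 1 if x > 0 else 0

def xor_network(x1, x2):
    # Hidden layer: two units with hand-picked weights and thresholds.
    h1 = step(x1 + x2 - 0.5)  # fires if at least one input is on
    h2 = step(x1 + x2 - 1.5)  # fires only if both inputs are on
    # Output unit: "at least one, but not both".
    return step(h1 - h2 - 0.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_network(a, b))
```

Finding such weights automatically is the hard part that the rest of the deck is about.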
Some other ‘by the way’ points
If f(x) is linear, the NN can only draw straight decision boundaries (even if there
are many layers of units)
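One way to see why extra linear layers add nothing: two linear layers applied in sequence are equivalent to a single layer whose weight matrix is the product of the two. The sketch below uses small made-up matrices and omits biases for brevity.

```python
def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def matvec(matrix, vector):
    """Apply a linear layer (no bias, f(x) = x) to an input vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

# Hypothetical weights: 2 inputs -> 3 hidden units -> 1 output.
W1 = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]
W2 = [[2.0, -1.0, 0.5]]

x = [1.5, -2.0]
two_layers = matvec(W2, matvec(W1, x))
one_layer = matvec(matmul(W2, W1), x)
print(two_layers, one_layer)  # [-5.5] [-5.5]
```

The collapsed matrix `matmul(W2, W1)` is a single linear unit, so the decision boundary stays straight however many linear layers are stacked.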
NNs use nonlinear f(x) so they can draw complex boundaries, but keep the data unchanged
SVMs only draw straight lines, but they transform the data first in a way that makes that OK
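The "transform the data first" idea can be shown with a toy example. This is not an SVM implementation: the feature map and the threshold are hypothetical, picked by hand to show that points which no single threshold on x can separate become separable by a straight line once x is lifted to (x, x²).

```python
def feature_map(x):
    """Hypothetical transformation: lift a 1-D point x to (x, x^2)."""
    return (x, x * x)

def linear_classifier(point):
    """A straight boundary in the lifted space: the line x^2 = 1."""
    _, x_squared = point
    return 1 if x_squared > 1.0 else 0

# Class 1 sits on both sides of class 0, so no single cut on x works.
points = [-2.0, -1.5, -0.5, 0.0, 0.5, 1.5, 2.0]
labels = [1, 1, 0, 0, 0, 1, 1]

predictions = [linear_classifier(feature_map(x)) for x in points]
print(predictions == labels)  # True
```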
Feature detectors
What’s this unit doing?
Hidden layer units become self-organised feature detectors
[Figure: a hidden unit connected to input pixels 1 … 63; legend: strong +ve weight, low/zero weight]
What does this unit detect?
It will send a strong signal for a horizontal line in the top row, ignoring everywhere else.
What does this unit detect?
Strong signal for a dark area in the top left corner.
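A unit like the first one above can be written down directly. The sketch assumes an 8×8 image flattened row by row (the slide only shows pixel indices up to 63), with strong positive weights on the top row and zero weights elsewhere.

```python
SIZE = 8  # assumed 8x8 image, flattened row by row

def top_row_detector_weights():
    """Strong +ve weight on the top row, zero weight everywhere else."""
    return [1.0 if i < SIZE else 0.0 for i in range(SIZE * SIZE)]

def horizontal_line(row):
    """An image that is blank except for a horizontal line in one row."""
    return [1.0 if i // SIZE == row else 0.0 for i in range(SIZE * SIZE)]

def activation(weights, image):
    """Weighted sum of the pixels, before any f(x) is applied."""
    return sum(w * p for w, p in zip(weights, image))

weights = top_row_detector_weights()
print(activation(weights, horizontal_line(0)))  # 8.0
print(activation(weights, horizontal_line(5)))  # 0.0
```

The unit responds strongly to a line in the top row and ignores the same line anywhere else.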
What features might you expect a good NN
to learn, when trained with data like this?
[Figures: learned feature detectors over input pixels 1 … 63]
-  vertical lines
-  horizontal lines
-  small circles
successive layers can learn higher-level features …
[Figure: multi-layer network]
-  First layer: detect lines in specific positions, etc …
-  Higher-level detectors (“horizontal line”, “RHS vertical line”, “upper loop”, etc …)
What does this unit detect?
So: multiple layers make sense
Your brain works that way
Many-layer neural network architectures should be capable of learning the true underlying
features and ‘feature logic’, and therefore generalise very well …
But, until very recently, weight-learning algorithms simply
did not work on multi-layer architectures
Along came deep learning …
The new way to train multi-layer NNs…
[Figure: a multi-layer network with one layer highlighted at each step]
1. Train this layer first
2. then this layer
3. then this layer
4. then this layer
5. finally this layer
EACH of the (non-output) layers is trained to be
an auto-encoder
Basically, it is forced to learn good features that
describe what comes from the previous layer
An auto-encoder is trained, with an absolutely standard weight-adjustment algorithm, to reproduce the input.
By making this happen with (many) fewer units than the inputs, this
forces the ‘hidden layer’ units to become good feature detectors.
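A tiny auto-encoder along these lines can be sketched in plain Python. Everything specific here is an assumption made for illustration: the 8-2-8 shape, sigmoid units, squared-error loss, the four toy patterns, and the learning rate and step count.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class AutoEncoder:
    """n_in inputs -> n_hidden units -> n_in outputs, all sigmoid units."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
                   for _ in range(n_in)]
        self.b2 = [0.0] * n_in

    def encode(self, x):
        return [sigmoid(sum(w * v for w, v in zip(row, x)) + b)
                for row, b in zip(self.w1, self.b1)]

    def decode(self, h):
        return [sigmoid(sum(w * v for w, v in zip(row, h)) + b)
                for row, b in zip(self.w2, self.b2)]

    def loss(self, data):
        """Mean squared reconstruction error per pattern."""
        total = 0.0
        for x in data:
            out = self.decode(self.encode(x))
            total += sum((o - v) ** 2 for o, v in zip(out, x))
        return total / len(data)

    def train_step(self, x, lr):
        h = self.encode(x)
        out = self.decode(h)
        # The target is the input itself.
        d_out = [(o - v) * o * (1.0 - o) for o, v in zip(out, x)]
        # Backpropagate the output deltas to the hidden units.
        d_hid = [hj * (1.0 - hj) * sum(d_out[i] * self.w2[i][j]
                                       for i in range(len(out)))
                 for j, hj in enumerate(h)]
        for i, d in enumerate(d_out):
            for j, hj in enumerate(h):
                self.w2[i][j] -= lr * d * hj
            self.b2[i] -= lr * d
        for j, d in enumerate(d_hid):
            for k, v in enumerate(x):
                self.w1[j][k] -= lr * d * v
            self.b1[j] -= lr * d

# Toy patterns: 8 inputs squeezed through 2 hidden units.
DATA = [
    [1, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 1, 1],
]

ae = AutoEncoder(n_in=8, n_hidden=2)
before = ae.loss(DATA)
rng = random.Random(1)
for _ in range(5000):
    ae.train_step(rng.choice(DATA), lr=0.5)
after = ae.loss(DATA)
print(before, after)
```

After training, `ae.encode(pattern)` gives the two-number code that the hidden layer has learned for each pattern.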
Intermediate layers are each trained to be auto-encoders (or similar)
Final layer trained to predict class based on outputs from
previous layers
But, how does one train?
Overall, the NN is trying to minimize a cost function by adjusting the weights. To find a minimum, Gradient Descent (GD) is used.
If the training set is very large, GD can be too slow, since every input is evaluated in the cost function at each step. So, one can compute the gradient on a sampled subset of the inputs instead. If the sampling is done at random, it is called Stochastic Gradient Descent.
In practice, the gradients are computed with the Error Backpropagation Algorithm.
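The difference between the two variants can be sketched on a one-weight problem. The model (y = w·x), the noise-free toy data, and the learning rate, batch size and step count are all assumptions chosen to keep the example short.

```python
import random

# Toy data: y = 3x exactly, for x = 0.1, 0.2, ..., 4.0.
DATA = [(k / 10.0, 3.0 * k / 10.0) for k in range(1, 41)]

def cost(w, data):
    """Mean squared error of the one-weight model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def gradient(w, batch):
    """Derivative of the cost with respect to w, over the given batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)

def gradient_descent(data, steps=200, lr=0.01):
    w = 0.0
    for _ in range(steps):
        w -= lr * gradient(w, data)  # every input is used at each step
    return w

def stochastic_gradient_descent(data, steps=200, lr=0.01, batch_size=4, seed=0):
    rng = random.Random(seed)
    w = 0.0
    for _ in range(steps):
        batch = rng.sample(data, batch_size)  # random subset of the inputs
        w -= lr * gradient(w, batch)
    return w

w_gd = gradient_descent(DATA)
w_sgd = stochastic_gradient_descent(DATA)
print(w_gd, w_sgd)
```

Both runs should end close to w = 3, but each stochastic step touches 4 rows instead of 40.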
And that’s that
•  That’s the basic idea
•  There are many many types of deep learning,
•  Different kinds of autoencoder, variations on
architectures and training algorithms, etc…
•  Very fast growing area …
Doing this in real life
Typical Workflow
1. Ingest training data and store it
2. Split data set into: training, testing and validation sets
3. Vectorize and extract features to go into next step
4. Architect multi layer network, initialize
5. Feed data and train
6. Test and Validate
7. Repeat steps 4 and 5 until the desired results are reached
8. Store model
9. Put model in app, start generalizing on real data.
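Step 2 of the workflow above can be sketched with the standard library alone. The 70/15/15 proportions, the function name and the seed are hypothetical choices for this example; the slides do not prescribe them.

```python
import random

def split_dataset(rows, train_frac=0.7, test_frac=0.15, seed=0):
    """Shuffle the rows, then cut them into training, testing and validation sets."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_train = round(len(rows) * train_frac)
    n_test = round(len(rows) * test_frac)
    train = rows[:n_train]
    test = rows[n_train:n_train + n_test]
    validation = rows[n_train + n_test:]  # whatever is left over
    return train, test, validation

rows = list(range(100))  # stand-in for 100 ingested records
train, test, validation = split_dataset(rows)
print(len(train), len(test), len(validation))  # 70 15 15
```

Shuffling before cutting keeps the three sets from inheriting any ordering in the ingested data.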
So what do you get?
Step 1: ingest using Kafka, Flume, good old web scraping
Steps 2, 3, 4 and 5: use libraries such as Caffe, Theano, Deeplearning4j, H2O
Distributed Deep Learning on Hadoop
•  In ASF: Apache SINGA – new incubator project
•  Two main partners of Hortonworks: Skymind and H2O
•  Skymind is focussed on Deep Learning exclusively (deeplearning4j), H2O includes other ML libraries.
•  Both provide scale out on HDP + Spark
•  Skymind has GPU acceleration built in: it uses CUDA for linear algebra operations
•  Both are open source, Apache licensed.
Deeplearning4j Architecture
[Figure: Deeplearning4j architecture diagram]
DL4J: Canova for Vectorization and Ingest
•  Canova uses an input/output format system (similar to how Hadoop MapReduce uses input and output formats)
•  Supports all major types of input data (text, CSV, audio, image and video)
•  Can be extended for specialized input formats
•  Connects to Kafka
ND4J:
•  N-dimensional array library
•  Scientific computing for the JVM
•  DL4J uses it to do linear algebra for backpropagation
•  Supports GPUs via CUDA and native CPU execution via jblas
•  Deploys on Android
•  DL4J code remains unchanged whether using GPU or CPU
How to choose a neural net in DL4J core?
Demo!
Thank You
hortonworks.com

Weitere ähnliche Inhalte

Was ist angesagt?

Enabling Diverse Workload Scheduling in YARN
Enabling Diverse Workload Scheduling in YARNEnabling Diverse Workload Scheduling in YARN
Enabling Diverse Workload Scheduling in YARNDataWorks Summit
 
Enterprise Hadoop with Hortonworks and Nimble Storage
Enterprise Hadoop with Hortonworks and Nimble StorageEnterprise Hadoop with Hortonworks and Nimble Storage
Enterprise Hadoop with Hortonworks and Nimble StorageHortonworks
 
Hortonworks for Financial Analysts Presentation
Hortonworks for Financial Analysts PresentationHortonworks for Financial Analysts Presentation
Hortonworks for Financial Analysts PresentationHortonworks
 
Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?Hortonworks
 
Protecting enterprise Data in Hadoop
Protecting enterprise Data in HadoopProtecting enterprise Data in Hadoop
Protecting enterprise Data in HadoopDataWorks Summit
 
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...Hortonworks
 
Predictive Analytics and Machine Learning …with SAS and Apache Hadoop
Predictive Analytics and Machine Learning…with SAS and Apache HadoopPredictive Analytics and Machine Learning…with SAS and Apache Hadoop
Predictive Analytics and Machine Learning …with SAS and Apache HadoopHortonworks
 
Hortonworks Presentation at Big Data London
Hortonworks Presentation at Big Data LondonHortonworks Presentation at Big Data London
Hortonworks Presentation at Big Data LondonHortonworks
 
Hadoop crashcourse v3
Hadoop crashcourse v3Hadoop crashcourse v3
Hadoop crashcourse v3Hortonworks
 
Hortonworks Data In Motion Series Part 4
Hortonworks Data In Motion Series Part 4Hortonworks Data In Motion Series Part 4
Hortonworks Data In Motion Series Part 4Hortonworks
 
Introduction to Hortonworks Data Platform
Introduction to Hortonworks Data PlatformIntroduction to Hortonworks Data Platform
Introduction to Hortonworks Data PlatformHortonworks
 
Predicting Customer Experience through Hadoop and Customer Behavior Graphs
Predicting Customer Experience through Hadoop and Customer Behavior GraphsPredicting Customer Experience through Hadoop and Customer Behavior Graphs
Predicting Customer Experience through Hadoop and Customer Behavior GraphsHortonworks
 
Combine SAS High-Performance Capabilities with Hadoop YARN
Combine SAS High-Performance Capabilities with Hadoop YARNCombine SAS High-Performance Capabilities with Hadoop YARN
Combine SAS High-Performance Capabilities with Hadoop YARNHortonworks
 
Discover HDP 2.2: Apache Falcon for Hadoop Data Governance
Discover HDP 2.2: Apache Falcon for Hadoop Data GovernanceDiscover HDP 2.2: Apache Falcon for Hadoop Data Governance
Discover HDP 2.2: Apache Falcon for Hadoop Data GovernanceHortonworks
 
Introduction to the Hortonworks YARN Ready Program
Introduction to the Hortonworks YARN Ready ProgramIntroduction to the Hortonworks YARN Ready Program
Introduction to the Hortonworks YARN Ready ProgramHortonworks
 
Discover HDP 2.1: Apache Falcon for Data Governance in Hadoop
Discover HDP 2.1: Apache Falcon for Data Governance in HadoopDiscover HDP 2.1: Apache Falcon for Data Governance in Hadoop
Discover HDP 2.1: Apache Falcon for Data Governance in HadoopHortonworks
 
Edw Optimization Solution
Edw Optimization Solution Edw Optimization Solution
Edw Optimization Solution Hortonworks
 
Discover HDP 2.1: Apache Solr for Hadoop Search
Discover HDP 2.1: Apache Solr for Hadoop SearchDiscover HDP 2.1: Apache Solr for Hadoop Search
Discover HDP 2.1: Apache Solr for Hadoop SearchHortonworks
 
Hortonworks Protegrity Webinar: Leverage Security in Hadoop Without Sacrifici...
Hortonworks Protegrity Webinar: Leverage Security in Hadoop Without Sacrifici...Hortonworks Protegrity Webinar: Leverage Security in Hadoop Without Sacrifici...
Hortonworks Protegrity Webinar: Leverage Security in Hadoop Without Sacrifici...Hortonworks
 

Was ist angesagt? (20)

Enabling Diverse Workload Scheduling in YARN
Enabling Diverse Workload Scheduling in YARNEnabling Diverse Workload Scheduling in YARN
Enabling Diverse Workload Scheduling in YARN
 
Enterprise Hadoop with Hortonworks and Nimble Storage
Enterprise Hadoop with Hortonworks and Nimble StorageEnterprise Hadoop with Hortonworks and Nimble Storage
Enterprise Hadoop with Hortonworks and Nimble Storage
 
Hortonworks for Financial Analysts Presentation
Hortonworks for Financial Analysts PresentationHortonworks for Financial Analysts Presentation
Hortonworks for Financial Analysts Presentation
 
Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?
 
Protecting enterprise Data in Hadoop
Protecting enterprise Data in HadoopProtecting enterprise Data in Hadoop
Protecting enterprise Data in Hadoop
 
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...
 
Predictive Analytics and Machine Learning …with SAS and Apache Hadoop
Predictive Analytics and Machine Learning…with SAS and Apache HadoopPredictive Analytics and Machine Learning…with SAS and Apache Hadoop
Predictive Analytics and Machine Learning …with SAS and Apache Hadoop
 
Hortonworks Presentation at Big Data London
Hortonworks Presentation at Big Data LondonHortonworks Presentation at Big Data London
Hortonworks Presentation at Big Data London
 
Hadoop crashcourse v3
Hadoop crashcourse v3Hadoop crashcourse v3
Hadoop crashcourse v3
 
Falcon Meetup
Falcon Meetup Falcon Meetup
Falcon Meetup
 
Hortonworks Data In Motion Series Part 4
Hortonworks Data In Motion Series Part 4Hortonworks Data In Motion Series Part 4
Hortonworks Data In Motion Series Part 4
 
Introduction to Hortonworks Data Platform
Introduction to Hortonworks Data PlatformIntroduction to Hortonworks Data Platform
Introduction to Hortonworks Data Platform
 
Predicting Customer Experience through Hadoop and Customer Behavior Graphs
Predicting Customer Experience through Hadoop and Customer Behavior GraphsPredicting Customer Experience through Hadoop and Customer Behavior Graphs
Predicting Customer Experience through Hadoop and Customer Behavior Graphs
 
Combine SAS High-Performance Capabilities with Hadoop YARN
Combine SAS High-Performance Capabilities with Hadoop YARNCombine SAS High-Performance Capabilities with Hadoop YARN
Combine SAS High-Performance Capabilities with Hadoop YARN
 
Discover HDP 2.2: Apache Falcon for Hadoop Data Governance
Discover HDP 2.2: Apache Falcon for Hadoop Data GovernanceDiscover HDP 2.2: Apache Falcon for Hadoop Data Governance
Discover HDP 2.2: Apache Falcon for Hadoop Data Governance
 
Introduction to the Hortonworks YARN Ready Program
Introduction to the Hortonworks YARN Ready ProgramIntroduction to the Hortonworks YARN Ready Program
Introduction to the Hortonworks YARN Ready Program
 
Discover HDP 2.1: Apache Falcon for Data Governance in Hadoop
Discover HDP 2.1: Apache Falcon for Data Governance in HadoopDiscover HDP 2.1: Apache Falcon for Data Governance in Hadoop
Discover HDP 2.1: Apache Falcon for Data Governance in Hadoop
 
Edw Optimization Solution
Edw Optimization Solution Edw Optimization Solution
Edw Optimization Solution
 
Discover HDP 2.1: Apache Solr for Hadoop Search
Discover HDP 2.1: Apache Solr for Hadoop SearchDiscover HDP 2.1: Apache Solr for Hadoop Search
Discover HDP 2.1: Apache Solr for Hadoop Search
 
Hortonworks Protegrity Webinar: Leverage Security in Hadoop Without Sacrifici...
Hortonworks Protegrity Webinar: Leverage Security in Hadoop Without Sacrifici...Hortonworks Protegrity Webinar: Leverage Security in Hadoop Without Sacrifici...
Hortonworks Protegrity Webinar: Leverage Security in Hadoop Without Sacrifici...
 

Andere mochten auch

Hive on spark is blazing fast or is it final
Hive on spark is blazing fast or is it finalHive on spark is blazing fast or is it final
Hive on spark is blazing fast or is it finalHortonworks
 
Implementing a Data Lake with Enterprise Grade Data Governance
Implementing a Data Lake with Enterprise Grade Data GovernanceImplementing a Data Lake with Enterprise Grade Data Governance
Implementing a Data Lake with Enterprise Grade Data GovernanceHortonworks
 
Simplify and Secure your Hadoop Environment with Hortonworks and Centrify
Simplify and Secure your Hadoop Environment with Hortonworks and CentrifySimplify and Secure your Hadoop Environment with Hortonworks and Centrify
Simplify and Secure your Hadoop Environment with Hortonworks and CentrifyHortonworks
 
Faster Batch Processing with Cloudera 5.7: Hive-on-Spark is ready for production
Faster Batch Processing with Cloudera 5.7: Hive-on-Spark is ready for productionFaster Batch Processing with Cloudera 5.7: Hive-on-Spark is ready for production
Faster Batch Processing with Cloudera 5.7: Hive-on-Spark is ready for productionCloudera, Inc.
 
Discover HDP 2.2: Comprehensive Hadoop Security with Apache Ranger and Apache...
Discover HDP 2.2: Comprehensive Hadoop Security with Apache Ranger and Apache...Discover HDP 2.2: Comprehensive Hadoop Security with Apache Ranger and Apache...
Discover HDP 2.2: Comprehensive Hadoop Security with Apache Ranger and Apache...Hortonworks
 
Hortonworks tech workshop in-memory processing with spark
Hortonworks tech workshop   in-memory processing with sparkHortonworks tech workshop   in-memory processing with spark
Hortonworks tech workshop in-memory processing with sparkHortonworks
 
Hortonworks Technical Workshop: What's New in HDP 2.3
Hortonworks Technical Workshop: What's New in HDP 2.3Hortonworks Technical Workshop: What's New in HDP 2.3
Hortonworks Technical Workshop: What's New in HDP 2.3Hortonworks
 
Hortonworks Technical Workshop: HBase For Mission Critical Applications
Hortonworks Technical Workshop: HBase For Mission Critical ApplicationsHortonworks Technical Workshop: HBase For Mission Critical Applications
Hortonworks Technical Workshop: HBase For Mission Critical ApplicationsHortonworks
 
Securing Hadoop with Apache Ranger
Securing Hadoop with Apache RangerSecuring Hadoop with Apache Ranger
Securing Hadoop with Apache RangerDataWorks Summit
 
Hortonworks Technical Workshop: Apache Ambari
Hortonworks Technical Workshop:   Apache AmbariHortonworks Technical Workshop:   Apache Ambari
Hortonworks Technical Workshop: Apache AmbariHortonworks
 
Data Governance - Atlas 7.12.2015
Data Governance - Atlas 7.12.2015Data Governance - Atlas 7.12.2015
Data Governance - Atlas 7.12.2015Hortonworks
 
Hortonworks technical workshop operations with ambari
Hortonworks technical workshop   operations with ambariHortonworks technical workshop   operations with ambari
Hortonworks technical workshop operations with ambariHortonworks
 
Apache Hadoop YARN - Enabling Next Generation Data Applications
Apache Hadoop YARN - Enabling Next Generation Data ApplicationsApache Hadoop YARN - Enabling Next Generation Data Applications
Apache Hadoop YARN - Enabling Next Generation Data ApplicationsHortonworks
 
The path to a Modern Data Architecture in Financial Services
The path to a Modern Data Architecture in Financial ServicesThe path to a Modern Data Architecture in Financial Services
The path to a Modern Data Architecture in Financial ServicesHortonworks
 
Cassandra and Storm at Health Market Sceince
Cassandra and Storm at Health Market SceinceCassandra and Storm at Health Market Sceince
Cassandra and Storm at Health Market SceinceP. Taylor Goetz
 
Curb your insecurity with HDP - Tips for a Secure Cluster
Curb your insecurity with HDP - Tips for a Secure ClusterCurb your insecurity with HDP - Tips for a Secure Cluster
Curb your insecurity with HDP - Tips for a Secure Clusterahortonworks
 
Ranger admin dev overview
Ranger admin dev overviewRanger admin dev overview
Ranger admin dev overviewTushar Dudhatra
 
TriHUG October: Apache Ranger
TriHUG October: Apache RangerTriHUG October: Apache Ranger
TriHUG October: Apache Rangertrihug
 
Apache Hive 2.0; SQL, Speed, Scale
Apache Hive 2.0; SQL, Speed, ScaleApache Hive 2.0; SQL, Speed, Scale
Apache Hive 2.0; SQL, Speed, ScaleHortonworks
 

Deep learning with Hortonworks and Apache Spark - Hortonworks technical workshop

  • 1. Page 1 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Deep Learning on HDP Dhruv Kumar – dkumar@hortonworks.com Solutions Engineer, Hortonworks 2015 Version 1.0
  • 3. Scientists See Promise in Deep-Learning Programs. John Markoff, November 23, 2012. Rich Rashid in Tianjin, October 25, 2012.
  • 5. Impact of deep learning in speech technology. Where is it?
  • 6. … Facebook’s foray into deep learning sees it following its competitors Google and Microsoft, which have used the approach to impressive effect in the past year. Google has hired and acquired leading talent in the field (see “10 Breakthrough Technologies 2013: Deep Learning”), and last year created software that taught itself to recognize cats and other objects by reviewing stills from YouTube videos. The underlying deep learning technology was later used to slash the error rate of Google’s voice recognition services (see “Google’s Virtual Brain Goes to Work”) … Researchers at Microsoft have used deep learning to build a system that translates speech from English to Mandarin Chinese in real time (see “Microsoft Brings Star Trek’s Voice Translator to Life”). Chinese Web giant Baidu also recently established a Silicon Valley research lab to work on deep learning. September 20, 2013
  • 13. “The word is spreading in all corners of the tech industry that the biggest part of big data, the unstructured part, possesses learnable patterns that we now have the computing power and algorithmic leverage to discern… This change marks a true disruption, and there are fortunes to be made. There are also tremendous social consequences to consider that require as much creativity and investment as the more immediately lucrative deep learning startups that are popping up all over…”
  • 14. Page 14 © Hortonworks Inc. 2011 – 2014. All Rights Reserved “Using an artificial intelligence technique inspired by theories about how the brain recognizes patterns, technology companies are reporting startling gains in fields as diverse as computer vision, speech recognition and the identification of promising new molecules for designing drugs. The advances have led to widespread enthusiasm among researchers who design software to perform human activities like seeing, listening and thinking. They offer the promise of machines that converse with humans and perform tasks like driving cars and working in factories, raising the specter of automated robots that could replace human workers.”
  • 15. Page 15 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Google Trends for “Deep Learning” keyword
  • 16. Page 16 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Enterprise use cases
  • 17. Deep Learning: one of the many pattern recognition techniques in data science; excels at rich media applications (image recognition, speech translation, voice recognition); loosely inspired by human brain models; often used synonymously with artificial neural networks and multi-layer networks
  • 18. In this workshop: fundamentals of deep learning; implementation and libraries in real life; demo!
  • 19. So, 1. what exactly is deep learning? And, 2. why is it generally better than other methods on image, speech and certain other types of data?
  • 20. The short answers: 1. ‘Deep learning’ means using a neural network with several layers of nodes between input and output. 2. The series of layers between input and output do feature identification and processing in a series of stages, just as our brains seem to.
  • 21. But: 3. multilayer neural networks have been around for 25 years. What’s actually new?
  • 22. We have always had good algorithms for learning the weights in networks with one hidden layer, but these algorithms are not good at learning the weights for networks with more hidden layers. What’s new is algorithms for training many-layer networks.
  • 23. Longer answers: 1. a reminder/quick explanation of how neural network weights are learned; 2. the idea of unsupervised feature learning (why ‘intermediate features’ are important for difficult classification tasks, and how NNs seem to naturally learn them); 3. the ‘breakthrough’: the simple trick for training deep neural networks
  • 24. Neuron Quick Look
  • 25. [Figure: a single neuron with three input weights W1, W2, W3 feeding an activation function f(x); the weight values shown are 1.4, -2.5 and -0.06]
  • 26. [Figure: the same neuron with inputs 2.7, -8.6 and 0.002] The input to f(x) is the weighted sum x = (-0.06 × 2.7) + (-2.5 × -8.6) + (1.4 × 0.002) = 21.34
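The weighted sum on this slide can be checked with a few lines of Python. This is a minimal sketch: the slides do not say which activation f(x) is used, so the sigmoid below is an assumption, and the function names are illustrative only.

```python
import math

def weighted_sum(inputs, weights):
    """Return x = sum of input_i * weight_i, the value fed into f(x)."""
    return sum(i * w for i, w in zip(inputs, weights))

def sigmoid(x):
    """Assumed choice of f(x); the slide leaves f unspecified."""
    return 1.0 / (1.0 + math.exp(-x))

# Inputs and weights paired as in the slide's arithmetic.
inputs = [2.7, -8.6, 0.002]
weights = [-0.06, -2.5, 1.4]

x = weighted_sum(inputs, weights)
output = sigmoid(x)
print(round(x, 2))  # 21.34
```

With such a large positive x, a sigmoid unit would be almost fully "on"; a different f(x) would give a different output for the same weighted sum.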
  • 27. A dataset (three fields, then class): 1.4 2.7 1.9 → 0; 3.8 3.4 3.2 → 0; 6.4 2.8 1.7 → 1; 4.1 0.1 0.2 → 0; etc.
  • 28. Training the neural network (on the dataset above)
  • 29. Training data: initialise with random weights
  • 30. Present a training pattern: 1.4 2.7 1.9
  • 31. Feed it through to get output: 0.8
  • 32. Compare with target output: output 0.8, target 0, error 0.8
  • 33. Adjust weights based on error
  • 34. Present a training pattern: 6.4 2.8 1.7
  • 35. Feed it through to get output: 0.9
  • 36. Compare with target output: output 0.9, target 1, error -0.1
  • 37. Adjust weights based on error
  • 38. And so on … Repeat this thousands, maybe millions of times, each time taking a random training instance and making slight weight adjustments. Algorithms for weight adjustment are designed to make changes that will reduce the error
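The loop described on these slides (random weights, present a pattern, compare with the target, adjust, repeat) can be sketched for a single sigmoid unit. This is an illustration under stated assumptions rather than the deck's algorithm: the update rule is the standard logistic-regression gradient step, and the bias term, learning rate, step count and seed are choices made here, not values from the slides.

```python
import math
import random

# The four rows shown on the slides: (fields, class).
DATA = [
    ([1.4, 2.7, 1.9], 0),
    ([3.8, 3.4, 3.2], 0),
    ([6.4, 2.8, 1.7], 1),
    ([4.1, 0.1, 0.2], 0),
]

def predict(weights, bias, fields):
    """Feed one pattern through a single sigmoid unit."""
    x = sum(w * v for w, v in zip(weights, fields)) + bias
    return 1.0 / (1.0 + math.exp(-x))

def mean_abs_error(weights, bias):
    """Average |output - target| over the whole dataset."""
    return sum(abs(predict(weights, bias, f) - t) for f, t in DATA) / len(DATA)

def train(steps=50000, learning_rate=0.02, seed=0):
    rng = random.Random(seed)
    # Initialise with random weights.
    weights = [rng.uniform(-0.5, 0.5) for _ in range(3)]
    bias = rng.uniform(-0.5, 0.5)
    start_error = mean_abs_error(weights, bias)
    for _ in range(steps):
        fields, target = rng.choice(DATA)          # present a random training pattern
        output = predict(weights, bias, fields)    # feed it through to get output
        error = output - target                    # compare with target output
        # Adjust each weight slightly in the direction that reduces the error.
        weights = [w - learning_rate * error * v for w, v in zip(weights, fields)]
        bias -= learning_rate * error
    return weights, bias, start_error, mean_abs_error(weights, bias)

weights, bias, start_error, final_error = train()
print(start_error, final_error)
```

Each step only looks at one pattern, so an individual update can make other patterns slightly worse; the error falls because the small adjustments accumulate over many steps.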
  • 39. The decision boundary perspective… Initial random weights
  • 40. The decision boundary perspective… Present a training instance / adjust the weights
  • 41. The decision boundary perspective… Present a training instance / adjust the weights
  • 42. The decision boundary perspective… Present a training instance / adjust the weights
  • 43. The decision boundary perspective… Present a training instance / adjust the weights
  • 44. The decision boundary perspective… Eventually …
  • 45. In essence: weight-learning algorithms for NNs are dumb; they work by making thousands and thousands of tiny adjustments, each making the network do better at the most recent pattern, but perhaps a little worse on many others; but, by dumb luck, eventually this tends to be good enough to learn effective classifiers for many real applications
  • 46. Some other points: the detail of a standard NN weight-learning algorithm comes later. If f(x) is non-linear, a network with one hidden layer (and enough hidden units) can, in theory, learn any classification problem perfectly: a set of weights exists that can produce the targets from the inputs. The problem is finding them.
  • 47. Some other ‘by the way’ points: if f(x) is linear, the NN can only draw straight decision boundaries (even if there are many layers of units)
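The claim about linear f(x) follows from the fact that stacking linear layers is the same as applying one combined linear map. A small sketch, with made-up weight matrices `W1` and `W2` (not from the slides) and biases omitted for brevity:

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def apply(matrix, vector):
    """Apply a weight matrix to an input vector (linear f, no bias)."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]

# Hypothetical weights: 3 inputs -> 2 hidden units -> 1 output.
W1 = [[0.5, -1.0, 2.0],
      [1.5, 0.25, -0.5]]
W2 = [[2.0, -3.0]]

x = [1.4, 2.7, 1.9]

two_layer = apply(W2, apply(W1, x))    # pass through both layers in turn
collapsed = apply(matmul(W2, W1), x)   # one layer with weights W2 * W1

print(two_layer, collapsed)
```

Both calls give the same output, so the extra layer adds no expressive power; a non-linear f(x) between the layers is what prevents this collapse.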
  • 48. Some other ‘by the way’ points: NNs use nonlinear f(x) so they can draw complex boundaries, but keep the data unchanged
  • 49. By contrast, SVMs only draw straight lines, but they transform the data first in a way that makes that OK
  • 50. Feature detectors
  • 51. What’s this unit doing?
  • 52. Hidden layer units become self-organised feature detectors [Figure: a hidden unit connected to numbered input pixels (labels 1 … 63 and 1, 5, 10, 15, 20, 25 …); legend: strong +ve weight, low/zero weight]
  • 53. What does this unit detect? [Figure: same layout and legend]
  • 54. What does this unit detect? It will send a strong signal for a horizontal line in the top row, ignoring everywhere else
  • 55. What does this unit detect? [Figure: same layout and legend]
  • 56. What does this unit detect? It will send a strong signal for a dark area in the top left corner
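The "horizontal line in the top row" detector can be mimicked with a hand-set weight vector. This is only a sketch: the 8×8 image size is an assumption based on the pixel numbering in the figure, and in a real network these weights would be learned rather than written by hand.

```python
SIZE = 8  # assumed image width and height

def top_row_detector_weights():
    """Strong positive weight on top-row pixels, zero weight everywhere else."""
    return [1.0 if index < SIZE else 0.0 for index in range(SIZE * SIZE)]

def image_with_row(row):
    """Flattened image that is 1.0 along one horizontal row and 0.0 elsewhere."""
    return [1.0 if index // SIZE == row else 0.0 for index in range(SIZE * SIZE)]

def unit_input(weights, pixels):
    """Weighted sum arriving at the hidden unit."""
    return sum(w * p for w, p in zip(weights, pixels))

weights = top_row_detector_weights()
top_line = unit_input(weights, image_with_row(0))            # line in the top row
bottom_line = unit_input(weights, image_with_row(SIZE - 1))  # line in the bottom row
print(top_line, bottom_line)  # 8.0 0.0
```

The unit responds strongly only when the line sits where its large weights are, which is the sense in which a hidden unit acts as a feature detector.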
  • 57. What features might you expect a good NN to learn, when trained with data like this?
  • 58. Vertical lines
  • 59. Horizontal lines
  • 60. Small circles
  • 61. Successive layers can learn higher-level features: early units detect lines in specific positions; higher-level detectors respond to features such as “horizontal line”, “RHS vertical line”, “upper loop”, etc.
  • 62. Successive layers can learn higher-level features … What does this unit detect?
  • 63. So: multiple layers make sense
  • 64. So: multiple layers make sense. Your brain works that way
  • 65. So: multiple layers make sense. Many-layer neural network architectures should be capable of learning the true underlying features and ‘feature logic’, and therefore generalise very well …
  • 66. But, until very recently, weight-learning algorithms simply did not work on multi-layer architectures
  • 67. Along came deep learning …
  • 68. The new way to train multi-layer NNs…
  • 69. The new way to train multi-layer NNs… Train this layer first
  • 70. … then this layer
  • 71. … then this layer
  • 72. … then this layer
  • 73. … finally this layer
  • 74. The new way to train multi-layer NNs… Each of the (non-output) layers is trained to be an auto-encoder. Basically, it is forced to learn good features that describe what comes from the previous layer
  • 75. An auto-encoder is trained, with an absolutely standard weight-adjustment algorithm, to reproduce the input
  • 76. By making this happen with (many) fewer units than the inputs, this forces the ‘hidden layer’ units to become good feature detectors
  • 77. Intermediate layers are each trained to be auto-encoders (or similar)
  • 78. The final layer is trained to predict the class based on outputs from previous layers
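The layer-by-layer idea can be sketched in plain Python: train one small auto-encoder to reproduce its input, then train a second one on the first one's hidden activations. Everything here is illustrative and assumed rather than taken from the deck: the toy 6-bit data, the layer sizes (6 → 3 → 2), the sigmoid units, the squared-error loss and the hyperparameters. The final supervised layer is left out.

```python
import math
import random

def layer(weights, biases, inputs):
    """Sigmoid layer: one output per row of weights."""
    return [1.0 / (1.0 + math.exp(-(sum(w * v for w, v in zip(row, inputs)) + b)))
            for row, b in zip(weights, biases)]

def train_autoencoder(data, n_hidden, rng, epochs=2000, lr=0.5):
    """Train encoder/decoder weights so the output reproduces the input."""
    n_in = len(data[0])
    enc_w = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    enc_b = [0.0] * n_hidden
    dec_w = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_in)]
    dec_b = [0.0] * n_in

    def reconstruction_error():
        total = 0.0
        for x in data:
            out = layer(dec_w, dec_b, layer(enc_w, enc_b, x))
            total += sum((o - t) ** 2 for o, t in zip(out, x))
        return total / len(data)

    start = reconstruction_error()
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # visit patterns in random order
            hidden = layer(enc_w, enc_b, x)
            out = layer(dec_w, dec_b, hidden)
            # Standard backpropagation; the target is the input itself.
            d_out = [(o - t) * o * (1 - o) for o, t in zip(out, x)]
            d_hid = [h * (1 - h) * sum(d_out[i] * dec_w[i][j] for i in range(n_in))
                     for j, h in enumerate(hidden)]
            for i in range(n_in):
                for j in range(n_hidden):
                    dec_w[i][j] -= lr * d_out[i] * hidden[j]
                dec_b[i] -= lr * d_out[i]
            for j in range(n_hidden):
                for k in range(n_in):
                    enc_w[j][k] -= lr * d_hid[j] * x[k]
                enc_b[j] -= lr * d_hid[j]
    return (enc_w, enc_b), start, reconstruction_error()

rng = random.Random(0)
# Toy data: three underlying binary features, each copied into two inputs.
data = [[float(a), float(a), float(b), float(b), float(c), float(c)]
        for a in (0, 1) for b in (0, 1) for c in (0, 1)]

# Train the first layer, then train the second layer on the first layer's codes.
(enc1_w, enc1_b), start1, final1 = train_autoencoder(data, 3, rng)
codes = [layer(enc1_w, enc1_b, x) for x in data]
(enc2_w, enc2_b), start2, final2 = train_autoencoder(codes, 2, rng)
print(start1, final1, start2, final2)
```

Only the encoder half of each auto-encoder is kept; the decoder exists solely to give the layer a training signal.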
  • 79. But, how does one train? Overall, the NN is trying to minimize a cost function by adjusting the weights. To find a minimum, gradient descent (GD) is used. If the training set is very large, GD can be too slow, since every training example is evaluated in the cost function at each step. So, one can compute the gradient on a sampled subset of the input instead. If the sampling is done at random, this is called stochastic gradient descent (SGD). The gradients themselves are computed by the error backpropagation algorithm.
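The difference between gradient descent and its stochastic variant can be shown on a one-parameter model. This is a toy sketch: the model y = w·x, the noise-free synthetic data, the batch size and the learning rate are all assumptions chosen to keep the example short.

```python
import random

def gradient(w, batch):
    """Gradient of the mean squared error of y = w * x over a batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)

def gradient_descent(data, steps=200, lr=0.5):
    w = 0.0
    for _ in range(steps):
        w -= lr * gradient(w, data)  # every training example is used at each step
    return w

def stochastic_gradient_descent(data, steps=200, lr=0.5, batch_size=10, seed=0):
    rng = random.Random(seed)
    w = 0.0
    for _ in range(steps):
        batch = rng.sample(data, batch_size)  # random subset of the training set
        w -= lr * gradient(w, batch)
    return w

# Synthetic data from y = 3x with x in [0, 1); purely illustrative.
rng = random.Random(1)
data = [(x, 3.0 * x) for x in (rng.random() for _ in range(1000))]

w_gd = gradient_descent(data)
w_sgd = stochastic_gradient_descent(data)
print(w_gd, w_sgd)
```

Each SGD step here touches 10 examples instead of 1000, which is where the speed-up on large training sets comes from; with noisy real data the SGD estimate would fluctuate around the minimum rather than settle exactly.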
  • 80. And that’s that: that’s the basic idea. There are many, many types of deep learning: different kinds of autoencoder, variations on architectures and training algorithms, etc. It is a very fast-growing area …
  • 81. Doing this in real life
  • 82. Typical Workflow: 1. Ingest training data and store it 2. Split the data set into training, testing and validation sets 3. Vectorize and extract features to go into the next step 4. Architect the multi-layer network and initialize it 5. Feed data and train 6. Test and validate 7. Repeat steps 4 and 5 until the results are as desired 8. Store the model 9. Put the model in an app and start generalizing on real data
  • 83. So what do you get? The same workflow, with tooling: for step 1, ingest training data and store it using Kafka, Flume, or good old web scraping; for steps 2, 3, 4 and 5, use libraries such as Caffe, Theano, Deeplearning4j, H2O
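Step 2 of the workflow (splitting into training, validation and testing sets) needs nothing beyond the standard library. A minimal sketch; the 70/15/15 proportions, the function name `split_dataset` and the synthetic rows are assumptions for illustration, not something the deck prescribes.

```python
import random

def split_dataset(rows, train=0.7, validation=0.15, seed=0):
    """Shuffle rows, then split them into training, validation and test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train)
    n_validation = round(len(shuffled) * validation)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_validation],
            shuffled[n_train + n_validation:])

# Hypothetical rows of (fields, class), standing in for ingested data.
rows = [([float(i), float(i % 7)], i % 2) for i in range(100)]
training, validation, testing = split_dataset(rows)
print(len(training), len(validation), len(testing))  # 70 15 15
```

Shuffling before slicing matters when the stored data is ordered (for example by class or by time of ingest); a fixed seed keeps the split reproducible between runs.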
  • 84. Distributed Deep Learning on Hadoop: in the ASF, Apache Singa is a new incubator project. Two main partners of Hortonworks: Skymind and H2O. Skymind is focussed on deep learning exclusively (Deeplearning4j); H2O includes other ML libraries. Both provide scale-out on HDP + Spark. Skymind has GPU acceleration built in and uses CUDA for linear algebra operations. Both are open source, Apache licensed.
  • 85. Deeplearning4j Architecture
  • 86. DL4J: Canova for vectorization and ingest. Canova uses an input/output format system (similar to how Hadoop uses MapReduce); supports all major types of input data (text, CSV, audio, image and video); can be extended for specialized input formats; connects to Kafka
  • 87. ND4J: N-dimensional vector library; scientific computing for the JVM; DL4J uses it to do linear algebra for backpropagation; supports GPUs via CUDA and native execution via jblas; deploys on Android; DL4J code remains unchanged whether using GPU or CPU
  • 88. How to choose a neural net in DL4J core?
  • 89. Demo!
  • 90. Thank You hortonworks.com