1. © 2014 Impetus Technologies1
Impetus Technologies Inc.
Deep Learning: Evolution of ML
from Statistical to Brain-like
Computing
The Fifth Elephant
July 25, 2014.
Dr. Vijay Srinivas Agneeswaran,
Director, Big Data Labs,
Impetus
2. © 2014 Impetus Technologies2
Contents
Introduction to Artificial Neural Networks
Deep learning networks
• Towards deep learning
• From ANNs to DLNs.
• Basics of DLNs.
• Related Approaches.
Distributed DLNs: Challenges
Distributed DLNs over GraphLab
4. © 2014 Impetus Technologies4
Introduction to Artificial Neural Networks (ANNs)
Perceptron
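A minimal Python sketch of the perceptron rule (illustrative, not from the slides): the unit outputs 1 when the weighted sum of its inputs plus a bias is positive. The particular weights and bias below happen to implement the NAND gate referred to on the back-propagation slide.

import numpy as np

def perceptron(x, w, b):
    # Threshold unit: output 1 if w.x + b > 0, else 0
    return 1 if np.dot(w, x) + b > 0 else 0

# Illustrative weights/bias implementing a NAND gate
w, b = np.array([-2.0, -2.0]), 3.0
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, perceptron(np.array(x), w, b))   # prints 1, 1, 1, 0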
5. © 2014 Impetus Technologies5
Introduction to Artificial Neural Networks (ANNs)
Sigmoid Neuron
• Small change in input = small change in behaviour.
• Output of a sigmoid neuron is given below:
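The formula referred to above did not survive extraction; the standard sigmoid output, with weights $w_j$, inputs $x_j$ and bias $b$, is
$\sigma(z) = \dfrac{1}{1 + e^{-z}}, \quad z = \sum_j w_j x_j + b$
so that small changes in the weights and inputs produce small changes in the output.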
6. © 2014 Impetus Technologies6
Introduction to Artificial Neural Networks (ANNs): Back
Propagation
http://zerkpage.tripod.com/ann.htm
What is this?
NAND Gate!
initialize network weights (often small random values)
do
   for each training example ex:
      prediction = neural-net-output(network, ex)   // forward pass
      actual = teacher-output(ex)
      compute error (prediction - actual) at the output units
      compute delta(w_h) for all weights from hidden layer to output layer   // backward pass
      compute delta(w_i) for all weights from input layer to hidden layer    // backward pass continued
      update network weights
until all examples classified correctly or another stopping criterion is satisfied
return the network
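A runnable NumPy sketch of the loop above, training a 2-2-1 sigmoid network on the NAND truth table; the layer sizes, learning rate and epoch count are illustrative choices, not taken from the slides.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[1], [1], [1], [0]], dtype=float)          # NAND targets

rng = np.random.default_rng(0)
W1, b1 = rng.normal(0, 0.5, (2, 2)), np.zeros((1, 2))    # input -> hidden
W2, b2 = rng.normal(0, 0.5, (2, 1)), np.zeros((1, 1))    # hidden -> output
lr = 1.0

for epoch in range(5000):
    # forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # backward pass: deltas for output and hidden layers
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # gradient-descent weight updates
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0, keepdims=True)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0, keepdims=True)

print(np.round(out, 2))   # should approach [[1], [1], [1], [0]]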
7. © 2014 Impetus Technologies7
The network to identify the individual digits from the
input image
http://neuralnetworksanddeeplearning.com/chap1.html
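Per the editor's notes at the end of the deck, this network has 784 input neurons (28 x 28 pixels), a hidden layer fixed at 10 neurons, and 10 output neurons (digits 0 to 9). A minimal sketch of the corresponding weight shapes (the initialization scheme is an illustrative assumption):

import numpy as np

n_input, n_hidden, n_output = 28 * 28, 10, 10
rng = np.random.default_rng(0)
W1 = rng.normal(0, 1 / np.sqrt(n_input), (n_input, n_hidden))    # input -> hidden
W2 = rng.normal(0, 1 / np.sqrt(n_hidden), (n_hidden, n_output))  # hidden -> output
print(W1.shape, W2.shape)   # (784, 10) (10, 10)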
8. © 2014 Impetus Technologies8
Different Shallow Architectures
Each of these shallow architectures ends in a weighted sum:
• Fixed basis functions – linear predictor
• Template matchers – kernel machines
• Simple trainable basis functions – ANN, radial basis functions
Y. Bengio and Y. LeCun, "Scaling learning algorithms towards AI," in Large Scale Kernel Machines (L. Bottou, O. Chapelle, D. DeCoste, and J. Weston, eds.), MIT Press, 2007.
10. © 2014 Impetus Technologies10
DLN for Face Recognition
http://theanalyticsstore.com/deep-learning/
11. © 2014 Impetus Technologies11
Deep Learning Networks: Learning
• No general learning algorithm (no-free-lunch theorem, Wolpert 1996).
• Learning algorithms exist for specific tasks – perception, control, prediction, planning, reasoning, language understanding.
• Limitations of back-propagation – local minima, optimization challenges for non-convex objective functions.
• Hinton's deep belief networks as a stack of RBMs.
• LeCun's energy-based learning for DBNs.
12. © 2014 Impetus Technologies12
Deep Belief Networks
• A deep neural network composed of multiple layers of latent variables (hidden units or feature detectors).
• Can be viewed as a stack of RBMs.
• A Boltzmann Machine is a specific energy model with a linear energy function.
• Hinton, along with his student, proposed that these networks can be trained greedily, one layer at a time (a sketch of this idea follows below).
http://www.iro.umontreal.ca/~lisa/twiki/pub/Public/DeepBeliefNetworks/DBNs.png
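A compact sketch of greedy layer-wise pre-training, assuming binary sigmoid units and one-step contrastive divergence (CD-1); this is an illustrative recipe, not Hinton's exact algorithm. Each RBM is trained on the hidden activations of the RBM below it.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_rbm(data, n_hidden, lr=0.1, epochs=50, seed=0):
    rng = np.random.default_rng(seed)
    n_visible = data.shape[1]
    W = rng.normal(0, 0.01, (n_visible, n_hidden))
    b, c = np.zeros(n_visible), np.zeros(n_hidden)
    for _ in range(epochs):
        # positive phase
        h_prob = sigmoid(data @ W + c)
        h_sample = (rng.random(h_prob.shape) < h_prob).astype(float)
        # negative phase: one Gibbs step (CD-1)
        v_recon = sigmoid(h_sample @ W.T + b)
        h_recon = sigmoid(v_recon @ W + c)
        W += lr * (data.T @ h_prob - v_recon.T @ h_recon) / len(data)
        b += lr * (data - v_recon).mean(axis=0)
        c += lr * (h_prob - h_recon).mean(axis=0)
    return W, b, c

# Greedy stacking: the hidden layer of one RBM becomes the visible layer of the next.
rng = np.random.default_rng(0)
X = (rng.random((200, 20)) < 0.3).astype(float)   # toy binary data
layers, inp = [], X
for n_hidden in (10, 5):                          # illustrative layer sizes
    W, b, c = train_rbm(inp, n_hidden)
    layers.append((W, b, c))
    inp = sigmoid(inp @ W + c)                    # feed activations upward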
13. © 2014 Impetus Technologies13
Energy Based Models
• RBMs are Energy Based Models (EBMs).
• An EBM associates an energy with every configuration of a system.
• Learning corresponds to modifying the shape of the energy function so that it has desirable properties.
• As in physics, lower energy = more stability.
• So, modify the shape of the energy function such that the desirable configurations have lower energy.
http://www.cs.nyu.edu/~yann/research/ebm/loss-func.png
14. © 2014 Impetus Technologies14
Other DL networks: Convolutional Networks
Yann LeCun, Patrick Haffner, Léon Bottou, and Yoshua Bengio. 1999. Object Recognition with Gradient-Based Learning. In Shape, Contour and Grouping in Computer Vision, David A. Forsyth, Joseph L. Mundy, Vito Di Gesù, and Roberto Cipolla (Eds.). Springer-Verlag, London, UK, 319-.
15. © 2014 Impetus Technologies15
Other DL Networks: Auto Encoders (Auto-associators or Diabolo Networks)
• The aim of an auto encoder network is to learn a compressed representation for a set of data.
• It is an unsupervised learning algorithm that applies back propagation, setting the target values equal to the inputs (identity function); see the sketch below.
• A denoising auto encoder addresses the trivial identity mapping by randomly corrupting the input, which the auto encoder must then reconstruct, or denoise.
• Best applied when there is structure in the data.
• Applications: dimensionality reduction, feature selection.
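A minimal NumPy sketch of an auto encoder trained with back-propagation and targets equal to the inputs; the data, layer sizes and hyper-parameters are illustrative assumptions. The toy data is deliberately structured (drawn from a few prototypes) so that the 8-to-3 bottleneck can reconstruct it, echoing the "structure in the data" point above.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
prototypes = rng.integers(0, 2, size=(3, 8)).astype(float)
X = prototypes[rng.integers(0, 3, size=200)]           # structured toy data

n_in, n_hidden = 8, 3                                   # 8 -> 3 -> 8 bottleneck
W1, b1 = rng.normal(0, 0.1, (n_in, n_hidden)), np.zeros(n_hidden)
W2, b2 = rng.normal(0, 0.1, (n_hidden, n_in)), np.zeros(n_in)
lr = 1.0

for epoch in range(3000):
    h = sigmoid(X @ W1 + b1)                    # encoder: compressed code
    out = sigmoid(h @ W2 + b2)                  # decoder: reconstruction
    d_out = (out - X) * out * (1 - out)         # target = input (identity)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out / len(X)
    b2 -= lr * d_out.mean(axis=0)
    W1 -= lr * X.T @ d_h / len(X)
    b1 -= lr * d_h.mean(axis=0)

print(np.mean((out - X) ** 2))   # reconstruction error should fall well below its initial value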
16. © 2014 Impetus Technologies16
Why Deep Learning Networks are Brain-like?
• Statistical approach of traditional ML – SVMs or kernel approaches – is not applicable in deep learning networks.
• Human brain – trophic factors.
• Traditional ML – a lot of data munging and representational work (a feature abstractor) before the classifier can kick in.
• Deep learning – allows the system to learn representations naturally as well.
17. © 2014 Impetus Technologies17
Success stories of DLNs
• Android voice recognition system – based on DLNs; improves accuracy by 25% compared to the state of the art.
• Microsoft Skype Translate software and the digital assistant Cortana.
• ImageNet data (1.2 million images, 1000 classes) – error rate of 15.3%, better than the state of the art at 26.1%.
18. © 2014 Impetus Technologies18
Success stories of DLNs…..
• SENNA system – PoS tagging, chunking, NER, semantic role labeling, syntactic parsing.
• Comparable F1 score to the state of the art, with a huge speed advantage (a few hours vs. 5 days).
• DLNs vs. TF-IDF: relevance search over 1 million documents – 3.2 ms vs. 1.2 s.
• Robot navigation.
19. © 2014 Impetus Technologies19
Potential Applications of DLNs
Speech recognition/enhancement
Video sequencing
Emotion recognition (video/audio),
Malware detection,
Robotics – navigation.
multi-modal learning (text and image).
Natural Language Processing
20. © 2014 Impetus Technologies20
Available resources
• Deeplearning4j – open-source implementation of Jeffrey Dean's distributed deep learning paper.
• Theano: Python library of math functions.
   – Efficient use of GPUs, transparently.
• Hinton's course on Coursera: https://www.coursera.org/instructor/~154
21. © 2014 Impetus Technologies21
Challenges in Realizing DLNs
• Large number of training examples – high accuracy.
• A large number of parameters can also improve accuracy.
• Inherently sequential nature – freeze up one layer for learning.
• GPUs to improve training speedup; limitation – CPU-to-GPU data transfers.
• Distributed DLNs – Jeffrey Dean's work.
22. © 2014 Impetus Technologies22
Distributed DLNs
• Motivation
   – Scalable, low-latency training
   – Parallelize training data and learn fast
• Jeffrey Dean's work: DistBelief
   – Pseudo-centralized realization
23. © 2014 Impetus Technologies23
Distributed DLNs over GraphLab
• Purely distributed realizations are needed.
• Our approach
   – Use an asynchronous graph processing framework (GraphLab); see the sketch below.
   – Make modifications in the GraphLab code as required
      • Layer abstraction, mass communication
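An illustrative Python sketch (not GraphLab's actual API) of the gather-apply-scatter style such asynchronous graph engines use: neurons become vertices, weights become edges, and a forward pass is expressed as per-vertex programs executed layer by layer.

import numpy as np
from collections import defaultdict

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy graph (assumed): vertices hold activations, edges hold weights.
activation = {0: 1.0, 1: 0.0, 2: 0.0, 3: 0.0}      # vertices 0, 1 = input layer
weights = {(0, 2): 0.5, (1, 2): -0.3, (0, 3): 0.8, (1, 3): 0.1}
in_edges = defaultdict(list)
for (src, dst), w in weights.items():
    in_edges[dst].append((src, w))

def vertex_program(v):
    # Gather weighted inputs, apply the activation, store the result.
    total = sum(activation[src] * w for src, w in in_edges[v])
    activation[v] = sigmoid(total)

for v in (2, 3):           # 'scatter' would then signal the next layer's vertices
    vertex_program(v)
print(activation)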
24. © 2014 Impetus Technologies24
Distributed DLNs over GraphLab Engine
25. © 2014 Impetus Technologies25
Conclusions
• ANN to distributed deep learning
   – Key ideas in deep learning
   – Need for distributed realizations
   – DistBelief, deeplearning4j, etc.
   – Our work on large-scale distributed deep learning
• Deep learning leads us from statistics-based machine learning towards brain-inspired AI.
26. © 2014 Impetus Technologies26
THANK YOU!
Mail • bigdata@impetus.com
LinkedIn • www.linkedin.com/company/impetus
Blogs • blogs.impetus.com
Twitter • @impetustech
28. © 2014 Impetus Technologies28
Other Brain-like Approaches
• Recurrent neural networks
   – Long Short-Term Memory (LSTM), temporal data
• Sum-product networks
   – Deep architectures of sum-product networks
• Hierarchical temporal memory
   – Online structural and algorithmic model of the neocortex
29. © 2014 Impetus Technologies29
Recurrent Neural Networks
• Connections between units form a directed cycle, i.e. typical feedback connections.
• RNNs can use their internal memory to process arbitrary sequences of inputs.
• Plain RNNs cannot learn to look far back into the past.
• LSTMs solve this problem by introducing memory cells.
• These memory cells can remember a value for an arbitrary amount of time.
30. © 2014 Impetus Technologies30
Sum-Product Networks (SPNs)
• An SPN is a deep network model in the form of a directed acyclic graph.
• These networks allow the probability of an event to be computed quickly.
• SPNs try to convert multilinear functions into computationally short forms, i.e. forms consisting of multiple additions and multiplications.
• Leaves correspond to variables; internal nodes correspond to sums and products.
31. © 2014 Impetus Technologies31
Hierarchical Temporal Memory
• An online machine learning model developed by Jeff Hawkins.
• The model learns one instance at a time.
• Best explained with an online stock model: today's behaviour of a stock helps predict tomorrow's.
• An HTM network is a tree-shaped hierarchy of levels.
• Higher hierarchy levels can reuse patterns learned at lower levels; this is adopted from the learning model of the brain's neocortex.
32. © 2014 Impetus Technologies32
http://en.wikipedia.org/wiki/Hierarchical_temporal_memory
33. © 2014 Impetus Technologies33
Mathematical Equations
• The energy function of an RBM is defined as follows:
$E(x, h) = -b'x - c'h - h'Wx$
where $W$ represents the weights connecting the visible layer and the hidden layer, and $b'$ and $c'$ are the biases.
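A small NumPy sketch (with assumed toy dimensions) that evaluates this energy for binary visible and hidden vectors:

import numpy as np

def rbm_energy(x, h, W, b, c):
    # E(x, h) = -b'x - c'h - h'Wx
    # x: visible vector, h: hidden vector, W: (hidden x visible) weights, b/c: biases
    return -b @ x - c @ h - h @ W @ x

# Illustrative dimensions: 6 visible units, 3 hidden units
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 6))
b, c = np.zeros(6), np.zeros(3)
x = rng.integers(0, 2, size=6).astype(float)
h = rng.integers(0, 2, size=3).astype(float)
print(rbm_energy(x, h, W, b, c))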
34. © 2014 Impetus Technologies34
Learning Energy Based Models
• Energy based models can be learnt by performing gradient descent on
negative log-likelihood of training data
• It has the following form:
$-\dfrac{\partial \log p(x)}{\partial \theta} = \dfrac{\partial F(x)}{\partial \theta} - \sum_{\tilde{x}} p(\tilde{x}) \dfrac{\partial F(\tilde{x})}{\partial \theta}$
where $F$ is the free energy; the first term is the positive phase and the sum over model samples $\tilde{x}$ is the negative phase.
Editor's notes
Reference: http://neuralnetworksanddeeplearning.com/chap1.html
Consider the problem to identify the individual digits from the input image
Each image is a 28 by 28 pixel image. The network is then designed as follows:
Input layer (image) -> 28*28 = 784 neurons. Each neuron corresponds to a pixel
The output layer can be identified by the number of digits to be identified i.e. 10 (0 to 9)
The intermediate hidden layer can be experimented with varied number of neurons. Let us fix at 10 nodes in hidden layer
Reference: http://neuralnetworksanddeeplearning.com/chap1.html
How about recognizing a human face from given set of random images?
Attack this problem in a similar fashion to the one explained earlier. Input -> image pixels, output -> is it a face or not? (a single node)
A face can be recognized by answering some questions like “Is there an eye in the top left?”, “Is there a nose in the middle?” etc..
Each question corresponds to a hidden layer
ANN for face recognition?
Why SVMs or any kernel based approach cannot be used here.
Implicit assumption of a locally smooth function around each training example.
Problem decomposition into sub-problems
Breakdown into sub-problems, solvable by sub-networks. Complex problem requires more sub-networks, more hidden layers, hence need for deep neural networks.
http://deeplearning4j.org/convolutionalnets.html
Refined by Lecun in 1989 – mainly to apply CNNs to identify variability in 2D image data.
Introduced in 1980 by Fukushima
A type of RBMs where the communication is absent across the nodes in the same layer
Nodes are not connected to every other node of next layer. Symmetry is not there
Convolution networks learn images by pieces rather than learning as a whole (RBM does this)
Designed to use minimal amounts of pre processing
http://ufldl.stanford.edu/wiki/index.php/Autoencoders_and_Sparsity
Add layers
Give no. of nodes in each layer
Create max. no. of nodes across the layers.
Forward propagation
Backward propagation.
Run the engine
http://www.idsia.ch/~juergen/rnn.html
http://deep-awesomeness.tumblr.com/post/63736448581/sum-product-networks-spm
http://lessoned.blogspot.in/2011/10/intro-to-sum-product-networks.html
http://en.wikipedia.org/wiki/Hierarchical_temporal_memory