11. Deep Learning Today
• Advances in speech recognition over the last two years
• A few long-standing performance records were broken with deep learning methods
• Microsoft and Google have both deployed DL-based speech recognition systems in
their products
• Advances in Computer Vision
• Feature engineering is the bread-and-butter of a large portion of the CV community,
which creates some resistance to feature learning
• But the record holders on ImageNet and Semantic Segmentation are convolutional
nets
• Advances in Natural Language Processing
• Fine-grained sentiment analysis, syntactic parsing
• Language modeling, machine translation, question answering
12. Engine management
• The behaviour of a car engine is influenced
by a large number of parameters
– temperature at various points
– fuel/air mixture
– lubricant viscosity.
• Major companies have used neural networks
to dynamically tune an engine depending on
current settings.
14. Signature recognition
• Each person's signature is different.
• There are structural similarities which are
difficult to quantify.
• One company has manufactured a machine
which recognizes signatures to within a high
level of accuracy.
– Considers speed in addition to gross shape.
– Makes forgery even more difficult.
15. Sonar target recognition
• Distinguish mines from rocks on sea-bed
• The neural network is provided with a large
number of parameters which are extracted
from the sonar signal.
• The training set consists of sets of signals
from rocks and mines.
16. Stock market prediction
• “Technical trading” refers to trading based
solely on known statistical parameters; e.g.
previous price
• Neural networks have been used to attempt
to predict changes in prices.
• Difficult to assess success since companies
using these techniques are reluctant to
disclose information.
17. Mortgage assessment
• Assess risk of lending to an individual.
• Difficult to decide on marginal cases.
• Neural networks have been trained to make
decisions, based upon the opinions of expert
underwriters.
• Neural network produced a 12% reduction in
delinquencies compared with human experts.
22. Limitations of Neural Networks
Random initialization + densely connected networks lead to:
• High cost
• Each neuron in the neural network can be considered as a logistic regression.
• Training the entire neural network amounts to training all of these interconnected logistic regressions.
• Difficult to train as the number of hidden layers increases
• Recall that logistic regression is trained by gradient descent.
• In backpropagation, the gradient becomes progressively more dilute. That is, below the top layers,
the correction signal δ is minimal (a small numeric sketch follows after this list).
• Stuck in local optima
• The objective function of the neural network is usually not convex.
• Random initialization does not guarantee starting near the global optimum.
• Solution:
• Deep Learning/Learning multiple levels of representation
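Below is a minimal numerical sketch of the gradient-dilution point above, assuming NumPy; the network depth, width, weight scale and the all-ones error signal are illustrative choices, not part of the original slide.

    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # A deep, densely connected stack of sigmoid layers with random weights.
    n_layers, width = 8, 20
    weights = [rng.normal(scale=1 / np.sqrt(width), size=(width, width))
               for _ in range(n_layers)]

    # Forward pass, keeping every layer's activation.
    activations = [rng.normal(size=width)]
    for W in weights:
        activations.append(sigmoid(W @ activations[-1]))

    # Backward pass: push an error signal down through the stack. Each step
    # multiplies by sigmoid'(z) = a * (1 - a) <= 0.25, so the correction
    # signal below the top layers quickly becomes tiny.
    delta = np.ones(width)                     # arbitrary error at the output layer
    for i in range(n_layers - 1, 0, -1):
        a = activations[i]
        delta = (weights[i].T @ delta) * a * (1.0 - a)
        print(f"layer {i}: mean |delta| = {np.abs(delta).mean():.2e}")

Running this prints a mean |δ| that shrinks rapidly as the error signal moves away from the output layer, which is the practical obstacle the slide refers to.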
34. Training data
Fields           Class
1.4  2.7  1.9      0
3.8  3.4  3.2      0
6.4  2.8  1.7      1
4.1  0.1  0.2      0
etc …
Present a training pattern
[Diagram: the inputs 1.4, 2.7, 1.9 are presented to the network's input units]
35. Training data (same table as above)
Feed it through to get output
[Diagram: inputs 1.4, 2.7, 1.9 → network output 0.8]
36. Training data (same table as above)
Compare with target output
[Diagram: inputs 1.4, 2.7, 1.9 → output 0.8, target 0, error 0.8]
37. Training data (same table as above)
Adjust weights based on error
[Diagram: inputs 1.4, 2.7, 1.9 → output 0.8, target 0, error 0.8]
38. Training data (same table as above)
Present a training pattern
[Diagram: the inputs 6.4, 2.8, 1.7 are presented to the network's input units]
39. Training data (same table as above)
Feed it through to get output
[Diagram: inputs 6.4, 2.8, 1.7 → network output 0.9]
40. Training data (same table as above)
Compare with target output
[Diagram: inputs 6.4, 2.8, 1.7 → output 0.9, target 1, error -0.1]
41. Training data (same table as above)
Adjust weights based on error
[Diagram: inputs 6.4, 2.8, 1.7 → output 0.9, target 1, error -0.1]
42. Training data (same table as above)
And so on …
[Diagram: inputs 6.4, 2.8, 1.7 → output 0.9, target 1, error -0.1]
Repeat this thousands, maybe millions of times – each time
taking a random training instance, and making slight
weight adjustments
Algorithms for weight adjustment are designed to make
changes that will reduce the error; one simple such rule is sketched below
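One simple weight-adjustment rule of this kind is the delta rule for a single sigmoid unit; the sketch below, assuming NumPy, trains such a unit on the table above. The learning rate, iteration count and use of a single unit (rather than a full multi-layer network) are illustrative simplifications.

    import numpy as np

    rng = np.random.default_rng(0)

    # Training data from the table above: three fields -> class.
    X = np.array([[1.4, 2.7, 1.9],
                  [3.8, 3.4, 3.2],
                  [6.4, 2.8, 1.7],
                  [4.1, 0.1, 0.2]])
    t = np.array([0.0, 0.0, 1.0, 0.0])

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    w = rng.normal(scale=0.1, size=3)   # random initial weights
    b = 0.0                             # bias term
    lr = 0.5                            # small steps = "slight weight adjustments"

    for step in range(20000):           # "thousands, maybe millions of times"
        i = rng.integers(len(X))        # take a random training instance
        o = sigmoid(X[i] @ w + b)       # feed it through to get the output
        error = t[i] - o                # compare with the target output
        # Delta rule: nudge each weight in the direction that reduces the error.
        grad = error * o * (1.0 - o)
        w += lr * grad * X[i]
        b += lr * grad

    print(np.round(sigmoid(X @ w + b), 2))   # outputs drift towards the class column 0, 0, 1, 0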
55. What does this unit detect?
[Diagram: one unit over the flattened image pixels (indices 1, 5, 10, 15, 20, 25, …) with a
strong +ve weight on each top-row pixel and low/zero weight everywhere else]
It will send a strong signal for a horizontal line in the top row, ignoring everywhere else;
a toy numeric illustration follows below.
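A toy numeric version of this unit, assuming a 5x5 black/white image flattened into 25 inputs (the slide does not give the actual image size): strong positive weight on the top-row pixels, zero weight everywhere else.

    import numpy as np

    # One unit over a flattened 5x5 binary image (pixel indices 0..24):
    # strong +ve weight on the five top-row pixels, low/zero weight elsewhere.
    w = np.zeros(25)
    w[:5] = 5.0

    top_line = np.zeros((5, 5)); top_line[0, :] = 1.0   # horizontal line in the top row
    mid_line = np.zeros((5, 5)); mid_line[2, :] = 1.0   # same line, but in a middle row

    print(w @ top_line.ravel())   # 25.0 -> strong signal
    print(w @ mid_line.ravel())   #  0.0 -> ignored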
62. Backpropagation Algorithm – Main Idea – error in hidden layers
The ideas of the algorithm can be summarized as follows:
1. Compute the error term for the output units using the
observed error.
2. From the output layer, repeat
- propagating the error term back to the previous layer, and
- updating the weights between the two layers,
until the earliest hidden layer is reached (standard equations for these steps are sketched below).
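In standard textbook notation (a sketch; the deck's own symbols may differ), with g the activation function, in_j the weighted input to unit j, a_i the activation of unit i, and α the learning rate, step 1 and the two repeated operations of step 2 are:

    δ_k = g′(in_k) · (t_k − o_k)             (error term at each output unit k)
    δ_j = g′(in_j) · Σ_k w_jk · δ_k          (error term propagated back to hidden unit j)
    w_ij ← w_ij + α · a_i · δ_j              (update of the weight between the two layers)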
63. Backpropagation Algorithm
• Initialize weights (typically random!)
• Keep doing epochs
• For each example e in the training set do
• forward pass to compute
• O = neural-net-output(network, e)
• miss = (T - O) at each output unit
• backward pass to calculate deltas to weights
• update all weights
• end
• until tuning set error stops improving
Forward pass explained earlier; backward pass explained in the next slide. A runnable sketch follows below.
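A minimal NumPy sketch of this pseudocode with one hidden layer, trained on the training table from the earlier slides. The hidden-layer size, learning rate and fixed epoch count are illustrative; a fixed count stands in for "until tuning set error stops improving".

    import numpy as np

    rng = np.random.default_rng(1)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Training set from the earlier slides: three fields -> class.
    X = np.array([[1.4, 2.7, 1.9], [3.8, 3.4, 3.2],
                  [6.4, 2.8, 1.7], [4.1, 0.1, 0.2]])
    T = np.array([[0.0], [0.0], [1.0], [0.0]])

    # Initialize weights (typically random!)
    W1, b1 = rng.normal(scale=0.5, size=(3, 4)), np.zeros(4)   # input -> hidden
    W2, b2 = rng.normal(scale=0.5, size=(4, 1)), np.zeros(1)   # hidden -> output
    lr = 0.5

    for epoch in range(5000):                 # keep doing epochs
        for x, t in zip(X, T):                # for each example e in the training set
            # Forward pass: O = neural-net-output(network, e)
            h = sigmoid(x @ W1 + b1)
            o = sigmoid(h @ W2 + b2)
            miss = t - o                      # miss = (T - O) at each output unit
            # Backward pass: deltas for the output and hidden layers
            d_out = miss * o * (1.0 - o)
            d_hid = (d_out @ W2.T) * h * (1.0 - h)
            # Update all weights
            W2 += lr * np.outer(h, d_out); b2 += lr * d_out
            W1 += lr * np.outer(x, d_hid); b1 += lr * d_hid

    print(np.round(sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2), 2))   # should approach 0, 0, 1, 0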
69. Bias
Each neuron is like a simple logistic regression: you
have y = σ(Wx + b). The input values are multiplied by the
weights, and the bias shifts the operating point of the squashing
function (sigmoid, tanh, etc.), which gives the desired non-linearity.
For example, assume that you want a neuron to
fire y ≈ 1 when all the input pixels are black (x ≈ 0). With
no bias, no matter what weights W you have, the
equation y = σ(Wx) means the neuron will always fire y ≈ 0.5.
A small numeric sketch follows below.
[Plot: tanh squashing function shifted by bias = 6; data values between -1 and 1]
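A small numeric sketch of the point above, assuming NumPy; the three-pixel input, the weights and the bias value of 6 (taken from the plot) are purely illustrative.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    W = np.array([-2.0, -2.0, -2.0])   # any weights at all
    x = np.zeros(3)                    # all input pixels black (x ~ 0)

    print(sigmoid(W @ x))              # no bias: the unit always fires 0.5
    print(sigmoid(W @ x + 6.0))        # bias = 6 shifts the squashing, so it fires ~1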
74. TensorFlow
• What is it:
• Neural network software for numerical computation - uses data flow graphs for computation
• Developed at Google’s machine intelligence research organization
• What can it be used for:
• Any machine learning / neural network problem (a minimal usage sketch follows at the end of this slide)
• Video Demonstration
• A six-minute video introduction to TensorFlow on YouTube.
• Further information:
• www.tensorflow.org
• https://www.youtube.com/watch?v=bYeBL92v99Y
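As a minimal usage sketch (assuming a current TensorFlow 2.x / Keras installation, which differs considerably from the graph-building API of the version this slide describes), here is the tiny three-field classifier from the earlier slides defined and trained in TensorFlow; layer sizes, learning rate and epoch count are illustrative.

    import numpy as np
    import tensorflow as tf

    X = np.array([[1.4, 2.7, 1.9], [3.8, 3.4, 3.2],
                  [6.4, 2.8, 1.7], [4.1, 0.1, 0.2]], dtype=np.float32)
    t = np.array([0.0, 0.0, 1.0, 0.0], dtype=np.float32)

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(3,)),
        tf.keras.layers.Dense(4, activation="sigmoid"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.5), loss="mse")
    model.fit(X, t, epochs=2000, verbose=0)
    print(model.predict(X).round(2))   # predictions move towards 0, 0, 1, 0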
75. Torch
• What is it:
• Torch is a scientific computing framework for machine learning.
• The goal is to be flexible and allow the building of scientific algorithms quickly - contains neural network
and optimization libraries
• What can it be used for:
• Machine learning neural network problems
• Video Demonstration
• A three-minute introduction on YouTube.
• Further information:
• http://torch.ch/
• https://www.youtube.com/watch?v=uxja6iwOnc4&list=PLjJh1vlSEYgvGod9wWiydumYl8hOXixNu&index=19
76. CNTK
• What is it:
• CNTK stands for Computational Network Toolkit - created by Microsoft.
• Designed for use with CPUs or GPUs (i.e., graphics processing units)
• What can it be used for:
• Can be used for image classification problems, video analysis, speech recognition and natural language
processing.
• Video Demonstration
• A two-minute introduction on YouTube.
• Further information:
• https://www.cntk.ai/
• https://www.youtube.com/watch?v=-mLdConF1EU
77. Caffe
• What is it:
• Caffe is a deep learning framework designed to be modular and fast – used with CPUs or GPUs.
• Developed by the Berkeley Vision and Learning Center (BVLC) and community contributors.
• What can it be used for:
• Originally developed for machine vision, but now able to handle speech and text problems.
• Video Demonstration
• A three-minute introduction on YouTube.
• Further information:
• http://caffe.berkeleyvision.org/
• https://www.youtube.com/watch?v=bOIZ74rOik0
94. References
• Bordes, A., Chopra, S., & Weston, J. (2014). Question answering with subgraph embeddings. arXiv preprint arXiv:1406.3676.
• Graves, A., Mohamed, A. R., & Hinton, G. (2013, May). Speech recognition with deep recurrent neural networks. In Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on (pp. 6645-6649). IEEE.
• Graves, A. (2013). Generating sequences with recurrent neural networks. arXiv preprint arXiv:1308.0850.
• Irsoy, O., & Cardie, C. (2014, October). Opinion mining with deep recurrent neural networks. In EMNLP (pp. 720-728).
• Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.
• Rubenstein, H., & Goodenough, J. B. (1965). Contextual correlates of synonymy. Communications of the ACM, 8(10), 627-633.
• Socher, R., Perelygin, A., Wu, J. Y., Chuang, J., Manning, C. D., Ng, A. Y., & Potts, C. (2013, October). Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the conference on empirical methods in natural language processing (EMNLP) (Vol. 1631, p. 1642).
• Sutskever, I., Vinyals, O., & Le, Q. V. (2014). Sequence to sequence learning with neural networks. In Advances in neural information processing systems (pp. 3104-3112).
• Tai, K. S., Socher, R., & Manning, C. D. (2015). Improved semantic representations from tree-structured long short-term memory networks. arXiv preprint arXiv:1503.00075.