Neural Networks for Machine Learning

            Lecture 1a
 Why do we need machine learning?

 Geoffrey Hinton
 with
 Nitish Srivastava
 Kevin Swersky
What is Machine Learning?
•   It is very hard to write programs that solve problems like recognizing a
    three-dimensional object from a novel viewpoint in new lighting
    conditions in a cluttered scene.
      – We don’t know what program to write because we don’t know how
          it’s done in our brain.
      – Even if we had a good idea about how to do it, the program might
          be horrendously complicated.

•   It is hard to write a program to compute the probability that a credit
    card transaction is fraudulent.
      – There may not be any rules that are both simple and reliable. We
          need to combine a very large number of weak rules.
      – Fraud is a moving target. The program needs to keep changing.
The Machine Learning Approach
• Instead of writing a program by hand for each specific task, we collect
  lots of examples that specify the correct output for a given input.
• A machine learning algorithm then takes these examples and produces
  a program that does the job.
   – The program produced by the learning algorithm may look very
      different from a typical hand-written program. It may contain millions
      of numbers.
   – If we do it right, the program works for new cases as well as the ones
      we trained it on.
   – If the data changes, the program can change too by training on the
      new data.
• Massive amounts of computation are now cheaper than paying
  someone to write a task-specific program.
Some examples of tasks best solved by learning
• Recognizing patterns:
   – Objects in real scenes
   – Facial identities or facial expressions
   – Spoken words
• Recognizing anomalies:
   – Unusual sequences of credit card transactions
   – Unusual patterns of sensor readings in a nuclear power plant
• Prediction:
   – Future stock prices or currency exchange rates
   – Which movies will a person like?
A standard example of machine learning
• A lot of genetics is done on fruit flies.
   – They are convenient because they breed fast.
   – We already know a lot about them.
• The MNIST database of hand-written digits is the machine learning
  equivalent of fruit flies.
   – They are publicly available and we can learn them quite fast in a
      moderate-sized neural net.
   – We know a huge amount about how well various machine learning
      methods do on MNIST.
• We will use MNIST as our standard task.
It is very hard to say what makes a 2
Beyond MNIST: The ImageNet task
• 1000 different object classes in 1.3 million high-resolution training images
  from the web.
    – The best system in the 2010 competition got 47% error for its first
      choice and 25% error for its top 5 choices.
• Jitendra Malik (an eminent neural net sceptic) said that this competition is
  a good test of whether deep neural networks work well for object
  recognition.
    – A very deep neural net (Krizhevsky et al. 2012) gets less than 40%
      error for its first choice and less than 20% for its top 5 choices
      (see lecture 5).
Some examples from an earlier version of the net
It can deal with a wide range of objects
It makes some really cool errors
The Speech Recognition Task
• A speech recognition system has several stages:
   – Pre-processing: Convert the sound wave into a vector of acoustic
     coefficients. Extract a new vector about every 10 milliseconds (see the
     sketch after this slide).
   – The acoustic model: Use a few adjacent vectors of acoustic coefficients
     to place bets on which part of which phoneme is being spoken.
   – Decoding: Find the sequence of bets that does the best job of fitting the
     acoustic data and also fitting a model of the kinds of things people say.
• Deep neural networks pioneered by George Dahl and Abdel-rahman
  Mohamed are now replacing the previous machine learning method
  for the acoustic model.
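
A minimal sketch of the pre-processing step, assuming a mono waveform sampled at 16 kHz; frame log-energy stands in for real acoustic coefficients such as filterbank outputs, and the 25 ms window length is our own choice (only the 10 ms hop comes from the slide):

    import numpy as np

    def frame_features(waveform, sample_rate=16000, frame_ms=25, hop_ms=10):
        """Chop a waveform into overlapping frames, one new frame every hop_ms,
        and return a simple acoustic coefficient (log-energy) per frame."""
        frame_len = int(sample_rate * frame_ms / 1000)   # samples per frame
        hop_len = int(sample_rate * hop_ms / 1000)       # samples between frame starts
        n_frames = 1 + max(0, (len(waveform) - frame_len) // hop_len)
        feats = np.empty(n_frames)
        for i in range(n_frames):
            frame = waveform[i * hop_len : i * hop_len + frame_len]
            feats[i] = np.log(np.sum(frame ** 2) + 1e-10)  # log-energy of the frame
        return feats  # one value roughly every 10 ms

    # Example: one second of random noise gives just under 100 frames.
    print(frame_features(np.random.randn(16000)).shape)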
Phone recognition on the TIMIT benchmark
                       (Mohamed, Dahl, & Hinton, 2012)
   [Architecture diagram: 15 frames of 40 filterbank outputs + their temporal
   derivatives feed into a stack of 2000-unit logistic hidden layers (5 more
   layers of pre-trained weights), topped by 183 HMM-state labels that are not
   pre-trained.]

   – After standard post-processing using a bi-phone model, a deep net with
     8 layers gets a 20.7% error rate.
   – The best previous speaker-independent result on TIMIT was 24.4%, and
     this required averaging several models.
   – Li Deng (at MSR) realised that this result could change the way speech
     recognition was done.
Word error rates from MSR, IBM, & Google
(Hinton et al., IEEE Signal Processing Magazine, Nov 2012)

The task                 Hours of         Deep neural      Gaussian         GMM with
                         training data    network          Mixture Model    more data

Switchboard              309              18.5%            27.4%            18.6%
(Microsoft Research)                                                        (2000 hrs)

English broadcast        50               17.5%            18.8%
news (IBM)

Google voice search      5,870            12.3%                             16.0%
(android 4.1)                             (and falling)                     (>>5,870 hrs)
Neural Networks for Machine Learning

               Lecture 1b
        What are neural networks?

 Geoffrey Hinton
 with
 Nitish Srivastava
 Kevin Swersky
Reasons to study neural computation
•   To understand how the brain actually works.
     – It’s very big and very complicated and made of stuff that dies when you
        poke it around. So we need to use computer simulations.
•   To understand a style of parallel computation inspired by neurons and their
    adaptive connections.
     – Very different style from sequential computation.
          • Should be good for things that brains are good at (e.g. vision)
          • Should be bad for things that brains are bad at (e.g. 23 x 71)
•   To solve practical problems by using novel learning algorithms inspired by
    the brain (this course)
     – Learning algorithms can be very useful even if they are not how the
       brain actually works.
A typical cortical neuron
•   Gross physical structure:
     – There is one axon that branches.
     – There is a dendritic tree that collects input from other neurons.
•   Axons typically contact dendritic trees at synapses.
     – A spike of activity in the axon causes charge to be injected into the
       post-synaptic neuron.
•   Spike generation:
     – There is an axon hillock that generates outgoing spikes whenever enough
       charge has flowed in at synapses to depolarize the cell membrane.
    [Diagram: a neuron with its cell body, dendritic tree, axon hillock, and
    branching axon.]
Synapses

• When a spike of activity travels along an axon and
  arrives at a synapse it causes vesicles of transmitter
  chemical to be released.
   – There are several kinds of transmitter.

• The transmitter molecules diffuse across the synaptic
  cleft and bind to receptor molecules in the membrane of
  the post-synaptic neuron thus changing their shape.
   – This opens up holes that allow specific ions in or
      out.
How synapses adapt

• The effectiveness of the synapse can be changed:
   – vary the number of vesicles of transmitter.
   – vary the number of receptor molecules.

• Synapses are slow, but they have advantages over RAM
   – They are very small and very low-power.
   – They adapt using locally available signals
      • But what rules do they use to decide how to change?
How the brain works on one slide!
•   Each neuron receives inputs from other neurons
     - A few neurons also connect to receptors.
     - Cortical neurons use spikes to communicate.
•   The effect of each input line on the neuron is controlled
    by a synaptic weight
     – The weights can be positive or negative.
•   The synaptic weights adapt so that the whole network learns to perform
    useful computations
     – Recognizing objects, understanding language, making plans,
        controlling the body.
•   You have about 10^11 neurons, each with about 10^4 weights.
     – A huge number of weights can affect the computation in a very short
        time. Much better bandwidth than a workstation.
Modularity and the brain
•   Different bits of the cortex do different things.
     – Local damage to the brain has specific effects.
     – Specific tasks increase the blood flow to specific regions.
•   But cortex looks pretty much the same all over.
     – Early brain damage makes functions relocate.
•   Cortex is made of general purpose stuff that has the ability to turn into
    special purpose hardware in response to experience.
     – This gives rapid parallel computation plus flexibility.
     – Conventional computers get flexibility by having stored sequential
        programs, but this requires very fast central processors to perform
        long sequential computations.
Neural Networks for Machine Learning

           Lecture 1c
  Some simple models of neurons

  Geoffrey Hinton
  with
  Nitish Srivastava
  Kevin Swersky
Idealized neurons
•   To model things we have to idealize them (e.g. atoms)
      – Idealization removes complicated details that are not essential
          for understanding the main principles.
      – It allows us to apply mathematics and to make analogies to
          other, familiar systems.
      – Once we understand the basic principles, it’s easy to add
          complexity to make the model more faithful.
•   It is often worth understanding models that are known to be wrong
    (but we must not forget that they are wrong!)
      – E.g. neurons that communicate real values rather than discrete
          spikes of activity.
Linear neurons
• These are simple but computationally limited
   – If we can make them learn we may get insight into more
     complicated neurons.
    y = b + Σ_i x_i w_i

    where y is the output, b is the bias, x_i is the i-th input, w_i is the
    weight on the i-th input, and the index i runs over the input connections.
Linear neurons
• These are simple but computationally limited
   – If we can make them learn we may get insight into more
     complicated neurons.



    y = b + Σ_i x_i w_i

    [Plot: the output y is a straight line as a function of the total input
    b + Σ_i x_i w_i, passing through zero when the total input is zero.]
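
A minimal sketch of a linear neuron (the function name and the example numbers are ours, not from the lecture):

    import numpy as np

    def linear_neuron(x, w, b):
        # y = b + sum_i x_i * w_i
        return b + np.dot(x, w)

    # Two inputs: 1.0*0.5 + 2.0*(-0.3) + 0.1 = 0.0
    print(linear_neuron(np.array([1.0, 2.0]), np.array([0.5, -0.3]), b=0.1))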
Binary threshold neurons
• McCulloch-Pitts (1943): influenced Von Neumann.
   – First compute a weighted sum of the inputs.
   – Then send out a fixed-size spike of activity if the weighted sum exceeds
     a threshold.
   – McCulloch and Pitts thought that each spike is like the truth value of a
     proposition and each neuron combines truth values to compute the truth
     value of another proposition!
  [Plot: the output is 0 until the weighted input reaches the threshold, then
  jumps to 1.]
Binary threshold neurons

• There are two equivalent ways to write the equations for
  a binary threshold neuron:

  Using an explicit threshold:              Using a bias (θ = -b):
    z = Σ_i x_i w_i                           z = b + Σ_i x_i w_i
    y = 1 if z ≥ θ, 0 otherwise               y = 1 if z ≥ 0, 0 otherwise
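
A small sketch of the two equivalent forms (our own naming):

    def binary_threshold_theta(x, w, theta):
        """Form 1: spike if the weighted sum reaches the threshold theta."""
        z = sum(xi * wi for xi, wi in zip(x, w))
        return 1 if z >= theta else 0

    def binary_threshold_bias(x, w, b):
        """Form 2: fold the threshold into a bias, b = -theta, and compare with 0."""
        z = b + sum(xi * wi for xi, wi in zip(x, w))
        return 1 if z >= 0 else 0

    # Both give the same answer when b = -theta.
    print(binary_threshold_theta([1, 1], [0.6, 0.6], theta=1.0))   # 1
    print(binary_threshold_bias([1, 1], [0.6, 0.6], b=-1.0))       # 1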
Rectified Linear Neurons
      (sometimes called linear threshold neurons)

They compute a linear weighted sum of their inputs.
The output is a non-linear function of the total input.


    z = b + Σ_i x_i w_i
    y = z if z > 0, 0 otherwise

    [Plot: y is zero for z ≤ 0 and rises linearly with z for z > 0.]
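
A corresponding sketch of a rectified linear neuron:

    def rectified_linear(x, w, b):
        """y = z if z > 0 else 0, where z = b + sum_i x_i * w_i."""
        z = b + sum(xi * wi for xi, wi in zip(x, w))
        return z if z > 0 else 0.0

    print(rectified_linear([1.0, 2.0], [0.5, -0.3], b=0.2))   # 0.1
    print(rectified_linear([1.0, 2.0], [0.5, -0.3], b=-0.5))  # 0.0 (clipped)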
Sigmoid neurons

• These give a real-valued output that is a smooth and bounded function of
  their total input.
   – Typically they use the logistic function.
   – They have nice derivatives which make learning easy (see lecture 3).

    z = b + Σ_i x_i w_i          y = 1 / (1 + e^(-z))

    [Plot: y rises smoothly from 0 to 1 and equals 0.5 at z = 0.]
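
And a sketch of the logistic (sigmoid) neuron:

    import math

    def sigmoid_neuron(x, w, b):
        """y = 1 / (1 + e^(-z)) with z = b + sum_i x_i * w_i; output lies in (0, 1)."""
        z = b + sum(xi * wi for xi, wi in zip(x, w))
        return 1.0 / (1.0 + math.exp(-z))

    print(sigmoid_neuron([1.0, 2.0], [0.5, -0.3], b=0.1))  # 0.5, because z = 0 here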
Stochastic binary neurons
• These use the same equations as logistic units.
   – But they treat the output of the logistic as the probability of producing
     a spike in a short time window.
• We can do a similar trick for rectified linear units:
   – The output is treated as the Poisson rate for spikes.

    z = b + Σ_i x_i w_i          p(s = 1) = 1 / (1 + e^(-z))

    [Plot: p(s = 1) rises smoothly from 0 to 1 and equals 0.5 at z = 0.]
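
A sketch of a stochastic binary neuron, which samples a spike with the logistic probability:

    import math
    import random

    def stochastic_binary_neuron(x, w, b, rng=random):
        """Emit 1 with probability 1 / (1 + e^(-z)), else 0, with z = b + sum_i x_i * w_i."""
        z = b + sum(xi * wi for xi, wi in zip(x, w))
        p = 1.0 / (1.0 + math.exp(-z))
        return 1 if rng.random() < p else 0

    # With z = 0 the neuron spikes about half the time.
    samples = [stochastic_binary_neuron([1.0, 2.0], [0.5, -0.3], b=0.1) for _ in range(1000)]
    print(sum(samples) / len(samples))  # roughly 0.5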
Neural Networks for Machine Learning

              Lecture 1d
     A simple example of learning

  Geoffrey Hinton
  with
  Nitish Srivastava
  Kevin Swersky
A very simple way to recognize handwritten shapes
• Consider a neural network with two layers of neurons.
   – neurons in the top layer represent known shapes (the digit classes 0-9).
   – neurons in the bottom layer represent pixel intensities.
• A pixel gets to vote if it has ink on it.
   – Each inked pixel can vote for several
      different shapes.
• The shape that gets the most votes wins.
How to display the weights
   [Figure: one weight "map" per output unit (1-9 and 0), shown beside the
   input image.]
Give each output unit its own “map” of the input image and display the weight
coming from each pixel in the location of that pixel in the map.
Use a black or white blob with the area representing the magnitude of the weight
and the color representing the sign.
How to learn the weights
   [Figure: the ten weight maps (1-9 and 0) and the current training image.]
Show the network an image and increment the weights from active pixels
to the correct class.
Then decrement the weights from active pixels to whatever class the
network guesses.
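
A minimal sketch of this voting network and learning rule, assuming binary images flattened into vectors; the image size and all names are illustrative, not taken from the lecture:

    import numpy as np

    n_pixels, n_classes = 256, 10            # e.g. 16x16 binary images, digit classes 0-9
    weights = np.zeros((n_pixels, n_classes))

    def guess(image):
        """Each inked pixel votes for every class with its weight; the biggest total wins."""
        return int(np.argmax(image @ weights))

    def learn(image, correct_class):
        """Increment weights from active pixels to the correct class,
        then decrement weights from active pixels to whatever class the network guesses."""
        guessed = guess(image)
        weights[:, correct_class] += image
        weights[:, guessed] -= image

    # Example: one training step on a random binary image labelled as a "3".
    image = (np.random.rand(n_pixels) > 0.8).astype(float)
    learn(image, correct_class=3)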
   [Figures: the weight maps shown again after each of several more training
   images.]
The learned weights
   [Figure: the final learned weight maps for 1-9 and 0, shown with a sample
   input image.]
The details of the learning algorithm will be explained in future lectures.
Why the simple learning algorithm is insufficient

 • A two layer network with a single winner in the top layer is
   equivalent to having a rigid template for each shape.
    – The winner is the template that has the biggest overlap
       with the ink.
 • The ways in which hand-written digits vary are much too
   complicated to be captured by simple template matches of
   whole shapes.
    – To capture all the allowable variations of a digit we need
       to learn the features that it is composed of.
Examples of handwritten digits that can be recognized
       correctly the first time they are seen
Neural Networks for Machine Learning

              Lecture 1e
         Three types of learning

Geoffrey Hinton
with
Nitish Srivastava
Kevin Swersky
Types of learning task
• Supervised learning
   – Learn to predict an output when given an input vector.

• Reinforcement learning
   – Learn to select an action to maximize payoff.

• Unsupervised learning
   – Discover a good internal representation of the input.
Two types of supervised learning

• Each training case consists of an input vector x and a target output t.

• Regression: The target output is a real number or a whole vector of
  real numbers.
   – The price of a stock in 6 months’ time.
   – The temperature at noon tomorrow.

• Classification: The target output is a class label.
   – The simplest case is a choice between 1 and 0.
   – We can also have multiple alternative labels.
How supervised learning typically works
• We start by choosing a model-class: y = f(x; W)
  – A model-class, f, is a way of using some numerical
    parameters, W, to map each input vector, x, into a predicted
    output y.

• Learning usually means adjusting the parameters to reduce the
  discrepancy between the target output, t, on each training case
  and the actual output, y, produced by the model.
   – For regression, ½ (y − t)² is often a sensible measure of the
     discrepancy (a small sketch follows this slide).
   – For classification there are other measures that are generally
     more sensible (they also work better).
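
A toy sketch of this recipe, using a linear model class y = f(x; W) = W·x, the squared-error measure above, and simple gradient steps on synthetic data (everything here is our own illustration, not the lecture's code):

    import numpy as np

    def f(x, W):
        """Model class: a linear map from input vector x to a predicted scalar y."""
        return W @ x

    def squared_error(y, t):
        return 0.5 * (y - t) ** 2

    # Toy data generated from a known W, then recovered by gradient descent.
    true_W = np.array([2.0, -1.0])
    X = np.random.randn(100, 2)
    T = X @ true_W

    W = np.zeros(2)
    for _ in range(200):
        for x, t in zip(X, T):
            y = f(x, W)
            W -= 0.01 * (y - t) * x   # gradient of 0.5*(y-t)^2 with respect to W
    print(W)  # close to [2.0, -1.0]
    print(np.mean([squared_error(f(x, W), t) for x, t in zip(X, T)]))  # near zero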
Reinforcement learning
• In reinforcement learning, the output is an action or sequence of actions
  and the only supervisory signal is an occasional scalar reward.
   – The goal in selecting each action is to maximize the expected sum
      of the future rewards.
   – We usually use a discount factor for delayed rewards so that we
      don’t have to look too far into the future (see the sketch after this slide).
• Reinforcement learning is difficult:
   – The rewards are typically delayed so it’s hard to know where we
      went wrong (or right).
   – A scalar reward does not supply much information.
• This course cannot cover everything and reinforcement learning is one
  of the important topics we will not cover.
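
A small sketch of a discounted sum of future rewards; the discount factor 0.9 is just an example value:

    def discounted_return(rewards, gamma=0.9):
        """Sum of future rewards, each delayed reward weighted by gamma**delay."""
        return sum((gamma ** k) * r for k, r in enumerate(rewards))

    print(discounted_return([0.0, 0.0, 1.0]))  # 0.81: a reward two steps away counts less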
Unsupervised learning
• For about 40 years, unsupervised learning was largely ignored by the
  machine learning community
    – Some widely used definitions of machine learning actually excluded it.
    – Many researchers thought that clustering was the only form of
       unsupervised learning.
• It is hard to say what the aim of unsupervised learning is.
    – One major aim is to create an internal representation of the input that
       is useful for subsequent supervised or reinforcement learning.
    – You can compute the distance to a surface by using the disparity
       between two images. But you don’t want to learn to compute
       disparities by stubbing your toe thousands of times.
Other goals for unsupervised learning
• It provides a compact, low-dimensional representation of the input.
    – High-dimensional inputs typically live on or near a low-
       dimensional manifold (or several such manifolds).
    – Principal Component Analysis is a widely used linear method
       for finding a low-dimensional representation (see the sketch after this slide).
• It provides an economical high-dimensional representation of the
  input in terms of learned features.
    – Binary features are economical.
    – So are real-valued features that are nearly all zero.
• It finds sensible clusters in the input.
    – This is an example of a very sparse code in which only one of
       the features is non-zero.
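
A minimal sketch of finding a low-dimensional representation with Principal Component Analysis via the singular value decomposition (a standard recipe, not code from the lecture):

    import numpy as np

    def pca(X, n_components):
        """Project data (one row per example) onto its top principal components."""
        X_centered = X - X.mean(axis=0)
        U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
        components = Vt[:n_components]          # directions of largest variance
        return X_centered @ components.T        # low-dimensional codes

    # Data that lies near a 1-D manifold embedded in 3-D.
    t = np.random.randn(500, 1)
    X = np.hstack([t, 2 * t, -t]) + 0.01 * np.random.randn(500, 3)
    codes = pca(X, n_components=1)
    print(codes.shape)  # (500, 1)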
