Deep Learning for Generic Object Recognition

Yann LeCun
Computational and Biological Learning Lab
The Courant Institute of Mathematical Sciences, New York University
Collaborators: Marc'Aurelio Ranzato, Fu Jie Huang

http://yann.lecun.com
http://www.cs.nyu.edu/~yann
Generic Object Detection and Recognition with Invariance to Pose, Illumination and Clutter

- Computer Vision and Biological Vision are getting back together again after a long divorce (Hinton, LeCun, Poggio, Perona, Ullman, Lowe, Triggs, S. Geman, Itti, Olshausen, Simoncelli, ...).
- What happened? (1) Machine Learning, (2) Moore's Law.
- Generic Object Recognition is the problem of detecting and classifying objects into generic categories such as “cars”, “trucks”, “airplanes”, “animals”, or “human figures”.
- Appearances are highly variable within a category because of shape variation, position in the visual field, scale, viewpoint, illumination, albedo, texture, background clutter, and occlusions.
- Learning invariant representations is key.
- Understanding the neural mechanisms behind invariant recognition is one of the main goals of Visual Neuroscience.
Why do we need “Deep” Architectures?

Conjecture: we won't solve the perception problem without solving the problem of learning in deep architectures [Hinton]:
- neural nets with lots of layers
- deep belief networks
- factor graphs with a “Markov” structure

We will not solve the perception problem with kernel machines:
- kernel machines are glorified template matchers
- you can't handle complicated invariances with templates (you would need too many templates)

Many interesting functions are “deep”:
- any function can be approximated with 2 layers (a linear combination of non-linear functions)
- but many interesting functions are more efficiently represented with multiple layers
- stupid example: binary addition (see the toy sketch below)
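To make the binary-addition example concrete, here is a toy Python sketch (mine, not from the talk; the names `full_adder` and `ripple_add` are illustrative). Adding two n-bit numbers is naturally a deep function: a chain of n identical constant-size stages. A depth-2 (sum-of-products) formula computing the same output bits needs a number of terms that grows exponentially with n.

```python
# Toy sketch (mine, not from the talk): n-bit binary addition as a *deep*
# function. Each full_adder stage is constant-size; chaining n of them gives
# a depth-n circuit, while a depth-2 representation blows up exponentially.
def full_adder(a, b, carry):
    """One 'layer': two input bits plus a carry -> (sum bit, carry out)."""
    s = a ^ b ^ carry
    carry_out = (a & b) | (carry & (a ^ b))
    return s, carry_out

def ripple_add(x_bits, y_bits):
    """Add two little-endian bit lists by composing n full-adder layers."""
    carry, out = 0, []
    for a, b in zip(x_bits, y_bits):
        s, carry = full_adder(a, b, carry)
        out.append(s)
    return out + [carry]

print(ripple_add([0, 1, 1], [1, 1, 0]))  # 6 + 3 = 9 -> [1, 0, 0, 1]
```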
Generic Object Detection and Recognition with Invariance to Pose and Illumination

- 50 toys belonging to 5 categories: animal, human figure, airplane, truck, car
- 10 instances per category: 5 instances used for training, 5 instances for testing
- Raw dataset: 972 stereo pairs of each object instance; 48,600 image pairs total

For each instance:
- 18 azimuths: 0 to 350 degrees, every 20 degrees
- 9 elevations: 30 to 70 degrees from horizontal, every 5 degrees
- 6 illuminations: on/off combinations of 4 lights
- 2 cameras (stereo): 7.5 cm apart, 40 cm from the object

[Figure: sample images of the training instances and the test instances]
Textured and Cluttered Datasets

[Figure: sample images from the textured and cluttered versions of the dataset]
Convolutional Network

Stereo input: 2@96x96
Layer 1: 8@92x92  (5x5 convolution, 16 kernels)
Layer 2: 8@23x23  (4x4 subsampling)
Layer 3: 24@18x18 (6x6 convolution, 96 kernels)
Layer 4: 24@6x6   (3x3 subsampling)
Layer 5: 100      (6x6 convolution, 2400 kernels)
Layer 6: 5        (fully connected, 500 weights)

90,857 free parameters, 3,901,162 connections.
The architecture alternates convolutional layers (feature detectors) and subsampling layers (local feature pooling for invariance to small distortions).
The entire network is trained end-to-end (all the layers are trained simultaneously). A gradient-based algorithm is used to minimize a supervised loss function.
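A minimal sketch of the architecture above in modern terms. Assumptions on my part: the talk predates PyTorch, full connectivity between feature maps stands in for the original sparse connection tables (hence "16 kernels" on the slide rather than 2x8), and tanh units with average pooling stand in for the original non-linearities.

```python
# A minimal PyTorch sketch of the NORB convolutional net described above
# (an approximation, not the original implementation).
import torch
import torch.nn as nn

class NorbConvNet(nn.Module):
    def __init__(self, num_classes=5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(2, 8, kernel_size=5),     # 2@96x96 -> 8@92x92
            nn.Tanh(),
            nn.AvgPool2d(4),                    # 8@92x92 -> 8@23x23
            nn.Conv2d(8, 24, kernel_size=6),    # 8@23x23 -> 24@18x18
            nn.Tanh(),
            nn.AvgPool2d(3),                    # 24@18x18 -> 24@6x6
            nn.Conv2d(24, 100, kernel_size=6),  # 24@6x6 -> 100@1x1
            nn.Tanh(),
        )
        self.classifier = nn.Linear(100, num_classes)  # 500 weights

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

net = NorbConvNet()
out = net(torch.randn(1, 2, 96, 96))  # one stereo pair
print(out.shape)                      # torch.Size([1, 5])
```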
Alternated Convolutions and Subsampling

Multiple convolutions (“simple cells”) are followed by averaging subsampling (“complex cells”).

- Local features are extracted everywhere.
- The averaging/subsampling layer builds robustness to variations in feature locations.
- Hubel/Wiesel '62, Fukushima '71, LeCun '89, Riesenhuber & Poggio '02, Ullman '02, ...
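A toy NumPy illustration of the pooling argument (my own construction, not from the slides): the convolution maps of a pattern and of its one-pixel shift differ a lot point-wise, but their averaged, subsampled versions are much closer.

```python
# Toy demonstration: average subsampling makes the representation robust
# to small shifts in feature location.
import numpy as np

def convolve2d_valid(img, k):
    """Plain 'valid' 2-D cross-correlation."""
    H, W = img.shape; kh, kw = k.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * k)
    return out

def avg_pool(x, s):
    """Average pooling over non-overlapping s x s windows."""
    H, W = (x.shape[0] // s) * s, (x.shape[1] // s) * s
    return x[:H, :W].reshape(H // s, s, W // s, s).mean(axis=(1, 3))

rng = np.random.default_rng(0)
kernel = rng.standard_normal((5, 5))
img = np.zeros((32, 32)); img[10:15, 10:15] = kernel   # pattern the kernel detects
shifted = np.roll(img, 1, axis=1)                      # same pattern, 1 px right

a, b = convolve2d_valid(img, kernel), convolve2d_valid(shifted, kernel)
print(np.abs(a - b).max())                             # large: raw maps differ
print(np.abs(avg_pool(a, 4) - avg_pool(b, 4)).max())   # much smaller after pooling
```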
Normalized-Uniform Set: Error Rates

Linear classifier on raw stereo images:      30.2% error
K-Nearest-Neighbors on raw stereo images:    18.4% error
K-Nearest-Neighbors on PCA-95:               16.6% error
Pairwise SVM on 96x96 stereo images:         11.6% error
Pairwise SVM on 95 principal components:     13.3% error
Convolutional net on 96x96 stereo images:     5.8% error

[Figure: training instances and test instances]
Normalized-Uniform Set: Learning Times

[Figure: training-time comparison]

SVM: using a parallel implementation by Graf, Durdanovic, and Cosatto (NEC Labs).
Hybrid: chop off the last layer of the convolutional net and train an SVM on it.
Jittered-Cluttered Dataset

- 291,600 stereo pairs for training, 58,320 for testing
- Objects are jittered: position, scale, in-plane rotation, contrast, brightness, backgrounds, distractor objects, ...
- Input dimension: 98x98x2 (approx 18,000)

[Figure: sample images from the jittered-cluttered dataset]
Experiment 2: Jittered-Cluttered Dataset

291,600 training samples, 58,320 test samples

SVM with Gaussian kernel:                 43.3% error
Convolutional net with binocular input:    7.8% error
Convolutional net + SVM on top:            5.9% error
Convolutional net with monocular input:   20.8% error
Smaller mono net (DEMO):                  26.0% error

Dataset available from http://www.cs.nyu.edu/~yann
Jittered-Cluttered Dataset

OUCH! The convex loss, VC bounds, and representer theorems don't seem to help.
Chop off the last layer and train an SVM on it: it works!
What's wrong with K-NN and SVMs?

- K-NN and SVMs with Gaussian kernels are based on matching global templates.
- Both are “shallow” architectures: Input -> Global Template Matchers (each training sample is a template) -> Features (similarities) -> Linear Combinations -> Output.
- There is no way to learn invariant recognition tasks with such naive architectures (unless we use an impractically large number of templates).
- The number of necessary templates grows exponentially with the number of dimensions of variation.
- Global templates are in trouble when the variations include: category, instance shape, configuration (for articulated objects), position, azimuth, elevation, scale, illumination, texture, albedo, in-plane rotation, background luminance, background texture, background clutter, ...
Examples (Monocular Mode)

[Figure: detection and recognition examples in monocular mode]
Learned Features

[Figure: input image and the learned feature maps at Layer 1, Layer 2, and Layer 3]
Supervised Learning in “Deep” Architectures

Backprop can train “deep” architectures reasonably well:
- it works better if the architecture has some structure (e.g. a convolutional net)

Deep architectures with some structure (e.g. convolutional nets) beat shallow ones (e.g. kernel machines) on image classification tasks:
- handwriting recognition
- face detection
- generic object recognition

Deep architectures are inherently more efficient for representing complex functions.

Have we solved the problem of training deep architectures?
- Can we do backprop with lots of layers?
- Can we train deep belief networks?

NO!
MNIST Dataset

Handwritten Digit Dataset MNIST: 60,000 training samples, 10,000 test samples

[Figure: sample MNIST digits]
Handwritten Digit Recognition with a Convolutional Network

Input:   1@32x32
Layer 1: 6@28x28  (5x5 convolution)
Layer 2: 6@14x14  (2x2 subsampling)
Layer 3: 12@10x10 (5x5 convolution)
Layer 4: 12@5x5   (2x2 subsampling)
Layer 5: 84@1x1   (5x5 convolution)
Layer 6: 10       (RBF, fully connected)

60,000 free parameters, 400,000 connections.
The architecture alternates convolutional layers (feature detectors) and subsampling layers (local feature pooling for invariance to small distortions).
Handwritten Digit Dataset MNIST: 60,000 training samples, 10,000 test samples.
The entire network is trained end-to-end (all the layers are trained simultaneously).
Test error rate: 0.8%
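A schematic sketch of the end-to-end supervised training described above. Assumptions (mine): PyTorch stands in for the original Lush implementation, cross-entropy replaces the MSE/RBF output loss of the slides, and `loader` is a hypothetical iterator over (images, labels) batches.

```python
# Schematic end-to-end supervised training: every layer is updated
# simultaneously by gradient descent on one classification loss.
import torch
import torch.nn as nn

def train(net, loader, epochs=10, lr=0.01):
    opt = torch.optim.SGD(net.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for images, labels in loader:   # e.g. 1@32x32 digit batches
            opt.zero_grad()
            loss = loss_fn(net(images), labels)
            loss.backward()             # gradients flow through every layer
            opt.step()                  # all layers updated simultaneously
    return net
```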
Results on MNIST Handwritten Digits (P=60,000)

CLASSIFIER                        | DEFORMATION | PREPROCESSING         | ERROR % | Reference
linear classifier (1-layer NN)    | none        | none                  | 12.00   | LeCun et al. 1998
linear classifier (1-layer NN)    | none        | deskewing             |  8.40   | LeCun et al. 1998
pairwise linear classifier        | none        | deskewing             |  7.60   | LeCun et al. 1998
K-nearest-neighbors (L2)          | none        | none                  |  3.09   | K. Wilder, U. Chicago
K-nearest-neighbors (L2)          | none        | deskewing             |  2.40   | LeCun et al. 1998
K-nearest-neighbors (L2)          | none        | deskew, clean, blur   |  1.80   | K. Wilder, U. Chicago
K-NN L3, 2-pixel jitter           | none        | deskew, clean, blur   |  1.22   | K. Wilder, U. Chicago
K-NN, shape context matching      | none        | shape context feature |  0.63   | Belongie PAMI '02  (best hand-crafted)
40 PCA + quadratic classifier     | none        | none                  |  3.30   | LeCun et al. 1998
1000 RBF + linear classifier      | none        | none                  |  3.60   | LeCun et al. 1998
K-NN, Tangent Distance            | none        | subsamp 16x16 pixels  |  1.10   | LeCun et al. 1998
SVM, Gaussian kernel              | none        | none                  |  1.40   | many
SVM, deg-4 polynomial             | none        | deskewing             |  1.10   | Cortes/Vapnik
Reduced-set SVM, deg-5 poly       | none        | deskewing             |  1.00   | Scholkopf
Virtual SVM, deg-9 poly           | Affine      | none                  |  0.80   | Scholkopf
V-SVM, 2-pixel jittered           | none        | none                  |  0.68   | DeCoste/Scholkopf, MLJ '02
V-SVM, 2-pixel jittered           | none        | deskewing             |  0.56   | DeCoste/Scholkopf, MLJ '02  (best kernel-based)
2-layer NN, 300 HU, MSE           | none        | none                  |  4.70   | LeCun et al. 1998
2-layer NN, 300 HU, MSE           | Affine      | none                  |  3.60   | LeCun et al. 1998
2-layer NN, 300 HU                | none        | deskewing             |  1.60   | LeCun et al. 1998
3-layer NN, 500+150 HU            | none        | none                  |  2.95   | LeCun et al. 1998
3-layer NN, 500+150 HU            | Affine      | none                  |  2.45   | LeCun et al. 1998
3-layer NN, 500+300 HU, CE, reg   | none        | none                  |  1.53   | Hinton, in press, 2005  (best fully-connected NN)
2-layer NN, 800 HU, CE            | none        | none                  |  1.60   | Simard et al., ICDAR 2003
2-layer NN, 800 HU, CE            | Affine      | none                  |  1.10   | Simard et al., ICDAR 2003
2-layer NN, 800 HU, MSE           | Elastic     | none                  |  0.90   | Simard et al., ICDAR 2003
2-layer NN, 800 HU, CE            | Elastic     | none                  |  0.70   | Simard et al., ICDAR 2003  (best knowledge-free)
Stacked RBM + backprop            | none        | none                  |  0.95   | Hinton, in press, 2005
Convolutional net LeNet-1         | none        | subsamp 16x16 pixels  |  1.70   | LeCun et al. 1998
Convolutional net LeNet-4         | none        | none                  |  1.10   | LeCun et al. 1998
Convolutional net LeNet-5         | none        | none                  |  0.95   | LeCun et al. 1998
Convolutional net LeNet-5         | Affine      | none                  |  0.80   | LeCun et al. 1998
Boosted LeNet-4                   | Affine      | none                  |  0.70   | LeCun et al. 1998
Convolutional net, CE             | Affine      | none                  |  0.60   | Simard et al., ICDAR 2003
Convolutional net, CE             | Elastic     | none                  |  0.40   | Simard et al., ICDAR 2003  (best overall)
Best Results on MNIST (from raw images: no preprocessing)

CLASSIFIER                        | DEFORMATION | ERROR % | Reference
Knowledge-free methods:
2-layer NN, 800 HU, CE            | none        | 1.60    | Simard et al., ICDAR 2003
3-layer NN, 500+300 HU, CE, reg   | none        | 1.53    | Hinton, in press, 2005
SVM, Gaussian kernel              | none        | 1.40    | Cortes '92 + many others
Convolutional nets:
Convolutional net LeNet-5         | none        | 0.80    | LeCun 2005, unpublished
Convolutional net LeNet-6         | none        | 0.70    | LeCun 2006, unpublished
Training set augmented with affine distortions:
2-layer NN, 800 HU, CE            | Affine      | 1.10    | Simard et al., ICDAR 2003
Virtual SVM, deg-9 poly           | Affine      | 0.80    | Scholkopf
Convolutional net, CE             | Affine      | 0.60    | Simard et al., ICDAR 2003
Training set augmented with elastic distortions:
2-layer NN, 800 HU, CE            | Elastic     | 0.70    | Simard et al., ICDAR 2003
Convolutional net, CE             | Elastic     | 0.40    | Simard et al., ICDAR 2003

Convolutional nets are the best known method for handwriting recognition.
Problems with Supervised Learning in Deep Architectures

Vanishing gradients, symmetry breaking:
- the first layers have a hard time learning useful things
- how do we break the symmetry so that different units do different things?

Idea [Hinton]:
1. Initialize the first (few) layers with unsupervised training.
2. Refine the whole network with backprop (see the sketch below).

Problem: how do we train a layer in unsupervised mode?
- Auto-encoder: only works when the first layer is smaller than the input.
- What if the first layer is larger than the input? Reconstruction is trivial!

Solution: sparse over-complete representations
- Keep the number of bits in the first layer low.
- Hinton uses a Restricted Boltzmann Machine in which the first layer uses stochastic binary units.
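To make the two-step recipe concrete, here is a schematic PyTorch sketch (assumptions mine: the layer sizes, tanh units, and a hypothetical `loader` over (x, y) batches). The plain reconstruction objective in step 1 is shown only for shape; as the slide notes, it is insufficient when the layer is over-complete, which is what motivates the sparse codes and RBMs that follow.

```python
# Two-step recipe: (1) unsupervised pretraining of the first layer,
# (2) supervised refinement of the whole network with backprop.
import torch
import torch.nn as nn

first = nn.Linear(784, 1000)    # over-complete first layer (1000 > 784)
decoder = nn.Linear(1000, 784)  # throwaway decoder used only for pretraining
net = nn.Sequential(first, nn.Tanh(), nn.Linear(1000, 10))

def pretrain(loader, epochs=5, lr=0.01):
    """Step 1: unsupervised reconstruction training of the first layer only."""
    params = list(first.parameters()) + list(decoder.parameters())
    opt = torch.optim.SGD(params, lr=lr)
    for _ in range(epochs):
        for x, _ in loader:                 # labels are ignored here
            opt.zero_grad()
            loss = ((decoder(torch.tanh(first(x))) - x) ** 2).mean()
            loss.backward()
            opt.step()

def finetune(loader, epochs=5, lr=0.01):
    """Step 2: supervised backprop through the whole network."""
    opt = torch.optim.SGD(net.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(net(x), y).backward()
            opt.step()
```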
Best Results on MNIST (from raw images: no preprocessing)

CLASSIFIER                          | DEFORMATION | ERROR % | Reference
Knowledge-free methods:
2-layer NN, 800 HU, CE              | none        | 1.60    | Simard et al., ICDAR 2003
3-layer NN, 500+300 HU, CE, reg     | none        | 1.53    | Hinton, in press, 2005
SVM, Gaussian kernel                | none        | 1.40    | Cortes '92 + many others
Unsupervised stacked RBM + backprop | none        | 0.95    | Hinton, in press, 2005
Convolutional nets:
Convolutional net LeNet-5           | none        | 0.80    | LeCun 2005, unpublished
Convolutional net LeNet-6           | none        | 0.70    | LeCun 2006, unpublished
Training set augmented with affine distortions:
2-layer NN, 800 HU, CE              | Affine      | 1.10    | Simard et al., ICDAR 2003
Virtual SVM, deg-9 poly             | Affine      | 0.80    | Scholkopf
Convolutional net, CE               | Affine      | 0.60    | Simard et al., ICDAR 2003
Training set augmented with elastic distortions:
2-layer NN, 800 HU, CE              | Elastic     | 0.70    | Simard et al., ICDAR 2003
Convolutional net, CE               | Elastic     | 0.40    | Simard et al., ICDAR 2003
Unsupervised Learning of Sparse Over-Complete Features

Classification is easier with over-complete feature sets.

Existing unsupervised feature learning (not sparse/overcomplete):
- PCA, ICA, Auto-Encoder, Kernel-PCA

Sparse/overcomplete methods:
- Non-Negative Matrix Factorization
- Sparse-overcomplete basis functions (Olshausen and Field 1997)
- Product of Experts (Teh, Welling, Osindero, Hinton 2003)
Symmetric Product of Experts

$$P(Z \mid X, W_c, W_d) \propto \exp\big(-E(X, Z, W_c, W_d)\big)$$

$$E(X, Z, W_c, W_d) = E_C(X, Z, W_c) + E_D(X, Z, W_d)$$

$$E_C(X, Z, W_c) = \frac{1}{2}\,\big\| Z - W_c X \big\|^2 = \frac{1}{2} \sum_i \big(z_i - W_c^i X\big)^2$$

$$E_D(X, Z, W_d) = \frac{1}{2}\,\big\| X - W_d Z \big\|^2 = \frac{1}{2} \sum_i \big(x_i - W_d^i Z\big)^2$$
Inference & Learning

Inference:
$$Z^* = \arg\min_Z E(X, Z, W) = \arg\min_Z \big[\, E_C(X, Z, W_c) + E_D(X, Z, W_d) \,\big]$$
- let Z(0) be the encoder prediction
- find the code which minimizes the total energy
- gradient descent optimization

Learning:
$$W \leftarrow W - \eta\, \frac{\partial E(X, Z^*, W)}{\partial W}$$
- using the optimal code, minimize E w.r.t. the weights W
- gradient descent optimization
Inference (step 1):
- Z(0) = Wc X: copy the encoder prediction into the code Z.
- Forward propagation: propagate X through the encoder and the decoder; evaluate Ec and Ed.
- Back-propagate gradients w.r.t. Z and descend to the optimal Z.

Learning (step 2):
- Forward propagation with the optimal Z held fixed.
- Back-propagate gradients w.r.t. W: update Wd, update Wc.

[Figure: block diagram of the encoder (Wc), decoder (Wd), code Z, and energies Ec, Ed across the animation frames]

A toy sketch of these two steps follows below.
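A NumPy sketch of the two steps under the quadratic energies defined earlier. The dimensions, step sizes, and iteration counts are my assumptions, and the sparsifying logistic introduced on the next slide is omitted for clarity.

```python
# Toy sketch of symmetric-product-of-experts inference and learning.
import numpy as np

def energy(X, Z, Wc, Wd):
    """E = E_C + E_D as defined above."""
    Ec = 0.5 * np.sum((Z - Wc @ X) ** 2)   # code-prediction energy
    Ed = 0.5 * np.sum((X - Wd @ Z) ** 2)   # reconstruction energy
    return Ec + Ed

def infer_code(X, Wc, Wd, steps=100, eta=0.05):
    """Inference: start from the encoder prediction, descend E w.r.t. Z."""
    Z = Wc @ X                                   # Z(0) = encoder prediction
    for _ in range(steps):
        gZ = (Z - Wc @ X) + Wd.T @ (Wd @ Z - X)  # dE/dZ
        Z = Z - eta * gZ
    return Z

def learn_step(X, Wc, Wd, eta=0.01):
    """Learning: with the optimal code fixed, descend E w.r.t. Wc and Wd."""
    Z = infer_code(X, Wc, Wd)
    Wc = Wc + eta * np.outer(Z - Wc @ X, X)  # -dE_C/dWc
    Wd = Wd + eta * np.outer(X - Wd @ Z, Z)  # -dE_D/dWd
    return Wc, Wd

rng = np.random.default_rng(0)
n, m = 144, 200                              # e.g. 12x12 patch, 200 code units
Wc = 0.1 * rng.standard_normal((m, n))
Wd = 0.1 * rng.standard_normal((n, m))
X = rng.standard_normal(n)
Wc, Wd = learn_step(X, Wc, Wd)
```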
Sparsifying Logistic

$$\bar z_i(t) = \frac{\eta\, e^{\beta z_i(t)}}{\zeta_i(t)}, \qquad i \in [1 .. m]$$

$$\zeta_i(t) = \eta\, e^{\beta z_i(t)} + (1 - \eta)\, \zeta_i(t-1)$$

- Temporal rather than spatial sparsity => no normalization across units.
- If ζ is treated as a learned parameter, the TSM is a sigmoid function with a special bias:
$$\bar z_i(t) = \frac{1}{1 + B\, e^{-\beta z_i(t)}}$$
- ζ is saturated during training to allow units to have different sparseness.

[Figure: transfer function of the sparsifying logistic for input uniformly distributed in [-1, 1]]
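A NumPy sketch of the sparsifying logistic as defined above (the η and β values are borrowed from the Berkeley experiment on the next slide; the per-unit running average ζ_i(t) is carried across consecutive training samples t).

```python
# Toy sketch of the sparsifying logistic: raw code z -> sparse code zbar.
import numpy as np

def sparsifying_logistic(z, zeta_prev, eta=0.02, beta=1.0):
    """Return (zbar, updated zeta) for one training sample."""
    w = eta * np.exp(beta * z)
    zeta = w + (1.0 - eta) * zeta_prev  # per-unit running average over time
    return w / zeta, zeta               # zbar in (0, 1), mostly near 0

rng = np.random.default_rng(0)
zeta = np.ones(200)                     # one running average per code unit
for t in range(1000):                   # stream of training samples
    z = rng.uniform(-1, 1, size=200)    # raw code (toy input)
    zbar, zeta = sparsifying_logistic(z, zeta)
print(zbar.mean())                      # small mean activity: the code is sparse
```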
Natural image patches - Berkeley

Berkeley data set:
- 100,000 12x12 patches
- 200 units in the code
- η = 0.02
- β = 1
- learning rate 0.001
- L1, L2 regularizer 0.001
- fast convergence: < 30 min.
Natural image patches - Berkeley

[Figure: 200 decoder filters (reshaped columns of matrix Wd)]
Natural image patches - Berkeley

[Figure: encoder direct filters (rows of Wc) and decoder reverse filters (columns of Wd)]
Natural image patches - Forest

Forest data set:
- 100,000 12x12 patches
- 200 units in the code
- η = 0.02
- β = 1
- learning rate 0.001
- L1, L2 regularizer 0.001
- fast convergence: < 30 min.
Natural image patches - Forest

[Figure: 200 decoder filters (reshaped columns of matrix Wd)]
Natural image patches - Forest

[Figure: encoder direct filters (rows of Wc) and decoder reverse filters (columns of Wd)]
Natural image patches - Forest

[Figure: test sample, code word, and unit activity for code words from 200 randomly selected test patches]

- The codes are: sparse, almost binary, quite decorrelated.
- In testing, codes are produced by propagating the input patch through the encoder and the TSM.
- η controls sparsity.
- β controls the “bit content” in each code unit.
What about an autoencoder?

[Figure: encoder filters and decoder filters]

- The filters are random.
- Convergence only for large η (0.1) and small β (0.5).
Handwritten digits - MNIST

- 60,000 28x28 images
- 196 units in the code
- η = 0.01
- β = 1
- learning rate 0.001
- L1, L2 regularizer 0.005

[Figure: encoder direct filters]
Handwritten digits - MNIST

[Figure: digit reconstructions obtained by forward propagation through the encoder and decoder]

After training there is no need to minimize in code space.
Initializing a Convolutional Net with SPoE

Architecture: LeNet-6 (1 -> 50 -> 50 -> 200 -> 10)

- Baseline, random initialization: 0.7% error on test set.
- First layer initialized with SPoE: 0.6% error on test set.
- Training with elastically-distorted samples: 0.38% error on test set.
Initializing a Convolutional Net with SPoE

Architecture: LeNet-6 (1 -> 50 -> 50 -> 200 -> 10), with 9x9 kernels instead of 5x5.

[Figure: first-layer filters, baseline (random initialization) vs. first layer initialized with SPoE]
Best Results on MNIST (from raw images: no preprocessing)

CLASSIFIER                          | DEFORMATION | ERROR % | Reference
Knowledge-free methods:
2-layer NN, 800 HU, CE              | none        | 1.60    | Simard et al., ICDAR 2003
3-layer NN, 500+300 HU, CE, reg     | none        | 1.53    | Hinton, in press, 2005
SVM, Gaussian kernel                | none        | 1.40    | Cortes '92 + many others
Unsupervised stacked RBM + backprop | none        | 0.95    | Hinton, in press, 2005
Convolutional nets:
Convolutional net LeNet-5           | none        | 0.80    | LeCun 2005, unpublished
Convolutional net LeNet-6           | none        | 0.70    | LeCun 2006, unpublished
Conv. net LeNet-6 + unsup learning  | none        | 0.60    | LeCun 2006, unpublished
Training set augmented with affine distortions:
2-layer NN, 800 HU, CE              | Affine      | 1.10    | Simard et al., ICDAR 2003
Virtual SVM, deg-9 poly             | Affine      | 0.80    | Scholkopf
Convolutional net, CE               | Affine      | 0.60    | Simard et al., ICDAR 2003
Training set augmented with elastic distortions:
2-layer NN, 800 HU, CE              | Elastic     | 0.70    | Simard et al., ICDAR 2003
Convolutional net, CE               | Elastic     | 0.40    | Simard et al., ICDAR 2003
Conv. net LeNet-6 + unsup learning  | Elastic     | 0.38    | LeCun 2006, unpublished
Conclusion

- Deep architectures are better than shallow ones.
- We haven't solved the deep learning problem yet.
- Larger networks are better.
- Initializing the first layer(s) with unsupervised learning helps.
- WANTED: a learning algorithm for deep architectures that seamlessly blends supervised and unsupervised learning.

More Related Content

What's hot

PixelCNN, Wavenet, Normalizing Flows - Santiago Pascual - UPC Barcelona 2018
PixelCNN, Wavenet, Normalizing Flows - Santiago Pascual - UPC Barcelona 2018PixelCNN, Wavenet, Normalizing Flows - Santiago Pascual - UPC Barcelona 2018
PixelCNN, Wavenet, Normalizing Flows - Santiago Pascual - UPC Barcelona 2018Universitat Politècnica de Catalunya
 
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020Universitat Politècnica de Catalunya
 
Scaled Eigen Appearance and Likelihood Prunning for Large Scale Video Duplica...
Scaled Eigen Appearance and Likelihood Prunning for Large Scale Video Duplica...Scaled Eigen Appearance and Likelihood Prunning for Large Scale Video Duplica...
Scaled Eigen Appearance and Likelihood Prunning for Large Scale Video Duplica...United States Air Force Academy
 
PhDThesis, Dr Shen Furao
PhDThesis, Dr Shen FuraoPhDThesis, Dr Shen Furao
PhDThesis, Dr Shen FuraoSOINN Inc.
 
Anomaly detection using deep one class classifier
Anomaly detection using deep one class classifierAnomaly detection using deep one class classifier
Anomaly detection using deep one class classifier홍배 김
 
Semantic Video Segmentation with Using Ensemble of Particular Classifiers and...
Semantic Video Segmentation with Using Ensemble of Particular Classifiers and...Semantic Video Segmentation with Using Ensemble of Particular Classifiers and...
Semantic Video Segmentation with Using Ensemble of Particular Classifiers and...ITIIIndustries
 
DeconvNet, DecoupledNet, TransferNet in Image Segmentation
DeconvNet, DecoupledNet, TransferNet in Image SegmentationDeconvNet, DecoupledNet, TransferNet in Image Segmentation
DeconvNet, DecoupledNet, TransferNet in Image SegmentationNamHyuk Ahn
 
#6 PyData Warsaw: Deep learning for image segmentation
#6 PyData Warsaw: Deep learning for image segmentation#6 PyData Warsaw: Deep learning for image segmentation
#6 PyData Warsaw: Deep learning for image segmentationMatthew Opala
 
The world of loss function
The world of loss functionThe world of loss function
The world of loss function홍배 김
 
Pres Tesi LM-2016+transcript_eng
Pres Tesi LM-2016+transcript_engPres Tesi LM-2016+transcript_eng
Pres Tesi LM-2016+transcript_engDaniele Ciriello
 
Image Splicing Detection involving Moment-based Feature Extraction and Classi...
Image Splicing Detection involving Moment-based Feature Extraction and Classi...Image Splicing Detection involving Moment-based Feature Extraction and Classi...
Image Splicing Detection involving Moment-based Feature Extraction and Classi...IDES Editor
 
Fcv learn le_cun
Fcv learn le_cunFcv learn le_cun
Fcv learn le_cunzukun
 
ICCV2009: MAP Inference in Discrete Models: Part 1: Introduction
ICCV2009: MAP Inference in Discrete Models: Part 1: IntroductionICCV2009: MAP Inference in Discrete Models: Part 1: Introduction
ICCV2009: MAP Inference in Discrete Models: Part 1: Introductionzukun
 
Parallel wisard object tracker a rambased tracking system
Parallel wisard object tracker a rambased tracking systemParallel wisard object tracker a rambased tracking system
Parallel wisard object tracker a rambased tracking systemcseij
 
Variational Autoencoders VAE - Santiago Pascual - UPC Barcelona 2018
Variational Autoencoders VAE - Santiago Pascual - UPC Barcelona 2018Variational Autoencoders VAE - Santiago Pascual - UPC Barcelona 2018
Variational Autoencoders VAE - Santiago Pascual - UPC Barcelona 2018Universitat Politècnica de Catalunya
 
Marano presentation
Marano presentationMarano presentation
Marano presentationBeppe Marano
 
Mesh Generation and Topological Data Analysis
Mesh Generation and Topological Data AnalysisMesh Generation and Topological Data Analysis
Mesh Generation and Topological Data AnalysisDon Sheehy
 

What's hot (20)

PixelCNN, Wavenet, Normalizing Flows - Santiago Pascual - UPC Barcelona 2018
PixelCNN, Wavenet, Normalizing Flows - Santiago Pascual - UPC Barcelona 2018PixelCNN, Wavenet, Normalizing Flows - Santiago Pascual - UPC Barcelona 2018
PixelCNN, Wavenet, Normalizing Flows - Santiago Pascual - UPC Barcelona 2018
 
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
 
Scaled Eigen Appearance and Likelihood Prunning for Large Scale Video Duplica...
Scaled Eigen Appearance and Likelihood Prunning for Large Scale Video Duplica...Scaled Eigen Appearance and Likelihood Prunning for Large Scale Video Duplica...
Scaled Eigen Appearance and Likelihood Prunning for Large Scale Video Duplica...
 
PhDThesis, Dr Shen Furao
PhDThesis, Dr Shen FuraoPhDThesis, Dr Shen Furao
PhDThesis, Dr Shen Furao
 
Anomaly detection using deep one class classifier
Anomaly detection using deep one class classifierAnomaly detection using deep one class classifier
Anomaly detection using deep one class classifier
 
Semantic Video Segmentation with Using Ensemble of Particular Classifiers and...
Semantic Video Segmentation with Using Ensemble of Particular Classifiers and...Semantic Video Segmentation with Using Ensemble of Particular Classifiers and...
Semantic Video Segmentation with Using Ensemble of Particular Classifiers and...
 
DeconvNet, DecoupledNet, TransferNet in Image Segmentation
DeconvNet, DecoupledNet, TransferNet in Image SegmentationDeconvNet, DecoupledNet, TransferNet in Image Segmentation
DeconvNet, DecoupledNet, TransferNet in Image Segmentation
 
#6 PyData Warsaw: Deep learning for image segmentation
#6 PyData Warsaw: Deep learning for image segmentation#6 PyData Warsaw: Deep learning for image segmentation
#6 PyData Warsaw: Deep learning for image segmentation
 
The world of loss function
The world of loss functionThe world of loss function
The world of loss function
 
Pres Tesi LM-2016+transcript_eng
Pres Tesi LM-2016+transcript_engPres Tesi LM-2016+transcript_eng
Pres Tesi LM-2016+transcript_eng
 
Image Splicing Detection involving Moment-based Feature Extraction and Classi...
Image Splicing Detection involving Moment-based Feature Extraction and Classi...Image Splicing Detection involving Moment-based Feature Extraction and Classi...
Image Splicing Detection involving Moment-based Feature Extraction and Classi...
 
Fcv learn le_cun
Fcv learn le_cunFcv learn le_cun
Fcv learn le_cun
 
crfasrnn_presentation
crfasrnn_presentationcrfasrnn_presentation
crfasrnn_presentation
 
AlexNet
AlexNetAlexNet
AlexNet
 
Structlight
StructlightStructlight
Structlight
 
ICCV2009: MAP Inference in Discrete Models: Part 1: Introduction
ICCV2009: MAP Inference in Discrete Models: Part 1: IntroductionICCV2009: MAP Inference in Discrete Models: Part 1: Introduction
ICCV2009: MAP Inference in Discrete Models: Part 1: Introduction
 
Parallel wisard object tracker a rambased tracking system
Parallel wisard object tracker a rambased tracking systemParallel wisard object tracker a rambased tracking system
Parallel wisard object tracker a rambased tracking system
 
Variational Autoencoders VAE - Santiago Pascual - UPC Barcelona 2018
Variational Autoencoders VAE - Santiago Pascual - UPC Barcelona 2018Variational Autoencoders VAE - Santiago Pascual - UPC Barcelona 2018
Variational Autoencoders VAE - Santiago Pascual - UPC Barcelona 2018
 
Marano presentation
Marano presentationMarano presentation
Marano presentation
 
Mesh Generation and Topological Data Analysis
Mesh Generation and Topological Data AnalysisMesh Generation and Topological Data Analysis
Mesh Generation and Topological Data Analysis
 

Similar to Lecun 20060816-ciar-02-deep learning for generic object recognition

Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)Gaurav Mittal
 
Deep learning from a novice perspective
Deep learning from a novice perspectiveDeep learning from a novice perspective
Deep learning from a novice perspectiveAnirban Santara
 
What's Wrong With Deep Learning?
What's Wrong With Deep Learning?What's Wrong With Deep Learning?
What's Wrong With Deep Learning?Philip Zheng
 
ECCV2010: feature learning for image classification, part 4
ECCV2010: feature learning for image classification, part 4ECCV2010: feature learning for image classification, part 4
ECCV2010: feature learning for image classification, part 4zukun
 
MLIP - Chapter 5 - Detection, Segmentation, Captioning
MLIP - Chapter 5 - Detection, Segmentation, CaptioningMLIP - Chapter 5 - Detection, Segmentation, Captioning
MLIP - Chapter 5 - Detection, Segmentation, CaptioningCharles Deledalle
 
build a Convolutional Neural Network (CNN) using TensorFlow in Python
build a Convolutional Neural Network (CNN) using TensorFlow in Pythonbuild a Convolutional Neural Network (CNN) using TensorFlow in Python
build a Convolutional Neural Network (CNN) using TensorFlow in PythonKv Sagar
 
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...Universitat Politècnica de Catalunya
 
Alberto Massidda - Scenes from a memory - Codemotion Rome 2019
Alberto Massidda - Scenes from a memory - Codemotion Rome 2019Alberto Massidda - Scenes from a memory - Codemotion Rome 2019
Alberto Massidda - Scenes from a memory - Codemotion Rome 2019Codemotion
 
NIPS2007: deep belief nets
NIPS2007: deep belief netsNIPS2007: deep belief nets
NIPS2007: deep belief netszukun
 
Anomaly Detection and Localization Using GAN and One-Class Classifier
Anomaly Detection and Localization  Using GAN and One-Class ClassifierAnomaly Detection and Localization  Using GAN and One-Class Classifier
Anomaly Detection and Localization Using GAN and One-Class Classifier홍배 김
 
Review on cs231 part-2
Review on cs231 part-2Review on cs231 part-2
Review on cs231 part-2Jeong Choi
 
20190927 generative models_aia
20190927 generative models_aia20190927 generative models_aia
20190927 generative models_aiaYi-Fan Liou
 
Lecture 2.B: Computer Vision Applications - Full Stack Deep Learning - Spring...
Lecture 2.B: Computer Vision Applications - Full Stack Deep Learning - Spring...Lecture 2.B: Computer Vision Applications - Full Stack Deep Learning - Spring...
Lecture 2.B: Computer Vision Applications - Full Stack Deep Learning - Spring...Sergey Karayev
 
Convolutional Neural Networks - Veronica Vilaplana - UPC Barcelona 2018
Convolutional Neural Networks - Veronica Vilaplana - UPC Barcelona 2018Convolutional Neural Networks - Veronica Vilaplana - UPC Barcelona 2018
Convolutional Neural Networks - Veronica Vilaplana - UPC Barcelona 2018Universitat Politècnica de Catalunya
 
Deep learning for image video processing
Deep learning for image video processingDeep learning for image video processing
Deep learning for image video processingYu Huang
 
Machine learning for_finance
Machine learning for_financeMachine learning for_finance
Machine learning for_financeStefan Duprey
 
Keras on tensorflow in R & Python
Keras on tensorflow in R & PythonKeras on tensorflow in R & Python
Keras on tensorflow in R & PythonLonghow Lam
 

Similar to Lecun 20060816-ciar-02-deep learning for generic object recognition (20)

Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)
 
Deep learning from a novice perspective
Deep learning from a novice perspectiveDeep learning from a novice perspective
Deep learning from a novice perspective
 
What's Wrong With Deep Learning?
What's Wrong With Deep Learning?What's Wrong With Deep Learning?
What's Wrong With Deep Learning?
 
ECCV2010: feature learning for image classification, part 4
ECCV2010: feature learning for image classification, part 4ECCV2010: feature learning for image classification, part 4
ECCV2010: feature learning for image classification, part 4
 
MLIP - Chapter 5 - Detection, Segmentation, Captioning
MLIP - Chapter 5 - Detection, Segmentation, CaptioningMLIP - Chapter 5 - Detection, Segmentation, Captioning
MLIP - Chapter 5 - Detection, Segmentation, Captioning
 
build a Convolutional Neural Network (CNN) using TensorFlow in Python
build a Convolutional Neural Network (CNN) using TensorFlow in Pythonbuild a Convolutional Neural Network (CNN) using TensorFlow in Python
build a Convolutional Neural Network (CNN) using TensorFlow in Python
 
Deep Learning
Deep LearningDeep Learning
Deep Learning
 
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
 
Alberto Massidda - Scenes from a memory - Codemotion Rome 2019
Alberto Massidda - Scenes from a memory - Codemotion Rome 2019Alberto Massidda - Scenes from a memory - Codemotion Rome 2019
Alberto Massidda - Scenes from a memory - Codemotion Rome 2019
 
NIPS2007: deep belief nets
NIPS2007: deep belief netsNIPS2007: deep belief nets
NIPS2007: deep belief nets
 
Anomaly Detection and Localization Using GAN and One-Class Classifier
Anomaly Detection and Localization  Using GAN and One-Class ClassifierAnomaly Detection and Localization  Using GAN and One-Class Classifier
Anomaly Detection and Localization Using GAN and One-Class Classifier
 
Cnn
CnnCnn
Cnn
 
Review on cs231 part-2
Review on cs231 part-2Review on cs231 part-2
Review on cs231 part-2
 
20190927 generative models_aia
20190927 generative models_aia20190927 generative models_aia
20190927 generative models_aia
 
Lecture 2.B: Computer Vision Applications - Full Stack Deep Learning - Spring...
Lecture 2.B: Computer Vision Applications - Full Stack Deep Learning - Spring...Lecture 2.B: Computer Vision Applications - Full Stack Deep Learning - Spring...
Lecture 2.B: Computer Vision Applications - Full Stack Deep Learning - Spring...
 
Convolutional Neural Networks - Veronica Vilaplana - UPC Barcelona 2018
Convolutional Neural Networks - Veronica Vilaplana - UPC Barcelona 2018Convolutional Neural Networks - Veronica Vilaplana - UPC Barcelona 2018
Convolutional Neural Networks - Veronica Vilaplana - UPC Barcelona 2018
 
Deep learning for image video processing
Deep learning for image video processingDeep learning for image video processing
Deep learning for image video processing
 
MNN
MNNMNN
MNN
 
Machine learning for_finance
Machine learning for_financeMachine learning for_finance
Machine learning for_finance
 
Keras on tensorflow in R & Python
Keras on tensorflow in R & PythonKeras on tensorflow in R & Python
Keras on tensorflow in R & Python
 

More from zukun

My lyn tutorial 2009
My lyn tutorial 2009My lyn tutorial 2009
My lyn tutorial 2009zukun
 
ETHZ CV2012: Tutorial openCV
ETHZ CV2012: Tutorial openCVETHZ CV2012: Tutorial openCV
ETHZ CV2012: Tutorial openCVzukun
 
ETHZ CV2012: Information
ETHZ CV2012: InformationETHZ CV2012: Information
ETHZ CV2012: Informationzukun
 
Siwei lyu: natural image statistics
Siwei lyu: natural image statisticsSiwei lyu: natural image statistics
Siwei lyu: natural image statisticszukun
 
Lecture9 camera calibration
Lecture9 camera calibrationLecture9 camera calibration
Lecture9 camera calibrationzukun
 
Brunelli 2008: template matching techniques in computer vision
Brunelli 2008: template matching techniques in computer visionBrunelli 2008: template matching techniques in computer vision
Brunelli 2008: template matching techniques in computer visionzukun
 
Modern features-part-4-evaluation
Modern features-part-4-evaluationModern features-part-4-evaluation
Modern features-part-4-evaluationzukun
 
Modern features-part-3-software
Modern features-part-3-softwareModern features-part-3-software
Modern features-part-3-softwarezukun
 
Modern features-part-2-descriptors
Modern features-part-2-descriptorsModern features-part-2-descriptors
Modern features-part-2-descriptorszukun
 
Modern features-part-1-detectors
Modern features-part-1-detectorsModern features-part-1-detectors
Modern features-part-1-detectorszukun
 
Modern features-part-0-intro
Modern features-part-0-introModern features-part-0-intro
Modern features-part-0-introzukun
 
Lecture 02 internet video search
Lecture 02 internet video searchLecture 02 internet video search
Lecture 02 internet video searchzukun
 
Lecture 01 internet video search
Lecture 01 internet video searchLecture 01 internet video search
Lecture 01 internet video searchzukun
 
Lecture 03 internet video search
Lecture 03 internet video searchLecture 03 internet video search
Lecture 03 internet video searchzukun
 
Icml2012 tutorial representation_learning
Icml2012 tutorial representation_learningIcml2012 tutorial representation_learning
Icml2012 tutorial representation_learningzukun
 
Advances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer visionAdvances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer visionzukun
 
Gephi tutorial: quick start
Gephi tutorial: quick startGephi tutorial: quick start
Gephi tutorial: quick startzukun
 
EM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysisEM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysiszukun
 
Object recognition with pictorial structures
Object recognition with pictorial structuresObject recognition with pictorial structures
Object recognition with pictorial structureszukun
 
Iccv2011 learning spatiotemporal graphs of human activities
Iccv2011 learning spatiotemporal graphs of human activities Iccv2011 learning spatiotemporal graphs of human activities
Iccv2011 learning spatiotemporal graphs of human activities zukun
 

More from zukun (20)

My lyn tutorial 2009
My lyn tutorial 2009My lyn tutorial 2009
My lyn tutorial 2009
 
ETHZ CV2012: Tutorial openCV
ETHZ CV2012: Tutorial openCVETHZ CV2012: Tutorial openCV
ETHZ CV2012: Tutorial openCV
 
ETHZ CV2012: Information
ETHZ CV2012: InformationETHZ CV2012: Information
ETHZ CV2012: Information
 
Siwei lyu: natural image statistics
Siwei lyu: natural image statisticsSiwei lyu: natural image statistics
Siwei lyu: natural image statistics
 
Lecture9 camera calibration
Lecture9 camera calibrationLecture9 camera calibration
Lecture9 camera calibration
 
Brunelli 2008: template matching techniques in computer vision
Brunelli 2008: template matching techniques in computer visionBrunelli 2008: template matching techniques in computer vision
Brunelli 2008: template matching techniques in computer vision
 
Modern features-part-4-evaluation
Modern features-part-4-evaluationModern features-part-4-evaluation
Modern features-part-4-evaluation
 
Modern features-part-3-software
Modern features-part-3-softwareModern features-part-3-software
Modern features-part-3-software
 
Modern features-part-2-descriptors
Modern features-part-2-descriptorsModern features-part-2-descriptors
Modern features-part-2-descriptors
 
Modern features-part-1-detectors
Modern features-part-1-detectorsModern features-part-1-detectors
Modern features-part-1-detectors
 
Modern features-part-0-intro
Modern features-part-0-introModern features-part-0-intro
Modern features-part-0-intro
 
Lecture 02 internet video search
Lecture 02 internet video searchLecture 02 internet video search
Lecture 02 internet video search
 
Lecture 01 internet video search
Lecture 01 internet video searchLecture 01 internet video search
Lecture 01 internet video search
 
Lecture 03 internet video search
Lecture 03 internet video searchLecture 03 internet video search
Lecture 03 internet video search
 
Icml2012 tutorial representation_learning
Icml2012 tutorial representation_learningIcml2012 tutorial representation_learning
Icml2012 tutorial representation_learning
 
Advances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer visionAdvances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer vision
 
Gephi tutorial: quick start
Gephi tutorial: quick startGephi tutorial: quick start
Gephi tutorial: quick start
 
EM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysisEM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysis
 
Object recognition with pictorial structures
Object recognition with pictorial structuresObject recognition with pictorial structures
Object recognition with pictorial structures
 
Iccv2011 learning spatiotemporal graphs of human activities
Iccv2011 learning spatiotemporal graphs of human activities Iccv2011 learning spatiotemporal graphs of human activities
Iccv2011 learning spatiotemporal graphs of human activities
 

Recently uploaded

Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
Lecun 20060816-ciar-02-deep learning for generic object recognition

  • 6. Convolutional Network
    Stereo input: 2@96x96
    Layer 1: 8@92x92, 5x5 convolution (16 kernels)
    Layer 2: 8@23x23, 4x4 subsampling
    Layer 3: 24@18x18, 6x6 convolution (96 kernels)
    Layer 4: 24@6x6, 3x3 subsampling
    Layer 5: 100, 6x6 convolution (2400 kernels)
    Layer 6: 5 outputs, fully connected (500 weights)
    90,857 free parameters, 3,901,162 connections.
    The architecture alternates convolutional layers (feature detectors) and subsampling layers (local feature pooling for invariance to small distortions).
    The entire network is trained end-to-end (all the layers are trained simultaneously). A gradient-based algorithm is used to minimize a supervised loss function.
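A minimal modern sketch of this binocular network in PyTorch, for concreteness. It follows the layer sizes above but simplifies: full connectivity between feature maps (the original used sparse connection tables), tanh nonlinearities, and plain average pooling in place of learned subsampling, so the parameter count differs from the 90,857 quoted above.

```python
import torch
import torch.nn as nn

class BinocularConvNet(nn.Module):
    def __init__(self, n_classes=5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(2, 8, kernel_size=5),     # stereo 2@96x96 -> 8@92x92
            nn.Tanh(),
            nn.AvgPool2d(4),                    # 8@92x92 -> 8@23x23
            nn.Conv2d(8, 24, kernel_size=6),    # 8@23x23 -> 24@18x18
            nn.Tanh(),
            nn.AvgPool2d(3),                    # 24@18x18 -> 24@6x6
            nn.Conv2d(24, 100, kernel_size=6),  # 24@6x6 -> 100@1x1
            nn.Tanh(),
        )
        self.classifier = nn.Linear(100, n_classes)  # 500 weights (+ biases)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

net = BinocularConvNet()
out = net(torch.randn(1, 2, 96, 96))  # one stereo pair
print(out.shape)                      # torch.Size([1, 5])
```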
  • 7. Alternated Convolutions and Subsampling
    “Simple cells”: multiple convolutions. “Complex cells”: averaging and subsampling.
    Local features are extracted everywhere.
    The averaging/subsampling layer builds robustness to variations in feature locations.
    Hubel/Wiesel '62, Fukushima '71, LeCun '89, Riesenhuber & Poggio '02, Ullman '02, ...
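A toy illustration of the pooling argument (not from the slides): averaging over a window makes the pooled map insensitive to small shifts of a detected feature.

```python
import numpy as np

def avg_pool(fmap, k=4):
    """Non-overlapping k x k average pooling of a 2-D feature map."""
    h, w = fmap.shape
    return fmap[:h - h % k, :w - w % k].reshape(h // k, k, w // k, k).mean(axis=(1, 3))

fmap = np.zeros((16, 16))
fmap[8, 8] = 1.0                           # a feature detected at (8, 8)
shifted = np.roll(fmap, 1, axis=1)         # the same feature, shifted by 1 pixel

# Exactly zero here because the shift stays inside one pooling window;
# shifts that cross a window boundary produce only a small change.
print(np.abs(avg_pool(fmap) - avg_pool(shifted)).max())
```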
  • 8. Normalized-Uniform Set: Error Rates
    Linear classifier on raw stereo images: 30.2% error.
    K-Nearest-Neighbors on raw stereo images: 18.4% error.
    K-Nearest-Neighbors on PCA-95: 16.6% error.
    Pairwise SVM on 96x96 stereo images: 11.6% error.
    Pairwise SVM on 95 principal components: 13.3% error.
    Convolutional net on 96x96 stereo images: 5.8% error.
    [Figure: training instances vs. test instances]
  • 9. Normalized-Uniform Set: Learning Times
    SVM: using a parallel implementation by Graf, Durdanovic, and Cosatto (NEC Labs).
    Hybrid: chop off the last layer of the convolutional net and train an SVM on it.
  • 10. Jittered-Cluttered Dataset
    291,600 stereo pairs for training, 58,320 for testing.
    Objects are jittered: position, scale, in-plane rotation, contrast, brightness, backgrounds, distractor objects, ...
    Input dimension: 98x98x2 (approx. 18,000).
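A rough modern analogue of this jittering, sketched with torchvision transforms; the ranges are illustrative, not the ones used to build the dataset, and background/distractor compositing is not modeled.

```python
import torchvision.transforms as T

jitter = T.Compose([
    T.RandomAffine(degrees=15, translate=(0.1, 0.1), scale=(0.8, 1.2)),  # position, rotation, scale
    T.ColorJitter(brightness=0.4, contrast=0.4),                          # brightness, contrast
])
# jittered = jitter(image)   # image: a PIL image or image tensor
```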
  • 11. Experiment 2: Jittered-Cluttered Dataset
    291,600 training samples, 58,320 test samples.
    SVM with Gaussian kernel: 43.3% error.
    Convolutional net with binocular input: 7.8% error.
    Convolutional net + SVM on top: 5.9% error.
    Convolutional net with monocular input: 20.8% error.
    Smaller mono net (DEMO): 26.0% error.
    Dataset available from http://www.cs.nyu.edu/~yann
  • 12. Jittered-Cluttered Dataset
    SVM with Gaussian kernel: the convex loss, VC bounds, and representer theorems don't seem to help. OUCH!
    Chop off the last layer of the convolutional net and train an SVM on it: it works!
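A hypothetical sketch of that hybrid: run the trained convolutional net, keep the 100-dimensional penultimate activations, and fit a Gaussian-kernel SVM on them. `net` is the network sketched earlier; `X_train`, `y_train`, `X_test`, `y_test` are placeholders, and the hyperparameters are illustrative.

```python
import torch
from sklearn.svm import SVC

def conv_features(net, x):
    """Penultimate features: everything up to the final linear layer."""
    with torch.no_grad():
        return net.features(x).flatten(1).numpy()

svm = SVC(kernel="rbf", C=10.0, gamma="scale")
svm.fit(conv_features(net, X_train), y_train)
print("test error:", 1.0 - svm.score(conv_features(net, X_test), y_test))
```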
  • 13. What's wrong with K-NN and SVMs?
    K-NN and SVMs with Gaussian kernels are based on matching global templates.
    Both are “shallow” architectures.
    There is no way to learn invariant recognition tasks with such naïve architectures (unless we use an impractically large number of templates).
    The number of necessary templates grows exponentially with the number of dimensions of variation.
    Global templates are in trouble when the variations include: category, instance shape, configuration (for articulated objects), position, azimuth, elevation, scale, illumination, texture, albedo, in-plane rotation, background luminance, background texture, background clutter, ...
    [Figure: a global template matcher: Input -> Features (similarities; each training sample is a template) -> Linear Combinations -> Output]
  • 15. Learned Features
    [Figure: learned feature maps at Layer 1, Layer 2, and Layer 3, with the input]
  • 19. Supervised Learning in “Deep” Architectures
    Backprop can train “deep” architectures reasonably well. It works better if the architecture has some structure (e.g., a convolutional net).
    Deep architectures with some structure (e.g., convolutional nets) beat shallow ones (e.g., kernel machines) on image classification tasks: handwriting recognition, face detection, generic object recognition.
    Deep architectures are inherently more efficient for representing complex functions.
    Have we solved the problem of training deep architectures? Can we do backprop with lots of layers? Can we train deep belief networks? NO!
  • 20. MNIST Dataset
    Handwritten digit dataset MNIST: 60,000 training samples, 10,000 test samples.
  • 21. Handwritten Digit Recognition with a Convolutional Network
    Input: 1@32x32
    Layer 1: 6@28x28, 5x5 convolution
    Layer 2: 6@14x14, 2x2 subsampling
    Layer 3: 12@10x10, 5x5 convolution
    Layer 4: 12@5x5, 2x2 subsampling
    Layer 5: 84@1x1, 5x5 convolution
    Layer 6: RBF output, 10 units, fully connected
    60,000 free parameters, 400,000 connections.
    The architecture alternates convolutional layers (feature detectors) and subsampling layers (local feature pooling for invariance to small distortions).
    Handwritten digit dataset MNIST: 60,000 training samples, 10,000 test samples.
    The entire network is trained end-to-end (all the layers are trained simultaneously).
    Test error rate: 0.8%.
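A minimal sketch of the end-to-end supervised training described here: every layer is updated simultaneously by gradient descent on a classification loss. The slide's RBF output layer and its loss are replaced by plain cross-entropy for brevity; `net` is any of the networks sketched above, and `train_loader` is a placeholder data loader.

```python
import torch
import torch.nn as nn

optimizer = torch.optim.SGD(net.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

for images, labels in train_loader:
    optimizer.zero_grad()
    loss = criterion(net(images), labels)  # forward pass through all layers
    loss.backward()                        # gradients for every layer at once
    optimizer.step()
```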
  • 22. Results on MNIST Handwritten Digits (P=60,000)
    CLASSIFIER | DEFORMATION | PREPROCESSING | ERROR % | Reference
    linear classifier (1-layer NN) | none | none | 12.00 | LeCun et al. 1998
    linear classifier (1-layer NN) | none | deskewing | 8.40 | LeCun et al. 1998
    pairwise linear classifier | none | deskewing | 7.60 | LeCun et al. 1998
    K-nearest-neighbors (L2) | none | none | 3.09 | K. Wilder, U. Chicago
    K-nearest-neighbors (L2) | none | deskewing | 2.40 | LeCun et al. 1998
    K-nearest-neighbors (L2) | none | deskew, clean, blur | 1.80 | K. Wilder, U. Chicago
    K-NN L3, 2-pixel jitter | none | deskew, clean, blur | 1.22 | K. Wilder, U. Chicago (best K-NN)
    K-NN, shape context matching | none | shape context features | 0.63 | Belongie PAMI '02 (hand-crafted)
    40 PCA + quadratic classifier | none | none | 3.30 | LeCun et al. 1998
    1000 RBF + linear classifier | none | none | 3.60 | LeCun et al. 1998
    K-NN, Tangent Distance | none | subsample to 16x16 | 1.10 | LeCun et al. 1998
    SVM, Gaussian kernel | none | none | 1.40 | many
    SVM, deg-4 polynomial | none | deskewing | 1.10 | Cortes/Vapnik
    Reduced-set SVM, deg-5 polynomial | none | deskewing | 1.00 | Scholkopf
    Virtual SVM, deg-9 polynomial | Affine | none | 0.80 | Scholkopf
    V-SVM, 2-pixel jittered | none | none | 0.68 | DeCoste/Scholkopf, MLJ '02
    V-SVM, 2-pixel jittered | none | deskewing | 0.56 | DeCoste/Scholkopf, MLJ '02 (best kernel-based)
    2-layer NN, 300 HU, MSE | none | none | 4.70 | LeCun et al. 1998
    2-layer NN, 300 HU, MSE | Affine | none | 3.60 | LeCun et al. 1998
    2-layer NN, 300 HU | none | deskewing | 1.60 | LeCun et al. 1998
    3-layer NN, 500+150 HU | none | none | 2.95 | LeCun et al. 1998
    3-layer NN, 500+150 HU | Affine | none | 2.45 | LeCun et al. 1998 (best fully-connected neural net)
    3-layer NN, 500+300 HU, CE, reg | none | none | 1.53 | Hinton, in press, 2005
    2-layer NN, 800 HU, CE | none | none | 1.60 | Simard et al., ICDAR 2003
    2-layer NN, 800 HU, CE | Affine | none | 1.10 | Simard et al., ICDAR 2003
    2-layer NN, 800 HU, MSE | Elastic | none | 0.90 | Simard et al., ICDAR 2003
    2-layer NN, 800 HU, CE | Elastic | none | 0.70 | Simard et al., ICDAR 2003 (best knowledge-free)
    Stacked RBM + backprop | none | none | 0.95 | Hinton, in press, 2005
    Convolutional net LeNet-1 | none | subsample to 16x16 | 1.70 | LeCun et al. 1998
    Convolutional net LeNet-4 | none | none | 1.10 | LeCun et al. 1998
    Convolutional net LeNet-5 | none | none | 0.95 | LeCun et al. 1998
    Convolutional net LeNet-5 | Affine | none | 0.80 | LeCun et al. 1998
    Boosted LeNet-4 | Affine | none | 0.70 | LeCun et al. 1998
    Convolutional net, CE | Affine | none | 0.60 | Simard et al., ICDAR 2003
    Convolutional net, CE | Elastic | none | 0.40 | Simard et al., ICDAR 2003 (best overall)
  • 23. Best Results on MNIST (from raw images: no preprocessing)
    CLASSIFIER | DEFORMATION | ERROR % | Reference
    Knowledge-free methods:
      2-layer NN, 800 HU, CE | none | 1.60 | Simard et al., ICDAR 2003
      3-layer NN, 500+300 HU, CE, reg | none | 1.53 | Hinton, in press, 2005
      SVM, Gaussian kernel | none | 1.40 | Cortes '92 + many others
    Convolutional nets:
      Convolutional net LeNet-5 | none | 0.80 | LeCun 2005, unpublished
      Convolutional net LeNet-6 | none | 0.70 | LeCun 2006, unpublished
    Training set augmented with affine distortions:
      2-layer NN, 800 HU, CE | Affine | 1.10 | Simard et al., ICDAR 2003
      Virtual SVM, deg-9 polynomial | Affine | 0.80 | Scholkopf
      Convolutional net, CE | Affine | 0.60 | Simard et al., ICDAR 2003
    Training set augmented with elastic distortions:
      2-layer NN, 800 HU, CE | Elastic | 0.70 | Simard et al., ICDAR 2003
      Convolutional net, CE | Elastic | 0.40 | Simard et al., ICDAR 2003
    Convolutional nets are the best known method for handwriting recognition.
  • 24. Problems with Supervised Learning in Deep Architectures
    Vanishing gradients and symmetry breaking: the first layers have a hard time learning useful things. How do we break the symmetry so that different units do different things?
    Idea [Hinton]: (1) initialize the first (few) layers with unsupervised training; (2) refine the whole network with backprop.
    Problem: how do we train a layer in unsupervised mode?
    Auto-encoder: only works when the first layer is smaller than the input. What if the first layer is larger than the input? Reconstruction is trivial!
    Solution: sparse over-complete representations. Keep the number of bits in the first layer low.
    Hinton uses a Restricted Boltzmann Machine in which the first layer uses stochastic binary units.
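A hedged sketch of the two-stage recipe, with a simple autoencoder standing in for the unsupervised module (the slide's point is that a plain autoencoder only works when the code is smaller than the input or otherwise constrained, e.g. sparse). `unlabeled_loader` is a placeholder yielding flattened 28x28 images.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

first = nn.Linear(784, 500)       # the layer we want to initialize
decoder = nn.Linear(500, 784)
pre_opt = torch.optim.SGD(list(first.parameters()) + list(decoder.parameters()), lr=0.01)

# Stage 1: unsupervised training of the first layer by reconstruction.
for x, _ in unlabeled_loader:
    pre_opt.zero_grad()
    loss = F.mse_loss(decoder(torch.sigmoid(first(x))), x)
    loss.backward()
    pre_opt.step()

# Stage 2: keep the pretrained layer and refine the whole stack with backprop,
# using the supervised training loop shown earlier.
model = nn.Sequential(first, nn.Sigmoid(), nn.Linear(500, 10))
```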
  • 25. Best Results on MNIST (from raw images: no preprocessing)
    Same table as slide 23, with one addition:
    Unsupervised methods:
      Stacked RBM + backprop | none | 0.95 | Hinton, in press, 2005
  • 26. Unsupervised Learning of Sparse Over-Complete Features
    Classification is easier with over-complete feature sets.
    Existing unsupervised feature learning (not sparse/overcomplete): PCA, ICA, auto-encoders, Kernel-PCA.
    Sparse/overcomplete methods: Non-negative Matrix Factorization; sparse-overcomplete basis functions (Olshausen and Field 1997); Products of Experts (Teh, Welling, Osindero, Hinton 2003).
  • 28. Symmetric Product of Experts
    $P(Z \mid X, W_c, W_d) \propto \exp\big(-E(X, Z, W_c, W_d)\big)$
    $E(X, Z, W_c, W_d) = E_C(X, Z, W_c) + E_D(X, Z, W_d)$
    $E_C(X, Z, W_c) = \tfrac{1}{2}\lVert Z - W_c X \rVert^2 = \tfrac{1}{2}\sum_i \big(z_i - W_c^i X\big)^2$
    $E_D(X, Z, W_d) = \tfrac{1}{2}\lVert X - W_d Z \rVert^2 = \tfrac{1}{2}\sum_i \big(x_i - W_d^i Z\big)^2$
  • 29. Inference & Learning
    Inference: $Z^* = \arg\min_Z E(X, Z, W) = \arg\min_Z \big[E_C(X, Z, W_c) + E_D(X, Z, W_d)\big]$
      Let $Z(0)$ be the encoder prediction; find the code that minimizes the total energy by gradient descent.
    Learning: $W \leftarrow W - \eta \, \partial E(X, Z^*, W) / \partial W$
      Using the optimal code, minimize $E$ with respect to the weights $W$ by gradient descent.
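A NumPy sketch of these energies and the inference/learning alternation. Dimensions, step sizes, and iteration counts are illustrative; the gradients follow directly from the quadratic energies above.

```python
import numpy as np

def energies(X, Z, Wc, Wd):
    Ec = 0.5 * np.sum((Z - Wc @ X) ** 2)   # encoder (code prediction) energy
    Ed = 0.5 * np.sum((X - Wd @ Z) ** 2)   # decoder (reconstruction) energy
    return Ec, Ed

def infer(X, Wc, Wd, lr=0.1, steps=50):
    Z = Wc @ X                              # Z(0): the encoder prediction
    for _ in range(steps):                  # gradient descent on the code
        grad_Z = (Z - Wc @ X) + Wd.T @ (Wd @ Z - X)
        Z = Z - lr * grad_Z
    return Z

def learn_step(X, Wc, Wd, lr=0.01):
    Z = infer(X, Wc, Wd)                    # step 1: find the optimal code
    Wc += lr * np.outer(Z - Wc @ X, X)      # step 2: descend E_C w.r.t. Wc
    Wd += lr * np.outer(X - Wd @ Z, Z)      #         descend E_D w.r.t. Wd
    return Wc, Wd

rng = np.random.default_rng(0)
Wc = 0.1 * rng.standard_normal((200, 144))  # 200-unit code, 12x12 patches
Wd = 0.1 * rng.standard_normal((144, 200))
Wc, Wd = learn_step(rng.standard_normal(144), Wc, Wd)  # one training step
```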
  • 31. Inference, step 1: initialize the code with the encoder prediction, Z(0) = W_c X (copied into Z).
  • 32. Inference, step 1 (cont.): forward-propagate X to compute W_c X and the energies E_C and E_D.
  • 33. Inference, step 1 (cont.): back-propagate gradients with respect to Z to find the optimal Z.
  • 34. Learning, step 2: forward-propagate X with the optimal Z and compute E_C and E_D.
  • 35. Learning, step 2 (cont.): back-propagate gradients with respect to W; update W_c and W_d.
  • 36. Sparsifying Logistic
    $\bar z_i(t) = \dfrac{\eta\, e^{\beta z_i(t)}}{\zeta_i(t)}, \qquad \zeta_i(t) = \eta\, e^{\beta z_i(t)} + (1 - \eta)\,\zeta_i(t-1), \qquad i \in [1..m]$
    Temporal rather than spatial sparsity: no normalization across the units of a code.
    $\zeta_i$ is treated as a learned parameter, so the TSM is a sigmoid function with a special bias:
    $\bar z_i(t) = \dfrac{1}{1 + B\, e^{-\beta z_i(t)}}, \qquad B = \dfrac{(1-\eta)\,\zeta_i(t-1)}{\eta}$
    $\zeta$ is saturated during training to allow units to have different sparseness.
    Input uniformly distributed in [-1, 1].
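A NumPy sketch of the sparsifying logistic as reconstructed above: each code unit keeps a running denominator, so the nonlinearity behaves like a logistic whose bias adapts to that unit's recent activity.

```python
import numpy as np

def sparsifying_logistic(z, zeta, eta=0.02, beta=1.0):
    """z: raw code (m,); zeta: running state from the previous sample (m,)."""
    num = eta * np.exp(beta * z)
    zeta = num + (1.0 - eta) * zeta    # zeta_i(t) update
    return num / zeta, zeta            # zbar in (0, 1), plus the new state

zeta = np.ones(200)                    # state for a 200-unit code
zbar, zeta = sparsifying_logistic(np.random.randn(200), zeta)
print(zbar.mean())                     # small: most units stay near zero
```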
  • 37. Natural image patches: Berkeley
    Berkeley data set: 100,000 12x12 patches; 200 units in the code.
    η = 0.02, β = 1, learning rate 0.001, L1/L2 regularizer 0.001.
    Fast convergence: < 30 min.
  • 39. Natural image patches: Berkeley
    [Figure: encoder direct filters (rows of W_c); decoder reverse filters (columns of W_d)]
  • 40. Natural image patches: Forest
    Forest data set: 100,000 12x12 patches; 200 units in the code.
    η = 0.02, β = 1, learning rate 0.001, L1/L2 regularizer 0.001.
    Fast convergence: < 30 min.
  • 42. Natural image patches: Forest
    [Figure: encoder direct filters (rows of W_c); decoder reverse filters (columns of W_d)]
  • 43. Natural image patches: Forest
    [Figure: a test sample and its code word; unit activity for code words from 200 randomly selected test patches]
    The codes are sparse, almost binary, and quite decorrelated.
    In testing, codes are produced by propagating the input patch through the encoder and the TSM.
    η controls sparsity; β controls the “bit content” of each code unit.
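An illustrative way to check those three properties on a batch of codes. The array `Z` here is random stand-in data, not output of the trained model; the thresholds are arbitrary.

```python
import numpy as np

Z = np.random.beta(0.1, 0.9, size=(200, 200))    # stand-in "codes" in (0, 1)
sparsity = (Z < 0.1).mean()                      # fraction of near-zero units
binarity = ((Z < 0.1) | (Z > 0.9)).mean()        # fraction near 0 or 1
corr = np.corrcoef(Z.T)
decorr = np.abs(corr[np.triu_indices_from(corr, 1)]).mean()
print(f"sparse: {sparsity:.2f}, binary-ish: {binarity:.2f}, mean |corr|: {decorr:.2f}")
```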
  • 46. What about an autoencoder?
    [Figure: encoder filters and decoder filters]
    With η = 0.1 and β = 0.5, the filters are random: convergence only for large η and small β.
  • 47. Handwritten digits: MNIST
    60,000 28x28 images; 196 units in the code.
    η = 0.01, β = 1, learning rate 0.001, L1/L2 regularizer 0.005.
    [Figure: encoder direct filters]
  • 48. Handwritten digits: MNIST
    [Figure: digits reconstructed by forward propagation through the encoder and decoder]
    After training there is no need to minimize in code space.
  • 49. Initializing a Convolutional Net with SPoE
    Architecture: LeNet-6, 1 -> 50 -> 50 -> 200 -> 10.
    Baseline (random initialization): 0.7% error on the test set.
    First layer initialized with SPoE: 0.6% error on the test set.
    Training with elastically-distorted samples: 0.38% error on the test set.
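A hypothetical sketch of the initialization step: copy filters learned by the unsupervised SPoE encoder into the first convolutional layer before supervised refinement. `Wc` is assumed here to be an encoder matrix trained on 5x5 patches with 50 code units (shape 50 x 25); the transcript does not give the exact filter shapes used in the experiment.

```python
import torch
import torch.nn as nn

conv1 = nn.Conv2d(1, 50, kernel_size=5)  # first layer of the LeNet-6-style net
with torch.no_grad():
    filters = torch.as_tensor(Wc, dtype=torch.float32).view(50, 1, 5, 5)
    conv1.weight.copy_(filters)          # unsupervised filters as the starting point
# ...then train the whole network with supervised backprop as usual.
```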
  • 50. Initializing a Convolutional Net with SPoE
    Architecture: LeNet-6, 1 -> 50 -> 50 -> 200 -> 10, with 9x9 kernels instead of 5x5.
    [Figure: first-layer filters, baseline (random initialization) vs. first layer initialized with SPoE]
  • 51. Best Results on MNIST (from raw images: no preprocessing)
    Same table as slide 25, with the new unsupervised-initialization results:
    Convolutional nets:
      Conv. net LeNet-6 + unsup. learning | none | 0.60 | LeCun 2006, unpublished
    Training set augmented with elastic distortions:
      Conv. net LeNet-6 + unsup. learning | Elastic | 0.38 | LeCun 2006, unpublished
  • 52. Conclusion
    Deep architectures are better than shallow ones.
    We haven't solved the deep learning problem yet.
    Larger networks are better.
    Initializing the first layer(s) with unsupervised learning helps.
    WANTED: a learning algorithm for deep architectures that seamlessly blends supervised and unsupervised learning.