5 years from now,
everyone will learn
their features
(you might as well start now)

Yann LeCun
Courant Institute of Mathematical Sciences
and
Center for Neural Science,
New York University


Yann LeCun
I Have a Terrible Confession to Make

        I'm interested in vision, but no more in vision than in audition or in
        other perceptual modalities.
        I'm interested in perception (and in control).
        I'd like to find a learning algorithm and architecture that could work
        (with minor changes) for many modalities
          Nature seems to have found one.
        Almost all natural perceptual signals have a local structure (in space
        and time) similar to images and videos
          Heavy correlation between neighboring variables
          Local patches of variables have structure, and are representable
          by feature vectors.
        I like vision because it's challenging, it's useful, it's fun, we have data,
        and the image recognition community is not yet stuck in a deep
        local minimum like the speech recognition community.


Yann LeCun
The Unity of
Recognition
Architectures


Yann LeCun
Most Recognition Systems Are Built on the Same Architecture



[Diagram, one stage: Filter Bank -> Non-Linearity -> Feature Pooling ->
Normalization -> Classifier]

[Diagram, two stages: Filter Bank -> Non-Lin -> Pool -> Norm ->
Filter Bank -> Non-Lin -> Pool -> Norm -> Classifier]



      First stage: dense SIFT, HOG, GIST, sparse coding, RBM, auto-encoders...
      Second stage: K-means, sparse coding, LCC...
      Pooling: average, L2, max, max with bias (elastic templates)...
      Convolutional Nets: same architecture, but everything is trained.
Yann LeCun
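Since the same stage recurs across all these systems, a minimal sketch may help make it concrete. This is a NumPy toy version of one stage (filter bank -> non-linearity -> pooling); the random 9x9 filters, rectification by absolute value, and 2x2 average pooling are illustrative stand-ins for the concrete choices listed above, not the talk's exact recipe.

```python
import numpy as np
from scipy.signal import convolve2d

def stage(image, filters, pool=2):
    """One generic stage: filter bank -> rectification -> spatial pooling."""
    maps = []
    for w in filters:                                     # filter bank (convolutions)
        fm = np.abs(convolve2d(image, w, mode="valid"))   # non-linearity (rectification)
        h, v = fm.shape[0] // pool, fm.shape[1] // pool   # pooling grid
        fm = fm[:h * pool, :v * pool].reshape(h, pool, v, pool).mean(axis=(1, 3))
        maps.append(fm)                                   # pooled, down-sampled feature map
    return np.stack(maps)

filters = [np.random.randn(9, 9) for _ in range(64)]   # stand-in for learned filters
features = stage(np.random.randn(96, 96), filters)     # shape (64, 44, 44)
```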
Filter Bank + Non-Linearity + Pooling + Normalization




[Diagram: Filter Bank -> Non-Linearity -> Spatial Pooling]




        This model of a feature extraction stage is biologically-inspired
         ...whether you like it or not (just ask David Lowe)
         Inspired by [Hubel and Wiesel 1962]
         The use of this module goes back to Fukushima's Neocognitron
         (and even earlier models in the 60's).

Yann LeCun
How well does this work?


[Diagram: Filter Bank (oriented edges: SIFT) -> Non-Linearity (winner-takes-all) ->
Pooling (histogram sum) -> Filter Bank (K-means or sparse coding) -> Non-Linearity ->
Pooling (pyramid histogram, elastic parts models) -> Classifier (SVM or another
simple classifier)]
    Some results on C101 (I know, I know....)
     SIFT->K-means->Pyramid pooling->SVM intersection kernel: >65%
                [Lazebnik et al. CVPR 2006]
      SIFT->Sparse coding on Blocks->Pyramid pooling->SVM: >75%
                [Boureau et al. CVPR 2010] [Yang et al. 2008]
      SIFT->Local Sparse coding on Block->Pyramid pooling->SVM: >77%
                [Boureau et al. ICCV 2011]
      (Small) supervised ConvNet with sparsity penalty: >71%
                [rejected from CVPR, ICCV, etc.] REAL TIME
Yann LeCun
Convolutional Networks (ConvNets) fit that model




Yann LeCun
Why do two stages work better than one stage?


[Diagram, two stages: Filter Bank -> Non-Lin -> Pool -> Norm ->
Filter Bank -> Non-Lin -> Pool -> Norm -> Classifier]




      The second stage extracts mid-level features
      Having multiple stages helps the selectivity-invariance dilemma




Yann LeCun
Learning Hierarchical Representations


[Diagram: Trainable Feature Transform -> Trainable Feature Transform ->
Trainable Classifier, with the Learned Internal Representation between stages]
             I agree with David Lowe: we should learn the features
             It worked for speech, handwriting, NLP.....
             In a way, the vision community has been running a ridiculously
             inefficient evolutionary learning algorithm to learn features:
               Mutation: tweak existing features in many different ways
               Selection: Publish the best ones at CVPR
               Reproduction: combine several features from the last CVPR
               Iterate. Problem: Moore's law works against you
Yann LeCun
Sometimes,
Biology gives you
good hints
example:
contrast normalization



Yann LeCun
Harsh Non-Linearity + Contrast Normalization + Sparsity
    C: Convolutions (filter bank)
       Soft Thresholding + Abs
    N: Subtractive and Divisive Local Normalization
    P: Pooling (down-sampling) layer: average or max?




[Diagram of one stage: Convolutions -> Thresholding -> Rectification ->
contrast normalization (subtractive + divisive) -> Pooling, sub-sampling]
                 THIS IS ONE STAGE OF THE CONVNET
Yann LeCun
Soft Thresholding Non-Linearity




Yann LeCun
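In its standard form, soft thresholding is the shrinkage operator that arises as the proximal step of an L1 penalty; a minimal sketch (the exact parameterization used in the talk's networks may differ):

```python
import numpy as np

def shrink(x, theta):
    """Soft thresholding: values with |x| <= theta go to zero, the rest are
    shrunk toward zero by theta. Composing with abs() gives the
    'Soft Thresholding + Abs' variant from the stage description above."""
    return np.sign(x) * np.maximum(np.abs(x) - theta, 0.0)
```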
Local Contrast Normalization
      Performed on the state of every layer, including
      the input
      Subtractive Local Contrast Normalization
       Subtracts from every value in a feature a
       Gaussian-weighted average of its
       neighbors (high-pass filter)
      Divisive Local Contrast Normalization
       Divides every value in a layer by the
       standard deviation of its neighbors over
       space and over all feature maps
      Subtractive + Divisive LCN performs a kind of
      approximate whitening.




Yann LeCun
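A sketch of both normalization steps on a stack of feature maps, following the description above; the Gaussian width, the epsilon floor, and the exact way statistics are pooled across maps are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def local_contrast_normalize(x, sigma=2.0, eps=1e-6):
    """Subtractive + divisive LCN on feature maps x of shape (n_maps, h, w)."""
    # Subtractive: remove a Gaussian-weighted local mean, pooled over all maps.
    local_mean = gaussian_filter(x, sigma=(0, sigma, sigma)).mean(axis=0, keepdims=True)
    centered = x - local_mean
    # Divisive: divide by the local standard deviation over space and maps.
    local_var = gaussian_filter(centered ** 2, sigma=(0, sigma, sigma)).mean(axis=0, keepdims=True)
    return centered / np.maximum(np.sqrt(local_var), eps)

normalized = local_contrast_normalize(np.random.randn(64, 32, 32))
```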
C101 Performance (I know, I know)



       Small network: 64 features at stage-1, 256 features at stage-2:

       Tanh non-linearity, no rectification, no normalization:         29%
       Tanh non-linearity, rectification, normalization:               65%
       Shrink non-linearity, rectification, norm., sparsity penalty:   71%




Yann LeCun
Results on Caltech101 with sigmoid non-linearity




[Results table; one configuration marked "← like HMAX model"]




Yann LeCun
Feature Learning
Works Really Well
on everything but C101




Yann LeCun
C101 is very unfavorable to learning-based systems
      Because it's so small. We are switching to ImageNet
      Some results on NORB
[Chart: NORB error rates, with and without normalization, for random filters,
unsupervised filters, supervised filters, and unsupervised+supervised filters]




Yann LeCun
Sparse Auto-Encoders
       Inference by gradient descent starting from the encoder output

  E(Yⁱ, Z) = ‖Yⁱ − W_d Z‖² + ‖Z − g_e(W_e, Yⁱ)‖² + λ Σ_j |z_j|

  Zⁱ = argmin_z E(Yⁱ, z; W)

[Diagram: INPUT Y feeds the decoder reconstruction term ‖Yⁱ − W_d Z‖² and the
encoder g_e(W_e, Yⁱ); the code Z (FEATURES) is tied to the encoder output by
the prediction term ‖Z − Ẑ‖² and penalized by the sparsity term Σ_j |z_j|]

Yann LeCun
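A sketch of this inference as proximal gradient descent on the energy above, for a vector input; tanh stands in for the trained encoder g_e, and the step size, sparsity weight, and iteration count are illustrative.

```python
import numpy as np

def shrink(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def psd_infer(y, Wd, We, lam=0.1, lr=0.05, steps=50):
    """Minimize ||y - Wd z||^2 + ||z - g||^2 + lam*|z|_1, starting at the
    encoder output g = ge(We, y)."""
    g = np.tanh(We @ y)        # encoder prediction (tanh is a stand-in)
    z = g.copy()               # start inference from the encoder output
    for _ in range(steps):
        grad = -2 * Wd.T @ (y - Wd @ z) + 2 * (z - g)  # gradient of the smooth part of E
        z = shrink(z - lr * grad, lr * lam)            # proximal step for the L1 term
    return z

z = psd_infer(np.random.randn(81), np.random.randn(81, 64), np.random.randn(64, 81))
```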
Using PSD to Train a Hierarchy of Features
       Phase 1: train first layer using PSD




[Diagram: stage-1 PSD, as above: input Y, decoder term ‖Yⁱ − W_d Z‖², encoder
g_e(W_e, Yⁱ) with prediction term ‖Z − Ẑ‖², sparsity Σ_j |z_j|; the code Z is
the FEATURES output]


Yann LeCun
Using PSD to Train a Hierarchy of Features
       Phase 1: train first layer using PSD
       Phase 2: use encoder + absolute value as feature extractor




[Diagram: Y -> g_e(W_e, Yⁱ) -> |z_j| -> FEATURES]

Yann LeCun
Using PSD to Train a Hierarchy of Features
       Phase 1: train first layer using PSD
       Phase 2: use encoder + absolute value as feature extractor
       Phase 3: train the second layer using PSD




[Diagram: stage-1 encoder (Y -> g_e -> |z_j|) feeding a stage-2 PSD module with
its own decoder term ‖Yⁱ − W_d Z‖², encoder prediction ‖Z − Ẑ‖², and sparsity
Σ_j |z_j|; the stage-2 code Z is the FEATURES output]


Yann LeCun
Using PSD to Train a Hierarchy of Features
       Phase 1: train first layer using PSD
       Phase 2: use encoder + absolute value as feature extractor
       Phase 3: train the second layer using PSD
       Phase 4: use encoder + absolute value as 2nd feature extractor




[Diagram: two stacked encoders, Y -> g_e(W_e, Yⁱ) -> |z_j| -> g_e(W_e, Yⁱ) ->
|z_j| -> FEATURES]

Yann LeCun
Using PSD to Train a Hierarchy of Features
      Phase 1: train first layer using PSD
      Phase 2: use encoder + absolute value as feature extractor
      Phase 3: train the second layer using PSD
      Phase 4: use encoder + absolute value as 2nd feature extractor
      Phase 5: train a supervised classifier on top
      Phase 6 (optional): train the entire system with supervised back-propagation (the phases are schematized after this slide)




[Diagram: Y -> encoder 1 -> |z_j| -> encoder 2 -> |z_j| -> classifier; the
stage-2 FEATURES feed the classifier]


Yann LeCun
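The six phases, condensed into a schematic; train_psd_layer and train_classifier are hypothetical helper names standing in for the PSD training above and for any supervised classifier, not a real API.

```python
import numpy as np

def greedy_psd_pipeline(X, labels):
    """Schematic of Phases 1-6 (train_psd_layer / train_classifier are
    hypothetical helpers)."""
    enc1 = train_psd_layer(X)             # Phase 1: PSD on raw inputs
    F1 = [np.abs(enc1(x)) for x in X]     # Phase 2: encoder + absolute value
    enc2 = train_psd_layer(F1)            # Phase 3: PSD on stage-1 features
    F2 = [np.abs(enc2(f)) for f in F1]    # Phase 4: encoder + absolute value
    clf = train_classifier(F2, labels)    # Phase 5: supervised classifier on top
    # Phase 6 (optional): fine-tune enc1, enc2 and clf jointly by back-propagation.
    return enc1, enc2, clf
```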
Learned Features on natural patches: V1-like receptive fields




Yann LeCun
Using PSD Features for Object Recognition
      64 filters on 9x9 patches trained with PSD
       with Linear-Sigmoid-Diagonal Encoder




Yann LeCun
Convolutional Sparse Coding

    [Kavukcuoglu et al. NIPS 2010]: convolutional PSD

    [Zeiler, Krishnan, Taylor, Fergus, CVPR 2010]: Deconvolutional Network
    [Lee, Gross, Ranganath, Ng,  ICML 2009]: Convolutional Boltzmann Machine
    [Norouzi, Ranjbar, Mori, CVPR 2009]:  Convolutional Boltzmann Machine
    [Chen, Sapiro, Dunson, Carin, Preprint 2010]: Deconvolutional Network with 
    automatic adjustment of code dimension.


Yann LeCun
Convolutional Training

      Problem:
       With patch-level training, the learning algorithm must reconstruct
       the entire patch with a single feature vector.
       But when the filters are used convolutionally, neighboring feature
       vectors will be highly redundant.




                                         Patch­level training produces
                                         lots of filters that are shifted
                                         versions of each other.




Yann LeCun
Convolutional Sparse Coding
      Replace the dot products with dictionary elements by convolutions.
       Input Y is a full image
       Each code component Zk is a feature map (an image)
       Each dictionary element is a convolution kernel

      Regular sparse coding:     Y ≈ W Z   (with Z sparse)

      Convolutional S.C.:        Y = Σ_k W_k * Z_k
       “deconvolutional networks” [Zeiler, Taylor, Fergus CVPR 2010]
Yann LeCun
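A sketch of the convolutional reconstruction energy this implies; the shapes and the sparsity weight are spelled out in the comments and are illustrative.

```python
import numpy as np
from scipy.signal import convolve2d

def conv_sc_energy(Y, W, Z, lam=0.1):
    """E(Y, Z) = ||Y - sum_k W_k * Z_k||^2 + lam * sum_k |Z_k|_1.
    Y: (h, w) image; W: K kernels of shape (s, s); Z: K feature maps of shape
    (h - s + 1, w - s + 1), so each full convolution has shape (h, w)."""
    recon = sum(convolve2d(Zk, Wk, mode="full") for Wk, Zk in zip(W, Z))
    return np.sum((Y - recon) ** 2) + lam * sum(np.abs(Zk).sum() for Zk in Z)

Y = np.random.randn(32, 32)
W = [np.random.randn(7, 7) for _ in range(8)]
Z = [np.random.randn(26, 26) for _ in range(8)]
E = conv_sc_energy(Y, W, Z)
```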
Convolutional PSD: Encoder with a soft sh() Function

      Convolutional Formulation
       Extend sparse coding from PATCH to IMAGE




[Figure: filters from PATCH-based learning vs. CONVOLUTIONAL learning]


Yann LeCun
CIFAR-10 Dataset
      Dataset of tiny images
       Images are 32x32 color images
       10 object categories, with 50,000 training and 10,000 test images
      Example Images




Yann LeCun
Comparative Results on the CIFAR-10 Dataset




    * Krizhevsky. Learning multiple layers of features from tiny images. Master's thesis, Dept. of CS, U. of Toronto.

    ** Ranzato and Hinton. Modeling pixel means and covariances using a factorized third-order Boltzmann machine. CVPR 2010.
Yann LeCun
Road Sign Recognition Competition
   GTSRB Road Sign Recognition Competition (phase 1)
    32x32 images
    13 of the top 14 entries are ConvNets: 6 from NYU, 7 from IDSIA
    No. 6 is humans!




Yann LeCun
Pedestrian Detection (INRIA Dataset)




                    [Sermanet et al., rejected from ICCV 2011]
Yann LeCun
Pedestrian Detection: Examples




Yann LeCun                              [Kavukcuoglu et al. NIPS 2010]
Learning
Invariant Features




Yann LeCun
Why just pool over space? Why not over orientation?
     Using an idea from Hyvarinen: topographic square pooling (subspace ICA)
      1. Apply filters on a patch (with a suitable non-linearity)
      2. Arrange the filter outputs on a 2D plane
      3. Square the filter outputs
      4. Minimize the square root of the sum of blocks of squared filter
         outputs (sketched after this slide)




Yann LeCun
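A sketch of steps 2-4 for one patch, assuming 64 filter outputs arranged on an 8x8 grid; real topographic pooling typically uses overlapping blocks, whereas the 2x2 blocks here are non-overlapping for brevity.

```python
import numpy as np

def topographic_penalty(z, grid=(8, 8), block=2):
    """Arrange filter outputs z on a 2D grid, square them, and sum the
    square roots of block sums (sqrt of sum of blocks of squares)."""
    zmap = z.reshape(grid) ** 2                    # steps 2-3: 2D layout, square
    gh, gw = grid[0] // block, grid[1] // block
    blocks = zmap[:gh * block, :gw * block].reshape(gh, block, gw, block)
    return np.sqrt(blocks.sum(axis=(1, 3))).sum()  # step 4: sqrt of block sums

penalty = topographic_penalty(np.random.randn(64))
```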
Why just pool over space? Why not over orientation?
      The filters arrange
      themselves spontaneously so
      that similar filters enter the
      same pool.
      The pooling units can be seen
      as complex cells
      They are invariant to local
      transformations of the input
       For some it's translations,
       for others rotations, or
       other transformations.




Yann LeCun
Pinwheels?
      Does that look
      pinwheely to
      you?




Yann LeCun
Sparsity through
Lateral Inhibition


Yann LeCun
Invariant Features: Lateral Inhibition
      Replace the L1 sparsity term by a lateral inhibition matrix




Yann LeCun
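One natural form of such a term couples pairs of code units through a matrix S, so that a non-zero S_ij penalizes units i and j for being active together; a hedged sketch, since the talk's exact parameterization is not shown here:

```python
import numpy as np

def lateral_inhibition_penalty(z, S):
    """sum_ij S_ij |z_i| |z_j|: S is symmetric and non-negative with a zero
    diagonal; S_ij = 0 lets units i and j co-activate freely."""
    a = np.abs(z)
    return a @ S @ a

S = np.random.rand(64, 64)
S = (S + S.T) / 2          # symmetrize
np.fill_diagonal(S, 0.0)   # no self-inhibition
p = lateral_inhibition_penalty(np.random.randn(64), S)
```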
Invariant Features: Lateral Inhibition
      Zeros in the S matrix have a tree structure




Yann LeCun
Invariant Features: Lateral Inhibition
      Non-zero values in S form a ring in a 2D topology
       Input patches are high-pass filtered




Yann LeCun
Invariant Features: Lateral Inhibition
      Non-zero values in S form a ring in a 2D topology
       Left: no high-pass filtering of the input
       Right: patch-level mean removal




Yann LeCun
Invariant Features: Short-Range Lateral Excitation + L1




Yann LeCun
Disentangling the
Explanatory Factors
of Images


Yann LeCun
Separating

      I used to think that recognition was all about eliminating irrelevant
      information while keeping what's useful
        Building invariant representations
        Eliminating irrelevant variabilities
      I now think that recognition is all about disentangling independent factors
      of variation:
        Separating “what” and “where”
        Separating content from instantiation parameters
        Hinton's “capsules”; Karol Gregor's what-where auto-encoders




Yann LeCun
Invariant Features through Temporal Constancy
      An object is the cross-product of object type and instantiation parameters
       [Hinton 1981]




[Figure: toy illustration of object type vs. object size (small, medium, large)]
[Karol Gregor et al.]
Yann LeCun
Invariant Features through Temporal Constancy

[Diagram: temporal what/where architecture. An encoder f∘W1 maps the inputs
Sᵗ, Sᵗ⁻¹, Sᵗ⁻² to inferred codes C1ᵗ, C1ᵗ⁻¹, C1ᵗ⁻², plus a code C2ᵗ via W2;
the decoder predicts the codes and reconstructs the predicted input from them]
Yann LeCun
Invariant Features through Temporal Constancy



[Figure: learned codes. The C1 units capture "where"; the C2 units capture "what"]



Yann LeCun
Generating from the Network


[Figure: input images and images generated from the network]




Yann LeCun
What is the right
criterion to train
hierarchical feature
extraction
architectures?


Yann LeCun
Flattening the Data Manifold?

         The manifold of all images of <Category-X> is low-dimensional
         and highly curvy
         Feature extractors should “flatten” the manifold




Yann LeCun
Flattening the
Data Manifold?




Yann LeCun
The Ultimate Recognition System


[Diagram: Trainable Feature Transform -> Trainable Feature Transform ->
Trainable Classifier, with the Learned Internal Representation between stages]
             Bottom-up and top-down information
              Top-down: complex inference and disambiguation
              Bottom-up: learns to quickly predict the result of the top-down
              inference
             Integrated supervised and unsupervised learning
               Capture the dependencies between all observed variables
             Compositionality
              Each stage has latent instantiation variables
Yann LeCun
