CVPR12 Tutorial on Deep Learning

Sparse Coding

Kai Yu
yukai@baidu.com
Department of Multimedia, Baidu
Relentless research on visual recognition

Datasets: Caltech 101, 80 Million Tiny Images, PASCAL VOC, ImageNet
The pipeline of machine visual perception

Low-level sensing → Pre-processing → Feature extraction → Feature selection → Inference: prediction, recognition

Most efforts in machine learning go into the inference step, yet the feature steps are:
• Most critical for accuracy
• Account for most of the computation at test time
• Most time-consuming in the development cycle
• Often hand-crafted in practice
Computer vision features




SIFT, Spin image, HoG, RIFT, GLOH

Slide credit: Andrew Ng
Learning features from data



Low-level sensing → Pre-processing → Feature extraction → Feature selection → Inference: prediction, recognition

Feature Learning: instead of designing features, let's design feature learners
Learning features from data via sparse coding




Low-level sensing → Pre-processing → Feature extraction → Feature selection → Inference: prediction, recognition

Sparse coding offers an effective building block to learn useful features
Outline

1. Sparse coding for image classification
2. Understanding sparse coding
3. Hierarchical sparse coding
4. Other topics: e.g. structured model, scale-up, discriminative training
5. Summary
“BoW representation + SPM” Paradigm - I




Bag-of-visual-words representation (BoW) based on VQ coding

Figure credit: Fei-Fei Li
“BoW representation + SPM” Paradigm - II




Spatial pyramid matching: pooling at different scales and locations

Figure credit: Svetlana Lazebnik
Image Classification using “BoW + SPM”

Dense SIFT → VQ Coding → Spatial Pooling → Classifier → Image Classification
The Architecture of “Coding + Pooling”




Coding → Pooling → Coding → Pooling

• e.g., convolutional neural net, HMAX, BoW, …
“BoW+SPM” has two coding+pooling layers




Local Gradients → Pooling → VQ Coding → Average Pooling (obtain histogram) → SVM
(the first two stages: e.g., SIFT, HOG)

SIFT feature itself follows a coding+pooling operation
Develop better coding methods




Better Coding → Better Pooling → Better Coding → Better Pooling → Better Classifier

- Coding: a nonlinear mapping of data into another feature space
- Better coding methods: sparse coding, RBMs, auto-encoders
What is sparse coding

Sparse coding (Olshausen & Field, 1996) was originally developed to explain early visual processing in the brain (edge detection).

Training: given a set of random patches x, learn a dictionary of bases [Φ1, Φ2, …].

Coding: for a data vector x, solve a LASSO problem to find the sparse coefficient vector a.
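Written out (the slide showed the objective only as an image; standard notation is assumed here, with λ weighting the L1 penalty and the bases norm-constrained to avoid degenerate solutions):

    \min_{\Phi,\, a}\; \sum_{i=1}^{m} \Big\| x_i - \sum_{j=1}^{k} a_{i,j}\, \Phi_j \Big\|_2^2
    \;+\; \lambda \sum_{i=1}^{m} \|a_i\|_1
    \qquad \text{s.t. } \|\Phi_j\|_2 \le 1 \;\; \forall j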
Sparse coding: training time

Input: images x1, x2, …, xm (each in R^d)
Learn: a dictionary of bases φ1, φ2, …, φk (also in R^d)

Alternating optimization:
1. Fix the dictionary φ1, φ2, …, φk and optimize the activations a (a standard LASSO problem).
2. Fix the activations a and optimize the dictionary φ1, φ2, …, φk (a convex QP problem).
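A minimal sketch of this alternating scheme (not from the tutorial; it relies on scikit-learn's DictionaryLearning, which alternates a LASSO coding step with a dictionary update, and assumes patches have been flattened into rows of X):

    import numpy as np
    from sklearn.decomposition import DictionaryLearning

    # X: m patches, each flattened to a d-dim vector (e.g., 8x8 patches -> d = 64)
    rng = np.random.default_rng(0)
    X = rng.standard_normal((1000, 64))

    # k = 128 bases; alpha weights the L1 (LASSO) penalty on the codes a
    dico = DictionaryLearning(n_components=128, alpha=1.0, max_iter=50,
                              transform_algorithm="lasso_lars", random_state=0)
    A = dico.fit_transform(X)      # sparse codes a_i, one row per patch
    Phi = dico.components_         # learned dictionary [phi_1, ..., phi_k]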
Sparse coding: testing time

Input: a novel image patch x (in R^d) and the previously learned φi's
Output: representation [ai,1, ai,2, …, ai,k] of image patch xi

    xi ≈ 0.8 * [basis] + 0.3 * [basis] + 0.5 * [basis]

Represent xi as: ai = [0, 0, …, 0, 0.8, 0, …, 0, 0.3, 0, …, 0, 0.5, …]
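At test time only the LASSO step remains. A hedged sketch, reusing Phi and rng from the training sketch above (sparse_encode solves the LASSO with the dictionary held fixed):

    from sklearn.decomposition import sparse_encode

    x_new = rng.standard_normal((1, 64))            # a novel patch
    a_new = sparse_encode(x_new, Phi, algorithm="lasso_lars", alpha=1.0)
    print(np.flatnonzero(a_new))                    # indices of the few active bases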
Sparse coding illustration

[Figure: natural image patches and the learned bases (φ1, …, φ64), which look like "edges"]

Test example:

    x ≈ 0.8 * φ36 + 0.3 * φ42 + 0.5 * φ63

    [a1, …, a64] = [0, …, 0, 0.8, 0, …, 0, 0.3, 0, …, 0, 0.5, 0]    (feature representation)

Compact & easily interpretable.
Slide credit: Andrew Ng
Self-taught Learning   [Raina, Lee, Battle, Packer & Ng, ICML 07]




[Figure: labeled examples of "Motorcycles" vs. "Not motorcycles", plus a large pool of unlabeled images. Testing: what is this?]

Slide credit: Andrew Ng
Classification Result on Caltech 101



9K images, 101 classes

64%   SIFT VQ + Nonlinear SVM
~50%  Pixel Sparse Coding + Linear SVM
Sparse Coding on SIFT – ScSPM algorithm
                                      [Yang, Yu, Gong & Huang, CVPR09]




Local Gradients → Pooling → Sparse Coding → Max Pooling → Linear Classifier
(the first two stages: e.g., SIFT, HOG)
Sparse Coding on SIFT – the ScSPM algorithm
                          [Yang, Yu, Gong & Huang, CVPR09]



Caltech-101

64%  SIFT VQ + Nonlinear SVM
73%  SIFT Sparse Coding + Linear SVM (ScSPM)
Summary: Accuracies on Caltech 101

Dense SIFT → VQ Coding → Spatial Pooling → Nonlinear SVM:        64%
Dense SIFT → Sparse Coding → Spatial Max Pooling → Linear SVM:   73%
Raw pixels → Sparse Coding → Spatial Max Pooling → Linear SVM:   50%

Key message:
- Sparse coding is a better building block
- Deep models are preferred
Outline

1. Sparse coding for image classification
2. Understanding sparse coding
3. Hierarchical sparse coding
4. Other topics: e.g. structured model, scale-up, discriminative training
5. Summary
Outline

1. Sparse coding for image classification
2. Understanding sparse coding
   – Connections to RBMs, autoencoders, …
   – Sparse activations vs. sparse models, …
   – Sparsity vs. locality
   – Local sparse coding methods
3. Hierarchical sparse coding
4. Other topics: e.g. structured model, scale-up, discriminative training
5. Summary
Classical sparse coding




- a is sparse
- a often has a higher dimension than x
- the activation a = f(x) is a nonlinear, implicit function of x
- the reconstruction x' = g(a) is linear & explicit

[Diagram: x → f(x) (encoding) → a → g(a) (decoding) → x']
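To make the implicit/explicit distinction concrete (not spelled out on the slide; standard notation with dictionary Φ and sparsity weight λ):

    f(x) \;=\; \arg\min_{a}\; \tfrac{1}{2}\|x - \Phi a\|_2^2 + \lambda \|a\|_1
    \qquad\qquad
    g(a) \;=\; \Phi a

f is defined only through an optimization problem (implicit, nonlinear in x), while g is an explicit linear map.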
RBM & autoencoders


- also involve an activation and a reconstruction
- but have an explicit f(x)
- do not necessarily enforce sparsity on a
- but if sparsity is put on a, results often improve [e.g. sparse RBM, Lee et al. NIPS08]

[Diagram: x → f(x) (encoding) → a → g(a) (decoding) → x']
Sparse coding: A broader view

Any feature mapping from x to a, i.e. a = f(x), where
- a is sparse (and often of higher dimension than x)
- f(x) is nonlinear
- there is a reconstruction x' = g(a) such that x' ≈ x

[Diagram: x → f(x) → a → g(a) → x']

Therefore, sparse RBMs, sparse auto-encoders, and even VQ can be viewed as forms of sparse coding.
Outline
1. Sparse coding for image classification
2. Understanding sparse coding
   – Connections to RBMs, autoencoders, …
   – Sparse activations vs. sparse models, …
   – Sparsity vs. locality
   – Local sparse coding methods
3. Hierarchical sparse coding
4. Other topics: e.g. structured model, scale-up, discriminative training
5. Summary
Sparse activations vs. sparse models

For a general function learning problem a = f(x):

1. Sparse model: f(x)'s parameters are sparse
   - example: LASSO f(x) = <w, x>, where w is sparse
   - the goal is feature selection: all data points select a common subset of features
   - a hot topic in machine learning

2. Sparse activations: f(x)'s outputs are sparse
   - example: sparse coding a = f(x), where a is sparse
   - the goal is feature learning: different data points activate different feature subsets
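The contrast can be seen in a few lines (a toy sketch, not from the tutorial; random data and a hypothetical dictionary D):

    import numpy as np
    from sklearn.linear_model import Lasso
    from sklearn.decomposition import sparse_encode

    rng = np.random.default_rng(0)
    X, y = rng.standard_normal((200, 6)), rng.standard_normal(200)

    # Sparse model: the learned weight vector w is sparse (feature selection);
    # the same selected dimensions of x are used for every data point.
    w = Lasso(alpha=0.5).fit(X, y).coef_

    # Sparse activations: each output code a is sparse (feature learning);
    # different inputs activate different dictionary atoms.
    D = rng.standard_normal((12, 6))                # toy dictionary, rows = bases
    A = sparse_encode(X[:3], D, algorithm="lasso_lars", alpha=0.5)
    print(np.flatnonzero(w), [np.flatnonzero(a) for a in A])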
Example of sparse models

f(x) = <w, x>, where w = [0, 0.2, 0, 0.1, 0, 0]

[Figure: dense inputs x1, x2, …, xm all map to outputs with the same sparsity pattern [0 | 0 | 0 0]]

• Because the 2nd and 4th elements of w are non-zero, these are the two selected features in every x
• Globally-aligned sparse representation
Example of sparse activations (sparse coding)

[Figure: inputs x1, x2, …, xm map to codes with different sparsity patterns, e.g.
 a1 = [0 | | 0 0 … 0], a2 = [| | 0 0 0 … 0], a3 = [| 0 | 0 0 … 0], am = [0 0 0 | | … 0]]

• Different x have different dimensions activated
• Locally-shared sparse representation: similar x's tend to have similar non-zero dimensions
Example of sparse activations (sparse coding)

[Figure: data x1, x2, x3, …, xm lying on a manifold; nearby points share activated dimensions in their codes a1, …, am]

• Another example: preserving manifold structure
• More informative in highlighting richer data structures, i.e. clusters, manifolds
Outline

1. Sparse coding for image classification
2. Understanding sparse coding
   – Connections to RBMs, autoencoders, …
   – Sparse activations vs. sparse models, …
   – Sparsity vs. locality
   – Local sparse coding methods
3. Hierarchical sparse coding
4. Other topics: e.g. structured model, scale-up, discriminative training
5. Summary
Sparsity vs. Locality

• Intuition: similar data should get similar activated features

• Local sparse coding:
  • data in the same neighborhood tend to have shared activated features;
  • data in different neighborhoods tend to have different features activated.

[Figure: sparse coding vs. local sparse coding]
Sparse coding is not always local: example




Case 1: independent subspaces
• Each basis is a "direction"
• Sparsity: each datum is a linear combination of only a few bases.

Case 2: data manifold (or clusters)
• Each basis is an "anchor point"
• Sparsity: each datum is a linear combination of neighboring anchors.
• Here, sparsity is caused by locality.
Two approaches to local sparse coding




Approach 1: coding via local anchor points
Approach 2: coding via local subspaces
Classical sparse coding is empirically local




When it works best for classification, the codes are often found to be local.

It is preferable to let similar data have similar non-zero dimensions in their codes.
MNIST Experiment: Classification using Sparse Coding

Try different values of the sparsity parameter λ

• 60K training images, 10K for test
• Let k = 512
• Linear SVM on the sparse codes
MNIST Experiment: Lambda = 0.0005



Each basis is like a part or direction.
MNIST Experiment: Lambda = 0.005



Again, each basis is like a part or direction.
MNIST Experiment: Lambda = 0.05



Now, each basis is more like a digit!
MNIST Experiment: Lambda = 0.5



Like VQ now!
Geometric view of sparse coding

Error: 4.54%        Error: 3.75%        Error: 2.64%

• When sparse coding achieves the best classification accuracy, the learned bases are like digits: each basis has a clear local class association.
Distribution of coefficients (MNIST)




Neighboring bases tend to get nonzero coefficients.
Distribution of coefficients (SIFT, Caltech101)

Similar observation here!
Outline

1. Sparse coding for image classification
2. Understanding sparse coding
   – Connections to RBMs, autoencoders, …
   – Sparse activations vs. sparse models, …
   – Sparsity vs. locality
   – Local sparse coding methods
3. Hierarchical sparse coding
4. Other topics: e.g. structured model, scale-up, discriminative training
5. Summary
Why develop local sparse coding methods


Since locality is a preferred property in sparse coding, let's ensure locality explicitly.

The new algorithms can be well justified theoretically.

The new algorithms have computational advantages over classical sparse coding.
Two approaches to local sparse coding




Approach 1: Local coordinate coding
- Nonlinear learning using local coordinate coding. Kai Yu, Tong Zhang, and Yihong Gong. NIPS 2009.
- Locality-constrained linear coding for image classification. Jinjun Wang, Jianchao Yang, Kai Yu, Fengjun Lv, Thomas Huang. CVPR 2010.

Approach 2: Super-vector coding
- Image Classification using Super-Vector Coding of Local Image Descriptors. Xi Zhou, Kai Yu, Tong Zhang, and Thomas Huang. ECCV 2010.
- Large-scale Image Classification: Fast Feature Extraction and SVM Training. Yuanqing Lin, Fengjun Lv, Shenghuo Zhu, Ming Yang, Timothee Cour, Kai Yu, LiangLiang Cao, Thomas Huang. CVPR 2011.
A function approximation framework to
understand coding


• Assumption: image patches x follow a nonlinear manifold, and f(x) is smooth on the manifold.

• Coding: a nonlinear mapping x → a; typically a is high-dimensional & sparse.

• Nonlinear learning: f(x) = <w, a>
Local sparse coding

Approach 1: Local coordinate coding
Function Interpolation based on LCC
Yu, Zhang & Gong, NIPS 09



[Figure: locally linear interpolation; data points and bases on a curved manifold]
Local Coordinate Coding (LCC):
connecting coding to nonlinear function learning


If f(x) is (α, β)-Lipschitz smooth:

    function approximation error  ≤  coding error  +  locality term

The key message: a good coding scheme should
1. have a small coding error,
2. and also be sufficiently local.
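The inequality on the slide was an image; a hedged reconstruction of the LCC lemma (Yu, Zhang & Gong, NIPS 09), writing γ_v(x) for the coefficient on basis v and assuming the coefficients sum to one, is:

    \Big| f(x) - \sum_{v} \gamma_v(x)\, f(v) \Big|
    \;\le\;
    \underbrace{\alpha \,\Big\| x - \sum_{v} \gamma_v(x)\, v \Big\|}_{\text{coding error}}
    \;+\;
    \underbrace{\beta \sum_{v} |\gamma_v(x)|\, \Big\| v - \sum_{u} \gamma_u(x)\, u \Big\|^2}_{\text{locality term}}

The right-hand side is what a good coding scheme minimizes: reconstruct x well, using only bases that stay close to the reconstruction.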
Local Coordinate Coding (LCC) Yu, Zhang & Gong, NIPS 09
                                  Wang, Yang, Yu, Lv, Huang CVPR 10


• Dictionary learning: k-means (or hierarchical k-means)

• Coding for x, to obtain its sparse representation a:
  Step 1 – ensure locality: find the K nearest bases
  Step 2 – ensure low coding error: fit x with the selected bases (a small least-squares problem)
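A minimal NumPy sketch of this two-step coding (not from the slides; it uses the closed-form, sum-to-one variant of Wang et al.'s LLC, and the regularizer beta is an assumed default):

    import numpy as np

    def llc_code(x, dictionary, K=5, beta=1e-4):
        """LLC-style local coding for one sample x (d,) given a dictionary (k, d)."""
        # Step 1 - ensure locality: keep only the K nearest bases
        idx = np.argsort(np.linalg.norm(dictionary - x, axis=1))[:K]

        # Step 2 - ensure low coding error: constrained least squares on those
        # bases (closed-form solution of min ||x - sum_j w_j b_j||^2, sum(w) = 1)
        z = dictionary[idx] - x                 # shift selected bases to origin at x
        C = z @ z.T                             # K x K covariance of the shifts
        C += beta * np.trace(C) * np.eye(K)     # regularize for numerical stability
        w = np.linalg.solve(C, np.ones(K))
        w /= w.sum()

        a = np.zeros(len(dictionary))           # embed into the full sparse code
        a[idx] = w
        return a

    # toy usage: 256 k-means-like bases in R^16
    rng = np.random.default_rng(0)
    D, x = rng.standard_normal((256, 16)), rng.standard_normal(16)
    print(np.flatnonzero(llc_code(x, D)))       # exactly K active dimensions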
Local sparse coding

Approach 2: Super-vector coding
Function approximation via super-vector coding: piecewise local linear (first-order)
                           Zhou, Yu, Zhang, and Huang, ECCV 10

[Figure: local tangent approximation; data points and cluster centers]
Super-vector coding: Justification



If f(x) is β-Lipschitz smooth, then (local tangent approximation):

    function approximation error  ≤  quantization error
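The omitted inequality can be reconstructed (hedged; constants may differ from the slide): with m(x) the cluster center nearest to x, a β-Lipschitz-smooth f satisfies the first-order Taylor bound

    \Big| f(x) - \big[ f(m(x)) + \nabla f(m(x))^\top (x - m(x)) \big] \Big|
    \;\le\; \frac{\beta}{2}\, \| x - m(x) \|^2

so the approximation error of the piecewise-linear model is controlled by the quantization error of the codebook.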
Super-Vector Coding (SVC)
                                 Zhou, Yu, Zhang, and Huang, ECCV 10


• Dictionary learning: k-means (or hierarchical k-means)

• Coding for x, to obtain its sparse representation a:
  Step 1 – find the nearest basis of x, obtain its VQ coding, e.g. [0, 0, 1, 0, …]
  Step 2 – form the super-vector coding, e.g. [0, 0, 1, 0, …, 0, 0, (x − m3), 0, …]
           (zero-order part + local tangent part)
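A minimal sketch of the two steps (not from the slides; the constant s on the zero-order part is an assumed free parameter of the representation):

    import numpy as np

    def super_vector(x, centers, s=1.0):
        """Super-vector code of one sample x (d,) given k-means centers (k, d)."""
        k, d = centers.shape
        j = int(np.argmin(np.linalg.norm(centers - x, axis=1)))   # Step 1: VQ

        sv = np.zeros(k * (d + 1))                   # one (1 + d)-block per center
        block = j * (d + 1)
        sv[block] = s                                # zero-order (VQ indicator) part
        sv[block + 1 : block + 1 + d] = x - centers[j]   # local tangent part
        return sv

    rng = np.random.default_rng(0)
    centers, x = rng.standard_normal((8, 4)), rng.standard_normal(4)
    print(super_vector(x, centers).shape)            # (8 * 5,) = (40,)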
Results on ImageNet Challenge Dataset

ImageNet Challenge: 1.4 million images, 1000 classes

40%  VQ + Intersection Kernel
62%  LCC + Linear SVM
65%  SVC + Linear SVM
Summary: local sparse coding




Approach 1: Local coordinate coding
Approach 2: Super-vector coding

- Sparsity achieved by explicitly ensuring locality
- Sound theoretical justifications
- Much simpler to implement and compute
- Strong empirical success
Outline

1. Sparse coding for image classification
2. Understanding sparse coding
3. Hierarchical sparse coding
4. Other topics: e.g. structured model, scale-up, discriminative training
5. Summary
Hierarchical sparse coding
                                                 Yu, Lin, & Lafferty, CVPR 11
                Matthew D. Zeiler, Graham W. Taylor, and Rob Fergus, ICCV 11


Learning from unlabeled data:

Sparse Coding → Pooling → Sparse Coding → Pooling
A two-layer sparse coding formulation
                          Yu, Lin, & Lafferty, CVPR 11

    \min_{W,\,\alpha}\; L(W, \alpha) \;+\; \lambda \|W\|_1 \;+\; \gamma \|\alpha\|_1
    \qquad \text{s.t. } \alpha \ge 0

    L(W, \alpha) \;=\; \frac{1}{n} \sum_{i=1}^{n}
    \Big[ \frac{1}{2} \| x_i - B w_i \|^2 + \frac{\lambda}{2}\, w_i^\top \Omega(\alpha)\, w_i \Big],
    \qquad
    \Omega(\alpha) \;=\; \Big( \sum_{k=1}^{q} \alpha_k \Phi_k \Big)^{-1}

where W = (w_1\; w_2\; \cdots\; w_n) \in R^{p \times n} collects the first-layer codes and \alpha \in R^q is the second-layer code.
MNIST Results - classification
                                            Yu, Lin, & Lafferty, CVPR 11




HSC vs. CNN: HSC provides even better performance than CNN;
more remarkably, HSC learns its features in an unsupervised manner!
MNIST results -- learned dictionary
                                         Yu, Lin, & Lafferty, CVPR 11


A hidden unit in the second layer is connected to a group of units in the first layer: this yields invariance to translation, rotation, and deformation.
Caltech101 results - classification
                                           Yu, Lin, & Lafferty, CVPR 11




Learned descriptor: performs slightly better than SIFT + SC
Adaptive Deconvolutional Networks for Mid
and High Level Feature Learning
                    Matthew D. Zeiler, Graham W. Taylor, and Rob Fergus, ICCV 2011


• Hierarchical convolutional sparse coding.
• Trained with respect to the image from all layers (L1-L4).
• Pooling both spatially and amongst features.
• Learns invariant mid-level features.

[Figure: Image → L1 Features → L2 Feature Maps → L3 Feature Maps → L4 Feature Maps, with feature/group selection between layers]
Outline

1. Sparse coding for image classification
2. Understanding sparse coding
3. Hierarchical sparse coding
4. Other topics: e.g. structured model, scale-up, discriminative training
5. Summary
Other topics of sparse coding

Structured sparse coding, for example:
– Group sparse coding [Bengio et al., NIPS 09]
– Learning hierarchical dictionaries [Jenatton, Mairal et al., 2010]

Scaling up sparse coding, for example:
– Feature-sign algorithm [Lee et al., NIPS 07]
– Feed-forward approximation [Gregor & LeCun, ICML 10]
– Online dictionary learning [Mairal et al., ICML 09]

Discriminative training, for example:
– Backprop algorithms [Bradley & Bagnell, NIPS 08; Yang et al., CVPR 10]
– Supervised dictionary training [Mairal et al., NIPS 08]
Summary of Sparse Coding

Sparse coding is an effective way to do (unsupervised) feature learning.

It is a building block for deep models.

Sparse coding and its local variants (LCC, SVC) have pushed the boundary of accuracies on Caltech101, PASCAL VOC, ImageNet, …

Challenge: discriminative training is not straightforward.
Weitere ähnliche Inhalte

Ähnlich wie P02 sparse coding cvpr2012 deep learning methods for vision

P04 restricted boltzmann machines cvpr2012 deep learning methods for vision
P04 restricted boltzmann machines cvpr2012 deep learning methods for visionP04 restricted boltzmann machines cvpr2012 deep learning methods for vision
P04 restricted boltzmann machines cvpr2012 deep learning methods for vision
zukun
 
HUMAN FACE IDENTIFICATION
HUMAN FACE IDENTIFICATION HUMAN FACE IDENTIFICATION
HUMAN FACE IDENTIFICATION
bhupesh lahare
 
Object classification in far field and low resolution videos
Object classification in far field and low resolution videosObject classification in far field and low resolution videos
Object classification in far field and low resolution videos
Insaf Setitra
 
MoDisco EclipseCon2010
MoDisco EclipseCon2010MoDisco EclipseCon2010
MoDisco EclipseCon2010
fmadiot
 
A comprehensive formal verification solution for ARM based SOC design
A comprehensive formal verification solution for ARM based SOC design A comprehensive formal verification solution for ARM based SOC design
A comprehensive formal verification solution for ARM based SOC design
chiportal
 
2008 brokerage 04 smart vision system [compatibility mode]
2008 brokerage 04 smart vision system [compatibility mode]2008 brokerage 04 smart vision system [compatibility mode]
2008 brokerage 04 smart vision system [compatibility mode]
imec.archive
 
2008 brokerage 04 smart vision system [compatibility mode]
2008 brokerage 04 smart vision system [compatibility mode]2008 brokerage 04 smart vision system [compatibility mode]
2008 brokerage 04 smart vision system [compatibility mode]
imec.archive
 
Reverse engineering
Reverse engineeringReverse engineering
Reverse engineering
Saswat Padhi
 
MongoDB for Java Devs with Spring Data - MongoPhilly 2011
MongoDB for Java Devs with Spring Data - MongoPhilly 2011MongoDB for Java Devs with Spring Data - MongoPhilly 2011
MongoDB for Java Devs with Spring Data - MongoPhilly 2011
MongoDB
 
Upgrading to System Verilog for FPGA Designs, Srinivasan Venkataramanan, CVC
Upgrading to System Verilog for FPGA Designs, Srinivasan Venkataramanan, CVCUpgrading to System Verilog for FPGA Designs, Srinivasan Venkataramanan, CVC
Upgrading to System Verilog for FPGA Designs, Srinivasan Venkataramanan, CVC
FPGA Central
 

Ähnlich wie P02 sparse coding cvpr2012 deep learning methods for vision (20)

Vb
VbVb
Vb
 
P04 restricted boltzmann machines cvpr2012 deep learning methods for vision
P04 restricted boltzmann machines cvpr2012 deep learning methods for visionP04 restricted boltzmann machines cvpr2012 deep learning methods for vision
P04 restricted boltzmann machines cvpr2012 deep learning methods for vision
 
HUMAN FACE IDENTIFICATION
HUMAN FACE IDENTIFICATION HUMAN FACE IDENTIFICATION
HUMAN FACE IDENTIFICATION
 
Oscon miller 2011
Oscon miller 2011Oscon miller 2011
Oscon miller 2011
 
Object classification in far field and low resolution videos
Object classification in far field and low resolution videosObject classification in far field and low resolution videos
Object classification in far field and low resolution videos
 
Av Recognition
Av RecognitionAv Recognition
Av Recognition
 
MoDisco EclipseCon2010
MoDisco EclipseCon2010MoDisco EclipseCon2010
MoDisco EclipseCon2010
 
A comprehensive formal verification solution for ARM based SOC design
A comprehensive formal verification solution for ARM based SOC design A comprehensive formal verification solution for ARM based SOC design
A comprehensive formal verification solution for ARM based SOC design
 
2008 brokerage 04 smart vision system [compatibility mode]
2008 brokerage 04 smart vision system [compatibility mode]2008 brokerage 04 smart vision system [compatibility mode]
2008 brokerage 04 smart vision system [compatibility mode]
 
2008 brokerage 04 smart vision system [compatibility mode]
2008 brokerage 04 smart vision system [compatibility mode]2008 brokerage 04 smart vision system [compatibility mode]
2008 brokerage 04 smart vision system [compatibility mode]
 
Ocr color
Ocr colorOcr color
Ocr color
 
Reverse engineering
Reverse engineeringReverse engineering
Reverse engineering
 
MongoDB for Java Developers with Spring Data
MongoDB for Java Developers with Spring DataMongoDB for Java Developers with Spring Data
MongoDB for Java Developers with Spring Data
 
Workshop OSGI PPT
Workshop OSGI PPTWorkshop OSGI PPT
Workshop OSGI PPT
 
Extreme Spatio-Temporal Data Analysis
Extreme Spatio-Temporal Data AnalysisExtreme Spatio-Temporal Data Analysis
Extreme Spatio-Temporal Data Analysis
 
Information from pixels
Information from pixelsInformation from pixels
Information from pixels
 
MongoDB for Java Devs with Spring Data - MongoPhilly 2011
MongoDB for Java Devs with Spring Data - MongoPhilly 2011MongoDB for Java Devs with Spring Data - MongoPhilly 2011
MongoDB for Java Devs with Spring Data - MongoPhilly 2011
 
BlackHat EU 2012 - Zhenhua Liu - Breeding Sandworms: How To Fuzz Your Way Out...
BlackHat EU 2012 - Zhenhua Liu - Breeding Sandworms: How To Fuzz Your Way Out...BlackHat EU 2012 - Zhenhua Liu - Breeding Sandworms: How To Fuzz Your Way Out...
BlackHat EU 2012 - Zhenhua Liu - Breeding Sandworms: How To Fuzz Your Way Out...
 
Upgrading to System Verilog for FPGA Designs, Srinivasan Venkataramanan, CVC
Upgrading to System Verilog for FPGA Designs, Srinivasan Venkataramanan, CVCUpgrading to System Verilog for FPGA Designs, Srinivasan Venkataramanan, CVC
Upgrading to System Verilog for FPGA Designs, Srinivasan Venkataramanan, CVC
 
Design and Concepts of Android Graphics
Design and Concepts of Android GraphicsDesign and Concepts of Android Graphics
Design and Concepts of Android Graphics
 

Mehr von zukun

My lyn tutorial 2009
My lyn tutorial 2009My lyn tutorial 2009
My lyn tutorial 2009
zukun
 
ETHZ CV2012: Tutorial openCV
ETHZ CV2012: Tutorial openCVETHZ CV2012: Tutorial openCV
ETHZ CV2012: Tutorial openCV
zukun
 
ETHZ CV2012: Information
ETHZ CV2012: InformationETHZ CV2012: Information
ETHZ CV2012: Information
zukun
 
Siwei lyu: natural image statistics
Siwei lyu: natural image statisticsSiwei lyu: natural image statistics
Siwei lyu: natural image statistics
zukun
 
Lecture9 camera calibration
Lecture9 camera calibrationLecture9 camera calibration
Lecture9 camera calibration
zukun
 
Brunelli 2008: template matching techniques in computer vision
Brunelli 2008: template matching techniques in computer visionBrunelli 2008: template matching techniques in computer vision
Brunelli 2008: template matching techniques in computer vision
zukun
 
Modern features-part-4-evaluation
Modern features-part-4-evaluationModern features-part-4-evaluation
Modern features-part-4-evaluation
zukun
 
Modern features-part-3-software
Modern features-part-3-softwareModern features-part-3-software
Modern features-part-3-software
zukun
 
Modern features-part-2-descriptors
Modern features-part-2-descriptorsModern features-part-2-descriptors
Modern features-part-2-descriptors
zukun
 
Modern features-part-1-detectors
Modern features-part-1-detectorsModern features-part-1-detectors
Modern features-part-1-detectors
zukun
 
Modern features-part-0-intro
Modern features-part-0-introModern features-part-0-intro
Modern features-part-0-intro
zukun
 
Lecture 02 internet video search
Lecture 02 internet video searchLecture 02 internet video search
Lecture 02 internet video search
zukun
 
Lecture 01 internet video search
Lecture 01 internet video searchLecture 01 internet video search
Lecture 01 internet video search
zukun
 
Lecture 03 internet video search
Lecture 03 internet video searchLecture 03 internet video search
Lecture 03 internet video search
zukun
 
Icml2012 tutorial representation_learning
Icml2012 tutorial representation_learningIcml2012 tutorial representation_learning
Icml2012 tutorial representation_learning
zukun
 
Advances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer visionAdvances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer vision
zukun
 
Gephi tutorial: quick start
Gephi tutorial: quick startGephi tutorial: quick start
Gephi tutorial: quick start
zukun
 
EM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysisEM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysis
zukun
 
Object recognition with pictorial structures
Object recognition with pictorial structuresObject recognition with pictorial structures
Object recognition with pictorial structures
zukun
 
Iccv2011 learning spatiotemporal graphs of human activities
Iccv2011 learning spatiotemporal graphs of human activities Iccv2011 learning spatiotemporal graphs of human activities
Iccv2011 learning spatiotemporal graphs of human activities
zukun
 

Mehr von zukun (20)

My lyn tutorial 2009
My lyn tutorial 2009My lyn tutorial 2009
My lyn tutorial 2009
 
ETHZ CV2012: Tutorial openCV
ETHZ CV2012: Tutorial openCVETHZ CV2012: Tutorial openCV
ETHZ CV2012: Tutorial openCV
 
ETHZ CV2012: Information
ETHZ CV2012: InformationETHZ CV2012: Information
ETHZ CV2012: Information
 
Siwei lyu: natural image statistics
Siwei lyu: natural image statisticsSiwei lyu: natural image statistics
Siwei lyu: natural image statistics
 
Lecture9 camera calibration
Lecture9 camera calibrationLecture9 camera calibration
Lecture9 camera calibration
 
Brunelli 2008: template matching techniques in computer vision
Brunelli 2008: template matching techniques in computer visionBrunelli 2008: template matching techniques in computer vision
Brunelli 2008: template matching techniques in computer vision
 
Modern features-part-4-evaluation
Modern features-part-4-evaluationModern features-part-4-evaluation
Modern features-part-4-evaluation
 
Modern features-part-3-software
Modern features-part-3-softwareModern features-part-3-software
Modern features-part-3-software
 
Modern features-part-2-descriptors
Modern features-part-2-descriptorsModern features-part-2-descriptors
Modern features-part-2-descriptors
 
Modern features-part-1-detectors
Modern features-part-1-detectorsModern features-part-1-detectors
Modern features-part-1-detectors
 
Modern features-part-0-intro
Modern features-part-0-introModern features-part-0-intro
Modern features-part-0-intro
 
Lecture 02 internet video search
Lecture 02 internet video searchLecture 02 internet video search
Lecture 02 internet video search
 
Lecture 01 internet video search
Lecture 01 internet video searchLecture 01 internet video search
Lecture 01 internet video search
 
Lecture 03 internet video search
Lecture 03 internet video searchLecture 03 internet video search
Lecture 03 internet video search
 
Icml2012 tutorial representation_learning
Icml2012 tutorial representation_learningIcml2012 tutorial representation_learning
Icml2012 tutorial representation_learning
 
Advances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer visionAdvances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer vision
 
Gephi tutorial: quick start
Gephi tutorial: quick startGephi tutorial: quick start
Gephi tutorial: quick start
 
EM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysisEM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysis
 
Object recognition with pictorial structures
Object recognition with pictorial structuresObject recognition with pictorial structures
Object recognition with pictorial structures
 
Iccv2011 learning spatiotemporal graphs of human activities
Iccv2011 learning spatiotemporal graphs of human activities Iccv2011 learning spatiotemporal graphs of human activities
Iccv2011 learning spatiotemporal graphs of human activities
 

Kürzlich hochgeladen

The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
heathfieldcps1
 

Kürzlich hochgeladen (20)

The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
 
Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
Dyslexia AI Workshop for Slideshare.pptx
Dyslexia AI Workshop for Slideshare.pptxDyslexia AI Workshop for Slideshare.pptx
Dyslexia AI Workshop for Slideshare.pptx
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdf
 
psychiatric nursing HISTORY COLLECTION .docx
psychiatric  nursing HISTORY  COLLECTION  .docxpsychiatric  nursing HISTORY  COLLECTION  .docx
psychiatric nursing HISTORY COLLECTION .docx
 
PROCESS RECORDING FORMAT.docx
PROCESS      RECORDING        FORMAT.docxPROCESS      RECORDING        FORMAT.docx
PROCESS RECORDING FORMAT.docx
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan Fellows
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibit
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
Magic bus Group work1and 2 (Team 3).pptx
Magic bus Group work1and 2 (Team 3).pptxMagic bus Group work1and 2 (Team 3).pptx
Magic bus Group work1and 2 (Team 3).pptx
 

P02 sparse coding cvpr2012 deep learning methods for vision

  • 1. CVPR12 Tutorial on Deep Learning Sparse Coding Kai Yu yukai@baidu.com Department of Multimedia, Baidu
  • 2. Relentless research on visual recognition Caltech 101 80 Million Tiny Images ImageNet PASCAL VOC 08/22/12 2
  • 3. The pipeline of machine visual perception Most Efforts in Machine Learning Inference: Low-level Pre- Feature Feature prediction, sensing processing extract. selection recognition • Most critical for accuracy • Account for most of the computation for testing • Most time-consuming in development cycle • Often hand-craft in practice 08/22/12 3
  • 4. Computer vision features SIFT Spin image HoG RIFT GLOH Slide Credit: Andrew Ng
  • 5. Learning features from data Machine Learning Inference: Low-level Pre- Feature Feature prediction, sensing processing extract. selection recognition Feature Learning: instead of design features, let’s design feature learners 08/22/12 5
  • 6. Learning features from data via sparse coding Inference: Low-level Pre- Feature Feature prediction, sensing processing extract. selection recognition Sparse coding offers an effective building block to learn useful features 08/22/12 6
  • 7. Outline 1. Sparse coding for image classification 2. Understanding sparse coding 3. Hierarchical sparse coding 4. Other topics: e.g. structured model, scale-up, discriminative training 5. Summary 08/22/12 7
  • 8. “BoW representation + SPM” Paradigm - I Bag-of-visual-words representation (BoW) based on VQ coding 08/22/12 Figure credit: Fei-Fei Li 8
  • 9. “BoW representation + SPM” Paradigm - II Spatial pyramid matching: pooling in different scales and locations 08/22/12 Figure credit: Svetlana Lazebnik 9
  • 10. Image Classification using “BoW + SPM” Image Classification Spatial Pooling Dense SIFT VQ Coding Classifier 08/22/12 10
  • 11. The Architecture of “Coding + Pooling” Coding Pooling Coding Pooling • •e.g., e.g.,convolutional neural net, HMAX, BoW, … convolutional neural net, HMAX, BoW, … 11
  • 12. “BoW+SPM” has two coding+pooling layers Local Gradients Pooling VQ Coding Average Pooling SVM (obtain histogram) e.g, SIFT, HOG SIFT feature itself follows a coding+pooling operation SIFT feature itself follows a coding+pooling operation 12
  • 13. Develop better coding methods Better Coding Better Pooling Better Coding Better Pooling Better Classifier - Coding: nonlinear mapping data into another feature space - Better coding methods: sparse coding, RBMs, auto-encoders 13
  • 14. What is sparse coding Sparse coding (Olshausen & Field,1996). Originally developed to explain early visual processing in the brain (edge detection). Training: given a set of random patches x, learning a dictionary of bases [Φ1, Φ2, …] Coding: for data vector x, solve LASSO to find the sparse coefficient vector a 08/22/12 14
  • 15. Sparse coding: training time Input: Images x1, x2, …, xm (eachinRd) Learn: Dictionary of bases φ1, φ2, …, φk (alsoRd). Alternating optimization: 1.Fix dictionary φ1, φ2, …, φk , optimize a (a standard LASSO problem ) 2.Fix activations a, optimize dictionary φ1, φ2, …, φk (a convex QP problem)
  • 16. Sparse coding: testing time Input: Novel image patch x (inRd) and previously learned φi’s Output: Representation [ai,1, ai,2, …, ai,κ] of image patch xi. ≈ 0.8 * + 0.3 * + 0.5 * Represent xi as: ai = [0, 0, …, 0, 0.8, 0, …, 0, 0.3, 0, …, 0, 0.5, …]
  • 17. Sparse coding illustration 50 Natural Images Learned bases (φ1 , …, φ64): “Edges” 1 00 1 50 2 00 50 2 50 1 00 3 00 1 50 3 50 2 00 4 00 2 50 50 4 50 3 00 100 5 00 50 100 1 50 2 00 250 300 3 50 4 00 450 150 5 00 3 50 200 4 00 250 4 50 300 5 00 50 1 00 150 2350 00 250 3 00 3 50 4 00 4 50 500 400 450 500 50 100 150 200 250 3 00 350 400 450 5 00 Test example ≈ 0.8 * + 0.3 * + 0.5 * x ≈ 0, …, φ , + 0.3 φ42 + 0.5 [a1, …, a64] = [0, 0.8 *0, 0.8360, …, 0,0.3 *, 0, …, 0, 0.5, 0] * φ63 (feature representation) Slide credit: Andrew Ng Compact & easily interpretable
  • 18. Self-taught Learning [Raina, Lee, Battle, Packer & Ng, ICML 07] Motorcycles Not motorcycles Testing: Testing: What isis this? What this? … Unlabeled images Slide credit: Andrew Ng
  • 19. Classification Result on Caltech 101 9K images, 101 classes 64% SIFT VQ + Nonlinear SVM ~50% Pixel Sparse Coding + Linear SVM 08/22/12 19
  • 20. Sparse Coding on SIFT – ScSPM algorithm [Yang, Yu, Gong & Huang, CVPR09] Local Gradients Pooling Sparse Coding Max Pooling Linear Classifier e.g, SIFT, HOG 20
  • 21. Sparse Coding on SIFT – the ScSPM algorithm [Yang, Yu, Gong & Huang, CVPR09] Caltech-101 64% SIFT VQ + Nonlinear SVM 73% SIFT Sparse Coding + Linear SVM (ScSPM) 08/22/12 21
  • 22. 22 Dense SIFT 73% Sparse Coding Summary: Accuracies on Caltech 101 Spatial Max Pooling - Sparse coding is a better building block Linear SVM Sparse Coding 50% - Deep models are preferred Spatial Max Pooling Linear SVM Dense SIFT Key message: 64% VQ Coding Spatial Pooling 08/22/12 Nonlinear SVM
  • 23. Outline 1. Sparse coding for image classification 2. Understanding sparse coding 3. Hierarchical sparse coding 4. Other topics: e.g. structured model, scale-up, discriminative training 5. Summary 08/22/12 23
  • 24. Outline 1. Sparse coding for image classification 2. Understanding sparse coding – Connections to RBMs, autoencoders, … – Sparse activations vs. sparse models, .. – Sparsity vs. locality – local sparse coding methods 1. Hierarchical sparse coding 2. Other topics: e.g. structured model, scale-up, discriminative training 3. Summary 08/22/12 24
• 25. Classical sparse coding: - a is sparse - a is often higher-dimensional than x - the activation a = f(x) is a nonlinear, implicit function of x - the reconstruction x’ = g(a) is linear & explicit. (Diagram: x → f(x) encoding → a; a → g(a) decoding → x’.)
• 26. RBMs & autoencoders: - also involve an activation and a reconstruction - but have an explicit f(x) - do not necessarily enforce sparsity on a - but if sparsity is imposed on a, results often improve [e.g. sparse RBM, Lee et al. NIPS08]. (Diagram: x → f(x) encoding → a; a → g(a) decoding → x’.)
• 27. Sparse coding: a broader view. Any feature mapping from x to a, i.e. a = f(x), where: - a is sparse (and often higher-dimensional than x) - f(x) is nonlinear - there is a reconstruction x’ = g(a) such that x’ ≈ x. Therefore sparse RBMs, sparse auto-encoders, and even VQ can be viewed as forms of sparse coding (see the sketch below).
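As a concrete instance of this broader view, VQ can be read as the degenerate sparse coder whose code is forced to be one-hot. A minimal illustrative sketch (shapes assumed):

```python
import numpy as np

def vq_encode(x, B):
    """VQ as extreme sparse coding: exactly one non-zero activation.
    x: (d,) datum; B: (d, k) codebook with one codeword per column."""
    dists = np.linalg.norm(B - x[:, None], axis=0)   # distance to every codeword
    a = np.zeros(B.shape[1])
    a[np.argmin(dists)] = 1.0                        # one-hot code
    return a
```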
• 28. Outline 1. Sparse coding for image classification 2. Understanding sparse coding – Connections to RBMs, autoencoders, … – Sparse activations vs. sparse models, … – Sparsity vs. locality – Local sparse coding methods 3. Hierarchical sparse coding 4. Other topics: e.g. structured model, scale-up, discriminative training 5. Summary
• 29. Sparse activations vs. sparse models. For a general function learning problem a = f(x): 1. Sparse model: f(x)’s parameters are sparse - example: LASSO f(x) = ⟨w, x⟩ with w sparse - the goal is feature selection: all data select a common subset of features - a hot topic in machine learning. 2. Sparse activations: f(x)’s outputs are sparse - example: sparse coding a = f(x) with a sparse - the goal is feature learning: different data points activate different feature subsets. (A code contrast follows below.)
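The contrast is easiest to see in code. A hedged sketch with toy data: the sparse model learns one sparse weight vector shared by all data, while sparse coding produces one sparse activation vector per data point. All values are illustrative.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))        # 100 data points, 20 raw features
y = 0.2 * X[:, 1] + 0.1 * X[:, 3]         # only the 2nd and 4th features matter

# Sparse model: one sparse weight vector w shared by all data (feature selection)
model = Lasso(alpha=0.01, fit_intercept=False).fit(X, y)
print(np.nonzero(model.coef_)[0])          # the same few indices for every x

# Sparse activations: one sparse code per data point (feature learning)
B = rng.standard_normal((20, 50))          # dictionary with 50 bases
B /= np.linalg.norm(B, axis=0)
coder = Lasso(alpha=0.05, fit_intercept=False, max_iter=2000)
codes = np.array([coder.fit(B, x).coef_ for x in X])
print([np.nonzero(c)[0] for c in codes[:3]])  # different rows activate different bases
```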
• 30. Example of sparse models: f(x) = ⟨w, x⟩, where w = [0, 0.2, 0, 0.1, 0, 0]. Because the 2nd and 4th elements of w are non-zero, these are the two features selected in every x: a globally-aligned sparse representation.
• 31. Example of sparse activations (sparse coding): different x’s have different dimensions activated: a locally-shared sparse representation, where similar x’s tend to have similar non-zero dimensions.
• 32. Example of sparse activations (sparse coding), another example: the activation pattern can preserve manifold structure, making the codes more informative in highlighting richer data structures, i.e. clusters and manifolds.
• 33. Outline 1. Sparse coding for image classification 2. Understanding sparse coding – Connections to RBMs, autoencoders, … – Sparse activations vs. sparse models, … – Sparsity vs. locality – Local sparse coding methods 3. Hierarchical sparse coding 4. Other topics: e.g. structured model, scale-up, discriminative training 5. Summary
• 34. Sparsity vs. Locality. Intuition: similar data should get similar activated features. Local sparse coding: data in the same neighborhood tend to have shared activated features, while data in different neighborhoods tend to have different features activated.
• 35. Sparse coding is not always local: example. Case 1, independent subspaces: each basis is a “direction”; sparsity means each datum is a linear combination of only several bases. Case 2, a data manifold (or clusters): each basis is an “anchor point”; sparsity means each datum is a linear combination of neighboring anchors, so sparsity is caused by locality.
• 36. Two approaches to local sparse coding. Approach 1: coding via local anchor points. Approach 2: coding via local subspaces.
• 37. Classical sparse coding is empirically local:  when it works best for classification, the codes are often found to be local;  it is preferred to let similar data have similar non-zero dimensions in their codes.
• 38. MNIST Experiment: classification using SC. Try different values of the sparsity penalty λ: 60K training images, 10K test images; let k = 512; linear SVM on the sparse codes. (A sketch of this loop follows below.)
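A sketch of the experiment loop being described, reusing the hypothetical `learn_dictionary` and `encode` helpers from the earlier sketches; the data loading is assumed, and the λ values are those on the following slides.

```python
import numpy as np
from sklearn.svm import LinearSVC

# Hypothetical data: X_train (d, 60000) and X_test (d, 10000) hold MNIST
# pixels column-wise; y_train / y_test are the digit labels.
for lam in [0.0005, 0.005, 0.05, 0.5]:
    B, A_train = learn_dictionary(X_train, k=512, lam=lam)   # sketch from above
    A_test = np.array([encode(x, B, lam) for x in X_test.T]).T
    clf = LinearSVC().fit(A_train.T, y_train)                # linear SVM on codes
    print(lam, clf.score(A_test.T, y_test))
# (Illustrative only: a real run at this scale would use a batch sparse solver.)
```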
• 39. MNIST Experiment: Lambda = 0.0005. Each basis is like a part or direction.
• 40. MNIST Experiment: Lambda = 0.005. Again, each basis is like a part or direction.
• 41. MNIST Experiment: Lambda = 0.05. Now each basis is more like a digit!
• 42. MNIST Experiment: Lambda = 0.5. Like VQ now!
• 43. Geometric view of sparse coding. Errors: 4.54%, 3.75%, 2.64%. When sparse coding achieves the best classification accuracy, the learned bases are like digits: each basis has a clear local class association.
• 44. Distribution of coefficients (MNIST): neighboring bases tend to get the nonzero coefficients.
• 45. Distribution of coefficients (SIFT, Caltech101): similar observation here!
• 46. Outline 1. Sparse coding for image classification 2. Understanding sparse coding – Connections to RBMs, autoencoders, … – Sparse activations vs. sparse models, … – Sparsity vs. locality – Local sparse coding methods 3. Hierarchical sparse coding 4. Other topics: e.g. structured model, scale-up, discriminative training 5. Summary
• 47. Why develop local sparse coding methods?  Since locality is a preferred property in sparse coding, let’s ensure the locality explicitly.  The new algorithms can be well justified theoretically.  The new algorithms have computational advantages over classical sparse coding.
• 48. Two approaches to local sparse coding. Approach 1, coding via local anchor points – local coordinate coding: “Nonlinear learning using local coordinate coding”, Kai Yu, Tong Zhang, and Yihong Gong, NIPS 2009; “Learning locality-constrained linear coding for image classification”, Jinjun Wang, Jianchao Yang, Kai Yu, Fengjun Lv, Thomas Huang, CVPR 2010. Approach 2, coding via local subspaces – super-vector coding: “Image Classification using Super-Vector Coding of Local Image Descriptors”, Xi Zhou, Kai Yu, Tong Zhang, and Thomas Huang, ECCV 2010; “Large-scale Image Classification: Fast Feature Extraction and SVM Training”, Yuanqing Lin, Fengjun Lv, Shenghuo Zhu, Ming Yang, Timothee Cour, Kai Yu, LiangLiang Cao, Thomas Huang, CVPR 2011.
• 49. A function approximation framework to understand coding. Assumption: image patches x follow a nonlinear manifold, and f(x) is smooth on the manifold. Coding: a nonlinear mapping x → a; typically a is high-dimensional & sparse. Nonlinear learning: f(x) = ⟨w, a⟩.
• 50. Local sparse coding, Approach 1: local coordinate coding.
• 51. Function interpolation based on LCC [Yu, Zhang, Gong, NIPS 09]. (Figure: a locally linear interpolation of f over the data points and bases.)
• 52. Local Coordinate Coding (LCC): connecting coding to nonlinear function learning. If f(x) is (α, β)-Lipschitz smooth, the function approximation error decomposes into a coding error plus a locality term; roughly, |f(x) − Σ_v a_v f(v)| ≤ α‖x − γ(x)‖ + β Σ_v |a_v| ‖v − γ(x)‖², where γ(x) = Σ_v a_v v. The key message: a good coding scheme should 1. have a small coding error, 2. and also be sufficiently local.
• 53. Local Coordinate Coding (LCC) [Yu, Zhang & Gong, NIPS 09; Wang, Yang, Yu, Lv, Huang, CVPR 10]. Dictionary learning: k-means (or hierarchical k-means). Coding for x, to obtain its sparse representation a: Step 1 – ensure locality: find the K nearest bases. Step 2 – ensure low coding error: solve a small least-squares problem over those bases (see the sketch below).
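A sketch of these two steps in Python, following the analytical approximation popularized as locality-constrained linear coding (Wang et al., CVPR 10): restrict the code to the K nearest bases and solve a small shifted least-squares system whose coefficients sum to one. The regularization constant `eps` and all shapes are illustrative.

```python
import numpy as np

def lcc_encode(x, B, K=5, eps=1e-4):
    """x: (d,) datum; B: (d, p) dictionary (e.g. k-means centers).
    Returns a p-dim code that is non-zero only on the K nearest bases."""
    # Step 1 -- locality: find the K nearest bases
    idx = np.argsort(np.linalg.norm(B - x[:, None], axis=0))[:K]
    # Step 2 -- low coding error: least squares over those bases, with the
    # coefficients constrained to sum to one (LLC's analytic solution)
    Z = B[:, idx].T - x                       # (K, d) bases shifted to x
    C = Z @ Z.T
    C += eps * np.trace(C) * np.eye(K)        # regularize for numerical stability
    c = np.linalg.solve(C, np.ones(K))
    c /= c.sum()
    a = np.zeros(B.shape[1])
    a[idx] = c
    return a
```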
• 54. Local sparse coding, Approach 2: super-vector coding.
• 55. Function approximation via super-vector coding: piecewise local linear (first-order) [Zhou, Yu, Zhang, and Huang, ECCV 10]. (Figure: local tangents at the cluster centers approximating f over the data points.)
• 56. Super-vector coding: justification. If f(x) is β-Lipschitz smooth, approximating f by its local tangent at the nearest cluster center m gives, roughly, |f(x) − f(m) − ∇f(m)ᵀ(x − m)| ≤ (β/2)‖x − m‖²: the function approximation error is bounded by the quantization error.
• 57. Super-Vector Coding (SVC) [Zhou, Yu, Zhang, and Huang, ECCV 10]. Dictionary learning: k-means (or hierarchical k-means). Coding for x, to obtain its sparse representation a: Step 1 – find the nearest basis of x, obtaining its VQ coding, e.g. [0, 0, 1, 0, …]. Step 2 – form the super vector, e.g. [0, 0, 1, 0, …, 0, 0, (x − m₃), 0, …]: a zero-order (VQ) part plus a local-tangent part. (A sketch follows below.)
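A sketch of SVC for a single descriptor, assuming k-means centers are given. The constant `s` scaling the zero-order part is a tunable weight; its value here, like the names and shapes, is illustrative.

```python
import numpy as np

def svc_encode(x, centers, s=1.0):
    """Super-vector coding of one descriptor x (d,) against k-means
    centers (k, d). Returns a k*(d+1)-dim vector: a scaled VQ indicator
    (zero-order part) plus the local tangent x - m_j (first-order part)."""
    k, d = centers.shape
    j = np.argmin(np.linalg.norm(centers - x, axis=1))           # nearest center
    out = np.zeros(k * (d + 1))
    out[j * (d + 1)] = s                                          # zero-order: VQ bit
    out[j * (d + 1) + 1 : (j + 1) * (d + 1)] = x - centers[j]     # local tangent
    return out
```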
• 58. Results on the ImageNet Challenge dataset (1.4 million images, 1000 classes): 40% VQ + intersection kernel; 62% LCC + linear SVM; 65% SVC + linear SVM.
• 59. Summary: local sparse coding. Approach 1: local coordinate coding. Approach 2: super-vector coding. - Sparsity achieved by explicitly ensuring locality - Sound theoretical justifications - Much simpler to implement and compute - Strong empirical success.
• 60. Outline 1. Sparse coding for image classification 2. Understanding sparse coding 3. Hierarchical sparse coding 4. Other topics: e.g. structured model, scale-up, discriminative training 5. Summary
• 61. Hierarchical sparse coding [Yu, Lin, & Lafferty, CVPR 11; Matthew D. Zeiler, Graham W. Taylor, and Rob Fergus, ICCV 11]: learning from unlabeled data with a pipeline of Sparse Coding → Pooling → Sparse Coding → Pooling.
• 62. A two-layer sparse coding formulation [Yu, Lin, & Lafferty, CVPR 11]:
$\min_{W,\alpha}\; L(W,\alpha) + \lambda\|W\|_1 + \gamma\|\alpha\|_1, \quad \text{s.t. } \alpha \ge 0,$
where
$L(W,\alpha) = \frac{1}{n}\sum_{i=1}^{n}\Big(\frac{1}{2}\|x_i - B w_i\|_2^2 + \lambda\, w_i^\top \Omega(\alpha)\, w_i\Big),$
with $W = (w_1\, w_2 \cdots w_n) \in \mathbb{R}^{p\times n}$, $\alpha \in \mathbb{R}^q$, and $\Omega(\alpha) \equiv \big(\sum_{k=1}^{q} \alpha_k \phi_k\big)^{-1}$.
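To make the objective concrete, here is a sketch that evaluates it. As a simplifying assumption not stated on the slide, each φ_k is taken to be a non-negative vector, so that Ω(α) is diagonal and cheap to form; names and shapes are illustrative.

```python
import numpy as np

def hsc_objective(X, B, W, alpha, Phi, lam=0.1, gamma=0.1):
    """Evaluate the two-layer objective above. Simplifying assumption: each
    phi_k is a non-negative vector, so Omega(alpha) = diag(Phi.T @ alpha)^(-1).
    X: (d, n) patches, B: (d, p) first-layer dictionary, W: (p, n) codes,
    alpha: (q,) non-negative second-layer weights, Phi: (q, p)."""
    n = X.shape[1]
    omega = 1.0 / np.maximum(Phi.T @ alpha, 1e-8)        # diagonal of Omega(alpha)
    recon = 0.5 * np.sum((X - B @ W) ** 2) / n           # average reconstruction error
    smooth = lam * np.sum(omega[:, None] * W ** 2) / n   # average of w_i^T Omega w_i
    # the same lambda weights the smoothness term and the l1 penalty on W,
    # matching the formulation above
    return recon + smooth + lam * np.abs(W).sum() + gamma * np.abs(alpha).sum()
```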
• 63. MNIST results: classification [Yu, Lin, & Lafferty, CVPR 11]. HSC vs. CNN: HSC provides even better performance than CNN; more amazingly, HSC learns its features in an unsupervised manner!
• 64. MNIST results: learned dictionary [Yu, Lin, & Lafferty, CVPR 11]. A hidden unit in the second layer is connected to a unit group in the first layer, giving invariance to translation, rotation, and deformation.
• 65. Caltech101 results: classification [Yu, Lin, & Lafferty, CVPR 11]. The learned descriptor performs slightly better than SIFT + SC.
• 66. Adaptive Deconvolutional Networks for Mid and High Level Feature Learning [Matthew D. Zeiler, Graham W. Taylor, and Rob Fergus, ICCV 2011].  Hierarchical convolutional sparse coding.  Trained with respect to the image from all layers (L1–L4).  Pooling both spatially and amongst features.  Learns invariant mid-level features. (Figure: Image → L1 Features → Select L2 Feature Groups → L2 Feature Maps → Select L3 Feature Groups → L3 Feature Maps → Select L4 Features → L4 Feature Maps.)
• 67. Outline 1. Sparse coding for image classification 2. Understanding sparse coding 3. Hierarchical sparse coding 4. Other topics: e.g. structured model, scale-up, discriminative training 5. Summary
• 68. Other topics of sparse coding.  Structured sparse coding, for example: – Group sparse coding [Bengio et al, NIPS 09] – Learning hierarchical dictionaries [Jenatton, Mairal et al, 2010]  Scaling up sparse coding, for example: – Feature-sign algorithm [Lee et al, NIPS 07] – Feed-forward approximation [Gregor & LeCun, ICML 10] – Online dictionary learning [Mairal et al, ICML 2009]  Discriminative training, for example: – Backprop algorithms [Bradley & Bagnell, NIPS 08; Yang et al, CVPR 10] – Supervised dictionary training [Mairal et al, NIPS 08]
• 69. Summary of Sparse Coding.  Sparse coding is an effective way to do (unsupervised) feature learning.  It is a building block for deep models.  Sparse coding and its local variants (LCC, SVC) have pushed the boundary of accuracies on Caltech101, PASCAL VOC, ImageNet, …  Challenge: discriminative training is not straightforward.

Editor’s notes

1. Let’s further check what is happening when the best classification performance is achieved.
2. Hi, I’m here to talk about my poster “Adaptive Deconvolutional Networks for Mid and High Level Feature Learning”. This is an extension of previous work on Deconvolutional Networks released at CVPR last year. Deconv Nets are a hierarchical convolutional sparse coding model used to learn features from images in an unsupervised fashion. We added two important extensions. The first is to train the model with respect to the input image from any given layer: other hierarchical models build on the layer below, whereas this allows stacking the model while still maintaining strong reconstructions. The second is a pooling operation that acts both spatially and amongst features: we store the switch locations within each pooling region in order to preserve sharp reconstructions through multiple layers, and pooling over features provides more competition amongst features and therefore forces groupings to occur.