A Neuromorphic Approach
to Computer Vision
Thomas Serre & Tomaso Poggio


Center for Biological and Computational Learning
Computer Science and Artificial Intelligence Laboratory
McGovern Institute for Brain Research
Department of Brain & Cognitive Sciences
Massachusetts Institute of Technology
Past Neo2 team: CalTech, Bremen & MIT

Tomaso Poggio, MIT
Bob Desimone, MIT
Christof Koch, CalTech
Winrich Freiwald, Bremen

Expertise:
 Computational neuroscience
 Animal behavior
 Neuronal recording in IT and V4 + fMRI in monkeys
 Data processing
 Access to human recordings
 Multi-electrodes
The problem: invariant recognition in natural scenes

 Object recognition is hard!
 Our visual capabilities are computationally amazing
 Long-term goal: reverse-engineer the visual system and build machines that see and interpret the visual world as well as we do
Neurally plausible quantitative model of visual perception

[Figure: hierarchical model overlaid on the primate visual system, spanning the dorsal stream ('where' pathway: V1, V2, V3, MT, MST, LIP, VIP, DP, 7a, ...) and the ventral stream ('what' pathway: V1, V2, V3, V4, PIT, AIT, TE, 36, 35, ...) up to prefrontal cortex. Simple-cell (S) layers perform tuning; complex-cell (C) layers perform a MAX operation; main routes and bypass routes connect the layers. Complexity (number of subunits), receptive-field (RF) size, and invariance increase along the hierarchy.]

Model layers, RF sizes, and number of units:

  Layer                    RF size         Num. units
  classification units     -               10^0
  S4                       7°              10^2
  C3                       7°              10^3
  C2b                      7°              10^3
  S3                       1.2° - 3.2°     10^4
  S2b                      0.9° - 4.4°     10^7
  C2                       1.1° - 3.0°     10^5
  S2                       0.6° - 2.4°     10^7
  C1                       0.4° - 1.6°     10^4
  S1                       0.2° - 1.1°     10^6

Learning: the classification units at the top (e.g., animal vs. non-animal) use supervised, task-dependent learning; the S/C layers below use unsupervised, task-independent learning.

 Large-scale (10^8 units), spans several areas of the visual cortex
 Combination of forward and reverse engineering
 Shown to be consistent with many experimental data across areas of visual cortex
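To make the two alternating operations concrete, here is a minimal NumPy sketch of one S/C stage of the kind the model stacks from S1 up to S4. It is only an illustration, not the released C++ implementation; the array shapes, sigma, and pooling scheme are assumptions.

import numpy as np

def s_layer(patches, templates, sigma=1.0):
    """Simple (S) units: Gaussian-like tuning of stored templates to input patches."""
    # patches: (n_patches, d); templates: (n_templates, d)
    dists = ((patches[:, None, :] - templates[None, :, :]) ** 2).sum(-1)
    return np.exp(-dists / (2 * sigma ** 2))      # (n_patches, n_templates)

def c_layer(s_responses, pool_size):
    """Complex (C) units: MAX pooling over neighboring S units -> invariance."""
    n = s_responses.shape[0]
    return np.stack([s_responses[i:i + pool_size].max(axis=0)
                     for i in range(0, n, pool_size)])

# Toy run: 16 image patches of dimension 9, 4 stored templates.
rng = np.random.default_rng(0)
s1 = s_layer(rng.normal(size=(16, 9)), rng.normal(size=(4, 9)))
c1 = c_layer(s1, pool_size=4)   # 4 template maps, each pooled over 4 positions
print(c1.shape)                 # (4, 4)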
Feedforward processing and rapid recognition

[Figure: feedforward sweep through the hierarchy; category-selective units at the top are read out by a linear perceptron.]
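The linear-perceptron readout named on this slide can be sketched in a few lines; the feature vectors below are random stand-ins for the model's top-level responses, and the labels are hypothetical:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))          # stand-in for top-level model features
y = np.sign(X @ rng.normal(size=64))    # hypothetical binary labels (+1 / -1)

w = np.zeros(64)                        # perceptron weights
for _ in range(10):                     # a few passes over the data
    for xi, yi in zip(X, y):
        if yi * (xi @ w) <= 0:          # misclassified (or on boundary): update
            w += yi * xi

print(f"training accuracy: {np.mean(np.sign(X @ w) == y):.2f}")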
Model validation against electrophysiology data

[Figure: classification performance (0-1) of readouts from IT neurons vs. model units. Training condition: 3.4° size, center position; test conditions: sizes 3.4°, 1.7°, and 6.8° at center, and size 3.4° at 2° and 4° horizontal offsets.]

Model data: Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005
Experimental data: Hung*, Kreiman*, Poggio & DiCarlo 2005
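The logic of this validation (train a linear readout at one size/position, test at unseen ones) can be sketched as follows; the synthetic "population responses" and condition shifts are assumptions standing in for IT spike counts or model unit responses:

import numpy as np

rng = np.random.default_rng(1)
proto = rng.normal(size=(2, 32))                 # one response prototype per class

def responses(labels, shift):
    """Fake population responses: class signal + condition shift + noise."""
    return proto[labels] + shift + 0.5 * rng.normal(size=(len(labels), 32))

labels = rng.integers(0, 2, size=100)
X_train = responses(labels, shift=0.0)           # TRAIN condition: 3.4 deg, center
w = np.linalg.lstsq(X_train, 2.0 * labels - 1, rcond=None)[0]  # linear readout

for name, shift in [("3.4 deg, center", 0.0), ("1.7 deg", 0.3), ("4 deg horz.", 0.6)]:
    acc = np.mean((responses(labels, shift) @ w > 0) == (labels == 1))
    print(f"{name}: accuracy {acc:.2f}")         # invariance = accuracy stays high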
Explaining human performance in rapid categorization tasks

[Figure: example animal and natural distractor images in four conditions: Head, Close-body, Medium-body, Far-body.]

[Figure: performance (d', ~1.0-2.6) across the four conditions; the model (82% correct) closely tracks human observers (80% correct).]

Serre, Oliva & Poggio 2007
Decoding animal category from IT cortex

[Figure: recording site in the monkey's IT; decoding compared across the model, IT neurons, and fMRI.]

Meyers, Freiwald, Embark, Kreiman, Serre & Poggio, in prep.
Decoding animal category from IT cortex in humans

[Figure: animal vs. non-animal category decoded from human recordings at ~145 ms.]
Bio-motivated computer vision
Scene parsing and object recognition

Computer vision system based on the response properties of neurons in the ventral stream of the visual cortex.

Speed improvement since 2006:

  image size   multi-thread   GPU (CUDA)
  64x64        4.5x           14x
  128x128      3.5x           14x
  256x256      1.5x           17x
  512x512      2.5x           25x

From ~1 min down to ~1 sec!

Serre, Wolf & Poggio 2005; Wolf & Bileschi 2006; Serre et al. 2007
Bio-motivated computer vision
Action recognition in video sequences

[Figure: motion-sensitive MT-like units; example action categories from video: bend, jack, jump, jump 2, run, side, walk, wave 1, wave 2.]

Jhuang, Serre, Wolf & Poggio 2007
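As a rough sketch of what motion-sensitive MT-like units compute in models of this family (following the classic motion-energy idea, not necessarily the exact filters of Jhuang et al.), a quadrature pair of spatio-temporal Gabor filters gives a direction- and speed-tuned, phase-insensitive response. All parameters below are illustrative assumptions:

import numpy as np

t = np.arange(9)[:, None]          # time axis
x = np.arange(9)[None, :]          # space axis (1-D for brevity)
speed, freq, sigma = 1.0, 0.5, 2.0

phase = 2 * np.pi * freq * (x - speed * t)                    # drifting-grating phase
env = np.exp(-((x - 4) ** 2 + (t - 4) ** 2) / (2 * sigma ** 2))
gabor_even, gabor_odd = env * np.cos(phase), env * np.sin(phase)

def motion_energy(clip):
    """Phase-insensitive energy: squared responses of the quadrature pair."""
    return (clip * gabor_even).sum() ** 2 + (clip * gabor_odd).sum() ** 2

stim = np.cos(2 * np.pi * freq * (x - speed * t))  # stimulus at the preferred velocity
print(motion_energy(stim))         # large response; reversed motion gives a much smaller one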
Recognition accuracy

  Dataset       Dollar et al. '05   Model    Chance
  KTH Human     81.3%               91.6%    16.7%
  Weiz. Human   86.7%               96.3%    11.1%
  UCSD Mice     75.6%               79.0%    20.0%

★ Cross-validation: 2/3 training, 1/3 testing, 10 repeats.

Jhuang, Serre, Wolf & Poggio, ICCV '07
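The footnoted protocol (random 2/3 train / 1/3 test splits, 10 repeats, mean accuracy) looks like this in outline; the features, labels, and the nearest-class-mean classifier are placeholders, not the actual system:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(90, 16))
y = rng.integers(0, 6, size=90)            # e.g., 6 action categories

accs = []
for _ in range(10):                        # 10 random repeats
    idx = rng.permutation(len(X))
    cut = 2 * len(X) // 3                  # 2/3 train, 1/3 test
    tr, te = idx[:cut], idx[cut:]
    # nearest class-mean classifier as a stand-in for the real model + readout
    means = np.stack([X[tr][y[tr] == c].mean(0) for c in range(6)])
    pred = np.argmin(((X[te][:, None] - means[None]) ** 2).sum(-1), axis=1)
    accs.append(np.mean(pred == y[te]))

print(f"accuracy: {np.mean(accs):.2f} +/- {np.std(accs):.2f}")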
Automatic recognition of rodent behavior

Performance:

  human agreement      72%
  proposed system      71%
  commercial system    56%
  chance               12%

Serre, Jhuang, Garrote, Poggio & Steele, in prep.
Neuroscience of attention and Bayesian inference

[Figure: integrated model of attention and recognition. A cortical circuit V2 → V4/PIT → IT → PFC, with feature-based attention fed back from PFC and spatial attention fed back from LIP/FEF. In collaboration with the Desimone lab (monkey electrophysiology).]

[Figure: the same circuit as a Bayesian network: object O (object priors), location L (location priors), features F_i, location-specific features F_li, N, and image I.]

see also Rao 2005; Lee & Mumford 2003

Chikkerur, Serre & Poggio, in prep.
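A heavily simplified toy version of the Bayesian reading of this circuit, where attention falls out as posterior inference: O is the object (with object priors), L the location (with location priors), and the image evidence enters through the feature maps. All distributions below are made up for illustration:

import numpy as np

n_loc, n_obj = 4, 2
p_L = np.full(n_loc, 1.0 / n_loc)            # location prior (spatial attention)
p_O = np.full(n_obj, 1.0 / n_obj)            # object prior (feature-based attention)

# Likelihood of the observed feature map given (object, location):
# rows = objects, cols = locations; object 0 is most consistent with location 2.
lik = np.array([[0.1, 0.1, 0.6, 0.1],
                [0.2, 0.2, 0.2, 0.2]])

# Joint posterior P(O, L | I) proportional to P(O) P(L) P(I | O, L)
joint = p_O[:, None] * p_L[None, :] * lik
joint /= joint.sum()

print("P(L | I) =", joint.sum(axis=0))   # where to attend (saliency)
print("P(O | I) =", joint.sum(axis=1))   # what is there (recognition)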
Model predicts human eye movements well

Integrating (local) feature-based and (global) context-based cues accounts for 92% of inter-subject agreement!

Chikkerur, Tan, Serre & Poggio, in sub.
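One simple way to realize the cue combination described here is a pointwise product of a local feature-saliency map with a global context prior, normalized to a fixation density. The maps below are random placeholders, and the product rule is an assumption about the combination, not the paper's exact formulation:

import numpy as np

rng = np.random.default_rng(0)
feature_map = rng.random((32, 32))     # local, feature-based saliency
context_map = rng.random((32, 32))     # global, scene-context prior

combined = feature_map * context_map   # pointwise product of the two cues
combined /= combined.sum()             # normalize to a fixation density

y, x = np.unravel_index(combined.argmax(), combined.shape)
print(f"predicted first fixation: ({x}, {y})")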
Model performance improves with attention

[Figure: performance (d', 0-3) with no attention vs. one shift of attention, for the model and for human observers (mask and no-mask conditions).]

Chikkerur, Serre & Poggio, in prep.
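For reference, the d' plotted on these slides is the standard signal-detection sensitivity index, computed from hit and false-alarm rates with the inverse normal CDF; the 82%/18% rates below are only an example:

from statistics import NormalDist

def d_prime(hit_rate, fa_rate):
    """d' = Z(hit rate) - Z(false-alarm rate)."""
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)

print(round(d_prime(0.82, 0.18), 2))   # ~1.83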
Main Achievements in Neo2

Extended and extensively tested the feedforward model on real-world recognition tasks [Poggio]:
   matches neural data
   mimics human performance in rapid categorization
   performs at the level of state-of-the-art computer vision systems
   C++ software + interface available / 100x speed-up
   combined with a saliency algorithm and tested on real-time street surveillance (video)

Demonstrated read-out of cluttered natural images from monkey fMRI and physiology recordings in inferotemporal cortex [Freiwald and Poggio]:
   first decoding of cluttered complex images
   agreement with the original feedforward model

Characterized neural encoding in V4, IT, and FEF under passive and task-dependent viewing conditions [Desimone and Poggio]:
   characterized the dynamics of bottom-up vs. top-down visual information processing (characteristic timing signature of activity in V4 and IT vs. FEF)
   top-down, task-dependent attention modulates features in V4 and IT
Main Achievements in Neo2

Implemented a new, extended model, suggested by these neuroscience data from the Desimone lab, that includes attention via feedback loops from higher areas [Poggio]:
   predicts human gaze in natural images well
   significantly improves the recognition performance of the original model in clutter

Extended the model to classification of video sequences (i.e., action recognition) [Poggio]:
   tested on several video databases and shown to outperform previous algorithms

Demonstrated read-out from the human medial temporal lobe (MTL) [Koch]:
   decoding of natural scenes from single neurons in human MTL
   improved ability of the saliency model to mimic human gaze patterns

Model used to transfer neuroscience data to biologically inspired vision systems.
Future Directions

MIT team: Poggio, Desimone, Serre, 1-of-2 IT physiologist, + (Koch + Itti)

Develop new technologies to decode computations and representations in the visual cortex:

  circuits:   optical silencing and stimulation technology based on X-rhodopsin
  network:    multi-electrode technology
  system:     simultaneous recordings across areas
From the neuroscience data towards a system-level model of natural vision

MIT team: Poggio, Desimone, Serre, XXX

  1. Clutter and image ambiguities: attention and cortical feedback
  2. Learning and recognition of objects in video sequences
Clutter and image ambiguities:
Attention and cortical feedback
   Circuitry of attention and role of synchronization in
   top-down and bottom-up search tasks: monkey
   electrophysiology in V4, IT and FEF
Learning and recognition of
objects in video sequences
   How current computer vision systems learn vs. how brains learn
Thank you!
Past Neo2 team:
CalTech, Bremen & MIT

Tomaso Poggio, MIT
Bob Desimone, MIT
Christof Koch, CalTech
Winrich Freiwald, Bremen
IT readout improves with attention
MIT team: Poggio, Desimone, Serre, XXX
   Trial timeline: stim, cue, transient change
   Conditions: isolated object; attention on object;
   attention away from object; object not shown
Zhang Meyers Serre Bichot Desimone Poggio, in prep (n=67)
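The readout analysis behind this slide is only summarized here. As a hedged illustration of the general procedure (train a linear classifier on a recorded population response, then test how well it generalizes to another condition), the sketch below uses simulated data: the population size (67) matches the slide's n, but the noise model, category structure, and least-squares classifier are illustrative assumptions, not the study's actual pipeline.

# Hedged sketch of population "readout": train a linear classifier on
# simulated IT-like population responses from one condition and test it
# in another. All numbers (67 units, noise levels, 4 categories) are
# illustrative; this is not the analysis code behind the slide.
import numpy as np

rng = np.random.default_rng(1)
n_neurons, n_trials, n_categories = 67, 200, 4

# Each category gets a mean population response; attention is modeled
# here, purely for illustration, as a better signal-to-noise ratio.
means = rng.normal(size=(n_categories, n_neurons))

def simulate(noise):
    labels = rng.integers(0, n_categories, n_trials)
    x = means[labels] + noise * rng.normal(size=(n_trials, n_neurons))
    return x, labels

def fit_linear_readout(x, y):
    # One-vs-rest least-squares linear readout: a simple stand-in for
    # the correlation or SVM classifiers typically used in such studies.
    targets = np.eye(n_categories)[y]
    w, *_ = np.linalg.lstsq(np.c_[x, np.ones(len(x))], targets, rcond=None)
    return w

def accuracy(w, x, y):
    scores = np.c_[x, np.ones(len(x))] @ w
    return (scores.argmax(axis=1) == y).mean()

x_train, y_train = simulate(noise=1.0)
w = fit_linear_readout(x_train, y_train)
for name, noise in [("attended (low noise)", 1.0), ("unattended (high noise)", 3.0)]:
    x_test, y_test = simulate(noise)
    print(name, accuracy(w, x_test, y_test))

Under these toy assumptions, the same fixed readout decodes the category more accurately in the low-noise ("attended") condition, which is the qualitative pattern the slide reports.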
Two functional classes of cells to explain
invariant object recognition in the visual
cortex
   Simple cells: template matching, Gaussian-like tuning (~ "AND")
   Complex cells: invariance, max-like operation (~ "OR")
Riesenhuber & Poggio 1999 (building on Fukushima 1980 and Hubel & Wiesel 1962)
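The slide above names the model's two canonical operations. Below is a minimal sketch, under simplifying assumptions of our own (vector-valued input patches, a plain Gaussian tuning function, fixed pooling groups), of how a simple-cell stage (template matching, AND-like) and a complex-cell stage (max pooling, OR-like) could be composed; it is not the authors' implementation.

# Minimal sketch (not the authors' code) of the two operations on the
# slide: a "simple cell" stage that template-matches its inputs with
# Gaussian-like tuning (~AND) and a "complex cell" stage that pools
# over it with a max (~OR). Shapes and the Gaussian form are assumptions.
import numpy as np

def simple_layer(inputs, templates, sigma=1.0):
    """Gaussian tuning of each input patch to each stored template (~AND).

    inputs:    (n_patches, d) array of afferent activity patterns
    templates: (n_templates, d) array of learned prototypes
    returns:   (n_patches, n_templates) tuning responses in (0, 1]
    """
    # Squared Euclidean distance between every patch and every template
    d2 = ((inputs[:, None, :] - templates[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def complex_layer(s_responses, pool_size=4):
    """Max pooling over groups of simple units (~OR), yielding invariance.

    Patches are assumed ordered so that consecutive groups of `pool_size`
    cover shifted or rescaled versions of the same image region.
    """
    n, k = s_responses.shape
    n_groups = n // pool_size
    grouped = s_responses[: n_groups * pool_size].reshape(n_groups, pool_size, k)
    return grouped.max(axis=1)

# Toy usage: one position/scale-tolerant response per pooled group
rng = np.random.default_rng(0)
templates = rng.normal(size=(8, 16))
patches = rng.normal(size=(12, 16))
c = complex_layer(simple_layer(patches, templates), pool_size=4)
print(c.shape)  # (3, 8): invariant responses per group and template

Stacking alternating stages of these two operations, with a linear classifier on top, is the basic recipe the hierarchy described in the notes below follows.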


Editor's Notes

  1. Here is the team that I am representing: Tomaso Poggio and Bob Desimone at MIT, Christof Koch at CalTech, and Winrich Freiwald, who used to be in Bremen, is now at CalTech, and will soon be at Rockefeller.
  2. Our group has been focusing on the computational mechanisms of invariant object recognition. This is obviously a very hard computational problem: despite decades of engineering effort, we still have not been able to build a computer algorithm that can compete with the speed, robustness, and efficiency of the primate visual system. Our long-term goal is thus to build machines that not only mimic the processing of information in the visual cortex but also see and interpret the visual world as well as we do.
  5. Over the years we have developed an initial quantitative model of information processing in the visual cortex. The model tries to summarize what is currently known about the anatomy, physiology, and organization of the visual cortex. It does not try to explain the processing of information in one specific visual area; instead it spans several visual areas with a relatively large number of units (on the order of 100 million). The model combines reverse engineering, in which parameters such as receptive-field sizes are derived from available data, with forward engineering, since it is inspired by well-known principles from learning theory and computer vision. Together with colleagues, we have shown that the resulting architecture is surprisingly consistent with data from V1, V2, V4, MT, and IT.
  8. Unfortunately I am not going to have much time to give you the details of this model; I would be happy to talk afterwards if anyone has questions. The key assumption is that when the visual system is flashed with an image, the visual signal is rapidly routed through a hierarchy of visual areas in a single feedforward sweep. Our hypothesis is that the goal of the ventral stream of the visual cortex is to build, during the first 150 ms of visual processing, a base representation in which object categories can be represented in a position- and scale-tolerant manner, before more complex routines, in particular shifts of attention and eye movements, take place. This base representation takes the form of a population of model units at various stages of the hierarchy, tuned to key features of natural images with different levels of complexity and invariance. Learning in the model of the ventral stream is unsupervised, so that when training the model to recognize a new object category we do not have to retrain the whole hierarchy, only the task-specific circuits that sit at the top, for instance in PFC; you can think of these task-specific circuits as a linear classifier.
  12. Let me show you one example of the validation we have performed on this model. Here, for instance, we considered a small population of about 200 random model units in one of the top stages of the architecture I just presented. From this population activity we can try to read out the object category of the stimuli presented to the model. In fact, we can train a classifier with stimuli presented at one position and scale and see how well it generalizes to other positions and scales; this tells you how much invariance is built into the population of units. We get the results indicated by the light gray bars, corresponding to different amounts of shift in position and scale. You can play the same game on neurons in IT, which is the highest purely visual area and has been critically linked with primates' ability to recognize objects invariant to position and scale. Here we found that the model was able to predict not only the overall level of performance but also the range of invariance to position and scale.
  13. Another important validation is behavior, assessed here using human psychophysics. As I mentioned earlier, the original goal of the model was not to explain natural, everyday vision, when you are free to move your eyes and shift your attention, but rather what is often called rapid or immediate recognition, which corresponds to the first 100-150 ms of visual processing when an image is briefly presented, i.e., when the visual system is forced to operate in a feedforward mode before eye movements and shifts of attention take place. An example is shown on the left: I flash an image for a few milliseconds; you probably do not have time to register every fine detail, but most people can still say whether or not it contains an animal. We divided our dataset into 4 subcategories: head, close-body, medium-body, and far-body. Overall, both the model and humans score about 80% on this very difficult task, and you can see that they agree quite well in terms of how they perform across these 4 subcategories.
  16. This dependency of human and model performance on clutter motivated a subsequent electrophysiology experiment, done with Winrich Freiwald during the Neo2 project, where we found that the trend still holds for neurons in monkey IT cortex. We used fMRI to find areas that are differentially selective for animal vs. non-animal images; Winrich then recorded from a small population of about 200 neurons in this area. You can see the readout results on the right: we could reliably read out animal-category information from these difficult real-world images. Interestingly, we also found a surprisingly high signal at the BOLD level (using a contrast agent).
  17. More recently we gained access to a population of patients with intractable epilepsy who are scheduled for resective surgery. The patients typically spend about a week at the hospital with implanted electrodes and are monitored 24/7 to essentially triangulate the epileptic site. These patients offer a unique opportunity to obtain not only behavioral measurements but also simultaneous intracranial recordings (here we measure local field potentials from iEEG). I should emphasize that the spatial and temporal resolution we get is several orders of magnitude higher than with non-invasive imaging techniques such as fMRI. As an illustration, here is one electrode from one patient performing the animal vs. non-animal categorization task. The electrode location remains to be confirmed but is probably somewhere around the temporal lobe. You can see that already at around 145 ms one can read out the presence or absence of an animal presented to the patient.
  18. Of course, one key limitation of this approach is that we have no control over the location of the electrodes, which is based solely on medical criteria. However, by pooling data from multiple patients, we hope to reconstruct the feedforward sweep and recover readout latencies across the temporal lobe.
32. In parallel we have used this model in real-world computer vision applications. For instance, we have developed a computer vision system for the automatic parsing of street scene images. Here are examples of automatic parsing by the system, overlaid on the original images. The colors and bounding boxes indicate predictions from the model (e.g., green for trees, etc.).
33. We have made a number of improvements to the implementation of this model. The original MATLAB implementation was quite slow, so we have been working on several ways to speed it up: we started with an efficient multi-threaded C/C++ implementation and finally exploited the recent gains in computational power from graphics processing hardware (GPUs).
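The reason these speedups work so well is that the front end of the model applies the same filtering operation independently at every position, scale, and orientation, a structure that maps directly onto data-parallel hardware. Here is a minimal numpy/scipy sketch of that structure (the Gabor parameters are illustrative, not the ones used in the model):

    import numpy as np
    from scipy.signal import convolve2d

    def gabor(size, theta, lam=8.0, sigma=3.0, gamma=0.5):
        """Illustrative Gabor filter at orientation theta (radians)."""
        half = size // 2
        y, x = np.mgrid[-half:half + 1, -half:half + 1]
        xr = x * np.cos(theta) + y * np.sin(theta)
        yr = -x * np.sin(theta) + y * np.cos(theta)
        g = np.exp(-(xr**2 + gamma**2 * yr**2) / (2 * sigma**2)) \
            * np.cos(2 * np.pi * xr / lam)
        return g / np.linalg.norm(g)

    image = np.random.rand(128, 128)  # stand-in for an input image
    thetas = np.deg2rad([0, 45, 90, 135])

    # Every (orientation, position) pair is independent: this loop is exactly
    # the structure that multi-threaded C/C++ and GPU implementations parallelize.
    s1 = np.stack([np.abs(convolve2d(image, gabor(11, t), mode='same'))
                   for t in thetas])
    print(s1.shape)  # (4, 128, 128): one response map per orientation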
34. More recently we have extended the approach to the recognition of human actions such as running, walking, jogging, jumping, waving, etc. In all cases we have shown that the resulting biologically motivated computer vision systems perform on par with or better than state-of-the-art computer vision systems.
  35. There are several other systems that
36. Let me switch gears and tell you a little bit about our work on attention. As I showed you earlier, one key limitation of this feedforward architecture is that it performs well only when the object to be recognized is large and the amount of background clutter is limited. I have shown you that, consistent with human psychophysics and monkey electrophysiology, the performance of the model decreases quite significantly as the amount of clutter increases. Our working assumption is that the visual system overcomes this limitation via cortical feedback and shifts of attention; in particular, our hypothesis is that the role of spatial attention is to suppress the clutter so that the object of interest appears as if it were presented in isolation.
37. In collaboration with electrophysiology labs we are studying the circuits and networks of visual areas involved in attention, which involve a complex interaction between the ventral stream (area V4 in particular), prefrontal areas such as the FEF, and the parietal cortex.
38. We made two key extensions to this model. First, we assume that feature-based attention acts through a cascade of top-down connections through the ventral stream, originating in the PFC, where a template of the target object is held in memory, all the way down to V4 and possibly lower areas. We also assume a spatial attention modulation originating from the parietal cortex (here I am assuming LIP, based on limited experimental evidence).
40. These attentional mechanisms can be cast in a probabilistic Bayesian framework whereby the parietal cortex represents location variables and the ventral stream represents feature variables (these are our image fragments). Variables for the target object are encoded in higher areas such as the PFC. This framework is inspired by an earlier model by Rao to explain spatial attention, and it is a special case of the computational model of the visual cortex described by David Mumford that most of you probably know.
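As a toy illustration of this framework (not our actual implementation), suppose the ventral stream provides bottom-up evidence for each fragment at each location, the PFC holds an object-specific prior over fragments, and the parietal cortex holds a prior over locations; spatial attention then falls out as the posterior over locations. All quantities below are hypothetical stand-ins:

    import numpy as np

    rng = np.random.default_rng(1)
    n_fragments, n_locations = 16, 64

    # Hypothetical quantities: bottom-up likelihoods P(F_i | L=l) from the
    # ventral stream, and a top-down fragment prior P(i | O) held in PFC
    # for the target object.
    feature_evidence = rng.random((n_fragments, n_locations))
    object_prior = rng.random(n_fragments)
    object_prior /= object_prior.sum()
    location_prior = np.full(n_locations, 1.0 / n_locations)  # parietal: P(L)

    # Posterior over locations: P(L | F, O) is proportional to
    # P(L) * sum_i P(i | O) P(F_i | L)
    posterior = location_prior * (object_prior @ feature_evidence)
    posterior /= posterior.sum()

    # Spatial attention = the mode of the posterior; feature-based attention
    # = re-weighting the feature maps by the object prior before pooling.
    print("attend to location", int(posterior.argmax()))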
41. We have implemented the approach in the context of our animal detection task, and the performance of the model increases with only one shift of attention. Here is the performance of the feedforward model as I showed you earlier, but averaged across all categories. Here is the performance when allowing one shift of attention. Just for comparison, here is the performance of human observers when images are flashed very briefly, and here it is when observers are given just a little more time, presumably just enough to allow one shift of attention. Obviously, our long-term goal is to match the human level of performance when observers are given as much time as needed.
  45. Let me just summarize some of our main achievements from phase 0 of Neo2.
48. If we want to make real progress in deciphering the computations and representations in the visual cortex, we need to study brains not just at the level of single neurons but across multiple, integrated levels of analysis. In particular we need to: 1) understand how key computations for object recognition are carried out in cortical microcircuits (we have been working on new tools for optical silencing and stimulation of neurons, based on channelrhodopsin, to study these circuits); 2) understand the interactions between networks of neurons within single cortical areas, which will require the development of multi-electrode technologies not only in lower visual areas, as is currently done, but also in higher visual areas that are more difficult to access; and 3) record not just from one area at a time but from multiple areas, to understand how these areas communicate with each other.
51. At the same time, these neuroscience data will allow us not only to validate but also to extend existing models of the visual cortex, and hopefully to improve their recognition capabilities. In particular, if we want computer systems that can compete with the primate visual system, we need to go beyond rapid categorization tasks and study vision in more natural settings. I think there are two key neuroscience questions to be studied. First, as I alluded to already in this talk, cortical feedback and shifts of attention are likely to be the key computational mechanisms by which the visual system solves most of the difficulties inherent to vision, namely dealing with significant amounts of clutter as well as ambiguity in the visual input due to occlusion or a low signal-to-noise ratio. The second is the processing of image sequences, not as a succession of independent snapshots, as in the model of rapid object categorization I showed you, but with models that can exploit the temporal continuity of image sequences, both for learning invariance to transformations (zooming and looming, translation, 3D rotation, etc.) and for the recognition of objects in motion.
52. Along those lines we have started to make significant progress in understanding the circuitry of attention, and in particular how spatial attention works to suppress the clutter in image displays of this kind.
  53. The next step is obviously to move towards more natural stimulus presentations.
54. I think significant progress in computer vision will come from the use of video sequences and the exploitation of temporal continuity in those sequences. Current computer vision systems treat the visual world as a collection of independent frames. The visual world is much richer than that, and time is an important component of visual perception. Babies do not learn to recognize giraffes from labeled examples of this kind. Instead, a baby who is going to the zoo, perhaps for the first time, has access to much richer information, whereby giraffes undergo transformations such as rotation in depth, looming, or shifting on the retina in a smooth, continuous way. It is our belief that by exploiting these principles we will be able to build better learning algorithms.
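One classic way to exploit this temporal continuity is a Foldiak-style trace rule, where a unit's weight update depends on a temporally smoothed trace of its activity, so that features appearing in adjacent frames (the same giraffe at slightly different poses) get bound to the same unit. This is a minimal sketch of that idea, not our specific learning algorithm:

    import numpy as np

    def trace_rule(frames, n_units=10, eta=0.05, delta=0.2, seed=0):
        """Foldiak-style trace rule: weights follow inputs that co-occur in time."""
        rng = np.random.default_rng(seed)
        w = rng.standard_normal((n_units, frames.shape[1])) * 0.01
        trace = np.zeros(n_units)
        for x in frames:                              # frames: (time, input_dim)
            y = w @ x                                 # feedforward responses
            trace = (1 - delta) * trace + delta * y   # temporally smoothed activity
            w += eta * np.outer(trace, x)             # Hebbian update gated by the trace
            w /= np.linalg.norm(w, axis=1, keepdims=True)  # keep weights bounded
        return w

    # A unit active on frame t keeps learning on frame t+1, so the transformed
    # views of one object converge onto the same unit.
    video = np.random.rand(100, 32)  # stand-in for a sequence of image features
    w = trace_rule(video)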
  55. Most of the work in the areas of computer vision and visual neuroscience has focused on the recognition of isolated objects. However, vision is much more than just classification, as it involves interpreting, parsing and navigating in visual scenes. By just looking, a human observer could essentially answer an infinite number of questions about an image: for instance, about the location and the boundary of an object, how to grasp it or to navigate over it. These are essential problems for robotics applications, which in essence have remained unaddressed in the field of neuroscience.
56. Here is the team that I am representing: Tomaso Poggio and Bob Desimone at MIT, Christof Koch at CalTech, and Winrich Freiwald, who used to be in Bremen, now at CalTech and soon at Rockefeller.
57. We have implemented the approach in the context of our animal search task; the model mostly improves on the medium and far conditions.
60. Computational considerations suggest that you need two types of operations, and therefore two functional classes of cells, for invariant object recognition. The Gaussian-bell tuning was motivated by a learning technique based on Radial Basis Functions, while the max operation was motivated by the standard scanning approach in computer vision and by theoretical arguments from signal processing. The goal of the simple units is to increase the complexity of the representation, in this example by pooling together the activity of afferent units with different orientations via Gaussian-like tuning. This Gaussian tuning is ubiquitous in the visual cortex, from orientation tuning in V1 to tuning for complex objects around certain poses in IT. The complex units pool together afferent units with the same preferred stimulus (e.g., a vertical bar) but slightly different positions and scales. At the complex-unit level we thus build some tolerance with respect to the exact position and scale of the stimulus within the receptive field of the unit.
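Here is a minimal sketch of the two operations, with illustrative (not fitted) parameters and toy input patterns:

    import numpy as np

    def simple_unit(afferents, template, sigma=1.0):
        """Gaussian (RBF-like) tuning: maximal response when the pattern of
        afferent activity matches the unit's preferred template."""
        return np.exp(-np.sum((afferents - template) ** 2) / (2 * sigma ** 2))

    def complex_unit(responses):
        """Max pooling over afferents with the same preferred stimulus but
        different positions/scales: builds position and scale tolerance."""
        return np.max(responses)

    # Toy example: a simple unit tuned to a conjunction of four orientations,
    # and a complex unit pooling that simple feature over a 3x3 neighborhood.
    afferents = np.array([0.9, 0.1, 0.4, 0.7])   # activity of 4 oriented afferents
    template = np.array([1.0, 0.0, 0.5, 0.8])    # preferred input pattern
    s = simple_unit(afferents, template)
    c = complex_unit(np.array([[0.2, s, 0.1],
                               [0.0, 0.3, 0.1],
                               [0.1, 0.0, 0.2]]))
    print(s, c)

The tuning step increases selectivity (it fires only for a particular conjunction of inputs), while the max step increases invariance (it fires wherever in its pooling range the preferred feature appears); alternating the two is the core of the model.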