A Neuromorphic Approach
to Computer Vision
Thomas Serre & Tomaso Poggio


Center for Biological and Computational Learning
Computer Science and Artificial Intelligence Laboratory
McGovern Institute for Brain Research
Department of Brain & Cognitive Sciences
Massachusetts Institute of Technology
Past Neo2 team: CalTech, Bremen & MIT

Tomaso Poggio, MIT
Bob Desimone, MIT
Christof Koch, CalTech
Winrich Freiwald, Bremen

Expertise:
 Computational neuroscience
 Animal behavior
 Neuronal recording in IT and V4 + fMRI in monkeys
 Data processing
 Access to human recordings
 Multi-electrodes
The problem: invariant recognition in natural scenes

 Object recognition is hard!
 Our visual capabilities are computationally amazing
 Long-term goal: reverse-engineer the visual system and build machines that see and interpret the visual world as well as we do
Neurally plausible quantitative model of visual perception

[Figure: hierarchical model overlaid on the primate visual system, spanning the dorsal stream ('where' pathway: V1, V2, V3, MT, MST, LIP, VIP, DP, 7a, ...) and the ventral stream ('what' pathway: V1, V2, V3, V4, PIT, AIT, TE, 36, 35, ...) up to prefrontal cortex. Simple-cell (S) layers perform tuning; complex-cell (C) layers perform a MAX operation; main routes and bypass routes connect the layers. Complexity (number of subunits), receptive-field (RF) size, and invariance increase along the hierarchy.]

Model layers, RF sizes, and number of units:

  Layer                    RF size         Num. units
  classification units     -               10^0
  S4                       7°              10^2
  C3                       7°              10^3
  C2b                      7°              10^3
  S3                       1.2° - 3.2°     10^4
  S2b                      0.9° - 4.4°     10^7
  C2                       1.1° - 3.0°     10^5
  S2                       0.6° - 2.4°     10^7
  C1                       0.4° - 1.6°     10^4
  S1                       0.2° - 1.1°     10^6

Learning: the classification units at the top (e.g., animal vs. non-animal) use supervised, task-dependent learning; the S/C layers below use unsupervised, task-independent learning.

 Large-scale (10^8 units), spans several areas of the visual cortex
 Combination of forward and reverse engineering
 Shown to be consistent with many experimental data across areas of visual cortex
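To make the two alternating operations concrete, here is a minimal NumPy sketch of one S/C stage of the kind the model stacks from S1 up to S4. It is only an illustration, not the released C++ implementation; the array shapes, sigma, and pooling scheme are assumptions.

import numpy as np

def s_layer(patches, templates, sigma=1.0):
    """Simple (S) units: Gaussian-like tuning of stored templates to input patches."""
    # patches: (n_patches, d); templates: (n_templates, d)
    dists = ((patches[:, None, :] - templates[None, :, :]) ** 2).sum(-1)
    return np.exp(-dists / (2 * sigma ** 2))      # (n_patches, n_templates)

def c_layer(s_responses, pool_size):
    """Complex (C) units: MAX pooling over neighboring S units -> invariance."""
    n = s_responses.shape[0]
    return np.stack([s_responses[i:i + pool_size].max(axis=0)
                     for i in range(0, n, pool_size)])

# Toy run: 16 image patches of dimension 9, 4 stored templates.
rng = np.random.default_rng(0)
s1 = s_layer(rng.normal(size=(16, 9)), rng.normal(size=(4, 9)))
c1 = c_layer(s1, pool_size=4)   # 4 template maps, each pooled over 4 positions
print(c1.shape)                 # (4, 4)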
Feedforward processing and rapid recognition

[Figure: feedforward sweep through the hierarchy; category-selective units at the top are read out by a linear perceptron.]
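The linear-perceptron readout named on this slide can be sketched in a few lines; the feature vectors below are random stand-ins for the model's top-level responses, and the labels are hypothetical:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))          # stand-in for top-level model features
y = np.sign(X @ rng.normal(size=64))    # hypothetical binary labels (+1 / -1)

w = np.zeros(64)                        # perceptron weights
for _ in range(10):                     # a few passes over the data
    for xi, yi in zip(X, y):
        if yi * (xi @ w) <= 0:          # misclassified (or on boundary): update
            w += yi * xi

print(f"training accuracy: {np.mean(np.sign(X @ w) == y):.2f}")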
Model validation against electrophysiology data

[Figure: classification performance (0-1) of readouts from IT neurons vs. model units. Training condition: 3.4° size, center position; test conditions: sizes 3.4°, 1.7°, and 6.8° at center, and size 3.4° at 2° and 4° horizontal offsets.]

Model data: Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005
Experimental data: Hung*, Kreiman*, Poggio & DiCarlo 2005
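The logic of this validation (train a linear readout at one size/position, test at unseen ones) can be sketched as follows; the synthetic "population responses" and condition shifts are assumptions standing in for IT spike counts or model unit responses:

import numpy as np

rng = np.random.default_rng(1)
proto = rng.normal(size=(2, 32))                 # one response prototype per class

def responses(labels, shift):
    """Fake population responses: class signal + condition shift + noise."""
    return proto[labels] + shift + 0.5 * rng.normal(size=(len(labels), 32))

labels = rng.integers(0, 2, size=100)
X_train = responses(labels, shift=0.0)           # TRAIN condition: 3.4 deg, center
w = np.linalg.lstsq(X_train, 2.0 * labels - 1, rcond=None)[0]  # linear readout

for name, shift in [("3.4 deg, center", 0.0), ("1.7 deg", 0.3), ("4 deg horz.", 0.6)]:
    acc = np.mean((responses(labels, shift) @ w > 0) == (labels == 1))
    print(f"{name}: accuracy {acc:.2f}")         # invariance = accuracy stays high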
Explaining human performance in rapid categorization tasks

[Figure: example animal and natural distractor images in four conditions: Head, Close-body, Medium-body, Far-body.]

[Figure: performance (d', ~1.0-2.6) across the four conditions; the model (82% correct) closely tracks human observers (80% correct).]

Serre, Oliva & Poggio 2007
Decoding animal category from IT cortex

[Figure: recording site in the monkey's IT; decoding compared across the model, IT neurons, and fMRI.]

Meyers, Freiwald, Embark, Kreiman, Serre & Poggio, in prep.
Decoding animal category from IT cortex in humans

[Figure: animal vs. non-animal category decoded from human recordings at ~145 ms.]
Bio-motivated computer vision
Scene parsing and object recognition

Computer vision system based on the response properties of neurons in the ventral stream of the visual cortex.

Speed improvement since 2006:

  image size   multi-thread   GPU (CUDA)
  64x64        4.5x           14x
  128x128      3.5x           14x
  256x256      1.5x           17x
  512x512      2.5x           25x

From ~1 min down to ~1 sec!

Serre, Wolf & Poggio 2005; Wolf & Bileschi 2006; Serre et al. 2007
Bio-motivated computer vision
Action recognition in video sequences

[Figure: motion-sensitive MT-like units; example action categories from video: bend, jack, jump, jump 2, run, side, walk, wave 1, wave 2.]

Jhuang, Serre, Wolf & Poggio 2007
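As a rough sketch of what motion-sensitive MT-like units compute in models of this family (following the classic motion-energy idea, not necessarily the exact filters of Jhuang et al.), a quadrature pair of spatio-temporal Gabor filters gives a direction- and speed-tuned, phase-insensitive response. All parameters below are illustrative assumptions:

import numpy as np

t = np.arange(9)[:, None]          # time axis
x = np.arange(9)[None, :]          # space axis (1-D for brevity)
speed, freq, sigma = 1.0, 0.5, 2.0

phase = 2 * np.pi * freq * (x - speed * t)                    # drifting-grating phase
env = np.exp(-((x - 4) ** 2 + (t - 4) ** 2) / (2 * sigma ** 2))
gabor_even, gabor_odd = env * np.cos(phase), env * np.sin(phase)

def motion_energy(clip):
    """Phase-insensitive energy: squared responses of the quadrature pair."""
    return (clip * gabor_even).sum() ** 2 + (clip * gabor_odd).sum() ** 2

stim = np.cos(2 * np.pi * freq * (x - speed * t))  # stimulus at the preferred velocity
print(motion_energy(stim))         # large response; reversed motion gives a much smaller one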
Recognition accuracy

  Dataset       Dollar et al. '05   Model    Chance
  KTH Human     81.3%               91.6%    16.7%
  Weiz. Human   86.7%               96.3%    11.1%
  UCSD Mice     75.6%               79.0%    20.0%

★ Cross-validation: 2/3 training, 1/3 testing, 10 repeats.

Jhuang, Serre, Wolf & Poggio, ICCV '07
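The footnoted protocol (random 2/3 train / 1/3 test splits, 10 repeats, mean accuracy) looks like this in outline; the features, labels, and the nearest-class-mean classifier are placeholders, not the actual system:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(90, 16))
y = rng.integers(0, 6, size=90)            # e.g., 6 action categories

accs = []
for _ in range(10):                        # 10 random repeats
    idx = rng.permutation(len(X))
    cut = 2 * len(X) // 3                  # 2/3 train, 1/3 test
    tr, te = idx[:cut], idx[cut:]
    # nearest class-mean classifier as a stand-in for the real model + readout
    means = np.stack([X[tr][y[tr] == c].mean(0) for c in range(6)])
    pred = np.argmin(((X[te][:, None] - means[None]) ** 2).sum(-1), axis=1)
    accs.append(np.mean(pred == y[te]))

print(f"accuracy: {np.mean(accs):.2f} +/- {np.std(accs):.2f}")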
Automatic recognition of rodent behavior

Performance:

  human agreement      72%
  proposed system      71%
  commercial system    56%
  chance               12%

Serre, Jhuang, Garrote, Poggio & Steele, in prep.
Neuroscience of attention and Bayesian inference

[Figure: integrated model of attention and recognition. A cortical circuit V2 → V4/PIT → IT → PFC, with feature-based attention fed back from PFC and spatial attention fed back from LIP/FEF. In collaboration with the Desimone lab (monkey electrophysiology).]

[Figure: the same circuit as a Bayesian network: object O (object priors), location L (location priors), features F_i, location-specific features F_li, N, and image I.]

see also Rao 2005; Lee & Mumford 2003

Chikkerur, Serre & Poggio, in prep.
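A heavily simplified toy version of the Bayesian reading of this circuit, where attention falls out as posterior inference: O is the object (with object priors), L the location (with location priors), and the image evidence enters through the feature maps. All distributions below are made up for illustration:

import numpy as np

n_loc, n_obj = 4, 2
p_L = np.full(n_loc, 1.0 / n_loc)            # location prior (spatial attention)
p_O = np.full(n_obj, 1.0 / n_obj)            # object prior (feature-based attention)

# Likelihood of the observed feature map given (object, location):
# rows = objects, cols = locations; object 0 is most consistent with location 2.
lik = np.array([[0.1, 0.1, 0.6, 0.1],
                [0.2, 0.2, 0.2, 0.2]])

# Joint posterior P(O, L | I) proportional to P(O) P(L) P(I | O, L)
joint = p_O[:, None] * p_L[None, :] * lik
joint /= joint.sum()

print("P(L | I) =", joint.sum(axis=0))   # where to attend (saliency)
print("P(O | I) =", joint.sum(axis=1))   # what is there (recognition)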
Model predicts human eye movements well

Integrating (local) feature-based and (global) context-based cues accounts for 92% of inter-subject agreement!

Chikkerur, Tan, Serre & Poggio, in sub.
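One simple way to realize the cue combination described here is a pointwise product of a local feature-saliency map with a global context prior, normalized to a fixation density. The maps below are random placeholders, and the product rule is an assumption about the combination, not the paper's exact formulation:

import numpy as np

rng = np.random.default_rng(0)
feature_map = rng.random((32, 32))     # local, feature-based saliency
context_map = rng.random((32, 32))     # global, scene-context prior

combined = feature_map * context_map   # pointwise product of the two cues
combined /= combined.sum()             # normalize to a fixation density

y, x = np.unravel_index(combined.argmax(), combined.shape)
print(f"predicted first fixation: ({x}, {y})")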
Model performance improves with attention

[Figure: performance (d', 0-3) with no attention vs. one shift of attention, for the model and for human observers (mask and no-mask conditions).]

Chikkerur, Serre & Poggio, in prep.
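For reference, the d' plotted on these slides is the standard signal-detection sensitivity index, computed from hit and false-alarm rates with the inverse normal CDF; the 82%/18% rates below are only an example:

from statistics import NormalDist

def d_prime(hit_rate, fa_rate):
    """d' = Z(hit rate) - Z(false-alarm rate)."""
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)

print(round(d_prime(0.82, 0.18), 2))   # ~1.83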
Main Achievements in Neo2

Extended and extensively tested the feedforward model on real-world recognition tasks [Poggio]:
   matches neural data
   mimics human performance in rapid categorization
   performs at the level of state-of-the-art computer vision systems
   C++ software + interface available / 100x speed-up
   combined with a saliency algorithm and tested on real-time street surveillance (video)

Demonstrated read-out of cluttered natural images from monkey fMRI and physiology recordings in inferotemporal cortex [Freiwald and Poggio]:
   first decoding of cluttered complex images
   agreement with the original feedforward model

Characterized neural encoding in V4, IT, and FEF under passive and task-dependent viewing conditions [Desimone and Poggio]:
   characterized the dynamics of bottom-up vs. top-down visual information processing (characteristic timing signature of activity in V4 and IT vs. FEF)
   top-down, task-dependent attention modulates features in V4 and IT
Main Achievements in Neo2

Implemented a new, extended model, suggested by these neuroscience data from the Desimone lab, that includes attention via feedback loops from higher areas [Poggio]:
   predicts human gaze in natural images well
   significantly improves the recognition performance of the original model in clutter

Extended the model to classification of video sequences (i.e., action recognition) [Poggio]:
   tested on several video databases and shown to outperform previous algorithms

Demonstrated read-out from the human medial temporal lobe (MTL) [Koch]:
   decoding of natural scenes from single neurons in human MTL
   improved ability of the saliency model to mimic human gaze patterns

Model used to transfer neuroscience data to biologically inspired vision systems.
Future Directions

MIT team: Poggio, Desimone, Serre, 1-of-2 IT physiologist, + (Koch + Itti)

Develop new technologies to decode computations and representations in the visual cortex:

  circuits:   optical silencing and stimulation technology based on X-rhodopsin
  network:    multi-electrode technology
  system:     simultaneous recordings across areas
From the neuroscience data towards a system-level model of natural vision

MIT team: Poggio, Desimone, Serre, XXX

  1. Clutter and image ambiguities: attention and cortical feedback
  2. Learning and recognition of objects in video sequences
Clutter and image ambiguities:
Attention and cortical feedback
   Circuitry of attention and role of synchronization in
   top-down and bottom-up search tasks: monkey
   electrophysiology in V4, IT and FEF
Learning and recognition of
objects in video sequences
   How current computer vision systems learn vs. how brains learn
Thank you!
Past Neo2 team:
CalTech, Bremen & MIT

Tomaso Poggio, MIT
Bob Desimone, MIT
Christof Koch, CalTech
Winrich Freiwald, Bremen
IT readout improves with attention
MIT team: Poggio, Desimone, Serre, XXX
   Trial timeline: stim, cue, transient change
   Conditions: isolated object; attention on object;
   attention away from object; object not shown
Zhang Meyers Serre Bichot Desimone Poggio, in prep (n=67)
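The readout analysis behind this slide is only summarized here. As a hedged illustration of the general procedure (train a linear classifier on a recorded population response, then test how well it generalizes to another condition), the sketch below uses simulated data: the population size (67) matches the slide's n, but the noise model, category structure, and least-squares classifier are illustrative assumptions, not the study's actual pipeline.

# Hedged sketch of population "readout": train a linear classifier on
# simulated IT-like population responses from one condition and test it
# in another. All numbers (67 units, noise levels, 4 categories) are
# illustrative; this is not the analysis code behind the slide.
import numpy as np

rng = np.random.default_rng(1)
n_neurons, n_trials, n_categories = 67, 200, 4

# Each category gets a mean population response; attention is modeled
# here, purely for illustration, as a better signal-to-noise ratio.
means = rng.normal(size=(n_categories, n_neurons))

def simulate(noise):
    labels = rng.integers(0, n_categories, n_trials)
    x = means[labels] + noise * rng.normal(size=(n_trials, n_neurons))
    return x, labels

def fit_linear_readout(x, y):
    # One-vs-rest least-squares linear readout: a simple stand-in for
    # the correlation or SVM classifiers typically used in such studies.
    targets = np.eye(n_categories)[y]
    w, *_ = np.linalg.lstsq(np.c_[x, np.ones(len(x))], targets, rcond=None)
    return w

def accuracy(w, x, y):
    scores = np.c_[x, np.ones(len(x))] @ w
    return (scores.argmax(axis=1) == y).mean()

x_train, y_train = simulate(noise=1.0)
w = fit_linear_readout(x_train, y_train)
for name, noise in [("attended (low noise)", 1.0), ("unattended (high noise)", 3.0)]:
    x_test, y_test = simulate(noise)
    print(name, accuracy(w, x_test, y_test))

Under these toy assumptions, the same fixed readout decodes the category more accurately in the low-noise ("attended") condition, which is the qualitative pattern the slide reports.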
Two functional classes of cells to explain
invariant object recognition in the visual
cortex
   Simple cells: template matching, Gaussian-like tuning (~ "AND")
   Complex cells: invariance, max-like operation (~ "OR")
Riesenhuber & Poggio 1999 (building on Fukushima 1980 and Hubel & Wiesel 1962)
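The slide above names the model's two canonical operations. Below is a minimal sketch, under simplifying assumptions of our own (vector-valued input patches, a plain Gaussian tuning function, fixed pooling groups), of how a simple-cell stage (template matching, AND-like) and a complex-cell stage (max pooling, OR-like) could be composed; it is not the authors' implementation.

# Minimal sketch (not the authors' code) of the two operations on the
# slide: a "simple cell" stage that template-matches its inputs with
# Gaussian-like tuning (~AND) and a "complex cell" stage that pools
# over it with a max (~OR). Shapes and the Gaussian form are assumptions.
import numpy as np

def simple_layer(inputs, templates, sigma=1.0):
    """Gaussian tuning of each input patch to each stored template (~AND).

    inputs:    (n_patches, d) array of afferent activity patterns
    templates: (n_templates, d) array of learned prototypes
    returns:   (n_patches, n_templates) tuning responses in (0, 1]
    """
    # Squared Euclidean distance between every patch and every template
    d2 = ((inputs[:, None, :] - templates[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def complex_layer(s_responses, pool_size=4):
    """Max pooling over groups of simple units (~OR), yielding invariance.

    Patches are assumed ordered so that consecutive groups of `pool_size`
    cover shifted or rescaled versions of the same image region.
    """
    n, k = s_responses.shape
    n_groups = n // pool_size
    grouped = s_responses[: n_groups * pool_size].reshape(n_groups, pool_size, k)
    return grouped.max(axis=1)

# Toy usage: one position/scale-tolerant response per pooled group
rng = np.random.default_rng(0)
templates = rng.normal(size=(8, 16))
patches = rng.normal(size=(12, 16))
c = complex_layer(simple_layer(patches, templates), pool_size=4)
print(c.shape)  # (3, 8): invariant responses per group and template

Stacking alternating stages of these two operations, with a linear classifier on top, is the basic recipe the hierarchy described in the notes below follows.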


Editor's Notes

  1. Here is the team that I am representing: Tomaso Poggio and Bob Desimone at MIT, Christof Koch at CalTech, and Winrich Freiwald, who used to be in Bremen, is now at CalTech, and will soon be at Rockefeller.
  2. Our group has been focusing on the computational mechanisms of invariant object recognition. This is obviously a very hard computational problem: despite decades of engineering effort, we still have not been able to build a computer algorithm that can compete with the speed, robustness, and efficiency of the primate visual system. Our long-term goal is thus to build machines that not only mimic the processing of information in the visual cortex but also see and interpret the visual world as well as we do.
  5. Over the years we have developed an initial quantitative model of information processing in the visual cortex. The model tries to summarize what is currently known about the anatomy, physiology, and organization of the visual cortex. It does not try to explain the processing of information in one specific visual area; instead it spans several visual areas with a relatively large number of units (on the order of 100 million). The model combines reverse engineering, in which parameters such as receptive-field sizes are derived from available data, with forward engineering, since it is inspired by well-known principles from learning theory and computer vision. Together with colleagues, we have shown that the resulting architecture is surprisingly consistent with data from V1, V2, V4, MT, and IT.
  8. Unfortunately I am not going to have much time to give you the details of this model; I would be happy to talk afterwards if anyone has questions. The key assumption is that when the visual system is flashed with an image, the visual signal is rapidly routed through a hierarchy of visual areas in a single feedforward sweep. Our hypothesis is that the goal of the ventral stream of the visual cortex is to build, during the first 150 ms of visual processing, a base representation in which object categories can be represented in a position- and scale-tolerant manner, before more complex routines, in particular shifts of attention and eye movements, take place. This base representation takes the form of a population of model units at various stages of the hierarchy, tuned to key features of natural images with different levels of complexity and invariance. Learning in the model of the ventral stream is unsupervised, so that when training the model to recognize a new object category we do not have to retrain the whole hierarchy, only the task-specific circuits that sit at the top, for instance in PFC; you can think of these task-specific circuits as a linear classifier.
  12. Let me show you one example of the validation we have performed on this model. Here, for instance, we considered a small population of about 200 random model units in one of the top stages of the architecture I just presented. From this population activity we can try to read out the object category of the stimuli presented to the model. In fact, we can train a classifier with stimuli presented at one position and scale and see how well it generalizes to other positions and scales; this tells you how much invariance is built into the population of units. We get the results indicated by the light gray bars, corresponding to different amounts of shift in position and scale. You can play the same game on neurons in IT, which is the highest purely visual area and has been critically linked with primates' ability to recognize objects invariant to position and scale. Here we found that the model was able to predict not only the overall level of performance but also the range of invariance to position and scale.
  13. Another important validation is behavior, assessed here using human psychophysics. As I mentioned earlier, the original goal of the model was not to explain natural, everyday vision, when you are free to move your eyes and shift your attention, but rather what is often called rapid or immediate recognition, which corresponds to the first 100-150 ms of visual processing when an image is briefly presented, i.e., when the visual system is forced to operate in a feedforward mode before eye movements and shifts of attention take place. An example is shown on the left: I flash an image for a few milliseconds; you probably do not have time to register every fine detail, but most people can still say whether or not it contains an animal. We divided our dataset into 4 subcategories: head, close-body, medium-body, and far-body. Overall, both the model and humans score about 80% on this very difficult task, and you can see that they agree quite well in terms of how they perform across these 4 subcategories.
  16. This dependency of human and model performance on clutter motivated a subsequent electrophysiology experiment, done with Winrich Freiwald during the Neo2 project, where we found that the trend still holds for neurons in monkey IT cortex. We used fMRI to find areas that are differentially selective for animal vs. non-animal images; Winrich then recorded from a small population of about 200 neurons in this area. You can see the readout results on the right: we could reliably read out animal-category information from these difficult real-world images. Interestingly, we also found a surprisingly high signal at the BOLD level (using a contrast agent).
  17. More recently we gained access to a population of patients with intractable epilepsy who are scheduled for resective surgery. The patients typically spend about a week at the hospital with implanted electrodes and are monitored 24/7 to essentially triangulate the epileptic site. These patients offer a unique opportunity to obtain not only behavioral measurements but also simultaneous intracranial recordings (here we measure local field potentials from iEEG). I should emphasize that the spatial and temporal resolution we get is several orders of magnitude higher than with non-invasive imaging techniques such as fMRI. As an illustration, here is one electrode from one patient performing the animal vs. non-animal categorization task. The electrode location remains to be confirmed but is probably somewhere around the temporal lobe. You can see that already at around 145 ms one can read out the presence or absence of an animal presented to the patient.
  18. Of course, one key limitation of this approach is that we have no control over the location of the electrodes, which is based solely on medical criteria. However, by pooling data from multiple patients, we hope to reconstruct the feedforward sweep and recover readout latencies across the temporal lobe.
32. In parallel we have used this model in real-world computer vision applications. For instance, we have developed a computer vision system for the automatic parsing of street scene images. Here are examples of automatic parsing by the system, overlaid on the original images. The colors and bounding boxes indicate predictions from the model (e.g., green for trees, etc.).
33. We have made a number of improvements to the implementation of this model. The original MATLAB implementation was quite slow, so we have been working on several ways to speed it up: we started with an efficient multi-threaded C/C++ implementation and finally exploited the recent gains in computational power from graphics processing hardware (GPUs).
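The reason these speedups work so well is that the front end of the model applies the same filtering operation independently at every position, scale, and orientation, a structure that maps directly onto data-parallel hardware. Here is a minimal numpy/scipy sketch of that structure (the Gabor parameters are illustrative, not the ones used in the model):

    import numpy as np
    from scipy.signal import convolve2d

    def gabor(size, theta, lam=8.0, sigma=3.0, gamma=0.5):
        """Illustrative Gabor filter at orientation theta (radians)."""
        half = size // 2
        y, x = np.mgrid[-half:half + 1, -half:half + 1]
        xr = x * np.cos(theta) + y * np.sin(theta)
        yr = -x * np.sin(theta) + y * np.cos(theta)
        g = np.exp(-(xr**2 + gamma**2 * yr**2) / (2 * sigma**2)) \
            * np.cos(2 * np.pi * xr / lam)
        return g / np.linalg.norm(g)

    image = np.random.rand(128, 128)  # stand-in for an input image
    thetas = np.deg2rad([0, 45, 90, 135])

    # Every (orientation, position) pair is independent: this loop is exactly
    # the structure that multi-threaded C/C++ and GPU implementations parallelize.
    s1 = np.stack([np.abs(convolve2d(image, gabor(11, t), mode='same'))
                   for t in thetas])
    print(s1.shape)  # (4, 128, 128): one response map per orientation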
34. More recently we have extended the approach to the recognition of human actions such as running, walking, jogging, jumping, waving, etc. In all cases we have shown that the resulting biologically motivated computer vision systems perform on par with or better than state-of-the-art computer vision systems.
  35. There are several other systems that
36. Let me switch gears and tell you a little bit about our work on attention. As I showed you earlier, one key limitation of this feedforward architecture is that it performs well only when the object to be recognized is large and the amount of background clutter is limited. I have shown you that, consistent with human psychophysics and monkey electrophysiology, the performance of the model decreases quite significantly as the amount of clutter increases. Our working assumption is that the visual system overcomes this limitation via cortical feedback and shifts of attention; in particular, our hypothesis is that the role of spatial attention is to suppress the clutter so that the object of interest appears as if it were presented in isolation.
37. In collaboration with electrophysiology labs we are studying the circuits and networks of visual areas involved in attention, which involve a complex interaction between the ventral stream (area V4 in particular), prefrontal areas such as the FEF, and the parietal cortex.
38. We made two key extensions to this model. First, we assume that feature-based attention acts through a cascade of top-down connections through the ventral stream, originating in the PFC, where a template of the target object is held in memory, all the way down to V4 and possibly lower areas. We also assume a spatial attention modulation originating from the parietal cortex (here I am assuming LIP, based on limited experimental evidence).
40. These attentional mechanisms can be cast in a probabilistic Bayesian framework whereby the parietal cortex represents location variables and the ventral stream represents feature variables (these are our image fragments). Variables for the target object are encoded in higher areas such as the PFC. This framework is inspired by an earlier model by Rao to explain spatial attention, and it is a special case of the computational model of the visual cortex described by David Mumford that most of you probably know.
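As a toy illustration of this framework (not our actual implementation), suppose the ventral stream provides bottom-up evidence for each fragment at each location, the PFC holds an object-specific prior over fragments, and the parietal cortex holds a prior over locations; spatial attention then falls out as the posterior over locations. All quantities below are hypothetical stand-ins:

    import numpy as np

    rng = np.random.default_rng(1)
    n_fragments, n_locations = 16, 64

    # Hypothetical quantities: bottom-up likelihoods P(F_i | L=l) from the
    # ventral stream, and a top-down fragment prior P(i | O) held in PFC
    # for the target object.
    feature_evidence = rng.random((n_fragments, n_locations))
    object_prior = rng.random(n_fragments)
    object_prior /= object_prior.sum()
    location_prior = np.full(n_locations, 1.0 / n_locations)  # parietal: P(L)

    # Posterior over locations: P(L | F, O) is proportional to
    # P(L) * sum_i P(i | O) P(F_i | L)
    posterior = location_prior * (object_prior @ feature_evidence)
    posterior /= posterior.sum()

    # Spatial attention = the mode of the posterior; feature-based attention
    # = re-weighting the feature maps by the object prior before pooling.
    print("attend to location", int(posterior.argmax()))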
41. We have implemented the approach in the context of our animal detection task, and the performance of the model increases with only one shift of attention. Here is the performance of the feedforward model as I showed you earlier, but averaged across all categories. Here is the performance when allowing one shift of attention. Just for comparison, here is the performance of human observers when images are flashed very briefly, and here it is when observers are given just a little more time, presumably just enough to allow one shift of attention. Obviously, our long-term goal is to match the human level of performance when observers are given as much time as needed.
  45. Let me just summarize some of our main achievements from phase 0 of Neo2.
48. If we want to make real progress in deciphering the computations and representations in the visual cortex, we need to study brains not just at the level of single neurons but across multiple, integrated levels of analysis. In particular we need to: 1) understand how key computations for object recognition are carried out in cortical microcircuits (we have been working on new tools for optical silencing and stimulation of neurons, based on channelrhodopsin, to study these circuits); 2) understand the interactions between networks of neurons within single cortical areas, which will require the development of multi-electrode technologies not only in lower visual areas, as is currently done, but also in higher visual areas that are more difficult to access; and 3) record not just from one area at a time but from multiple areas, to understand how these areas communicate with each other.
51. At the same time, these neuroscience data will allow us not only to validate but also to extend existing models of the visual cortex, and hopefully to improve their recognition capabilities. In particular, if we want computer systems that can compete with the primate visual system, we need to go beyond rapid categorization tasks and study vision in more natural settings. I think there are two key neuroscience questions to be studied. First, as I alluded to already in this talk, cortical feedback and shifts of attention are likely to be the key computational mechanisms by which the visual system solves most of the difficulties inherent to vision, namely dealing with significant amounts of clutter as well as ambiguity in the visual input due to occlusion or a low signal-to-noise ratio. The second is the processing of image sequences, not as a succession of independent snapshots, as in the model of rapid object categorization I showed you, but with models that can exploit the temporal continuity of image sequences, both for learning invariance to transformations (zooming and looming, translation, 3D rotation, etc.) and for the recognition of objects in motion.
52. Along those lines we have started to make significant progress in understanding the circuitry of attention, and in particular how spatial attention works to suppress the clutter in image displays of this kind.
  53. The next step is obviously to move towards more natural stimulus presentations.
54. I think significant progress in computer vision will come from the use of video sequences and the exploitation of temporal continuity in those sequences. Current computer vision systems treat the visual world as a collection of independent frames. The visual world is much richer than that, and time is an important component of visual perception. Babies do not learn to recognize giraffes from labeled examples of this kind. Instead, a baby who is going to the zoo, perhaps for the first time, has access to much richer information, whereby giraffes undergo transformations such as rotation in depth, looming, or shifting on the retina in a smooth, continuous way. It is our belief that by exploiting these principles we will be able to build better learning algorithms.
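One classic way to exploit this temporal continuity is a Foldiak-style trace rule, where a unit's weight update depends on a temporally smoothed trace of its activity, so that features appearing in adjacent frames (the same giraffe at slightly different poses) get bound to the same unit. This is a minimal sketch of that idea, not our specific learning algorithm:

    import numpy as np

    def trace_rule(frames, n_units=10, eta=0.05, delta=0.2, seed=0):
        """Foldiak-style trace rule: weights follow inputs that co-occur in time."""
        rng = np.random.default_rng(seed)
        w = rng.standard_normal((n_units, frames.shape[1])) * 0.01
        trace = np.zeros(n_units)
        for x in frames:                              # frames: (time, input_dim)
            y = w @ x                                 # feedforward responses
            trace = (1 - delta) * trace + delta * y   # temporally smoothed activity
            w += eta * np.outer(trace, x)             # Hebbian update gated by the trace
            w /= np.linalg.norm(w, axis=1, keepdims=True)  # keep weights bounded
        return w

    # A unit active on frame t keeps learning on frame t+1, so the transformed
    # views of one object converge onto the same unit.
    video = np.random.rand(100, 32)  # stand-in for a sequence of image features
    w = trace_rule(video)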
  55. Most of the work in the areas of computer vision and visual neuroscience has focused on the recognition of isolated objects. However, vision is much more than just classification, as it involves interpreting, parsing and navigating in visual scenes. By just looking, a human observer could essentially answer an infinite number of questions about an image: for instance, about the location and the boundary of an object, how to grasp it or to navigate over it. These are essential problems for robotics applications, which in essence have remained unaddressed in the field of neuroscience.
56. Here is the team that I am representing: Tomaso Poggio and Bob Desimone at MIT, Christof Koch at CalTech, and Winrich Freiwald, who used to be in Bremen, now at CalTech and soon at Rockefeller.
57. We have implemented the approach in the context of our animal search task; the model mostly improves on the medium and far conditions.
60. Computational considerations suggest that you need two types of operations, and therefore two functional classes of cells, for invariant object recognition. The Gaussian-bell tuning was motivated by a learning technique based on Radial Basis Functions, while the max operation was motivated by the standard scanning approach in computer vision and by theoretical arguments from signal processing. The goal of the simple units is to increase the complexity of the representation, in this example by pooling together the activity of afferent units with different orientations via Gaussian-like tuning. This Gaussian tuning is ubiquitous in the visual cortex, from orientation tuning in V1 to tuning for complex objects around certain poses in IT. The complex units pool together afferent units with the same preferred stimulus (e.g., a vertical bar) but slightly different positions and scales. At the complex-unit level we thus build some tolerance with respect to the exact position and scale of the stimulus within the receptive field of the unit.
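Here is a minimal sketch of the two operations, with illustrative (not fitted) parameters and toy input patterns:

    import numpy as np

    def simple_unit(afferents, template, sigma=1.0):
        """Gaussian (RBF-like) tuning: maximal response when the pattern of
        afferent activity matches the unit's preferred template."""
        return np.exp(-np.sum((afferents - template) ** 2) / (2 * sigma ** 2))

    def complex_unit(responses):
        """Max pooling over afferents with the same preferred stimulus but
        different positions/scales: builds position and scale tolerance."""
        return np.max(responses)

    # Toy example: a simple unit tuned to a conjunction of four orientations,
    # and a complex unit pooling that simple feature over a 3x3 neighborhood.
    afferents = np.array([0.9, 0.1, 0.4, 0.7])   # activity of 4 oriented afferents
    template = np.array([1.0, 0.0, 0.5, 0.8])    # preferred input pattern
    s = simple_unit(afferents, template)
    c = complex_unit(np.array([[0.2, s, 0.1],
                               [0.0, 0.3, 0.1],
                               [0.1, 0.0, 0.2]]))
    print(s, c)

The tuning step increases selectivity (it fires only for a particular conjunction of inputs), while the max step increases invariance (it fires wherever in its pooling range the preferred feature appears); alternating the two is the core of the model.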