Promising avenues for interdisciplinary research in vision
1. Dr. Oge Marques
Associate Professor
Computer Science and Engineering
Florida Atlantic University
Boca Raton, FL (USA)
June 2009
2. Take-home message
We postulate that many
challenging problems in
human and computer vision
research can be approached
in a truly interdisciplinary
way and show examples of
recent work on the topic of
“objects in context” that
support our claim.
3. Outline
• Background and motivation
• Visual perception
• Object detection and recognition
• Scene recognition and analysis
• The role of context
• Representative work
• Concluding remarks
5. Background and motivation
• Computer vision has many open research
questions
– Object detection, recognition, and categorization
– Scene analysis, recognition, and understanding
– Objects in context
• Research in human vision has grown
tremendously
– Computational models of selected visual processes
have emerged
• A truly interdisciplinary effort can help bring the
best of human vision research into selected
problems in computer vision.
6. The fundamental question of
vision
How are we able so quickly and
effortlessly to perceive meaningful,
coherent, 3D scenes from
incomplete, 2D patterns of light
that enter our eyes?
7. A selected related question
How are we able to perceive,
detect, categorize and recognize
objects and scenes?
8. Vision Science
Interdisciplinary study of many areas of visual
processing and function
Areas of research:
• Detection
• Attention
• Memory
• Recognition
• Motion perception
• etc.
Disciplines:
• Psychology
• Neuroscience
• Biology
• Computer Science
• Engineering
• etc.
11. The hierarchical nature of the
scientific knowledge of the visual
system
The deeper in the system you go, the less we know…
12. What do we know about visual
perception?
Not much compared to what we don’t know
Ignorance
Knowledge
13. Outline
• Background and motivation
• Visual perception
• Object detection and recognition
• Scene recognition and analysis
• The role of context
• Representative work
• Concluding remarks
15. Four Stages of Visual Perception
Inspired by work by David Marr
(1945-1980)
• One of the most influential neuroscientists of vision.
• Thought of vision as an information-processing
task.
• In his book Vision (1982), he distinguished three
different levels of description involved in
understanding complex information processing
systems:
– Computational level
– Algorithmic level
– Implementation level
• An important point is that the levels can be
considered independently.
21. Outline
• Background and motivation
• Visual perception
• Object detection and recognition
• Scene recognition and analysis
• The role of context
• Representative work
• Concluding remarks
22. The challenge of object
recognition
• Why is it so difficult for computers to carry
out object recognition tasks that humans
can perform easily?
• Although most human visual perception
appears to be almost effortless, it involves
complex “behind the scenes” processes.
23. The challenge of object
recognition
Human vision scientist: “Let’s
look at selected behavioral
and neural processes that
make it possible for people to
perceive (i.e., detect and
recognize) objects.”
Computer vision scientist:
“Let’s model what is known –
and reasonable – and try it
out on standard databases
containing real-world images.”
24. The challenge of object
perception
• The stimulus on the receptors is ambiguous
25. The challenge of object
perception
• The stimulus on the receptors is ambiguous
The inverse projection problem
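A minimal sketch of why the stimulus is ambiguous, assuming an idealized pinhole camera (the slides do not specify a camera model; the function name `project` and the focal length are hypothetical). Every 3D point on the same ray through the pinhole lands on the same image location, so the 2D pattern alone cannot identify its 3D source.

```python
def project(point_3d, focal_length=1.0):
    """Perspective projection of a 3D point (X, Y, Z) onto a 2D image plane."""
    x, y, z = point_3d
    if z <= 0:
        raise ValueError("point must be in front of the camera (Z > 0)")
    return (focal_length * x / z, focal_length * y / z)

# Three different 3D points lying on the same ray through the pinhole.
near = (1.0, 2.0, 4.0)
far = (2.0, 4.0, 8.0)
farther = (4.0, 8.0, 16.0)

images = [project(p) for p in (near, far, farther)]

# All three project to the same image point, so the mapping cannot be inverted
# without additional assumptions about the scene.
same_image = all(img == images[0] for img in images)
```

Recovering depth therefore needs extra constraints (prior knowledge, context, multiple views), which is the sense in which the projection problem is "inverse".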
26. The challenge of object
perception
• The stimulus on the receptors is ambiguous
27. The challenge of object
perception
• The stimulus on the receptors is ambiguous
http://users.skynet.be/J.Beever/pave.htm
28. The challenge of object
perception
• Objects can be hidden or blurred
Can you find…
- the pencil?
- the glasses?
29. The challenge of object
perception
• Objects can be hidden or blurred
Who are these people?
30. The challenge of object
perception
Objects look different from different viewpoints
The ability of humans to recognize an object seen from different
viewpoints is called viewpoint invariance.
31. The challenge of object
perception
Objects look different from different viewpoints
Q: Which two faces correspond to the same person?
A1 (human): (a) and (c)
A2 (computer): (a) and (b)
32. Research question
How do we recognize objects from different
viewpoints?
• Structural-Description Models
– Propose that our ability to recognize 3D objects is based on 3D volumes
(called volumetric features) that can be combined to create the overall
shape of an object.
• Image-Description Models
– Propose that our ability to recognize objects from different viewpoints is
based on stored 2D views of the object as it would appear from different
viewpoints.
Which Model Is Correct? The actual mechanism for object recognition probably
involves elements of both the structural-description and image-description models
(Palmeri & Gauthier, 2004)
33. Why do we care about object
recognition?
Because object recognition leads
to
perception of function.
34.
35.
36.
37. So, what do we use: direct or
indirect?
“It seems exceedingly unlikely (though
logically possible) that we categorize
everything in our visual fields”, Palmer.
Hypothesis: we categorize the objects
that are relevant for a specific task that we
have at hand, but we only extract
affordances from the others.
45. Object categorization
mountain
tree
building
banner
street lamp
vendor
people
Slide by Fei-Fei, Fergus, Torralba
46. Scene and context categorization
• outdoor
• city
• …
Slide by Fei-Fei, Fergus, Torralba
47. Is this space large or small?
How far are the buildings in the back?
Slide by Fei-Fei, Fergus, Torralba
48. Activity
What is this person doing?
What are these two doing?
Slide by Fei-Fei, Fergus, Torralba
49. Outline
• Background and motivation
• Visual perception
• Object detection and recognition
• Scene recognition and analysis
• The role of context
• Representative work
• Concluding remarks
50. What is a scene?
• A scene is a view of a real-world
environment that contains multiple
surfaces and objects, organized in a
meaningful way.
– A tour of scene understanding literature:
http://cvcl.mit.edu/SUNSarticles.htm
51.
52. The “gist” of a scene
• Mary Potter (1975, 1976) demonstrated that
during a rapid sequential visual presentation (100
msec per image), a novel scene picture is indeed
instantly understood and observers seem to
comprehend a lot of visual information, but a
delay of a few hundred msec (~ 300 msec) is
required for the picture to be consolidated in
memory.
• The “gist” (a summary) refers to the visual
information perceived after/during a glance at an
image.
• To simplify, the gist is often synonymous with the
basic level category of the scene or event (e.g.
wedding, bathroom, beach, forest, street)
53.
54. What we (don’t) know about scene
analysis, recognition, and
classification
• Humans are very good at recognizing and
classifying scenes
• We are also very fast (100 ms or less)
• We often sacrifice accuracy in the name of
speed (we capture the gist but miss many
details)
• How exactly do we do it?
55.
56.
57.
58.
59.
60. What is the basis for scene
identification?
• Different schools of thought:
– Scene-centered
– Part-based (i.e., object-centered)
– Holistic
61.
62.
63. Outline
• Background and motivation
• Visual perception
• Object detection and recognition
• Scene recognition and analysis
• The role of context
• Representative work
• Concluding remarks
64. Objects in context
• Objects do not exist isolated from a
context
• Torralba’s challenge: “How far can you
go without using an object detector?”
83. Biederman 1982
• Pictures shown for 150
ms.
• Objects in appropriate
context were detected
more accurately than
objects in an
inappropriate context.
• Scene consistency
affects object detection.
89. Biederman’s classes in Computer Vision
Galleguillos & Belongie, Tech Report (2008)
• Interposition and support can be coded by
reference to physical space.
• Probability, position and size are defined as
semantic relations because they require access
to the referential meaning of the object.
• Semantic relations include information about
detailed interactions among objects in the scene
and they are often used as contextual features.
90. Dreaming of an ideal computer vision solution…
Galleguillos & Belongie, Tech Report (2008)
91. Types of context
Galleguillos & Belongie, Tech Report (2008)
• Contextual features can be grouped into 3
categories:
– semantic context (probability)
– spatial context (position)
– scale context (size).
• Contextual knowledge can be any information that is
not directly produced by the appearance of an object.
• It can be obtained from:
– the nearby image data;
– image tags or annotations;
– the presence and location of other objects.
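As an illustration of semantic context (probability), the following sketch estimates how often one object label appears given that another is present. It is not the method of Galleguillos & Belongie; the function name and the toy annotations are hypothetical and only show the idea of learning co-occurrence from labeled scenes.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_probability(scenes):
    """Estimate P(b present | a present) from lists of object labels per scene."""
    single = Counter()
    pair = Counter()
    for labels in scenes:
        present = set(labels)
        single.update(present)
        for a, b in combinations(sorted(present), 2):
            pair[(a, b)] += 1
            pair[(b, a)] += 1
    return {(a, b): n / single[a] for (a, b), n in pair.items()}

# Toy annotations (hypothetical data, for illustration only).
scenes = [
    ["car", "road", "building"],
    ["car", "road"],
    ["boat", "water"],
    ["car", "building"],
]
prob = cooccurrence_probability(scenes)

# Two of the three scenes that contain a car also contain a road.
p_road_given_car = prob[("car", "road")]
# A car and water were never observed together in this toy set.
p_water_given_car = prob.get(("car", "water"), 0.0)
```

A recognition system could use such estimates to prefer the label "car" over "boat" for an ambiguous region once a road has been identified nearby.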
92. Acquiring and modeling context
Galleguillos & Belongie, Tech Report (2008)
• Which of the three should one use?
– Spatial and scale context are the most
exploited types of context by recognition
frameworks.
– Generally, semantic context is implicitly
present in spatial context, as information about
object co-occurrences comes from identifying
objects for the spatial relations in the scene.
– The same happens with scale context, as scale
is measured with respect to other objects.
– Therefore, using spatial and scale context
involves using all forms of contextual
information in the scene.
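A rough sketch of spatial context (position) and scale context (size) computed from bounding boxes. The box format, the function names, and the example detections are assumptions made for illustration. Both measures take two already identified objects as input, which is why semantic context comes along implicitly.

```python
def spatial_relation(box_a, box_b):
    """Coarse vertical relation of box_a to box_b.

    Boxes are (x_min, y_min, x_max, y_max) in image coordinates,
    with y increasing downward. Only vertical extents are compared.
    """
    if box_a[3] <= box_b[1]:
        return "above"
    if box_a[1] >= box_b[3]:
        return "below"
    return "overlapping"

def relative_scale(box_a, box_b):
    """Ratio of the height of box_a to the height of box_b."""
    return (box_a[3] - box_a[1]) / (box_b[3] - box_b[1])

# Hypothetical detections in a street scene.
sky = (0, 0, 100, 30)
car = (20, 60, 60, 80)
person = (70, 50, 80, 90)

sky_vs_car = spatial_relation(sky, car)        # sky ends before the car starts
person_vs_car = relative_scale(person, car)    # person is 40 px tall, car is 20 px
```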
93. Outline
• Background and motivation
• Visual perception
• Object detection and recognition
• Scene recognition and analysis
• The role of context
• Representative work
• Concluding remarks
94. Representative work
• There are many research groups working
on the intersection of human and
computer vision in numerous topics,
including “objects in context”.
• Most notable example: work by
Aude Oliva and Antonio Torralba (and
collaborators) at MIT.
95. Representative work
• A case study:
– L.W. Renninger and J. Malik (2004). When is
scene recognition just texture
recognition? Vision Research, 44,
2301-2311.
96. Renninger and Malik
• Basic idea
– Consider texture as an
early cue for scene
perception.
• It’s simple
• It’s fast (pre-attentive)
(Julesz, 1981)
97. Renninger and Malik
• Approach
– How well do humans discriminate scenes
with very limited exposure?
– Build a texture-based model for scene
discrimination.
– Compare performance!
100. Renninger and Malik
• Task
– 2AFC (two-alternative forced choice)
– Subjects are shown an image
• Image exposure time: 37, 50 and 69 ms
– Image followed by a jumbled scene mask
– The task is to select one of two word choices
that best describes the image
– Subject performance: 77%, 82% and 92%
correct
• Get ready…
101.
102.
103.
104.
105.
106. Texture Discrimination Model
– Cluster response distributions from V1-like
filters to get prototypical responses (textons)
– Remember what types of textons occur in
particular scenes (build histogram)
– Label new image using a nearest neighbor
classifier
• Compare texton histogram for new image to stored
representations (χ² distance)
(Malik and Perona, 1990)
(Malik et al., 1999)
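The histogram and nearest-neighbor steps above can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes the filter responses are already computed and the textons already learned (for example with k-means), it uses one common form of the χ² distance, and all names and toy values are hypothetical.

```python
import numpy as np

def texton_histogram(responses, textons):
    """Normalized histogram of nearest-texton assignments.

    responses: (n_pixels, n_filters) filter responses for one image.
    textons:   (n_textons, n_filters) cluster centers, assumed already
               learned from training images.
    """
    dists = np.linalg.norm(responses[:, None, :] - textons[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    hist = np.bincount(labels, minlength=len(textons)).astype(float)
    return hist / hist.sum()

def chi2_distance(h1, h2, eps=1e-10):
    """Chi-squared distance between two normalized histograms."""
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

def classify(hist, stored_hists, stored_labels):
    """Label of the stored histogram nearest to hist under chi-squared distance."""
    dists = [chi2_distance(hist, h) for h in stored_hists]
    return stored_labels[int(np.argmin(dists))]

# Toy example with two textons in a 1-D response space (hypothetical values).
textons = np.array([[0.0], [1.0]])
stored_hists = [np.array([0.9, 0.1]), np.array([0.2, 0.8])]
stored_labels = ["beach", "forest"]

# One response falls near texton 0 and three fall near texton 1.
new_responses = np.array([[0.1], [0.9], [1.1], [0.8]])
new_hist = texton_histogram(new_responses, textons)
predicted = classify(new_hist, stored_hists, stored_labels)
```

The new image's histogram is closest to the stored "forest" histogram, so it receives that label. In a full model, `responses` would come from a bank of V1-like oriented filters applied at every pixel.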
113. Renninger and Malik
• Conclusion
– Early scene identification can be mostly
explained by a simple texture model
114. Outline
• Background and motivation
• Visual perception
• Object detection and recognition
• Scene recognition and analysis
• The role of context
• Representative work
• Concluding remarks
115. Our experience
• Working with Dept of Psychology @ FAU
– Two joint graduate-level courses
– Joint student supervision
– Joint grant proposals
– Joint papers (in preparation)
– Constant discussions
– Promising days ahead…
• Imaging Science & Technology Center
• Multidisciplinary Vision Program
116. Our focus
• To establish quantitative measures
of the importance of context
– Method: present subjects with degraded
(blocky, blurry, etc.) objects against a
context and ask them to recognize the
object as it becomes progressively more
visible.
– Human vision: behavioral experiments
– Computer vision: stimuli creation
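One possible way to create "blocky" stimuli that become progressively more visible is block averaging at decreasing block sizes. The sketch below is an assumption about how such stimuli could be generated, not a description of the procedure actually used; the function name and the synthetic image are hypothetical.

```python
import numpy as np

def pixelate(image, block):
    """Replace each block x block tile of a 2-D grayscale image with its mean."""
    h, w = image.shape
    if h % block or w % block:
        raise ValueError("image dimensions must be multiples of the block size")
    tiles = image.reshape(h // block, block, w // block, block)
    means = tiles.mean(axis=(1, 3))
    return np.repeat(np.repeat(means, block, axis=0), block, axis=1)

# Hypothetical 8x8 grayscale stimulus; a real experiment would load a photograph.
image = np.arange(64, dtype=float).reshape(8, 8)

# Coarse-to-fine sequence: the object becomes progressively more visible.
sequence = [pixelate(image, block) for block in (8, 4, 2, 1)]
```

The block size at which a subject first recognizes the object, with and without the surrounding context, could then serve as a quantitative measure of how much the context helps.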
117. Concluding remarks
• Great potential
• Cultural barriers
• Open problems and challenges on both
sides
• The time is ripe for interdisciplinary
research on vision, particularly “objects in
context”
118. Acknowledgments
• Thanks to Prof. Elan Barenholtz (Dept of
Psychology, FAU) for allowing me to use some
of his slides and for the many interesting
discussions on the topics presented in this talk.
• Many slides for this talk contain material made
publicly available on the Web by Antonio
Torralba and Aude Oliva (MIT) and Fei-Fei Li
(UIUC).
119. Thank you for attending my talk!
Questions?
Email: omarques@fau.edu