SlideShare ist ein Scribd-Unternehmen logo
1 von 117
Downloaden Sie, um offline zu lesen
Understanding Visual Scenes


                   Antonio Torralba

  Computer Science and Artificial Intelligence Laboratory (CSAIL)
   Department of Electrical Engineering and Computer Science
NIPS2009: Understand Visual Scenes - Part 1
NIPS2009: Understand Visual Scenes - Part 1
SKY


                                      TREE


       PERSON   BENCH
                            PERSON

                                      PATH
       LAKE                                  PERSON

                               DUCK

                                      PERSON
                     DUCK
SIGN          DUCK

                              GRASS
A VIEW OF A PARK ON A NICE SPRING DAY
PEOPLE WALKING IN THE PARK




Do not                             PERSON FEEDING
 feed                              DUCKS IN THE PARK
the ducks DUCKS LOOKING FOR FOOD
   sign
PEOPLE UNDER THE
SHADOW OF THE TREES




         DUCKS ON TOP
         OF THE GRASS
Why do we care about recognition?
Perception of function: We can perceive the 3D shape, texture,
  material properties, without knowing about objects.
But, the concept of category encapsulates also information
  about what can we do with those objects.




“We therefore include the perception of function as a proper –indeed, crucial- subject
 for vision science”, from Vision Science, chapter 9, Palmer.
The perception of function
•  Direct perception (affordances): Gibson
          Flat surface
          Horizontal             Sittable
          Knee-high              upon
          …



•  Mediated perception (Categorization)
         Flat surface
         Horizontal      Chair   Sittable
         Knee-high               upon
         …                                  Chair   Chair



                                                     Chair?
Direct perception
Some aspects of an object function can be
   perceived directly
•  Functional form: Some forms clearly
   indicate to a function (“sittable-upon”,
   container, cutting device, …)
   Sittable-upon Sittable-upon           It does not seem easy
                                         to sit-upon this…


                         Sittable-upon
Scenes, as objects, also have
        affordances
The function of the scene
NIPS2009: Understand Visual Scenes - Part 1
Direct perception
Some aspects of an object function can be
   perceived directly
•  Observer relativity: Function is observer
   dependent
                                     From http://lastchancerescueflint.org
Limitations of Direct Perception
Objects of similar structure might have very different functions




  Not all functions seem to be available from direct visual information only.
Limitations of Direct Perception
Object detection and recognition
Short overview of current approaches
Object recognition
                   Is it really so hard?
                  Find the chair in this image   Output of normalized correlation


This is a chair
Object recognition
                Is it really so hard?
Find the chair in this image




                                            Pretty much garbage
                               Simple template matching is not going to make it
So, let’s make the problem simpler:
            Block world




                        Object Recognition in the Geometric Era:
                        a Retrospective. Joseph L. Mundy. 2006
Binford and generalized                                     Recognition by
        cylinders                                            components




                                                          Irving Biederman
                                                          Recognition-by-Components: A Theory of Human Image
                                                          Understanding.
                                                          Psychological Review, 1987.




                                                                    Object Recognition in the Geometric Era:
    Introduced in computer vision by A. Pentland, 1986.             a Retrospective. Joseph L. Mundy. 2006
Families of recognition algorithms
                                             Voting models                           Shape matching
  Bag of words models
                                                                                     Deformable models



                                         Viola and Jones, ICCV 2001                Berg, Berg, Malik, 2005
Csurka, Dance, Fan, Willamowski, and    Heisele, Poggio, et. al., NIPS 01
                                                                                   Cootes, Edwards, Taylor, 2001
Bray 2004                                Schneiderman, Kanade 2004
Sivic, Russell, Freeman, Zisserman,       Vidal-Naquet, Ullman 2003
ICCV 2005

                                                                            Rigid template models
                Constellation models




              Fischler and Elschlager, 1973                                  Sirovich and Kirby 1987
                                                                             Turk, Pentland, 1991
             Burl, Leung, and Perona, 1995
            Weber, Welling, and Perona, 2000                                 Dalal & Triggs, 2006
        Fergus, Perona, & Zisserman, CVPR 2003
The face age




  Feret dataset, 1996 DARPA

•  The representation and matching of pictorial structures Fischler,
Elschlager (1973).
•  Face recognition using eigenfaces M. Turk and A. Pentland (1991).
•  Human Face Detection in Visual Scenes - Rowley, Baluja, Kanade (1995)
•  Graded Learning for Object Detection - Fleuret, Geman (1999)
•  Robust Real-time Object Detection - Viola, Jones (2001)
•  Feature Reduction and Hierarchy of Classifiers for Fast Object Detection
in Video Images - Heisele, Serre, Mukherjee, Poggio (2001)
• ….
Face detection
Haar-like filters and cascades
Viola and Jones, ICCV 2001




                               The average intensity in the
                               block is computed with four
                               sums independently of the
                               block size.
Also Fleuret and Geman, 2001
Generic objects:
                    Edge based descriptors




       Gavrila, Philomin, ICCV 1999
                                                Papageorgiou & Poggio (2000)




J. Shotton, A. Blake, R. Cipolla. PAMI 2008.




                                               Opelt, Pinz, Zisserman, ECCV 2006
Histograms of oriented gradients

                           Shape context
SIFT, D. Lowe, ICCV 1999   Belongie, Malik, Puzicha, NIPS 2000
Histograms of oriented gradients
           Dalal & Trigs, 2006




                                 x       Not a person




                                     x   person
Adding parts
Felzenszwalb, McAllester, Ramanan. 2008.
Adding parts




          Felzenszwalb, McAllester, Ramanan. 2008.
Felzenszwalb, McAllester, Ramanan. 2008.
Evaluation of performance
                  Before plotting and ROC or precision-recall curves…




The detector challenge: by looking at the output of a detector on a random set
of images, can you guess which object is it trying to detect?
What object is detector trying to detect?




The detector challenge: by looking at the output of a detector on a random set
of images, can you guess which object is it trying to detect?
What object is detector trying to detect?




                                    Table detector




The detector challenge: by looking at the output of a detector on a random set
of images, can you guess which object is it trying to detect?
7               5
                6                                        4

                         3
             1 2

1. chair, 2. table, 3. road, 4. road, 5. table, 6. car, 7. keyboard.
Some symptoms of standard approaches
NIPS2009: Understand Visual Scenes - Part 1
NIPS2009: Understand Visual Scenes - Part 1
NIPS2009: Understand Visual Scenes - Part 1
NIPS2009: Understand Visual Scenes - Part 1
Scenes rule over objects




3D percept is driven by the scene, which imposes its ruling to the objects
Scene recognition
The gist of the scene
Mary Potter (1976)
Mary Potter (1975, 1976) demonstrated that during a rapid sequential
visual presentation (100 msec per image), a novel picture is instantly
understood and observers seem to comprehend a lot of visual
information
Demo : Rapid image understanding
By Aude Oliva




    Instructions: 9 photographs will be shown for
     half a second each. Your task is to memorize
     these pictures
NIPS2009: Understand Visual Scenes - Part 1
NIPS2009: Understand Visual Scenes - Part 1
NIPS2009: Understand Visual Scenes - Part 1
NIPS2009: Understand Visual Scenes - Part 1
NIPS2009: Understand Visual Scenes - Part 1
NIPS2009: Understand Visual Scenes - Part 1
NIPS2009: Understand Visual Scenes - Part 1
NIPS2009: Understand Visual Scenes - Part 1
NIPS2009: Understand Visual Scenes - Part 1
NIPS2009: Understand Visual Scenes - Part 1
Memory Test


Which of the following pictures have you seen ?

         If you have seen the image
             clap your hands once

        If you have not seen the image
                  do nothing
Have you seen this picture ?
NO
Have you seen this picture ?
NO
Have you seen this picture ?
NO
Have you seen this picture ?
NO
Have you seen this picture ?
Yes
Have you seen this picture ?
NO
You have seen these pictures




You were tested with these pictures
The gist of the scene

In a glance, we remember the meaning of an
   image and its global layout but some
   objects and details are forgotten
From objects to scenes
    SceneType 2 {street, office, …}             S




Object localization O1   O1     O1     O1           O2   O2       O2   O2



       Local features      L         L              L         L        Riesenhuber & Poggio (99); Vidal
                                                                       -Naquet & Ullman (03); Serre &
                                                                        Poggio, (05); Agarwal & Roth, (02),
                                                                        Moghaddam, Pentland (97), Turk,
                                                                        Pentland (91),Vidal-Naquet, Ullman,
                                            I                           (03) Heisele, et al, (01), Agarwal &
                               Image                                    Roth, (02), Kremp, Geman, Amit (02),
                                                                        Dorko, Schmid, (03) Fergus, Perona,
                                                                        Zisserman (03), Fei Fei, Fergus,
                                                                        Perona, (03), Schneiderman, Kanade
                                                                        (00), Lowe (99)
What makes scenes different?




                                          Ceiling
          Ceiling                          Lamp                                  wall
           Light                                                painting
            Door Door                     Painting     mirror
         Door                  mirror
Wall   Door      Wall Door                 wall                   wall                  Lamp
                                                                                   phone
                                     Fireplace                             Bed         alarm
                              armchair       armchair
          Floor                                                                    Side-table
                                        Coffee table                              carpet

                    Different objects, different spatial layout
What makes scenes different?




                                                 ceiling
                                                                       lamp          lamp
cabinet
                                                            glasses     mirror
            Window           mirror         mirror
                                      shelves        wall
                                                            beer
   dishes    faucet           faucet              towel                    counter
              sink counter     sink soap     sink
                                       counter
                                                cabinet
 cabinet     cabinet
                                cabinet                            stool

                     Different objects, similar spatial layout
What makes scenes different?



                                                                           ceiling
            ceiling                          window
                                                        table         wall                   window
                                 table table          chair
window     wall                  table          table                            table
                      window       chair                       table chair          table chair
            table                                 chair chair                            table
            table chair      chair      table
  bottle                                              chair                            chair chair
            table                        chair              table          floor                table
  chair            chair             floor     chair
                               chair


                      Similar objects, different spatial layout
What makes scenes different?



                                                                           ceiling
            ceiling                          window
                                                         table         wall                   window
                                   table table         chair
window     wall                    table         table                            table
                      window         chair                      table chair          table chair
            table                                  chair chair                            table
            table chair      chair       table
  bottle                                               chair                            chair chair
            table                         chair              table          floor                table
  chair            chair              floor     chair
                               chair


                      Similar objects, different spatial layout
What makes scenes different?




  cabinets   ceiling               cabinets    ceiling                          ceiling
                        cabinets                         cabinets
                                                                     wall        screen
window      window      window                                        column
           seat seat               window      seat seat window             seat seat
        seat                                seat                           seat    seat
      seat         seat                   seat        seat            seat seat seat seat
  seat                seat            seat               seat         seat seat seat seat
                         seat                               seat
                                                                     seat seat    seat seat


                       Similar objects, and similar spatial layout
             Different lighting, different materials, very specific object categories
What can be an alternative to
          objects?
Scene emergent features
“Recognition via features that are not those of individual objects but “emerge” as
 objects are brought into relation to each other to form a scene.” – Biederman 81




                From “on the semantics of a glance at a scene”, Biederman, 1981
Examples of scene emergent features




  Suggestive edges and junctions   Simple geometric forms




              Blobs                       Textures
Ensemble statistics
Ariely, 2001, Seeing sets: Representation by statistical properties
 Chong, Treisman, 2003, Representation of statistical properties
      Alvarez, Oliva, 2008, 2009, Spatial ensemble statistics

                                               Conclusion: observers had
                                               more accurate representation
                                               of the mean than of the
                                               individual members of the set.
From scenes to objects
    SceneType 2 {street, office, …}             S




Object localization O1   O1     O1     O1           O2   O2       O2   O2
                                                                                   G

                                                                            •  Scene emergent
       Local features      L         L              L         L             •  Ensemble statistics
                                                                            •  Global features


                                            I
                               Image
How far can we go without objects?
 SceneType 2 {street, office, …}        S




                                                             G
                                                    •  Scene emergent
    Local features      L       L           L   L   •  Ensemble statistics
                                                    •  Global features


                                    I
                            Image
Global image descriptors
Global image descriptors
Bag of words                                     Spatially organized textures



                                                   M. Gorkani, R. Picard, ICPR 1994
                                                   A. Oliva, A. Torralba, IJCV 2001
Sivic et. al., ICCV 2005
Fei-Fei and Perona, CVPR 2005


Non localized textons



Walker, Malik. Vision Research 2004

                                      …            S. Lazebnik, et al, CVPR 2006                  …
R. Datta, D. Joshi, J. Li, and J. Z. Wang, Image Retrieval: Ideas, Influences, and Trends of the New Age,
ACM Computing Surveys, vol. 40, no. 2, pp. 5:1-60, 2008.
Gist descriptor
                                          Oliva and Torralba, 2001

                                                                      •  Apply oriented Gabor filters
                                                                       over different scales
                                                                      •  Average filter energy
                                                                       in each bin




                                                                            8       orientations
                                                                            4       scales
                                                                         x 16       bins
                                                                          512       dimensions

                  Similar to SIFT (Lowe 1999) applied to the entire image
M. Gorkani, R. Picard, ICPR 1994; Walker, Malik. Vision Research 2004; Vogel et al. 2004;
Fei-Fei and Perona, CVPR 2005; S. Lazebnik, et al, CVPR 2006; …
Textons
Filter bank         K-means (100 clusters)




              Malik, Belongie, Shi, Leung, 1999




              Walker, Malik, 2004
Bag of words &
                     spatial pyramid matching
Sivic, Zisserman, 2003. Visual words = Kmeans of SIFT descriptors




                                                                    S. Lazebnik, et al, CVPR 2006
The 15-scenes benchmark
                                                              Oliva & Torralba, 2001
                                                              Fei Fei & Perona, 2005
                                                              Lazebnik, et al 2006




                                                                         Office




Skyscrapers   Suburb Building facade Coast     Forest      Bedroom    Living room




 Industrial    Street   Highway    Mountain Open country    Kitchen        Store
Scene recognition
100 training samples per class
SVM classifier in both cases
                                 Human
                                 performance
Large Scale Scene Recognition
                          Urban      Nature       Indoor 
~1,000 categories 
>130,000 images 
>12,000 fully annotated
 images 




                                   Xiao, Hays, Ehinger, Oliva, Torralba; maybe 2010
NIPS2009: Understand Visual Scenes - Part 1
Performance with 400 categories




                 Xiao, Hays, Ehinger, Oliva, Torralba; maybe 2010
Training images


       Abbey




Airplane cabin




Airport terminal




         Alley



Amphitheater


                                     Xiao, Hays, Ehinger, Oliva, Torralba; maybe 2010
Training images   Correct classifications


       Abbey




Airplane cabin




Airport terminal




         Alley



Amphitheater


                                               Xiao, Hays, Ehinger, Oliva, Torralba; maybe 2010
Training images   Correct classifications       Miss-classifications
                                                                Monastery    Cathedral Castle


       Abbey


                                                                 Toy shop    Van     Discotheque


Airplane cabin


                                                                 Subway      Stage    Restaurant

Airport terminal

                                                               Restaurant Courtyard     Canal
                                                               patio

         Alley

                                                                  Harbor     Coast     Athletic
                                                                                       field
Amphitheater


                                               Xiao, Hays, Ehinger, Oliva, Torralba; maybe 2010
Example of three different scenes
         RIVER 




BEACH 
                    VILLAGE 




                  Xiao, Hays, Ehinger, Oliva, Torralba; maybe 2010
But they are all part of the same picture




                       Xiao, Hays, Ehinger, Oliva, Torralba; maybe 2010
But they are all part of the same picture




                       Xiao, Hays, Ehinger, Oliva, Torralba; maybe 2010
Scene detection




         Xiao, Hays, Ehinger, Oliva, Torralba; maybe 2010
Categories or a continuous space?




                      Check poster by Malisiewicz, Efros 
Categories or a continuous space?
       From the city to the mountains in 10 steps
Spatial envelope:
                      a continuous space of scenes

                                                   Highway
                                                   Street
                                                   City centre
                                                   Tall Building
Degree of Expansion




                              Degree of Openness     Oliva & Torralba, 2001
Spatial envelope:
                       a continuous space of scenes
                                                  Coast
                                                  Countryside
                                                  Forest
                                                  Mountain
Degree of Ruggedness




                             Degree of Openness
                                                    Oliva & Torralba, 2001
Context for object recognition
Who needs context anyway? 
We can recognize objects even out of context 




                                                Banksy
Look‐Alikes by Joan Steiner 




Even in high resolution, we can not shut down contextual processing and it is hard
    to recognize the true identities of the elements that compose this scene.
Why is context important? 
•  Changes the interpretation of an object (or its function)




•  Context defines what an unexpected event is
Objects and Scenes 




Biederman’s violaPons (1981): 
Global precedence 
Forest Before Trees: The Precedence of Global Features in Visual
Perception
Navon (1977)
Scene recognition without
   object recognition
                                            Scene
                                                    S




                                                    g

                                                  Scene
                                                 features




    Murphy, Torralba, Freeman; NIPS 2003. Torralba, Murphy, Freeman, CACM 2010.
An integrated model of Scenes,
       Objects, and Parts
                                              Scene
                                                       S




                                 Ncar
                           P(Ncar | S = street)



                           01     5             N      g
                           P(Ncar | S = park)
                                                     Scene
                                                      gist
                                                    features

                           01     5            N

      Murphy, Torralba, Freeman; NIPS 2003. Torralba, Murphy, Freeman, CACM 2010.
Object retrieval: scene features vs. detector 
Results using the keyboard detector alone




Results using both the detector
and the global scene features




                     Murphy, Torralba, Freeman; NIPS 2003. Torralba, Murphy, Freeman, CACM 2010.
The layered structure of scenes 
                    Assuming a human observer standing on the ground




                              p(x)                       p(x2|x1)
In a display with multiple targets present, the location of one target constraints the ‘y’
coordinate of the remaining targets, but not the ‘x’ coordinate.
                                                      Torralba, Oliva, Castelhano, Henderson. 2006
Context driven object detection

                                              Scene
                                                         S




                                 Ncar          Zcar
                           P(Ncar | S = street)



                           01     5            N         g

                                                       Scene
                                                        gist
                                                      features




      Murphy, Torralba, Freeman; NIPS 2003. Torralba, Murphy, Freeman, CACM 2010.
An integrated model of Scenes,
       Objects, and Parts
                              We train a multiview car detector.




                                      car
                                     Fi            p(d | F=1) = N(d | µ1, σ1)
                                                   p(d | F=0) = N(d | µ0, σ0)

                             dcari        xcari

                                             N=4



      Murphy, Torralba, Freeman; NIPS 2003. Torralba, Murphy, Freeman, CACM 2010.
An integrated model of Scenes,
       Objects, and Parts
                                                    Scene
                                                              S




                                     Ncar           Zcar




                                      car
                                     Fi
                                                              g

                                                            Scene
                             dcari        xcari              gist
                                                           features
                                              M=4




      Murphy, Torralba, Freeman; NIPS 2003. Torralba, Murphy, Freeman, CACM 2010.
Two tasks 
NIPS2009: Understand Visual Scenes - Part 1

Weitere ähnliche Inhalte

Ähnlich wie NIPS2009: Understand Visual Scenes - Part 1

Iccv2009 recognition and learning object categories p0 c00 - introduction
Iccv2009 recognition and learning object categories   p0 c00 - introductionIccv2009 recognition and learning object categories   p0 c00 - introduction
Iccv2009 recognition and learning object categories p0 c00 - introductionzukun
 
CVPR2010: modeling mutual context of object and human pose in human-object in...
CVPR2010: modeling mutual context of object and human pose in human-object in...CVPR2010: modeling mutual context of object and human pose in human-object in...
CVPR2010: modeling mutual context of object and human pose in human-object in...zukun
 
Mit6870 orsu lecture11
Mit6870 orsu lecture11Mit6870 orsu lecture11
Mit6870 orsu lecture11zukun
 
Object recognition
Object recognitionObject recognition
Object recognitionakkichester
 
NIPS2009: Understand Visual Scenes - Part 2
NIPS2009: Understand Visual Scenes - Part 2NIPS2009: Understand Visual Scenes - Part 2
NIPS2009: Understand Visual Scenes - Part 2zukun
 
Cvpr2007 object category recognition p3 - discriminative models
Cvpr2007 object category recognition   p3 - discriminative modelsCvpr2007 object category recognition   p3 - discriminative models
Cvpr2007 object category recognition p3 - discriminative modelszukun
 
Mit6870 orsu lecture2
Mit6870 orsu lecture2Mit6870 orsu lecture2
Mit6870 orsu lecture2zukun
 
Introduction vision
Introduction visionIntroduction vision
Introduction visionAshish Kumar
 
Iccv2009 recognition and learning object categories p2 c02 - recognizing mu...
Iccv2009 recognition and learning object categories   p2 c02 - recognizing mu...Iccv2009 recognition and learning object categories   p2 c02 - recognizing mu...
Iccv2009 recognition and learning object categories p2 c02 - recognizing mu...zukun
 
Lec18 bag of_features
Lec18 bag of_featuresLec18 bag of_features
Lec18 bag of_featuresBo Li
 
Iccv2009 recognition and learning object categories p1 c01 - classical methods
Iccv2009 recognition and learning object categories   p1 c01 - classical methodsIccv2009 recognition and learning object categories   p1 c01 - classical methods
Iccv2009 recognition and learning object categories p1 c01 - classical methodszukun
 

Ähnlich wie NIPS2009: Understand Visual Scenes - Part 1 (20)

Iccv2009 recognition and learning object categories p0 c00 - introduction
Iccv2009 recognition and learning object categories   p0 c00 - introductionIccv2009 recognition and learning object categories   p0 c00 - introduction
Iccv2009 recognition and learning object categories p0 c00 - introduction
 
CVPR2010: modeling mutual context of object and human pose in human-object in...
CVPR2010: modeling mutual context of object and human pose in human-object in...CVPR2010: modeling mutual context of object and human pose in human-object in...
CVPR2010: modeling mutual context of object and human pose in human-object in...
 
Mit6870 orsu lecture11
Mit6870 orsu lecture11Mit6870 orsu lecture11
Mit6870 orsu lecture11
 
Object recognition
Object recognitionObject recognition
Object recognition
 
NIPS2009: Understand Visual Scenes - Part 2
NIPS2009: Understand Visual Scenes - Part 2NIPS2009: Understand Visual Scenes - Part 2
NIPS2009: Understand Visual Scenes - Part 2
 
Ventral stream
Ventral streamVentral stream
Ventral stream
 
Promising avenues for interdisciplinary research in vision
Promising avenues for interdisciplinary research in visionPromising avenues for interdisciplinary research in vision
Promising avenues for interdisciplinary research in vision
 
Cvpr2007 object category recognition p3 - discriminative models
Cvpr2007 object category recognition   p3 - discriminative modelsCvpr2007 object category recognition   p3 - discriminative models
Cvpr2007 object category recognition p3 - discriminative models
 
Mit6870 orsu lecture2
Mit6870 orsu lecture2Mit6870 orsu lecture2
Mit6870 orsu lecture2
 
Introduction vision
Introduction visionIntroduction vision
Introduction vision
 
Iccv2009 recognition and learning object categories p2 c02 - recognizing mu...
Iccv2009 recognition and learning object categories   p2 c02 - recognizing mu...Iccv2009 recognition and learning object categories   p2 c02 - recognizing mu...
Iccv2009 recognition and learning object categories p2 c02 - recognizing mu...
 
Pc Seminar Jordi
Pc Seminar JordiPc Seminar Jordi
Pc Seminar Jordi
 
perception
perceptionperception
perception
 
Perception
PerceptionPerception
Perception
 
Lec18 bag of_features
Lec18 bag of_featuresLec18 bag of_features
Lec18 bag of_features
 
Chapter4
Chapter4Chapter4
Chapter4
 
Perception
PerceptionPerception
Perception
 
Iccv2009 recognition and learning object categories p1 c01 - classical methods
Iccv2009 recognition and learning object categories   p1 c01 - classical methodsIccv2009 recognition and learning object categories   p1 c01 - classical methods
Iccv2009 recognition and learning object categories p1 c01 - classical methods
 
Where is my mind?
Where is my mind?Where is my mind?
Where is my mind?
 
Part1
Part1Part1
Part1
 

Mehr von zukun

My lyn tutorial 2009
My lyn tutorial 2009My lyn tutorial 2009
My lyn tutorial 2009zukun
 
ETHZ CV2012: Tutorial openCV
ETHZ CV2012: Tutorial openCVETHZ CV2012: Tutorial openCV
ETHZ CV2012: Tutorial openCVzukun
 
ETHZ CV2012: Information
ETHZ CV2012: InformationETHZ CV2012: Information
ETHZ CV2012: Informationzukun
 
Siwei lyu: natural image statistics
Siwei lyu: natural image statisticsSiwei lyu: natural image statistics
Siwei lyu: natural image statisticszukun
 
Lecture9 camera calibration
Lecture9 camera calibrationLecture9 camera calibration
Lecture9 camera calibrationzukun
 
Brunelli 2008: template matching techniques in computer vision
Brunelli 2008: template matching techniques in computer visionBrunelli 2008: template matching techniques in computer vision
Brunelli 2008: template matching techniques in computer visionzukun
 
Modern features-part-4-evaluation
Modern features-part-4-evaluationModern features-part-4-evaluation
Modern features-part-4-evaluationzukun
 
Modern features-part-3-software
Modern features-part-3-softwareModern features-part-3-software
Modern features-part-3-softwarezukun
 
Modern features-part-2-descriptors
Modern features-part-2-descriptorsModern features-part-2-descriptors
Modern features-part-2-descriptorszukun
 
Modern features-part-1-detectors
Modern features-part-1-detectorsModern features-part-1-detectors
Modern features-part-1-detectorszukun
 
Modern features-part-0-intro
Modern features-part-0-introModern features-part-0-intro
Modern features-part-0-introzukun
 
Lecture 02 internet video search
Lecture 02 internet video searchLecture 02 internet video search
Lecture 02 internet video searchzukun
 
Lecture 01 internet video search
Lecture 01 internet video searchLecture 01 internet video search
Lecture 01 internet video searchzukun
 
Lecture 03 internet video search
Lecture 03 internet video searchLecture 03 internet video search
Lecture 03 internet video searchzukun
 
Icml2012 tutorial representation_learning
Icml2012 tutorial representation_learningIcml2012 tutorial representation_learning
Icml2012 tutorial representation_learningzukun
 
Advances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer visionAdvances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer visionzukun
 
Gephi tutorial: quick start
Gephi tutorial: quick startGephi tutorial: quick start
Gephi tutorial: quick startzukun
 
EM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysisEM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysiszukun
 
Object recognition with pictorial structures
Object recognition with pictorial structuresObject recognition with pictorial structures
Object recognition with pictorial structureszukun
 
Iccv2011 learning spatiotemporal graphs of human activities
Iccv2011 learning spatiotemporal graphs of human activities Iccv2011 learning spatiotemporal graphs of human activities
Iccv2011 learning spatiotemporal graphs of human activities zukun
 

Mehr von zukun (20)

My lyn tutorial 2009
My lyn tutorial 2009My lyn tutorial 2009
My lyn tutorial 2009
 
ETHZ CV2012: Tutorial openCV
ETHZ CV2012: Tutorial openCVETHZ CV2012: Tutorial openCV
ETHZ CV2012: Tutorial openCV
 
ETHZ CV2012: Information
ETHZ CV2012: InformationETHZ CV2012: Information
ETHZ CV2012: Information
 
Siwei lyu: natural image statistics
Siwei lyu: natural image statisticsSiwei lyu: natural image statistics
Siwei lyu: natural image statistics
 
Lecture9 camera calibration
Lecture9 camera calibrationLecture9 camera calibration
Lecture9 camera calibration
 
Brunelli 2008: template matching techniques in computer vision
Brunelli 2008: template matching techniques in computer visionBrunelli 2008: template matching techniques in computer vision
Brunelli 2008: template matching techniques in computer vision
 
Modern features-part-4-evaluation
Modern features-part-4-evaluationModern features-part-4-evaluation
Modern features-part-4-evaluation
 
Modern features-part-3-software
Modern features-part-3-softwareModern features-part-3-software
Modern features-part-3-software
 
Modern features-part-2-descriptors
Modern features-part-2-descriptorsModern features-part-2-descriptors
Modern features-part-2-descriptors
 
Modern features-part-1-detectors
Modern features-part-1-detectorsModern features-part-1-detectors
Modern features-part-1-detectors
 
Modern features-part-0-intro
Modern features-part-0-introModern features-part-0-intro
Modern features-part-0-intro
 
Lecture 02 internet video search
Lecture 02 internet video searchLecture 02 internet video search
Lecture 02 internet video search
 
Lecture 01 internet video search
Lecture 01 internet video searchLecture 01 internet video search
Lecture 01 internet video search
 
Lecture 03 internet video search
Lecture 03 internet video searchLecture 03 internet video search
Lecture 03 internet video search
 
Icml2012 tutorial representation_learning
Icml2012 tutorial representation_learningIcml2012 tutorial representation_learning
Icml2012 tutorial representation_learning
 
Advances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer visionAdvances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer vision
 
Gephi tutorial: quick start
Gephi tutorial: quick startGephi tutorial: quick start
Gephi tutorial: quick start
 
EM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysisEM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysis
 
Object recognition with pictorial structures
Object recognition with pictorial structuresObject recognition with pictorial structures
Object recognition with pictorial structures
 
Iccv2011 learning spatiotemporal graphs of human activities
Iccv2011 learning spatiotemporal graphs of human activities Iccv2011 learning spatiotemporal graphs of human activities
Iccv2011 learning spatiotemporal graphs of human activities
 

Kürzlich hochgeladen

5 charts on South Africa as a source country for international student recrui...
5 charts on South Africa as a source country for international student recrui...5 charts on South Africa as a source country for international student recrui...
5 charts on South Africa as a source country for international student recrui...CaraSkikne1
 
The Singapore Teaching Practice document
The Singapore Teaching Practice documentThe Singapore Teaching Practice document
The Singapore Teaching Practice documentXsasf Sfdfasd
 
Practical Research 1: Lesson 8 Writing the Thesis Statement.pptx
Practical Research 1: Lesson 8 Writing the Thesis Statement.pptxPractical Research 1: Lesson 8 Writing the Thesis Statement.pptx
Practical Research 1: Lesson 8 Writing the Thesis Statement.pptxKatherine Villaluna
 
UKCGE Parental Leave Discussion March 2024
UKCGE Parental Leave Discussion March 2024UKCGE Parental Leave Discussion March 2024
UKCGE Parental Leave Discussion March 2024UKCGE
 
M-2- General Reactions of amino acids.pptx
M-2- General Reactions of amino acids.pptxM-2- General Reactions of amino acids.pptx
M-2- General Reactions of amino acids.pptxDr. Santhosh Kumar. N
 
Patient Counselling. Definition of patient counseling; steps involved in pati...
Patient Counselling. Definition of patient counseling; steps involved in pati...Patient Counselling. Definition of patient counseling; steps involved in pati...
Patient Counselling. Definition of patient counseling; steps involved in pati...raviapr7
 
The basics of sentences session 10pptx.pptx
The basics of sentences session 10pptx.pptxThe basics of sentences session 10pptx.pptx
The basics of sentences session 10pptx.pptxheathfieldcps1
 
Ultra structure and life cycle of Plasmodium.pptx
Ultra structure and life cycle of Plasmodium.pptxUltra structure and life cycle of Plasmodium.pptx
Ultra structure and life cycle of Plasmodium.pptxDr. Asif Anas
 
Quality Assurance_GOOD LABORATORY PRACTICE
Quality Assurance_GOOD LABORATORY PRACTICEQuality Assurance_GOOD LABORATORY PRACTICE
Quality Assurance_GOOD LABORATORY PRACTICESayali Powar
 
The Stolen Bacillus by Herbert George Wells
The Stolen Bacillus by Herbert George WellsThe Stolen Bacillus by Herbert George Wells
The Stolen Bacillus by Herbert George WellsEugene Lysak
 
Education and training program in the hospital APR.pptx
Education and training program in the hospital APR.pptxEducation and training program in the hospital APR.pptx
Education and training program in the hospital APR.pptxraviapr7
 
Drug Information Services- DIC and Sources.
Drug Information Services- DIC and Sources.Drug Information Services- DIC and Sources.
Drug Information Services- DIC and Sources.raviapr7
 
In - Vivo and In - Vitro Correlation.pptx
In - Vivo and In - Vitro Correlation.pptxIn - Vivo and In - Vitro Correlation.pptx
In - Vivo and In - Vitro Correlation.pptxAditiChauhan701637
 
How to Manage Cross-Selling in Odoo 17 Sales
How to Manage Cross-Selling in Odoo 17 SalesHow to Manage Cross-Selling in Odoo 17 Sales
How to Manage Cross-Selling in Odoo 17 SalesCeline George
 
How to Add a New Field in Existing Kanban View in Odoo 17
How to Add a New Field in Existing Kanban View in Odoo 17How to Add a New Field in Existing Kanban View in Odoo 17
How to Add a New Field in Existing Kanban View in Odoo 17Celine George
 
PISA-VET launch_El Iza Mohamedou_19 March 2024.pptx
PISA-VET launch_El Iza Mohamedou_19 March 2024.pptxPISA-VET launch_El Iza Mohamedou_19 March 2024.pptx
PISA-VET launch_El Iza Mohamedou_19 March 2024.pptxEduSkills OECD
 
What is the Future of QuickBooks DeskTop?
What is the Future of QuickBooks DeskTop?What is the Future of QuickBooks DeskTop?
What is the Future of QuickBooks DeskTop?TechSoup
 
CAULIFLOWER BREEDING 1 Parmar pptx
CAULIFLOWER BREEDING 1 Parmar pptxCAULIFLOWER BREEDING 1 Parmar pptx
CAULIFLOWER BREEDING 1 Parmar pptxSaurabhParmar42
 
Maximizing Impact_ Nonprofit Website Planning, Budgeting, and Design.pdf
Maximizing Impact_ Nonprofit Website Planning, Budgeting, and Design.pdfMaximizing Impact_ Nonprofit Website Planning, Budgeting, and Design.pdf
Maximizing Impact_ Nonprofit Website Planning, Budgeting, and Design.pdfTechSoup
 

Kürzlich hochgeladen (20)

5 charts on South Africa as a source country for international student recrui...
5 charts on South Africa as a source country for international student recrui...5 charts on South Africa as a source country for international student recrui...
5 charts on South Africa as a source country for international student recrui...
 
The Singapore Teaching Practice document
The Singapore Teaching Practice documentThe Singapore Teaching Practice document
The Singapore Teaching Practice document
 
Practical Research 1: Lesson 8 Writing the Thesis Statement.pptx
Practical Research 1: Lesson 8 Writing the Thesis Statement.pptxPractical Research 1: Lesson 8 Writing the Thesis Statement.pptx
Practical Research 1: Lesson 8 Writing the Thesis Statement.pptx
 
UKCGE Parental Leave Discussion March 2024
UKCGE Parental Leave Discussion March 2024UKCGE Parental Leave Discussion March 2024
UKCGE Parental Leave Discussion March 2024
 
Prelims of Kant get Marx 2.0: a general politics quiz
Prelims of Kant get Marx 2.0: a general politics quizPrelims of Kant get Marx 2.0: a general politics quiz
Prelims of Kant get Marx 2.0: a general politics quiz
 
M-2- General Reactions of amino acids.pptx
M-2- General Reactions of amino acids.pptxM-2- General Reactions of amino acids.pptx
M-2- General Reactions of amino acids.pptx
 
Patient Counselling. Definition of patient counseling; steps involved in pati...
Patient Counselling. Definition of patient counseling; steps involved in pati...Patient Counselling. Definition of patient counseling; steps involved in pati...
Patient Counselling. Definition of patient counseling; steps involved in pati...
 
The basics of sentences session 10pptx.pptx
The basics of sentences session 10pptx.pptxThe basics of sentences session 10pptx.pptx
The basics of sentences session 10pptx.pptx
 
Ultra structure and life cycle of Plasmodium.pptx
Ultra structure and life cycle of Plasmodium.pptxUltra structure and life cycle of Plasmodium.pptx
Ultra structure and life cycle of Plasmodium.pptx
 
Quality Assurance_GOOD LABORATORY PRACTICE
Quality Assurance_GOOD LABORATORY PRACTICEQuality Assurance_GOOD LABORATORY PRACTICE
Quality Assurance_GOOD LABORATORY PRACTICE
 
The Stolen Bacillus by Herbert George Wells
The Stolen Bacillus by Herbert George WellsThe Stolen Bacillus by Herbert George Wells
The Stolen Bacillus by Herbert George Wells
 
Education and training program in the hospital APR.pptx
Education and training program in the hospital APR.pptxEducation and training program in the hospital APR.pptx
Education and training program in the hospital APR.pptx
 
Drug Information Services- DIC and Sources.
Drug Information Services- DIC and Sources.Drug Information Services- DIC and Sources.
Drug Information Services- DIC and Sources.
 
In - Vivo and In - Vitro Correlation.pptx
In - Vivo and In - Vitro Correlation.pptxIn - Vivo and In - Vitro Correlation.pptx
In - Vivo and In - Vitro Correlation.pptx
 
How to Manage Cross-Selling in Odoo 17 Sales
How to Manage Cross-Selling in Odoo 17 SalesHow to Manage Cross-Selling in Odoo 17 Sales
How to Manage Cross-Selling in Odoo 17 Sales
 
How to Add a New Field in Existing Kanban View in Odoo 17
How to Add a New Field in Existing Kanban View in Odoo 17How to Add a New Field in Existing Kanban View in Odoo 17
How to Add a New Field in Existing Kanban View in Odoo 17
 
PISA-VET launch_El Iza Mohamedou_19 March 2024.pptx
PISA-VET launch_El Iza Mohamedou_19 March 2024.pptxPISA-VET launch_El Iza Mohamedou_19 March 2024.pptx
PISA-VET launch_El Iza Mohamedou_19 March 2024.pptx
 
What is the Future of QuickBooks DeskTop?
What is the Future of QuickBooks DeskTop?What is the Future of QuickBooks DeskTop?
What is the Future of QuickBooks DeskTop?
 
CAULIFLOWER BREEDING 1 Parmar pptx
CAULIFLOWER BREEDING 1 Parmar pptxCAULIFLOWER BREEDING 1 Parmar pptx
CAULIFLOWER BREEDING 1 Parmar pptx
 
Maximizing Impact_ Nonprofit Website Planning, Budgeting, and Design.pdf
Maximizing Impact_ Nonprofit Website Planning, Budgeting, and Design.pdfMaximizing Impact_ Nonprofit Website Planning, Budgeting, and Design.pdf
Maximizing Impact_ Nonprofit Website Planning, Budgeting, and Design.pdf
 

NIPS2009: Understand Visual Scenes - Part 1

  • 1. Understanding Visual Scenes Antonio Torralba Computer Science and Artificial Intelligence Laboratory (CSAIL) Department of Electrical Engineering and Computer Science
  • 4. SKY TREE PERSON BENCH PERSON PATH LAKE PERSON DUCK PERSON DUCK SIGN DUCK GRASS
  • 5. A VIEW OF A PARK ON A NICE SPRING DAY
  • 6. PEOPLE WALKING IN THE PARK Do not PERSON FEEDING feed DUCKS IN THE PARK the ducks DUCKS LOOKING FOR FOOD sign
  • 7. PEOPLE UNDER THE SHADOW OF THE TREES DUCKS ON TOP OF THE GRASS
  • 8. Why do we care about recognition? Perception of function: We can perceive the 3D shape, texture, material properties, without knowing about objects. But, the concept of category encapsulates also information about what can we do with those objects. “We therefore include the perception of function as a proper –indeed, crucial- subject for vision science”, from Vision Science, chapter 9, Palmer.
  • 9. The perception of function •  Direct perception (affordances): Gibson Flat surface Horizontal Sittable Knee-high upon … •  Mediated perception (Categorization) Flat surface Horizontal Chair Sittable Knee-high upon … Chair Chair Chair?
  • 10. Direct perception Some aspects of an object function can be perceived directly •  Functional form: Some forms clearly indicate to a function (“sittable-upon”, container, cutting device, …) Sittable-upon Sittable-upon It does not seem easy to sit-upon this… Sittable-upon
  • 11. Scenes, as objects, also have affordances
  • 12. The function of the scene
  • 14. Direct perception Some aspects of an object function can be perceived directly •  Observer relativity: Function is observer dependent From http://lastchancerescueflint.org
  • 15. Limitations of Direct Perception Objects of similar structure might have very different functions Not all functions seem to be available from direct visual information only.
  • 16. Limitations of Direct Perception
  • 17. Object detection and recognition Short overview of current approaches
  • 18. Object recognition Is it really so hard? Find the chair in this image Output of normalized correlation This is a chair
  • 19. Object recognition Is it really so hard? Find the chair in this image Pretty much garbage Simple template matching is not going to make it
  • 20. So, let’s make the problem simpler: Block world Object Recognition in the Geometric Era: a Retrospective. Joseph L. Mundy. 2006
  • 21. Binford and generalized Recognition by cylinders components Irving Biederman Recognition-by-Components: A Theory of Human Image Understanding. Psychological Review, 1987. Object Recognition in the Geometric Era: Introduced in computer vision by A. Pentland, 1986. a Retrospective. Joseph L. Mundy. 2006
  • 22. Families of recognition algorithms Voting models Shape matching Bag of words models Deformable models Viola and Jones, ICCV 2001 Berg, Berg, Malik, 2005 Csurka, Dance, Fan, Willamowski, and Heisele, Poggio, et. al., NIPS 01 Cootes, Edwards, Taylor, 2001 Bray 2004 Schneiderman, Kanade 2004 Sivic, Russell, Freeman, Zisserman, Vidal-Naquet, Ullman 2003 ICCV 2005 Rigid template models Constellation models Fischler and Elschlager, 1973 Sirovich and Kirby 1987 Turk, Pentland, 1991 Burl, Leung, and Perona, 1995 Weber, Welling, and Perona, 2000 Dalal & Triggs, 2006 Fergus, Perona, & Zisserman, CVPR 2003
  • 23. The face age Feret dataset, 1996 DARPA •  The representation and matching of pictorial structures Fischler, Elschlager (1973). •  Face recognition using eigenfaces M. Turk and A. Pentland (1991). •  Human Face Detection in Visual Scenes - Rowley, Baluja, Kanade (1995) •  Graded Learning for Object Detection - Fleuret, Geman (1999) •  Robust Real-time Object Detection - Viola, Jones (2001) •  Feature Reduction and Hierarchy of Classifiers for Fast Object Detection in Video Images - Heisele, Serre, Mukherjee, Poggio (2001) • ….
  • 25. Haar-like filters and cascades Viola and Jones, ICCV 2001 The average intensity in the block is computed with four sums independently of the block size. Also Fleuret and Geman, 2001
  • 26. Generic objects: Edge based descriptors Gavrila, Philomin, ICCV 1999 Papageorgiou & Poggio (2000) J. Shotton, A. Blake, R. Cipolla. PAMI 2008. Opelt, Pinz, Zisserman, ECCV 2006
  • 27. Histograms of oriented gradients Shape context SIFT, D. Lowe, ICCV 1999 Belongie, Malik, Puzicha, NIPS 2000
  • 28. Histograms of oriented gradients Dalal & Trigs, 2006 x Not a person x person
  • 30. Adding parts Felzenszwalb, McAllester, Ramanan. 2008.
  • 32. Evaluation of performance Before plotting and ROC or precision-recall curves… The detector challenge: by looking at the output of a detector on a random set of images, can you guess which object is it trying to detect?
  • 33. What object is detector trying to detect? The detector challenge: by looking at the output of a detector on a random set of images, can you guess which object is it trying to detect?
  • 34. What object is detector trying to detect? Table detector The detector challenge: by looking at the output of a detector on a random set of images, can you guess which object is it trying to detect?
  • 35. 7 5 6 4 3 1 2 1. chair, 2. table, 3. road, 4. road, 5. table, 6. car, 7. keyboard.
  • 36. Some symptoms of standard approaches
  • 41. Scenes rule over objects 3D percept is driven by the scene, which imposes its ruling to the objects
  • 43. Mary Potter (1976) Mary Potter (1975, 1976) demonstrated that during a rapid sequential visual presentation (100 msec per image), a novel picture is instantly understood and observers seem to comprehend a lot of visual information
  • 44. Demo : Rapid image understanding By Aude Oliva Instructions: 9 photographs will be shown for half a second each. Your task is to memorize these pictures
  • 55. Memory Test Which of the following pictures have you seen ? If you have seen the image clap your hands once If you have not seen the image do nothing
  • 56. Have you seen this picture ?
  • 57. NO
  • 58. Have you seen this picture ?
  • 59. NO
  • 60. Have you seen this picture ?
  • 61. NO
  • 62. Have you seen this picture ?
  • 63. NO
  • 64. Have you seen this picture ?
  • 65. Yes
  • 66. Have you seen this picture ?
  • 67. NO
  • 68. You have seen these pictures You were tested with these pictures
  • 69. The gist of the scene In a glance, we remember the meaning of an image and its global layout but some objects and details are forgotten
  • 70. From objects to scenes SceneType 2 {street, office, …} S Object localization O1 O1 O1 O1 O2 O2 O2 O2 Local features L L L L Riesenhuber & Poggio (99); Vidal -Naquet & Ullman (03); Serre & Poggio, (05); Agarwal & Roth, (02), Moghaddam, Pentland (97), Turk, Pentland (91),Vidal-Naquet, Ullman, I (03) Heisele, et al, (01), Agarwal & Image Roth, (02), Kremp, Geman, Amit (02), Dorko, Schmid, (03) Fergus, Perona, Zisserman (03), Fei Fei, Fergus, Perona, (03), Schneiderman, Kanade (00), Lowe (99)
  • 71. What makes scenes different? Ceiling Ceiling Lamp wall Light painting Door Door Painting mirror Door mirror Wall Door Wall Door wall wall Lamp phone Fireplace Bed alarm armchair armchair Floor Side-table Coffee table carpet Different objects, different spatial layout
  • 72. What makes scenes different? ceiling lamp lamp cabinet glasses mirror Window mirror mirror shelves wall beer dishes faucet faucet towel counter sink counter sink soap sink counter cabinet cabinet cabinet cabinet stool Different objects, similar spatial layout
  • 73. What makes scenes different? ceiling ceiling window table wall window table table chair window wall table table table window chair table chair table chair table chair chair table table chair chair table bottle chair chair chair table chair table floor table chair chair floor chair chair Similar objects, different spatial layout
  • 74. What makes scenes different? ceiling ceiling window table wall window table table chair window wall table table table window chair table chair table chair table chair chair table table chair chair table bottle chair chair chair table chair table floor table chair chair floor chair chair Similar objects, different spatial layout
  • 75. What makes scenes different? cabinets ceiling cabinets ceiling ceiling cabinets cabinets wall screen window window window column seat seat window seat seat window seat seat seat seat seat seat seat seat seat seat seat seat seat seat seat seat seat seat seat seat seat seat seat seat seat seat seat seat Similar objects, and similar spatial layout Different lighting, different materials, very specific object categories
  • 76. What can be an alternative to objects?
  • 77. Scene emergent features “Recognition via features that are not those of individual objects but “emerge” as objects are brought into relation to each other to form a scene.” – Biederman 81 From “on the semantics of a glance at a scene”, Biederman, 1981
  • 78. Examples of scene emergent features Suggestive edges and junctions Simple geometric forms Blobs Textures
  • 79. Ensemble statistics Ariely, 2001, Seeing sets: Representation by statistical properties Chong, Treisman, 2003, Representation of statistical properties Alvarez, Oliva, 2008, 2009, Spatial ensemble statistics Conclusion: observers had more accurate representation of the mean than of the individual members of the set.
  • 80. From scenes to objects SceneType 2 {street, office, …} S Object localization O1 O1 O1 O1 O2 O2 O2 O2 G •  Scene emergent Local features L L L L •  Ensemble statistics •  Global features I Image
  • 81. How far can we go without objects? SceneType 2 {street, office, …} S G •  Scene emergent Local features L L L L •  Ensemble statistics •  Global features I Image
  • 83. Global image descriptors Bag of words Spatially organized textures M. Gorkani, R. Picard, ICPR 1994 A. Oliva, A. Torralba, IJCV 2001 Sivic et. al., ICCV 2005 Fei-Fei and Perona, CVPR 2005 Non localized textons Walker, Malik. Vision Research 2004 … S. Lazebnik, et al, CVPR 2006 … R. Datta, D. Joshi, J. Li, and J. Z. Wang, Image Retrieval: Ideas, Influences, and Trends of the New Age, ACM Computing Surveys, vol. 40, no. 2, pp. 5:1-60, 2008.
  • 84. Gist descriptor Oliva and Torralba, 2001 •  Apply oriented Gabor filters over different scales •  Average filter energy in each bin 8 orientations 4 scales x 16 bins 512 dimensions Similar to SIFT (Lowe 1999) applied to the entire image M. Gorkani, R. Picard, ICPR 1994; Walker, Malik. Vision Research 2004; Vogel et al. 2004; Fei-Fei and Perona, CVPR 2005; S. Lazebnik, et al, CVPR 2006; …
  • 85. Textons Filter bank K-means (100 clusters) Malik, Belongie, Shi, Leung, 1999 Walker, Malik, 2004
  • 86. Bag of words & spatial pyramid matching Sivic, Zisserman, 2003. Visual words = Kmeans of SIFT descriptors S. Lazebnik, et al, CVPR 2006
  • 87. The 15-scenes benchmark Oliva & Torralba, 2001 Fei Fei & Perona, 2005 Lazebnik, et al 2006 Office Skyscrapers Suburb Building facade Coast Forest Bedroom Living room Industrial Street Highway Mountain Open country Kitchen Store
  • 88. Scene recognition 100 training samples per class SVM classifier in both cases Human performance
  • 89. Large Scale Scene Recognition Urban  Nature  Indoor  ~1,000 categories  >130,000 images  >12,000 fully annotated  images  Xiao, Hays, Ehinger, Oliva, Torralba; maybe 2010
  • 91. Performance with 400 categories Xiao, Hays, Ehinger, Oliva, Torralba; maybe 2010
  • 92. Training images Abbey Airplane cabin Airport terminal Alley Amphitheater Xiao, Hays, Ehinger, Oliva, Torralba; maybe 2010
  • 93. Training images Correct classifications Abbey Airplane cabin Airport terminal Alley Amphitheater Xiao, Hays, Ehinger, Oliva, Torralba; maybe 2010
  • 94. Training images Correct classifications Miss-classifications Monastery Cathedral Castle Abbey Toy shop Van Discotheque Airplane cabin Subway Stage Restaurant Airport terminal Restaurant Courtyard Canal patio Alley Harbor Coast Athletic field Amphitheater Xiao, Hays, Ehinger, Oliva, Torralba; maybe 2010
  • 95. Example of three different scenes RIVER  BEACH  VILLAGE  Xiao, Hays, Ehinger, Oliva, Torralba; maybe 2010
  • 96. But they are all part of the same picture Xiao, Hays, Ehinger, Oliva, Torralba; maybe 2010
  • 97. But they are all part of the same picture Xiao, Hays, Ehinger, Oliva, Torralba; maybe 2010
  • 98. Scene detection Xiao, Hays, Ehinger, Oliva, Torralba; maybe 2010
  • 99. Categories or a continuous space? Check poster by Malisiewicz, Efros 
  • 100. Categories or a continuous space? From the city to the mountains in 10 steps
  • 101. Spatial envelope: a continuous space of scenes Highway Street City centre Tall Building Degree of Expansion Degree of Openness Oliva & Torralba, 2001
  • 102. Spatial envelope: a continuous space of scenes Coast Countryside Forest Mountain Degree of Ruggedness Degree of Openness Oliva & Torralba, 2001
  • 103. Context for object recognition
  • 105. Look‐Alikes by Joan Steiner  Even in high resolution, we can not shut down contextual processing and it is hard to recognize the true identities of the elements that compose this scene.
  • 106. Why is context important?  •  Changes the interpretation of an object (or its function) •  Context defines what an unexpected event is
  • 108. Global precedence  Forest Before Trees: The Precedence of Global Features in Visual Perception Navon (1977)
  • 109. Scene recognition without object recognition Scene S g Scene features Murphy, Torralba, Freeman; NIPS 2003. Torralba, Murphy, Freeman, CACM 2010.
  • 110. An integrated model of Scenes, Objects, and Parts Scene S Ncar P(Ncar | S = street) 01 5 N g P(Ncar | S = park) Scene gist features 01 5 N Murphy, Torralba, Freeman; NIPS 2003. Torralba, Murphy, Freeman, CACM 2010.
  • 111. Object retrieval: scene features vs. detector  Results using the keyboard detector alone Results using both the detector and the global scene features Murphy, Torralba, Freeman; NIPS 2003. Torralba, Murphy, Freeman, CACM 2010.
  • 112. The layered structure of scenes  Assuming a human observer standing on the ground p(x) p(x2|x1) In a display with multiple targets present, the location of one target constraints the ‘y’ coordinate of the remaining targets, but not the ‘x’ coordinate. Torralba, Oliva, Castelhano, Henderson. 2006
  • 113. Context driven object detection Scene S Ncar Zcar P(Ncar | S = street) 01 5 N g Scene gist features Murphy, Torralba, Freeman; NIPS 2003. Torralba, Murphy, Freeman, CACM 2010.
  • 114. An integrated model of Scenes, Objects, and Parts We train a multiview car detector. car Fi p(d | F=1) = N(d | µ1, σ1) p(d | F=0) = N(d | µ0, σ0) dcari xcari N=4 Murphy, Torralba, Freeman; NIPS 2003. Torralba, Murphy, Freeman, CACM 2010.
  • 115. An integrated model of Scenes, Objects, and Parts Scene S Ncar Zcar car Fi g Scene dcari xcari gist features M=4 Murphy, Torralba, Freeman; NIPS 2003. Torralba, Murphy, Freeman, CACM 2010.