Attentional Object Detection

Why look for everything everywhere?




Sergey Karayev
for UC Berkeley Computer Vision Retreat 2011
Problem:
Recognition and localization of objects
          of multiple classes
         in cluttered scenes.
Object Detection

Proposals → Detectors → Post-process
Proposals

• Sliding window (...with priors/pruning)
• Voting
• Efficient search
• etc.

              Sliding window    Proposals




ā€¢Too slow: quadratic in number of search
dimensions (x,y,scale,class).
ā€¢Speed-ups:
 ā€¢Parallelization.
 ā˜…Priors/Pruning with non-detector
 features.
 ā˜…Algorithmic efficiency.
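The multiplicative blow-up can be made concrete with a rough count of classifier evaluations (the specific numbers here are illustrative, not from the talk):

```python
# Illustrative count of sliding-window evaluations.
# Cost multiplies across the search dimensions: x, y, scale, and class.
def num_windows(width, height, stride, num_scales, num_classes):
    """Rough count of classifier evaluations for dense sliding-window search."""
    positions_x = width // stride
    positions_y = height // stride
    return positions_x * positions_y * num_scales * num_classes

# A 640x480 image, 8-pixel stride, 10 scales, 20 PASCAL classes:
evals = num_windows(640, 480, stride=8, num_scales=10, num_classes=20)
print(evals)  # → 960000 classifier evaluations for a single image
```

Halving the stride quadruples the count; adding classes or scales multiplies it again, which is why pruning any dimension pays off directly.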
Proposals: Priors/pruning

• Uses non-detector features (location, geometry, context, depth, "objectness").
• Often done in post-processing.
Proposals: Voting, Efficient subwindow search

• Voting: currently only works for local features.
Proposals
• Priority ordered? How?
• Pruned / Exhaustive?
• Class-specific?

Detectors

Post-process
Detector

• Template/Parts
• Local features
• Decision stumps

[Background: excerpt from the Viola–Jones paper, including Figure 1 — example two-, three-, and four-rectangle features shown relative to the enclosing detection window; the sum of pixels in the white rectangles is subtracted from the sum in the grey rectangles. The excerpt describes AdaBoost as a feature-selection process and the attentional cascade, which speeds up detection by focusing on promising regions while keeping the false negative rate of early stages low.]
Proposals
• Priority ordered? How?
• Pruned / Exhaustive?
• Class-specific?

Detectors
• Local or global feature?
• Shared parts across classes?
• Cascaded?
• Confidence ≈ likelihood?

Post-process
Post-process

Proposals
• Priority ordered? How?
• Pruned / Exhaustive?
• Class-specific?

Detectors
• Local or global feature?
• Shared parts across classes?
• Cascaded?
• Confidence ≈ likelihood?

Post-process
• NMS/Meanshift?
• Context? (Inter-object?)
Where we are


Cascaded Deformable Part Models.
Per class, ~1 sec / medium-sized image.
Where we are

• PASCAL: ~5K test images, 20 classes. 28 hours to process.
• ImageNet '11: ~450K test images, 3000 classes. 375,000 hours to process.
Where we are


• Standard movie: ~130K frames. 36 hours per object class.
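These figures follow directly from the ~1 second per class per image budget; a quick check:

```python
# Back-of-the-envelope check of the slide numbers, assuming ~1 second
# per class per medium-sized image.
SEC_PER_CLASS_IMAGE = 1.0

def hours_to_process(num_images, num_classes):
    return num_images * num_classes * SEC_PER_CLASS_IMAGE / 3600

print(round(hours_to_process(5_000, 20)))       # PASCAL: 28 hours
print(round(hours_to_process(450_000, 3_000)))  # ImageNet '11: 375000 hours
print(round(hours_to_process(130_000, 1)))      # one class over a movie: 36 hours
```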
So what can we do?
Not look for everything
      everywhere!
New Performance Evaluation

• Goal: be able to stop detection at any time and have the most correct detections and the fewest incorrect detections.

[Plots: AP vs. time — compare detectors by the whole AP-over-time curve rather than a single final AP.]
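One way to make this evaluation concrete is to score whatever detections are available at each time checkpoint. A minimal sketch, using plain ranked-retrieval AP over binary correctness flags (a real evaluation would match detections to ground truth with an overlap criterion):

```python
# Minimal sketch of "AP as a function of time". Each detection is a
# (timestamp, confidence, correct) triple; `correct` stands in for
# matching against ground truth with an overlap test.
def average_precision(flags, num_gt):
    """AP of a confidence-ranked list; flags[i] is True if detection i is correct."""
    hits, ap = 0, 0.0
    for rank, correct in enumerate(flags, start=1):
        if correct:
            hits += 1
            ap += hits / rank
    return ap / num_gt if num_gt else 0.0

def ap_vs_time(detections, checkpoints, num_gt):
    """AP of the detections that have arrived by each checkpoint."""
    curve = []
    for t in checkpoints:
        ranked = sorted((d for d in detections if d[0] <= t),
                        key=lambda d: -d[1])        # rank by confidence
        curve.append((t, average_precision([d[2] for d in ranked], num_gt)))
    return curve

dets = [(0.1, 0.9, True), (0.4, 0.8, False), (0.7, 0.7, True)]
curve = ap_vs_time(dets, checkpoints=[0.2, 0.5, 1.0], num_gt=2)
```

A detector that finds its correct detections early dominates this curve even if both detectors end at the same final AP.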
How?
Attention

• Natural bottleneck in animal vision.
• Two kinds:
  • Bottom-up: rapid, driven by featurization.
  • Top-down: secondary, driven by task.
• Eye fixations are a good proxy for implicit attention. Necessary because of the fovea.
Judd, Ehinger, Durand, and Torralba. Learning to Predict Where Humans Look. ICCV 2009.

Basic ideas
• Single saliency map from which foci of attention are selected.
• Sequential selection due to "inhibition of return," or on information maximization.
• Influenced from the top.

[Background: paper excerpt. Eye tracking data from 15 viewers on 1003 images is used as training and testing examples to learn a model of saliency based on low-, middle-, and high-level image features; the database is publicly available.]
Attentional Object Detector

Assume we have a powerful but expensive per-class classifier.
• How should we pick locations to consider?
• What should we look for at a location?
Attentional Object Detector

Proposals
   ↓
Detector
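The Proposals → Detector loop could be sketched as a priority queue over proposals. This is a minimal sketch, assuming a cheap `propose` and an expensive `classify` (hypothetical interfaces — the talk does not fix a particular one):

```python
import heapq

# Sketch of an attentional detection loop:
# propose(image) cheaply yields (priority, window) pairs;
# classify(image, window) is the powerful but expensive per-class classifier.
def attentional_detect(image, propose, classify, budget):
    """Spend a fixed classifier budget on the most promising windows first."""
    queue = [(-priority, window) for priority, window in propose(image)]
    heapq.heapify(queue)                    # max-priority at the top
    detections = []
    while queue and budget > 0:
        _, window = heapq.heappop(queue)
        score = classify(image, window)     # the expensive call
        budget -= 1
        if score > 0:
            detections.append((window, score))
    return detections                       # an anytime, best-first result
```

Because the best proposals are examined first, stopping at any budget leaves the most promising windows already classified — exactly the anytime behavior the new evaluation rewards.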
Some related work

Vogel and de Freitas. Target-directed attention: Sequential decision-making for gaze planning. ICRA 2008.

• GIST and a simple regressor to compute a likelihood map.
• Reinforcement learning to find the best gaze sequence.
• "Heavier" feature and regressor to evaluate the fixation locations.
Vogel and de Freitas. Target-directed attention: Sequential decision-making for gaze planning. ICRA 2008.

• Evaluated only on Caltech Office scenes.
• Gaze planning improves over just using bottom-up saliency while being only slightly slower.
• Detection rate is lower than full image, but maximum precision is higher.
Gualdi, Prati, and Cucchiara. Multi-stage Sampling with Boosting Cascades for Pedestrian Detection in Images and Videos. ECCV 2010.

• LogitBoost classifier with covariance descriptors.
• Score falls off over some region of support.
• Sample points in the image to estimate P(O|I). Resample close to promising points.

[Background: paper excerpt. Fig. 1: region of support for the cascade of LogitBoost classifiers trained on the INRIA pedestrian dataset, averaged over 62 pedestrian patches; a sufficiently wide region of support allows pruning the sliding-window search, while a too-wide region can generate de-localized detections.]
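The sample-then-refine idea above can be caricatured in one dimension (a simplification, not the paper's exact stage schedule; `score` stands in for the cascade response):

```python
import random

# 1-D caricature of multi-stage sampling: draw samples, keep the
# promising ones, and resample around them with a shrinking spread.
# (The paper samples over x, y, and scale.)
def multistage_sample(score, lo, hi, n=50, stages=3, spread_frac=0.25):
    samples = [random.uniform(lo, hi) for _ in range(n)]
    spread = (hi - lo) * spread_frac
    for _ in range(stages - 1):
        seeds = sorted(samples, key=score, reverse=True)[: n // 5]
        samples = [s + random.uniform(-spread, spread)
                   for s in seeds for _ in range(n // len(seeds))]
        spread *= 0.5                       # tighten at each stage
    return max(samples, key=score)

random.seed(0)
# Toy score peaked at x = 3; later samples concentrate near the peak.
best = multistage_sample(lambda x: -(x - 3.0) ** 2, lo=0.0, hi=10.0)
```

The total number of `score` calls is fixed (n per stage), but they concentrate where the response is high — the same budget buys much finer localization than one uniform pass.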
Gualdi et al. Multi-stage Sampling with Boosting
 Cascades for Pedestrian Detection in Images and
                 Videos. ECCV 2010.




• Evaluated on INRIA Pedestrians, Graz02, and some videos.
• Always reduces miss rate over sliding window, while being 2-6x faster.
Butko and Movellan. Optimal Scanning for Faster Object Detection. CVPR 2009.

• Digital fovea placed sequentially to maximize expected information gain.
• Liken it to stochastic optimal control, and use a "multinomial infomax POMDP" to pick the sequence.

[Background: paper excerpt. Figure 1: a digital fovea — several concentric Image Patches arranged around a point of fixation, with the image portion in each rectangle reduced to a common size. The approach is data driven and detector independent, in contrast to the analytic Efficient Subwindow Search; it extends the greedy Najemnik & Geisler infomax model of eye movements with long-term POMDP planning.]
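A one-step version of the infomax idea can be sketched on a 1-D grid with a toy sensor model (this is a hypothetical stand-in, not the paper's I-POMDP): pick the fixation whose expected posterior entropy over the target's location is lowest.

```python
import math

# Toy one-step infomax fixation picker on a 1-D grid.
# belief[i] = P(target in cell i). A fixation at cell f yields a binary
# reading whose reliability about cell i decays with |i - f|
# (0.5 = a fair coin, i.e. uninformative about distant cells).
def reliability(dist):
    return max(0.5, 0.9 - 0.15 * dist)

def entropy(p):
    return -sum(q * math.log2(q) for q in p if q > 0)

def expected_posterior_entropy(belief, f):
    total = 0.0
    for reading in (True, False):
        lik = [reliability(abs(i - f)) if reading
               else 1 - reliability(abs(i - f))
               for i in range(len(belief))]
        joint = [l * b for l, b in zip(lik, belief)]
        z = sum(joint)                       # P(reading)
        if z > 0:
            total += z * entropy([j / z for j in joint])
    return total

def best_fixation(belief):
    return min(range(len(belief)),
               key=lambda f: expected_posterior_entropy(belief, f))
```

With most belief mass on one cell, the picker fixates where the reading is most reliable about that cell; the paper's contribution is planning such fixations over the long term rather than one step at a time.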
Butko and Movellan. Optimal Scanning for Faster Object Detection. CVPR 2009.

• Evaluate on own faces dataset against Viola-Jones: 2x speedup, but small decrease in accuracy.

[Figures: Figure 6 — successive fixation choices by the MI-POMDP policy; the face is found in six fixations, with a final localization error of 1.4 grid-cells. Figure 8 — varying the Viola-Jones scaling factor makes both methods faster and less accurate; MI-POMDP is usually closer to the origin on the time-error curve, i.e. a better speed-accuracy tradeoff. Both methods on average placed the face between one and two grid-cells off the true location.]
Vijayanarasimhan and Kapoor. Visual Recognition and
    Detection Under Bounded Computational Resources.
                        CVPR 2010.
[Figure 3. Grid weights learnt for each category in the ETHZ shape dataset.]

Table 2. Attributes of the features used in the experiments.

  Feature    Channel          Dim    Computation time (ms)
  SIFT       R, G, B, Gray    128    0.21
  T1a S2     Gray              68    1.2
  T2 S2      Gray              36    0.09
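Per-feature computation times like those in Table 2 let a Value-of-Information-style selector trade expected evidence against cost. A hypothetical sketch, where both the `expected_evidence` numbers and the evidence-per-millisecond utility are illustrative assumptions rather than the paper's exact criterion:

```python
# Per-feature computation costs in ms, from Table 2.
FEATURE_COST_MS = {"SIFT": 0.21, "T1a S2": 1.2, "T2 S2": 0.09}

def pick_feature_type(expected_evidence):
    """Pick the feature type with the highest expected evidence per
    millisecond of computation (an assumed VOI-style utility)."""
    return max(expected_evidence,
               key=lambda t: expected_evidence[t] / FEATURE_COST_MS[t])

# Illustrative evidence estimates, averaged over training features of each type.
choice = pick_feature_type({"SIFT": 0.4, "T1a S2": 0.9, "T2 S2": 0.1})
assert choice == "SIFT"  # 0.4/0.21 beats both 0.9/1.2 and 0.1/0.09
```

Note how the cheapest feature does not win by default: the ratio rewards evidence per unit time, so an expensive feature can still be chosen if its expected evidence is high enough.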
• Hough voting with multiple (five in our experiments) feature types.
• Uses Value of Information to pick the feature type to extract.
• Active approach extracts fewer features, takes less time, and has higher accuracy on ETHZ Shape and INRIA Horses than the passive selection baseline.

[Figure 2. A summary of our algorithm.]

The term p(g_i^(O,x) | f) depends on the feature f, which is time-consuming to extract. However, since we are only trying to determine the best feature type to extract, we instead estimate the expected value of the term p(g_i^(O,x) | f) for every feature type t, by considering all the features of type t in the training database and averaging the term. The feature type with the largest value can be interpreted as the one expected to provide the best evidence for object presence given the features. For example, for the "body" of a giraffe, texture-based features could provide the best evidence.

The conditional probability is modeled as a weighted sum over the nearest neighbors of f:

    p(g_i^(O,x) | f, l) = Σ_{h ∈ N(f)} q_i^h p(h | f)        (2)

where h is a feature in the training database, N(f) is the set of nearest neighbors of f, and q_i^h is the conditional probability p(g_i^(O,x) | h, l) that needs to be estimated from the training data for every feature h and every grid part g_i.

Datasets: We use two challenging object detection datasets, the ETHZ shape dataset and the INRIA horses dataset, to compare against several state-of-the-art Hough-based detection approaches [21, 24, 11, 10]. The ETHZ shape dataset contains 255 images of five shape-based classes (applelogos, bottles, giraffes, mugs and swans). The INRIA horses dataset contains 170 images with one or more side-views of horses and 170 images without the category. In both datasets, objects occur in highly cluttered natural scenes with large variations in both scale and appearance, and images sometimes contain multiple objects. We use the same training and testing setup used by [10] on both datasets for fair comparisons.

Implementation details: Parameter learning of the grid model is performed by first scaling all ground-truth bounding boxes to a fixed height (100 pixels in our experiments) while preserving the aspect ratio. Then points are uniformly sampled along the edges (using a Canny edge detector). An initial set of hypotheses is generated; each selection strategy is then run iteratively, updating the hypotheses as features get added, until the time elapsed reaches a budget (1 sec in our case).

[Figure 5. Qualitative results comparing the first 1000 points selected by the active approach and the passive selection baselines; bright dots denote selected feature points.]
Image Attributions
•   Girshick et al. - Cascaded deformable part models.
•   Viola & Jones - Rapid object detection.
•   Judd et al. - Learning to predict where humans look.
•   Chikkerur et al. - What and where? A Bayesian theory of attention.
•   ...and the papers reviewed.

Lecture 13: ML Teams (Full Stack Deep Learning - Spring 2021)Lecture 13: ML Teams (Full Stack Deep Learning - Spring 2021)
Lecture 13: ML Teams (Full Stack Deep Learning - Spring 2021)
Ā 
Lecture 12: Research Directions (Full Stack Deep Learning - Spring 2021)
Lecture 12: Research Directions (Full Stack Deep Learning - Spring 2021)Lecture 12: Research Directions (Full Stack Deep Learning - Spring 2021)
Lecture 12: Research Directions (Full Stack Deep Learning - Spring 2021)
Ā 
Lecture 11: ML Deployment & Monitoring (Full Stack Deep Learning - Spring 2021)
Lecture 11: ML Deployment & Monitoring (Full Stack Deep Learning - Spring 2021)Lecture 11: ML Deployment & Monitoring (Full Stack Deep Learning - Spring 2021)
Lecture 11: ML Deployment & Monitoring (Full Stack Deep Learning - Spring 2021)
Ā 
Lecture 10: ML Testing & Explainability (Full Stack Deep Learning - Spring 2021)
Lecture 10: ML Testing & Explainability (Full Stack Deep Learning - Spring 2021)Lecture 10: ML Testing & Explainability (Full Stack Deep Learning - Spring 2021)
Lecture 10: ML Testing & Explainability (Full Stack Deep Learning - Spring 2021)
Ā 
Lecture 9: AI Ethics (Full Stack Deep Learning - Spring 2021)
Lecture 9: AI Ethics (Full Stack Deep Learning - Spring 2021)Lecture 9: AI Ethics (Full Stack Deep Learning - Spring 2021)
Lecture 9: AI Ethics (Full Stack Deep Learning - Spring 2021)
Ā 
Lecture 5: ML Projects (Full Stack Deep Learning - Spring 2021)
Lecture 5: ML Projects (Full Stack Deep Learning - Spring 2021)Lecture 5: ML Projects (Full Stack Deep Learning - Spring 2021)
Lecture 5: ML Projects (Full Stack Deep Learning - Spring 2021)
Ā 
Data Management - Full Stack Deep Learning
Data Management - Full Stack Deep LearningData Management - Full Stack Deep Learning
Data Management - Full Stack Deep Learning
Ā 
Testing and Deployment - Full Stack Deep Learning
Testing and Deployment - Full Stack Deep LearningTesting and Deployment - Full Stack Deep Learning
Testing and Deployment - Full Stack Deep Learning
Ā 
Machine Learning Teams - Full Stack Deep Learning
Machine Learning Teams - Full Stack Deep LearningMachine Learning Teams - Full Stack Deep Learning
Machine Learning Teams - Full Stack Deep Learning
Ā 
Troubleshooting Deep Neural Networks - Full Stack Deep Learning
Troubleshooting Deep Neural Networks - Full Stack Deep LearningTroubleshooting Deep Neural Networks - Full Stack Deep Learning
Troubleshooting Deep Neural Networks - Full Stack Deep Learning
Ā 
Setting up Machine Learning Projects - Full Stack Deep Learning
Setting up Machine Learning Projects - Full Stack Deep LearningSetting up Machine Learning Projects - Full Stack Deep Learning
Setting up Machine Learning Projects - Full Stack Deep Learning
Ā 
Research Directions - Full Stack Deep Learning
Research Directions - Full Stack Deep LearningResearch Directions - Full Stack Deep Learning
Research Directions - Full Stack Deep Learning
Ā 
Infrastructure and Tooling - Full Stack Deep Learning
Infrastructure and Tooling - Full Stack Deep LearningInfrastructure and Tooling - Full Stack Deep Learning
Infrastructure and Tooling - Full Stack Deep Learning
Ā 
AI Masterclass at ASU GSV 2019
AI Masterclass at ASU GSV 2019AI Masterclass at ASU GSV 2019
AI Masterclass at ASU GSV 2019
Ā 

Recently uploaded

A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
Ā 
Scaling API-first ā€“ The story of a global engineering organization
Scaling API-first ā€“ The story of a global engineering organizationScaling API-first ā€“ The story of a global engineering organization
Scaling API-first ā€“ The story of a global engineering organizationRadu Cotescu
Ā 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
Ā 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
Ā 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
Ā 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
Ā 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
Ā 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
Ā 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service šŸø 8923113531 šŸŽ° Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service šŸø 8923113531 šŸŽ° Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service šŸø 8923113531 šŸŽ° Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service šŸø 8923113531 šŸŽ° Avail...gurkirankumar98700
Ā 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
Ā 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
Ā 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
Ā 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
Ā 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
Ā 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
Ā 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
Ā 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
Ā 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel AraĆŗjo
Ā 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
Ā 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
Ā 

Recently uploaded (20)

A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
Ā 
Scaling API-first ā€“ The story of a global engineering organization
Scaling API-first ā€“ The story of a global engineering organizationScaling API-first ā€“ The story of a global engineering organization
Scaling API-first ā€“ The story of a global engineering organization
Ā 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
Ā 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
Ā 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
Ā 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
Ā 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
Ā 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Ā 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service šŸø 8923113531 šŸŽ° Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service šŸø 8923113531 šŸŽ° Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service šŸø 8923113531 šŸŽ° Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service šŸø 8923113531 šŸŽ° Avail...
Ā 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
Ā 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Ā 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Ā 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Ā 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Ā 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
Ā 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
Ā 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
Ā 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Ā 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
Ā 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
Ā 

Attentional Object Detection - introductory slides.

  • 1. Attentional Object Detection Why look for everything everywhere? Sergey Karayev for UC Berkeley Computer Vision Retreat 2011
  • 2. Problem: Recognition and localization of objects of multiple classes in cluttered scenes.
  • 3. Proposals Detectors Object Detection Post-process
  • 4. Proposals Detectors Object Detection Post-process
  • 5. Proposals: Sliding window · ...with priors/pruning · Voting · Efficient search · etc.
  • 6. Sliding window proposals: • Too slow: quadratic in the number of search dimensions (x, y, scale, class). • Speed-ups: • Parallelization. ★ Priors/pruning with non-detector features. ★ Algorithmic efficiency.
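The blow-up in search dimensions can be made concrete with a toy count of detector evaluations. This is an illustrative sketch, not code from the deck; `score` is a hypothetical stand-in for a real per-class detector.

```python
# Exhaustive sliding-window search: cost scales with the product of the
# search dimensions (x, y, scale, class).

def sliding_window_detections(img_w, img_h, win=64, stride=8,
                              scales=(1.0, 1.5, 2.0), n_classes=20,
                              score=lambda x, y, s, c: 0.0, thresh=0.5):
    """Score every (x, y, scale, class) cell; keep windows above thresh."""
    hits = []
    for c in range(n_classes):
        for s in scales:
            w = int(win * s)
            for y in range(0, img_h - w + 1, stride):
                for x in range(0, img_w - w + 1, stride):
                    if score(x, y, s, c) > thresh:
                        hits.append((x, y, s, c))
    return hits

def n_windows(img_w, img_h, win=64, stride=8, scales=(1.0, 1.5, 2.0),
              n_classes=20):
    """Number of detector evaluations the exhaustive search above performs."""
    total = 0
    for s in scales:
        w = int(win * s)
        nx = (img_w - w) // stride + 1
        ny = (img_h - w) // stride + 1
        total += nx * ny
    return total * n_classes
```

For a 640x480 image with these (assumed) settings, the count is already in the hundreds of thousands per image, and it multiplies directly by the number of classes — the motivation for the pruning and prioritization strategies on the following slides.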
  • 7. Proposals: priors/pruning • Uses non-detector features (location, geometry, context, depth, "objectness"). • Often done in post-processing.
  • 8. Proposals: Voting · Efficient subwindow search. Currently only works for local features.
  • 9. Proposals: • Priority ordered? How? • Pruned / Exhaustive? • Class-specific? Detectors. Post-process.
  • 10. Proposals: • Priority ordered? How? • Pruned / Exhaustive? • Class-specific? Detectors. Post-process.
  • 11. Detector: Template/Parts · Local features. (Slide excerpts Viola & Jones: each boosting stage selects a weak classifier that depends on a single rectangle feature, so AdaBoost can be viewed as feature selection; a cascade of successively more complex classifiers focuses attention on promising regions, since it is often possible to rapidly determine where in an image an object might occur and reserve expensive processing for those regions. The key measure of such an attentional filter is its false-negative rate.)
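The cascade idea excerpted above can be sketched in a few lines; the stage functions and thresholds here are hypothetical stand-ins, not the Viola-Jones features themselves.

```python
# A Viola-Jones-style attentional cascade: each stage is a cheap classifier,
# and a window is rejected as soon as any stage says "no", so most background
# windows never reach the expensive later stages.

def cascade_classify(window, stages):
    """stages: list of (score_fn, threshold) pairs, cheapest first.
    Returns (accepted, n_stages_evaluated)."""
    for i, (score_fn, thresh) in enumerate(stages):
        if score_fn(window) < thresh:
            return False, i + 1   # early rejection: later stages are skipped
    return True, len(stages)
```

The speed win comes from the second return value: easy negatives cost one stage, and only near-positives pay for the full cascade.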
  • 12. • Priority ordered? How? Proposals • Pruned / Exhaustive? • Class-specific? • Local or global feature? • Shared parts across classes? Detectors • Cascaded? • Confidence ≈ likelihood? Post-process
  • 13. • Priority ordered? How? Proposals • Pruned / Exhaustive? • Class-specific? • Local or global feature? • Shared parts across classes? Detectors • Cascaded? • Confidence ≈ likelihood? Post-process
  • 15. • Priority ordered? How? Proposals • Pruned / Exhaustive? • Class-specific? • Local or global feature? • Shared parts across classes? Detectors • Cascaded? • Confidence ≈ likelihood? • NMS/Meanshift? Post-process • Context? (Inter-object?)
  • 16. • Priority ordered? How? Proposals • Pruned / Exhaustive? • Class-specific? • Local or global feature? • Shared parts across classes? Detectors • Cascaded? • Confidence ≈ likelihood? • NMS/Meanshift? Post-process • Context? (Inter-object?)
  • 17. Where we are Cascaded Deformable Part Models. Per class, ~1 sec / medium-sized image.
  • 18. Where we are • PASCAL: ~5K test images, 20 classes. 28 hours to process. • ImageNet '11: ~450K test images, 3000 classes. 375,000 hours to process.
  • 19. Where we are • Standard movie: ~130K frames. 36 hours per object class.
  • 20. So what can we do? Not look for everything everywhere!
  • 21. New Performance Evaluation • Goal: Be able to stop detection and have the most correct detections and the fewest incorrect detections at any time. (Plot: AP vs. time.)
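The proposed evaluation can be sketched as follows: run the detector under an increasing time budget and, at each cutoff, compute average precision over the detections emitted so far. A minimal sketch, assuming detections are recorded as (time_emitted, score, is_correct) triples:

```python
# AP-vs-time curve for an anytime detector.

def average_precision(dets, n_positives):
    """AP of (score, is_correct) detections, ranked by descending score."""
    dets = sorted(dets, key=lambda d: -d[0])
    tp, ap = 0, 0.0
    for i, (score, correct) in enumerate(dets):
        if correct:
            tp += 1
            ap += tp / (i + 1)        # precision at each recall step
    return ap / n_positives if n_positives else 0.0

def ap_vs_time(timed_dets, n_positives, cutoffs):
    """For each time cutoff, AP of the detections emitted by that time."""
    return [average_precision([(s, c) for t, s, c in timed_dets if t <= cut],
                              n_positives)
            for cut in cutoffs]
```

A good attentional detector should push this curve up early: most of the final AP is reached long before the full budget is spent.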
  • 22. How?
  • 23. Attention • Natural bottleneck in animal vision. • Two kinds: • Bottom-up: rapid, driven by featurization. • Top-down: secondary, driven by task. • Eye fixations are a good proxy for implicit attention. Necessary because of the fovea.
  • 24. Basic ideas (from Judd, Ehinger, Durand, and Torralba's MIT eye-tracking study: 1003 images, 15 viewers): • Single saliency map from which foci of attention are selected. • Sequential selection due to "inhibition of return." • Information maximization. • Influenced from the top.
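The first two ideas — a single saliency map plus sequential selection via inhibition of return — can be sketched with a classic winner-take-all loop. The saliency map here is just a 2D list of scores; real maps come from learned feature combinations as in Judd et al.

```python
# Sequential fixation selection: repeatedly pick the most salient location,
# then suppress its neighborhood ("inhibition of return") so attention moves on.

def select_fixations(saliency, n_fixations, inhibit_radius=1):
    sal = [row[:] for row in saliency]        # copy; we mutate scores
    h, w = len(sal), len(sal[0])
    fixations = []
    for _ in range(n_fixations):
        # Winner-take-all: the current maximum of the (suppressed) map.
        y, x = max(((i, j) for i in range(h) for j in range(w)),
                   key=lambda p: sal[p[0]][p[1]])
        fixations.append((y, x))
        # Inhibition of return: knock out a window around the winner.
        for i in range(max(0, y - inhibit_radius),
                       min(h, y + inhibit_radius + 1)):
            for j in range(max(0, x - inhibit_radius),
                           min(w, x + inhibit_radius + 1)):
                sal[i][j] = float("-inf")
    return fixations
```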
  • 25. (Slide excerpts a visual-search study: observers searched for targets (cars or pedestrians) and pressed a key to indicate detection; images contained on average 4.6 cars and 2.1 pedestrians. Search is guided by the scene description S, e.g. global properties such as illumination and scene identity, via the likelihood P(I|S).)
  • 26. Attentional Object Detector. Assume we have a powerful but expensive per-class classifier. • How should we pick locations to consider? • What should we look for at a location?
  • 27. Attentional Object Detector Proposals Detector
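The proposals-feeding-a-detector loop implied by these slides can be sketched as a priority queue: cheap proposal features rank (location, class) pairs, and the expensive classifier is spent on the most promising entries until a time budget runs out. `propose` and `classify` are hypothetical stand-ins for a cheap prior and the expensive per-class detector.

```python
# A budgeted attentional detection loop: evaluate the expensive classifier
# in priority order given by cheap proposal scores.
import heapq
import time

def attentional_detect(locations, classes, propose, classify,
                       budget_s=1.0, thresh=0.5):
    # Cheap pass: priority = proposal score for each (location, class).
    heap = [(-propose(loc, c), loc, c) for loc in locations for c in classes]
    heapq.heapify(heap)
    detections, start = [], time.monotonic()
    while heap and time.monotonic() - start < budget_s:
        _, loc, c = heapq.heappop(heap)       # most promising first
        score = classify(loc, c)              # expensive evaluation
        if score > thresh:
            detections.append((loc, c, score))
    return detections
```

Stopping this loop at any point yields the "most correct detections at any time" behavior that the AP-vs-time evaluation on slide 21 measures.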
  • 29. Vogel and Freitas. Target-directed attention: Sequential decision-making for gaze planning. ICRA 2008. • GIST and a simple regressor to compute a likelihood map. • Reinforcement learning to find the best gaze sequence. • "Heavier" feature and regressor to evaluate the fixation locations.
  • 30. Vogel and Freitas. Target-directed attention: Sequential decision-making for gaze planning. ICRA 2008. • Evaluated only on Caltech Office scenes. • Gaze planning improves over just using bottom-up saliency while being only slightly slower. • Detection rate is lower than full image, but maximum precision is higher.
  • 31. Gualdi, Prati, and Cucchiara. Multi-stage Sampling with Boosting Cascades for Pedestrian Detection in Images and Videos. ECCV 2010. • LogitBoost classifier with covariance descriptors. • The classifier's response stays positive over a region of support around a true detection, so the sliding-window set can be pruned as long as some window lands in each region of support. • Sample points in the image to estimate P(O|I); resample close to promising points. (Slide shows the cascade's region of support averaged over INRIA pedestrian patches, and the distribution of samples across stages.)
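The multi-stage sampling idea can be sketched in one dimension: draw coarse samples, score them, then concentrate the next stage's samples in shrinking neighborhoods of the high-scoring points (the region of support). This is an illustrative sketch under assumed stage sizes; `score` stands in for the LogitBoost cascade response.

```python
# Multi-stage sampling: uniform first stage, then resampling around the
# most promising points with a shrinking neighborhood.
import random

def multistage_sample(score, n_per_stage=50, n_stages=3,
                      span=100.0, shrink=0.25, seed=0):
    rng = random.Random(seed)
    # Stage 1: uniform samples over the whole search range [0, span).
    samples = [rng.uniform(0, span) for _ in range(n_per_stage)]
    for _ in range(n_stages - 1):
        scored = sorted(samples, key=score, reverse=True)
        seeds = scored[:max(1, n_per_stage // 5)]   # keep the promising ones
        sigma = span * shrink
        # Next stage: Gaussian resampling around the retained seeds.
        samples = [rng.gauss(rng.choice(seeds), sigma)
                   for _ in range(n_per_stage)]
        shrink *= 0.5
    return max(samples, key=score)
```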
  • 32. Gualdi et al. Multi-stage Sampling with Boosting Cascades for Pedestrian Detection in Images and Videos. ECCV 2010. • Evaluated on INRIA Pedestrians, Graz02, and some videos. • Always reduces miss rate over sliding window, while being 2-6x faster.
  • 33. Butko and Movellan. Optimal Scanning for Faster Object Detection. CVPR 2009. • Digital fovea placed sequentially to maximize expected information gain. • Liken it to stochastic optimal control, and use a "multinomial infomax POMDP" to pick the fixation sequence. (Slide excerpts the paper: with fewer than 25 successive fixations, the foveated approach is faster than exhaustively applying the detector to a high-resolution image; it extends Najemnik & Geisler's greedy Infomax model of eye movements with long-term POMDP planning, and is data-driven and detector-independent, unlike the analytic Efficient Subwindow Search. Figure 1 shows the digital fovea: concentric image patches arranged around a fixation point.)
  • 34. Butko and Movellan. Optimal Scanning for Faster Object Detection. CVPR 2009. ā€¢ Evaluate on their own faces dataset against Viola-Jones: 2x speedup, but a small decrease in accuracy. Both methods on average placed the face between one and two grid cells off the true face location. [Figure 6: Successive fixation choices by the MI-POMDP policy; the face is found in six fixations, with a final localization error of 1.4 grid cells. Figure 8: Varying the Viola-Jones scaling factor makes both methods faster and less accurate; MI-POMDP is usually closer to the origin on the time-error curve, showing a better speed-accuracy tradeoff than Viola-Jones alone.]
  • 35. Vijayanarasimhan and Kapoor. Visual Recognition and Detection Under Bounded Computational Resources. CVPR 2010. ā€¢ Hough voting with multiple (five in their experiments) feature types. ā€¢ Uses Value of Information to pick the region and the type of feature to extract next. ā€¢ The active approach extracts fewer features, takes less time, and has higher accuracy on ETHZ and INRIA Horses than passive and random selection baselines. [Table 2: feature attributes — SIFT (R, G, B, Gray; 128-dim; 0.21 ms), T1a S2 (Gray; 68-dim; 1.2 ms), T2 S2 (Gray; 36-dim; 0.09 ms). Datasets: the ETHZ shape dataset (255 images over five classes: applelogos, bottles, giraffes, mugs, swans) and the INRIA horses dataset (170 images of side-view horses in highly cluttered natural scenes), compared against several state-of-the-art Hough-based detection approaches.]
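The selection step above can be sketched as a value-of-information criterion: rank candidate (region, feature-type) pairs by expected evidence gain penalized by extraction cost, and extract the winner. This is a hypothetical scalarization for illustration — the paper learns its VOI criterion — but the per-feature costs come from the slide's Table 2.

```python
# Per-feature computation time in ms, from the slide's Table 2.
FEATURE_COST_MS = {"SIFT": 0.21, "T1a": 1.2, "T2": 0.09}

def pick_next_feature(candidates, cost_weight=0.5):
    """Greedy VOI-style selection: each candidate is
    (region, feature_type, expected_gain); return the (region,
    feature_type) pair maximizing gain minus weighted cost.
    `cost_weight` and `expected_gain` values are assumptions."""
    best, best_score = None, float("-inf")
    for region, feat, exp_gain in candidates:
        score = exp_gain - cost_weight * FEATURE_COST_MS[feat]
        if score > best_score:
            best, best_score = (region, feat), score
    return best

# Hypothetical candidates: two regions, three feature types.
cands = [((10, 20), "SIFT", 0.8),
         ((10, 20), "T1a", 0.9),
         ((40, 5), "T2", 0.3)]
pick = pick_next_feature(cands)  # SIFT wins: 0.8 - 0.5*0.21 = 0.695
```

Under this weighting the cheap-but-informative SIFT extraction at region (10, 20) beats the slightly more informative but much slower T1a feature — exactly the tradeoff a bounded-resource detector must make.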
  • 36. Image Attributions ā€¢ Girshick et al. - Cascaded deformable part models. ā€¢ Viola & Jones - Rapid object detection. ā€¢ Judd et al. - Learning to predict where humans look. ā€¢ Chikkerur et al. - What and where? A Bayesian theory of attention. ā€¢ ...and the papers reviewed.

Editor's Notes

  21. This is related to current research work in ML on anytime algorithms. I think that the only solution to this goal is attentional detection.
  27. Sequential decision problem. No post-processing, because at any point detection can be cut off.