Author manuscript, published in "The Eighth International Workshop on Visual Surveillance - VS2008, Marseille : France (2008)"




Learning Moving Cast Shadows for Foreground Detection

Jia-Bin Huang and Chu-Song Chen
Institute of Information Science, Academia Sinica, Taipei, Taiwan
{jbhuang,song}@iis.sinica.edu.tw


Abstract

We present a new algorithm for detecting foreground and moving shadows in surveillance videos. For each pixel, we use the Gaussian Mixture Model (GMM) to learn the behavior of cast shadows on background surfaces. The pixel-based model has the advantage over regional or global models of adapting to local lighting conditions, particularly for scenes under complex illumination. However, it would take a long time to converge if motion is rare at that pixel. We hence build a global shadow model that uses global-level information to overcome this drawback. The local shadow models are updated through confidence-rated GMM learning, in which the learning rate depends on the confidence predicted by the global shadow model. For foreground modeling, we use a nonparametric density estimation method to model the complex characteristics of the spatial and color information. Finally, the background, shadow, and foreground models are built into a Markov random field energy function that can be efficiently minimized by the graph cut algorithm. Experimental results on various scene types demonstrate the effectiveness of the proposed method.

1. Introduction

Moving object detection is at the core of many computer vision applications, including video conferencing, visual surveillance, and intelligent transportation systems. However, moving cast shadow points are often falsely labeled as foreground objects. This may severely degrade the accuracy of object localization and detection. Therefore, an effective shadow detection method is necessary for accurate foreground segmentation.

Moving shadows in the scene are caused by the occlusion of light sources. Shadows reduce the total energy incident at the background surfaces where the light sources are partially or totally blocked by the foreground objects. Hence, shadow points have lower luminance values but similar chromaticity values. Also, the texture characteristic around the shadow points remains unchanged, since shadows do not alter the background surfaces.

Much research has been devoted to moving cast shadow detection. Some methods are based on the observation that the luminance values of shadowed pixels decrease with respect to the corresponding background while the chromaticity values are maintained. For example, Horprasert et al. [6] used a computational color model which separates the brightness from the chromaticity component and defines brightness and chromaticity distortion. Cucchiara et al. [3] and Schreer et al. [15] addressed the shadow detection problem in HSV and YUV color space, respectively, and detected shadows by exploiting the color differences between shadow and background. Nadimi et al. [10] proposed a spatio-temporal albedo test and a dichromatic reflection model to separate cast shadows from moving objects.

Texture features extracted from the spatial domain have also been used to detect shadows. Zhang et al. [20] proved that ratio edge is illumination invariant and that the distribution of the normalized ratio edge difference is a chi-square distribution; a significance test was then used to detect shadows. Fung et al. [4] computed a confidence score for shadow detection based on the characteristics of shadows in the luminance, chrominance, gradient density, and geometry domains. However, the above-mentioned methods require setting parameters for different scenes and cannot handle complex and time-varying lighting conditions. A comprehensive survey of moving shadow detection approaches was presented in [13].

Recently, the statistical prevalence of cast shadows has been exploited to learn shadows in the scene. In [9], Martel-Brisson and Zaccarin used the Gaussian mixture model (GMM) to describe moving cast shadows. Porikli et al. [11] proposed a recursive method to learn cast shadows. Liu et al. [8] presented a method to remove shadows using multi-level information in HSV color space. The major drawback of [9] and [11] is that their shadow models need a long time to converge, during which the lighting conditions should remain stable. Moreover, in [9], the shadow model is merged into the background model, so the Gaussian states for shadows will be discarded from the background model when there are no shadows for a long period. These two approaches are therefore less effective in a real-world environment. Liu et al. [8] attempted to improve the convergence speed of the pixel-based shadow model by using multi-level information. They used region-level information to obtain more samples and global-level information to update a pre-classifier. However, the method still suffers from the slow learning of the conventional GMM [17], and the pre-classifier becomes less discriminative in scenes having different types of shadows, e.g., light or heavy.
Other approaches to foreground and shadow segmentation are Bayesian methods, which construct background, shadow, and foreground models to evaluate the data likelihood of the observed values [19], [1]. These two approaches used global parameters to characterize shadows in an image, which are constant in [19] and probabilistic in [1]. Although the global models are free from the convergence speed problem, they lose the adaptability to local characteristics and cannot handle scenes with complex lighting.

In this paper, we propose a Bayesian approach to moving shadow and object detection. We learn the behavior of cast shadows on surfaces with a GMM for each pixel. A global shadow model (GSM) is maintained to improve the convergence speed of the local shadow models (LSM). This is achieved by updating the LSM through confidence-rated GMM learning, in which the learning rates are directly proportional to the confidence values predicted by the GSM. The convergence speed of the LSM is thus significantly improved. We also present a novel description for foreground modeling, which uses local kernel density estimation to model the spatial color information instead of temporal statistics. The background, shadow, and foreground are then segmented by the graph cut algorithm, which minimizes a Markov Random Field (MRF) energy. A flow diagram of the proposed algorithm is illustrated in Fig. 1.

Figure 1. Flow chart of the proposed algorithm.

The remainder of this paper is organized as follows. The formulation of the MRF energy function is presented in Section 2. In Section 3, we describe the background and shadow models, primarily focusing on how to learn cast shadows. Section 4 presents a nonparametric method for estimating the foreground probability. Experimental results are presented in Section 5. Section 6 concludes the paper.

2. Energy Minimization Formulation

We pose the foreground/background/shadow segmentation as an energy minimization problem. Given an input video sequence, a frame at time t is represented as an array z_t = (z_t(1), z_t(2), ..., z_t(p), ..., z_t(N)) in RGB color space, where N is the number of pixels in each frame. The segmentation assigns a label l_p to each pixel p ∈ P, where P is the set of pixels in the image and the label l_p ∈ {BG, SD, FG} corresponds to the three classes background, shadow, and foreground, respectively.

The global labeling field L = {l_p | p ∈ P} is modeled as an MRF [5]. The first-order energy function of the MRF can be decomposed as a sum of two terms:

E(L) = E_{data}(L) + E_{smooth}(L) = \sum_{p \in P} [ D_p(l_p) + \sum_{q \in N_p} V_{p,q}(l_p, l_q) ],   (1)

in which the first term E_{data}(L) evaluates the likelihood of each pixel belonging to one of the three classes, the second term E_{smooth}(L) imposes spatial consistency of labels through a pairwise-interaction MRF prior, D_p(l_p) and V_{p,q}(l_p, l_q) are the data and smoothness energy terms respectively, and the set N_p denotes the 4-connected neighboring pixels of point p. The energy function in (1) can be efficiently minimized by the graph cut algorithm [2], so that an approximate maximum a posteriori (MAP) estimate of the labeling field can be obtained.

The Potts model [12] is used as the MRF prior to stress spatial context, and the data cost E_{data}(L) is defined as

E_{data}(L) = \sum_{p \in P} D_p(l_p) = \sum_{p \in P} - \log p(z_p | l_p),   (2)

where p(z_p | l_p) is the likelihood of pixel p belonging to background, shadow, or foreground. In the following sections, we show how to learn and compute these likelihoods.
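To make the formulation concrete, the following minimal numpy sketch (our illustration, not from the paper) scores a candidate labeling under (1) with a Potts smoothness term; the function name and the array neg_log_lik of per-pixel -log p(z_p | l_p) values are our own conventions.

import numpy as np

def potts_energy(labels, neg_log_lik, beta):
    # labels      : (H, W) integer map with 0 = BG, 1 = SD, 2 = FG
    # neg_log_lik : (H, W, 3) array of -log p(z_p | l_p) per class
    # beta        : Potts penalty V_{p,q} = beta * [l_p != l_q]
    H, W = labels.shape
    rows = np.arange(H)[:, None]
    cols = np.arange(W)[None, :]
    e_data = neg_log_lik[rows, cols, labels].sum()
    # 4-connected smoothness term, counting each grid edge once
    e_smooth = beta * ((labels[1:, :] != labels[:-1, :]).sum() +
                       (labels[:, 1:] != labels[:, :-1]).sum())
    return e_data + e_smooth

In practice the labeling is of course not found by exhaustively scoring candidates; the three-label minimization is performed with graph cuts (e.g., alpha-expansion) as in [2].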
3. Background and Shadow Model

3.1. Modeling the Background

For each pixel, we model the background color information by the well-known GMM [17] with K_{BG} states in RGB color space. The first B states with higher weights and smaller variances in the mixture of K_{BG} distributions are considered as the background model. The index B is determined by

B = \arg\min_b ( \sum_{k=1}^{b} \omega_{BG,k} > T_{BG} ),   (3)

where T_{BG} is the pre-defined weight threshold and \omega_{BG,k} is the weight of the kth Gaussian state in the mixture model.

The likelihood of a given pixel p belonging to the background can be written as (the time subscript t is omitted):

p(z(p) | l_p = BG) = (1 / W_{BG}) \sum_{k=1}^{B} \omega_{BG,k} G(z(p), \mu_{BG,k}, \Sigma_{BG,k}),   (4)

where W_{BG} = \sum_{j=1}^{B} \omega_{BG,j} is a normalization constant, and G(z(p), \mu_{BG,k}, \Sigma_{BG,k}) is the probability density function of the kth Gaussian with parameters \theta_{BG,k} = {\mu_{BG,k}, \Sigma_{BG,k}}, in which \mu_{BG,k} is the mean vector and \Sigma_{BG,k} is the covariance matrix.
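As a rough illustration of (3)-(4) (ours, with hypothetical parameter arrays and the paper's diagonal-covariance assumption), the per-pixel background likelihood might be computed as:

import numpy as np

def background_likelihood(z, w, mu, var, T_bg=0.7):
    # z   : (3,) RGB observation
    # w   : (K,) state weights, assumed ordered so background states come first
    # mu  : (K, 3) state means;  var : (K, 3) diagonal covariances
    # T_bg: illustrative weight threshold
    B = int(np.searchsorted(np.cumsum(w), T_bg)) + 1     # Eq. (3)
    d = z[None, :] - mu[:B]
    g = np.exp(-0.5 * np.sum(d * d / var[:B], axis=1)) \
        / np.sqrt((2 * np.pi) ** 3 * np.prod(var[:B], axis=1))
    return np.dot(w[:B], g) / w[:B].sum()                # Eq. (4), normalized by W_BG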

3.2. Learning Moving Cast Shadows

This subsection presents how to construct the shadow models from the background model. Since different foreground objects often block the light sources in a similar way, cast shadows on background surfaces are similar and independent of the foreground objects. The ratio of the shadowed and illuminated values of a given surface point is considered to be nearly constant. We exploit this regularity of shadows to describe the color ratio of a pixel under shadow and under normal illumination. We first use the background model to detect moving pixels, which may consist of real foreground and cast shadows. The weak shadow detector in Section 3.2.1 is then employed as a pre-filter to decide possible shadow points. These samples are used to train the LSM in Section 3.2.2 and the GSM in Section 3.2.3 over time. The slow convergence of the LSM is refined by the supervision of the GSM through confidence-rated learning in Section 3.3.

3.2.1 Weak Shadow Detector

The weak shadow detector evaluates every moving pixel detected by the background model to filter out impossible shadow points. The design principle of the weak shadow classifier is not to detect moving shadows accurately, but to determine whether a pixel value fits the properties of cast shadows. For simplicity, we pose our problem in RGB color space. Since cast shadows on a surface reduce the luminance values and change the saturation, we define the potential shadow values to fall into the conic volume around the corresponding background color [11], as illustrated in Fig. 2. For a moving pixel p, we evaluate the relationship between the observed pixel value z_t(p) and the corresponding background model b_t(p). The relationship can be characterized by two parameters, the luminance ratio r_l(p) and the angle variation \theta(p):

r_l(p) = ||z_t(p)|| \cos(\theta(p)) / ||b_t(p)||,   (5)

\theta(p) = \arccos( \langle z_t(p), b_t(p) \rangle / ( ||z_t(p)|| \, ||b_t(p)|| ) ),   (6)

where \langle \cdot, \cdot \rangle is the inner product operator and ||\cdot|| is the norm of a vector. A pixel p is considered a potential cast shadow point if it satisfies the following criteria:

r_{min} < r_l(p) < r_{max},   \theta(p) < \theta_{max},   (7)

where r_{max} < 1 and r_{min} > 0 define the maximum brightness and darkness respectively, and \theta_{max} is the maximum allowed angle variation.

Figure 2. The weak shadow detector. The observed value z_t(p) is considered a shadow candidate if it falls into the shaded area.
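A minimal sketch of the test in (5)-(7) follows (our illustration; the threshold values are placeholders, since the paper's settings are not given in this excerpt):

import numpy as np

def is_shadow_candidate(z, b, r_min=0.3, r_max=0.95, theta_max=0.15):
    # z : (3,) observed RGB of a moving pixel;  b : (3,) background mean
    # r_min, r_max, theta_max : illustrative thresholds (theta in radians)
    nz, nb = np.linalg.norm(z), np.linalg.norm(b)
    cos_t = np.dot(z, b) / (nz * nb + 1e-12)
    theta = np.arccos(np.clip(cos_t, -1.0, 1.0))           # Eq. (6)
    r_l = nz * cos_t / (nb + 1e-12)                        # Eq. (5)
    return (r_min < r_l < r_max) and (theta < theta_max)   # Eq. (7)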
3.2.2 Local Shadow Model (LSM)

The color ratio between the shadowed and illuminated intensity values of a given pixel p is represented as a vector of random variables:

r_t(p) = ( z_t^r(p) / b_t^r(p), z_t^g(p) / b_t^g(p), z_t^b(p) / b_t^b(p) ),   (8)

where b_t^i(p), i = r, g, b are the means of the most probable Gaussian distributions (those with the highest \omega/||\Sigma|| value) of the background model in the three channels, and z_t^i(p), i = r, g, b are the observed pixel values. The weak shadow detector is applied to the moving regions detected by the background model to obtain possible shadow points. We call these possible shadow points "shadow candidates", which form the set Q. At the pixel level, we model r_t(p), p ∈ Q, using a GMM as the LSM to learn and describe the color feature of shadows.

Once the LSM is constructed at pixel p, it can serve as the shadow model to evaluate the likelihood of a local observation z_t(p). Since we describe shadow by the color ratio of the pixel intensity values between shadow and background, the likelihood of an observation z(p) can be derived by a transformation from "color ratio" to "color". Let s(p) be a random variable describing the intensity values of pixel p when the pixel is shadowed. Then s(p) can be characterized as the product of two random variables, s(p) = r(p)b(p), where b(p) is the most probable Gaussian distribution with parameters {\mu_{BG,1}, \Sigma_{BG,1}}, and r(p) is the color ratio modeled as a GMM with parameters {\omega_{CR,k}, \mu_{CR,k}, \Sigma_{CR,k}}, in which CR denotes "color ratio". When the pixel p is shadowed, the likelihood can be written as:

p(z(p) | l_p = SD) = (1 / W_{SD}) \sum_{k=1}^{S} \omega_{SD,k} G(z(p), \mu_{SD,k}, \Sigma_{SD,k}),   (9)

where the normalization constant W_{SD} and S are determined in a similar way as in (3) with threshold T_{SD}; the weight \omega_{SD,k}, mean vector \mu_{SD,k}, and diagonal covariance matrix \Sigma_{SD,k} of the kth state are:

\omega_{SD,k} = \omega_{CR,k},   (10)

\mu_{SD,k} = \mu_{BG,1} * \mu_{CR,k},   (11)

\Sigma_{SD,k} = \mu_{BG,1}^2 \Sigma_{CR,k} + \mu_{CR,k}^2 \Sigma_{BG,1} + \Sigma_{CR,k} \Sigma_{BG,1}.   (12)
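The transformation (10)-(12) is straightforward to implement; a sketch (ours, assuming diagonal covariances stored as vectors, as in the paper) is:

import numpy as np

def shadow_states(mu_bg, var_bg, w_cr, mu_cr, var_cr):
    # mu_bg, var_bg : (3,) mean / diagonal variance of the top background state
    # w_cr          : (S,) color-ratio state weights
    # mu_cr, var_cr : (S, 3) color-ratio state means / diagonal variances
    w_sd = w_cr.copy()                            # Eq. (10)
    mu_sd = mu_bg[None, :] * mu_cr                # Eq. (11)
    var_sd = (mu_bg[None, :] ** 2 * var_cr        # Eq. (12): variance of the
              + mu_cr ** 2 * var_bg[None, :]      # product of two independent
              + var_cr * var_bg[None, :])         # random variables
    return w_sd, mu_sd, var_sd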

Benedek et al. [1] also defined color features in CIELuv color space in a similar form and performed a background-shadow color value transformation. However, they did not take the uncertainty of the background model into consideration. Thus, the parameter estimation of the shadow model may become inaccurate for scenes with a high noise level.

As more samples are gathered, the LSM becomes more precise. With its online learning ability, the LSM can adapt not only to local characteristics of the background, but also to globally time-varying illumination conditions. However, the LSM requires several training samples for the estimated parameters to converge. In other words, a single pixel should be shadowed many times under the same illumination condition. This assumption is not always fulfilled, particularly in regions where motion is rare. Thus, the LSM on its own is not effective enough to capture the characteristics of shadows.

3.2.3 Global Shadow Model (GSM)

To increase the convergence rate, we propose to update the LSM using samples along with their confidence values. This is achieved by maintaining a global shadow model, which adds all shadow candidates in an image (r_t(p), p ∈ Q) into the model. In contrast to the LSM, the GSM obtains many more samples (every pixel p ∈ Q) and thus does not suffer from slow learning. We also characterize the shadows of the GSM in an image by a GMM.

To evaluate the confidence value of a pixel p ∈ Q, we first compute the color feature r_t(p) by (8) and check it against every state in the GSM. If r_t(p) is associated with the mth state in the mixture of the GSM, denoted G_m, the confidence value can be approximated by the probability of G_m being considered as shadow:

C(r_t(p)) = p(l_p = SD | r_t(p)) = p(l_p = SD | G_m).   (13)

In a GMM, Gaussian states with higher prior probabilities and smaller variances are considered as shadows. Therefore, we approximate p(SD | G_m) using logistic regression, similar to that in [7] for background subtraction; a sketch of such a mapping is given at the end of this subsection. On the other hand, if no state in the GSM is associated with the input color feature value r_t(p), the confidence value is set to zero.

For each pixel p ∈ Q, we evaluate the confidence value C(r_t(p)) predicted by the GSM and then update the LSM through confidence-rated learning (presented in Section 3.3). With the proposed learning approach, the model need not obtain numerous samples to converge; a few samples with high confidence values are sufficient. The convergence rate is thus significantly improved.
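The excerpt does not spell out the exact regression used to approximate p(SD | G_m); purely as an illustration, a fitted sigmoid of a matched state's weight-to-standard-deviation ratio (with hypothetical parameters a and b) would behave as described, assigning high confidence to heavy, compact states:

import numpy as np

def state_confidence(w_m, sigma_m, a=20.0, b=0.1):
    # w_m, sigma_m : weight and standard deviation of the matched GSM state
    # a, b         : hypothetical logistic parameters (fitted, as in [7])
    return 1.0 / (1.0 + np.exp(-a * (w_m / sigma_m - b)))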

3.3. Confidence-rated Gaussian Mixture Learning

We present an effective Gaussian mixture learning algorithm to overcome some drawbacks of the conventional GMM learning approach [17]. Let \alpha_\omega and \alpha_g be the learning rates for the weights and the Gaussian parameters (means and variances) in the LSM, respectively. The updating scheme follows a combination of incremental EM learning and recursive filtering:

\alpha_\omega = C(r_t) * ( (1 - \alpha_{default}) / \sum_{j=1}^{K} c_{j,t} ) + \alpha_{default},   (14)

\alpha_g = C(r_t) * ( (1 - \alpha_{default}) / c_{k,t} ) + \alpha_{default},   (15)

where c_{k,t} is the number of matches of the kth Gaussian state, and \alpha_{default} is a small constant, set to 0.005 in this paper. In the initial learning stage, the total number of matches at the pixel, \sum_{j=1}^{K} c_{j,t}, is small, and thus the learning rate for the weights is relatively large. As time goes on, the pixel becomes stable and the learning rate approaches that of a recursive filter. Similarly, the learning rate for the Gaussian parameters \alpha_g is relatively large for newly generated states.

Instead of a blind update scheme, which treats each sample in the same way, we propose to update each sample according to its confidence value. The two learning rates are controlled by a confidence value C(r_t), which indicates how confidently the sample belongs to the class. Observations with higher confidence values converge faster than those with low ones. The confidence-rated learning procedure in one dimension is described in Algorithm 1.

Algorithm 1: Confidence-Rated Learning Procedure in One Dimension

User-defined variables: K, \omega_{init}, \sigma_{init}, \alpha_{default}
Initialization: for j = 1...K: \omega_j = 0, \mu_j = Inf, \sigma_j = \sigma_{init}, c_j = 0

while new observed data r_t with confidence C(r_t) do
    \alpha_\omega = C(r_t) * ( (1 - \alpha_{default}) / \sum_{j=1}^{K} c_{j,t} ) + \alpha_{default}
    if r_t is associated with the kth Gaussian then
        for j = 1 to K do
            if j = k then M_{j,t} = 1 else M_{j,t} = 0
            \omega_{j,t} = (1 - \alpha_\omega) \omega_{j,t-1} + \alpha_\omega M_{j,t}
        end
        c_{k,t} = c_{k,t-1} + 1
        \alpha_g = C(r_t) * ( (1 - \alpha_{default}) / c_{k,t} ) + \alpha_{default}
        \mu_{k,t} = (1 - \alpha_g) \mu_{k,t-1} + \alpha_g r_t
        \sigma^2_{k,t} = (1 - \alpha_g) \sigma^2_{k,t-1} + \alpha_g (r_t - \mu_{k,t-1})^2
    else
        for j = 1 to K do
            \omega_{j,t} = (1 - \alpha_\omega) \omega_{j,t-1}
            \mu_{j,t} = \mu_{j,t-1},  \sigma^2_{j,t} = \sigma^2_{j,t-1}
        end
        k = \arg\min_j { \omega_{j,t} / \sigma_{j,t} }
        c_{k,t} = 1
        \omega_{k,t} = C(r_t) * ( (1 - \omega_{init}) / \sum_{j=1}^{K} c_{j,t} ) + \omega_{init}
        \mu_{k,t} = r_t,  \sigma^2_{k,t} = \sigma^2_{init}
    end
end
                                          and stability. In order to maintain the system stability, a
                                          very small learning rate will be chosen to preserve a long
                                          learning history. However, a small learning rate results in
                                          slow convergence speed. While larger learning rate im-          ground. In [16], they exploited the temporal persistence as
                                          proves the convergence speed, the model becomes unsta-          a property to model foreground using joint domain-range
                                          ble. In the proposed method, the adaptability of the learn-     nonparametric density function. However, the foreground
                                          ing rate for Gaussian parameters allows fast convergence        model in the previous frame cannot provide an accurate de-
                                          for the newly-generated states without degrading the stabil-    scription for foreground in the current frame when large
                                          ity. Lastly, there are tradeoffs regarding where to update      motion occurs or new moving objects enter into the scene.
                                          in the image. Typically, there are two ways to update the       Therefore, instead of exploring temporal statistics, we turn
                                          model: selective update and blind update Selective update       to spatial color information in the current frame. We use the
                                          only adds samples being considered as background and thus       background and shadow model to generate a partial fore-
                                          enhances the detection of foreground. Unfortunately, this       ground mask, i.e. a set of pixels that can be confidently
                                          approach would fall into a deadlock situation whenever the      labeled as foreground. Nonparametric method is used to
                                          incorrect update decision was made. On the other hand, the      estimate the foreground probability because the spatial aug-
                                          blind update adds all samples into the model, thus it might     mented GMM might face the problem of choosing the cor-
                                          result in more false negatives. By updating all samples with    rect number of modes. Details are introduced below.
                                          corresponding confidence, the trade-off regarding how to
                                          update in the image can be avoid.                                   In the partial foreground generation stage, we aim not
                                                                                                          to find all foreground pixels, but some samples from fore-
                                                                                                          ground. This is achieved by finding pixels whose inten-
                                          4. Foreground Model and Segmentation                            sity values are impossible to be generated from the existing
                                                                                                          background and shadow model. In other words, the partial
                                             The foreground model has been described as a uniform         foreground mask F can be generated by simple threshold-
                                          distribution [19], providing a weak description of the fore-    ing:
F = { p | E_{BG}(z_t(p)) > \kappa, E_{SD}(z_t(p)) > \kappa },   (16)

where E(\cdot) is the minus log of the likelihood of a pixel belonging to background or shadow, and \kappa is a threshold.

We can now estimate the foreground probability of a query pixel p using samples from the foreground. We use the neighboring pixels around the query point p to gather M relevant samples, which form the set G_p = {g_1, g_2, ..., g_M}.

When a pixel p is in the foreground, the probability can be estimated using the kernel density estimation method [18]:

p(z_p | l_p = FG) = (1/M) \sum_{i=1}^{M} K_H(z_p - g_i),   (17)

where K_H(x) = |H|^{-1/2} K(H^{-1/2} x). The kernel K is taken to be a 3-variate density function with \int K(w) dw = 1, \int w K(w) dw = 0, and \int w w^T K(w) dw = I_3, and H is a symmetric positive definite 3x3 bandwidth matrix. We choose the common Gaussian density as our kernel K:

K_H(x) = |H|^{-1/2} (2\pi)^{-d/2} \exp( -(1/2) x^T H^{-1} x ).   (18)

Although variable-bandwidth kernel density estimators can lead to improvements over kernel density estimators with a global bandwidth [14], the computational cost of choosing an optimal bandwidth is too expensive for a real-time surveillance system. Hence, the bandwidth matrix H is assumed to be diagonal, in which only two parameters are defined: the variance in the spatial domain \sigma_s and that in the color domain \sigma_c.
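A sketch of (17)-(18) for the color part (ours; how the samples are gathered and the bandwidth value are illustrative, and the spatial bandwidth \sigma_s enters through which neighbors are collected into G_p):

import numpy as np

def foreground_likelihood(z, samples, sigma_c=15.0):
    # z       : (3,) query pixel color
    # samples : (M, 3) colors of partial-foreground pixels near the query
    #           point (the set G_p);  sigma_c : color bandwidth
    if len(samples) == 0:
        return 0.0
    d = (np.asarray(samples, dtype=float) - z) / sigma_c
    k = np.exp(-0.5 * np.sum(d * d, axis=1)) / ((2 * np.pi) ** 1.5 * sigma_c ** 3)
    return float(k.mean())          # Eq. (17) with the Gaussian kernel of Eq. (18)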
After computing the data likelihoods of the local observation z_t(p) for the three classes, we can perform maximum a posteriori (MAP) estimation of the label field L. The energy function of Eq. 1 can be efficiently minimized by the graph cut algorithm.

A concrete description of the learning process and the detection process is given in Algorithms 2 and 3, respectively.

Algorithm 2: Learning Process

At time t, with segmentation field S_t from the detection process:
for each pixel p ∈ P do
    Update the background model
    if S_t(p) ≠ BG then
        if pixel p satisfies the shadow property (Eq. 7) then
            Compute the confidence value C(r_t(p)) from the global shadow model (Eq. 13)
            Update the global shadow model
            Update the local shadow model at pixel p with confidence C(r_t(p))
        end
    end
end

Algorithm 3: Detection Process

At time t:
for each pixel p ∈ P do
    Compute the background likelihood p(z_t(p) | l_p = BG) (Eq. 4)
    Compute the shadow likelihood p(z_t(p) | l_p = SD) (Eq. 9)
    Generate the partial foreground mask F (Eq. 16)
    Find samples with respect to pixel p from the neighboring pixels in F
    Compute the foreground likelihood p(z_p | l_p = FG) (Eq. 17)
end
Construct the graph and minimize Eq. 1
Generate the segmentation field S_t
5. Experimental Results

We have implemented and tested the proposed approach on various scene types. In the LSM and GSM, three and five Gaussian states are assigned, respectively. The initial variance of both the LSM and GSM is set to 0.0025. We use three Gaussian states for the background, and the covariance matrices are assumed diagonal.

Here, we describe the test sequences used in this paper.

• "Highway" is a sequence from the benchmark set [13]. It shows a traffic environment. The shadows appearing in the video are dark and cover a large area of the scene.

• "Laboratory" is a typical indoor sequence with light shadow, but the lighting conditions are more complex than in the outdoor scenes.

• "Intelligent Room" [13] is an indoor sequence with low shadow strength.

• "Entrance" [1] comprises typical surveillance videos captured at different times of day, and thus under different lighting conditions.

Figures 3 and 4 demonstrate the effectiveness of the proposed method in both outdoor and indoor scenes. In each figure, (a) shows the original image of the sequence; (b) shows the moving pixels detected by the background model, where the gray regions are labeled as shadow points by the weak shadow detector. Note that the weak shadow detector has a very high detection rate, but with some false detections. (c) is the confidence map predicted by the GSM; the lighter a region is, the higher the confidence value the model predicts. (d) shows the segmentation result after performing the graph cut algorithm.

Figure 3. Outdoor sequence "Highway". (a) Frame of the sequence. (b) Detection result of the weak shadow detector. (c) Confidence map from the GSM. (d) Foreground and segmentation result.

Figure 4. Indoor sequence "Laboratory". (a) Frame of the sequence. (b) Detection result of the weak shadow detector. (c) Confidence map from the GSM. (d) Foreground and segmentation result.

The effect of the GSM is illustrated in Figure 5. The detection result without the GSM is presented in Figure 5(b), in which some portions of real foreground are mislabeled as shadows; this is due to the slow learning of the LSM. In Figure 5(c), the foreground objects are accurately detected.

Figure 5. Effect of the global shadow model on the sequence "Intelligent Room". (a) Frame of the sequence. (b) Detection result without the GSM. (c) Detection result with the GSM.

Figure 6 demonstrates the adaptability of the proposed algorithm to different lighting conditions. Figures 6(a), (c), and (e) show the same scene captured at different periods of the day: morning, noon, and afternoon, respectively. As shown in Figures 6(b), (d), and (f), our method detects the foreground and shadows correctly.

Figure 6. Different periods of the day in the "Entrance" sequence. (a) In the morning. (c) At noon. (e) In the afternoon. (b)(d)(f) Detection results for the frames in (a), (c), and (e), respectively.

Tables 1 and 2 show a quantitative comparison with previous approaches. The shadow detection rate η and the shadow discrimination rate ξ follow the metrics described in [13]; readers should refer to [13] for the exact equations.
6. Conclusion and Future Work

In this paper, we have presented a general model for foreground/shadow/background segmentation. There are several novelties in this work. For cast shadow detection, we have proposed a pixel-based model to describe the properties of shadows. The pixel-based model has the advantage over a global model that it can adapt to local illumination conditions, particularly for backgrounds under complex lighting conditions. To solve the slow convergence of the local shadow model, we maintain a global shadow model to predict the confidence value of a given sample and update the local shadow model through confidence-rated learning. We have also developed a nonparametric foreground model that exploits spatial color characteristics and is free from the assumption that object motion is slow. The likelihoods of background, shadow, and foreground are built into an MRF energy function in which optimal global inference can be achieved by the graph cut algorithm. The effectiveness and robustness of the proposed method have been validated on both indoor and outdoor surveillance videos. We have introduced only the color feature in this work; in the future, other features such as edge or motion can easily be incorporated into our framework.
Table 2. Quantitative comparison on the "Intelligent Room" sequence. η denotes the shadow detection rate and ξ the shadow discrimination rate, as defined in [13].

    Method       η (%)    ξ (%)
    Proposed     83.12    94.31
    SNP [13]     72.82    88.90
    SP [13]      76.27    90.74
    DNM1 [13]    78.61    90.29
    DNM2 [13]    62.00    93.89

Figure 6. Different periods of the day in the "Entrance" sequence. (a) In the morning. (c) At noon. (e) In the afternoon. (b)(d)(f) Detection results for the frames in (a), (c), and (e), respectively.

Table 1. Quantitative comparison on the "Highway" sequence (η and ξ as in Table 2).

    Method       η (%)    ξ (%)
    Proposed     76.76    95.12
    SNP [13]     81.59    63.76
    SP [13]      59.59    84.70
    DNM1 [13]    69.72    76.93
    DNM2 [13]    75.49    62.38
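
For reference, the sketch below computes the two metrics reported in Tables 1 and 2 from per-pixel labelings, assuming the standard definitions of Prati et al. [13]: η rewards ground-truth shadow pixels labeled as shadow, and ξ rewards ground-truth foreground pixels not mislabeled as shadow. The label encoding and function name are illustrative choices, not from the paper.

    import numpy as np

    BG, SHADOW, FG = 0, 1, 2  # assumed per-pixel label encoding

    def shadow_metrics(pred, gt):
        """Shadow detection rate (eta) and shadow discrimination rate (xi)."""
        shadow_gt = (gt == SHADOW)
        fg_gt = (gt == FG)
        # eta: fraction of ground-truth shadow pixels labeled as shadow.
        eta = (pred[shadow_gt] == SHADOW).sum() / max(shadow_gt.sum(), 1)
        # xi: fraction of ground-truth foreground pixels NOT labeled as
        # shadow, i.e. foreground that the shadow detector leaves intact.
        xi = (pred[fg_gt] != SHADOW).sum() / max(fg_gt.sum(), 1)
        return float(eta), float(xi)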
Acknowledgment

This research was supported by the National Science Council of Taiwan under Grants NSC 96-3113-H-001-011 and NSC 95-2221-E-001-028-MY3.

References

[1] C. Benedek and T. Sziranyi. Bayesian foreground and shadow detection in uncertain frame rate surveillance videos. IEEE Trans. Image Processing, 17(4):608–621, Apr. 2008.
[2] Y. Boykov, O. Veksler, and R. Zabih. Fast approximate energy minimization via graph cuts. PAMI, 23(11):1222–1239, Nov. 2001.
[3] R. Cucchiara, C. Grana, M. Piccardi, and A. Prati. Detecting moving objects, ghosts, and shadows in video streams. PAMI, 25(10):1337–1342, 2003.
[4] G. S. K. Fung, N. H. C. Yung, G. K. H. Pang, and A. H. S. Lai. Effective moving cast shadow detection for monocular color traffic image sequences. Optical Engineering, 41(6):1425–1440, 2002.
[5] S. Geman and D. Geman. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. PAMI, 6(6):721–741, Nov. 1984.
[6] T. Horprasert, D. Harwood, and L. S. Davis. A statistical approach for real-time robust background subtraction and shadow detection. In Proc. ICCV Frame-rate Workshop, 1999.
[7] D. S. Lee. Effective Gaussian mixture learning for video background subtraction. PAMI, 27(5):827–832, 2005.
[8] Z. Liu, K. Huang, T. Tan, and L. Wang. Cast shadow removal combining local and global features. In Proc. CVPR, pages 1–8, 2007.
[9] N. Martel-Brisson and A. Zaccarin. Learning and removing cast shadows through a multidistribution approach. PAMI, 29(7):1133–1146, 2007.
[10] S. Nadimi and B. Bhanu. Physical models for moving shadow and object detection in video. PAMI, 26(8):1079–1087, 2004.
[11] F. Porikli and J. Thornton. Shadow flow: a recursive method to learn moving cast shadows. In Proc. ICCV, volume 1, pages 891–898, Oct. 2005.
[12] R. Potts. Some generalized order-disorder transformations. In Proc. of the Cambridge Philosoph. Soc., volume 48, page 81, 1952.
[13] A. Prati, I. Mikic, M. Trivedi, and R. Cucchiara. Detecting moving shadows: algorithms and evaluation. PAMI, 25(7):918–923, 2003.
[14] S. R. Sain. Multivariate locally adaptive density estimation. Comput. Stat. Data Anal., 39(2):165–186, 2002.
[15] O. Schreer, I. Feldmann, U. Golz, and P. A. Kauff. Fast and robust shadow detection in videoconference applications. In IEEE Int'l Symp. Video/Image Processing and Multimedia Communications, pages 371–375, 2002.
[16] Y. Sheikh and M. Shah. Bayesian modeling of dynamic scenes for object detection. PAMI, 27(11):1778–1792, Nov. 2005.
[17] C. Stauffer and W. E. L. Grimson. Adaptive background mixture models for real-time tracking. In Proc. CVPR, volume 2, pages 246–252, 1999.
[18] M. Wand and M. Jones. Kernel Smoothing. Chapman & Hall/CRC, 1995.
[19] Y. Wang, K.-F. Loe, and J.-K. Wu. A dynamic conditional random field model for foreground and shadow segmentation. PAMI, 28(2):279–289, Feb. 2006.
[20] W. Zhang, X. Z. Fang, X. K. Yang, and Q. M. J. Wu. Moving cast shadows detection using ratio edge. IEEE Trans. Multimedia, 9(6):1202–1214, 2007.

Weitere ähnliche Inhalte

Was ist angesagt?

Literature study article
Literature study articleLiterature study article
Literature study articleakaspers
 
26.motion and feature based person tracking
26.motion and feature based person tracking26.motion and feature based person tracking
26.motion and feature based person trackingsajit1975
 
Image Splicing Detection involving Moment-based Feature Extraction and Classi...
Image Splicing Detection involving Moment-based Feature Extraction and Classi...Image Splicing Detection involving Moment-based Feature Extraction and Classi...
Image Splicing Detection involving Moment-based Feature Extraction and Classi...IDES Editor
 
Retrieving Informations from Satellite Images by Detecting and Removing Shadow
Retrieving Informations from Satellite Images by Detecting and Removing ShadowRetrieving Informations from Satellite Images by Detecting and Removing Shadow
Retrieving Informations from Satellite Images by Detecting and Removing ShadowIJTET Journal
 
Shadow Detection and Removal using Tricolor Attenuation Model Based on Featur...
Shadow Detection and Removal using Tricolor Attenuation Model Based on Featur...Shadow Detection and Removal using Tricolor Attenuation Model Based on Featur...
Shadow Detection and Removal using Tricolor Attenuation Model Based on Featur...ijtsrd
 
Fcv learn sudderth
Fcv learn sudderthFcv learn sudderth
Fcv learn sudderthzukun
 
Real-time 3D Object Pose Estimation and Tracking for Natural Landmark Based V...
Real-time 3D Object Pose Estimation and Tracking for Natural Landmark Based V...Real-time 3D Object Pose Estimation and Tracking for Natural Landmark Based V...
Real-time 3D Object Pose Estimation and Tracking for Natural Landmark Based V...c.choi
 
Dorr Space Variant Spatio Temporal Filtering Of Video For Gaze Visualization ...
Dorr Space Variant Spatio Temporal Filtering Of Video For Gaze Visualization ...Dorr Space Variant Spatio Temporal Filtering Of Video For Gaze Visualization ...
Dorr Space Variant Spatio Temporal Filtering Of Video For Gaze Visualization ...Kalle
 
Anchor free object detection by deep learning
Anchor free object detection by deep learningAnchor free object detection by deep learning
Anchor free object detection by deep learningYu Huang
 
LiDAR-based Autonomous Driving III (by Deep Learning)
LiDAR-based Autonomous Driving III (by Deep Learning)LiDAR-based Autonomous Driving III (by Deep Learning)
LiDAR-based Autonomous Driving III (by Deep Learning)Yu Huang
 
Depth Fusion from RGB and Depth Sensors by Deep Learning
Depth Fusion from RGB and Depth Sensors by Deep LearningDepth Fusion from RGB and Depth Sensors by Deep Learning
Depth Fusion from RGB and Depth Sensors by Deep LearningYu Huang
 
SAL3D presentation - AQSENSE's 3D machine vision library
SAL3D presentation - AQSENSE's 3D machine vision librarySAL3D presentation - AQSENSE's 3D machine vision library
SAL3D presentation - AQSENSE's 3D machine vision libraryAQSENSE S.L.
 
Deep vo and slam iii
Deep vo and slam iiiDeep vo and slam iii
Deep vo and slam iiiYu Huang
 
Deferred Pixel Shading on the PLAYSTATION®3
Deferred Pixel Shading on the PLAYSTATION®3Deferred Pixel Shading on the PLAYSTATION®3
Deferred Pixel Shading on the PLAYSTATION®3Slide_N
 
E Cognition User Summit2009 S Lang Zgis Object Validity
E Cognition User Summit2009 S Lang Zgis Object ValidityE Cognition User Summit2009 S Lang Zgis Object Validity
E Cognition User Summit2009 S Lang Zgis Object ValidityTrimble Geospatial Munich
 

Was ist angesagt? (20)

Literature study article
Literature study articleLiterature study article
Literature study article
 
26.motion and feature based person tracking
26.motion and feature based person tracking26.motion and feature based person tracking
26.motion and feature based person tracking
 
Image Splicing Detection involving Moment-based Feature Extraction and Classi...
Image Splicing Detection involving Moment-based Feature Extraction and Classi...Image Splicing Detection involving Moment-based Feature Extraction and Classi...
Image Splicing Detection involving Moment-based Feature Extraction and Classi...
 
Retrieving Informations from Satellite Images by Detecting and Removing Shadow
Retrieving Informations from Satellite Images by Detecting and Removing ShadowRetrieving Informations from Satellite Images by Detecting and Removing Shadow
Retrieving Informations from Satellite Images by Detecting and Removing Shadow
 
Shadow Detection and Removal using Tricolor Attenuation Model Based on Featur...
Shadow Detection and Removal using Tricolor Attenuation Model Based on Featur...Shadow Detection and Removal using Tricolor Attenuation Model Based on Featur...
Shadow Detection and Removal using Tricolor Attenuation Model Based on Featur...
 
Fcv learn sudderth
Fcv learn sudderthFcv learn sudderth
Fcv learn sudderth
 
Real-time 3D Object Pose Estimation and Tracking for Natural Landmark Based V...
Real-time 3D Object Pose Estimation and Tracking for Natural Landmark Based V...Real-time 3D Object Pose Estimation and Tracking for Natural Landmark Based V...
Real-time 3D Object Pose Estimation and Tracking for Natural Landmark Based V...
 
Dorr Space Variant Spatio Temporal Filtering Of Video For Gaze Visualization ...
Dorr Space Variant Spatio Temporal Filtering Of Video For Gaze Visualization ...Dorr Space Variant Spatio Temporal Filtering Of Video For Gaze Visualization ...
Dorr Space Variant Spatio Temporal Filtering Of Video For Gaze Visualization ...
 
Anchor free object detection by deep learning
Anchor free object detection by deep learningAnchor free object detection by deep learning
Anchor free object detection by deep learning
 
Stereo vision
Stereo visionStereo vision
Stereo vision
 
LiDAR-based Autonomous Driving III (by Deep Learning)
LiDAR-based Autonomous Driving III (by Deep Learning)LiDAR-based Autonomous Driving III (by Deep Learning)
LiDAR-based Autonomous Driving III (by Deep Learning)
 
Chapter2 DIP
Chapter2 DIP Chapter2 DIP
Chapter2 DIP
 
Depth Fusion from RGB and Depth Sensors by Deep Learning
Depth Fusion from RGB and Depth Sensors by Deep LearningDepth Fusion from RGB and Depth Sensors by Deep Learning
Depth Fusion from RGB and Depth Sensors by Deep Learning
 
SAL3D presentation - AQSENSE's 3D machine vision library
SAL3D presentation - AQSENSE's 3D machine vision librarySAL3D presentation - AQSENSE's 3D machine vision library
SAL3D presentation - AQSENSE's 3D machine vision library
 
Deep vo and slam iii
Deep vo and slam iiiDeep vo and slam iii
Deep vo and slam iii
 
Lm342080283
Lm342080283Lm342080283
Lm342080283
 
F0164348
F0164348F0164348
F0164348
 
Deferred Pixel Shading on the PLAYSTATION®3
Deferred Pixel Shading on the PLAYSTATION®3Deferred Pixel Shading on the PLAYSTATION®3
Deferred Pixel Shading on the PLAYSTATION®3
 
E Cognition User Summit2009 S Lang Zgis Object Validity
E Cognition User Summit2009 S Lang Zgis Object ValidityE Cognition User Summit2009 S Lang Zgis Object Validity
E Cognition User Summit2009 S Lang Zgis Object Validity
 
Fb3110231028
Fb3110231028Fb3110231028
Fb3110231028
 

Andere mochten auch

the internet for translators
the internet for translatorsthe internet for translators
the internet for translatorsJustyna G_P
 
How to have a healthy life
How to have a healthy lifeHow to have a healthy life
How to have a healthy lifessheistheone
 
Marknadsföring av vuxutbildning i sociala medier
Marknadsföring av vuxutbildning i sociala medierMarknadsföring av vuxutbildning i sociala medier
Marknadsföring av vuxutbildning i sociala medierguest113593
 
A Physical Approach to Moving Cast Shadow Detection (ICASSP 2009)
A Physical Approach to Moving Cast Shadow Detection (ICASSP 2009)A Physical Approach to Moving Cast Shadow Detection (ICASSP 2009)
A Physical Approach to Moving Cast Shadow Detection (ICASSP 2009)Jia-Bin Huang
 
Espn wsummit blogpdf
Espn wsummit blogpdfEspn wsummit blogpdf
Espn wsummit blogpdfEmma Cookson
 
24symbols - Read and Share
24symbols - Read and Share24symbols - Read and Share
24symbols - Read and Share24Symbols
 
Knowledge Translation and Social Technologies
Knowledge Translation and Social TechnologiesKnowledge Translation and Social Technologies
Knowledge Translation and Social TechnologiesThom Kearney
 
Media studies as
Media studies asMedia studies as
Media studies asguest83f870
 
Community Stress Prevention Center
Community Stress Prevention CenterCommunity Stress Prevention Center
Community Stress Prevention CenterICSPC
 
Media studies as
Media studies asMedia studies as
Media studies asguest83f870
 
Don mc leod collaborative monkeys and other thoughts v1
Don mc leod collaborative monkeys and other thoughts v1 Don mc leod collaborative monkeys and other thoughts v1
Don mc leod collaborative monkeys and other thoughts v1 Thom Kearney
 
Spanishpower point
Spanishpower pointSpanishpower point
Spanishpower pointguest3b9684d
 
Caber bok trans jogja diubah jadi rp 5
Caber bok trans jogja diubah jadi rp 5Caber bok trans jogja diubah jadi rp 5
Caber bok trans jogja diubah jadi rp 5bogasi
 
Media studies as
Media studies asMedia studies as
Media studies asguest83f870
 
1ferrabe
1ferrabe1ferrabe
1ferrabeMaitane
 

Andere mochten auch (20)

the internet for translators
the internet for translatorsthe internet for translators
the internet for translators
 
How to have a healthy life
How to have a healthy lifeHow to have a healthy life
How to have a healthy life
 
Marknadsföring av vuxutbildning i sociala medier
Marknadsföring av vuxutbildning i sociala medierMarknadsföring av vuxutbildning i sociala medier
Marknadsföring av vuxutbildning i sociala medier
 
Juegos populares.
Juegos populares.Juegos populares.
Juegos populares.
 
A Physical Approach to Moving Cast Shadow Detection (ICASSP 2009)
A Physical Approach to Moving Cast Shadow Detection (ICASSP 2009)A Physical Approach to Moving Cast Shadow Detection (ICASSP 2009)
A Physical Approach to Moving Cast Shadow Detection (ICASSP 2009)
 
Espn wsummit blogpdf
Espn wsummit blogpdfEspn wsummit blogpdf
Espn wsummit blogpdf
 
24symbols - Read and Share
24symbols - Read and Share24symbols - Read and Share
24symbols - Read and Share
 
Knowledge Translation and Social Technologies
Knowledge Translation and Social TechnologiesKnowledge Translation and Social Technologies
Knowledge Translation and Social Technologies
 
Harvard Law Bulletin
Harvard Law BulletinHarvard Law Bulletin
Harvard Law Bulletin
 
Media studies as
Media studies asMedia studies as
Media studies as
 
Community Stress Prevention Center
Community Stress Prevention CenterCommunity Stress Prevention Center
Community Stress Prevention Center
 
Media studies as
Media studies asMedia studies as
Media studies as
 
Don mc leod collaborative monkeys and other thoughts v1
Don mc leod collaborative monkeys and other thoughts v1 Don mc leod collaborative monkeys and other thoughts v1
Don mc leod collaborative monkeys and other thoughts v1
 
Confirmation 18 & 19 done
Confirmation 18 & 19 doneConfirmation 18 & 19 done
Confirmation 18 & 19 done
 
Spanishpower point
Spanishpower pointSpanishpower point
Spanishpower point
 
Caber bok trans jogja diubah jadi rp 5
Caber bok trans jogja diubah jadi rp 5Caber bok trans jogja diubah jadi rp 5
Caber bok trans jogja diubah jadi rp 5
 
Media studies as
Media studies asMedia studies as
Media studies as
 
Enterprise & Transcoimmunication
Enterprise & TranscoimmunicationEnterprise & Transcoimmunication
Enterprise & Transcoimmunication
 
1ferrabe
1ferrabe1ferrabe
1ferrabe
 
Cop
CopCop
Cop
 

Ähnlich wie Learning Moving Cast Shadows for Foreground Detection (VS 2008)

Moving Cast Shadow Detection Using Physics-based Features (CVPR 2009)
Moving Cast Shadow Detection Using Physics-based Features (CVPR 2009)Moving Cast Shadow Detection Using Physics-based Features (CVPR 2009)
Moving Cast Shadow Detection Using Physics-based Features (CVPR 2009)Jia-Bin Huang
 
A Physical Approach to Moving Cast Shadow Detection (ICASSP 2009)
A Physical Approach to Moving Cast Shadow Detection (ICASSP 2009)A Physical Approach to Moving Cast Shadow Detection (ICASSP 2009)
A Physical Approach to Moving Cast Shadow Detection (ICASSP 2009)Jia-Bin Huang
 
Multiple Person Tracking with Shadow Removal Using Adaptive Gaussian Mixture ...
Multiple Person Tracking with Shadow Removal Using Adaptive Gaussian Mixture ...Multiple Person Tracking with Shadow Removal Using Adaptive Gaussian Mixture ...
Multiple Person Tracking with Shadow Removal Using Adaptive Gaussian Mixture ...IJSRD
 
IJARCCE 22
IJARCCE 22IJARCCE 22
IJARCCE 22Prasad K
 
Deep Learning for Structure-from-Motion (SfM)
Deep Learning for Structure-from-Motion (SfM)Deep Learning for Structure-from-Motion (SfM)
Deep Learning for Structure-from-Motion (SfM)PetteriTeikariPhD
 
Practical and Robust Stenciled Shadow Volumes for Hardware-Accelerated Rendering
Practical and Robust Stenciled Shadow Volumes for Hardware-Accelerated RenderingPractical and Robust Stenciled Shadow Volumes for Hardware-Accelerated Rendering
Practical and Robust Stenciled Shadow Volumes for Hardware-Accelerated RenderingMark Kilgard
 
Dj31514517
Dj31514517Dj31514517
Dj31514517IJMER
 
Dj31514517
Dj31514517Dj31514517
Dj31514517IJMER
 
HUMAN ACTION RECOGNITION IN VIDEOS USING STABLE FEATURES
HUMAN ACTION RECOGNITION IN VIDEOS USING STABLE FEATURES HUMAN ACTION RECOGNITION IN VIDEOS USING STABLE FEATURES
HUMAN ACTION RECOGNITION IN VIDEOS USING STABLE FEATURES sipij
 
Image Restoration for 3D Computer Vision
Image Restoration for 3D Computer VisionImage Restoration for 3D Computer Vision
Image Restoration for 3D Computer VisionPetteriTeikariPhD
 
Conception_et_realisation_dun_site_Web_d.pdf
Conception_et_realisation_dun_site_Web_d.pdfConception_et_realisation_dun_site_Web_d.pdf
Conception_et_realisation_dun_site_Web_d.pdfSofianeHassine2
 
Blind seperation image sources via adaptive dictionary learning
Blind seperation image sources via adaptive dictionary learningBlind seperation image sources via adaptive dictionary learning
Blind seperation image sources via adaptive dictionary learningMohan Raj
 
MIT 6.870 - Template Matching and Histograms (Nicolas Pinto, MIT)
MIT 6.870 - Template Matching and Histograms (Nicolas Pinto, MIT)MIT 6.870 - Template Matching and Histograms (Nicolas Pinto, MIT)
MIT 6.870 - Template Matching and Histograms (Nicolas Pinto, MIT)npinto
 
Effective Object Detection and Background Subtraction by using M.O.I
Effective Object Detection and Background Subtraction by using M.O.IEffective Object Detection and Background Subtraction by using M.O.I
Effective Object Detection and Background Subtraction by using M.O.IIJMTST Journal
 
Shadow Caster Culling for Efficient Shadow Mapping (Authors: Jiří Bittner, Ol...
Shadow Caster Culling for Efficient Shadow Mapping (Authors: Jiří Bittner, Ol...Shadow Caster Culling for Efficient Shadow Mapping (Authors: Jiří Bittner, Ol...
Shadow Caster Culling for Efficient Shadow Mapping (Authors: Jiří Bittner, Ol...Umbra
 
Pami meanshift
Pami meanshiftPami meanshift
Pami meanshiftirisshicat
 
Kurmi 2015-ijca-905317
Kurmi 2015-ijca-905317Kurmi 2015-ijca-905317
Kurmi 2015-ijca-905317Yashwant Kurmi
 
A Survey on Single Image Dehazing Approaches
A Survey on Single Image Dehazing ApproachesA Survey on Single Image Dehazing Approaches
A Survey on Single Image Dehazing ApproachesIRJET Journal
 

Ähnlich wie Learning Moving Cast Shadows for Foreground Detection (VS 2008) (20)

Moving Cast Shadow Detection Using Physics-based Features (CVPR 2009)
Moving Cast Shadow Detection Using Physics-based Features (CVPR 2009)Moving Cast Shadow Detection Using Physics-based Features (CVPR 2009)
Moving Cast Shadow Detection Using Physics-based Features (CVPR 2009)
 
A Physical Approach to Moving Cast Shadow Detection (ICASSP 2009)
A Physical Approach to Moving Cast Shadow Detection (ICASSP 2009)A Physical Approach to Moving Cast Shadow Detection (ICASSP 2009)
A Physical Approach to Moving Cast Shadow Detection (ICASSP 2009)
 
Multiple Person Tracking with Shadow Removal Using Adaptive Gaussian Mixture ...
Multiple Person Tracking with Shadow Removal Using Adaptive Gaussian Mixture ...Multiple Person Tracking with Shadow Removal Using Adaptive Gaussian Mixture ...
Multiple Person Tracking with Shadow Removal Using Adaptive Gaussian Mixture ...
 
IJARCCE 22
IJARCCE 22IJARCCE 22
IJARCCE 22
 
Deep Learning for Structure-from-Motion (SfM)
Deep Learning for Structure-from-Motion (SfM)Deep Learning for Structure-from-Motion (SfM)
Deep Learning for Structure-from-Motion (SfM)
 
Practical and Robust Stenciled Shadow Volumes for Hardware-Accelerated Rendering
Practical and Robust Stenciled Shadow Volumes for Hardware-Accelerated RenderingPractical and Robust Stenciled Shadow Volumes for Hardware-Accelerated Rendering
Practical and Robust Stenciled Shadow Volumes for Hardware-Accelerated Rendering
 
Dj31514517
Dj31514517Dj31514517
Dj31514517
 
Dj31514517
Dj31514517Dj31514517
Dj31514517
 
HUMAN ACTION RECOGNITION IN VIDEOS USING STABLE FEATURES
HUMAN ACTION RECOGNITION IN VIDEOS USING STABLE FEATURES HUMAN ACTION RECOGNITION IN VIDEOS USING STABLE FEATURES
HUMAN ACTION RECOGNITION IN VIDEOS USING STABLE FEATURES
 
Image Restoration for 3D Computer Vision
Image Restoration for 3D Computer VisionImage Restoration for 3D Computer Vision
Image Restoration for 3D Computer Vision
 
Conception_et_realisation_dun_site_Web_d.pdf
Conception_et_realisation_dun_site_Web_d.pdfConception_et_realisation_dun_site_Web_d.pdf
Conception_et_realisation_dun_site_Web_d.pdf
 
Blind seperation image sources via adaptive dictionary learning
Blind seperation image sources via adaptive dictionary learningBlind seperation image sources via adaptive dictionary learning
Blind seperation image sources via adaptive dictionary learning
 
MIT 6.870 - Template Matching and Histograms (Nicolas Pinto, MIT)
MIT 6.870 - Template Matching and Histograms (Nicolas Pinto, MIT)MIT 6.870 - Template Matching and Histograms (Nicolas Pinto, MIT)
MIT 6.870 - Template Matching and Histograms (Nicolas Pinto, MIT)
 
Effective Object Detection and Background Subtraction by using M.O.I
Effective Object Detection and Background Subtraction by using M.O.IEffective Object Detection and Background Subtraction by using M.O.I
Effective Object Detection and Background Subtraction by using M.O.I
 
Shadow Caster Culling for Efficient Shadow Mapping (Authors: Jiří Bittner, Ol...
Shadow Caster Culling for Efficient Shadow Mapping (Authors: Jiří Bittner, Ol...Shadow Caster Culling for Efficient Shadow Mapping (Authors: Jiří Bittner, Ol...
Shadow Caster Culling for Efficient Shadow Mapping (Authors: Jiří Bittner, Ol...
 
sibgrapi2015
sibgrapi2015sibgrapi2015
sibgrapi2015
 
L010427275
L010427275L010427275
L010427275
 
Pami meanshift
Pami meanshiftPami meanshift
Pami meanshift
 
Kurmi 2015-ijca-905317
Kurmi 2015-ijca-905317Kurmi 2015-ijca-905317
Kurmi 2015-ijca-905317
 
A Survey on Single Image Dehazing Approaches
A Survey on Single Image Dehazing ApproachesA Survey on Single Image Dehazing Approaches
A Survey on Single Image Dehazing Approaches
 

Mehr von Jia-Bin Huang

How to write a clear paper
How to write a clear paperHow to write a clear paper
How to write a clear paperJia-Bin Huang
 
Research 101 - Paper Writing with LaTeX
Research 101 - Paper Writing with LaTeXResearch 101 - Paper Writing with LaTeX
Research 101 - Paper Writing with LaTeXJia-Bin Huang
 
Computer Vision Crash Course
Computer Vision Crash CourseComputer Vision Crash Course
Computer Vision Crash CourseJia-Bin Huang
 
Applying for Graduate School in S.T.E.M.
Applying for Graduate School in S.T.E.M.Applying for Graduate School in S.T.E.M.
Applying for Graduate School in S.T.E.M.Jia-Bin Huang
 
Linear Algebra and Matlab tutorial
Linear Algebra and Matlab tutorialLinear Algebra and Matlab tutorial
Linear Algebra and Matlab tutorialJia-Bin Huang
 
Single Image Super-Resolution from Transformed Self-Exemplars (CVPR 2015)
Single Image Super-Resolution from Transformed Self-Exemplars (CVPR 2015)Single Image Super-Resolution from Transformed Self-Exemplars (CVPR 2015)
Single Image Super-Resolution from Transformed Self-Exemplars (CVPR 2015)Jia-Bin Huang
 
Lecture 29 Convolutional Neural Networks - Computer Vision Spring2015
Lecture 29 Convolutional Neural Networks -  Computer Vision Spring2015Lecture 29 Convolutional Neural Networks -  Computer Vision Spring2015
Lecture 29 Convolutional Neural Networks - Computer Vision Spring2015Jia-Bin Huang
 
Lecture 21 - Image Categorization - Computer Vision Spring2015
Lecture 21 - Image Categorization -  Computer Vision Spring2015Lecture 21 - Image Categorization -  Computer Vision Spring2015
Lecture 21 - Image Categorization - Computer Vision Spring2015Jia-Bin Huang
 
Writing Fast MATLAB Code
Writing Fast MATLAB CodeWriting Fast MATLAB Code
Writing Fast MATLAB CodeJia-Bin Huang
 
Image Completion using Planar Structure Guidance (SIGGRAPH 2014)
Image Completion using Planar Structure Guidance (SIGGRAPH 2014)Image Completion using Planar Structure Guidance (SIGGRAPH 2014)
Image Completion using Planar Structure Guidance (SIGGRAPH 2014)Jia-Bin Huang
 
Toward Accurate and Robust Cross-Ratio based Gaze Trackers Through Learning F...
Toward Accurate and Robust Cross-Ratio based Gaze Trackers Through Learning F...Toward Accurate and Robust Cross-Ratio based Gaze Trackers Through Learning F...
Toward Accurate and Robust Cross-Ratio based Gaze Trackers Through Learning F...Jia-Bin Huang
 
Real-time Face Detection and Recognition
Real-time Face Detection and RecognitionReal-time Face Detection and Recognition
Real-time Face Detection and RecognitionJia-Bin Huang
 
Transformation Guided Image Completion ICCP 2013
Transformation Guided Image Completion ICCP 2013Transformation Guided Image Completion ICCP 2013
Transformation Guided Image Completion ICCP 2013Jia-Bin Huang
 
Jia-Bin Huang's Curriculum Vitae
Jia-Bin Huang's Curriculum VitaeJia-Bin Huang's Curriculum Vitae
Jia-Bin Huang's Curriculum VitaeJia-Bin Huang
 
Pose aware online visual tracking
Pose aware online visual trackingPose aware online visual tracking
Pose aware online visual trackingJia-Bin Huang
 
Face Expression Enhancement
Face Expression EnhancementFace Expression Enhancement
Face Expression EnhancementJia-Bin Huang
 
Image Smoothing for Structure Extraction
Image Smoothing for Structure ExtractionImage Smoothing for Structure Extraction
Image Smoothing for Structure ExtractionJia-Bin Huang
 
Three Reasons to Join FVE at uiuc
Three Reasons to Join FVE at uiucThree Reasons to Join FVE at uiuc
Three Reasons to Join FVE at uiucJia-Bin Huang
 
Static and Dynamic Hand Gesture Recognition
Static and Dynamic Hand Gesture RecognitionStatic and Dynamic Hand Gesture Recognition
Static and Dynamic Hand Gesture RecognitionJia-Bin Huang
 
Real-Time Face Detection, Tracking, and Attributes Recognition
Real-Time Face Detection, Tracking, and Attributes RecognitionReal-Time Face Detection, Tracking, and Attributes Recognition
Real-Time Face Detection, Tracking, and Attributes RecognitionJia-Bin Huang
 

Mehr von Jia-Bin Huang (20)

How to write a clear paper
How to write a clear paperHow to write a clear paper
How to write a clear paper
 
Research 101 - Paper Writing with LaTeX
Research 101 - Paper Writing with LaTeXResearch 101 - Paper Writing with LaTeX
Research 101 - Paper Writing with LaTeX
 
Computer Vision Crash Course
Computer Vision Crash CourseComputer Vision Crash Course
Computer Vision Crash Course
 
Applying for Graduate School in S.T.E.M.
Applying for Graduate School in S.T.E.M.Applying for Graduate School in S.T.E.M.
Applying for Graduate School in S.T.E.M.
 
Linear Algebra and Matlab tutorial
Linear Algebra and Matlab tutorialLinear Algebra and Matlab tutorial
Linear Algebra and Matlab tutorial
 
Single Image Super-Resolution from Transformed Self-Exemplars (CVPR 2015)
Single Image Super-Resolution from Transformed Self-Exemplars (CVPR 2015)Single Image Super-Resolution from Transformed Self-Exemplars (CVPR 2015)
Single Image Super-Resolution from Transformed Self-Exemplars (CVPR 2015)
 
Lecture 29 Convolutional Neural Networks - Computer Vision Spring2015
Lecture 29 Convolutional Neural Networks -  Computer Vision Spring2015Lecture 29 Convolutional Neural Networks -  Computer Vision Spring2015
Lecture 29 Convolutional Neural Networks - Computer Vision Spring2015
 
Lecture 21 - Image Categorization - Computer Vision Spring2015
Lecture 21 - Image Categorization -  Computer Vision Spring2015Lecture 21 - Image Categorization -  Computer Vision Spring2015
Lecture 21 - Image Categorization - Computer Vision Spring2015
 
Writing Fast MATLAB Code
Writing Fast MATLAB CodeWriting Fast MATLAB Code
Writing Fast MATLAB Code
 
Image Completion using Planar Structure Guidance (SIGGRAPH 2014)
Image Completion using Planar Structure Guidance (SIGGRAPH 2014)Image Completion using Planar Structure Guidance (SIGGRAPH 2014)
Image Completion using Planar Structure Guidance (SIGGRAPH 2014)
 
Toward Accurate and Robust Cross-Ratio based Gaze Trackers Through Learning F...
Toward Accurate and Robust Cross-Ratio based Gaze Trackers Through Learning F...Toward Accurate and Robust Cross-Ratio based Gaze Trackers Through Learning F...
Toward Accurate and Robust Cross-Ratio based Gaze Trackers Through Learning F...
 
Real-time Face Detection and Recognition
Real-time Face Detection and RecognitionReal-time Face Detection and Recognition
Real-time Face Detection and Recognition
 
Transformation Guided Image Completion ICCP 2013
Transformation Guided Image Completion ICCP 2013Transformation Guided Image Completion ICCP 2013
Transformation Guided Image Completion ICCP 2013
 
Jia-Bin Huang's Curriculum Vitae
Jia-Bin Huang's Curriculum VitaeJia-Bin Huang's Curriculum Vitae
Jia-Bin Huang's Curriculum Vitae
 
Pose aware online visual tracking
Pose aware online visual trackingPose aware online visual tracking
Pose aware online visual tracking
 
Face Expression Enhancement
Face Expression EnhancementFace Expression Enhancement
Face Expression Enhancement
 
Image Smoothing for Structure Extraction
Image Smoothing for Structure ExtractionImage Smoothing for Structure Extraction
Image Smoothing for Structure Extraction
 
Three Reasons to Join FVE at uiuc
Three Reasons to Join FVE at uiucThree Reasons to Join FVE at uiuc
Three Reasons to Join FVE at uiuc
 
Static and Dynamic Hand Gesture Recognition
Static and Dynamic Hand Gesture RecognitionStatic and Dynamic Hand Gesture Recognition
Static and Dynamic Hand Gesture Recognition
 
Real-Time Face Detection, Tracking, and Attributes Recognition
Real-Time Face Detection, Tracking, and Attributes RecognitionReal-Time Face Detection, Tracking, and Attributes Recognition
Real-Time Face Detection, Tracking, and Attributes Recognition
 

Learning Moving Cast Shadows for Foreground Detection (VS 2008)

  • 1. Author manuscript, published in "The Eighth International Workshop on Visual Surveillance - VS2008, Marseille : France (2008)" Learning Moving Cast Shadows for Foreground Detection Jia-Bin Huang, and Chu-Song Chen Institute of Information Science, Academia Sinica, Taipei, Taiwan {jbhuang,song}@iis.sinica.edu.tw Abstract around the shadow points remains unchanged since shad- ows do not alter the background surfaces. We present a new algorithm for detecting foreground and Much research has been devoted to moving cast shadow moving shadows in surveillance videos. For each pixel, we detection. Some methods are based on the observation that use the Gaussian Mixture Model (GMM) to learn the be- the luminance values of shadowed pixels decrease respect havior of cast shadows on background surfaces. The pixel- to the corresponding background while maintaining chro- based model has the advantages over regional or global maticity values. For examples, Horprasert et al. [6] used model for their adaptability to local lighting conditions, a computational color model which separates the bright- particularly for scenes under complex illumination condi- inria-00325645, version 1 - 29 Sep 2008 ness from chromaticity component and define brightness tions. However, it would take a long time for convergence and chromaticity distortion. Cucchiara et al. [3] and Schreer if motion is rare on that pixel. We hence build a global et al. [15] addressed the shadow detection problem in HSV shadow model that uses global-level information to over- and YUV color space respectively and detected shadows by come this drawback. The local shadow models are up- exploiting the color differences between shadow and back- dated through confidence-rated GMM learning, in which ground. Nadimi et al. [10] proposed a spatial-temporal the learning rate depends on the confidence predicted by the albedo test and a dichromatic reflection model to separate global shadow model. For foreground modeling, we use a cast shadows for moving objects. nonparametric density estimation method to model the com- Texture features extracted from spatial domain had also plex characteristics of the spatial and color information. Fi- been used to detect shadows. Zhang et al. [20] proved that nally, the background, shadow, and foreground models are ratio edge is illumination invariant and the distribution of built into a Markov random field energy function that can normalized ratio edge difference is a chi-square distribu- be efficiently minimized by the graph cut algorithm. Ex- tion. A significance test was then used to detect shadows. perimental results on various scene types demonstrate the Fung et al. [4] computed a confidence score for shadow effectiveness of the proposed method. detection based on the characteristics of shadows in lumi- nance, chrominance, gradient density, and geometry do- mains. However, the above-mentioned methods require to 1. Introduction set parameters for different scenes and can not handle com- plex and time-varying lighting conditions. A comprehen- Moving object detection is at the core of many com- sive survey of moving shadow detection approaches was puter vision applications, including video conference, vi- presented in [13]. sual surveillance, and intelligent transportation system. Recently, the statistical prevalence of cast shadows had However, moving cast shadow points are often falsely la- been exploited to learn shadows in the scenes. In [9], beled as foreground objects. 
This may severely degrade the Martel-Brisson et al. used the Gaussian mixture model accuracy of object localization and detection. Therefore, an (GMM) to describe moving cast shadows. Proikli et al. [11] effective shadow detection method is necessary for accurate proposed a recursive method to learn cast shadows. Liu et foreground segmentation. al. [8] presented to remove shadow using multi-level infor- Moving shadows in the scene are caused by the occlu- mation in HSV color space. The major drawback in [9] sion of light sources. Shadows reduce the total energy in- and [11] is that their shadow models need a long time to cident at the background surfaces where the light sources converge while the lighting conditions should remain sta- are partially or totally blocked by the foreground objects. ble. Moreover, in [9], the shadow model is merged into the Hence, shadow points have lower luminance values but sim- background model. The Gaussian states for shadows will ilar chromaticity values. Also, the texture characteristic be discarded in the background model when there are no
  • 2. shadows for a long period. Therefore, these two approaches are less effective in a real-world environment. Liu et al. [8] attempted to improve the convergence speed of pixel-based shadow model by using multi-level information. They used region-level information to get more samples and global- level information to update a pre-classifier. However, the method still suffers from the slow learning of the conven- tional GMM [17], and the pre-classifier will become less discriminative in scenes having different types of shadows, e.g. light or heavy. Another foreground and shadow segmentation ap- proaches are through Bayesian methods, which construct background, shadow, and foreground model to evaluate the Figure 1. Flow chart of the proposed algo- data likelihood of the observed values [19], [1]. These two rithm approaches used global parameters to characterize shadows in an image, which are constant in [19] and probabilistic in [1]. Although the global models are free from the con- vergence speed problem, they lose the adaptability to lo- cal characteristics and cannot handle scenes with complex lp ∈ {BG, SD, F G} corresponds to three classes: back- lightings. ground, shadow, and foreground, respectively. In this paper, we propose a Bayesian approach to mov- The global labeling field L = {lp |p ∈ P } is modeled as inria-00325645, version 1 - 29 Sep 2008 ing shadow and object detection. We learn the behav- MRF [5]. The first order energy function of MRF can be ior of cast shadows on surfaces by GMM for each pixel. decomposed as a sum of two terms: A global shadow model (GSM) is maintained to improve the convergence speed of the local shadow model (LSM). This is achieved by updating the LSM through confidence- rated GMM learning, in which the learning rates are di- E(L) = Edata (L) + Esmooth (L), (1) rectly proportional to the confidence values predicted by the = [Dp (lp ) + Vp,q (lp , lq )] GSM. The convergence speed of LSM is thus significantly p∈P q∈Np improved. We also present a novel description for fore- ground modeling, which uses local kernel density estima- tion to model the spatial color information instead of tem- in which the first term Edata (L) evaluates the likelihood of poral statistics. The background, shadow, and foreground each pixel belonging to one of the three classes, the sec- are then segmented by the graph cut algorithm, which min- ond term Esmooth (L) imposes a spatial consistency of la- imizes a Markov Random Field (MRF) energy. A flow dia- bels through a pairwise interaction MRF prior, Dp (lp ) and gram of the proposed algorithm is illustrated in Fig. 1. Vp,q (lp , lq ) are the data and smoothness energy terms re- The remainder of this paper is organized as follows. The spectively, and the set Np denotes the 4-connected neigh- formulation of the MRF energy function is presented in boring pixels of point p. The energy function in (1) can be Section 2. In Section 3, we describe the background and efficiently minimized by the graph cut algorithm [2], so that shadow model, primarily focusing on how to learn cast an approximating maximum a posteriori (MAP) estimation shadows. Section 4 presents a nonparametric method for of the labeling field can be obtained. estimating foreground probability. Experimental results are The Potts model [12] is used as a MRF prior to stress presented in Section 5. Section 6 concludes the paper. spatial context and the data cost Edata (L) is defined as 2. 
Energy Minimization Formulation Edata (L) = Dp (lp ) = − log p(zp |lp ), (2) We pose the foreground/background/shadow segmenta- p∈P p∈P tion as an energy minimization problem. Given an input video sequence, a frame at time t is represented as an array zt = (zt (1), zt (2), ...zt (p), ...zt (N )) in RGB color space, where p(zp |lp ) is the likelihood of a pixel p belonging to where N is the number of pixels in each frame. The seg- background, shadow, or foreground. In the following sec- mentation is to assign a label lp to each pixel p ∈ P , tions, we will show how to learn and compute these likeli- where P is the set of pixels in the image and the label hoods.
  • 3. 3. Background and Shadow Model shadow classifier is not to detect moving shadows accu- rately, but to determine whether a pixel value fits with the 3.1. Modeling the Background property of cast shadows. For simplicity, we pose our prob- lem in RGB color space. Since cast shadows on a surface For each pixel, we model the background color infor- reduce luminance values and change the saturation, we de- mation by the well-known GMM [17] with KBG states in fine the potential shadow values fall into the conic volume RGB color space. The first B states with higher weights around the corresponding background color [11], as illus- and smaller variances in the mixture of KBG distributions trated in Fig. 2. For a moving pixel p, we evaluate the rela- are considered as background model. The index B is deter- tionship of the observation pixel values zt (p) and the corre- mined by sponding background model bt (p). The relationship can be characterized by two parameters: luminance ratio rl (p) and b angle variation θ(p), and can be written as: B = argminb ωBG,k > TBG , (3) k=1 bt (p) rl (p) = , (5) zt (p) cos(θ(p)) where TBG is the pre-defined weight threshold, and ωBG,k is the weight of the kth Gaussian state in the mixture model. zt (p), bt (p) θ(p) = arccos , (6) The likelihood of a given pixel p belonging to the back- zt (p) bt (p) ground can be written as (the subscript time t is ignored): where ·, · is the inner product operator, and · is the B norm of a vector. A pixel p is considered as a potential cast 1 p(z(p)|lp = BG) = ωBG,k G(z(p), µBG,k , ΣBG,k ), shadow point if it satisfies the following criteria: inria-00325645, version 1 - 29 Sep 2008 WBG k=1 (4) rmin < rl (p) < rmax , θ(p) < θmax , (7) B where WBG = j=1 ωBG,j is a normalization con- where rmax < 1 and rmin > 0 define the maximum bright- stant, and G(z(p), µBG,k , ΣBG,k ) is the probability den- ness and darkness respectively, and θmax is the allowed sity function of the kth Gaussian with parameters θBG,k = maximum angle variation. {µBG,k , ΣBG,k }, in which µBG,k is the mean vector and ΣBG,k is the covariance matrix. 3.2. Learning Moving Cast Shadow This subsection presents how to construct the shadow models from the background model. Since different fore- ground objects often block the light sources in a similar way, cast shadows on background surfaces are thus similar and independent of foreground objects. The ratio of shad- owed and illuminated value of a given surface point is con- sidered to be nearly constant. We exploit this regularity of shadows to describe the color ratio of a pixel under shadow and normal illumination. We first use the background model to detect moving pixels, which might consist of real fore- Figure 2. The weak shadow detector. The ob- grounds and cast shadows. The weak shadow detector in servation value zt (p) will be considered as Section 3.2.1 is then employed as a pre-filter to decide pos- shadow candidate if it falls into the shaded sible shadow points. These samples are used to train the area LSM in Section 3.2.2 and a GSM in Section 3.2.3 over time. The slow convergence speed of LSM is refined by the su- pervision of the GSM through confidence-rated learning in 3.2.2 Local Shadow Model (LSM) Section 3.3. 
The color ratio between shadowed and illuminated intensity 3.2.1 Weak Shadow Detector values of a given pixel p is represented as a vector of random variables: The weak shadow detector evaluates every moving pixels r g b detected by the background model to filter out some im- zt (p) zt (p) zt (p) possible shadow points. The design principle of the weak rt (p) = ( r (p) b (p) , b , g ), (8) bt t bt (p)
  • 4. where bi (p), i = r, g, b are the means of the most proba- t other words, a single pixel should be shadowed many times ble Gaussian distributions (with the highest ω/||Σ|| value) while under the same illuminating condition. This assump- i of the background model in three channels, and zt (p), i = tion is not always fulfilled, particularly on the regions where r, g, b are the observed pixel values. The weak shadow de- motion is rare. Thus, the LSM on its own is not effective tector is applied to moving regions detected by background enough to capture the characteristics of shadows. model to obtain possible shadow points. We call these pos- sible shadow points as ”shadow candidates”, which form 3.2.3 Global Shadow Model (GSM) the set Q. At the pixel level, we model rt (p), p ∈ Q using the GMM as the LSM to learn and describe the color feature To increase the convergence rate, we propose to update of shadows. the LSM using samples along with their confidence values. Once the LSM is constructed at pixel p, it can serve as This is achieved by maintaining a global shadow model, the shadow model to evaluate the likelihood of a local ob- which adds all shadow candidates in an image (rt (p), p ∈ servation zt (p). Since we describe shadow by its color ra- Q) into the model. In contrast to the LSM, the GSM ob- tio of the pixel intensity values between shadow and back- tains much more samples (every pixel p ∈ Q), and thus ground, the likelihood of an observation z(p) can be de- does not suffer from slow learning. We characterize shad- rived by the transformation from ”color ratio” to ”color”. ows of GSM in an image also by GMM. Let s(p) be a random variable describing the intensity val- To evaluate the confidence value of a pixel p ∈ Q, ues of a pixel p when the pixel is shadowed. Then, s(p) can we first compute the color feature rt (p) by (8) and check be characterized by the multiplication of two random vari- against every states in the GSM. If rt (p) is associated with ables: s(p) = r(p)b(p), where b(p) is the most probable the mth state in the mixture of GSM, described as Gm , the Gaussian distribution with parameter {µBG,1 , ΣBG,1 }, and confidence value can be approximated by the probability of inria-00325645, version 1 - 29 Sep 2008 r(p) is the color ratio modeled as GMM with parameter set- Gm being considered as shadows: tings {ωCR,k , µCR,k , ΣCR,k }, in which CR denotes ”color C(rt (p)) = p(lp = SD|rt (p)) = p(lp = SD|Gm ). (13) ratio”. When the pixel p is shadowed, the likelihood can be written as: In GMM, Gaussian states with higher prior probabilities and smaller variances would be considered as shadows. There- fore, we approximate p(SD|Gm ) using logistic regression S 1 similar to that in [7] for background subtraction. On the p(z(p)|lp = SD) = ωSD,k G(z(p), µSD,k , ΣSD,k ), other hand, if there are no states in GSM associated with WSD k=1 the input color feature value rt (p), the confidence value is (9) set to zero. where normalization constant WSD and S are determined For each pixel p ∈ Q, we evaluate the confidence in similar ways as in (3) with threshold TSD ; the weight value C(rt (p)) predicted by the GSM and then update the ωSD,k , mean vector µSD,k and the diagonal covariance ma- LSM through confidence-rated learning (presented in Sec- trix ΣSD,k of the kth are: tion 3.3). With the proposed learning approach, the model needs not to obtain numerous samples to converge, but a few ωSD,k = ωCR,k (10) samples having high confidence value are sufficient. 
The convergence rate is thus significantly improved. µSD,k = µBG,1 ∗ µCR,k (11) ΣSD,k = µ2 2 BG,1 ΣCR,k + µCR,k ΣBG,1 + (12) 3.3. Confidence-rated Gaussian Mixture ΣCR,k ΣBG,1 Learning Benedek et al. [1] also defined color features in CIELuv We present an effective Gaussian mixture learning algo- color space in a similar form and performed a background- rithm to overcome some drawbacks in conventional GMM shadow color value transformation. However, they did not learning approach [17]. Let αω and αg be the learning rates take the uncertainty of the background model into consider- for the weight and the Gaussian parameters (means and ation. Thus, the parameter estimation of the shadow model variances) in the LSM, respectively. The updating scheme might become inaccurate for scenes with high noise level. follows the the formulation of the combination of incremen- As gathering more samples, the precision of the LSM tal EM learning and recursive filter: will become more accurate. With the online learning abil- 1 − αdef ault ity, the LSM can not only adapt to local characteristics of αω = C(rt ) ∗ ( ) + αdef ault , (14) ΣK cj,t j=1 the background, but also to global time-varying illumina- tion conditions. However, the LSM requires several train- 1 − αdef ault αg = C(rt ) ∗ ( ) + αdef ault , (15) ing samples for the estimated parameters to converge. In ck,t
  • 5. where ck,t is the number of match of the kth Gaussian state, Algorithm 1: Confidence-Rated Learning Proce- and αdef ault is a small constant, which is 0.005 in this pa- dure in One Dimension per. In the initial learning stage, the total number of match User-defined Variables : K, ωinit , σinit , αdef ault of the pixel ΣK cj,t is small and thus the learning rate for j=1 Initialization : ∀j=1...K ωj = 0 , µj = Inf , weight is relatively large. As time goes on, the pixel be- σj = σinitial , cj = 0 comes stable and the learning rate approaches to the type while new observed data rt with confidence C(rt ) do of recursive filter. Similarly, the learning rate for Gaussian 1−α αω = C(rt ) ∗ ( ΣKdef ault ) + αdef ault parameters αg is relatively large for newly-generated states. cj,t j=1 Instead of blind update scheme, which treats each sample if rt is associated with the kth Gaussian then in the same way, we propose to update a sample with its for j ← 1to K do confidence value. The two learning rates are controlled by a if j=k then Mj,t =1 else Mj,t = 0 confidence value C(rt ), which indicates how confident the ωj,t = (1 − αω ) · ωj,t−1 + αω · Mj,t sample belongs to the class. Observations with higher con- end fidence values will converge faster than those with low ones. ck,t = ck,t−1 + 1 1−α The confidence-rated learning procedure in one dimension αg = C(rt ) ∗ ( cdef ault ) + αdef ault k,t is described in Algorithm 1. µk,t = (1 − αg ) · µk,t−1 + αg · rt Here, we describe the drawbacks of the conventional σk,t = (1 − αg ) · σk,t−1 + αg · (rt − µk,t−1 )2 2 2 GMM learning [17] for background modeling and how does else the proposed learning approach overcome these disadvan- for j ← 1to K do tages. Firstly, the method suffers from slow learning in the ωj,t = (1 − αω ) · ωj,t−1 initial stage. If the first value of a pixel is a foreground 2 2 inria-00325645, version 1 - 29 Sep 2008 µj,t = µj,t−1 , σj,t = σj,t−1 object, the background model will have only one Gaussian end state with weight equaling to unity. It will take a long time ωj,t k = argminj { σj,t } for the true pixel values to be considered as part of back- ck,t = 1 ground. In the initial learning stage, our method will follow 1−ω ω,t = C(zt ) ∗ ( ΣK init ) + ωinit cj,t the incremental EM learning and approaches to recursive 2 j=1 2 filter over time. Therefore, we do not suffer from the slow µk,t = rt , σk,t = σinitial learning in the initial stage. Secondly, the method faces end the trade-off problem between model convergence speed end and stability. In order to maintain the system stability, a very small learning rate will be chosen to preserve a long learning history. However, a small learning rate results in slow convergence speed. While larger learning rate im- ground. In [16], they exploited the temporal persistence as proves the convergence speed, the model becomes unsta- a property to model foreground using joint domain-range ble. In the proposed method, the adaptability of the learn- nonparametric density function. However, the foreground ing rate for Gaussian parameters allows fast convergence model in the previous frame cannot provide an accurate de- for the newly-generated states without degrading the stabil- scription for foreground in the current frame when large ity. Lastly, there are tradeoffs regarding where to update motion occurs or new moving objects enter into the scene. in the image. 
Typically, there are two ways to update the Therefore, instead of exploring temporal statistics, we turn model: selective update and blind update Selective update to spatial color information in the current frame. We use the only adds samples being considered as background and thus background and shadow model to generate a partial fore- enhances the detection of foreground. Unfortunately, this ground mask, i.e. a set of pixels that can be confidently approach would fall into a deadlock situation whenever the labeled as foreground. Nonparametric method is used to incorrect update decision was made. On the other hand, the estimate the foreground probability because the spatial aug- blind update adds all samples into the model, thus it might mented GMM might face the problem of choosing the cor- result in more false negatives. By updating all samples with rect number of modes. Details are introduced below. corresponding confidence, the trade-off regarding how to update in the image can be avoid. In the partial foreground generation stage, we aim not to find all foreground pixels, but some samples from fore- ground. This is achieved by finding pixels whose inten- 4. Foreground Model and Segmentation sity values are impossible to be generated from the existing background and shadow model. In other words, the partial The foreground model has been described as a uniform foreground mask F can be generated by simple threshold- distribution [19], providing a weak description of the fore- ing:
Algorithm 2: Learning Process
  At time t, with segmentation fields S_t from the detection process:
  for each pixel p ∈ P do
      Update the background model
      if S_t(p) = BG then
          if pixel p satisfies the shadow property (Eq. 7) then
              Compute the confidence value C(r_t(p)) from the global shadow model (Eq. 13)
              Update the global shadow model
              Update the local shadow model at pixel p with confidence C(r_t(p))
          end
      end
  end

We can now estimate the foreground probability of a query pixel p using samples from the foreground. We use the neighboring pixels around the query point p to gather M relevant samples, which form the set G_p = {g_1, g_2, ..., g_M}. When pixel p is in the foreground, the probability can be estimated with the kernel density estimation method [18]:

p(z_p | l_p = FG) = (1/M) Σ_{i=1}^{M} K_H(z_p − g_i),    (17)

where K_H(x) = |H|^{−1/2} K(H^{−1/2} x). The kernel K is taken to be a 3-variate density function satisfying ∫ K(w) dw = 1, ∫ w K(w) dw = 0, and ∫ w w^T K(w) dw = I_3, and H is a symmetric positive definite 3×3 bandwidth matrix. We choose the common Gaussian density as our kernel K:

K_H(x) = |H|^{−1/2} (2π)^{−d/2} exp(−(1/2) x^T H^{−1} x).    (18)

Although variable-bandwidth kernel density estimators can improve on kernel density estimators that use a global bandwidth [14], the computational cost of choosing an optimal bandwidth is too high for a real-time surveillance system. Hence, the bandwidth matrix H is assumed to be diagonal, with only two parameters: the variance in the spatial domain, σ_s, and in the color domain, σ_c.

Algorithm 3: Detection Process
  At time t:
  for each pixel p ∈ P do
      Compute the background likelihood P(z_t(p) | l_p = BG) (Eq. 4)
      Compute the shadow likelihood P(z_t(p) | l_p = SD)
      Generate the partial foreground mask F (Eq. 16)
      Find samples for pixel p from the neighboring pixels of p in F
      Compute the foreground likelihood P(z_p | l_p = FG) (Eq. 17)
  end
  Construct the graph to minimize Eq. 1
  Generate the segmentation fields S_t

After computing the data likelihoods of the local observation z_t(p) for the three classes, we perform maximum a posteriori (MAP) estimation of the label field L. The energy function of Eq. 1 can be efficiently minimized by graph cut algorithms. Concrete descriptions of the learning and detection processes are given in Algorithms 2 and 3, respectively.
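To illustrate Eqs. 17 and 18, here is a minimal Python sketch of the foreground likelihood with a diagonal-bandwidth Gaussian kernel. It reflects one reading of the paper: the spatial parameter σ_s is used to gather samples around the query pixel, while the 3-variate kernel operates on color with bandwidth σ_c. The 3σ_s gathering window, the parameter values, and all names are assumptions.

    import numpy as np

    def foreground_likelihood(z, fg_pixels, fg_colors, p_xy, sigma_s=5.0, sigma_c=0.05):
        """Eq. 17 with a Gaussian kernel and H = sigma_c^2 * I_3 (Eq. 18).

        z        : query color, shape (3,)
        fg_pixels: (M, 2) coordinates of partial-foreground samples
        fg_colors: (M, 3) their colors
        p_xy     : (2,) coordinates of the query pixel
        """
        # Spatial gating: keep samples near the query pixel (assumed window).
        near = np.all(np.abs(fg_pixels - p_xy) <= 3 * sigma_s, axis=1)
        g = fg_colors[near]
        if len(g) == 0:
            return 0.0
        # Gaussian kernel: |H|^{-1/2} (2*pi)^{-3/2} exp(-0.5 * ||z - g||^2 / sigma_c^2).
        diff = z - g
        norm = (2 * np.pi) ** (-1.5) * sigma_c ** (-3)
        return float(np.mean(norm * np.exp(-0.5 * np.sum(diff**2, axis=1) / sigma_c**2)))

A per-pixel MAP label would then simply be the argmax of the three class likelihoods; the paper goes further, embedding the likelihoods as data terms in the MRF energy of Eq. 1 and minimizing the full energy, including the smoothness term, with graph cuts [2].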
5. Experimental Results

We have implemented and tested the proposed approach on various scene types. Three and five Gaussian states are assigned to the LSM and the GSM, respectively, and the initial variance of both is set to 0.0025. We use three Gaussian models for the background, with a diagonal covariance matrix.

The test sequences used in this paper are:

• "Highway" is a sequence from the benchmark set [13] showing a traffic environment. The shadows in this video are dark and cover a large area of the scene.

• "Laboratory" is a typical indoor sequence with light shadow, but its lighting conditions are more complex than those of the outdoor scenes.

• "Intelligent Room" [13] is an indoor sequence with low shadow strength.

• "Entrance" [1] consists of typical surveillance videos captured at different times of day, and thus under different lighting conditions.

Figures 3 and 4 demonstrate the effectiveness of the proposed method in outdoor and indoor scenes, respectively. In each figure, (a) shows the original image from the sequence; (b) shows the moving pixels detected by the background model, with the gray regions labeled as shadow points by the weak shadow detector (note that the weak shadow detector has a very high detection rate, but produces some false detections); (c) is the confidence map predicted by the GSM, where lighter regions indicate higher predicted confidence; and (d) shows the segmentation result after performing the graph cut algorithm.

Figure 3. Outdoor sequence "Highway": (a) frame from the sequence; (b) detection result of the weak shadow detector; (c) confidence map from the GSM; (d) foreground and segmentation result.

Figure 4. Indoor sequence "Laboratory": (a) frame from the sequence; (b) detection result of the weak shadow detector; (c) confidence map from the GSM; (d) foreground and segmentation result.
The effect of the GSM is illustrated in Figure 5. The detection result without the GSM is shown in Figure 5(b), where portions of the real foreground are mislabeled as shadow; this is due to the slow learning of the LSM. In Figure 5(c), with the GSM, the foreground objects are accurately detected.

Figure 5. Effect of the global shadow model on the sequence "Intelligent Room": (a) frame from the sequence; (b) detection result without the GSM; (c) detection result with the GSM.

Figure 6 demonstrates the adaptability of the proposed algorithm to different lighting conditions. Figures 6(a), (c), and (e) show the same scene captured at different periods of the day: morning, noon, and afternoon, respectively. As shown in Figures 6(b), (d), and (f), our method detects the foreground and shadows correctly under all three conditions.

Figure 6. Different periods of the day in the "Entrance" sequence: (a) in the morning; (c) at noon; (e) in the afternoon; (b)(d)(f) detection results for the frames in (a), (c), and (e), respectively.

Tables 1 and 2 show a quantitative comparison with previous approaches. The shadow detection rate η and the shadow discrimination rate ξ follow the metrics described in [13]; readers should refer to [13] for the exact equations.

Table 1. Quantitative comparison on the "Highway" sequence.
  Method       η%      ξ%
  Proposed     76.76   95.12
  SNP [13]     81.59   63.76
  SP [13]      59.59   84.70
  DNM1 [13]    69.72   76.93
  DNM2 [13]    75.49   62.38

Table 2. Quantitative comparison on the "Intelligent Room" sequence.
  Method       η%      ξ%
  Proposed     83.12   94.31
  SNP [13]     72.82   88.90
  SP [13]      76.27   90.74
  DNM1 [13]    78.61   90.29
  DNM2 [13]    62.00   93.89
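For reference, here is a short sketch of how η and ξ are typically computed. The 0/1/2 label encoding is our own, and the counting of foreground pixels misclassified as shadow follows our reading of the standard definitions in [13], so this should be checked against the original equations.

    import numpy as np

    def shadow_metrics(pred, gt):
        """Shadow detection rate (eta) and shadow discrimination rate (xi),
        in the spirit of [13]; pred and gt are HxW label maps."""
        BG, SD, FG = 0, 1, 2
        tp_s = np.sum((pred == SD) & (gt == SD))   # shadow pixels detected as shadow
        fn_s = np.sum((pred != SD) & (gt == SD))   # shadow pixels missed
        tp_f = np.sum((pred == FG) & (gt == FG))   # foreground pixels kept as foreground
        fn_f = np.sum((pred == SD) & (gt == FG))   # foreground pixels lost to shadow
        eta = tp_s / max(tp_s + fn_s, 1)
        xi = tp_f / max(tp_f + fn_f, 1)
        return eta, xi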
6. Conclusion and Future Work

In this paper, we have presented a general model for foreground/shadow/background segmentation with several novelties. For cast shadow detection, we have proposed a pixel-based model describing the properties of shadows. Unlike a global model, the pixel-based model can adapt to local illumination conditions, particularly for backgrounds under complex lighting. To overcome the slow convergence of the local shadow model, we maintain a global shadow model that predicts the confidence value of a given sample, and we update the local shadow model through confidence-rated learning. We have also developed a nonparametric foreground model that exploits spatial color characteristics and is free from the assumption that object motion is slow. The likelihoods of background, shadow, and foreground are combined in an MRF energy function in which an optimal global inference can be achieved by the graph cut algorithm. The effectiveness and robustness of the proposed method have been validated on both indoor and outdoor surveillance videos. Only color features are used in this work; in the future, other features such as edge or motion can easily be incorporated into our framework.

Acknowledgment

This research was supported by the National Science Council of Taiwan under Grant Nos. NSC 96-3113-H-001-011 and NSC 95-2221-E-001-028-MY3.

References

[1] C. Benedek and T. Sziranyi. Bayesian foreground and shadow detection in uncertain frame rate surveillance videos. IEEE Trans. Image Processing, 17(4):608–621, Apr. 2008.
[2] Y. Boykov, O. Veksler, and R. Zabih. Fast approximate energy minimization via graph cuts. PAMI, 23(11):1222–1239, Nov. 2001.
[3] R. Cucchiara, C. Grana, M. Piccardi, and A. Prati. Detecting moving objects, ghosts, and shadows in video streams. PAMI, 25(10):1337–1342, 2003.
[4] G. S. K. Fung, N. H. C. Yung, G. K. H. Pang, and A. H. S. Lai. Effective moving cast shadow detection for monocular color traffic image sequences. Optical Engineering, 41(6):1425–1440, 2002.
[5] S. Geman and D. Geman. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. PAMI, 6(6):721–741, Nov. 1984.
[6] T. Horprasert, D. Harwood, and L. S. Davis. A statistical approach for real-time robust background subtraction and shadow detection. In Proc. ICCV Frame-rate Workshop, 1999.
[7] D. S. Lee. Effective Gaussian mixture learning for video background subtraction. PAMI, 27(5):827–832, 2005.
[8] Z. Liu, K. Huang, T. Tan, and L. Wang. Cast shadow removal combining local and global features. In Proc. CVPR, pages 1–8, 2007.
[9] N. Martel-Brisson and A. Zaccarin. Learning and removing cast shadows through a multidistribution approach. PAMI, 29(7):1133–1146, 2007.
[10] S. Nadimi and B. Bhanu. Physical models for moving shadow and object detection in video. PAMI, 26(8):1079–1087, 2004.
[11] F. Porikli and J. Thornton. Shadow flow: a recursive method to learn moving cast shadows. In Proc. ICCV, volume 1, pages 891–898, Oct. 2005.
[12] R. Potts. Some generalized order-disorder transformations. In Proc. of the Cambridge Philosophical Society, volume 48, page 81, 1952.
[13] A. Prati, I. Mikic, M. Trivedi, and R. Cucchiara. Detecting moving shadows: algorithms and evaluation. PAMI, 25(7):918–923, 2003.
[14] S. R. Sain. Multivariate locally adaptive density estimation. Comput. Stat. Data Anal., 39(2):165–186, 2002.
[15] O. Schreer, I. Feldmann, U. Golz, and P. A. Kauff. Fast and robust shadow detection in videoconference applications. In IEEE Int'l Symp. Video/Image Processing and Multimedia Communications, pages 371–375, 2002.
[16] Y. Sheikh and M. Shah. Bayesian modeling of dynamic scenes for object detection. PAMI, 27(11):1778–1792, Nov. 2005.
[17] C. Stauffer and W. E. L. Grimson. Adaptive background mixture models for real-time tracking. In Proc. CVPR, volume 2, pages 246–252, 1999.
[18] M. Wand and M. Jones. Kernel Smoothing. Chapman & Hall/CRC, 1995.
[19] Y. Wang, K.-F. Loe, and J.-K. Wu. A dynamic conditional random field model for foreground and shadow segmentation. PAMI, 28(2):279–289, Feb. 2006.
[20] W. Zhang, X. Z. Fang, X. K. Yang, and Q. M. J. Wu. Moving cast shadows detection using ratio edge. IEEE Trans. Multimedia, 9(6):1202–1214, 2007.