This work is licensed under the Creative Commons Attribution 3.0 Taiwan license.


Monocular Human Pose Estimation
    with Bayesian Networks
            Yuan-Kai Wang

     Electronic Engineering Department,
              Fu Jen University
                  2010/6/11
Wang, Yuan-Kai     Electronic Engineering Department, Fu Jen University   2




                         Outline
     1. Introduction
     2. Markerless Monocular Human Pose
        Estimation
     3. Overview of the Approach
     4. Model Learning by EM algorithm
     5. Pose Estimation by Approximate Inference
     6. Feature Extraction
     7. Experimental Results
     8. Conclusions




                   1. Introduction
     • Applications of Human Motion
       Capture
           – Performance animation in movie making
           – Game
           – Medical diagnosis
           – Sport & Health
           – Visual surveillance




                 Performance Animation
     • Avatar
     • The Lord of the Rings




                            Game
     • Microsoft's Project Natal for XBOX360




                 Medical Diagnosis
     • Gait analysis for
       Rehabilitation




                 Sport & Health
     • Golf training




                 Visual Surveillance
     • Behavior analysis for event detection
           – Irregular movement, body language, and
             unusual interactions, fighting
           – Car crash
     • Content-based retrieval




                      Sensor Approaches
     • Active sensors
           – Types
                 • Electro-magnetic marker
                 • Optical
                 • Accelerometer
           – Wired connection
           – Drawbacks
                 • Intrusive
                 • Expensive
                 • Time consuming
           [Figure caption: Too many wires]
     • Passive sensors
       by camera
           – Marker-based
           – Markerless




                   Marker-based Sensors
     • Add visual markers on body
           – Active marker
                 • Visible/non-visible light
           – Passive marker
     • Need computer vision algorithms
     • Advantages
           – No wires
     • Drawbacks
           – Semi-intrusive
           – Time consuming
     [Figure: examples of an active marker and a passive marker]




                 Markerless Sensors
     • No attachment on human body
     • Heavily dependent on computer vision analyzer: a pure vision solution
           – Stereo/Multiple cameras
           – Monocular cameras




                 Sensor vs. Analyzer




T. B. Moeslund, "Computer vision-based human motion capture – a
survey", Technical report LIA 99-02, University of Aalborg, 1999.



                      Pose Estimation
                 vs. Gesture Recognition

     [Figure: Pose Estimation versus Gesture Recognition (example gesture label: Walking)]




                   2D vs. 3D




                 2. Markerless Monocular
                 Human Motion Capture
      • Goal
            – Markerless
            – Single camera
            – 3D poses
      • Challenges
            – Ill-posed
            – Highly articulated
            – Self-occluding
      [Figure caption: Depth ambiguities & occlusion using monocular silhouettes]




                 Joint Representation
     • Articulated human body is linked by
       joints




                 Abstract Representation
     [Figure: grid of body representations; columns: 2D and 3D; rows: Stick and Surface/Volume]




                       Literature Review
     From low-level observation to high-level abstraction:
         Image Space (pixel domain)
           → Human Segmentation (S)
           → Image Feature Descriptor (F)
           → 2D Joint Location (J)
           → 3D Model Parametric Space (pose domain, P)
     • From image space to segmentation
           – Background subtraction
           – Object detection
     • Human segmentation (S)
           – Full body
           – Body parts
     • Image feature descriptor (F)
           – Shape, silhouette, color, appearance, motion,
             feature point (corner), ...
     • 2D joint location (J)
           – Joint angle Θi
           – Joint location Pi
     • Mappings to the pose domain
           – P=f(S)
           – P=f(F)
           – P=f(J) (marker-based)
     [Figure: 3D stick model in X, y, Z axes with joints Neck, Bottom, and left/right shoulder, elbow, hand, waist, knee, and foot]

A two-stage approach is proposed: P=f1(f2(F))




                                 Approaches
     • Model-free [Agarwal, 2006] [Loy, 2004]
           – Joint articulation is not used to
             constrain the search for the mapping function
             P = f(X)
     • Model-based [Rbert, 2006] [Rohr, 1994]
           – A model of human articulation
             constrains the search for f and P
           – Two kinds of approach
                 • Discriminative
                 • Generative: Bayesian networks (BNs)
                   Training:  $\hat{f} = \arg\max_{f} L_1(\mathrm{Training}, f)$
                   Inference: $\hat{P} = \arg\max_{P} L_2(\hat{f} \mid X, P)$




                 An Articulated Model
                 = A Bayesian Network
      • The human body is represented as a
        kinematic tree, consisting of segments
        linked by joints
      • Kinematic models are expressed as
        graphical probability networks
      • Graphical probability models are
        computed via Bayesian networks
      [Figure: 3D stick model in X, y, Z axes with joints Neck, Bottom, and left/right shoulder, elbow, hand, waist, knee, and foot]




                   Three Steps to Utilize BNs
      • Representation, learning and inference
            – Representation: joint node X1 is linked to
              feature nodes X2, X3, X4
            – Learning: feature-joint correspondence by
              conditional probability
              $\hat{f} = \arg\max_{f} L_1(\mathrm{Training}, f)$
            – Inference: pose estimation through P(X1|X2,X3,X4)
              $\hat{P} = \arg\max_{P} L_2(\hat{f} \mid X, P)$
      [Figure: network with joint node X1 and feature nodes X2, X3, X4]




                 Two Causal Models in BNs
      • Undirected acyclic graph [Lan, 2008] [Hua, 2005]
           – The model is a tree or graph whose edges
             between nodes have no direction.
           – Example: X1 and X2 joined by an undirected edge
             carrying the joint probability P(X1,X2)
      • Directed acyclic graph [Ramanan, 2007] [Lee, 2006] [Leonid, 2003]
           – Nodes are linked to one another by directed arcs.
           – Example: X1 and X2 joined by a directed arc
             carrying the conditional probability P(X1|X2)



          Directed Bayesian Articulated
                     Model
       • Nodes in a directed acyclic graph (DAG) are
         not influenced by their child nodes.
       • Relations between human body parts are not
         regarded as two-way.
       [Figure: directed graph over the 2D joint nodes h2d,1 to h2d,15]




        Inference of Bayesian Networks
       • Top-down approach [Gavrila, 1996]
            – Strong at finding human body parts
              in the image.
       • Bottom-up approach [Ren, 2005]
            – Strong at finding people in the image.
       • Combined approach [Navaraman, 2005][Lee, 2002]
            – Benefits from the advantages of both.




        3. Overview of the Approach
  [Figure: the 2D stick model (Head, Neck, Bottom, and left/right shoulder, elbow, hand, waist, knee, foot) beside the 3D stick model drawn in X, y, Z axes]

  Both models are belief (Bayesian) networks; inference uses
  an annealed Gibbs sampling algorithm.



                            System Architecture
       • We estimate the 2D human joint
         positions before 3D estimation.
       • Testing path: testing image → feature extraction
         → 2D Bayesian inference with annealed Gibbs sampling
         → 3D Bayesian inference with annealed Gibbs sampling
         → result
       • 2D model training: training features, 2D Bayesian
         human model setting, EM training
       • 3D model training: training features, 3D Bayesian
         human model setting, EM training



             2D Human Graphical Model
     • The articulated structure of 2D human
       body is represented by a 15-node graphical
       model.
       $H_{2D} = \{h_{2d,1}, \ldots, h_{2d,15}\}$

   [Figure: 2D stick figure (articulated model) with joints Head, Neck, Bottom, and left/right shoulder, elbow, hand, waist, knee, and foot, beside the corresponding graph over nodes h2d,1 to h2d,15]



             3D Human Graphical Model
      • The 3D human body model is described by a 45-dimensional
        vector H3D holding the three coordinates of each of the
        15 joint nodes in 3D space
        $H_{3D} = \{h_{3d,1}, \ldots, h_{3d,15}\}$

     [Figure: 3D stick figure (articulated model) in X, y, Z axes with joints Neck, Bottom, and left/right shoulder, elbow, hand, waist, knee, and foot, beside the corresponding graph over nodes h3d,1 to h3d,15]




                              The BN Model
     • A directed acyclic graph
         $G = (V, E, C)$
           – V: vertex set {Vi, 1≤i≤N}
           – E: a set of directed edges (i,j)
           – C: (i,j) → R+, edge cost functions
     • To encode probabilistic information
           – An edge indicates a probabilistic
             dependence
           – C: P(Vi | Vj), the set of conditional
             probability functions
     • The 2D and 3D BNs
         $G_{2D} = (V_{2D}, E_{2D}, C_{2D})$
         $G_{3D} = (V_{3D}, E_{3D}, C_{3D})$
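
The triple G = (V, E, C) maps naturally onto a small data structure. The sketch below is illustrative only: the joint names and edges are hypothetical (they are not the deck's actual edge set), an edge (i, j) is assumed to point from parent i to child j, and acyclicity is checked with Python's standard `graphlib` module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical vertex set V and directed edge set E (parent, child).
V = ["neck", "l_shoulder", "l_elbow", "l_hand"]
E = [("neck", "l_shoulder"), ("l_shoulder", "l_elbow"), ("l_elbow", "l_hand")]


def parents_of(vertices, edges):
    """Return pa(V_i) for every vertex as a dict: child -> list of parents."""
    pa = {v: [] for v in vertices}
    for parent, child in edges:
        pa[child].append(parent)
    return pa


def is_dag(vertices, edges):
    """True if the directed graph has no cycle (a valid BN structure)."""
    try:
        # TopologicalSorter takes a mapping node -> predecessors.
        list(TopologicalSorter(parents_of(vertices, edges)).static_order())
        return True
    except CycleError:
        return False


print(parents_of(V, E))
print(is_dag(V, E))
print(is_dag(V, E + [("l_hand", "neck")]))  # adding a back edge creates a cycle
```

The conditional probability set C would then attach one table P(Vi | pa(Vi)) to each key of the parent dictionary.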



                    2D Graphical Model

  $V_{2D} = \{H_{2D}, O_{2D}\}$

  $O_{2D}$: Nc, S, A, C

  $C_{2D} = \{P(h_{2d,i} \mid pa(h_{2d,i}))\}$

  [Figure: directed graph over the 2D joint nodes]




                                 3D Graphical Model
  $V_{3D} = \{H_{3D}, O_{3D}\}$

  $O_{3D}$: h2d, wN, L

  $C_{3D} = \{P(h_{3d,i} \mid pa(h_{3d,i}))\}$

  [Figure: two sub-networks. Upper body: 3D nodes hu3d,1 to hu3d,7 linked to the 2D nodes h2d,1 and h2d,3 to h2d,9. Lower body: 3D nodes hl3d,1 to hl3d,7 linked to the 2D nodes h2d,9 to h2d,15.]



                       Joint Probability Distribution
                                  (JPD)
     • The two proposed graphical models
       specify two unique JPDs:
       P2D(V2D) and P3D(V3D)
     • Let P(V) represent either of the two JPDs:
       $P(V) = \prod_{i=1}^{n} P(V_i \mid pa(V_i))$
     • The factorization of the JPD comes from the
       local Markov property: each node is conditionally
       independent of its non-descendants given its parents
     • If we can learn the finite set of conditional
       probabilities, we can infer the
       human pose
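
To make the factorization concrete, here is a minimal sketch on a toy three-node chain. The node names, the binary states, and every probability value are invented for illustration; the deck's real models have 15 joint nodes plus observation nodes.

```python
from itertools import product

# Toy chain neck -> shoulder -> elbow with binary states 0/1.
# All names and numbers here are illustrative assumptions.
parents = {"neck": (), "shoulder": ("neck",), "elbow": ("shoulder",)}

# cpt[node][parent_values][value] = P(node = value | parents = parent_values)
cpt = {
    "neck": {(): [0.6, 0.4]},
    "shoulder": {(0,): [0.7, 0.3], (1,): [0.2, 0.8]},
    "elbow": {(0,): [0.9, 0.1], (1,): [0.5, 0.5]},
}


def joint_probability(assignment):
    """P(V) = product over all nodes of P(V_i | pa(V_i))."""
    p = 1.0
    for node, pa in parents.items():
        pa_values = tuple(assignment[q] for q in pa)
        p *= cpt[node][pa_values][assignment[node]]
    return p


# One full assignment: 0.4 * 0.8 * 0.5
example = joint_probability({"neck": 1, "shoulder": 1, "elbow": 0})

# Summing the factorized product over all 8 assignments gives 1.
total = sum(
    joint_probability(dict(zip(parents, values)))
    for values in product([0, 1], repeat=len(parents))
)
print(example, total)
```

Only 1 + 2 + 2 = 5 table rows are stored here instead of a full table with 8 entries, which is the practical benefit of the factorization.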




                     Two Problems
     • Training problem
           – Given a training set: {O2d, O3d}
           – How can we learn the edge cost function
             C = { P(h | pa(h)) }?
           – We apply the EM algorithm
     • Inference problem
           – Given evidence O
           – How can we infer the human pose
             P(H | O) from P(V)?
           – We propose an annealed Gibbs sampling
             algorithm
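
The slides name annealed Gibbs sampling without giving its details, so the following is only a generic sketch of the idea on a toy discrete network, not the author's algorithm. The network, its probabilities, the geometric cooling schedule, and the choice to return the most probable state visited are all assumptions made for illustration.

```python
import random

# Toy network: neck -> shoulder -> elbow -> obs (all binary, made-up numbers).
parents = {"neck": (), "shoulder": ("neck",), "elbow": ("shoulder",), "obs": ("elbow",)}
cpt = {
    "neck": {(): [0.6, 0.4]},
    "shoulder": {(0,): [0.7, 0.3], (1,): [0.2, 0.8]},
    "elbow": {(0,): [0.9, 0.1], (1,): [0.5, 0.5]},
    "obs": {(0,): [0.8, 0.2], (1,): [0.1, 0.9]},
}


def joint(a):
    """Factorized joint P(V) for a full assignment a."""
    p = 1.0
    for node, pa in parents.items():
        p *= cpt[node][tuple(a[q] for q in pa)][a[node]]
    return p


def annealed_gibbs(evidence, sweeps=500, t_start=2.0, t_end=0.05, seed=0):
    """Search for a high-probability hidden state given the evidence."""
    rng = random.Random(seed)
    hidden = [n for n in parents if n not in evidence]
    state = dict(evidence)
    for n in hidden:
        state[n] = rng.randint(0, 1)
    best, best_p = dict(state), joint(state)
    for s in range(sweeps):
        # Geometric cooling from t_start down to t_end (an assumed schedule).
        t = t_start * (t_end / t_start) ** (s / max(1, sweeps - 1))
        for n in hidden:
            # P(n = v | all other nodes) is proportional to the joint;
            # raising it to 1/t sharpens the distribution as t falls.
            weights = []
            for v in (0, 1):
                state[n] = v
                weights.append(joint(state) ** (1.0 / t))
            state[n] = rng.choices((0, 1), weights=weights)[0]
        p = joint(state)
        if p > best_p:
            best, best_p = dict(state), p
    return best


estimate = annealed_gibbs({"obs": 1})
print(estimate)
```

For brevity each conditional is computed from the full joint; a practical sampler would use only the factors in the node's Markov blanket, which gives the same distribution at lower cost.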




             4. Model Learning by EM
     • Why apply the EM algorithm for model
       learning?
           – The human poses and observations are
             incomplete and sparse
                 • Incomplete: occlusion due to the single camera
                 • Sparse: few training samples in a
                   high-dimensional space




                 The Likelihood Function
     • The training set is D = {D_1, …, D_N}
           – N is the number of training samples
           – D_l = {V_1[l], …, V_n[l]} is the l-th training sample
     • Let θ be the learning model: C = { P(h | pa(h)) }
     • The estimate of θ maximizes the posterior; P(D) does not
       depend on θ, and the last steps treat the prior P(θ) as uniform
       $$\hat{\theta} = \arg\max_{\theta} P(\theta \mid D) = \arg\max_{\theta} \frac{P(D \mid \theta)\, P(\theta)}{P(D)} = \arg\max_{\theta} P(D \mid \theta) = \arg\max_{\theta} \prod_{l=1}^{N} P(D_l \mid \theta)$$
     • A log-likelihood function L_D(θ) = log P(D | θ) is
       formulated based on the independence
       assumption of training samples
       $$L_D(\theta) = \log \prod_{l=1}^{N} P(V_1[l], \ldots, V_n[l] \mid \theta) = \sum_{i=1}^{n} \sum_{l=1}^{N} \log P\big(V_i[l] \mid \mathrm{pa}(V_i[l]), \theta\big)$$
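The decomposition above can be illustrated with a minimal sketch (not the author's implementation): for a discrete Bayesian network, the log-likelihood is a sum of per-node log conditional probabilities. The two-node network `A -> B`, its probability tables, and the name `log_likelihood` are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical network A -> B with binary states (illustrative values only).
PARENTS = {"A": (), "B": ("A",)}
CPT = {
    "A": {(): {0: 0.6, 1: 0.4}},                            # P(A)
    "B": {(0,): {0: 0.9, 1: 0.1}, (1,): {0: 0.2, 1: 0.8}},  # P(B | A)
}

def log_likelihood(samples, parents=PARENTS, cpt=CPT):
    """Sum log P(V_i[l] | pa(V_i[l])) over all variables i and samples l."""
    total = 0.0
    for sample in samples:
        for var, pa in parents.items():
            pa_values = tuple(sample[p] for p in pa)
            total += math.log(cpt[var][pa_values][sample[var]])
    return total

# Two complete training samples D_1 and D_2.
data = [{"A": 0, "B": 0}, {"A": 1, "B": 1}]
ll = log_likelihood(data)
```

Because every sample here is complete, each term can be read directly from a table; with missing values this sum cannot be evaluated term by term.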




                 MLE vs. EM
     • If D is complete, we can apply MLE
       (Maximum Likelihood Estimation) to
       find θ
     • However, D is incomplete because of
       occlusion and partial observability
     • Let D = Y ∪ U
        – Y is the observed data
        – U is the missing data
     [Figure: graph of the 2D pose nodes h2d,1 to h2d,14]




                                       The EM
     • Expectation step
           – Computes the expectation of
             the log-likelihood function
             $$Q(\theta \mid \theta^{(t)}) = E_{\theta^{(t)}}\big[\log P(D \mid \theta) \mid \theta^{(t)}, Y\big]$$
     • Maximization step
           – Updates the parameter θ^(t+1) of step t+1 from the
             current parameter θ^(t)
             $$\theta^{(t+1)} = \arg\max_{\theta} Q(\theta \mid \theta^{(t)})$$
     • Stop condition of the E-M iteration
           – L_D(θ^(t+1)) − L_D(θ^(t)) converges
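The two steps and the stop condition can be sketched on a toy problem. This is an illustrative sketch under stated assumptions, not the model from the talk: a hypothetical binary network `A -> B` in which `A` is sometimes missing (`None`), with made-up data and starting values. The tracked quantity is the log-likelihood of the observed data, and the data must leave each state of `A` with a nonzero expected count.

```python
import math

def em_two_node(data, max_iter=100, tol=1e-8):
    """EM for a hypothetical binary network A -> B; A is None when missing."""
    p_a1 = 0.5                   # initial guess for P(A=1)
    p_b1 = {0: 0.3, 1: 0.7}      # initial guess for P(B=1 | A=a)
    history = []                 # observed-data log-likelihood per iteration
    for _ in range(max_iter):
        count_a = {0: 0.0, 1: 0.0}    # expected number of samples with A=a
        count_ab1 = {0: 0.0, 1: 0.0}  # expected number with A=a and B=1
        log_lik = 0.0
        # E-step: weight each state of a missing A by its posterior.
        for a, b in data:
            joint = {}
            for s in (0, 1):
                prior = p_a1 if s == 1 else 1.0 - p_a1
                like = p_b1[s] if b == 1 else 1.0 - p_b1[s]
                joint[s] = prior * like
            if a is None:
                total = joint[0] + joint[1]
                weight = {s: joint[s] / total for s in (0, 1)}
                log_lik += math.log(total)
            else:
                weight = {a: 1.0, 1 - a: 0.0}
                log_lik += math.log(joint[a])
            for s in (0, 1):
                count_a[s] += weight[s]
                count_ab1[s] += weight[s] * b
        history.append(log_lik)
        # Stop when the log-likelihood no longer improves noticeably.
        if len(history) > 1 and history[-1] - history[-2] < tol:
            break
        # M-step: re-estimate the parameters from the expected counts.
        p_a1 = count_a[1] / len(data)
        p_b1 = {s: count_ab1[s] / count_a[s] for s in (0, 1)}
    return p_a1, p_b1, history

samples = [(0, 0), (0, 0), (0, 1), (1, 1), (1, 1), (1, 0),
           (None, 1), (None, 1), (None, 0), (None, 1)]
p_a1, p_b1, history = em_two_node(samples)
```

The values in `history` should never decrease from one iteration to the next, which is the property the stop condition relies on.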



                      5. Pose Estimation by
                     Approximate Inference
     • Let the observed data be O' = O − U
           – U is the set of hidden variables that are
             unobservable due to occlusion
     • The best estimated pose is a vector H*,
       defined as the pose with the
       maximum probability given O'
       $$H^{*} = \arg\max_{H} P(H \mid O') = \arg\max_{H} \int_{u \in U} P(H, u \mid O')\, du = \arg\max_{H} \int_{u \in U} P(H, O', u)\, du = \arg\max_{H} \int_{u \in U} \prod_{i=1}^{n} P(V_i \mid \mathrm{pa}(V_i))\, du$$
     • Here P(H, O', u) = P(V) is the joint distribution over V = H ∪ O' ∪ U
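For intuition, the maximization above can be carried out by brute force when every variable is discrete and the network is tiny: sum the joint over the hidden variable and pick the pose with the largest total. The chain `H -> U -> O'`, its tables, and the name `map_pose` are hypothetical; the network in the talk is larger, which is why exact enumeration is replaced by sampling.

```python
# Hypothetical binary chain H -> U -> O' (illustrative values only).
P_H = {0: 0.5, 1: 0.5}
P_U_GIVEN_H = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.3, 1: 0.7}}
P_O_GIVEN_U = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.1, 1: 0.9}}

def joint(h, u, o):
    """P(V) as the product of each node's conditional given its parents."""
    return P_H[h] * P_U_GIVEN_H[h][u] * P_O_GIVEN_U[u][o]

def map_pose(o_observed):
    """Return (H*, scores), where scores[h] = sum over u of P(h, o', u)."""
    scores = {h: sum(joint(h, u, o_observed) for u in (0, 1)) for h in (0, 1)}
    best = max(scores, key=scores.get)
    return best, scores

best, scores = map_pose(o_observed=1)
```

With n variables of k states each, this enumeration grows as k to the power n, so it only works for toy cases.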




          Inference of Posterior Probability
     • How to calculate the posterior
       probability?
       $$H^{*} = \arg\max_{H} \int_{u \in U} \prod_{i=1}^{n} P(V_i \mid \mathrm{pa}(V_i))\, du$$
           – Exact inference
                 • Junction tree, message passing
           – Approximate inference
                 • Loopy belief propagation, variational methods
                 • Markov chain Monte Carlo (MCMC) sampling
                    – Metropolis-Hastings
                    – Gibbs sampling




             Approximate Inference (1/2)
       • MCMC algorithms approximate the posterior
         distribution P(V) by drawing random samples
       • The key idea of MCMC is to simulate the
         sampling process as a Markov chain
       • Definitions
            • A sample vector v of V
            • A proposal distribution q(v* | v^(t−1)) that generates a candidate v*
            • An acceptance probability α for accepting v* as v^(t)
            $$\alpha(v^{(t-1)}, v^{*}) = \min\left\{1, \frac{p(v^{*})\, q(v^{(t-1)} \mid v^{*})}{p(v^{(t-1)})\, q(v^{*} \mid v^{(t-1)})}\right\}$$
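The proposal and acceptance rule of MCMC can be sketched as a Metropolis-Hastings sampler. This is a minimal illustration under stated assumptions: a one-dimensional target (an unnormalized standard normal) and a Gaussian random-walk proposal, both chosen here for simplicity. The names `acceptance`, `metropolis_hastings`, and `target` are hypothetical.

```python
import math
import random

def acceptance(p, q, v_prev, v_star):
    """alpha = min(1, p(v*) q(v_prev | v*) / (p(v_prev) q(v* | v_prev))).

    q(a, b) is the (possibly unnormalized) density of proposing a from b.
    """
    num = p(v_star) * q(v_prev, v_star)
    den = p(v_prev) * q(v_star, v_prev)
    return 1.0 if den == 0.0 else min(1.0, num / den)

def metropolis_hastings(p, n_samples, step=1.0, v0=0.0, seed=0):
    rng = random.Random(seed)

    def q(v_to, v_from):
        # Gaussian random walk; normalizing constants cancel in the ratio.
        return math.exp(-0.5 * ((v_to - v_from) / step) ** 2)

    chain = [v0]
    for _ in range(n_samples):
        v_prev = chain[-1]
        v_star = rng.gauss(v_prev, step)          # draw a candidate v*
        if rng.random() < acceptance(p, q, v_prev, v_star):
            chain.append(v_star)                  # accept: v(t) = v*
        else:
            chain.append(v_prev)                  # reject: v(t) = v(t-1)
    return chain

def target(v):
    return math.exp(-0.5 * v * v)                 # unnormalized N(0, 1)

chain = metropolis_hastings(target, 20000)
```

Only ratios of `p` appear in the rule, so the target never has to be normalized.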




             Approximate Inference (2/2)
     • MCMC generates a Markov chain
       (v(0), v(1), ..., v(k), ...): the transition
       probability from v(t-1) to v(t)
           – Depends only on v(t-1)
           – But not on (v(0), v(1), ..., v(t-2))
     • The chain approaches its stationary
       distribution
           – After k burn-in steps, (v(k+1), ..., v(k+n)) are
             approximately samples from P(V)
     • However, if V is high-dimensional,
       MCMC converges slowly




         Annealed Gibbs Sampling (1/4)
     • Gibbs sampling method
           – Formally proposed by Geman and Geman in
             1984 for Markov random fields (MRFs)
           – Here the sampler is revised for the
             proposed two-stage Bayesian network
           – The basic idea
              • Sample from univariate conditional
                distributions
              • That is, the Markov chain (v(0), v(1), ..., v(k),
                ...) is built by changing only one variable
                of v at each step
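The idea of updating one variable at a time from its univariate conditional can be sketched on a textbook target. The example below assumes a bivariate standard normal with correlation `rho`, for which both conditionals are known in closed form; it is not the two-stage network of the talk, and the function names are hypothetical.

```python
import math
import random

def gibbs_bivariate_normal(rho, n_samples, seed=0):
    """Gibbs sampling for (X, Y) ~ N(0, [[1, rho], [rho, 1]])."""
    rng = random.Random(seed)
    sd = math.sqrt(1.0 - rho * rho)
    x, y = 0.0, 0.0
    samples = []
    for _ in range(n_samples):
        x = rng.gauss(rho * y, sd)   # X | Y=y ~ N(rho*y, 1 - rho^2)
        y = rng.gauss(rho * x, sd)   # Y | X=x ~ N(rho*x, 1 - rho^2)
        samples.append((x, y))
    return samples

def correlation(samples):
    """Sample correlation coefficient of a list of (x, y) pairs."""
    n = len(samples)
    mx = sum(s[0] for s in samples) / n
    my = sum(s[1] for s in samples) / n
    sxy = sum((s[0] - mx) * (s[1] - my) for s in samples)
    sxx = sum((s[0] - mx) ** 2 for s in samples)
    syy = sum((s[1] - my) ** 2 for s in samples)
    return sxy / math.sqrt(sxx * syy)

draws = gibbs_bivariate_normal(rho=0.8, n_samples=20000)
estimated_rho = correlation(draws[1000:])   # drop early samples
```

No candidate is ever rejected here: drawing from the exact conditional makes the Metropolis-Hastings acceptance probability equal to 1.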




         Annealed Gibbs Sampling (2/4)
     • We draw from the distribution
       $$v_j^{(t)} \sim P\big(V_j \mid v_1^{(t)}, \ldots, v_{j-1}^{(t)}, v_{j+1}^{(t)}, \ldots, v_n^{(t)}\big)$$
     • The Annealed Gibbs (AG) sampler
           – Sampling from the univariate conditional
             distributions is controlled by a stochastic
             process of simulated cooling
             $$q(v^{*} \mid v^{(t)}) = \begin{cases} p(v_j^{*} \mid v_{-j}^{(t)}) & \text{if } v_{-j}^{*} = v_{-j}^{(t)} \\ 0 & \text{otherwise} \end{cases}$$
             $$\alpha_{AG} = \min\left\{1, \left(\frac{p(v^{*})}{p(v_j^{(t)})}\right)^{\frac{1}{T(t)}} \frac{q(v_j^{(t)} \mid v^{*})}{q(v^{*} \mid v_j^{(t)})}\right\}$$




         Annealed Gibbs Sampling (3/4)
     • The function T(t) is called the cooling
       schedule
       $$T(t) = T_0 \left(\frac{T_f}{T_0}\right)^{t/n}$$
     • The value of T at any point in
       the chain is called the temperature
           – T0 is the starting temperature
           – Tf is the final temperature, reached after
             n steps
     • As the process proceeds and T decreases, the
       probability of accepting down-hill moves
       (moves to a less probable state) decreases
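The cooling schedule is a geometric interpolation between the two temperatures, as this short sketch shows. The function name `cooling_temperature` and the example values of T0, Tf, and n are assumptions made for illustration.

```python
def cooling_temperature(t, n, t0, tf):
    """T(t) = T0 * (Tf / T0) ** (t / n), for step t in 0..n."""
    return t0 * (tf / t0) ** (t / n)

# Example: cool from T0 = 10 to Tf = 0.1 over n = 100 steps.
start = cooling_temperature(0, 100, 10.0, 0.1)     # equals T0
middle = cooling_temperature(50, 100, 10.0, 0.1)   # approximately 1.0
end = cooling_temperature(100, 100, 10.0, 0.1)     # approximately Tf
```

Each step multiplies the temperature by the same factor, so the schedule spends as many steps between 10 and 1 as between 1 and 0.1.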




         Annealed Gibbs Sampling (4/4)
     • The AG sampler is a stochastic iterative
       algorithm that converges to the set of points
       that are the global maxima of the given
       function
     • The advantage of the AG sampler is
           – It is more efficient than the Gibbs sampler
     • This is because, instead of approximating P(V),
           – We want to find the global maximum, i.e., the
             mode (MAP estimate) of the posterior distribution
           – We run a Markov chain with invariant distribution
             P(V) and estimate only the global mode
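A mode-seeking annealed sampler of this kind can be sketched for a small discrete problem. The sketch proposes a new value for one variable from its conditional and accepts it with a tempered Metropolis-Hastings rule; for this proposal the ratio of proposal densities equals p(v)/p(v*), so the acceptance probability reduces to min(1, (p(v*)/p(v))^(1/T − 1)). The target `toy_target`, the default temperatures, and all names are assumptions, and `p` must be strictly positive everywhere.

```python
import math
import random

def annealed_gibbs_mode(p, n_vars, n_states, n_steps=2000,
                        t0=1.0, tf=0.05, seed=0):
    """Search for the mode of a positive function p over discrete tuples."""
    rng = random.Random(seed)
    v = tuple(rng.randrange(n_states) for _ in range(n_vars))
    best = v
    for t in range(n_steps):
        temp = t0 * (tf / t0) ** (t / n_steps)    # cooling schedule T(t)
        j = t % n_vars                            # variable to update
        candidates = [v[:j] + (k,) + v[j + 1:] for k in range(n_states)]
        weights = [p(c) for c in candidates]      # proportional to p(v_j | v_-j)
        v_star = rng.choices(candidates, weights=weights)[0]
        ratio = p(v_star) / p(v)
        # Up-hill moves are always accepted; down-hill moves become
        # less likely as the temperature drops.
        alpha = 1.0 if ratio >= 1.0 else ratio ** (1.0 / temp - 1.0)
        if rng.random() < alpha:
            v = v_star
        if p(v) > p(best):
            best = v
    return best

TARGET = (2, 0, 3)   # hypothetical location of the global mode

def toy_target(v):
    return math.exp(-sum((a - b) ** 2 for a, b in zip(v, TARGET)))

mode = annealed_gibbs_mode(toy_target, n_vars=3, n_states=4)
```

At T = 1 every proposal is accepted and the sampler behaves like plain Gibbs sampling; as T falls, the chain increasingly stays at high-probability states.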




                 6. Feature Extraction
   • Human silhouette sampling
   • Normalized width
   • Normalized center
   • Spatial distribution of skin color
   • Corners of silhouette
   [Figure: human silhouette with Width and Length marked]




        Human Silhouette Sampling (S)
   • Human segmentation
   • Human silhouette capturing [Suzuki, 1985]
   • The silhouette contour is sampled uniformly
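One plausible reading of uniform sampling is to keep points at evenly spaced positions along the ordered contour. The sketch below assumes the contour is already available as an ordered list of (x, y) points; the function name and the toy contour are hypothetical.

```python
def sample_contour_uniform(contour, k):
    """Pick k points at evenly spaced indices of an ordered contour."""
    n = len(contour)
    if k <= 0 or n == 0:
        return []
    if k >= n:
        return list(contour)
    return [contour[(i * n) // k] for i in range(k)]

# Toy contour with 12 ordered points; keep 4 of them.
toy_contour = [(i, 2 * i) for i in range(12)]
picked = sample_contour_uniform(toy_contour, 4)   # indices 0, 3, 6, 9
```

Spacing by index equals spacing by arc length only if consecutive contour points are roughly equidistant, which holds approximately for pixel-chain contours.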




                 Normalized Width (wN)
   • Human segmentation
   • Binary image profile
   • Width adjustment
         $$w_N = x_R - x_L$$
         $$x_L = x \ \text{where}\ \begin{cases} h_x \ge \text{threshold} \\ h_{x-1} < \text{threshold} \end{cases} \quad \text{for } x = 1 \to w$$
         $$x_R = x \ \text{where}\ \begin{cases} h_x \ge \text{threshold} \\ h_{x+1} < \text{threshold} \end{cases} \quad \text{for } x = w \to 1$$
   [Figure: profile of X coordinate (x axis: x coordinate of image; y axis: pixel accumulation value), with the normalization width marked]
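The width adjustment can be sketched as a scan over the column profile of the binary silhouette. The code assumes the mask is a list of rows of 0/1 values and uses 0-based column indices (the slide counts from 1); the function name and the default threshold are hypothetical.

```python
def normalized_width(mask, threshold=1):
    """Return (x_left, x_right, width) from the column profile, or None."""
    n_cols = len(mask[0])
    # h_x: number of foreground pixels in column x.
    profile = [sum(row[x] for row in mask) for x in range(n_cols)]
    # Scan left to right for the first column that reaches the threshold.
    x_left = next((x for x in range(n_cols) if profile[x] >= threshold), None)
    if x_left is None:
        return None
    # Scan right to left for the last column that reaches the threshold.
    x_right = next(x for x in range(n_cols - 1, -1, -1)
                   if profile[x] >= threshold)
    return x_left, x_right, x_right - x_left

toy_mask = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 1, 0, 0],
    [0, 0, 1, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
result = normalized_width(toy_mask)   # (2, 5, 3)
```

A higher threshold ignores thin protrusions, such as an outstretched arm that contributes only a few pixels per column.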




                 Normalized Center (Nc)
   • Boundary adjustment
   • Center of new boundary
         $$x_N = x_p + 0.5\, w_N$$
         $$y_N = y_p + 0.5\, L$$
   [Figure: adjusted boundary with Width and Length marked]



                 Spatial Distribution of
                     Skin Color (A)
   • Skin color detection by GMM
   • Morphology
   • Region segmentation
   • Spatial distribution of skin color




                 Corners of Silhouette (C)
     • Human segmentation
     • Human silhouette capturing
     • The level curve curvature approach
         [Lindeberg, 1998]
         $$\tilde{I}(x, y) = \arg\max \left( D_x^{2} D_{yy} + D_y^{2} D_{xx} - 2 D_x D_y D_{xy} \right)$$
     • Adaptive corner choice
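The curvature measure can be sketched with central finite differences on a small image. This illustration assumes the image is a list of rows of numbers and leaves border pixels at zero; it omits the scale-space smoothing used in the cited approach, and the function names are hypothetical.

```python
def level_curve_curvature(D):
    """Compute Dx^2*Dyy + Dy^2*Dxx - 2*Dx*Dy*Dxy at interior pixels."""
    h, w = len(D), len(D[0])
    kappa = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            dx = (D[y][x + 1] - D[y][x - 1]) / 2.0
            dy = (D[y + 1][x] - D[y - 1][x]) / 2.0
            dxx = D[y][x + 1] - 2.0 * D[y][x] + D[y][x - 1]
            dyy = D[y + 1][x] - 2.0 * D[y][x] + D[y - 1][x]
            dxy = (D[y + 1][x + 1] - D[y + 1][x - 1]
                   - D[y - 1][x + 1] + D[y - 1][x - 1]) / 4.0
            kappa[y][x] = dx * dx * dyy + dy * dy * dxx - 2.0 * dx * dy * dxy
    return kappa

def strongest_response(kappa):
    """Return (x, y) of the largest value of the measure (the arg max)."""
    _, y, x = max((val, y, x)
                  for y, row in enumerate(kappa)
                  for x, val in enumerate(row))
    return x, y

# Quadratic bowl D = x^2 + y^2, for which the measure is 8 * (x^2 + y^2).
bowl = [[x * x + y * y for x in range(7)] for y in range(7)]
kappa = level_curve_curvature(bowl)
corner_x, corner_y = strongest_response(kappa)
```

On a real silhouette the measure is large where the contour bends sharply, which is what makes its maxima usable as corner candidates.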




                 7. Experimental Results
    • Experimental environment
         – CPU: 1.86 GHz, RAM: 1 GB, Visual C++ 6.0
         – HumanEva database I




                     HumanEva Database I
   • Provider:
        – Department of Computer Science, Brown University
   • Actions of HumanEva I
                 Action       Description
                 Walking      Subjects walked in an elliptical path around
                              the capture space.
                 Jog          Subjects jogged in an elliptical path around
                              the capture space.
                 Gesture      Subjects performed "hello" and "good-bye"
                              gestures in repetition.
                 Throw/Catch  Subjects tossed and caught a baseball
                              with the help of the lab assistant.
                 Box          Subjects imitated boxing.
                 Combo        Subjects performed combined
                              actions of walking and jogging.



                    Environment Setting
     • 7 cameras
           – 3 color cameras (C1, C2, C3)
           – 4 gray-level cameras (BW1, BW2, BW3, BW4)
     [Figure: layout of the capture space (dimensions labeled 3 m and 2 m), surrounded by cameras BW1 to BW4 and C1 to C3, with the control station]



                 The Experimental Data
   • The proposed method was trained on 1,900
     images from the walking sequences of subjects 1 and 2,
     captured by camera C1
   • 200 testing images:
        • 100 images from subject 1
        • 100 images from subject 2
   • Difficulties:
        – Self-occlusion
        – Clothing variation
        – Large variation of
          joint locations




                 Evaluation of Accuracy
       • Average distance error between the
         estimated poses and the ground truth
           • Let H = {h1, h2, ..., hM}, where hm ∈ R^3 (or hm ∈
               R^2 for the 2D body model), be the position
               vectors of the body pose in the world (or
               in the image, respectively)
           • D(H, H*): the error of the estimated pose H* with respect to
               the ground truth pose H
           $$D(H, H^{*}) = \sum_{m=1}^{M} \frac{\lVert h_m - h_m^{*} \rVert}{M}, \qquad \xi = \frac{1}{NT} \sum_{n=1}^{N} \sum_{t=1}^{T} D(H_{t,n}, H_{t,n}^{*})$$
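The two error measures can be computed directly from joint coordinates, as in this minimal sketch. It assumes a pose is a list of M coordinate tuples and reads the double sum as an average over N sequences of T frames each; that reading, the function names, and the toy poses are assumptions.

```python
import math

def pose_error(H, H_star):
    """D(H, H*): mean Euclidean distance over the M joints of one pose."""
    if not H or len(H) != len(H_star):
        raise ValueError("poses must be non-empty and have the same length")
    return sum(math.dist(h, hs) for h, hs in zip(H, H_star)) / len(H)

def mean_error(truth, estimates):
    """xi: average of D over all sequences n and frames t."""
    errors = [pose_error(H, Hs)
              for seq_truth, seq_est in zip(truth, estimates)
              for H, Hs in zip(seq_truth, seq_est)]
    return sum(errors) / len(errors)

gt_pose = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]     # ground truth, M = 2 joints
est_pose = [(3.0, 4.0, 0.0), (1.0, 0.0, 0.0)]    # estimate: joint 1 is off by 5
d = pose_error(gt_pose, est_pose)                # (5 + 0) / 2 = 2.5
xi = mean_error([[gt_pose, gt_pose]], [[est_pose, gt_pose]])   # (2.5 + 0) / 2
```

The same functions work for the 2D body model, since `math.dist` accepts points of any matching dimension.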




        Performance Comparison Between
        Two-stage and One-stage methods




     • The AG sampler performs better than the Gibbs sampler
     • The two-stage approach performs better than the classical
       one-stage approach
     • The AG sampler takes less inference time




                 Effect of Iteration Number
                         on Accuracy




                    2D Results of Subject 1
     [Figure: frames 1122, 1149, 1172, and 1200, each showing the ground truth (GT) and the AGs estimate]




                    2D Results of Subject 2
     [Figure: frames 804, 835, 875, and 899, each showing the ground truth (GT) and the AGs estimate]




                                     3D Results
   • Frame 1110 of subject 1
     [Figure: 3D plots of the ground truth and the AGs estimation result]




                        3D Results (Cont.)

     • Frame 1135 of subject 1
       [Figure: 3D plots of the ground truth and the AGs estimation result]




                        3D Results (Cont.)
         • Frame 845 of subject 2
           [Figure: 3D plots of the ground truth and the AGs estimation result]




                       3D Results (Cont.)
       • Frame 872 of subject 2
         [Figure: 3D plots of the ground truth and the AGs estimation result]




                      8. Conclusions
       • A markerless, monocular motion
         capture problem is considered
       • The proposed two-stage annealed Gibbs
         sampling method estimates poses more accurately
         and with less computation time than the
         one-stage approach and the Gibbs sampler
       • The method can overcome three challenges
         of the problem
            – Self-occlusion
            – Large variation of joint locations
            – Clothing variation




                  Future Work
     • Use GMMs to approximate the prior and
       posterior distributions of our human models
     • Combine model-free and model-based
       methods to obtain the benefits of both
     • Exploit HMMs to infer human motion
       over time
     • Add human part detectors to help locate
       human joints




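                 License Statement for This Presentation
      • The content of this presentation is licensed under the Creative Commons
        "Attribution-NonCommercial 3.0 Taiwan" license
      • You are welcome to reproduce, distribute, or modify the content of this
        presentation for non-commercial purposes, provided that you indicate:
        (1) the name of the original author: Yuan-Kai Wang (王元凱); (2) the license icon:
      • Some of the graphics used in this presentation were taken from the
        Internet and are used by the speaker under a claim of fair use in talks
        promoting free software. Readers should not reuse them unless they judge
        that their own use also qualifies as fair use, and they bear the related
        legal responsibility themselves.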

Monocular Human Pose Estimation with Bayesian Networks

  • 1. This work is licensed under the Creative Commons Attribution 3.0 Taiwan license. Monocular Human Pose Estimation with Bayesian Networks. Yuan-Kai Wang, Electronic Engineering Department, Fu Jen University, 2010/6/11
  • 2. Outline 1. Introduction 2. Markerless Monocular Human Pose Estimation 3. Overview of the Approach 4. Model Learning by EM Algorithm 5. Pose Estimation by Approximate Inference 6. Feature Extraction 7. Experimental Results 8. Conclusions
  • 3. 1. Introduction • Applications of Human Motion Capture – Performance animation in movie making – Games – Medical diagnosis – Sport & Health – Visual surveillance
  • 4. Performance Animation • Avatar • The Lord of the Rings
  • 5. Game • Microsoft's Project Natal for Xbox 360
  • 6. Medical Diagnosis • Gait analysis for rehabilitation
  • 7. Sport & Health • Golf training
  • 8. Visual Surveillance • Behavior analysis for event detection – Irregular movement, body language, unusual interactions, fighting – Car crash • Content-based retrieval
  • 9. Sensor Approaches • Active sensors – Types: electromagnetic marker, optical, accelerometer – Wired connection – Drawbacks: intrusive, expensive, time consuming (figure label: too many wires) • Passive sensors by camera – Marker-based – Markerless
  • 10. Marker-based Sensors • Add visual markers on the body – Active marker: visible/non-visible light – Passive marker: needs computer vision algorithms • Advantages – No wires • Drawbacks – Semi-intrusive – Time consuming (figure labels: active marker, passive marker)
  • 11. Markerless Sensors • No attachment on the human body • Heavily dependent on the computer vision analyzer (a pure vision solution) – Stereo/multiple cameras – Monocular cameras
  • 12. Sensor vs. Analyzer • T. B. Moeslund, "Computer vision-based human motion capture – a survey", Technical report LIA 99-02, University of Aalborg, 1999.
  • 13. Pose Estimation vs. Gesture Recognition (figure panels: pose estimation; gesture recognition, e.g. walking)
  • 14. 2D vs. 3D
  • 15. 2. Markerless Monocular Human Motion Capture • Goal – Markerless – Single camera – 3D poses • Challenges – Ill-posed – Highly articulated – Self-occluding (figure caption: depth ambiguities and occlusion using monocular silhouettes)
  • 16. Joint Representation • The articulated human body is linked by joints
  • 17. Abstract Representation (figure labels: 2D, 3D; stick, surface/volume)
  • 18. Literature Review • Pipeline from low-level observation to high-level abstraction: image space (pixel domain) → human segmentation (S) → image feature descriptor (F) → 2D joint location (J) → 3D model parametric space (pose domain, P) • Segmentation (S): background subtraction, object detection; full body or body parts • Features (F): shape, silhouette, color, appearance, motion, feature point (corner), ... • Pose (P): joint angle Θi, joint location Pi • Mappings: P=f(S), P=f(F), P=f(J) (marker-based) • A two-stage approach is proposed: P=f1(f2(F))
  • 19. Approaches • Model-free [Agarwal, 2006] [Loy, 2004] – Joint articulation is not used to constrain the search for the mapping function P = f(X) • Model-based [Rbert, 2006] [Rohr, 1994] – A model of human articulation constrains the search for f and P – Two kinds of approach • Discriminative • Generative: Bayesian networks (BNs) • Training: $\hat{f} = \arg\max_{f} L_1(\text{Training}, f)$ • Inference: $\hat{P} = \arg\max_{P} L_2(\hat{f} \mid X, P)$
  • 20. An Articulated Model = A Bayesian Network • The human body is represented as a kinematic tree consisting of segments linked by joints • Kinematic models are expressed as graphical probability networks • Graphical probability models are computed via a Bayesian network (figure: 3D stick figure with X, Y, Z axes and joints labeled neck, left/right shoulder, left/right elbow, left/right hand, bottom, left/right waist, left/right knee, left/right foot)
  • 21. Three Steps to Utilize BNs • Representation, learning and inference • Representation: joint node X1 and feature nodes X2, X3, X4 • Learning: feature-joint correspondence by the conditional probability P(X1|X2,X3,X4), with $\hat{f} = \arg\max_{f} L_1(\text{Training}, f)$ • Inference: pose estimation, with $\hat{P} = \arg\max_{P} L_2(\hat{f} \mid X, P)$
  • 22. Two Causal Models in BNs • Undirected acyclic graph [Lan, 2008] [Hua, 2005] – The network is a tree or graph model in which the edge linking two nodes has no direction; an edge between X1 and X2 carries the joint probability P(X1,X2). • Directed acyclic graph [Ramanan, 2007] [Lee, 2006] [Leonid, 2003] – Nodes are linked by directed arcs; an arc between X1 and X2 carries the conditional probability P(X1|X2).
  • 23. Directed Bayesian Articulated Model • Nodes in a directed acyclic graph (DAG) are not influenced by their child nodes. • Dependencies between human body parts are not regarded as two-way. (figure: DAG over the nodes h2d,1 to h2d,15)
  • 24. Inference of Bayesian Networks • Top-down approach [Gavrila, 1996] – Strong at finding human body parts in the image. • Bottom-up approach [Ren, 2005] – Strong at finding people in the image. • Combined approach [Navaraman, 2005][Lee, 2002] – Benefits from the advantages of both.
  • 25. 3. Overview of the Approach • Both the 2D and the 3D models are belief propagation networks that use an annealed Gibbs sampling algorithm. (figure: 2D and 3D stick figures with joints labeled head, neck, left/right shoulder, left/right elbow, left/right hand, bottom, left/right waist, left/right knee, left/right foot)
  • 26. System Architecture • We estimate the 2D human joint positions before 3D estimation. (block diagram: 2D model training = 2D Bayesian human model setting + training features → EM training; 3D model training = 3D Bayesian human model setting + training features → EM training; testing image → feature extraction → 2D Bayesian inference with annealed Gibbs sampling → 3D Bayesian inference with annealed Gibbs sampling → result)
  • 27. 2D Human Graphical Model • The articulated structure of the 2D human body is represented by a 15-node graphical model: $H_{2D} = \{h_{2d,1}, \ldots, h_{2d,15}\}$ (figure: 2D stick figure (articulated model) with nodes for head, neck, left/right shoulder, left/right elbow, left/right hand, bottom, left/right waist, left/right knee, left/right foot)
  • 28. 3D Human Graphical Model • The 3D human body model is described by a 45-dimensional vector $H_{3D}$ holding the three coordinates of each of the 15 joint nodes in 3D space: $H_{3D} = \{h_{3d,1}, \ldots, h_{3d,15}\}$ (figure: 3D stick figure (articulated model) with X, Y, Z axes and the same joint labels)
  • 29. The BN Model • A directed acyclic graph $G = (V, E, C)$ – $V$: vertex set $\{V_i, 1 \le i \le N\}$ – $E$: a set of directed edges $(i,j)$ – $C$: $(i,j) \to \mathbb{R}^{+}$, edge cost functions • To encode probabilistic information – An edge indicates a probabilistic dependence – $C$: $P(V_i \mid V_j)$, the set of conditional probability functions • The 2D and 3D BNs: $G_{2D} = (V_{2D}, E_{2D}, C_{2D})$ and $G_{3D} = (V_{3D}, E_{3D}, C_{3D})$
  • 30. 2D Graphical Model • $V_{2D} = \{H_{2D}, O_{2D}\}$ • Observation nodes $O_{2d}$: Nc, S, A, C (normalized center, silhouette sampling, spatial distribution of skin color, corners of silhouette) • $C_{2D} = \{P(h_{2d,i} \mid pa(h_{2d,i}))\}$
  • 31. 3D Graphical Model • $V_{3D} = \{H_{3D}, O_{3D}\}$ • $C_{3D} = \{P(h_{3d,i} \mid pa(h_{3d,i}))\}$ (figure: observation nodes $O_{3d}$, including the 2D nodes $h_{2d,i}$, $w_N$ and $L$, connected to upper-body nodes $hu_{3d,1}$ to $hu_{3d,7}$ and lower-body nodes $hl_{3d,1}$ to $hl_{3d,7}$)
  • 32. Joint Probability Distribution (JPD) • The two proposed graphical models specify two unique JPDs: $P_{2D}(V_{2D})$ and $P_{3D}(V_{3D})$ • Let $P(V)$ represent either JPD: $P(V) = \prod_{i=1}^{n} P(V_i \mid pa(V_i))$ • The factorization of the JPD follows from the Markov blanket, a local Markov property • If we can learn the finite set of conditional probabilities, we can infer the human pose
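The factorization on slide 32 can be illustrated with a minimal Python sketch. The three-node chain, the node names and all probability values below are hypothetical and chosen only for illustration; they are not the 15-node model or the learned tables from the slides.

```python
# Hypothetical 3-node binary network: neck -> shoulder -> elbow.
parents = {"neck": (), "shoulder": ("neck",), "elbow": ("shoulder",)}

# cpt[node][parent_values][value] = P(node = value | parents = parent_values)
# All numbers are made up for this sketch.
cpt = {
    "neck": {(): {0: 0.6, 1: 0.4}},
    "shoulder": {(0,): {0: 0.7, 1: 0.3}, (1,): {0: 0.2, 1: 0.8}},
    "elbow": {(0,): {0: 0.9, 1: 0.1}, (1,): {0: 0.5, 1: 0.5}},
}


def joint_probability(assignment):
    """P(V) = prod_i P(V_i | pa(V_i)) for a full assignment of all nodes."""
    prob = 1.0
    for node, pa in parents.items():
        pa_values = tuple(assignment[p] for p in pa)
        prob *= cpt[node][pa_values][assignment[node]]
    return prob


# P(neck=1) * P(shoulder=1 | neck=1) * P(elbow=0 | shoulder=1) = 0.4 * 0.8 * 0.5
p = joint_probability({"neck": 1, "shoulder": 1, "elbow": 0})
```

Only one small table per node is stored, which is why learning "the finite set of conditional probabilities" is enough to evaluate the full joint distribution.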
  • 33. Two Problems • Training problem – Given a training set {O2d, O3d} – How can we learn the edge cost function C = { P(h | pa(h)) }? – We apply the EM algorithm • Inference problem – Given evidence O – How can we infer the human pose P(H | O) from P(V)? – We propose an annealed Gibbs sampling algorithm
  • 34. 4. Model Learning by EM • Why apply the EM algorithm for model learning? – The human poses and observations are incomplete and sparse • Incomplete: occlusion due to the single camera • Sparse: few training samples in a high-dimensional space
  • 35. The Likelihood Function • The training set $D = \{D_1, \ldots, D_N\}$ – $N$ is the number of training samples – $D_l = \{V_1[l], \ldots, V_n[l]\}$ is the $l$-th training sample • Let $\theta$ be the model to learn: $C = \{P(h \mid pa(h))\}$ • $\hat{\theta} = \arg\max_{\theta} P(\theta \mid D) = \arg\max_{\theta} \frac{P(D \mid \theta)\,P(\theta)}{P(D)} = \arg\max_{\theta} P(D \mid \theta) = \arg\max_{\theta} \prod_{l=1}^{N} P(D_l \mid \theta)$, where the third equality treats the prior $P(\theta)$ as uniform • A log-likelihood function $L_D(\theta) = \log P(D \mid \theta)$ is formulated based on the independence assumption of the training samples: $L_D(\theta) = \log \prod_{l=1}^{N} P(V_1[l], \ldots, V_n[l] \mid \theta) = \sum_{i=1}^{n} \sum_{l=1}^{N} \log P(V_i[l] \mid pa(V_i[l]), \theta)$
  • 36. MLE vs. EM • If D is complete, we can apply MLE (Maximum Likelihood Estimation) to find θ • However, D is incomplete because of occlusion and partial observability • Let D = Y ∪ U – Y is the observed data – U is the missing data
  • 37. The EM • Expectation step – Computes the expectation of the log-likelihood function: $Q(\theta \mid \theta^{(t)}) = E\left[\log P(D \mid \theta) \mid \theta^{(t)}, Y\right]$ • Maximization step – Updates the parameter to $\theta^{(t+1)}$ from the current parameter $\theta^{(t)}$: $\theta^{(t+1)} = \arg\max_{\theta} Q(\theta \mid \theta^{(t)})$ • Stop condition of the E-M iteration – $L_D(\theta^{(t+1)}) - L_D(\theta^{(t)})$ converges
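The E and M steps above can be sketched on a deliberately tiny case. The Python below assumes a hypothetical two-node network A → B with binary values, where A is sometimes missing (standing in for an occluded joint). The network, the initial parameter values and the fixed iteration count (used instead of the log-likelihood stop condition) are all illustrative assumptions rather than the training procedure from the slides.

```python
def em_binary_bn(data, n_iter=50):
    """EM for a hypothetical network A -> B with binary nodes.

    data is a list of (a, b) pairs; a may be None when it is unobserved.
    Returns (P(A=1), [P(B=1 | A=0), P(B=1 | A=1)]).
    """
    p_a = 0.5          # initial guess for P(A = 1)
    p_b = [0.4, 0.6]   # initial guess for P(B = 1 | A = a)
    for _ in range(n_iter):
        # E-step: w = P(A = 1 | sample) under the current parameters.
        weights = []
        for a, b in data:
            if a is not None:
                weights.append(float(a))
            else:
                like1 = p_a * (p_b[1] if b else 1.0 - p_b[1])
                like0 = (1.0 - p_a) * (p_b[0] if b else 1.0 - p_b[0])
                weights.append(like1 / (like1 + like0))
        # M-step: re-estimate the parameters from the expected counts.
        n1 = sum(weights)
        n0 = len(data) - n1
        p_a = n1 / len(data)
        p_b = [
            sum((1.0 - w) * b for w, (_a, b) in zip(weights, data)) / n0,
            sum(w * b for w, (_a, b) in zip(weights, data)) / n1,
        ]
    return p_a, p_b


complete = [(1, 1), (1, 1), (1, 0), (0, 0)]
partial = complete + [(0, 1), (0, 0), (None, 1), (None, 0)]
mle_like = em_binary_bn(complete)   # no missing values: reduces to plain MLE
estimate = em_binary_bn(partial)    # missing A values are filled in softly
```

With complete data the E-step has nothing to estimate, so the result equals the MLE counts, which matches the MLE vs. EM distinction on slide 36.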
  • 38. 5. Pose Estimation by Approximate Inference • Let the observed data be O' = O − U – U is the set of hidden variables that are unobservable due to occlusion • The best estimated pose is a vector H*, defined as the pose with the maximum probability given O': $H^{*} = \arg\max_{H} P(H \mid O') = \arg\max_{H} \int_{u \in U} P(H, u \mid O')\,du = \arg\max_{H} \int_{u \in U} P(H, O', u)\,du = \arg\max_{H} \int_{u \in U} \prod_{i=1}^{n} P(V_i \mid pa(V_i))\,du$, where $P(H, O', u) = P(V)$ with $V = H \cup O' \cup U$
  • 39. Inference of Posterior Probability • How to calculate the posterior probability $H^{*} = \arg\max_{H} \int_{u \in U} \prod_{i=1}^{n} P(V_i \mid pa(V_i))\,du$? – Exact inference • Junction tree, message passing – Approximate inference • Loopy belief propagation, variational methods • Markov chain Monte Carlo (MCMC) sampling – Metropolis-Hastings – Gibbs sampling
  • 40. Approximate Inference (1/2) • The MCMC algorithm uses sampling to approximate the posterior distribution P(V) by random number generation • The key idea of MCMC is to simulate the sampling process as a Markov chain • Definitions • A sample vector v of V • A proposal distribution $q(v^{*} \mid v^{(t-1)})$ to generate $v^{*}$ • An acceptance probability α to accept $v^{*}$ as $v^{(t)}$: $\alpha(v^{(t-1)}, v^{*}) = \min\left\{1, \frac{p(v^{*})\,q(v^{(t-1)} \mid v^{*})}{p(v^{(t-1)})\,q(v^{*} \mid v^{(t-1)})}\right\}$
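A minimal Metropolis-Hastings sketch of the acceptance rule above, in Python. It assumes a one-dimensional target and a symmetric Gaussian random-walk proposal, so the q terms in α cancel; the function name, step size and target density are illustrative choices and not taken from the slides.

```python
import math
import random


def metropolis_hastings(log_p, v0, n_samples, step=1.0, seed=0):
    """Draw n_samples from a 1-D density known up to a constant via log_p."""
    rng = random.Random(seed)
    v = v0
    samples = []
    for _ in range(n_samples):
        # Symmetric proposal: q(v* | v) = q(v | v*), so the q ratio equals 1.
        v_star = v + rng.gauss(0.0, step)
        # alpha = min(1, p(v*) / p(v)), evaluated in log space.
        log_alpha = min(0.0, log_p(v_star) - log_p(v))
        if rng.random() < math.exp(log_alpha):
            v = v_star          # accept v* as the next state
        samples.append(v)       # on rejection the chain repeats the old state
    return samples


# Example target: standard normal, log p(v) = -v^2 / 2 up to a constant.
samples = metropolis_hastings(lambda v: -0.5 * v * v, 0.0, 20000)
```

Each new state depends only on the previous one, which is the Markov chain property described on the next slide.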
  • 41. Approximate Inference (2/2) • MCMC generates a Markov chain (v(0), v(1), ..., v(k), ...), in which the transition probability from v(t-1) to v(t) – Depends only on v(t-1) – But not on (v(0), v(1), ..., v(t-2)) • The chain approaches its stationary distribution – The samples (v(k+1), ..., v(k+n)) are then samples from P(V) • However, if V is high-dimensional, MCMC does not converge easily
  • 42. Annealed Gibbs Sampling (1/4) • Gibbs sampling method – Formally proposed by Geman & Geman in 1984 for Markov Random Fields (MRF) – Here the sampler is revised for the proposed two-stage Bayesian network – The basic idea • Sampling from univariate conditional distributions • That is, the Markov chain (v(0), v(1), ..., v(k), ...) is obtained by changing only one variable of v at a time
  • 43. Annealed Gibbs Sampling (2/4) • We draw from the distribution $v_j^{(t)} \sim P(V_j \mid v_1^{(t)}, \ldots, v_{j-1}^{(t)}, v_{j+1}^{(t)}, \ldots, v_n^{(t)})$ • The Annealed Gibbs (AG) sampler – The sampling of the univariate conditional distributions is controlled by a stochastic process of simulated cooling – Proposal: $q(v^{*} \mid v^{(t)}) = p(v_j^{*} \mid v_{-j}^{(t)})$ if $v_{-j}^{*} = v_{-j}^{(t)}$, and 0 otherwise – Acceptance: $\alpha_{AG} = \min\left\{1, \left(\frac{p(v^{*})}{p(v^{(t)})}\right)^{1/T(t)} \frac{q(v^{(t)} \mid v^{*})}{q(v^{*} \mid v^{(t)})}\right\}$
  • 44. Annealed Gibbs Sampling (3/4) • The function T(t) is called the cooling schedule: $T(t) = T_0 \left(\frac{T_f}{T_0}\right)^{t/n}$ • The particular value of T at any point in the chain is called the temperature – $T_0$ is the start temperature – $T_f$ is the final temperature, reached after cooling down over n steps • As the process proceeds, we decrease the probability of accepting down-hill moves
  • 45. Annealed Gibbs Sampling (4/4) • The AG sampler is a stochastic iterative algorithm that converges to the set of points that are the global maxima of the given function • Advantage of the AG sampler – It is more efficient than the Gibbs sampler • Reason: instead of approximating P(V) – We want to find the global maximum, i.e., the ML estimate of the posterior distribution – We run a Markov chain of invariant distribution P(V) and estimate only the global mode
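The annealed Gibbs steps on slides 42 to 45 can be sketched in Python for a small discrete distribution. The sketch assumes the proposal is the exact univariate conditional, so that q(v|v*)/q(v*|v) = p(v)/p(v*), and the geometric cooling schedule T(t) = T0 (Tf/T0)^(t/n). The function name, the table of probabilities, the temperatures and the tracking of the best visited state are illustrative assumptions, not the author's implementation.

```python
import random


def annealed_gibbs(p, shape, n_steps=2000, t0=1.0, tf=0.05, seed=0):
    """Search for the mode of an unnormalized discrete distribution.

    p     -- function mapping a tuple v to a positive unnormalized probability
    shape -- number of states of each variable in v
    """
    rng = random.Random(seed)
    v = tuple(rng.randrange(k) for k in shape)
    best = v
    for t in range(n_steps):
        temp = t0 * (tf / t0) ** (t / n_steps)   # cooling schedule T(t)
        j = t % len(shape)                       # variable updated at this step
        # Proposal: draw v_j* from the conditional p(v_j | v_-j).
        candidates = [v[:j] + (s,) + v[j + 1:] for s in range(shape[j])]
        weights = [p(c) for c in candidates]
        v_star = rng.choices(candidates, weights=weights)[0]
        # Acceptance ratio: (p(v*)/p(v))^(1/T) * q(v|v*)/q(v*|v).
        ratio = (p(v_star) / p(v)) ** (1.0 / temp) * (p(v) / p(v_star))
        if rng.random() < min(1.0, ratio):
            v = v_star
        if p(v) > p(best):
            best = v                             # keep the best state visited
    return best


# Hypothetical 3x3 table of unnormalized probabilities; the mode is at (1, 2).
table = [[1, 2, 1], [2, 3, 8], [1, 4, 2]]
mode = annealed_gibbs(lambda v: table[v[0]][v[1]], (3, 3))
```

At T = 1 the ratio is exactly 1, so the sampler behaves like plain Gibbs sampling; as T falls, moves to lower-probability states are accepted less often and the chain settles on a mode.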
  • 46. 6. Feature Extraction • Human silhouette sampling • Normalized width • Normalized center • Spatial distribution of skin color • Corners of silhouette
  • 47. Human Silhouette Sampling (S) • Human segmentation • Human silhouette capturing [Suzuki, 1985] • Uniform sampling is used in human silhouette sampling.
  • 48. Normalized Width ($w_N$) • Human segmentation • Binary image profile • Width adjustment: $w_N = x_R - x_L$, where $x_L$ is the first $x$, scanning $x = 1 \to w$, with $h_x \ge \text{threshold}$ and $h_{x-1} < \text{threshold}$, and $x_R$ is the first $x$, scanning $x = w \to 1$, with $h_x \ge \text{threshold}$ and $h_{x+1} < \text{threshold}$ (figure: profile of the X coordinate; horizontal axis: x coordinate of image; vertical axis: pixel accumulation value; labels: normalization width, width, length)
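A minimal NumPy sketch of the width computation above. It assumes the profile $h_x$ is the number of foreground pixels in column x of a binary segmentation mask, uses 0-based column indices (the slide counts from 1), and takes the threshold as a free parameter; the function name and the example mask are hypothetical.

```python
import numpy as np


def normalized_width(mask, threshold=1):
    """Return (x_left, x_right, w_n) from the column profile of a binary mask."""
    profile = mask.astype(bool).sum(axis=0)       # h_x: foreground pixels per column
    above = np.flatnonzero(profile >= threshold)  # columns that reach the threshold
    if above.size == 0:
        return None                               # no silhouette found
    x_left = int(above[0])    # first column reaching the threshold from the left
    x_right = int(above[-1])  # first column reaching the threshold from the right
    return x_left, x_right, x_right - x_left


# Hypothetical 5x10 mask with a silhouette occupying columns 3 to 6.
mask = np.zeros((5, 10), dtype=np.uint8)
mask[1:4, 3:7] = 1
result = normalized_width(mask)
```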
  • 49. Normalized Center (Nc) • Boundary adjustment • Center of the new boundary: $x_N = x_p + 0.5\,w_N$, $y_N = y_p + 0.5\,L$ (figure labels: width, length)
  • 50. Spatial Distribution of Skin Color (A) • Skin color detection by GMM → morphology → region segment → spatial distribution of skin color
  • 51. Corners of Silhouette (C) • Human segmentation • Human silhouette capturing • The level curve curvature approach [Lindeberg, 1998]: $\tilde{I}(x, y) = \arg\max \left( D_x^{2} D_{yy} + D_y^{2} D_{xx} - 2 D_x D_y D_{xy} \right)$ • Adaptive corner choice
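The curvature measure above can be sketched with finite differences in NumPy. This sketch assumes $D_x$, $D_y$, $D_{xx}$, $D_{yy}$ and $D_{xy}$ are plain image derivatives estimated by `np.gradient`; any smoothing or scale selection used in the original work is omitted, and the function name is hypothetical.

```python
import numpy as np


def level_curve_curvature(image):
    """Per-pixel response Dx^2 * Dyy + Dy^2 * Dxx - 2 * Dx * Dy * Dxy."""
    img = np.asarray(image, dtype=float)
    d_y, d_x = np.gradient(img)      # derivatives along rows (y) and columns (x)
    d_yy, _ = np.gradient(d_y)       # second derivative in y
    d_xy, d_xx = np.gradient(d_x)    # mixed and second derivative in x
    return d_x**2 * d_yy + d_y**2 * d_xx - 2.0 * d_x * d_y * d_xy


# Check on a quadratic surface, where the response is known in closed form.
ys, xs = np.mgrid[0:9, 0:9]
response = level_curve_curvature(xs**2 + ys**2)
```

Corner candidates would then be taken at the strongest responses along the silhouette; how many to keep is the "adaptive corner choice" step, which is not specified on the slide.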
  • 52. 7. Experimental Results • Experimental environment – CPU: 1.86 GHz, RAM: 1 GB, VC 6.0 – HumanEva database I
  • 53. HumanEva Database I • Provider: – Department of Computer Science, Brown University • Actions of HumanEva I (action: description) – Walking: subjects walked in an elliptical path around the capture space. – Jog: subjects jogged in an elliptical path around the capture space. – Gesture: subjects performed "hello" and "good-bye" gestures in repetition. – Throw/Catch: subjects tossed and caught a baseball with the help of the lab assistant. – Box: subjects imitated boxing. – Combo: subjects performed combined actions of walking and jogging.
  • 54. Environment Setting • 7 cameras – 3 color cameras (C1, C2, C3) – 4 gray-level cameras (BW1, BW2, BW3, BW4) (figure: cameras placed around the capture space, marked 3 m and 2 m, with a control station)
  • 55. The Experimental Data • The proposed method was trained on 1900 images from the walking sequences of subjects 1 and 2, taken from camera C1 • 200 testing images: • 100 images from subject 1 • 100 images from subject 2 • Difficulties: – Self-occlusion – Clothing variation – Large variation of joint locations
  • 56. Evaluation of Accuracy • Average distance error of poses between estimated results and ground truth • Let $H = \{h_1, h_2, \ldots, h_M\}$, where $h_m \in \mathbb{R}^{3}$ (or $h_m \in \mathbb{R}^{2}$ for the 2D body model), be the position vector of the body pose in the world (or in the image, respectively) • $D(H, H^{*})$: the error of the estimated pose $H^{*}$ with respect to the ground truth pose $H$: $D(H, H^{*}) = \frac{1}{M} \sum_{m=1}^{M} \lVert h_m - h_m^{*} \rVert$ • Average error: $\xi = \frac{1}{NT} \sum_{n=1}^{N} \sum_{t=1}^{T} D(H_{t,n}, H_{t,n}^{*})$
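The two error measures above translate directly into NumPy. In this sketch the array layout (joints along the second-to-last axis, coordinates along the last) and the function names are assumptions made for illustration.

```python
import numpy as np


def pose_error(h_true, h_est):
    """D(H, H*): mean Euclidean distance over the M joints; inputs are (M, d)."""
    diff = np.asarray(h_true, dtype=float) - np.asarray(h_est, dtype=float)
    return float(np.linalg.norm(diff, axis=-1).mean())


def mean_pose_error(true_poses, est_poses):
    """Average of D over all poses; inputs have shape (N, T, M, d)."""
    diff = np.asarray(true_poses, dtype=float) - np.asarray(est_poses, dtype=float)
    # Every pose has the same M, so one global mean equals the double sum over n, t.
    return float(np.linalg.norm(diff, axis=-1).mean())


# Hypothetical example: 15 joints, each estimate offset by (3, 4, 0) from the truth.
truth = np.zeros((15, 3))
estimate = truth + np.array([3.0, 4.0, 0.0])
d = pose_error(truth, estimate)
xi = mean_pose_error(np.zeros((2, 3, 15, 3)), np.zeros((2, 3, 15, 3)) + [3.0, 4.0, 0.0])
```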
  • 57. Performance Comparison Between Two-stage and One-stage Methods • The AG sampler performs better than the Gibbs sampler • The two-stage approach performs better than the classical one-stage approach • The AG sampler takes less inference time
  • 58. Effect of Iteration Number on Accuracy
  • 59. 2D Results of Subject 1 (figure: ground truth (GT) and annealed Gibbs sampling (AGs) results for frames 1122, 1149, 1172 and 1200)
  • 60. 2D Results of Subject 2 (figure: ground truth (GT) and annealed Gibbs sampling (AGs) results for frames 804, 835, 875 and 899)
  • 61. 3D Results • Frame 1110 of subject 1 (figure: ground truth and AGs estimation result as 3D plots)
  • 62. 3D Results (Cont.) • Frame 1135 of subject 1 (figure: ground truth and AGs estimation result as 3D plots)
  • 63. 3D Results (Cont.) • Frame 845 of subject 2 (figure: ground truth and AGs estimation result as 3D plots)
  • 64. 3D Results (Cont.) • Frame 872 of subject 2 (figure: ground truth and AGs estimation result as 3D plots)
  • 65. 8. Conclusions • A markerless, monocular motion capture problem is considered • The proposed two-stage annealed Gibbs sampling method estimates more accurate poses with less computation time than the Gibbs sampler and the one-stage approach • The method can overcome three challenges of the problem – Self-occlusion – High variation of joint locations – Clothing limitation
  • 66. Future Work • Use GMM to approximate the prior and posterior distributions of our human models • Combine model-free and model-based methods to obtain the benefits of both • Exploit HMM to infer human motions in time series • Add human part detectors to help locate human joints
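  • 68. License statement for this presentation • The content of this presentation is licensed under the Creative Commons Attribution-NonCommercial 3.0 Taiwan license • Reproduction, distribution, or modification of this presentation for non-commercial purposes is welcome, provided you indicate: (1) the original author's name: Yuan-Kai Wang (王元凱); (2) the license icon • Some of the figures used in this presentation were taken from the Internet and are used by the speaker only under a claim of fair use in talks promoting free software; readers may not reuse them unless they judge that their own use also qualifies as fair use, and they bear the related legal responsibility themselves.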