Part 2: Introduction to Graphical Models

Sebastian Nowozin and Christoph H. Lampert

Colorado Springs, 25th June 2011
Introduction

    Model: relating observations x to quantities of interest y
    Example 1: given an RGB image x, infer the depth y of each pixel
    Example 2: given an RGB image x, infer the presence and positions y of all objects shown

    [Figure: a mapping f : X → Y; here X is the space of images and Y the space of object annotations]
Introduction

    General case: mapping x ∈ X to y ∈ Y
    Graphical models are a concise language to define this mapping
    The mapping can be ambiguous: measurement noise, lack of well-posedness (e.g. occlusions)
    Probabilistic graphical models: define p(y|x) or p(x, y) for all y ∈ Y

    [Figure: a single input x ∈ X with several plausible outputs in Y, modeled by the conditional distribution p(Y | X = x)]
Graphical Models

    A graphical model defines
        a family of probability distributions over a set of random variables,
        by means of a graph,
        so that the random variables satisfy the conditional independence assumptions encoded in the graph.

    Popular classes of graphical models:
        Undirected graphical models (Markov random fields),
        Directed graphical models (Bayesian networks),
        Factor graphs,
        Others: chain graphs, influence diagrams, etc.
Bayesian Networks

    Graph: G = (V, E), E ⊂ V × V
        directed
        acyclic
    Variable domains Y_i
    Factorization over distributions, conditioning on parent nodes:

        p(Y = y) = ∏_{i ∈ V} p(y_i | y_{pa_G(i)})

    Example (a simple Bayes net with edges Y_i → Y_k, Y_j → Y_k, Y_k → Y_l):

        p(Y = y) = p(Y_l = y_l | Y_k = y_k) p(Y_k = y_k | Y_i = y_i, Y_j = y_j) p(Y_i = y_i) p(Y_j = y_j)

    This defines a family of distributions.
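To make the factorization concrete, here is a minimal Python sketch (not from the slides) that evaluates p(Y = y) for the example network above; all conditional probability table values are hypothetical, only the structure matters.

```python
# A minimal sketch (not from the slides): evaluating the Bayes net factorization
# above for binary variables. All CPT values below are hypothetical.
import itertools

p_i = {0: 0.6, 1: 0.4}                                  # p(Y_i)
p_j = {0: 0.7, 1: 0.3}                                  # p(Y_j)
p_k = {(yi, yj): {0: 0.9 - 0.2 * (yi + yj),             # p(Y_k | Y_i, Y_j)
                  1: 0.1 + 0.2 * (yi + yj)}
       for yi in (0, 1) for yj in (0, 1)}
p_l = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.25, 1: 0.75}}      # p(Y_l | Y_k)

def joint(yi, yj, yk, yl):
    """p(Y = y) as a product of per-variable conditionals on parent nodes."""
    return p_i[yi] * p_j[yj] * p_k[(yi, yj)][yk] * p_l[yk][yl]

# Sanity check: the factorization defines a proper distribution.
total = sum(joint(*y) for y in itertools.product((0, 1), repeat=4))
assert abs(total - 1.0) < 1e-12
```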
Undirected Graphical Models

    = Markov random field (MRF) = Markov network
    Graph: G = (V, E), E ⊂ V × V
        undirected, no self-edges
    Variable domains Y_i
    Factorization over potentials ψ at cliques:

        p(y) = (1/Z) ∏_{C ∈ C(G)} ψ_C(y_C)

    Normalizing constant Z = Σ_{y ∈ Y} ∏_{C ∈ C(G)} ψ_C(y_C)

    Example (a simple chain MRF Y_i - Y_j - Y_k):

        p(y) = (1/Z) ψ_i(y_i) ψ_j(y_j) ψ_k(y_k) ψ_{i,j}(y_i, y_j) ψ_{j,k}(y_j, y_k)
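A brute-force sketch of the chain example (the potential tables below are hypothetical): potentials are arbitrary positive tables, and Z is obtained by summing the unnormalized product over all joint states, which is feasible only for tiny models.

```python
# Minimal sketch (hypothetical potentials): the chain MRF Y_i - Y_j - Y_k with
# binary variables; Z sums the unnormalized product over all joint states.
import itertools

states = (0, 1)
psi_unary = {0: 1.0, 1: 2.0}                          # shared unary potential
psi_pair = {(a, b): 3.0 if a == b else 1.0            # pairwise potential
            for a in states for b in states}

def unnormalized(yi, yj, yk):
    return (psi_unary[yi] * psi_unary[yj] * psi_unary[yk]
            * psi_pair[(yi, yj)] * psi_pair[(yj, yk)])

Z = sum(unnormalized(*y) for y in itertools.product(states, repeat=3))

def p(yi, yj, yk):
    """p(y) = (1/Z) * product of clique potentials."""
    return unnormalized(yi, yj, yk) / Z
```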
Example 1

    [Figure: chain MRF Y_i - Y_j - Y_k]

    Cliques C(G): the set of vertex sets V′ ⊆ V with E ∩ (V′ × V′) = V′ × V′, i.e. fully connected subsets
    Here C(G) = {{i}, {i,j}, {j}, {j,k}, {k}}

        p(y) = (1/Z) ψ_i(y_i) ψ_j(y_j) ψ_k(y_k) ψ_{i,j}(y_i, y_j) ψ_{j,k}(y_j, y_k)
Example 2

    [Figure: fully connected MRF on Y_i, Y_j, Y_k, Y_l]

    Here C(G) = 2^V: all subsets of V are cliques

        p(y) = (1/Z) ∏_{A ∈ 2^{i,j,k,l}} ψ_A(y_A)
Factor Graphs

    Graph: G = (V, F, E), E ⊆ V × F
        variable nodes V,
        factor nodes F,
        edges E between variable and factor nodes,
        scope of a factor: N(F) = {i ∈ V : (i, F) ∈ E}
    Variable domains Y_i
    Factorization over potentials ψ at factors:

        p(y) = (1/Z) ∏_{F ∈ F} ψ_F(y_{N(F)})

    Normalizing constant Z = Σ_{y ∈ Y} ∏_{F ∈ F} ψ_F(y_{N(F)})

    [Figure: a factor graph over Y_i, Y_j, Y_k, Y_l]
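One way to code a factor graph directly (an assumed representation, not an API from the slides): each factor is a scope plus a potential function, and the distribution is the normalized product over factors. The scopes and potentials below are hypothetical.

```python
# Minimal sketch: a factor graph as (scope, potential) pairs; p(y) is the
# normalized product over factors. Scopes and potentials are hypothetical.
import itertools

domains = {"i": (0, 1), "j": (0, 1), "k": (0, 1), "l": (0, 1)}
factors = [                                    # N(F) given explicitly as a tuple
    (("i", "j"), lambda y: 2.0 if y[0] == y[1] else 1.0),
    (("i", "k"), lambda y: 1.0 + y[0] * y[1]),
    (("j", "l"), lambda y: 2.0 if y[0] == y[1] else 1.0),
    (("k", "l"), lambda y: 1.0 + y[0] * y[1]),
]

def product_of_factors(assignment):
    """prod_F psi_F(y_N(F)) for an assignment dict like {'i': 0, ...}."""
    result = 1.0
    for scope, psi in factors:
        result *= psi(tuple(assignment[v] for v in scope))
    return result

variables = sorted(domains)
Z = sum(product_of_factors(dict(zip(variables, vals)))
        for vals in itertools.product(*(domains[v] for v in variables)))

def p(assignment):
    return product_of_factors(assignment) / Z
```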
Why factor graphs?

    [Figure: three models over Y_i, Y_j, Y_k, Y_l with the same variables but different factor structures]

    Factor graphs are explicit about the factorization
    Hence, they are easier to work with
    Universal (just like MRFs and Bayesian networks)
Capacity

    [Figure: two factor graphs over Y_i, Y_j, Y_k, Y_l with different sets of factors]

    A factor graph defines a family of distributions
    Some families are larger than others
Four remaining pieces

    1. Conditional distributions (CRFs)
    2. Parameterization
    3. Test-time inference
    4. Learning the model from training data
Conditional Distributions

    We have discussed p(y); how do we define p(y|x)?
    Potentials become a function of x_{N(F)}
    The partition function depends on x
    Conditional random fields (CRFs)
    x is not part of the probability model, i.e. it is not treated as a random variable

        p(y) = (1/Z) ∏_{F ∈ F} ψ_F(y_{N(F)})

        p(y|x) = (1/Z(x)) ∏_{F ∈ F} ψ_F(y_{N(F)}; x_{N(F)})

    [Figure: a CRF with observed nodes X_i, X_j feeding factors on output nodes Y_i, Y_j]
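A small sketch of the change from p(y) to p(y|x): the potential takes the local observation as an argument, and the partition function is recomputed per input. The two-variable model and weights below are hypothetical.

```python
# Minimal sketch (hypothetical model): potentials become functions of x, so the
# partition function Z(x) must be recomputed for every observation.
import math

def psi(y_i, x_i, w=(0.5, -0.5)):
    """psi_F(y_i; x_i) = exp(w[y_i] * x_i): x parameterizes the potential."""
    return math.exp(w[y_i] * x_i)

def p_conditional(y, x):
    """p(y|x) for two unary factors; x is conditioned on, never summed over."""
    score = psi(y[0], x[0]) * psi(y[1], x[1])
    Z_x = sum(psi(a, x[0]) * psi(b, x[1]) for a in (0, 1) for b in (0, 1))
    return score / Z_x
```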
Potentials and Energy Functions

    For each factor F ∈ F, with Y_F = ×_{i ∈ N(F)} Y_i, define an energy function

        E_F : Y_{N(F)} → R

    Potentials and energies (assume ψ_F(y_F) > 0):

        ψ_F(y_F) = exp(−E_F(y_F)),   and   E_F(y_F) = −log(ψ_F(y_F))

    Then p(y) can be written as

        p(Y = y) = (1/Z) ∏_{F ∈ F} ψ_F(y_F)
                 = (1/Z) exp(−Σ_{F ∈ F} E_F(y_F))

    Hence, p(y) is completely determined by the energy E(y) = Σ_{F ∈ F} E_F(y_F)
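The correspondence is a one-liner in code; a sketch of the round trip under the slide's positivity assumption:

```python
# Minimal sketch: the potential/energy correspondence, psi_F = exp(-E_F) and
# E_F = -log(psi_F), valid under the assumption psi_F(y_F) > 0.
import math

def potential_from_energy(E_F):
    return math.exp(-E_F)

def energy_from_potential(psi_F):
    assert psi_F > 0.0, "energies are only defined for strictly positive potentials"
    return -math.log(psi_F)

# The two representations are equivalent (round trip is the identity).
assert abs(energy_from_potential(potential_from_energy(1.3)) - 1.3) < 1e-12
```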
Energy Minimization

        argmax_{y ∈ Y} p(Y = y) = argmax_{y ∈ Y} (1/Z) exp(−Σ_{F ∈ F} E_F(y_F))
                                = argmax_{y ∈ Y} exp(−Σ_{F ∈ F} E_F(y_F))
                                = argmax_{y ∈ Y} −Σ_{F ∈ F} E_F(y_F)
                                = argmin_{y ∈ Y} Σ_{F ∈ F} E_F(y_F)
                                = argmin_{y ∈ Y} E(y)

    Energy minimization can be interpreted as solving for the most likely state of a factor graph model
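For a model small enough to enumerate, the equivalence can be checked directly; a sketch with hypothetical energy terms on a three-variable chain:

```python
# Minimal sketch: brute-force energy minimization; the argmin of E(y) is the
# MAP state. The energy terms below are hypothetical.
import itertools

states = (0, 1)
energy_terms = [
    (("a",), lambda y: 0.5 * y[0]),                        # unary E_F
    (("a", "b"), lambda y: 0.0 if y[0] == y[1] else 1.0),  # pairwise E_F
    (("b", "c"), lambda y: 0.0 if y[0] == y[1] else 1.0),
]

def E(assignment):
    """E(y) = sum_F E_F(y_F)."""
    return sum(e(tuple(assignment[v] for v in scope))
               for scope, e in energy_terms)

variables = ("a", "b", "c")
y_map = min(itertools.product(states, repeat=len(variables)),
            key=lambda vals: E(dict(zip(variables, vals))))
print(y_map)  # (0, 0, 0): lowest energy, hence highest probability
```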
Parameterization

    Factor graphs define a family of distributions
    Parameterization: identifying individual members of the family by a parameter vector w

    [Figure: the space of parameters, indexed by w, maps into the family of distributions, e.g. w1 ↦ p_{w1}, w2 ↦ p_{w2}]
Example: Parameterization

    Image segmentation model
    Pairwise "Potts" energy function E_F(y_i, y_j; w1),

        E_F : {0,1} × {0,1} × R → R,

        E_F(0, 0; w1) = E_F(1, 1; w1) = 0
        E_F(0, 1; w1) = E_F(1, 0; w1) = w1
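The Potts energy above is one line of code; a sketch:

```python
# Minimal sketch: the pairwise Potts energy from the slide.
def E_potts(y_i, y_j, w1):
    """E_F(y_i, y_j; w1): 0 if the two labels agree, w1 otherwise."""
    return 0.0 if y_i == y_j else w1
```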
Example: Parameterization (cont)

    Image segmentation model
    Unary energy function E_F(y_i; x, w),

        E_F : {0,1} × X × R^({0,1}×D) → R,

        E_F(0; x, w) = ⟨w(0), ψ_F(x)⟩
        E_F(1; x, w) = ⟨w(1), ψ_F(x)⟩

    Features ψ_F : X → R^D, e.g. image filters

    [Figure: grid model with unary energies ⟨w(0), ψ_F(x)⟩, ⟨w(1), ψ_F(x)⟩ at each pixel and the pairwise Potts table (0, w1; w1, 0)]

    Total number of parameters: D + D + 1
    Parameters are shared, but energies differ because of different ψ_F(x)
    General form, linear in w:

        E_F(y_F; x_F, w) = ⟨w(y_F), ψ_F(x_F)⟩
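A sketch of the general linear form, with a hypothetical feature map of dimension D = 3:

```python
# Minimal sketch: the linear energy E_F(y_F; x_F, w) = <w(y_F), psi_F(x_F)>.
# The feature map and weight values below are hypothetical.
def E_linear(y_F, x_F, w, features):
    """w[y_F] is a weight vector in R^D; features(x_F) is in R^D."""
    return sum(wd * fd for wd, fd in zip(w[y_F], features(x_F)))

w = {0: (0.1, -0.2, 0.3), 1: (-0.1, 0.4, 0.0)}   # per-state weight vectors
features = lambda x: (1.0, x, x * x)             # psi_F: X -> R^3

# Shared parameters w, different energies because psi_F(x) differs per input.
print(E_linear(0, 2.0, w, features), E_linear(1, 2.0, w, features))
```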
Making Predictions

    Making predictions: given x ∈ X, predict y ∈ Y
    How to measure the quality of a prediction? (or of a function f : X → Y)
Loss function

    Define a loss function

        ∆ : Y × Y → R_+,

    so that ∆(y, y*) measures the loss incurred by predicting y when y* is true.
    The loss function is application dependent.
Test-time Inference

    Loss function ∆(y, f(x)): correct label y, prediction f(x)

        ∆ : Y × Y → R

    True joint distribution d(X, Y) and true conditional d(y|x)
    Model distribution p(y|x)
    Expected loss: quality of the prediction

        R_f^∆(x) = E_{y ∼ d(y|x)} [∆(y, f(x))]
                 = Σ_{y ∈ Y} d(y|x) ∆(y, f(x))
                 ≈ E_{y ∼ p(y|x;w)} [∆(y, f(x))]

    assuming that p(y|x; w) ≈ d(y|x)
Example 1: 0/1 loss

    Loss 0 iff perfectly predicted, 1 otherwise:

        ∆_{0/1}(y, y*) = I(y ≠ y*) = { 0 if y = y*, 1 otherwise }

    Plugging it in,

        y* := argmin_{y ∈ Y} E_{y′ ∼ p(y′|x)} [∆_{0/1}(y, y′)]
            = argmax_{y ∈ Y} p(y|x)
            = argmin_{y ∈ Y} E(y, x)

    Minimizing the expected 0/1 loss → MAP prediction (energy minimization)
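A tiny numeric sketch (with a hypothetical posterior) of why the expected 0/1 loss is minimized by the mode of p(y|x):

```python
# Minimal sketch: E_{y' ~ p}[I(y' != y)] = 1 - p(y), so minimizing the expected
# 0/1 loss is the same as taking the MAP state. The posterior p is hypothetical.
p = {"A": 0.5, "B": 0.3, "C": 0.2}       # posterior p(y|x) over three states

def expected_01_loss(y_hat):
    return 1.0 - p[y_hat]

y_star = min(p, key=expected_01_loss)
assert y_star == max(p, key=p.get)       # identical to the MAP prediction
```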
Example 2: Hamming loss

    Count the number of mislabeled variables:

        ∆_H(y, y*) = (1/|V|) Σ_{i ∈ V} I(y_i ≠ y_i*)

    Plugging it in,

        y* := argmin_{y ∈ Y} E_{y′ ∼ p(y′|x)} [∆_H(y, y′)]
            = ( argmax_{y_i ∈ Y_i} p(y_i|x) )_{i ∈ V}

    Minimizing the expected Hamming loss → maximum posterior marginal (MPM, Max-Marg) prediction
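A sketch with a hypothetical joint distribution, showing that the Hamming-optimal prediction is the per-variable argmax of the marginals, and that it can differ from the MAP state:

```python
# Minimal sketch (hypothetical joint): MPM prediction takes, for each variable,
# the argmax of its marginal. Note it can differ from the joint MAP state.
import itertools

states = (0, 1)
weights = {(0, 0, 0): 5, (1, 1, 1): 3, (1, 1, 0): 3, (1, 0, 1): 3, (0, 1, 1): 3}

def p_joint(y):
    return weights.get(y, 0) / 17.0      # weights sum to 17

def marginal(i, yi):
    return sum(p_joint(y) for y in itertools.product(states, repeat=3)
               if y[i] == yi)

y_mpm = tuple(max(states, key=lambda s: marginal(i, s)) for i in range(3))
y_map = max(itertools.product(states, repeat=3), key=p_joint)
print(y_mpm, y_map)   # (1, 1, 1) vs. (0, 0, 0): MPM and MAP disagree here
```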
Example 3: Squared error

    Assume a vector space on Y_i (pixel intensities, optical flow vectors, etc.).
    Sum of squared errors:

        ∆_Q(y, y*) = (1/|V|) Σ_{i ∈ V} ‖y_i − y_i*‖²

    Plugging it in,

        y* := argmin_{y ∈ Y} E_{y′ ∼ p(y′|x)} [∆_Q(y, y′)]
            = ( Σ_{y_i ∈ Y_i} p(y_i|x) y_i )_{i ∈ V}

    Minimizing the expected squared error → minimum mean squared error (MMSE) prediction
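A one-variable sketch (with a hypothetical marginal): the optimal prediction is the posterior mean, which need not be a valid state of Y_i:

```python
# Minimal sketch: under squared error, the optimal prediction per variable is
# the posterior mean sum_{y_i} p(y_i|x) y_i. The marginal below is hypothetical.
p_marginal = {0.0: 0.2, 1.0: 0.5, 2.0: 0.3}   # p(y_i | x)

y_mmse = sum(prob * y for y, prob in p_marginal.items())
print(y_mmse)  # 1.1 -- the mean need not be an element of Y_i, unlike MAP/MPM
```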
Inference Task: Maximum A Posteriori (MAP) Inference

    Definition (Maximum A Posteriori (MAP) Inference)
    Given a factor graph, a parameterization, a weight vector w, and an observation x, find

        y* = argmax_{y ∈ Y} p(Y = y | x, w) = argmin_{y ∈ Y} E(y; x, w).
Inference Task: Probabilistic Inference

    Definition (Probabilistic Inference)
    Given a factor graph, a parameterization, a weight vector w, and an observation x, find

        log Z(x, w) = log Σ_{y ∈ Y} exp(−E(y; x, w)),
        µ_F(y_F) = p(Y_F = y_F | x, w),   ∀F ∈ F, ∀y_F ∈ Y_F.

    This typically includes the variable marginals

        µ_i(y_i) = p(y_i | x, w)
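A brute-force sketch of both quantities for a tiny model (the energy function below is hypothetical); real models need dedicated inference algorithms rather than enumeration:

```python
# Minimal sketch (hypothetical energy): log Z and variable marginals by
# enumeration, feasible only for tiny state spaces.
import itertools
import math

states = (0, 1)

def E(y):
    return 0.5 * sum(y) + (0.0 if y[0] == y[1] else 1.0)

configs = list(itertools.product(states, repeat=3))
log_Z = math.log(sum(math.exp(-E(y)) for y in configs))

def mu_i(i, yi):
    """Variable marginal mu_i(y_i) under p(y) = exp(-E(y)) / Z."""
    return sum(math.exp(-E(y) - log_Z) for y in configs if y[i] == yi)

assert abs(mu_i(0, 0) + mu_i(0, 1) - 1.0) < 1e-12
```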
Example: Man-made structure detection

    [Figure: input image x; ground truth labeling; factor graph with observation X_i, unary factor ψ¹_i, observation factor ψ²_i, and pairwise factor ψ³_{i,k} linking Y_i and Y_k]

    Left: input image x
    Middle: ground truth labeling on 16-by-16 pixel blocks
    Right: factor graph model

    Features: gradient and color histograms
    Model parameters estimated from ≈ 60 training images
Example: Man-made structure detection

    Left: input image x
    Middle (probabilistic inference): visualization of the variable marginals p(y_i = "man-made" | x, w)
    Right (MAP inference): joint MAP labeling y* = argmax_{y ∈ Y} p(y | x, w)
Training the Model

    What can be learned?
        Model structure: factors
        Model variables: observed variables are fixed, but we can add unobserved variables
        Factor energies: parameters
Training: Overview

    Assume a fully observed, independent and identically distributed (iid) sample set

        {(x^n, y^n)}_{n=1,...,N},   (x^n, y^n) ∼ d(X, Y)

    Goal: predict well
    Alternative goal: first model d(y|x) well by p(y|x, w), then predict by minimizing the expected loss
Probabilistic Learning

    Problem (Probabilistic Parameter Learning)
    Let d(y|x) be the (unknown) conditional distribution of labels for a problem to be solved. For a parameterized conditional distribution p(y|x, w) with parameters w ∈ R^D, probabilistic parameter learning is the task of finding a point estimate of the parameter w* that makes p(y|x, w*) closest to d(y|x).

    We will discuss probabilistic parameter learning in detail.
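One standard way to make "closest to d(y|x)" concrete (an assumption here, not stated on the slide) is to measure closeness by the conditional log-likelihood of the sample and minimize its negative over w:

```python
# Minimal sketch: negative conditional log-likelihood of an iid sample, a common
# surrogate for "p(y|x, w) closest to d(y|x)". The interface is hypothetical.
import math

def nll(w, data, p_conditional):
    """data: list of (x, y) pairs; p_conditional(y, x, w) returns p(y | x, w)."""
    return -sum(math.log(p_conditional(y, x, w)) for x, y in data)

# w* would then be obtained by minimizing nll over w, e.g. by gradient descent.
```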
Loss-Minimizing Parameter Learning

    Problem (Loss-Minimizing Parameter Learning)
    Let d(x, y) be the unknown distribution of data and labels, and let ∆ : Y × Y → R be a loss function. Loss-minimizing parameter learning is the task of finding a parameter value w* such that the expected prediction risk

        E_{(x,y) ∼ d(x,y)} [∆(y, f_p(x))]

    is as small as possible, where f_p(x) = argmax_{y ∈ Y} p(y|x, w*).

    Requires a loss function at training time
    Directly learns a prediction function f_p(x)

Maximum likelihood estimation of regularisation parameters in inverse problem...
 
Basics of probability in statistical simulation and stochastic programming
Basics of probability in statistical simulation and stochastic programmingBasics of probability in statistical simulation and stochastic programming
Basics of probability in statistical simulation and stochastic programming
 
An introduction to quantum stochastic calculus
An introduction to quantum stochastic calculusAn introduction to quantum stochastic calculus
An introduction to quantum stochastic calculus
 
Tuto part2
Tuto part2Tuto part2
Tuto part2
 
Numerical solution of boundary value problems by piecewise analysis method
Numerical solution of boundary value problems by piecewise analysis methodNumerical solution of boundary value problems by piecewise analysis method
Numerical solution of boundary value problems by piecewise analysis method
 

Mehr von zukun

My lyn tutorial 2009
My lyn tutorial 2009My lyn tutorial 2009
My lyn tutorial 2009zukun
 
ETHZ CV2012: Tutorial openCV
ETHZ CV2012: Tutorial openCVETHZ CV2012: Tutorial openCV
ETHZ CV2012: Tutorial openCVzukun
 
ETHZ CV2012: Information
ETHZ CV2012: InformationETHZ CV2012: Information
ETHZ CV2012: Informationzukun
 
Siwei lyu: natural image statistics
Siwei lyu: natural image statisticsSiwei lyu: natural image statistics
Siwei lyu: natural image statisticszukun
 
Lecture9 camera calibration
Lecture9 camera calibrationLecture9 camera calibration
Lecture9 camera calibrationzukun
 
Brunelli 2008: template matching techniques in computer vision
Brunelli 2008: template matching techniques in computer visionBrunelli 2008: template matching techniques in computer vision
Brunelli 2008: template matching techniques in computer visionzukun
 
Modern features-part-4-evaluation
Modern features-part-4-evaluationModern features-part-4-evaluation
Modern features-part-4-evaluationzukun
 
Modern features-part-3-software
Modern features-part-3-softwareModern features-part-3-software
Modern features-part-3-softwarezukun
 
Modern features-part-2-descriptors
Modern features-part-2-descriptorsModern features-part-2-descriptors
Modern features-part-2-descriptorszukun
 
Modern features-part-1-detectors
Modern features-part-1-detectorsModern features-part-1-detectors
Modern features-part-1-detectorszukun
 
Modern features-part-0-intro
Modern features-part-0-introModern features-part-0-intro
Modern features-part-0-introzukun
 
Lecture 02 internet video search
Lecture 02 internet video searchLecture 02 internet video search
Lecture 02 internet video searchzukun
 
Lecture 01 internet video search
Lecture 01 internet video searchLecture 01 internet video search
Lecture 01 internet video searchzukun
 
Lecture 03 internet video search
Lecture 03 internet video searchLecture 03 internet video search
Lecture 03 internet video searchzukun
 
Icml2012 tutorial representation_learning
Icml2012 tutorial representation_learningIcml2012 tutorial representation_learning
Icml2012 tutorial representation_learningzukun
 
Advances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer visionAdvances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer visionzukun
 
Gephi tutorial: quick start
Gephi tutorial: quick startGephi tutorial: quick start
Gephi tutorial: quick startzukun
 
EM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysisEM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysiszukun
 
Object recognition with pictorial structures
Object recognition with pictorial structuresObject recognition with pictorial structures
Object recognition with pictorial structureszukun
 
Iccv2011 learning spatiotemporal graphs of human activities
Iccv2011 learning spatiotemporal graphs of human activities Iccv2011 learning spatiotemporal graphs of human activities
Iccv2011 learning spatiotemporal graphs of human activities zukun
 

Mehr von zukun (20)

My lyn tutorial 2009
My lyn tutorial 2009My lyn tutorial 2009
My lyn tutorial 2009
 
ETHZ CV2012: Tutorial openCV
ETHZ CV2012: Tutorial openCVETHZ CV2012: Tutorial openCV
ETHZ CV2012: Tutorial openCV
 
ETHZ CV2012: Information
ETHZ CV2012: InformationETHZ CV2012: Information
ETHZ CV2012: Information
 
Siwei lyu: natural image statistics
Siwei lyu: natural image statisticsSiwei lyu: natural image statistics
Siwei lyu: natural image statistics
 
Lecture9 camera calibration
Lecture9 camera calibrationLecture9 camera calibration
Lecture9 camera calibration
 
Brunelli 2008: template matching techniques in computer vision
Brunelli 2008: template matching techniques in computer visionBrunelli 2008: template matching techniques in computer vision
Brunelli 2008: template matching techniques in computer vision
 
Modern features-part-4-evaluation
Modern features-part-4-evaluationModern features-part-4-evaluation
Modern features-part-4-evaluation
 
Modern features-part-3-software
Modern features-part-3-softwareModern features-part-3-software
Modern features-part-3-software
 
Modern features-part-2-descriptors
Modern features-part-2-descriptorsModern features-part-2-descriptors
Modern features-part-2-descriptors
 
Modern features-part-1-detectors
Modern features-part-1-detectorsModern features-part-1-detectors
Modern features-part-1-detectors
 
Modern features-part-0-intro
Modern features-part-0-introModern features-part-0-intro
Modern features-part-0-intro
 
Lecture 02 internet video search
Lecture 02 internet video searchLecture 02 internet video search
Lecture 02 internet video search
 
Lecture 01 internet video search
Lecture 01 internet video searchLecture 01 internet video search
Lecture 01 internet video search
 
Lecture 03 internet video search
Lecture 03 internet video searchLecture 03 internet video search
Lecture 03 internet video search
 
Icml2012 tutorial representation_learning
Icml2012 tutorial representation_learningIcml2012 tutorial representation_learning
Icml2012 tutorial representation_learning
 
Advances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer visionAdvances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer vision
 
Gephi tutorial: quick start
Gephi tutorial: quick startGephi tutorial: quick start
Gephi tutorial: quick start
 
EM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysisEM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysis
 
Object recognition with pictorial structures
Object recognition with pictorial structuresObject recognition with pictorial structures
Object recognition with pictorial structures
 
Iccv2011 learning spatiotemporal graphs of human activities
Iccv2011 learning spatiotemporal graphs of human activities Iccv2011 learning spatiotemporal graphs of human activities
Iccv2011 learning spatiotemporal graphs of human activities
 

Kürzlich hochgeladen

Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 

Kürzlich hochgeladen (20)

Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 

01 graphical models

  • 6–7. Graphical Models
    A graphical model defines a family of probability distributions over a set of random variables, by means of a graph, so that the random variables satisfy conditional independence assumptions encoded in the graph.
    Popular classes of graphical models:
    - Undirected graphical models (Markov random fields)
    - Directed graphical models (Bayesian networks)
    - Factor graphs
    - Others: chain graphs, influence diagrams, etc.
  • 8–9. Bayesian Networks
    - Graph: G = (V, E), E ⊂ V × V, directed and acyclic
    - Variable domains Y_i
    - Factorization over distributions, by conditioning on parent nodes:
        p(Y = y) = ∏_{i∈V} p(y_i | y_{pa_G(i)})
    - Figure: a simple Bayes net with edges Y_i → Y_k, Y_j → Y_k, Y_k → Y_l
    - Example:
        p(Y = y) = p(Y_l = y_l | Y_k = y_k) p(Y_k = y_k | Y_i = y_i, Y_j = y_j) p(Y_i = y_i) p(Y_j = y_j)
    - A Bayesian network defines a family of distributions
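    To make the factorization concrete, here is a minimal Python sketch (not part of the tutorial) that evaluates the example Bayes net above; the conditional probability table values are hypothetical. It checks that the product of conditionals sums to one over all joint states.

        import itertools

        p_i = {0: 0.6, 1: 0.4}                   # p(Y_i)
        p_j = {0: 0.7, 1: 0.3}                   # p(Y_j)
        p_k = {(0, 0): {0: 0.9, 1: 0.1},         # p(Y_k | Y_i, Y_j)
               (0, 1): {0: 0.5, 1: 0.5},
               (1, 0): {0: 0.4, 1: 0.6},
               (1, 1): {0: 0.2, 1: 0.8}}
        p_l = {0: {0: 0.8, 1: 0.2},              # p(Y_l | Y_k)
               1: {0: 0.3, 1: 0.7}}

        def joint(yi, yj, yk, yl):
            # p(y) = p(y_l|y_k) p(y_k|y_i,y_j) p(y_i) p(y_j)
            return p_i[yi] * p_j[yj] * p_k[(yi, yj)][yk] * p_l[yk][yl]

        # The factorization defines a valid distribution: it sums to one.
        total = sum(joint(*y) for y in itertools.product([0, 1], repeat=4))
        assert abs(total - 1.0) < 1e-12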
  • 10–11. Undirected Graphical Models
    - = Markov random field (MRF) = Markov network
    - Graph: G = (V, E), E ⊂ V × V, undirected, no self-edges
    - Variable domains Y_i
    - Factorization over potentials ψ at cliques:
        p(y) = (1/Z) ∏_{C∈C(G)} ψ_C(y_C)
    - Normalization constant: Z = ∑_{y∈Y} ∏_{C∈C(G)} ψ_C(y_C)
    - Figure: a simple MRF, the chain Y_i — Y_j — Y_k
    - Example (for the chain):
        p(y) = (1/Z) ψ_i(y_i) ψ_j(y_j) ψ_k(y_k) ψ_{i,j}(y_i, y_j) ψ_{j,k}(y_j, y_k)
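    A small sketch of the chain MRF above with hypothetical positive potential values: it computes the normalization constant Z by brute-force enumeration, which is only feasible because the label space Y is tiny here.

        import itertools

        psi_i = {0: 1.0, 1: 2.0}            # psi_i(y_i)
        psi_j = {0: 1.5, 1: 0.5}            # psi_j(y_j)
        psi_k = {0: 1.0, 1: 1.0}            # psi_k(y_k)
        psi_ij = {(0, 0): 2.0, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 2.0}
        psi_jk = {(0, 0): 2.0, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 2.0}

        def unnormalized(yi, yj, yk):
            # product of clique potentials for the chain i - j - k
            return (psi_i[yi] * psi_j[yj] * psi_k[yk]
                    * psi_ij[(yi, yj)] * psi_jk[(yj, yk)])

        # Z = sum over all joint states of the product of potentials
        Z = sum(unnormalized(*y) for y in itertools.product([0, 1], repeat=3))

        def p(yi, yj, yk):
            return unnormalized(yi, yj, yk) / Z

        print(Z, p(0, 0, 0))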
  • 12. Example 1
    - Figure: the chain Y_i — Y_j — Y_k
    - Cliques C(G): the set of vertex sets V′ ⊆ V with E ∩ (V′ × V′) = V′ × V′, i.e. every pair of distinct vertices in V′ is connected
    - Here C(G) = {{i}, {i,j}, {j}, {j,k}, {k}}
    - p(y) = (1/Z) ψ_i(y_i) ψ_j(y_j) ψ_k(y_k) ψ_{i,j}(y_i, y_j) ψ_{j,k}(y_j, y_k)
  • 13. Example 2
    - Figure: fully connected graph over Y_i, Y_j, Y_k, Y_l
    - Here C(G) = 2^V: all subsets of V are cliques
    - p(y) = (1/Z) ∏_{A∈2^{{i,j,k,l}}} ψ_A(y_A)
  • 14–15. Factor Graphs
    - Graph: G = (V, F, E), E ⊆ V × F; variable nodes V, factor nodes F, edges E between variable and factor nodes
    - Scope of a factor: N(F) = {i ∈ V : (i, F) ∈ E}
    - Variable domains Y_i
    - Factorization over potentials ψ at factors:
        p(y) = (1/Z) ∏_{F∈F} ψ_F(y_{N(F)})
    - Normalization constant: Z = ∑_{y∈Y} ∏_{F∈F} ψ_F(y_{N(F)})
    - Figure: a factor graph over Y_i, Y_j, Y_k, Y_l
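    One possible way to represent a factor graph in code, as a sketch with hypothetical scopes and potentials: each factor is a (scope, potential) pair, and p(y) is the normalized product over all factors.

        import itertools

        domains = {"i": [0, 1], "j": [0, 1], "k": [0, 1], "l": [0, 1]}

        def potts(a, b):                   # hypothetical pairwise potential
            return 2.0 if a == b else 0.5

        # each factor: (scope N(F), potential psi_F)
        factors = [(("i", "j"), potts), (("j", "k"), potts), (("k", "l"), potts)]

        def unnormalized(y):               # y maps variable name -> state
            prod = 1.0
            for scope, psi in factors:
                prod *= psi(*(y[v] for v in scope))
            return prod

        names = list(domains)
        Z = sum(unnormalized(dict(zip(names, states)))
                for states in itertools.product(*(domains[v] for v in names)))

        def p(y):
            return unnormalized(y) / Z

        print(Z, p({"i": 0, "j": 0, "k": 0, "l": 0}))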
  • 16. Why factor graphs?
    - Figure: three graphs over the variables Y_i, Y_j, Y_k, Y_l
    - Factor graphs are explicit about the factorization
    - Hence, they are easier to work with
    - Universal (just like MRFs and Bayesian networks)
  • 17. Capacity
    - Figure: two factor graphs over Y_i, Y_j, Y_k, Y_l
    - A factor graph defines a family of distributions
    - Some families are larger than others
  • 18–19. Four remaining pieces
    1. Conditional distributions (CRFs)
    2. Parameterization
    3. Test-time inference
    4. Learning the model from training data
  • 20–22. Conditional Distributions
    - We have discussed p(y); how do we define p(y|x)?
    - Potentials become a function of x_{N(F)}
    - The partition function depends on x
    - Conditional random fields (CRFs): x is not part of the probability model, i.e. it is not treated as a random variable; the model is a conditional distribution
    - Figure: factor graph with observed nodes X_i, X_j attached to Y_i, Y_j
    - Formulas:
        p(y) = (1/Z) ∏_{F∈F} ψ_F(y_{N(F)})
        p(y|x) = (1/Z(x)) ∏_{F∈F} ψ_F(y_{N(F)}; x_{N(F)})
  • 23–25. Potentials and Energy Functions
    - For each factor F ∈ F: Y_F = ×_{i∈N(F)} Y_i and E_F : Y_{N(F)} → R
    - Potentials and energies (assume ψ_F(y_F) > 0):
        ψ_F(y_F) = exp(−E_F(y_F)) and E_F(y_F) = −log(ψ_F(y_F))
    - Then p(y) can be written as
        p(Y = y) = (1/Z) ∏_{F∈F} ψ_F(y_F) = (1/Z) exp(−∑_{F∈F} E_F(y_F))
    - Hence, p(y) is completely determined by the energy E(y) = ∑_{F∈F} E_F(y_F)
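    The potential/energy correspondence is a one-line transformation in code; a sketch with hypothetical values for a single pairwise factor, assuming ψ_F(y_F) > 0 as on the slide:

        import math

        psi_F = {(0, 0): 2.0, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 2.0}
        E_F = {yF: -math.log(v) for yF, v in psi_F.items()}      # E_F = -log psi_F
        psi_back = {yF: math.exp(-e) for yF, e in E_F.items()}   # psi_F = exp(-E_F)
        assert all(abs(psi_back[yF] - psi_F[yF]) < 1e-12 for yF in psi_F)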
  • 26. Energy Minimization
        argmax_{y∈Y} p(Y = y) = argmax_{y∈Y} (1/Z) exp(−∑_{F∈F} E_F(y_F))
                              = argmax_{y∈Y} exp(−∑_{F∈F} E_F(y_F))
                              = argmax_{y∈Y} −∑_{F∈F} E_F(y_F)
                              = argmin_{y∈Y} ∑_{F∈F} E_F(y_F)
                              = argmin_{y∈Y} E(y)
    - Energy minimization can be interpreted as solving for the most likely state of some factor graph model
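    A brute-force sketch verifying the chain of equalities above on a tiny hypothetical model: the maximizer of p(y) is exactly the minimizer of E(y). Enumeration only works because the state space is tiny.

        import itertools, math

        def E(y):                               # E(y) = sum of pairwise energies
            pair = lambda a, b: 0.0 if a == b else 1.0
            return pair(y[0], y[1]) + pair(y[1], y[2])

        states = list(itertools.product([0, 1], repeat=3))
        Z = sum(math.exp(-E(y)) for y in states)
        p = {y: math.exp(-E(y)) / Z for y in states}

        # argmin_y E(y) coincides with argmax_y p(y)
        assert min(states, key=E) == max(states, key=lambda y: p[y])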
  • 27–28. Parameterization
    - Factor graphs define a family of distributions
    - Parameterization: identifying individual members of the family by parameters w
    - Figure: the set of distributions in the family, with individual members p_{w1}, p_{w2} indexed by w
  • 29. Example: Parameterization
    - Image segmentation model (figure)
    - Pairwise “Potts” energy function E_F(y_i, y_j; w_1), with E_F : {0,1} × {0,1} × R → R:
        E_F(0, 0; w_1) = E_F(1, 1; w_1) = 0
        E_F(0, 1; w_1) = E_F(1, 0; w_1) = w_1
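    The Potts energy as a small Python sketch; w1 is the single pairwise parameter and, for w1 > 0, disagreeing neighbouring labels are penalized:

        def potts_energy(yi: int, yj: int, w1: float) -> float:
            # E_F(y_i, y_j; w1) = 0 if labels agree, w1 otherwise
            return 0.0 if yi == yj else w1

        assert potts_energy(0, 0, 2.5) == 0.0 and potts_energy(0, 1, 2.5) == 2.5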
  • 30. Example: Parameterization (cont)
    - Image segmentation model (figure)
    - Unary energy function E_F(y_i; x, w), with E_F : {0,1} × X × R^{{0,1}×D} → R:
        E_F(0; x, w) = ⟨w(0), ψ_F(x)⟩
        E_F(1; x, w) = ⟨w(1), ψ_F(x)⟩
    - Features ψ_F : X → R^D, e.g. image filters
  • 31–32. Example: Parameterization (cont)
    - Figure: grid model with unary energies ⟨w(0), ψ_F(x)⟩ and ⟨w(1), ψ_F(x)⟩ at each node, and the pairwise Potts energy table:
              y_j = 0   y_j = 1
        y_i = 0    0       w_1
        y_i = 1   w_1       0
    - Total number of parameters: D + D + 1
    - Parameters are shared across factors, but energies differ because of the different ψ_F(x)
    - General form, linear in w: E_F(y_F; x_F, w) = ⟨w(y_F), ψ_F(x_F)⟩
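    A sketch of the general linear form for a unary factor, with hypothetical D = 3 features and weight vectors; the same w is shared by all unary factors, while ψ_F(x) varies per factor:

        import numpy as np

        D = 3
        w = {0: np.array([0.2, -0.1, 0.5]),    # w(0): one weight vector per state
             1: np.array([-0.3, 0.4, 0.1])}    # w(1)

        def unary_energy(y: int, feat: np.ndarray) -> float:
            # E_F(y; x, w) = <w(y), psi_F(x)>
            return float(w[y] @ feat)

        feat = np.array([1.0, 0.5, -2.0])      # psi_F(x), e.g. filter responses
        print(unary_energy(0, feat), unary_energy(1, feat))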
  • 33. Making Predictions
    - Making predictions: given x ∈ X, predict y ∈ Y (or a function f : X → Y)
    - How to measure the quality of a prediction?
  • 34. Loss function
    - Define a loss function Δ : Y × Y → R_+, so that Δ(y, y*) measures the loss incurred by predicting y when y* is true
    - The loss function is application dependent
  • 35–37. Test-time Inference
    - Loss function Δ : Y × Y → R, where Δ(y, f(x)) compares the correct label y with the prediction f(x)
    - True joint distribution d(X, Y) and true conditional d(y|x); model distribution p(y|x)
    - Expected loss measures the quality of a prediction:
        R_f^Δ(x) = E_{y∼d(y|x)}[Δ(y, f(x))] = ∑_{y∈Y} d(y|x) Δ(y, f(x)) ≈ E_{y∼p(y|x;w)}[Δ(y, f(x))]
    - The approximation assumes that p(y|x; w) ≈ d(y|x)
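    The expected loss can be computed exactly by enumeration when Y is small; a sketch with a hypothetical model distribution over two binary variables and the Hamming loss as an example:

        import itertools

        states = list(itertools.product([0, 1], repeat=2))
        p = {(0, 0): 0.5, (0, 1): 0.2, (1, 0): 0.2, (1, 1): 0.1}  # p(y|x), sums to 1

        def expected_loss(prediction, loss):
            # E_{y ~ p(y|x)} [ loss(y, prediction) ]
            return sum(p[y] * loss(y, prediction) for y in states)

        hamming = lambda y, f: sum(a != b for a, b in zip(y, f)) / len(y)
        print(expected_loss((0, 0), hamming))   # 0.3 for these numbers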
  • 38–39. Example 1: 0/1 loss
    - Loss 0 iff perfectly predicted, 1 otherwise:
        Δ_{0/1}(y, y*) = I(y ≠ y*) = { 0 if y = y*; 1 otherwise }
    - Plugging it in:
        y* := argmin_{y∈Y} E_{y′∼p(y′|x)}[Δ_{0/1}(y, y′)] = argmax_{y∈Y} p(y|x) = argmin_{y∈Y} E(y, x)
    - Minimizing the expected 0/1 loss → MAP prediction (energy minimization)
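    A numerical sketch of the derivation above: on a hypothetical distribution, the state minimizing the expected 0/1 loss coincides with the MAP state.

        import itertools

        states = list(itertools.product([0, 1], repeat=2))
        p = {(0, 0): 0.5, (0, 1): 0.2, (1, 0): 0.2, (1, 1): 0.1}  # hypothetical p(y|x)

        # expected 0/1 risk of predicting f is 1 - p(f), so its minimizer is MAP
        risk01 = lambda f: sum(p[y] * (y != f) for y in states)
        assert min(states, key=risk01) == max(states, key=lambda y: p[y])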
  • 40–41. Example 2: Hamming loss
    - Count the fraction of mislabeled variables:
        Δ_H(y, y*) = (1/|V|) ∑_{i∈V} I(y_i ≠ y_i*)
    - Plugging it in, the optimal prediction decomposes over the variables:
        y_i* := argmax_{y_i∈Y_i} p(y_i|x), for all i ∈ V
    - Minimizing the expected Hamming loss → maximum posterior marginal (MPM, Max-Marg) prediction
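    A sketch computing the variable marginals by enumeration and taking the per-variable argmax (the MPM prediction), for the same hypothetical distribution:

        import itertools

        states = list(itertools.product([0, 1], repeat=2))
        p = {(0, 0): 0.5, (0, 1): 0.2, (1, 0): 0.2, (1, 1): 0.1}

        def marginal(i, v):                 # mu_i(v) = p(y_i = v | x)
            return sum(p[y] for y in states if y[i] == v)

        y_mpm = tuple(max([0, 1], key=lambda v: marginal(i, v)) for i in range(2))
        print(y_mpm)                        # (0, 0) for these numbers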
  • 42–43. Example 3: Squared error
    - Assume a vector space on Y_i (pixel intensities, optical flow vectors, etc.)
    - Sum of squared errors:
        Δ_Q(y, y*) = (1/|V|) ∑_{i∈V} ‖y_i − y_i*‖²
    - Plugging it in, the optimal prediction is the posterior mean per variable:
        y_i* := ∑_{y_i∈Y_i} p(y_i|x) y_i, for all i ∈ V
    - Minimizing the expected squared error → minimum mean squared error (MMSE) prediction
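    A sketch of the MMSE prediction as the per-variable posterior mean, again with hypothetical numbers and treating the binary states {0, 1} as real values:

        import itertools

        states = list(itertools.product([0, 1], repeat=2))
        p = {(0, 0): 0.5, (0, 1): 0.2, (1, 0): 0.2, (1, 1): 0.1}

        # posterior mean per variable: y_i* = sum_{y_i} p(y_i|x) * y_i
        y_mmse = tuple(sum(p[y] * y[i] for y in states) for i in range(2))
        print(y_mmse)                       # (0.3, 0.3) for these numbers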
  • 44. Inference Task: Maximum A Posteriori (MAP) Inference
    Definition (Maximum A Posteriori (MAP) Inference). Given a factor graph, a parameterization, a weight vector w, and an observation x, find
        y* = argmax_{y∈Y} p(Y = y|x, w) = argmin_{y∈Y} E(y; x, w)
  • 45. Inference Task: Probabilistic Inference
    Definition (Probabilistic Inference). Given a factor graph, a parameterization, a weight vector w, and an observation x, find
        log Z(x, w) = log ∑_{y∈Y} exp(−E(y; x, w)),
        µ_F(y_F) = p(Y_F = y_F | x, w), for all F ∈ F and all y_F ∈ Y_F
    - This typically includes the variable marginals µ_i(y_i) = p(y_i|x, w)
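    A brute-force sketch of probabilistic inference, computing log Z and one factor marginal µ_F by enumeration on a tiny hypothetical chain model; exact enumeration is exponential in |V|, so this only serves as a reference implementation.

        import itertools, math

        def E(y):                           # hypothetical chain energy
            pair = lambda a, b: 0.0 if a == b else 1.0
            return pair(y[0], y[1]) + pair(y[1], y[2])

        states = list(itertools.product([0, 1], repeat=3))
        logZ = math.log(sum(math.exp(-E(y)) for y in states))

        def mu_pair01(a, b):                # mu_F(y_F) for the factor F = {0, 1}
            return sum(math.exp(-E(y) - logZ) for y in states
                       if (y[0], y[1]) == (a, b))

        print(logZ, mu_pair01(0, 0))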
  • 46. Example: Man-made structure detection
    - Figure: factor graph with per-block observation X_i, unary factor ψ¹_i on Y_i, observation factor ψ²_i linking X_i and Y_i, and pairwise factor ψ³_{i,k} linking Y_i and Y_k
    - Left: input image x; middle: ground truth labeling on 16-by-16 pixel blocks; right: factor graph model
    - Features: gradient and color histograms
    - Estimate model parameters from ≈ 60 training images
  • 47. Example: Man-made structure detection (cont)
    - Left: input image x
    - Middle (probabilistic inference): visualization of the variable marginals p(y_i = “man-made” | x, w)
    - Right (MAP inference): joint MAP labeling y* = argmax_{y∈Y} p(y|x, w)
  • 48–49. Training the Model
    What can be learned?
    - Model structure: the factors
    - Model variables: observed variables are fixed, but we can add unobserved variables
    - Factor energies: the parameters
  • 50. Training: Overview
    - Assume a fully observed, independent and identically distributed (iid) sample set {(x^n, y^n)}_{n=1,…,N}, with (x^n, y^n) ∼ d(X, Y)
    - Goal: predict well
    - Alternative goal: first model d(y|x) well by p(y|x, w), then predict by minimizing the expected loss
  • 51–52. Probabilistic Learning
    Problem (Probabilistic Parameter Learning). Let d(y|x) be the (unknown) conditional distribution of labels for the problem to be solved. For a parameterized conditional distribution p(y|x, w) with parameters w ∈ R^D, probabilistic parameter learning is the task of finding a point estimate w* that makes p(y|x, w*) closest to d(y|x).
    - We will discuss probabilistic parameter learning in detail.
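    As one illustration of finding such a point estimate, a sketch (hypothetical data and model, not the tutorial's method) that fits the single Potts parameter of a tiny chain by maximizing the log-likelihood of the labels, with brute-force normalization and a simple grid search:

        import itertools, math

        def E(y, w):                        # Potts chain energy with parameter w
            pair = lambda a, b: 0.0 if a == b else w
            return pair(y[0], y[1]) + pair(y[1], y[2])

        states = list(itertools.product([0, 1], repeat=3))

        def loglik(data, w):                # sum_n log p(y^n | w)
            logZ = math.log(sum(math.exp(-E(y, w)) for y in states))
            return sum(-E(y, w) - logZ for y in data)

        data = [(0, 0, 0), (1, 1, 1), (0, 0, 1)]   # hypothetical training labels
        w_star = max((0.1 * k for k in range(50)), key=lambda w: loglik(data, w))
        print(w_star)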
  • 53–54. Loss-Minimizing Parameter Learning
    Problem (Loss-Minimizing Parameter Learning). Let d(x, y) be the unknown distribution of data and labels, and let Δ : Y × Y → R be a loss function. Loss-minimizing parameter learning is the task of finding a parameter value w* such that the expected prediction risk
        E_{(x,y)∼d(x,y)}[Δ(y, f_p(x))]
    is as small as possible, where f_p(x) = argmax_{y∈Y} p(y|x, w*).
    - Requires the loss function at training time
    - Directly learns a prediction function f_p(x)