"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
01 graphical models
1. Graphical Models Factor Graphs Test-time Inference Training
Part 2: Introduction to Graphical Models
Sebastian Nowozin and Christoph H. Lampert
Colorado Springs, 25th June 2011
Graphical Models

Introduction
A model relates observations x to quantities of interest y.
Example 1: given an RGB image x, infer the depth y of each pixel.
Example 2: given an RGB image x, infer the presence and positions y of all objects shown; here X is the space of images and Y the space of object annotations.
[Figure: the model as a mapping f : X → Y]

General case: map x ∈ X to y ∈ Y.
Graphical models are a concise language to define this mapping.
The mapping can be ambiguous: measurement noise, lack of well-posedness (e.g. occlusions).
Probabilistic graphical models therefore define the full distribution p(y|x) or p(x, y) for all y ∈ Y, rather than a single prediction f(x).
[Figure: an ambiguous mapping from X to Y, modeled as p(Y | X = x)]

Graphical Models
A graphical model defines a family of probability distributions over a set of random variables, by means of a graph, so that the random variables satisfy the conditional independence assumptions encoded in the graph.

Popular classes of graphical models:
Undirected graphical models (Markov random fields),
Directed graphical models (Bayesian networks),
Factor graphs,
Others: chain graphs, influence diagrams, etc.

Bayesian Networks
Graph: G = (V, E), E ⊂ V × V, directed and acyclic.
Variable domains Y_i.
Factorization over distributions, conditioning each variable on its parent nodes:
p(Y = y) = ∏_{i ∈ V} p(y_i | y_{pa_G(i)}).
This defines a family of distributions.
[Figure: a simple Bayes net in which Y_i and Y_j are parents of Y_k, and Y_k is the parent of Y_l]

Example:
p(Y = y) = p(Y_l = y_l | Y_k = y_k) p(Y_k = y_k | Y_i = y_i, Y_j = y_j) p(Y_i = y_i) p(Y_j = y_j).

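To make the factorization concrete, here is a minimal Python sketch (not from the slides; the conditional probability tables are made up) that evaluates p(Y = y) for the example Bayes net above and checks that the product of conditionals sums to one.

```python
# Hypothetical CPTs for binary Y_i, Y_j, Y_k, Y_l (values made up for illustration).
p_i = {0: 0.6, 1: 0.4}
p_j = {0: 0.7, 1: 0.3}
p_k_given_ij = {(0, 0): {0: 0.9, 1: 0.1}, (0, 1): {0: 0.5, 1: 0.5},
                (1, 0): {0: 0.4, 1: 0.6}, (1, 1): {0: 0.2, 1: 0.8}}
p_l_given_k = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}

def joint(y_i, y_j, y_k, y_l):
    """p(Y = y) as the product of conditionals along the DAG."""
    return (p_i[y_i] * p_j[y_j]
            * p_k_given_ij[(y_i, y_j)][y_k]
            * p_l_given_k[y_k][y_l])

# The Bayes-net factorization is automatically normalized: the joint sums to 1.
total = sum(joint(a, b, c, d)
            for a in (0, 1) for b in (0, 1) for c in (0, 1) for d in (0, 1))
assert abs(total - 1.0) < 1e-9
```
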
Undirected Graphical Models
= Markov random field (MRF) = Markov network.
Graph: G = (V, E), E ⊂ V × V, undirected, no self-edges.
Variable domains Y_i.
Factorization over potentials ψ at the cliques C(G):
p(y) = (1/Z) ∏_{C ∈ C(G)} ψ_C(y_C),
with normalizing constant Z = ∑_{y ∈ Y} ∏_{C ∈ C(G)} ψ_C(y_C).
[Figure: a simple MRF, the chain Y_i - Y_j - Y_k]

Example:
p(y) = (1/Z) ψ_i(y_i) ψ_j(y_j) ψ_k(y_k) ψ_{i,j}(y_i, y_j) ψ_{j,k}(y_j, y_k).

Example 1
[Figure: the chain MRF Y_i - Y_j - Y_k]
Cliques C(G): vertex sets V' ⊆ V that are fully connected, i.e. E ∩ (V' × V') = V' × V'.
Here C(G) = {{i}, {i, j}, {j}, {j, k}, {k}}, so
p(y) = (1/Z) ψ_i(y_i) ψ_j(y_j) ψ_k(y_k) ψ_{i,j}(y_i, y_j) ψ_{j,k}(y_j, y_k).

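A brute-force sketch of the chain MRF above (potential values are made up, not from the slides): multiply the clique potentials for each joint state and normalize by Z obtained through enumeration.

```python
import itertools
import numpy as np

# Made-up potentials for the chain MRF Y_i - Y_j - Y_k over binary variables.
psi_unary = {v: np.array([1.0, 2.0]) for v in "ijk"}           # psi_i, psi_j, psi_k
psi_pair = {("i", "j"): np.array([[3.0, 1.0], [1.0, 3.0]]),     # psi_{i,j}
            ("j", "k"): np.array([[2.0, 1.0], [1.0, 2.0]])}     # psi_{j,k}

def unnormalized(y):
    """Product of all clique potentials for a joint state y = {variable: value}."""
    val = 1.0
    for v, psi in psi_unary.items():
        val *= psi[y[v]]
    for (u, v), psi in psi_pair.items():
        val *= psi[y[u], y[v]]
    return val

# Partition function Z by enumeration (only feasible for tiny models).
states = [dict(zip("ijk", s)) for s in itertools.product([0, 1], repeat=3)]
Z = sum(unnormalized(y) for y in states)

def p(y):
    """Normalized probability p(y) = (1/Z) * prod_C psi_C(y_C)."""
    return unnormalized(y) / Z

print(p({"i": 0, "j": 0, "k": 0}))
```
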
Example 2
[Figure: a fully connected MRF over Y_i, Y_j, Y_k, Y_l]
Here C(G) = 2^V: all subsets of V are cliques, so
p(y) = (1/Z) ∏_{A ∈ 2^{{i,j,k,l}}} ψ_A(y_A).

Factor Graphs

Graph: G = (V, F, E), E ⊆ V × F, with
variable nodes V,
factor nodes F,
edges E between variable and factor nodes,
and the scope of a factor N(F) = {i ∈ V : (i, F) ∈ E}.
Variable domains Y_i.
Factorization over potentials ψ at the factors:
p(y) = (1/Z) ∏_{F ∈ F} ψ_F(y_{N(F)}),
with normalizing constant Z = ∑_{y ∈ Y} ∏_{F ∈ F} ψ_F(y_{N(F)}).
[Figure: a factor graph over Y_i, Y_j, Y_k, Y_l]

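A minimal factor-graph sketch in Python (factors, scopes and potential values are invented for illustration): each factor stores its scope N(F) and a potential function, and Z is computed by enumerating all assignments.

```python
import itertools

# Variable domains Y_i and a list of (scope, potential) pairs; all values made up.
domains = {"i": [0, 1], "j": [0, 1], "k": [0, 1], "l": [0, 1]}
factors = [
    (("i", "j"), lambda yi, yj: 2.0 if yi == yj else 1.0),
    (("j", "k"), lambda yj, yk: 2.0 if yj == yk else 1.0),
    (("k", "l"), lambda yk, yl: 2.0 if yk == yl else 1.0),
]

def unnormalized(y):
    """prod_F psi_F(y_{N(F)}) for a full assignment y = {variable: value}."""
    val = 1.0
    for scope, psi in factors:
        val *= psi(*(y[v] for v in scope))
    return val

variables = list(domains)
Z = sum(unnormalized(dict(zip(variables, s)))
        for s in itertools.product(*(domains[v] for v in variables)))
print("Z =", Z)
```
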
Why factor graphs?
[Figure: three models over the same variables Y_i, Y_j, Y_k, Y_l]
Factor graphs are explicit about the factorization, and hence easier to work with.
They are universal (just like MRFs and Bayesian networks).

Capacity
[Figure: two factor graphs over Y_i, Y_j, Y_k, Y_l with different factor structures]
A factor graph defines a family of distributions; some families are larger than others.

Four remaining pieces
1. Conditional distributions (CRFs)
2. Parameterization
3. Test-time inference
4. Learning the model from training data

Conditional Distributions
We have discussed p(y); how do we define p(y|x)?
The potentials become functions of x_{N(F)}, and the partition function depends on x.
Conditional random fields (CRFs): x is not part of the probability model, i.e. it is not treated as a random variable; we only model the conditional distribution.
[Figure: CRF factor graph with observed nodes X_i, X_j and output nodes Y_i, Y_j]

p(y) = (1/Z) ∏_{F ∈ F} ψ_F(y_{N(F)})
becomes
p(y|x) = (1/Z(x)) ∏_{F ∈ F} ψ_F(y_{N(F)}; x_{N(F)}).

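A hedged sketch of the CRF idea (model, features and weights are made up): the potentials now depend on the observation x, so the partition function Z(x) has to be recomputed for every input.

```python
import itertools
import numpy as np

def unary_psi(y_i, x_i, w):
    """psi_F(y_i; x_i) = exp(<w[y_i], x_i>) for a single output variable."""
    return np.exp(np.dot(w[y_i], x_i))

def conditional(y, x, w):
    """p(y | x) for a toy model with one unary factor per output variable."""
    def unnorm(assignment):
        return np.prod([unary_psi(assignment[i], x[i], w) for i in range(len(x))])
    # Z(x): sum over all joint output states, for this particular observation x.
    Z_x = sum(unnorm(s) for s in itertools.product([0, 1], repeat=len(x)))
    return unnorm(y) / Z_x

x = [np.array([0.2, 1.0]), np.array([1.5, 0.3])]         # observed feature vectors
w = {0: np.array([1.0, 0.0]), 1: np.array([0.0, 1.0])}   # per-label weights
print(conditional((0, 1), x, w))
```
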
Potentials and Energy Functions
For each factor F ∈ F, define Y_F = ×_{i ∈ N(F)} Y_i and an energy function E_F : Y_{N(F)} → R.
Potentials and energies are related by (assuming ψ_F(y_F) > 0)
ψ_F(y_F) = exp(−E_F(y_F))   and   E_F(y_F) = −log ψ_F(y_F).
Then p(y) can be written as
p(Y = y) = (1/Z) ∏_{F ∈ F} ψ_F(y_F) = (1/Z) exp(− ∑_{F ∈ F} E_F(y_F)).
Hence p(y) is completely determined by the energy E(y) = ∑_{F ∈ F} E_F(y_F).

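A tiny numerical check of the potential/energy correspondence (the potential table is made up): switching between ψ_F and E_F = −log ψ_F loses nothing as long as the potentials are strictly positive.

```python
import numpy as np

# Made-up positive potential table for a single pairwise factor.
psi_F = np.array([[4.0, 1.0],
                  [1.0, 4.0]])
E_F = -np.log(psi_F)                     # energies E_F = -log(psi_F)
assert np.allclose(np.exp(-E_F), psi_F)  # exp(-E_F) recovers the potentials
```
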
Energy Minimization
argmax_{y ∈ Y} p(Y = y) = argmax_{y ∈ Y} (1/Z) exp(− ∑_{F ∈ F} E_F(y_F))
                        = argmax_{y ∈ Y} exp(− ∑_{F ∈ F} E_F(y_F))
                        = argmax_{y ∈ Y} − ∑_{F ∈ F} E_F(y_F)
                        = argmin_{y ∈ Y} ∑_{F ∈ F} E_F(y_F)
                        = argmin_{y ∈ Y} E(y).
Energy minimization can be interpreted as solving for the most likely state of some factor graph model.

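Following the derivation above, a brute-force sketch (energies made up) that finds the MAP state by minimizing the total energy E(y) over all joint states of a tiny chain model.

```python
import itertools

# Made-up unary and pairwise energy tables for binary variables i, j, k.
E_unary = {"i": [0.0, 1.0], "j": [0.5, 0.0], "k": [0.2, 0.4]}
E_pair = {("i", "j"): [[0.0, 1.0], [1.0, 0.0]],
          ("j", "k"): [[0.0, 1.0], [1.0, 0.0]]}

def energy(y):
    """E(y) = sum of all factor energies for the joint state y."""
    e = sum(E_unary[v][y[v]] for v in E_unary)
    e += sum(tab[y[u]][y[v]] for (u, v), tab in E_pair.items())
    return e

# argmin_y E(y) over all 2^3 states = argmax_y p(y).
states = [dict(zip("ijk", s)) for s in itertools.product([0, 1], repeat=3)]
y_map = min(states, key=energy)
print(y_map, energy(y_map))
```
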
Parameterization
Factor graphs define a family of distributions.
Parameterization: identifying individual members of the family by parameters w.
[Figure: the family of distributions, with members p_{w1}, p_{w2} indexed by w]

Example: Parameterization
Image segmentation model.
Pairwise "Potts" energy function E_F(y_i, y_j; w_1),
E_F : {0, 1} × {0, 1} × R → R,
E_F(0, 0; w_1) = E_F(1, 1; w_1) = 0,
E_F(0, 1; w_1) = E_F(1, 0; w_1) = w_1.
[Figure: image segmentation model]

Example: Parameterization (cont)
Unary energy function E_F(y_i; x, w),
E_F : {0, 1} × X × R^{{0,1}×D} → R,
E_F(0; x, w) = ⟨w(0), ψ_F(x)⟩,
E_F(1; x, w) = ⟨w(1), ψ_F(x)⟩,
with features ψ_F : X → R^D, e.g. image filters.
[Figure: image segmentation model]

Example: Parameterization (cont)
[Figure: grid model with unary energy tables ⟨w(0), ψ_F(x)⟩, ⟨w(1), ψ_F(x)⟩ and pairwise Potts tables (0, w_1; w_1, 0)]
Total number of parameters: D + D + 1.
Parameters are shared across factors, but the energies differ because of the different features ψ_F(x).
General form, linear in w:
E_F(y_F; x_F, w) = ⟨w(y_F), ψ_F(x_F)⟩.

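A small sketch of this parameterization (features and weights are invented): the unary energies are linear in w via ⟨w(y_i), ψ_F(x)⟩, the pairwise Potts energy only depends on whether the two labels agree, and the total parameter count is D + D + 1.

```python
import numpy as np

D = 3
w_unary = {0: np.zeros(D), 1: np.array([0.5, -1.0, 0.2])}  # w(0), w(1): 2*D parameters
w1 = 0.8                                                    # Potts weight: 1 parameter

def unary_energy(y_i, feat):
    """E_F(y_i; x, w) = <w(y_i), psi_F(x)>."""
    return float(np.dot(w_unary[y_i], feat))

def potts_energy(y_i, y_j):
    """E_F(y_i, y_j; w1): 0 if the labels agree, w1 otherwise."""
    return 0.0 if y_i == y_j else w1

feat = np.array([1.0, 0.3, -0.5])   # psi_F(x) for one pixel/block (made up)
print(unary_energy(1, feat), potts_energy(0, 1))
```
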
Test-time Inference

Making Predictions
Making predictions: given x ∈ X, predict y ∈ Y (i.e. a function f : X → Y).
How do we measure the quality of a prediction?

Loss function
Define a loss function ∆ : Y × Y → R_+, so that ∆(y, y*) measures the loss incurred by predicting y when y* is true.
The loss function is application dependent.

Test-time Inference
Loss function ∆ : Y × Y → R, where ∆(y, f(x)) compares the correct label y with the prediction f(x).
True joint distribution d(x, y) and true conditional d(y|x); model distribution p(y|x).
Expected loss (quality of the prediction):
R^∆_f(x) = E_{y ∼ d(y|x)} [∆(y, f(x))]
         = ∑_{y ∈ Y} d(y|x) ∆(y, f(x))
         ≈ E_{y ∼ p(y|x;w)} [∆(y, f(x))],
assuming that p(y|x; w) ≈ d(y|x).

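A minimal sketch of using the model distribution to approximate the expected loss (the distribution p(y | x; w) below is made up): the best prediction under a given loss is the y that minimizes this expectation, which the following examples instantiate for specific losses.

```python
import itertools

# Made-up model distribution p(y | x; w) over two binary output variables.
p_y_given_x = {(0, 0): 0.5, (0, 1): 0.2, (1, 0): 0.2, (1, 1): 0.1}

def expected_loss(prediction, loss):
    """E_{y ~ p(y|x;w)} [ loss(y, prediction) ]."""
    return sum(p * loss(y, prediction) for y, p in p_y_given_x.items())

zero_one = lambda y, f: 0.0 if y == f else 1.0
# The optimal prediction minimizes the expected loss over all candidates in Y.
best = min(itertools.product([0, 1], repeat=2),
           key=lambda f: expected_loss(f, zero_one))
print(best)   # (0, 0), the mode of p(y | x; w)
```
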
Example 1: 0/1 loss
Loss 0 iff perfectly predicted, 1 otherwise:
∆_{0/1}(y, y*) = I(y ≠ y*) = 0 if y = y*, 1 otherwise.
Plugging it in,
y* := argmin_{y ∈ Y} E_{y' ∼ p(y'|x)} [∆_{0/1}(y', y)]
    = argmax_{y ∈ Y} p(y|x)
    = argmin_{y ∈ Y} E(y, x).
Minimizing the expected 0/1 loss → MAP prediction (energy minimization).

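Under the made-up distribution from the previous sketch, minimizing the expected 0/1 loss just picks the most probable joint state, i.e. the MAP prediction:

```python
# Made-up model distribution p(y | x) over two binary output variables.
p_y_given_x = {(0, 0): 0.5, (0, 1): 0.2, (1, 0): 0.2, (1, 1): 0.1}
y_map = max(p_y_given_x, key=p_y_given_x.get)   # argmax_y p(y | x)
print(y_map)                                    # (0, 0)
```
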
Example 2: Hamming loss
Count the number of mislabeled variables:
∆_H(y, y*) = (1/|V|) ∑_{i ∈ V} I(y_i ≠ y_i*).
Plugging it in,
y* := argmin_{y ∈ Y} E_{y' ∼ p(y'|x)} [∆_H(y', y)],
which decomposes over the variables: y*_i = argmax_{y_i ∈ Y_i} p(y_i|x) for each i ∈ V.
Minimizing the expected Hamming loss → maximum posterior marginal (MPM, Max-Marg) prediction.

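With the same made-up distribution, minimizing the expected Hamming loss predicts each variable from its own posterior marginal (MPM), which can differ from the joint MAP state:

```python
# Made-up model distribution p(y | x) over two binary output variables.
p_y_given_x = {(0, 0): 0.5, (0, 1): 0.2, (1, 0): 0.2, (1, 1): 0.1}

def marginal(i, value):
    """Variable marginal p(y_i = value | x), summing the joint over the rest."""
    return sum(p for y, p in p_y_given_x.items() if y[i] == value)

y_mpm = tuple(max([0, 1], key=lambda v: marginal(i, v)) for i in range(2))
print(y_mpm)   # (0, 0) here; MPM and MAP can disagree for other distributions
```
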
Example 3: Squared error
Assume a vector space on Y_i (pixel intensities, optical flow vectors, etc.).
Sum of squared errors:
∆_Q(y, y*) = (1/|V|) ∑_{i ∈ V} ||y_i − y_i*||².
Plugging it in,
y* := argmin_{y ∈ Y} E_{y' ∼ p(y'|x)} [∆_Q(y', y)],
with component-wise solution y*_i = ∑_{y_i ∈ Y_i} p(y_i|x) y_i, the posterior mean, for each i ∈ V.
Minimizing the expected squared error → minimum mean squared error (MMSE) prediction.

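A short sketch of MMSE prediction (per-variable marginals and candidate values are made up): the optimal prediction under the squared error is the posterior mean of each variable.

```python
import numpy as np

# Made-up per-variable marginals over three candidate intensity values.
values = np.array([0.0, 0.5, 1.0])                 # candidate values in Y_i
marginals = np.array([[0.7, 0.2, 0.1],             # p(y_i | x) for variable i
                      [0.1, 0.3, 0.6]])            # p(y_j | x) for variable j
y_mmse = marginals @ values                        # posterior mean per variable
print(y_mmse)                                      # [0.2, 0.75]
```
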
Inference Task: Maximum A Posteriori (MAP) Inference
Definition (Maximum A Posteriori (MAP) Inference)
Given a factor graph, parameterization, and weight vector w, and given the observation x, find
y* = argmax_{y ∈ Y} p(Y = y | x, w) = argmin_{y ∈ Y} E(y; x, w).

Inference Task: Probabilistic Inference
Definition (Probabilistic Inference)
Given a factor graph, parameterization, and weight vector w, and given the observation x, find
log Z(x, w) = log ∑_{y ∈ Y} exp(−E(y; x, w)),
µ_F(y_F) = p(Y_F = y_F | x, w), for all F ∈ F and all y_F ∈ Y_F.
This typically includes the variable marginals µ_i(y_i) = p(y_i | x, w).

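A brute-force sketch of probabilistic inference (the energy function is a made-up stand-in for E(y; x, w)): log Z(x, w) and the variable marginals µ_i(y_i) are computed by enumerating all states, which is only feasible for tiny models.

```python
import itertools
import math

def energy(y):
    """Stand-in for E(y; x, w) over two binary variables (values made up)."""
    return 0.5 * sum(y) + (1.0 if y[0] != y[1] else 0.0)

states = list(itertools.product([0, 1], repeat=2))
# log Z(x, w) = log sum_y exp(-E(y; x, w))
log_Z = math.log(sum(math.exp(-energy(y)) for y in states))

def mu(i, value):
    """Variable marginal mu_i(value) = p(y_i = value | x, w)."""
    return sum(math.exp(-energy(y) - log_Z) for y in states if y[i] == value)

print(log_Z, mu(0, 1))
```
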
Example: Man-made structure detection
[Figure: left, input image x; middle, ground-truth labeling on 16-by-16 pixel blocks; right, factor graph model with observed X_i, outputs Y_i, Y_k, and factors ψ¹_i, ψ²_i, ψ³_{i,k}]
Features: gradient and color histograms.
The model parameters are estimated from ≈ 60 training images.

Example: Man-made structure detection
[Figure: left, input image x; middle (probabilistic inference), visualization of the variable marginals p(y_i = "manmade" | x, w); right (MAP inference), the joint MAP labeling y* = argmax_{y ∈ Y} p(y | x, w)]

Training

Training the Model
What can be learned?
Model structure: the factors,
Model variables: the observed variables are fixed, but we can add unobserved variables,
Factor energies: the parameters.

Training: Overview
Assume a fully observed, independent and identically distributed (iid) sample set
{(x^n, y^n)}_{n=1,...,N}, (x^n, y^n) ∼ d(x, y).
Goal: predict well.
Alternative goal: first model d(y|x) well by p(y|x, w), then predict by minimizing the expected loss.

Probabilistic Learning
Problem (Probabilistic Parameter Learning)
Let d(y|x) be the (unknown) conditional distribution of labels for a problem to be solved. For a parameterized conditional distribution p(y|x, w) with parameters w ∈ R^D, probabilistic parameter learning is the task of finding a point estimate w* of the parameters that makes p(y|x, w*) closest to d(y|x).
We will discuss probabilistic parameter learning in detail.

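The slides defer the details; as a hedged illustration only, one standard instantiation of probabilistic parameter learning is to minimize the negative conditional log-likelihood of a training sample. The tiny logistic-regression-style model, features, and grid search below are all made up.

```python
import numpy as np
from itertools import product

def neg_log_lik(w, data):
    """-sum_n log p(y^n | x^n, w) for a single binary output with
    p(y = 1 | x, w) = sigmoid(<w, x>)."""
    nll = 0.0
    for x, y in data:
        p1 = 1.0 / (1.0 + np.exp(-np.dot(w, x)))
        nll -= np.log(p1 if y == 1 else 1.0 - p1)
    return nll

data = [(np.array([1.0, 0.2]), 1), (np.array([-0.5, 1.0]), 0)]  # toy sample
# Crude grid search over w, just to illustrate the objective being minimized.
grid = np.linspace(-2, 2, 9)
w_star = min((np.array([a, b]) for a, b in product(grid, grid)),
             key=lambda w: neg_log_lik(w, data))
print(w_star, neg_log_lik(w_star, data))
```
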
Loss-Minimizing Parameter Learning
Problem (Loss-Minimizing Parameter Learning)
Let d(x, y) be the unknown distribution of data and labels, and let ∆ : Y × Y → R be a loss function. Loss-minimizing parameter learning is the task of finding a parameter value w* such that the expected prediction risk
E_{(x,y) ∼ d(x,y)} [∆(y, f_p(x))]
is as small as possible, where f_p(x) = argmax_{y ∈ Y} p(y|x, w*).
Requires the loss function at training time.
Directly learns a prediction function f_p(x).