Probability Distributions for ML
Sung-Yub Kim
Dept of IE, Seoul National University
January 29, 2017

Outline
Introduction
Binary Variables
Multinomial Variables
The Gaussian Distribution
The Exponential Family
Nonparametric Methods
References
Bishop, C. M. Pattern Recognition and Machine Learning. Information Science and Statistics, Springer, 2006.
Murphy, K. P. Machine Learning: A Probabilistic Perspective. Adaptive Computation and Machine Learning, MIT Press, 2012.
Goodfellow, I., Bengio, Y., and Courville, A. Deep Learning. MIT Press, 2016.
Introduction
Purpose: Density Estimation
Assumption: Data points are independent and identically distributed (i.i.d.).
Parametric and Nonparametric
Parametric estimation is more intuitive but makes very strong assumptions.
Nonparametric estimation also has parameters, but they control model complexity rather than the form of the distribution.
Binary Variables
Bernoulli Distribution (Ber(θ))
The Bernoulli distribution has a single parameter θ, the success probability of the trial. Its PMF is
Ber(x|θ) = θ^{I(x=1)} (1 − θ)^{I(x=0)}
Binomial Distribution (Bin(n, θ))
The binomial distribution has two parameters: n, the number of trials, and θ, the success probability. Its PMF is
Bin(k|n, θ) = \binom{n}{k} θ^k (1 − θ)^{n−k}
Likelihood of Data
By the i.i.d. assumption, we get
p(D|µ) = \prod_{n=1}^{N} p(x_n|µ) = \prod_{n=1}^{N} µ^{x_n} (1 − µ)^{1−x_n}    (1)
Log-likelihood of Data
Taking the logarithm, we get
ln p(D|µ) = \sum_{n=1}^{N} ln p(x_n|µ) = \sum_{n=1}^{N} \{x_n ln µ + (1 − x_n) ln(1 − µ)\}    (2)
MLE
Since the maximizer is a stationary point, we get
µ_{ML} := µ̂ = \frac{1}{N} \sum_{n=1}^{N} x_n    (3)
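As a quick illustration (not part of the original slides), here is a minimal NumPy sketch of equation (3); the sample size and the true parameter value are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
theta_true = 0.3                                  # assumed true success probability
x = rng.binomial(1, theta_true, size=1000)        # i.i.d. Bernoulli draws

# Equation (3): the MLE is simply the sample mean of the binary observations.
theta_ml = x.mean()
print(theta_ml)
```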
Prior Distribution
The weak point of MLE is that it can overfit the data. To overcome this deficiency, we introduce a prior distribution.
At the same time, the prior should have a simple interpretation and useful analytical properties.
Conjugate Prior
A conjugate prior for a likelihood is a prior distribution such that the resulting posterior has the same functional form as the prior.
In this case, we need a prior proportional to powers of µ and (1 − µ). Therefore, we choose the Beta distribution
Beta(µ|a, b) = \frac{Γ(a + b)}{Γ(a)Γ(b)} µ^{a−1} (1 − µ)^{b−1}    (4)
The Beta distribution has two parameters a and b, which count how many times each of the two classes has occurred (the effective number of observations). We can also easily verify that the posterior is again a Beta distribution.
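A minimal sketch of this conjugate update, assuming a Beta(a, b) prior and m observed successes and l failures (all numbers are arbitrary): the posterior only adds the observed counts to the prior parameters.

```python
from scipy import stats

a, b = 2.0, 2.0        # prior pseudo-counts (assumption)
m, l = 7, 3            # observed numbers of x = 1 and x = 0 (assumption)

prior = stats.beta(a, b)
posterior = stats.beta(a + m, b + l)   # Beta(m + a, l + b), matching equation (5)
print(prior.mean(), posterior.mean())
```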
Posterior Distribution
By some calculation,
p(µ|m, l, a, b) = \frac{Γ(m + l + a + b)}{Γ(m + a)Γ(l + b)} µ^{m+a−1} (1 − µ)^{l+b−1}    (5)
where m and l are the observed numbers of x = 1 and x = 0, respectively.
Bayesian Inference
Now we can do Bayesian inference on binary variables. We want to know
p(x = 1|D) = \int_0^1 p(x = 1|µ) p(µ|D) dµ = \int_0^1 µ p(µ|D) dµ = E[µ|D]    (6)
Therefore, we get
p(x = 1|D) = \frac{m + a}{m + a + l + b}    (7)
If the observed counts (m, l) are sufficiently large, this estimate is asymptotically identical to the MLE; this asymptotic agreement between Bayesian and maximum-likelihood estimates holds very generally.
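A small sketch of equation (7), with hypothetical counts, showing that the predictive probability approaches the MLE m/(m + l) as the counts grow.

```python
# Equation (7): the predictive probability is the posterior mean of µ.
def predictive(m, l, a=1.0, b=1.0):
    return (m + a) / (m + a + l + b)

# With few observations the prior matters; with many, the estimate approaches the MLE 0.75.
print(predictive(3, 1))
print(predictive(3000, 1000))
```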
Difference between prior and posterior
Since
E_θ[θ] = E_D[E_θ[θ|D]]    (8)
we know that the posterior mean of θ, averaged over the distribution generating the data, is equal to the prior mean of θ.
Also, since
Var_θ[θ] = E_D[Var_θ[θ|D]] + Var_D[E_θ[θ|D]]    (9)
we know that, on average, the posterior variance of θ is smaller than the prior variance.
Multinomial Variables
Multinomial Distribution (Mu(x|n, θ))
The multinomial distribution differs from the binomial in the dimension of the output and of θ. In the binomial, k is the number of successes; in the multinomial, each component of x counts how many times the corresponding state occurred. Therefore, the binomial is the multinomial with x and θ two-dimensional.
Mu(x|n, θ) = \binom{n}{x_0, \ldots, x_{K−1}} \prod_{j=0}^{K−1} θ_j^{x_j}
Multinoulli Distribution (Mu(x|1, θ))
Sometimes we are interested in the special case of the multinomial with n = 1, called the multinoulli distribution:
Mu(x|1, θ) = \prod_{j=0}^{K−1} θ_j^{I(x_j = 1)}
Likelihood of Data
By the i.i.d. assumption, we get
p(D|µ) = \prod_{n=1}^{N} \prod_{k=1}^{K} µ_k^{x_{nk}} = \prod_{k=1}^{K} µ_k^{\sum_n x_{nk}} = \prod_{k=1}^{K} µ_k^{m_k}    (10)
where m_k = \sum_n x_{nk} are the sufficient statistics.
Log-likelihood of Data
Taking the logarithm, we get
ln p(D|µ) = \sum_{k=1}^{K} m_k ln µ_k    (11)
MLE
Therefore, for the MLE we need to solve the following optimization problem:
max \{ \sum_{k=1}^{K} m_k ln µ_k \;|\; \sum_{k=1}^{K} µ_k = 1 \}    (12)
MLE (cont.)
We already know that a stationary point of the Lagrangian is a necessary condition for a constrained optimization problem. Therefore,
∇_µ L(µ; λ) = 0,  ∇_λ L(µ; λ) = 0    (13)
where
L(µ; λ) = \sum_{k=1}^{K} m_k ln µ_k + λ(\sum_{k=1}^{K} µ_k − 1)    (14)
Therefore, we get
µ_k^{ML} = \frac{m_k}{N}    (15)
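A minimal sketch of equation (15) in NumPy (not from the slides): the MLE is just the normalized vector of state counts. The true parameters and sample size below are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
K = 4
theta_true = np.array([0.1, 0.2, 0.3, 0.4])      # assumed true parameters
states = rng.choice(K, size=5000, p=theta_true)  # i.i.d. multinoulli draws (state indices)

m = np.bincount(states, minlength=K)   # sufficient statistics m_k
mu_ml = m / m.sum()                    # equation (15): normalized counts
print(mu_ml)
```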
Dirichlet Distribution
By the same intuition as for the Beta distribution, we can obtain the conjugate prior for the multinoulli:
Dir(µ|α) = \frac{Γ(α_0)}{Γ(α_1) \cdots Γ(α_K)} \prod_{k=1}^{K} µ_k^{α_k − 1}    (16)
where α_0 = \sum_k α_k.
Bayesian Inference
By the same argument as in the binomial case, we obtain the posterior
p(µ|D, α) = Dir(µ|α + m) = \frac{Γ(α_0 + N)}{Γ(α_1 + m_1) \cdots Γ(α_K + m_K)} \prod_{k=1}^{K} µ_k^{α_k + m_k − 1}    (17)
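A minimal sketch of the Dirichlet-multinoulli update of equation (17), with hypothetical prior pseudo-counts and observed counts:

```python
import numpy as np

alpha = np.array([1.0, 1.0, 1.0, 1.0])   # prior pseudo-counts (assumption)
m = np.array([10, 40, 25, 25])           # observed counts m_k (assumption)

alpha_post = alpha + m                   # posterior parameters, as in equation (17)
posterior_mean = alpha_post / alpha_post.sum()
print(posterior_mean)
```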
The Gaussian Distribution
Univariate Gaussian Distribution (N(x|µ, σ^2) = N(x|µ, β^{−1}))
N(x|µ, σ^2) = \frac{1}{\sqrt{2πσ^2}} \exp\left(−\frac{1}{2σ^2}(x − µ)^2\right)    (18)
N(x|µ, β^{−1}) = \left(\frac{β}{2π}\right)^{1/2} \exp\left(−\frac{β}{2}(x − µ)^2\right)    (19)
Multivariate Gaussian Distribution (N(x|µ, Σ) = N(x|µ, β^{−1}))
N(x|µ, Σ) = \frac{1}{(2π)^{D/2} \det(Σ)^{1/2}} \exp\left(−\frac{1}{2}(x − µ)^T Σ^{−1} (x − µ)\right)    (20)
N(x|µ, β^{−1}) = \frac{1}{(2π)^{D/2} \det(Σ)^{1/2}} \exp\left(−\frac{1}{2}(x − µ)^T β (x − µ)\right)    (21)
where β = Σ^{−1} is the precision (matrix).
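A small sketch evaluating the multivariate density of equation (20), once with SciPy and once directly from the formula; the mean, covariance, and query point are arbitrary assumptions.

```python
import numpy as np
from scipy import stats

mu = np.array([0.0, 1.0])                      # assumed mean
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])     # assumed covariance
x = np.array([0.5, 0.5])                       # assumed query point

# Library evaluation of equation (20).
print(stats.multivariate_normal(mean=mu, cov=Sigma).pdf(x))

# Direct evaluation of equation (20) for comparison.
D = len(mu)
diff = x - mu
quad = diff @ np.linalg.inv(Sigma) @ diff
print(np.exp(-0.5 * quad) / np.sqrt((2 * np.pi) ** D * np.linalg.det(Sigma)))
```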
Mahalanobis Distance
By the eigenvalue decomposition (EVD) of Σ, we can write
∆^2 = (x − µ)^T Σ^{−1} (x − µ) = \sum_{i=1}^{D} \frac{y_i^2}{λ_i}    (22)
where y_i = u_i^T (x − µ).
Change of Variables in the Gaussian
Using the above, we get
p(y) = p(x) |J_{y→x}| = \prod_{j=1}^{D} \frac{1}{(2πλ_j)^{1/2}} \exp\left\{−\frac{y_j^2}{2λ_j}\right\}    (23)
which is a product of D independent univariate Gaussian distributions.
First and Second Moments of the Gaussian
Using the above, we get
E[x] = µ,  E[xx^T] = µµ^T + Σ    (24)
Limitations of the Gaussian and Solutions
There are two main limitations of the Gaussian.
First, we have to infer a large number of covariance parameters.
Second, it cannot represent multi-modal distributions. Therefore, we define some auxiliary concepts.
Diagonal Covariance
Σ = diag(s^2)    (25)
Isotropic Covariance
Σ = σ^2 I    (26)
Mixture Model
p(x) = \sum_{k=1}^{K} π_k p(x|π_k)    (27)
Partitions of the Mahalanobis Distance
First, partition the covariance matrix and the precision matrix:
Σ = \begin{pmatrix} Σ_{aa} & Σ_{ab} \\ Σ_{ba} & Σ_{bb} \end{pmatrix},  Σ^{−1} = Λ = \begin{pmatrix} Λ_{aa} & Λ_{ab} \\ Λ_{ba} & Λ_{bb} \end{pmatrix}    (28)
where the aa and bb blocks are symmetric and the ab and ba blocks are transposes of each other.
Now partition the Mahalanobis distance:
(x − µ)^T Σ^{−1} (x − µ) = (x_a − µ_a)^T Λ_{aa} (x_a − µ_a) + (x_a − µ_a)^T Λ_{ab} (x_b − µ_b) + (x_b − µ_b)^T Λ_{ba} (x_a − µ_a) + (x_b − µ_b)^T Λ_{bb} (x_b − µ_b)    (29)
Schur Complement
As in Gaussian elimination, we can invert a block matrix by block elimination using the Schur complement:
\begin{pmatrix} A & B \\ C & D \end{pmatrix}^{−1} = \begin{pmatrix} M & −MBD^{−1} \\ −D^{−1}CM & D^{−1} + D^{−1}CMBD^{−1} \end{pmatrix}    (30)
where M = (A − BD^{−1}C)^{−1}
Schur Complement (cont.)
Therefore, we get
Λ_{aa} = (Σ_{aa} − Σ_{ab} Σ_{bb}^{−1} Σ_{ba})^{−1}    (31)
Λ_{ab} = −(Σ_{aa} − Σ_{ab} Σ_{bb}^{−1} Σ_{ba})^{−1} Σ_{ab} Σ_{bb}^{−1}    (32)
Conditional Distribution
Therefore, we get
x_a|x_b ∼ N(x|µ_{a|b}, Σ_{a|b})    (33)
where
µ_{a|b} = µ_a + Σ_{ab} Σ_{bb}^{−1} (x_b − µ_b)    (34)
Σ_{a|b} = Σ_{aa} − Σ_{ab} Σ_{bb}^{−1} Σ_{ba}    (35)
Marginal Distribution
Integrating out x_b, we can get the marginal distribution of x_a; the terms in its exponent involving x_a are
−\frac{1}{2} x_a^T (Λ_{aa} − Λ_{ab} Λ_{bb}^{−1} Λ_{ba}) x_a + x_a^T (Λ_{aa} − Λ_{ab} Λ_{bb}^{−1} Λ_{ba}) µ_a + const    (36)
Therefore, we get
x_a ∼ N(x|µ_a, Σ_{aa})    (37)
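A minimal two-dimensional sketch of equations (34), (35), and (37), with x_a and x_b scalar components and arbitrary assumed values; the blocks then reduce to scalars.

```python
import numpy as np

# Joint Gaussian over (x_a, x_b) with assumed mean and covariance.
mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.5]])
Saa, Sab, Sba, Sbb = Sigma[0, 0], Sigma[0, 1], Sigma[1, 0], Sigma[1, 1]

x_b = 0.5   # observed value of x_b (assumption)

# Equations (34) and (35): conditional mean and covariance of x_a given x_b.
mu_a_given_b = mu[0] + Sab / Sbb * (x_b - mu[1])
Sigma_a_given_b = Saa - Sab / Sbb * Sba

# Equation (37): the marginal of x_a just reads off the corresponding blocks.
print(mu_a_given_b, Sigma_a_given_b, (mu[0], Saa))
```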
Given a marginal Gaussian for x and a conditional Gaussian for y given x of the form
x ∼ N(x|µ, Λ^{−1})    (38)
y|x ∼ N(y|Ax + b, L^{−1})    (39)
the marginal distribution of y and the conditional distribution of x given y are given by
y ∼ N(y|Aµ + b, L^{−1} + AΛ^{−1}A^T)    (40)
x|y ∼ N(x|Σ\{A^T L(y − b) + Λµ\}, Σ)    (41)
where
Σ = (Λ + A^T L A)^{−1}    (42)
Log-likelihood of Data
By the same argument as for categorical data, we can write the log-likelihood for the Gaussian:
ln p(D|µ, Σ) = −\frac{ND}{2} ln 2π − \frac{N}{2} ln|Σ| − \frac{1}{2} \sum_{n=1}^{N} (x_n − µ)^T Σ^{−1} (x_n − µ)    (43)
This log-likelihood depends on the data only through the quantities
\sum_{n=1}^{N} x_n,  \sum_{n=1}^{N} x_n x_n^T    (44)
which are called the sufficient statistics.
MLE for the Gaussian
Since the MLE maximizes the log-likelihood, we get
µ_{ML} = \frac{1}{N} \sum_{n=1}^{N} x_n    (45)
Σ_{ML} = \frac{1}{N} \sum_{n=1}^{N} (x_n − µ_{ML})(x_n − µ_{ML})^T    (46)
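A minimal sketch of equations (45) and (46) on synthetic data (the true mean and covariance below are arbitrary assumptions); note the 1/N normalization, which gives the biased covariance estimate.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.multivariate_normal([0.0, 1.0], [[1.0, 0.3], [0.3, 2.0]], size=2000)  # assumed data

N = X.shape[0]
mu_ml = X.mean(axis=0)                 # equation (45)
centered = X - mu_ml
Sigma_ml = centered.T @ centered / N   # equation (46), 1/N (not 1/(N-1)) normalization
print(mu_ml)
print(Sigma_ml)
```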
Sequential Estimation
Since the MLE for the Gaussian is available in closed form, we can also compute it sequentially:
µ_{ML}^{(N)} = µ_{ML}^{(N−1)} + \frac{1}{N}\left(x_N − µ_{ML}^{(N−1)}\right)    (47)
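A quick sketch of the sequential update (47) on an assumed data stream, confirming that it reproduces the batch sample mean:

```python
import numpy as np

rng = np.random.default_rng(3)
data = rng.normal(5.0, 2.0, size=10000)   # assumed stream of observations

mu = 0.0
for n, x in enumerate(data, start=1):
    mu = mu + (x - mu) / n                # equation (47): sequential update of the sample mean
print(mu, data.mean())                    # agree up to floating-point error
```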
Robbins-Monro Algorithm
With the same intuition, we can generalize sequential learning. The Robbins-Monro algorithm finds a root θ* such that f(θ*) = E[z|θ*] = 0. Its iteration can be written as
θ^{(N)} = θ^{(N−1)} − a_{N−1} z(θ^{(N−1)})    (48)
where z(θ^{(N−1)}) is the observed value of z when θ takes the value θ^{(N−1)}, and {a_N} is a sequence satisfying
\lim_{N→∞} a_N = 0,  \sum_{N=1}^{∞} a_N = ∞,  \sum_{N=1}^{∞} a_N^2 < ∞    (49)
Generalized Sequential Learning
We can apply the RM algorithm to sequential maximum-likelihood learning. In this case, f(θ) is the gradient of the log-likelihood, and we take
z(θ) = −\frac{∂}{∂θ} ln p(x|θ)    (50)
In the Gaussian case, we set a_N = σ^2/N.
Bayesian Inference for the Mean Given the Variance
Since the Gaussian likelihood is the exponential of a quadratic form in µ, we can choose a prior that is also Gaussian. Therefore, if we choose the prior
µ ∼ N(µ|µ_0, σ_0^2)    (51)
we get the posterior
µ|D ∼ N(µ|µ_N, σ_N^2)    (52)
where
µ_N = \frac{σ^2}{Nσ_0^2 + σ^2} µ_0 + \frac{Nσ_0^2}{Nσ_0^2 + σ^2} µ_{ML},  \frac{1}{σ_N^2} = \frac{1}{σ_0^2} + \frac{N}{σ^2}    (53)
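A minimal sketch of equation (53), with an assumed known noise variance and an assumed Gaussian prior on the mean:

```python
import numpy as np

rng = np.random.default_rng(4)
sigma2 = 4.0                               # known data variance (assumption)
mu0, sigma2_0 = 0.0, 10.0                  # prior mean and variance (assumption)
x = rng.normal(2.0, np.sqrt(sigma2), size=50)

N, mu_ml = len(x), x.mean()
# Equation (53): the posterior mean interpolates between the prior mean and the MLE,
# and the posterior precision is the prior precision plus N times the data precision.
mu_N = (sigma2 * mu0 + N * sigma2_0 * mu_ml) / (N * sigma2_0 + sigma2)
sigma2_N = 1.0 / (1.0 / sigma2_0 + N / sigma2)
print(mu_N, sigma2_N)
```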
Bayesian Inference for the Mean Given the Variance (cont.)
1. The posterior mean is a compromise between the prior mean and the MLE.
2. The posterior precision is the prior precision plus one contribution of the data precision for each observed data point.
3. If we take σ_0^2 → ∞, the posterior mean reduces to the MLE.
Bayesian Inference for the Variance Given the Mean
As a function of the precision λ, the Gaussian likelihood is proportional to the product of a power of λ and the exponential of a linear function of λ. We therefore choose the Gamma distribution, defined by
Gam(λ|a_0, b_0) = \frac{1}{Γ(a_0)} b_0^{a_0} λ^{a_0−1} \exp(−b_0 λ)    (54)
Then we can get the posterior
λ|D ∼ Gam(λ|a_N, b_N)    (55)
where
a_N = a_0 + \frac{N}{2},  b_N = b_0 + \frac{N}{2} σ_{ML}^2    (56)
Bayesian Inference for the Variance Given the Mean (cont.)
1. We can interpret the parameter 2a_0 as the effective number of prior observations.
2. We can interpret the parameter b_0/a_0 as the effective prior variance of those observations.
Bayesian Inference when Mean and Precision are Both Unknown
Applying the same argument to the mean and precision jointly, we obtain the prior
p(µ, λ) = N(µ|µ_0, (βλ)^{−1}) Gam(λ|a, b)    (57)
where
µ_0 = c/β,  a = 1 + β/2,  b = d − c^2/2β    (58)
Note that the precision of µ is a linear function of λ.
For the multivariate case, we can similarly obtain the prior
p(µ, Λ|µ_0, β, W, ν) = N(µ|µ_0, (βΛ)^{−1}) W(Λ|W, ν)    (59)
where W is the Wishart distribution.
Univariate t-distribution
If we integrate out the precision, given a Gamma prior for the precision, we obtain the t-distribution:
St(x|µ, λ, ν) = \frac{Γ(ν/2 + 1/2)}{Γ(ν/2)} \left(\frac{λ}{πν}\right)^{1/2} \left[1 + \frac{λ(x − µ)^2}{ν}\right]^{−ν/2−1/2}    (60)
where ν = 2a (the degrees of freedom) and λ = a/b.
We can think of the t-distribution as an infinite mixture of Gaussians.
Since the t-distribution has fatter tails than the Gaussian, estimates based on it are more robust.
Multivariate t-distribution
The multivariate version of this infinite mixture of Gaussians gives the multivariate t-distribution
St(x|µ, Λ, ν) = \frac{Γ(ν/2 + D/2)}{Γ(ν/2)} \frac{\det(Λ)^{1/2}}{(πν)^{D/2}} \left[1 + \frac{∆^2}{ν}\right]^{−ν/2−D/2}    (61)
where ∆^2 = (x − µ)^T Λ (x − µ) is the Mahalanobis distance.
The Exponential Family
The Exponential Family
The exponential family of distributions over x, given parameters η, is defined as the set of distributions of the form
p(x|η) = g(η) h(x) \exp\{η^T u(x)\}    (62)
where η are the natural parameters of the distribution and u(x) is a function of x.
The function g(η) can be interpreted as the normalization factor.
Logistic Sigmoid
For the Bernoulli distribution the usual parameter is µ, while the natural parameter is η. The two are related by
η = ln\left(\frac{µ}{1 − µ}\right),  µ := σ(η) = \frac{\exp(η)}{1 + \exp(η)}    (63)
We call σ(η) the sigmoid function.
Softmax Function
By the same argument, we can find the relationship between the usual parameters and the natural parameters of the multinoulli. That relationship is the softmax function:
µ_k = \frac{\exp(η_k)}{\sum_{j=1}^{K} \exp(η_j)}    (64)
Note that in this case u(x) = x, h(x) = 1, and g(η) = (\sum_{j=1}^{K} \exp(η_j))^{−1}.
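A small sketch of equations (63) and (64) (not from the slides); the max-subtraction in the softmax is a standard numerical-stability trick, not part of the formula.

```python
import numpy as np

def sigmoid(eta):
    # Equation (63): maps the Bernoulli natural parameter back to µ.
    return 1.0 / (1.0 + np.exp(-eta))

def softmax(eta):
    # Equation (64); subtracting the maximum leaves the result unchanged but avoids overflow.
    e = np.exp(eta - np.max(eta))
    return e / e.sum()

print(sigmoid(0.0))                        # 0.5
print(softmax(np.array([1.0, 2.0, 3.0])))  # sums to 1
```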
Gaussian
The Gaussian can also be written as a member of the exponential family, with
u(x) = \begin{pmatrix} x \\ x^2 \end{pmatrix}    (65)
η = \begin{pmatrix} µ/σ^2 \\ −1/(2σ^2) \end{pmatrix}    (66)
g(η) = (−2η_2)^{1/2} \exp\left(\frac{η_1^2}{4η_2}\right)    (67)
Estimating the Natural Parameter
We can generalize the MLE argument to the whole family.
First, consider the log-likelihood of the data:
ln p(D|η) = \sum_{n=1}^{N} ln h(x_n) + N ln g(η) + η^T \sum_{n=1}^{N} u(x_n)    (68)
Next, we find a stationary point of the log-likelihood:
N ∇_η ln g(η) + \sum_{n=1}^{N} u(x_n) = 0    (69)
Therefore, the MLE satisfies
−∇_η ln g(η_{ML}) = \frac{1}{N} \sum_{n=1}^{N} u(x_n)    (70)
We see that the solution for the MLE depends on the data only through \sum_n u(x_n), which is therefore called the sufficient statistic of the exponential family.
Conjugate Prior
For any member of the exponential family, there exists a conjugate prior that can be written in the form
p(η|χ, ν) = f(χ, ν) g(η)^ν \exp\{ν η^T χ\}    (71)
where f(χ, ν) is a normalization factor and g(η) is the same function as in the exponential family.
Posterior Distribution
If we use this conjugate prior, we get
p(η|D, χ, ν) ∝ g(η)^{ν+N} \exp\{η^T (\sum_{n=1}^{N} u(x_n) + νχ)\}    (72)
Therefore, the parameter ν can be interpreted as the effective number of pseudo-observations in the prior, each of which has the value χ for the sufficient statistic u(x).
Noninformative Priors
We may seek a form of prior distribution, called a noninformative prior, which is intended to have as little influence on the posterior distribution as possible.
Generalizations of Noninformative Priors
It leads to two generalizations, namely the principle of transformation groups, as in the Jeffreys prior, and the principle of maximum entropy.
Nonparametric Methods
Histogram Technique
Standard histograms simply partition x into distinct bins of width ∆_i and then count the number n_i of observations of x falling in bin i. In order to turn this count into a normalized probability density, we simply divide by the total number N of observations and by the width ∆_i of the bins, obtaining probability values for each bin given by
p_i = \frac{n_i}{N ∆_i}    (73)
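A minimal NumPy sketch of equation (73) on assumed one-dimensional data:

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(0.0, 1.0, size=1000)       # assumed one-dimensional data

counts, edges = np.histogram(x, bins=20)  # n_i and the bin edges
widths = np.diff(edges)                   # bin widths ∆_i
p = counts / (len(x) * widths)            # equation (73): normalized density per bin
print((p * widths).sum())                 # the estimate integrates to 1
```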
Limitations of the Histogram
The estimated density has discontinuities that are due to the bin edges rather than any property of the underlying distribution that generated the data.
The histogram approach also scales badly with dimensionality.
Lessons from the Histogram
First, to estimate the probability density at a particular location, we should
consider the data points that lie within some local neighbourhood of that
point.
Second, the value of the smoothing parameter should be neither too large
nor too small in order to obtain good results.
Motivation
For large N, the binomial distribution of the number K of data points that fall within a small region R will be sharply peaked around its mean, so
K ≈ NP    (74)
If, however, we also assume that the region R is sufficiently small that the probability density p(x) is roughly constant over it, then we have
P ≈ p(x)V    (75)
where V is the volume of R. Therefore,
p(x) = \frac{K}{NV}    (76)
Note that these assumptions require R to be sufficiently small that the density is approximately constant over the region, and yet sufficiently large that the number K of points falling inside it is enough for the binomial distribution to be sharply peaked.
Kernel Density Estimation (KDE)
If we fix V and determine K from the data, we obtain the kernel approach. For instance, we fix V to a unit cube around the query point and count the data points with the function
k(u) = \begin{cases} 1, & |u_i| ≤ 1/2, \ i = 1, \ldots, D \\ 0, & \text{otherwise} \end{cases}    (77)
which is called a Parzen window. In this case, the number of points falling in a cube of side h centred on x is
K = \sum_{n=1}^{N} k\left(\frac{x − x_n}{h}\right)    (78)
which leads to the density estimate
p(x) = \frac{1}{N} \sum_{n=1}^{N} \frac{1}{h^D} k\left(\frac{x − x_n}{h}\right)    (79)
We can also use other kernels, such as the Gaussian kernel, which gives
p(x) = \frac{1}{N} \sum_{n=1}^{N} \frac{1}{(2πh^2)^{D/2}} \exp\left\{−\frac{\|x − x_n\|^2}{2h^2}\right\}    (80)
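A minimal sketch of the Gaussian-kernel estimate (80) in one dimension; the data, bandwidth h, and query points are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)
data = rng.normal(0.0, 1.0, size=500)     # assumed one-dimensional sample (D = 1)
h = 0.3                                   # bandwidth, an arbitrary choice

def kde(x, data, h):
    # Equation (80) with D = 1: average of Gaussian kernels centred on the data points.
    return np.mean(np.exp(-(x - data) ** 2 / (2 * h ** 2)) / np.sqrt(2 * np.pi * h ** 2))

print(kde(0.0, data, h), kde(2.0, data, h))
```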
Limitations of KDE
One difficulty with the kernel approach to density estimation is that the bandwidth h governing the kernel width is fixed for all kernels. In regions of high data density, a large value of h may lead to over-smoothing, while in regions of low data density, a small value of h may lead to noisy, over-fitted estimates. Thus the optimal choice of h may depend on the location within the data space.
Nearest Neighbours (NN)
We therefore consider fixing K and using the data to find an appropriate V; this is the K-nearest-neighbour (K-NN) method.
In this case, the value of K governs the degree of smoothing, and K must be tuned as a hyper-parameter.
Error of K-NN
Note that for sufficiently large N, the error rate of the nearest-neighbour classifier is never more than twice the minimum achievable error rate of an optimal classifier.
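A minimal one-dimensional sketch of the K-NN density estimate p(x) = K/(NV), where V is taken as the length of the smallest interval around x that contains K points; the data and K are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
data = rng.normal(0.0, 1.0, size=1000)    # assumed one-dimensional sample
K = 30                                    # number of neighbours, an arbitrary choice

def knn_density(x, data, K):
    # p(x) = K / (N V): V is the length of the interval reaching the K-th nearest point.
    dists = np.sort(np.abs(data - x))
    V = 2.0 * dists[K - 1]
    return K / (len(data) * V)

print(knn_density(0.0, data, K), knn_density(2.0, data, K))
```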
Probability distributions for ml

  • 1. Introduction Binary Variables Multinomial Variables The Gaussian Distribution The Exponential Family Nonparametric Methods Probability Distributions for ML Sung-Yub Kim Dept of IE, Seoul National University January 29, 2017 Sung-Yub Kim Probability Distributions for ML
  • 2. Introduction Binary Variables Multinomial Variables The Gaussian Distribution The Exponential Family Nonparametric Methods Bishop, C. M. Pattern Recognition and Machine Learning Information Science and Statistics, Springer, 2006. Kevin P. Murphy. Machine Learning - A Probabilistic Perspective Adaptive Computation and Machine Learning, MIT press, 2012. Ian Goodfellow and Yoshua Bengio and Aaron Courville. Deep Learning Computer Science and Intelligent Systems, MIT Press, 2016. Sung-Yub Kim Probability Distributions for ML
  • 3. Introduction Binary Variables Multinomial Variables The Gaussian Distribution The Exponential Family Nonparametric Methods Purpose: Density Estimation Assumption: Data Points are independent and identically distributed.(i.i.d) Parametric and Nonparametric Parametric estimations are more intuitive but has very strong assumption. Nonparametric estimation also has some parameters, but they control model complexity. Sung-Yub Kim Probability Distributions for ML
  • 4. Introduction Binary Variables Multinomial Variables The Gaussian Distribution The Exponential Family Nonparametric Methods Bernouli and Binomial Distribution MLE of Bernouli parameter The Beta Distribution Bayesian Inference on binary variables Diļ¬€erence between prior and posterior Bernouli Distribution(Ber(Īø)) Bernouli Distribution has only one parameter Īø which means the success probability of the trial. PMF of bernouli dist is shown like Ber(x|Īø) = ĪøI(x=1) (1 āˆ’ Īø)I(x=0) Binomial Distribution(Bin(n,Īø)) Binomial Distribution has two parameters n for number of trials, Īø for success prob. PMF of binomial dist is shown like Bin(k|n, Īø) = n k Īøk (1 āˆ’ Īø)nāˆ’k Sung-Yub Kim Probability Distributions for ML
  • 5. Introduction Binary Variables Multinomial Variables The Gaussian Distribution The Exponential Family Nonparametric Methods Bernouli and Binomial Distribution MLE of Bernouli parameter The Beta Distribution Bayesian Inference on binary variables Diļ¬€erence between prior and posterior Likelihood of Data By i.i.d assumption, we get p(D|Āµ) = N n=1 p(xn|Āµ) = N n=1 Āµxn (1 āˆ’ Āµ)1āˆ’xn (1) Log-likelihood of Data Take logarithm, we get ln p(D|Āµ) = N n=1 ln p(xn|Āµ) = N n=1 {xn ln Āµ + (1 āˆ’ xn) ln(1 āˆ’ Āµ)} (2) MLE Since maximizer is stationary point, we get ĀµML := Ė†Āµ = 1 N N n=1 xn (3) Sung-Yub Kim Probability Distributions for ML
  • 6. Introduction Binary Variables Multinomial Variables The Gaussian Distribution The Exponential Family Nonparametric Methods Bernouli and Binomial Distribution MLE of Bernouli parameter The Beta Distribution Bayesian Inference on binary variables Diļ¬€erence between prior and posterior Prior Distribution The weak point of MLE is you can be overļ¬tted to data. To overcome this deļ¬ciency, we need to make some prior distribution. But same time our prior distribution need to has a simple interpretation and useful analytical properties. Conjugate Prior Conjugate prior for a likelihood is a prior distribution which your prior and posterior distribution are same given your likelihood. In this case, we need to make our prior proportional to powers of Āµ and (1 āˆ’ Āµ). Therefore, we choose Beta Distribution Beta(Āµ|a, b) = Ī“(a + b) Ī“(a)Ī“(b) Āµaāˆ’1 (1 āˆ’ Āµ)bāˆ’1 (4) Beta Distribution has two parameters a,b each counts how many occurs each classes(eļ¬€ective number of observations). Also we can easily valid that posterior is also beta distribution. Sung-Yub Kim Probability Distributions for ML
  • 7. Bayesian Inference on Binary Variables.
Posterior distribution: after some calculation, $p(\mu|m,l,a,b) = \frac{\Gamma(m+l+a+b)}{\Gamma(m+a)\Gamma(l+b)}\mu^{m+a-1}(1-\mu)^{l+b-1}$ (5), where $m$ and $l$ are the observed counts of the two classes.
Bayesian inference: we can now make predictions for binary variables. We want $p(x=1|D) = \int_0^1 p(x=1|\mu)\,p(\mu|D)\,d\mu = \int_0^1 \mu\,p(\mu|D)\,d\mu = E[\mu|D]$ (6), which gives $p(x=1|D) = \frac{m+a}{m+a+l+b}$ (7).
If the observed counts $m$ and $l$ are sufficiently large, this estimate coincides asymptotically with the MLE; this asymptotic agreement is a very general property.
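A short sketch of the Beta–Bernoulli update in equations (5)–(7), assuming a Beta(a, b) prior and binary data; function and variable names are illustrative:

```python
import numpy as np

def beta_bernoulli_update(x, a=2.0, b=2.0):
    """Posterior Beta parameters and predictive p(x=1 | D)."""
    m = int(np.sum(x))                        # number of ones
    l = len(x) - m                            # number of zeros
    a_post, b_post = a + m, b + l             # equation (5)
    p_next = a_post / (a_post + b_post)       # equation (7)
    return a_post, b_post, p_next

rng = np.random.default_rng(1)
data = rng.binomial(1, 0.3, size=50)
print(beta_bernoulli_update(data))
```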
  • 8. Difference Between Prior and Posterior.
Since $E_\theta[\theta] = E_D[E_\theta[\theta|D]]$ (8), the posterior mean of $\theta$, averaged over the distribution generating the data, equals the prior mean of $\theta$.
Also, since $\mathrm{Var}_\theta[\theta] = E_D[\mathrm{Var}_\theta[\theta|D]] + \mathrm{Var}_D[E_\theta[\theta|D]]$ (9), the posterior variance of $\theta$ is, on average, smaller than the prior variance.
  • 9. Multinomial and Multinoulli Distributions.
Multinomial distribution $\mathrm{Mu}(x|n,\theta)$: it differs from the binomial in the dimensionality of the output and of $\theta$. In the binomial, $k$ counts the number of successes; in the multinomial, each component of $x$ counts how many times the corresponding state occurred. The binomial is thus the multinomial with $x$ and $\theta$ two-dimensional.
$\mathrm{Mu}(x|n,\theta) = \binom{n}{x_0,\ldots,x_{K-1}}\prod_{j=0}^{K-1}\theta_j^{x_j}$.
Multinoulli distribution $\mathrm{Mu}(x|1,\theta)$: sometimes we are interested in the special case $n=1$, called the multinoulli distribution: $\mathrm{Mu}(x|1,\theta) = \prod_{j=0}^{K-1}\theta_j^{I(x_j=1)}$.
  • 10. MLE of the Multinoulli Parameters.
Likelihood of the data: by the i.i.d. assumption, $p(D|\mu) = \prod_{n=1}^{N}\prod_{k=1}^{K}\mu_k^{x_{nk}} = \prod_{k=1}^{K}\mu_k^{\sum_n x_{nk}} = \prod_{k=1}^{K}\mu_k^{m_k}$ (10), where $m_k = \sum_n x_{nk}$ (the sufficient statistics).
Log-likelihood of the data: taking the logarithm, $\ln p(D|\mu) = \sum_{k=1}^{K} m_k\ln\mu_k$ (11).
MLE: we therefore solve the constrained optimization problem $\max\{\sum_{k=1}^{K} m_k\ln\mu_k \;|\; \sum_{k=1}^{K}\mu_k = 1\}$ (12).
  • 11. MLE of the Multinoulli Parameters (cont.).
A stationary point of the Lagrangian is a necessary condition for the constrained optimization problem: $\nabla_\mu L(\mu;\lambda) = 0$, $\nabla_\lambda L(\mu;\lambda) = 0$ (13), where $L(\mu;\lambda) = \sum_{k=1}^{K} m_k\ln\mu_k + \lambda\left(\sum_{k=1}^{K}\mu_k - 1\right)$ (14).
Solving these equations gives $\mu_k^{ML} = \frac{m_k}{N}$ (15).
  • 12. The Dirichlet Distribution and Bayesian Inference.
Dirichlet distribution: by the same reasoning as for the Beta distribution, the conjugate prior for the multinoulli is $\mathrm{Dir}(\mu|\alpha) = \frac{\Gamma(\alpha_0)}{\Gamma(\alpha_1)\cdots\Gamma(\alpha_K)}\prod_{k=1}^{K}\mu_k^{\alpha_k-1}$ (16), where $\alpha_0 = \sum_k \alpha_k$.
Bayesian inference: by the same argument as in the binomial case, the posterior is $p(\mu|D,\alpha) = \mathrm{Dir}(\mu|\alpha+m) = \frac{\Gamma(\alpha_0+N)}{\Gamma(\alpha_1+m_1)\cdots\Gamma(\alpha_K+m_K)}\prod_{k=1}^{K}\mu_k^{\alpha_k+m_k-1}$ (17).
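A small sketch of the Dirichlet–multinoulli update in equation (17), assuming one-hot encoded observations; names and the generated class probabilities are illustrative:

```python
import numpy as np

def dirichlet_update(one_hot_data, alpha):
    """Posterior Dirichlet parameters and posterior-mean estimate of mu."""
    m = one_hot_data.sum(axis=0)              # per-class counts m_k
    alpha_post = alpha + m                    # equation (17)
    mu_mean = alpha_post / alpha_post.sum()   # posterior mean of mu
    return alpha_post, mu_mean

K, N = 3, 200
rng = np.random.default_rng(2)
labels = rng.choice(K, size=N, p=[0.2, 0.5, 0.3])
X = np.eye(K)[labels]                         # one-hot encoding of the labels
print(dirichlet_update(X, alpha=np.ones(K)))
```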
  • 13. Univariate and Multivariate Gaussians.
Univariate Gaussian ($N(x|\mu,\sigma^2) = N(x|\mu,\beta^{-1})$, with precision $\beta = 1/\sigma^2$): $N(x|\mu,\sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{1}{2\sigma^2}(x-\mu)^2\right)$ (18), $N(x|\mu,\beta^{-1}) = \left(\frac{\beta}{2\pi}\right)^{1/2}\exp\left(-\frac{\beta}{2}(x-\mu)^2\right)$ (19).
Multivariate Gaussian ($N(x|\mu,\Sigma) = N(x|\mu,\Lambda^{-1})$, with precision matrix $\Lambda = \Sigma^{-1}$): $N(x|\mu,\Sigma) = \frac{1}{(2\pi)^{D/2}\det(\Sigma)^{1/2}}\exp\left(-\frac{1}{2}(x-\mu)^\top\Sigma^{-1}(x-\mu)\right)$ (20), $N(x|\mu,\Lambda^{-1}) = \frac{\det(\Lambda)^{1/2}}{(2\pi)^{D/2}}\exp\left(-\frac{1}{2}(x-\mu)^\top\Lambda(x-\mu)\right)$ (21).
  • 14. Basic Properties.
Mahalanobis distance: using the eigenvalue decomposition of $\Sigma$, we get $\Delta^2 = (x-\mu)^\top\Sigma^{-1}(x-\mu) = \sum_{i=1}^{D}\frac{y_i^2}{\lambda_i}$ (22), where $y_i = u_i^\top(x-\mu)$ and $(\lambda_i, u_i)$ are the eigenpairs of $\Sigma$.
Change of variables in the Gaussian: in the coordinates $y$, $p(y) = p(x)\,|J_{y\to x}| = \prod_{j=1}^{D}\frac{1}{(2\pi\lambda_j)^{1/2}}\exp\left\{-\frac{y_j^2}{2\lambda_j}\right\}$ (23), i.e. a product of $D$ independent univariate Gaussians.
First and second moments of the Gaussian: using the above, $E[x] = \mu$ and $E[xx^\top] = \mu\mu^\top + \Sigma$ (24).
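A quick numerical illustration of equation (22), computing the Mahalanobis distance both directly and through the eigendecomposition of Σ (the covariance values are arbitrary, chosen only for the example):

```python
import numpy as np

mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
x = np.array([2.0, 0.0])

# Direct form of the Mahalanobis distance
d2_direct = (x - mu) @ np.linalg.inv(Sigma) @ (x - mu)

# Eigen-decomposition form: sum_i y_i^2 / lambda_i, equation (22)
lam, U = np.linalg.eigh(Sigma)
y = U.T @ (x - mu)
d2_eig = np.sum(y**2 / lam)

print(d2_direct, d2_eig)   # the two values agree
```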
  • 15. Limitations of the Gaussian and Remedies.
There are two main limitations of the Gaussian: it requires inferring many covariance parameters, and it cannot represent multi-modal distributions. We therefore introduce some auxiliary constructions.
Diagonal covariance: $\Sigma = \mathrm{diag}(s^2)$ (25).
Isotropic covariance: $\Sigma = \sigma^2 I$ (26).
Mixture model: $p(x) = \sum_{k=1}^{K}\pi_k\,p(x\mid k)$ (27).
  • 16. Conditional and Marginal Distributions.
Partitioning the Mahalanobis distance: first partition the covariance and precision matrices, $\Sigma = \begin{pmatrix}\Sigma_{aa} & \Sigma_{ab}\\ \Sigma_{ba} & \Sigma_{bb}\end{pmatrix}$, $\Sigma^{-1} = \Lambda = \begin{pmatrix}\Lambda_{aa} & \Lambda_{ab}\\ \Lambda_{ba} & \Lambda_{bb}\end{pmatrix}$ (28), where the diagonal blocks are symmetric and $\Sigma_{ba} = \Sigma_{ab}^\top$, $\Lambda_{ba} = \Lambda_{ab}^\top$.
The Mahalanobis distance then partitions as $(x-\mu)^\top\Sigma^{-1}(x-\mu) = (x_a-\mu_a)^\top\Lambda_{aa}(x_a-\mu_a) + (x_a-\mu_a)^\top\Lambda_{ab}(x_b-\mu_b) + (x_b-\mu_b)^\top\Lambda_{ba}(x_a-\mu_a) + (x_b-\mu_b)^\top\Lambda_{bb}(x_b-\mu_b)$ (29).
Schur complement: as in Gaussian elimination, block matrices can be inverted via the Schur complement, $\begin{pmatrix}A & B\\ C & D\end{pmatrix}^{-1} = \begin{pmatrix}M & -MBD^{-1}\\ -D^{-1}CM & D^{-1}+D^{-1}CMBD^{-1}\end{pmatrix}$ (30), where $M = (A - BD^{-1}C)^{-1}$.
  • 17. Conditional and Marginal Distributions (cont.).
Schur complement (cont.): applying it to the partitioned matrices gives $\Lambda_{aa} = (\Sigma_{aa} - \Sigma_{ab}\Sigma_{bb}^{-1}\Sigma_{ba})^{-1}$ (31) and $\Lambda_{ab} = -(\Sigma_{aa} - \Sigma_{ab}\Sigma_{bb}^{-1}\Sigma_{ba})^{-1}\Sigma_{ab}\Sigma_{bb}^{-1}$ (32).
Conditional distribution: $x_a|x_b \sim N(x_a|\mu_{a|b}, \Sigma_{a|b})$ (33), where $\mu_{a|b} = \mu_a + \Sigma_{ab}\Sigma_{bb}^{-1}(x_b - \mu_b)$ (34) and $\Sigma_{a|b} = \Sigma_{aa} - \Sigma_{ab}\Sigma_{bb}^{-1}\Sigma_{ba}$ (35).
Marginal distribution: integrating out $x_b$, the quadratic form in $x_a$ is $-\frac{1}{2}x_a^\top(\Lambda_{aa} - \Lambda_{ab}\Lambda_{bb}^{-1}\Lambda_{ba})x_a + x_a^\top(\Lambda_{aa} - \Lambda_{ab}\Lambda_{bb}^{-1}\Lambda_{ba})\mu_a + \text{const}$ (36), so $x_a \sim N(x_a|\mu_a, \Sigma_{aa})$ (37).
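A minimal sketch of equations (34)–(35) for a partitioned Gaussian; the index lists, covariance values, and observed value of $x_b$ are illustrative:

```python
import numpy as np

def gaussian_conditional(mu, Sigma, idx_a, idx_b, x_b):
    """Parameters of p(x_a | x_b) for a joint Gaussian N(mu, Sigma)."""
    mu_a, mu_b = mu[idx_a], mu[idx_b]
    S_aa = Sigma[np.ix_(idx_a, idx_a)]
    S_ab = Sigma[np.ix_(idx_a, idx_b)]
    S_bb = Sigma[np.ix_(idx_b, idx_b)]
    S_bb_inv = np.linalg.inv(S_bb)
    mu_cond = mu_a + S_ab @ S_bb_inv @ (x_b - mu_b)   # equation (34)
    Sigma_cond = S_aa - S_ab @ S_bb_inv @ S_ab.T      # equation (35)
    return mu_cond, Sigma_cond

mu = np.array([0.0, 1.0, 2.0])
Sigma = np.array([[1.0, 0.3, 0.2],
                  [0.3, 2.0, 0.5],
                  [0.2, 0.5, 1.5]])
print(gaussian_conditional(mu, Sigma, [0, 1], [2], np.array([2.5])))
```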
  • 18. Marginal and Conditional Gaussians.
Given a marginal Gaussian for $x$ and a conditional Gaussian for $y$ given $x$ of the form $x \sim N(x|\mu, \Lambda^{-1})$ (38) and $y|x \sim N(y|Ax+b, L^{-1})$ (39), the marginal distribution of $y$ and the conditional distribution of $x$ given $y$ are $y \sim N(y|A\mu + b,\; L^{-1} + A\Lambda^{-1}A^\top)$ (40) and $x|y \sim N(x|\Sigma\{A^\top L(y-b) + \Lambda\mu\},\; \Sigma)$ (41), where $\Sigma = (\Lambda + A^\top LA)^{-1}$ (42).
  • 19. MLE for the Gaussian.
Log-likelihood of the data: by the same argument as for categorical data, $\ln p(D|\mu,\Sigma) = -\frac{ND}{2}\ln 2\pi - \frac{N}{2}\ln|\Sigma| - \frac{1}{2}\sum_{n=1}^{N}(x_n-\mu)^\top\Sigma^{-1}(x_n-\mu)$ (43). The log-likelihood depends on the data only through the sufficient statistics $\sum_{n=1}^{N}x_n$ and $\sum_{n=1}^{N}x_n x_n^\top$ (44).
MLE for the Gaussian: maximizing the log-likelihood gives $\mu_{ML} = \frac{1}{N}\sum_{n=1}^{N}x_n$ (45) and $\Sigma_{ML} = \frac{1}{N}\sum_{n=1}^{N}(x_n-\mu_{ML})(x_n-\mu_{ML})^\top$ (46).
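A compact sketch of equations (45)–(46), assuming the rows of a data matrix are i.i.d. samples; the generating parameters are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
true_mu = np.array([1.0, -2.0])
true_Sigma = np.array([[1.0, 0.4],
                       [0.4, 0.5]])
X = rng.multivariate_normal(true_mu, true_Sigma, size=5000)

mu_ml = X.mean(axis=0)                 # equation (45)
diff = X - mu_ml
Sigma_ml = diff.T @ diff / len(X)      # equation (46); note the 1/N, not 1/(N-1)
print(mu_ml, Sigma_ml, sep="\n")
```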
  • 20. Inference for the Gaussian: Sequential Estimation.
Sequential estimation: since the Gaussian MLE has a closed form, it can be computed sequentially: $\mu_{ML}^{(N)} = \mu_{ML}^{(N-1)} + \frac{1}{N}\left(x_N - \mu_{ML}^{(N-1)}\right)$ (47).
Robbins–Monro algorithm: the same idea generalizes to sequential learning. The Robbins–Monro algorithm finds a root $\theta^\star$ of a regression function $f(\theta) = E[z|\theta] = 0$ by iterating $\theta^{(N)} = \theta^{(N-1)} - a_{N-1}\,z(\theta^{(N-1)})$ (48), where $z(\theta^{(N-1)})$ is the observed value of $z$ when $\theta$ takes the value $\theta^{(N-1)}$, and $\{a_N\}$ is a sequence satisfying $\lim_{N\to\infty} a_N = 0$, $\sum_{N=1}^{\infty} a_N = \infty$, $\sum_{N=1}^{\infty} a_N^2 < \infty$ (49).
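A brief sketch of the sequential update (47), showing that it reproduces the batch sample mean; the data and seed are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(loc=3.0, scale=1.0, size=1000)

mu = 0.0
for n, x_n in enumerate(x, start=1):
    mu = mu + (x_n - mu) / n           # equation (47)

print(mu, x.mean())                     # sequential and batch estimates agree
```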
  • 21. Inference for the Gaussian: Bayesian Inference for the Mean (known variance).
Generalized sequential learning: the Robbins–Monro algorithm applies to sequential maximum likelihood by taking $f(\theta)$ to be the expected gradient of the log-likelihood, i.e. $z(\theta) = -\frac{\partial}{\partial\theta}\ln p(x|\theta)$ (50); in the Gaussian case we set $a_N = \sigma^2/N$.
Bayesian inference for the mean given the variance: since the Gaussian likelihood is the exponential of a quadratic form in $\mu$, we choose a Gaussian prior, $\mu \sim N(\mu|\mu_0, \sigma_0^2)$ (51). The posterior is then $\mu|D \sim N(\mu|\mu_N, \sigma_N^2)$ (52), where $\mu_N = \frac{\sigma^2}{N\sigma_0^2 + \sigma^2}\mu_0 + \frac{N\sigma_0^2}{N\sigma_0^2 + \sigma^2}\mu_{ML}$, $\frac{1}{\sigma_N^2} = \frac{1}{\sigma_0^2} + \frac{N}{\sigma^2}$ (53).
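A short sketch of equation (53), updating a Gaussian prior over the mean with data of known variance; all numerical values are illustrative:

```python
import numpy as np

def posterior_mean_known_variance(x, sigma2, mu0, sigma0_2):
    """Posterior N(mu_N, sigma_N^2) over the mean, equation (53)."""
    N, mu_ml = len(x), np.mean(x)
    mu_N = (sigma2 * mu0 + N * sigma0_2 * mu_ml) / (N * sigma0_2 + sigma2)
    sigma_N2 = 1.0 / (1.0 / sigma0_2 + N / sigma2)
    return mu_N, sigma_N2

rng = np.random.default_rng(5)
data = rng.normal(loc=1.5, scale=1.0, size=30)   # sigma^2 = 1 assumed known
print(posterior_mean_known_variance(data, sigma2=1.0, mu0=0.0, sigma0_2=10.0))
```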
  • 22. Inference for the Gaussian: Bayesian Inference for the Precision (given the mean).
Bayesian inference for the mean given the variance (cont.): (1) the posterior mean is a compromise between the prior mean and the MLE; (2) the posterior precision is the prior precision plus one contribution of the data precision for each observed data point; (3) as $\sigma_0^2 \to \infty$, the posterior mean reduces to the MLE.
Bayesian inference for the precision given the mean: the Gaussian likelihood is proportional to a power of the precision times the exponential of a linear function of the precision, so we choose a Gamma prior, $\mathrm{Gam}(\lambda|a_0,b_0) = \frac{1}{\Gamma(a_0)}\,b_0^{a_0}\lambda^{a_0-1}\exp(-b_0\lambda)$ (54). The posterior is then $\lambda|D \sim \mathrm{Gam}(\lambda|a_N, b_N)$ (55), where $a_N = a_0 + \frac{N}{2}$, $b_N = b_0 + \frac{N}{2}\sigma_{ML}^2$ (56).
  • 23. Inference for the Gaussian: Unknown Mean and Precision.
Interpretation: the parameter $2a_0$ can be read as the number of effective prior observations, and $b_0/a_0$ as their effective variance.
When both the mean and the precision are unknown, applying the same argument jointly gives the normal–gamma prior $p(\mu,\lambda) = N(\mu|\mu_0, (\beta\lambda)^{-1})\,\mathrm{Gam}(\lambda|a,b)$ (57), where $\mu_0 = c/\beta$, $a = 1 + \beta/2$, $b = d - c^2/2\beta$ (58). Note that the precision of $\mu$ is a linear function of $\lambda$.
In the multivariate case we similarly obtain the normal–Wishart prior $p(\mu,\Lambda|\mu_0,\beta,W,\nu) = N(\mu|\mu_0, (\beta\Lambda)^{-1})\,\mathcal{W}(\Lambda|W,\nu)$ (59), where $\mathcal{W}$ is the Wishart distribution.
  • 24. Student's t-Distribution.
Univariate t-distribution: integrating out the precision under a Gamma prior gives the t-distribution $\mathrm{St}(x|\mu,\lambda,\nu) = \frac{\Gamma(\nu/2 + 1/2)}{\Gamma(\nu/2)}\left(\frac{\lambda}{\pi\nu}\right)^{1/2}\left[1 + \frac{\lambda(x-\mu)^2}{\nu}\right]^{-\nu/2-1/2}$ (60), where $\nu = 2a$ (degrees of freedom) and $\lambda = a/b$. The t-distribution can be viewed as an infinite mixture of Gaussians; because it has fatter tails than a Gaussian, it yields more robust estimates.
Multivariate t-distribution: the multivariate infinite mixture of Gaussians gives $\mathrm{St}(x|\mu,\Lambda,\nu) = \frac{\Gamma(\nu/2 + D/2)}{\Gamma(\nu/2)}\frac{\det(\Lambda)^{1/2}}{(\pi\nu)^{D/2}}\left[1 + \frac{\Delta^2}{\nu}\right]^{-\nu/2-D/2}$ (61), where $\Delta^2 = (x-\mu)^\top\Lambda(x-\mu)$.
  • 25. The Exponential Family.
The exponential family of distributions over $x$, given parameters $\eta$, is defined to be the set of distributions of the form $p(x|\eta) = g(\eta)\,h(x)\exp\{\eta^\top u(x)\}$ (62), where $\eta$ are the natural parameters of the distribution and $u(x)$ is a function of $x$. The function $g(\eta)$ can be interpreted as the normalization factor.
  • 26. Sigmoid and Softmax.
Logistic sigmoid: for the Bernoulli distribution the mean parameter is $\mu$, while the natural parameter is $\eta$. The two are connected by $\eta = \ln\left(\frac{\mu}{1-\mu}\right)$, $\mu = \sigma(\eta) = \frac{\exp(\eta)}{1+\exp(\eta)}$ (63), and $\sigma(\eta)$ is called the sigmoid function.
Softmax function: by the same argument, the relationship between the multinoulli parameters and the natural parameters is the softmax function $\mu_k = \frac{\exp(\eta_k)}{\sum_{j=1}^{K}\exp(\eta_j)}$ (64). In this case $u(x) = x$, $h(x) = 1$, and $g(\eta) = \left(\sum_{j=1}^{K}\exp(\eta_j)\right)^{-1}$.
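A small sketch of equations (63)–(64); the shift by the maximum inside the softmax is a standard numerical safeguard and not part of the slide's derivation:

```python
import numpy as np

def sigmoid(eta):
    # mu = exp(eta) / (1 + exp(eta)), equation (63)
    return 1.0 / (1.0 + np.exp(-eta))

def softmax(eta):
    # mu_k = exp(eta_k) / sum_j exp(eta_j), equation (64)
    z = eta - np.max(eta)          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

print(sigmoid(0.0))                         # 0.5
print(softmax(np.array([1.0, 2.0, 3.0])))   # sums to 1
```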
  • 27. Gaussian as a Member of the Exponential Family.
The Gaussian can also be written in exponential-family form with $u(x) = \begin{pmatrix} x \\ x^2 \end{pmatrix}$ (65), $\eta = \begin{pmatrix} \mu/\sigma^2 \\ -1/2\sigma^2 \end{pmatrix}$ (66), and $g(\eta) = (-2\eta_2)^{1/2}\exp\left(\frac{\eta_1^2}{4\eta_2}\right)$ (67).
  • 28. MLE for the Exponential Family.
The problem of estimating the natural parameter: the MLE argument generalizes to the whole family. First, consider the log-likelihood of the data, $\ln p(D|\eta) = \sum_{n=1}^{N}\ln h(x_n) + N\ln g(\eta) + \eta^\top\sum_{n=1}^{N}u(x_n)$ (68). Next, find its stationary point: $N\nabla_\eta\ln g(\eta) + \sum_{n=1}^{N}u(x_n) = 0$ (69). Therefore the MLE satisfies $-\nabla_\eta\ln g(\eta_{ML}) = \frac{1}{N}\sum_{n=1}^{N}u(x_n)$ (70).
The solution depends on the data only through $\sum_n u(x_n)$, which is therefore called the sufficient statistic of the exponential family.
  • 29. Conjugate Priors for the Exponential Family.
Conjugate prior: for any member of the exponential family there exists a conjugate prior that can be written in the form $p(\eta|\chi,\nu) = f(\chi,\nu)\,g(\eta)^{\nu}\exp\{\nu\,\eta^\top\chi\}$ (71), where $f(\chi,\nu)$ is a normalization factor and $g(\eta)$ is the same function as in the exponential family.
Posterior distribution: with this prior, $p(\eta|D,\chi,\nu) \propto g(\eta)^{\nu+N}\exp\left\{\eta^\top\left(\sum_{n=1}^{N}u(x_n) + \nu\chi\right)\right\}$ (72). The parameter $\nu$ can therefore be interpreted as an effective number of pseudo-observations in the prior, each having sufficient statistic $u(x)$ equal to $\chi$.
  • 30. Noninformative Priors.
Noninformative priors: we may seek a form of prior distribution, called a noninformative prior, intended to have as little influence on the posterior distribution as possible.
Generalizations of noninformative priors: this idea leads to two generalizations, namely the principle of transformation groups, as in the Jeffreys prior, and the principle of maximum entropy.
  • 31. Histogram Technique.
Histogram technique: standard histograms partition $x$ into distinct bins of width $\Delta_i$ and count the number $n_i$ of observations of $x$ falling in bin $i$. To turn these counts into a normalized probability density, we divide by the total number $N$ of observations and by the bin width $\Delta_i$, giving the probability value for each bin $p_i = \frac{n_i}{N\Delta_i}$ (73).
Limitations of the histogram: the estimated density has discontinuities at the bin edges that are artifacts of the binning rather than properties of the underlying distribution, and the histogram approach scales poorly with dimensionality.
Lessons of the histogram: first, to estimate the probability density at a particular location we should consider the data points that lie within some local neighbourhood of that point; second, the value of the smoothing parameter should be neither too large nor too small in order to obtain good results.
  • 32. Local Density Estimation: Motivation.
Motivation: for large $N$, the binomial distribution of the number $K$ of data points falling within a small region $\mathcal{R}$ is sharply peaked around its mean, so $K \simeq NP$ (74), where $P$ is the probability mass of $\mathcal{R}$. If we also assume that $\mathcal{R}$ is sufficiently small that the probability density $p(x)$ is roughly constant over the region, then $P \simeq p(x)V$ (75), where $V$ is the volume of $\mathcal{R}$. Therefore, $p(x) \simeq \frac{K}{NV}$ (76).
Note the two assumptions: $\mathcal{R}$ must be sufficiently small that the density is approximately constant over the region, and yet sufficiently large that the number $K$ of points falling inside it is enough for the binomial distribution to be sharply peaked.
  • 33. Kernel Density Estimation (KDE).
Kernel density estimation: if we fix $V$ and determine $K$ from the data, we obtain the kernel approach. For instance, count data points with the function $k(u) = 1$ if $|u_i| \le 1/2$ for all $i = 1,\ldots,D$, and $0$ otherwise (77), known as a Parzen window. The number of points inside a cube of side $h$ centred at $x$ is then $K = \sum_{n=1}^{N}k\left(\frac{x - x_n}{h}\right)$ (78), which leads to the density estimate $p(x) = \frac{1}{N}\sum_{n=1}^{N}\frac{1}{h^D}\,k\left(\frac{x - x_n}{h}\right)$ (79).
We can also use other kernels, such as a Gaussian kernel, in which case $p(x) = \frac{1}{N}\sum_{n=1}^{N}\frac{1}{(2\pi h^2)^{D/2}}\exp\left\{-\frac{\|x - x_n\|^2}{2h^2}\right\}$ (80).
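A compact sketch of the Gaussian kernel estimate in equation (80); the bandwidth value and the 1-D test data are arbitrary and would normally be tuned:

```python
import numpy as np

def gaussian_kde(x_query, X, h):
    """Evaluate equation (80) at the points in x_query, given samples X."""
    N, D = X.shape
    # Squared distances between each query point and each sample
    d2 = ((x_query[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    norm = (2 * np.pi * h**2) ** (D / 2)
    return np.exp(-d2 / (2 * h**2)).sum(axis=1) / (N * norm)

rng = np.random.default_rng(6)
samples = rng.normal(size=(500, 1))           # 1-D standard normal data
grid = np.linspace(-3, 3, 5)[:, None]
print(gaussian_kde(grid, samples, h=0.3))     # roughly follows the N(0,1) density
```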
  • 34. Nearest-Neighbour Methods.
Limitation of KDE: one difficulty with the kernel approach to density estimation is that the bandwidth $h$ is fixed for all kernels. In regions of high data density a large $h$ may over-smooth, while in regions of low density a small $h$ may lead to noisy, overfitted estimates; the optimal choice of $h$ may therefore depend on the location within the data space.
Nearest neighbours (NN): instead, we fix $K$ and use the data to find an appropriate $V$, which gives the $K$-nearest-neighbour method. Here $K$ governs the degree of smoothing and must be chosen by hyper-parameter optimization.
Error of $K$-NN: for sufficiently large $N$, the error rate of the nearest-neighbour ($K=1$) classifier is never more than twice the minimum achievable error rate of an optimal classifier.
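A short sketch of the $K$-NN density idea $p(x) \simeq K/(NV)$ from equation (76), taking $V$ to be the volume of the smallest ball around the query point that contains the $K$ nearest samples; this is a standard illustrative construction rather than an estimator spelled out on the slides:

```python
import numpy as np
from math import gamma, pi

def knn_density(x_query, X, K):
    """K-NN density estimate p(x) = K / (N * V), with V the ball reaching the K-th neighbour."""
    N, D = X.shape
    dists = np.sort(np.linalg.norm(X - x_query, axis=1))
    r = dists[K - 1]                                   # radius to the K-th nearest sample
    V = (pi ** (D / 2) / gamma(D / 2 + 1)) * r ** D    # volume of a D-ball of radius r
    return K / (N * V)

rng = np.random.default_rng(7)
data = rng.normal(size=(1000, 2))
print(knn_density(np.zeros(2), data, K=20))   # near the N(0, I) density at the origin (~0.159)
```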