SlideShare ist ein Scribd-Unternehmen logo
1 von 30
Downloaden Sie, um offline zu lesen
Bregman divergences from comparative convexity
Frank Nielsen Richard Nock
2017
1
Convexity, and
Jensen/Bregman
divergences
2
Midpoint convexity, continuity and convexity
In 1906, Jensen [9] introduced the midpoint convex
property of a function F:
F(p) + F(q)
2
≥ F
p + q
2
, ∀p, q ∈ X.
A continuous function F satisfying the midpoint convexity
implies it also obeys the convexity property [14]:
∀p, q, ∀λ ∈ [0, 1], F(λp + (1 − λ)q) ≤ λF(p) + (1 − λ)F(q).
When inequality is strict for distinct points and λ ∈ (0, 1), this
inequality defines the strict convex property of F.
A function satisfying only the midpoint convexity inequality
may not be continuous [11] (and hence not convex).
3
Jensen divergences and skew Jensen divergences
A divergence D(p, q) is proper iff D(p, q) ≥ 0 with equality iff
p = q.
Jensen divergence (Burbea and Rao divergence [6]) defined for
a strictly convex function F ∈ F<:
JF (p, q):=
F(p) + F(q)
2
− F
p + q
2
.
Skew Jensen divergences [19, 15] for α ∈ (0, 1):
JF,α(p : q):=(1 − α)F(p) + αF(q) − F((1 − α)p + αq),
with JF,α(q : p) = JF,1−α(p : q).
4
Bregman divergences
Bregman divergences [4] (1967) for a strictly convex and
differentiable generator F:
BF (p : q):=F(p) − F(q) − (p − q) F(q).
Scaled skew Jensen divergences [19, 15]:
JF,α(p : q):=
1
α(1 − α)
JF,α(p : q),
with limit cases recovering Bregman divergences:
lim
α→1−
JF,α(p : q) = BF (p : q),
lim
α→0+
JF,α(p : q) = BF (q : p).
5
Comparative convexity
and generalized
divergences
6
Jensen midpoint convexity wrt comparative convexity [14]
Let M(·, ·) and N(·, ·) be two abstract mean functions [5].
min{p, q} ≤ M(p, q) ≤ max{p, q}. (1)
Functions F ∈ C≤
M,N is said midpoint (M, N)-convex iff
∀p, q ∈ X, F(M(p, q)) ≤ N(F(p), F(q)).
When M(p, q) = N(p, q) = A(p, q) = p+q
2 are the arithmetic
means, recover the ordinary Jensen midpoint convexity:
∀p, q ∈ X, F
p + q
2
≤
F(p) + F(q)
2
Note that there exist discontinuous functions that satisfy the
Jensen midpoint convexity.
7
Means, regular means,
and weighted means
8
Regular means
A mean is said regular if it is
homogeneous: M(λp, λq) = λM(p, q) for any λ > 0
symmetric: M(p, q) = M(q, p)
continuous,
increasing in each variable.
Examples of regular means: power means or Hölder means [8]:
Pδ(x, y) =
xδ + yδ
2
1
δ
, P0(x, y) = G(x, y) =
√
xy
belong to a broader family of quasi-arithmetic means. For a
continuous and strictly increasing function f : I ⊂ R → J ⊂ R:
Mf (p, q):=f −1 f (p) + f (q)
2
.
also called Kolmogorov-Nagumo-de Finetti means [10, 13, 7].
9
Examples of quasi-arithmetic means: Pythagorean means
f (x) = x, arithmetic mean (A):
A(p, q) =
p + q
2
f (x) = log x, geometric mean (G):
G(p, q) =
√
pq
f (x) = 1
x , harmonic mean (H):
H(p, q) =
2
1
p + 1
q
=
2pq
p + q
10
Lagrange means [3]
Lagrange means [3] (Lagrangean means) are derived from the mean
value theorem.
Assume wlog that p < q so that the mean m ∈ [p, q].
From the mean value theorem, we have for a differentiable function
f :
∃λ ∈ [p, q] : f (λ) =
f (q) − f (p)
q − p
.
Thus when f is a monotonic function, its inverse function f −1 is
well-defined, and the unique mean value mean λ ∈ [p, q] can be
defined as:
Lf (p, q) = λ = (f )−1 f (q) − f (p)
q − p
.
11
An example of a Lagrange mean [3]
For example, f (x) = log(x) and f (x) = (f )−1(x) = 1
x , we get the
logarithmic mean (L), that is not a quasi-arithmetic mean:
m(p, q) =



0 if p = 0 or q = 0,
x if p = q,
q−p
log q−log p otherwise,
12
Cauchy means [12]
Two positive differentiable and strictly monotonic functions f
and g such that f
g has an inverse function.
The Cauchy mean-value mean:
Cf ,g (p, q) =
f
g
−1
f (q) − f (p)
g(q) − g(p)
, q = p,
with Cf ,g (p, p) = p.
Cauchy means as Lagrange means [12] by the following
identity:
Cf ,g (p, q) = Lf ◦g−1 (g(p), g(q))
since ((f ◦ g−1)(x)) = f (g−1(x))
g (g−1(x))
:
Proof:
Lf ◦g−1 (g(p), g(q)) = ((f ◦ g
−1
) (g(x)))
−1 (f ◦ g−1
)(g(q)) − (f ◦ g−1
)(g(p))
g(q) − g(p)
,
=
f (g−1
(g(x)))
g (g−1(g(x)))
−1
f (q) − f (p)
g(q) − g(p)
,
= Cf ,g (p, q).
13
Stolarsky regular means
The Stolarsky regular means are not quasi-arithmetic means nor
mean-value means [5], and are defined by:
Sp(x, y) =
xp − yp
p(x − y)
1
p−1
, p ∈ {0, 1}.
In limit cases, the Stolarsky family of means yields the logarithmic
mean (L) when p → 0:
L(x, y) =
y − x
log y − log x
,
and the identric mean (I) when p → 1:
I(x, y) =
yy
xx
1
y−x
.
14
Lehmer and Gini (non-regular) means
Weighted Lehmer mean [2] of order δ is defined for δ ∈ R
as:
Lδ(x1, . . . , xn; w1, . . . , wn) =
n
i=1 wi xδ+1
i
n
i=1 wi xδ
i
.
The Lehmer means intersect with the Hölder means only for
the arithmetic (A), geometric (G) and harmonic (H) means.
Lehmer barycentric means belong to Gini means:
Gδ1,δ2 (x1, . . . , xn; w1, . . . , wn) =
n
i=1 wi xδ1
i
n
i=1 wi xδ2
i
1
δ1−δ2
,
when δ1 = δ2, and
Gδ1,δ2 (x1, . . . , xn; w1, . . . , wn) =
n
i=1
x
wi xδ
i
i
1
n
i=1
wi xδ
i
,
when δ1 = δ2 = δ.
15
Lehmer and Gini (non-regular) means
Those families of Gini and Lehmer means are (positively)
homogeneous means:
Gδ1,δ2 (λx1, . . . , λxn; w1, . . . , wn) = λGδ1,δ2 (x1, . . . , xn; w1, . . . , wn),
for any λ > 0.
The family of Gini means include the power means: G0,δ = Pδ
for δ ≤ 0 and Gδ,0 = Pδ for δ ≥ 0.
The Lehmer and Gini means are not always regular since L2 is
not regular.
16
Weighted abstract means and (M, N)-convexity
Consider weighted means M and N,
Mα(p, q):=M(p, q; 1 − α, α).
Function F is (M, N)-convex if and only if:
F(M(p, q; 1 − α, α)) ≤ N(F(p), F(q); 1 − α, α).
For regular means and continuous function F, define the skew
(M, N)-Jensen divergence:
JM,N
F,α (p : q) = Nα(F(p), F(q)) − F(Mα(p, q)) ≥ 0
We have JM,N
F,α (q, p) = JM,N
F,1−α(p : q)
Continuous function + midpoint convexity = (M, N)-convexity
property (Theorem A of [14]):
Nα(F(p), F(q)) ≥ F(Mα(p, q)), ∀p, q ∈ X, ∀α ∈ [0, 1].
17
Generalized Bregman
divergences
18
Generalized Bregman divergences wrt comparative convexity
Define the (M, N)-Bregman divergence for regular means as
limit of scaled skew (M, N)-Jensen divergences:
BM,N
F (p : q) = lim
α→1−
1
α(1 − α)
JM,N
F,α (p : q),
= lim
α→1−
1
α(1 − α)
(Nα(F(p), F(q))) − F(Mα(p, q)))
BM,N
F (q : p) = lim
α→0+
1
α(1 − α)
JM,N
F,α (p : q)
Need to prove that the limit exists and that the generalized
Bregman divergences are proper: BM,N
F (q : p) ≥ 0 with
equality iff p = q.
19
Quasi-arithmetic Bregman divergences
Theorem (Quasi-arithmetic Bregman divergences)
Let F : I ⊂ R → R be a real-valued strictly (ρ, τ)-convex function
defined on an interval I for two strictly monotone and differentiable
functions ρ and τ (with ρ and τ the respective derivatives). The
Quasi-Arithmetic Bregman divergence (QABD) induced by the
comparative convexity is:
Bρ,τ
F (p : q) =
τ(F(p)) − τ(F(q))
τ (F(q))
−
ρ(p) − ρ(q)
ρ (q)
F (q),
= κτ (F(q) : F(p)) − κρ(q : p)F (q),
where primes ’ denote derivatives and
κγ(x : y) =
γ(y) − γ(x)
γ (x)
.
20
Quasi-arithmetic Bregman divergences (proof 1/2)
First-order Taylor expansion of τ−1(x) at x0:
τ−1
(x) x0 τ−1
(x0) + (x − x0)(τ−1
) (x0).
Using the derivative of an inverse function (τ−1) (x) = 1
(τ (τ−1)(x))
,
it follows:
τ−1
(x) τ−1
(x0) + (x − x0)
1
(τ (τ−1)(x0))
.
Plugging x0 = τ(p) and x = τ(p) + α(τ(q) − τ(p)), get a
first-order approximation of the weighted quasi-arithmetic mean
Mτ,α when α → 0:
Mτ,α(p, q) p +
α(τ(q) − τ(p))
τ (p)
.
21
Quasi-arithmetic Bregman divergences (proof 2/2)
Comparative convexity skew Jensen Divergence defined by:
Jτ,ρ
F,α(p : q) = Mτ,α(F(p), F(q)) − F(Mρ,α(p, q))
Apply a first-order Taylor expansion to get
F(Mρ,α(p, q))) F p +
α(ρ(q) − ρ(p))
ρ (p)
F(p) +
α(τ(q) − τ(p))
τ (p)
F (p).
It follows the quasi-arithmetic Bregman divergence:
Bρ,τ
F (q : p) = lim
α→0
J
τ,ρ
α (p : q) =
τ(F(q)) − τ(F(p))
τ (F(p))
−
ρ(q) − ρ(p)
ρ (p)
F (p)
similar for reverse quasi-arithmetic Bregman divergence.
22
Quasi-arithmetic Bregman divergences are proper
Ordinary Bregman divergence on the convex generator
G(x) = τ(F(ρ−1(x))) for a (ρ, τ)-convex function F:
G (x) = τ(F(ρ−1
(x))) =
1
(ρ (ρ−1)(x))
F (ρ−1
(x))τ (F(ρ−1
(x))).
Get an ordinary Bregman divergence that is, in general, different
from the generalized quasi-arithmetic Bregman divergence
(BG (p : q) = Bρ,τ
F (p : q)) :
BG (p : q) = τ(F(ρ−1
(p))) − τ(F(ρ−1
(q)))
−(p − q)
F (ρ−1(q))τ (F(ρ−1(q)))
(ρ (ρ−1)(q))
Sanity check: BG (p : q) = Bρ,τ
F (p : q) when ρ(x) = τ(x) = x.
Remarkable identity:
Bρ,τ
F (p : q) =
1
τ (F(q))
BG (ρ(p) : ρ(q)).
23
Power mean Bregman divergences
A subfamily of quasi-arithmetic Bregman divergences obtained for
the generators ρ(x) = xδ1 and τ(x) = xδ2 :
Corollary (Power Mean Bregman Divergences)
For δ1, δ2 ∈ R{0} with F ∈ CPδ1
,Pδ2
, get the family of Power
Mean Bregman Divergences (PMBDs):
Bδ1,δ2
F (p : q) =
Fδ2 (p) − Fδ2 (q)
δ2Fδ2−1(q)
−
pδ1 − qδ1
δ1qδ1−1
F (q)
Sanity check for δ1 = δ2 = 1: recover the ordinary Bregman
divergence.
Quasi-arithmetic to ordinary convexity criterion
To check whether a function F is (M, N)-convex or not for
quasi-arithmetic means Mρ and Mτ , we use an equivalent test to
ordinary convexity:
Lemma ((ρ, τ)-convexity ↔ ordinary convexity [1])
Let ρ : I → R and τ : J → R be two continuous and strictly
monotone real-valued functions with τ increasing, then function
F : I → J is (ρ, τ)-convex iff function G = Fρ,τ = τ ◦ F ◦ ρ−1 is
(ordinary) convex on ρ(I).
F ∈ C≤
ρ,τ ⇔ G = τ ◦ F ◦ ρ−1
∈ C≤
Quasi-arithmetic to ordinary convexity criterion (proof)
Let us rewrite the (ρ, τ)-convexity midpoint inequality as follows:
F(Mρ(x, y)) ≤ Mτ (F(x), F(y)),
F ρ−1 ρ(x) + ρ(y)
2
≤ τ−1 τ(F(x)) + τ(F(y))
2
,
Since τ is strictly increasing, we have:
(τ ◦ F ◦ ρ−1
)
ρ(x) + ρ(y)
2
≤
(τ ◦ F)(x) + (τ ◦ F)(y)
2
.
Let u = ρ(x) and v = ρ(y) so that x = ρ−1(u) and y = ρ−1(v)
(with u, v ∈ ρ(I)). Then it comes that:
(τ ◦ F ◦ ρ−1
)
u + v
2
≤
(τ ◦ F ◦ ρ−1)(u) + (τ ◦ F ◦ ρ−1)(v)
2
.
This last inequality is precisely the ordinary midpoint convexity
inequality for function G = Fρ,τ = τ ◦ F ◦ ρ−1. Thus a function F is
(ρ, τ)-convex iff G = τ ◦ F ◦ ρ−1 is ordinary convex, and vice-versa.
26
Summary of contributions [17, 16]
defined generalized Jensen divergences and generalized
Bregman divergences using comparative convexity
reported an explicit formula for the quasi-arithmetic Bregman
divergences (QABD)
The QABD can be interpreted as a conformal Bregman
divergence [18] on the ρ-representation
emphasizes that the theory of means [5] is at the very heart of
distances.
see arXiv report for furthre results including a generalization of
Bhattacharyya statistical distance using comparable means
27
References I
[1] John Aczél.
A generalization of the notion of convex functions.
Det Kongelige Norske Videnskabers Selskabs Forhandlinger, Trondheim, 19(24):87–90, 1947.
[2] Gleb Beliakov, Humberto Bustince Sola, and Tomasa Calvo Sánchez.
A practical guide to averaging functions, volume 329.
Springer, 2015.
[3] Lucio R Berrone and Julio Moro.
Lagrangian means.
Aequationes Mathematicae, 55(3):217–226, 1998.
[4] Lev M Bregman.
The relaxation method of finding the common point of convex sets and its application to the
solution of problems in convex programming.
USSR computational mathematics and mathematical physics, 7(3):200–217, 1967.
[5] Peter S Bullen, Dragoslav S Mitrinovic, and M Vasic.
Means and their Inequalities, volume 31.
Springer Science & Business Media, 2013.
[6] Jacob Burbea and C Rao.
On the convexity of some divergence measures based on entropy functions.
IEEE Transactions on Information Theory, 28(3):489–495, 1982.
[7] Bruno De Finetti.
Sul concetto di media.
3:369Ű396, 1931.
Istituto italiano degli attuari.
[8] Otto Ludwig Holder.
Über einen Mittelwertssatz.
Nachr. Akad. Wiss. Gottingen Math.-Phys. Kl., pages 38–47, 1889.
28
References II
[9] Johan Ludwig William Valdemar Jensen.
Sur les fonctions convexes et les inégalités entre les valeurs moyennes.
Acta mathematica, 30(1):175–193, 1906.
[10] Andrey Nikolaevich Kolmogorov.
Sur la notion de moyenne.
12:388Ű391, 1930.
Acad. Naz. Lincei Mem. Cl. Sci. His. Mat. Natur. Sez.
[11] Gyula Maksa and Zsolt Páles.
Convexity with respect to families of means.
Aequationes mathematicae, 89(1):161–167, 2015.
[12] Janusz Matkowski.
On weighted extensions of Cauchy’s means.
Journal of mathematical analysis and applications, 319(1):215–227, 2006.
[13] Mitio Nagumo.
Über eine klasse der mittelwerte.
In Japanese journal of mathematics: transactions and abstracts, volume 7, pages 71–79. The
Mathematical Society of Japan, 1930.
[14] Constantin P. Niculescu and Lars-Erik Persson.
Convex functions and their applications: A contemporary approach.
Springer Science & Business Media, 2006.
[15] Frank Nielsen and Sylvain Boltz.
The Burbea-Rao and Bhattacharyya centroids.
IEEE Transactions on Information Theory, 57(8):5455–5466, 2011.
29
References III
[16] Frank Nielsen and Richard Nock.
Generalizing Jensen and Bregman divergences with comparative convexity and the statistical
Bhattacharyya distances with comparable means.
CoRR, abs/1702.04877, 2017.
[17] Frank Nielsen and Richard Nock.
Generalizing skew Jensen divergences and Bregman divergences with comparative convexity.
IEEE Signal Processing Letters, 24(8):1123–1127, Aug 2017.
[18] Richard Nock, Frank Nielsen, and Shun-ichi Amari.
On conformal divergences and their population minimizers.
IEEE Trans. Information Theory, 62(1):527–538, 2016.
[19] Jun Zhang.
Divergence function, duality, and convex analysis.
Neural Computation, 16(1):159–195, 2004.
30

Weitere ähnliche Inhalte

Was ist angesagt?

Clustering in Hilbert simplex geometry
Clustering in Hilbert simplex geometryClustering in Hilbert simplex geometry
Clustering in Hilbert simplex geometryFrank Nielsen
 
A sharp nonlinear Hausdorff-Young inequality for small potentials
A sharp nonlinear Hausdorff-Young inequality for small potentialsA sharp nonlinear Hausdorff-Young inequality for small potentials
A sharp nonlinear Hausdorff-Young inequality for small potentialsVjekoslavKovac1
 
A T(1)-type theorem for entangled multilinear Calderon-Zygmund operators
A T(1)-type theorem for entangled multilinear Calderon-Zygmund operatorsA T(1)-type theorem for entangled multilinear Calderon-Zygmund operators
A T(1)-type theorem for entangled multilinear Calderon-Zygmund operatorsVjekoslavKovac1
 
ABC based on Wasserstein distances
ABC based on Wasserstein distancesABC based on Wasserstein distances
ABC based on Wasserstein distancesChristian Robert
 
On Twisted Paraproducts and some other Multilinear Singular Integrals
On Twisted Paraproducts and some other Multilinear Singular IntegralsOn Twisted Paraproducts and some other Multilinear Singular Integrals
On Twisted Paraproducts and some other Multilinear Singular IntegralsVjekoslavKovac1
 
Tailored Bregman Ball Trees for Effective Nearest Neighbors
Tailored Bregman Ball Trees for Effective Nearest NeighborsTailored Bregman Ball Trees for Effective Nearest Neighbors
Tailored Bregman Ball Trees for Effective Nearest NeighborsFrank Nielsen
 
Multilinear Twisted Paraproducts
Multilinear Twisted ParaproductsMultilinear Twisted Paraproducts
Multilinear Twisted ParaproductsVjekoslavKovac1
 
Murphy: Machine learning A probabilistic perspective: Ch.9
Murphy: Machine learning A probabilistic perspective: Ch.9Murphy: Machine learning A probabilistic perspective: Ch.9
Murphy: Machine learning A probabilistic perspective: Ch.9Daisuke Yoneoka
 
Multilinear singular integrals with entangled structure
Multilinear singular integrals with entangled structureMultilinear singular integrals with entangled structure
Multilinear singular integrals with entangled structureVjekoslavKovac1
 
Hyperfunction method for numerical integration and Fredholm integral equation...
Hyperfunction method for numerical integration and Fredholm integral equation...Hyperfunction method for numerical integration and Fredholm integral equation...
Hyperfunction method for numerical integration and Fredholm integral equation...HidenoriOgata
 
ABC with Wasserstein distances
ABC with Wasserstein distancesABC with Wasserstein distances
ABC with Wasserstein distancesChristian Robert
 
Clustering in Hilbert geometry for machine learning
Clustering in Hilbert geometry for machine learningClustering in Hilbert geometry for machine learning
Clustering in Hilbert geometry for machine learningFrank Nielsen
 
Density theorems for anisotropic point configurations
Density theorems for anisotropic point configurationsDensity theorems for anisotropic point configurations
Density theorems for anisotropic point configurationsVjekoslavKovac1
 
A Szemeredi-type theorem for subsets of the unit cube
A Szemeredi-type theorem for subsets of the unit cubeA Szemeredi-type theorem for subsets of the unit cube
A Szemeredi-type theorem for subsets of the unit cubeVjekoslavKovac1
 

Was ist angesagt? (20)

Clustering in Hilbert simplex geometry
Clustering in Hilbert simplex geometryClustering in Hilbert simplex geometry
Clustering in Hilbert simplex geometry
 
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
 
A sharp nonlinear Hausdorff-Young inequality for small potentials
A sharp nonlinear Hausdorff-Young inequality for small potentialsA sharp nonlinear Hausdorff-Young inequality for small potentials
A sharp nonlinear Hausdorff-Young inequality for small potentials
 
A T(1)-type theorem for entangled multilinear Calderon-Zygmund operators
A T(1)-type theorem for entangled multilinear Calderon-Zygmund operatorsA T(1)-type theorem for entangled multilinear Calderon-Zygmund operators
A T(1)-type theorem for entangled multilinear Calderon-Zygmund operators
 
ABC based on Wasserstein distances
ABC based on Wasserstein distancesABC based on Wasserstein distances
ABC based on Wasserstein distances
 
On Twisted Paraproducts and some other Multilinear Singular Integrals
On Twisted Paraproducts and some other Multilinear Singular IntegralsOn Twisted Paraproducts and some other Multilinear Singular Integrals
On Twisted Paraproducts and some other Multilinear Singular Integrals
 
Tailored Bregman Ball Trees for Effective Nearest Neighbors
Tailored Bregman Ball Trees for Effective Nearest NeighborsTailored Bregman Ball Trees for Effective Nearest Neighbors
Tailored Bregman Ball Trees for Effective Nearest Neighbors
 
Multilinear Twisted Paraproducts
Multilinear Twisted ParaproductsMultilinear Twisted Paraproducts
Multilinear Twisted Paraproducts
 
Murphy: Machine learning A probabilistic perspective: Ch.9
Murphy: Machine learning A probabilistic perspective: Ch.9Murphy: Machine learning A probabilistic perspective: Ch.9
Murphy: Machine learning A probabilistic perspective: Ch.9
 
Multilinear singular integrals with entangled structure
Multilinear singular integrals with entangled structureMultilinear singular integrals with entangled structure
Multilinear singular integrals with entangled structure
 
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
 
Hyperfunction method for numerical integration and Fredholm integral equation...
Hyperfunction method for numerical integration and Fredholm integral equation...Hyperfunction method for numerical integration and Fredholm integral equation...
Hyperfunction method for numerical integration and Fredholm integral equation...
 
ABC with Wasserstein distances
ABC with Wasserstein distancesABC with Wasserstein distances
ABC with Wasserstein distances
 
Clustering in Hilbert geometry for machine learning
Clustering in Hilbert geometry for machine learningClustering in Hilbert geometry for machine learning
Clustering in Hilbert geometry for machine learning
 
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
 
Density theorems for anisotropic point configurations
Density theorems for anisotropic point configurationsDensity theorems for anisotropic point configurations
Density theorems for anisotropic point configurations
 
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
 
A Szemeredi-type theorem for subsets of the unit cube
A Szemeredi-type theorem for subsets of the unit cubeA Szemeredi-type theorem for subsets of the unit cube
A Szemeredi-type theorem for subsets of the unit cube
 
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
 
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
 

Ähnlich wie Bregman divergences from comparative convexity

Litvinenko_RWTH_UQ_Seminar_talk.pdf
Litvinenko_RWTH_UQ_Seminar_talk.pdfLitvinenko_RWTH_UQ_Seminar_talk.pdf
Litvinenko_RWTH_UQ_Seminar_talk.pdfAlexander Litvinenko
 
590-Article Text.pdf
590-Article Text.pdf590-Article Text.pdf
590-Article Text.pdfBenoitValea
 
590-Article Text.pdf
590-Article Text.pdf590-Article Text.pdf
590-Article Text.pdfBenoitValea
 
THE CHORD GAP DIVERGENCE AND A GENERALIZATION OF THE BHATTACHARYYA DISTANCE
THE CHORD GAP DIVERGENCE AND A GENERALIZATION OF THE BHATTACHARYYA DISTANCETHE CHORD GAP DIVERGENCE AND A GENERALIZATION OF THE BHATTACHARYYA DISTANCE
THE CHORD GAP DIVERGENCE AND A GENERALIZATION OF THE BHATTACHARYYA DISTANCEFrank Nielsen
 
On Clustering Histograms with k-Means by Using Mixed α-Divergences
 On Clustering Histograms with k-Means by Using Mixed α-Divergences On Clustering Histograms with k-Means by Using Mixed α-Divergences
On Clustering Histograms with k-Means by Using Mixed α-DivergencesFrank Nielsen
 
Finance Enginering from Columbia.pdf
Finance Enginering from Columbia.pdfFinance Enginering from Columbia.pdf
Finance Enginering from Columbia.pdfCarlosLazo45
 
Quantitative norm convergence of some ergodic averages
Quantitative norm convergence of some ergodic averagesQuantitative norm convergence of some ergodic averages
Quantitative norm convergence of some ergodic averagesVjekoslavKovac1
 
Group theory notes
Group theory notesGroup theory notes
Group theory notesmkumaresan
 
Note on Character Theory-summer 2013
Note on Character Theory-summer 2013Note on Character Theory-summer 2013
Note on Character Theory-summer 2013Fan Huang (Wright)
 
ON OPTIMALITY OF THE INDEX OF SUM, PRODUCT, MAXIMUM, AND MINIMUM OF FINITE BA...
ON OPTIMALITY OF THE INDEX OF SUM, PRODUCT, MAXIMUM, AND MINIMUM OF FINITE BA...ON OPTIMALITY OF THE INDEX OF SUM, PRODUCT, MAXIMUM, AND MINIMUM OF FINITE BA...
ON OPTIMALITY OF THE INDEX OF SUM, PRODUCT, MAXIMUM, AND MINIMUM OF FINITE BA...UniversitasGadjahMada
 
5.1 Defining and visualizing functions. A handout.
5.1 Defining and visualizing functions. A handout.5.1 Defining and visualizing functions. A handout.
5.1 Defining and visualizing functions. A handout.Jan Plaza
 
Darmon Points for fields of mixed signature
Darmon Points for fields of mixed signatureDarmon Points for fields of mixed signature
Darmon Points for fields of mixed signaturemmasdeu
 
Practical computation of Hecke operators
Practical computation of Hecke operatorsPractical computation of Hecke operators
Practical computation of Hecke operatorsMathieu Dutour Sikiric
 
NONLINEAR DIFFERENCE EQUATIONS WITH SMALL PARAMETERS OF MULTIPLE SCALES
NONLINEAR DIFFERENCE EQUATIONS WITH SMALL PARAMETERS OF MULTIPLE SCALESNONLINEAR DIFFERENCE EQUATIONS WITH SMALL PARAMETERS OF MULTIPLE SCALES
NONLINEAR DIFFERENCE EQUATIONS WITH SMALL PARAMETERS OF MULTIPLE SCALESTahia ZERIZER
 

Ähnlich wie Bregman divergences from comparative convexity (20)

Litvinenko_RWTH_UQ_Seminar_talk.pdf
Litvinenko_RWTH_UQ_Seminar_talk.pdfLitvinenko_RWTH_UQ_Seminar_talk.pdf
Litvinenko_RWTH_UQ_Seminar_talk.pdf
 
590-Article Text.pdf
590-Article Text.pdf590-Article Text.pdf
590-Article Text.pdf
 
590-Article Text.pdf
590-Article Text.pdf590-Article Text.pdf
590-Article Text.pdf
 
Imc2017 day2-solutions
Imc2017 day2-solutionsImc2017 day2-solutions
Imc2017 day2-solutions
 
THE CHORD GAP DIVERGENCE AND A GENERALIZATION OF THE BHATTACHARYYA DISTANCE
THE CHORD GAP DIVERGENCE AND A GENERALIZATION OF THE BHATTACHARYYA DISTANCETHE CHORD GAP DIVERGENCE AND A GENERALIZATION OF THE BHATTACHARYYA DISTANCE
THE CHORD GAP DIVERGENCE AND A GENERALIZATION OF THE BHATTACHARYYA DISTANCE
 
Slides mc gill-v4
Slides mc gill-v4Slides mc gill-v4
Slides mc gill-v4
 
Slides mc gill-v3
Slides mc gill-v3Slides mc gill-v3
Slides mc gill-v3
 
On Clustering Histograms with k-Means by Using Mixed α-Divergences
 On Clustering Histograms with k-Means by Using Mixed α-Divergences On Clustering Histograms with k-Means by Using Mixed α-Divergences
On Clustering Histograms with k-Means by Using Mixed α-Divergences
 
Finance Enginering from Columbia.pdf
Finance Enginering from Columbia.pdfFinance Enginering from Columbia.pdf
Finance Enginering from Columbia.pdf
 
Quantitative norm convergence of some ergodic averages
Quantitative norm convergence of some ergodic averagesQuantitative norm convergence of some ergodic averages
Quantitative norm convergence of some ergodic averages
 
LPS talk notes
LPS talk notesLPS talk notes
LPS talk notes
 
Group theory notes
Group theory notesGroup theory notes
Group theory notes
 
Igv2008
Igv2008Igv2008
Igv2008
 
Note on Character Theory-summer 2013
Note on Character Theory-summer 2013Note on Character Theory-summer 2013
Note on Character Theory-summer 2013
 
ON OPTIMALITY OF THE INDEX OF SUM, PRODUCT, MAXIMUM, AND MINIMUM OF FINITE BA...
ON OPTIMALITY OF THE INDEX OF SUM, PRODUCT, MAXIMUM, AND MINIMUM OF FINITE BA...ON OPTIMALITY OF THE INDEX OF SUM, PRODUCT, MAXIMUM, AND MINIMUM OF FINITE BA...
ON OPTIMALITY OF THE INDEX OF SUM, PRODUCT, MAXIMUM, AND MINIMUM OF FINITE BA...
 
5.1 Defining and visualizing functions. A handout.
5.1 Defining and visualizing functions. A handout.5.1 Defining and visualizing functions. A handout.
5.1 Defining and visualizing functions. A handout.
 
Darmon Points for fields of mixed signature
Darmon Points for fields of mixed signatureDarmon Points for fields of mixed signature
Darmon Points for fields of mixed signature
 
Practical computation of Hecke operators
Practical computation of Hecke operatorsPractical computation of Hecke operators
Practical computation of Hecke operators
 
Probability Cheatsheet.pdf
Probability Cheatsheet.pdfProbability Cheatsheet.pdf
Probability Cheatsheet.pdf
 
NONLINEAR DIFFERENCE EQUATIONS WITH SMALL PARAMETERS OF MULTIPLE SCALES
NONLINEAR DIFFERENCE EQUATIONS WITH SMALL PARAMETERS OF MULTIPLE SCALESNONLINEAR DIFFERENCE EQUATIONS WITH SMALL PARAMETERS OF MULTIPLE SCALES
NONLINEAR DIFFERENCE EQUATIONS WITH SMALL PARAMETERS OF MULTIPLE SCALES
 

Kürzlich hochgeladen

Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...Monika Rani
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsSérgio Sacani
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsSérgio Sacani
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfSumit Kumar yadav
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)PraveenaKalaiselvan1
 
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencyHire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencySheetal Arora
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...Sérgio Sacani
 
Seismic Method Estimate velocity from seismic data.pptx
Seismic Method Estimate velocity from seismic  data.pptxSeismic Method Estimate velocity from seismic  data.pptx
Seismic Method Estimate velocity from seismic data.pptxAlMamun560346
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoSérgio Sacani
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticssakshisoni2385
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfSumit Kumar yadav
 
VIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PVIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PPRINCE C P
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Sérgio Sacani
 
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...chandars293
 
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...Lokesh Kothari
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxgindu3009
 
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRLKochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRLkantirani197
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTSérgio Sacani
 
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxCOST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxFarihaAbdulRasheed
 
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...ssuser79fe74
 

Kürzlich hochgeladen (20)

Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdf
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)
 
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencyHire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
 
Seismic Method Estimate velocity from seismic data.pptx
Seismic Method Estimate velocity from seismic  data.pptxSeismic Method Estimate velocity from seismic  data.pptx
Seismic Method Estimate velocity from seismic data.pptx
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on Io
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdf
 
VIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PVIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C P
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
 
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
 
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
 
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRLKochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
 
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxCOST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
 
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
 

Bregman divergences from comparative convexity

  • 1. Bregman divergences from comparative convexity Frank Nielsen Richard Nock 2017 1
  • 3. Midpoint convexity, continuity and convexity In 1906, Jensen [9] introduced the midpoint convex property of a function F: F(p) + F(q) 2 ≥ F p + q 2 , ∀p, q ∈ X. A continuous function F satisfying the midpoint convexity implies it also obeys the convexity property [14]: ∀p, q, ∀λ ∈ [0, 1], F(λp + (1 − λ)q) ≤ λF(p) + (1 − λ)F(q). When inequality is strict for distinct points and λ ∈ (0, 1), this inequality defines the strict convex property of F. A function satisfying only the midpoint convexity inequality may not be continuous [11] (and hence not convex). 3
  • 4. Jensen divergences and skew Jensen divergences A divergence D(p, q) is proper iff D(p, q) ≥ 0 with equality iff p = q. Jensen divergence (Burbea and Rao divergence [6]) defined for a strictly convex function F ∈ F<: JF (p, q):= F(p) + F(q) 2 − F p + q 2 . Skew Jensen divergences [19, 15] for α ∈ (0, 1): JF,α(p : q):=(1 − α)F(p) + αF(q) − F((1 − α)p + αq), with JF,α(q : p) = JF,1−α(p : q). 4
  • 5. Bregman divergences Bregman divergences [4] (1967) for a strictly convex and differentiable generator F: BF (p : q):=F(p) − F(q) − (p − q) F(q). Scaled skew Jensen divergences [19, 15]: JF,α(p : q):= 1 α(1 − α) JF,α(p : q), with limit cases recovering Bregman divergences: lim α→1− JF,α(p : q) = BF (p : q), lim α→0+ JF,α(p : q) = BF (q : p). 5
  • 7. Jensen midpoint convexity wrt comparative convexity [14] Let M(·, ·) and N(·, ·) be two abstract mean functions [5]. min{p, q} ≤ M(p, q) ≤ max{p, q}. (1) Functions F ∈ C≤ M,N is said midpoint (M, N)-convex iff ∀p, q ∈ X, F(M(p, q)) ≤ N(F(p), F(q)). When M(p, q) = N(p, q) = A(p, q) = p+q 2 are the arithmetic means, recover the ordinary Jensen midpoint convexity: ∀p, q ∈ X, F p + q 2 ≤ F(p) + F(q) 2 Note that there exist discontinuous functions that satisfy the Jensen midpoint convexity. 7
  • 8. Means, regular means, and weighted means 8
  • 9. Regular means A mean is said regular if it is homogeneous: M(λp, λq) = λM(p, q) for any λ > 0 symmetric: M(p, q) = M(q, p) continuous, increasing in each variable. Examples of regular means: power means or Hölder means [8]: Pδ(x, y) = xδ + yδ 2 1 δ , P0(x, y) = G(x, y) = √ xy belong to a broader family of quasi-arithmetic means. For a continuous and strictly increasing function f : I ⊂ R → J ⊂ R: Mf (p, q):=f −1 f (p) + f (q) 2 . also called Kolmogorov-Nagumo-de Finetti means [10, 13, 7]. 9
  • 10. Examples of quasi-arithmetic means: Pythagorean means f (x) = x, arithmetic mean (A): A(p, q) = p + q 2 f (x) = log x, geometric mean (G): G(p, q) = √ pq f (x) = 1 x , harmonic mean (H): H(p, q) = 2 1 p + 1 q = 2pq p + q 10
  • 11. Lagrange means [3] Lagrange means [3] (Lagrangean means) are derived from the mean value theorem. Assume wlog that p < q so that the mean m ∈ [p, q]. From the mean value theorem, we have for a differentiable function f : ∃λ ∈ [p, q] : f (λ) = f (q) − f (p) q − p . Thus when f is a monotonic function, its inverse function f −1 is well-defined, and the unique mean value mean λ ∈ [p, q] can be defined as: Lf (p, q) = λ = (f )−1 f (q) − f (p) q − p . 11
  • 12. An example of a Lagrange mean [3] For example, f (x) = log(x) and f (x) = (f )−1(x) = 1 x , we get the logarithmic mean (L), that is not a quasi-arithmetic mean: m(p, q) =    0 if p = 0 or q = 0, x if p = q, q−p log q−log p otherwise, 12
  • 13. Cauchy means [12] Two positive differentiable and strictly monotonic functions f and g such that f g has an inverse function. The Cauchy mean-value mean: Cf ,g (p, q) = f g −1 f (q) − f (p) g(q) − g(p) , q = p, with Cf ,g (p, p) = p. Cauchy means as Lagrange means [12] by the following identity: Cf ,g (p, q) = Lf ◦g−1 (g(p), g(q)) since ((f ◦ g−1)(x)) = f (g−1(x)) g (g−1(x)) : Proof: Lf ◦g−1 (g(p), g(q)) = ((f ◦ g −1 ) (g(x))) −1 (f ◦ g−1 )(g(q)) − (f ◦ g−1 )(g(p)) g(q) − g(p) , = f (g−1 (g(x))) g (g−1(g(x))) −1 f (q) − f (p) g(q) − g(p) , = Cf ,g (p, q). 13
  • 14. Stolarsky regular means The Stolarsky regular means are not quasi-arithmetic means nor mean-value means [5], and are defined by: Sp(x, y) = xp − yp p(x − y) 1 p−1 , p ∈ {0, 1}. In limit cases, the Stolarsky family of means yields the logarithmic mean (L) when p → 0: L(x, y) = y − x log y − log x , and the identric mean (I) when p → 1: I(x, y) = yy xx 1 y−x . 14
  • 15. Lehmer and Gini (non-regular) means Weighted Lehmer mean [2] of order δ is defined for δ ∈ R as: Lδ(x1, . . . , xn; w1, . . . , wn) = n i=1 wi xδ+1 i n i=1 wi xδ i . The Lehmer means intersect with the Hölder means only for the arithmetic (A), geometric (G) and harmonic (H) means. Lehmer barycentric means belong to Gini means: Gδ1,δ2 (x1, . . . , xn; w1, . . . , wn) = n i=1 wi xδ1 i n i=1 wi xδ2 i 1 δ1−δ2 , when δ1 = δ2, and Gδ1,δ2 (x1, . . . , xn; w1, . . . , wn) = n i=1 x wi xδ i i 1 n i=1 wi xδ i , when δ1 = δ2 = δ. 15
  • 16. Lehmer and Gini (non-regular) means Those families of Gini and Lehmer means are (positively) homogeneous means: Gδ1,δ2 (λx1, . . . , λxn; w1, . . . , wn) = λGδ1,δ2 (x1, . . . , xn; w1, . . . , wn), for any λ > 0. The family of Gini means include the power means: G0,δ = Pδ for δ ≤ 0 and Gδ,0 = Pδ for δ ≥ 0. The Lehmer and Gini means are not always regular since L2 is not regular. 16
  • 17. Weighted abstract means and (M, N)-convexity Consider weighted means M and N, Mα(p, q):=M(p, q; 1 − α, α). Function F is (M, N)-convex if and only if: F(M(p, q; 1 − α, α)) ≤ N(F(p), F(q); 1 − α, α). For regular means and continuous function F, define the skew (M, N)-Jensen divergence: JM,N F,α (p : q) = Nα(F(p), F(q)) − F(Mα(p, q)) ≥ 0 We have JM,N F,α (q, p) = JM,N F,1−α(p : q) Continuous function + midpoint convexity = (M, N)-convexity property (Theorem A of [14]): Nα(F(p), F(q)) ≥ F(Mα(p, q)), ∀p, q ∈ X, ∀α ∈ [0, 1]. 17
  • 19. Generalized Bregman divergences wrt comparative convexity Define the (M, N)-Bregman divergence for regular means as limit of scaled skew (M, N)-Jensen divergences: BM,N F (p : q) = lim α→1− 1 α(1 − α) JM,N F,α (p : q), = lim α→1− 1 α(1 − α) (Nα(F(p), F(q))) − F(Mα(p, q))) BM,N F (q : p) = lim α→0+ 1 α(1 − α) JM,N F,α (p : q) Need to prove that the limit exists and that the generalized Bregman divergences are proper: BM,N F (q : p) ≥ 0 with equality iff p = q. 19
  • 20. Quasi-arithmetic Bregman divergences Theorem (Quasi-arithmetic Bregman divergences) Let F : I ⊂ R → R be a real-valued strictly (ρ, τ)-convex function defined on an interval I for two strictly monotone and differentiable functions ρ and τ (with ρ and τ the respective derivatives). The Quasi-Arithmetic Bregman divergence (QABD) induced by the comparative convexity is: Bρ,τ F (p : q) = τ(F(p)) − τ(F(q)) τ (F(q)) − ρ(p) − ρ(q) ρ (q) F (q), = κτ (F(q) : F(p)) − κρ(q : p)F (q), where primes ’ denote derivatives and κγ(x : y) = γ(y) − γ(x) γ (x) . 20
  • 21. Quasi-arithmetic Bregman divergences (proof 1/2) First-order Taylor expansion of τ−1(x) at x0: τ−1 (x) x0 τ−1 (x0) + (x − x0)(τ−1 ) (x0). Using the derivative of an inverse function (τ−1) (x) = 1 (τ (τ−1)(x)) , it follows: τ−1 (x) τ−1 (x0) + (x − x0) 1 (τ (τ−1)(x0)) . Plugging x0 = τ(p) and x = τ(p) + α(τ(q) − τ(p)), get a first-order approximation of the weighted quasi-arithmetic mean Mτ,α when α → 0: Mτ,α(p, q) p + α(τ(q) − τ(p)) τ (p) . 21
  • 22. Quasi-arithmetic Bregman divergences (proof 2/2) Comparative convexity skew Jensen Divergence defined by: Jτ,ρ F,α(p : q) = Mτ,α(F(p), F(q)) − F(Mρ,α(p, q)) Apply a first-order Taylor expansion to get F(Mρ,α(p, q))) F p + α(ρ(q) − ρ(p)) ρ (p) F(p) + α(τ(q) − τ(p)) τ (p) F (p). It follows the quasi-arithmetic Bregman divergence: Bρ,τ F (q : p) = lim α→0 J τ,ρ α (p : q) = τ(F(q)) − τ(F(p)) τ (F(p)) − ρ(q) − ρ(p) ρ (p) F (p) similar for reverse quasi-arithmetic Bregman divergence. 22
  • 23. Quasi-arithmetic Bregman divergences are proper Ordinary Bregman divergence on the convex generator G(x) = τ(F(ρ−1(x))) for a (ρ, τ)-convex function F: G (x) = τ(F(ρ−1 (x))) = 1 (ρ (ρ−1)(x)) F (ρ−1 (x))τ (F(ρ−1 (x))). Get an ordinary Bregman divergence that is, in general, different from the generalized quasi-arithmetic Bregman divergence (BG (p : q) = Bρ,τ F (p : q)) : BG (p : q) = τ(F(ρ−1 (p))) − τ(F(ρ−1 (q))) −(p − q) F (ρ−1(q))τ (F(ρ−1(q))) (ρ (ρ−1)(q)) Sanity check: BG (p : q) = Bρ,τ F (p : q) when ρ(x) = τ(x) = x. Remarkable identity: Bρ,τ F (p : q) = 1 τ (F(q)) BG (ρ(p) : ρ(q)). 23
  • 24. Power mean Bregman divergences A subfamily of quasi-arithmetic Bregman divergences obtained for the generators ρ(x) = xδ1 and τ(x) = xδ2 : Corollary (Power Mean Bregman Divergences) For δ1, δ2 ∈ R{0} with F ∈ CPδ1 ,Pδ2 , get the family of Power Mean Bregman Divergences (PMBDs): Bδ1,δ2 F (p : q) = Fδ2 (p) − Fδ2 (q) δ2Fδ2−1(q) − pδ1 − qδ1 δ1qδ1−1 F (q) Sanity check for δ1 = δ2 = 1: recover the ordinary Bregman divergence.
  • 25. Quasi-arithmetic to ordinary convexity criterion To check whether a function F is (M, N)-convex or not for quasi-arithmetic means Mρ and Mτ , we use an equivalent test to ordinary convexity: Lemma ((ρ, τ)-convexity ↔ ordinary convexity [1]) Let ρ : I → R and τ : J → R be two continuous and strictly monotone real-valued functions with τ increasing, then function F : I → J is (ρ, τ)-convex iff function G = Fρ,τ = τ ◦ F ◦ ρ−1 is (ordinary) convex on ρ(I). F ∈ C≤ ρ,τ ⇔ G = τ ◦ F ◦ ρ−1 ∈ C≤
  • 26. Quasi-arithmetic to ordinary convexity criterion (proof) Let us rewrite the (ρ, τ)-convexity midpoint inequality as follows: F(Mρ(x, y)) ≤ Mτ (F(x), F(y)), F ρ−1 ρ(x) + ρ(y) 2 ≤ τ−1 τ(F(x)) + τ(F(y)) 2 , Since τ is strictly increasing, we have: (τ ◦ F ◦ ρ−1 ) ρ(x) + ρ(y) 2 ≤ (τ ◦ F)(x) + (τ ◦ F)(y) 2 . Let u = ρ(x) and v = ρ(y) so that x = ρ−1(u) and y = ρ−1(v) (with u, v ∈ ρ(I)). Then it comes that: (τ ◦ F ◦ ρ−1 ) u + v 2 ≤ (τ ◦ F ◦ ρ−1)(u) + (τ ◦ F ◦ ρ−1)(v) 2 . This last inequality is precisely the ordinary midpoint convexity inequality for function G = Fρ,τ = τ ◦ F ◦ ρ−1. Thus a function F is (ρ, τ)-convex iff G = τ ◦ F ◦ ρ−1 is ordinary convex, and vice-versa. 26
  • 27. Summary of contributions [17, 16] defined generalized Jensen divergences and generalized Bregman divergences using comparative convexity reported an explicit formula for the quasi-arithmetic Bregman divergences (QABD) The QABD can be interpreted as a conformal Bregman divergence [18] on the ρ-representation emphasizes that the theory of means [5] is at the very heart of distances. see arXiv report for furthre results including a generalization of Bhattacharyya statistical distance using comparable means 27
  • 28. References I [1] John Aczél. A generalization of the notion of convex functions. Det Kongelige Norske Videnskabers Selskabs Forhandlinger, Trondheim, 19(24):87–90, 1947. [2] Gleb Beliakov, Humberto Bustince Sola, and Tomasa Calvo Sánchez. A practical guide to averaging functions, volume 329. Springer, 2015. [3] Lucio R Berrone and Julio Moro. Lagrangian means. Aequationes Mathematicae, 55(3):217–226, 1998. [4] Lev M Bregman. The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR computational mathematics and mathematical physics, 7(3):200–217, 1967. [5] Peter S Bullen, Dragoslav S Mitrinovic, and M Vasic. Means and their Inequalities, volume 31. Springer Science & Business Media, 2013. [6] Jacob Burbea and C Rao. On the convexity of some divergence measures based on entropy functions. IEEE Transactions on Information Theory, 28(3):489–495, 1982. [7] Bruno De Finetti. Sul concetto di media. 3:369Ű396, 1931. Istituto italiano degli attuari. [8] Otto Ludwig Holder. Über einen Mittelwertssatz. Nachr. Akad. Wiss. Gottingen Math.-Phys. Kl., pages 38–47, 1889. 28
  • 29. References II [9] Johan Ludwig William Valdemar Jensen. Sur les fonctions convexes et les inégalités entre les valeurs moyennes. Acta mathematica, 30(1):175–193, 1906. [10] Andrey Nikolaevich Kolmogorov. Sur la notion de moyenne. 12:388Ű391, 1930. Acad. Naz. Lincei Mem. Cl. Sci. His. Mat. Natur. Sez. [11] Gyula Maksa and Zsolt Páles. Convexity with respect to families of means. Aequationes mathematicae, 89(1):161–167, 2015. [12] Janusz Matkowski. On weighted extensions of Cauchy’s means. Journal of mathematical analysis and applications, 319(1):215–227, 2006. [13] Mitio Nagumo. Über eine klasse der mittelwerte. In Japanese journal of mathematics: transactions and abstracts, volume 7, pages 71–79. The Mathematical Society of Japan, 1930. [14] Constantin P. Niculescu and Lars-Erik Persson. Convex functions and their applications: A contemporary approach. Springer Science & Business Media, 2006. [15] Frank Nielsen and Sylvain Boltz. The Burbea-Rao and Bhattacharyya centroids. IEEE Transactions on Information Theory, 57(8):5455–5466, 2011. 29
  • 30. References III [16] Frank Nielsen and Richard Nock. Generalizing Jensen and Bregman divergences with comparative convexity and the statistical Bhattacharyya distances with comparable means. CoRR, abs/1702.04877, 2017. [17] Frank Nielsen and Richard Nock. Generalizing skew Jensen divergences and Bregman divergences with comparative convexity. IEEE Signal Processing Letters, 24(8):1123–1127, Aug 2017. [18] Richard Nock, Frank Nielsen, and Shun-ichi Amari. On conformal divergences and their population minimizers. IEEE Trans. Information Theory, 62(1):527–538, 2016. [19] Jun Zhang. Divergence function, duality, and convex analysis. Neural Computation, 16(1):159–195, 2004. 30