Bregman divergences from comparative convexity

Bregman divergences from comparative convexity
Frank Nielsen Richard Nock
2017
1

Convexity, and
Jensen/Bregman
divergences
2

Midpoint convexity, continuity and convexity
In 1906, Jensen [9] introduced the midpoint convex
property of a function F:
F(p) + F(q)
2
≥ F
p + q
2
, ∀p, q ∈ X.
A continuous function F satisfying the midpoint convexity
implies it also obeys the convexity property [14]:
∀p, q, ∀λ ∈ [0, 1], F(λp + (1 − λ)q) ≤ λF(p) + (1 − λ)F(q).
When inequality is strict for distinct points and λ ∈ (0, 1), this
inequality deﬁnes the strict convex property of F.
A function satisfying only the midpoint convexity inequality
may not be continuous [11] (and hence not convex).
3

Jensen divergences and skew Jensen divergences
A divergence D(p, q) is proper iff D(p, q) ≥ 0 with equality iff
p = q.
Jensen divergence (Burbea and Rao divergence [6]) defined for
a strictly convex function F ∈ F<:
JF (p, q):=
F(p) + F(q)
2
− F
p + q
2
.
Skew Jensen divergences [19, 15] for α ∈ (0, 1):
JF,α(p : q):=(1 − α)F(p) + αF(q) − F((1 − α)p + αq),
with JF,α(q : p) = JF,1−α(p : q).
4

Bregman divergences
Bregman divergences [4] (1967) for a strictly convex and
diﬀerentiable generator F:
BF (p : q):=F(p) − F(q) − (p − q) F(q).
Scaled skew Jensen divergences [19, 15]:
JF,α(p : q):=
1
α(1 − α)
JF,α(p : q),
with limit cases recovering Bregman divergences:
lim
α→1−
JF,α(p : q) = BF (p : q),
lim
α→0+
JF,α(p : q) = BF (q : p).
5

Comparative convexity
and generalized
divergences
6

Jensen midpoint convexity wrt comparative convexity [14]
Let M(·, ·) and N(·, ·) be two abstract mean functions [5].
min{p, q} ≤ M(p, q) ≤ max{p, q}. (1)
Functions F ∈ C≤
M,N is said midpoint (M, N)-convex iﬀ
∀p, q ∈ X, F(M(p, q)) ≤ N(F(p), F(q)).
When M(p, q) = N(p, q) = A(p, q) = p+q
2 are the arithmetic
means, recover the ordinary Jensen midpoint convexity:
∀p, q ∈ X, F
p + q
2
≤
F(p) + F(q)
2
Note that there exist discontinuous functions that satisfy the
Jensen midpoint convexity.
7

Means, regular means,
and weighted means
8

Regular means
A mean is said regular if it is
homogeneous: M(λp, λq) = λM(p, q) for any λ > 0
symmetric: M(p, q) = M(q, p)
continuous,
increasing in each variable.
Examples of regular means: power means or Hölder means [8]:
Pδ(x, y) =
xδ + yδ
2
1
δ
, P0(x, y) = G(x, y) =
√
xy
belong to a broader family of quasi-arithmetic means. For a
continuous and strictly increasing function f : I ⊂ R → J ⊂ R:
Mf (p, q):=f −1 f (p) + f (q)
2
.
also called Kolmogorov-Nagumo-de Finetti means [10, 13, 7].
9

Examples of quasi-arithmetic means: Pythagorean means
f (x) = x, arithmetic mean (A):
A(p, q) =
p + q
2
f (x) = log x, geometric mean (G):
G(p, q) =
√
pq
f (x) = 1
x , harmonic mean (H):
H(p, q) =
2
1
p + 1
q
=
2pq
p + q
10

Lagrange means [3]
Lagrange means [3] (Lagrangean means) are derived from the mean
value theorem.
Assume wlog that p < q so that the mean m ∈ [p, q].
From the mean value theorem, we have for a differentiable function
f :
∃λ ∈ [p, q] : f (λ) =
f (q) − f (p)
q − p
.
Thus when f is a monotonic function, its inverse function f −1 is
well-defined, and the unique mean value mean λ ∈ [p, q] can be
defined as:
Lf (p, q) = λ = (f )−1 f (q) − f (p)
q − p
.
11

An example of a Lagrange mean [3]
For example, f (x) = log(x) and f (x) = (f )−1(x) = 1
x , we get the
logarithmic mean (L), that is not a quasi-arithmetic mean:
m(p, q) =



0 if p = 0 or q = 0,
x if p = q,
q−p
log q−log p otherwise,
12

Cauchy means [12]
Two positive diﬀerentiable and strictly monotonic functions f
and g such that f
g has an inverse function.
The Cauchy mean-value mean:
Cf ,g (p, q) =
f
g
−1
f (q) − f (p)
g(q) − g(p)
, q = p,
with Cf ,g (p, p) = p.
Cauchy means as Lagrange means [12] by the following
identity:
Cf ,g (p, q) = Lf ◦g−1 (g(p), g(q))
since ((f ◦ g−1)(x)) = f (g−1(x))
g (g−1(x))
:
Proof:
Lf ◦g−1 (g(p), g(q)) = ((f ◦ g
−1
) (g(x)))
−1 (f ◦ g−1
)(g(q)) − (f ◦ g−1
)(g(p))
g(q) − g(p)
,
=
f (g−1
(g(x)))
g (g−1(g(x)))
−1
f (q) − f (p)
g(q) − g(p)
,
= Cf ,g (p, q).
13

Stolarsky regular means
The Stolarsky regular means are not quasi-arithmetic means nor
mean-value means [5], and are deﬁned by:
Sp(x, y) =
xp − yp
p(x − y)
1
p−1
, p ∈ {0, 1}.
In limit cases, the Stolarsky family of means yields the logarithmic
mean (L) when p → 0:
L(x, y) =
y − x
log y − log x
,
and the identric mean (I) when p → 1:
I(x, y) =
yy
xx
1
y−x
.
14

Lehmer and Gini (non-regular) means
Weighted Lehmer mean [2] of order δ is deﬁned for δ ∈ R
as:
Lδ(x1, . . . , xn; w1, . . . , wn) =
n
i=1 wi xδ+1
i
n
i=1 wi xδ
i
.
The Lehmer means intersect with the Hölder means only for
the arithmetic (A), geometric (G) and harmonic (H) means.
Lehmer barycentric means belong to Gini means:
Gδ1,δ2 (x1, . . . , xn; w1, . . . , wn) =
n
i=1 wi xδ1
i
n
i=1 wi xδ2
i
1
δ1−δ2
,
when δ1 = δ2, and
Gδ1,δ2 (x1, . . . , xn; w1, . . . , wn) =
n
i=1
x
wi xδ
i
i
1
n
i=1
wi xδ
i
,
when δ1 = δ2 = δ.
15

Lehmer and Gini (non-regular) means
Those families of Gini and Lehmer means are (positively)
homogeneous means:
Gδ1,δ2 (λx1, . . . , λxn; w1, . . . , wn) = λGδ1,δ2 (x1, . . . , xn; w1, . . . , wn),
for any λ > 0.
The family of Gini means include the power means: G0,δ = Pδ
for δ ≤ 0 and Gδ,0 = Pδ for δ ≥ 0.
The Lehmer and Gini means are not always regular since L2 is
not regular.
16

Weighted abstract means and (M, N)-convexity
Consider weighted means M and N,
Mα(p, q):=M(p, q; 1 − α, α).
Function F is (M, N)-convex if and only if:
F(M(p, q; 1 − α, α)) ≤ N(F(p), F(q); 1 − α, α).
For regular means and continuous function F, deﬁne the skew
(M, N)-Jensen divergence:
JM,N
F,α (p : q) = Nα(F(p), F(q)) − F(Mα(p, q)) ≥ 0
We have JM,N
F,α (q, p) = JM,N
F,1−α(p : q)
Continuous function + midpoint convexity = (M, N)-convexity
property (Theorem A of [14]):
Nα(F(p), F(q)) ≥ F(Mα(p, q)), ∀p, q ∈ X, ∀α ∈ [0, 1].
17

Generalized Bregman
divergences
18

Generalized Bregman divergences wrt comparative convexity
Deﬁne the (M, N)-Bregman divergence for regular means as
limit of scaled skew (M, N)-Jensen divergences:
BM,N
F (p : q) = lim
α→1−
1
α(1 − α)
JM,N
F,α (p : q),
= lim
α→1−
1
α(1 − α)
(Nα(F(p), F(q))) − F(Mα(p, q)))
BM,N
F (q : p) = lim
α→0+
1
α(1 − α)
JM,N
F,α (p : q)
Need to prove that the limit exists and that the generalized
Bregman divergences are proper: BM,N
F (q : p) ≥ 0 with
equality iﬀ p = q.
19

Quasi-arithmetic Bregman divergences
Theorem (Quasi-arithmetic Bregman divergences)
Let F : I ⊂ R → R be a real-valued strictly (ρ, τ)-convex function
deﬁned on an interval I for two strictly monotone and diﬀerentiable
functions ρ and τ (with ρ and τ the respective derivatives). The
Quasi-Arithmetic Bregman divergence (QABD) induced by the
comparative convexity is:
Bρ,τ
F (p : q) =
τ(F(p)) − τ(F(q))
τ (F(q))
−
ρ(p) − ρ(q)
ρ (q)
F (q),
= κτ (F(q) : F(p)) − κρ(q : p)F (q),
where primes ’ denote derivatives and
κγ(x : y) =
γ(y) − γ(x)
γ (x)
.
20

Quasi-arithmetic Bregman divergences (proof 1/2)
First-order Taylor expansion of τ−1(x) at x0:
τ−1
(x) x0 τ−1
(x0) + (x − x0)(τ−1
) (x0).
Using the derivative of an inverse function (τ−1) (x) = 1
(τ (τ−1)(x))
,
it follows:
τ−1
(x) τ−1
(x0) + (x − x0)
1
(τ (τ−1)(x0))
.
Plugging x0 = τ(p) and x = τ(p) + α(τ(q) − τ(p)), get a
ﬁrst-order approximation of the weighted quasi-arithmetic mean
Mτ,α when α → 0:
Mτ,α(p, q) p +
α(τ(q) − τ(p))
τ (p)
.
21

Quasi-arithmetic Bregman divergences (proof 2/2)
Comparative convexity skew Jensen Divergence deﬁned by:
Jτ,ρ
F,α(p : q) = Mτ,α(F(p), F(q)) − F(Mρ,α(p, q))
Apply a ﬁrst-order Taylor expansion to get
F(Mρ,α(p, q))) F p +
α(ρ(q) − ρ(p))
ρ (p)
F(p) +
α(τ(q) − τ(p))
τ (p)
F (p).
It follows the quasi-arithmetic Bregman divergence:
Bρ,τ
F (q : p) = lim
α→0
J
τ,ρ
α (p : q) =
τ(F(q)) − τ(F(p))
τ (F(p))
−
ρ(q) − ρ(p)
ρ (p)
F (p)
similar for reverse quasi-arithmetic Bregman divergence.
22

Quasi-arithmetic Bregman divergences are proper
Ordinary Bregman divergence on the convex generator
G(x) = τ(F(ρ−1(x))) for a (ρ, τ)-convex function F:
G (x) = τ(F(ρ−1
(x))) =
1
(ρ (ρ−1)(x))
F (ρ−1
(x))τ (F(ρ−1
(x))).
Get an ordinary Bregman divergence that is, in general, diﬀerent
from the generalized quasi-arithmetic Bregman divergence
(BG (p : q) = Bρ,τ
F (p : q)) :
BG (p : q) = τ(F(ρ−1
(p))) − τ(F(ρ−1
(q)))
−(p − q)
F (ρ−1(q))τ (F(ρ−1(q)))
(ρ (ρ−1)(q))
Sanity check: BG (p : q) = Bρ,τ
F (p : q) when ρ(x) = τ(x) = x.
Remarkable identity:
Bρ,τ
F (p : q) =
1
τ (F(q))
BG (ρ(p) : ρ(q)).
23

Power mean Bregman divergences
A subfamily of quasi-arithmetic Bregman divergences obtained for
the generators ρ(x) = xδ1 and τ(x) = xδ2 :
Corollary (Power Mean Bregman Divergences)
For δ1, δ2 ∈ R{0} with F ∈ CPδ1
,Pδ2
, get the family of Power
Mean Bregman Divergences (PMBDs):
Bδ1,δ2
F (p : q) =
Fδ2 (p) − Fδ2 (q)
δ2Fδ2−1(q)
−
pδ1 − qδ1
δ1qδ1−1
F (q)
Sanity check for δ1 = δ2 = 1: recover the ordinary Bregman
divergence.

Quasi-arithmetic to ordinary convexity criterion
To check whether a function F is (M, N)-convex or not for
quasi-arithmetic means Mρ and Mτ , we use an equivalent test to
ordinary convexity:
Lemma ((ρ, τ)-convexity ↔ ordinary convexity [1])
Let ρ : I → R and τ : J → R be two continuous and strictly
monotone real-valued functions with τ increasing, then function
F : I → J is (ρ, τ)-convex iﬀ function G = Fρ,τ = τ ◦ F ◦ ρ−1 is
(ordinary) convex on ρ(I).
F ∈ C≤
ρ,τ ⇔ G = τ ◦ F ◦ ρ−1
∈ C≤

Quasi-arithmetic to ordinary convexity criterion (proof)
Let us rewrite the (ρ, τ)-convexity midpoint inequality as follows:
F(Mρ(x, y)) ≤ Mτ (F(x), F(y)),
F ρ−1 ρ(x) + ρ(y)
2
≤ τ−1 τ(F(x)) + τ(F(y))
2
,
Since τ is strictly increasing, we have:
(τ ◦ F ◦ ρ−1
)
ρ(x) + ρ(y)
2
≤
(τ ◦ F)(x) + (τ ◦ F)(y)
2
.
Let u = ρ(x) and v = ρ(y) so that x = ρ−1(u) and y = ρ−1(v)
(with u, v ∈ ρ(I)). Then it comes that:
(τ ◦ F ◦ ρ−1
)
u + v
2
≤
(τ ◦ F ◦ ρ−1)(u) + (τ ◦ F ◦ ρ−1)(v)
2
.
This last inequality is precisely the ordinary midpoint convexity
inequality for function G = Fρ,τ = τ ◦ F ◦ ρ−1. Thus a function F is
(ρ, τ)-convex iﬀ G = τ ◦ F ◦ ρ−1 is ordinary convex, and vice-versa.
26

Summary of contributions [17, 16]
deﬁned generalized Jensen divergences and generalized
Bregman divergences using comparative convexity
reported an explicit formula for the quasi-arithmetic Bregman
divergences (QABD)
The QABD can be interpreted as a conformal Bregman
divergence [18] on the ρ-representation
emphasizes that the theory of means [5] is at the very heart of
distances.
see arXiv report for furthre results including a generalization of
Bhattacharyya statistical distance using comparable means
27

References I
[1] John Aczél.
A generalization of the notion of convex functions.
Det Kongelige Norske Videnskabers Selskabs Forhandlinger, Trondheim, 19(24):87–90, 1947.
[2] Gleb Beliakov, Humberto Bustince Sola, and Tomasa Calvo Sánchez.
A practical guide to averaging functions, volume 329.
Springer, 2015.
[3] Lucio R Berrone and Julio Moro.
Lagrangian means.
Aequationes Mathematicae, 55(3):217–226, 1998.
[4] Lev M Bregman.
The relaxation method of ﬁnding the common point of convex sets and its application to the
solution of problems in convex programming.
USSR computational mathematics and mathematical physics, 7(3):200–217, 1967.
[5] Peter S Bullen, Dragoslav S Mitrinovic, and M Vasic.
Means and their Inequalities, volume 31.
Springer Science & Business Media, 2013.
[6] Jacob Burbea and C Rao.
On the convexity of some divergence measures based on entropy functions.
IEEE Transactions on Information Theory, 28(3):489–495, 1982.
[7] Bruno De Finetti.
Sul concetto di media.
3:369Ű396, 1931.
Istituto italiano degli attuari.
[8] Otto Ludwig Holder.
Über einen Mittelwertssatz.
Nachr. Akad. Wiss. Gottingen Math.-Phys. Kl., pages 38–47, 1889.
28

References II
[9] Johan Ludwig William Valdemar Jensen.
Sur les fonctions convexes et les inégalités entre les valeurs moyennes.
Acta mathematica, 30(1):175–193, 1906.
[10] Andrey Nikolaevich Kolmogorov.
Sur la notion de moyenne.
12:388Ű391, 1930.
Acad. Naz. Lincei Mem. Cl. Sci. His. Mat. Natur. Sez.
[11] Gyula Maksa and Zsolt Páles.
Convexity with respect to families of means.
Aequationes mathematicae, 89(1):161–167, 2015.
[12] Janusz Matkowski.
On weighted extensions of Cauchy’s means.
Journal of mathematical analysis and applications, 319(1):215–227, 2006.
[13] Mitio Nagumo.
Über eine klasse der mittelwerte.
In Japanese journal of mathematics: transactions and abstracts, volume 7, pages 71–79. The
Mathematical Society of Japan, 1930.
[14] Constantin P. Niculescu and Lars-Erik Persson.
Convex functions and their applications: A contemporary approach.
Springer Science & Business Media, 2006.
[15] Frank Nielsen and Sylvain Boltz.
The Burbea-Rao and Bhattacharyya centroids.
IEEE Transactions on Information Theory, 57(8):5455–5466, 2011.
29

References III
[16] Frank Nielsen and Richard Nock.
Generalizing Jensen and Bregman divergences with comparative convexity and the statistical
Bhattacharyya distances with comparable means.
CoRR, abs/1702.04877, 2017.
[17] Frank Nielsen and Richard Nock.
Generalizing skew Jensen divergences and Bregman divergences with comparative convexity.
IEEE Signal Processing Letters, 24(8):1123–1127, Aug 2017.
[18] Richard Nock, Frank Nielsen, and Shun-ichi Amari.
On conformal divergences and their population minimizers.
IEEE Trans. Information Theory, 62(1):527–538, 2016.
[19] Jun Zhang.
Divergence function, duality, and convex analysis.
Neural Computation, 16(1):159–195, 2004.
30

Bregman divergences from comparative convexity

Empfohlen

Empfohlen

Weitere ähnliche Inhalte

Was ist angesagt?

Was ist angesagt? (20)

Ähnlich wie Bregman divergences from comparative convexity

Ähnlich wie Bregman divergences from comparative convexity (20)

Kürzlich hochgeladen

Kürzlich hochgeladen (20)

Bregman divergences from comparative convexity