This document provides an overview of concepts in nonsmooth optimization:
1) It defines key concepts such as directional derivatives, subgradients, and subdifferentials that generalize the gradient to nondifferentiable functions.
2) It discusses classes of nonsmooth functions, such as convex, Lipschitz continuous, and regular functions, and how their properties relate to generalized derivatives.
3) It introduces optimization methods for nonsmooth problems, such as subgradient methods, bundle methods, and discrete gradients, that replace the gradient descent approach for nondifferentiable objectives.
2. Preliminaries
• $\mathbb{R}^n$: the n-dimensional real Euclidean space, with $x, y \in \mathbb{R}^n$
• Usual inner product: $(x, y) = x^T y = \sum_{i=1}^{n} x_i y_i$
• Euclidean norm: $\|x\| = (x, x)^{1/2} = (x^T x)^{1/2}$
• $f : O \to \mathbb{R}$ is smooth (continuously differentiable) if the gradient $\nabla f : O \to \mathbb{R}^n$ is defined and continuous on an open set $O \subseteq \mathbb{R}^n$:
  $\nabla f(x) = \left( \frac{\partial f(x)}{\partial x_1}, \frac{\partial f(x)}{\partial x_2}, \ldots, \frac{\partial f(x)}{\partial x_n} \right)^T$
3. Smooth Functions - Directional Derivative
• Directional derivatives $f'(x; u)$, $f'(x; -u)$ of $f$ at $x \in O$ in the direction of $u \in \mathbb{R}^n$:
  $f'(x; u) := \lim_{\alpha \to 0^+} \frac{f(x + \alpha u) - f(x)}{\alpha} = (\nabla f(x), u)$  (a numerical check of this identity follows below)
• For the unit vectors $e_i$ $(i = 1, 2, \ldots, n)$, the directional derivatives $f'(x; e_1), f'(x; e_2), \ldots, f'(x; e_n)$ give the partial derivatives: $(\nabla f(x), e_1) = f_{x_1}$, $(\nabla f(x), e_2) = f_{x_2}$, ..., $(\nabla f(x), e_n) = f_{x_n}$.
• Note that $f'(x; u) = -f'(x; -u)$.
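To see the identity $f'(x; u) = (\nabla f(x), u)$ numerically, here is a minimal Python check; the test function, point, and direction are arbitrary choices for illustration:

```python
import numpy as np

# Smooth test function f(x) = x1^2 + 3*x2 and its gradient (illustrative choice).
f = lambda x: x[0] ** 2 + 3.0 * x[1]
grad_f = lambda x: np.array([2.0 * x[0], 3.0])

x = np.array([1.0, 2.0])
u = np.array([0.6, 0.8])   # direction (here a unit vector)

# Forward-difference estimate of f'(x; u) with a small alpha > 0.
alpha = 1e-6
fd = (f(x + alpha * u) - f(x)) / alpha

print(fd, grad_f(x) @ u)   # both are approximately 3.6
```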
4. Smooth Functions - 1st order approximation
• A first-order approximation of $f$ near $x \in O$ by means of the Taylor series with remainder term:
  $f(x + \delta) = f(x) + (\nabla f(x), \delta) + o_x(\delta)$  $(x + \delta \in O)$,
• where $\lim_{\alpha \to 0} \frac{o_x(\alpha \delta)}{\alpha} = 0$ and $\delta \in \mathbb{R}^n$ is small enough.
• Hence a smooth function can be locally replaced by a "simple" linear approximation of it.
5. Smooth Functions - Optimality Conditions
First-order necessary conditions for an extremum:
• For $x^* \in O$ to be a local minimizer of $f$ on $\mathbb{R}^n$, it is necessary that $\nabla f(x^*) = 0_n$,
• For $x^* \in O$ to be a local maximizer of $f$ on $\mathbb{R}^n$, it is necessary that $\nabla f(x^*) = 0_n$.
6. Smooth Functions - Descent/Ascent Directions
Directions of steepest descent and ascent, if $x$ is not a stationary point:
• the unit steepest descent direction $u_d$ of the function $f$ at a point $x$: $u_d(x) = -\frac{\nabla f(x)}{\|\nabla f(x)\|}$,
• the unit steepest ascent direction $u_a$ of the function $f$ at a point $x$: $u_a(x) = \frac{\nabla f(x)}{\|\nabla f(x)\|}$.
• There is only one steepest descent direction and only one steepest ascent direction, and $u_d(x) = -u_a(x)$.
7. Smooth Functions - Chain Rule
• Chain rule: Let $f : \mathbb{R}^n \to \mathbb{R}$, $g : \mathbb{R}^n \to \mathbb{R}$, $h : \mathbb{R}^n \to \mathbb{R}^n$.
• If $f \in C^1(O)$, $g \in C^1(O)$ and $f(x) = g(h(x))$, then $\nabla^T f(x) = \nabla^T g(h(x)) \, \nabla h(x)$; a numerical check follows below.
• $\nabla h(x) = \left( \frac{\partial h_j(x)}{\partial x_i} \right)_{i,j = 1, 2, \ldots, n}$ is an $n \times n$ matrix.
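As a quick numerical sanity check of the chain rule (the functions $g$, $h$, and the test point below are illustrative choices, not from the source):

```python
import numpy as np

# f(x) = g(h(x)) with g: R^2 -> R and h: R^2 -> R^2 (illustrative choices).
g = lambda y: y[0] * y[1]
grad_g = lambda y: np.array([y[1], y[0]])
h = lambda x: np.array([x[0] + x[1] ** 2, np.sin(x[0])])

def jac_h(x):
    # Matrix of partials (d h_j / d x_i), rows indexed by j, columns by i.
    return np.array([[1.0, 2.0 * x[1]],
                     [np.cos(x[0]), 0.0]])

x = np.array([0.5, -1.0])

# Chain rule: grad^T f(x) = grad^T g(h(x)) * (matrix of partials of h at x).
analytic = grad_g(h(x)) @ jac_h(x)

# Central-difference gradient of f = g o h for comparison.
eps, num = 1e-6, np.zeros(2)
for i in range(2):
    e = np.zeros(2); e[i] = eps
    num[i] = (g(h(x + e)) - g(h(x - e))) / (2 * eps)

print(analytic, num)   # the two should agree to about 6 digits
```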
8. Nonsmooth Optimization
• Deals with nondifferentiable functions
• The problem is to find a proper replacement for the concept
of gradient
• Different research groups work on different nonsmooth function classes; hence there are different theories to handle the different nonsmooth problems
• Tools replacing the gradient
10. Convex Functions
• $O \subseteq \mathbb{R}^n$ is a nonempty convex set if $\alpha x + (1 - \alpha) y \in O$ for all $x, y \in O$, $\alpha \in [0, 1]$
• $f : O \to \overline{\mathbb{R}}$, $\overline{\mathbb{R}} := [-\infty, \infty]$, is convex if
  $f(\lambda x + (1 - \lambda) y) \le \lambda f(x) + (1 - \lambda) f(y)$
  for any $x, y \in O$, $\lambda \in [0, 1]$.
11. Convex Functions
• Every local minimum is a global minimum
• $\xi$ is a subgradient of $f$ at a (possibly nondifferentiable) point $x \in \operatorname{dom} f$ if it satisfies the subgradient inequality, i.e.,
  $f(y) \ge f(x) + (\xi, y - x)$ for all $y$.
• The set of subgradients of $f$ at $x$ is called the subdifferential $\partial f(x)$ (see the example below):
  $\partial f(x) := \{\xi \in \mathbb{R}^n \mid f(y) \ge f(x) + (\xi, y - x) \ \forall y \in \mathbb{R}^n\}$.
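For the classic example $f(x) = |x|$ on $\mathbb{R}$, every $\xi \in [-1, 1]$ is a subgradient at $x = 0$; a brute-force check of the subgradient inequality (the grid sizes are arbitrary):

```python
import numpy as np

f = abs      # f(x) = |x|: convex, nondifferentiable at 0
x = 0.0

# Every xi in [-1, 1] satisfies f(y) >= f(x) + xi * (y - x) for all y,
# so the subdifferential at 0 is the whole interval [-1, 1].
for xi in np.linspace(-1.0, 1.0, 9):
    ys = np.linspace(-2.0, 2.0, 401)
    assert all(f(y) >= f(x) + xi * (y - x) - 1e-12 for y in ys)
print("every sampled xi in [-1, 1] satisfies the subgradient inequality at 0")
```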
12. Convex Functions
• The subgradients at a point can be characterized by the directional derivative: $f'(x; u) = \sup_{\xi \in \partial f(x)} (\xi, u)$.
• For $x$ in the interior of $\operatorname{dom} f$, the subdifferential $\partial f(x)$ is compact and the directional derivative is finite.
• The subdifferential in relation to the directional derivative:
  $\partial f(x) = \{\xi \in \mathbb{R}^n \mid f'(x; u) \ge (\xi, u) \ \forall u \in \mathbb{R}^n\}$.
13. Lipschitz Continuous Functions
• $f : O \to \mathbb{R}$ is Lipschitz continuous with some constant $K$ if for all $y, z$ in the open set $O$: $|f(y) - f(z)| \le K \|y - z\|$
• Such functions are differentiable almost everywhere (Rademacher's theorem)
• Clarke subdifferential $\partial_C f(x)$ of a Lipschitz continuous $f$ at $x$ (a worked example follows below):
  $\partial_C f(x) = \operatorname{co}\{\xi \in \mathbb{R}^n \mid \xi = \lim_{k \to \infty} \nabla f(x^k),\ x^k \to x,\ x^k \in D\}$,
  where $D$ is the set of points at which $f$ is differentiable.
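Worked example for the standard case $f(x) = |x|$ on $\mathbb{R}$:

```latex
% Clarke subdifferential of f(x) = |x| at x = 0.
% f is differentiable on D = \mathbb{R} \setminus \{0\} with
% \nabla f(x) = \mathrm{sign}(x), so the gradient limits along sequences
% x^k \to 0 are +1 (from the right) and -1 (from the left). Hence
\partial_C f(0) = \operatorname{co}\{-1, +1\} = [-1, 1].
```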
14. Lipschitz Continuous Functions
• Mean value theorem for Clarke subdifferentials: there is a point $c$ on the segment between $a$ and $b$ and a $\xi \in \partial_C f(c)$ with
  $f(b) - f(a) = (\xi, b - a)$
• Nonsmooth chain rule with respect to the Clarke subdifferential:
  $\partial_C (g \circ F)(x) \subseteq \operatorname{co}\left\{ \sum_{i=1}^{m} \mu_i \xi_i \;\middle|\; \xi = (\xi_1, \xi_2, \ldots, \xi_m) \in \partial_C g(F(x)),\ \mu_i \in \partial_C f_i(x)\ (i = 1, 2, \ldots, m) \right\}$
• where $F(\cdot) = (f_1(\cdot), f_2(\cdot), \ldots, f_m(\cdot))$ is a vector-valued function and $g : \mathbb{R}^m \to \mathbb{R}$, $g \circ F : \mathbb{R}^n \to \mathbb{R}$ are Lipschitz continuous
15. Regular Functions
• A locally Lipschitz function $f$ is regular at $x$ if the directional derivative $f'(x; u)$ exists for all $u$ and coincides with the Clarke directional derivative: $f_C'(x; u) = f'(x; u)$
• Ex: Semismooth functions: a locally Lipschitz $f : \mathbb{R}^n \to \mathbb{R}$ is semismooth at $x \in \mathbb{R}^n$ if for every $u \in \mathbb{R}^n$ the following limit exists:
  $\lim_{\substack{\xi \in \partial f(x + \alpha v),\ v \to u,\ \alpha \to 0^+}} (\xi, u)$
16. Max- and Min-type Functions
• $f(x) = \max\{f_1(x), f_2(x), \ldots, f_m(x)\}$, $f_i : \mathbb{R}^n \to \mathbb{R}$ $(i = 1, 2, \ldots, m)$
• $\partial_C f(x) \subseteq \operatorname{co}\left\{ \bigcup_{i \in J(x)} \partial_C f_i(x) \right\}$,
  where $J(x) := \{i = 1, 2, \ldots, m \mid f(x) = f_i(x)\}$ is the set of active indices
• Ex: $f(x) = \max\{f_1(x), f_2(x)\}$; a concrete subgradient computation is sketched below
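A minimal sketch of the corresponding computation: for a pointwise maximum of smooth pieces, the gradient of any active piece is a valid Clarke subgradient (the pieces and the test point are illustrative choices):

```python
import numpy as np

# f(x) = max(f1(x), f2(x)) with smooth pieces (illustrative choices).
pieces = [
    (lambda x: x[0] ** 2 + x[1] ** 2, lambda x: 2.0 * x),               # f1, grad f1
    (lambda x: x[0] + 2.0,            lambda x: np.array([1.0, 0.0])),  # f2, grad f2
]

def max_subgradient(x, tol=1e-12):
    """Return f(x) and one Clarke subgradient: the gradient of an active piece."""
    vals = [fi(x) for fi, _ in pieces]
    fmax = max(vals)
    active = [i for i, v in enumerate(vals) if fmax - v <= tol]   # J(x)
    return fmax, pieces[active[0]][1](x)

x = np.array([2.0, 0.0])     # here f1(x) = f2(x) = 4, so both pieces are active
print(max_subgradient(x))    # f(x) = 4.0 and the subgradient grad f1(x) = [4, 0]
```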
17. Quasidifferentiable Functions
• $f : \mathbb{R}^n \to \mathbb{R}$ is quasidifferentiable if $f'(x; u)$ exists finitely at every $x$ in every direction $u$ and there exists a pair of sets $[\underline{\partial} f(x), \overline{\partial} f(x)]$ such that
  $f'(x; u) = \max_{\xi \in \underline{\partial} f(x)} (\xi, u) + \min_{\varphi \in \overline{\partial} f(x)} (\varphi, u)$
• $[\underline{\partial} f(x), \overline{\partial} f(x)]$ is the quasidifferential, $\underline{\partial} f(x)$ the subdifferential, $\overline{\partial} f(x)$ the superdifferential
18. Directional Derivatives
For $f : O \to \mathbb{R}$, $O \subset \mathbb{R}^n$, at $x \in O$ and in the direction $u \in \mathbb{R}^n$:
• Dini Directional Derivative
• Hadamard Directional Derivative
• Clarke Directional Derivative
• Michel-Penot Directional Derivative
19. Dini Directional Derivative
• upper Dini directional derivative:
  $f_D^+(x; u) := \limsup_{\alpha \to 0^+} \frac{f(x + \alpha u) - f(x)}{\alpha}$
• lower Dini directional derivative:
  $f_D^-(x; u) := \liminf_{\alpha \to 0^+} \frac{f(x + \alpha u) - f(x)}{\alpha}$
• $f$ is Dini directionally differentiable at $x$ if $f_D^+(x; u) = f_D^-(x; u)$; a numeric illustration where the two differ is given below
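A numeric illustration of the two Dini derivatives genuinely differing, using the standard oscillating example $f(t) = t \sin(1/t)$, $f(0) = 0$ (the sampling grid is an assumption of the sketch):

```python
import numpy as np

# f(t) = t*sin(1/t), f(0) = 0: continuous, and its difference quotient at 0
# equals sin(1/alpha), which oscillates through the whole interval [-1, 1].
def f(t):
    return 0.0 if t == 0.0 else t * np.sin(1.0 / t)

x, u = 0.0, 1.0
alphas = np.logspace(-1, -8, 20000)    # sampling grid for alpha -> 0+
quotients = np.array([(f(x + a * u) - f(x)) / a for a in alphas])

# Estimates of the upper and lower Dini derivatives at x in direction u:
print(quotients.max(), quotients.min())    # approximately +1 and -1
```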
21. Clarke Directional Derivative
• upper Clarke directional derivative:
  $f_C^+(x; u) := \limsup_{y \to x,\ \alpha \to 0^+} \frac{f(y + \alpha u) - f(y)}{\alpha}$
• lower Clarke directional derivative:
  $f_C^-(x; u) := \liminf_{y \to x,\ \alpha \to 0^+} \frac{f(y + \alpha u) - f(y)}{\alpha}$
• $f$ is Clarke directionally differentiable at $x$ if $f_C^+(x; u) = f_C^-(x; u)$
22. Michel-Penot Directional Derivative
• upper Michel-Penot directional derivative:
  $f_{MP}^+(x; u) := \sup_{v \in \mathbb{R}^n} \left\{ \limsup_{\alpha \to 0^+} \frac{1}{\alpha} \left[ f(x + \alpha (u + v)) - f(x + \alpha v) \right] \right\}$
• lower Michel-Penot directional derivative:
  $f_{MP}^-(x; u) := \inf_{v \in \mathbb{R}^n} \left\{ \liminf_{\alpha \to 0^+} \frac{1}{\alpha} \left[ f(x + \alpha (u + v)) - f(x + \alpha v) \right] \right\}$
• $f$ is Michel-Penot directionally differentiable at $x$ if $f_{MP}^+(x; u) = f_{MP}^-(x; u)$
23. Subdifferentials and Optimality Conditions
• $f'(x; u) = \max_{\xi \in \partial f(x)} (\xi, u)$ $\forall u \in \mathbb{R}^n$
• For a point $x^*$ to be a minimizer, it is necessary that $0_n \in \partial f(x^*)$
• A point $x^*$ satisfying $0_n \in \partial f(x^*)$ is called a stationary point
25. Descent Methods
• $\min f(x)$ subject to $x \in \mathbb{R}^n$
• The objective is to find a direction $d^k$ with $f(x^k + d^k) < f(x^k)$,
• i.e., to solve $\min f(x^k + d) - f(x^k)$ subject to $d \in \mathbb{R}^n$.
• For $f(x)$ twice continuously differentiable, expanding $f(x^k + d)$:
  $f(x^k + d) - f(x^k) = f'(x^k; d) + \|d\| \, \varepsilon(d)$, where $\varepsilon(d) \to 0$ as $\|d\| \to 0$
26. Descent Methods
• We know $f'(x^k; d) = \nabla f(x^k)^T d$
• $\min_{d \in \mathbb{R}^n} \nabla f(x^k)^T d$ subject to $\|d\| \le 1$.
• The search direction in descent methods is therefore
  $d^k = -\frac{\nabla f(x^k)}{\|\nabla f(x^k)\|}$
• To find $x^{k+1}$, a line search is performed along $d^k$ to obtain a step size $t$, from which the next point $x^{k+1} = x^k + t d^k$ is computed; see the sketch below
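A minimal sketch of this smooth descent scheme, with a backtracking (Armijo) line search standing in for the generic line search; the test function and constants are illustrative:

```python
import numpy as np

def descent(f, grad, x, iters=200, c=1e-4):
    """Steepest descent with a backtracking line search (illustrative sketch)."""
    for _ in range(iters):
        g = grad(x)
        if np.linalg.norm(g) < 1e-10:       # stationary: grad f(x) = 0
            break
        d = -g / np.linalg.norm(g)          # unit steepest descent direction
        t = 1.0
        # Backtrack until a sufficient decrease (Armijo condition) holds.
        while f(x + t * d) > f(x) + c * t * (g @ d):
            t *= 0.5
        x = x + t * d
    return x

f = lambda x: (x[0] - 1.0) ** 2 + 2.0 * (x[1] + 3.0) ** 2
grad = lambda x: np.array([2.0 * (x[0] - 1.0), 4.0 * (x[1] + 3.0)])
print(descent(f, grad, np.array([5.0, 5.0])))   # approaches (1, -3)
```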
27. Subgradient Algorithm
• Developed for minimizing convex functions
• $\min f(x)$ subject to $x \in \mathbb{R}^n$
• Given $x^0$, the method generates a sequence $\{x^k\}_{k=0}^{\infty}$ according to
  $x^{k+1} = x^k - \alpha_k v^k, \quad v^k \in \partial f(x^k)$
• A simple generalization of the descent method above, but without a line search:
• the direction opposite to an arbitrary subgradient is not necessarily a descent direction, so a line search cannot be used (a sketch follows below)
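A minimal sketch of the iteration on $f(x) = \|x\|_1$ (an illustrative choice), using the classic diminishing step sizes $\alpha_k = 1/(k+1)$ rather than any particular rule from the source:

```python
import numpy as np

def subgradient_method(f, subgrad, x, iters=500):
    """Subgradient method with diminishing steps alpha_k = 1/(k+1) (a sketch).

    There is no line search and f need not decrease at every step,
    so we track the best point seen so far.
    """
    best_x, best_f = x, f(x)
    for k in range(iters):
        v = subgrad(x)                       # any v in the subdifferential
        x = x - (1.0 / (k + 1)) * v
        if f(x) < best_f:
            best_x, best_f = x, f(x)
    return best_x, best_f

f = lambda x: np.abs(x).sum()                # f(x) = ||x||_1, minimized at 0
subgrad = lambda x: np.sign(x)               # sign(0) = 0 is a valid subgradient
print(subgradient_method(f, subgrad, np.array([4.0, -2.5])))
```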
28. Subgradient Algorithm
• The method is not a descent method and need not converge to a stationary point for arbitrary step sizes
• Special rules are therefore used for the computation of the step size
• Theorem (Shor, N.Z.): Let $S^*$ be the set of minimum points of $f$ and let $\{x^k\}$ be generated using the step size $\alpha_k := \frac{\alpha}{\|v^k\|}$. Then for any $\varepsilon > 0$ and any $x^* \in S^*$, one can find a $k = \bar{k}$ such that
  $\|x^{\bar{k}} - x^*\| < \frac{\alpha (1 + \varepsilon)}{2}$
29. Bundle Method
• At the current iterate $x^k$, we have trial points $y^j \in \mathbb{R}^n$ $(j \in J_k \subseteq \{1, 2, \ldots, k\})$
• Idea: underestimate $f$ by using a piecewise-linear function
• Subdifferential of $f$ at $x$:
  $\partial f(x) = \{v \in \mathbb{R}^n \mid (v, z - x) \le f(z) - f(x) \ \forall z \in \mathbb{R}^n\}$
• Cutting-plane model with $v^j \in \partial f(y^j)$:
  $\hat{f}_k(x) = \max_{j \in J_k} \{f(y^j) + (v^j, x - y^j)\}$
• $\hat{f}_k(x) \le f(x)$ $\forall x \in \mathbb{R}^n$ and $\hat{f}_k(y^j) = f(y^j)$, $j \in J_k$ (see the sketch below)
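A minimal sketch of the model $\hat{f}_k$ built from a bundle of (trial point, value, subgradient) triples; the data layout and the test function $f(x) = |x|$ are assumptions for illustration:

```python
import numpy as np

def model(x, bundle):
    """Cutting-plane model: max over the linearizations f(y_j) + <v_j, x - y_j>."""
    return max(fy + v @ (x - y) for (y, fy, v) in bundle)

# Bundle for f(x) = |x| built at three trial points (illustrative).
f = lambda x: np.abs(x).sum()
bundle = [(np.array([y]), f(np.array([y])), np.sign(np.array([y])))
          for y in (-2.0, 1.0, 0.5)]

# Underestimation property: the model never exceeds f.
for t in (-1.5, 0.0, 0.7):
    x = np.array([t])
    print(model(x, bundle), "<=", f(x))
```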
30. Bundle Method
• Serious step: $x^{k+1} := y^{k+1} := x^k + t d^k$, $t > 0$, in case a sufficient decrease is achieved at $x^{k+1}$,
• Null step: $x^{k+1} := x^k$, in case no sufficient decrease is achieved; the gradient information is enriched by adding the new subgradient $v^{k+1} \in \partial f(y^{k+1})$ to the bundle.
31. Bundle Method
• Standard concepts: serious step and null step
• The convergence problem is avoided by making sure that bundle methods are descent methods.
• The descent direction is found by solving a quadratic program (QP) involving the cutting-plane approximation of the function over a bundle of subgradients, as sketched below.
• Bundle methods utilize the information from previous iterations by storing the subgradient information in a bundle.
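A minimal sketch of the direction-finding subproblem of a proximal bundle method, solved with scipy.optimize.minimize (SLSQP) in place of a dedicated QP solver; the proximal parameter t, the bundle layout, and the 1-D test problem are assumptions of the sketch:

```python
import numpy as np
from scipy.optimize import minimize

def bundle_direction(x, bundle, t=1.0):
    """Direction-finding QP of a proximal bundle method (illustrative sketch).

    Solves   min_{d, z}  z + ||d||^2 / (2 t)
             s.t.        z >= f(y_j) + <v_j, x + d - y_j>   for all j,
    where z stands for the cutting-plane model value at x + d.
    """
    n = len(x)

    def objective(w):
        d, z = w[:n], w[n]
        return z + d @ d / (2.0 * t)

    cons = [{"type": "ineq",
             "fun": lambda w, y=y, fy=fy, v=v: w[n] - (fy + v @ (x + w[:n] - y))}
            for (y, fy, v) in bundle]

    w0 = np.zeros(n + 1)
    w0[n] = max(fy + v @ (x - y) for (y, fy, v) in bundle)   # feasible start
    return minimize(objective, w0, constraints=cons, method="SLSQP").x[:n]

# Bundle for f(x) = |x| with cuts at -2, 1 and 0.5; from x = 0.7 the
# computed direction points toward the minimizer x = 0.
f = lambda x: np.abs(x).sum()
bundle = [(np.array([y]), f(np.array([y])), np.sign(np.array([y])))
          for y in (-2.0, 1.0, 0.5)]
print(bundle_direction(np.array([0.7]), bundle))   # approximately [-0.7]
```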
32. Asplund Spaces
• "Nonsmooth" usually refers to functions, but spaces can be classified in this way as well
• Banach spaces: complete normed vector spaces
• Fréchet derivative, Gâteaux derivative
• $f$ is Fréchet differentiable on an open set $U \subseteq V$ if its Gâteaux derivative is linear and bounded at each point of $U$ and the Gâteaux derivative is a continuous map $U \to L(V, W)$.
• Asplund spaces: Banach spaces on which every convex continuous function is generically Fréchet differentiable
33. References
Clarke, F.H., 1983. Optimization and Nonsmooth Analysis, Wiley-Interscience, New York.
Demyanov, V.F., 2002. The Rise of Nonsmooth Analysis: Its Main Tools, Cybernetics and Systems Analysis, 38(4).
Jongen, H.Th., Pallaschke, D., 1988. On linearization and continuous selections of functions, Optimization, 19(3), 343-353.
Rockafellar, R.T., 1972. Convex Analysis, Princeton University Press, New Jersey.
Schittkowski, K., 1992. Solving nonlinear programming problems with very many constraints, Optimization, 25, 179-196.
Weber, G.-W., 1993. Minimization of a max-type function: Characterization of structural stability, in: Parametric Optimization and Related Topics III, J. Guddat, H.Th. Jongen, B. Kummer, and F. Nozicka, eds., Peter Lang, Frankfurt a.M., Bern, New York, pp. 519-538.