This document provides an overview of concepts in nonsmooth optimization:
1) It defines key concepts such as directional derivatives, subgradients, and subdifferentials that generalize the gradient to nondifferentiable functions.
2) It discusses classes of nonsmooth functions, such as convex, Lipschitz continuous, and regular functions, and how their properties relate to generalized derivatives.
3) It introduces optimization methods for nonsmooth problems, such as subgradient methods, bundle methods, and discrete gradients, that replace the gradient descent approach for nondifferentiable objectives.
2. Preliminaries
• $\mathbb{R}^n$: the n-dimensional real Euclidean space, with $x, y \in \mathbb{R}^n$
• Usual inner product: $(x, y) = x^T y = \sum_{i=1}^{n} x_i y_i$
• Euclidean norm: $\|x\| = (x, x)^{1/2} = (x^T x)^{1/2}$
• $f : O \to \mathbb{R}$ is smooth (continuously differentiable) if the gradient $\nabla f : O \to \mathbb{R}^n$ is defined and continuous on an open set $O \subseteq \mathbb{R}^n$:
  $\nabla f(x) = \left( \frac{\partial f(x)}{\partial x_1}, \frac{\partial f(x)}{\partial x_2}, \ldots, \frac{\partial f(x)}{\partial x_n} \right)^T$
3. Smooth Functions - Directional Derivative
• Directional derivatives $f'(x; u)$, $f'(x; -u)$ of $f$ at $x \in O$ in the direction of $u \in \mathbb{R}^n$:
  $f'(x; u) := \lim_{\alpha \to 0^+} \frac{f(x + \alpha u) - f(x)}{\alpha} = (\nabla f(x), u)$  (a numerical check of this identity follows below)
• For the unit vectors $e_i$ $(i = 1, 2, \ldots, n)$, the directional derivatives $f'(x; e_1), f'(x; e_2), \ldots, f'(x; e_n)$ give the partial derivatives: $(\nabla f(x), e_1) = f_{x_1}$, $(\nabla f(x), e_2) = f_{x_2}$, ..., $(\nabla f(x), e_n) = f_{x_n}$.
• Note that $f'(x; u) = -f'(x; -u)$.
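To see the identity $f'(x; u) = (\nabla f(x), u)$ numerically, here is a minimal Python check; the test function, point, and direction are arbitrary choices for illustration:

```python
import numpy as np

# Smooth test function f(x) = x1^2 + 3*x2 and its gradient (illustrative choice).
f = lambda x: x[0] ** 2 + 3.0 * x[1]
grad_f = lambda x: np.array([2.0 * x[0], 3.0])

x = np.array([1.0, 2.0])
u = np.array([0.6, 0.8])   # direction (here a unit vector)

# Forward-difference estimate of f'(x; u) with a small alpha > 0.
alpha = 1e-6
fd = (f(x + alpha * u) - f(x)) / alpha

print(fd, grad_f(x) @ u)   # both are approximately 3.6
```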
4. Smooth Functions - 1st order approximation
• A first-order approximation of $f$ near $x \in O$ by means of the Taylor series with remainder term:
  $f(x + \delta) = f(x) + (\nabla f(x), \delta) + o_x(\delta)$  $(x + \delta \in O)$,
• where $\lim_{\alpha \to 0} \frac{o_x(\alpha \delta)}{\alpha} = 0$ and $\delta \in \mathbb{R}^n$ is small enough.
• Hence a smooth function can be locally replaced by a "simple" linear approximation of it.
5. Smooth Functions - Optimality Conditions
First-order necessary conditions for an extremum:
• For $x^* \in O$ to be a local minimizer of $f$ on $\mathbb{R}^n$, it is necessary that $\nabla f(x^*) = 0_n$,
• For $x^* \in O$ to be a local maximizer of $f$ on $\mathbb{R}^n$, it is necessary that $\nabla f(x^*) = 0_n$.
6. Smooth Functions - Descent/Ascent Directions
Directions of steepest descent and ascent, if $x$ is not a stationary point:
• the unit steepest descent direction $u_d$ of the function $f$ at a point $x$: $u_d(x) = -\frac{\nabla f(x)}{\|\nabla f(x)\|}$,
• the unit steepest ascent direction $u_a$ of the function $f$ at a point $x$: $u_a(x) = \frac{\nabla f(x)}{\|\nabla f(x)\|}$.
• There is only one steepest descent direction and only one steepest ascent direction, and $u_d(x) = -u_a(x)$.
7. Smooth Functions - Chain Rule
• Chain rule: Let $f : \mathbb{R}^n \to \mathbb{R}$, $g : \mathbb{R}^n \to \mathbb{R}$, $h : \mathbb{R}^n \to \mathbb{R}^n$.
• If $f \in C^1(O)$, $g \in C^1(O)$ and $f(x) = g(h(x))$, then $\nabla^T f(x) = \nabla^T g(h(x)) \, \nabla h(x)$; a numerical check follows below.
• $\nabla h(x) = \left( \frac{\partial h_j(x)}{\partial x_i} \right)_{i,j = 1, 2, \ldots, n}$ is an $n \times n$ matrix.
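As a quick numerical sanity check of the chain rule (the functions $g$, $h$, and the test point below are illustrative choices, not from the source):

```python
import numpy as np

# f(x) = g(h(x)) with g: R^2 -> R and h: R^2 -> R^2 (illustrative choices).
g = lambda y: y[0] * y[1]
grad_g = lambda y: np.array([y[1], y[0]])
h = lambda x: np.array([x[0] + x[1] ** 2, np.sin(x[0])])

def jac_h(x):
    # Matrix of partials (d h_j / d x_i), rows indexed by j, columns by i.
    return np.array([[1.0, 2.0 * x[1]],
                     [np.cos(x[0]), 0.0]])

x = np.array([0.5, -1.0])

# Chain rule: grad^T f(x) = grad^T g(h(x)) * (matrix of partials of h at x).
analytic = grad_g(h(x)) @ jac_h(x)

# Central-difference gradient of f = g o h for comparison.
eps, num = 1e-6, np.zeros(2)
for i in range(2):
    e = np.zeros(2); e[i] = eps
    num[i] = (g(h(x + e)) - g(h(x - e))) / (2 * eps)

print(analytic, num)   # the two should agree to about 6 digits
```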
8. Nonsmooth Optimization
• Deals with nondifferentiable functions
• The problem is to find a proper replacement for the concept
of gradient
• Different research groups work on different nonsmooth function classes; hence there are different theories to handle the different nonsmooth problems
• Tools replacing the gradient
10. Convex Functions
• $O \subseteq \mathbb{R}^n$ is a nonempty convex set if $\alpha x + (1 - \alpha) y \in O$ for all $x, y \in O$, $\alpha \in [0, 1]$
• $f : O \to \overline{\mathbb{R}}$, $\overline{\mathbb{R}} := [-\infty, \infty]$, is convex if
  $f(\lambda x + (1 - \lambda) y) \le \lambda f(x) + (1 - \lambda) f(y)$
  for any $x, y \in O$, $\lambda \in [0, 1]$.
11. Convex Functions
• Every local minimum is a global minimum
• $\xi$ is a subgradient of $f$ at a (possibly nondifferentiable) point $x \in \operatorname{dom} f$ if it satisfies the subgradient inequality, i.e.,
  $f(y) \ge f(x) + (\xi, y - x)$ for all $y$.
• The set of subgradients of $f$ at $x$ is called the subdifferential $\partial f(x)$ (see the example below):
  $\partial f(x) := \{\xi \in \mathbb{R}^n \mid f(y) \ge f(x) + (\xi, y - x) \ \forall y \in \mathbb{R}^n\}$.
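For the classic example $f(x) = |x|$ on $\mathbb{R}$, every $\xi \in [-1, 1]$ is a subgradient at $x = 0$; a brute-force check of the subgradient inequality (the grid sizes are arbitrary):

```python
import numpy as np

f = abs      # f(x) = |x|: convex, nondifferentiable at 0
x = 0.0

# Every xi in [-1, 1] satisfies f(y) >= f(x) + xi * (y - x) for all y,
# so the subdifferential at 0 is the whole interval [-1, 1].
for xi in np.linspace(-1.0, 1.0, 9):
    ys = np.linspace(-2.0, 2.0, 401)
    assert all(f(y) >= f(x) + xi * (y - x) - 1e-12 for y in ys)
print("every sampled xi in [-1, 1] satisfies the subgradient inequality at 0")
```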
12. Convex Functions
• The subgradients at a point can be characterized by the directional derivative: $f'(x; u) = \sup_{\xi \in \partial f(x)} (\xi, u)$.
• For $x$ in the interior of $\operatorname{dom} f$, the subdifferential $\partial f(x)$ is compact and the directional derivative is finite.
• The subdifferential in relation to the directional derivative:
  $\partial f(x) = \{\xi \in \mathbb{R}^n \mid f'(x; u) \ge (\xi, u) \ \forall u \in \mathbb{R}^n\}$.
13. Lipschitz Continuous Functions
• $f : O \to \mathbb{R}$ is Lipschitz continuous with some constant $K$ if for all $y, z$ in the open set $O$: $|f(y) - f(z)| \le K \|y - z\|$
• Such functions are differentiable almost everywhere (Rademacher's theorem)
• Clarke subdifferential $\partial_C f(x)$ of a Lipschitz continuous $f$ at $x$ (a worked example follows below):
  $\partial_C f(x) = \operatorname{co}\{\xi \in \mathbb{R}^n \mid \xi = \lim_{k \to \infty} \nabla f(x^k),\ x^k \to x,\ x^k \in D\}$,
  where $D$ is the set of points at which $f$ is differentiable.
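Worked example for the standard case $f(x) = |x|$ on $\mathbb{R}$:

```latex
% Clarke subdifferential of f(x) = |x| at x = 0.
% f is differentiable on D = \mathbb{R} \setminus \{0\} with
% \nabla f(x) = \mathrm{sign}(x), so the gradient limits along sequences
% x^k \to 0 are +1 (from the right) and -1 (from the left). Hence
\partial_C f(0) = \operatorname{co}\{-1, +1\} = [-1, 1].
```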
14. Lipschitz Continuous Functions
• Mean value theorem for Clarke subdifferentials: there is a point $c$ on the segment between $a$ and $b$ and a $\xi \in \partial_C f(c)$ with
  $f(b) - f(a) = (\xi, b - a)$
• Nonsmooth chain rule with respect to the Clarke subdifferential:
  $\partial_C (g \circ F)(x) \subseteq \operatorname{co}\left\{ \sum_{i=1}^{m} \mu_i \xi_i \;\middle|\; \xi = (\xi_1, \xi_2, \ldots, \xi_m) \in \partial_C g(F(x)),\ \mu_i \in \partial_C f_i(x)\ (i = 1, 2, \ldots, m) \right\}$
• where $F(\cdot) = (f_1(\cdot), f_2(\cdot), \ldots, f_m(\cdot))$ is a vector-valued function and $g : \mathbb{R}^m \to \mathbb{R}$, $g \circ F : \mathbb{R}^n \to \mathbb{R}$ are Lipschitz continuous
15. Regular Functions
• A locally Lipschitz function $f$ is regular at $x$ if the directional derivative $f'(x; u)$ exists for all $u$ and coincides with the Clarke directional derivative: $f_C'(x; u) = f'(x; u)$
• Ex: Semismooth functions: a locally Lipschitz $f : \mathbb{R}^n \to \mathbb{R}$ is semismooth at $x \in \mathbb{R}^n$ if for every $u \in \mathbb{R}^n$ the following limit exists:
  $\lim_{\substack{\xi \in \partial f(x + \alpha v),\ v \to u,\ \alpha \to 0^+}} (\xi, u)$
16. Max- and Min-type Functions
• $f(x) = \max\{f_1(x), f_2(x), \ldots, f_m(x)\}$, $f_i : \mathbb{R}^n \to \mathbb{R}$ $(i = 1, 2, \ldots, m)$
• $\partial_C f(x) \subseteq \operatorname{co}\left\{ \bigcup_{i \in J(x)} \partial_C f_i(x) \right\}$,
  where $J(x) := \{i = 1, 2, \ldots, m \mid f(x) = f_i(x)\}$ is the set of active indices
• Ex: $f(x) = \max\{f_1(x), f_2(x)\}$; a concrete subgradient computation is sketched below
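A minimal sketch of the corresponding computation: for a pointwise maximum of smooth pieces, the gradient of any active piece is a valid Clarke subgradient (the pieces and the test point are illustrative choices):

```python
import numpy as np

# f(x) = max(f1(x), f2(x)) with smooth pieces (illustrative choices).
pieces = [
    (lambda x: x[0] ** 2 + x[1] ** 2, lambda x: 2.0 * x),               # f1, grad f1
    (lambda x: x[0] + 2.0,            lambda x: np.array([1.0, 0.0])),  # f2, grad f2
]

def max_subgradient(x, tol=1e-12):
    """Return f(x) and one Clarke subgradient: the gradient of an active piece."""
    vals = [fi(x) for fi, _ in pieces]
    fmax = max(vals)
    active = [i for i, v in enumerate(vals) if fmax - v <= tol]   # J(x)
    return fmax, pieces[active[0]][1](x)

x = np.array([2.0, 0.0])     # here f1(x) = f2(x) = 4, so both pieces are active
print(max_subgradient(x))    # f(x) = 4.0 and the subgradient grad f1(x) = [4, 0]
```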
17. Quasidifferentiable Functions
• $f : \mathbb{R}^n \to \mathbb{R}$ is quasidifferentiable if $f'(x; u)$ exists finitely at every $x$ in every direction $u$ and there exists a pair of sets $[\underline{\partial} f(x), \overline{\partial} f(x)]$ such that
  $f'(x; u) = \max_{\xi \in \underline{\partial} f(x)} (\xi, u) + \min_{\varphi \in \overline{\partial} f(x)} (\varphi, u)$
• $[\underline{\partial} f(x), \overline{\partial} f(x)]$ is the quasidifferential, $\underline{\partial} f(x)$ the subdifferential, $\overline{\partial} f(x)$ the superdifferential
18. Directional Derivatives
For $f : O \to \mathbb{R}$, $O \subset \mathbb{R}^n$, at $x \in O$ and in the direction $u \in \mathbb{R}^n$:
• Dini Directional Derivative
• Hadamard Directional Derivative
• Clarke Directional Derivative
• Michel-Penot Directional Derivative
19. Dini Directional Derivative
• upper Dini directional derivative:
  $f_D^+(x; u) := \limsup_{\alpha \to 0^+} \frac{f(x + \alpha u) - f(x)}{\alpha}$
• lower Dini directional derivative:
  $f_D^-(x; u) := \liminf_{\alpha \to 0^+} \frac{f(x + \alpha u) - f(x)}{\alpha}$
• $f$ is Dini directionally differentiable at $x$ if $f_D^+(x; u) = f_D^-(x; u)$; a numeric illustration where the two differ is given below
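A numeric illustration of the two Dini derivatives genuinely differing, using the standard oscillating example $f(t) = t \sin(1/t)$, $f(0) = 0$ (the sampling grid is an assumption of the sketch):

```python
import numpy as np

# f(t) = t*sin(1/t), f(0) = 0: continuous, and its difference quotient at 0
# equals sin(1/alpha), which oscillates through the whole interval [-1, 1].
def f(t):
    return 0.0 if t == 0.0 else t * np.sin(1.0 / t)

x, u = 0.0, 1.0
alphas = np.logspace(-1, -8, 20000)    # sampling grid for alpha -> 0+
quotients = np.array([(f(x + a * u) - f(x)) / a for a in alphas])

# Estimates of the upper and lower Dini derivatives at x in direction u:
print(quotients.max(), quotients.min())    # approximately +1 and -1
```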
21. Clarke Directional Derivative
• upper Clarke directional derivative:
  $f_C^+(x; u) := \limsup_{y \to x,\ \alpha \to 0^+} \frac{f(y + \alpha u) - f(y)}{\alpha}$
• lower Clarke directional derivative:
  $f_C^-(x; u) := \liminf_{y \to x,\ \alpha \to 0^+} \frac{f(y + \alpha u) - f(y)}{\alpha}$
• $f$ is Clarke directionally differentiable at $x$ if $f_C^+(x; u) = f_C^-(x; u)$
22. Michel-Penot Directional Derivative
• upper Michel-Penot directional derivative:
  $f_{MP}^+(x; u) := \sup_{v \in \mathbb{R}^n} \left\{ \limsup_{\alpha \to 0^+} \frac{1}{\alpha} \left[ f(x + \alpha (u + v)) - f(x + \alpha v) \right] \right\}$
• lower Michel-Penot directional derivative:
  $f_{MP}^-(x; u) := \inf_{v \in \mathbb{R}^n} \left\{ \liminf_{\alpha \to 0^+} \frac{1}{\alpha} \left[ f(x + \alpha (u + v)) - f(x + \alpha v) \right] \right\}$
• $f$ is Michel-Penot directionally differentiable at $x$ if $f_{MP}^+(x; u) = f_{MP}^-(x; u)$
23. Subdifferentials and Optimality Conditions
• $f'(x; u) = \max_{\xi \in \partial f(x)} (\xi, u)$ $\forall u \in \mathbb{R}^n$
• For a point $x^*$ to be a minimizer, it is necessary that $0_n \in \partial f(x^*)$
• A point $x^*$ satisfying $0_n \in \partial f(x^*)$ is called a stationary point
25. Descent Methods
• $\min f(x)$ subject to $x \in \mathbb{R}^n$
• The objective is to find a direction $d^k$ with $f(x^k + d^k) < f(x^k)$,
• i.e., to solve $\min f(x^k + d) - f(x^k)$ subject to $d \in \mathbb{R}^n$.
• For $f(x)$ twice continuously differentiable, expanding $f(x^k + d)$:
  $f(x^k + d) - f(x^k) = f'(x^k; d) + \|d\| \, \varepsilon(d)$, where $\varepsilon(d) \to 0$ as $\|d\| \to 0$
26. Descent Methods
• We know $f'(x^k; d) = \nabla f(x^k)^T d$
• $\min_{d \in \mathbb{R}^n} \nabla f(x^k)^T d$ subject to $\|d\| \le 1$.
• The search direction in descent methods is therefore
  $d^k = -\frac{\nabla f(x^k)}{\|\nabla f(x^k)\|}$
• To find $x^{k+1}$, a line search is performed along $d^k$ to obtain a step size $t$, from which the next point $x^{k+1} = x^k + t d^k$ is computed; see the sketch below
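A minimal sketch of this smooth descent scheme, with a backtracking (Armijo) line search standing in for the generic line search; the test function and constants are illustrative:

```python
import numpy as np

def descent(f, grad, x, iters=200, c=1e-4):
    """Steepest descent with a backtracking line search (illustrative sketch)."""
    for _ in range(iters):
        g = grad(x)
        if np.linalg.norm(g) < 1e-10:       # stationary: grad f(x) = 0
            break
        d = -g / np.linalg.norm(g)          # unit steepest descent direction
        t = 1.0
        # Backtrack until a sufficient decrease (Armijo condition) holds.
        while f(x + t * d) > f(x) + c * t * (g @ d):
            t *= 0.5
        x = x + t * d
    return x

f = lambda x: (x[0] - 1.0) ** 2 + 2.0 * (x[1] + 3.0) ** 2
grad = lambda x: np.array([2.0 * (x[0] - 1.0), 4.0 * (x[1] + 3.0)])
print(descent(f, grad, np.array([5.0, 5.0])))   # approaches (1, -3)
```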
27. Subgradient Algorithm
• Developed for minimizing convex functions
• $\min f(x)$ subject to $x \in \mathbb{R}^n$
• Given $x^0$, the method generates a sequence $\{x^k\}_{k=0}^{\infty}$ according to
  $x^{k+1} = x^k - \alpha_k v^k, \quad v^k \in \partial f(x^k)$
• A simple generalization of the descent method above, but without a line search:
• the direction opposite to an arbitrary subgradient is not necessarily a descent direction, so a line search cannot be used (a sketch follows below)
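A minimal sketch of the iteration on $f(x) = \|x\|_1$ (an illustrative choice), using the classic diminishing step sizes $\alpha_k = 1/(k+1)$ rather than any particular rule from the source:

```python
import numpy as np

def subgradient_method(f, subgrad, x, iters=500):
    """Subgradient method with diminishing steps alpha_k = 1/(k+1) (a sketch).

    There is no line search and f need not decrease at every step,
    so we track the best point seen so far.
    """
    best_x, best_f = x, f(x)
    for k in range(iters):
        v = subgrad(x)                       # any v in the subdifferential
        x = x - (1.0 / (k + 1)) * v
        if f(x) < best_f:
            best_x, best_f = x, f(x)
    return best_x, best_f

f = lambda x: np.abs(x).sum()                # f(x) = ||x||_1, minimized at 0
subgrad = lambda x: np.sign(x)               # sign(0) = 0 is a valid subgradient
print(subgradient_method(f, subgrad, np.array([4.0, -2.5])))
```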
28. Subgradient Algorithm
• The method is not a descent method and need not converge to a stationary point for arbitrary step sizes
• Special rules are therefore used for the computation of the step size
• Theorem (Shor, N.Z.): Let $S^*$ be the set of minimum points of $f$ and let $\{x^k\}$ be generated using the step size $\alpha_k := \frac{\alpha}{\|v^k\|}$. Then for any $\varepsilon > 0$ and any $x^* \in S^*$, one can find a $k = \bar{k}$ such that
  $\|x^{\bar{k}} - x^*\| < \frac{\alpha (1 + \varepsilon)}{2}$
29. Bundle Method
• At the current iterate $x^k$, we have trial points $y^j \in \mathbb{R}^n$ $(j \in J_k \subseteq \{1, 2, \ldots, k\})$
• Idea: underestimate $f$ by using a piecewise-linear function
• Subdifferential of $f$ at $x$:
  $\partial f(x) = \{v \in \mathbb{R}^n \mid (v, z - x) \le f(z) - f(x) \ \forall z \in \mathbb{R}^n\}$
• Cutting-plane model with $v^j \in \partial f(y^j)$:
  $\hat{f}_k(x) = \max_{j \in J_k} \{f(y^j) + (v^j, x - y^j)\}$
• $\hat{f}_k(x) \le f(x)$ $\forall x \in \mathbb{R}^n$ and $\hat{f}_k(y^j) = f(y^j)$, $j \in J_k$ (see the sketch below)
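A minimal sketch of the model $\hat{f}_k$ built from a bundle of (trial point, value, subgradient) triples; the data layout and the test function $f(x) = |x|$ are assumptions for illustration:

```python
import numpy as np

def model(x, bundle):
    """Cutting-plane model: max over the linearizations f(y_j) + <v_j, x - y_j>."""
    return max(fy + v @ (x - y) for (y, fy, v) in bundle)

# Bundle for f(x) = |x| built at three trial points (illustrative).
f = lambda x: np.abs(x).sum()
bundle = [(np.array([y]), f(np.array([y])), np.sign(np.array([y])))
          for y in (-2.0, 1.0, 0.5)]

# Underestimation property: the model never exceeds f.
for t in (-1.5, 0.0, 0.7):
    x = np.array([t])
    print(model(x, bundle), "<=", f(x))
```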
30. Bundle Method
• Serious step: $x^{k+1} := y^{k+1} := x^k + t d^k$, $t > 0$, in case a sufficient decrease is achieved at $x^{k+1}$,
• Null step: $x^{k+1} := x^k$, in case no sufficient decrease is achieved; the gradient information is enriched by adding the new subgradient $v^{k+1} \in \partial f(y^{k+1})$ to the bundle.
31. Bundle Method
• Standard concepts: serious step and null step
• The convergence problem is avoided by making sure that bundle methods are descent methods.
• The descent direction is found by solving a quadratic program (QP) involving the cutting-plane approximation of the function over a bundle of subgradients, as sketched below.
• Bundle methods utilize the information from previous iterations by storing the subgradient information in a bundle.
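A minimal sketch of the direction-finding subproblem of a proximal bundle method, solved with scipy.optimize.minimize (SLSQP) in place of a dedicated QP solver; the proximal parameter t, the bundle layout, and the 1-D test problem are assumptions of the sketch:

```python
import numpy as np
from scipy.optimize import minimize

def bundle_direction(x, bundle, t=1.0):
    """Direction-finding QP of a proximal bundle method (illustrative sketch).

    Solves   min_{d, z}  z + ||d||^2 / (2 t)
             s.t.        z >= f(y_j) + <v_j, x + d - y_j>   for all j,
    where z stands for the cutting-plane model value at x + d.
    """
    n = len(x)

    def objective(w):
        d, z = w[:n], w[n]
        return z + d @ d / (2.0 * t)

    cons = [{"type": "ineq",
             "fun": lambda w, y=y, fy=fy, v=v: w[n] - (fy + v @ (x + w[:n] - y))}
            for (y, fy, v) in bundle]

    w0 = np.zeros(n + 1)
    w0[n] = max(fy + v @ (x - y) for (y, fy, v) in bundle)   # feasible start
    return minimize(objective, w0, constraints=cons, method="SLSQP").x[:n]

# Bundle for f(x) = |x| with cuts at -2, 1 and 0.5; from x = 0.7 the
# computed direction points toward the minimizer x = 0.
f = lambda x: np.abs(x).sum()
bundle = [(np.array([y]), f(np.array([y])), np.sign(np.array([y])))
          for y in (-2.0, 1.0, 0.5)]
print(bundle_direction(np.array([0.7]), bundle))   # approximately [-0.7]
```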
32. Asplund Spaces
• "Nonsmooth" usually refers to functions, but spaces can be classified in this way as well
• Banach spaces: complete normed vector spaces
• Fréchet derivative, Gâteaux derivative
• $f$ is Fréchet differentiable on an open set $U \subseteq V$ if its Gâteaux derivative is linear and bounded at each point of $U$ and the Gâteaux derivative is a continuous map $U \to L(V, W)$.
• Asplund spaces: Banach spaces on which every convex continuous function is generically Fréchet differentiable
33. References
Clarke, F.H., 1983. Optimization and Nonsmooth Analysis, Wiley-Interscience, New York.
Demyanov, V.F., 2002. The Rise of Nonsmooth Analysis: Its Main Tools, Cybernetics and Systems Analysis, 38(4).
Jongen, H.Th., Pallaschke, D., 1988. On linearization and continuous selections of functions, Optimization, 19(3), 343-353.
Rockafellar, R.T., 1972. Convex Analysis, Princeton University Press, New Jersey.
Schittkowski, K., 1992. Solving nonlinear programming problems with very many constraints, Optimization, 25, 179-196.
Weber, G.-W., 1993. Minimization of a max-type function: Characterization of structural stability, in: Parametric Optimization and Related Topics III, J. Guddat, H.Th. Jongen, B. Kummer, and F. Nozicka, eds., Peter Lang, Frankfurt a.M., Bern, New York, pp. 519-538.