1. Distributed Subgradient Methods for Saddle-Point Problems
David Mateos-Núñez, Jorge Cortés
University of California, San Diego
{dmateosn,cortes}@ucsd.edu
Conference on Decision and Control
Osaka, Japan, December 17, 2015
3. General agenda for today
Review of (consensus-based) distributed convex optimization
Part 1:
  Distributed optimization with separable constraints via agreement on the Lagrange multipliers
  General saddle-point problems with explicit agreement
Part 2:
  Convex-concave problems not arising from Lagrangians, e.g., with a strictly concave part
  Distributed low-rank matrix completion through a saddle-point characterization of the nuclear norm
4. Review: consensus-based distributed convex optimization

$x^* \in \arg\min_{x \in \mathbb{R}^d} \sum_{i=1}^N f^i(x)$   (basic unconstrained problem)

Agent $i$ has access to $f^i$
Agent $i$ can share its estimate of $x^*$ with "neighboring" agents

(Figure: communication digraph over agents $f^1, \ldots, f^5$ with adjacency matrix)

$A = \begin{pmatrix} 0 & 0 & a_{13} & a_{14} & 0 \\ a_{21} & 0 & 0 & 0 & a_{25} \\ a_{31} & a_{32} & 0 & 0 & 0 \\ a_{41} & 0 & 0 & 0 & 0 \\ 0 & a_{52} & 0 & 0 & 0 \end{pmatrix}$   (adjacency matrix)

Parallel computations: Tsitsiklis 84, Bertsekas and Tsitsiklis 95
Consensus: Jadbabaie et al. 03, Olfati-Saber and Murray 04, Boyd et al. 05
Distributed multi-agent optimization: A. Nedić and A. Ozdaglar 07
5. Review: the Laplacian matrix

$L = \operatorname{diag}(A\mathbf{1}) - A = \begin{pmatrix} a_{13}+a_{14} & 0 & -a_{13} & -a_{14} & 0 \\ -a_{21} & a_{21}+a_{25} & 0 & 0 & -a_{25} \\ -a_{31} & -a_{32} & a_{31}+a_{32} & 0 & 0 \\ -a_{41} & 0 & 0 & a_{41} & 0 \\ 0 & -a_{52} & 0 & 0 & a_{52} \end{pmatrix}$

Nullspace is agreement ⇔ graph has a spanning tree
Consensus via feedback on disagreement: $-[Lx]_i = \sum_{j=1}^N a_{ij}(x_j - x_i)$
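As a quick sanity check, the disagreement feedback above can be simulated in a few lines. This is a sketch with hypothetical weights: every edge of the digraph on the previous slide is given weight 1.

```python
import numpy as np

# Hypothetical unit weights for the edges a13, a14, a21, a25, a31, a32, a41, a52
# of the 5-agent digraph on the previous slide (0-indexed here).
A = np.zeros((5, 5))
for i, j in [(0, 2), (0, 3), (1, 0), (1, 4), (2, 0), (2, 1), (3, 0), (4, 1)]:
    A[i, j] = 1.0

# Laplacian L = diag(A 1) - A; its row sums are zero by construction.
L = np.diag(A @ np.ones(5)) - A

# Consensus via feedback on disagreement: x_{k+1} = x_k - sigma * L x_k.
sigma = 0.2
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
for _ in range(3000):
    x = x - sigma * (L @ x)

print(bool(np.ptp(x) < 1e-6))  # agreement: the spread of the states vanishes
```

The digraph has a spanning tree rooted at agent 1, so the Laplacian nullspace is the agreement subspace and the iteration converges to consensus.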
6. Part 1: Distributed constrained convex optimization

$\min_{w^i \in \mathcal{W}_i \,\forall i,\; D \in \mathcal{D}} \; \sum_{i=1}^N f^i(w^i, D)$
s.t. $g^1(w^1, D) + \cdots + g^N(w^N, D) \le 0$

Constraints might couple decisions of agents that cannot communicate directly, e.g., $\|w^i\|_2^2 - 10 \le 0$
Agent $i$ only knows how $w^i$ enters the constraint, through $g^i$
$D$ is the usual decision vector the agents need to agree upon

Constraints are useful models for
  Traffic and routing (flow conservation)
  Resource allocation (budgets)
  Optimal control (system evolution)
  Network formation (relative positions/angles)
7. Agenda for distributed constrained optimization
Previous work and limitations
Distributing the constraints through the Lagrangian decomposition
  Idea: agreement on the multiplier
General saddle-point problems with agreement constraints
Our distributed saddle-point dynamics with Laplacian averaging
Theorem of convergence: saddle-point evaluation error $\sim \frac{1}{\sqrt{\#\text{iter}}}$
8. Previous work by type of constraint & information structure

$\min \sum_{i=1}^N f^i(x)$ s.t. $g(x) \le 0$, where all agents know $g$:
  2011 D. Yuan, S. Xu, and H. Zhao
  2012 M. Zhu and S. Martínez
  Increasing literature

$\min_{w^i \in \mathcal{W}_i} \sum_{i=1}^N f^i(w^i)$ s.t. $\sum_{i=1}^N g^i(w^i) \le 0$, where agent $i$ knows only $g^i$
  (when $Aw \le 0$, agent $i$ knows only column $i$, versus column $i$ & row $i$ of $A$)
  Less studied information structure:
  '10 D. Mosk-Aoyama, T. Roughgarden, and D. Shah (only linear constraints)
  '13 M. Bürger, G. Notarstefano, and F. Allgöwer: dual cutting-plane consensus methods
  '13 T.-H. Chang, A. Nedić, and A. Scaglione: primal-dual perturbation methods
9. Distributing the constraint via agreement on multipliers

$\min_{w^i \in \mathcal{W}_i,\; D \in \mathcal{D}} \; \sum_{i=1}^N f^i(w^i, D) \quad \text{s.t.} \quad g^1(w^1, D) + \cdots + g^N(w^N, D) \le 0$

same as

$\min_{w^i \in \mathcal{W}_i,\; D \in \mathcal{D}} \;\; \max_{z \in \mathbb{R}^m_{\ge 0}} \;\; \sum_{i=1}^N f^i(w^i, D) + z^\top \sum_{i=1}^N g^i(w^i, D)$

$= \min_{w^i \in \mathcal{W}_i,\; D \in \mathcal{D}} \;\; \max_{\substack{z^i \in \mathbb{R}^m_{\ge 0} \\ z^i = z^j \;\forall i,j}} \;\; \sum_{i=1}^N \big( f^i(w^i, D) + (z^i)^\top g^i(w^i, D) \big)$

$= \min_{\substack{w^i \in \mathcal{W}_i,\; D^i \in \mathcal{D} \\ D^i = D^j \;\forall i,j}} \;\; \max_{\substack{z^i \in \mathbb{R}^m_{\ge 0} \\ z^i = z^j \;\forall i,j}} \;\; \sum_{i=1}^N \big( f^i(w^i, D^i) + (z^i)^\top g^i(w^i, D^i) \big)$

Local, coupled through agreement
(Existence of saddle points ⇒ max-min property = strong duality)
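The equivalence between the constrained problem and its min-max form can be checked numerically on a toy instance. This is a sketch with hypothetical data: two agents with $f^i(w^i) = (w^i - 1)^2$ and the coupled constraint $w^1 + w^2 - 1 \le 0$, solved by brute-force grid search.

```python
import numpy as np

ws = np.linspace(-2.0, 2.0, 401)   # grid for each primal variable w^i
zs = np.linspace(0.0, 5.0, 501)    # grid for the common multiplier z >= 0

# Direct constrained minimum over the grid (small tolerance absorbs
# floating-point rounding of the grid points on the constraint boundary).
W1, W2 = np.meshgrid(ws, ws)
F = (W1 - 1) ** 2 + (W2 - 1) ** 2
feasible = W1 + W2 - 1 <= 1e-9
primal = F[feasible].min()

def q(z):
    # Dual function: min over w of the Lagrangian, separable across agents
    # since f^1 = f^2 here: q(z) = 2 * min_w [(w-1)^2 + z w] - z.
    inner = ((ws - 1) ** 2 + z * ws).min()
    return 2 * inner - z

dual = max(q(z) for z in zs)

print(round(primal, 3), round(dual, 3))  # both approximately 0.5
```

The constrained optimum is $w^1 = w^2 = 1/2$ with value $1/2$, attained in the dual at $z = 1$: the max-min value matches the constrained minimum, as the strong-duality remark above asserts.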
10. Saddle-point problems with explicit agreement
A more general framework:

$\min_{\substack{w \in \mathcal{W},\; (D^1, \ldots, D^N) \in \mathcal{D}^N \\ D^i = D^j \;\forall i,j}} \;\; \max_{\substack{\mu \in \mathcal{M},\; (z^1, \ldots, z^N) \in \mathcal{Z}^N \\ z^i = z^j \;\forall i,j}} \;\; \phi\big( \underbrace{w, (D^1, \ldots, D^N)}_{\text{convex}},\; \underbrace{\mu, (z^1, \ldots, z^N)}_{\text{concave}} \big)$

Distributed setting unstudied in the literature
Inspiration from A. Nedić and A. Ozdaglar 09 and K. Arrow et al. 1958

Particularizes to...
  Convex-concave functions arising from Lagrangians (the concave part is linear)
  Min-max formulation of nuclear norm regularization, later in talk (the concave part is quadratic)
12. Theorem (Distributed saddle-point approximation)
Assume that
  $\phi(w, D, \mu, z)$ is convex in $(w, D) \in \mathcal{W} \times \mathcal{D}^N$ and concave in $(\mu, z) \in \mathcal{M} \times \mathcal{Z}^N$
  The dynamics is bounded (maybe achieved through projections)
  The sequence of weight-balanced communication digraphs is
    $\delta$-nondegenerate ($a_{ij} > \delta$ whenever $a_{ij} > 0$)
    $B$-jointly-connected (unions of length $B$ are strongly connected)

For a suitable choice of consensus stepsize $\sigma$ and (decreasing) subgradient stepsizes $\{\eta_t\}$, then, for any saddle point $(w^*, \mathbf{D}^*, \mu^*, \mathbf{z}^*)$ of $\phi$ with $\mathbf{D}^* = D^* \otimes \mathbf{1}$, $\mathbf{z}^* = z^* \otimes \mathbf{1}$,

$-\frac{\alpha}{\sqrt{t-1}} \;\le\; \phi(w^{\text{av}}_t, D^{\text{av}}_t, \mu^{\text{av}}_t, z^{\text{av}}_t) - \phi(w^*, D^*, \mu^*, z^*) \;\le\; \frac{\alpha}{\sqrt{t-1}}$

The running averages
$w^{\text{av}}_{t+1} := \frac{1}{t+1} \sum_{s=1}^{t+1} w_s = \frac{t}{t+1} w^{\text{av}}_t + \frac{1}{t+1} w_{t+1}$
can be computed recursively
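The recursive form of the running average means no agent needs to store its iterate history. A quick check that the recursion matches the direct average, on hypothetical iterates:

```python
import numpy as np

rng = np.random.default_rng(0)
iterates = rng.normal(size=(50, 3))  # hypothetical iterates w_1, ..., w_50

# Recursive running average, as on the slide:
# w^av_{t+1} = t/(t+1) * w^av_t + 1/(t+1) * w_{t+1}
w_av = iterates[0].copy()
for t, w_next in enumerate(iterates[1:], start=1):
    w_av = t / (t + 1) * w_av + 1 / (t + 1) * w_next

# Matches the direct average (1/T) sum_{s=1}^T w_s, without storing history.
print(bool(np.allclose(w_av, iterates.mean(axis=0))))  # → True
```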
13. Part 2: Beyond Lagrangians
Lagrangians are particular cases of convex-concave functions
  the concave part (Lagrange multipliers) is always linear
Other min-max problems can benefit from distributed formulations
  e.g., min-max formulations of the nuclear norm

Agenda for distributed optimization with nuclear norm regularization
  Definition of nuclear norm
  Application to low-rank matrix completion
  Our dynamics for distributed optimization with nuclear norm
  Theorem of convergence (corollary of the previous result)
14. Review: definition of nuclear norm

Given a matrix $W = [\, w_1 \; \cdots \; w_N \,] \in \mathbb{R}^{d \times N}$,

$\|W\|_* := \text{sum of singular values of } W = \operatorname{trace}\sqrt{W W^\top} = \operatorname{trace}\sqrt{\textstyle\sum_{i=1}^N w_i w_i^\top}$

Optimization with nuclear norm regularization

$\min_{w_i \in \mathcal{W}_i} \; \sum_{i=1}^N f^i(w_i) + \gamma \|W\|_*$

favors vectors $\{w_i\}_{i=1}^N$ belonging to a low-dimensional subspace
15. Distributed low-rank matrix completion

(Table: ratings matrix with users Tara, Philip, Mauricio, Miroslav as columns $W = [\, W_{:,1} \; W_{:,2} \; W_{:,3} \; W_{:,4} \,]$ and movies Toy Story, Jurassic Park, ... as rows)

Estimate $W$ from the revealed entries $\{Z_{ij}\}$

$\min_{W \in \mathbb{R}^{d \times N}} \; \sum_{(i,j) \in \text{revealed}} (W_{ij} - Z_{ij})^2 + \gamma \underbrace{\|W\|_*}_{\text{nuclear norm}}$

$\gamma$ depends on application, dimensions... (regularization, not penalty)
Netflix: users $N \sim 10^7$, movies $d \sim 10^5$
Why make it distributed? Because users may not want to share their ratings
16. Formulation of nuclear norm as saddle-point problem
Drawing from another paper by the authors (ignore details):

$\min_{w_i \in \mathcal{W}_i} \; \sum_{i=1}^N f^i(w_i) + \gamma \big\| [\, W \mid \sqrt{N}\, I_d \,] \big\|_*$

$= \min_{\substack{w_i \in \mathcal{W}_i,\; D_i \in \{D \succeq c I_d\} \\ D_i = D_j \;\forall i,j}} \;\; \sup_{\substack{x_i \in \mathbb{R}^d \\ Y_i \in \mathbb{R}^{d \times d}}} \;\; \sum_{i=1}^N F_i(\underbrace{w_i, D_i}_{\text{convex}}, \underbrace{x_i, Y_i}_{\text{concave}})$

with convex-concave local functions on $\mathbb{R}^d \times \{D \succeq c I_d\} \times \mathbb{R}^d \times \mathbb{R}^{d \times d}$:

$F_i(w, D, x, Y) := \underbrace{f_i(w)}_{\text{convex}} + \gamma \operatorname{trace}\big( D (\underbrace{- x x^\top - N\, Y Y^\top}_{\text{quadratic concave part because } D \succeq 0}) \big) \underbrace{- 2\gamma w^\top x - \frac{2\gamma}{N} \operatorname{trace}(Y) + \frac{1}{N} \operatorname{trace}(D)}_{\text{linear in each variable}}$

See "Distributed optimization for multi-task learning via nuclear-norm approximation", NecSys15, D. Mateos-Núñez, J. Cortés
17. Distributed saddle-point dynamics for nuclear optimization

$w_i(k+1) = \mathcal{P}_{\mathcal{W}}\big( w_i(k) - \eta_k \big( g_i(k) - 2\gamma x_i(k) \big) \big)$

$D_i(k+1) = \mathcal{P}_{\{D \succeq c I_d\}}\Big( D_i(k) - \eta_k \gamma \big( {-x_i x_i^\top} - N\, Y_i Y_i^\top + \tfrac{1}{N} I_d \big) + \sigma \sum_{j=1}^N a_{ij,t} \big( D_j(k) - D_i(k) \big) \Big)$
  ("only" communication, size $d \times d$)

$x_i(k+1) = x_i(k) + \eta_k \gamma \big( {-2 D_i(k) x_i(k)} - 2 w_i(k) \big)$

$Y_i(k+1) = Y_i(k) + \eta_k \gamma \big( {-\tfrac{2}{N} D_i(k) Y_i(k)} - \tfrac{2}{N} I_d \big)$

Convergence is a corollary of the previous theorem
User $i$ does not need to share $w_i$ with its neighbors!
$D_i \to \sum_{i=1}^N w_i w_i^\top + I_d$ conveys only mixed information
Complexity per iteration: orthogonal projection onto $\{D \succeq c I_d\}$
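A minimal single-agent sketch of these updates (with one agent the Laplacian averaging term vanishes), under assumed choices the slide does not fix: a hypothetical local cost $f_i(w) = \|w - z_i\|^2$ with subgradient $g_i(k) = 2(w_i(k) - z_i)$, a Euclidean-ball projection for $\mathcal{P}_{\mathcal{W}}$, and projection onto $\{D \succeq c I_d\}$ by clipping eigenvalues at $c$:

```python
import numpy as np

d, N, gamma, c = 3, 1, 0.1, 0.05
rng = np.random.default_rng(2)
z_i = rng.normal(size=d)  # hypothetical data defining f_i

def proj_ball(w, radius=10.0):
    # Projection of w onto the Euclidean ball of the given radius.
    n = np.linalg.norm(w)
    return w if n <= radius else radius * w / n

def proj_spectral_floor(D, c):
    # Projection of a symmetric matrix onto {D : D - c I_d >= 0},
    # by clipping the eigenvalues at c.
    D = (D + D.T) / 2
    vals, vecs = np.linalg.eigh(D)
    return vecs @ np.diag(np.maximum(vals, c)) @ vecs.T

w, D = np.zeros(d), np.eye(d)
x, Y = np.zeros(d), np.eye(d)
for k in range(200):
    eta = 1 / np.sqrt(k + 1)  # decreasing subgradient stepsize
    g = 2 * (w - z_i)         # subgradient of f_i at w_i(k)
    w = proj_ball(w - eta * (g - 2 * gamma * x))
    D = proj_spectral_floor(
        D - eta * gamma * (-np.outer(x, x) - N * Y @ Y.T + np.eye(d) / N), c)
    x = x + eta * gamma * (-2 * D @ x - 2 * w)
    Y = Y + eta * gamma * (-(2 / N) * D @ Y - (2 / N) * np.eye(d))

# States remain bounded and D respects the spectral floor c.
print(bool(np.isfinite(w).all() and np.linalg.eigvalsh(D).min() >= c - 1e-8))
```

The projections keep the primal trajectory bounded, which is exactly the boundedness hypothesis of the convergence theorem.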
18. Simulation of matrix completion
20 users × 8 movies. Each user rates 5 movies. Ratings are private.

(Figure: three plots over 15000 iterations comparing the distributed saddle-point dynamics with centralized subgradient descent: the matrix fitting error $\|W(k) - Z\|_F / \|Z\|_F$; the network cost function $\sum_{i=1}^N \sum_{j \in \Upsilon_i} (W_{ij}(k) - Z_{ij})^2 + \gamma \| [\, W(k) \mid \sqrt{N}\, I_d \,] \|_*$; and the disagreement of the local matrices $\big( \sum_{i=1}^N \| D_i(k) - \frac{1}{N} \sum_{i=1}^N D_i(k) \|_F^2 \big)^{1/2}$)
19. Conclusions
More details in
  arXiv: "Distributed saddle-point subgradient algorithms with Laplacian averaging," submitted to Transactions on Automatic Control
Our algorithms particularize to deal with
  Saddle points of Lagrangians for distributed constrained optimization
    Less studied type of constraints/information structure in the literature
    Constraints couple decisions of agents that can't communicate directly
  Min-max distributed formulations of the nuclear norm
    "Distributed optimization for multi-task learning via nuclear-norm approximation", D. Mateos-Núñez, J. Cortés
    First multi-agent treatment of nuclear norm regularization
20. Future directions
Bounds on Lagrange multipliers in a distributed way
  Necessary to guarantee boundedness of the dynamics' trajectories
  One such procedure in the arXiv version
Application to semidefinite constraints with chordal sparsity
  Agents update the entries corresponding to maximal cliques, subject to agreement on the intersections
Other applications that you can find...
  IEEE Spectrum: Japan's project of an orbital solar farm
22. (Back slide) Outline of the proof
Inequality techniques from A. Nedić and A. Ozdaglar, 2009

Saddle-point evaluation error
$t\,\phi(w^{\text{av}}_{t+1}, D^{\text{av}}_{t+1}, \mu^{\text{av}}_{t+1}, z^{\text{av}}_{t+1}) - t\,\phi(w^*, D^*, \mu^*, z^*) \quad (1)$
at running-time averages, $w^{\text{av}}_{t+1} := \frac{1}{t} \sum_{s=1}^{t} w_s$, etc.

Bound for (1) in terms of
  initial conditions
  bound on subgradients and states of the dynamics
  disagreement
  sum of learning rates

Input-to-state stability with respect to agreement
$\| \mathbf{L}_K \mathbf{D}_t \|_2 \le C_I \| \mathbf{D}_1 \|_2 \Big( 1 - \frac{\tilde{\delta}}{4N^2} \Big)^{\lceil \frac{t-1}{B} \rceil} + C_U \max_{1 \le s \le t-1} \| d_s \|_2$
with the subgradients as disturbances

Doubling trick scheme: for $m = 0, 1, 2, \ldots, \lceil \log_2 t \rceil$, take $\eta_s = \frac{1}{\sqrt{2^m}}$ in each period of $2^m$ rounds, $s = 2^m, \ldots, 2^{m+1} - 1$
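The doubling-trick schedule is easy to write down explicitly. A sketch; `doubling_trick_stepsizes` is an illustrative name:

```python
import math

def doubling_trick_stepsizes(t):
    """Stepsize schedule from the slide: eta_s = 1/sqrt(2^m) for the rounds
    s = 2^m, ..., 2^{m+1} - 1, for m = 0, 1, 2, ... while 2^m <= t."""
    etas = {}
    m = 0
    while 2 ** m <= t:
        for s in range(2 ** m, min(2 ** (m + 1), t + 1)):
            etas[s] = 1 / math.sqrt(2 ** m)
        m += 1
    return etas

etas = doubling_trick_stepsizes(10)
print(etas[1], etas[2], etas[4])  # 1.0, then 1/sqrt(2), then 1/2, ...
```

The stepsize is constant on each period and halves its square from one period to the next, which is what yields the $1/\sqrt{t}$-type evaluation error without knowing the horizon in advance.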