The systems & control research community has developed a range of tools for understanding and controlling complex systems. Some of these techniques are model-based: using a simple model, we obtain insight regarding the structure of effective control policies. The talk surveys how this point of view can be applied to resource allocation problems, such as those that will arise in the next-generation energy grid. We also show how insight from this kind of analysis can be used to construct architectures for reinforcement learning algorithms used in a broad range of applications.
Much of the talk is a survey from a recent book by the author with a similar title,
Control Techniques for Complex Networks. Cambridge University Press, 2007.
https://netfiles.uiuc.edu/meyn/www/spm_files/CTCN/CTCN.html
1. Control Techniques for Complex Systems
Sean P. Meyn
Department of Electrical & Computer Engineering, University of Florida
Coordinated Science Laboratory and the Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, USA
April 21, 2011
2. Outline
1 Control Techniques
2 Complex Networks
3 Architectures for Adaptation & Learning
4 Next Steps
[Cover art: Control Techniques for Complex Networks (Sean Meyn) and Markov Chains and Stochastic Stability (S. P. Meyn and R. L. Tweedie), with the stability criteria ‖P^n(x, ·) − π‖_f → 0, sup_{x∈C} E_x[S_{τ_C}(f)] < ∞, π(f) < ∞, and ΔV(x) ≤ −f(x) + b I_C(x).]
3. Control Techniques
System model:
    (d/dt) α = μσ − Cα + …
    (d/dt) q = ½ μ I⁻¹ (C − …
    (d/dt) θ = q
???
Control Techniques?
4-7. Control Techniques
Typical steps to control design
[Sidebar: the system model from the previous slide.]
Obtain a simple model that captures the essential structure
– An equilibrium model if the goal is regulation
Obtain a feedback design, using dynamic programming, LQG, loop shaping, ...
Design for performance and reliability
Test via simulations and experiments, and refine the design
If these steps fail, we may have to re-engineer the system (e.g., introduce new sensors) and start over.
This point of view is unique to control.
8-11. Control Techniques
Typical steps to scheduling
Inventory model: controlled work-release, controlled routing, uncertain demand.
[Figure: a simplified model of a semiconductor manufacturing facility, with two demand streams (demand 1, demand 2). Similar demand-driven models can be used to model the allocation of locational reserves in a power grid.]
Obtain a simple model – frequently based on exponential statistics, to obtain a Markov model
Obtain a feedback design based on heuristics, or dynamic programming
Performance evaluation via computation (e.g., Neuts' matrix-geometric methods)
Difficulty: a Markov model is not simple enough!
With the 16 buffers truncated to 0 ≤ x ≤ 10, policy synthesis reduces to a linear program of dimension 11^16!
12-14. Control Techniques
Control-theoretic approach to scheduling: (d/dt) q = Bu + α
Inventory model: controlled work-release, controlled routing, uncertain demand.
q: queue length, evolving on R^16_+
u: scheduling/routing decisions (convex relaxation)
α: mean exogenous arrivals of work
B: captures the network topology
Control-theoretic approach to scheduling: the dimension is reduced from a linear program of dimension 11^16 ... to an HJB equation of dimension 16.
Does this solve the problem?
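To make the fluid-model notation concrete, here is a minimal simulation sketch of (d/dt) q = Bu + α under Euler discretization and a simple myopic policy. The two-buffer topology, the matrix B, the arrival vector α, and the policy are illustrative stand-ins of my own, not the talk's 16-buffer network.

```python
# A minimal sketch of the fluid model d/dt q = B u + alpha, simulated by Euler
# steps under a hypothetical myopic policy; B, alpha, and the 2-buffer topology
# are illustrative stand-ins, not the model from the slides.
import numpy as np

B = np.array([[-1.0,  0.0],   # buffer 1: drained by activity 1
              [ 1.0, -1.0]])  # buffer 2: fed by activity 1, drained by activity 2
alpha = np.array([0.3, 0.0])  # exogenous work arrives only at buffer 1
dt, T = 0.01, 20.0

q = np.array([5.0, 2.0])
for _ in range(int(T / dt)):
    # Myopic allocation: run each activity at full rate while its buffer is nonempty
    u = np.array([1.0 if q[0] > 0 else 0.0,
                  1.0 if q[1] > 0 else 0.0])
    q = np.maximum(q + dt * (B @ u + alpha), 0.0)  # queue lengths stay nonnegative

print("q(T) =", q)  # drains toward 0, since service capacity exceeds alpha
```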
18-20. Complex Networks
Dynamic Programming Equations
Deterministic model: ẋ = f(x, u)
Controlled generator:
    D_u h(x) = (d/dt) h(x(t)) |_{t=0, x(0)=x, u(0)=u} = f(x, u) · ∇h(x)
Minimal total cost:
    J*(x) = inf_U ∫_0^∞ c(x(t), u(t)) dt,   x(0) = x
HJB equation:
    min_u { c(x, u) + D_u J*(x) } = 0
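As a sanity check on the HJB equation, the following sketch verifies it numerically for a scalar example of my choosing (not from the talk): ẋ = u with cost c(x, u) = x² + u², for which J*(x) = x² and D_u J*(x) = u · dJ*/dx.

```python
# Numeric check of min_u { c(x,u) + D_u J*(x) } = 0 for the scalar toy example
# xdot = u, c(x,u) = x^2 + u^2, J*(x) = x^2, generator D_u J*(x) = u * 2x.
import numpy as np

def hjb_residual(x, us=np.linspace(-10, 10, 20001)):
    # Minimize c(x,u) + u * dJ*/dx over a fine grid of u, with J*(x) = x^2.
    return np.min(x**2 + us**2 + us * 2 * x)

for x in [-2.0, 0.5, 3.0]:
    print(x, hjb_residual(x))   # residuals ~ 0; the minimizer is the feedback u = -x
```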
21-23. Complex Networks
Dynamic Programming Equations
Diffusion model: dX = f(X, U) dt + σ(X) dN
Controlled generator:
    D_u h(x) = (d/dt) E[h(X(t))] |_{t=0, x(0)=x, u(0)=u}
             = f(x, u) · ∇h(x) + ½ trace( σ(x)ᵀ ∇²h(x) σ(x) )
Minimal average cost:
    η* = inf_U lim_{T→∞} (1/T) ∫_0^T c(X(t), U(t)) dt
ACOE (Average Cost Optimality Equation):
    min_u { c(x, u) + D_u h*(x) } = η*
h* is the relative value function.
24-25. Complex Networks
Dynamic Programming Equations
MDP model: X(t+1) − X(t) = f(X(t), U(t), N(t+1))
Controlled generator:
    D_u h(x) = E[h(X(1)) − h(X(0))] = E[h(x + f(x, u, N))] − h(x)
Minimal average cost:
    η* = inf_U lim_{T→∞} (1/T) Σ_{t=0}^{T−1} c(X(t), U(t))
ACOE (Average Cost Optimality Equation):
    min_u { c(x, u) + D_u h*(x) } = η*
h* is the relative value function.
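To show how the ACOE can be solved numerically for a small MDP, here is a minimal sketch of relative value iteration on a single controlled queue with a choice of two service rates. The model and all parameters are illustrative; the talk's 16-buffer network is far beyond this kind of enumeration, which is exactly why the fluid and diffusion approximations that follow matter.

```python
# A minimal sketch of relative value iteration for the ACOE
# min_u { c(x,u) + D_u h*(x) } = eta* on a small controlled queue (states 0..N).
import numpy as np

N, ARRIVAL, ACTIONS = 20, 0.4, [0.3, 0.6]

def bellman(h):
    """One-step operator (T h)(x) = min_u { c(x,u) + E[h(X(1))] }."""
    Th = np.empty(N + 1)
    for x in range(N + 1):
        vals = []
        for mu in ACTIONS:
            c = x + mu                             # cost: queue length + service effort
            p_up, p_dn = ARRIVAL, (mu if x > 0 else 0.0)
            Eh = (p_up * h[min(x + 1, N)] + p_dn * h[max(x - 1, 0)]
                  + (1 - p_up - p_dn) * h[x])
            vals.append(c + Eh)
        Th[x] = min(vals)
    return Th

h = np.zeros(N + 1)
for _ in range(5000):
    Th = bellman(h)
    h = Th - Th[0]          # normalize so h(0) = 0: the "relative" in RVI
eta = bellman(h)[0]         # at the fixed point, (T h)(0) = h(0) + eta* = eta*
print("eta* ~", eta, "   h*(x) grows roughly quadratically:", h[:5])
```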
26-29. Complex Networks
Approximate Dynamic Programming
ODE model from the MDP model X(t+1) − X(t) = f(X(t), U(t), N(t+1)):
Mean drift: f̄(x, u) = E[X(t+1) − X(t) | X(t) = x, U(t) = u]
Fluid model: ẋ(t) = f̄(x(t), u(t))
First-order Taylor series approximation:
    D_u h(x) = E[h(x + f(x, u, N))] − h(x) ≈ f̄(x, u) · ∇h(x)
A second-order Taylor series expansion leads to a diffusion model.
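The quality of the first-order approximation can be checked by hand on a single uniformized queue with h(x) = x² (my example, not from the slides): the exact generator differs from the fluid term f̄ · h′ by a constant variance term, so the relative error vanishes as x grows.

```python
# A quick check (illustrative example) of D_u h(x) ~ fbar(x,u) * h'(x) for a
# uniformized single queue with h(x) = x^2.
ARRIVAL, MU = 0.4, 0.6

def exact_Dh(x):
    """E[h(x + Delta)] - h(x), Delta = +1 w.p. ARRIVAL, -1 w.p. MU (for x > 0)."""
    return ARRIVAL * ((x + 1) ** 2 - x ** 2) + MU * ((x - 1) ** 2 - x ** 2)

def fluid_Dh(x):
    """fbar(x) * h'(x), with mean drift fbar = ARRIVAL - MU for x > 0."""
    return (ARRIVAL - MU) * 2 * x

for x in [1, 10, 100]:
    print(x, exact_Dh(x), fluid_Dh(x))
# The gap is the constant ARRIVAL + MU (a variance term): the second-order
# piece that the diffusion model keeps and the fluid model drops.
```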
30-33. Complex Networks
ADP for Stochastic Networks
Conclusions as of April 21, 2011
Stochastic model: Q(t+1) − Q(t) = B(t+1) U(t) + A(t+1)
Fluid model: (d/dt) q(t) = Bu(t) + α,   cost c(x, u) = |x|
Relative value function h*; total-cost value function J*
[Inventory model recap: q evolves on R^16_+; u: scheduling/routing decisions (convex relaxation); α: mean exogenous arrivals of work; B: captures the network topology.]
Key conclusions – analytical:
Stability of q implies stochastic stability of Q   [Dai, Dai & M. 1995]
h*(x) ≈ J*(x) for large |x|   [M. 1996–2011]
In many cases, the translation of the optimal policy for q is approximately optimal, with logarithmic regret   [M. 2005 & 2009]
Key conclusions – engineering:
Stability of q implies stochastic stability of Q
Simple decentralized policies based on q   [Tassiulas, 1995–]
Workload relaxation for model reduction   [M. 2003–, following "heavy traffic" theory: Laws, Kelly, Harrison, Dai, ...]
Intuition regarding the structure of good policies
34-35. Complex Networks
ADP for Stochastic Networks: Workload Relaxations
Inventory model: controlled work-release, controlled routing, uncertain demand.
[Figure: the workload plane (w1, w2), −20 ≤ w_i ≤ 50, showing the fluid-optimal region R* and the stochastic-optimal region R^STO.]
Workload process: W evolves on R^2
Relaxation: only lower bounds on rates are preserved
Effective cost: c̄(w) is the minimum of c(x) over all x consistent with w
Optimal policy for the fluid relaxation: non-idling on the region R*
Optimal policy for the stochastic relaxation: introduce hedging
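Computing the effective cost c̄(w) is itself a small optimization: for a linear cost c(x) = |x| it is the linear program min{ Σᵢ xᵢ : Ξx = w, x ≥ 0 }. The sketch below uses a hypothetical 2×4 workload matrix Ξ (Xi in the code); the talk's actual 16-buffer workload matrix is not given in the slides.

```python
# A minimal sketch of the effective cost cbar(w) = min{ c(x) : x consistent with w },
# assuming a linear cost c(x) = sum(x) and a hypothetical 2x4 workload matrix Xi.
import numpy as np
from scipy.optimize import linprog

Xi = np.array([[1.0, 1.0, 0.0, 0.5],   # workload at station 1 per unit in buffer i
               [0.0, 0.5, 1.0, 1.0]])  # workload at station 2 per unit in buffer i

def effective_cost(w):
    """cbar(w): cheapest buffer configuration x >= 0 with workload Xi @ x == w."""
    res = linprog(c=np.ones(Xi.shape[1]), A_eq=Xi, b_eq=w, bounds=(0, None))
    return res.fun if res.success else np.inf

print(effective_cost(np.array([2.0, 3.0])))
```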
36-37. Complex Networks
ADP for Stochastic Networks: Policy Translation
Inventory model: controlled work-release, controlled routing, uncertain demand.
[Figure: the workload plane (w1, w2) with regions R* and R^STO, as on the previous slide.]
Complete policy synthesis:
1. Optimal control of the relaxation
2. Translation to the physical system:
2a. Achieve the approximation c(Q(t)) ≈ c̄(W(t))
2b. Address the boundary constraints ignored in fluid approximations; achieved using safety stocks.
38. Architectures for Adaptation & Learning
[Collage of figures: singular perturbations / mean-field games, with individual and ensemble states and a barely controllable direction; workload relaxations, with the fluid and diffusion models on the (w1, w2) plane and an optimal-policy plot; a 16-buffer, 5-station network schematic with demands d1, d2 and service rates μ_10a, μ_10b; and a value-iteration convergence plot (average cost, roughly 11 to 12.6, vs. iteration n up to 300) comparing standard VIA, VIA initialized with a quadratic, and VIA initialized with the optimal fluid value function.]
Adaptation & Learning
39-42. Architectures for Adaptation & Learning
Reinforcement Learning
Approximating a value function: Q-learning
ACOE: min_u { c(x, u) + D_u h*(x) } = η*
h*: relative value function
η*: minimal average cost
"Q-function": Q*(x, u) = c(x, u) + D_u h*(x)   [Watkins 1989 ... "Machine Intelligence Lab" @ ece.ufl.edu]
Q-learning: given a parameterized family {Q^θ : θ ∈ R^d}, Q^θ is an approximation of the Q-function, or Hamiltonian   [Mehta & M. 2009]
Compute θ* based on observations, without using a system model.
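For readers who have not seen it, here is a minimal sketch of tabular Q-learning on a small controlled queue. This is the standard discounted-cost variant due to Watkins, with ε-greedy exploration; the average-cost/Hamiltonian formulation of Mehta & Meyn 2009 cited above is different, and all model parameters here are illustrative.

```python
# A minimal sketch of tabular Q-learning (Watkins 1989) on a small controlled
# queue; discounted-cost variant, illustrative parameters.
import numpy as np

rng = np.random.default_rng(0)
N, ARRIVAL = 20, 0.4                 # buffer size, arrival probability
ACTIONS = [0.3, 0.6]                 # service rates the controller may choose
GAMMA, ALPHA = 0.95, 0.1

def step(x, a):
    """One slotted-queue transition; cost = queue length + service effort."""
    arrive = rng.random() < ARRIVAL
    serve = (x > 0) and (rng.random() < ACTIONS[a])
    return min(N, x + arrive - serve), x + ACTIONS[a]

Q = np.zeros((N + 1, len(ACTIONS)))
x = 0
for _ in range(200_000):
    # epsilon-greedy: mostly follow the current Q, sometimes explore
    a = int(rng.integers(len(ACTIONS))) if rng.random() < 0.1 else int(Q[x].argmin())
    x_next, c = step(x, a)
    # Q-learning update: move Q(x,a) toward c + gamma * min_a' Q(x',a')
    Q[x, a] += ALPHA * (c + GAMMA * Q[x_next].min() - Q[x, a])
    x = x_next

print("greedy service rate by state:", [ACTIONS[int(Q[s].argmin())] for s in range(N + 1)])
```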
43-45. Architectures for Adaptation & Learning
Reinforcement Learning
Approximating a value function: TD-learning
Value functions: for a given policy U(t) = φ(X(t)),
    η = lim_{T→∞} (1/T) ∫_0^T c(X(t), U(t)) dt
Poisson's equation: h is again called a relative value function,
    c(x, u) + D_u h(x) |_{u=φ(x)} = η
TD-learning: given a parameterized family {h^θ : θ ∈ R^d}, solve min{ ‖h − h^θ‖ : θ ∈ R^d }   [Sutton 1988; Tsitsiklis & Van Roy 1997]
Compute θ* based on observations, without using a system model.
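And here is a minimal sketch of TD(0) with a linear parameterization h^θ = Σᵢ θᵢψᵢ, run in the discounted setting for a fixed service rate on the same kind of queue; the normalized quadratic basis anticipates the fluid/diffusion-inspired choices on the next slide. Parameters are illustrative.

```python
# A minimal sketch of TD(0) with linear function approximation (Sutton 1988;
# Tsitsiklis & Van Roy 1997), discounted setting, fixed policy, toy queue.
import numpy as np

rng = np.random.default_rng(1)
N, ARRIVAL, SERVICE = 20, 0.4, 0.6
GAMMA, ALPHA = 0.95, 0.01

def psi(x):
    """Normalized basis (x/N, (x/N)^2): a linear plus fluid-like quadratic term."""
    return np.array([x / N, (x / N) ** 2])

theta = np.zeros(2)
x = 0
for _ in range(500_000):
    arrive = rng.random() < ARRIVAL
    serve = (x > 0) and (rng.random() < SERVICE)
    x_next = min(N, x + arrive - serve)
    c = float(x)                                   # cost = queue length
    # TD(0): the temporal difference d drives a stochastic-gradient step on theta
    d = c + GAMMA * theta @ psi(x_next) - theta @ psi(x)
    theta += ALPHA * d * psi(x)
    x = x_next

print("h_theta(x) = %.1f*(x/N) + %.1f*(x/N)^2" % (theta[0], theta[1]))
```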
46-48. Architectures for Adaptation & Learning
Reinforcement Learning
Approximating a value function: how do we choose a basis?
Basis selection: h^θ(x) = Σ_i θ_i ψ_i(x)
ψ1: linearize
ψ2: fluid model with relaxation
ψ3: diffusion model with relaxation
ψ4: mean-field game
Examples: decentralized control, nonlinear control, processor speed-scaling
[Figures: a mean-field game (optimal policy), a linearization, and a fluid model, comparing the approximate relative value function h with the fluid value function J* and the relative value function h*.]
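As a toy illustration of basis selection (my construction, not from the slides): for a single fluid queue draining at rate μ − α with cost c(x) = x, the total-cost fluid value function is J(x) = x²/(2(μ − α)), a natural quadratic basis element; fitting θ against simulated value estimates is then a least-squares computation. The "observed" values below are synthetic stand-ins.

```python
# A toy illustration of basis selection h_theta = theta_1*psi_1 + theta_2*psi_2:
# psi_1(x) = x (the "linearize" term) and psi_2(x) = J(x) = x^2 / (2*(mu - alpha)),
# the fluid value function of a single queue with cost c(x) = x.
import numpy as np

alpha_rate, mu = 0.4, 0.6
xs = np.arange(0, 21, dtype=float)
Psi = np.column_stack([xs, xs**2 / (2 * (mu - alpha_rate))])

rng = np.random.default_rng(2)
h_hat = 3.0 * xs + xs**2 / (2 * (mu - alpha_rate)) + rng.normal(0, 1, xs.size)

theta, *_ = np.linalg.lstsq(Psi, h_hat, rcond=None)
print("theta =", theta)   # h_theta(x) = theta[0]*x + theta[1]*J(x)
```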
49. Next Steps
Nodal Power Prices in NZ ($/MWh)
[Figure: nodal prices at Otahuhu and Stratford, 4am–7pm. March 25: prices roughly 0–100 $/MWh. March 26: spikes up to 20,000 $/MWh. Source: http://www.electricityinfo.co.nz/]
Next Steps
51-52. Next Steps
Complex Systems: Mainly Energy
Entropic Grid: advances in systems theory...
Complex systems: model reduction specialized to tomorrow's grid; short-term operations and long-term planning
Resource allocation: controlling supply, storage, and demand; resource allocation with shared constraints
Statistics and learning: for planning and forecasting; both rare and common events
Economics for an Entropic Grid: incorporate dynamics and uncertainty in a strategic setting.
How can we create policies that protect participants on both sides of the market, while creating incentives for R&D on renewable energy?
53-57. Next Steps
Complex Systems: Mainly Energy
How can we create policies that protect participants on both sides of the market, while creating incentives for R&D on renewable energy?
Our community must consider long-term planning and policy, along with traditional systems operations.
Planning and policy includes markets & competition.
Evolution? Too slow! What we need is intelligent design.
58-60. Next Steps
Conclusions
The control community has created many techniques for understanding complex systems, and a valuable philosophy for thinking about control design.
In particular, stylized models can have great value:
Insight into the formulation of control policies
Analysis of closed-loop behavior, such as stability via ODE methods
Architectures for learning algorithms
Building bridges between the OR, CS, and control disciplines
The ideas surveyed here arose from partnerships with researchers in mathematics, economics, computer science, and operations research.
Beyond the many technical open questions, my hope is to extend the application of these ideas to long-range planning, especially in applications to sustainable energy.
61. Next Steps
References
S. P. Meyn. Control Techniques for Complex Networks. Cambridge University Press, Cambridge, 2007.
S. P. Meyn and R. L. Tweedie. Markov Chains and Stochastic Stability. Second edition, Cambridge University Press – Cambridge Mathematical Library, 2009.
S. Meyn. Stability and asymptotic optimality of generalized MaxWeight policies. SIAM J. Control Optim., 47(6):3259–3294, 2009.
V. S. Borkar and S. P. Meyn. The ODE method for convergence of stochastic approximation and reinforcement learning. SIAM J. Control Optim., 38(2):447–469, 2000.
S. P. Meyn. Sequencing and routing in multiclass queueing networks. Part II: Workload relaxations. SIAM J. Control Optim., 42(1):178–217, 2003.
P. G. Mehta and S. P. Meyn. Q-learning and Pontryagin's minimum principle. In Proc. of the 48th IEEE Conf. on Dec. and Control, pp. 3598–3605, Dec. 2009.
W. Chen, D. Huang, A. A. Kulkarni, J. Unnikrishnan, Q. Zhu, P. Mehta, S. Meyn, and A. Wierman. Approximate dynamic programming using fluid and diffusion approximations with applications to power management. In Proc. of the 48th IEEE Conf. on Dec. and Control, pp. 3575–3580, Dec. 2009.