Markov Tutorial CDC Shanghai 2009

Lyapunov functions, value functions,
and performance bounds
Sean Meyn

Department of Electrical and Computer Engineering
University of Illinois
and the Coordinated Science Laboratory

Joint work with R. Tweedie, I. Kontoyiannis, and P. Mehta
Supported in part by NSF (ECS 05 23620, and prior funding), and AFOSR

Objectives

Nonlinear state space model ≡ (controlled) Markov process,
state process X

Typical form:

$$ dX(t) = f(X(t), U(t))\,dt + \sigma(X(t), U(t))\,dW(t) $$

(U: control, W: noise)

Questions: For a given feedback law,
• Is the state process stable?
• Is the average cost finite? $E[c(X(t), U(t))]$
• Can we solve the DP equations? $\min_u \{ c(x,u) + D_u h^*(x) \} = \eta^*$
• Can we approximate the average cost $\eta^*$? The value function $h^*$?

Outline

Markov Models
Representations
Lyapunov Theory
Conclusions

[Diagram relating the conditions: $\| P^t(x,\cdot) - \pi \|_f \to 0$; $\sup_{x \in C} E_x[S_{\tau_C}(f)] < \infty$; $\pi(f) < \infty$; $DV(x) \le -f(x) + b I_C(x)$]

Notation: Generators & Resolvents

Markov chain: $X = \{X(t) : t \ge 0\}$
Countable state space, $\mathsf{X}$
Transition semigroup,
$$ P^t(x,y) = \mathsf{P}\{X(s+t) = y \mid X(s) = x\}, \qquad x, y \in \mathsf{X} $$

Generator: For some domain of functions h,
$$ Dh(x) = \lim_{t\to 0} \frac{1}{t} E[h(X(s+t)) - h(X(s)) \mid X(s) = x] = \lim_{t\to 0} \frac{1}{t} \bigl(P^t h(x) - h(x)\bigr) $$

Rate matrix:
$$ Dh(x) = \sum_{y} Q(x,y) h(y), \qquad P^t = e^{Qt} $$

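Example: MM1 Queue (arrival rate α, service rate µ)

Sample paths:
$$ X(t+\varepsilon) \approx \begin{cases} x+1 & \text{Prob } \varepsilon\alpha \\ x-1 & \text{Prob } \varepsilon\mu \\ x & \text{Prob } 1-\varepsilon(\alpha+\mu) \end{cases} $$

Rate matrix:
$$
Q = \begin{bmatrix}
-\alpha & \alpha & 0 & 0 & 0 & 0 & \cdots \\
\mu & -\alpha-\mu & \alpha & 0 & 0 & 0 & \cdots \\
0 & \mu & -\alpha-\mu & \alpha & 0 & 0 & \cdots \\
0 & 0 & \mu & -\alpha-\mu & \alpha & 0 & \cdots \\
0 & 0 & 0 & \mu & -\alpha-\mu & \alpha & \cdots \\
0 & 0 & 0 & 0 & \mu & -\alpha-\mu & \cdots \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots
\end{bmatrix}
$$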

Example: O-U Model

[Figure panels: $\sigma_W^2 = 0$ and $\sigma_W^2 = 1$]

Sample paths: $dX(t) = AX(t)\,dt + B\,dW(t)$
$A$ is $n \times n$, $B$ is $n \times 1$, $W$ standard BM

Generator: $Dh(x) = (Ax)^T \nabla h(x) + B^T \nabla^2 h(x) B$

h quadratic, $h(x) = \tfrac{1}{2} x^T P x$
$\nabla h(x) = Px$
$\nabla^2 h(x) = P$

$$ Dh(x) = \tfrac{1}{2} x^T (PA + A^T P) x + B^T P B $$


Notation: Generators & Resolvents

Resolvent:
$$ R_\alpha = \int_0^\infty e^{-\alpha t} P^t \, dt $$

Resolvent equations:
$$ R_\alpha = [I\alpha - Q]^{-1}, \qquad Q R_\alpha = R_\alpha Q = \alpha R_\alpha - I $$

Motivation: Dynamic programming. For a cost function c,

$$ h_\alpha(x) = R_\alpha c\,(x) = \sum_{y \in \mathsf{X}} R_\alpha(x,y) c(y) = \int_0^\infty e^{-\alpha t} E[c(X(t)) \mid X(0) = x] \, dt $$

Discounted-cost value function

Resolvent equation = dynamic programming equation,

$$ c + D h_\alpha = \alpha h_\alpha $$
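
The resolvent equations can be checked numerically on a small chain. The sketch below is only an illustration under stated assumptions: the M/M/1 queue is truncated to finitely many states (arrivals blocked in the top state), and the rates, truncation level, discount rate, cost c(x) = x, and the helper names `birth_death_rate_matrix` and `resolvent` are all hypothetical choices, not part of the slides.

```python
import numpy as np

def birth_death_rate_matrix(alpha, mu, n_states):
    """Rate matrix of an M/M/1 queue truncated to states 0..n_states-1.

    Assumption for illustration: arrivals are blocked in the top state.
    """
    Q = np.zeros((n_states, n_states))
    for x in range(n_states):
        if x + 1 < n_states:
            Q[x, x + 1] = alpha      # arrival
        if x >= 1:
            Q[x, x - 1] = mu         # service completion
        Q[x, x] = -Q[x].sum()        # rows of a rate matrix sum to zero
    return Q

def resolvent(Q, disc):
    """R_alpha = [alpha I - Q]^{-1} for a discount rate disc > 0."""
    return np.linalg.inv(disc * np.eye(Q.shape[0]) - Q)

Q = birth_death_rate_matrix(alpha=1.0, mu=2.0, n_states=30)
disc = 0.5                                   # discount rate (alpha on the slide)
R = resolvent(Q, disc)
I = np.eye(Q.shape[0])
c = np.arange(Q.shape[0], dtype=float)       # illustrative cost c(x) = x
h = R @ c                                    # discounted-cost value function

print(np.allclose(Q @ R, disc * R - I))      # Q R = alpha R - I
print(np.allclose(c + Q @ h, disc * h))      # c + D h = alpha h
```

For a finite rate matrix the generator acts as the matrix product `Q @ h`, which is why the dynamic programming equation reduces to a linear-algebra identity here.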

Notation: Steady State Distribution

Invariant (probability) measure π: X is stationary when X(0) ∼ π. In particular,

$$ X(t) \sim \pi, \qquad t \ge 0 $$

Characterizations:

$$ \sum_{x \in \mathsf{X}} \pi(x) P^t(x,y) = \pi(y) $$

$$ \alpha \sum_{x \in \mathsf{X}} \pi(x) R_\alpha(x,y) = \pi(y), \qquad \alpha > 0 $$

$$ \sum_{x \in \mathsf{X}} \pi(x) Q(x,y) = 0, \qquad y \in \mathsf{X} $$
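
Two of these characterizations are easy to test on a finite chain. The following is a minimal sketch, assuming a truncated M/M/1 queue with illustrative rates; the function name `invariant_distribution` and the least-squares formulation are choices made for this example only.

```python
import numpy as np

def invariant_distribution(Q):
    """Solve pi Q = 0 with sum(pi) = 1 for an irreducible finite rate matrix."""
    n = Q.shape[0]
    A = np.vstack([Q.T, np.ones(n)])     # stack Q^T pi = 0 and 1^T pi = 1
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

# Truncated M/M/1 rate matrix (assumed: arrivals blocked in the top state).
alpha, mu, n = 1.0, 2.0, 20
Q = np.zeros((n, n))
for x in range(n):
    if x + 1 < n:
        Q[x, x + 1] = alpha
    if x >= 1:
        Q[x, x - 1] = mu
    Q[x, x] = -Q[x].sum()

pi = invariant_distribution(Q)
disc = 0.7                                   # any discount rate > 0
R = np.linalg.inv(disc * np.eye(n) - Q)      # resolvent R_alpha

print(np.allclose(pi @ Q, 0.0))              # pi Q = 0
print(np.allclose(disc * (pi @ R), pi))      # alpha pi R_alpha = pi
```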

Notation: Relative Value Function

Invariant measure π, cost function c, steady-state mean η

Relative value function:
$$ h(x) = \int_0^\infty E[c(X(t)) - \eta \mid X(0) = x] \, dt $$

Solution to Poisson’s equation (average-cost DP equation):

$$ c + Dh = \eta $$

II
Representations

$$ \pi \propto \nu [I - (R - s \otimes \nu)]^{-1} $$

$$ h = [I - (R - s \otimes \nu)]^{-1} \tilde{c} $$

Irreducibility

ψ-Irreducibility:
$$ \psi(y) > 0 \implies \mathsf{P}\{X(t) \text{ reaches } y \mid X(0) = x\} > 0 \quad \text{all } x $$

$$ \psi(y) > 0 \implies R(x,y) > 0 \quad \text{all } x, \qquad R = \int_0^\infty e^{-t} P^t \, dt $$

Small Functions and Small Measures

Small functions and measures: For a function s and probability ν,
$$ R(x,y) \ge s(x)\nu(y), \qquad x, y \in \mathsf{X} $$

Resolvent dominates rank-one matrix,
$$ R \ge s \otimes \nu $$

ψ-Irreducibility justifies the assumption: $s(x) > 0$ for all x,
and WLOG, $\nu = \delta_{x^*}$, where $\psi(x^*) > 0$

Example: MM1 Queue

$R(x,y) > 0$ for all x and y (irreducible in usual sense)

Conclusion:
$$ R(x,y) \ge s(x)\nu(y), \qquad \text{where } s(x) := R(x,0), \quad \nu := \delta_0 $$

Example: O-U Model

[Figure panels: $\sigma_W^2 = 0$ and $\sigma_W^2 = 1$]

$dX(t) = AX(t)\,dt + B\,dW(t)$

$R(0, \cdot\,)$ Gaussian
Full rank if and only if (A, B) is controllable.

Conclusion: Under controllability, for any m, there is ε s.t.,

$$ R(x, A) \ge s(x)\nu(A) \quad \text{all } x \text{ and } A $$

where $s(x) = \varepsilon\, I\{\|x\| \le m\}$
and ν is uniform on $\{\|x\| \le m\}$

Potential Matrix

Potential matrix:
$$ G(x,y) = \sum_{n=0}^{\infty} (R - s \otimes \nu)^n (x,y) $$

$$ G = [I - (R - s \otimes \nu)]^{-1} $$

Representation of π

$$ G = \sum_{n=0}^{\infty} (R - s \otimes \nu)^n $$

$$ \pi \propto \nu G, \qquad \nu G\,(y) = \sum_{x \in \mathsf{X}} \nu(x) G(x,y) $$
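
The representation of π through the potential matrix can be tried out on a finite example. The sketch below assumes a truncated M/M/1 queue (arrivals blocked in the top state) with illustrative rates, and uses the small pair from the M/M/1 example, s(x) = R(x, 0) and ν = δ_0. Variable names are hypothetical.

```python
import numpy as np

alpha, mu, n = 1.0, 2.0, 15          # illustrative rates and truncation level

# Rate matrix of the truncated queue (assumed: arrivals blocked at the top).
Q = np.zeros((n, n))
for x in range(n):
    if x + 1 < n:
        Q[x, x + 1] = alpha
    if x >= 1:
        Q[x, x - 1] = mu
    Q[x, x] = -Q[x].sum()

I = np.eye(n)
R = np.linalg.inv(I - Q)             # resolvent R = R_1
s = R[:, 0].copy()                   # small function s(x) = R(x, 0)
nu = np.zeros(n)
nu[0] = 1.0                          # small measure nu = delta_0

# Potential matrix G = [I - (R - s (x) nu)]^{-1}
G = np.linalg.inv(I - (R - np.outer(s, nu)))

nu_G = nu @ G                        # row vector nu G
pi_from_G = nu_G / nu_G.sum()        # normalize to a probability vector

# Reference: the truncated chain has a truncated geometric invariant law.
rho = alpha / mu
pi_exact = rho ** np.arange(n)
pi_exact /= pi_exact.sum()

print(np.allclose(pi_from_G, pi_exact))
```

With ν = δ_0 the product νG is just the first row of G, so the invariant distribution is read off from a single row after normalization.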

Representation of h

$$ G = \sum_{n=0}^{\infty} (R - s \otimes \nu)^n $$

$$ h = R G \tilde{c} + \text{constant} $$

$$ \tilde{c}(x) = c(x) - \eta, \qquad \eta = \sum_{x \in \mathsf{X}} \pi(x) c(x), \qquad G\tilde{c}\,(x) = \sum_{y \in \mathsf{X}} G(x,y)\tilde{c}(y) $$

If the sum converges, then Poisson’s equation is solved:
$$ c(x) + Dh(x) = \eta $$
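
A finite-state check of this representation is sketched below. It assumes a truncated M/M/1 queue with illustrative rates, the cost c(x) = x, and the small pair s(x) = R(x, 0), ν = δ_0; none of these numerical choices come from the slides.

```python
import numpy as np

alpha, mu, n = 1.0, 2.0, 15          # illustrative rates and truncation level

# Rate matrix of the truncated queue (assumed: arrivals blocked at the top).
Q = np.zeros((n, n))
for x in range(n):
    if x + 1 < n:
        Q[x, x + 1] = alpha
    if x >= 1:
        Q[x, x - 1] = mu
    Q[x, x] = -Q[x].sum()

I = np.eye(n)
R = np.linalg.inv(I - Q)                        # resolvent R = R_1
s = R[:, 0].copy()                              # s(x) = R(x, 0)
nu = np.zeros(n)
nu[0] = 1.0                                     # nu = delta_0
G = np.linalg.inv(I - (R - np.outer(s, nu)))    # potential matrix

pi = (nu @ G) / (nu @ G).sum()                  # pi proportional to nu G
c = np.arange(n, dtype=float)                   # illustrative cost c(x) = x
eta = pi @ c                                    # steady-state mean
c_tilde = c - eta                               # centered cost

h = R @ G @ c_tilde                             # candidate solution h = R G c~

# Poisson's equation c + Dh = eta, with D acting as the matrix Q.
print(np.allclose(c + Q @ h, eta))
```

Adding any constant to `h` leaves `Q @ h` unchanged because the rows of Q sum to zero, which matches the "+ constant" in the representation.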

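III
Lyapunov Theory

[Diagram relating the conditions: $\| P^n(x,\cdot) - \pi \|_f \to 0$; $\sup_{x \in C} E_x[S_{\tau_C}(f)] < \infty$; $\pi(f) < \infty$; $\Delta V(x) \le -f(x) + b I_C(x)$]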

Lyapunov Functions

$$ DV \le -g + bs $$

General assumptions:
$V : \mathsf{X} \to (0, \infty)$
$g : \mathsf{X} \to [1, \infty)$
$b < \infty$, s small
e.g., $s(x) = I_C(x)$, C finite

Lyapunov Bounds on G

$$ DV \le -g + bs $$

Resolvent equation ($DR = R - I$) gives
$$ RV - V \le -Rg + bRs $$

Since $s \otimes \nu$ is non-negative,
$$ -[I - (R - s \otimes \nu)]V \le RV - V \le -Rg + bRs, \qquad I - (R - s \otimes \nu) = G^{-1} $$

More positivity,
$$ V \ge GRg - bGRs $$

Some algebra,
$$ GR = G(R - s \otimes \nu) + (Gs) \otimes \nu \ge G - I, \qquad Gs \le 1 $$

General bound:
$$ GRg \le V + 2b, \qquad Gg \le V + g + 2b $$
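
The general bound can be illustrated numerically. This is a sketch under assumptions that are not in the slides: a truncated M/M/1 queue (arrivals blocked in the top state), the scaled linear function V(x) = x/(µ − α), g ≡ 1, the small pair s(x) = R(x, 0), ν = δ_0, and a constant b chosen just large enough for the drift inequality to hold at x = 0.

```python
import numpy as np

alpha, mu, n = 1.0, 2.0, 15          # illustrative rates and truncation level

# Rate matrix of the truncated queue (assumed: arrivals blocked at the top).
Q = np.zeros((n, n))
for x in range(n):
    if x + 1 < n:
        Q[x, x + 1] = alpha
    if x >= 1:
        Q[x, x - 1] = mu
    Q[x, x] = -Q[x].sum()

I = np.eye(n)
R = np.linalg.inv(I - Q)                        # resolvent R = R_1
s = R[:, 0].copy()                              # s(x) = R(x, 0)
nu = np.zeros(n)
nu[0] = 1.0                                     # nu = delta_0
G = np.linalg.inv(I - (R - np.outer(s, nu)))    # potential matrix

V = np.arange(n) / (mu - alpha)                 # scaled linear Lyapunov function
g = np.ones(n)                                  # g = 1
b = mu / ((mu - alpha) * s[0])                  # makes the drift bound tight at x = 0

tol = 1e-9
print(np.all(Q @ V <= -g + b * s + tol))        # drift condition DV <= -g + b s
print(np.all(G @ R @ g <= V + 2 * b + tol))     # GRg <= V + 2b
print(np.all(G @ g <= V + g + 2 * b + tol))     # Gg  <= V + g + 2b
```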

Existence of π

Condition (V2): $DV \le -1 + bs$

Representation: $\pi \propto \nu G$

Bound: $GRg \le V + 2b \implies G(x, \mathsf{X}) \le V(x) + 2b$

Conclusion: π exists as a probability measure on $\mathsf{X}$

Existence of moments

Condition (V3): $DV \le -g + bs$

Bound: $Gg \le V + g + 2b$

Conclusion: π exists as a probability measure on $\mathsf{X}$
and the steady-state mean is finite,

$$ \pi(g) := \sum_{x \in \mathsf{X}} \pi(x) g(x) \le b $$

Example: MM1 Queue, $\rho = \alpha / \mu$

Linear Lyapunov function, $V(x) = x$
$$ DV(x) = \sum_{y=0}^{\infty} Q(x,y)\, y = \alpha(x+1) + \mu(x-1) - (\alpha+\mu)x = -(\mu - \alpha), \qquad x > 0 $$

Conclusion: (V2) holds if and only if ρ < 1

Example: MM1 Queue, $\rho = \alpha / \mu$

Quadratic Lyapunov function, $V(x) = x^2$
$$ DV(x) = \sum_{y=0}^{\infty} Q(x,y)\, y^2 = \alpha(x+1)^2 + \mu(x-1)^2 - (\alpha+\mu)x^2 $$
$$ = \alpha(x^2 + 2x + 1) + \mu(x^2 - 2x + 1) - (\alpha+\mu)x^2 = -2(\mu - \alpha)x + \alpha + \mu, \qquad x > 0 $$

Conclusion: (V3) holds, with $g(x) = 1 + x$,
if and only if ρ < 1
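
The two drift computations can be reproduced with exact arithmetic. The sketch below applies the M/M/1 rate matrix row by row; the helper name `generator_mm1` and the rates α = 1, µ = 2 are illustrative assumptions.

```python
from fractions import Fraction

def generator_mm1(V, x, alpha, mu):
    """(DV)(x) = sum_y Q(x, y) V(y) for the M/M/1 rate matrix."""
    up = alpha * (V(x + 1) - V(x))                      # arrival: x -> x + 1
    down = mu * (V(x - 1) - V(x)) if x > 0 else 0       # service: x -> x - 1
    return up + down

alpha, mu = Fraction(1), Fraction(2)                    # illustrative rates, rho = 1/2

for x in range(1, 6):
    lin = generator_mm1(lambda y: y, x, alpha, mu)       # V(x) = x
    quad = generator_mm1(lambda y: y * y, x, alpha, mu)  # V(x) = x^2
    print(x, lin, quad)
```

The linear drift equals −(µ − α) for every x > 0, and the quadratic drift equals −2(µ − α)x + α + µ, so both are negative for large x exactly when α < µ.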

Example: O-U Model

[Figure panels: $\sigma_W^2 = 0$ and $\sigma_W^2 = 1$]

$h(x) = \tfrac{1}{2} x^T P x$, $\nabla h(x) = Px$, $\nabla^2 h(x) = P$

Suppose that $P > 0$ solves the Lyapunov equation,

$$ PA + A^T P = -I $$

Then (V3) follows from the identity,
$$ Dh(x) = -\tfrac{1}{2}\|x\|^2 + \sigma_X^2, \qquad \sigma_X^2 = B^T P B $$

The function $h(x) = \tfrac{1}{2} x^T P x$ solves Poisson’s equation,

$$ Dh = -g + \eta, \qquad g(x) = \tfrac{1}{2}\|x\|^2, \qquad \eta = \sigma_X^2 $$
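
The Lyapunov equation is linear in P and can be solved directly for a small example. The sketch below uses a Kronecker-product vectorization; the matrices A and B, and the helper name `solve_lyapunov`, are hypothetical choices for illustration (A is chosen Hurwitz, with eigenvalues −1 and −2). Only the drift part of the identity is checked.

```python
import numpy as np

def solve_lyapunov(A):
    """Solve P A + A^T P = -I via column-major vectorization.

    Uses vec(P A) = (A^T kron I) vec(P) and vec(A^T P) = (I kron A^T) vec(P).
    """
    n = A.shape[0]
    I = np.eye(n)
    M = np.kron(A.T, I) + np.kron(I, A.T)
    p = np.linalg.solve(M, -I.flatten(order="F"))
    P = p.reshape((n, n), order="F")
    return 0.5 * (P + P.T)               # remove rounding asymmetry

A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])             # illustrative Hurwitz matrix
B = np.array([0.0, 1.0])                 # illustrative n x 1 input vector

P = solve_lyapunov(A)
print(np.allclose(P @ A + A.T @ P, -np.eye(2)))   # Lyapunov equation holds
print(np.all(np.linalg.eigvalsh(P) > 0))          # P > 0

# Drift part of Dh for h(x) = x^T P x / 2: (Ax)^T P x = -|x|^2 / 2
x = np.array([1.5, -0.7])
print(np.isclose((A @ x) @ (P @ x), -0.5 * (x @ x)))

sigma_X2 = B @ P @ B                     # sigma_X^2 = B^T P B, as defined on the slide
print(sigma_X2)
```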

Poisson’s Equation

$$ DV \le -g + bs $$

Representation: $h = RG\tilde{c} + \text{constant}$, $\quad \tilde{c}(x) = c(x) - \eta$

Bound: $RGg\,(x) \le V(x) + 2b$

Conclusion: If c is bounded by g, then h is bounded,

$$ h(x) \le V(x) + 2b $$

Example: MM1 Queue, $\rho = \alpha / \mu$

Poisson’s equation with $g(x) = x$

$$ Dh = -g + \eta $$

We have (V3) with V a quadratic function of x:

Recall, with $h(x) = x^2$,

$$ Dh(x) = -2(\mu - \alpha)x + \alpha + \mu, \qquad x > 0 $$

Solved with

$$ h(x) = \frac{1}{2}\,\frac{x^2 + x}{\mu - \alpha}, \qquad \eta = \frac{\rho}{1 - \rho} $$
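
This solution can be verified state by state with exact arithmetic. The sketch below applies the M/M/1 generator to the quadratic h; the rates α = 1, µ = 3 and the helper names are illustrative assumptions.

```python
from fractions import Fraction

alpha, mu = Fraction(1), Fraction(3)     # illustrative rates with rho < 1
rho = alpha / mu
eta = rho / (1 - rho)                    # steady-state mean queue length

def h(x):
    """Candidate solution h(x) = (x^2 + x) / (2 (mu - alpha))."""
    return Fraction(x * x + x) / (2 * (mu - alpha))

def generator_mm1(f, x):
    """(Df)(x) = sum_y Q(x, y) f(y) for the M/M/1 rate matrix."""
    up = alpha * (f(x + 1) - f(x))                      # arrival
    down = mu * (f(x - 1) - f(x)) if x > 0 else 0       # service (none at x = 0)
    return up + down

# Poisson's equation Dh = -g + eta with g(x) = x, including the boundary x = 0.
for x in range(6):
    print(x, generator_mm1(h, x), -x + eta)
```

The boundary state x = 0 is the only place where the service term is absent, and the equation still holds there because h(1) − h(0) = 1/(µ − α).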

Final words

[Diagram relating the conditions: $\| P^t(x,\cdot) - \pi \|_f \to 0$; $\sup_{x \in C} E_x[S_{\tau_C}(f)] < \infty$; $\pi(f) < \infty$; $DV(x) \le -f(x) + b I_C(x)$]

Just as in linear systems theory, Lyapunov functions
provide a characterization of system properties, as well
as a practical verification tool.

Much is left out of this survey. In particular,

• Converse theory
• Limit theory
• Approximation techniques to construct Lyapunov functions
or approximations to value functions
• Application to controlled Markov processes, and
approximate dynamic programming

References
[1,4] ψ-Irreducible foundations
[2,11,12,13] Mean-field models, ODE models, and Lyapunov functions
[1,4,5,9,10] Operator-theoretic methods. See also appendix of [2]
[3,6,7,10] Generators and continuous time models

[1] S. P. Meyn and R. L. Tweedie. Markov chains and stochastic stability. Cambridge University Press, Cambridge, second edition, 2009. Published in the Cambridge Mathematical Library.
[2] S. P. Meyn. Control Techniques for Complex Networks. Cambridge University Press, Cambridge, 2007. Pre-publication edition online: http://black.csl.uiuc.edu/~meyn.
[3] S. N. Ethier and T. G. Kurtz. Markov Processes: Characterization and Convergence. John Wiley & Sons, New York, 1986.
[4] E. Nummelin. General Irreducible Markov Chains and Nonnegative Operators. Cambridge University Press, Cambridge, 1984.
[5] S. P. Meyn and R. L. Tweedie. Generalized resolvents and Harris recurrence of Markov processes. Contemporary Mathematics, 149:227–250, 1993.
[6] S. P. Meyn and R. L. Tweedie. Stability of Markovian processes III: Foster-Lyapunov criteria for continuous time processes. Adv. Appl. Probab., 25:518–548, 1993.
[7] D. Down, S. P. Meyn, and R. L. Tweedie. Exponential and uniform ergodicity of Markov processes. Ann. Probab., 23(4):1671–1691, 1995.
[8] P. W. Glynn and S. P. Meyn. A Liapounov bound for solutions of the Poisson equation. Ann. Probab., 24(2):916–931, 1996.
[9] I. Kontoyiannis and S. P. Meyn. Spectral theory and limit theorems for geometrically ergodic Markov processes. Ann. Appl. Probab., 13:304–362, 2003. Presented at the INFORMS Applied Probability Conference, NYC, July, 2001.
[10] I. Kontoyiannis and S. P. Meyn. Large deviations asymptotics and the spectral theory of multiplicatively regular Markov processes. Electron. J. Probab., 10(3):61–123 (electronic), 2005.
[11] W. Chen, D. Huang, A. Kulkarni, J. Unnikrishnan, Q. Zhu, P. Mehta, S. Meyn, and A. Wierman. Approximate dynamic programming using fluid and diffusion approximations with applications to power management. Accepted for inclusion in the 48th IEEE Conference on Decision and Control, December 16-18 2009.
[12] P. Mehta and S. Meyn. Q-learning and Pontryagin’s Minimum Principle. Accepted for inclusion in the 48th IEEE Conference on Decision and Control, December 16-18 2009.
[13] G. Fort, S. Meyn, E. Moulines, and P. Priouret. ODE methods for skip-free Markov chain stability with applications to MCMC. Ann. Appl. Probab., 18(2):664–707, 2008.

See also earlier seminal work by Hordijk, Tweedie, ... full references in [1].
