1. Theory of Repeated Games
Lecture Notes on Central Results
Yosuke YASUDA
Osaka University, Department of Economics
yasuda@econ.osaka-u.ac.jp
Last-Update: May 21, 2015
1 / 36
2. Announcement
Course Website: You can find my course website at the link below:
https://sites.google.com/site/yosukeyasuda2/home/lecture/repeated15
Textbook & Survey: MS is a comprehensive textbook on repeated
games, K and P are highly readable survey articles, which complement MS.
MS Mailath and Samuelson, Repeated Games and Reputations:
Long-run Relationships. 2006.
K Kandori, 2008.
P Pearce, 1992.
Symbols that we use in lectures: Ex: Example, Fg: Figure, Q: Question, Rm: Remark.
3. Finitely Repeated Games (1)
A repeated game, a specific class of dynamic game, is a suitable
framework for studying the interaction between immediate gains and
long-term incentives, and for understanding how a reputation mechanism
can support cooperation.
Let G = {A1, ..., An; u1, ..., un} denote a static game in which players 1
through n simultaneously choose actions a1 through an from the action
spaces A1 through An, and the corresponding payoffs are u1(a1, ..., an)
through un(a1, ..., an).
Definition 1
The game G is called the stage game of the repeated game.
Given a stage game G, let G(T) denote the finitely repeated game in
which G is played T times, with the outcomes of all preceding plays
observed before the next play begins.
Assume that the payoff for G(T) is simply the sum of the payoffs
from the T stage games (future payoffs are not discounted).
4. Finitely Repeated Games (2)
Theorem 2
If the stage game G has a unique Nash equilibrium, then, for any finite
T, the repeated game G(T) has a unique subgame perfect Nash
equilibrium: the Nash equilibrium of G is played in every stage
irrespective of the past history of the play.
Proof.
We can solve the game by backward induction, that is, starting from
the smallest subgame and going backward through the game.
In stage T, players choose a unique Nash equilibrium of G.
Given that, in stage T − 1, players again end up choosing the same
Nash equilibrium outcome, since no matter what they play in T − 1
the last stage game outcome will be unchanged.
This argument carries over backwards through stage 1, which
concludes that the unique Nash equilibrium outcome is played in
every stage (irrespective of the past history).
5. Finitely Repeated Games (3)
When there is more than one Nash equilibrium in the stage game,
multiple subgame perfect Nash equilibria may exist.
Furthermore, an action profile which does not constitute a stage
game Nash equilibrium may be sustained (for any period t < T) in a
subgame perfect Nash equilibrium.
Q The following stage game will be played twice. Can players support
the non-equilibrium outcome (M1, M2) in the first period?
1 \ 2    L2      M2      R2
L1       1, 1    5, 0    0, 0
M1       0, 5    4, 4    0, 0
R1       0, 0    0, 0    3, 3
Rm Note that there are two Nash equilibria in the stage game,
(L1, L2) and (R1, R2): what players choose in the first period may result in
different outcomes (equilibria) in the second period.
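The question above can be checked numerically. The following is a minimal sketch, assuming the undiscounted two-period sum from the definition of G(T) and one particular continuation rule chosen for illustration (play (R1, R2) in the second period after (M1, M2), and (L1, L2) otherwise). The names total, no_profitable_deviation and reward_or_punish are hypothetical helpers, not notation from the lecture.

```python
# Stage-game payoffs copied from the table above: keys are (a1, a2), values (u1, u2).
ACTIONS = ["L", "M", "R"]
U = {
    ("L", "L"): (1, 1), ("L", "M"): (5, 0), ("L", "R"): (0, 0),
    ("M", "L"): (0, 5), ("M", "M"): (4, 4), ("M", "R"): (0, 0),
    ("R", "L"): (0, 0), ("R", "M"): (0, 0), ("R", "R"): (3, 3),
}

def total(profile, continuation=None):
    """Stage payoffs plus (undiscounted) second-period payoffs, if any."""
    extra = continuation(profile) if continuation else (0, 0)
    return (U[profile][0] + extra[0], U[profile][1] + extra[1])

def no_profitable_deviation(profile, continuation=None):
    """True if neither player gains from a unilateral deviation."""
    a1, a2 = profile
    base = total(profile, continuation)
    ok1 = all(total((d, a2), continuation)[0] <= base[0] for d in ACTIONS)
    ok2 = all(total((a1, d), continuation)[1] <= base[1] for d in ACTIONS)
    return ok1 and ok2

# Pure-strategy Nash equilibria of the one-shot game.
stage_nash = [prof for prof in U if no_profitable_deviation(prof)]

def reward_or_punish(first_period):
    # Assumed continuation rule: (R, R) after (M, M), otherwise (L, L).
    return U[("R", "R")] if first_period == ("M", "M") else U[("L", "L")]

# Is (M, M) immune to first-period deviations under this rule?
supports_MM = no_profitable_deviation(("M", "M"), reward_or_punish)
print(stage_nash, supports_MM)
```

Under this rule, conforming yields 4 + 3 = 7 while the best first-period deviation yields 5 + 1 = 6, so (M1, M2) can be supported in the first period. Because both continuations are stage-game Nash equilibria, the second-period play is itself credible.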
6. Infinitely Repeated Games (1)
Even if the stage game has a unique Nash equilibrium, there may be
subgame perfect outcomes of the infinitely repeated game in which no
stage game’s outcome is a Nash equilibrium of G.
Let G(∞, δ) denote the infinitely repeated game in which G is
repeated forever and the players share the discount factor δ.
For each t, the outcomes of the t − 1 preceding plays of the stage
game are observed before the t-th stage begins.
Each player’s payoff in G(∞, δ) is the average payoff defined as
follows.
Definition 3
Given the discount factor δ, the average payoff of the infinite sequence
of payoffs $u^1, u^2, \ldots$ is
$$(1-\delta)\left(u^1 + \delta u^2 + \delta^2 u^3 + \cdots\right) = (1-\delta)\sum_{t=1}^{\infty} \delta^{t-1} u^t.$$
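As a quick illustration of the normalization by (1 − δ), the sketch below evaluates the average payoff of a truncated payoff stream. The function name average_payoff and the truncation length are assumptions made for this example. A finite list only approximates the infinite sum.

```python
def average_payoff(payoffs, delta):
    """(1 - delta) * sum over t >= 1 of delta**(t-1) * u^t, for a finite list.

    The list index starts at 0, so payoffs[0] plays the role of u^1.
    Truncation makes the result approximate unless the omitted tail is negligible.
    """
    return (1 - delta) * sum(delta ** t * u for t, u in enumerate(payoffs))

# A constant stream of 2 should have an average payoff of (almost exactly) 2.
constant_stream = [2.0] * 2000
avg = average_payoff(constant_stream, 0.9)
print(avg)
```

The normalization puts repeated-game payoffs on the same scale as stage-game payoffs: a constant stream of u has average payoff u.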
7. Infinitely Repeated Games (2)
There are a few important remarks:
The history of play through stage t is the record of the players’
choices in stages 1 through t.
The players might have chosen $(a^s_1, \ldots, a^s_n)$ in stage s, where for each
player i the action $a^s_i$ belongs to $A_i$.
In the finitely repeated game G(T) or the infinitely repeated game
G(∞, δ), a player’s strategy specifies the action that she will take in
each stage, for every possible history of play.
In the infinitely repeated game G(∞, δ), each subgame beginning at
any stage is identical to the original game.
In G(T), a subgame beginning at stage t + 1 is the repeated game in
which G is played T − t times, denoted by G(T − t).
In a repeated game, a Nash equilibrium is subgame perfect if the
players’ strategies constitute a Nash equilibrium in every subgame,
i.e., after every possible history of the play.
8. Unimprovability (1)
Definition 4
A strategy σi is called a perfect best response to the other players’
strategies, when player i has no incentive to deviate following any history.
Consider the following requirement that, at first glance, looks much
weaker than the perfect best response condition.
Definition 5
A strategy for i is unimprovable against a vector of strategies of her
opponents if there is no t − 1 period history (for any t) such that i could
profit by deviating from her strategy in period t only and conforming
thereafter (i.e., switching back to the original strategy).
To verify the unimprovability of a strategy, one needs to check only
“one-shot” deviations from the strategy, rather than arbitrarily
complex deviations.
9. Unimprovability (2)
The following result simplifies the analysis of SPNE immensely.
It is the exact counterpart of a well-known result from dynamic
programming due to Howard (1960), and was first emphasized in
the context of self-enforcing cooperation by Abreu (1988).
Theorem 6
Let the payoffs of G be bounded. In the repeated game G(T) or
G(∞, δ), strategy σi is a perfect best response to a profile of strategies σ
if and only if σi is unimprovable against that profile.
The proof is simple, and generalizes easily to a wide variety of dynamic
and stochastic games with discounting and bounded payoffs.
10. Unimprovability (3)
Proof of ⇐ (note that ⇒ is trivial).
We show only “⇐” (unimprovable ⇒ perfect best response), since “⇒” is trivial.
Consider the contrapositive, i.e., not a perfect best response ⇒ not unimprovable.
1 If σi is not a perfect best response, there must be a history after
which it is profitable to deviate to some other strategy.
2 Then, because of discounting and boundedness of payoffs, there
must exist a profitable deviation that involves defection for only finitely
many periods (and conforms to σi thereafter).
If the deviation involves defection at infinitely many nodes, then for
sufficiently large T, the strategy that agrees with the deviation until time T
and conforms to σi thereafter is also a profitable deviation (because
of discounting and boundedness of payoffs).
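3 Consider a profitable deviation involving defection in the smallest
possible number of periods, denoted by T.
4 In such a profitable deviation, σi must be improvable (not
unimprovable) after the player has deviated for T − 1 periods: otherwise,
dropping the final deviation would give a profitable deviation with
fewer than T periods of defection.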
11. Repeated Prisoner’s Dilemma (1)
Q The following prisoner’s dilemma will be played infinitely many times.
Under what conditions on δ can an SPNE support cooperation (C1, C2)?
1 \ 2    C2       D2
C1       2, 2     -1, 3
D1       3, -1    0, 0
Suppose that player i plays Ci in the first stage. In the t-th stage, if the
outcome of all t − 1 preceding stages has been all (C1, C2) then play Ci;
otherwise, play Di (thereafter).
This strategy is called a trigger strategy, because player i cooperates
until someone fails to cooperate, which triggers a switch to
noncooperation forever after.
If both players adopt this trigger strategy then the outcome of the
infinitely repeated game will be (C1, C2) in every stage.
12. Repeated Prisoner’s Dilemma (2)
To show that the trigger strategy is SPNE, we must verify that the
trigger strategies constitute a Nash equilibrium on every possible
subgame that could be generated in the infinitely repeated game.
Rm Since every subgame of an infinitely repeated game is identical to
the game as a whole (thanks to its recursive structure), we have to
consider only two types of subgames: (i) subgames in which all the
outcomes of earlier stages have been (C1, C2), and (ii) subgames in
which the outcome of at least one earlier stage differs from (C1, C2).
By unimprovability, it is sufficient to show that there is no profitable
one-shot deviation after any history of type (i) or (ii).
Players have no incentive to deviate in (ii), since the trigger strategy
prescribes repeated play of the one-shot NE (D1, D2).
13. Repeated Prisoner’s Dilemma (3)
The following condition guarantees that there will be no (one-shot)
profitable deviation in (i).
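$$2 + \delta \cdot 2 + \delta^2 \cdot 2 + \cdots \ge 3 + \delta \cdot 0 + \delta^2 \cdot 0 + \cdots$$
$$\iff 2(\delta + \delta^2 + \cdots) \ge 1$$
$$\iff \frac{2\delta}{1-\delta} \ge 1 \iff \delta \ge \frac{1}{3}.$$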
Mutual cooperation (C1, C2) can be sustained as an SPNE outcome
by using the trigger strategy when players are sufficiently patient (δ ≥ 1/3).
The trigger strategy (in the repeated prisoner’s dilemma) is the severest
punishment, since each player receives her minmax payoff (in every
period) once a deviation has occurred.
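The threshold δ ≥ 1/3 can be verified with exact arithmetic. This is a minimal sketch that compares the discounted payoff from cooperating forever with the payoff from a one-shot deviation followed by (D1, D2) forever. The helper names discounted_sum and trigger_is_unimprovable are hypothetical.

```python
from fractions import Fraction

def discounted_sum(first, later, delta):
    """first + delta*later + delta**2*later + ... = first + delta*later/(1 - delta)."""
    return first + delta * later / (1 - delta)

def trigger_is_unimprovable(delta):
    """No profitable one-shot deviation on the cooperative path."""
    cooperate = discounted_sum(2, 2, delta)   # (C1, C2) in every period
    deviate = discounted_sum(3, 0, delta)     # 3 today, then (D1, D2) forever
    return cooperate >= deviate

for delta in (Fraction(3, 10), Fraction(1, 3), Fraction(1, 2)):
    print(delta, trigger_is_unimprovable(delta))
```

Using Fraction avoids floating-point rounding at the boundary case δ = 1/3, where the two sides are exactly equal.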
14. Folk Theorem: Preparation (1)
Rm The following exposition follows Fudenberg and Maskin (1986).
For each j, choose $M^j = (M^j_1, \ldots, M^j_n)$ so that
$$(M^j_1, \ldots, M^j_{j-1}, M^j_{j+1}, \ldots, M^j_n) \in \arg\min_{a_{-j}} \max_{a_j} u_j(a_j, a_{-j}),$$
and player j’s reservation value is defined by
$$v^*_j := \max_{a_j} u_j(a_j, M^j_{-j}) = u_j(M^j).$$
The strategies $M^j_{-j} = (M^j_1, \ldots, M^j_{j-1}, M^j_{j+1}, \ldots, M^j_n)$ are minimax
strategies (which may not be unique) against player j, and $v^*_j$ is the
smallest payoff that the other players can keep player j below.
We refer to $(v^*_1, \ldots, v^*_n)$ as the minimax point.
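To make the definition concrete, the sketch below computes the minimax point of the prisoner's dilemma used earlier in these notes. It is restricted to pure actions and two players, which is an assumption of this example (with mixed punishments the minimax value can be lower in other games). The function name pure_minimax is hypothetical.

```python
# Prisoner's dilemma payoffs: keys are (a1, a2), values are (u1, u2).
PD = {("C", "C"): (2, 2), ("C", "D"): (-1, 3), ("D", "C"): (3, -1), ("D", "D"): (0, 0)}
ACTIONS = ["C", "D"]

def pure_minimax(payoffs, player):
    """Return (v_star, punishing_action) for `player` (0 or 1), pure actions only."""
    def u(own, other):
        profile = (own, other) if player == 0 else (other, own)
        return payoffs[profile][player]
    # For each action of the opponent, the punished player's best-reply payoff.
    best_reply_value = {other: max(u(own, other) for own in ACTIONS) for other in ACTIONS}
    # The opponent picks the action that minimizes that best-reply payoff.
    punish = min(best_reply_value, key=best_reply_value.get)
    return best_reply_value[punish], punish

minimax_point = tuple(pure_minimax(PD, i)[0] for i in (0, 1))
print(minimax_point)
```

In this game each player is minimaxed by the opponent's D, and the minimax point is (0, 0), the payoff of the one-shot Nash equilibrium.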
15. Folk Theorem: Preparation (2)
Definition 7
Let V be the set of feasible payoffs, i.e., the convex hull of the payoff vectors
u yielded by (pure) action profiles, and let $V^*$ (⊂ V) be the set of feasible
payoffs that Pareto dominate the minimax point:
$$V^* = \{(v_1, \ldots, v_n) \in V \mid v_i > v^*_i \text{ for all } i\}.$$
(When payoffs are normalized so that the minimax point is the origin, the condition reads $v_i > 0$.)
$V^*$ is called the set of individually rational payoffs.
There are several versions of the folk theorem.
The name comes from the fact that the statement (relying on NE
rather than SPNE) was widely known among game theorists in the
1950s, even though no one had published it.
16. Folk Theorem (1)
Theorem 8 (Theorem A)
For any $(v_1, \ldots, v_n) \in V^*$, if players discount the future sufficiently little,
there exists a Nash equilibrium of the infinitely repeated game where,
for all i, player i’s average payoff is $v_i$.
If a player deviates, it may not be in others’ interest to go through with
the punishment of minimaxing him forever. However, Aumann and
Shapley (1976) and Rubinstein (1979) showed that, when there is
no discounting, the counterpart of Theorem A holds for SPNE.
Theorem 9 (Theorem B)
For any $(v_1, \ldots, v_n) \in V^*$ there exists a subgame perfect equilibrium
in the infinitely repeated game with no discounting, where, for all i,
player i’s expected payoff each period is $v_i$.
17. Folk Theorem (2)
One well-known case that admits both discounting and simple strategies
is where the point to be sustained Pareto dominates the payoffs of a
Nash equilibrium of the constituent game G.
Theorem 10 (Theorem C)
Suppose $(v_1, \ldots, v_n) \in V^*$ Pareto dominates the payoffs $(y_1, \ldots, y_n)$ of
a (one-shot) Nash equilibrium $(e_1, \ldots, e_n)$ of G. If players discount the
future sufficiently little, there exists a subgame perfect equilibrium of
the infinitely repeated game where, for all i, player i’s average payoff is $v_i$.
Because the punishments used in Theorem C are less severe than
those in Theorems A and B, its conclusion is weaker.
For example, Theorem C does not allow us to conclude that a
Stackelberg outcome can be supported as an equilibrium in an
infinitely repeated quantity-setting duopoly.
18. General Folk Theorem: Two Players
Abreu (1988) shows that there is no loss in restricting attention to
simple punishments when players discount the future. Indeed, simple
punishments are employed in the proof of the following result.
Theorem 11 (Theorem 1)
For any $(v_1, v_2) \in V^*$ there exists $\underline{\delta} \in (0, 1)$ such that, for all
$\delta \in (\underline{\delta}, 1)$, there exists a subgame perfect equilibrium of the infinitely
repeated game in which player i’s average payoff is $v_i$ when players have
discount factor δ.
After a deviation by either player, the players (mutually) minimax
each other for a certain number of periods, after which they return
to the original path.
If a further deviation occurs during the punishment phase, the phase
is begun again.
19. General Folk Theorem: Three or More Players
The method we used to establish Theorem 1, “mutual minimaxing”,
does not extend to three or more players.
Theorem 12 (Theorem 2)
Assume that the dimensionality of $V^*$ equals n, the number of players,
i.e., that the interior of V (relative to n-dimensional space) is nonempty.
Then, for any $(v_1, \ldots, v_n)$ in $V^*$, there exists $\underline{\delta} \in (0, 1)$ such that for all
$\delta \in (\underline{\delta}, 1)$ there exists a subgame perfect equilibrium of the infinitely
repeated game with discount factor δ in which player i’s average payoff is $v_i$.
If a player deviates, he is minimaxed by the other players long
enough to wipe out any gain from his deviation.
To induce the other players to go through with minimaxing him,
they are ultimately given a “reward” in the form of an additional ε
in their average payoff.
The possibility of providing such a reward relies on the full
dimensionality of the payoff set.
20. Imperfect Monitoring (1)
Perfect Monitoring: Players can fully observe the history of their past
play. There is no monitoring difficulty or imperfection.
Bounded/Imperfect Recall: Players forget (part of) the history of
their past play, especially that of distant past, as time goes by.
Imperfect Monitoring: Players cannot directly observe the (full) history
of their past play, but instead observe signals that depend on actions
taken in the previous period.
Public Monitoring: Players publicly observe a common signal.
Private Monitoring: Players privately receive different signals.
21. Imperfect Monitoring (2)
Punishment necessarily becomes indirectly linked with deviation.
Players can punish the deviator only in reaction to the common
signals, since they cannot observe deviation itself.
Even if no one has deviated, punishment is triggered when a bad
signal is realized (which happens with positive probability).
⇒ Constructing (efficient) punishments becomes dramatically more difficult.
22. Example | Prisoner’s Dilemma (1)
Consider the following Prisoner’s Dilemma as a stage game in which each
player cannot observe the rival’s past actions.
Table: Ex ante Payoffs $u_i(a_i, a_{-i})$
1 \ 2    C        D
C        2, 2     -1, 3
D        3, -1    0, 0
Q Can each player deduce the rival’s action through the realized payoff
(and her own action)?
If this is indeed the case, then observation cannot be imperfect...
23. Example | Prisoner’s Dilemma (2)
Player i’s payoff in each period depends only on her own action
$a_i \in \{C, D\}$ and the public signal $y \in \{g, b\}$, i.e., $u^*_i(y, a_i)$.
Table: Ex post Payoffs $u^*_i(y, a_i)$
a_i \ y    g                       b
C          (3 − p − 2q)/(p − q)    −(p + 2q)/(p − q)
D          3(1 − r)/(q − r)        −3r/(q − r)
p, q, r (0 < q, r < p < 1) are the conditional probabilities that g is realized:
p = Pr{g|CC}, q = Pr{g|DC} = Pr{g|CD}, r = Pr{g|DD}.
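The ex post payoffs are constructed so that their expectation over the signal reproduces the ex ante prisoner's dilemma payoffs. The sketch below checks this with exact arithmetic. The parameter values are hypothetical, and the example additionally assumes r < q so that the denominator q − r is nonzero. The function names ex_post, prob_g and ex_ante are illustrative helpers.

```python
from fractions import Fraction as F

def ex_post(y, a, p, q, r):
    """u*_i(y, a_i) from the table above; y in {'g', 'b'}, a in {'C', 'D'}."""
    table = {
        ("g", "C"): (3 - p - 2 * q) / (p - q),
        ("b", "C"): -(p + 2 * q) / (p - q),
        ("g", "D"): 3 * (1 - r) / (q - r),
        ("b", "D"): -3 * r / (q - r),
    }
    return table[(y, a)]

def prob_g(a_own, a_other, p, q, r):
    """Pr{g | profile}: p after CC, q after exactly one D, r after DD."""
    return (p, q, r)[[a_own, a_other].count("D")]

def ex_ante(a_own, a_other, p, q, r):
    """Expected ex post payoff of the player choosing a_own."""
    pg = prob_g(a_own, a_other, p, q, r)
    return pg * ex_post("g", a_own, p, q, r) + (1 - pg) * ex_post("b", a_own, p, q, r)

# Hypothetical signal probabilities with 0 < r < q < p < 1.
p, q, r = F(9, 10), F(1, 2), F(1, 5)
recovered = {(a, b): ex_ante(a, b, p, q, r) for a in "CD" for b in "CD"}
print(recovered)
```

The recovered values are 2 after (C, C), −1 for a cooperator facing D, 3 for a defector facing C, and 0 after (D, D), which matches the ex ante table.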
24. Example | Prisoner’s Dilemma (3)
To achieve cooperation, consider the (modified) trigger strategies:
Play (C, C) in the first period.
Continue to play (C, C) as long as g keeps being realized.
Play (D, D) forever once b is realized.
The above trigger strategies constitute an SPNE if and only if the
following condition is satisfied:
$$\delta(3p - 2q) \ge 1 \iff \delta \ge \frac{1}{3p - 2q} \quad \text{(7.2.4 in MS)}$$
Then, the symmetric equilibrium (average) payoff becomes $\frac{2(1-\delta)}{1-\delta p}$, which
converges to 0 as δ goes to 1.
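The condition and the equilibrium payoff can be derived from the recursion V = (1 − δ)·2 + δ·p·V, where the continuation value is V after g and 0 after b. The sketch below checks the threshold numerically. The signal probabilities are hypothetical, and the function names are illustrative.

```python
from fractions import Fraction as F

def trigger_value(delta, p):
    """Average payoff V solving V = (1 - delta) * 2 + delta * p * V."""
    return 2 * (1 - delta) / (1 - delta * p)

def deviation_value(delta, p, q):
    """Play D once: 3 today, continuation V with probability q, 0 otherwise."""
    return (1 - delta) * 3 + delta * q * trigger_value(delta, p)

def trigger_is_equilibrium(delta, p, q):
    """No profitable one-shot deviation in the cooperative phase."""
    return trigger_value(delta, p) >= deviation_value(delta, p, q)

p, q = F(9, 10), F(1, 2)            # hypothetical signal probabilities
threshold = 1 / (3 * p - 2 * q)     # the bound 1 / (3p - 2q)

for delta in (F(1, 2), threshold, F(3, 5), F(999, 1000)):
    print(delta, trigger_is_equilibrium(delta, p, q), float(trigger_value(delta, p)))
```

With these values the threshold is 10/17. The printed values also show the equilibrium payoff shrinking as δ approaches 1, since punishment is eventually triggered with probability one.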
25. General Model (1)
n (long-lived) players engage in an infinitely repeated game with discrete
time horizon (t = 0, 1, . . . ∞) whose stage game is defined as follows:
ai ∈ Ai: Player i’s action (Ai is assumed finite)
y ∈ Y : Public signal realizes at the end of each period (Y is finite)
ρ(y|a): Conditional probability function (assuming full-support)
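ρ(y|α): Extension to a mixed action profile $\alpha \in \prod_{i=1}^{n} \Delta(A_i)$
$\Pi_i(\alpha_{-i}) := \rho(\cdot \mid \cdot, \alpha_{-i})$: $|A_i| \times |Y|$ matrix.
$u^*_i(y, a_i)$: Player i’s ex post payoff
$u_i(a)$: Player i’s ex ante payoff, expressed by
$$u_i(a) = \sum_{y \in Y} u^*_i(y, a_i)\,\rho(y \mid a) \quad \text{(7.1.1 in MS)}$$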
V (δ): Set of equilibrium (PPE, defined later) payoff under δ
26. General Model (2)
In the repeated game (of imperfect public monitoring), the only public
information available in period t is the t-period history of public signals:
$$h^t := (y^0, y^1, \ldots, y^{t-1}).$$
The set of public histories is ($Y^0$ is empty; note that $h^0$ is not well-defined):
$$H := \bigcup_{t=0}^{\infty} Y^t.$$
A history for player i includes both the public history and the history of
actions that i has taken:
$$h^t_i := (y^0, a^0_i; y^1, a^1_i; \ldots; y^{t-1}, a^{t-1}_i).$$
The set of histories for player i is ($(Y, A_i)^0$ is empty):
$$H_i := \bigcup_{t=0}^{\infty} (A_i \times Y)^t.$$
27. Perfect Public Equilibrium (1)
A pure strategy for player i is a mapping from all possible histories into
the set of pure actions,
σi : Hi → Ai.
A mixed strategy is a mixture over pure strategies.
A behavior strategy is a mapping
σi : Hi → ∆(Ai).
Definition 13 (Def 7.1.1)
A behavior strategy σi is public if, in every period t, it depends only on
the public history $h^t \in Y^t$ and not on i’s private history. That is, for all
$h^t_i, \hat{h}^t_i \in H_i$ satisfying $y^\tau = \hat{y}^\tau$ for all $\tau \le t - 1$,
$$\sigma_i(h^t_i) = \sigma_i(\hat{h}^t_i).$$
A behavior strategy σi is private if it is not public.
28. Perfect Public Equilibrium (2)
Definition 14 (Def 7.1.2)
Suppose $A_i = A_j$ for all i and j. A public profile σ is strongly
symmetric if, for all public histories $h^t$, $\sigma_i(h^t) = \sigma_j(h^t)$ for all i and j.
Definition 15 (Def 7.1.3)
A perfect public equilibrium (PPE) is a profile of public strategies σ
that, for any public history $h^t$, specifies a Nash equilibrium for the
repeated game. A PPE is strict if each player strictly prefers his
equilibrium strategy to every other public strategy.
Lemma 16 (Lemma 7.1.1)
If all players other than i are playing a public strategy, then player i has a
public strategy as a best reply.
Therefore, every PPE is a sequential equilibrium.
29. Dynamic Programming Approach
1 Decomposition
Transforming a dynamic game into a static game.
In so doing, recursive structure and unimprovability play key roles.
2 Self-Generation
Useful property to characterize the set of equilibrium (PPE) payoffs.
Without (explicitly) solving a game, the set of equilibrium payoffs
can be fully and computationally identified.
30. Decomposition — Perfect Monitoring
A continuation payoff can be decomposed into a current-period payoff and
the future payoffs of the repeated game starting from the next period:
$$v_i = (1-\delta)\,u_i(a) + \delta\,\gamma_i(a) \quad (1)$$
where $\gamma : A \to V(\delta)\ (\subset \mathbb{R}^n)$ assigns an equilibrium payoff vector to each
action profile and $\gamma_i$ is i’s element (i’s assigned payoff).
Theorem 17
v is supported (as an average payoff) by an SPNE if and only if there
exist a mixed action profile α ∈ ∆(A) and γ : ∆(A) → V(δ) such that,
for all i and all $a_i \in A_i$,
$$v_i(\alpha) = (1-\delta)\,u_i(\alpha) + \delta\,\gamma_i(\alpha) \ge (1-\delta)\,u_i(a_i, \alpha_{-i}) + \delta\,\gamma_i(a_i, \alpha_{-i}).$$
31. Decomposition — Imperfect Monitoring
A continuation payoff can be decomposed into a current-period payoff and
the future payoffs of the repeated game starting from the next period:
$$v_i = (1-\delta)\,u_i(a) + \delta \sum_{y \in Y} \gamma_i(y)\,\rho(y \mid a) \quad (2)$$
where $\gamma : Y \to V(\delta)\ (\subset \mathbb{R}^n)$ assigns an equilibrium (PPE) payoff vector
to each public signal and $\gamma_i$ is i’s element (i’s assigned payoff).
Theorem 18
v is supported (as an average payoff) by a PPE if and only if there exist a
mixed action profile α ∈ ∆(A) and γ : Y → V(δ) such that,
for all i and all $a_i \in A_i$,
$$v_i(\alpha) = (1-\delta)\,u_i(\alpha) + \delta \sum_{y \in Y} \gamma_i(y)\,\rho(y \mid \alpha) \ge (1-\delta)\,u_i(a_i, \alpha_{-i}) + \delta \sum_{y \in Y} \gamma_i(y)\,\rho(y \mid a_i, \alpha_{-i}).$$
32. Self-Generation (1)
What happens if the range of the mapping γ, V(δ), is replaced with an
arbitrary set $W \subset \mathbb{R}^n$?
Definition 19
Let B(W) be the set of vectors $w = (w_1, \ldots, w_n)$ for which there exist a mixed
action profile α ∈ ∆(A) and γ : Y → W such that, for all i and all $a_i \in A_i$,
$$w_i(\alpha) = (1-\delta)\,u_i(\alpha) + \delta \sum_{y \in Y} \gamma_i(y)\,\rho(y \mid \alpha) \ge (1-\delta)\,u_i(a_i, \alpha_{-i}) + \delta \sum_{y \in Y} \gamma_i(y)\,\rho(y \mid a_i, \alpha_{-i}).$$
W is called self-generating (or self-enforceable) if W ⊆ B(W).
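As an illustration of self-generation, consider the prisoner's dilemma with public signals from the earlier example and the two-point set W = {(0, 0), (V, V)}, where V = 2(1 − δ)/(1 − δp) is the trigger-strategy payoff. The sketch below checks whether each point of W can be decomposed using a pure action profile and continuation payoffs drawn from W. The parameter values and all function names are assumptions made for this example.

```python
from fractions import Fraction as F

U = {("C", "C"): (2, 2), ("C", "D"): (-1, 3), ("D", "C"): (3, -1), ("D", "D"): (0, 0)}

def rho_g(a, p, q, r):
    """Pr{g | a}: p after CC, q after exactly one D, r after DD."""
    return (p, q, r)[a.count("D")]

def decomposes(w, a, gamma, delta, p, q, r):
    """Check that w is enforced by pure profile a and continuation map gamma.

    gamma maps each signal in {'g', 'b'} to a payoff pair. The function checks
    the value equality and every player's one-shot incentive constraint.
    """
    def value(profile, i):
        pg = rho_g(profile, p, q, r)
        cont = pg * gamma["g"][i] + (1 - pg) * gamma["b"][i]
        return (1 - delta) * U[profile][i] + delta * cont
    for i in (0, 1):
        if value(a, i) != w[i]:
            return False
        for d in "CD":
            dev = (d, a[1]) if i == 0 else (a[0], d)
            if value(dev, i) > w[i]:
                return False
    return True

def is_self_generating(delta, p, q, r):
    """Is W = {(0, 0), (V, V)} contained in B(W) under the trigger-style maps?"""
    V = 2 * (1 - delta) / (1 - delta * p)
    high, low = (V, V), (0, 0)
    ok_high = decomposes(high, ("C", "C"), {"g": high, "b": low}, delta, p, q, r)
    ok_low = decomposes(low, ("D", "D"), {"g": low, "b": low}, delta, p, q, r)
    return ok_high and ok_low

p, q, r = F(9, 10), F(1, 2), F(1, 5)   # hypothetical signal probabilities
print(is_self_generating(F(3, 5), p, q, r), is_self_generating(F(1, 2), p, q, r))
```

For these parameters the set is self-generating at δ = 3/5 but not at δ = 1/2, which agrees with the threshold δ ≥ 1/(3p − 2q) = 10/17 for the trigger strategies. The check only tries the specific pure profiles and continuation maps shown, so a False result here does not by itself prove that the point lies outside B(W).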
33. Self-Generation (2)
Theorem 20
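The set of average payoffs in PPE, V(δ), is a fixed point of the mapping B(·), i.e., B(V(δ)) = V(δ).
Theorem 21
If $W \subseteq W'$, then $B(W) \subseteq B(W')$ must be satisfied.
Theorem 22
If W is self-generating, then the following holds:
$$W \subseteq \bigcup_{t=1}^{\infty} B^t(W) \subseteq V(\delta) \quad (3)$$
If W is bounded and $V(\delta) \subset W$, then
$$\bigcap_{t=1}^{\infty} B^t(W) = V(\delta) \quad (4)$$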
34. Folk Theorem by FLM (1994) (1)
Definition 23
The profile α has individual full rank for player i if Πi(α−i) has rank
equal to |Ai|, that is, the |Ai| vectors {ρ(·|ai, α−i)}ai∈Ai are linearly
independent. If this is so for every player i, α has individual full rank.
Note that if α has individual full rank, the number of observable
outcomes |Y | must be at least maxi |Ai|.
Definition 24
Profile α is pairwise-identifiable for players i and j if the rank of the matrix
Πij(α) equals rank Πi(α−i) + rank Πj(α−j) − 1, where Πij(α) is the matrix
obtained by stacking Πi(α−i) on top of Πj(α−j).
Definition 25
Profile α has pairwise full rank for players i and j if the matrix Πij(α)
has rank |Ai| + |Aj| − 1.
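The rank conditions can be checked directly for the two-signal prisoner's dilemma example from the earlier slides. The sketch below is an illustration under assumed signal probabilities. It builds Πi for the pure profile (C, C), stacks the two matrices to form Πij, and computes ranks by exact Gaussian elimination. All function names are hypothetical.

```python
from fractions import Fraction as F

def rank(rows):
    """Rank of a matrix (list of rows) by exact Gaussian elimination."""
    m = [[F(x) for x in row] for row in rows]
    rk = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(rk, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for i in range(len(m)):
            if i != rk and m[i][col] != 0:
                factor = m[i][col] / m[rk][col]
                m[i] = [x - factor * y for x, y in zip(m[i], m[rk])]
        rk += 1
        if rk == len(m):
            break
    return rk

def signal_row(profile, p, q, r):
    """Distribution over (g, b) after a pure profile of the PD example."""
    pg = (p, q, r)[profile.count("D")]
    return [pg, 1 - pg]

def pi_matrix(i, a, p, q, r):
    """Pi_i(a_{-i}): one row per own action of player i, opponent fixed at a."""
    rows = []
    for own in "CD":
        profile = (own, a[1]) if i == 0 else (a[0], own)
        rows.append(signal_row(profile, p, q, r))
    return rows

p, q, r = F(9, 10), F(1, 2), F(1, 5)   # hypothetical signal probabilities
a = ("C", "C")
rank_1 = rank(pi_matrix(0, a, p, q, r))
rank_2 = rank(pi_matrix(1, a, p, q, r))
rank_12 = rank(pi_matrix(0, a, p, q, r) + pi_matrix(1, a, p, q, r))
print(rank_1, rank_2, rank_12)
```

Here each Πi has rank 2 = |Ai|, so (C, C) has individual full rank. The stacked matrix has only |Y| = 2 columns, so its rank cannot reach |Ai| + |Aj| − 1 = 3: with two signals this profile has neither pairwise full rank nor pairwise identifiability.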
35. Folk Theorem by FLM (1994) (2)
Pairwise full rank on α (for players i and j) is actually the conjunction
of two weaker conditions, individual full rank and
pairwise identifiability (on α for i and j).
1 Pairwise full rank obviously implies individual full rank: incentives
can be designed to induce a player to choose a given action.
2 It also ensures pairwise identifiability: deviations by players i and j
are distinct in the sense that they induce different probability
distributions over public outcomes.
3 Thus, player i’s incentives can be designed without interfering with
those of player j.
36. Folk Theorem by FLM (1994) (3)
Theorem 26
Suppose that every pure action profile a has individual full rank and that
either (i) for all pairs i and j, there exists a mixed action profile α that
has pairwise full rank for that pair, or (ii) every pure-action,
Pareto-efficient profile is pairwise-identifiable for all pairs of players.
Let W be a smooth subset in the interior of $V^*$. Then there exists
$\underline{\delta} < 1$ such that, for all $\delta > \underline{\delta}$, $W \subseteq E(\delta)$, i.e., each point in W
corresponds to a perfect public equilibrium payoff with discount factor δ.
The theorem applies only to interior points and so does not pertain to
payoffs on the efficient frontier.
This contrasts with the standard Folk Theorem for observable
actions, in which efficient payoffs can be exactly attained.