
Bellman equations and dynamic programming

aminaENT
1 Apr 2023
Suppose we are in an MDP and the reward function has the structure r(s, a) = r1(s, a) + r2(s, a). In this question, we investigate whether we could solve the problems defined by reward functions r1 and r2 independently and then somehow combine them in order to solve the problem defined by r. We are using the discounted setting.

(a) [10 points] Suppose you are given the action-value functions q_1^π and q_2^π corresponding to an arbitrary, fixed policy π under the two reward functions. Using the Bellman equation, explain whether or not it is possible to combine these value functions in a simple manner to obtain q^π corresponding to reward function r.

(b) [10 points] Suppose you are given the optimal action-value functions q_1^* and q_2^*. Using the Bellman equation, explain whether or not it is possible to combine these value functions in a simple manner to obtain q^*, which optimizes reward function r.

(c) [10 points] Suppose you are given the optimal policies π_1^* and π_2^*. Explain whether or not it is possible to combine these policies in a simple manner to obtain a policy π^* that optimizes reward function r.
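For reference, in the discounted setting the Bellman equation for a fixed policy π and the Bellman optimality equation referred to in parts (a) and (b) can be written as follows (a standard formulation, assuming finite states and actions and a transition kernel p(s' | s, a), which the problem statement does not spell out):

q^\pi(s,a) = r(s,a) + \gamma \sum_{s'} p(s' \mid s,a) \sum_{a'} \pi(a' \mid s')\, q^\pi(s',a')

q^*(s,a) = r(s,a) + \gamma \sum_{s'} p(s' \mid s,a) \max_{a'} q^*(s',a')

Here r(s, a) = r_1(s, a) + r_2(s, a), and q_1 and q_2 satisfy the same equations with r replaced by r_1 and r_2, respectively.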
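To make the setup in part (a) concrete, here is a minimal numerical sketch. It assumes a small, randomly generated MDP and an arbitrary fixed stochastic policy (all names and sizes here are hypothetical, not part of the original problem) and evaluates q_1^π, q_2^π, and q^π by iterative policy evaluation on the same MDP:

```python
import numpy as np

np.random.seed(0)
n_states, n_actions, gamma = 3, 2, 0.9

# P[s, a, s'] = transition probabilities; r1, r2 = the two reward components.
P = np.random.rand(n_states, n_actions, n_states)
P /= P.sum(axis=2, keepdims=True)
r1 = np.random.rand(n_states, n_actions)
r2 = np.random.rand(n_states, n_actions)

# An arbitrary fixed stochastic policy pi[s, a].
pi = np.random.rand(n_states, n_actions)
pi /= pi.sum(axis=1, keepdims=True)

def evaluate_q(r, P, pi, gamma, iters=2000):
    """Iterative policy evaluation:
    q(s,a) <- r(s,a) + gamma * sum_{s'} P(s'|s,a) * sum_{a'} pi(a'|s') q(s',a')."""
    q = np.zeros_like(r)
    for _ in range(iters):
        v = (pi * q).sum(axis=1)   # v(s') = sum_{a'} pi(a'|s') q(s', a')
        q = r + gamma * (P @ v)    # Bellman backup for the fixed policy pi
    return q

q1 = evaluate_q(r1, P, pi, gamma)
q2 = evaluate_q(r2, P, pi, gamma)
q_combined = evaluate_q(r1 + r2, P, pi, gamma)

# For a fixed policy the Bellman operator is linear in the reward,
# so q^pi under r = r1 + r2 agrees with q1^pi + q2^pi up to numerical error.
print(np.max(np.abs(q_combined - (q1 + q2))))   # ~0
```

This sketch only illustrates the fixed-policy case asked about in part (a); parts (b) and (c), which involve the max in the optimality equation, are left to the reader as the question intends.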