Planning Under Uncertainty With Markov Decision Processes
1. Planning under Uncertainty with Markov Decision Processes: Lecture II. Craig Boutilier, Department of Computer Science, University of Toronto
4. Dimensions of Abstraction (recap). [Figure: the three dimensions along which abstractions vary: uniform vs. nonuniform, exact vs. approximate, and adaptive vs. fixed, illustrated by groupings of states A, B, C with values such as 5.3, 2.9, 9.3 (exact) and 5.2, 5.5, 2.7, 9.0 (approximate).]
14. Structured Policy and Value Function. [Figure: a decision-tree policy over the variables HCR, HCU, Loc, W, U, and R, mapping branches to the actions DelC, BuyC, GetU, Go, and Noop, alongside the corresponding value tree: when HCU holds, leaves 9.00 and 10.00 under W; otherwise leaf triples 8.45/8.36/7.45, 7.64/6.81/6.64, 6.83/6.10/5.83, and 6.19/5.62/5.19 in the U/R/W branches.]
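The structured value function sketched on this slide can be read as a decision tree over boolean state variables. A minimal Python sketch of that reading, assuming a simple tuple encoding (the helpers `make_leaf`/`make_node` and the exact tree shape are illustrative, not the actual SPUDD/SPI data structures; `HCU` and `W` follow the lecture's coffee-delivery domain):

```python
# Illustrative tuple encoding of a tree-structured value function.
def make_leaf(value):
    return ("leaf", value)

def make_node(var, if_true, if_false):
    return ("node", var, if_true, if_false)

def evaluate(tree, state):
    """Walk the tree using the boolean assignment in `state`."""
    if tree[0] == "leaf":
        return tree[1]
    _, var, if_true, if_false = tree
    return evaluate(if_true if state[var] else if_false, state)

# A fragment of the slide's value tree: when HCU holds, value depends
# only on W (9.00 wet, 10.00 dry); the 7.45 leaf stands in for the
# rest of the tree, which is abstracted away in this sketch.
V = make_node("HCU",
              make_node("W", make_leaf(9.00), make_leaf(10.00)),
              make_leaf(7.45))

print(evaluate(V, {"HCU": True, "W": False}))   # 10.0
print(evaluate(V, {"HCU": False, "W": True}))   # 7.45
```

Every state that agrees on the variables tested along a branch receives the same value, which is exactly the state aggregation the structured algorithms exploit.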
16. A Simple Action/Reward Example. [Figure: two-slice network representation for action A over variables X, Y, Z. Reading the conditional probability trees: X persists with probability 1.0; Y' is true with probability 1.0 if Y holds, 0.9 if only X holds, and 0.0 otherwise; Z' is true with probability 1.0 if Z holds, 0.9 if only Y holds, and 0.0 otherwise. The reward function R pays 10 when Z holds and 0 otherwise.]
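The network representation on this slide amounts to one small conditional probability function per post-action variable. A hedged Python sketch of that factored encoding (the parent sets and probabilities are reconstructed from the partly garbled figure and should be treated as assumptions):

```python
# Factored encoding of action A: each next-state variable depends only
# on a few parents. Probabilities reconstructed from the slide (assumed).
def p_x_next(state):
    return 1.0 if state["X"] else 0.0          # X persists

def p_y_next(state):
    if state["Y"]:
        return 1.0                             # Y persists
    return 0.9 if state["X"] else 0.0          # X makes Y true w.p. 0.9

def p_z_next(state):
    if state["Z"]:
        return 1.0                             # Z persists
    return 0.9 if state["Y"] else 0.0          # Y makes Z true w.p. 0.9

def reward(state):
    return 10.0 if state["Z"] else 0.0         # R: 10 iff Z holds

s = {"X": True, "Y": False, "Z": False}
print(p_y_next(s), p_z_next(s), reward(s))     # 0.9 0.0 0.0
```

The point of the factored form is that the full 8-by-8 transition matrix for action A never has to be written down: three one-line functions carry the same information.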
17. Example: Generation of V^1. [Figure: V^0 = R is the tree (Z: 10; else 0). Step 1: regress through action A, labelling branches with Pr(Z') (Z: 1.0; Y only: 0.9; else 0.0). Step 2: take expectations over V^0, giving the tree (Z: 10.0; Y: 9.0; else 0.0). Step 3: add the reward and discount to obtain V^1 (Z: 19.0; Y: 8.1; else 0.0).]
18. Example: Generation of V^2. [Figure: starting from V^1 (Z: 19.0; Y: 8.1; else 0.0), Step 1 regresses V^1 through action A, producing a tree over X, Y, Z whose branches carry the occurrence probabilities of Y' and Z' (e.g. Y: 0.9, Z: 0.9 when only X holds; Y: 1.0 and Z: 1.0 in persisting branches; 0.0 elsewhere); Step 2 combines these into the expected-value tree for V^2.]
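The V^1 numbers on these slides can be reproduced with a single Bellman backup. A self-contained sketch, assuming a discount factor of 0.9 (inferred from 19.0 = 10 + 0.9 * 10 and 8.1 = 0.9 * 9.0) and the Z-dynamics of action A as read from slide 16 (an assumption); X is abstracted away because it affects neither Z' nor the reward at this stage:

```python
GAMMA = 0.9  # discount factor, inferred from the slide's numbers (assumed)

def p_z_next(y, z):
    # Pr(Z' = true) under action A, as read from the (garbled) figure:
    # Z persists; Y makes Z true with probability 0.9; otherwise 0.0.
    if z:
        return 1.0
    return 0.9 if y else 0.0

def reward(z):
    return 10.0 if z else 0.0

def v0(z):
    return reward(z)                # V^0 = R

def v1(y, z):
    """One Bellman backup of V^0 through action A. X is irrelevant at
    this stage, so states are abstracted down to their (Y, Z) values."""
    p = p_z_next(y, z)
    return reward(z) + GAMMA * (p * v0(True) + (1 - p) * v0(False))

print(v1(True, True))    # 19.0 -- the Z branch of V^1
print(v1(True, False))   # 8.1  -- the Y branch of V^1
print(v1(False, False))  # 0.0
```

Generating V^2 repeats this regression; at that point Y' (and hence X) enters the tree, which is why slide 18's intermediate tree tests X as well.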
20. A Bad Example for SPUDD/SPI. Action a_k makes X_k true, makes X_1 ... X_{k-1} false, and requires X_1 ... X_{k-1} to be true. Reward: 10 if all of X_1 ... X_n are true. (The value function for n = 3 is shown.)
22. A Good Example for SPUDD/SPI. Action a_k makes X_k true and requires X_1 ... X_{k-1} to be true. Reward: 10 if all of X_1 ... X_n are true. (The value function for n = 3 is shown.)
26. A Pruned Value ADD. [Figure: the value tree of slide 14, with leaf groups 8.45/8.36/7.45, 7.64/6.81/6.64, 6.19/5.62/5.19, and 9.00/10.00, next to its pruned counterpart in which those subtrees collapse to the interval leaves [7.45, 8.45], [6.64, 7.64], [5.19, 6.19], and [9.00, 10.00].]
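The pruning step on this slide replaces subtrees whose leaves are close in value with a single interval leaf. A toy Python sketch of the idea, operating on plain decision trees rather than true ADDs (the tuple encoding and the `prune` helper are illustrative assumptions, not the published algorithm):

```python
# Collapse subtrees whose leaf values all fit within `tol` into a single
# interval leaf [lo, hi]; otherwise keep the variable test. Toy version.
def prune(tree, tol):
    if tree[0] == "leaf":
        v = tree[1]
        return ("interval", v, v)
    _, var, t_branch, f_branch = tree
    t_branch = prune(t_branch, tol)
    f_branch = prune(f_branch, tol)
    if t_branch[0] == "interval" and f_branch[0] == "interval":
        lo = min(t_branch[1], f_branch[1])
        hi = max(t_branch[2], f_branch[2])
        if hi - lo <= tol:
            return ("interval", lo, hi)   # merge: drop the test on `var`
    return ("node", var, t_branch, f_branch)

# The slide's subtree with leaves 8.36, 8.45, 7.45 collapses to the
# interval [7.45, 8.45] (tolerance nudged above 1.0 for float rounding).
tree = ("node", "U",
        ("leaf", 8.36),
        ("node", "R", ("leaf", 8.45), ("leaf", 7.45)))
print(prune(tree, 1.01))   # ('interval', 7.45, 8.45)
```

The widths of the resulting interval leaves bound the error introduced by the approximation, which is what makes this kind of pruning controllable rather than ad hoc.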